Semidefinite relaxation of multi-marginal optimal transport for strictly correlated electrons in second quantization

Semidefinite relaxation of multi-marginal
optimal transport for strictly correlated electrons in second quantization

Yuehaw Khoo Department of Mathematics, Stanford University, Stanford, CA 94305, USA. Email: ykhoo@stanford.edu    Lin Lin Department of Mathematics, University of California, Berkeley, and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA. Email: linlin@math.berkeley.edu    Michael Lindsey Department of Mathematics, University of California, Berkeley, CA 94720, USA. Email: lindsey@math.berkeley.edu    Lexing Ying Department of Mathematics and Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA. Email: lexing@stanford.edu
Abstract

We consider the strictly correlated electron (SCE) limit of the fermionic quantum many-body problem in the second-quantized formalism. This limit gives rise to a multi-marginal optimal transport (MMOT) problem. Here the marginal state space for our MMOT problem is the binary set , and the number of marginals is the number of sites in the model. The costs of storing and computing the exact solution of the MMOT problem both scale exponentially with respect to . We propose an efficient convex relaxation which can be solved by semidefinite programming (SDP). In particular, the semidefinite constraint is only of size . Moreover, the SDP-based method yields an approximation of the dual potential needed to the perform self-consistent field iteration in the so-called Kohn-Sham SCE framework, which, once converged, yields a lower bound for the total energy of the system. We demonstrate the effectiveness of our methods on spinless and spinful Hubbard-type models. Numerical results indicate that our relaxation methods yield tight lower bounds for the optimal cost, in the sense that the error due to the semidefinite relaxation is much smaller than the intrinsic modeling error of the Kohn-Sham SCE method. We also describe how our relaxation methods generalize to arbitrary MMOT problems with pairwise cost functions.

1 Introduction

For ground-state electronic structure calculations, the success of the widely-used Kohn-Sham density functional theory (DFT) [13, 15] hinges on the accuracy of the approximate exchange-correlation functionals. Although tremendous progress has been made in the construction of approximate functionals [33, 2, 17, 32], these approximations are mostly derived by fitting known results for the uniform electron gas, single atoms, small molecules, and perfect crystal systems. Such functionals often perform well when the underlying quantum systems are ‘weakly correlated,’ i.e., when the single-particle energy is significantly more important than the electron-electron interaction energy. In order to extend the capability of DFT to the treatment of strongly correlated quantum systems, one recent direction of functional development considers the limit in which the electron-electron interaction energy is infinitely large compared to other components of the total energy. The resulting limit is known as strictly correlated electron (SCE) [38, 37, 5, 22, 10] limit. The SCE limit provides an alternative route to derive exchange-correlation energy functionals. The study of Kohn-Sham DFT with SCE-based functionals is still in its infancy, but such approaches have already been used to treat strongly correlated model systems and simple chemical systems (see e.g. [29, 7, 12]).

A system of interacting electrons in a -dimensional space can be described using either the first-quantized or the second-quantized representation. In the first-quantized representation, the number of electrons is fixed, and the electronic wavefunction is an anti-symmetric function in , which is a subset of the tensor product space . Here corresponds to the spin degree of freedom. In first quantization, the anti-symmetry condition needs to be treated explicitly. By contrast, in the second-quantized formalism, one chooses a basis for a subspace of . In practice, the basis is of some finite size , corresponding to a discretized model with sites that encode both spatial and spin degrees of freedom. The electronic wavefunction is an element of the Fock space . The Fock space contains wavefunctions of all possible electron numbers, and finding wavefunctions of the desired electron number is achieved by constraining to a subspace of the Fock space. In the second-quantized representation, the anti-symmetry constraint is in some sense baked into the Hamiltonian operator instead of the wavefunction, and this perspective often simplifies book-keeping efforts. Due to the inherent computational difficulty of studying strongly correlated systems such as high-temperature superconductors, it is often necessary to introduce simplified Hamiltonians such as in Hubbard-type models. These model problems are formulated directly in the second-quantized formalism via specification of an appropriate Hamiltonian.

To the extent of our knowledge, all existing works on SCE treat electrons in the first-quantized representation with (essentially) a real space basis. In this paper we aim at studying the SCE limit in the second-quantized setting. Note that generally Kohn-Sham-type theories in the second-quantized representation are known as ‘site occupation functional theory’ (SOFT) or ‘lattice density functional theory’ in the physics literature [36, 20, 6, 39, 8]. A crucial assumption of this paper is that the electron-electron interaction takes the form , which we call the generalized Coulomb interaction. (The meaning of the symbols will be explained in Section LABEL:sec:prelim.) We remark that the form of the generalized Coulomb interaction is more restrictive than the general form appearing in the quantum chemistry literature, to which our formulation does not yet apply. Assuming a generalized Coulomb interaction, we demonstrate that the corresponding SCE problem can be formulated as a multi-marginal optimal transport (MMOT) problem over classical probability measures on the binary hypercube . The cost function in this problem is of pairwise form. Hence the objective function in the Kantorovich formulation of the MMOT can be written in terms of only the 2-marginals of the probability measure. In order to solve the MMOT problem directly, even the storage cost of the exact solution scales as , and the computational cost also scales exponentially with respect to . Thus a direct approach becomes impractical even when the number of sites becomes moderately large.

Contribution:

Based on the recent work of Khoo and Ying [14], we propose a convex relaxation approach by imposing certain necessary constraints satisfied by the 2-marginals. The relaxed problem can be solved efficiently via semidefinite programming (SDP). While the 2-marginal formulation provides a lower bound to the optimal cost of the MMOT problem, we also propose a tighter lower bound obtained via an SDP involving the 3-marginals. The computational cost for solving these relaxed problems is polynomial with respect to , and, in particular, the semidefinite constraint is only enforced on a matrix of size . Numerical results for spinless and spinful Hubbard-type systems demonstrate that the 2-marginal and 3-marginal relaxation schemes are already quite tight, especially when compared to the modeling error due to the Kohn-Sham SCE formulation itself.

By solving the dual problems for our SDPs, we can obtain the Kantorovich dual potentials, which yield the SCE potential needed for carrying out the self-consistent field iteration (SCF) in the Kohn-Sham SCE formalism. To this end we need to show that the dual problem satisfies strong duality and moreover that the dual optimizer is actually attained. We show that a straightforward formulation of the primal SDP does not have any strictly feasible point, and hence Slater’s condition cannot be directly applied to establish strong duality (see, e.g., [4]). By a careful study of the structure of the dual problem, we prove that the strong duality and dual attainment conditions are indeed satisfied. We also explain how the SDP relaxations introduced in this paper can be applied to arbitrary MMOT problems with pairwise cost functions. We comment that the justification of the strong duality and dual attainment conditions holds in this more general setting as well.

Related work:

In the first-quantized formulation, for a fixed real-space discretization the computational cost of the direct solution of the SCE problem scales exponentially with respect to the number of electrons . This curse of dimensionality is a serious obstacle for SCE-based approaches to the quantum many-body problem. Notable exceptions to the unfavorable computational scaling are the cases of strictly one-dimensional systems (i.e., ) and spherically symmetric systems (for any [37], for which semi-analytic solutions exist.

In [3], the Sinkhorn scaling approach is applied to an entropically regularized MMOT problem. This method requires the marginalization of a probability measure on a product space of size that is exponential in the number of electrons . Thus the complexity of this method also scales exponentially with respect to . Meanwhile, a method based on the Kantorovich dual of the MMOT problem was proposed in [5, 28]. However, there are exponentially many constraints in the dual problem. Furthermore, [5] assumes a Monge solution to the MMOT problem, but it is unknown whether the MMOT problem with pairwise Coulomb cost has a Monge solution for . Moreover, if it exists, the Monge solution is hard to evaluate in the context of the Coulomb cost.

Recently, Khoo and Ying proposed a semidefinite relaxation-based approach to the MMOT problem arising from SCE in the first-quantized setting [14]. This is the first approximation method for the general SCE problem with polynomial complexity with respect to the system size. The relaxation avoids exponential scaling by directly handling only the 2-marginal distributions (known as the pair densities in the physics literature), which are subjected to certain necessary joint representability constraints. In particular, the method provides a lower bound to the SCE energy. Furthermore, by proper treatment of the 3-marginal distributions, an upper bound to the SCE energy is recovered as well. Numerical results indicate that both the lower and upper bounds are rather tight approximations to the SCE energy.

In the second-quantized setting, our semidefinite relaxation-based approach for finding a lower bound to the SCE energy is also related to the two-particle reduced density matrix (2-RDM) theories in quantum chemistry [9, 26, 24, 25]. However, the MMOT problem in SCE only requires the knowledge of the pair density instead of the entire 2-RDM. The number of constraints in our formulation is also considerably smaller than the number of constraints in 2-RDM theories, thanks to the generalized Coulomb form of the interaction.

Organization: In Section LABEL:sec:prelim, we describe the Hamiltonians under consideration and derive an appropriate formulation of Kohn-Sham DFT based on the SCE functional, which is in turn defined in terms of a MMOT problem. In Section LABEL:sec:two_marginal_convex, we solve the MMOT problem by introducing a convex relaxation of the set of representable 2-marginals, and we prove strong duality for the relaxed problem. In Section LABEL:sec:three_marginal_convex, a tighter lower bound is obtained by considering a convex relaxation of the set of representable 3-marginals. In Section LABEL:sec:general_OT, we comment on how a general MMOT problem with pairwise cost can be solved by directly applying the methods introduced in Sections LABEL:sec:two_marginal_convex and LABEL:sec:three_marginal_convex. We demonstrate the effectiveness of the proposed methods through numerical experiments in Section LABEL:sec:numeric, and we discuss conclusions and future directions in Section LABEL:sec:conclusion.

2 Preliminaries

2.1 Density functional theory in second quantization

Our goal is to compute the ground-state energy of a fermionic system with states. With some abuse of terminology, we will refer to fermions simply as electrons. Also for simplicity we use a single index for all of the states, as opposed to using separate site and spin indices in the case of spinful systems. Double indexing for spinful fermionic systems can be recovered simply by rearranging indices, e.g., by associating odd state indices with spin-up components and even state indices with spin-down components.

In the second-quantized formulation, the state space is called the Fock space, denoted by . The occupation number basis set for the Fock space is

which is an orthonormal basis set satisfying

\hb@xt@.01(2.1)

A state will be written as a linear combination of occupation number basis elements as follows:

\hb@xt@.01(2.2)

Hence the state vector can be identified with a vector , and is isomorphic to . We call normalized if the following condition is satisfied:

\hb@xt@.01(2.3)

We also refer to as the vacuum state.

The fermionic creation and annihilation operators are respectively defined as

\hb@xt@.01(2.4)

The number operator defined as satisfies

\hb@xt@.01(2.5)

The Hamiltonian operator is assumed to take the following form:

\hb@xt@.01(2.6)

Here is a Hermitian matrix, which is often interpreted as the ‘hopping’ term arising from the kinetic energy contribution to the Hamiltonian. is an on-site term, which can be viewed as an external potential. is also a Hermitian matrix, which may be viewed as representing the electron-electron Coulomb interaction. Note that , hence without loss of generality we can assume the diagonal entries by absorbing, if necessary, such contributions into the on-site potential . Following the spirit of Kohn-Sham DFT, one could think of as fixed matrices, and of the external potential as a contribution that may change depending on the system (in the context of DFT, represents the electron-nuclei interaction and is therefore ‘external’ to the electrons). We remark that the restriction of the form of the two-body interaction is crucial for the purpose of this paper. In particular, we do not consider the more general form as is done in the quantum chemistry literature when a general basis set (such as the Gaussian basis set) is used to discretize a quantum many-body Hamiltonian in the continuous space. In the discussion below, for simplicity we will omit the index range of our sums as long as the meaning is clear.

The exact ground state energy can be obtained by the following minimization problem:

\hb@xt@.01(2.7)

Here the minimizer is the many-body ground state wavefunction, and is the total number operator. , which is called the chemical potential, is a Lagrange multiplier chosen so that the ground state wavefunction has a number of electrons equal to a pre-specified integer , i.e., such that

\hb@xt@.01(2.8)

It is clear that is an on-site potential, and without loss of generality we absorb into , and hence write as in the discussion below.

The electron density is defined as

\hb@xt@.01(2.9)

which satisfies . Note that

\hb@xt@.01(2.10)

Then we follow the Levy-Lieb constrained minimization approach [18, 19] and rewrite the ground state minimization problem (LABEL:eqn:groundstate) as follows:

\hb@xt@.01(2.11)

where

\hb@xt@.01(2.12)

Here the notation indicates that the corresponding infimum is taken over states that yield the density in the sense of Eq. (LABEL:eqn:density), and the domain of is defined by

\hb@xt@.01(2.13)

Note that the external potential is only coupled with and is singled out in the constrained minimization. It is easy to see that for any , the set is non-empty, as we may simply choose

Therefore the constrained minimization problem (LABEL:eqn:dft) is in fact defined over a nonempty set for all .

The functional , which is called the Levy-Lieb functional, is a universal functional in the sense that it depends only on the hopping term and the interaction term , hence in particular is independent of the potential . Once the functional is known, can be obtained by minimization with respect to a single vector using standard optimization algorithms, or via the self-consistent field (SCF) iteration to be detailed below. The construction above is called the ‘site occupation functional theory’ (SOFT) or ‘lattice density functional theory’ in the physics literature [36, 20, 6, 39, 8]. To our knowledge, SOFT or lattice DFT often imposes an additional sparsity pattern on the matrix for the electron-electron interaction, so that the Hamiltonian becomes a Hubbard-type model.

2.2 Strictly correlated electron limit

Using the fact that the infimum of a sum is greater than the sum of infimums, we can lower-bound the ground state energy in the following way:

\hb@xt@.01(2.14)

where the functionals and are defined via the last equality in the manner suggested by the notation. The first of these quantities is called the kinetic energy, and the second the strictly correlated electron (SCE) energy. The SCE approximation is obtained by treating as an approximation for the Levy-Lieb functional. Though in general it is only a lower-bound for the Levy-Lieb functional, this bound is expected to become tight in the limit of infinitely strong interaction. We do not prove this fact in this paper (though we demonstrate it numerically below), but we nonetheless refer to this approximation as the SCE limit by analogy to the literature on SCE in first quantization [38, 37].

Due to the inequality in Eq. (LABEL:eqn:lower_bound), we have in general the following lower bound for the total energy, which we shall call the Kohn-Sham SCE energy:

\hb@xt@.01(2.15)

The advantage of the preceding manipulations is that now each term in this infimum can be computed. Specifically, is trivial to compute, is defined in terms of a non-interacting many-body problem (i.e., a problem with Hamiltonian only quadratic in the creation and annihilation operators), for which an exact solution can be obtained via the diagonalization of [30]. Finally, as we shall see below the SCE term (and its gradient) can be computed in terms of a MMOT problem (and its dual). Thus in principle, it would be possible to take gradient descent approach for computing the infimum in the definition (LABEL:eqn:E_Kohn-Sham_SCE) of .

2.2.1 The Kohn-Sham SCE equations

In practice, to compute the Kohn-Sham SCE energy we will instead adopt the self-consistent field (SCF) iteration as is common practice in Kohn-Sham DFT. It can be readily checked that is convex with respect to . By the convexity of , , and , the expression in Eq. (LABEL:eqn:E_Kohn-Sham_SCE) admits a minimizer, which is unique unless the functional fails to be strictly convex. We assume that the solution is unique and is differentiable for simplicity, and we derive nonlinear fixed-point equations satisfied by the minimizer as follows.

For suitable , define the SCE potential via

\hb@xt@.01(2.16)

and we will discuss how to compute this gradient later. Now assume that the (unique) infimum in Eq. (LABEL:eqn:E_Kohn-Sham_SCE) is obtained at , which is then in particular a critical point of the expression

\hb@xt@.01(2.17)

But then is also a critical point of the expression obtained by replacing with its expansion up to first order about , which is (modulo a constant term that does not affect criticality)

\hb@xt@.01(2.18)

Hence means the inner product, and we are motivated to try to minimize over . But we can write

where

Here is a diagonal matrix. Then

The latter infimum is a ground-state problem for a non-interacting Hamiltonian and is obtained [30] at a so-called Slater determinant of the form

\hb@xt@.01(2.19)

Here the are ‘canonically transformed’ creation operators defined by

\hb@xt@.01(2.20)

where is a matrix whose columns are the lowest eigenvectors of . We assume the eigenvectors form an orthonormal set, i.e. .

Moreover, one may directly compute that the electron density of as defined in Eq. (LABEL:eqn:slater) is given by

\hb@xt@.01(2.21)

i.e., . Hence the optimizer of Eq. (LABEL:eqn:E_Kohn-Sham_SCE) solves the Kohn-Sham SCE equations:

\hb@xt@.01(2.22)

Here are understood to be the lowest (orthonormal) eigenpairs of the matrix in the first line of Eq. (LABEL:eqn:KSSCE_eqn).

Eq. (LABEL:eqn:KSSCE_eqn) is a nonlinear eigenvalue problem and should be solved self-consistently. The standard iterative procedure for this task works as follows. (1) For the -th iterate , form the matrix , and compute by solving the corresponding eigenproblem. (2) Define . (3) Iterate until convergence, possibly using mixing schemes [1, 34, 21] to ensure or accelerate convergence.

Once self-consistency is reached, the total energy can be recovered by the relation

\hb@xt@.01(2.23)

as can be observed by adding back to the constant term discarded between equations (LABEL:eqn:infExp) and (LABEL:eqn:Gdef).

2.2.2 The SCE energy and potential

The problem is then reduced to the computation of and its gradient . To this end, let us rewrite

\hb@xt@.01(2.24)

where is the space of joint probability mass functions on . The 1-marginals are defined in terms of via

\hb@xt@.01(2.25)

and they satisfy

\hb@xt@.01(2.26)

Considering the alternately as vectors, we also write (by some abuse of notation)

\hb@xt@.01(2.27)

Note that the last line of Eq. (LABEL:eqn:esce_rewrite) is obtained by considering as a classical probability density . (The marginal condition derives from the condition .)

Define the cost function by

\hb@xt@.01(2.28)

Then our SCE energy may be written

\hb@xt@.01(2.29)

where the angle bracket notation is introduced to indicate the suggested inner product, i.e., the inner product of . This is precisely the form of a MMOT problem, namely, minimization of a linear functional of a joint probability measure subject to constraints on all of the marginals of the measure [31]. Note that the dimension of the feasible space for this problem is exponential in , rendering infeasible any direct approach based on the formulation as a general MMOT, at least for of moderate size.

Nonetheless, we remark that in this exact formulation, is the derivative of the optimal value of a convex optimization problem (in particular, a linear program) with respect to a variation of it constraints. This quantity can be obtained in terms of the variables dual to the varied constraints [4]. In the setting of MMOT, these dual variables are known as the Kantorovich potentials [40]. We will discuss the duality theory of our SDP relaxations in detail later on.

Despite the fact that it is possible to formulate our problem as a general MMOT problem, doing so loses the important structure of our pairwise cost. To wit, recall that the diagonal entries of are set to zero, can be written

Hence the sum can be taken over . Accordingly, the objective function of (LABEL:eqn:esce_rewrite) can be written as

\hb@xt@.01(2.30)

where angle brackets are now used to indicate the suggested inner product, i.e., that of , and where the 2-marginals are defined implicitly in terms of by marginalizing out all components other than , i.e., by

\hb@xt@.01(2.31)

Later we also identify with the matrix

\hb@xt@.01(2.32)

and we do likewise for . Using the matrix notation (and the symmetry of ), it follows that

\hb@xt@.01(2.33)

where ‘Tr’ indicates the matrix trace.

At first glance, it might seem that one may achieve a significant reduction of complexity by directly changing the optimization variable in Eq. (LABEL:2mar_optimization) from to . However, extra constraints would then need to be enforced in order to relate the different 2-marginals; i.e., the two-marginals must be jointly representable in the sense that all of them could simultaneously be yielded from a single joint probability measure on .

3 Convex relaxation

In this section, we show that a relaxation of the representability condition implicit in Eq. (LABEL:2mar_optimization) allows us to formulate a tractable optimization problem in terms of the alone. In fact, this optimization problem will be a semidefinite program (SDP).

3.1 Primal problem

We now derive certain necessary constraints satisfied by 2-marginals that are obtained from a probability measure on . In the following we adopt the notation

Then for any such , let be the Dirac probability mass function on localized at , i.e.,

Note that we can also write as an -tensor, i.e., an element of , via

where we adopt the (zero-indexing) convention , .

Any probability measure on can be written as a convex combination of the since they are the extreme points of the set of probability measures; in particular we can write a probability density as

\hb@xt@.01(3.1)

From the definitions of the 1- and 2-marginals (LABEL:eqn:onemarginal), (LABEL:eqn:twomarginal), it follows that

\hb@xt@.01(3.2)

Now define

\hb@xt@.01(3.3)

Then by Eq. (LABEL:eqn:tensorMarginals), is the matrix of blocks given by

\hb@xt@.01(3.4)

Accordingly we write . Then let be the matrix of the blocks defined above, which specifies the pairwise cost on each pair of marginals111Without loss of generality, one can assume .. Observe that the value of the objective function of Eq. (LABEL:2mar_optimization_2) can in fact be rewritten as

Then the MMOT problem Eq. (LABEL:2mar_optimization_2) can be equivalently rephrased as

subject to

Note that in our application to SCE, we have fixed

in advance, i.e., is not an optimization variable.

At this point, our reformulation of the problem has not alleviated its exponential complexity; indeed, note that is a vector of size . However, the reformulation does suggest a way to reduce the complexity by accepting some approximation. In fact, we will omit entirely from the optimization, retaining only as an optimization variable and enforcing several necessary constraints on that are satisfied by the solution of the exact problem.

First, note from the constraint (LABEL:eqn:Mconstraint) that is both entry-wise nonnegative (written ) and positive semidefinite (written ). Second, the fact that the 1-marginals can be written in terms of the 2-marginals imposes additional local consistency constraints on . Indeed, with denoting the vector of all ones, we can write

\hb@xt@.01(3.7)

from which it follows that

\hb@xt@.01(3.8)

Then we obtain the relaxation

\hb@xt@.01(3.9)
subject to

Again, is not an optimization variable. It is actually helpful to reformulate the primal 2-marginal SDP (LABEL:eqn:sdp2marPre) as

\hb@xt@.01(3.13)
subject to \hb@xt@.01(3.18)

Note that this formulation is equivalent to (LABEL:eqn:sdp2marPre), given the symmetry of (implicit in the notation ). However, the new formulation removes a few redundant constraints and will help us derive a more intuitive dual problem. The problem (LABEL:eqn:sdp2mar) will be referred to as the primal 2-marginal SDP, or the primal problem for short. Note that the optimal value of the primal problem is in fact attained because the constraints (LABEL:eqn:psdCon)-(LABEL:eqn:diagCon) define a compact feasible set.

Reflecting back on the derivation, we caution that replacing with comes at a price. Since we only enforce certain necessary conditions on , the 2-marginals that we recover from may not in fact be the 2-marginals of a joint probability measure on . Thus should in general only be expected to be a lower-bound to , though we will see that the error is often small in practice.

3.2 Dual problem

As detailed in Section LABEL:sec:kssceEq, in order to implement the SCF for Kohn-Sham SCE it is necessary to compute . After replacing the density functional with the efficient approximation , the same derivation motivates us to compute . This quantity can obtained by examining the convex duality of our primal 2-marginal SDP.

We let be the variable dual to the constraint (LABEL:eqn:psdCon), be dual to (LABEL:eqn:nnCon), be dual to (LABEL:eqn:mar1Con), be dual to (LABEL:eqn:mar2Con), and finally let be dual to (LABEL:eqn:diagCon). Note that and for each , and for each .

Then our formal Lagrangian is of the form

where the domains of is the set of symmetric matrices (equivalently, it is convenient to think of as depending only on its upper-block-triangular part), and the dual variables are as specified above (i.e., only and are constrained), and more specifically we have (omitting the arguments of from the notation)

It is helpful to realize the identities

Then, recognizing that and (so that and ), minimization over of the Lagrangian (LABEL:eqn:lagrangian) yields the dual problem

\hb@xt@.01(3.23)
subject to

Observe that the variables can be removed by combining constraints (LABEL:eqn:ZpqCon) and (LABEL:eqn:KantConPre) to yield

Moreover, can be removed simply by substituting into the objective function (recall that ). These reductions yield

\hb@xt@.01(3.24)
subject to \hb@xt@.01(3.26)

Here we think of as the entry of the matrix , and likewise is the -th entry of .

The dual problem may be interpreted as follows. Observe that for fixed (e.g., fixed to its optimal value), the maximization problem decouples into a set of independent maximization problems for each pair of marginals. We think of as defining an effective cost function for each pair of marginals. Then the decoupled problem for a pair is exactly the Kantorovich dual problem in standard (i.e., not multi-marginal) optimal transport, specified by cost function and marginals , [40]. In other words, after fixing , our problem decouples into independent standard optimal transport problems for each pair of marginals. Nonetheless, these problems are in turn themselves coupled via the optimization over .

Recall that we wanted to compute . Assuming that strong duality holds, as shall be established later, the optimal value of the dual problem (LABEL:eqn:sdp2marDual) is in fact equal to . (Recall that here we think of the 1-marginals as being defined in terms of .) Hence we can compute derivatives by evaluating the gradient of the objective function (LABEL:eqn:sdp2marDual) with respect to at the optimizer . (If the optimizer is not unique, then in general we will get a subgradient [35].)

To carry out this program, first note that . Therefore the partial derivative of the objective function (LABEL:eqn:sdp2marDual) with respect to yields

If one extends the definition of to via the stipulation , then one has

3.3 Strong duality and dual attainment

In order to faithfully compute the SCE energy and potential via the dual problem (LABEL:eqn:sdp2marDual), we need to verify that the dual problem satisfies strong duality, i.e., that the duality gap defined by the difference between the infimum of Eq. (LABEL:eqn:sdp2mar) and the supremum of Eq. (LABEL:eqn:sdp2marDual) is zero. In fact, since the domain of the primal problem is compact, Sion’s minimax theorem [16] immediately guarantees that the duality gap is zero. We state this result as a lemma:

Lemma 3.1

The primal and dual problems (LABEL:eqn:sdp2mar) and (LABEL:eqn:sdp2marDual), respectively, have the same (finite) optimal value.

However, in order to compute the SCE potential, we actually require not only that the duality gap is zero, but also that the supremum in the dual problem is attained. One might hope to verify Slater’s condition [4], which provides a standard method for verifying both strong duality and such ‘dual attainment’ simultaneously.

The trouble is that Slater’s condition requires the existence of a feasible interior point , i.e., a point satisfying and for all . This scenario is in fact impossible since for example the vector

\hb@xt@.01(3.27)

lies in the null space of any feasible , hence never holds for feasible .

Instead of using Slater’s condition, we will prove dual attainment via a very careful study of the structure of the dual problem.

Theorem 3.2

The optimal value of the dual 2-marginal SDP (LABEL:eqn:sdp2marDual) is attained. By Lemma LABEL:lem:strongDuality, this optimal value is equal to the optimal value of the primal 2-marginal SDP (LABEL:eqn:sdp2mar).

Proof. Without loss of generality we assume

\hb@xt@.01(3.28)

To see why this assumption can be made, observe that if for some , then attainment for the dual problem (LABEL:eqn:sdp2marDual) can be reduced to attainment for a strictly smaller dual 2-marginal SDP. We leave further details of such a reduction to the reader.) Also, for later reference, we let denote the objective function (LABEL:eqn:sdp2marDual), and we let denote the feasible domain defined by the constraints (LABEL:eqn:sdp2marDualY), (LABEL:eqn:sdp2marDualPotentials).

Now to get started, observe that if we fix and view (LABEL:eqn:sdp2marDual) as an optimization problem over only, the resulting problem is in fact a linear program. Let us call this the -program, more specifically:

subject to

In fact we may consider the -program for any matrix , and this will slightly simplify some discussion later. Observe that each -program is feasible, and the optimal values of all -programs are finite. Since they are linear programs, this means that the optimal values of the -programs can be attained. Thus for each , there exist , for which optimize the -program, i.e., attain the value . By construction is concave, hence continuous, in .

Now let , so , where is the optimal value of the dual problem (LABEL:eqn:sdp2marDual). Hence the feasible set of (LABEL:eqn:sdp2marDual) could be refined to , where

without altering the optimal value. Now if were compact, then the lemma would follow. To see this, note that since (which follows from weak duality), we could take an optimizing sequence for (LABEL:eqn:sdp2marDual), where