# Couplings and quantitative contraction rates for Langevin dynamics

###### Abstract.

We introduce a new probabilistic approach to quantify convergence to equilibrium for (kinetic) Langevin processes. In contrast to previous analytic approaches that focus on the associated kinetic Fokker-Planck equation, our approach is based on a specific combination of reflection and synchronous coupling of two solutions of the Langevin equation. It yields contractions in a particular Wasserstein distance, and it provides rather precise bounds for convergence to equilibrium at the borderline between the overdamped and the underdamped regime. In particular, we are able to recover kinetic behavior in terms of explicit lower bounds for the contraction rate. For example, for a rescaled double-well potential with local minima at distance , we obtain a lower bound for the contraction rate of order provided the friction coefficient is of order .

###### Key words and phrases:

Langevin diffusion, kinetic Fokker-Planck equation, stochastic Hamiltonian dynamics, reflection coupling, convergence to equilibrium, hypocoercivity, quantitative bounds, Wasserstein distance, Lyapunov functions###### 2010 Mathematics Subject Classification:

60J60, 60H10, 35Q84, 35B40## 1. Introduction and main result

Suppose that is a function in such that is Lipschitz continuous, and let . We consider a (kinetic) Langevin diffusion with state space that is given by the stochastic differential equation

(1.1) | |||||

Here is a -dimensional Brownian motion that is defined on a probability space . Since the coefficients are Lipschitz continuous, a unique strong solution of the Langevin equation exists for any initial condition, and the solution gives rise to a strong Markov process with generator

(1.2) |

The corresponding Kolmogorov forward equation is the kinetic Fokker-Planck equation. Under the assumptions on imposed below, it can be verified that , and that the probability measure

(1.3) |

is invariant for the transition semigroup , see e.g. [38, Proposition 6.1].

In statistical physics, the Langevin equation (1.1) describes the motion of a particle with position and velocity in a force field subject to damping and random collisions [19, 45, 37, 30, 40]. In the physical interpretation, is the friction coefficient (per unit mass), and is the inverse mass. Discretizations of the Langevin equation are relevant for molecular dynamics simulations [31]. Hamiltonian Monte Carlo methods for sampling and integral estimation are based on different types of discrete time analogues to Langevin dynamics [12, 36, 31, 6]. In numerical simulations, often a better performance of these HMC methods compared to traditional MCMC approaches is observed, but the corresponding convergence acceleration is still not well understood theoretically.

For these and other reasons, an important question is how to obtain explicit bounds on the speed of convergence of the law of towards the invariant probability measure . Since the noise is only acting on the second component, the generator of the Langevin diffusion is degenerate, and thus classical approaches can not be applied in a straightforward way. Indeed, is a typical example of a hypocoercive operator in the sense of Villani [43, 44]. Several analytic approaches to convergence to equilibrium for kinetic Fokker-Planck equations have been proposed during the last 15 years [10, 18, 29, 27, 28, 43, 44, 11, 7, 35, 21, 22, 3]. These are based respectively on Witten Laplacians and functional inequalities, semigroup theory, and in particular on hypocoercivity methods, see also [23] for some explorations around the Gaussian case and the effect of hypoellipticity. There are only few articles which study the ergodic properties of Langevin processes using more probabilistic arguments, cf. [41, 46, 33, 39, 1, 5, 20]. Most of these results ultimately rely on arguments used in Harris’ type theorems, i.e., they assume a Lyapunov drift condition which implies recurrence of the process w.r.t. a compact set together with a control over the average excursion length. This condition is then combined with an argument showing that for starting points in the recurrent set, the transition probabilities are not singular w.r.t. each other. While the approaches are of a probabilistic nature, the behaviour of the process inside the recurrent set is not very transparent. Correspondingly, these approaches lead to qualitative rather than quantitative convergence results.

An open question asked by Villani in [42, Ch. 2, Bibliographical notes] is how to prove exponential convergence to equilibrium by a direct coupling approach. The motivation for this is two-fold: On the one hand, coupling methods often provide a good probabilistic understanding of the dynamics. On the other hand, couplings have been proven useful in establishing precise bounds on the long-time behaviour of non-degenerate diffusion processes [32, 8, 16, 13]. The only results for Langevin processes in this direction that we are aware of are rather restrictive: Under the assumption that the force field is a small perturbation of a linear function, Bolley, Guillin and Malrieu [5] use a synchronous coupling to show exponential mixing for (1.1) in Wasserstein distances. More recently, Gorham, Duncan, Vollmer and Mackey [20, Theorem 11] used synchronous coupling to derive a quantitative bound in the case of a strongly convex potential. Moreover, in [4, 2], couplings for the Kolmogorov diffusion have been considered. This process solves an equation similar to (1.1) without damping and with .

Here, and in the companion paper [17], we develop a novel coupling approach for Langevin equations that works for a much wider class of force fields. The intuition behind this approach can be described as follows: At first, we identify regions of the state space where the deterministic system corresponding to (1.1) admits a contraction property. We then use a mixture of synchronous coupling and reflection coupling to approximate a coupling which is sticky in these regions [14]. The speed of convergence is then measured in Kantorovich distances ( Wasserstein distances) by adapting and optimizing the underlying (semi-)metric w.r.t. the given model and the chosen coupling. Here, we basically follow the strategy developed in [15, 16, 13], which is partially based on ideas from [25, 26, 9]. In these works, exponential Kantorovich contractions have been established for non-degenerate overdamped Langevin equations, yielding explicit and, in several cases, sharp convergence rates under the assumption that the corresponding potential is strictly convex outside a ball [15, 16], and under a Lyapunov drift condition [13], respectively. Our approach is also related to the application of the strategy described above to infinite-dimensional stochastic differential equations with possibly degenerate noise in [48], cf. also [24, 34].

Besides providing an intuitive understanding for the mechanism of convergence to equilibrium, the coupling approach yields both qualitatively new, and explicit quantitative results in several cases of interest. Before explaining the coupling construction and stating the results in detail, we illustrate this by an example:

###### Example 1.1 (Double-well potential).

Suppose that is a Lipschitz continuous double-well potential defined by

and let be the rescaled potential with the same height for the potential well, but minima at distance . Then our main result shows that for any there exist a constant , a semimetric on , and a corresponding Kantorovich semimetric such that for all probability measures on ,

As a consequence, we also obtain convergence to equilibrium in the standard Wasserstein distance with the same exponential rate . Below, we give explicit lower bounds for the contraction rate. For example, if then

see Example 1.13 (with parameters , and ). In general, if the value of and are fixed (i.e., the friction coefficient is adjusted to the potential) then the contraction rate is of order , i.e., for a positive constant . This clearly reflects the kinetic behaviour, and it is in contrast to the rate for convergence to equilibrium of the overdamped limit .

### 1.1. Idea for coupling construction

We first explain the idea behind the construction of the coupling in an informal way. Suppose that is an arbitrary coupling of two solutions of the Langevin equation (1.1) driven by Brownian motions and . Then the difference process satisfies the stochastic differential equation

Introducing the new coordinates , the system takes the form

(1.4) | |||||

(1.5) |

Since , the first equation is contractive if . The key idea is now to apply a synchronous coupling whenever , and a reflection coupling if and with appropriate constants , cf. Figure 1. The synchronous coupling guarantees that the noise coefficient in (1.5) vanishes if , i.e., the dynamics is not driven away from the “contractive region” by random fluctuations (although it may leave this region by the drift). On the other hand, the reflection coupling for ensures that the contractive region is recurrent. The resulting coupling process is a diffusion on that is sticky on the -dimensional hyperplane of contractive states, i.e., it spends a positive amount of time in this region, cf. [14]. By designing a special semi-metric on that is based on a concave function of the distance and on a Lyapunov function, we can then (similarly as in [13]) make use of the random fluctuations and of a drift condition in order to derive average contractivity for the coupling distance .

### 1.2. Drift condition and Lyapunov function

We now make the following assumption that guarantees, among other things, that the process is non-explosive:

###### Assumption 1.2.

There exist constants and such that

(1.6) | |||||

(1.7) | |||||

(1.8) |

Notice that the assumption can only be satisfied if

(1.9) |

Up to the choice of the constants, the drift condition (1.8) is equivalent to the simplified drift condition (1.29) considered further below. It implies the existence of a Lyapunov function for the Langevin process. Indeed, let

(1.10) |

Note that since ,

(1.11) | |||||

In particular, as . Moreover:

###### Lemma 1.3.

If the drift condition (1.8) holds then .

The proof of the lemma is included in Appendix A. The choice of the Lyapunov function is motivated by Mattingly, Stuart and Higham [33], see also [41, 46, 1].

In combination with (1.11), the lemma shows that the process is decreasing on average in regions where .

### 1.3. Choice of metric

Next, we introduce an appropriate semi-metric on w.r.t. which the coupling considered below will be contractive on average. Inspired by [25], a similar semi-metric has been considered in [13]. For we set

(1.12) | |||||

(1.13) |

where are appropriately chosen positive constants, and is a continuous, non-decreasing concave function such that , is on for some constant with right-sided derivative and left-sided derivative , and is constant on . The function and the constants and will be chosen explicitly below in order to optimize the resulting contraction rates. For the moment let us just note that by concavity,

(1.14) |

For probability measures on we define

(1.15) |

where the infimum is over all couplings of and . We remark that and the corresponding transportation cost are semimetrics but not necessarily metrics, i.e., the triangle inequality may be violated. An important remark is that the distance can be controlled by the Lyapunov function. Indeed, let

(1.16) |

By (1.11) and since ,

for any . Hence for ,

(1.18) | |||||

(1.19) |

by Lemma 1.3. The bound (1.19) guarantees that for the coupling to be considered below, the process is decreasing on average if . We will show that by choosing the coupling and the parameters and defining the metric in an adequate way, we can ensure that is also decreasing on average (up to a small error term) for . As a consequence, we will obtain our basic contraction result.

### 1.4. Main result

We can now state our main result:

###### Theorem 1.4.

Suppose that Assumption 1.2 is satisfied. Then there exist constants and a continuous non-decreasing concave function with such that for all probability measures on ,

(1.20) |

where the contraction rate is given by

(1.21) | |||||

(1.22) |

Explicitly, one can choose the constants , and the function determining in such a way that

(1.23) |

and is constant on and on with

(1.24) |

###### Remark 1.5.

###### Remark 1.6.

We shortly comment on the requirement that is Lipschitz, cf. Assumption 1.2 further above. This condition is not necessary to conclude exponential convergence to equilibrium in Kantorovich distances, cf. [33, Theorem 3.2]. We have chosen to limit ourselves here to the Lipschitz case to concentrate on the key techniques rather than on tedious calculations. In the case of overdamped Langevin equations, the contraction results from [13] are extended in [47] replacing global Lipschitz bounds by local ones. In a similar spirit, it might be possible to extend the results presented here. However, optimizing and keeping track of the constants is more involved in this case.

The proof of Theorem 1.4 is given in Section 4. As a preparation, we introduce the relevant couplings in Section 2, and we apply these to derive a more general contraction result in Section 3. Theorem 1.4 will be obtained from this more general result by choosing the constants and in a specific way.

Theorem 1.4 directly implies convergence of the Langevin process to a unique stationary distribution with exponential rate :

###### Corollary 1.7.

In the setting of Theorem 1.4 there exists a constant such that for all probability measures on ,

(1.26) |

Here, denotes the standard Wasserstein distance w.r.t. the euclidean metric. In particular, is the unique invariant probability measure for the Langevin process, and converges towards exponentially fast with rate for any initial law such that . Here, the constant and the semimetric are given as in Theorem 1.4, and the constant can be chosen explicitly as

The proof is given in Section 4.

### 1.5. Bounds under simplified drift condition

In order to make the dependence of the bounds on the parameters more explicit, we now replace (1.8) by a simplified drift condition. Instead of Assumption 1.2, we assume:

###### Assumption 1.8.

There exist constants such that

(1.27) | |||||

(1.28) | |||||

(1.29) |

Observe that (1.27) may be assumed w.l.o.g. by subtracting a constant and shifting the coordinate system such that the global minimum of (which exists if (1.29) holds) is attained at . The Lipschitz condition (1.28) has been assumed before, and up to the values of the constants, the drift condition (1.29) is equivalent to (1.8). This condition guarantees that the marginal of the invariant probability measure (1.3) concentrates on balls of radius . Notice that if (1.28) and (1.29) are both satisfied then

(1.30) |

###### Lemma 1.9.

The proof is included in Appendix A. The lemma shows that if Assumption 1.8 holds with fixed constants , then there is a lower bound for the contraction rate in Theorem 1.4 that only depends on the natural parameters , , and . The bound is particularly nice if there is sufficient damping:

###### Corollary 1.10.

The proof of the corollary is given in Section 4. Bounds of the same order as in (1.34) hold if the constant is replaced by any other strictly positive constant. The specific value has been chosen in a somehow ad hoc way in order to obtain relatively small constants in the prefactors.

###### Remark 1.11 (Kinetic behaviour, Hamiltonian Monte Carlo).

Corollary 1.10 shows that by adjusting the friction coefficient appropriately, one can obtain a kinetic lower bound for the contraction rate: If is chosen such that the value of is a given constant, and the parameters and are fixed as well (i.e., is within a fixed range), then the lower bound for in (1.34) is of order . This should be relevant for MCMC methods based on discretizations of Langevin equations [36, 31, 6], because it indicates that by adjusting appropriately, one can improve on the diffusive order for the convergence rate to equilibrium.

Before discussing the parameter dependence of the lower bounds for the contraction rate that have been stated above, we check the quality of the bounds in the linear case and for drifts that are linear outside a ball:

###### Example 1.12 (Linear drift).

Suppose that . Then (1.1) reads

(1.35) |

Applying Corollary 1.10 with and shows that for ,

(1.36) |

Lower bounds of a similar order can be derived from Theorem 1.4 if is bounded from above by a fixed constant. On the other hand, the linear Langevin equation (1.35) can be solved explicitly. The solution is a Gaussian process. By [38, Section 6.3], the spectral gap of the corresponding generator is

(1.37) | |||

(1.38) |

The spectral gap provides an upper bound for the contraction rate . For example, for and , we obtain the lower bound

(1.39) |

for the contraction rate, whereas the upper bound given by the spectral gap is

###### Example 1.13 (Multi-well potentials, linear drift outside a ball).

Assumption 1.8 with is satisfied for the one-dimensional double-well potential

and also for the triple-well potential . Here, for , Corollary 1.10 yields the lower bounds

(1.40) | |||||

for the contraction rate. Again, lower bounds of similar order hold if is bounded from above by a fixed constant. More generally, we obtain corresponding bounds if is a potential satisfying conditions (1.27) and (1.28), and there exist constants and with such that for . The lower bound is of order if is fixed and is bounded from above by a fixed constant.

We stress that in low dimensions, we can obtain numerical values for our lower bounds that are in reach for current computer simulations. This is quite remarkable because we have lost some factors during our estimates. For high dimensions, our bounds deteriorate rapidly.

### 1.6. Parameter dependence of lower bounds for the contraction rate

We now discuss the parameter dependence of the bounds derived above, and we compare our results to previously derived bounds on convergence to equilibrium for kinetic Fokker-Planck equations.

Let us first recall that the computation of the spectrum shows that in the linear case, there are two different regimes, cf. [38]. For (underdamped regime), the spectral gap (1.37) is a linear function of . In this case, the friction coefficient is so small that the rate of convergence to equilibrium is determined by . Conversely, for , the spectral gap is a decreasing function of . In this regime, the rate of convergence to equilibrium is determined by the transfer of noise from the -variable to the -variable. If increases then the noise is damped more strongly before it can be transferred to the -component, and hence the rate of convergence decreases. In particular, the spectral gap as a function of has a sharp maximum for .

Comparing (1.38) and
(1.21),
we see that our general lower bound for the contraction
rate contains similar terms as the bounds in (1.38).
However, these terms are multiplied by constants that again
depend on the parameters and . We now discuss
the parameter dependence of the lower bounds in different regimes:

(overdamped case).
As , the lower bound in (1.34) is
of order . This
differs from the order
of the spectral gap in the linear case by a factor .

fixed (kinetic case).
If the friction coefficient is chosen such that the value of
is a given constant, and the parameters and are fixed as well, then
the lower bound in (1.34) is of order .

(underdamped case).
For large values of , our bounds for the
contraction rate in Lemma 1.9 are considerably worse
than the order of the spectral gap in the linear
case. This is not surprising, because for small it is not
clear, how the contractive term in (1.1)
can make up for the term with a potential
that may be locally very non-convex.

.
For large values of , the lower bound in
(1.34) degenerates exponentially in this parameter. This
is natural because could be a double-well
potential with valleys of depth , see the example above.

.
Our bounds depend exponentially on the dimension. In the general
setup considered here, this is unavoidable. An important open
question is whether a better dimension dependence can be obtained
for a restricted class of models. For overdamped Langevin diffusions,
corresponding results have been obtained for example in
[16, 48, 47].

Let us now set our results in relation to explicit bounds for convergence to equilibrium of kinetic Fokker-Planck equations that have been obtained in [29, 44, 11] by analytic methods. These results are not directly comparable, because they quantify convergence to equilibrium in different distances (e.g. in weighted or Sobolev norms, or in relative entropy). Moreover, the constants have not always been tracked as precisely as here. Nevertheless, it seems plausible to compare the orders of the convergence rates. Already in [29, Theorem 0.1], Hérau and Nier have derived a nice explicit bound on the convergence rate in weighted Sobolev spaces under quite general conditions. However, even in the linear case, this bound is far from sharp in the overdamped regime and at the boundary between the overdamped and the underdamped regime. In particular, it seems not to be able to recover kinetic behavior for properly adjusted friction coefficients. Most of the more recent works are based on extensions of Villani’s hypocoercivity approach [44]. In particular, explicit bounds are given in [44, Section 7] in norms, corresponding bounds in norms can be deduced from [11], and recently some Wasserstein bounds have been derived in [35]. However, we have not been able to recover a similar behaviour as above for the bounds on the convergence rates in these results.

## 2. Coupling and evolution of coupling distance

We fix a positive constant . Eventually, we will consider the limit . In order to couple two solutions of (1.1), we consider the following SDE on :

(2.1) | |||||

Here and are independent Brownian motions,

(2.2) | |||||

(2.3) |

Moreover, are Lipschitz continuous functions such that ,

(2.4) | |||||

(2.5) |

The values of the constants will be fixed later. Note that by (2.4), is a Lipschitz continuous function of . Therefore, existence and uniqueness of the coupling process holds by Itô’s theorem. Moreover, by Lévy’s characterization, for any solution of (2.1),

are again Brownian motions. Thus (2.1) defines indeed a coupling of two solutions of (1.1). For and , the coupling is a reflection coupling, whereas for and it is a synchronous coupling. By (2.4) and (2.5), we choose and such that the synchronous coupling is applied when is large or is close to , and the reflection coupling is applied otherwise. Ideally, we would like to choose . Since the indicator function is not continuous, we consider Lipschitz continuous approximations instead.

The processes and satisfy the following equations:

(2.6) | |||||

(2.7) | |||||

(2.8) |

Note that in particular, is contractive when , and would be a local martingale if would be constant. We set

(2.9) | |||||

(2.10) | |||||

(2.11) |