Error Analysis of Modified Langevin Dynamics
We consider Langevin dynamics associated with a modified kinetic energy vanishing for small momenta. This allows us to freeze slow particles, and hence avoid the re-computation of inter-particle forces, which leads to computational gains. On the other hand, the statistical error may increase since there are a priori more correlations in time. The aim of this work is first to prove the ergodicity of the modified Langevin dynamics (which fails to be hypoelliptic), and next to analyze how the asymptotic variance on ergodic averages depends on the parameters of the modified kinetic energy. Numerical results illustrate the approach, both for low-dimensional systems where we resort to a Galerkin approximation of the generator, and for more realistic systems using Monte Carlo simulations.
Keywords:Langevin dynamics Variance reduction Ergodicity Functional estimates Linear response
A fundamental purpose of molecular simulation is the computation of macroscopic quantities, typically through averages of functions of the variables of the system with respect to a given probability measure , which defines the macroscopic state of the system. In the most common setting, the probability measure with respect to which averages are computed corresponds to the canonical ensemble (see for instance Tuckerman (2010)). Its distribution is defined by the Boltzmann-Gibbs density, which models the configurations of a conservative system in contact with a heat bath at fixed temperature. Numerically, high-dimensional averages with respect to are often approximated as ergodic averages over realizations of appropriate stochastic differential equations (SDEs):
A typical dynamics to this end is the Langevin dynamics
where is a standard Wiener process, the potential energy function, a friction coefficient, the mass matrix of the system, and is proportional to the inverse temperature (see Section 2 for more precise definitions). For references on the ergodicity of Langevin dynamics, we refer the reader to Talay (2002) and Mattingly et al. (2002), for instance.
There are two main sources of error in the computation of average properties such as through time averages as in (1): (i) a systematic bias (or perfect sampling bias) related to the use of a discretization method for the SDEs (and usually proportional to a power of the integration step size ), and (ii) statistical errors, due to the finite lengths of the sampling paths involved and the underlying variance of the random variables. The first point was studied in Leimkuhler et al. (2015) for standard Langevin dynamics. Our focus in this work is on the statistical error.
Statistical errors may be large when the dynamics is metastable, i.e. when the system remains trapped for a very long time in some region of the configuration space (called a metastable region) before hopping to another metastable region. Metastability implies that the convergence of averages over trajectories is very slow, and that transitions between metastable regions (which are typically the events of interest at the macroscopic level) are very rare. In fact, metastability arises from the multi-modality of the probability measure sampled by the dynamics. We refer for instance to Lelièvre (2013) for a review on ways to quantify the metastability of sampling dynamics. There are various strategies to reduce the variance of time averages by reducing the metastability. The most famous one is importance sampling: the potential energy function is modified by an additional term so that the Langevin dynamics associated with is less metastable. An automatic way of doing so is to consider a so-called reaction coordinate, and define as the opposite of the associated free energy (see Lelièvre et al. (2010); Lelièvre and Stoltz (2015) for further precisions).
We explore here an alternative path, which consists in modifying the kinetic energy rather than the potential energy. Indeed, recall that the difficult part in sampling the canonical measure is in sampling positions (see Section 2 for a more precise discussion of this point). There is therefore some freedom in the choice of the kinetic energy if the goal is to compute average properties.
Previous works in this direction focused on changing the mass matrix in order to increase the time steps used in the simulation (see e.g. Bennett (1975); Plechac and Rousset (2010)). The mathematical analysis we provide is inspired by a recent work by Artemova and Redon (2012) where the kinetic energy of each particle is more drastically modified: it is set to 0 when the particle’s momenta are small, while it remains unchanged for larger momenta. In such adaptively restrained (AR) simulations, particles may become temporarily frozen, while their momenta may continue to evolve. Since, in many cases, inter-particle forces only depend on relative particle positions, and hence do not have to be updated when particles are frozen, adaptively restrained particle simulations may yield a significant algorithmic speed-up when a sufficiently large number of particles are frozen at each time step (or, more generally, when inter-particle distances remain constant and particle forces are expressed in local reference frames). This has been demonstrated in several contexts, e.g. for modeling hydrocarbon systems (Bosson et al. (2012)), proteins (Rossi et al. (2007)), and for electronic structure calculations (Bosson et al. (2013)).
Unfortunately, freezing particles even temporarily may make iterates more correlated, which may translate into an increase of the statistical error observed for modified Langevin dynamics, compared to the statistical error observed for standard Langevin dynamics. The actual speed-up of the method, in terms of the total wall-clock time needed to achieve a given precision in the estimation of an observable, should therefore be expressed as:
Our aim here is thus to quantify the increase in the variance as a function of the parameters of the modified kinetic energy. In fact, a first task is to prove that the Langevin dynamics with modified kinetic energy is indeed ergodic, and that the variance is well defined. This is unclear at first sight since the modified dynamics fails to be hypoelliptic (see the discussion in Section 3.1).
This article is organized as follows. In Section 2, we introduce the modified Langevin dynamics we consider, and present the particular case of the AR-Langevin dynamic. The ergodicity of these dynamics is proved in Section 3, both in terms of almost-sure convergence of time averages along a single realization, and in terms of the law of the process. We also provide a result on the regularity of the evolution semi-group, adapted from similar estimates for standard Langevin dynamics in Talay (2002). Such estimates allow us to analyze the statistical error in Section 4. We state in particular a Central Limit Theorem for , and perform a perturbative study of the asymptotic variance of the AR-Langevin dynamics in some limiting regime. Our theoretical findings are illustrated by numerical simulations in Section 5, both in a simple one-dimensional case where the variance can be accurately computed using an appropriate Galerkin approximation, as well as for a more realistic system for which we resort to Monte-Carlo simulations. The proofs of our results are gathered in Section 6.
2 Modified Langevin dynamics
We consider a system of particles in spatial dimension , so that the total dimension of the system is . The vectors of positions and momenta are denoted respectively by and . Periodic boundary conditions are used for positions, so that the phase-space of admissible configurations is with , being the one-dimensional unit torus and the size of the simulation box.
In order to possibly increase the rate of convergence of the ergodic averages (1), we modify the Langevin dynamics (2) by changing the kinetic energy. More precisely, instead of the standard quadratic kinetic energy
we introduce a general kinetic energy function . The total energy of the system is then characterized by the Hamiltonian
In order to ensure that the measure can be normalized, and in order to simplify the mathematical analysis, we make in the sequel the following assumption.
The potential energy function belongs to , and grows sufficiently fast at infinity in order to ensure that .
The Langevin dynamics associated with a general Hamiltonian reads
where is a standard -dimensional Wiener process and is the friction constant. For the separable Hamiltonian (4), the general Langevin dynamics simplifies as
The generator of the process (5) reads
A simple computation shows that the canonical distribution
is invariant under the dynamics (5), i.e. for all functions with compact support,
Note that, in view of the separability of the Hamiltonian, the marginal of the distribution in the position variables is, for any kinetic energy ,
In particular, this marginal distribution therefore coincides with the one of the standard Langevin dynamics (2). This allows to straightforwardly estimate canonical averages of observables depending only on the positions with the modified Langevin dynamics (5). In fact, there is no restriction in generality in considering observables depending only on the positions, since general observables depending both on momenta and positions can be reduced to functions of the positions only by a partial integration in the momenta variables. This partial integration is often very easy to perform since momenta are independent Gaussian random variables under the canonical measure associated with the standard kinetic energy.
2.1 AR-Langevin dynamics
A concrete example for the choice of the kinetic energy function in (4) is the one proposed for the adaptively restrained Langevin dynamics in Artemova and Redon (2012). It is parameterized by two constants . In this model, the kinetic energy is a sum of individual contributions
For large values of momenta, the modified individual kinetic energies are equal to the standard kinetic energy of one particle, but they vanish for small momenta:
An appropriate function allows to smoothly interpolate between these two limiting regimes (see Definition 1 for the precise expression). A possible choice of an individual kinetic energy , as well as the associated canonical distribution of momenta are depicted in Figure 1 and Figure 2 when .
The interest of AR-Langevin dynamics is that, when their individual kinetic energies are sufficiently small, particles do not move. When two particles are frozen in this way, their pairwise interactions need not be updated. This allows decreasing the computational complexity of the force computation, which is typically the most time-consuming part of a molecular dynamics solver. Note that this can be generalized to higher-order interactions (such as three-body interactions based on bending angles for instance).
Note that, due to the additive structure of the kinetic energy, the momenta are independent and identically distributed (i.i.d.) under the canonical measure. It is however possible to choose different parameters and for different particles, for example to focus calculations on a specific part of the particle system, in which case the momenta are still independent but not longer identically distributed. Such a situation is considered in the numerical example presented in Section 5.2.
3 Ergodicity of the modified Langevin dynamics
There are several notions of ergodicity for stochastic processes. We focus here on two of them: the convergence of ergodic averages over a single trajectory, and the convergence of the law of the process.
3.1 Convergence of ergodic averages
The convergence of ergodic averages over one trajectory is automatically ensured by the existence of an invariant probability measure and the irreducibility of the dynamics (see for instance Kliemann (1987); Meyn and Tweedie (1993) for early results on such convergences for possibly degenerate diffusions). Since, by construction, an invariant probability measure is known (namely the canonical measure (7)), it suffices to show that the process generated by the modified Langevin equation is irreducible to conclude to the convergence of ergodic averages.
As reviewed in Rey-Bellet (2006), the most standard argument to prove the irreducibility of degenerate diffusions is to prove the controllability of the dynamics relying on the Stroock-Varadhan support theorem, and the regularity of the transition kernel thanks to some hypoellipticity property. These conditions are satisfied for standard Langevin dynamics (see for instance Mattingly et al. (2002)), but not for the modified Langevin dynamics we consider, since the Hessian of the kinetic energy function may not be invertible on an open set. This is the case for the AR kinetic energy function presented in Section 2.1.
To illustrate this point, let us show for instance how the standard way of proving hypoellipticity fails (the proof of the controllability faces similar issues). The first task is to rewrite the generator (6) of the process as
and is the adjoint of on the flat space . We next compute, for , the commutators
When is invertible, it is possible to recover the full algebra of derivatives by an appropriate combination of and . Here, we consider a situation when this is not the case and, even more dramatically, where the Hessian may vanish on an open set. In this situation, on the same open set, and in fact all iterated commutators also vanish.
We solve this problem by a direct constructive approach, where we see the modified dynamics as a perturbation of the standard Langevin dynamics. We rely on the following assumption:
The kinetic energy function of the modified Langevin dynamics is such that
for some constant .
Under this assumption, we can prove that the modified Langevin dynamics is irreducible by proving an appropriate minorization condition, which crucially relies on the compactness of the position space (see Section 6.2 for the proof).
Lemma 1 (Minorization condition)
Suppose Assumption 3.1 holds. Then for any fixed and , there exists a probability measure on and a constant such that, for every Borel set ,
with when .
The minorization condition implies the irreducibility of the dynamics, so that the following convergence result readily follows.
Theorem 3.2 (Convergence of ergodic averages)
When Assumption 3.1 holds, ergodic averages over trajectories almost surely converge to the canonical average:
3.2 Convergence of the law
There are various functional frameworks to measure the convergence of the law of the process. We consider here weighted estimates on the semi-group . More precisely, we introduce a scale of Lyapunov functions
for . Recall indeed that only momenta need to be controlled since positions remain in a compact space. The associated weighted spaces are
In order to prove the exponential convergence of the law, we rely on the result of Hairer and Mattingly (2011), which states that if a Lyapunov condition and a minorization condition hold true, then the sampled chain converges exponentially fast to its steady state in the following sense.
Theorem 3.3 (Exponential convergence of the law)
Suppose that Assumption 3.1 holds. Then the invariant measure is unique, and for any , there exist constants such that
As mentioned above, the proof of this result directly follows from the arguments of Hairer and Mattingly (2011). The minorization condition is already stated in Lemma 1, while the appropriate Lyapunov condition reads as follows (see Section 6.1 for the proof, which uses the same strategy as Leimkuhler et al. (2015) and Joubaud et al. (2015)).
Lemma 2 (Lyapunov Condition)
Suppose that Assumption 3.1 holds. Then, for any and , there exist and such that
3.3 Regularity results for the evolution semi-group
We provide in this section decay estimates for the spatial derivatives of , following the approach pioneered in Talay (2002) and further refined in Kopec (2013). Such estimates were obtained for the standard Langevin dynamics, but can in fact straightforwardly be extended to modified Langevin dynamics with Hessians bounded from below by a positive constant. Our aim in this section is to provide decay estimates for the spatial derivatives of in the situation when fails to be strictly convex, for instance because vanishes on an open set as is the case for AR particle simulations.
In order to state our results, we first need to define the weighted Sobolev spaces for :
These spaces gather all functions which grow at most like , and whose derivatives of order at most all grow at most like . We also introduce the space of smooth functions , the vector space of functions such that, for any , there exists for which .
We also make the following assumption on the kinetic energy function, which can be understood as a condition of “almost strict convexity” of the Hessian .
The kinetic energy has bounded second-order derivatives:
and there exist a function and constants and such that
By following the same strategy as in (Kopec, 2013, Proposition A.1.) (which refines the results already obtained in Talay (2002)), and appropriately taking care of the lack of strict positivity of the Hessian by assuming that is sufficiently small, we prove the following result in Section 6.3.
The parameter can in fact be made explicit, see (36) below. The decay estimate (13) shows that the derivatives of the evolution operator can be controlled in appropriate weighted Hilbert spaces. Note however that the Lyapunov functions entering in the estimates are not the same a priori on both sides of the inequality (13). Let us emphasize, though, that we can obtain a control in all spaces for sufficiently large (depending on the order of derivation).
4 Analysis of the statistical error
The asymptotic variance characterizes the statistical error. In Section 4.1, we show that the asymptotic variance is well defined for the modified Langevin dynamics. We can in fact prove a stronger result, namely that a Central Limit Theorem (CLT) holds true for ergodic averages over one trajectory. In a second step, we more carefully analyze in Section 4.2 the properties of the variance of the AR-Langevin dynamics by proving a linear response result in the limit of a vanishing lower bound on the kinetic energies. To obtain the latter results, we rely on the estimates provided by Lemma 3.
4.1 A Central Limit theorem for ergodic averages
Let us first write the asymptotic variance in terms of the generator of the dynamics. To simplify the notation, we introduce the orthogonal projection onto the orthogonal of the kernel of the operator (with respect to the scalar product): for any ,
Since , we can define . The ergodicity result (9) allows us to conclude that the operator is invertible on since the following operator equality holds on , the Banach space of bounded operators on :
This leads to the following resolvent bounds (the second part being a direct corollary of Lemma 3).
This already allows us to conclude that the asymptotic variance of the time average defined in (1) is well defined for any observable since
by the dominated convergence theorem. Therefore,
In fact, a Central Limit Theorem can be shown to hold for using standard results (see e.g. Bhattacharya (1982)).
4.2 Perturbative study of the variance for the AR-Langevin dynamics
Our aim in this section is to better understand, from a quantitative viewpoint, the behavior of the asymptotic variance for the AR-Langevin dynamics defined in Section 2.1, at least in some limiting regime where the parameter is small. For intermediate values, we need to rely on numerical simulations (see Section 5).
The regime where both and go to 0 is somewhat singular since the transition from to becomes quite abrupt, which prevents a rigorous theoretical analysis. The regimes where either or go to infinity are also of dubious interest since the dynamics strongly perturbs the standard Langevin dynamics. Therefore, we restrict ourselves to the situation where with fixed.
In order to highlight the dependence of the AR kinetic energy function on the restraining parameters , we denote it by in the remainder of this section. Let us however first give a more precise definition of this function, having in mind that is fixed while eventually goes to 0. We introduce to this end an interpolation function such that
We next define an interpolation function obtained from the function by an appropriate shift of the lower bound and a rescaling. More precisely, with
A plot of is provided in Figure 3.
Definition 1 (AR kinetic energy function)
For two parameters , the AR kinetic energy function is defined as
where the individual kinetic energy functions are
Of course, converges to as . The limiting kinetic energy function corresponds to what we call the Zero--AR-Langevin dynamics (see Figure 4 for an illustration). Let us emphasize that the limiting dynamics is not the standard Langevin dynamics, so that the expansion in powers of of the variance we provide is with respect to the limiting variance of the dynamics corresponding to . To simplify the notation, we denote by the variance associated with the kinetic energy .
There exists such that, for any , there is a constant for which
The proof can be read in Section 6.4. The assumption that is sufficiently small ensures that Assumption 3.4 holds (see Section 6.4.3). The result is formally clear. The difficulty in proving it is that the kinetic energy is not a smooth function of because the shift function is only piecewise smooth.
An inspection of the proof of Proposition 1 shows that the linear response result can be generalized to non-zero values of and in fact to linear responses in the parameter as well. For the latter case, we consider . Denoting now by the variance associated with the kinetic energy , it can be proved that, for not too large, there are such that, for sufficiently small,
5 Numerical results
The aim of this section is to quantify the evolution of the variance of AR-Langevin dynamics as the parameters of the kinetic energy function are modified. We first consider in Section 5.1 a simple system in spatial dimension 1, for which the variance can be very precisely computed using a Galerkin-type approximation. We next consider more realistic particle systems in Section 5.2, relying on molecular dynamics simulations to estimate the variance. In this section, the function is chosen to be of the form , with a fifth-order spline function.
5.1 A simple one-dimensional system
We first consider a single particle in spatial dimension , in the periodic domain and at inverse temperature . In this case, it is possible to directly approximate the asymptotic variance (16) using some Galerkin discretization, as in Risken (1984) or Latorre et al. (2013).
We denote by the generator of the modified Langevin dynamics associated with the AR kinetic energy function defined in (19), by the associated canonical measure, and by the projector onto functions of with average 0 with respect to .
For a given observable , we first approximate the solution of the following Poisson equation:
and then compute the variance as given by (16):
To achieve this, we introduce the basis functions , where (for ) and are the Hermite polynomials:
The choice of is natural in view of the spatial periodicity of the functions under consideration, while Hermite polynomials are eigenfunctions of the generator associated with the Ornstein-Uhlenbeck process on the momenta for the standard quadratic kinetic energy . Note however that, when the kinetic energy is modified as , the Hermite polynomials are no longer orthogonal for the scalar product.
We approximate the Poisson equation (22) on the basis
for given integers , and we look for approximate solutions of the form with
where is a vector of size . Restricting (22) to leads to
where is a matrix of size and a vector of size , whose entries respectively read
The approximated solution of the Poisson equation (22) can therefore be computed by solving (23). Note however that some care is needed at this stage since is not invertible on , because the basis functions are not of integral 0 with respect to . We correct this by performing a singular value decomposition of , removing the component of associated with the singular value 0, and computing the inverse of on the subspace generated by the eigenvectors associated with non-zero eigenvalues. In practice, we compute the entries of and by numerical quadrature. Since the Hermite polynomials are no longer orthogonal for the scalar product, quadratures are required both in position and momentum variables. The variance is finally approximated as
In the simulations presented in this section, the potential is , the observable under study is , and we always set . Figure 5 presents the convergence of the variance with respect to the basis size, for the standard Langevin dynamics and the AR Langevin dynamics with and various values of . The results show that the choice is sufficient in all cases to approximate the asymptotic value. We checked in addition in one case, namely for the standard dynamics, that the values we obtain are very close to a reference value obtained with : the relative variation is of order for , for and for . We therefore set in the remainder of this section.
The variation of the computed variance for is plotted in Figure 6 for various parameters of the AR-Langevin dynamics. Note that, as expected, the variance increases with increasing values of for fixed , but also with increasing values of for fixed . We next illustrate the linear response results of Proposition 1 and Remark 3 in Figures 7 and 8: in both situations, the variance increases linearly with the parameter under consideration is varied in a sufficiently small neighborhood of its initial value. After that initial regime, nonlinear variations appear. Note also that the relative increase of the variance is more pronounced as a function of than .
In practice, the idea usually is to set the lower bound sufficiently large when performing Monte Carlo simulations, in order to decrease as much as possible the computational cost. The gap should however not be too small in order to have a sufficiently smooth transition from a vanishing kinetic energy to a quadratic one. This requires therefore to be quite large if is large. The results presented in Figure 8 suggest that this may not be the optimal choice, unless the algorithmic speed-up is quite large.
5.2 A more realistic system
In order to study the variation of the variance as a function of and in systems of higher dimensions, we resort to Monte Carlo simulations. This requires discretizing the AR-Langevin dynamics (5), and we resort to a scheme of weak order 2, obtained by a splitting strategy where the generator of the modified Langevin dynamics (6) is decomposed into three parts:
The transition kernel obtained by a Strang splitting reads . Contrarily to the standard kinetic energy functions, the elementary evolution associated with cannot be integrated analytically. To preserve the order of the scheme, we approximate by a midpoint rule, encoded by a transition kernel satisfying for smooth test functions . This gives the following discretization scheme:
where are i.i.d. standard -dimensional Gaussian random variables. The first and the last line are obtained by implicit schemes, solved in practice by a fixed point strategy (the termination criterion being that the distance between successive iterates is smaller than , and the initial iterate being obtained by a Euler-Maruyama step). By following the same approach as in Leimkuhler et al. (2015), it can indeed be proved that this scheme is of weak order 2; see Stoltz and Trstanova (In preparation) for further precisions.
The ergodicity of some second-order schemes was proved for the standard Langevin dynamics in Leimkuhler et al. (2015). Since the AR-Langevin dynamics can be seen as a perturbation of the standard Langevin dynamics, it can be proved by combining the proofs from Leimkuhler et al. (2015) and the proof of Theorem 3.3 that, when are sufficiently small, the corresponding discretization of the AR-Langevin dynamics remains ergodic (see Stoltz and Trstanova (In preparation)). The corresponding invariant measure is denoted by . It also follows by the results of Leimkuhler et al. (2015) that the error on averages of smooth observables with respect to is of order 2, i.e. there exists such that
As already mentioned in Remark 4, the reduction of the gap between the parameters and reduces the smoothness of the transition between the restrained dynamics and the full dynamics. This raises issues in the stability of the scheme, which can be partly cured by resorting to a Metropolis-Hastings correction (Metropolis et al. (1953); Hastings (1970) and Stoltz and Trstanova (In preparation)).
The system we consider is composed of particles in dimension 2, so that and . The masses are set to 1 for all particles. Among these particles, two particles (numbered 1 and 2 in the following) are designated to form a dimer while the others are solvent particles. All particles, except the two particles forming the dimer, interact through the purely repulsive WCA pair potential, which is a truncated Lennard-Jones potential Straub et al. (1988):
where denotes the distance between two particles, and are two positive parameters and . The interaction potential between the two particles of the dimer is a double-well potential
where and are two positive parameters. The potential has two energy minima. The first one, at , corresponds to the compact state. The second one, at , corresponds to the stretched state. The total energy of the system is therefore, for with ,
where the solvent-solvent and dimer-solvent potential energies respectively read
We choose , , , , , and set the particle density to 0.56 in the numerical results presented in this section, sufficiently high to ensure that the solvent markedly modifies the distribution of configurations of the dimer compared to the gas phase.
The source of metastability in the system is the double-well potential on the dimer. In such a system, it makes sense to restrain only solvent particles (since they account for most of the computational cost), and keep the standard kinetic energy for the particles forming the dimer (since the observable depends on their positions). As noted in Remark 1, the method allows us to choose different individual kinetic energies for different particles. Since the solvent interacts with the dimer, we study how the variance of time averages of observables related to the configuration of the dimer, such as the dimer potential energy , depend on the restraining parameters chosen for the solvent particles. We also estimate the variance of time averages based on observables depending only on the solvent degrees of freedom, such as the solvent-solvent potential energy .
The asymptotic variance of time averages for a given observable is estimated by approximating the integrated auto-correlation function
where the expectation is with respect to initial conditions and all realizations of the AR Langevin dynamics. This is done by first truncating the upper bound in the integral by a sufficiently large time , and using a trapezoidal rule: