Efficiency of a micro-macro acceleration method for scale-separated stochastic differential equations
Abstract
We discuss through multiple numerical examples the accuracy and efficiency of a micro-macro acceleration method for stiff stochastic differential equations (SDEs) with a timescale separation between the fast microscopic dynamics and the evolution of some slow macroscopic state variables. The algorithm interleaves a short simulation of the stiff SDE with extrapolation of the macroscopic state variables over a longer time interval. After extrapolation, we obtain the reconstructed microscopic state via a matching procedure: we compute the probability distribution that is consistent with the extrapolated state variables, while minimally altering the microscopic distribution that was available just before the extrapolation. In this work, we numerically study the accuracy and efficiency of micro-macro acceleration as a function of the extrapolation time step and of the chosen macroscopic state variables. Additionally, we compare the effect of different hierarchies of macroscopic state variables. We illustrate that the method can take significantly larger time steps than the inner microscopic integrator, while simultaneously being more accurate than approximate macroscopic models.
Keywords and phrases: Monte Carlo methods, micro-macro acceleration, stochastic differential equations, stiff differential equations
1 Introduction
Many applications in science and engineering are modeled with stochastic differential equations (SDEs)
(1) \[ \mathrm{d}X(t) = a(X(t), t)\,\mathrm{d}t + b(X(t), t)\,\mathrm{d}W(t), \]
where $X = (X(t))_{t \ge 0}$ is a diffusion process on a domain $\mathcal{D} \subseteq \mathbb{R}^d$. The drift term $a(X(t), t)$ has the same dimension $d$ as $X(t)$, the diffusion term $b(X(t), t)$ lies in $\mathbb{R}^{d \times m}$, and $W(t)$ represents an $m$-dimensional Brownian motion. In practice, however, we are often only concerned with the evolution of a few macroscopic state variables $E_l(t)$, which are defined as the expectations of some functions of interest $R_l$ over the distribution of $X(t)$, i.e.,
(2) \[ E_l(t) = \mathbb{E}\left[ R_l(X(t)) \right], \qquad 1 \le l \le L. \]
In many situations, it is difficult or impossible to derive a closed model for the dynamics of the macroscopic states $E_l$. Instead, we need to approximate the expectation in (2) at different points in time via Monte Carlo methods and simulate the SDE (1) for each of the generated samples.
In this work, we are concerned with SDE systems that are stiff, in the sense that individual microscopic paths vary over fast time scales, while the evolution of the macroscopic state variables takes place over much longer time scales. Due to the discrepancy in time scales, explicit time discretization methods, such as the Euler-Maruyama scheme, are forced to take small time steps, as they have a restricted stability domain. Hence, explicit methods are inadequate to simulate the macroscopic state variables up to a large end time $T$.
Over the years, many multiscale methods have been proposed to overcome the problem of stiffness. We mention implicit methods [tian2001implicit, amiri2015class], S-ROCK [abdulle2008s], the equation-free framework [kevrekidis2009equation, kevrekidis2003equation] and the heterogeneous multiscale method (HMM) [weinan2003multiscale, abdulle2012heterogeneous]. Implicit methods have proven very successful for stiff ODEs: due to their much larger stability domains, they allow for much larger time steps than their explicit counterparts [byrne1987stiff]. The cost per time step is usually higher, since a (non)linear system needs to be solved at every step, but implicit schemes are nevertheless a major improvement over explicit time integrators for stiff ODEs. However, a problem arises when applying implicit techniques to SDEs [burrage2004numerical]. The authors of [li2008effectiveness] showed that these methods are unable to capture the probability distribution of the fast modes when taking large time steps. The idea of S-ROCK is to extend the stability domain maximally along the negative real axis by bounding the stability domain with a Chebyshev polynomial [abdulle2008s, abdulle2007stabilized, komori2012weak]. One can then construct a Runge-Kutta scheme that has this stability domain. A drawback is that S-ROCK schemes do not attain a high convergence order [komori2013strong].
Both the equation-free framework [kevrekidis2009equation] and the heterogeneous multiscale method [abdulle2012heterogeneous] can yield considerable simulation speedups. However, the convergence analysis is usually restricted to showing convergence in the limit where the timescale separation becomes infinite. In that limit, the microscopic dynamics itself converges to a limiting macroscopic model, and both the equation-free method and HMM recover that limiting macroscopic model when the timescale separation becomes infinite [abdulle2012heterogeneous]. An alternative approach is to average out the fast dynamics, either analytically or numerically, to obtain an evolution law for the slow dynamics only [legoll2010effective, pavliotis2008multiscale]. All of these methods introduce a modeling error when the timescale separation is finite. However, for large timescale separations, the limiting macroscopic model becomes accurate and this modeling error becomes small compared to the time discretization error.
Recently, a new micro-macro acceleration algorithm [debrabant2017micro] was introduced as an alternative to the above techniques. Micro-macro acceleration exploits the stiffness of the SDE by introducing a second time step, besides the microscopic time step $\delta t$ that is used to simulate the stiff SDE (1). This second, much larger time step $\Delta t$ is introduced to extrapolate the evolution of the macroscopic state variables $E_l$. One time step of micro-macro acceleration consists of four steps: (i) Monte Carlo simulation of the SDE (1) with the small time step $\delta t$; (ii) restriction to approximate the macroscopic state variables (2) at every microscopic time step; (iii) extrapolation of the macroscopic state variables over a time interval of size $\Delta t$; and (iv) matching to generate a new microscopic probability distribution, consistent with the extrapolated macroscopic states and with minimal deviation from a prior distribution (which we choose to be the final microscopic distribution obtained during the simulation stage). There are many ways of expressing the deviation of one probability distribution from another. One possibility is minimizing the $L^2$ norm between the matched and prior distributions, but this choice does not guarantee positivity of the matched distribution [debrabant2017micro]. In this text, we consider matching in Kullback-Leibler divergence, which is based on notions from information theory [lelievre2018analysis]. We provide a more detailed mathematical description of the algorithm in Section 2.
It has been shown that micro-macro acceleration can converge to the exact dynamics of the SDE, even at a fixed, finite timescale separation, both for matching in the $L^2$ norm [debrabant2017micro] and in Kullback-Leibler divergence [lelievre2018analysis]. The conditions under which convergence has been demonstrated are the following: (i) the number of state variables $L$ must increase to infinity; and (ii) both time steps (the microscopic time step $\delta t$ and the extrapolation time step $\Delta t$) must tend to zero [lelievre2018analysis, debrabant2017micro]. Additionally, it was shown that the method is stable for extrapolation step sizes independent of the fast time scales in the system [debrabant2018study].
The convergence analysis in [debrabant2017micro, lelievre2018analysis] reveals that the errors in the micro-macro acceleration method that are caused by the combination of extrapolation and matching can be viewed as a time discretization error, with a size that depends on the extrapolation time step. In contrast, equation-free [kevrekidis2009equation, kevrekidis2003equation] and heterogeneous multiscale methods [weinan2003multiscale, abdulle2012heterogeneous] introduce a modeling error, because they compute the time evolution of an approximate macroscopic model in which a numerical closure relation is introduced. Since micro-macro acceleration is stable for time steps that do not depend on the fast time scales, the extrapolation time step can be chosen based on accuracy considerations only. Hence, we need to study accuracy as a function of the extrapolation time step to assess the efficiency of the method. This study is exactly the aim of the current work.
The main parameters that determine the accuracy of micro-macro acceleration are the following:
- The number of macroscopic state variables $L$. We may expect that, for a larger value of $L$, the matched distribution lies closer to the exact microscopic distribution, since the method converges as $L$ tends to infinity. However, matching is computationally more expensive when $L$ increases.
- The choice of the functions of interest $R_l$ in equation (2) and their potential to capture the essential features of the underlying microscopic distribution. Two different hierarchies of macroscopic state variables may reach a different accuracy using the same number of state variables, due to the different nature of both hierarchies.
- The extrapolation step size $\Delta t$. When $\Delta t$ decreases, we may expect that micro-macro acceleration is more accurate, but the computational cost increases.
Based on the arguments above, we choose to study the accuracy that can be reached for a given number of state variables $L$ and extrapolation step size $\Delta t$. We investigate the choice of $L$ and $\Delta t$ on a set of representative toy examples. In particular, we investigate for which parameters $L$ and $\Delta t$ micro-macro acceleration can be more accurate than the approximate macroscopic model (or numerical closure), while simultaneously being faster than a full microscopic simulation.
The text is organized as follows. In Section 2, we describe the micro-macro acceleration algorithm in mathematical detail. We discuss some implementation issues in Section 3. Section 4 is devoted to the choice of the hierarchy and the number of macroscopic state variables to extrapolate. We focus on two problems from molecular dynamics: FENE dumbbells and a simple three-atom molecule. In Section 5, we study the accuracy of micro-macro acceleration as a function of the extrapolation step $\Delta t$, when fixing the number of macroscopic state variables. More specifically, we discuss two stochastic models with a timescale separation for which an approximate macroscopic model can be derived for the slow component of the system, in the limit of infinite timescale separation. The first example is an overdamped Langevin equation with a double-well potential; the second is a linear system with an external time-periodic force. We investigate for which extrapolation step size micro-macro acceleration is more accurate than the corresponding approximate macroscopic models. Finally, Section 6 presents a concluding discussion.
2 A micro-macro acceleration algorithm
In this Section, we introduce the micro-macro acceleration algorithm and delineate each of its four steps. In Section 2.1, we define the microscopic timestepper that integrates the stochastic process (1) over small time steps of size $\delta t$ using a weighted Monte Carlo ensemble. In Section 2.2, we present the restriction operator that retrieves the macroscopic state variables from the microscopic Monte Carlo particle ensemble. Then, in Section 2.3, we introduce the extrapolation of the macroscopic state variables over a larger time step $\Delta t$. Finally, Section 2.4 contains the description of the matching operator with which we return from the macroscopic to the microscopic level of description. Matching is performed by reweighting the Monte Carlo particle ensemble. We collect the complete algorithm in Section 2.5.
2.1 Monte Carlo simulation
Suppose that at time $t_n$ we have a weighted particle ensemble $X(t_n) = \{(X_n^j, w_n^j)\}_{j=1}^{J}$, where the state variable $X_n^j$ denotes the $j$-th realization of the SDE (1) and $w_n^j$ is its associated weight. The particle ensemble determines an empirical probability distribution
\[ \hat{\phi}_n = \sum_{j=1}^{J} w_n^j\, \delta_{X_n^j} \]
that approximates the exact continuous probability distribution of the diffusion process (1) at time $t_n$. In the first stage of micro-macro acceleration, we perform $K$ microscopic time steps of size $\delta t$ with a Monte Carlo time integrator. The propagation of the particle ensemble through the SDE (1) at time $t_{n,k} = t_n + k\,\delta t$ is given by a (possibly time-dependent) transition operator
(3) \[ X(t_{n,k+1}) = S_{\delta t}\left( X(t_{n,k}), t_{n,k} \right), \qquad k = 0, \dots, K-1, \]
where every particle ensemble $X(t_{n,k})$ determines an empirical probability distribution $\hat{\phi}_{n,k}$ that approximates the exact probability distribution at time $t_{n,k}$. For completeness, we define $t_{n,0} = t_n$. For instance, the Euler-Maruyama scheme propagates each particle as
\[ X_{n,k+1}^j = X_{n,k}^j + a\left(X_{n,k}^j, t_{n,k}\right) \delta t + b\left(X_{n,k}^j, t_{n,k}\right) \sqrt{\delta t}\, \xi_{n,k}^j, \]
with $\xi_{n,k}^j$ an $m$-dimensional standard normally distributed random variable. For the remainder of the manuscript, we will use the Euler-Maruyama method as the inner time integrator.
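As an illustration, one inner Euler-Maruyama step for the whole particle ensemble can be sketched as follows (a minimal sketch in which the names `drift` and `diffusion` stand for user-supplied drift and diffusion functions, and the noise is assumed diagonal):

```python
import numpy as np

def euler_maruyama_step(X, t, drift, diffusion, dt, rng):
    """One inner micro step of size dt for all J particles at once.

    X         : (J, d) array of particle positions (weights are untouched)
    drift     : callable (X, t) -> (J, d), the drift term a
    diffusion : callable (X, t) -> (J, d), the diffusion term b
                (diagonal noise for simplicity of this sketch)
    """
    xi = rng.standard_normal(X.shape)  # i.i.d. standard normal increments
    return X + drift(X, t) * dt + diffusion(X, t) * np.sqrt(dt) * xi
```

With zero diffusion, the step reduces to a forward Euler step on the drift alone, which makes the routine easy to sanity-check.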
2.2 Restriction
To transit from the microscopic to the macroscopic description of the diffusion process, we introduce the restriction operator $\mathcal{R}$, which acts on a probability distribution $\phi$ as follows:
(4) \[ \mathcal{R}(\phi) = \left( \int_{\mathcal{D}} R_l(x)\, \phi(\mathrm{d}x) \right)_{l=0}^{L}, \]
where $\mathbf{R} = (R_0, R_1, \dots, R_L)$ is the vector of state functions that we introduced in equation (2). To ensure that the matched density in Section 2.4 has unit mass, we add an additional macroscopic state function $R_0 \equiv 1$ to the vector of state functions from (2), without changing notation. In the context of Monte Carlo simulations, we approximate the continuous probability distribution $\phi$ by a discrete particle ensemble with an associated empirical probability distribution $\hat{\phi}$, and define a discrete version $\hat{\mathcal{R}}$ of the restriction operator as
(5) \[ \hat{\mathcal{R}}\left( \{(X^j, w^j)\}_{j=1}^{J} \right) = \left( \sum_{j=1}^{J} w^j R_l(X^j) \right)_{l=0}^{L}. \]
The macroscopic state variables at $t_{n,k}$ are then given by $\mathbf{m}_{n,k} = \hat{\mathcal{R}}(X(t_{n,k}))$, $k = 0, \dots, K$.
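On a weighted particle ensemble, the discrete restriction (5) is simply a weighted average; a minimal sketch (the container `R_funcs` of vectorised state functions is our own naming):

```python
import numpy as np

def restrict(X, w, R_funcs):
    """Discrete restriction (5): m_l = sum_j w_j * R_l(X_j).

    X       : (J,) or (J, d) particle positions
    w       : (J,) weights summing to one
    R_funcs : list of vectorised state functions R_l
    """
    return np.array([np.sum(w * R(X)) for R in R_funcs])
```

For uniform weights, this reduces to the usual Monte Carlo estimate of the moments.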
2.3 Extrapolation
To extrapolate the macroscopic state variables over a time interval of size $\Delta t$, we approximate the time derivative using a finite difference approximation. Throughout this text, all experiments use linear extrapolation, for which the macroscopic state variables at time $t_{n+1} = t_n + \Delta t$ read
(6) \[ \mathbf{m}_{n+1} = \mathbf{m}_{n,K} + \frac{\Delta t - K\,\delta t}{K\,\delta t} \left( \mathbf{m}_{n,K} - \mathbf{m}_{n,0} \right). \]
Linear extrapolation of the macroscopic state variables mimics a forward Euler step of the unavailable macroscopic model for the macroscopic state variables. Extensions to higher order extrapolation methods are also possible and we refer to [lafitte2017high] for a treatment of higher order projective integration techniques for hyperbolic conservation laws.
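The linear extrapolation step can be sketched as follows (the indexing convention, with the slope estimated from the first and last restricted states of the simulation stage, is an assumption of this sketch):

```python
def extrapolate_linear(m_first, m_last, k_dt, Dt):
    """Linearly extrapolate the macroscopic states over the remaining
    time Dt - k_dt, where k_dt = K * dt is the span of the inner
    simulation and (m_last - m_first) / k_dt estimates the slope."""
    return m_last + (Dt - k_dt) * (m_last - m_first) / k_dt
```

Setting `Dt == k_dt` returns `m_last` unchanged, i.e., no extrapolation at all.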
2.4 Matching
To recover the microscopic level of description from the macroscopic level, we need to find a probability distribution that is consistent with the extrapolated states $\mathbf{m}_{n+1}$. A priori, however, many possible distributions can be consistent with $\mathbf{m}_{n+1}$. To resolve this ill-posedness, we introduce a matching procedure that finds a unique probability distribution that is consistent with the states, while minimizing a dissimilarity functional with respect to some prior distribution. We choose the prior distribution to be the final distribution $\hat{\phi}_{n,K}$ obtained at time $t_{n,K}$ during the simulation stage in Section 2.1. We first introduce the matching operator acting on continuous probability distributions in Section 2.4.1. In Section 2.4.2, we explain how matching can be implemented on probability distributions represented by weighted particle ensembles.
2.4.1 The continuous matching operator
In our approach, matching is formulated as an optimization problem that reads
\[ \mathcal{M}(\mathbf{m}, \pi) = \underset{\phi}{\operatorname{argmin}}\; \mathcal{E}(\phi, \pi) \quad \text{subject to} \quad \mathcal{R}(\phi) = \mathbf{m}, \]
with the matched distribution $\mathcal{M}(\mathbf{m}_{n+1}, \pi)$ that serves as the initial distribution for the next simulation stage in Section 2.1. Note that the functional $\mathcal{E}$ does not need to be a distance metric. There are many possible matching strategies that come with a different choice for $\mathcal{E}$; see [debrabant2017micro] for more details. In this work, we will use matching in Kullback-Leibler divergence, or relative entropy, which is based on information-theoretic considerations [lelievre2018analysis]. The objective is to minimize
(7) \[ \mathcal{E}(\phi, \pi) = D_{\mathrm{KL}}(\phi\,\|\,\pi) = \int_{\mathcal{D}} \ln\!\left( \frac{\mathrm{d}\phi}{\mathrm{d}\pi} \right) \mathrm{d}\phi, \]
which has an analytic solution of the form
(8) \[ \phi^{*} = \exp\!\left( \sum_{l=0}^{L} \lambda_l R_l \right) \pi, \]
where the Lagrange multipliers $\lambda = (\lambda_0, \dots, \lambda_L)$ solve the nonlinear system
(9) \[ \int_{\mathcal{D}} R_l \exp\!\left( \sum_{k=0}^{L} \lambda_k R_k \right) \mathrm{d}\pi = m_l, \qquad l = 0, \dots, L. \]
In this text, we will employ the Newton-Raphson method to solve the system above and find the Lagrange multipliers [debrabant2017micro]. We give more details in Section 3. Minimizing the Kullback-Leibler divergence (7) amounts to solving a dual system.
2.4.2 Matching with weighted particle ensembles
In the context of Monte Carlo simulations, we replace the prior distribution $\pi$ by the particle ensemble $X(t_{n,K})$ and the matched distribution by a new particle ensemble $X(t_{n+1})$ at time $t_{n+1}$. We rewrite the matching formula on the ensemble level as
(10) \[ X(t_{n+1}) = \mathcal{M}\left( \mathbf{m}_{n+1}, X(t_{n,K}) \right), \]
where $\mathcal{M}$ is the matching operator that minimizes the Kullback-Leibler divergence (7).
By the multiplicative nature of matching in Kullback-Leibler divergence (8), sampling from the matched distribution can be performed efficiently by reweighting the particle ensemble $X(t_{n,K})$ to represent the matched ensemble $X(t_{n+1})$. The new weights read
(11) \[ w_{n+1}^{j} = \exp\!\left( \sum_{l=0}^{L} \lambda_l R_l\left(X_{n,K}^{j}\right) \right) w_{n,K}^{j}. \]
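Given the Lagrange multipliers, the reweighting (11) is a single exponential tilt of the prior weights; a minimal sketch (working in log-space is our own numerical precaution, not part of the formula):

```python
import numpy as np

def reweight(X, w, R_funcs, lam):
    """Matched weights proportional to w_j * exp(sum_l lam_l * R_l(X_j)).

    The multiplier of the constant function R_0 = 1 only affects the
    normalisation, so we simply renormalise at the end.
    """
    log_w = np.log(w) + sum(l * R(X) for l, R in zip(lam, R_funcs))
    w_new = np.exp(log_w - log_w.max())  # shift for numerical stability
    return w_new / w_new.sum()
```

With all multipliers equal to zero, the prior weights are returned unchanged, consistent with the case of no extrapolation.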
2.5 The complete micro-macro acceleration algorithm
Algorithm 1 presents the micro-macro acceleration algorithm in four steps, as defined in Sections 2.1 to 2.4.
Assume we have a microscopic ensemble $X(t_n)$ at time $t_n$, a microscopic step size $\delta t$, the extrapolation step size $\Delta t$ and a number of microscopic steps $K$ such that $K\,\delta t \le \Delta t$. Let $T$ be the end time of the simulation and $L$ the number of macroscopic state variables. The algorithm produces the microscopic ensemble $X(t_{n+1})$ in four steps:
(i) Monte Carlo simulation: simulate the microscopic ensemble over $K$ small inner steps of size $\delta t$;
(ii) Restriction: compute the macroscopic states $\mathbf{m}_{n,k}$ corresponding to these microscopic ensembles;
(iii) Extrapolation: approximate the macroscopic state variables $\mathbf{m}_{n+1}$ over a larger time interval of size $\Delta t$ with linear extrapolation;
(iv) Matching: compute the new particle ensemble $X(t_{n+1})$, consistent with $\mathbf{m}_{n+1}$, by reweighting the prior ensemble $X(t_{n,K})$ using the matching operator;
and advance time by $\Delta t$ until the end time $T$ is reached.
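The four steps above can be combined into one routine. The sketch below applies a single micro-macro step to a one-dimensional SDE with additive noise; the test problem (an Ornstein-Uhlenbeck process with the mean and second moment as state variables), the hard-coded tolerance and the iteration cap are all illustrative assumptions, not the paper's setup:

```python
import numpy as np

def micro_macro_step(X, w, drift, sigma, R_funcs, dt, K, Dt, rng):
    """One micro-macro acceleration step (sketch, 1D, additive noise):
    (i) K Euler-Maruyama micro steps, (ii) restriction, (iii) linear
    extrapolation over Dt, (iv) Kullback-Leibler matching by reweighting."""
    R = lambda X: np.stack([f(X) for f in R_funcs])       # (L, J) state functions
    m0 = R(X) @ w                                         # (ii) restriction at t_n
    for _ in range(K):                                    # (i) inner simulation
        X = X + drift(X) * dt + sigma * np.sqrt(dt) * rng.standard_normal(X.shape)
    RX = R(X)
    m1 = RX @ w                                           # restriction at t_n + K*dt
    m_star = m1 + (Dt - K * dt) * (m1 - m0) / (K * dt)    # (iii) extrapolation
    lam = np.zeros(len(R_funcs))                          # (iv) matching: Newton-Raphson
    for _ in range(20):
        wt = w * np.exp(lam @ RX)
        wt /= wt.sum()                                    # normalisation (multiplier of R_0)
        F = RX @ wt - m_star                              # residual of the dual system
        if np.max(np.abs(F)) < 1e-8:
            break
        Jac = (RX * wt) @ RX.T - np.outer(RX @ wt, RX @ wt)  # covariance Jacobian
        lam -= np.linalg.solve(Jac, F)
    return X, wt

# Illustration: Ornstein-Uhlenbeck process with states (E[x], E[x^2]).
rng = np.random.default_rng(1)
X = 2.0 + 0.3 * rng.standard_normal(5000)
w = np.full(5000, 1.0 / 5000)
X, w = micro_macro_step(X, w, lambda x: -x, 0.5,
                        [lambda x: x, lambda x: x**2],
                        dt=0.01, K=1, Dt=0.05, rng=rng)
```

After the step, the weighted mean of the ensemble agrees with the extrapolated first state variable, which has decayed towards zero faster than a single micro step would allow.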
3 Practical implementation details
To implement the micro-macro acceleration algorithm in practice, several issues need to be addressed. In this Section, we focus on three aspects. First, in Section 3.1, we formulate an efficient implementation of the Newton-Raphson scheme to solve the nonlinear system (9) using the weighted particle ensemble available at the end of the simulation stage.
Second, due to the particle reweighting (11), some particles can end up with very large or very small weights. Such a variation in the weights can render the approximation of the macroscopic state variables in (5) inaccurate, as particles with small weights are essentially ignored. In Section 3.2, we go deeper into the problem of detecting a large variation in the weights. To alleviate this variation, we use a resampling strategy that randomly duplicates some particles and removes others in such a way that, after resampling, all resulting particles have equal weights [hol2006resampling].
Finally, in practice it can happen that the extrapolated macroscopic states $\mathbf{m}_{n+1}$ fall outside the domain of the matching operator $\mathcal{M}$, i.e., there is no probability distribution that is consistent with $\mathbf{m}_{n+1}$. We call such a phenomenon a matching failure. An efficient way of dealing with matching failures is decreasing the extrapolation step size $\Delta t$ such that the extrapolated states deviate less from the states already available in the simulation stage. The detection of matching failures and the adaptive time-stepping strategy are discussed in Section 3.3.
3.1 An efficient formulation of the Newton-Raphson scheme
To compute the matched distribution, we need to find the Lagrange multipliers $\lambda$ that solve the dual system of nonlinear equations (9), where we define
(12) \[ F_l(\lambda) = \int_{\mathcal{D}} R_l \exp\!\left( \sum_{k=0}^{L} \lambda_k R_k \right) \mathrm{d}\pi - m_l, \qquad l = 0, \dots, L. \]
Suppose we have the Lagrange multipliers $\lambda^{(i)}$ at the $i$-th iteration of the Newton-Raphson scheme. One Newton-Raphson iteration for the next set of Lagrange multipliers for the dual problem (9) reads
(13) \[ \lambda^{(i+1)} = \lambda^{(i)} - \left( \nabla F\left(\lambda^{(i)}\right) \right)^{-1} F\left(\lambda^{(i)}\right), \]
where the gradient of $F$ is given by
\[ \left( \nabla F(\lambda) \right)_{l,k} = \int_{\mathcal{D}} R_l\, R_k \exp\!\left( \sum_{k'=0}^{L} \lambda_{k'} R_{k'} \right) \mathrm{d}\pi. \]
In our setting, the prior probability distribution $\pi$ is only available as an empirical measure $\hat{\phi}_{n,K}$. We can approximate the function $F$ and its Jacobian using the prior particle ensemble as
(14) \[ \hat{F}_l(\lambda) = \sum_{j=1}^{J} w^j R_l\!\left(X^j\right) \exp\!\left( \sum_{k=0}^{L} \lambda_k R_k\!\left(X^j\right) \right) - m_l, \qquad \left( \widehat{\nabla F}(\lambda) \right)_{l,k} = \sum_{j=1}^{J} w^j R_l\!\left(X^j\right) R_k\!\left(X^j\right) \exp\!\left( \sum_{k'=0}^{L} \lambda_{k'} R_{k'}\!\left(X^j\right) \right). \]
Note that, since the prior particle ensemble is fixed, the solution to the system is also deterministic.
The cost of one Newton-Raphson iteration increases linearly with the number of particles $J$, making matching the most expensive part of Algorithm 1. There exist other numerical techniques, such as a sparse approximation of the Jacobian [lucia1983explicit] or the BFGS algorithm [martinez2000practical], that could speed up matching. These methods usually require more iterations to converge to the correct multipliers, but each iteration requires fewer computations. We use the Newton-Raphson procedure in all the numerical experiments in this manuscript.
As initial value for (13), we take $\lambda^{(0)} = 0$. When there is no extrapolation, i.e., $\Delta t = K\,\delta t$, the extrapolated macroscopic state variables are consistent with the prior distribution. In this case, the optimal Lagrange multipliers equal 0, so that equations (13) and (9) are already solved by the initial guess. As stopping criterion, we require that the extrapolated moments are met within a tolerance $\varepsilon$,
\[ \max_{0 \le l \le L} \left| \hat{F}_l\!\left(\lambda^{(i)}\right) \right| < \varepsilon, \]
where the tolerance $\varepsilon$ is chosen small. In practice, we also impose a maximum number of Newton-Raphson iterations due to the possibility of matching failures, which we explain further in Section 3.3.
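A minimal sketch of the resulting discrete Newton-Raphson iteration (13)-(14) on a weighted ensemble, with the zero initial guess and a capped iteration count as described above (function and variable names are our own):

```python
import numpy as np

def solve_multipliers(Rmat, w, m_target, tol=1e-7, max_iter=6):
    """Newton-Raphson for the Lagrange multipliers of the dual problem.

    Rmat     : (L, J) array, the state functions R_l evaluated on the
               prior particles
    w        : (J,) prior weights
    m_target : (L,) extrapolated macroscopic states
    Returns the multipliers, the matched weights and a convergence flag.
    The multiplier of R_0 = 1 is absorbed into the normalisation.
    """
    lam = np.zeros(Rmat.shape[0])          # zero initial guess
    for _ in range(max_iter):
        wt = w * np.exp(lam @ Rmat)
        wt /= wt.sum()
        F = Rmat @ wt - m_target           # discrete residual, cf. (14)
        if np.max(np.abs(F)) < tol:
            return lam, wt, True
        Jac = (Rmat * wt) @ Rmat.T - np.outer(Rmat @ wt, Rmat @ wt)
        lam -= np.linalg.solve(Jac, F)     # Newton update, cf. (13)
    return lam, wt, False                  # possible matching failure
```

The returned flag is exactly the information needed by the adaptive time-stepping strategy of Section 3.3.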
3.2 Particle resampling
Besides the problem of matching failures, there is also the issue of increasing variance between the particle weights. By the multiplicative form of the matching procedure (8) and the particle reweighting (11), some Monte Carlo replicas may end up with a high weight after a few steps, while others may have a low weight. Particles with a small weight are neglected in the discrete approximation of the macroscopic state variables (5), while a few particles with large weights dominate the approximation. Hence, the discrete approximation to the macroscopic state variables becomes prone to statistical errors. This is in fact a well-known issue in many particle filter methods [arulampalam2002tutorial].
To resolve the large variation in particle weights, we resample the particles from time to time. The technique we use in this paper is stratified resampling [hol2006resampling, debrabant2017micro]. The idea is to randomly discard and duplicate each of the particles in such a way that the distribution of the particles is unchanged, and all particles have an equal weight after resampling. In practice, one divides the unit interval into $J$ equal pieces and generates a uniformly distributed random number in each subinterval,
\[ u_j \sim \mathcal{U}\!\left[ \frac{j-1}{J}, \frac{j}{J} \right), \qquad j = 1, \dots, J. \]
One then duplicates each particle $X^j$ exactly $o_j$ times, where $o_j$ is chosen by counting how many of the numbers $u_i$ fall inside the cumulative-weight bin of particle $j$ [douc2005comparison]. Mathematically, we thus take
\[ o_j = \#\left\{ i : \sum_{k=1}^{j-1} w^k \le u_i < \sum_{k=1}^{j} w^k \right\}. \]
If $o_j = 0$, we discard particle $X^j$; if $o_j > 1$, we duplicate the particle into $o_j$ copies. When $o_j = 1$, the particle remains and we simply adjust its weight to $1/J$. The above procedure ensures that the particle distribution remains unchanged, in expectation [douc2005comparison].
To determine whether the variation between the weights is too large, we propose to compute the discrete relative entropy of the weights compared to the uniform weights $1/J$,
\[ E_w = \sum_{j=1}^{J} w^j \ln\!\left( J\, w^j \right), \]
which is zero when all weights are equal. We propose, as a heuristic, to resample when this relative entropy exceeds a fixed threshold. We choose this threshold to ensure that the variation in the weights does not become too high, preventing inaccurate computations of the macroscopic states. In practice, we compute the relative entropy of the weights every five time steps of the micro-macro acceleration algorithm, and we resample the weights if the entropy exceeds the chosen maximal value.
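The entropy criterion and the stratified resampling scheme can be sketched as follows (the resampling threshold itself is problem-dependent and not shown):

```python
import numpy as np

def weight_entropy(w):
    """Discrete relative entropy of the weights w.r.t. uniform weights
    1/J; zero exactly when all weights are equal."""
    return float(np.sum(w * np.log(len(w) * w)))

def stratified_resample(X, w, rng):
    """Draw one uniform number per sub-interval [(j-1)/J, j/J) and copy
    particle j once for every draw that lands in its cumulative-weight
    bin; all particles receive weight 1/J afterwards."""
    J = len(w)
    u = (np.arange(J) + rng.random(J)) / J     # one draw per stratum
    idx = np.searchsorted(np.cumsum(w), u)     # bin index of each draw
    return X[idx], np.full(J, 1.0 / J)
```

A degenerate ensemble, with all weight on one particle, is resampled into J copies of that particle with uniform weights.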
3.3 An adaptive time-stepping strategy for matching failures
As mentioned in the introduction of this Section, a problem arises when the extrapolated macroscopic state variables $\mathbf{m}_{n+1}$ fall outside the domain of the matching operator $\mathcal{M}$. In this case, there is no probability distribution that is consistent with the extrapolated states, and the micro-macro acceleration algorithm fails. We call such behavior a matching failure [lelievre2018analysis, debrabant2017micro]. When the extrapolation step is smaller, the extrapolated macroscopic state variables lie closer to the macroscopic state variables of the prior distribution, for which a consistent probability distribution exists, namely the prior distribution itself. We can thus expect that, the smaller the extrapolation step, the more likely we will find a probability distribution consistent with $\mathbf{m}_{n+1}$. In practice, we therefore perform micro-macro acceleration with adaptive time stepping based on matching failures. If we detect a matching failure, we drastically decrease $\Delta t$ by half; if matching succeeds, we cautiously increase $\Delta t$ again by a small factor.
The question remains how we can detect a matching failure in practice. When there is no probability distribution consistent with the extrapolated state variables $\mathbf{m}_{n+1}$, the Newton-Raphson solver will also fail to find Lagrange multipliers that solve the nonlinear system (9). We thus only need to monitor the convergence of the Newton-Raphson solver to determine whether there is a matching failure. We impose a maximum number of iterations for the nonlinear solver; if this maximum is reached before convergence, we consider matching to have failed.
Imposing a maximum number of iterations before lowering the extrapolation time step also has another beneficial effect. For larger $\Delta t$, the extrapolated state variables lie far from the prior macroscopic state variables, so that the Newton-Raphson method needs more iterations to converge than for lower extrapolation step sizes. Taking two time steps with a lower $\Delta t$ could therefore speed up the algorithm compared to one step with twice the extrapolation time step. In the light of performance, we can hence take the maximum number of Newton-Raphson iterations to be small. In this text, we allow 6 Newton-Raphson iterations before lowering the time step.
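The resulting step-size control is a halve-on-failure, grow-on-success rule; a minimal sketch in which the growth factor and the cap on the step size are illustrative assumptions:

```python
def adapt_extrapolation_step(Dt, matched, Dt_max, growth=1.2):
    """Halve the extrapolation step after a matching failure; after a
    success, cautiously grow it again, capped at the maximal step."""
    if not matched:
        return 0.5 * Dt
    return min(growth * Dt, Dt_max)
```

The asymmetry (aggressive halving, cautious growth) keeps the number of repeated matching failures low while still letting the step recover.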
4 The influence of the choice of state variables
In this Section, we present two examples from molecular dynamics: FENE dumbbells (Section 4.1) and a three-atom molecule (Section 4.2). In both examples, we investigate the accuracy of micro-macro simulation for different choices of macroscopic state variables. In Section 4.1, we introduce the FENE-dumbbell model and look at three different hierarchies of macroscopic state variables to approximate the quantities of interest. Section 4.2 introduces and discusses the three-atom molecule, where we compare the accuracy of micro-macro acceleration to existing approximate macroscopic models for slow quantities of interest. We will show that, even with some macroscopic state variables that are not entirely slow, the micro-macro acceleration method can generate more accurate approximations to the exact dynamics than these approximate models. Finally, we present a short overarching discussion on the effect of the macroscopic state variables in Section 4.3.
4.1 Three state hierarchies for FENE dumbbells
FENE stands for 'Finitely Extensible Nonlinear Elastic', and the FENE-dumbbell model represents a dilute polymer solution in a solvent [laso1993calculation]. The state variable $X$ denotes the end-to-end vector between the two beads of the polymer chain, which are connected by a spring. As the polymers move through the solvent, they experience Stokes drag due to the velocity gradient of the ambient solvent, Brownian motion due to collisions with solvent molecules, and a spring force
(15) \[ F(X) = \frac{X}{1 - \|X\|^2 / b} \]
due to intramolecular interactions, where $b$ denotes the maximal squared extension of the dumbbell. The domain $\mathcal{D} = B\!\left(0, \sqrt{b}\right)$ is the open ball of radius $\sqrt{b}$ around the origin. The diffusion process for FENE dumbbells reads
(16) \[ \mathrm{d}X(t) = \left( \kappa(t)\, X(t) - \frac{1}{2\,\mathrm{Wi}}\, F(X(t)) \right) \mathrm{d}t + \frac{1}{\sqrt{\mathrm{Wi}}}\, \mathrm{d}W(t), \]
where $\mathrm{Wi}$ is the Weissenberg number, the ratio of the characteristic relaxation time of the polymers in the solvent to the characteristic time of the solvent, and $\kappa(t)$ is the velocity gradient of the ambient solvent. We refer to [bird1987dynamics] for a derivation of SDE (16) and to [jourdain2003mathematical, jourdain2004existence] for the existence of a global solution to the SDE.
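For reference, the FENE spring force can be sketched as follows, with `b` denoting the maximal squared extension of the dumbbell (a standard parametrisation; the paper's exact symbol for this parameter is not visible in the extracted text):

```python
import numpy as np

def fene_force(X, b):
    """FENE spring force F(X) = X / (1 - |X|^2 / b), defined only inside
    the open ball |X|^2 < b (the force diverges at the boundary, which is
    what keeps the dumbbell finitely extensible)."""
    return X / (1.0 - np.sum(X * X, axis=-1, keepdims=True) / b)
```

For small elongations the force is approximately linear (Hookean), while it grows without bound as the extension approaches its maximum.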
In practice, the FENE-dumbbell process is usually coupled to the macroscopic Navier-Stokes equations that model the evolution of the solvent. This coupling happens through the Stokes drag and by adding a non-Newtonian term in the form of the stress tensor [masmoudi2008well]
\[ \tau_p = \frac{1}{\mathrm{Wi}} \left( \mathbb{E}\left[ X \otimes F(X) \right] - \mathrm{Id} \right). \]
In the following experiments, we use a one-dimensional FENE-dumbbell process with fixed model parameters. We also define the ambient velocity gradient $\kappa(t)$ as a time-dependent profile taken from [keunings1997peterlin].
For this choice of the velocity gradient, one can show that hysteresis occurs between the stress tensor and the first even moment, as shown on the phase diagram in Figure 1. The hysteretic effect is due to the interplay between the nonlinearities in the FENE model (16) and the dispersity between the elongations of the individual polymers. We refer to [sizaire1999hysteretic] for more details on this hysteretic effect.
The objective of the numerical experiments in this Section is to study the accuracy of micro-macro acceleration in approximating this hysteretic curve, and the effect of the choice of the macroscopic state variables on this accuracy. We propose three different hierarchies of macroscopic state variables and study the accuracy of the resulting approximations by micro-macro acceleration. The micro-macro acceleration algorithm has already been successfully applied to the FENE-dumbbell diffusion process [debrabant2017micro], but a study of the accuracy of different hierarchies of macroscopic state variables has not yet been performed. The authors of [samaey2011numerical] have, however, already carried out a similar study in the equation-free context.
In the following experiments, we use the same three hierarchies of macroscopic state variables as [samaey2011numerical], and investigate the accuracy of micro-macro acceleration on the hysteretic curve and the stress tensor, as a function of the hierarchy and of the number of macroscopic state variables per hierarchy. The families of macroscopic state variables are:
- Hierarchy 1: Use the first $L$ even moments of the diffusion process. By symmetry of the spring force, the odd moments of the FENE process vanish, so there is no need to use these states for matching. The macroscopic state variables read
\[ E_l(t) = \mathbb{E}\left[ X(t)^{2l} \right], \qquad l = 1, \dots, L. \]
- Hierarchy 2: Replace the final state variable from Hierarchy 1 with the stress tensor itself. Since we want to approximate the evolution of the stress tensor, adding the latter as a state variable could improve the accuracy of micro-macro acceleration.
- Hierarchy 3: Start with $\mathbb{E}[X^2]$ and add the terms that appear in the Taylor expansion of the stress tensor obtained via Itô's lemma. For each new term that appears in the evolution equation, write down its evolution equation as well, and keep adding the new terms as macroscopic state variables. The first four state variables obtained this way are
(17)
We refer to Appendix A for the derivation of the above macroscopic state variables.
For the numerical experiments, we compute the evolution of the stress tensor and the first even moment, and we also plot the hysteretic curve in phase space. We use an increasing number of macroscopic state variables for the first two hierarchies and for the third hierarchy (17). We take a maximal extrapolation time step with adaptive time stepping, since the expected gain is quite small due to the small relative timescale separation in the FENE model. Furthermore, we fix the number of inner microscopic steps and the number of particles in all Monte Carlo simulations. The numerical results are summarized in Figures 2 to 4, for each of the three hierarchies of state variables, respectively.
With the first hierarchy of macroscopic state variables in Figure 2, micro-macro acceleration does not capture the hysteresis curve well for a low number of state variables. For the largest number of state variables (in brown), the approximation to the exact hysteresis curve is somewhat better, but there is still a lot of room for improvement.
The second hierarchy of state variables, which includes the stress tensor, improves the approximation of micro-macro acceleration in Figure 3. The green and purple lines enclose a larger area, and the brown line approximates the hysteresis curve slightly better. There is, however, still no reasonable fit of the exact hysteresis curve.
The third hierarchy of macroscopic states yields the best results. Even with few state variables, micro-macro acceleration (in purple) follows the exact hysteresis curve closely. With the third hierarchy of state variables, we hence perform less computational work while obtaining a better approximation to the exact stress tensor and hysteresis curve. The a priori selection of the set of macroscopic state variables thus has a profound impact on the accuracy and efficiency of micro-macro acceleration. This conclusion is very similar to the one reached in the equation-free context [samaey2011numerical].
4.2 Reaction coordinates of a three-atom molecule
The three-atom molecule is a simple three-dimensional molecular system in which the bonds between the individual atoms vibrate at a high frequency compared to the frequency of global conformational changes of the molecule. Figure 5 depicts the molecule. To remove some degrees of freedom, atom A is restricted to move on the axis with coordinates and atom B is fixed at the origin. We define the coordinates of atom C as . The individual atoms are influenced by a potential that describes the interactions between the atoms, as well as by collisions with the ambient solvent, modeled via Brownian motion. The magnitude of these collisions is proportional to the inverse temperature of the solvent.
The evolution of the atoms is given by overdamped Langevin dynamics of the form [legoll2010effective]
(18) 
where is the inverse temperature. The potential energy that governs the drift of the individual atoms is given as
(19) 
where is the equilibrium length of the bonds and is the angle between atoms A and C. As the expression for the potential energy shows, the molecule has two stable conformations, where . Due to the timescale separation in the potential energy, expressed by the small-scale parameter , the bonds between atoms A and B and between atoms B and C vibrate quickly relative to the bimodal behavior of the angle , which is the variable of interest in this system. For the experiments in this section, we use the same parameter values as in [legoll2010effective], i.e., and .
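The time discretization used for such overdamped Langevin dynamics can be sketched as a plain Euler–Maruyama scheme. The sketch below is a minimal, self-contained illustration: the potential is a hypothetical stiff harmonic bond standing in for the three-atom potential (19), and all parameter values are assumptions, not the paper's.

```python
import numpy as np

def euler_maruyama(grad_V, x0, beta, dt, n_steps, rng):
    """Simulate overdamped Langevin dynamics dX = -grad V(X) dt + sqrt(2/beta) dW
    with the Euler-Maruyama scheme; returns the full path."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    noise_scale = np.sqrt(2.0 * dt / beta)
    for _ in range(n_steps):
        x = x - grad_V(x) * dt + noise_scale * rng.standard_normal(x.shape)
        path.append(x.copy())
    return np.array(path)

# Illustrative stiff potential: a harmonic bond of stiffness 1/eps with
# equilibrium length 1 (a placeholder, not the three-atom potential).
eps = 1e-2
grad_V = lambda x: (x - 1.0) / eps
rng = np.random.default_rng(0)
path = euler_maruyama(grad_V, x0=[1.5], beta=1.0, dt=1e-4, n_steps=2000, rng=rng)
# the stiff bond relaxes quickly to length ~1 and then fluctuates around it
```

Note that the stability requirement of the explicit scheme forces dt to scale with eps, which is precisely the cost that micromacro acceleration aims to amortize.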
In Section 4.2.1 we describe how an approximate model, the so-called effective dynamics, can be constructed for a quantity of interest, or reaction coordinate, when the timescale separation is large. In Section 4.2.2 we choose two reaction coordinates and discuss the accuracy of the effective dynamics. In Section 4.2.3, we apply micromacro acceleration to the three-atom molecule with macroscopic state variables closely related to the quantity of interest (the angle ) and examine its efficiency gain. Finally, in Section 4.2.4, we study the accuracy of micromacro acceleration with macroscopic state variables that are only loosely coupled to .
4.2.1 Effective dynamics
When the parameter is small, a direct simulation of (18) becomes prohibitively costly. In many situations in molecular dynamics, however, we are not interested in the evolution of the complete stochastic system, but rather in the evolution of a low-dimensional set of reaction coordinates. For the three-atom molecule (18), we consider a scalar reaction coordinate . In [legoll2010effective], the authors propose to use a so-called effective dynamics model, an approximate macroscopic model for the reaction coordinate . The effective dynamics is based on the conditional expectation of the state variable , given a value of the reaction coordinate , and reads
(20) 
The drift and diffusion terms are defined by the conditional expectations
(21)  
(22) 
where is the invariant measure of (18) with the normalization constant. We refer to [legoll2010effective] for convergence properties of the effective dynamics (20).
Analytic expressions for the functions and are usually hard or impossible to derive. Therefore, we precompute these functions on a grid of values and interpolate between the grid points whenever an intermediate value is required [legoll2010effective].
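A minimal sketch of this precompute-and-interpolate strategy is given below. The conditional expectations are estimated by binning samples of the reaction coordinate and averaging the observable within each bin; the grid, the sample size, and the test observable are illustrative assumptions, not values from the experiments.

```python
import numpy as np

def tabulate_conditional_mean(xi_vals, g_vals, edges):
    """Estimate E[g(X) | xi(X) in bin] by binning samples of the reaction
    coordinate xi and averaging g within each bin."""
    idx = np.clip(np.digitize(xi_vals, edges) - 1, 0, len(edges) - 2)
    sums = np.bincount(idx, weights=g_vals, minlength=len(edges) - 1)
    counts = np.bincount(idx, minlength=len(edges) - 1)
    return sums / np.maximum(counts, 1)

# Toy self-check: for g = xi**2 the tabulated conditional mean should be
# close to z**2 at the bin centers (uniform samples on [0, 5] assumed).
rng = np.random.default_rng(1)
xi = rng.uniform(0.0, 5.0, 200_000)
edges = np.linspace(0.0, 5.0, 21)         # 20 bins on [0, 5]
centers = 0.5 * (edges[:-1] + edges[1:])
table = tabulate_conditional_mean(xi, xi**2, edges)
query = np.interp(2.3, centers, table)    # interpolate between grid points
```

In practice the samples would come from a long (or constrained) microscopic simulation rather than a uniform distribution; only the binning-plus-interpolation mechanics are illustrated here.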
4.2.2 Choice of reaction coordinates
As mentioned in the introduction of this section, we are mostly interested in the possible conformations in which the molecule can reside. To this end, we define a first reaction coordinate as the angle . Alternatively, we define a reaction coordinate as the squared distance between the two outer atoms A and C, i.e.,
The effective dynamics equation for follows readily from the potential energy (19), while for the second reaction coordinate we compute the drift and diffusion terms and on a uniform grid of values between 0 and 5. The two reaction coordinates seem very similar, as both capture the bimodal nature of the dynamics of . However, the accuracy of the resulting effective dynamics differs considerably. The numerical results for and , reproduced from [legoll2010effective], are summarized in Figure 6. The microscopic time step is for all simulations and we use Monte Carlo replicas.
The effective dynamics based on follows the exact dynamics of the angle very well. On the other hand, the effective dynamics based on does not follow the exact dynamics of at all: there is a large steady-state error between the effective dynamics and the exact microscopic dynamics. The authors of [legoll2010effective] attribute this error to the fact that is not orthogonal to the fast dynamics, while is. The a priori choice of which reaction coordinate to use is thus of great importance.
4.2.3 Efficiency gain by micromacro acceleration
The choice of a good set of reaction coordinates can be very hard in more complex applications [legoll2010effective]. Micromacro acceleration can alleviate this difficulty. We now perform the same numerical experiments as above with the micromacro acceleration method, and study when micromacro acceleration improves on the results of the effective dynamics. In the following experiments, we extrapolate the mean of either or as a macroscopic state variable, for different extrapolation step sizes. The simulation results are summarized in Figure 7, with reaction coordinate on the left and on the right.
In the left plot of Figure 7, we see that micromacro acceleration closely follows the exact dynamics of the first reaction coordinate, for different extrapolation step sizes. Only a small transient error appears as increases. This is already an efficiency gain compared to the microscopic simulation: we obtain the same steady-state value while taking larger time steps. The right plot of Figure 7 contains a surprising result. Micromacro acceleration that extrapolates only the mean of completely eliminates the modeling error made by the effective dynamics of , even though the mean of contains some fast dynamics. This holds for multiple extrapolation step sizes. One reason why micromacro acceleration is more accurate than the effective dynamics is that the microscopic dynamics is also taken into account during the simulation stage. A complete explanation of the high accuracy for large is, however, not yet available.
We can conclude, numerically, that micromacro acceleration yields accurate simulation results when extrapolating either one of the two reaction coordinates. Micromacro acceleration attains the same level of accuracy as, or is more accurate than, the effective dynamics. At the same time, we are able to bridge the gap in the timescale separation by taking extrapolation time steps that are about ten times larger than the microscopic time step. The timescale separation between the slow and the fast dynamics is approximately a factor of ten, due to the particular choice of values for and . We thus gain in efficiency compared to both the effective dynamics and the microscopic time integrator.
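The outer loop of micromacro acceleration used in these experiments can be sketched as follows. This is a schematic with several stated assumptions: the inner model is a toy Ornstein–Uhlenbeck process rather than the three-atom molecule, only the ensemble mean is extrapolated, and the paper's relative-entropy matching is replaced by a plain mean shift, which enforces the extrapolated mean exactly but is only a crude stand-in for the actual matching procedure.

```python
import numpy as np

def micro_macro_step(X, micro_step, dt_micro, K, Delta_t):
    """One outer step of micromacro acceleration (schematic):
    simulate, restrict, extrapolate, match."""
    m0 = X.mean()                      # restrict: macroscopic state = ensemble mean
    for _ in range(K):                 # 1. short burst of K inner micro steps
        X = micro_step(X, dt_micro)
    m1 = X.mean()
    # 2-3. extrapolate the macroscopic state linearly (forward-Euler-like)
    #      over the remaining part of the outer step Delta_t
    m_star = m1 + (Delta_t - K * dt_micro) * (m1 - m0) / (K * dt_micro)
    # 4. matching: perturb the ensemble so its mean equals m_star
    return X + (m_star - m1)

# Toy microscopic model (not the paper's): Ornstein-Uhlenbeck dX = -X dt + 0.5 dW.
rng = np.random.default_rng(2)
def ou_step(X, dt):
    return X - X * dt + 0.5 * np.sqrt(dt) * rng.standard_normal(X.shape)

X = np.full(10_000, 2.0)               # ensemble started at x = 2
for _ in range(10):                    # 10 outer steps of size Delta_t = 0.1
    X = micro_macro_step(X, ou_step, dt_micro=0.01, K=2, Delta_t=0.1)
# after time 1 the ensemble mean should be close to the exact value 2*exp(-1)
```

Each outer step thus costs only K inner steps instead of Delta_t/dt_micro, which is where the efficiency gain over the microscopic integrator originates.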
4.2.4 A hierarchy of moments of and
Reaction coordinates and are very closely connected to (or identical to) the angle , and micromacro acceleration yields very accurate results when extrapolating the mean of either reaction coordinate. We now consider a set of pure moments of the three-atom molecule (18) that also determines the angle uniquely, and investigate the numerical results. Suppose we take the first and second moments of and as macroscopic state variables
Figure 8 depicts the numerical approximations by micromacro acceleration with these moments as macroscopic state variables, with the same large extrapolation time steps and particles.
The numerical results indicate that micromacro acceleration follows the exact microscopic dynamics of and well for small extrapolation steps, i.e., . However, when the extrapolation step grows, micromacro acceleration remains unbiased but the statistical noise increases. The noise on in the right panel of Figure 8 is higher than the noise on . This effect is mainly due to the fact that we are not extrapolating any states of , so that can move freely. In the experiments of the previous section, both and depend on and the noise is much lower. We leave the quantification of the statistical error of micromacro acceleration for future work.
4.3 Conclusion on the number of macroscopic state variables
For both FENE dumbbells and the three-atom molecule, we can conclude that choosing a set of macroscopic state variables that is closely connected to some quantity of interest (the angle for the three-atom molecule and the stress tensor for FENE dumbbells) yields accurate simulation results. In the case of FENE dumbbells, the third hierarchy of macroscopic states approximates the stress very well, with fewer moments than the other hierarchies of macroscopic state variables. Likewise, for the three-atom molecule, micromacro acceleration improves on the effective dynamics for reaction coordinates that are related to the quantity of interest.
As a second observation, we need to extrapolate enough macroscopic state variables related to the quantity of interest to keep the statistical error sufficiently low. When extrapolating only moments of atom in the three-atom molecule, the noise on the angle is smaller than on the distance since the variable can move around freely.
5 Impact of the extrapolation step size on accuracy
In this section, we look in more detail at the effect of the extrapolation time step on accuracy. We assume that the macroscopic state variables are given, for example through an analysis similar to that of Section 4. In the following examples, we are interested in the maximal extrapolation step we can employ such that the micromacro acceleration error remains within a given tolerance, specifically as a function of the timescale separation present in the system. For systems in which the timescale separation can be expressed in terms of a small-scale parameter , an approximate macroscopic model usually exists for the slow dynamics [pavliotis2008multiscale] that converges to the microscopic dynamics as decreases to 0. The goal of this section is to determine the maximal such that micromacro acceleration is more accurate than the approximate macroscopic model. In Section 5.1, we first investigate the efficiency gain on a slow-fast system with a double-well potential, and in Section 5.2, we discuss a simple linear SDE with a periodic external force.
5.1 A slow-fast bimodal system
As a first example, we consider a slow-fast system in which the fast component is governed by a double-well potential
(23)  
The potential energy function is depicted in Figure 9.
With this double-well potential, the fast component resides in one well for some time and then switches to the other well at a random instant, due to the Brownian motion term in the evolution equation for . The mean switching time between the two wells depends on and is derived in detail in [bruna2014model]. Figure 10 depicts the switching process of one particle for a small (left) and a large (right). For large timescale separations (small ), the fast component switches frequently between the wells, so that the slow component (in red) is not heavily influenced by the switching; the slow component only feels the invariant distribution of the position of the fast component. For low timescale separations, the fast component resides longer in one well before switching to the other. In this case, the slow component feels in which well the fast component resides and evolves accordingly.
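This switching behavior is easy to reproduce with a small simulation. The sketch below assumes a prototypical slow-fast form with double-well drift y - y**3 for the fast variable, since we do not restate the exact coefficients of (23) here; the noise level and step sizes are likewise illustrative.

```python
import numpy as np

def simulate_slow_fast(eps, sigma, dt, n_steps, rng):
    """Euler-Maruyama for a prototypical slow-fast system with a double-well
    fast component (an assumed form standing in for (23)):
        dx = y dt,
        dy = (y - y**3)/eps dt + sqrt(2*sigma/eps) dW.
    The fast variable y switches between the wells at y = -1 and y = +1."""
    x, y = 0.0, 1.0
    xs = np.empty(n_steps)
    ys = np.empty(n_steps)
    for i in range(n_steps):
        x += y * dt
        y += (y - y**3) / eps * dt \
             + np.sqrt(2.0 * sigma * dt / eps) * rng.standard_normal()
        xs[i], ys[i] = x, y
    return xs, ys

rng = np.random.default_rng(3)
xs, ys = simulate_slow_fast(eps=1e-3, sigma=0.5, dt=1e-5, n_steps=50_000, rng=rng)
# for small eps the fast variable switches wells many times over the slow
# timescale, so the slow variable only feels the invariant distribution of y
```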
The slow-fast bimodal system in this section differs from the bimodal behavior of the three-atom molecule: in the current example the bimodal motion is fast, while in the previous example it was slow. Moreover, the effective dynamics of the three-atom molecule (20) is not applicable here, since an appears in the diffusion term of (23).
In Section 5.1.1, we introduce an approximate macroscopic model for the slow component of general slow-fast systems, based on averaging out the fast component. We show the accuracy of the approximate macroscopic model for small and larger values of . Finally, in Section 5.1.2, we discuss the accuracy and efficiency gain of micromacro acceleration, compared to the approximate macroscopic model, applied to the bimodal system (23).
5.1.1 An approximate macroscopic model for general slow-fast systems
For general slow-fast systems, one can derive an approximate macroscopic model for the slow component by averaging out the fast dynamics , given a value of the slow component . We then substitute the averaged value of into the evolution equation for the slow component. In the case of the bimodal system, the fast dynamics is autonomous and its invariant measure is [pavliotis2008multiscale]
The mean of , given a value of , equals zero
so that the approximate macroscopic model for the bimodal system (23) reads
(24) 
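The vanishing conditional mean can be checked numerically by quadrature against the invariant density. The double-well potential below is an assumed prototypical form (not necessarily the paper's), but any even potential yields the same conclusion by symmetry.

```python
import numpy as np

# Invariant density of the fast component for a prototypical double-well
# potential V(y) = y**4/4 - y**2/2 (an assumed stand-in): rho ~ exp(-V/sigma)/Z.
sigma = 0.5
y = np.linspace(-5.0, 5.0, 20_001)
dy = y[1] - y[0]
V = y**4 / 4 - y**2 / 2
rho = np.exp(-V / sigma)
rho /= rho.sum() * dy                 # normalization constant Z
mean_y = (y * rho).sum() * dy         # conditional mean of the fast variable
# by the even symmetry of V the mean vanishes, which is exactly the value
# that the averaged model substitutes into the slow equation
```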
Equation (24) converges weakly to (23) as decreases to 0 [pavliotis2008multiscale], implying that the approximate macroscopic model makes a modeling error whenever . The source of this modeling error is easily identified in the left plot of Figure 10. Indeed, as mentioned above, for larger the fast mode heavily influences the slow mode, so that we cannot simply replace the fast behavior by its mean.
The switching of induces a large variance in , which we ignore by substituting the mean of for itself. When is smaller, this error decreases as well, as the right plot of Figure 10 shows. The induced variance is illustrated in Figure 11, which shows the variance of the slow component of (23) as a function of time, for (left) and (right). The step size for both models is and the initial condition is far from the equilibrium distribution of the system.
The left plot of Figure 11 indicates a large steady-state error on the slow variance for moderate ; this error is also present for but is much smaller. With the micromacro acceleration method, we aim to reduce this steady-state error, especially for moderate timescale separations.
5.1.2 Efficiency gain by micromacro acceleration
In the following experiments, we run the micromacro acceleration method on the bimodal system (23) and extrapolate the first and second moments of , with particles. Both moments are required, since we need to approximate the variance of the slow component as closely as possible. We consider two values of the timescale parameter: and . In the former case, we expect a large efficiency gain due to the large timescale separation. In the latter case, the timescale separation is only moderate and the expected efficiency gain is rather small. The numerical results are summarized in Figure 12. On the one hand, for small , micromacro acceleration can take extrapolation steps that are a hundred times larger than the microscopic step size, while only incurring a small transient error. On the other hand, when is large, micromacro acceleration is able to eliminate the error of the approximate macroscopic model and reach the same steady-state value for the variance as the microscopic integrator. At the same time, micromacro acceleration also gains in efficiency compared to the microscopic integrator by taking larger time steps. A transient error on the variance of the slow mode remains, however, and it grows as increases. There is currently no error control available for micromacro acceleration, and we leave this topic for future research.
To conclude, with micromacro acceleration we can almost bridge the timescale separation gap by taking extrapolation steps on the order of the slow dynamics when is small, and when is large, we can eliminate the steady-state error of the approximate macroscopic model with time steps that are slightly larger than those of the microscopic time integrator. Thus, on this double-well system, we gain in efficiency for all values of .
5.2 Accuracy analysis of extrapolation on a periodically driven linear system
For the final numerical example of this manuscript, we consider a slow-fast linear system in which we add a periodic external force to the evolution equation of the slow component [li2008effectiveness],
(25) 
Hence, the mean of has a periodic invariant solution. The periodic external force makes it easier to compare the numerical solutions of different methods to the exact solution: the errors of the different models are present in every period of the invariant solution and do not damp out as time increases. We can thus compute the error of each method by simply computing the norm of its difference from the exact solution over one period.
We can also apply the averaging strategy from the previous example (23) to the periodic slow-fast model (25). The invariant distribution of , given a value of , is Gaussian and reads
with the normalization constant. The mean of , computed against its invariant distribution, for a given value of , thus reads
Therefore, the approximate macroscopic model for the slow component is
(26) 
Let us first compare the numerical solution of the approximate macroscopic model (26) to the exact microscopic dynamics (25), for different values of . For all the numerical experiments in this section, the initial condition is taken on the invariant curve of the exact continuous solution, the microscopic time step is , and we use Monte Carlo particles. Figure 13 depicts the evolution of the mean of , for on the left and on the right. The numerical results show that the approximate macroscopic model (in blue) makes an error compared to the exact microscopic model (in orange). The error is mostly visible in the amplitude of the periodic solution, and the discrepancy shrinks as decreases to zero. We show in Appendix B that the error between the microscopic and approximate macroscopic models decreases linearly with , as is also visible in Figure 14.
5.2.1 Tuning the error of micromacro acceleration
Let us now investigate the accuracy of micromacro acceleration for the periodically driven linear system (25). For every value of , we want to examine the maximal extrapolation step micromacro acceleration can take, while keeping the error smaller than the corresponding error of the approximate macroscopic model.
To determine the maximal extrapolation step, we first need to know how the micromacro acceleration error decreases as decreases to . As of now, there exists no theoretical result on the convergence rate of micromacro acceleration with relative-entropy matching [lelievre2018analysis], so we can only investigate the rate of convergence numerically. In Figure 15, we show the error of micromacro acceleration as a function of the extrapolation time step , for the same values of as in Figure 13. For large values of , the micromacro acceleration error decreases linearly with , as shown in the left plot of Figure 15. We expect first-order convergence, since the linear extrapolation of macroscopic state variables mimics the forward Euler method. For small , however, we observe that the error decreases quadratically with .
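The convergence rates quoted here come from log-log fits of the error against the extrapolation step. A minimal sketch of such a fit on synthetic second-order data follows; the constants and step sizes are illustrative, not measurements from the paper.

```python
import numpy as np

def convergence_order(steps, errors):
    """Estimate the order p in errors ~ C * steps**p from a least-squares
    fit of log(errors) against log(steps)."""
    p, _ = np.polyfit(np.log(steps), np.log(errors), 1)
    return p

# Synthetic error data mimicking the observed second-order regime for
# small extrapolation steps (numbers illustrative only).
steps = np.array([0.02, 0.04, 0.08, 0.16])
errors = 0.3 * steps**2 * (1.0 + 0.05 * np.sin(steps))  # mild perturbation
p = convergence_order(steps, errors)   # fitted slope, close to 2
```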
With the convergence orders of both the approximate macroscopic model and micromacro acceleration for small in place, we now derive an expression for the maximal extrapolation step we can take before the approximate macroscopic model becomes more accurate than micromacro acceleration. Using the linear convergence of the approximate macroscopic model and the quadratic convergence of micromacro acceleration, both methods attain the same accuracy when
(27) 
with the maximal extrapolation step possible before the approximate macroscopic model is more accurate than micromacro acceleration. We define as the constant of proportionality for the approximate macroscopic model and the function as the proportionality constant for micromacro acceleration, which can in principle depend on . For a more natural comparison of the maximal extrapolation step for different values of , we define the dimensionless ‘extrapolation factor’ as . Furthermore, due to the stability requirement for the inner integrator, we choose the inner microscopic step proportional to , . Putting all the parameters together, equation (27) yields
(28) 
such that a direct expression for follows. Unfortunately, we lack a direct expression for , so we cannot use (28) directly. We must resort to numerical experiments to deduce the maximal extrapolation step and the maximal extrapolation factor .
In Figure 16, we plot in solid lines the micromacro acceleration error for multiple values of as a function of the extrapolation factor . The error of the approximate macroscopic model is also given by the dashed lines, and the crosses indicate where both errors are equal. Denote the associated maximal extrapolation factor by . Whenever , the micromacro acceleration method is more accurate than the approximate macroscopic model for that value of , and vice versa when . The numerical results indicate that the maximal extrapolation factor increases as decreases.
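The crossing points marked in Figure 16 can be located by interpolating the two error curves in log-log coordinates. The sketch below uses assumed illustrative error curves (a quadratic micromacro error against a flat model error), not the measured ones.

```python
import numpy as np

def maximal_extrapolation_factor(factors, err_mm, err_macro):
    """Locate the extrapolation factor K at which the micromacro error
    curve crosses the approximate-model error curve, by linear
    interpolation in log-log coordinates between the bracketing points."""
    below = err_mm < err_macro
    assert below.any() and not below.all(), "curves must cross inside the range"
    i = np.flatnonzero(below)[-1]          # last factor still below the model error
    f0, f1 = np.log(factors[i]), np.log(factors[i + 1])
    d0 = np.log(err_mm[i]) - np.log(err_macro[i])
    d1 = np.log(err_mm[i + 1]) - np.log(err_macro[i + 1])
    return np.exp(f0 - d0 * (f1 - f0) / (d1 - d0))

# Illustrative curves: for err_mm = 1e-3 * K**2 and a flat model error of
# 2e-2, the crossing sits at K = sqrt(20).
factors = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
err_mm = 1e-3 * factors**2
err_macro = np.full_like(factors, 2e-2)
K_max = maximal_extrapolation_factor(factors, err_mm, err_macro)
```

Because both illustrative curves are exact power laws, the log-log interpolation recovers the crossing exactly; on measured data it is only an approximation between grid points.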
An important consequence of the fact that increases is that the maximal extrapolation step decreases more slowly than linearly with , since the microscopic inner step decreases linearly with . Indeed, Figure 17 shows that decreases approximately as . As a result, micromacro acceleration is more accurate than the approximate macroscopic model if we choose . We can also tune the micromacro acceleration error: given a tolerance, or maximal value for the error, we can choose the extrapolation step such that the error stays below the tolerance, while still taking larger time steps than the microscopic time integrator. We conclude that micromacro acceleration gains in efficiency compared to the approximate macroscopic model by lowering the error, but also compared to the inner microscopic time integrator by taking larger time steps. The gain compared to the microscopic time integrator even becomes infinite as decreases to zero.
6 Conclusion
We studied the efficiency of a micromacro acceleration procedure. We introduced the micromacro acceleration algorithm in mathematical detail in Section 2 and discussed some implementation issues in Section 3.
We investigated the effect of the number and nature of the macroscopic state variables in detail in Section 4. The state variables have a big impact on the efficiency of micromacro acceleration. Increasing the number of state variables yields a higher accuracy of the matched distributions but increases the cost of matching.
The numerical results of Section 4 show that a hierarchy of macroscopic state variables that is closely connected to some quantity of interest gives accurate simulation results for that quantity. In the case of FENE dumbbells, the hierarchy of state variables that includes terms from the Itô expansion of the stress tensor gives better approximations to the stress tensor than the other hierarchies. In fact, this hierarchy requires only four macroscopic state variables, while the others do not approximate the stress tensor well, even with five macroscopic state variables.
A similar conclusion holds for the three-atom molecule, where we worked with two reaction coordinates: the angle and the distance between the two outer atoms. Micromacro acceleration, extrapolating the mean of either reaction coordinate, follows the exact microscopic dynamics very well, even for quite large extrapolation step sizes. Moreover, the second reaction coordinate eliminates the steady-state error made by the approximate model based on the effective dynamics for that reaction coordinate. The reaction coordinate does not necessarily need to be slow, as we have seen with the second reaction coordinate; it just needs to be linked to the quantity of interest, the angle in this example. A hierarchy based only on moments also produced unbiased results, but with a larger statistical error. Thus, micromacro acceleration achieves the same or better accuracy than the effective dynamics, while allowing for larger time steps than the stiff microscopic solver, gaining efficiency compared to both models.
Besides the choice of macroscopic state variables, the extrapolation step size greatly influences the accuracy of micromacro acceleration, as we studied in Section 5. For the slow-fast bimodal model, the approximate macroscopic model makes a modeling error on the variance of the slow component, which is especially visible when there is almost no timescale separation. Micromacro acceleration attains the correct steady-state value for different extrapolation step sizes, at the expense of a controllable transient error. We thus gain in accuracy compared to the approximate macroscopic model, while gaining in efficiency over the microscopic integrator by taking larger time steps. For the slow-fast periodic model, we showed numerically that the efficiency gain of micromacro acceleration becomes infinite as the timescale separation increases to infinity: the maximal extrapolation step micromacro acceleration can take before the approximate macroscopic model becomes more accurate decreases approximately as , while the microscopic time step must decrease linearly due to stability. Furthermore, we can tune the extrapolation step such that the error stays below a given tolerance, while still taking larger time steps than the microscopic time integrator. The efficiency gain is thus also present in the periodic linear system.
References
Appendix A Derivation of the macroscopic state variables for FENE dumbbells
In this appendix, we give a short derivation of the macroscopic state variables of the third hierarchy for the FENE dumbbell process (17). We start by writing out the evolution equation of the first even moment and subsequently add all terms that appear in that evolution equation. For each of these new terms, we also write down the evolution equation and select extra terms, and so forth.
First note that, for any twice-differentiable function , the evolution of the expectation is given by Itô’s lemma
(29) 
Starting from the first even moment of the diffusion process , we find that
Since the state variable appears in its evolution equation, we do not need to add it to the list of macroscopic state variables. We also do not need to add the constant term, so we add as the second macroscopic state variable for the FENE process.
Let us now consider the second function . Its first and second derivatives are
Using equation (29), the evolution equation of hence becomes
To complete our choice of four macroscopic state variables, we select the first two moments that appear in the evolution equation of : and .
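The bookkeeping in this appendix, applying the generator from (29) to each candidate moment and collecting the new expectations that appear, can be automated symbolically. The sketch below uses an assumed FENE-like scalar drift and unit diffusion purely for illustration; the actual FENE coefficients differ.

```python
import sympy as sp

x = sp.symbols('x', real=True)
a = -x / (1 - x**2)        # assumed FENE-like drift (illustrative form only)
b = sp.Integer(1)          # constant diffusion coefficient (assumed)

def generator(f):
    """Ito generator L f = a f' + (b**2/2) f''; by (29), the moment
    evolution is d/dt E[f(X)] = E[(L f)(X)]."""
    return sp.simplify(a * sp.diff(f, x) + b**2 / sp.Integer(2) * sp.diff(f, x, 2))

# Applying L to the second moment reveals which new expectations enter the
# evolution equation of E[x**2] and must join the hierarchy; for this drift,
# L(x**2) = (1 - 3*x**2)/(1 - x**2).
Lf = generator(x**2)
```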
Appendix B Derivation of the first-order convergence of the approximate model (26)
In this appendix, we show that the error between the slow means of a general system of linear SDEs with an external periodic driving force and the corresponding approximate macroscopic model for the slow component decreases linearly to zero as the small-scale parameter decreases to zero. For the derivation, we consider a more general form of the system in Section 5.2
where the associated approximate macroscopic model is given by
One can show using elementary techniques from calculus that the mean vector propagates as
(30) 
where and is the initial condition to the equation. The constants and are the solution of the linear system
Since we are only interested in the evolution of , we only need the explicit expressions for and . Using the symbolic engine MuPAD [fuchssteiner1996mupad], the constants and are