The Derivation and Approximation of Coarse-grained Dynamics from Langevin Dynamics

# The Derivation and Approximation of Coarse-grained Dynamics from Langevin Dynamics

Lina Ma Department of Mathematics, the Pennsylvania State University, University Park, PA 16802-6400, USA.    Xiantao Li Department of Mathematics, the Pennsylvania State University, University Park, PA 16802-6400, USA.    Chun Liu Department of Mathematics, the Pennsylvania State University, University Park, PA 16802-6400, USA.
July 18, 2019
###### Abstract

We present a derivation of a coarse-grained description, in the form of a generalized Langevin equation, from the Langevin dynamics model that describes the dynamics of bio-molecules. The focus is placed on the form of the memory kernel function, the colored noise, and the second fluctuation-dissipation theorem that connects them. Also presented is a hierarchy of approximations for the memory and random noise terms, using rational approximations in the Laplace domain. These approximations offer increasing accuracy. More importantly, they eliminate the need to evaluate the integral associated with the memory term at each time step. Direct sampling of the colored noise can also be avoided within this framework. Therefore, the numerical implementation of the generalized Langevin equation is much more efficient.

## I Introduction

One of the most outstanding problems in molecular modeling of bio-molecular systems is the construction of coarse-grained (CG) models, in which only a few degrees of freedom are explicitly retained. The importance of such development and other various perspectives have been discussed in many review papers and books Leach01 (); espanol2004statistical (); riniker_developing_2012 (); noid_perspective:_2013 (); voth2008coarse (). These efforts have been driven by the fact that direct simulations using an all-atom model are restricted by the small time steps, typically femto-seconds, while the time scale of interest is at least microseconds. Therefore, a CG model that allows one to efficiently explore events occurring on long time scales is critical to the understanding of molecular conformations and ultimately, biological functions. The last two decades have witnessed a great deal of progress toward this goal noid_perspective:_2013 (); riniker_developing_2012 (); espanol2004statistical (). While many existing models have demonstrated their capability to recover (or predict) equilibrium properties, a systematic framework to incorporate dynamic properties is still challenging. In particular, as has been observed in the development of MARTINI monticelli2008martini (), a remarkably successful coarse-grained force field, the effective friction mechanism is missing in such coarse-graining procedure. This observation has also been a strong motivation for the current work.

A very important theoretical development in coarse-graining molecular models is the projection approach, originally formulated by Mori and Zwanzig Mori1965b (); Zwanzig73 (). This approach is directly based on the dynamics of the full system, rather than the equilibrium statistical properties. Such formalism (or similar reduction procedure) has recently been widely used to derive CG models based on the deterministic Newton’s equations of motion curtarolo_dynamics_2002 (); IzVo06 (); kauzlaric2011three (); kauzlaric2011bottom (); lange2006collective (); Li2009c (); oliva2000generalized (); stepanova_dynamics_2007 (); LiXian2014 (); li2015incorporation (), known as molecular dynamics (MD) models. The typical result is a generalized Langevin equation (GLE), with a memory (or frictional) kernel implicitly incorporating the influences of the degrees of freedom that have been projected out. See the early works tully1980dynamics (); guardia1985generalized (); ciccotti1981derivation (); AdDo76 () for various derivations for interacting particle systems. Recently, there has been increasing interest in modeling complex dynamical systems using GLEs Li14 (); kauzlaric2011bottom (); IzVo06 (); LiE07 (); fricks2009time (); Darve_PNAS_2009 (); curtarolo_dynamics_2002 (); ChSt05 (). The GLE is also driven by a stochastic force, which can be attributed to the uncertainty from the initial condition ChHaKu02 (). Various schemes have been proposed to compute the memory and the random noise termsberkowitz1983generalized (); berkowitz1981memory (); hijon2006markovian (); oliva2000generalized (); LiXian2014 (); li2015incorporation (), but both of them are highly nontrivial due to the nonlocality of the kernel function in time. But in general, the practical issue of predicting correct dynamics properties still remains.

In contrast to the large body of works for deterministic models, coarse-graining a stochastic system still remains a challenge. As an initial attempt to treat a full molecular model that is stochastic in nature, we start with Langevin dynamics, which arises naturally from a molecular system in solvent Schlick2002 (). The influence of solvent is not explicitly included. Rather it is modeled by a damping term and a white noise. In this sense, the Langevin dynamics model is already a coarse-grained model since the solvent particles have been removed. However, simulating the dynamics of a macromolecule using such a model is still challenging due to the number of atoms involved, and the large intrinsic vibration frequencies, which requires small time steps. Motivated by these facts, we consider a further reduction, aiming to derive a model with even fewer variables. These CG variables could represent averaged atomic degrees of freedom, such as the center of mass of a cluster of atoms, or torsion angles. Typically, the time scale will be significantly improved, since the fast components that require small scale simulations are projected out.

In coarse-graining the Langevin dynamics model, treating the stochastic random term usually requires special considerations, compared to its deterministic counterpart. For instance, the Mori-Zwanzig formalism Mori1965b (); Zwanzig73 () is not directly applicable to stochastic differential equations (SDE), since the semi-group operator is not available. We therefore suggest an appropriate linearization, and then partition the full dynamics into subspaces. The variables associated with the subspace orthogonal to the CG variables are then eliminated by direct substitution.

The CG model that we have derived is a slight generalization of the GLE, with an additional Markovian damping term. Further, we prove the fluctuation-dissipation theorem (FDT) Kubo66 () for this CG model, and the theorem takes a combined form of the first and second FDT (see Kubo66 () for the distinction between these two FDTs). To our knowledge, such models have not been reported in the literature. In particular, we show that the memory kernel function depends on the damping coefficient in the full Langevin dynamics. Establishing such direct connection is important for understanding the friction mechanism in the CG dynamics.

Although the new GLE model properly incorporates the influence of the degrees of freedom that have been removed, the numerical implementation faces several challenges, as has been noted in many previous works berkowitz1983generalized (); berkowitz1981memory (); hijon2006markovian (); oliva2000generalized (). In particular, a direct solution procedure would involve the computation of a matrix function at every step, and the dimension of the matrix is almost the same as the dimension of the full system. To alleviate the computational burden, we suggest an alternative computational approach, in which the kernel function is approximated by a rational function in the Laplace space. The goal is to find an efficient approximation so that only a few parameters need to be calculated a priori. In this paper, we make use of the explicit formula for the memory kernel to extract the numerical parameters in the approximate models. This has several practical advantages. First, the approximate model in the time domain can be written as an extended system of SDEs, which are memory-less. As a result, no datum needs to be stored and no integral needs to be evaluated at each time step. This significantly reduces the computational cost since numerical quadrature for the memory term is not needed. Secondly, the random noise in the GLE can be approximated indirectly by introducing white Gaussian noises in the extended system. Therefore, there is no need to sample the random noise in the GLE, which otherwise requires non-trivial effort, e.g., Fourier-transform over long time period, or singular value decomposition of the covariance matrix li2015incorporation (). We will provide the explicit forms of these approximations and illustrate how to determine the covariance of the noise to exactly satisfy the FDT.

The rational approximation is a novel, and yet quite flexible approach to model the memory effects by embedding a nonlocal model within a local one. In principle, there are various ways to determine the coefficients in the rational function. In this paper, we will test an idea of using the limiting values, both at zero and infinity, as interpolation points. But it is clear that this interpolation scheme may not be optimal: One may introduce other fitting procedures to obtain better accuracy. We leave this issue to future works.

In general, evaluating the accuracy of a CG model with both mean force and damping coefficients is subtle, partly because the mean force in the GLE needs to be parameterized and calibrated a priori. The error from that effort and the error from the rational approximation of the kernel function is difficult to separate in the present approach. Therefore, we consider a simple case in which the full model is linearized instead of a more complicated function form, with coefficients computed from a principal component analysis (PCA). This ensures that the covariance of the atomic coordinates and momenta are exactly captured. Starting with the harmonic model as the exact full model, we are able to compute the memory function explicitly. One implication is that the mean force is linear, which perhaps is the simplest and and most efficient coarse-grained force field, and the model can be viewed as an elastic network model (ENM)delarue2002simplified (); atilgan2001anisotropy (). This model preserves the correct vibration models, and provides an ideal test problem for error assessment. Further, the velocity auto-correlation can be computed analytically as well, allowing us to examine the accuracy without numerical and sampling errors.

Perhaps the closest work to the present approach is the normal mode partition method for Langevin dynamics sweet2008normal (), in which the Langevin dynamics is projected to a subspace and its orthogonal complement space. Various truncation steps are taken to simplify the model. The method in sweet2008normal () is at the level of numerical algorithms. What is presented in this paper also starts with such subspace partitions. However, instead of introducing a numerical algorithm at discrete time steps, we derive a CG model, and then introduce a systematic approximation procedure afterwards.

## Ii Mathematical derivation

### ii.1 The full model

 {˙x=v,M˙v=F(x)−Γv+f(t), (1)

where denotes the coordinates of all the atoms, is a diagonal matrix containing the mass of each atom, is the force from an empirical potentials , denotes the damping coefficient for the friction term, and is a stochastic force, usually modeled by a Gaussian white noise, which satisfies the fluctuation-dissipation theorem (FDT),

 ⟨f(t),f(t′)T⟩=2kBTM−1Γδ(t−t′). (2)

The FDT is crucial to ensure that the system reaches the correct equilibrium state.

By introducing the following scaling,

 ~x=M12x,~F=M−12F,~Γ=M−12ΓM−12,~f=M−12f,

we can remove the mass matrix from the system, and work with a normalized system with unit mass for each atom. Rewriting every term without the tilde, the new system is expressed in the following form,

 {˙x=v,˙v=F(x)−Γv+f(t). (3)

One can easily show that the new random noise still obeys the FDT, i.e.,

 ⟨f(t),f(t′)T⟩=2kBTΓδ(t−t′). (4)

### ii.2 A subspace partition

There are many existing methods to implement the Langevin dynamics model (3) numerically. However solving this system in its full form at every time step can be expensive, due to the large number of atoms, and the small time steps determined by the stability condition of the numerical methods. In contrast, coarse-grained (CG) models that involve much fewer variables are more attractive. This will be the primary focus of this paper. To begin with, we let and be two orthogonal subspaces, generated by basis functions and respectively. Namely,

 Y=Range(Φ),Y⊥=Range(Ψ). (5)

We assume the dimension of to be , where is much smaller than . The matrix has dimension , and it will span the space , the subspace generated by the CG variables. To ensure orthogonality, we choose with dimension such that the following identities holds,

 ΦTΨ=0,ΦTΦ=Im×m,ΨTΨ=I(3N−m)×(3N−m).

We will project the Langevin equations into and . For this purpose, we decompose the solution in the following form,

 x=Φq+Ψξ, (6)

where and are nodal values associated with the basis vectors in and . There are many choices for the matrix . One choice can be the eigenvectors corresponding to low vibrational modes, as in the work of normal mode partition sweet2008normal (), or the rotation and translation blocks (RTB) that describe the rigid-body motions. Here we will first focus on the general framework and postpone the specific choices to later discussions.

Our next step is to rewrite the first-order system (3) into a second-order equation, together with the decomposition (6),

 Φ¨q+Ψ¨ξ=F(Φq+Ψξ)−ΓΦ˙q−ΓΨ˙ξ+f(t).

Our goal is to eliminate , which represents the degrees of freedom associated with .

To proceed, by left multiplying both sides by , we turn the equation above into,

 ¨q=ΦTF(Φq+Ψξ)−ΦTΓΦ˙q−ΦTΓΨ˙ξ+ΦTf(t). (7)

Notice disappeared from the left hand side thanks to the orthogonality condition. Similarly, we left multiply both sides by , and arrive at a second order differential equation for ,

 ¨ξ=ΨTF(Φq+Ψξ)−ΨTΓΦ˙q−ΨTΓΨ˙ξ+ΨTf(t). (8)

The nodal values and are still coupled together for now. To eliminate the variable , we solve equation (8) analytically, together with a subsequent substitution into (7). This is clearly intractable due to the nonlinearity of in (8). Therefore, we simplify the derivation by using a linearization . Since the matrix can be related to the covariance of the atomic coordinates, it can be determined from the principal component analysis (PCA). Namely we have

 ⟨x(t),x(t)T⟩=kBTA−1,

which can be directly computed from a molecular simulation. Such linear approximation is a necessary route in many coarse-graining procedures IzVo06 (); Li2009c (); oliva2000generalized (); stepanova_dynamics_2007 (). In principle, higher order expansions can be introduced, e.g., ChSt05 (); WuLiLi2015 (), but the derivation is exceedingly complicated.

Recall that the basis functions are normalized, i.e., and . We further define the following terms,

 A11=ΦTAΦ,A12=ΦTAΨ,Γ11=ΦTΓΦ,Γ12=ΦTΓΨ,f1=ΦTf(t),
 A21=ΨTAΦ,A22=ΨTAΨ,Γ21=ΨTΓΦ,Γ22=ΨTΓΨ,f2=ΨTf(t).

Since it is usually easier to work with first order systems, we now convert the higher order equations back to a coupled first order system, with representing the momentum of , and being the momentum of . Notice that we are dealing with unit mass system, the momentum also represents velocity. The first order system reads

 ⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩˙q=p,˙p=ΦTF(Φq)−A12ξ−Γ11p−Γ12η+f1(t),˙ξ=η,˙η=−A21q−A22ξ−Γ21p−Γ22η+f2(t). (9)

So far, we have not done any dimension reduction yet, and these equations are equivalent to the original dynamics within the linear approximation. In addition, the random forces and are projections of the original white noise. Since is directly influencing the CG variable , it will be retained in the CG model. On the other hand, the influence of on the CG variables will be revealed by the coarse-graining procedure.

### ii.3 The reduction of the number of degrees of freedom

We take as the quantities of interest, i.e., the CG variables, solve explicitly, substitute them back to the equations for (), and thus eliminate (,) in the system (9). Detailed computations are shown in Appendix B.1 due to the lengthy calculations. The CG equations for are then given by,

 ⎧⎪⎨⎪⎩˙q=p,˙p=Feff(q)−Γ11p−∫t0θ(t−τ)p(τ)dτ+ˆf. (10)

Here is an effective force field for the CG variables.

We refer to as the memory term, and as the memory kernel function, which is expressed in terms of a matrix exponential golub2012matrix (),

 θ(t)=[A12,Γ12]e−Gt[A−12200−I][A21Γ21], (11)

where the matrix is defined as,

 G=[0−IA22Γ22]. (12)

What complicates the derivation from (9) is the presence of the stochastic noise in the last equation. With lengthy calculations, we have shown that is a combined Gaussian random noise. It is a stationary Gaussian random process with mean zero, satisfying the second fluctuation-dissipation theorem:

 ⟨ˆf(t)ˆf(t′)T⟩=2kBTΓ11δ(t−t′)+kBTθ(t−t′). (13)

Interestingly, this takes a combined from of the first and second FDT. The proof of this FDT is provided in the Appendix B.2. Stationary Gaussian processes with mean zero are uniquely determined by their time correlation functions Doob44 (). Therefore, the CG model (10) is closed once the memory kernel is known.

## Iii Further Approximation of the Generalized Langevin Equations

Solving the GLEs (10) directly is clearly not practical: On one hand, one needs to keep the history of the solution in order to compute the memory term; on the other hand, evaluating the memory function at each time step is very expensive due to the large dimensionality of the matrix (the size is ). For example, direct evaluation of involves the computation of the matrix exponential , which in general is quite expensive golub2012matrix (). It is thus natural to develop algorithms to approximate the memory term to make the CG model (10) easier to implement, and become truly useful in practice.

In order to approximate the memory term, we propose a general approximation method, which will address these issues under the same framework. Rather than targeting the time-domain values of the memory kernel directly, we will work with its Laplace transform. The coefficients in our approximation, which only need to be computed once, can be determined by fitting or interpolation a priori. As an example, we first present an interpolation procedure similar to the standard Hermite interpolation in numerical analysis. This interpolation requires the following terms: etc, which can all be computed from the explicit expression of (11). In particular, we define the moments,

 M0=θ(0),M1=θ′(0),M2=12!θ′′(0),⋯,Mℓ=1ℓ!θ(ℓ)(0),⋯,M∞=∫∞0θ(t)dt. (14)

The first approximation is to replace the memory function by a delta function, i.e.,

 θ(t)≈M∞δ(t), (15)

which leads to the approximation of memory term,

 z≈M∞p(t). (16)

Clearly this results in an added damping to the dynamics. Therefore, we will define . This simple selection ensures that is preserved, mimicking a Green-Kubo type of formula. It predicts the correct long-time behavior of the dynamics. The resulting model is still a Langevin dynamics model, which will be referred to as the zeroth-order approximation. At the same time, we still need to ensure that the second FDT (13) is satisfied here. Therefore we need to add an appropriate noise such that equation (13) holds. More specifically, we have,

Here is a white noise, satisfying the FDT,

 (18)

Since the zeroth order approximation is in the same form as the Langevin dynamics (3), the implementation is straightforward. Many methods are available helfand1979numerical (); leimkuhler2011comparing (); skeel1999integration (); van1982algorithms (); wang2003analysis (). One only needs to change a few parameters in the numerical scheme and work with a much smaller number of variables.

In light of the second FDT (13), we observe that is analogous to the correlation time, and therefore represents important time scales. Known as the Markovian approximation, the approximation by has been used in many other works kauzlaric2011bottom (); hijon2006markovian (); hijon2010mori (), and as observed in many numerical tests, the approximation is only satisfactory where there is significant time scale separation. But in general it is inadequate. Next we will present higher order approximations.

Our general approximation scheme is based on the Laplace transform of , defined as,

 Θ(λ)=∫+∞0θ(t)e−t/λdt. (19)

Notice that we have chosen to work with the variable (which has unit of time), instead of the usual choice (). As an example, we approximate the Laplace transform of by a rational function,

 Θ(λ)≈R1,1(λ),R1,1(λ)% def=[I−λB]−1Cλ, (20)

and the matrices and are to be determined. More specifically, we enforce the following two conditions:

 Θ′(0)= R′1,1(0), (21) Θ(+∞)= R1,1(+∞).

Direct calculations yield,

 Θ′(0)=M0,Θ(+∞)=M∞. (22)

After solving the equations (21), we find that,

 C= M0, (23) B= −M0M−1∞.

With this rational approximation, the memory term satisfies an additional equation,

 ˙z=Bz+Cp+ζ, (24)

and is an added white noise, which will facilitate the approximation of the colored noise in the GLE (10). This is motivated by the fact that the Gaussian noise in the GLE is correlated in time. We construct a colored Gaussian noise through a first order stochastic differential equation. The resulting stochastic force is an Ornstein-Uhlenbeck process. Such an approximation scheme is known as Markovian embedding bao2005non (). It effectively eliminates the need to sample the colored noise directly.

This amounts to an approximate model, which will be referred to as the first-order approximation, given by,

 (25)

It is not yet clear how the memory kernel and the random noise are approximated in the time domain, and more importantly, whether they still satisfy the FDT (13). To demonstrate that this procedure indeed leads to a consistent approximation of the memory term and the noise , we formulated the following theorem. In particular, we provide a simple formula for the covariance of the additive noise

###### Theorem 1.

Assume that is a Gaussian random variable with mean zero and covariance given by (23). Further, assume that the noise has covariance given by,

 Σ=−2kBTCB. (26)

Then the first-order model (25) is equivalent to an approximation of the GLE, in which the memory function is approximated by,

 θ(t)≈eBtC, (27)

and the second FDT (13) is exactly preserved.

###### Proof.

This demonstrates how the memory function and random noises in the GLEs are consistently approximated in this approach. To show the equation (26), we let the covariance of be , and the covariance of be , which is to be determined. It is clear that we can write the solution of the last equation of system (25) as follows using the variation of constant formula,

 z(t)=eBtz(0)+∫t0eB(t−τ)ζ(τ)dτ+∫t0eB(t−τ)Cp(τ)dτ. (28)

Therefore, the memory term is approximated by the third term with the kernel function approximated by (27).

Meanwhile, the first two terms make a stationary Gaussian process, denoted by if the following Lyapunov equation risken1984fokker () holds,

 kBT[CBT+BC]=−Σ. (29)

Observe that and are determine from (23). In particular, we have Thus a simple substitution leads to (26).

Finally, a substitution of into the second equation in (25) shows that the random process will become an approximation of With direct calculations, we find that,

 ⟨g(t)g(t′)T⟩=kBTe(t−t′)BC,

for any In light of (27), we find that the approximate kernel function and the approximate random noise still satisfy the second FDT (13).

This approach can be easily extended to higher order. For example, we can choose a rational function as follows,

 Θ(λ)≈R2,2(λ),R2,2(λ)% def=[I−λB0−λ2B1]−1[λC0+λ2C1]. (30)

The parameters and will be determined from an interpolation procedure. We will adopt the conventional Padé approximation, and expand both and around Also known as moment matching, the Padé approximation will ensure that the first few coefficients match with those in the rational function. This leads to the following matching conditions, referred to as moment equations,

 Θ′(0)= R′2,2(0), (31) Θ′′(0)= R′′2,2(0), Θ′′′(0)= R′′′2,2(0), Θ(+∞)= R2,2(+∞).

This last condition, which is not from the standard Padé approximation, is enforced here to incorporate the limit as

With direct calculations, we have the equations for the coefficients,

 C0= M0, (32) C1+B0C0= M1, B0M1+B1C0= M2, C1= −B1M∞.

Here, the moments s have been defined in (14).

By substituting the first and last equations into the second and third equations, we can simplify the equations into a 2-by-2 block system,

 −B1M∞+B0M0= M1, B0M1+B1M0= M2,

from which the coefficients and can be determined. Then and are immediately available from the first and last equations in (32).

As in the first order approximation, we can also eliminate the memory by introducing auxiliary variables that satisfy additional equations. To see this, we start with the memory term and with the second order rational approximation, we have,

 s2Z−sB0Z−B1Z=sC0P+C1P,

where and are respectively the Laplace transform of and In order to convert this equation to the time domain, we need the initial values for . In particular, we have and by direct differentiations, we have

Next using the fact that the Laplace transform of is given by and the Laplace transform of is given by we can convert this equation to the time domain,

 ¨z−B0˙z−B1z=C0˙p+C1p,

provided that which is exactly the first matching condition in (32).

We can write this second order equation into a first order form, by introducing another variable : . They satisfy the following differential equations,

 (33)

Again, we have added a white noise and to each additional equation, which will lead to an approximation of the colored noise in the exact CG model (10).

We would like to point out that approximating the memory kernel using exponential functions has been used in baczewski2013numerical (), where the memory function is approximated by a sum of exponential functions for the case when the dimension of is 1. Known as Prony sum, such a method is very useful in approximating convolutional integrals ou2014reconstruction (); jiang2004fast (); arnold2003discrete (). On the other hand, our ansatz is more general, and it is suitable for matrix-valued kernel functions.

The corresponding approximation will be referred to as the second-order approximation. In the Appendix D, we have shown how to choose the initial conditions for and , along with the covariance for and , so that the approximation of the memory and random noise terms are consistent, in the same spirit as Theorem 1. The result can be summarized as the second theorem,

###### Theorem 2.

Assume that and are Gaussian random variables with mean zero and appropriate covariance. Then the second order model (33) is equivalent to an approximation of the GLE (10), in which the approximations of the memory kernel and the random noise are consistent in the sense that the second FDT (13) is exactly preserved.

The proof of this theorem is provided in the Appendix D.

From the first and second order approximations, one can already see the advantages of the rational approximation in terms of the Laplace transform. On one hand, the memory kernel in the original GLE does not need to be computed at every step. The memory effect, however, is not neglected. Rather, it is incorporated via an extended system. Clearly, solving a few additional linear differential equations is much more efficient than computing an integral at every time step. For example, a direct solution method would involve computing the memory term at every step. At the th step, this would require matrix-vector multiplications to collect terms from all previous time steps. If the total number of time steps is , then the number of such operations would be about . In contrast, the implementation of the model (33) would only require about such operations in total. Of course, in a direct method, computing the memory function at each step also adds to the computational cost. On the other hand, the random noise is approximated by a colored noise, generated from the same extended system by just adding a white noise to each additional equation. This way, we avoid the problem of sampling the correlated noise , which in practice, can be highly nontrivial.

Finally, we present the third-order approximation, i.e.,

 Θ(λ)≈R3,3(λ),R3,3(λ)% def=[I−λB0−λ2B1−λ3B2]−1[λC0+λ2C1+λ3C2].

Similarly to the second order approximation, we only need to match the limiting values as and . More specifically, we write the rational function as,

 R3,3∼λM0+λ2M1+λ3M2+λ4M3+⋯λ5M4+⋯, (34)

and we enforce the first five moments to match those of the exact kernel function. As a result, one can proceed as follows,

 λC0+λ2C1+λ3C2∼[I−λB0−λ2B1−λ3B2][λM0+λ2M1+⋯λ5M4+⋯]. (35)

Matching the first five moments, one arrives at,

 C0=M0,C1+B0C0=M1,B1M0+C2=M2,B0M2+B1M1+B2M0=M3,B0M3+B1M2+B2M1=M4,C2=−B2M∞. (36)

Again the last equation comes from matching the moment

By directly substituting the first and last equations into the third equation, one obtains a complete set of linear equations for , and (equations 3-5 in (36)). Then, the remaining coefficients can be determined directly from the remaining three equations. This procedure for solving the coefficients s seems to be general.

We can continue to approximations of arbitrary order. The matching procedure involves the values of which are provided here,

 M∞=[A12,Γ12]G−1[A−12200−I][A21Γ21]. (37)

and,

 Mℓ=1ℓ!Θℓ(0)=(−1)ℓ[A12,Γ12]Gℓ[A−12200−I][A21Γ21], (38)

for all .

## Iv Numerical Results

To test the effectiveness of the approximate models, several numerical tests have been conducted. As alluded to in the introduction, we linearized the dynamics with the matrix determined from the PCA analysis. As a specific example, we consider the protein Chignolin (PDB id 1uao, see Figure 1), which is a peptide with 10 residues, amino acids bonded together by peptide bonds. Simulations have been run in TINKER ponder2004tinker () at temperature for .4 ns with time step 1fs. The system is set up in solvation, modeled by the generalized Born (GB) model. We then use the data upon equilibrium, and compute the matrix . Two separate runs have been conducted with constant damping coefficients (a) and (b) They model respectively a high friction and a low friction case. The CG variables are defined using the rotational and translational blocks (RTB), which is a useful way to capture the low vibrational modes licu02 (); TaGaMaSa00 (). More specifically, each residue is regarded as a rigid body and represented by six degrees of freedom, including three translational and three rotational modes. For our model system, the full model has dimension (three physical dimension for each particle). The CG variable has dimension , with 6 dimensions for each residual. We comment that the RTB blocks have also been used to derive CG models, e.g., in essiz2006rigid (). But in essiz2006rigid () the memory effect has been ignored.

We choose the velocity auto-correlation as a target dynamic quantity, to test the accuracy of our approximate models. Due to the linearity, the velocity auto-correlation function can be expressed explicitly using matrix exponential. The derivation is given in Appendix E. The correlation function from the full dynamics is regarded as the exact result. For the approximate models, we have also derived the formulas for the auto-correlations expressed in terms of matrix exponential again, as shown in appendix E. All the matrix exponentials are computed in MATLAB using its built-in function expm.

First, we compare the approximate memory functions from the first, second and third order approximations to the exact memory kernel given by (11). Since is a matrix-valued function, we pick out the sixth diagonal entry of the matrix and evaluate it for the time period This corresponds to the last rotational component of the first residue. As shown in Figure 2, our hierarchy of approximations offer increasing accuracy in the approximation of the kernel function in both cases (high friction case and low friction case ). In the high friction case, we can observe improvement as the approximation order gets higher, and the third order approximation is the most satisfactory. In the low friction case, the kernel function is quite oscillatory. In this case, the first order approximation is not acceptable at all. The second and third order approximations show very good agreement, but only up to , and the fourth order model predicts the kernel well in a larger interval, up to The fourth order approximation is included here to show that the approximations still have improving accuracy. This can be attributed to the fact that the moments are related to the derivatives of at and as more moments are incorporated, the accuracy of the approximation can be guaranteed for a longer period of time. The zeroth-order approximation is not shown here since it is a delta function.

Next, in Figure 3, we show a comparison among the velocity auto-correlations for the case , which is the default value in the molecular simulation package TINKER. Interested readers are referred to Appendix E for the details on the computation of the auto-correlation. In this case, all the time correlation functions exhibit exponential decay, indicating that the dynamics is over damped. The correlation is already close to zero around time . In this case, the zeroth-order method gives poor result. But the results from the other three methods are in excellent agreement with the exact result. The second and third order methods have slightly better accuracy.

Following the previous experiment, we repeat the computation with damping coefficient , and the results are shown in Figure 4. In this case, the time correlation functions start to shown oscillatory patterns, indicating that the memory effect is much stronger. In light of the slow decay, we present results for a longer time period compared to the over-damped case. Again, we see that the zeroth order approximation gives poor results, while the first-order method give is slightly better. Meanwhile, the second and third order methods provide significant improvement around . The inset figure shows a close-up view of the resulting correlation functions near

## V Conclusion

This paper presented a derivation of a coarse-grained model from the full Langevin dynamics. The derivation has been focused on the resulting random noise, memory effect, and the fluctuation-dissipation theorem, which is a necessary condition for the coarse-grained model to have the correct equilibrium statistics. Our main finding is a generalization of the generalized Langevin dynamics, together with a combined form of the first and second fluctuation-dissipation theorem.

In the second part of the paper, a systematic approach to approximate the memory term was illustrated. The novel aspect is a rational approximation in the Laplace domain, which in the time domain, corresponds to an extended system with no memory. This significantly reduces the computational cost. Furthermore, it has been shown that the random noise term in the generalized Langevin equation can be approximated indirectly by introducing white noises in the extended system. More importantly, the fluctuation-dissipation theorem still holds at each level of approximations. This is a property that has not been emphasized in other approximation methods, e.g., tuckerman1991stochastic (); lange2006collective (); IzVo06 (); berkowitz1981memory (); fricks2009time (); li2014construction (); Darve_PNAS_2009 ().

The current approach can be extended/improved in several directions. First, a Hermite type of interpolation has been used in the approximation of the Laplace transform of memory function, and the interpolation is done at and It is clear that one can introduce other data points or interpolation methods to enhance the accuracy of the approximation. As a demonstration, we did a simple test simulation (results shown in Figure 5) using the same interpolation points at and but with different order of derivatives involved. In short, for the second order scheme , we determine the four coefficients in the rational function as follows: We matched first and second derivatives at , and zeroth and first derivatives at (or ). For the third order scheme, for the two additional coefficients, we matched the third derivative at and second derivative at . The results are overall more satisfactory than our previous choices, indicating that there is a lot of flexibility in choosing the matching conditions. This approach would be more useful for the cases where the memory effect is much stronger.

Secondly, we have only tested the methods for the case when the mean force is linear, e.g., an elastic network type of model. In this case, explicit forms for all the solutions are available, so that direct comparison can be made without the influence of the numerical error. It would be of great practical importance to test problems with a more realistic potential of mean forces, e.g., the ones obtained from the coarse-grained force field noid_multiscale_2008 (); monticelli2008martini ().

In this paper, we have based our approximation on the moments (14), which can be extracted from the spectra of the molecular structure (the matrix ) and the damping coefficient (). We would like to mention a data-driven approach, which makes use of the time series of the coarse-grain variables, and formulate the problem as an inverse problem. For instance, the Kalman filter technique has been used in fricks2009time (); harlim2015parametric () to estimate the parameters and in the first order model (25), and in lei2016generalized (), the moments are directly linked to the correlations of the CG variables, which in turn determine the coefficients s and s. In all these works, the rational approximation in terms of the Laplace transform has been crucial. Which approach is more appropriate depends on the information available to the practitioners.

Another interesting scenario is when the GLE is used to model subdiffusive behavior. One well-known example is where the kernel function obeys a power law kou2008stochastic (). In this case, we anticipate the current methodology to be useful up to certain time scale. When the long-time sub-diffusive behavior is of interest, the method certainly has to be modified. For example, when the kernel takes the form of , the Laplace transform will exhibits a singularity at the origin. Meanwhile, the current rational approximating function approaches to a finite value, and therefore the form of the rational function has to be modified accordingly in order to take into account the singularity. This would be an interesting line of work for us to pursue further.

Finally, it is possible for the kernel function to depend on the current state of the coarse-grain variables, which means that they have to be continuously updated. These issues are important for the application to protein simulations that involve conformational changes, and they will be considered in separate works.

The derivations presented in this paper, along with the calculations of the velocity correlation functions, involve some important, but lengthy mathematical manipulations. We included the details in the Appendix for interested readers.

###### Acknowledgements.
This research was supported by NSF under grant DMS-1412005, DMS-1216938 and DMS-1619661.

## Appendix A Time correlation for linear Langevin models

For linear Langevin dynamics, the velocity auto-correlation can be computed explicitly. This section illustrates the calculations.

Suppose that we have a linear Langevin dynamics model,

 ¨u=−Au−Γ˙u+W. (A.1)

We may write it into a first order system as follows,

 ˙w=Dw+Σμ(t), (A.2)

in which,

 w=(up),μ=(0W),D=[0I−A−ΓI],Σ=[0002kBTΓI]. (A.3)

For the linear Langevin dynamics, the equilibrium probability density is given by,

 ρ∼e−βH,H=12uTAu+12p2, (A.4)

with and

Therefore, the covariance of the solution is given by,

 Q=kBT[A−100I]. (A.5)

Notice that . This is known as the Lyapunov equation. In particular, when is Gaussian with covariance , is a stationary Gaussian process with time correlation given by,

 ⟨w(t)w(0)T⟩=kBTetDQ. (A.6)

This formula will be used in many of our calculations.

Applying this formula to the full model, we find the time correlation of the coarse-grained momentum ,

 (A.7)

## Appendix B The derivation of the GLE

### b.1 Derivation of the memory kernel

We start with the last two equations in (9). To begin with, we recall the matrix , defined in (12). Notice that the matrix can be factorized as follows,

 G=[0II−Γ22][A2200−I]. (B.1)

or,

 G=[0−IIΓ22][A2200I]. (B.2)

It is also useful to have the inverse of , given by,

 G−1=[A−122Γ22A−122−I0]. (B.3)

Now the last two equations in (9) can be expressed explicitly as,

 [ξ(t)η(t)]=e−Gt[ξ(0)η(0)]−∫t0e−G(t−s)[0A21q(s)+Γ21p(s)]ds+∫t0e−G(t−s)[0f2(s)]ds.

We take part of the memory term, and integrate by parts:

 = [A−122A21q(t)0]−e−Gt[A−122A21q(0)0]−∫t0e−G(t−s)[A−122A21p(s)0]ds

Combining this with the remaining term in the memory integral, we have,

 −∫t0e−G(t−s)[0A21q(s)+Γ21p(s)]ds= ∫t0e−G(t−s)[A−122A21−Γ21]p(s)ds (B.4) + e−Gt[A−122A210]q(0)−[A−122A210]q(t).

In the next step, we will substitute into the first two equations in (9), to eliminate the additional degrees of freedom and derive an effective equation for and .

For clarity, we introduce more notations,

 Feff(q)=ΦTF(Φq)−A12A−122A21q,ˆξ=ξ+A−122A21q, (B.5)

and,

 (B.6)

Collecting terms, we find that,

 ⎧⎪⎨⎪⎩˙q=p,˙p=Feff(q)−Γ11p−∫t0θ(t−s)p(s)ds+ˆf. (B.7)

This is a generalized Langevin equation with an additional damping, in the form of a memory term. The new random force is given by,

 ˆf=f1(t)−[A12,Γ12]∫t0e−G(t−s)[0f2(s)]ds−[A12,Γ12]e−Gt[ˆξ(0)η(0)]. (B.8)

### b.2 The fluctuation-dissipation theorem

Here we look at the random noise term and see how it is related to the damping coefficients. Let the three terms in (B.8) be , and , respectively. One can see directly that,

 ⟨f1(t)f1(t′)T⟩=2kBTΓ11δ(t−t′). (B.9)

For , we have,

 ⟨f3(t)f3(t′)T⟩ (B.10) =kBT[A12,Γ12]e−Gt[A−12200I]e−GTt′[A21Γ21]

We now consider Assume that we have,

 =kBT[A12,Γ12]∫t′0e−G(t−s′)[0002Γ22]e−GT(t′−s′)ds′[A21Γ21]

We notice that

 G[A−12200I]+[A−12200I]GT=[0002Γ22].

Therefore, this integral can be simplified to,

 (B.11) −kBT[A12,Γ12]e−Gt[A−12200I]e−GTt′[A21Γ21]

The second term will be cancelled by . But the first term is slightly different from the memory function In particular, the matrix in the middle has an entry instead of

To complete the derivation, we have to compute the cross terms and . It is straightforward to show that For the other term, we have,

 ⟨ˆf2(t)f1(t′)T⟩= (B.12) =

This term can be combined with the first term in (B.11), and it gives .

This proves the fluctuation-dissipation theorem:

 ⟨ˆf(t)ˆf(t′)T⟩=2kBTΓ1δ(t−t′)+kBTθ(t−t′). (B.13)

A natural extension of the current framework is to Langevin dynamics models, in which the damping coefficient is depends on the position of the particles. For instance, in the dissipative particle dynamics (DPD) models espanol1995statistical (); hoogerbrugge1992simulating (), they are expressed as functions of the inter-particle distances. An immediate observation is that depends on the current time, the stochastic model will have variable coefficients. In this case, we define the matrix as in (B.1), but write it as to show the time-dependence. To facilitate the derivation, we introduce the fundamental matrix, defined by the ODEs

 ∂∂tY(t,s)=−G(t)Y(t,s),Y(s,s)=I. (B.14)

It also satisfies the equation,

With the fundamental matrix, we can write the solution of the last two equations in (9) as follows,

 [ξ(t)η(t)]= Y(t,0)[ξ(0)η(0)]+∫t0Y(t,s)[0σw(s)]ds (B.15) +∫t0Y(t,s)[0−A21q(s)−Γ21(s)p(s)]ds.

Here, to demonstrate the ideas more easily, we have omitted the pair-wise form of the damping coefficients in DPD and simply wrote it in a matrix form.

The remaining steps are the same as the derivation in the previous section. In particular, the memory term becomes,

 −∫θ(t,t′)p(t′)dt′,withθ(t,t′)=[A21Γ21(t)]Y(t,t′)[A−122A12−Γ21(t′)]. (B.16)

The random noise is still a Gaussian process, having time correlation,

 ⟨ˆf(t)ˆf(t′)T⟩=2kBTΓ11(t)δ(t−t′)+kBTθ(t,t′). (B.17)

The main observation here is that the noise is no long a stationary process, since the correlation can not be written as a function of and the memory kernel is no longer a convolution.

## Appendix C Properties of the memory kernel

We can show that this matrix is symmetric.

 θ(t)=[A12,Γ12]∑ntnn!Gn[A−12200−I][A21Γ21]. = ∑ntnn![A12,Γ12][0II−Γ2][A2200−I]⋯[0II−Γ2][A2200−I][A−12200−I][A21Γ21].
 θT(t)=∑ntnn![A12,Γ12][0II−Γ2][A2200−I]⋯[0II−Γ2][A21Γ21]=θ(t)

 θ(0)=A12A−122A21−Γ12M−12Γ21. (C.1)

Finally,

 ∫∞0θ(t)dt=[A12,Γ12]G−1[A−12200−I][A21Γ21]. (C.2)

## Appendix D The proof of Theorem 2

Using the form of the rational function (30) and the properties of Laplace transform, we can write down a differential equation for the approximate memory kernel,

 θ′′=B0θ′+B1θ, (D.1)

together with the initial conditions,

 θ(0)=M0,θ′(0)=M1 (D.2)

which are drawn from the interpolation conditions (32).

By defining we can write this in a first order form,

 θ′1=B1θ, (D.3) θ′=B0θ+θ1, θ(0)=C0,θ1(0)=M1−B0C0.

From the second matching conditions (32), we find that

As a result, the approximate memory kernel can be written in a matrix exponential form,

 θ(t)=[0I]etˆB[C0C1],ˆB=[0B1IB0]. (D.4)

We will derive the initial covariance for the second order approximation (33). Consider the linear system as in Appendix (A.2) for . In particular, we have,

 D=⎡⎢⎣−Γ110−IC10B1C0IB0⎤⎥⎦.

Let us choose the initial condition for as Gaussian with mean zero and covariance,

 Q=⎡⎢⎣I000Q