Information-theoretic tools for parametrized coarse-graining of non-equilibrium extended systems

# Information-theoretic tools for parametrized coarse-graining of non-equilibrium extended systems

Markos A. Katsoulakis Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003    Petr Plecháč Department of Mathematical Sciences, University of Delaware, Newark, DE 19716
July 31, 2019
###### Abstract

In this paper we focus on the development of new methods suitable for efficient and reliable coarse-graining of non-equilibrium molecular systems. In this context, we propose error estimation and controlled-fidelity model reduction methods based on Path-Space Information Theory, combined with statistical parametric estimation of rates for non-equilibrium stationary processes. The approach we propose extends the applicability of existing information-based methods for deriving parametrized coarse-grained models to Non-Equilibrium systems with Stationary States (NESS). In the context of coarse-graining it allows for constructing optimal parametrized Markovian coarse-grained dynamics within a parametric family, by minimizing information loss (due to coarse-graining) on the path space. Furthermore, we propose an asymptotically equivalent method–related to maximum likelihood estimators for stochastic processes–where the coarse-graining is obtained by optimizing the information content in path space of the coarse variables, with respect to the projected computational data from a fine-scale simulation. Finally, the associated path-space Fisher Information Matrix can provide confidence intervals for the corresponding parameter estimators. We demonstrate the proposed coarse-graining method in (a) non-equilibrium systems with diffusing interacting particles, driven by out-of-equilibrium boundary conditions, as well as (b) multi-scale diffusions and their well-studied corresponding stochastic averaging limits, comparing them to our proposed methodologies.

coarse-grained dynamics, non-equlibrium stationary states, driven diffusion, relative entropy rate, Fisher information matrix, parametrization, kinetic Monte Carlo, Markov processes, driven diffusion of interacting particles, stochastic averaging of two-scale diffusions
thanks: Submitted to the Journal of Chemical Physics.

## I Introduction

Non-equilibrium systems at transient or steady state regimes are typical in applied science and engineering, and are the result of coupling between different physicochemical mechanisms, driven by external couplings or boundary conditions. Typical examples include reaction-diffusion systems in heteroepitaxial catalytic materials, polymeric flows and separation processes in microporous materials, Stamatakis and Vlachos (2012); Padding and Briels (2011); Smit and Maesen (2008). In this paper we develop reliable model-reduction methods, i.e., having controlled fidelity of approximation, and capable to handle extended, non-equilibrium statistical mechanics models. These coarse-graining methods allow for constructing optimal parametrized Markovian coarse-grained dynamics within a parametric family, by minimizing information loss (due to coarse-graining) on the path space. Model-reduction (or coarse-graining) approaches can be often described in the context of parameter estimation in parametrized statistical models. However, atomistic models of materials lead to high-dimensional probability distributions and/or stochastic processes to which the standard methods of statistical inference and model discrimination are not directly applicable. The emphasis on information theory tools is also partly justified since often we are interested in probability density functions (PDF), typically non-Gaussian, due to the significance of tail events in complex systems. A primary focus of this paper is on systems with Non-Equilibrium Steady States (NESS), i.e., systems in which a steady state is reached but the detailed balance condition is violated and explicit formulas for the stationary distribution, e.g., in the form of a Gibbs distribution, are not available.

Information-theoretic methods for the analysis of stochastic models typically employ entropy-based tools for analyzing and estimating a distance between (probability) measures. In particular, the relative entropy (Kullback-Leibler divergence) of two probability measures and

 R(μ|ν)=∫μ(x)logμ(x)ν(x)dx

allows us to define a pseudo-distance between two measures. A key property of the relative entropy is that with equality if and only if , which allows us to view relative entropy as a “distance” (more precisely a semi-metric) between two probability measures and . Moreover, from an information theory perspective Cover and Thomas (1991), the relative entropy measures loss/change of information. Relative entropy for high-dimensional systems was used as measure of loss of information in coarse-graining Katsoulakis and Trashorras (2006); Katsoulakis et al. (2007); Arnst and Ghanem (2008), and sensitivity analysis for climate modeling problems Majda and Gershgorin (2010).

Using entropy-based analytical tools has proved essential for deriving rigorous results for passage from interacting particle models to mean-field descriptions, Kipnis and Landim (1999). The application of relative entropy methods to the error analysis of coarse-graining of stochastic particle systems have been introduced and studied in Katsoulakis et al. (2007, 2008, 2013); Katsoulakis, Majda, and Vlachos (2003); Katsoulakis and Vlachos (2003). Aside of this rigorous numerical analysis direction, entropy-based computational techniques were also developed and used for constructing approximations of coarse-grained (effective) potentials for models of large biomolecules and polymeric systems (fluids, melts). Optimal parametrization of effective potentials based on minimizing the relative entropy between equilibrium Gibbs states, e.g.,Chaimovich and Shell (2011); Carmichael and Shell (2012); Bilionis and Zabaras (2012), extended previously developed inverse Monte Carlo methods, primarily based on force matching approaches, used in coarse-graining of macromolecules (see, e.g., Tschöp et al. (1998); Müller-Plathe (2002)). In Espanol and Zuniga (2011) an extension to dynamics is proposed in the context of Fokker-Planck equations, by considering the corresponding relative entropy for discrete-time approximations of the transition probabilities. Furthermore, relative entropy was used as means to improve model fidelity in a parametric, multi-model approximation framework of complex dynamical systems, at least when the model’s steady-state distributions are explicitly known, Majda and Gershgorin (2011). Overall, such parametrization techniques are focusing on systems with a known steady state, such as a Gibbs equilibrium distribution. More specifically, computational implementations of optimal parametrization in the inverse Monte Carlo methods is relatively straightforward for equilibrium systems in which the best-fit procedure is applied to an explicitly known equilibrium distribution and where relative entropy is explicitly computable.

On the other hand, this is not the case in non-equilibrium systems, even at a steady state where typically we do not have a Gibbs structure and the steady-state distribution is unknown altogether, setting up one of the primary challenges for this paper. Indeed, here we show that, in non-equilibrium systems the general information theory ideas based on Kullback-Leibler divergence are still applicable but they have to be properly formulated in the context of Non-equilibrium Statistical Mechanics by focusing on the probability distribution of the entire time series, i.e., on the path space of the underlying stochastic processes. We show that, surprisingly, such a path-space relative entropy formulation is: (a) general in the sense that it applies to any Markovian models (e.g. Langevin dynamics, Kinetic Monte Carlo, etc), and (b) is easily computable as an ergodic average in terms of the Relative Entropy Rate  (RER), therefore allowing us to construct optimal parametrized Markovian coarse-grained dynamics for large classes of models. This procedure involves the minimization of information loss in path space, where information is inadvertently lost due to coarse-graining procedure. In fact, the proposed parametrization scheme in Espanol and Zuniga (2011) is mathematically justified by reformulating it on the path space using the Relative Entropy Rate and it is a specific, but reversible (i.e., it has a Gibbs steady state) example of our methodology. Furthermore, we propose an asymptotically equivalent method to RER minimization, which is related to to maximum likelihood estimators for stochastic processes, and where the optimal coarse-graining is obtained by optimizing the information content in path space of the coarse variables, with respect to the projected computational data from a fine-scale simulation. Finally, the path-space Fisher Information Matrix (FIM) derived from Relative Entropy Rate (RER) can provide confidence intervals for the corresponding statistical estimators of the optimal parameter obtained through the minimization problem.

The Relative Entropy Rate (RER) was earlier proposed as a (pseudo) metric in order to evaluate the convergence of adaptive sampling schemes for Markov State Models (MSM)Bowman, Ensign, and Pande (2010); the proposed coarse-graining perspective, which is related to Maximum Likelihood Estimators (MLE), e.g., (31), could provide new insights in comparing MSMs since it only requires fine-scale data–and not explicitly a fine-scale MSM for estimating the RER. Furthermore, RER and in particular the path-space FIM where introduced recently as gradient-free sensitivity analysis tools for complex, non-equilibrium stochastic systems with a wide range of applicability to reaction networks, lattice Kinetic Monte Carlo and Langevin dynamics Pantazis and Katsoulakis (2013). Unlike the coarse-graining setting we propose here and where models are set in different state spaces depending on their granularity, in the latter work the same state space is assumed between compared models.

The paper is structured as follows. In Section II we formulate the path-space information theory tools used in the paper, including the key concept of Relative Entropy Rate. In Section III we present the parameterization method of coarse-grained dynamics and the connections with maximum likelihood estimators and the Fisher Information Matrix. In Section IV we briefly discuss statistical estimators for RER and FIM. Finally, in Section V we demonstrate the proposed coarse-graining method in (a) non-equilibrium systems with diffusing interacting particles, driven by out-of-equilibrium boundary conditions, as well as (b) multi-scale diffusions and their well-studied corresponding stochastic averaging limits, comparing them to our proposed methodologies.

## Ii Relative Entropy Rate, path space information theory and error quantification for non-equilibrium systems

First, we formulate a general entropy-based error analysis for coarse-graining, dimensional reduction and parametrization of high-dimensional Markov processes, simulated by Kinetic Monte Carlo (KMC) and Langevin Dynamics. Typically such systems have Non-Equilibrium Steady States (NESS) for which detailed balance fails as they are irreversible. The stationary distributions are not known explicitly and have to be studied computationally.

### ii.1 Coarse-grained models

We consider a parameterized class of coarse-grained Markov processes , associated with the fine scale stochastic process . The coarse-graining procedure is based on projecting the microscopic space into a coarse space with less degrees of freedom. We denote the coarse space variables

 η=Tσ,whereT:Σ→¯Σ (1)

is a coarse-graining (projection) operator, see also (37) for a specific example. In general, given a fine-scale probability measure defined on the fine-grained configuration space , then we can define the exact (renormalized) coarse-grained probability measure on the coarse space :

 ¯μM(dη)=∫{σ:Tσ=η}μN(dσ). (2)

However, the exact CG probability cannot be explicitly calculated , except in trivial cases. Typically, we consider classes of parametric CG models which approximate the exact CG model, see for instance the quantification of such approximations using relative entropy Katsoulakis et al. (2007).

In the case of continuous time processes such as Kinetic Monte Carlo, the coarse-grained stochastic process is defined in terms of coarse transition rates which captures macroscopic information from the fine scale rates . For example, for stochastic lattice systems, approximate coarse rate functions are explicitly known from coarse graining (CG) techniques of Katsoulakis, Majda, and Vlachos (2003); Katsoulakis and Vlachos (2003); Dai, Seider, and Sinno (2008), see (36). Similarly, when we consider temporally discretized stochastic processes such as Langevin Dynamics, the coarse-grained process is given in terms of transition probabilities which capture macroscopic information from the fine scale transition probabilities .

Interpolated dynamics. Given coarse-grained dynamics we can always construct corresponding microscopic dynamics. For example, given coarse-grained transition probabilities with corresponding stationary distribution , where the latter is typically unknown in non-equilibrium systems, we define the corresponding fine-scale rates

 q(σ,σ′):=U(σ′|Tσ′)¯p(Tσ,Tσ′), (3)

where ( denotes the cardinality) is the uniform conditional distribution over all fine-scale states corresponding to the same coarse-grained state . Clearly (3) is properly normalized, while a more general formulation can be found in Appendix A, (48). In (3) we apply a piece-wise constant interpolation for all microscopic states (reps. ) corresponding to the same coarse state (reps. ) and thus transitions to these states occur with the same probability rates. The reconstruction step (3) is necessary when we want to compare fine and coarse processes on the path space in terms of the relative entropy rate, since both processes need to be defined on the same probability space, see for example (17) below. In this paper, for the sake of simplicity, we assume that all reconstructions are based on (3). The reconstruction is obviously not unique and we refer to Appendix A for the mathematical details, see also Trashorras and Tsagkarogiannis (2010).

### ii.2 Relative Entropy and Error Quantification in Non-equilibrium Systems

Although all systems we consider here are ergodic, i.e., they have a unique steady state distribution, we typically assume they only have a Non-Equilibrium Steady State (NESS) due to a lack of detailed balanceBonetto, Lebowitz, and Rey-Bellet (2000). Inevitably, in such systems the stationary distribution is not known explicitly and can be studied primarily computationally, at least in systems which are not small perturbations from equilibrium. Quantifying and controlling the coarse-graining error in such high-dimensional stochastic systems can be achieved by developing computable and efficient methods for estimating distances of probability measures on the path space. The relative entropy between two path measures and (see (6) for a specific example) for the processes on the interval is

 R(P[0,T]|Q[0,T])=EP[0,T][logdP[0,T]dQ[0,T]], (4)

where is the Radon-Nikodym derivative of with respect to . If these probability measures have probability densities , respectively, (4) becomes . In the setting of coarse-graining or model-reduction the measure is associated with the exact process and with the approximating (coarse-grained) process.

From an information theory perspective, the relative entropy measures the loss of information as we approximate the exact stochastic process with the coarse-grained one . In general the relative entropy (4) in this dynamic setting is not a computable object; we refer for instance to related formulas in the Shannon-MacMillan-Breiman Theorem, Cover and Thomas (1991). However, as we show next, in practically relevant cases of stationary Markov processes we can work with the relative entropy rate

 H(P|Q)=limT→∞1TR(P[0,T]|Q[0,T]), (5)

where and denote the distributions of the corresponding stationary processes.

Relative Entropy Rate for Markov Chains. In order to explain the basic concept we restrict to the case of two Markov chains, , on the countable state space , defined by the transition probability kernels and . A typical example would be the embedded Markov chain used for KMC simulations of a continuous time Markov chain. Similarly, in the case of a continuous state space, a temporal discretization of a Langevin process, leads to a Markov process with the transition kernel defined by the time-discretization scheme of the underlying stochastic dynamics. We assume that the initial states are from the invariant distributions and . The path measure defining the probability of a path is then

 P(σ0,…,σT)=μ(σ0)p(σ0,σ1)…p(σT−1,σT), (6)

and similarly for the measure . The Radon-Nikodym derivative is easily computed

 dPdQ=μ(σ0)∏T−1i=0p(σi,σi+1)ν(σ0)∏T−1i=0q(σi,σi+1).

Using the fact that the processes are stationary with invariant measures and , we obtain an expression for the relative entropy

 R(P|Q)=TEμ[∑σ′∈Σp(σ,σ′)logp(σ,σ′)q(σ,σ′)]+R(μ|ν), (7)

and thus the relative entropy rate is given explicitly as

 H(P|Q)=∑σ∈Σμ(σ)∑σ′∈Σp(σ,σ′)logp(σ,σ′)q(σ,σ′)]. (8)

We will refer from now on to the quantity (8) as the Relative Entropy Rate (RER), which can be thought as the change in information per unit time. Notice that RER has the correct time scaling since it is actually independent of the interval . Furthermore, it has the following key features that make it a crucial observable for simulating and coarse-graining complex dynamics:

(i)

The RER formula (8) provides a computable observable that can be sampled from the steady state in terms of conventional Kinetic Monte Carlo (KMC), bypassing the need for a histogram or an explicit formula for the high-dimensional probabilities involved in (4).

(ii)

In stationary regimes, when in (7), the term becomes unimportant. This is especially convenient since and are typically not known explicitly in non-reversible systems, for instance in reaction-diffusion or driven-diffusion KMC or non-reversible Langevin dynamics.

In view of these features, we readily see that if we consider a Markov chain as an approximation, e.g., a coarse-graining, of the chain , we can estimate the loss of information at long times by computing as an ergodic average. This observation is the starting point of the proposed methodology and relies on the fact that the observable is computable; efficient statistical estimators for (8) are discussed in Section IV. A similar calculation can be carried out for continuous time Markov Chains, as we see next.

Continuous Time Markov Chains and Kinetic Monte Carlo. In models of catalytic reactions or epitaxial growth the systems are often described by continuous time Markov chains (CTMC) that are simulated by KMC algorithms. For example, the microscopic Markov process describes the evolution of molecules on a substrate lattice. Mathematically the continuous time Markov chain is defined completely by specifying the local transition rates where is a vector of the model parameters. The transition rates determine the updates from any current state (configuration) to a (random) new state . In the context of the spatial models considered here, the transition rates take the form , denoting by a lattice site on a -dimensional lattice and , where is the set of all possible configurations that correspond to an update in a neighborhood of the site . From local transition rates one defines the total rate , which is the intensity of the exponential waiting time for a jump from the state . The transition probabilities for the embedded Markov chain are . In other words once the exponential “clock” signals a jump, the system transitions from the state to a new configuration with the probability . In the context of coarse-graining we are led to finding an optimal parametrization for the rates of a processes that approximates the dynamics given by the microscopic process . A similar calculation as in the case of Markov chains gives the analogue of the formula (8)

 H(P|Q)=Eμ[λ(σ)−~λ(σ;θ)−∑σ′c(σ,σ′)logc(σ,σ′)~c(σ,σ′;θ)], (9)

where is the stationary distribution of the microscopic process and denotes the total transition rate. In Kalligiannaki et al. (2012) we used this quantity in order to quantify error in a two-level coarse-grained kinetic Monte Carlo method. Based on these considerations, we show in Section III that minimizing the error measured by (9) leads to a CTMC coarse-grained dynamics that best approximates long-time behavior of the microscopic process projected to the coarse degrees of freedom.

###### Remark II.1.

We consider the special case where the transition probability function of the Markov chain is sampled directly from the invariant measure, i.e.,

 p(σ,σ′)=μ(σ′),and q(σ,σ′)=ν(σ′), for all σ,σ′∈Σ.

This sampling is equivalent to the fact that the path space samples in (7) are independent and identically distributed from the stationary probability distributions. Then the RER between the path probabilities becomes the usual relative entropy between the stationary distributions:

 H(P|Q)=R(μ|ν). (10)

Estimating RER using (8) is far simpler than directly estimating the relative entropy , since (8) only involves local dynamics rather than the full steady state measure, which typically may not be available. Furthermore even when it is available in the form of a Gibbs state it will require computations that will typically involve a full Hamiltonian, Chaimovich and Shell (2011); Katsoulakis et al. (2008).

###### Remark II.2.

We also note that the RER can be written as a relative entropy, inheriting all its properties Cover and Thomas (1991), e.g., non negativity, convexity, etc. In fact, we can rewrite (8) as

 H(P|Q)=R(μ⊗p|μ⊗q), (11)

where we define the product probability measures as .

Estimation and error of observables. The estimates on relative entropy and RER can provide an upper bound for a large family of observable functions through the Pinsker (or Csiszar-Kullback-Pinsker) inequality. The Pinsker inequality states that the total variation norm between and is bounded in terms of the relative entropy, Cover and Thomas (1991). The Pinsker inequality gives an estimate for a difference of the mean computed with respect to the distribution and

 |EP[0,T][f]−EQ[0,T][f]|≤||f||∞√2R(P[0,T]|Q[0,T]), (12)

where . An important conclusion that is immediately drawn from the above inequality is that if the relative entropy of a distribution with respect to another distribution is small then the error between any bounded observable functions is also accordingly small. Using (7) we readily obtain the estimate

 |EP[0,T][f]− EQ[0,T][f]|≤ (13) ||f||∞√2T√H(P|Q)+1TR(μ|ν),

involving the relative entropy rate (8) or (9). As in virtually all numerical analysis estimates for stochastic dynamical systems, the bound (13) may not be sharp, but it is indicative of the error in the observables when the distribution approximates .

## Iii Parametrization of coarse-grained dynamics and Inverse Dynamic Monte Carlo

### iii.1 Inverse Dynamic Monte Carlo methods.

In many applications the coarse-grained models are defined by effective potentials or effective rates which are sought in a family of parameter-dependent functions, Tschöp et al. (1998); Müller-Plathe (2002); Lyubartsev et al. (2003). The parameters are then fitted by minimizing certain functionals that attempt to capture different aspects of modeling errors, e.g., radial distribution functions in Müller-Plathe (2002). Compared to such Inverse Monte Carlo methods applied to equilibrium systems we cannot work directly with equilibrium distributions since the NESS is not explicitly known. Thus we apply the information-theoretic framework on the path space, i.e., on the approximating measure that depends on the parameters , and which are subsequently fitted using entropy based criteria for the best approximation.

The optimal parametrized coarse-grained transition probabilities are constructed as follows. First, given the parametrized coarse-grained transition probabilities we define the fine-scale projected rates , which can be defined, for instance, by (3) as

 qθ(σ,σ′)=U(σ′|Tσ′)¯pθ(Tσ,Tσ′), (14)

and the corresponding coarse-grained path-distribution is

 Qθ(σ0,…,σT)=¯μ(Tσ0)qθ(σ0,σ1)…qθ(σT−1,σT). (15)

Subsequently the best-fit can be obtained by minimizing the relative entropy rate, i.e., finding a solution

 θ∗=argminθH(P|Qθ), (16)

where now we have that the RER is

 H(P|Qθ)=∑σ∈Σμ(σ)∑σ′∈Σp(σ,σ′)logp(σ,σ′)qθ(σ,σ′)]. (17)

This optimization problem on one hand is similar to more common parametric inference in which the log-likelihood function is maximized, and this perspective will be further clarified in Section III.2. Furthermore, due to the parametric identification of the coarse-grained dynamics, i.e., transition probabilities, or rates in the case of (9), we refer to the proposed methodology as an Inverse Dynamic Monte Carlo method in analogy to the Inverse Monte Carlo methods for equilibrium systems, Tschöp et al. (1998); Müller-Plathe (2002); Lyubartsev et al. (2003).

The optimization algorithm for (16) is based on iterative procedures that locate a solution of the optimality condition

 θ(n+1)=θ(n)−αnG(n+1), (18)

for some and being a suitable approximation of the gradient , more precisely . The crucial ingredient of this algorithm is an efficient and reliable estimator for the sequence of the gradient estimates. Similar to the deterministic case the minimization can be accelerated by combining this step with the Newton-Raphson method and choosing the vector as

 Gn=Hess(H(P|Qθn))−1∇θH(P|Qθn). (19)

While the evaluation of the Hessian presents an additional computational cost, it also offers additional information about the parametrization, sensitivity and identifiability of the approximating model, Pantazis and Katsoulakis (2013). Indeed the first and the second derivatives of the rate function are of the form:

 ∇θ(H(P|Qθ))=−Eμ[∑σ′p(σ,σ)∇θlogqθ(σ,σ′)], (20)

and , where

 FH(Qθ)=−Eμ[∑σ′p(σ,σ)∇2θlogqθ(σ,σ′)]. (21)

The Hessian can be interpreted as a dynamic analogue of the Fisher Information Matrix (FIM) on the path space. A similar quantity, in the context of sensitivity analysis, was recently considered in Pantazis and Katsoulakis (2013), where the authors also developed efficient statistical estimators for the derivatives of RER and . We discuss related estimators in Section IV.

###### Remark III.1.

The proposed approach carries sufficient level of generality in order to be applicable to a wide class of stochastic processes, e.g., Langevin dynamics and KMC, without restriction to the dimension of the system, provided scalable efficient simulators are available to simulate the observables, (8) and (9). The proposed parametrized coarse-graining is applicable to any system for which a parametrized coarse-grained models are available, e.g., in coarse-graining of macromolecules and biomembranes, Padding and Briels (2011); Tschöp et al. (1998); Müller-Plathe (2002). An obvious obstacle is that the path measure is absolutely continuous with respect to , however, it does not significantly restrict the class of relevant applications as we typically deal with KMC or Markov Chain approximations resulting from a discretization of Molecular Dynamics with noise. In the latter case, Markov chains obtained by numerical approximations of stochastic differential equations (SDEs) allow us to compute RER through (8) and can be used for quantification of errors or inverse Monte Carlo fitting for non-equilibrium or irreversible models in Section III.

For example, the overdamped molecular dynamics with positions and the forcefield at the inverse temperature is a diffusion process given by the stochastic differential equations driven by the -dimensional Wiener process

 dXt=a(Xt)dt+√2β−1dWt,X0=x.

We assume that the drift field satisfies standard conditions that guarantee existence of solutions for all and the process is ergodic with the stationary distribution . The stochastic differential equation can be discretized by the Euler scheme with the time-step

 Xn+1=Xn+a(Xn)h+√2β−1Z√h, (22)

where is a random increment from the standard normal distribution. The Euler scheme discretization defines the Markov chain with the transition kernel

 ph(x,x′)dx′∼e−βh|x′−x−ha(x)|2dx′.

The time-continuous case presents technical difficulties that we do not address here, instead we demonstrate application of the proposed method at the level of the approximating Markov chain only. The case where the process is driven by a multiplicative noise can be handled in a similar way using a discrete scheme. For the sake of simplicity we define the coarse-graining operator as an orthogonal projection from the state space to a subspace , and we write , denoting , . The reduced model is then viewed as an approximation of the projected Markov chain

 ΠXn+1=ΠXn+Πa(Xn)h+√2β−1ΠZ√h, (23)

by

 ¯Xn+1=¯Xn+¯a(¯Xn;θ)h+√2β−1¯Z√h, (24)

where the Gaussian increments are . Denoting and the drift increments in (22) and (24) respectively we have the transition kernel of given by . Given the transition kernel of the coarse-grained chain we define, similarly as in (14), the transition kernel of a reconstructed chain on the original state space

 qh(x,x′;θ)=¯ph(Πx,Πx′;θ)ν(x′|Πx′).

As long as the reconstruction measure does not depend on the parameters the particular choice of does not enter the optimality condition for . Hence the relative entropy rate is given by

 H(P|Pθ)=∫∫μ(x)ph(x,x′)logph(x,x′)qh(x,x′;θ)dx′dx. (25)

Note that due to the choice of the orthogonal projection the transition kernel for (22) of the full model becomes

 ph(x,x′)=1¯Ze−βh|Πx′−ΠΔ(x)|2d¯x′×1~Ze−βh|Π⊥x′−Π⊥Δ(x)|2d~x′,

and this factorization into a product simplifies the evaluation of the necessary condition for a minimizer of . Removing the terms that are independent of we have

 ∇θ∫∫βh¯Ze−βh|Πx′−ΠΔ(x)|2|Πx′−ΠΔθ(Πx)|2d¯x′μ(x)dx=∇θ∫|Δθ(Πx)−ΠΔ(x)|2μ(x)dx=0. (26)

In other words the minimization of is equivalent to the minimization

 minθ∈Rn∫|Πa(x)−¯a(Π(x);θ)|2μ(x)dx. (27)

The minimization becomes particularly straightforward when the parametrization of the coarse-grained drift is chosen as an approximation over the set of polynomials , i.e., . In such a case the minimization of the entropy rate functional defines the projection on the subspace in the space , i.e., the least-square fit with respect to the stationary measure . The functional is then convex in and the problem has the unique solution which is the solution of the linear system

 Φθ=a,where Φij=Eμ[ϕiϕj] and ai=Eμ[Πaϕi].

The expected values can then be estimated as ergodic averages on a single trajectory realization of the original process as . The parametrization scheme in Espanol and Zuniga (2011) proposed in the context of the Fokker-Planck equation is an example of the proposed method for reversible stochastic differential equations, i.e., those having a Gibbs steady state. Application of the proposed method to the diffusion process also shows that widely applied “force-matching” method, Izvekov and Voth (2005a, b); Noid et al. (2008), used in computational coarse-graining is the best-fit in the sense of entropy rate minimization.

### iii.2 Path-space likelihood methods and data-based parametrization of coarse-grained dynamics

A different, and asymptotically equivalent perspective on parametrizing coarse-grained dynamics relies on viewing the microscopic simulator as means of producing statistical data in the form of a time-series. Although the proposed method can be applied to systems simulated by Langevin-type dynamics we demonstrate its application in the Kinetic Monte Carlo algorithms in Section V. The primary new element of the presented coarse-graining approach lies in deriving the parametrization by optimizing the information content (in path-space) compared to the available computational data from a fine-scale simulation, taking advantage of computable formulas for relative entropy discussed earlier.

More specifically, we consider a fine-scale data set of configurations obtained for example from a fine-scale KMC algorithm. As is typical in the KMC framework, we assume that the atomistic model can be described by a spatial, continuous-time Markov jump process, Stamatakis and Vlachos (2012). The path-space measure of this KMC process, see for the Markov Chain analogue of the path measure (6), is parametrized as . In this sense we assume that for the particular data set the “true” parameter value is . Identifying amounts, mathematically, to minimizing the pseudo-distance given by the relative entropy, . Furthermore, following (7) it suffices to minimize . On the other hand, using the ergodicity of the fine scale process associated with the data set , we have the estimators

 H(Pθ∗|Qθ)=limN→∞^HN(Pθ∗|Qθ) (28)

where we define the unbiased estimator for RER, see Section IV,

 ^HN(Pθ∗|Qθ):=1NN∑i=1logpθ∗(σi,σi+1)qθ(σi,σi+1), (29)

and is defined in (15). For simplicity in notation we only demonstrate the estimator (28) for the Markov Chain case, where denotes the transition probability. The continuous-time case, which is relevant to the coarse-grained KMC simulations in Section V, is obtained similarly using (9).

Therefore, the minimization of RER becomes

 minθ^HN(Pθ∗|Qθ)= maxθ1NN∑i=1logqθ(σi,σi+1) (30) −1NN∑i=1logpθ∗(σi,σi+1),

which does not require (a) a priori the knowledge of , (b) the microscopic reconstruction defined by in (14) since is independent of . Therefore, we define the coarse-grained path space Likelihood maximization as

 maxθL(θ;{σi}Ni=0):=maxθ1NN∑i=1log¯pθ(Tσi,Tσi+1). (31)

Note that if the transition probabilities in (31) are replaced with a stationary measure and corresponding independent samples , then (31) becomes the classical Maximum Likelihood Principle (MLE). In this sense (31) is a Maximum Likelihood for the coarse-graining of the stationary time series, of the fine-scale process, and thus includes dynamics information. Furthermore, due to the stationarity of the time series, it allows us (if necessary) to obtain the Markovian best-fit from the dynamical simulation and observations on a single, long-time realization of the process.

Fisher Information Matrix and Confidence Intervals. The Fisher Information Matrix (FIM) in (21) is clearly computable as an ergodic average and can provide confidence intervals for the corresponding estimator , based on the asymptotic normality of the MLE estimator . Indeed, under additional mild hypotheses on the samples , this general procedure guarantees convergence in analogy to the central limit theorem, by employing a martingale formulation, Crowder (1976). In a much simpler context, when consecutive pairs in the FIM estimator, e.g., (34), are sampled beyond the decorrelation time of the time series, the usual central limit theorem applies and yields the asymptotic normality result

 ^θN→θ∗a.s.and N1/2(^θN−θ∗)⇀N(0,FH−1(Qθ∗)), (32)

where the variance is determined by the Fisher Information Matrix , or asymptotically by . Thus estimating the FIM using (21) provides rigorous error bars on computed optimal parameter values .

## Iv Statistical estimators for RER and FIM

The Relative Entropy Rate (17), as well as the Fisher Information Matrix (21) are observables of the stochastic process and can be estimated as ergodic averages. Thus, both observables are computationally tractable since they depend only on the local transition quantities. We give explicit formulas for the case of the continuous-time Markov chain.

The first estimator for RER is given by

 ˆH(n)1(P|Qθ)=1Tn−1∑i=0Δτi[∑σ′∈Ec(σi,σ′) (33) ×logc(σi,σ′)cθ(σi,σ′)−(λ(σi)−λθ(σi))],

where is an exponential random variable with parameter while is the total simulation time. The sequence is the embedded Markov chain with transition probabilities at the step and are the rates of the parametrized process, e.g., the coarse-grained rates . Notice that the weight which is the waiting time at the state at each step, is necessary for the correct estimation of the observable, Gillespie (1976). Similarly, the estimator for the FIM is

 ˆF(n)1=1Tn−1∑i=0Δτi∑σ′∈Ecθ(σi,σ′)∇θlogcθ(σi,σ′)∇θlogcθ(σi,σ′)T . (34)

The computation of the local transition rates for all is needed for the simulation of the jump Markov process when Monte Carlo methods such as stochastic simulation algorithm (SSA), Gillespie (1976) is utilized. Thus, the estimators and present only a minor additional computational cost in the simulation.

The second numerical estimator for RER is based on the Girsanov representation of the Radon-Nikodym derivative and it is given by

 ˆH(n)2(P|Qθ)= 1nn−1∑i=0logc(σi,σi+1)cθ(σi,σi+1) (35) −1Tn−1∑i=0Δτi(λ(σi)−λθ(σi)).

Similarly we can construct an FIM estimator. The term in (35) involving logarithms should not be weighted since the counting measure is approximated with this estimator. Unfortunately, the estimator (35) has the same computational cost as (33) due to the need for the computation of the total rate which is the sum of the local transition rates. Furthermore, in terms of the variance, the latter estimator has worse performance due to the discarded sum over the states . For more details we also refer to Pantazis and Katsoulakis (2013).

## V Benchmarks

Example I: coarse-grained driven Arrhenius diffusion of interacting particles. Non-equilibrium systems at transient or steady state regimes are typical in applied science and engineering, and are the result of coupling between different physicochemical mechanisms, driven by external couplings or boundary conditions. For example reaction-diffusion systems in heteroepitaxial catalytic materials, Stamatakis and Vlachos (2012) are typically non-reversible, i.e., their steady state is not a Gibbs distribution. This is due to the fact that different mechanisms between species (reaction, diffusion, adsorption, etc.) do not have a common Hamiltonian and thus a common invariant distribution. Instead, a steady state distribution exists but it is a NESS and is typically not known. Similarly separation processes in microporous materials are typically driven by boundary conditions and/or coupled flows Smit and Maesen (2008), so the steady state is not necessarily a Gibbs state, but it is again an unknown NESS.

We demonstrate the proposed methodology by selecting a simple but illustrative system in the latter class of non-equilibrium problems. We consider an example of a driven, non-equilibrium diffusion process of interacting particles formulated as a lattice gas model with spin variables , corresponding to occupied or empty lattice sites ; here denotes a uniform one dimensional lattices with sites. This is a prototype driven system introduced as a model problem for the influence of microscopic dynamics to macroscopic behavior in driven separations problems in Vlachos and Katsoulakis (2000). This model problem is also intimately related to works on the structure of non-equilibrium steady states (NESS),Eckmann, Pillet, and Rey-Bellet (1999); Rey-Bellet and Thomas (2002), as well as to the general formalism of non-equilibrium statistical mechanics, Bonetto, Lebowitz, and Rey-Bellet (2000); Jakšić, Pillet, and Rey-Bellet (2011).

The evolution of particles is described in the context of the lattice-gas model as an exchange dynamics with the Arrhenius migration rate from the site to the nearest-neighbor sites

 c(x,y,σ)=de−βU(x,σ)σ(x)(1−σ(y)),

which describes the diffusion of a particle at moving to and interacting through a two-body potential and an external field , defining an energy barrier . The continuous-time Markov chain is defined by its rates and updates to new configurations in which the spin variables and exchanged its values. Finally, the system is driven by the concentration gradient given by different concentrations at the boundary sites and . In the long time behavior the distribution converges to a stationary distribution that gives rise to a NESS concentration profile across the computational domainVlachos and Katsoulakis (2000), see also Figure 1.

Next we discuss the parametric family of coarse-grained models we will use. Under the assumption of a local equilibrium a straightforward local averaging yields the coarse-grained rates, Katsoulakis and Vlachos (2003),

 ¯c(k,l,η)=1qη(k)(q−η(l))de−β¯U(k,η), (36)

for the lattice-gas model with local concentrations defined as the number of particles in a coarse cell of (lattice) size . Thus we have coarse cells, where . In fact, according to (1) we define the coarse graining operator

 η(k)=Tσ(k)=∑x∈Ckσ(x), (37)

where . Keeping the two-body interactions as a basis for the coarse-grained approximation the effective potentials between block spins and are obtained by a straightforward spatial averaging

 ¯J(k,l) =1q2∑x∈Ck∑y∈ClJ(x−y), (38) ¯J(k,k) =1q(q−1)∑x∈Ck∑y∈CkJ(x−y).

Assuming that the approximating dynamics is of Arrhenius type we obtain the energy barriers

 ¯U(k,η)=∑l¯J(k,l)η(k)+¯J(0,0)(η(k)−1)−¯h.

The resulting dynamics is a Markovian approximation of the coarse-grained evolution and it is defined as CTMC with the rates .

As a prototype example of the interactions we consider the constant potential for and otherwise. This coarse-grained potential is parametrized by a single parameter corresponding to the strength of the coarse-grained interactions.

First we note that he local mean-field approximation which defines the interaction potential in (38) between two block spins and by averaging contributions from all spin-spin interactions in the cells does not provide a good approximation as demonstrated in Figure 1(a), where the inset depicts the error estimated in terms of the entropy rate . However, the mean-field potential is a good initial datum for (18).

In this benchmark we stay in the family of two-body potentials and chose to fit only a single parameter that defines the total strength of the interaction. Thus the rates are parametrized by the effective potential using the single parameter , keeping the interaction range fixed in each set of simulations. Clearly we could consider much richer families with parametrized potentials, e.g., of Morse or Lennard-Jones type, however, we opt for the simplest parametrizations in order to demonstrate with clarity the proposed methods. The best-fit was obtained by solving the optimization problem (31), hence, minimizing the error defined by , see Figure 2(a) and Figure 2(b). Figure 1(b) depicts concentration profiles for different sizes of the coarse cells. The dashed lines represent results from simulations with mean-field interactions between cells only (i.e., the initial guess in the optimization), while the solid lines represent simulations with the parametrized effective interactions.

Comparison with the profile obtained from the microscopic simulation (the solid black line) clearly indicates that when the coarse-graining size becomes close to the interaction range of the microscopic potential the best-fit in a one-parameter family is not sufficient for obtaining good approximation and a better candidate class of models, in this case coarse-grained (CG) dynamics , needs to be found for improved parametrization. Indeed, in Are et al. (2008); Katsoulakis et al. (2013) we showed that coarse grained, multi-body cluster Hamiltonians provide such a parametrization. More specifically, in Katsoulakis et al. (2013) we demonstrated, through rigorous cluster expansions that (typical in the state-of-the-art) two-body CG approximations break down in lower temperatures and/or for short range particle-particle interactions, and additional multi-body CG terms need to be included in the models in order the CG model to capture accurately phase transitions and other physical important properties. Hence, a specific parametrization cannot consistently address this issue on its own, unless the proper classes of parametric models are first identified. The RER computations such as the one depicted in Figure 2(b) can assess the accuracy of different coarse-grained dynamics within the same parametric family (shown here), as well as determine the comparative coarse-graining accuracy of different parametric families.

Example II: Two-scale diffusion process and averaging principle. In the second example we demonstrate the parametric approximation of the coarse-grained process on a system of two stochastic differential equations with slow and fast time scales

 dXϵt=a(Xϵ,Yϵ)dt+dW1t (39) dYϵt=ϵ−1b(Xϵ,Yϵ)dt+ϵ−1/2dW2t, (40)

where , are independent standard Wiener processes. Under suitable assumptions on , (see E, Liu, and Vanden-Eijnden (2005); Freidlin and Wentzell (2012)) the dynamics with held fixed has a unique invariant measure and as we have the effective dynamics for the process given by

 d¯Xt=¯a(¯Xt)+dWt. (41)

The drift is given by the averaging principle

 ¯a(x)=limϵ→0∫a(x,y)μϵx(dy).

In this example we choose the component as the coarse variable and compute an approximation of the drift for the coarse-grained (projected) process . The averaging principle suggests that as .

For the specific choice

 a(x,y)=−y,andb(x,y)=y−x,

we have that where is the normalizing constant and thus which yields the effective dynamics in the limit . In the computational benchmarks we thus compare stationary processes resulting as in solving

 dXϵt=−Yϵtdt+dW1t,the% two-scale model, (42) dYϵt=ϵ−1(Yϵt−Xϵt)dt+ϵ−1/2dW2t, d¯Xϵt=−K∑k=1θkϕ(¯Xϵt)dt+dWt,CG model, (43) d¯Xt=−¯Xtdt+dWt,asymptotic at ϵ→0. (44)

The proposed method for approximating the coarse-grained dynamics constructs an effective potential for a finite value . The set of interpolating polynomials has been chosen to be in this example. It is expected that as , the approximation approaches the averaged coefficient .

The simulation results depicted in Figure 3 demonstrate that in the case of sufficient time-scale separation, , the coarse-grained model well approximates the invariant distribution as well as the autocorrelation function for stationary dynamics of the component . Furthermore, the drift of the coarse-grained dynamics deviates by a small error (see Figure 5) from the drift of the averaged model. On the other hand, for a larger value , where the averaging principle does not apply, the proposed method yields a coarse-grained model (an “effective dynamics”) which still approximates reasonably well the invariant distribution. However, the dynamics of the stationary process does not approximate properly the stationary behaviour of the slow component of the original process, as demonstrated in Figure 4 by comparison of autocorrelation functions.