Informationtheoretic tools for parametrized coarsegraining of nonequilibrium extended systems
Abstract
In this paper we focus on the development of new methods suitable for efficient and reliable coarsegraining of nonequilibrium molecular systems. In this context, we propose error estimation and controlledfidelity model reduction methods based on PathSpace Information Theory, combined with statistical parametric estimation of rates for nonequilibrium stationary processes. The approach we propose extends the applicability of existing informationbased methods for deriving parametrized coarsegrained models to NonEquilibrium systems with Stationary States (NESS). In the context of coarsegraining it allows for constructing optimal parametrized Markovian coarsegrained dynamics within a parametric family, by minimizing information loss (due to coarsegraining) on the path space. Furthermore, we propose an asymptotically equivalent method–related to maximum likelihood estimators for stochastic processes–where the coarsegraining is obtained by optimizing the information content in path space of the coarse variables, with respect to the projected computational data from a finescale simulation. Finally, the associated pathspace Fisher Information Matrix can provide confidence intervals for the corresponding parameter estimators. We demonstrate the proposed coarsegraining method in (a) nonequilibrium systems with diffusing interacting particles, driven by outofequilibrium boundary conditions, as well as (b) multiscale diffusions and their wellstudied corresponding stochastic averaging limits, comparing them to our proposed methodologies.
I Introduction
Nonequilibrium systems at transient or steady state regimes are typical in applied science and engineering, and are the result of coupling between different physicochemical mechanisms, driven by external couplings or boundary conditions. Typical examples include reactiondiffusion systems in heteroepitaxial catalytic materials, polymeric flows and separation processes in microporous materials, Stamatakis and Vlachos (2012); Padding and Briels (2011); Smit and Maesen (2008). In this paper we develop reliable modelreduction methods, i.e., having controlled fidelity of approximation, and capable to handle extended, nonequilibrium statistical mechanics models. These coarsegraining methods allow for constructing optimal parametrized Markovian coarsegrained dynamics within a parametric family, by minimizing information loss (due to coarsegraining) on the path space. Modelreduction (or coarsegraining) approaches can be often described in the context of parameter estimation in parametrized statistical models. However, atomistic models of materials lead to highdimensional probability distributions and/or stochastic processes to which the standard methods of statistical inference and model discrimination are not directly applicable. The emphasis on information theory tools is also partly justified since often we are interested in probability density functions (PDF), typically nonGaussian, due to the significance of tail events in complex systems. A primary focus of this paper is on systems with NonEquilibrium Steady States (NESS), i.e., systems in which a steady state is reached but the detailed balance condition is violated and explicit formulas for the stationary distribution, e.g., in the form of a Gibbs distribution, are not available.
Informationtheoretic methods for the analysis of stochastic models typically employ entropybased tools for analyzing and estimating a distance between (probability) measures. In particular, the relative entropy (KullbackLeibler divergence) of two probability measures and
allows us to define a pseudodistance between two measures. A key property of the relative entropy is that with equality if and only if , which allows us to view relative entropy as a “distance” (more precisely a semimetric) between two probability measures and . Moreover, from an information theory perspective Cover and Thomas (1991), the relative entropy measures loss/change of information. Relative entropy for highdimensional systems was used as measure of loss of information in coarsegraining Katsoulakis and Trashorras (2006); Katsoulakis et al. (2007); Arnst and Ghanem (2008), and sensitivity analysis for climate modeling problems Majda and Gershgorin (2010).
Using entropybased analytical tools has proved essential for deriving rigorous results for passage from interacting particle models to meanfield descriptions, Kipnis and Landim (1999). The application of relative entropy methods to the error analysis of coarsegraining of stochastic particle systems have been introduced and studied in Katsoulakis et al. (2007, 2008, 2013); Katsoulakis, Majda, and Vlachos (2003); Katsoulakis and Vlachos (2003). Aside of this rigorous numerical analysis direction, entropybased computational techniques were also developed and used for constructing approximations of coarsegrained (effective) potentials for models of large biomolecules and polymeric systems (fluids, melts). Optimal parametrization of effective potentials based on minimizing the relative entropy between equilibrium Gibbs states, e.g.,Chaimovich and Shell (2011); Carmichael and Shell (2012); Bilionis and Zabaras (2012), extended previously developed inverse Monte Carlo methods, primarily based on force matching approaches, used in coarsegraining of macromolecules (see, e.g., Tschöp et al. (1998); MüllerPlathe (2002)). In Espanol and Zuniga (2011) an extension to dynamics is proposed in the context of FokkerPlanck equations, by considering the corresponding relative entropy for discretetime approximations of the transition probabilities. Furthermore, relative entropy was used as means to improve model fidelity in a parametric, multimodel approximation framework of complex dynamical systems, at least when the model’s steadystate distributions are explicitly known, Majda and Gershgorin (2011). Overall, such parametrization techniques are focusing on systems with a known steady state, such as a Gibbs equilibrium distribution. More specifically, computational implementations of optimal parametrization in the inverse Monte Carlo methods is relatively straightforward for equilibrium systems in which the bestfit procedure is applied to an explicitly known equilibrium distribution and where relative entropy is explicitly computable.
On the other hand, this is not the case in nonequilibrium systems, even at a steady state where typically we do not have a Gibbs structure and the steadystate distribution is unknown altogether, setting up one of the primary challenges for this paper. Indeed, here we show that, in nonequilibrium systems the general information theory ideas based on KullbackLeibler divergence are still applicable but they have to be properly formulated in the context of Nonequilibrium Statistical Mechanics by focusing on the probability distribution of the entire time series, i.e., on the path space of the underlying stochastic processes. We show that, surprisingly, such a pathspace relative entropy formulation is: (a) general in the sense that it applies to any Markovian models (e.g. Langevin dynamics, Kinetic Monte Carlo, etc), and (b) is easily computable as an ergodic average in terms of the Relative Entropy Rate (RER), therefore allowing us to construct optimal parametrized Markovian coarsegrained dynamics for large classes of models. This procedure involves the minimization of information loss in path space, where information is inadvertently lost due to coarsegraining procedure. In fact, the proposed parametrization scheme in Espanol and Zuniga (2011) is mathematically justified by reformulating it on the path space using the Relative Entropy Rate and it is a specific, but reversible (i.e., it has a Gibbs steady state) example of our methodology. Furthermore, we propose an asymptotically equivalent method to RER minimization, which is related to to maximum likelihood estimators for stochastic processes, and where the optimal coarsegraining is obtained by optimizing the information content in path space of the coarse variables, with respect to the projected computational data from a finescale simulation. Finally, the pathspace Fisher Information Matrix (FIM) derived from Relative Entropy Rate (RER) can provide confidence intervals for the corresponding statistical estimators of the optimal parameter obtained through the minimization problem.
The Relative Entropy Rate (RER) was earlier proposed as a (pseudo) metric in order to evaluate the convergence of adaptive sampling schemes for Markov State Models (MSM)Bowman, Ensign, and Pande (2010); the proposed coarsegraining perspective, which is related to Maximum Likelihood Estimators (MLE), e.g., (31), could provide new insights in comparing MSMs since it only requires finescale data–and not explicitly a finescale MSM for estimating the RER. Furthermore, RER and in particular the pathspace FIM where introduced recently as gradientfree sensitivity analysis tools for complex, nonequilibrium stochastic systems with a wide range of applicability to reaction networks, lattice Kinetic Monte Carlo and Langevin dynamics Pantazis and Katsoulakis (2013). Unlike the coarsegraining setting we propose here and where models are set in different state spaces depending on their granularity, in the latter work the same state space is assumed between compared models.
The paper is structured as follows. In Section II we formulate the pathspace information theory tools used in the paper, including the key concept of Relative Entropy Rate. In Section III we present the parameterization method of coarsegrained dynamics and the connections with maximum likelihood estimators and the Fisher Information Matrix. In Section IV we briefly discuss statistical estimators for RER and FIM. Finally, in Section V we demonstrate the proposed coarsegraining method in (a) nonequilibrium systems with diffusing interacting particles, driven by outofequilibrium boundary conditions, as well as (b) multiscale diffusions and their wellstudied corresponding stochastic averaging limits, comparing them to our proposed methodologies.
Ii Relative Entropy Rate, path space information theory and error quantification for nonequilibrium systems
First, we formulate a general entropybased error analysis for coarsegraining, dimensional reduction and parametrization of highdimensional Markov processes, simulated by Kinetic Monte Carlo (KMC) and Langevin Dynamics. Typically such systems have NonEquilibrium Steady States (NESS) for which detailed balance fails as they are irreversible. The stationary distributions are not known explicitly and have to be studied computationally.
ii.1 Coarsegrained models
We consider a parameterized class of coarsegrained Markov processes , associated with the fine scale stochastic process . The coarsegraining procedure is based on projecting the microscopic space into a coarse space with less degrees of freedom. We denote the coarse space variables
(1) 
is a coarsegraining (projection) operator, see also (37) for a specific example. In general, given a finescale probability measure defined on the finegrained configuration space , then we can define the exact (renormalized) coarsegrained probability measure on the coarse space :
(2) 
However, the exact CG probability cannot be explicitly calculated , except in trivial cases. Typically, we consider classes of parametric CG models which approximate the exact CG model, see for instance the quantification of such approximations using relative entropy Katsoulakis et al. (2007).
In the case of continuous time processes such as Kinetic Monte Carlo, the coarsegrained stochastic process is defined in terms of coarse transition rates which captures macroscopic information from the fine scale rates . For example, for stochastic lattice systems, approximate coarse rate functions are explicitly known from coarse graining (CG) techniques of Katsoulakis, Majda, and Vlachos (2003); Katsoulakis and Vlachos (2003); Dai, Seider, and Sinno (2008), see (36). Similarly, when we consider temporally discretized stochastic processes such as Langevin Dynamics, the coarsegrained process is given in terms of transition probabilities which capture macroscopic information from the fine scale transition probabilities .
Interpolated dynamics. Given coarsegrained dynamics we can always construct corresponding microscopic dynamics. For example, given coarsegrained transition probabilities with corresponding stationary distribution , where the latter is typically unknown in nonequilibrium systems, we define the corresponding finescale rates
(3) 
where ( denotes the cardinality) is the uniform conditional distribution over all finescale states corresponding to the same coarsegrained state . Clearly (3) is properly normalized, while a more general formulation can be found in Appendix A, (48). In (3) we apply a piecewise constant interpolation for all microscopic states (reps. ) corresponding to the same coarse state (reps. ) and thus transitions to these states occur with the same probability rates. The reconstruction step (3) is necessary when we want to compare fine and coarse processes on the path space in terms of the relative entropy rate, since both processes need to be defined on the same probability space, see for example (17) below. In this paper, for the sake of simplicity, we assume that all reconstructions are based on (3). The reconstruction is obviously not unique and we refer to Appendix A for the mathematical details, see also Trashorras and Tsagkarogiannis (2010).
ii.2 Relative Entropy and Error Quantification in Nonequilibrium Systems
Although all systems we consider here are ergodic, i.e., they have a unique steady state distribution, we typically assume they only have a NonEquilibrium Steady State (NESS) due to a lack of detailed balanceBonetto, Lebowitz, and ReyBellet (2000). Inevitably, in such systems the stationary distribution is not known explicitly and can be studied primarily computationally, at least in systems which are not small perturbations from equilibrium. Quantifying and controlling the coarsegraining error in such highdimensional stochastic systems can be achieved by developing computable and efficient methods for estimating distances of probability measures on the path space. The relative entropy between two path measures and (see (6) for a specific example) for the processes on the interval is
(4) 
where is the RadonNikodym derivative of with respect to . If these probability measures have probability densities , respectively, (4) becomes . In the setting of coarsegraining or modelreduction the measure is associated with the exact process and with the approximating (coarsegrained) process.
From an information theory perspective, the relative entropy measures the loss of information as we approximate the exact stochastic process with the coarsegrained one . In general the relative entropy (4) in this dynamic setting is not a computable object; we refer for instance to related formulas in the ShannonMacMillanBreiman Theorem, Cover and Thomas (1991). However, as we show next, in practically relevant cases of stationary Markov processes we can work with the relative entropy rate
(5) 
where and denote the distributions of the corresponding stationary processes.
Relative Entropy Rate for Markov Chains. In order to explain the basic concept we restrict to the case of two Markov chains, , on the countable state space , defined by the transition probability kernels and . A typical example would be the embedded Markov chain used for KMC simulations of a continuous time Markov chain. Similarly, in the case of a continuous state space, a temporal discretization of a Langevin process, leads to a Markov process with the transition kernel defined by the timediscretization scheme of the underlying stochastic dynamics. We assume that the initial states are from the invariant distributions and . The path measure defining the probability of a path is then
(6) 
and similarly for the measure . The RadonNikodym derivative is easily computed
Using the fact that the processes are stationary with invariant measures and , we obtain an expression for the relative entropy
(7) 
and thus the relative entropy rate is given explicitly as
(8) 
We will refer from now on to the quantity (8) as the Relative Entropy Rate (RER), which can be thought as the change in information per unit time. Notice that RER has the correct time scaling since it is actually independent of the interval . Furthermore, it has the following key features that make it a crucial observable for simulating and coarsegraining complex dynamics:
 (i)
 (ii)

In stationary regimes, when in (7), the term becomes unimportant. This is especially convenient since and are typically not known explicitly in nonreversible systems, for instance in reactiondiffusion or drivendiffusion KMC or nonreversible Langevin dynamics.
In view of these features, we readily see that if we consider a Markov chain as an approximation, e.g., a coarsegraining, of the chain , we can estimate the loss of information at long times by computing as an ergodic average. This observation is the starting point of the proposed methodology and relies on the fact that the observable is computable; efficient statistical estimators for (8) are discussed in Section IV. A similar calculation can be carried out for continuous time Markov Chains, as we see next.
Continuous Time Markov Chains and Kinetic Monte Carlo. In models of catalytic reactions or epitaxial growth the systems are often described by continuous time Markov chains (CTMC) that are simulated by KMC algorithms. For example, the microscopic Markov process describes the evolution of molecules on a substrate lattice. Mathematically the continuous time Markov chain is defined completely by specifying the local transition rates where is a vector of the model parameters. The transition rates determine the updates from any current state (configuration) to a (random) new state . In the context of the spatial models considered here, the transition rates take the form , denoting by a lattice site on a dimensional lattice and , where is the set of all possible configurations that correspond to an update in a neighborhood of the site . From local transition rates one defines the total rate , which is the intensity of the exponential waiting time for a jump from the state . The transition probabilities for the embedded Markov chain are . In other words once the exponential “clock” signals a jump, the system transitions from the state to a new configuration with the probability . In the context of coarsegraining we are led to finding an optimal parametrization for the rates of a processes that approximates the dynamics given by the microscopic process . A similar calculation as in the case of Markov chains gives the analogue of the formula (8)
(9) 
where is the stationary distribution of the microscopic process and denotes the total transition rate. In Kalligiannaki et al. (2012) we used this quantity in order to quantify error in a twolevel coarsegrained kinetic Monte Carlo method. Based on these considerations, we show in Section III that minimizing the error measured by (9) leads to a CTMC coarsegrained dynamics that best approximates longtime behavior of the microscopic process projected to the coarse degrees of freedom.
Remark II.1.
We consider the special case where the transition probability function of the Markov chain is sampled directly from the invariant measure, i.e.,
This sampling is equivalent to the fact that the path space samples in (7) are independent and identically distributed from the stationary probability distributions. Then the RER between the path probabilities becomes the usual relative entropy between the stationary distributions:
(10) 
Estimating RER using (8) is far simpler than directly estimating the relative entropy , since (8) only involves local dynamics rather than the full steady state measure, which typically may not be available. Furthermore even when it is available in the form of a Gibbs state it will require computations that will typically involve a full Hamiltonian, Chaimovich and Shell (2011); Katsoulakis et al. (2008).
Remark II.2.
Estimation and error of observables. The estimates on relative entropy and RER can provide an upper bound for a large family of observable functions through the Pinsker (or CsiszarKullbackPinsker) inequality. The Pinsker inequality states that the total variation norm between and is bounded in terms of the relative entropy, Cover and Thomas (1991). The Pinsker inequality gives an estimate for a difference of the mean computed with respect to the distribution and
(12) 
where . An important conclusion that is immediately drawn from the above inequality is that if the relative entropy of a distribution with respect to another distribution is small then the error between any bounded observable functions is also accordingly small. Using (7) we readily obtain the estimate
(13)  
involving the relative entropy rate (8) or (9). As in virtually all numerical analysis estimates for stochastic dynamical systems, the bound (13) may not be sharp, but it is indicative of the error in the observables when the distribution approximates .
Iii Parametrization of coarsegrained dynamics and Inverse Dynamic Monte Carlo
iii.1 Inverse Dynamic Monte Carlo methods.
In many applications the coarsegrained models are defined by effective potentials or effective rates which are sought in a family of parameterdependent functions, Tschöp et al. (1998); MüllerPlathe (2002); Lyubartsev et al. (2003). The parameters are then fitted by minimizing certain functionals that attempt to capture different aspects of modeling errors, e.g., radial distribution functions in MüllerPlathe (2002). Compared to such Inverse Monte Carlo methods applied to equilibrium systems we cannot work directly with equilibrium distributions since the NESS is not explicitly known. Thus we apply the informationtheoretic framework on the path space, i.e., on the approximating measure that depends on the parameters , and which are subsequently fitted using entropy based criteria for the best approximation.
The optimal parametrized coarsegrained transition probabilities are constructed as follows. First, given the parametrized coarsegrained transition probabilities we define the finescale projected rates , which can be defined, for instance, by (3) as
(14) 
and the corresponding coarsegrained pathdistribution is
(15) 
Subsequently the bestfit can be obtained by minimizing the relative entropy rate, i.e., finding a solution
(16) 
where now we have that the RER is
(17) 
This optimization problem on one hand is similar to more common parametric inference in which the loglikelihood function is maximized, and this perspective will be further clarified in Section III.2. Furthermore, due to the parametric identification of the coarsegrained dynamics, i.e., transition probabilities, or rates in the case of (9), we refer to the proposed methodology as an Inverse Dynamic Monte Carlo method in analogy to the Inverse Monte Carlo methods for equilibrium systems, Tschöp et al. (1998); MüllerPlathe (2002); Lyubartsev et al. (2003).
The optimization algorithm for (16) is based on iterative procedures that locate a solution of the optimality condition
(18) 
for some and being a suitable approximation of the gradient , more precisely . The crucial ingredient of this algorithm is an efficient and reliable estimator for the sequence of the gradient estimates. Similar to the deterministic case the minimization can be accelerated by combining this step with the NewtonRaphson method and choosing the vector as
(19) 
While the evaluation of the Hessian presents an additional computational cost, it also offers additional information about the parametrization, sensitivity and identifiability of the approximating model, Pantazis and Katsoulakis (2013). Indeed the first and the second derivatives of the rate function are of the form:
(20) 
and , where
(21) 
The Hessian can be interpreted as a dynamic analogue of the Fisher Information Matrix (FIM) on the path space. A similar quantity, in the context of sensitivity analysis, was recently considered in Pantazis and Katsoulakis (2013), where the authors also developed efficient statistical estimators for the derivatives of RER and . We discuss related estimators in Section IV.
Remark III.1.
The proposed approach carries sufficient level of generality in order to be applicable to a wide class of stochastic processes, e.g., Langevin dynamics and KMC, without restriction to the dimension of the system, provided scalable efficient simulators are available to simulate the observables, (8) and (9). The proposed parametrized coarsegraining is applicable to any system for which a parametrized coarsegrained models are available, e.g., in coarsegraining of macromolecules and biomembranes, Padding and Briels (2011); Tschöp et al. (1998); MüllerPlathe (2002). An obvious obstacle is that the path measure is absolutely continuous with respect to , however, it does not significantly restrict the class of relevant applications as we typically deal with KMC or Markov Chain approximations resulting from a discretization of Molecular Dynamics with noise. In the latter case, Markov chains obtained by numerical approximations of stochastic differential equations (SDEs) allow us to compute RER through (8) and can be used for quantification of errors or inverse Monte Carlo fitting for nonequilibrium or irreversible models in Section III.
For example, the overdamped molecular dynamics with positions and the forcefield at the inverse temperature is a diffusion process given by the stochastic differential equations driven by the dimensional Wiener process
We assume that the drift field satisfies standard conditions that guarantee existence of solutions for all and the process is ergodic with the stationary distribution . The stochastic differential equation can be discretized by the Euler scheme with the timestep
(22) 
where is a random increment from the standard normal distribution. The Euler scheme discretization defines the Markov chain with the transition kernel
The timecontinuous case presents technical difficulties that we do not address here, instead we demonstrate application of the proposed method at the level of the approximating Markov chain only. The case where the process is driven by a multiplicative noise can be handled in a similar way using a discrete scheme. For the sake of simplicity we define the coarsegraining operator as an orthogonal projection from the state space to a subspace , and we write , denoting , . The reduced model is then viewed as an approximation of the projected Markov chain
(23) 
by
(24) 
where the Gaussian increments are . Denoting and the drift increments in (22) and (24) respectively we have the transition kernel of given by . Given the transition kernel of the coarsegrained chain we define, similarly as in (14), the transition kernel of a reconstructed chain on the original state space
As long as the reconstruction measure does not depend on the parameters the particular choice of does not enter the optimality condition for . Hence the relative entropy rate is given by
(25) 
Note that due to the choice of the orthogonal projection the transition kernel for (22) of the full model becomes
and this factorization into a product simplifies the evaluation of the necessary condition for a minimizer of . Removing the terms that are independent of we have
(26) 
In other words the minimization of is equivalent to the minimization
(27) 
The minimization becomes particularly straightforward when the parametrization of the coarsegrained drift is chosen as an approximation over the set of polynomials , i.e., . In such a case the minimization of the entropy rate functional defines the projection on the subspace in the space , i.e., the leastsquare fit with respect to the stationary measure . The functional is then convex in and the problem has the unique solution which is the solution of the linear system
The expected values can then be estimated as ergodic averages on a single trajectory realization of the original process as . The parametrization scheme in Espanol and Zuniga (2011) proposed in the context of the FokkerPlanck equation is an example of the proposed method for reversible stochastic differential equations, i.e., those having a Gibbs steady state. Application of the proposed method to the diffusion process also shows that widely applied “forcematching” method, Izvekov and Voth (2005a, b); Noid et al. (2008), used in computational coarsegraining is the bestfit in the sense of entropy rate minimization.
iii.2 Pathspace likelihood methods and databased parametrization of coarsegrained dynamics
A different, and asymptotically equivalent perspective on parametrizing coarsegrained dynamics relies on viewing the microscopic simulator as means of producing statistical data in the form of a timeseries. Although the proposed method can be applied to systems simulated by Langevintype dynamics we demonstrate its application in the Kinetic Monte Carlo algorithms in Section V. The primary new element of the presented coarsegraining approach lies in deriving the parametrization by optimizing the information content (in pathspace) compared to the available computational data from a finescale simulation, taking advantage of computable formulas for relative entropy discussed earlier.
More specifically, we consider a finescale data set of configurations obtained for example from a finescale KMC algorithm. As is typical in the KMC framework, we assume that the atomistic model can be described by a spatial, continuoustime Markov jump process, Stamatakis and Vlachos (2012). The pathspace measure of this KMC process, see for the Markov Chain analogue of the path measure (6), is parametrized as . In this sense we assume that for the particular data set the “true” parameter value is . Identifying amounts, mathematically, to minimizing the pseudodistance given by the relative entropy, . Furthermore, following (7) it suffices to minimize . On the other hand, using the ergodicity of the fine scale process associated with the data set , we have the estimators
(28) 
where we define the unbiased estimator for RER, see Section IV,
(29) 
and is defined in (15). For simplicity in notation we only demonstrate the estimator (28) for the Markov Chain case, where denotes the transition probability. The continuoustime case, which is relevant to the coarsegrained KMC simulations in Section V, is obtained similarly using (9).
Therefore, the minimization of RER becomes
(30)  
which does not require (a) a priori the knowledge of , (b) the microscopic reconstruction defined by in (14) since is independent of . Therefore, we define the coarsegrained path space Likelihood maximization as
(31) 
Note that if the transition probabilities in (31) are replaced with a stationary measure and corresponding independent samples , then (31) becomes the classical Maximum Likelihood Principle (MLE). In this sense (31) is a Maximum Likelihood for the coarsegraining of the stationary time series, of the finescale process, and thus includes dynamics information. Furthermore, due to the stationarity of the time series, it allows us (if necessary) to obtain the Markovian bestfit from the dynamical simulation and observations on a single, longtime realization of the process.
Fisher Information Matrix and Confidence Intervals. The Fisher Information Matrix (FIM) in (21) is clearly computable as an ergodic average and can provide confidence intervals for the corresponding estimator , based on the asymptotic normality of the MLE estimator . Indeed, under additional mild hypotheses on the samples , this general procedure guarantees convergence in analogy to the central limit theorem, by employing a martingale formulation, Crowder (1976). In a much simpler context, when consecutive pairs in the FIM estimator, e.g., (34), are sampled beyond the decorrelation time of the time series, the usual central limit theorem applies and yields the asymptotic normality result
(32) 
where the variance is determined by the Fisher Information Matrix , or asymptotically by . Thus estimating the FIM using (21) provides rigorous error bars on computed optimal parameter values .
Iv Statistical estimators for RER and FIM
The Relative Entropy Rate (17), as well as the Fisher Information Matrix (21) are observables of the stochastic process and can be estimated as ergodic averages. Thus, both observables are computationally tractable since they depend only on the local transition quantities. We give explicit formulas for the case of the continuoustime Markov chain.
The first estimator for RER is given by
(33)  
where is an exponential random variable with parameter while is the total simulation time. The sequence is the embedded Markov chain with transition probabilities at the step and are the rates of the parametrized process, e.g., the coarsegrained rates . Notice that the weight which is the waiting time at the state at each step, is necessary for the correct estimation of the observable, Gillespie (1976). Similarly, the estimator for the FIM is
(34) 
The computation of the local transition rates for all is needed for the simulation of the jump Markov process when Monte Carlo methods such as stochastic simulation algorithm (SSA), Gillespie (1976) is utilized. Thus, the estimators and present only a minor additional computational cost in the simulation.
The second numerical estimator for RER is based on the Girsanov representation of the RadonNikodym derivative and it is given by
(35)  
Similarly we can construct an FIM estimator. The term in (35) involving logarithms should not be weighted since the counting measure is approximated with this estimator. Unfortunately, the estimator (35) has the same computational cost as (33) due to the need for the computation of the total rate which is the sum of the local transition rates. Furthermore, in terms of the variance, the latter estimator has worse performance due to the discarded sum over the states . For more details we also refer to Pantazis and Katsoulakis (2013).
V Benchmarks
Example I: coarsegrained driven Arrhenius diffusion of interacting particles. Nonequilibrium systems at transient or steady state regimes are typical in applied science and engineering, and are the result of coupling between different physicochemical mechanisms, driven by external couplings or boundary conditions. For example reactiondiffusion systems in heteroepitaxial catalytic materials, Stamatakis and Vlachos (2012) are typically nonreversible, i.e., their steady state is not a Gibbs distribution. This is due to the fact that different mechanisms between species (reaction, diffusion, adsorption, etc.) do not have a common Hamiltonian and thus a common invariant distribution. Instead, a steady state distribution exists but it is a NESS and is typically not known. Similarly separation processes in microporous materials are typically driven by boundary conditions and/or coupled flows Smit and Maesen (2008), so the steady state is not necessarily a Gibbs state, but it is again an unknown NESS.
We demonstrate the proposed methodology by selecting a simple but illustrative system in the latter class of nonequilibrium problems. We consider an example of a driven, nonequilibrium diffusion process of interacting particles formulated as a lattice gas model with spin variables , corresponding to occupied or empty lattice sites ; here denotes a uniform one dimensional lattices with sites. This is a prototype driven system introduced as a model problem for the influence of microscopic dynamics to macroscopic behavior in driven separations problems in Vlachos and Katsoulakis (2000). This model problem is also intimately related to works on the structure of nonequilibrium steady states (NESS),Eckmann, Pillet, and ReyBellet (1999); ReyBellet and Thomas (2002), as well as to the general formalism of nonequilibrium statistical mechanics, Bonetto, Lebowitz, and ReyBellet (2000); Jakšić, Pillet, and ReyBellet (2011).
The evolution of particles is described in the context of the latticegas model as an exchange dynamics with the Arrhenius migration rate from the site to the nearestneighbor sites
which describes the diffusion of a particle at moving to and interacting through a twobody potential and an external field , defining an energy barrier . The continuoustime Markov chain is defined by its rates and updates to new configurations in which the spin variables and exchanged its values. Finally, the system is driven by the concentration gradient given by different concentrations at the boundary sites and . In the long time behavior the distribution converges to a stationary distribution that gives rise to a NESS concentration profile across the computational domainVlachos and Katsoulakis (2000), see also Figure 1.
Next we discuss the parametric family of coarsegrained models we will use. Under the assumption of a local equilibrium a straightforward local averaging yields the coarsegrained rates, Katsoulakis and Vlachos (2003),
(36) 
for the latticegas model with local concentrations defined as the number of particles in a coarse cell of (lattice) size . Thus we have coarse cells, where . In fact, according to (1) we define the coarse graining operator
(37) 
where . Keeping the twobody interactions as a basis for the coarsegrained approximation the effective potentials between block spins and are obtained by a straightforward spatial averaging
(38)  
Assuming that the approximating dynamics is of Arrhenius type we obtain the energy barriers
The resulting dynamics is a Markovian approximation of the coarsegrained evolution and it is defined as CTMC with the rates .
As a prototype example of the interactions we consider the constant potential for and otherwise. This coarsegrained potential is parametrized by a single parameter corresponding to the strength of the coarsegrained interactions.
First we note that he local meanfield approximation which defines the interaction potential in (38) between two block spins and by averaging contributions from all spinspin interactions in the cells does not provide a good approximation as demonstrated in Figure 1(a), where the inset depicts the error estimated in terms of the entropy rate . However, the meanfield potential is a good initial datum for (18).
In this benchmark we stay in the family of twobody potentials and chose to fit only a single parameter that defines the total strength of the interaction. Thus the rates are parametrized by the effective potential using the single parameter , keeping the interaction range fixed in each set of simulations. Clearly we could consider much richer families with parametrized potentials, e.g., of Morse or LennardJones type, however, we opt for the simplest parametrizations in order to demonstrate with clarity the proposed methods. The bestfit was obtained by solving the optimization problem (31), hence, minimizing the error defined by , see Figure 2(a) and Figure 2(b). Figure 1(b) depicts concentration profiles for different sizes of the coarse cells. The dashed lines represent results from simulations with meanfield interactions between cells only (i.e., the initial guess in the optimization), while the solid lines represent simulations with the parametrized effective interactions.
Comparison with the profile obtained from the microscopic simulation (the solid black line) clearly indicates that when the coarsegraining size becomes close to the interaction range of the microscopic potential the bestfit in a oneparameter family is not sufficient for obtaining good approximation and a better candidate class of models, in this case coarsegrained (CG) dynamics , needs to be found for improved parametrization. Indeed, in Are et al. (2008); Katsoulakis et al. (2013) we showed that coarse grained, multibody cluster Hamiltonians provide such a parametrization. More specifically, in Katsoulakis et al. (2013) we demonstrated, through rigorous cluster expansions that (typical in the stateoftheart) twobody CG approximations break down in lower temperatures and/or for short range particleparticle interactions, and additional multibody CG terms need to be included in the models in order the CG model to capture accurately phase transitions and other physical important properties. Hence, a specific parametrization cannot consistently address this issue on its own, unless the proper classes of parametric models are first identified. The RER computations such as the one depicted in Figure 2(b) can assess the accuracy of different coarsegrained dynamics within the same parametric family (shown here), as well as determine the comparative coarsegraining accuracy of different parametric families.
Example II: Twoscale diffusion process and averaging principle. In the second example we demonstrate the parametric approximation of the coarsegrained process on a system of two stochastic differential equations with slow and fast time scales
(39)  
(40) 
where , are independent standard Wiener processes. Under suitable assumptions on , (see E, Liu, and VandenEijnden (2005); Freidlin and Wentzell (2012)) the dynamics with held fixed has a unique invariant measure and as we have the effective dynamics for the process given by
(41) 
The drift is given by the averaging principle
In this example we choose the component as the coarse variable and compute an approximation of the drift for the coarsegrained (projected) process . The averaging principle suggests that as .
For the specific choice
we have that where is the normalizing constant and thus which yields the effective dynamics in the limit . In the computational benchmarks we thus compare stationary processes resulting as in solving
(42)  
(43)  
(44) 
The proposed method for approximating the coarsegrained dynamics constructs an effective potential for a finite value . The set of interpolating polynomials has been chosen to be in this example. It is expected that as , the approximation approaches the averaged coefficient .
The simulation results depicted in Figure 3 demonstrate that in the case of sufficient timescale separation, , the coarsegrained model well approximates the invariant distribution as well as the autocorrelation function for stationary dynamics of the component . Furthermore, the drift of the coarsegrained dynamics deviates by a small error (see Figure 5) from the drift of the averaged model. On the other hand, for a larger value , where the averaging principle does not apply, the proposed method yields a coarsegrained model (an “effective dynamics”) which still approximates reasonably well the invariant distribution. However, the dynamics of the stationary process does not approximate properly the stationary behaviour of the slow component of the original process, as demonstrated in Figure 4 by comparison of autocorrelation functions.