# Escorted free energy simulations

## Abstract

We describe a strategy to improve the efficiency of free energy estimates by reducing dissipation in nonequilibrium Monte Carlo simulations. This strategy generalizes the targeted free energy perturbation approach [Phys. Rev. E. 65, 046122, 2002] to nonequilibrium switching simulations, and involves generating artificial, “escorted” trajectories by coupling the evolution of the system to updates in external work parameter. Our central results are: (1) a generalized fluctuation theorem for the escorted trajectories, and (2) estimators for the free energy difference in terms of these trajectories. We illustrate the method and its effectiveness on model systems.

## I Introduction

The computation of free energy differences is an essential component of computer studies of biological, chemical, and molecular processes, with applications to topics such as phase coexistence and phase equilibria, ligand binding events, and solvation of small molecules (1); (2). Given the importance of free energy calculations in computational thermodynamics, there is a need for robust, efficient and accurate methods to estimate free energy differences.

In a standard formulation of the free energy estimation problem, we consider two equilibrium states of a system, corresponding to the same temperature but different values of an external parameter, , and we are interested in the free energy difference between the two states, . While many widely used free energy estimation methods, such as thermodynamic integration and free energy perturbation rely on equilibrium sampling, there has been considerable interest in methods for estimating that make use of nonequilibrium simulations (1); (2). In the most direct implementation of this approach, a number of independent simulations are performed in which the external parameter is varied at a finite rate from to , with initial conditions sampled from the equilibrium state . The free energy difference can then be estimated using the nonequilibrium work relation (3); (4)

(1) |

where denotes the work performed on the system during a particular realization (i.e. simulation) of the process, angular brackets denote an average over the realizations of the process and . In principle, this approach allows one to compute from trajectories of arbitrarily short duration. However, the number of realizations required to obtain a reliable estimate of grows rapidly with the dissipation, , that accompanies fast switching simulations (5); (6); (7) . The dissipation is positive as a consequence of the second law of thermodynamics, and reflects the lag that builds up as the system pursues – but is unable to keep pace with – the equilibrium distribution corresponding to the continuously changing parameter (8); (9); (10); (11). This idea is illustrated schematically in Fig 1.

In Ref. (12), we described a strategy to improve the efficiency of free energy estimates obtained with nonequilibrium molecular-dynamics simulations. This strategy involved adding non-physical terms to the equations of motion, to reduce the lag and therefore the dissipation. As illustrated in Ref. (12) using a simple model system, when these terms successfully “escorted” the system through a near-equilibrium sequence of states, the convergence of the free energy estimate improved dramatically. In the present paper we extend these results to simulations evolving according to Monte Carlo dynamics. We then show that the escorted trajectories satisfy a fluctuation theorem, and we discuss and illustrate the application of this result to the estimation of free energy differences.

In Section II we introduce escorted nonequilibrium switching simulations for systems evolving according to Monte Carlo dynamics. The approach we take here is motivated by previous work (12); (13); (14); (15) and involves generating artificial, or “escorted”, trajectories, Eq. 10, by modifying the dynamics with terms that directly couple the evolution of the system to changes in the external parameter. The central result of this section is an identity for in terms of these escorted trajectories, Eq. 19. In Section III we extend this result by showing that these trajectories satisfy a fluctuation relation analogous to Crooks’s fluctuation relation (16); (17); (18). This in turn allows us to combine our approach with Bennett’s acceptance ratio method (19) which provides an optimal, asymptotically unbiased estimator, Eq. 38, for (20). In Section IV, we show that while Eqs. 19 and 38 are identities for all escorted simulations, they are particularly effective as estimators of when the modified dynamics successfully reduce the lag described above. In particular, if these terms eliminate the lag entirely, then Eqs. 19 and 38 provide perfect (zero variance) estimators: W = for every realization. Finally in Section V, we illustrate the effectiveness of our approach on two model systems.

## Ii Escorted nonequilibrium simulations

Consider a system whose energy is given by a classical hamiltonian, , where denotes a microstate, that is a point in the -dimensional configuration space of the system, ^{1}

(2) |

with the free energy . We wish to compute the free energy difference between two equilibrium states at the same temperature, , but different values of the work parameter, .

To estimate the value of , we assume we have at our disposal a discrete-time Monte Carlo algorithm, parametrized by the value of and defined by the transition probability : if represents the microstate of the system at one time step, then the next microstate is sampled randomly from . We assume this algorithm satisfies the conditions of detailed balance,

(3) |

and ergodicity (22). Routinely used Monte Carlo schemes such as the Metropolis algorithm (1) satisfy these conditions. Eq. 3 implies the somewhat weaker condition of balance,

(4) |

which we will use in the analysis below. With this Monte Carlo algorithm in place, we first describe a standard procedure for estimating using nonequilibrium simulations, Eqs. 5-9 below, and then we introduce our modified version of this approach.

Imagine a process in which the system is initially prepared in equilibrium, at and temperature , and then the system evolves under the Monte Carlo dynamics described above, as the value of is switched from to in steps according to some pre-determined protocol. This evolution generates a trajectory that can be represented in more detail using the notation

(5) |

Here, the symbol denotes an update in the value of , with the microstate held fixed, while denotes a Monte Carlo step at fixed , e.g. the microstate is sampled from the distribution . Moreover,

(6) |

and the initial point is sampled from .

Because it is specified by the sequence of microstates , the trajectory can be viewed as a point in a -dimensional trajectory space, with . For the process described in the previous paragraph, the probability density for generating this trajectory is

(7) |

where the factors in this equation (read from right to left) correspond to the symbols in Eq. 5 (read from left to right). The work performed on the system during this process is the sum of energy changes due to updates in , (23); (24); (4); (17)

(8) |

Using Eqs. 3, 7 and 8, we arrive at the nonequilibrium work relation for Monte Carlo dynamics (4); (17)

(9) |

Thus we can estimate by repeatedly performing simulations to generate trajectories of the sort described by Eq. 5, computing the work associated with each trajectory, Eq. 8, and finally constructing the exponential average, Eq. 9. As mentioned in the Introduction, however, this average converges poorly when the process is highly dissipative.

To address the issue of poor convergence, let us now assume that for every integer , we have a deterministic function that takes any point in configuration space and maps it to a point . We assume that each of these functions is invertible ( exists), but otherwise the functions are arbitrary. These ’s then constitute a set of bijective mappings, which we use to modify the procedure for generating trajectories, as follows. When the value of the work parameter is switched from to , the configuration space coordinates are simultaneously subjected to the mapping . Eq. 5 then becomes

(10) |

where

(11) |

as indicated by the notation . (As before, the symbol denotes a Monte Carlo move at fixed .) The bijective maps effectively escort the system by directly coupling increments in to changes in the microstate. This is similar to the “metric scaling” approach introduced by Miller and Reinhardt (15), in which each update in is accompanied by a linear scaling of coordinates; however, in the present paper we do not assume the ’s are linear in .

In the escorted trajectory (Eq. 10), the system visits a sequence of points in configuration space: the “primary” microstates , alternating with the “secondary” microstates . Since each is uniquely determined from (Eq. 11), the sequence of primary microstates fully specifies the trajectory; that is, trajectory space remains -dimensional, with . The probability density for generating a trajectory is given by the following modification of Eq. 7:

(12) |

Taking a cue from Refs (13); (15), let us now define

(13) |

where is the Jacobian associated with the map . Averaging over the ensemble of trajectories, we have

(14) |

To evaluate this expression, we first identify all factors in the integrand that do not depend on or , and we pull these outside the innermost integral, , which gives us (for that integral):

(15) | |||||

(16) | |||||

(17) |

We have used Eq. 13 to get to the second line, followed by a change in the variables of integration to get to the third line, , and we have invoked Eq. 4 to arrive at the final result. This process can be repeated for the integrals to , which brings us to:

(18) | |||||

and therefore

(19) |

Eq. 19 is an identity for in terms of escorted trajectories, generated as per Eq. 10. For the special case in which each mapping is the identity, , we recover the usual scheme, Eq. 5, and then Eq. 19 reduces to the nonequilibrium work relation, Eq. 9. Following Miller and Reinhardt (15), we will find it convenient to interpret as the work done during the switching process and simply denote it by . As we will discuss in Section IV below, when the mappings are chosen so as to reduce the dynamic lag illustrated in Fig. 1, then the efficiency of the estimate of improves, often dramatically.

## Iii Fluctuation Theorem

Let us now consider not only the switching process described by Eq. 10, which we will henceforth designate the forward process, but also its time-reversed analogue, the reverse process. In the reverse process, the system is prepared in equilibrium at and temperature . The work parameter is then switched to in steps, following a sequence that is the reversal of the protocol used during the forward process:

(20) |

During the reverse process, changes in are coupled to the system’s evolution through the inverse mapping functions, , generating a trajectory

(21) |

where , and the initial state is sampled from . The direction of the arrows indicates the progression of time. The probability density for obtaining a trajectory is

(22) |

with . Following Eq. 13, the work performed during this process is

(23) |

where is the Jacobian for the mapping . Here and below we use the subscripts and to specify the forward and reverse processes, respectively.

We will now show that the work distributions corresponding to these two processes satisfy Crooks’s fluctuation relation, (16); (17); (18) namely

(24) |

where

(25) |

denotes the distribution of work values for the forward process, and is similarly defined for the reverse process.

To establish this result, consider a conjugate pair of trajectories, and , related by time-reversal. Specifically, if is a trajectory generated during the forward process, that visits the sequence of microstates

(26) |

then its conjugate twin, , generated during the reverse process, visits the same microstates, in reverse order:

(27) |

that is and (see Eq. 21). Note that the primary microstates of are the secondary microstates of , and vice-versa, and the work function is odd under time-reversal:

(28) |

We wish to evaluate the quantity

(29) |

with given by Eq. 12. To this end, we first decompose as follows:

(30) |

where

(31a) | |||||

(31b) | |||||

(31c) |

Here is the total change in the energy of the system as it evolves along the trajectory , can be interpreted as the heat transfered to the system from the reservoir (15), and is an entropy-like term, which arises because the mappings need not preserve volume. The quantities defined in Eq. 31 satisfy the properties

(32a) | |||||

(32b) |

where we have used Eqs. 2 and 3. These properties then give us

(33) |

hence

(34) |

Substituting this result into the integrand on the right side of Eq. 29, then changing the variables of integration from to , and invoking Eq. 28, we finally arrive at the result we set out to establish:

(35) |

Eq. 35 in turn implies that the average of any function over work values generated in the forward process, can be related to an average over work values obtained in the reverse process: (16)

(36) |

In principle, this result can be used with any to estimate . The problem of determining the optimal choice of was solved by Bennett in the context of equilibrium sampling, (19) and this solution can be applied directly to the nonequilibrium setting. (16); (20) Specifically, if we have work values from the forward simulation, and work values from the reverse simulation, then the optimal choice is

(37) |

where . The value of is then estimated by recursively solving the equation,

(38) |

as described in detail in Ref. (19). This procedure for estimating is known as Bennett’s Acceptance Ratio method (BAR).

## Iv Computational efficiency and figures of merit

While Eqs. 19 and 38 are valid for any set of invertible mapping functions, , the efficiency of using escorted simulations to estimate depends strongly on the choice of these functions. Since the convergence of exponential averages such as Eq. 19 deteriorates rapidly with dissipation (5); (6); (7), which in turn correlates with the lag illustrated in Fig. (1), it is reasonable to speculate that a choice of mappings that decreases the lag will improve the convergence of estimator (Eq. 19).

To pursue this idea, let us first consider the extreme case of a set of mapping functions that entirely eliminates the lag. By this we mean the following: for an ensemble of trajectories generated using Eq. 10, with sampled from , the subsequent microstates are distributed according to , for all . That is, the shaded and unshaded ovals coincide in Fig. (1). This occurs if under the bijective mapping , the equilibrium distribution transforms to the distribution (13), in other words

(39) |

[Under a bijective map , a distribution is transformed to the distribution , where .] When all the ’s satisfy this condition, we will say that the set of mappings is perfect. Using , and taking the logarithm of both sides of Eq. 39, we obtain (for a perfect set of mappings)

(40) |

hence for every trajectory (Eq. 13). Thus for a perfect set of mappings we have , and Eq. 19 provides a zero-variance estimate of the free energy difference. It is straightforward to show that if the ’s form a set of perfect mappings for the forward process, then the ’s form a set of perfect mappings for the reverse process, and .

The considerations of the previous paragraph support the idea that reducing lag improves convergence. While we generally cannot expect to be able to construct a perfect set of mapping functions (this is likely to be far more difficult than the original problem of estimating ! (12)), in many cases it might be possible to use either intuition or prior information about a system to construct a set of ’s that reduce the lag substantially. In such cases the dissipation accompanying the escorted simulations is less than that for the unescorted simulations, leading to improved convergence of the free energy estimate.

As an example of a strategy that can be used to construct good mappings, consider a system of identical, mutually interacting particles, in an external potential :

(41) |

The probability distribution of a single, tagged particle is then given by the single-particle density

(42) |

where specifies the coordinates of the tagged particle as a function of the microstate . Now consider a reference system of non-interacting particles, described by a Hamiltonian

(43) |

with a similarly defined single-particle density ; and imagine that is chosen so that these single-particle densities are identical or nearly identical: . In this case a set of mappings that are perfect or near-perfect for the reference system (), might be quite effective in reducing lag in the original system (). We will illustrate this mean-field-like approach in Section V.2, and we note that a similar strategy was explored by Hahn and Then in the context of targeted free energy perturbation (14).

It will be useful to develop a figure of merit, allowing us to compare the efficiency of our method for different sets of mappings. One approach would be simply to compare the error bars associated with the statistical fluctuations in the respective free energy estimates. Unfortunately, estimates of obtained from convex nonlinear averages such as the one obtained from Eq. 19, are systematically biased for any finite number of realizations (7); (25). This bias can be large, and as a result the statistical error bars by themselves might not be sufficiently reliable to quantify the efficiency of the mapping. In the following paragraphs we discuss alternative figures of merit.

We begin by noting that when the unidirectional estimator, Eq. 19, is used in conjunction with simulations of the forward process, then the number of realizations () required to obtain a reliable estimate of is roughly given by (6); (5)

(44) |

where is the dissipation accompanying the reverse process. While this provides some intuition for the convergence of Eq. 19, its usefulness as a figure of merit is somewhat limited as it requires simulations of both the forward and the reverse processes, and in that case we are better off using a bidirectional estimator such as Eq. 38.

When we do have simulations of both processes, then an easily computed figure of merit is the hysteresis, . The value of this quantity is zero if the mappings are perfect, otherwise it is positive. It is interesting to note that the hysteresis can be related to an information-theoretic measure of overlap between the forward and reverse work distributions and : (26)

(45) |

Here denotes the relative entropy between the distributions and , and the symmetrized quantity (also known as the Jeffreys divergence (27)) provides a measure of the difference, or more precisely the lack of overlap, between the distributions. The right side of Eq. 45 can be estimated from a modest sample of forward and reverse simulations. If a set of mappings reduces the hysteresis, , then this indicates increased overlap between the work distributions, and therefore improved convergence (6).

When , the mean square error of the Bennett estimator is (20); (14); (19); (28)

(46) |

Here denotes the estimate of obtained from Eq. 38, and

(47) |

(This result can be generalized to the case (14).) As discussed by Bennett (19) and Hahn and Then (14); (28), the value of measures the overlap between and , and provides a rough figure of merit for the Bennett estimator. When lag is eliminated and the two distributions coincide, then attains its maximum value, , whereas when there is poor overlap, . Thus we expect that the higher the value of the overlap function , the smaller the number of realizations required to estimate from Eq. 38 with a prescribed accuracy. Indeed, Eq. 46 suggests a lower bound on the number of realizations needed to achieve a mean square error less than : . Note that since is an ensemble average (Eq. 47), it can readily be estimated from available simulation data.

In the Appendix, we derive an upper bound on the number of realizations needed to obtain a reliable estimate of using Bennett’s method, (Eq. 62). Combining these bounds gives us

(48) |

While Eq. 48 cannot be used to obtain a good estimate for ^{2}

## V Examples

### v.1 Cavity Expansion

As a first example, we estimate the free energy cost associated with growing a hard-sphere solute in a fluid. Consider a system composed of point particles inside a cubic container of volume , centered at the origin with periodic boundaries. The particles are excluded from a spherical region of radius , also centered at the origin. The particles interact with one another via the WCA pairwise interaction potential (1) which is denoted by . The energy of the system at a microstate is given by

(49) |

where whenever for all , that is when there are no particles inside the spherical cavity; and otherwise. The function ensures that particles are excluded from the spherical region around the origin. We wish to compute the free energy cost, , associated with increasing the radius of the cavity from to (See Fig. 2).

A hypothetical estimate of using unescorted nonequilibrium simulations (Eq. 5) involves “growing out” the spherical cavity in discrete increments, as follows. Starting with a microstate sampled from equilibrium at , the radius of the sphere is increased by an amount . If all fluid particles remain outside the enlarged sphere, then ; but if one or more particles now finds itself inside the sphere () then . One or more Monte Carlo steps are then taken, after which the radius is again increased by some amount, , and is determined in the same fashion as . In principle this continues until the radius of the sphere is , and then the work is tallied for the entire trajectory: . In practice the trajectory can be terminated as soon as at some step , since this implies . For this procedure, Eq. 1 can be rewritten as

(50) |

where is the probability of generating a trajectory for which ; that is, a trajectory in which the sphere is successfully grown out to radius , without overtaking any fluid particles along the way. The quantity is estimated directly, by generating a number of trajectories and counting the “successes” (). For a sufficiently dense fluid, however, a successful trajectory is a rare event (), and this approach converges poorly. Note also that this approach does not give the correct free energy difference in the reverse case of a shrinking sphere (from to ), since for every trajectory in that situation.

For the hypothetical procedure just described, Eq. 50 implies that the probability to generate a successful trajectory does not depend on the number of increments used to grow the cavity from to . Therefore the most computationally efficient implementation is to grow the sphere out in a single step, which corresponds to the free energy perturbation method (FEP) (2); (1). In this case is just the probability to observe no particles in the region , for an equilibrium simulation at cavity radius .

To improve convergence by means of escorted simulations (Eq. 10), we constructed mapping functions that move the fluid particles out of the way of the growing sphere, to prevent infinite values of . Specifically, as the cavity radius is increased from to , the location of the particle, , is mapped to , where (13)

(51) |

and if .
The notation denotes a single-particle mapping;
the full mapping is obtained by applying to all fluid particles.
To picture the effect of this mapping, let denote the region of space defined by the conditions ,
that is a spherical shell of inner radius and outer radius (just touching the sides of the cubic container).
Under the mapping
,
the shell is compressed uniformly onto the shell , leaving the eight corners of the box untouched. ^{3}

In this manner, the particles that would otherwise have found themselves inside the enlarged sphere are pushed outside of it, resulting in a finite contribution to the work (Eq. 13),

(52) |

where is the number of particles found within the shell (before the mapping is applied), and is the ratio of shell volumes, . The first term on the right side of Eq. 52 gives the net change in the energy of the system associated with the escorted switch , while the second is the Jacobian term .

Unlike the unescorted approach or free energy perturbation, the escorted approach with the mapping given by Eq. 51 is applicable in both the forward (growing spherical cavity) and reverse (shrinking cavity) directions. In the reverse direction, as the solute radius is decreased from to , the shell is uniformly expanded onto the shell . The corresponding increment in work is given by a formula similar to Eq. 52. As a result, one can combine work values from forward and reverse escorted simulations using Bennett’s Acceptance Ratio (BAR), Eq. 38.

22.288 0.012 | |

-14.458 0.013 | |

7.830 0.018 | |

C |

We have performed both forward and reverse simulations of this system using WCA particles, with , ,