A characteristic of Bennett’s acceptance ratio method

A.M. Hahn
Institut für Physik, Carl von Ossietzky Universität, 26111 Oldenburg, Germany
(Current address: Technische Universität Berlin, Institut für Theoretische Physik, 10623 Berlin, Germany)

H. Then
Institut für Physik, Carl von Ossietzky Universität, 26111 Oldenburg, Germany

A powerful and well-established tool for free-energy estimation is Bennett’s acceptance ratio method. Central properties of this estimator, which employs samples of work values of a forward process and its time-reversed counterpart, are known: for given sets of measured work values, it yields the best estimate of the free-energy difference in the large-sample limit. Here we state and prove a further characteristic of the acceptance ratio method: the convexity of its mean square error. As a two-sided estimator, it depends on the ratio of the numbers of forward and reverse work values used. Convexity of its mean square error immediately implies that there exists a unique optimal ratio for which the error becomes minimal. Furthermore, it yields insight into the relation between the acceptance ratio method and estimators based on the Jarzynski equation. As an application, we study the performance of a dynamic strategy of sampling forward and reverse work values.

Keywords: fluctuation theorem, nonequilibrium thermodynamics
PACS numbers: 05.40.-a, 05.70.Ln

I Introduction

A quantity of central interest in thermodynamics and statistical physics is the (Helmholtz) free-energy, as it determines the equilibrium properties of the system under consideration. In practical applications, e.g. drug design, molecular association, thermodynamic stability, and binding affinity, it is usually sufficient to know free-energy differences. As recent progress in statistical physics has shown, free-energy differences, which refer to equilibrium, can be determined via non-equilibrium processes Jarzynski1997 (); Crooks1999 ().

Typically, free-energy differences are beyond the scope of analytic computations and one needs to measure them experimentally or compute them numerically. Highly efficient methods have been developed in order to estimate free-energy differences precisely, including thermodynamic integration Kirkwood1935 (); Gelman1998 (), free-energy perturbation Zwanzig1954 (), umbrella sampling Torrie1977 (); Chen1997 (); Oberhofer2008 (), adiabatic switching Watanabe1990 (), dynamic methods Sun2003 (); Ytreberg2004 (); Jarzynski2006 (), asymptotics of work distributions vonEgan-Krieger2008 (), optimal protocols Then2008 (), targeted and escorted free-energy perturbation Meng2002 (); Jarzynski2002 (); Oberhofer2007 (); Vaikuntanathan2008 (); Hahn2009 ().

A powerful Meng1996 (); Kong2003 (); Shirts2008 () and frequently Ceperley1995 (); Frenkel2002 (); Collin2005 () used method for free-energy determination is two-sided estimation, i.e. Bennett’s acceptance ratio method Bennett1976 (), which employs a sample of work values of a driven nonequilibrium process together with a sample of work values of the time-reversed process Crooks2000 ().

The performance of two-sided free-energy estimation depends on the ratio

r = n_f / n_r   (1)

of the numbers n_f and n_r of forward and reverse work values used. Think of an experimenter who wishes to estimate the free-energy difference with Bennett’s acceptance ratio method and has the possibility to generate forward as well as reverse work values. The capabilities of the experiment give rise to an obvious question: if the total amount of draws is intended to be N = n_f + n_r, which is the optimal choice of partitioning N into the numbers n_f of forward and n_r of reverse work values, or equivalently, what is the optimal choice of the ratio r? The problem is to determine the value of r that minimizes the (asymptotic) mean square error of Bennett’s estimator when N is held constant.

While known since Bennett Bennett1976 (), the optimal ratio is underutilized in the literature. Bennett himself proposed to use a suboptimal equal time strategy instead, because his estimator for the optimal ratio converges too slowly to be practicable. Even questions as fundamental as the existence and uniqueness of the optimal ratio have remained unanswered in the literature. Moreover, it is not always clear a priori whether two-sided free-energy estimation is better than one-sided exponential work averaging. For instance, Shirts et al. have presented a physical example where it is optimal to draw work values from only one direction Shirts2005 ().

The paper is organized as follows: in Secs. II and III we rederive two-sided free-energy estimation and the optimal ratio. We also recall that two-sided estimation comprises one-sided exponential work averaging as limiting cases for the fraction of forward work values tending to zero or one, a result that also holds for the mean square errors of the corresponding estimators.

The central result is stated in Sec. IV: the asymptotic mean square error of two-sided estimation is convex in the fraction of forward work values used. This fundamental characteristic immediately implies that the optimal ratio exists and is unique. Moreover, it explains the generic superiority of two-sided estimation over one-sided estimation, as found in many applications.

To overcome the slow convergence of Bennett’s estimator of the optimal ratio, which is based on estimating second moments, in Sec. V we transform the problem into another form such that the corresponding estimator is entirely based on first moments, which enhances the convergence enormously.

As an application, in Sec. VII we present a dynamic strategy of sampling forward and reverse work values that maximizes the efficiency of two-sided free-energy estimation.

II Two-sided free-energy estimation

Given a pair of samples of n_f forward and n_r reverse work values drawn from the probability densities p_f(W) and p_r(W) of forward and reverse work values, and provided the latter are related to each other via the fluctuation theorem Crooks1999 (),

p_f(W) / p_r(W) = e^{W-ΔF},   (2)

Bennett’s acceptance ratio method Bennett1976 (); Meng1996 (); Crooks2000 (); Shirts2003 () is known to give the optimal estimate of the free-energy difference ΔF in the limit of large sample sizes. Throughout the paper, W and ΔF are understood to be measured in units of the thermal energy k_B T. The normalized probability densities p_f and p_r are assumed to have the same support, and we choose the following sign convention: W denotes the work performed on the system in the forward process, while the work values of the reverse process enter with inverted sign, such that p_r is the density of the negative reverse work values.

Figure 1: (Color online) The overlap density p^(x) bridges the densities p_f and p_r of forward and reverse work values, respectively; x is the fraction of forward work values. The densities are shown here schematically. The accuracy of two-sided free-energy estimates depends on how well p^(x) is sampled when drawing from p_f and p_r.

Now define a normalized density p^(x)(W) with

p^(x)(W) = (1/U(x)) p_f(W) p_r(W) / [x p_f(W) + (1-x) p_r(W)],   (3)

where x is a real number with

0 ≤ x ≤ 1.   (4)

The normalization constant U(x) is given by

U(x) = ∫ p_f(W) p_r(W) / [x p_f(W) + (1-x) p_r(W)] dW.   (5)

The density p^(x) is a normalized harmonic mean of p_f and p_r, 1/p^(x) ∝ (1-x)/p_f + x/p_r, and thus bridges between p_f and p_r, see Fig. 1. In the limit x → 0, p^(x) converges to the forward work density p_f, and conversely for x → 1 it converges to the reverse density p_r. As a consequence of the inequality of the harmonic and arithmetic mean, U(x) is bounded from above by unity,

U(x) ≤ 1   (6)

for all x ∈ [0,1]. Except for x = 0 and x = 1, the equality holds if and only if p_f = p_r. Using the fluctuation theorem (2), U(x) can be written as an average in p_f and in p_r,

U(x) = ⟨ [(1-x) + x e^{W-ΔF}]^{-1} ⟩_f = ⟨ [x + (1-x) e^{-(W-ΔF)}]^{-1} ⟩_r,   (7)

where the angular brackets with subscript f (r) denote an ensemble average with respect to p_f (p_r), i.e.

⟨g⟩_ρ = ∫ g(W) p_ρ(W) dW,   ρ ∈ {f, r},   (8)

for an arbitrary function g(W).

In setting x = 1, Eq. (7) reduces to the nonequilibrium work relation Jarzynski1997 ()

⟨e^{-W}⟩_f = e^{-ΔF}   (9)

in the forward direction, and conversely with x = 0 we obtain the nonequilibrium work relation in the reverse direction,

⟨e^{W}⟩_r = e^{ΔF}.   (10)

The last two relations can, of course, be obtained more directly from the fluctuation theorem (2). An important application of these relations is one-sided free-energy estimation: given a sample {W_1, …, W_{n_f}} of n_f forward work values drawn from p_f, Eq. (9) is commonly used to define the forward estimate ΔF̂_f of ΔF with

ΔF̂_f = -ln[ (1/n_f) Σ_{i=1}^{n_f} e^{-W_i} ].   (11)

Conversely, given a sample {W_1, …, W_{n_r}} of n_r reverse work values drawn from p_r, Eq. (10) suggests the definition of the reverse estimate ΔF̂_r of ΔF,

ΔF̂_r = ln[ (1/n_r) Σ_{j=1}^{n_r} e^{W_j} ].   (12)
If we have drawn both, a sample {W_i} of n_f forward and a sample {W_j} of n_r reverse work values, then Eq. (7) can serve us to define a two-sided estimate ΔF̂ of ΔF by replacing the ensemble averages with sample averages:

(1/n_f) Σ_{i=1}^{n_f} [(1-x) + x e^{W_i-ΔF̂}]^{-1} = (1/n_r) Σ_{j=1}^{n_r} [x + (1-x) e^{-(W_j-ΔF̂)}]^{-1}.   (13)

ΔF̂ is understood to be the unique root of Eq. (13), which exists for any x ∈ (0,1). Different values of x result in different estimates for ΔF. Choosing

x = n_f / N,   N = n_f + n_r,   (14)

the estimate (13) coincides with Bennett’s optimal estimate, which defines the two-sided estimate with least asymptotic mean square error for a given value x, or equivalently, for a given ratio r Bennett1976 (); Meng1996 (). We denote the optimal two-sided estimate, i.e. the solution of Eq. (13) under the constraint (14), by ΔF̂ and simply refer to it as the two-sided estimate. Note that the optimal estimator can be written in the familiar form

Σ_{i=1}^{n_f} [1 + (n_f/n_r) e^{W_i-ΔF̂}]^{-1} = Σ_{j=1}^{n_r} [1 + (n_r/n_f) e^{-(W_j-ΔF̂)}]^{-1}.   (15)

In the limit x → 1 the two-sided estimate reduces to the one-sided forward estimate (11), ΔF̂ → ΔF̂_f, and conversely ΔF̂ → ΔF̂_r for x → 0. Thus the one-sided estimates are the optimal estimates if we are given draws from only one of the densities p_f or p_r.
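To make the two-sided estimator concrete, the following sketch solves Bennett’s acceptance-ratio equation numerically. It is a minimal illustration rather than the authors’ implementation: the function names and the bisection root search are our own choices, and the test data come from the exponential work model discussed in the example section, for which the fluctuation theorem fixes the sign-interchanged reverse work density and ΔF = ln(1+μ).

```python
import math
import random

def bar_estimate(w_f, w_r, lo=-100.0, hi=100.0, tol=1e-10):
    """Two-sided free-energy estimate: root of Bennett's
    acceptance-ratio equation.

    w_f: forward work values; w_r: reverse work values with
    interchanged sign (both in units of k_B T).  The difference
    between the forward and reverse sums is monotonically
    increasing in the trial value of dF, so bisection finds
    the unique root.
    """
    n_f, n_r = len(w_f), len(w_r)
    log_ratio = math.log(n_f / n_r)

    def imbalance(df):
        fwd = sum(1.0 / (1.0 + math.exp(log_ratio + w - df)) for w in w_f)
        rev = sum(1.0 / (1.0 + math.exp(-log_ratio - (w - df))) for w in w_r)
        return fwd - rev

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data: exponential work densities (see the example section).
# With mean forward work mu, the fluctuation theorem fixes the
# sign-interchanged reverse density to mean mu/(1+mu), and
# dF = ln(1+mu).
random.seed(1)
mu = 1.0
w_fwd = [random.expovariate(1.0 / mu) for _ in range(20000)]
w_rev = [random.expovariate((1.0 + mu) / mu) for _ in range(20000)]
print(bar_estimate(w_fwd, w_rev))  # close to ln(2) = 0.6931...
```

Since the forward sum grows and the reverse sum shrinks monotonically with the trial value of ΔF̂, the bracketing interval can be chosen generously without affecting the result.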

A characteristic quantity expressing the performance of the estimate is the mean square error,

MSE(ΔF̂) = ⟨(ΔF̂ - ΔF)²⟩,   (16)

which depends on the total sample size N and the fraction x of forward work values. Here, the average is understood as an ensemble average over the value distribution of the estimate ΔF̂ for fixed N and x. In the limit of large n_f and n_r, the asymptotic mean square error (which then equals the variance) can be written Bennett1976 (); Meng1996 ()

MSE(ΔF̂) = (1/N) (1/(x(1-x))) [1/U(x) - 1].   (17)

Provided the r.h.s. of Eq. (17) exists, which is guaranteed for any x ∈ (0,1), the N-dependence of the asymptotic mean square error is simply given by the usual 1/N factor, whereas the x-dependence is determined by the function U(x) given in Eq. (5). Note that if a two-sided estimate is calculated, then essentially the normalizing constant U is estimated from two sides, ⟨·⟩_f and ⟨·⟩_r, cf. Eqs. (7) and (13). With an estimate of ΔF we therefore always have an estimate of the mean square error at hand. However, the reliability of the latter naturally depends on the degree of convergence of the estimate. The convergence of the two-sided estimate can be checked with the convergence measure introduced in Ref. Hahn2009 ().

In the limits x → 1 and x → 0, respectively, the asymptotic mean square error of the two-sided estimator converges to the asymptotic mean square error of the appropriate one-sided estimator Gore2003 (),

lim_{x→1} N MSE(ΔF̂) = Var_f( e^{-(W-ΔF)} ),   (18)

lim_{x→0} N MSE(ΔF̂) = Var_r( e^{W-ΔF} ),   (19)

where Var_ρ denotes the variance operator with respect to the density p_ρ, i.e.

Var_ρ(g) = ⟨g²⟩_ρ - ⟨g⟩_ρ²   (20)

for an arbitrary function g(W) and ρ ∈ {f, r}.

III The optimal ratio

Now we focus on the question raised in the introduction: Which value of x in the range 0 ≤ x ≤ 1 minimizes the mean square error (17) when the total sample size N is held fixed?

Let X(x) be the rescaled asymptotic mean square error given by

X(x) = N MSE(ΔF̂) = (1/(x(1-x))) [1/U(x) - 1],   (21)

which is a function of x only. Assuming 0 < x* < 1, a necessary condition for a minimum of X at the optimal fraction x* is that the derivative of X vanishes there. Before calculating X'(x) explicitly, it is beneficial to rewrite X by using the identity

(1/x) ⟨t²⟩_f + (1/(1-x)) ⟨t̃²⟩_r = U(x)/(x(1-x)),   (22)

which follows from the fluctuation theorem (2) and the definition (5) of U; here t(W) = [(1-x) + x e^{W-ΔF}]^{-1} and t̃(W) = e^{W-ΔF} t(W) denote the functions whose averages appear in Eq. (7), with ⟨t⟩_f = ⟨t̃⟩_r = U. Subtracting U²(1/x + 1/(1-x)) from Eq. (22) and recalling the definition (3) of the bridging density, one obtains

(1/x) Var_f(t) + (1/(1-x)) Var_r(t̃) = [U(x) - U(x)²]/(x(1-x)),   (23)

where the functions b_f and b_r, defined as

b_f(x) = Var_f(t)/U(x)²,   b_r(x) = Var_r(t̃)/U(x)²,   (24)

describe the relative fluctuations of the quantities that are averaged in the two-sided estimation of ΔF, cf. Eq. (13).

With the use of formula (23), X can be written

X(x) = b_f(x)/x + b_r(x)/(1-x),   (25)

and the derivative yields

X'(x) = -b_f(x)/x² + b_r(x)/(1-x)² + b_f'(x)/x + b_r'(x)/(1-x).   (26)

The derivatives of the b-functions involve the first two derivatives of U, which will thus be computed first:

U'(x) = -∫ p_f p_r (p_f - p_r) / [x p_f + (1-x) p_r]² dW,   (27)

U''(x) = 2 ∫ p_f p_r (p_f - p_r)² / [x p_f + (1-x) p_r]³ dW.   (28)
From Eq. (28) it is clear that U is convex in x, U''(x) ≥ 0, with a unique minimum in (0,1) (as U(0) = U(1) = 1 and U ≤ 1). We can rewrite the b-functions with U and U' as follows:

b_f(x) = [U + x U']/U² - 1,   b_r(x) = [U - (1-x) U']/U² - 1.   (29)

Differentiating these expressions gives

b_f'(x) = (x/U³) [U U'' - 2 U'²],   b_r'(x) = -((1-x)/U³) [U U'' - 2 U'²].   (30)

b_f and b_r are monotonically increasing and decreasing, respectively. This immediately follows from writing the term occurring in the brackets of Eqs. (30) as a variance in the density p^(x),

U U'' - 2 U'² = 2 U² Var_{p^(x)}( (p_f - p_r)/[x p_f + (1-x) p_r] ),   (31)

which is thus positive.

As a consequence of Eqs. (30), the relation

b_f'(x)/x + b_r'(x)/(1-x) = 0   (32)

holds, and X'(x) reduces to

X'(x) = -b_f(x)/x² + b_r(x)/(1-x)².   (33)

The derivatives of the b-functions do not contribute to X' due to the fact that the special form of the two-sided estimator (13) originates from minimizing the asymptotic mean square error, cf. Bennett1976 (). The necessary condition for a local minimum of X at x*, X'(x*) = 0, now reads

b_f(x*)/x*² = b_r(x*)/(1-x*)²,   (34)

where 0 < x* < 1 is understood. Using Eqs. (24) and (14), the condition (34) results in

Var_f( [1 + (n_f/n_r) e^{W-ΔF}]^{-1} ) = Var_r( [1 + (n_r/n_f) e^{-(W-ΔF)}]^{-1} ).   (35)

This means, the optimal ratio is such that the variances of the random functions which are averaged in the two-sided estimation (15) are equal. However, the existence of a solution of X'(x) = 0 is not guaranteed in general.

Writing Eq. (35) in the form

x(1-x) U'(x) = (2x - 1) [U(x) - U(x)²],   (36)

which involves the overlap measure U and its first derivative only, prevents the equation from becoming a tautology.

IV Convexity of the mean square error


Theorem: The rescaled asymptotic mean square error X(x) is convex in x.

In order to prove the convexity, we introduce the operator which is defined for an arbitrary function by


is positive semidefinite, i.e.


For and , the equality holds if and only if .

Proof of the Lemma.

Let , . Then


which is clearly positive. Provided and , the integrand in the last line is zero if and only if . This completes the proof of the Lemma. ∎

Proof of the Theorem.

Consulting Eqs. (33) and (32), the second derivative of X reads


Expressing p_f and p_r in center and relative “coordinates”, respectively, gives


Therefore, X''(x) is positive according to the Lemma. ∎

The convexity of the mean square error is a fundamental characteristic of Bennett’s acceptance ratio method. This characteristic allows us to state a simple criterion for the existence of a local minimum of the mean square error in terms of its derivatives at the boundaries. Namely, if

X'(0) = U'(0)² - U''(0)/2 - U'(0)   (42)

is negative and

X'(1) = U''(1)/2 - U'(1)² - U'(1)   (43)

is positive, there exists a local minimum of X for 0 < x < 1. Otherwise, no local minimum exists and the global minimum is found on the boundaries of [0,1]: if X'(0) ≥ 0, the global minimum is found for x = 0, thus it is optimal to measure work values in the reverse direction only and to use the one-sided reverse estimator (12). Else, if X'(1) ≤ 0, the global minimum is found for x = 1, implying the one-sided forward estimator (11) to be optimal.

In addition, the convexity of the mean square error proves the existence and uniqueness of the optimal ratio, since a convex function has a global minimum on a closed interval.


Corollary: If a solution x* of X'(x) = 0 exists, it is unique, and X attains its global minimum there.

V Estimating the optimal ratio with first moments

In situations of practical interest the optimal ratio is not available a priori; it has to be estimated. Estimators of the optimal ratio have existed since Bennett Bennett1976 (), and we have just proven that the optimal ratio exists and is unique. However, there is still one obstacle to overcome: so far, all expressions for estimating the optimal ratio have been based on second moments, see e.g. Eq. (35). Due to convergence issues, it is not practicable to base an estimator on expressions that involve second moments; such an estimator would converge far too slowly. For this reason, we transform the problem into a form that employs first moments only.

Assume we have given n_f and n_r work values in forward and reverse direction, respectively, and want to estimate X(x), with 0 ≤ x ≤ 1. According to Eq. (7) we can estimate the overlap measure U(x) by using the draws from the forward direction,

Û_f(x) = (1/n_f) Σ_{i=1}^{n_f} [(1-x) + x e^{W_i-ΔF̂}]^{-1},   (44)

where W_i equals the i-th forward work value and for ΔF the best available estimate is inserted, i.e. the two-sided estimate ΔF̂ based on the n_f + n_r work values. Similarly, we can estimate the overlap measure by using the draws from the reverse direction,

Û_r(x) = (1/n_r) Σ_{j=1}^{n_r} [x + (1-x) e^{-(W_j-ΔF̂)}]^{-1}.   (45)

Since in general draws from both directions are available, it is reasonable to take an arithmetic mean of both estimates,

Û(x) = (1-x) Û_f(x) + x Û_r(x),   (46)

where the weighting is chosen such that the better estimate, Û_f or Û_r, contributes more strongly: with increasing x the estimate Û_r becomes more reliable, as U(x) is the normalizing constant of the bridging density p^(x), Eq. (3), and p^(x) → p_r for x → 1; and conversely for decreasing x.

From the estimate of the overlap measure we can estimate the rescaled mean square error by

X̂(x) = (1/(x(1-x))) [1/Û(x) - 1]   (47)

for all 0 < x < 1, a result that is entirely based on first moments. The infimum of X̂ finally results in an estimate of the optimal choice of x,

x̂* = arg inf_{0≤x≤1} X̂(x).   (48)

When searching for the infimum, we also take the boundary values

X̂(0) = -Û'(0),   X̂(1) = Û'(1)   (49)

into account, which follow from a series expansion of Eq. (47) in x at x = 0 and x = 1, respectively.
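As a minimal sketch of this first-moment recipe (our naming, not the authors’ code), the estimates of Eqs. (44)–(47) can be evaluated on a grid; for brevity the exact ΔF of the exponential toy model is inserted where, in practice, the current two-sided estimate ΔF̂ would be used:

```python
import math
import random

def u_hat(w_f, w_r, df, x):
    """First-moment estimate of the overlap measure U(x):
    weighted mean of the forward and reverse sample averages,
    cf. Eqs. (44)-(46)."""
    u_f = sum(1.0 / ((1.0 - x) + x * math.exp(w - df)) for w in w_f) / len(w_f)
    u_r = sum(1.0 / (x + (1.0 - x) * math.exp(-(w - df))) for w in w_r) / len(w_r)
    return (1.0 - x) * u_f + x * u_r

def x_mse_hat(w_f, w_r, df, x):
    """Estimated rescaled mean square error, cf. Eq. (47)."""
    return (1.0 / u_hat(w_f, w_r, df, x) - 1.0) / (x * (1.0 - x))

def optimal_fraction(w_f, w_r, df, steps=49):
    """Grid minimization over the open interval (0,1)."""
    grid = [(i + 1.0) / (steps + 1.0) for i in range(steps)]
    return min(grid, key=lambda x: x_mse_hat(w_f, w_r, df, x))

# Exponential toy model with mean forward work mu = 1, dF = ln 2.
random.seed(2)
w_fwd = [random.expovariate(1.0) for _ in range(10000)]
w_rev = [random.expovariate(2.0) for _ in range(10000)]
print(u_hat(w_fwd, w_rev, math.log(2.0), 0.5))  # near 2 - ln 3 = 0.901...
```

For this model the forward estimator is the best one-sided choice, so the estimated curve decreases towards large x; the grid search therefore returns a forward-heavy fraction.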

VI Incorporating costs

The costs of measuring a work value in forward direction may differ from the costs of measuring a work value in reverse direction. The influence of costs on the optimal ratio of sample sizes is investigated here.

Different costs can be due to a direction-dependent effort of experimental or computational measurement of work (unfolding an RNA may be much easier than folding it). We assume the work values to be uncorrelated, which is essential for the validity of the theory presented in this paper. Thus, a source of unequal costs, which arises especially when work values are obtained via computer simulations, is a difference in the strength of correlations of consecutive Monte Carlo steps in forward and reverse direction. To achieve uncorrelated draws, the “correlation lengths” or “correlation times” have to be determined within the simulation, too. However, this is advisable in any case of two-sided estimation, independent of the sampling strategy.

Let c_f and c_r be the costs of drawing a single forward and reverse work value, respectively. Our goal is to minimize the mean square error while keeping the total costs C = n_f c_f + n_r c_r constant. Keeping C constant results in

N = C / [x c_f + (1-x) c_r],   (50)

which in turn yields

MSE(ΔF̂) = (1/C) [x c_f + (1-x) c_r] X(x).   (51)

If a local minimum exists, it results from setting the derivative of Eq. (51) with respect to x to zero, which leads to

c_r b_f(x)/x² = c_f b_r(x)/(1-x)²,   (52)

a result Bennett was already aware of Bennett1976 (). However, based on second moments, it was not possible to estimate the optimal ratio accurately and reliably. Hence, Bennett proposed to use a suboptimal equal time strategy or equal cost strategy instead, which spends an equal amount of expenses on both directions, i.e. n_f c_f = n_r c_r, or

x_ec = c_r / (c_f + c_r),   (53)

where x_ec is the equal cost choice for x. This choice is motivated by the following result,

MSE(ΔF̂)|_{x_ec} ≤ 2 MSE(ΔF̂)|_{x*},   (54)

which states that the asymptotic mean square error of the equal cost strategy is at most sub-optimal by a factor of two Bennett1976 (). Note however that the equal cost strategy can be far more sub-optimal if the asymptotic limit of large sample sizes is not reached.

Since we can base the estimator for the optimal ratio on first moments, see Sec. V, we propose a dynamic strategy that performs better than the equal cost strategy. The infimum of the estimated cost-weighted error

X̂_C(x) = [x c_f + (1-x) c_r] X̂(x)   (55)

results in the estimate of the optimal choice of x,

x̂* = arg inf_{0≤x≤1} X̂_C(x).   (56)

We remark that, opposed to X(x), the cost-weighted error X_C(x) is not necessarily convex. However, a global minimum clearly exists and can be estimated.

VII A dynamic sampling strategy

Suppose we want to estimate the free-energy difference with the acceptance ratio method, but have a limit C on the total amount of expenses that can be spent on measurements of work. In order to maximize the efficiency, the measurements are to be performed such that the fraction of forward measurements finally equals the optimal fraction x*.

The dynamic strategy is as follows:

  1. In absence of prior knowledge of x*, we start with Bennett’s equal cost strategy (53) as an initial guess of x̂.

  2. After drawing a small number of work values, we make preliminary estimates of the free-energy difference, the mean square error, and the optimal fraction x*.

  3. Depending on whether the estimated rescaled mean square error is convex, which is a necessary condition for convergence, our algorithm updates the estimate of x* or keeps the previous value.

  4. Further work values are drawn such that the actual fraction of forward draws dynamically follows the estimate x̂*, while x̂* is updated repeatedly.

There is no need to update x̂ after each individual draw. Splitting the total costs C into a sequence C_1 < C_2 < ⋯ < C_K = C, not necessarily equidistant, we can predefine when and how often an update of x̂ is made: namely, an update is made whenever the actually spent costs reach the next value of the sequence.

The dynamic strategy can be cast into an algorithm.


Set the initial values x̂_1 = x_ec and n_f^0 = n_r^0 = 0. In the k-th step of the iteration, k = 1, …, K, determine

n_f^k = ⌊ x̂_k C_k / (x̂_k c_f + (1-x̂_k) c_r) ⌋,   (57)

n_r^k = ⌊ (1-x̂_k) C_k / (x̂_k c_f + (1-x̂_k) c_r) ⌋,   (58)

where ⌊·⌋ means rounding to the next lower integer. Then, Δn_f = n_f^k - n_f^{k-1} additional forward and Δn_r = n_r^k - n_r^{k-1} additional reverse work values are drawn. Using the entire present samples, an estimate of ΔF is calculated according to Eq. (13). With the free-energy estimate at hand, X̂_C(x) is calculated for all values of x via Eqs. (44)–(47) and (49), discretized, say in steps Δx. If X̂_C is convex, we update the recent estimate of the optimal fraction to x̂_{k+1} via Eqs. (55) and (56). Otherwise, if X̂_C is not convex, the corresponding estimate is not yet reliable and we keep the recent value, x̂_{k+1} = x̂_k. Increasing k by one, we iteratively continue with Eq. (57) until we finally obtain the two-sided estimate of ΔF, which is the optimal estimate of the free-energy difference after having spent all costs C_K = C.

Note that an update of x̂ may result in negative values of Δn_f or Δn_r. Should Δn_f happen to be negative, we set Δn_f = 0 and instead draw

Δn_r = ⌊ (C_k - n_f^{k-1} c_f)/c_r ⌋ - n_r^{k-1}   (59)

additional reverse work values. We proceed analogously if Δn_r happens to be negative.
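The allocation step above can be sketched as follows; the function name and argument conventions are ours, and the budgets and costs are arbitrary illustration values:

```python
import math

def next_increments(x_est, budget, c_f, c_r, n_f_prev, n_r_prev):
    """Target sample-size increments for the current cost budget,
    in the spirit of Eqs. (57)-(59): allocate the budget according
    to the estimated optimal fraction x_est; a negative increment
    is clipped to zero and the remaining budget is spent entirely
    on the other direction."""
    per_draw = x_est * c_f + (1.0 - x_est) * c_r
    n_f = math.floor(x_est * budget / per_draw)
    n_r = math.floor((1.0 - x_est) * budget / per_draw)
    d_f, d_r = n_f - n_f_prev, n_r - n_r_prev
    if d_f < 0:
        d_f = 0
        d_r = math.floor((budget - n_f_prev * c_f) / c_r) - n_r_prev
    elif d_r < 0:
        d_r = 0
        d_f = math.floor((budget - n_r_prev * c_r) / c_f) - n_f_prev
    return max(d_f, 0), max(d_r, 0)

# Equal costs, first checkpoint: Bennett's equal cost strategy.
print(next_increments(0.5, 100.0, 1.0, 1.0, 0, 0))   # (50, 50)
# Asymmetric costs: forward draws twice as expensive.
print(next_increments(0.25, 100.0, 2.0, 1.0, 0, 0))  # (20, 60)
```

In a full implementation this step would alternate with the two-sided estimate of ΔF and the first-moment estimate of the optimal fraction from Sec. V.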

The optimal fraction depends on the cost ratio c_f/c_r, i.e. the algorithm needs to know the costs c_f and c_r. However, the costs are not always known in advance and may also vary over time. Think of a long-time experiment which is subject to currency changes, inflation, terms of trade, innovations, and so on. An advantage is that the dynamic sampling strategy is capable of incorporating varying costs: in each iteration step of the algorithm one just inserts the actual costs. If desired, the breakpoints C_k may also be adapted to the actual costs. Should the costs initially be unknown (e.g. if the “correlation length” of a Monte Carlo simulation needs to be determined within the simulation first), one may use any reasonable guess until the costs are known.

VIII An example

For illustration of the results we choose exponential work densities

p_f(W) = (1/μ_f) e^{-W/μ_f},   p_r(W) = (1/μ_r) e^{-W/μ_r},   (60)

with W ≥ 0 and mean forward work μ_f = ⟨W⟩_f > 0. According to the fluctuation theorem (2) we have μ_r = μ_f/(1+μ_f) and ΔF = ln(1+μ_f).

Figure 2: The main figure displays the exponential work densities p_f (thick line) and p_r (thin line), where μ_r is fixed by μ_f according to the fluctuation theorem. The inset displays the corresponding Boltzmann distributions ρ_0 (thick) and ρ_1 (thin) of the two-dimensional oscillator model. The free-energy difference is ΔF = ln(1+μ_f).

Exponential work densities arise in a natural way in the context of a two-dimensional harmonic oscillator with Boltzmann distribution ρ_i(q) = (1/Z_i) e^{-ω_i²|q|²/2}, q ∈ ℝ², where Z_i = 2π/ω_i² is a normalizing constant (partition function) and i = 0, 1 labels the frequency Shirts2005 (). Drawing a point q from the initial density ρ_0 with frequency ω_0 and switching the frequency to ω_1 > ω_0 instantaneously amounts in the work W = (ω_1² - ω_0²)|q|²/2. The probability density of observing a specific work value W is given by the exponential density p_f with μ_f = ω_1²/ω_0² - 1. Switching the frequency in the reverse direction, ω_1 → ω_0, with the point q drawn from ρ_1, the density of work (with interchanged sign) is given by p_r with μ_r = 1 - ω_0²/ω_1² = μ_f/(1+μ_f). The free-energy difference of the states characterized by ω_0 and ω_1 is the log-ratio of their normalizing constants, ΔF = ln(Z_0/Z_1) = ln(ω_1²/ω_0²) = ln(1+μ_f). A plot of the work densities is enclosed in Fig. 2.
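As a sanity check of this example, the overlap measure U(x) of Eq. (5) can be evaluated by direct numerical quadrature. The sketch below (our code, with μ_f = 1 chosen arbitrarily) reproduces the value U(1/2) = 2 - ln 3, which follows analytically for this model, and exhibits the convexity of the rescaled mean square error proven in Sec. IV:

```python
import math

def work_densities(mu_f):
    """Exponential work densities of the oscillator example; the
    fluctuation theorem fixes mu_r = mu_f/(1+mu_f)."""
    mu_r = mu_f / (1.0 + mu_f)
    return (lambda w: math.exp(-w / mu_f) / mu_f,
            lambda w: math.exp(-w / mu_r) / mu_r)

def overlap_u(p_f, p_r, x, w_max=60.0, n=20000):
    """Trapezoidal quadrature of the overlap measure U(x)."""
    h = w_max / n
    acc = 0.0
    for i in range(n + 1):
        w = i * h
        f, r = p_f(w), p_r(w)
        val = f * r / (x * f + (1.0 - x) * r)
        acc += 0.5 * val if i in (0, n) else val
    return acc * h

def rescaled_mse(p_f, p_r, x):
    """Rescaled asymptotic mean square error X(x)."""
    return (1.0 / overlap_u(p_f, p_r, x) - 1.0) / (x * (1.0 - x))

p_f, p_r = work_densities(1.0)   # mu_f = 1, i.e. dF = ln 2
xs = [0.05 * i for i in range(1, 20)]
curve = [rescaled_mse(p_f, p_r, x) for x in xs]
# For mu_f = 1 the curve decreases monotonically towards x = 1,
# where it approaches mu_f^2/(1+2 mu_f) = 1/3, and its second
# differences are nonnegative (convexity).
```

The same quadrature, evaluated on a fine grid, is how the “exact” optimal fraction of the following figures can be obtained.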

Now, with regard to free-energy estimation, is it better to use one- or two-sided estimators? In other words, we want to know whether the global minimum of X(x) is on the boundaries of [0,1] or not. By the convexity of X, the answer is determined by the signs of the derivatives X'(0) and X'(1) at the boundaries. The asymptotic mean square errors (18) and (19) of the one-sided estimators are calculated to be

X(1) = Var_f( e^{-(W-ΔF)} ) = μ_f²/(1 + 2μ_f)   (61)

for the forward direction and

X(0) = Var_r( e^{W-ΔF} ) = μ_f²/(1 - μ_f²)   (62)

for the reverse direction. For μ_f ≥ 1 the variance of the reverse estimator diverges. Note that X(1) < X(0) holds for all μ_f > 0, i.e. forward estimation of ΔF is always superior if compared to reverse estimation. Furthermore, a straightforward calculation gives

X'(1) = μ_f³ (μ_f² - 3μ_f - 2) / [(1 + 2μ_f)² (1 + 3μ_f)],   (63)

which changes sign at μ_c = (3 + √17)/2 ≈ 3.56, and

X'(0) = 1/(1 - μ_f²) + 1/(1 - μ_f²)² - 1/[(1 + μ_f)² (1 - 2μ_f)] - 1,   (64)

which is negative for all μ_f and diverges to -∞ for μ_f ≥ 1/2. Thus, for the range 0 < μ_f ≤ μ_c we have X'(1) ≤ 0 as well as X'(0) < 0 and therefore x* = 1, i.e. the forward estimator is superior to any two-sided estimator (13) in this range. For μ_f > μ_c we have X'(0) < 0 and X'(1) > 0, specifying that 0 < x* < 1, i.e. two-sided estimation with an appropriate choice of x is optimal.

Figure 3: The overlap function U(x) and the rescaled asymptotic mean square error X(x) for the exponential work densities. Note that X(x) diverges for x → 0 when μ_f ≥ 1.

Numerical calculation of the function U(x) and subsequent evaluation of X(x) allows one to find the “exact” optimal fraction x*. Examples for U(x) and X(x) are plotted in Fig. 3.

Figure 4: The optimal fraction x* of forward work values for two-sided estimation in dependence of the average forward work ⟨W⟩_f = μ_f. For μ_f ≤ (3 + √17)/2 the one-sided forward estimator is optimal, i.e. x* = 1.

The behavior of x* as a function of μ_f is quite interesting, see Fig. 4. We can interpret this behavior in terms of the Boltzmann distributions as follows. Without loss of generality, assume ω_0 is fixed. Increasing μ_f then means increasing ω_1. The density ρ_1 is fully nested in ρ_0, cf. the inset of Fig. 2 (remember that ω_1 > ω_0), and converges to a delta-peak at the origin with increasing ω_1. This means that by sampling from ρ_0 we can obtain information about the full density ρ_1 quite easily, whereas sampling from ρ_1 provides only poor information about ρ_0. This explains why x* = 1 holds for small values of μ_f. However, with increasing μ_f the density ρ_1 becomes so narrow that it becomes difficult to obtain draws from ρ_0 that fall into the main part of ρ_1. Therefore, it is better to add some information from ρ_1; hence, x* decreases. Increasing μ_f further, the relative number of draws needed from ρ_1 will decrease, as ρ_1 converges towards the delta distribution. Finally, it will become sufficient to make only one draw from ρ_1 in order to obtain the full information available. Therefore, x* converges towards 1 in the limit μ_f → ∞.

Figure 5: (Color online) Example of a single run using the dynamic strategy: the optimal fraction x* of forward measurements for the two-sided free-energy estimation is estimated at predetermined values of the total sample size of forward and reverse work values. Subsequently, taking into account the current actual fraction x, additional work values are drawn such that x comes closer to the estimated x̂*.
Figure 6: (Color online) Displayed are estimated mean square errors in dependence of x for different sample sizes. The global minimum of the estimated function determines the estimate x̂* of the optimal fraction of forward work measurements.
Figure 7: (Color online) Comparison of a single run of free-energy estimation using the equal cost strategy versus a single run using the dynamic strategy. The error bars are the square roots of the estimated mean square errors.

In the following, the dynamic strategy proposed in Sec. VII is applied. We choose fixed values for μ_f and for the costs c_f and c_r. The equal cost strategy draws according to x_ec, which is also used as initial value in the dynamic strategy. The results of a single run are presented in Figs. 5–7. The estimate of x* is updated at a predefined sequence of total costs. The actual forward fractions x together with the estimated values x̂* of the optimal fraction are shown in Fig. 5. The first three estimates of x* are rejected, because the estimated function is not yet convex. Therefore, x remains unchanged at the beginning. Afterwards, x follows the estimates of x* and starts to fluctuate about the “exact” value of x*. Some estimates of the function X̂(x) corresponding to this run are depicted in Fig. 6. For these estimates, x is discretized in steps Δx. Remarkably, the estimates of x* that result from these curves are quite accurate even for relatively small sample sizes. Finally, Fig. 7 shows the free-energy estimates of the run (not for all values of N), compared with those of a single run where the equal cost strategy is used. We find some increase of accuracy when using the dynamic strategy.

Figure 8: (Color online) Averaged free-energy estimates from independent runs with the dynamic strategy versus runs with the equal cost strategy, in dependence of the total cost spent; the cost ratio c_f/c_r and μ_f are fixed. The error bars represent one standard deviation. Here, the initial value of x̂ in the dynamic strategy differs from the equal cost choice x_ec with which the equal cost strategy draws.
Figure 9: (Color online) Displayed are mean square errors of free-energy estimates using the same data as in Fig. 8. In addition, the mean square errors of estimates with constant x are included, as well as the asymptotic behavior, Eq. (51). The inset shows that the mean square error of the dynamic strategy approaches the asymptotic optimum, whereas the equal cost strategy remains suboptimal. Note that for small sample sizes the asymptotic behavior does not represent the actual mean square error.

In combination with a good a priori choice of the initial value of x̂, the use of the dynamic strategy enables a superior convergence and precision of free-energy estimation, see Figs. 8 and 9. Due to insight into the particular system under consideration, it is not unusual that one has a priori knowledge which results in a better guess for the initial choice of x̂ in the dynamic strategy than starting with x_ec. For instance, a good initial choice is known when estimating the chemical potential via Widom’s particle insertion and deletion Widom1963 (). Namely, it is a priori clear that inserting particles yields much more information than deleting particles, since the phase space which is accessible to the particles in the “deletion system” is effectively contained in the phase space accessible to the particles in the “insertion system”, cf. e.g. Hahn2009 (). With such a good a priori initial choice of x̂, the dynamic strategy outperforms any other strategy that the authors are aware of.

Once reaching the limit of large sample sizes, the dynamic strategy is insensitive to the initial choice of , since the strategy is robust and finds the optimal fraction of forward measurements itself.

IX Conclusion

Two-sided free-energy estimation, i.e. the acceptance ratio method Bennett1976 (), employs samples of forward and reverse work measurements to determine free-energy differences in a statistically optimal manner. However, its statistical properties depend strongly on the ratio r = n_f/n_r of the numbers of work values used. As a central result, we have proven the convexity of the asymptotic mean square error of two-sided free-energy estimation as a function of the fraction x of forward work values used. From this follows immediately the existence and uniqueness of the optimal fraction x* which minimizes the asymptotic mean square error. This is of particular interest if we can control the value of x, i.e. can make additional measurements of work in either direction. Drawing such that we finally reach x*, the efficiency of two-sided estimation can be enhanced considerably. Consequently, we have developed a dynamic sampling strategy which iteratively estimates x* and makes additional draws or measurements of work accordingly. Thereby, the convexity of the mean square error enters as a key criterion for the reliability of the estimates. For a simple example which allows comparison with analytic calculations, the dynamic strategy has been shown to work perfectly.

In the asymptotic limit of large sample sizes the dynamic strategy is optimal and outperforms any other strategy. Nevertheless, in this limit it has to compete with the near-optimal equal cost strategy of Bennett, which also performs very well. It is worth mentioning that even though the latter comes close to the performance of ours, the dynamic strategy is worth the effort: the underlying algorithm is easily implemented and costs hardly anything compared to the effort required for drawing additional work values.

Most important for experimental and numerical estimation of free-energy differences is the range of small and moderate sample sizes. For this relevant range, the dynamic strategy is found to perform very well, too. It converges significantly better than the equal cost strategy. In particular, for small and moderate sample sizes it can improve the accuracy of free-energy estimates by half an order of magnitude.

We close our considerations by mentioning that the two-sided estimator is typically far superior with respect to one-sided estimators: assume the support and