# Alchemical Response Parameters from an Analytical Model of Molecular Binding

###### Abstract

We present a parameterized analytical model of alchemical molecular binding. The model describes accurately the free energy profiles of linear single-decoupling alchemical binding free energy calculations. The parameters of the model, which are physically motivated, are obtained by fitting model predictions to numerical simulations. The validity of the model has been assessed on a set of host-guest complexes. The model faithfully reproduces both the binding free energy profiles and the probability densities of the perturbation energy as a function of the alchemical progress parameter . The model offers a rationalization for the characteristic shape of the free energy profiles. The parameters obtained from the model are potentially useful descriptors of the association equilibrium of molecular complexes.

## 1 Introduction

The primary goal of a quantitative model of molecular binding is to provide an estimate of the standard free energy of binding, , or, equivalently, of the equilibrium constant, , for the association equilibrium , between two molecules and . For example, the binding of a drug molecule to a receptor. A brute-force molecular simulation approach to the calculation of the binding constant, based on following the motion of the ligand in and out of the receptor, is generally not feasible due to the long times between binding and unbinding events.[1] To overcome obstacles such as this, biased methods have been developed to accelerate the dynamics of association and obtain the free energy profile of ligand binding along pathways in and out of the receptor.[2, 3, 4, 5, 6, 7, 8, 9, 10]

Alchemical descriptions of the binding equilibrium provide an alternative to the study of physical binding/unbinding paths.[11, 12, 13, 14] The idea is that, because a free energy change depends only on the end states, the bound and unbound states of the molecular system can be connected by any thermodynamic path, whether physical or unphysical. In alchemical methods, the potential energy function is modified parametrically in a series of steps traced by a progress parameter to go from a description of the unbound state to that of the bound state. These methods effectively “grow” the ligand in place within the binding site. The field has a long history, [15, 16, 17, 18] but only relatively recently it has converged into an unified statistical thermodynamics theory of bimolecular binding. [19, 20, 21, 12] The double-decoupling method,[19, 11, 22] which is used to compute absolute binding free energies, is so called because it involves free energy calculations to decouple the ligand to an intermediate gas phase from the bound and solution states of the ligand. Free energy perturbation methods,[23, 24, 25, 26] are suitable for the analysis of relative binding, such as in ligand optimization.

We have developed an alchemical single-decoupling methodology, based on an implicit description of the solvent,[27] that enables the transfer of the ligand directly into the binding site rather than through multiple thermodynamic pathways.[28, 29, 30] Among other advantages, the single-decoupling approach leads naturally to a statistical representation of the equilibrium in terms of probability distributions of the binding energy. For example, it is possible to relate the binding free energy to the probability distribution, , of the the binding energy in the absence of receptor-ligand interactions.[12]

Analogously to approaches based on physical binding pathways, alchemical binding free energy calculations yield free energy profiles along the thermodynamic transformations. Alchemical free energy profiles are functions of the alchemical charging parameter , rather than, for instance, the ligand-receptor distance. A typical alchemical calculation involves collecting distributions of perturbation energies as a function of the alchemical parameter . These are merged together by thermodynamic reweighting algorithms[31, 32] to yield the free energy profile along . Typically, only the difference between the end points of the free energy profile, which is the binding free energy, is considered. However the shape of the free energy profile can also yield useful information regarding the physical characteristics of the molecular complex. For example, a quadratic dependence on , typical of linear response, is often observed during the alchemical transformation.

In this work we present a method to relate the shape of the free energy profile to physical observables of the complex. Working within the single-decoupling framework, we develop an analytic probabilistic model of binding and construct a procedure to estimate the parameters of the model from the results of alchemical molecular simulations. The model is based on the the statistics of ligand-receptor interaction energies when the ligand uniformly explores the binding site volume as if the receptor atoms were not present. This general strategy has a long history in the treatment of solvation (examples are scaled particle theory, particle insertion, and information/fluctuation theories [33, 34, 35, 36, 37]) but it has not been fully explored to study molecular recognition. The main distinction is that a receptor, unlike a homogeneous solvent, has a specific shape and distribution of interaction sites. We show that the single decoupling theory offers a useful starting point to think about this problem.

## 2 Theory and Methods

### 2.1 Statistical mechanics theory of non-covalent molecular association

The standard free energy of binding, , between a receptor and a ligand is given by

(1) |

where , is the absolute temperature, is Boltzmann’s constant and is the dimensionless binding constant that, assuming ideal solutions, is expressed as

(2) |

where are equilibrium concentrations and is the standard state concentration (conventionally set as 1M or 1 molecule/1668 Å).

In a widely employed classical statistical mechanics theory of non-covalent association,[19, 12] the binding constant is expressed as

(3) |

where is the effective potential
energy function of the receptor-ligand complex expressed in terms
of the internal degrees of freedom, , of receptor and ligand,
and the external degrees of freedom (i.e. overall translation and
rotations),[20] , of the ligand with
respect to the receptor, and
is the binding energy of the complex in conformation ,
where is the effective potential energy of the system
when receptor and ligand are at infinite separation.
is the chosen volume of the binding site, that is the volume of the
region of positions and orientations of the ligand relative
to the receptor which are considered to correspond to the bound state
of the complex.^{1}^{1}1Eq. (3) refers to the case in which only overall
translations are used to define the binding site volume. In general,
a term corresponding to the integration over orientational degrees
of freedom is also present.[20, 12] The average in Eq. (3)
is conducted over the decoupled equilibrium ensemble corresponding
to , in which receptor and ligand do not interact, while
the ligand samples uniformly the binding site volume. Here
is the potential energy of the system and is the solvent
potential of mean force, which represents the solvation free energy
of the complex in conformation .[12]

Inserting Eq. (3) into Eq. (1) yields

(4) |

where is the concentration-dependent component of the standard free energy of binding independent of the specific form of the potential energy, and

(5) |

is the excess free energy of the complex. In the following we will focus on the excess component of the standard free energy of binding and, to simplify the notation, we will denote the excess free energy as simply and we will measure all energies and free energies in units thereby eliminating factors of throughout.

### 2.2 Alchemical binding free energy methods

Molecular simulations aimed at computing the excess free energy of binding by means of Eqs. (4) and (5) and are referred to as “alchemical” in that they sample the unphysical uncoupled state in which receptor and ligand, while being close to each other, they behave as if the other were not present. In practice, Eq. (5) converges very slowly because, due to atomic overlaps, in the uncoupled state large and positive values of (and, consequently, negligibly small values of ) are much more likely to be sampled than favorable ones, causing the average to be dominated by the infrequent occurrences of overlap-free configurations. To overcome this obstacle, it is common to adopt a stratification scheme based on an alchemical hybrid potential , dependent on an alchemical progress parameter conventionally ranging from 0 and 1, which implies a -dependent excess free energy defined as

(6) |

where

(7) |

is the -dependent binding constant, and where, using the notation introduced above, is the perturbation energy at for the complex in conformation . In the following we will refer to as the alchemical free energy profile and as the binding constant profile.

The stratification approach above leads to the familiar computational algorithms for the calculation of free energy differences based on the accumulation of the effects of small progressive increments of . For instance, Eq. (7) is easily generalized to yield an expression of the ratio of equilibrium constants at nearby values of :

(8) |

which is the basis of the Free Energy Perturbation (FEP) method. It should be noted that, while Eq. (8) is mathematically exact, modern numerical implementations of FEP employ more efficient BAR and MBAR free energy estimators.[31] Similarly, inserting Eq. (7) into Eq. (6) and differentiating with respect to , leads to the well-known Thermodynamic Integration (TI) formula:

(9) |

which, when integrated, yields the free energy profile.

Being related to ensemble averages, it is helpful for the current purpose to note that both the FEP and TI formulas can be expressed in terms of probability density functions. For instance, Eq. (7) can be rewritten as

(10) |

where is the probability density of the perturbation energy, , at in the ensemble. Analogously, denoting , Eq. (9) is rewritten as

(11) |

where is the probability density of the function in the ensemble at .

### 2.3 Linear alchemical transformations

Eqs. (10) and (11) take a particular convenient form when the alchemical potential energy function is linear with respect to :

(12) |

where is the potential energy of the decoupled state and is the so-called binding energy function of the complex, which, critically, is assumed here independent of . It is straightforward to show that for an alchemical potential of the form (12) the perturbation potential is proportional to the binding energy function

(13) |

and that the -derivative employed in the TI formula is independent of and is given by the binding energy function itself:

(14) |

Inserting Eq. (13) into Eq. (10) we obtain

(15) |

where , which plays a central role in this work, is the probability density of the binding energy function in the uncoupled state, that is in the state in which the ligand is uniformly distributed in the binding site region and receptor and ligand do not interact with each other.

Mathematically, Eq. (15) expresses the fact that the binding constant profile is given by the two-sided Laplace transform of . In turn, the binding free energy profile is related to the by Eq. (6), and the excess binding free energy is . Finally, the Potential Distribution Theorem[38] provides a relationship between and the binding energy distributions at any other value of :

(16) |

It is therefore apparent that knowledge of determines all of the other quantities that characterize the alchemical transformation, including the binding free energy profile and the binding free energy. In this respect the function serves the same role in the alchemical theory of binding that the density of states plays in classical statistical mechanics. For instance note the parallel between Eq. (16) and the well know Boltzmann’s relationship , which gives the energy distribution of a system at any temperature given the density of states.

The main aim of the work presented here is to develop a probabilistic analytical model for from which to derive all of the other quantities discussed above and, conversely, to estimate the parameters of the model against the results of alchemical molecular simulations.

### 2.4 Statistical model for

In this section we turn to derivation of a model for the probability distribution, , of the binding energy in the uncoupled ensemble at , that is in the state when the ligand and the receptor are not interacting. Note the key distinction between the state from which samples are collected (the uncoupled ensemble), and the quantity being sampled (the binding energy function): we are interested in the distribution of binding energies, which are in general not zero, when receptor and ligand configurations are sampled in the absence of receptor-ligand interactions. As illustrated in Fig. 1, due to the fact that in absence of interactions clashes between ligand and receptor atoms are likely, is characterized by a long tail at large and positive values of the binding energy. has also a much smaller, but finite, tail at favorable binding energies. The low energy tail of is amplified by the exponential term, to yield, through Eq. (16), the expected distribution of binding energies in the bound state narrowly centered around a favorable mean binding energy (see Fig. 1).

To start thinking about a functional form for , here we consider a monoatomic ligand. The model will be generalized to arbitrary ligand molecules later in this section. Consider Fig. 2, which depicts the binding site volume containing receptor atoms arranged in some configuration, with a monoatomic ligand placed in two alternative positions (blue spheres, labeled “B” and “C”). The binding site volume here is a represented as a rectangular box, although the following arguments apply to any definition of the binding site volume. Because at it does not interact with receptor atoms, the ligand occupies the binding site with uniform probability. The effective interaction energy between the ligand and the receptor is the sum of many individual interatomic interactions. In regions of the binding site sufficiently removed from the interior of the atoms of the receptor, as for example location “B” in Fig. 2, the interaction energy is approximately the result of many, relatively weak and favorable pair-wise interactions of similar magnitude. This mode of interaction describes the behavior of at favorable values of the binding energy.

When, instead, the ligand is found within the inner core of a receptor atom, as location “C” in Fig. 2, the repulsion energy of that individual interaction dominates all of the others. This interaction mode, expected to important to describe the high energy tail of . The atomic core of an atom is considered here as its most immediate region where its interaction potential dominates over other longer-ranged interactions. Because receptor atoms can not interpenetrate each other to more than a certain degree, strong repulsive interactions can be understood as the result of a single pair interaction rather than of cooperative contributions of many interactions.

To model these two distinct statistical behaviors, it is useful to think of the ligand-receptor binding energy as the results of two contributions

(17) |

where represent the collisional component, which corresponds to short-ranged repulsive interactions which dominate within the atomic cores and are represented by a single pair interaction at a time, and is the background component given by the sum of contributions of many weak and favorable longer-ranged pair interactions. Motivated by the central limit theorem, we model the probability distribution of the background component by a Gaussian distribution:

(18) |

where is the mean and is the standard deviation of the distribution.

The statistics of the collisional energy in the region outside the atomic cores is unimportant since is much smaller than there. On the other hand, when inside one of the atomic cores, is large and positive. Here, for a single pair collision, we assume a repulsive pair potential of the Lennard-Jones (LJ), Weeks-Chandler-Andersen (WCA) form

(19) |

which, as shown in the Appendix, within the atomic core region corresponds to the binding energy distribution

(20) |

where is Heavyside’s step function, , and . Here is an adjustable energy parameter that defines the level set of the boundary of the core of receptor atoms. It is defined as the repulsive energy above which the energy of the collision follows the probability density (20).

We show in the Appendix that, under reasonable assumptions, the form (20) of the probability density for the collisional binding energy component applies for a receptor composed of many atoms, albeit perhaps with values of parameters and not obviously related to the assumed Lennard-Jones form of the repulsive potential.

We now turn to the generalization of the binding energy probability distribution for a polyatomic ligand. In this case, because it is now the sum over both multiple ligand atoms and receptor atoms, the distribution of the background component is expected to obey the central limit theorem to an even greater extent and, consequently, it is expected to continue to be well described by the Gaussian form (18), where now the parameters and refer to average binding energy and corresponding standard deviation for the whole ligand rather than a single atom.

Even though the total collisional energy is the sum of the collisional energies of each ligand atom, the central limit theorem is not applicable because the mean and variances of each contribution, described by probability density (20), are undefined. We can assume however that the collisional energy is dominated by the largest repulsive interaction among all of the ligand atoms , where is the collisional energy of ligand atom . The probability density of the maximum, , of a set of independent random variables distributed according to the probability density is given by the expression[39]

(21) |

where is the integrated form of , that is the cumulative distribution corresponding to . In general, the positions of the atoms of the ligand are not independent so Eq. (21) is an approximation. It is expected however that this form, with an effective number of statistically independent number of atoms groups, , is of general applicability. If the ligand is small and rigid it will behave as a single atom. On the other extreme, a large and flexible ligand can be thought of being composed of groups of atoms with nearly uncorrelated position.

Combining Eqs. (37), (42), and (21) yields

(22) |

where is the normalization factor and the other symbols have the same meaning as in Eq. (20).

The probability density (22) is the probability density of the collisional energy conditional on there being a collision. That is of being at least one atomic clash defined as . Denoting by the probability that one such collision occurs when the ligand is within the binding site volume, and by its complement, we obtain the following expression for the probability density of the collisional energy:

(23) |

where is given by Eq. (22) and the -function expresses the fact that outside the core region the collisional energy is zero.

Finally, assuming that the background and collisional contributions are statistically independent, the probability density of the total binding energy is given by the convolution of the respective probability densities:

(24) |

where is a Gaussian distribution (18) and the collisional density given by (23). Substituting these definitions in Eq. (24) we obtain:

(25) |

where is the Gaussian distribution of mean and standard deviation [see Eq. (18)]. While the integral in Eq. (25) is not available in analytical form, it is amenable to numerical computation by for example Gauss-Hermite quadrature. Fig. 1 shows for a particular choice of the parameters , , , , and . Also shown in this figure is [see Eq. (16)]. These distributions indeed reflect the behavior of binding energy distributions obtained from actual molecular simulations (see Results).

### 2.5 Model for the free energy profile

Since the Laplace transform of a convolution of two functions is the product of their Laplace transforms, from Eq. (15) and Eqs. (20) and (18), we have

(26) |

where

(27) |

where is the two-sided Laplace transform of . From Eq. (22):

(28) |

Finally, the two-sided Laplace transform of (a Gaussian) is:

(29) |

An illustrative binding free energy profile, , obtained from Eqs. (26), (27), (28) and (29) for some choice of parameter values is shown in Fig. 3. Free energy profiles from simulations indeed follow have the shape illustrated in Fig. 3 (see Results). Note that in this model is given by the sum of the free energies corresponding to the collisional and background processes:

(30) |

### 2.6 Mixture model of background component

The analytical model described so far predicts Gaussian-distributed binding energies at , where the collisional contribution is negligible. We have encountered, however, systems displaying bimodal binding energy distributions in this regime (see for example Fig. 7). These occurrences are interpreted as the system undergoing a conformational transition from a high-entropy/high-energy state to a low-entropy/low-energy state as is increased. We found that these systems can be described well by a mixture model of the background binding energy component described by the weighted sum of two Gaussian distributions:

(31) |

where and () are the probabilities of occurrence of conformational states and at , respectively, and and are the corresponding average and standard deviation parameters of their background components at .

To formulate the full model of for this case, Eq. (31) replaces the single Gaussian in Eq. (25). The remainder of the analytical theory is unchanged. Note that this model can be expanded to an arbitrary number of states and that it reduces to the single-state model [Eq. (18)] when only one state is present (that is , for example).

### 2.7 Model parameterization

The analytical model of binding defined by Eq. (25) with Eqs. (22) and (18) depends on six parameters: , the average background binding energy in the coupled state, , the standard deviation of the background binding energy in the decoupled state, , the effective Lennard-Jones parameter of the repulsive potential within the atomic core, , the repulsive energy above which the collisional binding energy contribution is dominated by the closest contact, , the effective number of statistically independent atom groups of the ligand, and , the probability that in the uncoupled state the ligand does not overlap with receptor atoms.

These parameters are obtained for each complex by varying them so as to fit Eq. (25) to histograms of binding energies observed in alchemical molecular simulations at multiple values of (see Fig. 5 as an example). Near , where ligand-receptor interactions are established, atomic collisions are unlikely and the binding energy is mainly determined by the background component. Thus, histograms obtained from molecular dynamics trajectories near are most useful in the estimation of the background binding energy parameters and . An initial first guess for these parameters can be extracted from the average and standard deviation of the binding energies at , observing that, because the background energy is assumed to be Gaussian-distributed, its parameters follow linear response behavior upon variation of :

(32) |

(33) |

which can be easily derived by applying the potential distribution theorem [Eq. (16)] to the Gaussian distribution of at : .

Conversely, the histograms at small values are most useful to estimate the collisional energy parameters , , , and , once a first guess for the values of and is available. We observed (see Results), as it would be expected, a high degree of universality of the parameters and , which describe the extent and softness of the repulsive potential within the atomic cores common to all complexes investigated. We used the parameter, which regulates the relative magnitude of the two components in Eq. (25), to match the shape of histograms at intermediate values of . Finally, we employed the parameter to reproduce the shape of the high energy tail of histograms at small values (with larger values describing slower decaying tails). Given the difficulty of binning the unbound high energy portion of binding energies, this last step was performed by also matching at the shape of the free energy profile at small .

The mixture model (Section 2.6) introduces three additional parameters of the background energy model (the relative occupancy of the two states, and one additional set of average and standard deviation parameters of the background component). These are best obtained by fitting the bimodal distribution of binding energies as a function of , exploiting the linear response behavior of each of the average and standard deviation parameters [Eqs. (32) and (33)], and those of the state probabilities:

(34) |

(35) |

where

(36) |

which can be derived by application of the potential distribution theorem to the Gaussian mixture distribution (31).

In future work we will implement an automated mechanism based on maximum likelihood statistical inference to estimate the parameters of the model.[40]

### 2.8 Computational details

The host-guest complexes were prepared as described.[41, 42, 43] Single-decoupling[28] Hamiltonian Replica-exchange Molecular dynamics simulations [44] employed 22 intermediate steps as follows: = , , , , , , , , , , , , , , , , , , , , , and . The calculation employed the OPLS-AA force field[45, 46] and the AGBNP2 implicit solvent model.[27] The replica-exchange (AsyncRE) simulations were started from energy-minimized and thermalized structures from manually docked models. A flat-bottom harmonic restraint with a tolerance of Å between the centers of mass of the host and the guest was applied to define the binding site volume. Each cycle of a replica lasted for 100 picoseconds with 1 fs time-step. The average sampling time for a replica was approximately 10 ns. Calculations were performed on the campus computational grid at Brooklyn College. The binding energies obtained from all replicas were analyzed using UWHAM[32] method and the R-statistical package to compute the binding free energy profile .

## 3 Results

We tested the analytical model of binding presented above on four host-guest complexes: cyclohexanol, nabumetone, and N-tBOC-L-alanine binding to -cyclodextrin[41] and trans-4-methylcyclohexanoate binding to the octa-acid cavitand host[43] (Figs. 4 and 6). The results for the complexes with cyclohexanol, nabumetone, and trans-4-methylcyclohexanoate are shown in Fig. 5 and Table 1. The results for the complex of N-tBOC-L-alanine and -cyclodextrin, which undergoes a -dependent conformational transition, are presented in Fig. 7 and Table 2.

cyclohexanol | |||||||
---|---|---|---|---|---|---|---|

nabumetone | |||||||

trans-4-methylcyclohexanoate |

In kcal/mol

The analytic model fits very well the binding energy distributions and free energy profiles from numerical calculations (Fig. 4). The model correctly interpolates the Gaussian behavior of the binding energy distributions at and the diffuse and asymmetric aspects of the distributions at . As shown in Fig. 4, the distributions at intermediate values present characteristics of both limits and are also correctly described by the model.

Free energy profiles (Fig. 4, right column) are also closely described by the analytic model. For large values of (, approximately), the free energy profiles vary quadratically with , consistent with linear response behavior. The quadratic regime is preceded by a highly non-linear variation of the free energy near . The analytic model correctly captures the singularity of the first derivative of the free energy profile at .[47] The maximum of the free energy corresponds to the value of at which the average binding energy is zero. In general, as it can be shown from Eqs. (9) and (14), the first derivative of the free energy profile is in fact proportional to the average binding energy. The singularity of the first derivative at is, thus, consistent with the undefined first moment of the probability density. As the data in Fig. 4 illustrates, the analytic model successfully interpolates between the linear response regime at and the collisional regime at .

The model parameters obtained by fitting the analytic predictions to the numerical results for three of the four host-guest complexes are listed in Table 1. The values of the standard binding free energies (2nd column) differ from the corresponding free energy profiles (Fig. 5) at by the standard state concentration-dependent term (see Methods). The stronger binding affinities of nabumetone and trans-4-methylcyclohexanoate to their respective hosts relative to cyclohexanol is driven by stronger interaction energies as reflected by the parameter. The average binding energies at the bound state match closely the linear response predictions from Eq. (33): , , and kcal/mol, from Eq. (33) and fitted , parameters (Table 1), for cyclohexanol, nabumetone and trans-4-methylcyclohexanoate, respectively, compared to the direct numerical estimates , , and kcal/mol, from direct numerical averaging of the binding energies from the simulation replicas.

The trend toward stronger interaction energies is partially offset by the progressively smaller probabilities of fitting the guest into the host without causing atomic clashes, as illustrated by the parameter (Table 1, 5th column). For example, the estimates indicate that it is almost 6 orders of magnitude more difficult to fit trans-4-methylcyclohexanoate into the octa-acid cavitand that it is to fit cyclohexanol into -cyclodextrin. This presumably reflects the fact that the -cyclodextrin interior is larger than that of the octa-acid cavitand, which, in addition, is open only on one end. The variations of could also represent the probabilities of occurrence of binding-competent conformations of guest and host.

As expected, a common set of values of the and parameters, corresponding loosely to the magnitude and softness of the core inter-atomic repulsion potential, describes all of the complexes investigated. The magnitude of the fitted parameter ( kcal/mol) is significantly larger than typical Lennard-Jones force field parameters. This confirms the expectation that these parameters should be interpreted to represent the shape and intensity of the repulsive potential exercised by groups of atoms, rather than by individual atoms.

Finally, in Table 1 we report the fitted values of the the parameter (8th column) which represents the number of statistically independent number of atom groups of the guests. Indeed, values roughly scale as the size of the guest. For example nabumetone binding to -cyclodextrin corresponds to , whereas the smaller cyclohexanol has . Despite the smaller size, the value of trans-4-methylcyclohexanoate binding to the octa-acid cavitand is similar to that of nabumetone, possibly reflecting an influence of the nature of the receptor on the value of this parameter.

N-tBOC-L-alanine | |||||||
---|---|---|---|---|---|---|---|

state | |||||||

state |

In kcal/mol

The complex of N-tBOC-L-alanine with -cyclodextrin (Fig. 6) undergoes a transition along the alchemical path from one conformational state (state in Fig. 6), in which the carboxylate terminus is hydrogen-bonded to one of the hydroxyl groups of the wide rim of the host, to another conformational state (state in Fig. 6), in which the carboxylate group is rotated toward the solvent, and the body of the aminoacid, including the tert-butyl moiety, is seated deeper in the host interior than in state .

The transition is particular evident in the distributions of binding energy values as a function of (Fig. 7). For the other complexes studied (Fig. 5) the peaks of the binding energy distributions linearly shift toward more negative values as is increased. In contrast, the peak of the distribution for N-tBOC-L-alanine, starting at , instead of shifting, it develops a low energy peak as is increased. Between and the distribution is bimodal, with a high energy mode (corresponding to state ) centered near kcal/mol and the low energy mode (corresponding to state ) near kcal/mol. Starting at , where state is predominant, population abruptly shifts to state , which becomes the predominant state at . At the two states have almost the same population. This behavior is the hallmark of a pseudo first-order phase equilibrium,[48, 49] in which two phases, characterized by compensating differences in average energy and entropy, cohexist within the same free energy range.

Indeed, the mixture model (Section 2.6 and Table 2) captures the tread off between interaction energy (the parameter) and probability of occurrence (the parameter). State has a much higher probability of occurrence in the uncoupled state than state . However in state the guest interacts more favorably with the host than in state by about kcal/mol (Table 2, 3rd column). For small values, the binding energy component of the alchemical potential energy function [Eq. (12)] is small and the complex tends to visit exclusively state given its overwhelmingly large probability. However, as is increased, the weight of the binding energy term increases and state becomes competitive with state .

The conformational transition is also apparent in the abrupt change of slope of the binding free energy profile near (Fig. 7). As mentioned, the slope of the binding free energy profile corresponds to the average binding energy as a function of . Correspondingly, at , the system transitions to a state of lower binding energy thereby causing the change in slope. Note that, while the transition appears slight in the binding free energy profile, the shift in slope causes a significant decrease (by about 1 kcal/mol) of the binding free energy. Furthermore, the shift in slope of the binding free energy profile and the bimodal character of the binding energy distributions can not be described without invoking the mixture model.

## 4 Discussion

The results obtained as part of this work clearly indicate that it is feasible to represent alchemical binding free energy profiles by parameterized analytic functions. The model offers a rationalization for the shape of the free energy profile and of the binding energy distributions. The critical feature of the model is the ability to bridge the two limiting behaviors of the free energy profile, the region near determined by atomic clashes and the region near characterized by linear response. The main conceptual advance that enabled this versatility of the model is the description of the binding energy in the uncoupled state of the complex as the sum of two interaction energy components with radically distinct statistical signatures. The first, termed “collisional” interaction energy, describes atomic clashes dominated by nearest neighbor pairs and follows the statistics of the maximum of a set of random variables. The second, that we termed “background” interaction energy, describes the sum of many weak and favorable interatomic interactions and follows the central limit theorem. The two statistical components, assumed statistically independent, are then combined using standard convolution to obtain the distribution of the total binding energy and, by means of a Laplace transformation, the binding free energy profile.

The general strategy of describing free energy changes along a thermodynamic path by means of probability models applied to the “decoupled” end point has a long history in the treatment of solvation phenomena in condensed phases. Examples are scaled particle theory, particle insertion models, and information/fluctuation theories.[50, 33, 34, 35, 37] Early work in this area by Pratt & Chandler,[51] introduced the connection between the solubility of hard sphere particles[52] and the probability of formation of suitable cavities in the neat solvent, a prediction that was confirmed by Pangali, Rao, and Berne[53] and subsequent computer simulation work.[54, 55, 56] Both Pohorille and Pratt[57] and Hummer et al.,[35] elaborated on the concept of, , the probability that a cavity of size occurs in a neat liquid, which was first introduced in scaled particle theory[50, 58] to model the probability of occurrences of cavities based on the moments of the number of solvent molecules that occupy the solute volume in neat water.

The same essential concepts have been used here to formulate a model connecting the free energy of inserting a ligand molecule into a receptor binding site to probability distributions collected in the decoupled state. The main difference between the solvation process, seen as solute insertion, and binding, seen as ligand insertion, is that, unlike a homogeneous solution, the distribution of receptor atoms is not homogeneous. In particular, there are regions in the receptor binding site where a ligand can fit without requiring conformational reorganization. Conversely, there are interior regions of the receptor from where the ligand is effectively excluded. The model we formulated takes into account these complex geometric and energetic effects in terms of effective physical parameters which are set so as to reproduce the results of alchemical molecular simulations. The close agreement obtained here between model predictions and molecular simulations of a set of relatively simple but yet chemically-relevant host-guest complexes, is evidence that the model is sound and deserving of further investigation and development.

The physical parameters returned by the model can be useful in the characterization of molecular complexes. For instance, the and parameters measure the strength of favorable electrostatic and dispersion receptor-ligand interactions as a function of [Eq. (33)]. In particular, the parameter measures the linear response of the complex to the establishment of favorable interactions. A larger can be an indication, for example, of a larger polarizability of the receptor and can be interpreted in terms of local dielectric constant.[59, 60, 61, 62] On the other hand, the parameter, which is the probability that ligand and receptor do not overlap, is a measure of the size of the binding cavity, if present, relative to the size of the ligand, or, alternatively, the likelihood of the formation of a suitable binding cavity that can fit the ligand. Similarly, the parameter can be taken as a measure of ligand size and ligand flexibility. As discussed, the mixture model parameters indicate the presence of multiple conformational states of the complex and of their average interaction energies and relative probabilities. Taken together, these parameters, when tabulated over a series of systems, can be useful to characterize and categorize receptor-ligand complexes and, when correlated with binding affinities, can inform receptor and ligand design.

Future work will also assess the potential usefulness of the analytic model toward the improvement of alchemical simulation protocols. A possible application of the model is as a framework to analyze and measure free energy changes near the decoupled state without the need for extrapolation[22] or soft-core alchemical potentials.[32, 63] As analyzed by Simonson[47] and reproduced by our model, the first derivative of the free energy profile has a singularity at . This singularity causes problems for numerical free energy estimators,[64, 31] which are usually addressed by the adoption of non-linear soft-core alchemical potentials.[65, 66] These difficulties can also be addressed by replacing the numerical estimation of free energies near the singularity with the estimation of the parameters (which are free of singularities) of the analytic free energy function (30). The analytic model can also be potentially useful to evaluate alchemical thermodynamic lengths to optimize the schedule[67, 68] of alchemical transformations.

The model, as currently expressed, is limited to single-decoupling linear alchemical transformations.[12, 69] Single-decoupling requires pre-averaging to the solvent degrees of freedom by means of a solvent potential of mean force treatment[70] implemented here using the AGBNP2[27] implicit solvent model. The requirement of linearity of the alchemical transformation with respect to the charging parameter is introduced so as to deploy potential distribution theorem identities[38] relating binding energy distributions at different values of . Future work will attempt to extend the model to non-linear coupling schemes and to explicit solvation models. Binding free energy calculations with explicit solvation are typically conducted according to the double-decoupling scheme,[19] which is based on the difference of the free energies of coupling the ligand to the hydrated receptor and the free energy of solvation. Hence, it is conceivable that an analogous analytic model can be developed for double-decoupling alchemical calculations by considering each free energy leg separately.

## 5 Conclusion

We have presented a parameterized analytical model describing the free energy profile of linear single-decoupling alchemical binding free energy calculations. The parameters of the model, which are physically motivated, are obtained by fitting model predictions to numerical simulations. The validity of the model has been assessed on a set of host-guest complexes. The model faithfully reproduces the binding free energy profiles and the probability densities of the perturbation energy as a function of the alchemical progress parameter . The model offers a rationalization for the characteristic shape of the free energy profiles. The parameters obtained from the model are potentially useful descriptors of the association equilibrium of molecular complexes.

## References

- [1] Albert C Pan, David W Borhani, Ron O Dror, and David E Shaw. Molecular determinants of drug–receptor binding kinetics. Drug Discovery Today, 18(13):667–673, 2013.
- [2] James C Gumbart, Benoît Roux, and Christophe Chipot. Efficient determination of protein–protein standard binding free energies from first principles. J. Chem. Theory Comput., 9(8):3789–3798, 2013.
- [3] Vittorio Limongelli, Massimiliano Bonomi, and Michele Parrinello. Funnel metadynamics as accurate binding free-energy method. Proc. Natl. Acad. Sci., 110(16):6358–6363, 2013.
- [4] Andrea Cavalli, Andrea Spitaleri, Giorgio Saladino, and Francesco L Gervasio. Investigating drug–target association and dissociation mechanisms using metadynamics-based algorithms. Accounts of chemical research, 48(2):277–285, 2014.
- [5] Jeffrey Comer, James C Gumbart, Jérôme Hénin, Tony Lelièvre, Andrew Pohorille, and Christophe Chipot. The adaptive biasing force method: everything you always wanted to know but were afraid to ask. J. Phys. Chem. B, 119(3):1129–1151, 2014.
- [6] Francesco Saverio Di Leva, Ettore Novellino, Andrea Cavalli, Michele Parrinello, and Vittorio Limongelli. Mechanistic insight into ligand binding to G-quadruplex DNA. Nucl. Acids Res., pages 5447–5455, 2014.
- [7] Pratyush Tiwary, Vittorio Limongelli, Matteo Salvalaglio, and Michele Parrinello. Kinetics of protein–ligand unbinding: Predicting pathways, rates, and rate-limiting steps. Proc. Natl. Acad. Sci., 112(5):E386–E391, 2015.
- [8] Robert B Sandberg, Martina Banchelli, Carlo Guardiani, Stefano Menichetti, Gabriella Caminati, and Piero Procacci. Efficient nonequilibrium method for binding free energy calculations in molecular dynamics simulations. J. Chem. Theory Comput., 11(2):423–435, 2015.
- [9] Yinglong Miao, Victoria A Feher, and J Andrew McCammon. Gaussian accelerated molecular dynamics: Unconstrained enhanced sampling and free energy calculation. J. Chem. Theory Comput., 11(8):3584–3595, 2015.
- [10] Ali S Saglam and Lillian T Chong. Highly efficient computation of the basal kon using direct simulation of protein-protein association with flexible molecular models. The Journal of Physical Chemistry B, 120(1):117–122, 2015.
- [11] John D. Chodera, David L. Mobley, Michael R. Shirts, Richard W. Dixon, Kim Branson, and Vijay S. Pande. Alchemical free energy methods for drug discovery: Progress and challenges. Curr. Opin. Struct. Biol., 21:150–160, 2011.
- [12] Emilio Gallicchio and Ronald M Levy. Recent theoretical and computational advances for modeling protein-ligand binding affinities. Adv. Prot. Chem. Struct. Biol., 85:27–80, 2011.
- [13] David L Mobley and Pavel V Klimovich. Perspective: Alchemical free energy calculations for drug discovery. J. Chem. Phys., 137:230901, 2012.
- [14] Michel A Cuendet and Mark E Tuckerman. Alchemical free energy differences in flexible molecules from thermodynamic integration or free energy perturbation combined with driven adiabatic dynamics. J. Chem. Theory Comput., 8(10):3504–3512, 2012.
- [15] B. L. Tembe and J. A. McCammon. Ligand-receptor interactions. Comput. Chem., 8:281, 1984.
- [16] William L. Jorgensen. Interactions between amides in solution and the thermodynamics of weak binding. J. Am. Chem. Soc., 111:3770–3771, 1989.
- [17] V. A. Payne, N. Matubayasi, L. Reed Murphy, and R. M. Levy. Monte carlo study of the effect of pressure on hydrophobic association. J. Phys. Chem. B, 101:2054–2060, 1997.
- [18] E. Gallicchio, M. M. Kubo, and R. M. Levy. Entropy-enthalpy compensation in solvation and ligand binding revisited. J. Am. Chem. Soc., 120:4526–27, 1998.
- [19] M. K. Gilson, J. A. Given, B. L. Bush, and J. A. McCammon. The statistical-thermodynamic basis for computation of binding affinities: A critical review. Biophys. J., 72:1047–1069, 1997.
- [20] S Boresch, F Tettinger, M Leitgeb, and M Karplus. Absolute binding free energies: A quantitative approach for their calculation. J. Phys. Chem. B, 107:9535–9551, 2003.
- [21] Chipot and Pohorille (Eds.). Free Energy Calculations. Theory and Applications in Chemistry and Biology. Springer Series in Chemical Physics. Springer, Berlin Heidelberg, Berlin Heidelberg, 2007.
- [22] Nan-jie Deng, Peng Zhang, Piotr Cieplak, and Luhua Lai. Elucidating the energetics of entropically driven protein–ligand association: calculations of absolute binding free energy and entropy. J. Phys. Chem. B, 115(41):11902–11910, 2011.
- [23] Terry P Lybrand, J Andrew McCammon, and Georges Wipff. Theoretical calculation of relative binding affinity in host-guest systems. Proc. Natl. Acad. Sci. USA, 83(4):833–835, 1986.
- [24] Julien Michel, Marcel L. Verdonk, and Jonathan W. Essex. Protein-ligand complexes: Computation of the relative free energy of different scaffolds and binding modes. Journal of Chemical Theory and Computation, 3:1645–1655, 2007.
- [25] Anita de Ruiter and Chris Oostenbrink. Free energy calculations of protein–ligand interactions. Curr. Op. Chem. Biol., 15:547–552, 2011.
- [26] Lingle Wang, Yujie Wu, Yuqing Deng, Byungchan Kim, Levi Pierce, Goran Krilov, Dmitry Lupyan, Shaughnessy Robinson, Markus K Dahlgren, Jeremy Greenwood, Donna L. Romero, Craig Mass, L. Jennifer Knight, Thomas Steinbrecher, Thijs Beuming, Wolfgang Damm, Ed Harder, Woody Sherman, Mark Brewer, Ron Wester, Mark Murcho, Leah Frye, Ramy Farid, Teng Lin, David L. Mobley, William L. Jorgensen, Bruce J. Berne, Richard A. Friesner, and Robert Abel. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc., 137:2695–2703, 2015.
- [27] Emilio Gallicchio, Kristina Paris, and Ronald M. Levy. The AGBNP2 implicit solvation model. J. Chem. Theory Comput., 5:2544–2564, 2009.
- [28] Emilio Gallicchio, Mauro Lapelosa, and Ronald M. Levy. Binding energy distribution analysis method (BEDAM) for estimation of protein-ligand binding affinities. J. Chem. Theory Comput., 6:2961–2977, 2010.
- [29] E. Gallicchio and R. M. Levy. Prediction of SAMPL3 host-guest affinities with the binding energy distribution analysis method (BEDAM). J. Comp. Aided Mol. Design., 25:505–516, 2012.
- [30] E. Gallicchio. Role of ligand reorganization and conformational restraints on the binding free energies of DAPY non-nucleoside inhibitors to HIV reverse transcriptase. Mol. Biosc., 2:7–22, 2012.
- [31] Michael R Shirts and John D Chodera. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys., 129:124105, 2008.
- [32] Zhiqiang Tan, Emilio Gallicchio, Mauro Lapelosa, and Ronald M. Levy. Theory of binless multi-state free energy estimation with applications to protein-ligand binding. J. Chem. Phys., 136:144102, 2012.
- [33] B. Widom. Potential-distribution theory and the statistical mechanics of fluids. J. Phys. Chem., 86:869–872, 1982.
- [34] L. R. Pratt and A. Pohorille. Theory of hydrophobicity: transient cavities in molecular liquids. Proc Natl Acad Sci U S A, 89:2995–2999, 1992.
- [35] G. Hummer, S. Garde, A. E. García, A. Pohorille, and L. R. Pratt. An information theory model of hydrophobic interactions. Proc. Natl. Acad. Sci. USA, 93:8951–8955, 1996.
- [36] T. Simonson. Dielectric relaxation in proteins: Microscopic and macroscopic models. International Jrnl. of Quantum Chemistry, 73:45–57, 1999.
- [37] D. M. Huang and D. Chandler. The hydrophobic effect and the influence of solute-solvent attractions. J. Phys. Chem. B, 106:2047–2053, 2002.
- [38] Tom L. Beck, Michael E. Paulaitis, and Lawrence R. Pratt. The Potential Distribution Theorem and Models of Molecular Solutions. Cambridge University Press, New York, 2006.
- [39] E. J. Gumbel. Statistics of Extremes. Dover Publications, New York, 2012.
- [40] Tai-Sung Lee, Brian K Radak, Anna Pabis, and Darrin M York. A new maximum likelihood approach for free energy profile construction from molecular simulations. J. Chem. Theory Comput., 9(1):153–164, 2012.
- [41] Lauren Wickstrom, Peng He, Emilio Gallicchio, and Ronald M. Levy. Large scale affinity calculations of cyclodextrin host-guest complexes: Understanding the role of reorganization in the molecular recognition process. J. Chem. Theory Comput., 9:3136–3150, 2013.
- [42] Emilio Gallicchio, Haoyuan Chen, He Chen, Michael Fitzgerald, Yang Gao, Peng He, Malathi Kalyanikar, Chuan Kao, Beidi Lu, Yijie Niu, Manasi Pethe, Jie Zhu, and Ronald M Levy. BEDAM binding free energy predictions for the SAMPL4 octa-acid host challenge. J. Comp. Aided Mol. Des., 29:315–325, 2015.
- [43] Rajat Kumar Pal, Kamran Haider, Divya Kaur, William Flynn, Junchao Xia, Ronald M. Levy, Tetiana Taran, Lauren Wickstrom, Tom Kurtzman, and Emilio Gallicchio. A combined treatment of hydration and dynamical effects for the modeling of host-guest binding thermodynamics: The SAMPL5 blinded challenge. J. Comp. Aided Mol. Des., 31:29–44, 2016.
- [44] Emilio Gallicchio, Junchao Xia, William F Flynn, Baofeng Zhang, Sade Samlalsingh, Ahmet Mentes, and Ronald M Levy. Asynchronous replica exchange software for grid and heterogeneous computing. Computer Physics Communications, 196:236–246, 2015.
- [45] W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives. Developement and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc., 118:11225–11236, 1996.
- [46] G. A. Kaminski, R. A. Friesner, J. Tirado-Rives, and W. L. Jorgensen. Evaluation and reparameterization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B, 105:6474–6487, 2001.
- [47] Thomas Simonson. Free energy of particle insertion: an exact analysis of the origin singularity for simple liquids. Molecular Physics, 80(2):441–447, 1993.
- [48] Jaegil Kim and John E Straub. Generalized simulated tempering for exploring strong phase transitions. J. Chem. Phys., 133:154101, 2010.
- [49] Andrew Binder, Tony Lelièvre, and Gideon Simpson. A generalized parallel replica dynamics. J. Comp. Phys., 284:595–616, 2015.
- [50] H Reiss, HL Frisch, and JL Lebowitz. Statistical mechanics of rigid spheres. J. Chem. Phys., 31(2):369–380, 1959.
- [51] L. R. Pratt and D. Chandler. Theory of the hydrophobic effect. J. Chem. Phys., 67:3683–3704, 1977.
- [52] Argyroula Stamatopoulou and Dor Ben-Amotz. Cavity formation free energies for rigid chains in hard sphere fluids. J. Chem. Phys., 108(17):7294–7300, 1998.
- [53] C. Pangali, M. Rao, and B. J. Berne. Hydrophobic hydration around a pair of apolar species in water. J. Chem. Phys., 71:2975–2981, 1979.
- [54] A. Wallqvist and B. J. Berne. Molecular dynamics study of the dependence of water solvation free energy on solute curvature and surface area. J. Phys. Chem., 99:2885–2892, 1994.
- [55] B. J. Berne. Inferring the hydrophobic interaction from the properties of neat water. Proc. Natl. Acad. Sci. USA, 93:8800–8803, 1996.
- [56] E. Gallicchio, M. M. Kubo, and R. M. Levy. Enthalpy-entropy and cavity decomposition of alkane hydration free energies: Numerical results and implications for theories of hydrophobic solvation. J. Phys. Chem. B, 104:6271–6285, 2000.
- [57] A. Pohorille and L. R. Pratt. Cavities in molecular liquids and the theory of hydrophobic solubilities. J. Am. Chem. Soc., 112:5066–5074, 1990.
- [58] R. A. Pierotti. A scaled particle theory of aqueous and nonaqueous solutions. Chemical Reviews, 76:717–26, 1976.
- [59] Georgios Archontis and Thomas Simonson. Dielectric relaxation in an enzyme active site: molecular dynamics simulations interpreted with a macroscopic continuum model. J. Am. Chem. Soc., 123(44):11047–11056, 2001.
- [60] Thomas Simonson. Gaussian fluctuations and linear response in an electron transfer protein. Proc. Natl. Acad. Sci., 99(10):6544–6549, 2002.
- [61] Thomas Simonson. Dielectric relaxation in proteins: the computational perspective. Photosynthesis research, 97(1):21–32, 2008.
- [62] Hugh Nymeyer and Huan-Xiang Zhou. A method to determine dielectric constants in nonhomogeneous systems: application to biological membranes. Biophys. J., 94(4):1185–1193, 2008.
- [63] Floris P. Buelens and Helmut Grubmüller. Linear-scaling soft-core scheme for alchemical free energy calculations. J. Comput. Chem., 33:25–33, 2012.
- [64] Michael R Shirts and Vijay S Pande. Comparison of efficiency and bias of free energies computed by exponential averaging, the bennett acceptance ratio, and thermodynamic integration. J Chem Phys, 122:144107, 2005.
- [65] Andrew Pohorille, Christopher Jarzynski, and Christophe Chipot. Good practices in free-energy calculations. J. Phys. Chem. B, 114:10235–10253, 2010.
- [66] Michael R Shirts and David L Mobley. An introduction to best practices in free energy calculations. Biomolecular Simulations: Methods and Protocols, pages 271–311, 2013.
- [67] JC Schön. A thermodynamic distance criterion of optimality for the calculation of free energy changes from computer simulations. J. Chem. Phys., 105(22):10072–10083, 1996.
- [68] Daniel K Shenfeld, Huafeng Xu, Michael P Eastwood, Ron O Dror, and David E Shaw. Minimizing thermodynamic length to select intermediate states for free-energy calculations and replica-exchange simulations. Phys. Rev. E, 80(4):046705, 2009.
- [69] Daniele Di Marino, Ilda D’Annessa, Holly Tancredi, Claudia Bagni, and Emilio Gallicchio. A unique binding mode of the eukaryotic translation initiation factor 4E for guiding the design of novel peptide inhibitors. Prot. Sci., 24:1370–1382, 2015.
- [70] B. Roux and T. Simonson. Implicit solvent models. Biophys. Chem., 78:1–20, 1999.

## Appendix A Appendix

### a.1 Derivation of Eq. (20)

Consider two particles interacting by the pair potential (19) in which one particle (representing the receptor) is fixed at the origin and the other (representing the ligand) is uniformly distributed in a sphere of radius centered at the origin (Fig. 8). Here we assume that (the distance beyond which the Lennard-Jones WCA potential is zero). We will derive the probability density of the interaction energy , where is the distance between the two particles, by differentiating the cumulative probability function defined as the probability that, given that the ligand particle is uniformly distributed in the sphere, the interaction energy is greater than the given value . The value of the WCA potential at is denoted by ; is therefore the smallest allowed interaction energy.

The probability that the pair interaction energy is smaller than is given by:

(37) |

where the Heaviside function imposes the requirement that be larger than the minimum values, is the volume of the sphere of radius and is the volume of the sphere of radius , where is inter-particle distance at which the LJ WCA potential has value . From Eq. (19) we have

(38) |

where is the minimum of the Lennard-Jones pair potential and

(39) |

Inserting Eq. (38) into Eq. (37) and differentiating with respect to yields Eq. (20), which expresses a normalized distribution as it can be verified by direct integration using the fact that

(40) |

Now consider a receptor composed of atoms interacting with a monoatomic ligand with the WCA repulsive potential (19). The cumulative probability is given by the expression , as in Eq. (37), where now is the volume of the region of the receptor where the WCA potential is larger than and, similarly, is the volume where the WCA potential is larger than . We can approximate by the van der Waals volume of a molecule with atoms with van der Waals radii , given by Eq. (38) below, corresponding to distance at which the value of WCA repulsive pair potential is equal to . Differentiating the cumulative distribution with respect to , yields:

(41) |

where is the van der Waals surface of the receptor when the atomic radii are set to , and and are both functions of [see Eqs. (38) and (39)].

Eq. (41) is interesting because it links the probability density of the collisional interaction energy to the shape of the receptor. There are numerical algorithms (some analytical) to obtain the van der Waals surface area of a molecule. For large , is small and atomic overlaps between receptor atoms can be ignored. In this limit , and assuming that , we obtain

(42) |

which has the same form as the probability density of the collisional energy for one receptor atom.