# [

###### Abstract

Free energy profile (FE Profile) is an essential quantity for the estimation of reaction rate and the validation of reaction mechanism. For chemical reactions in condensed phase or enzymatic reactions, the computation of FE profile at ab initio (ai) quantum mechanical/molecular mechanics (QM/MM) level is still far too expensive. Although semiempirical (SE) method can be hundreds or thousands of times faster than the ai methods, the accuracy of SE methods is often unsatisfactory, due to the approximations that have been adopted in these methods. In this work, we proposed a new method termed MBAR+wTP, in which the ai QM/MM free energy profile is computed by a weighted thermodynamic perturbation (TP) correction to the SE profile generated by the multistate Bennett acceptance ratio (MBAR) analysis of the trajectories from umbrella samplings (US). The weight factors used in the TP calculations are a byproduct of the MBAR analysis in the post-processing of the US trajectories, which are often discarded after the free energy calculations. The raw ai QM/MM free energy profile is then smoothed using Gaussian process regression, in which the noise of each datum is set to be inversely proportional to the exponential of the reweighting entropy. The results show that this approach can enhance the efficiency of ai FE profile calculations by several orders of magnitude with only a slight loss of accuracy. This method can significantly enhance the applicability of ai QM/MM methods in the studies of chemical reactions in condensed phase and enzymatic reactions.

[SI-]SI \altaffiliationCurrent address: NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China \alsoaffiliationNYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China \alsoaffiliationDepartment of Chemistry and Biochemistry, University of Oklahoma, Norman OK 73019, United States of America ]Accelerated Computation of Free Energy Profile at ab initio Quantum Mechanical/Molecular Mechanics Accuracy via a Semi-Empirical Reference-Potential. I. Weighted Thermodynamics Perturbation

## 1 Introduction

Free energy profile (FE profile), defined as the free energy variation along the reaction coordinate(s) (RC), is now widely used to calculate the reaction rate and to examine the reaction mechanism.^{1} Based on the transition state theory (TST), the reaction rate is determined by the free energy barrier and the curvature of the free energy profile near the transition state.^{2, 3} Due to the exponential decrease of the population with respect to the free energy, a brute force simulation is always insufficient for the states aways from the reactant and the product. Umbrella sampling (US) is now routinely employed to enhance the sampling of the system along the RC by restraining the RC around a series of values using biasing potentials. The trajectories are then analyzed using the multistate Bennett acceptance ratio (MBAR)^{4, 5} or its binned variant, the weighted histogram analysis method (WHAM)^{6, 7, 8, 9} method, to construct the free energy profile. Multiple windows are required in the US sampling, and the number of windows is determined by the curvature of the free energy profile. MBAR and WHAM are global algorithms, which assume that a global equilibrium of all the windows have been reached.^{10} Besides, the samples must be as statistically independent as possible. In order to obtain a statistically sound result, a long time simulation for each window is needed. Another interesting method for data-processing that has emerged recently is the variational free energy profile (vFEP) method developed by Lee et al.^{11, 12} In this method, free energy profile is parametrically constructed via a maximum-likelihood analysis of the fitting functions for regression. It may out-perform WHAM and MBAR when data are scarce and overlaps between neighboring windows are insufficient. Similar ideas were also reported in other works.^{13, 14}

For the studies of chemical reactions in condensed phase, classical force fields, which lack a finite bond-dissociation energy, are in general not applicable. On the other end of the methodological spectrum, full quantum mechanical methods are not feasible either due to the extremely poor computational scaling with the system size. The hybrid quantum mechanical/molecular mechanical method (QM/MM)^{15} inherits the advantages from both sides, which makes it an appealing method and is nowadays routinely employed. However, the convergence of the FE profiles for chemical and enzymatic reactions in condensed phase is very slow, which requires exhaustive sampling in the high-dimensional phase space, and makes the computation of the FE profile at the ab initio (ai) QM/MM levels^{16, 17, 18} notoriously time consuming, if feasible at all. Therefore, a great amount of QM/MM simulations employed semiempirical (SE) methods such as PM3^{19}, AM1^{20}, MNDO^{21}, PM6^{22}, and density functional theory-based tight-binding (DFTB)^{23, 24} as well as its self-consistent-charge variant (SCC-DFTB)^{25}. Nevertheless, the results from QM/MM simulations employing an SE Hamiltonian are often of poor accuracy in terms of the reaction free energy and free energy barrier^{26}. Consequently, developing an efficient method for the computation of state free energy and free energy profile at ai QM/MM level remains one of the major challenges in modern computational chemistry.

In a pioneer work, Gao applied the dual-Hamiltonian method, aka the reference-potential method, in a study of the hydration free energy.^{27, 28} Muller et al. and Bentzien et al. proposed another dual-Hamiltonian approach to calculate the FE profile for a chemical reaction^{29, 30}, which was later named the paradynamics (PD) method.^{31, 32, 33} In this method, an initial simulation is carried out utilizing empirical valence bond (EVB)^{34, 35, 36} as a reference-potential, which is reasonably close to the real potential, allowing one to use a perturbation rectification in a subsequent step to obtain the free energy profile at the ai level. However, constructing the EVB potential energy surface is nontrivial, and the simulations at high-level Hamiltonian are not completely avoided. Rod and Ryde computed the free energy barrier for a methyl transfer reaction catalyzed by the enzyme catechol O-methyltransferase using a dual Hamiltonian approach, in which the free energy profile sampled at a molecular mechanical level was corrected to the level of density functional theory using thermodynamic perturbation (TP).^{37} However, in their study, the QM degrees-of-freedom were not sampled, which inevitably led to overestimation of the free energy. Later on, Heimdal and Ryde studied the convergence of MM to QM/MM perturbation in the reference-potential method, and noticed that the perturbations typically have convergence problems, and a thoroughly parameterized force field or SQM methods are often required.^{38} However, the perturbation from the low-level Hamiltonian to the high-level Hamiltonian in this work was carried out using the traditional thermodynamic perturbation with equal weight assigned to the snapshots from biased simulations.
Beierlein et al applied this idea in the calculation of free energy in protein-ligand association.^{39} Polyak et al designed a dual Hamiltonian free energy perturbation (DH-FEP) method to calculate the free energy profiles of a model system and chorismate mutase.^{40} However, their method was not rigorous, because in the TP calculation the configurations were sampled on a low-level Hamiltonian but the energy difference between two adjacent windows were calculated on a high-level Hamiltonian. In addition, the constrained dynamics always overestimates the free energy difference in TP calculations. More importantly, the configurations of the target RC (the unsampled state in Eq. 10 of ref. 40) were not unambiguous when being taken into the TP calculations. Therefore, the calculation with two-dimensional RCs worked better than the one with a one-dimensional RC, as they have observed in their study. König et al. combined energy reweighting with Bennett acceptance ratio and developed a new method called non-Boltzmann Bennett acceptance ratio (NBB). However, this method does not yield a minimum variance for the results.^{41, 42} Hudson et al incorporated energy reweighting into the chain-of-replicas method for the computation of pathway free energy.^{43} Because the chain-of-replicas method is a global simulation method instead of a pure post-processing method, the implementation of the reweighting is not trivial. Hudson et al also used the non-equilibrium simulation method to correct the free energy profile of a low-level Hamiltonian to a high-level one.^{44} However, this method requires expensive energy/force evaluation on the high-level Hamiltonian for every step of the simulation, which is not necessarily the preferred method in the first place for such studies.^{45}

In our previous work^{46}, the dual-Hamiltonian approach was used to calculate the solvation free energies of the molecules in the SAMPL4 via an alchemical decoupling process. A classical force field was chosen as the reference-potential, on which level the solvation free energies were estimated using BAR or its multistate variant MBAR. Then a direct TP calculation was performed to obtain the MM-to-QM/MM correction. It has been shown that this (M)BAR+TP method yielded minimum variance for the free energy. Since this (M)BAR+TP method can be applied in the calculation of free energy difference between two thermodynamic states, quite naturally it can also be applied to the calculation of FE profile, because free energy difference between two thermodynamic states is calculated as a generalized FE profile in an extended phase space with one dimension of the coupling parameter appended to the normal phase space. Dybeck incorporated the dual Hamiltonian method with MBAR in the calculation of free energy difference between two states, which can be put forward to the calculations of FE profile.^{47} Although this method yields the globally minimal variance for the free energies, it is an algorithm, where is the number of the intermediate states.

Another class of approaches that has become a new boost in computational chemistry is machine learning (ML) or deep learning (DL).^{48, 49} Based on some training using neural network, an energy correction term in a high-dimensional space (probably a complete phase space or a subspace of it) can be learned. Consequently, potential energy and/or force on the high-level Hamiltonian can be accurately estimated for the structural ensemble generated at the low-level Hamiltonian.^{50, 51, 52, 53, 54, 55, 56, 57} However, for most cases the learned energy/force correction for a chemical reaction is system-specific and short of transferability to other reactions, and the training itself is expensive. Even before the introduction of ML into this field, parameter tuning to high-level ai accuracy for the SE methods and force fields to conduct the simulation has been performed by many groups.^{58, 59, 60, 61, 62, 63, 64}

In this work, we extended the idea in MBAR+TP to the calculations of the FE profiles of one quasi-chemical reaction and three chemical reactions in aqueous solution as shown in Fig. 1. Most importantly, we show that TP calculations should not be carried out in the traditional way, in which equal weights are assigned to the configurations. Instead, each configuration is assigned a weight factor from the MBAR analysis. The reactions studied include (a) main chain dihedral rotation of a butane molecule, (b) reaction of \ceCH3Cl + Cl- -¿ Cl- + CH3Cl, (c) glycine intramolecular proton transfer reaction from the neutral form to the zwitterion form, and (d) aliphatic Claisen rearrangement reaction of allyl vinyl ether to 4-pentenal. We show that the FE profile of high-level Hamiltonian can be obtained in this way with configuration sampling carried out only using the low-level Hamiltonian, and only one hundredth of the configurations were used for the high-level energy calculations. As a consequence, the formidably expensive sampling using the high-level Hamiltonian was avoided, thus enhancing the efficiency by several orders of magnitude. Inevitably, the raw free energy profile computed from this weighted TP are contaminated by statistical noise. Gaussian process regression was applied using the statistical noise, as well as the profile itself, as the input. A smoothed profile and its uncertainty can thus be obtained.

## 2 Method

### 2.1 Umbrella Sampling and Multistate Bennett Acceptance Ratio

For reactions in condensed phase, the reactant and the product states are often separated by a free energy barrier hindering the transition between these two states. In order to characterize this free energy barrier, a reaction path defined by reaction coordinates (RC) in a low dimension should be carefully chosen, and the free energy profile along the reaction path from the reactant to the product is proportional to the logarithm of the probability of visit by the system. The reaction coordinate is usually a function of the collective atomic coordinates x. The reaction rate can be estimated by the ratio of the free energy barrier over , where is Boltzmann constant and is the temperature. When the reaction barrier is much greater than , which is often the case for chemical reactions under mild conditions, the probability of visit to the regions away from either the reactant or the product is rare, especially the visit to the transition state, and computational cost for a brute force simulation is far beyond what we can afford with modern computers. Umbrella sampling (US) method^{65} is a solution to this situation, and it is routinely used nowadays. US is a stratified approach, and each stratum samples a limited range of phase space along the RC by augmenting the original potential energy surface with a restraining bias . A harmonic potential is frequently used as the bias , where and are the target value of RC and the strength of the restraint, respectively, for the th biased window. Both the strength of restraint value and the number of US windows can affect the magnitude of overlap between adjacent windows, which is critical to the quantity of free energy calculations. In this work, values were optimized for each window based on short trial simulations, and overlap between neighboring windows should be larger than 0.03 using an assessment proposed by Klimovich et al. (See SI for details). The trajectories from a set of biased window simulations indexed by () with the potential energy surfaces are then post-processed using the Weighted Histogram Analysis Method (WHAM)^{6, 7, 8, 9} or its “binless” variant, the Multistate Bennett Acceptance Ratio (MBAR)^{4, 5}, globally to obtain the unbiased thermodynamic properties on the original potential energy surface . Both methods have been well-documented in the literature ^{6, 7, 8, 9, 4, 5}.
Here, we will present a brief explanation of the MBAR method for the analysis of US simulations.

According to the MBAR method^{4, 5}, the free energy of the th biased simulation window can be obtained by

(1) |

where is the th frame among configurations in the th biased simulation. Since appear on both the left-hand side and the right-hand side of Eq. 1, these equations were solved in this work by a simple self-consistent iteration method using an in-house program, although some other methods are also available^{4}. Sorting the samples into bins are unnecessary. In other words, MBAR is equivalent to WHAM in the limit that bin widths shrunk to zero and can be derived without the need to invoke histograms.^{4} It is worth emphasizing that is not necessarily among . It can be the free energy of any state of the system with a potential energy function . In particular, if , equals to the unbiased potential energy , and becomes the unbiased free energy .

The weight of the th configuration in the th biased simulation for any Hamiltonian is, akin to Eq. 9 in ref. 4,

(2) |

This definition ensures that for all and for all where . The unbiased probability can be calculated now by

(3) |

and the free energy profile is calculated as

(4) |

The covariance matrix can be obtained from Eq. D8 in Ref. 4 by

(5) |

where N is a matrix, denotes the number of histogram bins and is the total number of samples collected from all the biased simulations, and W is the weight matrix with a dimension of . The elements of W can be obtained via

(6) |

The elements of W can be obtained by

(7) |

The uncertainty in the estimated free energy difference can be computed as

(8) |

### 2.2 Weighted Thermodynamic Perturbation

Thermodynamic Perturbation (TP), also known as Free Energy Perturbation, exponential average, and Zwanzig equation was developed by Zwanzig.^{66}. In TP, only one simulation (normally with the inexpensive Hamiltonian) is required and the free energy difference between the target Hamiltonian and the sampled Hamiltonian can be calculated with the sampled configurations. Using the binned configurations from the US simulations and their weights from the MBAR analysis, the FE profile at high level Hamiltonian can be calculated by TP. According to Eq. 2, the weight for the th configuration in the th biased simulation under the low-level Hamiltonian, which is the unbiased SE Hamiltonian in this study, is

(9) |

where is constant for a given Hamiltonian, which can be canceled in normalization. In a “traditional” unbiased TP calculation, all samples would have equal weights, which is actually a special case of Eq. 9. Suppose we have performed a series of brute-force unbiased simulations to sample the configurational phase covering the reactant, the product and in between. for all the snapshots, becomes the unbiased free energy, and the exponential terms in the numerator and denominator cancel each other. We find , indicating that all the samples have equal weights in a “traditional” unbiased TP calculation.

For a certain histogram bin, the free energy difference between the high-level and low-level Hamiltonians can be obtained via

(10) |

where the subscripts and denote the high-level and the low-level Hamiltonians, respectively, is the histogram bin that the th configuration in the th biased simulation belongs to, and is the unbiased weight of the sample in bin under the low-level Hamiltonian. Here, the delta function singles out the samples inside bin . It is worth emphasizing that , as a test of our methodology, we can also calculate the weights of the samples under the high-level Hamiltonian, and then compute the free energy difference between the low-level Hamiltonian and the high level Hamiltonian in a reverse way. Or we can calculate the weights of the samples under both the low-level and high-level Hamiltonians and then compute the free energy via BAR. All these calculations yield the same results as shown in SI.

In order to obtain the uncertainty , Eq. 10 can be divided into two terms as

(11) |

The uncertainty can be computed by error propagation

(12) |

Here, and correspond to the first and the second terms on the right-hand side of Eq. 11, respectively.

(13) |

where , is the number of configurations falling into bin among the samples from all the biased simulations.

(14) |

where , is the number of configurations pertaining to histogram bin among all the samples.

In this work, the US samplings were performed at the low-level Hamiltonian, of which the FE profile can be obtained. Then, the FE profile of the high-level Hamiltonian can be calculated by

(15) |

The corresponding uncertainty of can be computed by error propagation

(16) |

In the following, we will use direct FE profile to refer to the FE profile obtained from the US samplings at the corresponding Hamiltonian, and use indirect FE profile to refer to the one obtained from TP correction over the direct FE profile from a different Hamiltonian.

In order to characterize the reliability of the TP calculation, “reweighting entropy” is introduced in our previous work^{45}, which is defined as

(17) |

where

(18) |

Here, represents the number of configurations falling into bin among the samples from all the biased simulations. The larger the reweighting entropy is, the more reliable the TP calculation can be. has a maximum value of 1.0, where all the samples have equal weights. The maximum of among frames is called “maximal weight” . Generally speaking, the smaller the maximal weight is, the more reliable the TP calculation can be. It is well known that TP requires significant overlap in phase space between the sampled (PM3 or PM6 in this work) Hamiltonian and the target (PM6 or B3LYP in this work) Hamiltonian.^{67}
Recently, we noticed that a similar quantity has been also suggested by Hansen et al.^{68}

### 2.3 Gaussian process regression for curve fitting

Gaussian Processes (GP) are generic supervised learning methods designed to solve regression and probabilistic classification problems.^{69} It is an attractive approach because it is nearly model-free. Gaussian process regression (GPR) has been used in free energy surface reconstruction.^{70, 71}
In this work, Gaussian process regression was utilized to fit the free energy profiles after the weighted TP correction, which is always contaminated by statistical noise. Given a set of observations , it can be imagined as a single sample from a Gaussian distribution with variates. A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. A Gaussian process can be completely specified by its mean and covariance function , expressed as

(19) |

and

(20) |

where denotes the expectation operation and the covariance function relates one observation to another. Since the observations are noisy, each observation is related to an underlying function through a Gaussian noise model

(21) |

By considering also the noise, the covariance function was defined using the squared exponential as

(22) |

where is the length-scale and is the signal variance, is the Kronecker delta function and is the noise variance, which was set to the reciprocal of the exponential of the reweighting entropy value () corresponding to each observation in this work.
The free parameters are the “hyperparameters” that are optimized to maximize the likelihood of the observations.
The Gaussian process regression was carried out using the scikit-learn package^{72}.

### 2.4 Molecular Dynamics Simulations

One quasi-chemical reaction and three chemical reactions in aqueous solution shown in Fig. 1 are studied, including main chain dihedral rotation of a butane molecule, an reaction of \ceCH3Cl + Cl- -¿ Cl- + CH3Cl, intramolecular proton transfer in glycine from the neutral form to the zwitterion form, and an aliphatic Claisen rearrangement reaction from allyl vinyl ether to 4-pentenal.

For the main chain dihedral rotation of a butane molecule, the entire molecule was defined as the semiempirical QM (SQM) or QM region.
For the reaction, the complex of \ceCH3Cl and \ceCl- was defined as the semiempirical QM (SQM) or QM region.
For the glycine intramolecular proton transfer reaction, the glycine molecule was defined as the SQM or QM region.
For the Claisen rearrangement reaction, the solute was defined as the SQM or QM region.
A TIP3P water sphere with a radius of 25 Å was added to the solute centering on the heavy atom closest to the center-of-mass of the QM regions and was restrained by a soft half-harmonic potential with a force constant of 10 kcal/mol/ to avoid evaporation. Thiel et al. have justified the widespread use of a droplet model in QM/MM studies of reactions in solution and in enzymes^{73}. There were 2016 water molecules for the butane molecule system, 2014 water molecules for the reaction, 2017 water molecules for the glycine intramolecular proton transfer reaction and 2009 water molecules for the Claisen rearrangement reaction. The nonbonded interactions were fully counted without any truncations. The van der Waals (vdW) interactions were described with the general AMBER force field^{74}. PM3 and PM6 were used as the low-level Hamiltonians in the umbrella samplings, and the high-level QM Hamiltonian was chosen as B3LYP/6-31G(d). In order to measure the reliability of this indirect method, umbrella samplings at B3LYP level were also performed. We first compared the direct FE profile at PM6 with the indirect one at the same level reweighted from PM3 level. Then, the direct free energy profile at B3LYP/6-31G(d) level was computed to validate the indirect FE profile at the same level reweighted from the SE levels.

For the main chain dihedral rotation of a butane molecule, the main chain dihedral was chosen as the reaction coordinate. Umbrella samplings with 61 windows centering on ranged from 0 to 180 degrees were performed. The force constant of the bias restraint ranged from 100 kcal/mol/ to 150 kcal/mol/ in PM3, PM6 and B3LYP simulations for this system.
The reaction coordinate for the reaction was defined as , where and are the bond distances. Umbrella samplings with 73 windows centering on ranged from -2.5 to 2.5 were performed. The force constant of the bias restraint ranged from 100 kcal/mol/ to 900 kcal/mol/ in PM3 simulations and from 100 kcal/mol/ to 800 kcal/mol/ in PM6 and B3LYP simulations for this system.
The reaction coordinate for the glycine intramolecular proton transfer reaction was defined as , where is the hydrogen atom to be transferred. Umbrella samplings with 61 windows under PM3 level and with 51 windows under PM6 and B3LYP levels centering from on ranged from -1.5 to 1.5 were applied. The force constant of the bias restraint ranged from 100 kcal/mol/ to 1350 kcal/mol/ in PM3 simulations and from 100 kcal/mol/ to 900 kcal/mol/ in PM6 and B3LYP simulations for this system.
The reaction coordinate for the Claisen rearrangement reaction was defined as . Umbrella samplings with 95 windows centering on ranged from -2.2 to 1.7 were applied. The force constant of the bias restraint ranged from 100 kcal/mol/ to 1600 kcal/mol/ in PM3 simulations and from 100 kcal/mol/ to 1400 kcal/mol/ in PM6 and B3LYP simulations for this system.
For each US window simulation, the system was optimized in energy for 500 steps using the steepest decent optimization method followed by 500 steps of the conjugate gradient method with the respective Hamiltonian. For SE Hamiltonian, the system was heated up to 300 K in 50 ps and was equilibrated for 100 ps. A 1-ns production simulation was conducted for each window for free energy analysis. However, for B3LYP Hamiltonian the system was heated up to 300 K in 10 ps and was equilibrated for 10 ps. The production simulation has a length of 100-ps for each window. The integration time step was set to 1 fs, except for that in the PM3 and B3LYP simulations of the glycine intramolecular proton transfer reaction, which was set to 0.5 fs. The configurations were saved every 0.1 ps in the PM3 and PM6 simulations, and were saved every 0.05 ps in the B3LYP simulations. The temperature was regulated at 300 K with the Andersen temperature coupling scheme^{75}.
All the simulations were performed by the AmberTools 16 program package^{76}, and the QM/MM calculations were carried out by interfacing with Q-Chem 4.3 package^{77}.

## 3 Results

### 3.1 Main chain dihedral rotation of a butane molecule

Although the minimum free energy structures are located in the same RC value (dihedral angle) under PM3/MM and PM6/MM Hamiltonians, the well depths are different. For the global minimum with , these two Hamiltonians led to free energy values that differ by 0.2 kcal/mol. The other free energy minimum is located at . As shown in Table 1, the well depth from the PM3/MM calculation is -3.39 kcal/mol, which is 0.4 kcal/mol deeper than that from the PM6/MM calculation (-2.99 kcal/mol). Although the weighted TP correction was rigorously derived, its practical applicability is still worth a validation. In the first step, the PM3-to-PM6 correction was applied to the direct PM3 FE profile, and the indirect PM6 FE profile obtained in this way was compared with the direct PM6 FE profile. As shown in Fig. LABEL:SI-fig:BT-PM3andPM6toB3LYP-RE, all the reweighting entropy values are larger than 0.76, which indicate that this TP calculation is trustworthy and a smooth indirect FE profile can be obtained. It clearly shows in Fig. 2 that the indirect PM6 FE profile is highly consistent with the direct PM6 FE profile, indicating that the reweighting method introduced in this work is not only theoretically rigorous but also practically reliable at least for the correction from PM3 to PM6.

B3LYP | |||||
---|---|---|---|---|---|

Reaction | direct PM3 | direct PM6 | indirect from PM3 | indirect from PM6 | direct |

BT | -3.390.03 | -2.990.03 | -6.180.05 | -5.830.05 | -5.860.11 |

25.310.07 | 20.600.07 | 22.440.08 | 22.150.08 | 21.700.16 | |

PT | 24.470.08 | 3.250.05 | 2.790.10 | 3.670.08 | 3.040.14 |

CR | 35.430.08 | 30.420.08 | 27.480.10 | 28.050.10 | 26.300.17 |

^{a} Location at first well depth.

^{b} Location at .

Next, the PM3-to-B3LYP and PM6-to-B3LYP corrections were applied to generate the indirect FE profiles at the B3LYP level. As shown in Fig. LABEL:SI-fig:BT-PM3andPM6toB3LYP-RE, the reweighting entropies are smaller than those in the TP calculations from PM3 to PM6, indicating that the overlaps in phase space between B3LYP and the SQM Hamiltonians are less significant than that between PM3 and PM6. Comparing with PM3, the PM6 Hamiltonian is more similar to B3LYP as can be inferred from the reweighting entropy values. This is reasonable, because PM6 was refined over the PM3 Hamiltonian. Nevertheless, most of the weighting entropy values are still larger than 0.6, with only a small number of exceptions. It can be noted that at , there is a precipitous drop of reweighting entropy in reweighting the PM3 level to the B3LYP level process in Fig. LABEL:SI-fig:BT-PM3andPM6toB3LYP-RE, which leads to a corresponding precipitous drop of the free energy around this region as shown in Fig. 3(a). This indicates that the weighted TP corrections for each histogram bin is sensitive to the reweighting entropy quantity, and the reweighting entropy can be used to characterize the reliability of the weighted TP calculations. Then, we fitted these two indirect FE profiles by Gaussian process regression, and the smoothed curves are nearly identical except for a small difference near , as shown in Fig. 3(b). They are also very close to the direct FE profile at the same level, and the indirect one from the PM6 level performs slightly better than the indirect one corrected from the PM3 level. As shown in Table 1, the two indirect B3LYP FE profiles reweighted from the PM3 and PM6 levels have a free energy of -6.180.05 kcal/mol and -5.830.05 kcal/mol at , respectively. They well match the direct B3LYP FE profile, which has a free energy of -5.860.11 kcal/mol at . Meanwhile, all the B3LYP FE profiles, either direct or indirect, have the same positions in RC for the maximum and minimum free energy structures.

### 3.2 reaction of \ceCH3Cl + Cl- -¿ Cl- + CH3Cl

In this work, as shown in Fig. 4, the direct FE profile calculated at the PM3 level shows some difference from that at the PM6 level. As shown in Table 1, the free energy barrier estimated from the PM3 simulations is about 4.7 kcal/mol higher than that obtained from the PM6 simulations. The element-specific Gaussian core-core corrections in PM3 is replaced with a pairwise core-core correction term in PM6.^{22, 78} Besides, -orbitals are added to the atomic basis for certain elements including chlorine in the PM6 Hamiltonian, which shows significant improvement over the PM3 method.^{79} Therefore, it is expected that PM6 Hamiltonian outperforms PM3 for the agreement of the FE profile with the one at the B3LYP level. After the PM3-to-PM6 correction, the indirect PM6 FE profile is highly consistent with the direct PM6 FE profile, as shown in Fig. 4. The reweighting entropies are all larger than 0.70 (see Fig. LABEL:SI-fig:SN2-PM3andPM6toB3LYP-RE), which also indicates that this TP calculation can generate a reliable indirect FE curve.

However, when running the TP calculations from the SQM to B3LYP level, the reweighting entropies became much smaller than those in the PM3-to-PM6 TP corrections (see Fig. LABEL:SI-fig:SN2-PM3andPM6toB3LYP-RE). This is caused by the slow convergence at the tail region of the biasing potential distribution.^{46, 38, 42, 80, 81, 67, 82} It is not surprising that the fitted FE profile data are noisy, which are shown in Fig. 5(a). This statistical fluctuation can be well smoothed by the Gaussian process regression, and the final curves are nearly identical, as shown in Fig. 5(b). In addition, these two indirect B3LYP FE profiles are highly consistent with the direct B3LYP FE profile, except for some marginal difference at the transition state and product state regions. Especially, when corrected from the PM6 level, the transition barrier differs by only 0.45 kcal/mol from the direct FE profile at the B3LYP level, as shown in Table 1.

Please note that the simulated reaction barrier is around 22 kcal/mol, which is lower by 4-5 kcal/mol than the experimentally measured one (26.5 kcal/mol) determined from rate constant. This deviation is likely to be caused by the limited accuracy of B3LYP functional and the approximate QM/MM van der Waals interaction, which are not fully compatible with each other. The agreement is expected to be further improved when a more accurate description is adopted for both the intra-molecular and inter-molecular interactions. For instance, Kuechler et al studied this reaction using a highly accurate specific reaction parameter (SRP) semi-empirical method^{83} and a charge-dependent exchange and dispersion (QXD) model^{84}, and found excellent agreement between the simulated and experimental barriers for this reaction. However, the accuracy of the high-level Hamiltonian (B3LYP) is not a focus in the current work.

### 3.3 Glycine intramolecular proton transfer reaction

As shown in Fig. 6, the FE profile calculated at the PM3 level is very different from that at the PM6 level. As shown in Table 1, the reaction free energy and activation barrier estimated from the PM3/MM MD simulations are 1.51 0.11 and 24.47 0.08 kcal/mol, respectively, while those obtained from the PM6/MM simulations are -13.45 0.08 and 3.25 0.05 kcal/mol. The transition state at PM3 level is located at -0.02 Å, while that at PM6 level is located at -0.36 Å. Clearly, the PM3/MM simulations yielded an incorrect result, because the zwitterion form of glycine molecule is stabilized by the water molecules and is the dominant state in aqueous solution. The glycine intramolecular proton transfer reaction process involves one hydrogen atom transferred from the oxygen atom to the nitrogen atom. Because PM6 uses different core-core repulsion potentials for \ceN-H, \ceO-H, \ceC-C, \ceSi-O pairs to correct for the specific defect in the parametrization, it is not surprising that it yielded a FE profile that is more reasonable than the PM3 one. As is shown in Fig. LABEL:SI-fig:PT-PM3andPM6toB3LYP-RE, some of the reweighting entropies are rather small in magnitude in comparison with those obtained for the two systems above, especially for the bins with RC from -1.5 Å to -0.5 Å. After the TP correction, the indirect PM6 FE profile shows some deviations from the direct one in this region. It is interesting to see that, after the Gaussian process regression, the indirect PM6 FE profile can be well superimposed with the direct PM6 FE profile. This result demonstrated that this TP correction can even work well despite a large difference in two semiempirical Hamiltonians.

The reweighting entropies are even smaller in the TP calculations from the PM3 and PM6 levels to the B3LYP level as shown in Fig. LABEL:SI-fig:PT-PM3andPM6toB3LYP-RE, which leads to even more noisy FE profiles as shown in Fig. 7(a). Since PM3 Hamiltonian is less accurate than PM6, it is quite natural that the indirect FE profile from PM3 has much more notable fluctuations than the one from PM6. Among all the reactions studied in this work, glycine intramolecular proton transfer reaction is the most challenging one with many reweighting entropy values below 0.3. This will definitely lead to large uncertainty in the indirect free energy profiles at the B3LYP level. For the TP calculations from PM6 to B3LYP, most of the reweighting entropies are larger than 0.3 when the reaction coordinate is larger than 0 Å (see Fig. LABEL:SI-fig:PT-PM3andPM6toB3LYP-RE). Correspondingly, the free energy profile in this region is also smooth with only small fluctuations as shown in Fig. 7(a). While in the region with , many points have a reweighting entropy smaller than 0.3. The fluctuations of the free energy in this region are more significant. In Gaussian process regression, we set the variance of data noise to be inversely proportional to the exponential of the reweighting entropy. Data with large reweighting entropy have large weight in determining the profile. While those with small reweighting entropy are less important. Provided that there are some data points with large reweighting entropies (larger than 0.3 for this case) in a small interval, a confident free energy profile can still be generated locally. After Gaussian process regression, the curves agree much better with each other, as well as with the direct FE profile at the B3LYP level as shown in Fig. 7(b). The transition states after TP correction shift to the left by 0.09 Å (at -0.11 Å) for PM3 and to the right by 0.2 Å (at -0.16 Å) for PM6 as compared to the original SQM results. The transition state in the direct B3LYP FE profile locates at -0.11 Å. As shown in Table 1, the free energy barrier heights in the indirect FE profiles are 2.79 kcal/mol and 3.67 kcal/mol, respectively, which are close to the direct B3LYP FE value (3.04 kcal/mol) with a somewhat acceptable difference.

### 3.4 Aliphatic Claisen rearrangement reaction of allyl vinyl ether to 4-pentenal

Just like the other three reactions, the PM3 FE profile shows some difference from the PM6 one. The reaction barrier is about 5.0 kcal/mol higher (30.420.08 kcal/mol for PM6 v.s. 35.430.08 kcal/mol for PM3), and the reaction free energy is 10.5 kcal/mol more negative (-14.880.11 kcal/mol for PM6 vs -25.390.12 kcal/mol for PM3), as shown in Table 1. However, the TP correction can still eliminate this difference, and the direct and indirect PM6 FE profiles agree with each other quite well with only marginal difference in some small region as shown in Fig. 8(a), which was caused mainly by small magnitude of overlaps in the phase space between the high-level and low-level Hamiltonians and can be readily flagged by the reweighting entropy in these regions as shown in Fig. LABEL:SI-fig:CR-PM3andPM6toB3LYP-RE. Most of the bins have a reweighting entropy larger than 0.6, which guaranteed reliable free energy corrections for those bins. After smoothing using the Gaussian process regression, the direct and indirect FE profiles at PM6 level perfectly agree with each other, as shown in Fig. 8(b).

In the TP calculations from the PM3 and PM6 levels to the B3LYP level, the reweighting entropies are even smaller, resulting in two rough indirect FE profiles at the B3LYP level shown in Fig. 9. For the indirect FE profile from PM6, the noise is rather intense in the region with the RC around 0.0 Å, which corresponds well with the small reweighting entropy in that region as shown in Fig. LABEL:SI-fig:CR-PM3andPM6toB3LYP-RE. The two indirect B3LYP FE profiles after being smoothed by the Gaussian process regression method are highly consistent with each other again, which are shown in Fig. 9(b). The transition states after the TP corrections from the PM3 and PM6 Hamiltonians are located at -0.42 Å and -0.32 Å, respectively, which are shifted to the left by 0.15 Å and 0.05 Å as compared to the original SQM results. The transition state in the direct B3LYP/MM FE profile locates at -0.46 Å. As shown in Table 1, the free energy barrier height in the indirect FE profiles are 27.48 kcal/mol and 28.05 kcal/mol, which deviate by only 1.18 and 1.75 kcal/mol from the direct B3LYP/MM simulation (26.30 kcal/mol).

### 3.5 Computational expense

The estimated wall-clock times for the computations of the direct and indirect QM/MM free energy profiles at the B3LYP/6-31G* level are listed in Table 2. For the calculations of the indirect profiles, the wall-clock time include both the time for generating the SQM trajectories and the time for the single point energy calculations at the B3LYP level. The cost at the PM3 level is similar to that at the PM6 level with the reaction being an exception, which was caused by the additional -orbital in the basis set for the chlorine atoms in PM6. Nonetheless, both methods are orders of magnitude faster than the direct calculations of the profiles at the B3LYP level, because for each indirect profile only high-level single point calculations were required, where is the number of windows. Because the samples for the single-point calculations at the B3LYP level were saved once every 100 MD steps in the SQM simulations in this work, the efficiency enhancement cannot exceed 100 folds. The samples can be saved every 1000 steps or even less infrequently, especially for chemical reactions with a correlation time longer than tens of picoseconds. Then, the enhancement in the efficiency will be even greater.

Wall-clock Time | |||||||
---|---|---|---|---|---|---|---|

Reaction | indirect from PM3 | indirect from PM6 | |||||

sampling | energy evaluation | total | sampling | energy evaluation | total | direct | |

BT | 885 | 1320 | 2205 | 890 | 1322 | 2212 | 54,900 |

971 | 1432 | 2403 | 2263 | 1480 | 3743 | 55,480 | |

PT | 824 | 1282 | 2106 | 740 | 1252 | 1992 | 91,800 |

CR | 1211 | 2248 | 3459 | 1306 | 2177 | 3483 | 101,650 |

^{a} Using the same number of windows as in the semiempirical simulations, and one 1-ns simulation for each window.

## 4 Discussion

It is well known that the success of TP calculations heavily relies on the similarity of the reference (low-level) Hamiltonian and the target (high-level) Hamiltonian. If the important regions of these two Hamiltonians are well-separated in phase space, the important region of the target Hamiltonian cannot be sampled in finite simulation time. For those cases, the TP correction is expected to fail to yield accurate results, simply because the probability of an unsampled configuration cannot be corrected via reweighting. In this work, we only studied some simple reactions, of which the reactants have a small number of degrees-of-freedom. When the reactant becomes larger, this numerical difficulty on phase-space overlap becomes even severe, and this reweighting method is prone to more failure.
For those difficult cases, a “case-specific” optimization of the parameters in the reference Hamiltonian against the high-level Hamiltonian in terms of energetic and structural properties can improve the reliability of TP calculation. Kuechler et al. studied the solvation effect on the reaction mechanism of the reaction between chlorine anion and methyl chloride using optimized specific reaction parameters (SRP) for chlorine.^{83} They showed that the optimized semi-empirical Hamiltonian can reproduce the free energy profile of a high-level method. Zhou and Pu developed a general strategy of reparametrizing semiempirical (SE) methods against ab initio methods designated Reaction Path Force Matching (RP-FM) for QM/MM simulations in condensed phases.^{61} Albeit nontransferable, RP-FM can also improve the agreement between the reference Hamiltonian and the target Hamiltonian. This topic will be covered in the next paper in this series of reports by us.

## 5 Conclusion

For chemical reactions in condensed phase or enzymatic reactions, the computation of a converged free energy (FE) profile at ab initio (ai) QM/MM level is still far from being affordable. In this work, we proposed a new method termed MBAR+wTP to obtain the FE profile at ai QM/MM level with much less computational expense by combining the Multistate Bennett Acceptance Ratio (MBAR) method and weighted Thermodynamic Perturbation (TP) method. This method uses the semiempirical (SE) QM/MM free energy profile obtained from MBAR analysis of the Umbrella Sampling (US) trajectories as an initial guess, followed by the SE-to-ai correction using weighted TP. One quasi-chemical reaction and three chemical reactions are used to validate the applicability of this method.
The SE FE profiles were found to deviate from the ai ones by several kcal/mol in terms of the barrier height and the reaction free energy. After the SE-to-ai correction, the FE profiles agreed much better with the direct simulated one with errors below 1 kcal/mol for most cases. To reduce the numerical difficulty in the weighted TP calculations, similar to most TP calculations, curve fitting using the Gaussian process regression were employed to effectively reduce the fluctuations in the free energy profile. Choosing a SE method that is more similar to the ai Hamiltonian also reduces the fluctuation and make this method more reliable. Our method is expected to fail to produce accurate reaction free energy profiles, when the SE Hamiltonian and the ai Hamiltonian have no significant overlap in the phase space. This difficulty can be resolved by refitting the SE Hamiltonian to the ai one via the Reaction Path Force Matching approach, which will be reported in an accompanying paper.^{61} Because the samples used for the TP calculations are only one hundredth or even one thousandth of the total number of samples generated in the simulation, our method can increase the efficiency by several orders of magnitude. Since the numerical difficulty increases with the complexity of the reactions, we will validate this approach further by applying it in some real systems like enzymatic reactions.

The proof that the forward and backward thermodynamic perturbation are equivalent to the Bennett Acceptance Ratio for the weighted samples from umbrella sampling, the reliability analyses of the results using overlap matrix, reweighting entropy, time interval for sampling harvesting, and the center and the force constant for each biasing window are available in the Supporting Information.

Y.M. is supported by the National Natural Science Foundation of China (Grant No. 21773066) and the Fundamental Research Funds for the Central Universities. Y.S. is grateful to Department of Energy Office of Science for support through grant DE-SC0011297. CPU time was provided by the Supercomputer Center of East China Normal University.

## References

- Hu and Yang 2008 Hu, H.; Yang, W. Free Energies of Chemical Reactions in Solution and in Enzymes with ab initio Quantum Mechanics/Molecular Mechanics Methods. Annu. Rev. Phys. Chem. 2008, 59, 573–601.
- Laidler and King 1983 Laidler, K. J.; King, M. C. Development of Transition-State Theory. J. Phys. Chem. 1983, 87, 2657–2664.
- Truhlar et al. 1996 Truhlar, D. G.; Garrett, B. C.; Klippenstein, S. J. Current Status of Transition-State Theory. J. Phys. Chem. 1996, 100, 12771–12800.
- Shirts and Chodera 2008 Shirts, M. R.; Chodera, J. D. Statistically Optimal Analysis of Samples from Multiple Equilibrium States. J. Chem. Phys. 2008, 129, 124105.
- Shirts 2017 Shirts, M. R. Reweighting from the Mixture Distribution as a Better Way to Describe the Multistate Bennett Acceptance Ratio. arXiv.org, 2017; https://arxiv.org/abs/1704.00891.
- Ferrenberg and Swendsen 1989 Ferrenberg, A. M.; Swendsen, R. H. Optimized Monte Carlo Data Analysis. Phys. Rev. Lett. 1989, 63, 1195–1198.
- Souaille and Roux 2001 Souaille, M.; Roux, B. Extension to the Weighted Histogram Analysis Method: Combining Umbrella Sampling with Free Energy Calculations. Comput. Phys. Commun. 2001, 135, 40–57.
- Gallicchio et al. 2005 Gallicchio, E.; Andrec, M.; Felts, A. K.; Levy, R. M. Temperature Weighted Histogram Analysis Method, Replica Exchange, and Transition Paths. J. Phys. Chem. B 2005, 109, 6722–6731.
- Chodera et al. 2007 Chodera, J. D.; Swope, W. C.; Pitera, J. W.; Seok, C.; Dill, K. A. Use of the Weighted Histogram Analysis Method for the Analysis of Simulated and Parallel Tempering Simulations. J. Chem. Theory Comput. 2007, 3, 26–41.
- Mey et al. 2014 Mey, A. S. J. S.; Wu, H.; Noé, F. xTRAM: Estimating Equilibrium Expectations from Time-Correlated Simulation Data at Multiple Thermodynamic States. Phys. Rev. X 2014, 4, 041018.
- Lee et al. 2013 Lee, T.-S.; Radak, B. K.; Pabis, A.; York, D. M. A New Maximum Likelihood Approach for Free Energy Profile Construction from Molecular Simulations. J. Chem. Theory Comput. 2013, 9, 153–164.
- Lee et al. 2014 Lee, T.-S.; Radak, B. K.; Huang, M.; Wong, K.-Y.; York, D. M. Roadmaps through Free Energy Landscapes Calculated Using the Multidimensional vFEP Approach. J. Chem. Theory Comput. 2014, 10, 24–34.
- Maragliano and Vanden-Eijnden 2008 Maragliano, L.; Vanden-Eijnden, E. Single-sweep Methods for Free Energy Calculations. J. Chem. Phys. 2008, 128, 184110.
- Meng and Roux 2015 Meng, Y.; Roux, B. Efficient Determination of Free Energy Landscapes in Multiple Dimensions from Biased Umbrella Sampling Simulations Using Linear Regression. J. Chem. Theory Comput. 2015, 11, 3523–3529.
- Warshel and Levitt 1976 Warshel, A.; Levitt, M. Theoretical Studies of Enzymic Reactions: Dielectric, Electrostatic and Steric Stabilization of the Carbonium Ion in the Reaction of Lysozyme. J. Mol. Biol. 1976, 103, 227–249.
- Zhang et al. 1999 Zhang, Y.; Lee, T.-S.; Yang, W. A Pseudobond Approach to Combining Quantum Mechanical and Molecular Mechanical Methods. J. Chem. Phys. 1999, 110, 46–54.
- Zhang et al. 2000 Zhang, Y.; Liu, H.; Yang, W. Free Energy Calculation on Enzyme Reactions with an Efficient Iterative Procedure to Determine Minimum Energy Paths on a Combined ab initio QM/MM Potential Energy Surface. J. Chem. Phys. 2000, 112, 3483–3492.
- Cui and Karplus 2001 Cui, Q.; Karplus, M. Triosephosphate Isomerase: A Theoretical Comparison of Alternative Pathways. J. Am. Chem. Soc. 2001, 123, 2284–2290.
- Stewart 1989 Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods I. Method. J. Comput. Chem. 1989, 10, 209–220.
- Dewar et al. 1985 Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. Development and Use of Quantum Mechanical Molecular Models. 76. AM1: A New General Purpose Quantum Mechanical Molecular Model. J. Am. Chem. Soc. 1985, 107, 3902–3909.
- Dewar and Thiel 1977 Dewar, M. J. S.; Thiel, W. Ground States of Molecules. 38. The MNDO Method. Approximations and Parameters. J. Am. Chem. Soc. 1977, 99, 4899–4907.
- Stewart 2007 Stewart, J. J. P. Optimization of Parameters for Semiempirical Methods V: Modification of NDDO Approximations and Application to 70 Elements. J. Mol. Model. 2007, 13, 1173–1213.
- Porezag et al. 1995 Porezag, D.; Frauenheim, T.; Kohler, T.; Seifert, G.; Kaschner, R. Construction of Tight-Binding-Like Potentials on the Basis of Density-Functional-Theory: Applications to Carbon. Phys. Rev. B 1995, 51, 12947.
- Seabra et al. 2007 Seabra, G. d. M.; Walker, R. C.; Elstner, M.; Case, D. A.; Roitberg, A. E. Implementation of the SCC-DFTB Method for Hybrid QM/MM Simulations within the Amber Molecular Dynamics Package. J. Phys. Chem. A 2007, 111, 5655–5664.
- Elstner et al. 1998 Elstner, M.; Porezag, D.; Jungnickel, G.; Elsner, J.; Haugk, M.; Frauenheim, T.; Suhai, S.; Seifert, G. Self-consistent Charge Density Functional Tight-Binding Method for Simulation of Complex Material Properties. Phys. Rev. B 1998, 58, 7260.
- Christensen et al. 2017 Christensen, A. S.; Kromann, J. C.; Jensen, J. H.; Cui, Q. Intermolecular Interactions in the Condensed Phase: Evaluation of Semi-empirical Quantum Mechanical Methods. J. Chem. Phys. 2017, 147, 161704.
- Gao 1992 Gao, J. Absolute Free Energy of Solvation from Monte Carlo Simulations Using Combined Quantum and Molecular Mechanical Potentials. J. Phys. Chem. 1992, 96, 537–540.
- Gao and Xia 1992 Gao, J.; Xia, X. A Priori Evaluation of Aqueous Polarization Effects through Monte Carlo QM-MM Simulations. Science 1992, 258, 631–635.
- Muller and Warshel 1995 Muller, R. P.; Warshel, A. Ab Initio Calculations of Free Energy Barriers for Chemical Reactions in Solution. J. Phys. Chem. 1995, 99, 17516–17524.
- Bentzien et al. 1998 Bentzien, J.; Muller, R. P.; Florián, J.; Warshel, A. Hybrid ab initio Quantum Mechanics/Molecular Mechanics Calculations of Free Energy Surfaces for Enzymatic Reactions: The Nucleophilic Attack in Subtilisin. J. Phys. Chem. B 1998, 102, 2293–2301.
- Plotnikov et al. 2011 Plotnikov, N. V.; Kamerlin, S. C. L.; Warshel, A. Paradynamics: An Effective and Reliable Model for ab initio QM/MM Free-Energy Calculations and Related Tasks. J. Phys. Chem. B 2011, 115, 7950–7962.
- Plotnikov and Warshel 2012 Plotnikov, N. V.; Warshel, A. Exploring, Refining, and Validating the Paradynamics QM/MM Sampling. J. Phys. Chem. B 2012, 116, 10342–10356.
- Lameira et al. 2016 Lameira, J.; Kupchencko, I.; Warshel, A. Enhancing Paradynamics for QM/MM Sampling of Enzymatic Reactions. J. Phys. Chem. B 2016, 120, 2155–2164.
- Warshel and Weiss 1980 Warshel, A.; Weiss, R. M. An Empirical Valence Bond Approach for Comparing Reactions in Solutions and in Enzymes. J. Am. Chem. Soc. 1980, 102, 6218–6226.
- Warshel 1991 Warshel, A. Computer Modeling of Chemical Reactions in Enzymes and Solutions; John Wiley and Sons: New York, 1991.
- Billeter et al. 2001 Billeter, S.; Webb, S.; Iordanov, T.; Agarwal, P.; Hammes-Schiffer, S. Hybrid Approach for Including Electronic and Nuclear Quantum Effects in Molecular Dynamics Simulations of Hydrogen Transfer Reactions in Enzymes. J. Chem. Phys. 2001, 114, 6925–6936.
- Rod and Ryde 2005 Rod, T. H.; Ryde, U. Quantum Mechanical Free Energy Barrier for an Enzymatic Reaction. Phys. Rev. Lett. 2005, 94, 138302.
- Heimdal and Ryde 2012 Heimdal, J.; Ryde, U. Convergence of QM/MM Free-Energy Perturbations Based on Molecular-Mechanics or Semiempirical Simulations. Phys. Chem. Chem. Phys. 2012, 14, 12592–12604.
- Beierlein et al. 2011 Beierlein, F. R.; Michel, J.; Essex, J. W. A Simple QM/MM Approach for Capturing Polarization Effects in Protein-ligand Binding Free Energy Calculations. J. Phys. Chem. B 2011, 115, 4911–4926.
- Polyak et al. 2013 Polyak, I.; Benighaus, T.; Boulanger, E.; Thiel, W. Quantum Mechanics/Molecular Mechanics Dual Hamiltonian Free Energy Perturbation. J. Chem. Phys. 2013, 139, 064105.
- König and Boresch 2011 König, G.; Boresch, S. Non-Boltzmann Sampling and Bennett’s Acceptance Ratio Method: How to Profit from Bending the Rules. J. Comput. Chem. 2011, 32, 1082–1090.
- König et al. 2014 König, G.; Hudson, P. S.; Boresch, S.; Woodcock, H. L. Multiscale Free Energy Simulations: An Efficient Method for Connecting Classical MD Simulations to QM or QM/MM Free Energies Using Non-Boltzmann Bennett Reweighting Schemes. J. Chem. Theory Comput. 2014, 10, 1406–1419.
- Hudson et al. 2015 Hudson, P. S.; White, J. K.; Kearns, F. L.; Hodoscek, M.; Boresch, S.; Woodcock, H. L. Efficiently Computing Pathway Free Energies: New Approaches Based on Chain-of-replica and Non-Boltzmann Bennett Reweighting Schemes. BBA Gen. Subjects 2015, 1850, 944–953.
- Hudson et al. 2015 Hudson, P. S.; Woodcock, H. L.; Boresch, S. Use of Nonequilibrium Work Methods to Compute Free Energy Differences Between Molecular Mechanical and Quantum Mechanical Representations of Molecular Systems. J. Phys. Chem. Lett. 2015, 6, 4850–4856.
- Wang et al. 2017 Wang, M.; Li, P.; Jia, X.; Liu, W.; Shao, Y.; Hu, W.; Zheng, J.; Brooks, B. R.; Mei, Y. Efficient Strategy for the Calculation of Solvation Free Energies in Water and Chloroform at the Quantum Mechanical/Molecular Mechanical Level. J. Chem. Inf. Model. 2017, 57, 2476–2489.
- Jia et al. 2016 Jia, X.; Wang, M.; Shao, Y.; König, G.; Brooks, B. R.; Zhang, J. Z. H.; Mei, Y. Calculations of Solvation Free Energy through Energy Reweighting from Molecular Mechanics to Quantum Mechanics. J. Chem. Theory Comput. 2016, 12, 499–511.
- Dybeck et al. 2016 Dybeck, E. C.; König, G.; Brooks, B. R.; Shirts, M. R. Comparison of Methods To Reweight from Classical Molecular Simulations to QM/MM Potentials. J. Chem. Theory Comput. 2016, 12, 1466–1480.
- Rupp 2015 Rupp, M. Special Issue on Machine Learning and Quantum Mechanics. Int. J. Quantum Chem. 2015, 115, 1003–1004.
- Goh et al. 2017 Goh, G. B.; Hodas, N. O.; Vishnu, A. Deep Learning for Computational Chemistry. J. Comput. Chem. 2017, 38, 1291–1307.
- Handley and Popelier 2010 Handley, C. M.; Popelier, P. L. A. Potential Energy Surfaces Fitted by Artificial Neural Networks. J. Phys. Chem. A 2010, 114, 3371–3383.
- Caccin et al. 2015 Caccin, M.; Li, Z.; Kermode, J. R.; De Vita, A. A Framework for Machine-learning-augmented Multiscale Atomistic Simulations on Parallel Supercomputers. Int. J. Quantum Chem. 2015, 115, 1129–1139.
- Ramakrishnan et al. 2015 Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A. Big Data Meets Quantum Chemistry Approximations: The -Machine Learning Approach. J. Chem. Theory Comput. 2015, 11, 2087–2096.
- Botu et al. 2017 Botu, V.; Batra, R.; Chapman, J.; Ramprasad, R. Machine Learning Force Fields: Construction, Validation, and Outlook. J. Phys. Chem. C 2017, 121, 511–522.
- Shen et al. 2016 Shen, L.; Wu, J.; Yang, W. Multiscale Quantum Mechanics/Molecular Mechanics Simulations with Neural Networks. J. Chem. Theory Comput. 2016, 12, 4934–4946.
- Wu et al. 2017 Wu, J.; Shen, L.; Yang, W. Internal Force Corrections with Machine Learning for Quantum Mechanics/Molecular Mechanics Simulations. J. Chem. Phys. 2017, 147, 161732.
- Dral et al. 2017 Dral, P. O.; Owens, A.; Yurchenko, S. N.; Thiel, W. Structure-based Sampling and Self-correcting Machine Learning for Accurate Calculations of Potential Energy Surfaces and Vibrational Levels. J. Chem. Phys. 2017, 146, 244108.
- Li et al. 2017 Li, Y.; Li, H.; Pickard, F. C.; Narayanan, B.; Sen, F. G.; Chan, M. K. Y.; Sankaranarayanan, S. K. R. S.; Brooks, B. R.; Roux, B. Machine Learning Force Field Parameters from ab initio Data. J. Chem. Theory Comput. 2017, 13, 4492–4503.
- Gonzalez-Lafont et al. 1991 Gonzalez-Lafont, A.; Truong, T. N.; Truhlar, D. G. Direct Dynamics Calculations with NDDO (Neglect of Diatomic Differential Overlap) Molecular Orbital Theory with Specific Reaction Parameters. J. Phys. Chem. 1991, 95, 4618–4627.
- Nam et al. 2007 Nam, K.; Cui, Q.; Gao, J.; York, D. M. Specific Reaction Parametrization of the AM1/d Hamiltonian for Phosphoryl Transfer Reactions: H, O, and P Atoms. J. Chem. Theory Comput. 2007, 3, 486–504.
- Doron et al. 2011 Doron, D.; Major, D. T.; Kohen, A.; Thiel, W.; Wu, X. Hybrid Quantum and Classical Simulations of the Dihydrofolate Reductase Catalyzed Hydride Transfer Reaction on an Accurate Semi-Empirical Potential Energy Surface. J. Chem. Theory Comput. 2011, 7, 3420–3437.
- Zhou and Pu 2014 Zhou, Y.; Pu, J. Reaction Path Force Matching: A New Strategy of Fitting Specific Reaction Parameters for Semiempirical Methods in Combined QM/MM Simulations. J. Chem. Theory Comput. 2014, 10, 3038–3054.
- Maurer et al. 2007 Maurer, P.; Laio, A.; Hugosson, H. W.; Colombo, M. C.; Rothlisberger, U. Automated Parametrization of Biomolecular Force Fields from Quantum Mechanics/Molecular Mechanics (QM/MM) Simulations through Force Matching. J. Chem. Theory Comput. 2007, 3, 628–639.
- Wu et al. 2011 Wu, R.; Lu, Z.; Cao, Z.; Zhang, Y. A Transferable Nonbonded Pairwise Force Field to Model Zinc Interactions in Metalloproteins. J. Chem. Theory Comput. 2011, 7, 433–443.
- Knight et al. 2010 Knight, C.; Maupin, C. M.; Izvekov, S.; Voth, G. A. Defining Condensed Phase Reactive Force Fields from ab initio Molecular Dynamics Simulations: The Case of the Hydrated Excess Proton. J. Chem. Theory Comput. 2010, 6, 3223–3232.
- Torrie and Valleau 1977 Torrie, G.; Valleau, J. Nonphysical Sampling Distributions in Monte Carlo Free-energy Estimation: Umbrella Sampling. J. Comput. Phys. 1977, 23, 187–199.
- Zwanzig 1954 Zwanzig, R. W. High-Temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. J. Chem. Phys. 1954, 22, 1420.
- Lu and Woolf 2007 Lu, N.; Woolf, T. B. In Free Energy Calculations; Chipot, C., Pohorille, A., Eds.; Springer: Heidelberg, 2007; pp 199–247.
- Hansen et al. 2010 Hansen, H. S.; Daura, X.; Hünenberger, P. H. Enhanced Conformational Sampling in Molecular Dynamics Simulations of Solvated Peptides: Fragment-Based Local Elevation Umbrella Sampling. J. Chem. Theory Comput. 2010, 6, 2598–2621.
- Rasmussen and Williams 2006 Rasmussen, C.; Williams, C. Gaussian Processes for Machine Learning; MIT Press, 2006; pp 7–32.
- Stecher et al. 2014 Stecher, T.; Bernstein, N.; Csányi, G. Free Energy Surface Reconstruction from Umbrella Samples Using Gaussian Process Regression. J. Chem. Theory Comput. 2014, 10, 4079–4097.
- Mones et al. 2016 Mones, L.; Bernstein, N.; Csányi, G. Exploration, Sampling, and Reconstruction of Free Energy Surfaces with Gaussian Process Regression. J. Chem. Theory Comput. 2016, 12, 5100–5110.
- Pedregosa et al. 2011 Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Vasilevskaya and Thiel 2016 Vasilevskaya, T.; Thiel, W. Periodic Boundary Conditions in QM/MM Calculations: Implementation and Tests. J. Chem. Theory Comput. 2016, 12, 3561.
- Wang et al. 2004 Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and Testing of a General Amber Force Field. J. Comput. Chem. 2004, 25, 1157–1174.
- Andrea et al. 1983 Andrea, T. A.; Swope, W. C.; Andersen, H. C. The Role of Long Ranged Forces in Determining the Structure and Properties of Liquid Water. J. Chem. Phys. 1983, 79, 4576–4584.
- Case et al. 2016 Case, D. A.; Berryman, J. T.; Betz, R. M.; Cerutti, D. S.; Cheatham, T. E., III; Darden, T. A.; Duke, R. E.; Giese, T. J.; Gohlke, H.; Goetz, A. W.; Homeyer, N.; Izadi, N.; Janowski, P.; Kaus, J.; Kovalenko, A.; Lee, T. S.; LeGrand, S.; Li, P.; Luchko, T.; Luo, R.; Madej, B.; Merz, K. M.; Monard, G.; Needham, P.; Nguyen, H.; Nguyen, H. T.; Omelyan, I.; Onufriev, A.; Roe, D. R.; Roitberg, A.; Salomon-Ferrer, R.; Simmerling, C. L.; Smith, W.; Swails, J.; Walker, R. C.; Wang, J.; Wolf, R. M.; Wu, X.; York, D. M.; Kollman, P. A. AMBER 2016, University of California, San Francisco. 2016.
- Shao et al. 2015 Shao, Y.; Gan, Z.; Epifanovsky, E.; Gilbert, A. T.; Wormit, M.; Kussmann, J.; Lange, A. W.; Behn, A.; Deng, J.; Feng, X.; Ghosh, D.; Goldey, M.; Horn, P. R.; Jacobson, L. D.; Kaliman, I.; Khaliullin, R. Z.; Kuś, T.; Landau, A.; Liu, J.; Proynov, E. I.; Rhee, Y. M.; Richard, R. M.; Rohrdanz, M. A.; Steele, R. P.; Sundstrom, E. J.; Woodcock, H. L.; Zimmerman, P. M.; Zuev, D.; Albrecht, B.; Alguire, E.; Austin, B.; Beran, G. J. O.; Bernard, Y. A.; Berquist, E.; Brandhorst, K.; Bravaya, K. B.; Brown, S. T.; Casanova, D.; Chang, C.-M.; Chen, Y.; Chien, S. H.; Closser, K. D.; Crittenden, D. L.; Diedenhofen, M.; DiStasio, R. A.; Do, H.; Dutoi, A. D.; Edgar, R. G.; Fatehi, S.; Fusti-Molnar, L.; Ghysels, A.; Golubeva-Zadorozhnaya, A.; Gomes, J.; Hanson-Heine, M. W.; Harbach, P. H.; Hauser, A. W.; Hohenstein, E. G.; Holden, Z. C.; Jagau, T.-C.; Ji, H.; Kaduk, B.; Khistyaev, K.; Kim, J.; Kim, J.; King, R. A.; Klunzinger, P.; Kosenkov, D.; Kowalczyk, T.; Krauter, C. M.; Lao, K. U.; Laurent, A. D.; Lawler, K. V.; Levchenko, S. V.; Lin, C. Y.; Liu, F.; Livshits, E.; Lochan, R. C.; Luenser, A.; Manohar, P.; Manzer, S. F.; Mao, S.-P.; Mardirossian, N.; Marenich, A. V.; Maurer, S. A.; Mayhall, N. J.; Neuscamman, E.; Oana, C. M.; Olivares-Amaya, R.; O’Neill, D. P.; Parkhill, J. A.; Perrine, T. M.; Peverati, R.; Prociuk, A.; Rehn, D. R.; Rosta, E.; Russ, N. J.; Sharada, S. M.; Sharma, S.; Small, D. W.; Sodt, A.; Stein, T.; Stück, D.; Su, Y.-C.; Thom, A. J.; Tsuchimochi, T.; Vanovschi, V.; Vogt, L.; Vydrov, O.; Wang, T.; Watson, M. A.; Wenzel, J.; White, A.; Williams, C. F.; Yang, J.; Yeganeh, S.; Yost, S. R.; You, Z.-Q.; Zhang, I. Y.; Zhang, X.; Zhao, Y.; Brooks, B. R.; Chan, G. K.; Chipman, D. M.; Cramer, C. J.; Goddard, W. A.; Gordon, M. S.; Hehre, W. J.; Klamt, A.; Schaefer, H. F.; Schmidt, M. W.; Sherrill, C. D.; Truhlar, D. G.; Warshel, A.; Xu, X.; Aspuru-Guzik, A.; Baer, R.; Bell, A. T.; Besley, N. A.; Chai, J.-D.; Dreuw, A.; Dunietz, B. D.; Furlani, T. R.; Gwaltney, S. R.; Hsu, C.-P.; Jung, Y.; Kong, J.; Lambrecht, D. S.; Liang, W.; Ochsenfeld, C.; Rassolov, V. A.; Slipchenko, L. V.; Subotnik, J. E.; Van Voorhis, T.; Herbert, J. M.; Krylov, A. I.; Gill, P. M.; Head-Gordon, M. Advances in Molecular Quantum Chemistry Contained in the Q-Chem 4 Program Package. Mol. Phys. 2015, 113, 184–215.
- Christensen et al. 2016 Christensen, A. S.; Kubar̆, T.; Cui, Q.; Elstner, M. Semiempirical Quantum Mechanical Methods for Noncovalent Interactions for Chemical and Biochemical Applications. Chem. Rev. 2016, 116, 5301–5337.
- Thiel and Voityuk 1996 Thiel, W.; Voityuk, A. A. Extension of MNDO to d Orbitals: Parameters and Results for the Second-Row Elements and for the Zinc Group. J. Phys. Chem. 1996, 100, 616–626.
- Cave-Ayland et al. 2015 Cave-Ayland, C.; Skylaris, C.-K.; Essex, J. W. Direct Validation of the Single Step Classical to Quantum Free Energy Perturbation. J. Phys. Chem. B 2015, 119, 1017–1025.
- Klimovich et al. 2015 Klimovich, P. V.; Shirts, M. R.; Mobley, D. L. Guidelines for the Analysis of Free Energy Calculations. J. Comput. Aid. Mol. Des. 2015, 29, 397–411.
- Ryde 2017 Ryde, U. How Many Conformations Need to Be Sampled to Obtain Converged QM/MM Energies? The Curse of Exponential Averaging. J. Chem. Theory Comput. 2017, 13, 5745–5752.
- Kuechler and York 2014 Kuechler, E. R.; York, D. M. Quantum Mechanical Study of Solvent Effects in a Prototype Reaction in Solution: \ceCl- Attack on \ceCH3Cl. J. Chem. Phys. 2014, 140, 054109.
- Kuechler et al. 2015 Kuechler, E. R.; Giese, T. J.; York, D. M. Charge-Dependent Many-Body Exchange and Dispersion Interactions in Combined QM/MM Simulations. J. Chem. Phys. 2015, 143, 234111.

See pages - of SI.pdf