# Reinforced dynamics of large atomic and molecular systems

## Abstract

A new approach for efficiently exploring the configuration space and computing the free energy of large atomic and molecular systems is proposed, motivated by an analogy with reinforcement learning. There are two major components in this new approach. Like metadynamics, it allows for an efficient exploration of the configuration space by adding an adaptively computed biasing potential to the original dynamics. Like deep reinforcement learning, this biasing potential is trained on the fly using deep neural networks, with data collected judiciously from the exploration. Applications to the full-atom, explicit solvent models of alanine dipeptide and tripeptide show the promise of this new approach.

## 1 Introduction

Exploring the configuration space of large atomic and molecular systems is a problem of fundamental importance for many applications, including protein folding, materials design, and the understanding of chemical reactions. There are several difficulties associated with these tasks. The first is that the dimensionality of the configuration space is typically very high. The second is that there are often high energy barriers hindering the exploration. Both difficulties can be reduced by introducing collective variables (CVs) and mapping the problem to the CV space. The problem then becomes finding the free energy surface (FES) associated with the set of CVs, a problem that has attracted a great deal of interest in the last few decades [1]. One of the most effective techniques is metadynamics [9], which computes a biasing potential by depositing Gaussian bases along the trajectory in the CV space; it has been shown that this biasing potential converges to the inverted free energy at the end of the calculation [10]. Also closely related to our work are recent papers that propose to use machine learning methods to help parametrize the FES [14]. In particular, deep neural network (DNN) models have shown promise in effectively representing FESs defined on high-dimensional CV spaces [16].

In this work, we take metadynamics and machine learning methods one step further by making an analogy between reinforcement learning [18] and the task of configuration space exploration and FES calculation. Classical reinforcement learning involves a state space and an action space, and the objective is to find the best policy function, a mapping from the state space to the action space that optimizes a predefined reward function. Our problem can be thought of as a multi-scale reinforcement learning problem. We have a micro-state space, the configuration space of the detailed atomic system, and a macro-state space, the space of the CVs. The action space is represented by the biasing potential in the biased molecular dynamics on the micro-state space. The optimal policy function is the FES, defined on the macro-state space: if the FES is used as the biasing potential, the biased molecular dynamics behaves diffusively in the configuration space. In the absence of an explicit reward function, we define a discriminator that quantifies the level of confidence in the accuracy of the FES. We call this scheme “reinforced dynamics”, to signal its analogy with reinforcement learning. We believe that reinforced dynamics will be quite useful both for finding the FES and for global structure optimization.

We take this analogy one step further by borrowing ideas from deep reinforcement learning [19] to improve both the exploration and the construction of the FES. From the viewpoint of deep reinforcement learning, in addition to parametrizing the FES with carefully designed deep neural networks (DNNs), we use the biased molecular dynamics as the exploration algorithm. A crucial concept in this framework is the discriminator, which is designed to encourage exploration of insufficiently explored regions of the configuration space. It is defined as the standard deviation of the predictions from an ensemble of DNN models trained on the same dataset but with different initializations of the model parameters. The bias is only switched on in regions where the discriminator value is low, where good model consistency is guaranteed and the FES should therefore be reasonably accurate. We term these the regions with confidence; similarly, we call regions with high discriminator values the regions without confidence.

Roughly speaking, reinforced dynamics works as follows: The biasing potential, or the action, is initialized at 0 and is expected to converge to the inverted FES as the dynamics proceeds. Each step of the macro iteration involves the following components. First, a biased MD is performed, in which the system is biased only in the regions with confidence. The biased simulation is likely to visit the CV regions never visited before or where the FES representation quality is poor. Next, the newly visited CV values that are in the regions without confidence are added to the training dataset. A restrained MD is performed to obtain the mean force, or the negative gradient of the FES, at each of the newly added CV values. Finally, the accumulated CV values and the mean forces are used as labels to train several network models, which give the current estimate of the biasing potential and the discriminator. This process goes in an iterative way until convergence, when all the newly visited CV values fall in the regions with confidence. The reinforced dynamics approach is illustrated and validated by the systems of alanine dipeptide and tripeptide dissolved in water, and is shown to be able to accurately represent the FES and substantially improve the sampling efficiency of the MD simulation.
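As a concrete, deliberately toy illustration of this workflow, the macro-iteration can be sketched in a few lines of Python. All helper functions below are hypothetical stand-ins for the actual MD engine, restrained-dynamics estimator, and DNN training; only the control flow mirrors the scheme described above:

```python
import random

random.seed(7)

def biased_md(models):
    # stub: CV values visited along a biased MD trajectory
    return [random.uniform(-3.14, 3.14) for _ in range(50)]

def model_deviation(models, s):
    # stub: ensemble std. dev. of predicted mean forces at CV value s
    return abs(s) * 0.01

def restrained_md(s):
    # stub: mean force at CV value s from restrained dynamics
    return -2.0 * s

def train_ensemble(data, n_models=4):
    # stub: train n_models networks from different random initializations
    return ["model-%d" % i for i in range(n_models)]

def reinforced_dynamics(threshold=0.02, max_iter=10):
    data, models = {}, train_ensemble({})
    for _ in range(max_iter):
        # keep only newly visited CVs lying in the region without confidence
        new_cvs = [s for s in biased_md(models)
                   if model_deviation(models, s) > threshold]
        if not new_cvs:  # convergence: biased MD stays in the confident region
            break
        for s in new_cvs[:50]:          # cap additions per iteration
            data[s] = restrained_md(s)  # label each CV with its mean force
        models = train_ensemble(data)
    return models, data

models, data = reinforced_dynamics()
```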

## 2 Theory

### 2.1 Free energy and mean forces

We assume that the system under study has $N$ atoms, with positions denoted by $\mathbf r = (\mathbf r_1,\dots,\mathbf r_N)$. The potential energy of the system is denoted by $U(\mathbf r)$. Without loss of generality, we consider the system in a canonical ensemble at inverse temperature $\beta = 1/(k_BT)$. Given $d$ predefined CVs, denoted by $\mathbf s(\mathbf r) = (s_1(\mathbf r),\dots,s_d(\mathbf r))$, the free energy defined on the CV space is

$$ F(\mathbf s) = -\frac{1}{\beta}\ln p(\mathbf s), \qquad p(\mathbf s) = \frac{1}{Z}\int e^{-\beta U(\mathbf r)}\,\delta\bigl(\mathbf s(\mathbf r)-\mathbf s\bigr)\,d\mathbf r, $$

with $Z = \int e^{-\beta U(\mathbf r)}\,d\mathbf r$ being the normalization factor. The brute-force way of computing the free energy is to sample the CV space exhaustively and to approximate the probability distribution $p(\mathbf s)$ by a histogram of the CVs. This approach easily becomes prohibitively expensive. In such a case, an alternative way of constructing the FES is to fit the mean forces acting on the CVs, i.e.,

$$ \mathbf f(\mathbf s) = -\nabla_{\mathbf s} F(\mathbf s). $$

Several ways of computing $\mathbf f(\mathbf s)$ have been proposed [20]. We adopt the restrained dynamics approach proposed in [21]. In this formulation, a new term is added to the potential of the system to represent harmonic spring forces between the configuration variables and the CVs,

$$ U_{\mathbf k}(\mathbf r,\mathbf s) = U(\mathbf r) + \sum_{\alpha=1}^{d}\frac{k_\alpha}{2}\bigl(s_\alpha(\mathbf r)-s_\alpha\bigr)^2 . $$

It can be shown that the mean force is given by $\mathbf f(\mathbf s) = \lim_{\mathbf k\to\infty}\mathbf f^{\mathbf k}(\mathbf s)$, where the $\alpha$-th component of $\mathbf f^{\mathbf k}(\mathbf s)$ is defined to be

$$ f^{\mathbf k}_\alpha(\mathbf s) = k_\alpha\bigl\langle s_\alpha(\mathbf r) - s_\alpha\bigr\rangle_{\mathbf k},\qquad \langle O\rangle_{\mathbf k} = \frac{1}{Z_{\mathbf k}}\int O(\mathbf r)\,e^{-\beta U_{\mathbf k}(\mathbf r,\mathbf s)}\,d\mathbf r . $$

Here $Z_{\mathbf k}$ is the normalization factor of the restrained ensemble, and $k_\alpha$ are the spring constants of the harmonic restraining potentials. In practice, the spring constants are chosen large enough to guarantee convergence to the mean forces. The time duration of the restrained dynamics should be longer than the largest relaxation timescale of the fast modes of the system, so that the restrained ensemble average $\langle\cdot\rangle_{\mathbf k}$ is adequately approximated by the time average. In the rest of the paper we do not explicitly distinguish $\mathbf f$ and $\mathbf f^{\mathbf k}$.
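The estimator above can be checked on a toy problem where the mean force is known exactly. The sketch below is an illustration under stated assumptions, not the paper's setup: it draws samples directly from the restrained Boltzmann distribution of a one-dimensional harmonic system, instead of running MD, and compares the estimator $k\langle s(\mathbf r)-s\rangle_k$ with the exact mean force:

```python
import numpy as np

# 1-D toy system: U(x) = a x^2 / 2 with CV s(x) = x, so F(s) = a s^2 / 2
# and the exact mean force is -dF/ds = -a s.
rng = np.random.default_rng(0)
a, beta, kappa, s = 1.0, 1.0, 200.0, 0.7

# The restrained potential U + (kappa/2)(x - s)^2 is Gaussian in x,
# so we can sample the restrained ensemble directly.
mean = kappa * s / (a + kappa)
std = 1.0 / np.sqrt(beta * (a + kappa))
x = rng.normal(mean, std, size=400_000)   # stand-in "restrained trajectory"

f_est = kappa * np.mean(x - s)            # k <s(r) - s>_k estimator
f_exact = -a * s                          # exact mean force -a s
```

For finite spring constant the estimator carries a small bias of order $a/(a+\kappa)$, which vanishes in the stiff-spring limit, as stated above.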

### 2.2 Free energy representation

The free energy is represented by a deep neural network (DNN) model, in which the input CVs are first preprocessed, then passed through multiple fully connected hidden layers, and finally mapped to the free energy; the structure of the DNN model is illustrated schematically in the corresponding figure. Mathematically, a DNN representation with $M$ hidden layers is given by

$$ F_{\mathbf w}(\mathbf s) = \bigl(L^{\mathrm{out}}\circ L^{M}\circ\cdots\circ L^{1}\circ\mathcal P\bigr)(\mathbf s), $$

where “$\circ$” denotes function composition. The differentiable operator $\mathcal P$ represents the system-dependent preprocessing procedure for the CVs, which will be illustrated by the examples in Section 3. For the $p$-th hidden layer, which has $n_p$ neurons collected in the vector $\mathbf d^{p}$, $L^{p}$ is the operation that maps $\mathbf d^{p-1}$ to $\mathbf d^{p}$ via

$$ \mathbf d^{p} = \varphi\bigl(W^{p}\mathbf d^{p-1} + \mathbf b^{p}\bigr). $$

Here $W^{p}$ and $\mathbf b^{p}$ are the coefficients of a linear mapping, often called weights, and $\varphi$ is the so-called activation function, which is in general nonlinear; in this work we use the component-wise hyperbolic tangent for $\varphi$. The output layer is defined by

$$ F_{\mathbf w}(\mathbf s) = \mathbf w^{\mathrm{out}}\cdot\mathbf d^{M} + b^{\mathrm{out}}, $$

where $\mathbf w^{\mathrm{out}}$ and $b^{\mathrm{out}}$ are the weights of the final linear mapping. The parameters $\{W^{p},\mathbf b^{p}\}_{p=1}^{M}$, $\mathbf w^{\mathrm{out}}$, and $b^{\mathrm{out}}$ together constitute all the DNN model parameters $\mathbf w$ to be determined. We note that the gradient $\nabla_{\mathbf s}F_{\mathbf w}(\mathbf s)$, whose negative represents the model’s prediction of the mean force, is well defined, since each layer of the construction is differentiable, and hence the DNN representation of the free energy is also differentiable.
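The forward pass and its analytic gradient can be sketched in plain numpy. The layer sizes below are illustrative, not those used in the paper, and the hand-rolled back-propagation is checked against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [2, 8, 8, 1]                 # 2 CVs -> two tanh hidden layers -> F
Ws = [rng.normal(0.0, 0.5, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(0.0, 0.1, m) for m in sizes[1:]]

def free_energy_and_grad(s):
    h, hidden_inputs = np.asarray(s, dtype=float), []
    for W, b in zip(Ws[:-1], bs[:-1]):
        hidden_inputs.append(h)
        h = np.tanh(W @ h + b)           # d^p = tanh(W^p d^{p-1} + b^p)
    F = (Ws[-1] @ h + bs[-1]).item()     # linear output layer
    # back-propagate dF/ds through the tanh layers
    g = Ws[-1].ravel()
    for W, b, hin in zip(Ws[-2::-1], bs[-2::-1], hidden_inputs[::-1]):
        z = W @ hin + b
        g = (g * (1.0 - np.tanh(z) ** 2)) @ W
    return F, g                          # the predicted mean force is -g

s0 = np.array([0.3, -0.8])
F0, g0 = free_energy_and_grad(s0)
eps = 1e-6                               # finite-difference check of the gradient
fd = [(free_energy_and_grad(s0 + eps * e)[0] - F0) / eps for e in np.eye(2)]
```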

It should be noted that the design of the DNN model can be adapted to different kinds of problems; we use the fully connected DNN model here for simplicity of presentation. For some condensed systems, for example, an alternative network model resembling the one used in the Deep Potential method [22] may be preferable. We leave this to future work.

### 2.3 Training and model deviation

The DNN representation of the free energy is obtained by solving the minimization problem

$$ \min_{\mathbf w}\; L(\mathbf w). $$

The loss function is defined by

$$ L(\mathbf w) = \frac{1}{|\mathcal D|}\sum_{\mathbf s\in\mathcal D}\bigl|\,\mathbf f(\mathbf s) + \nabla_{\mathbf s}F_{\mathbf w}(\mathbf s)\,\bigr|^2, $$

where $\mathcal D$ denotes the set of training data and $|\mathcal D|$ its size. Here $-\nabla_{\mathbf s}F_{\mathbf w}(\mathbf s)$ comes from the DNN model, and $\mathbf f(\mathbf s)$ is the collected mean force for the data point $\mathbf s$; precise ways of collecting the data are discussed later. It should be noted that at the beginning of the training process we have no data: data are collected as the training process proceeds. To guarantee the accuracy of the model, we require that the CV values in $\mathcal D$ constitute an adequate sample of the CV space. This is made difficult by the barriers on the energy landscape, at which the MD tends to get stuck in metastable states without being able to escape. To help overcome this problem we introduce a biased dynamics, the details of which are discussed in the next subsection.
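A minimal sketch of this force-matching loss and its minibatch SGD minimization, on a one-parameter toy model $F_w(s)=ws^2$ fitted to mean-force data $f(s)=-2s$ (so the exact minimizer is $w=1$); the model, data, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
data_s = rng.uniform(-1.0, 1.0, 256)     # CV values in the dataset D
data_f = -2.0 * data_s                   # collected mean forces (labels)

w, lr = 0.0, 0.1                         # model parameter, learning rate
for _ in range(200):
    idx = rng.integers(0, len(data_s), 32)   # minibatch of the training data
    s, f = data_s[idx], data_f[idx]
    resid = f + 2.0 * w * s              # f(s) + dF_w/ds; zero at the optimum
    grad = np.mean(2.0 * resid * 2.0 * s)    # dL/dw on the minibatch
    w -= lr * grad                       # plain SGD step
```

Because the labels here are noiseless, the minibatch gradient vanishes at the optimum and the iteration converges to $w=1$.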

During training we face the situation that $\mathcal D$ does not yet contain sufficient data. A crucial observation in addressing this issue is that the DNN model should be reasonably accurate in regions that are effectively covered by $\mathcal D$, but much less so in regions that are covered poorly by $\mathcal D$ (or have not been visited by the MD at all). To quantify this, we introduce a small ensemble of DNN models, where the only difference between the models is the random initialization of their weights. We then define the model deviation $\varepsilon(\mathbf s)$, or the discriminator,

$$ \varepsilon(\mathbf s) = \sqrt{\Bigl\langle \bigl\lVert \nabla_{\mathbf s}F_{\mathbf w}(\mathbf s) - \langle \nabla_{\mathbf s}F_{\mathbf w}(\mathbf s)\rangle \bigr\rVert^2 \Bigr\rangle}, $$

where the angular brackets denote averages over this ensemble of models. Our experience confirms the expectation that the models in the ensemble give close predictions of the mean forces in regions adequately covered by the training data, but diverging predictions in regions that are poorly covered by the data.
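The discriminator is cheap to evaluate once the ensemble is trained: it is simply the standard deviation of the ensemble's mean-force predictions. A toy sketch, with stand-in "models" constructed (as an assumption, not trained networks) to agree near a data-rich region around $s=0$ and to disagree far from it:

```python
import numpy as np

rng = np.random.default_rng(3)

def make_model():
    c = rng.normal(0.0, 0.3)   # per-model random perturbation
    # every model predicts force ~ -2s near s = 0 (the "data-rich" region),
    # but the cubic perturbation makes predictions diverge for large |s|
    return lambda s: -2.0 * s + c * s ** 3

ensemble = [make_model() for _ in range(4)]

def model_deviation(s):
    preds = np.array([m(s) for m in ensemble])
    return preds.std()          # ensemble std. dev. = the discriminator

eps_near = model_deviation(0.1)  # inside the well-sampled region: small
eps_far = model_deviation(3.0)   # far outside it: large
```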

Finally, it is worth noting that the minimization problem is solved by the stochastic gradient descent (SGD) method combined with the back-propagation algorithm [24], which has become the *de facto* standard for training DNN models. At each training step, the loss function is evaluated on a small *batch* $\mathcal B$, a subset of the training data $\mathcal D$, i.e.,

$$ L_{\mathcal B}(\mathbf w) = \frac{1}{|\mathcal B|}\sum_{\mathbf s\in\mathcal B}\bigl|\,\mathbf f(\mathbf s) + \nabla_{\mathbf s}F_{\mathbf w}(\mathbf s)\,\bigr|^2 . $$

### 2.4 Adaptive biasing

A way of encouraging the MD to overcome the barriers in the energy landscape and to escape metastable regions is to add a biasing potential $U_b$. The force on the $i$-th atom then becomes

$$ \mathbf F_i = -\nabla_{\mathbf r_i}\bigl( U(\mathbf r) + U_b(\mathbf s(\mathbf r)) \bigr). $$

If the biasing potential is equal to $-F(\mathbf s)$, then the sampling would encounter no free energy barrier (even though there could still be small potential energy barriers) and the system should behave diffusively. Therefore it is natural to use the current approximation of the FES as the biasing potential, as is done in metadynamics [9]. We adopt the same strategy, but propose to switch on the biasing potential only in regions where we have high confidence in the accuracy of the DNN approximation of the FES:

$$ U_b(\mathbf s) = -\sigma\bigl(\varepsilon(\mathbf s)\bigr)\,\tilde F(\mathbf s), $$

where the biasing potential uses $\tilde F(\mathbf s)$, the mean of the predefined ensemble of DNN models, and $\sigma$ is a smooth switching function defined by

$$ \sigma(\varepsilon) = \begin{cases} 1, & \varepsilon < \varepsilon_0,\\[2pt] \dfrac12 + \dfrac12\cos\Bigl(\pi\,\dfrac{\varepsilon-\varepsilon_0}{\varepsilon_1-\varepsilon_0}\Bigr), & \varepsilon_0 \le \varepsilon < \varepsilon_1,\\[2pt] 0, & \varepsilon \ge \varepsilon_1 . \end{cases} $$

Here $\varepsilon_0$ and $\varepsilon_1$ are two confidence levels for the accuracy of the DNN model. In regions where the model deviation is smaller than $\varepsilon_0$, the accuracy of the DNN representation is adequate, and the system is biased by $-\tilde F(\mathbf s)$. In regions where the model deviation is larger than $\varepsilon_1$, the accuracy of the DNN representation is inadequate, and the system follows the original dynamics governed by the potential energy $U(\mathbf r)$. In between $\varepsilon_0$ and $\varepsilon_1$, the DNN model is partially used to bias the system via a rescaled biasing force.
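A possible implementation of such a switching function is sketched below. The cosine ramp used here is one convenient smooth choice; the exact interpolant between the two confidence levels is a modeling decision, and the values in the example are arbitrary:

```python
import numpy as np

def switch(eps, e0, e1):
    """sigma(eps): 1 below e0, 0 above e1, smooth cosine ramp in between."""
    eps = np.asarray(eps, dtype=float)
    t = np.clip((eps - e0) / (e1 - e0), 0.0, 1.0)  # 0 below e0, 1 above e1
    return 0.5 * (1.0 + np.cos(np.pi * t))

# below e0 the bias is fully on; above e1 it is fully off
sigma = switch([0.5, 1.5, 2.5], e0=1.0, e1=2.0)
```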

### 2.5 Data collection

After the biased MD, a number of the newly visited CV values that lie in the regions without confidence are added to the training dataset $\mathcal D$. For each of these CV values, we use restrained dynamics to calculate the mean force via the estimator of Section 2.1. These values, together with those computed in previous iterations, are used as labels for training the next updated model.

Figure 1 is a flowchart of the reinforced dynamics scheme. Given an initial guess of the FES represented by the DNN, a biased MD (Section 2.4) is performed to sample the CV space from an arbitrarily chosen starting point. If no *a priori* information on the FES is available, a standard MD is carried out instead. The visited CV values are recorded at a certain time interval and tested by the discriminator to see whether they belong to the region without confidence, defined as the CV values that give rise to a large model deviation $\varepsilon(\mathbf s)$; a reasonable choice of the threshold is the confidence level $\varepsilon_1$. If none of the newly sampled CV values from the biased MD trajectory belong to the region without confidence, there are three possibilities: (1) the biased MD is not long enough, so parts of the CV space are not explored; (2) the interval for recording CV values along the biased MD is too large, so some visited CV values belonging to the region without confidence are missed; or (3) the DNN representation of the FES is fully converged. In the last case the iteration should be stopped, and one can output the DNN representation of the FES, namely the mean of the predefined ensemble of models. Case (1) can be excluded by systematically increasing the length of the simulation, and case (2) by decreasing the recording interval.

If CV values belonging to the region without confidence are discovered, they are added to the training dataset $\mathcal D$. Note that the CV values already in the training dataset are retained and serve as training data in later iterations. The mean forces at the added CV values are computed by restrained dynamics. A new ensemble of DNN models for the FES is then trained, using different random initial guesses for the model parameters. The standard deviation of the predictions from these models is again used to estimate the model deviation $\varepsilon(\mathbf s)$. The iteration then restarts with a biased MD simulation using the new DNN models.
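The dataset update in this step is simple to state in code. The sketch below uses a toy stand-in for the model deviation $\varepsilon(\mathbf s)$ and hypothetical values; the key points are the threshold test and the retention of previously collected data:

```python
def update_dataset(dataset, recorded_cvs, model_deviation, e1):
    """Add newly visited CVs in the region without confidence; keep old data."""
    new = [s for s in recorded_cvs if model_deviation(s) >= e1]
    return dataset + new   # previously collected CVs are retained

dev = lambda s: abs(s)     # toy stand-in for the model deviation eps(s)
d = update_dataset([0.1], [0.2, 2.0, -3.0], dev, e1=1.0)
# only the two CVs with dev(s) >= 1.0 are added, after the retained 0.1
```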

Finally, it is worth noting that the restrained MD simulations for the mean forces, which account for most of the computation time in the reinforced dynamics scheme, are embarrassingly parallel. The training of the ensemble of DNN models is also easily parallelized. Moreover, several independent walkers can be run simultaneously in the biased simulation, providing a more efficient exploration of the FES. These techniques help accelerate the data collection process and benefit large-scale simulations of complex systems.

## 3 Numerical examples

### 3.1 Alanine dipeptide

An alanine dipeptide (CH3-CO-NH-CαH(CH3)-CO-NH-CH3) modeled by the Amber99SB force field [25] is dissolved in 342 TIP3P [26] water molecules in a periodic simulation cell. All MD simulations are performed with the package GROMACS 5.1.4 [27]. The cut-off radius of the van der Waals interaction is 0.9 nm, and the dispersion correction due to the finite cut-off radius is applied to both energy and pressure calculations. The Coulomb interaction is treated with the smooth particle mesh Ewald method [28], with a real-space cut-off of 0.9 nm and a reciprocal-space grid spacing of 0.12 nm. The system is integrated with the leap-frog scheme at a time step of 2 fs. The temperature is controlled at 300 K by the velocity-rescaling thermostat [29] with a relaxation time of 0.2 ps; the solute and solvent are coupled to two independent thermostats to avoid the hot-solvent/cold-solute problem [30]. The Parrinello-Rahman barostat [31] (GROMACS implementation) with a relaxation timescale of 1.5 ps is coupled to the system to control the pressure at 1 bar. For the alanine dipeptide molecule, all covalent bonds involving hydrogen atoms are constrained by the LINCS algorithm [32]. The H-O bonds and H-O-H angles of the water molecules are constrained by the SETTLE algorithm [33].

Two torsion angles, φ (C, N, Cα, C) and ψ (N, Cα, C, N), are chosen as the CVs for this system, i.e. s = (φ, ψ). The GROMACS source code is modified and linked to PLUMED 2.4b [34] to support the biased and restrained simulations, and the PLUMED package is further modified by the authors to support the DNN biasing force of Section 2.4. The DNN models used in this example contain three hidden layers. The preprocessing operator is chosen to respect the 2π-periodicity of the torsion angles, so the periodicity of the FES is guaranteed. Model training is carried out with the deep learning framework TensorFlow [35], using the Adam stochastic gradient descent algorithm [36]. The learning rate starts at 0.001 and decays exponentially with the number of training steps. In each reinforced dynamics step, four DNN models with independent random initializations are trained in the same way to estimate the model deviation. The biased MD simulations last for 100 ps, and the CV values along the MD trajectories are computed and recorded every 0.2 ps. We assume no *a priori* information regarding the FES, so a brute-force simulation is performed in the first iteration (iteration 0). In each iteration, at most 50 recorded CV values in the region without confidence are added to the training dataset. Restrained MD simulations with stiff harmonic springs are performed to estimate the mean forces, with the CV values recorded every 0.01 ps along each restrained trajectory.
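One standard way to build a periodicity-preserving preprocessing operator for torsion-angle CVs, offered here as an illustrative assumption since the exact operator is not fully specified above, is to embed each angle as its cosine and sine. The network input, and hence the represented FES, is then automatically 2π-periodic:

```python
import numpy as np

def preprocess(angles):
    """Map torsion angles (radians) to their (cos, sin) embedding."""
    a = np.asarray(angles, dtype=float)
    return np.concatenate([np.cos(a), np.sin(a)])

x1 = preprocess([0.3, -1.2])                           # s = (phi, psi)
x2 = preprocess([0.3 + 2 * np.pi, -1.2 - 2 * np.pi])   # shifted by full periods
# x1 and x2 coincide, so the network cannot distinguish equivalent angles
```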

The FES on the φ-ψ plane (known as the Ramachandran plot) is reported in Figure 2. We perform 6 independent brute-force MD simulations, each 860 ns long; in total 5.1 μs of MD trajectories are thus used to estimate the reference FES and compare with the reinforced dynamics result. The system presents five metastable states, as noted in Figure 2 (a). Two of these regions correspond to the dihedral angles observed in β-strand conformations, and two others correspond to the dihedral angles of the right- and left-handed α-helix conformations, respectively. The transition between two of these states has to overcome an energy barrier of 25 kJ/mol, or equivalently about 10 k_BT at 300 K; the corresponding mean first passage time is shown to be 43 ns for the same model [37].

In Figure 2, the FES sampled by the brute-force MD (a) is compared with panel (b), which is constructed by reinforced dynamics using the looser pair of confidence levels ε0 and ε1 (both in kJ/mol/rad). At iteration 9 of run (b), the biased simulation does not produce any CV value belonging to the region without confidence, so the computation stops. In total (iterations 0 to 8), 198 CV values are added to the training dataset for learning the FES. The reinforced dynamics is observed to reproduce, with satisfactory accuracy, the FES at the important metastable states and transition paths of the system. The difference between (a) and (b) is plotted in (c): the error of the FES at three of the metastable states is below 0.5 kJ/mol, while at the other two it is around 1.5 kJ/mol. The total biased MD simulation time is 1.0 ns and the total restrained MD simulation time is 19.8 ns, so the total MD simulation time is 20.8 ns. This is only about 0.4% of the brute-force simulation length, and about half of the 43 ns mean first passage time of the brute-force MD simulation.

The accuracy of the FES can be systematically improved by using stricter confidence levels; the corresponding result is reported in Figure 2 (d) and (e). In this case, the biased MD simulation generates no CV values belonging to the region without confidence at iteration 21. In total (iterations 0 to 20), 303 CV values are added to the training dataset to construct the FES. The error of the FES at all metastable states and transition regions is now uniformly below 0.5 kJ/mol. The total biased MD simulation time is 2.2 ns and the total restrained MD simulation time is 30.3 ns, so the total MD simulation time is 32.5 ns. This is roughly 50% longer than the reinforced dynamics with the looser confidence levels, but still shorter than the 43 ns mean first passage time of the brute-force simulation.

To highlight the adaptive feature of the reinforced dynamics, the CV values visited in each biased MD simulation, and those iteratively added to the training dataset in the case of the looser confidence levels, are illustrated in Figure 3. In iteration 0, no *a priori* information on the FES is available, so the MD simulation is not biased. The system starts in one metastable state and spontaneously transforms to two neighboring states during the 0.1 ns simulation.^{1}

### 3.2 Alanine tripeptide

In this example, the FES is constructed for a system containing an alanine tripeptide (CH3-CO-NH-CαH(CH3)-CO-NH-CαH(CH3)-CO-NH-CH3) dissolved in 341 water molecules. The torsion angles associated with the first and second alanine residues, denoted by (φ1, ψ1) and (φ2, ψ2) respectively, are used as the CVs, viz. s = (φ1, ψ1, φ2, ψ2). The force field and simulation protocols are the same as in the alanine dipeptide case (Section 3.1), except that the length of the biased and restrained MD simulations is increased to 140 ps, and the DNN preprocessing operator applies, to each pair of torsion angles, the operator defined for the dihedral angles of alanine dipeptide.

The confidence levels ε0 and ε1 of the reinforced dynamics are again specified in kJ/mol/rad. The biased MD simulation of iteration 72 does not find any CV value belonging to the region without confidence, so the process stops. From iteration 0 to 71, 1363 CV values are added to the training dataset. The total biased MD simulation time is about 10 ns, while the total restrained MD simulation time is about 191 ns. For comparison, we carry out 18 independent brute-force MD simulations, each 2.65 μs long, so the total length of the brute-force MD trajectories is 47.7 μs. The four-dimensional FESs constructed by brute-force MD sampling and by reinforced dynamics are presented in Figure 4 by three two-dimensional projections. Taking the projection onto the (φ1, ψ1) plane as an example, it is defined by

$$ F_{\phi_1\psi_1}(\phi_1,\psi_1) = -\frac{1}{\beta}\ln\int e^{-\beta F(\phi_1,\psi_1,\phi_2,\psi_2)}\,d\phi_2\,d\psi_2 + C, $$

where $C$ is a constant chosen to normalize the minimum value of $F_{\phi_1\psi_1}$ to zero. The other projected free energies are defined analogously. Figure 4 shows that the reinforced dynamics reproduces the FES with satisfactory accuracy on all the projected planes. Notably, the projected FESs on the (φ1, ψ1) and (φ2, ψ2) planes both differ from the FES of alanine dipeptide, which indicates correlation among the backbone atoms.
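The projection can be sketched numerically. On a toy separable quadratic 4-D FES (an illustrative assumption, not the tripeptide surface) the projected free energy is again quadratic in the remaining two variables, which gives an exact check of the computation:

```python
import numpy as np

beta = 1.0
g = np.linspace(-3.0, 3.0, 41)
# broadcastable views of the grid along each of the four CV axes
s1, s2, s3, s4 = (g.reshape([-1 if i == k else 1 for i in range(4)])
                  for k in range(4))
F = 0.5 * (s1**2 + s2**2 + s3**2 + s4**2)   # toy separable 4-D FES

boltz = np.exp(-beta * F)
# integrate out (s3, s4); grid-spacing factors only shift the constant C
proj = -np.log(boltz.sum(axis=(2, 3))) / beta
proj -= proj.min()                           # choose C so that min(proj) = 0
```

Because the toy FES is separable, the sum over (s3, s4) contributes the same constant at every (s1, s2), so `proj` equals 0.5(s1² + s2²) up to floating-point rounding.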

The CV values visited by the brute-force MD simulation in iteration 0, and those visited by the biased MD simulations in iterations 10 and 70, are reported in Figure 5. Both the brute-force and the biased MD simulations last for 140 ps. The CV values visited by the brute-force MD simulation are mostly localized in a single metastable state on each of the projected planes, with only a few CV values reaching neighboring states across low free-energy barriers. In iteration 10, the MD simulation biased by the partially constructed FES representation is more diffusive than the brute-force MD simulation: during the 140 ps simulation the system visits several metastable states on each plane, and it is able to jump over a relatively high energy barrier between states. In iteration 70, the FES has been constructed to relatively high accuracy, so the biased MD simulation is even more diffusive in the CV space than in iteration 10: the system diffusively crosses the barriers between the left and right parts of the projected planes. The diffusivity of the biased MD in the CV space is in fact an indication of good FES representation quality.

## 4 Conclusion and Perspective

In summary, reinforced dynamics is a promising tool for exploring the configuration space and calculating the free energy of large atomistic systems. Even though we only presented two small molecular examples, the same strategy should also be useful for condensed systems. In addition, one should be able to couple it with optimization algorithms in order to perform structure optimization. These and other potential extensions and applications are currently underway.

### Footnotes

- The mean first passage times for the two spontaneous transitions observed in iteration 0 are 0.041 and 0.255 ns, respectively. See Ref. [37].

### References

1. Shankar Kumar, John M Rosenberg, Djamal Bouzida, Robert H Swendsen, and Peter A Kollman. The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. *Journal of Computational Chemistry*, 13(8):1011–1021, 1992.
2. Shankar Kumar, John M Rosenberg, Djamal Bouzida, Robert H Swendsen, and Peter A Kollman. Multidimensional free-energy calculations using the weighted histogram analysis method. *Journal of Computational Chemistry*, 16(11):1339–1350, 1995.
3. Arthur F Voter. Hyperdynamics: accelerated molecular dynamics of infrequent events. *Physical Review Letters*, 78(20):3908, 1997.
4. Yuji Sugita and Yuko Okamoto. Replica-exchange molecular dynamics method for protein folding. *Chemical Physics Letters*, 314(1):141–151, 1999.
5. Joost VandeVondele and Ursula Rothlisberger. Efficient multidimensional free energy calculations for ab initio molecular dynamics using classical bias potentials. *The Journal of Chemical Physics*, 113(12):4863–4868, 2000.
6. David J Earl and Michael W Deem. Parallel tempering: theory, applications, and new perspectives. *Physical Chemistry Chemical Physics*, 7(23):3910–3916, 2005.
7. Clara D Christ and Wilfred F van Gunsteren. Multiple free energies from a single simulation: extending enveloping distribution sampling to nonoverlapping phase-space distributions. *The Journal of Chemical Physics*, 128:174112, 2008.
8. Y. Q. Gao. Self-adaptive enhanced sampling in the energy and trajectory spaces: accelerated thermodynamics and kinetic calculations. *The Journal of Chemical Physics*, 128:134111, 2008.
9. Alessandro Laio and Michele Parrinello. Escaping free-energy minima. *Proceedings of the National Academy of Sciences*, 99(20):12562–12566, 2002.
10. Alessandro Barducci, Giovanni Bussi, and Michele Parrinello. Well-tempered metadynamics: a smoothly converging and tunable free-energy method. *Physical Review Letters*, 100(2):020603, 2008.
11. Luca Maragliano and Eric Vanden-Eijnden. Single-sweep methods for free energy calculations. *The Journal of Chemical Physics*, 128(18):184110, 2008.
12. Jerry B Abrams and Mark E Tuckerman. Efficient and direct generation of multidimensional free energy surfaces via adiabatic dynamics without coordinate transformations. *The Journal of Physical Chemistry B*, 112(49):15742–15757, 2008.
13. Tang-Qing Yu and Mark E Tuckerman. Temperature-accelerated method for exploring polymorphism in molecular crystals based on free energy. *Physical Review Letters*, 107(1):015701, 2011.
14. T Stecher, N Bernstein, and G Csányi. Free energy surface reconstruction from umbrella samples using Gaussian process regression. *Journal of Chemical Theory and Computation*, 10(9):4079, 2014.
15. L Mones, N Bernstein, and G Csányi. Exploration, sampling, and reconstruction of free energy surfaces with Gaussian process regression. *Journal of Chemical Theory and Computation*, 12:5100–5110, 2016.
16. Raimondas Galvelis and Yuji Sugita. Neural network and nearest neighbor algorithms for enhancing sampling of molecular dynamics. *Journal of Chemical Theory and Computation*, 13(6):2489–2500, 2017.
17. Elia Schneider, Luke Dai, Robert Q Topper, Christof Drechsel-Grau, and Mark E Tuckerman. Stochastic neural network approach for learning high-dimensional free energy surfaces. *Physical Review Letters*, 119(15):150601, 2017.
18. Richard S Sutton and Andrew G Barto. *Reinforcement Learning: An Introduction*, volume 1. MIT Press, Cambridge, 1998.
19. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. *Nature*, 518(7540):529–533, 2015.
20. Giovanni Ciccotti, Raymond Kapral, and Eric Vanden-Eijnden. Blue moon sampling, vectorial reaction coordinates, and unbiased constrained dynamics. *ChemPhysChem*, 6(9):1809–1814, 2005.
21. Luca Maragliano and Eric Vanden-Eijnden. A temperature accelerated method for sampling free energy and determining reaction pathways in rare events simulations. *Chemical Physics Letters*, 426(1):168–175, 2006.
22. Jiequn Han, Linfeng Zhang, Roberto Car, and Weinan E. Deep Potential: a general representation of a many-body potential energy surface. *Communications in Computational Physics*, 23(3):629–639, 2018.
23. Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, and Weinan E. Deep Potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. *arXiv preprint arXiv:1707.09571*, 2017.
24. Yann A LeCun, Léon Bottou, Genevieve B Orr, and Klaus-Robert Müller. Efficient BackProp. In *Neural Networks: Tricks of the Trade*, pages 9–48. Springer, 2012.
25. Viktor Hornak, Robert Abel, Asim Okur, Bentley Strockbine, Adrian Roitberg, and Carlos Simmerling. Comparison of multiple Amber force fields and development of improved protein backbone parameters. *Proteins: Structure, Function, and Bioinformatics*, 65(3):712–725, 2006.
26. William L Jorgensen, Jayaraman Chandrasekhar, Jeffry D Madura, Roger W Impey, and Michael L Klein. Comparison of simple potential functions for simulating liquid water. *The Journal of Chemical Physics*, 79(2):926–935, 1983.
27. Mark James Abraham, Teemu Murtola, Roland Schulz, Szilárd Páll, Jeremy C Smith, Berk Hess, and Erik Lindahl. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. *SoftwareX*, 1:19–25, 2015.
28. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen. A smooth particle mesh Ewald method. *The Journal of Chemical Physics*, 103(19):8577, 1995.
29. G. Bussi, D. Donadio, and M. Parrinello. Canonical sampling through velocity rescaling. *The Journal of Chemical Physics*, 126:014101, 2007.
30. M. Lingenheil, R. Denschlag, R. Reichold, and P. Tavan. The “hot-solvent/cold-solute” problem revisited. *Journal of Chemical Theory and Computation*, 4(8):1293–1306, 2008.
31. M. Parrinello and A. Rahman. Polymorphic transitions in single crystals: a new molecular dynamics method. *Journal of Applied Physics*, 52:7182, 1981.
32. B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije. LINCS: a linear constraint solver for molecular simulations. *Journal of Computational Chemistry*, 18(12):1463–1472, 1997.
33. S. Miyamoto and P. A. Kollman. SETTLE: an analytical version of the SHAKE and RATTLE algorithm for rigid water models. *Journal of Computational Chemistry*, 13(8):952–962, 1992.
34. Gareth A Tribello, Massimiliano Bonomi, Davide Branduardi, Carlo Camilloni, and Giovanni Bussi. PLUMED 2: new feathers for an old bird. *Computer Physics Communications*, 185(2):604–613, 2014.
35. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: a system for large-scale machine learning. In *OSDI*, volume 16, pages 265–283, 2016.
36. Diederik Kingma and Jimmy Ba. Adam: a method for stochastic optimization. *arXiv preprint arXiv:1412.6980*, 2014.
37. Benjamin Trendelkamp-Schroer and Frank Noé. Efficient estimation of rare-event kinetics. *Physical Review X*, 6:011009, 2016.