METAGUI 3: a graphical user interface for choosing the collective variables in molecular dynamics simulations
Molecular dynamics (MD) simulations allow the exploration of the phase space of biopolymers through the integration of equations of motion of their constituent atoms. The analysis of MD trajectories often relies on the choice of collective variables (CVs) along which the dynamics of the system is projected. We developed a graphical user interface (GUI) for facilitating the interactive choice of the appropriate CVs. The GUI allows: defining interactively new CVs; partitioning the configurations into microstates characterized by similar values of the CVs; calculating the free energies of the microstates for both unbiased and biased (metadynamics) simulations; clustering the microstates in kinetic basins; visualizing the free energy landscape as a function of a subset of the CVs used for the analysis. A simple mouse click allows one to quickly inspect structures corresponding to specific points in the landscape.
keywords:Molecular Dynamics, GUI, Clustering, Free Energy
Program Title: METAGUI 3
Licensing provisions: GPLv3
Programming language: Tcl/Tk, Fortran
Journal reference of previous version: METAGUI 
Does the new version supersede the previous version?: No
Nature of problem: Choose the appropriate collective variables for describing the thermodynamics and kinetics of a biomolecular system through biased and unbiased molecular dynamics.
Solution method: Provide an environment to compute and visualize free energy surfaces as a function of collective variables, interactively defined.
Additional comments: METAGUI 3 is not a standalone program but a plugin that provides analysis features within VMD (version 1.9.2 or higher).
- (1) X. Biarnés, F. Pietrucci, F. Marinelli, A. Laio, METAGUI. A VMD interface for analyzing metadynamics and molecular dynamics simulations, Computer Physics Communications 183 (2012) 203–211.
Appendix A Introduction
Molecular dynamics (MD) is a computationally intensive simulation technique which is used to gather insights into the structural, thermodynamic and kinetic properties of systems modeled at the atomistic level. MD simulations have been extensively used to study, e.g., biologically-relevant conformational changes of proteins, their interactions with small molecules, and with other proteins. The problem of achieving sufficient sampling of the phase space has been tackled by careful software parallelization Phillips et al. (2005); Pronk et al. (2013) and accelerated processors Harvey et al. (2009). Moreover, the sampling problem can be addressed by enhanced sampling techniques, such as umbrella sampling Torrie and Valleau (1977), metadynamics Laio and Gervasio (2008), steered MD Giorgino and De Fabritiis (2011), accelerated MD Hamelberg et al. (2004), etc. These approaches alter the Hamiltonian or the temperature of the system, therefore modifying its kinetic properties. However, they allow the reconstruction of the free energy landscape.
Most of the biased sampling methodologies require the modeler to choose a few “important” collective variables (CV), along which the important dynamics of the system is assumed to happen. CVs are arbitrary functions of the internal coordinates of the system; software such as PLUMED Tribello et al. (2014) and Colvars Fiorin et al. (2013) allows their specification them via concise symbolic expressions. The choice of CVs is normally done before performing the simulation Laio and Parrinello (2002). However it is common practice to choose the CVs for a production run by performing an analysis on a preliminary simulation. A careful choice of CVs is also needed in unbiased molecular dynamics simulations, in order to obtain insight on the relevant processes that possibly occurred during the time evolution.
In general, choosing the CVs that better describe a conformational change, even if this change can be observed by visualizing the MD trajectory, can be far from trivial. For these reasons, there is a need for software interfaces that simplify their iterative definition and evaluation Giorgino (2014); Biarnés et al. (2012).
Appendix B Features
We here present METAGUI 3, a plug-in providing a graphical user interface (GUI) to construct thermodynamic and kinetic models of processes simulated by large-scale MD. The GUI is based on the well-known MD visualization software VMD Humphrey et al. (1996); it extends the features of an earlier version Biarnés et al. (2012), exploiting in particular: (a) its ability to toggle between two representations, structural and free energy, enabling the quick inspection of the configurations associated to specific values of the CVs; (b) a procedure for grouping together similar structures in microstates based only on the collective variables; and (c) a simplified and guided workflow for the analysis of metadynamics results. METAGUI 3 was extensively rewritten to provide the following new features, which will be presented in detail in the next sections:
an interface for defining and adding interactively new collective variables;
a reorganized graphical interface;
support for the analysis of unbiased MD trajectories;
a new clustering method Rodriguez and Laio (2014) for identifying kinetic basins.
b.1 The graphical user interface
Upon starting the tool, the user is presented with the METAGUI 3 window; the three main tasks are shown as tabs, namely Define inputs, Analyze and Visualize. The three sections correspond to the three logical steps of the analysis workflow shown in Figure 1.
The Define inputs tab (Figure 2) allows to specify the input files, including the topology file of the system (i.e. a file in PDB, PSF, or GRO format), and one or more trajectory files. For each trajectory, a file containing the values of collective variables (COLVAR files) is required as well as the temperature of the run; if the simulations are biased by metadynamics, the time-varying potentials have to be supplied in matching HILLS files, produced by PLUMED-patched MD engines during the simulation. The part of the plug-in dedicated to the analysis of biased simulations is unchanged with respect to the earlier version of METAGUI Biarnés et al. (2012). The Save and Load configuration buttons allow storing and retrieving the whole set-up at once.
Activating the Load all button loads all the simulation data into memory; this includes the molecular topology, all of the trajectory frames, and the values of the collective variables. A summary of the CVs appears on the lower-hand panel of the GUI, along with any additional CV defined manually at runtime, as discussed in Section B.3.
After the input is defined, operations in the Analyze tabs can take place (Figure 3). In particular, the first step is to group the configurations explored during the simulations into microstates, namely sets of similar structures. Three partitioning algorithms are available, namely (a) -medoids with -means++ initial seeding Kaufman and Rousseeuw (1990); Arthur and Vassilvitskii (2007) with or without sieving Shao et al. (2007); (b) an implementation of the algorithm by Daura et al. Daura et al. (1999); and (c) the simple grid partitioning in CV space implemented in the earlier versions of METAGUI. The -medoids correspond to a partitioning of the space in Voronoi cells. The parameter directly defines the number of microstates that are obtained. Daura’s clustering algorithm Daura et al. (1999) automatically determines the number of microstates by finding groups within a cut-off distance, that should be specified by the user. With respect to the grid partitioning, included also in this version, the use of more elaborated partition methods has the advantage of automatically focusing on the populated regions of the CV space. This avoids the exponential growth with the number of CVs of the microstates with negligible population.
Once microstates have been computed, the tool allows their visualization. The Visualize tab lists the microstates along with their free energy (Figure 5). Selecting one of them it is possible to visualize the corresponding structures or save them into a PDB file. It is also possible to seamlessly switch between a collective variables space representation of microstates and their atomic structure representation (Figure 6); this greatly helps associating structural rearrangements with collective variable changes.
b.2 Support for unbiased simulations
A new feature in METAGUI 3.0 enables the analysis of trajectories resulting from both unbiased and metadynamics-based simulations. It is possible to switch between the two modes by toggling a Biased check-box on each trajectory. In both biased and unbiased simulations the structures are grouped together into a set of microstates, namely sets of structures with similar values of the collective variables. For biased simulations, free energies of the microstates are then computed by the weighted histogram analysis method (WHAM) Roux (1995); Marinelli et al. (2009). For unbiased simulations, free energies are computed applying the Boltzmann inversion to the populations of the microstates (visit counts). The tool also allows analyzing simulations for which part of the trajectories are biased and part of the trajectories are not.
b.3 Run-time definition of collective variables
In METAGUI 3, PLUMED output can be read after the simulation, in the form of COLVAR files as described above, as well as recomputed at run-time. This is an important improvement with respect to the previous version of the interface: the user can now define collective variables in the PLUMED language, and have them evaluated on the loaded trajectories. CVs defined at run-time can either replace or be added to the list of pre-computed CVs. Definition of CVs at run-time is done through the Plumed-GUI software (Figure 4), which provides a guided environment to prototype CV declarations with templates and symbolic atom selection keywords Giorgino (2014). Plumed-GUI is distributed with VMD since version 1.9.2, and can be updated separately from it.
The new version of METAGUI described in this work can process the output of the PLUMED 1.3 Bonomi et al. (2009) and 2.x Tribello et al. (2014) engines, making it compatible with a number of different molecular dynamics packages like AMBER Case et al. (2005), NAMD Phillips et al. (2005), GROMACS Pronk et al. (2013) and several others. The use of CVs computed at run-time only requires PLUMED’s driver, a stand-alone executable which is built independently of MD engines.
b.4 Clustering strategies
METAGUI 3 allows quickly detecting the presence of metastable states by applying the Density Peak clustering approach Rodriguez and Laio (2014). The approach is based on the assumption that cluster centers are surrounded by neighbors with lower local density and that they are at a relatively large distance from any points with a higher local density. The method proceeds by computing two quantities for each point: the density and the minimum distance to a point with higher density . In its simplest formulation, the density is estimated by the number of neighbors within a cut-off, although more elaborated definitions can be employed. Once computed, both quantities are plotted in the so-called decision plane. Cluster centers stand out naturally as outliers on the graph and can be manually picked up. After the cluster centers have been found, each remaining point is assigned to the same cluster as its nearest neighbor of higher density.
In the context of molecular dynamics, the quantity has a direct physical interpretation through the Boltzmann equation, which relates the logarithm of the probability density with the free energy. Therefore, in the version implemented in METAGUI 3, the density of a microstate is computed by , where is the Boltzmann constant and is the temperature. This definition leads to realistic density values even in the case of biased simulations, where standard density calculation methods cannot be employed. The distance between microstates is computed as the Euclidean distance in the CVs space. However, to deal with the different ranges of variation of the CVs, it is necessary to include some kind of normalization. This is done by dividing the coordinates by the bin size in the grid. Therefore, the distance can be interpreted as the number of hypercubes between microstates. The clusters obtained by this procedure are a good approximation of the metastable states normally obtained by Markov State Modeling. Therefore, they can be used as a starting point for kinetic analysis.
Appendix C A benchmark on a 125 microseconds-long MD trajectory
With the aim of illustrating the main features of METAGUI 3, we analyzed an unbiased simulation of Villin-headpiece folding Lindorff-Larsen et al. (2011). The trajectory is 125 s long, and includes approximately 12 folding events. The collective variables employed were the content of -helix, the radius of gyration, the backbone dihedral distance to the crystal structure and the number of hydrophobic contacts. All of them were computed by means of PLUMED Tribello et al. (2014) using the Plumed-GUI interface Giorgino (2014). The content of -helix was computed by means of the ALPHARMSD keyword Pietrucci and Laio (2009), using , and . The radius of gyration was computed by taking into account only the C coordinates. For the backbone dihedral distance, the reference structure considered was PDB code 2F4K Kubelka et al. (2006); the dihedrals of terminal residues were ignored. The number of hydrophobic contacts was computed as the combination of the coordination coordinates between all the possible hydrophobic residues side-chain pairs. Only the heavy atoms were considered and was set to 3.5 Å.
A total of 400 microstates were found by the -medoids method, reducing the computational time by using the sieving option Shao et al. (2007) with a maximum distance matrix dimension of 15,000. Then, the free energy of each microstate was computed applying the Boltzmann inversion to their populations. Finally, the free energy wells were localized by Density Peak clustering Rodriguez and Laio (2014), allowing the classification of the microstates in 3 different clusters.
Figure 7 summarizes the results of the analysis, showing the projection on different CVs of the free energy surface, one using -helix and the radius of gyration (panel A) and the other using the dihedral distance to the crystal structure and the number of hydrophobic contacts (panel C), the clusters partitions of the same projections (panels B and D respectively), the decision plane that origins the basin partition (panel E) and the representative structures for each basin (labeled as S1, S2 and S3). As expected, the microstate with minimum free energy (S1) corresponds to the folded state. Moreover, the folding funnel is evident in both the projections. S2 and S3 are configurations corresponding to intermediates of the folding process. Remarkably, the different projections shown in figure do not correspond to different calculations of the system, but different visualizations of the same analysis. METAGUI 3 allows toggling between them by changing the checked CVs in the FES and Basins plot frame (Figure 5). Moreover, by using the Pick function, one can visualize the structures corresponding to a microstate by simply clicking on a sphere in the free energy representation.
Appendix D Conclusions
The introduction of graphical tools in the MD analysis workflow can speed-up iterations in the search for appropriate collective variables, a necessary step towards the data-driven rationalization of MD results. METAGUI 3 aims at providing a quick way to cross-reference structural and thermodynamic information, by enabling both the selection of a set of relevant CVs on existing simulations, and providing a quick way to define new ones.
The source code of METAGUI 3 is available at the URL https://github.com/metagui/metagui3; it can be freely redistributed and/or modified under the terms of the GNU General Public License (Free Software Foundation) version 3 or later.
Appendix E Acknowledgements
The code of METAGUI 3.0 includes parts from the earlier METAGUI 2.0 version, written by Xevi Biarnés, Fabio Pietrucci, Fabrizio Marinelli and Alessandro Laio. Alessandro Laio and Alex Rodriguez acknowledge financial support from the grant Associazione Italiana per la Ricerca sul Cancro 5 per mille, Rif. 12214.
- Phillips et al. (2005) J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kalé, K. Schulten, Scalable molecular dynamics with NAMD., J Comput Chem 26 (2005) 1781–1802.
- Pronk et al. (2013) S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, E. Lindahl, GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit, Bioinformatics 29 (2013) 845–854.
- Harvey et al. (2009) M. J. Harvey, G. Giupponi, G. De Fabritiis, ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale, Journal of Chemical Theory and Computation 5 (2009) 1632–1639.
- Torrie and Valleau (1977) G. M. Torrie, J. P. Valleau, Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling, Journal of Computational Physics 23 (1977) 187 –199.
- Laio and Gervasio (2008) A. Laio, F. L. Gervasio, Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science, Reports on Progress in Physics 71 (2008) 126601.
- Giorgino and De Fabritiis (2011) T. Giorgino, G. De Fabritiis, A High-Throughput Steered Molecular Dynamics Study on the Free Energy Profile of Ion Permeation through Gramicidin A, J. Chem. Theory Comput. 7 (2011) 1943–1950.
- Hamelberg et al. (2004) D. Hamelberg, J. Mongan, J. A. McCammon, Accelerated molecular dynamics: A promising and efficient simulation method for biomolecules, The Journal of Chemical Physics 120 (2004) 11919–11929.
- Tribello et al. (2014) G. A. Tribello, M. Bonomi, D. Branduardi, C. Camilloni, G. Bussi, PLUMED 2: New feathers for an old bird, Computer Physics Communications 185 (2014) 604–613.
- Fiorin et al. (2013) G. Fiorin, M. L. Klein, J. Hénin, Using collective variables to drive molecular dynamics simulations, Molecular Physics 111 (2013) 3345–3362.
- Laio and Parrinello (2002) A. Laio, M. Parrinello, Escaping free-energy minima, Proceedings of the National Academy of Sciences of the United States of America 99 (2002) 12562–12566.
- Giorgino (2014) T. Giorgino, PLUMED-GUI: An environment for the interactive development of molecular dynamics analysis and biasing scripts, Computer Physics Communications 185 (2014) 1109–1114.
- Biarnés et al. (2012) X. Biarnés, F. Pietrucci, F. Marinelli, A. Laio, METAGUI. A VMD interface for analyzing metadynamics and molecular dynamics simulations, Computer Physics Communications 183 (2012) 203–211.
- Humphrey et al. (1996) W. Humphrey, A. Dalke, K. Schulten, VMD: Visual molecular dynamics, Journal of Molecular Graphics 14 (1996) 33–38.
- Kaufman and Rousseeuw (1990) L. Kaufman, P. J. Rousseeuw, Finding Groups in Data, John Wiley & Sons, Inc., 1990.
- Daura et al. (1999) X. Daura, K. Gademann, B. Jaun, D. Seebach, W. F. van Gunsteren, A. E. Mark, Peptide Folding: When Simulation Meets Experiment, Angewandte Chemie International Edition 38 (1999) 236–240.
- Rodriguez and Laio (2014) A. Rodriguez, A. Laio, Clustering by fast search and find of density peaks, Science 344 (2014) 1492–1496.
- Arthur and Vassilvitskii (2007) D. Arthur, S. Vassilvitskii, K-means++: The Advantages of Careful Seeding, in: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2007, pp. 1027–1035.
- Shao et al. (2007) J. Shao, S. W. Tanner, N. Thompson, T. E. Cheatham, Clustering Molecular Dynamics Trajectories: 1. Characterizing the Performance of Different Clustering Algorithms, Journal of Chemical Theory and Computation 3 (2007) 2312–2334.
- Roux (1995) B. Roux, The calculation of the potential of mean force using computer simulations, Computer Physics Communications 91 (1995) 275 –282.
- Marinelli et al. (2009) F. Marinelli, F. Pietrucci, A. Laio, S. Piana, A kinetic model of Trp-Cage folding from multiple biased molecular dynamics simulations, PLOS Computational Biology 5 (2009) 1–18.
- Bonomi et al. (2009) M. Bonomi, D. Branduardi, G. Bussi, C. Camilloni, D. Provasi, P. Raiteri, D. Donadio, F. Marinelli, F. Pietrucci, R. A. Broglia, M. Parrinello, PLUMED: A portable plugin for free-energy calculations with molecular dynamics, Computer Physics Communications 180 (2009) 1961–1972.
- Case et al. (2005) D. A. Case, T. E. Cheatham, T. Darden, H. Gohlke, R. Luo, K. M. Merz, A. Onufriev, C. Simmerling, B. Wang, R. J. Woods, The Amber biomolecular simulation programs, Journal of Computational Chemistry 26 (2005) 1668–1688.
- Lindorff-Larsen et al. (2011) K. Lindorff-Larsen, S. Piana, R. O. Dror, D. E. Shaw, How fast-folding proteins fold, Science 334 (2011) 517–520.
- Pietrucci and Laio (2009) F. Pietrucci, A. Laio, A Collective Variable for the Efficient Exploration of Protein Beta-Sheet Structures: Application to SH3 and GB1, Journal of Chemical Theory and Computation 5 (2009) 2197–2201.
- Kubelka et al. (2006) J. Kubelka, T. K. Chiu, D. R. Davies, W. A. Eaton, J. Hofrichter, Sub-microsecond protein folding, Journal of Molecular Biology 359 (2006) 546 – 553.