Predicting protein dynamics from structural ensembles.
The biological properties of proteins are uniquely determined by their structure and dynamics. A protein in solution populates a structural ensemble of metastable configurations around the global fold. From overall rotation to local fluctuations, the dynamics of proteins can cover several orders of magnitude in time scales. We propose a simulation-free coarse-grained approach which utilizes knowledge of the important metastable folded states of the protein to predict the protein dynamics. This approach is based upon the Langevin Equation for Protein Dynamics (LE4PD), a Langevin formalism in the coordinates of the protein backbone. The linear modes of this Langevin formalism organize the fluctuations of the protein, so that more extended dynamical cooperativity relates to increasing energy barriers to mode diffusion. The accuracy of the LE4PD is verified by analyzing the predicted dynamics across a set of seven different proteins for which both relaxation data and NMR solution structures are available. Using experimental NMR conformers as the input structural ensembles, LE4PD predicts quantitatively accurate results, with correlation coefficient to NMR backbone relaxation measurements for the seven proteins. The NMR solution structure derived ensemble and predicted dynamical relaxation is compared with molecular dynamics simulation-derived structural ensembles and LE4PD predictions, and are consistent in the timescale of the simulations. The use of the experimental NMR conformers frees the approach from computationally demanding simulations.
The evolved amino acid sequence of a native protein encodes its folded structure and inherent dynamical properties in aqueous solution.Pettitt (1990); Frauenfelder et al. (1991); Lewandowski et al. (2015) The latter determines the dynamics of specific residues in a protein primary sequence, which are active participants in the pathways of the biological function. Biologically active segments are often mobile and adaptable to assume a proper configuration when binding to a reaction partner. The multiple configurational states that an active segment may populate are not randomly selected: configurations with minimal energy are connected by energy barriers, and as such are thermally activated, enabling emerging regions of high mobility, which can behave like “switches” along the binding pathway.Lewandowski et al. (2015)
Different experiments and computational models exist to probe the dynamical processes of proteins, spanning the femtosecond regime of bond and angle vibrational modes to the millisecond and longer time regimes of folding and enzymatic kinetics. Important information in the picosecond to tens of nanosecond regime can be collected through NMR relaxation experiments, such as , and , however their interpretation is model dependent. Atomistic Molecular Dynamic (MD) simulations can provide a realistic dynamical model, but for most proteins of interest sufficient sampling to obtain converged dynamical correlations is prohibitively costly, and a theoretical approach is needed.
The theory we present here is the Langevin Equation for Protein Dynamics (LE4PD), which provides a coarse-grained but still physically realistic representation of biological macromolecules at the lengthscale of a single amino acid and larger. The LE4PD theory describes the amino acid dynamics quantitatively, as the theory contains information about the extent of the intramolecular energy barriers, specific amino acid friction coefficient, semiflexibility, degree of hydrophobicity, as well as hydrodynamics. The LE4PD accurately predicts the sequence-dependent dynamics starting from the ensemble of metastable structural configurations around the folded state measured by NMR, or from MD simulations.
The LE4PD model is unique in that it is a minimal dynamical model which projects the local and global diffusive dynamics of proteins from the protein structural ensemble with no adjustable parameters. This is possible because it is a coarse-grained yet microscopic model whose parameters are set directly from the microscopic physical system. This is in contrast to most methods constructed to model protein dynamics which rely upon site-specific adjustable parameters, such as the model-free formalism of Lipari and Szabo.Lipari and Szabo (1982) Other methods attempt to define the internal diffusion of proteins as fractional Brownian processes,Kneller and Hinsen (2004) which is a more accurate description of the general nature of the internal motion of proteins but is not predictive in nature. Nodet, Abergel, and Bodenhausen have modeled the dynamics of proteins as a coupled network of rotators under the assumption of a single conformational minima and small displacements.Abergel and Bodenhausen (2005) This approach, which attempts to predict fluctuations and dynamics from a single protein structure, is not directly comparable to the LE4PD model we present here, where we model the dynamics and take the structural ensemble from experiment or by sampling an underlying atomistic model via MD simulation. Like other elastic network modelsGo et al. (1983); Bahar et al. (1997); Atilgan et al. (2001); Yang et al. (2007, 2009) the coupled rotator model is capable of capturing the local variation in flexibility along the protein chain with no site-specific adjustable parameters, but because it begins from an empirical network description it requires a large amount of parameterization and specification of an overall rotational diffusion time , a scaling factor , a cut-off distance , and a characteristic internal diffusion time . The model is explicitly limited to small displacements around a single conformational minima, and relaxation times centered upon a short characteristic internal diffusion time of . In contrast, the LE4PD model is capable of simultaneously describing the global rotational diffusion, as well as local motion spanning the picosecond to many nanosecond and microsecond regimes. In particular, the long-time, highly correlated, large-amplitude dynamical motion of proteins is of great biological interest.
Input to the LE4PD is an ensemble of structural configurations, which has to be representative of the distribution of folded states of the protein. While proteins sample a very large -dimensional configurational space, with the number of independent sites comprising the protein, at the bottom of the funnel-like energy landscape the conformational diversity is much smaller.Bryngelson et al. (1995); Onuchic et al. (1997); Dill and Chan (1997) A common paradigm is that the important internal fluctuations of a folded protein span a limited number of specific structures,Monod et al. (1965); Levinthal (1968) and these can be well sampled experimentally by NMR.Lange et al. (2008) If that is the case, NMR conformer ensembles should provide a structural ensemble consistent with well-sampled MD simulations, and the LE4PD coupled with structural NMR should provide predictions of the protein dynamics without need of performing lengthy computer simulations. In practice, NMR solution structures encode a structural diversity that is due to a combination of thermal fluctuations and a possible lack of complete experimental information. The LE4PD method provides the ability to test the capability of an input structural ensemble to produce experimentally determined dynamical measurements such as site-specific NMR relaxation.
The diffusive mode solution of the LE4PD organizes the configurational landscape, defining fluctuations on a set of well-defined length and timescales encompassing the relative motion between neighboring -carbons to the global rotations of the structure as a whole.Caballero-Manrique et al. (2007); Copperman and Guenza (2014) In the diffusive mode description the LE4PD identifies the regions of local flexibility and cooperative motion of the residues inside a protein. As an example we project the MD trajectory onto the diffusive modes of the HIV protease monomer, and obtain a free energy landscape barrier height distribution which scales with mode cooperativity. Using the scaling form for this barrier height distribution, which appears to be a general feature of protein dynamics, leads to accurate dynamical timescales in the simulation-free conformer-based LE4PD model.
Mode-based descriptions are extremely useful in computational approaches to protein dynamics.Amadei et al. (1993); Berendsen and Hayward (2000) Analysis of the free energy landscape in covariance modes have been used to describe the folding of small proteins.Maisuradze et al. (2009) The covariance matrix of the spatial functions of the nuclear spin interactions from MD simulation have been used to calculate NMR relaxation, as fit to the trajectory correlation times and experimental values.Prompers and Brüschweiler (2001) The characteristic difference between the LE4PD approach and these other approaches is that we study the modes of an appropriate equation of motion, and as such are associated directly with the timescale and pathway of a quasi-independent structural relaxation process. Other mode-based approaches are based upon studying the abstract covariance modes of a set of variables, and as such any time-dependence in these modes comes purely from a fit to the simulation trajectory.
The dynamical predictions of the LE4PD model starting from an ensemble of structures generated from experimentally determined NMR conformers are compared with a second ensemble of structures generated in the course of an MD simulation in the timescale of nanoseconds. To validate the accuracy of the theoretical predictions of the dynamics using the LE4PD approach we test its predictions against experimental data of NMR relaxation for seven different proteins and 1876 site-specific NMR relaxation measurements. Using either the MD generated or NMR solution structure ensembles we obtain quantitatively self-consistent predictions, with similar overall correlation of and . We find that, in general, the MD-generated ensembles provide through the LE4PD a closer agreement with experimental data than the LE4PD informed by NMR ensembles, with 17% lower relative error.
Ii Theoretical Approach: the Langevin Equation for Protein Dynamics
In the LE4PD equation the dynamics of the protein is described as a diffusive motion across the configurational landscape,Caballero-Manrique et al. (2007); Copperman and Guenza (2014); Perico et al. (1995) consistent with an optimized Rouse-Zimm theory of the dynamics of macromolecules in solution.Doi and Edwards (1986); Perico and Guenza (1985) Proteins are anisotropic in shape and have a hydrophobic core which is only partially exposed to solvent, with this effect depending on the position of each amino acid in the protein. The LE4PD includes both rotational anisotropy and the hydrophobic core, which are features characteristic of biological macromolecules but are uncommon in synthetic polymers in solution. Local energy barriers in the interior of the protein are important to properly define its dynamics and are explicitly taken into account in the LE4PD method.
The Langevin equation formalism is derived starting from the Liouville equation for the conservation of probability density in the phase space of the full atomistic system of the protein and solvent, and using projection operators to obtain an equation of motion for the chosen sites.Zwanzig (2001) Here the chosen coarse-grained sites are the -carbon of each amino acid in the protein primary sequence. To obtain a linear Langevin equation,Lamm and Szabo (1986) we take the coordinates tracing the backbone configuration of the protein to be complete of the relevant slow configurational degrees of freedom, and neglect system memory. Inertial terms may be discarded as a protein in aqueous solution is safely in the overdamped limit. The intramolecular distribution around the folded state is assumed to be Gaussian, and the parameters in the distribution are directly obtained from the starting configurational ensemble.Caballero-Manrique et al. (2007); Guenza (2008) The coarse-grained LE4PD represents the balance of viscous dissipation with the entropic restoring force and a random Brownian force due to the random collisions of the coarse-grained protein with the fast-moving projected atoms belonging to solvent, ions, and the protein. The time evolution of the coordinate of the coarse-grained site is well-described by the following equation
where is the Boltzmann constant, is the temperature, is the squared bond distance, and is the average monomer friction coefficient, defined as , with the friction of the monomer . is a delta-correlated random force due to projecting the system dynamics onto the coarse-grained sites, where fluctuation-dissipation requires where are cartesian indices. Eq. 1 is the well-known Rouse-Zimm equation for the dynamics of polymers in solution.Bixon and Zwanzig (1978); Doi and Edwards (1986)
To obtain an effective linear description we assume a well-folded state where site-site correlations are Gaussian in nature. The structural force matrix defines the effective mean-force potential, , which has been successfully adopted in theories of protein folding to describe the final state of the folding process.Portman et al. (1998) The matrix is calculated as
where is the matrix that defines the center of gyration and the connectivity between sites, . In a protein the -carbons are connected linearly, so that for the matrix is defined as and , with , while for the first row, and otherwise. The matrix is the bond correlation matrix with .
The matrix H is the hydrodynamic interaction matrix, which describes the interaction between protein sites occurring through the liquid, represented as a continuum medium. While it is standard to utilize hydrodynamical models to obtain the translational and rotational dynamics of proteins,García de la Torre et al. (2000) the contribution of hydrodynamical effects to protein internal motion is generally neglected. While this may be justified for very localized motion, in general the non-local hydrodynamic coupling alters the timescale and nature of the large-amplitude highly correlated internal motion and cannot be neglected.Granek (2011); Copperman and Guenza (2014) To maintain an effective linear description, the hydrodynamic interaction must be preaveraged. While the derivation of the hydrodynamic interaction utilizes the Oseen tensor following the general Rouse-Zimm treatment of polymer chains in dilute solution,Doi and Edwards (1986) other methods such as the Rotne-Prager interaction tensor reduce to the same form upon preaveraging over the equilibrium ensemble.Rotne and Prager (1969) The elements in the matrix of the hydrodynamic interaction are defined as
where is the average hydrodynamic radius which is defined below. This is a perturbative hydrodynamic interaction accounting for the nature of the amino acid primary structure as a heteropolymer made up of building blocks of different chemical types, propagating through the aqueous solvent but screened in the dense hydrophobic core. The site-specific friction parameters, , are obtained by calculating the solvent-exposed surface area, and calculating the total friction of the site via a simple extension of Stoke’s law as
Here and denote, respectively, the viscosity of water and the radius of a spherical bead of identical surface area as the solvent-exposed surface area of the residue, the hydrodynamic radiusCaballero-Manrique et al. (2007), while denotes the hydrodynamic radius related to the surface not exposed to the solvent. The internal viscosity is , which we approximated in our previous work to be related to the water viscosity rescaled by the local energy-barrier scale .Copperman and Guenza (2014); Sagnella et al. (2000) The largest possible value of that maintains a positive definite solution of the matrix diagonalization is adopted to avoid the well-known issue with the preaveraging of the hydrodynamic interaction in dense systems.Zwanzig et al. (1968) For example, in the application of the model to HIV protease, the calculated is very close to the adopted value of , which avoids negative eigenvalues.
Because we focus only on the bond orientational dynamics and not translation, in the interest of a simpler notation we separate out the zeroth order translational mode from the internal dynamics. Following the same notation introduced for the orientational dynamics of star polymers,Guenza and Perico (1992) we define as the matrix after suppressing the first row used to define the center of mass, and define . The orientational Langevin equation governing the bond dynamics is
with , and where , and is the random delta-correlated bond velocity.
Eq. 5 represents a set of first-order coupled differential equations, which are solved by finding the matrix of eigenvectors which diagonalizes the product of matrices . In these diffusive modes we have uncoupled linear equations where each mode is just a sum over the original bond vector basis . We define to be the eigenvalues of with , ordered from smallest to largest . Like the set of bond vectors the set of coordinates defines the instantaneous conformation of the macromolecule. While the and matrices are individually symmetric, the matrix is not necessarily symmetric, making the matrix only approximately diagonal in the eigenvector basis. The mean squared mode length is then with not exactly the eigenvalues of the bond correlation matrix alone, but defined by the sum . The diffusive mode basis spans the same space as the bond vector basis with near linearity: .
The first three global modes of the LE4PD describe the rotations of the folded structure as the rotational diffusion tensor. For proteins which have an arbitrary folded structure, the full rotational diffusion equation of an anisotropic 3-dimensional body must be solved. This alters the relaxation of the three global modes describing the rotational relaxation in the inertial lab frame.Favro (1960); Copperman and Guenza (2014)
To account for the affect of the local energy barriers on the internal dynamics, the friction becomes mode dependent by assuming thermal activation over the mode-dependent energy barrier
leading to the slowing of the mode timescale by . The energy barriers in the modes can be calculated directly from an MD simulation trajectory when available, and are found to be related to the mode cooperativity, as discussed in section IV.
This simple dynamical renormalization provides an average correction to the dynamics of the Langevin Equation, which approximately accounts for the local barrier crossing, and is in agreement with free energy landscape theories suggesting activated dynamics.Bryngelson et al. (1995); Onuchic et al. (1997) As a first approximation, the depth of the minimum free-energy well in the mode serves as the relevant barrier to transport.
ii.1 Local Dynamics
To provide a further assessment of the accuracy of the LE4PD for the local dynamics we compare its theoretical predictions with experimental data of NMR , , and NOE relaxation. The physical quantities of interest for this test are the bond autocorrelation function and the second order Legendre polynomial of the time dependent bond orientation.
For each bond along the backbone of the protein, the bond autocorrelation function is defined in the formalism of the Langevin equation as
with and the correlation time for the th mode.
Another quantity of interest is the second order Legendre polynomial of the time dependent bond orientation function which can be related to the first order bond autocorrelation by
which is a function of as .
This expression relies on assuming a Gaussian form for the joint probabilities in normal mode coordinates.Perico and Guenza (1985) For dipolar relaxation, the Fourier transform of defines the spectral density from which spin-lattice () and spin-spin () relaxation times, and nuclear Overhauser effect (NOE) can be calculated and compared with NMR measurements.Copperman and Guenza (2014)
Iii Structural ensembles of proteins
The LE4PD model predicts the dynamics of the protein using the structural ensemble as input. By generating a structural ensemble through relatively short time () MD simulations, the needed input for the LE4PD was evaluated leading to accurate predictions for the global and site-specific dynamics of UbiquitinCopperman and Guenza (2014) and the signal transduction protein CheY.Caballero-Manrique et al. (2007) The accuracy of the simulations, however, depends upon the accuracy of the force-field used, and sampling the full configurational space can become computationally expensive depending upon the size of the protein and the extent of configurational rearrangements.
iii.1 Building statistical ensembles from metastable configurations
We take as an ansatz that the configurational space of a folded protein is spanned by limited number of conformational states, and that these conformational states are known a priori. As an alternative procedure to performing MD simulations we assume as starting configurational ensembles the conformers that were measured experimentally by NMR. The extent to which NMR solution structure conformers represent important metastable states of the protein, as opposed to uncertainty due to incomplete experimental information, is controversial and varies between different NMR structures.Spronk et al. (2003) It is certainly clear that NMR structural ensembles do encode some measure of the conformational variability of the protein, as NMR structural ensembles have been shown to correlate highly with structural ensembles generated by MD simulation,Jamroz et al. (2014) and have been used to gain valuable insight into protein flexibility in computational studies of ligand binding.Damm and Carlson (2007) In our model we investigate the assumption that all conformers represent metastable protein configurations which contribute equally to the full ensemble, and use the resulting dynamical predictions to evaluate the ability of the input structural ensemble to span the experimentally observed dynamics.
Fluctuations around the local conformational states are imposed by applying a Gaussian Network Model (GNM).Bahar et al. (1997) While many elastic network models of varying complexity are in routine use, the differences in the predicted local flexibilities are usually small and affect only the short-time dynamics in the picosecond regime. The GNM builds a harmonic network of interactions around each residue based on a distance cutoff criteria, and solves the resulting site-site fluctuations as a linear matrix equation. GNM models have been shown to reproduce well crystalline state fluctuations measured as Debye-Waller Temperature factors (B-factors) and thus are a good representation of the short-time fluctuations while they require minimal computational effort. Once combined with the LE4PD the theory provides a realistic and computationally inexpensive prediction of the dynamics of proteins on a wide range of time scales, from the local fluctuations to the large, concerted, conformational transitions.
From the GNM we define the bond correlation matrix locally around the th conformer . The GNM defines the pairwise fluctuations where is the Kirchoff adjacency matrix defined using a cutoff radius of Bahar et al. (1997); Atilgan et al. (2001) and is the harmonic interaction strength. We found that in general a value of is needed to match the short-time orientational fluctuations of the protein from the MD simulation.
An interaction strength of is typically used with the GNM to predict crystallographic B-factors; this order of magnitude difference in interaction strength may be due to the local anharmonic softening of the orientational potential energy surface due to the aqueous solvent.Levy and Onuchic (2006) The boundary water layer of hydrated proteins in aqueous solution is highly mobile in the picosecond regimePal et al. (2002); the constant shifting of the protein-water hydrogen bonds may lead to enhanced orientational fluctuations which are completely local in nature. This effect is absent in the crystalline state, where the hydration water is much more static.
Recognizing that in the body-fixed reference frame we can determine the local bond correlation matrix around each conformer as
The total matrix is then simply the average with the number of conformers in the NMR structural ensemble. Similarly, the hydrodynamic matrix , the site friction coefficient , and all other input quantities to the LE4PD, are calculated separately for each conformer and then the statistical average is taken over all the conformers. This is an extremely simplistic picture of the structural ensemble of a protein; however the dynamics predicted by this structural ensemble generated by the set of NMR conformers is consistent in many ways with the much more detailed ensemble generated through the sophisticated process of explicit solvent MD simulation. The set of NMR conformers provides us an ensemble of important metastable structural minima in the free energy landscape; and the GNM provides fluctuations around these minima. Molecular anisotropy, rotational diffusion, hydrodynamic interactions, and local energy barriers are included through the LE4PD.
iii.2 Building statistical ensembles from Molecular Dynamics simulations
Because both the determination of NMR conformers and the experimental measurements of NMR relaxation are affected by errors, we performed as a further test MD simulations of the same systems to evaluate the quality of agreement of the LE4PD starting from the NMR and the MD conformers. Simulations were performed in explicit solvent using the spc/e water model. We utilized the AMBER99SB-ILDNLindorff-Larsen et al. (2010) atomic force field for proteins and the GROMACSBerendsen et al. (1995); Lindahl et al. (2001); Van Der Spoel et al. (2005); Hess et al. (2008) molecular dynamics engine was utilized on the TRESTLES supercomputer at San Diego.Towns et al. (2014) All system conditions, e.g. temperature and salt concentration, were set to reproduce the experimental conditions. The systems were solvated and energy minimized, and then underwent a tempering and equilibration routine including pressure coupling. The production simulations were performed in the canonical ensemble, using a velocity rescaling thermostat.Bussi et al. (2007)
For the PR95 protease monomer, simulations were performed starting from each of the twenty conformers in the NMR structure, resulting in a set of twenty production ensembles used as input to the LE4PD, with averages taken over all twenty results. The same set of production trajectories for Ubiquitin were used from our previous work.Copperman and Guenza (2014) For the remaining five proteins, the first conformer was chosen as the starting structure, and only one simulation was performed for each protein. Each simulation had of production. For each trajectory the root mean square deviation (RMSD) was calculated and statistics were only collected in the equilibrated sections of the trajectory. The trajectories were also required to contain only reversible transitions, as monitored by the RMSD. The simulation time effectively used in the LE4PD for each trajectory ranged from to . The simulations performed and the protein databank structures used are summarized in Table 1.
|Protein||MD Sim.||Starting Struct.||Temp.||NMR + GNM|
|Protease||20 x 50 ns||1Q9P (1-20)||293K||1Q9P (1-20)|
|1GF2R||160 ns||2M6T (1)||273K||2M6T (1-20)|
|N-TIMP-1||50 ns||1D2B (1)||293K||1D2B (1-30)|
|S836||50 ns||2JUA (1)||298K||2JUA (1-20)|
|CPB1||50 ns||1MX7 (1)||298K||1MX7 (1-22)|
|KAPP||50 ns||1MZK (1)||298K||1MZK (1-30)|
|Ubiquitin||10 x 10ns||1UBQ (1)||300K||1XQQ (1-128)|
The configurational ensembles that emerge from the NMR ensembles and from the MD simulations are reported in Figure 1. For all but the Ubiquitin protein, the starting configuration was from the NMR structure, yet the equilibrated simulation conformations do not exactly resemble this initial structure. Overall, however, the global fold is fully preserved, and the conformational differences are specific in nature. This indicates the NMR structures were not necessarily in an exact free energy minimum of the AMBER force-field model, though this does not indicate whether the MD equilibrated protein structures are necessarily more accurate. For all but the s836 protein, the structural variation in the NMR ensembles is slightly larger than the MD simulation. This is primarily true in the intrinsically disordered regions of the protein, such as the C-terminal and N-terminal tails. This may be because the limited simulation times do not fully sample the configurational space. A study over a test set of 140 proteins found high correlation between the fluctuations of NMR ensembles and MD simulations, and found that the increased sampling allowed by using a coarse-grained protein model led to even higher correlation between simulations and NMR ensembles.Jamroz et al. (2014) What does agree quite remarkably are the locations of enhanced flexibility and the timescales of the motion, which can be seen in Figures 5 and 6, showing the calculated NMR relaxation times from the ensembles.
To evaluate the consistency between the dynamics generated using the MD ensembles as input, and the NMR conformer ensembles, we compare the full decay of the correlation function of the th - segment in the HIV protease protein, with the data from simulations (see Figure 2). While there are many differences between the analytical predictions from the NMR ensemble and the MD ensemble, and differences especially at short times, overall Figure 2 shows that the agreement is quite good as the LE4PD, from both ensembles, can model quite accurately the site-specific internal and rotational dynamics of the protein.
Iv Dynamical barriers and cooperativity
Analysis of the collective fluctuations obtained from simulations of proteins Amadei et al. (1999) has shown that the dynamics around the minima of energy is well described by small fluctuations inside metastable states at low local energy and by the crossings between them. By reverting the LE4PD equation to its mode form, the structural representations of these important metastable minima can be identified as a function of mode number. We investigate the nature of the free energy surface of a protein around its folded ground state.
Each diffusive mode obtained from the diagonalization of Eq. 5 is a vector defined by the linear combination of the bond vectors weighted by the eigenvectors of the product of matrices , as . In polar coordinates the vector is represented as . The most relevant changes in the diffusive mode free energy occur as the angles, expressed in the spherical coordinates, span the configurational space. For any diffusive mode , the free energy surface is defined as a function of the spherical coordinate angles and as ,with the probability of finding the diffusive mode vector having the given value of the solid angle. Given that we are interested in the explicit representation of the structure at the minima of interest, all structures from the simulation ensemble which pertain to a particular orientation, which is a relatively deep minima in the mode free energy, are extracted and averaged.
By calculating the average structure at each minima we obtain the structural ensemble of metastable states spanning each internal mode of fluctuation for the protein. As a representative example, the free energy landscape in the LE4PD modes from the MD simulation of the HIV protease monomeric construct is presented in Figure 3. The ensemble of structural minima on the mode free energy surfaces generated from MD simulations, and the structural ensembles directly measured by NMR experiments, are compared as well. The full configurational landscape for each mode is generated from the combination of twenty well-equilibrated independent simulation trajectories. Each trajectory starts from a different experimental NMR conformer and runs to of simulation time. Superimposed to the full configurational landscape from simulations, the twenty starting configurations measured experimentally by NMR are reported as red stars. The combination of the trajectories creates a complex free energy landscape, which is only partially spanned by the NMR conformers. The starting NMR configurations are often close to energy minima (reported as green triangles), but they do not exactly correspond to them. Nor they are fully representative of all the minima that define the configurational landscape obtained from the simulation trajectories.
The fluctuations in each mode appear to be spanned by a handful of metastable minima. As the mode index increases the fluctuations progress from collective in nature to more local. The typical energy barrier in each mode, , is evaluated from the simulation as the Median Absolute DeviationRuppert (2010) from the global minimum , that is . The depth of these minima, or the barriers between them, are largest for the low mode numbers corresponding to the most collective, large-amplitude fluctuations. Figure 4 shows that the energy barriers as observed in the simulation trajectory can be well described as scaling with the mode index, a measure of the mode cooperativity, over a large range of the protein fluctuations. The observer scaling with mode number follows , where the first three rotational modes have been separated out. At a local enough length scale, where the specific chemical nature of the amino acid is most important, the energy barriers are no longer described by this expression.
The observed scaling law is consistent with the hierarchical nature of the protein free energy landscape. Each mode describes dynamics involving a number of bonds in the protein, which need to move collectively in a cooperative fashion. At short times the bonds fluctuate independently, while large-amplitude correlated fluctuations occur when all the bonds transition collectively.Jackson (1993) The equilibrium probability for the gating bonds to independently transition away from the ”correct” orientation with energy preference is . In a transition state perspective this can be interpreted as a free energy barrier which scales proportionally with the number of bonds cooperatively rearranging. This model is similar to the Adam-Gibbs theory of the glass transition,Adam and Gibbs (1965); Starr et al. (2013) relating the complex hierarchical nature of the free energy landscape the protein in solution to a structured glassy fluid.Onuchic et al. (1997); Wales (2015) The observed scaling form is included in the simulation-free LE4PD approach, which adopts the set of NMR conformers as the input structural ensemble.
V Predictions of NMR relaxation are compared to experiments
Theoretical predictions for are obtained from using Eq. 8, and are used to calculate and relaxation times, and NOE, which are measured experimentally. 15N NMR backbone relaxation experiments are very sensitive to the site-specific dynamics in the picosecond to the nanosecond regimes.Palmer III (2001) To test the LE4PD approach using the NMR solution structures to generate the structural ensemble, we constructed dynamical models for seven proteins for which NMR relaxation data and NMR solution structures were available. These proteins were N-TIMP-1 (1D2B)Gao et al. (2000), a de Novo -helix bundle protein s836 (2JUA)Go et al. (2008), Cellular retinol-binding protein I CPB1 (1MX7)Lu et al. (2003), Kinase-associated protein phosphatase KAPP (1MZK)Lee et al. (2003a, b), Insulin Growth Factor 2 Receptor IFG2R domain 11 (2M6T)Williams et al. (2007), Ubiquitin (1UBQ)Vijay-Kumar et al. (1987); Lienin et al. (1998), and HIV Protease monomer (1Q9P).Ishima et al. (2001, 2003)
The input parameters to the LE4PD equation change from protein to protein: the structural parameters such as bond length, monomer friction, hydrodynamic radius, and the pairwise bond correlations are determined from the structural ensemble, while the thermodynamic parameters such as solvent viscosity, and temperature, are defined by the experimental conditions. The viscosity was set to account for temperature dependence and content of deuterated water.Wong and Case (2008) Parameters such as the protein internal viscosity , the proportionality constant between cooperativity and energy barriers, and the characteristic parameters needed to calculate the NMR relaxation times, such as the chemical shift or , were assumed to be identical for all proteins in this study and identical to those used in our previous work.Copperman and Guenza (2014)
Figure 5 and 6 displays the calculations of , and NOE relaxation times as they are directly predicted by the LE4PD approach and the NMR experimental data. NMR experimental data of relaxation times are not used at all in any point to optimize the theoretical calculations, so these are independent theoretical predictions. The comparison between theory and experiments is performed for each amino acid in the protein and reported as a function of the protein primary sequence. Also reported are the experimental uncertainties for the NMR data of each protein.
|HIV Protease (MD)||.92||.73||.91||.91||32.9%|
|HIV Protease (NMR)||.83||.65||.90||.77||42.6%|
The correlation and errors of the model, using both the MD and NMR solution structures as ensembles, are shown in Table 2. Over this set of measurements the overall correlation to the experimental values was similar for both the dynamical models constructed from NMR conformer ensembles or the MD ensembles, but with 17% lower relative error for the MD derived ensembles than the NMR conformer ensembles. Figures 5 and 6, and Table 2, in general show that MD simulations have the most detailed agreement along the primary sequence. The correlation to NOE and are similar, but higher correlation for the MD ensembles. Over the seven proteins, the quality of the experimental measurements varies greatly; for example, in the measured relaxation for the s836 protein the experimental values themselves come with error, so that the low correlation of the theory with the experimental data is expected. For the CPB1 protein the experimental measurements in most loop and termini regions were unavailable; this is where the largest variability in the dynamics occurs and where it is possible to develop strong correlation. In general, the NOE measurements display the largest site-specific variability along the protein sequence and the highest correlation between theory and experiment for each individual protein. A scatter plot of the calculated and experimental data is shown in Figure 7. The agreement between theoretical predictions and measured NMR is supporting the quality of the predictions of the LE4PD approach.
Because the accuracy of a given NMR solution structure ensemble to represent the conformational diversity of the protein is unknown, the dynamical model built using the LE4PD approach may be useful to evaluate the quality of an available structural ensemble. We apply the method to 9 different NMR structural ensembles of the Ubiquitin protein, PDB codes 1XQQ,Lindorff-Larsen et al. (2005) 2KOX,Fenwick et al. (2011) 2LJ5,Montalvao et al. (2012) 2NR2,Richter et al. (2007) 1D3Z,Cornilescu et al. (1998) 1G6J,Babu et al. (2001) 2KLG,Madl et al. (2009) 2MJB,Maltsev et al. (2014) and 2K39.Lange et al. (2008) The comparison to the calculated NMR backbone relaxation in table 3 shows that all the ensembles capture the primary , , and baselines and the enhanced flexibility of the tail region. Ensembles 1XQQ and 2NR2 have the highest correlation and lowest relative error; when the unstructured C-tail is not considered in the calculation of the correlation coefficients, it can be seen that the 1XQQ, 2NR2, 2KOX, 2LJ5 and 2K39 ensemble separate as capturing the structural variability of both the C-tail and the more structured portion of the protein, see column 2 of table 3. The primary contribution to this correlation comes from the structural variability at the loop containing lysine 6 and 11, important poly-ubiquitination linkage sites involved in cell-cycle control and DNA repair.Komander (2009) The structural ensemble generated by molecular dynamics simulation starting from the 1UBQVijay-Kumar et al. (1987) crystal structure, with results reported in our previous paper,Copperman and Guenza (2014) is perhaps slightly more accurate overall, but only by a very small amount due to considering the correlation without contributions from the C-tail.
In generating the 1XQQ ensemble the NMR-derived order parameters from the model-free analysis of Lipari and SzaboLipari and Szabo (1982) were used as an additional set of restraints in the generation of the ensemble. It is not surprising then that this leads to an accurate dynamical model. We have shown previously that the site-specific variability in model-free derived order parameters correlates strongly with our results,Caballero-Manrique et al. (2007) despite differences in the nature of the predicted internal dynamics. This illustrates the complementary utility of the LE4PD approach, which provides a highly detailed model and additional insight beyond that available when performing only a model-free analysis of NMR backbone relaxation.
The Ubiquitin ensemble 2K39 was constructed to represent the protein fluctuations in only the long-time regime beyond the global correlation time. As such this ensemble is not as accurate overall, and the dynamical model leads to high error and in particular a poor representation of the C-tail dynamics. However, we do see a separation in the mode timescales, with a slow internal process emerging on the order of and with , suggesting that this ensemble has captured fluctuations in the difficult to access time regime between the global correlation time and the millisecond time regime of conformational exchange. The authors showed that this ensemble spanned the set of known bound Ubiquitin conformations, suggesting that there are configurational fluctuations of the Ubiquitin protein in the many nanosecond regime beyond the global correlation time which are relevant for the recognition of binding partners.Lange et al. (2008)
|NMR Conformer||(1-71)||(all res.)||Rel. Error|
The LE4PD approach was tested across a set of seven different proteins with overall consistent results for both the MD generated ensembles and the NMR conformer ensembles, with an overall correlation to the relaxation measurements of . Calculations using 9 different available NMR structural ensembles for the ubiquitin protein show that results are strongly dependent upon the quality of the input structural ensemble and experimental data, and suggest that this approach may be used as a tool to evaluate the quality of a structural ensemble to represent the important protein fluctuations around the ground folded state.
The consistent results between the MD generated ensembles and the NMR ensembles suggest that protein configurational space around the folded state can be defined by a small set of important metastable minima. However, when determining the dynamics of transitions between these minima, the hierarchical nature of the protein free energy landscape needs to be taken into account. The mode approach of the LE4PD allows one to conveniently separate contributions to the dynamics depending on the timescales involved. The LE4PD prediction of the existence of a barrier height distribution for the dynamics of folded proteins is consistent with the physics of glass-forming systems.
Building a dynamical model from NMR conformer structures using the LE4PD requires only a few seconds to a few minutes on a single processor with a standard desktop computer, with the computational time depending on the size of the protein and on the number of conformers in the NMR solution structure. While explicit solvent atomistic classical MD simulations are well-developed and can be quite accurate, achieving MD simulations with converged dynamics on the same timescale would require on the order of hours of processor time or more. The LE4PD is not a replacement for MD simulation as a computational method to predict the fluctuations and dynamics of proteins, but it is a useful tool to quickly provide a prediction of the dynamics given an input structural ensemble.
Even though the simulation-free LE4PD requires minimal computation it is site-specific, informed of intramolecular energy barriers, hydrodynamics, and long-range correlated motion. It is a sophisticated model of protein dynamics and because of its accuracy in predicting the dynamics, with no input from the dynamical data, LE4PD is a valuable and computationally convenient model to investigate barrier-crossing processes on the suite of timescales defining the fluctuations of proteins.
Acknowledgements.This work was supported by the National Science Foundation Grant CHE-1362500. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1053575.
- Pettitt (1990) B. M. Pettitt, Advances in Chemical Physics, Proteins: A Theoretical Perspective of Dynamics, Structure, and Thermodynamics (John Wiley & Sons, 1990).
- Frauenfelder et al. (1991) H. Frauenfelder, S. G. Sligar, and P. G. Wolynes, Sci. 254, 1598 (1991).
- Lewandowski et al. (2015) J. R. Lewandowski, M. E. Halse, M. Blackledge, and L. Emsley, Sci. 348, 578 (2015).
- Lipari and Szabo (1982) G. Lipari and A. Szabo, J. A. Chem. Soc. 104, 4546 (1982).
- Kneller and Hinsen (2004) G. Kneller and K. Hinsen, The Journal of chemical physics 121, 10278 (2004).
- Abergel and Bodenhausen (2005) D. Abergel and G. Bodenhausen, J. Chem. Phys. 123, 204901 (2005).
- Go et al. (1983) N. Go, T. Noguti, and T. Nishikawa, Proc. Natl. Acad. Sci. 80, 3696 (1983).
- Bahar et al. (1997) I. Bahar, A. R. Atilgan, and B. Erman, Folding Design 2, 173 (1997).
- Atilgan et al. (2001) A. Atilgan, S. Durell, R. Jernigan, M. Demirel, O. Keskin, and I. Bahar, Biophys. J. 80, 505 (2001).
- Yang et al. (2007) L. Yang, G. Song, and R. L. Jernigan, Biophys. J. 93, 920 (2007).
- Yang et al. (2009) L. Yang, G. Song, and R. L. Jernigan, Proc. Natl. Acad. Sci. 106, 12347 (2009).
- Bryngelson et al. (1995) J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G. Wolynes, Proteins: Struct., Funct., Bioinf. 21, 167 (1995).
- Onuchic et al. (1997) J. N. Onuchic, Z. Luthey-Schulten, and P. G. Wolynes, Ann. Rev. Phys. Chem. 48, 545 (1997).
- Dill and Chan (1997) K. A. Dill and H. S. Chan, Nat. Struct. Biol. 4, 10 (1997).
- Monod et al. (1965) J. Monod, J. Wyman, and J.-P. Changeux, J. Mol. Biol. 12, 88 (1965).
- Levinthal (1968) C. Levinthal, J. Chim. phys 65, 44 (1968).
- Lange et al. (2008) O. F. Lange, N.-A. Lakomek, C. Farès, G. F. Schröder, K. F. Walter, S. Becker, J. Meiler, H. Grubmuller, C. Griesinger, and B. L. De Groot, Sci. 320, 1471 (2008).
- Caballero-Manrique et al. (2007) E. Caballero-Manrique, J. K. Bray, W. A. Deutschman, F. W. Dahlquist, and M. G. Guenza, Biophys. J. 93, 4128 (2007).
- Copperman and Guenza (2014) J. Copperman and M. G. Guenza, J. Phys. Chem. B (2014).
- Amadei et al. (1993) A. Amadei, A. Linssen, and H. J. Berendsen, Proteins: Struct., Funct., Bioinf. 17, 412 (1993).
- Berendsen and Hayward (2000) H. J. Berendsen and S. Hayward, Curr. Op. Struct. Biol. 10, 165 (2000).
- Maisuradze et al. (2009) G. G. Maisuradze, A. Liwo, and H. A. Scheraga, J. Mol. Biol. 385, 312 (2009).
- Prompers and Brüschweiler (2001) J. J. Prompers and R. Brüschweiler, J. Am. Chem. Soc. 123, 7305 (2001).
- Perico et al. (1995) A. Perico, M. Guenza, M. Mormino, and R. Fioravanti, Biopolym. 35, 47 (1995).
- Doi and Edwards (1986) M. Doi and S. Edwards, The theory of polymer dynamics (1986).
- Perico and Guenza (1985) A. Perico and M. Guenza, J. Chem. Phys. 83, 3103 (1985).
- Zwanzig (2001) R. Zwanzig, Nonequilibrium Statistical Mechanics (Oxford University Press, 2001).
- Lamm and Szabo (1986) G. Lamm and A. Szabo, The Journal of chemical physics 85, 7334 (1986).
- Guenza (2008) M. G. Guenza, J. Phys.: Cond. Matt. 20, 033101 (2008).
- Bixon and Zwanzig (1978) M. Bixon and R. Zwanzig, J. Chem. Phys. 68, 1896 (1978).
- Portman et al. (1998) J. J. Portman, S. Takada, and P. G. Wolynes, Phys. Rev. Lett. 81, 5237 (1998).
- García de la Torre et al. (2000) J. García de la Torre, M. L. Huertas, and B. Carrasco, Biophys. J. 78, 719 (2000).
- Granek (2011) R. Granek, Phys. Rev. E 83, 020902 (2011).
- Rotne and Prager (1969) J. Rotne and S. Prager, J. Chem. Phys. 50, 4831 (1969).
- Sagnella et al. (2000) D. E. Sagnella, J. E. Straub, and D. Thirumalai, J. Chem. Phys. 113, 7702 (2000).
- Zwanzig et al. (1968) R. Zwanzig, J. Kiefer, and G. H. Weiss, Proc. Natl. Acad. Sci. 60, 381 (1968).
- Guenza and Perico (1992) M. Guenza and A. Perico, Macromol. 25, 5942 (1992).
- Favro (1960) L. D. Favro, Phys. Rev. 119, 53 (1960).
- Spronk et al. (2003) C. A. Spronk, S. B. Nabuurs, A. M. Bonvin, E. Krieger, G. W. Vuister, and G. Vriend, J. Biomol. NMR 25, 225 (2003).
- Jamroz et al. (2014) M. Jamroz, A. Kolinski, and S. Kmiecik, Bioinformatics p. btu184 (2014).
- Damm and Carlson (2007) K. L. Damm and H. A. Carlson, J. Am. Chem. Soc. 129, 8225 (2007).
- Levy and Onuchic (2006) Y. Levy and J. N. Onuchic, Annu. Rev. Biophys. Biomol. Struct. 35, 389 (2006).
- Pal et al. (2002) S. K. Pal, J. Peon, and A. H. Zewail, Proc. Natl. Acad. Sci. 99, 1763 (2002).
- Lindorff-Larsen et al. (2010) K. Lindorff-Larsen, S. Piana, K. Palmo, P. Maragakis, J. L. Klepeis, R. O. Dror, and D. E. Shaw, Proteins: Struct., Funct., Bioinf. 78, 1950 (2010).
- Berendsen et al. (1995) H. J. Berendsen, D. van der Spoel, and R. van Drunen, Comp. Phys. Comm. 91, 43 (1995).
- Lindahl et al. (2001) E. Lindahl, B. Hess, and D. Van Der Spoel, J. Mol. Model. 7, 306 (2001).
- Van Der Spoel et al. (2005) D. Van Der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. Berendsen, J. Comput. Chem. 26, 1701 (2005).
- Hess et al. (2008) B. Hess, C. Kutzner, D. Van Der Spoel, and E. Lindahl, J. Chem. Theor. Comput. 4, 435 (2008).
- Towns et al. (2014) J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, et al., Comput. Sci. & Eng. 16, 62 (2014).
- Bussi et al. (2007) G. Bussi, D. Donadio, and M. Parrinello, J. Chem. Phys. 126, 014101 (2007).
- Amadei et al. (1999) A. Amadei, B. De Groot, M.-A. Ceruso, M. Paci, A. Di Nola, and H. Berendsen, Proteins: Struct., Funct., Bioinf. 35, 283 (1999).
- Ruppert (2010) D. Ruppert, Statistics and data analysis for financial engineering (Springer, 2010).
- Jackson (1993) M. B. Jackson, J. Chem. Phys. 99, 7253 (1993).
- Adam and Gibbs (1965) G. Adam and J. H. Gibbs, J. Chem. Phys. 43, 139 (1965).
- Starr et al. (2013) F. W. Starr, J. F. Douglas, and S. Sastry, J. Chem. Phys. 138, 12A541 (2013).
- Wales (2015) D. Wales, The Journal of chemical physics 142, 130901 (2015).
- Palmer III (2001) A. G. Palmer III, Ann. Rev. Biophys. Biomol. Struct. 30, 129 (2001).
- Gao et al. (2000) G. Gao, V. Semenchenko, S. Arumugam, and S. R. Van Doren, J. Mol. Biol. 301, 537 (2000).
- Go et al. (2008) A. Go, S. Kim, J. Baum, and M. H. Hecht, Protein Sci. 17, 821 (2008).
- Lu et al. (2003) J. Lu, D. P. Cistola, and E. Li, J. Mol. Biol. 330, 799 (2003).
- Lee et al. (2003a) G.-i. Lee, Z. Ding, J. C. Walker, and S. R. Van Doren, Proc. Natl. Acad. Sci. 100, 11261 (2003a).
- Lee et al. (2003b) G.-i. Lee, J. Li, J. C. Walker, and S. R. Van Doren, J. Biomol. NMR 25, 253 (2003b).
- Williams et al. (2007) C. Williams, D. Rezgui, S. N. Prince, O. J. Zaccheo, E. J. Foulstone, B. E. Forbes, R. S. Norton, J. Crosby, A. B. Hassan, and M. P. Crump, Structure 15, 1065 (2007).
- Vijay-Kumar et al. (1987) S. Vijay-Kumar, C. E. Bugg, and W. J. Cook, J. Mol. Biol. 194, 531 (1987).
- Lienin et al. (1998) S. Lienin, T. Bremi, B. Brutscher, R. Brüschweiler, and R. Ernst, J. A. Chem. Soc. 120, 9870 (1998).
- Ishima et al. (2001) R. Ishima, R. Ghirlando, J. Tözsér, A. M. Gronenborn, D. A. Torchia, and J. M. Louis, J. Biol. Chem. 276, 49110 (2001).
- Ishima et al. (2003) R. Ishima, D. A. Torchia, S. M. Lynch, A. M. Gronenborn, and J. M. Louis, J. Biol. Chem. 278, 43311 (2003).
- Wong and Case (2008) V. Wong and D. A. Case, J. Phys. Chem. B 112, 6013 (2008).
- Lindorff-Larsen et al. (2005) K. Lindorff-Larsen, R. B. Best, M. A. DePristo, C. M. Dobson, and M. Vendruscolo, Nature 433, 128 (2005).
- Fenwick et al. (2011) R. B. Fenwick, S. Esteban-Martín, B. Richter, D. Lee, K. F. Walter, D. Milovanovic, S. Becker, N. A. Lakomek, C. Griesinger, and X. Salvatella, J. Am. Chem. Soc. 133, 10336 (2011).
- Montalvao et al. (2012) R. W. Montalvao, A. De Simone, and M. Vendruscolo, J. Biomol. NMR 53, 281 (2012).
- Richter et al. (2007) B. Richter, J. Gsponer, P. Várnai, X. Salvatella, and M. Vendruscolo, J. Biomol. NMR 37, 117 (2007).
- Cornilescu et al. (1998) G. Cornilescu, J. L. Marquardt, M. Ottiger, and A. Bax, J. Am. Chem. Soc. 120, 6836 (1998).
- Babu et al. (2001) C. R. Babu, P. F. Flynn, and A. J. Wand, J. Am. Chem. Soc. 123, 2691 (2001).
- Madl et al. (2009) T. Madl, W. Bermel, and K. Zangger, Angewandte Chemie Int. Ed. 48, 8259 (2009).
- Maltsev et al. (2014) A. S. Maltsev, A. Grishaev, J. Roche, M. Zasloff, and A. Bax, J. Am. Chem. Soc. 136, 3752 (2014).
- Komander (2009) D. Komander, Biochem. Soc. Trans. 37, 937 (2009).