# New method for deciphering free energy landscape of three-state proteins

## Abstract

We have developed a new simulation method to estimate the distance between the native state and the first transition state, and the distance between the intermediate state and the second transition state of a protein which mechanically unfolds via intermediates. Assuming that the end-to-end extension is a good reaction coordinate to describe the free energy landscape of proteins subjected to an external force, we define the midpoint extension between two transition states from either constant-force or constant loading rate pulling simulations. In the former case, is defined as a middle point between two plateaus in the time-dependent curve of , while, in the latter one, it is a middle point between two peaks in the force-extension curve. Having determined , one can compute times needed to cross two transition state barriers starting from the native state. With the help of the Bell and microscopic kinetic theory, force dependencies of these unfolding times can be used to locate the intermediate state and to extract unfolding barriers. We have applied our method to the titin domain I27 and the fourth domain of Dictyostelium discoideum filamin (DDFLN4), and obtained reasonable agreement with experiments, using the C-Go model.

## I Introduction

Despite numerous advances achieved within recent years (1), deciphering the free energy landscape (FEL) of biomolecules remains a challenge to molecular biology. The most detailed information on the FEL may be gained from all-atom simulations, but this approach, due to its high computational expenses, is restricted to rather short peptides and proteins (2); (3). In this situation, the AFM (4) or optical tweezers (5) have been proved a very useful tool for probing the FEL of proteins. In most experiments (6); (7), assuming that mechanical unfolding is well described by a two-state scenario, one can extract the distance between the native state (NS) and the transition state (TS), as well as the unfolding barrier . Latest theoretical studies (8); (9); (10) showed that simple coarse-grained models can provide reasonable estimates for those quantities.

In recent experimental works (11); (12), the FEL of three-state proteins has been studied. A schematic plot of the FEL for this class of proteins is shown in Fig. 1. Single molecule force measurements allow the distance between the native and the first transition state (TS1) and the distance between the intermediate state (IS) and the second transition state (TS2) to be evaluated. The unfolding barriers and can also be determined. As far as we know, there were no theoretical attempts to calculate these important parameters. Therefore, the goal of this work is to develop a new method for computing them from simulations. Using the Go model (13) and our new method, we calculated the free energy landscape parameters for the three-state titin domain I27 and the domain DDFLN4. Our results are in reasonable agreement with the experiments (11); (12).

## Ii Method

The new method is based on the fact that the end-to-end extension, , is a good reaction coordinate for describing mechanical unfolding. It should be noted that the direction of the force and therefore is changing upon partial unfolding of a protein. The time for a molecule to go from the NS to the IS is defined as a time needed to achieve the intermediate point between the TS1 and TS2 (Fig. 1). The time to cross the TS2 barrier starting from the IS, , is a time to reach the rod state starting from . We propose to determine from either constant-force or constant loading rate pulling simulations. Using the force dependencies of and , and the Bell formula (14), we can extract and . The unfolding barriers and can be estimated in the framework of the microscopic kinetic theory (15).

To illustrate our method, we used the Go model (13), which is an appropriate choice not only because the construction of FEL for long enough biomolecules by all-atom models is beyond present computational facilities, but also because unfolding of biomolecules is mainly governed by the native conformation topology (8). Moreover, it has been recently shown (9); (10) that the Go model provides reasonable estimates for and in the two-state approximation and is expected to be applicable to three-state molecules. Details of the Go model are given in our previous works (9); (10). We use the same set of parameters as for Go modeling of ubiquitin (16). The main computations were carried out at K , where is the Boltzmann constant, and is the hydrogen bond energy. The friction was chosen to be the same as for water, i.e. (17), where ps. Here the characteristic bond length between successive beads and the typical mass of amino acid residues g (17). For the water viscosity, one can neglect the inertia term and use the Euler method with the time step to solve the corresponding Langevin equation.

Since native titin is a highly heterogeneous polymer, in experiments, one used a poly-protein composed of identical tandem repeats of the Ig module (I27 (18); (19); (11)) to study elastic properties of a single domain at high solution. In the DDFLN4 case, a single DDFLN4 domain is sandwiched between Ig domains I27-30 and domains I31-34 from titin (20). Unfolding of DDFLN4 can be easily studied, as its mechanical stability is much lower than that of Ig-domains (DDFLN4 would have lower peaks in the force-extension curve). In AFM experiments, one end of a poly-protein is anchored to a surface and the another end to a tip of a very soft cantilever. The molecule is stretched by increasing the distance between the surface and the cantilever as the external force acts on one of termini via the tip. Because a poly-protein mechanically unfolds domain by domain, one can consider that one end of a single domain is anchored and the force is applied to the another one. Therefore, as in all-atom steered molecular dynamics simulations (21), when the force is ramped linearly with time, we fix the N-terminal and pull the C-terminal by applying a force . Here is the displacement of the pulled atom from its original position (21), and the spring constant is set to be the same as in the harmonic term of the potential energy for the studied system (13); (16). The pulling direction was chosen along the vector directed from the fixed atom to the pulled one. The pulling speed was set equal to nm/s, which is about 3 - 4 orders of magnitude faster than those used in experiments. In the constant force simulations, we add the term to the total energy of the system, where is the end-to-end distance, and the force applied to the both termini.

We define the unfolding time as an average of the first passage times needed to reach the extension . Different trajectories start from the same native conformation, but with different random number seeds. The unfolding time is an average of the first passage times needed for the molecule to achieve the rod state starting from the extension . In order to get a reasonable estimate for and , we have generated 30 - 50 trajectories for each value of .

## Iii Results

The crucial point in our method is how to determine .
We will show that both simulation ways specified above are valid
for this purpose.

Determination of for I27

Fig. 2a presents the force-extension curve obtained at the speed nm/s. In accordance with the experiments (19) and all-atom simulations (21), we observe two peaks, which ascertain that unfolding proceeds via intermediates (if the protein unfolded without intermediates, a single peak would be observed). The first peak, located at Å, occurs due to a detachment of strand A (Fig. 2a) from the protein core. One can show that the appearance of the second peak at Å is related to full unfolding of strands A’, F, and G. Assuming that is a middle point between two local peaks, we have Å.

We now intend to show that can also be determined from
constant-force simulations.
As is evident from Fig. 2(b),
two
plateaus occur at Å
and Å in the time
dependence of .
Within error bars, locations of plateaus coincide with those for
peaks shown in Fig. 2(a).
Again, the occurrence
of two plateaus indicates that this domain mechanically unfolds in
a three-state manner.
Since the plateaus are related to crossing the unfolding barriers
and ,
the middle point between the
TS1 and TS2 can be identified
as a middle point between two plateaus, i.e.
. For I27,
Å, which is close to the result,
obtained by constant-velocity pulling simulations.

Determination of for DDFLN4

Fig. 3b shows the force-extension profile for the same pulling speed as in the I27 case. Two peaks appear at and Å. Using these values, we obtain Å. The existence of intermediates is also evident from the constant force simulations which give two plateaus at Å and Å (Fig. 3b). Therefore, which is not far from the value followed from the force-extension curve. Since the peaks are not sharp and the plateau position fluctuates, one can consider that two simulation modes gave the same result. We will use the averaged value for computing and for DDFLN4.

In accordance with the experiments (20); (12), the Go model captures the overall behavior of DDFLN4 that it mechanically unfolds via intermediates. However, the location and structure of Go intermediates are different from the experimental ones. In the experimental force-extension curve (20), two peaks occurs at and which are very different from our results (Fig. 3a). Using loop mutations (20), it was suggested that during the first unfolding event (first peak) strands A and B detach from the domain and unfold. Therefore, strands C - G form a stable intermediate structure, which then unfolds in the second unfolding event (second peak). In order to sort out intermediates in the Go model, we plot fractions of native contacts formed by each strand with the rest of the protein as a function of (Fig. 4a). Here, we present the results obtained for the case when the N-terminal is kept fixed and the force is applied to the C-terminal with the same loading rate as in Fig. 3a. At the position of the first peak in the force-extension curve (), strand G fully unfolded, while strands A and B are still structured (Fig. 4a). Thus, contrary to the experiments (20); (12), Go intermediate conformations consist of six strands A-F. A typical snapshot of intermediates is shown in Fig. 4b, where all contacts between G and F are broken, but a single contact between A and F remains intact. At the second peak position () denoted by an arrow in Fig. 4a, together with G, strand F becomes unstructured and most of contacts of strand C are lost. The number of contacts of strands A, B, D and E drops drastically as the protein unfolds quickly after this second unfolding event. As in the experiments (20); (12), at , strands A and B are detached from the core, but in our Go model, strands F and G have already unfolded.

From Fig. 4a, we obtain the following unfolding pathway for DDFLN4:

(1) |

In addition to this dominant pathway, other routes to the rode state are also possible (Fig. 4c). Mechanical unfolding pathways may be different, but they share a common feature that strand G always unfolds first. This also contradicts the experimental suggestion (20) that unfolding initiates from the N-terminal. It is not entirely clear, why the Go model gives different unfolding pathways and intermediates compared to the experiments. Presumably, the discrepancy comes from the simplification of Go modeling, where the non-native interaction and the effect of environment are omitted. All-atom simulations are required to clarify this issue.

Calculation of free energy landscape parameters

Once is found, one can compute the times and as the functions of the external force and extract and , using the Bell equation (14):

(2) |

We have tried several values of in the interval Å, and for I27 and DDFLN4, respectively. Since the results remain essentially the same, we will present those obtained for Å, and for I27, and DDFLN4, respectively. Fig. 5 shows the force dependencies not only for and , but also for the full unfolding time . Strictly speaking, the formula is valid if the probability of missed unfolding events is negligible. This happens when the applied force exceeds several pN, but not in the limit (22). Since our computations were carried out at of tens pN (Fig. 5), the mentioned above equality is applicable to extract .

For I27, is about 2-3 times larger than . It is also evident from Fig. 2(b), which demonstrates that the second plateau exists during shorter time intervals than the first one. A similar situation happens for DDFLN4, but, for high forces, becomes eventually larger than (Fig. 5(b)).

In the low-force regime, fitting the data by Bell Eq. (2) (straight lines in Fig. 5), we obtain , and quoted in Table. For both I27 and DDFLN4 our estimate of agrees very well with the experiments, while that for is a bit higher than the experimental data. Given the simplicity of the Go model, the agreement between the theory and the experiment should be considered reasonable, but it would be interesting to check if a more comprehensive account for non-native contacts and environment could improve our results.

It is to be noted here that although the Go model gives a different location of intermediates in the DDFLN4 force-extension curve in comparison with the experiments, it still provides reasonable estimates for and . This is because the nature of and is different from that of the end-to-end distance: the former are a measure of the force dependence of barrier crossing rates, while the later is a real distance. Nevertheless, one has to be careful in comparison of Go results with experiments on DDFLN4.

In the Bell approximation, one assumes that the location of the transition state does not move under the action of an external force. However, our simulations for ubiquitin, e.g., showed that it does move toward the NS (16). Recently, assuming that depends on the external force and using the Kramers theory, several groups (22); (15) have tried to go beyond the Bell approximation. We follow Dudko et al. who proposed the following force dependence for the unfolding time (15):

(3) |

Here, is the unfolding barrier, and and 2/3 for the cusp (23) and the linear-cubic free energy surface (24), respectively. Note that corresponds to the phenomenological Bell theory (Eq. (2)). An important consequence following from Eq. (3), is that one can apply it to estimate not only , but also , if . Since the fitting with is valid in a wider interval as compared to the case, we consider the former case only. The region, where the fit works well, is expectedly wider than that for the Bell scenario (Fig. (5)). However, for DDFLN4 this fit can not cover the entire force interval.

In the I27 case, from the nonlinear fitting (Eq. (3) and Fig. 5(a)), we obtain Å, and Å, which are larger than the Bell theory-based estimations (see Table). Using raw experimental data (18) and fitting with , in two-state approximation, Dudko et al. (15) obtained , which is close to our result. For DDFLN4, the nonlinear fit (Fig. 5(b)) gives Å, and which are about twice as large as the Bell theory-based estimates (Table). Such high values of are typical for -proteins and they may come from the low resistance of DDFLN4 because the less stable protein, the larger is (9). Recently, Dietz and Rief (25) have shown that the product pN nm is probably universal for all proteins. Using the experimental result pN (20); (22) and , we obtain pN nm which is not far from this universal value. From this point of view, big values of and are still acceptable, but additional experimental data are required to settle this problem.

Theoretical values for , and , followed from Fig. 5, are listed in Table. To estimate the unfolding barriers from the available experimental data for I27 (11) and DDFLN4 (12); (22), we used the following formula:

(4) |

where denotes the unfolding time in the absence of force, and is a typical unfolding prefactor. Since is not known for unfolding, we used the value typical of folding, s (26); (27); (28).

For I27, we used s and total unfolding time s (11). is extracted as . Using these unfolding times and Eq. (4), , and were calculated (Table). The best agreement between theory and experiment is obtained for . Interestingly, using the similar fitting procedure and raw experimental data (18), in two-state approximation, Dudko et al. (15) obtained , which is close to our result.

In the DDFLN4 case, we used s and s of (12) to estimate and . Our result for agrees well with the experimental data, but the theoretical value for turned out higher than the experimental one. This disagreement may be due to the limitation of the Go model. An another possible reason is that the experimental estimations were obtained using the same prefactor s for all cases and this might be invalid.

To conclude, we have proposed a new simulation approach to delineate the FEL of multi-state proteins. Our method is simple to use, and it does not require any extra CPU cost, because the unfolding times , and are computed in a single run for every trajectory. Using this method and the simple Go model, we obtained , and , which are in reasonable agreement with the experimental data for I27 (11) and DDFLN4 (12); (22). There is a discrepancy between theoretical and experimental estimations for some unfolding barriers. Therefore, it would be useful to go beyond the Go model to see if one could obtain better agreement. Our method is universal and may be applied to other multi-state biomolecules. The work in this direction is in progress. One can also extend our approach to the case of folding under quenched force for computing folding barriers.

We thank M. Kouza for helpful discussions. MSL highly appreciates the correspondence with M. Rief on AFM experiments. AMG and AIV are grateful to Kasa im J. Mianowskiego, Polski Koncern Naftowy ORLEN and Fundacja Zygmunta Zalieskiego for sponsoring their visits to Warsaw. MSL was supported by the Ministry of Science and Informatics in Poland (grant No 202-204-234).

(Å) | (Å) | ||||
---|---|---|---|---|---|

I27 | Theory | 3.2 0.1 | 3.0 0.1 | 16.9 | 17.0 |

Exp. (11) | 2.2 | 3.0 | 20.9 | 24.2 | |

DDFLN4 | Theory | 6.1 0.2 | 5.1 0.2 | 25.8 | 18.7 |

Exp. (12); (22) | 4.0 | 5.3 0.4 | 17.4 | 17.2 |

Table. Parameters , and were obtained in the Bell approximation. Theoretical values of the unfolding barriers were extracted from the microscopic theory of Dudko et al (15) (Eq. (3)) with , while their experimental estimates were obtained using Eq. (4) and s.

Figure Captions

FIGURE 1. Schematic plot of the free energy landscape for a three-state protein as a function of the end-to-end distance. and refer to the distance between the NS and the first transition state (TS1) and the distance between the intermediate state (IS) and the second transition state (TS2). The unfolding barrier and . A midpoint between TS1 and TS2 can be determined from simulations.

FIGURE 2. (a) The force-extension curve at the loading rate nm/s for I27. The results are averaged over 100 trajectories. The arrows indicate the position of Å and Å. The vertical solid line refers to Å. The inset shows the native state structure of I27 (PDB ID: 1TIT), which contains seven -strands labeled as A to G. (b) The time dependence of the end-to-end extension for 10 representative trajectories. Dashed lines refer to the first and the second plateau, which are located at Å and Å, respectively. The solid straight line corresponds to Å. K and pN.

FIGURE 3. (a) The same as in Fig. 2a but for DDFLN4. The results are averaged over 100 trajectories. The arrows indicate the position of Å and Å. The dashed line refers to Å. The inset shows the native state structure of DDFLN4 taken from PDB (PDB ID: 1KSR). There are seven -strands: A (6-9), B (22-28), C (43-48), D (57-59), E (64-69), F (75-83), and G (94-97). In the native state, there are 15, 39, 23, 10, 27, 49, and 20 native contacts formed by strands A, B, C, D, E, F and G with the rest of the protein, respectively. The end-to-end distance in the native state Å. (b) The same as in Fig. 2b but for DDFLN4. Dashed lines refer to the first and the second plateau, which are located at Åand Å, respectively. The solid straight line corresponds to Å. K and pN.

FIGURE 4. (a) The dependence of fractions of native contacts on the end-to-end extension for DDFLN4. The results were obtained from the constant loading rate pulling simulations with nm/s as in Fig. 3a. Arrows refer to positions of the peaks in the force-extension curve in Fig. 3a. (b) A typical snapshot at Å. A single contact between strands A and F is not broken (solid line), while all 11 native contacts between strands F and G are already broken (dashed lines). Note that for the cutoff , there is only one contact between A and F in the native state. (c) Shown are probabilities of unfolding pathways for seven -strands. The values of are written on top of the histograms. Results were averaged over 100 trajectories.

FIGURE 5. (a) The force dependencies of unfolding times for (circle), (squares) and (diamonds) for I27 at K. The straight black lines refer to linear fits in the Bell approximation for and . The red curves correspond to the fitting by Eq. (3) with . The unfolding barriers, followed from this non-linear fit, are listed in Table. (b) The same as in (a) but for DDFLN4. The fit curves go up at high forces, where the Eq. (3) is no longer valid (15).

### References

- J. N. Onuchic and P. G. Wolynes, Curr. Opin. Struct. Biol. 14, 70 (2004)
- S. Gnanakaran, et al., Curr. Opin. Struct. Biol. 13, 168 (2003)
- P. H. Nguyen, G. Stock, E. Mittag, C. K. Hu, and M. S. Li, Proteins: Structures, Functions, and Bioinformatics 61, 795 (2005).
- M. Rief, et al., Science 276, 1109 (1997)
- M. S. Z. Kellermayer, S. B. Smith, H. L. Granzier, and C. Bustamante, Science 276, 1112 (1997).
- D. J. Brockwell et al., Biophys. J 83, 458 (2002).
- M. Carrion-Vazquez et al., Prog. Biophys. Mol. Biol. 74, 63 (2000).
- D. K. West, D. J. Brockwell, P. D. Olmsted et al., Biophys. J. 90, 287 (2006).
- M. S. Li, Biophys. J. 93, 2644 (2007)
- M. Kouza, C. K. Hu, and M. S. Li, J. Chem. Phys. 128,045103 (2008).
- P. M. Williams et al., Nature 422, 446 (2003).
- I. Schwaiger, M. Schleicher, A.A. Noegel, and M. Rief, EMBO reports 6, 46 (2005).
- C. Clementi and H. Nymeyer and J. N. Onuchic, J. Mol. Biol. 298, 937 (2000).
- G. I. Bell, Science 100, 618 (1978).
- O. K. Dudko, G. Hummer, and A. Szabo, Phys. Rev. Lett. 96, 108101 (2006).
- M. S. Li, M. Kouza, and C. K. Hu, Biophys. J. 92, 547 (2007).
- T. Veitshans, D. K. Klimov, and D. Thirumalai, Folding and Design 2, 1 (1997).
- M. Carrion-Vasquez et al., Proc. Natl. Acad. Sci. USA 96, 3694 (1999).
- P. E. Marszalek et al., Nature 402, 100 (1999).
- I. Schwaiger, A. Kardinal, M. Schleicher, A. A. Noegel, and M. Rief, Nat. Struct. Mol. Biol. 11, 81 (2004).
- H. Lu and B. Isralewitz and A. Krammer and V. Vogel and K. Schulten, Biophys. J 75, 662 (1998).
- M. Schlierf, and M. Rief, Biophys. J. 90, L33 (2006).
- G. Hummer and A. Szabo, Biophys. J. 85, 5 (2003).
- O.K. Dudko, A.E Filippov, J. Klafter and M. Urbakh, Proc. Natl. Acad. Sci. USA 100, 11378 (2003).
- H. Dietz, and M. Rief, Phys. Rev. Lett. 100, 098101 (2008).
- M. S. Li, D. K. Klimov, and D. Thirumalai, Polymer 45, 573 (2004); M. Kouza, M. S. Li, C. K. Hu, E. P. O’Brien Jr., and D. Thirumalai, J. Phys. Chem. A 110, 671 (2006).
- B. Schuler, E. A. Lipman, and W. A. Eaton, Nature 419, 743 (2002).
- M. Schlierf, and M. Rief, J. Mol. Biol. 354, 497 (2005).