Universal effects of solvent species on the stabilized structure of a protein
We investigate the effects of solvent specificities on the stability of the native structure (NS) of a protein on the basis of our free-energy function (FEF). We use CBP-bromodomain (CBP-BD) and apoplastocyanin (apoPC) as representatives of the protein universe and water, methanol, ethanol, and cyclohexane as solvents. The NSs of CBP-BD and apoPC consist of 66% α-helices and of 35% β-sheets and 4% α-helices, respectively. In order to assess the structural stability of a given protein immersed in each solvent, we contrast the FEF of its NS against that of a number of artificially created, misfolded decoys possessing the same amino-acid sequence but significantly different topology and α-helix and β-sheet contents. In the FEF, we compute the solvation entropy using the morphometric approach combined with the integral equation theories, and the change in electrostatic (ES) energy upon folding is obtained by an explicit atomistic but simplified calculation. The ES energy change is represented by the breaking of protein-solvent hydrogen bonds (HBs), the formation of protein intramolecular HBs, and the recovery of solvent-solvent HBs. Protein-solvent and solvent-solvent HBs are absent in cyclohexane. We are thus able to separately evaluate the contributions to the structural stability from the entropic and energetic components. We find that for both CBP-BD and apoPC, the energetic component dominates in methanol, ethanol, and cyclohexane, with the most stable structures in these solvents sharing the same characteristics described as an association of α-helices. In particular, those in the two alcohols are identical. In water, the entropic component is as strong as or even stronger than the energetic one, with a large gain of translational, configurational entropy of water becoming crucially important, so that the relative contents of α-helix and β-sheet and the content of total secondary structures are carefully selected to achieve sufficiently close packing of side chains.
If the energetic component is excluded for a protein in water, priority is given to the closest side-chain packing, giving rise to the formation of a structure with very low α-helix and β-sheet contents. Our analysis, which requires minimal computational effort, can be applied to any protein immersed in any solvent and provides robust predictions that are quite consistent with the experimental observations for proteins in different solvent environments, thus paving the way toward a more detailed understanding of the folding process.
Protein folding is one of the most fundamental examples of biological self-assembly processes, and unveiling its mechanism is a crucially important task for understanding life phenomena. Shortly after Anfinsen [Anfinsen73] established that the primary sequence encodes all the information necessary to obtain the three-dimensional native fold, it also became clear that protein folding could be achieved only in an aqueous environment. The hydrophobic effect has long been discussed as an essential factor driving a protein to fold [Ball08]. According to the conventional view [Kauzmann59; Tanford62], when a nonpolar group, which cannot participate in the hydrogen bonding of water, comes in contact with water, structuring of water occurs near the nonpolar group so as to retain as many hydrogen bonds (HBs) as possible, causing entropic instability. The amount of such unstable water is reduced upon the burial of nonpolar groups within the protein interior. In this view, it is the only origin of the hydrophobic effect. We have proposed a substantially different view [Kinoshita08; Yoshidome12; Kinoshita13; Oshima15]: protein folding in aqueous solution under physiological conditions leads to a gain of the translational, configurational entropy of water in the entire system, and this is the hydrophobic effect. The reduction of water crowding (i.e., entropic correlation among water molecules coexisting with the protein) is the principal contributor to the water-entropy gain [Yoshidome12; Kinoshita13; Oshima15]. The folding is also accompanied by a decrease in the protein intramolecular interaction energy, an increase in the protein-water interaction energy, a decrease in the water-water interaction energy due to the structural reorganization of water released to the bulk, and a loss of the protein conformational entropy. Each of the three interaction-energy changes consists of electrostatic (ES) and van der Waals (vdW) components.
It is clear that the water-entropy gain and the decreases in the protein intramolecular and water-water interaction energies promote the folding, whereas the increase in the protein-water interaction energy and the loss of the protein conformational entropy oppose it. (More detailed information is presented in Table 1.) However, the task of accounting for each of these physical factors in a theoretical model is daunting and has not been accomplished yet.
A clue to the protein folding mechanism and the relative magnitudes of the physical factors mentioned above is provided by knowing how the stability of the native structure (NS) of a protein is affected when water is replaced by a different, less polar solvent. A previous study [Pace04] suggested that the NS would become unstable in most less polar solvents such as ethanol and methanol, very stable in nonpolar solvents such as cyclohexane, and even more stable in vacuum. Experimental observations [Hirota98] indicate that alcohol induces a protein to form α-helices and that the resultant helical structure is independent of the alcohol species. The aim of this study is to elucidate the effects of solvent specificities on the structural stability of a protein and to deepen the understanding of the folding process. In the preceding work [Hayashi17], we studied the structures stabilized in water and in a hard-sphere solvent whose particle diameter and packing fraction were set at those of water. We chose a particular protein, protein G, consisting of 27% α-helices and 39% β-sheets. The NS and a number of artificially created, misfolded decoys were examined, and the structure giving the lowest value of our free-energy function (FEF) [11-13] was identified in water or the hard-sphere solvent. In this study, on the other hand, we resort to two alternative proteins that differ widely from each other and from protein G: one including only α-helices and the other almost only β-sheets (strictly, it consists of 35% β-sheets and 4% α-helices). Furthermore, we consider a total of four different solvents matching those exploited in the aforementioned experiments: water, methanol, ethanol, and cyclohexane, thus changing the polarity (see Table 2), molecular size, and packing fraction. The proteins in vacuum are also considered. As in our preceding work [11], the NS and a number of decoys are examined for each protein and the same FEF is applied to the identification of the most stable fold in each solvent.
In Table 1, when a different solvent is considered, "water" should be replaced by "solvent". We use a simplified and yet realistic representation of all of the entropic and energetic contributions to the thermodynamics of protein folding, in which the FEF fully accounts for the solvent-entropy gain, neglects the loss of the protein conformational entropy because only compact structures are considered, assumes that the vdW components of the three interaction-energy changes cancel out, and represents their ES components in terms of the change in the sum of protein-solvent, solvent-solvent, and protein-protein hydrogen bonding energies. The relative magnitudes of the physical factors are dependent on the solvent species. In cyclohexane, modeled as a completely nonpolar solvent, the ES components of the protein-solvent and solvent-solvent interaction energies vanish. In vacuum, all of the solvent-related terms vanish. The solvation entropy, which is denoted simply by $S$ hereafter, is dependent on the solvent species: It becomes smaller as the molecular diameter increases and/or the packing fraction decreases. $S$ is also influenced by the solvent-solvent interaction potential. The effect of the energetic component relative to that of the entropic component becomes larger as the molecular polarity of the solvent decreases. $S$ is computed using the morphometric approach (MA) [Yoshidome12; Oshima15; Koning04; Roth06] combined with the integral equation theories [Kinoshita08; Hansen06; Kusalik88a; Kusalik88b; Kinoshita96; Cann97]. The change in the sum of hydrogen bonding energies upon folding is obtained by an explicit atomistic but simplified calculation. For each model solvent, we estimate the parameters used in calculating the FEF by referring to the experimental data. As a great advantage, the evaluation of the FEF is accomplished in 1 s per protein structure on a standard workstation, so that a very large number of different structures can be examined efficiently [Hayashi17; Yoshidome09; Yasuda11]. The results common to both the α-helix-rich and β-sheet-rich proteins can briefly be summarized as follows. The NS is identified as the most stable structure in water, demonstrating the reliability of our FEF.
The structures stabilized in methanol, ethanol, cyclohexane, and vacuum are characterized by high α-helix contents (associated α-helices). Those in methanol and ethanol are identical. The structure stabilized in cyclohexane is also quite similar. In vacuum, the structure possessing the maximum number of protein intramolecular hydrogen bonds (IHBs) is the most stable. It is an association of α-helices as structural units, but the number of the units is smaller than that in methanol, ethanol, and cyclohexane. It is worthwhile to add that the NS of the α-helix-rich protein is somewhat different from the structures stabilized in the three other solvents and vacuum. These results suggest that only in water can a variety of structures with different α-helix and β-sheet contents be stabilized, depending on the amino-acid sequence. The physical reasons for the effects of solvent species are discussed in detail. It is also argued that the results obtained are consistent with the aforementioned experimental observations for proteins in different solvent environments. This study shows that our FEF can be applied to any protein immersed in any solvent, explains the stabilization effects of different solvents on the folding mechanism within a single theoretical framework, and highlights the unique role played by water.
II Model and Theory
II.1 Entropic excluded-volume effect
The solvent plays a significant role in protein folding through the entropic excluded-volume (EV) effect [Yoshidome12; Kinoshita13; Oshima15], as pictorially illustrated in Fig. 1. The EV is defined as the volume of the space inaccessible to the centers of solvent molecules. The formation of secondary structures (α-helix and β-sheet) by the backbone and the packing of side chains lead to the reduction of the total EV. This reduction results in a gain of the solvent entropy [Kinoshita13; Oshima15; Harano05; Yasuda10; Yasuda12]. The close packing of side chains with a variety of geometric features is especially important from the entropic point of view. We note that there are both translational and rotational contributions to the solvent-entropy gain, but the former is much larger than the latter even when the solvent is water [Kinoshita08; Yoshidome08]. As detailed below, we exploit a molecular model for the solvent because the EV effect cannot be described by the dielectric continuum model. The solvent molecules are entropically correlated because the presence of each solvent molecule generates an EV for the other solvent molecules. We refer to this solvent-solvent entropic correlation as "solvent crowding". Upon protein folding, the solvent crowding is significantly reduced, and this protein-solvent many-body correlation makes an essential contribution to the solvent-entropy gain. It also plays a pivotal role in cold and pressure denaturation of a protein [Yoshidome12; Kinoshita13; Oshima15]. The many-body correlation is not taken into account in the Asakura-Oosawa (AO) theory [Asakura54; Asakura58]. See Appendix A for more information. In the conventional view of the hydrophobic effect [Kauzmann59; Tanford62], when water is replaced by a nonpolar solvent, the contact between the solvent and a nonpolar group is no longer unstable, leading to the absence of a solvophobic effect.
In our view, by contrast, the solvophobic effect does exist due to the aforementioned solvent crowding.
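As a rough geometric illustration of the EV reduction described above, the following sketch computes the excluded volume of a single hard-sphere solute and the overlap of two excluded volumes when two identical solutes come into contact. It is only a pairwise, Asakura-Oosawa-level picture and does not capture the many-body solvent crowding emphasized in the text; the function names and the example diameters are hypothetical.

```python
import math

def excluded_volume(d_solute, d_solvent):
    """Volume inaccessible to solvent centers around one spherical solute.

    Solvent centers cannot come closer than (d_solute + d_solvent) / 2
    to the solute center, so the EV is a sphere of that radius.
    """
    radius = 0.5 * (d_solute + d_solvent)
    return 4.0 * math.pi * radius ** 3 / 3.0

def ev_overlap_at_contact(d_solute, d_solvent):
    """Overlap of the two excluded volumes when two identical solutes touch.

    Uses the standard lens volume of two equal spheres of radius R whose
    centers are a distance D apart: pi * (4R + D) * (2R - D)**2 / 12.
    At contact D equals the solute diameter.
    """
    radius = 0.5 * (d_solute + d_solvent)
    distance = d_solute
    return math.pi * (4.0 * radius + distance) * (2.0 * radius - distance) ** 2 / 12.0

# Hypothetical example: 1.0 nm solutes in a solvent of 0.28 nm diameter.
total_ev_reduction = ev_overlap_at_contact(1.0, 0.28)
```

The overlap is the amount by which the total EV shrinks on contact; a larger shrinkage leaves more room for the solvent and hence a larger solvent-entropy gain in this simplified picture.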
II.2 Free-energy function
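For a protein immersed in a solvent, the FEF ($F$) is defined as [11-13]
$$\frac{F}{k_{\rm B}T} = \frac{\Lambda}{k_{\rm B}T} - \frac{S}{k_{\rm B}} \qquad (1)$$
where $T$ is the absolute temperature and $k_{\rm B}$ is the Boltzmann constant ($T$ is set at $T_0 = 298$ K). $F$ and its energetic and entropic components, $\Lambda/(k_{\rm B}T)$ and $-S/k_{\rm B}$, respectively, are dependent on the protein structure. $\Delta F$ is defined as "$F$ for a decoy structure" minus "$F$ for the NS":
$$\Delta F = F(\text{decoy}) - F(\text{NS}) \qquad (2)$$
$\Delta\Lambda$ and $\Delta S$ are also defined in similar fashions. When $\Delta F$ is negative, the decoy structure is more stable than the NS for the specified solvent.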
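The comparison of a decoy with the NS can be sketched as follows. This is a minimal illustration that assumes the dimensionless form F/(k_B T_0) = Λ/(k_B T_0) − S/k_B, with Λ the energetic component and S the solvation entropy; the function names and all numerical values are hypothetical and are not taken from the paper.

```python
def free_energy(energy, entropy):
    """Dimensionless FEF.

    energy  : energetic component in units of k_B*T0
    entropy : solvation entropy in units of k_B (normally negative)
    """
    return energy - entropy

def delta_f(decoy, native):
    """F(decoy) - F(NS); each argument is an (energy, entropy) pair."""
    return free_energy(*decoy) - free_energy(*native)

def rank_structures(structures, native_name):
    """Return (name, delta F) pairs sorted from most to least stable."""
    native = structures[native_name]
    deltas = {name: delta_f(values, native) for name, values in structures.items()}
    return sorted(deltas.items(), key=lambda item: item[1])

# Hypothetical values, for illustration only.
example = {
    "NS": (-40.0, -1000.0),
    "decoy_compact": (10.0, -990.0),   # entropically favorable, few IHBs
    "decoy_helical": (-60.0, -1040.0), # energetically favorable, loosely packed
}
ranking = rank_structures(example, "NS")
```

In this made-up example each decoy beats the NS in one component only, yet the NS has the lowest total, which is the situation the FEF is designed to detect.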
II.3 Solvent models
|Solvent||Diameter (nm)||Packing fraction (-)||Multipoles||Reduced Dipole moment (-)|
In order to span the entire spectrum of solvent polarity, we consider four solvents, water, methanol, ethanol, and cyclohexane (see Table 2), using the experimentally measured density at 298 K and 1 atm for each solvent. As in our earlier works [Hayashi17; Yoshidome12; Yasuda11], the model water molecule adopted is a hard sphere with diameter $d_S$ = 0.28 nm ($\rho_S d_S^3$ = 0.7317; $\rho_S$ is the number density of bulk solvent) in which a point dipole and a point quadrupole of tetrahedral symmetry are embedded [Kinoshita08; Kusalik88a; Kusalik88b]. Methanol and ethanol molecules are modeled as dipolar hard spheres. Their diameters, dipole moments, and packing fractions are estimated as follows. Assuming that monohydric alcohol molecules interact through the Stockmayer potential, Monchick and Mason [Monchick61] determined the potential parameters for a polar gas of a monohydric alcohol by adapting the Chapman-Enskog theory combined with experimental viscosity data. We modify the parameters in the dipole-dipole interaction part to make the potential pertinent to a liquid state. Specifically, the dipole moment is replaced by a larger, effective one accounting for the polarization effect [Jorgensen86], and the orientations of the dipole moments of two molecules are chosen so that the dipole-dipole attractive interaction is maximized (i.e., the most probable orientations [Kinoshita91] are chosen). The diameter $d_S$ is then evaluated as the distance between two molecular centers at which the potential equals $k_{\rm B}T_0$ ($T_0$ = 298 K). This procedure is followed for both methanol and ethanol. Once $d_S$ is estimated, the packing fraction $\eta_S$ can be calculated using the experimental data of methanol or ethanol density at 298 K and 1 atm. The results are $d_S$ = 0.347 nm, a reduced dipole moment of 1.678, and $\eta_S$ = 0.326 for methanol and $d_S$ = 0.403 nm, a reduced dipole moment of 1.059, and $\eta_S$ = 0.352 for ethanol. (For water, the reduced dipole moment and packing fraction are 7.662 and 0.383, respectively: The effect of molecular polarizability is incorporated in the water model using a self-consistent mean field theory [Kusalik88a; Kusalik88b].)
A cyclohexane molecule is modeled as a neutral hard sphere with diameter $d_S$ = 0.56 nm. According to X-ray crystallography data for solid cyclohexane, it is in a plastic solid state [30]. The molecules are rotationally disordered about the lattice points of the face-centered cubic cell with a side length of 0.87 nm [30]. The density calculated from the solid data is slightly higher than that of liquid cyclohexane. Hence, the effective diameter of molecules in the liquid state is also slightly smaller than the value estimated from the solid data. We set $d_S$ at double the molecular diameter of water. Using the experimentally measured density of cyclohexane at 298 K and 1 atm, we obtain a packing fraction of $\eta_S$ = 0.512. Table 3 summarizes the relevant quantities for each solvent. A word of caution is in order here. In our model systems, the pressure varies from solvent to solvent and departs from 1 atm. For example, the pressure of cyclohexane is much higher than 1 atm because the solvent-solvent vdW attractive interaction is neglected. However, the solvation entropy is determined by $d_S$ and $\eta_S$ rather than by the pressure, and these are carefully estimated as described above. In the preceding paper [Hayashi17], we considered water and a hard-sphere solvent sharing the same values of $d_S$ and $\eta_S$. The pressure of the hard-sphere solvent is much higher than that of water; nevertheless, the solvation entropies are not significantly different.
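The packing fractions quoted above follow from the hard-sphere relation η = (π/6) ρ d³, where ρ is the number density and d the molecular diameter. The sketch below reproduces the water value from the reduced density ρd³ = 0.7317 given in the text and, as a second check, estimates the ethanol value from a mass density. The ethanol density (0.785 g/cm³) and molar mass (46.07 g/mol) are assumed handbook-style values that do not appear in the source, and the function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def number_density_per_nm3(mass_density_g_cm3, molar_mass_g_mol):
    """Convert a mass density to a number density in molecules per nm^3."""
    per_cm3 = mass_density_g_cm3 / molar_mass_g_mol * AVOGADRO
    return per_cm3 * 1.0e-21  # 1 cm^3 = 1e21 nm^3

def packing_fraction(number_density_nm3, diameter_nm):
    """Hard-sphere packing fraction: eta = (pi / 6) * rho * d**3."""
    return math.pi / 6.0 * number_density_nm3 * diameter_nm ** 3

# Water: rho * d^3 = 0.7317 with d = 0.28 nm (values from the text).
eta_water = packing_fraction(0.7317 / 0.28 ** 3, 0.28)

# Ethanol: d = 0.403 nm from the text; density and molar mass are assumptions.
eta_ethanol = packing_fraction(number_density_per_nm3(0.785, 46.07), 0.403)
```

Both results land close to the quoted 0.383 and 0.352, which suggests that the packing fractions in this section were obtained in this way from the estimated diameters and the experimental densities.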
II.4 Calculation of entropic component
For a protein with a prescribed structure, the solvation entropy $S$ in Eq. (1) is calculated under the isochoric condition. A thermodynamic quantity of solvation calculated under the isochoric condition is not influenced by the expansion or compression of bulk solvent upon the solute insertion [Cann97]. Hence, it is physically more insightful than that calculated under the isobaric condition. The calculation of $S$ is performed by a hybrid method combining the integral equation theory (IET) [Kinoshita08; Hansen06; Kusalik88a; Kusalik88b; Cann97] and the morphometric approach (MA) [Yoshidome12; Oshima15; Koning04; Roth06]. An angle-dependent version [Kinoshita08; Kusalik88a; Kusalik88b; Cann97] of the IET is employed for water, methanol, and ethanol, whereas the IET for cyclohexane is its radial-symmetric counterpart [Hansen06]. Since $S$ is fairly insensitive to the protein-solvent interaction potential and influenced primarily by the geometric characteristics of the polyatomic structure [Imai06], the protein can be modeled as a set of fused, neutral hard spheres. The diameter of each protein atom is set at the Lennard-Jones potential parameter $\sigma$ assigned to it. In the MA, $S$ is expressed in the following morphometric form [Yoshidome12; Kinoshita13; Oshima15]:
$$\frac{S}{k_{\rm B}} = C_1 V_{\rm ex} + C_2 A + C_3 X + C_4 Y \qquad (3)$$
The EV ($V_{\rm ex}$), solvent-accessible surface area ($A$), and integrated mean and Gaussian curvatures of the accessible surface ($X$ and $Y$, respectively) act as the geometric measures of the polyatomic structure. The four coefficients $C_1$-$C_4$, which are considered to be dependent only on the solvent species and its thermodynamic state, can then be determined beforehand by treating isolated hard-sphere solutes with various diameters. The calculation procedure is summarized below.
(1) Calculate the solvation entropy of an isolated hard-sphere solute (SIHSS) with diameter $d_U$ using the IET. Sufficiently many different values of $d_U$ within a prescribed range are considered. For cyclohexane, whose molecular diameter is considerably large, a different range is considered. For water, methanol, and ethanol, both the translational and rotational contributions to the solvation entropy are taken into account, though the former is much larger than the latter.
(2) Determine $C_1$-$C_4$ by applying the least-squares method to the following morphometric form for isolated hard-sphere solutes:
$$\frac{S}{k_{\rm B}} = C_1 \frac{4\pi R^3}{3} + C_2\, 4\pi R^2 + C_3\, 4\pi R + C_4\, 4\pi, \quad R = \frac{d_U + d_S}{2} \qquad (4)$$
The values of $C_1$-$C_4$ thus determined for water, methanol, ethanol, and cyclohexane are collected in Table 4.
Table 4: Four coefficients in the morphometric forms of Eqs. (3) and (4).
|Solvent|C1 (nm^-3)|C2 (nm^-2)|C3 (nm^-1)|C4 (-)|
|Water|-0.19676|0.04517|0.25671|-0.35690|
|Methanol|-0.08726|0.04905|-0.03971|0.00918|
|Ethanol|-0.07010|0.03247|-0.02605|0.00075|
|Cyclohexane|-0.11111|0.20763|-0.38348|0.19614|
(3) Calculate $V_{\rm ex}$, $A$, $X$, and $Y$ of a protein with a prescribed structure using an extended version of Connolly's algorithm [Connolly83; Connolly85]. The coordinates of the center of each protein atom and its diameter serve as the input data. The value of the diameter is taken from the CHARMM22 force field [MacKerell98].
(4) Calculate $S$ from Eq. (3), in which the $C_1$-$C_4$ determined in step (2) are used.
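Step (2) of the procedure above is an ordinary linear least-squares fit, and step (4) is a dot product. The sketch below illustrates both on synthetic data, assuming the morphometric form S/k_B = C1·V + C2·A + C3·X + C4·Y and, for a sphere of radius R, the measures (4πR³/3, 4πR², 4πR, 4π). It does not run the IET: the "solvation entropies" are generated from hypothetical coefficients so that the fit can be checked, and all function names are illustrative.

```python
import math

def sphere_measures(radius):
    """Geometric measures (V, A, X, Y) of a sphere of the given radius."""
    return (4.0 * math.pi * radius ** 3 / 3.0,
            4.0 * math.pi * radius ** 2,
            4.0 * math.pi * radius,
            4.0 * math.pi)

def morphometric_entropy(coeffs, measures):
    """S/k_B = C1*V + C2*A + C3*X + C4*Y."""
    return sum(c * m for c, m in zip(coeffs, measures))

def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def fit_coefficients(radii, entropies):
    """Least-squares fit of C1..C4 via the normal equations."""
    rows = [sphere_measures(r) for r in radii]
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(4)] for i in range(4)]
    atb = [sum(row[i] * s for row, s in zip(rows, entropies)) for i in range(4)]
    return solve_linear(ata, atb)

# Synthetic check: hypothetical coefficients, radii R = (d_U + d_S) / 2 in nm.
true_coeffs = (-0.2, 0.05, 0.25, -0.35)
radii = [0.2 * k for k in range(1, 11)]
entropies = [morphometric_entropy(true_coeffs, sphere_measures(r)) for r in radii]
fitted = fit_coefficients(radii, entropies)
```

Once the coefficients are known, the same `morphometric_entropy` call with the four measures of a protein structure gives its solvation entropy, which is why the evaluation is so fast; the units of the coefficients and of the geometric measures must of course be consistent.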
In earlier works [Oshima15; Roth06] we corroborated that the MA gives sufficiently accurate results. We tested two types of simple solvents and calculated $S$ of protein G with a variety of structures via two different routes: the three-dimensional integral equation theory (3D-IET) [Ikeguchi95; Kinoshita02] and the MA combined with the radial-symmetric version of the IET. The protein polyatomic structure is explicitly treated at the atomic level by the 3D-IET. The error of the combined approach was found to be small. Moreover, the calculation of $S$ was remarkably accelerated by the application of the MA [Roth06]. In the morphometric form hinging on the Hadwiger theorem [Likos95], the four coefficients are equal to thermodynamic quantities of pure bulk solvent. In the case of the solvation free energy $\mu$, for instance, the first and second coefficients are the pressure $P$ and the surface tension $\gamma$, respectively. However, the Hadwiger theorem is valid only for an infinitely large solute. As argued in our earlier works [Yoshidome12; Kinoshita13; Oshima15; Kinoshita17], the form becomes problematic when it is applied to a nonpolar solute immersed in water. At $P$ = 1 atm, the EV term is negligibly small, and the form is approximated by $\mu \approx \gamma A$ ($A$ is the solvent-accessible surface area). However, $\gamma$ becomes larger as $T$ is lowered, with the result that $\mu$ increases and the hydrophobicity is strengthened. This conflicts with the experimental evidence that at low temperatures the hydrophobicity is weakened and a protein is denatured [Yoshidome12; Kinoshita13; Oshima15]. No such problem arises in our form, because the four coefficients are calculated using the IET.
II.5 Calculation of energetic component
In the calculation of the energetic component $\Lambda$, we choose a fully extended structure possessing the maximum number of protein-solvent hydrogen bonds (HBs) and no protein intramolecular HBs (IHBs) as the reference structure: $\Lambda$ = 0 for the reference structure. When a protein folds into a compact structure, many donors and acceptors (nitrogen and oxygen atoms) of the protein are buried in the interior after the breaking of protein-solvent HBs, but many IHBs are formed. The breaking and the formation lead to an increase and a decrease in energy, respectively. $\Lambda$ is calculated on the basis of this concept. We note that protein-solvent HBs are absent in cyclohexane and vacuum. In order to determine whether an IHB is formed or not, we employ the criteria proposed by McDonald and Thornton [McDonald94], which require all of the following conditions to be satisfied: the distance between the centers of D and A (D is a donor and A is an acceptor) is shorter than 3.9 Å; the distance between the centers of the hydrogen atom (H) and A is shorter than 2.5 Å; and the angle formed by "D-H···A" (the dotted line signifies an HB) is larger than 90°. We examine the donors and acceptors not only for backbone-backbone but also for backbone-side chain and side chain-side chain IHBs. When the water-accessible surface area of a donor or an acceptor, which is calculated using Connolly's algorithm, is smaller than 0.001 Å², we consider that it is buried.
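The three geometric conditions quoted above translate directly into a small predicate. The sketch below is an illustrative implementation under the stated thresholds (D-A distance below 3.9 Å, H-A distance below 2.5 Å, D-H···A angle above 90°); the function names and the test coordinates are hypothetical, and coordinates are assumed to be in Å.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at the middle point b."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cosine))

def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_da=3.9, max_ha=2.5, min_angle=90.0):
    """Apply the distance and angle criteria quoted in the text."""
    if math.dist(donor, acceptor) >= max_da:
        return False
    if math.dist(hydrogen, acceptor) >= max_ha:
        return False
    return angle_deg(donor, hydrogen, acceptor) > min_angle

# Hypothetical geometries (coordinates in angstroms).
linear = is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
bent = is_hydrogen_bond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 2.0, 0.0))
```

The linear arrangement satisfies all three conditions, whereas the bent one passes both distance tests but fails the angle test, which shows why the angle criterion is needed in addition to the distances.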
II.6 Hydrogen-bonding parameters for water
The thermodynamic cycle referring to water is illustrated in Fig. 2(a). Upon the burial of a donor and an acceptor in the protein interior, when an IHB is formed (e.g., NH···W (Exposed) + O···W (Exposed) → O···HN (Buried); "W" denotes an oxygen atom in a water molecule), we assume that no energy change occurs. When the burial of a donor or an acceptor is not accompanied by IHB formation (e.g., NH···W (Exposed) → NH (Buried) and O···W (Exposed) → O (Buried)), an energy increase of $E$ is assumed. The formation of an IHB undergoes no energy change when the donor, acceptor, and IHB are all exposed (e.g., NH···W (Exposed) + O···W (Exposed) → O···HN (Exposed)). On the other hand, the formation of an IHB within the protein interior leads to an energy change of $-2E$ (e.g., NH (Buried) + O (Buried) → O···HN (Buried)). It follows from the thermodynamic cycle that the burial of an exposed IHB is accompanied by no energy change. See Appendix A for more information. The value of $E$ (see Fig. 2(a); $T_0$ = 298 K) is set for the following reason. An estimate of $-2E$ for an H-acceptor pair (H is the hydrogen atom covalently bound to the donor) that is completely isolated in vacuum is available from quantum-chemistry calculations [Mitchell91]. However, the pair is in the protein interior where atoms with positive and negative partial charges are present (factor 1). Further, the decrease in energy upon protein folding also occurs due to electrostatic interactions other than hydrogen bonding (factor 2). Factors 1 and 2 make $E$ smaller and larger, respectively, but factor 2 dominates for a water-soluble protein. This leads to our estimate of $E$ [Hayashi17]. We note that this estimate, in which all the electrostatic interactions are effectively included in the hydrogen bonding energies, has also been used successfully in our previous studies on protein folding [Yoshidome12; Hayashi17; Yoshidome09; Yasuda11; Yasuda12].
II.7 Hydrogen-bonding parameters for methanol and ethanol
How do the protein-solvent and protein intramolecular hydrogen-bonding parameters change when water is replaced by either methanol or ethanol? We first consider the energy lowering arising from the formation of protein-solvent HBs. We calculate the number of hydrogen atoms of water, methanol, or ethanol in contact with a nitrogen atom or with an oxygen atom in a peptide [Kinoshita00]. That is, we count the number of peptide nitrogen-solvent oxygen or peptide oxygen-solvent oxygen HBs. The calculations were made using the dielectrically consistent reference interaction site model (DRISM) theory [Perkyns92] combined with an all-atom model for the peptide and a modified SPC/E model [Pettit82; Berendsen87] for water. The results are as follows (the numbers are normalized by those in the case where the solvent is water): 0.454 for the peptide nitrogen and 0.609 for the peptide oxygen in methanol, and 0.367 for the peptide nitrogen and 0.524 for the peptide oxygen in ethanol. Both numbers become smaller in the order water > methanol > ethanol. There are two reasons for this. First, the number density of hydrogen atoms in the bulk liquid decreases in this order. Second, the steric hindrance by the hydrocarbon group in an alcohol molecule makes it more difficult for an oxygen atom in an alcohol molecule to form a hydrogen bond with a nitrogen atom or with an oxygen atom in a peptide, and since the hydrocarbon group in an ethanol molecule is bulkier than that in a methanol molecule, the steric hindrance effect for ethanol is larger. The average of the two normalized numbers is 0.532 for methanol and 0.446 for ethanol. This leads to an estimated energy increase upon burial of a donor or an acceptor without IHB formation of $0.532E$ for methanol and $0.446E$ for ethanol (e.g., NH···X (Exposed) → NH (Buried) and O···X (Exposed) → O (Buried); X = M or E; "M" or "E" denotes the oxygen atom in a methanol or ethanol molecule). $-2E$, the energy change upon the formation of an IHB within the protein interior, should be independent of the solvent species.
Furthermore, we assume that the burial of an IHB (e.g., O···HN (Exposed) → O···HN (Buried)) is accompanied by no energy change, as in the case of water. The protein-solvent and protein intramolecular hydrogen-bonding parameters for methanol and ethanol can be constructed as illustrated in Figs. 2(b) and (c), respectively.
II.8 Hydrogen-bonding parameters for cyclohexane
Cyclohexane is a paradigmatic example of a nonpolar solvent. Even in the absence of a solvent, a donor, an acceptor, and an IHB are not isolated in vacuum, and a significant number of protein atoms with positive and negative partial charges are close to the pair. For simplicity, we assume that the formation of an IHB always leads to an energy decrease of $2E$ [Hayashi17], regardless of whether the donor, acceptor, or IHB is buried or not. Hence, the energetic component is calculated simply by counting the number of IHBs in a protein with a prescribed structure. The resulting thermodynamic cycle is illustrated in Fig. 2(d), which is also applied to a protein in vacuum. As further elaborated in Section V.1, the qualitative aspects of our conclusions are robust and not affected by the uncertainty of the hydrogen-bonding parameters set for methanol, ethanol, and cyclohexane (also see Appendix B).
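The hydrogen-bonding bookkeeping of Sections II.5 to II.8 can be condensed into one expression, sketched below in units of E. This is our reading of the thermodynamic cycles rather than a formula stated in the text: we assume that burying a donor or acceptor without an IHB costs f·E, that every IHB (buried or exposed) contributes 2fE − 2E relative to the fully extended reference, and that f is 1 for water, the average normalized HB number for the alcohols, and 0 for cyclohexane and vacuum. The names and the example counts are hypothetical.

```python
# Normalized protein-solvent HB numbers (peptide N, peptide O) from the text;
# water is the reference, and nonpolar media form no protein-solvent HBs.
NORMALIZED_HB_NUMBERS = {
    "water": (1.0, 1.0),
    "methanol": (0.454, 0.609),
    "ethanol": (0.367, 0.524),
    "cyclohexane": (0.0, 0.0),  # assumption: no protein-solvent HBs
    "vacuum": (0.0, 0.0),       # assumption: same cycle as cyclohexane
}

def burial_factor(solvent):
    """Average of the two normalized HB numbers (the factor multiplying E)."""
    n_nitrogen, n_oxygen = NORMALIZED_HB_NUMBERS[solvent]
    return 0.5 * (n_nitrogen + n_oxygen)

def energetic_component(solvent, n_ihb, n_buried_unpaired):
    """Energetic component in units of E, relative to the extended structure.

    n_ihb             : number of intramolecular HBs in the structure
    n_buried_unpaired : buried donors/acceptors that form no IHB
    """
    f = burial_factor(solvent)
    return f * n_buried_unpaired - 2.0 * (1.0 - f) * n_ihb

# Hypothetical structure with 10 IHBs and 3 buried, unpaired donors/acceptors.
in_water = energetic_component("water", 10, 3)
in_methanol = energetic_component("methanol", 10, 3)
in_cyclohexane = energetic_component("cyclohexane", 10, 3)
```

Under these assumptions IHBs give no net energy gain in water, a partial gain in the alcohols, and the full gain in cyclohexane or vacuum, which is consistent with the statement that the energetic component favors IHB-rich helical structures in the less polar solvents.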
II.9 Preparation of the native and decoy structures
For the NSs of CBP-bromodomain (CBP-BD) and apoplastocyanin (apoPC), we adopt the most recent data obtained by X-ray crystallography experiments (PDB code: 4OUF for CBP-BD [45] and 2PCY for apoPC [46]). The NSs are shown in Figs. 3(a) and (b), respectively. CBP-BD possesses 114 residues and apoPC 99 residues. The NS of CBP-BD is characterized by an α-helix content of 66%. The β-sheet and α-helix contents in the NS of apoPC are 35% and 4%, respectively. (In this study, the α-helix and β-sheet contents are calculated using the DSSP program [Kabsch83].) We prepare compact decoy structures using different tools. First, we exploit the 3Drobot tool [Deng16], which is an extension of the fragment assembly simulation protocol that starts from multiple structure scaffolds identified from the input structure. Unlike the other existing protocols, 3Drobot does not sacrifice the IHBs and the compactness of the input structure. In this study, the NS serves as the input structure and the root-mean-square deviation (RMSD) cutoff for the output decoys is set at 15 Å: their values of RMSD from the input structure are smaller than 15 Å. We thus obtain 1000 decoys for each protein that are denoted as "3Drobot" hereafter. Since "3Drobot" includes decoys which are quite similar to the NS, it is suited to a stringent test of our FEF for a protein immersed in water. However, its main drawback is its tendency to generate only structures whose α-helix and β-sheet contents are not far from those of the input structure (i.e., the NS).
In order to ensure the presence of compact decoys with a wider variety of Î±-helix and Î²-sheet contents, we devise a new procedure detailed below and schematically represented in Fig. 4. We employ the so-called âTop500 protein databaseâ Chen10 () that is regarded as a good representative of the entire PDB universe and where the template structures of a protein possess a resolution of 1.8 or higher, with no missing atoms, a low clash score, and without unusual amino-acid substitutions. Starting from a protein structure taken from this database, we parse different fragments matching the length of the target protein. Retaining only the backbone C atom representation of a matching fragment, we create its homopolymer-coarse-grained representation50,51 in which every amino acid is modeled as two spherical beads, one representing the backbone atoms with a vdW radius matching that of glycine, and the other representing the side-chain atoms with a radius of 2.5 , being the approximate averaged value of the vdW radii for the 20 amino acids. Each side-chain sphere is tangent to the corresponding backbone sphere and is placed in the negative normal direction with respect to the local Frenet frame of the corresponding amino acid as detailed in a recent article. Skrbic16b () Using this coarse-grained homopolymer representation, we perform fragment compactification using a Monte Carlo Metropolis scheme aimed at minimizing the gyration radius of the chain.Skrbic16a (); Skrbic16b () We employ only pivot moves with maximum angle of two degrees, in order to minimally affect the original secondary structure of the starting protein fragment. We then go back to the original all-atom protein representation employing primary sequence of the target protein and using the PULCHRA tool. 
Rotkiewicz08 () Finally, we perform an energy minimization through a steepest descent algorithm using the GROMACS package vanderSpoel05 () and discard any structure suffering unrepairable atomic clashes detected. The structures generated by this procedure are denoted as âDiverse SSâ (âSSâ is an abbreviation of âsecondary structuresâ) in the rest of the paper. We also prepare a complete -helix (a single helix), âall-â, and associated two -helices, âall--2â, as shown in Figs. 3 (c) and (d), respectively. The structure all--2 is optimized using a molecular dynamics (MD) simulation with water modeled as a dielectric continuum. Further, very compact structures are generated using an MD simulation in vacuum. The initial structure is either the NS or all-, and only the protein intramolecular vdW interaction and the bonded energy are taken into consideration. The structures generated via this method are identified as âMD (vdW)â, with some of them possessing little secondary structures. Last, we generate a set of structures classified as âMD (GB)â. Many of them are characterized by quite high -helix contents. They are generated using an MD simulation with water modeled as a dielectric continuum. For CBP-BD, the initial structure is either all--2 or one of the two structures chosen from â3Drobotâ as those which are quite stable in vacuum (i.e., in terms of IHBs). For apoPC, the initial structure is either all--2 or one of the two structures chosen from âDiverse SSâ as those with relatively higher -helix contents. The slight, unrealistic overlaps of the constituent atoms occurring in a protein structure are removed by the local minimization of the energy function using the CHARMM54 and MMTSB55 packages based on the CHARMM22 force field MacKerell98 () combined with the CMAP correction Mackerell04 () and the GBMV implicit solvent model. 
Lee03 (); Chocholousova06 () We believe that the decoy structures generated in this study are sufficiently many and diverse for the following reasons: (1) 3Drobot used in this study was developed especially for testing a free-energy function in terms of its performance in discriminating the NS (i.e., the structure stabilized in aqueous solution under physiological conditions) from a number of misfolded decoys. Structures which closely resemble the NS are included in the decoys generated by 3Drobot; (2) in addition to 3Drobot, using our own new procedure, we generate decoys with a wide variety of α-helix and β-sheet contents and values of the RMSD from the NS; and (3) further, using MD simulations, we generate significantly many structures. One of the MD simulations is performed in vacuum. Only the protein intramolecular vdW interaction and the bonded energy are taken into consideration (i.e., all of the partial charges of protein atoms are shut off). This type of simulation tends to generate more compact structures with fewer intramolecular hydrogen bonds and lower secondary-structure content. The other MD simulation is undertaken so that structures with quite high α-helix contents can be generated. The decoys in (2) and (3) are necessary because we consider not only water but also methanol, ethanol, and cyclohexane as the solvent.
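The compactification step described above can be illustrated with a minimal sketch. It is not the code used in this study: it keeps only one bead per residue (the Cα positions) instead of the two-bead model, and the function names, the fictitious temperature `temperature`, and the overlap threshold `min_dist` are assumptions introduced purely for illustration. What it does retain is the idea of Metropolis sampling with small pivot moves (at most two degrees) that drive the gyration radius down.

```python
import numpy as np


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def rotation_matrix(axis, angle):
    """Rodrigues rotation matrix for a rotation by `angle` (radians) about `axis`."""
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


def has_overlap(coords, min_dist):
    """True if any two beads separated by at least two bonds are closer than min_dist."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(coords), k=2)
    return bool((dist[i, j] < min_dist).any())


def compact_chain(coords, n_steps=2000, max_angle_deg=2.0,
                  temperature=0.05, min_dist=3.0, seed=0):
    """Metropolis pivot moves that favor a smaller gyration radius.

    `temperature` (same length unit as coords) and `min_dist` are
    hypothetical parameters, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    coords = np.array(coords, dtype=float)
    rg = radius_of_gyration(coords)
    for _ in range(n_steps):
        pivot = rng.integers(1, len(coords) - 1)      # interior bead
        angle = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
        rot = rotation_matrix(rng.normal(size=3), angle)
        trial = coords.copy()
        # Rotate the tail of the chain rigidly about the pivot bead.
        trial[pivot + 1:] = (coords[pivot + 1:] - coords[pivot]) @ rot.T + coords[pivot]
        if has_overlap(trial, min_dist):
            continue                                  # reject steric clashes
        rg_trial = radius_of_gyration(trial)
        if rg_trial <= rg or rng.random() < np.exp(-(rg_trial - rg) / temperature):
            coords, rg = trial, rg_trial
    return coords
```

Because a pivot move rotates the tail rigidly, all bond lengths are preserved, and the small maximum angle changes the local geometry only gradually, which is the stated reason for restricting the move set.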
III Results for an α-helix-rich protein: CBP-BD
We now proceed to assess the structural stability of a protein in water, methanol, ethanol, cyclohexane, and vacuum by contrasting the FEF of its NS with that of all the misfolded decoys generated by the procedure explained above. We start with CBP-BD, an α-helix-rich protein. We distinguish between the entropic and energetic components of the FEF to clarify the relative role of each component. A negative value of ΔF (the FEF of a decoy relative to that of the NS), of its entropic component, or of its energetic component identifies a particular decoy that is more stable than the NS in each solvent with respect to the FEF, the entropic component, or the energetic component, respectively.
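The discrimination criterion can be made concrete with a small sketch. It assumes that the FEF of a structure is the sum of an energetic and an entropic component expressed in the same (dimensionless) units; the class and function names and all numerical values are hypothetical and serve only to show how a decoy can win on one component while losing on the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Structure:
    name: str
    energetic: float   # energetic component of the FEF (hypothetical units)
    entropic: float    # entropic component of the FEF (same units)

    @property
    def fef(self) -> float:
        # Assumption: the FEF is the sum of the two components.
        return self.energetic + self.entropic


def component_differences(decoy, native):
    """Differences (decoy minus NS) in the FEF and in each component."""
    return {
        "fef": decoy.fef - native.fef,
        "entropic": decoy.entropic - native.entropic,
        "energetic": decoy.energetic - native.energetic,
    }


def more_stable_than_native(decoys, native, component="fef"):
    """Names of decoys whose chosen difference is negative."""
    return [d.name for d in decoys
            if component_differences(d, native)[component] < 0.0]


# Toy numbers, invented for illustration only.
native = Structure("NS", energetic=-10.0, entropic=-20.0)
decoys = [
    Structure("compact_few_ihbs", energetic=-2.0, entropic=-25.0),
    Structure("single_helix", energetic=-14.0, entropic=-8.0),
]
```

In this toy set, one decoy is entropically more stable than the NS and the other is energetically more stable, yet neither has a lower total FEF.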
For water, Figure 5 shows the relation between the RMSD from the NS in terms of the Cα atoms and (a) ΔF, (b) its entropic component, or (c) its energetic component. Interestingly, there are significantly many structures which are more stable than the NS with respect to either the entropic or the energetic component. However, the NS is always the optimal choice in terms of the sum of the two (i.e., in terms of ΔF) and is thus the most stable. The two components often oppose each other, as exemplified in the plots for “MD (vdW)”. “MD (vdW)” includes structures which are extremely compact and entropically more stable than the NS but suffer from a lack of IHBs, causing energetic instability. Though there is a tendency that ΔF decreases and the stability becomes higher as the α-helix content increases, as observed in Fig. 6(a), all-α and all-α-2 are much less stable than the NS. As expected, all-α is the most stable with respect to the energetic component. It is interesting that, for the α-helix-rich protein, ΔF is less correlated with the β-sheet content (see Fig. 6(b)).
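For readers who wish to reproduce the abscissa of such plots, the following is a minimal sketch of a Cα RMSD calculation. It assumes that the RMSD is evaluated after optimal rigid-body superposition (the Kabsch algorithm), which is the common convention but is not stated explicitly in the text; the function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition of p onto q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p0 = p - p.mean(axis=0)            # remove translation
    q0 = q - q.mean(axis=0)
    h = p0.T @ q0                      # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p0 @ rot.T
    return float(np.sqrt(((p_rot - q0) ** 2).sum(axis=1).mean()))
```

The two inputs would be the Cα coordinates of a decoy and of the NS, listed in the same residue order.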
In methanol or ethanol
Figure 7 reports (a) ΔF, (b) its entropic component, or (c) its energetic component as a function of the RMSD for methanol. The analogous plots for ethanol are shown in (d), (e), and (f), respectively. Overall, the results for methanol and ethanol look quite similar. Significantly many structures are more stable than the NS in terms of ΔF, for which the energetic component is mainly responsible. The same structure, depicted in Fig. 8(a), is found to be the most stable in the two alcohols. It is characterized by an association of four α-helices with an α-helix content of 72%. It is significantly different from the NS (see Fig. 3). We also look at the second most stable structure: it shares with the structure in Fig. 8(a) the same characteristics, expressed as “associated α-helices”. It is possible that there is a structure which is even more stable than these two structures but is missing from the decoy structures generated. However, we believe that it would also comprise associated α-helices. As observed in Figs. 9(a) and (c), the stability tends to become higher as the α-helix content increases for both methanol and ethanol. However, neither all-α-2 nor all-α is very stable. The stability is less correlated with the β-sheet content (see Figs. 9(b) and (d)).
In Fig. 10, we plot (a) ΔF, (b) its entropic component, or (c) its energetic component against the RMSD for cyclohexane. The results observed in Fig. 10 are qualitatively similar to those obtained for the two alcohols. The structure identified as the most stable one is shown in Fig. 8(b). It is characterized by an association of four α-helices with an α-helix content of 61%. As compared in Fig. 8(c), the most stable structures in methanol, ethanol, and cyclohexane are similar to one another but somewhat different from the NS (see Fig. 8(d)). There is a tendency that the stability becomes higher as the α-helix content increases, but it is less correlated with the β-sheet content (the plots are not shown).
There is only the energetic component for vacuum. The relation between the RMSD and ΔF is therefore obtained by relabeling the energetic component in Fig. 10(c) as ΔF. The most stable structure is shown in Fig. 8(e). It is characterized by an association of two α-helices with an α-helix content of 71%. Again, there is a tendency that the stability becomes higher as the α-helix content increases, but it is less correlated with the β-sheet content (the plots are not shown).
IV Results for a β-sheet-rich protein: apoPC
We next present the results of the corresponding calculations for apoPC, a β-sheet-rich protein.
For water, Figure 11 shows the relation between the RMSD from the NS in terms of the Cα atoms and (a) ΔF, (b) its entropic component, or (c) its energetic component. The NS is again correctly identified as the most stable structure using ΔF, though there are significantly many structures which are more stable than the NS with respect to one of the energetic and entropic components. In contrast to the α-helix-rich protein, the structure of the β-sheet-rich protein tends to become more stable as the β-sheet content increases, whereas the stability is less correlated with the α-helix content (see Figs. 12(a) and (b)).
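The statements about correlation in this paper are qualitative readings of scatter plots. If one wished to quantify them, a Pearson correlation coefficient between ΔF and a secondary-structure content over the decoy set would be one simple option. The sketch below is such a swapped-in measure, not a quantity computed in this study, and the function name is hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))
```

Here `x` would hold, for instance, the β-sheet contents of the decoys and `y` the corresponding values of ΔF; a coefficient close to -1 would indicate that the stability rises steadily with the content, whereas a value near 0 would correspond to the "less correlated" cases.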
In methanol or ethanol
Since the results for methanol and ethanol are qualitatively the same, we present only those for methanol. As observed in Figs. 13(a), (b), and (c), where ΔF, its entropic component, and its energetic component are plotted against the RMSD, respectively, there are structures which are more stable than the NS in terms of ΔF, which is attributed to the energetic component. The most stable structure is illustrated in Fig. 14(a). We note that the same structure is identified as the most stable one in ethanol. It is characterized by an association of four α-helices with an α-helix content of 54%. It is interesting that the stability is almost equally correlated with the α-helix and β-sheet contents (see Figs. 15(a) and (b)).
As in the case of CBP-BD, the results for cyclohexane are qualitatively similar to those for the two alcohols. The structure identified as the most stable one in the two alcohols is also the most stable in cyclohexane (see Fig. 14(a)). Though the NS is rich in the β-sheet, the structure stabilized in methanol, ethanol, and cyclohexane is rich in the α-helix. The stability is almost equally correlated with the α-helix and β-sheet contents (the plots are not shown).
The most stable structure in vacuum is shown in Fig. 14(b). It is characterized by an association of two α-helices with an α-helix content of 71%. The stability is almost equally correlated with the α-helix and β-sheet contents (the plots are not shown).
V.1 Structures of a protein stabilized in different model solvents
The formation of the α-helix or β-sheet leads not only to a solvent-entropy gain (see Figs. 1(a) and (b)) but also to compensation for the break of protein-solvent HBs through the formation of IHBs. For this reason, these secondary structures are the fundamental units most favored in the NS. In general, the α-helix is capable of forming more IHBs than the β-sheet, because in the latter significantly many donors and acceptors remain unpaired. Therefore, when the energetic component dominates, the α-helix is more favorable than the β-sheet and a structure possessing the highest possible α-helix content is stabilized. In the absence of the solvent-entropy effect (i.e., in vacuum), the structure with the maximum number of IHBs is highly favorable. In terms of the number of backbone-backbone IHBs, all-α is the most stable. However, backbone-side chain and side chain-side chain IHBs can also be formed, which makes a significant contribution to the structural stability. We find that an association of two α-helices is more stable than all-α. It is even more stable than all-α-2 due to a more favorable arrangement of the two α-helices (compare Fig. 3(d) with Figs. 8(e) and 14(b)). Remarkably, this result is independent of the amino-acid sequence and therefore universal. We then consider a protein immersed in cyclohexane, ethanol, and methanol. The strength of the energetic component follows the expected order, “cyclohexane > ethanol > methanol” (see Fig. 2). The energetic penalty caused by the break of a protein-solvent HB, which vanishes in cyclohexane, is larger in methanol than in ethanol, resulting in the above ordering (see Fig. 2). The differences among the three solvents in terms of the strength of the entropic EV effect are rather small. This can be rationalized as follows. The EV effect becomes stronger as the solvent diameter dS decreases and/or the packing fraction increases. The values of dS and of the packing fraction of the three solvents are given in Table 3. 
According to the AO theory, Asakura54 (); Asakura58 () which accounts for only the EV term of the solute-solvent pair correlation Hansen06 () but can be used in an approximate discussion, the strength is in proportion to the ratio of the packing fraction to dS (0.939, 0.873, and 0.914 for methanol, ethanol, and cyclohexane, respectively). Qualitatively, all three solvents share the characteristic that the energetic component is stronger than the entropic one but the latter also contributes significantly to the structural stability. With respect to the solvent-entropy effect, close packing of side chains is the most important. Since this effect is moderately strong in methanol, ethanol, and cyclohexane, close side-chain packing is also required. A good solution is to construct an association of more than two α-helices, achieving overall closer side-chain packing and a larger reduction of the total EV: We find that an association of four α-helices (see Figs. 8(a), (b), and 14(a)) is quite stable. As discussed above, the structures stabilized in vacuum, cyclohexane, ethanol, and methanol can all be characterized by an association of α-helices as structural units, which suggests that the qualitative aspects of our conclusions are not affected by the uncertainty of the hydrogen-bonding parameters set for these three solvents. However, the number of the units is smaller in vacuum than in methanol, ethanol, and cyclohexane. This result is independent of the amino-acid sequence. Even the β-sheet protein, apoPC, would change its structure to an association of α-helices if placed in these solvent environments. In water, the effect of the entropic component is exceptionally strong. Thanks to the hydrogen-bonding network, water exists as a dense liquid at ambient temperature and pressure despite its quite small molecular size (=1.37). 
More importantly, in our earlier works Oshima15 (); Hayashi17 (); Yoshidome09 (); Yasuda11 () we demonstrated that the effect is considerably stronger in water than in the hard-sphere solvent for which dS and the packing fraction are set at the values of water. In other words, the dependence of the solvation entropy of a protein on its structure becomes largest when the solvent is water, which originates from the strong water-water attractive interaction potential. On the other hand, the energetic penalty is quite large, much larger than that of methanol (see Fig. 2). It follows that in water the entropic component is at least as strong as the energetic one. It is required that the secondary structures (preferably, the α-helix) be formed as much as possible, but close packing of side chains is also imperative. Depending on the amino-acid sequence, close packing is not always achievable with a structure possessing a high α-helix content. A good example is apoPC, for which the β-sheet is preferentially chosen to achieve close side-chain packing.
V.2 Comparison with experimental observations
According to the experimental observations, Hirota98 () alcohol induces a protein to form α-helices, and the helical structure induced by alcohol is independent of the alcohol species. Our results are in good accord with these observations. For proteins in nonpolar solvents, two additional interesting features are known in the literature: Pace04 (); Griebenow96 (); Klibanov01 ()
(1) An enzyme is not soluble in a nonpolar solvent but forms suspensions (some water molecules are retained on the enzyme surface). The enzyme NS, which is not collapsed, is considerably more thermostable than in water.
(2) According to the experimental studies examining enzymes in aqueous-organic mixtures, Griebenow96 () an enzyme is denatured in such mixtures. However, as stated in (1), when the enzyme is introduced into a nonpolar solvent, its NS is retained. The reason for this counter-intuitive behavior is that in the absence of water, enzymes are very rigid.
Thus, it is likely that the NS of a protein is retained even after it is introduced into a nonpolar solvent. This can be interpreted as follows. In the nonpolar solvent, the NS becomes highly stable (more stable than in water), because it is very difficult to break the protein IHBs already formed in the NS. That is, the NS is a metastable state: There is a very high free-energy barrier for the protein to overcome to reach the most stable structure, which should be characterized by associated α-helices. It is also well known that the addition of a small amount of water to the protein-nonpolar solvent system (the environment is still far from the aqueous one) makes a protein more flexible and denatured. Griebenow96 (); Klibanov01 () In this case, the protein can overcome the free-energy barrier for the denaturation because the energy increase due to the break of IHBs can be compensated by the energy decrease brought by the formation of protein-water HBs. The protein can then change its structure to the most stable one. Thus, features (1) and (2) are consistently interpretable, though they are not actually reproduced in our calculations.
We have investigated the structures of two proteins, CBP-bromodomain (CBP-BD) Mujtaba04 () and apoplastocyanin (apoPC), Garret84 () stabilized in model water, methanol, ethanol, cyclohexane, and vacuum using our free-energy function (FEF). Hayashi17 (); Yoshidome09 (); Yasuda11 () The structure stabilized in aqueous solution under the physiological condition is referred to as the “native structure (NS)”. The NS of CBP-BD possesses an α-helix content of 66%. That of apoPC possesses β-sheet and α-helix contents of 35% and 4%, respectively. A water molecule is modeled as a hard sphere in which a point dipole and a point quadrupole of tetrahedral symmetry are embedded. Kusalik88a (); Kusalik88b () The model of a methanol or ethanol molecule is a hard sphere in which only a point dipole is embedded. Kinoshita91 () A cyclohexane molecule is modeled as a neutral hard sphere. The transition to a compact structure of a protein is accompanied by the break of protein-solvent hydrogen bonds (HBs), formation of protein intramolecular HBs (IHBs), and recovery of solvent-solvent HBs. These are taken into account in the energetic component of the FEF. Protein-solvent and solvent-solvent HBs are not present in cyclohexane and vacuum. The hydrogen-bonding parameters employed in calculating the energetic component have newly been determined for methanol and ethanol. The structural transition is also accompanied by a solvent-entropy gain except in vacuum. The diameter, packing fraction, and multipoles are parameterized to match the basic properties of each solvent. The solvation entropy of a protein with a prescribed structure is calculated using a radial-symmetric Hansen06 () or angle-dependent Kinoshita08 (); Kusalik88a (); Kusalik88b (); Kinoshita96 (); Cann97 () integral equation theory combined with our morphometric approach. Yoshidome12 (); Oshima15 (); Koning04 (); Roth06 () For a protein, it is important to form as many IHBs as possible (requirement 1). 
This requirement becomes stronger in the order: “vacuum = cyclohexane > ethanol > methanol > water”. It is also important to keep the solvent entropy as high as possible except in vacuum (requirement 2). This requirement becomes stronger in the order: “water > methanol ∼ ethanol ∼ cyclohexane”. The optimally stabilized structure is determined by the competition of requirements 1 and 2. However, requirement 1 dominates in all solvents except water, and the priority is then given not to close side-chain packing but to the formation of a maximum number of IHBs. In this case, the α-helix is more favorable than the β-sheet because unpaired donors and acceptors unavoidably remain in the latter. Therefore, the most stable structure is characterized by an association of α-helices with a high α-helix content. For both CBP-BD and apoPC, the most stable structures in methanol and ethanol are the same and can be characterized by an association of α-helices as structural units. The most stable structure in cyclohexane is also similar. In vacuum, the structure possessing the maximum number of IHBs is stabilized. It is also characterized by an association of α-helices as structural units. Since the solvent-entropy effect certainly works in methanol, ethanol, and cyclohexane, the number of the units is made larger than in vacuum so that close side-chain packing can be achieved to some extent. The qualitative characteristics of the structures stabilized in vacuum and the three solvents are the same for both CBP-BD and apoPC and therefore almost independent of the amino-acid sequence. They are somewhat different from the NS even for the α-helix-rich protein CBP-BD because of the crucial role played by close side-chain packing in water. In water, requirement 2 is as strong as or even stronger than requirement 1. In particular, the close packing of side chains is essential. A high α-helix content is not necessarily suited to the achievement of sufficiently close packing. 
There are many cases where the β-sheet is the preferentially selected elemental unit. The contents of α-helix, β-sheet, and total secondary structure are optimized in terms of the interplay of the energetic and entropic effects. This explains why, only in water, a variety of structures are stabilized depending on the amino-acid sequence. It has been verified that our FEF is capable of discriminating the NS from a number of decoy structures as the one for which the FEF becomes lowest. In this study, we consider CBP-BD and apoPC. In our preceding work, Hayashi17 () we considered protein G with 56 residues possessing α-helix and β-sheet contents of 27% and 39%, respectively. Importantly, we considered 133 proteins in an earlier work. Yasuda11 () Thus, the number of proteins we have tested is 136. Using the same FEF, we have been successful in discriminating the NS from a number of decoys for 131 proteins. We were unsuccessful for the other 5 proteins. Yasuda11 () However, these failures can be explained as follows. For two of them, the structures stabilized under acidic conditions (pH=3.5 and 4.5) are regarded as the NSs. They should be significantly different from the true NSs. For each of the other three proteins, portions of the terminus sides are removed and a significantly high percentage of the secondary structures is lost. In summary, as long as the NS is not unrealistic, we have always been successful in the discrimination. A membrane protein is in an environment that is similar to a nonpolar solvent like cyclohexane. The fact that its stabilized structure is usually characterized by associated α-helices is in good accord with the result mentioned above. There is another type of membrane-protein structure featuring the β-barrel. In the β-barrel there are very few donors and acceptors that do not form IHBs, unlike in the β-sheet. In this sense, the β-barrel is more like the α-helix. 
It is intriguing to explore whether the β-barrel structure becomes more stable than an association of α-helices when the amino-acid sequence of a β-barrel protein is used in the calculation. It is certain that the solvent-entropy effect plays a pivotal role in distinguishing the two types of structures through the achievement of close packing of side chains. For a β-barrel protein, an association of α-helices should be much more unfavorable from the entropic point of view. Work in this direction is in progress.
Acknowledgements. One of the authors (M. K.) developed the computer program for the MA with R. Roth and Y. Harano. This work was supported by Grant-in-Aid for Scientific Research (B) (No. 17H03663) from Japan Society for the Promotion of Science (JSPS) to M. K. and by MIUR PRIN-COFIN2010-2011 (contract 2010LKE4CC) to A. G. The use of the SCSCF multiprocessor cluster at the Università Ca’ Foscari Venezia is gratefully acknowledged.
- (1) C. B. Anfinsen, Science 181, 223 (1973).
- (2) P. Ball, Chem. Rev. 108, 74 (2008).
- (3) W. Kauzmann, Advances in Protein Chem. 14, 1 (1959).
- (4) C. Tanford, J. Am. Chem. Soc. 84, 4240 (1962).
- (5) M. Kinoshita, J. Chem. Phys. 128, 024507 (2008).
- (6) T. Yoshidome and M. Kinoshita, Phys. Chem. Chem. Phys. 14, 14554 (2012).
- (7) M. Kinoshita, Biophys. Rev. 5, 283 (2013).
- (8) H. Oshima and M. Kinoshita, J. Chem. Phys. 142, 145103 (2015).
- (9) C. N. Pace, S. Treviño, E. Prabhakaran, and J. M. Scholtz, Phil. Trans. R. Soc. Lond. B 359, 1225 (2004).
- (10) N. Hirota, K. Mizuno, and Y. Goto, J. Mol. Biol. 275, 365 (1998).
- (11) T. Hayashi, S. Yasuda, T. Škrbić, A. Giacometti, and M. Kinoshita, J. Chem. Phys. 147, 125102 (2017).
- (12) T. Yoshidome, K. Oda, Y. Harano, R. Roth, Y. Sugita, M. Ikeguchi, and M. Kinoshita, Proteins 77, 950 (2009).
- (13) S. Yasuda, T. Yoshidome, Y. Harano, R. Roth, H. Oshima, K. Oda, Y. Sugita, M. Ikeguchi, and M. Kinoshita, Proteins 79, 2161 (2011).
- (14) P.-M. König, R. Roth, and K. R. Mecke, Phys. Rev. Lett. 93, 160601 (2004).
- (15) R. Roth, Y. Harano, and M. Kinoshita, Phys. Rev. Lett. 97, 078101 (2006).
- (16) J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids, 3rd ed. (Academic Press, London, 2006).
- (17) P. G. Kusalik and G. N. Patey, J. Chem. Phys. 88, 7715 (1988).
- (18) P. G. Kusalik and G. N. Patey, Mol. Phys. 65, 1105 (1988).
- (19) M. Kinoshita and D. R. Bérard, J. Comput. Phys. 124, 230 (1996).
- (20) N. M. Cann and G. N. Patey, J. Chem. Phys. 106, 8165 (1997).
- (21) Y. Harano and M. Kinoshita, Biophys. J. 89, 2701 (2005).
- (22) S. Yasuda, T. Yoshidome, H. Oshima, R. Kodama, Y. Harano, and M. Kinoshita, J. Chem. Phys. 132, 065105 (2010).
- (23) S. Yasuda, H. Oshima, and M. Kinoshita, J. Chem. Phys. 137, 135103 (2012).
- (24) T. Yoshidome, M. Kinoshita, S. Hirota, N. Baden, and M. Terazima, J. Chem. Phys. 128, 225104 (2008).
- (25) S. Asakura and F. Oosawa, J. Chem. Phys. 22, 1255 (1954).
- (26) S. Asakura and F. Oosawa, J. Polym. Sci. 33, 183 (1958).
- (27) L. Monchick and E. A. Mason, J. Chem. Phys. 35, 1676 (1961).
- (28) W. L. Jorgensen, J. Phys. Chem. 90, 1276 (1986).
- (29) M. Kinoshita and M. Harada, Mol. Phys. 74, 443 (1991).
- (30) N. P. Funnell, M. T. Dove, A. L. Goodwin, S. Parsons, and M. G. Tucker, J. Phys.: Condens. Matter 25, 454204 (2013).
- (31) T. Imai, Y. Harano, M. Kinoshita, A. Kovalenko, and F. Hirata, J. Chem. Phys. 125, 024911 (2006).
- (32) M. L. Connolly, Science 221, 709 (1983).
- (33) M. L. Connolly, J. Am. Chem. Soc. 107, 1118 (1985).
- (34) A. D. MacKerell, Jr., D. Bashford, M. Bellott, R. L. Dunbrack, Jr., J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher III, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiórkiewicz-Kuczera, D. Yin, and M. Karplus, J. Phys. Chem. B 102, 3586 (1998).
- (35) M. Ikeguchi and J. Doi, J. Chem. Phys. 103, 5011 (1995).
- (36) M. Kinoshita, J. Chem. Phys. 116, 3493 (2002).
- (37) C. N. Likos, K. R. Mecke, and H. Wagner, J. Chem. Phys. 102, 9350 (1995).
- (38) M. Kinoshita and T. Hayashi, Phys. Chem. Chem. Phys. 19, 25891 (2017).
- (39) I. K. McDonald and J. M. Thornton, J. Mol. Biol. 238, 777 (1994).
- (40) J. B. O. Mitchell and S. L. Price, Chem. Phys. Lett. 180, 517 (1991).
- (41) M. Kinoshita, Y. Okamoto, and F. Hirata, J. Am. Chem. Soc. 122, 2773 (2000).
- (42) J. S. Perkyns and B. M. Pettitt, J. Chem. Phys. 97, 7656 (1992).
- (43) B. M. Pettitt and P. J. Rossky, J. Chem. Phys. 77, 1451 (1982).
- (44) H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma, J. Phys. Chem. 91, 6269 (1987).
- (45) S. Mujtaba, Y. He, L. Zeng, S. Yan, O. Plotnikova, Sachchidanand, R. Sanchez, N. J. Zeleznik-Le, Z. Ronai, and M.-M. Zhou, Mol. Cell 13, 251 (2004).
- (46) T. P. J. Garrett, D. J. Clingeleffer, J. M. Guss, S. J. Rogers, and H. C. Freeman, J. Biol. Chem. 259, 2822 (1984).
- (47) W. Kabsch and C. Sander, Biopolymers 22, 2577 (1983).
- (48) H. Deng, Y. Jia, and Y. Zhang, Bioinformatics 32, 378 (2016).
- (49) V. B. Chen, W. B. Arendall 3rd, J. J. Headd, D. A. Keedy, R. M. Immormino, G. J. Kapral, L. W. Murray, J. S. Richardson, and D. C. Richardson, Acta Crystallogr. D 66, 12 (2010); http://kinemage.biochem.duke.edu/databases/top8000.php.
- (50) T. Škrbić, A. Badasyan, T. Hoang, R. Podgornik, and A. Giacometti, Soft Matter 12, 4783 (2016).
- (51) T. Škrbić, T. Hoang, and A. Giacometti, J. Chem. Phys. 145, 084904 (2016).
- (52) P. Rotkiewicz and J. Skolnick, J. Comput. Chem. 29, 1460 (2008).
- (53) D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark, and H. J. C. Berendsen, J. Comput. Chem. 26, 1701 (2005).
- (54) B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, J. Comput. Chem. 4, 187 (1983).
- (55) M. Feig, J. Karanicolas, and C. L. Brooks III, J. Mol. Graphics Modell. 22, 377 (2004).
- (56) A. D. Mackerell, Jr., M. Feig, and C. L. Brooks III, J. Comput. Chem. 25, 1400 (2004).
- (57) M. S. Lee, M. Feig, F. R. Salsbury, Jr., and C. L. Brooks III, J. Comput. Chem. 24, 1348 (2003).
- (58) J. Chocholoušová and M. Feig, J. Comput. Chem. 27, 719 (2006).
- (59) K. Griebenow and A. M. Klibanov, J. Am. Chem. Soc. 118, 11695 (1996).
- (60) A. M. Klibanov, Nature 409, 241 (2001).
- (61) T. Lazaridis and M. E. Paulaitis, J. Phys. Chem. 96, 3847 (1992).
- (62) H. S. Ashbaugh and M. E. Paulaitis, J. Phys. Chem. 100, 1900 (1996).
- (63) T. Yoshidome, Y. Harano, and M. Kinoshita, Phys. Rev. E 79, 011912 (2009).
- (64) Y. Karino and N. Matubayasi, J. Chem. Phys. 134, 041105 (2011).
- (65) F. Kamo, R. Ishizuka, and N. Matubayasi, Protein Sci. 25, 56 (2016).
- (66) Y. Yamamori, R. Ishizuka, Y. Karino, S. Sakuraba, and N. Matubayasi, J. Chem. Phys. 144, 085102 (2016).
Appendix A Comment on consistency with all-atom computer simulation studies
Since a rather simplified set of models is employed in this study, it may be worthwhile to comment on the consistency with all-atom computer simulation studies for the following two representative points: the importance of the translational entropy of water and the reliability of our physical picture illustrated in Fig. 2(a). Using a Monte Carlo simulation, Paulaitis and coworkers Lazaridis92 (); Ashbaugh96 () studied the relative magnitudes of the translational and orientational restrictions contributing to the solvation entropy of a nonpolar solute inserted into water. They considered only the solute-water pair correlation term but showed that the translational component is larger than the orientational one (the former takes 55–70% of the total). Later, Kinoshita and coworkers Kinoshita13 (); Oshima15 (); Yoshidome09 () examined the translational and orientational components and their pair and many-body correlation terms. It was shown that when the many-body correlation term is also considered, the translational component is much larger than the orientational one. We note, however, that the orientational component is also taken into account in this study using the angle-dependent version Kinoshita08 (); Kusalik88a (); Kusalik88b (); Kinoshita96 (); Cann97 () of the IET. In Fig. 2(a), upon the burial of a donor and an acceptor in the protein interior, when an IHB is formed (e.g., NH···W (Exposed) + O···W (Exposed) → O···HN (Buried); “W” denotes an oxygen atom in a water molecule), we assume that there is no energy change occurring. 
This assumption is consistent with the results observed by Matubayasi and coworkers Karino11 (); Kamo16 (); Yamamori16 () in all-atom MD simulations of a set of protein structures immersed in water: The protein-water electrostatic interaction energy is strongly correlated with the protein intramolecular electrostatic interaction energy; when the former becomes higher, the latter becomes lower (when the former becomes lower, the latter becomes higher); and the magnitudes of changes in the two quantities are not significantly different from each other.
Appendix B Robustness of the results against uncertainty of the hydrogen-bonding parameters
Since the results for methanol, ethanol, and cyclohexane are almost the same, they are robust against the uncertainty of the hydrogen-bonding parameters in Figs. 2(b), (c), and (d). The robustness of the result for water was already corroborated in our earlier work (see Fig. 2(a)).12 Nevertheless, we also demonstrate it for the two proteins considered in this study, CBP-BD and apoPC. As judged from in Fig. 16, even when in Fig. 2(a) is changed to or ( is changed to or ), the conclusions are not altered at all. Our FEF is capable of discriminating the NS from the decoys as the structure with the lowest value of the FEF.