Efficient model chemistries for peptides. II.
Basis set convergence in the B3LYP method
Pablo ECHENIQUE
Instituto de Biocomputación y Física de Sistemas Complejos (BIFI),
and Departamento de Física Teórica, Universidad de Zaragoza,
Pedro Cerbuna 12, E50009 Zaragoza, Spain
Email: echenique.p@gmail.com
Gregory A. CHASS
Global Institute Of COmputational Molecular and Materials Science (GIOCOMMS),
and School of Chemistry, University of Wales, Bangor, Gwynedd, LL57 2UW United Kingdom,
and College of Chemistry, Beijing Normal University, Beijing, 100875, China
PACS: 07.05.Tp; 31.15.Ar; 31.50.Bc; 87.14.Ee; 87.15.Aa; 89.75.-k
Keywords: peptides, quantum chemistry, PES, B3LYP, basis set convergence
Abstract
Small peptides are model molecules for the amino acid residues that are the constituents of proteins. In any bottom-up approach to understanding the properties of these macromolecules, which are essential to the functioning of every living being, correctly describing the conformational behaviour of small peptides is an unavoidable first step. In this work, we present a study of several potential energy surfaces (PESs) of the model dipeptide HCO-L-Ala-NH2. The PESs are calculated using the B3LYP density-functional theory (DFT) method, with Dunning’s basis sets cc-pVDZ, aug-cc-pVDZ, cc-pVTZ, aug-cc-pVTZ, and cc-pVQZ. These calculations, whose cost amounts to approximately 10 years of computer time, allow us to study the basis set convergence of the B3LYP method for this model peptide. We also compare the B3LYP PESs to a previous computation at the MP2/6-311++G(2df,2pd) level, in order to assess their accuracy with respect to a higher-level reference. All data sets have been analyzed according to a general framework which can be extended to other complex problems and which captures the nearness concept in the space of model chemistries (MCs).
1 Introduction
In any bottom-up attempt to understand the behaviour of protein molecules (in particular, the still elusive protein folding process [3, 1, 5, 2, 4]), the characterization of the conformational preferences of short peptides [13, 12, 7, 11, 6, 9, 10, 8] is an unavoidable first step. Because they require less numerical effort and their conformational space is manageable, the most frequently studied peptides are the shortest ones: the dipeptides [14, 17, 16, 15], in which a single amino acid residue is capped at both the N- and C-termini with neutral peptide groups. Among them, the most popular choice has been the alanine dipeptide [34, 30, 26, 23, 27, 24, 21, 22, 6, 20, 29, 19, 33, 25, 31, 28, 32, 18]. Since alanine is the simplest chiral residue, this dipeptide shares many similarities with most of the other dipeptides at the minimum computational price.
Although classical force fields [35, 36, 37, 38, 39, 40, 41, 42, 43] are the only feasible choice for simulating large molecules at present, they have been reported to yield inaccurate potential energy surfaces (PESs) for dipeptides [44, 45, 46, 47, 29] and short peptides [48, 6]. Therefore, it is not surprising that they are widely recognized as being unable to correctly describe the intricacies of the whole protein folding process [49, 50, 51, 44, 52, 53, 54, 55]. On the other hand, albeit prohibitively demanding in terms of computational resources, ab initio quantum mechanical calculations [56, 57, 58] are not only regarded as the correct physical description that in the long run will be the preferred choice to directly tackle proteins (given the exponential growth of computer power and the advances in the search for pleasantly scaling algorithms [60, 59]), but they are also used in small peptides as the reference against which less accurate methods must be compared [61, 62, 44, 45, 47, 29, 6] in order to, for example, parameterize improved generations of additive, classical force fields for polypeptides.
However, despite the sound theoretical basis, practical quantum chemistry calculations typically require a plethora of approximations if one wants to obtain the final results within a reasonable time. John Pople termed the exact ‘recipe’ that includes all the assumptions and steps needed to calculate the relevant observables for any molecular system a model chemistry (MC). In his own words, a MC is an “approximate but well-defined general and continuous mathematical procedure of simulation” [63].
We assume that the particles involved move at non-relativistic velocities, and that the much greater mass of the nuclei allows us to perform the Born-Oppenheimer approximation. We are then left with the problem of solving the non-relativistic electronic Schrödinger equation [60]. A MC must contain two starting approximations to its exact solution. The first is the truncation of the N-electron space (in wavefunction-based methods) or the choice of the functional (in DFT). The second is the truncation of the one-electron space, via the LCAO scheme (in both cases). The extent to which the first truncation is carried (or the functional chosen, in the case of DFT) is commonly called the method, and it is denoted by acronyms such as RHF, MP2, B3LYP, CCSD(T), FCI, etc. The second truncation is embodied in the definition of a finite set of atom-centered Gaussian functions termed basis set [60, 64, 57, 58, 65], which is also designated by conventional short names, such as 6-31+G(d), TZP or cc-pVTZ(–f). If we denote the method by a capital M and the basis set by a B, the specification of both is conventionally denoted by M/B and called a level of the theory. Typical examples of this are RHF/3-21G or MP2/cc-pVDZ [56, 57, 58].
Note that these are the most commonly used approximations, and the only ones considered in this work. Apart from them, the MC concept may include many additional features: the heterolevel approximation (explored in a previous work in this series [34]), protocols for extrapolating to the infinite-basis-set limit [66, 67, 68, 69, 70], additivity assumptions [71, 72, 73, 74], extrapolations of the Møller-Plesset series to infinite order [75], removal of the so-called basis set superposition error (BSSE) [76, 77, 78, 79, 80, 81, 82], etc. Most of these techniques are motivated by the pressing need to reduce the computational cost of the calculations.
Although all MCs must be generally applicable, they need not be generally accurate. In fact, the different procedures that make up a given MC are typically parameterized and tested on very particular systems, which are often small molecules. Therefore, the validity of the approximations outside that native range of problems must always be questioned and checked. The approximate computational cost of a given MC for a particular system is rather easy to predict from simple scaling relations. Its expected accuracy on a particular problem, however, may be difficult to predict a priori, especially for large molecules in which interactions on very different energy scales play a role. The description of the conformational behaviour of peptides (or, more generally, of flexible organic species) via their PESs in terms of the soft internal coordinates is one such problem, and it is the one treated in this work.
To this end, we first describe, in sec. 2, the computational and theoretical methods used throughout the rest of the document. Then, in sec. 3, we introduce a basic framework that rationalizes the actual process of evaluating the efficiency of any MC for a complex problem. These general ideas are used in sec. 4 to study the density-functional theory (DFT) B3LYP [83, 84, 85, 86] method with Dunning’s cc-pVDZ, aug-cc-pVDZ, cc-pVTZ, aug-cc-pVTZ, and cc-pVQZ basis sets [87, 88]. Specifically, we apply these levels of the theory to the calculation of the PES of the model dipeptide HCO-L-Ala-NH2 (see fig. 1), and assess their efficiency by comparison with a reference PES. Finally, in sec. 5, the most important conclusions are briefly summarized.
2 Methods
All ab initio quantum mechanical calculations have been performed using the GAMESS-US program [89, 90] under Linux, on 2.2 GHz PowerPC 970FX machines with 2 GB of RAM.
The internal coordinates used for the Z-matrix of the HCO-L-Ala-NH2 dipeptide in the GAMESS-US input files are the Systematic, Approximately Separable, Modular Internal Coordinates (SASMIC) introduced in ref. 91. They are presented in table 1 (see also fig. 1 for reference).
Table 1: SASMIC Z-matrix internal coordinates of the HCO-L-Ala-NH2 dipeptide.
Atom name | Bond length | Bond angle | Dihedral angle
H1 | | |
C2 | (2,1) | |
N3 | (3,2) | (3,2,1) |
O4 | (4,2) | (4,2,1) | (4,2,1,3)
C5 | (5,3) | (5,3,2) | (5,3,2,1)
H6 | (6,3) | (6,3,2) | (6,3,2,5)
C7 | (7,5) | (7,5,3) | (7,5,3,2)
C8 | (8,5) | (8,5,3) | (8,5,3,7)
H9 | (9,5) | (9,5,3) | (9,5,3,7)
H10 | (10,8) | (10,8,5) | (10,8,5,3)
H11 | (11,8) | (11,8,5) | (11,8,5,10)
H12 | (12,8) | (12,8,5) | (12,8,5,10)
N13 | (13,7) | (13,7,5) | (13,7,5,3)
O14 | (14,7) | (14,7,5) | (14,7,5,13)
H15 | (15,13) | (15,13,7) | (15,13,7,5)
H16 | (16,13) | (16,13,7) | (16,13,7,15)
All PESs in this study have been discretized into a regular 12×12 grid in the two-dimensional space spanned by the Ramachandran angles φ and ψ, with both of them ranging from -165° to 165° in steps of 30°. To calculate the PES at a particular level of the theory, we have run constrained energy optimizations at each point of the grid, freezing the two Ramachandran angles φ and ψ at the corresponding values. To save computational resources, the starting structures were taken, when possible, from PESs previously optimized at a lower level of the theory. All the basis sets used in the study have been taken from the GAMESS-US internally stored library. Spherical Gaussian-type orbitals (GTOs) have been preferred, so there are 5 d-type and 7 f-type functions per shell.
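The grid discretization can be sketched in a few lines. This is a minimal illustration, not the input actually used with GAMESS-US. It assumes 12 points per angle, spaced by 30° and starting at -165°, and the function name `ramachandran_grid` is hypothetical.

```python
def ramachandran_grid(start=-165.0, step=30.0, n=12):
    """Return the (phi, psi) pairs, in degrees, of a regular n x n grid.

    Each pair would correspond to one constrained geometry optimization in
    which phi and psi are frozen and all other coordinates are relaxed.
    """
    angles = [start + k * step for k in range(n)]
    return [(phi, psi) for phi in angles for psi in angles]


grid = ramachandran_grid()
print(len(grid))          # 144
print(grid[0], grid[-1])  # (-165.0, -165.0) (165.0, 165.0)
```

The 144 points produced here are the working set over which the PESs are compared below.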
We have computed 5 PESs, using the DFT B3LYP [83, 84, 85, 86] method with Dunning’s cc-pVDZ, aug-cc-pVDZ, cc-pVTZ, aug-cc-pVTZ, and cc-pVQZ basis sets [87, 88]. The total cost of these calculations on the machines used is around 10 years of computer time.
Let us also note that correcting terms to the PES have recently been shown to be relevant for the conformational behaviour of peptides [18]. These terms come from the determinants of the mass-metric tensors and from the determinant of the Hessian matrix. (The latter may be regarded as a residual entropy arising from the elimination of the hard coordinates from the description.) None of these terms has been included in this study. However, the PES calculated here is the largest part of the effective free energy [18], so it may be considered the first ingredient of a further refinement in which the correcting terms are taken into account. The same may be said about another important source of error in the calculation of relative energies in peptide systems: the already mentioned BSSE [31].
To compare the PESs produced by the different MCs, we have used a statistical criterion (distance) introduced in ref. 92. This distance, denoted by $d_{12}$, exploits the complex nature of the problem studied to compare any two different potential energy functions, $V_1$ and $V_2$. It is computed over a working set of conformations (in this case, the 144 points of each PES). It statistically measures the typical error that one makes in the energy differences if $V_2$ is used instead of the more accurate $V_1$, admitting a linear rescaling and a shift in the energy reference.
Despite having energy units, the quantity $d_{12}$ approximately presents all the properties characteristic of a typical mathematical metric in the space of MCs (hence the word ‘distance’). For example, a symmetric version of it can be defined, and it fulfills the triangle inequality (see ref. 92 for the technical details and sec. 3 for more about the importance of these facts). It also has better properties than other quantities customarily used to perform these comparisons, such as the energy RMSD or the average energy error. It may be related to Pearson’s correlation coefficient $r_{12}$ by
$$d_{12} = \sqrt{2}\,\sigma_2\left(1 - r_{12}^2\right)^{1/2}, \qquad (1)$$
where $\sigma_2$ is the standard deviation of $V_2$ in the working set.
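The relation between $d_{12}$ and Pearson’s coefficient can be checked numerically. The following is a minimal sketch, not the implementation of ref. 92. It assumes $d_{12} = \sqrt{2}\,\sigma_2(1 - r_{12}^2)^{1/2}$, with $\sigma_2$ the population standard deviation of $V_2$ over the working set. As a cross-check, the second function reads "admitting a linear rescaling and a shift" as a least-squares fit of $V_2$ against $a V_1 + b$. It then takes the root-mean-square error of the residuals over all pairs of conformations, that is, the error in energy differences. That reading is our assumption. The function names and the toy energies are hypothetical.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _std(xs):
    """Population standard deviation (divides by N, not N - 1)."""
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def pearson(v1, v2):
    """Pearson correlation coefficient r_12 over the working set."""
    m1, m2 = _mean(v1), _mean(v2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(v1, v2)) / len(v1)
    return cov / (_std(v1) * _std(v2))

def distance_from_r(v1, v2):
    """d_12 from the assumed relation sqrt(2) * sigma_2 * sqrt(1 - r_12^2)."""
    r = pearson(v1, v2)
    return math.sqrt(2.0) * _std(v2) * math.sqrt(max(0.0, 1.0 - r * r))

def distance_from_pairs(v1, v2):
    """RMS error in pairwise energy differences after fitting V2 ~ a*V1 + b.

    The least-squares fit absorbs a global rescaling (a) and a shift of the
    energy reference (b); the residuals are then compared over all pairs.
    """
    m1, m2 = _mean(v1), _mean(v2)
    a = (sum((x - m1) * (y - m2) for x, y in zip(v1, v2))
         / sum((x - m1) ** 2 for x in v1))
    b = m2 - a * m1
    res = [y - (a * x + b) for x, y in zip(v1, v2)]
    n = len(res)
    msd = sum((ri - rj) ** 2 for ri in res for rj in res) / n ** 2
    return math.sqrt(msd)

# Hypothetical conformational energies (in RT) at five grid points.
v1 = [0.0, 1.2, 2.5, 0.7, 3.1]   # more accurate PES
v2 = [0.3, 1.0, 2.9, 0.5, 3.6]   # cheaper approximation
print(distance_from_r(v1, v2), distance_from_pairs(v1, v2))  # equal up to rounding
```

The two routes agree because the mean squared pairwise difference of zero-mean residuals is twice their variance. That variance equals $\sigma_2^2(1 - r_{12}^2)$ for a least-squares fit. Adding a constant to $V_2$, or rescaling $V_1$, leaves the result unchanged.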
Moreover, because of its physical meaning, ref. 92 argues that one approximation of the energy of a system may safely replace another when the distance between them is less than $RT$. Such a substitution does not alter the relevant dynamical or thermodynamical behaviour. Consequently, we shall present the results in units of $RT$ (at $T = 300$ K, so that $RT \simeq 0.6$ kcal/mol).
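Converting a distance from kcal/mol into $RT$ units is simple arithmetic. The following is a minimal sketch, assuming $T = 300$ K, the CODATA gas constant and the thermochemical calorie (4.184 J). The helper names are hypothetical.

```python
# Gas constant in kcal/(mol K): 8.314462618 J/(mol K) / 4184 J/kcal.
R_KCAL = 8.314462618 / 4184.0

def rt_kcal_per_mol(temperature_k=300.0):
    """Thermal energy RT in kcal/mol at the given temperature."""
    return R_KCAL * temperature_k

def in_rt_units(energy_kcal_per_mol, temperature_k=300.0):
    """Express an energy (e.g. a distance d_12 in kcal/mol) in units of RT."""
    return energy_kcal_per_mol / rt_kcal_per_mol(temperature_k)

print(round(rt_kcal_per_mol(), 3))  # 0.596
```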
Finally, assume that the effective energies compared will be used to construct a polypeptide potential, designed simply as the sum of monoresidue ones (more complex situations may be found in real problems [93]). Then the number of residues up to which the distance between the two approximations of the residue potential stays below $RT$ is [92]
$$N_{\rm res} = \left(\frac{RT}{d_{12}}\right)^{2}. \qquad (2)$$
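The quadratic dependence $N_{\rm res} = (RT/d_{12})^2$ can be motivated by a short derivation. It rests on a simplifying assumption of ours, made for illustration (ref. 92 gives the actual argument): the polypeptide error is a sum of independent per-residue errors, each of typical size $d_{12}$, so that they add in quadrature.

```latex
% Assumption: N independent residue terms, each contributing an error of size d_{12}.
d_{12}^{(N)} = \Big[\sum_{k=1}^{N} d_{12}^{\,2}\Big]^{1/2} = \sqrt{N}\, d_{12},
\qquad
d_{12}^{(N)} < RT \;\Longleftrightarrow\; N < \Big(\frac{RT}{d_{12}}\Big)^{2} \equiv N_{\rm res}.
```

For instance, $d_{12} = 0.1\,RT$ gives $N_{\rm res} = 100$, and $d_{12} = RT$ gives $N_{\rm res} = 1$.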
Consider a comparison between a fixed reference PES, denoted by $V_1$, and a candidate approximation, denoted by $V_2$. According to the value taken by $d_{12}$, we shall divide the whole accuracy range in sec. 4 into three regions:
- the protein region, corresponding to $d_{12} < 0.1\,RT$, or, equivalently, to $N_{\rm res} > 100$;
- the peptide region, corresponding to $0.1\,RT \le d_{12} < RT$, or $1 < N_{\rm res} \le 100$;
- the inaccurate region, where $d_{12} \ge RT$. Here it is not advisable to use $V_2$ as an approximation to $V_1$, even for a dipeptide.

Of course, these regions are only approximate. They reflect the general idea that we are interested in dipeptides not as a final system, but only as a means to approach protein behaviour from the bottom up. Therefore, it is not enough to measure the error in the dipeptides; one must also estimate how this discrepancy propagates to polypeptide systems.
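The mapping from a distance to $N_{\rm res}$ and to the three accuracy regions can be written down directly. The following is a minimal sketch, assuming $N_{\rm res} = (RT/d_{12})^2$ and boundaries at $d_{12} = 0.1\,RT$ and $d_{12} = RT$. How the boundaries themselves are assigned is our choice, and the function names are hypothetical. Distances are given in units of $RT$. The example values are the distances to B3LYP/cc-pVQZ listed in the results.

```python
def n_res(d_over_rt):
    """Residues up to which the error stays below RT: (RT / d_12)^2."""
    if d_over_rt == 0:
        return float("inf")
    return 1.0 / d_over_rt ** 2

def accuracy_region(d_over_rt):
    """Classify a distance d_12 (in units of RT) into the three regions."""
    if d_over_rt < 0.1:
        return "protein"      # N_res > 100
    if d_over_rt < 1.0:
        return "peptide"      # 1 < N_res <= 100
    return "inaccurate"       # not advisable even for a dipeptide

for name, d in [("B3LYP/aug-cc-pVTZ", 0.079),
                ("B3LYP/aug-cc-pVDZ", 0.172),
                ("B3LYP/cc-pVDZ", 1.045)]:
    print(name, accuracy_region(d), round(n_res(d), 1))
```

Small differences from the tabulated $N_{\rm res}$ values presumably come from $d_{12}$ being rounded to three decimals before printing.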
3 General framework
The general abstract framework behind the investigation presented in this study (and also implicitly behind most of the works found in the literature) may be described as follows:
The objects of study are the model chemistries defined by Pople [63] and discussed in the introduction. The MCs under scrutiny are applied to a particular problem of interest, which may be thought of as formed by three ingredients: the physical system, the relevant observables and the target accuracy. The MCs are then selected according to their ability to yield numerical values of the relevant observables for the physical system studied within the target accuracy. The concrete numerical values that one wants to approach are those given by the exact model chemistry $\mathrm{MC}_{\rm exact}$. These could be taken to be either the experimental data or the exact solution of the non-relativistic electronic Schrödinger equation [60]. However, the computational effort needed to perform the calculations required by $\mathrm{MC}_{\rm exact}$ is literally infinite. In practice, therefore, one is forced to work with a reference model chemistry $\mathrm{MC}_{\rm ref}$, which, albeit different from $\mathrm{MC}_{\rm exact}$, is thought to be close to it. Finally, the set of MCs that one wants to investigate is compared to $\mathrm{MC}_{\rm ref}$, and the nearness to it is taken as an approximation of the nearness to $\mathrm{MC}_{\rm exact}$.
These comparisons are commonly performed using a numerical quantity that is a function of the relevant observables. This quantity must have some of the properties of a mathematical distance. Only then does it capture the intuitive ideas about relative proximity in the space $\mathcal{M}$ of MCs, and only then is the above reasoning meaningful. In particular, the triangle inequality should be obeyed, so that, for any model chemistry MC, one has
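$$d(\mathrm{MC},\mathrm{MC}_{\rm exact}) \le d(\mathrm{MC},\mathrm{MC}_{\rm ref}) + d(\mathrm{MC}_{\rm ref},\mathrm{MC}_{\rm exact}), \qquad (3a)$$
$$d(\mathrm{MC},\mathrm{MC}_{\rm ref}) \le d(\mathrm{MC},\mathrm{MC}_{\rm exact}) + d(\mathrm{MC}_{\rm exact},\mathrm{MC}_{\rm ref}), \qquad (3b)$$
and, assuming that $d(\mathrm{MC}_{\rm ref},\mathrm{MC}_{\rm exact})$ is small (and that $d$ is a positive function), we obtain
$$d(\mathrm{MC},\mathrm{MC}_{\rm exact}) \simeq d(\mathrm{MC},\mathrm{MC}_{\rm ref}), \qquad (4)$$
which is the sought result, in agreement with the ideas stated at the beginning of this section.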
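The step from the two triangle inequalities to the approximate equality can be made explicit. Write $\mathrm{MC}_{\rm exact}$ for the exact model chemistry and $\mathrm{MC}_{\rm ref}$ for the reference one, and assume that $d$ is symmetric. Combining both inequalities then gives a two-sided bound:

```latex
% Assumptions: d(A,B) = d(B,A) >= 0 and d obeys the triangle inequality.
\big|\, d(\mathrm{MC},\mathrm{MC}_{\rm exact}) - d(\mathrm{MC},\mathrm{MC}_{\rm ref}) \,\big|
\;\le\; d(\mathrm{MC}_{\rm ref},\mathrm{MC}_{\rm exact}).
```

So ranking candidate MCs against the reference instead of the exact result introduces an error. For every candidate, that error is bounded by how far the reference itself is from the exact result. Since $d_{12}$ fulfills the triangle inequality only approximately, the bound should itself be read as approximate.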
The distance introduced in ref. 92 and summarized in the previous section is measured here on the conformational energy surfaces (the relevant observable) of the model dipeptide HCO-L-Ala-NH2 (the physical system). It approximately fulfills the triangle inequality and thus captures the nearness concept in the space of model chemistries.
This space, $\mathcal{M}$, containing all possible MCs, is a rather complex and multidimensional one. For example, two commonly used ‘dimensions’ which may be thought to parameterize $\mathcal{M}$ are the size of the basis set and the amount of electron correlation in the model (or the quality of the DFT functional used). However, the size of a basis set or the electron correlation may be increased in many ways. Moreover, additional approximations can be included in the MC definition (see sec. 1). The number of ‘dimensions’ of $\mathcal{M}$ can therefore be considered much larger than two.
Defining a distance such as the one described above for a given problem of interest helps to impose a certain degree of structure on this complex space. Fig. 2 presents a two-dimensional scheme of the overall situation found in this study.
4 Results
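Table 2: B3LYP MCs compared to the B3LYP/cc-pVQZ reference. Distances are in units of RT; t is the computer time as a percentage of that of the reference.
MCs | $d_{12}$ | Absolute-energy error | $N_{\rm res}$ | t
B3LYP/aug-cc-pVTZ | 0.079 | 15.2 | 159.8 | 79.09%
B3LYP/cc-pVTZ | 0.191 | 21.1 | 27.4 | 9.78%
B3LYP/aug-cc-pVDZ | 0.172 | 82.8 | 33.7 | 5.27%
B3LYP/cc-pVDZ | 1.045 | 109.4 | 0.9 | 1.29%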
Before presenting the results of the calculations, let us introduce the concept of efficiency of a particular MC that shall be used. It is loosely defined as a balance between accuracy (in terms of the distance introduced in sec. 2) and computational cost (in terms of computer time). It can be read graphically from the efficiency plots (see the following pages for two examples). In these plots, the distance between any given MC and a reference one is shown in units of RT on the x axis, and the computer time taken by each MC is shown on the y axis. As a general rule of thumb, we shall consider a MC to be more efficient for approximating the reference when it is placed closer to the origin of coordinates in the efficiency plot. This approach is intentionally non-rigorous, because many factors that influence the computer time may vary from one practical calculation to another. Examples include the algorithms used and the actual details of the computers (processor frequency, size of the RAM and cache memories, system bus and disk access speed, operating system, mathematical libraries, etc.). Others are the starting guesses for the SCF orbitals and the starting structures in geometry optimizations.
Taking all this into account, this work draws conclusions about the relative efficiency of the MCs studied only from strong signals in the plots. Only such conclusions can be extrapolated to future calculations; in other words, small details shall typically be neglected.
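The ‘closer to the origin’ criterion is deliberately informal. A complementary check, which is our illustration and not a procedure used in this study, is Pareto dominance. A MC is dominated if some other MC is at least as accurate and at least as cheap, and strictly better in one of the two. The sketch below applies this check to the distances (in RT) and relative computer times (as a percentage of the B3LYP/cc-pVQZ time) listed for the comparison with B3LYP/cc-pVQZ. The function name `dominated` is hypothetical.

```python
def dominated(points):
    """Names of MCs for which another MC is at least as accurate and at least
    as cheap, and strictly better in at least one of the two."""
    out = []
    for name, (d, t) in points.items():
        for other, (d2, t2) in points.items():
            if other != name and d2 <= d and t2 <= t and (d2 < d or t2 < t):
                out.append(name)
                break
    return out


# (d_12 in units of RT, computer time as % of the B3LYP/cc-pVQZ time)
b3lyp_vs_qz = {
    "B3LYP/aug-cc-pVTZ": (0.079, 79.09),
    "B3LYP/cc-pVTZ": (0.191, 9.78),
    "B3LYP/aug-cc-pVDZ": (0.172, 5.27),
    "B3LYP/cc-pVDZ": (1.045, 1.29),
}
print(dominated(b3lyp_vs_qz))  # ['B3LYP/cc-pVTZ']
```

Only B3LYP/cc-pVTZ is flagged, because B3LYP/aug-cc-pVDZ is both closer to the reference and cheaper. Unlike a distance to the origin, this check does not depend on how the two axes are scaled against each other.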
In the first part of the study, we use the distance introduced in sec. 2 to compare all B3LYP MCs to the one with the largest basis set, B3LYP/cc-pVQZ. This is the highest level of the theory calculated for this work, depicted in fig. 4. In this part, all mentions of the accuracy of any given MC must be understood as relative to this reference. However, MP2 has been reported to be superior to B3LYP for describing the conformational behaviour of peptide systems [94]. Therefore, the absolute accuracy of the B3LYP MCs calculated here is probably better reflected by their accuracy relative to the MP2/6-311++G(2df,2pd) reference, which is assessed in the second part of the study. In this spirit, the first part should be regarded as an investigation of the convergence to the infinite-basis-set B3LYP limit. The best B3LYP MC computed here is probably a good approximation to that limit.

Regarding the convergence to the infinite-basis-set limit, we observe that only the most expensive MC, B3LYP/aug-cc-pVTZ, correctly approximates the reference for peptides of more than 100 residues. On the other hand, for only 5.27% of the computer time taken by the reference MC, we can use B3LYP/aug-cc-pVDZ, which correctly approximates it up to 30-residue peptides. Finally, the MC with the smallest basis set, B3LYP/cc-pVDZ, cannot properly replace the reference even in dipeptides.

In ref. [34], using Pople’s basis sets [95, 96, 102, 97, 98, 99, 100, 101], we saw that “the general rule that is sometimes assumed when performing quantum chemical calculations, which states that ‘the more expensive, the more accurate’, is rather coarse-grained and relevant deviations from it may be found.” We recognized that “One may argue that this observation is due to the unsystematic way in which Pople basis sets can be enlarged and that the correlation between accuracy and cost will be much higher if, for example, only Dunning basis sets are used.” This higher correlation is indeed observed in fig. 3. However, we argued that it was to be expected, since “there are too few Dunning basis sets below a reasonable upper bound on the number of elements to see anything but a line in the efficiency plot”. The correlation between accuracy and cost is higher for Dunning’s basis sets than for Pople’s, owing to the smaller number of the former. Even so, the results presented in this work show that the rule of thumb ‘the more expensive, the more accurate’ also breaks down here. The B3LYP/aug-cc-pVDZ MC is, at the same time, more accurate and less costly than B3LYP/cc-pVTZ. In general, this idea applies to all the approximations that a MC may contain (see the introduction for a partial list). It justifies the systematic search for the most efficient combination of them for a given problem. This work is our second step along that path (ref. [34] being the first one) for the particular case of the conformational behaviour of peptide systems.

The observation in the previous point also suggests that it may be efficient to include diffuse functions (the ‘aug’ in aug-cc-pVDZ) in the basis set for this type of problem.

The error of the studied MCs in the energy differences (as measured by $d_{12}$) is much smaller than the error in the absolute energies. This suggests that most of the discrepancy is systematic.
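A small $d_{12}$ combined with a large absolute-energy error is what one expects when a nearly constant offset dominates the discrepancy. The toy illustration below uses hypothetical energies in RT. It assumes $d_{12} = \sqrt{2}\,\sigma_2(1 - r_{12}^2)^{1/2}$, and it measures the absolute-energy error as a plain mean absolute difference. That choice is ours, not necessarily the quantity used in the tables.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _std(xs):
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def d12(v1, v2):
    """Assumed relation d_12 = sqrt(2) * sigma_2 * sqrt(1 - r_12^2)."""
    m1, m2 = _mean(v1), _mean(v2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(v1, v2)) / len(v1)
    r = cov / (_std(v1) * _std(v2))
    return math.sqrt(2.0) * _std(v2) * math.sqrt(max(0.0, 1.0 - r * r))

def mean_abs_error(v1, v2):
    """Mean absolute difference between the two sets of absolute energies."""
    return _mean([abs(b - a) for a, b in zip(v1, v2)])

# Hypothetical reference energies (in RT) ...
v_ref = [0.0, 1.2, 2.5, 0.7, 3.1, 4.0]
# ... and a candidate equal to the reference shifted by 400 RT plus small noise.
noise = [0.05, -0.03, 0.02, -0.04, 0.01, -0.01]
v_cand = [v + 400.0 + e for v, e in zip(v_ref, noise)]

print(round(mean_abs_error(v_ref, v_cand), 1))  # 400.0
print(d12(v_ref, v_cand) < 0.2)                 # True
```

The shift dominates the absolute-energy error but cancels in every energy difference, so only the small noise contributes to $d_{12}$.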
In the second part of the study, we assess the absolute accuracy of the B3LYP MCs by comparing them to the MP2/6-311++G(2df,2pd) PES in ref. [34]. As far as we are aware, this is the highest homolevel in the literature. Assume that this level of the theory is close enough to the exact result for the problem at hand. Then this comparison measures the ‘absolute’ accuracy of the B3LYP MCs, and not only their accuracy relative to the B3LYP infinite-basis-set limit, as in the first part. This is the fundamental difference between figs. 3 and 5.
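Table 3: B3LYP MCs compared to the MP2/6-311++G(2df,2pd) reference. Distances are in units of RT; t is the computer time.
MCs | $d_{12}$ | Absolute-energy error | $N_{\rm res}$ | t (days)
B3LYP/cc-pVQZ | 1.008 | 457.2 | 0.98 | 1861
B3LYP/aug-cc-pVTZ | 1.029 | 442.0 | 0.94 | 1472
B3LYP/cc-pVTZ | 1.058 | 436.1 | 0.89 | 182
B3LYP/aug-cc-pVDZ | 1.006 | 374.4 | 0.99 | 98
B3LYP/cc-pVDZ | 1.533 | 347.8 | 0.43 | 24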
The results of this part of the study are depicted in fig. 5, and in table 3. We can extract several conclusions from them:

All B3LYP MCs, including the largest one, B3LYP/cc-pVQZ, lie in the inaccurate region of the efficiency plot in fig. 5. They therefore cannot be reliably used to approximate the MP2/6-311++G(2df,2pd) reference, even in the smallest dipeptides.

In line with the observations in the first part of the study, we see that, if one is concerned with absolute accuracy, there is no point in going beyond the aug-cc-pVDZ basis set in B3LYP.

The B3LYP/cc-pVDZ MC again performs significantly worse than the rest, in agreement with the results of the first part of the study. This suggests that cc-pVDZ may be too small a basis set for the problem tackled here.

Again, the error of the MCs in the energy differences (as measured by $d_{12}$) is much smaller than the error in the absolute energies.
5 Conclusions
In this study, we have investigated 5 PESs of the model dipeptide HCO-L-Ala-NH2, calculated with the B3LYP method and Dunning’s cc-pVDZ, aug-cc-pVDZ, cc-pVTZ, aug-cc-pVTZ, and cc-pVQZ basis sets. We have first assessed the convergence of the B3LYP MCs to the infinite-basis-set limit. We have then evaluated their absolute accuracy by comparing them to the MP2/6-311++G(2df,2pd) PES in ref. [34], which, as far as we are aware, is the highest homolevel in the literature. All the comparisons have been performed according to a general framework that can be extended to further studies. They use a distance between the different PESs that correctly captures the nearness concept in the space of MCs. The calculations performed here have taken around 10 years of computer time.
The main conclusions of the study are the following:

The complexity of the problem (the conformational behaviour of peptides) renders the correlation between accuracy and computational cost of the different quantum mechanical algorithms imperfect. This ultimately justifies the need for systematic studies, such as the one presented here, in which the most efficient MCs are sought for the particular problem of interest.

Assuming that the MP2/6-311++G(2df,2pd) level of the theory is closer to the exact solution of the non-relativistic electronic Schrödinger equation than B3LYP/cc-pVQZ, B3LYP is not a reliable method to study the conformational behaviour of peptides. As we emphasize at the end of this section, it may be dangerous to claim that a method that performs well for the alanine residue model studied here will also be recommendable for longer and more complex peptides. Even so, we can clearly reject any method that already fails in HCO-L-Ala-NH2.

If B3LYP must nonetheless be used, for example due to computational constraints, aug-cc-pVDZ represents a good compromise between accuracy and cost.

The error of the studied MCs in the energy differences (as measured by $d_{12}$) is much smaller than the error in the absolute energies. This suggests that most of the discrepancy is systematic.
Finally, let us stress again that the investigation performed here has used one of the simplest dipeptides. The results obtained are affected by three features: we have treated it as an isolated system, its side chain is small, and it is aliphatic. Hence, the methods that have proved efficient here must be retested for bulkier residues included in polypeptides. This applies especially to residues that contain aromatic groups, that are charged, or that may participate in hydrogen bonds. For such residues, the conclusions about the convergence of B3LYP to the infinite-basis-set limit should also be re-evaluated, as should those regarding the comparison between B3LYP and MP2.
Acknowledgments
The numerical calculations in this work have been performed thanks to a computer time grant at the Zaragoza node (Caesaraugusta) of the Spanish Supercomputing Network (RES). We thank all the support staff there for their efficiency in solving the problems encountered. We also thank J. L. Alonso for illuminating discussions.
This work has been supported by the research projects DGA (Aragón Government, Spain) E24/3 and MEC (Spain) FIS2006-12781-C02-01. P. Echenique is supported by a MEC (Spain) postdoctoral contract.
References
 [1] C. B. Anfinsen, Principles that govern the folding of protein chains, Science 181 (1973) 223–230.
 [2] V. Daggett and A. R. Fersht, Is there a unifying mechanism for protein folding?, Trends Biochem. Sci. 28 (2003) 18–25.
 [3] P. Echenique, Introduction to protein folding for physicists, Contemp. Phys. 48 (2007) 81–108.
 [4] B. Honig, Protein folding: From the Levinthal paradox to structure prediction, J. Mol. Biol. 293 (1999) 283–293.
 [5] J. Skolnick, Putting the pathway back into protein folding, Proc. Natl. Acad. Sci. USA 102 (2005) 2265–2266.
 [6] M. Beachy, D. Chasman, R. Murphy, T. Halgren, and R. Friesner, Accurate ab initio quantum chemical determination of the relative energetics of peptide conformations and assessment of empirical force fields, J. Am. Chem. Soc. 119 (1997) 5908–5920.
 [7] R. A. DiStasio Jr., Y. Jung, and M. Head-Gordon, A Resolution-of-The-Identity implementation of the local Triatomics-In-Molecules model for second-order Møller-Plesset perturbation theory with application to alanine tetrapeptide conformational energies, J. Chem. Theory Comput. 1 (2005) 862–876.
 [8] M. Elstner, K. J. Jalkanen, M. Knapp-Mohammady, T. Frauenheim, and S. Suhai, DFT studies on helix formation in N-acetyl-(L-alanyl)n-N’-methylamide for n=1–20, Chem. Phys. 256 (2001) 15–27.
 [9] R. Hegger, A. Altis, P. Nguyen, and G. Stock, How complex is the dynamics of peptide folding?, Phys. Rev. Lett. 98 (2007) 028102.
 [10] A. Perczel, I. Jákli, and I. G. Csizmadia, Intrinsically stable secondary structure elements of proteins: A comprehensive study of folding units of proteins by computation and by analysis of data determined by X-ray crystallography, Chem. Eur. J. 9 (2003) 5332–5342.
 [11] A. Perczel, P. Hudáky, A. K. Füzéry, and I. G. Csizmadia, Stability issues of covalently and non-covalently bonded peptide subunits, J. Comput. Chem. 25 (2004) 1084–1100.
 [12] D. Toroz and T. van Mourik, The structure of the gas-phase tyrosine-glycine dipeptide, Mol. Phys. 104 (2006) 559–570.
 [13] H. Zhong and H. A. Carlson, Conformational studies of polyprolines, J. Chem. Theory Comput. 2 (2006) 342–353.
 [14] A. G. Császár and A. Perczel, Ab initio characterization of building units in peptides and proteins, Prog. Biophys. Mol. Biol. 71 (1999) 243–309.
 [15] P. Hudáky, I. Jákli, A. G. Császár, and A. Perczel, Peptide models. XXXI. Conformational properties of hydrophobic residues shaping the core of proteins. An ab initio study of N-formyl-L-valinamide and N-formyl-L-phenylalaninamide, J. Comput. Chem. 22 (2001) 732–751.
 [16] J. C. P. Koo, G. A. Chass, A. Perczel, Ö. Farkas, L. L. Torday, A. Varro, J. G. Papp, and I. G. Csizmadia, Exploration of the four-dimensional conformational potential energy hypersurface of N-acetyl-L-aspartic acid-N’-methylamide with its internally hydrogen bonded side-chain orientation, J. Phys. Chem. A 106 (2002) 6999–7009.
 [17] A. Láng, I. G. Csizmadia, and A. Perczel, Peptide models. XLV: Conformational properties of N-formyl-L-methioninamide and its relevance to methionine in proteins, PROTEINS: Struct. Funct. Bioinf. 58 (2005) 571–588.
 [18] P. Echenique, I. Calvo, and J. L. Alonso, Quantum mechanical calculation of the effects of stiff and rigid constraints in the conformational equilibrium of the Alanine dipeptide, J. Comput. Chem. 27 (2006) 1748–1755.
 [19] M. Elstner, K. J. Jalkanen, M. Knapp-Mohammady, and S. Suhai, Energetics and structure of glycine and alanine based model peptides: Approximate SCC-DFTB, AM1 and PM3 methods in comparison with DFT, HF and MP2 calculations, Chem. Phys. 263 (2001) 203–219.
 [20] G. Endrédi, A. Perczel, O. Farkas, M. A. McAllister, G. I. Csonka, J. Ladik, and I. G. Csizmadia, Peptide models XV. The effect of basis set size increase and electron correlation on selected minima of the ab initio 2D-Ramachandran map of For-Gly-NH2 and For-L-Ala-NH2, J. Mol. Struct. (Theochem) 391 (1997) 15–26.
 [21] R. F. Frey, J. Coffin, S. Q. Newton, M. Ramek, V. K. W. Cheng, F. A. Momany, and L. Schäfer, Importance of correlation-gradient geometry optimization for molecular conformational analyses, J. Am. Chem. Soc. 114 (1992) 5369–5377.
 [22] I. R. Gould, W. D. Cornell, and I. H. Hillier, A quantum mechanical investigation of the conformational energetics of the alanine and glycine dipeptides in the gas phase and in aqueous solution, J. Am. Chem. Soc. 116 (1994) 9250–9256.
 [23] T. Head-Gordon, M. Head-Gordon, M. J. Frisch, C. Brooks III, and J. Pople, A theoretical study of alanine dipeptide and analogs, Intl. J. Quant. Chem. 16 (1989) 311–322.
 [24] T. Head-Gordon, M. Head-Gordon, M. J. Frisch, C. L. Brooks III, and J. A. Pople, Theoretical study of blocked glycine and alanine peptide analogues, J. Am. Chem. Soc. 113 (1991) 5989–5997.
 [25] M. Iwaoka, M. Okada, and S. Tomoda, Solvent effects on the potential surfaces of glycine and alanine dipeptides studied by PCM and IPCM methods, J. Mol. Struct. (Theochem) 586 (2002) 111–124.
 [26] M. Mezei, P. K. Mehrotra, and D. L. Beveridge, Monte Carlo determination of the free energy and internal energy of hydration for the Ala dipeptide at 25°C, J. Am. Chem. Soc. 107 (1985) 2239–2245.
 [27] A. Perczel, J. G. Angyán, M. Kajtar, W. Viviani, J.-L. Rivail, J.-F. Marcoccia, and I. G. Csizmadia, Peptide models. 1. Topology of selected peptide conformational potential energy surfaces (glycine and alanine derivatives), J. Am. Chem. Soc. 113 (1991) 6256–6265.
 [28] A. Perczel, Ö. Farkas, I. Jákli, I. A. Topol, and I. G. Csizmadia, Peptide models. XXXIII. Extrapolation of low-level Hartree-Fock data of peptide conformation to large basis set SCF, MP2, DFT and CCSD(T) results. The Ramachandran surface of alanine dipeptide computed at various levels of theory, J. Comput. Chem. 24 (2003) 1026–1042.
 [29] A. M. Rodríguez, H. A. Baldoni, F. Suvire, R. Nieto Vázquez, G. Zamarbide, R. D. Enriz, Ö. Farkas, A. Perczel, M. A. McAllister, L. L. Torday, J. G. Papp, and I. G. Csizmadia, Characteristics of Ramachandran maps of L-alanine diamides as computed by various molecular mechanics, semiempirical and ab initio MO methods. A search for primary standard of peptide conformational stability, J. Mol. Struct. (Theochem) 455 (1998) 275–301.
 [30] P. J. Rossky and M. Karplus, Solvation. A molecular dynamics study of a dipeptide in water, J. Am. Chem. Soc. 101 (1979) 1913.
 [31] R. Vargas, J. Garza, B. P. Hay, and D. A. Dixon, Conformational study of the alanine dipeptide at the MP2 and DFT levels, J. Phys. Chem. A 106 (2002) 3213–3218.
 [32] Z.-X. Wang and Y. Duan, Solvation effects on alanine dipeptide: A MP2/cc-pVTZ//MP2/6-31G** study of (φ,ψ) energy maps and conformers in the gas phase, ether and water, J. Comput. Chem. 25 (2004) 1699–1716.
 [33] C.-H. Yu, M. A. Norman, L. Schäfer, M. Ramek, A. Peeters, and C. van Alsenoy, Ab initio conformational analysis of N-formyl L-alanine amide including electron correlation, J. Mol. Struct. 567–568 (2001) 361–374.
 [34] P. Echenique and J. L. Alonso, Efficient model chemistries for peptides. I. General framework and a study of the heterolevel approximation in RHF and MP2 with Pople split-valence basis sets, J. Comput. Chem. 29 (2008) 1408–1422.
 [35] J. W. Ponder and D. A. Case, Force fields for protein simulations, Adv. Prot. Chem. 66 (2003) 27–85.
 [36] A. D. MacKerell Jr., B. Brooks, C. L. Brooks III, L. Nilsson, B. Roux, Y. Won, and M. Karplus, CHARMM: The energy function and its parameterization with an overview of the program, in The Encyclopedia of Computational Chemistry, edited by P. v. R. Schleyer, P. R. Schreiner, N. L. Allinger, T. Clark, J. Gasteiger, P. Kollman, and H. F. Schaefer III, pp. 217–277, John Wiley & Sons, Chichester, 1998.
 [37] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus, CHARMM: A program for macromolecular energy, minimization, and dynamics calculations, J. Comput. Chem. 4 (1983) 187–217.
 [38] W. F. Van Gunsteren and M. Karplus, Effects of constraints on the dynamics of macromolecules, Macromolecules 15 (1982) 1528–1544.
 [39] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz Jr., D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman, A second generation force field for the simulation of proteins, nucleic acids, and organic molecules, J. Am. Chem. Soc. 117 (1995) 5179–5197.
 [40] D. A. Pearlman, D. A. Case, J. W. Caldwell, W. R. Ross, T. E. Cheatham III, S. DeBolt, D. Ferguson, G. Seibel, and P. Kollman, AMBER, a computer program for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to elucidate the structures and energies of molecules, Comp. Phys. Commun. 91 (1995) 1–41.
 [41] W. L. Jorgensen and J. Tirado-Rives, The OPLS potential functions for proteins. Energy minimization for crystals of cyclic peptides and Crambin, J. Am. Chem. Soc. 110 (1988) 1657–1666.
 [42] W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives, Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids, J. Am. Chem. Soc. 118 (1996) 11225–11236.
 [43] T. A. Halgren, Merck Molecular Force Field. I. Basis, form, scope, parametrization, and performance of MMFF94, J. Comput. Chem. 17 (1996) 490–519.
 [44] A. D. MacKerell Jr., M. Feig, and C. L. Brooks III, Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations, J. Comput. Chem. 25 (2004) 1400–1415.
 [45] A. D. MacKerell Jr., M. Feig, and C. L. Brooks III, Improved treatment of the protein backbone in empirical force fields, J. Am. Chem. Soc. 126 (2004) 698–699.
 [46] Y. K. Kang and H. S. Park, Comparative conformational study of N-acetyl-L-N′-methylprolineamide with different basis sets, J. Mol. Struct. (Theochem) 593 (2002) 55–64.
 [47] G. A. Kaminski, R. A. Friesner, J. Tirado-Rives, and W. L. Jorgensen, Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides, J. Phys. Chem. B 105 (2001) 6476–6487.
 [48] T. Wang and R. Wade, Force field effects on a β-sheet protein domain structure in thermal unfolding simulations, J. Chem. Theory Comput. 2 (2006) 140–148.
 [49] C. D. Snow, E. J. Sorin, Y. M. Rhee, and V. S. Pande, How well can simulation predict protein folding kinetics and thermodynamics?, Annu. Rev. Biophys. Biomol. Struct. 34 (2005) 43–69.
 [50] O. Schueler-Furman, C. Wang, P. Bradley, K. Misura, and D. Baker, Progress in modeling of protein structures and interactions, Science 310 (2005) 638–642.
 [51] K. Ginalski, N. V. Grishin, A. Godzik, and L. Rychlewski, Practical lessons from protein structure prediction, Nucleic Acids Research 33 (2005) 1874–1891.
 [52] A. V. Morozov, T. Kortemme, K. Tsemekhman, and D. Baker, Close agreement between the orientation dependence of hydrogen bonds observed in protein structures and quantum mechanical calculations, Proc. Natl. Acad. Sci. USA 101 (2004) 6946–6951.
 [53] C. Gómez-Moreno Calera and J. Sancho Sanz, editors, Estructura de Proteínas, Ariel ciencia, Barcelona, 2003.
 [54] M. Karplus and J. A. McCammon, Molecular dynamics simulations of biomolecules, Nat. Struct. Biol. 9 (2002) 646–652.
 [55] R. Bonneau and D. Baker, Ab initio protein structure prediction: Progress and prospects, Annu. Rev. Biophys. Biomol. Struct. 30 (2001) 173–189.
 [56] C. J. Cramer, Essentials of Computational Chemistry: Theories and Models, John Wiley & Sons, Chichester, 2nd edition, 2002.
 [57] F. Jensen, Introduction to Computational Chemistry, John Wiley & Sons, Chichester, 1998.
 [58] A. Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory, Dover Publications, New York, 1996.
 [59] Y. Shao, L. F. Molnar, Y. Jung, J. Kussmann, C. Ochsenfeld, S. T. Brown, A. T. B. Gilbert, L. V. Slipchenko, S. V. Levchenko, D. P. O'Neill, R. A. DiStasio, R. C. Lochan, T. Wang, G. J. O. Beran, N. A. Besley, J. M. Herbert, C. Y. Lin, T. Van Voorhis, S. H. Chien, A. Sodt, R. P. Steele, V. A. Rassolov, P. E. Maslen, P. P. Korambath, R. D. Adamson, B. Austin, J. Baker, E. F. C. Byrd, H. Dachsel, R. J. Doerksen, A. Dreuw, B. D. Dunietz, A. D. Dutoi, T. R. Furlani, S. R. Gwaltney, A. Heyden, S. Hirata, C.-P. Hsu, G. Kedziora, R. Z. Khaliullin, P. Klunzinger, A. M. Lee, M. S. Lee, W. Liang, I. Lotan, N. Nair, B. Peters, E. I. Proynov, P. A. Pieniazek, Y. M. Rhee, J. Ritchie, E. Rosta, C. D. Sherrill, A. C. Simmonett, J. E. Subotnik, L. H. Woodcock, W. Zhang, A. T. Bell, and A. K. Chakraborty, Advances in methods and algorithms in a modern quantum chemistry program package, Phys. Chem. Chem. Phys. 8 (2006) 3172–3191.
 [60] P. Echenique and J. L. Alonso, A mathematical and computational review of Hartree–Fock SCF methods in Quantum Chemistry, Mol. Phys. 105 (2007) 3057–3098.
 [61] P. Maurer, A. Laio, H. W. Hugosson, M. C. Colombo, and U. Rothlisberger, Automated parametrization of biomolecular force fields from Quantum Mechanics/Molecular Mechanics (QM/MM) simulations through force matching, J. Chem. Theory Comput. 3 (2007) 628–639.
 [62] Y. A. Arnautova, A. Jagielska, and H. A. Scheraga, New force field (ECEPP05) for peptides, proteins and organic molecules, J. Phys. Chem. B 110 (2006) 5025–5044.
 [63] J. A. Pople, Nobel lecture: Quantum chemical models, Rev. Mod. Phys. 71 (1999) 1267–1274.
 [64] J. M. García de la Vega and B. Miguel, Basis sets for computational chemistry, in Introduction to Advanced Topics of Computational Chemistry, edited by L. A. Montero, L. A. Díaz, and R. Bader, chapter 3, pp. 41–80, Editorial de la Universidad de la Habana, 2003.
 [65] T. Helgaker and P. R. Taylor, Gaussian basis sets and molecular integrals, in Modern Electronic Structure Theory. Part II, edited by D. R. Yarkony, pp. 725–856, World Scientific, Singapore, 1995.
 [66] P. Jurečka, J. Šponer, J. Černý, and P. Hobza, Benchmark database of accurate (MP2 and CCSD(T) complete basis set limit) interaction energies of small model complexes, DNA base pairs, and amino acid pairs, Phys. Chem. Chem. Phys. 8 (2006) 1985–1993.
 [67] G. A. Petersson, D. K. Malick, M. J. Frisch, and M. Braunstein, The convergence of complete active space self-consistent-field energies to the complete basis set limit, J. Chem. Phys. 123 (2005) 074111.
 [68] F. Jensen, Estimating the Hartree–Fock limit from finite basis set calculations, Theo. Chem. Acc. 113 (2005) 267–273.
 [69] Z.-H. Li and M. W. Wong, Scaling of correlation basis set extension energies, Chem. Phys. Lett. 337 (2001) 209–216.
 [70] M. R. Nyden and G. A. Petersson, Complete basis set correlation energies. I. The asymptotic convergence of pair natural orbital expansions, J. Chem. Phys. 75 (1981) 1843–1862.
 [71] P. Jurečka and P. Hobza, On the convergence of the term for complexes with multiple H-bonds, Chem. Phys. Lett. 365 (2002) 89–94.
 [72] E. W. Ignacio and H. B. Schlegel, On the additivity of basis set effects in some simple fluorine containing systems, J. Comput. Chem. 12 (1991) 751–760.
 [73] M. J. S. Dewar and A. J. Holder, On the validity of polarization and correlation additivity in ab initio molecular orbital calculations, J. Comput. Chem. 10 (1989) 311–313.
 [74] R. H. Nobes, W. J. Bouma, and L. Radom, The additivity of polarization function and electron correlation effects in ab initio molecular-orbital calculations, Chem. Phys. Lett. 89 (1982) 497–500.
 [75] J. A. Pople, M. J. Frisch, B. T. Luke, and J. S. Binkley, A Møller–Plesset study of the energies of AH molecules (A = Li to F), Intl. J. Quant. Chem. 17 (1983) 307–320.
 [76] R. Crespo-Otero, L. A. Montero, W.-D. Stohrer, and J. M. García de la Vega, Basis set superposition error in MP2 and density-functional theory: A case of methane–nitric oxide association, J. Chem. Phys. 123 (2005) 134107.
 [77] M. L. Senent and S. Wilson, Intramolecular basis set superposition errors, Intl. J. Quant. Chem. 82 (2001) 282–292.
 [78] I. Mayer and P. Valiron, Second order Møller–Plesset perturbation theory without basis set superposition error, J. Chem. Phys. 109 (1998) 3360–3373.
 [79] F. Jensen, The magnitude of intramolecular basis set superposition error, Chem. Phys. Lett. 261 (1996) 633–636.
 [80] I. Mayer, On the non-additivity of the basis set superposition error and how to prevent its appearance, Theo. Chem. Acc. 72 (1987) 207–210.
 [81] S. F. Boys and F. Bernardi, The calculation of small molecular interactions by the differences of separate total energies. Some procedures with reduced errors, Mol. Phys. 19 (1970) 553–566.
 [82] H. B. Jansen and P. Ros, Nonempirical molecular orbital calculations on the protonation of carbon monoxide, Chem. Phys. Lett. 3 (1969) 140–143.
 [83] P. J. Stephens, F. J. Devlin, C. F. Chabalowski, and M. J. Frisch, Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields, J. Phys. Chem. 98 (1994) 11623–11627.
 [84] A. D. Becke, Density-functional thermochemistry. III. The role of exact exchange, J. Chem. Phys. 98 (1993) 5648.
 [85] C. Lee, W. Yang, and R. G. Parr, Development of the Colle–Salvetti correlation-energy formula into a functional of the electron density, Phys. Rev. B 37 (1988) 785–789.
 [86] S. H. Vosko, L. Wilk, and M. Nusair, Accurate spin-dependent electron liquid correlation energies for local spin density calculations: a critical analysis, Can. J. Phys. 58 (1980) 1200–1211.
 [87] T. H. Dunning Jr., Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen, J. Chem. Phys. 90 (1989) 1007–1023.
 [88] R. A. Kendall, T. H. Dunning Jr., and R. J. Harrison, Electron affinities of the first-row atoms revisited. Systematic basis sets and wave functions, J. Chem. Phys. 96 (1992) 6796–6806.
 [89] M. S. Gordon and M. W. Schmidt, Advances in electronic structure theory: GAMESS a decade later, in Theory and Applications of Computational Chemistry: The first forty years, edited by C. E. Dykstra, G. Frenking, K. S. Kim, and G. E. Scuseria, pp. 1167–1189, Elsevier, Amsterdam, 2005.
 [90] M. W. Schmidt, K. K. Baldridge, J. A. Boatz, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery, General Atomic and Molecular Electronic Structure System, J. Comput. Chem. 14 (1993) 1347–1363.
 [91] P. Echenique and J. L. Alonso, Definition of Systematic, Approximately Separable and Modular Internal Coordinates (SASMIC) for macromolecular simulation, J. Comput. Chem. 27 (2006) 1076–1087.
 [92] J. L. Alonso and P. Echenique, A physically meaningful method for the comparison of potential energy functions, J. Comput. Chem. 27 (2006) 238–252.
 [93] P. Echenique, A note on the accuracy of free energy functions in protein folding: Propagation of errors from dipeptides to polypeptides, In progress, 2008.
 [94] J. Kaminský and F. Jensen, Force field modelling of amino acid conformational energies, J. Chem. Theory Comput. 3 (2007) 1774–1788.
 [95] R. Ditchfield, W. J. Hehre, and J. A. Pople, Self-consistent molecular-orbital methods. IX. An extended Gaussian-type basis for molecular-orbital studies of organic molecules, J. Chem. Phys. 54 (1971) 724–728.
 [96] W. J. Hehre, R. Ditchfield, and J. A. Pople, Self-consistent molecular-orbital methods. XII. Further extensions of Gaussian-type basis sets for use in molecular-orbital studies of organic molecules, J. Chem. Phys. 56 (1972) 2257–2261.
 [97] M. J. Frisch, J. A. Pople, and J. S. Binkley, Self-consistent molecular-orbital methods. 25. Supplementary functions for Gaussian basis sets, J. Chem. Phys. 80 (1984) 3265–3269.
 [98] R. Krishnan, J. S. Binkley, R. Seeger, and J. A. Pople, Self-consistent molecular-orbital methods. XX. A basis set for correlated wave functions, J. Chem. Phys. 72 (1980) 650–654.
 [99] J. S. Binkley, J. A. Pople, and W. J. Hehre, Self-consistent molecular-orbital methods. 21. Small split-valence basis sets for first-row elements, J. Am. Chem. Soc. 102 (1980) 939–947.
 [100] G. W. Spitznagel, T. Clark, J. Chandrasekhar, and P. v. R. Schleyer, Stabilization of methyl anions by first row substituents. The superiority of diffuse function-augmented basis sets for anion calculations, J. Comput. Chem. 3 (1982) 363–371.
 [101] T. Clark, J. Chandrasekhar, G. W. Spitznagel, and P. v. R. Schleyer, Efficient diffuse function-augmented basis sets for anion calculations. III. The 3-21+G basis set for first-row elements Li–F, J. Comput. Chem. 4 (1983) 294–301.
 [102] P. C. Hariharan and J. A. Pople, The influence of polarization functions on molecular orbital hydrogenation energies, Theor. Chim. Acta 28 (1973) 213–222.