Revised selfconsistent continuum solvation in electronicstructure calculations
Abstract
The solvation model proposed by Fattebert and Gygi fattebert_jcomputchem_2002 () and Scherlis et al. scherlis_jcp_2006 () is reformulated, overcoming some of the numerical limitations encountered and extending its range of applicability. We first recast the problem in terms of induced polarization charges that act as a direct mapping of the selfconsistent continuum dielectric; this allows to define a functional form for the dielectric that is well behaved both in the highdensity region of the nuclear charges and in the lowdensity region where the electronic wavefunctions decay into the solvent. Second, we outline an iterative procedure to solve the Poisson equation for the quantum fragment embedded in the solvent that does not require multigrid algorithms, is trivially parallel, and can be applied to any Bravais crystallographic system. Last, we capture some of the nonelectrostatic or cavitation terms via a combined use of the quantum volume and quantum surface cococcioni_prl_2005 () of the solute. The resulting selfconsistent continuum solvation (SCCS) model provides a very effective and compact fit of computational and experimental data, whereby the static dielectric constant of the solvent and one parameter allow to fit the electrostatic energy provided by the PCM model with a mean absolute error of 0.3 kcal/mol on a set of 240 neutral solutes. Two parameters allow to fit experimental solvation energies on the same set with a mean absolute error of 1.3 kcal/mol. A detailed analysis of these results, broken down along different classes of chemical compounds, shows that several classes of organic compounds display very high accuracy, with solvation energies in error of 0.30.4 kcal/mol, whereby larger discrepancies are mostly limited to selfdissociating species and strong hydrogenbond forming compounds.
I introduction
Continuum solvation models have proved to be very effective in capturing the complexity of a solvent in an implicit fashion tomasi_chemrev_1994 (); cramer_chemrev_1999 (); orozco_chemrev_2000 (); tomasi_chemrev_2005 (). Even if one could argue that an explicit description of the solvent would be a more faithful representation of the real system, an explicit solvent would require extensive molecular dynamics (MD) simulations in order to obtain meaningful thermodynamic averages. In addition, even if these were computationally feasible, an abinitio densityfunctional theory (DFT) description of the solvent would rarely provide the correct results, due to the limitations in the accuracy of the functionals adopted. Critical limitations in the representation of the structure of liquid water with DFT are well known in the literature laasonen_jcp_1993 (); sprik_jcp_1996 (); grossman_jcp_2004 (); vandevondele_jcp_2005 (); sit_jcp_2005 (), and can be related to the lack of a proper description of van der Waals interactions and hydrogen bonds in standard DFT. Moreover, given the presence of hydrogen atoms in liquid water, neglecting the quantum motion of the nuclei of the system could significantly affect the computed physical properties of the system stern_jcp_2001 (); chen_prl_2003 (); morrone_prl_2008 (). The above limitations in the DFT description of liquid water translate into phase diagrams that differ from the experimental one, and melting temperatures that are much larger than the experimental temperature sit_jcp_2005 (); vandevondele_jcp_2005 (). Being unable to accurately describe the structure and the dynamics of liquid water, it is questionable whether the most popular DFT functionals could correctly reproduce its dielectric behavior. As a typical dipolar liquid, the dielectric properties of liquid water derive from the electric dipole moment carried by the individual molecules. Even though much effort has been put in the characterization of the dipole moment of water in its liquid state, a consensus on the subject is still lacking silvestrelli_prl_1999 (); pasquarello_prb_2003 (); sharma_prl_2007 (). Moreover, it was shown in the literature sharma_prl_2007 () that the dielectric response of water is dominated by the short range effects of the hydrogen bond environment, thus implying that the lack of accuracy in the description of intermolecular interactions in liquid water could lead to a poor characterization of its dielectric properties.
Many continuum solvation models have been proposed and widely developed tomasi_chemrev_1994 (); cramer_chemrev_1999 (); orozco_chemrev_2000 (); tomasi_chemrev_2005 (), especially in the chemistry literature, since the earliest work of Onsager onsager_jacs_1936 (). Within these many approaches, one of the most popular is the Polarizable Continuum Model (PCM) of Tomasi and coworkers tomasi_chemrev_1994 (); tomasi_chemrev_2005 () that, in its latest formulation in terms of Integral Equations (IEF) mennucci_jpcb_1997 (); cances_jmathchem_1998 (), encompasses a wide range of similar methods (e.g. the COSMO approach klamt_jchemsoc_1993 ()). Being mostly linked to the chemistry community, PCM has not been used in condensed matter and solid state simulations. In particular, the possibility to deal with metallic systems within PCM was introduced only later in an implicit way corni_jcp_2001 (), and the algorithm was not really developed to be interfaced with periodic systems and abinitio MD simulations.
In an effort to extend solvation methods to planewave, periodicboundary codes and abinitio MD, Fattebert and Gygi proposed a new model of continuum solvation fattebert_jcomputchem_2002 (); fattebert_intjqchem_2003 (), where the dielectric is defined as a smooth selfconsistent function of the electronic density of the solute. This model was further extended by Scherlis et al. scherlis_jcp_2006 () to include the calculation of the cavitation energy, by defining it in terms of the quantum surface of the solute cococcioni_prl_2005 (). Despite the simple and elegant formulation of the model, some illconditioning of the problem in its original formulation led to abandoning or relaxing selfconsistency dziedzic_epl_2011 (): models in which the dielectric is defined in terms of a fictitious, atomcentered electronic density have been proposed sanchez_jcp_2009 (), together with approaches were the electronic density of the solute precomputed in vacuum is adopted to define the continuum solvent. In addition, all of the reported implementations and derivations of the Fattebert and Gygi method (e.g. Refs. scherlis_jcp_2006 (); dziedzic_epl_2011 ()) rely on multigrid solvers, which entail highorder discretization of the Laplacian operator and that typically require Cartesian geometries and serial implementations.
In the present work, starting from the model of Fattebert and Gygi, a revised selfconsistent continuum solvation (SCCS) model is derived, in which the challenges outlined above are tackled as follows. First, along the lines of PCM tomasi_chemrev_2005 (), the method is reimplemented in terms of polarization charges and solved using an iterative approach that is intrinsically parallel, extendable to any kind of Bravais lattice, and straightforwardly transferable into any planewave code. Second, some of the numerical instabilities of the original formulation fattebert_jcomputchem_2002 (); fattebert_intjqchem_2003 () are solved by properly redefining the relation between the dielectric and the electronic density of the solute. Third, the model features a revised version of the extension of Scherlis et al. scherlis_jcp_2006 (); cococcioni_prl_2005 () to treat the cavitation contribution to solvation free energies, where the concepts of quantum surfaces and quantum volumes cococcioni_prl_2005 () have been extended to model dispersion and repulsion effects in a simplified way.
The accuracy of the proposed method has been tested on a reference sample of 240 neutral solutes in water. Its ability to accurately reproduce with only one fitting parameter PCM results provides a strong validation of the method and allows its extension to reproduce experimental solvation energies. The resulting agreement with experiments has a mean absolute error of 1.3 kcal/mol when two fitting parameters are used. Moreover, an analysis of the error as a function of the functional groups of the solutes provides some critical insights into potential limitations of the model. In particular, results for several classes of organic compounds present very high accuracies (with errors on the order of 0.30.4 kcal/mol), and discrepancies are mostly limited to chemical compounds that undergo dissociation in water, such as carboxylic acids and amines. Significant errors are also found for some strong hydrogenbond forming species, such as ethers, alcohols and fluorinated compounds. These findings suggest that an extension of the model that takes into account selfdissociation or a shell of explicit water molecules around the solute could provide results in quantitative agreement with experiment.
The paper is organized as follows. In Section II the main features of previous continuum solvation methods are reviewed. In Section III the basic equations of the electrostatic part of the proposed method are presented. In Section IV some limitations of previous models are discussed and a new definition of the dielectric function that solves most of these limitations is presented. In Section V a new numerical approach is discussed, based on an iterative solution of the dielectric problem, alternative of and simpler than multigrid solvers. In Section VI, complementary nonelectrostatic contributions to solvation are introduced. Eventually, in Section VII parametrizations of the model and a comparison with theoretical results from similar well assessed techniques are reported.
Ii Previous approaches
The most widespread continuum solvation method is the Polarizable Continuum Model by Tomasi and coworkers tomasi_chemrev_2005 (), that in its formulation in terms of integral equations mennucci_jpcb_1997 (); cances_jmathchem_1998 () represents a most general approach to continuum solvation. The basic physical picture behind the model is one of a solute contained in an adhoc cavity surrounded by a continuous polarizable dielectric, whose response to the solute charge distribution is fully characterized by the value of its static dielectric constant . In this model, the transition between the vacuum region inside the solute cavity and the surrounding dielectric continuum is sharp and discontinuous. This discontinuity allows to treat the effect of the surrounding environment on the solute through introducing a polarization charge density that is exactly localized at the vacuumdielectric interface. In addition to homogeneous isotropic dielectrics, a suitable surface charge density can be defined via integral equations also in complex embedding environments, that include multiple interfaces and metallic systems. Thus, by representing the response of the environment in terms of an effective surface polarization density, IEFPCM reduces a threedimensional problem into a twodimensional one. Numerically, IEFPCM adopts a Boundary Element Method (BEM) to discretize the surface of the molecular cavity and the operators defined on the surface domain pascualahuir_jcomputchem_1990 (); silla_jmolgraph_1990 (). The final ingredient is the definition of the molecular cavity: in this respect, different choices have been adopted scalmani_jcp_2010 (), the most popular one being a rigid cavity built as the superposition of atomcentered spheres with fixed radii, corresponding to empirical van der Waals atomic radii multiplied by a solventdependent scaling factor. This choice allows to have a regular discretization of the cavity surface that helps numerical convergence (see Refs.scalmani_jcp_2010 (); lipparini_jcp_2010 () and references therein). On the other hand, the numerical discretization of the cavity surfaces has lead, in the original formulations, to atomic forces that were not continuous with respect to atomic positions and suffered from numerical singularities. For this reason it has been difficult to extend PCM to abinitio MD simulations. In order to solve such a problem, modified versions of continuum models have been proposed in the literaturesenn_jcp_2003 (); lange_jcp_2010 (). Moreover, it is worth noting that more advanced definitions of the molecular cavity have been proposed in terms of an isodensity of the electronic density of the solute. Isosurfaces of both the frozen electronic density of the solute (Isodensity PCM foresman_jpc_1996 ()) and the selfconsistent density (SCIPCM wiberg_jpc_1995 ()) have been considered. Nonetheless, few applications of the above methods have been reported in the literature, probably because of the absence of analytic gradients.
In order to extend continuum solvation to abinitio MD simulations, a novel approach has been proposed by Fattebert and Gygi fattebert_jcomputchem_2002 (); fattebert_intjqchem_2003 (). The main difference with respect to PCM is in the definition of a dielectric function , defined in terms of the electronic density of the solute, and smoothly varying between 1, when is large, to when . The physical problem is expressed in terms of the electrostatic response generated by the embedding dielectric and acting on the solute. This response field, which is defined in the whole threedimensional space, is then obtained numerically by using a multigrid solver. The method of Fattebert and Gygi requires  as it is argued here  a careful choice for the dielectric function in terms of the electronic density. Moreover, its implementation in planewave (PW) abinitio codes has been hindered by the necessity to interface such codes with an efficient, highorder, ideally parallel multigrid solver able to work in the arbitrary geometry of any Bravais lattice.
Iii Present model
We show now that, starting from the basic equations of Fattebert and Gygi fattebert_jcomputchem_2002 (); fattebert_intjqchem_2003 (), it is possible to recast the electrostatic problem in terms of a polarization density, similarly to what is done in PCM. The key assumption is that of a dielectric medium selfconsistently modeled on the electronic density of the solute, via a suitably defined relationship
(1) 
such that the dielectric is excluded () from the inner part of the solute, where the electronic density is high, while it smoothly goes to the bulk dielectric constant of the solvent () outside the solute, where the electronic density goes to zero. By adding a dielectric medium to the system, the electrostatic field that enters into the quantummechanical problem is no longer given by the Poisson equation in vacuum
(2) 
where the total charge density in vacuum is the sum of the electronic and ionic densities
(3) 
but has to be the solution of the more complex Poisson equation
(4) 
that is nothing but the standard Maxwell equation
(5) 
for the displacement field
(6) 
where and are the electric field and the polarization.
Eq. (5) can be transformed into an equation in terms of the electric field
(7) 
and, by mapping the effect of the dielectric into a polarization charge density
(8) 
a vacuumlike Poisson problem is recovered
(9) 
As in PCM, the nonlinear nature of the problem that arises from the mutual polarization of the solvent and solute is accounted for via a polarization charge density that depends on itself through the total electrostatic potential.
By evaluating the gradient on the right hand side of Eq. (8) and performing some simple algebraic manipulations, the polarization charge density can be expressed alternatively as the sum of two distinct terms:
(10) 
As in the case of PCM, introducing polarization charges allows to deal with localized physical quantities. Indeed, both terms in the above equations are only defined in a narrow region around the solute. In particular, the first term contains all the nonlinear character of the problem, and is confined within the vacuumsolvent interface, where the dielectric function has a gradient different from zero. The second term, instead, is a simple rescaling of the solute charge density that extends into the vacuum region (see Ref. chipman_jcp_2006 () and references therein): although in principle defined in the whole threedimensional space, this term is also localized around the solute owing to the exponential decay of the electronic charge density in the dielectric region.
It should be noted that formulating the problem in terms of the polarization charge density instead of the solvent electrostatic potential allows  thanks to the Gauss theorem  to define a sum rule for the total polarization charge surrounding the solute, namely,
(11) 
Eventually, from the linearity of the Poisson equation Eq. (9), it follows that the total field can also be written as
(12) 
where the or fields are solutions of vacuumlike Poisson equations in terms of the or charge densities. Such a formal separation of the potential is useful in order to express all of the important quantities that enter into quantummechanical simulations (total energy, potentials, forces, etc.) in terms of two contributions: one explicitly depending on the solute charge density alone (identified, now and in the following, by the superscript ) and one explicitly depending on the dielectric (identified by the superscript ). The first term is analogous to a quantity computed in vacuum and it is readily provided by the simulation code adopted. On the other hand, the polarization contributions need to be explicitly added, in order to include the effect of the dielectric in the simulation. Although the method can be applied to different quantummechanical approaches, in the present article we will limit our discussion to the case of Density Functional Theory (DFT) in the KohnSham formulation. The derivation of the polarization contributions to the electrostatic energy, KohnSham energy, KohnSham potential, and atomic forces is presented below.
iii.0.1 Electrostatic Energy
Once the Poisson equation is solved and the polarization of the dielectric is known, the electrostatic energy of the system can be expressed as
(13) 
By integrating by parts the last expression, the electrostatic energy can be expressed as
(14) 
The same expression can be obtained starting from the vacuumlike problem in Eq. (9), by expressing the electrostatic energy as the sum of the total interaction energy of the charge densities of the system, both the solute and the polarization one,
(15) 
with the addition of the work done to polarize the dielectric. Such a work, assuming a linear behavior for the dielectric, can be shown to be
(16) 
that, combined with Eq.(15), correctly provides the result in Eq. (14).
Adopting the decomposition of the potential introduced in Eq. (12), the electrostatic energy of the system can be further written as
(17) 
where the first term is the electrostatic energy of the solute, including both electrons and ions,
(18) 
and is the analogous of the electrostatic energy of the system in vacuum. Now, by exploiting the fact that
(19) 
the polarization term can be included in two equivalent ways: following PCM, as the integral of the polarization charge density times the potential of the solute charge densities acting on it, or, viceversa, as the integral of the solute charge density times the polarization potential acting on it
(20)  
(21) 
iii.0.2 KohnSham Energy and Potential
From the above equations, the total DFT energy of the system can be expressed as
(22) 
where, in the last equality, we underline the fact that the quantity
(23) 
does not depend explicitly on the polarization charge distribution and, thus, is analogous to the total KohnSham energy of a system in vacuum.
The effective DFT potential acting on the electrons can be written as (see, for example, Appendix A of Ref sanchez_jcp_2009 ()),
(24) 
where the first term on the righthand side includes all of the contributions to the potential that do not depend explicitly on the polarization charge density and is analogous to the potential of a system in vacuum. As for the two additional terms, the first one is a purely local potential contribution and is simply given by the electrostatic potential generated by the polarization charge density
(25) 
while the second term arises from the dependence of the dielectric on the electronic charge, as defined in Eq. (1), and is given by
(26) 
iii.0.3 Forces
Exploiting the HelmannFeynman theorem, the force acting on an atom of the solute is given by
(27) 
where is the position of atom and we exploit the fact that the electronic kinetic energy and exchangecorrelation potentials do not depend explicitly on nuclear positions. By using the expression for the electrostatic energy in Eq. (13), the force is given by
(28) 
Provided that the dielectric permittivity is not an explicit function of the atomic positions of the solute, as is the case of the definition adopted in Eq. (1), the first contribution in the above equation is equal to zero. The second contribution, instead, can be expressed as
(29) 
By applying the partial differential to the first formulation of the electrostatic energy in Eq. (13), an alternative equation for the force is obtained, namely
(30) 
which, combined with Eq. (29), provides the useful relation
(31) 
Similarly to the expressions for the electrostatic and the KohnSham energies in Eqs. (17) and (22), also the interatomic forces can be defined as sums of two terms
(32)  
(33) 
The first contribution does not depend explicitly on the polarization charge density and is analogous to the interatomic force on a system in vacuum. By using Eq. (31) and similarly to what is done in Eq. (19), the polarization contribution to interatomic forces can also be expressed as
(34)  
(35) 
where the partial derivative is now applied to analytic functions of nuclear positions.
When the dielectric depends instead explicitly on atomic positions, Eq. (31) is no longer exact and additional contributions to forces would arise. This is the case, for example, when the dielectric constant is entirely defined in terms of a fictitious electronic density centered at the atomic positions, as described in Ref. sanchez_jcp_2009 (). It is important to notice that similar contributions to the forces arise also when the dielectric is defined in terms of the sum of the electronic density plus a fictitious ionic density, as is the case in, e.g., the original Fattebert and Gygi model, where additional core charges were added to saturate the dielectric constant in the solute region fattebert_jcomputchem_2002 (); fattebert_intjqchem_2003 (). In this case, a contribution similar to the one derived in Ref. sanchez_jcp_2009 () should be explicitly added to the forces, unless the derivative of the dielectric with respect to this fictitious nuclear density is zero, i.e., unless this density is added only in a region of space where the resulting dielectric constant is flat.
Iv Choice of dielectric function
Although formally equivalent to the original GygiFattebert model, the equations presented above have the advantage of highlighting the numerical challenges of the original formulation. In particular, the main ingredient of the model is the dielectric function that appears in Eq. (1), and for the model to work properly and seamlessly some conditions on this function should be imposed. These conditions are as follows:

Extrema: the dielectric function should go monotonically from a value of 1 (vacuum) inside the molecule to a value of in the bulk of the solvent:
(36) 
Flatness inside the solute: the dielectric should be exactly equal to one above a certain density threshold, to avoid spurious polarization effects due to the interaction of the dielectric with the ion cores.

Flatness in bulk solvent: the dielectric should be exactly equal to below a certain threshold, to avoid spurious polarization charges in the bulk of the solvent, due to potential numerical noise in the exponentially vanishing electronic density away from the solute.

Smoothness: since we are interested in implementing the model in a planewave, periodic electronicstructure code, the dielectric function and its gradient have to be smooth enough to be well described in a threedimensional grid that has a resolution given by the typical density cutoffs used in planewave DFT calculations. Such condition is crucial to the convergence of the iterative SCF calculation. Moreover, since our formulation of the method relies on the polarization density of the medium, this later quantity should also be smooth enough to be well described with the density cutoffs that are conventionally adopted.
The original dielectric function by Fattebert and Gygi fattebert_jcomputchem_2002 (); fattebert_intjqchem_2003 ()
(37) 
satisfies the first requirement, going very smoothly from in the proximity of the solute to in the bulk of the solvent.
The smoothness of the function allows it to be well defined in the threedimensional mesh used in typical calculations (see Figure (2)). Nonetheless, convergence issues may arise in some particular systems. Such issues have been reported by Sanchez et al. sanchez_jcp_2009 () for simulations of twodimensional systems (i.e. slabs), and result from the fact that the gradient of the dielectric function and the dielectricdependent contribution to the potential in the energy functional are too abruptly varying to be correctly described with typical grid sizes. It is worth noting here that the extra potential term of Eq. (26) is somewhat similar to the polarization charge density: both quantities depends on the gradient of the dielectric and of the total field, thus their are both confined in the small region around the solute and they both suffer for the same conditioning problems (compare, for example, the two quantities in Figure (2)). Last, the smoothness of the Fattebert and Gygi function leads to a polarization density in a region very close to the nuclei, potentially breaking the second requirement. As mentioned in Section III, this problem has already been pointed out in the original paper by Fattebert and Gygi, and it has been solved by using an additional density in the definition of , in order to force the dielectric constant to go to in the proximity of the nuclei fattebert_jcomputchem_2002 (); fattebert_intjqchem_2003 (). While this approach is correct in theory, in practice it introduces some additional terms in the forces acting on the nuclei that have to be explicitly considered (see final discussion in the previous Section). Moreover, the final result of the calculation depends strongly on the choice of the introduced additional density or on the choice of parameters used to model the ionic density (i.e. shape of the nuclei, Gaussian spread, etc.). For these reasons it appears that the original function, at least in the parametrization adopted in Ref scherlis_jcp_2006 (), is excessively smooth where it should not (close to the nuclei), while not smooth enough where it should (outside the molecule).
With the aim of correcting these drawbacks, we decided to introduce a new expression for the dielectric function by using a piecewise definition of the dielectric of the form
(38) 
where is a general smooth switching function that decreases monotonically from to . Several switching functions were considered, e.g. the trigonometric function
(39) 
Nonetheless, despite their similarity with the original function of Fattebert and Gygi, all of the trial functions resulted in even worse or no convergence of the electronicstructure calculation at any reasonable cutoffs. The reason for such a illconditioning is that the region where the function is defined lies outside of the molecule, where the electronic density decays exponentially. Thus, the resulting function also decays exponentially. Nevertheless, since the electronic density is known to vanish exponentially, the problematic analytical behavior can be rectified easily by redefining the switching function in terms of the logarithm of the density, namely,
(40) 
In order to increase the smoothness of the dielectric function and of the resulting polarization charges, a further condition can be imposed on the form of the switching function. In particular, from Eq. (10) it is clear that the polarization charges are mostly given by a term of the form . While no assumptions can be made, a priori, on the behavior of the gradient of the total field, in defining the dielectric function it is convenient to choose a form of the switching function such that the derivative of its logarithm is well behaved. This can be easily achieved by selecting a dielectric function of the form
(41) 
where is a general, smooth function that decreases monotonically from to
Explicitly, in this work, we decided to adopt the trigonometric function
(42) 
whose first derivative
(43) 
vanishes with zero slope at the extrema of the interval of definition, thus making and its first two derivatives continuous in the whole space. It is important to notice that the proposed function does not contain any internal adjustable parameters, but only depends on the value of the dielectric constant and on the two density thresholds and .
V Iterative vs Multigrid
In order to effectively compute the polarization field in Eq. (4) and related polarization charges, different numerical procedures can be adopted. While most of the previous implementations rely on multigrid solvers, we have found more advantageous to rely on an iterative procedure, derived along the lines of a similar approach first introduced in PCM.
In particular, given the total charge density of the solute, the second term for polarization charge appearing in Eq. (10) is readily obtained, while for the first one, that we label :
(44) 
the following procedure can be adopted:

In order to avoid calculations of polarization effects for a system with a non converged electronic density, solvent effects are computed only when the accuracy of the SCF calculation reaches a given threshold value .

At the first iteration for the dielectric, the initial polarization charge is fixed to be equal to zero. All following calculations at each subsequent electronic SCF step use the polarization density from the previous step, namely
(45) 
The total density of the system (solute plus polarization) at iteration is computed as
(46) (47) (48) 
The gradient of the total field is computed in reciprocal space via one Fast Fourier Transform (FFT) and three inverse FFTs (IFFTs):
(49) (50) (51) 
The iterative part of the polarization charge at the th step is computed by using Eq. (44), namely
(52) 
A linear mixing of the polarization at the th and th steps is performed, in order to stabilize the iterative procedure:
(53) with a given mixing parameter (usually ), and the residual is computed
(54) 
The polarization density is converged when
(55) with a given tolerance, and the iterative procedure is stopped.
Typical results for the polarization density of neutral molecules in water are reported in Figure (4).
For numerical stability, we find important to compute in real space, e.g. by using finite differences. While computing such a gradient via FFTs would yield a truly variational energy expression, going through reciprocal space would unavoidably give rise to spurious polarization charges in “forbidden regions”. In particular, polarization charges close to the nuclei would form, thereby severely compromising the convergence of our procedure. Several realspace finitedifferences schemes were tested, with the simplest 3points central differences algorithm providing converged results for the majority of tests and simulations. Nonetheless, for microcanonical molecular dynamics simulations, total energy conservation was found to strongly depend on the order of the finitedifference scheme adopted, with only highorder algorithms providing the necessary degree of accuracy in the interatomic forces (see the discussion in subsection VII.2.3).
Similarly to standard SCF calculations, convergence of iterative calculations depends crucially on the use of a mixing scheme. Different mixing schemes can be adopted, such as the Anderson mixing anderson_jacm_1965 (), the modified Broyden mixing johnson_prb_1988 () or the Direct Inversion in the Iterative Subspace (DIIS) pulay_cpl_1980 (). In order to reduce the number of numerical parameters involved in the calculations, in this article only results obtained with a simple linear mixing are reported. Apart from enhancing convergence, no significant effects on the final results exist for different mixing parameters in the range .
The total number of polarization iterations required is strictly related to the tolerance imposed on the residuals. A large value of in Eq. (55) would allow the polarization procedure to stop in a few iterations. Nonetheless, poorly converged polarization screening does prevent the electrons from reaching their ground state, requiring a much higher number of SCF steps to converge (see top and middle panels of Figure (5)).
Since a single SCF step is much more computationally expensive than a polarization iteration, which only involves four FFTs, it has been found preferable to impose a tight convergence tolerance on the polarization charge (bottom panel of Figure (5)). More advanced schemes that rely on a tolerance that depends on the convergence of the electronic density, have been tested. Moreover, a scheme could be envisaged where the polarization density and the electronic density are optimized with the same common iterative algorithm, i.e., where the polarization density is mixed once per SCF cycle and with the same procedure used for the electronic density. Nonetheless, for sake of simplicity, only results obtained with a fixed tolerance on the polarization charge are presented.
The behavior of the SCF procedure is also strictly dependent on the choice of the threshold value for the onset of the polarization calculation, . Waiting for the electrons to be converged before starting the polarization calculation could seem an efficient way of proceeding. Nonetheless, switching on the polarization fields strongly affects the SCF procedure, bringing the electrons as far from their ground state as at first SCF steps. For this reason, larger values of , of the order of 0.10.5 Ry, were found to give smoother SCF convergence, with no effects on the final results.
Overall, the whole procedure benefits significantly from the advantageous scaling of FFT techniques. Contrary to multigrid solvers, FFTs are easily available in fast, well established, parallel libraries and form already a key ingredient in most planewave electronicstructure codes. Moreover, the method relies on fully periodic boundary conditions, so it can be straightforwardly applied to periodic systems in arbitrary unitcell geometries without requiring a cubic cell and imposing arbitrary Dirichlet boundary conditions on the potential, as in the multigrid cases. In polar solvents, periodicity does not affect final results, since the solvent screens very effectively the charge density of the solute. Moreover, periodic image correction schemes, such as the MakovPayne correction makov_prb_1995 () or countercharge methods dabo_prb_2008 (); li_prb_2011 (), can easily be adopted to include the polarization charge density, by treating it at the same level as the molecular charge distribution.
Vi Additional solvation terms
The model described in the previous sections focuses only on electrostatic contributions to solvation tomasi_chemrev_2005 ()
(56) 
where is the abinitio energy of the isolated solute in vacuum and is the analogous quantity computed in solution, i.e. from Eq. (22). Such a contribution is always negative, i.e., allowing the medium to polarize always stabilizes the solute and lowers its total energy. Of course, other effects than electrostatic are present in any solvation process and they play a crucial role in balancing the overall solvation energies that can be negative or positive. According to the formal definitions of BenNaim bennaim1987 (); bennaim1992 (), PCM introduced other nonelectrostatic terms in the solute Hamiltonian, still assuming a continuum approach in the description of the medium. The main terms can be divided (see Eq. (79) of Ref. tomasi_chemrev_2005 ()) into:
(57) 
In the above formula, the cavitation energy corresponds to the energy necessary to build into the solvent the cavity containing the solute. The repulsion and dispersion terms are the continuum equivalent of the nonbonded shortrange interactions generated by the Pauli exclusion principle and by van der Waals interactions. The thermal motion contribution arises from the change in the vibrational and rotational properties of the solvated system with respect to the isolated one, and the pressure term takes into account the change in volume of the solvated system. Different models exist for each of the nonelectrostatic terms, a comprehensive review being given in Ref. tomasi_chemrev_2005 (). In standard PCM calculations only the first three non electrostatic terms are explicitly accounted for. Of these, cavitation and repulsion energies are positive by definition, while the dispersion contribution is always negative. Overall, these terms tend to cancel each other and their effect on molecular properties and chemical reactions has been generally regarded as less important than the electrostatic term.
The model by Fattebert and Gygi was extended by Scherlis et al. scherlis_jcp_2006 () to include a cavitation term, simply expressed as the product of the experimental surface tension of the solvent times the surface of the solute cavity, namely
(58) 
The surface above is the “quantum surface” introduced in cococcioni_prl_2005 () and defined, via the “quantum volume” cococcioni_prl_2005 (), by finite difference between two isosurfaces of the electronic density as
(59) 
From the functional derivative of the cavitation energy (Eq. (58)) with respect to the charge density, an extra potential term of the form
(60) 
can be straightforwardly added to the KohnSham Hamiltonian. The function entering in the above equations was originally designed to be a step function, switching from zero to one at the threshold , namely,
(61) 
To improve the numerical stability of the algorithm, a smoothed step function of the same form, i.e. , was also suggested. The finite difference parameter was then used in Eq. (59) to determine two adjacent isosurfaces separated by a constant separation in terms of electronic density.
Subsequently, in order to use a definition consistent with the dielectric function used in the electrostatic solvation calculation, the function
(62) 
was adopted. We note that, while very similar in spirit to the original formulation by Cococcioni et. al. cococcioni_prl_2005 (), the use of Eq. (62) together with Eq. (59) is not formally exact, since the parameter entering in Eq. (62) is not in a linear relationship with the argument of the switching function. Thus, applying the finite difference procedure to this arbitrary parameter as in Eq. (59) does not correspond to calculating the spatial distance between the two isosurfaces. For this reason, surfaces computed with Eq. (59) result in an overestimate of approximately a factor of 1.2 of the correct quantum surface. Such an overestimate can be avoided by applying the finite difference procedure directly to the argument of the switching function, namely,
(63) 
where now the function is a given switching function that depends on an arbitrary set of parameters and that goes from zero to one at a fixed threshold larger than . We notice here that, while the above expression is valid for any kind of switching function, for a function defined according to the original formulation of Cococcioni et al. [Eq. (61)], Eqs. (59) and (63) are equivalent. For the sake of consistency with the electrostatic solvation term the following definition is adopted
(64) 
where is now given by Eq. (41).
In a similar way, the method of Cococcioni et al. to treat systems under pressure can be immediately extended to the calculation of the term that appears in Eq. (57). Explicitly, by adopting the same switching function used to define the solute surface Eq. (64), the solute volume can be expressed as
(65) 
that gives rise to an extra potential term in the KohnSham Hamiltonian of the form
(66) 
By including the term in the total energies of both the system in vacuum and in solution, the contribution is automatically included in the solvation free energy computed via Eq. (56). Such a contribution can be important for systems under pressure while, for systems at standard pressures, it can be safely neglected.
As for the remaining contributions to the solvation free energy, we have decided to treat them in a simplified way, their explicit modeling being the subject of future developments. In particular, similarly to other models of solvation, the thermal motion contribution has been neglected, while we express the sum of dispersion and repulsion free energies as a term linearly proportional to the quantum surface and the quantum volume of the molecular cavity, namely
(67) 
where the two factors and are solventspecific tunable parameters that can be fitted, together with the other parameters in the model, e.g. to reproduce total solvation energies. This approach is coherent, for example, with the definition of repulsion energy used in PCM amovilli_jpcb_1997 (), where such a term is proportional to the solute electronic density that lies outside the molecular cavity: in a model where the molecular cavity is defined on an isosurface of the density, the amount of electronic charge outside the cavity will be proportional to the surface of the cavity. Moreover, such an approach is similar in spirit, but with less parameters, to the one adopted by Cramer and Truhlar in their Generalized Born (GB) method of solvation and, more recently, in their SMx models (see Ref. cramer_accchemres_2008 () and references therein). In these classes of methods, all of the nonelectrostatic terms of solvation are described as the sum of atomic terms, proportional to the solventaccessible surface area of each atom of the solute.
The final expression for the solvation free energy of our simplified model is thus given by
(68) 
where the main parameters involved in the electrostatic contribution are explicitly reported.
Vii Results
vii.1 Numerical Details
A large set of experimental solvation free energies for 240 small neutral organic molecules in water was used to test the proposed computational methodology. The molecules of the set have been collected in Ref. shivakumar_jctc_2010 () to test the accuracy of classical molecular dynamics free energy perturbation calculations. Based on hierarchical clustering, a smaller representative set of 13 molecules (reported in Figure 6) spanning the main functional groups of organic molecules is also provided in Ref. shivakumar_jctc_2010 () and is used here to test the convergence of numerical parameters and to fit tunable parameters.
PCM calculations were performed at the HartreeFock (HF) and density functional theory (DFT) level, using both the G03 g03 () and G09 g09 () versions of the GAUSSIAN code. For the sake of brevity, in the remaining discussions the labels g03/g09 will be used to refer to the two different flavors of PCM mennucci_jpcb_1997 (); scalmani_jcp_2010 () implemented as defaults in the G03/G09 versions of the GAUSSIAN code. Both standard PBE and hybrid B3LYP exchangecorrelation functionals were tested. The dependence of the results on the chosen theoretical approximations was found to be minor and generally negligible compared to the overall agreement with experimental results. Thus, only results obtained with the PBE functional will be reported. Geometries were optimized both in vacuum and in solution with a double zeta 631g(d) basis set. Starting from the relaxed geometries, total energies were computed with a triple zeta 6311+g(d,p) basis set. Due to the lack of such basis sets for the iodine atom, calculations with GAUSSIAN were not performed on the eight molecules of the set containing this atomic species. For the other molecules, calculations were performed using PCM to compute both electrostatic and cavitation energies. For the latter contribution and in order to be coherent with the original definition of the cavitation term, calculations were performed with a cavity built using unscaled Bondi’s atomic radii bondi_jpc_1964 (). For all other calculations the default cavities were used. In particular, IEFPCM as implemented in G03 adopts a cavity defined according to the solvent excluded surface connolly_jac_1983 () as approximated by the GePol algorithm pascualahuir_jcc_1994 (), and based on spheres of United Atom radii barone_jcp_1997 (). The version of IEFPCM implemented in G09, instead, relies on a simpler cavity scalmani_jcp_2010 (), built as the combination of atom centered van der Waals spheres of Universal ForceField radii rappe_jacs_1992 (), and scaled by a factor of 1.1.
Our continuum solvation model has been implemented in the publicdomain PWSCF parallel code included in the QuantumESPRESSO package QUANTUMespresso (), based on DFT, periodicboundary conditions, planewave basis sets and pseudopotentials (PP) to represent the ionelectron interactions. All the calculations reported in this work were performed at the Gamma point of the Brillouin zone (as is appropriate for molecules), using the PBE exchangecorrelation functional and Vanderbilt ultrasoft pseudopotentials (USPP), as contained in the PSlibrary of A. Dal Corso pslibrary (). For bromine and iodine we have used the USPP available online pseudoQE (). KohnSham wavefunctions and charge densities were expanded in plane waves up to kinetic energy cutoffs of 30 and 300 Ry, respectively. For molecules containing fluorine, these cutoffs were further raised to 45 and 450 Ry. Simulation cells were chosen to be at least 15.0 a.u. larger than the maximum size of the molecule in vacuum, and not smaller than 20.0 a.u.. Spurious periodicimage effects were taken into account, for all the molecules, using the MakovPayne corrective scheme makov_prb_1995 ().
In order to ensure a consistent description of solvent effects across our large set of simulations used to fit the tunable parameters of the method, a larger basis set was adopted, corresponding to wavefunction and density cutoffs of 40 and 400 Ry, respectively. Moreover, since the effect of the tunable parameters on the final geometries of the benchmark molecules was found to be of secondary importance (less than 0.1 kcal/mol), the fitting simulations were performed with no geometry relaxation and starting from molecular geometries optimized using a reasonable set of parameters ( a.u., a.u., dyn/cm, dyn/cm, GPa).
The accuracy of the method and, in particular, of the forces calculation were tested by means of BornOppeneimer abinitio molecular dynamics simulations of a prototypical system in vacuum and in solution. Equilibration of the system was first performed with a timestep of 60 a.u. ( fs) in the NVT ensemble at 300 K, as imposed by a Berendsen thermostat. Following equilibration at constant temperature, simulations were performed in the microcanonical ensemble. In order to ensure proper energy conservation for the simulation in vacuum, some key simulations parameters had to be tightened. In particular, the wavefunction and density cutoffs were raised to 40 Ry and 600 Ry, respectively. Similarly, the timestep was decreased to 20 a.u. ( fs) and the threshold controlling the convergence of the electronic SCF procedure was imposed to be Ry. The same set of parameters was then used for the simulations in the presence of the solvent.
In order to study the influence of the numerical parameters of the method on the final results, unrelaxed electrostatic solvation energies were calculated as
(69) 
where, contrary to Eq.(56), the electrostatic energy of the system in solution is computed using the geometry optimized in vacuum (). This was done with the aim of removing a further source of errors from the analysis of the numerical sensitivity of the method.
vii.2 Parametrization
As summarized in Eq. (68), our approach involves six main parameters. Of these, the bulk dielectric constant and the surface tension represent physical quantities, specific of the solvent, that can be computed from first principles or, more often, extracted from experimental results. The remaining four parameters, i.e. the two thresholds entering into the definition of the dielectric function and the and constants of Eq. (67), are tunable quantities that affect the computed solvation energies. As such, these parameters have to be fitted on reference calculations or experimental data, as reported in the following subsection (VII.2.1). It is important to underline here that the comparisons with PCM are not intended to represent a direct assessment of the accuracy of PCM or of other continuum models in the literature, but rather they serve two other purposes. First, they are intended to show the intrinsic flexibility of the SCCS formulation, which can reproduce well results of other approaches exploiting a much reduced number of parameters. Second, using well established continuum models simplifies the fitting procedure in comparison to relying solely on experimental data. For the above reasons and due to their wide spread use and assessment in the literature, comparisons and fittings are only made with the two main versions of IEFPCM, as implemented in the Gaussian simulation packages cossi_jcp_2002 (); scalmani_jcp_2010 (). We are aware that other continuum models exist in the literature klamt_jchemsoc_1993 (); wiberg_jpc_1995 (); zhan_jcp_1998 (); tomasi_chemrev_2005 (); cramer_accchemres_2008 () and some of these wiberg_jpc_1995 (); zhan_jcp_1998 () could also be fitted within the SCCS formulation; nonetheless such extensive comparison is beyond the purpose of this article.
In addition to the parameters reported in Eq. (68), the proposed methodologies rely on a certain number of computational parameters. These are purely numerical quantities, that influence the speed and stability of the calculations but, apart from pathological cases, should not affect the calculated results. These parameters include the standard parameters of planewave periodic boundary calculations (wavefunction and density cutoffs, cell size, pseudopotentials) together with some solvent specific quantities, such as the tolerances of the iterative approach and , the parameter entering into the polarization density mixing approach, the finite difference parameter used in the calculation of the molecular surface, and the radii of the fictitious ionic densities used in the Ewald evaluation of electrostatic contributions. While, in theory, these parameters can be freely adjusted to make the method more efficient and more stable, a careful analysis of their effects on the final results is mandatory and is reported in Appendix A (APPENDIX A: Effect of numerical parameters).
vii.2.1 Fitting of tunable parameters
In particular, since the proposed model mostly focuses on the electrostatic contribution to solvation free energies, it is difficult and even not physical to fit tunable parameters of the model to reproduce total solvation free energies of molecules. Indeed, such a fit would intrinsically rely on cancellation of errors and would be highly dependent on the solutes and solvents considered. Furthermore, it is mostly the electrostatic part of solvation that plays a role in determining the molecular properties of a solute in solution, such as solvent effects on molecular spectra (IR, UV or NMR) or on reaction rates. An ideal fitting would thus require parametrizing the model on these molecular properties. Since already existing continuum models of solvation have generally achieved a very good description of molecular properties in solution, and the IEFPCM predictions for electrostatic solvation energies have been successfully used by several methods cramer_accchemres_2008 (); curutchet_jcomputchem_2001 (); soteras_jmolstrtheo_2005 (); klamt_accchemres_2009 () as the basis onto which one could parametrize more complete models of solvation, we decided to use the electrostatic part of the results obtained with the PCM to parametrize the electrostatic part of our model, while the nonelectrostatic terms have been tuned in a second stage in order to improve the fit with respect to experimental total solvation free energies.
The two main parameters of the electrostatic part of SCCS to be fitted are the two density thresholds and that enter into the definition of the dielectric constant [Eq. (41). In particular, defines the onset of the dielectric and, for the flatness conditions (see Section IV) to be fulfilled, should correspond to a density isosurface as far from the nuclei as possible. Thus we consider only values lower than 0.005 a.u. for . As for , since the electronic density in planewave codes is computed via FFTs and can present small oscillations also in regions of space far from the molecule, it is important to define this threshold such that it is not influenced by numerical noise in the electronic density. For this reason only values of higher than 0.00005 a.u. have been considered in the fit.
Mean absolute errors (MAE) relative to PCM electrostatic solvation energies as a function of and are reported in Figures (7) and (8), for G09 and G03 respectively, and in the Supplementary Material suppmat (). From both graphs, it appears that there is no unique optimal fit for the two parameters and . In fact, for each value of there exists a value of that provides the optimal fit with respect to the reference. This observation offers some additional freedom in the choice of the two parameters.
The agreement with the G09 results is excellent with a mean absolute error (MAE) lower than 0.4 kcal/mol on the 13 benchmark molecules of the training set (see Figure (7a)). Out of the possible choices of the two parameters, the values of a.u. and a.u. were selected (fitg09), giving a minimal MAE of 0.36 kcal/mol for the 13 molecules of the training set. The MAE is found to further decrease to 0.27 kcal/mol for the full set of 240 molecules reflecting the remarkable transferability of the fit performed on only 13 benchmark molecules. This results is made more remarkable by the fact that our model relies on only two parameters for the definition of the cavity hosting the solute, at variance with PCM that involves in its simplest implementation a different empirical parameter for each atomic species.
Parametrizing the method on the electrostatic solvation energies obtained with the G03 version of the PCM model resulted in a less close agreement, with MAE of the order of 1.0 kcal/mol (see Figure (8a)); again, also in this case, results improve when the full set of 240 neutral molecules is considered. In particular, using a.u. and a.u. (fitg03), an average error of 0.95 kcal/mol is obtained (see Figure (8b)). By comparing Figures (7a) and (8a), it is important to notice that, for each value of , the value of that gives the best fit of the PCM results in the G03 case is always higher than the corresponding G09 fit. This corresponds to a cavity that is closer to the solute, and is a behavior to be expected. The reason for such a result lies in the different definition of the molecular cavity in the two version of the model, the G09 version relying, in the default implementation, on a larger solute cavity scalmani_jcp_2010 (). Similarly, the reason for the less good agreement between SCCS and PCM G03 results is most likely due to the more elaborate definition of the molecular cavity in PCM G03, where an additional set of spheres was added to the cavity to smooth out intersecting regions.
As a further test of the flexibility of the proposed SCCS model, a comparison of the cavitation energies computed with Eq. (58) versus that of the ClaveriePierotti method implemented in PCM was performed and reported in Figure (9). Also in this case, even though the two methods are based on different physical assumptions, results show a remarkable agreement. Errors of the order of 0.8 kcal/mol were found on the computed cavitation energies, despite the fact that they are typically four times larger than solvation energies. Calculations of on the whole set of molecules were performed using the experimental value of dyn/cm for liquid water at room temperature and the fitg09 set of parameters, since these set lie close to the region of best match with the PCM cavitation energies (see Figure (9)).
Eventually, while keeping the two parameters of the cavity fixed to either the fitg09 or fitg03 values, and assuming a surface tension dyn/cm for water at room temperature, the parameters and entering into the definition of the nonelectrostatic terms, Eq. (67), have been optimized to best reproduce the experimental results of solvation free energies (see Figures (10) and (11)).
In both cases, an almost perfectly linear behavior was found for the error as a function of the two parameters. This is probably due to the small variation in the size of the molecules considered in the training set, which makes the volume and the surface terms to be linearly related (see Figure (12)). When looking at the behavior of the quantum volume with respect to the quantum surface of the solute for the 240 molecules of the full set, we found a trend in between the one expected for an ideally spherical system, for which
(70) 
and a linear system, composed by a cylinder of length , terminated by hemispheres, and of radius equal to an average atomic size ( a.u.), for which
(71) 
As a result of this almost linear behavior that reflects the corrugation of the solvation shell, we are again left with the freedom of choosing one of the two parameters, while performing an optimal fit on the other. In an effort to simplify the method and conform to other solvation models in the literature, we decided to choose the sum of the surface parameters and such that the volume contribution vanishes, i.e. in such a manner that the corresponding optimal value of is equal to zero. This choice corresponds to a value of dyn/cm for the fitg09 set of parameters, or a value of dyn/cm in the fitg03 case. Total solvation energies for the 13molecules training set with these choices of parameters resulted in a MAE with respect to experimental results of about 1.6 kcal/mol (see Tables in the Supplementary Material suppmat ()). When considering the whole set of neutral molecules, it is clear that the fitg03 set of parameters presents the best trend with respect to experimental data. This results in a better agreement with experiment for the fitg03 set compared to the fitg09, with MAEs of 1.31 kcal/mol and 1.53 kcal/mol respectively.
Other fits were performed and tested on the full set of molecules, to check the consistency of the results along different combinations of parameters (see Table (1)). It is instructive to compare two choices of parameters that best reproduce the electrostatic energies computed with G03, namely the fitg03 and fitg03’ sets in Table (1). In this case, it appears that MAEs of comparable magnitude on the 13molecule training set (see Figure (8)) correspond to similar errors on the full set of molecules, thus further validating the choice of the fitting set. Moreover, it also appears that relaxing the condition on the second nonelectrostatic parameter has minor effects on the overall ability of the method to reproduce experimental energies, with MAE improving by 0.020.04 kcal/mol for two of the sets considered. In the third set considered (fitg03) an improvement on the quality of the fit of the order of 0.11 kcal/mol was found. Although not explicitly parametrized for it, this last set of parameters (fitg03+) is also the one that better reproduces the solvation energy of water in water, with an excellent agreement with the experimental value of 6.30 kcal/mol (see Table (1)). Nonetheless, the relative improvement in the accuracy of the results, obtained by relaxing the condition on the parameter, is still marginal compared to the overall performances of the method.
Name  (a.u.)  (a.u.)  (GPa)  MAE  MAE  MAE  
(dyn/cm)  (kcal/mol)  (kcal/mol)  (kcal/mol)  (kcal/mol)  
fitg09  0.0001  0.0015  2.5  0.0  5.13    0.27  1.53 
fitg09+  0.0001  0.0015  11  0.08  4.90    0.27  1.51 
fitg03  0.0001  0.0050  11.5  0.0  7.40  0.95    1.31 
fitg03+  0.0001  0.0050  50  0.35  6.29  0.95    1.20 
fitg03’  0.0003  0.0030  12  0.0  7.55  0.90    1.32 
fitg03’+  0.0003  0.0030  20  0.08  7.35  0.90    1.28 
Indeed, reported errors show that the proposed method is not yet able to be used as a tool to quantitatively predict free energies of solvation. Nonetheless, when compared with similar computational methods, the above results are remarkable, since they have been obtained with a model based on only two independent parameters. In particular, while the model of Cramer and Truhlar is able to achieve a MAE of the order of 0.55 kcal/mol for solvation of neutral compounds in aqueous solution cramer_accchemres_2008 (), it relies on a much higher number of tunable parameters. Similarly, the use of more tunable methods to describe the dispersion and repulsion contributions were shown to improve the agreement with experiment for other IEFPCM based methods, with MAE of 0.58 kcal/mol and 1.01 kcal/mol reported for the COSMOtherm and the MST methods (see Ref. klamt_accchemres_2009 () and references therein).
vii.2.2 Errors vs functional groups
In order to better understand which physical aspects mostly limit the accuracy of the method, the errors on solvation energies, computed with the fitg03+ parameters have been reported in Figure (13) for all the 240 molecules of the extended set classified according to their main functional groups. The performances of the method appear to be very good for a large set of organic molecules, including linear and cyclic alkanes, alkenes, arenes, aldehydes, ketones, and esters. With the noticeable exception of fluorine, etheroatoms and halogens do not affect the accuracy of the results, and compounds containing chlorine, bromine, iodine, mono and di sulfides, nitro and nitrile groups show reasonable errors (the largest deviations being less than 1.0 kcal/mol), well within the accuracy of the best implicit solvation computational methods. Considering only the above classes of compounds, a moderate MAE of 0.46 kcal/mol is obtained.
Instead, the performance of the method is poor for two specific types of functional groups: amines and carboxylic acids, that show positive (amines) and negative (acids) errors on solvation energies of up to 45 kcal/mol. The acid and basic natures of these two classes of compounds is probably the cause of large discrepancies between computed and experimental results. In particular, SCCS completely neglects the possibility that different species of the same solute could be in equilibrium in solution, corresponding to more or less protonation/deprotonation. Thermodynamically meaningful solvation energies should instead be computed as a weighted average of the solvation energy of the neutral species and that of the protonated/deprotonated species. A refinement of the fit to include the effects of acidbase equilibria will be the subject of further developments.
In addition to the acid/basic compounds, poor results are obtained also for ethers, alcohols and amides. Whether this is due to the lack of a correct description of the hydrogen bonding environment with a continuum model is still under study. Nonetheless, the better performance of alcohols with respect to ethers seem to suggest that hydrogen bonding should not be the only reason for such discrepancies. Fluorine compounds, and in particular multifluorinated alcohols and ethers, also show poor results compared to experiment. This effect could be again due to the lack of explicit hydrogen bonding in SCCS, that should affect fluorinated compounds more than other halogenated molecules. Moreover, the discrepancies in the computed results could be due to the different numerical treatment given to molecules containing fluorine, where higher density and wavefunction cutoffs were used in order to compensate for the hardness of the fluorine pseudopotentials. Eventually, alkynes and branched alkanes show somewhat poor agreement with experiment, probably due to the unideal description of the cavitation and repulsion potentials.
vii.2.3 Molecular dynamics simulations
In order to validate the accuracy of the proposed method for MD simulations, a prototypical system composed by a cyclic tetramer of deuterated water molecules has been studied, similarly to what was done in Ref. scherlis_jcp_2006 (). As reported in the numerical details subsection VII.1, convergence of constant energy simulations in vacuum, as reflected by total energy conservation, required to sensibly tighten some of the key parameters of the simulation. When the same parameters of the calculations in vacuum were adopted for a simulation in the presence of continuum aqueous solution, a sensible worsening of the accuracy of the method was found. This behaviour is due to a small inaccuracy in the calculation of the forces, that has no significant effects on the geometry relaxations of the tested molecules. Such an error could be traced back to be related to the use of a simplified 3points central difference schemes in the evaluation of the gradient of the dielectric that appears in the definition of the polarization charges (see, e.g., Eqs. (10) and (44)). Several alternative and more advanced schemes can be adopted for a finite difference calculation of such a gradient, including higher order central differences or smooth noiserobust differentiators snrd (). By exploiting a number of points in the discretized space larger than three, these schemes can produce an unphysical nonvanishing gradient in the flat regions of space close to the solventvacuum interface. Nonetheless, the more accurate description of the gradient in the interfacial region produces interatomic forces more accurate than the ones obtained by the simplest 3points central difference method, thus resulting in total energy conservation of the same quality of the simulation in vacuum (compare left and right panel of Figure (14)).
Viii Conclusions
A revised selfconsistent continuum solvation model has been presented that overcomes most of the limitations of previous approaches and includes, in a simplified way, additional nonelectrostatic contributions to solvation free energies via a combined use of the concepts of quantum volume and quantum surface cococcioni_prl_2005 () for isolated fragments. The model has been implemented using a novel iterative approach that is robust, efficient, fully parallel and convenient to implement in electronicstructure codes, both for isolated or periodic geometries independently of the charge density representation or the Bravais lattice chosen. Parametrization of the method to reproduce the experimental results has been performed, and the overall performance over an extended set of solvation free energies in water compares favorably with the results from reference theoretical methods of similar physical background and more parameter intensive. The static dielectric constant of the solvent and one parameter allow to fit the electrostatic energy provided by the PCM model with a mean absolute error of 0.3 kcal/mol on a set of 240 neutral solutes, and two parameters allow to fit experimental solvation energies on the same set with a mean absolute error of 1.3 kcal/mol. A detailed analysis of these results, broken down along different classes of chemical compounds, shows that most molecules display even closer agreement, whereby larger errors are mostly limited to selfdissociating species and strong hydrogenbond forming compounds. Last, a careful analysis of the effects of numerical parameters on the predictions of the method have been presented, to make present results fully reproducible.
Acknowledgements.
This work was carried our as part of the MIT Energy Initiative, with the support of Robert Bosh LLC. The authors would like to thank D. Scherlis, J.L. Fattebert, C.K. Skylaris, C.H. Park and J. Tomasi for a careful reading of the manuscript and their suggestions. We are particularly grateful to B. Mennucci for useful discussions and her help with the continuum solvation literature.APPENDIX A: Effect of numerical parameters
In an effort of making the present results reproducible and to study the accuracy of the model proposed, a careful analysis of the effect of all other numerical and computational parameters on solvation energies is mandatory. In fact, only a few of these have been discussed and shown to have a negligible effect on the final results. This is the case of the finitedifference parameter that appears in the definition of the cavitation energy, Eq. (60), which was found to have negligible effects on the computed energies scherlis_jcp_2006 (). For this reason, a value of a.u. was used throughout the simulations. Similarly, as discussed in Section V, the parameters entering into the iterative algorithm to compute the polarization charges show no significant impact on the final results.
The main parameter that enters into an abinitio calculation is the size of the basis set, that in the planewave codes for which the model was developed corresponds to the kinetic energy cutoffs of the plane waves used to represent the wavefunctions and electronic density. The density cutoff, in particular, corresponds to the grid size in real space used in all the calculations of the polarization density (see Section (V)) and, as such, it represents a crucial numerical parameter of the model. Since the electronic density is given by the square of the wavefunction, a cutoff on the density four times larger than the one on the wavefunction is usually adopted. Nonetheless, the use of ultrasoft pseudopotential, while allowing to lower the planewave cutoffs for the description of the wavefunction, requires a much higher ratio between the two cutoffs, with density cutoffs 8 to 10 times larger than the ones on wavefunctions.
Convergence of the electrostatic solvation free energy with wavefuntion cutoffs for the thirteen molecules of the fitting set is reported in Figure (15). Results show a reasonably fast convergence with respect to wavefunction cutoff: while variations on the computed energies are of the order of the kcal/mol even for the larger cutoffs considered, solvation free energies are well converged for wavefunction cutoffs of the order of 30 Ry. At lower cutoffs a small subset of molecules present errors on solvation energies of the order of 0.51.0 kcal/mol: since the molecules on this subset are the ones containing oxygen, it seems that the error is mostly related to the use of a pseudopotential for oxygen that is harder than the ones used for the other atomic types. The fact that changing the solvent parameters has a negligible effect on the convergence of the results also supports the conclusion that the solvation calculation is independent on the choice of the wavefunction cutoffs.
Different results are obtained for the convergence with respect to the density cutoff, as shown in Figure (16). Convergence is slightly slower than with respect to wavefunction cutoffs, with errors of the order of 0.20.3 kcal/mol for the normal range of this parameter (i.e. between 300 and 400 Ry). This residual error appear to be mostly related in the first place to the sharpness of the polarization charge and potentials. Moreover, the numerical details of the method could also be crucial, as reflected by the fact that the polarization charge density is not neutral. In particular, the calculation of that enters into Eq. (44) is performed using a simplified approach, that involves a centered threepoint finite difference algorithm in real space. Nonetheless, for the range of solvent parameters investigated, the error is generally less than 0.1 kcal/mol, thus well within the typical accuracies of conventional continuum solvation models.
In Figure (17) the cellsize dependence of the electrostatic contribution to solvation free energies is reported, for the thirteen molecules of the fitting set, with and without corrections for periodicboundary conditions. As already mentioned, cell sizes were chosen to be equal to the maximum size of the molecule optimized in vacuum, plus an adjustable amount of 10 to 40 a.u. (denoted cell extra size in Figure (17)). Since calculations are performed for a solvent with a high dielectric constant, screening of solute charges makes the effects of periodicboundary conditions relatively small. When including corrections for periodicboundary conditions, such as in the MakovPayne scheme makov_prb_1995 (), no significant variations were found for solvation energies calculated from Eq. (69), while large changes of the energy in vacuum appear. Thus, the error is mostly related to the accuracy of the calculation in vacuum. Moreover, the overall error was found to be lower than 0.1 kcal/mol for most of the molecules in the fitting set, while larger errors up to 0.5 kcal/mol were found for the molecules with larger dipole moments. When a correction for periodic images is explicitly accounted for, convergence is quite fast even for the smallest cell sizes considered. More advanced and accurate periodic boundary correction schemes, such as the densitycountercharge correction (DCC) dabo_prb_2008 (), can be easily extended to calculations with the SCCS, by explicitly considering the polarization charge density.
Last, the effect on solvation free energies of the fictitious atomic density used to compute the electrostatic field of nuclei and core electrons has to be taken into account (see Figure (18)). In order to study such an effect, we assumed the nuclei to be described by Gaussians of fixed spread that do not depend on the atomic type. This is a typical approximation in planewave codes, and the correct electrostatic is usually recovered in the total energy by including Ewald complementaryerrorfunction electrostatic terms. Note that more advanced approaches can be adopted, e.g. by defining Gaussian spreads to depend on pseudopotential radii, or by explicitly accounting for the core electronic density.
As explained in Section (IV), in order for the energy of the solvated system to be well defined, the fictitious ionic charge density should be entirely included in the solvent exclusion region. Such a region is directly related to the parameters and used in the definition of the dielectric constant. In particular, a high value of implies the presence of dielectric contributions close to the nuclei. As a result, for higher values of , the effect of the ionic density becomes sensitive at lower values of . Nonetheless, in the whole range of density thresholds considered we always obtain well converged results for Gaussian spreads in the window a.u..
References
 (1) J. L. Fattebert and F. Gygi, Journal of Computational Chemistry 23, 662 (2002)
 (2) D. A. Scherlis, J. L. Fattebert, F. Gygi, M. Cococcioni, and N. Marzari, Journal of Chemical Physics 124, 074103 (2006)
 (3) M. Cococcioni, F. Mauri, G. Ceder, and N. Marzari, Physical Review Letters 94, 145501 (2005)
 (4) J. Tomasi and M. Persico, Chemical Reviews 94, 2027 (1994)
 (5) C. J. Cramer and D. G. Truhlar, Chemical Reviews 99, 2161 (1999)
 (6) M. Orozco, Chemical Reviews 100, 4187 (2000)
 (7) J. Tomasi, B. Mennucci, and R. Cammi, Chemical Reviews 105, 2999 (2005)
 (8) K. Laasonen, M. Sprik, M. Parrinello, and R. Car, Journal of Chemical Physics 99, 9080 (1993)
 (9) M. Sprik, J. Hutter, and M. Parrinello, Journal of Chemical Physics 105, 1142 (1996)
 (10) J. C. Grossman, E. Schwegler, E. W. Draeger, F. Gygi, and G. Galli, Journal of Chemical Physics 120, 300 (2004), ISSN 00219606
 (11) J. VandeVondele, F. Mohamed, M. Krack, J. Hutter, M. Sprik, and M. Parrinello, Journal of Chemical Physics 122, 014515 (2005)
 (12) P. H. L. Sit and N. Marzari, Journal of Chemical Physics 122, 204510 (2005)
 (13) H. A. Stern and B. J. Berne, Journal of Chemical Physics 115, 7622 (2001)
 (14) B. Chen, I. Ivanov, M. L. Klein, and M. Parrinello, Physical Review Letters 91, 215503 (2003)
 (15) J. A. Morrone and R. Car, Physical Review Letters 101, 017801 (2008)
 (16) P. L. Silvestrelli and M. Parrinello, Physical Review Letters 82, 3308 (1999)
 (17) A. Pasquarello and R. Resta, Physical Review B 68, 174302 (2003)
 (18) M. Sharma, R. Resta, and R. Car, Physical Review Letters 98, 247401 (2007)
 (19) L. Onsager, Journal of the American Chemical Society 58, 1486 (1936)
 (20) B. Mennucci, E. Cances, and J. Tomasi, Journal of Physical Chemistry B 101, 10506 (1997)
 (21) E. Cances and B. Mennucci, Journal of Mathematical Chemistry 23, 309 (1998)
 (22) A. Klamt and G. Schuurmann, Journal of the Chemical Society, Perkin Transactions 2, 799(1993)
 (23) S. Corni and J. Tomasi, Journal of Chemical Physics 114, 3739 (2001)
 (24) J. L. Fattebert and F. Gygi, International Journal of Quantum Chemistry 93, 139 (2003)
 (25) J. Dziedzic, H. H. Helal, C. K. Skylaris, A. A. Mostofi, and M. C. Payne, Epl 95 (2011)
 (26) V. M. Sanchez, M. Sued, and D. A. Scherlis, Journal of Chemical Physics 131, 174108 (2009)
 (27) J. L. Pascualahuir and E. Silla, Journal of Computational Chemistry 11, 1047 (1990)
 (28) E. Silla, F. Villar, O. Nilsson, J. L. Pascualahuir, and O. Tapia, Journal of Molecular Graphics 8, 168 (1990)
 (29) G. Scalmani and M. J. Frisch, Journal of Chemical Physics 132, 114110 (2010)
 (30) F. Lipparini, G. Scalmani, B. Mennucci, E. Cances, M. Caricato, and M. J. Frisch, Journal of Chemical Physics 133, 014106 (2010)
 (31) H. M. Senn, P. M. Margl, R. Schmid, T. Ziegler, and P. E. Blochl, Journal of Chemical Physics 118, 1089 (2003)
 (32) A. W. Lange and J. M. Herbert, Journal of Chemical Physics 133 (2010)
 (33) J. B. Foresman, T. A. Keith, K. B. Wiberg, J. Snoonian, and M. J. Frisch, Journal of Physical Chemistry 100, 16098 (1996)
 (34) K. B. Wiberg, T. A. Keith, M. J. Frisch, and M. Murcko, Journal of Physical Chemistry 99, 9072 (1995)
 (35) D. M. Chipman, Journal of Chemical Physics 124, 224111 (2006)
 (36) D. G. Anderson, Journal of the Acm 12, 547 (1965)
 (37) D. D. Johnson, Physical Review B 38, 12807 (1988)
 (38) P. Pulay, Chemical Physics Letters 73, 393 (1980)
 (39) G. Makov and M. C. Payne, Physical Review B 51, 4014 (1995)
 (40) I. Dabo, B. Kozinsky, N. E. SinghMiller, and N. Marzari, Physical Review B 77, 115139 (2008)
 (41) Y. Li and I. Dabo, Physical Review B 84, 155127 (2011)
 (42) A. BenNaim, Solvation Thermodynamics (Plenum Press: New York, 1987)
 (43) A. BenNaim, Statistical Thermodynamics for Chemists and Biochemists (Plenum Press: New York, 1992)
 (44) C. Amovilli and B. Mennucci, Journal of Physical Chemistry B 101, 1051 (1997)
 (45) C. J. Cramer and D. G. Truhlar, Accounts of Chemical Research 41, 760 (2008)
 (46) D. Shivakumar, J. Williams, Y. J. Wu, W. Damm, J. Shelley, and W. Sherman, Journal of Chemical Theory and Computation 6, 1509 (2010)
 (47) “Supplementary material document no. for information on supplementary material, see http://www.aip.org/pubservs/epaps.html,”
 (48) M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J. Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C. Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. AlLaham, C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople, “Gaussian 03, Revision C.02,” Gaussian, Inc., Wallingford, CT, 2004
 (49) M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, B. Mennucci, G. A. Petersson, H. Nakatsuji, M. Caricato, X. Li, H. P. Hratchian, A. F. Izmaylov, J. Bloino, G. Zheng, J. L. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, J. A. Montgomery, Jr., J. E. Peralta, F. Ogliaro, M. Bearpark, J. J. Heyd, E. Brothers, K. N. Kudin, V. N. Staroverov, R. Kobayashi, J. Normand, K. Raghavachari, A. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, N. Rega, J. M. Millam, M. Klene, J. E. Knox, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, R. L. Martin, K. Morokuma, V. G. Zakrzewski, G. A. Voth, P. Salvador, J. J. Dannenberg, S. Dapprich, A. D. Daniels, O. Farkas, J. B. Foresman, J. V. Ortiz, J. Cioslowski, and D. J. Fox, “Gaussian 09 Revision A.1,” Gaussian Inc. Wallingford CT 2009
 (50) A. Bondi, Journal of Physical Chemistry 68, 441 (1964)
 (51) M. L. Connolly, Journal of Applied Crystallography 16, 548 (1983)
 (52) J. L. Pascualahuir, E. Silla, and I. Tunon, Journal of Computational Chemistry 15, 1127 (1994)
 (53) V. Barone, M. Cossi, and J. Tomasi, Journal of Chemical Physics 107, 3210 (1997)
 (54) A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. Goddard, and W. M. Skiff, Journal of the American Chemical Society 114, 10024 (1992)
 (55) P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. Dal Corso, S. de Gironcoli, S. Fabris, G. Fratesi, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. MartinSamos, N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov, P. Umari, and R. M. Wentzcovitch, Journal of Physics: Condensed Matter 21, 395502 (2009)
 (56) A. D. Corso, “Pslibrary version 0.1,” http://qeforge.org/projects/pslibrary
 (57) M. Authors, “Quantumespresso online pseudopotential library,” http://www.quantumespresso.org/pseudo.php
 (58) M. Cossi, G. Scalmani, N. Rega, and V. Barone, Journal of Chemical Physics 117, 43 (2002)
 (59) C. G. Zhan, J. Bentley, and D. M. Chipman, Journal of Chemical Physics 108, 177 (1998)
 (60) C. Curutchet, M. Orozco, and F. J. Luque, Journal of Computational Chemistry 22, 1180 (2001)
 (61) I. Soteras, C. Curutchet, A. BidonChanal, M. Orozco, and F. J. Luque, Journal of Molecular StructureTheochem 727, 29 (2005)
 (62) A. Klamt, B. Mennucci, J. Tomasi, V. Barone, C. Curutchet, M. Orozco, and F. J. Luque, Accounts of Chemical Research 42, 489 (2009)
 (63) P. Holoborodko, “Smooth noise robust differentiators,” http://www.holoborodko.com/pavel/numericalmethods/numericalderivative/smoothlownoisedifferentiators/ (2008)