Basis set construction for molecular electronic structure theory:
Natural orbital and Gauss-Slater basis for smooth pseudopotentials
A simple yet general method for constructing basis sets for molecular electronic structure calculations is presented. These basis sets consist of atomic natural orbitals from a multi-configurational self-consistent field calculation supplemented with primitive functions, chosen such that the asymptotics are appropriate for the potential of the system. Primitives are optimized for the homonuclear diatomic molecule to produce a balanced basis set. Two general features that facilitate this basis construction are demonstrated. First, weak coupling exists between the optimal exponents of primitives with different angular momenta. Second, the optimal primitive exponents for a chosen system depend weakly on the particular level of theory employed for optimization. The explicit case considered here is a basis set appropriate for the Burkatzki-Filippi-Dolg pseudopotentials. Since these pseudopotentials are finite at nuclei and have a Coulomb tail, the recently proposed Gauss-Slater functions are the appropriate primitives. Double- and triple-zeta bases are developed for elements hydrogen through argon. These new bases offer significant gains over the corresponding Burkatzki-Filippi-Dolg bases at various levels of theory. Using a Gaussian expansion of the basis functions, these bases can be employed in any electronic structure method. Quantum Monte Carlo provides an added benefit: expansions are unnecessary since the integrals are evaluated numerically.
In quantum chemistry (QC) calculations, molecular orbitals are traditionally expanded in a combination of primitive Gaussian basis functions and linear combinations of Gaussian primitives called contracted basis functions DunningJr1989 (). These basis sets cannot express the correct molecular orbital asymptotic behavior but are used in QC calculations to permit analytic evaluation of the two-electron integrals Boys1950 ().
Analytic integral evaluation significantly limits flexibility in basis set choice but is essential for computational efficiency in QC calculations. However, in practice, other basis function forms can be considered since an arbitrary function can be expanded in Gaussians. Of course, the fidelity of this representation is limited. An expansion in a finite number of Gaussians cannot reproduce the exponential decay of the wavefunction at large distances or the Kato cusp conditions Kato1957 () at nuclei, but it can mimic these features over a finite range.
Quantum Monte Carlo (QMC) calculations Foulkes2001 () offer greater freedom in choice of basis functions because matrix elements are evaluated using Monte Carlo integration. Consequently, the correct short- and long-distance asymptotics can be satisfied exactly. For systems with a divergent nuclear potential, Slater basis functions can exactly reproduce the correct electron-nucleus cusp and long-range asymptotic behavior of the orbitals. For calculations on systems with a potential that is finite at the nucleus and has a Coulomb tail, Gauss-Slater (GS) primitives Petruzielo2010 () are the appropriate choice since they introduce no cusp at the origin and reproduce the exponential long-range asymptotic behavior of the orbitals.
Despite shortcomings, traditional QC basis sets have yielded good results. The natural orbitals (NOs) from a post Hartree-Fock (HF) method are a particularly successful form of contracted function Almlof1987 (); Widmark1990 (); Widmark1991 (); Veryazov2004 (). The simplest NO construction involves diagonalizing the one-particle density matrix from a ground state atomic calculation Almlof1987 (). This construction is unbalanced due to obvious bias favoring the atom. More complicated constructions involve diagonalizing the average one-particle density matrix of several systems: atomic ground and excited states, ions, diatomic molecules, and atoms in an external electric field Widmark1990 (); Widmark1991 (); Veryazov2004 (). These constructions produce excellent results, but they are complex.
A simple but general method for constructing basis sets for molecular electronic structure calculations is proposed and tested here. The bases are combinations of the NOs obtained from diagonalizing the one-particle density matrix from an atomic multiconfigurational self-consistent field (MCSCF) calculation and primitive functions appropriate for the potential in the system. The primitives are optimized for the homonuclear dimer in coupled cluster calculations with single and double excitations (CCSD), with the intention of producing a balanced basis set. Importantly, optimal exponents for the primitive functions are shown to depend weakly on the level of theory used in the optimization. Additionally, results show that coupling is weak between primitive functions of different angular momenta. This enables efficient determination of optimal exponents.
The utility of the above construction is demonstrated for the elements hydrogen through argon with the non-divergent pseudopotentials of Burkatzki, Filippi, and Dolg (BFD) Burkatzki2007 (). Since these pseudopotentials are finite at the nuclei and have a Coulomb tail, the GS functions are the appropriate primitives. These pseudopotentials are chosen for demonstrated accuracy in all cases tested and because they are accompanied by a basis set. The BFD basis Burkatzki2007 () serves as a metric for testing the new basis. The benefits of our bases extend to all electronic structure methods tested, including CCSD, HF, the Becke three-parameter hybrid density functional (B3LYP) Becke1993 (), and QMC.
The main area of interest for the authors is QMC. Since QMC results depend less on basis set than traditional QC methods Petruzielo2010 (), only double-zeta () and triple-zeta () bases are presented.
Ii Basis Set
The number of basis functions for each angular momentum follows the correlation consistent polarized basis set prescription of Dunning DunningJr1989 (). and bases appropriate for the BFD pseudopotentials are generated for the elements hydrogen through argon. Since the BFD pseudopotential removes no core for hydrogen and helium, the basis for these elements consists of two S functions and one P function, while the basis consists of three S functions, two P functions, and one D function. Since the BFD pseudopotential removes a helium core for the first row atoms and a neon core for the second row atoms, the remaining elements lithium through argon have the same number of basis functions. In particular, the basis consists of two S functions, two P functions, and one D function, while the basis consists of three S functions, three P functions, two D functions, and one F function.
The bases consist of a combination of contracted and primitive functions. Since the BFD pseudopotentials are finite at the origin and have a Coulomb tail, the GS functions are the appropriate primitives. With the exception of the elements in Group of the periodic table (i.e. H, Li, and Na), the basis for each element includes a single S contraction and a single P contraction combined with an appropriate number of GS primitives. Only two contractions are employed to reduce the computational cost of using this basis in QC calculations. Since elements in Group of the periodic table have only one electron for the BFD pseudopotentials, a single S orbital is the ground state wavefunction, and this can be obtained exactly in HF. Thus, the basis for each element in Group includes a single S contraction, no P contractions, and an appropriate number of GS primitives.
ii.1 Contracted Functions
A contracted basis function is a linear combination of Gaussian primitives:
where are the standard spherical coordinates, is the principal quantum number, is the azimuthal quantum number, is the magnetic quantum number, is a real spherical harmonic, is the expansion coefficient, and is the Gaussian exponent. In practice, the restriction applies.
The exponents of the primitive functions that form the contracted basis functions are determined as follows. For each angular momentum for which a contraction is desired, an uncontracted basis consisting of nine even-tempered primitive Gaussians is generated. For each set of uncontracted Gaussians, the minimum exponent and even-tempering coefficient are varied to minimize the CCSD energy of the atom using a Python wrapper around GAMESS Schmidt1993 ().
An assumption of weak coupling between the different angular momenta underlies the optimization procedure. Consequently, the uncontracted basis for each angular momentum is optimized separately. This optimization is performed by calculating the CCSD energy on an initially coarse grid composed of different minimum exponents and even-tempering coefficients. Once regions of low CCSD energy are identified, a finer grid is used to obtain the final minimum exponent and even-tempering coefficient. In addition to the assumption of weak coupling, two other properties of the problem make this global optimization possible with modest computer resources; low dimensionality of search space and efficiency of atomic CCSD calculations.
Next, an atomic MCSCF calculation in a complete active space (CAS) with the optimized uncontracted basis is performed in GAMESS. For these calculations, all electrons not removed by the pseudopotential are allowed to excite. For helium, the active space consists of the orbitals from the and shells. For beryllium through neon, the active space includes the orbitals from the and shells. For magnesium through argon, the active space is composed of the orbitals from the and shells, with the exception of the D and F orbitals. A subset of the natural orbitals from the MCSCF calculations are used as the contracted functions of our basis.
All atomic calculations are performed in symmetry since GAMESS does not permit imposition of full rotational symmetry. Hence, different components of the same atomic subshell are not necessarily equivalent. Additionally, mixing may occur among orbitals of different angular momenta. For instance, there is mixing of S orbitals with both D and D orbitals. This anisotropy can be removed by averaging the different components of a particular subshell and zeroing out the off-diagonal blocks of the one-particle density matrix Widmark1990 ().
A simpler approach taken in this work is found to produce results of similar quality. For each angular momentum for which a contraction is desired, the NO with that angular momentum which has the largest occupation number is chosen. Additionally, NO elements which do not correspond to the dominant character of the orbital are zeroed out. For instance, an NO with large coefficients on the S basis functions and small coefficients on the D basis functions is considered to be dominated by S character, so the D coefficients are zeroed out. Finally, the NOs are normalized. The NOs selected in this procedure generate the contracted functions for the basis set. The expansions of the contractions are given in the supplementary material supplementary ().
ii.2 Gauss-Slater Primitives
GS functions Petruzielo2010 () are defined as
where is the GS exponent and is the normalization factor. The restriction is imposed for GS functions. For , the GS behaves like a Gaussian:
and for , the GS behaves like a Slater:
Consequently, GS functions introduce no cusp at the origin and can reproduce correct long-range asymptotic behavior of the orbitals.
Unlike Gaussians and Slaters, normalization of GSs has no closed form expression. Nevertheless, normalizing an arbitrary GS is trivial with the following scaling relation between and :
Values for are given in the supplementary material supplementary ().
Since GSs are not analytically integrable, the radial part must be expanded in Gaussians for use in QC programs that evaluate matrix elements analytically. The expansion is
where is the expansion coefficient and is the Gaussian exponent. Notice that the expansion permits the case for which for the GS function. Additionally, the following scaling relations hold for the expansion coefficients and Gaussian exponents:
Once the Gaussian expansions are found for unit exponents, expansions of arbitrary GSs follow immediately from the scaling relations. For QC calculations in this paper, GSs are expanded in six Gaussians. However, if the purpose of the initial QC calculation is to generate crude starting orbitals for QMC calculations in which orbital optimization is performed, it is only necessary to expand GS primitives in a single Gaussian. In this case, the cost of QC calculations is the same for Gaussian and GS primitives. The expansions of GS functions with unit exponent in both one and six Gaussians are given in the supplementary material supplementary ().
As mentioned above, the restriction is imposed for GS functions, instead of the more familiar restriction imposed for Gaussian primitives. This motivates construction of two types of bases. In the first, ANO-GS, the restriction is enforced. In the second, ANO-GSn, for each there can be at most a single GS primitive with a particular . For each additional primitive with a particular , must be incremented.
For example, consider lithium. The ANO-GS basis has one S contraction, one GS-1S function, two GS-2P functions, and one GS-3D function. On the other hand, the ANO-GSn basis has one S contraction, one GS-1S function, one GS-2P function, one GS-3P function, and one GS-3D function.
A caveat to the above definition of the ANO-GSn basis is that GS-2S functions are not permitted since a single GS-2S function will introduce an undesired cusp in the wavefunction. Additionally, the ANO-GS and ANO-GSn basis sets are identical for all elements except lithium and sodium. When the ANO-GS and ANO-GSn basis sets are identical, the basis sets are referred to as a ANO-GS/GSn basis. For both lithium and sodium, the basis sets differ because these systems have no P contractions and instead have a second P primitive for the basis. This primitive is a GS-2P for the ANO-GS basis and a GS-3P for the ANO-GSn basis. Additionally, weak coupling between functions of different angular momentum causes the GS-1S and GS-3D functions in the ANO-GS bases for lithium and sodium to differ from their counterparts in the ANO-GSn bases. However, the optimal exponents differ by less than .
Optimal exponent selection for the GS primitives is discussed now. Instead of optimizing exponents for the atom as was done to generate the contractions, optimization of the GS exponents is performed for the homonuclear diatomic molecule at experimental bond length HubHer79 (); Gur79 (); NIST (); CCCBDB (); Grisenti2000 (); Bondybey1984 (); Aziz1989 (); Fu1990 (); Herman1988 (). This advantageously produces a balanced basis set.
Weak coupling between GS functions of different angular momenta is assumed, so the initial optimization for each angular momentum is performed separately. This assumption is validated in Figure 1, which contains plots of the CCSD energy for Si while varying individual GS exponents in the ANO-GS/GSn basis. Both the curve shape and exponent value which minimizes the energy vary little with fixed exponent value, signifying weak coupling between GS functions of different angular momentum.
The optimization is performed at the CCSD level of theory using a Python wrapper around GAMESS. For each angular momentum, an energy landscape is defined by a grid of primitive exponents ranging from to with spacing. Thorough investigation has revealed that exponents larger than are not optimal for the systems considered. Low lying minima of this energy landscape are then handled with increasingly finer grids until energy changes are less than mH. During this investigation of local minima, all angular momenta are handled simultaneously to account for any coupling effects. Results of this optimization are shown in Figure 2. Optimal exponents for ANO-GS and ANO-GSn bases exhibit a linear trend across each row of the periodic table. For nearly degenerate minima, the exponent following the trend in the figure is chosen as optimal, resulting in energy increase no greater than several mH. The optimal GS exponents are given in the supplementary material supplementary ().
In some cases, the optimal exponents for primitives with the same and are very close. This can lead to large equal and opposite coefficients on these basis functions when constructing molecular orbitals. Numerical problems could result, providing further motivation for the ANO-GSn basis, in which each pair of and is unique. However, all of our tests with the ANO-GS basis have had no numerical problems.
Finally, the optimal primitive exponents are found to depend weakly on the electronic structure method employed in the optimization, as demonstrated in Figure 3 for Si with the ANO-GS/GSn basis. The globally minimizing exponents are nearly equal in different methods. This exponent transferability to different levels of theory is extremely attractive for a basis set.
Section II demonstrates that the ANO-GS and ANO-GSn bases exhibit desirable properties. However, it remains to be shown that these basis sets produce accurate results. Fortunately, the basis set accompanying the BFD pseudopotential serves as a metric for testing ANO-GS and ANO-GSn basis quality. The BFD basis for elements in Groups and of the periodic table has recently been updated BFD (), but the number of functions in the new basis is inconsistent with the correlation consistent polarized basis prescription DunningJr1989 (). Since comparison would be difficult, their published functions are considered in this work.
Figure 4 shows the CCSD total energy gain per electron of the ANO-GS and ANO-GSn bases over the BFD bases Burkatzki2007 () for atoms and homonuclear dimers of hydrogen through argon. Energy gains per electron tend to increase across each row of the periodic table. Both ANO-GS and ANO-GSn bases yield energy gains for most molecules and atoms. The energy gains per electron are generally larger for molecules than for atoms, and larger for the ANO-GSn basis than for the ANO-GS basis. The energy gains for the bases are generally larger than for the bases, as expected, since the energy left to recover becomes smaller as the basis size increases.
The ANO-GS and ANO-GSn bases also produce more accurate CCSD atomization energies than the BFD basis for the homonuclear dimers of hydrogen through argon. Figure 5 shows the fraction of experimental atomization energy recovered in CCSD for the homonuclear dimers which are not weakly bound. The ANO-GS/ANO-GSn basis recovers more atomization energy than the BFD basis for all dimers except those of Group elements. Similarly, the ANO-GSn basis recovers more atomization energy than the BFD basis for the same systems, but the differences are small. The ANO-GSn is on average slightly better than the ANO-GS basis, the largest gains being for F and Cl.
For Group elements, the BFD bases recover more atomization energy in CCSD than do their ANO-GS or ANO-GSn counterparts. This occurs due to inaccurate BFD energies for the atoms, as can be seen in Figure 4. However, as described above, we used the published BFD bases for these elements rather than the updated BFD bases BFD () to maintain consistency.
Finally, improvements of the ANO-GS and ANO-GSn bases extend to other systems and methods. Figure 6 shows the fraction of experimental atomization energy recovered for five systems in the G2 set Curtiss1991 () with the BFD, ANO-GS, and ANO-GSn bases in three quantum chemistry methods. For CCSD, the ANO-GS and ANO-GSn bases outperform the BFD basis for all systems. For sulfur dioxide the improvement due to the ANO-GS and ANO-GSn bases is dramatic: the ANO-GS/GSn result is nearly halfway between the and BFD results, and the ANO-GS/GSn result is nearly halfway between the and BFD results. ANO-GS and ANO-GSn benefits are more prominent in HF and B3LYP: for most systems, the ANO-GS/GSn result is closer to the BFD result than the BFD result, and the ANO-GS/GSn result is closer to the BFD result than the BFD result. Differences between results with the ANO-GS and ANO-GSn bases are small.
Figure 7 shows the fraction of experimental atomization energy recovered using diffusion Monte Carlo (DMC) with the BFD, ANO-GS, and ANO-GSn bases. For each system, the DMC calculations are performed with both a single-configuration state function (single-CSF) reference (DMC-1CSF) and full-valence complete active space reference (DMC-FVCAS). However, for each of the constituent atoms in these molecules, the FVCAS and single-CSF references are equivalent. All DMC calculations are performed with a H time step and trial wavefunction obtained by optimizing Jastrow, orbital, and configuration state function (CSF) parameters (where applicable) via the linear method Toulouse2007 (); Toulouse2008 (); Umrigar2007 () in variational Monte Carlo. The DMC-1CSF and DMC-FVCAS calculations exhibit similar trends to the HF and B3LYP calculation for most systems: the ANO-GS/GSn result is closer to the BFD result than the BFD result, and the ANO-GS/GSn result is closer to the BFD result than the BFD result. Again, differences between results with the ANO-GS and ANO-GSn bases are small.
There are several important points that can be made by comparing the DMC calculations of Figure 7 to the CCSD calculations of Figure 6. First, the DMC results for the atomization energies have a weaker dependence on basis size than the CCSD results. Second, for a given basis set, the most basic DMC calculations, DMC-1CSF, yield superior results compared to CCSD. In addition to yielding superior results, DMC-1CSF calculations have better computational cost scaling than CCSD calculations. Under certain assumptions, the cost of DMC-1CSF calculations scales as Needs2010 (), while the cost of CCSD calculations scales as Helgaker2000 (), where is the number of electrons. However, it is important to note that the prefactor of the scaling is significantly smaller for the CCSD calculations.
Finally, our results are not the first to show that DMC calculations can produce accurate atomization energies. In particular, DMC-1CSF calculations of the entire G2 set have been performed for both pseudopotential and all-electron systems Grossman2002 (); Nemec2010 () and produced excellent results. Additionally, there is good agreement between the pseudopotential and all-electron results with a mean absolute deviation of about kcal/mol over the entire G2 set Nemec2010 (). Although these previous results are very good, there is room for improvement, particularly for the open shell systems. A systematic study with DMC-FVCAS calculations is currently underway in our group, which should produce results to (near) chemical accuracy for all systems in the G2 set.
A simple yet general method for constructing basis sets for molecular electronic structure theory calculations has been presented. These basis sets consist of a combination of atomic natural orbitals from an MCSCF calculation with primitive functions optimized for the corresponding homonuclear dimer. The functional form of the primitive functions is chosen to have the correct asymptotics for the nuclear potential of the system.
It was shown that optimal exponents of primitives with different angular momenta are weakly coupled. This enables efficient determination of optimal exponents. Additionally, it was demonstrated that the particular electronic structure method employed in optimization has little effect on the optimal values of the primitive exponents.
Two sets of and bases, ANO-GS and ANO-GSn, appropriate for the Burkatzki, Filippi, and Dolg non-divergent pseudopotentials were constructed for elements hydrogen through argon. Since these pseudopotentials do not diverge at nuclei and have a Coulomb tail, GS functions are the appropriate primitives.
It was demonstrated that both ANO-GS and ANO-GSn basis sets offer significant gains over the Burkatzki, Filippi and Dolg basis sets for CCSD, HF, B3LYP Becke1993 (), and QMC calculations. Improvements were observed in both total energies and atomization energies. The latter indicates that basis sets providing a balanced description of atoms and molecules were produced by using both the atom and the dimer in the optimization. On average, the ANO-GSn basis is slightly better than the ANO-GS basis, but either is a sound choice.
In the future, these basis sets will be extended to include the transition metals, and, bases will be constructed for all-electron calculations, for which Slater functions are the appropriate primitives.
We thank Claudia Filippi for very valuable discussions. This work was supported by the NSF (Grant Nos. DMR-0908653 and CHE-1004603). Computations were performed in part at the Computation Center for Nanotechnology Innovation at Rensselaer Polytechnic Institute.
- (1) T. Dunning Jr, J. Chem. Phys. 90, 1007 (1989)
- (2) S. Boys, Proc. R. Soc. A 200, 542â554 (1950)
- (3) T. Kato, Comm. Pure Appl. Math 10, 151â177 (1957)
- (4) W. Foulkes, L. Mitas, R. Needs, and G. Rajagopal, Rev. Mod. Phys. 73, 33â83 (2001)
- (5) F. R. Petruzielo, J. Toulouse, and C. J. Umrigar, J. Chem. Phys. 132, 094109 (2010)
- (6) J. Almlöf and P. Taylor, J. Chem. Phys. 86, 4070 (1987)
- (7) P.-O. Widmark, P.-A. Malmqvist, and B. Roos, Theor. Chim. Acta. 77, 291 (1990)
- (8) P.-O. Widmark, B. Joakim, and B. Roos, Theor. Chim. Acta. 79, 419 (1991)
- (9) V. Veryazov, P.-O. Widmark, and B. O. Roos, Theor. Chim. Acta. 111, 345 (2004)
- (10) M. Burkatzki, C. Filippi, and M. Dolg, J. Chem. Phys. 126, 234105 (2007)
- (11) A. Becke, J. Chem. Phys. 98, 5648 (1993)
- (12) See Supplementary Material at http://dx.doi.org/10.1063/1.3551512 for atom specific basis sets, Gaussian fits of Gauss-Slater functions, and data used for making figures.
- (13) M. W. Schmidt, J. A. Boatz, K. K. Baldridge, S. T. Elbert, M. S. Gordon, J. H. Jensen, S. Koseki, N. Matsunaga, K. A. Nguyen, S. Su, T. L. Windus, M. Dupuis, and J. A. Montgomery, J. Comp. Chem. 14, 1347 (1993)
- (14) K. P. Huber and G. Herzberg, Constants of Diatomic Molecules, Molecular Spectra and Molecular Structure Vol 4 (Van Nostrand Reinhold Company, 1979)
- (15) L. Gurvich, I. V. Veyts, and C. B. Alcock, Thermodynamic Properties of Individual Substances, Fouth Edition (Hemisphere Pub. Co., 1989)
- (16) NIST Chemistry WebBook, NIST Standard Reference Database Number 69, edited by P. J. Linstrom and W. G. Mallard (NIST, Gaithersburg, MD, 2005)
- (17) NIST Computational Chemistry Comparison and Benchmark Database, NIST Standard Reference Database Number 101, edited by R. D. Johnson (NIST, Gaithersburg, MD, 2010)
- (18) R. Grisenti, W. Schollkopf, J. Toennies, G. Hegerfeldt, T. Kohler, and M. Stoll, Phys. Rev. Lett. 85, 2284 (2000)
- (19) V. E. Bondybey and J. H. English, J. Chem. Phys. 80, 568 (1984)
- (20) R. Aziz and M. Slaman, Chem. Phys. 130, 187 (1989)
- (21) Z. Fu, G. W. Lemire, G. A. Bishea, and M. D. Morse, J. Chem. Phys. 93, 8420 (1990)
- (22) P. R. Herman, P. E. Larocque, and B. P. Stoicheff, J. Chem. Phys. 89, 4535 (1988)
- (23) http://www.burkatzki.com/pseudos/index.2.html
- (24) K. Irikura, J. Phys. Chem. Ref. Data 36, 389 (2007)
- (25) Y. R. Luo, Comprehensive Handbook of Chemical Bond Energies (CRC Press, 2007)
- (26) L. A. Curtiss, K. Raghavachari, G. W. Trucks, and J. A. Pople, J. Chem. Phys. 94, 7221 (1991)
- (27) D. Feller and K. A. Peterson, J. Chem. Phys. 110, 8384 (1999)
- (28) J. Toulouse and C. Umrigar, J. Chem. Phys. 126, 084102 (2007)
- (29) J. Toulouse and C. Umrigar, J. Chem. Phys. 128, 174101 (2008)
- (30) C. Umrigar, J. Toulouse, C. Filippi, S. Sorella, and R. Hennig, Phys. Rev. Lett. 98, 110201 (2007)
- (31) R. J. Needs, M. D. Towler, N. D. Drummond, and P. López Ríos, J. Phys. Condens. Matter 22, 023201 (2010)
- (32) T. Helgaker, P. Jorgensen, and J. Olsen, Molecular Electronic-Structure Theory (John Wiley & Sons LTD, Chichester, England, 2000)
- (33) J. C. Grossman, J. Chem. Phys. 117, 1434 (2002)
- (34) N. Nemec, M. D. Towler, and R. J. Needs, J. Chem. Phys. 132, 034111 (2010)