Simple formalism for efficient derivatives and multi-determinant expansions in quantum Monte Carlo
We present a simple and general formalism to compute efficiently the derivatives of a multi-determinant Jastrow-Slater wave function, the local energy, the interatomic forces, and similar quantities needed in quantum Monte Carlo. Through a straightforward manipulation of matrices evaluated on the occupied and virtual orbitals, we obtain an efficiency equivalent to algorithmic differentiation in the computation of the interatomic forces and the optimization of the orbital paramaters. Furthermore, for a large multi-determinant expansion, the significant computational gain recently reported for the calculation of the wave function is here improved and extended to all local properties in both all-electron and pseudopotential calculations.
In the application of quantum Monte Carlo (QMC) methods to electronic systems in real space Foulkes et al. (2001); Kolorenč and Mitas (2011), one computes expectation values of random variables depending on where is a variational ansatz of the exact wave function, and are the coordinates of the electrons. The total energy is for instance estimated as the expectation value of the local energy , where is the Hamiltonian of the system. Other examples are the derivatives of and with respect to the atomic coordinates or the variational parameters, which are needed to evaluate the interatomic forces or to optimize the wave function , respectively. It is very important to compute these quantities efficiently because of their large number, typically or , and also because they must be calculated for the many steps of the sampling process needed to collect significant statistics on the quantity of interest.
Here, we propose a general, simple and efficient method to compute these properties for the most common ansatz of found in the literature, namely, a sum of Slater determinants times a Jastrow correlation factor ,
where is a reference determinant (the Hartree-Fock solution, for example) of spin-orbitals (one-body functions depending on the position and the spin) and are excited determinants. The formulation we propose here relies on the fact that one-body operators and derivatives can be written using the same compact expression, namely, the trace of the product of two matrices, when acting on a one-determinant Jastrow-Slater wave function. Consequently, derivatives, local quantities, and derivatives of local quantities are easy to obtain for the reference and can be very simply and efficiently updated when is replaced by an excited determinant . In practice, the method requires only the calculation of molecular orbitals and their derivatives with respect to some parameter related to the quantity being computed (e.g. the local energy or the derivative of the local energy) for all positions of the electrons. This information is stored in rectangular matrices of size where is the total number of orbitals (occupied in the reference plus virtual in case of a multi-determinant expansion). Such one-body quantities are very simple to code and already available in many QMC codes. One has then to apply the few formulas we develop that involve inverses and products of selected square submatrices. Because these formulas are simple and common to all the properties introduced above, this method requires a minimal programming effort.
Our theoretical framework is very efficient in the regime of small and large . In case of a single-determinant wave function so often used in QMC calculations and for small , our formulas can for instance be used to achieve great computational savings in the evaluation of first-order derivatives such as the internuclear forces using zero-variance estimators and the space-warp transformation Umrigar (1989); Filippi and Umrigar (2000). We recover in fact the same scaling of which was obtained in Ref. Sorella and Capriotti, 2010 with the use of algorithmic differentiation (AD). Here, however, we do not need to employ AD since we have at our disposal a very simple and transparent formula for the derivative of the energy. A similar favorable scaling is also obtained in the computation of the derivatives needed to optimize all orbital parameters in the determinantal component of the wave function.
In the large regime, we provide a very compact formula for the computation of any local quantity and its derivatives evaluated for an excited determinant, , by exploiting that differs from by a few orbital excitations. In the calculation of , we know from the work by Clark et al. Clark et al. (2011) that, once has been computed, can be updated using matrices of order where is the order of the excitation. Here, because in our formulation all the properties introduced above are treated on an equal footing, the favorable asymptotic scaling of , instead of or , obtained Clark et al. (2011) in the calculation of , and applies to all properties including and its derivatives. For and , we also avoid an additional term, where is the number of active single excitations. Importantly, it holds in all-electron and pseudopotentials calculations alike. We stress that the formulas we propose here are general and that we do not employ strategies which exploit further assumptions on the wave function (e.g. the possible equality of some determinants of a given spin in the expansion Sce ()), which could of course be introduced to further improve the scaling.
The remainder of this paper is organized as follows: In Section II, we introduce the formulas for a local operators acting on a single determinant and its derivatives, which are extended to the multi-determinantal case in Section III. In Section IV, we present further details on how the expressions are modified in the presence of the Jastrow factor or non-local pseudopotentials and give numerical demonstration of our formulation in Section V. The formulas for the second derivative of an excited determinant are given in Section. VI.
Ii Derivatives and one-body operators
We begin with a single (Hartree-Fock) Slater determinant with occupied orbitals, , and denote it as
with the Slater matrix ,
The orbitals and the electrons correspond respectively to the columns and the lines.
Many important quantities like the local energy, the drift or the internuclei forces involve the derivative of a Slater determinant with respect to some parameter . The following identity will be the basis of all subsequent developments
where the dependence of with respect to the parameter is implicit. For a proof, one can for example resort to simple chain rule and differentiate with respect to the elements of
The second equality comes from the expansion of the determinant in minors. If is the first coordinate of the first electron , one obtains the corresponding component of the drift
where the matrix is zero with the exception of the first row. If one is interested in computing the interatomic forces, one needs to evaluate the derivative of with respect to the atomic coordinates and can employs the same formula with where is an atomic coordinate. Derivatives of a Slater determinant with respect to any variational parameter of the orbitals are also useful for optimization purposes.
Importantly, the application of a one-body operator to a Slater determinant can also be written as a first-order derivative for an appropriate choice of the matrix . This observation will be at the core of the very efficient computation of local quantities and their gradients for single- and large multi-determinantal wave functions. To show this, we consider the one-body operator
where is an operator which acts only on a function of . Applying the operator to the determinant as
and expanding the product inside the sum, we have
which is the sum of all mono-excitations obtained by replacing in turn each orbital with . It is easy to check that
where the derivative is taken at and
To prove the first identity in Eq. (10), we just have to perform the derivative of
with respect to and use the multi-linearity of the determinant. The second identity follows from Eq. (4). A very important example is the kinetic energy operator with
Eq. (10) then holds with the definition
In other words, the Laplacian can be written as a first-order derivative when applied to a Slater determinant.
Expression (4) or (10) can be generalized to wave functions including a Jastrow factor and to other operators like a non-local pseudopotential. The corresponding matrices are easy to write and will be given later.
ii.1 Second-order derivatives
The compact trace expressions of a local quantity (Eq. 10) offers the advantage that its derivative with respect to a parameter can be straightforwardly written as
where and are the matrices of the derivatives of the elements of and , respectively, and the matrix is defined as
This can easily be shown by using and the cyclic property of the trace.
Therefore, to compute the derivative of a local quantity with respect to many parameters, one evaluates and stores the matrix at a cost proportional to and then computes its derivatives with the cost for each parameter being at most . For instance, in the computation of the derivatives of with respect to the nuclear coordinates, this procedure allows the efficient computation of the forces with a cost per Monte Carlo step proportional to the one of the energy: Since the number of atoms scales linearly with the number of electrons, the cost of computing the forces at each Monte Carlo step is .
The same expression (Eq. 15) can be used in an orbital optimization run to efficiently compute the derivatives of the local energy with respect to the orbital variations. If we consider where is occupied in the original determinant and unoccupied, the derivative of a local quantity with respect to is
where the derivative is taken at and the rectangular matrix
is computed from the and rectangular extensions of and to the virtual orbitals. To show this, we simply note that all elements of the matrices and are zero with the exception of the row which contains and , respectively. Then, if the number of orbital variations is (which equals the number of determinants created via single excitations ), the cost of evaluating the matrix is only . In a standard implementation, one would instead compute the full inverse matrix of the corresponding mono-excitation to obtain for example the derivatives with respect to the electron positions in the kinetic energy with a cost proportional to . Therefore, also in the optimization of the determinantal component as in the case of the interatomic forces, our formulation leads to total cost of estimating the needed quantities proportional to the cost of computing the energy, , since grows at most like .
Also in this case, however, if and denote two sets of variables and one set, for instance , is significantly smaller than the other, it is computationally convenient to group the matrices differently and precompute the matrices , followed by the evaluation of the trace with the more numerous .
Iii Multiple excitations
We now consider multiple excitations of the original Slater determinant and deduce all subsequent formulas from Eq. (10) where the derivative is taken at .
iii.1 Determinant of a multiple excitation
If columns of are modified (in an excitation of order ), the new Slater determinant is
To compute , we introduce the projector on the space of columns which are modified, namely, a diagonal matrix such that if the orbital has been modified and zero otherwise. For example, if only the first and third column of and are different,
Then, since , we have
The matrix is the identity in the block projected by because all columns of in the same block are zero. The determinant of the total matrix is therefore the determinant of the remaining block which is :
where the subscript is introduced to specify that the determinant is computed for the block identified by the projector . We note that this formula can also be proved using the determinant lemma Clark et al. (2011).
In practice, once we have computed and , we can evaluate and store the matrix where is the rectangular extension of to the unoccupied orbitals. For a -order excitation, we only have to compute the determinant of which is a simple submatrix of with dimension . This submatrix is built by selecting the coefficients such that is the index of a substituted orbital and is the index of an excited orbital. For example, if and are the list of excitations, the matrix is
Note that the first block composed of the first columns of corresponds to occupied orbitals and is the identity matrix. It is never used and does not have to be stored. In practice, one needs to compute only the submatrix where is the number of virtual orbitals present in the multi-determinant expansion.
iii.2 One-body operator applied to a -excited determinant
Applying formula (10) to a new determinant , we have
where is built with different orbitals and is the corresponding new matrix . Note that, similarly to , the columns of corresponding to non-excited orbitals are zero. Using equation (23),
and computing the derivative at , we have
where we used again Eq. (5) and defined
The matrix is given in Eq. (16) and is such that . We have omitted the index in the trace above since yields the same result as the complete trace, which runs over additional zero matrix elements. We recall that the same expression (26) also applies to the logarithmic derivative where and, correspondingly, entering in is given by .
In practice, we again need to compute the rectangular matrix (18). For any -order excitation, is a simple square submatrix of , which is built in the same way as is built from . Then, one has to perform the trace of the inverse of the matrix times the matrix . The cost of this calculation is of order due to the computation of the inverse. For example, if and are the list of excitations, Eq. (26) becomes
For a mono-excitation , projects on a one-dimensional space. is therefore a scalar, which is equal to thanks to Eq. (23), is the matrix element , and where is a mono-excitation parameter introduced in equation (17). Using Eq. (26), we recover that for a mono-excitation
We note that the first columns of are identically zero and, as in the case of the matrix , do not have to be stored. The matrix elements of should only be computed for the lines and columns corresponding to the active single excitations, which are in general fewer than the product of the occupied active and virtual orbitals. The cost is if this product is evaluated from the left to the right, while it is if one starts from the right. If the matrices and the corresponding are sparse, the cost is smaller. In particular, for the drift, these matrices have only one non-zero row and the additional cost of evaluating the derivative with respect to the coordinates of one electron is .
Iv Jastrow factor, pseudopotentials, and other expressions for
When a Jastrow factor is included,
and the expression of the matrix in a local quantity must be modified to account for the presence of the Jastrow factor.
We begin with the local kinetic energy,
Following the derivations in Section II, we identify a generalization of the operator as the operator within the square brackets and obtain
The local kinetic energy can then be written as
It is possible to cast the contribution of the potential to the local energy in a similar form, starting from the more complicated non-local component
where the summations over , , and run over the electrons, the nuclei, and the angular components of the non-local pseudopotential, respectively. For each electron coordinate, the integral is over a sphere centered on a nucleus with radius given by the electron-nucleus distance and the angle is between the vectors and . In QMC, the integral is computed as a sum over quadrature points characterized by weights and unit directions ,
where and is the angle between and . In general, a different number of angular components and quadrature points can be used for the different atom types. We then identify the matrix as
In analogy to the treatment of the Laplacian of the Jastrow in the local kinetic energy (Eq. 31), we can rewrite the contribution of the local potential as
The complete matrix in the trace expression of the local energy is the sum of the kinetic (Eq. 31) and potential (Eqs. 36 and 39) contributions. From the rectangular extensions and of the and matrices to the unoccupied orbitals, one can compute the local energy of any multiple excitation at low cost and, therefore, of any wave function given by a CI expansion times a Jastrow factor,
where is the reference Slater matrix computed from the occupied orbitals and is a -order excited Slater matrix. Once we have computed (Eq. 32), , , and , the local energy reads
where the matrices and are submatrices of and , respectively.
In Appendix A, we give the expressions of the derivatives of the matrices and with respect to the atomic coordinates (in particular, the formula for the derivatives of ) needed in the computation of the interatomic forces. We also discuss how to efficiently evaluate the additional terms in the force estimator introduced by the use of the space-warp transformation on the electron Umrigar (1989); Filippi and Umrigar (2000), which also require the derivatives of and with respect to the electronic coordinates. From the extension of these matrices to the unoccupied orbitals, one can easily compute the forces for a general multi-determinant Jastrow-Slater wave function.
Before presenting numerical examples, we now discuss the computational cost of a typical QMC run using these formulas. We assume that we compute the energy in either variational or diffusion Monte Carlo after performing a full sweep over the electrons in an all-electron move or one-elecron moves, and give the total cost. The evaluation of , , and all needed matrices (3 for the drift and one for the local energy) scales as . The total cost act () of the corresponding matrices is , and the use of formula (26) for determinants O(). The overall cost to build the sampling process is then where . Note that the dependence on is times smaller than the one presented in Ref. Clark et al., 2011, which is of course a significant gain when is large. Assuming now that this scaling can be simplified to since .
V Numerical examples
We demonstrate the formulas above on the CH molecular series with =4-60. We employ the CHAMP code Cha () with scalar-relativistic energy-consistent Hartree-Fock pseudopotentials and the corresponding cc-pVDZ basis set Burkatzki, Filippi, and Dolg (2007); Hps (). The Jastrow factor is limited to a simple two-body electron-electron term and the single determinant is built from Hartree-Fock orbitals.
The low computational cost of the derivative expression (Eq. 15) in the VMC calculation of the interatomic forces for a one-determinant Jastrow-Slater wave function is demonstrated in Fig. 1: For the largest system considered here which includes 122 atoms, computing all interatomic gradients costs less than 4 times a VMC simulation where one only evaluates the total energy. A similar factor has been reported in Ref. Sorella and Capriotti, 2010 where the forces were however evaluated with the aid of algorithmic differentiation (AD). Here, we demonstrate that a simple algebraic manipulation of the quantities needed to compute the forces leads to transparent, simple formulas to implement and an equivalent computational gain to the use of AD. We note that the ratio of the CPU time of evaluating the local energy and the interatomic forces to the time of computing the energy alone should asymptotically be constant: The very weak linear dependence on the number of atoms (electrons) observed here is due to the term in the computational cost being more important in the energy than in the force calculation, at least at these system sizes.
We illustrate the gain achieved in the application of the same derivative expression to the orbital optimization of a one-determinant Jastrow-Slater wave function (Eq. 17) in Fig. 2. For each system, we consider all possible orbital variations and compute the local energy together with the quantities and needed in the linear optimization method Umrigar et al. (2007). The ratio of the cost of such a VMC simulation to the cost of only evaluating the local energy should not grow with system size: can be straightforwardly obtained from and the cost of calculating for all possible orbital variations is proportional to as discussed in Section II.1. We find that this ratio remains well below 4 for system sizes leading to as many as orbital variations.
Finally, we demonstrate the speedup in a VMC simulation performed using expression (Eq. 26) to compute a local operator acting on a multi-determinant wave function. We focus on the local energy, which is evaluated after all the electrons have been moved once, and employ the same formula also in the computation of the gradient with respect to the coordinates of the electron being moved during the sweep over all the electrons. For CH, CH, and CH, we generate a set of doubly excited determinants in either the up- or the down-spin component, treat all up- and down-spin determinants as distinct, and excite also from the core. Additional computational saving can therefore be achieved by exploiting that different excitations may share the same spin component Sce () or by limiting the number of active orbitals. For CH, we also investigate the use of triple excitations in either the up- or the down-spin determinants.
In Fig. 3, we present three different measures of speedup. In the left panel, we compare with the results presented in Ref. Clark et al., 2011 (see their Fig. 3 and green curve) and only estimate the cost of computing the wave function, the drift, and the relevant matrix updates (not the orbitals) with respect to the standard method of computing and updating the inverse matrices of all determinants. In agreement with Ref. Clark et al., 2011, we find that the ratio between the two computational costs increases quickly with the number of determinants before settling to a value, which we however find to be greatly dependent on the used machine and compiler. A more direct comparison would require further knowledge of the separate performance of the algorithms employed in the standard and improved calculation. The speedup for more than approximately 100 determinants ranges between about 10 and 100 for the systems studied here.
The central panel represents a more realistic assessment of the formulas presented, showing the ratio of the time of a complete VMC computation of the energy with the standard and the new algorithm. The speedup measured in this way is rather comparable to what reported in the left panel, indicating that other parts of the code either scale similarly or do not affect the overall ratio. As expected, the gain is a bit smaller for triple than for double excitations due to the larger dimension of the matrices needed to evaluate the local energy of a higher excitation (Eq. 26). Finally, the right panel allows a comparison with the way the speedup is measured in Ref. Sce, , where the cost of the improved calculation is compared to times the cost of a run with only one determinant, namely, . The higher values obtained with this measure, however, are not the speedups one gains in reality in comparison with the standard method, which is in fact much faster than just times the cost of a simulation with a mono-determinantal wave function.
Vi Second-derivative of a -excited determinant
We compute here the derivative of a local quantity applied to an excited determinant . This is the equivalent of formula (15) applied to , and is obtained by differentiating expression (26) with respect to ,
where . With the use of chain rule, it is straightforward to show that
so that . To evaluate these rectangular matrices, we need to extend the computation of and (Eq. 15) to the virtual orbitals while other relevant matrices like and are already available from the computation of the excited determinant and the corresponding local quantity. The matrices and are then simple submatrices constructed from the elements whose row and column indices correspond to the substituted occupied and the excited orbitals, respectively. We note that is the matrix of second derivatives of the local quantity with respect to and the mono-excitation parameter (Eq. 17).
It should be apparent by now that, in practice, one needs to calculate the product of with other matrices as in , , , etc. and that these matrix products constitute the building blocks of the second derivatives and all other quantities derived so far. Additionally, in the computation of the second derivatives, it might be computationally advantageous to evaluate the products , , and as detailed above and in Section II.1. The formulas more explicitly written in terms of these products and therefore closer to the actual implementation are given for clarity in Appendix C .
Since the computational cost of building is of order , the total cost of evaluating the last term in expression (42) for a multi-determinant wave function typically becomes where is the number of excited determinants. The same scaling will also characterize higher-order derivatives. If represents the coordinate of one atom, the final cost to construct the gradient with respect to the nuclear coordinates is
If the number of active occupied orbitals , the computation of the three terms in Eq. 43 can be carried out in the same way as discussed before act () for the matrix . For a single-determinant wave function or a small expansion with and/or small, we recover the cost described in Section II.1.
Vii Concluding remarks
We have presented general and simple formulas to efficiently compute derivatives of wave functions, local quantities, and their derivatives needed in QMC simulations, when the wave function is written as a Jastrow factor times an expansion of Slater determinants.
The simplicity of the formulas stems from the fact that a derivative of one-determinant wave function and a local quantity such as the local energy are treated on an equal footing and expressed as a trace of matrices which only require the computation of one-body functions (molecular orbitals) and their derivatives. The extension of these formulas to excited determinants is straightforward: One evaluates the matrix elements also for the virtual orbitals (unoccupied in the reference) and computes products involving the resulting rectangular matrices, sub-matrices and their inverses in the spirit of what Clark et al. Clark et al. (2011) had developed for the calculation of the multi-determinant wave function. Furthermore, our formulation allows an easy generalization to higher derivatives. Regarding the efficiency, it leads to significant gains for large when one computes many local properties and/or derivatives in a practical simulation.
Appendix A Interatomic forces
a.1 Derivatives with respect to nuclei positions
To calculate the derivative of the local energy and wave function with respect to the nuclear coordinates, we need to evaluate the corresponding matrices and in addition to the logarithmic derivatives of the Jastrow factor. The matrix elements of are simply the derivative of the single-particle orbitals, which we expand on an atomic basis as
where is the number of basis functions on atom . Then, we obtain
Consequently, since the gradients of the basis functions are also needed to calculate , the computation of only requires quantitites which are normally evaluated in sampling the local energy. Similarly, the computation of (Eq. 32) requires derivatives of the basis functions, most of which have already been evaluated for the local energy, with the exception of the off-diagonal components of the hessian and the gradient of the laplacian of . We do not report here the relatively simple expression of but focus on the somewhat more complicated (Eq. 36).
In taking the derivative of with respect to the nuclear coordinates, we need to consider its explicit dependence on the nuclear coordinates as for instance in (Eq. 45), as well as the implicit dependence through the quadrature points. Therefore, we have
where we simplified the notation as . Therefore, differentiating the term in the local energy due to the non-local potential results in a very compact formula (instead of the multiple expressions presented in Ref. Badinski and Needs, 2007). This only requires the gradients of the orbitals and Jastrow factor with respect to the electronic and nuclear positions (Eq. 45) computed at the quadrature points, and simple quantities such as some geometrical terms and the derivatives of the radial components of the non-local potentials.
a.2 Warped coordinates
An improved estimator of forces (and also other observables) is obtained through the use of warped coordinates Umrigar (1989); Filippi and Umrigar (2000) which, as detailed in Refs. Assaraf and Caffarel, 2003; Sorella and Capriotti, 2010, introduces additional terms in the force estimator: For any component of the force, one also needs to compute and , where the gradient is taken with respect to the electron coordinates. The vector field depends on the electron and nuclear positions, and is different for the force components of the different atoms. These two terms can be written as first derivatives of and
When is a single determinant times a Jastrow factor , the first term (48) is
where the coefficients of the matrix are
When the determinantal component of is a sum of determinants, we simply need to compute the rectangular extensions and of the matrices and . Again, calculations of (48) and (49) are then straighforward using expressions (26) and (42). Note that, if we want to keep the calculation of and of order , the vector field should be localized around the atom whose force we are evaluating, i.e. when the distance between the electron and the atom we are considering is larger than a given threshold. This is in fact how the space-warp transformation was introduced in Refs. Umrigar, 1989; Filippi and Umrigar, 2000.
Appendix B First derivative using the Sherman-Morrison-Woodbury formula
We can also obtain (26) using the Sherman-Morrison-Woodbury formula instead of performing the derivative of (23). The calculation will be a bit longer and less straighforward, but allow us to better understand the relationship with the approaches followed in Refs. Clark et al., 2011 and Sce, .