Cubic-scaling iterative solution of the Bethe-Salpeter equation for finite systems
The Bethe-Salpeter equation (BSE) is currently the state of the art in the description of neutral electron excitations in both solids and large finite systems. It is capable of accurately treating charge-transfer excitations that present difficulties for simpler approaches. We present a local basis set formulation of the BSE for molecules where the optical spectrum is computed with the iterative Haydock recursion scheme, leading to a low computational complexity and memory footprint. Using a variant of the algorithm we can go beyond the Tamm-Dancoff approximation (TDA). We rederive the recursion relations for general matrix elements of a resolvent, show how they translate into continued fractions, and study the convergence of the method with the number of recursion coefficients and the role of different terminators. Due to the locality of the basis functions the computational cost of each iteration scales asymptotically as with the number of atoms, while the number of iterations is typically much lower than the size of the underlying electron-hole basis. In practice we see that , even for systems with thousands of orbitals, the runtime will be dominated by the operation of applying the Coulomb kernel in the atomic orbital representation
Ab initio simulation of optical spectra is an essential tool in the study of excited state electronic properties of solids, molecules and nanostructures. For finite systems time-dependent density functional theory (TDDFT) Runge and Gross (1984) based on local or semi local functionals is widely used. However, TDDFT fails in certain cases, notably for charge transfer excitations Casida et al. (2006) which are essential in, e.g., photovoltaic applications. An alternative to TDDFT is Hedin’s approximation Hedin (1965) followed by the solution of the Bethe-Salpeter equation (BSE) Onida et al. (2002). Based on many-body perturbation theory Fetter and Walecka (1971); Abrikosov et al. (1963), the /BSE method is a more systematic approach than TDDFT, and it has been shown to give a qualitatively correct description of excitonic effects in solids Hanke and Sham (1980); Onida et al. (2002) and charge transfer excitationsBechstedt et al. (1997); Blase and Attaccalite (2011).
The Bethe-Salpeter equation is a Dyson-like equation for the two-particle Green’s function, or equivalently for the four-point polarizability Albrecht et al. (1997). Within the field of electronic structure theory, developments of the BSE can be traced back to the beginning of sixties Abrikosov et al. (1963); Noziéres (1964); Sham and Rice (1966), with the first ab initio implementations appearing a couple of decades later Shirley and Louie (1993); Onida et al. (1995); Benedict et al. (1998). The GW/BSE method has been implemented using plane waves and real space grids, Albrecht et al. (1997); Benedict et al. (2003); Palummo et al. (2004); Garcia-Lastra et al. (2011); Rocca et al. (2010); Grüning et al. (2011); Ramos et al. (2008); Gatti and Sottile (2013); Benedict and Shirley (1999); Sitt et al. (2007), linear combination of atomic orbitals (LCAO) Rohlfing and Louie (1998); Casida (2005); Baumeier et al. (2012); Bruneval and Marques (2013); Boulanger et al. (2014) and within the FLAPW framework Ambrosch-Draxl et al. (2006). In practice, the standard way of solving the BSE is by converting it to an effective eigenvalue problem in a particle-hole basis. Since the size of the particle-hole basis scales quadratically with the number of atoms , a straightforward diagonalization of the BSE Hamiltonian will scale like . This very steep scaling makes it difficult to treat large scale systems like nanostructures and realistic models of organic photovoltaic devices. For such systems an improved scaling with the number of atoms would be highly beneficial.
Avoiding an explicit diagonalization of the BSE Hamiltonian can be done by using an iterative method to obtain a few low-lying transitions (e.g. the Davidsson method Demmel et al. (2000); Valiev et al. (2010)), or to directly aim for the spectrum, which can be done frequency by frequency using for example the GMRES method Saad (2003a); Demmel et al. (2000); Koval et al. (2010) or for the full spectrum with the Haydock recursion scheme Haydock et al. (1972); Rocca et al. (2008); Grüning et al. (2011). Another option is to go over to the time domain and solve the equations of motion by time propagation Schmidt et al. (2003); Attaccalite et al. (2011). These methods only require matrix-vector products to be performed, and assuming that the number of iterations, or time steps, is much smaller than the size of the particle-hole basis, the asymptotic scaling will be . However, setting up the BSE Hamiltonian explicitly will still have the cost of , and to avoid this, the matrix-vector products need to be performed on the fly, without explicitly constructing the matrix.
Benedict and Shirley made use of the Haydock recursion method to compute optical spectra in the Tamm-Dancoff approximation (TDA) without actually computing the whole BSE Hamiltonian Benedict and Shirley (1999). This was achieved by using, in addition to the particle-hole basis, a real space grid product basis , in which the screened direct Coulomb interaction is diagonal (the exchange term is sparse in this representation). The scaling of the algorithm was reported to be with the number of atoms, however, a more careful analysis shows that it can be made to scale like by a proper ordering of the loops Benedict ().
This favorable scaling is heavily based on the use of a real-space representation for the particle-hole states. Similar gains can be obtained with the use of LCAO basis sets, where the same asymptotic scaling can be obtained by making use of the sparsity in both direct and exchange Coulomb interaction terms. It should be mentioned that by using additional assumptions of locality, which implies screening away Coulomb matrix elements between basis functions that are spatially far from each other, one could even achieve linear scalingOchsenfeld et al. (2007), however, the BSE has so far not been treated with these methods. Another linear scaling approach to many-body theory methods has recently been published by Baer and coworkers that make use of stochastic wave functions together with time propagation Neuhauser et al. (2013, 2014); Rabani et al. (2015).
In the present publication, we will not venture into the realm of linear scaling but rather make use of the more standard iterative methods that, together with locality, lead to cubic scaling with the number of atoms. We present an iterative algorithm to obtain the BSE spectrum for molecules, making use of localized basis sets both for orbitals and products of orbitals. To go beyond the TDA a pseudo-Hermitian version the Haydock recursion scheme Grüning et al. (2011) is used. We derive the recursion relations for general matrix elements of a resolvent and show how they translate into continued fractions. Our method has been interfaced to the SESTA code Soler and etal (2002) which is widely used for ground state density functional theory calculations (as an alternative, we can do all-electron calculation using numerical orbitals in an in-house implementation). For the case of the benzene molecule, as a prototypical example, we present a detailed study of the convergence properties of the iterative method, both within the Tamm-Dancoff approximation and for the full BSE. In particular, we study the effect of different termination schemes. Furthermore, for the sake of clarity, we provide a detailed account of the BSE method itself using our notation. Our algorithm scales asymptotically like with the number of atoms and uses memory. We present proof of principle calculations of our implementation, where the runtime is seen to be dominated by the scaling operations for systems up to several thousand orbitals, and discuss some of the bottlenecks and possible improvements of the scheme.
ii.1 Quasiparticles with the approximation
Before the BSE can be set up and solved, the quasiparticle energies must be obtained from a preceding calculation Hedin (1965). Since the details of our implementation have been published elsewhere Foerster et al. (2011); Koval et al. (2014), we will here only give a brief summary of the method. The poles of the one-particle Green’s function for an -electron system occur at the ground and excited states of the corresponding +1 and -1 systems, that is at the electron addition and removal energies. Hedin’s approximation connects the (irreducible) polarizability , the non-interacting and interacting Green’s functions ( and ), the screened interaction , and the self energy in a set of closed equations
In our implementation of the method the Green’s function is expanded in a basis of numerical atomic orbitals (AO) of finite support
Here and in the following we explicitly write out the overlaps when they appear, the matrix quantities are always contravariant and the placement of the indices as subscripts or superscript is arbitrary. With this representation of the Green’s function , we see that the polarizability (1) involves products of AOs . These products are expanded in an (auxiliary) product basis of localized numerical functions Foerster et al. (2011); Koval et al. (2014)
where the expansion coefficients and the product basis functions are determined by numerically expanding the products around a common center and removing redundant functions by a diagonalization based procedure Foerster (2008). Only overlapping pairs of orbitals are considered, making the matrix of expansion coefficients sparse when using AOs of local support. The indices will be reserved for atomic orbitals and for product functions of atomic orbitals in the following. Using the product basis, the polarizability is represented similarly to the Green’s functions (5)
where the overlap of the product functions appears. Similarly, it can be seen from equation (2) that the matrix elements of the bare and screened Coulomb interaction must be expanded in the product basis, while the self-energy is expanded in the AO basis. For finite systems both the and the product basis can be chosen as real.
The frequency-dependent quantities like and are represented on an even-spaced, real-axis, frequency grid via their corresponding spectral functions. An imaginary part of the energy is added in the Green’s function and polarizability , that is sufficient to ensure their smoothness on the chosen frequency grid. The convolutions of spectral functions implied by equations (1) and (3) are computed via fast Fourier transforms. Due to the the fast convolutions and the locality of the product basis set, the asymptotic scaling of the algorithm is with the number of atoms Foerster et al. (2011). Finally the Dyson equation (4) is directly solved for each frequency to obtain . The quasiparticle energies are poles in and can in certain cases be determined from inspection of the density of states. This does not give the quasiparticle wave function, however. In this paper we adopt the standard way of proceeding and assume that the Kohn-Sham Kohn and Sham (1965) (KS) or Hartree-Fock (HF) eigenfunctions that are used to construct the zeroth order Green’s function are good approximations to the quasiparticle states, so that they can be kept fixed and only the quasiparticle energy corrected. We will here only consider the so-called approximation where a single iteration of the equations is performed without self-consistency. We focus on the KS “starting point” in this subsection. The KS Hamiltonian is
with the kinetic energy, the external potential, the Hartree potential and the exchange-correlation potential. The KS eigenfunctions are expanded in the AO basis
where are the eigenvectors of the generalized eigenvalue problem
If we additionally assume that interacting Green’s function is diagonal in the KS eigenstates , the Dyson equation (4) reduces to a set of scalar equations
where we have subtracted the exchange-correlation potential in order to be able to work with the KS eigenvalues. The (assumed real) poles are then found by identifying the zeros of the denominator, either by a graphical solution if the full frequency-dependent quantities are available, or more commonly, by an expansion of around , which leads to
Since we have access to the full frequency dependence of the self energy we can use the graphical method, which in principle is more accurate and also has the advantage that problems with satellite peaks can be avoided Ljungberg et al. (). For comparison purposes we will also make use of the simpler equation (12).
ii.2 Optical spectra with the Bethe-Salpeter equation
The directionally averaged absorption cross section of a molecule is given by
where is the dynamical dipole polarizability tensor given by
The interacting density response function, or reducible polarizability, is defined in the time domain as a functional derivative of the density with respect to the change of the external potential: . Numbered bold indices, , refer to space, spin, and time coordinates, whereas plain numbered indices contain space and spin, . is a two-point quantity and it is directly connected to the non-interacting density response in RPA or in TDDFT with semi-local functionals Petersilka et al. (1996). However, when the Hamiltonian becomes non-local in space (as in the case of TDHF, TDDFT with hybrid functionals or Hedin’s approximation) one must first find the retarded four-point polarizability , and then obtain the two-point one using the relation (see appendix A).
The four-point polarizability satisfies the Bethe-Salpeter equation as derived in appendix A. In the frequency domain the BSE can be written
with the non-interacting four-point polarizability and
the BSE kernel. Already here the approximation has been made that the screened interaction is independent of the frequency. Introducing an orthonormal two-particle basis that has the representation in terms of the quasiparticle spin orbitals, we can expand as
with the matrix elements given by
is expanded similarly. This leads to the matrix equation
Equation (19) has to be inverted for each frequency which is computationally cumbersome. Fortunately, with certain approximations, it can be reformulated as an effective eigenvalue problem that only has to be solved once. To proceed with this we choose as our one-particle states the quasiparticle states in which the interacting Green’s function is assumed to be diagonal. This leads to being diagonal in the two-particle basis
where we introduced the frequency-independent BSE Hamiltonian
The matrix is non-Hermitian. If we solve for its right eigenvectors and eigenvalues
and define expansion coefficients of the eigenvectors in terms of the the two-particle basis , we can obtain a spectral representation of the interacting polarizability as
Here the overlap of the right eigenvectors appears because the eigenvectors of a non-Hermitian eigenvalue problem are generally not orthogonal. Using equations (14), (17) and a resolution of the identity in the quasiparticle product states, we can rewrite in terms of as
with the transition dipoles
Here is the spatial part of , and is the corresponding spin function. Here we denote the dipole operator as a ket, since in general a normal two-point operator can be expanded as . In the preceding analysis spin is explicit in the orbitals. However, is not diagonal in a spin orbital basis. If it is diagonalized in the spin indices (see appendix B), one singlet and three triplet product functions result, where the singlet one being the only one to have a non-vanishing transition dipole moment and so the one visible in the optical response. In the following we will suppress the spin indices and only work with the space quantities. Because of spin symmetry the coupling elements are modified with the factor being for the singlet and for the triplet
and the transition dipoles for the singlet get an additional factor of (see appendix B)
and the triplet transition dipole is zero. This means that the dynamic dipole polarizability effectively gets an additional factor of two for the singlet transition. Since we always consider the singlet for dipole transitions we can drop the ”singlet” superscript and let refer to equation 28. An important simplification to the problem is that, due to the occupation factors, only particle-hole and hole-particle product states contribute to the polarizability (see appendix B) and we can write the eigenfunctions of as
Here and in the following the indices will denote occupied (valence), empty (conduction, unoccupied) and general molecular orbitals. Projecting the eigenvalue equation (23) from the left with and we obtain a matrix equation with the following block structure
where . Using the symmetry properties of the BSE kernel and of the non-interacting Hamiltonian , we can also write
The second form (31) is useful because it leads to computational savings when explicitly setting up the matrix. In the Tamm-Dancoff approximation the off-diagonal blocks in the (i.e. the couplings between hole-particle and particle-hole states) are set to zero. This leads to two uncoupled Hermitian eigenvalue equations for and . Due to the symmetries displayed in equation (31) we see that the eigenvalues of the two blocks are related as , and the eigenvectors as , where the superscript refers either to the or the -sector. Therefore, only one of the equations needs to be solved, for example the one for the the -sector: . Using the fact that the eigenvectors are orthogonal for a Hermitian problem, the non-zero blocks of the the four-point polarizability are
The TDA is a widely used approximation that, in addition to the computational advantages, often provide good agreement with experimental excitation energies for organic molecules Peach and Tozer (2012); Peach et al. (2011); Sharifzadeh et al. (2013). At this point it is interesting to note the similarities of the BSE, TDDFT and time-dependent Hartree-Fock (TDHF). In TDDFT, although for semi-local functionals it is in principle sufficient to look at the response of the density, one can more generally look at the response of the density matrix as was done by Casida Casida (1995). The resulting equations are very similar to the BSE, with the only difference that the eigenvalues are replaced by KS eigenvalues, and that the direct term is replaced by a TDDFT exchange-correlation kernel. For semi-local exchange-correlation functionals, and real orbitals, the resulting eigenvalue problem can be reduced to a Hermitian problem of half the size — the preferred formulation of TDDFT in quantum chemistry. However, when Hartree-Fock exchange is included (in hybrid functionals for example) the reduction to the Hermitian form does not simplify things quite as much, since one needs to take the square root of a full matrix which requires an additional diagonalization. The TDHF response equations have the same structure as the BSE ones with Hartree-Fock eigenvalues and an unscreened direct term. The Tamm-Dancoff approximation is also useful in TDDFT and TDHF. For TDHF with TDA one recovers the configuration interaction singles (CIS) equation.
To set up and diagonalize the BSE Hamiltonian (31) is feasible only for systems with a few thousand of particle-hole pairs. For larger matrices an iterative procedure is essential both for memory and runtime requirements. In the following we describe how the the dynamical dipole polarizability tensor (25) can be computed with a Lanczos-type iterative method.
Continued fraction expression for the BSE polarizability
In the last equation refers to the singlet transition dipole in equation (28), and we denote the occupation difference matrix by
In the Tamm-Dancoff approximation we only consider states which means that the transition dipoles become
(the TDA-superscript since it will be clear from the context if the TDA is used or not).
An attractive method of dealing with resolvents is the Haydock recursion scheme Haydock et al. (1972), where a diagonal matrix element of a resolvent is efficiently computed from Lanczos recursion coefficients by means of continued fractions. Recently it has been shown that also non-diagonal matrix elements of the resolvent can be computed from the same Lanczos coefficients Rocca et al. (2008); Grüning et al. (2011). Usually the continued fraction representation of the resolvent is derived by determinant relations. Here we present an alternative derivation that only uses the power series expansion of the resolvent and the orthogonality among the Lanczos vectors. The off-diagonal matrix elements come out naturally in this formulation, and it is straight-forwardly extendible to block Lanczos, two-sided Lanczos and pseudo-Hermitian Lanczos schemes. Our derivation also connects to the theory of relaxation functions, also known as the Mori projection technique, first introduced to describe the Laplace transformed correlation function of dynamical systems Mori (1965) and later reformulated by Lee Lee (1982) in a form more closely related to the one we use here.
We want to compute — a general matrix element of the resolvent of the Hermitian operator , with the frequency in general a complex number. Let us define a frequency-dependent solution vector
where is the normalized . The matrix element of the resolvent in terms of the solution vector (37) reads
Now we generate a set of orthonormal Lanczos vectors with the starting state , using the standard recursion relations Saad (2003b)
with the real coefficients and . Next we expand the solution vector in the Lanczos basis
where the frequency dependent expansion coefficients are given by projection onto the basis
The expansion coefficients contain the information necessary to compute the sought matrix elements of the resolvent. The diagonal matrix element is especially simple (remembering that )
that is, only the zero-th coefficient is needed.
In the original Haydock recursion scheme only diagonal matrix element were computed. For our purposes we also need the off-diagonal elements, which can be computed using the higher expansion coefficients
The projections of the vectors with the Lanczos basis can be computed and saved when the Lanczos vectors are available, thus avoiding the storage of more than the last two vectors. As we shall see, the coefficients can be computed from continued fractions. An advantage of using continued fractions is that one can terminate them in a physically sensible way which can reduce the number of Lanczos vectors one has to explicitly compute. Projecting the Hermitian transpose of equation (39) onto the solution vector gives
Applying the operator onto the solution vector gives
which follows directly from the definition of the inverse
For the relation can be rearranged to give
while for we obtain
After the relaxation functions have been computed for a certain frequency, the expansion coefficients can be recovered by inverting the relation (50)
In summary, first the coefficients and , as well as the needed projections are obtained from equation (39), then for each (adding a small positive imaginary part, as appropriate for the retarded response), the relaxation functions are computed from equations (51) using a properly chosen terminator. Then, the expansion coefficients are obtained from equation (52). Finally, the matrix elements are computed from equations (42) and (43).
Iterative non-TDA BSE
The full BSE Hamiltonian is non-Hermitian which means that the Lanczos procedure outlined above must be modified. A two-sided Lanczos procedure where both left and right eigenvectors are generated in the recursive procedure can be used, although it suffers from instability issues due to the loss of orthogonality between the Lanczos vectors, often requiring explicit reorthogonalization Greenbaum (1997); Brezinski and Wuytack (2001). It also involves twice the number of applications of the Hamiltonian. Recently, a pseudo Hermitian algorithm was published that exploits the structure of the BSE eigenproblem to convert it into a Hermitian problem in a special scalar product Grüning et al. (2011). In this algorithm one avoids the extra multiplication of the Hamiltonian that is present in the two-sided scheme. Below we summarize the pseudo-Hermitian algorithm in our notation.
An operator is pseudo-Hermitian Mostafazadeh (2002a) with respect to the invertible Hermitian operator , if
This means that is Hermitian, or equivalently that is Hermitian under the scalar product , provided that the metric is positive definite so that the scalar product is well-defined. Furthermore, the eigenvalues of are real if it is pseudo-Hermitian with respect to an operator that can be written like with an invertible operator Mostafazadeh (2002b), and such a factorization can always be found for a positive definite . If is a product of two Hermitian operators , then is pseudo-Hermitian with and , which can be checked using equation (53). The BSE Hamiltonian given by equation (22) can be written in matrix form
with given by equation (35). Since we can write
Since is diagonal and real, and , it follows that is Hermitian. From the preceding discussion it is clear that is pseudo-Hermitian with respect to or . Since is not positive definite it doesn’t serve as a metric for a scalar product. however, should be positive definite unless there exist singlet-triplet instabilities Bauernschmitt and Ahlrichs (1996); Peach and Tozer (2012); Peach et al. (2011). Such instabilities do occur for molecules, and especially for triplet excitations can lose its positive definiteness. This will make the pseudo-Hermitian algorithm fail. However, since in this case also direct diagonalization gives unphysical results one should not view this failure as a drawback of the method.
Within the pseudo-Hermitian Lanczos scheme the same steps are followed as in the Hermitian case. The only difference is that the scalar product is changed form the ordinary to , with the Lanczos vectors orthonormal in this product. This means that equation (39) stays the same, but the Lanczos coefficients are modified to and , which can be seen by multiplying equation (39) by and using the orthogonality of the Lanczos vectors in the scalar product. To make the starting vector normalized, it is chosen as .
Due to the metric introduced in our scalar product we effectively have right and left Lanczos vectors, related by , and , although only one set of vectors is necessary in the actual computation. The resolution of the identity in the Lanczos vectors is
which means that the matrix element of the resolvent must be computed as
Here replaces equation (41) — the other equations that are needed can be derived as in the Hermitian case, only replacing the scalar product. Here, even if we only want a diagonal matrix element we have to sum over the projections of all the Lanczos vectors, because the starting (right) vector is not orthogonal (in the ordinary scalar product) to the other Lanczos vectors.
ii.3 Implementation of the Bethe-Salpeter equation
Having a general description of the BSE and of an iterative algorithm for solving it, we will describe our implementation using local basis functions.
It is straightforward to compute the matrix in equation (31) and diagonalize it to obtain the four-point polarizability from equations (24) and (25). The matrix elements of the kernel are computed using equation (27). The construction of the matrix requires operations ( being the number of atoms) and memory for storage. Solving the resulting eigenvalue problem using standard diagonalization techniques gives an even more prohibitive scaling of with the number of atoms. A way to avoid this excessive scaling is to limit the number of electron-hole pairs that are included in the calculation. However, the energy range covered by a constant number of pairs decreases with increasing system size, leading to a deteriorated description of the spectrum. In practice, the limit where explicit diagonalization is feasible is reached for a few tens of atoms: for larger systems iterative schemes are more efficient. Nevertheless, for small systems and for testing purposes straightforward diagonalization is a simple and useful alternative. Using our localized product basis set , the exchange and direct terms in equation (27) take the following form
where the bare and screened Coulomb matrix elements in the local product basis are
The expansion coefficient of a product of two quasiparticle states is given by
where the expansion coefficients are those appearing in equation (6). Unlike the local product coefficients , the eigenstate product coefficients are not sparse and the equations (59) will scale like if the loops are ordered in the proper way as shown by the boxes (equation (61) costs operations). The singlet transition dipoles can also be calculated from the product functions
where the dipole moments in the local product basis are .
Iterative computation of the BSE
Let us first look at the TDA which is simpler than the full BSE. Because only the -sector needs to be solved, the eigenvalue problem is Hermitian. Moreover, because in equation (36), we only need to calculate a diagonal matrix element of the resolvent to get the diagonal dynamical dipole polarizability
The Lanczos procedure for TDA is
& — f_-1 ⟩= 0,
& — ~f_0 ⟩= —~D_m ⟩,
& — ~f_n+1 ⟩= H^BSE —f_n ⟩- a_n —f_n⟩- b_n —f_n-1 ⟩,
&b_n+1 = ⟨~f_n+1— ~f_n+1 ⟩^1/2,
& — f_n+1 ⟩= — ~f_n+1 ⟩/ b_n+1,
&a_n = ⟨f_n — H^BSE —f_n ⟩ , wh