Accelerating Optical Absorption Spectra and Exciton Energy Computation for Nanosystems via Interpolative Separable Density Fitting

# Accelerating Optical Absorption Spectra and Exciton Energy Computation for Nanosystems via Interpolative Separable Density Fitting

## Abstract

We present an efficient way to solve the Bethe–Salpeter equation (BSE), a model for the computation of absorption spectra in molecules and solids that includes electron–hole excitations. Standard approaches to construct and diagonalize the Bethe–Salpeter Hamiltonian require at least operations, where is proportional to the number of electrons in the system, limiting its application to small systems. Our approach is based on the interpolative separable density fitting (ISDF) technique to construct low rank approximations to the bare and screened exchange operators associated with the BSE Hamiltonian. This approach reduces the complexity of the Hamiltonian construction to with a much smaller pre-constant. Here, we implement the ISDF method for the BSE calculations within the Tamm–Dancoff approximation (TDA) in the BerkeleyGW software package. We show that ISDF-based BSE calculations in molecules and solids reproduce accurate exciton energies and optical absorption spectra with significantly reduced computational cost.

## 1 Introduction

Many-Body Perturbation Theory [20] is a powerful tool to describe one-particle and two-particle excitations, and to obtain excitation energies and absorption spectra in molecules and solids [21]. Within many-body perturbation theory, Hedin’s GW approximation [10] has been successfully used to compute quasi-particle (one-particle) excitation energies and the Bethe–Salpeter equation (BSE) [24] describes the interaction of a electron–hole pair (two-particle excitation) produced during the spectral absorption in molecules and solids [23]. Good agreement between theory and experiment can only be achieved taking into account such electron–hole interaction by solving the BSE. The BSE is an eigenvalue problem. In the context of optical absorption, the eigenvalues of the Bethe–Salpeter Hamiltonian (BSH) are related to quasi-particle exciton energies and the corresponding eigenfunctions yield the exciton wavefunctions.

The BSH has a special block structure to be shown in the next section. It consists of bare and screened exchange kernels that depend on the products of single-particle orbitals obtained from a mean-field calculation. The evaluation of these kernels requires at least operations in a conventional approach, which is very costly for large complex systems that contain hundreds or even thousands of atoms. There are several methods, which have been developed recently to generate a reduced basis set, to reduce such high computational cost for the BSE calculations [2, 12, 16, 19, 22].

In this paper, we present a more efficient way to construct the BSH. Such a construction allows the BSE to be solved efficiently by an iterative diagonalization scheme. Our approach is based on the recently developed interpolative separable density fitting (ISDF) decomposition [18]. This ISDF decomposition has been applied to accelerate a number of applications in computational chemistry and materials science, including two-electron integral computation [18], correlation energy in the random phase approximation [17], density functional perturbation theory [15], and hybrid density functional calculations [11]. Such a decomposition is used to approximate the matrix consisting of products of single-particle orbital pairs as the product of a matrix consisting of a small number of numerical auxiliary basis vectors and an expansion coefficient matrix [11]. This low rank approximation effectively allows us to construct low rank approximations to the bare and screened exchange kernels. The construction of ISDF compressed BSE Hamiltonian matrix only requires operations if the rank of the numerical auxiliary basis can be kept at and if we keep the bare and screened exchange kernel in a low rank factored form. This is considerably more efficient than the complexity required in a conventional approach.

By keeping these kernels in the decomposed form, the matrix–vector multiplications required in iterative diagonalization procedures for computing the desired eigenvalues and eigenvectors of can be performed efficiently. We may also use these efficient matrix–vector multiplications in a structure preserving Lanczos algorithm [25] to obtain an approximate absorption spectrum, without explicitly diagonalizing the approximate .

We have implemented the ISDF based BSH construction in the BerkeleyGW software package [5]. We demonstrate that this approach can produce accurate exciton energies and optical absorption spectra for molecules and solids, and reduce the computational cost associated with the construction of the BSE Hamiltonian significantly compared to the algorithms used in the existing version of the BerkeleyGW software package.

## 2 Bethe–Salpeter equation

The Bethe–Salpeter equation is an eigenvalue problem of the form

 HBSEX=XE, (1)

where is an exciton wavefunction, is the corresponding exciton energy and is the Bethe–Salpeter Hamiltonian that has the following block structure

 HBSE=[D+2VA−WA2VB−WB−2¯¯¯¯VB+¯¯¯¯¯¯WB−D−2¯¯¯¯VA+¯¯¯¯¯¯WA], (2)

where is an diagonal matrix with , being the quasi-particle energies associated with valence bands and , being the quasi-particle energies associated with conduction bands. These quasi-particle energies are typically obtained from the so-called GW calculation [23]. The and matrices represent the (bare) exchange of electron–hole pairs, and the and matrices are often referred to as the direct terms that describe screened exchange of electron–hole pairs. These matrices are defined as follows:

 VA(ivic,jvjc)=∫¯ψic(r)ψiv(r)V(r,r′)¯ψjv(r′)ψjc(r′)drdr′,VB(ivic,jvjc)=∫¯ψic(r)ψiv(r)V(r,r′)¯ψjc(r′)ψjv(r′)drdr′,WA(ivic,jvjc)=∫¯ψic(r)ψjc(r)W(r,r′)¯ψjv(r′)ψiv(r′)drdr′,WB(ivic,jvjc)=∫¯ψic(r)ψjv(r)W(r,r′)¯ψjc(r′)ψiv(r′)drdr′, (3)

where and are usually taken to be the valence and conduction single-particle orbitals obtained from a Kohn–Sham density functional theory (KSDFT) calculation respectively, and and are the bare and screened Coulomb operators. Both and are Hermitian, whereas and are complex symmetric in general. In the so-called Tamm–Dancoff approximation (TDA) [21], both and are neglected and set to zeros in Equation (2). As a result, is Hermitian in this case, and we only need to be concerned with the upper left block of .

Let , , and be matrices that consist of products of discretized orbital pairs in real space, and , , be the reciprocal space representation of these matrices. Equations (3) can then be succinctly written as

 VA=^M∗vc^V^Mvc,WA=reshape(^M∗cc^W^Mvv), (4)

where and are reciprocal space representations of the bare and screened exchange operators and , respectively, and the reshape function is used to map the th element on the right-hand side of (4) to the th element of . A similar set of equations can be derived for and . However, in this paper, we will mainly focus on the TDA model.

The reason that the right-hand sides of (4) are computed in the reciprocal space is that is diagonal and an energy cutoff is often adopted to limit the number of the Fourier components used to represent . As a result, the leading dimension of , and , which we denote by , is often much smaller than that of , and , which we denote by .

In additional to performing Fast Fourier transforms (FFTs) to obtain , and from , and , respectively, we need to perform at least

 O(NgN2cN2v+N2gNcNv) (5)

floating-point operations to obtain and using matrix–matrix multiplication operations.

Note that, to achieve high accuracy with a large basis set, such as the plane wave basis set, is typically much larger than or . The number of occupied bands is either or depending on how spin is counted. In many existing calculations, the actual number of conducting band included in the calculation is often a small multiple of , whereas is often 100–10000 (). Therefore, the second term in Equation (5), which accounts for the cost of multiplying with in Equation (4) can be much larger than other parts under the Tamm–Dancoff approximation (TDA) in the BSE calculations.

## 3 Interpolative separable density fitting (ISDF) decomposition

In order to reduce the computational complexity, we seek to minimize the number of integrals we need to perform in Equation (3). This is possible if we can rewrite the matrix , where the labels and are indices of either valence or conducting orbitals, as the product of a matrix that contains a set of linearly independent auxiliary basis vectors with ( is a small constant referred as a rank truncation parameter) [11] and an expansion coefficient matrix . For large problems, the number of columns of , which is either , , or , is typically larger than the number of grid points on which is sampled, i.e., the number of rows in . As a result, should be much smaller than the number of columns of . Even when a cutoff is used to limit the size of or so that the number of columns in is much less than , we can still approximate by with a that has a smaller rank .

To simplify our discussion, let us drop the subscript of , and for the moment, and describe the basic idea of ISDF. The optimal low rank approximation of can be obtained from a singular value decomposition. However, the complexity of this decomposition is at least or . An alternative decomposition, which is close to optimal, but has a much favorable complexity has recently been developed. This type of decomposition is called interpolative separable density fitting (ISDF) [11], which we will now describe.

In ISDF, instead of computing and simultaneously, we fix the coefficient matrix first, and determine the auxiliary basis matrix by solving a linear least squares problem

 min∥M−ΘC∥2F, (6)

where each column of is given by sampled on a dense real space grids , and contains the auxiliary basis vectors to be determined, denotes the Frobenius norm.

We choose to be a matrix that consists of evaluated at a subset of carefully chosen real space grid points, with and , i.e., each column of indexed by is given by

 [ψi(^r1)¯ψj(^r1),⋯,ψi(^rk)¯ψj(^rk),⋯,ψi(^rNt)¯ψj(^rNt)]T.

If the minimum of the objective function in Equation (6) is zero, the product of and a column of can be viewed as an interpolation of corresponding function in . However, in general, we cannot expect the minimum of (6) be zero. The least squares minimizer is given by

 Θ=MC∗(CC∗)−1. (7)

It may appear that the matrix–matrix multiplications and take operations because the size of is and the size of is . However, both multiplications can be carried out with fewer operations due to the separable structure of and  [11]. As a result, the computational complexity for computing the interpolation vectors is .

Intuitively, the least squares problem in Equation (6) is easier to solve when the rows of , which can be selected from the rows of , are maximally linearly independent. This task can be achieved by performing a QR factorization of with column pivoting (QRCP) [4]. In QRCP, we choose a permutation such that the factorization

 MTΠ=QR (8)

yields a unitary matrix and an upper triangular matrix with decreasing matrix elements along the diagonal of . The magnitude of each diagonal element indicates how important the corresponding column of the permuted is, and whether the corresponding grid point should be chosen as an interpolation point. The QRCP decomposition can be terminated when the -st diagonal element of becomes less than a predetermined threshold. The leading columns of the permuted are considered to be maximally linearly independent numerically. The corresponding grid points are chosen as the interpolation points. The indices for the chosen interpolation points can be obtained from indices of the nonzero entries of the first columns of the permutation matrix .

Roughly speaking, the QRCP moves matrix columns of with large norms to the left, and pushes matrix columns of with small norms to the right. Note that the square of the vector 2-norm of the column of associated with is just

 N∑i,j=1∣∣ψi(r)¯ψj(r)∣∣2=(N∑i=1∣∣ψi(r)∣∣2)(N∑j=1∣∣ψj(r)∣∣2). (9)

In the case when and both belong to the set of occupied orbitals, the norm of each column of is simply the electron density. Hence the interpolation points chosen by QRCP tend to locate in areas where the electron density is relatively large. Once a column is selected, all other columns are immediately orthogonalized with respect to the chosen column. Hence nearly linearly dependent matrix columns will not be selected repeatedly. As a result, the interpolation points chosen by QRCP are well separated spatially. Notice that the standard QRCP procedure has a high computational cost of . But it can be combined with the randomized sampling method [18] so that its cost is reduced to .

## 4 Low rank representations of bare and screened exchange operators via ISDF

Applying the ISDF decomposition to , and yields

 Mcc≈ΘccCcc,Mvc≈ΘvcCvc,Mvv≈ΘvvCvv. (10)

It follows from Equations (3), (4) and (10) that the exchange and direct terms of the BSE Hamiltonian can be written as

 VA=C∗vc˜VACvc,WA=reshape(C∗cc˜WACvv), (11)

where and are the projected exchange and direct terms under the auxiliary basis , and . Here, , and are reciprocal space representations of , and , respectively, that can be obtained via FFTs,

Note that the dimension of the matrix on the right-hand side of Equation (11) is . It needs to be reshaped into a matrix of dimension according to the mapping before it can combined with matrix to construct the BSH.

Once the ISDF approximations for , and are available, the cost for constructing a low rank approximation to the exchange and direct terms reduced to that of computing the projected exchange and direct kernels and , respectively. If the ranks of , and are , and , respectively, then the computational complexity for computing the compressed exchange and direct kernels is , which is significantly lower than the complexity of the conventional approach given in (5). When , and are on the order of , the complexity of constructing the compressed kernels is .

## 5 Iterative diagonalization of the BSE Hamiltonian

In the conventional approach, exciton energies and wavefunctions can be computed by using the recently developed BSEPACK library [26, 27] to diagonalize the BSE Hamiltonian . When TDA is adopted, we may also just use a standard diagonalization procedure implemented in the ScaLAPACK [1] library.

When ISDF is used to construct low rank approximations to the bare and screened exchange operators and , we should keep both matrices in the factored form given by Equation (11). This is because that multiplying the matrices on the right-hand sides of Equation (11) would require at least operations, which is higher than the cost for using the ISDF procedure to construct low rank approximations to the BSH. Instead of using BSEPACK or ScaLAPACK which has a complexity of , we propose to use iterative methods to diagonalize the approximate BSH constructed via the ISDF decomposition.

When TDA is used, several iterative methods such as the Lanczos [14] and LOBPCG [13] algorithms can be used to compute a few desired eigenvalues of the . In each step of these algorithms, we need to multiply with a vector of size . When is kept in the factored form given by (11), can be evaluated as three matrix vector multiplications performed in sequence, i.e.,

 VAx←C∗vc[˜VA(Cvcx)]. (12)

The complexity of these calculations is . If is on the order of , then each can be carried out in operations.

Because cannot be multiplied with a vector of size before it is reshaped, a different multiplication scheme needs to be used. It follows from the separable nature of and that this multiplication can be succinctly described as

 WAx=reshape[Ψ∗c(˜W⊙(ΨcXΨ∗v))Ψv], (13)

where is a matrix reshaped from the vector , is a matrix containing as its elements, is a matrix containing as its elements, and denotes componentwise multiplication (Hadamard product). The reshape function is used to turn the matrix–matrix product back into a size vector. If and are on the order of , then all matrix–matrix multiplications in Equation (13) can be carried out in operations. This make the complexity of each step of the iterative method . If the number of iterative steps required to reach convergence is not excessively large, then the ISDF enabled iterative diagonalization can be carried out in operations.

## 6 Estimating optical absorption spectra without diagonalization

If we have all eigenpairs of , we can easily obtain the optical absorption spectrum, which is the imaginary part of the dielectric function defined as

 ε2(ω)=Im[8π2e2ΩdHr((ω−iη)I−HBSE)−1dl], (14)

where is the volume of the primitive cell, is the elementary charge, and are the right and left optical transition vectors, and is a broadening factor used to account for the lifetime of excitation. However, it can become prohibitively expensive to use an iterative diagonalization method to compute all eigenpairs of .

Fortunately, it is possible to use a structure preserving iterative method to estimate the optical absorption spectrum without explicitly computing all eigenpairs of . In Ref. [3, 25], we developed a structure preserving Lanczos algorithm for estimating the optical spectrum. The algorithm has been implemented in the BSEPACK [27] library. When TDA is used, the standard Lanczos algorithm can be used to estimate the absorption spectrum. When our objective is to obtain the basic shape of the absorption spectrum to identify where the major peaks are, it is not necessary to compute all eigenpairs of . As a result, the accuracy required to construct approximate bare and screened exchange operators used in BSE can possibly be lowered, thereby allowing us to use a more aggressive truncation threshold in ISDF to further reduce the cost of construction. We will demonstrate this possibility in the next section.

## 7 Numerical results

In this section, we demonstrate the accuracy and efficiency of the ISDF method when it is used to compute exciton energies and optical absorption spectrum in the BSE framework. We implemented the ISDF based BSH construction in the BerkeleyGW software package [5]. BerkeleyGW is a massively parallel computational package that uses a many-body perturbation theory and Green’s function formalism to study quasi-particle excitation energies and optical absorption of nanosystems. We use the ab initio software package Quantum ESPRESSO (QE) [7] to compute the ground-state quantities required in the GW and BSE calculations. In our Quantum ESPRESSO density functional theory (DFT) based electronic structure calculations, we use Hartwigsen–Goedecker–Hutter (HGH) norm-conserving pseudopotentials [9] and the LDA [8] exchange–correlation functional. All the calculations were carried out on a single core at the Cori1 systems at the National Energy Research Scientific Computing Center (NERSC).

We performed calculations for three systems. They consist of a bulk silicon Si system and two molecular systems: carbon monoxide (CO) and benzene (CH) as plotted in Fig. 1. All systems are closed shell systems, and the number of occupied bands is , where is the valence electrons in the system.

### 7.1 Accuracy

We first measure the accuracy of the ISDF method by comparing the computed eigenvalues of the BSH matrices constructed with and without the ISDF decomposition.

In our test, we set the plane wave energy cutoff required in the QE calculations to Ha, which is relatively low. However, this is sufficient for assessing the effect of ISDF. Such a choice of results in and for the Si system, and for the CO molecule (), and for the benzene molecule. We only include the lowest conducting bands in the BSE calculation. The number of active conduction bands () and valence bands (), the number of reciprocal grids and the dimensions of the corresponding BSE Hamiltonian for these three systems are listed in Table 1.

In Fig. 2, we plot the singular values of the matrices , and associated with the CO molecule. We observe that the singular values of these matrices decay rapidly. For example, the leading 500 (out of 3600) singular values of decreases rapidly towards zero. All other singular values are below . Therefore, the numerical rank of is roughly 500 ( = 8.3), or roughly 15% of the number of columns in . Consequently, we expect that the rank of produced in ISDF decomposition can be set to 15% of without sacrificing the accuracy of the computed eigenvalues.

This prediction is confirmed in Fig. 3, where we plot the absolute difference between the lowest the exciton energy of Si computed with and without using ISDF to construct . To be specific, the error in the desired eigenvalue is computed as , where is computed from the constructed with ISDF approximation, and is computed from a standard constructed without using ISDF. We first vary one of the ratios , and while holding the others at a constant of . We observe that the error in the lowest exciton energy (positive eigenvalue) is around Ha, when either or is set to 0.1 while the other ratios are held at 1. However, reducing to 0.1 introduces a significant amount of error (0.1 Ha) in the lowest exciton energy. This is likely due to the fact that is too small. We then hold at 0.5 and let both and vary. The variation of with respect to these ratios is also plotted as in Fig. 3. We observe that the error in the lowest exciton energy is still around  Ha even when both and are set to 0.1.

We then check the absolute error (Ha) of all the exciton energies computed with the ISDF method by comparing them with the ones obtained from a conventional BSE calculation implemented in BerkeleyGW for the CO and benzene molecules. As we can see from Fig. 4, the errors associated with these eigenvalues are all below  Ha when is .

### 7.2 Efficiency

At the moment, we have only implemented a sequential version of the ISDF method within the BerkeleyGW software package. Therefore, our efficiency test is limited in the size of the problem as well as the number of conducting bands () we can include in the bare and screened exchange operators. As a result, our performance measurement does not fully reflect the computational complexity analysis presented in the previous sections. In particular, taking benzene as an example, is much larger than and , therefore the computational cost of term is much higher than the term in the conventional BSE calculations.

Nonetheless, in this section, we will demonstrate the benefit of using ISDF to reduce the cost for constructing the BSE Hamiltonian . In Table 2, we focus on the benzene example and report the wall-clock time required to construct the ISDF approximations of the , , and matrices at different rank truncation levels. Without using ISDF, it takes 746.0 seconds to construct the reciprocal space representations of , , and in BerkeleyGW. Most of the timing is spent in the large number of FFTs applied to , , and to obtain the reciprocal space representation of these matrices. We can clearly see that if only is changed from 0.5 ( = 30.0) to 0.1 ( = 6.0), the wall-clock time used to construct a low rank approximation to can be reduced from 578.9 to 34.3 seconds. Furthermore, the total cost of computing , and can be further reduced to around 1/19th that in a conventional approach (39.3 vs. 746.0 seconds) if , and are all set to 0.1.

Note that because ISDF decomposition is carried out on a real space grid, its measured cost reflect the cost for performing QRCP in real space. Even though QRCP with random sampling has complexity, it has a relatively large pre-constant compared to the size of the problem. Hence the measured cost of QRCP is relatively high in this case. Recently, Dong et al. proposed a new approach to find the interpolation points based on the centroidal Voronoi tessellation (CVT) method [6], which offers a much less expensive alternative to the QRCP procedure when the ISDF method is used in hybrid functional calculations. We will explore this CVT method in the BSE-ISDF calculations in the future.

In Table 3, we report the wall-clock time required to construct the projected bare and screened exchange matrices and that appear in Equation (11) once the ISDF approximations of , , and become available. Without ISDF, it takes seconds to construct both and . When / is set to 0.1 only, the construction cost for , which is the dominant cost, is reduced by a factor of 2.8. Furthermore, if , and are all set to 0.1. We reduce the cost for constructing and by a factor of 63.0 and 10.1, respectively. Note that the original implementation of the and in BerkeleyGW is much slower because the elements of and are integrated one by one. For benzene, it take takes seconds ( hours) to construct the BSE Hamiltonian in the original BerkeleyGW code.

### 7.3 Optical absorption spectra

One important application of BSE is to compute the optical absorption spectrum, which is determined by optical dielectric function in Equation (14). Fig. 5 plots the optical absorption spectra for both CO and benzene obtained from approximate constructed with the ISDF method and the constructed in a conventional approach implemented in BerkeleyGW. When the rank truncation ratio / is set to be only 0.10 (), the absorption spectrum obtained from the ISDF approximate is nearly indistinguishable from that produced from the conventional approach. When / is set to 0.05 (), the absorption spectrum obtained from ISDF approximate still preserves the main features (peaks) of the absorption spectrum obtained in a conventional approach even though some of the peaks are slightly shifted, and the height of some peaks are slightly off.

## 8 Conclusion and outlook

In summary, we have demonstrated that the interpolative separable density fitting (ISDF) technique can be used to efficiently and accurately construct the Bethe–Salpeter Hamiltonian matrix. The ISDF method allows us to reduce the complexity of the Hamiltonian construction from to with a much smaller pre-constant. We show that the ISDF based BSE calculations in molecules and solids can efficiently produce accurate exciton energies and optical absorption spectrum in molecules and solids.

In the future, we plan to replace the costly QRCP procedure with the centroidal Voronoi tessellation (CVT) method [6] for selecting the interpolation points in the ISDF method. The CVT method is expected to significantly reduce the computational cost for selecting interpolating point in the ISDF procedure for the BSE calculations.

The performance results reported here are based on a sequential implementation of the ISDF method. In the near future, we will implement a parallel version suitable for large-scale distributed memory parallel computers. Such an implementation will allow us to tackle much larger problems for which the favorable scaling of the ISDF approach is much more pronounced.

## Acknowledgments

This work is partly supported by the Center for Computational Study of Excited-State Phenomena in Energy Materials (C2SEPEM) at the Lawrence Berkeley National Laboratory, which is funded by the U. S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division under Contract No. DE-AC02-05CH11231, as part of the Computational Materials Sciences Program, and by the Center for Applied Mathematics for Energy Research Applications (CAMERA) (L. L. and C. Y.). The authors thank the National Energy Research Scientific Computing (NERSC) center for making computational resources available.

### Footnotes

1. https://www.nersc.gov/systems/cori/

### References

1. T. Auckenthaler, V. Blum, H. J. Bungartz, T. Huckle, R. Johanni, L. Krämer, B. Lang, H. Lederer, and P. R. Willems. Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations. Parallel Comput., 37(12):783–794, 2011.
2. P. Benner, S. Dolgov, V. Khoromskaia, and B. N. Khoromskij. Fast iterative solution of the Bethe–Salpeter eigenvalue problem using low-rank and QTT tensor approximation. J. Comput. Phys., 334:221–239, 2017.
3. J. Brabec, L. Lin, M. Shao, N. Govind, Y. Saad, C. Yang, and E. G. Ng. Efficient algorithms for estimating the absorption spectrum within linear response TDDFT. J. Chem. Theory Comput., 11(11):5197–5208, 2015.
4. T. F. Chan and P. C. Hansen. Some applications of the rank revealing QR factorization. SIAM J. Sci. Statist. Comput., 13:727–741, 1992.
5. J. Deslippe, G. Samsonidze, D. A. Strubbe, M. Jain, M. L. Cohen, and S. G. Louie. BerkeleyGW: A massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun., 183(6):1269–1289, 2012.
6. K. Dong, W. Hu, and L. Lin. Interpolative separable density fitting through centroidal voronoi tessellation with applications to hybrid functional electronic structure calculations. arXiv:1711.01531, 2017.
7. P. Giannozzi, S. Baroni, N. Bonini, M. Calandra, R. Car, C. Cavazzoni, D. Ceresoli, G. L. Chiarotti, M. Cococcioni, I. Dabo, A. D. Corso, S. de Gironcoli, S. Fabris, G. Fratesi, R. Gebauer, U. Gerstmann, C. Gougoussis, A. Kokalj, M. Lazzeri, L. Martin-Samos, N. Marzari, F. Mauri, R. Mazzarello, S. Paolini, A. Pasquarello, L. Paulatto, C. Sbraccia, S. Scandolo, G. Sclauzero, A. P. Seitsonen, A. Smogunov, P. Umari, and R. M. Wentzcovitch. QUANTUM ESPRESSO: A modular and open-source software project for quantum simulations of materials. J. Phys.: Condens. Matter, 21(39):395502, 2009.
8. S. Goedecker, M. Teter, and J. Hutter. Separable dual-space Gaussian pseudopotentials. Phys. Rev. B, 54:1703, 1996.
9. C. Hartwigsen, S. Goedecker, and J. Hutter. Relativistic separable dual-space gaussian pseudopotentials from H to Rn. Phys. Rev. B, 58:3641, 1998.
10. L. Hedin. New method for calculating the one-particle Green’s function with application to the electron–gas problem. Phys. Rev., 139:A796, 1965.
11. W. Hu, L. Lin, and C. Yang. Interpolative separable density fitting decomposition for accelerating hybrid density functional calculations with applications to defects in silicon. J. Chem. Theory Comput., 13(11):5420–5431, 2017.
12. P. B. V. Khoromskaia and B. N. Khoromskij. A reduced basis approach for calculation of the Bethe¨CSalpeter excitation energies by using low-rank tensor factorisations. Mol. Phys., 114:1148–1161, 2016.
13. A. V. Knyazev. Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method. SIAM J. Sci. Comput., 23(2):517–541, 2001.
14. C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards, 45:255–282, 1950.
15. L. Lin, Z. Xu, and L. Ying. Adaptively compressed polarizability operator for accelerating large scale ab initio phonon calculations. Multiscale Model. Simul., 15:29–55, 2017.
16. M. P. Ljungberg, P. Koval, F. Ferrari, D. Foerster, and D. Sánchez-Portal. Cubic-scaling iterative solution of the Bethe–Salpeter equation for finite systems. Phys. Rev. B, 92:075422, 2015.
17. J. Lu and K. Thicke. Cubic scaling algorithms for RPA correlation using interpolative separable density fitting. J. Comput. Phys., 351:187–202, 2017.
18. J. Lu and L. Ying. Compression of the electron repulsion integral tensor in tensor hypercontraction format with cubic scaling cost. J. Comput. Phys., 302:329–335, 2015.
19. M. Marsili, E. Mosconi, F. D. Angelis, and P. Umari. Large-scale GW-BSE calculations with scaling: Excitonic effects in dye-sensitized solar cells. Phys. Rev. B, 95:075415, 2017.
20. C. Møler and M. S. Plesset. Note on an approximation treatment for many-electron systems. Phys. Rev., 46:618, 1934.
21. G. Onida, L. Reining, and A. Rubio. Electronic excitations: Density-functional versus many-body Green’s-function approaches. Rev. Mod. Phys., 74:601, 2002.
22. D. Rocca, D. Lu, and G. Galli. Ab initio calculations of optical absorption spectra: Solution of the Bethe¨CSalpeter equation within density matrix perturbation theory. J. Chem. Phys., 133:164109, 2010.
23. M. Rohlfing and S. G. Louie. Electron–hole excitations and optical spectra from first principles. Phys. Rev. B, 62:4927, 2000.
24. E. E. Salpeter and H. A. Bethe. A relativistic equation for bound-state problems. Phys. Rev., 84:1232, 1951.
25. M. Shao, F. H. da Jornada, L. Lin, C. Yang, J. Deslippe, and S. G. Louie. A structure preserving Lanczos algorithm for computing the optical absorption spectrum. SIAM J. Matrix. Anal. Appl., to appear.
26. M. Shao, F. H. da Jornada, C. Yang, J. Deslippe, and S. G. Louie. Structure preserving parallel algorithms for solving the Bethe¨CSalpeter eigenvalue problem. Linear Algebra Appl., 488:148–167, 2016.
27. M. Shao and C. Yang. BSEPACK user’s guide, 2016. https://sites.google.com/a/lbl.gov/bsepack/.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters