Computing the density of states for optical spectra by low-rank and QTT tensor approximation

# Computing the density of states for optical spectra by low-rank and QTT tensor approximation

Peter Benner Max Planck Institute for Dynamics of Complex Systems, Sandtorstr. 1, D-39106 Magdeburg, Germany (benner@mpi-magdeburg.mpg.de)    Venera Khoromskaia Max Planck Institute for Mathematics in the Sciences, Leipzig; Max Planck Institute for Dynamics of Complex Systems, Magdeburg (vekh@mis.mpg.de).    Boris N. Khoromskij Max Planck Institute for Mathematics in the Sciences, Inselstr. 22-26, D-04103 Leipzig, Germany (bokh@mis.mpg.de).    Chao Yang
Berkeley Labs, Berkeley, USA (cyang@lbl.gov).
###### Abstract

In this paper, we introduce a new interpolation scheme to approximate the density of states (DOS) for a class of rank-structured matrices with application to the Tamm-Dancoff approximation (TDA) of the Bethe-Salpeter equation (BSE). The presented approach for approximating the DOS is based on two main techniques. First, we propose an economical method for calculating the traces of parametric matrix resolvents at interpolation points by taking advantage of the block-diagonal plus low-rank matrix structure described in [6, 3] for the BSE/TDA problem. Second, we show that a regularized or smoothed DOS discretized on a fine grid of size can be accurately represented by a low rank quantized tensor train (QTT) tensor that can be determined through a least squares fitting procedure. The latter provides good approximation properties for strictly oscillating DOS functions with multiple gaps, and requires asymptotically much fewer () functional calls compared with the full grid size . This approach allows us to overcome the computational difficulties of the traditional schemes by avoiding both the need of stochastic sampling and interpolation by problem independent functions like polynomials etc. Numerical tests indicate that the QTT approach yields accurate recovery of DOS associated with problems that contain relatively large spectral gaps. The QTT tensor rank only weakly depends on the size of a molecular system which paves the way for treating large-scale spectral problems.

Key words: Bethe-Salpeter equation, density of states, absorption spectrum, tensor decompositions, model reduction, low-rank matrix, QTT tensor approximation.

AMS Subject Classification: 65F30, 65F50, 65N35, 65F10

## 1 Introduction

Numerical approximation of the density of states (DOS) or spectral density (see §2.2) of large matrices is one of the challenging problems arising in the prediction of electronic, vibrational and thermal properties of molecules and crystals and many other applications. This topic, first developed in condensed matter physics [14, 44, 41, 13, 43], has long since attracted interest in the community of numerical linear algebra [42, 16, 40], see also a survey on commonly used methodology for approximation of DOS for large matrices of general structure [25]. Most traditional methods are based on a polynomial or fractional-polynomial interpolation of the DOS regularized by Gaussians or Lorentzians, and computing traces of certain matrix valued functions, say matrix resolvents or polynomials, defined at a large set of interpolation points within the spectral interval of interest. Furthermore, the trace calculations are typically accomplished with stochastic sampling over a large number of random vectors [25].

Since the size of matrices resulting from real life applications is usually large (in quantum mechanics it scales as a polynomial of the molecular size), and the DOS of these matrices often exhibits very complicated shape, the above mentioned methods become prohibitively expensive. Moreover, the algorithms based on polynomial interpolants have poor approximating properties when the spectrum of a matrix exhibits gaps or highly oscillating non-regular shapes, as is the case in electronic structure calculations. Furthermore, stochastic sampling leads to poor Monte Carlo estimates with slow convergence rates and low accuracy.

In this paper we present a new method to efficiently and accurately approximate the DOS for large rank-structured symmetric matrices. The approach amounts to estimating the DOS by evaluating matrix functions of structured matrices, in particular traces of the matrix resolvent. Our main contribution is to perform each function evaluation at low cost and to reduce the total number of function evaluations in the case of fine representation grid.

We apply this approximation to the Bethe-Salpeter equation (BSE), which is a widely used model for ab initio estimation of the absorption spectra for molecules or surfaces of solids [35, 18, 39, 32, 27, 31]. In particular, we use the recently developed low-rank structured representation of the BSE Hamiltonian, which was introduced and analyzed in [6]. An efficient and structured eigenvalue solver for this block-diagonal plus low-rank representation of the BSE Hamiltonian as well as to its symmetric positive definite surrogate obtained by the Tamm-Dancoff approximation (TDA) is described in [3].

The approach we take here to approximate the DOS of the BSE Hamiltonian relies on the Lorentzian blurring [17]. The most computationally expensive part of the calculation is reduced to the evaluation of traces of shifted matrix inverses. Our method is based on two main ingredients. First, we propose an economical method for calculating traces of parametric matrix resolvents at interpolation points by taking advantage of the block-diagonal plus low-rank BSE/TDA matrix structure described in [6, 3], which enables the direct inversion of the shifted Hamiltonian within the same matrix structure. This allows us to overcome the computational difficulties of the traditional schemes and avoid the need of stochastic sampling. Second, we show that a regularized or smoothed DOS can be accurately approximated by a low rank QTT tensor [23] that can be determined through a least squares procedure. The accuracy of approximation and interpolation is controlled by -truncation of the corresponding matrix/tensor ranks.

Our fast method for calculating traces of matrix resolvents for the family of rank-structured matrices exhibits almost linear asymptotic complexity scaling with respect to the matrix size. We introduce an explicit rank-structured representation of the matrix inverse which can be evaluated efficiently by using the Sherman-Morrison-Woodbury formula. Note that the diagonal plus low-rank approximation to the BSE Hamiltonian introduced in [6] employs the low-rank approximation to the two-electron integrals tensor in the form of a Cholesky factorization [21]. An efficient structured solver designed to calculate a number of minimal eigenvalues of the block-diagonal plus low-rank representation of the BSE/TDA matrices is described in [3].

Another novelty of this paper is the application of the QTT tensor approximation to the DOS sampled on a fine grid, which results in a long vector. The QTT approximation method was introduced and analyzed for function related vectors in [23]. It was proven that for a length- vector obtained from the discretization of a classical function (complex exponentials, polynomials, Gaussians etc.), its QTT image constructed in the -dimensional tensor space with exhibits an amazingly low separation rank . This rank parameter appears to be independent of the size of the original vector. Thus the use of QTT tensor compression reduces the number of representation parameters from to , which allows asymptotically a much smaller number of functional calls, , to reconstruct the DOS function in the QTT parametrization. This might be beneficial in the limit of a large number of representation points since each functional evaluation of the DOS is highly expensive requiring computation of some matrix valued functions.

For example, for a vector of size representing the exponential function, its reshape (folding) into an -dimensional tensor of size with modes equal to , yields a QTT tensor of rank , which means the reduction of storage from to . For a complex exponential vector there holds , then storage reduces from to . Similar low rank QTT representations were proven for a wide class of functions [24], including strongly oscillating functions of nontrivial shape, see for example [19, 22] and the new results in §4.4 below. For a general class of functional vectors, one computes an -rank QTT approximation which leads to a storage size with logarithmic scaling in .

Numerical tests for moderate size molecules confirm the closeness of DOS for the TDA model to those computed on the exact BSE spectrum. We also justify that the simplified block-diagonal plus low-rank approximation recovers well the main landscape and shape details of the DOS curve on the whole energy interval and check the precision of the low-rank QTT approximation to the length- vector representing the DOS. We demonstrate the almost linear complexity scaling of the trace calculation algorithm applied to TDA matrices of different size. We then show by numerical tests that the low-rank QTT tensor interpolation scheme requires only a small number of adaptively chosen samples in the -vector discretizing the DOS. For instance, a polynomial interpolant of degree needs interpolation points (functional calls) for the representation of a function on a large -grid. However, in the case of highly oscillating DOS functions of interest one should impose . On the contrary, the QTT interpolant over interpolation points provides a rather accurate representation of the functional -vector of the DOS.

We also discuss the opportunity to reduce the cost of multiple trace calculations for the parametric matrix resolvent and, finally, describe modifications necessary to calculate the optical absorption spectrum via a rank-structured BSE model.

The rest of the paper is structured as follows. In Section 2, we recall the main prerequisites for the description of our method including the rank-structured approximation to the BSE/TDA matrix, basic notions of the regularization of DOS by Lorentzians and a short summary on the existing methods for matrices of general structure. Section 3 discusses the main techniques of the presented method and the corresponding analysis in Theorems 3.1 and 3.2. The numerical tests confirm the linear scaling of our algorithm in the size of the grid on which the DOS is evaluated. Section 4 presents a summary of the QTT tensor approximation of function related vectors and the analysis of the QTT tensor ranks of the DOS, see Theorem 4.1. In Section 4.3 the ACA based QTT interpolation is applied to the discretized DOS, where the quality of the interpolation is illustrated numerically. The beneficial features of the new computational schemes are verified by extensive numerical experiments on the examples of various molecular systems. Section 5 outlines the extension of the approach to the case of full BSE system. Conclusions summarize the main results and address the application perspectives.

## 2 Main prerequisites and outline of initial applications

### 2.1 Rank-structured approximation to BSE matrix

In this paper we describe a method for efficient and accurate approximation of the DOS for large rank-structured symmetric matrices. Our basic application is concerned with estimating the DOS and the absorption spectrum for the Bethe-Salpeter problem describing the excitation energies of molecules.

The -block matrix representation of the Bethe-Salpeter Hamiltonian (BSH) leads to the following eigenvalue problem.

 H(xkyk)≡(AB−B∗−A∗)(xkyk)=ωk(xkyk), (2.1)

where the matrix blocks of size , with , are defined by

 A=Δε+V−ˆW,B=V−˜W, (2.2)

and eigenvalues correspond to the excitation energies. Here is a diagonal matrix and

 V=[via,jb]a,b∈Iv:={No+1,…,Nb},i,j∈Io:={1,…,No},

is the rank- two-electron integrals (TEI) matrix projected onto the Hartree-Fock molecular orbital basis, where is the number of Gaussian type orbital (GTO) basis functions and denotes the number of occupied orbitals [6].

The method for solving the Bethe-Salpeter equation (BSE) using low-rank factorizations of the generating matrices has been introduced in [6]. It is based on a tensor-structured grid-based Hartree-Fock (HF) solver which provides not only the full set of eigenvalues and HF orbitals, but also the two-electron integrals tensor in the form of a low-rank Cholesky factorization, see [20] and references therein.

The matrix inherits its low rank from the two-electron integrals tensor, and is also proven to have a small -rank (see [6]). In particular, there holds

 V≈LVLTV,LV∈Rn×RV,RV≤RB, (2.3)

with the rank estimates , and .

In [3], it was shown that the matrix , which does not exhibit an accurate low rank representation, can be well approximated by a block diagonal matrix

 ˆW≈blockdiag[ˆB,D],

where is a dense block with , . The size of is nearly the same as the rank parameter of . As a result, the TDA matrix can be approximated by a sum of a block-diagonal matrix and a low rank matrix shown in Figure 2.1, i.e.,

 A≈ˆA=Δε+QQT−blockdiag[ˆB,D]≡blockdiag[B0,D0]+QQT.

An efficient structured solver designed to calculate a number of minimal eigenvalues of the block-diagonal plus low-rank representation of the BSE/TDA matrices is described in [3]. It is based on an efficient subspace iteration of the matrix inverse, which for rank-structured matrix formats can be evaluated efficiently by using the Sherman-Morrison-Woodbury formula, thus reducing the numerical expense of the direct diagonalization down to in the size of the atomic orbitals basis set, . Furthermore, this solver also includes a QTT-based compression scheme, where both eigenvectors and the rank-structured BSE matrix blocks are represented by block-QTT tensors. The block-QTT representation of the eigenvector is determined by an alternating least squares (ALS) iterative algorithm. The overall asymptotic complexity for computing several smallest in modulo eigenvalues in the BSE spectral problem by using the QTT approximation is estimated to be , where is the number of occupied orbitals.

Matrices in the form (2.1) are called -symmetric or Hamiltonian, see [5] for implications on the algebraic properties of the BSE matrix. In particular, solutions of equation (2.1) come in pairs: excitation energies with eigenvectors , and de-excitation energies with eigenvectors .

The simplification in the BSH, , defined by the symmetric diagonal block is called the Tamm-Dancoff (TDA) approximation. In what follows, we are interested in the TDA spectral problem,

 Auk=λkuk,k=1,…,n,

providing good approximations to .

In general, methods for solving partial eigenvalue problems for matrices with a special structure as in the BSE setting are conceptually related to the approaches for Hamiltonian matrices [4, 7, 15, 9], particularly to those based on minimization principles [1, 2]. A structured Lanczos algorithm for estimation of the optical absorption spectrum was described in [37]. Various structured eigensolvers tailored for electronic structure calculations are discussed in [33, 34, 10, 26, 25, 38].

### 2.2 Density of states for symmetric matrices

To fix the idea, we first consider the case of symmetric matrices. Following [25], we use the simple definition of the DOS for symmetric matrices

 ϕ(t)=1nn∑j=1δ(t−λj),t,λj∈[0,a], (2.4)

where is the Dirac distribution and the ’s are the eigenvalues of ordered as .

Several classes of blurring approximations to are used in the literature. One can replace each Dirac- by a Gaussian function with width , i.e.,

 δ(t)⇝gη(t)=1√2πηexp(−t22η2),

where the choice of the regularization parameter depends on the particular problem setting. As a result, (2.4) can be approximated by

 ϕ(t)≈ϕη(t):=1nn∑j=1gη(t−λj), (2.5)

on the whole energy interval .

We may also replace each Dirac- by a Lorentzian, i.e.,

 δ(t)⇝Lη(t):=1πηt2+η2=1πIm(1t−iη), (2.6)

so that an approximate DOS can be written as

 ϕ(t)≈ϕη(t):=1nn∑j=1Lη(t−λj). (2.7)

When , both Gaussians and Lorentzians converge to the Dirac distribution, i.e.,

 limη→0+gη(t)=limη→0+Lη(t)=δ(t).

However, they exhibit different features of the approximant for small . In the case of Gaussians, one expects a sharp resolution of the spectral peaks, while the Lorentzian based representation aims to resolve better the global landscape of .

Both functions and are continuous, hence, they can be discretized by sampling on a fine grid over . In the following, we use the uniform cell-centered -point grid with the mesh size .

In what follows, we focus on the case of Lorentzian blurring, which will be motivated later on, and apply it to the TDA approximation of the BSE problem (see §2.1 below). We use the simplified block-diagonal plus low-rank approximation to the matrix , see [6, 3], which allows efficient explicit representation of the shifted inverse matrix.

The numerical illustrations in §2.2 represent the DOS for the HO molecule and H chains broadened by Gaussians (2.5). The data corresponds to the reduced basis approach via rank-structured approximation applied to the symmetric TDA model [6, 3] described by the matrix block of the full BSE system matrix.

It was numerically demonstrated in [6] that the spectrum of the TDA model provides a good approximation to the spectrum of the full BSE Hamiltonian. The difference between the two is on the order of for molecules of moderate size.

Figure 2.2, left, compares the DOS for the HO molecule calculated via the eigenvalues of the full BSE Hamiltonian and those of the TDA approximation, while on the right we display the corresponding maximum error.

Figure 2.3, left, compares the same DOS calculations but zoomed on the first compact energy interval eV. The red curve corresponds to the full BSE data, and the blue one represents the TDA case. The figure on the right displays the corresponding maximum error.

Figure 2.4, left, represents the DOS for HO computed by using the exact TDA spectrum (blue) and its approximation based on a simplified model obtained via low-rank approximation to (red), while the right figure shows the relative error.

Figures 2.5 presents the DOS for H (left) and H (right) chains of Hydrogen atoms. We observe the essential similarity in the shapes (only the amplitude is changing) which is apparently a consequence of quasi-periodicity of the system.

The rank-structured approach to calculation of the molecular absorption spectrum in the case of full BSE is sketched in §5. This topic will be addressed elsewhere.

### 2.3 General description of the existing computational schemes

One of the commonly used approaches to the numerical approximation of both functions and is based on the construction of certain polynomial or fractional polynomial interpolants whose evaluation at each sampling point requires the solution of a large linear system with the BSE/TDA matrix, i.e., remains expensive.

In the case of Lorentzian broadening (2.7) the regularized DOS takes the form

 ϕ(t)≈ϕη(t):=1nπn∑j=1Im(1(t−λj)−iη)=1nπIm% Trace[(tI−A−iηI)−1]. (2.8)

To keep real-valued arithmetics, likewise, we can write the latter in the form

 ϕη(t):=1nπn∑j=1η(t−λj)2+η2=1nπTrace[((tI−A)2+η2I)−1]. (2.9)

In both cases the task of computing the approximate DOS reduces to approximating the trace of the matrix resolvent

 (tI−A−iηI)−1or((tI−A)2+η2I)−1.

Here, the price to pay for real-valued arithmetics is to address the more complicated low-rank structure in .

The traditional approach [25] to approximately computing the traces of the matrix-valued analytic function reduces this task to the estimation of the mean of over a sequence of random vectors , , satisfying certain condition (see [25], Theorem 3.1). That is, is approximated by

 Trace[f(A)]≈1mrmr∑m=1vTmf(A)vm. (2.10)

The calculation of (2.10) for

 f1(A)=(tI−A−iηI)−1orf2(A)=((tI−A)2+η2I)−1 (2.11)

reduces to solving linear systems in the form of

 (tI−iηI−A)x=vmform=1,…,mr, (2.12)

or

 (η2I+(tI−A)2)x=vmform=1,…,mr. (2.13)

These linear systems need to be solved for many target points in the course of a chosen interpolation scheme.

In the case of rank-structured matrices , the solution of equations (2.12) or (2.13) can be implemented with a lower cost. However, even in this favorable situation one requires a relatively large number of stochastic realizations to obtain satisfactory mean value approximation. The convergence rate is expected to be on the order of . On the other hand, with the limited number of interpolation points, the polynomial type of interpolation schemes applied to highly non-regular shapes as shown, say, in Figure 2.4 (left), can only provide limited resolution and is unlikely to reveal spectral gaps and many local peaks of interest.

## 3 Fast evaluation of DOS for rank-structured matrices

### 3.1 DOS by the trace of rank-structured matrix inverse

In what follows, we propose an approach that is based on evaluating the trace term in (2.8) directly (without stochastic sampling). This approach relies on the following two techniques:

• using the low-rank BSE matrix structure as in [6], which allows for each fixed the direct matrix inversion and computation of the respective traces,

• the low-rank QTT tensor interpolation of the function sampled on a fine uniform grid in the whole spectral interval or on some subinterval of .

For the class of block-diagonal plus low-rank matrices arising in the reduced model approach for BSE problem [6, 3], we have (see §2.1 for more details)

 A=E+PQT,withP,Q∈Rn×R,E=% blockdiag{B0,D0}, (3.1)

where the rank parameter is small compared to , the full matrix block is of size , , and is a diagonal matrix of size .

Notice that even in the case of structured matrices in (3.1) the traditional approach by (2.10) leads to a sequence of linear systems (2.12) to be solved many times in the course of stochastic sampling, for each of many interpolation points .

In our approach, for the class of rank-structured matrices (3.1), we propose to avoid stochastic sampling in (2.10) by introducing a direct scheme that allows us to evaluate the trace of matrices or defined in (2.11), corresponding to the matrix resolvent in (2.8) and (2.9), respectively, by one-step straightforward matrix calculation.

To that end, let us first construct the reduced-model approximation to the matrix inverse for the matrix in (3.1), where the block-diagonal part corresponds to the case of (2.8), i.e.,

 B(t)=tIB−iηIB+B0,D(t)=tID−iηID+D0. (3.2)

Here and denote the corresponding matrix blocks in the representation of the diagonal block in the initial BSE matrix, see (3.1), and denote the identity matrices corresponding to the respective index subsets. For the ease of exposition, we further assume that the matrix size of the block in (3.2) is bounded by with . This assumption on the block size ensures the linear complexity scaling of our algorithm in the matrix size .

In what follows, we use the notion for a length- vector of all ones, and for the Hadamard product of matrices.

The following result asserts that the cost of trace calculations is estimated to be .

###### Theorem 3.1

Let the matrix family , , be given by (3.1), with

 E=E(t)=blockdiag{B(t),D(t)},

where are defined in (3.2). Then the trace of the matrix inverse can be calculated explicitly by

 trace[A(t)−1]=trace[B(t)−1]+trace[D(t)−1]−1Tn(U(t)⊙V(t))1R,

where , , and

 K(t)=IR+QTE(t)−1(t)P

is a small matrix. For fixed , assume that with , then the numerical cost is estimated by .

Proof. The analysis relies on the particular structure of the matrix blocks. Indeed, we use the direct trace representation for both rank- and block-diagonal matrices. Our argument is based on the observation that the trace of a rank- matrix , where , , , , can be calculated in terms of skeleton vectors by

 trace[U(t)V(t)T]=R∑k=1⟨uk,vk⟩=1Tn(U(t)⊙V(t))1R,

at the expense . For fixed , define the rank- matrices by

 U(t)=E(t)−1PK(t)−1,V(t)=E(t)−1Q,

then the Sherman-Morrison scheme leads to the representation, see [3],

 A(t)−1=blockdiag{B(t)−1,D(t)−1}−E(t)−1PK(t)−1QTE(t)−1,

where the last term simplifies to

 E(t)−1PK(t)−1QTE(t)−1=U(t)V(t)T.

Now we apply the above formula for the trace of a rank- matrix to obtain the desired representation.

The complexity estimate follows taking into account the bound on the size of matrix block . Indeed, forming involves solving the linear system , for , where is the pre-computed , which can be computed by assumptions at the cost . Here would be re-used to compute itself, and thus stored. The cost for solving this system of equations is (LU factorization of ), plus for backward/forward solves. This completes the proof.

The above representation has to be applied many times for calculating the trace of at each fixed interpolating point , .

Here, we notice that the price to pay for the real arithmetics in equation (2.13) is that we compute with squared matrices which, however, do not increase the asymptotic complexity since there is no increase of the rank in the rank-structured representation of the system matrix, see the following Theorem 3.2. In our applications we do not expect a loss of numerical stability of the algorithm since the condition numbers of are moderate. In what follows we denote by the concatenation of two matrices of compatible size.

###### Theorem 3.2

Given matrix , where is defined by (3.1), then the trace of the real-valued matrix resolvent can be calculated explicitly by

 trace[S−1]=trace[E−10]−1Tn(¯¯¯¯U⊙¯¯¯¯V)12R, (3.3)

with , and , where the real-valued block-diagonal matrix is given by

 E0(t)=η2I+t2I−2tE+E2=(η2+t2)I+blockdiag[B2−2tB,D2−2tD],

and the rank- matrices are represented via concatenation

 ¯¯¯¯P=[−2tQ+EQ+QE+Q(QTQ),Q]∈Rn×2R,¯Q=[Q,EQ]∈Rn×2R,

such that the small core matrix takes the form .

Assume that with , then the numerical cost is estimated by up to a low order term.

Proof. Indeed, given the block-diagonal plus low-rank matrix in the form (3.1), we obtain

 S=(tI−A)2+η2I=E0+¯¯¯¯P¯¯¯¯QT,

where the block-diagonal matrix and the rank- matrix are defined as above. Applying the Sherman-Morrison scheme as above to the block-diagonal plus rank- matrix structure in , the representation result follows. Now we take into account that

 trace[E−10]=trace[(B2−2tB)−1]+trace[(D2−2tD)−1],

then the restriction on the size of the block proves the complexity bound similar to the argument in the proof of the previous theorem.

Based on Theorems 3.1 and 3.2, the calculations in item (A) can be implemented efficiently in both complex and real arithmetics. The following numerics demonstrates the efficiency of the DOS calculation for the rank-structured TDA matrix implemented in real arithmetics as described by (3.3) in Theorem 3.2.

Figures 3.1 and 3.2 demonstrate that using only the structure-based trace representation (3.3) in Theorem 3.2, we obtain the approximation which resolves perfectly the DOS function (for the examples of HO and Ethanol molecules). The exact DOS is shown by the blue line, while the results of structure-based DOS calculation is indicated by the red line (we use the acronym “SMW” for the Sherman-Morrison-Woodbury scheme).

Figure 3.3 shows the rescaled CPU time, i.e. , where denotes the total CPU time for computing the DOS by the algorithm implied by Theorem 3.2. This demonstrates almost linear complexity scaling of the algorithm in , . We applied the algorithm to molecules of different system size (i.e. the size of TDA matrix) varying from till (see Table 3.1 for more details). In all cases the -point representation grid with fixed was used.

We conclude that the algorithm based on representation (3.3) demonstrates the perfect resolution of the DOS function at linear complexity in the system size which allows to treat large molecules.

The approach in item (B) requires fast trace calculations for many different values of parameter in the matrix resolvent. Finer resolution of the spectrum for large molecular systems leads to a considerable increase of the number of samples that is practically equal to the grid size, . Hence, the total cost may become prohibitively expensive since the trace computation for each fixed value of still requires complicated matrix operations (see Theorems 3.1 and 3.2).

### 3.2 Calculating multiple traces of A−1 with lower cost

In this section, we describe a further enhancement scheme for fast multiple calculation of traces on the large set of interpolation points. We outline how it is possible to reduce the complexity of these calculations (reduced model) by using a certain smoothness in in the parametric matrix resolvent by introducing the low rank approximation of the large matrix

 E(t)=[E(t1)−1,…,E(tM)−1]andK(t)=[K(t1)−1,…,K(tM)−1]∈RR2×M

obtained by concatenation of vectorized matrices and , , respectively. The idea is that

 E(t)−1=blockdiag[P(t)−1,D(t)−1]

defines an analytic matrix family on the spectral interval , and so is the family of core matrices . This favorable property allows the model reduction via low rank approximation of the matrix families and , . Suppose that the representations

 K(t)−1=RK∑k=1ck(t)Kk

and

 E(t)−1=blockdiag[P(t)−1,D(t)−1]=RE∑m=1pm(t)Em

are precomputed (this is an additional low-rank approximation procedure which separates the parameter ), where and do not depend on , and inherits the block-diagonal structure that obeys.

We take into account that does not depend on , and plug the above decompositions in the main trace-term to obtain

 Trace[E−1QK−1QTE−1]=Trace[RE∑m=1pm(t)EmQ(RK∑k=1ck(t)Kk)QRE∑m′=1pm′(t)Em′].

Now it follows that

 Trace[E−1QK−1QTE−1]=RE∑m=1pm(t)RK∑k=1ck(t)RE∑m′=1pm′(t)Trace[EmQKkQEm′],

where is a small matrix, , with diagonal and the full matrix , such that .

With these prerequisites, we pre-compute a set of ”time-independent” traces

 Tmkm′=Trace[EmQKkQEm′],m,m′=1,…,RE,k=1,…,RK, (3.4)

and store the numbers to obtain the cheap representation of the trace in terms of only a scalar sum,

 Trace[E−1QK−1QTE−1](t)=RE∑m=1RK∑k=1RE∑m′=1pm(t)ck(t)pm′(t)Tmkm′.

The cost of precomputing each trace-value is estimated by as proven by Theorem 3.1, while the number of coefficients to be stored is about and it is expected to be small or moderate. With these data at hand, the evaluation of the required trace for the particular takes scalar operations independently on .

Notice that the computations in (3.4) are intrinsically parallel, which can be exploited on modern computing hardware using multi-threading or distributed computing.

## 4 QTT approximation of DOS

In what follows, we discuss the QTT approximation of the DOS. We also describe a tensor based heuristic QTT approximation of the DOS by using only an incomplete set of sampling points, i.e., QTT representation by adaptive cross approximation (ACA) [30, 36]. Furthermore, we derive the upper bound on the QTT ranks of the DOS by the Gaussians broadening.

### 4.1 Quantized-TT approximation of function related vectors

In the case of large vector size , the number of representation parameters for the corresponding high-order QTT tensor can be reduced to the logarithmic scaling , which allows the QTT tensor interpolation of the target -vector by using only entries, which are chosen adaptively by the heuristic ACA algorithm [30, 36]. The accuracy of this kind of “approximate interpolation” is controlled by the -truncation of the QTT rank parameters. In the present paper, we apply this approximation technique to long -vectors representing the DOS sampled over the fine representation grid .

The QTT-type approximation of an -vector with , , , is defined as the tensor decomposition (approximation) in the TT or canonical format applied to a tensor obtained by the folding (reshaping) of the initial vector to a -dimensional data array. The latter is thought of as an element of the multi-dimensional quantized tensor space , , and is the auxiliary dimension (virtual, in contrary to the real space dimension ) parameter that measures the depth of the quantization transform. A vector is reshaped to its multi-dimensional quantized image in by -adic folding,

 Fq,d′:x→X=[x(j)]∈Qq,d′,j={j1,…,jd′},

with for . Here, for fixed , we have , and is defined via -coding, such that the coefficients are found from the -adic representation of (binary coding for ),

 i−1=C0+C1q1+⋯+Cd′−1qd′−1≡d′∑ν=1(jν−1)qν−1.

Assuming that for the rank- TT approximation of the quantized image there holds , , the complexity of this representation for the tensor reduces to the logarithmic scale

 qr2logqN≪N.

The computational gain of the QTT approximation is justified by the perfect rank decomposition proven in [23] for a wide class of function-related tensors obtained by sampling the corresponding functions over a uniform or properly refined grid. This class of functions includes complex exponentials, trigonometric functions, polynomials and Chebyshev polynomials, as well as wavelet basis functions. We refer to [11, 29, 19, 24] for further results on QTT approximation of functional vectors and various applications.

In estimating the numerical complexity we use the average QTT rank further denoted by calculated as follows,

 rqtt= ⎷1d−1d−1∑k=1r2k, (4.1)

where the QTT ranks are the TT ranks of the quantized image of a vector.

As an example we present the basic results on the rank- (resp. rank-) QTT representation (with ) of the exponential (resp. trigonometric) vectors [23]. For given , and , the exponential -vector, can be reshaped by the dyadic folding to the rank-, -tensor,

 F2,d′:z↦Z=⊗d′p=1[1z2p−1]T∈Q2,d′. (4.2)

The number of representation parameters specifying the QTT image is reduced dramatically from to .

The trigonometric -vector, can be reshaped by the successive dyadic folding

 F2,d′:t↦T∈Q2,d′,

to the -tensor T, which has both the canonical and the QTT-rank equal to , in the complex and real arithmetics, respectively.

The explicit rank- QTT-representation of the single -vector in (see [12, 29]) with , , reads

The number of representation parameters is . A more detailed discussion of the QTT approximation for function related vectors can be found in [23, 24].

In cases when the exact low-rank QTT representation is not known, an -approximation in the QTT format can be computed by using the standard TT multi-linear approximation tools [28]. As a first illustration, we consider the QTT approximation of the DOS for the 1D finite difference Laplacian operator in with Dirichlet boundary conditions, , discretized on the uniform grid of size with . The corresponding eigenvalues are given by

 λk=4sin2(πk2n),k=1,…,n.

Figure 4.1 represents the Lorentzian-DOS and the corresponding approximation error for its QTT -interpolant with , computed on the representation grid of size .

In this paper we apply the QTT approximation method to the DOS regularized by Gaussians or Lorentzians and sampled on a fine representation grid of size . The QTT approximant can be viewed as the rank structured -interpolant to the highly non-regular function regularizing the exact DOS. In this case the application of traditional polynomial or trigonometric type interpolation is inefficient.

The QTT approach provides a good approximation to on the whole spectral interval and requires only a moderate number of representation parameters , where the average QTT rank , see (4.1) is a small rank parameter adaptively depending on the truncation error .

### 4.2 QTT approximation to DOS via Lorentzians: proof of concept

In this section we demonstrate the efficiency of the QTT approximation applied to the DOS via both Gaussian and Lorentzian blurring. We verify by various numerical experiments that the low-rank QTT approximant resolves perfectly the exact DOS.

In the following numerical examples, we use a sampling vector defined on a grid of size . We set the QTT truncation error to , if not explicitly indicated. For ease of interpretation, we set the pre-factor in (2.4) to . It is worth noting that the QTT-approximation scheme is applied to the full TDA spectrum. Our results demonstrate that it renders good resolution in the whole range of energies (in eV) including large ”zero gaps”.

Figure 4.2, left, represents the TDA DOS (blue line) for HO computed by Gaussian blurring with the parameter , and the corresponding rank- QTT tensor approximation (red line) to the discretized function . For this example, the number of eigenvalues is given by . Figure 4.2, right, provides a zoom of the corresponding DOS and its QTT approximant within the small energy interval eV.

Figure 4.3 demonstrates the resolution of the QTT approximation to the DOS via the Lorentzian blurring indicating similar QTT-ranks as in the case of the Gaussians regularization.

Figure 4.4 (Lorentzian blurring) represents similar data, but for the large Glycine amino acid with . It is worth noting that the average QTT rank of sampled on grid points is about , () though the number of eigenvalues in this case is about times larger than for the water molecule. This means that for a fixed , the QTT-rank remains rather modest relative to the molecular size. This observation confirms Theorem 4.1 in Section 4.4.

A comparison of Figures 4.2 and 4.3 indicates that the Lorentzian based DOS blurring is slightly smoother than Gaussian blurring. The moderate size of the QTT ranks in Figures 4.3 and 4.4 clearly shows the potentials of the QTT -interpolation for modeling the DOS of large lattice type clusters.

We observe several gaps in the spectral densities, see Figure 4.2, 4.3 and 4.4 indicating that polynomial, rational or trigonometric interpolation can be applied only to some energy sub-intervals, but not in the whole interval . Remarkably, the QTT approximant resolves well the DOS function in the whole energy interval including nearly zero values within the spectral gaps (hardly possible for polynomial/rational based interpolation).

### 4.3 Numerics for the QTT interpolation to the DOS function

In the previous section we demonstrated that the QTT tensor approximation provides good resolution for the DOS function calculated for a number of molecules. In what follows, we describe a tensor based heuristic QTT approximation of the DOS by using only an incomplete set of sampling points, i.e., QTT representation by adaptive cross approximation (ACA) [30, 36]. This allows us to recover the spectral density in controllable accuracy with interpolation points, where asymptotically scales logarithmically in the grid size . This heuristic approach can be viewed as a kind of “adaptive QTT -interpolation”. In particular, we show by numerical experiments that the low-rank QTT adaptive cross interpolation provides a good resolution of the target DOS with the number of functional calls that asymptotically scales logarithmically, , in the size of the representation grid.

In the case of large , the QTT interpolant can be computed by the ACA tensor approximation procedure (see [30, 36] for the detailed description) that, in general, does not require the full set of functional values over the -grid. In the case of large this beneficial feature allows to compute the QTT approximation by requiring less than computationally expensive functional evaluations of .

The QTT interpolation via ACA tensor approximation serves to recover the representation parameters of the QTT tensor approximant and normally requires about

 M=Csr2qttlog2N (4.3)

samples of the target -vector111In our application, this is the DOS functional -vector corresponding to representations via matrix resolvents in (2.8) or (2.9). with a small pre-factor , usually satisfying , that is independent of the fine interpolation grid size , see, for example, [22]. This cost estimate seems promising in the perspective of extended or lattice type molecular systems, requiring large spectral intervals and, as a result, a large interpolation grid of size . Here the QTT rank parameter naturally depends on the required truncation threshold , characterizing the -error between the exact DOS and its QTT interpolant. The QTT tensor interpolation reduces the number of functional calls, i.e., , if the QTT rank parameters (or threshold ) are chosen to satisfy the condition

 M=Csr2qttlog2N≤N.

The expression on the left-hand side provides a rather accurate estimate on the number of functional evaluations.

To complete this discussion, we present numerical tests for the low-rank QTT tensor interpolation applied to the long vector discretizing the Lorentzian-DOS on a fine representation grid of size .

Figure 4.5 represents the results of the QTT interpolating approximation to the discretized DOS function (HO molecule). We use the QTT cross approximation algorithm based on [23, 30, 36, 29] and implemented in the MATLAB TT-toolbox (https://github.com/oseledets/TT-Toolbox). Here we set , and , providing . The top two figures display the results on the whole spectral interval, while the bottom figures show the zoom of the same data in the small spectral interval eV.

Figure 4.6 illustrates the logarithmic increase in the number of samples required for the QTT interpolation of the DOS (for the HO molecule) represented on the grid of size , where , provided that the rank truncation threshold is chosen by and the regularization parameter is . In this example, the effective pre-factor in (4.3) is estimated by . This pre-factor characterizes the average number of samples required for the recovery of each of the representation parameters involved in the QTT tensor ansatz.

We observe that the QTT tensor interpolant recovers the exact DOS with a high precision. The logarithmic asymptotic complexity scaling (i.e. the number of functional calls required for the QTT tensor interpolation) vs. the grid size can be observed in Figure 4.6 (blue line).

### 4.4 Upper bounds on the QTT ranks of DOS

In this section we analyze the upper bounds on the QTT ranks of the discretized DOS obtained by Gaussian broadening. Our numerical tests indicate that Lorentzian blurring leads to a similar QTT rank compared with Gaussians blurring when both are applied to the same grid and the same truncation threshold is used in the QTT approximation. We consider the more general case of a symmetric interval, i.e. .

Assume that the function , , in equation (2.5) is discretized by sampling over the uniform -grid with , where the generating Gaussian is given by . Denote the corresponding -vector by , and the resulting discretized density vector by

 ϕη(t)↦p=pη=1nn∑j=1gη,j∈RN,

where the shifted Gaussian is assigned by the vector .

Without loss of generality, we suppose that all eigenvalues are situated within the set of grid points, i.e. . Otherwise, we can slightly relax their positions provided that the mesh size is small enough. This is not a severe restriction for the QTT approximation of functional vectors since storage and complexity requests depend only logarithmically on .

###### Theorem 4.1

Assume that the effective support of the shifted Gaussians , , is included in the computational interval . Then the QTT -rank of the vector is bounded by

 rankQTT(pη)≤Calog3/2(|logε|),

where the constant depends only logarithmically on the regularization parameter .

Proof. The main argument of the proof is similar to that in [19, 11]: the sum of discretized Gaussians, each represented in Fourier basis, can be expanded with merely the same number of Fourier harmonics (uniform basis) as each individual Gaussian.

Now we estimate the number of essential Fourier coefficients of the Gaussian vectors with a fixed exponent parameter ,

 m0=O(a|logη|log3/2(|logε|)),

taking into account their exponential decay. Here denotes the rank truncation threshold. Notice that depends logarithmically on . Since each Fourier harmonic has exact rank- QTT representation (see Section 4.1), we arrive at the claimed bound.

Notice that the Fourier transform of the Lorentzian in (2.6) is given by

 e−|k|η,

thus a similar QTT rank bound can be derived for the case of Lorentzian blurred DOS.

Table 4.1 shows that the average QTT tensor rank remains almost independent of the molecular size, which confirms Theorem 4.1. The weak dependence of the rank parameter on the molecular geometry can be observed.

## 5 Towards calculation of the BSE absorption spectrum

In this section we describe the generalization of our approach to the case of the full BSE system. Within the BSE framework, the optical absorption spectrum of a molecule is defined by

 ϵ(ω)≡d\scriptsize{H}rδ(ωI2n−H)dl=2n∑j=1(d\scriptsize{H}r(zr)j)((zl)\scriptsize{H}