# An efficient variational principle for the direct optimization of excited states

## Abstract

We present a variational function that targets excited states directly based on their position in the energy spectrum, along with a Monte Carlo method for its evaluation and minimization whose cost scales polynomially for a wide class of approximate wave functions. Being compatible with both real and Fock space and open and periodic boundary conditions, the method has the potential to impact many areas of chemistry, physics, and materials science. Initial tests on doubly excited states show that using this method, the Hilbert space Jastrow antisymmetric geminal power ansatz can deliver order-of-magnitude improvements in accuracy relative to equation of motion coupled cluster theory, while a very modest real space multi-Slater Jastrow expansion can achieve accuracies within 0.1 eV of the best theoretical benchmarks for the carbon dimer.

The ground state variational principle is probably the most important technique in modern electronic structure theory. Through its roles in optimizing Slater determinants in Hartree Fock (HF) [1] and density functional theory (DFT) [2], the matrix product state (MPS) in density matrix renormalization group (DMRG) [3], trial functions in variational Monte Carlo (VMC) [5], and linear combinations in configuration interaction (CI) [6], it exists as a critical element in the vast majority of ground state electronic structure methods used today. Its success rests on the existence of a function

whose global minimum is the Hamiltonian’s ground eigenstate. This function provides a metric telling us which parameterization of an approximate ansatz is closest to the true ground state, thus allowing us to optimize the ansatz’s full variational freedom for that state alone without regard to the description of any other state. In practice, of course, we are constrained in our choice of ansatz to those permitting an efficient evaluation of . This constraint notwithstanding, the ground state variational principle has become an essential part of most electronic structure methods, even those like coupled cluster (CC) theory [7] whose practical application involves non-variational methods as well.

To date, the lack of an efficient analogous function for excited states has hindered the development of methods that can target such states in the same individual and variational way. Instead, existing excited state methods typically require an ansatz to use its variational freedom to satisfy the needs of many eigenstates simultaneously, the difficulty of which has limited our predictive power over the doubly-excited states in light harvesting systems, the spectra of excited state absorption experiments, and the band gaps of transition metal oxides. For example, linear response (LR) methods such as time dependent HF and DFT [8], CI singles (CIS) [8], equation of motion CC with singles and doubles (EOM-CCSD) [9], and LR DMRG [10] are limited by the requirement that all excited states of interest must be found in the ground state’s LR space, which for a nonlinear ansatz is typically much less flexible than its full variational space. In many other cases, such as state-averaged complete active space methods [13], some VMC approaches [15], and directly targeted DMRG [16], crucial ansatz components (often the one particle basis) are required to be the same for the ground and all excited states. While these methods clearly do not take full advantage of an ansatz’s variational freedom, they have been preferred due to the lack of an efficiently optimizable function whose minimum is an excited state.

This report presents a new variational principle consisting of two parts: first, a function whose global minimum is an excited eigenstate, and second, a method for evaluating and minimizing whose cost scales polynomially for a wide class of approximate wave functions. We will begin by proving that has the necessary properties to be the basis of an excited state variation principle, after which we will detail our method for minimizing it. During this discussion, we will explain which wave functions are compatible with the approach, as well as its general applicability in chemistry, physics, and materials science. Finally, we will present numerical examples that demonstrate the method’s potential.

We employ the function

where the energy shift is assumed to be placed in between distinct eigenvalues of H in order to target the eigenstate whose eigenvalue is immediately above in the ordered eigenvalue spectrum. Assuming real numbers for brevity, we proceed to prove that this eigenstate is the global minimum of as follows. First, we write an exact ansatz as a linear combination of all eigenstates of , , and rewrite in terms of ’s eigenvalues.

Differentiating with respect to the elements of , we see that is a stationary point (SP) if and only if

Recalling that is assumed to be distinct from any of ’s eigenvalues, we see that cannot be a SP if any two of its elements that correspond to distinct Hamiltonian eigenvalues are non-zero, as this would prevent from vanishing for both of them. In other words, cannot be a SP of unless the nonzero values in all correspond to one (possibly degenerate) eigenvalue of . As the eigenstates of are clearly SPs of , we see that is a SP if and only if it is an eigenstate. We thus conclude that the global minimum of is the eigenstate whose eigenvalue is immediately above , or a linear combination of such eigenstates if this eigenvalue is degenerate. As can describe any state in Hilbert space, this minimum value will be less than or equal to that of any approximate ansatz, thus achieving the variational property we desire.

While formally interesting, the mere existence of a variational function for excited states is not useful without an efficient way to evaluate and minimize it. Indeed, the presence of makes the straightforward evaluation of drastically more expensive than the ground state function , which is why studies that have worked implicitly with this function in the past [16] have, to the best of our knowledge, always approximated this term (see discussion of harmonic Ritz methods below). Here we avoid explicitly squaring by resolving identities via complete sums over states,

which we may evaluate (up to a controllable statistical uncertainty that obeys the zero variance principle) through Monte Carlo integration as

where the elements of are sampled from via a Metropolis walk (note that the normalization constants for numerator and denominator cancel). Thus any ansatz admitting efficient evaluations for will be compatible with our approach. This includes the wide class of wave functions already used in ground state VMC, such as Slater-Jastrow (SJ) [5], multi-Slater-Jastrow (MSJ) [18], and the Jastrow antisymmetric geminal power (JAGP) [19] as well as the MPS. Moreover, the method is applicable to real space (in which case is a position vector) and Fock space (in which case is an occupation number vector), in open systems (e.g. chemistry) or period ones (e.g. materials’ band gaps), thus enabling essentially all of the tools we have for ground state VMC to be employed in the direct optimization of an individual, -targeted excited state.

To this purpose, we now present a minimization method analogous to the linear method (LM) [25] used for ground states. Replacing the original ansatz with a linear combination of itself and its first derivatives with respect to its parameters ,

we may then minimize with respect to by solving the generalized eigenproblem

Assuming we were already near the minimum, in which case all elements except will be small, then we may use to update through a reverse Taylor expansion exactly as in the traditional LM (in practice we may also shift the eigenproblem as in the LM to ensure this assumption is valid). This entire procedure is then iterated in similar fashion to Newton’s method until the minimum of is reached, thus optimizing both the linear and nonlinear parameters of our original ansatz . As the matrix elements for the eigenproblem can be evaluated through the same stochastic identity resolution as described above, we arrive at a full-fledged and efficient method for the evaluation and minimization of for any ansatz that can be efficiently used with the ground state LM. The precise polynomial cost scaling will of course depend on the choice of , with examples including for real space SJ and JAGP and for Hilbert space JAGP, where and are the number of samples and electrons, respectively, both of which will grow linearly with system size.

Note the similarity of this eigenvalue equation to the harmonic Davidson equation that arises in applications [27] of the harmonic Ritz principle [29] for targeting interior eigenvalues of a matrix. In fact, some of these approaches [16] appear to have been minimizing an approximation to with respect to linear parameters, in which was approximated by , where is the projector into the subspace corresponding to the linear parameters in question. Except for its controllable statistical uncertainty, the present approach makes no approximation when evaluating and can optimize both linear and nonlinear parameters alike.

Before discussing results, we should consider the formal consequences of employing an approximate ansatz. First, the excitation energies we report are taken as the difference in between the ground and excited states. One could instead compute energies as on the assumption that each was an exact eigenstate, but we find this approximation to be much less accurate than computing directly. Second, due to the nonzero energy variance of an approximate ansatz, the value at which ’s global minimum switches states tends to lie lower in energy than it would for an exact ansatz, a trend that all of our results display. Third, as occurs for in the ground state, the global minimum of for an approximate ansatz may break symmetry. Fourth, the SPs of will not necessarily be orthogonal to one another, either within those for a single shift or between different shifts, but they may be orthogonalized after the fact by diagonalizing the Schrödinger equation within the basis they define. In the present results, doing so was only necessary for CH’s 5th and 6th excited states, which our method found as symmetry-broken mixtures of one another and for which we report the post-2-state-diagonalization energies. While these various complications are undesirable, one should remember that the ground state function has been enormously successful despite suffering the same deficiencies. Indeed, the whole point of the variational principle is that as becomes more flexible these issues must abate, and so just as in the ground state case, our excited state variational principle offers a systematically improvable path towards resolving its own difficulties.

As a demonstration in Fock space, we have used the method to optimize the Hilbert space JAGP ansatz [24] for singlet excited states in CH (Figure 1), an H ring (Figure 2), and C (Figure 3). (Full computational details for all demonstrations are available in the appendix.) In CH, the two doubly excited states are absent in CIS due to HF’s limited LR space and are treated poorly by EOM-CCSD. While CCSD’s LR space contains doubles, it lacks the triples necessary to describe the orbital relaxations that should accompany the excitation. Although JAGP’s LR space also lacks triples, it agrees much better with full CI (FCI) [6], because the variational minimization of explores regions of parameter space beyond the LR regime. The excited states of H were more complicated, each having FCI expansions with 12 or more determinants with normalized coefficients above 0.1 as compared to 8 or less for CH. Nonetheless, the same pattern emerges: the large errors in EOM-CCSD are reduced by an order of magnitude in variationally optimized JAGP.

C provides further evidence of JAGP’s superiority to EOM-CCSD for double excitations while also revealing the limits of the ansatz’s flexibility. While JAGP delivers 0.1 eV accuracy versus FCI for excited states 1, 2, 4, and 5, it shows an error almost as large as EOM-CCSD for state 3, a complicated excitation involving four different electrons in a mixture of double excitations. Even with this difficulty, which arises due to the inherently two-electron nature of the ansatz, JAGP proves more accurate than EOM-CCSD for each of the first five singlet excitations of C.

To showcase the method’s systematic improvability and compatibility with a real space Monte Carlo walk, we have also treated C with a MSJ ansatz consisting of short configuration state function (CSF) expansions and spline-based 1- and 2-body Jastrow factors (Figure 4). For each state, we selected CSFs with coefficients above a given threshold from a complete active space (CAS) wave function, leading to fewer than 10 (65) CSFs per state for a threshold of 0.1 (0.01). Under variational optimization (with the random walk now in real space), the worst-case MSJ excitation energy error is found to drop from 0.3 to 0.1 eV upon lowering the threshold, as expected for a systematically improvable method. As a benchmark we use Davidson-corrected multi-reference CI (MRCI+Q) in a triple-zeta basis, which for excited state 5 (the state) is within 0.03 eV of the recent quadruple-zeta DMRG [31] and FCI quantum Monte Carlo (FCIQMC) [32] benchmarks. Significantly, our MSJ result for this state (2.57eV) is within 0.1 eV of these benchmarks (2.47eV), despite containing fewer than 100 variational parameters, compared to more than 4,000 in EOM-CCSD, millions in DMRG, and 2,000 in the FCIQMC trial function. This success, along with MSJ’s high accuracy for C’s other excited states, demonstrates the advantage of optimizing an ansatz directly and variationally for an individual excited state.

We have presented a Monte Carlo method for the efficient, variational optimization of a function whose global minimum can be tuned to target individual excited states and which may be used at polynomial cost with a wide range of approximate wave functions. In demonstrations on three small molecules with low-lying doubly excited states, the method’s ability to explore an ansatz’s full variational freedom allows for drastic improvements in accuracy compared to linear response theories such as EOM-CCSD, which stands as the current state-of-the-art in polynomial-cost methods for excited states in chemistry. Further, we have shown that for the notoriously difficult double excitations of the carbon dimer, variational optimization allows a *very* modest multi-Slater Jastrow expansion to achieve accuracies on par with the much more cumbersome DMRG and FCIQMC benchmarks. Given the central importance of double excitations in light harvesting and excited state absorption experiments, the method’s compatibility with periodic boundary conditions and thus the solid state, its systematically improvable nature, and its direct connection to the most widely used method in ground state electronic structure, we look forward to further exploring its usefulness in modeling challenging excited states.

We acknowledge funding from the Office of Science, Office of Basic Energy Sciences, the US Department of Energy, Contract No. DE-AC02-05CH11231.

## AGeneral Information

EOM-CCSD and FCI results were computed with [33], CIS results with QChem [34], MSJ results with a modified version of QMCPACK [36] with the CAS truncation taken from GAMESS [38], and JAGP results with our own prototype Hilbert space quantum Monte Carlo code with one- and two-electron integrals imported from Psi3 [39]. In JAGP we worked exclusively in the symmetrically orthogonalized “” one particle basis and froze the C 1s orbital at the HF level. All statistical uncertainties were converged to less than 0.01eV in all cases.

## BCh

For CH we used a minimal STO-3G basis set [40] and shifts in Hartree of = -38.4, -38.3, -39.198, -38.15, -38.110, and -38.1 for excited states 1 to 6, respectively. As mentioned in the main text, this resulted in minima of for the last two shifts that corresponded JAGPs that were symmetry broken combinations of the 5th and 6th excited states. The numbers we report for the excitation energies are those after rediagonalizing the Schödinger equation in the basis of these two JAGP wave functions, which restores the desired symmetry. Finally, in Å, the CH geometry was

C | -0.0722376285 | -0.0574604043 | 0.0000000000 |

H | -0.0198102890 | 1.0990427214 | 0.0000000000 |

H | 1.0664179823 | -0.2665333714 | 0.0000000000 |

## CH

For H we used the 6-31G basis [41] and shifts in Hartree of = -3.17, -3.15, and -3.15 for excited states 1 to 3, respectively. The geometry for H was chosen as a regular hexagon with edge lengths (i.e. bond distances) of 1.5 Å.

## DFock Space C

For C with a Fock space random walk we used the 6-31G basis [41] and shifts in Hartree of = -74.85, -74.85, -74.65, -74.60, and -74.60 for excited states 1 to 5, respectively. Note that these shifts were not plotted in the main text as they are all *below* the JAGP ground state energy of -75.5915 Hartree and would not conveniently fit on the plot. Recall that the finite variance of an approximate wave function causes the values of at which the -minimum switches states to shift down in energy, and indeed in C this effect appears to be large enough to push the switching energies below the ground state energy. None the less, when is minimized for these shifts and then the energy is calculated for the resulting JAGP states, the results are those plotted in the main text. In future work, we believe it will be possible to build modified versions of that share its formal properties while also supressing this “early switching”, but for now we simply choose shifts based on where the -minimum switches states. The C bond distance was 1.2425146399 Å.

## EReal Space C

For C with a real space random walk we used the cc-pVTZ basis [42], both for the orbitals of the MSJ wave function and for the CIS, EOM-CCSD, CASSCF, and MRCI+Q calculations. The CAS expansion from which CSFs were were taken for MSJ was the CAS space resulting from an equal weighted state averaged (8e,8o) CASSCF calculation including the ground state and the first 5 singlet excited states. The CSF orbitals were taken as the optimized CASSCF orbitals. The Jastrow factors (one each for electron-nuclear, opposite-spin-electron, and same-spin-electron) were one dimensional functions of the magnitude of the interpartical distance, the natural logarithms of which were parameterized as a 10-section bspline with a cutoff radius of 5 Bohr for the electron nuclear and 10 Bohr for the electron-electron.

We found that the downshifting of the switching values of was even more severe in real space, and that a significant difference was seen when finding the ground state by minimizing versus (note that no significant difference of this type was seen in the Fock space cases). For consistency, we minimized for all states, including the ground state, whose shift was chosen to lie near the point at which the minimum switched to the first excited state. We observed that so long as it was near the switching point, the precise choice of in the ground state optimization had only a very minor effect on the predicted excitation energies, whereas optimizing instead (equivalent to choosing ) produced a substantial ground state bias. The values used for the reported excitation energy calculations were -79.15 for the ground state and -79.00, -79.00, -78.70, -78.70, and -78.68 for excited states 1 to 5, respectively. The C bond distance was 1.2425146399 Å.

### Footnotes

- Electronic mail: eneuscamman@berkeley.edu

### References

*Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory*,

A. Szabo and N. S. Ostlund, Dover Publications, Mineola, N.Y., 1996.*Density-functional theory of atoms and molecules*,

R. G. Parr and W. Yang, Oxford University Press, New York, 1989.**Rev. Mod. Phys.****77**, 259 (2005).

U. Schollwöck,**Annu. Rev. Phys. Chem.****62**, 465 (2011).

G. K.-L. Chan and S. Sharma,**Rev. Mod. Phys.****73**, 33 (2001).

W. M. C. Foulkes, L. Mitas, R. J. Needs, and G. Rajagopal,*Molecular Electronic Structure Theory*,

T. Helgaker, P. Jørgensen, and J. Olsen, John Wiley & Sons, Ltd., West Sussex, England, 2000.**Rev. Mod. Phys.****79**, 291 (2007).

R. J. Bartlett and M. Musiał,**Chem. Rev.****105**, 4009 (2005).

A. Dreuw and M. Head-Gordon,**Annu. Rev. Phys. Chem.****59**, 433 (2008).

A. I. Krylov,**J. Chem. Phys.****140**, 024108 (2014).

N. Nakatani, S. Wouters, D. V. Neck, and G. K.-L. Chan,**Phys. Rev. B****88**, 075122 (2013).

S. Wouters, N. Nakatani, D. V. Neck, and G. K.-L. Chan,**Phys. Rev. B****88**, 075133 (2013).

J. Haegeman, T. J. Osborne, and F. Verstraete,**Chem. Phys. Lett.****115**, 259 (1985).

P. J. Knowles and H.-J. Werner,**Chem. Phys. Lett.****288**, 299 (1998).

J. Finley, P. Malmqvist, B. O. Roos, and L. Serrano-Andrés,**J. Chem. Theory Comput.****5**, 2074 (2009).

C. Filippi, M. Zaccheddu, and F. Buda,**J. Chem. Phys.****127**, 084109 (2007).

J. J. Dorando, J. Hachmann, and G. K.-L. Chan,**J. Chem. Theory Comput.****7**, 103 (2011).

T. S. Chwee and E. A. Carter,**J. Chem. Theory Comput.****8**, 2181 (2012).

M. A. Morales, J. McMinis, B. K. Clark, J. Kim, and G. E. Scuseria,**J. Chem. Phys.****119**, 6500 (2003).

M. Casula and S. Sorella,**J. Chem. Phys.****121**, 7110 (2004).

M. Casula, C. Attaccalite, and S. Sorella,**J. Chem. Phys.****127**, 014105 (2007).

S. Sorella, M. Casula, and D. Rocca,**J. Chem. Phys.****131**, 154116 (2009).

M. Marchi, S. Azadi, M. Casula, and S. Sorella,**Phys. Rev. Lett.****109**, 203001 (2012).

E. Neuscamman,**J. Chem. Phys.****139**, 194105 (2013).

E. Neuscamman,**Phys. Rev. Lett.****98**, 110201 (2007).

C. J. Umrigar, J. Toulouse, C. Filippi, S. Sorella, and R. G. Hennig,**J. Chem. Phys.****128**, 174101 (2008).

J. Toulouse and C. J. Umrigar,**Phys. Rev. B****66**, 245104 (2002).

A. R. Tackett and M. D. Ventra,**J. Comput. Phys.****231**, 4836 (2012).

G. Jordan, M. Marsman, Y.-S. Kim, and G. Kresse,**Linear Algebra Appl.****154-156**, 289 (1991).

R. B. Morgan,**SIAM J. Matrix Anal. Appl.****17**, 401 (1996).

G. L. G. Sleijpen and H. A. Van Der Vorst,**J. Chem. Theory Comput.****142**, 024107 (2015).

S. Sharma,**J. Chem. Theory Comput.****143**, 134117 (2015).

N. S. Blunt, S. D. Smart, G. H. Booth, and A. Alavi,**, version 2012.1, a package of ab initio programs,**

H.-J. Werner et al., see http://www.molpro.net.**Phys. Chem. Chem. Phys.****8**, 3172 (2006).

Y. Shao et al,**WIREs Comput. Mol. Sci.****3**, 317 (2013).

A. Krylov and P. Gill,**JPCS****402**, 012008 (2012).

J. Kim et al.,**Comput. Sci. Eng.****14**, 40 (2012).

K. P. Esler, J. Kim, L. Shulenburger, and D. M. Ceperley,**J. Comput. Chem.****14**, 1347 (1993).

M. W. Schmidt et al.,**J. Comput. Chem.****28**, 1610 (2007).

T. D. Crawford et al.,**J. Chem. Phys.****51**, 2657 (1969).

W. J. Hehre, R. F. Stewart, and J. A. Pople,**J. Chem. Phys.****56**, 2257 (1972).

W. J. Hehre, R. Ditchfield, and J. A. Pople,**J. Chem. Phys.****90**, 1007 (1989).

T. H. Dunning Jr.,