# When Diabatic Trumps Adiabatic in Quantum Optimization

## Abstract

We provide and analyze examples that counter the widely made claim that tunneling is needed for a quantum speedup in optimization problems. The examples belong to the class of perturbed Hamming-weight optimization problems. In one case, featuring a plateau in the cost function in Hamming weight space, we find that the adiabatic dynamics that make tunneling possible, while superior to simulated annealing, result in a slowdown compared to a diabatic cascade of avoided level-crossings. This, in turn, inspires a classical spin vector dynamics algorithm that is at least as efficient for the plateau problem as the diabatic quantum algorithm. In a second case whose cost function is convex in Hamming weight space, the diabatic cascade results in a speedup relative to both tunneling and classical spin vector dynamics.

The possibility of a quantum speedup for finding the solution of classical optimization problems is tantalizing, as a quantum advantage for this class of problems would provide a wealth of new applications for quantum computing. The goal of many optimization problems can be formulated as finding an -bit string that minimizes a given function , which can be interpreted as the energy of a classical
Ising spin system, whose ground state is . Finding the ground state of such systems can be hard if, e.g., the system is strongly frustrated, resulting in a complex energy landscape that cannot be efficiently explored with any known algorithm due to the presence of many local minima Nishimori (2001).
This can occur, e.g., in classical simulated annealing (SA) Kirkpatrick *et al.* (1983),
when the system’s state is trapped in a local minimum.
Provided the barriers between minima are sufficiently thin, quantum mechanics allows the system to tunnel in order to escape from such traps, though such comparisons must be treated with care since the quantum and classical potential landscapes are in general different. It is with this potential advantage over classical annealing that quantum annealing Ray *et al.* (1989); Finnila *et al.* (1994); Kadowaki and Nishimori (1998) and the quantum adiabatic algorithm (QA), were proposed Farhi *et al.* (2000). Thermal hopping and quantum tunneling provide two starkly different mechanisms for solving optimization problems, and finding optimization problems that favor the latter continues to be an open theoretical question S. Suzuki and A. Das (guest
eds.) (2015); Heim *et al.* (2014). To attack this question, in this work we focus on a well-known class of problems known as perturbed Hamming weight oracle (PHWO) problems. These are problems for which instances can be generated where QA either has an advantage over classical random search algorithms with local updates, such as SA Farhi *et al.* (2002a); Reichardt (2004), or has no advantage van Dam *et al.* (2001); Reichardt (2004). Moreover, for PHWO problems with qubit permutation symmetry there is an elegant interpretation of tunneling in terms of a semiclassical potential Farhi *et al.* (2002a); Schaller and Schützhold (2010), which we exploit in this work.

If the total evolution time is sufficiently long so that the adiabatic condition is satisfied, QA is guaranteed to reach the ground state with high probability Jansen *et al.* (2007); Lidar *et al.* (2009). However, this condition is only sufficient, and the scaling of the time to reach the adiabatic regime is therefore not necessarily the right computational complexity metric. The optimal time to solution (TTS), commonly used in benchmarking studies Rønnow *et al.* (2014) [also see the Supplementary Material (SM)], is a more natural metric. It is defined as the minimum total time such that the ground state is observed at least once with desired probability :

(1) |

where is the duration (in QA) or the number of single spin updates (in SA) of a single run of the algorithm, and is the probability of finding the ground state in a single such run. The use of TTS allows for the possibility that multiple short (diabatic) runs of the evolution, each lasting an optimal annealing time , result in a better scaling than a single long (adiabatic) run with an unoptimized .

Here we demonstrate that for a specific class of PHWO problems, the optimal evolution time for QA is far from being adiabatic, and this optimal evolution involves no multi-qubit tunneling. Instead the system leaves the ground state, only to return through a sequence of diabatic transitions associated with avoided-level crossings. We also show that spin vector dynamics, which can be interpreted as a semi-classical limit of the quantum evolution with a product-state approximation, evolves in an almost identical manner.

We note that PHWO problems are strictly toy problems since these problems are typically represented by highly non-local Hamiltonians (see
the SM) and thus are not physically implementable, in the very same sense that the adiabatic Grover search problem is unphysical Roland and Cerf (2002); Rezakhani *et al.* (2010). Nevertheless, these problems provide us with important insights into the mechanisms behind a quantum speed-up, or lack thereof.

The plateau problem.—We focus on PHWO problems with a Hamiltonian of the form:

(2) |

where denotes the Hamming weight of the bit string . The minimum of this function is . In the absence of a perturbation, i.e., , the problem can be solved efficiently classically using a local spin-flip algorithm such as SA, since flipping any ’’ to a ’’ will lower the energy. Note that SA can be interpreted as a random walk, where the decision to walk left or right and flip a spin is given by the Metropolis update rule.

QA evolves the system from its ground state at according to a time-dependent Hamiltonian

(3) |

where we have chosen the standard transverse field “driver” Hamiltonian that assumes no prior knowledge of the form of , and a linear interpolating schedule, with being the dimensionless time parameter.

Reichardt proved the following lower bound on the spectral gap for adiabatic evolutions for general PHWO problems Reichardt (2004):

(4) |

where is the maximum height of the perturbation. Note that while the lower-bound holds for all perturbations, it is only non-trivial when it is positive for all . Details of our simulations methods and a summary of the proof are given in the SM.

We focus on the following “plateau” problem:

(5) |

Thus, here . We note that when both a lower bound is not obtainable from Reichardt’s proof (this is explained in the SM), although numerical diagonalization reveals a constant gap. We demonstrate below that this case is nevertheless particularly hard for SA, and hence it is the focus of our study.

Adiabatic dynamics.—We now demonstrate that, under the assumption of adiabatic dynamics, QA efficiently tunnels through a barrier and solves the plateau problem in at most linear time, while SA requires a time that grows polynomially in the problem size .

In the adiabatic (long-time) limit, SA follows the thermal Gibbs state parametrized by the inverse temperature , whereas QA follows the instantaneous ground state parametrized by . We quantify the distance of these two states from the target (the state) by calculating the expectation value of the Hamming weight operator, defined as . In particular, when , the system has a high probability of being in the state .

As we tune (the annealing parameters) and , the plateau in Eq. (5) induces a dramatic change in , of the order of the plateau width and over a narrow range of the annealing parameters, as shown in Fig. 1. For SA, traversing the plateau to reach the Gibbs state is a hard problem because, as the random walker moves closer to , the probability to hop in the wrong direction of increasing Hamming weight becomes greater than the reverse. As we prove in the SM, consequently SA takes single-spin updates to find the ground-state.

Unlike SA, to solve the plateau problem QA must tunnel through an energy barrier. To demonstrate this, we first construct the semiclassical effective potential arising from the spin-coherent path-integral formalism Klauder (1979):

(6) |

where are the spin-1/2 (symmetric) coherent states ^{1}

Optimal QA via Diabatic Transitions.—Even though QA encounters a constant gap and can tunnel efficiently, the possibility remains that this does not lead to an optimal TTS, since this result assumes the adiabatic limit. We thus consider a diabatic form of QA and next demonstrate, using the optimal TTS criterion defined in Eq. (1), that the optimal annealing time for QA is far from adiabatic. Instead, as shown in Fig. 2, the optimal TTS for QA is such that the system leaves the instantaneous ground state for most of the evolution and only returns to the ground state towards the end. The cascade down to the ground state is mediated by a sequence of avoided energy level-crossings. As increases for fixed , repopulation of the ground state improves for fixed , hence causing TTS to decrease with , as seen Fig. 2, until it saturates to a constant at the lowest possible value, corresponding to a single run at . At this point the problem is solved in constant time , compared to the scaling of the adiabatic regime. Moreover, as shown in Fig. 2, there are no sharp changes in , suggesting that the non-adiabatic dynamics do not entail multi-qubit tunneling events, unlike the adiabatic case.

Given the absence of tunneling in the time-optimal quantum evolution, we are motivated to consider a semiclassical limit of the evolution, particularly that of classical spin-vector dynamics (SVD), which we describe in detail in the SM. SVD can be derived as the saddle-point approximation to the path integral formulation of QA in the spin-coherent basis Owerre and Paranjape (2015). The equations share the same qubit permutation symmetry as the quantum Hamiltonian, which significantly simplifies the computation of the oracle [i.e., the potential (and its derivatives), now given by Eq. (6)] assumed for SVD. The SVD equations are equivalent to the Ehrenfest equations for the magnetization under the assumption that the density matrix is a product state, i.e., , where denotes the state of the th qubit.

As we show in Fig. 2, the scaling of SVD’s optimal TTS also saturates to a constant time, i.e., . Moreover, it reaches this value earlier (as a function of problem size ) than QA, thus outperforming QA for small problem sizes, while for large enough both achieve scaling. As seen in the inset, SVD’s advantage persists as a function of at constant .

The dynamics of QA in the non-adiabatic limit are well approximated by SVD until close to the end of the evolution, as shown in Fig. 2: the trace-norm distance between the instantaneous states of QA and SVD is almost zero until , after which the states start to diverge. This suggests that SVD is able to replicate the QA dynamics up to this point, and only deviates because this makes it more successful at repopulating the ground state than QA.

Discussion.—For the class of PHWO problems studied here, we have demonstrated that tunneling is
not necessary to achieve the optimal TTS. Instead, the optimal trajectory uses diabatic transitions to first scatter completely out of the ground state and return via a sequence of avoided level crossings.
This use of diabatic transitions is similar in spirit to those studied in Refs. Somma *et al.* (2012); Crosson *et al.* (2014); Hen (2014); Steiger *et al.* (2015), but there are some important differences.
essentially, our PHWO findings hold for the standard formulation of QA, without any fine-tuning of the interpolation schedule or the Hamiltonian. However, while both the adiabatic and diabatic quantum algorithms outperform SA for the plateau problem, the faster quantum diabatic algorithm is not better than the classical SVD algorithm for this problem.
These results extend beyond the plateau problem: as we show in the SM
even the “spike” problem studied in Ref. Farhi *et al.* (2002a)—which is in some sense the antithesis of the plateau problem since it features a sharp spike at a single Hamming weight—also exhibits the diabatic-beats-adiabatic phenomenon, indicating that tunneling is not required to efficiently solve the problem. Moreover, SVD is faster for this problem as well.

However, the mechanism we found here, of a “lining-up” of the avoided level crossings with an associated “diabatic cascade” [seen in Fig. 2], may not be generic.
E.g., we have checked that this mechanism is absent in the adiabatic Grover problem with a transverse field driver Hamiltonian [as in our Eq. (3)], even though the Grover problem is then equivalent to a “giant” plateau problem: ^{2}

It is important to note that we do not claim that PHWO problems are always associated with diabatic cascades (see the SM for a counterexample); nor do we claim that SVD will always have an advantage over QA. A simple counterexample to the latter statement comes from the class of cost functions that are convex in Hamming weight space, which have a constant minimum gap Jarret and Jordan (2014):

(7) |

We have observed similar diabatic transitions for this problem as for the plateau (not shown), thus obviating tunneling, but find that QA outperforms SVD, as shown in Fig. 3. This occurs because the optimum TTS for QA occurs at a slightly higher optimal annealing time, i.e., there is an advantage to evolving somewhat more slowly, though still far from adiabatically. Thus, this is a case of a “limited” quantum speedup Rønnow *et al.* (2014).

In summary, our work provides a counterargument to the widely made claim that tunneling is needed for a quantum speedup in optimization problems. Which features of Hamiltonians of optimization problems favor diabatic or adiabatic algorithms remains an open question.

Acknowledgements.—Special thanks to Ben Reichardt for insightful conversations and for suggesting the plateau problem, and to Bill Kaminsky for inspiring talks Kaminsky (2014a, b). We also thank Itay Hen, Joshua Job, Iman Marvian, Milad Marvian, and Rolando Somma for useful comments. The computing resources were provided by the USC Center for High Performance Computing and Communications and by the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work was supported under ARO grant number W911NF-12-1-0523 and ARO MURI Grant No. W911NF-11-1-0268.

Supplementary Material for

“Diabatic Trumps Adiabatic in Quantum Optimization”

## Appendix A Derivation of Eq. (1)

Equation (1) is easily derived as follows: the probability of successively failing times is , so the probability of succeeding at least once after runs is , which we set equal to the desired success probability ; from here one extracts the number of runs and multiplies by to get the time-to-solution TTS. Optimizing over yields TTS, which is natural for benchmarking purposes in the sense that it captures the trade-off between repeating the algorithm many times vs optimizing the probability of success in a single run. The adiabatic regime might be more attractive if one seeks a theoretical guarantee to have a certain probability of success if the evolution is sufficiently slow.

## Appendix B (Non-)Locality of PHWO problems

Since the PHWO problems, including the plateau, are quantum oracle problems, they can generically not be represented by a local Hamiltonian. For completeness we prove this claim here and also show why the (plain) Hamming weight problem is -local.

Let be a bit string of length , i.e., and let

(8) |

with and . This forms an orthonormal basis for the vector space of diagonal Hamiltonians. Thus:

(9) |

with

(10a) | ||||

(10b) | ||||

(10c) |

Note that generically will be be non-zero for arbitrary-weight strings , leading to -local terms in , even as high as -local.

E.g., substituting the plateau Hamiltonian [Eq. (5)] into this we obtain:

(11) | |||||

On the other hand, if (i.e., in the absence of a perturbation), the Hamiltonian is only -local:

(12a) | ||||

(12b) | ||||

(12c) | ||||

(12d) |

## Appendix C Methods

### c.1 Simulated Annealing

SA is a general heuristic solver Kirkpatrick *et al.* (1983), whereby the system is initialized in a high temperature state, i.e., in a random state, and the temperature is slowly lowered while undergoing Monte Carlo dynamics. Local updates are performed according to the Metropolis rule Metropolis *et al.* (1953); Hastings (1970): a spin is flipped and the change in energy associated with the spin flip is calculated. The flip is accepted with probability :

(13) |

where is the current inverse temperature along the anneal. Note that there could be different schemes governing which spin is to be selected for the update. We consider two such schemes: random spin-selection – where the next spin to be updated is selected at random; and sequential spin-selection – where one runs through all of the spins in a sequence. Random spin-selection (including just updating nearest neighbors) satisfies detailed-balance and thus is guaranteed to converge to the Boltzmann distribution. Sequential spin-selection does not satisfy strict detailed balance (since the reverse move of sequentially updating in the reverse order never occurs), but it too converges to the Boltzmann distribution Manousiouthakis and Deem (1999). In sequential updating, a “sweep” refers to all the spins having been updated once. In random spin-selection, we define a sweep as the total number of spin updates divided by the total number of spins. When it is possible to parallelize the spin updates, the appropriate metric of time-complexity is the number of sweeps , not the number of spin updates (they differ by a factor of ) Rønnow *et al.* (2014). However, in our problem this parallelization is not possible and hence the appropriate metric is the number of spin updates, and this is what is plotted in Fig. 2. After each sweep, the inverse temperature is incremented by an amount according to an annealing schedule, which we take to be linear, i.e. .

We can use SA both as an *annealer* and as a *solver* Hen *et al.* (2015). In the former, the state at the end of the evolution is the output of the algorithm, and can be thought of as a method to sample from the Boltzmann distribution at a specified temperature. For the latter, we select the state with the lowest energy found along the entire anneal as the output of the algorithm, the better technique if one is only interested in finding the global minimum. We use the latter to maximize the performance of the algorithm.

More details concerning the performance of SA in the context of the problem studied here are presented below.

### c.2 Quantum Annealing

Here we consider the most common version of quantum annealing:

(14) |

where is the dimensionless time parameter and is the total anneal time. The initial state is taken to be , which is the ground state of .

The initial ground state and the total Hamiltonian are symmetric under qubit permutations (recall that for our class of problems). It then follows that the time-evolved state, at any point in time, will also obey the same symmetry. Therefore the evolution is restricted to the -dimensional symmetric subspace, a fact that we can take advantage of in our numerical simulations. This symmetric subspace is spanned by the Dicke states with , which satisfy:

(15a) | ||||

(15b) |

where , . We can denote these states by:

(16) |

where, .

In this basis the Hamiltonian is tridiagonal, with the following matrix elements:

(17a) | ||||

(17b) |

The Schrödinger equation with this Hamiltonian can be solved reliably using an adaptive Runge-Kutte Cash-Karp method Cash and Karp (1990) and the Dormand-Prince method Dormand and Prince (1980) (both with orders and ).

If the quantum dynamics is run adiabatically the system remains close to the ground state during the evolution, and an appropriate version of the adiabatic theorem is satisfied. For evolutions with a constant spectral gap for all , an adiabatic condition of the form

(18) |

is often claimed to be sufficient A. Messiah (1962) (however, see discussion after Eq. (21) in Ref. Jansen *et al.* (2007)). In our case is upper-bounded by ; since we are considering a constant gap, the adiabatic algorithm can scale at most linearly by condition (18). This is true for the plateau problems.

We demonstrate in SM-E that the following version of the adiabatic condition, known to hold in the absence of resonant transitions between energy levels Amin (2009), estimates the scaling we observe very well:

(19) |

where and are the instantaneous ground and excited states in the symmetric subspace respectively. To extract from this condition we simply ensure that for a given choice of , condition (19) holds (recall that ).

### c.3 Spin-Vector Dynamics

Starting with the spin-coherent path integral formulation of the quantum dynamics, we can obtain Spin Vector Dynamics (SVD) as the saddle-point approximation (see, for example, p.10 of Ref. Owerre and Paranjape (2015) or Refs. Smolin and Smith (2013); Albash *et al.* (2015)). It can be interpreted as a semi-classical limit describing coherent single qubits interacting incoherently. In this sense, SVD is a well motivated classical limit of the quantum evolution of QA. SVD describes the evolution of unit-norm classical vectors under the Lagrangian (in units of ):

(20) |

where is a tensor product of independent spin-coherent states Arecchi *et al.* (1972):

(21) |

We can define an *effective semiclassical potential* associated with this Lagrangian:

(22) |

with the probability of finding the all-zero state at the end of the evolution (which is the ground state in our case), as . The quantum Hamiltonian obeys qubit permutation symmetry: where is a unitary operator that performs an arbitrary permutation of the qubits. This implies that our classical Lagrangian obeys the same symmetry:

where the derivative operator is trivially permutation symmetric. Therefore, the Euler-Lagrange equations of motion derived from this action will be identical for all spins. Thus, if we have symmetric initial conditions, i.e., , then the time evolved state will also be symmetric:

(24) |

As we show below, under the assumption of a permutation-invariant initial condition we only need to solve two (instead of ) semiclassical equations of motion:

(25a) | |||

(25b) |

where we have defined the symmetric effective potential as:

(26) |

and is simply with all the ’s and ’s set equal. Note that in the main text [see Eq. (6)], we slightly abuse notation for simplicity, and use instead of . The probability of finding the all-zero bit string at the end of the evolution is accordingly given by . We would have arrived at the same equations of motion had we used the symmetric spin coherent state in our path integral derivation, but that would have been an artificial restriction. In our present derivation the symmetry of the dynamics naturally imposes this restriction.

Note that the object in Eq. (C.3) involves a sum over all bit-strings and is thus exponentially hard to compute; on the other hand, the object in Eq. (C.3) only involves a sum over terms and is thus easy to compute. Therefore, just as in the quantum case—where due to permutation symmetry the quantum evolution is restricted to the dimensional subspace of symmetric states instead of the full -dimensional Hilbert space—given knowledge of the symmetry of the problem we can efficiently compute the SVD potential and efficiently solve the SVD equations of motion.

We also remark that the computation of the potential in Eq. (C.3) is significantly simplified if our cost function, , is given in terms of a local Hamiltonian. For example, if , then:

(27) |

which is easy to compute if is a number of terms.

Let us now derive the symmetric SVD equations of motion (25). Without any restriction to symmetric spin-coherent states, the SVD equations of motion, for the pair , read:

(28a) | |||

(28b) |

As can be seen by comparing Eqs. (25) and (28), it is sufficient to show that:

(29) |

and an analogous statement holding for derivatives with respect to . This claim is easily seen to hold true for the term multiplying in Eq. (C.3):

(30) |

where in the last line we used Eq. (C.3). Next we focus on the term multiplying in Eq. (C.3). This term has no dependence and thus we only consider the derivatives. First note that

(31) |

Now, we set all the ’s equal. Let us define . Using this and the fact that is only a function of the Hamming weight (which is equivalent to the qubit permutation symmetry), we can rewrite the last expression, after a few steps of algebra, as:

(32) |

### c.4 Simulated Quantum Annealing

Although not discussed in the main text, in SM-E we use an alternative method to simulated annealing, namely simulated quantum annealing (SQA, or Path Integral Monte Carlo along the Quantum Annealing schedule) Martoňák *et al.* (2002); Santoro *et al.* (2002). This is an annealing algorithm based on discrete-time path-integral quantum Monte Carlo simulations of the transverse field Ising model using Monte Carlo dynamics. At a given time along the anneal, the Monte Carlo dynamics samples from the Gibbs distribution defined by the action:

(33) |

where is the spacing along the time-like direction, is the ferromagnetic spin-spin coupling along the time-like direction, and denotes a spin configuration with a space-like direction (the original problem direction, indexed by ) and a time-like direction (indexed by ). For our spin updates, we perform Wolff cluster updates Wolff (1989) along the imaginary-time direction only. For each space-like slice, a random spin along the time-like direction is picked. The neighbors of this spin are added to the cluster (assuming they are parallel) with probability

(34) |

When all neighbors of the spin have been checked, the newly added spins are checked. When all spins in the cluster have had their neighbors along the time-like direction tested, the cluster is flipped according to the Metropolis probability using the space-like change in energy associated with flipping the cluster. A single sweep involves attempting to update a single cluster on each space-like slice.

We can use SQA both as an *annealer* and as a *solver* Hen *et al.* (2015). In the former, we randomly pick one of the states on the Trotter slices at the end of the evolution as the output of the algorithm, while for the latter, we pick the state with the lowest energy found along the entire anneal as the output of the algorithm. We use the latter to maximize the performance of the algorithm.

## Appendix D Review of the Hamming weight problem and Reichardt’s bound for PHWO problems

Here we closely follow Ref. Reichardt (2004).

### d.1 The Hamming weight problem

We review the analysis within QA of the minimization of the Hamming weight function , which counts the number of ’s in the bit string . This problem is of course trivial, and the analysis given here is done in preparation for the perturbed problem.

For the adiabatic algorithm, we start with the driver Hamiltonian,

(35) |

which has as the ground state.

The final Hamiltonian for the cost function is

(36) |

which has as the ground state.

We interpolate linearly between and :

(37) | ||||

(38) | ||||

(39) |

Since there are no interactions between the qubits, this problem can be solved exactly by diagonalizing the Hamiltonian on each qubit separately. For each term, we have the energy eigenvalues ,

(40) |

and associated eigenvectors,

(41) |

The ground state of is . The gap is given by,

(42a) | ||||

(42b) | ||||

(42c) | ||||

(42d) |

The gap is minimized at with minimum value . The minimum gap is independent of and hence does not scale with problem size. Therefore the adiabatic run time is given by,

(43) |

where the -dependence is solely due to (see SM-C.2).

It also useful to consider the form of . We can write,

(44) | |||||

If we measure in the computational basis, the probability of getting outcome is determined by :

(45) |

where

(46) |

### d.2 Reichardt’s bound for PHWO problems

Here we review Reichardt’s derivation of the gap lower-bound for general PHWO problems, but provide additional details not found in the original proof Reichardt (2004).

We use the same initial Hamiltonian [Eq. (35)] and linear interpolation schedule as before, , and choose the final Hamiltonian to be

(47) |

where

(48) |

where is the perturbation. Note that here we have *not* assumed that the perturbation, , respects qubit permutation symmetry.

We wish to bound the minimum gap of . Unlike the Hamming weight problem , this problem is no longer non-interacting. Define

(49) |

###### Lemma 1 (Reichardt (2004)).

Let and let and be the ground state energies of and , respectively. Then .

###### Proof.

First note that

(50) |

Below, we suppress the dependence of all the terms for notational simplicity. We know that . Using this,

(51a) | ||||

(51b) | ||||

(51c) | ||||

(51d) | ||||

(51e) | ||||

(51f) |

where is the number of strings with Hamming weight , and we used Eq. (45).

Consider the partial binomial sum (dropping the ’s),

(52) |

Using the fact that the binomial is well-approximated by the Gaussian in the large limit (note that this approximation requires that and not be too close to zero), we can write:

(53) |

where , and . Note that and depend on , and also on via . The parameters and are specified by the problem Hamiltonian, and are therefore allowed to depend on as long as is satisfied for all .

Let us define:

(54) |

We seek an upper bound on this function. We observe that decreases monotonically from to as goes from to . Thus, the mean of the Gaussian decreases from to . Depending on the values of , and , we thus have three possibilities: (i) , (ii) , and (iii) . Note that (ii) and (iii) are cases where the integral runs over the tails of the Gaussian and so the integral is exponentially small. We focus on (i), as this induces the maximum values of the integral. In this case the lower limit of the integral Eq. (54) is negative, while the upper limit is positive. Thus, the integral runs through the center of the standard Gaussian, and we can upper-bound the value of the integral by the area of the rectangle of width and height . Hence

(55a) | ||||

(55b) | ||||

(55c) |

where we have used the fact that .

Thus, we obtain the bound:

(56) |

∎

###### Lemma 2 (Reichardt (2004)).

If is non-negative, then the spectrum of lies above the spectrum of . That is, for all , where and denote the th largest eigenvalue of and , respectively.

This can be proved by a straightforward application of the Courant-Fischer min-max theorem (see, for example, Ref. Horn and Johnson (2012)).

Combining these lemmas results in the desired bound on the gap:

(57a) | ||||

(57b) | ||||

(57c) | ||||

(57d) |

where in Eq. (57b) we used Lemma 2 and in Eq. (57d), we used Lemma 1.

Now, if we choose a parameter regime for the perturbation such that , then the perturbed problem ma