Detecting localized eigenstates of linear operators
We describe a way of detecting the location of localized eigenvectors of a linear system $Ax = \lambda x$ for eigenvalues $\lambda$ with $|\lambda|$ comparatively large. We define the family of functions $f_\alpha : \{1, 2, \dots, n\} \to \mathbb{R}$,
$$ f_\alpha(k) = \log\left( \|A^\alpha e_k\|_{\ell^2} \right), $$
where $\alpha$ is a parameter and $e_k$ is the $k$-th standard basis vector. We prove that eigenvectors associated to eigenvalues with large absolute value localize around local maxima of $f_\alpha$: the metastable states in the power iteration method (slowing down its convergence) can be used to predict localization. We present a fast randomized algorithm and discuss different examples: a random band matrix, discretizations of the local operator $-\Delta + V$ and the nonlocal operator $(-\Delta)^{3/4} + V$.
Key words and phrases: Eigenvectors; localization; power iteration; randomized numerical linear algebra; Anderson localization.
2010 Mathematics Subject Classification: 35P20 (primary), 82B44 (secondary)
1. Introduction and Main Idea
We are interested in spatially localized eigenvectors of matrices $A \in \mathbb{R}^{n \times n}$. These objects are of paramount importance in many fields of mathematics: the ground state and low-frequency behavior of quantum systems [1, 7, 10, 11, 15], the behavior of metastable random dynamical systems [3, 4, 5], the detection of central points in graph clusters [6], principal component analysis for sample covariance matrices [19], and many more.
The purpose of this paper is to introduce a simple idea, which provably detects localized eigenstates associated to eigenvalues with large absolute value at low computational cost. We introduce the entire relevant theory for matrices $A \in \mathbb{R}^{n \times n}$; however, a crucial ingredient is the following: when numerically computing solutions for many infinite-dimensional linear operators of interest (linear/nonlocal/fractional partial differential equations, integral equations, …), these are usually discretized and the discretization respects the spatial ordering of the underlying domain. In particular, if the original continuous object has localized eigenstates and the discretization is sufficiently accurate, then the discretized linear operator will have localized eigenstates on the associated graph. We will completely ignore the question of how operators are discretized and restrict ourselves to the question of how to find localized eigenvectors.
1.2. Main idea
We are given a matrix $A \in \mathbb{R}^{n \times n}$ (not necessarily symmetric) and are interested in finding, if they exist, the location of localized eigenvectors concentrating their mass on relatively few coordinates of
$$ Ax = \lambda x $$
for $\lambda$ in the spectral edge (meaning that $|\lambda|$ is large compared to the rest of the spectrum; the low-lying eigenvalues, close to 0, can be obtained via the very same method after a transformation of $A$, see below). Since strongly localized eigenstates are essentially created by localized structure, they should also be detectable by completely local operations.
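We define $f_\alpha : \{1, 2, \dots, n\} \to \mathbb{R}$ given by
$$ f_\alpha(k) = \log\left( \|A^\alpha e_k\|_{\ell^2} \right), $$
where $\alpha$ is a parameter and $e_k$ is the $k$-th standard basis vector.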
The main idea is rather simple: if highly localized eigenvectors exist, then they have a nontrivial inner product with one of the standard basis vectors (whose size can be bounded from below depending only on the scale of localization and not on $n$). An iterated application of the matrix will then lead to larger growth than it would in other regions. The idea is vaguely related to the stochastic interpretation [16] of the Filoche-Mayboroda landscape function [11]. The logarithm counteracts the exponential growth purely for the purpose of visualization.
Example. We start by considering a numerical example (see Fig. 1): here where , , is given by a random band matrix with bandwidth around the diagonal (i.e., is a diagonal matrix) and every non-zero entry chosen independently at random in the interval . A typical outcome can be seen in Figure 1 for : the function has a series of local maxima and the first few eigenvectors localize around these maxima; higher peaks in the landscape correspond to eigenvalues with larger absolute value.
The value of depends on the precise circumstances; larger values can lead to higher accuracy but also increase the computational cost. It is worth pointing out that it is not interesting to have very large: whenever the largest eigenvalue is simple and the associated eigenvector does not vanish, then this approach becomes less effective since
It is not difficult to see that the convergence speed is going to depend on the spectral gap between the largest eigenvalue and the rest (in terms of absolute value of the eigenvalues). It is commonly desirable to have a spectral gap; here, we are bound to encounter a delicate interplay between spectral gaps and the scale of localization.
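The computation behind the example can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the landscape is taken to be $\log \|A^\alpha e_k\|_{\ell^2}$, and the matrix size (200), bandwidth (2), entry interval ([-1, 1]), exponent (20) and the symmetrization are hypothetical choices, not the parameters used for Figure 1. The helper names `random_band_matrix` and `landscape` are likewise made up for this sketch.

```python
import numpy as np

def random_band_matrix(n, bandwidth, rng):
    """Symmetric band matrix with entries uniform in [-1, 1] (illustrative choice)."""
    A = rng.uniform(-1.0, 1.0, size=(n, n))
    i, j = np.indices((n, n))
    A[np.abs(i - j) > bandwidth] = 0.0   # keep only the band around the diagonal
    return (A + A.T) / 2.0               # symmetrize so eigenvectors are orthonormal

def landscape(A, alpha):
    """Return the vector (log ||A^alpha e_k||_2)_k for an integer alpha >= 0."""
    M = np.eye(A.shape[0])
    log_scale = 0.0
    for _ in range(alpha):
        M = A @ M
        s = np.linalg.norm(M)            # Frobenius norm; rescale to avoid overflow
        M = M / s
        log_scale += np.log(s)
    col_norms = np.linalg.norm(M, axis=0)  # column k of A^alpha equals A^alpha e_k
    return np.log(np.maximum(col_norms, np.finfo(float).tiny)) + log_scale

rng = np.random.default_rng(0)
A = random_band_matrix(200, 2, rng)
f = landscape(A, 20)

# Compare the highest peak of the landscape with the peak of the top eigenvector.
eigvals, eigvecs = np.linalg.eigh(A)
top = np.argmax(np.abs(eigvals))
print("landscape peak:", np.argmax(f),
      "eigenvector peak:", np.argmax(np.abs(eigvecs[:, top])))
```

Forming the full power costs one matrix-matrix product per step; for sparse or banded matrices one would exploit the structure rather than use dense arrays as done here for brevity.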
2. Statement of Main Result
We give one of the many possible formulations of a rigorous guarantee of the approach. Indeed, the underlying principle, as outlined in the previous section, is so simple that there are many ways of turning it into a precise statement; we give a fairly canonical one but it is by no means unique and different circumstances may call for different formulations.
We start by clarifying our setup and introducing some parameters below. We first phrase everything in a way that is most natural in the setting of band matrices or matrices with rapid decay off the diagonal (which covers on subsets of , discretized by finite difference or finite volume methods); the general case follows in a rather straightforward manner by replacing the notion of ‘interval’ by ‘subset’, as we briefly discuss below. The only restrictive assumption is the orthogonality of eigenvectors (1), which is usually given in the setting that we are interested in (localization of self-adjoint operators). Assumptions (2) and (3) introduce various parameters that are always defined; however, in the non-localized regime they may result in a vacuous conclusion (see Figure 2 for an illustration).
(1) The eigenvectors of form an orthonormal basis of and we order the eigenvalues via
(2) Every one of the first eigenvectors has half of its mass supported on an interval , i.e.
and we define as the longest such interval
(3) We assume that, for all , the eigenvector has exponential decay away from the interval , i.e. for all
for some universal constant .
Our main result states that, depending on the size of the spectral gap and the quality of localization, there exist such that the superlevel set intersects all localized intervals and can only be that large in a small neighborhood of the intervals . In particular, this allows for a detection of localization of the first eigenfunctions by looking at alone.
Theorem. If is chosen such that
then there exists a critical value such that is large on all
and only large in their neighborhood: if , then
The condition on depends on the spectral gap and localization properties. If the spectral gap is large, then is sufficient. If the matrix satisfies that each row has a constant number of non-zero entries (for example, if it is a local discretization of a differential operator in dimensions), this implies that the computation of may require only operations. It is clear from the proof that there are many other possible conditions under which similar results could be obtained. Natural variations include the following:
The statement guarantees a gap of size between the values of attained on the intervals and far away from their supports; in most instances a much smaller gap would suffice (especially if one identifies regions of localization via a notion of local maximum of , i.e., attaining a larger value than in a neighborhood of a certain size). We observe that our approach easily implies the inequalities
and for any
In combination, they suggest that the gap size could in various situations be replaced by something much smaller which would improve the error bounds.
Orthogonality is not crucial: if the eigenvectors of form a basis and the angles between different eigenvectors are not too small, our argument easily implies similar results.
If we know in advance that a generic localized eigenfunction is going to be roughly localized on an interval then it is clear that it suffices to compute on a net which further speeds up computation time.
is defined via purely local operations and thus, by definition, its value at a certain location is stable under perturbing the matrix entries far away. On the other hand, it is entirely possible that a large perturbation will destroy the spectral structure of the matrix. With additional assumptions on the perturbation to guarantee spectral stability, it is then possible to use of the unperturbed matrix to predict localization after perturbation.
We also remark that the assumptions of the theorem consist of both the existence of a gap of the spectrum and localization of the first few eigenvectors. In some situations, the localization of the eigenvectors follows from the gap assumption alone, for instance when the matrix comes from a local discretization of a differential operator, as established in [2, 13].
As increases, the neighborhoods in which the result guarantees localization grow linearly in size (though only very slowly if ) and the result becomes less informative. This is not an artifact but necessary: whenever the largest eigenvalue is simple and the associated eigenvector never vanishes, then
This is similar in spirit to the classical power method for computing the first eigenvector.
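The large-exponent behavior can be made explicit with a short calculation. The following is a sketch under assumptions that go beyond the surviving text: the landscape is written as $f_\alpha(k) = \log \|A^\alpha e_k\|_{\ell^2}$, the eigenvectors $v_1, \dots, v_n$ are orthonormal with eigenvalues satisfying $|\lambda_1| > |\lambda_2| \geq \dots \geq |\lambda_n|$, and $\langle e_k, v_1 \rangle \neq 0$. The implicit constant in the error term depends on $k$.

```latex
\|A^{\alpha} e_k\|_{\ell^2}^2
  = \sum_{i=1}^{n} |\lambda_i|^{2\alpha} \langle e_k, v_i \rangle^2
  = |\lambda_1|^{2\alpha} \langle e_k, v_1 \rangle^2
    \left( 1 + O\!\left( \left| \frac{\lambda_2}{\lambda_1} \right|^{2\alpha} \right) \right),
\qquad \text{hence} \qquad
f_\alpha(k) - \alpha \log |\lambda_1| \longrightarrow \log |\langle e_k, v_1 \rangle|
\quad \text{as } \alpha \to \infty .
```

In this regime the landscape only reproduces the profile of the leading eigenvector, at the rate governed by $|\lambda_2 / \lambda_1|$ that also controls the power method.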
It is noteworthy that we exploit exactly the fact that in the edge of the spectrum which makes the power method a slow
method in practice. Or, put differently, our method exploits that highly localized eigenvectors associated to eigenvalues in the spectral edge correspond to metastable states for the power iteration!
When , the first
eigenvector (and potentially other high lying ones) will then
interfere with the performance of the procedure. In such situations, we may
revise the procedure by first identifying those dominant eigenvectors and
then applying the procedure while iteratively projecting onto the orthogonal
complement of the subspace spanned by the dominant eigenvectors.
The details are standard and left to the interested reader.
In many applications, we have the eigenvalue problem
with being positive definite and the dominant characteristics of the physical system being determined by the low-lying eigenvalues . A straightforward application of our localization technique is only going to yield the largest eigenvalues. The obvious modification is to consider the matrix
This operation preserves sparsity and flips the spectrum, so that the low-lying eigenvalues are now in the spectral edge. This is used in our numerical examples in Section 4. Alternatively, if we are given a self-adjoint, positive definite and linear map on a Hilbert space such that is compact, then a natural way of recovering the bottom of the spectrum is via consideration of the semigroup
An application of the spectral theorem allows us to write the semigroup as
which has slow decay for the small eigenvalues and large decay for larger eigenvalues. Note that if we Taylor expand and keep only the leading order term, we get , which connects with the previous trick. We refer to Appendix A for an application.
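The spectrum-flipping step can be illustrated as follows. Since the displayed formula for the modified matrix is not available here, the sketch assumes one common choice, $B = c\,\mathrm{Id} - A$ with $c$ an upper bound on the largest eigenvalue (a Gershgorin row-sum bound below); the function name `flip_spectrum` and the test matrix (a 1D discrete Schrödinger operator with a random potential) are hypothetical.

```python
import numpy as np

def flip_spectrum(A):
    """Return (B, c) with B = c*I - A, where c is a Gershgorin bound on |lambda|."""
    c = np.max(np.sum(np.abs(A), axis=1))   # every eigenvalue satisfies |lambda| <= c
    return c * np.eye(A.shape[0]) - A, c    # only the diagonal changes, so sparsity is kept

# Hypothetical test matrix: 1D discrete Laplacian plus a random nonnegative potential.
n = 100
rng = np.random.default_rng(1)
V = rng.uniform(0.0, 4.0, size=n)
A = np.diag(2.0 + V) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)

B, c = flip_spectrum(A)
lam_A, vec_A = np.linalg.eigh(A)   # ascending eigenvalues of A
lam_B, vec_B = np.linalg.eigh(B)   # ascending eigenvalues of B

# The smallest eigenvalue of A becomes the largest eigenvalue of B,
# with the same eigenvector up to sign.
overlap = abs(vec_A[:, 0] @ vec_B[:, -1])
print("lowest eigenvalue of A:", lam_A[0], "recovered as c - max eig of B:", c - lam_B[-1])
```

Any shift $c \geq \lambda_{\max}(A)$ works; a tighter bound makes the relative gaps at the flipped spectral edge larger, which by the discussion above should reduce the exponent needed.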
The result, as given above, is easiest to understand in the setting of banded matrices. Banded matrices correspond naturally to localized interactions on the lattice . Neighborhoods of points correspond to intervals and this is how the Theorem was phrased. At a greater level of generality, there need not be such underlying structure and we will replace intervals by general subsets . The notion of distance between an element and a subset is implicitly defined via assuming the inequality
to hold. We emphasize that in all the interesting applications, where is the discretization of a differential operator (or somewhat localized integral operator), these notions can be made rather precise and we recover the classical notion of distance in Euclidean space.
3. Fast Randomized Algorithms
Computationally, can be obtained by calculating the $\ell^2$-norm of the rows of the matrix . Thus the algorithm is particularly efficient if has structure enabling fast multiplications, such as being sparse, low rank, etc. To further accelerate the computation, we exploit ideas from randomized numerical linear algebra (see [8] for a review) and use the following randomized version of the landscape function. For simplicity, we assume that the matrix is symmetric in this section.
We define given by
where is a parameter, is the th standard basis vector, and is a random matrix with i.i.d. entries.
In terms of computation, if , the randomized version only requires applying the matrix to a tall, skinny matrix times, and thus the randomized version is particularly advantageous for dense , as it brings down the cost from to . As it turns out, the efficiency of this method is intimately coupled to very well studied concepts centered around the stability of random projection onto subspaces. To see this, we denote the columns of by and observe that
while for , we have
We start by quickly discussing a very strong sufficient condition that allows us to transfer results almost verbatim from our Theorem to the random case and is equivalent to classical questions in dimensionality reduction; stronger results are discussed below. We observe that certainly all the results transfer in a pointwise manner if we knew that
for a typical realization of such random vectors. However, since the span the space, we are really asking that the map given by
This is, in fact, exactly the question that underlies the study of random projections, and has been dealt with extensively (see e.g., [17]). The main conclusion is that
is in many cases sufficient, however, the implicit constant may be large.
We now explain why in practice a much smaller number of random vectors suffices. The idea is rather simple and most easily explained by considering the example of one random vector in the case of a large spectral gap . Clearly, we have
The outcome now depends on the random vector ; however, for suitably large values of it is clear that, in order for the random landscape to differ profoundly from the profile of the leading eigenvector, it is required that is very small: even if it were only moderately small, it would get drastically amplified by the exponential growth and still dominate the expression. The following widely-used Lemma shows that this is not overly likely.
Lemma. Let satisfy and let be randomly chosen w.r.t. the uniform surface measure on . Then, for ,
where the implicit constant depends only on the dimension.
This simple Lemma quantifies the concentration of measure phenomenon and is standard (see e.g. [9]). It explains why in the case of highly localized eigenstates it is completely sufficient to work with only one randomly chosen vector. The inner products are not likely to be extremely small and get amplified by an exponential growth while the strong exponential localization preserves the structure. This is easy to make precise in a variety of ways: the simplest case is a spectral gap . The trivial estimate
implies, together with the Lemma above,
We emphasize that this simple argument did not even use localization of the eigenvectors; the trivial estimate is clearly quite weak if the spectrum is spread out, in that case much stronger results should hold.
Example. Let us revisit the example in Section 1.2 using the random sampling.
The same random band matrix , for Figure 1 is used. The randomized versions with number of random vectors , , respectively, are plotted in the panels of Figure 3. We observe that the randomized landscape function even with only random vector still captures the important features, in particular the local maxima, of the landscape function.
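A minimal sketch of the randomized computation is given below. It rests on assumptions that are not confirmed by the surviving text: the randomized landscape is taken to be $\log \|e_k^T A^\alpha \Omega\|_{\ell^2}$ with $\Omega \in \mathbb{R}^{n \times m}$ having i.i.d. standard Gaussian entries, and a constant $-\tfrac{1}{2}\log m$ is subtracted so that the values are comparable to the deterministic landscape (a constant shift does not move local maxima). The function name `randomized_landscape` and all parameter values are hypothetical.

```python
import numpy as np

def randomized_landscape(A, alpha, m, rng):
    """Estimate (log ||e_k^T A^alpha||_2)_k from m Gaussian probe vectors."""
    n = A.shape[0]
    Y = rng.standard_normal((n, m))      # tall, skinny random matrix Omega
    log_scale = 0.0
    for _ in range(alpha):
        Y = A @ Y                        # only matrix times (n x m) products are needed
        s = np.linalg.norm(Y)            # rescale to avoid overflow
        Y = Y / s
        log_scale += np.log(s)
    row_norms = np.linalg.norm(Y, axis=1)  # row k is e_k^T A^alpha Omega (rescaled)
    tiny = np.finfo(float).tiny
    return np.log(np.maximum(row_norms, tiny)) + log_scale - 0.5 * np.log(m)

# Small symmetric band matrix for comparison with the exact row norms of A^alpha.
rng = np.random.default_rng(0)
n, alpha = 50, 6
G = rng.uniform(-1.0, 1.0, size=(n, n))
i, j = np.indices((n, n))
G[np.abs(i - j) > 2] = 0.0
A = (G + G.T) / 2.0

f_exact = np.log(np.linalg.norm(np.linalg.matrix_power(A, alpha), axis=1))
f_one = randomized_landscape(A, alpha, 1, rng)
f_many = randomized_landscape(A, alpha, 2000, rng)
print("max deviation with m = 2000:", np.max(np.abs(f_many - f_exact)))
print("peak (exact, m = 1):", np.argmax(f_exact), np.argmax(f_one))
```

With Gaussian probes, the squared row norm of $A^\alpha \Omega$ is the exact squared row norm times a chi-square variable with $m$ degrees of freedom, so the pointwise error in the logarithm is of order $m^{-1/2}$; for small $m$ the estimate is noisy pointwise, and the argument above concerns the location of the peaks rather than pointwise accuracy.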
4. Numerical Examples
4.1. Schrödinger Operator with Potential.
The case of finding a way of numerically detecting localized eigenstates of operators of the form for and Dirichlet conditions on the boundary has recently received a lot of attention in the mathematics and physics literature (see e.g., [10, 11, 12, 14, 16]). We explain how our method can be applied to this case (without any restrictions on ). We consider the operator defined on with periodic boundary condition. is a smooth periodic potential generated randomly in the unit square. The pseudo-spectral discretization with mesh size is used.
While the resulting matrix is dense due to the Fourier differentiation of the pseudo-spectral method, the method proposed still applies. In Figure 4, we show the landscape function using and random vectors on the left panel and the sum of the square modulus of the low-lying eigenvectors on the right panel. Good agreement is observed. For visualization, we only plotted the part of the landscape function exceeding its maximum value minus .
4.2. Fractional Laplacian
The method extends without much difficulty to the fractional Laplacian . We still consider the computational domain with periodic boundary condition and a smooth randomly generated potential. Thanks to the periodic boundary condition, the fractional Laplacian can be defined through spectral decomposition. Pseudo-spectral discretization is again used. Note that the fractional Laplacian is non-local, regardless of the discretization. In Figure 5, we show the result for where the same potential as in the example of Figure 4 is used. We again observe excellent agreement between the landscape function and the localization of the eigenfunctions.
Proof of the Theorem.
We first show that exceeds a certain value on all the intervals (). We start by showing there is a with
If this were false, then
which is a contradiction. Here the first inequality comes from the assumption that contains half -mass of . A simple application of the spectral theorem implies that
It remains to show that this value cannot be attained unless one is close to one of the localized eigenvectors. If has at least distance from then
We want to relate this to the inequality
which will provide an upper bound on (meaning that if the distance to the sets is larger than this bound, then we obtain the desired consequence). If, as we assume,
is equivalent to
which is the desired statement. ∎
Appendix A Localized Wannier Bases
In this appendix, we connect the landscape functions to localized Wannier bases. Localized Wannier bases [18] are maximally localized bases of low-lying eigenfunctions. Given an operator over with mutually orthogonal low-lying eigenfunctions , it is often helpful to work with a basis of where each basis element is as localized as possible (for example to obtain sparser matrices). A classical approach, see e.g. the review article [15], is to simply project the Dirac measure onto the finite-dimensional subspace to obtain the best possible representation of this localized point in the basis. We illustrate this with a simple example: let us consider on the torus . The projection of onto the span is given by
which is merely the classical Dirichlet kernel. The function is indeed highly localized around , roughly constant for and then exhibits some oscillatory behavior around 0 further away from . Motivated by our main idea, a slightly different idea suggests itself: instead of merely projecting, it makes sense to run the dynamical system with as initial datum on the subspace on which we project. More precisely, this yields a projection that depends on
Basic intuition tells us that, depending on the speed of propagation within the dynamical system, at least for small values of the projection will still look pretty much like the direct propagation. Our main observation is that for sufficiently small values of , the projection will be pretty much as localized as but will have much better decay properties. We illustrate this again on the torus , where
We observe that the infinite limit is given by the Jacobi $\theta$-function
The Jacobi $\theta$-function has exponential decay in the region and, what is especially useful, the exponential decay in the coefficients implies that a cutoff at frequency is only going to introduce a small error. In particular, in this setting, the threshold-value for (where localization properties are not significantly worse) is . This example generalizes immediately to higher dimensions, where the situation is more or less identical: the direct projection creates oscillations with slow decay throughout the entire space, while the diffused leads to a more localized representation. Returning to our original problem for matrices , the relationship to is easily explained: in the case of localized eigenvectors, we see that if is the eigenvector associated to the largest eigenvalue on coordinate , then is a very good local approximation of . The proof of this simple statement proceeds along the very same lines as the proof of our main theorem.
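The contrast between the direct projection and the diffused projection on the torus can be checked numerically. The sketch below assumes the operator is $-d^2/dx^2$ on $[-\pi, \pi]$ with eigenfunctions $e^{ikx}$, so that projecting the Dirac mass at 0 onto frequencies $|k| \leq N$ gives the Dirichlet kernel $\sum_{|k| \leq N} e^{ikx}$, while running the heat flow for time $t$ first gives $\sum_{|k| \leq N} e^{-k^2 t} e^{ikx}$. The values $N = 20$ and $t = 0.01$ and the function names are illustrative assumptions.

```python
import numpy as np

def dirichlet_projection(x, N):
    """Projection of the Dirac mass at 0 onto frequencies |k| <= N (Dirichlet kernel)."""
    k = np.arange(-N, N + 1)
    return np.cos(np.outer(x, k)).sum(axis=1)   # sum of e^{ikx} over symmetric k is real

def heat_projection(x, N, t):
    """Same projection after running the heat flow for time t."""
    k = np.arange(-N, N + 1)
    return (np.exp(-t * k ** 2) * np.cos(np.outer(x, k))).sum(axis=1)

N, t = 20, 0.01
x = np.linspace(-np.pi, np.pi, 2001)
far = np.abs(x) >= 1.0                          # region away from the concentration point

D = dirichlet_projection(x, N)
H = heat_projection(x, N, t)

# Size of the tails relative to the peak value at x = 0.
dirichlet_tail = np.max(np.abs(D[far])) / np.max(np.abs(D))
heat_tail = np.max(np.abs(H[far])) / np.max(np.abs(H))
print("relative tail, direct projection:", dirichlet_tail)
print("relative tail, diffused projection:", heat_tail)
```

The Dirichlet kernel decays only like $1/|\sin(x/2)|$ away from the origin, whereas the tail of the diffused projection is limited by the discarded coefficients $e^{-k^2 t}$ with $|k| > N$, which are small once $t N^2$ is moderately large.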
References
[1] P. W. Anderson, Absence of Diffusion in Certain Random Lattices, Phys. Rev. 109 (1958), 1492–1505.
[2] M. Benzi, P. Boito, and N. Razouk, Decay properties of spectral projectors with applications to electronic structure, SIAM Rev. 55 (2013), 3–64.
[3] A. Bovier, M. Eckhoff, V. Gayrard and M. Klein, Metastability and low lying spectra in reversible Markov chains. Comm. Math. Phys. 228 (2002), no. 2, 219–255.
[4] A. Bovier, M. Eckhoff, V. Gayrard, and M. Klein, Metastability in reversible diffusion processes. I. Sharp asymptotics for capacities and exit times. J. Eur. Math. Soc. 6 (2004), no. 4, 399–424.
[5] A. Bovier, V. Gayrard and M. Klein, Metastability in reversible diffusion processes. II. Precise asymptotics for small eigenvalues. J. Eur. Math. Soc. 7 (2005), no. 1, 69–99.
[6] X. Cheng, M. Rachh and S. Steinerberger, On the Diffusion Geometry of Graph Laplacians and Applications, preprint, arXiv:1611.03033.
[7] Weinan E, Tiejun Li, and Jianfeng Lu, Localized bases of eigensubspaces and operator compression, Proc. Natl. Acad. Sci. USA, 107 (2010), 1273–1278.
[8] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Rev. (2011), 53(2), 217–288.
[9] C. Kenney and A. Laub, Small-sample statistical condition estimates for general matrix functions. SIAM J. Sci. Comput. 15 (1994), no. 1, 36–61.
[10] G. Lefebvre, A. Gondel, M. Dubois, M. Atlan, F. Feppon, A. Labbe, C. Gillot, A. Garelli, M. Ernoult, S. Mayboroda, M. Filoche, P. Sebbah, One single static measurement predicts wave localization in complex structures, Phys. Rev. Lett. 117 (2016), 074301.
[11] M. Filoche and S. Mayboroda, Universal mechanism for Anderson and weak localization. Proc. Natl. Acad. Sci. USA 109 (2012), no. 37, 14761-14766.
[12] M. Filoche and S. Mayboroda, The landscape of Anderson localization in a disordered medium, Contemporary Mathematics, 601 (2013), 113–121.
[13] L. Lin and J. Lu, Decay estimates of discretized Green’s functions for Schrödinger type operators, Sci. China Math. 59 (2016) 1561–1578.
[14] M. Lyra, S. Mayboroda and M. Filoche, Dual hidden landscapes in Anderson localization on discrete lattices, Euro. Phys. Lett. 109 (2015), 4700.
[15] N. Marzari, A. A. Mostofi, J. R. Yates, I. Souza, D. Vanderbilt, Maximally localized Wannier functions: Theory and applications, Rev. Mod. Phys. 84 (2012), 1419–1475.
[16] S. Steinerberger, Localization of Quantum States and Landscape Functions, Proc. Amer. Math. Soc, accepted.
[17] S. Vempala, The random projection method. With a foreword by Christos H. Papadimitriou. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 65. American Mathematical Society, Providence, RI, 2004.
[18] G. Wannier, The Structure of Electronic Excitation Levels in Insulating Crystals, Phys. Rev. 52 (1937), 191.
[19] H. Zou, T. Hastie, R. Tibshirani, Sparse principal component analysis, J. Comput. Graph. Stat. 15 (2006), 262–286.