Littlewood-Offord, circular law, universality

From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices

Terence Tao Department of Mathematics, UCLA, Los Angeles CA 90095-1555  and  Van Vu Department of Mathematics, Rutgers, Piscataway, NJ 08854

The famous circular law asserts that if $M_n$ is an $n \times n$ matrix with iid complex entries of mean zero and unit variance, then the empirical spectral distribution (ESD) of the normalized matrix $\frac{1}{\sqrt{n}}M_n$ converges both in probability and almost surely to the uniform distribution on the unit disk $\{z \in \mathbb{C} : |z| \le 1\}$. After a long sequence of partial results that verified this law under additional assumptions on the distribution of the entries, the circular law is now known to be true for arbitrary distributions with mean zero and unit variance. In this survey we describe some of the key ingredients used in the establishment of the circular law at this level of generality, in particular recent advances in understanding the Littlewood-Offord problem and its inverse.

1991 Mathematics Subject Classification:
15A52, 60G50
T. Tao is supported by NSF grant CCF-0649473 and a grant from the MacArthur Foundation.
V. Vu is supported by NSF Career Grant 0635606.

1. ESD of random matrices

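For an $n \times n$ matrix $A_n$ with complex entries, let

$$\mu_{A_n} := \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i(A_n)}$$

be the empirical spectral distribution (ESD) of its eigenvalues $\lambda_1(A_n), \dots, \lambda_n(A_n) \in \mathbb{C}$ (counting multiplicity), thus for instance

$$\mu_{A_n}(\{z \in \mathbb{C} : \mathrm{Re}\, z \le s, \mathrm{Im}\, z \le t\}) = \frac{1}{n}\left|\{1 \le i \le n : \mathrm{Re}\,\lambda_i(A_n) \le s, \mathrm{Im}\,\lambda_i(A_n) \le t\}\right|$$

for any $s, t \in \mathbb{R}$ (we use $|E|$ to denote the cardinality of a finite set $E$), and

$$\int_{\mathbb{C}} f\, d\mu_{A_n} = \frac{1}{n}\sum_{i=1}^n f(\lambda_i(A_n))$$

for any continuous compactly supported $f$. Clearly, $\mu_{A_n}$ is a discrete probability measure on $\mathbb{C}$.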
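To make the definition concrete, here is a minimal sketch using only the standard library. It evaluates the ESD mass of a quadrant $\{\mathrm{Re}\, z \le s, \mathrm{Im}\, z \le t\}$ and the integral $\int f\, d\mu_{A_n}$ from a list of eigenvalues. The function names are illustrative only. The example eigenvalues are the fourth roots of unity, which are the eigenvalues of the $4 \times 4$ cyclic shift matrix.

```python
import cmath


def esd_integral(eigenvalues, f):
    """Integral of f against the ESD (1/n) * sum_i delta_{lambda_i}."""
    return sum(f(lam) for lam in eigenvalues) / len(eigenvalues)


def esd_mass_quadrant(eigenvalues, s, t):
    """ESD mass of {z : Re z <= s, Im z <= t}, i.e. a normalized eigenvalue count."""
    count = sum(1 for lam in eigenvalues if lam.real <= s and lam.imag <= t)
    return count / len(eigenvalues)


# Fourth roots of unity: the eigenvalues of the 4 x 4 cyclic shift matrix.
eigs = [cmath.exp(2j * cmath.pi * k / 4) for k in range(4)]

print(esd_mass_quadrant(eigs, 0.5, 0.5))            # 0.5 (the roots -1 and -i)
print(esd_integral(eigs, lambda z: abs(z) ** 2))   # all eigenvalues lie on |z| = 1
```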

A fundamental problem in the theory of random matrices is to compute the limiting distribution of the ESD of a sequence of random matrices with sizes tending to infinity [34, 4]. In what follows, we consider normalized random matrices of the form $\frac{1}{\sqrt{n}}M_n$, where $M_n = (x_{ij})_{1 \le i,j \le n}$ has entries that are iid random variables $x_{ij} \equiv x$. Such matrices have been studied at least as far back as Wishart [58] (see [34, 4] for more discussion).

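One of the first limiting distribution results is the famous semi-circle law of Wigner [57]. Motivated by research in nuclear physics, Wigner studied Hermitian random matrices with (upper triangular) entries being iid random variables with mean zero and variance one. In the Hermitian case, of course, the ESD is supported on the real line $\mathbb{R}$. He proved that the expected ESD of a normalized Hermitian matrix $\frac{1}{\sqrt{n}}M_n$, where $M_n$ has iid gaussian entries $x_{ij} \equiv N(0,1)$, converges in the sense of probability measures [Footnote 1: We say that a collection $\mu_n$ of probability measures converges to a limit $\mu$ if one has $\int f\, d\mu_n \to \int f\, d\mu$ for every continuous compactly supported function $f$, or equivalently if $\mu_n(\{z \in \mathbb{C} : \mathrm{Re}\, z \le s, \mathrm{Im}\, z \le t\})$ converges to $\mu(\{z \in \mathbb{C} : \mathrm{Re}\, z \le s, \mathrm{Im}\, z \le t\})$ for all $s, t$.] to the semi-circle distribution

$$\mu_{sc} := \frac{1}{2\pi}(4 - |x|^2)^{1/2}\,\mathbf{1}_{[-2,2]}(x)\, dx \qquad (1)$$

on the real line, where $\mathbf{1}_E$ denotes the indicator function of a set $E$.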
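As a quick sanity check on (1), the following sketch integrates the semi-circle density numerically with the midpoint rule. The step count is an arbitrary choice and the helper names are illustrative. The sketch should recover total mass 1, second moment 1 (matching the unit variance of the entries) and fourth moment 2 (the Catalan number $C_2$).

```python
import math


def semicircle_density(x):
    """Density of the semi-circle law (1): sqrt(4 - x^2) / (2 pi) on [-2, 2], zero outside."""
    if abs(x) >= 2.0:
        return 0.0
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi)


def midpoint_moment(k, steps=200_000):
    """Approximate the k-th moment of the semi-circle law by the midpoint rule on [-2, 2]."""
    h = 4.0 / steps
    total = 0.0
    for j in range(steps):
        x = -2.0 + (j + 0.5) * h
        total += x ** k * semicircle_density(x)
    return total * h


mass = midpoint_moment(0)      # total mass: close to 1
variance = midpoint_moment(2)  # second moment: close to 1
fourth = midpoint_moment(4)    # fourth moment: close to the Catalan number 2
print(round(mass, 6), round(variance, 6), round(fourth, 6))
```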

Theorem 1.1 (Semi-circular law for the Gaussian ensemble).

[57] Let $M_n$ be an $n \times n$ random Hermitian matrix whose entries are iid gaussian variables with mean 0 and variance 1. Then, with probability one, the ESD of $\frac{1}{\sqrt{n}}M_n$ converges in the sense of probability measures to the semi-circle law (1).

Henceforth we shall say that a sequence $\mu_n$ of random probability measures converges strongly to a deterministic probability measure $\mu$ if, with probability one, $\mu_n$ converges in the sense of probability measures to $\mu$. We also say that $\mu_n$ converges weakly to $\mu$ if for every continuous compactly supported $f$, $\int f\, d\mu_n$ converges in probability to $\int f\, d\mu$, thus $\mathbf{P}(|\int f\, d\mu_n - \int f\, d\mu| > \varepsilon) \to 0$ as $n \to \infty$ for each $\varepsilon > 0$. Of course, strong convergence implies weak convergence; thus for instance in Theorem 1.1, $\mu_{\frac{1}{\sqrt{n}}M_n}$ converges both weakly and strongly to the semicircle law.

Wigner also proved similar results for various other distributions, such as the Bernoulli distribution (in which each $x_{ij}$ equals $+1$ with probability $1/2$ and $-1$ with probability $1/2$). His work has been extended and strengthened in several aspects [1, 2, 36]. The most general form was proved by Pastur [36]:

Theorem 1.2 (Semi-circular law).

[36] Let $M_n$ be an $n \times n$ random Hermitian matrix whose entries are iid complex random variables with mean 0 and variance 1. Then the ESD of $\frac{1}{\sqrt{n}}M_n$ converges (in both the strong and weak senses) to the semi-circle law.

The situation with non-Hermitian matrices is much more complicated, due to the presence of pseudospectrum [Footnote 2: Informally, we say that a complex number $z$ lies in the pseudospectrum of a square matrix $A$ if $\|(A - zI)^{-1}\|$ is large (or undefined). If $z$ lies in the pseudospectrum, then small perturbations of $A$ can potentially cause $z$ to fall into the spectrum of $A$, even if it is initially far away from this spectrum. Thus, whenever one has pseudospectrum far away from the actual spectrum, the actual distribution of eigenvalues can depend very sensitively (in the worst case) on the coefficients of $A$. Of course, our matrices are random rather than worst-case, and so we expect the most dangerous effects of pseudospectrum to be avoided; but this of course requires some analytical effort to establish, and deterministic techniques (e.g. truncation) should be used with extreme caution, since they are likely to break down in the worst case.] that can potentially make the ESD quite unstable with respect to perturbations. The non-Hermitian variant of this theorem, the Circular Law Conjecture, dates back to the 1950s (see Chapter 10 of [4] or the introduction of [3]):

Conjecture 1.3 (Circular law).

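Let $M_n$ be the $n \times n$ random matrix whose entries are iid complex random variables with mean 0 and variance 1. Then the ESD of $\frac{1}{\sqrt{n}}M_n$ converges (in both the strong and weak senses) to the uniform distribution $\mu_{\mathrm{circ}} := \frac{1}{\pi}\mathbf{1}_{x^2 + y^2 \le 1}\, dx\, dy$ on the unit disk $\{z = x + iy \in \mathbb{C} : |z| \le 1\}$.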

The numerical evidence for this conjecture is extremely strong (see e.g. Figure 1). However, there are significant difficulties in establishing this conjecture rigorously. Not least among them is the fact that the main techniques used to handle Hermitian matrices (such as moment methods and truncation) cannot be applied to the non-Hermitian model (see [4, Chapter 10] for a detailed discussion). Nevertheless, the conjecture has been intensively worked on for many decades. The circular law was verified for the complex gaussian distribution in [34] and the real gaussian distribution in [12]. An approach to attack the general case was introduced in [18], leading to a resolution of the strong circular law for continuous distributions with bounded sixth moment in [3]. The sixth moment hypothesis in [3] was lowered to a $(2+\eta)$-th moment hypothesis for any $\eta > 0$ in [4]. The removal of the hypothesis of continuous distribution required some new ideas. In [21] the weak circular law for (possibly discrete) distributions with subgaussian moment was established, with the subgaussian condition relaxed to a fourth moment condition in [35] (see also [19] for an earlier result of similar nature), and then to a $(2+\eta)$-th moment condition in [22]. Shortly before this last result, the strong circular law assuming a $(2+\eta)$-th moment was established in [54]. Finally, in a recent paper [55], the authors proved this conjecture (in both strong and weak forms) in full generality. In fact, we obtained this result as a consequence of a more general theorem, presented in the next section.

2. Universality

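An easy case of Conjecture 1.3 is when the entries of $M_n$ are iid complex gaussian. In this case there is the following precise formula for the joint density function of the eigenvalues $\lambda_1, \dots, \lambda_n$ of $\frac{1}{\sqrt{n}}M_n$, due to Ginibre [17] (see also [34], [25] for more discussion of this formula):

$$p(\lambda_1, \dots, \lambda_n) = c_n \prod_{1 \le i < j \le n} |\lambda_i - \lambda_j|^2 \prod_{i=1}^n e^{-n|\lambda_i|^2},$$

where $c_n$ is a normalizing constant.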

From here one can verify the conjecture in this case by a direct calculation. This was first done by Mehta and also Silverstein in the 1960s:

Theorem 2.1 (Circular law for Gaussian matrices).

[34] Let $M_n$ be an $n \times n$ random matrix whose entries are iid complex gaussian variables with mean 0 and variance 1. Then, with probability one, the ESD of $\frac{1}{\sqrt{n}}M_n$ tends to the circular law.

A similar result for the real gaussian ensemble was established in [12]. These methods rely heavily on the strong symmetry properties of such ensembles (in particular, the invariance of such ensembles with respect to large matrix groups such as $U(n)$ or $O(n)$) in order to perform explicit algebraic computations, and do not extend directly to more combinatorial ensembles, such as the Bernoulli ensemble.

The above mentioned results and conjectures can be viewed as examples of a general phenomenon in probability and mathematical physics, namely, that global information about a large random system (such as limiting distributions) does not depend on the particular distribution of the particles. This is often referred to as the universality phenomenon (see e.g. [9]). The most famous example of this phenomenon is perhaps the central limit theorem.

In view of the universality phenomenon, one can see that Conjecture 1.3 generalizes Theorem 2.1 in the same way that Theorem 1.2 generalizes Theorem 1.1.

A demonstration of the circular law for the Bernoulli and the Gaussian case appears in Figure 1 [Footnote 3: We thank Phillip Wood for creating the figures in this paper.].

[Figure 1 panels: Bernoulli (left), Gaussian (right)]

Figure 1. Eigenvalue plots of two randomly generated 5000 by 5000 matrices. On the left, each entry was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability $1/2$. On the right, each entry was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$. (These two distributions were shifted by adding the identity matrix, thus the circles are centered at $(1,0)$ rather than at the origin.)
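A small-scale version of the Figure 1 experiment can be sketched as follows. The sketch assumes NumPy is available and makes several simplifications: it uses $n = 400$ instead of $5000$, it omits the identity shift and the plot itself, and it summarizes the picture by two statistics. The first is the fraction of normalized eigenvalues within radius $1.05$, which should be close to 1 under the circular law. The second is the fraction within radius $1/2$, which should be close to the area ratio $1/4$. Seed, size and radii are arbitrary choices.

```python
import numpy as np


def normalized_eigenvalues(entries):
    """Eigenvalues of (1/sqrt(n)) M_n, where `entries` is the n x n array M_n."""
    n = entries.shape[0]
    return np.linalg.eigvals(entries / np.sqrt(n))


rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 400                         # far smaller than the 5000 x 5000 matrices of Figure 1
ensembles = {
    "Bernoulli": rng.choice([-1.0, 1.0], size=(n, n)),  # +1 or -1, each with probability 1/2
    "Gaussian": rng.standard_normal((n, n)),            # density exp(-x^2/2) / sqrt(2 pi)
}

results = {}
for name, entries in ensembles.items():
    lam = normalized_eigenvalues(entries)
    near_disk = np.mean(np.abs(lam) <= 1.05)  # nearly all eigenvalues in the unit disk
    half_disk = np.mean(np.abs(lam) <= 0.5)   # uniform law: mass (1/2)^2 = 1/4
    results[name] = (near_disk, half_disk)
    print(f"{name}: {near_disk:.3f} within radius 1.05, {half_disk:.3f} within radius 0.5")
```

The two ensembles should produce very similar numbers, which is the universality phenomenon in miniature. To reproduce the figure itself, one could scatter-plot `lam.real` against `lam.imag`.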

The universality phenomenon seems to hold even for more general models of random matrices, as demonstrated by Figure 2 and Figure 3.

[Figure 2 panels: Bernoulli (left column), Gaussian (right column)]

Figure 2. Eigenvalue plots of randomly generated by matrices of the form , where . In the left column, each entry of was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability $1/2$, and in the right column, each entry was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$. In the first row, is the deterministic matrix , and in the second row is the deterministic matrix (in each case, the first diagonal entries are ’s, and the remaining entries are or as specified).

[Figure 3 panels: Bernoulli (left), Gaussian (right)]

Figure 3. Eigenvalue plots of two randomly generated 5000 by 5000 matrices of the form , where and are diagonal matrices having entries with the value 1 followed by entries with the value 5 (for ) and the value (for ). On the left, each entry of was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability $1/2$. On the right, each entry of was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)$.

This evidence suggests that the asymptotic shape of the ESD depends only on the mean and the variance of each entry in the matrix. As mentioned earlier, the main result of [55] (building on a large number of previous results) gives a rigorous proof of this phenomenon in full generality.

For any matrix $A$, we define the Frobenius norm (or Hilbert-Schmidt norm) $\|A\|_F$ by the formula $\|A\|_F := \mathrm{trace}(AA^*)^{1/2} = \mathrm{trace}(A^*A)^{1/2}$.

Theorem 2.2 (Universality principle).

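Let $x$ and $y$ be complex random variables with zero mean and unit variance. Let $X_n := (x_{ij})_{1 \le i,j \le n}$ and $Y_n := (y_{ij})_{1 \le i,j \le n}$ be $n \times n$ random matrices whose entries $x_{ij}$, $y_{ij}$ are iid copies of $x$ and $y$, respectively. For each $n$, let $M_n$ be a deterministic $n \times n$ matrix satisfying

$$\sup_n \frac{1}{n^2}\|M_n\|_F^2 < \infty.$$

Let $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges weakly to zero. If furthermore we make the additional hypothesis that the ESDs

$$\mu_{\left(\frac{1}{\sqrt{n}}M_n - zI\right)\left(\frac{1}{\sqrt{n}}M_n - zI\right)^*}$$

converge in the sense of probability measures to a limit for almost every $z$, then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges strongly to zero.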

This theorem reduces the computation of the limiting distribution to the case where one can assume [Footnote 4: Some related ideas also appear in [19]. In the context of the central limit theorem, the idea of replacing arbitrary iid ensembles by Gaussian ones goes back to Lindeberg [31], and is sometimes known as the Lindeberg invariance principle; see [11] for further discussion, and a formulation of this principle for Hermitian random matrices.] that the entries have Gaussian (or any special) distribution. Combining this theorem (in the case $M_n = 0$) with Theorem 2.1, we conclude

Corollary 2.3.

The circular law (Conjecture 1.3) holds in both the weak and strong sense.

It is useful to notice that Theorem 2.2 still holds even when the limiting distributions do not exist.

The proof of Theorem 2.2 relies on several surprising connections between seemingly remote areas of mathematics that have been discovered in the last few years. The goal of this article is to give the reader an overview of these connections and through them a sketch of the proof of Theorem 2.2. The first area we shall visit is combinatorics.

3. Combinatorics

As we shall discuss later, one of the primary difficulties in controlling the ESD of a non-Hermitian matrix is the presence of pseudospectrum: complex numbers $z$ for which the resolvent $(\frac{1}{\sqrt{n}}M_n - zI)^{-1}$ exists but is extremely large. It is therefore important to obtain bounds on this resolvent. This leads one to ask for which vectors $v \in \mathbb{C}^n$ the quantity $\|(\frac{1}{\sqrt{n}}M_n - zI)v\|$ is likely to be small. Expanding out the vector $M_n v$, one encounters expressions such as $\xi_1 v_1 + \dots + \xi_n v_n$, where $v_1, \dots, v_n \in \mathbb{C}$ are fixed and $\xi_1, \dots, \xi_n$ are iid random variables. The problem of understanding the distribution of such random sums is known as the Littlewood-Offord problem, and we now pause to discuss this problem further.

3.1. The Littlewood-Offord problem

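Let $V = \{v_1, \dots, v_n\}$ be a set of integers and let $\xi_1, \dots, \xi_n$ be i.i.d. random Bernoulli variables. Define $S := \sum_{i=1}^n \xi_i v_i$, $p_x(V) := \mathbf{P}(S = x)$ and $p(V) := \max_{x \in \mathbb{Z}} p_x(V)$.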

In their study of random polynomials, Littlewood and Offord [32] raised the question of bounding $p(V)$. They showed that if the $v_i$ are non-zero, then $p(V) = O(n^{-1/2}\log n)$. Very soon after, Erdős [13], using Sperner’s lemma, gave a beautiful combinatorial proof for the following refinement.

Theorem 3.2.

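Let $v_1, \dots, v_n$ be non-zero numbers and $\xi_1, \dots, \xi_n$ be i.i.d. Bernoulli random variables. Then [Footnote 5: We use the usual asymptotic notation in this paper, thus $X = O(Y)$, $Y = \Omega(X)$, $X \ll Y$, or $Y \gg X$ denotes an estimate of the form $|X| \le CY$ where $C$ does not depend on $n$ (but may depend on other parameters). We also let $X = o(Y)$ denote the bound $|X| \le c(n)Y$, where $c(n) \to 0$ as $n \to \infty$.]

$$p(V) \le \binom{n}{\lfloor n/2 \rfloor}\Big/2^n = O(n^{-1/2}).$$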

Notice that the bound is sharp, as can be seen from the example $V := \{1, \dots, 1\}$, in which case $S$ has a (shifted and rescaled) binomial distribution. Many mathematicians realized that while the classical bound in Theorem 3.2 is sharp as stated, it can be improved significantly under additional assumptions on $V$. For instance, Erdős and Moser [14] showed that if the $v_i$ are distinct, then

$$p(V) = O(n^{-3/2}\log n).$$

They conjectured that the logarithmic term is not necessary and this was confirmed by Sárközy and Szemerédi [42]. Again, the bound is sharp (up to a constant factor), as can be seen by taking $v_1, \dots, v_n$ to be a proper arithmetic progression such as $1, \dots, n$. Stanley [41] gave a different proof that also classified the extremal cases.
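For small $n$, the quantities $p(V)$ in Theorem 3.2 and in the Erdős-Moser and Sárközy-Szemerédi bounds can be computed exactly by building the distribution of $S$ one coefficient at a time. The sketch below uses an illustrative helper name and checks three things:

- $V = \{1, \dots, 1\}$ attains Erdős's bound;
- random non-zero integer vectors stay below it;
- for the arithmetic progression $1, \dots, m$, the product $p(V)\, m^{3/2}$ is of order one.

The last check rests on a local central limit theorem heuristic, which is an assumption of this sketch and not a proof. The sums all have the same parity, so they live on a lattice of step 2, and $\mathrm{Var}(S) = \sum_i i^2 \approx m^3/3$. Together these suggest $p(V) \approx 2/\sqrt{2\pi\,\mathrm{Var}(S)} \approx \sqrt{6/\pi}\, m^{-3/2}$. The tolerance is deliberately loose.

```python
import random
from collections import Counter
from math import comb


def littlewood_offord_p(v):
    """Exact p(V) = max_x P(S = x) for S = sum_i xi_i v_i with iid signs xi_i = +1 or -1.

    The distribution of S is built one coefficient at a time, so the cost is
    O(n * number of distinct partial sums) rather than 2^n.
    """
    dist = Counter({0: 1})  # number of sign patterns giving each partial sum
    for vi in v:
        nxt = Counter()
        for s, count in dist.items():
            nxt[s + vi] += count
            nxt[s - vi] += count
        dist = nxt
    return max(dist.values()) / 2 ** len(v)


# Theorem 3.2: V = {1, ..., 1} attains the bound binom(n, floor(n/2)) / 2^n.
n = 10
erdos_bound = comb(n, n // 2) / 2 ** n
p_equal = littlewood_offord_p([1] * n)

# Random non-zero integer coefficients never exceed the bound.
rng = random.Random(1)
nonzero = [k for k in range(-5, 6) if k != 0]
worst_random = max(
    littlewood_offord_p([rng.choice(nonzero) for _ in range(n)]) for _ in range(50)
)

# Distinct coefficients 1, ..., m (an arithmetic progression): p(V) is of order m^{-3/2}.
m = 40
scaled_ap = littlewood_offord_p(range(1, m + 1)) * m ** 1.5
print(p_equal, erdos_bound, worst_random, round(scaled_ap, 3))
```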

A general picture was given by Halász, who showed, among other things, that if one forbids more and more additive structure [Footnote 6: Intuitively, this is because the less additive structure one has in the $v_i$, the more likely the sums $S$ are to be distinct from each other. In the most extreme case, if the $v_i$ are linearly independent over the rationals $\mathbb{Q}$, then the $2^n$ sums are all distinct, and so $p(V) = 2^{-n}$ in this case.] in the $v_i$, then one gets better and better bounds on $p(V)$. One corollary of his results (see [24] or [48, Chapter 9]) is the following.

Theorem 3.3.

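Consider $V = \{v_1, \dots, v_n\}$. Let $R_k$ be the number of solutions to the equation

$$\epsilon_1 v_{i_1} + \dots + \epsilon_{2k} v_{i_{2k}} = 0,$$

where $\epsilon_1, \dots, \epsilon_{2k} \in \{-1, 1\}$ and $i_1, \dots, i_{2k} \in \{1, \dots, n\}$. Then

$$p(V) = O_k(n^{-2k-1/2} R_k).$$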

Remark 3.4.

Several variants of Theorem 3.2 can be found in [27, 30, 16, 28] and the references therein. The connection between the Littlewood-Offord problem and random matrices was first made in [26], in connection with the question of determining how likely a random Bernoulli matrix was to be singular. The paper [26] in fact inspired much of the work of the authors described in this survey.

3.5. The inverse Littlewood-Offord problem

Motivated by inverse theorems from additive combinatorics, in particular Freiman’s theorem (see [15], [48, Chapter 5]) and a variant for random sums in [53, Theorem 5.2] (inspired by earlier work in [26]), the authors [49] brought a different view to the problem. Instead of trying to improve the bound further by imposing new assumptions, we aim to provide the full picture by finding the underlying reason for the probability $p(V)$ to be large (e.g. larger than $n^{-A}$ for some fixed $A$).

Notice that the (multi)-set $V$ has $2^n$ subsums, and $p(V) \ge n^{-A}$ means that at least $2^n/n^A$ among these take the same value. This suggests that there should be very strong additive structure in the set. In order to determine this structure, one can study examples of $V$ where $p(V)$ is large. For a set $X$, we denote by $lX$ the set $lX := \{x_1 + \dots + x_l : x_i \in X\}$. A natural example is the following.

Example 3.6.

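Let $I = [-N, N]$ and let $v_1, \dots, v_n$ be elements of $I$. Since $S \in nI$, by the pigeonhole principle, $p(V) \ge \frac{1}{|nI|} = \Omega\left(\frac{1}{nN}\right)$. In fact, a short consideration yields a better bound. Notice that with probability at least $.99$, we have $S \in 10\sqrt{n}I$ (by Chebyshev's inequality, since $\mathbf{E}S^2 = \sum_i v_i^2 \le nN^2$). Thus again by the pigeonhole principle, we have $p(V) = \Omega\left(\frac{1}{\sqrt{n}N}\right)$. If we set $N = n^C$ for some constant $C$, then

$$p(V) = \Omega\left(\frac{1}{n^{C+1/2}}\right). \qquad (5)$$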

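The next, and more general, construction comes from additive combinatorics. A very important concept in this area is that of a generalized arithmetic progression (GAP). A set $P$ of numbers is a GAP of rank $d$ if it can be expressed in the form

$$P = \{a_0 + x_1 a_1 + \dots + x_d a_d : M_i \le x_i \le M_i' \text{ for all } 1 \le i \le d\}$$

for some $a_0, \dots, a_d$ and integers $M_1, \dots, M_d, M_1', \dots, M_d'$, where the $x_i$ range over the integers.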

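It is convenient to think of $P$ as the image of an integer box $B := \{(x_1, \dots, x_d) \in \mathbb{Z}^d : M_i \le x_i \le M_i'\}$ under the map

$$\Phi : (x_1, \dots, x_d) \mapsto a_0 + x_1 a_1 + \dots + x_d a_d.$$

The numbers $a_i$ are the generators of $P$, and $\mathrm{Vol}(P) := |B|$ is the volume of $P$. We say that $P$ is proper if this map is one to one, or equivalently if $|P| = \mathrm{Vol}(P)$. For non-proper GAPs, we of course have $|P| < \mathrm{Vol}(P)$.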
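The definitions above can be checked directly on small examples. The following sketch enumerates the box $B$, computes $\mathrm{Vol}(P) = |B|$, and tests properness by comparing $|P|$ with $\mathrm{Vol}(P)$. It also checks the counting fact $|nP| \le n^d\, \mathrm{Vol}(P)$ that Example 3.7 relies on. All helper names and the example generators are illustrative choices.

```python
from itertools import product


def gap_elements(a0, generators, lower, upper):
    """All values a0 + x_1 a_1 + ... + x_d a_d over integers lower[i] <= x_i <= upper[i] (with repeats)."""
    ranges = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    return [a0 + sum(x * a for x, a in zip(xs, generators)) for xs in product(*ranges)]


def volume(lower, upper):
    """Vol(P): the number of integer points in the box B."""
    vol = 1
    for lo, hi in zip(lower, upper):
        vol *= hi - lo + 1
    return vol


def is_proper(a0, generators, lower, upper):
    """P is proper iff the map from the box onto P is one to one, i.e. |P| = Vol(P)."""
    return len(set(gap_elements(a0, generators, lower, upper))) == volume(lower, upper)


def sumset(values, l):
    """The l-fold sumset lX = {x_1 + ... + x_l : x_i in X}."""
    base = set(values)
    result = {0}
    for _ in range(l):
        result = {r + a for r in result for a in base}
    return result


# Rank 2, generators 1 and 10, box [0, 9] x [0, 2]: the values 0..29, each hit exactly once.
P = gap_elements(0, (1, 10), (0, 0), (9, 2))
print(is_proper(0, (1, 10), (0, 0), (9, 2)))   # True
# Generators 1 and 2 on [0, 3] x [0, 3]: 16 box points but only the 10 values 0..9.
print(is_proper(0, (1, 2), (0, 0), (3, 3)))    # False
# |3P| <= 3^d Vol(P) with d = 2.
print(len(sumset(P, 3)), 3 ** 2 * volume((0, 0), (9, 2)))  # 88 270
```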

Example 3.7.

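Let $P$ be a proper GAP of rank $d$ and volume $N$. Let $v_1, \dots, v_n$ be (not necessarily distinct) elements of $P$. The random variable $S = \sum_{i=1}^n \xi_i v_i$ takes values in the GAP $nP$. Since $|nP| \le \mathrm{Vol}(nP) \le n^d N$, the pigeonhole principle implies that $p(V) = \Omega\left(\frac{1}{n^d N}\right)$. In fact, using the same idea as in the previous example, one can improve the bound to $\Omega\left(\frac{1}{n^{d/2} N}\right)$. If we set $N = n^C$ for some constant $C$, then

$$p(V) = \Omega\left(\frac{1}{n^{C+d/2}}\right). \qquad (6)$$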

The above examples show that if the elements of $V$ belong to a proper GAP with small rank and small cardinality then $p(V)$ is large. A few years ago, the authors [49] showed that this is essentially the only reason:

Theorem 3.8 (Weak inverse theorem).

[49] Let $\epsilon, A > 0$ be arbitrary constants. There are constants $r$ and $B$ depending on $\epsilon$ and $A$ such that the following holds. Assume that $V = \{v_1, \dots, v_n\}$ is a multiset of integers satisfying $p(V) \ge n^{-A}$. Then there is a GAP $Q$ of rank at most $r$ and volume at most $n^B$ which contains all but at most $n^\epsilon$ elements of $V$ (counting multiplicity).

Remark 3.9.

The presence of the small set of exceptional elements is not completely avoidable. For instance, one can add $n^\epsilon$ completely arbitrary elements to $V$ and only decrease $p(V)$ by a factor of $2^{-n^\epsilon}$ at worst. Nonetheless we expect the number of such elements to be less than what is given by the results here.

The reason we call Theorem 3.8 weak is the fact that the dependence between the parameters is not optimal and does not yet reflect the relations in (5) and (6). Recently, we were able to modify the approach to obtain an almost optimal result.

Theorem 3.10 (Strong inverse theorem).

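[56] Let $\epsilon < 1$ and $A$ be positive constants. Assume that

$$p(V) = \rho \ge n^{-A}.$$

Then there exists a GAP $Q$ of rank $d = O_{A,\epsilon}(1)$ which contains all but $n^\epsilon$ elements of $V$ (counting multiplicity), where

$$|Q| = O_{A,\epsilon}\left(\rho^{-1} n^{-d/2+\epsilon}\right).$$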

The bound on $|Q|$ matches (6), up to the $n^\epsilon$ term. The proofs of Theorems 3.8 and 3.10 use harmonic analysis, combined with results from the theory of random walks and several facts about GAPs. Both theorems hold in a more general setting, where the elements of $V$ are from a torsion-free group. The lower bound on $p(V)$ can also be relaxed, but the statement is more technical.

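As an application of Theorem 3.10, one can deduce, in a straightforward manner, a slightly weaker version of the forward results mentioned above. For instance, let us show that if the $v_i$ are different, then $p(V) = O(n^{-3/2+\epsilon})$ (for any constant $\epsilon > 0$). Assume otherwise and set $\rho := p(V)$, so that $\rho \ge n^{-3/2+\epsilon}$. Theorem 3.10 (applied with $\epsilon/2$ in place of $\epsilon$) implies that most of $V$ is contained in a GAP $Q$ of some rank $d$, necessarily $d \ge 1$ since $Q$ contains more than one element. Its cardinality is at most $O(\rho^{-1} n^{-d/2+\epsilon/2}) = O(n^{1-\epsilon/2}) = o(n)$. But since $Q$ contains at least $n - n^{\epsilon/2}$ distinct elements of $V$, we obtain a contradiction.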

Next we consider another application of Theorem 3.10, which will be more important in later sections. This theorem enables us to execute very precise counting arguments. Assume that we would like to count the number of (multi)-sets of integers with such that .

Fix $d$ and fix [Footnote 7: A more detailed version of Theorems 3.8 and 3.10 tells us that there are not too many ways to choose the generators of $Q$. In particular, if , the number of ways to fix these is negligible.] a GAP $Q$ with rank $d$ and volume . The dominating term will be the number of multi-subsets of size $n$ of $Q$, which is

$$\binom{|Q| + n - 1}{n}.$$

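For later purposes, we need a continuous version of this result. Let the $v_i$ be complex numbers, and write $v = (v_1, \dots, v_n)$. Instead of $p(V)$, consider the maximum small ball probability

$$p_\beta(v) := \sup_{z \in \mathbb{C}} \mathbf{P}(|S - z| \le \beta).$$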

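Given a small $\beta > 0$ and a constant $A > 0$, the collection of unit vectors $v$ (that is, $\sum_{i=1}^n |v_i|^2 = 1$) with $p_\beta(v) \ge n^{-A}$ is infinite, but we are able to show that it can be approximated by a small set.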
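For very small $n$, the small ball probability $p_\beta(v)$ can be bracketed by brute force, which makes the contrast between structured and unstructured $v$ visible. The helper below is hypothetical and enumerates all $2^n$ sign patterns. Restricting $z$ to the attained values of $S$ gives a lower bound. For the upper bound, note that any disk of radius $\beta$ containing an attained value $c$ lies inside the disk of radius $2\beta$ around $c$. The code works unchanged for complex $v_i$. The "generic" vector is an arbitrary choice with little additive structure.

```python
from itertools import product
from math import sqrt


def small_ball_bounds(v, beta):
    """Lower and upper bounds for p_beta(v) = sup_z P(|S - z| <= beta), S = sum_i xi_i v_i.

    Enumerates all 2^n sign patterns, so it is only meant for small n.
    """
    sums = [sum(s * vi for s, vi in zip(signs, v)) for signs in product((-1, 1), repeat=len(v))]

    def best(radius):
        # Largest fraction of sums lying within `radius` of some attained sum.
        return max(sum(1 for t in sums if abs(t - c) <= radius) for c in sums) / len(sums)

    return best(beta), best(2 * beta)


n = 8
equal = [1 / sqrt(n)] * n  # unit vector with heavy additive structure
generic = [sqrt(k) for k in (1, 2, 3, 5, 6, 7, 10, 11)]
norm = sqrt(sum(x * x for x in generic))
generic = [x / norm for x in generic]  # unit vector with little additive structure

eq_lo, eq_hi = small_ball_bounds(equal, 0.1)
gen_lo, gen_hi = small_ball_bounds(generic, 1e-6)
print(eq_lo, eq_hi)    # both 70/256: the sums are spaced 2/sqrt(8) apart
print(gen_lo, gen_hi)  # far smaller
```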

Theorem 3.11 (The -net Theorem).

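[54] Suppose that . Then the set of unit vectors $v = (v_1, \dots, v_n)$ such that $p_\beta(v) \ge n^{-A}$ admits an -net (in the infinity norm [Footnote 8: In other words, for any $v$ with $p_\beta(v) \ge n^{-A}$, there exists $v'$ in the net such that all coefficients of $v - v'$ do not exceed in magnitude.]) of size at most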

Remark 3.12.

A related result (with different parameters) appears in [38]; in our notation, the probability $p_\beta(v)$ is allowed to be much smaller, but the net is coarser (essentially, a -net rather than a -net). In terms of random matrices, the results in [38] are better suited to control the extreme tail of such quantities as the least singular value of $M_n$, but require more boundedness conditions on the matrix (and in particular, bounded operator norm) due to the coarser nature of the net.

4. Computer Science

Our next stop is computer science and numerical linear algebra, and in particular the problem of dealing with ill-conditioned matrices. This problem is closely related to the issue of pseudospectrum, which is of central importance in the circular law problem.

4.1. Theory vs Practice

Running times of algorithms are frequently estimated by worst-case analysis. But in practice, it has been observed that many algorithms, especially those involving a large matrix, perform significantly better than the worst-case scenario. The most famous example is perhaps the simplex algorithm in linear programming. Here, the basic problem (in its simplest form) is to optimize a goal function $c \cdot x$, under the constraint $Ax \le b$, where $b, c$ are given vectors of length $n$ and $A$ is an $n \times n$ matrix. In the worst case scenario, this algorithm takes exponential time. But in practice, the algorithm runs extremely well. It is still very popular today, despite the fact that there are many other algorithms proven to have polynomial complexity.

There have been various attempts to explain this phenomenon. In this section we will discuss an influential recent explanation given by Spielman and Teng [44, 45].

4.2. The effect of noise

An important issue in the theory of computing is noise, as almost all computational processes are affected by it. By the word noise, we refer to all kinds of errors occurring in a process, due to both humans and machines, including errors in measuring, errors caused by truncations, errors committed in transmitting and inputting the data, etc.

Spielman and Teng [44] pointed out that when we are interested in solving a certain system of equations, because of noise, our computer actually ends up solving a slightly perturbed version of the system. This is the core of their so-called smoothed analysis, which they used to explain the effectiveness of a specific algorithm (such as the simplex method). Interestingly, noise, usually a burden, plays a “positive” role here, as it smoothes the inputs randomly, and so prevents a very bad input from ever occurring.

The puzzling question here is, of course: why is the perturbed input typically better than the original (worst-case) input?

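In order to give a mathematical explanation, we need to introduce some notation. For an $n \times n$ matrix $M$, the condition number $\kappa(M)$ is defined as

$$\kappa(M) := \|M\|\,\|M^{-1}\|,$$

where $\|\cdot\|$ denotes the operator norm. (If $M$ is not invertible, we set $\kappa(M) = \infty$.)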
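For a $2 \times 2$ real matrix the condition number can be computed in closed form, which gives a dependency-free illustration of the definition. The squared singular values of $M$ are the eigenvalues of $M^T M$, whose trace is the sum of the squared entries and whose determinant is $\det(M)^2$. The function name is illustrative, and the sketch is restricted to the $2 \times 2$ case.

```python
import math


def condition_number_2x2(a, b, c, d):
    """kappa(M) = ||M|| * ||M^{-1}|| in the operator 2-norm, for the real matrix M = [[a, b], [c, d]].

    Squared singular values of M are the eigenvalues of M^T M (trace a^2 + b^2 + c^2 + d^2,
    determinant det(M)^2), and kappa(M) = s_max / s_min.
    """
    det = a * d - b * c
    if det == 0:
        return math.inf  # convention from the text: a singular M has kappa = infinity
    trace = a * a + b * b + c * c + d * d
    disc = math.sqrt(max(trace * trace - 4 * det * det, 0.0))
    s_max_sq = (trace + disc) / 2
    s_min_sq = det * det / s_max_sq  # product of the two eigenvalues is det^2 (avoids cancellation)
    return math.sqrt(s_max_sq / s_min_sq)


print(condition_number_2x2(1, 0, 0, 1))     # 1.0 (identity)
print(condition_number_2x2(1, 0, 0, 1e-3))  # approximately 1000
print(condition_number_2x2(1, 2, 2, 4))     # inf (rank one)
```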

The condition number plays a crucial role in numerical linear algebra; in particular, the condition number of a matrix serves as a simplified proxy for the accuracy and stability of most algorithms used to solve the equation $Mx = b$ (see [5, 23], for example). The exact solution $x = M^{-1}b$, in theory, can be computed quickly (by Gaussian elimination, say). However, in practice computers can only represent a finite subset of real numbers. This leads to two difficulties: the represented numbers cannot be arbitrarily large or small, and there are gaps between two adjacent represented numbers. A quantity which is frequently used in numerical analysis is $\epsilon$, which is half of the distance from 1 to the nearest represented number. A fundamental result in numerical analysis [5] asserts that if one denotes by $\tilde{x}$ the result computed by computers, then the relative error $\|\tilde{x} - x\|/\|x\|$ satisfies

$$\frac{\|\tilde{x} - x\|}{\|x\|} = O(\epsilon\,\kappa(M)).$$
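The quantity $\epsilon$ can be read off directly in IEEE 754 double precision, which Python floats use on common platforms (an assumption of this sketch). The helper names are illustrative. The implied constant in $O(\epsilon\,\kappa(M))$ is simply set to 1. The usual rule of thumb that about $\log_{10}\kappa(M)$ decimal digits are lost follows from the bound.

```python
import math
import sys

# epsilon in the text: half the gap between 1.0 and the next representable double.
gap_above_one = math.nextafter(1.0, 2.0) - 1.0  # math.nextafter needs Python 3.9+
unit_roundoff = gap_above_one / 2                # 2**-53 for IEEE 754 double precision
machine_epsilon = sys.float_info.epsilon         # Python's name for the full gap, 2**-52


def relative_error_scale(kappa):
    """Order of magnitude of the bound O(eps * kappa(M)), with the implied constant set to 1."""
    return unit_roundoff * kappa


def digits_lost(kappa):
    """Rule of thumb read off from the bound: about log10(kappa) decimal digits are lost."""
    return math.log10(kappa)


print(1.0 + unit_roundoff == 1.0)     # True: 1 + eps rounds back to 1 (round half to even)
print(1.0 + 2 * unit_roundoff > 1.0)  # True: this is the next double above 1
print(relative_error_scale(1e8))      # about 1.1e-08 for a matrix with kappa = 1e8
```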

Following the literature, we call $M$ well-conditioned if $\kappa(M)$ is small. For quantitative purposes, we say that an $n$ by $n$ matrix $M$ is well-conditioned if its condition number is polynomially bounded in $n$ (that is, $\kappa(M) \le n^C$ for some constant $C$ independent of $n$).

4.3. Randomly perturbed matrices are well-conditioned

The analysis in [44] is guided by the following fundamental intuition [Footnote 9: This conjecture, of course, does not fully explain the phenomenon of smoothed analysis, since it may be that a well-conditioned matrix still causes a difficulty in one’s linear algorithms for some other reason, or perhaps the original ill-conditioned matrix did not cause a difficulty in the first place; we thank Alan Edelman for pointing out this subtlety. Nevertheless, Conjecture 4.4 does provide an informal intuitive justification of smoothed analysis, and various rigorous versions of this conjecture were used in the formal arguments in [44]: see Section 1.4 of that paper for further discussion.]:

Conjecture 4.4.

For every input instance, it is unlikely that a slight random perturbation of that instance has large condition number.

More quantitatively,

Conjecture 4.5.

Let $M$ be an arbitrary $n$ by $n$ matrix and let $N_n$ be a random $n$ by $n$ matrix. Then with high probability $M + N_n$ is well-conditioned.

Notice that here one allows $M$ to have a large condition number.

Let us take a look at $\kappa(M + N_n) = \|M + N_n\|\,\|(M + N_n)^{-1}\|$. In order to have $\kappa(M + N_n) = n^{O(1)}$, we want to upper-bound both $\|M + N_n\|$ and $\|(M + N_n)^{-1}\|$. Bounding $\|M + N_n\|$ is easy, since by the triangle inequality

$$\|M + N_n\| \le \|M\| + \|N_n\|.$$

In most models of random matrices, $\|N_n\| = O(\sqrt{n})$ with very high probability, so it suffices to assume that $\|M\| = n^{O(1)}$; thus we assume that the matrix $M$ is of polynomial size compared to the noise level. This is a fairly reasonable assumption for high-dimensional matrices for which the effect of noise is non-negligible [Footnote 10: In particular, it is naturally associated to the concept of polynomially smoothed analysis from [44].], and we are going to assume it in the rest of this section.
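The "easy half" of the argument can be illustrated numerically. The sketch below assumes NumPy and makes two further arbitrary choices: a Bernoulli noise matrix with $n = 400$, and a hypothetical diagonal $M$ with $\|M\| = 10$. It checks that $\|N_n\|/\sqrt{n}$ is of order one (empirically close to 2 for $\pm 1$ entries) and that the triangle inequality bound holds.

```python
import numpy as np

rng = np.random.default_rng(1)              # fixed seed so the sketch is reproducible
n = 400
N = rng.choice([-1.0, 1.0], size=(n, n))    # Bernoulli noise N_n
M = np.diag(np.linspace(0.0, 10.0, n))     # hypothetical deterministic M with ||M|| = 10

norm_N = np.linalg.norm(N, 2)               # operator norm = largest singular value
norm_M = np.linalg.norm(M, 2)
norm_sum = np.linalg.norm(M + N, 2)
ratio = norm_N / np.sqrt(n)                 # of order one, empirically close to 2

print(f"||N_n|| / sqrt(n) = {ratio:.3f}")
print(f"||M + N_n|| = {norm_sum:.1f} <= ||M|| + ||N_n|| = {norm_M + norm_N:.1f}")
```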

The remaining problem is to bound the norm of the inverse $\|(M + N_n)^{-1}\|$. An important detail here is how to choose the random matrix $N_n$. In their works [44, 45, 43], Spielman and Teng (and coauthors) set $N_n$ to have iid Gaussian entries (with variance 1) and obtained the following bound, which played a critical role in their smoothed analysis [44, 45].

Theorem 4.6.

Let $M$ be an arbitrary $n$ by $n$ matrix and $N_n$ be a random matrix with iid Gaussian entries. Then for any $x > 0$,

$$\mathbf{P}\left(\|(M + N_n)^{-1}\| \ge x\sqrt{n}\right) = O(1/x).$$

While Spielman-Teng smoothed analysis does seem to have the right philosophy, the choice of $N_n$ is a bit artificial. Of course, the analysis still passes if one replaces Gaussian by a fine enough approximation. A large fraction of problems in linear programming deal with integral matrices, so the noise is perturbation by integers. In other cases, even when the noise has continuous support, the data is strongly truncated. For example, in many engineering problems, one does not keep more than, say, three to five decimal places. Thus, in many situations, the entries of $N_n$ end up having discrete support with relatively small size, which may not even grow with $n$, while the approximation mentioned above would require this support to have size exponential in $n$. Therefore, in order to come up with an analysis that better captures real life data, one needs a variant of Theorem 4.6 where the entries of $N_n$ have discrete support.

This problem was suggested to the authors by Spielman a few years ago. Using the Weak Inverse Theorem, we were able to prove the following variant of Theorem 4.6 [50].

Theorem 4.7.

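For any constants $a, c > 0$, there is a constant $b = b(a, c) > 0$ such that the following holds. Let $M$ be an $n$ by $n$ matrix such that $\|M\| \le n^a$ and let $N_n$ be a random matrix with iid Bernoulli entries. Then

$$\mathbf{P}\left(\|(M + N_n)^{-1}\| \ge n^b\right) \le n^{-c}.$$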
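A single numerical instance illustrates what Theorem 4.7 asserts, though it proves nothing. The sketch assumes NumPy. It takes the hypothetical choice $M = $ the all-ones matrix, which has rank one and hence $\kappa(M) = \infty$. Adding a Bernoulli perturbation then yields a condition number that is a modest power of $n$. The size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)              # fixed seed so the sketch is reproducible
n = 200
M = np.ones((n, n))                         # rank one, so kappa(M) is infinite
N = rng.choice([-1.0, 1.0], size=(n, n))    # iid Bernoulli perturbation N_n

kappa = np.linalg.cond(M + N)               # 2-norm condition number, computed via the SVD
exponent = np.log(kappa) / np.log(n)        # kappa = n^exponent; well-conditioned means O(1)
print(f"kappa(M + N_n) = {kappa:.3g} = n^{exponent:.2f}")
```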

Using the stronger -net Theorem, one can have a nearly optimal relation between the constants $a$, $b$ and $c$ [51]. These results extend, with the same proof, to a large variety of distributions. For example, one does not need to require the entries of $N_n$ to be iid [Footnote 11: In practice, one would expect the noise at a large entry to have larger variance than one at a small entry, due to multiplicative effects.], although independence is crucially exploited in the proofs. Also, one can allow many of the entries to be 0 [50].

Remark 4.8.

Results of this type first appeared in [37] (see also [33] for some earlier related work on the least singular value of rectangular matrices). In the special case where $M = 0$ and where the entries of $N_n$ are iid and have finite fourth moment, Rudelson and Vershynin [38] (see also [39], [40]) obtained sharp bounds for $\|N_n^{-1}\|$. They used a somewhat different method, which relies on an inverse theorem of a slightly different nature; see Remark 3.12.

The main idea behind the proof of Theorem 4.7, which first appears in [37], is the following. Let $d_i$ be the distance from the $i$-th row vector of $M + N_n$ to the subspace spanned by the rest of the rows. Elementary linear algebra (see also (10) below) then gives the bound

$$\|(M + N_n)^{-1}\| \le n^{1/2}\max_{1 \le i \le n} d_i^{-1}.$$

Ignoring various factors of $n^{O(1)}$, the main task is then to understand the distribution of $d_i$ for any given $i$.
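The elementary linear algebra behind the bound on $\|(M + N_n)^{-1}\|$ can be checked numerically. The sketch assumes NumPy, uses a hypothetical diagonal deterministic part, and computes each $d_i$ by least squares. It also checks the identity behind the bound: column $i$ of $A^{-1}$ is orthogonal to every row of $A$ except row $i$, and has inner product 1 with row $i$, so $d_i = 1/\|A^{-1}e_i\|$. Summing over $i$ then gives $\|A^{-1}\| \le \|A^{-1}\|_F = (\sum_i d_i^{-2})^{1/2} \le n^{1/2}\max_i d_i^{-1}$.

```python
import numpy as np


def row_distances(A):
    """d_i = distance from row i of A to the span of the remaining rows."""
    n = A.shape[0]
    d = np.empty(n)
    for i in range(n):
        others = np.delete(A, i, axis=0)  # the other n - 1 rows, shape (n - 1, n)
        # Least squares: the best combination of the other rows approximating row i.
        coeffs, *_ = np.linalg.lstsq(others.T, A[i], rcond=None)
        d[i] = np.linalg.norm(A[i] - others.T @ coeffs)
    return d


rng = np.random.default_rng(3)                  # fixed seed so the sketch is reproducible
n = 50
M = np.diag(np.arange(n, dtype=float))          # hypothetical deterministic part
A = M + rng.choice([-1.0, 1.0], size=(n, n))    # plays the role of M + N_n

d = row_distances(A)
A_inv = np.linalg.inv(A)
inv_norm = np.linalg.norm(A_inv, 2)             # ||A^{-1}|| = 1 / smallest singular value
bound = np.sqrt(n) * np.max(1.0 / d)            # n^{1/2} max_i d_i^{-1}
col_norms = np.linalg.norm(A_inv, axis=0)       # ||A^{-1} e_i|| for each column i

print(inv_norm <= bound, np.allclose(d * col_norms, 1.0))
```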

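If $v = (v_1, \dots, v_n)$ is the unit normal vector of a hyperplane $H$, then the distance from a random vector $(\xi_1, \dots, \xi_n)$ to the hyperplane is given by the formula

$$\mathrm{dist}((\xi_1, \dots, \xi_n), H) = |\xi_1 v_1 + \dots + \xi_n v_n| = |S|,$$

where $S$ is as in the previous section.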

To estimate the chance that $d_i \le \beta$, the notion of the small ball probability $p_\beta(v)$ comes naturally. Of course, this quantity depends on the normal vector $v$, and so we now divide into cases depending on the nature of this vector.

If $p_\beta(v)$ is small, we are done by a conditioning argument [Footnote 12: Intuitively, the idea of this conditioning argument is to first fix (or “condition” on) $n-1$ of the rows of $M + N_n$, which should then fix the normal vector $v$. The remaining row is independent of the other rows, and so should have a probability at most $p_\beta(v)$ of lying within $\beta$ of the span of those rows. There are some minor technical issues in making this argument (which essentially dates back to [29]) rigorous, arising from the fact that the rows may be too degenerate to accurately control $v$, but these difficulties can be dealt with, especially if one is willing to lose factors of $n^{O(1)}$ in various places.]. On the other hand, the -net Theorem says that there are “few” $v$ such that $p_\beta(v)$ is large, and in this case a direct counting argument finishes the job [Footnote 13: For instance, one important class of $v$ for which $p_\beta(v)$ tends to be large are the compressible vectors $v$, in which most of the entries are close to zero. Each compressible (e.g. ) has a moderately large probability of being close to a normal vector for $M + N_n$ (e.g. in the random Bernoulli case, has a probability about of being a normal vector); but the number (or more precisely, the metric entropy) of the set of compressible vectors is small (of size ) and so the net contribution of these vectors is then manageable. Similar arguments (relying heavily on the -net theorem) handle other cases when $p_\beta(v)$ is large (e.g. if most entries of $v$ live near a GAP of controlled size).]. Details can be found in [50], [54], or [51].

5. Back to probability

5.1. The replacement principle

Let us now take another look at the Circular Law Conjecture. Recall that $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $\frac{1}{\sqrt{n}}M_n$, which generate a normalized counting measure $\mu_n$. We want to show that $\mu_n$ tends (in probability) to the uniform measure $\mu_\infty$ on the unit disk.

The traditional way to attack this conjecture is via a Stieltjes transform technique [Footnote 14: The more classical moment method, which is highly successful in the Hermitian setting (for instance in proving Theorem 1.2), is not particularly effective in the non-Hermitian setting, because moments such as $\int_{\mathbb{C}} z^k\, d\mu_n(z)$ for $k = 0, 1, 2, \dots$ do not determine the ESD (even approximately) unless one takes $k$ to be as large as $n$; see [3], [4] for further discussion.], following [18, 3]. Given a (complex) measure $\mu$, define, for any $z$ with $\mathrm{Im}\, z > 0$,

$$s_\mu(z) := \int_{\mathbb{C}} \frac{1}{x - z}\, d\mu(x).$$

For the ESD $\mu_n$, we have

$$s_n(z) := s_{\mu_n}(z) = \frac{1}{n}\sum_{i=1}^n \frac{1}{\lambda_i - z}.$$

Thanks to standard results from probability [Footnote 15: One can also use the theory of logarithmic potentials for this, as is done for instance in [21], [35].], in order to establish the Circular Law Conjecture in the strong (resp. weak) sense, it suffices to show that $s_n(z)$ converges almost surely (resp. in probability) to $s(z) := s_{\mu_\infty}(z)$ for almost all $z$ (see [55] for a precise statement).
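The convergence of $s_n(z)$ to $s(z)$ can be observed numerically at a point outside the disk, where it is easiest to see. The sketch assumes NumPy and uses a single Gaussian sample with $n = 400$. It also relies on a standard computation, stated here without proof: with the sign convention above, the transform of the uniform measure on the unit disk is $-\bar{z}$ for $|z| \le 1$ and $-1/z$ for $|z| > 1$. The helper names, seed and test point are arbitrary choices.

```python
import numpy as np


def stieltjes_esd(eigenvalues, z):
    """s_n(z) = (1/n) sum_i 1 / (lambda_i - z), the Stieltjes transform of the ESD."""
    return np.mean(1.0 / (eigenvalues - z))


def stieltjes_circular(z):
    """Same transform for the uniform measure on the unit disk: -conj(z) inside, -1/z outside."""
    return -np.conj(z) if abs(z) <= 1 else -1.0 / z


rng = np.random.default_rng(4)                  # fixed seed so the sketch is reproducible
n = 400
lam = np.linalg.eigvals(rng.standard_normal((n, n)) / np.sqrt(n))
z = 2.0 + 0.5j                                  # outside the unit disk, with Im z > 0
err = abs(stieltjes_esd(lam, z) - stieltjes_circular(z))
print(f"s_n(z) = {stieltjes_esd(lam, z):.4f}, circular law: {stieltjes_circular(z):.4f}")
```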

Set and . Since is analytic except at the poles, and vanishes at infinity, the Stieltjes transform is determined by its real part . Let us take a closer look at this quantity:


is the normalized counting measure of the (squares of the) singular values of . Notice that in the third equality we use the fact that . This step is critical, as it reduces the study of a complex measure to that of a real one; in other words, to the study of the ESD of a Hermitian matrix rather than a non-Hermitian one.
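The passage from the Stieltjes transform to singular values rests on two elementary facts that are easy to check numerically. The first is |det(A − zI)|^2 = det((A − zI)(A − zI)^*). The second, writing z = s + it and using the convention s_n(z) = (1/n) Σ_k 1/(λ_k − z), is Re s_n(z) = −(1/2n) ∂/∂s log det((A − zI)(A − zI)^*). The sketch below uses these formulas in our own notation: A stands for the normalized matrix, and the helpers `stieltjes` and `log_det_gram` are hypothetical. It takes a real Gaussian test matrix and a central finite difference in s. It is a sanity check of the identities, not part of the proof.

```python
import numpy as np


def stieltjes(eigenvalues, z):
    """s_n(z) = (1/n) * sum_k 1 / (lambda_k - z), the Stieltjes transform of the ESD."""
    return np.mean(1.0 / (eigenvalues - z))


def log_det_gram(A, z):
    """log det((A - zI)(A - zI)^*) = 2 * sum_k log sigma_k(A - zI)."""
    n = A.shape[0]
    sigma = np.linalg.svd(A - z * np.eye(n), compute_uv=False)
    return 2.0 * np.sum(np.log(sigma))


rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n)) / np.sqrt(n)
z = 1.5 + 0.5j  # a point kept away from the spectrum
s, t = z.real, z.imag

# |det(A - zI)|^2 = det((A - zI)(A - zI)^*): a Hermitian quantity replaces a non-Hermitian one.
sign, logabsdet = np.linalg.slogdet(A - z * np.eye(n))
print(2 * logabsdet, log_det_gram(A, z))

# Re s_n(z) = -(1/2n) d/ds log det((A - zI)(A - zI)^*), via a central difference in s.
h = 1e-5
derivative = (log_det_gram(A, (s + h) + 1j * t) - log_det_gram(A, (s - h) + 1j * t)) / (2 * h)
re_s_from_singular_values = -derivative / (2 * n)
re_s_from_eigenvalues = stieltjes(np.linalg.eigvals(A), z).real
print(re_s_from_eigenvalues, re_s_from_singular_values)
```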

Putting this observation in the more general setting of Theorem 2.2, we arrive at the following useful result.

Theorem 5.2 (Replacement principle).

[55] Suppose for each that are ensembles of random matrices. Assume that

  • The expression


    is weakly (resp. strongly) bounded [Footnote 16: A sequence of non-negative random variables is said to be weakly bounded if , and strongly bounded if with probability .]

  • For almost all complex numbers ,

    converges weakly (resp. strongly) to zero. In particular, for each fixed , these determinants are non-zero with probability for all (resp. almost surely non-zero for all but finitely many ).

Then converges weakly (resp. strongly) to zero.

At a technical level, this theorem reduces Theorem 2.2 to the comparison of and .
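As a rough illustration of this comparison, the sketch below computes (1/n) log|det(X/√n − zI)| for one Bernoulli sample and one Gaussian sample at a fixed z inside the unit disk. It compares both values with (|z|^2 − 1)/2, which is the value of ∫ log|w − z| dμ(w) for the uniform measure μ on the unit disk when |z| ≤ 1. We assume here that the quantities being compared in Theorem 5.2 are these normalized log-determinants. The two ensembles, the point z, and the helper `normalized_log_det` are our own hypothetical choices.

```python
import numpy as np


def normalized_log_det(X, z):
    """(1/n) log|det(X / sqrt(n) - zI)| for an n x n matrix X, via slogdet."""
    n = X.shape[0]
    _, logabsdet = np.linalg.slogdet(X / np.sqrt(n) - z * np.eye(n))
    return logabsdet / n


rng = np.random.default_rng(2)
n = 300
z = 0.3 + 0.2j
bernoulli = rng.choice([-1.0, 1.0], size=(n, n))  # plays the role of A_n (hypothetical choice)
gaussian = rng.standard_normal((n, n))            # plays the role of B_n (hypothetical choice)

a = normalized_log_det(bernoulli, z)
b = normalized_log_det(gaussian, z)
limit = (abs(z) ** 2 - 1) / 2  # log-potential of the uniform disk law at |z| < 1
print(a, b, limit)
```

`np.linalg.slogdet` is used instead of `np.linalg.det` because the determinant itself can underflow or overflow for large n, while its logarithm stays well scaled.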

Remark 5.3.

Note that this expression is large and unstable when lies in the pseudospectra of either or , which means that the resolvent or is large. Controlling the probability of the event that lies in the pseudospectrum is therefore an important part of the analysis. This technical problem is not an artefact of the method; it is in fact intrinsic to any attempt to control non-Hermitian ESDs for general random matrix models, as such ESDs are extremely sensitive to perturbations of the matrix in regions of pseudospectrum. See [3], [4] for further discussion.
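The sensitivity described in Remark 5.3 already appears in a classical deterministic example, which is our own choice and not taken from the text. A nilpotent Jordan block and the zero matrix have the same spectrum {0}. Yet at z = 1/2 the smallest singular value of A − zI, which is the reciprocal of the resolvent norm, differs between the two by many orders of magnitude. Here z lies in the ε-pseudospectrum of A exactly when σ_min(A − zI) ≤ ε. The helper name `smallest_singular_value` is hypothetical.

```python
import numpy as np


def smallest_singular_value(A, z):
    """sigma_min(A - zI); z is in the eps-pseudospectrum of A iff this is <= eps."""
    n = A.shape[0]
    return np.linalg.svd(A - z * np.eye(n), compute_uv=False)[-1]


n = 50
jordan = np.diag(np.ones(n - 1), k=1)  # nilpotent Jordan block: every eigenvalue is 0
normal = np.zeros((n, n))              # a normal matrix with the same spectrum

z = 0.5  # distance 0.5 from the spectrum in both cases
print(smallest_singular_value(normal, z))  # 0.5 up to rounding
print(smallest_singular_value(jordan, z))  # tiny: the (1, n) entry of the inverse is -2**50
```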

5.4. Treatment of the pole

Using techniques from probability, such as the moment method, one can show that the distributions of the singular values of and are asymptotically the same [3, 54, 10, 55, 11] [Footnote 17: In the setting where the matrices and have iid entries, one can use the results of [10] to establish this. In the non-iid case, an invariance principle from [11] gives a slightly weaker version of this equivalence; this was observed by Manjunath Krishnapur and appears as an appendix to [55].]. This, however, is not sufficient to conclude that and are close. As remarked earlier, the main difficulty is that some of the singular values can be very small and can thus significantly influence the value of the logarithm.

This is where Theorem 4.7 enters the picture. This theorem tells us that, with overwhelming probability, there is no mass between and (say) , for some sufficiently large constant . Using this critical information, with some more work [Footnote 18: In particular, the presence of certain factors of arising from inserting Theorem 4.7 into the normalized log-determinant forces one to establish a convergence rate for the ESD of which is faster than logarithmic in in a certain sense. This is what ultimately forces one to assume the bounded moment hypothesis. Actually, the method allows one to relax this hypothesis to that of assuming for some absolute constant (e.g. will do).], we obtain:

Theorem 5.5.

[54] The Circular Law holds (with both strong and weak convergence) under the extra condition that the entries have bounded moment, for some constant .
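Theorem 4.7 is a tail bound, and the typical size of the least singular value is far above the polynomial threshold it excludes. The sketch below is our own experiment, with a hypothetical helper name and sample sizes. It samples √n σ_min(M_n) for unshifted Bernoulli matrices M_n. If σ_min(M_n) is typically of order n^{-1/2} (compare [38], [40]), the medians should stay of order one as n grows. The matrices relevant to the Circular Law are the shifted ones, (1/√n)M_n − zI; the unshifted case is used here only for simplicity.

```python
import numpy as np


def scaled_least_singular_values(n, trials, seed=0):
    """Samples of sqrt(n) * sigma_min(M_n) for n x n Bernoulli (+1/-1) matrices M_n."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(trials):
        M = rng.choice([-1.0, 1.0], size=(n, n))
        sigma_min = np.linalg.svd(M, compute_uv=False)[-1]  # singular values are sorted descending
        samples.append(np.sqrt(n) * sigma_min)
    return np.array(samples)


for n in (50, 100, 200):
    samples = scaled_least_singular_values(n, trials=40)
    print(n, np.median(samples))
```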

Remark 5.6.

Shortly after the appearance of [54], Götze and Tikhomirov [22] gave an alternative proof of the weak circular law under these hypotheses, using a variant of Theorem 4.7 which they obtained via a method from [37], [38]. This method is based on a different version of the Weak Inverse Theorem.

5.7. Negative second moment and sharp concentration

At the time it was written, the analysis in [54] appeared to be close to the limit of the method. It took some time to realize where the extra moment condition came from, and even more time to find a way to avoid it. Consider the sums

where are the singular values of , and

where are the singular values of .

As already mentioned, we know that the bulk of the and are distributed similarly. For the smallest few, we used the lower bound on as a uniform bound to show that their contribution is negligible. This turned out to be wasteful, and we needed the extra moment assumption to compensate for the loss in this step.

In order to remove this assumption, we need a better lower bound on the other singular values. An important first step is the discovery of the following simple but useful identity.

The Negative Second Moment Identity. [55] Let be an matrix, . Then


where, as usual, are the distances and are the singular values.
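The Negative Second Moment Identity is easy to test numerically. The sketch below assumes a real Gaussian test matrix of size n' × n with n' ≤ n, which has full rank almost surely, and uses a hypothetical helper `row_distances`. Each distance dist(X_i, V_i), from row X_i to the span V_i of the other rows, is computed by a least-squares projection. The sum of the inverse squared distances is then compared with the sum of the inverse squared singular values. This is a numerical check, not a proof.

```python
import numpy as np


def row_distances(A):
    """dist(X_i, V_i): distance from row i of A to the span V_i of the other rows."""
    distances = []
    for i in range(A.shape[0]):
        others = np.delete(A, i, axis=0)
        # Least-squares projection of row i onto the row space of the remaining rows.
        coeffs, *_ = np.linalg.lstsq(others.T, A[i], rcond=None)
        residual = A[i] - others.T @ coeffs
        distances.append(np.linalg.norm(residual))
    return np.array(distances)


rng = np.random.default_rng(3)
n_prime, n = 30, 40  # an n' x n matrix with n' <= n
A = rng.standard_normal((n_prime, n))

sigma = np.linalg.svd(A, compute_uv=False)
lhs = np.sum(row_distances(A) ** -2.0)
rhs = np.sum(sigma ** -2.0)
print(lhs, rhs)  # the two sums agree up to rounding
```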

One can prove this identity using undergraduate linear algebra. With this in hand, the rest of the proof falls into place [Footnote 19: A possible alternative approach would be to bound the intermediate singular values directly, by adapting the results from [39]. This would, however, require some additional effort; for instance, the results in [39] assume zero mean and bounded operator norm, which is not true in general when considering for non-zero assuming only a mean and variance condition on the entries of . In any case, the analysis in [39] ultimately goes through a computation of the distances , similarly to the approach we present here based on the negative second moment identity.]. Consider the singular values involved in our analysis, and use as shorthand for . To bound from below, notice that by interlacing

where and is an truncation of , obtained by omitting the last rows. The Negative Second Moment Identity implies

The right-hand side, in turn, can be bounded efficiently, thanks to the fact that all are large with overwhelming probability, which is a consequence of Talagrand’s inequality [46]:

Lemma 5.8 (Distance Lemma).

[52, 55] With probability , the distance from a random row vector to a subspace of co-dimension is at least , as long as .
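A quick experiment shows the scale in the Distance Lemma. Let X be a random vector with iid mean-zero, unit-variance entries and V a fixed subspace of co-dimension d. Then E dist(X, V)^2 = d, since this is the trace of the orthogonal projection onto the orthogonal complement of V. Talagrand's inequality upgrades this to concentration of dist(X, V) around √d. The sketch below uses our own choice of sizes and a random subspace drawn independently of X, which amounts to a fixed subspace after conditioning. The helper name is hypothetical, and the distance is computed from a QR factorization.

```python
import numpy as np


def distance_to_random_subspace(n, d, seed=0):
    """Distance from a random +/-1 vector in R^n to the span of n - d other random +/-1 rows.

    Almost surely the span has co-dimension d, and E[dist^2] = d for these entries.
    """
    rng = np.random.default_rng(seed)
    rows = rng.choice([-1.0, 1.0], size=(n - d, n))
    x = rng.choice([-1.0, 1.0], size=n)
    q, _ = np.linalg.qr(rows.T)  # reduced QR: orthonormal basis of the span, shape (n, n - d)
    projection = q @ (q.T @ x)
    return np.linalg.norm(x - projection)


n, d = 400, 100
dist = distance_to_random_subspace(n, d)
print(dist, np.sqrt(d))
```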

Thus, with overwhelming probability, is , which implies

This lower bound is now sufficient to establish Theorem 2.2, and with it the Circular Law in full generality.
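The chain of inequalities in this subsection can be checked on a sample matrix: interlacing, then the Negative Second Moment Identity, then a bound on the distances. The indexing below is our own choice and is meant only to illustrate the mechanism. A is n × n, and A' consists of its first n − k rows. For each m, the m smallest singular values of A' are all at most σ_{n−k−m+1}(A'), so that m σ_{n−k−m+1}(A')^{-2} ≤ Σ_j σ_j(A')^{-2} = Σ_i dist(X_i, V_i)^{-2}. The sum of inverse squared distances is computed as the trace of (A'A'^T)^{-1}. This relies on the standard fact that the i-th diagonal entry of this inverse Gram matrix equals dist(X_i, V_i)^{-2}.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 60, 10
A = rng.standard_normal((n, n))
A_trunc = A[: n - k]  # (n - k) x n truncation: the last k rows are omitted

sigma = np.linalg.svd(A, compute_uv=False)  # descending order
sigma_trunc = np.linalg.svd(A_trunc, compute_uv=False)

# Interlacing: deleting rows cannot increase the j-th largest singular value.
print(np.all(sigma[: n - k] >= sigma_trunc - 1e-10))

# Sum over the rows of A_trunc of dist(X_i, V_i)^{-2}: trace of the inverse Gram matrix.
inv_dist_sq_sum = np.trace(np.linalg.inv(A_trunc @ A_trunc.T))

# For m = 1, ..., n - k: sigma_{n-k-m+1}(A) >= sigma_{n-k-m+1}(A') >= sqrt(m / inv_dist_sq_sum).
m = np.arange(1, n - k + 1)
bounds = np.sqrt(m / inv_dist_sq_sum)
targets = sigma[n - k - m]  # 0-indexed position of sigma_{n-k-m+1}(A)
print(np.all(targets >= bounds))
```

In the argument above, the sum of inverse squared distances is then kept small, with overwhelming probability, by Lemma 5.8.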

6. Open problems

Our investigation leads to open problems in several areas:

Combinatorics. Our study of the Littlewood-Offord problem focuses on the linear form . What can one say about higher-degree polynomials ?

In [6], it was shown that for a quadratic form with non-zero coefficients, is . It is simple to improve this bound to [7]. On the other hand, we conjecture that the truth is , which would be sharp by taking . Costello (personal communication) recently improved the bound to , and it looks likely that his approach will lead to the optimal bound, or something close.
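For the quadratic problem, one natural example shows that a bound of order n^{-1/2} cannot be improved. It is our choice, and we do not claim it is the example intended above. Take the quadratic form with all coefficients equal to 1, Q(ξ) = Σ_{i,j} ξ_i ξ_j = (ξ_1 + ... + ξ_n)^2. It vanishes exactly when ξ_1 + ... + ξ_n = 0, which for even n has probability binom(n, n/2)/2^n ≈ √(2/(πn)). The sketch below, with a hypothetical helper name, confirms this by exhaustive enumeration for small n.

```python
from itertools import product
from math import comb


def quadratic_zero_probability(n):
    """Exact P(Q(xi) = 0) for Q(xi) = sum_{i,j} xi_i xi_j, xi uniform on {-1, 1}^n."""
    hits = 0
    for xi in product((-1, 1), repeat=n):
        q = sum(a * b for a in xi for b in xi)  # every coefficient a_ij equals 1
        if q == 0:
            hits += 1
    return hits / 2 ** n


for n in (8, 10, 12):
    print(n, quadratic_zero_probability(n), comb(n, n // 2) / 2 ** n)
```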

The situation with higher degrees is much less clear. In [6], a bound of the form was shown, where is a positive constant depending on , the degree of the polynomial involved. In this bound decreases very fast with .

Smoothed analysis. The Spielman-Teng smoothed analysis of the simplex algorithm [44] was carried out with Gaussian noise. It is a very interesting problem to see whether one can reach the same conclusion with discrete noise of fixed support, such as Bernoulli noise. This would give an even more convincing explanation of the efficiency of the simplex method, since, as discussed earlier, noise that occurs in practice typically has discrete, small support. (This question was mentioned to us by several researchers, including Spielman, a few years ago.)

As discussed earlier, we now have a discrete version of Theorem 4.6. While Theorem 4.6 plays a very important part in the Spielman-Teng analysis [45], there are several other parts of the proof that make use of the continuity of the support in subtle ways. It is possible to modify these parts to work for sufficiently fine discrete approximations of the continuous (noise) variables in question. However, to do so it seems one needs to make the size of the support very large (typically exponential in , the size of the matrix).

Another exciting direction is to consider even more realistic models of noise. For instance,

  • In several problems, the matrix may have many frozen entries, namely entries that are not affected by noise. In particular, an entry which is zero (by the nature of the problem) is likely to stay zero throughout the computation. Clearly, the pattern of the frozen entries will be important. For example, if the first column consists of (frozen) zeros, then no matter how the noise affects the rest of the matrix, the matrix will always be singular (and of course ill-conditioned). We hope to classify all patterns for which theorems such as Theorem 2.2 remain valid.

  • At the non-frozen entries, the noise could have different distributions. It is natural to expect that the error at a large entry has larger variance than the error at a smaller entry.

Some preliminary results in these directions were obtained in [50]. However, we are still at the very beginning of the road, and much remains to be done.

Circular Law. A natural question here is to determine the rate of convergence. In [54], we observed that under the extra assumption that the -moment of the entries is bounded, one obtains a rate of convergence of order , for some positive constant depending on . The exact dependence between and is not clear.

Another question concerns the determinant of random matrices. It is known, and not hard to prove, that satisfies a central limit theorem when the entries of are iid Gaussian; see [20, 8]. Girko [20] claimed that the same result holds for much more general models of matrices. However, we have been unable to verify his arguments, and it would be nice to have an alternative proof.

7. Acknowledgements

Thanks to Peter Forrester, Kenny Maples, and especially Alan Edelman for corrections.

References
  • [1] L. Arnold, On the asymptotic distribution of the eigenvalues of random matrices, J. Math. Anal. Appl., 20 (1967), 262-268.
  • [2] L. Arnold, On Wigner’s semicircle law for the eigenvalues of random matrices, Z. Wahrsch. Verw. Gebiete, 19 (1971), 191-198.
  • [3] Z. D. Bai, Circular law, Ann. Probab. 25 (1997), 494–529.
  • [4] Z. D. Bai and J. Silverstein, Spectral analysis of large dimensional random matrices, Mathematics Monograph Series 2, Science Press, Beijing 2006.
  • [5] D. Bau and L. Trefethen, Numerical linear algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.
  • [6] K. Costello, T. Tao, V. Vu, Random symmetric matrices are almost surely non-singular, Duke Math. J. 135 (2006), 395–413.
  • [7] K. Costello and V. Vu, The ranks of random graphs, to appear in Random Structures and Algorithms.
  • [8] K. Costello and V. Vu, Concentration of random determinants and Permanent Estimators, submitted.
  • [9] P. Deift, Universality for mathematical and physical systems. International Congress of Mathematicians Vol. I, 125–152, Eur. Math. Soc., Zürich, 2007.
  • [10] R. Dozier, J. Silverstein, On the empirical distribution of eigenvalues of large dimensional information-plus-noise-type matrices, J. Multivar. Anal. 98 (2007), 678–694.
  • [11] S. Chatterjee, A simple invariance principle. [arXiv:math/0508213]
  • [12] A. Edelman, Probability that a Random Real Gaussian Matrix Has Real Eigenvalues, Related Distributions, and the Circular Law, Journal of Multivariate Analysis 60, (1997), 203–232.
  • [13] P. Erdős, On a lemma of Littlewood and Offord. Bull. Amer. Math. Soc. 51 (1945), 898–902.
  • [14] P. Erdős, L. Moser, Elementary Problems and Solutions: Solutions: E736. Amer. Math. Monthly 54 (1947), no. 4, 229–230.
  • [15] G. Freiman, Foundations of a structural theory of set addition. Translated from the Russian. Translations of Mathematical Monographs, Vol 37. American Mathematical Society, Providence, R. I., 1973. vii+108 pp.
  • [16] P. Frankl and Z. Füredi, Solution of the Littlewood-Offord problem in high dimensions. Ann. of Math. (2) 128 (1988), no. 2, 259–270.
  • [17] J. Ginibre, Statistical Ensembles of Complex, Quaternion, and Real Matrices, Journal of Mathematical Physics 6 (1965), 440–449.
  • [18] V. L. Girko, Circular law, Theory Probab. Appl. (1984), 694–706.
  • [19] V. L. Girko, The strong circular law. Twenty years later. II. Random Oper. Stochastic Equations 12 (2004), no. 3, 255–312.
  • [20] V. L. Girko,
  • [21] F. Götze, A.N. Tikhomirov, On the circular law, preprint
  • [22] F. Götze, A.N. Tikhomirov, The Circular Law for Random Matrices, preprint
  • [23] G. Golub and C. van Loan, Matrix computations, 3rd Edition, Johns Hopkins Univ. Press, 1996.
  • [24] G. Halász, Estimates for the concentration function of combinatorial number theory and probability, Period. Math. Hungar. 8 (1977), no. 3-4, 197–211.
  • [25] C. R. Hwang, A brief survey on the spectral radius and the spectral distribution of large random matrices with i.i.d. entries, Contemp. Math. 50, Amer. Math. Soc., Providence, RI, 1986, 145–152.
  • [26] J. Kahn, J. Komlós, E. Szemerédi, On the probability that a random ±1-matrix is singular. J. Amer. Math. Soc. 8 (1995), no. 1, 223–240.
  • [27] G. Katona, On a conjecture of Erdös and a stronger form of Sperner’s theorem. Studia Sci. Math. Hungar. 1 (1966), 59–63.
  • [28] D. Kleitman, On a lemma of Littlewood and Offord on the distributions of linear combinations of vectors, Advances in Math. 5 (1970), 155–157.
  • [29] J. Komlós, On the determinant of (0,1) matrices. Studia Sci. Math. Hungar. 2 (1967), 7–21.
  • [30] J. Griggs, J. Lagarias, A. Odlyzko and J. Shearer, On the tightest packing of sums of vectors, European J. Combin. 4 (1983), no. 3, 231–236.
  • [31] J. W. Lindeberg, Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Math. Z. 15 (1922), 211–225.
  • [32] J. E. Littlewood and A. C. Offord, On the number of real roots of a random algebraic equation. III. Rec. Math. [Mat. Sbornik] N.S. 12 (1943), 277–286.
  • [33] A. Litvak, A. Pajor, M. Rudelson and N. Tomczak-Jaegermann, Smallest singular value of random matrices and geometry of random polytopes, Adv. Math. 195 (2005), no. 2, 491–523.
  • [34] M.L. Mehta, Random Matrices and the Statistical Theory of Energy Levels, Academic Press, New York, NY, 1967.
  • [35] G. Pan and W. Zhou, Circular law, Extreme singular values and potential theory, preprint.
  • [36] L. A. Pastur, On the spectrum of random matrices, Teoret. Mat. Fiz. 10, 102-112 (1973).
  • [37] M. Rudelson, Invertibility of random matrices: Norm of the inverse. Annals of Mathematics, to appear.
  • [38] M. Rudelson and R. Vershynin, The Littlewood-Offord problem and the condition number of random matrices, Advances in Mathematics, 218 (2008) 600-633.
  • [39] M. Rudelson, R. Vershynin, The smallest singular value of a rectangular random matrix, preprint.
  • [40] M. Rudelson, R. Vershynin, The least singular value of a random square matrix is O(n^{-1/2}), preprint.
  • [41] R. Stanley, Weyl groups, the hard Lefschetz theorem, and the Sperner property, SIAM J. Algebraic Discrete Methods 1 (1980), no. 2, 168–184.
  • [42] A. Sárközy and E. Szemerédi, Über ein Problem von Erdős und Moser, Acta Arithmetica, 11 (1965) 205-208.
  • [43] A. Sankar, S. H. Teng, and D. A. Spielman, Smoothed Analysis of the Condition Numbers and Growth Factors of Matrices, preprint.
  • [44] D. A. Spielman and S. H. Teng, Smoothed analysis of algorithms, Proceedings of the International Congress of Mathematicians, Vol. I (Beijing, 2002), 597–606, Higher Ed. Press, Beijing, 2002.
  • [45] D. A. Spielman and S. H. Teng, Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time, J. ACM 51 (2004), no. 3, 385–463.
  • [46] M. Talagrand, A new look at independence, Ann. Probab. 24 (1996), no. 1, 1–34.
  • [47] T. Tao and V. Vu, On random matrices: Singularity and Determinant, Random Structures Algorithms 28 (2006), no. 1, 1–23.
  • [48] T. Tao, V. Vu, Additive combinatorics, Cambridge University Press, 2006.
  • [49] T. Tao and V. Vu, Inverse Littlewood-Offord theorems and the condition number of random discrete matrices, Annals of Mathematics, to appear.
  • [50] T. Tao and V. Vu, The condition number of a randomly perturbed matrix, STOC 2007.
  • [51] T. Tao and V. Vu, Random matrices: A general approach for the least singular value problem, preprint.
  • [52] T. Tao and V. Vu, On random (-1,1) matrices: Singularity and Determinant, Random Structures and Algorithms 28 (2006), no 1, 1-23.
  • [53] T. Tao and V. Vu, On the singularity probability of random Bernoulli matrices, Journal of the A. M. S, 20 (2007), 603-673.
  • [54] T. Tao and V. Vu, Random matrices: The Circular Law, Communications in Contemporary Mathematics 10 (2008), 261–307.
  • [55] T. Tao and V. Vu, Random matrices: Universality of the ESD and the Circular Law (with an appendix by M. Krishnapur), submitted.
  • [56] T. Tao and V. Vu, paper in preparation.
  • [57] E. P. Wigner, On the distribution of the roots of certain symmetric matrices, The Annals of Mathematics 67 (1958), 325–327.
  • [58] J. Wishart, The generalized product moment distribution in samples from a normal multivariate population, Biometrika 20 (1928), 32–52.