Normal vector of a random hyperplane
Let $X_1, \dots, X_{n-1}$ be independent vectors in $\mathbb{R}^n$ (or $\mathbb{C}^n$). We study $v$, the unit normal vector of the hyperplane spanned by the $X_i$. Our main finding is that $v$ resembles a random vector chosen uniformly from the unit sphere, under some randomness assumption on the $X_i$.
Our result has applications in random matrix theory. Consider an $n \times n$ random matrix with iid entries. We first prove an exponential bound on the upper tail of the least singular value, improving the earlier linear bound of Rudelson and Vershynin. Next, we derive optimal delocalization for the eigenvectors corresponding to eigenvalues of small modulus.
A real random variable $\xi$ is normalized if it has mean 0 and variance 1. A complex random variable is normalized if it has the form $(\xi_1 + \sqrt{-1}\,\xi_2)/\sqrt{2}$, where $\xi_1, \xi_2$ are iid copies of a real normalized random variable.
Some popular normalized variables
real standard gaussian $N(0,1)$, or real Bernoulli, which takes values $\pm 1$ with probability $1/2$ each;
complex standard gaussian, or complex Bernoulli $(\pm 1 \pm \sqrt{-1})/\sqrt{2}$.
Fix a normalized random variable $\xi$ and consider the random vector $X$ whose entries are iid copies of $\xi$. Sample $n-1$ iid copies $X_1, \dots, X_{n-1}$ of $X$. We would like to study the normal vector of the hyperplane spanned by the $X_i$.
In matrix terms, we let $M_n$ be a random matrix of size $(n-1) \times n$ whose entries are iid copies of $\xi$; the $X_i$ are the row vectors of $M_n$. Let $v$ be a unit vector that is orthogonal to the $X_i$. (Here and later, the underlying field $F$ is either $\mathbb{R}$ or $\mathbb{C}$, depending on the support of $\xi$.) First note that recent studies of the singularity probability of random non-Hermitian matrices (see for instance [6, 22]) show that, under very general conditions on $\xi$, with extremely high probability $M_n$ has rank $n-1$. In this case $v$ is uniquely determined up to sign when $F = \mathbb{R}$, or up to a uniformly chosen rotation when $F = \mathbb{C}$. Throughout the paper, we use asymptotic notation under the assumption that $n$ tends to infinity. In particular, $X = O(Y)$, $X \ll Y$, or $Y \gg X$ means that $|X| \le C Y$ for some fixed constant $C$.
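For concreteness, this setup is easy to simulate. The following is a minimal numerical sketch (not from the paper), assuming Bernoulli $\pm 1$ entries; the unit normal $v$ is obtained as a unit vector spanning the kernel of $M_n$, i.e. the right-singular vector for the smallest singular value.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# (n-1) x n matrix with iid Bernoulli +-1 entries; its rows span the hyperplane.
M = rng.choice([-1.0, 1.0], size=(n - 1, n))

# The unit normal vector spans the (generically one-dimensional) kernel of M,
# i.e. it is the right-singular vector for the smallest singular value.
v = np.linalg.svd(M)[2][-1]

assert np.allclose(M @ v, 0.0, atol=1e-8)    # orthogonal to every row
assert abs(np.linalg.norm(v) - 1.0) < 1e-12  # unit length
```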
When the entries of $M_n$ are iid standard gaussian, it is not hard to see that $v$ is distributed as a random unit vector sampled according to the Haar measure on the unit sphere of $F^n$. One then deduces the following properties (see for instance [Section 2]).
Theorem 1.2 (Random gaussian vector).
Let be a random vector uniformly distributed on the unit sphere . Then,
(joint distribution of the coordinates) $v$ can be represented as
$$v = \frac{1}{(|g_1|^2 + \cdots + |g_n|^2)^{1/2}}\,(g_1, \dots, g_n),$$
where $g_1, \dots, g_n$ are iid standard gaussian;
(inner product with a fixed vector) for any fixed vector $u$ on the unit sphere, $\sqrt{n}\,\langle v, u \rangle$ converges in distribution to the standard gaussian;
(the largest coordinate) for any , with probability at least
(the smallest coordinate) for , any , and any ,
with probability at least .
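Properties (i) and (iii) of Theorem 1.2 are easy to check numerically. A sketch (not from the paper), using the gaussian representation $v = g/\|g\|$ stated above and an illustrative constant in the $O(\sqrt{\log n/n})$ bound on the largest coordinate:

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 1000, 200

# Property (i): normalizing an iid standard gaussian vector yields the
# uniform distribution on the unit sphere S^{n-1}.
g = rng.standard_normal((trials, n))
v = g / np.linalg.norm(g, axis=1, keepdims=True)

# Each coordinate is close to N(0, 1/n), so sqrt(n) v_1 is nearly standard normal.
x = np.sqrt(n) * v[:, 0]
assert abs(x.mean()) < 0.2
assert abs(x.std() - 1.0) < 0.2

# Property (iii): the largest coordinate is O(sqrt(log n / n)) with high probability.
assert np.all(np.abs(v).max(axis=1) < 3 * np.sqrt(np.log(n) / n))
```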
Motivated by the universality phenomenon, it is natural to ask whether these properties are universal, namely whether they continue to hold when $\xi$ is non-gaussian. Our result confirms this prediction in a strong sense. These properties also have applications in the theory of random matrices, which we will discuss after stating the main result.
Let us introduce some notation. We say that $\xi$ is sub-gaussian if there exists a parameter $\kappa > 0$ such that for all $t > 0$,
$$\mathbb{P}(|\xi| \ge t) \le 2 \exp(-t^2/\kappa^2).$$
Definition 1.3 (Frequent events).
Let $A_n$ be an event depending on $n$ (which is assumed to be sufficiently large).
$A_n$ holds asymptotically almost surely if $\mathbb{P}(A_n) = 1 - o(1)$.
$A_n$ holds with high probability if there exists a positive constant $c$ such that $\mathbb{P}(A_n) \ge 1 - O(n^{-c})$.
$A_n$ holds with overwhelming probability if for any $A > 0$, $\mathbb{P}(A_n) \ge 1 - O_A(n^{-A})$, with $n$ sufficiently large.
Theorem 1.4 (Main result).
Suppose that the entries of $M_n$ are iid copies of a normalized sub-gaussian random variable $\xi$. Then the following hold.
(the largest coordinate) There are constants such that for any
In particular, with overwhelming probability
(the smallest coordinate) with high probability
(joint distribution of the coordinates) There exists a positive constant such that the following holds: for any -tuple , with , the joint law of the tuple is asymptotically independent standard normal. More precisely, there exists a positive constant such that for any measurable set ,
where are iid standard gaussian.
(inner product with a fixed vector) Assume furthermore that $\xi$ is symmetric; then for any fixed vector on the unit sphere,
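The universality asserted by Theorem 1.4 can be illustrated empirically. A sketch (not from the paper): for Bernoulli $\pm 1$ entries, the rescaled first coordinate $\sqrt{n}\,v_1$ should look approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 300

samples = []
for _ in range(trials):
    M = rng.choice([-1.0, 1.0], size=(n - 1, n))
    v = np.linalg.svd(M)[2][-1]       # unit normal of the row span
    v = v * rng.choice([-1.0, 1.0])   # v is defined only up to sign
    samples.append(np.sqrt(n) * v[0])
x = np.array(samples)

# Under the universality prediction, x is approximately standard normal.
assert abs(x.mean()) < 0.25
assert abs(x.std() - 1.0) < 0.25
```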
Our approach can be extended to unit vectors orthogonal to the rows of an iid matrix of size $(n-d) \times n$, for any fixed $d$ or even for $d$ growing slowly with $n$; the details will appear in a later paper.
As random hyperplanes appear frequently in various areas, including random matrix theory, high dimensional geometry, statistics, and theoretical computer science, we expect that Theorem 1.4 will be useful. For the rest of this section, we discuss two applications.
1.5. Tail bound for the least singular value of a random iid matrix
Consider an $n \times n$ random matrix with entries being iid copies of a normalized variable $\xi$, and let $\sigma_1 \ge \cdots \ge \sigma_n$ be its singular values. The two extremal values $\sigma_1$ and $\sigma_n$ are of special interest; the least singular value $\sigma_n$ was studied by Goldstein and von Neumann as they tried to analyze the running time of solving a system of random linear equations.
Goldstein and von Neumann speculated that $\sigma_n$ is of order $n^{-1/2}$, which turned out to be correct. In particular, $\sqrt{n}\,\sigma_n$ tends to a limiting distribution, which was computed explicitly by Edelman in the gaussian case.
For any we have
as well as
In other words, and . These distributions have been confirmed to be universal (in the asymptotic sense) by Tao and the second author .
In applications, one usually needs large deviation results, which show that the probability that $\sigma_n$ is far from its mean is very small. For the lower bound, Rudelson and Vershynin proved that for any
which is sharp up to the constant . For the upper bound, in a different paper , the same authors showed
Using Theorem 1.4, we improve this result significantly by proving an exponential tail bound,
Theorem 1.7 (Exponential upper tail for the least singular values).
Assume that the entries of the matrix are iid copies of a normalized sub-gaussian random variable in either $\mathbb{R}$ or $\mathbb{C}$. Then there exist positive constants depending only on $\xi$ such that
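A quick simulation consistent with Theorem 1.7 (a sketch, not from the paper; Bernoulli $\pm 1$ entries, and the normalization $\sigma_n \approx n^{-1/2}$ discussed above):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 100, 300

# Least singular value of an n x n iid matrix, rescaled by sqrt(n).
sigma_min = np.array([
    np.linalg.svd(rng.choice([-1.0, 1.0], size=(n, n)), compute_uv=False)[-1]
    for _ in range(trials)
])
scaled = np.sqrt(n) * sigma_min

# After rescaling the typical size is O(1), and large values are rare,
# consistent with an exponential upper tail.
assert np.median(scaled) < 3.0
assert np.mean(scaled > 6.0) < 0.1
```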
1.8. Eigenvectors of random iid matrices.
Our theorem is closely related to (and in fact was motivated by) recent results concerning delocalization and normality of eigenvectors of random matrices. For random Hermitian matrices, there have been many results achieving almost optimal delocalization of eigenvectors, starting with the work by Erdős et al. and continued by Tao et al. and by many others in [32, 36, 9, 10, 11, 12, 13, 35, 2, 3, 4]. Thanks to new universality techniques, normality of the eigenvectors has also been proved; see for instance the work by Knowles and Yin, by Tao and Vu, and by Bourgade and Yau.
For non-Hermitian random matrices, much less is known. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues and let $u_1, \dots, u_n$ be the corresponding unit eigenvectors (chosen according to the Haar measure from the eigensphere if the corresponding eigenvalues are multiple). Recently, Rudelson and Vershynin proved that with overwhelming probability all of the eigenvectors satisfy
By modifying the proof of Theorem 1.4, we are able to sharpen this bound for eigenvectors of eigenvalues with small modulus.
Theorem 1.9 (Optimal delocalization for small eigenvectors).
Assume that the entries of the matrix are iid copies of a normalized sub-gaussian random variable in either $\mathbb{R}$ or $\mathbb{C}$. Then for any fixed , with overwhelming probability the following holds for any unit eigenvector corresponding to an eigenvalue with
We believe that each individual eigenvector in Theorem 1.9 satisfies the normality property (8), which would imply that the bound is optimal up to a multiplicative constant. Figure 1 below shows that the first coordinate of the eigenvector corresponding to the smallest eigenvalue behaves like a gaussian random variable.
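A sketch of the delocalization phenomenon in Theorem 1.9 (not from the paper; Bernoulli $\pm 1$ entries, the eigenvector of the smallest-modulus eigenvalue, and an illustrative constant in the $O(\sqrt{\log n/n})$ bound):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

M = rng.choice([-1.0, 1.0], size=(n, n))
vals, vecs = np.linalg.eig(M)

# Unit eigenvector for the eigenvalue of smallest modulus
# (np.linalg.eig returns unit-norm columns).
v = vecs[:, np.argmin(np.abs(vals))]
assert abs(np.linalg.norm(v) - 1.0) < 1e-8

# Optimal delocalization: sup-norm of order sqrt(log n / n),
# far below the trivial bound 1 for a unit vector.
assert np.max(np.abs(v)) < 5 * np.sqrt(np.log(n) / n)
```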
Finally, let us mention that all of our results hold (with logarithmic corrections) under the weaker assumption that the variable $\xi$ is sub-exponential, namely that there are positive constants $C$ and $c$ such that $\mathbb{P}(|\xi| \ge t) \le C e^{-t^{c}}$ for all $t > 0$; see Remark 2.3.
The rest of the paper is organized as follows. After introducing supporting lemmas in Section 2, we prove (6) and Theorem 1.9 in Section 3. Sections 6 and 7 are devoted to proving (8) and (9) respectively, while (7) is shown in Section 4. Finally, we prove Theorem 1.7 in Section 5.
2. The lemmas
Let $H$ be a subspace of co-dimension $d$ in $F^n$ and let $P$ be the projection matrix onto the orthogonal complement of $H$. Let $X$ and $Y$ be independent random vectors whose entries are iid copies of a normalized sub-gaussian random variable $\xi$. Then the following holds.
the distance from to is well concentrated around its mean,
the correlation is small,
More generally, we have
Lemma 2.2 (Hanson-Wright inequality).
There exists an absolute constant such that the following holds for any sub-gaussian -normalized random variable . Let be a fixed Hermitian matrix. Consider a random vector where the entries are iid copies of . Then
In particular, for any
This lemma was first proved by Hanson and Wright in a special case . The above general version is due to Rudelson and Vershynin ; see also  for related results which hold (with logarithmic correction) for sub-exponential variables.
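To illustrate Lemma 2.2 numerically, the following sketch (not from the paper) checks that a Bernoulli quadratic form $X^T A X$ concentrates around its mean $\operatorname{tr}(A)$ at the scale $\|A\|_F$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 200, 500

# A fixed symmetric matrix; its Frobenius norm sets the fluctuation scale.
A = rng.standard_normal((n, n))
A = (A + A.T) / 2

# Quadratic forms X^T A X with iid Bernoulli +-1 vectors X.
X = rng.choice([-1.0, 1.0], size=(trials, n))
q = ((X @ A) * X).sum(axis=1)

# Hanson-Wright: deviations of X^T A X from E X^T A X = trace(A)
# beyond a few multiples of ||A||_F are exponentially rare.
dev = np.abs(q - np.trace(A))
assert np.mean(dev > 4 * np.linalg.norm(A, 'fro')) < 0.05
```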
As mentioned at the end of the introduction, the results of this paper hold (with logarithmic correction) for sub-exponential variables. One can achieve this by repeating the proofs, using the results from  (such as [36, Corollary 1.6]) instead of Lemmas 2.1 and 2.2. We leave the details as an exercise.
The next tool is a Berry-Esséen theorem for frames, proved by Tao and Vu in . As the statement is technical, let us first warm the reader up with the classical Berry-Esséen theorem.
Lemma 2.4 (Berry-Esséen theorem).
Let be real numbers with and let be a -normalized random variable with finite third moment . Let denote the random sum
where are iid copies of . Then for any we have
where the implied constant depends on the third moment of . In particular,
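The classical Berry-Esséen rate can be seen numerically. A sketch (not from the paper), taking the flat coefficient vector $x = (1, \dots, 1)/\sqrt{n}$ and Bernoulli summands:

```python
import math
import numpy as np

rng = np.random.default_rng(6)
n, trials = 400, 20000

# Normalized Bernoulli random walk: S = (xi_1 + ... + xi_n) / sqrt(n).
S = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1) / math.sqrt(n)

# Empirical Kolmogorov distance to the standard normal CDF.
grid = np.linspace(-3.0, 3.0, 121)
Phi = np.array([0.5 * (1 + math.erf(t / math.sqrt(2))) for t in grid])
emp = (S[:, None] <= grid).mean(axis=0)
dist = np.abs(emp - Phi).max()

# Berry-Esseen predicts a distance of order n^{-1/2} (about 0.05 here).
assert dist < 0.1
```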
Lemma 2.5 (Berry-Esséen theorem for frames).
[31, Proposition D.2] Let , and let $\xi$ be a normalized random variable with finite third moment. Let be a normalized tight frame for , in other words
where is the identity matrix on . Let denote the random variable
where are iid copies of . Similarly, let be formed from iid copies of the standard gaussian random variable . Then for any measurable and for any we have
where is the collection of such that .
3.1. Proof of (6)
By a union bound, it suffices to show that for sufficiently large
Let , be the columns of . Because , among the subset sums , there is a subset sum which is smaller than . With a loss of a factor in probability, without loss of generality we will assume that
Let be the subspace generated by . Let be the orthogonal projection from onto . We view as a Hermitian matrix of size satisfying . It is known (see for instance [22, 31, 6]) that with probability we have , which implies .
Recall that by definition,
Applying , we have
where and . We remark that the here are not deterministic but depend on the column vectors .
As is linear, and as , we have
We are going to estimate the operator norm using the randomness of .
There exists a sufficiently large constant such that
completing the proof.
To prove Lemma 3.2, we first estimate for any fixed . We will show
There exists a sufficiently large constant such that for any fixed with ,
Now for any unit vector , there exists such that , and thus by the triangle inequality
This implies that , and hence
(of Lemma 3.3) Let be the concatenation of ; then can be written as a bilinear form where is the tensor product of and , with . By construction, consists of blocks where the -th block is the matrix . It thus follows that
Applying Lemma 2.2 to , we have
It is easy to show that
Taking , we obtain
To this end, by properties of a tensor product,
which implies that
We now turn to the eigenvectors.
3.4. Proof of Theorem 1.9
We will be working with the perturbed matrix where and . By a standard net argument, it suffices to show the following
For any fixed with , the following holds with overwhelming probability with respect to : if then satisfies (6).
Equivalently, we show that for any unit vector satisfying the condition of Theorem 3.5, then
We will proceed as in Subsection 3.1 by assuming that , where instead of we have
for some vector with norm , where is the -th column of the matrix .
Projecting onto , we obtain
Note that here as , Lemma 2.1 is still effective, which yields with probability at least .
To estimate the right hand side, set . Similarly to Lemma 3.2, we will establish
There exists a sufficiently large constant such that
There exists a sufficiently large constant such that for any fixed with ,
It remains to prove Lemma 3.7. Write , where is a -vector with at most one non-zero entry and is a random vector of iid entries. Thus
Next, we have
Additionally, as and , by the properties of the vector has norm at most . As such, the subgaussian random variable has variance at most one, and hence
We can argue similarly for to obtain the same bound. Finally, notice that
Putting all the estimates together, we obtain Lemma 3.7 as long as .
4. Treatment for the smallest coordinate: proof of (7)
Let be the random matrix of size obtained from by deleting its first column. Set , we have
It is known that with probability at least the matrix is invertible; in this case, we can write
where are the singular values of with corresponding left-singular vectors .
We now condition on . By the sub-gaussian property of the entries, we can easily show that there is a constant such that with overwhelming probability (with respect to )
We will need the following estimate
With respect to we have
Thus by the union bound
For the remaining sum , by the Cauchy-interlacing law,
where is obtained from by deleting its first columns.
On the other hand, by the negative second moment identity (see [30, Lemma A.4])
where is the distance from the th row of to the hyperplane spanned by the remaining rows of . Using Lemma 2.1 and the union bound, we obtain, for some constant and with overwhelming probability, that simultaneously for all . This implies that, with overwhelming probability with respect to
By the union bound, we have with probability at least ,
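The negative second moment identity invoked above ([30, Lemma A.4]) states that for an invertible matrix, $\sum_i \sigma_i^{-2} = \sum_i d_i^{-2}$, where $d_i$ is the distance from the $i$-th row to the span of the remaining rows. A numerical sanity check of this identity (a sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50

A = rng.choice([-1.0, 1.0], size=(n, n))
sigma = np.linalg.svd(A, compute_uv=False)

def dist_to_span(row, others):
    # Distance from `row` to the span of the rows of `others`,
    # via the least-squares projection residual.
    coef, *_ = np.linalg.lstsq(others.T, row, rcond=None)
    return np.linalg.norm(row - others.T @ coef)

d = np.array([dist_to_span(A[i], np.delete(A, i, axis=0)) for i in range(n)])

# Negative second moment identity: sum sigma_i^{-2} = sum d_i^{-2}.
assert np.allclose((sigma ** -2.0).sum(), (d ** -2.0).sum(), rtol=1e-6)
```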