
# Normal vector of a random hyperplane

Hoi H. Nguyen Department of Mathematics, The Ohio State University, Columbus OH 43210 School of Mathematics, Institute for Advanced Studies, Princeton NJ 08540  and  Van H. Vu Department of Mathematics, Yale University, New Haven CT 06520
###### Abstract.

Let $X_1, \dots, X_{n-1}$ be independent random vectors in $\mathbb R^n$ (or $\mathbb C^n$). We study $x$, the unit normal vector of the hyperplane spanned by the $X_i$. Our main finding is that $x$ resembles a random vector chosen uniformly from the unit sphere, under some randomness assumption on the $X_i$.

Our result has applications in random matrix theory. Consider an $n \times n$ random matrix with iid entries. We first prove an exponential bound on the upper tail for the least singular value, improving the earlier linear bound by Rudelson and Vershynin. Next, we derive optimal delocalization for the eigenvectors corresponding to eigenvalues of small modulus.

H. Nguyen is supported by NSF grants DMS-1358648, DMS-1128155 and CCF-1412958. V. Vu is supported by NSF grant DMS-1307797 and AFORS grant FA9550-12-1-0083.

## 1. Introduction

A real random variable $\xi$ is normalized if it has mean 0 and variance 1. A complex random variable $\xi$ is normalized if $\xi = \frac{1}{\sqrt 2}(\xi_1 + \sqrt{-1}\,\xi_2)$, where $\xi_1, \xi_2$ are iid copies of a real normalized random variable.

###### Example 1.1.

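Some popular normalized variables:

• real standard Gaussian $g_{\mathbb R} \sim N(0,1)$, or real Bernoulli, which takes the values $\pm 1$ with probability $1/2$ each;

• complex standard Gaussian $g_{\mathbb C} = \frac{1}{\sqrt 2}(g_1 + \sqrt{-1}\,g_2)$, or complex Bernoulli $\frac{1}{\sqrt 2}(\xi_1 + \sqrt{-1}\,\xi_2)$, where $g_1, g_2$ are iid copies of $g_{\mathbb R}$ and $\xi_1, \xi_2$ are iid real Bernoulli variables.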

Fix a normalized random variable $\xi$ and consider the random vector $X = (\xi_1, \dots, \xi_n)$, whose entries are iid copies of $\xi$. Sample $n-1$ iid copies $X_1, \dots, X_{n-1}$ of $X$. We would like to study the normal vector of the hyperplane spanned by the $X_i$.

In matrix terms, let $A$ be a random matrix of size $(n-1) \times n$ whose entries are iid copies of $\xi$; the $X_i$ are the row vectors of $A$. Let $x = (x_1, \dots, x_n) \in \mathbb F^n$ be a unit vector that is orthogonal to the $X_i$ (here and later $\mathbb F$ is either $\mathbb R$ or $\mathbb C$, depending on the support of $\xi$). First note that recent studies of the singularity probability of random non-Hermitian matrices (see for instance [6, 22]) show that under very general conditions on $\xi$, with extremely high probability $A$ has rank $n-1$. In this case $x$ is uniquely determined up to sign when $\mathbb F = \mathbb R$, or up to a uniformly chosen rotation $e^{\sqrt{-1}\theta}$ when $\mathbb F = \mathbb C$. Throughout the paper, we use asymptotic notation under the assumption that $n$ tends to infinity. In particular, $a = O(b)$, $a \ll b$, or $b \gg a$ means that $|a| \le Cb$ for some fixed $C$.

When the entries of $A$ are iid standard gaussian $g_{\mathbb F}$, it is not hard to see that $x$ is distributed as a random unit vector sampled according to the Haar measure on the unit sphere $S^{n-1}$ of $\mathbb F^n$. One then deduces the following properties (see for instance [21, Section 2]).

###### Theorem 1.2 (Random gaussian vector).

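Let $x$ be a random vector uniformly distributed on the unit sphere $S^{n-1}$ of $\mathbb F^n$. Then:

• (joint distribution of the coordinates) $x$ can be represented as

$$x := \left(\frac{\xi_1}{S}, \dots, \frac{\xi_n}{S}\right), \tag{1}$$

where $\xi_1, \dots, \xi_n$ are iid standard gaussian $g_{\mathbb F}$, and $S := \sqrt{|\xi_1|^2 + \dots + |\xi_n|^2}$;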

• (inner product with a fixed vector) for any fixed vector $u$ on the unit sphere,

$$\sqrt n\, x^* u \xrightarrow{d} g_{\mathbb F}; \tag{2}$$
• (the largest coordinate) for any $C > 0$, with probability at least $1 - O(n^{-C})$,

$$\|x\|_\infty \le \sqrt{\frac{8(C+1)}{3}\cdot\frac{\log n}{n}}; \tag{3}$$
• (the smallest coordinate) for , any , and any ,

$$\|x\|_{\min} = \min\{|x_1|, \dots, |x_n|\} \ge \frac{c}{a}\cdot\frac{1}{n^{3/2}} \tag{4}$$

with probability at least .
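Representation (1) also gives a practical way to sample $x$ in the real case. The following sketch (standard library only; the helper names are ours, for illustration) draws $x$ via (1) and compares $\|x\|_\infty$ with the scale $\sqrt{\log n/n}$ that appears in (3).

```python
import math
import random


def uniform_sphere_vector(n, rng):
    """Sample x uniformly from the unit sphere of R^n via representation (1):
    x = (xi_1/S, ..., xi_n/S) with xi_i iid N(0, 1) and S = (sum xi_i^2)^(1/2)."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    s = math.sqrt(sum(v * v for v in xi))
    return [v / s for v in xi]


def sup_norm_ratio(x):
    """Return ||x||_inf / sqrt(log n / n), the normalization suggested by (3)."""
    n = len(x)
    return max(abs(v) for v in x) / math.sqrt(math.log(n) / n)


rng = random.Random(0)
samples = [uniform_sphere_vector(1000, rng) for _ in range(20)]
ratios = [sup_norm_ratio(x) for x in samples]
print(min(ratios), max(ratios))  # both stay of order 1
```

Changing $C$ in (3) only rescales the threshold by a constant; what the experiment shows is that $\max_i |x_i|$ sits at the scale $\sqrt{\log n / n}$.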

Motivated by the universality phenomenon (see, for instance, [34]), it is natural to ask whether these properties are universal, namely whether they still hold when $\xi$ is non-gaussian. Our results confirm this prediction in a strong sense. They also have applications in the theory of random matrices, which we discuss after stating the main result.

Let us introduce some notation. We say that $\xi$ is sub-gaussian if there exists a parameter $K_0 > 0$ such that for all $t > 0$

$$\mathbb P(|\xi| \ge t) = O\left(\exp\left(-\frac{t^2}{K_0}\right)\right). \tag{5}$$
###### Definition 1.3 (Frequent events).

Let $E$ be an event depending on $n$ (which is assumed to be sufficiently large).

• $E$ holds asymptotically almost surely if $\mathbb P(E) = 1 - o(1)$.

• $E$ holds with high probability if there exists a positive constant $c$ such that $\mathbb P(E) \ge 1 - n^{-c}$.

• $E$ holds with overwhelming probability if for any $C > 0$, $\mathbb P(E) \ge 1 - n^{-C}$ for sufficiently large $n$.

###### Theorem 1.4 (Main result).

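Suppose that the entries of $A$ are iid copies of a normalized sub-gaussian random variable $\xi$. Then the following hold.

• (the largest coordinate) There is a constant $C > 0$ such that for any $m > 0$

$$\mathbb P\left(\|x\|_\infty \ge \sqrt{m/n}\right) \le Cn^2\exp(-m/C). \tag{6}$$

In particular, with overwhelming probability

$$\|x\|_\infty = O\left(\sqrt{\frac{\log n}{n}}\right).$$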
• (the smallest coordinate) with high probability

$$\|x\|_{\min} \ge \frac{1}{n^{3/2}\log^{O(1)} n}. \tag{7}$$
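• (joint distribution of the coordinates) There exists a positive constant $c$ such that the following holds: for any $d$-tuple $(i_1, \dots, i_d)$ with $d \le n^{c}$, the joint law of $(\sqrt n x_{i_1}, \dots, \sqrt n x_{i_d})$ is asymptotically that of independent standard normal variables. More precisely, there exists a positive constant $c'$ such that for any measurable set $\Omega \subset \mathbb F^d$,

$$\left|\mathbb P\left((\sqrt n x_{i_1}, \dots, \sqrt n x_{i_d}) \in \Omega\right) - \mathbb P\left((g_{\mathbb F,1}, \dots, g_{\mathbb F,d}) \in \Omega\right)\right| \le d^{-c'}, \tag{8}$$

where $g_{\mathbb F,1}, \dots, g_{\mathbb F,d}$ are iid standard gaussian $g_{\mathbb F}$.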

• (inner product with a fixed vector) Assume furthermore that $\xi$ is symmetric. Then for any fixed vector $u$ on the unit sphere,

$$\sqrt n\, x^* u \xrightarrow{d} g_{\mathbb F}. \tag{9}$$

It also follows easily from (6) and (8) that with high probability . Indeed, it is clear that with high probability, with for some sufficiently small , . Thus by (8), with high probability .
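To see Theorem 1.4 at work numerically, one can compute $x$ directly. The sketch below assumes NumPy is available; `normal_vector` is a hypothetical helper, not taken from the paper. It samples $A$ with iid Bernoulli $\pm 1$ entries, extracts the unit normal vector as the last right-singular vector of $A$, and checks that $\|x\|_\infty$ is of order $\sqrt{\log n/n}$, as predicted by (6).

```python
import numpy as np


def normal_vector(A):
    """Unit vector orthogonal to the rows of an (n-1) x n matrix A of rank n-1.

    With full_matrices=True, the last row of vh (conjugated) spans the kernel of A.
    """
    _, _, vh = np.linalg.svd(A, full_matrices=True)
    x = vh[-1].conj()
    return x / np.linalg.norm(x)


rng = np.random.default_rng(1)
n = 400
A = rng.choice([-1.0, 1.0], size=(n - 1, n))  # iid Bernoulli (+-1) entries
x = normal_vector(A)

residual = np.abs(A @ x).max()                    # x is orthogonal to every row
ratio = np.abs(x).max() / np.sqrt(np.log(n) / n)  # O(1), in line with (6)
print(residual, ratio)
print(np.sqrt(n) * x[:5])  # a few rescaled coordinates; compare with (8)
```

The SVD route is chosen for numerical robustness; $x$ is only determined up to sign, which does not affect any of the quantities printed.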

Our approach can be extended to unit vectors orthogonal to the rows of an iid matrix of size $(n-k) \times n$, for any fixed $k$, or even for $k$ growing slowly with $n$; the details will appear in a later paper.

As random hyperplanes appear frequently in various areas, including random matrix theory, high dimensional geometry, statistics, and theoretical computer science, we expect that Theorem 1.4 will be useful. For the rest of this section, we discuss two applications.

### 1.5. Tail bound for the least singular value of a random iid matrix

Let $M$ be an $n \times n$ random matrix whose entries are iid copies of a normalized variable $\xi$, and let $\sigma_1(M) \ge \dots \ge \sigma_n(M)$ be its singular values. The two extremal singular values $\sigma_1$ and $\sigma_n$ are of special interest, and they were studied by Goldstine and von Neumann as they tried to analyze the running time of solving a system of random equations $Mx = b$.

In [17], Goldstine and von Neumann speculated that $\sigma_n$ is of order $n^{-1/2}$, which turned out to be correct. In particular, $\sqrt n\,\sigma_n$ tends to a limiting distribution, which was computed explicitly by Edelman in [8] in the gaussian case.

###### Theorem 1.6.

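For any $t \ge 0$ we have

$$\mathbb P\left(n\,\sigma_n(M_{g_{\mathbb R}})^2 \le t\right) = \int_0^t \frac{1+\sqrt x}{2\sqrt x}\, e^{-(x/2+\sqrt x)}\,dx + o(1)$$

as well as

$$\mathbb P\left(n\,\sigma_n(M_{g_{\mathbb C}})^2 \le t\right) = \int_0^t e^{-x}\,dx,$$

where $M_{g_{\mathbb F}}$ denotes the matrix $M$ with iid entries $g_{\mathbb F}$.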

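In other words, in both the real and the complex gaussian case the rescaled least singular value has an explicit limiting law. These distributions have been confirmed to be universal (in the asymptotic sense) by Tao and the second author [31].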
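The real-case integral in Theorem 1.6 has a closed form: substituting $x = u^2$ turns the integrand into $(1+u)e^{-(u^2/2+u)}$, which is the derivative of $-e^{-(u^2/2+u)}$, so the integral from $0$ to $t$ equals $1 - e^{-(t/2+\sqrt t)}$ (the complex-case integral is simply $1 - e^{-t}$). The short check below, with function names of our own choosing, compares this closed form with a midpoint-rule evaluation.

```python
import math


def edelman_real_cdf(t):
    """Closed form of the real-case integral in Theorem 1.6."""
    return 1.0 - math.exp(-(t / 2.0 + math.sqrt(t)))


def edelman_real_cdf_midpoint(t, steps=100_000):
    """Midpoint rule for the same integral after the substitution x = u^2,
    which removes the 1/sqrt(x) singularity: integrate (1 + u) exp(-(u^2/2 + u))
    over [0, sqrt(t)]."""
    b = math.sqrt(t)
    h = b / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += (1.0 + u) * math.exp(-(u * u / 2.0 + u))
    return total * h


for t in (0.1, 1.0, 4.0):
    print(t, edelman_real_cdf(t), edelman_real_cdf_midpoint(t))
```

For instance, at $t = 1$ the integral equals $1 - e^{-3/2} \approx 0.777$.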

In applications, one usually needs large deviation results, which show that the probability that $\sigma_n$ is far from its mean is very small. For the lower tail, Rudelson and Vershynin [22] proved that for any $t \ge 0$

$$\mathbb P(\sigma_n \le tn^{-1/2}) \le Ct + 0.999^n, \tag{10}$$

which is sharp up to the constant $C$. For the upper tail, in a different paper [24], the same authors showed

$$\mathbb P(\sigma_n \ge tn^{-1/2}) \le \frac{C\log t}{t}. \tag{11}$$

Using Theorem 1.4, we improve this result significantly by proving an exponential tail bound.

###### Theorem 1.7 (Exponential upper tail for the least singular value).

Assume that the entries of $M$ are iid copies of a normalized sub-gaussian random variable $\xi$ in either $\mathbb R$ or $\mathbb C$. Then there exist constants $C_1, C_2 > 0$ depending on $\xi$ such that

$$\mathbb P(\sigma_n \ge tn^{-1/2}) \le C_1\exp(-C_2t).$$

Our proof of Theorem 1.7 is totally different from that of [24]. As the gaussian case shows, the exponential bound is sharp, up to the values of the constants.

### 1.8. Eigenvectors of random iid matrices.

Our theorem is closely related to (and in fact was motivated by) recent results concerning delocalization and normality of eigenvectors of random matrices. For random Hermitian matrices, there have been many results achieving almost optimal delocalization of eigenvectors, starting with the work [16] by Erdős et al. and continued by Tao et al. and by many others in [32, 36, 9, 10, 11, 12, 13, 35, 2, 3, 4]. Thanks to new universality techniques, normality of the eigenvectors has also been proved; see for instance the work [19] by Knowles and Yin, [33] by Tao and Vu, and [5] by Bourgade and Yau.

For a non-Hermitian random matrix $M$, much less is known. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $M$, ordered by modulus, and let $v_1, \dots, v_n$ be the corresponding unit eigenvectors (where the $v_i$ are chosen according to the Haar measure on the eigensphere if the corresponding roots are multiple). Recently, Rudelson and Vershynin [26] proved that with overwhelming probability all of the eigenvectors satisfy

$$\|v_i\|_\infty = O\left(\frac{\log^{9/2} n}{\sqrt n}\right). \tag{12}$$

By modifying the proof of Theorem 1.4, we are able to sharpen this bound for eigenvectors corresponding to eigenvalues of small modulus.

###### Theorem 1.9 (Optimal delocalization for small eigenvectors).

Assume that the entries of $M$ are iid copies of a normalized sub-gaussian random variable $\xi$ in either $\mathbb R$ or $\mathbb C$. Then for any fixed , with overwhelming probability the following holds for any unit eigenvector $x$ corresponding to an eigenvalue $\lambda$ of $M$ with

$$\|x\|_\infty = O\left(\sqrt{\frac{\log n}{n}}\right).$$

We believe that the individual eigenvector in Theorem 1.9 satisfies the normality property (8), which would imply that the bound is optimal up to a multiplicative constant. Figure 1 below shows that the first coordinate of the eigenvector corresponding to the smallest eigenvalue behaves like a gaussian random variable.

Finally, let us mention that all of our results hold (with logarithmic correction) under the weaker assumption that the variable $\xi$ is sub-exponential, namely there are positive constants and such that for all ; see Remark 2.3.

The rest of the paper is organized as follows. After introducing supporting lemmas in Section 2, we prove (6) and Theorem 1.9 in Section 3. Sections 6 and 7 are devoted to proving (8) and (9), respectively, while (7) is shown in Section 4. Finally, we prove Theorem 1.7 in Section 5.

## 2. The lemmas

We will use the following well-known concentration result for distances in random non-Hermitian matrices (see for instance [32, Lemma 43], [28, Corollary 2.19] or [36]).

###### Lemma 2.1.

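Let $H$ be a subspace of co-dimension $m$ in $\mathbb F^n$ and let $P_H$ be the projection matrix onto the orthogonal complement of $H$. Let $u = (\xi_1, \dots, \xi_n)$ and $v = (\xi'_1, \dots, \xi'_n)$ be independent random vectors, where the $\xi_i, \xi'_i$ are iid copies of an $\mathbb F$-normalized sub-gaussian random variable $\xi$ with parameter $K_0$. Then the following hold.

1. The distance from $u$ to $H$ is well concentrated around $\sqrt m$:

$$\mathbb P\left(\left|\|P_Hu\|_2 - \sqrt m\right| \ge t\right) \le \exp(-t^2/K_0^4);$$

2. The correlation is small:

$$\mathbb P\left(|v^TP_Hu| \ge t\right) \le \exp(-t^2/K_0^4).$$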

More generally, we have

###### Lemma 2.2 (Hanson-Wright inequality).

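There exists an absolute constant $c > 0$ such that the following holds for any sub-gaussian $\mathbb F$-normalized random variable $\xi$ with parameter $K_0$. Let $A$ be a fixed Hermitian $n \times n$ matrix. Consider a random vector $x = (\xi_1, \dots, \xi_n)$ whose entries are iid copies of $\xi$. Then

$$\mathbb P\left(|x^*Ax - \mathbb E x^*Ax| > t\right) \le 2\exp\left(-c\min\left(\frac{t^2}{K_0^4\|A\|_{HS}^2}, \frac{t}{K_0^2\|A\|_2}\right)\right).$$

In particular, for any $t > 0$

$$\mathbb P\left(|x^*Ax - \mathbb E x^*Ax| > t\|A\|_{HS}\right) \le O\left(\exp\left(-\frac{ct^2}{K_0^4}\right) + \exp\left(-\frac{ct}{K_0^2}\right)\right).$$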

This lemma was first proved by Hanson and Wright in a special case [18]. The above general version is due to Rudelson and Vershynin [25]; see also [36] for related results which hold (with logarithmic correction) for sub-exponential variables.
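As a quick sanity check of the scaling in Lemma 2.2, the following Monte Carlo sketch (NumPy assumed; the sizes are arbitrary choices) takes $A$ to be an orthogonal projection $P$ of rank $m$ and $\xi$ Rademacher ($\pm 1$), so that $\mathbb E x^*Px = \operatorname{tr}P = m$ and $\|P\|_{HS} = \sqrt m$. The normalized deviations $|x^*Px - m|/\sqrt m$ come out of order one, with rapidly decaying tails. This only illustrates the inequality.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, trials = 300, 30, 2000

# Orthogonal projection P of rank m onto a random m-dimensional subspace.
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))
P = Q @ Q.T
hs = np.linalg.norm(P, "fro")  # ||P||_HS = sqrt(m)

X = rng.choice([-1.0, 1.0], size=(trials, n))  # each row is one sample of x
quad = np.einsum("ti,ij,tj->t", X, P, X)      # x^T P x for each sample
dev = np.abs(quad - np.trace(P)) / hs        # deviation in units of ||P||_HS

print(hs, dev.mean(), (dev > 4.0).mean())
```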

###### Remark 2.3.

As mentioned at the end of the introduction, the results of this paper hold (with logarithmic correction) for sub-exponential variables. One can achieve this by repeating the proofs, using the results from [36] (such as [36, Corollary 1.6]) instead of Lemmas 2.1 and 2.2. We leave the details as an exercise.

The next tool is the Berry-Esséen theorem for frames, proved by Tao and Vu in [31]. As the statement is technical, let us first warm the reader up with the classical Berry-Esséen theorem.

###### Lemma 2.4 (Berry-Esséen theorem).

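Let $v_1, \dots, v_n$ be real numbers with $\sum_i v_i^2 = 1$, and let $\xi$ be an $\mathbb F$-normalized random variable with finite third moment $\mathbb E|\xi|^3$. Let $S$ denote the random sum

$$S = \sum_i v_i\xi_i,$$

where $\xi_1, \dots, \xi_n$ are iid copies of $\xi$. Then for any $t > 0$ we have

$$\mathbb P(|S| \le t) = \mathbb P(|g_{\mathbb F}| \le t) + O\left(\sum_i |v_i|^3\right),$$

where the implied constant depends on the third moment of $\xi$. In particular, since $\sum_i |v_i|^3 \le \max_i |v_i| \sum_i v_i^2 = \max_i |v_i|$,

$$\mathbb P(|S| \le t) = \mathbb P(|g_{\mathbb F}| \le t) + O\left(\max_i |v_i|\right).$$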
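For equal weights $v_i = n^{-1/2}$ and Rademacher $\xi$, the probability $\mathbb P(|S| \le t)$ can be computed exactly from the binomial distribution, which makes the error term $O(\max_i |v_i|) = O(n^{-1/2})$ in Lemma 2.4 visible. The sketch below (real case only; helper names are ours) prints the largest discrepancy over a few values of $t$, together with that discrepancy multiplied by $\sqrt n$, which stays bounded.

```python
import math


def prob_abs_sum_le(n, t):
    """Exact P(|S| <= t) for S = (xi_1 + ... + xi_n) / sqrt(n) with xi_i iid +-1.

    If k of the signs equal +1, then S = (2k - n) / sqrt(n).
    """
    favorable = sum(
        math.comb(n, k) for k in range(n + 1) if abs(2 * k - n) / math.sqrt(n) <= t
    )
    return favorable / 2**n


def prob_abs_gauss_le(t):
    """P(|g| <= t) for a standard real gaussian g."""
    return math.erf(t / math.sqrt(2.0))


def max_discrepancy(n, ts=(0.5, 1.0, 2.0)):
    """Largest |P(|S| <= t) - P(|g| <= t)| over the given thresholds t."""
    return max(abs(prob_abs_sum_le(n, t) - prob_abs_gauss_le(t)) for t in ts)


for n in (10, 100, 1000):
    err = max_discrepancy(n)
    print(n, err, err * math.sqrt(n))
```

The discrepancy is dominated by the lattice structure of $S$ (atoms of size about $n^{-1/2}$), which is why the rate $O(\max_i|v_i|)$ cannot be improved for Bernoulli variables.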
###### Lemma 2.5 (Berry-Esséen theorem for frames).

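[31, Proposition D.2] Let $1 \le k \le n$, and let $\xi$ be an $\mathbb F$-normalized random variable with finite third moment. Let $v_1, \dots, v_n \in \mathbb F^k$ be a normalized tight frame for $\mathbb F^k$; in other words,

$$v_1v_1^* + \dots + v_nv_n^* = I_k,$$

where $I_k$ is the identity matrix on $\mathbb F^k$. Let $S \in \mathbb F^k$ denote the random variable

$$S = \xi_1v_1 + \dots + \xi_nv_n,$$

where $\xi_1, \dots, \xi_n$ are iid copies of $\xi$. Similarly, let $G = g_1v_1 + \dots + g_nv_n$ be formed from iid copies $g_1, \dots, g_n$ of the standard gaussian random variable $g_{\mathbb F}$. Then for any measurable $\Omega \subset \mathbb F^k$ and for any $\varepsilon > 0$ we have

$$\mathbb P(G \in \Omega \setminus \partial_\varepsilon\Omega) - O\left(k^{5/2}\varepsilon^{-3}\max_j\|v_j\|_\infty\right) \le \mathbb P(S \in \Omega) \le \mathbb P(G \in \Omega \cup \partial_\varepsilon\Omega) + O\left(k^{5/2}\varepsilon^{-3}\max_j\|v_j\|_\infty\right),$$

where $\partial_\varepsilon\Omega$ is the collection of $y \in \mathbb F^k$ such that $\operatorname{dist}_\infty(y, \partial\Omega) \le \varepsilon$.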

## 3. Treatment for the largest coordinate: proof of (6) and Theorem 1.9

### 3.1. Proof of (6)

By a union bound, it suffices to show that for sufficiently large $m$

$$\mathbb P\left(|x_1| \ll \sqrt{\frac mn}\right) = 1 - O\left(n\exp\left(-\frac mC\right)\right). \tag{13}$$

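Let $c_1, \dots, c_n$ be the columns of $A$. Because $\|x\|_2 = 1$, an averaging argument shows that among the subset sums $\sum_{i \in I}|x_i|^2$ over sets $I \subset \{2, \dots, n\}$ of size $m-1$, there is one which is at most $\frac{m}{n-1}$. With some loss in probability (coming from the choice of $I$), we may assume without loss of generality that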

$$|x_2|^2 + \dots + |x_m|^2 \le \frac{m}{n-1}.$$

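Let $H$ be the subspace generated by $c_{m+1}, \dots, c_n$, and let $P_H$ be the orthogonal projection from $\mathbb F^{n-1}$ onto the orthogonal complement of $H$. We view $P_H$ as a Hermitian matrix of size $(n-1) \times (n-1)$ satisfying $P_H^2 = P_H$. It is known (see for instance [22, 31, 6]) that with probability $1 - \exp(-\Omega(n))$ we have $\dim H = n - m$, which implies $\operatorname{rank}(P_H) = m - 1$.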

Recall that by definition,

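$$x_1c_1 + x_2c_2 + \dots + x_mc_m + \sum_{i \ge m+1} x_ic_i = 0. \tag{14}$$

Applying $P_H$, we have

$$x_1P_Hc_1 = -P_H(x_2c_2 + \dots + x_mc_m),$$

which implies

$$|x_1|^2\|P_Hc_1\|_2^2 = \left\|\sum_{j=2}^m x_jP_Hc_j\right\|_2^2 = \sum_{2 \le j_1, j_2 \le m} \bar x_{j_1}x_{j_2}\,c_{j_1}^*P_Hc_{j_2} := \|Qx'\|_2^2, \tag{15}$$

where $x' := (x_2, \dots, x_m)$ and $Q$ is the $(n-1) \times (m-1)$ matrix with columns $P_Hc_2, \dots, P_Hc_m$. We remark that the $x_j$ here are not deterministic but depend on the column vectors $c_1, \dots, c_n$.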

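As $Q$ is linear, and as $\|x'\|_2 \le \sqrt{m/(n-1)}$, we have

$$\|Qx'\|_2 \le \sup_{y \in \mathbb F^{m-1}, \|y\|_2 = 1}\|Qy\|_2\,\sqrt{\frac{m}{n-1}}.$$

Thus

$$|x_1|^2\|P_Hc_1\|_2^2 \le \sup_{y \in \mathbb F^{m-1}, \|y\|_2 = 1}\|Qy\|_2^2\,\frac{m}{n-1}. \tag{16}$$

We are going to estimate the operator norm $\|Q\|_2 = \sup_{\|y\|_2 = 1}\|Qy\|_2$ based on the randomness of $c_2, \dots, c_m$.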

###### Lemma 3.2.

There exists a sufficiently large constant $C$ such that

$$\mathbb P_{c_2, \dots, c_m}\left(\|Q\|_2^2 \ge Cm\right) = O(\exp(-2(m-1))).$$

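Assume Lemma 3.2 for the moment; we can then complete the proof of (13) as follows. First, by Lemma 2.1, $\|P_Hc_1\|_2 \ge \sqrt{m-1}/2$ with probability at least $1 - \exp\left(-\frac{m-1}{4K_0^4}\right)$. We then deduce from (16) and Lemma 3.2 that

$$\mathbb P\left(|x_1|^2 \gg \frac mn\right) \le O\left(nm\exp\left(-\frac{m-1}{4K_0^4}\right) + \exp(-2(m-1))\right),$$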

completing the proof.

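To prove Lemma 3.2, we first estimate $\|Qy\|_2$ for any fixed $y$. We will show the following.

###### Lemma 3.3.

There exists a sufficiently large constant $C$ such that for any fixed $y \in \mathbb F^{m-1}$ with $\|y\|_2 = 1$,

$$\mathbb P_{c_2, \dots, c_m}\left(\|Qy\|_2^2 \ge Cm\right) = O(\exp(-4(m-1))).$$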

The deduction of Lemma 3.2 from Lemma 3.3 is standard; we present it here for the sake of completeness.

###### Proof.

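(of Lemma 3.2) Let $\mathcal N$ be a $1/2$-net for the set of unit vectors in $\mathbb F^{m-1}$. As is well known, one can assume that $|\mathcal N| \le \exp(2(m-1))$. Applying Lemma 3.3,

$$\mathbb P\left(\exists y \in \mathcal N: \|Qy\|_2^2 \ge Cm\right) = O\left(|\mathcal N|\exp(-4(m-1))\right) = O(\exp(-2(m-1))).$$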

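Now for any unit vector $y'$, there exists $y \in \mathcal N$ such that $\|y - y'\|_2 \le 1/2$, and thus by the triangle inequality

$$\|Qy'\|_2 \le \|Qy\|_2 + \|Q(y - y')\|_2 \le \|Qy\|_2 + \|Q\|_2/2.$$

This implies that $\|Q\|_2 \le \sup_{y \in \mathcal N}\|Qy\|_2 + \|Q\|_2/2$, and hence

$$\|Q\|_2 \le 2\sup_{y \in \mathcal N}\|Qy\|_2.$$

Consequently, $\|Q\|_2^2 \ge 4Cm$ forces $\|Qy\|_2^2 \ge Cm$ for some $y \in \mathcal N$, and Lemma 3.2 follows after replacing $C$ by $4C$.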

###### Proof.

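(of Lemma 3.3) Let $X \in \mathbb F^{(m-1)(n-1)}$ be the concatenation of $c_2, \dots, c_m$. Then $\|Qy\|_2^2$ can be written as a bilinear form $S = X^*PX$, where $P = (\bar yy^T) \otimes P_H$ is the tensor product of $\bar yy^T$ and $P_H$. By construction, $P$ consists of $(m-1)^2$ blocks, where the $(i,j)$-th block is the matrix $\bar y_iy_jP_H$. It thus follows that

$$\|P\|_2 = \|y\|_2^2 = 1.$$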

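Applying Lemma 2.2 to $S$ (note that $\mathbb E S = \operatorname{tr}P$), we have

$$\mathbb P(|S - \operatorname{tr}P| \ge t) \le O\left(\exp\left(-\frac{ct^2}{K_0^4\|P\|_{HS}^2}\right) + \exp\left(-\frac{ct}{K_0^2\|P\|_2}\right)\right).$$

Since $\operatorname{tr}P_H = \operatorname{rank}P_H = m-1$, we have

$$\operatorname{tr}P = (m-1)\sum_{j=1}^{m-1}|y_j|^2 = m-1.$$

Taking $t = \alpha(m-1)$ with a sufficiently large constant $\alpha$, we obtain

$$\mathbb P\left(S \ge (\alpha+1)(m-1)\right) \le O\left(\exp\left(-\frac{16(m-1)^2}{\|P\|_{HS}^2}\right) + \exp(-4(m-1))\right).$$

Finally, by properties of the tensor product,

$$\|P\|_{HS}^2 = \|\bar yy^T\|_{HS}^2\,\|P_H\|_{HS}^2 = m-1,$$

which implies that

$$\mathbb P\left(S \ge (\alpha+1)(m-1)\right) = O(\exp(-4(m-1))). \tag{17}$$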
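The facts about $P$ used above, namely $\operatorname{tr}P = m-1$, $\|P\|_{HS}^2 = m-1$ and $\|P\|_2 = 1$, together with the identity between the quadratic form in $X$ and $\|Qy\|_2^2$, can be checked numerically with `numpy.kron`. The sketch below is restricted to the real case (where $\bar yy^T = yy^T$), uses arbitrary small sizes, and writes `N` for $n-1$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 12, 5  # N plays the role of n - 1; y lives in R^{m-1}

# P_H: orthogonal projection of rank m - 1 (U has orthonormal columns).
U, _ = np.linalg.qr(rng.standard_normal((N, m - 1)))
PH = U @ U.T

y = rng.standard_normal(m - 1)
y /= np.linalg.norm(y)  # unit vector y

P = np.kron(np.outer(y, y), PH)  # (i, j)-th block is y_i * y_j * P_H

tr = np.trace(P)                     # = ||y||^2 * tr(P_H) = m - 1
hs2 = np.linalg.norm(P, "fro") ** 2  # = ||y y^T||_HS^2 * ||P_H||_HS^2 = m - 1
op = np.linalg.norm(P, 2)            # = ||y||^2 * ||P_H||_2 = 1

# Columns c_2, ..., c_m stored as rows; X is their concatenation.
cols = rng.choice([-1.0, 1.0], size=(m - 1, N))
X = cols.reshape(-1)
quad = X @ P @ X
direct = np.linalg.norm(PH @ (y @ cols)) ** 2  # ||P_H (y_1 c_2 + ... + y_{m-1} c_m)||^2
print(tr, hs2, op, quad, direct)
```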

We now turn to the eigenvectors.

### 3.4. Proof of Theorem 1.9

We will be working with the perturbed matrix where and . By a standard net argument, it suffices to show the following

###### Theorem 3.5.

For any fixed with , the following holds with overwhelming probability with respect to : if then satisfies (6).

Equivalently, we show that for any unit vector $x$ satisfying the condition of Theorem 3.5,

$$\mathbb P\left(|x_1| \ll \sqrt{\frac mn}\right) = 1 - O\left(n\exp\left(-\frac mC\right)\right). \tag{18}$$

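We will proceed as in Subsection 3.1 by assuming that $|x_2|^2 + \dots + |x_m|^2 \le \frac{m}{n-1}$, where instead of (14) we have

$$x_1c_1 + x_2c_2 + \dots + x_mc_m + \sum_{i \ge m+1} x_ic_i = r \tag{19}$$

for some vector $r$ with norm $\|r\|_2 \le n^{-1}$, where $c_i$ is the $i$-th column of the perturbed matrix.

Projecting onto the orthogonal complement of $H$, we obtain

$$|x_1|^2\|P_Hc_1\|_2^2 \le 2\left\|\sum_{j=2}^m x_jP_Hc_j\right\|_2^2 + 2\|r\|_2^2 \le 2\sum_{2 \le j_1, j_2 \le m} \bar x_{j_1}x_{j_2}\,c_{j_1}^*P_Hc_{j_2} + \frac{2}{n^2}.$$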

Note that here as , Lemma 2.1 is still effective, which yields with probability at least .

To estimate the right hand side, set . Similarly to Lemma 3.2, we will establish

###### Lemma 3.6.

There exists a sufficiently large constant $C$ such that

$$\mathbb P_{c_2, \dots, c_m}\left(\|Q\|_2^2 \ge Cm\right) = O(\exp(-2(m-1))).$$

It is clear that (18) follows from Lemma 3.6. Furthermore, as in the previous subsection, it suffices to prove the following analog of Lemma 3.3 for any fixed $y$.

###### Lemma 3.7.

There exists a sufficiently large constant $C$ such that for any fixed $y \in \mathbb F^{m-1}$ with $\|y\|_2 = 1$,

$$\mathbb P_{c_2, \dots, c_m}\left(\|Qy\|_2^2 \ge Cm\right) = O(\exp(-4(m-1))).$$

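It remains to prove Lemma 3.7. Write $c_j = c'_j + \lambda_0f_j$, where $f_j$ is a deterministic vector with at most one non-zero entry and $c'_j$ is a random vector of iid entries. Thus

$$\begin{aligned}\sum_{1 \le i,j \le m-1} y_i\bar y_j\,c_i^*P_Hc_j &= \sum_{1 \le i,j \le m-1} y_i\bar y_j\,{c'_i}^*P_Hc'_j + \lambda_0\sum_{1 \le i,j \le m-1} y_i\bar y_j\,{c'_i}^*P_Hf_j \\ &\quad + \bar\lambda_0\sum_{1 \le i,j \le m-1} y_i\bar y_j\,f_i^*P_Hc'_j + |\lambda_0|^2\sum_{1 \le i,j \le m-1} y_i\bar y_j\,f_i^*P_Hf_j \\ &:= S + S' + S'' + S'''.\end{aligned}$$

For $S$, arguing as in the proof of Lemma 3.3, we obtain the following analog of (17):

$$\mathbb P\left(S \ge (\alpha+1)(m-1)\right) = O(\exp(-4(m-1))).$$

Next, we have

$$\begin{aligned}|S'| &= \left|\lambda_0\sum_{1 \le i,j \le m-1} y_i\bar y_j\,{c'_i}^*P_Hf_j\right| \\ &= \left|\lambda_0\left(\sum_{1 \le i \le m-1} y_i{c'_i}^*\right)P_H\left(\sum_{1 \le j \le m-1}\bar y_jf_j\right)\right|.\end{aligned}$$

Additionally, as $\|y\|_2 = 1$ and each $f_j$ has at most one non-zero entry, by the properties of $P_H$ the vector $z := P_H\left(\sum_{1 \le j \le m-1}\bar y_jf_j\right)$ has norm at most 1. As such, the sub-gaussian random variable $\left(\sum_{1 \le i \le m-1} y_i{c'_i}^*\right)z$ has variance at most one, and hence

$$\mathbb P\left(\left|\left(\sum_{1 \le i \le m-1} y_i{c'_i}^*\right)z\right| \ge m-1\right) = o(\exp(-4(m-1))).$$

We can argue similarly for $S''$ to obtain the same bound. Finally, notice that

$$|S'''| = |\lambda_0|^2\left\|P_H\left(\sum_{1 \le j \le m-1}\bar y_jf_j\right)\right\|_2^2 \le |\lambda_0|^2.$$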

Putting all the estimates together, we obtain Lemma 3.7 as long as .

## 4. Treatment for the smallest coordinate: proof of (7)

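Let $M$ be the random matrix of size $(n-1) \times (n-1)$ obtained from $A$ by deleting its first column. Setting $x' := (x_2, \dots, x_n)$, we have

$$Ax = x_1c_1 + Mx' = 0.$$

It is known that with probability at least $1 - \exp(-\Omega(n))$ the matrix $M$ is invertible (this also follows from (10) by letting $t \to 0$); in this case, we can write

$$x_1M^{-1}c_1 = -x'.$$

Since

$$|x_1|^2\|M^{-1}c_1\|_2^2 = \|x'\|_2^2 = 1 - |x_1|^2,$$

we obtain

$$|x_1|^2 = \frac{1}{1 + \|M^{-1}c_1\|_2^2} = \frac{1}{1 + \sum_{j=1}^{n-1}\sigma_j^{-2}|c_1^Tu_j|^2},$$

where $\sigma_1 \ge \dots \ge \sigma_{n-1}$ are the singular values of $M$ with corresponding left-singular vectors $u_1, \dots, u_{n-1}$.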
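The identity for $|x_1|^2$ is easy to confirm numerically. The sketch below (NumPy assumed; real Bernoulli entries and the size $n = 200$ are arbitrary choices) computes $x$ from the SVD of $A$ and compares $x_1^2$ with $1/\big(1 + \sum_j \sigma_j^{-2}(c_1^Tu_j)^2\big)$ computed from the SVD of $M$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
A = rng.choice([-1.0, 1.0], size=(n - 1, n))
c1, M = A[:, 0], A[:, 1:]  # first column c_1 and the (n-1) x (n-1) remainder M

# Unit normal vector x of the rows of A: last right-singular vector of A.
x = np.linalg.svd(A)[2][-1]

# M = U diag(sigma) V^T; the columns u_j of U are the left-singular vectors.
U, sigma, _ = np.linalg.svd(M)
formula = 1.0 / (1.0 + np.sum(sigma ** -2.0 * (c1 @ U) ** 2))
print(x[0] ** 2, formula)
```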

We now condition on $M$. By the sub-gaussian property of the entries, one can easily show that there is a constant $C$ such that with overwhelming probability (with respect to $c_1$)

$$|c_1^Tu_1| \le C\log n \wedge \dots \wedge |c_1^Tu_{n-1}| \le C\log n. \tag{20}$$

We will need the following estimate

###### Claim 4.1.

With respect to $M$ we have

$$\mathbb P\left(\sum_{i=1}^{n-1}\sigma_i^{-2} \le n^3\log^8 n\right) \ge 1 - \frac{1}{n\log n}.$$
###### Proof.

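(of Claim 4.1) By (10),

$$\mathbb P\left(\sigma_{n-1}^{-1} \le n^{3/2}\log^3 n\right) \ge 1 - \frac{1}{n\log^3 n}.$$

Since $\sigma_{n-i} \ge \sigma_{n-1}$ for all $i \ge 1$, it follows that

$$\mathbb P\left(\sum_{i=1}^{\log^2 n}\sigma_{n-i}^{-2} \le n^3\log^8 n\right) \ge 1 - \frac{1}{n\log n}.$$

For the remaining sum $\sum_{j=1}^{n-\log^2 n-1}\sigma_j^{-2}$, by the Cauchy interlacing law,

$$\sum_{j=1}^{n-\log^2 n-1}\sigma_j^{-2}(M) \le \sum_{j=1}^{n-\log^2 n-1}\sigma_j^{-2}(M'),$$

where $M'$ is obtained from $M$ by deleting its first $\log^2 n$ columns.

On the other hand, by the negative second moment identity (see [30, Lemma A.4]),

$$\sum_{j=1}^{n-\log^2 n-1}\sigma_j^{-2}(M') = \sum_{j=1}^{n-\log^2 n-1}d_j^{-2}, \tag{21}$$

where $d_j$ is the distance from the $j$-th column of $M'$ to the subspace spanned by the remaining columns of $M'$. Using Lemma 2.1 and the union bound, we obtain, for some constant $c > 0$ and with overwhelming probability, that $d_j \ge c\log n$ simultaneously for all $j$. This implies that with overwhelming probability with respect to $M$

$$\sum_{j=1}^{n-\log^2 n-1}\sigma_j^{-2}(M) \ll \frac{n}{\log^2 n}.$$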
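The negative second moment identity can also be checked numerically. In the form sketched here it states that for a $k \times N$ matrix $B$ of full rank $k \le N$ with rows $R_1, \dots, R_k$, one has $\sum_j \sigma_j(B)^{-2} = \sum_j \operatorname{dist}(R_j, \operatorname{span}\{R_i : i \ne j\})^{-2}$; for a tall matrix such as $M'$ one applies it to the transpose, that is, to the columns. The code is real-valued and the helper name is hypothetical.

```python
import numpy as np


def negative_second_moment_sides(B):
    """For a full-rank k x N matrix B with k <= N, return the pair
    (sum_j sigma_j(B)^{-2}, sum_j dist(R_j, span of the other rows)^{-2})."""
    k = B.shape[0]
    lhs = np.sum(np.linalg.svd(B, compute_uv=False) ** -2.0)
    rhs = 0.0
    for j in range(k):
        others = np.delete(B, j, axis=0)  # the k - 1 remaining rows
        Q, _ = np.linalg.qr(others.T)     # orthonormal basis of their span
        r = B[j] - Q @ (Q.T @ B[j])       # component of R_j orthogonal to that span
        rhs += 1.0 / np.dot(r, r)
    return lhs, rhs


rng = np.random.default_rng(5)
B = rng.choice([-1.0, 1.0], size=(40, 60))
lhs, rhs = negative_second_moment_sides(B)
print(lhs, rhs)
```

Both sides equal $\operatorname{tr}\big((BB^T)^{-1}\big)$: the $j$-th diagonal entry of $(BB^T)^{-1}$ is exactly $\operatorname{dist}(R_j, \operatorname{span}\{R_i : i \ne j\})^{-2}$.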

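Now by (20) and Claim 4.1, we have

$$\mathbb P\left(|x_1|^2 \gg \frac{1}{n^3\log^{10} n}\right) \ge 1 - \frac{1}{n\log n}.$$

By the union bound, with probability at least $1 - \frac{1}{\log n}$ we have

$$|x_1|^2 \gg \frac{1}{n^3\log^{10} n} \wedge \dots \wedge |x_n|^2 \gg \frac{1}{n^3\log^{10} n},$$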