Joint CLT for top eigenvalues of sample covariance matrices of separable high dimensional long memory processes
Abstract
For , consider the sample covariance matrix
from a data set , where is a matrix having i.i.d. entries with mean zero and variance one, and are deterministic positive semidefinite Hermitian matrices of dimension and , respectively. We assume that is bounded in spectral norm, and is a Toeplitz matrix with its largest eigenvalues diverging to infinity. The matrix can be viewed as a data set of an dimensional long memory stationary process having separable dependence structure.
As and , we establish the asymptotics and the joint CLT for where denotes the th largest eigenvalue of , and is a fixed integer. For the CLT, we first study the case where the entries of are Gaussian, and then we generalize the result to some more generic cases. This result substantially extends our previous result in [27], where we studied , the case where and with having Gaussian entries.
In order to establish this CLT, we are led to study the first order asymptotics of the largest eigenvalues and the associated eigenvectors of deterministic Toeplitz matrices. In particular, we prove multiple spectral gap properties for these largest eigenvalues and a delocalization property for their associated eigenvectors.
1 Introduction
Motivation.
For , let be a matrix having i.i.d. entries with mean zero and variance one. Let be two deterministic positive semidefinite Hermitian matrices of dimension and , respectively. We define the separable empirical covariance matrix by
(1) 
If is the identity, the matrix is the classical sample covariance matrix. Let be a positive definite Toeplitz matrix with covariance kernel function
(2) 
where , and is a slowly varying function at infinity. We say that a real function is slowly varying at infinity, or slowly varying for short, if it is asymptotically positive, and for any ,
Let be a fixed integer. In this article we study the asymptotics and joint CLT of the largest eigenvalues of in the regime and . In the sequel, this regime will be simply denoted as . We assume that the sequence is bounded in spectral norm and that the empirical spectral distributions (ESD) converge weakly to a probability measure different from , where denotes the Dirac measure at . We recall that the ESD of a Hermitian matrix is defined by
where are the eigenvalues of .
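As an illustration of the ESD, the following minimal sketch evaluates its distribution function, assuming the usual convention that the ESD puts equal mass on each eigenvalue. The helper name `esd_cdf` and the example matrix are hypothetical and not taken from the paper.

```python
import numpy as np

def esd_cdf(matrix, x):
    """Fraction of eigenvalues of a Hermitian matrix that are <= x.

    This is the distribution function of the ESD, under the assumption
    that the ESD gives mass 1/N to each of the N eigenvalues.
    """
    eigenvalues = np.linalg.eigvalsh(matrix)  # real eigenvalues, ascending order
    return float(np.mean(eigenvalues <= x))

# Hypothetical 4 x 4 example with eigenvalues 0, 1, 1, 3.
A = np.diag([0.0, 1.0, 1.0, 3.0])
print(esd_cdf(A, 0.5), esd_cdf(A, 1.5), esd_cdf(A, 3.5))
```

With this convention, the ESD differs from the Dirac measure at zero exactly when a non-vanishing fraction of the eigenvalues stays away from zero.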
If we set , then can be viewed as a sample data set of an dimensional long memory stationary process with a separable dependence structure. Recall that a multidimensional process , where , is (second order) stationary if
(3) 
where is a positive definite function from to the set of matrices. Here we say that a matrix-valued function is positive definite if it satisfies , and for any , the matrix is nonnegative definite. Note that the matrix is the covariance matrix of the concatenated vector
If in particular
for some nonnegative definite and some positive definite function , where denotes the Kronecker product, then we say that the process has a separable dependence structure, or simply that the process is separable in covariance. Equivalently, the separability condition can also be stated as , for some nonnegative definite matrix and positive definite function .
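The separability condition can be checked numerically on a small example. The sketch below is only illustrative: the names `A`, `B`, `Z`, `X`, the helper `psd_sqrt`, and the kernel 0.5^|h| are assumptions chosen for the example, and the data matrix is assumed to be of the form A^{1/2} Z B^{1/2} with real entries. It verifies that the covariance of the column-stacked data is the Kronecker product of the temporal and cross-sectional covariances, and that entries factorize accordingly.

```python
import numpy as np

def psd_sqrt(m):
    """Symmetric square root of a real positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

rng = np.random.default_rng(0)
N, n = 3, 4

# Hypothetical cross-sectional covariance A (N x N) and temporal covariance B (n x n).
G = rng.standard_normal((N, N))
A = G @ G.T
idx = np.arange(n)
B = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # Toeplitz, kernel 0.5 ** |h|

# Assumed data model: X = A^{1/2} Z B^{1/2}; column X[:, t] is the observation at time t.
Z = rng.standard_normal((N, n))
X = psd_sqrt(A) @ Z @ psd_sqrt(B)

# Column-stacking identity: vec(X) = M vec(Z) with M = (B^{1/2})^T kron A^{1/2}.
M = np.kron(psd_sqrt(B).T, psd_sqrt(A))
vec_identity_ok = np.allclose(X.flatten(order="F"), M @ Z.flatten(order="F"))

# vec(Z) has identity covariance, so Cov(vec(X)) = M M^T = B kron A.
cov_vec_X = M @ M.T
kron_ok = np.allclose(cov_vec_X, np.kron(B, A))

# Entrywise form: Cov(X[i, s], X[j, t]) = A[i, j] * B[s, t].
i, j, s, t = 0, 2, 1, 3
entry_ok = np.isclose(cov_vec_X[s * N + i, t * N + j], A[i, j] * B[s, t])
print(vec_identity_ok, kron_ok, entry_ok)
```

The check is deterministic once `A` and `B` are fixed: it uses only the linear-algebra identity for the vectorization of a matrix product, not a Monte Carlo estimate of the covariance.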
The classical empirical covariance model has been intensively studied in the last decades. These studies mostly concentrate on the global behavior of the spectrum, including the limiting spectral distribution (LSD) ([26, 36, 20, 39, 34, 33]) and the CLT for linear spectral statistics ([3, 4, 17, 28]), and on the local behavior of individual eigenvalues ([7, 18, 19, 15, 23, 8, 22, 5, 6]). Among these works, we mention that Baik et al. [7], Paul [29] and Bai and Yao [5, 6] studied the CLT of the spiked eigenvalues when all eigenvalues of are one except several that exceed the so-called BBP phase transition threshold, and they established Gaussian-type fluctuations for these spiked eigenvalues of .
Recently, several models of matrices with having a small number of divergent eigenvalues have been considered, in the context of principal component analysis (PCA) [21, 32, 37, 12] and long memory processes [27]. Although the assumptions in these various works differ, many results coincide with the degenerate case of Bai and Yao [6] after normalization (see for example [27]).
The model assumes that the columns of the data matrix are i.i.d. The separable model introduces a special type of correlation between columns, or different weights on columns, achieving a certain balance between generality and simplicity, and it has therefore attracted growing attention. A first result on the model is due to L. Zhang [42] on the LSD of . She proved that if and , then as , the ESD converges weakly to a nonrandom probability measure for which if or , then ; otherwise for each , the Cauchy-Stieltjes transform of , together with two other functions, denoted by and , is the unique solution in the set
to the following system of equations
(4) 
Note that we use the Cauchy-Stieltjes transform, which has the opposite sign to the usual Stieltjes transform. For any probability measure on , its Cauchy-Stieltjes transform is defined as
Later, D. Paul and J.W. Silverstein [30] proved in the case where is diagonal, that almost surely, for large enough , there is no eigenvalue of in any closed interval outside the support of the limiting spectral distribution (LSD). This is an extension of the results of [2] for . R. Couillet and W. Hachem [13] studied the analytical properties of the LSD when and , including the determination of the support of , extending the work of S. Choi and J.W. Silverstein [35] for to the separable model . The CLT for linear spectral statistics has also been studied by Z. Bai et al. [1] and H. Li et al. [24].
As for the extreme eigenvalues of , far less is known compared to the classical model . In [38], F. Yang proved the edge universality under the condition that the densities of the LSD’s have a regular square-root behavior at the rightmost edge (soft edge). With this result, if we find the fluctuations of the largest eigenvalue at a soft edge in the case where the entries are Gaussian, then the fluctuations at a soft edge in general cases will be determined. However, even in the Gaussian case, these fluctuations are still unknown.
The general spiked eigenvalues of have not been studied either. In [41], a very particular case of this problem was touched upon. One can refer to Remark 2.4 for the relation between the relevant results of [41] and ours. This paper does not aim to fill this gap completely, but to take a first step by studying the largest eigenvalues of . From [27] we know that defined by (2) has unbounded spectral norm, and after normalization by its largest eigenvalue, the spectrum of concentrates near zero. Then by [42], we know that almost surely as . In [27] we proved that the distance between the largest two eigenvalues is bounded away from zero (the so-called spectral gap property). So when is the identity, the largest eigenvalue of behaves analogously to a classical spiked model. In this paper, we will prove the multiple spectral gap property, that is, for any fixed integer , the distance between the th and the th largest eigenvalues is bounded away from zero. So we can expect that the largest eigenvalues of are spiked, whether or not is the identity.
Another contribution of this paper is the study of the largest eigenvalues of the Toeplitz matrix defined by (2). In particular, we prove the multiple spectral gap property for its largest eigenvalues. Precisely, in [27] we proved that the top eigenvalues of the normalized Toeplitz matrix converge to the corresponding eigenvalues of a compact operator . In this paper, by proving that all the nonzero eigenvalues of are simple, we prove the multiple spectral gap property. We also study the asymptotic behavior of the eigenvectors associated with these eigenvalues, and prove that they are delocalized, that is, if is a normalized eigenvector of associated with , then there exists a constant independent of such that . These two results are important to our CLT for the largest eigenvalues of , and may be of independent interest.
Introduction to our results.
As in [27], we normalize by its spectral norm and consider . Then we study the asymptotics and fluctuations of largest eigenvalues of .
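To make the normalized model concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the sample covariance matrix is taken to be X X^T / n with X = A^{1/2} Z T^{1/2}, the kernel (1 + |h|)^(-decay) stands in for a long memory covariance kernel, and the names `long_memory_toeplitz`, `psd_sqrt`, `top_eigs_separable`, `A`, `T` are hypothetical. Dividing the eigenvalues by the spectral norm of `T` is equivalent to normalizing `T` first.

```python
import numpy as np

def long_memory_toeplitz(n, decay):
    """Toeplitz matrix with entries (1 + |i - j|) ** (-decay) (illustrative kernel)."""
    idx = np.arange(n)
    return (1.0 + np.abs(idx[:, None] - idx[None, :])) ** (-decay)

def psd_sqrt(m):
    """Symmetric square root of a real positive semidefinite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def top_eigs_separable(A, T, m, rng):
    """Top m eigenvalues of X X^T / n for X = A^{1/2} Z T^{1/2}, Z standard Gaussian."""
    N, n = A.shape[0], T.shape[0]
    Z = rng.standard_normal((N, n))
    X = psd_sqrt(A) @ Z @ psd_sqrt(T)
    S = X @ X.T / n
    return np.linalg.eigvalsh(S)[::-1][:m]  # descending order

rng = np.random.default_rng(0)
N, n, m = 100, 200, 3
A = np.diag(rng.uniform(0.5, 1.5, size=N))   # bounded spectrum, not concentrated at 0
T = long_memory_toeplitz(n, decay=0.5)
t_norm = np.linalg.eigvalsh(T)[-1]           # spectral norm of the positive definite T

top = top_eigs_separable(A, T, m, rng)
print(top / t_norm)                          # top eigenvalues after normalization
```

Repeating the last three lines over many draws of `Z` gives an empirical picture of the fluctuations of the normalized top eigenvalues; the sketch makes no claim about their limiting values.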
We first study the asymptotics of the largest eigenvalues of . In [27], the "no eigenvalues outside the support of the LSD" property and the exact separation property of the classical sample covariance matrix play important roles in the proof. For the separable model, analogous results are still unknown. However, from the proof of Theorem 2.3 in [27], we know that the largest eigenvalues of converge to the corresponding eigenvalues of a compact operator. Thus for any small , there is only a finite number of eigenvalues of outside . This allows us to obtain the following: assume that the entries of have finite fourth moment, then for any , as , in probability,
(5) 
Moreover, if the entries ’s are Gaussian, the above convergence holds almost surely. Note that if is identity, this result is consistent with Proposition 2.1 in [27].
Then we study the fluctuations of the largest eigenvalues . But before this, we will study the largest eigenvalues of and prove a multiple spectral gap property: for any fixed integer , the distance between and is bounded away from zero for large .
On this basis, we study the fluctuations of largest eigenvalues of and consider the Gaussian case first. Let and be the eigenvalues of and respectively. Note that if the entries of are Gaussian, then the eigenvalues of have the same distribution as the eigenvalues of
So we can study the generic model , and it is enough to assume that and are diagonal with diagonal elements and , respectively. We also assume that and are bounded in , and , and for any , the number of eigenvalues of outside is bounded. Moreover we assume that satisfies the multiple spectral gap condition, and assume that the entries of have finite sixth moment. Denote
(6) 
where is the largest solution of the equation
(7) 
We claim that for large enough , the equation (7) admits positive solutions. Also note that under the conditions on and , we have and as . See Remark 2.3 for details. Then, we have
(8) 
where . As a corollary, if the entries of are standard complex Gaussian, or if they are standard real Gaussian and , are also real, then the CLT (8) holds whether or not and are diagonal. Moreover, we have if is real Gaussian, and if is complex Gaussian.
This result is consistent with Theorem 2.2 in [27] because if , then the equation (7) becomes
(9) 
We also note that in this particular case has a closed formula. This is no longer the case for general . However, we can express as a power series in the quantity , which tends to as : Let
then
(10) 
We will also see that if , then , and the above expression of is consistent with (9).
Then we generalize the CLT (8) to some nondiagonal . We preserve the other conditions on , and add the new assumption that the first or the second moment of the ESD of converges to at the rate , i.e.
Intuitively, these two conditions ensure certain concentration of the eigenvalues near zero. Indeed we have assumed that all but a finite number of eigenvalues of are smaller than any , so the eigenvalues of are concentrated near , and asymptotically
Therefore the condition expresses a higher concentration of the eigenvalues near than the condition .
If where is the parameter in the definition (2), the normalized Toeplitz matrix satisfies ; and if , we have . It is also known that is a parameter measuring the decay of the autocovariance function of the process. Note that the value is the threshold between short memory processes and long memory processes. If , then the process has short memory, and is uniformly bounded. In this case we could expect Tracy-Widom fluctuations for the largest eigenvalues of .
We assume that has finite moments of sufficiently high order (the required order may vary depending on the conditions). We still assume that is diagonal. Let be a normalized eigenvector (with ) associated with . Denote by the Lévy-Prokhorov distance. Then, we prove
as , where is defined in (6) and with
We state our result in the form of the Lévy-Prokhorov distance because in general we do not assume the convergence of . But if we have the convergence , then the CLT can be stated in the usual form
Two particular cases are of special interest. First assume that and are both diagonal. From the formula of , we have . Thus
This coincides with (8). Secondly, assume that is diagonal, is real, and as for any , then one can check that
Thus in this case we have
(11) 
From Proposition 2.2 we know that the Toeplitz matrices are in this case. Note that in the literature, are often assumed to be real with variance one, for which (11) is the same as the real Gaussian case; or are assumed to be complex with and , for which (11) is the same as the complex Gaussian case. In this sense the CLT result (11) is universal, although we make no such assumptions. This explains the phenomenon in Simulation 3(b) of [27].
The above result can be applied to for those whose parameter is in . The case where remains unsolved.
Organization.
This paper is organized as follows. In Section 2 we state our main theorems. This section is divided into four parts. In 2.1, we state the results on Toeplitz matrices ; in 2.2, we state the asymptotics of the largest eigenvalues of ; in 2.3, we state the CLT for the largest eigenvalues of in the case where are diagonal, or where are Gaussian; in 2.4, we present some generalizations of the CLT with nondiagonal . The other sections contain the proofs of these results.
Notations.
For a Hermitian operator or matrix , we denote its real eigenvalues by decreasing order as
We also denote the largest eigenvalue of by . For a matrix or a vector , we use to denote the transpose of , and the conjugate transpose of .
The kernel of a linear operator , is denoted by and defined by
The spectrum of is denoted by .
For a matrix , we denote the th row of , and the submatrix of obtained by deleting the th row. Similarly denotes the submatrix of obtained by deleting the th row and the th column. When using these subscripts, we do not indicate the dependence on . By convention, these subscripts have higher priority than the transpose or conjugate transpose; for example, is the conjugate transpose of the submatrix .
We denote the or norm by . For a matrix or a linear operator , the norm of induced by vector norm is denoted by , and we recall that . The or norm will be abbreviated as . We say that a function or a vector is "normalized" or "unit length" when or . When functions or vectors are said to be "orthonormal", they will be implicitly considered as elements of a Hilbert space. The inner product of two elements of a Hilbert space is denoted by .
For two probability measures and on , we denote their Lévy-Prokhorov distance by , which is defined by
(12) 
where is defined by
It is well known that this distance metrizes the weak convergence. For two random variables with distributions , respectively, we sometimes write , or which all mean .
Given , we denote by the integer satisfying . Given two sequences of nonnegative numbers , we denote
The notations and mean and , respectively. If are random variables, the notation means that in probability. The notations and denote convergence in distribution and in probability, respectively. If are measures, we denote with a slight abuse of notation for the weak convergence of to .
Definition 1.
We say that a sequence of events holds with high probability, if ; with low probability, if ; with overwhelming probability, if for any , ; with tiny probability, if for any , .
The cardinality of a set is denoted by . In the proofs we use to denote a constant that may take different values from one place to another.
2 Main theorems
2.1 Spectral properties of Toeplitz matrices
We collect our results on Toeplitz matrices in this section. Let be a Toeplitz matrix defined by
(13) 
with a function in the form
(14) 
where and is a slowly varying function at infinity. Let be the operator defined on by
(15) 
In [27] we established the relation between the eigenvalues of and the eigenvalues of . From the proof of Theorem 2.3 of [27] we know that the operator is compact and positive semidefinite, and that it has infinitely many positive eigenvalues. Moreover, for any , we have
(16) 
Using the min-max formula for the largest eigenvalue and an argument by contradiction, we also proved that is simple, which gave the spectral gap property for the largest two eigenvalues of :
In this paper, using a different method, we will prove that all nonzero eigenvalues of are simple. As a consequence we prove the multiple spectral gap property for any th largest eigenvalue of :
Proposition 2.1.
All nonzero eigenvalues of the operator defined by (15) with are simple, and the associated eigenfunctions are continuous in .
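The multiple spectral gap property can be observed numerically. The sketch below is only an illustration under assumptions: it uses the kernel (1 + |h|)^(-decay) with a constant slowly varying factor as a stand-in for the kernel in (14), and the names `long_memory_toeplitz` and `normalized_top_eigs` are hypothetical. It prints the top eigenvalues of the Toeplitz matrix divided by the largest one for increasing dimensions; if these eigenvalues converge to distinct limits, the ratios should stabilize and stay separated.

```python
import numpy as np

def long_memory_toeplitz(n, decay):
    """Toeplitz matrix with entries (1 + |i - j|) ** (-decay) (illustrative kernel)."""
    idx = np.arange(n)
    return (1.0 + np.abs(idx[:, None] - idx[None, :])) ** (-decay)

def normalized_top_eigs(n, decay, m):
    """Top m eigenvalues of the Toeplitz matrix, divided by its largest eigenvalue."""
    eigs = np.linalg.eigvalsh(long_memory_toeplitz(n, decay))[::-1]  # descending
    return eigs[:m] / eigs[0]

for n in (100, 200, 400):
    # The largest eigenvalue itself grows with n; the ratios are the quantity of interest.
    largest = np.linalg.eigvalsh(long_memory_toeplitz(n, 0.5))[-1]
    print(n, round(float(largest), 3), normalized_top_eigs(n, decay=0.5, m=4))
```

A numerical check of this kind cannot replace the proof, since simplicity of the limiting eigenvalues is a statement about the operator and not about any finite dimension.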
We note that is self-adjoint, so for any nonzero eigenvalue , its algebraic multiplicity equals its geometric multiplicity, which is defined as . For more information about algebraic multiplicity, see [25]. Thus, when we say that a nonzero eigenvalue is simple, we mean that
In the next proposition, we provide a quantitative description of the eigenvectors associated with for any fixed .
Proposition 2.2.
For any , let be the normalized eigenfunction of associated with , and be a normalized eigenvector of associated with . Then, up to a change of sign, we have
(17) 
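The delocalization of the top eigenvectors can be checked numerically in a rough way. The sketch below assumes the illustrative kernel (1 + |h|)^(-decay), and the helper names `long_memory_toeplitz` and `scaled_sup_norm` are hypothetical. For a unit eigenvector of dimension n, the quantity sqrt(n) times its largest absolute entry is always at least 1, and delocalization corresponds to this quantity staying bounded as n grows.

```python
import numpy as np

def long_memory_toeplitz(n, decay):
    """Toeplitz matrix with entries (1 + |i - j|) ** (-decay) (illustrative kernel)."""
    idx = np.arange(n)
    return (1.0 + np.abs(idx[:, None] - idx[None, :])) ** (-decay)

def scaled_sup_norm(n, decay, j):
    """sqrt(n) * max |u_j| for a unit eigenvector u_j of the j-th largest eigenvalue."""
    _, vecs = np.linalg.eigh(long_memory_toeplitz(n, decay))
    u = vecs[:, -j]  # eigh sorts eigenvalues ascending, so column -j is the j-th largest
    return float(np.sqrt(n) * np.max(np.abs(u)))

for n in (100, 200, 400):
    print(n, [round(scaled_sup_norm(n, decay=0.5, j=j), 3) for j in (1, 2, 3)])
```

The absolute value makes the check insensitive to the sign of the eigenvector, which is not determined by the eigenvalue problem.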
From this proposition we deduce the delocalization of eigenvector associated with for any fixed . Indeed by (17), for large enough , we have
and because is continuous on , we have . Thus we conclude that
Let . The following proposition provides the decay of moments of the ESD .
Proposition 2.3.
Let be defined as above and .

If , then
(18) 
If , then
(19)
2.2 Convergence of largest eigenvalues of separable sample covariance matrix
For , let
(20) 
where are respectively and deterministic positive semidefinite Hermitian matrices, and is a matrix having i.i.d. entries . Let
be the eigenvalues of and respectively. Let be a fixed integer. We assume that the following assumptions hold:

The entries satisfy

The spectral norm is bounded in , and the ESD of converges weakly as , to a probability measure .

There exists a decreasing sequence of nonzero positive numbers
converging to such that for any , we have
For further use, we will prove a concentration lemma for the largest eigenvalues of which will assume the following conditions.

The Hermitian matrices and are diagonal:

(Bound condition) There exists a sequence of positive numbers such that almost surely for large enough ,
Note that under 3, we have , and for any ,
Remark 2.1.
We take two examples for which the bound condition 5 holds. The first case is where with some . In this case, we have
where we have assumed that the convergence rate of to is slower than any preassigned rate. Then by the Borel-Cantelli lemma, the bound condition holds.
The second case is where , and does not depend on for any fixed . In other words, are all from an infinite double array . In this case, by the truncation lemma 2.2 of [40], the bound condition holds.
Recall that we use to denote .
Proposition 2.4.
Remark 2.2.
The almost sure convergence under 4 and 5 is in fact a by-product of Lemma 4.1, which is needed in the proof of CLT 2.6. However, this does not allow us to conclude the a.s. convergence when ’s are Gaussian. Indeed, if the entries of are i.i.d. real Gaussian variables, and if or are complex and nondiagonal, then we cannot diagonalize or because real Gaussian vectors are not unitarily invariant. Thus we give an independent proof for the Gaussian case with the help of a Gaussian concentration inequality.
Applying the above generic result to the special case of , and combining with (16), we obtain the following result:
2.3 CLT for largest eigenvalues: Diagonal & Gaussian case
In this section, we assume that , are diagonal, and study the CLT for the largest eigenvalues of . As a corollary, we obtain the result for the Gaussian case.

The sixth moment of the entries is finite:

The largest eigenvalues of satisfy the multiple spectral gap property:
For we define
(21) 
For , let be the largest solution of the equation
(22) 
Remark 2.3.
Note that if not all ’s are , and if , then from the graph of the function , we see that the equation on admits real solutions.