Limiting spectral distribution of sample autocovariance matrices

Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA 94305-4065, USA.
Statistics and Mathematics Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata 700108, India.
Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA.
Received September 2011; revised December 2012.
Abstract

We show that the empirical spectral distribution (ESD) of the sample autocovariance matrix (ACVM) converges as the dimension increases, when the time series is a linear process with reasonable restrictions on the coefficients. The limit does not depend on the distribution of the underlying driving i.i.d. sequence, and its support is unbounded. This limit does not coincide with the spectral distribution of the theoretical ACVM, but it does so if we consider a suitably tapered version of the sample ACVM. For the banded sample ACVM the limit has unbounded support as long as the ratio of the number of non-zero diagonals to the dimension of the matrix remains bounded away from zero. If this ratio tends to zero, then the limit exists and again coincides with the spectral distribution of the theoretical ACVM. Finally, we also study the LSD of a naturally modified version of the ACVM which is not non-negative definite.

Bernoulli 20(3), 2014, 1234–1259. DOI: 10.3150/13-BEJ520.

Running title: Autocovariance matrix.

Anirban Basak (anirbanb@stanford.edu), Arup Bose (bosearu@gmail.com) and Sanchayan Sen (sen@cims.nyu.edu)

Keywords: autocovariance function; autocovariance matrix; banded and tapered autocovariance matrix; linear process; spectral distribution; stationary process; Toeplitz matrix.

1 Introduction

Let $\{X_t,\ t \in \mathbb{Z}\}$ be a stationary process with $E(X_t) = 0$ and $E(X_t^2) < \infty$. The autocovariance function (ACVF) $\gamma(\cdot)$ and the autocovariance matrix (ACVM) $\Sigma_n$ of order $n$ are defined as:

$$\gamma(k) = \operatorname{Cov}(X_0, X_k) = E(X_0 X_k), \qquad k \in \mathbb{Z},$$

and

$$\Sigma_n = \bigl(\bigl(\gamma(i-j)\bigr)\bigr)_{1 \le i,j \le n}.$$

To every ACVF, there corresponds a unique distribution $F$ on $[0, 2\pi]$, called the spectral distribution, which satisfies

$$\gamma(k) = \int_0^{2\pi} e^{ik\lambda}\,\mathrm{d}F(\lambda), \qquad k \in \mathbb{Z}. \qquad (1)$$

We shall assume that

$$\sum_{k=-\infty}^{\infty} |\gamma(k)| < \infty. \qquad (2)$$

Then $F$ has a density $f$, known as the spectral density of $\{X_t\}$ or of $\gamma(\cdot)$, which equals

$$f(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k)\, e^{-ik\lambda}, \qquad \lambda \in [0, 2\pi]. \qquad (3)$$
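To fix ideas, consider the following standard example. For the MA(1) process $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$ with $\{\varepsilon_t\}$ i.i.d., mean zero and variance one (a special case of the linear processes considered below),

$$\gamma(0) = 1 + \theta^2, \qquad \gamma(\pm 1) = \theta, \qquad \gamma(k) = 0 \quad \text{for } |k| \ge 2,$$

so (2) holds trivially and (3) reduces to

$$f(\lambda) = \frac{1}{2\pi}\bigl(1 + \theta^2 + 2\theta\cos\lambda\bigr), \qquad \lambda \in [0, 2\pi].$$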

The non-negative definite estimate of $\Sigma_n$ is the sample ACVM

$$\Gamma_n(X) = \bigl(\bigl(\hat\gamma(i-j)\bigr)\bigr)_{1 \le i,j \le n}, \qquad \text{where } \hat\gamma(k) = \frac{1}{n}\sum_{t=1}^{n-|k|} X_t X_{t+|k|}, \quad |k| < n. \qquad (4)$$
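The following minimal Python sketch (ours, purely illustrative; the function names are not from the paper) computes $\hat\gamma(\cdot)$, assembles $\Gamma_n(X)$ from one observed stretch of the series, and extracts the eigenvalues whose empirical distribution is studied below.

```python
import numpy as np

def sample_acvm(x):
    """Sample ACVM of (4): Toeplitz matrix with entries gamma_hat(i - j)."""
    n = len(x)
    # gamma_hat(k) = (1/n) * sum_{t=1}^{n-|k|} x_t * x_{t+|k|}
    gamma_hat = np.array([x[: n - k] @ x[k:] / n for k in range(n)])
    i, j = np.indices((n, n))
    return gamma_hat[np.abs(i - j)]   # ((gamma_hat(|i - j|)))

def esd_eigenvalues(a):
    """Eigenvalues of a real symmetric matrix; the ESD puts mass 1/n on each."""
    return np.linalg.eigvalsh(a)

rng = np.random.default_rng(0)
n = 500
eps = rng.standard_normal(n)                          # i.i.d. innovations, mean 0, var 1
x = eps + 0.5 * np.concatenate(([0.0], eps[:-1]))     # MA(1): X_t = eps_t + 0.5*eps_{t-1}
eigs = esd_eigenvalues(sample_acvm(x))                # histogram these to see the ESD
```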

The matrix $\Gamma_n(X)$ is a random matrix. The study of the behavior of random matrices, as the dimension goes to $\infty$, has been inspired by both theory and applications. This is done by studying the behavior of their eigenvalues. For instance, a host of results is known for the related sample covariance matrix in the i.i.d. set-up and its variations; results on its spectral distribution, spacings of the eigenvalues, spectral statistics, etc. encompass a rich theory and a variety of applications.

The autocovariances are of course crucial objects in time series analysis. They are used in estimation, prediction, model fitting and white noise tests. Under suitable assumptions on $\{X_t\}$, for every fixed $k$, $\hat\gamma(k) \to \gamma(k)$ almost surely (a.s.). There are also results on the asymptotic distribution of specific functionals of the autocovariances. Recently, there has been growing interest in the matrix $\Gamma_n(X)$ itself. For instance, the largest eigenvalue of $\Gamma_n(X) - \Sigma_n$ does not converge to zero, even under reasonable assumptions (see Wu and Pourahmadi WuPourahmadi (), McMurry and Politis mcmurraypolitis2010 () and Xiao and Wu xiaowu ()).

In this article we study the behavior of $\Gamma_n(X)$, and a few other natural estimators of $\Sigma_n$, as $n \to \infty$, through the behavior of their spectral distributions. We investigate the consistency (in an appropriate sense) of these estimators.

For a real symmetric matrix $A$ of order $n$ with eigenvalues $\lambda_1, \dots, \lambda_n$, the Empirical Spectral Distribution (ESD) of $A$ is defined as

$$F_A(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}(\lambda_i \le x). \qquad (5)$$

If $\{F_{A_n}\}$ converges weakly to $F$, we write $F_{A_n} \Rightarrow F$. For any random variable $Y$ with distribution $F$, $Y$ or $F$ will be called the Limiting Spectral Distribution (or measure) (LSD) of $\{A_n\}$. The entries of $A_n$ are allowed to be random. In that case, the limit is taken to be either in probability or (as in this paper) in the a.s. sense.

Any matrix of the form $\bigl(\bigl(t_{|i-j|}\bigr)\bigr)_{1 \le i,j \le n}$ is a Toeplitz matrix, and hence $\Sigma_n$ and $\Gamma_n(X)$ (with a triangular sequence of entries) are Toeplitz matrices. For symmetric non-random Toeplitz matrices, from Szegö's theory of Toeplitz operators (see Böttcher and Silbermann bottchersilberman ()), we note that if $\sum_k |t_k| < \infty$, then the LSD of $\bigl(\bigl(t_{|i-j|}\bigr)\bigr)$ equals $g(U)$ where $U$ is uniformly distributed on $[0, 2\pi]$ and $g(\lambda) = \sum_{k=-\infty}^{\infty} t_{|k|} e^{ik\lambda}$, $\lambda \in [0, 2\pi]$. In particular, if (2) holds, then the LSD of $\Sigma_n$ equals $2\pi f(U)$ where $f$ is as defined in (3).
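Continuing the MA(1) example above: the LSD of $\Sigma_n$ is then

$$2\pi f(U) = 1 + \theta^2 + 2\theta\cos U, \qquad U \sim \mathrm{Uniform}[0, 2\pi],$$

whose support is the bounded interval $[(1-|\theta|)^2,\ (1+|\theta|)^2]$.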

We call a sequence of estimators of $\Sigma_n$ consistent if its LSD is $2\pi f(U)$ where $U$ is uniformly distributed on $[0, 2\pi]$. We show that $\Gamma_n(X)$ is inconsistent (see Theorem 2.1(c)). We also show that if $\Gamma_n(X)$ is modified by suitable tapering or banding, then the modified estimators are indeed consistent (see Theorem 2.3(b) and (c)). This phenomenon is mainly due to the estimation of a large number of autocovariances by $\Gamma_n(X)$. Such inconsistency of sample covariance matrices has also been observed in the context of high-dimensional multivariate analysis, and is now well understood with the help of results from Random Matrix Theory.

To obtain the convergence of the ESD of such estimators, we impose a reasonable condition on the stationary process $\{X_t\}$; we assume it to be a linear process, that is,

$$X_t = \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j}, \qquad (6)$$

where $\{\theta_j\}$ satisfies a weak condition and $\{\varepsilon_t\}$ is a sequence of independent random variables with appropriate conditions. The simulations of Sen SenA () suggested that the LSD of $\Gamma_n(X)$ exists and is independent of the distribution of $\{\varepsilon_t\}$ as long as they are i.i.d. with mean zero and variance one. Basak Basak () and Sen SenS () initially studied, respectively, the special cases where $\{X_t\}$ is an i.i.d. process or is an MA(1) process.
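This universality is easy to probe numerically. The sketch below (ours; it reuses `sample_acvm` and `esd_eigenvalues` from the earlier snippet) compares the eigenvalues of $\Gamma_n(X)$ for Gaussian and for Rademacher innovations; for large $n$ the two histograms are nearly indistinguishable.

```python
def ma1(eps, theta=0.5):
    """X_t = eps_t + theta * eps_{t-1}, a linear process of order 1."""
    return eps + theta * np.concatenate(([0.0], eps[:-1]))

n = 1000
rng = np.random.default_rng(1)
eigs_gaussian   = esd_eigenvalues(sample_acvm(ma1(rng.standard_normal(n))))
eigs_rademacher = esd_eigenvalues(sample_acvm(ma1(rng.choice([-1.0, 1.0], size=n))))
# Compare, e.g., np.histogram(eigs_gaussian, bins=50) with the Rademacher version.
```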

In Theorem 2.1, we prove that if $\{X_t\}$ satisfies (6) and $\sum_j |\theta_j| < \infty$, then the LSD of $\Gamma_n(X)$ exists, and it is universal when $\{\varepsilon_t\}$ are independent with mean zero and variance 1 and are either uniformly bounded or identically distributed. We further show that the LSD is unbounded when $\theta_j \ge 0$ for all $j$, and thus $\Gamma_n(X)$ is inconsistent, since $2\pi f(U)$ has bounded support.

When $\{X_t\}$ is a finite order process, the limit moments can be written as multinomial type sums of the autocovariances (see (13)). When $\{X_t\}$ is of infinite order, the limit moments are the limits of these sums as the order tends to infinity. Additional properties of the limit moments are available in the companion report Basak, Bose and Sen Basakbosesen ().

Incidentally, $\Gamma_n(X)$ reminds us of the sample covariance matrix in the i.i.d. set-up, whose spectral properties are well known. See Bai Bai99 () for the basic references. In particular, the LSD of the sample covariance matrix (with i.i.d. entries) under suitable conditions is the Marčenko–Pastur law, which is supported on a bounded interval. Thus, the unbounded support of the LSD of $\Gamma_n(X)$ is in sharp contrast.

The proof of Theorem 2.1 is challenging, mainly because of the non-linear dependence of $\hat\gamma(\cdot)$ on $\{\varepsilon_t\}$, and the Toeplitz structure of $\Gamma_n(X)$. Bai and Zhou Baizhou2008 () and Yao Yao2012 () study the LSD of the sample covariance matrix built from i.i.d. high-dimensional vectors with some dependence structure within each vector. They establish the existence of the LSD by using the Stieltjes transform method. Here this approach fails completely due to the strong row-column dependence. In fact, no Stieltjes transform proof is known even for the Toeplitz matrix with i.i.d. input. Moreover, one added advantage in both the above articles is the existence of independent columns, which we lack here, because we have only one sample from the linear process $\{X_t\}$. The methods of Xiao and Wu xiaowu () are also not applicable in our set-up, because they deal only with the maximum eigenvalue of the difference of $\Gamma_n(X)$ and $\Sigma_n$, not the ESD of $\Gamma_n(X)$.

Now consider a sequence of integers $m_n \to \infty$ and a kernel function $K$. Define

$$\hat f_n(\lambda) = \frac{1}{2\pi} \sum_{|k| < n} K\Bigl(\frac{k}{m_n}\Bigr) \hat\gamma(k)\, e^{-ik\lambda} \qquad (7)$$

as the kernel density estimate of $f$. Considering this as a spectral density, the corresponding ACVF is given by (for $|k| < n$):

$$\tilde\gamma(k) = K\Bigl(\frac{k}{m_n}\Bigr)\hat\gamma(k),$$

and is $0$ otherwise. This motivates the consideration of the tapered sample ACVM

$$\Gamma_n^K(X) = \Bigl(\Bigl(K\Bigl(\frac{i-j}{m_n}\Bigr)\hat\gamma(i-j)\Bigr)\Bigr)_{1 \le i,j \le n}. \qquad (8)$$

If $K$ is a non-negative definite function, then $\Gamma_n^K(X)$ is also non-negative definite. Among other results, Xiao and Wu xiaowu () also showed that under a polynomial growth condition on $m_n$ with a suitable exponent and suitable conditions on $K$, the largest eigenvalue of $\Gamma_n^K(X) - \Sigma_n$ tends to zero a.s. Theorem 2.3(c) states that under the minimal condition $m_n \to \infty$, $m_n/n \to 0$, if $K$ is bounded, symmetric and continuous at 0 and $K(0) = 1$, then $\Gamma_n^K(X)$ is consistent. This is a reflection of the fact that the consistency notion of Xiao and Wu xiaowu () in terms of the maximum eigenvalue is stronger than our notion, and hence our consistency holds under a weaker growth condition on $m_n$.

The second approach is to use banding, as in McMurry and Politis mcmurraypolitis2010 (), who used it to develop their bootstrap procedures. We study two such banded matrices. Let $m_n \to \infty$ be such that $m_n/n \to \alpha$. The type I banded sample autocovariance matrix is the same as $\Gamma_n(X)$, except that we substitute $0$ for $\hat\gamma(i-j)$ whenever $|i-j| > m_n$. This is the same as $\Gamma_n^K(X)$ with $K = \mathbb{1}_{[-1,1]}$. The type II banded ACVM is the corresponding principal submatrix of $\Gamma_n(X)$. Theorem 2.3(a) and (b) state our results on these banded ACVMs. In particular, the LSD exists for all $\alpha$ and is unbounded when $\alpha > 0$. When $\alpha = 0$, the LSD is $2\pi f(U)$, and thus these estimators are consistent.
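A sketch of the tapered estimator (8) (ours; the Bartlett kernel is just one convenient non-negative definite choice, and taking $K = \mathbb{1}_{[-1,1]}$ recovers the type I banded matrix). It reuses `x` from the first snippet.

```python
def tapered_acvm(x, m, K):
    """Tapered sample ACVM of (8): entries K((i - j)/m) * gamma_hat(i - j)."""
    n = len(x)
    gamma_hat = np.array([x[: n - k] @ x[k:] / n for k in range(n)])
    i, j = np.indices((n, n))
    lags = np.abs(i - j)
    return K(lags / m) * gamma_hat[lags]

bartlett  = lambda u: np.clip(1.0 - np.abs(u), 0.0, None)  # non-negative definite kernel
indicator = lambda u: (np.abs(u) <= 1.0).astype(float)     # K = 1_[-1,1]: type I banding

m = int(len(x) ** 0.5)             # m_n -> infinity with m_n/n -> 0: the consistent regime
gamma_tapered = tapered_acvm(x, m, bartlett)
gamma_banded  = tapered_acvm(x, m, indicator)
```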

A related matrix, which may be of interest especially to probabilists, is

$$\Gamma_n^*(X) = \bigl(\bigl(\gamma^*(i-j)\bigr)\bigr)_{1 \le i,j \le n}, \qquad \text{where } \gamma^*(k) = \frac{1}{n}\sum_{t=1}^{n} X_t X_{t+|k|}. \qquad (9)$$

$\Gamma_n^*(X)$ does not have a “data” interpretation unless one assumes we have observations $X_1, \dots, X_{2n}$. It is not non-negative definite, and hence many of the techniques applied to $\Gamma_n(X)$ are not available for it. Theorem 2.2 states that its LSD also exists, but under stricter conditions on $\{\varepsilon_t\}$. Its moments dominate those of the LSD of $\Gamma_n(X)$ when $\theta_j \ge 0$ for all $j$ (see Theorem 2.2(c)), even though simulations show that the LSD of $\Gamma_n^*(X)$ has significant positive mass on the negative axis.
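For completeness, a sketch of (9) (ours; reusing `numpy as np` from the earlier snippets), which presupposes that $2n$ observations are available:

```python
def star_acvm(x2n, n):
    """Gamma_n^*(X) of (9): gamma^*(k) = (1/n) * sum_{t=1}^{n} x_t * x_{t+|k|}.
    Requires the longer stretch x_1, ..., x_{2n}; the result is symmetric but
    in general not non-negative definite."""
    gamma_star = np.array([x2n[:n] @ x2n[k : n + k] / n for k in range(n)])
    i, j = np.indices((n, n))
    return gamma_star[np.abs(i - j)]
```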

2 Main results

We shall assume that $\{X_t\}$ is a linear (MA($\infty$)) process

$$X_t = \sum_{j=0}^{\infty} \theta_j \varepsilon_{t-j}, \qquad (10)$$

where $\{\varepsilon_i\}$ is a sequence of independent random variables. A special case of this process is the so-called MA($q$) process, where $\theta_j = 0$ for all $j > q$. We denote this process by $\{X_t^{(q)}\}$.

Note that working with a two-sided moving average entails no difference. The conditions on $\{\varepsilon_i\}$ and on $\{\theta_j\}$ that will be used are:

Assumption 1.

(a) $\{\varepsilon_i\}$ are i.i.d. with $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = 1$.

(b) $\{\varepsilon_i\}$ are independent, uniformly bounded, with $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = 1$.

Assumption 2.

(a) $\theta_j \ge 0$ for all $j$.

(b) $\sum_{j=0}^{\infty} |\theta_j| < \infty$.

The series in (10) converges a.s. under Assumptions 1(a) (or 1(b)) and 2(b). Further, $\{X_t\}$ and $\{X_t^{(q)}\}$ are strongly stationary and ergodic under Assumption 1(a), and weakly (second order) stationary under Assumptions 1(b) and 2(b).

The ACVFs of $\{X_t\}$ and $\{X_t^{(q)}\}$ are given by

$$\gamma(k) = \sum_{j=0}^{\infty} \theta_j \theta_{j+|k|} \quad\text{and}\quad \gamma_q(k) = \mathbb{1}(|k| \le q) \sum_{j=0}^{q-|k|} \theta_j \theta_{j+|k|}. \qquad (11)$$

Let $k_1, k_2, \dots, k_h$ stand for suitable integers and let

$$K_q = \bigl\{(k_1, \dots, k_h) : |k_l| \le q \text{ for all } l\bigr\}, \qquad \pi_q(k_1, \dots, k_h) = \gamma_q(k_1)\cdots\gamma_q(k_h). \qquad (12)$$
Theorem 2.1 (Sample ACVM).

Suppose Assumption 1(a) or 1(b) holds.

(a) Then $F_{\Gamma_n(X^{(q)})} \Rightarrow F_q$ a.s., where $F_q$ is non-random and does not depend on the distribution of $\{\varepsilon_i\}$. Further,

$$\beta_h(F_q) = \sum_{(k_1,\dots,k_h) \in K_q} c(k_1,\dots,k_h)\, \pi_q(k_1,\dots,k_h), \qquad (13)$$

where the $c(k_1,\dots,k_h)$ are universal constants independent of the $\{\theta_j\}$ and the $\{\varepsilon_i\}$. They are defined by a limiting process given in (25) and (39).

(b) Under Assumption 2(b), $F_{\Gamma_n(X)} \Rightarrow F$ a.s., where $F$ is non-random and independent of the distribution of $\{\varepsilon_i\}$. Further, for every fixed $h$, as $q \to \infty$, $\beta_h(F_q) \to \beta_h(F)$.

(c) Under Assumption 2(a), $F_q$ has unbounded support for each $q$. Consequently, if Assumptions 2(a) and 2(b) hold, then $F$ has unbounded support. Therefore $\Gamma_n(X)$ is inconsistent.

Theorem 2.2

Suppose Assumptions 1(b) and 2(b) hold. Then the conclusions of Theorem 2.1 continue to hold for $\Gamma_n^*(X^{(q)})$ and $\Gamma_n^*(X)$, and (13) holds with modified universal constants $c^*(k_1,\dots,k_h)$.

Remark 2.1.

(i) From the proofs, it will follow that the limit moments $\beta_h$ and $\beta_h^*$ of the above LSDs are dominated by the $2h$th moment of a Gaussian variable with mean zero and a suitable variance $\sigma^2$. Hence the limit moments uniquely identify the LSDs.
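To spell out this standard computation: if $\beta_{2h} \le \sigma^{2h}(2h-1)!!$, the $2h$th moment of a mean-zero Gaussian with variance $\sigma^2$, then since $(2h-1)!! \le (2h)^h$,

$$\sum_{h \ge 1} \beta_{2h}^{-1/(2h)} \;\ge\; \sum_{h \ge 1} \frac{1}{\sigma\sqrt{2h}} \;=\; \infty,$$

which is Carleman's condition (condition (C3) in Section 3), and this guarantees that the moments determine the distribution uniquely.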

(ii) All the above LSDs have unbounded support, while $2\pi f(U)$ has support contained in $[0, \sum_k |\gamma(k)|]$. Simulations show that the LSD of $\Gamma_n^*(X)$ has positive mass on the negative real axis.

(iii) Since $\Gamma_n^*(X)$ is not non-negative definite, the proof of Theorem 2.2 for $\Gamma_n^*(X)$ is different from the proof of Theorem 2.1 and needs Assumption 1(b). A detailed discussion of the different assumptions is given in Remark 3.3 at the end of the proofs.

(iv) Unfortunately, the moments of the LSD of $\Gamma_n(X)$ have no easy description. There is no easy description of the constants $c(k_1,\dots,k_h)$ either. To explain briefly the complications involved in providing explicit expressions for these quantities, consider the much simpler random Toeplitz matrix $n^{-1/2}\bigl(\bigl(\varepsilon_{|i-j|}\bigr)\bigr)$, where $\{\varepsilon_i\}$ is i.i.d. with mean zero and variance 1. Bryc, Dembo and Jiang bry () and Hammond and Miller hammil05 () have shown that the LSD exists and is universal. The limit moments are of the form

$$\beta_{2k} = \sum_{w} p(w),$$

where the sum is over the so-called matched words $w$ and, for each $w$, $p(w)$ is given as the volume of a suitable subset of the $(k+1)$-dimensional unit hypercube. These subsets are defined through the intersection of hyperplanes which arise from the Toeplitz structure $(i,j) \mapsto |i-j|$. Thus the value of $p(w)$ can be calculated by performing multiple integration, but it must be done via numerical integration when $k$ becomes large. For more details, see Bose and Sen Bose08 (). For our set-up, the definition of matched words is generalised (it is given in Section 3) and the $p(w)$ are given by more complicated integrals. This is the main reason why the moments of the LSD cannot be obtained in any closed form, even when $\{X_t\}$ is the i.i.d. process.

Bose and Sen Bose08 () considered the Toeplitz matrix $n^{-1/2}\bigl(\bigl(X_{|i-j|}\bigr)\bigr)$ and showed that its LSD exists under suitable conditions. The moments of this LSD can be written in terms of the limit moments for the i.i.d. case and the coefficients $\{\theta_j\}$. This relation is given by (14) in that paper, and it involves a variable $U$ which is uniformly distributed on $[0, 2\pi]$.

Even a relation like (14), relating the i.i.d. process case to the linear process case, eludes us for the autocovariance matrix. This is primarily due to the non-linear dependence of the autocovariances on the driving sequence $\{\varepsilon_t\}$. One of the referees has pointed out that in this context the so-called “diagram formula” (see Arcones Arcones (), Giraitis, Robinson and Surgailis GRS () for details) may be useful, presumably to obtain a formula relating the linear process case to the i.i.d. case.

It is also noteworthy that no limit moment formula or explicit description of the LSD is known for the matrix $n^{-1}TT^{\mathrm{T}}$, where $T$ is the non-symmetric Toeplitz matrix defined using an i.i.d. sequence (see Bose, Gangopadhyay and Sen bosegangosen10 ()).

Theorem 2.3 (Banded and tapered sample ACVM).

Suppose Assumption 1(b) holds.

(a) Let $m_n \to \infty$ with $m_n/n \to \alpha > 0$. Then all the conclusions of Theorem 2.1 hold for the type I and the type II banded sample ACVMs, with modified universal constants in (13), respectively. The same conclusions continue to hold also for the analogously banded versions of $\Gamma_n^*(X)$.

(b) If $m_n \to \infty$, $m_n/n \to 0$, and Assumption 2(b) holds, then the LSDs of the type I and the type II banded sample ACVMs are $2\pi f(U)$.

(a) and (b) remain true for both banded matrices under Assumption 1(a).

(c) Suppose Assumption 2(b) holds. Let $K$ be bounded, symmetric and continuous at $0$, with $K(0) = 1$. Suppose $m_n \to \infty$ is such that $m_n/n \to 0$. Then the LSD of $\Gamma_n^K(X)$ is $2\pi f(U)$.

Remark 2.2.

(i) When $K$ is non-negative definite, Theorem 2.3(c) holds under Assumption 1(a).

(ii) Xiao and Wu xiaowu () show that under a polynomial growth condition on $m_n$ (for a suitable exponent) and other conditions, the maximum eigenvalue of $\Gamma_n^K(X) - \Sigma_n$ tends to zero a.s.

(iii) Each of the LSDs above is identical across the combinations of assumptions considered above. See Basak, Bose and Sen Basakbosesen () for a proof which is based on properties of the limit moments. The LSDs of $\Sigma_n$ are identical for processes with autocovariances $\{\gamma(k)\}$ and $\{(-1)^k\gamma(k)\}$, since the corresponding spectral densities differ only by a shift of the argument. The same is true of all the above LSDs.

3 Proofs

Szegö’s theorem (or its triangular version) for non-random Toeplitz matrices needs summability (or square summability) of the entries, and that is absent (in the a.s. sense) for $\{\hat\gamma(k)\}$. As an answer to a question raised by Bai Bai99 (), Bryc, Dembo and Jiang bry () and Hammond and Miller hammil05 () showed that for the random Toeplitz matrix $n^{-1/2}\bigl(\bigl(\varepsilon_{|i-j|}\bigr)\bigr)$, where $\{\varepsilon_i\}$ is i.i.d. with mean zero and variance 1, the LSD exists and is universal (does not depend on the underlying distribution of $\{\varepsilon_i\}$). Bose and Sen Bose08 () considered the Toeplitz matrix $n^{-1/2}\bigl(\bigl(X_{|i-j|}\bigr)\bigr)$ and showed that its LSD exists under the following condition: $\{X_t\}$ satisfies (6) with $\sum_j |\theta_j| < \infty$; further, $\{\varepsilon_i\}$ are independent with mean zero and variance 1, and are (i) either uniformly bounded or (ii) identically distributed. However, neither of the above results is applicable to $\Gamma_n(X)$, due to the non-linear dependence of $\hat\gamma(k)$ on $\{\varepsilon_i\}$.

Our two main tools will be (i) the moment method to show convergence of distribution and (ii) the bounded Lipschitz metric to reduce the unbounded case to the bounded case and also to prove the results for the infinite order case from the finite order case. Suppose $\{A_n\}$ is a sequence of symmetric random matrices. Let $\beta_h(A_n)$ be the $h$th moment of its ESD. It has the following nice form:

$$\beta_h(A_n) = \frac{1}{n}\operatorname{Tr}(A_n^h) = \frac{1}{n}\sum_{i=1}^{n} \lambda_i(A_n)^h.$$

Then the LSD of $\{A_n\}$ exists a.s. and is uniquely identified by its moments $\{\beta_h\}$ given below, if the following three conditions hold:

(C1) $E[\beta_h(A_n)] \to \beta_h$ for all $h$ (convergence of the average ESD).

(C2) $\sum_{n} E\bigl[\beta_h(A_n) - E\beta_h(A_n)\bigr]^4 < \infty$ for all $h$.

(C3) $\{\beta_h\}$ satisfies Carleman's condition: $\sum_{h} \beta_{2h}^{-1/(2h)} = \infty$.

Let $d_{BL}$ denote the bounded Lipschitz metric on the space of probability measures on $\mathbb{R}$, topologising the weak convergence of probability measures (see Dudley Dudley ()). The following lemma and its proof are given in Bai Bai99 ().

Lemma 1

(a) Suppose $A$ and $B$ are $n \times n$ real symmetric matrices. Then

$$d_{BL}^2\bigl(F_A, F_B\bigr) \le \frac{1}{n}\operatorname{Tr}\bigl((A - B)^2\bigr). \qquad (15)$$

(b) Suppose $A$ and $B$ are $n \times p$ real matrices. Let $S = AA^{\mathrm{T}}$ and $T = BB^{\mathrm{T}}$. Then

$$d_{BL}^4\bigl(F_S, F_T\bigr) \le \frac{2}{n^2}\operatorname{Tr}\bigl(AA^{\mathrm{T}} + BB^{\mathrm{T}}\bigr)\operatorname{Tr}\bigl((A-B)(A-B)^{\mathrm{T}}\bigr). \qquad (16)$$
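As an illustration of how (15) is typically used in this setting (a sketch): taking $A = \Gamma_n(X)$ and $B = \Gamma_n(X^{(q)})$, both Toeplitz, gives

$$d_{BL}^2\bigl(F_{\Gamma_n(X)}, F_{\Gamma_n(X^{(q)})}\bigr) \;\le\; \frac{1}{n}\operatorname{Tr}\bigl(\Gamma_n(X) - \Gamma_n(X^{(q)})\bigr)^2 \;=\; \sum_{|k| < n} \frac{n - |k|}{n}\bigl(\hat\gamma_X(k) - \hat\gamma_{X^{(q)}}(k)\bigr)^2,$$

and the right side is controlled by the tail $\sum_{j > q} |\theta_j|$ of the coefficients, which is small for large $q$ under Assumption 2(b).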

When $m_n \ge n$, the banding has no effect, so without loss of generality for asymptotic purposes we assume that $m_n \le n$. We visualise the full ACVM $\Gamma_n(X)$ as the case with $m_n = n$. When $\{X_t\}$ is a finite order moving average process with bounded $\{\varepsilon_i\}$, we use the method of moments to establish Theorem 2.1(a). The longest and hardest part of the proof is to verify (C1). We first develop a manageable expression for the moments of the ESD and then show that asymptotically only “matched” terms survive. These moments are then written as an iterated sum, where one summation is over finitely many terms (called “words”). Then we verify (C1) by showing that each one of these finitely many terms has a limit. The metric $d_{BL}$ is used to remove the boundedness assumption as well as to deal with the infinite order case. Easy modifications of these arguments yield the existence of the LSD when $\alpha > 0$ in Theorem 2.3(a) and (b). The proof of Theorem 2.2 is a byproduct of the arguments in the proof of Theorem 2.1. However, since the matrix $\Gamma_n^*(X)$ is not non-negative definite, we impose Assumption 1(b). The proof of Theorem 2.1(a) is given in detail. All other proofs are sketched, and details are available in Basak, Bose and Sen Basakbosesen ().

3.1 Proof of Theorem 2.1

The first step is to show that we can, without loss of generality, assume that $\{\varepsilon_i\}$ are uniformly bounded, so that we can use the moment method. For a standard proof of the following lemma, see Basak, Bose and Sen Basakbosesen (). For convenience, we will write $\Gamma_n$ for $\Gamma_n(X)$.

Lemma 2

If for every $\{\varepsilon_i\}$ satisfying Assumption 1(b), $\{F_{\Gamma_n}\}$ has the same LSD a.s., then this LSD continues to hold if $\{\varepsilon_i\}$ satisfies Assumption 1(a).

Thus from now on we assume that Assumption 1(b) holds. Fix any arbitrary positive integer $h$ and consider the $h$th moment. Then

$$\beta_h(\Gamma_n) = \frac{1}{n}\operatorname{Tr}(\Gamma_n^h) = \frac{1}{n}\sum_{1 \le i_1, \dots, i_h \le n} \hat\gamma(i_1 - i_2)\,\hat\gamma(i_2 - i_3)\cdots\hat\gamma(i_h - i_1). \qquad (17)$$

To express the above in a neater and more amenable form, define, for $\mathbf{i} = (i_1, \dots, i_h)$ with the convention $i_{h+1} = i_1$, and $\mathbf{t} = (t_1, \dots, t_h)$ with $1 \le t_j \le n - |i_j - i_{j+1}|$,

$$X_{\mathbf{t},\mathbf{i}} = \prod_{j=1}^{h} X_{t_j} X_{t_j + |i_j - i_{j+1}|}.$$

Then using (17) we can write the so-called trace formula,

$$\beta_h(\Gamma_n) = \frac{1}{n^{h+1}} \sum_{1 \le i_1, \dots, i_h \le n}\; \sum_{\mathbf{t}} X_{\mathbf{t},\mathbf{i}}. \qquad (18)$$
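To fix ideas, for $h = 2$ the trace formula unpacks to

$$\beta_2(\Gamma_n) = \frac{1}{n}\sum_{i_1,i_2=1}^{n} \hat\gamma(i_1 - i_2)^2 = \frac{1}{n^3}\sum_{i_1,i_2=1}^{n}\;\sum_{t_1,t_2=1}^{n-|i_1-i_2|} X_{t_1}X_{t_1+|i_1-i_2|}\,X_{t_2}X_{t_2+|i_1-i_2|},$$

a direct specialisation of (17) and (18) in which the matching analysis of the next subsection can be carried out by hand.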

3.1.1 Matching and negligibility of certain terms

By independence of the $\varepsilon_i$'s, $E(X_{\mathbf{t},\mathbf{i}}) = 0$ if there is at least one factor of the product that has no $\varepsilon$ in common with any other factor. Motivated by this, we introduce a notion of matching and show that certain higher order terms can be asymptotically neglected in (18). We say:

$(\mathbf{t},\mathbf{i})$ is $\varepsilon$-matched (in short, matched) if every factor of $X_{\mathbf{t},\mathbf{i}}$ shares an $\varepsilon$ with some other factor. When $q = 0$, this means that the indices $t_1, t_1+|i_1-i_2|, \dots, t_h, t_h+|i_h-i_1|$ coincide in pairs or larger groups.

$(\mathbf{t},\mathbf{i})$ is minimal matched if there is a partition of $\{1, 2, \dots, h\}$ into pairs,

$$\{1, 2, \dots, h\} = \{j_1, k_1\} \cup \{j_2, k_2\} \cup \cdots \cup \{j_{h/2}, k_{h/2}\}, \qquad (19)$$

such that $j_1 < j_2 < \cdots < j_{h/2}$ are in ascending order and, for each $l$, the $j_l$th and the $k_l$th factors of $X_{\mathbf{t},\mathbf{i}}$ match only with each other.

For example, for $h = 4$, a $(\mathbf{t},\mathbf{i})$ all four of whose factors share a common $\varepsilon$ is matched but not minimal matched, while one whose factors match in the two disjoint pairs $\{1,3\}$ and $\{2,4\}$ is both matched and minimal matched.

Lemma 3

The number of $(\mathbf{t},\mathbf{i})$ which are matched but not minimal matched is $\mathrm{O}(n^{h})$.

Proof.

Consider the graph with vertices $1, 2, \dots, h$. Vertices $j$ and $k$ have an edge if the $j$th and the $k$th factors of $X_{\mathbf{t},\mathbf{i}}$ share an $\varepsilon$. Let $g$ be the number of connected components. Consider a typical matched but not minimal matched $(\mathbf{t},\mathbf{i})$. Let $k_i$ be the number of vertices in the $i$th component. Since $(\mathbf{t},\mathbf{i})$ is matched, $k_i \ge 2$ for all $i$, and $k_i \ge 3$ for at least one $i$ since the matching is not minimal. Hence, $\sum_i k_i = h$. That implies $g \le (h-1)/2$. Also, if $j$ and $k$ are in the same connected component, then $t_k$ is determined by $t_j$ up to finitely many choices. Hence, the number of $\mathbf{t}$'s compatible with any given component structure is $\mathrm{O}(n^{g})$, and the result follows. Now we can rewrite (18) as

$$\beta_h(\Gamma_n) = \frac{1}{n^{h+1}} \Bigl(\sum_{(1)} + \sum_{(2)} + \sum_{(3)}\Bigr) X_{\mathbf{t},\mathbf{i}},$$

where the three summations are over $(\mathbf{t},\mathbf{i})$ such that $(\mathbf{t},\mathbf{i})$ is, respectively, (i) minimal matched, (ii) matched but not minimal matched and (iii) not matched.

By the mean zero assumption, $E\sum_{(3)} X_{\mathbf{t},\mathbf{i}} = 0$. Since the $\varepsilon_i$'s are uniformly bounded, by Lemma 3, $\bigl|\sum_{(2)} X_{\mathbf{t},\mathbf{i}}\bigr| \le C n^{h}$ for some constant $C$. So, provided the limit exists,

$$\lim_n E[\beta_h(\Gamma_n)] = \lim_n \frac{1}{n^{h+1}}\, E\sum_{(1)} X_{\mathbf{t},\mathbf{i}}. \qquad (20)$$

Hence, from now on, our focus will be only on the minimal matched $(\mathbf{t},\mathbf{i})$.

3.1.2 Verification of (C1) for Theorem 2.1(a)

This is the hardest and lengthiest part of the proof. One can give a separate and easier proof for the case $q = 0$. However, the proofs for general $q$ and for $q = \infty$ are developed in parallel, since this helps to relate the limits in the two cases.

Our starting point is equation (20). We first define an equivalence relation on the set of minimal matched $(\mathbf{t},\mathbf{i})$. This yields finitely many equivalence classes. Then we can write the sum in (20) as an iterated sum where the outer sum is over the equivalence classes. Then we show that for every fixed equivalence class, the inner sum has a limit.

To define the equivalence relation, consider the collection of symbols (letters) $\{a, b, c, \dots\}$. Any minimal matched $(\mathbf{t},\mathbf{i})$ induces a partition $\mathcal{P}$ as given in (19). With this $\mathcal{P}$, associate the word $w$ of length $h$ where

$$w[j] = w[k] \quad \text{if and only if $j$ and $k$ belong to the same block of } \mathcal{P}. \qquad (21)$$

As an example, consider $h = 4$ and a minimal matched $(\mathbf{t},\mathbf{i})$ whose first and third factors match each other and whose second and fourth factors match each other. Then the unique partition of $\{1,2,3,4\}$ and the unique word associated with $(\mathbf{t},\mathbf{i})$ are $\{1,3\} \cup \{2,4\}$ and $abab$, respectively.

Note that corresponding to any fixed partition $\mathcal{P}$, there are several $(\mathbf{t},\mathbf{i})$ associated with it, and there are only finitely many words that can arise from it. For example, with $h = 4$, consider the partition $\{1,3\} \cup \{2,4\}$; the nine words corresponding to $\mathcal{P}$ are obtained by letting each of the two blocks carry one of the three admissible letter types.

By a slight abuse of notation, we write $(\mathbf{t},\mathbf{i}) \in \mathcal{P}$ if the partition corresponding to $(\mathbf{t},\mathbf{i})$ is the same as $\mathcal{P}$. We will say that:

$j$ matches with $k$ (say $j \sim k$) iff $j \ne k$ and $w[j] = w[k]$.

$w$ is pair matched if it is induced by a minimal matched $(\mathbf{t},\mathbf{i})$ (so $j$ matches with $k$ iff $\{j, k\}$ is a block of the pair partition).

This induces an equivalence relation on all minimal matched $(\mathbf{t},\mathbf{i})$, and the equivalence classes can be indexed by pair matched words $w$. Given such a $w$, the corresponding equivalence class is given by

$$\Pi(w) = \bigl\{(\mathbf{t},\mathbf{i}) : (\mathbf{t},\mathbf{i}) \text{ induces the word } w\bigr\}. \qquad (22)$$

Then we rewrite (20) as (provided the second limit exists)

$$\lim_n E[\beta_h(\Gamma_n)] = \sum_{w \text{ pair matched}} \lim_n \frac{1}{n^{h+1}}\, E\sum_{(\mathbf{t},\mathbf{i}) \in \Pi(w)} X_{\mathbf{t},\mathbf{i}}. \qquad (23)$$

By using the autocovariance structure, we further simplify the above as follows. Using the definitions of $K_q$ and of $\pi_q$ given in (12) to evaluate the expectations (for any set $S$, $|S|$ denotes the number of elements in $S$), we rewrite (23) as

$$\lim_n E[\beta_h(\Gamma_n)] = \sum_{w \text{ pair matched}} p(w), \qquad (24)$$

provided the following limit exists for every word $w$ of length $h$:

$$p(w) = \lim_n \frac{1}{n^{h+1}}\, E\sum_{(\mathbf{t},\mathbf{i}) \in \Pi(w)} X_{\mathbf{t},\mathbf{i}}. \qquad (25)$$

To show that this limit exists, it is convenient to work with the unrestricted sum defined as

$$p_n(w) = \frac{1}{n^{h+1}}\, E\sum_{(\mathbf{t},\mathbf{i})} X_{\mathbf{t},\mathbf{i}},$$

where the sum is now over all $(\mathbf{t},\mathbf{i})$ inducing the word $w$, without the minimality restriction. By Lemma 3, we have for every $w$, $p_n(w) - n^{-(h+1)} E\sum_{(\mathbf{t},\mathbf{i}) \in \Pi(w)} X_{\mathbf{t},\mathbf{i}} \to 0$. Thus, it is enough to show that $\lim_n p_n(w)$ exists.

For a pair matched $w$, we divide its coordinates according to the position of the matches as follows. For each letter appearing in $w$, let the sets $S_j$ consist of the two positions at which that letter appears. Let $S$ and $S^c$ be defined as

$$S = \{j : w[j] \ne w[k] \text{ for all } k < j\} \quad\text{and}\quad S^c = \{1, \dots, h\} \setminus S.$$

Elements in $S$ are the indices where any matched letter appears for the first time, and these will be called the generating vertices. $S$ has $h/2$ elements, say $j_1 < j_2 < \cdots < j_{h/2}$, and for simplicity we will write $S = \{j_1, \dots, j_{h/2}\}$.

Claim 1

Each element of $S^c$ is a linear expression (say $L_j$) of the generating vertices that are all to the left of the element.

Proof.

Let the constants in the proposed linear expressions be $c_{j,l}$.

(a) For those elements that are generating vertices, we take the constants as $0$ and the linear combination is taken as the identity mapping, so that $L_{j_l} = j_l$ for all $1 \le l \le h/2$.

(b) Using the relations between $\mathbf{t}$ and $\mathbf{i}$ induced by $w$, we can write each element of $S^c$ in terms of the generating vertices to its left, for suitable constants $c_{j,l}$, and define the linear expressions $L_j$ for $j \in S^c$ accordingly.