# On testing for high-dimensional white noise

Zeng Li (zxl278@psu.edu), Clifford Lam (c.lam2@lse.ac.uk), Jianfeng Yao (jeffyao@hku.hk) and Qiwei Yao (q.yao@lse.ac.uk)

Department of Statistics, Pennsylvania State University
Department of Statistics and Actuarial Science, The University of Hong Kong
Department of Statistics, London School of Economics and Political Science

###### Abstract

Testing for white noise is a classical yet important problem in statistics, especially for diagnostic checks in time series modeling and linear regression. For high-dimensional time series, in the sense that the dimension $p$ is large in relation to the sample size $T$, the popular omnibus tests, including the multivariate Hosking and Li-McLeod tests, are extremely conservative, leading to substantial power loss. To develop more relevant tests for high-dimensional cases, we propose a portmanteau-type test statistic which is the sum of squared singular values of the first $q$ lagged sample autocovariance matrices. It therefore encapsulates all the serial correlations (up to the time lag $q$) within and across all component series. Using tools from random matrix theory and assuming that both $p$ and $T$ diverge to infinity, we derive the asymptotic normality of the test statistic under both the null and a specific VMA(1) alternative hypothesis. As the actual implementation of the test requires knowledge of three characteristic constants of the population cross-sectional covariance matrix and the value of the fourth moment of the standardized innovations, nontrivial estimators are proposed for these parameters, and their integration leads to a practically usable test. Extensive simulation confirms the excellent finite-sample performance of the new test, with accurate size and satisfactory power for a large range of finite $(p, T)$ combinations, therefore ensuring wide applicability in practice. In particular, the new tests are consistently superior to the traditional Hosking and Li-McLeod tests.


Li's research was supported by NIDA, NIH grants P50 DA039838, an NSF grant DMS 1512422 and the National Natural Science Foundation of China (NNSFC), 11690015.


AMS subject classifications: Primary 62M10, 62H15; secondary 15A52.

Keywords: large autocovariance matrix, Hosking test, Li-McLeod test, high-dimensional time series, random matrix theory.

## 1 Introduction

Testing for white noise is an important problem in statistics. It is indispensable in diagnostic checking for linear regression and, in particular, for linear time series modeling. The recent surge of interest in modeling high-dimensional time series adds a further challenge: diagnostic checking demands testing for high-dimensional white noise, in the sense that the dimension of the time series is comparable to or even greater than the sample size (i.e., the observed length of the time series). One prominent example showing the need for diagnostic checking in high-dimensional time series concerns the vector autoregressive model, which has a large literature. When the dimension is large, most existing works regularize the fitted models by Lasso (Hsu et al., 2008; Haufe et al., 2009; Shojaie and Michailidis, 2010; Basu and Michailidis, 2015), Dantzig penalization (Han and Liu, 2015), banded autocovariances (Bickel and Gel, 2011), or banded auto-coefficient matrices (Guo et al., 2016). However, none of them has developed any residual-based diagnostic tools. Another popular approach is to represent high-dimensional time series by lower-dimensional factors; see, for example, Stock and Watson (1989, 1998, 1999), Forni et al. (2000, 2005), Bai and Ng (2002), Lam and Yao (2012) and Chang et al. (2015). Again, there is a pertinent need to develop appropriate tools for checking the validity of the fitted factor models through careful examination of the residuals.

There are several well-established white noise tests for univariate time series (Li, 2004). Some of them have been extended to vector time series (Hosking, 1980; Li et al., 1981; Lütkepohl, 2005). However, these methods are designed for cases where the dimension of the time series is small or relatively small compared to the sample size. For the purpose of model diagnostic checking, the so-called omnibus tests, which are designed to detect any form of departure from white noise, are often adopted. The celebrated Box-Pierce portmanteau test and its variations are the most popular omnibus tests. The fact that the Box-Pierce test and its variations are asymptotically distribution-free and $\chi^2$-distributed under the null hypothesis makes them particularly easy to use in practice. However, it is well known in the literature that the slow convergence to their asymptotic null distributions is particularly pronounced in multivariate cases. On the other hand, testing for high-dimensional time series is still in its infancy. To the best of our knowledge, the only available methods are Chang, Yao and Zhou (2017) and Tsay (2017).

To appreciate the challenge in testing for a high-dimensional white noise, we refer to an example reported in Section 3.1 below where, say, we have to check the residuals from a fitted multivariate volatility model for a portfolio containing stocks using their daily returns over a period of one semester. The length of the returns time series is then approximately . Table 1 shows that the two variants of the multivariate portmanteau test, namely the Hosking and Li-McLeod tests, both have actual sizes around 0.1%, instead of the nominal level of 5%. These omnibus tests are thus extremely conservative, and they would not be able to detect a possible misfit of the volatility model.

The above example illustrates the following fact, which is now better understood: many popular tools in multivariate statistics are severely challenged by the emergence of high-dimensional data, and they need to be re-examined or corrected. Recent advances in high-dimensional statistics demonstrate that feasible and high-quality solutions to these challenges can be obtained by exploiting tools from random matrix theory, via a precise spectral analysis of large sample covariance or sample autocovariance matrices. For a review of such progress, we refer to Johnstone (2007), Paul and Aue (2014) and the monograph Yao et al. (2015). In particular, asymptotic results found in this context using random matrix theory exhibit fast convergence rates, and hence provide satisfactory finite-sample approximations for data analysis.

This paper proposes a new method for testing high-dimensional white noise. The test statistic encapsulates the serial correlations within and across all component series. Precisely, the statistic is the sum of the squared singular values of several lagged sample autocovariance matrices. Using random matrix theory, asymptotic normality of the test statistic under the null is established under the Marčenko-Pastur asymptotic regime, where $p$ and $T$ are large and comparable. Next, original methods are proposed for estimating a few parameters in the limiting distribution in order to obtain a fully implementable version of the test. The asymptotic power of the test under a specific alternative, a first-order vector moving average process (VMA(1)), is also derived. Extensive simulation demonstrates excellent behavior of the proposed tests for a wide array of combinations of $(p, T)$, with accurate size and satisfactory power. In this paper, we also explore the reasons why the popular multivariate Hosking and Li-McLeod tests are no longer reliable when the dimension is large in relation to the sample size.

The rest of the paper is organized as follows. Section 2 presents the main contributions of the paper: a new high-dimensional test for white noise is introduced, and its asymptotic distributions under both the null and the VMA(1) alternative hypothesis are established. Section 3 reports extensive Monte-Carlo experiments which assess the finite-sample behavior of the tests. Whenever possible, comparison is made with the popular Hosking and Li-McLeod tests. Numerical evidence also indicates that the new test is more powerful than that of Chang, Yao and Zhou (2017). Section 4 collects all the technical proofs.

## 2 A test for high-dimensional white noise

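Let $x_1, \ldots, x_T$ be observations from a $p$-dimensional complex-valued linear process of the form

$$x_t = \sum_{l \geq 0} A_l z_{t-l},$$

where $A_l$ are coefficient matrices and $\{z_t\}$ is a sequence of $p$-dimensional random vectors such that, if the coordinates of $z_t$ are $\{z_{it}\}$, then the two-dimensional array of variables $\{z_{it}\}_{i,t}$ are i.i.d. satisfying the moment conditions $\mathbb{E} z_{it} = 0$, $\mathbb{E}|z_{it}|^2 = 1$ and $\mathbb{E}|z_{it}|^4 = \nu_4 < \infty$. Hence $\mathbb{E} x_t = 0$, and $\Sigma_\tau = \mathbb{E}(x_t x_{t-\tau}^*)$ depends on $\tau$ only. Note that $\Sigma_0$ is the population covariance matrix of the time series. The goal is to test the null hypothesis

$$H_0:\ x_t = A_0 z_t \tag{2.1}$$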

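where $A_0$ is unknown. This in fact tests independence rather than linear independence (i.e. $\Sigma_\tau = 0$ for all $\tau \neq 0$), which is however common practice in the literature on white noise tests. Throughout the paper, the complex adjoint of a matrix (or vector) $A$ is denoted by $A^*$. For $\tau \geq 0$, let $\hat\Sigma_\tau$ be the lag-$\tau$ sample autocovariance matrix

$$\hat\Sigma_\tau = \frac{1}{T}\sum_{t=1}^{T} x_t x_{t-\tau}^*,$$

where by convention $x_t = x_{T+t}$ when $t \leq 0$. Under the null hypothesis, $\Sigma_\tau = 0$ for $\tau \geq 1$, and a natural test statistic is the sum of squared singular values of the first $q$ lagged sample autocovariance matrices:

$$G_q = \sum_{\tau=1}^{q} \operatorname{Tr}\big(\hat\Sigma_\tau^* \hat\Sigma_\tau\big) = \sum_{\tau=1}^{q} \sum_{j} \alpha_{\tau,j}^2,$$

where $\{\alpha_{\tau,j}\}_j$ are the singular values of $\hat\Sigma_\tau$, and $\operatorname{Tr}(\cdot)$ denotes the trace operation for square matrices. We reject the null hypothesis for large values of $G_q$.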
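The definition of the statistic can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the data matrix is assumed to have shape $(p, T)$ with one observation per column, and indices $t - \tau \leq 0$ are assumed to wrap around to the end of the sample. The sketch also checks that the trace form and the singular-value form agree.

```python
import numpy as np

def lagged_autocov(X, tau):
    """Lag-tau sample autocovariance (1/T) * sum_t x_t x_{t-tau}^*.

    X has shape (p, T); column t is the observation x_t.
    Assumption: indices t - tau <= 0 wrap around (x_t = x_{T+t}).
    """
    T = X.shape[1]
    X_lag = np.roll(X, tau, axis=1)  # column t now holds x_{t-tau}
    return X @ X_lag.conj().T / T

def G_stat(X, q):
    """Sum over tau = 1..q of Tr(S_tau^* S_tau), i.e. squared Frobenius norms."""
    return sum(np.sum(np.abs(lagged_autocov(X, tau)) ** 2)
               for tau in range(1, q + 1))

def G_stat_svd(X, q):
    """Same statistic, computed from the singular values of each S_tau."""
    total = 0.0
    for tau in range(1, q + 1):
        alpha = np.linalg.svd(lagged_autocov(X, tau), compute_uv=False)
        total += np.sum(alpha ** 2)
    return total

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 60))  # toy white noise with p = 20, T = 60
g_trace = G_stat(X, q=3)
g_svd = G_stat_svd(X, q=3)
```

The Frobenius-norm form avoids a singular value decomposition, so it is the cheaper of the two when only $G_q$ is needed.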

Notice that the setting here allows for complex-valued observations: this is important for applications in areas such as signal processing, where signal time series are usually complex-valued. However, for ease of presentation, we mostly focus on the real-valued case in the subsequent sections. Directions on how the tests can be extended to accommodate complex-valued observations are given in Section 2.4.

### 2.1 High dimensional asymptotics

We adopt the so-called Marčenko-Pastur regime for asymptotic analysis, i.e. we assume $c_p = p/T \to c > 0$ when $p, T \to \infty$. This asymptotic framework has been widely employed in the literature on high-dimensional statistics; see Johnstone (2007), Paul and Aue (2014), the monograph Yao et al. (2015) and the references therein. Most of the results in this area concern sample covariance matrices only. However, our test statistic is based on sample autocovariance matrices, which are much less studied; see Liu et al. (2015) and Bhattacharjee and Bose (2016).

As a main contribution of the paper, we characterize the asymptotic distribution of $G_q$ in this high-dimensional setting when the observations are real-valued. We introduce the following limits whenever they exist: for $\ell \geq 1$,

$$s_\ell = \lim_{p\to\infty} \frac{1}{p}\operatorname{Tr}\big(\Sigma_0^\ell\big), \qquad s_{d,\ell} = \lim_{p\to\infty} \frac{1}{p}\operatorname{Tr}\big(D^\ell(\Sigma_0)\big), \tag{2.2}$$

where $D(A)$ denotes the diagonal matrix consisting of the main diagonal elements of $A$ (here the $d$ in the index of $s_{d,\ell}$ is a reminder of this diagonal structure).

###### Theorem 2.1.

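Let $q \geq 1$ be a fixed integer, and assume that the following conditions hold.

1. $\{z_t\}$ is a sequence of real-valued independent random vectors with independent components $z_{it}$ satisfying $\mathbb{E} z_{it} = 0$, $\mathbb{E} z_{it}^2 = 1$ and $\mathbb{E} z_{it}^4 = \nu_4 < \infty$;

2. $\{\Sigma_0\}$ is a sequence of positive semi-definite matrices with bounded spectral norm such that the limits $s_1$, $s_2$ and $s_{d,2}$ exist;

3. (Marčenko-Pastur regime). The dimension $p$ and the sample size $T$ grow to infinity in a related way such that $c_p := p/T \to c > 0$.

Then under the null hypothesis (2.1), the limiting distribution of the test statistic $G_q$ is

$$G_q - qTc_p^2 s_1^2 \ \xrightarrow{d}\ N\big(0, \sigma^2(c)\big), \tag{2.3}$$

where

$$\sigma^2(c) = 2qc^2 s_2^2 + 4q^2c^3(\nu_4 - 3)s_1^2 s_{d,2} + 8q^2c^3 s_1^2 s_2. \tag{2.4}$$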

The proof of this theorem is given in Section 4.

Let $Z_\alpha$ be the upper-$\alpha$ quantile of the standard normal distribution. Based on Theorem 2.1, we obtain a procedure for testing the null hypothesis in (2.1) as follows:

$$\text{Reject } H_0 \ \text{ if } \ G_q - qTc_p^2 s_1^2 > Z_\alpha\,\sigma(c). \tag{2.5}$$
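To make the rejection rule (2.5) concrete, here is a small sketch of the "oracle" version of the test, in which $s_1$, $s_2$, $s_{d,2}$ and $\nu_4$ are treated as known. The function names are hypothetical, and the limit $c$ in (2.4) is replaced by the finite-sample ratio $c_p = p/T$, which is an assumption of this sketch.

```python
import math
from statistics import NormalDist

def sigma2(c, q, s1, s2, sd2, nu4):
    """Asymptotic variance (2.4) of G_q under the null."""
    return (2 * q * c**2 * s2**2
            + 4 * q**2 * c**3 * (nu4 - 3) * s1**2 * sd2
            + 8 * q**2 * c**3 * s1**2 * s2)

def oracle_test(G_q, p, T, q, s1, s2, sd2, nu4, alpha=0.05):
    """Standardized statistic and decision of rule (2.5), known characteristics."""
    c_p = p / T                                 # plug-in for the limit c (assumption)
    centre = q * T * c_p**2 * s1**2             # centering term q T c_p^2 s_1^2
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-alpha normal quantile
    stat = (G_q - centre) / math.sqrt(sigma2(c_p, q, s1, s2, sd2, nu4))
    return stat, stat > z_alpha

# Gaussian innovations (nu4 = 3) with identity covariance: s1 = s2 = sd2 = 1.
var_example = sigma2(0.5, 1, 1.0, 1.0, 1.0, 3.0)
stat, reject = oracle_test(25.0, p=50, T=100, q=1,
                           s1=1.0, s2=1.0, sd2=1.0, nu4=3.0)
```

In this toy call the observed value equals the centering term, so the standardized statistic is zero and the null is not rejected.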

The illustration in Section 3 indicates that the test above is much more powerful than some classical alternatives, especially when the dimension $p$ is growing linearly with the sample size $T$. The power of this test is gained from gathering together the serial correlations from the first $q$ lags within and across all component series; see the definition of $G_q$. Also note that the asymptotic mean of $G_q$ is $qTc_p^2 s_1^2$, which grows linearly with $T$ (and $p$), while its asymptotic variance is a constant. This implies that even for moderately large $T$, a departure from white noise in the first $q$ lags of the autocovariance matrices is likely to result in a large and different mean, which will be many standard deviations away from $qTc_p^2 s_1^2$ since the standard deviation is constant.

However, the test in (2.5) is not yet practically usable, as it depends on (i) three characteristic constants, $s_1$, $s_2$ and $s_{d,2}$, of the (population) cross-sectional covariance matrix $\Sigma_0$ and (ii) the fourth moment $\nu_4$ of the innovations $\{z_{it}\}$. These issues are addressed below.

### 2.2 Estimation of the covariance characteristics s1 and s2

If the cross-sectional covariance matrix $\Sigma_0$ is known, consistent estimates of these characteristics are readily calculated from $\Sigma_0$. By Slutsky's Theorem, these estimates can substitute for the true ones in the asymptotic variance $\sigma^2(c)$ and the centering term $qTc_p^2 s_1^2$. The test (2.5) still applies.

However, the population covariance matrix $\Sigma_0$ is in general unknown, and the situation becomes challenging because estimating a general $\Sigma_0$ is out of reach without specific assumptions on its structure. Fortunately, as observed previously, we only need consistent estimates of the three characteristics. First of all, in the setting of Theorem 2.1 and under the null, it is not difficult to find consistent estimators for these characteristics, and thus a consistent estimator of the limiting variance $\sigma^2(c)$. The situation is much more intricate for the centering term $qTc_p^2 s_1^2$. Suppose $\hat s_1^2$ is a consistent estimator of $s_1^2$. Plugging it into the centering term leads to

$$G_{q,1} := G_q - qTc_p^2 \hat s_1^2 = \big\{G_q - qTc_p^2 s_1^2\big\} + qTc_p^2\big\{s_1^2 - \hat s_1^2\big\}. \tag{2.6}$$

Because of the multiplication by $T$ here, the asymptotic distribution would remain the same only if the estimation error $s_1^2 - \hat s_1^2$ were of order $o_P(1/T)$. This is however not the case: in general the error is exactly of order $O_P(1/T)$ and, after rescaling, converges to a different normal distribution.

Our method is as follows. First we establish the joint asymptotic distribution of $G_q$ and $\hat s_1^2$ for a natural estimator $\hat s_1$. This result extends Theorem 2.1, which addresses the statistic $G_q$ only. Next, the asymptotic null distribution of the "feasible" test statistic $G_{q,1}$ is readily obtained as a simple consequence.

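Precisely, consider the sample covariance matrix $\hat\Sigma_0 = \frac{1}{T}\sum_{t=1}^{T} x_t x_t^*$ and define the natural estimators of $s_1$ and $s_2$ as

$$\hat s_1 = \frac{1}{p}\operatorname{Tr}\big(\hat\Sigma_0\big), \qquad \hat s_2 = \frac{1}{p}\operatorname{Tr}\big(\hat\Sigma_0^2\big).$$
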
###### Theorem 2.2.

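Assume the same conditions as in Theorem 2.1. Then under the null hypothesis (2.1), we have

$$\begin{pmatrix} p(\hat s_1^2 - s_1^2) \\ G_q - qTc_p^2 s_1^2 \end{pmatrix} \xrightarrow{d} N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} 4c(\nu_4-3)s_1^2 s_{d,2} + 8c s_1^2 s_2 & 4qc^2(\nu_4-3)s_1^2 s_{d,2} + 8qc^2 s_1^2 s_2 \\ 4qc^2(\nu_4-3)s_1^2 s_{d,2} + 8qc^2 s_1^2 s_2 & \sigma^2(c) \end{pmatrix} \right),$$

where the variance $\sigma^2(c)$ is given in (2.4).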

The proof of this theorem is relegated to Section 4.

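Applying Theorem 2.2 to the decomposition (2.6), the following proposition establishes the asymptotic null distribution of the feasible statistic $G_{q,1}$. Second-order terms of the mean and variance of $G_{q,1}$ are also provided to improve finite-sample performance.

###### Proposition 2.1.

Assume the same conditions as in Theorem 2.2 and that the observations are real-valued. Then

$$G_{q,1} = G_q - qTc_p^2 \hat s_1^2 \ \xrightarrow{d}\ N\big(0, \xi^2(c)\big), \tag{2.7}$$

where $\xi^2(c) = 2qc^2 s_2^2$. Meanwhile,

$$\begin{aligned}
\mathbb{E}(G_{q,1}) &= -\frac{q}{T^2}\Big(2\operatorname{Tr}(\Sigma_0^2) + (\nu_4-3)\operatorname{Tr}\big(D^2(\Sigma_0)\big)\Big), \\
\operatorname{Var}(G_{q,1}) &= \frac{2q}{T^2}\operatorname{Tr}^2(\Sigma_0^2) + \frac{q}{T^3}\Big(2\operatorname{Tr}(\Sigma_0^2) + (\nu_4-3)\operatorname{Tr}\big(D^2(\Sigma_0)\big)\Big)^2 + o\Big(\frac{1}{T}\Big), \\
\mathbb{E}(\hat s_1) &= \frac{1}{p}\operatorname{Tr}(\Sigma_0), \\
\mathbb{E}(\hat s_2) &= \frac{1}{p}\operatorname{Tr}(\Sigma_0^2) + \frac{1}{pT}\operatorname{Tr}^2(\Sigma_0) + \frac{1}{pT}\Big(\operatorname{Tr}(\Sigma_0^2) + (\nu_4-3)\operatorname{Tr}\big(D^2(\Sigma_0)\big)\Big).
\end{aligned}$$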
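As a check on the limiting variance in (2.7), the decomposition (2.6) can be combined with the covariance matrix of Theorem 2.2. The sketch below writes $V_{11}$ and $V_{12}$ for the entries of that matrix (labels introduced here only for this computation) and uses $qTc_p^2 = qc_p\,p$:

```latex
\begin{aligned}
G_{q,1} &= \big\{G_q - qTc_p^2 s_1^2\big\} - q c_p \, p\big(\hat s_1^2 - s_1^2\big), \\
V_{11} &= 4c(\nu_4-3)s_1^2 s_{d,2} + 8c\, s_1^2 s_2, \qquad V_{12} = qc\,V_{11}, \\
\lim \operatorname{Var}(G_{q,1}) &= \sigma^2(c) + q^2c^2\,V_{11} - 2qc\,V_{12} \\
&= \sigma^2(c) - q^2c^2\,V_{11} \\
&= 2qc^2 s_2^2 .
\end{aligned}
```

The terms involving $\nu_4$ and $s_{d,2}$ cancel in the limit, which is consistent with the first-order feasible test requiring only an estimate of $s_2$.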

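We now seek a consistent estimate for the unknown quantity $s_2$ in the asymptotic variance $\xi^2(c)$. It is well known that almost surely (Bai et al. (2010)),

$$\hat s_1 \to s_1, \qquad \hat s_2 \to s_2 + c s_1^2.$$

Therefore $\tilde s_2 = \hat s_2 - c_p \hat s_1^2$ is a strongly consistent estimator of $s_2$.

In summary, when $\Sigma_0$ is unknown, we obtain a procedure for testing the null hypothesis of white noise (2.1) as follows:

$$\text{Reject } H_0 \ \text{ if } \ G_q - qTc_p^2 \hat s_1^2 > Z_\alpha\,\tilde\xi, \tag{2.8}$$

where $\tilde\xi^2 = 2qc_p^2 \tilde s_2^2$.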
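The feasible procedure (2.8) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function name is hypothetical, the data are stored as a $(p, T)$ array, lagged indices are assumed to wrap around the sample, and the scale is taken as $\tilde\xi = \sqrt{2q}\,c_p\,|\hat s_2 - c_p\hat s_1^2|$, following from the limits $\hat s_1 \to s_1$ and $\hat s_2 \to s_2 + cs_1^2$.

```python
import numpy as np
from statistics import NormalDist

def feasible_test(X, q, alpha=0.05):
    """Feasible white noise test in the spirit of (2.8); X has shape (p, T)."""
    p, T = X.shape
    c_p = p / T
    S0 = X @ X.conj().T / T                   # sample covariance matrix
    s1_hat = np.trace(S0).real / p
    s2_hat = np.trace(S0 @ S0).real / p
    s2_tilde = s2_hat - c_p * s1_hat**2       # corrects the limit s_2 + c s_1^2
    G_q = 0.0
    for tau in range(1, q + 1):
        # circular lag (assumed convention): column t of the roll is x_{t-tau}
        S_tau = X @ np.roll(X, tau, axis=1).conj().T / T
        G_q += np.sum(np.abs(S_tau) ** 2)
    G_q1 = G_q - q * T * c_p**2 * s1_hat**2   # feasible centering
    xi_tilde = np.sqrt(2 * q) * c_p * abs(s2_tilde)
    stat = G_q1 / xi_tilde
    reject = bool(stat > NormalDist().inv_cdf(1 - alpha))
    return {"s1_hat": s1_hat, "s2_tilde": s2_tilde, "stat": stat, "reject": reject}

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 200))           # Gaussian white noise, Sigma_0 = I
res = feasible_test(X, q=1)
```

For this simulated white noise with identity covariance, both `s1_hat` and `s2_tilde` should be close to 1, and the standardized statistic should behave roughly like a standard normal draw.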

### 2.3 Finite sample correction and estimation for non-Gaussian innovations

Although the test procedure (2.8) is already practically usable, it can be further improved by the finite-sample corrections provided in Proposition 2.1, which are especially useful for non-Gaussian populations where $\nu_4 \neq 3$. To this end, it remains to obtain consistent estimates of (i) the covariance characteristic

$$s_{d,2} = \frac{1}{p}\sum_{i=1}^{p} d_i^2 = \frac{1}{p}\operatorname{Tr}\big(D^2(\Sigma_0)\big),$$

where $d_i$ is the $i$th diagonal element of $\Sigma_0$, and (ii) the fourth moment $\nu_4$ of the innovations.

(i) Estimation of $s_{d,2}$.  By its very definition, $d_i$ can be consistently estimated by its sample counterpart

$$\tilde d_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}^2.$$

It follows that a consistent estimator for $s_{d,2}$ is simply $\tilde s_{d,2} = \frac{1}{p}\sum_{i=1}^{p} \tilde d_i^2$.
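The estimator in (i) amounts to averaging the squared sample second moments of the component series. A minimal sketch for real-valued data stored as a $(p, T)$ array (the function name is hypothetical):

```python
import numpy as np

def sd2_tilde(X):
    """(1/p) * sum_i d_i^2, where d_i = (1/T) * sum_t x_{it}^2; X has shape (p, T)."""
    d_tilde = np.mean(X**2, axis=1)   # sample second moment of each component series
    return float(np.mean(d_tilde**2))

# Toy data with p = 2 series and T = 2 observations: d = (1, 4).
X_toy = np.array([[1.0, -1.0],
                  [2.0, -2.0]])
value = sd2_tilde(X_toy)              # (1^2 + 4^2) / 2
```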

(ii) Estimation of $\nu_4$.  This is again a nontrivial problem which, to the best of our knowledge, has not yet been addressed in the literature. In order to eliminate the role of the unknown cross-sectional covariance matrix $\Sigma_0$, we adopt the following splitting strategy: the original data are split into two halves of length $T_1$ and $T_2$, respectively ($T_1 + T_2 = T$). Define the two corresponding sample cross-sectional covariance matrices

$$S_{n,1} = \frac{1}{T_1}\sum_{t=1}^{T_1} x_t x_t^*, \qquad S_{n,2} = \frac{1}{T_2}\sum_{t=1}^{T_2} x_{t+T_1} x_{t+T_1}^*. \tag{2.9}$$

This yields the corresponding $F$-ratio, or Fisher matrix, $F_n = S_{n,1} S_{n,2}^{-1}$. Observe that the eigenvalues of this matrix do not depend on the value of the cross-sectional covariance $\Sigma_0$, so that in what follows we can assume $\Sigma_0 = I_p$.
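The invariance that motivates the splitting strategy can be illustrated numerically. The sketch below assumes the Fisher matrix is formed as $S_{n,1}S_{n,2}^{-1}$ (the order of the product is an assumption of this example), uses a hypothetical function name, and compares the eigenvalues obtained from white noise $z_t$ with those obtained from $A z_t$ for an arbitrary invertible matrix $A$.

```python
import numpy as np

def fisher_eigenvalues(X, T1):
    """Sorted eigenvalues of S1 S2^{-1} for the split (first T1 columns, the rest)."""
    X1, X2 = X[:, :T1], X[:, T1:]
    S1 = X1 @ X1.conj().T / X1.shape[1]
    S2 = X2 @ X2.conj().T / X2.shape[1]
    # S2^{-1} S1 is similar to S1 S2^{-1}, so both have the same eigenvalues.
    ev = np.linalg.eigvals(np.linalg.solve(S2, S1))
    return np.sort(ev.real)

rng = np.random.default_rng(2)
p, T = 5, 100
Z = rng.standard_normal((p, T))
A = np.eye(p) + 0.5 * np.triu(np.ones((p, p)), k=1)  # arbitrary invertible matrix
ev_white = fisher_eigenvalues(Z, T1=50)
ev_mixed = fisher_eigenvalues(A @ Z, T1=50)
```

The two spectra agree up to rounding error. Note that the second half must satisfy $p < T_2$ for $S_{n,2}$ to be invertible.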

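Let $\lambda_1, \ldots, \lambda_p$ be the eigenvalues of the Fisher matrix. Define a family of test functions $f_k$, $k = 1, \ldots, K$, parametrized by some positive constants. For each $k$, we have an eigenvalue statistic of the Fisher matrix

$$X_{T,k} = f_k(\lambda_1) + \cdots + f_k(\lambda_p) - p\int f_k(x)\,dF_{c_{p,1},c_{p,2}}(x),$$

where $c_{p,i} = p/T_i$ ($i = 1, 2$) and $F_{c_{p,1},c_{p,2}}$ is the limiting Wachter distribution with index $(c_{p,1}, c_{p,2})$; see formula (3.1) in Zheng (2012). It is proved on page 452 of that reference that, when $p$, $T_1$ and $T_2$ grow proportionally to infinity,

$$X_{T,k} = u_{T,k} + v_{T,k}\,\nu_4 + \varepsilon_{T,k}, \tag{2.10}$$

where $u_{T,k}$ and $v_{T,k}$ are constants depending on $f_k$, $c_{p,1}$ and $c_{p,2}$, and $\varepsilon_{T,k}$ is a centered and asymptotically Gaussian error. The least squares estimator of $\nu_4$ in the above regression model then leads to a consistent estimate, say $\hat\nu_4$, of the unknown parameter.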
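The least squares step in the regression model (2.10) has a closed form once the constants are available. The sketch below is only an illustration of that step: the function name is hypothetical, and the arrays `u` and `v` hold made-up placeholder values, not the constants derived in Zheng (2012).

```python
import numpy as np

def nu4_least_squares(X_stats, u, v):
    """Least squares estimate of nu4 in X_k = u_k + v_k * nu4 + eps_k."""
    X_stats, u, v = (np.asarray(a, dtype=float) for a in (X_stats, u, v))
    # Minimizing sum_k (X_k - u_k - v_k * nu4)^2 over nu4 gives this ratio.
    return float(np.sum(v * (X_stats - u)) / np.sum(v**2))

# Hypothetical constants and noiseless statistics generated with nu4 = 3.7.
u = np.array([0.2, -0.1, 0.4])
v = np.array([1.0, 0.5, 2.0])
est = nu4_least_squares(u + v * 3.7, u, v)
```

With noiseless inputs the estimate recovers the value used to generate them. Estimates from several random splits of the sample could then be averaged, for instance with `np.mean`.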

Under the null hypothesis, the observations are independent. We may therefore repeat this estimation procedure, say $B$ times, by taking random splits of the initial sample. The final estimate of $\nu_4$ is then taken to be the average of the $B$ estimates.

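Finally, we can implement the following test procedure with finite-sample correction for the null hypothesis of white noise (2.1):

$$\text{Reject } H_0 \ \text{ if } \ G_{q,1}^* = G_q - qTc_p^2 \hat s_1^2 + \frac{1}{T}\cdot qc_p\big(2\tilde s_2 + (\hat\nu_4 - 3)\tilde s_{d,2}\big) > Z_\alpha\,\hat\xi, \tag{2.11}$$

where

$$\hat\xi^2 = 2qc_p^2 \tilde s_2^2 + \frac{1}{T}\cdot qc_p^2\big(2\tilde s_2 + (\hat\nu_4 - 3)\tilde s_{d,2}\big)^2$$

with the above estimator $\hat\nu_4$ for the fourth moment.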
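The corrected procedure (2.11) can be sketched as follows for real-valued data. The function name is hypothetical, the fourth-moment estimate is passed in as an argument rather than computed, and lagged indices are assumed to wrap around the sample.

```python
import numpy as np
from statistics import NormalDist

def corrected_test(X, q, nu4_hat, alpha=0.05):
    """Finite-sample corrected test in the spirit of (2.11); X is real, shape (p, T)."""
    p, T = X.shape
    c_p = p / T
    S0 = X @ X.T / T
    s1_hat = np.trace(S0) / p
    s2_tilde = np.trace(S0 @ S0) / p - c_p * s1_hat**2
    sd2_tilde = np.mean(np.mean(X**2, axis=1) ** 2)
    # circular lags (assumed convention): column t of the roll is x_{t-tau}
    G_q = sum(np.sum((X @ np.roll(X, tau, axis=1).T / T) ** 2)
              for tau in range(1, q + 1))
    bracket = 2 * s2_tilde + (nu4_hat - 3) * sd2_tilde
    G_star = G_q - q * T * c_p**2 * s1_hat**2 + q * c_p * bracket / T
    xi_hat = np.sqrt(2 * q * c_p**2 * s2_tilde**2 + q * c_p**2 * bracket**2 / T)
    stat = G_star / xi_hat
    return float(stat), bool(stat > NormalDist().inv_cdf(1 - alpha))

rng = np.random.default_rng(5)
X = rng.standard_normal((60, 120))   # Gaussian white noise, so nu4 = 3
stat, reject = corrected_test(X, q=2, nu4_hat=3.0)
```

In this Gaussian example the term $(\hat\nu_4 - 3)\tilde s_{d,2}$ vanishes, so the correction reduces to the contribution of $2\tilde s_2$.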

### 2.4 Tests when the observations are complex-valued

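To proceed, we first define $x_t = \Sigma_0^{1/2} z_t$, where $z_t$ is a proper complex random vector and $\Sigma_0^{1/2}$ is Hermitian with $(\Sigma_0^{1/2})^2 = \Sigma_0$ (properness of a complex random vector $z$ means that $\mathbb{E}(zz^T) = 0$). We immediately have

$$0 = \mathbb{E}(z_t z_t^T) = \mathbb{E}(z_{it}^2)\, I_p,$$

so that $\mathbb{E}(z_{it}^2) = 0$ for all $i$ and $t$. It also implies that the real and imaginary parts of $z_{it}$ are uncorrelated and have equal variances. Since $x_t = \Sigma_0^{1/2} z_t$, we have

$$\mathbb{E}(x_t x_t^T) = \mathbb{E}\big(\Sigma_0^{1/2} z_t z_t^T \Sigma_0^{T/2}\big) = 0,$$

so that we are also assuming that an observed vector $x_t$ is proper.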

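From Corollary 4.1, since $\mathbb{E}(z_{it}^2) = 0$ from the properness of $z_t$, the asymptotic variance of $G_q$ is then

$$\operatorname{Var}(G_q) \to qc^2 s_2^2 + 4q^2c^3 s_1^2\big[(\nu_4 - 2)s_{d,2} - s_2' + 2s_{r,2}\big],$$

where $s_2' = \lim_{p\to\infty} \frac{1}{p}\operatorname{Tr}(\Sigma_0\Sigma_0^T)$, $s_{r,2} = \lim_{p\to\infty} \frac{1}{p}\operatorname{Tr}\big(R^2(\Sigma_0)\big)$, with $R(\Sigma_0)$ the matrix of the real parts of all entries in $\Sigma_0$.

Using Lemma A.1, defining $I(\Sigma_0)$ to be the matrix of the imaginary parts of all entries in $\Sigma_0$, we have

$$\begin{aligned}
2\operatorname{Tr}\big(R^2(\Sigma_0)\big) - \operatorname{Tr}(\Sigma_0\Sigma_0^T) &= 2\operatorname{Tr}\big(\Sigma_0 R(\Sigma_0)\big) - \operatorname{Tr}\big(\Sigma_0(R(\Sigma_0) - iI(\Sigma_0))\big) \\
&= \operatorname{Tr}\big(\Sigma_0(R(\Sigma_0) + iI(\Sigma_0))\big) = \operatorname{Tr}(\Sigma_0^2),
\end{aligned}$$

so that $2s_{r,2} - s_2' = s_2$. The asymptotic variance for $G_q$ is then

$$\operatorname{Var}(G_q) \to \sigma^2(c) = qc^2 s_2^2 + 4q^2c^3 s_1^2\big[(\nu_4 - 2)s_{d,2} + s_2\big],$$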

which can be estimated consistently using the estimators suggested in Section 2.2.
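The trace identity used above, $2\operatorname{Tr}(R^2(\Sigma_0)) - \operatorname{Tr}(\Sigma_0\Sigma_0^T) = \operatorname{Tr}(\Sigma_0^2)$ for Hermitian $\Sigma_0$, is easy to verify numerically. The following sketch builds an arbitrary Hermitian positive semi-definite matrix (the construction via a random complex matrix is an assumption of the example) and compares the two sides.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 6
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Sigma0 = B @ B.conj().T            # Hermitian positive semi-definite
R = Sigma0.real                    # matrix of real parts of the entries

# Left side uses the plain transpose (no conjugation), as in the text.
lhs = 2 * np.trace(R @ R) - np.trace(Sigma0 @ Sigma0.T)
rhs = np.trace(Sigma0 @ Sigma0)
```

The identity relies on the real part of a Hermitian matrix being symmetric and the imaginary part being antisymmetric, so that their cross trace vanishes.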

### 2.5 Testing power of Gq,1

In this section, we look into the power function of the tests when an alternative hypothesis is specified. Here we assume that under $H_1$, the observations follow a real-valued $p$-dimensional first-order vector moving average process, VMA(1) in short, of the form

$$H_1:\ x_t = A_0 z_t + A_1 z_{t-1}, \tag{2.12}$$

where $A_0$, $A_1$ are coefficient matrices. We only consider the asymptotic behavior of our test statistics $G_q$ and $G_{q,1}$ when $q = 1$, since higher-order autocorrelations of $x_t$ are null under both $H_0$ and $H_1$.
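A small simulation can illustrate how the feasible statistic with $q = 1$ reacts to a VMA(1) departure from white noise. The coefficient matrices below ($A_0 = I_p$, $A_1 = 0.5\,I_p$), the sample sizes and the function name are illustrative assumptions, and lag-1 indices are assumed to wrap around the sample.

```python
import numpy as np

def g11(X):
    """Feasible statistic with q = 1: G_1 - T c_p^2 s1_hat^2; X is real, shape (p, T)."""
    p, T = X.shape
    c_p = p / T
    s1_hat = np.trace(X @ X.T) / (p * T)
    S1 = X @ np.roll(X, 1, axis=1).T / T    # lag-1 sample autocovariance (circular)
    return float(np.sum(S1**2) - T * c_p**2 * s1_hat**2)

rng = np.random.default_rng(4)
p, T = 50, 100
Z = rng.standard_normal((p, T + 1))         # innovations z_0, ..., z_T
A0, A1 = np.eye(p), 0.5 * np.eye(p)         # illustrative coefficient matrices
X_null = A0 @ Z[:, 1:]                      # white noise: x_t = A0 z_t
X_alt = A0 @ Z[:, 1:] + A1 @ Z[:, :-1]      # VMA(1): x_t = A0 z_t + A1 z_{t-1}
stat_null, stat_alt = g11(X_null), g11(X_alt)
```

Under the null the statistic fluctuates around zero, whereas under this alternative the lag-1 autocovariance $A_1A_0^*$ shifts it upward by an amount of order $p$.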

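Denote

$$\tilde\Sigma_0 = A_0^* A_0, \qquad \tilde\Sigma_1 = A_1^* A_1, \qquad \tilde\Sigma_{01} = A_0^* A_1.$$

With this notation, we characterize the joint limiting distribution of $G_1$ and $\hat s_1^2$ under the VMA(1) alternative (2.12) as follows.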

###### Theorem 2.3.

Assume that

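1. $\{z_t\}$ is a sequence of real-valued independent random vectors with independent components $z_{it}$ satisfying $\mathbb{E} z_{it} = 0$, $\mathbb{E} z_{it}^2 = 1$ and $\mathbb{E} z_{it}^4 = \nu_4 < \infty$;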

2. , and all have bounded spectral norm and for integers , , the limits exist;

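3. (Marčenko-Pastur regime). The dimension $p$ and the sample size $T$ grow to infinity in a related way such that $c_p := p/T \to c > 0$.

Then under the VMA(1) alternative (2.12), the joint limiting distribution of $G_1$ and $Tc_p^2\hat s_1^2$ is

$$\begin{pmatrix} \sigma_G^2 & \sigma_{GS} \\ \sigma_{GS} & \sigma_S^2 \end{pmatrix}^{-1/2} \begin{pmatrix} G_1 - \mu_G \\ Tc_p^2 \hat s_1^2 - \mu_S \end{pmatrix} \xrightarrow{d} N_2(0, I_2),$$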

where

 μG =1TTr2(˜Σ0+˜Σ1)+Tr(˜Σ0˜Σ1)+2TTr2(˜Σ01) +1T[Tr(˜Σ0˜Σ1)+(ν4−3)Tr(D(˜Σ0)D(˜Σ1))], μS =1TTr2(˜Σ0+˜Σ1)+4T2Tr(˜Σ01˜Σ∗01) +1T2[2Tr(˜Σ0+˜Σ1)2+(ν4−3)Tr(D2(˜Σ0+˜Σ1))],
 σ2S= 4T3Tr2(˜Σ0+˜Σ1)[2Tr(˜Σ0+˜Σ1)2+(ν4−3)Tr(D2(˜Σ0+˜Σ1))] +16T3Tr2(˜Σ0+˜Σ1)Tr(˜Σ01˜Σ∗01)+Rn,

and

 σ2G= 4T3Tr2(˜Σ0+˜Σ1)[2Tr(˜Σ0+˜Σ1)2+(ν4−3)Tr(D2(˜Σ0+˜Σ1))] +8T2Tr(˜Σ0+˜Σ1)[2Tr(˜Σ0˜Σ1(˜Σ0+˜Σ1))+(ν4−3)Tr(D(˜Σ0˜Σ1)D(˜Σ0+˜Σ1))] +8T2Tr(˜Σ01˜Σ∗01)Tr(˜Σ20+˜Σ21)+16T2Tr(˜Σ01˜Σ1)Tr(˜Σ01˜Σ0) +16T2Tr(˜Σ01)[Tr(˜Σ20˜Σ∗01)+Tr(˜Σ21˜Σ01)+2Tr(˜Σ1˜Σ01˜Σ0)] +4TTr(˜Σ∗01˜Σ01˜Σ20+˜Σ01˜Σ∗01˜Σ21+2˜Σ∗01˜Σ1˜Σ01˜Σ0) +16T3Tr2(˜Σ0+˜Σ1)Tr(˜Σ01˜Σ∗01)+16T3Tr2(˜Σ01)Tr(˜Σ0+˜Σ1)2 +32T3Tr(˜Σ0+˜Σ1)Tr(˜Σ01)Tr(˜Σ01(˜Σ0+˜Σ1)) +4TTr(˜Σ01˜Σ∗01˜Σ∗01˜Σ01)+12T2Tr2(˜Σ01˜Σ∗01)+16T2Tr(˜Σ01)Tr(˜Σ01˜Σ∗01˜Σ∗01) +16T3Tr2(˜Σ01)[Tr(˜Σ01)2+2Tr(˜Σ01˜Σ∗01)+(ν4−3)Tr(D2(˜Σ01))]+8T2Tr2(˜Σ1˜Σ01) +16T3Tr(˜Σ01)Tr(˜Σ0+˜Σ1)[2Tr(˜Σ01(˜Σ0+˜Σ1))+(ν4−3)Tr(D(˜Σ01)D(˜Σ0+˜Σ1))] +8T2Tr2(˜Σ0˜Σ01)+16T2Tr(˜Σ01)[2Tr(˜Σ0˜Σ1˜Σ01)+(ν4−3)Tr(D(˜Σ0˜Σ1)D(˜Σ01))]+Rn,
 σGS= 4T3Tr2(˜Σ0+˜Σ1)[2Tr(˜Σ0+˜Σ1)2+(ν4−3)Tr(D2(˜Σ0+˜Σ1))] +4T2Tr(˜Σ0+˜Σ1)[2Tr(˜Σ0˜Σ1(˜Σ0+˜Σ1))+(ν4−3)Tr(D(˜Σ0˜Σ1)D(˜Σ0+˜Σ1))] +8T3Tr(˜Σ01)Tr(˜Σ0+˜Σ1)[2Tr(˜Σ01(˜Σ0+˜Σ1))+(ν4−3)Tr(D(˜Σ01)D(˜Σ0+˜Σ1))] +16T3Tr(˜Σ0+˜Σ1)Tr(˜Σ01)Tr(˜Σ01(˜Σ0+˜Σ1))+Rn.

Here the ’s, possibly different, represent remainders which have smaller orders than the other terms listed in , and , respectively.

The proof of this theorem is relegated to Section 4. Similarly, applying Theorem 2.3 to the decomposition (2.6), the following proposition establishes the asymptotic distribution of our test statistic $G_{1,1}$ under the VMA(1) alternative (2.12) when $q = 1$.

###### Proposition 2.2.

Assume the same conditions as in Theorem 2.3. When $q = 1$ and the observations are real-valued, we have

$$\frac{G_{1,1} - \mu_{G_{1,1}}}{\sigma_{G_{1,1}}} \xrightarrow{d} N(0, 1), \tag{2.13}$$

where

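$$\begin{aligned}
\mu_{G_{1,1}} &= \operatorname{Tr}(\tilde\Sigma_0\tilde\Sigma_1) + \frac{2}{T}\operatorname{Tr}^2(\tilde\Sigma_{01}) + \frac{1}{T}\Big[\operatorname{Tr}(\tilde\Sigma_0\tilde\Sigma_1) + (\nu_4-3)\operatorname{Tr}\big(D(\tilde\Sigma_0)D(\tilde\Sigma_1)\big)\Big] \\
&\quad - \frac{4}{T^2}\operatorname{Tr}(\tilde\Sigma_{01}\tilde\Sigma_{01}^*) - \frac{1}{T^2}\Big[2\operatorname{Tr}(\tilde\Sigma_0+\tilde\Sigma_1)^2 + (\nu_4-3)\operatorname{Tr}\big(D^2(\tilde\Sigma_0+\tilde\Sigma_1)\big)\Big],
\end{aligned}$$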
 σ2G1,1 =2T2Tr2(˜Σ20+˜Σ21)+4T[2Tr(˜Σ0˜Σ1)2+(ν4−3)Tr(D2(˜Σ0˜Σ1))] +6T2Tr2(˜Σ0˜Σ1)+8T2Tr(˜Σ01˜Σ∗01)Tr(˜Σ20+˜Σ21)+16T2Tr(˜Σ01˜Σ1)Tr(˜Σ01˜Σ0)