Asymptotic distribution of least square estimators for linear models with dependent errors: regular designs


Emmanuel Caron¹, Sophie Dede²

¹ Ecole Centrale Nantes, Laboratoire de Mathématiques Jean Leray UMR 6629, 1 Rue de la Noë, 44300 Nantes. Email: emmanuel.caron@ec-nantes.fr
² Lycée Stanislas, 22 Rue Notre-Dame-des-Champs, 75006 Paris. Email: dede.sophie@gmail.com
Abstract

In this paper, we consider the usual linear regression model in the case where the error process is assumed strictly stationary. We use a result from Hannan [13], who proved a Central Limit Theorem for the usual least square estimator under general conditions on the design and on the error process. We show that for a large class of designs, the asymptotic covariance matrix is as simple as in the i.i.d. (independent and identically distributed) case. We then estimate the covariance matrix using an estimator of the spectral density whose consistency is proved under very mild conditions. As an application, we show how to modify the usual Fisher tests in this dependent context, in such a way that the type I error rate remains asymptotically correct, and we illustrate the performance of this procedure through different sets of simulations.

1 Introduction

We consider the usual fixed-design linear regression model:

$$Y = X\beta + \varepsilon,$$

where $X$ is the fixed $n \times p$ design matrix, $\beta$ is the $p \times 1$ vector of unknown parameters, and $(\varepsilon_i)_{1 \leq i \leq n}$ is a stationary error process. This model is commonly used in time series regression.

Our work is based on the paper by Hannan [13], who proved a Central Limit Theorem for the usual least square estimator under general conditions on the design and on the error process. Most short-range dependent processes satisfy the conditions on the error process, for instance the class of linear processes with summable coefficients and square integrable innovations, a large class of functions of linear processes, and many processes satisfying various mixing conditions (see for instance [9], and also [6] for the optimality of Hannan's condition).

In this paper, it is shown that for a large class of designs satisfying the conditions of Hannan, the covariance matrix of the limiting distribution of the least square estimator is the same as in the i.i.d. case, up to the usual error variance term, which should be replaced by the covariance series of the error process. We shall refer to this very large class of designs as "regular designs" (see Section 2.3 for the precise definition). It includes many interesting examples, for instance ANOVA-type designs, or designs whose columns are regularly varying (such as polynomial regression designs).

For this class of regular designs, any consistent estimator of the covariance series of $(\varepsilon_i)_{i \in \mathbb{Z}}$ may be used to obtain a Gaussian limiting distribution, with an explicit covariance matrix, for the normalized least square estimator. Doing so, it is then possible to obtain confidence regions and test procedures for the unknown parameter $\beta$. In this paper, assuming only that Hannan's condition on $(\varepsilon_i)_{i \in \mathbb{Z}}$ is satisfied, we propose a consistent estimator of the spectral density of $(\varepsilon_i)_{i \in \mathbb{Z}}$ (as a byproduct, we get an estimator of the covariance series).

Wu and Liu [14] considered the problem of estimating the spectral density for a large class of short-range dependent processes. They proposed a consistent estimator of the spectral density, and gave conditions under which the centered estimator satisfies a Central Limit Theorem. These results are based on the asymptotic theory of stationary processes developed by Wu [23]. This framework makes it possible to deal with most of the statistical procedures from time series, including the estimation of the spectral density. However, the class of processes satisfying Wu's "physical dependence" condition is included in the class of processes satisfying Hannan's condition. In this paper, we prove the consistency of an estimator of the spectral density of the error process under Hannan's condition. Compared to Wu's precise results on the estimation of the spectral density (Central Limit Theorem, rates of convergence, deviation inequalities), our result is only a consistency result, but it holds under Hannan's condition, that is for most short-range dependent processes.

Finally, we use these general results to modify the usual Fisher tests in cases where $(\varepsilon_i)_{i \in \mathbb{Z}}$ and the design satisfy Hannan's conditions, and we perform simulations with different models. For these simulations, we need to choose how many covariance terms have to be estimated. In this paper, this number is chosen by looking only at the graph of the empirical autocovariances of the residuals. Developing a data-driven criterion would be more satisfactory, but this is probably a very difficult question in such a general context; for this reason it is left outside the scope of the present paper.

The paper is organized as follows. In Section 2, we recall Hannan's Central Limit Theorem for the least square estimator, and we define the class of "regular designs" (we also give many examples of such designs). In Section 3, we focus on the estimation of the spectral density of the error process under Hannan's condition. In Section 4, some examples of stationary processes satisfying Hannan's condition are presented. Finally, Section 5 is devoted to the correction of the usual Fisher tests in our dependent context, and to the simulations.

2 Hannan’s theorem and regular design

2.1 Notations and definitions

Let us recall the equation of the linear regression model:

$$Y = X\beta + \varepsilon, \qquad (1)$$

where $X$ is a deterministic design matrix of size $n \times p$ and $\varepsilon$ is an error process defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $X_{.,j}$ be the $j$-th column of the matrix $X$, and $x_{i,j}$ the real number at the row $i$ and the column $j$, where $j$ is in $\{1, \dots, p\}$ and $i$ in $\{1, \dots, n\}$. The random vectors $Y$ and $\varepsilon$ belong to $\mathbb{R}^n$, and $\beta$ is a $p \times 1$ vector of unknown parameters.

Let $\|\cdot\|_2$ be the usual euclidean norm on $\mathbb{R}^n$, and $\|\cdot\|_{\mathbb{L}^p}$ be the $\mathbb{L}^p$-norm on $\Omega$, defined for any random variable $Z$ by $\|Z\|_{\mathbb{L}^p} = \left(\mathbb{E}(|Z|^p)\right)^{1/p}$. We say that $Z$ is in $\mathbb{L}^p(\Omega)$ if $\|Z\|_{\mathbb{L}^p} < \infty$.

The error process $(\varepsilon_i)_{i \in \mathbb{Z}}$ is assumed to be strictly stationary with zero mean. Moreover, for all $i$ in $\mathbb{Z}$, $\varepsilon_i$ is supposed to be in $\mathbb{L}^2(\Omega)$. More precisely, the error process satisfies, for all $i$ in $\mathbb{Z}$:

$$\varepsilon_i = \varepsilon_0 \circ T^i,$$

where $T: \Omega \to \Omega$ is a bijective bimeasurable transformation preserving the probability measure $\mathbb{P}$. Note that any strictly stationary process can be represented in this way.

Let $(\mathcal{F}_i)_{i \in \mathbb{Z}}$ be a non-decreasing filtration built as follows, for all $i$:

$$\mathcal{F}_i = T^{-i}(\mathcal{F}_0),$$

where $\mathcal{F}_0$ is a sub-$\sigma$-algebra of $\mathcal{F}$ such that $\mathcal{F}_0 \subseteq T^{-1}(\mathcal{F}_0)$. For instance, one can choose the past $\sigma$-algebra before time $0$: $\mathcal{F}_0 = \sigma(\varepsilon_k, k \leq 0)$, and then $\mathcal{F}_i = \sigma(\varepsilon_k, k \leq i)$. In that case, $\varepsilon_i$ is $\mathcal{F}_i$-measurable.

As in Hannan, we shall always suppose that $\mathcal{F}_{-\infty} = \bigcap_{i \in \mathbb{Z}} \mathcal{F}_i$ is trivial. Moreover, $\varepsilon_0$ is assumed to be $\mathcal{F}_{\infty}$-measurable, where $\mathcal{F}_{\infty} = \bigvee_{i \in \mathbb{Z}} \mathcal{F}_i$. These assumptions imply that the $\varepsilon_i$'s are all regular random variables in the following sense:

Definition 2.1.1 (Regular random variable).

Let $V$ be a random variable in $\mathbb{L}^1(\Omega)$. We say that $V$ is regular with respect to the filtration $(\mathcal{F}_i)_{i \in \mathbb{Z}}$ if $\mathbb{E}(V \mid \mathcal{F}_{-\infty}) = \mathbb{E}(V)$ almost surely and if $V$ is $\mathcal{F}_{\infty}$-measurable.

This implies that there exists a spectral density $f$ for the error process, defined on $[-\pi, \pi]$. The autocovariance function $\gamma$ of the process then satisfies:

$$\gamma(k) = \mathrm{Cov}(\varepsilon_m, \varepsilon_{m+k}) = \int_{-\pi}^{\pi} e^{ik\lambda} f(\lambda) \, d\lambda.$$

2.2 Hannan’s Central Limit Theorem

Let $\hat{\beta} = (X^t X)^{-1} X^t Y$ be the usual least square estimator for the unknown vector $\beta$. Hannan [13] has shown a Central Limit Theorem for $\hat{\beta}$ when the error process is stationary. In this section, the conditions for applying this theorem are recalled.

Let $(P_j)_{j \in \mathbb{Z}}$ be the family of projection operators, defined for all $j$ in $\mathbb{Z}$ and for any $Z$ in $\mathbb{L}^2(\Omega)$ by:

$$P_j(Z) = \mathbb{E}(Z \mid \mathcal{F}_j) - \mathbb{E}(Z \mid \mathcal{F}_{j-1}).$$

We shall always assume that Hannan's condition on the error process is satisfied:

$$\sum_{i \in \mathbb{Z}} \|P_0(\varepsilon_i)\|_{\mathbb{L}^2} < +\infty. \qquad \text{(C1)}$$

Note that this condition implies that:

$$\sum_{k \in \mathbb{Z}} |\gamma(k)| < +\infty \qquad (2)$$

(see for instance [9]).
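For instance, if $\varepsilon_i = \sum_{j \geq 0} a_j \eta_{i-j}$ is a causal linear process, with $(\eta_i)_{i \in \mathbb{Z}}$ i.i.d., centered and in $\mathbb{L}^2$, and if $\mathcal{F}_i = \sigma(\eta_k, k \leq i)$, then $P_0(\varepsilon_i) = a_i \eta_0$ for $i \geq 0$ and $P_0(\varepsilon_i) = 0$ for $i < 0$. Hence $\sum_{i \in \mathbb{Z}} \|P_0(\varepsilon_i)\|_{\mathbb{L}^2} = \|\eta_0\|_{\mathbb{L}^2} \sum_{i \geq 0} |a_i|$, and (C1) reduces to the summability of the coefficients $(a_i)_{i \geq 0}$.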

Hannan's condition provides a very general framework for stationary processes. The hypothesis (C1) is a sharp condition for the Central Limit Theorem to hold for the partial sum sequence (see the paper by Dedecker, Merlevède and Volný [9] for more details). Notice that the condition (2) implies that the error process is short-range dependent. However, Hannan's condition is satisfied for most short-range dependent stationary processes. In particular, it is less restrictive than the well-known condition of Gordin [11]. Moreover, the property of $2$-strong stability introduced by Wu [22] is more restrictive than Hannan's condition. This property of strong stability will be recalled in Section 4.2, where large classes of examples are fully described.

Let us now recall Hannan's assumptions on the design. Let us introduce:

$$d_j(n) = \|X_{.,j}\|_2 = \left(\sum_{i=1}^{n} x_{i,j}^2\right)^{1/2}, \qquad (3)$$

and let $D(n)$ be the $p \times p$ diagonal matrix with diagonal term $d_j(n)$ for $j$ in $\{1, \dots, p\}$.

Following Hannan, we also require that the columns of the design $X$ satisfy the following conditions, for all $j, l$ in $\{1, \dots, p\}$:

$$d_j(n) \xrightarrow[n \to \infty]{} \infty, \qquad \text{(C2)}$$

and:

$$\lim_{n \to \infty} \sup_{1 \leq i \leq n} \frac{|x_{i,j}|}{d_j(n)} = 0. \qquad \text{(C3)}$$

Moreover, we assume that the following limits exist:

$$\rho_{j,l}(k) = \lim_{n \to \infty} \sum_{m=1}^{n-k} \frac{x_{m,j} \, x_{m+k,l}}{d_j(n) d_l(n)}, \qquad k \geq 0. \qquad \text{(C4)}$$

Notice that there is a misprint in Hannan's paper (the supremum is missing in condition (C3)). Note that Conditions (C2) and (C3) correspond to the usual Lindeberg conditions for linear statistics in the i.i.d. case. In the dependent case, we also need Condition (C4).

The $p \times p$ matrix formed by the coefficients $\rho_{j,l}(k)$ is called $R(k)$:

$$R(k) = [\rho_{j,l}(k)] = \int_{-\pi}^{\pi} e^{ik\lambda} F_X(d\lambda), \qquad (4)$$

where $F_X$ is the spectral measure associated with the sequence of matrices $(R(k))_{k \geq 0}$. The matrix $R(0)$ is supposed to be positive definite:

$$R(0) > 0. \qquad \text{(C5)}$$

Let then $F$ and $G$ be the matrices:

$$F = \frac{1}{2\pi} \int_{-\pi}^{\pi} F_X(d\lambda), \qquad (5)$$
$$G = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\lambda) \, F_X(d\lambda). \qquad (6)$$

The Central Limit Theorem for the regression parameter, due to Hannan [13], can be stated as follows:

Theorem 2.1.

Let $(\varepsilon_i)_{i \in \mathbb{Z}}$ be a stationary process with zero mean. Assume that $\mathcal{F}_{-\infty}$ is trivial, that $\varepsilon_0$ is $\mathcal{F}_{\infty}$-measurable, and that the sequence $(\varepsilon_i)_{i \in \mathbb{Z}}$ satisfies Hannan's condition (C1). Assume furthermore that the design $X$ satisfies the conditions (C2), (C3), (C4) and (C5). Then:

$$D(n)(\hat{\beta} - \beta) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\left(0, F^{-1} G F^{-1}\right). \qquad (7)$$

Furthermore, we have the convergence of the second order moments (the transpose of a matrix $A$ is denoted by $A^t$):

$$\mathbb{E}\left(D(n)(\hat{\beta} - \beta)(\hat{\beta} - \beta)^t D(n)^t\right) \xrightarrow[n \to \infty]{} F^{-1} G F^{-1}. \qquad (8)$$

2.3 Regular design

Theorem 2.1 is very general because it includes a very large class of designs. In this paper, we will focus on the case where the design is regular in the following sense:

Definition 2.3.1 (Regular design).

A fixed design $X$ is called regular if, for any $j, l$ in $\{1, \dots, p\}$, the coefficients $\rho_{j,l}(k)$ do not depend on $k$.

A large class of regular designs is the class of designs whose columns are regularly varying sequences. Let us recall the definition of regularly varying sequences:

Definition 2.3.2 (Regularly varying sequence [21]).

A sequence $S(\cdot)$ is regularly varying if and only if it can be written as:

$$S(i) = i^{\beta} L(i),$$

where $\beta$ is a real number and $L(\cdot)$ is a slowly varying sequence.

This includes the case of polynomial regression, where the columns are of the form $x_{i,j} = i^{j-1}$.

Proposition 2.3.1.

Assume that each column $X_{.,j}$ is regularly varying with parameter $\beta_j$. If the parameters $\beta_j$ are all strictly greater than $-1/2$, then Conditions (C2), (C3) and (C4) on the design are satisfied. Moreover, for all $j$ and $l$ in $\{1, \dots, p\}$, the coefficients $\rho_{j,l}(k)$ do not depend on $k$ and are equal to:

$$\rho_{j,l} = \frac{\sqrt{(2\beta_j + 1)(2\beta_l + 1)}}{\beta_j + \beta_l + 1}.$$

Thereby, the design is regular, and (C5) is satisfied provided $\beta_j \neq \beta_l$ for any distinct $j, l$ in $\{1, \dots, p\}$.
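As an illustration of Proposition 2.3.1, the following short numerical sketch (in Python; the sample size, the lag and the exponents are arbitrary choices, not taken from the original simulations) compares the normalized scalar products of polynomial columns with the limit values given above.

```python
import numpy as np

# Numerical check of Proposition 2.3.1 on polynomial columns x_{i,j} = i^beta_j.
n = 100_000
betas = np.array([0.0, 1.0, 2.0])      # all > -1/2 and pairwise distinct

i = np.arange(1, n + 1, dtype=float)
X = np.column_stack([i**b for b in betas])
d = np.linalg.norm(X, axis=0)          # d_j(n) = ||X_{.,j}||_2

k = 5                                  # any fixed lag k
rho_hat = (X[:-k].T @ X[k:]) / np.outer(d, d)   # empirical rho_{j,l}(k)

# Limit values: sqrt((2 beta_j + 1)(2 beta_l + 1)) / (beta_j + beta_l + 1)
num = np.sqrt(np.outer(2*betas + 1, 2*betas + 1))
rho_lim = num / (np.add.outer(betas, betas) + 1)

print(np.max(np.abs(rho_hat - rho_lim)))  # close to 0 for large n
```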

Another important class of regular designs is the class of ANOVA-type designs. An ANOVA design is represented by a matrix whose column vectors are orthogonal to one another. Each coordinate of the columns is either $0$ or $1$, with consecutive sequences of $1$'s. The number of $0$'s and of $1$'s of each column tends to infinity as $n$ tends to infinity.

Note that a design whose columns are either of the ANOVA type or regularly varying is again a regular design.

2.4 The asymptotic covariance matrix for regular design

For a regular design, the asymptotic covariance matrix is easy to compute. Actually, we shall see that it is the same as in the case where the errors are independent, up to a multiplicative factor: the usual variance term $\sigma^2 = \mathbb{E}(\varepsilon_0^2)$ should be replaced by the covariance series $\sum_{k \in \mathbb{Z}} \gamma(k)$.

Since the coefficients $\rho_{j,l}(k)$ are constant with respect to $k$, the spectral measure $F_X$ is the product of a Dirac mass at $0$, denoted by $\delta_0$, with the matrix $R(0)$; consequently the spectral measure is equal to $\delta_0 \times R(0)$. Notice that, in the case of regular design, the matrix $R(k)$ is equal to $R(0)$ for all $k$.

Thereby the matrices $F$ and $G$ can be computed explicitly:

$$F = \frac{1}{2\pi} R(0), \qquad (9)$$
$$G = \frac{1}{2\pi} f(0) R(0). \qquad (10)$$

Thus, using (9) and (10), the covariance matrix $F^{-1} G F^{-1}$ can be written as:

$$F^{-1} G F^{-1} = 2\pi f(0) R(0)^{-1}.$$

The connection between the spectral density and the autocovariance function is well known:

$$f(\lambda) = \frac{1}{2\pi} \sum_{k \in \mathbb{Z}} \gamma(k) e^{ik\lambda},$$

and at the point $0$:

$$f(0) = \frac{1}{2\pi} \sum_{k \in \mathbb{Z}} \gamma(k).$$

Thereby the covariance matrix can be written:

$$F^{-1} G F^{-1} = \left(\sum_{k \in \mathbb{Z}} \gamma(k)\right) R(0)^{-1} = \left(\gamma(0) + 2\sum_{k=1}^{\infty} \gamma(k)\right) R(0)^{-1},$$

since $\gamma(k) = \gamma(-k)$ and $2\pi f(0) = \sum_{k \in \mathbb{Z}} \gamma(k)$.

In conclusion, for regular design the following corollary holds:

Corollary 2.1.

Under the assumptions of Theorem 2.1, if moreover the design is regular, then:

$$D(n)(\hat{\beta} - \beta) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\left(0, \Big(\sum_{k \in \mathbb{Z}} \gamma(k)\Big) R(0)^{-1}\right), \qquad (11)$$

and we have the convergence of the second order moments:

$$\mathbb{E}\left(D(n)(\hat{\beta} - \beta)(\hat{\beta} - \beta)^t D(n)^t\right) \xrightarrow[n \to \infty]{} \Big(\sum_{k \in \mathbb{Z}} \gamma(k)\Big) R(0)^{-1}. \qquad (12)$$

One can see that, in the case of regular design, the asymptotic covariance matrix is similar to the one obtained when the random variables $(\varepsilon_i)$ are i.i.d.: the variance term $\sigma^2$ is simply replaced by the covariance series $\sum_{k \in \mathbb{Z}} \gamma(k)$. Actually the matrix $R(0)$ is the normalized limit of the matrix $X^t X$: it is formed by the coefficients $\rho_{j,l}(0)$, which are, in this case, the limits of the normalized scalar products between the columns of the design.

Thus, to obtain confidence regions and tests for $\beta$, a consistent estimator of the asymptotic covariance matrix is needed. More precisely, it is necessary to estimate the quantity:

$$\sum_{k \in \mathbb{Z}} \gamma(k) = 2\pi f(0). \qquad (13)$$
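For instance, if the errors form an AR(1) process with autocovariances $\gamma(k) = \sigma^2 \rho^{|k|}$, $|\rho| < 1$, then the quantity (13) equals $\sigma^2 \sum_{k \in \mathbb{Z}} \rho^{|k|} = \sigma^2 \frac{1+\rho}{1-\rho}$, which is much larger than the variance $\gamma(0) = \sigma^2$ when $\rho$ is close to one; ignoring the correction then leads to tests that reject far too often.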

3 Estimation of the series of covariances

Properties of spectral density estimates have been discussed in many classical textbooks on time series; see, for instance, Anderson [1], Brillinger [3], Brockwell and Davis [4], Grenander and Rosenblatt [12], Priestley [17] and Rosenblatt [20], among others. But many of these results require restrictive conditions on the underlying process (a linear structure or strong mixing conditions). Wu and Liu [14] have developed an asymptotic theory for the spectral density estimate defined in (14) below, which extends the applicability of spectral analysis to nonlinear and/or non-strong mixing processes. In particular, they also proved a Central Limit Theorem and deviation inequalities for this estimate. However, to show these results, Wu uses a notion of dependence that is more restrictive than Hannan's.

In this section, we propose an estimator of the spectral density under Hannan’s dependence condition. Here, contrary to the precise results of Wu (Central Limit Theorem, deviation inequalities), we shall only focus on the consistency of the estimator.

Let us first consider a preliminary random function, defined as follows for $\lambda$ in $[-\pi, \pi]$:

$$f_n(\lambda) = \frac{1}{2\pi} \sum_{|k| \leq c_n} K\left(\frac{|k|}{c_n}\right) \hat{\gamma}_k e^{ik\lambda}, \qquad (14)$$

where:

$$\hat{\gamma}_k = \frac{1}{n} \sum_{j=1}^{n-|k|} \varepsilon_j \varepsilon_{j+|k|}, \qquad 0 \leq |k| \leq n-1, \qquad (15)$$

and $K$ is the trapezoidal kernel defined by:

$$K(x) = \begin{cases} 1 & \text{if } |x| \leq 1/2, \\ 2(1 - |x|) & \text{if } 1/2 < |x| \leq 1, \\ 0 & \text{if } |x| > 1. \end{cases}$$

The sequence of positive integers $(c_n)_{n \geq 1}$ is such that $c_n$ tends to infinity and $c_n/n$ tends to $0$ when $n$ tends to infinity.

In our context, the errors $(\varepsilon_i)_{1 \leq i \leq n}$ are not observed. Only the residuals are available:

$$\hat{\varepsilon}_i = Y_i - \sum_{j=1}^{p} x_{i,j} \hat{\beta}_j, \qquad 1 \leq i \leq n,$$

because only the data $Y$ and the design $X$ are observed. Consequently, we consider the following estimator:

$$\hat{f}_n(\lambda) = \frac{1}{2\pi} \sum_{|k| \leq c_n} K\left(\frac{|k|}{c_n}\right) \tilde{\gamma}_k e^{ik\lambda}, \qquad (16)$$

where:

$$\tilde{\gamma}_k = \frac{1}{n} \sum_{j=1}^{n-|k|} \hat{\varepsilon}_j \hat{\varepsilon}_{j+|k|}, \qquad 0 \leq |k| \leq n-1.$$

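For the simulations below, the estimator (16) can be implemented in a few lines. Here is a minimal Python sketch; the function names are ours, and the kernel is passed as a parameter so that any kernel satisfying the above requirements can be substituted.

```python
import numpy as np

def autocovariances(res, max_lag):
    # Empirical autocovariances of the residuals: (1/n) * sum_j e_j e_{j+k}.
    n = len(res)
    return np.array([res[: n - k] @ res[k:] / n for k in range(max_lag + 1)])

def trapezoidal_kernel(x):
    # The kernel K given above: 1 on [0, 1/2], linear on [1/2, 1], 0 beyond.
    return max(0.0, min(1.0, 2.0 * (1.0 - abs(x))))

def f_hat(res, c_n, lam=0.0, kernel=trapezoidal_kernel):
    # Estimator (16) of the spectral density at frequency lam; since the
    # empirical autocovariances are symmetric, the sum reduces to cosines.
    gamma = autocovariances(np.asarray(res, dtype=float), c_n)
    ks = np.arange(-c_n, c_n + 1)
    w = np.array([kernel(abs(k) / c_n) for k in ks])
    return float((w * gamma[np.abs(ks)] * np.cos(ks * lam)).sum() / (2 * np.pi))

# 2 * pi * f_hat(res, c_n, 0.0) then estimates the covariance series (13).
```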
Theorem 3.1 concludes this section:

Theorem 3.1.

Let $(c_n)_{n \geq 1}$ be a sequence of positive integers such that $c_n \to \infty$ as $n$ tends to infinity, and:

$$c_n \, \mathbb{E}\left(\varepsilon_0^2 \left(1 \wedge \frac{c_n}{n} \varepsilon_0^2\right)\right) \xrightarrow[n \to \infty]{} 0. \qquad (17)$$

Then, under the assumptions of Theorem 2.1:

$$\sup_{\lambda \in [-\pi, \pi]} \left| \hat{f}_n(\lambda) - f(\lambda) \right| \xrightarrow[n \to \infty]{\mathbb{P}} 0. \qquad (18)$$
Remark 3.1.

If $\varepsilon_0$ is in $\mathbb{L}^2$, then there exists a sequence $(c_n)_{n \geq 1}$, with $c_n \to \infty$, such that (17) holds.

Remark 3.2.

Let us suppose that the random variable $\varepsilon_0$ is such that $\|\varepsilon_0\|_{\mathbb{L}^{2+\delta}} < \infty$, with $\delta \in ]0, 2]$. Since for all real $x$, $1 \wedge x^2 \leq |x|^{\delta}$, we have:

$$c_n \, \mathbb{E}\left(\varepsilon_0^2 \left(1 \wedge \frac{c_n}{n} \varepsilon_0^2\right)\right) \leq \frac{c_n^{1+\delta/2}}{n^{\delta/2}} \, \mathbb{E}\left(|\varepsilon_0|^{2+\delta}\right).$$

Thus if $c_n$ satisfies $c_n^{1+\delta/2}/n^{\delta/2} \to 0$, then (17) holds. In particular, if the random variable $\varepsilon_0$ has a fourth order moment, then the condition on $c_n$ is $c_n^2/n \to 0$.

Theorem 3.1, combined with Corollary 2.1, implies the following result:

Corollary 3.1.

Under the assumptions of Corollary 2.1, and if moreover $f(0) > 0$, then:

$$\frac{1}{\sqrt{2\pi \hat{f}_n(0)}} \, (X^t X)^{1/2} (\hat{\beta} - \beta) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}\left(0, I_p\right), \qquad (19)$$

where $I_p$ is the $p \times p$ identity matrix.
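In particular, (19) yields an asymptotic confidence region for $\beta$: with probability close to $1 - \alpha$, the vector $\beta$ belongs to the ellipsoid $\left\{ b \in \mathbb{R}^p : \|(X^t X)^{1/2}(\hat{\beta} - b)\|_2^2 \leq 2\pi \hat{f}_n(0) \, \chi^2_{1-\alpha}(p) \right\}$, where $\chi^2_{1-\alpha}(p)$ is the quantile of order $1-\alpha$ of the $\chi^2(p)$ distribution.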

4 Examples of stationary processes

In this section, we present some classes of stationary processes satisfying Hannan’s condition.

4.1 Functions of Linear processes

A large class of stationary processes for which one can check Hannan’s condition is the class of smooth functions of linear processes generated by i.i.d. random variables.

Let us take $\Omega = \mathbb{R}^{\mathbb{Z}}$ and $\mathbb{P} = \mu^{\otimes \mathbb{Z}}$, where $\mu$ is a probability measure on $\mathbb{R}$. Let $(\eta_i)_{i \in \mathbb{Z}}$ be a sequence of i.i.d. random variables with marginal distribution $\mu$. Let $(a_i)_{i \in \mathbb{Z}}$ be a sequence of real numbers, and assume that the series $\sum_{i \in \mathbb{Z}} a_i \eta_i$ is defined almost surely, is square integrable, and is regular with respect to the $\sigma$-algebras $\mathcal{F}_i = \sigma(\eta_k, k \leq i)$. We focus on functions of real-valued linear processes:

$$\varepsilon_k = f\left(\sum_{i \in \mathbb{Z}} a_i \eta_{k-i}\right) - \mathbb{E}\left(f\left(\sum_{i \in \mathbb{Z}} a_i \eta_{k-i}\right)\right).$$

Let us define the modulus of continuity of $f$ on the interval $[-M, M]$ by:

$$w_{\infty}(h, M) = \sup_{|t| \leq h, \ |x| \leq M, \ |x+t| \leq M} |f(x+t) - f(x)|.$$

Let $(\eta'_i)_{i \in \mathbb{Z}}$ be an independent copy of $(\eta_i)_{i \in \mathbb{Z}}$. According to Section 5 in the paper of Dedecker, Merlevède and Volný [9], Hannan's condition holds under a summability condition, hereafter referred to as condition (20), expressed in terms of $w_{\infty}$, of the coefficients $(a_i)_{i \in \mathbb{Z}}$, and of the coupled process built from $(\eta'_i)_{i \in \mathbb{Z}}$ (we refer to [9] for its precise statement). We have an interesting application if the function $f$ is Hölder on any compact set: if $w_{\infty}(h, M) \leq C h^{\gamma} M^{\alpha}$ for some $\gamma \in ]0, 1]$, $\alpha \geq 0$ and $C > 0$, then (20) holds as soon as the coefficients $a_k$ decrease polynomially fast (at a rate depending on $\gamma$) and $\eta_0$ has a moment of sufficiently high order.

4.2 $p$-strong stability

Let us recall in this section the framework used by Wu. We consider stationary processes of the form:

$$\varepsilon_i = H(\dots, \eta_{i-1}, \eta_i),$$

where $(\eta_i)_{i \in \mathbb{Z}}$ is a sequence of i.i.d. random variables and $H$ is a measurable function such that $\varepsilon_i$ is well defined. Assume that $\varepsilon_0$ belongs to $\mathbb{L}^p$, and let $\eta'_0$ be distributed as $\eta_0$ and independent of $(\eta_i)_{i \in \mathbb{Z}}$. Let us define the physical dependence measure introduced in [23], for $i \geq 0$:

$$\delta_p(i) = \left\| \varepsilon_i - \varepsilon_i^{*} \right\|_{\mathbb{L}^p},$$

where $\varepsilon_i^{*}$ is a coupled version of $\varepsilon_i$, with $\eta_0$ in the latter being replaced by $\eta'_0$:

$$\varepsilon_i^{*} = H(\dots, \eta_{-1}, \eta'_0, \eta_1, \dots, \eta_{i-1}, \eta_i).$$

The sequence $(\varepsilon_i)_{i \in \mathbb{Z}}$ is said to be $p$-strong stable if:

$$\Delta_p = \sum_{i \geq 0} \delta_p(i) < \infty.$$

As a consequence of the results of Wu [22], we infer that if $(\varepsilon_i)_{i \in \mathbb{Z}}$ is $2$-strong stable, then it satisfies Hannan's condition with respect to the filtration $\mathcal{F}_i = \sigma(\eta_k, k \leq i)$. Many examples of $2$-strong stable processes are presented in the paper by Wu [22]. We also refer to [23] for other examples.
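For instance, for the causal linear process $\varepsilon_i = \sum_{j \geq 0} a_j \eta_{i-j}$ with $\eta_0 \in \mathbb{L}^p$, the coupled version satisfies $\varepsilon_i - \varepsilon_i^{*} = a_i(\eta_0 - \eta'_0)$, so that $\delta_p(i) = |a_i| \, \|\eta_0 - \eta'_0\|_{\mathbb{L}^p}$, and $2$-strong stability again reduces to the summability of the coefficients $(a_i)_{i \geq 0}$.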

4.3 Conditions in the style of Gordin

According to a proposition of Dedecker, Merlevède and Volný [9], Hannan's condition holds if the error process satisfies the two following conditions:

$$\sum_{n=1}^{\infty} \frac{\|\mathbb{E}(\varepsilon_n \mid \mathcal{F}_0)\|_{\mathbb{L}^2}}{\sqrt{n}} < \infty, \qquad (21)$$

$$\sum_{n=1}^{\infty} \frac{\|\varepsilon_{-n} - \mathbb{E}(\varepsilon_{-n} \mid \mathcal{F}_0)\|_{\mathbb{L}^2}}{\sqrt{n}} < \infty. \qquad (22)$$

These conditions are weaker than the well-known conditions of Gordin [11], under which a martingale + coboundary decomposition holds in $\mathbb{L}^2$. An application is given in the next subsection.

4.4 Weakly dependent coefficients

Hannan's condition holds if the error process is weakly dependent. In this case, the process $(\varepsilon_i)_{i \in \mathbb{Z}}$ is adapted to the filtration $(\mathcal{F}_i)_{i \in \mathbb{Z}}$, and Condition (22) is always true.

Let us recall the definitions of the weak dependence coefficients introduced by Dedecker and Prieur [10]: for all integer $k \geq 0$:

$$\tilde{\phi}(k) = \sup_{t \in \mathbb{R}} \left\| \mathbb{P}(\varepsilon_k \leq t \mid \mathcal{F}_0) - \mathbb{P}(\varepsilon_k \leq t) \right\|_{\infty},$$

and:

$$\tilde{\alpha}(k) = \mathbb{E}\left( \sup_{t \in \mathbb{R}} \left| \mathbb{P}(\varepsilon_k \leq t \mid \mathcal{F}_0) - \mathbb{P}(\varepsilon_k \leq t) \right| \right).$$

If $(\varepsilon_i)_{i \in \mathbb{Z}}$ is $\tilde{\phi}$-dependent and $\varepsilon_0$ is in $\mathbb{L}^p$ with $p > 2$, then by Hölder's inequality:

$$\|\mathbb{E}(\varepsilon_k \mid \mathcal{F}_0)\|_{\mathbb{L}^2} = \sup_{Z \in B_2(\mathcal{F}_0)} \mathrm{Cov}(Z, \varepsilon_k) \leq C \, \tilde{\phi}(k)^{\frac{p-2}{p}},$$

where for all $q \geq 1$, $B_q(\mathcal{F}_0)$ is the set of random variables $Z$, $\mathcal{F}_0$-measurable and such that $\|Z\|_{\mathbb{L}^q} \leq 1$, and where the constant $C$ depends only on $\|\varepsilon_0\|_{\mathbb{L}^p}$.

Consequently, if:

$$\sum_{k=1}^{\infty} \frac{\tilde{\phi}(k)^{\frac{p-2}{p}}}{\sqrt{k}} < \infty, \qquad (23)$$

then the condition (21) holds, and Hannan's condition is satisfied.

Now we look at $\tilde{\alpha}$-weakly dependent sequences. We denote by $Q$ the generalized inverse of the tail function $x \mapsto \mathbb{P}(|\varepsilon_0| > x)$. If $(\varepsilon_i)_{i \in \mathbb{Z}}$ is $\tilde{\alpha}$-dependent and there exist $r > 2$ and $c > 0$ such that $\mathbb{P}(|\varepsilon_0| > x) \leq (c/x)^r$, then, by Cauchy-Schwarz's inequality and Rio's covariance inequality (see [18]), we get:

$$\|\mathbb{E}(\varepsilon_k \mid \mathcal{F}_0)\|_{\mathbb{L}^2} \leq 2 \left( \int_0^{\tilde{\alpha}(k)} Q^2(u) \, du \right)^{1/2}.$$

But:

$$\int_0^{\tilde{\alpha}(k)} Q^2(u) \, du \leq c^2 \int_0^{\tilde{\alpha}(k)} u^{-2/r} \, du = \frac{c^2 \, r}{r-2} \, \tilde{\alpha}(k)^{\frac{r-2}{r}}.$$

Hence, if:

$$\sum_{k=1}^{\infty} \frac{\tilde{\alpha}(k)^{\frac{r-2}{2r}}}{\sqrt{k}} < \infty, \qquad (24)$$

then (21) is true, and Hannan's condition is satisfied.

Notice that all we have written for $\tilde{\alpha}$-dependent sequences is also true for $\alpha$-mixing processes in the sense of Rosenblatt [20].

5 Tests and Simulations

We consider the linear regression model (1), and we assume that Hannan's condition (C1), as well as the conditions (C2) to (C5) on the design, are satisfied. We also assume that $\varepsilon_0$ is $\mathcal{F}_{\infty}$-measurable and that $\mathcal{F}_{-\infty}$ is trivial. Under these conditions, the usual Fisher tests can be modified and adapted to the case where the errors are short-range dependent.

As usual, the null hypothesis $H_0$ means that the parameter $\beta$ belongs to a vector space of dimension strictly smaller than $p$, and we denote by $H_1$ the alternative hypothesis (meaning that $H_0$ is not true, but (1) holds).

In the case of regular design, thanks to Corollary 3.1, the usual Fisher tests of $H_0$ versus $H_1$ can be corrected by replacing the estimator of $\sigma^2 = \mathbb{E}(\varepsilon_0^2)$ by an estimator of $\sum_{k \in \mathbb{Z}} \gamma(k) = 2\pi f(0)$.

Recall that if the errors are i.i.d. Gaussian random variables, the test statistic is:

$$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\mathrm{RSS}/(n - p)}. \qquad (25)$$

In this expression, the integer $p_0$ is the dimension of the model under the $H_0$-hypothesis, $\mathrm{RSS}$ is the sum of the squares of the residuals for the complete model (1) (equal to $\|Y - X\hat{\beta}\|_2^2$), $\mathrm{RSS}_0$ is the corresponding quantity under $H_0$, and $\mathrm{RSS}/(n-p)$ is the estimator of the variance of $\varepsilon_0$. Under $H_0$, the quantity $F$ follows a Fisher distribution with parameters $(p - p_0, n - p)$.

In the case where the design satisfies Hannan's conditions, if the random variables $(\varepsilon_i)$ are i.i.d. but do not necessarily follow a Gaussian distribution, the test statistic is the same as (25), and under the $H_0$-hypothesis it satisfies:

$$(p - p_0) \, F \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2(p - p_0).$$
Now if the error process is stationary, the test statistic must be corrected as follows:

$$\tilde{F} = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{2\pi \hat{f}_n(0)}, \qquad (26)$$

where $\hat{f}_n$ is defined in (16). Thanks to Corollary 3.1, it satisfies under $H_0$:

$$(p - p_0) \, \tilde{F} \xrightarrow[n \to \infty]{\mathcal{L}} \chi^2(p - p_0).$$
In practice, we shall only estimate a finite number of covariance terms $\gamma(k)$, say $a_n$ of them besides $\gamma(0)$. For the simulations, we shall use the graph of the empirical autocovariances of the residuals to choose $a_n$, and instead of (26) we shall consider the statistic:

$$\tilde{F} = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/(p - p_0)}{\tilde{\gamma}_0 + 2 \sum_{k=1}^{a_n} \tilde{\gamma}_k}, \qquad (27)$$

with $\tilde{\gamma}_k$ defined as in (15), with the residuals $\hat{\varepsilon}_j$ in place of the errors $\varepsilon_j$.
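The corrected procedure is easy to implement. The following Python sketch computes the statistic (27) for a design $X$, a sub-design $X_0$ corresponding to $H_0$, and a number $a_n$ of estimated covariances; the function name and interface are ours, chosen for illustration.

```python
import numpy as np

def corrected_fisher_statistic(Y, X, X0, a_n):
    # Statistic (27): the classical numerator RSS0 - RSS, renormalized by the
    # truncated estimator gamma_0 + 2 * sum_{k=1}^{a_n} gamma_k of (13).
    n, p = X.shape
    p0 = X0.shape[1]
    res = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]     # full-model residuals
    res0 = Y - X0 @ np.linalg.lstsq(X0, Y, rcond=None)[0]  # residuals under H0
    rss, rss0 = res @ res, res0 @ res0
    gamma = np.array([res[: n - k] @ res[k:] / n for k in range(a_n + 1)])
    s2 = gamma[0] + 2.0 * gamma[1:].sum()
    # Under H0, (rss0 - rss) / s2 is approximately chi^2(p - p0) distributed.
    return (rss0 - rss) / s2, p - p0
```

The $H_0$-hypothesis is then rejected at the asymptotic level $5\%$ when the statistic exceeds the quantile of order $0.95$ of the $\chi^2(p - p_0)$ distribution.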

5.1 Example 1: A non-mixing autoregressive process

The process $(\varepsilon_i)_{i \geq 1}$ is simulated according to the AR(1) equation:

$$\varepsilon_{i+1} = \frac{1}{2}\left(\varepsilon_i + \eta_{i+1}\right),$$

where $\varepsilon_1$ is uniformly distributed over $[0, 1]$, and $(\eta_i)_{i \geq 2}$ is a sequence of i.i.d. random variables, independent of $\varepsilon_1$, such that $\mathbb{P}(\eta_i = 0) = \mathbb{P}(\eta_i = 1) = 1/2$. In this example, $\mathcal{F}_i = \sigma(\varepsilon_k, k \leq i)$, and the $\sigma$-algebra $\mathcal{F}_{-\infty}$ is trivial.

The transition kernel of the chain $(\varepsilon_i)_{i \geq 1}$ is:

$$K(f)(x) = \frac{1}{2}\left(f\left(\frac{x}{2}\right) + f\left(\frac{x+1}{2}\right)\right),$$

and the uniform distribution on $[0, 1]$ is the unique invariant distribution by $K$. Hence, the chain is strictly stationary. Furthermore, it is not $\alpha$-mixing in the sense of Rosenblatt [2], but it is $\tilde{\phi}$-dependent. Indeed, one can prove that the coefficient $\tilde{\phi}(k)$ of the chain decreases geometrically [10]:

$$\tilde{\phi}(k) \leq 2^{-k}.$$

Consequently, Hannan's condition is satisfied and the Fisher tests can be corrected as indicated above.
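This chain is straightforward to simulate. In the sketch below, drawing the initial value from the uniform distribution makes the simulated chain stationary from the start; the centering of the errors is an illustrative choice (in the regression models below, a non-zero mean would in any case be absorbed by the intercept).

```python
import numpy as np

# Simulation of the non-mixing AR(1) chain eps_{i+1} = (eps_i + eta_{i+1}) / 2.
def simulate_ar1_chain(n, rng):
    eps = np.empty(n)
    x = rng.uniform()                  # eps_1 ~ U([0,1]): invariant distribution
    eta = rng.integers(0, 2, size=n)   # i.i.d. Bernoulli(1/2) innovations
    for i in range(n):
        eps[i] = x
        x = 0.5 * (x + eta[i])
    return eps - 0.5                   # centered errors

rng = np.random.default_rng(0)
eps = simulate_ar1_chain(600, rng)
```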

The first model simulated with this error process is a simple linear regression model with an intercept $\beta_0$ and a slope $\beta_1$, for all $i$ in $\{1, \dots, n\}$. The random variables $\varepsilon_i$ are multiplied by a constant factor to increase the variance, and the intercept $\beta_0$ is fixed. We test the hypothesis $H_0$: $\beta_1 = 0$ against the hypothesis $H_1$: $\beta_1 \neq 0$.

The estimated level of the Fisher test will be studied for different choices of $n$ and of $a_n$, the number of estimated covariance terms. Under the hypothesis $H_0$, the same Fisher test is carried out a large number of times. Then we look at the frequency of rejection of the test when we are under $H_0$, that is to say the estimated level of the test. Let us specify that we want an estimated level close to $5\%$.

Case $a_n = 0$ (no correction):

Estimated level

Here, since $a_n = 0$, we do not estimate any of the covariance terms. The result is that the estimated levels are too large. This means that the test will reject the null hypothesis too often.

The number $a_n$ of estimated covariance terms may be chosen by analyzing the graph of the empirical autocovariances (Figure 1), obtained with $n = 600$. For this example, this graph suggests keeping only the first few covariance terms, after which the empirical autocovariances are negligible.

Figure 1: Empirical autocovariances for the first model of Example 1, n = 600.

Case $a_n > 0$ (corrected test):

Estimated level

As suggested by the graph of the empirical autocovariances, a small positive choice of $a_n$ gives a better estimated level than $a_n = 0$.

Case with a larger $a_n$:

Estimated level

Here, we see that this choice works well also, and seems even slightly better than the previous one. If one increases the size $n$ of the samples and the number $a_n$ of estimated covariance terms, the estimated level gets closer to $5\%$.

Case $\beta_1 \neq 0$ (estimated power):

In this example, $H_0$ is not satisfied: we choose $\beta_1$ different from zero, and we perform the same tests as above to estimate the power of the test.

Estimated power

As one can see, the estimated power is always greater than the estimated level, as expected. Still as expected, the estimated power increases with the size of the samples, and for $n$ large enough the test always rejects the $H_0$-hypothesis.

The second model considered is a linear regression model with parameters $\beta_0$, $\beta_1$ and $\beta_2$, for all $i$ in $\{1, \dots, n\}$. Here, we test the hypothesis $H_0$: $\beta_1 = \beta_2 = 0$ against $H_1$: $\beta_1 \neq 0$ or $\beta_2 \neq 0$. The coefficient $\beta_0$ is fixed, and we use the same simulation scheme as above.

Case $a_n = 0$ (no correction):

Estimated level

As for the first simulation, if $a_n = 0$ the test will reject the null hypothesis too often.

As suggested by the graph of the estimated autocovariances (Figure 2), a small positive value of $a_n$ should give a better result for the estimated level.

Figure 2: Empirical autocovariances for the second model of Example 1, n = 600.

Case $a_n > 0$ (corrected test):

Estimated level

Here, we see that the correction works well: the estimated level is close to $5\%$, and it improves as $n$ and $a_n$ increase.

Case $\beta_1 \neq 0$, $\beta_2 = 0$ (estimated power):

Now, we study the estimated power of the test. The coefficient $\beta_1$ is chosen different from zero and $\beta_2$ is zero.

Estimated power

As expected, the estimated power increases with the size of the samples, and it is close to one as soon as $n$ is large enough.

The third model that we consider is again a linear regression model with parameters $\beta_0$, $\beta_1$ and $\beta_2$, for all $i$ in $\{1, \dots, n\}$. We test again the hypothesis $H_0$: $\beta_1 = \beta_2 = 0$ against $H_1$: $\beta_1 \neq 0$ or $\beta_2 \neq 0$. The coefficient $\beta_0$ is fixed. The conditions of the simulation are the same as above, except for the size of the samples. Indeed, for this model, the size of the samples must be greater than previously to have an estimated level close to $5\%$ with the correction.

Case $a_n = 0$ (no correction):

Estimated level

As for the first and second simulations, if $a_n = 0$ the test will reject the null hypothesis too often.

As suggested by the graph of the estimated autocovariances (Figure 3), a small positive value of $a_n$ should give a better result for the estimated level.

Figure 3: Empirical autocovariances for the third model of Example 1, n = 2000.

Case $a_n > 0$ (corrected test):

Estimated level

With the correction, the estimated level improves as $n$ grows, and for the largest sample sizes it is around $5\%$.

Then, we study the estimated power of the test for $\beta_1$ or $\beta_2$ not equal to $0$.

Case $\beta_1 \neq 0$ or $\beta_2 \neq 0$ (estimated power):

Estimated power

As expected, the estimated power increases with the size of the samples, and it is close to one as soon as $n$ is large enough.

5.2 Example 2: Intermittent maps

For $\gamma$ in $]0, 1[$, we consider the intermittent map $\theta_\gamma$ from $[0, 1]$ to $[0, 1]$, introduced by Liverani, Saussol and Vaienti [15]:

$$\theta_\gamma(x) = \begin{cases} x(1 + 2^\gamma x^\gamma) & \text{if } x \in [0, 1/2[, \\ 2x - 1 & \text{if } x \in [1/2, 1]. \end{cases}$$

It follows from [15] that there exists a unique absolutely continuous $\theta_\gamma$-invariant probability measure $\nu_\gamma$, with density $h_\gamma$.

Let us briefly describe the Markov chain associated with $\theta_\gamma$, and its properties. Let first $K_\gamma$ be the Perron-Frobenius operator of $\theta_\gamma$ with respect to $\nu_\gamma$, defined as follows: for any functions $u$, $v$ in $\mathbb{L}^2([0, 1], \nu_\gamma)$:

$$\nu_\gamma(u \cdot (v \circ \theta_\gamma)) = \nu_\gamma(K_\gamma(u) \cdot v).$$

The operator $K_\gamma$ is a transition kernel, and $\nu_\gamma$ is invariant by $K_\gamma$. Let now $(\xi_i)_{i \geq 1}$ be a stationary Markov chain with invariant measure $\nu_\gamma$ and transition kernel $K_\gamma$. It is well known that on the probability space $([0, 1], \nu_\gamma)$, the random vector $(\theta_\gamma, \theta_\gamma^2, \dots, \theta_\gamma^n)$ is distributed as $(\xi_n, \xi_{n-1}, \dots, \xi_1)$. Now it is proved in [8] that there exist two positive constants $A$ and $B$ such that:

$$\frac{A}{n^{\frac{1-\gamma}{\gamma}}} \leq \tilde{\alpha}(n) \leq \frac{B}{n^{\frac{1-\gamma}{\gamma}}}.$$

Moreover, the chain $(\xi_i)_{i \geq 1}$ is not $\alpha$-mixing in the sense of Rosenblatt [19].
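A trajectory with the law of the chain can be obtained by iterating the map itself, since the vector $(\theta_\gamma, \dots, \theta_\gamma^n)$ has the law of the time-reversed chain. In the Python sketch below, the starting point and the burn-in length are arbitrary illustrative choices; the burn-in brings the orbit close to the invariant density $h_\gamma$.

```python
import numpy as np

def lsv_map(x, gamma):
    # The Liverani-Saussol-Vaienti map theta_gamma on [0, 1].
    return x * (1.0 + (2.0 * x) ** gamma) if x < 0.5 else 2.0 * x - 1.0

def lsv_orbit(n, gamma, x0=0.3, burn=10_000):
    x = x0
    for _ in range(burn):              # burn-in towards the invariant density
        x = lsv_map(x, gamma)
    orbit = np.empty(n)
    for i in range(n):
        orbit[i] = x
        x = lsv_map(x, gamma)
    return orbit

traj = lsv_orbit(2000, gamma=0.25)     # gamma in ]0, 1/2[ as required below
```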

In the following simulations, we consider linear regression models whose errors are given by the stationary chain $(\xi_i)_{i \geq 1}$ described above, suitably centered. But, in our context, the coefficient $\gamma$ must belong to $]0, 1/2[$. Indeed, if $\gamma$ is lower than $1/2$, then Condition (24) is verified. Consequently, Hannan's condition is satisfied and we can apply our results. Note that if $\gamma$ is greater than $1/2$, then the chain