
# Asymptotic normality of the time-domain generalized least squares estimator for linear regression models

Hien D. Nguyen. Corresponding author email: h.nguyen5@latrobe.edu.au. Department of Mathematics and Statistics, La Trobe University, Bundoora, Melbourne 3086, Victoria, Australia.
###### Abstract

In linear models, the generalized least squares (GLS) estimator is applicable when the structure of the error dependence is known. When it is unknown, such structure must be approximated and estimated in a manner that may lead to misspecification. The large-sample analysis of incorrectly-specified GLS (IGLS) estimators requires careful asymptotic manipulations. When performing estimation in the frequency domain, the asymptotic normality of the IGLS estimator, under the so-called Grenander assumptions, has been proved for a broad class of error dependence models. Under the same assumptions, asymptotic normality results for the time-domain IGLS estimator are only available for a limited class of error structures. We prove that the time-domain IGLS estimator is asymptotically normal for a general class of dependence models.

Keywords: asymptotic normality; autoregressive models; generalized least squares; misspecification; time-series analysis

## 1 Introduction

Let {x_t} be a non-stochastic sequence of vectors, such that x_t ∈ ℝ^d, where x_t^⊤ = (x_{1t}, …, x_{dt}), d ∈ ℕ, and ⊤ is the matrix transposition operator. Let {U_t} be a sequence of random errors. We are interested in the sequence {Y_t} that is generated by the relationship

 Yt=x⊤tβ+Ut, (1)

where β ∈ ℝ^d is a vector of non-stochastic regression coefficients. We can write relationship (1) in the matrix form:

 yT=XTβ+uT,

simultaneously for all t ∈ [T] = {1, …, T}, where X_T has rows x_t^⊤, y_T^⊤ = (Y_1, …, Y_T), and u_T^⊤ = (U_1, …, U_T).

Relationships that are described by (1) are generally referred to as multiple linear regression models and are ubiquitous in the study of engineering, natural science, and social science phenomena (see, e.g., Weisberg, 2005). For general treatments of the topic of linear regression modeling, we refer the interested reader to the manuscripts of Gross (2003), Seber & Lee (2003), and Yan & Su (2009).

In this article, we consider the scenario where the sequence of errors {U_t}_{t∈[T]} is a finite segment of the stationary sequence {U_t}_{t∈ℤ}, such that

 Ut=∞∑i=−∞θiEt−i,

for each t ∈ ℤ, where {E_t} is an independent sequence of random variables, such that E(E_t) = 0 and var(E_t) = σ² < ∞. If the sequence of coefficients {θ_i} is known, then one can write the covariance matrix of u_T as Σ_T = E(u_T u_T^⊤) and use it to construct the so-called generalized least squares (GLS) estimator

 ~βT(ΣT)=(X⊤TΣ−1TXT)−1X⊤TΣ−1TyT, (2)

which is known to be the best linear unbiased estimator (BLUE) of β (cf. Amemiya, 1985, Sec. 6.1.3). Furthermore, Theorem 1 of Baltagi (2002, Ch. 9) states that under the condition that C_u = lim_{T→∞} T(X_T^⊤ Σ_T^{−1} X_T)^{−1} exists, the estimator ~β_T(Σ_T) is asymptotically normal in the sense that

 T1/2[~βT(ΣT)−β]L⟶N(0,Cu),

where we denote convergence in law by ⟶ᴸ, and N(0, C_u) denotes a normal distribution with mean vector 0 and covariance matrix C_u.
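As a concrete numerical illustration (not from the paper; the AR(1) error design, seed, and all variable names are our own assumptions), the GLS estimator (2) can be evaluated directly once Σ_T is known. The sketch below simulates model (1) with AR(1) errors, builds Σ_T from the known autocovariances, and computes (2).

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta = 500, np.array([2.0, -1.0])

# Non-stochastic regressors x_t and AR(1) errors U_t = phi*U_{t-1} + E_t
# (initialised at zero for simplicity, which is immaterial for large T).
X = np.column_stack([np.ones(T), np.linspace(0.0, 1.0, T)])
phi, sigma2 = 0.6, 1.0
E = rng.normal(scale=np.sqrt(sigma2), size=T)
U = np.zeros(T)
for t in range(1, T):
    U[t] = phi * U[t - 1] + E[t]
y = X @ beta + U

# Sigma_T is Toeplitz with entries sigma2 * phi^{|j-k|} / (1 - phi^2).
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Sigma = (sigma2 / (1.0 - phi**2)) * phi**lags

# GLS estimator (2): (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y.
A = X.T @ np.linalg.solve(Sigma, X)
b = X.T @ np.linalg.solve(Sigma, y)
beta_gls = np.linalg.solve(A, b)
print(beta_gls)
```

With T = 500 the estimate lands close to the true β = (2, −1), consistent with the asymptotic normality statement above.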

In applications, the coefficients {θ_i} are rarely, if ever, known. Thus, the data generating process (DGP) of the error sequence {U_t} is also unknown. In order to proceed with inference, one generally assumes a hypothetical DGP for {U_t} that is equivalent to that of some error sequence {V_t}.

Let {V_t}_{t∈[T]} be a finite segment of {V_t}_{t∈ℤ}, and let v_T^⊤ = (V_1, …, V_T). Furthermore, write the covariance matrix of v_T as Λ_T = E(v_T v_T^⊤). Then, by replacing Σ_T in (2) by Λ_T, we obtain the so-called incorrectly-specified GLS (IGLS; Koreisha & Fang, 2001) estimator

 ~βT(ΛT)=(X⊤TΛ−1TXT)−1X⊤TΛ−1TyT. (3)

The finite-sample properties of (3) were studied comprehensively in Koreisha & Fang (2001) and Kariya & Kurata (2004). General asymptotic results for the IGLS estimator are more difficult to establish, and thus only a small number of results are available in the literature. For example, Rothenberg (1984) considered asymptotic normality of the IGLS estimator, when {U_t} has first-order autoregressive (AR) form. Some consistency results regarding the IGLS estimator appear in Samarov (1987) and Koreisha & Fang (2001). To date, the most general set of asymptotic theorems regarding the IGLS estimator are those reported in Amemiya (1973).

We make the following so-called Grenander regularity conditions (Grenander, 1954):

• [Gren1] lim_{T→∞} s_{iT}² = ∞ (i ∈ [d]), where s_{iT}² = ∑_{t=1}^{T} x_{it}²,

• [Gren2] lim_{T→∞} x_{iT}²/s_{iT}² = 0 (i ∈ [d]),

• [Gren3] ρ_{ij}(h) = lim_{T→∞} ∑_{t=1}^{T−h} x_{i,t+h} x_{jt}/(s_{iT} s_{jT}) exists for each i, j ∈ [d] and h ∈ {0, 1, 2, …}, and

• [Gren4] the matrix R(0) is non-singular, where R(h) has element ρ_{ij}(h) in the i-th row and j-th column.

Furthermore, make the following additional assumptions.

• [Amem1] The elements of {U_t} have the form

 Ut=∞∑i=1αiUt−i+Et,

where {E_t} is a sequence of independent random variables such that E(E_t) = 0 and var(E_t) = σ² < ∞, and {α_i} is such that ∑_{i=1}^{∞} |α_i| < ∞ and α(z) = 1 − ∑_{i=1}^{∞} α_i z^i ≠ 0 for |z| ≤ 1, where z ∈ ℂ.

• [Amem2] The sequence {V_t} is hypothesized to be equivalent to a stationary autoregressive (AR) process of order N ∈ ℕ, with t-th term of the form

 Vt=N∑i=1κiVt−i+Et,

where {E_t} is a sequence of independent random variables such that E(E_t) = 0 and var(E_t) = σ² < ∞, and κ_1, …, κ_N are such that the roots of 1 − ∑_{i=1}^{N} κ_i z^i = 0 (with respect to z ∈ ℂ) are all outside of the unit circle.

Here ι denotes the imaginary unit. See Amemiya (1985, Ch. 5.2) for more details regarding AR processes.

Let S_T be the d × d diagonal matrix with i-th diagonal element s_{iT}, for i ∈ [d]. Further, let Z_T = X_T S_T^{−1} and define H(ω) to be a Hermitian matrix function with positive semidefinite increments, such that R(h) = ∫_{−π}^{π} e^{ιhω} dH(ω). Under [Gren1]–[Gren4], [Amem1], and [Amem2], Amemiya (1973) proved that

 ST[~βT(ΛT)−β]L⟶N(0,Cv),

where

 Cv=limT→∞(Z⊤TΛ−1TZT)−1Z⊤TΛ−1TΣTΛ−1TZT(Z⊤TΛ−1TZT)−1.

Furthermore, Amemiya (1973) showed that if we denote the spectral density functions (SDFs) of the processes {U_t} and {V_t} by f_u and f_v, respectively, then we may write C_v in the spectral form

 Cv=2π[∫π−π1fv(ω)dH(ω)]−1∫π−πfu(ω)f2v(ω)dH(ω)[∫π−π1fv(ω)dH(ω)]−1. (4)

It is remarkable that the IGLS is a time-domain estimator that can nonetheless be proved to have an asymptotic covariance matrix with a simple spectral form.

Since the SDF of {V_t} can be written as

 fv(ω)=(2π)−1∞∑i=−∞ηie−ιiω, (5)

where η_i = cov(V_t, V_{t−i}) and η_i = η_{−i} for each i ∈ ℤ, we can write

 ΛT=[η|j−k|]j,k∈[T], where ηi=∫π−πeιiωfv(ω)dω. (6)

Thus, since the auto-covariance structure of {V_t} is determined by (5), we may interchange the notation ~β_T(Λ_T) with ~β_T(f_v). We shall refer to ~β_T(f_v) as the time-domain IGLS estimator, in order to differentiate it from the frequency-domain IGLS estimator that is introduced in the sequel.
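As a concrete sketch of the time-domain IGLS estimator (our own illustration, not code from the paper; the working AR(1) model, the MA(1) truth, and all names are assumptions), the Toeplitz matrix Λ_T of (6) is built from the hypothesized autocovariances η_i of a working AR(1) model, and (3) is evaluated on data whose true errors are MA(1), i.e., with a deliberately misspecified working covariance.

```python
import numpy as np

T = 200
# Hypothesized working model: stationary AR(1) with coefficient kappa,
# whose autocovariances are eta_i = s2 * kappa**|i| / (1 - kappa**2).
kappa, s2 = 0.5, 1.0
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Lam = (s2 / (1.0 - kappa**2)) * kappa**lags  # Toeplitz Lambda_T, as in (6)

# Simulated data with MA(1) errors, so the AR(1) working covariance
# is incorrectly specified, as in the IGLS setting.
rng = np.random.default_rng(1)
beta = np.array([1.0, 0.5])
X = np.column_stack([np.ones(T), np.cos(2 * np.pi * np.arange(T) / 50)])
E = rng.normal(size=T + 1)
U = E[1:] + 0.8 * E[:-1]          # true errors: MA(1)
y = X @ beta + U

# Time-domain IGLS (3): (X' Lam^{-1} X)^{-1} X' Lam^{-1} y.
A = X.T @ np.linalg.solve(Lam, X)
beta_igls = np.linalg.solve(A, X.T @ np.linalg.solve(Lam, y))
print(beta_igls)
```

Despite the misspecification, the estimator remains unbiased and close to β; misspecification affects efficiency and the asymptotic covariance (4), not consistency.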

Let

 Jxx(ω)=12πT(T∑t=1xteιtω)(T∑t=1x⊤te−ιtω) (7)

and

 JxY(ω)=12πT(T∑t=1xteιtω)(T∑t=1Yte−ιtω) (8)

to be the periodogram of {x_t} and the cross-periodogram between {x_t} and {Y_t}, respectively (see, e.g., Brockwell & Davis, 2006, Sec. 11.7). Using (7) and (8), Hannan (1973) proposed to estimate β by the frequency-domain IGLS estimator

 ¯β(fv)=[T∑t=1f−1v(2πtT)Jxx(2πtT)]−1T∑t=1f−1v(2πtT)JxY(2πtT). (9)
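A minimal numerical sketch of (9) follows (our own illustration, not code from the paper; all names are assumptions). The DFTs of the regressor columns and of the response give the periodogram ordinates at the Fourier frequencies, and the (2πT)^{−1} normalizations in (7) and (8) cancel in (9). As a deterministic sanity check, a flat working SDF weights all frequencies equally, so by Parseval's theorem the estimator collapses exactly to OLS.

```python
import numpy as np

def freq_igls(X, y, f_v):
    """Frequency-domain IGLS in the spirit of (9): weight periodogram
    ordinates at the Fourier frequencies by the reciprocal working SDF."""
    T, d = X.shape
    omegas = 2 * np.pi * np.arange(T) / T
    FX = np.fft.fft(X, axis=0)     # row k: sum_t x_t e^{-i t omega_k}
    Fy = np.fft.fft(y)
    w = 1.0 / f_v(omegas)          # weights f_v(omega_k)^{-1}
    # The (2 pi T)^{-1} periodogram normalisations cancel in (9).
    A = (FX.conj().T * w) @ FX
    b = (FX.conj().T * w) @ Fy
    return np.real(np.linalg.solve(A, b))

# Toy check: with a flat working SDF, (9) reduces to OLS exactly.
rng = np.random.default_rng(2)
T = 128
X = np.column_stack([np.ones(T), np.sin(2 * np.pi * np.arange(T) / 16)])
y = X @ np.array([1.0, -2.0]) + rng.normal(size=T)
beta_flat = freq_igls(X, y, lambda om: np.ones_like(om))
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_flat, beta_ols)
```

The agreement with OLS under a flat SDF mirrors the familiar fact that the IGLS estimator with a white-noise working model is the OLS estimator.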

Let F_t be the σ-algebra that is generated by the sequence {E_s : s ≤ t}, and make the following assumptions.

• [Hann1] The random sequence {U_t} has the form U_t = ∑_{i=−∞}^{∞} θ_i E_{t−i}, satisfying

 E(Et|Ft−1)=E(E2t−E(E2t)|Ft−1)=0

almost surely, and E(E_t²) = σ² < ∞.

• [Hann2] The random sequence {E_t} has distribution function F_{E_t}, which satisfies

 limδ→∞supt∈Z∫|e|>δe2dFEt(e)=0.
• [Hann3] The SDF f_v is real, positive, continuous, and even over [−π, π].

Under [Gren1]–[Gren4] and [Hann1]–[Hann3], Hannan (1973) proved that

 ST[¯β(fv)−β]L⟶N(0,Cv), (10)

where C_v has form (4) (see also Robinson & Velasco, 1997). That is, (3) and (9) have the same asymptotic distribution when either [Amem1] and [Amem2], or [Hann1]–[Hann3], are satisfied, in addition to [Gren1]–[Gren4].

It is notable, however, that [Hann1]–[Hann3] are more general assumptions than [Amem1] and [Amem2]. Thus, an obvious question to ask is whether the equivalence in asymptotic distributions between the time-domain estimator (3) and the frequency-domain estimator (9) remains when one replaces [Amem1] and [Amem2] by assumptions that are more general and closer in spirit to [Hann1]–[Hann3]. In this article, we shall provide an affirmative answer to this question. Before presenting our main result, we wish to provide a review of the relevant literature.

The asymptotic covariance form (4) was used by Engle (1974b) and Nicholls & Pagan (1977) to explore the efficiencies of the OLS estimator, the BLUE, and the IGLS estimator, under various choices of f_u and f_v. Some finite-sample properties of the frequency-domain estimator were established in Engle (1974a) and Engle & Gardner (1976).

Under [Gren1]–[Gren4], the IGLS asymptotic covariance (4) was obtained via spectral methods in Kholevo (1969), Rozanov & Kozlov (1969), Kholevo (1971a), and Ibragimov & Rozanov (1978, Sec. 7.4), under general conditions (see Lemma 5 in the Appendix). In the cited papers, the IGLS was studied under the name of pseudo-best estimators. Unfortunately, no asymptotic normality results of the desired kind were established. It is notable that Kholevo (1971b) obtained an asymptotic normality result for the continuous-time least squares problem that is hypothesized to be transferable to the pseudo-best estimator case. However, no such result was provided, nor was a result regarding the discrete-time case.

Hybrid time and frequency-domain IGLS estimators have also been considered, as well as extensions upon the frequency-domain estimator theme. Examples of hybrid estimators include Samarov (1987) and Hambaba (1992).

Extensions of the results of Hannan (1973) to account for long-range dependence appear in Robinson & Hidalgo (1997) and Hidalgo & Robinson (2002). A non-linear frequency-domain estimator appears in Hannan (1971). A broad generalization of the frequency-domain estimation approach to semi-parametric and non-parametric modeling is considered by Robinson (1991).

Closely related to our article is the report of Aguero et al. (2010), which establishes the asymptotic equivalence between time and frequency-domain estimators for linear dynamic system identification problems. See Hannan & Deistler (2012) regarding linear dynamic systems.

Using the Cholesky covariance matrix factorization method of Wu & Pourahmadi (2003), Yang (2012) constructed an IGLS estimator that is asymptotically efficient. Furthermore, they obtained an asymptotic normality result, under [Gren1]–[Gren4], using a proof technique that is adapted from those of Anderson (1971, Thm. 10.2.7) and Fuller (1996, Thm. 9.1.2) (see Lemma 2 in the Appendix). A model averaging method akin to the construction of Yang (2012) was studied in Cheng et al. (2015), and a long-memory GLS estimator of the same form was considered by Ing et al. (2016).

Also related to our article is the work of Kapetanios & Psaradakis (2016), which proposed to extend the results of Amemiya (1973) in a different direction. Here, [Gren1]–[Gren4] are replaced by various stochastic assumptions on the sequences {x_t} and {U_t} that make use of mixing and stochastic approximation concepts, and higher-order moment bounds (see Potscher & Prucha, 1997, Ch. 6 regarding mixing and approximation concepts). Compared to our work, that of Kapetanios & Psaradakis (2016) can be seen as a complementary and parallel direction of generalization of the results of Amemiya (1973). Whereas we propose to relax [Amem1] and [Amem2], Kapetanios & Psaradakis (2016) replace [Gren1]–[Gren4], instead.

The remainder of the manuscript proceeds as follows. In Section 2, we state and prove our main result. Discussions and remarks are provided in Section 3. There, we also provide results regarding the practical case, where f_v is both hypothesized and estimated from the data. Necessary lemmas and technical results are presented in the Appendix.

## 2 Main result

We retain all notation from the introduction. Furthermore, for matrices A = [a_ij] ∈ ℝ^{m×n}, let

 ∥A∥2=sup∥x∥2=1∥Ax∥2

denote the operator norm, and let

 ∥A∥1=maxj∈[n]m∑i=1∣∣aij∣∣ and ∥A∥∞=maxi∈[m]n∑j=1∣∣aij∣∣,

denote the ℓ1- and ℓ∞-induced norms, respectively. For vectors x ∈ ℝ^m, we denote the Euclidean norm of x by ∥x∥2.

Make the following assumptions.

• [Main1] The elements of the error sequence {U_t} have the form

 Ut=∞∑i=−∞θiEt−i, (11)

where {E_t} is an independent sequence, such that

 E(Et)=0, var(Et)=σ2<∞, ∞∑i=−∞|θi|<∞, and 0<∣∣∞∑i=−∞θie−ιiω∣∣ for all ω∈[−π,π].
• [Main2] The random sequence {E_t} has distribution function F_{E_t}, which satisfies

 limδ→∞supt∈Z∫|e|>δe2dFEt(e)=0.
• [Main3] The SDF f_v is real, positive, continuous, and even over [−π, π].

• [Main4] The covariance expansion (5) of f_v satisfies

 ∞∑i=1i|ηi|<∞.
###### Lemma 1.

Under [Gren1]–[Gren4] and [Main1]–[Main4], the scaled covariance matrix S_T var[~β_T(f_v)] S_T approaches

 Cv =limT→∞(Z⊤TΛ−1TZT)−1Z⊤TΛ−1TΣTΛ−1TZT(Z⊤TΛ−1TZT)−1 (12) =2π[∫π−π1fv(ω)dH(ω)]−1∫π−πfu(ω)f2v(ω)dH(ω)[∫π−π1fv(ω)dH(ω)]−1,

as T → ∞.

###### Proof.

Following from Amemiya (1973), we write

 ST[~βT(fv)−β]=(Z⊤TΛ−1TZT)−1Z⊤TΛ−1TuT, (13)

and let w_T = Z⊤_T Λ^{−1}_T u_T. Under [Gren1]–[Gren4] and [Main3],

 limT→∞Z⊤TΛ−1TZT=12π∫π−π1fv(ω)dH(ω),

by Lemma 2. By Lemma 3, the limiting matrix ∫_{−π}^{π} f_v^{−1}(ω) dH(ω) is invertible and thus, for all sufficiently large T, (Z⊤_T Λ^{−1}_T Z_T)^{−1} exists. Thus, we have

 STvar[~βT(fv)]ST=(Z⊤TΛ−1TZT)−1Z⊤TΛ−1TΣTΛ−1TZT(Z⊤TΛ−1TZT)−1,

which has the limit, as T → ∞,

 (12π∫π−π1fv(ω)dH(ω))−1[limT→∞Z⊤TΛ−1TΣTΛ−1TZT](12π∫π−π1fv(ω)dH(ω))−1. (14)

Assumption [Main1] implies that f_u is real and positive, since {U_t} is an absolutely summable linear filter of the independent finite-variance sequence {E_t} (cf. Theorems 2.11 and 2.12 of Fan & Yao, 2003). Since f_v is positive and continuous by [Main3], and f_u is real, positive, and continuous by [Main1], we can apply Lemma 5 to obtain

 limT→∞STvar[~βT(fv)]ST=Cv. (15)

Upon substitution of (14) into the left-hand side (LHS) of (15) and rearrangement, we obtain

 limT→∞Z⊤TΛ−1TΣTΛ−1TZT=12π∫π−πfu(ω)f2v(ω)dH(ω), (16)

and have thus verified (12). ∎

###### Theorem 1.

Under [Gren1]–[Gren4] and [Main1]–[Main4],

 ST(~βT(fv)−β)L⟶N(0,Cv), (17)

where has the form (12).

###### Proof.

It suffices to show that w_T = Z⊤_T Λ^{−1}_T u_T is asymptotically normal with mean 0 and covariance matrix equal to the LHS of (16). First, write the truncated error vector υ_{T,N}, where

 υ⊤T,N=(N∑i=−NθiE1−i,N∑i=−NθiE2−i,…,N∑i=−NθiET−i−1,N∑i=−NθiET−i),

and N is a positive and increasing integer-valued function of T, such that N → ∞, as T → ∞. Let w_{T,N} = Z⊤_T Λ^{−1}_T υ_{T,N} = Z⊤_T Λ^{−1}_T Υ_{T,N} e_{T,N}, where Υ_{T,N} is a T × (T + 2N) matrix of the filter coefficients θ_{−N}, …, θ_N, and

 e⊤T,N=(E1−N,E2−N,…,ET+N−1,ET+N)

is a (T + 2N)-dimensional vector.

To apply Lemma 6, we must show that for each N, w_{T,N} converges in law to some w_N, as T → ∞, where w_N is asymptotically normal with mean 0 and covariance matrix (16), as N → ∞. Then, we must verify that

 limN→∞limsupT→∞ P(∥wT,N−wT∥2≥ε)=0, (18)

for each ε > 0.

For the purpose of applying the Cramér–Wold device, consider α⊤w_{T,N}, where α ∈ ℝ^d is arbitrary, such that ∥α∥₂ = 1. Let ν_t denote the t-th column of X⊤_T Λ^{−1}_T Υ_{T,N}, for t ∈ {1 − N, …, T + N}.

Therefore,

 α⊤wT,N=α⊤S−1TT+N∑t=1−NνtEt=σ[T+N∑t=1−N(α⊤S−1Tνt)2]1/2T+N∑t=1−NWt, (19)

where

 Wt=α⊤S−1Tνtσ[∑T+Nk=1−N(α⊤S−1Tνk)2]1/2Et.

By [Main1], σ² is bounded, and by [Gren4], the limiting matrix R(0) is non-singular. Further, by [Main3] and [Main4], we have the boundedness of ∥Λ^{−1}_T∥₁ and ∥Λ^{−1}_T∥_∞. Thus, we obtain the inequalities

 0<c≤T+N∑t=1−N(α⊤S−1Tνt)2≤C<∞. (20)

The last fact follows from an application of Lemma 3, and all of the bounds are independent of T and N.

Observe that {W_t} is a sequence of independent random variables with expectation E(W_t) = 0 and ∑_{t=1−N}^{T+N} var(W_t) = 1. Let F_{W_t} be the distribution function of W_t, for each t. Then, for any δ > 0, we have the bound:

 T+N∑t=1−N∫|w|>δw2dFWt(w) =1σ2T+N∑t=1−N(α⊤S−1Tνt)2∑T+Nk=1−N(α⊤S−1Tνk)2∫|e|>δetT,Ne2dFEt(e) (21)

where

 etT,N=σ[∑T+Nk=1−N(α⊤S−1Tνk)2]1/2/∣∣α⊤S−1Tνt∣∣, and e∗T,N=σ[∑T+Nt=1−N(α⊤S−1Tνt)2]1/2/supt∈{1−N,…,T+N}∣∣α⊤S−1Tνt∣∣.

By the bound in (20) and [Main2], we must show that

 supt∈{1−N,…,T+N}∣∣α⊤S−1Tνt∣∣→0, (22)

to prove that (21) converges to zero, as T → ∞. Write the element in the j-th row and t-th column of Υ_{T,N} and the element in the j-th row and k-th column of Λ^{−1}_T as Υ_{jt} and λ_{|k−j|}, respectively. Upon expansion, we can obtain the following inequalities for the LHS of (22):

 supt∈{1−N,…,T+N}∣∣α⊤S−1Tνt∣∣ ≤1σ2supt∣∣ ∣∣d∑i=1αisiTT∑j=1ΥjtT∑k=1λ|k−j|xik∣∣ ∣∣ ≤Cmaxi∈[d]maxk∈[T]|xik|siTsuptT∑j=1∣∣Υjt∣∣∞∑k=0|λk|,

where is some finite constant.

By [Main1], the coefficients {θ_i} are absolutely summable, independently of T and N. Therefore, for any t and N, ∑_{j=1}^{T} |Υ_{jt}| ≤ ∑_{i=−∞}^{∞} |θ_i| < ∞. Similarly, by Assumptions [Main3] and [Main4], we apply Corollary 1 to show that ∑_{k=0}^{∞} |λ_k| < ∞, independently of T, and thus both sums on the right-hand side are bounded. Lastly,

 limT→∞maxi∈[d]maxk∈[T]|xik|siT=0,

by [Gren1], [Gren2], and Anderson (1971, Lem. 2.6.1). Thus (22) is proved.

Next, (22) is sufficient to guarantee that (21) approaches zero, as T approaches infinity. We can apply the Lindeberg–Feller central limit theorem (DasGupta, 2008, Thm. 5.1) to obtain ∑_{t=1−N}^{T+N} W_t ⟶ᴸ N(0, 1).

Via (19), α⊤w_{T,N} is asymptotically equal in distribution to α⊤w_N (as T → ∞), where, for any choice of α, α⊤w_N is normal with mean zero and variance

 limT→∞α⊤E(wT,Nw⊤T,N)α.

Via the Cramér–Wold device (cf. DasGupta, 2008, Thm. 1.16), w_{T,N} is asymptotically normal with mean vector 0 and covariance matrix

 E(wNw⊤N) =limT→∞Z⊤TΛ−1TΣT,NΛ−1TZT =12π∫π−πfu,N(ω)f2v(ω)dH(ω), (23)

using Lemma 5, where f_{u,N} is the SDF corresponding to the truncated process with t-th term ∑_{i=−N}^{N} θ_i E_{t−i}. In other words, w_N is normally distributed with zero mean vector and covariance matrix (23).

By [Main1], f_{u,N} is bounded and f_{u,N} converges uniformly to f_u, as N → ∞, via the power transfer formula (cf. Lemma 4). Furthermore, since f_v is positive and continuous, and by preservation of uniform convergence under continuous composition (Bartle & Joichi, 1961), f_{u,N}/f_v² converges uniformly to f_u/f_v², as N approaches infinity (cf. Gray, 2006, Sec. 4.1). Via the Cramér–Wold device, w_N converges in law to w (as N → ∞), where w has mean vector 0 and covariance matrix

 E(ww⊤) =limT→∞Z⊤TΛ−1TΣNΛ−1TZT =12π∫π−πfu(ω)f2v(ω)dH(ω), (24)

which is equal to (16).

Finally, we must verify (18). We use Chebyshev's inequality, which states that, for any ε > 0,

 P(∥wT,N−wT∥2≥ε) ≤E(∥wT,N−wT∥22)ε2, (25)

where we write the numerator of the right-hand side of (25) as

 E tr[(Z⊤TΛ−1TυT,N−Z⊤TΛ−1TuT)(Z⊤TΛ−1TυT,N−Z⊤TΛ−1TuT)⊤] = tr{Z⊤TΛ−1TE[(υT,N−uT)(υT,N−uT)⊤]Λ−1TZT},

which reduces to

 tr{Z⊤TΛ−1TE[uTu⊤T]Λ−1TZT}−tr{Z⊤TΛ−1TE[υT,Nυ⊤T,N]Λ−1TZT}.

By (16), (23), and (24), we have

 limN→∞limT→∞(tr{Z⊤TΛ−1TE[uTu⊤T]Λ−1TZT}−tr{Z⊤TΛ−1TE[υT,Nυ⊤T,N]Λ−1TZT})=0.

Thus, by (25), condition (18) is verified. This completes the proof. ∎

## 3 Discussions and remarks

### 3.1 Notes regarding the assumptions of Theorem 1

We can directly compare [Main1] to [Hann1]. It is notable that [Hann1] is more general than [Main1], since it allows {U_t} to be a linear filter over a martingale difference sequence {E_t}, satisfying E(E_t|F_{t−1}) = 0, almost surely. Further, the independence condition on {E_t} is necessitated so that Lemma 5 can be applied. It is remarked in Amemiya (1973), however, that [Main1] and [Main2] are more general than [Amem1].

The addition of [Main4] is the key that facilitates the proof. This assumption is necessary for bounding ∥Λ^{−1}_T∥₁ and ∥Λ^{−1}_T∥_∞, which is required to prove (17). It must be remarked that [Main4] is a common condition in the literature and has been made in similar proof methods, such as those of Cheng et al. (2015). The assumption is not restrictive, since a broad class of short-memory processes satisfies [Main4]. For example, any stationary autoregressive moving average (ARMA) process will satisfy [Main4] (cf. Fan & Yao, 2003, Sec. 2.5). The assumption is also commonly used in the analysis of unit root processes (see, e.g., Hamilton, 1994, Sec. 17.5).
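For instance (our own numerical check, not from the paper), a stationary AR(1) working model with coefficient φ ∈ (0, 1) has autocovariances η_i = σ²φ^{|i|}/(1 − φ²), so the sum in [Main4] is finite and can be matched against the closed form ∑_{i≥1} iφ^i = φ/(1 − φ)²:

```python
import numpy as np

# Numerical check of [Main4] for a stationary AR(1) working model:
# eta_i = s2 * phi**i / (1 - phi**2), so sum_{i>=1} i*|eta_i| is finite.
phi, s2 = 0.7, 1.0
i = np.arange(1, 10_000)
eta = s2 * phi**i / (1 - phi**2)
partial = np.sum(i * np.abs(eta))
closed_form = s2 / (1 - phi**2) * phi / (1 - phi)**2
print(partial, closed_form)
```

The partial sum agrees with the closed form to machine precision, since the geometric tail beyond the truncation point is negligible.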

### 3.2 Feasible generalized least squares

Generally, the SDF f_v is unknown and must be estimated from data. Suppose that f̂_{v,T} is an estimator of f_v, which is indexed by the sample size T. Denote the feasible GLS (FGLS) estimator of β by ~β_T(f̂_{v,T}). For the FGLS estimator to be of use, we require that it has the same asymptotic distribution as ~β_T(f_v). To this end, it is sufficient to show that

 ST[~βT(ˆfv,T)−~βT(fv)] P⟶ 0, (26)

where ⟶ᴾ denotes convergence in probability. Denote the auto-covariance matrix corresponding to the estimator f̂_{v,T} as Λ̂_T. Then, we may write ~β_T(f̂_{v,T}) = ~β_T(Λ̂_T).

Under [Gren1]–[Gren4] and [Amem1], Amemiya (1973, Thm. 2) proved that if f̂_{v,T} is the SDF of an AR process of order N, satisfying [Amem2], then (26) holds, when f̂_{v,T} is obtained via the OLS estimator for the AR model coefficients (cf. Amemiya, 1985, Sec. 5.4). The argument from Amemiya (1973, Thm. 2) would hold whenever f̂_{v,T} is obtained via any consistent estimator of the AR model coefficients.

The proof of the theorem also remains the same upon replacing [Amem1] by [Main1] and [Main2], and noting that [Amem2] is implied by [Main3] and [Main4]. Thus, under the hypothesis of Theorem 1, if f_v is hypothesized to be the SDF of a stationary AR process of order N (i.e., satisfying [Amem2]), then (26) holds, where f̂_{v,T} is obtained via any consistent estimator of the AR coefficients.
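A minimal FGLS sketch under an AR(1) working model follows (our own illustration, not code from the paper; the lag-one regression of OLS residuals is one of many consistent estimators of the AR coefficient, and the regressors are drawn once and treated as fixed).

```python
import numpy as np

def fgls_ar1(X, y):
    """FGLS sketch: estimate an AR(1) coefficient from OLS residuals,
    build the implied Toeplitz covariance, then evaluate (3) at it."""
    T = len(y)
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta_ols
    # Consistent AR(1) coefficient estimate from a lag-one regression.
    kappa = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    Lam = kappa**lags / (1.0 - kappa**2)   # scale factor cancels in (3)
    A = X.T @ np.linalg.solve(Lam, X)
    return np.linalg.solve(A, X.T @ np.linalg.solve(Lam, y)), kappa

rng = np.random.default_rng(3)
T, beta = 400, np.array([0.5, 2.0])
X = np.column_stack([np.ones(T), rng.normal(size=T)])
U = np.zeros(T)
for t in range(1, T):
    U[t] = 0.6 * U[t - 1] + rng.normal()
y = X @ beta + U
beta_fgls, kappa_hat = fgls_ar1(X, y)
print(beta_fgls, kappa_hat)
```

Here the AR(1) hypothesis happens to be correct, so κ̂ approaches the true coefficient 0.6; under misspecification the same construction yields a feasible IGLS estimator.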

It is notable that proving that (26) holds under [Amem2] is permissive due to the fact that the inverse auto-covariance matrix Λ^{−1}_T has a banded Toeplitz form (cf. Verbyla, 1985). We conjecture that it is possible to obtain similar results using the same techniques as those from Amemiya (1973), when f_v belongs to any parametric family of SDFs with banded Toeplitz inverse auto-covariance matrices Λ^{−1}_T. However, the proof of such a result is beyond the scope of the current paper.
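The banded structure invoked above can be checked numerically in the AR(1) case (our own illustration): the inverse of the Toeplitz AR(1) covariance matrix is tridiagonal, up to floating-point rounding.

```python
import numpy as np

# For an AR(1) working model, the inverse of the Toeplitz covariance
# Lambda_T is banded (tridiagonal).
T, kappa = 6, 0.5
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Lam = kappa**lags / (1.0 - kappa**2)
P = np.linalg.inv(Lam)
# Entries more than one band off the diagonal vanish (up to rounding).
mask = np.abs(np.subtract.outer(np.arange(T), np.arange(T))) > 1
off_band = P[mask]
print(np.max(np.abs(off_band)))
```

This tridiagonal precision structure is what makes the feasibility argument tractable for AR working models.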

### 3.3 Further comments regarding the frequency-domain IGLS estimator

We note that Hannan (1973) proved a more general result than that which we reported in Section 1. Consider the following conditions.

• The sequence