Asymptotic normality of the time-domain generalized least squares estimator for linear regression models
Abstract
In linear models, the generalized least squares (GLS) estimator is applicable when the structure of the error dependence is known. When it is unknown, such structure must be approximated and estimated in a manner that may lead to misspecification. The large-sample analysis of incorrectly specified GLS (IGLS) estimators requires careful asymptotic manipulations. When performing estimation in the frequency domain, the asymptotic normality of the IGLS estimator, under the so-called Grenander assumptions, has been proved for a broad class of error dependence models. Under the same assumptions, asymptotic normality results for the time-domain IGLS estimator are only available for a limited class of error structures. We prove that the time-domain IGLS estimator is asymptotically normal for a general class of dependence models.
Keywords: asymptotic normality; autoregressive models; generalized least squares; misspecification; time-series analysis
1 Introduction
Let $\{\boldsymbol{x}_t\}$ be a non-stochastic sequence of vectors, such that $\boldsymbol{x}_t = (x_{t1},\dots,x_{tp})^\top$, where $t \in \{1,\dots,n\}$, $\boldsymbol{x}_t \in \mathbb{R}^p$, and $[\cdot]^\top$ is the matrix transposition operator. Let $\{e_t\}$ be a sequence of random errors. We are interested in the sequence $\{y_t\}$ that is generated by the relationship
(1) $y_t = \boldsymbol{x}_t^\top \boldsymbol{\beta} + e_t$,
where $\boldsymbol{\beta} \in \mathbb{R}^p$ is a vector of non-stochastic regression coefficients. We can write relationship (1) in the matrix form:
$\boldsymbol{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{e}$,
simultaneously for all $t$, where $\mathbf{X}$ has rows $\boldsymbol{x}_t^\top$, $\boldsymbol{y} = (y_1,\dots,y_n)^\top$, and $\boldsymbol{e} = (e_1,\dots,e_n)^\top$.
Relationships that are described by (1) are generally referred to as multiple linear regression models and are ubiquitous in the study of engineering, natural science, and social science phenomena (see, e.g., Weisberg, 2005). For general treatments of the topic of linear regression modeling, we refer the interested reader to the manuscripts of Gross (2003), Seber & Lee (2003), and Yan & Su (2009).
In this article, we consider the scenario where the sequence of errors $\{e_t\}_{t=1}^n$ is a finite segment of the stationary sequence $\{e_t\}_{t\in\mathbb{Z}}$, such that
$e_t = \sum_{j=0}^\infty \psi_j \varepsilon_{t-j}$
for each $t \in \mathbb{Z}$, where $\{\varepsilon_t\}$ is an independent sequence of random variables, such that $\mathrm{E}(\varepsilon_t) = 0$ and $\mathrm{E}(\varepsilon_t^2) = \sigma^2$. If the sequence of coefficients $\{\psi_j\}$ is known, then one can write the covariance matrix of $\boldsymbol{e}$: $\boldsymbol{\Sigma} = \mathrm{E}(\boldsymbol{e}\boldsymbol{e}^\top)$, and use it to construct the so-called generalized least squares (GLS) estimator
(2) $\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\top\boldsymbol{\Sigma}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\top\boldsymbol{\Sigma}^{-1}\boldsymbol{y},$
which is known to be the best linear unbiased estimator (BLUE) of $\boldsymbol{\beta}$ (cf. Amemiya, 1985, Sec. 6.1.3). Furthermore, Theorem 1 of Baltagi (2002, Ch. 9) states that, under the condition that the limit of the suitably normalized matrix $\mathbf{X}^\top\boldsymbol{\Sigma}^{-1}\mathbf{X}$ exists, $\hat{\boldsymbol{\beta}}$ is asymptotically normal in the sense that
$\left(\mathbf{X}^\top\boldsymbol{\Sigma}^{-1}\mathbf{X}\right)^{1/2}\left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \rightsquigarrow \mathrm{N}\left(\mathbf{0}, \mathbf{I}\right),$
where we denote convergence in law by $\rightsquigarrow$, and $\mathrm{N}(\boldsymbol{\mu}, \boldsymbol{\Omega})$ denotes a normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Omega}$.
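As a concrete numerical illustration of the estimator in (2), the following sketch (in Python with NumPy) computes the GLS estimate directly from a design matrix, response vector, and known error covariance matrix. The AR(1)-type covariance, variable names, and noiseless toy data here are illustrative assumptions, not constructions from the text.

```python
import numpy as np

def gls(X, y, Sigma):
    """GLS estimator: solve (X' Sigma^{-1} X) b = X' Sigma^{-1} y."""
    Si = np.linalg.inv(Sigma)
    return np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Toy problem: n = 5, p = 2, with an AR(1)-type covariance rho^|i-j|.
n = 5
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
rho = 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
beta = np.array([1.0, 2.0])
y = X @ beta  # noiseless response, so GLS recovers beta exactly
beta_hat = gls(X, y, Sigma)
```

In practice, one would solve the linear systems involving $\boldsymbol{\Sigma}$ rather than forming its explicit inverse; the inverse above simply mirrors the formula in (2).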
In applications, the filter coefficients are rarely, if ever, known. Thus, the data generating process (DGP) of the error sequence $\{e_t\}$ is also unknown. In order to proceed with inference, one generally assumes a hypothetical DGP for $\{e_t\}$ that is equivalent to that of some error sequence $\{\tilde{e}_t\}$.
Let $\{\tilde{e}_t\}_{t=1}^n$ be a finite segment of $\{\tilde{e}_t\}_{t\in\mathbb{Z}}$, and let $\tilde{\boldsymbol{e}} = (\tilde{e}_1,\dots,\tilde{e}_n)^\top$. Furthermore, write the covariance matrix as $\tilde{\boldsymbol{\Sigma}} = \mathrm{E}(\tilde{\boldsymbol{e}}\tilde{\boldsymbol{e}}^\top)$. Then, by replacing $\boldsymbol{\Sigma}$ in (2) by $\tilde{\boldsymbol{\Sigma}}$, we obtain the so-called incorrectly specified GLS (IGLS; Koreisha & Fang, 2001) estimator
(3) $\tilde{\boldsymbol{\beta}} = \left(\mathbf{X}^\top\tilde{\boldsymbol{\Sigma}}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^\top\tilde{\boldsymbol{\Sigma}}^{-1}\boldsymbol{y}.$
The finite-sample properties of (3) were studied comprehensively in Koreisha & Fang (2001) and Kariya & Kurata (2004). General asymptotic results for the IGLS estimator are more difficult to establish, and thus only a small number of results are available in the literature. For example, Rothenberg (1984) considered asymptotic normality of the IGLS estimator when the hypothesized error process has first-order autoregressive (AR) form. Some consistency results regarding the IGLS estimator appear in Samarov (1987) and Koreisha & Fang (2001). To date, the most general set of asymptotic theorems regarding the IGLS estimator are those reported in Amemiya (1973).
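To make the misspecification concrete, the sketch below (illustrative Python; the MA(1) truth, the AR(1) working covariance, and all names are assumptions for demonstration only) computes the IGLS estimate (3) when the true errors are MA(1) but the working covariance is that of an AR(1) process.

```python
import numpy as np

def igls(X, y, Sigma_tilde):
    """IGLS: the GLS formula with a (possibly misspecified) working covariance."""
    Si = np.linalg.inv(Sigma_tilde)
    return np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
beta = np.array([1.0, -2.0])

# True errors: MA(1). Working model: AR(1) -- a deliberate misspecification.
eps = rng.standard_normal(n + 1)
e = eps[1:] + 0.6 * eps[:-1]
y = X @ beta + e

rho = 0.4  # hypothesized AR(1) coefficient for the working covariance
Sigma_tilde = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
beta_tilde = igls(X, y, Sigma_tilde)  # remains near beta despite misspecification
```

Even with the wrong dependence family, the weighted least squares fit remains a sensible estimate; the asymptotic results discussed in this article quantify this phenomenon.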
Make the so-called Grenander regularity conditions (Grenander, 1954):

[Gren1] $d_{n,k}^2 \to \infty$ as $n \to \infty$, where $d_{n,k}^2 = \sum_{t=1}^n x_{tk}^2$, for each $k \in \{1,\dots,p\}$,

[Gren2] $x_{n+1,k}^2 / d_{n,k}^2 \to 0$ as $n \to \infty$, for each $k \in \{1,\dots,p\}$,

[Gren3] $\rho_{kl}(h) = \lim_{n\to\infty} \sum_{t=1}^{n-h} x_{tk}\,x_{t+h,l} / (d_{n,k}\,d_{n,l})$ exists for each $h \in \{0,1,2,\dots\}$ and $k,l \in \{1,\dots,p\}$, and

[Gren4] the matrix $\mathbf{R}(0)$ is nonsingular, where $\mathbf{R}(h)$ has element $\rho_{kl}(h)$ in the $k$th row and $l$th column.
Furthermore, make the following additional assumptions.

The elements of $\{e_t\}$ have the form
$e_t = \sum_{j=0}^\infty \psi_j \varepsilon_{t-j}$,
where $\{\varepsilon_t\}$ is a sequence of independent random variables such that $\mathrm{E}(\varepsilon_t) = 0$ and $\mathrm{E}(\varepsilon_t^2) = \sigma^2$, and $\{\psi_j\}$ is such that $\sum_{j=0}^\infty |\psi_j| < \infty$ and $\psi(z) \neq 0$ for $|z| \le 1$, where $\psi(z) = \sum_{j=0}^\infty \psi_j z^j$ and $z \in \mathbb{C}$.

The sequence $\{\tilde e_t\}$ is hypothesized to be equivalent to the stationary autoregressive (AR) process of order $p_0$, with $t$th term of the form
$\tilde e_t = \sum_{j=1}^{p_0} \phi_j \tilde e_{t-j} + \tilde\varepsilon_t$,
where $\{\tilde\varepsilon_t\}$ is a sequence of independent random variables such that $\mathrm{E}(\tilde\varepsilon_t) = 0$ and $\mathrm{E}(\tilde\varepsilon_t^2) = \tilde\sigma^2$, and $\phi_1,\dots,\phi_{p_0}$ are such that the roots of $1 - \phi_1 z - \dots - \phi_{p_0} z^{p_0} = 0$ (with respect to $z$) are all outside of the unit circle.
Here, $i$ denotes the imaginary unit. See Amemiya (1985, Ch. 5.2) for more details regarding AR processes.
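The stationarity requirement in [Amem2], that all roots of the AR polynomial lie outside the unit circle, can be checked numerically. The following sketch (illustrative Python; the function name and the test coefficients are ours) evaluates the roots of the polynomial $1 - \phi_1 z - \dots - \phi_{p_0} z^{p_0}$.

```python
import numpy as np

def ar_is_stationary(phi):
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    # np.roots expects coefficients ordered from the highest degree to the constant.
    coeffs = np.concatenate([-np.asarray(phi, dtype=float)[::-1], [1.0]])
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.5]))        # AR(1) with phi = 0.5: root at z = 2 -> True
print(ar_is_stationary([1.2]))        # root at z = 1/1.2, inside the unit circle -> False
print(ar_is_stationary([0.5, -0.3]))  # AR(2) example with complex roots -> True
```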
Let $\mathbf{D}_n$ be the diagonal matrix with $k$th diagonal element $d_{n,k}$, for $k \in \{1,\dots,p\}$. Further, let $\omega \in [-\pi, \pi]$ and define $\mathbf{M}(\omega)$ to be a Hermitian matrix function with positive semidefinite increments such that $\mathbf{R}(h) = \int_{-\pi}^{\pi} e^{ih\omega}\,\mathrm{d}\mathbf{M}(\omega)$. Under [Gren1]–[Gren4], [Amem1], and [Amem2], Amemiya (1973) proved that
$\mathbf{D}_n\left(\tilde{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \rightsquigarrow \mathrm{N}\left(\mathbf{0}, \mathbf{V}\right),$
where $\mathbf{V}$ denotes the asymptotic covariance matrix. Furthermore, Amemiya (1973) showed that if we denote the spectral density functions (SDFs) of the processes $\{e_t\}$ and $\{\tilde e_t\}$ by $f$ and $\tilde f$, respectively, then we may write $\mathbf{V}$ in the spectral form
(4) $\mathbf{V} = \left[\int_{-\pi}^{\pi} \tilde f^{-1}(\omega)\,\mathrm{d}\mathbf{M}(\omega)\right]^{-1} \left[\int_{-\pi}^{\pi} f(\omega)\,\tilde f^{-2}(\omega)\,\mathrm{d}\mathbf{M}(\omega)\right] \left[\int_{-\pi}^{\pi} \tilde f^{-1}(\omega)\,\mathrm{d}\mathbf{M}(\omega)\right]^{-1}.$
It is remarkable that the IGLS is a time-domain estimator that can be proved to have a covariance matrix with simple spectral form.
Since the SDF of $\{\tilde e_t\}$ can be written as
(5) $\tilde f(\omega) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} \tilde\gamma(h)\, e^{-ih\omega},$
where $\tilde\gamma(h) = \mathrm{E}(\tilde e_t \tilde e_{t+h})$ and $\omega \in [-\pi, \pi]$, for each $h \in \mathbb{Z}$, we can write
(6) $\tilde{\boldsymbol{\Sigma}} = \left[\tilde\gamma(i - j)\right]_{i,j=1}^n.$
Thus, since the autocovariance $\tilde\gamma$ is determined by (5), we may interchange the notation $\tilde{\boldsymbol{\Sigma}}$ with $\tilde{\boldsymbol{\Sigma}}(\tilde f)$. We shall refer to $\tilde{\boldsymbol{\beta}}$ as the time-domain IGLS estimator in order to differentiate it from the frequency-domain IGLS estimator that is introduced in the sequel.
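The correspondence between an SDF and its autocovariances can be checked numerically: integrating $e^{ih\omega}\tilde f(\omega)$ over $[-\pi,\pi]$ recovers $\tilde\gamma(h)$. The sketch below (illustrative Python; the AR(1) closed forms used for validation are textbook facts, not taken from the text) verifies this for an AR(1) process, whose autocovariances are $\gamma(h) = \sigma^2\phi^{|h|}/(1-\phi^2)$.

```python
import numpy as np

def ar1_sdf(omega, phi, sigma2=1.0):
    """SDF of a stationary AR(1): sigma^2 / (2 pi |1 - phi e^{-i omega}|^2)."""
    return sigma2 / (2.0 * np.pi * np.abs(1.0 - phi * np.exp(-1j * omega)) ** 2)

def autocov_from_sdf(f, h, m=4097):
    """gamma(h) = int_{-pi}^{pi} e^{i h omega} f(omega) d omega, by the trapezoid rule."""
    omega = np.linspace(-np.pi, np.pi, m)
    vals = np.real(np.exp(1j * h * omega) * f(omega))
    d = omega[1] - omega[0]
    return d * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

phi = 0.5
g0 = autocov_from_sdf(lambda w: ar1_sdf(w, phi), 0)  # gamma(0) = 1/(1 - phi^2) = 4/3
g1 = autocov_from_sdf(lambda w: ar1_sdf(w, phi), 1)  # gamma(1) = phi * gamma(0) = 2/3
```

The trapezoid rule converges very quickly here because the integrand is smooth and periodic over a full period.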
Let
(7) $I_{\boldsymbol{x}\boldsymbol{x}}(\omega) = \frac{1}{2\pi n}\left[\sum_{t=1}^n \boldsymbol{x}_t e^{it\omega}\right]\left[\sum_{t=1}^n \boldsymbol{x}_t e^{-it\omega}\right]^\top$
and
(8) $I_{\boldsymbol{x}y}(\omega) = \frac{1}{2\pi n}\left[\sum_{t=1}^n \boldsymbol{x}_t e^{it\omega}\right]\left[\sum_{t=1}^n y_t e^{-it\omega}\right]$
be the periodogram of $\{\boldsymbol{x}_t\}$ and the cross-spectral periodogram between $\{\boldsymbol{x}_t\}$ and $\{y_t\}$, respectively (see, e.g., Brockwell & Davis, 2006, Sec. 11.7). Using (7) and (8), Hannan (1973) proposed to estimate $\boldsymbol{\beta}$ by the frequency-domain IGLS estimator
(9) $\check{\boldsymbol{\beta}} = \left[\sum_{j=0}^{n-1} I_{\boldsymbol{x}\boldsymbol{x}}(\omega_j)\,\tilde f^{-1}(\omega_j)\right]^{-1}\sum_{j=0}^{n-1} I_{\boldsymbol{x}y}(\omega_j)\,\tilde f^{-1}(\omega_j),$
where $\omega_j = 2\pi j/n$ are the Fourier frequencies.
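Structurally, the frequency-domain estimator weights periodogram ordinates at the Fourier frequencies by the reciprocal of the hypothesized SDF. The following Python sketch (our own minimal implementation, not Hannan's code; normalizing constants are dropped since they cancel between the two sums) illustrates this. With a flat weight, corresponding to a white-noise hypothesis, the estimator reduces to OLS by Parseval's theorem.

```python
import numpy as np

def freq_igls(X, y, f_tilde):
    """Frequency-domain IGLS sketch: weight DFT cross-products by 1/f_tilde.

    f_tilde holds the hypothesized SDF evaluated at the n Fourier frequencies.
    """
    dX = np.fft.fft(X, axis=0)  # DFTs of the columns of X
    dy = np.fft.fft(y)
    w = 1.0 / f_tilde
    A = (dX.conj().T * w) @ dX  # ~ sum_j I_xx(omega_j) / f_tilde(omega_j)
    b = (dX.conj().T * w) @ dy  # ~ sum_j I_xy(omega_j) / f_tilde(omega_j)
    return np.real(np.linalg.solve(A, b))

rng = np.random.default_rng(1)
n = 128
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([2.0, 1.0]) + rng.standard_normal(n)

beta_freq = freq_igls(X, y, np.ones(n))          # flat SDF hypothesis
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # coincides with OLS
```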
Let $\mathcal{F}_t$ be the $\sigma$-algebra that is generated by the sequence $\{\varepsilon_s : s \le t\}$, and make the following assumptions.

The random sequence $\{e_t\}$ has the form $e_t = \sum_{j=0}^\infty \psi_j \varepsilon_{t-j}$, satisfying
$\mathrm{E}\left(\varepsilon_t \mid \mathcal{F}_{t-1}\right) = 0 \text{ and } \mathrm{E}\left(\varepsilon_t^2 \mid \mathcal{F}_{t-1}\right) = \sigma^2$
almost surely, and $\sum_{j=0}^\infty \psi_j^2 < \infty$.

The random sequence $\{\varepsilon_t\}$ has distribution function $F_t$, for each $t$, which satisfies
$\lim_{c \to \infty} \sup_{t} \int_{|x| > c} x^2 \,\mathrm{d}F_t(x) = 0.$

The SDF $\tilde f$ is real, positive, continuous, and even over $[-\pi, \pi]$.
Under [Gren1]–[Gren4] and [Hann1]–[Hann3], Hannan (1973) proved that
(10) $\mathbf{D}_n\left(\check{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \rightsquigarrow \mathrm{N}\left(\mathbf{0}, \mathbf{V}\right),$
where $\mathbf{V}$ has form (4) (see also Robinson & Velasco, 1997). That is, (3) and (9) have the same asymptotic distribution when either [Amem1] and [Amem2], or [Hann1]–[Hann3], are satisfied, in addition to [Gren1]–[Gren4].
It is notable, however, that [Hann1]–[Hann3] are more general assumptions than [Amem1] and [Amem2]. Thus, an obvious question to ask is whether the equivalence in asymptotic distributions between the time-domain estimator (3) and the frequency-domain estimator (9) remains when one replaces [Amem1] and [Amem2] by assumptions that are more general and closer in spirit to [Hann1]–[Hann3]. In this article, we provide an affirmative answer to this question. Before presenting our main result, we wish to provide a review of the relevant literature.
The asymptotic covariance form (4) was used by Engle (1974b) and Nicholls & Pagan (1977) to explore the efficiencies of the OLS estimator, the BLUE, and the IGLS, under various choices of $f$ and $\tilde f$. Some finite-sample properties of the frequency-domain estimator were established in Engle (1974a) and Engle & Gardner (1976).
Under [Gren1]–[Gren4], the IGLS asymptotic covariance (4) was obtained via spectral methods in Kholevo (1969), Rozanov & Kozlov (1969), Kholevo (1971a), and Ibragimov & Rozanov (1978, Sec. 7.4), under general conditions (see Lemma 5 in the Appendix). In the cited papers, the IGLS was studied under the name of pseudo-best estimators. Unfortunately, no asymptotic normality result of the desired kind was established. It is notable that Kholevo (1971b) obtained an asymptotic normality result for the continuous-time least squares problem that is hypothesized to be transferable to the pseudo-best estimator case. However, no such result was provided, nor was a result given regarding the discrete-time case.
Hybrid time- and frequency-domain IGLS estimators have also been considered, as well as extensions of the frequency-domain estimator theme. Examples of hybrid estimators include Samarov (1987) and Hambaba (1992).
Extensions of the results of Hannan (1973) to account for long-range dependence appear in Robinson & Hidalgo (1997) and Hidalgo & Robinson (2002). A nonlinear frequency-domain estimator appears in Hannan (1971). A broad generalization of the frequency-domain estimation approach to semiparametric and nonparametric modeling is considered by Robinson (1991).
Closely related to our article is the report of Aguero et al. (2010), which establishes the asymptotic equivalence between time- and frequency-domain estimators for linear dynamic system identification problems. See Hannan & Deistler (2012) regarding linear dynamic systems.
Using the Cholesky covariance matrix factorization method of Wu & Pourahmadi (2003), Yang (2012) constructed an IGLS estimator that is asymptotically efficient. Furthermore, they obtained an asymptotic normality result, under [Gren1]–[Gren4], using a proof technique that is adapted from those of Anderson (1971, Thm. 10.2.7) and Fuller (1996, Thm. 9.1.2) (see Lemma 2 in the Appendix). A model averaging method akin to the construction of Yang (2012) was studied in Cheng et al. (2015), and a long memory GLS estimator of the same form was considered by Ing et al. (2016).
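Cholesky-based constructions of this kind rest on a simple identity: with $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^\top$, GLS equals OLS applied to the whitened data $\mathbf{L}^{-1}\mathbf{X}$ and $\mathbf{L}^{-1}\boldsymbol{y}$. The following sketch illustrates that generic idea (it is our own whitening demonstration, not Wu & Pourahmadi's or Yang's specific construction).

```python
import numpy as np

def gls_via_cholesky(X, y, Sigma):
    """GLS via whitening: with Sigma = L L', run OLS on L^{-1} X and L^{-1} y."""
    L = np.linalg.cholesky(Sigma)
    Xw = np.linalg.solve(L, X)  # whitened design
    yw = np.linalg.solve(L, y)  # whitened response
    return np.linalg.lstsq(Xw, yw, rcond=None)[0]

n = 50
t = np.arange(n)
X = np.column_stack([np.ones(n), t / n])
Sigma = 0.7 ** np.abs(np.subtract.outer(t, t))
rng = np.random.default_rng(3)
y = X @ np.array([1.0, -1.0]) + rng.standard_normal(n)

b_chol = gls_via_cholesky(X, y, Sigma)
Si = np.linalg.inv(Sigma)
b_direct = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)  # agrees with b_chol
```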
Also related to our article is the work of Kapetanios & Psaradakis (2016), which proposed to extend the results of Amemiya (1973) in a different direction. There, [Gren1]–[Gren4] are replaced by various stochastic assumptions on the regressor and error sequences that make use of mixing and stochastic approximation concepts, and higher-order moment bounds (see Potscher & Prucha, 1997, Ch. 6 regarding mixing and approximation concepts). Compared to our work, that of Kapetanios & Psaradakis (2016) can be seen as a complementary and parallel direction of generalization of the results of Amemiya (1973). Whereas we propose to relax [Amem1] and [Amem2], Kapetanios & Psaradakis (2016) instead replace [Gren1]–[Gren4].
The remainder of the manuscript proceeds as follows. In Section 2, we state and prove our main result. Discussions and remarks are provided in Section 3, where we also provide results regarding the practical case, in which the hypothesized SDF is estimated from the data. Necessary lemmas and technical results are presented in the Appendix.
2 Main result
We retain all notation from the introduction. Furthermore, for matrices $\mathbf{A} = [a_{ij}] \in \mathbb{R}^{m \times n}$, let
$\|\mathbf{A}\| = \sup_{\boldsymbol{v} \neq \mathbf{0}} \|\mathbf{A}\boldsymbol{v}\| / \|\boldsymbol{v}\|$
denote the operator norm, and let
$\|\mathbf{A}\|_1 = \max_{1\le j\le n} \sum_{i=1}^m |a_{ij}| \text{ and } \|\mathbf{A}\|_\infty = \max_{1\le i\le m} \sum_{j=1}^n |a_{ij}|$
denote the $1$- and $\infty$-induced norms, respectively. For vectors $\boldsymbol{v} \in \mathbb{R}^m$, we denote the Euclidean norm of $\boldsymbol{v}$ by $\|\boldsymbol{v}\|$.
Make the following assumptions.

The $t$th element of the error sequence $\{e_t\}$ has form
(11) $e_t = \sum_{j=0}^\infty \psi_j \varepsilon_{t-j}$,
and $\{\varepsilon_t\}$ is an independent sequence, where $\mathrm{E}(\varepsilon_t) = 0$, $\mathrm{E}(\varepsilon_t^2) = \sigma^2$, and $\sum_{j=0}^\infty |\psi_j| < \infty$.

The random sequence $\{\varepsilon_t\}$ has distribution function $F_t$, for each $t$, which satisfies
$\lim_{c \to \infty} \sup_{t} \int_{|x| > c} x^2 \,\mathrm{d}F_t(x) = 0.$

The SDF $\tilde f$ is real, positive, continuous, and even over $[-\pi, \pi]$.

The covariance expansion (5) of $\tilde f$ satisfies
$\sum_{h=-\infty}^{\infty} |h|\,|\tilde\gamma(h)| < \infty.$
Lemma 1.
Under [Gren1]–[Gren4] and [Main1]–[Main4], the normalized covariance matrix of $\tilde{\boldsymbol{\beta}}$ approaches
(12)
as $n \to \infty$.
Proof.
Following from Amemiya (1973), we write
(13) 
and let . Under [Gren1]–[Gren4] and [Main3],
by Lemma 2. By Lemma 3, is invertible and thus, for any , exists. Thus, we have
which has the limit, as ,
(14) 
Assumption [Main1] implies that $f$ is real and positive, since $\{e_t\}$ is an absolutely summable linear filter of the independent finite-variance sequence $\{\varepsilon_t\}$ (cf. Theorems 2.11 and 2.12 of Fan & Yao, 2003). Since $\tilde f$ is positive and continuous by [Main3], and $f$ is real, positive, and continuous by [Main1], we can apply Lemma 5 to obtain
(15) 
Upon substitution of (14) into the left-hand side (LHS) of (15) and rearrangement, we obtain
(16) 
and have thus verified (12). ∎
Theorem 1.
Under [Gren1]–[Gren4] and [Main1]–[Main4],
(17) $\mathbf{D}_n\left(\tilde{\boldsymbol{\beta}} - \boldsymbol{\beta}\right) \rightsquigarrow \mathrm{N}\left(\mathbf{0}, \mathbf{V}\right),$
where $\mathbf{V}$ is as in (4).
Proof.
It suffices to show that is asymptotically normal with mean and covariance matrix equal to the LHS of (16). First, write , where
and is a positive and increasing integer function of , such that . Let , where
is a matrix and
is a vector.
To apply Lemma 6, we must show that for each , converges in law to some , as , where is asymptotically normal with mean and covariance matrix (16), as . Then, we must verify that
(18) 
for each .
For the purpose of applying the Cramér-Wold device, define , where . Let denote the column of , for . That is,
Therefore,
(19) 
where
By [Main1] is bounded, and by [Gren4], is bounded. Further, by [Main3] and [Main4], we have the boundedness of . Thus, we obtain the inequalities
(20) 
The last fact follows from an application of Lemma 3 and all of the bounds are independent of and .
Observe that is a sequence of independent random variables with expectation and . Let be the distribution function of , for each . Then, for any , we have the bound:
(21) 
where
By the bound in (20) and [Main2], we must show that
(22) 
to prove that (21) converges to zero, as . Write the row and column element of and as and , respectively. Upon expansion we can obtain the following inequalities for the LHS of (22):
where is some finite constant.
By [Main1], and are bounded, independently of and . Therefore, for any and , . Similarly, by Assumptions [Main3] and [Main4], we apply Corollary 1 to show that , independently of , and thus . Lastly,
Next, (22) is sufficient to guarantee that (21) approaches zero as $n$ approaches infinity. We can apply the Lindeberg-Feller central limit theorem (DasGupta, 2008, Thm. 5.1) to obtain .
Via (19), is asymptotically equal in distribution to (as ), where, for any choice of , is normal with mean zero and variance
Via the CramerWold device (cf. DasGupta, 2008, Th. 1.16), is asymptotically normal with mean vector and covariance
(23) 
using Lemma 5, where is the SDF corresponding to . In other words, is normally distributed with zero mean vector and covariance matrix (23).
By [Main1], and , via the power transfer formula (cf. Lemma 4). Furthermore, since and by preservation of uniform convergence under continuous composition (Bartle & Joichi, 1961), converges uniformly to , as $n$ approaches infinity (cf. Gray, 2006, Sec. 4.1). Via the Cramér-Wold device, converges in law to (), where has mean vector and covariance matrix
(24) 
which is equal to (16).
3 Discussions and remarks
3.1 Notes regarding the assumptions of Theorem 1
We can directly compare [Main1] to [Hann1]. It is notable that [Hann1] is more general than [Main1], since it allows the errors to be a linear filter over a martingale sequence , satisfying . Further, the condition is needed so that Lemma 5 can be applied. It is remarked in Amemiya (1973), however, that [Main1] and [Main2] are more general than [Amem1].
The additional assumption [Main4] is the key that facilitates our proof. This assumption is needed for bounding and , which is required to prove (17). We remark that [Main4] is a common condition in the literature, and has been made in similar proof methods, such as those of Cheng et al. (2015). The assumption is not restrictive, since a broad class of short-memory processes satisfy [Main4]. For example, any stationary autoregressive moving average (ARMA) process satisfies [Main4] (cf. Fan & Yao, 2003, Sec. 2.5). The assumption is also commonly used in the analysis of unit root processes (see, e.g., Hamilton, 1994, Sec. 17.5).
3.2 Feasible generalized least squares
Generally, the SDF $\tilde f$ is unknown and must be estimated from data. Suppose that $\hat f_n$ is an estimator of $\tilde f$, which is indexed by the sample size $n$. Denote the feasible GLS (FGLS) estimator of $\boldsymbol{\beta}$ by $\hat{\boldsymbol{\beta}}_n$. For the FGLS estimator to be of use, we require that it has the same asymptotic distribution as $\tilde{\boldsymbol{\beta}}$. To this end, it is sufficient to show that
(26) 
where $\overset{p}{\to}$ denotes convergence in probability. Denote the autocovariance matrix corresponding to the estimator $\hat f_n$ as $\tilde{\boldsymbol{\Sigma}}(\hat f_n)$. Then, we may write $\hat{\boldsymbol{\beta}}_n = [\mathbf{X}^\top \tilde{\boldsymbol{\Sigma}}^{-1}(\hat f_n) \mathbf{X}]^{-1} \mathbf{X}^\top \tilde{\boldsymbol{\Sigma}}^{-1}(\hat f_n) \boldsymbol{y}$.
Under [Gren1]–[Gren4] and [Amem1], Amemiya (1973, Thm. 2) proved that if $\hat f_n$ is the SDF of an AR process of fixed order that satisfies [Amem2], then (26) holds when $\hat f_n$ is obtained via the OLS estimator for the AR model coefficients (cf. Amemiya, 1985, Sec. 5.4). The argument from Amemiya (1973, Thm. 2) holds whenever $\hat f_n$ is obtained via any consistent estimator of the AR model coefficients.
The proof of the theorem also remains the same upon replacing [Amem1] by [Main1] and [Main2] and noting that [Amem2] is implied by [Main3] and [Main4]. Thus, under the hypothesis of Theorem 1, if is hypothesized to be the SDF of a stationary AR process of order (i.e., satisfying [Amem2]), then (26) holds, where is obtained via any consistent estimator of the AR coefficients.
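As a concrete illustration of the feasible procedure described above, the following Python sketch (an assumption-laden toy, not the paper's construction: AR order one, OLS coefficient estimation, and an AR(1) Toeplitz plug-in covariance) computes an FGLS estimate by fitting the AR coefficient to OLS residuals and plugging the implied covariance into the GLS formula.

```python
import numpy as np

def fit_ar_ols(e, p):
    """OLS estimate of AR(p) coefficients: regress e_t on e_{t-1}, ..., e_{t-p}."""
    Z = np.column_stack([e[p - j: len(e) - j] for j in range(1, p + 1)])
    return np.linalg.lstsq(Z, e[p:], rcond=None)[0]

rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n)])
beta = np.array([0.5, 1.5])

# Simulate AR(1) errors with coefficient 0.6.
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + rng.standard_normal()
y = X @ beta + e

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]                # step 1: OLS fit
rho_hat = fit_ar_ols(y - X @ b_ols, 1)[0]                   # step 2: AR(1) coefficient
idx = np.arange(n)
Sigma_hat = rho_hat ** np.abs(np.subtract.outer(idx, idx))  # plug-in covariance
Si = np.linalg.inv(Sigma_hat)
b_fgls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)        # step 3: feasible GLS
```

In this toy, the estimated coefficient lands near the true value 0.6, and the FGLS estimate lands near the true regression coefficients; any consistent estimator of the AR coefficients could replace the OLS step, in line with the remark above.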
It is notable that proving that (26) holds under [Amem2] is tractable due to the fact that the inverse autocovariance matrix has a banded Toeplitz form (cf. Verbyla, 1985). We conjecture that similar results can be obtained, via the same techniques as those of Amemiya (1973), whenever the hypothesized SDF belongs to a parametric family whose inverse autocovariance matrices have banded Toeplitz form. However, the proof of such a result is beyond the scope of the current paper.