# Robust Nearly-Efficient Estimation of Large Panels with Factor Structures

Marco Avarucci (University of Glasgow), Paolo Zaffaroni (Imperial College London)

###### Abstract

This paper studies estimation of linear panel regression models with heterogeneous coefficients when both the regressors and the residual contain a possibly common, latent factor structure. Our approach is (nearly) efficient, because it is based on the GLS principle, and robust to the specification of the factor structure, because it requires neither information on the number of factors nor estimation of the factor structure itself. We first show that the unfeasible GLS estimator not only affords an efficiency improvement but, more importantly, provides a bias-adjusted estimator with the conventional limiting distribution in situations where OLS is affected by a first-order bias. The technical challenge resolved in the paper is to show that these properties are preserved for a class of feasible GLS estimators in a double-asymptotics setting. Our theory is illustrated by Monte Carlo exercises and by an empirical application using individual asset returns and firms’ characteristics data.

Keywords: GLS estimation; panel; factor structure; robustness; bias-correction.

## 1 Introduction

This paper considers (nearly) efficient estimation of linear panel regression models with heterogeneous coefficients when both the regressors and the residual contain a common, latent factor structure. At the same time, our estimation procedure does not require any knowledge of this latent factor structure, not even the maximum possible number of latent factors, let alone the latent factors themselves. This qualifies our procedure as robust. Factor models are among the most popular and successful ways to capture cross-sectional and temporal dependence, especially when facing a large number of units ($N$) and time periods ($T$), although in our context the factors and their loadings are nuisance parameters.

However, a common factor structure in both regressors and residuals, which would typically arise when relevant regressors are omitted, leads to endogeneity. Estimation by ordinary least squares (OLS) is then invalid, in the sense that its usual asymptotic properties no longer hold.

We first consider an unfeasible generalized least squares (UGLS) estimator for the regression coefficients, based on the presumption that the covariance matrix of the residuals, evaluated conditional on the latent factors, is known. It turns out that, regardless of possible endogeneity (that is, when regressors and residuals are correlated), the UGLS is $\sqrt{T}$-consistent and asymptotically normally distributed without requiring any information on the factor structure, such as the number of factors or the factors themselves and their loadings. This contrasts with the asymptotic bias plaguing the OLS estimator under the same circumstances. In other words, the UGLS is not only a more efficient estimator but also an automatically bias-adjusted estimator with desirable asymptotic properties. This result is due to an important insight, namely the existence of a form of asymptotic orthogonality between the common factors that affect the residuals and the inverse of the residuals’ covariance matrix. Most importantly, such asymptotic orthogonality is manifested at a very fast rate, namely the squared norm of the product between the inverse covariance matrix and the factors is .
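The orthogonality between the factors and the inverse covariance matrix can be checked numerically in a stylised special case. The sketch below is only an illustration under assumptions that are not taken from the paper: i.i.d. standard normal factors, $B = I_m$ and a homoskedastic idiosyncratic covariance $\Xi = I_T$. In that case the Woodbury identity gives $S^{-1}F = F(I_m + F'F)^{-1}$ for $S = FBF' + \Xi$, so the squared Frobenius norm of $S^{-1}F$ is of order $1/T$ whenever $F'F/T$ stays bounded away from zero. The function name `orthogonality_gap` is hypothetical.

```python
import numpy as np

def orthogonality_gap(T, m=2, seed=0):
    """Squared Frobenius norm of S^{-1} F for S = F B F' + Xi.

    Illustrative assumptions (not from the paper): i.i.d. N(0, 1) factors,
    B = I_m and Xi = I_T.
    """
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, m))   # T x m matrix of latent factors
    B = np.eye(m)                     # stand-in for the average of b_i b_i'
    Xi = np.eye(T)                    # idiosyncratic covariance matrix
    S = F @ B @ F.T + Xi              # residual covariance given the factors
    S_inv_F = np.linalg.solve(S, F)   # S^{-1} F without forming the inverse
    return float(np.sum(S_inv_F ** 2))

# The gap shrinks as the time dimension grows.
for T in (50, 200, 800):
    print(T, orthogonality_gap(T))
```

The printed values fall roughly in proportion to $1/T$ in this design; nothing here says how the rate behaves under the paper's more general assumptions.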

The challenge arises when considering a feasible version of the UGLS estimator. A natural approach, followed here, is to exploit the panel dimension by using the sample (across units) covariance matrix of the OLS residuals, which in turn are obtained from a (time-series) regression with $T$ observations. Unlike the OLS and UGLS cases, the asymptotic theory for the GLS requires both $N$ and $T$ to diverge. Because the OLS estimator of the regression coefficients is not consistent, this sample covariance matrix is unavoidably not consistent, element by element, for the true residuals’ covariance matrix; it converges (element by element) instead to a pseudo-true covariance matrix. The surprising and crucial result established here is that this pseudo-true covariance matrix is also asymptotically orthogonal to the latent factors, and at the same rate of convergence . Indeed, there is an entire class of matrices, rather than a unique matrix, that is asymptotically orthogonal to the factors. This is the most intriguing aspect of our theory. The technical achievement of this paper is to show that the feasible GLS (henceforth GLS) estimator for the regression coefficients is $\sqrt{T}$-consistent and asymptotically normal as both $N$ and $T$ diverge to infinity. Again, this holds even when the OLS remains an invalid estimator. At the same time, since the pseudo-true value differs in general from the true covariance matrix, the GLS might not be as efficient as the UGLS. However, evaluating the GLS iteratively, as explained below, brings it close to the UGLS estimator.

In summary, the GLS estimator exhibits four main desirable properties. First, it permits inference on the regression coefficients based on conventional asymptotic distributions. In particular, the GLS estimator of the regression coefficients has a mixed-normal asymptotic distribution, implying the possibility of inference by means of chi-squared criteria. Second, as in classical estimation theory, it delivers (nearly) efficient estimation. Third, the GLS estimator does not require any knowledge of the exact number of latent factors, or even an upper bound on this number. In particular, the number of factors can be smaller than, equal to, or larger than the number of regressors. Fourth, the GLS is computationally easy to handle since it simply requires performing () linear regressions, without invoking any nonlinear numerical optimization. Our approach can also be applied to the dual case of cross-sectional regressions with time-varying coefficients.

This paper belongs to, and extends, two different strands of literature.

First, it has been demonstrated, in various contexts, that efficient estimation techniques not only lead to an improvement of precision but, most importantly, resurrect the required asymptotic properties, in terms of bias, rate of convergence and distribution, in situations where these are not warranted by non-efficient approaches.

In the context of cointegrated systems, Phillips (1991a) and Phillips (1991b) show that use of efficient, full-system maximum likelihood (ML) goes beyond an efficiency improvement: it solves the well-known issues of specification and inference in cointegrated systems that plague unrestricted VAR estimation, such as the presence of asymptotic biases and non-standard asymptotic distributions (i.e. the Dickey-Fuller distribution).[^1] Although our theoretical framework is not one of cointegrated systems, strong analogies emerge with Phillips (1991a) and Phillips (1991b): in both cases, a (local) mixed-normal distribution arises and efficient estimation mitigates the lack of strong exogeneity. Moreover, such deficiency (i.e. lack of exogeneity) is manifested through the form of the residuals’ covariance matrix: non-block diagonality for the (triangular) cointegrated systems of Phillips (1991a) and Phillips (1991b), and a factor structure, which also rules out block-diagonality, for our framework. In addition, Phillips (1991a) demonstrates how these remarkable properties of efficient estimation are warranted by full-system regressions but not by single-equation regressions. Likewise, our method requires the full information arising from the panel, namely one needs both $N$ and $T$ to diverge.

[^1]: Indeed, Phillips (1991a) in particular provides a detailed explanation of these properties, namely removing second-order bias, dealing with endogeneity, absence of nuisance parameters and, obviously, achievement of full efficiency. Note that ML is asymptotically equivalent to GLS in that set-up.

Robinson & Hidalgo (1997) study estimation of time series regression models, when both the regressors and the residual exhibit long-memory, and in fact spectral singularities can arise at any frequency. Under these circumstances, in particular when the spectral singularities of the regressors and residuals arise at the same frequency with sufficient intensity, the OLS estimator is no longer -consistent and asymptotically normal. However, under the same circumstances, Robinson & Hidalgo (1997) show that a class of weighted least squares estimates, which includes GLS as a special case, has standard asymptotic properties.

An important difference between our approach and Phillips (1991a), Phillips (1991b) and Robinson & Hidalgo (1997) is that their estimation procedure is affected by a second-order bias, that is their estimators are consistent (although with non-standard rate of convergence and asymptotic distribution), whereas in our context a first-order bias arises, leading for instance to inconsistency of the OLS estimator. Therefore, our GLS adjustment appears compelling in our framework.

Second, inference in panel data models with a latent factor structure in the residuals and heterogeneous regression coefficients has been studied, initially from a purely econometric perspective and, more recently, from an empirical finance angle.

In a linear cross-sectional regression, Andrews (2005) shows that, when residuals and regressors share a factor structure, -consistency of the OLS estimator is preserved only with uncorrelated factor loadings.[^2] Within a linear time-series regression, Pesaran (2006) shows that heterogeneous regression coefficients can be -consistently estimated by OLS by augmenting the regressors with cross-sectional averages of the dependent variable and individual-specific regressors. Ando & Bai (2015) consider a panel model with (sparse) heterogeneous coefficients and establish the asymptotics of a penalized OLS estimator. Maintaining the assumption of a common latent factor structure in the residuals of a panel data model with heterogeneous coefficients, Ergemen & Velasco (2017) allow for the possibility that the idiosyncratic innovation is non-stationary, in particular exhibiting long memory.

[^2]: Although not spelled out, Andrews (2005) can be readily applied to time-varying coefficients.

Motivated by empirical asset pricing, new methods to conduct robust inference on panel data models with a latent factor structure have recently been developed. Giglio & Xiu (2018) derive the asymptotics for a procedure to estimate the risk premium of an observed factor, robust to the omission of the set of relevant (i.e. priced) factors. Like us, they adopt a double-asymptotics approach. However, Giglio & Xiu (2018) differ from us because they focus on estimation of the parameters of the second-pass regression, that is when the asset-pricing restriction is imposed, whereas we ignore any asset-pricing content (i.e., from the point of view of the two-pass methodology, we focus on the parameters of the first-pass regression). Moreover, their procedure relies on estimating the complete space spanned by the latent factors driving the model, whereas our method avoids this step altogether. Gagliardini, Ossola and Scaillet (2018) study the properties of a diagnostic criterion to detect an approximate factor structure in the residuals of large, unbalanced panel data models. Like us, they consider a double-asymptotic setting and ignore any asset-pricing restrictions on the parameters of the panel data model. Moreover, the method of Gagliardini et al. (2018) is robust, in the sense that it does not need to explicitly estimate the latent factor structure embedded in the residuals, just like ours. However, their focus is specifically on checking whether the unobserved residuals have a factor structure, whereas our method focuses on estimation of the regression coefficients on the observed, possibly heterogeneous, regressors.

Unlike the previous papers, the large majority of contributions to this literature focus on the case of constant regression coefficients. Pesaran (2006) shows that a faster rate of convergence is achieved with constant regression coefficients. Bai (2009) considers joint estimation of the constant regression coefficients and of the residuals’ factor structure components through an iterative OLS procedure. The same estimator has been studied by Moon & Weidner (2017) under weaker conditions on the observed regressors. Moon & Weidner (2015) show that the results of Bai (2009) and Moon & Weidner (2017) hold, with no loss of efficiency, when the exact number of latent factors is unknown and only an upper bound is specified. Bai & Liao (2017) show that GLS estimation leads to an efficiency improvement over the OLS-type estimator of Bai (2009) and Moon & Weidner (2017). Greenaway-McGrevy et al. (2012) establish -asymptotics for the OLS estimator by augmenting the regressors with the principal component estimator of the common factors extracted from the observable data.[^3] Other contributions to this literature include Holtz-Eakin, Newey & Rosen (1988), Ahn, Hoon Lee & Schmidt (2001), Bai & Ng (2004), Phillips & Sul (2003), Moon & Perron (2003) and Phillips & Sul (2007).

[^3]: Several generalizations of the aforementioned results have been considered. Pesaran & Tosetti (2011) and Chudik & Pesaran (2015) confirm the asymptotic results of Pesaran (2006) when, respectively, spatial dependence in the idiosyncratic component of the innovation’s factor structure and dynamic panels are allowed for. Karabiyik et al. (2015) show that Pesaran’s estimator retains its asymptotic properties under weaker conditions, allowing for either correlated loadings or for the number of latent factors to be larger than the number of observables, whereas Westerlund & Urbain (2015) discuss some limitations. Song (2013) extends Bai (2009) to the case of non-constant regression coefficients, establishing -asymptotics when . Dynamic panels are permitted.

None of these papers addresses the issue of efficient estimation, except for Bai & Liao (2017);[^4] rather, they focus on various ingenious ways to mitigate the bias induced by the correlation between regressors and innovations. In contrast, our GLS approach tackles both issues at the same time, without requiring any knowledge of the factor structures affecting the regressors and innovations.[^5] Our asymptotic distribution theory requires whereas the milder ensures consistency. This relative speed spells out a neat dichotomy in terms of the roles of $N$ and $T$: the faster rate of divergence for $N$ is needed to estimate accurately the (inverse of the) sample covariance matrix required by the GLS formula, which in turn mitigates the asymptotic bias. Instead, the slower divergence of $T$ controls the asymptotic variance of the GLS estimator, ultimately dictating the estimator’s rate of convergence. Notably, the relative speed required by our estimator differs from the relative speed required by the alternative procedures described above, suggesting that our result can also be viewed as complementary to the others, for example more suitable for short panels where $N$ is much larger than $T$.[^6]

[^4]: Bai & Liao (2017) focus on the homogeneous parameter case, unlike us, and consider joint estimation of the latent factors and parameters, generalizing Bai (2009) and Moon & Weidner (2017). More importantly, the motivation of Bai & Liao (2017) differs drastically from ours because they use the GLS approach to improve the efficiency of an estimator that already exhibits the conventional asymptotic properties under their assumptions, in particular iid-ness across time. In our case, the GLS approach mitigates the first-order bias affecting the OLS estimator, where we allow for both serial and cross-sectional correlation as well as heteroskedasticity of the residuals.

[^5]: In particular, given that we can afford to be completely agnostic about the need to conduct inference on the latent factor structure affecting the model, our work differs, both in terms of focus and in terms of the techniques developed, from the multitude of papers developing inference methods on latent factor structures (on estimating the number of latent factors see Bai & Ng (2002), Hallin & Liska (2007), Amengual & Watson (2007), Onatski (2009), Onatski (2010) and Ahn & Horenstein (2013), and on estimating latent factor structures see Forni et al. (2000), Stock & Watson (2002), Bai & Ng (2002) and Bai (2009), among others).

[^6]: For example, Pesaran (2006) requires for asymptotic normality but the weaker condition is required for homogeneous regression coefficients, where the faster -rate of convergence is achieved. Moreover, one needs the number of heterogeneous regressors to be greater than the number of latent factors. Bai (2009) shows that the regression coefficients’ estimator is also -consistent, when for some constant . The estimator of Bai (2009) is asymptotically biased, in general, but an asymptotically valid bias-correction is established under slightly stronger conditions. Moon & Weidner (2017) establish -asymptotics, again when and diverge at the same speed (i.e. ).

This paper proceeds as follows. Section 2 illustrates the general model and the assumptions required for estimation of regressions with unit-specific parameters. The asymptotic results for the OLS, UGLS and GLS estimators are presented in Section 3. Section 4 describes estimation and inference of the coefficients to common regressors. The technical contributions of the paper are discussed and highlighted in Section 5. Section 6 discusses various issues related to the GLS estimator. In particular, we first explore the case when the regressors and the residuals do not depend on the same set of factors. Second, we discuss the conditions under which the feasible GLS will still work in the context of dynamic panels. Third, despite the inefficiency of the feasible GLS, we explain how substantial efficiency gains can be achieved by a multi-step version of the GLS estimator. Fourth, we describe how consistent estimation of the asymptotic covariance matrix can be obtained. Fifth, we describe how to apply our estimator to cross-sectional regressions with time-varying coefficients. Our theoretical results are corroborated by a set of Monte Carlo experiments described in Section 7. An empirical application, which investigates whether firms’ characteristics are relevant to individual stock returns, is presented in Section 8. Section 9 concludes. The proofs of our theorems are reported in Appendix B, relying on three technical results stated in Appendix A. Appendix C defines some quantities of interest for the construction of the GLS estimator, in particular regarding the (inverse of the) covariance matrix of the residuals. The Supplement contains Appendices D-J with the proofs of additional material that supports our main results.

Hereafter we use the following notation. Let denote a generic real matrix with entries ; in short , or simply when the matrix’s dimension is clear. Similarly, denotes a generic column vector of length with element ; in short . The transpose of a is denoted by . If , and denote the minimum and the maximum eigenvalue of , respectively. With we mean that is positive definite (positive semi-definite). Let denote the spectral norm of , and , where denotes the trace, is the Frobenius norm. When we define the column and row norm of as and , respectively. Furthermore, for , we use , where denotes the Moore-Penrose generalized inverse of and , where is the identity matrix of dimension . If has full column rank, denotes the matrix satisfying , where is a matrix of zeros, and . We use , to denote convergence in probability and convergence in distribution, respectively, and denote a random vector normally distributed with mean and covariance matrix equal to , respectively. For and being three random matrices, then denotes the probability limit (when finite) of as . For the random matrices that are functions of we write if when . denotes the sigma-algebra generated by the random matrix , and indicate the probability of an event and the expectation of a random variable, respectively. In the sequel, denotes a generic, positive constant, which need not be the same every time we use it.

## 2 Model: definitions and assumptions

Assume that the observed variables obey a linear regression model with common observed regressors and heterogeneous regressors . Following the convenient specification put forward by Pesaran (2006), the model for the $i$th unit can be expressed, in matrix form, as

$$y_i = D\alpha_i + X_i\beta_i + u_i, \tag{1}$$

for an observed vector , an observed matrix of common regressors, an observed matrix of unit-specific regressors, and an unobserved vector . In turn, the innovation vector satisfies the factor structure:

$$u_i = F b_i + \varepsilon_i, \quad \text{with } \Xi_i := E\,\varepsilon_i\varepsilon_i', \tag{2}$$

for an unobserved vector of factor loadings , an unobserved matrix of common factors and an unobserved vector of idiosyncratic innovations . The unit specific regressors satisfy:

$$X_i = D\Delta_i + F\Gamma_i + V_i, \tag{3}$$

for an unobserved matrix of factor loadings with , an unobserved matrix of factor loadings with , and an unobserved matrix of idiosyncratic innovations with . The maintained assumption here is that , and do not vary with and . Moreover, we do not need to impose any relationship between them so that, in particular, can be either smaller, equal or bigger than . Although model (1) is written as a single regression across time for a given , we assume that in fact a panel of observations is available and fully used within our methodology.
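To make the dimensions in (1)-(3) concrete, the following sketch draws one panel from the model. It is only an illustration: the Gaussian draws, the choice of an intercept plus Gaussian columns for $D$, the dimension names `S`, `K`, `m` and the function name `simulate_panel` are assumptions made here and are not taken from the paper. The default has more factors than heterogeneous regressors, which the model allows.

```python
import numpy as np

def simulate_panel(N=100, T=50, S=1, K=2, m=3, seed=0):
    """Draw one panel from equations (1)-(3) with Gaussian inputs (an assumption)."""
    rng = np.random.default_rng(seed)
    # Common observed regressors: an intercept plus S - 1 Gaussian columns.
    D = np.column_stack([np.ones(T), rng.standard_normal((T, S - 1))])
    F = rng.standard_normal((T, m))           # latent factors shared by X_i and u_i
    alpha = rng.standard_normal((N, S))       # coefficients on the common regressors
    beta = rng.standard_normal((N, K))        # heterogeneous slope coefficients
    b = rng.standard_normal((N, m))           # loadings of u_i on F
    Delta = rng.standard_normal((N, S, K))    # loadings of X_i on D
    Gamma = rng.standard_normal((N, m, K))    # loadings of X_i on F
    V = rng.standard_normal((N, T, K))        # idiosyncratic part of X_i
    eps = rng.standard_normal((N, T))         # idiosyncratic part of u_i
    X = D @ Delta + F @ Gamma + V             # equation (3), shape (N, T, K)
    u = b @ F.T + eps                         # equation (2), shape (N, T)
    y = alpha @ D.T + np.einsum("ntk,nk->nt", X, beta) + u   # equation (1)
    return {"y": y, "X": X, "D": D, "F": F, "beta": beta, "b": b, "Gamma": Gamma}

panel = simulate_panel()
print(panel["y"].shape, panel["X"].shape, panel["D"].shape, panel["F"].shape)
```

Because the same `F` enters both `X` and `u`, regressors and residuals are correlated for every unit, which is the source of the endogeneity discussed in the Introduction.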

As explained below, throughout our analysis we always de-mean the data by . This avoids making any assumptions on . We now present our assumptions which, thanks to the detailed specification of model (1)-(3), appear relatively primitive.

###### Assumption 2.1 (idiosyncratic innovation εit)

The vector satisfies the following equation

$$\varepsilon_t = R\,a_t, \quad \text{for } t = 1,\dots,T, \tag{4}$$

where the matrix of constants satisfies , for some , and the elements of the vector follow a linear process:

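$$a_{it} = \sum_{s=0}^{\infty}\phi_{is}\,\eta_{i,t-s}, \qquad \sup_i \sum_{s=0}^{\infty} s^2|\phi_{is}| < \infty, \qquad \text{with } \phi_{i0} = 1, \tag{5}$$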

where the sequence is independent and identically distributed across and with and . Moreover, for every complex number ,

$$\inf_i |\phi_i(z)| > \kappa, \quad |z| \le 1, \quad \text{where } \phi_i(z) = \sum_{s=0}^{\infty}\phi_{is} z^s. \tag{6}$$

###### Remark 2.1

Assumption 2.1 is similar to Assumptions 1 and 2 in Pesaran & Tosetti (2011) and, with some variations, this form of cross-sectional and time dependence has also been adopted by Moon & Weidner (2017), Moon & Weidner (2015) and Onatski (2015). The above assumption turns out to be extremely convenient for establishing the asymptotic distribution of the feasible and unfeasible GLS estimators along the lines of Theorem 1 in Robinson & Hidalgo (1997).

###### Remark 2.2

Assumption 2.1 implies that, for every :

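$$\sup_{i_1}\sup_{t_1}\sum_{i_2,\dots,i_\ell=1}^{N}\;\sum_{t_2,\dots,t_h=1}^{T}\bigl|\mathrm{cum}_h(\varepsilon_{i_1t_1},\varepsilon_{i_2t_2},\dots,\varepsilon_{i_\ell t_h})\bigr| < \infty,$$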

where the summands are the cumulants of order of .

###### Remark 2.3

By Brockwell & Davis (1991), Proposition 4.5.3, (6) implies that the eigenvalues of the covariance matrices of are bounded, and greater than for every . Easy calculations give , implying that and .

###### Assumption 2.2 (regressor innovation Vi)

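The sequences have zero mean and satisfy, for every and :

$$\sup_{k_1,\dots,k_s}\sup_{i_1}\sup_{t_1}\sum_{i_2,\dots,i_\ell=1}^{N}\;\sum_{t_2,\dots,t_h=1}^{T}(1+t_j^2)\,\bigl|\mathrm{cum}_h(v_{i_1t_1k_1},\dots,v_{i_\ell t_h k_s})\bigr| < \infty.$$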

Moreover, , where .

###### Remark 2.4

Assumption 2.2 implies that , with . It follows that .

###### Remark 2.5

The can be interpreted as the high-rank components of the regressors , adopting Moon & Weidner (2015) terminology, as opposed to the which represent the low-rank components. For instance, if for each the are generated as in Assumption 2.1, one obtains for every (see the discussion in Moon & Weidner (2015), Appendix 1 and Onatski (2015)). In contrast, .

###### Assumption 2.3 (latent and observed factors)

Set for and . Then,

 (7)

and , . Moreover, we assume where

###### Remark 2.6

Equation (7) implies that (see Lütkepohl (1996), Result (4), Section 9.11.2).

###### Remark 2.7

Although not strictly necessary, we are ruling out trending behaviours in and . However, and are allowed to be cross-correlated as well as serially correlated, although not perfectly collinear. For instance, the joint dynamics of could be described by a multivariate stationary ARMA.

###### Assumption 2.4 (regressors)

For every , the matrix of unit specific regressors and the matrix of common regressors have full row rank. Moreover, setting , has always rank for sufficiently large and .

###### Remark 2.8

Assumption 2.4 requires enough cross section heterogeneity of the ’s across individuals. Simple manipulations show that

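$$D_\perp'\Bigl(\frac{1}{N}\sum_{i=1}^{N}M_{Z_i}u_iu_i'M_{Z_i}\Bigr)D_\perp = \frac{1}{N}\sum_{i=1}^{N}M_{(D_\perp'X_i)}D_\perp'u_iu_i'D_\perp M_{(D_\perp'X_i)} > 0,$$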

implying that the empirical covariance matrix defined in (19) is invertible.

###### Assumption 2.5

 and are non-random such that and and, for ,

$$B_N := \frac{1}{N}\sum_{i=1}^{N}b_ib_i' > 0. \tag{8}$$

and

 AN := (9)

is positive definite with

$$\Psi_i := \Gamma_i'\,\frac{F'M_DF}{T}\,\Gamma_i + \Sigma_{V_i'V_i}. \tag{10}$$

###### Remark 2.9

Condition (8) implies that the factor structure (2) is strong, as defined in Pesaran & Tosetti (2011). This is commonly assumed in the literature. The technical condition (9) is used in the proof of Theorem 3.2. As shown in Section H.1 in the Supp. Material, Lemma H.14, the matrices in brackets are of full rank. Hence, (9) will be satisfied when there is enough cross-sectional heterogeneity in the sample. Finally, our results will not change if random loadings are assumed (and cross-sectionally independent from other parameters).

###### Assumption 2.6 (independence)

The are mutually independent for every and and .

###### Remark 2.10

We are not allowing for any correlation between any entries of and . This rules out the possibility that contains a weakly exogenous component, and in this respect we are similar to Pesaran (2006) and Bai (2009). The implications from generalizing this assumption, in particular when considering dynamic panels where one element of represents the lagged dependent variable, are discussed in Section 6.2.

###### Remark 2.11

Assumptions 2.2, 2.3 and 2.6 and Remark 2.6 imply that and , for every . Hence and for large enough.

## 3 Estimators: definitions and asymptotics

Our main objective is to estimate the heterogeneous slope coefficients of (1). However, estimation of the coefficients of the common regressors is also discussed in Section 4. Hence, without loss of generality, we premultiply both sides of (1) by the projection matrix $M_D$, obtaining

$$M_Dy_i = M_DX_i\beta_i + M_Du_i. \tag{11}$$

We consider three different estimators for the parameters $\beta_i$, namely the OLS, the unfeasible GLS and the feasible GLS estimators. The OLS estimator for $\beta_i$ is

$$\hat\beta^{OLS}_i := (X_i'M_DX_i)^{-1}X_i'M_Dy_i. \tag{12}$$

We now consider GLS estimation. Define the cross-sectional average of the individual covariance matrices of the , conditional on the sigma-algebra generated by , defined in Assumption 2.3:

$$M_DS_NM_D, \quad \text{setting } S_N := FB_NF' + \Xi_N, \quad \text{with } \Xi_N := \frac{1}{N}\sum_{i=1}^{N}\Xi_i. \tag{13}$$

We assume without loss of generality that includes an element equal to one, i.e. we allow for an intercept term, leading to . The presence of could cause some complications in the definition of the GLS estimator since the have a singular covariance matrix. We show how to solve this issue and obtain a model with a non-singular residual covariance matrix that can be used to construct the GLS estimator.

Proceeding along the lines of Magnus & Neudecker (1988), Section 11 in Chapter 13, one gets the UGLS estimator when the residual covariance matrix to model (11) is singular:

 (14)

By Lemma D.2 in the Supp. Material

where is the full rank matrix such that where . Assumption 2.4 and display (13) imply that the inverse in (14) is well defined for any . By substitution, setting for simplicity

 (15)

one obtains

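$$\begin{aligned}\hat\beta^{UGLS}_i &= \bigl(X_i'D_\perp(D_\perp'S_N^{-1}D_\perp)^{-1}D_\perp'X_i\bigr)^{-1}X_i'D_\perp(D_\perp'S_N^{-1}D_\perp)^{-1}D_\perp'y_i \\ &= (X_i'S_N^{-1}X_i)^{-1}X_i'S_N^{-1}y_i,\end{aligned} \tag{16}$$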

where we set This means that the UGLS has now the more conventional expression of the generalized least squares for the model

 yi=Xiβi+ui,withui=Fbi+ui, (17)

without involving Moore-Penrose matrices. Pre-multiplying the data by reduces the sample size by units since now the and the have rows. Likewise, considering again model (17), an equivalent representation of (12) is .

Along the same lines, our proposed feasible GLS estimator is given by

$$\hat\beta^{GLS}_i := (X_i'\hat S_N^{-1}X_i)^{-1}X_i'\hat S_N^{-1}y_i, \tag{18}$$

where

$$\hat S_N := N^{-1}\sum_{i=1}^{N}\hat u_i\hat u_i', \quad \text{with } \hat u_i := y_i - X_i\hat\beta^{OLS}_i = M_{X_i}u_i, \tag{19}$$

for and large enough, by Assumption 2.4 and Remark 2.8, has full rank. The following two theorems state the asymptotic distributions of the OLS and UGLS estimators and of the GLS estimator, respectively. The proofs are given in Appendices B.1 and B.2, respectively. Further details are provided in the Supp. Material.
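The steps behind (18)-(19) can be sketched as follows: remove the common regressors, run unit-by-unit OLS, average the outer products of the OLS residuals across units, and reuse the inverse of that matrix as the GLS weight. This is a minimal sketch under stated assumptions rather than the authors' implementation. In particular, it reads the transformation that removes $D$ as premultiplication by $D_\perp'$, with $D_\perp$ an orthonormal basis of the orthogonal complement of the columns of $D$ (obtained here from a QR decomposition). The simulated design (one factor, one regressor, Gaussian draws, intercept-only $D$) and the function name `feasible_gls` are hypothetical.

```python
import numpy as np

def feasible_gls(y, X, D):
    """Unit-by-unit OLS and feasible GLS. y: (N, T), X: (N, T, K), D: (T, S)."""
    N, T = y.shape
    S = D.shape[1]
    # Orthonormal basis of the orthogonal complement of col(D), so D_perp' D = 0.
    Q, _ = np.linalg.qr(D, mode="complete")
    D_perp = Q[:, S:]                                # shape (T, T - S)
    ys = y @ D_perp                                  # row i holds D_perp' y_i
    Xs = np.einsum("ts,ntk->nsk", D_perp, X)         # D_perp' X_i for every unit
    beta_ols = np.empty((N, X.shape[2]))
    resid = np.empty_like(ys)
    for i in range(N):
        beta_ols[i] = np.linalg.lstsq(Xs[i], ys[i], rcond=None)[0]
        resid[i] = ys[i] - Xs[i] @ beta_ols[i]       # OLS residuals, as in (19)
    S_hat = resid.T @ resid / N                      # average of u_hat u_hat'
    S_inv = np.linalg.inv(S_hat)                     # needs N large relative to T
    beta_gls = np.empty_like(beta_ols)
    for i in range(N):
        XtS = Xs[i].T @ S_inv
        beta_gls[i] = np.linalg.solve(XtS @ Xs[i], XtS @ ys[i])   # as in (18)
    return beta_ols, beta_gls

# Hypothetical design: one factor shared by the regressor and the residual.
rng = np.random.default_rng(0)
N, T, K, m = 600, 40, 1, 1
D = np.ones((T, 1))
F = rng.standard_normal((T, m))
beta = rng.standard_normal((N, K))
b = 1.0 + 0.5 * rng.standard_normal((N, m))
Gamma = 1.0 + 0.5 * rng.standard_normal((N, m, K))
X = F @ Gamma + rng.standard_normal((N, T, K))
u = b @ F.T + rng.standard_normal((N, T))
y = 2.0 + np.einsum("ntk,nk->nt", X, beta) + u       # intercept alpha_i = 2

beta_ols, beta_gls = feasible_gls(y, X, D)
rmse_ols = float(np.sqrt(np.mean((beta_ols - beta) ** 2)))
rmse_gls = float(np.sqrt(np.mean((beta_gls - beta) ** 2)))
print("RMSE OLS:", rmse_ols, "RMSE GLS:", rmse_gls)
```

The design keeps $N$ much larger than $T$ so that the $(T-S)\times(T-S)$ matrix `S_hat` is well conditioned, in line with the paper's remark that $N$ must diverge faster than $T$. No factor is estimated at any point, and the number of factors never enters the code.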

###### Theorem 3.1

When Assumptions 2.1, 2.2, 2.3, 2.4, 2.5 and 2.6 hold, for any and as

(i) (OLS estimator)

$$T^{1/2}\bigl(\hat\beta^{OLS}_i - \beta_i - \tau^{OLS}_i\bigr) \xrightarrow{d} N(0,\Sigma_i),$$

where

$$\tau^{OLS}_i := \Sigma_{X_i'X_i}^{-1}\,\Sigma_{X_i'F}\,b_i, \tag{20}$$

is the bias term, and the asymptotic covariance matrix equals

$$\Sigma_i := \Sigma_{X_i'X_i}^{-1}\,\Sigma_{X_i'D_\perp'\Xi_iD_\perp X_i}\,\Sigma_{X_i'X_i}^{-1}, \tag{21}$$

setting

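$$\Sigma_{X_i'F} := \Gamma_i'\bigl(\Sigma_{F'F} - \Sigma_{F'D}\Sigma_{D'D}^{-1}\Sigma_{D'F}\bigr), \tag{22}$$

$$\Sigma_{X_i'X_i} := \Gamma_i'\bigl(\Sigma_{F'F} - \Sigma_{F'D}\Sigma_{D'D}^{-1}\Sigma_{D'F}\bigr)\Gamma_i + \Sigma_{V_i'V_i}, \tag{23}$$

$$\Sigma_{X_i'D_\perp'\Xi_iD_\perp X_i} := \Gamma_i'\bigl(-\Sigma_{F'D}\Sigma_{D'D}^{-1},\,I_m\bigr)\,\Sigma_{Z'\Xi_iZ}\,\bigl(-\Sigma_{F'D}\Sigma_{D'D}^{-1},\,I_m\bigr)'\Gamma_i + \Sigma_{V_i'\Xi_iV_i}. \tag{24}$$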

(ii) (UGLS estimator)

$$T^{1/2}\bigl(\hat\beta^{UGLS}_i - \beta_i\bigr) \xrightarrow{d} N(0,\Sigma^{\star}_N), \tag{25}$$

with .

###### Remark 3.1

The OLS estimator is affected by a first-order bias. It will be asymptotically unbiased if either or or, alternatively, for diagonal as well as with and satisfying for every and . Essentially, this means that the entries of are nonzero whenever the corresponding entries of are zero, for the same row , and vice versa. More generally, no bias arises if belongs to the null space of , assuming .
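The bias formula (20) can be verified numerically in a simple case. The sketch below assumes, purely for illustration, zero-mean i.i.d. standard normal factors and innovations with an intercept as the only common regressor, so that $\Sigma_{F'F} = I_m$, $\Sigma_{V_i'V_i} = I_K$ and the terms involving $D$ in (22)-(23) vanish. Under these assumptions (20) reduces to $\tau^{OLS}_i = (\Gamma_i'\Gamma_i + I_K)^{-1}\Gamma_i'b_i$. The parameter values and the function name `ols_bias_check` are hypothetical.

```python
import numpy as np

def ols_bias_check(T=200_000, seed=1):
    """Compare the OLS estimation error of one unit with the bias term in (20)."""
    rng = np.random.default_rng(seed)
    Gamma = np.array([[1.0, 0.5]])        # m x K loadings of X_i on F (m = 1, K = 2)
    b = np.array([0.8])                   # loading of u_i on F
    beta = np.array([1.0, -1.0])          # true slopes
    F = rng.standard_normal((T, 1))
    X = F @ Gamma + rng.standard_normal((T, 2))
    u = F @ b + rng.standard_normal(T)
    y = X @ beta + u
    # De-meaning applies M_D when D is an intercept.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    beta_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    # Bias term (20) under Sigma_FF = I_m and Sigma_VV = I_K.
    tau = np.linalg.solve(Gamma.T @ Gamma + np.eye(2), Gamma.T @ b)
    return beta_ols - beta, tau

error, tau = ols_bias_check()
print("OLS error:", error)
print("tau      :", tau)
```

With a very long time series the OLS error settles on `tau` instead of zero, which is what a first-order bias means: more time periods do not remove it.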

###### Remark 3.2

One can assume without loss of generality that the same latent factors enter into and . In fact, assume with the rows of correlated, but not identical to the rows of . Then the bias takes the form

$$\tau^{OLS}_i = \Sigma_{X_i'X_i}^{-1}\,\Sigma_{F'P_FG}\,b_i,$$

exploiting the decomposition . Hence the bias is non-zero only because of the portion of correlated with . The same consideration applies to the GLS estimator. In Section 6.1 we explore in more detail the implications of having different, yet correlated, factor structures for regressors and innovations.

###### Remark 3.3

The UGLS estimator is asymptotically unbiased, consistent and asymptotically normal as . Moreover, the UGLS estimator can be efficient in the GLS sense. In particular, when the are not (unconditionally) heteroskedastic, namely , then the UGLS asymptotic covariance matrix does not have the sandwich form, unlike for the OLS estimator. One can define the UGLS differently, for instance replacing with in (14). However, our definition of the UGLS estimator makes it closer to the population counterpart to the class of feasible GLS estimators here studied.

We now present the main result of the paper.

###### Theorem 3.2

When Assumptions 2.1, 2.2, 2.3, 2.4, 2.5 and 2.6 hold, as ,

$$\hat\beta^{GLS}_i \xrightarrow{p} \beta_i, \tag{26}$$

and, as , then

$$\bigl(V_i'C_N^{-1}\Xi_iC_N^{-1}V_i\bigr)^{-1/2}\bigl(V_i'C_N^{-1}V_i\bigr)\bigl(\hat\beta^{GLS}_i - \beta_i\bigr) \xrightarrow{d} N(0,I_K), \tag{27}$$

where

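$$C_N := \frac{1}{N}\sum_{i=1}^{N}(\Xi_i + \Theta_i) \quad \text{with} \quad \Theta_i := E\bigl[V_i\Sigma_{X_i'X_i}^{-1}\Gamma_i\Sigma_{F'F}b_ib_i'\Sigma_{F'F}\Gamma_i'\Sigma_{X_i'X_i}^{-1}V_i'\bigr], \tag{28}$$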

with defined in (23).

###### Remark 3.4

The GLS estimator is asymptotically unbiased, consistent and asymptotically normal as both such that . The feasible GLS estimator is not efficient in general. A multi-step generalization achieves substantial efficiency gains, see Section 6.

## 4 Common regressors

We now consider estimation of the coefficients to the common regressors in model (1). A natural generalization of the GLS estimator would be

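$$\begin{pmatrix}\tilde\alpha^{GLS}_i \\ \tilde\beta^{GLS}_i\end{pmatrix} := \bigl(Z_i'\tilde S_N^{+}Z_i\bigr)^{+}Z_i'\tilde S_N^{+}y_i,$$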

where has been defined in Assumption 2.4 and

$$\tilde S_N = N^{-1}\sum_{i=1}^{N}\hat u_i\hat u_i', \qquad \hat u_i := y_i - D\hat\alpha^{OLS}_i - X_i\hat\beta^{OLS}_i, \tag{29}$$

that is are the OLS residuals, for . However, we show in Theorem I.1 (Supp. Material, Appendix I) that and due to a cancellation that occurs as a consequence of being common across units. If the joint distribution for the estimators of is not required, one can estimate the as the projection of on yielding

$$\tilde{\alpha}_i := (D'D)^{-1} D'\left(y_i - X_i \hat{\beta}_i^{GLS}\right). \tag{30}$$

Using our theory, its asymptotic distribution follows (see Theorem I.2 in the Supp. Material for further details). Note that the additional assumption is required. For example, if we are interested in a model with an intercept term, heterogeneous across units, such as , with , then one of the restrictions is simply . If, moreover, a grand mean is also allowed for, such as , then the additional restriction is needed. Similar identification conditions are discussed in Bai (2009) and Moon & Weidner (2017).

Footnote 7: Most of the papers on estimation of panel regressions with so-called interactive fixed effects, such as ours, focus exclusively on the coefficients to the heterogeneous time-varying regressors. Among the few exceptions is Bai (2009), who shows that, without further identification assumptions, estimation of the coefficients to common regressors is possible only for constant parameters. In contrast, for the case of non-constant coefficients, further identification assumptions similar to ours are needed. Moon & Weidner (2017) study the same estimator as Bai (2009) under weaker conditions on the regressors, allowing for instance for predeterminedness. Our identification condition for the coefficients to common regressors implies their weaker corresponding assumption. They focus exclusively on the case of constant regression coefficients.

If instead the joint distribution for estimators of and is required, this can be achieved by a slight modification of our GLS estimator, namely

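$$\begin{pmatrix} \breve{\alpha}_i^{GLS} \\ \breve{\beta}_i^{GLS} \end{pmatrix} := \left(Z_i' \breve{S}_N^{-1} Z_i\right)^{-1} Z_i' \breve{S}_N^{-1} y_i, \tag{31}$$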

for the non-singular matrix

$$\breve{S}_N := \tilde{S}_N + \left(\frac{\operatorname{tr}(\tilde{S}_N)}{N}\right) P_D. \tag{32}$$

Non-singularity of follows by augmenting the matrix , of rank , with the projection matrix of rank . Scaling by is not required by the asymptotic theory but could be relevant in finite samples to ensure the same order of magnitude of the two terms in . It turns out that the same identification condition , discussed above, is required. Monte Carlo experiments are reported in Section 7 to assess the small-sample properties of these estimators.
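The construction in (29), (31) and (32) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `joint_gls`, the restriction to a single heterogeneous regressor per unit, and the array layout (units stored as columns) are all hypothetical choices made for the example. The scaling of the projection term follows (32) as printed.

```python
import numpy as np

def joint_gls(Y, X, D):
    """Joint GLS for common and heterogeneous regressors, following (29), (31), (32).

    Y : (T, N) array, column i is y_i
    X : (T, N) array, column i is the single heterogeneous regressor of unit i
        (one regressor per unit is an assumption of this sketch)
    D : (T, S) array of regressors common to all units
    Returns the (N, S + 1) estimates [alpha_i, beta_i], S_tilde and S_breve.
    """
    T, N = Y.shape
    S_dim = D.shape[1]

    # Projection matrix onto the column space of D.
    P_D = D @ np.linalg.solve(D.T @ D, D.T)

    # Unit-by-unit OLS residuals from regressing y_i on Z_i = (D, X_i).
    resid = np.empty((T, N))
    for i in range(N):
        Z = np.column_stack([D, X[:, i]])
        coef, *_ = np.linalg.lstsq(Z, Y[:, i], rcond=None)
        resid[:, i] = Y[:, i] - Z @ coef

    # (29): the residuals are orthogonal to D, so S_tilde is rank deficient.
    S_tilde = resid @ resid.T / N

    # (32): add the scaled projection onto D to restore full rank.
    S_breve = S_tilde + (np.trace(S_tilde) / N) * P_D

    # (31): GLS for each unit, weighted by the inverse of S_breve.
    est = np.empty((N, S_dim + 1))
    for i in range(N):
        Z = np.column_stack([D, X[:, i]])
        W = np.linalg.solve(S_breve, Z)          # S_breve^{-1} Z_i
        est[i] = np.linalg.solve(Z.T @ W, W.T @ Y[:, i])

    return est, S_tilde, S_breve
```

Inspecting `np.linalg.eigvalsh(S_tilde)` and `np.linalg.eigvalsh(S_breve)` on simulated data is one way to see the rank deficiency of the first matrix and the non-singularity of the second.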

## 5 Technical contributions

The asymptotics for the GLS estimator requires four key auxiliary results, stated in Appendix A, which could be useful in a broader set of statistical problems. The main reason for this complexity is that, unlike most of the existing theoretical results on GLS estimation, we do not restrict the number of free elements of the weighting matrix to be finite. Indeed, in our case the number of free elements of the weighting matrix is and hence rapidly increasing with . To tackle the curse-of-dimensionality issue, we exploit the approximate factor structure of the weighting matrix, which we write as , for (possibly random) matrix and matrix for every finite . The inverse has a convenient form thanks to the Sherman-Morrison formula (see Appendix D.1).

Lemma A.1 establishes the asymptotic orthogonality between the inverse of the matrix , and the factor . More precisely, when and satisfy a set of mild regularity conditions

$$\left\| E^{-1} F \right\|^2 = O_p\left(T^{-1}\right). \tag{33}$$

This is a remarkably fast rate given that is dimensional, with fixed, hence with its number of rows increasing with . It implies that for a large class of matrices (satisfying the mild regularity conditions of Lemma A.1) possibly unrelated to both and , then and, when the entries of have zero mean and are stochastically independent of and , then . These rates are very different from the usual case, arising when and are unrelated. For example, when , under the same assumptions on , one gets that is of order or , depending on whether has zero or non-zero mean, respectively, assuming that , and are mutually independent.
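The rate in (33) can be checked numerically in a stylised setting. The sketch below assumes, purely for illustration, a weighting matrix of the form E = AA' + B with B diagonal and A = FL for a fixed non-singular 2 x 2 matrix L; these choices, the Frobenius norm, and the function names `einv_times_f` and `scaled_sq_norm` are not taken from the paper. The inverse is applied through the low-rank (Sherman-Morrison-Woodbury) update formula, so that no T x T matrix is ever formed.

```python
import numpy as np

def einv_times_f(F, A, d):
    """Return E^{-1} F for E = A A' + diag(d), using the low-rank update formula.

    E^{-1} = B^{-1} - B^{-1} A (I + A' B^{-1} A)^{-1} A' B^{-1},  with B = diag(d).
    """
    B_inv_F = F / d[:, None]
    B_inv_A = A / d[:, None]
    core = np.eye(A.shape[1]) + A.T @ B_inv_A       # small M x M system
    return B_inv_F - B_inv_A @ np.linalg.solve(core, A.T @ B_inv_F)

def scaled_sq_norm(T, seed=0):
    """Return T * ||E^{-1} F||_F^2 for a simulated two-factor design.

    If the squared norm is of order 1/T, this scaled quantity should
    remain of the same order of magnitude as T grows.
    """
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((T, 2))                 # latent factors (assumed Gaussian)
    L = np.array([[1.0, 0.0], [0.5, 1.0]])          # hypothetical loading rotation
    A = F @ L
    d = rng.uniform(0.5, 2.0, size=T)               # heteroskedastic diagonal B
    return T * np.sum(einv_times_f(F, A, d) ** 2)
```

Evaluating `scaled_sq_norm(T)` over an increasing grid of `T` shows whether the scaled squared norm stabilises, which is what a rate of order 1/T for the unscaled quantity implies. By contrast, the squared norm of `F / d[:, None]` alone grows proportionally to `T` in this design.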

The asymptotic orthogonality (33) plays a crucial role in establishing the asymptotics for the GLS (and UGLS) estimator. To better understand this, consider the following decomposition of the GLS estimator:

 ^βGLSi−βi =(Xi′^SN−1Xi)−1Xi′^SN−1ui (34) =(Xi′^SN−1Xi)−1Xi′^SN−1Fbi+(Xi′^SN−1Xi)−1Xi