Individual and Time Effects
in Nonlinear Panel Models with Large N, T
Abstract
We derive fixed effects estimators of parameters and average partial effects in (possibly dynamic) nonlinear panel data models with individual and time effects. They cover logit, probit, ordered probit, Poisson and Tobit models that are important for many empirical applications in micro and macroeconomics. Our estimators use analytical and jackknife bias corrections to deal with the incidental parameter problem, and are asymptotically unbiased under asymptotic sequences where $N/T$ converges to a constant. We develop inference methods and show that they perform well in numerical examples.
Abstract (Supplemental Material)
This supplemental material contains five appendices. Appendix S.1 presents the results of an empirical application and a Monte Carlo simulation calibrated to the application. Following Aghion et al. (2005), we use a panel of U.K. industries to estimate Poisson models with industry and time effects for the relationship between innovation and competition. Appendix S.2 gives the proofs of Theorems 4.3 and 4.4. Appendices S.3, S.4, and S.5 contain the proofs of Appendices B, C, and D, respectively. Appendix S.6 collects some useful intermediate results that are used in the proofs of the main results.
Keywords: Panel data, nonlinear model, dynamic model, asymptotic bias correction, fixed effects, time effects.
JEL: C13, C23.
1 Introduction
Fixed effects estimators of nonlinear panel data models can be severely biased because of the incidental parameter problem (Neyman and Scott, 1948). A growing literature, surveyed in Arellano and Hahn (2007), shows that the leading term of an asymptotic expansion of the bias, as both the cross-sectional dimension $N$ and the time series dimension $T$ of the panel grow, can be characterized and corrected. In models with individual effects, the leading bias term is of order $1/T$ and comes from the estimation of the individual effects. This result, however, does not apply to models with individual and time effects, where both of these effects are treated as parameters to be estimated. In this paper we show that the estimation of the time effects causes an additional incidental parameter bias of order $1/N$. Thus, if $N$ and $T$ are similarly large, the bias produced by the estimation of the time effects is of similar order of magnitude to the bias produced by the estimation of the individual effects, and both biases need to be corrected. We provide the corresponding analytical and jackknife bias corrections.
The asymptotic approximation to the fixed effects estimators that lets the two dimensions of the panel grow with the sample size is motivated by the recent availability of long panels and other large pseudo-panel data structures where the indexes might not correspond to individuals and time periods. Examples of these datasets include traditional microeconomic panel surveys with a long history of data such as the PSID and NLSY, international cross-country panels such as the Penn World Table, U.S. state-level panels over time such as the CPS, and square pseudo-panels of trade flows across countries such as Feenstra's World Trade Flows and CEPII, where the indexes correspond to the same countries indexed as importers and exporters.
We focus on semiparametric models with log-likelihood functions that are concave in all parameters, and where each individual effect $\alpha_i$ and time effect $\gamma_t$ enter the log-likelihood for observation $(i,t)$ additively as $\alpha_i + \gamma_t$. This is the most common specification for the individual and time effects in linear models and is also a natural specification in the nonlinear models that we consider. Imposing concavity of the log-likelihood function greatly facilitates showing consistency in our setting where the dimension of the parameter space grows with the sample size. The most popular limited dependent variable models, including logit, probit, ordered probit, Tobit and Poisson models, have concave log-likelihood functions, possibly after reparametrization (Olsen, 1978, and Pratt, 1981). We note here that the general expansion that we derive in Appendix B does not impose additivity and concavity, but we use these restrictions to apply the expansion to fixed effects estimators. The models that we consider are semiparametric because the joint distribution of the explanatory variables and the unobserved effects is left unspecified. The explanatory variables can be either strictly exogenous or predetermined.
We derive bias expansions and corrections for fixed effects estimators of common parameters $\beta$ and average partial effects (APEs). The vector $\beta$ includes all the unknown parameters that enter the log-likelihood function other than the individual and time effects, such as index coefficients in a probit model. The APEs are functions of the data, the common parameters, and the individual and time effects in nonlinear models. We find that the properties of the fixed effects estimators of $\beta$ and the APEs are different. For $\beta$, the order of the bias is $1/T + 1/N$, which is the same as the order of the rate of convergence, $1/\sqrt{NT}$, under sequences where $N/T$ converges to a constant. For the APEs, we uncover that the incidental parameter problem is asymptotically negligible because the order of the bias, $1/T + 1/N$, is smaller than the rate of convergence, which is $1/\sqrt{N} + 1/\sqrt{T}$, slower than the $1/\sqrt{NT}$ rate for model parameters. To the best of our knowledge, this rate result is new for fixed effects estimators of average partial effects in nonlinear panel models with individual and time effects.
The bias correction eliminates the bias terms of orders $1/T$ and $1/N$ from the fixed effects estimators. We consider two methods to implement the correction: an analytical bias correction similar to Hahn and Newey (2004) and Hahn and Kuersteiner (2011), and a suitable modification of the split panel jackknife of Dhaene and Jochmans (2015).
Simulation evidence indicates that our corrections improve the estimation and inference performance of the fixed effects estimators of parameters and average effects. The analytical corrections dominate the jackknife corrections in a probit model for sample sizes that are relevant for empirical practice. In the online supplement, Fernández-Val and Weidner (2015), we illustrate the corrections with an empirical application on the relationship between competition and innovation using a panel of U.K. industries, following Aghion, Bloom, Blundell, Griffith and Howitt (2005). We find that the inverted-U pattern relationship found by Aghion et al. is robust to relaxing the strict exogeneity assumption of competition with respect to the innovation process and to the inclusion of innovation dynamics. We also uncover substantial state dependence in the innovation process.
Literature review. The Neyman and Scott incidental parameter problem has been extensively discussed in the econometric literature; see, for example, Heckman (1981), Lancaster (2000), and Greene (2004). There is also a vast literature that shows how to tackle the problem in specific models under asymptotic sequences where $T$ is fixed and $N$ grows to infinity. However, there are results, e.g. from Honoré and Tamer (2006), Chamberlain (2010), and Chernozhukov, Fernández-Val, Hahn and Newey (2013), showing that model parameters and APEs are not point identified in important nonlinear panel data models under fixed-$T$ asymptotic sequences, implying that no fixed-$T$ consistent point estimators exist in these models.
A recent response to the incidental parameter problem is to adopt an alternative asymptotic approximation where both $N$ and $T$ grow with the sample size. Under these large-$T$ sequences, the fixed effects estimator is consistent but has bias in the asymptotic distribution. This asymptotic bias is the large-$T$ version of the incidental parameter problem and has motivated the development of bias corrections. Examples of papers that use this approximation include Phillips and Moon (1999), Hahn and Kuersteiner (2002), Lancaster (2002), Woutersen (2002), Alvarez and Arellano (2003), Hahn and Newey (2004), Carro (2007), Arellano and Bonhomme (2009), Fernández-Val (2009), Hahn and Kuersteiner (2011), Fernández-Val and Vella (2011), and Kato, Galvao and Montes-Rojas (2012). This previous work, however, does not cover models with time effects.
The large-$T$ panel literature on models with both individual and time effects is sparse. Pesaran (2006), Bai (2009), and Moon and Weidner (2015a, 2015b) study linear regression models with interactive individual and time fixed effects. The fixed effects estimators in these models also have asymptotic bias of order $(NT)^{-1/2}$, but the methods used to derive this bias rely on linearity and therefore cannot be applied to the nonlinear models that we consider. Hahn and Moon (2006) consider bias corrected fixed effects estimators in panel linear autoregressive models with additive individual and time effects. Regarding nonlinear models, there is independent and contemporaneous work by Charbonneau (2011, 2014), which extends the conditional fixed effects estimators to logit and Poisson models with individual and time effects. She differences out the individual and time effects by conditioning on sufficient statistics. The conditional approach completely eliminates the asymptotic bias coming from the estimation of the incidental parameters, but it does not permit estimation of average partial effects and has not been developed for models with predetermined regressors. We instead consider estimators of model parameters and average partial effects in nonlinear models with predetermined regressors. The two approaches can therefore be considered as complementary.
Outline of the paper. The rest of the paper is organized as follows. Section 2 introduces the model and fixed effects estimators. Section 3 describes the bias corrections to deal with the incidental parameters problem and illustrates how the bias corrections work through an example. Section 4 provides the asymptotic theory. Section 5 presents Monte Carlo results. The Appendix collects the proofs of the main results, and an online supplement to the paper contains additional technical derivations, numerical examples, and an empirical application (Fernández-Val and Weidner, 2015b).
2 Model and Estimators
2.1 Model
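The data consist of $N \times T$ observations $\{(Y_{it}, X_{it}')' : 1 \le i \le N, 1 \le t \le T\}$ for a scalar outcome variable of interest $Y_{it}$ and a vector of explanatory variables $X_{it}$. We assume that the outcome for individual $i$ at time $t$ is generated by the sequential process:
$$Y_{it} \mid X_i^t, \alpha, \gamma, \beta \sim f_Y(\cdot \mid X_{it}, \alpha_i, \gamma_t, \beta), \quad (i = 1, \dots, N; \; t = 1, \dots, T),$$
where $X_i^t = (X_{i1}, \dots, X_{it})$, $\alpha = (\alpha_1, \dots, \alpha_N)$, $\gamma = (\gamma_1, \dots, \gamma_T)$, $f_Y$ is a known probability function, and $\beta$ is a finite dimensional parameter vector. The variables $\alpha_i$ and $\gamma_t$ are unobserved individual and time effects that in economic applications capture individual heterogeneity and aggregate shocks, respectively. The model is semiparametric because we do not specify the distribution of these effects nor their relationship with the explanatory variables. The conditional distribution $f_Y$ represents the parametric part of the model. The vector $X_{it}$ contains predetermined variables with respect to $Y_{it}$. Note that $X_{it}$ can include lags of $Y_{it}$ to accommodate dynamic models.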
We consider two running examples throughout the analysis:
Example 1 (Binary response model).
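Let $Y_{it}$ be a binary outcome and $F$ be a cumulative distribution function, e.g. the standard normal or standard logistic distribution. We can model the conditional distribution of $Y_{it}$ using the single-index specification with individual and time effects
$$f_Y(y \mid X_{it}, \alpha_i, \gamma_t, \beta) = F(X_{it}'\beta + \alpha_i + \gamma_t)^{y} \, [1 - F(X_{it}'\beta + \alpha_i + \gamma_t)]^{1-y}, \quad y \in \{0, 1\}.$$
In a labor economics application, $Y_{it}$ can be an indicator for female labor force participation and $X_{it}$ can include fertility indicators and other socioeconomic characteristics.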
Example 2 (Poisson model).
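Let $Y_{it}$ be a nonnegative integer-valued outcome, and $f(\cdot; \lambda)$ be the probability mass function of a Poisson random variable with mean $\lambda > 0$. We can model the conditional distribution of $Y_{it}$ using the single-index specification with individual and time effects
$$f_Y(y \mid X_{it}, \alpha_i, \gamma_t, \beta) = f(y; \exp[X_{it}'\beta + \alpha_i + \gamma_t]), \quad y \in \{0, 1, 2, \dots\}.$$
In an industrial organization application, $Y_{it}$ can be the number of patents that a firm produces and $X_{it}$ can include investment in R&D and other firm characteristics.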
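For estimation, we adopt a fixed effects approach, treating the realization of the unobserved individual and time effects as parameters to be estimated. We collect all these effects in the vector $\phi_{NT} = (\alpha_1, \dots, \alpha_N, \gamma_1, \dots, \gamma_T)'$. The model parameter $\beta$ usually includes regression coefficients of interest, while the vector $\phi_{NT}$ is treated as a nuisance parameter. The true values of the parameters, denoted by $\beta^0$ and $\phi^0_{NT} = (\alpha^{0\prime}, \gamma^{0\prime})'$, are the solution to the population conditional maximum likelihood problem
$$\max_{(\beta, \phi_{NT}) \in \mathbb{R}^{\dim \beta + \dim \phi_{NT}}} \mathbb{E}_{\phi}[\mathcal{L}_{NT}(\beta, \phi_{NT})], \quad \mathcal{L}_{NT}(\beta, \phi_{NT}) := (NT)^{-1/2} \Big\{ \sum_{i,t} \log f_Y(Y_{it} \mid X_{it}, \alpha_i, \gamma_t, \beta) - b \, (v_{NT}' \phi_{NT})^2 / 2 \Big\}, \tag{2.1}$$
for every $N, T$, where $\mathbb{E}_{\phi}$ denotes the expectation with respect to the distribution of the data conditional on the unobserved effects and initial conditions including strictly exogenous variables, $b > 0$ is an arbitrary constant, $v_{NT} = (1_N', -1_T')'$, and $1_N$ and $1_T$ denote vectors of ones with dimensions $N$ and $T$. Existence and uniqueness of the solution to the population problem will be guaranteed by our assumptions in Section 4 below, including concavity of the objective function in all parameters. The second term of $\mathcal{L}_{NT}$ is a penalty that imposes a normalization needed to identify $\phi_{NT}$ in models with scalar individual and time effects that enter additively into the log-likelihood function as $\alpha_i + \gamma_t$.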
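As a minimal sketch of the penalized objective in (2.1), the following Python snippet evaluates it for the Poisson model of Example 2. The function names, the restriction to a single scalar regressor, the toy numbers, the $(NT)^{-1/2}$ scaling and the choice $v_{NT} = (1_N', -1_T')'$ are assumptions made for illustration. It shows that shifting all individual effects by $c$ and all time effects by $-c$ leaves the log-likelihood unchanged, so that only the penalty term separates the two parameterizations.

```python
import math

def poisson_loglik(y, index):
    """Log pmf of a Poisson outcome y with mean exp(index)."""
    return y * index - math.exp(index) - math.lgamma(y + 1)

def penalized_objective(Y, X, beta, alpha, gamma, b=1.0):
    """Penalized log-likelihood, scaled by (NT)^(-1/2).

    Y, X are N x T nested lists (X holds one scalar regressor, an
    illustrative simplification); alpha has length N, gamma length T.
    """
    N, T = len(alpha), len(gamma)
    loglik = sum(
        poisson_loglik(Y[i][t], X[i][t] * beta + alpha[i] + gamma[t])
        for i in range(N)
        for t in range(T)
    )
    # v'phi = sum_i alpha_i - sum_t gamma_t; the penalty is zero when it vanishes
    penalty = b * (sum(alpha) - sum(gamma)) ** 2 / 2.0
    return (loglik - penalty) / math.sqrt(N * T)

# Toy data (hypothetical numbers): N = 2 individuals, T = 3 periods.
Y = [[0, 2, 1], [3, 1, 0]]
X = [[0.5, -1.0, 0.2], [1.5, 0.0, -0.3]]
alpha, gamma = [0.3, 0.0], [0.1, 0.1, 0.1]  # sum(alpha) equals sum(gamma)

c = 0.5  # shift alpha up and gamma down: alpha_i + gamma_t is unchanged
shifted_alpha = [a + c for a in alpha]
shifted_gamma = [g - c for g in gamma]

base = penalized_objective(Y, X, 0.4, alpha, gamma)
shifted = penalized_objective(Y, X, 0.4, shifted_alpha, shifted_gamma)
# The gap equals the penalty b * ((N + T) * c)^2 / 2, scaled by (NT)^(-1/2).
gap = base - shifted
```

Because the gap is strictly positive for any $c \neq 0$, the penalized objective picks out one representative from each family of observationally equivalent effects.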
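Other quantities of interest involve averages over the data and unobserved effects
$$\delta^0_{NT} = \mathbb{E}[\Delta_{NT}(\beta^0, \phi^0_{NT})], \quad \Delta_{NT}(\beta, \phi_{NT}) = (NT)^{-1} \sum_{i,t} \Delta(X_{it}, \beta, \alpha_i, \gamma_t), \tag{2.2}$$
where $\mathbb{E}$ denotes the expectation with respect to the joint distribution of the data and the unobserved effects, provided that the expectation exists. $\delta^0_{NT}$ is indexed by $N$ and $T$ because the marginal distribution of $\{(X_{it}, \alpha_i, \gamma_t) : 1 \le i \le N, 1 \le t \le T\}$ can be heterogeneous across $i$ and/or $t$; see Section 4.2. These averages include average partial effects (APEs), which are often the ultimate quantities of interest in nonlinear models. The APEs are invariant to the choice of normalization for $\phi_{NT}$ if $\alpha_i$ and $\gamma_t$ enter $\Delta(X_{it}, \beta, \alpha_i, \gamma_t)$ as $\alpha_i + \gamma_t$. Some examples of partial effects that satisfy this condition are the following: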
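Example 1 (Binary response model). If $X_{it,k}$, the $k$th element of $X_{it}$, is binary, its partial effect on the conditional probability of $Y_{it}$ is
$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = F(\beta_k + X_{it,-k}'\beta_{-k} + \alpha_i + \gamma_t) - F(X_{it,-k}'\beta_{-k} + \alpha_i + \gamma_t), \tag{2.3}$$
where $\beta_k$ is the $k$th element of $\beta$, and $X_{it,-k}$ and $\beta_{-k}$ include all elements of $X_{it}$ and $\beta$ except for the $k$th element. If $X_{it,k}$ is continuous and $F$ is differentiable, the partial effect of $X_{it,k}$ on the conditional probability of $Y_{it}$ is
$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = \beta_k \, \partial F(X_{it}'\beta + \alpha_i + \gamma_t), \tag{2.4}$$
where $\partial F$ is the derivative of $F$.
Example 2 (Poisson model). If $X_{it}$ includes $Z_{it}$ and some known transformation $H(Z_{it})$ with coefficients $\beta_k$ and $\beta_j$, the partial effect of $Z_{it}$ on the conditional expectation of $Y_{it}$ is
$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = [\beta_k + \beta_j \, \partial H(Z_{it})] \exp(X_{it}'\beta + \alpha_i + \gamma_t). \tag{2.5}$$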
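The partial effect of a binary regressor in a probit model, and its invariance to the normalization of the effects, can be sketched as follows. This is an illustrative snippet only: the function names, the single additional scalar regressor and all numerical inputs are hypothetical, and the average is evaluated at given parameter values rather than at estimates.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, used as the link F of a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ape_binary_regressor(X_other, beta_k, beta_other, alpha, gamma, F=std_normal_cdf):
    """Sample average over (i, t) of F(beta_k + base) - F(base), where
    base = X_other[i][t] * beta_other + alpha[i] + gamma[t]."""
    N, T = len(alpha), len(gamma)
    total = 0.0
    for i in range(N):
        for t in range(T):
            base = X_other[i][t] * beta_other + alpha[i] + gamma[t]
            total += F(beta_k + base) - F(base)
    return total / (N * T)

# Hypothetical inputs: one additional scalar regressor, N = 2, T = 3.
X_other = [[0.2, -0.4, 1.0], [0.0, 0.7, -1.1]]
alpha, gamma = [-0.2, 0.4], [0.1, -0.3, 0.0]

ape = ape_binary_regressor(X_other, 0.8, 0.5, alpha, gamma)
# Renormalize the effects: alpha_i + c and gamma_t - c give the same sums.
ape_shifted = ape_binary_regressor(
    X_other, 0.8, 0.5, [a + 1.0 for a in alpha], [g - 1.0 for g in gamma]
)
```

The two averages agree up to floating point error because the effects enter only through the sum of the individual and time effect.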
2.2 Fixed effects estimators
We estimate the parameters by solving the sample analog of problem (2.1), i.e.
$$\max_{(\beta, \phi_{NT}) \in \mathbb{R}^{\dim \beta + \dim \phi_{NT}}} \mathcal{L}_{NT}(\beta, \phi_{NT}). \tag{2.6}$$
As in the population case, we shall impose conditions guaranteeing that the solution to this maximization problem exists and is unique with probability approaching one as $N$ and $T$ become large. For computational purposes, we note that the solution to the program (2.6) for $\beta$ is the same as the solution to the program that imposes $v_{NT}'\phi_{NT} = 0$ directly as a constraint in the optimization, and is invariant to the normalization. In our numerical examples we impose either $\alpha_1 = 0$ or $\gamma_1 = 0$ directly by dropping the first individual or time effect. This constrained program has good computational properties because its objective function is concave and smooth in all the parameters. We have developed the commands probitfe and logitfe in Stata to implement the methods of the paper for probit and logit models (Cruz-González et al., 2015).
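To analyze the statistical properties of the estimator of $\beta$ it is convenient to first concentrate out the nuisance parameter $\phi_{NT}$. For given $\beta$, we define the optimal $\hat{\phi}_{NT}(\beta)$ as
$$\hat{\phi}_{NT}(\beta) = \arg\max_{\phi_{NT} \in \mathbb{R}^{\dim \phi_{NT}}} \mathcal{L}_{NT}(\beta, \phi_{NT}). \tag{2.7}$$
The fixed effects estimators of $\beta^0$ and $\phi^0_{NT}$ are
$$\hat{\beta}_{NT} = \arg\max_{\beta \in \mathbb{R}^{\dim \beta}} \mathcal{L}_{NT}(\beta, \hat{\phi}_{NT}(\beta)), \quad \hat{\phi}_{NT} = \hat{\phi}_{NT}(\hat{\beta}_{NT}). \tag{2.8}$$
Estimators of APEs can be formed by plugging in the estimators of the model parameters in the sample version of (2.2), i.e.
$$\hat{\delta}_{NT} = \Delta_{NT}(\hat{\beta}_{NT}, \hat{\phi}_{NT}). \tag{2.9}$$
Again, $\hat{\delta}_{NT}$ is invariant to the normalization chosen for $\phi_{NT}$ if $\alpha_i$ and $\gamma_t$ enter $\Delta(X_{it}, \beta, \alpha_i, \gamma_t)$ as $\alpha_i + \gamma_t$.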
3 Incidental parameter problem and bias corrections
In this section we give a heuristic discussion of the main results, leaving the technical details to Section 4. We illustrate the analysis with numerical calculations based on a variation of the classical Neyman and Scott (1948) variance example.
3.1 Incidental parameter problem
Fixed effects estimators in nonlinear models suffer from the incidental parameter problem (Neyman and Scott, 1948). The source of the problem is that the dimension of the nuisance parameter $\phi_{NT}$ increases with the sample size under asymptotic approximations where either $N$ or $T$ pass to infinity. To describe the problem let
$$\overline{\beta}_{NT} := \arg\max_{\beta \in \mathbb{R}^{\dim \beta}} \mathbb{E}_{\phi}\big[\mathcal{L}_{NT}(\beta, \hat{\phi}_{NT}(\beta))\big]. \tag{3.1}$$
The fixed effects estimator is inconsistent under the traditional Neyman and Scott asymptotic sequences where $N \to \infty$ and $T$ is fixed, i.e., $\operatorname{plim}_{N \to \infty} \overline{\beta}_{NT} \neq \beta^0$. Similarly, the fixed effects estimator is inconsistent under asymptotic sequences where $T \to \infty$ and $N$ is fixed, i.e., $\operatorname{plim}_{T \to \infty} \overline{\beta}_{NT} \neq \beta^0$. Note that $\overline{\beta}_{NT} = \beta^0$ if $\hat{\phi}_{NT}(\beta)$ is replaced by $\phi_{NT}(\beta) = \arg\max_{\phi_{NT}} \mathbb{E}_{\phi}[\mathcal{L}_{NT}(\beta, \phi_{NT})]$. Under asymptotic approximations where either $N$ or $T$ are fixed, there is only a fixed number of observations to estimate some of the components of $\phi_{NT}$, $T$ for each individual effect or $N$ for each time effect, rendering the estimator $\hat{\phi}_{NT}(\beta)$ inconsistent for $\phi_{NT}(\beta)$. The nonlinearity of the model propagates the inconsistency to the estimator of $\beta$.
A key insight of the large-$T$ panel data literature is that the incidental parameter problem becomes an asymptotic bias problem under an asymptotic approximation where $N \to \infty$ and $T \to \infty$ (e.g., Arellano and Hahn, 2007). For models with only individual effects, this literature derived the expansion $\overline{\beta}_{NT} = \beta^0 + B/T + o_P(T^{-1})$ as $N, T \to \infty$, for some constant $B$. The fixed effects estimator is consistent because $\operatorname{plim}_{N,T \to \infty} \overline{\beta}_{NT} = \beta^0$, but has bias in the asymptotic distribution if $B/T$ is not negligible relative to $1/\sqrt{NT}$, the order of the standard deviation of the estimator. This asymptotic bias problem, however, is easier to correct than the inconsistency problem that arises under the traditional Neyman and Scott asymptotic approximation. We show that the same insight still applies to models with individual and time effects, but with a different expansion for $\overline{\beta}_{NT}$. We characterize the expansion and develop bias corrections.
3.2 Bias Expansions and Bias Corrections
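Some expansions can be used to explain our corrections. For smooth likelihoods and under appropriate regularity conditions, as $N, T \to \infty$,
$$\overline{\beta}_{NT} = \beta^0 + \overline{B}^{\beta}_{\infty}/T + \overline{D}^{\beta}_{\infty}/N + o_P(T^{-1} \vee N^{-1}), \tag{3.2}$$
for some $\overline{B}^{\beta}_{\infty}$ and $\overline{D}^{\beta}_{\infty}$ that we characterize in Theorem 4.1 and explain in Remark 2, where $a \vee b := \max(a, b)$. Unlike in nonlinear models without incidental parameters, the order of the bias is higher than the inverse of the sample size $(NT)^{-1}$ due to the slow rate of convergence of $\hat{\phi}_{NT}$. Note also that by the properties of the maximum likelihood estimator
$$\sqrt{NT}\,(\hat{\beta}_{NT} - \overline{\beta}_{NT}) \to_d \mathcal{N}(0, \overline{V}_{\infty}),$$
for some $\overline{V}_{\infty}$ that we also characterize in Theorem 4.1. Under asymptotic sequences where $N/T \to \kappa^2$ as $N, T \to \infty$, the fixed effects estimator is asymptotically biased because
$$\sqrt{NT}\,(\hat{\beta}_{NT} - \beta^0) = \sqrt{NT}\,(\hat{\beta}_{NT} - \overline{\beta}_{NT}) + \sqrt{NT}\,\big(\overline{B}^{\beta}_{\infty}/T + \overline{D}^{\beta}_{\infty}/N + o_P(T^{-1} \vee N^{-1})\big) \to_d \mathcal{N}\big(\kappa \overline{B}^{\beta}_{\infty} + \kappa^{-1} \overline{D}^{\beta}_{\infty}, \; \overline{V}_{\infty}\big). \tag{3.3}$$
Relative to fixed effects estimators with only individual effects, the presence of time effects introduces additional asymptotic bias through $\overline{D}^{\beta}_{\infty}$. This asymptotic result predicts that the fixed effects estimator can have significant bias relative to its dispersion. Moreover, confidence intervals constructed around the fixed effects estimator can severely undercover the true value of the parameter even in large samples. We show that these predictions provide a good approximation to the finite sample behavior of the fixed effects estimator through analytical and simulation examples in Sections 3.3 and 5.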
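To make the undercoverage prediction concrete, the following sketch computes the limiting coverage of a nominal 95% interval when the estimator is asymptotically normal with a non-zero mean of the form $\kappa \overline{B}^{\beta}_{\infty} + \kappa^{-1} \overline{D}^{\beta}_{\infty}$. The function name and the scalar-parameter setting are assumptions for illustration, and the numerical inputs are chosen to mimic the variance example of Section 3.3 with the true value normalized to one (bias constants equal to $-1$ and asymptotic variance equal to $2$).

```python
from statistics import NormalDist

def asymptotic_coverage(B, D, V, kappa, level=0.95):
    """Limit coverage of beta_hat +/- z * sqrt(V / (N T)) when
    sqrt(NT) * (beta_hat - beta0) -> N(kappa * B + D / kappa, V)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(0.5 + level / 2.0)
    # Asymptotic bias measured in standard-deviation units
    shift = (kappa * B + D / kappa) / V ** 0.5
    # The interval covers beta0 when |Z + shift| <= z for standard normal Z
    return std_normal.cdf(z - shift) - std_normal.cdf(-z - shift)

no_bias = asymptotic_coverage(0.0, 0.0, 2.0, kappa=1.0)
with_bias = asymptotic_coverage(-1.0, -1.0, 2.0, kappa=1.0)
```

With no bias the function returns the nominal level, while the biased case gives a limiting coverage of roughly 0.71, well below 0.95.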
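The analytical bias correction consists of subtracting estimates of the leading terms of the bias from the fixed effects estimator of $\beta^0$. Let $\hat{B}^{\beta}_{NT}$ and $\hat{D}^{\beta}_{NT}$ be estimators of $\overline{B}^{\beta}_{\infty}$ and $\overline{D}^{\beta}_{\infty}$ as defined in (4.6). The bias corrected estimator can be formed as
$$\tilde{\beta}^{A}_{NT} = \hat{\beta}_{NT} - \hat{B}^{\beta}_{NT}/T - \hat{D}^{\beta}_{NT}/N.$$
If $N/T \to \kappa^2$, $\hat{B}^{\beta}_{NT} \to_P \overline{B}^{\beta}_{\infty}$, and $\hat{D}^{\beta}_{NT} \to_P \overline{D}^{\beta}_{\infty}$, then
$$\sqrt{NT}\,(\tilde{\beta}^{A}_{NT} - \beta^0) \to_d \mathcal{N}(0, \overline{V}_{\infty}).$$
The analytical correction therefore centers the asymptotic distribution at the true value of the parameter, without increasing asymptotic variance. This asymptotic result predicts that in large samples the corrected estimator has small bias relative to dispersion, the correction does not increase dispersion, and the confidence intervals constructed around the corrected estimator have coverage probabilities close to the nominal levels. We show in Sections 3.3 and 5 that these predictions provide a good approximation to the behavior of the corrections even in small panels.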
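We also consider a jackknife bias correction method that does not require explicit estimation of the bias. This method is based on the split panel jackknife (SPJ) of Dhaene and Jochmans (2015) applied to the time and cross-section dimension of the panel. Alternative jackknife corrections based on the leave-one-observation-out panel jackknife (PJ) of Hahn and Newey (2004) and combinations of PJ and SPJ are also possible. We do not consider corrections based on PJ because they are theoretically justified by second-order expansions of $\overline{\beta}_{NT}$ that are beyond the scope of this paper.
To describe our generalization of the SPJ, define the fixed effects estimator of $\beta$ in the subpanel with cross sectional indexes $A \subseteq \{1, \dots, N\}$ and time series indexes $B \subseteq \{1, \dots, T\}$ as
$$\hat{\beta}_{A,B} \in \arg\max_{\beta} \max_{\alpha(A)} \max_{\gamma(B)} \sum_{i \in A, \, t \in B} \log f_Y(Y_{it} \mid X_{it}, \alpha_i, \gamma_t, \beta),$$
where $\alpha(A) = \{\alpha_i : i \in A\}$ and $\gamma(B) = \{\gamma_t : t \in B\}$. Let $\tilde{\beta}_{N,T/2}$ be the average of the 2 split jackknife estimators in the subpanels with $A = \{1, \dots, N\}$, and $B = \{1, \dots, \lceil T/2 \rceil\}$ or $B = \{\lfloor T/2 \rfloor + 1, \dots, T\}$, i.e. including all the individuals and leaving out the first and second halves of the time periods. Let $\tilde{\beta}_{N/2,T}$ be the average of the 2 split jackknife estimators in the subpanels with $B = \{1, \dots, T\}$, and $A = \{1, \dots, \lceil N/2 \rceil\}$ or $A = \{\lfloor N/2 \rfloor + 1, \dots, N\}$, i.e. including all the time periods and leaving out half of the individuals of the panel. The bias corrected estimator is
$$\tilde{\beta}^{J}_{NT} = 3 \hat{\beta}_{NT} - \tilde{\beta}_{N,T/2} - \tilde{\beta}_{N/2,T}. \tag{3.4}$$
To give some intuition about how the correction works, note that
$$\tilde{\beta}^{J}_{NT} - \beta^0 = (\hat{\beta}_{NT} - \beta^0) - (\tilde{\beta}_{N,T/2} - \hat{\beta}_{NT}) - (\tilde{\beta}_{N/2,T} - \hat{\beta}_{NT}),$$
where $\tilde{\beta}_{N,T/2} - \hat{\beta}_{NT} = \overline{B}^{\beta}_{\infty}/T + o_P(T^{-1} \vee N^{-1})$ and $\tilde{\beta}_{N/2,T} - \hat{\beta}_{NT} = \overline{D}^{\beta}_{\infty}/N + o_P(T^{-1} \vee N^{-1})$. Relative to $\hat{\beta}_{NT}$, $\tilde{\beta}_{N,T/2}$ has double the bias coming from the estimation of the individual effects because it is based on subpanels with half of the time periods, and $\tilde{\beta}_{N/2,T}$ has double the bias coming from the estimation of the time effects because it is based on subpanels with half of the individuals. The time series split removes the bias term $\overline{B}^{\beta}_{\infty}$ and the cross sectional split removes the bias term $\overline{D}^{\beta}_{\infty}$.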
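The split panel jackknife combination can be sketched generically as below. The helper names are hypothetical, and the estimator plugged in is the closed-form two-way variance estimator (the mean squared residual after removing individual and time means), chosen only because it needs no numerical optimization; any function that returns a fixed effects estimate on a subpanel could be passed instead. The halves follow the ceiling and floor convention, so they overlap in one index when a dimension is odd.

```python
def fe_variance(Y, rows, cols):
    """Two-way fixed effects variance estimator on the subpanel rows x cols:
    mean squared residual after removing row and column means."""
    n, t = len(rows), len(cols)
    sub = [[Y[i][s] for s in cols] for i in rows]
    row_mean = [sum(r) / t for r in sub]
    col_mean = [sum(sub[i][s] for i in range(n)) / n for s in range(t)]
    grand = sum(row_mean) / n
    return sum(
        (sub[i][s] - row_mean[i] - col_mean[s] + grand) ** 2
        for i in range(n)
        for s in range(t)
    ) / (n * t)

def split_panel_jackknife(estimator, Y):
    """Return 3 * full-panel estimate minus the two half-panel averages."""
    N, T = len(Y), len(Y[0])
    all_i, all_t = list(range(N)), list(range(T))
    halves_t = [all_t[: (T + 1) // 2], all_t[T // 2:]]  # first / second half of periods
    halves_i = [all_i[: (N + 1) // 2], all_i[N // 2:]]  # first / second half of individuals
    full = estimator(Y, all_i, all_t)
    time_split = sum(estimator(Y, all_i, h) for h in halves_t) / 2.0
    cross_split = sum(estimator(Y, h, all_t) for h in halves_i) / 2.0
    return 3.0 * full - time_split - cross_split

# Hypothetical 4 x 4 panel.
Y = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0, 2.0],
     [2.0, 0.0, 1.0, 1.0],
     [1.0, 3.0, 2.0, 0.0]]
beta_jackknife = split_panel_jackknife(fe_variance, Y)

# A purely additive panel has zero residual variance in every subpanel.
additive = [[a + g for g in (0.0, 1.0, -2.0, 0.5)] for a in (1.0, -1.0, 0.3, 2.0)]
beta_additive = split_panel_jackknife(fe_variance, additive)
```

Passing the estimator as a function keeps the split logic separate from the model, which is one possible design for reusing the same combination with probit or Poisson fixed effects routines.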
3.3 Illustrative Example
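To illustrate how the bias corrections work in finite samples, we consider a simple model where the solution to the population program (3.1) has closed form. This model corresponds to a variation of the classical Neyman and Scott (1948) variance example that includes both individual and time effects, $Y_{it} \mid \alpha, \gamma, \beta \sim \mathcal{N}(\alpha_i + \gamma_t, \beta)$. It is well known that in this case
$$\hat{\beta}_{NT} = (NT)^{-1} \sum_{i,t} \big(Y_{it} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot t} + \bar{Y}_{\cdot\cdot}\big)^2,$$
where $\bar{Y}_{i\cdot} = T^{-1} \sum_{t} Y_{it}$, $\bar{Y}_{\cdot t} = N^{-1} \sum_{i} Y_{it}$, and $\bar{Y}_{\cdot\cdot} = (NT)^{-1} \sum_{i,t} Y_{it}$. Moreover, from the well-known results on the degrees of freedom adjustment of the estimated variance
$$\overline{\beta}_{NT} = \mathbb{E}_{\phi}[\hat{\beta}_{NT}] = \beta^0 \frac{(N-1)(T-1)}{NT} = \beta^0 \Big(1 - \frac{1}{T} - \frac{1}{N} + \frac{1}{NT}\Big),$$
so that $\overline{B}^{\beta}_{\infty} = -\beta^0$ and $\overline{D}^{\beta}_{\infty} = -\beta^0$.
To form the analytical bias correction we can set $\hat{B}^{\beta}_{NT} = -\hat{\beta}_{NT}$ and $\hat{D}^{\beta}_{NT} = -\hat{\beta}_{NT}$. This yields $\tilde{\beta}^{A}_{NT} = \hat{\beta}_{NT}(1 + 1/T + 1/N)$ with
$$\overline{\beta}^{A}_{NT} = \mathbb{E}_{\phi}[\tilde{\beta}^{A}_{NT}] = \beta^0 \Big(1 - \frac{1}{T^2} - \frac{1}{N^2} - \frac{1}{NT} + \frac{1}{NT^2} + \frac{1}{N^2 T}\Big).$$
This correction reduces the order of the bias from $(T^{-1} \vee N^{-1})$ to $(T^{-2} \vee N^{-2})$ and introduces additional higher order terms. The analytical correction increases finite-sample variance because the factor $(1 + 1/T + 1/N) > 1$. We compare the biases and standard deviations of the fixed effects estimator and the corrected estimator in a numerical example below.
For the jackknife correction, straightforward calculations give
$$\overline{\beta}^{J}_{NT} = \mathbb{E}_{\phi}[\tilde{\beta}^{J}_{NT}] = 3 \overline{\beta}_{NT} - \overline{\beta}_{N,T/2} - \overline{\beta}_{N/2,T} = \beta^0 \Big(1 - \frac{1}{NT}\Big).$$
The correction therefore reduces the order of the bias from $(T^{-1} \vee N^{-1})$ to $(TN)^{-1}$.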
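The closed-form expectations in this example can be checked with exact rational arithmetic. The sketch below uses the degrees-of-freedom formula, under which the expected fixed effects variance estimator equals the true variance times $(N-1)(T-1)/(NT)$, and derives the relative biases of the two corrections from it. The function names are hypothetical, and the jackknife function assumes even $N$ and $T$ so that both half panels have the same size.

```python
from fractions import Fraction

def relative_expectation_fe(N, T):
    """E[beta_hat] / beta0 = (N - 1)(T - 1) / (N T)."""
    return Fraction((N - 1) * (T - 1), N * T)

def relative_bias_fe(N, T):
    return relative_expectation_fe(N, T) - 1

def relative_bias_analytical(N, T):
    # Analytical correction: beta_hat * (1 + 1/T + 1/N)
    factor = 1 + Fraction(1, T) + Fraction(1, N)
    return relative_expectation_fe(N, T) * factor - 1

def relative_bias_jackknife(N, T):
    # Assumes even N and T, so each half panel is N x T/2 or N/2 x T
    half_t = relative_expectation_fe(N, T // 2)
    half_n = relative_expectation_fe(N // 2, T)
    return 3 * relative_expectation_fe(N, T) - half_t - half_n - 1

bias_fe = relative_bias_fe(10, 10)                   # -19/100
bias_analytical = relative_bias_analytical(10, 10)   # -7/250
bias_jackknife = relative_bias_jackknife(10, 10)     # -1/100
```

For $N = T = 10$ the relative bias falls from $-0.19$ to $-0.028$ with the analytical correction and to $-0.01$ with the jackknife, in line with the orders stated above.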
Table 1 presents numerical results for the bias and standard deviations of the fixed effects and bias corrected estimators in finite samples. We consider panels with $N, T \in \{10, 25, 50\}$, and only report the results for $T \le N$ since all the expressions are symmetric in $N$ and $T$. All the numbers in the table are relative to the true parameter value, so we do not need to specify the value of $\beta^0$. We find that the analytical and jackknife corrections offer substantial improvements over the fixed effects estimator in terms of bias. The first and fourth rows of the table show that the bias of the fixed effects estimator is of the same order of magnitude as the standard deviation, where $\mathrm{Var}[\hat{\beta}_{NT}] = 2(N-1)(T-1)(\beta^0)^2/(NT)^2$ under independence of $Y_{it}$ over $i$ and $t$ conditional on the unobserved effects. The fifth row shows that the increase in standard deviation due to the analytical bias correction is small compared to the bias reduction, where $\mathrm{Var}[\tilde{\beta}^{A}_{NT}] = (1 + 1/N + 1/T)^2 \, \mathrm{Var}[\hat{\beta}_{NT}]$. The last row shows that the jackknife yields less precise estimates than the analytical correction when $T = 10$.
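Table 1: Biases and standard deviations of the fixed effects and bias corrected estimators, relative to the true parameter value

                                    N = 10   N = 25            N = 50
                                    T = 10   T = 10   T = 25   T = 10   T = 25   T = 50
Bias, fixed effects                  -.19     -.14     -.08     -.12     -.06     -.04
Bias, analytical correction          -.03     -.02      .00     -.01     -.01      .00
Bias, jackknife correction           -.01      .00      .00      .00      .00      .00
Std. dev., fixed effects              .13      .08      .05      .06      .04      .03
Std. dev., analytical correction      .14      .09      .06      .06      .04      .03
Std. dev., jackknife correction       .17      .10      .06      .07      .04      .03

Notes: The standard deviation of the jackknife correction is obtained by 50,000 simulations.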
Table 2 illustrates the effect of the bias on the inference based on the asymptotic distribution. It shows the coverage probabilities of 95% asymptotic confidence intervals for $\beta^0$ constructed in the usual way as
$$\mathrm{CI}_{.95}(\hat{\beta}) = \hat{\beta} \pm 1.96 \, \hat{V}^{1/2}/\sqrt{NT} = \hat{\beta}\,\big(1 \pm 1.96 \sqrt{2/(NT)}\big),$$
where $\hat{\beta} \in \{\hat{\beta}_{NT}, \tilde{\beta}^{A}_{NT}, \tilde{\beta}^{J}_{NT}\}$ and $\hat{V} = 2\hat{\beta}^2$ is an estimator of the asymptotic variance $\overline{V}_{\infty} = 2(\beta^0)^2$. To find the coverage probabilities, we use that $NT \hat{\beta}_{NT}/\beta^0 \sim \chi^2_{(N-1)(T-1)}$ and $\tilde{\beta}^{A}_{NT} = (1 + 1/N + 1/T)\hat{\beta}_{NT}$. These probabilities do not depend on the value of $\beta^0$ because the limits of the intervals are proportional to $\hat{\beta}$. For the jackknife we compute the probabilities numerically by simulation. As a benchmark of comparison, we also consider confidence intervals constructed from the unbiased estimator $\tilde{\beta}_{NT} = NT \hat{\beta}_{NT}/[(N-1)(T-1)]$. Here we find that the confidence intervals based on the fixed effects estimator display severe undercoverage for all the sample sizes. The confidence intervals based on the corrected estimators have high coverage probabilities, which approach the nominal level as the sample size grows. Moreover, the bias corrected estimators produce confidence intervals with very similar coverage probabilities to the ones from the unbiased estimator.
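Table 2: Coverage probabilities of 95% asymptotic confidence intervals

                                    N = 10   N = 25            N = 50
                                    T = 10   T = 10   T = 25   T = 10   T = 25   T = 50
Fixed effects                         .56      .55      .65      .44      .63      .68
Analytical correction                 .89      .92      .93      .92      .94      .94
Jackknife correction                  .89      .91      .93      .92      .93      .94
Unbiased estimator                    .91      .93      .94      .93      .94      .94

Notes: Nominal coverage probability is .95. Coverage for the jackknife correction is obtained by 50,000 simulations.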
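The coverage calculation for the variance example can also be sketched by Monte Carlo. The snippet assumes normal errors, so that $NT$ times the fixed effects variance estimator divided by the true variance is chi-square with $(N-1)(T-1)$ degrees of freedom, and it assumes intervals of the form estimate $\times\,(1 \pm 1.96\sqrt{2/(NT)})$. The function name, the number of replications and the seed are arbitrary illustrative choices, and the true variance is normalized to one without loss of generality.

```python
import math
import random

def simulated_coverage(N, T, scale, reps=20000, seed=0, z=1.96):
    """Monte Carlo coverage of estimate * (1 +/- z * sqrt(2 / (N T))) for a
    true variance of one, where estimate = scale * beta_hat. Use scale = 1
    for the fixed effects estimator and scale = 1 + 1/N + 1/T for the
    analytical correction."""
    rng = random.Random(seed)
    dof = (N - 1) * (T - 1)
    half_width = z * math.sqrt(2.0 / (N * T))
    hits = 0
    for _ in range(reps):
        # Chi-square(dof) draw via a gamma variate with shape dof/2 and scale 2
        beta_hat = rng.gammavariate(dof / 2.0, 2.0) / (N * T)
        estimate = scale * beta_hat
        if estimate * (1.0 - half_width) <= 1.0 <= estimate * (1.0 + half_width):
            hits += 1
    return hits / reps

cov_fe = simulated_coverage(10, 10, scale=1.0)
cov_analytical = simulated_coverage(10, 10, scale=1.0 + 1.0 / 10 + 1.0 / 10)
cov_unbiased = simulated_coverage(10, 10, scale=100.0 / 81.0)
```

The simulated values vary with the seed and the number of replications, so they should be read as approximations rather than exact coverage probabilities.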
4 Asymptotic Theory for Bias Corrections
In nonlinear panel data models the population problem (3.1) generally does not have a closed-form solution, so we need to rely on asymptotic arguments to characterize the terms in the expansion of the bias (3.2) and to justify the validity of the corrections.
4.1 Asymptotic distribution of model parameters
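We consider panel models with scalar individual and time effects that enter the likelihood function additively through $\pi_{it} = \alpha_i + \gamma_t$. In these models the dimension of the incidental parameters is $\dim \phi_{NT} = N + T$. The leading cases are single index models, where the dependence of the likelihood function on the parameters is through an index $X_{it}'\beta + \alpha_i + \gamma_t$. These models cover the probit and Poisson specifications of Examples 1 and 2. The additive structure only applies to the unobserved effects, so we can allow for scale parameters to cover the Tobit and negative binomial models. We focus on these additive models for computational tractability and because we can establish the consistency of the fixed effects estimators under a concavity assumption in the log-likelihood function with respect to all the parameters.
The parametric part of our panel models takes the form
$$\log f_Y(Y_{it} \mid X_{it}, \alpha_i, \gamma_t, \beta) =: \ell_{it}(\beta, \pi_{it}). \tag{4.1}$$
We denote the derivatives of the log-likelihood function $\ell_{it}$ by $\partial_{\beta} \ell_{it}(\beta, \pi) := \partial \ell_{it}(\beta, \pi)/\partial \beta$, $\partial_{\beta\beta'} \ell_{it}(\beta, \pi) := \partial^2 \ell_{it}(\beta, \pi)/(\partial \beta \, \partial \beta')$, $\partial_{\pi^q} \ell_{it}(\beta, \pi) := \partial^q \ell_{it}(\beta, \pi)/\partial \pi^q$, $q = 1, 2, 3$, etc. We drop the arguments $\beta$ and $\pi$ when the derivatives are evaluated at the true parameters $\beta^0$ and $\pi^0_{it} := \alpha^0_i + \gamma^0_t$, e.g. $\partial_{\pi^q} \ell_{it} := \partial_{\pi^q} \ell_{it}(\beta^0, \pi^0_{it})$. We also drop the dependence on $NT$ from all the sequences of functions and parameters, e.g. we use $\mathcal{L}$ for $\mathcal{L}_{NT}$ and $\phi$ for $\phi_{NT}$.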
We make the following assumptions:
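Assumption 4.1 (Panel models).
Let $\nu > 0$ and $\mu > 4(8 + \nu)/\nu$. Let $\varepsilon > 0$ and let $\mathcal{B}^0_{\varepsilon}$ be a subset of $\mathbb{R}^{\dim \beta + 1}$ that contains an $\varepsilon$-neighbourhood of $(\beta^0, \pi^0_{it})$ for all $i, t, N, T$.

(i) Asymptotics: we consider limits of sequences where $N/T \to \kappa^2$, $0 < \kappa < \infty$, as $N, T \to \infty$.

(ii) Sampling: conditional on $\phi$, $\{(Y_i^T, X_i^T) : 1 \le i \le N\}$ is independent across $i$ and, for each $i$, $\{(Y_{it}, X_{it}) : 1 \le t \le T\}$ is $\alpha$-mixing with mixing coefficients satisfying $\sup_i a_i(m) = O(m^{-\mu})$ as $m \to \infty$, where
$$a_i(m) := \sup_t \sup_{A \in \mathcal{A}^i_t, \, B \in \mathcal{B}^i_{t+m}} |P(A \cap B) - P(A)P(B)|,$$
and for $Z_{it} = (Y_{it}, X_{it})$, $\mathcal{A}^i_t$ is the sigma field generated by $(Z_{it}, Z_{i,t-1}, \dots)$, and $\mathcal{B}^i_t$ is the sigma field generated by $(Z_{it}, Z_{i,t+1}, \dots)$.

(iii) Model: for $X_i^t = \{X_{is} : s = 1, \dots, t\}$, we assume that for all $i, t, N, T$,
$$Y_{it} \mid X_i^t, \phi, \beta \sim \exp[\ell_{it}(\beta, \alpha_i + \gamma_t)].$$
The realizations of the parameters and unobserved effects that generate the observed data are denoted by $\beta^0$ and $\phi^0$.

(iv) Smoothness and moments: We assume that $(\beta, \pi) \mapsto \ell_{it}(\beta, \pi)$ is four times continuously differentiable over $\mathcal{B}^0_{\varepsilon}$ a.s. The partial derivatives of $\ell_{it}(\beta, \pi)$ with respect to the elements of $(\beta, \pi)$ up to fourth order are bounded in absolute value uniformly over $(\beta, \pi) \in \mathcal{B}^0_{\varepsilon}$ by a function $M(Z_{it}) > 0$ a.s., and $\max_{i,t} \mathbb{E}_{\phi}[M(Z_{it})^{8+\nu}]$ is a.s. uniformly bounded over $N, T$.

(v) Concavity: For all $N, T$, $(\beta, \phi) \mapsto \mathcal{L}_{NT}(\beta, \phi)$ is strictly concave over $\mathbb{R}^{\dim \beta + N + T}$ a.s. Furthermore, there exist constants $b_{\min}$ and $b_{\max}$ such that for all $(\beta, \pi) \in \mathcal{B}^0_{\varepsilon}$, $0 < b_{\min} \le -\mathbb{E}_{\phi}[\partial_{\pi^2} \ell_{it}(\beta, \pi)] \le b_{\max}$ a.s. uniformly over $i, t, N, T$.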
Remark 1 (Assumption 4.1).
Assumption 4.1(i) defines the large-$T$ asymptotic framework and is the same as in Hahn and Kuersteiner (2011). The relative rate of $N$ and $T$ exactly balances the order of the bias and variance, producing a non-degenerate asymptotic distribution.
Assumption 4.1(ii) does not impose identical distribution nor stationarity over the time series dimension, conditional on the unobserved effects, unlike most of the large-$T$ panel literature, e.g., Hahn and Newey (2004) and Hahn and Kuersteiner (2011). These assumptions are violated by the presence of the time effects, because they are treated as parameters. The mixing condition is used to bound covariances and moments in the application of laws of large numbers and central limit theorems; it could be replaced by other conditions that guarantee the applicability of these results.
Assumption 4.1(iii) is the parametric part of the panel model. We rely on this assumption to guarantee that $\partial_{\beta} \ell_{it}$ and $\partial_{\pi} \ell_{it}$ have martingale difference properties. Moreover, we use certain Bartlett identities implied by this assumption to simplify some expressions, but those simplifications are not crucial for our results. We provide expressions for the asymptotic bias and variance that do not apply these simplifications in Remark 3 below.
Assumption 4.1(iv) imposes smoothness and moment conditions on the log-likelihood function and its derivatives. These conditions guarantee that the higher-order stochastic expansions of the fixed effects estimator that we use to characterize the asymptotic bias are well defined, and that the remainder terms of these expansions are bounded.
The most commonly used nonlinear models in applied economics such as logit, probit, ordered probit, Poisson, and Tobit models have smooth log-likelihood functions that satisfy the concavity condition of Assumption 4.1(v), provided that all the elements of $X_{it}$ have cross sectional and time series variation. Assumption 4.1(v) guarantees that $\beta^0$ and $\phi^0$ are the unique solution to the population problem (2.1), that is, all the parameters are point identified.
To describe the asymptotic distribution of the fixed effects estimator $\hat{\beta}$ it is convenient to introduce some additional notation. Let $\overline{\mathcal{H}}$ be the $(N+T) \times (N+T)$ expected Hessian matrix of the log-likelihood with respect to the nuisance parameters evaluated at the true parameters, i.e.