Individual and Time Effects in Nonlinear Panel Models with Large $N$, $T$

Abstract

We derive fixed effects estimators of parameters and average partial effects in (possibly dynamic) nonlinear panel data models with individual and time effects. They cover logit, probit, ordered probit, Poisson and Tobit models that are important for many empirical applications in micro and macroeconomics. Our estimators use analytical and jackknife bias corrections to deal with the incidental parameter problem, and are asymptotically unbiased under asymptotic sequences where $N/T$ converges to a constant. We develop inference methods and show that they perform well in numerical examples.

Abstract

This supplemental material contains five appendices. Appendix S.1 presents the results of an empirical application and a Monte Carlo simulation calibrated to the application. Following Aghion et al. (2005), we use a panel of U.K. industries to estimate Poisson models with industry and time effects for the relationship between innovation and competition. Appendix S.2 gives the proofs of Theorems 4.3 and 4.4. Appendices S.3, S.4, and S.5 contain the proofs of Appendices B, C, and D, respectively. Appendix S.6 collects some useful intermediate results that are used in the proofs of the main results.


Keywords: Panel data, nonlinear model, dynamic model, asymptotic bias correction, fixed effects, time effects.
JEL: C13, C23.

1 Introduction

Fixed effects estimators of nonlinear panel data models can be severely biased because of the incidental parameter problem (Neyman and Scott, 1948). A growing literature, surveyed in Arellano and Hahn (2007), shows that the leading term of an asymptotic expansion of the bias, as both the cross-sectional dimension $N$ and the time series dimension $T$ of the panel grow, can be characterized and corrected for. In models with individual effects, the leading bias term is of order $1/T$ and comes from the estimation of the individual effects. This result, however, does not apply to models with individual and time effects, where both of these effects are treated as parameters to be estimated. In this paper we show that the estimation of the time effects causes an additional incidental parameter bias of order $1/N$. Thus, if $N$ and $T$ are similarly large, the bias produced by the estimation of the time effects is of similar order of magnitude to the bias produced by the estimation of the individual effects, and both biases need to be corrected. We provide the corresponding analytical and jackknife bias corrections.

The asymptotic approximation to the fixed effects estimators that lets the two dimensions of the panel grow with the sample size is motivated by the recent availability of long panels and other large pseudo-panel data structures where the indexes might not correspond to individuals and time periods. Examples of these datasets include traditional microeconomic panel surveys with a long history of data such as the PSID and NLSY, international cross-country panels such as the Penn World Table, U.S. state level panels over time such as the CPS, and square pseudo-panels of trade flows across countries such as Feenstra's World Trade Flows and CEPII, where the indexes correspond to the same countries indexed as importers and exporters.

We focus on semi-parametric models with log-likelihood functions that are concave in all parameters, and where each individual effect $\alpha_i$ and time effect $\gamma_t$ enter the log-likelihood for observation $(i,t)$ additively as $\alpha_i + \gamma_t$. This is the most common specification for the individual and time effects in linear models and is also a natural specification in the nonlinear models that we consider. Imposing concavity of the log-likelihood function greatly facilitates showing consistency in our setting where the dimension of the parameter space grows with the sample size. The most popular limited dependent variable models, including logit, probit, ordered probit, Tobit and Poisson models, have concave log-likelihood functions, possibly after reparametrization (Olsen, 1978; Pratt, 1981). We note here that the general expansion that we derive in Appendix B does not impose additivity and concavity, but we use these restrictions to apply the expansion to fixed effects estimators. The models that we consider are semi-parametric because the joint distribution of the explanatory variables and the unobserved effects is left unspecified. The explanatory variables can be either strictly exogenous or predetermined.

We derive bias expansions and corrections for fixed effects estimators of common parameters $\beta$ and average partial effects (APEs). The vector $\beta$ includes all the unknown parameters that enter the log-likelihood function other than the individual and time effects, such as index coefficients in a probit model. The APEs are functions of the data, the common parameters, and the individual and time effects in nonlinear models. We find that the properties of the fixed effects estimators of $\beta$ and the APEs are different. For $\beta$, the order of the bias is $1/T + 1/N$, which is of the same order as the rate of convergence, $1/\sqrt{NT}$, under sequences where $N/T$ converges to a constant. For the APEs, we uncover that the incidental parameter problem is negligible asymptotically because the order of the bias is smaller than the rate of convergence, which is slower than the $1/\sqrt{NT}$ rate for model parameters. To the best of our knowledge, this rate result is new for fixed effects estimators of average partial effects in nonlinear panel models with individual and time effects. In numerical examples we find that the bias corrections, while not necessary to center the asymptotic distribution of APE estimators, do improve their finite-sample properties, especially in dynamic models.

The bias correction eliminates the bias terms of orders $T^{-1}$ and $N^{-1}$ from the fixed effects estimators. We consider two methods to implement the correction: an analytical bias correction similar to Hahn and Newey (2004) and Hahn and Kuersteiner (2011), and a suitable modification of the split panel jackknife of Dhaene and Jochmans (2015). However, the theory of the previous papers does not cover the models that we consider, because, in addition to not allowing for time effects, it assumes either identical distribution or stationarity over time for the processes of the observed variables, conditional on the unobserved effects. These assumptions are violated in our models due to the presence of the time effects, so we need to adjust the asymptotic theory accordingly. The individual and time effects introduce strong correlation in both dimensions of the panel. Conditional on the unobserved effects, we impose cross-sectional independence and weak time-serial dependence, and we allow for heterogeneity in both dimensions.

Simulation evidence indicates that our corrections improve the estimation and inference performance of the fixed effects estimators of parameters and average effects. The analytical corrections dominate the jackknife corrections in a probit model for sample sizes that are relevant for empirical practice. In the online supplement, Fernández-Val and Weidner (2015b), we illustrate the corrections with an empirical application on the relationship between competition and innovation using a panel of U.K. industries, following Aghion, Bloom, Blundell, Griffith and Howitt (2005). We find that the inverted-U pattern relationship found by Aghion et al. is robust to relaxing the strict exogeneity assumption of competition with respect to the innovation process and to the inclusion of innovation dynamics. We also uncover substantial state dependence in the innovation process.

Literature review. The Neyman and Scott incidental parameter problem has been extensively discussed in the econometric literature; see, for example, Heckman (1981), Lancaster (2000), and Greene (2004). There is also a vast literature that shows how to tackle the problem in specific models under asymptotic sequences where $T$ is fixed and $N$ grows to infinity. However, there are results, e.g. from Honoré and Tamer (2006), Chamberlain (2010), and Chernozhukov, Fernández-Val, Hahn and Newey (2013), showing that model parameters and APEs are not point identified in important nonlinear panel data models under fixed-$T$ asymptotic sequences, implying that no fixed-$T$ consistent point estimators exist in these models.

A recent response to the incidental parameter problem is to adopt an alternative asymptotic approximation where both $N$ and $T$ grow with the sample size. Under these large-$T$ sequences, the fixed effects estimator is consistent but has bias in the asymptotic distribution. This asymptotic bias is the large-$T$ version of the incidental parameter problem and has motivated the development of bias corrections. Examples of papers that use this approximation include Phillips and Moon (1999), Hahn and Kuersteiner (2002), Lancaster (2002), Woutersen (2002), Alvarez and Arellano (2003), Hahn and Newey (2004), Carro (2007), Arellano and Bonhomme (2009), Fernández-Val (2009), Hahn and Kuersteiner (2011), Fernández-Val and Vella (2011), and Kato, Galvao and Montes-Rojas (2012). This previous work, however, does not cover models with time effects. Our contribution to this literature is to extend the large-$T$ bias corrections to models with two-way unobserved effects such as the individual and time effects commonly included in linear models.

The large-$T$ panel literature on models with both individual and time effects is sparse. Pesaran (2006), Bai (2009), and Moon and Weidner (2015a, 2015b) study linear regression models with interactive individual and time fixed effects. The fixed effects estimators in these models also have asymptotic bias, but the methods used to derive this bias rely on linearity and therefore cannot be applied to the nonlinear models that we consider. Hahn and Moon (2006) consider bias corrected fixed effects estimators in panel linear autoregressive models with additive individual and time effects. Regarding nonlinear models, there is independent and contemporaneous work by Charbonneau (2011, 2014), which extends the conditional fixed effects estimators to logit and Poisson models with individual and time effects. She differences out the individual and time effects by conditioning on sufficient statistics. The conditional approach completely eliminates the asymptotic bias coming from the estimation of the incidental parameters, but it does not permit estimation of average partial effects and has not been developed for models with predetermined regressors. We instead consider estimators of model parameters and average partial effects in nonlinear models with predetermined regressors. The two approaches can therefore be considered as complementary.

Outline of the paper. The rest of the paper is organized as follows. Section 2 introduces the model and fixed effects estimators. Section 3 describes the bias corrections to deal with the incidental parameters problem and illustrates how the bias corrections work through an example. Section 4 provides the asymptotic theory. Section 5 presents Monte Carlo results. The Appendix collects the proofs of the main results, and an online supplement to the paper contains additional technical derivations, numerical examples, and an empirical application (Fernández-Val and Weidner, 2015b).

2 Model and Estimators

2.1 Model

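The data consist of $N \times T$ observations $\{(Y_{it}, X_{it}') : 1 \le i \le N, 1 \le t \le T\}$, for a scalar outcome variable of interest $Y_{it}$ and a vector of explanatory variables $X_{it}$. We assume that the outcome for individual $i$ at time $t$ is generated by the sequential process:

$$Y_{it} \mid X_i^t, \alpha, \gamma, \beta \sim f_Y(\cdot \mid X_{it}, \alpha_i, \gamma_t, \beta), \qquad i = 1, \ldots, N; \; t = 1, \ldots, T,$$

where $X_i^t = (X_{i1}, \ldots, X_{it})$, $\alpha = (\alpha_1, \ldots, \alpha_N)$, $\gamma = (\gamma_1, \ldots, \gamma_T)$, $f_Y$ is a known probability function, and $\beta$ is a finite dimensional parameter vector. The variables $\alpha_i$ and $\gamma_t$ are unobserved individual and time effects that in economic applications capture individual heterogeneity and aggregate shocks, respectively. The model is semiparametric because we do not specify the distribution of these effects nor their relationship with the explanatory variables. The conditional distribution $f_Y$ represents the parametric part of the model. The vector $X_{it}$ contains predetermined variables with respect to $Y_{it}$. Note that $X_{it}$ can include lags of $Y_{it}$ to accommodate dynamic models.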

We consider two running examples throughout the analysis:

Example 1 (Binary response model).

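Let $Y_{it}$ be a binary outcome and $F$ be a cumulative distribution function, e.g. the standard normal or standard logistic distribution. We can model the conditional distribution of $Y_{it}$ using the single-index specification with individual and time effects

$$f_Y(y \mid X_{it}, \alpha_i, \gamma_t, \beta) = F(X_{it}'\beta + \alpha_i + \gamma_t)^{y}\,[1 - F(X_{it}'\beta + \alpha_i + \gamma_t)]^{1-y}, \qquad y \in \{0, 1\}.$$

In a labor economics application, $Y_{it}$ can be an indicator for female labor force participation and $X_{it}$ can include fertility indicators and other socio-economic characteristics.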

Example 2 (Poisson model).

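Let $Y_{it}$ be a non-negative integer-valued outcome, and $f(\cdot; \lambda)$ be the probability mass function of a Poisson random variable with mean $\lambda > 0$. We can model the conditional distribution of $Y_{it}$ using the single index specification with individual and time effects

$$f_Y(y \mid X_{it}, \alpha_i, \gamma_t, \beta) = f(y; \exp[X_{it}'\beta + \alpha_i + \gamma_t]), \qquad y \in \{0, 1, 2, \ldots\}.$$

In an industrial organization application, $Y_{it}$ can be the number of patents that a firm produces and $X_{it}$ can include investment in R&D and other firm characteristics.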
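The two running examples can be made concrete with a small data-generating sketch. The function name `simulate_panel`, the single strictly exogenous regressor, and the normal distributions for the effects are illustrative assumptions, not part of the paper.

```python
import numpy as np

def simulate_panel(N, T, beta, model="probit", seed=0):
    """Draw (Y, X, alpha, gamma) from a single-index model with additive effects.

    Illustrative assumptions: one N(0, 1) regressor and N(0, 0.25)
    individual and time effects.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(scale=0.5, size=N)   # individual effects alpha_i
    gamma = rng.normal(scale=0.5, size=T)   # time effects gamma_t
    X = rng.normal(size=(N, T))             # scalar regressor X_it
    index = beta * X + alpha[:, None] + gamma[None, :]
    if model == "probit":
        # Y_it = 1{index + eps > 0} with eps ~ N(0, 1), so P(Y = 1) = Phi(index)
        Y = (index + rng.normal(size=(N, T)) > 0).astype(int)
    elif model == "poisson":
        # Y_it ~ Poisson with mean exp(index)
        Y = rng.poisson(np.exp(index))
    else:
        raise ValueError("model must be 'probit' or 'poisson'")
    return Y, X, alpha, gamma

Y_bin, X_bin, _, _ = simulate_panel(50, 20, beta=1.0, model="probit")
Y_cnt, X_cnt, _, _ = simulate_panel(50, 20, beta=0.5, model="poisson")
```

Both outcomes depend on the effects only through the sum of the individual and time effect, which is the additive structure used throughout.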

For estimation, we adopt a fixed effects approach, treating the realization of the unobserved individual and time effects as parameters to be estimated. We collect all these effects in the vector $\phi_{NT} = (\alpha_1, \ldots, \alpha_N, \gamma_1, \ldots, \gamma_T)'$. The model parameter $\beta$ usually includes regression coefficients of interest, while the vector $\phi_{NT}$ is treated as a nuisance parameter. The true values of the parameters, denoted by $\beta^0$ and $\phi^0_{NT} = (\alpha^{0\prime}, \gamma^{0\prime})'$, are the solution to the population conditional maximum likelihood problem

$$\max_{(\beta, \phi_{NT}) \in \mathbb{R}^{\dim\beta + \dim\phi_{NT}}} \mathbb{E}_\phi[\mathcal{L}_{NT}(\beta, \phi_{NT})], \qquad \mathcal{L}_{NT}(\beta, \phi_{NT}) := (NT)^{-1/2}\Big\{\sum_{i,t} \log f_Y(Y_{it} \mid X_{it}, \alpha_i, \gamma_t, \beta) - b\,(v_{NT}'\phi_{NT})^2/2\Big\}, \tag{2.1}$$

for every $N, T$, where $\mathbb{E}_\phi$ denotes the expectation with respect to the distribution of the data conditional on the unobserved effects and initial conditions including strictly exogenous variables, $b > 0$ is an arbitrary constant, $v_{NT} = (1_N', -1_T')'$, and $1_N$ and $1_T$ denote vectors of ones with dimensions $N$ and $T$. Existence and uniqueness of the solution to the population problem will be guaranteed by our assumptions in Section 4 below, including concavity of the objective function in all parameters. The second term of $\mathcal{L}_{NT}$ is a penalty that imposes a normalization needed to identify $\phi_{NT}$ in models with scalar individual and time effects that enter additively into the log-likelihood function as $\alpha_i + \gamma_t$. In this case, adding a constant to all $\alpha_i$, while subtracting it from all $\gamma_t$, does not change $\alpha_i + \gamma_t$. To eliminate this ambiguity, we normalize $\phi^0_{NT}$ to satisfy $v_{NT}'\phi^0_{NT} = 0$, i.e. $\sum_i \alpha^0_i = \sum_t \gamma^0_t$. The penalty produces a maximizer of $\mathcal{L}_{NT}$ that is automatically normalized. We could equivalently impose $v_{NT}'\phi_{NT} = 0$ as a constraint, but for technical reasons we prefer to work with an unconstrained optimization problem. There are other possible normalizations for $\phi_{NT}$, such as $\alpha_1 = 0$. The model parameter $\beta$ is invariant to the choice of normalization, that is, our asymptotic results on the estimator for $\beta$ are independent of this choice of normalization. Our choice is convenient for certain intermediate results that involve the incidental parameter $\phi_{NT}$, its score vector and its Hessian matrix. The pre-factor $(NT)^{-1/2}$ in $\mathcal{L}_{NT}$ is just a rescaling.
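The normalization argument can be checked numerically. The sketch below assumes the normalization equates the sum of the individual effects with the sum of the time effects; `normalize_effects` is a hypothetical helper name. It shifts a constant between the two sets of effects and confirms that every sum of an individual and a time effect is unchanged.

```python
import numpy as np

def normalize_effects(alpha, gamma):
    """Shift effects so that sum(alpha) == sum(gamma), leaving alpha_i + gamma_t fixed."""
    N, T = len(alpha), len(gamma)
    # Moving c from every alpha_i to every gamma_t changes
    # sum(alpha) - sum(gamma) by -(N + T) * c, so this c sets it to zero.
    c = (alpha.sum() - gamma.sum()) / (N + T)
    return alpha - c, gamma + c

rng = np.random.default_rng(0)
alpha, gamma = rng.normal(size=6), rng.normal(size=4)
alpha_n, gamma_n = normalize_effects(alpha, gamma)

pi_before = alpha[:, None] + gamma[None, :]      # alpha_i + gamma_t before
pi_after = alpha_n[:, None] + gamma_n[None, :]   # and after normalization
```

Because the likelihood depends on the effects only through these sums, any quantity that is a function of the sums is unaffected by which normalization is chosen.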

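Other quantities of interest involve averages over the data and unobserved effects

$$\delta^0_{NT} = \mathbb{E}[\Delta_{NT}(\beta^0, \phi^0_{NT})], \qquad \Delta_{NT}(\beta, \phi_{NT}) = (NT)^{-1}\sum_{i,t} \Delta(X_{it}, \beta, \alpha_i, \gamma_t), \tag{2.2}$$

where $\mathbb{E}$ denotes the expectation with respect to the joint distribution of the data and the unobserved effects, provided that the expectation exists. $\delta^0_{NT}$ is indexed by $N$ and $T$ because the marginal distribution of $\{(X_{it}, \alpha_i, \gamma_t) : 1 \le i \le N, 1 \le t \le T\}$ can be heterogeneous across $i$ and/or $t$; see Section 4.2. These averages include average partial effects (APEs), which are often the ultimate quantities of interest in nonlinear models. The APEs are invariant to the choice of normalization for $\phi_{NT}$ if $\alpha_i$ and $\gamma_t$ enter $\Delta(X_{it}, \beta, \alpha_i, \gamma_t)$ as $\alpha_i + \gamma_t$. Some examples of partial effects that satisfy this condition are the following: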

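Example 1 (Binary response model). If $X_{it,k}$, the $k$th element of $X_{it}$, is binary, its partial effect on the conditional probability of $Y_{it}$ is

$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = F(\beta_k + X_{it,-k}'\beta_{-k} + \alpha_i + \gamma_t) - F(X_{it,-k}'\beta_{-k} + \alpha_i + \gamma_t), \tag{2.3}$$

where $\beta_k$ is the $k$th element of $\beta$, and $X_{it,-k}$ and $\beta_{-k}$ include all elements of $X_{it}$ and $\beta$ except for the $k$th element. If $X_{it,k}$ is continuous and $F$ is differentiable, the partial effect of $X_{it,k}$ on the conditional probability of $Y_{it}$ is

$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = \beta_k\,\partial F(X_{it}'\beta + \alpha_i + \gamma_t), \tag{2.4}$$

where $\partial F$ is the derivative of $F$.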

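Example 2 (Poisson model). If $X_{it}$ includes $Z_{it}$ and some known transformation $H(Z_{it})$ with coefficients $\beta_k$ and $\beta_j$, the partial effect of $Z_{it}$ on the conditional expectation of $Y_{it}$ is

$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = [\beta_k + \beta_j\,\partial H(Z_{it})]\exp(X_{it}'\beta + \alpha_i + \gamma_t). \tag{2.5}$$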

2.2 Fixed effects estimators

We estimate the parameters by solving the sample analog of problem (2.1), i.e.

$$\max_{(\beta, \phi_{NT}) \in \mathbb{R}^{\dim\beta + \dim\phi_{NT}}} \mathcal{L}_{NT}(\beta, \phi_{NT}). \tag{2.6}$$

As in the population case, we shall impose conditions guaranteeing that the solution to this maximization problem exists and is unique with probability approaching one as $N$ and $T$ become large. For computational purposes, we note that the solution to the program (2.6) for $\beta$ is the same as the solution to the program that imposes $v_{NT}'\phi_{NT} = 0$ directly as a constraint in the optimization, and is invariant to the normalization. In our numerical examples we impose either $\alpha_1 = 0$ or $\gamma_1 = 0$ directly by dropping the first individual or time effect. This constrained program has good computational properties because its objective function is concave and smooth in all the parameters. We have developed the commands probitfe and logitfe in Stata to implement the methods of the paper for probit and logit models (Cruz-González et al., 2015). When $N$ and $T$ are large, we recommend the use of optimization routines that exploit the sparsity of the design matrix of the model to speed up computation, such as the package Speedglm in R (Enea, 2012). For a probit model of this size, Speedglm computes the fixed effects estimator in less than 2 minutes with a 2 x 2.66 GHz 6-Core Intel Xeon processor, more than 7.5 times faster than our Stata command probitfe and more than 30 times faster than the R command glm.
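To make the dummy-variable formulation concrete, here is a minimal sketch of a fixed effects Poisson estimator that drops the first time effect and maximizes the concave log-likelihood by Newton steps. It uses a dense design matrix for readability, so it does not exploit sparsity and is only suitable for small panels; `poisson_fe` and all settings are hypothetical and are not the probitfe, logitfe, or Speedglm implementations.

```python
import numpy as np

def poisson_fe(Y, X, max_iter=100, tol=1e-10):
    """Fixed effects Poisson MLE with individual and time dummies (gamma_1 = 0).

    Y has shape (N, T); X has shape (N, T, K).
    """
    N, T, K = X.shape
    y = Y.reshape(N * T).astype(float)
    i_idx = np.repeat(np.arange(N), T)        # individual index of each row
    t_idx = np.tile(np.arange(T), N)          # time index of each row
    D = np.zeros((N * T, K + N + T - 1))
    D[:, :K] = X.reshape(N * T, K)
    D[np.arange(N * T), K + i_idx] = 1.0      # N individual dummies
    later = t_idx > 0                         # first time dummy is dropped
    D[np.flatnonzero(later), K + N + t_idx[later] - 1] = 1.0

    def loglik(theta):
        eta = D @ theta
        return y @ eta - np.exp(eta).sum()    # log-likelihood up to a constant

    theta = np.zeros(D.shape[1])
    theta[K:K + N] = np.log(y.mean())         # start from a common intercept
    for _ in range(max_iter):
        mu = np.exp(D @ theta)
        score = D.T @ (y - mu)
        neg_hessian = D.T @ (D * mu[:, None])
        step = np.linalg.solve(neg_hessian, score)
        s = 1.0
        while loglik(theta + s * step) < loglik(theta) and s > 1e-8:
            s /= 2                            # step halving keeps the ascent monotone
        theta = theta + s * step
        if np.max(np.abs(s * step)) < tol:
            break
    beta = theta[:K]
    alpha = theta[K:K + N]
    gamma = np.concatenate(([0.0], theta[K + N:]))
    return beta, alpha, gamma

# Small simulated panel with true coefficient 0.5 (illustrative values).
rng = np.random.default_rng(1)
N, T = 40, 20
alpha0 = rng.normal(scale=0.3, size=N)
gamma0 = rng.normal(scale=0.3, size=T)
X = rng.normal(size=(N, T, 1))
Y = rng.poisson(np.exp(0.5 * X[..., 0] + alpha0[:, None] + gamma0[None, :]))
beta_hat, alpha_hat, gamma_hat = poisson_fe(Y, X)
```

Dropping one time dummy is one way to impose a normalization; the estimate of the common coefficient does not depend on which normalization is used.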

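To analyze the statistical properties of the estimator of $\beta$ it is convenient to first concentrate out the nuisance parameter $\phi_{NT}$. For given $\beta$, we define the optimal $\hat\phi_{NT}(\beta)$ as

$$\hat\phi_{NT}(\beta) = \operatorname*{argmax}_{\phi_{NT} \in \mathbb{R}^{\dim\phi_{NT}}} \mathcal{L}_{NT}(\beta, \phi_{NT}). \tag{2.7}$$

The fixed effects estimators of $\beta^0$ and $\phi^0_{NT}$ are

$$\hat\beta_{NT} = \operatorname*{argmax}_{\beta \in \mathbb{R}^{\dim\beta}} \mathcal{L}_{NT}(\beta, \hat\phi_{NT}(\beta)), \qquad \hat\phi_{NT} = \hat\phi_{NT}(\hat\beta_{NT}). \tag{2.8}$$

Estimators of APEs can be formed by plugging-in the estimators of the model parameters in the sample version of (2.2), i.e.

$$\hat\delta_{NT} = \Delta_{NT}(\hat\beta_{NT}, \hat\phi_{NT}). \tag{2.9}$$

Again, $\hat\delta_{NT}$ is invariant to the normalization chosen for $\phi_{NT}$ if $\alpha_i$ and $\gamma_t$ enter $\Delta(X_{it}, \beta, \alpha_i, \gamma_t)$ as $\alpha_i + \gamma_t$.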
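The plug-in construction of APEs can be sketched for a probit model as follows. The functions take parameter values as inputs (in practice, fixed effects estimates would be plugged in); the names `ape_continuous` and `ape_binary` and the simulated inputs are illustrative assumptions. The sketch averages the partial effect of a continuous regressor and of a binary regressor over all observations, and then checks invariance to shifting a constant between the individual and time effects.

```python
import numpy as np
from math import erf, sqrt, pi

_erf = np.vectorize(erf)

def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z) / sqrt(2.0)))

def norm_pdf(z):
    return np.exp(-0.5 * np.asarray(z) ** 2) / sqrt(2.0 * pi)

def probit_index(X, beta, alpha, gamma):
    # X has shape (N, T, K); the result has shape (N, T)
    return X @ beta + alpha[:, None] + gamma[None, :]

def ape_continuous(X, beta, alpha, gamma, k):
    """Average of beta_k * pdf(index) over all (i, t)."""
    return beta[k] * norm_pdf(probit_index(X, beta, alpha, gamma)).mean()

def ape_binary(X, beta, alpha, gamma, k):
    """Average of cdf(index with X_k = 1) - cdf(index with X_k = 0)."""
    X1, X0 = X.copy(), X.copy()
    X1[..., k], X0[..., k] = 1.0, 0.0
    p1 = norm_cdf(probit_index(X1, beta, alpha, gamma))
    p0 = norm_cdf(probit_index(X0, beta, alpha, gamma))
    return (p1 - p0).mean()

rng = np.random.default_rng(0)
N, T = 30, 15
X = np.empty((N, T, 2))
X[..., 0] = rng.normal(size=(N, T))                 # continuous regressor
X[..., 1] = rng.integers(0, 2, size=(N, T))         # binary regressor
beta = np.array([0.5, 1.0])
alpha, gamma = rng.normal(size=N), rng.normal(size=T)

ape_c = ape_continuous(X, beta, alpha, gamma, k=0)
ape_b = ape_binary(X, beta, alpha, gamma, k=1)
# Same APEs after moving a constant from gamma to alpha:
ape_c_shift = ape_continuous(X, beta, alpha + 3.0, gamma - 3.0, k=0)
ape_b_shift = ape_binary(X, beta, alpha + 3.0, gamma - 3.0, k=1)
```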

3 Incidental parameter problem and bias corrections

In this section we give a heuristic discussion of the main results, leaving the technical details to Section 4. We illustrate the analysis with numerical calculations based on a variation of the classical Neyman and Scott (1948) variance example.

3.1 Incidental parameter problem

Fixed effects estimators in nonlinear models suffer from the incidental parameter problem (Neyman and Scott, 1948). The source of the problem is that the dimension of the nuisance parameter $\phi_{NT}$ increases with the sample size under asymptotic approximations where either $N$ or $T$ pass to infinity. To describe the problem let

$$\overline\beta_{NT} := \operatorname*{argmax}_{\beta \in \mathbb{R}^{\dim\beta}} \mathbb{E}_\phi\big[\mathcal{L}_{NT}(\beta, \hat\phi_{NT}(\beta))\big]. \tag{3.1}$$

The fixed effects estimator is inconsistent under the traditional Neyman and Scott asymptotic sequences where $N \to \infty$ and $T$ is fixed, i.e., $\operatorname{plim}_{N\to\infty} \overline\beta_{NT} \neq \beta^0$. Similarly, the fixed effects estimator is inconsistent under asymptotic sequences where $T \to \infty$ and $N$ is fixed, i.e., $\operatorname{plim}_{T\to\infty} \overline\beta_{NT} \neq \beta^0$. Note that $\overline\beta_{NT} = \beta^0$ if $\hat\phi_{NT}(\beta)$ is replaced by $\phi_{NT}(\beta) = \operatorname{argmax}_{\phi_{NT}} \mathbb{E}_\phi[\mathcal{L}_{NT}(\beta, \phi_{NT})]$. Under asymptotic approximations where either $N$ or $T$ are fixed, there is only a fixed number of observations to estimate some of the components of $\phi_{NT}$, $T$ for each individual effect or $N$ for each time effect, rendering the estimator $\hat\phi_{NT}(\beta)$ inconsistent for $\phi_{NT}(\beta)$. The nonlinearity of the model propagates the inconsistency to the estimator of $\beta$.

A key insight of the large-$T$ panel data literature is that the incidental parameter problem becomes an asymptotic bias problem under an asymptotic approximation where $N \to \infty$ and $T \to \infty$ (e.g., Arellano and Hahn, 2007). For models with only individual effects, this literature derived the expansion $\overline\beta_{NT} = \beta^0 + B/T + o_P(T^{-1})$ as $N, T \to \infty$, for some constant $B$. The fixed effects estimator is consistent because $\operatorname{plim}_{N,T\to\infty} \overline\beta_{NT} = \beta^0$, but has bias in the asymptotic distribution if $B/T$ is not negligible relative to $1/\sqrt{NT}$, the order of the standard deviation of the estimator. This asymptotic bias problem, however, is easier to correct than the inconsistency problem that arises under the traditional Neyman and Scott asymptotic approximation. We show that the same insight still applies to models with individual and time effects, but with a different expansion for $\overline\beta_{NT}$. We characterize the expansion and develop bias corrections.

3.2 Bias Expansions and Bias Corrections

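Some expansions can be used to explain our corrections. For smooth likelihoods and under appropriate regularity conditions, as $N, T \to \infty$,

$$\overline\beta_{NT} = \beta^0 + \overline B^\beta_\infty/T + \overline D^\beta_\infty/N + o_P(T^{-1} \vee N^{-1}), \tag{3.2}$$

for some $\overline B^\beta_\infty$ and $\overline D^\beta_\infty$ that we characterize in Theorem 4.1 and explain in Remark 2, where $a \vee b := \max(a, b)$. Unlike in nonlinear models without incidental parameters, the order of the bias is higher than the inverse of the sample size $(NT)^{-1}$ due to the slow rate of convergence of $\hat\phi_{NT}$. Note also that by the properties of the maximum likelihood estimator

$$\sqrt{NT}\,(\hat\beta_{NT} - \overline\beta_{NT}) \to_d \mathcal{N}(0, \overline V_\infty),$$

for some $\overline V_\infty$ that we also characterize in Theorem 4.1. Under asymptotic sequences where $N/T \to \kappa^2$ as $N, T \to \infty$, the fixed effects estimator is asymptotically biased because

$$\sqrt{NT}\,(\hat\beta_{NT} - \beta^0) = \sqrt{NT}\,(\hat\beta_{NT} - \overline\beta_{NT}) + \sqrt{NT}\,\big(\overline B^\beta_\infty/T + \overline D^\beta_\infty/N + o_P(T^{-1} \vee N^{-1})\big) \to_d \mathcal{N}(\kappa \overline B^\beta_\infty + \kappa^{-1}\overline D^\beta_\infty, \overline V_\infty). \tag{3.3}$$

Relative to fixed effects estimators with only individual effects, the presence of time effects introduces additional asymptotic bias through $\overline D^\beta_\infty$. This asymptotic result predicts that the fixed effects estimator can have significant bias relative to its dispersion. Moreover, confidence intervals constructed around the fixed effects estimator can severely undercover the true value of the parameter even in large samples. We show that these predictions provide a good approximation to the finite sample behavior of the fixed effects estimator through analytical and simulation examples in Sections 3.3 and 5.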

The analytical bias correction consists of subtracting estimates of the leading terms of the bias from the fixed effects estimator of $\beta^0$. Let $\hat B^\beta_{NT}$ and $\hat D^\beta_{NT}$ be estimators of $\overline B^\beta_\infty$ and $\overline D^\beta_\infty$ as defined in (4.6). The bias corrected estimator can be formed as

$$\widetilde\beta^A_{NT} = \hat\beta_{NT} - \hat B^\beta_{NT}/T - \hat D^\beta_{NT}/N.$$

If $N/T \to \kappa^2$, $\hat B^\beta_{NT} \to_P \overline B^\beta_\infty$, and $\hat D^\beta_{NT} \to_P \overline D^\beta_\infty$, then

$$\sqrt{NT}\,(\widetilde\beta^A_{NT} - \beta^0) \to_d \mathcal{N}(0, \overline V_\infty).$$

The analytical correction therefore centers the asymptotic distribution at the true value of the parameter, without increasing asymptotic variance. This asymptotic result predicts that in large samples the corrected estimator has small bias relative to dispersion, the correction does not increase dispersion, and the confidence intervals constructed around the corrected estimator have coverage probabilities close to the nominal levels. We show in Sections 3.3 and 5 that these predictions provide a good approximation to the behavior of the corrections even in small panels.

We also consider a jackknife bias correction method that does not require explicit estimation of the bias. This method is based on the split panel jackknife (SPJ) of Dhaene and Jochmans (2015) applied to the time and cross-section dimension of the panel. Alternative jackknife corrections based on the leave-one-observation-out panel jackknife (PJ) of Hahn and Newey (2004) and combinations of PJ and SPJ are also possible. We do not consider corrections based on PJ because they are theoretically justified by second-order expansions of $\overline\beta_{NT}$ that are beyond the scope of this paper.

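To describe our generalization of the SPJ, define the fixed effects estimator of $\beta$ in the subpanel with cross sectional indexes $A \subseteq \{1, \ldots, N\}$ and time series indexes $B \subseteq \{1, \ldots, T\}$ as

$$\hat\beta_{A,B} = \operatorname*{argmax}_{\beta} \max_{\alpha(A)} \max_{\gamma(B)} \sum_{i \in A,\, t \in B} \log f_Y(Y_{it} \mid X_{it}, \alpha_i, \gamma_t, \beta),$$

where $\alpha(A) = \{\alpha_i : i \in A\}$ and $\gamma(B) = \{\gamma_t : t \in B\}$. Let $\widetilde\beta_{N,T/2}$ be the average of the 2 split jackknife estimators in the subpanels with $A = \{1, \ldots, N\}$, and $B = \{1, \ldots, \lceil T/2 \rceil\}$ or $B = \{\lfloor T/2 \rfloor + 1, \ldots, T\}$, i.e. including all the individuals and leaving out the first and second halves of the time periods. Let $\widetilde\beta_{N/2,T}$ be the average of the 2 split jackknife estimators in the subpanels with $B = \{1, \ldots, T\}$, and $A = \{1, \ldots, \lceil N/2 \rceil\}$ or $A = \{\lfloor N/2 \rfloor + 1, \ldots, N\}$, i.e. including all the time periods and leaving out half of the individuals of the panel. In choosing the cross sectional indexing of the panel, one might want to take into account individual clustering structures and other dependencies to preserve them in the SPJ. For example, all the individuals belonging to the same cluster should be indexed such that they remain in the same subpanel after the cross sectional split. If there are no cross sectional dependencies, the indexing of the individuals is unrestricted. We recommend constructing $\widetilde\beta_{N/2,T}$ as the average of the estimators obtained from all possible partitions of $N/2$ individuals to avoid ambiguity and arbitrariness in the choice of the division. The bias corrected estimator is

$$\widetilde\beta^J_{NT} = 3\hat\beta_{NT} - \widetilde\beta_{N,T/2} - \widetilde\beta_{N/2,T}. \tag{3.4}$$

To give some intuition about how the correction works, note that

$$\widetilde\beta^J_{NT} - \beta^0 = (\hat\beta_{NT} - \beta^0) - (\widetilde\beta_{N,T/2} - \hat\beta_{NT}) - (\widetilde\beta_{N/2,T} - \hat\beta_{NT}),$$

where $\widetilde\beta_{N,T/2} - \hat\beta_{NT} = \overline B^\beta_\infty/T + o_P(T^{-1} \vee N^{-1})$ and $\widetilde\beta_{N/2,T} - \hat\beta_{NT} = \overline D^\beta_\infty/N + o_P(T^{-1} \vee N^{-1})$. Relative to $\hat\beta_{NT}$, $\widetilde\beta_{N,T/2}$ has double the bias coming from the estimation of the individual effects because it is based on subpanels with half of the time periods, and $\widetilde\beta_{N/2,T}$ has double the bias coming from the estimation of the time effects because it is based on subpanels with half of the individuals. The time series split removes the bias term $\overline B^\beta_\infty$ and the cross sectional split removes the bias term $\overline D^\beta_\infty$.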

3.3 Illustrative Example

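To illustrate how the bias corrections work in finite samples, we consider a simple model where the solution to the population program (3.1) has closed form. This model corresponds to a variation of the classical Neyman and Scott (1948) variance example that includes both individual and time effects, $Y_{it} \mid \alpha, \gamma, \beta \sim \mathcal{N}(\alpha_i + \gamma_t, \beta)$. It is well known that in this case

$$\hat\beta_{NT} = (NT)^{-1}\sum_{i,t}\big(Y_{it} - \bar Y_{i\cdot} - \bar Y_{\cdot t} + \bar Y_{\cdot\cdot}\big)^2,$$

where $\bar Y_{i\cdot} = T^{-1}\sum_t Y_{it}$, $\bar Y_{\cdot t} = N^{-1}\sum_i Y_{it}$, and $\bar Y_{\cdot\cdot} = (NT)^{-1}\sum_{i,t} Y_{it}$. Moreover, from the well-known results on the degrees of freedom adjustment of the estimated variance

$$\overline\beta_{NT} = \mathbb{E}_\phi[\hat\beta_{NT}] = \beta^0\,\frac{(N-1)(T-1)}{NT} = \beta^0\Big(1 - \frac{1}{T} - \frac{1}{N} + \frac{1}{NT}\Big),$$

so that $\overline B^\beta_\infty = -\beta^0$ and $\overline D^\beta_\infty = -\beta^0$.

To form the analytical bias correction we can set $\hat B^\beta_{NT} = -\hat\beta_{NT}$ and $\hat D^\beta_{NT} = -\hat\beta_{NT}$. This yields $\widetilde\beta^A_{NT} = \hat\beta_{NT}(1 + 1/T + 1/N)$ with

$$\overline\beta^A_{NT} = \mathbb{E}_\phi[\widetilde\beta^A_{NT}] = \beta^0\Big(1 - \frac{1}{T^2} - \frac{1}{N^2} - \frac{1}{NT} + \frac{1}{NT^2} + \frac{1}{N^2T}\Big).$$

This correction reduces the order of the bias from $(T^{-1} \vee N^{-1})$ to $(T^{-2} \vee N^{-2})$ and introduces additional higher order terms. The analytical correction increases finite-sample variance because the factor $(1 + 1/T + 1/N) > 1$. We compare the biases and standard deviations of the fixed effects estimator and the corrected estimator in a numerical example below.

For the jackknife correction, straightforward calculations give

$$\overline\beta^J_{NT} = \mathbb{E}_\phi[\widetilde\beta^J_{NT}] = 3\overline\beta_{NT} - \overline\beta_{N,T/2} - \overline\beta_{N/2,T} = \beta^0\Big(1 - \frac{1}{NT}\Big).$$

The correction therefore reduces the order of the bias from $(T^{-1} \vee N^{-1})$ to $(TN)^{-1}$.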
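The jackknife calculation can be illustrated by simulation. The sketch below assumes the variance example with normal errors and a true variance of one; `fe_variance`, `split_panel_jackknife`, and `simulate_mean` are hypothetical names. It uses a single split of the individuals by index rather than an average over all possible partitions.

```python
import numpy as np

def fe_variance(Y):
    """Fixed effects variance estimator: mean squared two-way demeaned outcome."""
    resid = (Y - Y.mean(axis=1, keepdims=True)
               - Y.mean(axis=0, keepdims=True) + Y.mean())
    return np.mean(resid ** 2)

def split_panel_jackknife(Y, estimator):
    """3 * full-panel estimate minus the time-split and cross-section-split averages."""
    N, T = Y.shape
    ceil_T, floor_T = -(-T // 2), T // 2
    ceil_N, floor_N = -(-N // 2), N // 2
    full = estimator(Y)
    time_split = 0.5 * (estimator(Y[:, :ceil_T]) + estimator(Y[:, floor_T:]))
    cross_split = 0.5 * (estimator(Y[:ceil_N, :]) + estimator(Y[floor_N:, :]))
    return 3.0 * full - time_split - cross_split

def simulate_mean(N, T, reps=2000, seed=0):
    """Monte Carlo means of the uncorrected and jackknife estimators (true value 1)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=(N, 1))      # arbitrary individual effects
    gamma = rng.normal(size=(1, T))      # arbitrary time effects
    fe, jk = np.empty(reps), np.empty(reps)
    for r in range(reps):
        Y = alpha + gamma + rng.normal(size=(N, T))
        fe[r] = fe_variance(Y)
        jk[r] = split_panel_jackknife(Y, fe_variance)
    return fe.mean(), jk.mean()

fe_mean, jk_mean = simulate_mean(10, 10)
```

With $N = T = 10$, the simulated means should be close to the closed-form expectations $(N-1)(T-1)/(NT) = 0.81$ for the uncorrected estimator and $1 - 1/(NT) = 0.99$ for the jackknife, up to Monte Carlo error.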

Table 1 presents numerical results for the bias and standard deviations of the fixed effects and bias corrected estimators in finite samples. We consider panels with $N, T \in \{10, 25, 50\}$, and only report the results for $T \le N$ since all the expressions are symmetric in $N$ and $T$. All the numbers in the table are in percentage of the true parameter value, so we do not need to specify the value of $\beta^0$. We find that the analytical and jackknife corrections offer substantial improvements over the fixed effects estimator in terms of bias. The first and fourth rows of the table show that the bias of the fixed effects estimator is of the same order of magnitude as the standard deviation, where $\overline V_{NT} = \mathrm{Var}[\hat\beta_{NT}] = 2(N-1)(T-1)(\beta^0)^2/(NT)^2$ under independence of $Y_{it}$ over $i$ and $t$ conditional on the unobserved effects. The fifth row shows that the increase in standard deviation due to analytical bias correction is small compared to the bias reduction, where $\overline V^A_{NT} = \mathrm{Var}[\widetilde\beta^A_{NT}] = (1 + 1/N + 1/T)^2\,\overline V_{NT}$. The last row shows that the jackknife yields less precise estimates than the analytical correction when $T = 10$.

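| | N=10, T=10 | N=25, T=10 | N=25, T=25 | N=50, T=10 | N=50, T=25 | N=50, T=50 |
|---|---|---|---|---|---|---|
| $(\overline\beta_{NT} - \beta^0)/\beta^0$ | -.19 | -.14 | -.08 | -.12 | -.06 | -.04 |
| $(\overline\beta^A_{NT} - \beta^0)/\beta^0$ | -.03 | -.02 | .00 | -.01 | -.01 | .00 |
| $(\overline\beta^J_{NT} - \beta^0)/\beta^0$ | -.01 | .00 | .00 | .00 | .00 | .00 |
| $\sqrt{\overline V_{NT}}/\beta^0$ | .13 | .08 | .05 | .06 | .04 | .03 |
| $\sqrt{\overline V^A_{NT}}/\beta^0$ | .14 | .09 | .06 | .06 | .04 | .03 |
| $\sqrt{\overline V^J_{NT}}/\beta^0$ | .17 | .10 | .06 | .07 | .04 | .03 |

Notes: $\overline V^J_{NT}$ obtained by 50,000 simulations with $\beta^0 = 1$.
Table 1: Biases and Standard Deviations for $Y_{it} \mid \alpha, \gamma, \beta \sim \mathcal{N}(\alpha_i + \gamma_t, \beta)$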
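The closed-form quantities behind the bias rows and the first two standard deviation rows can be evaluated directly. This sketch assumes normal errors, so that $NT$ times the fixed effects estimator divided by the true variance is chi-squared with $(N-1)(T-1)$ degrees of freedom; `relative_bias_and_sd` is a hypothetical helper. The jackknife standard deviation has no simple closed form and is not computed here.

```python
import math

def relative_bias_and_sd(N, T):
    """Biases and standard deviations relative to the true variance parameter."""
    fe_mean = (N - 1) * (T - 1) / (N * T)              # E[uncorrected] / true value
    factor = 1 + 1 / T + 1 / N                         # analytical correction factor
    sd_fe = math.sqrt(2 * (N - 1) * (T - 1)) / (N * T)
    return {
        "bias_fe": fe_mean - 1,
        "bias_analytical": fe_mean * factor - 1,
        "bias_jackknife": -1 / (N * T),
        "sd_fe": sd_fe,
        "sd_analytical": factor * sd_fe,
    }

for N, T in [(10, 10), (25, 10), (25, 25), (50, 10), (50, 25), (50, 50)]:
    row = relative_bias_and_sd(N, T)
    print(N, T, {name: round(value, 3) for name, value in row.items()})
```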

Table 2 illustrates the effect of the bias on the inference based on the asymptotic distribution. It shows the coverage probabilities of 95% asymptotic confidence intervals for $\beta^0$ constructed in the usual way as

$$\mathrm{CI}_{.95}(\hat\beta) = \hat\beta \pm 1.96\,\hat V_{NT}^{1/2} = \hat\beta\,\big(1 \pm 1.96\sqrt{2/(NT)}\big),$$

where $\hat\beta \in \{\hat\beta_{NT}, \widetilde\beta^A_{NT}, \widetilde\beta^J_{NT}\}$ and $\hat V_{NT} = 2\hat\beta^2/(NT)$ is an estimator of the asymptotic variance $\overline V_\infty/(NT) = 2(\beta^0)^2/(NT)$. To find the coverage probabilities, we use that $NT\hat\beta_{NT}/\beta^0 \sim \chi^2_{(N-1)(T-1)}$ and $\widetilde\beta^A_{NT} = (1 + 1/N + 1/T)\hat\beta_{NT}$. These probabilities do not depend on the value of $\beta^0$ because the limits of the intervals are proportional to $\hat\beta$. For the jackknife we compute the probabilities numerically by simulation with $\beta^0 = 1$. As a benchmark of comparison, we also consider confidence intervals constructed from the unbiased estimator $\widetilde\beta_{NT} = NT\hat\beta_{NT}/[(N-1)(T-1)]$. Here we find that the confidence intervals based on the fixed effects estimator display severe undercoverage for all the sample sizes. The confidence intervals based on the corrected estimators have high coverage probabilities, which approach the nominal level as the sample size grows. Moreover, the bias corrected estimators produce confidence intervals with very similar coverage probabilities to the ones from the unbiased estimator.

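| | N=10, T=10 | N=25, T=10 | N=25, T=25 | N=50, T=10 | N=50, T=25 | N=50, T=50 |
|---|---|---|---|---|---|---|
| $\mathrm{CI}_{.95}(\hat\beta_{NT})$ | .56 | .55 | .65 | .44 | .63 | .68 |
| $\mathrm{CI}_{.95}(\widetilde\beta^A_{NT})$ | .89 | .92 | .93 | .92 | .94 | .94 |
| $\mathrm{CI}_{.95}(\widetilde\beta^J_{NT})$ | .89 | .91 | .93 | .92 | .93 | .94 |
| $\mathrm{CI}_{.95}(\widetilde\beta_{NT})$ | .91 | .93 | .94 | .93 | .94 | .94 |

Notes: Nominal coverage probability is .95. $\mathrm{CI}_{.95}(\widetilde\beta^J_{NT})$ obtained by 50,000 simulations with $\beta^0 = 1$.
Table 2: Coverage probabilities for $Y_{it} \mid \alpha, \gamma, \beta \sim \mathcal{N}(\alpha_i + \gamma_t, \beta)$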
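Coverage probabilities of this kind can be approximated by drawing from the chi-squared distribution. The sketch below assumes normal errors, a true variance of one, and intervals of the form estimate times $1 \pm 1.96\sqrt{2/(NT)}$; `coverage` is a hypothetical helper, and the jackknife interval is omitted because it requires simulating full panels.

```python
import numpy as np

def coverage(N, T, reps=200_000, seed=0):
    """Simulated coverage of nominal 95% intervals for a true value of 1."""
    rng = np.random.default_rng(seed)
    df = (N - 1) * (T - 1)
    fe = rng.chisquare(df, size=reps) / (N * T)   # uncorrected estimator
    half_width = 1.96 * np.sqrt(2.0 / (N * T))    # relative half-width of the interval
    estimators = {
        "fe": fe,
        "analytical": fe * (1 + 1 / N + 1 / T),   # analytical bias correction
        "unbiased": fe * N * T / df,              # degrees-of-freedom correction
    }
    out = {}
    for name, est in estimators.items():
        lower, upper = est * (1 - half_width), est * (1 + half_width)
        out[name] = float(np.mean((lower <= 1.0) & (1.0 <= upper)))
    return out

cov = coverage(10, 10)
```

For $N = T = 10$ the uncorrected interval covers the true value far less often than the nominal 95%, while the corrected and unbiased intervals come much closer.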

4 Asymptotic Theory for Bias Corrections

In nonlinear panel data models the population problem (3.1) generally does not have a closed-form solution, so we need to rely on asymptotic arguments to characterize the terms in the expansion of the bias (3.2) and to justify the validity of the corrections.

4.1 Asymptotic distribution of model parameters

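We consider panel models with scalar individual and time effects that enter the likelihood function additively through $\pi_{it} = \alpha_i + \gamma_t$. In these models the dimension of the incidental parameters is $\dim\phi_{NT} = N + T$. The leading cases are single index models, where the dependence of the likelihood function on the parameters is through an index $X_{it}'\beta + \alpha_i + \gamma_t$. These models cover the probit and Poisson specifications of Examples 1 and 2. The additive structure only applies to the unobserved effects, so we can allow for scale parameters to cover the Tobit and negative binomial models. We focus on these additive models for computational tractability and because we can establish the consistency of the fixed effects estimators under a concavity assumption in the log-likelihood function with respect to all the parameters.

The parametric part of our panel models takes the form

$$\log f_Y(Y_{it} \mid X_{it}, \alpha_i, \gamma_t, \beta) = \ell_{it}(\beta, \pi_{it}), \qquad \pi_{it} = \alpha_i + \gamma_t. \tag{4.1}$$

We denote the derivatives of the log-likelihood function $\ell_{it}$ by $\partial_\beta \ell_{it}(\beta, \pi) := \partial\ell_{it}(\beta, \pi)/\partial\beta$, $\partial_{\beta\beta'}\ell_{it}(\beta, \pi) := \partial^2\ell_{it}(\beta, \pi)/(\partial\beta\,\partial\beta')$, $\partial_{\pi^q}\ell_{it}(\beta, \pi) := \partial^q\ell_{it}(\beta, \pi)/\partial\pi^q$, $q = 1, 2, 3$, etc. We drop the arguments $\beta$ and $\pi$ when the derivatives are evaluated at the true parameters $\beta^0$ and $\pi^0_{it} := \alpha^0_i + \gamma^0_t$, e.g. $\partial_{\pi^q}\ell_{it} := \partial_{\pi^q}\ell_{it}(\beta^0, \pi^0_{it})$. We also drop the dependence on $NT$ from all the sequences of functions and parameters, e.g. we use $\mathcal{L}$ for $\mathcal{L}_{NT}$ and $\phi$ for $\phi_{NT}$.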

We make the following assumptions:

Assumption 4.1 (Panel models).

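Let $\nu > 0$ and $\mu > 4(8 + \nu)/\nu$. Let $\varepsilon > 0$ and let $\mathcal{B}^0_\varepsilon$ be a subset of $\mathbb{R}^{\dim\beta + 1}$ that contains an $\varepsilon$-neighbourhood of $(\beta^0, \pi^0_{it})$ for all $i, t, N, T$.

  • Asymptotics: we consider limits of sequences where $N/T \to \kappa^2$, $0 < \kappa < \infty$, as $N, T \to \infty$.

  • Sampling: conditional on $\phi$, $\{(Y_i^T, X_i^T) : 1 \le i \le N\}$ is independent across $i$ and, for each $i$, $\{(Y_{it}, X_{it}) : 1 \le t \le T\}$ is $\alpha$-mixing with mixing coefficients satisfying $\sup_i a_i(m) = \mathcal{O}(m^{-\mu})$ as $m \to \infty$, where

    $$a_i(m) := \sup_t \sup_{A \in \mathcal{A}^i_t,\, B \in \mathcal{B}^i_{t+m}} |P(A \cap B) - P(A)P(B)|,$$

    and for $Z_{it} = (Y_{it}, X_{it})$, $\mathcal{A}^i_t$ is the sigma field generated by $(Z_{it}, Z_{i,t-1}, \ldots)$, and $\mathcal{B}^i_t$ is the sigma field generated by $(Z_{it}, Z_{i,t+1}, \ldots)$.

  • Model: for $X_i^t = \{X_{is} : s = 1, \ldots, t\}$, we assume that for all $i, t, N, T$,

    $$Y_{it} \mid X_i^t, \phi, \beta \sim \exp[\ell_{it}(\beta, \alpha_i + \gamma_t)].$$

    The realizations of the parameters and unobserved effects that generate the observed data are denoted by $\beta^0$ and $\phi^0$.

  • Smoothness and moments: We assume that $(\beta, \pi) \mapsto \ell_{it}(\beta, \pi)$ is four times continuously differentiable over $\mathcal{B}^0_\varepsilon$ a.s. The partial derivatives of $\ell_{it}(\beta, \pi)$ with respect to the elements of $(\beta, \pi)$ up to fourth order are bounded in absolute value uniformly over $(\beta, \pi) \in \mathcal{B}^0_\varepsilon$ by a function $M(Z_{it}) > 0$ a.s., and $\max_{i,t} \mathbb{E}_\phi[M(Z_{it})^{8+\nu}]$ is a.s. uniformly bounded over $N, T$.

  • Concavity: For all $N, T$, $(\beta, \phi) \mapsto \mathcal{L}(\beta, \phi)$ is strictly concave over $\mathbb{R}^{\dim\beta + N + T}$ a.s. Furthermore, there exist constants $b_{\min}$ and $b_{\max}$ such that for all $(\beta, \pi) \in \mathcal{B}^0_\varepsilon$, $0 < b_{\min} \le -\mathbb{E}_\phi[\partial_{\pi^2}\ell_{it}(\beta, \pi)] \le b_{\max}$ a.s. uniformly over $i, t, N, T$.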

Remark 1 (Assumption 4.1).

The asymptotics condition in Assumption 4.1 defines the large-$T$ asymptotic framework and is the same as in Hahn and Kuersteiner (2011). The relative rate of $N$ and $T$ exactly balances the order of the bias and variance, producing a non-degenerate asymptotic distribution.

The sampling condition in Assumption 4.1 does not impose identical distribution nor stationarity over the time series dimension, conditional on the unobserved effects, unlike most of the large-$T$ panel literature, e.g., Hahn and Newey (2004) and Hahn and Kuersteiner (2011). These assumptions are violated by the presence of the time effects, because they are treated as parameters. The mixing condition is used to bound covariances and moments in the application of laws of large numbers and central limit theorems; it could be replaced by other conditions that guarantee the applicability of these results.

The model condition in Assumption 4.1 is the parametric part of the panel model. We rely on this assumption to guarantee that $\partial_\beta\ell_{it}$ and $\partial_\pi\ell_{it}$ have martingale difference properties. Moreover, we use certain Bartlett identities implied by this assumption to simplify some expressions, but those simplifications are not crucial for our results. We provide expressions for the asymptotic bias and variance that do not apply these simplifications in Remark 3 below.

The smoothness and moments condition in Assumption 4.1 imposes smoothness and moment conditions on the log-likelihood function and its derivatives. These conditions guarantee that the higher-order stochastic expansions of the fixed effects estimator that we use to characterize the asymptotic bias are well-defined, and that the remainder terms of these expansions are bounded.

The most commonly used nonlinear models in applied economics such as logit, probit, ordered probit, Poisson, and Tobit models have smooth log-likelihood functions that satisfy the concavity condition of Assumption 4.1, provided that all the elements of $X_{it}$ have cross sectional and time series variation. The concavity condition guarantees that $\beta^0$ and $\phi^0$ are the unique solution to the population problem (2.1), that is, all the parameters are point identified.

To describe the asymptotic distribution of the fixed effects estimator it is convenient to introduce some additional notation. Let be the expected Hessian matrix of the log-likelihood with respect to the nuisance parameters evaluated at the true parameters, i.e.