Fixed-Effect Regressions on Network Data

Koen Jochmans
University of Cambridge
Address: University of Cambridge, Faculty of Economics, Austin Robinson Building, Sidgwick Avenue, Cambridge CB3 9DD, U.K. E-mail:
Martin Weidner
University College London
Address: University College London, Department of Economics, Gower Street, London WC1E 6BT, U.K., and CeMMAP. E-mail:
We are grateful to Ulrich Müller and three referees for constructive comments. We would also like to thank Nadine Geiger, Bryan Graham, Áureo de Paula, Fabien Postel-Vinay, Valentin Verdier, and, in particular, Jean-Marc Robin for discussion and for comments on earlier drafts of this paper. Valentin Verdier also provided indispensable support that enabled us to include our illustration on the estimation of teacher value-added.
Jochmans gratefully acknowledges financial support from the European Research Council through Starting Grant no 715787. Weidner gratefully acknowledges support from the Economic and Social Research Council through the ESRC Centre for Microdata Methods and Practice grant RES-589-28-0001 and from the European Research Council grant ERC-2014-CoG-646917-ROMIA.
The first version of this paper dates from August 4, 2016. All versions are available at

This paper considers inference on fixed effects in a linear regression model estimated from network data. An important special case of our setup is the two-way regression model. This is a workhorse technique in the analysis of matched data sets, such as employer-employee or student-teacher panel data. We formalize how the structure of the network affects the accuracy with which the fixed effects can be estimated. This allows us to derive sufficient conditions on the network for consistent estimation and asymptotically-valid inference to be possible. Estimation of moments is also considered. We allow for general networks and our setup covers both the dense and sparse case. We provide numerical results for the estimation of teacher value-added models and regressions with occupational dummies.

Keywords: connectivity, fixed effects, graph, Laplacian, limited mobility, teacher value-added, two-way regression model.

JEL classification: C23, C55

1 Introduction

Data on the interaction between agents are in increasing supply. A workhorse technique to analyze such data is a linear regression model with agent-specific parameters. It has been used to investigate a variety of questions. For example, application of a two-way regression model to matched employer-employee data decomposes (log) wages into worker heterogeneity, firm heterogeneity, and residual variation. Following Abowd, Kramarz and Margolis (1999) the correlation between the estimated worker and firm effects is regarded as a measure of assortative matching, i.e., whether high quality workers are employed in more productive firms. Using the same decomposition, Card, Heining and Kline (2013) study to what extent the evolution of wage inequality is due to changes in the variance of worker and firm heterogeneity. Nimczik (2018) reports the whole distribution of the estimated worker and firm effects. In a similar fashion, the literature on student achievement backs out student and teacher effects from test score data. The estimated teacher heterogeneity is interpreted as teacher value-added and their variance as a measure of their importance (see Jackson, Rockoff and Staiger 2014 for an overview of this literature). These estimates are used to assess teachers and are important inputs to personnel evaluations and merit pay programs (Rothstein, 2010). [Footnote 3: Fixed-effect regressions of this kind are now part of the standard toolkit of many empiricists in a variety of different areas. Finkelstein, Gentzkow and Williams (2016) and Amiti and Weinstein (2018) use them to separate supply and demand factors in healthcare utilization from data on patient migration, and in firm investment behavior from financial data on bank loans, respectively. Chetty and Hendren (2018) evaluate the importance of growing up in a specific neighborhood on labor market outcomes later on in life.]

In spite of their widespread use, there is little to no work on the theoretical properties of such fixed-effect approaches. In fact, the few results that are available point to issues of downward bias in the estimation of the correlation between worker and firm effects, finding a spurious negative correlation in many data sets (Andrews, Gill, Schank and Upward, 2008, 2012), and upward bias in the estimator of the variance of teacher effects (Rockoff, 2004). The presence of bias here is not surprising. Indeed, the individual effects are estimated with noise. Their sampling error then introduces bias in the estimator of nonlinear functionals. A more complicated issue is the assessment of the statistical precision with which the fixed effects are estimated and, more generally, the development of distribution theory. This is important as it allows one to establish conditions for consistency and rates of convergence, and yields insight into whether standard test statistics can be expected to be approximately size correct and have non-trivial power. None of these issues has been addressed so far. Such theory is important for inference on the individual effects and their moments themselves but may also serve as a stepping stone to address related problems. For example, without theory for the fixed-effect estimator the behavior of the falsification test for value-added models of Rothstein (2010) remains unknown and correct standard errors for regressions of outcomes on estimated fixed effects (Kettemann, Mueller and Zweimüller, 2017) cannot be derived.

The data structure arising from interactions between agents is different from that of standard cross-section or panel data. It is typically difficult to see how the data carry information about certain parameters. In this paper we present sufficient conditions for consistency and asymptotic normality of least-squares estimators of fixed effects in linear regression models. We see the data as a network and represent it by a graph where agents are vertices and edges between vertices are present if these agents interact. It is intuitive that the structure of this graph should be a key determinant of the accuracy of statistical inference. We formalize this here. Our setup places no a priori restrictions on the graph structure and our results apply to both dense and sparse settings. A data structure of particular importance is that of a bipartite graph, as in our motivating examples, and we treat this in detail. In fact, while we deal with general graphs, our regression setup is designed to capture the main features of the prototypical two-way regression model. We focus on inference on the individual effects. Our results also serve as a stepping stone for the analysis of estimators of other parameters, such as the variance and other moments of (the distribution of) the individual effects, and we provide some results on these as well. We do not discuss inference on common slope coefficients. In all the examples above, these are not parameters of primary interest. In contemporary work Verdier (2017) provides inference results for slope coefficients in two-way regression models. His work is complementary to our own.

The ability to accurately estimate the individual effect of a given vertex depends on how well this vertex is connected to the rest of the network. Our theory involves both global and local measures of network connectivity. The main global connectivity measure we use is the smallest non-zero eigenvalue of the (normalized) Laplacian matrix of the graph. [Footnote 4: The Laplacian matrix is similar to the adjacency matrix as a device to represent a graph and can be obtained from it. Both matrices are formally defined below. Eigenvalues and eigenvectors of these and related matrices have also been found of use in determining equilibrium conditions in games played on networks (Bramoullé, Kranton and D’Amours, 2014) and in (statistical) community detection (Schiebinger, Wainwright and Yu, 2015).] It reflects how easy it is to disconnect a network by removing edges from it. The other measures of connectivity that we use are the degrees of the vertices as well as various harmonic means thereof. All of these measures arise naturally when studying the variance of the fixed-effect estimator. We highlight the interplay between them in deriving conditions for consistent estimation and standard inference to be possible. As the network grows the smallest eigenvalue may approach zero, and so the graph may become more sparse, provided the relevant harmonic mean grows sufficiently fast. These findings mimic conditions on the bandwidth in nonparametric estimation problems, although they will typically show up in second-order terms here. Estimation and inference on averages over the individual effects is more demanding on the network structure and, even after bias reduction, may only be feasible in quite dense networks.

Our analysis shows that it is useful to inspect measures of global and local connectivity when interpreting estimation results from network data. We do so here for two data sets. The first is a large network of teachers in elementary schools in North Carolina, where the object of interest would be teacher value-added. This is arguably one of the most important applications of the two-way regression model. This graph is only very weakly connected and our theory does not support the use of large-sample arguments. When a simple model with homoskedastic errors is applied to these data standard errors based on conventional first-order approximations for teacher value-added are, on average, about smaller than the actual standard deviations. Further, the sample variance of the estimated teacher effects has a substantial upward bias. This bias is large relative to its standard deviation and so confidence bands will be overly optimistic on the ability of teacher value-added to explain variation in test scores. To provide an example of a data set that yields a much stronger connected graph we also construct an occupational network from the British Household Panel Survey (BHPS). This graph would arise in wage regressions with occupational dummies, for example. Here, our connectivity measures are much more supportive of standard inferential approaches and, indeed, again in a simple model, we find that the first-order variance approximation is quite accurate.

2 Regression analysis of network data

2.1 Data structure

Consider an undirected graph $\mathcal{G} := \mathcal{G}(V,E)$ where $m$ edges are placed between $n$ vertices. We allow for multiple edges between vertices (i.e., $\mathcal{G}$ can be a multigraph) and the edges may be assigned a weight. We do not consider loops (i.e., no edge connects a vertex with itself). Without loss of generality we label the vertices by natural numbers, so that $V$ is $\{1,\ldots,n\}$. The multiset $E$ contains the unordered pairs $(i,j)$ from the product set $V \times V$ that are connected by an edge, possibly with repetition. The same pair $(i,j)$ will appear multiple times in $E$ if more than one edge exists between them; we let $E_{(i,j)}$ denote the set of edges between them. We have $E_{(i,j)} = E_{(j,i)}$ and may have $E_{(i,j)} = \emptyset$. We will label the edges by natural numbers; so, each edge has assigned to it an integer $\varepsilon \in \{1,\ldots,m\}$. For later use we note that vertices $i$ and $j$ are said to be connected if $\mathcal{G}$ contains a path from $i$ to $j$, and that the graph $\mathcal{G}$ is said to be connected if every pair of vertices in the graph is connected.

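For an edge $\varepsilon$ let $w_\varepsilon > 0$ be its weight. An unweighted graph has $w_\varepsilon = 1$ for all $\varepsilon$. The graph $\mathcal{G}$ may be represented by its (oriented) incidence matrix $B$, the $m \times n$ matrix with entries

$$(B)_{\varepsilon i} := \begin{cases} \sqrt{w_\varepsilon} & \text{if } \varepsilon \in E_{(i,j)} \text{ and } i < j, \\ -\sqrt{w_\varepsilon} & \text{if } \varepsilon \in E_{(i,j)} \text{ and } i > j, \\ 0 & \text{otherwise.} \end{cases}$$

As will become apparent, the analysis to follow is invariant to the choice of orientation on $\mathcal{G}$. The graph may also be represented through its $n \times n$ adjacency matrix $A$. It has elements

$$(A)_{ij} := \sum_{\varepsilon \in E_{(i,j)}} w_\varepsilon ,$$

with $(A)_{ij} = 0$ when $E_{(i,j)} = \emptyset$. The incidence matrix and adjacency matrix are related through the Laplacian matrix $L$ as

$$L := B'B = D - A,$$

where $D := \mathrm{diag}(d_1,\ldots,d_n)$ is the (weighted) degree matrix. It has as diagonal entries

$$d_i := \sum_{j=1}^{n} (A)_{ij},$$

the degree of vertex $i$. When the graph is unweighted $d_i$ equals the number of edges that involve vertex $i$. We will let $[i] := \{ j \in V : (i,j) \in E \}$ denote the set of direct neighbors of vertex $i$. Observe that $d_i$ may be large even if $i$ has few neighbors (i.e., $|[i]|$ is small), as the edge weights $(A)_{ij}$ for $j \in [i]$ may be large. One example is a multigraph where many edges exist between a given vertex pair $(i,j)$.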
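The relation between the incidence matrix, the adjacency matrix, and the Laplacian can be checked numerically on a small example. The sketch below is only illustrative: the toy multigraph, the helper names `incidence_matrix` and `adjacency_matrix`, and the orientation rule (positive entry at the lower-numbered vertex) are assumptions made for the example, and rows carry the square root of the edge weight so that the cross-product of the incidence matrix reproduces the weighted Laplacian.

```python
import numpy as np

def incidence_matrix(n, edges):
    """Oriented incidence matrix (m x n) of a weighted multigraph.

    `edges` is a list of (i, j, w): 0-based vertices i != j and weight w > 0.
    The orientation (sign convention) is an assumption; it does not affect B'B.
    """
    B = np.zeros((len(edges), n))
    for e, (i, j, w) in enumerate(edges):
        lo, hi = min(i, j), max(i, j)
        B[e, lo] = np.sqrt(w)
        B[e, hi] = -np.sqrt(w)
    return B

def adjacency_matrix(n, edges):
    """Weighted adjacency matrix: entry (i, j) sums the weights of all edges between i and j."""
    A = np.zeros((n, n))
    for i, j, w in edges:
        A[i, j] += w
        A[j, i] += w
    return A

# Hypothetical toy multigraph: two parallel edges between vertices 0 and 1.
n = 4
edges = [(0, 1, 1.0), (0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 3, 0.5)]

B = incidence_matrix(n, edges)
A = adjacency_matrix(n, edges)
d = A.sum(axis=1)          # weighted degrees
D = np.diag(d)
L = B.T @ B                # Laplacian built from the incidence matrix

print("degrees:", d)
print("L equals D - A:", np.allclose(L, D - A))
print("rows of B sum to zero:", np.allclose(B @ np.ones(n), 0.0))
```

Note that vertex 1 has only two neighbors but the largest degree, because of the parallel edges and the heavier edge to vertex 2.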

2.2 Regression model and least-squares estimator

Now, given a graph $\mathcal{G}$, for each edge $\varepsilon \in E$ we observe an outcome $y_\varepsilon$ and a $p$-vector of covariates $x_\varepsilon$. Allowing $\mathcal{G}$ to be a multigraph covers the (unbalanced) panel data case, where multiple outcomes are available for some vertex pairs. Collect all outcomes in the $m$-vector $y$ and all covariates in the $m \times p$ matrix $X$. Let $\alpha$ be an $n$-vector of vertex-specific parameters and let $\beta$ be a $p$-vector of regression slopes. Our interest lies in estimating the model

$$y = B\alpha + X\beta + u, \qquad (2.1)$$

where $u$ is an $m$-vector of regression errors. [Footnote 5: A change of edge orientation corresponds to a sign flip in the corresponding outcome and regressor matrices. This does not affect the least squares estimator.] Our focus is on the vector $\alpha$. In the two-way regression model of our motivating examples, and discussed in more detail in Section 2.3, these are the worker and firm effects or the teacher and student effects. We will treat $B$ and $X$ as fixed throughout. While it may be difficult to maintain in certain applications, exogeneity of the network is the standard assumption in the literature. Inference in the presence of endogenous network formation has started to receive some attention in related work on peer effects (Auerbach 2016) and would be useful to study in the current context in future work. Before moving on we note that our model as it stands is overparameterized; $B\iota_n = 0$, where $\iota_n$ is the $n$-vector of ones, as each row of $B$ sums up to zero. It follows that the mean of the vertex-specific parameters cannot be learned from the data. We, therefore, impose that

$$\sum_{i=1}^{n} d_i \alpha_i = 0, \qquad (2.2)$$

which will prove a convenient choice for our purposes. With $d := (d_1,\ldots,d_n)'$ we may write the constraint compactly as $d'\alpha = 0$. A normalization can be dispensed with if interest lies in parameter differences, i.e., $\alpha_i - \alpha_j$, as in Finkelstein, Gentzkow and Williams (2016), for example.

The standard estimator of is the constrained least-squares estimator


where denotes the Euclidean norm and The following theorem gives conditions under which this estimator exists and is unique. We denote the Moore-Penrose pseudoinverse of a matrix by and, for an matrix , let . It is easily shown that and , so is a pseudoinverse of .

Theorem 1 (Existence).

Let be connected, , and . Then

and is unique.

The need for a pseudoinverse arises because is singular, which follows from the fact that . The use of the particular pseudoinverse follows from our normalization . The result of Theorem 1 is intuitive and generalizes results in the literature on matched employer-employee data (Abowd, Creecy and Kramarz 2002). [Footnote 6: When is not connected the analysis is typically confined to the largest connected component of . Our results then apply to that subgraph.]
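To make the role of the pseudoinverse concrete, the following sketch computes a constrained least-squares estimate on a small connected graph with one covariate. It rests on stated assumptions: the pseudoinverse is taken to be the degree-rescaled Moore-Penrose inverse of the degree-normalized matrix (one construction that is consistent with the degree-weighted normalization), and the graph, the simulated data, and the names `star_inverse`, `alpha_hat`, and `beta_hat` are hypothetical. The check is that the result satisfies the normalization and reproduces the fitted values of an unconstrained least-squares fit.

```python
import numpy as np

def star_inverse(C, d):
    """Assumed form of the pseudoinverse: D^{-1/2} (D^{-1/2} C D^{-1/2})^+ D^{-1/2}."""
    s = 1.0 / np.sqrt(d)
    T = s[:, None] * C * s[None, :]
    # rcond is set explicitly so that the numerically-zero eigenvalue is discarded
    return s[:, None] * np.linalg.pinv(T, rcond=1e-10, hermitian=True) * s[None, :]

# Hypothetical connected, unweighted graph on 4 vertices (5 edges).
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
n, m = 4, len(edges)
B = np.zeros((m, n))
for e, (i, j) in enumerate(edges):
    B[e, i], B[e, j] = 1.0, -1.0
d = np.diag(B.T @ B).copy()            # degrees (diagonal of the Laplacian)

# Simulated data (illustration only).
rng = np.random.default_rng(0)
alpha = rng.normal(size=n)
alpha -= (d @ alpha) / d.sum()         # impose the degree-weighted normalization
X = rng.normal(size=(m, 1))
y = B @ alpha + X @ np.array([0.5]) + 0.1 * rng.normal(size=m)

# Partial out the covariate, then solve the constrained problem via the pseudoinverse.
M = np.eye(m) - X @ np.linalg.solve(X.T @ X, X.T)
alpha_hat = star_inverse(B.T @ M @ B, d) @ B.T @ M @ y
beta_hat = np.linalg.lstsq(X, y - B @ alpha_hat, rcond=None)[0]

# Unconstrained least squares on [B, X]; the coefficients are not unique,
# but the fitted values are, so those are what we compare.
Z = np.hstack([B, X])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
fitted_ls = Z @ coef

print("normalization d'alpha_hat:", d @ alpha_hat)
print("same fitted values:", np.allclose(B @ alpha_hat + X @ beta_hat, fitted_ls))
```

Any other solution of the normal equations differs from `alpha_hat` by a constant shift of all vertex effects; the normalization is what pins that constant down.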

While is routinely used its statistical properties are not well understood. Our aim here is to shed light on how the structure of the network affects its sampling behavior and, with it, how reliable standard inferential procedures based on it are. For our analysis edge-specific covariates mostly complicate notation and presentation. It will on occasion be convenient to first analyze (2.1) when is treated as known and the outcome vector is redefined as . Then

is the least-squares estimator of subject to (2.2). To appreciate how the structure of relates to our problem of estimating the parameter suppose first that . Then


So, up to a scale factor, the variance of is completely determined by the Laplacian of . If, in addition, we were to assume that we would be in the classical regression setting and, given unbiasedness of , size-correct inference could be performed for any sample size. It is not clear, however, how one should proceed with non-classical regression errors.

The validity of standard large-sample arguments is not immediate here. Observe that



is the normalized Laplacian. Equation (2.5) follows from the fact that , such that . [Footnote 7: We note that our choice of normalization (2.2) guarantees the appearance of the Moore-Penrose pseudoinverse of in . We are grateful to Nadine Geiger for pointing out an inconsistency in our normalization in an earlier version.] While (2.5) shows the importance of the sample size in the variance of through the presence of the degree , it does not imply that shrinks as , nor would it give a convergence rate if it did. Indeed, the normalized Laplacian also changes when grows.

Let be the eigenvalues of . Then for all . We always have that , with as eigenvector. If is connected, is the smallest non-zero eigenvalue of the normalized Laplacian. Our theory involves conditions on and on the degree structure of the network through various harmonic means thereof. can be seen as a measure of global connectivity of . To see this we note that it can be linked to the Cheeger constant,

reflects how difficult it is to deconstruct into two large disconnected components by removing edges from it. A larger value of implies a more strongly-connected graph. From Friedland and Nabben (2002),


(see also Chung 1997). Thus, like , is a measure of global connectivity of the graph . Our results below allow for as grows, and so cover situations where the graph becomes increasingly more sparse. We will give explicit rates on for consistent estimation to be possible.
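The link between the Cheeger constant and the smallest non-zero eigenvalue of the normalized Laplacian can be illustrated by brute force on a small graph. The sketch below assumes the volume-based definition of the Cheeger constant (cut weight divided by the total degree of the smaller side) and checks the textbook Cheeger inequality from Chung (1997), namely that the eigenvalue lies between half the squared constant and twice the constant. The barbell graph and the helper names `cheeger_constant` and `lambda_2` are hypothetical choices for the example; the enumeration over subsets is exponential and only meant for tiny graphs.

```python
import itertools
import numpy as np

def cheeger_constant(A):
    """Brute-force, volume-based Cheeger constant of a small weighted graph."""
    n = A.shape[0]
    d = A.sum(axis=1)
    vol = d.sum()
    best = np.inf
    for size in range(1, n):
        for subset in itertools.combinations(range(n), size):
            U = list(subset)
            vol_U = d[U].sum()
            if vol_U > vol / 2:          # only the "smaller" side of a cut is considered
                continue
            rest = [j for j in range(n) if j not in subset]
            cut = A[np.ix_(U, rest)].sum()
            best = min(best, cut / vol_U)
    return best

def lambda_2(A):
    """Second-smallest eigenvalue of the normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    s = 1.0 / np.sqrt(d)
    S = np.eye(len(d)) - s[:, None] * A * s[None, :]
    return float(np.linalg.eigvalsh(S)[1])   # eigvalsh returns eigenvalues in ascending order

# Hypothetical example: two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

C = cheeger_constant(A)
lam2 = lambda_2(A)
print(f"Cheeger constant: {C:.4f}, lambda_2: {lam2:.4f}")
print("C^2/2 <= lambda_2 <= 2C:", C**2 / 2 <= lam2 <= 2 * C)
```

In this graph the cheapest cut removes the bridge, so the constant equals one over the total degree of one triangle side (1/7), and the eigenvalue is correspondingly small.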

Example 1 (Erdős-Rényi graph).

Consider the Erdős and Rényi (1959) random-graph model, where edges between vertices are formed independently with probability . The threshold on for to be connected is (Hoffman, Kahle and Paquette 2013). That is, if for a constant , then, as , with probability approaching one, is disconnected if and connected if . In the former case, while, in the latter case, , almost surely.
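A small simulation can give a feel for the connectivity threshold in this example. The sketch below draws Erdős-Rényi graphs with edge probability equal to a constant times log(n)/n and reports the second-smallest eigenvalue of the normalized Laplacian. The graph size, the seed, the constants, and the helper names are arbitrary choices for illustration. As a convention assumed here, a graph with an isolated vertex (for which the normalized Laplacian is not defined) is reported as having eigenvalue zero, since it is disconnected.

```python
import numpy as np

def erdos_renyi_adjacency(n, p, rng):
    """Adjacency matrix of a simple undirected graph with independent edges."""
    upper = np.triu(rng.random((n, n)) < p, k=1)   # strictly upper triangle, no loops
    return (upper | upper.T).astype(float)

def lambda_2(A):
    """Second-smallest eigenvalue of the normalized Laplacian (0 if a vertex is isolated)."""
    d = A.sum(axis=1)
    if np.any(d == 0):
        return 0.0                                  # assumed convention: disconnected graph
    s = 1.0 / np.sqrt(d)
    S = np.eye(len(d)) - s[:, None] * A * s[None, :]
    # clip tiny negative rounding errors that can occur for disconnected graphs
    return max(float(np.linalg.eigvalsh(S)[1]), 0.0)

rng = np.random.default_rng(12345)
n = 400
results = {}
for c in (0.5, 2.0, 8.0):
    p = c * np.log(n) / n
    A = erdos_renyi_adjacency(n, p, rng)
    results[c] = lambda_2(A)
    print(f"c = {c:4.1f}, p = {p:.4f}, lambda_2 = {results[c]:.4f}")
```

In finite samples the eigenvalue for constants above one is typically well below its limit; the simulation is only meant to show the qualitative contrast between the two regimes.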

2.3 Two-way regression model on bipartite graph

To relate our model to our main motivating examples consider the case of a bipartite graph , i.e., and , and edges are formed only between the subsets and but not within. So, for an edge we necessarily have that and . A bipartite graph describes the interaction between two types of units, such as workers and firms or students and teachers. The outcome of interest here would typically be (log) wages or earnings and test scores, respectively. If we have panel data, so is a multigraph, we may observe workers match with different firms across time and observe students in different classrooms or across multiple subjects. In fact, in these applications, such longitudinal data are necessary for to be connected. A two-way regression model for such data is


where, with and , and are type-specific parameter vectors and the and matrices and have entries

This is a workhorse specification to capture heterogeneity across units in linked data sets. It can be cast into (2.1) by setting

sorting the units in by type so that we can write , and constructing the matrix by concatenation. Choosing the sign in front of is without loss of generality because links are only formed between, but never within, the subsets and . The need for a normalization built into our general specification arises here from the fact that (2.7) is invariant to reparametrizations of the form for any .

The two-way regression model provides an interesting example where a weighted graph arises naturally. In many applications the researcher is primarily interested in learning the parameters of one type, say those . This is so in teacher value-added models, for example. There, interest lies in estimating the teacher effects while controlling for unobserved student-specific heterogeneity through the inclusion of student effects (see, e.g., Jackson, Rockoff and Staiger 2014). Partialling-out the vector from the two-way model in (2.7) gives


From standard partitioned-regression theory, the least-squares estimation of from this equation is numerically identical to the one obtained from joint estimation of and in (2.7). However, the formulation in (2.8) is helpful in understanding the behavior of the estimator of . The properties of the matrix drive the sampling behavior of . This matrix is the Laplacian of a weighted one-mode projection (Newman, 2010, p. 124) of the bipartite graph on the vertices in .

It is instructive to discuss this one-mode projection in more detail and to formalize how it fits the general setup in (2.1). Projecting the bipartite graph on is done by suppressing the vertices in . This gives a new (unipartite) graph, say . Each edge pair with and in for some and gives rise to a single edge in . In the student-teacher example, two teachers and are connected by an edge in if and only if there exists at least one student that they have both taught. Alternatively, the edge exists because and both connect to the same vertex . Given the edges this connecting vertex is unique; for later use we denote it by . In we have

edges; need not equal and, indeed, may be much larger. We again label by the natural number . The process of concatenating edges in to form the new edge set can be described by the matrix with entries

Choosing the orientation of the rows of is without loss of generality. The matrix has a first-differencing interpretation. Indeed, when applied to the two-way regression model (2.7) we get So, sweeps out the nuisance parameters and transforms original outcomes into first differences. The matrix is the (oriented) incidence matrix of an unweighted graph, and this first-differenced regression equation fits (2.1). [Footnote 8: may contain rows with only zero entries. These correspond to differenced outcomes that do not depend on . Dropping the associated differenced outcomes from restores the incidence matrix interpretation of . This operation does not affect estimation of and so is irrelevant for our purposes. However, the differenced outcomes may still provide information on , which is why we prefer to work with as defined here.] Applying least-squares directly to the first differences is inefficient and is not equivalent to estimation of the two-way regression model. Ordinary least-squares estimation of (2.8) is numerically equivalent to weighted least-squares estimation of the first-differenced equation. The relevant diagonal weight matrix has entries Ordinary least-squares applied to (2.8) and to


yields the same result. Here, is the incidence matrix of a weighted one-mode projection of . This determines the properties of the least-squares estimator. Its Laplacian is

where we use the fact that .

The adjacency matrix of is the matrix with entries

Here, is the set of all vertices in that are connected to both and in the original bipartite graph . In the student-teacher example two teachers are connected by an edge if there is at least one student who was taught by both teachers. The weight of the edge is larger the more students there are connecting teachers and , and the more courses they have taken from these teachers. determines the accuracy with which teacher value-added can be estimated.
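The one-mode projection can be made concrete with a few lines of linear algebra. The sketch below partials the student dummies out of the teacher dummies in a tiny, made-up student-teacher data set and checks that the resulting cross-product matrix is the Laplacian of a weighted teacher graph. The edge weights used in the check (for each shared student, the product of the two teachers' observation counts divided by that student's total number of observations) follow from the partialling-out algebra; the variable names and the data are hypothetical.

```python
import numpy as np

# Each observation is a (teacher, student) pair; teacher 0 sees student 0 twice.
obs = [(0, 0), (0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (0, 2), (2, 3)]
n_teachers, n_students = 3, 4
m = len(obs)

B1 = np.zeros((m, n_teachers))     # teacher dummies
B2 = np.zeros((m, n_students))     # student dummies
for e, (t, s) in enumerate(obs):
    B1[e, t] = 1.0
    B2[e, s] = 1.0

# Annihilator of the student dummies (within-student demeaning).
M2 = np.eye(m) - B2 @ np.linalg.solve(B2.T @ B2, B2.T)
L = B1.T @ M2 @ B1                 # matrix driving the teacher-effect estimator

# Weighted one-mode projection built directly from observation counts.
counts = B1.T @ B2                 # counts[t, s] = observations linking teacher t and student s
per_student = counts.sum(axis=0)   # total observations per student
W = (counts / per_student) @ counts.T
A = W - np.diag(np.diag(W))        # adjacency of the projected graph (no loops)
d = A.sum(axis=1)                  # degrees in the projected graph

print("projected adjacency:\n", np.round(A, 3))
print("L equals D - A:", np.allclose(L, np.diag(d) - A))
```

Student 3 is observed with a single teacher only and so contributes nothing to either the edge weights or the degrees of the projected graph, which is the limited-mobility problem in miniature.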

The matrix is also the adjacency matrix of the simple graph obtained from by replacing all edges by one weighted edge, with weight . Figure 1 provides an illustration of a simple bipartite graph for students (red vertices) and teachers (yellow vertices), given in the left plot, and this induced weighted graph featuring only teachers, given in the right plot. The thickness of the edge between in the latter plot reflects the magnitude of the weight .

Figure 1: A simple unweighted bipartite graph (left) with links between (red vertices) and (yellow vertices), and the induced weighted graph (right) on alone resulting from profiling out the parameters associated with .

The device of a one-mode projection highlights the importance of having movers in panel data. In matched worker-firm data sets workers do not frequently switch employer over the course of the sampling period. This lack of mobility is one cause of the substantial bias that is observed in the correlation coefficient between (estimated) worker and firm effects (Abowd, Kramarz, Lengermann and Perez-Duarte 2004, Andrews, Gill, Schank and Upward 2008, 2012). While this is now well recognized, limited mobility has consequences more broadly. Indeed, it implies that few workers connect firms in the one-mode firm projection. Therefore, the induced graph may be only weakly connected (and $\lambda_2$, the smallest non-zero eigenvalue of its normalized Laplacian, will be close to zero) and the variance of the estimator of the firm effects may be large. This is not only detrimental for identifying sorting between workers and firms but, indeed, complicates estimation of and inference on the firm effects as well as all their moments, such as their variance. Restricting attention to large firms need not resolve this problem. An analogous argument holds for teacher effects and their estimated variance, and so for our ability to infer the contribution of teacher value-added to observed variation in test scores. We illustrate this on our data below.

3 Variance bound and asymptotic analysis

3.1 Finite-sample bound

To work towards general distribution theory it is instructive to start with a finite-sample bound on the variance of the fixed-effect estimator when the errors are homoskedastic and uncorrelated. Let


This is a (weighted) harmonic mean of the (weighted) degrees of all . Note that, for a given vertex , is increasing in the degree of its direct neighbors.

Theorem 2 (Variance bound).

Let be connected. Suppose that . Then

Theorem 2 states that, for a given degree and global connectivity measure , the upper bound on the variance of is smaller if the direct neighbors of vertex are themselves more strongly connected to other vertices in the network. The theorem provides insight into how the local connectivity structure of the network, around vertex , affects statistical precision.
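The quantities entering this discussion are easy to compute for a given graph. The sketch below works with an unweighted simple graph and unit error variance, and takes the local measure to be the harmonic mean of the degrees of a vertex's direct neighbors, which is our reading of the definition in that special case. It compares the exact variance of each estimated effect with the first-order term (one over the degree) and with a crude spectral upper bound that follows from bounding all non-zero eigenvalues of the normalized Laplacian from below by the smallest one. This crude bound is not the bound of Theorem 2; it is included only because it can be verified directly. The graph and variable names are hypothetical.

```python
import numpy as np

# Hypothetical graph: vertex 0 is a hub; vertex 3 hangs off the hub.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
d = A.sum(axis=1)

# Harmonic mean of the neighbors' degrees (unweighted simple graph assumed).
h = d / (A @ (1.0 / d))

# Normalized Laplacian, its spectrum, and the rescaled pseudoinverse of the Laplacian.
s = 1.0 / np.sqrt(d)
S = np.eye(n) - s[:, None] * A * s[None, :]
lam2 = float(np.linalg.eigvalsh(S)[1])
L_star = s[:, None] * np.linalg.pinv(S, rcond=1e-10, hermitian=True) * s[None, :]

exact = np.diag(L_star)                              # exact variances (unit error variance)
first_order = 1.0 / d                                # leading term
spectral_bound = (1.0 - d / d.sum()) / (d * lam2)    # crude bound, not Theorem 2

for i in range(n):
    print(f"vertex {i}: d = {d[i]:.0f}, h = {h[i]:.2f}, exact = {exact[i]:.3f}, "
          f"1/d = {first_order[i]:.3f}, crude bound = {spectral_bound[i]:.3f}")
```

The pendant vertex has the smallest degree and therefore the largest variance, even though its single neighbor is the best-connected vertex in the graph.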

Example 1 (cont’d).

Consider the Erdős and Rényi (1959) random-graph model with for . Let be a randomly chosen vertex. Then, as , we have, almost surely,


follows from Theorem 2.

Additional calculations for analytically-tractable cases where as the network grows are provided in the supplementary material.

Theorem 2 highlights the importance of as a sufficient condition for the parametric rate to be attainable for estimation of . This result carries over to the model with covariates. Let

where denotes the spectral norm and . Note that is a measure of non-collinearity between the columns of and , with close to zero indicating near-collinearity. Indeed, while measures the total variation in , captures the residual variation in after its linear dependence on has been partialled out. For let be column of , and let and in the following theorem.

Theorem 3 (Variance bound (cont’d.)).

Let be connected. Suppose that , , and . Then

for all .

This result shows that, if is bounded away from zero, introducing covariates only has a higher-order effect on the statistical precision of the fixed-effect estimator. In particular we have


provided that as grows. Furthermore, the parametric rate is achievable even if is not treated as fixed, and becomes less dense as more vertices are added to the network.

3.2 Large-sample analysis

We now discuss asymptotic results under more general conditions on the regression errors. The following theorem provides a first-order representation of . Let .

Theorem 4 (First-order representation).

Let be connected. Assume that and . Suppose that and that . Then

where and are zero-mean random variables that satisfy , and .

We now consider sequences of growing networks such that


These are relatively weak conditions that ensure that the fact that is estimated can be ignored in large samples. Moreover, they imply that

and that . The main implication of the theorem is that, then, under the now familiar condition , as ,

Thus, the fixed-effect estimator of is asymptotically equivalent to an average of rescaled regression errors associated with the edges that involve vertex . This result allows the errors to be heteroskedastic and correlated.

With Theorem 4 in hand the limit distribution of can be deduced under conventional conditions. As an example we do so next for independent but heterogeneously distributed regression errors.

Theorem 5 (Limit distribution for i.n.i.d. errors).

Let the assumptions of Theorem 4 and the conditions in (3.12) hold. Suppose that the regression errors are independent, have bounded fourth-order moments, and variances bounded away from zero, and that the edge weights are bounded away from zero and from infinity. Then

as , provided that .

When the errors are independent and homoskedastic we have and the variance in the theorem reduces to , which agrees with (3.11).

A plug-in estimator of is , where and are the residuals from the least-squares regression. This involves estimation of for all . We have that

as provided that, in addition to the conditions of Theorem 5 holding, we have that is bounded away from zero, where

is a weighted harmonic mean. At the heart of this result lies (a local version of) a global convergence rate on , which is interesting in its own right. More precisely, letting

it is easy to see that

provided that is bounded away from zero.

3.3 Estimation of moments

Suppose that the are sampled from some distribution. One might be interested to learn the variance of this distribution as in, say, Rockoff (2004) or Card, Heining and Kline (2013), or some other moment. The typical estimator is the corresponding sample moment of the estimated effects. Sampling noise in the estimated individual effects will introduce bias in the moment estimator, however. To see this, consider estimation of the variance in a simple model without regressors. The sample variance of the estimated effects in this case is

where is the usual demeaning matrix. When its bias is

which clearly shows how imprecise estimation of contributes to the bias in the variance estimator.
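The bias term can be computed directly from the graph. The sketch below assumes homoskedastic, uncorrelated errors with unit variance, no regressors, and a sample variance normalized by the number of vertices minus one (the normalization is an assumption made for the example). Under these assumptions the bias equals the trace of the demeaning matrix times the variance matrix of the estimator, divided by the same normalization. The code also evaluates the "sandwich" form of this expectation as a consistency check and reports a first-order approximation based on the inverse degrees. Graph and names are hypothetical.

```python
import numpy as np

# Hypothetical connected graph: a cycle on 6 vertices with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3)]
n, m = 6, len(edges)
B = np.zeros((m, n))
for e, (i, j) in enumerate(edges):
    B[e, i], B[e, j] = 1.0, -1.0
L = B.T @ B
d = np.diag(L).copy()

# Variance matrix of the estimator for unit error variance (assumed pseudoinverse form).
s = 1.0 / np.sqrt(d)
S = s[:, None] * L * s[None, :]
L_star = s[:, None] * np.linalg.pinv(S, rcond=1e-10, hermitian=True) * s[None, :]

M = np.eye(n) - np.ones((n, n)) / n                 # demeaning matrix

bias_exact = np.trace(M @ L_star) / (n - 1)         # bias per unit of error variance
bias_sandwich = np.trace(B @ L_star @ M @ L_star @ B.T) / (n - 1)
bias_first_order = np.sum(1.0 / d) / (n - 1)        # approximation via inverse degrees

print(f"exact bias factor:       {bias_exact:.4f}")
print(f"sandwich form:           {bias_sandwich:.4f}")
print(f"first-order approximation: {bias_first_order:.4f}")
```

Multiplying the exact factor by an estimate of the error variance gives the quantity one would subtract in a bias-corrected variance estimator under these assumptions.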

It is difficult to derive an exact expression for the bias for more general functionals. Theorem 5 is instrumental here. Suppose that is of interest. Its plug-in estimator is

Under the conditions of the theorem we can calculate the leading bias in this estimator as

where denotes the second derivative, provided . Simple sufficient conditions on for this bias result to hold are that it is differentiable with and bounded third derivative. So, quite generally, the bias will shrink like . Therefore, for the bias to vanish, we need that the degrees of the individual vertices grow with for an increasing fraction of the vertices.

The bias will typically be non-negligible relative to the standard error. If the functional of interest is the variance, an exact bias correction can be performed (see Andrews, Gill, Schank and Upward 2008 and Kline, Saggio and Sølvsten 2018). For functionals like , a plug-in estimator of the leading-order bias is easily formed and so an adjusted estimator is readily constructed. Its effectiveness as a bias-reduction device will again depend on the connectivity structure of the graph. We postpone a detailed analysis to future work. In a recent contribution Kline, Saggio and Sølvsten (2018) present limit theory for quadratic forms in .

4 Empirical illustrations

4.1 Teacher value-added

We construct a graph connecting teachers as the (weighted) one-mode projection from matched student-teacher data from the North Carolina Education Research Center. The projection of interest is the one discussed in Section 2.3. The full data set includes scores for a standardized test in reading in elementary schools in North Carolina and was used by Verdier (2017) to estimate the effect of class-size reduction on student performance. The analysis conducted here is useful to assess the precision with which teacher value-added can be estimated. The data concern pupils in Grades 4 and 5 of elementary school over the period 2008–2012. The full teacher graph (with a single weighted edge between neighboring teachers, as in Figure 1) has 12,057 vertices and 53,741 edges and is disconnected. The largest connected component involves 41,612 edges between 11,945 teachers and we work with this subgraph. With the projected teacher graph is weakly connected. Its local connectivity is summarized in Table 1. The table contains the mean, standard deviation, and deciles of the relevant degree distributions. Inspection reveals that the degrees are small for virtually all teachers.

                 deciles
    mean   stdev     1st     2nd     3rd     4th     5th     6th     7th     8th     9th
   13.87   10.76    3.00    5.50    7.50    9.00   11.00   14.00   17.50   21.50   27.50
    7.15    7.13    2.43    3.30    4.01    4.72    5.48    6.36    7.44    9.12   12.56
   36.48   58.59    3.03    5.72   10.48   14.76   19.81   26.20   35.67   50.65   83.48
Table 1: Summary statistics for the teacher graph
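As a rough illustration of the preprocessing behind such a table, the following Python sketch extracts the largest connected component from a weighted edge list and summarizes the distribution of the weighted degrees by its mean, standard deviation, and deciles. The toy edge list, the function names `largest_component` and `degree_summary`, the definition of the degree as the sum of incident edge weights, and the use of the sample standard deviation are all assumptions made for illustration; the teacher graph itself is not reproduced here.

```python
from collections import defaultdict, deque
import statistics


def largest_component(edges):
    """Return the vertex set of the largest connected component.

    `edges` is an iterable of (u, v, weight) triples of an undirected graph.
    """
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from a vertex that has not been visited yet.
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def degree_summary(edges, vertices):
    """Mean, sample standard deviation, and deciles of weighted degrees."""
    deg = defaultdict(float)
    for u, v, w in edges:
        if u in vertices and v in vertices:
            deg[u] += w  # assumed: degree = sum of incident edge weights
            deg[v] += w
    values = [deg[v] for v in vertices]
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "deciles": statistics.quantiles(values, n=10),  # nine cut points
    }


# Hypothetical toy data: two components, the larger one on {a, b, c, d}.
toy_edges = [("a", "b", 2.0), ("b", "c", 1.0), ("c", "d", 1.0),
             ("a", "c", 0.5), ("x", "y", 3.0)]
core = largest_component(toy_edges)
summary = degree_summary(toy_edges, core)
print(sorted(core), summary["mean"])
```

On real data the edge list would typically be held in a sparse matrix, but the logic of restricting attention to the largest component before summarizing local connectivity is the same.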

The weak connectivity suggests that inference on teacher value-added will be difficult. To get a sense of the precision of a first-order asymptotic approach we can look at the ratio

This is the ratio of the exact variance of to its large-sample approximation in a regression model with homoskedastic and uncorrelated errors. This ratio is free of and can be computed directly from the graph. The left plot of Figure 2 shows the deciles of the distribution of . The asymptotic approximation is revealed to be highly inaccurate. On average, the actual variance is about 2.5 times larger than its approximation. Even the first decile equals . This implies that confidence intervals based on the large-sample arguments in Theorem 4 are overoptimistic. To illustrate this, the right plot in Figure 2 gives the distribution of the width of 95% confidence intervals for the using both the exact variance (blue line) and its large-sample approximation (red line) for the case . The former stochastically dominates the latter.
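A ratio of this kind can be sketched numerically as follows. The code rests on two assumptions that are not spelled out in this section and should be checked against the definitions given earlier in the paper: that the exact variance of the estimator for vertex i under homoskedastic, uncorrelated errors is the error variance times the i-th diagonal entry of a degree-normalized pseudoinverse of the graph Laplacian, and that the large-sample approximation is the error variance divided by the degree of vertex i. The function name `variance_ratio` is hypothetical.

```python
import numpy as np


def variance_ratio(A):
    """Ratio d_i * (L*)_{ii} for each vertex of a connected weighted graph.

    Assumptions (not taken from this section): the exact variance is
    sigma^2 * (L*)_{ii} with L* = D^{-1/2} (D^{-1/2} L D^{-1/2})^+ D^{-1/2},
    and the large-sample approximation is sigma^2 / d_i.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # weighted degrees
    L = np.diag(d) - A                     # graph Laplacian
    s = 1.0 / np.sqrt(d)
    S = s[:, None] * L * s[None, :]        # normalized Laplacian
    L_star = s[:, None] * np.linalg.pinv(S) * s[None, :]
    return d * np.diag(L_star)             # sigma^2 cancels in the ratio


# Complete graph on 5 vertices: the ratio is (n - 1)^2 / n^2 = 0.64 everywhere.
n = 5
K = np.ones((n, n)) - np.eye(n)
ratios = variance_ratio(K)
print(ratios)
```

Because the error variance cancels, the function needs only the adjacency matrix, which is consistent with the remark that the ratio can be computed directly from the graph.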

Figure 2: Deciles of the distribution of (left plot) and empirical distributions of the width of confidence bands (right plot). The width is calculated as (blue curve) and (red curve).

The large variability in the estimators of teacher value-added implies a large bias in their estimated variance. We calculate

so the bias in the plug-in estimator of the variance is about one-third of the error variance when . The large-sample approximation to the bias here is proportional to . With this yields a bias of about 18%, roughly half the size of the exact bias.

4.2 Occupational network

Wage regressions on worker and occupational dummies (as in Kambourov and Manovskii 2009, for example) provide an interesting example of a situation where more accurate results can be obtained. We use all 18 available waves from the BHPS (for a total of 132,097 observations) to construct the induced (weighted) occupational network. The Standard Occupational Classification (SOC90) in the BHPS distinguishes (at the three-digit level) between 374 occupations. We again focus on the largest connected component, which contains 365 occupations with 14,825 weighted edges between them. As a measure of global connectivity here we find . Compared to traditional matched employer-employee data, our occupational network does not suffer as much from limited mobility. One reason is that the number of occupations is relatively small compared to the number of workers. Another is that workers may also switch occupation while remaining employed by the same firm, for example through internal promotions. Finally, as we deal with self-reported occupations, there is also the possibility of spurious mobility due to misreporting. A look at the distributions summarized in Table 2 reveals that the degrees and harmonic means tend to be larger here than in the teacher graph.

                 deciles
    mean   stdev     1st     2nd     3rd     4th     5th     6th     7th     8th     9th
  155.06  268.82    7.35   16.08   27.68   45.68   67.10   92.66  143.99  212.09  402.01
   81.50  125.20   14.52   20.88   27.25   35.16   46.23   60.61   80.72  113.67  163.74
  213.77  455.93   11.15   22.90   35.29   48.59   66.34  102.82  172.75  296.80  539.07
Table 2: Summary statistics for the occupation graph

The distribution of now places most of its mass in close vicinity of unity. Its mean and standard deviation are and . The median is while the first and ninth decile are and , respectively. This suggests that, here, the large-sample approximation to the variance is a much more accurate reflection of actual estimation uncertainty. Similarly, we may again calculate , which is about 7 times smaller than in the previous example. Further, as , here, the bias approximation is quite accurate.


References

  • Abowd, J., R. Creecy, and F. Kramarz (2002). Computing person and firm effects using linked longitudinal employer-employee data. U.S. Census Technical Paper TP-2002-06.
  • Abowd, J., F. Kramarz, P. Lengermann, and S. Perez-Duarte (2004). Are good workers employed by good firms? A test of a simple assortative matching model for France and the United States. Mimeo.
  • Abowd, J. M., F. Kramarz, and D. N. Margolis (1999). High wage workers and high wage firms. Econometrica 67, 251–333.
  • Amiti, M. and D. E. Weinstein (2018). How much do idiosyncratic bank shocks affect investment? Evidence from matched bank-firm loan data. Journal of Political Economy 126, 525–587.
  • Andrews, M. J., L. Gill, T. Schank, and R. Upward (2008). High wage workers and low wage firms: Negative assortative matching or limited mobility bias. Journal of the Royal Statistical Society, Series A 171, 673–697.
  • Andrews, M. J., L. Gill, T. Schank, and R. Upward (2012). High wage workers match with high wage firms: Clear evidence of the effects of limited mobility bias. Economics Letters 117, 824–827.
  • Auerbach, E. (2016). Identification and estimation of models with endogenous network formation. Mimeo.
  • Bramoullé, Y., R. Kranton, and M. D’Amours (2014). Strategic interaction and networks. American Economic Review 104, 898–930.
  • Butler, S. (2016). Algebraic aspects of the normalized Laplacian. In A. Beveridge, J. R. Griggs, L. Hogben, G. Musiker, and P. Tetali (Eds.), Recent Trends in Combinatorics, pp. 295–315. Springer.
  • Card, D., J. Heining, and P. Kline (2013). Workplace heterogeneity and the rise of West German wage inequality. Quarterly Journal of Economics 128, 967–1015.
  • Chetty, R. and N. Hendren (2018). The impacts of neighborhoods on intergenerational mobility II: County-level estimates. Forthcoming in Quarterly Journal of Economics.
  • Chung, F. R. K. (1997). Spectral Graph Theory. Volume 92 of CBMS Regional Conference Series in Mathematics, American Mathematical Society.
  • Erdős, P. and A. Rényi (1959). On random graphs. Publicationes Mathematicae 6, 290–297.
  • Finkelstein, A., M. Gentzkow, and H. Williams (2016). Sources of geographic variation in health care: Evidence from patient migration. Quarterly Journal of Economics 131, 1681–1726.
  • Friedland, S. and R. Nabben (2002). On Cheeger-type inequalities for weighted graphs. Journal of Graph Theory 41, 1–17.
  • Hoffman, C., M. Kahle, and E. Paquette (2013). Spectral gaps of random graphs and applications to random topology. Mimeo.
  • Jackson, C. K., J. E. Rockoff, and D. O. Staiger (2014). Teacher effects and teacher related policies. Annual Review of Economics 6, 801–825.
  • Kambourov, G. and I. Manovskii (2009). Occupational specificity of human capital. International Economic Review 50, 63–115.
  • Kettemann, A., A. I. Mueller, and J. Zweimüller (2017). Wages, workers and vacancy durations: Evidence from linked data. Mimeo.
  • Kline, P., R. Saggio, and M. Sølvsten (2018). Leave-out estimation of variance components. Mimeo.
  • Newman, M. E. J. (2010). Networks. Oxford University Press.
  • Nimczik, J. (2018). Job mobility networks and endogenous labor markets. Mimeo.
  • Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. American Economic Review 94, 247–252.
  • Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics 125, 175–214.
  • Schiebinger, G., M. J. Wainwright, and B. Yu (2015). The geometry of kernelized spectral clustering. Annals of Statistics 43, 819–846.
  • Verdier, V. (2017). Estimation and inference for linear models with two-way fixed effects and sparsely matched data. Mimeo.






Appendix S.1 Additional illustrations

Recall that our measure of global connectivity of the graph is , the second smallest eigenvalue of the normalized Laplacian matrix. In the following we provide some concrete examples of graphs for which can be explicitly calculated, and we discuss the implications of our variance bound in Theorem 2.

Our first example illustrates that, even if with the sample size, we may still have that .

Example S.1 (Hypercube graph).

Consider the -dimensional hypercube, where each of vertices is involved in edges; see the left hand side of Figure S.1. This is an -regular graph — that is, for all — with the total number of edges in the graph equaling . Here,

Thus, is constant in . An application of Theorem 2 yields

From this, we obtain the convergence rate result .
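The connectivity of the hypercube can be checked numerically with a short Python sketch. It relies on the standard spectral fact that the adjacency matrix of the N-dimensional hypercube has eigenvalues N − 2k for k = 0, …, N, so that the normalized Laplacian has eigenvalues 2k/N and its second smallest eigenvalue is 2/N. The function names `lambda2` and `hypercube` are hypothetical, and the code is an illustration under these assumptions, not the computation used in the paper.

```python
import numpy as np


def lambda2(A):
    """Second smallest eigenvalue of I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(A, dtype=float)
    s = 1.0 / np.sqrt(A.sum(axis=1))
    S = np.eye(len(A)) - s[:, None] * A * s[None, :]
    return np.sort(np.linalg.eigvalsh(S))[1]


def hypercube(N):
    """Adjacency matrix of the N-dimensional hypercube on 2**N vertices."""
    n = 2 ** N
    A = np.zeros((n, n))
    for i in range(n):
        for b in range(N):
            A[i, i ^ (1 << b)] = 1.0  # neighbors differ in exactly one bit
    return A


for N in range(2, 7):
    A = hypercube(N)
    lam = lambda2(A)
    # Columns: dimension, common degree, lambda_2, lambda_2 times degree.
    print(N, int(A.sum(axis=1)[0]), round(lam, 6), round(lam * N, 6))
```

In this sketch the second smallest eigenvalue shrinks as the dimension grows, while its product with the common degree N stays equal to 2.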

Figure S.1: Three-dimensional hypercube (left) and extended hypercube (right).

Theorem 2 allows us to establish the convergence rate for the hypercube, but the conditions are too stringent to obtain (3.11). The reason is that does not increase fast enough to ensure that . The following example deals with an extended hypercube and illustrates that, despite , we still have in this case.

Example S.2 (Extended Hypercube graph).

Start with the -dimensional hypercube from the previous example and add edges between all path-two neighbors in ; see the right hand side of Figure S.1 for an example. The resulting graph still has vertices, but now has edges. Here,

so that holds, despite as . Theorem 2 therefore implies (3.11) in this example.
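The extended hypercube can be examined in the same way. In the Python sketch below, two vertices are joined whenever their binary labels differ in one or two positions, which is one way to encode "hypercube edges plus edges between path-two neighbors". Under this construction each vertex has degree N(N+1)/2, and a direct calculation gives a second smallest normalized-Laplacian eigenvalue of 4/(N+1). Both the construction and the closed form are stated here as assumptions to be compared with the display in the example; the function names are hypothetical.

```python
import numpy as np


def lambda2(A):
    """Second smallest eigenvalue of I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(A, dtype=float)
    s = 1.0 / np.sqrt(A.sum(axis=1))
    S = np.eye(len(A)) - s[:, None] * A * s[None, :]
    return np.sort(np.linalg.eigvalsh(S))[1]


def extended_hypercube(N):
    """Hypercube on 2**N vertices plus edges between vertices at distance two."""
    n = 2 ** N
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Hamming distance between the binary labels of i and j.
            if bin(i ^ j).count("1") in (1, 2):
                A[i, j] = 1.0
    return A


for N in range(2, 7):
    A = extended_hypercube(N)
    degree = A.sum(axis=1)[0]
    lam = lambda2(A)
    # Columns: dimension, common degree, lambda_2, lambda_2 times degree.
    print(N, int(degree), round(lam, 6), round(lam * degree, 6))
```

Here the eigenvalue still tends to zero as N grows, but its product with the degree equals 2N and therefore diverges, in contrast with the plain hypercube.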

The next example shows that our bound can still be informative if is finite.

Figure S.2: Star graph (left) and Wheel graph (right) for .

Example S.3 (Star graph).

Consider a Star graph around the central vertex , that is, the graph with vertices and edges

see the left hand side of Figure S.2. Here, for any while , and , for . For one finds that the bounds in Theorem 2 imply that , and so

In contrast, for we find and thus, although (3.11) holds, these cannot be estimated consistently as .

The previous example also illustrates that can be large despite having many vertices with small degrees. It is largely due to this property that we prefer to measure global connectivity by and not by the “algebraic connectivity” (the second smallest eigenvalue of ; see, e.g., Chung 1997), which has been studied more extensively.
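A small numerical comparison makes this point concrete. The Python sketch below computes the second smallest eigenvalue of the normalized Laplacian for star graphs of increasing size and, as a contrast that is not part of the example above, for path graphs with the same number of vertices. In both families almost all vertices have degree one or two. The function names are hypothetical and the path graph is included purely for illustration.

```python
import numpy as np


def lambda2(A):
    """Second smallest eigenvalue of I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(A, dtype=float)
    s = 1.0 / np.sqrt(A.sum(axis=1))
    S = np.eye(len(A)) - s[:, None] * A * s[None, :]
    return np.sort(np.linalg.eigvalsh(S))[1]


def star(n):
    """Star graph: vertex 0 is joined to each of the other n - 1 vertices."""
    A = np.zeros((n, n))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    return A


def path(n):
    """Path graph: vertex i is joined to vertex i + 1."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A


for n in (5, 50, 200):
    # Columns: number of vertices, lambda_2 of the star, lambda_2 of the path.
    print(n, round(lambda2(star(n)), 6), round(lambda2(path(n)), 6))
```

For the star the eigenvalue stays at one regardless of the number of vertices, whereas for the path it falls toward zero as the graph grows, even though the degrees are small in both cases.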

Our last example shows the effect on the upper bound in Theorem 2 when neighbors themselves are more stron