Fuzzy Differences-in-Differences

This paper is a merged and revised version of deChaisemartin14 and deChaisemartin13. We thank Yannick Guyonvarch for outstanding research assistance, and are very grateful to Esther Duflo for sharing her data with us. We also want to thank the editor, five anonymous referees, Alberto Abadie, Joshua Angrist, Marc Gurgand, Guido Imbens, Rafael Lalive, Thierry Magnac, Blaise Melly, Roland Rathelot, Bernard Salanié, Frank Vella, Fabian Waldinger, Yichong Zhang, and participants at various conferences and seminars for their helpful comments.

Clément de Chaisemartin (University of California at Santa Barbara, clementdechaisemartin@ucsb.edu)
Xavier D’Haultfœuille (CREST, xavier.dhaultfoeuille@ensae.fr)
July 27, 2019
Abstract

Difference-in-differences (DID) is a method to evaluate the effect of a treatment. In its basic version, a “control group” is untreated at two dates, whereas a “treatment group” becomes fully treated at the second date. However, in many applications of the DID method, the treatment rate only increases more in the treatment group. In such fuzzy designs, a popular estimator of treatment effects is the DID of the outcome divided by the DID of the treatment. We show that this ratio identifies a local average treatment effect only if two homogeneous treatment effect assumptions are satisfied. We then propose two alternative estimands that do not rely on any assumption on treatment effects, and that can be used when the treatment rate does not change over time in the control group. We prove that the corresponding estimators are asymptotically normal. Finally, we use our results to revisit Duflo (2001).

Keywords: differences-in-differences, control group, changes-in-changes, quantile treatment effects, partial identification, returns to education.

JEL Codes: C21, C23

1 Introduction

Difference-in-differences (DID) is a method to evaluate the effect of a treatment in the absence of experimental data. In its basic version, a “control group” is untreated at two dates, whereas a “treatment group” becomes treated at the second date. If the trend on the outcome is the same in both groups, the so-called common trends assumption, one can measure the effect of the treatment by comparing the evolution of the outcome in the two groups.

However, in many applications of the DID method the treatment rate increases more in some groups than in others between the two dates, but there is no group that experiences a sharp change in treatment, and there is also no group that remains fully untreated. In such fuzzy designs, a popular estimator of treatment effects is the DID of the outcome divided by the DID of the treatment, the so-called Wald-DID estimator. Other popular estimation methods are 2SLS or group-level OLS regressions with time and group fixed effects. In deChaisemartin16, we show that these regressions estimate a weighted average of Wald-DIDs across groups. We also show that 10.1% of all papers published by the American Economic Review between 2010 and 2012 estimate either a simple Wald-DID, or the aforementioned IV or OLS regression. Despite its popularity, to our knowledge no paper has studied under which condition the Wald-DID estimand identifies a causal effect in a model with heterogeneous treatment effects.

This paper makes the following contributions. Hereafter, let “switchers” refer to units that become treated at the second date. We start by showing that the Wald-DID estimand identifies the local average treatment effect (LATE) of treatment group switchers only if two treatment effect homogeneity assumptions are satisfied, on top of the usual common trends assumption. First, the LATE of units treated at both dates must not change over time. Second, when the share of treated units changes between the two dates in the control group, the LATEs of treatment and control group switchers must be equal. Then, we propose two alternative estimands of the same LATE. They do not rely on any treatment effect homogeneity assumption, and they can be used when the share of treated units is stable in the control group. The first one, the time-corrected Wald ratio (Wald-TC), relies on common trends assumptions within subgroups of units sharing the same treatment at the first date. The second one, the changes-in-changes Wald ratio (Wald-CIC), generalizes the changes-in-changes (CIC) estimand introduced by Athey06 to fuzzy designs. It relies on the assumption that a control and a treatment group unit with the same outcome and the same treatment at the first date will also have the same outcome at the second date. (Footnote 1: Strictly speaking, the assumption underlying the CIC and Wald-CIC estimands is slightly weaker than what we describe. We still find this presentation helpful for pedagogical purposes.) Finally, we discuss how researchers can choose between the Wald-DID, Wald-TC, and Wald-CIC estimands.

We then extend these identification results in several important directions. We start by showing how our results can be used in applications with more than two groups. We also show that our results extend to applications with a non-binary treatment variable. Finally, we show that under the same assumptions as those underlying the Wald-TC and Wald-CIC estimands, the LATE of treatment group switchers is partially identified when the share of treated units changes over time in the control group. We also consider estimators of the Wald-DID, Wald-TC, and Wald-CIC, and we derive their limiting distributions. (Footnote 2: A Stata package computing the estimators is available on the authors’ webpages.)

Finally, we use our results to revisit findings in Duflo01 on returns to education. Years of schooling increased substantially in the control group used by the author, so we show that her estimate heavily relies on the assumption that returns to schooling are the same in her two groups. As we argue in more detail later, this assumption might not be applicable in this context. The bounds we propose do not rely on this assumption, but they are wide and uninformative, here again because schooling increased in the author’s control group. Therefore, we form a new control group where years of schooling did not change. The Wald-DID with our new groups is twice as large as the author’s original estimate. The Wald-TC and Wald-CIC lie in-between the two. The Wald-DID relies on the assumption that returns to schooling do not change over cohorts, which rules out decreasing returns to experience. Because the Wald-TC and Wald-CIC do not rely on this assumption, we choose them as our favorite estimates.

Overall, our paper shows that researchers who use the DID method with fuzzy groups can obtain estimates not resting on any treatment effect homogeneity assumption, provided they can find a control group whose exposure to the treatment does not change over time.

Though we are the first to study fuzzy DID estimators in models with heterogeneous treatment effects, our paper is related to several other papers in the DID literature. Blundell04 and Abadie05 consider a conditional version of the common trends assumption in sharp DID designs, and adjust for covariates using propensity score methods. Our Wald-DID estimator with covariates is related to their estimators. bonhomme2011 consider a linear model allowing for heterogeneous effects of time, and show that in sharp designs it can be identified if the idiosyncratic shocks are independent of the treatment and of the individual effects. Our Wald-CIC estimator builds on Athey06. In work posterior to ours, Dhault13 study the possibly nonlinear effects of a continuous treatment, and propose an estimator related to our Wald-CIC estimator.

The remainder of the paper is organized as follows. Section 2 presents our main identification results in a simple setting with two groups, two periods, and a binary treatment. Section 3 presents identification results in applications with more than two groups, as well as other extensions. Section 4 presents estimation and inference. In section 5 we revisit results from Duflo01. Section 6 concludes. The appendix gathers the main proofs. Due to a concern for brevity, further identification and inference results, two additional empirical applications, and additional proofs are deferred to our supplementary material.

2 Identification

2.1 Framework

We are interested in measuring the effect of a treatment $D$ on some outcome. For now, we assume that the treatment is binary. $Y(1)$ and $Y(0)$ denote the two potential outcomes of the same unit with and without treatment. The observed outcome is $Y = D Y(1) + (1 - D) Y(0)$.

Hereafter, we consider a model best suited for repeated cross sections. This model also applies to single cross sections where cohort of birth plays the role of time, as in Duflo01 for instance. The extension to panel data is sketched in Subsection 3.4 and developed in our supplementary material. We assume that the data can be divided into “time periods” represented by a random variable $T$, and into groups represented by a random variable $G$. In this section and in the next, we focus on the simplest possible case where there are only two groups, a “treatment” and a “control” group, and two periods of time. $G$ is a dummy for units in the treatment group and $T$ is a dummy for the second period.

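We now introduce the notation we use throughout the paper. For any random variable $R$, let $\mathcal{S}(R)$ denote its support. Let also $R_{gt}$ and $R_{dgt}$ be two other random variables such that $R_{gt} \sim R \mid G = g, T = t$ and $R_{dgt} \sim R \mid D = d, G = g, T = t$, where $\sim$ denotes equality in distribution. For instance, it follows from those definitions that $E(R_{11}) = E(R \mid G = 1, T = 1)$, while $E(R_{011}) = E(R \mid D = 0, G = 1, T = 1)$. For any event or random variable $A$, let $F_R$ and $F_{R \mid A}$ denote the cumulative distribution function (cdf) of $R$ and its cdf conditional on $A$. (Footnote 3: With a slight abuse of notation, $P(A) F_{R \mid A}$ should be understood as 0 when $A$ is an event and $P(A) = 0$.) Finally, for any increasing function $F$ on the real line, we denote by $F^{-1}$ its generalized inverse, $F^{-1}(q) = \inf\{x \in \mathbb{R}: F(x) \geq q\}$. In particular, $F_R^{-1}$ is the quantile function of the random variable $R$.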

Contrary to the standard “sharp” DID setting where $D = G \times T$, we consider a “fuzzy” setting where $D \neq G \times T$. Some units may be treated in the control group or at period 0, and some units may remain untreated in the treatment group at period 1. However, we assume that the treatment rate increased more between period 0 and 1 in the treatment than in the control group. Accordingly, we introduce two assumptions we maintain throughout the paper.

Assumption 1

(Treatment participation equation)

$D = 1\{V \geq v_{GT}\}$, with $V \perp\!\!\!\perp T \mid G$.

Assumption 2

(First stage)

$E(D_{11}) > E(D_{10})$, and $E(D_{11}) - E(D_{10}) > E(D_{01}) - E(D_{00})$.

Assumption 1 imposes a latent index model for the treatment (see, e.g., Vytlacil02), where the threshold depends both on time and group. (Footnote 4: This selection equation implies that within each group, units can switch treatment in only one direction. While this assumption greatly simplifies the presentation of our identification results, it is not necessary for them to hold. See our discussion of panel data in Subsection 3.4 for further detail on this point.) $V$ may be interpreted as a unit’s propensity to be treated. Assumption 1 also imposes that the distribution of $V$ is stable within each group. Assumption 2 is just a way to define the treatment and the control group in our fuzzy setting. The treatment group is the one experiencing the larger increase of its treatment rate. If the treatment rate decreases in both groups, one can redefine the treatment variable as $\tilde{D} = 1 - D$. Thus, Assumption 2 only rules out the case where the two groups experience the same evolution of their treatment rates.

We now define our parameters of interest. For that purpose, let us introduce

$$D(t) = 1\{V \geq v_{Gt}\}, \quad t \in \{0, 1\}.$$

In repeated cross sections, $D(0)$ and $D(1)$ denote the treatment status of a unit at period 0 and 1 respectively, and only $D = D(T)$ is observed. In single cross sections where cohort of birth plays the role of time, $D(t)$ denotes instead the potential treatment of a unit had she been born at $T = t$. Here again, only $D = D(T)$ is observed. Then, let $S = \{D(0) < D(1), G = 1\}$. $S$ stands for treatment group units going from non-treatment to treatment between period 0 and 1, hereafter referred to as the “treatment group switchers”. Our parameters of interest are their Local Average Treatment Effect (LATE) and Local Quantile Treatment Effects (LQTE), which are respectively defined by

$$\Delta = E\left(Y_{11}(1) - Y_{11}(0) \mid S\right),$$
$$\tau_q = F^{-1}_{Y_{11}(1) \mid S}(q) - F^{-1}_{Y_{11}(0) \mid S}(q), \quad q \in (0, 1).$$

We focus on these parameters for two reasons. First, there are instances where treatment group switchers are the only units affected by some policy, implying that they are the relevant subgroup one should consider to assess its effects. Consider for instance a policy whereby in $T = 1$, the treatment group becomes eligible for some treatment for which it was not eligible in $T = 0$ (see, e.g., field2007). In this example, treatment group switchers are all the units in that group treated in $T = 1$. Those units are affected by the policy: without it, they would have remained untreated. Moreover, nobody else is affected by the policy. Second, identifying treatment effects in the whole population would require additional conditions, on top of those we consider below. In the example above, the policy extension does not provide any information on treatment effects in the control group, because this group does not experience any change.

2.2 The Wald differences-in-differences estimand

We first investigate the commonly used strategy of running an IV regression of the outcome on the treatment with time and group as included instruments, and the interaction of the two as the excluded instrument. The estimand arising from this regression is the Wald-DID defined by $W_{DID} = DID_Y / DID_D$, where for any random variable $R$ we let

$$DID_R = E(R_{11}) - E(R_{10}) - \left(E(R_{01}) - E(R_{00})\right).$$

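Let also $S' = \{D(0) \neq D(1), G = 0\}$ denote the control group switchers, and let $\Delta' = E(Y_{01}(1) - Y_{01}(0) \mid S')$ denote their LATE. Finally, let $\alpha = \left(P(D_{11} = 1) - P(D_{10} = 1)\right) / DID_D$.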
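As an illustration, the sample analogue of the Wald-DID can be computed from four cell means of the outcome and four cell means of the treatment. The sketch below is ours and is not the authors' Stata package: the function names and the toy data are hypothetical, and the weight is computed as the change in the treatment group's treatment rate divided by the DID of the treatment, which is our reading of the definition above.

```python
import numpy as np

def cell_mean(r, g, t, gv, tv):
    """Sample analogue of E(R | G = gv, T = tv)."""
    return r[(g == gv) & (t == tv)].mean()

def did(r, g, t):
    """DID of a variable: treatment group change minus control group change."""
    return (cell_mean(r, g, t, 1, 1) - cell_mean(r, g, t, 1, 0)
            - (cell_mean(r, g, t, 0, 1) - cell_mean(r, g, t, 0, 0)))

def wald_did(y, d, g, t):
    """DID of the outcome divided by the DID of the treatment."""
    return did(y, g, t) / did(d, g, t)

def alpha_weight(d, g, t):
    """Change in the treatment group's treatment rate over the DID of the treatment."""
    return (cell_mean(d, g, t, 1, 1) - cell_mean(d, g, t, 1, 0)) / did(d, g, t)

# Hypothetical toy data: columns are (y, d, g, t), two units per group-period cell.
rows = [
    (1, 0, 0, 0), (1, 0, 0, 0),   # control group, period 0
    (2, 0, 0, 1), (2, 0, 0, 1),   # control group, period 1
    (1, 0, 1, 0), (3, 0, 1, 0),   # treatment group, period 0
    (3, 0, 1, 1), (5, 1, 1, 1),   # treatment group, period 1
]
y, d, g, t = np.array(rows, dtype=float).T

print(wald_did(y, d, g, t))      # DID of y is 1.0, DID of d is 0.5
print(alpha_weight(d, g, t))     # control treatment rate is stable here
```

In this toy sample the control group's treatment rate does not change, so the weight equals one.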

We consider the following assumptions, under which we can relate $W_{DID}$ to $\Delta$ and $\Delta'$.

Assumption 3

(Common trends)

$E(Y(0) \mid G, T = 1) - E(Y(0) \mid G, T = 0)$ does not depend on $G$.

Assumption 4

(Homogeneous treatment effect over time)

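For all $d \in \mathcal{S}(D)$, $E(Y(d) - Y(0) \mid G, T = 1, D(0) = d) = E(Y(d) - Y(0) \mid G, T = 0, D(0) = d)$. (Footnote 5: When the treatment is binary, Assumption 4 only requires that the equation therein holds for $d = 1$. Writing Assumption 4 this way ensures it carries through to the case of a non-binary treatment.)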

Assumption 5

(Homogeneous treatment effect between groups)

$\Delta = \Delta'$.

Assumption 6

(Stable percentage of treated units in the control group)

$0 < E(D_{01}) = E(D_{00}) < 1$.

Assumption 3 requires that the mean of $Y(0)$ follow the same evolution over time in the treatment and control groups. This assumption is not specific to the fuzzy settings we are considering here: DID in sharp settings also relies on this assumption (see, e.g., Abadie05). Assumption 4 requires that in both groups, the average treatment effect among units treated in period 0 is stable over time. This is equivalent to assuming that among these units, the mean of $Y(1)$ and $Y(0)$ follow the same evolution over time:

$$E(Y(1) \mid G, T = 1, D(0) = 1) - E(Y(1) \mid G, T = 0, D(0) = 1) = E(Y(0) \mid G, T = 1, D(0) = 1) - E(Y(0) \mid G, T = 0, D(0) = 1).$$

This assumption is specific to fuzzy settings. Assumption 5 requires that in both groups, switchers have the same LATE. This assumption is also specific to fuzzy settings. Finally, Assumption 6 requires that the share of treated units in the control group does not change between period 0 and 1 and lies strictly between 0 and 1. While Assumptions 3 to 5 are not directly testable, Assumption 6 can be assessed from the data.

Theorem 2.1
  1. If Assumptions 1-4 are satisfied, then $W_{DID} = \alpha \Delta + (1 - \alpha) \Delta'$.

  2. If Assumption 5 or 6 further holds, then $W_{DID} = \Delta$.

When the treatment rate increases in the control group, $P(D_{01} = 1) - P(D_{00} = 1) > 0$, so $\alpha > 1$. Therefore, under Assumptions 1-4 the Wald-DID is equal to a weighted difference of the LATEs of treatment and control group switchers in period 1. In both groups, the evolution of the mean outcome between period 0 and 1 is the sum of three things: the change in the mean of $Y(0)$ for units untreated at $T = 0$; the change in the mean of $Y(1)$ for units treated at $T = 0$; the average effect of the treatment for switchers. Under Assumptions 3 and 4, changes in the mean of $Y(0)$ and $Y(1)$ in both groups cancel out. The Wald-DID is finally equal to the weighted difference between the LATEs of treatment and control group switchers. This weighted difference does not satisfy the no sign-reversal property: it may be negative even if the treatment effect is positive for everybody in the population. If one is ready to further assume that Assumption 5 is satisfied, this weighted difference simplifies into $\Delta$. (Footnote 6: Under this assumption, the Wald-DID actually identifies the LATE of all switchers, not only of those in the treatment group. There are instances where this LATE measures the effect of the policy under consideration, because treatment and control group switchers are the only units affected by this policy. Consider for instance the case of a policy whereby a new treatment is introduced in both groups in $T = 1$ (see enikolopov2011). In this example, treatment and control group switchers are all the units treated in $T = 1$. These units are affected by the policy (without it, they would have remained untreated) and nobody else is affected by it.)
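A small numerical sketch may help fix ideas on the sign-reversal issue. It assumes the decomposition of the Wald-DID as a weight times the treatment group switchers' LATE plus one minus that weight times the control group switchers' LATE, with the weight equal to the treatment group's change in treatment rate divided by the DID of the treatment. The function name and all numbers are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def wald_did_from_lates(dp_treat, dp_control, late_treat, late_control):
    """Combine two switchers' LATEs into the Wald-DID under Assumptions 1-4.

    dp_treat / dp_control: change in the treatment rate in each group.
    Returns (Wald-DID, weight on the treatment group switchers' LATE).
    """
    did_d = dp_treat - dp_control          # DID of the treatment
    weight = dp_treat / did_d              # exceeds 1 when dp_control > 0
    return weight * late_treat + (1 - weight) * late_control, weight

# Treatment rate rises by 0.5 in the treatment group and by 0.3 in the control group.
# Both LATEs are positive (1 and 2), yet the Wald-DID is negative.
w_up, a_up = wald_did_from_lates(Fraction(5, 10), Fraction(3, 10), Fraction(1), Fraction(2))
print(a_up, w_up)      # 5/2 -1/2

# Treatment rate falls by 0.3 in the control group: the weight is below 1
# and the Wald-DID is a proper weighted average of the two LATEs.
w_down, a_down = wald_did_from_lates(Fraction(5, 10), Fraction(-3, 10), Fraction(1), Fraction(2))
print(a_down, w_down)  # 5/8 11/8
```

Exact fractions are used so that the printed values are not affected by floating-point rounding.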

When the treatment rate diminishes in the control group, $P(D_{01} = 1) - P(D_{00} = 1) < 0$, so $\alpha < 1$. Therefore, under Assumptions 1-4 the Wald-DID is equal to a weighted average of the LATEs of treatment and control group switchers in period 1. This quantity satisfies the no sign-reversal property, but it still differs from $\Delta$ unless here as well one is ready to further assume that Assumption 5 is satisfied.

When the treatment rate is stable in the control group, $\alpha = 1$, so the Wald-DID is equal to $\Delta$ under Assumptions 1-4 alone. But even then, the Wald-DID relies on a treatment effect homogeneity assumption: in both groups, the average treatment effect among units treated at $T = 0$ should remain stable over time. This assumption is necessary. Under Assumptions 1-3 alone, the Wald-DID is equal to $\Delta$ plus a bias term involving several LATEs. Unless this combination of LATEs cancels out exactly, the Wald-DID differs from $\Delta$. We give the formula of the bias term at the end of the proof of Theorem 2.1.

2.3 The time-corrected Wald estimand

In this section, we consider a first alternative estimand of $\Delta$. Instead of relying on Assumptions 3 and 4, it is based on the following condition:

Assumption 3’

(Conditional common trends)

For all $d \in \mathcal{S}(D)$, $E(Y(d) \mid G, T = 1, D(0) = d) - E(Y(d) \mid G, T = 0, D(0) = d)$ does not depend on $G$.

Assumption 3’ requires that the mean of $Y(0)$ (resp. $Y(1)$) follows the same evolution over time among treatment and control group units that were untreated (resp. treated) at $T = 0$.

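Let $\delta_d = E(Y_{d01}) - E(Y_{d00})$ denote the change in the mean outcome between period 0 and 1 for control group units with treatment status $d$. Then, let

$$W_{TC} = \frac{E(Y_{11}) - E(Y_{10} + \delta_{D_{10}})}{E(D_{11}) - E(D_{10})}.$$

$W_{TC}$ stands for “time-corrected Wald”.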

Theorem 2.2

If Assumptions 1-2, 3’, and 6 are satisfied, then $W_{TC} = \Delta$.

Note that

$$W_{TC} = \frac{E(Y \mid G = 1, T = 1) - E(Y + (1 - D)\delta_0 + D\delta_1 \mid G = 1, T = 0)}{E(D \mid G = 1, T = 1) - E(D \mid G = 1, T = 0)}.$$

This is almost the Wald ratio in the treatment group with time as the instrument, except that we have $Y + (1 - D)\delta_0 + D\delta_1$ instead of $Y$ in the second term of the numerator. This difference arises because time is not a standard instrument: it can directly affect the outcome. When the treatment rate is stable in the control group, we can identify the trends on $Y(0)$ and $Y(1)$ by looking at how the mean outcome of untreated and treated units changes over time in this group. Under Assumption 3’, these trends are the same in the two groups. As a result, we can add these changes to the outcome of untreated and treated units in the treatment group in period 0, to recover the mean outcome we would have observed in this group in period 1 if switchers had not changed their treatment between the two periods. This is what $(1 - D)\delta_0 + D\delta_1$ does. Therefore, the numerator of $W_{TC}$ compares the mean outcome in the treatment group in period 1 to the counterfactual mean we would have observed if switchers had remained untreated. Once normalized, this yields the LATE of treatment group switchers.
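The time correction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name and toy data are hypothetical, and the sketch presumes a binary treatment and a control group that contains both treated and untreated units in each period.

```python
import numpy as np

def wald_tc(y, d, g, t):
    """Time-corrected Wald ratio with binary treatment d, group g, period t."""
    # Trend for each treatment status, estimated in the control group.
    delta = {}
    for dv in (0, 1):
        after = (g == 0) & (t == 1) & (d == dv)
        before = (g == 0) & (t == 0) & (d == dv)
        delta[dv] = y[after].mean() - y[before].mean()

    t10 = (g == 1) & (t == 0)   # treatment group, period 0
    t11 = (g == 1) & (t == 1)   # treatment group, period 1

    # Add the status-specific trend to each period-0 outcome in the treatment group.
    y10_corrected = y[t10] + np.where(d[t10] == 1, delta[1], delta[0])

    numerator = y[t11].mean() - y10_corrected.mean()
    denominator = d[t11].mean() - d[t10].mean()
    return numerator / denominator

# Hypothetical toy data: columns are (y, d, g, t).
rows = [
    (1, 0, 0, 0), (2, 1, 0, 0),                               # control, period 0
    (2, 0, 0, 1), (4, 1, 0, 1),                               # control, period 1
    (1, 0, 1, 0), (1, 0, 1, 0), (1, 0, 1, 0), (3, 1, 1, 0),   # treatment, period 0
    (2, 0, 1, 1), (2, 0, 1, 1), (6, 1, 1, 1), (5, 1, 1, 1),   # treatment, period 1
]
y, d, g, t = np.array(rows, dtype=float).T

print(wald_tc(y, d, g, t))
```

In this toy sample the untreated trend is 1 and the treated trend is 2 in the control group, the treatment rate rises from 0.25 to 0.5 in the treatment group, and the ratio equals 4.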

2.4 The changes-in-changes estimands

In this section, we consider a second alternative estimand of $\Delta$ for continuous outcomes, as well as estimands of the LQTE. They rely on the following condition.

Assumption 7

(Monotonicity and time invariance of unobservables)

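$Y(d) = h_d(U_d, T)$, with $U_d \in \mathbb{R}$ and $h_d(u, t)$ strictly increasing in $u$ for all $(d, t) \in \mathcal{S}((D, T))$. Moreover, $U_d \perp\!\!\!\perp T \mid G, D(0)$.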

Assumptions 1 and 7 generalize the CIC model in Athey06 to fuzzy settings. Assumptions 1 and 7 imply $U_d \perp\!\!\!\perp T \mid G$. Therefore, they require that at each period, both potential outcomes are strictly increasing functions of a scalar unobserved heterogeneity term whose distribution is stationary over time, as in Athey06. But Assumption 7 also imposes $U_d \perp\!\!\!\perp T \mid G, D(0)$: the distribution of $U_d$ must be stationary within subgroups of units sharing the same treatment status at $T = 0$.

We also impose the assumption below, which is testable in the data.

Assumption 8

(Data restrictions)

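  1. $\mathcal{S}(Y_{dgt}) = \mathcal{S}(Y)$ for $(d, g, t) \in \mathcal{S}((D, G, T))$, and $\mathcal{S}(Y)$ is a closed interval of $\mathbb{R}$.

  2. $F_{Y_{dgt}}$ is continuous on $\mathbb{R}$ and strictly increasing on $\mathcal{S}(Y)$, for $(d, g, t) \in \mathcal{S}((D, G, T))$.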

The first condition requires that the outcome have the same support in each of the eight treatment × group × period cells. Athey06 make a similar assumption. (Footnote 7: Common support conditions might not be satisfied when outcome distributions differ in the treatment and control groups, the very situations where the Wald-CIC estimand we propose below might be more appealing than the Wald-DID or Wald-TC (see Subsection 2.5). Athey06 show that in such instances, quantile treatment effects are still point identified over a large set of quantiles, while the average treatment effect can be bounded. Even though we do not present them here, similar results apply in fuzzy settings.) Note that this condition does not restrict the outcome to have bounded support: for instance, $[0, +\infty)$ is a closed interval of $\mathbb{R}$. The second condition requires that the distribution of $Y$ be continuous with positive density in each of the eight group × period × treatment status cells. With a discrete outcome, Athey06 show that one can bound treatment effects under their assumptions. Similar results apply in fuzzy settings, but for the sake of brevity we do not present them here.

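Let $Q_d(y) = F^{-1}_{Y_{d01}} \circ F_{Y_{d00}}(y)$ be the quantile-quantile transform of $Y$ from period 0 to 1 in the control group conditional on $D = d$. This transform maps $y$ at rank $q$ in period 0 into the corresponding $y'$ at rank $q$ in period 1. Let also

$$F_{CIC,d}(y) = \frac{P(D_{11} = d) F_{Y_{d11}}(y) - P(D_{10} = d) F_{Q_d(Y_{d10})}(y)}{P(D_{11} = d) - P(D_{10} = d)},$$
$$W_{CIC} = \frac{E(Y_{11}) - E(Q_{D_{10}}(Y_{10}))}{E(D_{11}) - E(D_{10})}.$$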

Theorem 2.3

If Assumptions 1-2, 6, and 7-8 are satisfied, then $W_{CIC} = \Delta$ and $F^{-1}_{CIC,1}(q) - F^{-1}_{CIC,0}(q) = \tau_q$.

This result combines ideas from Imbens97 and Athey06. We seek to recover the distribution of, say, $Y(1)$ among switchers in the treatment group period 1 cell. For that purpose, we start from the distribution of $Y$ among all treated observations of this cell. Those include both switchers and units already treated at $T = 0$. Consequently, we must “withdraw” from this distribution that of units treated at $T = 0$, exactly as in Imbens97. But this last distribution is not observed. To reconstruct it, we adapt the ideas in Athey06 and apply the quantile-quantile transform from period 0 to 1 among treated observations in the control group to the distribution of $Y$ among treated units in the treatment group in period 0.

Intuitively, the quantile-quantile transform uses a double-matching to reconstruct the unobserved distribution. Consider a treated unit in the treatment group period 0 cell. She is first matched to a treated unit in the control group period 0 cell with the same $y$. Those two units are observed at the same period of time and are both treated. Therefore, under Assumption 7 they must have the same $u_1$. Second, the control period 0 unit is matched to her rank counterpart among treated units of the control group period 1 cell. We denote by $y^*$ the outcome of this last observation. Because $U_1 \perp\!\!\!\perp T \mid G, D(0) = 1$, under Assumption 6 those two observations must also have the same $u_1$. Consequently, $y^* = h_1(u_1, 1)$, which means that $y^*$ is the outcome that the treatment period 0 cell unit would have obtained in period 1.

Note that

$$W_{CIC} = \frac{E(Y \mid G = 1, T = 1) - E(Q_D(Y) \mid G = 1, T = 0)}{E(D \mid G = 1, T = 1) - E(D \mid G = 1, T = 0)}.$$

Here again, $W_{CIC}$ is almost the standard Wald ratio in the treatment group with $T$ as the instrument, except that we have $Q_D(Y)$ instead of $Y$ in the second term of the numerator. $Q_D$ accounts for the fact that time directly affects the outcome, just as $\delta_D$ does in the $W_{TC}$ estimand. Under Assumption 3’, the trends affecting the outcome are identified by additive shifts, while under Assumptions 7-8 they are identified by possibly non-linear quantile-quantile transforms.
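The quantile-quantile correction can be sketched with empirical distribution functions. The code below is a minimal illustration under our own assumptions, not the authors' estimator: the function names and toy data are hypothetical, the cdf and its generalized inverse are replaced by their plain empirical counterparts, and ranks of zero are mapped to the smallest control outcome.

```python
import numpy as np

def qq_transform(y, y_c0, y_c1):
    """Map y through the empirical cdf of y_c0, then the empirical quantile function of y_c1."""
    n0, n1 = len(y_c0), len(y_c1)
    # Number of control period-0 outcomes <= y, i.e. n0 times the empirical cdf at y.
    counts = np.searchsorted(np.sort(y_c0), y, side="right")
    # Generalized inverse inf{x: F(x) >= q} of the empirical cdf is the ceil(n1*q)-th
    # order statistic; integer arithmetic avoids floating-point rank errors.
    idx = (counts * n1 + n0 - 1) // n0 - 1
    idx = np.clip(idx, 0, n1 - 1)
    return np.sort(y_c1)[idx]

def wald_cic(y, d, g, t):
    """Changes-in-changes Wald ratio with binary treatment d, group g, period t."""
    t10 = (g == 1) & (t == 0)
    t11 = (g == 1) & (t == 1)
    d10 = d[t10]
    y10_mapped = y[t10].astype(float)   # fancy indexing already returns a copy
    for dv in (0, 1):
        y_c0 = y[(g == 0) & (t == 0) & (d == dv)]
        y_c1 = y[(g == 0) & (t == 1) & (d == dv)]
        same_status = d10 == dv
        y10_mapped[same_status] = qq_transform(y10_mapped[same_status], y_c0, y_c1)
    return (y[t11].mean() - y10_mapped.mean()) / (d[t11].mean() - d10.mean())

# Hypothetical toy data: columns are (y, d, g, t).
rows = [
    (1, 0, 0, 0), (2, 0, 0, 0), (3, 1, 0, 0), (4, 1, 0, 0),   # control, period 0
    (2, 0, 0, 1), (4, 0, 0, 1), (5, 1, 0, 1), (7, 1, 0, 1),   # control, period 1
    (1, 0, 1, 0), (2, 0, 1, 0), (2, 0, 1, 0), (3, 1, 1, 0),   # treatment, period 0
    (2, 0, 1, 1), (4, 0, 1, 1), (8, 1, 1, 1), (5, 1, 1, 1),   # treatment, period 1
]
y, d, g, t = np.array(rows, dtype=float).T

print(wald_cic(y, d, g, t))
```

In this toy sample the period-0 outcomes of the treatment group, (1, 2, 2, 3), are mapped to (2, 4, 4, 5), and the ratio equals 4. With real data, ties and small cells would call for more care than this sketch takes.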

2.5 Choosing between the Wald-DID, Wald-TC, and Wald-CIC estimands

When the treatment rate is stable in their control group, researchers need to choose between the three estimands we have proposed in this section. In order to do so, they can start by conducting placebo tests. Assume for instance that data is available for period $T = -1$, and that the share of treated units is stable in both groups between $T = -1$ and 0: $E(D_{10}) - E(D_{1,-1}) = E(D_{00}) - E(D_{0,-1}) = 0$. Then Assumptions 3 and 4 between $T = -1$ and 0 imply that $E(Y_{10}) - E(Y_{1,-1}) - (E(Y_{00}) - E(Y_{0,-1})) = 0$. Similarly, Assumption 3’ (resp. Assumption 7) between $T = -1$ and 0 implies that $E(Y_{10}) - E(Y_{1,-1} + \delta_{D_{1,-1}}) = 0$ (resp. $E(Y_{10}) - E(Q_{D_{1,-1}}(Y_{1,-1})) = 0$). (Footnote 8: With a slight abuse of notation, here $\delta_d$ and $Q_d$ are computed between periods -1 and 0.) Assumptions 3’ and 7 have further testable implications. For instance, Assumption 3’ implies that for $d \in \{0, 1\}$, $E(Y_{d10}) - E(Y_{d1,-1}) - (E(Y_{d00}) - E(Y_{d0,-1})) = 0$: common trends between the two groups should hold conditional on each value of the treatment.

On the other hand, when the share of treated units changes between $T = -1$ and 0 in either group, placebo estimators can no longer be used to test Assumptions 3 and 4, 3’, or 7. Placebos might differ from zero even if those assumptions are satisfied, because of the effect of the treatment.

Sometimes, even after using placebos to discard estimators relying on implausible assumptions, researchers might be left with several, significantly different estimators. This can be due to lack of power. This can also be due to the fact that placebos are tests of Assumptions 3-4, 3’, or 7 for pairs of dates prior to $T = 0$, while our estimands require that these assumptions hold between $T = 0$ and 1.

In such instances, inspecting the assumptions underlying each estimand through the lens of economic theory might help researchers choose between the Wald-DID and Wald-TC estimands. In applications where technological or institutional evolutions make it likely that treatment effects change over time, it might be appealing to choose the Wald-TC estimand, so as not to rely on Assumption 4. On the other hand, Assumption 3’ may be more restrictive than Assumption 3, in particular when units self-select into treatment. One might for instance worry that the treatment rate increases in the treatment group because units in this group experience a positive shock on their $Y(1)$ at $T = 1$. This would imply that Assumption 3’ is violated while Assumption 3 might hold. But in this scenario, Assumption 4 would also be violated, thus implying that both the Wald-DID and the Wald-TC estimators are inconsistent. (Footnote 9: Note however that Assumptions 4 and 3’ are not incompatible with Roy models of selection into treatment. For instance, if , , , and , then Assumptions 4 and 3’ are satisfied. On the other hand, if with , then Assumptions 4 and 3’ fail.)

On the other hand, the assumptions underlying the Wald-CIC and Wald-TC estimands are substantively close. (Footnote 10: For this reason, one can follow the same steps as those outlined in the previous paragraph to choose between the Wald-DID and Wald-CIC estimands.) Therefore, economic theory can provide little guidance as to which estimand one should pick. Here the choice should rather be based on whether the treatment and the control groups have very different outcome distributions conditional on $D$ at $T = 0$. Assumption 3’ is not invariant to the scaling of the outcome, but it only restricts its first moment. Assumption 7 is invariant to the scaling of the outcome, but it restricts its entire distribution. When the treatment and the control groups have different outcome distributions conditional on $D$ in the first period (see e.g. baten2014), the scaling of the outcome might have a large effect on the Wald-TC, so using the Wald-CIC might be preferable. On the other hand, when the two groups have similar outcome distributions conditional on $D$ in the first period, using the Wald-TC might be preferable as it only restricts the first moment of the outcome. Another advantage of working under Assumptions 1 and 7 is that this enables the analyst to study distributional effects instead of mean effects only.

Finally, when the treatment rate varies in the control group, the assumptions underlying the Wald-TC and Wald-CIC estimands only lead to partial identification (see Subsection 3.3), and the bounds may not be informative, as is the case in our application below. On the other hand, the Wald-DID estimand can still point identify $\Delta$. This estimand may then be appealing, especially when the treatment rate decreases in the control group, as in such instances it estimates a weighted average of LATEs even if Assumption 5 fails to hold. When the treatment rate increases in the control group, Assumption 5 is necessary to have $W_{DID} = \Delta$, and placebo tests are generally uninformative as to the plausibility of this assumption. To see this, assume for instance that a treatment appears in $T = 1$ and that some units are treated both in the treatment and in the control group. This corresponds to the situation in enikolopov2011, who study the effect of the introduction of an independent TV channel in Russia on votes for the opposition. In such instances, placebo DIDs comparing the evolution of the mean outcome in the two groups before $T = 1$ are tests of Assumption 3, but they are uninformative as to the plausibility of Assumption 5 because nobody was treated before $T = 1$. Therefore, this assumption should be carefully discussed.

3 Extensions

We now consider several extensions. We first consider applications with multiple groups. We then show that our results extend to ordered, non-binary treatments. Next, we show that when the treatment rate is not stable in the control group, $\Delta$ and $\tau_q$ can still be partially identified under our assumptions. Finally, we sketch other extensions that are fully developed in the supplementary material.

3.1 Multiple groups

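We consider the case where there are more than two groups but only two time periods in the data. The case with multiple groups and time periods is considered in the supplementary material. Let $G \in \{0, 1, ..., \bar{g}\}$ denote the group a unit belongs to. For any $g \in \mathcal{S}(G)$, let $S_g = \{D(0) \neq D(1), G = g\}$ denote units of group $g$ who switch treatment between $T = 0$ and 1. Let $S^* = \cup_{g} S_g$ be the union of all switchers. We can partition the groups depending on whether their treatment rate is stable, increases, or decreases. Specifically, let

$$\mathcal{G}_s = \{g: E(D_{g1}) = E(D_{g0})\}, \quad \mathcal{G}_i = \{g: E(D_{g1}) > E(D_{g0})\}, \quad \mathcal{G}_d = \{g: E(D_{g1}) < E(D_{g0})\},$$

and let $G^* = 1\{G \in \mathcal{G}_i\} - 1\{G \in \mathcal{G}_d\}$.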

Theorem 3.1 below shows that when there is at least one group in which the treatment rate is stable, our assumptions allow us to point identify , the LATE of all switchers. Before presenting this result, additional notation is needed. For any random variable , , and , let

We also define the following weight:

Theorem 3.1

Assume that Assumption 1 is satisfied, that the treatment rate is stable in at least one group, and that $G \perp\!\!\!\perp T$.

  1. If Assumptions 3 and 4 are satisfied,

  2. If Assumption 3’ is satisfied,

  3. If Assumptions 7 and 8 are satisfied,

This theorem states that with multiple groups and two periods of time, treatment effects for switchers are identified if there is at least one group in which the treatment rate is stable over time. The estimands we propose can then be computed in four steps. First, we form three “supergroups”, by pooling together the groups where treatment increases, those where it is stable, and those where it decreases. While in some applications these three sets of groups are known to the analyst (see e.g. gentzkow2011), in other applications they must be estimated (see our application in Section 5). Second, we compute the Wald-DID, Wald-TC, or Wald-CIC estimand with the supergroup where treatment increases as the treatment group and the stable supergroup as the control group. Third, we compute the Wald-DID, Wald-TC, or Wald-CIC estimand with the supergroup where treatment decreases as the treatment group and the stable supergroup as the control group. Finally, we compute a weighted average of those two estimands.
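The four steps can be sketched as follows for the Wald-DID. Several choices here are our assumptions rather than the source's formulas: the supergroups are formed by comparing sample treatment rates with a hypothetical tolerance `tol` (a crude stand-in for the estimation step mentioned above), and the two estimands are averaged with weights proportional to the estimated share of switchers in each supergroup. The function names and the toy data are hypothetical as well.

```python
import numpy as np

def supergroup(d, grp, t, tol=1e-12):
    """Label each unit +1, 0, or -1: its group's treatment rate rises, is stable, or falls."""
    gstar = np.zeros(len(grp), dtype=int)
    for gv in np.unique(grp):
        in_g = grp == gv
        change = d[in_g & (t == 1)].mean() - d[in_g & (t == 0)].mean()
        if change > tol:
            gstar[in_g] = 1
        elif change < -tol:
            gstar[in_g] = -1
    return gstar

def did_star(r, gstar, t, a, b):
    """DID of r between supergroup a and supergroup b."""
    def m(s, tv):
        return r[(gstar == s) & (t == tv)].mean()
    return m(a, 1) - m(a, 0) - (m(b, 1) - m(b, 0))

def pooled_wald_did(y, d, grp, t):
    gstar = supergroup(d, grp, t)                                          # step 1
    w_up = did_star(y, gstar, t, 1, 0) / did_star(d, gstar, t, 1, 0)       # step 2
    w_down = did_star(y, gstar, t, -1, 0) / did_star(d, gstar, t, -1, 0)   # step 3
    # Step 4: weights proportional to the share of switchers (our assumption).
    s_up = np.mean(gstar == 1) * did_star(d, gstar, t, 1, 0)
    s_down = np.mean(gstar == -1) * did_star(d, gstar, t, 0, -1)
    w10 = s_up / (s_up + s_down)
    return w10 * w_up + (1 - w10) * w_down

# Hypothetical toy data: columns are (y, d, group, t); group 0 is stable,
# group 1 sees its treatment rate rise, group 2 sees it fall.
rows = [
    (1, 0, 0, 0), (2, 1, 0, 0), (2, 0, 0, 1), (3, 1, 0, 1),
    (1, 0, 1, 0), (1, 0, 1, 0), (2, 0, 1, 1), (5, 1, 1, 1),
    (4, 1, 2, 0), (4, 1, 2, 0), (5, 1, 2, 1), (4, 0, 2, 1),
]
y, d, grp, t = np.array(rows, dtype=float).T

print(pooled_wald_did(y, d, grp, t))
```

In this toy sample the two supergroup-level Wald-DIDs are 3 and 1, each supergroup contains the same share of switchers, and the pooled value is 2.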

Theorem 3.1 relies on the assumption that $G \perp\!\!\!\perp T$. This requires that the distribution of groups be stable over time. This will automatically be satisfied if the data is a balanced panel and $G$ is time invariant. With repeated cross-sections or cohort data, this assumption might fail to hold. However, when $G$ is not independent of $T$, it is still possible to form Wald-DID and Wald-TC type estimands identifying the LATE of all switchers. We give the formulas of these estimands in the supplementary material.

Three last comments on Theorem 3.1 are in order. First, it contrasts with the current practice in empirical work. When many groups are available, researchers usually include group fixed effects in their regressions, instead of pooling together groups into super control and treatment groups as we advocate here. In deChaisemartin16, we show that such regressions estimate a weighted sum of switchers’ LATEs across groups, with potentially many negative weights and without the aggregation property we obtain here. Second, groups where the treatment rate diminishes can be used as “treatment” groups, just as those where it increases. Indeed, it is easy to show that all the results from the previous section still hold if the treatment rate decreases in the treatment group and is stable in the control group. Finally, when there are more than two groups where the treatment rate is stable, our three sets of assumptions become testable. Under each set of assumptions, using any subset of as the control group should yield the same estimand for .
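The testability remark can be illustrated informally: if several groups have a stable treatment rate, the Wald-DID of a given treatment group can be recomputed with each stable group as the control group, and the resulting estimates compared. The sketch below is only a descriptive check under assumed names (`wald_did`, `wald_did_by_control`); it is not a formal test, and it ignores sampling error.

```python
import numpy as np


def wald_did(y, d, g, t, treat, control):
    """DID of the outcome divided by the DID of the treatment, for two groups."""
    def did(v):
        def change(grp):
            return v[(g == grp) & (t == 1)].mean() - v[(g == grp) & (t == 0)].mean()
        return change(treat) - change(control)
    return did(y) / did(d)


def wald_did_by_control(y, d, g, t, treat, stable_groups):
    """Hypothetical helper: one Wald-DID per candidate control group."""
    y, d, g, t = map(np.asarray, (y, d, g, t))
    return {int(c): float(wald_did(y, d, g, t, treat, c)) for c in stable_groups}
```

Estimates that differ widely across control groups suggest that the identifying assumptions fail for at least one of them; a formal comparison would require the asymptotic distribution of the estimators.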

3.2 Non-binary, ordered treatment

We now consider the case where treatment takes a finite number of ordered values, . To accommodate this extension, Assumption 1 has to be modified as follows.

Assumption 1’

(Ordered treatment equation)

, with and . As before, let .

Let denote stochastic dominance between two random variables, and let denote equality in distribution. Let also .

Theorem 3.2

Suppose that Assumptions 1’ and 2 hold, that , and that .

  1. If Assumptions 3-4 are satisfied,

  2. If Assumption 3’ is satisfied,

  3. If Assumptions 7 and 8 are satisfied,

Theorem 3.2 shows that with an ordered treatment, the estimands we considered in the previous sections are equal to the average causal response (ACR) parameter considered in Angrist95. This parameter is a weighted average, over all values of , of the effect of increasing treatment from to among switchers whose treatment status goes from strictly below to above over time.

For this theorem to hold, two conditions have to be satisfied. First, in the treatment group, the distribution of treatment in period 1 should dominate stochastically the corresponding distribution in period 0. Angrist95 impose a similar stochastic dominance condition. Actually, this assumption is not necessary for our three estimands to identify a weighted sum of treatment effects. If it is not satisfied, one still has that , , or identify

which is a weighted sum of treatment effects with some negative weights.
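The role of the stochastic dominance condition can be seen numerically. The sketch below assumes the standard form of average-causal-response weights, namely that the weight on the effect of moving treatment from d-1 to d is the change over time in the probability that treatment is at least d in the treatment group, divided by the change in mean treatment. This form is an assumption made for illustration, because the displays defining the weights are not reproduced in this text; the function name `acr_weights` is likewise hypothetical.

```python
import numpy as np


def acr_weights(p0, p1):
    """Weights on the unit-increment effects for d = 1, ..., d_bar.

    p0, p1: probability mass functions of treatment over the values
    0, 1, ..., d_bar in the treatment group, in period 0 and period 1.
    """
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    # Survival functions P(D >= d) for d = 1, ..., d_bar.
    s0 = p0[::-1].cumsum()[::-1][1:]
    s1 = p1[::-1].cumsum()[::-1][1:]
    diff = s1 - s0
    # For a treatment valued in {0, ..., d_bar}, the survival probabilities sum
    # to the mean, so diff.sum() is the change in mean treatment.
    return diff / diff.sum()


# Period 1 stochastically dominates period 0: all weights are nonnegative.
w_dominance = acr_weights([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])

# No dominance: the weights still sum to one, but one of them is negative.
w_no_dominance = acr_weights([0.2, 0.6, 0.2], [0.3, 0.2, 0.5])
```

In the second example the share of units with treatment at least 1 falls while the share with treatment at least 2 rises, which is what produces a negative weight.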

Second, the distribution of treatment should be stable over time in the control group. When it is not, one can still obtain some identification results. Firstly, Theorem 2.1 generalizes to non-binary and ordered treatments. When treatment increases in the control group, the Wald-DID identifies a weighted difference of the ACRs in the treatment and in the control group; when treatment decreases in the control group, the Wald-DID identifies a weighted average of these two ACRs. The weights are the same as those in Theorem 2.1. Secondly, our partial identification results below also generalize to non-binary and ordered treatments. When the distribution of treatment is not stable over time in the control group, the ACR in the treatment group can be bounded under Assumption 3’, or Assumptions 7-8, as shown in the supplementary material.

Finally, Theorem 3.2 extends to a continuous treatment. In such instances, one can show that under an appropriate generalization of Assumption 1, the Wald-DID, Wald-TC, and Wald-CIC identify a weighted average of the derivative of potential outcomes with respect to treatment, a parameter that resembles that studied in angrist2000.

3.3 Partial identification with a non-stable control group

In this subsection, we come back to our basic set-up with two groups and a binary treatment, and we show that and can still be partially identified when Assumption 6 does not hold. Let us introduce some additional notation. When the outcome is bounded, let and respectively denote the lower and upper bounds of its support. For any real number , let . For any , let be the ratio of the shares of units in group receiving treatment in period 1 and period 0. For instance, when the share of untreated observations increases in the control group between period 0 and 1. Let also

We define the bounds obtained under Assumption 3’ (TC bounds hereafter) as follows:

Next, we define the bounds obtained under Assumptions 7-8 (CIC bounds hereafter). For and any cdf , let and

In the definition of and , we use the convention that for , and for . We then define the CIC bounds on and by:

Finally, we introduce the two following conditions, which ensure that the CIC bounds are well-defined and sharp.

Assumption 9

(Existence of moments)

and for

Assumption 10

(Increasing bounds)

For , is continuously differentiable, with positive derivative on the interior of . Moreover, and are increasing on .

Theorem 3.3

Assume that Assumptions 1-2 are satisfied and . Then:

  1. If Assumption 3’ holds and for , (Footnote: It is not difficult to show that these bounds are sharp. We omit the proof for brevity.)

  2. If Assumptions 7-9 hold, for , and . These bounds are sharp if Assumption 10 holds.

The reasoning underlying the TC bounds goes as follows. Assume for instance that the treatment rate increases in the control group. Then, the difference between and arises both from the trend on , and from the fact that the former expectation is for units treated at and switchers, while the latter is only for units treated at . Therefore, we can no longer identify the trend on among units treated at . But when the outcome has bounded support, this trend can be bounded, because we know the share of the control group that switchers account for. A similar reasoning can be used to bound the trend on among units untreated at . Eventually, can also be bounded. The smaller the change of the treatment rate in the control group, the tighter the bounds.

The reasoning underlying the CIC bounds goes as follows. When , the second matching described in Subsection 2.4 collapses, because treated (resp. untreated) observations in the control group are no longer comparable in period 0 and 1, as explained in the previous paragraph. Therefore, we cannot match period 0 and period 1 observations on their rank anymore. However, we know the share of the control group that switchers account for, so we can match period 0 observations to their best- and worst-case rank counterparts in period 1.

If the support of the outcome is unbounded, and are proper cdfs when , but they are defective when . On the contrary, and are always proper cdfs. As a result, when is unbounded and , the CIC bounds we derive for and are finite under Assumption 9. The TC bounds, on the other hand, are always infinite when is unbounded.

3.4 Other extensions

In the supplementary material, we present additional extensions that we discuss briefly here.

Multiple groups and multiple periods

With multiple groups and periods, we show that one can gather groups into “supergroups” for each pair of consecutive dates, depending on whether their treatment increases, is stable, or decreases. Then, a properly weighted sum of the estimands for each pair of dates identifies a weighted average of the LATEs of units switching at any point in time.

Particular fuzzy designs

Up to now, we have considered general fuzzy situations where the are restricted only by Assumption 2. In our supplementary material, we consider two interesting special cases. First, we show that when , identification of the average treatment effect on the treated can be obtained under the same assumptions as those of the standard DID or CIC model. Second, we consider the case where . Such situations arise when a policy is extended to a previously ineligible group, or when a program or a technology previously available in some geographic areas is extended to others (see, e.g., field2007). One can show that Theorem 2.1 still holds in this special case. On the other hand, Theorems 2.2-2.3 do not hold. In such instances, identification must rely on the assumption that and change similarly over time.

Including covariates

We also propose Wald-DID, Wald-TC, and Wald-CIC estimands with covariates. Including covariates in the analysis has two advantages. First, our estimands with covariates rely on conditional versions of our assumptions, which might be more plausible than their unconditional counterparts. Second, there might be instances where but