Dynamical systems theory for causal inference with application to synthetic control methods
Abstract
To estimate treatment effects in panel data, suitable control units need to be selected to generate counterfactual outcomes. To guard against cherrypicking of potential controls, which is an important concern in practice, we leverage results from dynamical systems theory. Specifically, key results on delay embeddings in dynamical systems (Takens, 1981) show that under fairly general assumptions a dynamical system can be reconstructed up to a onetoone mapping from scalar observations of the system. This suggests a quantified measure of strength of the dynamical relationship between any two time series variables. The key idea in this paper is to use this measure to ensure that selected control units are dynamically related to treated units, and thus guard against cherrypicking of controls. We illustrate our approach on the synthetic control methodology of Abadie and Gardeazabal (2003), which generates counterfactuals using a model of treated unit outcomes fitted on outcomes from control units. In this setting, we propose to screen out control units that have a weak dynamical relationship to the single treated unit before the model is fit. In simulated studies, we show that the standard synthetic control methodology can be biased towards any desirable direction by adversarially creating artificial control units, but the bias is largely mitigated if we apply the aforementioned screening. In realworld applications, the proposed approach contributes to more reliable control selection, and thus more robust estimation of treatment effects.
Keywords: Causal inference, treatment effect, synthetic control, dynamical systems theory, convergent cross mapping
Contents
1 Introduction
In causal inference of treatment effects, we typically compare outcomes of units who received the treatment with outcomes from units who didn’t. A key assumption, often made implicitly, is that the relationships of interest are static and largely invariant. For example, in studying the effects of schooling on later earnings, we usually consider potential outcomes , for some student receiving years of schooling. In reality, only one potential outcomes can be observed for every unit, and so causal inference relies on comparisons across units who received varying years of schooling. The interpretation of the results rests upon the assumption that the relationship between years of schooling and earnings is static and unidirectional.
However, in many realworld settings the variables exhibit a dynamic interdependence, sometimes showing positive correlation and sometimes negative. Such ephemeral correlations can be illustrated with a popular dynamical system in Figure 1, which is known as the Lorenz system (Lorenz, 1963). In this context, the trajectory is determined by a system of differential equations, and resembles a butterfly shape indicating varying correlations at different times: in one wing of the shape, variables and appear to be positively correlated, and in the other they are negatively correlated.
Such dynamics present new methodological challenges for causal inference that have not been addressed in the literature. In relation to the abovementioned schooling example, our analysis of schooling effect on earnings could occur on one wing of the dynamical system, where the correlation is, say, positive. However, crucial policy decisions, such as college subsidies, could occur on the other wing where the relationship is reversed. Such discord between analysis and policy making is clearly detrimental to policy effectiveness.
A famous related example from the economic literature is the relationship between unemployment rate and inflation, known as the Phillips curve (Phillips, 1958; Roberts, 1995). A longstanding belief until the 1970s was that this relationship is inverse: lower unemployment leads to a tighter labor market, to increased wages, and eventually to higher inflation. For a long time, data were consistent with such relationship, with significant implications for public policy. The relationship abruptly reversed in the midst of the 1973 Oil Crisis, bringing a peculiar mix of high inflation and high unemployment, known as stagflation. This spectacular reversal led economists to explore the distinction between shortterm and longterm policy effects as a result of information asymmetries (Friedman, 1975), and consider that macroeconomic indicators may form a dynamical relationship, which could sometimes be positive and other times negative. Motivated by the same problem, Lucas (1976) formed a general critique of economic theory, famously known as the “Lucas critique”, by pointing out that policy changes invalidate econometric models established before the change.
Despite its longstanding importance in many scientific fields, dynamical systems theory has not been applied to the aforementioned problems. The main goal of this paper is to leverage key results from dynamical systems theory to guide causal inference in the presence of dynamics. For concreteness, we focus on the synthetic control methodology (Abadie et al., 2010), which has emerged as a popular methodology for treatment effect estimation in comparative case studies, and is presented in the following section.
1.1 Comparative case studies and synthetic controls
Comparative case studies have played an important role in policy evaluation. The key idea is that the counterfactual evolution of aggregate outcomes of treated units is estimated from and compared to the same aggregates of control units. One popular example is the differenceindifferences (DID) method, where the causal effect is obtained by first taking the difference between outcomes after and before intervention, for treated and control units separately, and then again taking the difference between these two differences (Angrist and Pischke, 2008; Card and Krueger, 1994). For identification, DID requires a paralleltrends assumption, which is generally strong in practice.
A significant generalization of DID was introduced as the synthetic control methodology by Abadie and Gardeazabal (2003). In this approach, there is usually one treated unit, and its outcomes are fitted with control units’ outcomes, before intervention. Subsequently, the counterfactual outcomes of the treated unit are predicted by the same fitted model applied on data from control units after the intervention. Under this approach, identification depends crucially on an assumption of model continuity, often left implicit; that is, the assumption that the underlying outcomes model is the same before and after the intervention. Interestingly, identification is guaranteed even when outcomes depend (linearly) on unobserved factors. Advantages of the synthetic control method include simplicity, some level of transparency, and mitigation of extrapolation bias, as in the main method the regression weights of controls are constrained to be nonnegative and sum to one. Due to these merits, the synthetic control methodology has been applied widely in the fields of policy analysis (Abadie et al., 2010; Kreif et al., 2016), criminology (Saunders et al., 2015), politics (Abadie and Gardeazabal, 2003; Abadie et al., 2015), and economics (Billmeier and Nannicini, 2013).
Formally, we consider units, with only one unit being treated. Let be the potential outcome for unit at time in a hypothetical world where the intervention did not occur, where , and ; also let be the corresponding potential outcome when the intervention did occur. Let be a binary indicator of whether unit receives the treatment at time . By convention, unit 1 only receives the treatment. There exists as the time point where the intervention occurs, such that
The observed outcome for each unit at time , denoted by , therefore satisfies:
(1) 
where is the treatment effect for unit at time . Suppose that there exist weights such that and , and , for , then the causal effect of the intervention can be estimated through:
(2) 
The time series defined with the term in Equation (2) is the synthetic control. Because of the constraints put on the weights , namely that they are nonnegative and sum to one, the fitted values of the weights reside on the edges of a polytope, and so many weights are set to 0. Such sparsity in the weight vector corresponds to control selection, where only a few control units are used to model the outcomes of the treated unit.
Theoretically, Abadie et al. (2010) showed that, under standard linear conditions, the synthetic control estimator is asymptotically unbiased as the number of preintervention periods goes to infinity. As such, a key assumption of model continuity is implicitly made for identification, where, for example, the weights are assumed to be timeinvariant. Furthermore, control selection in synthetic controls depends only on the statistical fit between treated and control outcomes in the preintervention period, which opens up the possibility of cherrypicking controls to bias causal inference.
However, the synthetic control methodology does not provide clear guidance on the choice of the set of potential controls, i.e., the units , known as the control pool (or donor pool). Since the criterion for selecting controls from the pool is the statistical fit of preintervention outcomes between control and treated units, the control pool should be carefully constructed so that the aforementioned model continuity assumption is plausible. Otherwise, cherrypicking what control units to include in the pool may bias causal estimation.
In this paper, we address this issue by screening units out from the control pool if they are likely to compromise model continuity. The key idea is that all control units in the pool should be strongly dynamically related with the treated unit. We adopt fundamental results in dynamical systems theory (Strogatz, 2001), where two time series variables are dynamically related if they jointly evolve on the same attractor of a common dynamical system. Foundational results on delay embeddings in dynamical systems (Takens, 1981; Sauer et al., 1991; Pecora et al., 1996; Bradley and Kantz, 2015) show that, under fairly general assumptions, a dynamical system can be reconstructed up to a onetoone mapping from scalar observations of the system. This suggests quantified measures of the dynamical relationship between any two variables, such as the method of convergent cross mapping (Sugihara et al., 2012), by measuring the predictive power of possible reconstructions. The key connection to causal inference with panel data is that when control units are dynamically related to treated units, the aforementioned model continuity assumption is more plausible, which guards against cherrypicking of controls and leads to more robust causal inference.
The rest of the paper proceeds as follows. Section 1.2 starts with a motivating example. Section 2 presents an overview of dynamical systems theory that we leverage in this paper, while Section 2.3 connects this theory to synthetic controls through the motivating example. Section 3 describes the proposed procedure and reports the results of extensive applied and simulated studies. Section 4 discusses practical limitations, and Section 5 concludes with ideas for future work. The supplementary materials contain technical details and theory.
1.2 Motivation: issues with cherrypicking of synthetic controls
As motivation we use the example of California’s tobacco control program in 1989, studied by Abadie et al. (2010). The goal is to estimate the treatment effect of Proposition 99, a largescale tobacco control program passed by electorate majority in 1988 in the State of California, and took effect in 1989 mainly through a sizeable tax hike per cigarette packet. The panel data include annual state percapita cigarette sales from 1970 to 2000 as outcome, along with related predictors, such as state median income and %youth population. The authors constructed a control pool with 38 states as potential controls, after discarding states that adopted similar tobacco control programs during the time period of interest.
As mentioned earlier, the synthetic control methodology proceeds by calculating a weighted combination of control unit outcomes to fit cigarette sales of California, using only pre1989 data. The weights are required to be nonnegative and sum to one to avoid overspecification and extrapolation bias. We follow this approach even though recent approaches have generalized those constraints (Doudchenko and Imbens, 2016). In this application, the weighted combination is: Colorado (0.164), Connecticut (0.069), Montana (0.199), Nevada (0.234), and Utah (0.334), where the numbers in the parentheses are the corresponding weights. The implied model is the following:
(3) 
where denotes packet sales at a particular state and time (a state is denoted by a twoletter code; e.g., CA stands for California). We note that time in the model of Equation (3) is before intervention (), so that all states in the data, including California, are in control for the entire period considered in the model. The idea for causal inference through this approach is that the same model in Equation (3) can be used to estimate the counterfactual outcomes for California, , for , had California not been treated with the tax hike in 1989. In other words, the same model can be used to predict the counterfactual outcomes of a nontreated California, and so implicitly we need a model continuity assumption, as described in the introduction. By comparing the postintervention data from actual California that was treated with the tobacco control program in 1989, and the synthetic control California that hypothetically stayed in control in 1989, we can estimate that percapita cigarette sales reduced by 19 packs on average by Proposition 99, suggesting a positive causal effect. This is illustrated in Figure 2, where the dashed line is synthetic control California estimated from postintervention data and the model in Equation (3), and the solid line corresponds to actual California.
One key problem with the above method that we try to fix in this paper is the choice of control units in the model of Equation (3). Currently, this choice relies mostly on the subjectmatter expert, which leaves open opportunities for cherrypicking in constructing the control pool. To illustrate this problem we can perform the following manipulation of the synthetic control method. First, we add 9 unemploymentrelated time series^{1}^{1}1Data were collected from the Local Area Unemployment Statistics (LAUS) program of the Bureau of Labor Statistics (Bureau of Labor Statistics, 2018). See Appendix A for details., namely , into the pool of potential controls. Second, before adding these units to the control pool we transform the time series as follows:
(4) 
where . This transformation is done to ensure that the average of the new time series is similar to the time series on cigarette consumption before treatment. Since the synthetic control method only relies on statistical fit, it may pick up the artificial time series from the new control pool. Indeed, the new synthetic California is now described by the following model:
(5) 
where represents one artificial time series selected. This produces a synthetic control California that is drastically different than the original one in Figure 2. The weight on the artificial control unit is in fact the highest compared to the weights of all other units, which is clearly undesirable. More importantly, with the new synthetic control California, we estimate a negative causal effect of 8 packs on average from the intervention, as shown in Figure 3. This example suggests that the synthetic control method can be easily manipulated to select control units that are unrelated to the problem, and thus give unreliable causal estimates.
1.3 Overview of contribution
To guard against such cherrypicking of the control pool, we aim at developing a practical and largely automated method to screen out control units that have a weak relationship to the treated unit. The notion of relatedness that we use in this paper is borrowed from dynamical systems theory (Pecora et al., 1996), as we view the collection of time series in the panel data as a dynamical system. Seminal theoretical results from embedding theory (Takens, 1981; Sauer et al., 1991; Bradley and Kantz, 2015) suggest that a dynamical system can be fully reconstructed, under fairly general assumptions, by taking time delays of any constituent time series in the system. Interestingly, the reconstruction can be done through any scalar summary of the system that satisfies certain mild smoothness conditions. Such reconstruction is essentially a diffeomorphism, which maps the reconstructed dynamical system to the original one in a onetoone way. However, certain reconstructions look similar to the original system, while others are quite distorted (Gibson et al., 1992; Kostelich and Schreiber, 1993; Eftekhari et al., 2016). This variation can be used to score each constituent time series variable in terms of how similar the reconstructed system is to the original one. These scores could then be used for various purposes, such as forecasting (Deyle et al., 2013; Ye et al., 2015), or to understand which variables are driving the system, and which ones are being driven (Sugihara et al., 2012).
In this paper, we leverage the same tools and intuitions and use them in the context of synthetic controls. While we do not assign any causal interpretation to the aforementioned delayembedding reconstructions, we utilize them to screen out controls that are not dynamically related to the treated unit. The strength of the dynamical relationship between a control unit and the treated unit can in practice be quantified simply through the correlation between the actual treated unit’s outcome variable with the variable (of the control unit) used for the reconstruction. In fact, this procedure can be reversed, so that the reconstruction is based on the treated unit’s variable as well. This way, we can also quantify how strongly the control unit and the treated unit are coupled in the system. By considering all pairs of controltreated units we can filter out those controls that have the weakest relationship with the treated unit, and remove them from the control pool. The filtered control pool can then be passed forward to the standard synthetic control method. The idea is that the new pool will consist of variables that are more strongly coupled with each other, so that the model continuity assumption required in synthetic controls is more plausible.
To illustrate this approach and its ability to guard against cheerypicking of potential controls, we apply it to the earlier example involving the artificial time series in the tobacco control program. Through the prescreening step described earlier, and which we detail in the following sections, we can actually filter out the artificial time series and remove it from the control pool (see Appendix A for details). With this new pool, the new synthetic control California is described by the following model:
(6) 
Note that the new pool gives new constraints for optimization, and so the new model is different from the one without artificial units. The model in Equation (1.3) corresponds to the synthetic control California shown in Figure 4, which is virtually identical to the original one presented in Figure 2. Importantly, with the new synthetic California we have restored a positive causal effect (18 packs on average), since the artificial controls that caused the sign reversal have been filtered out.
More generally, our approach shows that theory and properties of dynamical systems can naturally be integrated into statistical frameworks of causal inference, a goal that so far has remained elusive (Rosser, 1999; Durlauf, 2005). In relation to the Lucas Critique, we argue that our approach bridges the gap between reducedform approaches, such as synthetic controls, and structural approaches, which use microeconomic models to do causal inference. Intuitively, taking dynamics into account captures some of the underlying structural information in the system. From this perspective, our paper introduces, for the first time, fundamental results in dynamical systems theory, namely timedelay embeddings, into methods of statistical causal inference with panel data.
2 Dynamical systems theory and proposed method
2.1 Overview of convergent cross mapping (CCM)
Following the notation in Section 1.1, we consider the time series in synthetic controls, namely , as a dynamical system. The state of the system at time is the vector of values from every time series. Taken across all possible , this defines a manifold, , known as the phase space of the system, defined formally as , where denotes the length of time series, and the number of controls . For example, when and there are two units in total, is a curve (possibly selfintersecting) on the plane.
There may be different transformations of the phase space, known as embeddings, which may preserve or not the topological properties of the original manifold. A seminal result in embedding theory is Takens’ theorem (Takens, 1981), which shows that the phase space of a dynamical system can be reconstructed by taking timedelayed observations from the system. For example, we can consider a delaycoordinate embedding of the form , where is the time delay. The key theoretical result of Takens (1981) is that manifold defined from outcomes is diffeomorphic (i.e., mapping is differentiable, invertible, and smooth) to the original manifold , meaning that some important topological information is preserved, such as invariance to coordinate changes. In other words, is a reconstruction of . It follows that reconstructions , for various , are diffeomorphic to each other, including the original manifold , which implies crosspredictability. For two different reconstructions and , with their corresponding base time series and , we could use to predict and use to predict . By measuring this crosspredictability, the relative strength of dynamical relationship between any two variables in the system can be obtained (Schiff et al., 1996; Arnhold et al., 1999).
One method utilizing this idea is convergent cross mapping (CCM), which was recently proposed by Sugihara et al. (2012). In addition to the aforementioned ideas of crosspredictability, CCM also relies on an implication of Takens’ theorem, whereby neighboring points in the reconstructed manifold are close to neighboring points in the original manifold. This suggests that cross predictability will increase and stabilize as data size increases. The cross predictability is quantified by a CCM score, which could simply be the Pearson correlation coefficient between the observed values and the predicted values obtained from a reconstruction. Operationally, the generic CCM algorithm considers two time series and and their corresponding delaycoordinate embedding vectors at time , namely
(7) 
where is the embedding dimension and and is the delay—we will consider these quantities to be fixed, and in Section 4 we point to references of recent work on tuning those parameters. The manifold based on the phase space of is denoted by , and the manifold based on is denoted by . The idea is that these manifolds are diffeomorphic to the original manifold comprised of both and that defines the dynamical system of and . A manifold from the delay embedding of one variable can be used to predict the other variable, and the quality of this prediction is an indication of which variable drives the other.
Such prediction proceeds in discrete steps as follows. First, we build a nonparametric model of using the reconstruction manifold based on . For a given time point we pick nearest neighbors from in , where is called the library size, and denote their time indices (from closest to farthest) as . A linear model for is as follows:
(8) 
where is the weight based on the Euclidean distance between and its th nearest neighbor on , that is, and , and the norm operator is the usual L2 norm. The correlation between and across is one CCM score of on :
(9) 
Intuitively, the correlation in Equation (9) captures how much information there is in about . If dynamically drives we expect such information to be large. Conversely, the value of is obtained by repeating the above procedure using manifold of the values from the delay embedding of , and quantifies the information in about . The two CCM scores jointly quantify the dynamic coupling between the two variables, and, by Takens’ theorem, the two scores will converge to a number as library size increases (Sugihara et al., 2012). Another way to define the CCM score is through the mean absolute error (MAE) between true and estimated values. This is sometimes preferred for analytical convenience, and we will use it in this paper. In this case, we define
(10) 
where lower CCM scores are now better in terms of predictability. As before, convergence is implied by the Takens’ theorem, but in deterministic, possibly chaotic, systems.
From a statistical perspective, the CCM method is a form of nonparametric timeseries estimation (Härdle et al., 1997). The unique feature of CCM, and more generally of delay embedding methods, is that the nonparametric components are in fact the time indices in Equation (8). This differs from kernel smoothing (Hastie et al., 2001), where the target point is fitted by neighboring observations to smooth estimation. As mentioned earlier, the theoretical justification for CCM lies on Takens’ theorem, which implies that there exists a onetoone mapping such that the nearest neighbors of identify the corresponding time indices of nearest neighbors of , if and are dynamically related. As the library size, , increases, the reconstruction manifolds and become denser and the distances between the nearest neighbors shrink; see (Sugihara and May, 1990; Casdagli et al., 1991; Eckmann and Ruelle, 1985; Sauer et al., 1991) for details.
To illustrate these ideas, in the following section we perform a theoretical analysis of CCM in the context of the familiar autoregressive model in statistics.
2.2 Illustration of CCM on autoregressive model
In this section, we illustrate the delay embedding method of CCM through a simple autoregression model. This is a standalone contribution of this paper because, to the best of our knowledge, there is no theoretical analysis of delayembedding methods, such as CCM, in the context of known statistical or econometric models of treatment effects.
Consider the following autoregressive model:
(11) 
where , are fixed, with , is the drift, and are zeromean and constantvariance normal errors. In the joint dynamical system of and , it is evident that drives since evolves independently of , whereas the evolution of depends on . We are interested in knowing how CCM quantifies this asymmetric dynamic relationship between and . To do so, we will derive and study the CCM estimates between variables and , namely and .
Theorem 1.
Let and , where is the delayed coordinate embedding of , as defined in Equation (7), and are the nearest neighbors to . The CCM score of on based on MAE metric is equal to:
(12) 
where , and
Similarly, let and , where is the delayed coordinate embedding of , as defined in Equation (7), and are the nearest neighbors to . The CCM score of on based on MAE metric is equal to:
(13) 
where
Remarks. From Theorem 1, we observe that the CCM scores have: (1) one component that involves the distance between the starting point, , and the stationary mean, , of series ; (2) one component that involves the noise terms ; (3) and only for one term that includes the noise terms . Since it follows , and the first term diminishes at for both CCM scores. Here, we give several highlevel insights of CCM analysis on the dynamical relationship between and (more detailed analysis with experiments are given in Appendix B):

When the dependence of on is entirely lost, and so and evolve independently implying that there is no driving factor in the system. CCM analysis confirms this observation, since and , with the two scores being independent in the limit.

When is small, i.e., , it means that is weakly dependent on . The error term will dominate in , meaning that on average . CCM analysis indicates correctly that the driving factor in the system is and not . When is very large, the coupling between and becomes stronger. This situation is similar to when is large or is small, which are analyzed below.

When , it follows that . In this setting, is essentially deterministic, converging fast to , whereas behaves increasingly as a random walk. Thus, contains no information about . The CCM analysis confirms this result, since it implies, in the limit, that , whereas .

The settings where is large or when are symmetrical, and thus yield similar results. In these settings, and follow each other in lockstep, and the main difference between and is the scaling factor . Since , , indicating that drives . In fact, we would have equality in the limit if depended on instead of .
In summary, the theoretical results of CCM analysis in Theorem 1 elegantly summarize the various aspects of the dynamical relationship between and in the autoregressive model of Equation (2.2). More details on the performance of CCM in different contexts, such as forecasting in ecology, can be found in Sugihara et al. (2012); Ye et al. (2015).
2.3 Using CCM to improve synthetic controls
In this section, we return to synthetic controls, and apply CCM to quantify the strength of the dynamical relationship between the treated unit and the control units in the pool. As mentioned before, the idea will be to screen out control units that have a weak dynamic relationship with the treated unit.
We begin by presenting the CCM scores based on the correlation measure of Equation (8), where treated California (CA) is crossmapped with the five control states selected by the standard synthetic control method, namely, Colorado (CO), Connecticut (CT), Montana (MT), Nevada (NV), and Utah (UT). We use percapita cigarette sales time series from preintervention period as the outcome variable, which gives 19 data points for each series. The cross predictability measured by the CCM scores for each state pair is shown in Figure 5. For example, the figure shows curves for both and for the CaliforniaColorado pair.
We see that the cross predictability for all pairs roughly converge to some level as library size, , increases (xaxis in the figure). As mentioned earlier, this is predicted by Takens’ theorem. Furthermore, most pairs converge to the same high level of CCM score, and for every pair there is a strong and bidirectional dynamical relationship between the states in the pair. The only obvious exception is the pair between California and Connecticut, where we see an asymmetry between the two curves, indicating a weak dynamical relationship between them. In particular, we see that Connecticut is better predicted from California than the other way round, indicating that from a dynamical systems point of view Connecticut drives California more than the other way around, in the context of tobacco cigarette sales. For this reason, we argue (and, indeed, this is a key intuition in our proposed methodology) that Connecticut is not a suitable control for California and should be removed from the donor pool. Below, we aim at giving some qualitative explanations of the underlying reasons for this proposal.
While Connecticut and California are similar in some important aspects that are relevant to progressive legislation (e.g., both have high median household income), a closer look into the data and history of tobacco production provides additional support for why Connecticut should be removed from the donor pool. Figure 6 compares the percapita cigarette sales trends for California and Connecticut. We see that the trends for Connecticut and California are distinct, especially between 19701980. This explains why CCM implies no strong dynamic coupling between the two states. Indeed, if we apply an averaging transformation to smooth the 19701980 trend of Connecticut (rightmost plot in Figure 6, named “Smooth Connecticut”) CCM estimates a strong dynamic coupling between the two states (bottomright plot in Figure 5). In the context of tobacco legislation, Connecticut differs crucially from California because Connecticut has been a large tobaccoproducing state. In particular, up until recently Connecticut was cultivating shade tobacco in River valley (colloquially known as “Tobacco Valley”), which is famous for being used as binder and wrapper for premium cigars. At its peak, Connecticut was one of the largest tobacco producer states in the US (totaling more than 20,000 acres of tobacco cultivation land), which distinguishes the state from California and the rest of control states in the control unit pool.
3 Applications
In this section, we describe our proposed procedure, and then apply it to two case studies. Our main procedure filters out control units that are not dynamically coupled with the treated unit, and then applies the standard synthetic control methodology. We refer to our procedure as CCM+SCM, and the generic synthetic control method as SCM. Specifically, CCM+SCM filters out a control unit if in the two CCM correlation plots with the treated unit, either the maximum correlation across library sizes is less than 0.5 in both directions, or the correlation gap is greater than 0.2 at the maximum library size. For example, in Figure 5 the CCM plots for ConnecticutCalifornia have a gap that is larger than 0.2 at the maximum library size (), and so we decide to filter out Connecticut. These cutoffs are necessarily adhoc, since there is currently no theory for a more dataadaptive selection. We chose the aforementioned numerical values based on our extensive applied studies with CCM. Even though a more principled cutoff selection would clearly be desirable, choosing appropriate cutoffs has not been a critical issue in the applications we considered. For instance, in Figure 5 it is evident that the plots for ConnecticutCalifornia are distinct from all others, and thus Connecticut stands out as a state with weak dynamical relationship with California.
To examine the screening power of CCM+SCM, we design simulated studies where artificial control units are added in the pool to mislead the synthetic control selection procedure.
3.1 California’s tobacco control program
In the first application, we consider the California’s tobacco control program studied by Abadie et al. (2010), using the annual statelevel percapita cigarette sales panel data from 1970 to 2000 available in their work. The program was passed in November 1988 and took effect in January 1989, which gives 19 years of preintervention data. There are 38 states as controls, California is the treated state. Four predictors of the outcome variable are used, including per capita state personal income (logged), percentage of the population aged 15–24, average retail price of cigarettes, and per capita beer consumption.
We design the following simulated study. Artificial control unit is created as a hypothetical state at time , and then added in the donor pool. Then, we perform the standard synthetic control analysis, and check whether CCM+SCM or SCM select the artificial units to construct synthetic California. The idea is that a method that better captures the dynamic relationship between treated units and controls, does not select artificial units as controls. More specifically, we create the artificial units from noisy copies of various realworld time series used as templates. We choose templates from three data sources:

Unemployment: We use statelevel unemployment data, available from 1976 to 2016, giving 41 years of data for 51 states ^{2}^{2}2Data collected from the Local Area Unemployment Statistics (LAUS) program of the Bureau of Labor Statistics (BLS) (Bureau of Labor Statistics, 2018) https://www.bls.gov/lau/staadata.txt. We define artificial units as where is a scalar, and is unemployment for state at time .

Income: We use data on wealth share of top 0.1 in the United States from 1953 to 2012, every other year (Saez and Zucman, 2016). We define artificial variables as where is a linear transformation function, is income data for state at time , is a scalar, and is Gaussian noise.

Downshift: We use generic time series data with downshift trend, obtained from a popular machine learning data repository ^{3}^{3}3http://archive.ics.uci.edu/ml/datasets/synthetic+control+chart+time+series (Lichman, 2013). We define the artificial time series as where is downshift data for state at time .
For each template, we generate multiple sets of artificial control units to run simulations and obtain the selection results. Each set includes 39 artificial control units that is the same number of units as in the original study. We provide full details on how to create the aforementioned artificial units in Appendix A. We plot the number of selected artificial states (nstates) and the corresponding average treatment effect (ATE) for models SCM and CCM+SCM with box plots. The results are shown in Figures 9, 9, and 9, respectively.
We can see that CCM+SCM selects much fewer artificial states than SCM, especially illustrated in Figure 9 and Figure 9, where CCM+SCM rejects virtually all artificial units, while SCM selects a large proportion of them. In addition, CCM+SCM is observed to generate more stable and concentrated average treatment effect estimation across experiments. Specifically, the ATE reported by CCM+SCM remains stable between 18 to 23 regardless of the adversarial situations, which is close to the original ATE (roughly equal to 19 packets) estimated by SCM without artificial controls, while the ATE estimate from SCM is more varied in a range of values from 5 to 23.
3.2 Brexit vote
The second application we consider is about estimating the economic costs of the Brexit vote in June 2016, and was motivated by the technical report by Born et al. (2017), where the authors used the standard synthetic control methodology to argue that UK’s economy was hurt after the Brexit vote. We picked 30 OECDmember countries as potential controls, and UK is the treated unit. We collected quarterly real GDP data of these countries from the OECD Economic Outlook database (June 2017) from 1995Q1 to 2018Q4^{4}^{4}4https://stats.oecd.org/index.aspx?DataSetCode=EO, where data from 2017Q4 till 2018Q4 are forecasts. The length of the quarterly real GDP data is 96, and the first 86 data points are before Brexit vote. It is assumed that the treatment took form after 2016Q2, and the countries in the donor pool are not affected by the treatment. We also collected predictors of outcome variable such as private consumption, investment, inflation rate, interest rate, and exchange rate. As before, To generate artificial control units, we use time series data that are unrelated to the problem. Specifically, we use the time series of monthly average daily calls to directory assistance from Jan 1962 to Dec 1976 collected from the Time Series Data Library ^{5}^{5}5https://datamarket.com/data/set/22yq, which gives 106 data points to construct 11 templates. Each template generates one set of artificial control units, which includes 31 artificial control units that is the same number of units in the original study. The full details on predictors in this application, and on data preprocessing are in Appendix A.
The simulation results are summarized in Figure 10. As before, we see that CCMSCM selects much fewer artificial units, and generates more stable treatment effect estimates than SCM. Specifically, SCM selects on average more than twice number of artificial control units than CCM+SCM. CCM+SCM filters our all artificial control units from the control pool. Importantly, the ATE estimate from SCM can be negative under appropriate construction of the artificial control time series. This means that the effects of Brexit vote may have been overstated in ongoing econometric work that uses synthetic control methods, as the estimates are likely to be sensitive to control pool construction.
4 Discussion
Due to the success of CCM in quantifying dynamical relationships (Sugihara et al., 2012; Deyle et al., 2013), it may be tempting to consider it as a method for causal inference. We recommend against this idea. To illustrate why, we apply CCM on causal relationship detection tasks from the benchmark dataset CauseEffectPairs (Mooij et al., 2016), which contains time series pairs that are known a priori to be causal or not; for instance, one such pair is altitudetemperature.
Two cases where CCM fails to detect the true direction of causality are shown in Figure 11. Pair 67 in the benchmark data is the financial time series about stock returns from two companies in which one stock is believed to depend on the other. We can see that the CCM score fails to visually converge as library size increases. By inspecting the data, we noticed that the univariate predictability of each time series using delayed embeddings of itself is very low for both variables. This indicates that the time series is close to a random walk. Since CCM theory mainly applies on dynamical systems that are primarily deterministic or chaotic, it does not appear to be reliable in systems dominated by noise.
Another example is Pair 69 in the data of indoor and outdoor temperature. In this example, the ground truth is that outdoor temperature variable drives the indoor temperature variable , indicating that the green curve in the right subplot of Figure 11 should converge faster than the red curve. However, CCM gives the opposite causal direction result with relative low crosspredictability (less than 0.5). A possible explanation for this might be that temperature is periodic. Chang et al. (2017) have suggested that strong periodicities could undermine the effectiveness of CCM, and have proposed ways to fix it.
Another practical aspect of CCM is that hyperparameters, such as the embedding dimension and time delay , should be carefully chosen. To illustrate this, we consider Pair 68 in the data of internet connections and traffic, where is bytes sent and is number of http connections. Figure 12 shows CCM results for this pair with simple data transformations and varying embedding dimension .
Although CCM uncovers the correct causal detection with the original data and embedding dimension 3, the result is not strong enough. Moreover, CCM detects a wrong causal direction when the embedding dimension is set to 4. But with a transform on this data, the results are much stronger and clearer. Optimality of embedding methods and parameter tuning is a currently active research area (Rosenstein et al., 1994; Small and Tse, 2004; Garland and Bradley, 2015).
5 Concluding remarks
In this paper, we leveraged results from dynamical systems theory, such as the celebrated Takens’ theorem (Takens, 1981), to quantify the strength of dynamic relationship between treated and control units. We showed that this is useful in the context of comparative cases studies in panel data to guard against cherrypicking of potential controls, which is an important concern in practice. In simulated studies, we showed that standard methods, such as synthetic controls, can be biased towards any desirable direction through artificial control units. We proposed a simple method to screen out control units that are weakly related to treated units, and through extensive simulations showed that the aforementioned bias is largely mitigated.
Our work opens up the potential for an interplay between dynamical systems theory and causal inference, which may be fruitful in the context of complex timeevolving systems. Especially in econometrics, treatments are typically applied on complex dynamical systems, such as an auction or a labor market, which always evolve, before and after treatment. On a highlevel, our work could be viewed as a first step towards integrating methods from nonlinear timeseries analysis in treatment effect estimation to control for the strength of dynamical relationships between treated and control units, in addition to controlling only for their static crosssectional covariates, which is more standard.
Future work could focus more on theoretical connections between embedding methods, such as CCM, and standard treatment effects in econometrics, especially if we view the filtering process described in Section 2.3 as a way to do matching. Furthermore, CCM provides a viable way to connect structural and reducedform approaches. It would be interesting to know, for example, under what conditions reducedform methods could use CCM to control for hidden economic structure.
References
 Abadie et al. (2010) Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods for comparative case studies: Estimating the effect of californiaâs tobacco control program. Journal of the American Statistical Association, 105(490):493–505.
 Abadie et al. (2015) Abadie, A., Diamond, A., and Hainmueller, J. (2015). Comparative politics and the synthetic control method. American Journal of Political Science, 59(2):495–510.
 Abadie and Gardeazabal (2003) Abadie, A. and Gardeazabal, J. (2003). The economic costs of conflict: A case study of the basque country. American Economic Review, 93(1):113–132.
 Angrist and Pischke (2008) Angrist, J. D. and Pischke, J.S. (2008). Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
 Arnhold et al. (1999) Arnhold, J., Grassberger, P., Lehnertz, K., and Elger, C. E. (1999). A robust method for detecting interdependences: Application to intracranially recorded eeg. Phys. D, 134(4):419–430.
 Billmeier and Nannicini (2013) Billmeier, A. and Nannicini, T. (2013). Assessing economic liberalization episodes: A synthetic control approach. The Review of Economics and Statistics, 95(3):983–1001.
 Born et al. (2017) Born, B., MÃ¼ller, G. J., Schularick, M., and Sedlacek, P. (2017). The Economic Consequences of the Brexit Vote. Discussion Papers 1738, Centre for Macroeconomics (CFM).
 Bradley and Kantz (2015) Bradley, E. and Kantz, H. (2015). Nonlinear timeseries analysis revisited. Chaos: An Interdisciplinary Journal of Nonlinear Science, 25(9):097610.
 Bureau of Labor Statistics (2018) Bureau of Labor Statistics, U. D. o. L. (2018). Local Area Unemployment Statistics.
 Card and Krueger (1994) Card, D. and Krueger, A. (1994). Minimum wages and employment: A case study of the fastfood industry in new jersey and pennsylvania. American Economic Review, 84(4):772–93.
 Casdagli et al. (1991) Casdagli, M., Eubank, S., Farmer, J. D., and Gibson, J. (1991). State space reconstruction in the presence of noise. Phys. D, 51(13):52–98.
 Chang et al. (2017) Chang, C.W., Ushio, M., and Hsieh, C.h. (2017). Empirical dynamic modeling for beginners. Ecological Research, 32(6):785–796.
 Deyle et al. (2013) Deyle, E. R., Fogarty, M., Hsieh, C.h., Kaufman, L., MacCall, A. D., Munch, S. B., Perretti, C. T., Ye, H., and Sugihara, G. (2013). Predicting climate effects on pacific sardine. Proceedings of the National Academy of Sciences, 110(16):6430–6435.
 Doudchenko and Imbens (2016) Doudchenko, N. and Imbens, G. W. (2016). Balancing, regression, differenceindifferences and synthetic control methods: A synthesis. Technical report, National Bureau of Economic Research.
 Durlauf (2005) Durlauf, S. N. (2005). Complexity and empirical economics. The Economic Journal, 115(504):F225–F243.
 Eckmann and Ruelle (1985) Eckmann, J.P. and Ruelle, D. (1985). Ergodic theory of chaos and strange attractors. In The Theory of Chaotic Attractors, pages 273–312. Springer.
 Eftekhari et al. (2016) Eftekhari, A., Yap, H. L., Wakin, M. B., and Rozell, C. J. (2016). Stabilizing embedology: Geometrypreserving delaycoordinate maps. arXiv preprint arXiv:1609.06347.
 Friedman (1975) Friedman, M. (1975). Unemployment versus inflation?: an evaluation of the Phillips curve. Number 2. Transatlantic Arts.
 Garland and Bradley (2015) Garland, J. and Bradley, E. (2015). Prediction in projection. Chaos: An Interdisciplinary Journal of Nonlinear Science, 25(12):123108.
 Gibson et al. (1992) Gibson, J. F., Doyne Farmer, J., Casdagli, M., and Eubank, S. (1992). An analytic approach to practical state space reconstruction. Physica. D, Nonlinear phenomena, 57(12):1–30.
 Härdle et al. (1997) Härdle, W., Lütkepohl, H., and Chen, R. (1997). A review of nonparametric time series analysis. International Statistical Review, 65(1):49–72.
 Hastie et al. (2001) Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning. Springer.
 Kostelich and Schreiber (1993) Kostelich, E. J. and Schreiber, T. (1993). Noise reduction in chaotic timeseries data: A survey of common methods. Physical Review E, 48(3):1752.
 Kreif et al. (2016) Kreif, N., Grieve, R., Hangartner, D., Turner, A. J., Nikolova, S., and Sutton, M. (2016). Examination of the synthetic control method for evaluating health policies with multiple treated units. Health Economics, 25(12):1514–1528.
 Lichman (2013) Lichman, M. (2013). UCI machine learning repository.
 Lorenz (1963) Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20(2):130–141.
 Lucas (1976) Lucas, R. J. (1976). Econometric policy evaluation: A critique. CarnegieRochester Conference Series on Public Policy, 1(1):19–46.
 Mooij et al. (2016) Mooij, J. M., Peters, J., Janzing, D., Zscheischler, J., and Schölkopf, B. (2016). Distinguishing cause from effect using observational data: Methods and benchmarks. Journal of Machine Learning Research, 17(32):1–102.
 Pecora et al. (1996) Pecora, L., Carroll, T., and Heagy, J. (1996). Statistics for continuity and differentiability: An application to attractor reconstruction from time series. Nonlinear Dynamics and Time Series: Building a Bridge Between the Natural and Statistical Sciences.
 Phillips (1958) Phillips, A. W. (1958). The relation between unemployment and the rate of change of money wage rates in the united kingdom, 1861–1957 1. economica, 25(100):283–299.
 Roberts (1995) Roberts, J. M. (1995). New keynesian economics and the phillips curve. Journal of money, credit and banking, 27(4):975–984.
 Rosenstein et al. (1994) Rosenstein, M. T., Collins, J. J., and Luca, C. J. D. (1994). Reconstruction expansion as a geometrybased framework for choosing proper delay times. Physica D: Nonlinear Phenomena, 73(1):82 – 98.
 Rosser (1999) Rosser, J. B. (1999). On the complexities of complex economic dynamics. The Journal of Economic Perspectives, 13(4):169–192.
 Saez and Zucman (2016) Saez, E. and Zucman, G. (2016). Editor’s choice wealth inequality in the united states since 1913: Evidence from capitalized income tax data. The Quarterly Journal of Economics, 131(2):519–578.
 Sauer et al. (1991) Sauer, T., Yorke, J. A., and Casdagli, M. (1991). Embedology. Journal of Statistical Physics, 65(3):579–616.
 Saunders et al. (2015) Saunders, J., Lundberg, R., Braga, A. A., Ridgeway, G., and Miles, J. (2015). A synthetic control approach to evaluating placebased crime interventions. Journal of Quantitative Criminology, 31(3):413–434.
 Schiff et al. (1996) Schiff, S. J., So, P., Chang, T., Burke, R. E., and Sauer, T. (1996). Detecting dynamical interdependence and generalized synchrony through mutual prediction in a neural ensemble. Phys. Rev. E, 54:6708–6724.
 Small and Tse (2004) Small, M. and Tse, C. (2004). Optimal embedding parameters: a modelling paradigm. Physica D: Nonlinear Phenomena, 194(3):283 – 296.
 Strogatz (2001) Strogatz, S. H. (2001). Nonlinear Dynamics And Chaos: With Applications To Physics, Biology, Chemistry, And Engineering. Studies in Nonlinearity. Westview Press.
 Sugihara et al. (2012) Sugihara, G., May, R., Ye, H., Hsieh, C.h., Deyle, E., Fogarty, M., and Munch, S. (2012). Detecting causality in complex ecosystems. 338.
 Sugihara and May (1990) Sugihara, G. and May, R. M. (1990). Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature, 344(6268):734–741.
 Takens (1981) Takens, F. (1981). Detecting strange attractors in turbulence. Lecture Notes in Mathematics, Berlin Springer Verlag, 898:366.
 Ye et al. (2015) Ye, H., Beamish, R. J., Glaser, S. M., Grant, S. C. H., Hsieh, C.h., Richards, L. J., Schnute, J. T., and Sugihara, G. (2015). Equationfree mechanistic ecosystem forecasting using empirical dynamic modeling. Proceedings of the National Academy of Sciences, 112(13):E1569–E1576.
Appendix A Data
a.1 California’s tobacco control program
We use time series templates from three difference data sources to generate artificial control units. For each template, we create 39 artificial states (the same number of states in original study) with corresponding panel data and four predictors of the outcome variable. Specifically, the panel data are generated from multiple sets of noisy copies from the template and the predictors are from original tobacco data but with permuted indices for each artificial state. We add to the original pool to construct a new pool including 77 control units. We also apply moderate data transformations to ensure these adversaries sizable but unrelated to original data. For each data source, multiple sets of artificial control units are generated. We run simulations for each set of adversaries and display result distributions with box plots. The details of each data source are described as follows.
Unemployment.
The unemployed percent of US labor force data include annual average employment status of the civilian population from 1976 to 2016, giving 41 years of data for 51 states. ^{6}^{6}6Data collected from the Local Area Unemployment Statistics (LAUS) program of the Bureau of Labor Statistics (BLS) (Bureau of Labor Statistics, 2018) https://www.bls.gov/lau/staadata.txt To make the data sizable with the original tobacco data, the scaling factor is set to be 6. To generate multiple sets of artificial control units, we select the starting year between 1976 to 1986 and take the following 31 data points as 31 years of data for each state, which gives 11 sets of artificial control units. The simulation results are obtained over 11 runs.
Income.
The income data include time series of Top 0.1 Wealth Share in the United States from 1953 to 2012 every other year (Saez and Zucman, 2016). A linear transformation is applied to ensure that the trend and scale is compatible with tobacco data:
where is the original Top 0.1 Wealth Share data and is the transformed data. We fit an autoregressive model to this template and create 39 noisy copies as the adversarial time series for 39 states by adding Gaussian noise to them. In order to generate multiple sets of artificial control units, we scale the Gaussian noise with factor , where , which gives 11 sets of artificial control units. The simulation results are obtained over 11 runs.
Downshift.
The downshift data is chosen from the downshift trend time series from Synthetic Control Chart Time Series ^{7}^{7}7http://archive.ics.uci.edu/ml/datasets/synthetic+control+chart+time+series (Lichman, 2013). Since the time series length is 50, we choose different starting points to generate 21 different templates. For each template, we fit an autoregressive model and created 39 noisy copies as the artificial control units by adding Gaussian noise to them. The simulation results are obtained over 21 runs.
a.2 Brexit vote
We normalized the time series for each country by dividing the time series by its 1995 average and then taking logarithm of that time series to generate the approximately zero starting point in 1995. The predictors of outcome variable include:

Real private consumption: the sum of real final consumption expenditure of both households and nonprofit institutions serving households, from 1997Q1 to 2017Q2.

Real investment: total gross fixed capital formation, from 1995Q1 to 2017Q2.

Net exports: the external balance of goods and services, from 1997Q1 to 2017Q2.

Inflation series: computed as the change in the Consumer Price Index (CPI), from 1998Q1 to 2017Q3.

Quarterly shortterm nominal interest rates: computed as quarterly averages of monthly values, from 2002Q1 to 2017Q4.

Nominal exchange rates: from 1997Q1 to 2018Q4.
The artificial control units are generated in the same way as the example of California’s tobacco control program. We use time series template and create 31 artificial countries (same number of control countries in the original study) with corresponding panel data and six predictors of the outcome variable. Specifically, the panel data are generated from multiple sets of noisy copies from templates and predictors are still from original tobacco data but with permuted indices. We add to the original pool to construct a new pool including 61 control units. We also apply moderate data transformations to ensure these adversaries sizable and unrelated to original panel data. The details on how to generate the adversaries are described as follows.
Calls.
We use the calls data collected from the Monthly average daily calls to directory assistance from Jan 1962 to Dec 1976 ^{8}^{8}8https://datamarket.com/data/set/22yq as the template. We choose the first 106 data points from this series due to its similar trend with the brexit data, which gives 11 different sets of templates by choosing different starting points. For each template, we fit an autoregressive model and create 31 noisy copies as 31 artificial countries by adding Gaussian noise to them. The simulation results are obtained over 11 runs.
Appendix B Theory
b.1 Proof of Theorem 1
Theorem.
Let and , where is the delayed coordinate embedding of , as defined in Equation (7), and are the nearest neighbors to . The CCM score of on based on MAE metric is equal to:
where , and
Similarly, let and , where is the delayed coordinate embedding of , as defined in Equation (7), and are the nearest neighbors to . The CCM score of on based on MAE metric is equal to:
where
Proof.
Given the following autoregressive model:
where , are fixed, with , is the drift, and are zeromean and constantvariance normal errors. The autoregressive model can be equivalently written as
where and is the summation of weighted error terms , assuming that . It is obvious that as .
Since drives in this model, we can generate a cross estimation of from denoted as . For every , where is the library size, we can find such that are nearest neighbors of , where is the corresponding delayed coordinate embedding with dimension . Thus, the cross estimation for is
Then, CCM score of on based on MAE metric is
where .
Similarly, we generate a cross estimation of from denoted as . For every , where is the library size, we can find such that are nearest neighbors of , where is the corresponding delayed coordinate embedding with dimension . Thus, the cross estimation for is
Then, the CCM score of on based on MAE metric is
where . ∎
b.2 Simulations and analysis details
We now turn to simulations by generating two time series and using the aforementioned autoregressive model by setting , , . The embedding dimension is 4 and thus the number of nearest neighbors is 5. We set library size and compute MAE for the prediction of future data, i.e., when . The simulation results are averaged over 200 runs displayed in Figure 13, where the default setting is ,