Approximate Causal Abstraction
Scientific models describe natural phenomena at different levels of abstraction. Abstract descriptions can provide the basis for interventions on the system and explanation of observed phenomena at a level of granularity that is coarser than the most fundamental account of the system. Beckers and Halpern (?), building on work of Rubenstein et al. (?), developed an account of abstraction for causal models that is exact. Here we extend this account to the more realistic case where an abstract causal model offers only an approximation of the underlying system. We show how the resulting account handles the discrepancy that can arise between low- and high-level causal models of the same system, and in the process provide an account of how one causal model approximates another, a topic of independent interest. Finally, we extend the account of approximate abstractions to probabilistic causal models, indicating how and where uncertainty can enter into an approximate abstraction.
Sander Beckers, Dept. of Philosophy and Religious Studies, Utrecht University; Frederick Eberhardt, Humanities and Social Sciences, California Institute of Technology; Joseph Y. Halpern, Dept. of Computer Science, Cornell University
Scientific models aim to provide a description of reality that offers both an explanation of observed phenomena and a basis for intervening on and manipulating the system to bring about desired outcomes. Both of these aims lead to a consideration of models that represent the causal relations governing the system. They also imply the need for scientific models that describe the system at a granularity or level of description appropriate for the user and suitable for interventions that are feasible. Such more abstract causal models do not capture all the detailed interactions that occur at the most fundamental level of the system, nor do they, in general, represent outcomes of the system completely accurately at the abstract level. Nevertheless, such abstract causal models can (at least) approximately explain the phenomena, and can be informative about how the system will respond to interventions that are specified only at the abstract level.
This paper provides a formal account of such approximate abstractions for causal models that builds on the definition of an abstraction provided by Beckers and Halpern (?) (see Section 2), which in turn built on the work of Rubenstein et al. (?). That notion of abstraction implicitly assumed an underlying causal system that permitted an exact description of the system at the abstract level. Here we weaken that assumption to handle what we take to be the more realistic case, namely, that abstract causal models will capture the underlying system in only an approximate way.
As a simplified working example to illustrate our points we use the case of the wind and sea surface temperature patterns over the equatorial Pacific that give rise to the high-level climate phenomena of El Niño and La Niña, as described by Chalupka et al. (?). They considered the question of how the El Niño climate phenomenon related to the underlying wind and sea surface temperature patterns that constitute it (see Fig. 1). At the low level they considered two high-dimensional vector-valued variables representing the wind speeds $\vec{W}$ and the sea surface temperatures $\vec{T}$, respectively, on a grid of geographical locations in the equatorial Pacific. They assumed (with some justification from climate science) that wind speed is a cause of sea surface temperature, that is, $\vec{T} = f(\vec{W}, \vec{u})$ for some high-dimensional function $f$ and exogenous causes $\vec{u}$. They allowed the possibility that $\vec{u}$ may be a confounder of $\vec{W}$ and $\vec{T}$, so that there might be an additional causal relation from $\vec{u}$ to $\vec{W}$. Leaving details about feedback and temporal delay aside, they were interested in whether the same system could be described at a higher level, using a low-dimensional structural equation $\bar{T} = \bar{f}(\bar{W}, \bar{u})$, where there is a surjective mapping from $\mathcal{R}(\vec{W}) \times \mathcal{R}(\vec{T})$, the set of possible values of $\vec{W}$ and $\vec{T}$, to $\mathcal{R}(\bar{W}) \times \mathcal{R}(\bar{T})$. In the language of this paper, they were searching for an abstract causal description of the system. They required that the high-level model retain a causal interpretation, in the sense that if one intervened on $\bar{W}$, there would still be a well-defined causal effect on $\bar{T}$, no matter how the intervention on $\bar{W}$ was interpreted as an intervention on the underlying set of variables $\vec{W}$.
Chalupka et al. (?) were able to learn such a high-level model, and one of the states of $\bar{T}$ (the high-level description of the sea surface temperatures) indeed corresponded to what would commonly be described as an El Niño occurring, conventionally defined by an average temperature deviation in a rectangular region of the Pacific. However, the high-level description was not perfect: it provided an informative causal description of the underlying systems and allowed for predictions that approximated the actual outcomes. Here we make precise the nature of such an approximation between a high-level and low-level causal model of the same system. In the process, we define what it means for one causal model to approximate another.
Although our running example is a vastly simplified climate model, the challenge of approximately modeling phenomena at a more abstract level is part of almost every scientific model. For example, it was Robert Boyle’s great insight that, despite its inaccuracies for real gases in practice, the ideal gas law still provides an approximate abstract description of the behavior of the molecules of a gas in a container that is extraordinarily useful for understanding and manipulating real systems. Approximate abstractions can take a variety of forms in scientific practice, ranging from idealizations and discretizations to other forms of simplification and dimension reduction (as in the climate example). Our account captures these in a unified formal framework.
The main contribution of this paper is to present a framework that offers a foundation for analyzing abstraction and approximation in causal models. We provide what we believe are sensible definitions of approximation and approximate abstraction, and a conceptual discussion of these notions. In addition, we provide some technical results regarding the difficulty of determining whether an approximate abstraction can be viewed as the composition of an approximation and an exact abstraction.
Since we are interested in scientific models that support explanations of phenomena and can inform interventions on a system, we start by defining a deterministic causal model with a set of possible interventions. We use exogenous and endogenous variables to distinguish those influences that are external to the system and those that are internal. The definitions follow the framework developed by Halpern (?).
Definition. A signature $\mathcal{S}$ is a tuple $(\mathcal{U}, \mathcal{V}, \mathcal{R})$, where $\mathcal{U}$ is a set of exogenous variables, $\mathcal{V}$ is a set of endogenous variables, and $\mathcal{R}$ is a function that associates with every variable $Y \in \mathcal{U} \cup \mathcal{V}$ a nonempty set $\mathcal{R}(Y)$ of possible values for $Y$ (i.e., the set of values over which $Y$ ranges). If $\vec{X} = (X_1, \ldots, X_n)$, $\mathcal{R}(\vec{X})$ denotes the crossproduct $\mathcal{R}(X_1) \times \cdots \times \mathcal{R}(X_n)$.
For simplicity in this paper, we assume that signatures are finite, that is, and are finite, and the range of each variable is finite.
Definition. A basic causal model $M$ is a pair $(\mathcal{S}, \mathcal{F})$, where $\mathcal{S}$ is a signature and $\mathcal{F}$ defines a function that associates with each endogenous variable $X$ a structural equation $F_X$ giving the value of $X$ in terms of the values of other endogenous and exogenous variables. Formally, the equation $F_X$ maps $\mathcal{R}(\mathcal{U} \cup \mathcal{V} - \{X\})$ to $\mathcal{R}(X)$, so $F_X$ determines the value of $X$, given the values of all the other variables in $\mathcal{U} \cup \mathcal{V}$.
Note that there are no functions associated with exogenous variables, since their values are determined outside the model. We call a setting $\vec{u}$ of values of exogenous variables a context. (We remark that the notion of context used here, which goes back to [Halpern and Pearl 2005], is similar to that of Boutilier et al. (?), in that both are assignments of values to variables. However, here it is used in particular to denote an assignment of values to all the exogenous variables.)
The value of $Y$ may not depend on the values of all other variables. $Y$ depends on $X$ in context $\vec{u}$ if there is some setting of the endogenous variables other than $X$ and $Y$ such that if the exogenous variables have value $\vec{u}$, then varying the value of $X$ in that context results in a variation in the value of $Y$; that is, there is a setting $\vec{z}$ of the endogenous variables other than $X$ and $Y$ and values $x$ and $x'$ of $X$ such that $F_Y(\vec{u}, x, \vec{z}) \ne F_Y(\vec{u}, x', \vec{z})$.
In this paper, we restrict attention to recursive (or acyclic) models, that is, models where there is a partial order $\preceq$ on variables such that if $Y$ depends on $X$, then $X \prec Y$. (Halpern (?) calls this strongly recursive, in order to distinguish it from models in which the partial order depends on the context. This distinction has no impact on our results.) In a recursive model, given a context $\vec{u}$, the values of all the remaining variables are determined (we can just solve for the value of the endogenous variables in the order given by $\preceq$). We often write the equation for an endogenous variable $X$ as $X = F_X(\vec{Y})$; this denotes that the value of $X$ depends only on the values of the variables in $\vec{Y}$, and the connection is given by $F_X$. Our climate example is recursive, since $\vec{T}$ depends on $\vec{W}$ but not vice versa.
An intervention has the form $\vec{X} \leftarrow \vec{x}$, where $\vec{X}$ is a set of endogenous variables. Intuitively, this means that the values of the variables in $\vec{X}$ are set to $\vec{x}$. The structural equations define what happens in the presence of external interventions. Setting the value of some variables $\vec{X}$ to $\vec{x}$ in a causal model $M = (\mathcal{S}, \mathcal{F})$ results in a new causal model, denoted $M_{\vec{X} \leftarrow \vec{x}}$, which is identical to $M$, except that $\mathcal{F}$ is replaced by $\mathcal{F}^{\vec{X} \leftarrow \vec{x}}$: for each variable $Y \notin \vec{X}$, $F^{\vec{X} \leftarrow \vec{x}}_Y = F_Y$ (i.e., the equation for $Y$ is unchanged), while for each $X$ in $\vec{X}$, the equation for $X$ is replaced by $X = x$ (where $x$ is the value in $\vec{x}$ corresponding to $X$).
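To make the formalism concrete, here is a minimal Python sketch (our own illustration, not code from the paper) of a recursive basic causal model with interventions: each endogenous variable has a structural equation, a context fixes the exogenous variables, and an intervention replaces some equations by constants.

```python
def solve(equations, context, intervention=None):
    """Solve a recursive model. `equations` maps each endogenous variable to a
    (parents, function) pair; `context` assigns values to the exogenous
    variables; `intervention` clamps some endogenous variables to fixed
    values, replacing their equations."""
    intervention = intervention or {}
    values = dict(context)
    remaining = set(equations)
    while remaining:
        progress = False
        for v in sorted(remaining):
            if v in intervention:
                values[v] = intervention[v]          # equation replaced by V = v
            elif all(p in values for p in equations[v][0]):
                parents, f = equations[v]
                values[v] = f(*(values[p] for p in parents))
            else:
                continue                             # some parent still unknown
            remaining.discard(v)
            progress = True
        if not progress:
            raise ValueError("model is not recursive")
    return values

# Toy climate-style model: wind W causes sea surface temperature S,
# with exogenous context variables u_W and u_S.
eqs = {
    "W": (["u_W"], lambda uw: uw),
    "S": (["W", "u_S"], lambda w, us: w + us),
}
print(solve(eqs, {"u_W": 1, "u_S": 2})["S"])            # 3 (observational)
print(solve(eqs, {"u_W": 1, "u_S": 2}, {"W": 5})["S"])  # 7 (after W <- 5)
```

Note that intervening on W cuts it off from its exogenous parent, which is exactly the equation-replacement semantics above.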
Halpern and Pearl (?) and Halpern (?) implicitly assumed that all interventions can be performed in a model. For reasons that will become clear when defining abstraction, we follow Rubenstein et al. (?) and Beckers and Halpern (?) in adding the notion of “allowed interventions” to a causal model. This allows us to capture situations where not all interventions are of interest to the modeler and/or some interventions may not be feasible. We can then define a causal model as a tuple $(\mathcal{S}, \mathcal{F}, \mathcal{I})$, where $(\mathcal{S}, \mathcal{F})$ is a basic causal model and $\mathcal{I}$ is a set of allowed interventions. We sometimes write a causal model as $(M, \mathcal{I})$, where $M$ is the basic causal model $(\mathcal{S}, \mathcal{F})$, if we want to emphasize the role of the allowed interventions.
Given a signature $\mathcal{S}$, a primitive event is a formula of the form $X = x$, for $X \in \mathcal{V}$ and $x \in \mathcal{R}(X)$. A causal formula (over $\mathcal{S}$) is one of the form $[Y_1 \leftarrow y_1, \ldots, Y_k \leftarrow y_k]\varphi$, where $\varphi$ is a Boolean combination of primitive events, $Y_1, \ldots, Y_k$ are distinct variables in $\mathcal{V}$, and $y_i \in \mathcal{R}(Y_i)$. Such a formula is abbreviated as $[\vec{Y} \leftarrow \vec{y}]\varphi$. The special case where $k = 0$ is abbreviated as $\varphi$. Intuitively, $[\vec{Y} \leftarrow \vec{y}]\varphi$ says that $\varphi$ would hold if $Y_i$ were set to $y_i$, for $i = 1, \ldots, k$.
A causal formula is true or false in a causal model, given a context. As usual, we write $(M, \vec{u}) \models \psi$ if the causal formula $\psi$ is true in causal model $M$ given context $\vec{u}$. The $\models$ relation is defined inductively. $(M, \vec{u}) \models X = x$ if the variable $X$ has value $x$ in the unique (since we are dealing with recursive models) solution to the equations in $M$ in context $\vec{u}$ (i.e., the unique vector of values that simultaneously satisfies all equations in $M$ with the variables in $\mathcal{U}$ set to $\vec{u}$). The truth of conjunctions and negations is defined in the standard way. Finally, $(M, \vec{u}) \models [\vec{Y} \leftarrow \vec{y}]\varphi$ if $(M_{\vec{Y} \leftarrow \vec{y}}, \vec{u}) \models \varphi$.
To simplify notation, we sometimes write $M(\vec{u})$ to denote the unique element $\vec{v}$ of $\mathcal{R}(\mathcal{V})$ such that $(M, \vec{u}) \models \mathcal{V} = \vec{v}$. Similarly, given an intervention $\vec{X} \leftarrow \vec{x}$, $M_{\vec{X} \leftarrow \vec{x}}(\vec{u})$ denotes the unique element $\vec{v}$ of $\mathcal{R}(\mathcal{V})$ such that $(M_{\vec{X} \leftarrow \vec{x}}, \vec{u}) \models \mathcal{V} = \vec{v}$.
These definitions allow us to describe the climate system both in terms of a low-level causal model $M_L$ and a high-level model $M_H$.
Chalupka et al. (?) treated the exogenous causes as not only exogenous, but also unobserved, and therefore made no claim about their dimensionality in the high-level model. We do not explicitly spell out the equations of $M_L$ and $M_H$ here; the description of the climate model indicates that the state space of the high-level temperature variable is much smaller than that of the low-level one: many different low-level (vector-valued) states may correspond to one high-level low-dimensional state. Finally, for our climate example we have not yet said anything about interventions, so the allowed intervention sets $\mathcal{I}_L$ and $\mathcal{I}_H$ are currently placeholders.
We are now in a position to specify a relation between the high- and low-level models $M_H$ and $M_L$.
Beckers and Halpern (?) gave a sequence of successively more restrictive definitions of abstraction for causal models. The first and least restrictive definition is the notion of exact transformation due to Rubenstein et al. (?). Examples given by Beckers and Halpern show that the notion of exact transformation is arguably too flexible. Thus, in this paper the notion of abstraction we consider is that of $\tau$-abstractions, introduced by Beckers and Halpern, which can be viewed as a restriction of exact transformations and avoids some of their problems. (Exact transformations relate probabilistic causal models, while $\tau$-abstractions relate (deterministic) causal models. Beckers and Halpern (?) show that we can compare the two by proving that every $\tau$-abstraction is what they call a uniform transformation: specifically, if $M_H$ is a $\tau$-abstraction of $M_L$, then for every probability on the contexts of $M_L$, there exists a probability on the contexts of $M_H$ such that the resulting probabilistic $M_H$ is an exact transformation of the resulting probabilistic $M_L$.) However, nothing hinges on this choice: all the definitions can just as well be interpreted using the less restrictive notions of abstraction.
The key to defining all the notions of abstraction from a low-level causal model $M_L$ to a high-level causal model $M_H$ considered by Beckers and Halpern is the abstraction function $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$ that maps endogenous states of $M_L$ to endogenous states of $M_H$. This is a generalization of the surjective mapping from $\mathcal{R}(\vec{W}) \times \mathcal{R}(\vec{T})$ to $\mathcal{R}(\bar{W}) \times \mathcal{R}(\bar{T})$ discussed in the introduction. In the formal definition, we need two additional functions: $\tau_U$, which maps exogenous states of $M_L$ to exogenous states of $M_H$, and $\omega_\tau$, which maps low-level interventions to high-level interventions. Beckers and Halpern (?) show that, given their definition of abstraction, $\tau_U$ and $\omega_\tau$ can be derived from $\tau$. We briefly review the relevant definitions here; we refer the reader to their paper for more details and motivation.
: Given a set of endogenous variables, , and , let
Given , define if , , and (where, as usual, given , define ). It is easy to see that, given and , there can be at most one such and . If such a and do not exist, we take to be undefined. Let be the set of interventions for which is defined, and let .
Note that if is surjective, then it easily follows that , and for all , .
With this definition, the need for the intervention sets and becomes clear: in general, not all low-level interventions will neatly map to a high-level intervention, since the abstraction function may aggregate variables together; some low-level interventions will constitute only a partial intervention on a high-level variable. The “allowed intervention” sets ensure that the set of interventions can be suitably restricted to retain only those that can actually be abstracted. Similarly, there may be cases where the high-level model does not support all interventions because they may not be well-defined. For example, what does it mean in the ideal gas law to change temperature, while keeping pressure and volume constant? It is not even clear that such an intervention is meaningful.
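The failure of partial interventions to lift can be illustrated by a brute-force check over a finite state space. This is a hypothetical helper of our own, not the paper's definition of the intervention map: a low-level intervention determines a high-level intervention only if every low-level state consistent with it agrees on the mapped high-level values.

```python
from itertools import product

def lift_intervention(tau, low_ranges, low_interv):
    """Try to lift a low-level intervention through tau by brute force.
    `low_ranges`: low-level variable -> list of possible values;
    `low_interv`: low-level variable -> clamped value.
    Returns the high-level variables whose value is forced, or None if the
    intervention is only partial at the high level."""
    states = [dict(zip(low_ranges, vals))
              for vals in product(*low_ranges.values())]
    consistent = [s for s in states
                  if all(s[v] == x for v, x in low_interv.items())]
    highs = [tau(s) for s in consistent]
    # keep only high-level variables fixed across all consistent states
    fixed = {v: highs[0][v] for v in highs[0]
             if all(h[v] == highs[0][v] for h in highs)}
    return fixed or None

# tau aggregates two binary variables into their sum.
tau = lambda s: {"C": s["A"] + s["B"]}
ranges = {"A": [0, 1], "B": [0, 1]}
print(lift_intervention(tau, ranges, {"A": 1, "B": 0}))  # {'C': 1}
print(lift_intervention(tau, ranges, {"A": 1}))          # None: partial
```

Clamping only A leaves C ranging over {1, 2}, so no high-level intervention is determined; such low-level interventions are exactly the ones the allowed-intervention sets exclude.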
Of course, a minimal requirement for any causal model to be a $\tau$-abstraction of some other model is that the signatures of both models need to be compatible with $\tau$. Beckers and Halpern (?) add two further minimal requirements. We capture all of them by requiring the two causal models to be $\tau$-consistent:
Definition. If $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$, then $(M_L, \mathcal{I}_L)$ and $(M_H, \mathcal{I}_H)$ are $\tau$-consistent if $\tau$ is surjective, $\omega_\tau$ is defined on every intervention in $\mathcal{I}_L$, and $\omega_\tau(\mathcal{I}_L) \subseteq \mathcal{I}_H$.
Definition. $(M_H, \mathcal{I}_H)$ is a $\tau$-abstraction of $(M_L, \mathcal{I}_L)$ if $(M_L, \mathcal{I}_L)$ and $(M_H, \mathcal{I}_H)$ are $\tau$-consistent and there exists a surjective $\tau_U: \mathcal{R}_L(\mathcal{U}_L) \to \mathcal{R}_H(\mathcal{U}_H)$ such that for all $\vec{u}_L \in \mathcal{R}_L(\mathcal{U}_L)$ and $(\vec{X} \leftarrow \vec{x}) \in \mathcal{I}_L$, $\tau((M_L)_{\vec{X} \leftarrow \vec{x}}(\vec{u}_L)) = (M_H)_{\omega_\tau(\vec{X} \leftarrow \vec{x})}(\tau_U(\vec{u}_L))$.
Abstraction means that for each possible low-level context-intervention pair, the two ways of moving up “diagonally” to a high-level endogenous state always lead to the same result. The first way is to start by applying $M_L$ to get a low-level state, and then moving up to a high-level state by applying $\tau$, whereas the second way is to first move to a high-level context and intervention (by applying $\tau_U$ and $\omega_\tau$), and then to obtain a high-level state by applying $M_H$.
A common and useful form of abstraction occurs when the low-level variables are clustered, so that the clusters form the high-level variables. Roughly speaking, the intuition is that in the high-level model, one variable captures the effect of a number of variables in the low-level model. This makes sense only if the low-level variables that are being clustered together “work the same way” as far as the allowed interventions go. The following definition makes this special case of abstraction precise.
Definition. If $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$, then $\tau$ is constructive if there exists a partition $\{\vec{Z}_1, \ldots, \vec{Z}_{n+1}\}$ of $\mathcal{V}_L$, where $\vec{Z}_1, \ldots, \vec{Z}_n$ are nonempty, and mappings $\tau_i: \mathcal{R}_L(\vec{Z}_i) \to \mathcal{R}_H(Y_i)$ for $i = 1, \ldots, n$, where $\mathcal{V}_H = \{Y_1, \ldots, Y_n\}$, such that $\tau = \tau_1 \cdot \ldots \cdot \tau_n$; that is, $\tau(\vec{v}) = \tau_1(\vec{z}_1) \cdot \ldots \cdot \tau_n(\vec{z}_n)$, where $\vec{z}_i$ is the projection of $\vec{v}$ onto the variables in $\vec{Z}_i$, and $\cdot$ is the concatenation operator on sequences. If $(M_H, \mathcal{I}_H)$ is a $\tau$-abstraction of $(M_L, \mathcal{I}_L)$ then we say it is constructive if $\tau$ is constructive.
In this definition, we can think of each cell $\vec{Z}_i$ (for $i \le n$) as describing a set of microvariables that are mapped to a single macrovariable $Y_i$. The variables in the remaining cell $\vec{Z}_{n+1}$ (which might be empty) are ones that are marginalized away.
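A constructive $\tau$ of this kind can be sketched directly in code (our own illustration; the partition and cluster maps are hypothetical):

```python
def make_tau(partition, cluster_maps):
    """Build a constructive abstraction map. `partition`: high-level variable
    -> list of low-level variables in its cluster; `cluster_maps`: high-level
    variable -> function of that cluster's values. Low-level variables in no
    cluster are marginalized away."""
    def tau(low_state):
        return {hi: cluster_maps[hi](tuple(low_state[v] for v in cluster))
                for hi, cluster in partition.items()}
    return tau

# Example: average a small grid of temperatures into one macrovariable;
# the variable "noise" belongs to no cluster and is dropped.
tau = make_tau({"T_bar": ["t1", "t2", "t3"]},
               {"T_bar": lambda ts: sum(ts) / len(ts)})
print(tau({"t1": 1.0, "t2": 2.0, "t3": 3.0, "noise": 9.9}))  # {'T_bar': 2.0}
```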
The climate example almost exactly fits the notion of constructive abstraction: the variables in the high-level model, $\bar{W}$ and $\bar{T}$, each correspond to a vector-valued low-level variable, $\vec{W}$ and $\vec{T}$, but $\vec{W}$ and $\vec{T}$ could have been replaced by disjoint sets of variables. Consequently, $\tau$ maps states of the same low-level variable to states of the same high-level variable (see Fig. 2, left). Although interventions are practically not feasible in the climate case, hypothetically they are perfectly well-defined: an intervention on $\bar{W}$ can be instantiated at the low level by several different interventions on $\vec{W}$ (see Fig. 2, right). Finally, given an intervention on $\bar{W}$, the correspondence between the high-level prediction for $\bar{T}$ and the abstraction of the low-level prediction for $\vec{T}$ is exact. In fact, the high-level model constructed by Chalupka et al. (?) did not satisfy this correspondence precisely, but had to approximate it. We maintain that, in general, high-level models in science are only approximate abstractions.
3 Approximate Abstraction
In order to define what it means for one causal model to be an approximation of another, we need a way of measuring the “distance” between causal models. We take a distance function to simply be a function that associates with a pair of causal models a distance, that is, a non-negative real number. We show how various distance functions on causal models can be defined, starting from a metric on the state space of a causal model. (Recall that a metric on a space $S$ is a function $d: S \times S \to \mathbb{R}^{\ge 0}$ such that (a) $d(s, t) = 0$ iff $s = t$, (b) $d(s, t) = d(t, s)$, and (c) $d(s, t) \le d(s, u) + d(u, t)$.) Such a metric is typically straightforward to define. Given two states $\vec{v}$ and $\vec{v}'$, we can compare the value of each endogenous variable in $\vec{v}$ and $\vec{v}'$. The difference in the values determines the distance between $\vec{v}$ and $\vec{v}'$.
The choice of distance function is application-dependent. Different researchers looking at the same data may be interested in different aspects of the data. For example, suppose that the model is defined in terms of 5 variables, $X_1, \ldots, X_5$. $X_1$ might be gender and $X_2$ might be height. Suppose that we restrict to distance functions that take the distance between $\vec{v}$ and $\vec{v}'$ to be a weighted sum of per-variable differences, where the weights $w_1, \ldots, w_5$ represent the importance (to the researcher) of each of these five features. One researcher might not be interested in gender (so doesn’t care if her predictions about gender are incorrect), and thus might take $w_1 = 0$; another researcher might care about gender and not about height, so she might take $w_1 > 0$ and $w_2 = 0$. While, as we shall see, the choice of distance function makes a crucial difference in evaluating the “goodness” of an approximate abstraction, in light of the above, we leave the choice of distance function unspecified.
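The weighted form of distance described above is easy to state in code (a sketch with illustrative variable names and weights of our own):

```python
def weighted_distance(s1, s2, weights):
    """Weighted per-variable distance between two endogenous states, given a
    weight for each variable the researcher cares about."""
    return sum(w * abs(s1[v] - s2[v]) for v, w in weights.items())

# A researcher who does not care about variable "x1" gives it weight 0,
# so only the disagreement on "x2" contributes.
d = weighted_distance({"x1": 0, "x2": 170}, {"x1": 1, "x2": 165},
                      {"x1": 0.0, "x2": 1.0})
print(d)  # 5.0
```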
In the remainder of the paper, we assume that the state space of endogenous variables for each causal model comes with a metric . We provide a number of ways of lifting the metric on states to a distance function on models, and then use the distance function to define both approximation and approximate abstraction.
Our intuition for the distance function is based on how causal models are typically used. Specifically, we are interested in how two models compare with regard to the predictions they make about the effects of an intervention. Our intuition is similar in spirit to that behind the notion of structural intervention distance considered by Peters and Bühlmann (?), although the technical definitions are quite different. (We discuss the exact relationship between our approach and theirs in the next section, in the context of probabilistic models.)
We start with the simplest setting where this intuition can be made precise, one where the models $M$ and $M'$ differ only with regard to their equations. We say that two models are similar in this case. If two models are similar, then, among other things, we can assume that they have the same metric $d$ on states. In this setting, we can compare the effect of each allowed intervention in the two models. That is, for each context $\vec{u}$ and intervention $\vec{X} \leftarrow \vec{x} \in \mathcal{I}$, we can compare the states $M_{\vec{X} \leftarrow \vec{x}}(\vec{u})$ and $M'_{\vec{X} \leftarrow \vec{x}}(\vec{u})$ that arise after performing the intervention in context $\vec{u}$ in each model. We get the desired distance function by taking the worst-case distance between all such states.
Definition. Define a distance function on pairs of similar models by taking $d(M, M') = \max_{\vec{u},\, (\vec{X} \leftarrow \vec{x}) \in \mathcal{I}}\, d\big(M_{\vec{X} \leftarrow \vec{x}}(\vec{u}),\, M'_{\vec{X} \leftarrow \vec{x}}(\vec{u})\big)$, where the $d$ on the right-hand side is the metric on states.
The causal model $M'$ is a $d$-$\epsilon$-approximation of $M$ if $d(M, M') \le \epsilon$. Thus, $M'$ is a $d$-$\epsilon$-approximation of $M$ if the predictions of $M'$ are always within $\epsilon$ of the predictions of $M$.
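This worst-case comparison of similar models can be sketched as follows (`predict1`/`predict2` stand in for solving each model's equations under a context and an intervention; all names are ours):

```python
def model_distance(predict1, predict2, contexts, interventions, metric):
    """Worst-case distance between two similar models: maximize the state
    distance over all allowed interventions and contexts."""
    return max(metric(predict1(u, i), predict2(u, i))
               for u in contexts for i in interventions)

# Toy one-variable models: Y = u + i versus Y = u + i + 0.5.
m1 = lambda u, i: u + i
m2 = lambda u, i: u + i + 0.5
d = model_distance(m1, m2, contexts=[0, 1], interventions=[0, 2],
                   metric=lambda a, b: abs(a - b))
print(d)  # 0.5: every prediction of m2 is within 0.5 of m1's
```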
We apply similar ideas to defining approximate abstraction. But now we no longer have a distance function defined on causal models with the same signature. Rather, the distance function is defined on pairs consisting of a low-level and high-level causal model (which, in general, have different signatures), related by a surjective mapping $\tau$. The idea behind the distance function $d_\tau$ is that we start with a low-level intervention, consider its effects in $M_L$, lift this up to $M_H$ using $\tau$, and compare this to the effects of the corresponding high-level intervention in $M_H$.
Definition. Fix a surjective map $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$. Define the distance function $d_\tau$ on pairs of $\tau$-consistent models by taking $d_\tau(M_L, M_H) = \min_{\tau_U} \max_{\vec{u}_L,\, (\vec{X} \leftarrow \vec{x}) \in \mathcal{I}_L} d_H\big(\tau((M_L)_{\vec{X} \leftarrow \vec{x}}(\vec{u}_L)),\, (M_H)_{\omega_\tau(\vec{X} \leftarrow \vec{x})}(\tau_U(\vec{u}_L))\big)$, where $d_H$ is the metric on high-level states and the minimum is taken over all surjective $\tau_U$.
$M_H$ is a $\tau$-$\epsilon$-approximate abstraction of $M_L$ if $d_\tau(M_L, M_H) \le \epsilon$.
We take the minimum over all functions $\tau_U$ because the function that lifts the low-level contexts up to the high-level contexts does not play a major role. We thus simply focus on the best choice of $\tau_U$.
To get an intuition for an approximate abstraction, consider the climate example again. For a low-level intervention $\vec{X} \leftarrow \vec{x}$ and a low-level context $\vec{u}_L$, there are two ways of lifting their effect to $M_H$ (see Fig. 3). The first is to start by applying $M_L$ to the context-intervention pair to determine a low-level state, and then apply $\tau$ to obtain a high-level state. (Recall that $M_L$ can be viewed as a function from context-intervention pairs to states in $\mathcal{R}_L(\mathcal{V}_L)$.) The second is to first lift the intervention to $\omega_\tau(\vec{X} \leftarrow \vec{x})$, that is, an intervention in $M_H$, and the context to $\tau_U(\vec{u}_L)$, by applying $\omega_\tau$ and $\tau_U$. Then we apply $M_H$, which again gives a high-level endogenous state. We are identifying the degree to which $M_H$ approximates $M_L$ by considering the worst-case distance between the two ways of lifting the context-intervention pairs, for an optimal choice of $\tau_U$.
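The two pathways just described can be sketched in code (our own illustration, with a fixed choice of the context map rather than the minimizing one; all function names are placeholders for the paper's objects):

```python
def tau_distance(predict_low, predict_high, tau, omega, tau_U,
                 contexts_low, interventions_low, metric_high):
    """Worst-case discrepancy between the two ways of lifting a low-level
    context-intervention pair: solve low and abstract via tau, versus lift
    the pair via tau_U and omega and solve high."""
    return max(
        metric_high(tau(predict_low(u, i)),
                    predict_high(tau_U(u), omega(i)))
        for u in contexts_low for i in interventions_low)

# Toy example: the low-level state is a pair of temperatures, tau averages
# them, and the high-level model is systematically off by 0.1.
predict_low = lambda u, i: (u + i, u + i)
predict_high = lambda u, i: u + i + 0.1
d = tau_distance(predict_low, predict_high, tau=lambda s: sum(s) / len(s),
                 omega=lambda i: i, tau_U=lambda u: u,
                 contexts_low=[0.0, 1.0], interventions_low=[0.0, 2.0],
                 metric_high=lambda a, b: abs(a - b))
print(round(d, 6))  # 0.1
```

Here the high-level model is a 0.1-approximate abstraction of the low-level one under this (fixed) context map.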
The following straightforward results show that our notion of approximate abstraction is a sensible generalization of both the notion of an exact abstraction and the notion of approximation between similar models.
Proposition. $M_H$ is a $\tau$-$0$-approximate abstraction of $M_L$ iff $M_H$ is a $\tau$-abstraction of $M_L$.
Proposition. If $M$ and $M'$ are similar, then $M'$ is a $\tau_{id}$-$\epsilon$-approximate abstraction of $M$ (where $\tau_{id}$ is the identity function on $\mathcal{R}(\mathcal{V})$) iff $M'$ is a $d$-$\epsilon$-approximation of $M$.
4 Approximate Abstraction for Probabilistic Causal Models
A probabilistic causal model is just a causal model together with a probability $\Pr$ on contexts (i.e., on $\mathcal{R}(\mathcal{U})$).
In this section, we assume that all causal models are probabilistic, and extend the notion of approximation to probabilistic causal models. We again start by considering the simplest setting, where we have probabilistic models that differ only in their equations. We again call such models similar. Now we have several reasonable distance functions.
Definition. Define a distance function on pairs of similar probabilistic causal models by taking $d_E(M, M') = \max_{(\vec{X} \leftarrow \vec{x}) \in \mathcal{I}} \mathbb{E}_{\Pr}\big[d\big(M_{\vec{X} \leftarrow \vec{x}}(\vec{u}),\, M'_{\vec{X} \leftarrow \vec{x}}(\vec{u})\big)\big]$. The probabilistic causal model $M'$ is a $d_E$-$\epsilon$-approximation of $M$ if $d_E(M, M') \le \epsilon$.
In this definition, we have just replaced the max over contexts in Definition 3.1 by an expectation over contexts.
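Replacing the max over contexts by an expectation can be sketched as follows (names are ours; `context_dist` maps each context to its probability):

```python
def expected_distance(predict1, predict2, context_dist, interventions, metric):
    """Expected-distance variant: worst case over interventions, expectation
    over contexts under the context distribution."""
    return max(
        sum(p * metric(predict1(u, i), predict2(u, i))
            for u, p in context_dist.items())
        for i in interventions)

# Toy models whose disagreement grows with the context value.
m1 = lambda u, i: u + i
m2 = lambda u, i: u + i + u
d = expected_distance(m1, m2, context_dist={0: 0.5, 2: 0.5},
                      interventions=[0, 1], metric=lambda a, b: abs(a - b))
print(d)  # 1.0: the expected disagreement is 0.5*0 + 0.5*2
```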
But we may not always just be interested in the expected distance. We may, for example, be more concerned with the likelihood of serious prediction differences, and not be too concerned about small differences. This leads to the following definition.
: Define a distance function on pairs of similar probabilistic causal models by taking
The probabilistic causal model is a - approximation of if .
We can now extend these ideas to approximate abstraction. We first extend the definition of -abstraction to the probabilistic setting.
Note that we can view $\Pr$ as a probability measure on $\mathcal{R}(\mathcal{V})$, by taking $\Pr(\vec{v}) = \Pr(\{\vec{u} : M(\vec{u}) = \vec{v}\})$. An intervention $\vec{X} \leftarrow \vec{x}$ also induces a probability $\Pr_{\vec{X} \leftarrow \vec{x}}$ on $\mathcal{R}(\mathcal{V})$ in the obvious way: $\Pr_{\vec{X} \leftarrow \vec{x}}(\vec{v}) = \Pr(\{\vec{u} : M_{\vec{X} \leftarrow \vec{x}}(\vec{u}) = \vec{v}\})$.
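The pushforward of such a distribution through the abstraction map is just mass aggregation (a sketch; the example distribution is ours):

```python
def pushforward(dist, tau):
    """Push a distribution on low-level states through tau: each high-level
    state collects the mass of its tau-preimage."""
    out = {}
    for state, p in dist.items():
        image = tau(state)
        out[image] = out.get(image, 0.0) + p
    return out

# Uniform distribution on two binary variables, abstracted to their sum.
low = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(pushforward(low, lambda s: s[0] + s[1]))  # {0: 0.25, 1: 0.5, 2: 0.25}
```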
In the deterministic notion of abstraction, we require that the two high-level states obtained by the two different ways of lifting the effects of a low-level intervention to the high level be equal. In the probabilistic notion, we require that the two different probability distributions obtained by the two ways of lifting an intervention be equal.
Definition. $M_H$ is a probabilistic $\tau$-abstraction of $M_L$ if $M_L$ and $M_H$ are $\tau$-consistent and for all interventions $(\vec{X} \leftarrow \vec{x}) \in \mathcal{I}_L$, we have that $\tau\big(\Pr^L_{\vec{X} \leftarrow \vec{x}}\big) = \Pr^H_{\omega_\tau(\vec{X} \leftarrow \vec{x})}$, where $\tau\big(\Pr^L_{\vec{X} \leftarrow \vec{x}}\big)$ is the pushforward of $\Pr^L_{\vec{X} \leftarrow \vec{x}}$ under $\tau$.
We can now extend our definitions to the approximate scenario just as we did for deterministic causal models.
Definition. Fix a surjective map $\tau: \mathcal{R}_L(\mathcal{V}_L) \to \mathcal{R}_H(\mathcal{V}_H)$. Define the distance function $d^E_\tau$ on pairs of $\tau$-consistent probabilistic causal models by taking $d^E_\tau(M_L, M_H) = \min_{\tau_U} \max_{(\vec{X} \leftarrow \vec{x}) \in \mathcal{I}_L} \mathbb{E}_{\Pr_L}\big[d_H\big(\tau((M_L)_{\vec{X} \leftarrow \vec{x}}(\vec{u}_L)),\, (M_H)_{\omega_\tau(\vec{X} \leftarrow \vec{x})}(\tau_U(\vec{u}_L))\big)\big]$. $M_H$ is a $\tau$-$\epsilon$-approximate abstraction of $M_L$ if $d^E_\tau(M_L, M_H) \le \epsilon$.
For the climate example this definition implies the following: Suppose we introduce probabilities by specifying distributions over the contexts of $M_L$ and $M_H$. The resulting probabilistic causal model $M_H$ is a $\tau$-$\epsilon$-approximate abstraction of $M_L$ if the expectation (in terms of $\Pr_L$) of the difference between the two states of the high-level temperature variable $\bar{T}$ is less than or equal to $\epsilon$, where the two states are determined exactly in accordance with the two pathways in Fig. 3, selecting the worst-case intervention and the best-case $\tau_U$.
Analogously to the deterministic case, we have the following straightforward results.
Proposition. $M_H$ is a $\tau$-$0$-approximate abstraction of $M_L$ iff $M_H$ is a probabilistic $\tau$-abstraction of $M_L$.
Proposition. If $M$ and $M'$ are similar, then $M'$ is a $\tau_{id}$-$\epsilon$-approximate abstraction of $M$ iff $M'$ is an $\epsilon$-approximation of $M$ in the sense of Definition 4.1.
In Definition 4.4 we consider the worst-case low-level intervention to define the distance. In many cases, however, the whole point of an abstraction is to be able to exclude rare low-level boundary cases (e.g., when the ideal gas law is taken to refer only to equilibrium states). Moreover, often the actual manipulations that we can perform are known to us only at the high level, because the low-level implementation of the intervention is unobservable to us. For example, in setting the room temperature to F we do not generally consider the instantiation of that intervention which superheats one corner of the room and freezes the rest such that the mean kinetic energy works out just right. As Spirtes and Scheines (?) show, in the absence of any further information, such ambiguous manipulations can be quite problematic. Fortunately, often our knowledge of the mechanism of how a high-level intervention is implemented does give us significant probabilistic information. For example, we might know that the heater has a fan that circulates the air, most likely resulting in relatively uniform distributions of the kinetic energies of the particles.
We capture this information using what we call an intervention distribution.
Definition. Given a surjective map $\tau$, an intervention distribution is a conditional distribution $\Pr_{\mathcal{I}}$ on low-level interventions given high-level interventions such that $\Pr_{\mathcal{I}}(\vec{X} \leftarrow \vec{x} \mid \vec{Y} \leftarrow \vec{y}) > 0$ iff $\omega_\tau(\vec{X} \leftarrow \vec{x}) = (\vec{Y} \leftarrow \vec{y})$.
We think of $\Pr_{\mathcal{I}}(\vec{X} \leftarrow \vec{x} \mid \vec{Y} \leftarrow \vec{y})$ as telling us how likely the high-level intervention $\vec{Y} \leftarrow \vec{y}$ is to have been implemented by the low-level intervention $\vec{X} \leftarrow \vec{x}$.
Definition. Given a surjective map $\tau$ and an intervention distribution $\Pr_{\mathcal{I}}$, the distance function $d^{\Pr_{\mathcal{I}}}_\tau$ on pairs of $\tau$-consistent probabilistic causal models is defined by taking $d^{\Pr_{\mathcal{I}}}_\tau(M_L, M_H) = \min_{\tau_U} \max_{(\vec{Y} \leftarrow \vec{y}) \in \mathcal{I}_H} \mathbb{E}\big[d_H\big(\tau((M_L)_{\vec{X} \leftarrow \vec{x}}(\vec{u}_L)),\, (M_H)_{\vec{Y} \leftarrow \vec{y}}(\tau_U(\vec{u}_L))\big)\big]$, where the expectation is over the context $\vec{u}_L$ (drawn from $\Pr_L$) and the low-level intervention $\vec{X} \leftarrow \vec{x}$ (drawn from $\Pr_{\mathcal{I}}(\cdot \mid \vec{Y} \leftarrow \vec{y})$).
Intuitively, for each high-level intervention $\vec{Y} \leftarrow \vec{y}$, we take the expected distance between the two ways of lifting a low-level intervention to $M_H$. In computing the expectation, there are two sources of uncertainty: the likelihood of a given context (this is determined by $\Pr_L$) and the likelihood of each low-level intervention that maps to $\vec{Y} \leftarrow \vec{y}$ (this is given by the intervention distribution). Fig. 4 illustrates the point for the climate example.
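Putting the two sources of uncertainty together gives the following sketch (our own illustration; `interv_dist` maps each high-level intervention to a distribution over the low-level interventions implementing it, and we again fix the context map rather than minimizing over it):

```python
def interv_dist_distance(predict_low, predict_high, tau, tau_U,
                         context_dist, interv_dist, metric_high):
    """For each high-level intervention, average the lifting discrepancy over
    both the context distribution and the distribution over low-level
    implementations; take the worst case over high-level interventions."""
    return max(
        sum(p_u * p_i *
            metric_high(tau(predict_low(u, i_low)),
                        predict_high(tau_U(u), i_high))
            for u, p_u in context_dist.items()
            for i_low, p_i in interv_dist[i_high].items())
        for i_high in interv_dist)

# Toy example: an intervention fully determines the state; tau averages a
# pair. Setting the macrovariable to 3.0 is implemented either exactly
# (3.0, 3.0) or unevenly (2.0, 5.0), each with probability 0.5.
predict_low = lambda u, i: i
predict_high = lambda u, i: i
d = interv_dist_distance(predict_low, predict_high,
                         tau=lambda s: sum(s) / len(s), tau_U=lambda u: u,
                         context_dist={0: 1.0},
                         interv_dist={3.0: {(2.0, 5.0): 0.5, (3.0, 3.0): 0.5}},
                         metric_high=lambda a, b: abs(a - b))
print(d)  # 0.25: only the uneven implementation contributes, 0.5 * |3.5 - 3|
```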
We can also define analogues of these distance functions in a manner completely analogous to Definition 4.2. We omit these definitions for reasons of space, but the details should be clear.
It is worth comparing our approach to other approaches to determining the distance between causal models. The more standard way to compare two causal models is to compare their causal graphs. The causal graph is a directed acyclic graph (dag) that has nodes labeled by variables; (the node labeled) $X$ is an ancestor of (the node labeled) $Y$ iff $Y$ depends on $X$. Dags have been compared using what is called the structural Hamming distance (SHD) [Acid and de Campos 2003], where the SHD between dags $G$ and $G'$ (which are assumed to have an identical set of nodes) is the number of pairs of nodes $(X, Y)$ on which $G$ and $G'$ differ regarding the edge between $X$ and $Y$ (either because one of them has an edge and the other does not, or the edges are oriented in different directions). As Peters and Bühlmann (?) observe, the SHD misses out on some important information in causal networks. In particular, it does not really compare the effect of interventions. They want a notion of distance that takes this into account, as do we. However, their structural intervention distance (SID) takes into account the effect of interventions in a way much closer in spirit to SHD. Roughly speaking, in our language, given two similar causal models $M$ and $M'$, they count the number of pairs $(X, Y)$ of endogenous variables such that intervening on $X$ leads to a different distribution over $Y$ in $M$ and $M'$. More formally, let $\Pr^M_{X \leftarrow x}(Y)$ denote the marginal of $\Pr_{X \leftarrow x}$ on the variable $Y$ in model $M$. (Since we want to compare probability distributions in two different models, we add the model to the superscript.) The SID between similar models $M$ and $M'$ is the number of pairs $(X, Y)$ such that there exists an $x$ such that $\Pr^M_{X \leftarrow x}(Y) \ne \Pr^{M'}_{X \leftarrow x}(Y)$.
Although SID does compare the predictions that two models make, it differs from our distance functions in several important respects. First, it compares just the effect of interventions on single variables, whereas we allow arbitrary interventions. We believe that it is important to consider arbitrary interventions, since sometimes variables act together, and it takes intervening on more than one variable to really distinguish two models. Second, we are interested in how far apart two distributions are, not just the fact that they are different. Finally, we want a definition that applies to models that are not similar, since this is what we need for approximate abstraction.
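For contrast, the purely graphical SHD count discussed above is straightforward to compute; a minimal sketch (the toy edge sets are ours):

```python
def shd(edges1, edges2):
    """Structural Hamming distance between two dags over the same nodes,
    given as sets of directed (parent, child) edges: count the node pairs on
    which the graphs disagree (edge missing in one, or reversed)."""
    e1, e2 = set(edges1), set(edges2)
    pairs = {frozenset(e) for e in e1 | e2}
    return sum(1 for p in pairs
               if {e for e in e1 if frozenset(e) == p}
               != {e for e in e2 if frozenset(e) == p})

print(shd({("W", "S")}, {("S", "W")}))              # 1: reversed edge
print(shd({("W", "S"), ("U", "S")}, {("W", "S")}))  # 1: one missing edge
```

Note that the count is insensitive to how much the intervention distributions differ, which is exactly the limitation discussed above.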
5 Composing Abstraction and Approximation
It can be useful to understand an approximate abstraction as the result of composing an approximation and an abstraction, in some order. For example, we can explain the ideal gas law in terms of thinking of frictionless elastic collisions between particles (this is an approximation to the truth) and then abstracting by replacing the kinetic energy of the particles by their temperature (a measure of average kinetic energy). In this section, we examine the extent to which an approximate abstraction can be viewed this way. We start with two easy results showing that if we compose an approximation and an abstraction in some order, then we do get an approximate abstraction.
Proposition. If $M_H$ is a $\tau$-abstraction of $M_L$ and $M'_H$ is a $d_H$-$\epsilon$-approximation of $M_H$, then $M'_H$ is a $\tau$-$\epsilon$-approximate abstraction of $M_L$. (Proofs of all technical results from here onwards can be found in the appendix.)
In Proposition 5.1 we considered an abstraction followed by an approximation. Things change if we do things in the opposite order; that is, if we consider an approximation followed by an abstraction. For suppose that $M'_L$ is a $d_L$-$\epsilon$-approximation of $M_L$ and $M_H$ is a $\tau$-abstraction of $M'_L$. Now it is not in general the case that $M_H$ is a $\tau$-$\epsilon$-approximate abstraction of $M_L$. The problem is that when assessing how good an abstraction $M_H$ is of $M'_L$, we are comparing two high-level states (using $d_H$). But when comparing $M_L$ to $M'_L$, we use $d_L$. In general, $d_L$ and $d_H$ may be unrelated.
: If is a -abstraction of and is a - approximation of , then is a - approximate abstraction of , where is
While composing an approximation and an abstraction gives us an approximate abstraction, the following two examples show that we cannot in general decompose an approximate abstraction into an abstraction composed with an approximation or an approximation composed with an abstraction. Theorem 5.6 below shows that if we restrict ourselves to the constructive case then we can do the former. However, Example 5.4 shows that, even if we restrict to the constructive case, we cannot do the latter.
: has one exogenous variable and endogenous variables , , and . also has one exogenous variable and endogenous variables and . All the endogenous variables have range ; has range . Let be , where denotes addition mod 2. The equations for both and take the form for all variables . consists of the empty intervention, and interventions , , , and . consists of only the empty intervention. Taking as the Euclidean distance, we leave it to the reader to verify that is a - approximate abstraction of . However, there does not exist any that is similar to such that is a -abstraction of . To see why, note that the only state in that arises from applying an intervention is . Therefore, the only states in that can arise from interventions are ones where and . This means that under the intervention , we must have , and under , we must have . Therefore . It also means that under we must have and under we must have , so that . Thus, cannot be recursive.
: Let be the model where , , for , the equations are , , and , and consists of all interventions such that is intervened on iff is intervened on. Let be the model where , , , , the equations are , , the first component of , and consists of all interventions. Let , and is such that . Note that is constructive, that and are -consistent, and that . Therefore is a constructive - approximate abstraction of for some (the value of depends on the choice of metric ). It is easy to see that and