Team semantics for interventionist counterfactuals and causal dependence

Abstract

We introduce a generalization of team semantics which provides a framework for manipulationist theories of causation based on structural equation models, such as Woodward’s and Pearl’s; our causal teams incorporate (partial or total) information about functional dependencies that are invariant under interventions. We give a unified treatment of observational and causal aspects of causal models by isolating two operators on causal teams which correspond, respectively, to conditioning and to interventionist counterfactual implication.

The evaluation of counterfactuals may involve the production of partially determined teams. We suggest a way of dealing with such cases by 1) the introduction of formal entries in causal teams, and 2) the introduction of weaker truth values (falsifiability and admissibility), for which we suggest some plausible semantical clauses.

We introduce formal languages for both deterministic and probabilistic causal discourse, and study in some detail their inferential aspects. Finally, we apply our framework to the analysis of direct and total causation, and other notions of dependence and invariance.

1 Introduction

Notions of dependence and independence entered the realm of logical investigation in the early days of mathematical logic, essentially with the introduction, by Frege, of nested quantification; this aspect of quantification was made explicit by the notion of Skolem function ([25]). It is only in the last decades, however, that a systematic analysis of (in)dependence notions within predicate, propositional and modal logical languages has been undertaken. One of the main unifying tools in this enterprise is the so-called team semantics ([14],[15],[31]), whose key idea is that formulas involving dependencies acquire meaning only when evaluated over sets of assignments. Variations of this methodology have allowed a systematic study of logical systems enriched with dependencies that arise from database theory, probability theory and quantum information theory. In many cases, distinct notions of (in)dependence can coexist in one and the same formal language, and this kind of interplay has been systematically investigated from the point of view of definability and complexity. However, to the best of our knowledge, the framework of team semantics has not yet been used to investigate notions of causal and counterfactual dependence. In the present paper, we provide a generalization of team semantics that is also adequate to capture causal, counterfactual and probabilistic notions of (in)dependence which arise from modern manipulationist theories of causation such as Pearl’s ([22]) and Woodward’s ([38]). The generalization is not trivial. It is usually acknowledged in the literature that causal relationships cannot be reduced to mere correlations of data (the latter can be represented by a set of assignments). Instead, a richer structure, encoding counterfactual assumptions, is needed.

The plan of the paper is as follows. In section 2 we give a short motivation for the manipulationist (interventionist) approach to causation. In section 3 we present team semantics and show how to adequately enrich it so that it can handle interventionist counterfactuals. We will introduce several languages to express various (deterministic) notions of dependence. In section 4 we analyze the logical properties of these languages, up to some basic soundness and completeness proofs; we use some of these properties to compare our counterfactuals with those of Stalnaker ([27]), Lewis ([18]) and Galles and Pearl ([8]). Section 5 is dedicated to the logical issues that arise from nonparametric models. In section 6 we introduce probabilistic causal languages. In section 7 we discuss various notions of causation (mainly taken from Woodward) and invariance, in the light of the logic developed in earlier sections.

2 Theories of causation: background

The reductive approach to causation has been philosophers’ favourite tool. It aims, roughly, at finding necessary and sufficient conditions for causal relationships like “$X$ causes $Y$”. Two such conditions have been prominent in the literature: those formulated in terms of conditional probabilities, and those based on counterfactuals (counterfactual dependence). We discuss them briefly in the next two subsections. The main purpose of presenting them is to understand some of the reasons why the reductive approach has been found unsatisfactory and replaced by non-reductive approaches, such as the manipulationist or interventionist accounts of counterfactuals and causation (Pearl, Woodward, Halpern, Hitchcock, Briggs, among others).

2.1 Conditional probabilities

Two well known endeavours to connect causal relationships with conditional probabilities are due to P. Suppes ([29]) and N. Cartwright ([4]). For instance, Cartwright ([4], p. 26) requires that causes raise the probability of their effects:

(CC)

$C$ causes $E$ iff $P(E \mid C \wedge K_j) > P(E \mid K_j)$ for all state descriptions $K_j$ which satisfy certain conditions.

Woodward ([37]) finds (CC) defective in two ways. Firstly, the requirement to conditionalize on all $K_j$ is too strong, and a weaker, existential condition would suffice. Secondly, (CC) holds, as initially intended, only for positive causes. When negative (inhibiting) changes are taken into account, it is natural to replace $>$ by $\neq$, which only requires $C$ and $E$ to be (probabilistically) dependent. With these two points in mind, Woodward ([37]) proposes to replace (CC) with something of the following sort:

(*)

$X$ causes $Y$ if and only if $X$ and $Y$ are dependent conditional on certain other factors $F$.

where $X$ and $Y$ are variables standing for properties.

The problem now becomes that of specifying the other factors $F$. Cartwright suggests that they include other causes of $Y$ with the exception of those which are on a causal chain from $X$ to $Y$. She recognizes, however ([4], p. 30; [5], p. 95 ff; [37], p. 58) that the claim that we should never condition on all such intermediate variables is too strong. Woodward ([37]) makes it clear that in order to understand these restrictions, and more broadly, in order for the project of specifying the other factors $F$ to have any hope of success, we need to refer to other causes of $Y$ and the way they are connected. For instance, to see why it is inappropriate to conditionalize on the variables which lie on a causal chain from $X$ to $Y$, as Cartwright’s first suggestion goes, it is enough to consider the causal structure

$X \to Z \to Y$

Now if we were to conditionalize on $Z$, we would expect, intuitively, $X$ and $Y$ to be independent, which according to the definition above would result in $X$ not being a cause of $Y$. But this is not what we want.

On the other side, to see that the requirement of never conditionalizing on the variables on causal chains from to is too strong, it is enough to consider the causal structure

in which both and are “direct” causes of and are on the causal paths between and . If the causal connection between and is to be reflected in the probabilistic dependence of on conditional on some other properties , then these other properties must include . In other words, to determine the causal influence of on we must take into account the influence of on .

What all this shows, according to Woodward, is that the project to connect causal relationships between $X$ and $Y$ with conditional probabilities, which finds its expression in (*), goes via a mechanism which provides information about other causes (contributing or total) of $Y$ besides $X$ and how these causes are connected with one another and with $X$ ([37], p. 58).

One such mechanism is that of causal Bayesian networks (Pearl 2000/2009 [22], Spirtes, Glymour and Scheines 1993/2001 [26]). They are built on Directed Acyclic Graphs (DAGs) of the kind we have already encountered in our earlier examples. We start with a set $V = \{X_1, \dots, X_n\}$ of variables whose causal relationships we want to investigate and a set of (directed) edges. A directed edge from the variable $X_j$ (parent) to the variable $X_i$ (child) is intended to represent the fact that $X_j$ is a direct cause of $X_i$. The set of all parents of $X_i$ is denoted as $PA_i$. The project is now to investigate the connection between causal relationships generated by the edges of the graph and conditional probabilities determined by a joint probability distribution over $V$. More exactly, let $P$ be a joint probability distribution on the set $V$. We denote by $x_1, \dots, x_n$ the values associated with the variables in $V$, that is, each $x_i$ is a value of $X_i$. A DAG $G$ is said to represent $P$ if the equation

(1) $P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa_i)$

holds, where $pa_i$ stands for the values of the variables in $PA_i$, and the variables $PA_i$ obey the parent-child relation of $G$: $X_j \in PA_i$ if and only if there is an arrow in $G$ from $X_j$ to $X_i$.

A well known result states the following:

Theorem 2.1 (Markov Condition, [34]).

Let $G$ be a DAG with $V$ its set of variables. $G$ represents a probability distribution $P$ if and only if every variable in $V$ is independent of all its nondescendants (in $G$) conditional on its parents.
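The content of the Markov Condition can be checked numerically on a small case. The following is a minimal sketch, assuming a hypothetical chain X → Z → Y over binary variables with made-up conditional probability tables (none of the names or numbers come from the sources cited above). It builds the joint distribution as the product of each variable's probability given its parents, and then verifies that Y is independent of its nondescendant X conditional on its parent Z.

```python
from itertools import product

# Hypothetical chain X -> Z -> Y over binary variables (illustrative numbers only).
p_x = {0: 0.7, 1: 0.3}
p_z_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # p_z_given_x[x][z]
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}  # p_y_given_z[z][y]

# Joint distribution: product of each variable's probability given its parents.
joint = {
    (x, z, y): p_x[x] * p_z_given_x[x][z] * p_y_given_z[z][y]
    for x, z, y in product((0, 1), repeat=3)
}

NAMES = ("x", "z", "y")


def prob(**fixed):
    """Probability of the event given by keyword constraints on x, z, y."""
    return sum(
        p
        for values, p in joint.items()
        if all(values[NAMES.index(name)] == v for name, v in fixed.items())
    )


def y_independent_of_x_given_z(tol=1e-12):
    """Check P(y | z, x) == P(y | z) for every combination of values."""
    for x, z, y in product((0, 1), repeat=3):
        lhs = prob(x=x, z=z, y=y) / prob(x=x, z=z)
        rhs = prob(z=z, y=y) / prob(z=z)
        if abs(lhs - rhs) > tol:
            return False
    return True
```

Here `y_independent_of_x_given_z()` returns `True` because the joint distribution was built by the factorization itself; a distribution not representable by the chain would in general fail the check.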

A DAG which represents a probability distribution is often referred to as a causal Bayesian network. The Markov Condition is thought to be important because it establishes a connection between causal relationships as represented by the arrows of a DAG and dependence relationships. For instance, if we take “$Y$ is a nondescendant of $X$” to stand for “$X$ does not cause $Y$”, then the Markov condition implies that if “$X$ does not cause $Y$” then conditional on its parents, $X$ is independent of $Y$, which by contraposition gives us the right-to-left direction of (*):

  • If variables $X$ and $Y$ are dependent (that is: $P(X \mid PA_X \wedge Y) \neq P(X \mid PA_X)$), then $X$ causes $Y$.

(Cf. e.g. [37])

As handy as causal Bayesian networks are for connecting causes with conditional probabilities, they fail to represent counterfactual reasoning ([22], p. 37). In other words, if we want a robust notion of cause which sustains counterfactuals, we need to supplement causal Bayesian networks with an additional, deterministic component.

2.2 Counterfactual dependence

Accounts of causal relationships based on counterfactual dependence have been available starting with the works of David Lewis and G. H. von Wright. Lewis ([17], [19]) reduces causal relationships to counterfactual dependence, which, in the end, is defined in terms of similarity between possible worlds. Von Wright ([35]) distinguishes a causal connection between $p$ and $q$ from an accidental generalization (the concomitance of $p$ and $q$) on the basis of the fact that the former, unlike the latter, sustains a counterfactual assumption of the form “on occasions where $p$, in fact, was not the case, $q$ would have accompanied it, had $p$ been the case”; that is, $p$ is a state of affairs which we can produce or suppress at will. It is thus the manipulability of the antecedent which is the individuating aspect of the cause factor ([35], p. 70). Lewis’s account has been criticized for relying too much on “dubious metaphysics” and von Wright’s account for being too “anthropomorphic”.

Both Lewis’s and von Wright’s accounts contain ingredients which have been incorporated later on into interventionist accounts of counterfactuals and causation. Roughly, one needs a mechanism to represent the exogenous process which is the manipulation of the (variables of the) antecedent of a counterfactual. This mechanism has become known as intervention. In addition, we need another mechanism to measure the effects the changes of the intervened variables have on the (variables of the) consequent. This mechanism is encoded into the so-called structural (functional) equations ([22], [37]).

In more detail, we divide the set of variables whose causal relationships we want to investigate into two disjoint sets: a set of endogenous variables and a set of exogenous variables. With each endogenous variable $Y$ an equation of the form

$Y = f_Y(PA_Y)$

is associated, where the variables $PA_Y$ are called the parents of $Y$. The standard interpretation of a functional equation is that of a law which specifies the value of $Y$ given every possible combination of the values of $PA_Y$. If we draw an arrow from each variable in $PA_Y$ to the variable $Y$ we obtain directed graphs as in the case of the causal Bayesian networks mentioned in the previous section. The crucial difference between the two frameworks is that in the present case, instead of characterizing the child-parent relationships stochastically in terms of conditional probabilities, we characterize them deterministically using the equations.

Various notions of intervention have been proposed, both in the context of causal Bayesian networks and in that of structural equations (e.g., [26], [36], [12], and [22]). A detailed discussion of this variety is outside the scope of this paper. Suffice it to say that an intervention on a variable $X$ is an action which disconnects the variable $X$ from all the incoming arrows into $X$ while preserving all the other arrows of the graph including those directed out of $X$. This is known as the arrow-breaking conception of interventions. In the structural equations framework where the causal graph is induced by the appropriate set of equations, the intervention results also in the alteration of that set: the equation associated with the variable $X$ is replaced with a new equation $X = x$, while keeping intact the other equations in the set.
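The arrow-breaking conception can be sketched in a few lines of code. The following is only an illustration under stated assumptions: the two-equation linear system (Z := X, Y := X + Z), the dictionary encoding and the helper names `solve` and `intervene` are all hypothetical and are not taken from [37] or [22]. An intervention replaces the equation of the target variable by a constant equation with no parents and leaves every other equation untouched.

```python
def solve(equations, exogenous):
    """Evaluate an acyclic system of structural equations.

    equations: dict mapping a variable to (tuple of parents, function).
    exogenous: dict mapping variables without an equation to their values.
    """
    # A variable that has an equation is always computed from it.
    values = {v: x for v, x in exogenous.items() if v not in equations}
    pending = dict(equations)
    while pending:
        ready = [v for v, (pa, _) in pending.items()
                 if all(p in values for p in pa)]
        if not ready:
            raise ValueError("cyclic or underdetermined system")
        for v in ready:
            pa, f = pending.pop(v)
            values[v] = f(*(values[p] for p in pa))
    return values


def intervene(equations, var, value):
    """Arrow-breaking intervention: var gets the constant equation var = value."""
    altered = dict(equations)          # the original system is not modified
    altered[var] = ((), lambda: value)  # no parents: incoming arrows are removed
    return altered


# Hypothetical system: Z := X and Y := X + Z, with X exogenous.
equations = {
    "Z": (("X",), lambda x: x),
    "Y": (("X", "Z"), lambda x, z: x + z),
}

before = solve(equations, {"X": 2})
after = solve(intervene(equations, "Z", 5), {"X": 2})
```

In this sketch `before` assigns Z the value dictated by X, while in `after` the value of Z is fixed at 5 regardless of X, and Y is recomputed from the unchanged equation for Y.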

It may be useful to illustrate these notions by way of an example ([37]).

Consider the following two equations:

This set of equations induces the DAG

If we intervene on and set its value to (i.e., , the result will be the altered system of equations:

corresponding to the new DAG:

2.3 Various notions of cause

Woodward ([36],[37]) uses interventions in the context of structural equations to define “$X$ is a cause of $Y$”. It turns out, however, that there are several distinct notions of cause, each satisfying the central commitment of the manipulability theory of causation: there is a causal relationship between $X$ and $Y$ whenever there is a possible intervention that changes the value of $X$ such that carrying it out changes the value of $Y$ (or its probability) ([37], p. 54). Here are several notions of cause:

(DC)

(Direct cause) A necessary and sufficient condition for $X$ to be a direct cause of $Y$ with respect to some variable set $V$ is that there be a possible intervention on $X$ that will change $Y$ (or the probability distribution of $Y$) when all the other variables in $V$ besides $X$ and $Y$ are held fixed at some values by interventions. ([37], p. 52)

For Woodward, direct causes correspond to the parent-child relationships in the underlying DAG. The other notions cannot always be recovered from the arrows of the DAG:

(TC)

(Total cause) $X$ is a total cause of $Y$ if and only if it has a non-null total effect on $Y$ - that is, if and only if there is some intervention on $X$ alone such that for some values of the other variables, this intervention on $X$ will change $Y$. The total effect of a change $dx$ in $X$ on $Y$ is the change in the value of $Y$ that would result from an intervention on $X$ alone that changes it by amount $dx$ (given the values of other variables that are not descendants of $X$). ([37], p. 54)

(CC)

(Contributing cause) $X$ is a contributing cause of $Y$ if and only if it makes a non-null contribution to $Y$ along some directed path in the sense that there is some set of values of variables that are not on this path such that if these variables were fixed at those values, there is some intervention on $X$ that will change the value of $Y$. The contribution to a change in the value of $Y$ due to a change $dx$ in the value of $X$ along some directed path is the change in the value of $Y$ that would result from this change in $X$, given that the values of off path variables are fixed by independent interventions. ([37], pp. 54-55)

For instance in our earlier example consisting of the set of equations (6) and (7), is a direct cause of . To make this more transparent we shall assume that the coefficients are all equal to 1.

We first intervene on and set its value to as we did above (i.e., . The result will be the altered system of equations:

Next we perform two (independent) interventions on the system (6’)-(7): yields ; and yields . We conclude that is a direct cause of in the sense of (DC).

Consider now the set of equations

which corresponds to the DAG

Any intervention leads to the system of equations

Now it is obvious that no change in the value of will have any influence on the value of , hence is not a direct cause of as expected. On the other side, it is easy to see that is a total cause of in the sense of (TC) – except in the special case that .

In section 7 we shall represent some of these causal notions in the framework of causal team semantics, to which we now turn.

3 Causal team semantics

3.1 Teams

Team semantics was introduced by W. Hodges ([14],[15]) in order to provide a compositional presentation of the (game-theoretically defined) semantics of Independence-Friendly logic ([13],[20]). In the following years, team semantics has been used to extend first-order logic with database dependencies (e.g. Dependence logic [31], Independence logic [10], Inclusion logic [9]); similar approaches have been applied to propositional logics ([40],[41]) and modal logics ([32], [30], [1]). Appropriate generalizations of teams have been used as descriptive languages for probabilistic dependencies ([7]), for quantum phenomena ([16]), for Bayes networks ([6]).

The basic idea of team semantics is that notions such as dependence and independence, which express properties of relations (instead of individuals), cannot be captured by Tarskian semantics, which evaluates formulas on single assignments; the appropriate unit for semantical evaluation is instead the team, i.e., a set of assignments (all sharing a common variable domain). In the standard approach, all the values of the variables come from a unique domain of individuals associated with an underlying model. However, in order to model causal and counterfactual dependence, we shall need to relax the assumption that variables may take as values only individuals coming from a common domain. Instead we shall take variables to represent properties which may have all kinds of values (as specified by their range). That is, once a set $dom$ of variables is fixed, each assignment will be a mapping $s : dom \to \bigcup_{X \in dom} Ran(X)$ such that $s(X) \in Ran(X)$ for each $X \in dom$. A team $T$ of domain $dom$ will be any set of such assignments.

As an example, recall Woodward’s DAG corresponding to the structural equations (6) and (7) (subsection 2.2). We may take to express the property “(whether it is) winter”, the property “(whether it is) cloudy” and the property “(whether it is) snowing” and take them to be represented in the team :

The basic semantic relation is now $T \models \psi$: the team $T$ satisfies the formula $\psi$.

One can define a team semantics already for classical propositional languages. However, such a semantics does not really add anything new, in the sense that

$T \models \psi$ if and only if for all $s \in T$, $s \models \psi$ (in the Tarskian sense).

That is, the meaning of a first-order formula is always reducible to a property of single assignments. However, once their semantics is expressed in terms of teams, propositional languages can be extended in ways that would be unavailable within Tarskian semantics; for instance, one can add (functional) dependence atoms whose semantics is defined by

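(*)

$T \models\ =\!(\mathbf{X};Y)$ if and only if for all $s, s' \in T$, if $s(\mathbf{X}) = s'(\mathbf{X})$, then $s(Y) = s'(Y)$

expressing that $Y$ is functionally determined by $\mathbf{X}$; and this is a global property of the team, not reducible to properties of the single assignments.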
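The semantics of a functional dependence atom is easy to state operationally: any two assignments of the team that agree on the determining variables must agree on the determined one. The sketch below assumes a team is encoded as a list of dictionaries; the function name `satisfies_dependence`, the variable names W, C, S and the values are hypothetical and chosen only for illustration.

```python
def satisfies_dependence(team, xs, y):
    """True iff any two assignments that agree on all of xs also agree on y."""
    seen = {}  # values of xs -> the value of y observed with them
    for s in team:
        key = tuple(s[x] for x in xs)
        if key in seen and seen[key] != s[y]:
            return False
        seen[key] = s[y]
    return True


# Hypothetical team over three variables.
team = [
    {"W": 1, "C": 1, "S": 1},
    {"W": 1, "C": 0, "S": 0},
    {"W": 0, "C": 0, "S": 0},
]

s_depends_on_w_and_c = satisfies_dependence(team, ["W", "C"], "S")
s_depends_on_w_alone = satisfies_dependence(team, ["W"], "S")
# Every one-element team satisfies every dependence atom, which is why the
# property cannot be reduced to a property of single assignments.
singletons_all_satisfy = all(satisfies_dependence([s], ["W"], "S") for s in team)
```

In this example S is determined by W and C together but not by W alone, even though each single assignment, taken as a one-element team, trivially satisfies both atoms.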

Thus, in our example, it holds that whether it is snowing depends completely on whether it is winter, , and whether it is cloudy, ; and whether it is cloudy depends entirely on whether it is winter, .

A team therefore might be used, for example, to represent a set of individual records coming from a statistical or experimental investigation; or, to represent all possible configurations that are compatible with the ranges of each variable; or yet, a subset of all possible configurations. This last case is particularly important in our context: it may well happen that some configurations are forbidden, even though they respect all variable ranges; this in particular happens if there are functional dependencies between the variables. As regards the first possibility (analysis of statistical data), for many purposes it may be more suitable to use multiteams, which allow multiple copies of assignments. Team-theoretical logical languages could thus be used to express global properties of distributions of values. On a second, different interpretation, teams and multiteams may be used to represent epistemic uncertainty about the current state of affairs; if one thinks of each assignment as a possible world, then the team may be thought of as representing a set of equally plausible worlds. An intervention on such an object should, therefore, produce a new set of equally plausible candidates for the actual world.

3.2 Causal teams

Despite some claims to the contrary in the literature, teams are insufficient to represent counterfactuals and causal notions based on them. As explained in subsection 2.2, a proper treatment of causal notions requires an account of counterfactual information, as encoded e.g. in invariant structural equations. It is true that a team sustains a number of functional dependencies among variables; but such dependencies may well be contingent, and disappear if the system is intervened upon. For this reason, we must extend teams so that they incorporate invariant dependencies or functions; and we must explain what may count as an intervention on a team.

Besides the ideal case in which a causal model contains a complete description of the functions involved in the structural equations (parametric case), we develop our semantics with enough generality to accommodate the more realistic case in which we possess only partial information about the functions (nonparametric case); as an extreme case, the model might only incorporate information as to which functional dependencies are invariant. In the nonparametric case, not all counterfactual statements can be evaluated; the additional logical complications related to this case will be examined in section 5. Most of the paper will focus on the parametric case.

Before proceeding, we want to fix some notational conventions. As is often done in the literature on causal models, we use the symbol $PA_Y$ ambiguously, so that it may mean either the set of parents of $Y$, or a sequence of the same variables in some fixed alphabetical ordering. For other sets/sequences of variables, we will adhere to the following conventions:

Notation 3.1.
  • We use boldface letters such as $\mathbf{X}$ to denote either a set of variables or a sequence of the same variables (in the fixed alphabetical order).

  • We use $\mathbf{x}$ to denote a set or sequence of values, each of which is a value for exactly one of the variables in $\mathbf{X}$. We leave the details of these correspondences between variables and values as non-formalized.

  • Writing $s(\mathbf{X})$ we mean the set/sequence of values that the assignment $s$ assigns to each of the variables in $\mathbf{X}$.

  • $\mathbf{X} = \mathbf{x}$ is an abbreviation for $X_1 = x_1 \wedge \dots \wedge X_n = x_n$.

  • By $\mathbf{X} \setminus \mathbf{Y}$ we denote the set/the sequence (in alphabetical order) of variables occurring in $\mathbf{X}$ but not in $\mathbf{Y}$, and by $\mathbf{x} \setminus \mathbf{y}$ a corresponding set/sequence of values.

  • By $\mathbf{X} \cap \mathbf{Y}$ we denote the set/sequence of variables occurring in both $\mathbf{X}$ and $\mathbf{Y}$, and by $\mathbf{x} \cap \mathbf{y}$ the corresponding set/sequence of values.

and so on.

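Given a team $T$ and a variable $X$, we write $T(X)$ for the set of values that are obtained for $X$ in the team $T$; that is, $T(X) = \{s(X) \mid s \in T\}$. As before, we say that a team $T$ satisfies a functional dependence $=\!(\mathbf{X};Y)$, and we write $T \models\ =\!(\mathbf{X};Y)$, if for all $s, s' \in T$, $s(\mathbf{X}) = s'(\mathbf{X})$ implies $s(Y) = s'(Y)$.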

Definition 3.2.

A causal team over variable domain with endogenous variables is a quadruple , where:

  1. is a team.

  2. is a graph over the set of variables. For any , we denote as the set of all variables such that the arrow is in .

  3. (where the may be arbitrary sets) is a function which assigns a range to each variable

  4. is a function that assigns to each endogenous variable a -ary function
    (for some )

which satisfies the further restrictions:

  a) for each

  b) If , then

  c) if is such that , then .

In case for each , we say the causal team is parametric; otherwise it is nonparametric.

Clause b) is there to ensure that whenever the graph contains an arrow , and is the maximal set of variables whence arrows come to , then the team satisfies the corresponding functional dependency . Clause c) further ensures that such functional dependency is in accordance with the (partial description of the) function .
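The role of clauses b) and c), and the parametric/nonparametric distinction, can be made concrete with a small consistency check. The sketch below is one possible encoding and not the formal definition: a team is a list of dictionaries, `parents` maps each endogenous variable to the tuple of its parents, `ranges` maps variables to sets, and `functions` maps each endogenous variable to a (possibly partial) table from tuples of parent values to values. The range check is an assumed reading of the first restriction, and all names and data are hypothetical.

```python
from itertools import product


def is_causal_team(team, parents, ranges, functions):
    """Check a team against a graph, ranges and partial function tables."""
    for s in team:
        # Assumed range condition: every value in the team lies in its range.
        if any(s[v] not in ranges[v] for v in s):
            return False
    for y, table in functions.items():
        seen = {}
        for s in team:
            args = tuple(s[p] for p in parents[y])
            # Clause b): y is functionally determined by its parents in the team.
            if seen.setdefault(args, s[y]) != s[y]:
                return False
            # Clause c): the team agrees with the partial description of the function.
            if args in table and table[args] != s[y]:
                return False
    return True


def is_parametric(parents, ranges, functions):
    """True iff every function table is total over the ranges of the parents."""
    return all(
        set(table) == set(product(*(ranges[p] for p in parents[y])))
        for y, table in functions.items()
    )


# Hypothetical data: a single arrow X -> Y.
parents = {"Y": ("X",)}
ranges = {"X": {0, 1}, "Y": {0, 1, 2}}
team = [{"X": 0, "Y": 1}]
partial = {"Y": {(0,): 1}}            # only one value of the function is known
total = {"Y": {(0,): 1, (1,): 2}}     # the function is fully specified
```

With these data, both `partial` and `total` are compatible with the team, but only `total` yields a parametric causal team under this encoding.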

The functional component induces an associated system of structural equations, say

for each variable .

Example 3.3.

Consider a causal team which has underlying team , graph , ranges , and partial description of (one value of) the invariant function for : . We represent the and components of by means of a decorated table:

U  X   YZ

3.3 Explicit causal teams

For many purposes – first of all, to keep a smoother account of the operations of taking subteams, and of applying iterated interventions – it will be convenient to restrict attention to causal teams of a special form. This restriction causes no loss of generality within the developments pursued in the present paper.

Definition 3.4.

A causal team with endogenous variables is explicit if, for every , the following additional condition holds:

d) Let be the list of the parents of in the fixed alphabetical order. For every , .

(Here, as in the definition of causal team, is a shorthand for ).

In words, the component of an explicit causal team encodes all the information of the team that concerns invariant functions; no further values of the functions can be reconstructed from the team component .

Given any causal team, it is always possible to construct in a canonical way an explicit causal team that corresponds to it, in the sense that it encodes exactly the same information (but it is more stable under the operations of taking subteams or interventions – to be defined in the following subsections). For this purpose, given a causal team with endogenous variables , to any variable we may associate an explicit function as follows; given , define



(Conditions b) and c) in the definition of causal team ensure that is well-defined.)

We collect these explicit functions in a single function such that, for each variable , . Then:

The explicit causal team associated to is .

3.4 Causal subteams

It will be important, in order to define a semantics for our languages, to talk about causal subteams. A causal subteam $S$ of a causal team $T$ is meant to express a condition of lesser uncertainty; this will be encoded by the fact that the assignments in $S$ form a subset of the assignments of $T$, which may be interpreted as the fact that fewer configurations are considered possible. At the same time, the transition to a subteam should not erase information concerning the graph, the ranges of variables, and the invariant functions. The definitions in the previous subsection should make it clear that this proviso is easily guaranteed if $T$ is an explicit causal team. In this case, we can define:

Definition 3.5.

Given an explicit causal team , a causal subteam of is a causal team with the same domain and the same set of endogenous variables, which satisfies the following conditions:

  1. .

In case the team is not explicit, what can go wrong is that, for some endogenous variable , there may be some assignment such that (where list in alphabetical order); in such case, the team encodes the fact that the invariant function which produces assigns, to the list of arguments , the value ; but this information is lost in the subteam . To avoid this problem, we can define more generally:

Definition 3.6.

Given a causal team , a causal subteam of is a causal subteam of the associated explicit team (as defined in subsection 3.3)

This second definition obviously coincides with the previous one over explicit causal teams.

3.5 A basic language and its semantics

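Before discussing interventions and counterfactuals, we need to specify what it means for a causal team to satisfy atomic formulas and their boolean combinations. The kind of language we consider, for now, contains atomic dependence statements of the form $=\!(\mathbf{X};Y)$; atomic formulas of the forms $X = x$ and $X \neq x$, where $X$ is a variable and $x$ is a value in its range; connectives $\wedge$ and $\vee$.

By analogy with the other kinds of team semantics that have been proposed in the literature, we can define satisfaction of a formula by a causal team $T$ by the clauses:

  • $T \models\ =\!(\mathbf{X};Y)$ if for all assignments $s, s'$ of $T$, $s(\mathbf{X}) = s'(\mathbf{X})$ implies $s(Y) = s'(Y)$.

  • $T \models X = x$ if, for all assignments $s$ of $T$, $s(X) = x$.

  • $T \models X \neq x$ if, for all assignments $s$ of $T$, $s(X) \neq x$.

  • $T \models \psi \wedge \chi$ if $T \models \psi$ and $T \models \chi$.

  • $T \models \psi \vee \chi$ if there are two causal subteams $T_1, T_2$ of $T$ such that every assignment of $T$ belongs to $T_1$ or to $T_2$, $T_1 \models \psi$ and $T_2 \models \chi$.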
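The clauses above translate directly into a small recursive evaluator. The following sketch works on the team component only (a list of dictionaries) and ignores the graph and function components, which these clauses do not use. The tuple encoding of formulas and the name `sat` are assumptions made for illustration. For disjunction it tries every split of the team into a part and its complement; this is enough here because every formula of this language that holds on a team also holds on all of its subteams.

```python
from itertools import product


def sat(team, phi):
    """Team-semantic satisfaction for formulas encoded as nested tuples.

    ("eq", X, x), ("neq", X, x), ("dep", [X1, ...], Y),
    ("and", phi, psi), ("or", phi, psi)
    """
    op = phi[0]
    if op == "eq":
        return all(s[phi[1]] == phi[2] for s in team)
    if op == "neq":
        return all(s[phi[1]] != phi[2] for s in team)
    if op == "dep":
        xs, y = phi[1], phi[2]
        return all(
            s[y] == t[y]
            for s in team
            for t in team
            if all(s[x] == t[x] for x in xs)
        )
    if op == "and":
        return sat(team, phi[1]) and sat(team, phi[2])
    if op == "or":
        # Try every way of splitting the team into two subteams covering it.
        for mask in product((True, False), repeat=len(team)):
            left = [s for s, m in zip(team, mask) if m]
            right = [s for s, m in zip(team, mask) if not m]
            if sat(left, phi[1]) and sat(right, phi[2]):
                return True
        return False
    raise ValueError(f"unknown connective: {op}")


# Hypothetical teams.
team = [{"X": 1, "Y": 1}, {"X": 2, "Y": 2}]
clash = [{"X": 1, "Y": 1}, {"X": 1, "Y": 2}]
```

For instance, `team` satisfies neither `("eq", "Y", 1)` nor `("eq", "Y", 2)`, but it satisfies their disjunction, since it splits into one subteam for each disjunct. The split search is exponential in the size of the team, which is acceptable only for small examples.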

3.6 Selective implication

Our main goal is to give an exact semantics to counterfactual statements of the form “If had been the case, then would have been the case”. Very often, however, one finds examples in the literature where these statements are embedded into a larger context. We have seen that von Wright ([35]) considers examples of the form “on occasions where $p$, in fact, was not the case, $q$ would have accompanied it, had $p$ been the case”. Pearl ([22]) analyzes the following query: “what is the probability that a subject who died under treatment would have recovered had he or she not been treated?”

The appropriate representation of the last statement seems to be:

where the symbol stands for counterfactual implication, while the selective implication is a form of restriction of the range of application of the counterfactual to the available evidence. What it does is to generate a subteam by selecting those assignments which satisfy the antecedent; and then it is checked whether the consequent holds in this subteam.

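Given a causal team $T$, and a classical formula $\psi$ (that is, a formula as in subsection 3.5, but without dependence atoms) define the subteam $T^{\psi}$ by the condition:

  • the assignments of $T^{\psi}$ are exactly the assignments $s$ of $T$ such that $\{s\} \models \psi$.

Then, we define selective implication by the clause:

  • $T \models \psi \supset \chi$ iff $T^{\psi} \models \chi$.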

Here the consequent can be any formula of our current logical language; therefore, it might happen not to be a property of single assignments. Instead, we require for now the antecedent to be classical. The general idea is that selective implication is a reasonable operator only for antecedents which are flat formulas, in the sense with which this word is used in the literature on logics of dependence (which will be reviewed in the following subsections). Actually, typical applications involve at most conjunctions of atomic formulas of the type .
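Operationally, selective implication is a filter followed by a check. In the sketch below the classical antecedent is modelled as a predicate on single assignments and the consequent as a predicate on whole teams, so that non-flat consequents such as dependence atoms are allowed. The encoding, the helper names and the four-row team are hypothetical and serve only as an illustration.

```python
def select(team, antecedent):
    """Subteam of the assignments that individually satisfy the antecedent."""
    return [s for s in team if antecedent(s)]


def selective_implication(team, antecedent, consequent):
    """The consequent (a property of teams) is evaluated on the selected subteam."""
    return consequent(select(team, antecedent))


def depends_on(xs, y):
    """Team property: y is functionally determined by the variables in xs."""
    def check(team):
        return all(
            s[y] == t[y]
            for s in team
            for t in team
            if all(s[x] == t[x] for x in xs)
        )
    return check


# Hypothetical team over Z, Y, X (values chosen only for illustration).
team = [
    {"Z": 1, "Y": 1, "X": 1},
    {"Z": 2, "Y": 1, "X": 2},
    {"Z": 1, "Y": 2, "X": 3},
    {"Z": 2, "Y": 2, "X": 3},
]


def y_is_2(s):
    return s["Y"] == 2


def x_is_3(rows):
    return all(s["X"] == 3 for s in rows)
```

On this team the atom checked by `x_is_3` fails, but `selective_implication(team, y_is_2, x_is_3)` holds, because only the last two rows survive the selection. Only the team component is consulted; no graph or function information is needed.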

Example 3.7.

We observe that the selective implication

holds on any causal team that is based on the team which is depicted in the figure:

Z  Y  X

To see that the formula holds on it, we have to construct the reduced subteam :

Z  Y  X

which is obtained by selecting the third and fourth row of the previous table (the rows that satisfy ). We can see, then, that is satisfied by each row of this smaller table. Therefore, by the semantical clause for this kind of atomic formulas, . Then, the semantical clause for selective implication allows us to conclude that .

Notice that we only needed the team structure in order to evaluate a selective implication; all the information required to evaluate it is already encoded in the team, and no information about the structural equations or the underlying DAG is needed.

3.7 Interventions on teams: some examples

Our goal is to define (interventionist) counterfactual implication. The task is more complicated than in the case of selective implication; we first illustrate the idea with some examples; formal definitions will be provided after that. Informally, the idea is that a counterfactual is true in the causal team if is true in the causal team which results from an intervention applied to the team .

Example 3.8.

Consider any causal team with , and underlying graph (we omit specifying the and components). We can represent it as an annotated table:


X  Y  Z

We want to establish whether the counterfactual holds in . The idea is to intervene in by setting the value of to (this corresponds to replacing the function with the constant function ); updating all other variables that invariantly depend on ; and removing all the arrows that enter into . The causal team thus produced will be denoted by . So, first we intervene on :


X   Y  Z

is the only variable which has an invariant dependence on ; therefore, we have to update its column. Since the function that determines has as its only parameter, we just have to consult and see that, in rows where has value , takes value 3. Therefore, looks like this:


X  Y  Z

(Again, we are omitting a representation of the and components). Now , therefore we conclude that . The team describes a counterfactual situation in which we have certainty about the values of and , but not about the value of .

Notice that this example uses both the team and the graph structure, but the specific invariant functions are not needed in the evaluation of this specific sentence. However, the observations above do not cover, for example, evaluation of counterfactuals of the form , because team and graph structure do not tell us anything about what value should take in circumstances in which .

Example 3.9.

We consider a slightly less trivial example. Here we evaluate the counterfactual in the causal team shown in the picture below

X  Y  Z

(The components and are omitted as before). So we must produce again . First we intervene on

X  Y  Z

and next we update the value of , taking into account both the values of and in each row (since is reached both by arrows coming from and from ). Notice that now two of the modified rows become identical, so that only one appears in the table:

X  Y  Z

This team can be partitioned into two causal subteams:

X  Y  Z
X  Y  Z

the first of which satisfies , while the second satisfies . Therefore, by the semantical clause for disjunction, satisfies . We may conclude that .
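The splitting step used for the disjunction can be sketched as a brute-force search over partitions of the team. This is a hedged illustration, not the paper's definition: the names `atom` and `disjunction` are hypothetical, the rows are made up, and the sketch assumes (as for downward closed formulas) that it is enough to look for disjoint subteams.

```python
from itertools import product


def atom(var, value):
    """Team-level atomic formula: every row assigns `value` to `var`."""
    return lambda team: all(row[var] == value for row in team)


def disjunction(left, right):
    """Team-semantics disjunction: some split of the team satisfies
    `left` on one part and `right` on the other."""
    def check(team):
        # Try every way of sending each row to the left or the right subteam.
        for labels in product((0, 1), repeat=len(team)):
            part1 = [row for row, side in zip(team, labels) if side == 0]
            part2 = [row for row, side in zip(team, labels) if side == 1]
            if left(part1) and right(part2):
                return True
        return False
    return check


# Made-up intervened team (an assumption of this sketch).
intervened = [
    {"X": 1, "Y": 1, "Z": 2},
    {"X": 1, "Y": 2, "Z": 3},
    {"X": 1, "Y": 3, "Z": 3},
]

z_is_2_or_3 = disjunction(atom("Z", 2), atom("Z", 3))(intervened)
z_is_2 = atom("Z", 2)(intervened)
```

The search is exponential in the number of rows, which is acceptable for the small tables used in these examples.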

Example 3.10.

Here we present an example involving nonparametric teams. Our goal is to evaluate in the nonparametric team shown in the picture:

U  X  Y  Z

Here we are assuming that ; ranges might be given, for example, by . In order to evaluate we need to generate the causal team . First we intervene on ; this will affect all descendants of , which in this case are the (children) and .

U  X  Y  Z

Notice, however, that we cannot yet evaluate (which is a function of and ) unless we first update :

U  X  Y  Z

Finally we update , but we have a surprise:

U  X  Y  Z

The information contained in the team is insufficient for the evaluation of the -value of the last row of . Since the triple is not in the domain of , the best we can do is to fill that part of the table with a formal term. Here we wrote as a formal symbol distinguished from the function . In general, in case of repeated interventions, we can also expect complex terms, with nested function symbols, to be produced. Notice, however, that we have no uncertainty about the column; so, it is natural to state that , and that, therefore, . What if we were instead trying to evaluate some statement about under these same counterfactual circumstances? We might give up entirely on assigning truth values to such statements; or we might perhaps add a further truth value, which holds of such a statement if the statement is admissible given the form of the terms involved. A precise semantical definition is not trivial, and we address it later.

Example 3.11.

Consider now a causal team which is identical to the one considered in the previous example, except that its functional component, instead of being empty, contains at least a partial description of the invariant function for ; we assume that is such that , and that . If we now apply the intervention to this causal team, we can use this extra information to explicitly evaluate the last entry of , obtaining the causal team

U  X  Y  Z

3.8 Formal definition of interventions and counterfactuals

All the examples in the previous subsection have one aspect in common: the graphs underlying causal teams are acyclic. In this case the corresponding causal team is called recursive, by analogy with the literature on structural equation models; the examples show that, for these kinds of causal teams, the notion of intervention is naturally conceived in algorithmic terms. We now move towards a precise definition.

We have learned a number of lessons from the previous examples:

  1. An intervention amounts to 1) setting the whole -column to ; 2) eliminating all arrows that enter into ; 3) updating the columns that correspond to descendants of .

  2. It might not be possible to update all the descendants of in a single step (actually, the order of updating might not be trivial to decide).

  3. The information encoded in the causal team might be insufficient for generating, under intervention, a proper causal team; we must then admit teams which assign formal terms to some variables.

We begin by addressing this last problem. Given a graph whose set of vertices is a set of variables, we call the set of function symbols (of arity ), for each (actually, we only need one symbol for each endogenous variable, that is, for variables which have indegree at least one). We call -terms the terms generated from variables in and from symbols in by the obvious inductive rules; the set of -terms will be denoted as . When speaking of a causal team , we will implicitly assume, from now on, that the ranges of all variables contain the set of terms . Actually, it is not difficult to prove that (iterated) interventions (as defined below) on a recursive causal team with finite variable ranges can generate only a finite number of formal terms. Therefore, in principle a finite causal team could always be extended to a finite causal team with formal terms, by using an appropriate finite subset of to extend the ranges of variables.

We may combine these ideas with those of subsection 3.3. Given a causal team , let be the causal team obtained by extending the ranges of variables with formal terms, as described above. Then form the corresponding explicit team following the construction given in subsection 3.3. We call the fully explicit causal team corresponding to . The invariant functions of now encode sufficient information for performing any kind of intervention that we will consider in this paper. The fact that the ranges of variables contain formal terms solves issue 3 above. The fact that the team is explicit ensures that no information is lost after an intervention.

Now we address problem 2. How do we decide the order of updating for the columns? To understand the problem, observe the graph in the figure:

is connected to by an arrow, so one might think that, in an intervention on , it is possible to update immediately after updating . However, this is impossible, because also the updated values of are needed in the evaluation of the new values for . The point is that, from to , there is a longer path than the direct one. This suggests defining a distance from , , given by the maximum length of paths from to . Nodes at distance 1 can be immediately evaluated after updating ; at this point it is safe to update nodes at distance 2; and so on. Nodes that are not accessible by directed paths from will be assigned a negative distance and excluded from the updating procedure. Of course, this strategy will work provided there are no loops in the graph, or at least no loops in the part of the graph which is accessible by directed paths from (the set of descendants of ). Since in the applications it is very common to have conjunctive interventions, say , we will define, more generally, the notion of distance from a set of variables . In this more complex case, one must consider a reduced graph in which all arrows entering in have been removed, and examine the directed paths of this reduced graph. The reader may think of the case for ease of visualization.
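The distance just described can be sketched as a longest-path computation on the reduced graph. The sketch below is an illustration under assumptions: the function name `evaluation_distance` is hypothetical, the graph is given as a set of directed edges and is assumed acyclic, unreachable nodes receive the value -1 as one possible choice of negative distance, and the example graph is made up.

```python
from functools import lru_cache


def evaluation_distance(edges, targets, y):
    """Maximum length of a directed path from some intervened variable to `y`
    in the reduced graph, or -1 if `y` is not reachable (graph assumed acyclic)."""
    # Reduced graph: drop every arrow that enters an intervened variable.
    children = {}
    for a, b in edges:
        if b not in targets:
            children.setdefault(a, []).append(b)

    @lru_cache(maxsize=None)
    def longest(node):
        # Longest directed path from `node` to `y`; -1 when there is none.
        best = 0 if node == y else -1
        for child in children.get(node, []):
            below = longest(child)
            if below >= 0:
                best = max(best, below + 1)
        return best

    return max(longest(x) for x in targets)


# Made-up graph: U -> X, X -> Y, Y -> Z, X -> Z (an assumption of this sketch).
edges = {("U", "X"), ("X", "Y"), ("Y", "Z"), ("X", "Z")}

d_y = evaluation_distance(edges, {"X"}, "Y")
d_z = evaluation_distance(edges, {"X"}, "Z")
d_u = evaluation_distance(edges, {"X"}, "U")
```

In this made-up graph the node Z has a direct arrow from X, yet its distance is 2 because of the longer path through Y, so it is updated only after Y. Under the convention of this sketch an intervened variable is at distance 0 from itself.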

Definition 3.12.

Given a graph and ,

  • We denote as the graph obtained by removing all arrows going into some vertex of (i.e., an edge is in iff it is in and ). Notice that, in the special case that , the set of directed paths of starting from coincides with the set of directed paths of starting from .

  • Let . We call (evaluation) distance between and the value
    . In case no such path exists, we set . Clearly, if the graph is finite and acyclic, for any pair . When the graph is clear from the context, we simply write .

Let be a fully explicit, recursive causal team of endogenous variables . Let and corresponding values with the additional property that, if and denote the same variable, then . We define the intervention (in short, ) as an algorithm6:

Stage . Delete all arrows coming into , and replace each assignment with . Denote the resulting team7 as . Replace with its restriction to .

Stage . If is the set of all the variables such that , define a new team by replacing each with the assignment .

End the procedure after step .
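The staged procedure can be sketched in Python as follows. This is a minimal sketch under assumptions, not the definition itself: teams are lists of dictionaries, `parents` maps each endogenous variable to the tuple of its parents, function information is given as (possibly partial) lookup tables, formal terms are represented as tuples such as `("f_Z", 2, 1)`, and all names and data are hypothetical. The sketch returns only the new rows and the new graph, leaving the restriction of the function component implicit.

```python
def tables_from_team(team, parents):
    """Partial function tables witnessed by the rows of the team."""
    tables = {v: {} for v in parents}
    for row in team:
        for v, ps in parents.items():
            tables[v][tuple(row[p] for p in ps)] = row[v]
    return tables


def intervene(team, parents, tables, setting):
    """Apply the intervention `setting` (variable -> value) stage by stage."""
    # Stage 0: cut the arrows entering intervened variables, overwrite their columns.
    parents = {v: ps for v, ps in parents.items() if v not in setting}
    rows = [{**row, **setting} for row in team]

    # Evaluation distance: longest directed path from an intervened variable.
    dist = {}

    def d(v):
        if v not in dist:
            if v in setting:
                dist[v] = 0
            else:
                reach = [d(p) for p in parents.get(v, ()) if d(p) >= 0]
                dist[v] = 1 + max(reach) if reach else -1
        return dist[v]

    top = max(d(v) for v in set(parents) | set(setting))

    # Stage n >= 1: recompute the variables at distance n from their parents.
    for n in range(1, top + 1):
        stage = [v for v in parents if d(v) == n]
        for row in rows:
            for v in stage:
                args = tuple(row[p] for p in parents[v])
                # Unknown function value: store a formal term instead.
                row[v] = tables[v].get(args, ("f_" + v,) + args)

    # A team is a set of assignments: drop duplicate rows.
    unique = []
    for row in rows:
        if row not in unique:
            unique.append(row)
    return unique, parents


# Made-up causal team with graph X -> Y, X -> Z, Y -> Z (an assumption).
parents = {"Y": ("X",), "Z": ("X", "Y")}
team = [{"X": 1, "Y": 1, "Z": 2}, {"X": 2, "Y": 3, "Z": 5}]
tables = tables_from_team(team, parents)

after_x, _ = intervene(team, parents, tables, {"X": 2})
after_y, graph_y = intervene(team, parents, tables, {"Y": 1})
```

In this made-up example, setting X to 2 makes the two rows collapse into one, while setting Y to 1 yields a row whose Z-entry is a formal term, since the pair of values for X and Y in that row never occurs in the original team.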

Notice that is not modified by the algorithm; and that, except for the modifications to , it would be the same to apply the algorithm to each assignment separately. In case the causal team is recursive but not fully explicit, we should begin the algorithm with an additional step:

Step . Replace with the corresponding fully explicit causal team .

In case the intervention is a terminating algorithm on , we define the causal team (of endogenous variables ) as the quadruple which is produced when is applied to . Actually, if the notion of algorithm is relaxed a bit, the procedure applies as well to causal teams with infinite variable ranges:

Theorem 3.13.

If is a finite acyclic graph, then is well-defined.

Proof.

We can assume without loss of generality that is fully explicit.

Assume also, at first, that is finite.

Suppose that, for , . Then there is a path from some to such that . Then there is a node which is crossed at least twice by ; so contains a cycle: contradiction.

Therefore, ; this means that the “for” cycle in the algorithm goes through a finite number of iterations over the variable .

Finally, notice that, for each , there are a finite number of variables such that (due to finiteness of ) and a finite number of assignments in the team that is undergoing modification (due to the finiteness of ). Therefore, the algorithm terminates after a finite number of steps.

If instead is infinite, we can replace with an infinitary algorithm which, in each iteration of the “for” cycle, performs simultaneously the substitution for all the assignments in the current team. By the arguments above, such an “algorithm” terminates and yields a well-defined causal team. ∎

In case a causal team is not recursive (i.e., its graph is cyclic), the algorithm above may well fail to terminate. In this case, a definition of the intervention could still be given in terms of the set of solutions of the modified system of structural equations (in which the equations for are replaced by ), provided the team is parametric. Each assignment is a solution of the system of structural equations encoded in (this is ensured by parametricity and conditions a) and c) in the definition of causal team). Galles and Pearl ([8]) consider the case of systems with unique solutions, defined as follows: 1) for fixed values of the exogenous variables, the system has a unique solution, and 2) each “intervened” system of equations obtained from the initial one by replacing some equations of the form with constant equations still has a unique solution for each choice of values for the exogenous variables. In case the component of a team encodes a system with unique solutions, the natural way to define an intervention on the team is to replace each assignment with the (unique) assignment which encodes the solution of the intervened system for the choice of values for the exogenous variables8. The definition of the other components of the causal team produced by the intervention is straightforward. Over recursive parametric causal teams the team so obtained coincides with the causal team that is produced by the algorithm. Interestingly, for the unique-solution causal teams an algorithmic definition of interventions is still available. The algorithm discussed above is still correct if we replace the notion of distance with:

This notion of distance guarantees that the value of each variable (unique for each assignment) is computed only once; and that all the distances are finite, which entails that the algorithm is terminating. The fact that interventions are computable is counterbalanced by the fact that membership in the unique-solution class is not in general decidable.

It is not difficult to extend the previous ideas to systems that may have no solution at all. In case the intervened system obtained by replacing the appropriate equations with has no solution corresponding to the choice for the exogenous variables, then no assignment should correspond to in the intervened team . Although this extension is straightforward, we can expect significant changes in the underlying logic.

The case of systems with multiple solutions is more problematic. If many solutions correspond to an assignment of the initial team, should we include them all in the intervened team? And, as regards the probabilistic developments of the following sections, should we consider all the assignments thus produced as equiprobable? These matters should probably be settled according to the kind of interpretation we want to give to nonrecursive causal teams in a given application, and one is led to the classical problems of interpretation of nonrecursive causal models (see e.g. [28]). It might also be reasonable to model such an intervention as producing not one, but multiple teams, corresponding to possible different outcomes of the intervention. This set of “accessible teams” would then induce a nontrivial modality, and it would then be reasonable to treat counterfactuals as necessity operators in a dynamic logic setting (in the spirit of [11]). This would lead us far away from the approach of the present paper, in which we focus on the nonproblematic recursive case. We now return to it.

Having defined the intervened team , we are immediately led to a semantical clause for counterfactuals of the form :

In case the antecedent is inconsistent (i.e., it contains two conjuncts with ), the corresponding intervention is not defined; in this case, we postulate the counterfactual to be (trivially) true.

3.9 Logical languages

We call the (basic) language of causal dependence, , the language formed by the following rules:

for variables, values, formulae of , classical formula.

If we interpret this language by the semantical clauses introduced so far, it can be shown that:

Theorem 3.14.

The logic is downwards closed, that is: if , is a parametric causal team with at most unique solutions, is a causal subteam of , and , then also .

Proof.

We prove it by induction on the syntactic structure of . The atomic and propositional cases are routine.

Suppose . Then . Now ; therefore, by inductive hypothesis, . Thus, by the semantic clause for selective implication, .

Suppose