# Causal Inference with a Graphical Hierarchy of Interventions

Ilya Shpitser (ilyas@cs.jhu.edu), Johns Hopkins University

Eric Tchetgen Tchetgen (etchetge@hsph.harvard.edu), Harvard University

Department of Computer Science
Johns Hopkins University
3400 N Charles Street
Baltimore, Maryland 21218

School of Public Health
Harvard University
677 Huntington Avenue
Kresge Building
Boston, Massachusetts 02115

###### Abstract

Identifying causal parameters from observational data is fraught with subtleties due to the issues of selection bias and confounding. In addition, more complex questions of interest, such as effects of treatment on the treated and mediated effects, may not always be identified even in data where treatment assignment is known and under investigator control, or may be identified under one causal model but not another.

Increasingly complex effects of interest, coupled with a diversity of causal models in use, have resulted in a fragmented view of identification. This fragmentation makes it unnecessarily difficult to determine if a given parameter is identified (and in what model), and what assumptions must hold for this to be the case. This, in turn, complicates the development of estimation theory and sensitivity analysis procedures.

In this paper, we give a unifying view of a large class of causal effects of interest, including novel effects not previously considered, in terms of a hierarchy of interventions, and show that identification theory for this large class reduces to an identification theory of random variables under interventions from this hierarchy. Moreover, we show that one type of intervention in the hierarchy is naturally associated with queries identified under the Finest Fully Randomized Causally Interpretable Structure Tree Graph (FFRCISTG) model of Robins (via the extended g-formula), and another is naturally associated with queries identified under the Non-Parametric Structural Equation Model with Independent Errors (NPSEM-IE) of Pearl, via a more general functional we call the edge g-formula.

Our results motivate the study of estimation theory for the edge g-formula, since we show it arises both in mediation analysis, and in settings where treatment assignment has unobserved causes, such as models associated with Pearl’s front-door criterion.


## 1 Introduction

The goal of the empirical sciences is discerning cause-effect relationships by experimentation and analysis. This is made difficult by the ubiquity of hidden variables, and the difficulty of collecting data free from confounding and selection bias. Two useful frameworks for addressing these difficulties have been potential outcomes, introduced by Neyman [8] and expanded by Rubin [21], and causal graphical models, first used in linear models by Wright [35] and later expanded into a general framework (see, for example, [30] and [11]). There exists a modern synthesis of these two frameworks, where causal models based on non-parametric structural equations are defined on potential outcome random variables, and assumptions defining these models can be represented by the absence of arrows in a graph. See [11], chapter 7, and [13] for a detailed treatment.

Potential outcome random variables represent outcomes under a hypothetical intervention operation, which corresponds to an idealized randomized controlled trial. Concepts such as the overall causal effect of a treatment can be represented as causal parameters on appropriate potential outcomes, and as statistical estimands if appropriate assumptions hold.

The synthesis of potential outcomes and graphs has been instrumental in much of the recent work on identification of various types of causal parameters such as total effects [14, 33, 25, 26, 27], and mediated effects [10, 1, 24].

Nevertheless, the existing literature suffers from three problems. First, a single graph may correspond to different causal models, which means a particular causal parameter may be identified under one causal model, but not under another, even though the models share the same graph. Second, different types of causal parameters seem to have different key issues underlying their identification, which makes it difficult to determine the specific assumptions that must hold for identification. For instance, certain types of unobserved confounding must be absent in order for overall effects to be identifiable, while even completely unconfounded mediated effects may be unidentified [1]. Finally, because of the complex nature of identification theory for causal parameters, existing conventional wisdom on what is identifiable is too conservative. For example, it is often assumed that a mediator and outcome must remain completely unconfounded in order to obtain identification of mediated causal effects. However, this is not true [24].

These issues make it difficult to determine if a particular causal parameter is identified, and under what model, what assumptions underlie this identification, and what the corresponding statistical parameter is. This complicates estimation theory, the development of parametric relaxations that permit identification, and sensitivity analysis procedures.

### 1.1 Outline of the Paper

The contents of the paper can be summarized by a picture in Fig. 1. In section 2, we introduce our notation, necessary graph theory, standard interventions (which we call node interventions in this manuscript) and potential outcomes, which are responses to node interventions. We also introduce the FFRCISTG model of Robins, which in this paper we call the “single world model (SWM),” and the NPSEM-IE of Pearl, which is a submodel of the FFRCISTG model, and which we call the “multiple worlds model (MWM).” The reasons for these names will become clear when these models are defined. The subset relationship of these two models is shown explicitly in Fig. 1. Finally, we discuss targets of interest in causal inference known as total effects, which are defined in terms of node interventions, and discuss identification theory for these targets under the SWM via the extended g-formula.

In section 3, we define additional types of interventions, that we term edge and path interventions, and responses to these types of interventions via recursive substitution. Responses to node, edge and path interventions form an inclusion hierarchy in the sense that responses to node interventions are a special case of responses to edge interventions, which are in turn a special case of responses to path interventions. This inclusion is denoted by the subset relations in Fig. 1. We also discuss how targets of inference in mediation analysis known as direct and indirect effects are defined in terms of edge interventions.

In section 4, we show how we can express a wide variety of targets of interest in causal inference, such as path-specific effects (PSEs) or effects of treatment on the multiply treated (ETMTs) as responses to path interventions. In addition, we show that path interventions are general enough to accommodate novel targets which combine features of PSEs and ETMTs, which we call effects of treatment on the indirectly treated (ETITs). Our results then imply novel identification results for these targets, and others not previously considered in the literature, but expressible as path interventions.

In section 5, we show that there is a natural correspondence between causal models and intervention types we discuss in the following sense. We show that responses to node interventions are identified under the SWM, and responses to edge interventions are identified under the MWM. Furthermore, we show that if a response to an edge intervention cannot be expressed as a node intervention, then it is not identified under the SWM, and if a response to a path intervention cannot be expressed as an edge intervention, then it is not identified under the MWM.

The identification of node interventions under the SWM is via the well known extended g-formula [20, 13], which we give as equation (2). The identification of edge interventions under the MWM is via a generalization of (2), which we call the edge g-formula, and give as equation (5).

We also give examples of targets of interest in causal inference that do not correspond to responses to path interventions, as well as an example of a submodel of the MWM where even path interventions not ordinarily identified under the MWM are identified.

In Section 6 we briefly discuss the relationship of our results to Single World Intervention Graphs (SWIGs) [13].

Section 7 shows that a certain class of functionals that identify causal effects in latent variable causal models [33, 25] corresponds to functionals derived from the edge g-formula. This implies, in particular, that functionals that arise for treatment effects with unobserved causes of treatments, such as the front-door functional, also arise in mediation analysis.

In section 8, we illustrate the connection of our work to existing estimation theory for causal parameters, and suggest avenues of future work, by giving a known example of an estimator for a parameter derived from a special case of the edge g-formula.

What the overall picture implies is that once we solve the identification problem for the responses to interventions in our hierarchy, as we do here, we immediately reduce the identification problem for a wide class of targets of interest to the much easier problem of translating those targets into responses to path interventions. Once that translation is complete, the question of what is identified under what model is immediately settled. In addition, our developments imply that estimation theory for functionals derived from the edge g-formula is relevant for a large class of inference targets identified under the MWM, including path-specific effects, effects of treatment on the multiply treated, and certain total causal effects with unobserved causes of treatments.

In the interests of space, the vast majority of arguments for our results appear in the appendices in the supplementary materials [29]. In addition, the supplementary materials contain our rationale for the use of path interventions, rather than simpler or more algebraic representations of causal inference targets.

## 2 Notation and Definitions

We introduce graph theory terms, potential outcomes, and statistical and causal graphical models.

### 2.1 Graphs and Random Variables

We will associate random variables with vertices in graphs. We will denote both a single vertex and a single corresponding random variable as an uppercase Roman letter, e.g. . Sets of vertices (and corresponding random variables) will be denoted by uppercase bold letters, e.g. .

For a random variable , let be the state space of . For example if is binary, then . We denote elements of a set (values of ) by lowercase Roman letters: . The state space of a set of random variables is simply the Cartesian product of the individual state spaces: .

Sets of values corresponding to sets of random variables will be denoted by lowercase bold letters, e.g. . Sometimes we will denote a restriction of a set of values by a set subscript. That is if is a set of values of , and , then is a restriction of to .

An edge in a graph is a vertex adjacency coupled with an orientation. A path in a directed graph is a (possibly empty) sequence of nodes of the form , where each node in the sequence occurs exactly once, and each share an edge. The first vertex in a path sequence is called the source, and the last vertex is called the sink. A path with two vertices is just an edge.

A subpath of a path is a subsequence of edges in a path that themselves form a path. A suffix subpath of is a subpath of the form , while a prefix subpath is a subpath of the form . A directed path from to has edges for every of the form . We will denote a directed path as , and also by Greek letters, e.g. , and sets of directed paths by bold Greek letters, e.g. . A source vertex of will be written , and the sink vertex will be written .

We say a directed cycle exists in a graph if it contains a path and an edge . A directed graph lacking directed cycles is called a directed acyclic graph, abbreviated as DAG.
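The graph-theoretic definitions above can be made concrete with a minimal sketch. The edge list and vertex names below are hypothetical and chosen only for illustration; the helper names `parents` and `is_acyclic` are likewise assumptions of this sketch, not notation from the paper.

```python
from collections import defaultdict

# Hypothetical edge list (source, sink); the vertex names are illustrative only.
EDGES = [("W", "A"), ("W", "M"), ("W", "Y"), ("A", "M"), ("A", "Y"), ("M", "Y")]


def parents(edges):
    """Map every vertex to the set of its parents."""
    pa = defaultdict(set)
    for source, sink in edges:
        pa[sink].add(source)
        pa[source]  # touching the key registers parentless vertices
    return dict(pa)


def is_acyclic(edges):
    """Return True if the directed graph has no directed cycle.

    Repeatedly strips vertices with no remaining parents (Kahn's algorithm);
    a cycle exists exactly when some vertices can never be stripped.
    """
    remaining = {v: set(p) for v, p in parents(edges).items()}
    while remaining:
        roots = {v for v, p in remaining.items() if not p}
        if not roots:
            return False
        for v in roots:
            del remaining[v]
        for p in remaining.values():
            p -= roots
    return True
```

Adding a hypothetical edge from `Y` back to `W` would create a directed cycle, so `is_acyclic` would then return `False`.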

### 2.2 Causal Models of a DAG

For a subset of random variables , and a value assignment to , we denote a forced assignment of to an element of as a node intervention. A node intervention which maps to will be denoted by . Pearl denoted node interventions by , and Robins by . We use alternative notation in this paper to avoid ambiguity, because we will consider other types of interventions. It is also possible to consider more complex types of interventions on nodes, known as dynamic treatment regimes, where assigned values to are not constants, but functions of variables assigned and observed in the past [14, 7, 6]. Although generalizations of our results to this setting are possible, we do not pursue them in the interests of space.

For a random variable , and for a set , we denote a (random) response to a node intervention as . These random variables are also called potential outcomes, because is often an outcome of interest, and the intervention is often hypothetical, rather than actually occurring. Given a set of random variables, we denote by or .

Let be the set of parents of in , that is the set . Following [13], given a DAG with vertices , we will assume the existence of for every , and for all , as well as a well-defined joint distribution over these random variables, and use these potential outcomes, and the associated joint, to define others using recursive substitution.

In particular, for any , and any , we define for every

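$$V(\mathbf{a}) \equiv V\left(\mathbf{a}_{\mathrm{pa}_{\mathcal{G}}(V)}, \{\mathrm{pa}_{\mathcal{G}}(V) \setminus \mathbf{A}\}(\mathbf{a})\right) \tag{1}$$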

In words, this states that the response of to is defined as the potential outcome where all parents of which are in are assigned an appropriate value from , and all other parents are assigned whatever value they would have attained under a node intervention (these are defined recursively, and the definition terminates because of the lack of directed cycles in ). For example, in the graph in Fig. 2 (a), .
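Recursive substitution can be sketched for a single unit as follows. This is an illustration under stated assumptions, not the paper's construction: the graph, the structural functions in `MECHANISMS`, and the fixed noise values in `NOISE` are all hypothetical, and each one-step-ahead potential outcome is represented as a deterministic function of parent values and a unit-level noise term.

```python
# Hypothetical DAG: W -> A -> M -> Y, plus W -> Y and A -> Y.
PARENTS = {"W": [], "A": ["W"], "M": ["A"], "Y": ["W", "A", "M"]}

# Hypothetical structural functions: (parent values, noise) -> value.
MECHANISMS = {
    "W": lambda pa, e: e,
    "A": lambda pa, e: pa["W"] ^ e,
    "M": lambda pa, e: pa["A"] & e,
    "Y": lambda pa, e: pa["A"] + 2 * pa["M"] + 4 * pa["W"] + e,
}

# One fixed unit, described by its (hypothetical) noise terms.
NOISE = {"W": 1, "A": 0, "M": 1, "Y": 0}


def node_response(v, intervention, parents, mechanisms, noise):
    """Response of vertex v to a node intervention, by recursive substitution.

    Parents that are intervened on receive their assigned value; all other
    parents receive their own response to the same intervention.
    """
    args = {}
    for p in parents[v]:
        if p in intervention:
            args[p] = intervention[p]
        else:
            args[p] = node_response(p, intervention, parents, mechanisms, noise)
    return mechanisms[v](args, noise[v])


natural_y = node_response("Y", {}, PARENTS, MECHANISMS, NOISE)
y_under_a0 = node_response("Y", {"A": 0}, PARENTS, MECHANISMS, NOISE)
```

The recursion terminates because each call moves strictly toward vertices without parents. Note that, in this sketch, an intervention on a vertex changes what its children receive but not the vertex's own response, mirroring the form of (1).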

It is possible to construct additional types of potential outcomes other than those that are responses to node interventions. We will discuss some such potential outcomes later. However, responses to node interventions are sufficient to define causal models. Just as a statistical model is a set of distributions over defined by some restriction, we view a causal model as a set of distributions over defined by some restriction. We will call elements of a causal model causal structures, and denote them as , by analogy with , but indexed by a graph. In this paper we will consider two causal models.

We adopt the definitions presented in [13]. We define the finest fully randomized causally interpretable structured tree graph (FFRCISTG) model associated with a DAG with vertices , as the set of all possible potential outcome responses subject to the restriction that the variables in the set

$$\left\{ V(\mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V)}) \mid V \in \mathbf{V} \right\}$$

are mutually independent for every . We define the non-parametric structural equation model with independent errors (NPSEM-IE) associated with a DAG with vertices , as the set of all possible potential outcome responses subject to the restriction that the sets of variables

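$$\left\{ \left\{ V(\mathbf{a}_V) \mid \mathbf{a}_V \in \mathfrak{X}_{\mathrm{pa}_{\mathcal{G}}(V)} \right\} \mid V \in \mathbf{V} \right\}$$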

are mutually independent. The NPSEM-IE associated with a particular graph is a submodel of the FFRCISTG model associated with the same graph, because it always places at least as many restrictions on potential outcome responses, and in most cases many more.

For example, the binary FFRCISTG model associated with the DAG in Fig. 2 (a) asserts that variables , , , are mutually independent for any , while the binary NPSEM-IE model associated with the same DAG asserts that sets are mutually independent. The FFRCISTG model always imposes restrictions on a set of variables under a single set of interventions (a “single world”), while the NPSEM-IE may also impose restrictions on variables across multiple conflicting sets of interventions simultaneously. To emphasize this, we will refer to the FFRCISTG model as a “single world model” (SWM), and to the NPSEM-IE as a “multiple worlds model” (MWM) in the remainder of this paper.

A crucial difference between the SWM and the MWM is that the assumptions of the former can be tested, at least in principle, by checking independences in a distribution of responses in an idealized randomized controlled trial. That is, if we wanted to check if is independent of , we could check independence in a joint distribution obtained from recording, for a set of units, the values of immediately before treatment is assigned, and the response values of under that assignment. However, checking if is independent of would entail somehow knowing how the response of a unit behaves under assigned treatment , and simultaneously how the response of the unit behaves under a conflicting treatment (and ). One may be able to argue for explicit construction of such joint responses in certain designs [5], or for certain types of units, for instance logic gates in a digital circuit. However, in general, assumptions defining the MWM are not experimentally testable.

### 2.3 Identification of Node Interventions

Responses to interventions of various types can be used to define targets of interest, discussed in more detail in Section 4. However, in order for these definitions to be useful, they must be linked to actually observed data. If such a link can be provided, that is, if a particular response can be expressed as a functional of the observed joint distribution for any element of a causal model, we say that the response is identified under that causal model from .

In causal models, this link is typically provided via the consistency assumption, which is sometimes informally stated as “in the subpopulation where , behaves as .” Under the definition of the SWM (and the MWM), consistency is implied by (1); see [13], p. 21. Consistency is thus “folded in” to the model definition, so we will describe identification in terms of a particular model, and not mention consistency itself. Note that (1) is an assumption defined using a particular graph. If we are mistaken about the true graph, for instance due to the presence of unaccounted-for hidden variables, then some parts of (1), and thus some parts of the consistency assumption, may not be justifiable under the true causal model.

Identification theory for node interventions in causal DAG models is well understood. Given a DAG with vertices , and two arbitrary subsets of (not necessarily disjoint), the distribution for any value assignment can be identified under the SWM as a functional of the observed distribution using the extended g-formula [20], given by

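$$p(\mathbf{Y}(\mathbf{a}) = \mathbf{v}_{\mathbf{Y}}) = \sum_{\mathbf{v}_{\mathbf{V} \setminus \mathbf{Y}}} \prod_{V \in \mathbf{V}} p\left(v_V \mid \mathbf{a}_{\mathrm{pa}_{\mathcal{G}}(V) \cap \mathbf{A}}, \mathbf{v}_{\mathrm{pa}_{\mathcal{G}}(V) \setminus \mathbf{A}}\right) \tag{2}$$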

where . A recent proof of this appears in [13]. Special cases of (2) where and are disjoint are known as the g-formula [14], the manipulated distribution [30], or the truncated factorization [11]. Because the MWM is a causal submodel of the SWM, (2) also holds under the MWM.
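As an illustration of how (2) can be evaluated for discrete variables, the following sketch enumerates all joint values and multiplies the conditional factors, substituting intervened values for parents in the intervened set. The three-vertex graph, the conditional probability tables, and all function names are hypothetical assumptions of this sketch.

```python
from itertools import product


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


# Hypothetical DAG W -> A -> Y with W -> Y, and hypothetical binary CPTs.
PARENTS = {"W": [], "A": ["W"], "Y": ["W", "A"]}
STATES = {"W": [0, 1], "A": [0, 1], "Y": [0, 1]}
CPT = {
    "W": lambda x, pa: bern(0.5, x),
    "A": lambda x, pa: bern(0.2 + 0.6 * pa["W"], x),
    "Y": lambda x, pa: bern(0.1 + 0.3 * pa["A"] + 0.4 * pa["W"], x),
}


def extended_g_formula(target, intervention, parents, cpt, states):
    """Distribution of the target responses under a node intervention.

    Every vertex keeps its factor; parents in the intervened set are replaced
    by their assigned values inside each conditioning set, as in (2).
    """
    variables = list(parents)
    result = {}
    for values in product(*(states[v] for v in variables)):
        v = dict(zip(variables, values))
        weight = 1.0
        for var in variables:
            pa_vals = {p: intervention.get(p, v[p]) for p in parents[var]}
            weight *= cpt[var](v[var], pa_vals)
        key = tuple(v[t] for t in target)
        result[key] = result.get(key, 0.0) + weight
    return result


p_y_a1 = extended_g_formula(("Y",), {"A": 1}, PARENTS, CPT, STATES)
p_y_a0 = extended_g_formula(("Y",), {"A": 0}, PARENTS, CPT, STATES)
ace = p_y_a1[(1,)] - p_y_a0[(1,)]
```

Brute-force enumeration is exponential in the number of vertices, so this is only suitable for small illustrative models. Passing a target that overlaps the intervened set, such as `("A", "Y")`, returns the joint distribution of the natural value of `A` and the response of `Y`.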

### 2.4 Total Effects as Responses to Node Interventions

Node interventions are used to represent causal effects of treatments as a contrast of potential outcome responses to different treatment assignments. By considering an intervention we remove the impact of confounding via assignment policy. For example, consider the simple causal graph shown in Fig. 2 (a), representing an observational study with a single application of one of two treatments . Variable is assigned to either or based on (observed) patient health status (), and survival is measured. Doctors follow a known policy in assigning where sicker patients are more likely to get . Note that may hold simply due to the assignment policy in the study which introduces confounding by health status, even if is a better drug.

One appropriate contrast that adjusts for the influence of confounding by health status on the effect of interest can be expressed via node interventions, and is known as the average causal effect (ACE): . This contrast can be computed from the distribution for all , which is equal, under (2), to

$$p(Y(m)) = \sum_{w,a,m'} p(Y \mid m,a,w)\, p(m' \mid a,w)\, p(a,w) = \sum_{w,a} p(Y \mid m,a,w)\, p(a,w).$$

This recovers the well-known back-door formula [11].
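The second equality in the display uses only the fact that a conditional distribution sums to one. Spelled out as a worked step, with the same symbols as the display:

```latex
\begin{align*}
\sum_{w,a,m'} p(Y \mid m,a,w)\, p(m' \mid a,w)\, p(a,w)
  &= \sum_{w,a} p(Y \mid m,a,w)\, p(a,w) \sum_{m'} p(m' \mid a,w) \\
  &= \sum_{w,a} p(Y \mid m,a,w)\, p(a,w),
  \qquad \text{since } \sum_{m'} p(m' \mid a,w) = 1 .
\end{align*}
```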

Consider now a more complex example corresponding to the following problem from HIV research. In a longitudinal study, HIV patients were put on an antiretroviral drug regimen, where the specific level of drug exposure over time was controlled by a known policy, which was based on covariates observed for each patient. However, the outcome of the study was disappointing. The question is whether this was due to the drug itself performing poorly, or to poor patient adherence. Consider a causal graph representing two time slices of this longitudinal study. To avoid cluttering the figure with too many edges, we represent the causal graph schematically by its transitive reduction with respect to blue edges, shown in Fig. 2 (b). That is, the true graph contains a blue arrow between any pair of nodes connected by a blue directed path in Fig. 2 (b) (and inherits all red edges as well).

Here is a vector of observed baseline confounders, are exposures over time, are drug toxicity levels at each exposure time, are adherence levels at each time, are outcomes, and is an unobserved confounder. Both red and blue arrows represent direct causation. In general, a reasonable causal graph will contain unobserved common causes of most vertices, but in this example we assume adherence , and treatments are only directly affected by the observed variables in the past, such as the toxicity level of the drug, and not by . These assumptions are represented graphically by the absence of red edges from to .

We first consider the total effect of the two exposures on outcome , formalized as the two-exposure version of ACE. We consider more complex effects involving mediation by adherence in subsequent sections. The ACE contrast is defined with respect to active treatment levels, which we denote , and baseline treatment levels, which we denote . In our case, the contrast is equal to . If we were able to randomize treatment assignment to , we could evaluate the ACE directly from experimental data. However, our data comes from an observational longitudinal study, and therefore we must properly adjust for observed confounders of the exposures. Robins [14] noted that in cases like these, assuming the underlying SWM represented by our graph is correct, we can get a bias-free estimand of the ACE from observational data using the g-computation algorithm, which in this case gives

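$$
\begin{aligned}
\mathrm{ACE} ={} & \sum_{y_1,c_1,w_1,c_0} E[Y_2 \mid a_2,y_1,c_1,w_1,a_1,c_0]\, p(y_1,c_1,w_1 \mid a_1,c_0)\, p(c_0) \\
& - \sum_{y_1,c_1,w_1,c_0} E[Y_2 \mid a'_2,y_1,c_1,w_1,a'_1,c_0]\, p(y_1,c_1,w_1 \mid a'_1,c_0)\, p(c_0)
\end{aligned}
$$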

This is, yet again, a special case of (2). This estimand can be estimated via either the parametric g-formula [15], inverse weighting methods [19], or doubly robust methods [18].

In the following section, we introduce intervention types that generalize node interventions, and consider other types of causal effects which may be represented as responses to such intervention types.

## 3 Edge and Path Interventions

We consider two additional types of interventions defined on graphical features, edge and path interventions, and define responses to these interventions using recursive substitution in a natural way. As we shall see, responses to path interventions include many targets of interest in causal inference, including effects of treatment on the treated, mediated effects, and even novel effects that combine features of both.

### 3.1 Edge Interventions

For a set of edges in a DAG , define . In other words, is a Cartesian product of the state spaces of source variables of all directed edges in .

The state space of a given vertex in may occur multiple times in if multiple edges in share the same source vertex. We denote members of by lowercase Fraktur letters: . We do so to emphasize that elements of may contain multiple conflicting value assignments to the same random variable, unlike elements of . For example, consider the graph in Fig. 2 (a), where . Then if , a valid element of associates with the variable associated with the parent vertex of and with the variable associated with the parent vertex of . Unlike elements of , it is not immediately clear what set of edges is referring to, so we will subscript the set of edges if necessary, like so: .

We call a forced assignment of variables corresponding to source vertices of edges from to an element of an edge intervention. An edge intervention which assigns to an element will be denoted by . As with elements of , we denote a restriction of by a set subscript. That is, if , and , then is a restriction of to variables corresponding to source vertices of .

We define responses of outcomes to edge interventions in the natural way using recursive substitution, the potential outcomes of the form , and a joint distribution over these potential outcomes. For every , a set of edges in a DAG , and an element , we define the response of to as

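$$V(\mathfrak{a}_{\alpha}) \equiv V\left(\mathfrak{a}_{\{(*V)_{\to} \in \alpha\}}, \{\mathrm{pa}^{\bar{\alpha}}_{\mathcal{G}}(V)\}(\mathfrak{a}_{\alpha})\right) \tag{3}$$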

where .

In words, this states that the response of to , where is defined as the potential outcome where all parents of along edges in are assigned an appropriate value from , and all other parents are assigned whatever value they would have attained under an edge intervention (these are defined recursively, and the definition terminates because of the lack of directed cycles in ).

As before, given a set of random variables, we denote by or .
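The recursive definition of responses to edge interventions can be sketched for a single unit in the same style as node interventions. Everything below is a hypothetical illustration: the graph, the structural functions, the noise values, and the representation of an edge intervention as a dictionary keyed by `(source, sink)` pairs are assumptions of this sketch.

```python
# Hypothetical DAG: W -> A -> M -> Y, plus W -> Y and A -> Y.
PARENTS = {"W": [], "A": ["W"], "M": ["A"], "Y": ["W", "A", "M"]}

# Hypothetical structural functions: (parent values, noise) -> value.
MECHANISMS = {
    "W": lambda pa, e: e,
    "A": lambda pa, e: pa["W"] ^ e,
    "M": lambda pa, e: pa["A"] & e,
    "Y": lambda pa, e: pa["A"] + 2 * pa["M"] + 4 * pa["W"] + e,
}
NOISE = {"W": 1, "A": 0, "M": 1, "Y": 0}  # one fixed hypothetical unit


def edge_response(v, edge_values, parents, mechanisms, noise):
    """Response of vertex v to an edge intervention, by recursive substitution.

    A parent reached along an intervened edge (p, v) contributes the value
    assigned to that edge; every other parent contributes its own response
    to the same edge intervention.
    """
    args = {}
    for p in parents[v]:
        if (p, v) in edge_values:
            args[p] = edge_values[(p, v)]
        else:
            args[p] = edge_response(p, edge_values, parents, mechanisms, noise)
    return mechanisms[v](args, noise[v])


# The same source vertex A receives conflicting values along its two edges.
y_mixed = edge_response("Y", {("A", "Y"): 0, ("A", "M"): 1},
                        PARENTS, MECHANISMS, NOISE)
# Assigning one value along every outgoing edge of A mimics a node intervention.
y_all_zero = edge_response("Y", {("A", "Y"): 0, ("A", "M"): 0},
                           PARENTS, MECHANISMS, NOISE)
```

Keying the assignment by edge rather than by vertex is what allows one source vertex to carry different values along different outgoing edges.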

### 3.2 Direct and Indirect Effects as Responses to Edge Interventions

Just as responses to node interventions can be used to represent total causal effects, so can responses to edge interventions be used to represent direct and indirect effects. Consider again Fig. 2 (a), but now assume is the treatment (one of two drugs ), is the outcome (survival), and is a dangerous side effect that mediates some of the effect of on .

We may be interested in how much of the total effect, as formalized via the ACE contrast , can be attributed to the direct effect of the drugs on , and how much to the mediated effect via the side effect . To formalize this, we want to consider how varies if we can set treatments separately for the purposes of the direct causal pathway represented by and the pathway mediated by , represented by . This is precisely what edge interventions allow us to do. Consider that sets to and to . Then (3) implies . We can use this type of response to define the direct effect as the contrast , and the indirect effect as the contrast . Note that the ACE is a sum of the direct and indirect effect contrasts above.
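The additive decomposition can be checked by a telescoping argument. The display below is a sketch under an assumed but common convention: the treatment is written $A$ with active level $a$ and baseline level $a'$, the mediator is written $M$, and $Y(a, M(a'))$ denotes the response when the direct edge receives $a$ and the mediated edge receives $a'$.

```latex
\begin{align*}
\underbrace{E[Y(a)] - E[Y(a')]}_{\text{ACE}}
 &= \underbrace{E[Y(a, M(a'))] - E[Y(a')]}_{\text{direct effect}}
  + \underbrace{E[Y(a)] - E[Y(a, M(a'))]}_{\text{indirect effect}}
\end{align*}
```

The term $E[Y(a, M(a'))]$ appears once with each sign and cancels, which is why the two contrasts sum to the ACE regardless of whether the nested response itself is identified.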

The idea of using nested responses like to represent direct and indirect effects for mediation analysis appears in [16], and is discussed in the context of graphical causal models in [10]. Our contribution is to aid interpretability of such nested responses by viewing them as responses to interventions associated with edges, graphical features intuitively associated with effects we are trying to formalize.

Just as it is good practice to only discuss node interventions in settings where it is possible, at least in principle, to assign treatment by fiat, so it is good practice to only discuss edge interventions in settings where it is possible, at least in principle, to conceive of assigning only those components of the overall treatment that influence a particular direct consequence. For instance, if smoking affects cardiovascular disease only by means of nicotine content, then we might simulate the absence of smoking, but only for the purposes of cardiovascular disease, by assigning the “treatment” of nicotine-free cigarettes. In this paper, we leave the issues of applicability of edge interventions and mediation analysis in particular settings aside [17], and consider, in subsequent sections, questions of identification and the form of resulting functionals.

### 3.3 Path Interventions

We are going to define responses to path interventions, which associate a set of directed paths with values of sources of every path in the set. A response to a path intervention will behave as if the source of a path were set to a particular value, but only for the purposes of a particular outgoing directed path. This behavior generalizes the behavior of edge interventions, where vertices may behave differently with respect to different outgoing edges. Path interventions serve as a very general, graphical representation of counterfactual quantities associated with causal pathways that generalizes both node and edge interventions. The supplementary materials [29] contain our rationale for the use of path interventions versus simpler or more algebraic approaches to representing counterfactuals of interest.

To make sure we end up with well-defined responses, we insist on a property for sets of directed paths called properness. A set of directed paths in a DAG is called proper if no path in is a prefix subpath of another path in . A set consisting of a single path is always proper, as is a set of length 1 paths (e.g. a set of edges). In the remainder of the paper, when we say “a set of paths ,” we mean a proper set of directed paths.
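Properness is straightforward to check mechanically. In the sketch below, a directed path is represented as a tuple of vertex names from source to sink; this representation and the function names are assumptions made for illustration.

```python
def is_prefix_subpath(p, q):
    """True if path p is a prefix subpath of a different, longer path q."""
    return len(p) < len(q) and q[:len(p)] == p


def is_proper(paths):
    """True if no path in the set is a prefix subpath of another path in it."""
    return not any(is_prefix_subpath(p, q) for p in paths for q in paths)


# Hypothetical path sets over vertices A, M, Y.
proper_set = {("A", "Y"), ("A", "M", "Y")}
improper_set = {("A", "M"), ("A", "M", "Y")}
```

Here `proper_set` is proper because neither path begins with the other, while `improper_set` is not, since the edge from `A` to `M` is a prefix of the longer path.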

For a set of paths in a DAG , define . In other words, is a Cartesian product of the state spaces of source variables of all directed paths in . Since sets of paths clearly generalize sets of edges, the same issue occurs where a single vertex in may occur multiple times in . As before, to emphasize this, we will denote elements of by lowercase Fraktur letters: , possibly indexed by a path set subscript: .

We denote a forced assignment of variables corresponding to source vertices of paths from to an element of as a path intervention. A path intervention which assigns to an element will be denoted by . As with elements of , we denote a restriction of by a set subscript. That is, if , and , then is a restriction of to variables corresponding to source vertices of .

As was the case with node and edge interventions, our definition of path interventions will be inductive. To get the induction to work, we need to consider how treatments affect the response via pathways that end in a particular edge. We use the following definition to formalize this. Given a set of paths in a DAG , and an edge , define a funnel operator which maps from to the set of paths obtained from by replacing any path of the form by , by removing all paths containing but no suffix , and keeping all other paths intact.

###### Lemma 3.1.

If is proper, then for any edge , so is .

Given a path intervention that assigns to , and a funnel operator , we consider funneled path interventions on . For every such that , the funneled path intervention assigns to , that is it keeps the same value assignment as the original path intervention. For the path the funneled path intervention assigns to , that is assigns the value given by the original intervention to . We denote such an assignment by .

Our insistence on being proper, together with Lemma 3.1, means that there is never any ambiguity in defining the funneled path intervention. That is, it is never the case that two distinct paths in are of the form and . If such a pair of paths were allowed, the difficulty would then be that these paths can both reasonably be claimed to represent an effect of setting along the path , while potentially disagreeing on what that setting is.
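The funnel operator and the funneled path intervention can be sketched together. The sketch reads the definition as follows, which is an assumption: a path whose final edge is the given edge loses that final edge and keeps its assigned value, a path that contains the sink of the given edge without ending in that edge is removed, and all remaining paths are kept unchanged. Paths are hypothetical tuples of vertex names, and a path intervention is a dictionary from paths to assigned source values.

```python
def funnel(assignment, edge):
    """Apply the funnel operator for `edge` to a path intervention.

    `assignment` maps each path (a tuple of vertices) to the value assigned
    to its source; the returned dictionary is the funneled path intervention.
    """
    w, y = edge
    funneled = {}
    for path, value in assignment.items():
        if len(path) > 2 and path[-2:] == (w, y):
            # Path ends in the edge w -> y: drop that edge, keep the value.
            funneled[path[:-1]] = value
        elif y in path:
            # Path involves y but not via the final edge w -> y: remove it.
            # (This branch also drops the edge (w, y) itself if it is in the
            # set, a case in which the operator is not applied in this sketch.)
            continue
        else:
            funneled[path] = value
    return funneled


# Hypothetical path intervention with conflicting values for source A.
assignment = {("A", "Y"): 1, ("A", "M", "Y"): 0}
via_m = funnel(assignment, ("M", "Y"))
via_w = funnel(assignment, ("W", "Y"))
```

Because the input set is assumed proper, a truncated path can never collide with a path already present in the set, so no dictionary key is silently overwritten.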

We are now ready to define responses to path interventions. For every , a proper set of directed paths in a DAG , and an element , we define the response of to as

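$$
V(\mathfrak{a}_\alpha) \equiv V\Big(\mathfrak{a}_{(*V)_\to \in \alpha},\ \big\{ W\big(\mathfrak{a}_{\lhd_{(WV)_\to}(\alpha)}\big) \,\big|\, W \in \mathrm{pa}^{\overline{\alpha}}_{\mathcal{G}}(V) \big\}\Big) \tag{4}
$$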

where .

In words, this states that the response of to , where is defined as the potential outcome where all parents of along edges which are (length 1) paths in are assigned an appropriate value from , and all other parents are assigned whatever value they would have attained under the funneled path intervention associated with a funnel operator for the edge between that parent and . Note that the definition is inductive for such parents, with the result of applying a funnel operator serving as the new set of paths. Lemma 3.1 ensures that properness propagates to this set, and thus the overall response is well-defined.

For example, if assigns to in Fig. 2 (a), then is defined by (4) to equal . We will use a notational shorthand for responses to path interventions, where rather than listing nested responses in parentheses after the response, we list the paths with the source node replaced by the intervened on value. For example, we write above as . We use the same shorthand for responses to edge interventions.
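The recursion in (4) can be made concrete with a small Python sketch. Everything specific here is an assumption made for illustration: the three-vertex graph `A -> M -> Y` with `A -> Y`, the linear structural equations, the fixed noise values, and the names `funnel` and `response` are hypothetical, and the funnel step follows one reading of the definition (paths ending in the funnel edge are truncated, other paths through the target vertex are removed).

```python
def funnel(alpha, w, v):
    """Funnel a path intervention (dict: path tuple -> source value) along w -> v."""
    out = {}
    for path, value in alpha.items():
        if len(path) > 2 and path[-2:] == (w, v):
            out[path[:-1]] = value     # truncated path keeps its assigned value
        elif v not in path:
            out[path] = value          # paths not involving v are kept
    return out


def response(v, alpha, parents, equations, noise):
    """Response of vertex v to the path intervention alpha, following (4)."""
    args = {}
    for w in parents[v]:
        if (w, v) in alpha:
            # The edge w -> v is itself a path in alpha: use the assigned value.
            args[w] = alpha[(w, v)]
        else:
            # Otherwise recurse on the parent with the funneled intervention.
            args[w] = response(w, funnel(alpha, w, v), parents, equations, noise)
    return equations[v](args, noise[v])


# Hypothetical toy model (not Fig. 2 (a)): A -> M -> Y and A -> Y.
parents = {"A": [], "M": ["A"], "Y": ["A", "M"]}
equations = {
    "A": lambda pa, u: u,
    "M": lambda pa, u: pa["A"] + u,
    "Y": lambda pa, u: 2 * pa["A"] + 3 * pa["M"] + u,
}
noise = {"A": 0, "M": 1, "Y": 0}

# Direct path set to 1, mediated path set to 0.
alpha = {("A", "Y"): 1, ("A", "M", "Y"): 0}
y_nested = response("Y", alpha, parents, equations, noise)

# Both paths set to 1 recovers the ordinary node intervention A = 1.
y_total = response("Y", {("A", "Y"): 1, ("A", "M", "Y"): 1}, parents, equations, noise)
print(y_nested, y_total)
```

With the direct path set to 1 and the mediated path set to 0, the recursion evaluates the outcome at `A = 1` and at the mediator value obtained under `A = 0`, which is the nested counterfactual that the shorthand notation abbreviates.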

As before, given a set of random variables, we denote by or .

### 3.4 Responses to Path Interventions to Natural Values

So far we have defined path interventions as a mapping from a proper set of directed paths to values in . However, we might be interested in considering responses to interventions that assign a variable not to a specific constant value, but to a value the variable would have attained under a no-intervention regime. For instance, this might happen if the baseline exposure is one received by the general population, not a specific exposure level assigned by the experimenter, or if the effect of multiple treatments on the treated is of interest. In the context of node interventions, this situation was discussed in [4]. In order for responses to path interventions to include this case, we must extend the definition of path interventions to include intervening to natural values, that is, values attained by variables under no interventions.

Allowing arbitrary variables to be set to natural values may lead to identification difficulties even in very simple cases. Consider the following response to a node intervention in the MWM given by Fig. 2 (a), . In words, this is the joint response of and to an intervention where is set to value , and is set to the natural value it attains under no interventions. The definition of responses to node interventions via recursive substitution shows that . However, the distribution is not identified under the MWM for the graph in Fig. 2 (a), see Lemma 5.8, and thus neither is the joint response in question.

To avoid this difficulty, we consider only a special subset of path interventions containing settings on natural values. This special subset can safely be rephrased in such a way that only interventions on constants remain explicit. To define this special subset, we need a few preliminary definitions.

For a node , and a directed path (or an edge) with source , define the extended state space as follows , and . We define the extended state space for sets of nodes, edges, and paths disjunctively as before. An intervention on an extended state space is allowed on either any constant value, or on the “natural value.”

Given a set of paths and a response set , we call a directed path relevant for given if , where , and no path in is a subpath of except possibly a prefix of . We denote the set of all relevant paths for given in by .

Paths relevant for given are those paths consisting of sequences of intermediate responses that arise in the inductive definition (4). For example, assume we are interested in the singleton response set and a singleton path set in Fig. 2 (a). Then defining for a particular via (4) entails defining intermediate responses and . The sequence of vertices are all linked by directed edges by (4), and is relevant for given . Similarly, and are relevant for given .

We now give two useful results about relevant paths.

###### Lemma 3.2.

If , then for any suffix subpath of .

###### Lemma 3.3.

If , then for any , .

A set of interventions may not all have an effect on a response, due to constraints of the model. For instance, since but for any in Fig. 2 (a), has an effect on , but does not, given that we also intervene on and . We extend this notion to path interventions, and call those paths with sources that actually have an effect on the response, given interventions on other paths, live. More precisely, given a proper set of paths and a response set , we call a path live for given if there is an element of containing as a prefix.

Consider the maximal subset of consisting of paths in live for given , or . We say a set of directed paths is live for if . When discussing path interventions, we can always restrict our attention to sets of paths live for without loss of generality, due to the following result.

###### Lemma 3.4.

For any and proper for , , , and in addition, for any , .

We now show that we can either ignore interventions to natural values in a response to a path intervention, or the response is not identified under the MWM. The set of paths for which the former is true for the response will be called natural for . Due to this result, we do not need to consider interventions to natural values explicitly.

###### Definition 1.

Let be live for . Let be a path intervention in where a subset is assigned constant values, and is assigned natural values. Then if no element of with a prefix subpath in contains a subpath in , we say is natural for .

###### Lemma 3.5.

Let be a path intervention natural for , and let be the set of all paths assigned constant values by . Then .

###### Lemma 3.6.

If is not natural for in , then is not identified under the MWM for .

Lemma 3.5 does not guarantee that a response to a natural path intervention is identifiable, merely that it can be expressed as a response to an intervention only setting to constant values.

## 4 Causal Inference Targets as Responses to Path Interventions

In this section we consider how a number of targets of interest in causal inference, including novel targets not previously considered in the literature, may be expressed as responses to path interventions.

We use as our running example the two time point fragment of a longitudinal study in HIV research, described in Section 2.4. We consider path-specific effects that arise in mediation analysis, and effects of treatment on the multiply treated, which are of interest in tort cases (since these are effects of the exposure on those actually exposed), and in epidemiology if natural exposure levels carry information about the causal effect of the exposure. It is not straightforward to see whether these types of effects are identifiable, and under what model, nor is it obvious whether there is a single unifying principle which governs identification for these effects.

By translating the effect types above into responses to path interventions, we show that such responses form a very general class of causal inference targets. Thus, the advantage of path interventions is that we can use them to give a single characterization for a wide variety of targets of interest at once. The close relationship between effects of treatment on the treated and mediated effects, hinted at by their common generalization as responses to path interventions, is currently not widely known.

We will define a special set of directed paths important for our translation scheme. Given a treatment set and an outcome set (that possibly intersect) in a DAG , define the set to be the set of all directed paths with a source in , a sink in and which do not intersect except at the source and sink. Since and are allowed to intersect, the names “treatment” and “outcome” are slightly misleading. We allow the intersection to admit cases such as effect of treatment on the treated (ETT) where some treatments are also treated as responses for the purposes of certain paths.

###### Lemma 4.1.

is always proper.

### 4.1 Effects of Treatment on the Treated

We consider an effect on the mean difference scale where we condition on the naturally observed treatment levels. This is known as the effect of treatment on the treated (ETT), and in our two time point HIV example, it is defined as follows

$$
\mathrm{ETT} \equiv E[Y(a_1, a_2) \mid a_1, a_2] - E[Y(a'_1, a'_2) \mid a_1, a_2].
$$

This contrast is often of interest to epidemiologists. It also arises in cases where interventions are functions of the natural value of the exposure. For example, we may be interested in the outcome for people who were encouraged to exercise for 30 more minutes than they normally would, which is a random variable of the form . These types of interventions are discussed in [36]; in particular, sufficient conditions for identification under the SWM, in terms of the extended g-formula (2), are given there and in [13].

Assume is a binary variable (only two treatment levels). If we consider, instead, the ETT with respect to only the exposure , we obtain the following derivation for the second term in the contrast

$$
p(Y_2(a'_1) \mid a_1) = \frac{p(Y_2(a'_1), a_1)}{p(a_1)} = \frac{p(Y_2(a'_1)) - p(Y_2(a'_1), a'_1)}{p(a_1)},
$$

where the first identity is by definition, and the second by the binary treatment assumption. Since consistency implies for any value , the ETT for a single binary exposure can be identified if is identified.
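The binary-treatment identity can be checked numerically on a small example. The sketch below is illustrative only: the joint distribution over the treatment and the two potential outcomes is invented for the purpose (and would not be observable in practice), and the names `joint` and `prob` are hypothetical. Exact fractions are used so that the two sides can be compared without rounding.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (A, Y(0), Y(1)), all binary.
joint = {
    (0, 0, 0): F(1, 10), (0, 0, 1): F(2, 10), (0, 1, 0): F(1, 10), (0, 1, 1): F(1, 10),
    (1, 0, 0): F(1, 10), (1, 0, 1): F(1, 10), (1, 1, 0): F(1, 10), (1, 1, 1): F(2, 10),
}


def prob(event):
    """Probability of the event, given as a predicate on (treatment, y0, y1)."""
    return sum(p for (t, y0, y1), p in joint.items() if event(t, y0, y1))


a, a_prime, y = 1, 0, 1          # natural treatment a, assigned treatment a'
p_a = prob(lambda t, y0, y1: t == a)

# Left-hand side: p(Y(a') = y | A = a), computed directly from the joint.
lhs = prob(lambda t, y0, y1: y0 == y and t == a) / p_a

# Right-hand side: [p(Y(a') = y) - p(Y = y, A = a')] / p(A = a), where the
# second term uses consistency: Y(a') equals the observed Y when A = a'.
marginal = prob(lambda t, y0, y1: y0 == y)
observed = prob(lambda t, y0, y1: (y1 if t == 1 else y0) == y and t == a_prime)
rhs = (marginal - observed) / p_a

print(lhs, rhs)
```

The only term on the right-hand side that is not a function of the observed data is the marginal of the potential outcome, which is why identification of that marginal suffices in the binary case.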

However, if the exposure is not binary, or if there are multiple exposures, as in our example, we cannot use the same algebraic trick to obtain identification, and we must proceed by exploiting additional assumptions in our causal model.

In our case, the first conditional mean in the contrast can be readily identified via consistency: . However, the second conditional mean presents a problem, because it contains a conflict between the naturally observed exposures, and the assigned exposures. Here we show how to represent the underlying joint distribution over potential outcomes, , in terms of path interventions, and then attack the identification problem for all responses to path interventions, which would then include the problematic second term of the ETT.

We consider all directed paths from to , which we assign a value , all directed paths from to not through , which we assign a value , and all directed paths from to , which we assign the natural value of . Note that this set of paths is simply for that is the transitive closure with respect to blue edges of the graph in Fig. 2 (b), and thus is proper by Lemma 4.1. We then consider the response of to the path intervention so defined, or . By our definition, all paths set to a value ancestral for are set to natural values. Thus, is defined in terms of natural values of its direct causal parents, or as and .

Finally, we consider all paths ancestral for . Since and are parents of in , the single edge paths and are in our set, thus we substitute and into the potential outcome answer. Furthermore, for other parents of , namely and , we consider an appropriate set derived from . For example, for the node , we replace the path by a path (while keeping the assignment ). We proceed in this way recursively until we obtain the response for , which is

$$
Y_2(a_1, a_2, U, C_0, W_1(a_1, \ldots), C_1(a_1, \ldots), Y_1(a_1, \ldots), W_2(a_1, a_2, \ldots), C_2(a_1, a_2, \ldots)),
$$

where is a shorthand that means “include all earlier potential outcomes.” For example, means . By definition of node intervention responses, this counterfactual is equal to , and our overall joint distribution over the responses is .

For arbitrary sets of treatments and outcomes , and active treatment values , we may still represent ETT as a single mean difference, for example , for some function .

Note that though ETT resembles the total effect, it is in fact a more complex kind of counterfactual. This is because we are simultaneously interested in “outcome responses” , and “treatment responses” . Defining these treatment responses may introduce conflicts among intermediate counterfactual responses, not well represented by node interventions, which is why we represent ETT as a response to a path intervention.

The ETT path intervention simply assigns all paths in to the appropriate value. That is, paths from to are assigned the appropriate natural value, and paths from to are assigned the appropriate value in . Given this definition, either the ETT is not identified, or the joint distribution from which ETT is obtained corresponds to the joint response of to the ETT path intervention.

###### Lemma 4.2.

If there exists such that , is not identified under the MWM for . If there does not exist such an , .

If is expressible as a response to a path intervention, it may still not be identifiable under the MWM.

Our subsequent results on identification of path interventions under the MWM complement identification results in [36, 13]. In particular, our results imply the distribution is identified under the MWM for Fig. 2 (a), but not under the SWM for Fig. 2 (a).

### 4.2 Path-Specific Effects

Next, we consider the mediated effect of on through , in other words, the effect of exposures on outcome mediated by adherence. Originally these kinds of effects were considered in [3] in the context of linear models, and were generalized to a form not restricted by particular parametric models in [16]. We discussed a simple version of mediated effects in the graph in Fig. 2 (a), known as natural direct and indirect effects [16, 10], in Section 3.2, where we represented them as responses to edge interventions.

In our case, we are interested in a more complicated effect, but we can represent it with a similar idea, using paths rather than edges: paths we are interested in are assigned active treatment values , while paths we are not interested in are assigned baseline treatment values . The paths we are interested in are all directed paths whose first edge is one of , which end in , and which do not proceed through if started at . The paths we are not interested in are all other paths which start with or (and do not proceed through ) and end in . Call this assignment . Note that the assignment is on the set of paths that is precisely equal to for that is the transitive closure with respect to blue edges of the graph in Fig. 2 (b), and thus is proper.

We apply our definition to obtain a response of to this intervention. We must substitute a value for every parent of . The values for will be the baseline , while the values for will just be the natural values of those variables. Complications arise for other parents, due to the recursive nature of the definition. We proceed recursively:

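$$
\begin{aligned}
Y_2(\mathfrak{a}_1) &= Y_2(a'_1, a'_2, \{C_2, W_2, Y_1, C_1, W_1, C_0\}(\mathfrak{a}_1), U) \\
C_2(\mathfrak{a}_1) &= C_2(a_1, a_2, \{W_2, Y_1, C_1, W_1, C_0\}(\mathfrak{a}_1), U) \\
W_2(\mathfrak{a}_1) &= W_2(a'_1, a'_2, \{Y_1, C_1, W_1, C_0\}(\mathfrak{a}_1), U) \\
Y_1(\mathfrak{a}_1) &= Y_1(a'_1, \{C_1, W_1, C_0\}(\mathfrak{a}_1), U) \\
C_1(\mathfrak{a}_1) &= C_1(a_1, \{W_1, C_0\}(\mathfrak{a}_1)) \\
W_1(\mathfrak{a}_1) &= W_1(a'_1, C_0(\mathfrak{a}_1), U) \\
C_0(\mathfrak{a}_1) &= C_0(U) = C_0
\end{aligned}
$$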

In a manner similar to direct and indirect effects, we can use this response along with the total effect responses to define “the effect along paths we want” as , and “the effect along paths we do not want” as . As before, the ACE additively decomposes into these two effect measures. This definition (without the use of path interventions) appears in [24].
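As a hedged illustration of why the decomposition is additive, write $\mathfrak{a}$ for the path intervention above (active values on the paths of interest, baseline values on the remaining paths). The symbols below, and the attribution of each difference to one group of paths, are assumptions of this sketch rather than a restatement of the original displays. Under that reading, the decomposition is a telescoping identity:

```latex
\underbrace{E[Y_2(a_1, a_2)] - E[Y_2(a'_1, a'_2)]}_{\text{ACE}}
  = \underbrace{E[Y_2(\mathfrak{a})] - E[Y_2(a'_1, a'_2)]}_{\text{paths of interest}}
  + \underbrace{E[Y_2(a_1, a_2)] - E[Y_2(\mathfrak{a})]}_{\text{remaining paths}}
```

The intermediate term $E[Y_2(\mathfrak{a})]$ cancels, so the identity holds regardless of whether the individual terms are identified.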

We may also consider a response of where the paths we are not interested in are assigned the natural values, as discussed in Section 3.4, rather than fixed baseline values. Such an effect is defined similarly.

Consider a set of active treatment values of , a set of fixed baseline treatment values , and a subset of (which contains “paths of interest”). Define the fixed baseline PSE path intervention as a path intervention that assigns appropriate active values in to sources in and appropriate baseline values in to sources of all paths in .

Similarly, we call an intervention that assigns active values in to sources of paths in and appropriate natural values to sources of all paths in the average baseline PSE path intervention.

Path specific effects along all paths in (with a fixed baseline) can then be defined on the mean difference scale as , and along all paths not in as . Average baseline path specific effects on the difference scale are defined similarly.

### 4.3 Effects of Treatment on the Indirectly Treated

In this section we show that the language of path interventions is general enough to incorporate novel targets not currently considered in the literature. Our results immediately settle identification questions for any such target.

We consider a seemingly innocuous ETT with two treatments that in fact can only be represented by a path intervention, not an edge intervention, and variations of this target that are identified under the SWM and the MWM. Assume Fig. 2 (a) represents a simple two time point partially randomized observational study, where and are treatments at the first and second time points, respectively, is an intermediate health measure, and is the outcome. We make very strong assumptions about this study. In particular, is randomized, while is only assigned based on . Finally, no unobserved confounding exists anywhere, including between and . We are interested in the effect of treatments on the treated in this study. To obtain this contrast, we need to identify which is identified if and only if is. It is not difficult to show that

$$
\begin{aligned}
p(Y(m, w), W, M) &= p(\{Y, M, W\}((wAY)_\to, (mY)_\to)) \\
&= p(Y(m, A(w)), M(A(W), W), W).
\end{aligned}
$$

As we will show in the next section, there is no way to express this response as a response to an edge intervention, and it is not identified under the MWM. This is the case despite the fact that there is no unobserved confounding in this study. The difficulty is that the response is defined in terms of and jointly, and the distribution is not identified under the MWM without more assumptions.

To obtain a target that is identified under the SWM in this case we may consider the response on the treated to the natural value , and the value of occurring under the intervention setting to . This results in which is then identified under the SWM. To obtain identification we gave up on conditioning on the natural value of the second treatment . This may not be “in the spirit” of the ETT target.

One compromise is to assume a stronger model, the MWM, and allow the response to be “as natural as possible” while still retaining identification. This would correspond to defining a contrast in terms of , which in turn is equivalent to . A conditional distribution represents the response among those individuals whose treatment value for is (untreated), and whose treatment value for is whatever value would have attained had assumed the active value with respect to the path , and the untreated value with respect to the path .

We can define a contrast based on this quantity, using a summary function , equal to

$$
E[f(Y(m, w), M(A(w), w')) \mid W = w'] - E[f(Y(m', w'), M(A(w'), w')) \mid W = w'],
$$

which we call “the effect of treatment on the indirectly treated (ETIT).” The name is due to the fact that we consider people whose baseline treatment is untreated, and whose followup treatment is set to a value that is a kind of response to the indirect effect of the first treatment. Such a quantity would be difficult to conceive of without a direct representation of effects along pathways, something path interventions provide. Our results also directly imply that this quantity is identified under the MWM, but not under the SWM.

## 5 Identification of Edge and Path Interventions

Having established a correspondence between responses to path interventions and a variety of targets of interest in causal inference, we now consider what assumptions are necessary to express path interventions as edge interventions, edge interventions as node interventions, and edge and node interventions as functions of the observed data.

As we showed in Section 3.4, we can restrict our attention to path interventions that only assign paths to constant values, since paths that are assigned natural values either can be dropped from the intervention without affecting the response, or the overall response is not identified.

### 5.1 Node and Edge Interventions as Path Interventions

If node interventions are a special case of edge interventions, which are in turn a special case of path interventions, we ought to be able to give a path intervention the response of which is equal to the response to an arbitrary node or edge intervention. For any such response there may be multiple path interventions the responses to which are identical. We give one such path intervention here.

###### Lemma 5.1.

Let be disjoint vertex sets in a DAG , and a value assignment to