Bayesian Inference for Sequential Treatments under Latent Sequential Ignorability

Alessandra Mattei (1), Federico Ricciardi (2) and Fabrizia Mealli (1)
(1) Department of Statistics, Computer Science, Applications, University of Florence, Italy
(2) Department of Statistical Science, University College London, UK
Abstract

We focus on causal inference for longitudinal treatments, where units are assigned to treatments at multiple time points, aiming to assess the effect of different treatment sequences on an outcome observed at a final point. A common assumption in such studies is Sequential Ignorability (SI): treatment assignment at each time point is assumed independent of unobserved past and future potential outcomes given past observed outcomes and covariates. SI is questionable when treatment participation depends on individual choices, and treatment assignment may depend on unobservable quantities associated with future outcomes. We rely on Principal Stratification to formulate a relaxed version of SI: Latent Sequential Ignorability (LSI) assumes that treatment assignment is conditionally independent of future potential outcomes given past treatments, covariates and principal stratum membership, a latent variable defined by the joint value of observed and missing intermediate outcomes. We evaluate SI and LSI, using theoretical arguments and simulation studies to investigate the performance of the two assumptions when one holds and inference is conducted under both. Simulations show that when SI does not hold, inference performed under SI leads to misleading conclusions. Conversely, LSI generally leads to correct posterior distributions, irrespective of which assumption holds.


Keywords: Longitudinal treatments, Principal stratification, Sequential ignorability, Rubin Causal Model.

1 Introduction

Observational studies in many fields, including economics, social science and epidemiology, often aim to evaluate the causal effects of time-varying treatments, which are assigned to units sequentially over time (e.g., Robins, 1986, 1989, 1997; Robins et al., 2000; Gill & Robins, 2001; Lechner, 2009; Achy-Brou et al., 2010; Zajonc, 2012; Imai & Ratkovic, 2014).

In the presence of time-varying treatments, causal inference is challenging because intermediate variables are simultaneously post-treatment outcomes and pretreatment confounders. Therefore the analysis of time-varying treatments requires methodological tools that can properly account for a growing number of intermediate variables, some of which are only partially observed, and sequential selection. In this paper we propose to face these challenges when assessing the effect of different sequences of a time-varying treatment on some final outcome observed at the end of the study.

We will frame our discussion in the context of the potential outcomes approach to causal inference, also referred to as the Rubin Causal Model (RCM, e.g., Rubin, 1974, 1977, 1978; Holland, 1986). A critical part of the RCM is the formulation of a treatment assignment mechanism, and this task is even more crucial in longitudinal studies. An assumption usually invoked in evaluation studies with longitudinal treatments is Sequential Ignorability (SI, Robins, 1986), which amounts to assuming that the observed treatment at a given time point is independent of future potential outcomes given past observed outcomes, past treatments and covariates up to that point. Sequential ignorability may be a reasonable assumption in various settings. For instance, in medicine, physicians may propose therapies randomly conditional on patients' observed characteristics, prognostic factors and prior treatments up to that point. In labor economics, caseworkers may randomly offer training programs to participants conditional on previous training program participation up to that point and observed performance.

On the other hand, especially in observational studies or in settings where participation in the treatment depends on individual choices, treatment assignment may depend on unobservable quantities associated with future potential outcomes as well as on unobserved past potential outcomes, even conditional on the observed history, so that the sequential ignorability assumption fails to hold. For instance, in program evaluation, subjects may decide to participate in a program at a given time point using both information on their performance under the treatments previously received (the observed outcomes), which the experimenter can also observe, and information on their performance under alternative unobserved treatment sequences (the missing outcomes), which may be known to subjects (perhaps approximately) but unknown to the experimenter. In medicine, the treatment a patient decides to take at a given time point may depend on both the patient's observed history (including previous treatments and observed outcomes) and some unobserved patient characteristics related to the missing outcomes.

In order to relax SI, we rely on Principal Stratification (PS, Frangakis & Rubin, 2002) and we formulate a milder version of SI that we call Latent Sequential Ignorability (LSI). LSI assumes that treatment assignment is conditionally independent of future potential outcomes given pre-treatment variables, past treatments, and principal strata, defined by the joint value of observed and missing intermediate outcomes up to that point. Principal strata encode personal characteristics reflected in the intermediate outcomes; therefore, if intermediate outcomes are associated with future treatments and outcomes, principal strata can be viewed as a coarsened representation of the latent unobserved structure that may affect the decision to participate in the treatment. Alternative assumptions could be considered, e.g., by drawing on the literature on non-ignorable missing data, but we look at LSI as a valuable starting point for moving beyond the traditional SI assumption.

LSI has appealing features, but also raises challenging inferential issues due to the latent nature of principal strata. We propose a Bayesian approach for inference, which is particularly useful for accounting for uncertainties and for pooling information from the data in complex settings. Under SI, causal estimands, such as average causal effects, are usually point identified, that is, they can be expressed as known functions of the distribution of the observed data. Under LSI, some parameters may be partially or weakly identified in the sense that their posterior distributions have substantial regions of flatness (Gustafson, 2010). The Bayesian approach, however, is particularly appealing for drawing inference on partially or weakly identified parameters. In fact, Bayesian inference is based on the posterior distribution of the parameters of interest, which is derived by updating a prior distribution via a likelihood, irrespective of whether the parameters are fully or partially identified, and if the prior is proper, the posterior distribution will be proper, too. Bayesian analysis conducted under LSI naturally provides a framework for sensitivity analysis with respect to violations of SI, where sensitivity parameters are meaningful quantities, with a direct interpretation.

In this work we discuss and compare sequential ignorability and latent sequential ignorability, using both theoretical arguments and simulation studies in which we investigate the relative performance of the two alternative assumptions when, in turn, one holds and inference is conducted under both assumptions. We also illustrate our framework using real data on financial aid to firms to investigate the effectiveness of interest-free loans on firms' employment policies. In this study firms may have access to public loans multiple times over subsequent years and our focus is on contrasting firms' performance, measured in terms of employment levels at the end of the study, under different treatment sequences (Pirani et al., 2013).

Throughout the article we focus on assessing causal effects of a specified longitudinal treatment on an outcome that would have been observed at the end of the study. A valuable topic for future research is the extension of our framework to the evaluation of dynamic treatment regimes, which usually describe adaptive policies that propose actions in each treatment period depending on past observations and decisions (e.g., Heckman & Navarro, 2007; Hong & Raudenbush, 2008; Murphy, 2003; Robins, 2004; Zajonc, 2012).

The article is organized as follows. In Section 2 we introduce the framework and the causal estimands we focus on. In Section 3 we formally define the assignment mechanism and the critical assumptions, SI and LSI. In Section 4 we compare SI and LSI, highlighting their implications and showing how latent sequential ignorability provides a natural framework for assessing the robustness of the estimates to specific violations of the sequential ignorability assumption. In Section 5 we discuss the inferential challenges arising with longitudinal treatments, briefly reviewing the existing approaches to address them, which are mainly based on SI. We then describe the Bayesian framework for inference, a natural and appealing approach that also allows us to compare SI and LSI on the same ground. In Section 6 we investigate the role and implications of the two alternative assumptions using some simulated experiments. In Section 7 we conduct causal inference under SI and LSI in the context of the illustrative case study. Finally, we conclude with a discussion in Section 8.

2 Basic Setup

In this article we will focus on a simple setup with a two-period structure and binary treatments. This simplified setting allows us to clearly describe all the conceptual issues surrounding sequential treatments, avoiding technical complications that may mask our primary objective, that is, highlighting the implications of SI and LSI and comparing inferences under the two assumptions. Indeed, the extension to more time points makes notation more complicated, but does not represent an issue for the theoretical framework, although it may raise inferential and computational challenges.

2.1 Notation

Consider a group of $N$ units indexed by $i = 1, \ldots, N$. In each of two periods, indexed by $t = 1, 2$, units can be potentially assigned either an active treatment ($w_t = 1$) or a control treatment, which may be no treatment at all ($w_t = 0$). Let $W_{it}$ denote the treatment unit $i$ actually receives at time $t$: $W_{it} = 1$ if the unit is exposed to the active treatment, $W_{it} = 0$ if the unit is exposed to the control treatment. Let $W_i = (W_{i1}, W_{i2})$. Then $W_i \in \{(0,0), (1,0), (0,1), (1,1)\}$, that is, units can experience treatment in neither period, $W_i = (0,0)$; only in the first period, $W_i = (1,0)$; only in the second period, $W_i = (0,1)$; or in both periods, $W_i = (1,1)$. Let $\mathbf{W}_t$ denote the $N$-dimensional vector with $i$-th element $W_{it}$, which is a random vector prior to the assignment at time $t$, and let $\mathbf{w}_t$ be a realization of the random vector $\mathbf{W}_t$.

Let $Y_i$ denote the final outcome, which is the object of primary interest and is measured after assignment of the final treatment, $W_{i2}$. After assignment to the first treatment, but prior to the assignment to the second treatment, an intermediate outcome, $S_i$, can be measured for each unit $i$. The intermediate variable we consider is the lagged outcome (or a transformation of the lagged outcome), which is a measure of the same substantive quantity as the final outcome, but measured at a previous time point between the receipt of the first treatment and the receipt of the final treatment. This choice is compelling, since it is reasonable to believe that the lagged intermediate outcome is related to both the treatment assignment at time $t = 2$, $W_{i2}$, and the final outcome, $Y_i$.

For each unit $i$, let $S_i(\mathbf{w}_1)$ denote the potential outcomes for the intermediate variable at time $t = 1$ given treatment assignment in the first period, and let $Y_i(\mathbf{w}_1, \mathbf{w}_2)$ denote the potential outcome for the final outcome given the entire treatment assignment sequence, $(\mathbf{w}_1, \mathbf{w}_2)$.

We make the Stable Unit Treatment Value Assumption (SUTVA, Rubin, 1980), stating that potential outcomes for any unit are unaffected by the treatment assignments of other units (no interference), and that for each unit there are no different versions of treatment. Formally,

Assumption 1

SUTVA

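If $W_{i1} = W'_{i1}$, then $S_i(\mathbf{W}_1) = S_i(\mathbf{W}'_1)$;
If $W_i = W'_i$, then $Y_i(\mathbf{W}_1, \mathbf{W}_2) = Y_i(\mathbf{W}'_1, \mathbf{W}'_2)$.

SUTVA allows us to write $S_i(\mathbf{w}_1) = S_i(w_{i1})$ and $Y_i(\mathbf{w}_1, \mathbf{w}_2) = Y_i(w_{i1}, w_{i2})$, therefore for each unit there are two potential outcomes for the post-treatment intermediate variable measured after assignment to the first treatment, $S_i(0)$ and $S_i(1)$, and four potential outcomes for the final outcome, $Y_i(0,0)$, $Y_i(1,0)$, $Y_i(0,1)$ and $Y_i(1,1)$.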

2.2 Causal Estimands

Causal effects on the final outcome, $Y$, are defined at the unit level as comparisons of potential outcomes for the final outcome under alternative treatment sequences. For instance, a causal effect of the treatment sequence $w = (w_1, w_2)$ versus the treatment sequence $w' = (w'_1, w'_2)$ for a unit $i$ is defined as a comparison between the potential outcomes $Y_i(w_1, w_2)$ and $Y_i(w'_1, w'_2)$. Estimands of interest may be simple differences $Y_i(w_1, w_2) - Y_i(w'_1, w'_2)$, but in general comparisons can take different forms. Causal effects can also be defined for collections of units. More generally, causal effects are comparisons between potential outcomes for a common set of units (Frangakis & Rubin, 2002; Rubin, 2005). In this article we consider the $N$ units as a random sample from a large superpopulation, and we focus on population Average Treatment Effects (ATEs) on the final outcome, that is, the expected value of the difference between potential outcomes at time $t = 2$ under different treatment sequences. In the presence of two-period binary treatments, we have:

$$ATE_{w.w'} = E\left[Y_i(w_1, w_2) - Y_i(w'_1, w'_2)\right], \qquad w, w' \in \{(0,0), (1,0), (0,1), (1,1)\},\ w \neq w'. \tag{1}$$

We focus on six causal effects, one for each pair of distinct treatment sequences: $(1,1)$ versus $(1,0)$, $(0,1)$ and $(0,0)$; $(1,0)$ versus $(0,1)$ and $(0,0)$; and $(0,1)$ versus $(0,0)$.
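As a minimal sketch of how the six pairwise contrasts arise, the following Python fragment computes them from a toy table of complete potential outcomes. The function name, the data layout and the toy values are illustrative assumptions, and such a complete table is never available in practice, since only one final potential outcome per unit is observed.

```python
from itertools import combinations

# The four treatment sequences (w1, w2) for two periods with binary treatments.
SEQUENCES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def average_treatment_effects(potential_outcomes):
    """Return the six pairwise average contrasts between treatment sequences.

    `potential_outcomes` holds one dict per unit, mapping each sequence
    (w1, w2) to that unit's final potential outcome. This complete table is
    hypothetical: real data reveal only one entry per unit.
    """
    n = len(potential_outcomes)
    means = {
        seq: sum(unit[seq] for unit in potential_outcomes) / n
        for seq in SEQUENCES
    }
    # Each contrast is the mean under sequence b minus the mean under sequence a.
    return {(a, b): means[b] - means[a] for a, b in combinations(SEQUENCES, 2)}


# Toy values, chosen only for illustration.
toy_units = [
    {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.5, (1, 1): 3.0},
    {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 0.5, (1, 1): 2.0},
]
effects = average_treatment_effects(toy_units)
```

With four sequences there are exactly six unordered pairs, which is why six average effects are considered.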

3 The Assignment Mechanism

The fundamental problem of causal inference (Holland, 1986; Rubin, 1978) is that for each unit we can observe at most one of the potential outcomes for each post-treatment variable. In our setting with two-period binary treatments, for each unit we observe one out of two intermediate potential outcomes at time $t = 1$, i.e., $S_i^{obs} = S_i(W_{i1})$; and one out of four potential outcomes at time $t = 2$, i.e., $Y_i^{obs} = Y_i(W_{i1}, W_{i2})$. Potential outcomes under unassigned treatment sequences are missing: $S_i^{mis} = S_i(1 - W_{i1})$ and $Y_i^{mis} = \{Y_i(w_1, w_2) : (w_1, w_2) \neq (W_{i1}, W_{i2})\}$. Therefore, inference on causal effects requires solving a missing data problem, which is particularly challenging in the presence of longitudinal treatments, even in the case with two-period binary treatments.
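The split between observed and missing potential outcomes can be sketched as follows. This is an illustrative fragment only: the function name, the dictionary layout and the numeric values are assumptions made for the example, not part of the original analysis.

```python
def split_observed_missing(s_potential, y_potential, w1, w2):
    """Split one unit's potential outcomes into observed and missing parts.

    s_potential: dict {0: S(0), 1: S(1)} for the intermediate outcome.
    y_potential: dict {(w1, w2): Y(w1, w2)} over the four treatment sequences.
    (w1, w2):    the treatment sequence the unit actually received.
    """
    observed = {"S": s_potential[w1], "Y": y_potential[(w1, w2)]}
    missing = {
        # The intermediate outcome under the first-period treatment not received.
        "S": {1 - w1: s_potential[1 - w1]},
        # The three final outcomes under the sequences not received.
        "Y": {seq: y for seq, y in y_potential.items() if seq != (w1, w2)},
    }
    return observed, missing


# Hypothetical unit, for illustration only.
unit_s = {0: 0, 1: 1}
unit_y = {(0, 0): 10.0, (0, 1): 11.0, (1, 0): 12.0, (1, 1): 14.0}
observed, missing = split_observed_missing(unit_s, unit_y, w1=1, w2=0)
```

For every unit, one intermediate and three final potential outcomes end up in the missing part, which is what makes the two-period problem harder than a single-period one.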

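In order to learn about the causal effects of interest it is crucial to posit a treatment assignment mechanism. The assignment mechanism is a row-exchangeable function of all covariates and of all potential outcomes, giving the probability of any vector of treatment sequences. For each unit $i$, let $X_i$ denote an observed vector of pre-treatment variables, that is, variables that are not affected by treatment assignment. The assignment mechanism for a two-period treatment can be formally defined as follows:

$$\Pr\left(\mathbf{W} = \mathbf{w} \mid \mathbf{X}, \mathbf{S}(0), \mathbf{S}(1), \mathbf{Y}(0,0), \mathbf{Y}(0,1), \mathbf{Y}(1,0), \mathbf{Y}(1,1)\right),$$

where $\mathbf{w}$ is an $N \times 2$ matrix with $i$-th row equal to $(w_{i1}, w_{i2})$, $\mathbf{X}$ is a matrix with $N$ rows and $i$-th row equal to $X_i$, and $\mathbf{S}(w_1)$ and $\mathbf{Y}(w_1, w_2)$ are $N$-dimensional vectors with elements equal to $S_i(w_1)$ and $Y_i(w_1, w_2)$, respectively, for $w_1 = 0, 1$ and $w_2 = 0, 1$.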

In longitudinal settings the assignment mechanism is very complex. We consider two basic restrictions on the assignment mechanism, assuming that it is individualistic and probabilistic. Let

denote the unit-level assignment probabilities for . An assignment mechanism is individualistic if

for all and , and

for , for some set , and zero otherwise.

An assignment mechanism is probabilistic if

for all , and .

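Even under these restrictions, the assignment mechanism still remains complex, because it depends on a large number of missing values, $S_i^{mis}$ and $Y_i^{mis}$, for all $i$. In order to reduce the complexity of the assignment mechanism, we now formulate some assumptions, which allow us to characterize longitudinal observational studies and draw inference on the causal estimands of interest. To this end, it is useful to factorize the unit-level assignment probabilities as the product of the assignment probabilities at time $t = 1$ and the conditional assignment probabilities at time $t = 2$ given the treatment received at time one. Formally, by the chain rule of probability, and writing $\mathbf{Y}_i = (Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1))$, we have

$$\Pr\left(W_{i1}, W_{i2} \mid X_i, S_i(0), S_i(1), \mathbf{Y}_i\right) = \Pr\left(W_{i1} \mid X_i, S_i(0), S_i(1), \mathbf{Y}_i\right) \Pr\left(W_{i2} \mid W_{i1}, X_i, S_i(0), S_i(1), \mathbf{Y}_i\right).$$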
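The two-step factorization of the unit-level assignment probabilities can be illustrated with a short sketch. The function names and the numeric probabilities are hypothetical; the conditioning on covariates and potential outcomes is left implicit, so each call should be read as referring to one unit with fixed characteristics.

```python
def sequence_probabilities(p_first, p_second_given_first):
    """Joint probabilities of the four sequences from a two-step factorization.

    p_first:              P(W1 = 1) for the unit.
    p_second_given_first: dict {w1: P(W2 = 1 | W1 = w1)}.
    """
    joint = {}
    for w1 in (0, 1):
        p1 = p_first if w1 == 1 else 1.0 - p_first
        for w2 in (0, 1):
            p2 = p_second_given_first[w1]
            # P(W1 = w1, W2 = w2) = P(W1 = w1) * P(W2 = w2 | W1 = w1)
            joint[(w1, w2)] = p1 * (p2 if w2 == 1 else 1.0 - p2)
    return joint


def is_probabilistic(joint):
    """Check that every treatment sequence has probability strictly in (0, 1)."""
    return all(0.0 < p < 1.0 for p in joint.values())


# Hypothetical assignment probabilities for one unit.
joint = sequence_probabilities(0.5, {0: 0.25, 1: 0.75})
```

A unit that is treated with certainty in the first period, for example `sequence_probabilities(1.0, {0: 0.5, 1: 0.5})`, fails the `is_probabilistic` check because the sequences starting with control have probability zero.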

Much of the literature on time-varying treatments copes with the complications arising in the presence of sequential treatments by assuming that the assignment mechanism is sequentially ignorable (Robins, 1986):

Assumption 2

Sequential Ignorability (SI)

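$$\Pr\left(W_{i1} \mid X_i, S_i(0), S_i(1), Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1)\right) = \Pr\left(W_{i1} \mid X_i\right) \tag{2}$$
$$\Pr\left(W_{i2} \mid X_i, W_{i1}, S_i(W_{i1}), Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1)\right) = \Pr\left(W_{i2} \mid X_i, W_{i1}, S_i(W_{i1})\right) \tag{3}$$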

SI implies that treatment assignment at each time point is independent of all future potential outcomes given past observed outcomes, treatments and covariates.

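SI guarantees that, within cells defined by the pre-treatment covariates, the mean of the potential outcomes under a specific treatment sequence can be estimated from the observed data as a weighted average of the means of the observed final outcome under that treatment sequence across groups defined by the observed intermediate outcome, with weights that depend on the distribution of the observed intermediate outcome. Formally, under SI

$$E\left[Y_i(w_1, w_2) \mid X_i = x\right] = \int E\left[Y_i^{obs} \mid W_{i1} = w_1, W_{i2} = w_2, S_i^{obs} = s, X_i = x\right] dF_{S^{obs} \mid W_1, X}(s \mid w_1, x),$$

where $Y_i^{obs} = Y_i(W_{i1}, W_{i2})$ is the observed final outcome and $F_{S^{obs} \mid W_1, X}$ is the conditional cumulative distribution function of the observed intermediate outcome, $S_i^{obs} = S_i(W_{i1})$, given the observed treatment at time $t = 1$ and pre-treatment covariates.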
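The weighted-average identification result under SI can be sketched with a plug-in estimator for a binary intermediate outcome and no covariates. This is a simplified illustration under those stated assumptions, not the estimation procedure used in the paper; the function name, the record layout and the toy records are hypothetical.

```python
def g_formula_mean(records, w1, w2):
    """Plug-in estimate of the mean final outcome under sequence (w1, w2).

    `records` is a list of (W1, S_obs, W2, Y_obs) tuples with binary S_obs.
    Assumes sequential ignorability and, for simplicity, no covariates.
    The estimate averages the mean observed outcome in each cell
    (W1 = w1, S_obs = s, W2 = w2), weighted by the share of units with
    S_obs = s among those with W1 = w1.
    """
    first_stage = [r for r in records if r[0] == w1]
    if not first_stage:
        raise ValueError("no unit received w1 in the first period")
    total = 0.0
    for s in (0, 1):
        group = [r for r in first_stage if r[1] == s]
        if not group:
            continue  # this value of S_obs never occurs with W1 = w1
        weight = len(group) / len(first_stage)
        cell = [r[3] for r in group if r[2] == w2]
        if not cell:
            raise ValueError("empty cell: no unit with this history received w2")
        total += weight * sum(cell) / len(cell)
    return total


# Toy records (W1, S_obs, W2, Y_obs), for illustration only.
records = [
    (1, 0, 1, 2.0), (1, 0, 0, 1.0),
    (1, 1, 1, 4.0), (1, 1, 1, 6.0), (1, 1, 0, 3.0),
    (0, 0, 0, 0.0),
]
estimate_11 = g_formula_mean(records, 1, 1)
```

In the toy records, two of the five units with first-period treatment have `S_obs = 0` and three have `S_obs = 1`, so the estimate for sequence (1, 1) weights the two cell means 2.0 and 5.0 by 0.4 and 0.6.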

It is worth noting that SI defines the assignment mechanism at each time point separately and independently of the other time points. Essentially the underlying idea is that at each time point a new study has been conducted, for which an assignment mechanism must be posited, and SI implies that at every time the treatment is as if randomized with probabilities depending on the observed history. Although SI allows one to easily identify and estimate the conditional expectation of the potential outcomes of interest, it does not allow one to reconstruct the assignment mechanism underlying the longitudinal study in its entirety, that is, the joint conditional probability of $W_i = (W_{i1}, W_{i2})$ given all the potential outcomes and covariates. To this end we can introduce a different ignorability assumption, which is closely related to SI:

Assumption 3

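Sequential Ignorability of Longitudinal Treatment Assignment (SIL)

$$\Pr\left(W_{i1}, W_{i2} \mid X_i, S_i(0), S_i(1), Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1)\right) = \Pr\left(W_{i1} \mid X_i\right) \Pr\left(W_{i2} \mid X_i, W_{i1}, S_i(W_{i1})\right)$$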

Assumption 3 amounts to assuming that treatment assignment at each time point is independent of past missing potential outcomes and all future potential outcomes given past observed outcomes, treatments and covariates. Assumption 3 is slightly stronger than Assumption 2, because it implies Assumption 2 but the converse is not true: Assumption 2 ignores the relationship between treatment assignment at time $t = 2$ and past missing potential outcomes, only requiring that the assignment mechanism at time $t = 2$ is independent of all future potential outcomes conditional on the observed history. Nevertheless Assumptions 2 and 3 have the same implications from an inferential perspective. For this reason, although Assumption 2 is weaker than Assumption 3, in practice it is difficult to make a convincing argument for the weaker Assumption 2 that is not equally cogent for the stronger Assumption 3.

Sequential ignorability assumptions may be reasonable in various settings, including longitudinal observational studies where it is reasonable to believe that treatments are sequentially assigned using only the observed information (e.g., Zajonc, 2012). However, as in single-point observational studies, where the commonly made strong ignorability assumption may fail to hold due to the presence of unobserved confounders associated with both the potential outcomes and the treatment indicator (Rosenbaum & Rubin, 1983; Rosenbaum, 1987; Imbens, 2003; Ichino et al., 2008), here sequential ignorability may be questionable due to the presence of time-varying unobserved confounders. The key insight is that the joint potential values of the intermediate outcome at time $t = 1$, $(S_i(0), S_i(1))$, may represent an accurate summary of the unobserved variables related to both treatment assignment at time $t = 2$ and the final outcome, which are the reason why sequential ignorability assumptions do not hold.

Motivated by this intuition, we use the concept of principal stratification (Frangakis & Rubin, 2002) to define a new assumption on the longitudinal assignment mechanism, which may be a valuable alternative to sequential ignorability assumptions when they are assumed to fail in some specific and meaningful ways. The joint potential values of the intermediate outcome at time $t = 1$, $(S_i(0), S_i(1))$, define a classification of units into principal strata. Principal stratification per se does not require that the intermediate outcome is binary or categorical. Recent work has indeed considered the application of principal stratification in the presence of continuous post-treatment variables (e.g., Schwartz et al., 2011). Nevertheless, continuous intermediate variables introduce serious challenges to principal stratification analysis. Specifically, continuous intermediate outcomes induce an infinite number of possible principal strata, leading to substantial complications in both inference and interpretation. In order to avoid additional complications, which may mask our primary objectives, here we consider a binary intermediate variable. Thus, the (basic) principal stratification with respect to the binary intermediate outcome classifies units into four groups according to the joint potential values of the intermediate outcome, $S_i(0)$ and $S_i(1)$: $00 = \{i : S_i(0) = 0, S_i(1) = 0\}$; $01 = \{i : S_i(0) = 0, S_i(1) = 1\}$; $10 = \{i : S_i(0) = 1, S_i(1) = 0\}$; and $11 = \{i : S_i(0) = 1, S_i(1) = 1\}$. Let $G_i$ denote the principal stratum membership for unit $i$, with $G_i = (S_i(0), S_i(1))$; then $G_i \in \{00, 01, 10, 11\}$.

For instance, in our illustrative example, the intermediate outcome is an indicator variable taking on value one if a firm hires new staff between the assignment to the first treatment and the assignment to the second treatment. Therefore, for example, principal stratum $11$ includes firms that would hire new staff irrespective of their treatment assignment at time $t = 1$ (see Section 7 for further details).

Principal stratum membership is not affected by treatment assignment at time $t = 1$, $W_{i1}$, so it only reflects characteristics of unit $i$. Therefore, principal strata can be viewed as a representation of the latent unobserved structure that may influence the decision to participate in the treatment at a future time point.

Based on principal stratification, we introduce a Latent Sequential Ignorability (LSI) assumption, where the word latent indicates that treatment assignment is independent of future potential outcomes conditional on pre-treatment covariates, past treatments and the latent indicator for principal stratum membership. In fact we cannot, in general, observe the principal stratum to which a unit belongs, because principal strata are defined by the joint values of observed and missing intermediate outcomes. In other words, LSI is a form of latent ignorability (Frangakis & Rubin, 1999), in that it conditions on variables that are (at least partially) unobserved or latent. Formally:

Assumption 4

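Latent Sequential Ignorability (LSI)

$$\Pr\left(W_{i1}, W_{i2} \mid X_i, S_i(0), S_i(1), Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1)\right) = \Pr\left(W_{i1} \mid X_i\right) \Pr\left(W_{i2} \mid X_i, W_{i1}, S_i(0), S_i(1)\right)$$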

LSI is a relaxed version of SIL (Assumption 3): SIL implies LSI, therefore SIL is a stronger assumption. LSI can be equivalently formulated as follows

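$$\Pr\left(W_{i1} \mid X_i, S_i(0), S_i(1), Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1)\right) = \Pr\left(W_{i1} \mid X_i\right) \tag{4}$$
$$\Pr\left(W_{i2} \mid X_i, W_{i1}, S_i(0), S_i(1), Y_i(0,0), Y_i(0,1), Y_i(1,0), Y_i(1,1)\right) = \Pr\left(W_{i2} \mid X_i, W_{i1}, S_i(0), S_i(1)\right) \tag{5}$$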

This formulation of LSI makes clear the critical difference between SI (Assumption 2) and LSI. Although SI and LSI both assume that the assignment mechanism at time $t = 1$ is ignorable given the set of observable variables, $X_i$ (see Equation (2) and Equation (4)), SI and LSI impose different restrictions on the assignment mechanism at time $t = 2$: standard sequential ignorability implies that it is ignorable given the observable past history, whereas LSI requires that it is ignorable given the observable past history and the missing intermediate outcomes.

LSI implies that

$$E\left[Y_i(w_1, w_2) \mid X_i = x\right] = \int E\left[Y_i^{obs} \mid W_{i1} = w_1, W_{i2} = w_2, G_i = g, X_i = x\right] dF_{G \mid X}(g \mid x),$$

where $Y_i^{obs} = Y_i(W_{i1}, W_{i2})$ is the observed final outcome and $F_{G \mid X}$ is the conditional cumulative distribution function of the principal stratum membership, $G_i = (S_i(0), S_i(1))$, given pre-treatment covariates. Therefore if principal stratum membership were observed, under LSI within cells defined by the covariates, $E[Y_i(w_1, w_2) \mid X_i = x]$ could be derived as the weighted average of the means of the observed outcome for units with $W_{i1} = w_1$ and $W_{i2} = w_2$ across principal strata, with weights that depend on the conditional distribution of principal strata given covariates. In practice, principal stratum membership is generally unobserved, therefore inference under LSI raises non-trivial challenges (see Section 5.1 for details on inference under LSI).

4 Assessing Sequential Ignorability through Latent Sequential Ignorability

In this section we investigate the role of LSI (Assumption 4) in causal inference for sequential treatments. Let us first consider the relationship between SIL (Assumption 3) and LSI (Assumption 4). LSI is a relaxed version of SIL and for this reason SIL can be viewed as a special case of LSI. Therefore, in order to compare SIL with LSI and to investigate which one is more appropriate for a given problem at hand, we rely on the relationship between SIL and LSI when SIL holds.

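Under SIL, treatment assignment at time $t = 2$ does not depend on the missing intermediate potential outcomes, implying that treatment assignment probabilities are homogeneous across some principal strata, conditionally on the treatment assigned at time $t = 1$ and covariates. Specifically, SIL implies that the assignment probabilities of $W_{i2}$ in principal strata sharing the same value for the observed intermediate outcome, that is, the intermediate outcome under the treatment assigned at time 1, are the same. Formally, under SIL, for each $i$ and $w_1 \in \{0, 1\}$, with $G_i = (S_i(0), S_i(1))$ denoting principal stratum membership, we have

$$\Pr\left(W_{i2} \mid W_{i1} = w_1, G_i, X_i\right) = \Pr\left(W_{i2} \mid W_{i1} = w_1, S_i(w_1), X_i\right).$$

Therefore, if SIL holds we have:

$$\begin{aligned}
\Pr\left(W_{i2} \mid W_{i1} = 0, G_i = 00, X_i\right) &= \Pr\left(W_{i2} \mid W_{i1} = 0, G_i = 01, X_i\right), \\
\Pr\left(W_{i2} \mid W_{i1} = 0, G_i = 10, X_i\right) &= \Pr\left(W_{i2} \mid W_{i1} = 0, G_i = 11, X_i\right), \\
\Pr\left(W_{i2} \mid W_{i1} = 1, G_i = 00, X_i\right) &= \Pr\left(W_{i2} \mid W_{i1} = 1, G_i = 10, X_i\right), \\
\Pr\left(W_{i2} \mid W_{i1} = 1, G_i = 01, X_i\right) &= \Pr\left(W_{i2} \mid W_{i1} = 1, G_i = 11, X_i\right).
\end{aligned} \tag{6}$$

Under SI (Assumption 2) the assignment probabilities of $W_{i2}$ only depend on the observed intermediate outcomes conditionally on the treatment assigned at time $t = 1$ and covariates: $\Pr(W_{i2} \mid W_{i1}, S_i(W_{i1}), X_i)$, therefore they can be ignored in drawing inference on the causal effects of interest. If SI does not hold, but LSI holds, ignoring the assignment probabilities of $W_{i2}$, $\Pr(W_{i2} \mid W_{i1}, G_i, X_i)$, does not, in general, lead to a valid analysis. This result suggests that we can investigate the robustness of the estimated causal effects with respect to violations of the sequential ignorability assumptions, using the assignment probabilities under LSI, $\Pr(W_{i2} \mid W_{i1}, G_i, X_i)$, as sensitivity parameters.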

If principal strata encode characteristics of the units that are associated with the treatment assigned at time $t = 2$ and possibly with the final outcome, i.e., LSI holds but neither SIL nor SI holds, inference under LSI is expected to show evidence against at least one of the equalities in Equation (6), and SI/SIL and LSI are expected to lead to substantially different inferential conclusions on the causal effects of interest. Conversely, if we find that treatment assignment probabilities are homogeneous across principal strata according to the equalities in Equation (6), then causal inference under sequential ignorability is more defensible.
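The consequence of heterogeneous second-period assignment probabilities can be illustrated with an exact population-level calculation. The sketch below is not the simulation design of the paper: it assumes no covariates, a first-period treatment that is randomized independently of the principal strata, and LSI for the second period, and all names and numeric values are hypothetical. It compares the true mean of a final potential outcome with the large-sample limit of the weighted average that is valid under SI.

```python
from itertools import product

# Principal strata coded as (S(0), S(1)).
STRATA = list(product((0, 1), repeat=2))


def true_mean(pi, mu):
    """Mean of Y(w1, w2): stratum means weighted by the stratum proportions."""
    return sum(pi[g] * mu[g] for g in STRATA)


def si_weighted_average_limit(pi, mu, p_w2, w1):
    """Large-sample limit of the SI weighted average for a fixed (w1, w2).

    pi[g]   : proportion of stratum g
    mu[g]   : mean of Y(w1, w2) in stratum g
    p_w2[g] : probability of receiving w2 in period two, given W1 = w1, in stratum g
    """
    total = 0.0
    for s in (0, 1):
        # Strata whose intermediate outcome under w1 equals s share one observed group.
        members = [g for g in STRATA if g[w1] == s]
        share = sum(pi[g] for g in members)
        numerator = sum(pi[g] * p_w2[g] * mu[g] for g in members)
        denominator = sum(pi[g] * p_w2[g] for g in members)
        total += share * numerator / denominator
    return total


# Hypothetical values for the sequence (w1, w2) = (1, 1).
pi = {g: 0.25 for g in STRATA}
mu = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 3.0}
homogeneous = {g: 0.5 for g in STRATA}
heterogeneous = {(0, 0): 0.2, (1, 0): 0.8, (0, 1): 0.2, (1, 1): 0.8}

truth = true_mean(pi, mu)
limit_si_holds = si_weighted_average_limit(pi, mu, homogeneous, w1=1)
limit_si_fails = si_weighted_average_limit(pi, mu, heterogeneous, w1=1)
```

When the assignment probabilities are equal within each pair of strata sharing an observed group, the limit coincides with the truth (1.5 here). When they differ within a pair, the limit moves away from it (about 2.1 with these values), which is the kind of discrepancy the sensitivity analysis is meant to detect.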

In this sense, LSI naturally provides a framework for sensitivity analysis with respect to violations of sequential ignorability: looking at the inferential results on the assignment probabilities under LSI we can get some insight into the plausibility of the sequential ignorability assumptions. This framework for sensitivity analysis is in line with the existing approaches in the literature to sensitivity analysis with respect to violations of the unconfoundedness assumption, usually made in single-time observational studies (Rosenbaum & Rubin, 1983; Rosenbaum, 1987; Imbens, 2003; Ichino et al., 2008; Ding & VanderWeele, 2016), where the robustness of the estimated causal effects with respect to the unconfoundedness assumption is generally assessed focusing on its violations due to the presence of unobserved covariates that are correlated both with the potential outcomes and with the treatment indicator. In those settings, sensitivity parameters are quantities characterizing the distribution of the unobserved covariates and their association with the potential outcomes and with the treatment indicator, but they do not generally have a substantive meaning. In our framework, sensitivity parameters are meaningful quantities with a direct interpretation: they are the assignment probabilities for specific subpopulations of units.

5 Inference

Under SI and SIL average causal effects are point identified, i.e., they can be expressed as known functions of the distribution of the observed data, so that different effect values cannot correspond to the same distribution of the observables. Therefore, ideally, we could estimate average treatment effects non-parametrically. In practice, data are often sparse and high dimensional, and model assumptions are usually introduced. Methods usually applied to estimate causal effects of longitudinal treatments under SI (Assumption 2) include the G-computation algorithm formula (Robins, 1986), inverse probability of treatment weighting estimation of marginal structural models (Robins, 1989), and G-estimation of structural nested models (Robins, 1999). The three methods would give identical estimates of the treatment effects if a non-parametric approach to inference or saturated marginal structural models/structural nested models were used, but under model assumptions they generally provide different estimates, depending on the specific parametric assumptions that are introduced. The G-formula requires specifying many models, often raising model-compatibility issues. Marginal structural models (MSMs) and structural nested models, which have received increasing attention in recent years, require specifying models for marginal potential outcomes ($Y_i(w_1, w_2)$ for each $(w_1, w_2)$ in our setting) and for the causal effects, which may assume, e.g., constant treatment effects, additivity and so on. Moreover, inferential methods based on inverse probability of treatment weighting also require specifying a model for the probability of treatment. These assumptions may be critical because model misspecification may lead to biased estimates of the treatment effects even if the identifiability conditions hold.

Under LSI the average causal effects are generally not point identified, due to the latent nature of the principal strata. In our setting, we can only observe four groups based on the treatment actually received at time $t = 1$, $W_{i1}$, and the observed value of the intermediate outcome, $S_i^{obs} = S_i(W_{i1})$, and each of them comprises a mixture of two principal strata, as shown in the last two columns in Table 1.

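Observed group                 Latent group
$W_{i1}$    $S_i^{obs}$        $G_i$
0           0                  00 or 01
0           1                  10 or 11
1           0                  00 or 10
1           1                  01 or 11

Table 1: Group classification based on observed data $(W_{i1}, S_i^{obs})$, associated data pattern and latent principal strata. Strata are labeled by the pair of intermediate potential outcomes $S_i(0) S_i(1)$.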
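The mapping in Table 1 follows mechanically from the definition of principal strata, as the short sketch below shows. The function name is an illustrative assumption; strata are coded as pairs (S(0), S(1)), so the label 01 corresponds to the tuple `(0, 1)`.

```python
from itertools import product


def compatible_strata(w1, s_obs):
    """Principal strata (S(0), S(1)) consistent with an observed pair.

    w1:    first-period treatment actually received (0 or 1).
    s_obs: observed intermediate outcome, i.e. S(w1).
    A stratum is compatible when its w1-th component equals s_obs.
    """
    return [g for g in product((0, 1), repeat=2) if g[w1] == s_obs]


# Rebuild the rows of the table for every observed group.
table = {
    (w1, s_obs): compatible_strata(w1, s_obs)
    for w1 in (0, 1)
    for s_obs in (0, 1)
}
```

Each observed group contains exactly two strata, because only one of the two intermediate potential outcomes is revealed by the data.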

In the principal stratification literature, structural or modeling assumptions are typically invoked (e.g., Imbens & Rubin, 1997; Mattei & Mealli, 2007; Schwartz et al., 2011). Monotonicity and exclusion restriction assumptions, usually used in experimental studies with noncompliance, may be questionable in longitudinal settings. Depending on the substantive empirical setting, other structural or modeling assumptions can be introduced. In this paper we prefer to avoid structural assumptions, which may make the comparison between SI/SIL and LSI unfair or strongly depending on some specific assumption, and we opt for a model-based approach for inference.

Following the literature on principal stratification, models for potential outcomes are specified conditional on covariates and principal strata (see Section 5.1 for further details). Again, distributional assumptions may be critical. Nevertheless in our opinion this model-based approach is very flexible, and in some settings model assumptions on the conditional distributions of potential outcomes may be less demanding than model assumptions on the marginal distributions of potential outcomes and on the causal effects. In order to make the comparison between SI/SIL and LSI as fair as possible, the same model-based approach is used under SI/SIL, although we will also show results from G-methods under SI. An advantage of this model-based approach is that it allows us to directly get information on the heterogeneity of the effects with respect to principal strata both under SI/SIL and LSI.

5.1 Bayesian Inference

We adopt a Bayesian approach to inference, which is particularly suitable for model-based causal inference. The Bayesian perspective appears to be particularly appropriate for addressing problems of causal inference because it treats the uncertainty in the missing potential outcomes in the same way that it treats the uncertainty in the unknown parameters. A Bayesian approach explicitly deals with the different sources of uncertainty, treating them separately. Also, in a Bayesian framework, we can be formally clear about the role played by the treatment assignment mechanism and the complications that arise in drawing inference for sequential treatments under LSI (Rubin, 1978; Imbens & Rubin, 1997). From a Bayesian perspective, all inferences are based on the posterior distribution of the causal estimands, defined as functions of observed and unobserved potential outcomes, or sometimes as functions of model parameters (Rubin, 1978). Because posterior distributions are always proper when prior distributions are proper, from a Bayesian perspective there is no conceptual difference between fully and partially/weakly identified parameters. Weak identifiability is usually reflected in the flatness of the posterior distribution (Gustafson, 2010). Therefore the Bayesian approach appears to be a natural and appealing inferential approach to compare SI/SIL and LSI on the same ground.

Bayesian inference considers the observed values to be realizations of random variables and the missing values to be unobserved random variables, and it starts from the joint probability distribution of all random variables for all units:

We assume this distribution is unit exchangeable, that is, invariant under permutations of the unit indexes. De Finetti’s theorem (de Finetti, 1937, 1964) then implies that there exists a vector of parameters , which is itself a random variable, with prior distribution , such that and consist of independent and identically distributed random variables given . Thus,

(7)

and the posterior distribution of can be written as

(8)

The assumptions on the assignment mechanism are crucial to draw inference on the causal estimands. Under latent sequential ignorability (Assumption 4), within cells defined by the values of pre-treatment variables , the treatment at time is assigned independently of the relevant post-treatment variables, and , , , and the treatment at time is assigned independently of the final potential outcomes, , , , conditional on the treatment assigned at time , , and the principal strata defined by . Therefore, under LSI the posterior distribution of becomes

(9)

Equation (9) further simplifies under SIL (Assumption 3), which implies that the treatment at time is assigned independently of both missing intermediate potential outcomes and final potential outcomes, , , , conditional on the pre-treatment variables, , the treatment assigned at time , , and the past observed potential outcomes, :

(10)

The right-hand side of Equation (10) is also the posterior distribution of under SI (Assumption 2). It is worth noting that, under the assumption that the parameters governing the distributions under the integral sign in Equations (9) and (10) are a priori distinct and independent from each other (Rubin, 1978), we can ignore the distributions and in drawing Bayesian inference on the relevant estimands. If SI holds, Bayesian causal inference does not even require modeling the distribution of the treatment at time (Rubin, 1978; Zajonc, 2012), although we chose to model it in the analyses below to better describe and discuss the role of LSI and SI/SIL in longitudinal studies.
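
The role of a priori distinct and independent parameters can be checked numerically in a deliberately simplified setting. The sketch below is an illustration under stated assumptions, not the model used in the paper: a binary outcome with success probability theta_y, a binary assignment with probability theta_z, independent uniform priors on a grid, and a likelihood that factorizes into an outcome part and an assignment part. All function names and data counts are hypothetical. Under these assumptions, the marginal posterior of theta_y obtained from the joint model coincides with the posterior obtained by ignoring the assignment model.

```python
def normalize(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def bernoulli_likelihood(theta, successes, failures):
    """Likelihood of independent Bernoulli(theta) data, up to a constant."""
    return theta ** successes * (1.0 - theta) ** failures


def outcome_posteriors(y_succ, y_fail, z_succ, z_fail, grid_size=99):
    """Posterior of theta_y on a grid, computed in two ways."""
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size  # same uniform prior for both parameters

    # Route 1: ignore the assignment model altogether.
    direct = normalize(
        [p * bernoulli_likelihood(t, y_succ, y_fail) for p, t in zip(prior, grid)]
    )

    # Route 2: build the joint posterior of (theta_y, theta_z) with
    # independent priors, then sum theta_z out.
    marginal = []
    for p_y, t_y in zip(prior, grid):
        mass = 0.0
        for p_z, t_z in zip(prior, grid):
            mass += (
                p_y * p_z
                * bernoulli_likelihood(t_y, y_succ, y_fail)
                * bernoulli_likelihood(t_z, z_succ, z_fail)
            )
        marginal.append(mass)
    return grid, direct, normalize(marginal)


if __name__ == "__main__":
    _, direct, marginal = outcome_posteriors(y_succ=12, y_fail=8, z_succ=5, z_fail=15)
    gap = max(abs(a - b) for a, b in zip(direct, marginal))
    print(f"largest difference between the two posteriors: {gap:.2e}")
```

The two routes agree up to floating-point error because the assignment factor is constant in theta_y and cancels on normalization. If the priors on theta_y and theta_z were dependent, or the two likelihood parts shared parameters, this cancellation would no longer hold.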

Throughout the article we assume that, conditional on and , the four outcomes , , , are independent. The data are not informative about the partial association structure between the final potential outcomes, because , , , are never jointly observed. However, the independence assumption has little inferential effect if we regard the units in the study as a random sample from a super-population and focus on super-population causal estimands that do not depend on the association structure between the final potential outcomes. Indeed, the causal estimands of primary interest here, the average causal effects in Equation (1), are super-population causal estimands, which are free of the association structure between the final potential outcomes (Imbens & Rubin, 1997, 2015, Chapter 6, pp. 98-101).
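
A small simulation can illustrate why the association between potential outcomes does not matter for average effects. The sketch below is a simplification made for illustration only: it uses two potential outcomes rather than the four considered in the paper, takes them to be bivariate Normal with unit variances, and uses hypothetical means and a hypothetical function name. The average effect stays at the difference in means for every correlation rho, whereas the variance of the unit-level effects, which is not an estimand of interest here, changes with rho.

```python
import math
import random


def simulate_effects(rho, n=50_000, mu0=1.0, mu1=1.5, seed=0):
    """Mean and variance of unit-level effects Y(1) - Y(0).

    Illustrative assumptions (not from the paper): Y(0) and Y(1) are
    bivariate Normal with means mu0 and mu1, unit variances and
    correlation rho.
    """
    rng = random.Random(seed)
    effects = []
    for _ in range(n):
        e0 = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)
        y0 = mu0 + e0
        # Mixing e0 and e1 this way gives Corr(Y(0), Y(1)) = rho.
        y1 = mu1 + rho * e0 + math.sqrt(1.0 - rho * rho) * e1
        effects.append(y1 - y0)
    mean = sum(effects) / n
    var = sum((d - mean) ** 2 for d in effects) / (n - 1)
    return mean, var


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        mean, var = simulate_effects(rho)
        print(f"rho={rho:.1f}  average effect={mean:.3f}  variance of effects={var:.3f}")
```

In this setup the variance of the unit-level effects equals 2 - 2 * rho in the population, so it depends on the unidentified correlation, while the average effect equals mu1 - mu0 regardless of rho.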

Let denote the observed group defined by the observed variables , , and , and recall that . Let , and , , , . Then performing the integration in Equation (9), under LSI the posterior distribution of given the observed data can be written as follows:

Therefore, model-based Bayesian inference under LSI requires specifying three models: the model for principal strata conditional on covariates,