Bayesian Methods for Multiple Mediators: Relating Principal Stratification and Causal Mediation in the Analysis of Power Plant Emission Controls

# Bayesian Methods for Multiple Mediators: Relating Principal Stratification and Causal Mediation in the Analysis of Power Plant Emission Controls

Chanmin Kim, Michael J. Daniels, Joseph W. Hogan, Christine Choirat and Corwin M. Zigler
Department of Biostatistics, Boston University School of Public Health
Department of Statistics, University of Florida
Department of Biostatistics, Brown University
Swiss Data Science Center
Department of Statistics and Data Sciences, University of Texas at Austin
###### Abstract

Emission control technologies installed on power plants are a key feature of many air pollution regulations in the US. While such regulations are predicated on the presumed relationships between emissions, ambient air pollution, and human health, many of these relationships have never been empirically verified. The goal of this paper is to develop new statistical methods to quantify these relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the effect of a particular control technology on ambient pollution is mediated through causal effects on power plant emissions. Since power plants emit various compounds that contribute to ambient pollution, we develop new methods for multiple intermediate variables that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of an intervention on ambient pollution into the natural direct effect and natural indirect effects for all combinations of mediators. Both approaches are anchored to the same observed-data models, which we specify with Bayesian nonparametric techniques. We provide assumptions for estimating principal causal effects, then augment these with an additional assumption required for causal mediation analysis. The two analyses, interpreted in tandem, provide the first empirical investigation of the presumed causal pathways that motivate important air quality regulatory policies.

Keywords: Bayesian nonparametrics, Gaussian copula, Natural indirect effect, Multi-Pollutants, Ambient PM,

## 1 Introduction

Motivated by a evidence of the association between ambient air pollution and human health outcomes, the US Environmental Protection Agency (EPA) oversees a vast program for air quality management designed to limit population exposure to harmful air pollution (Pope III et al., 2009; Dominici et al., 2014). Fine particulate matter of diameter 2.5 micrometers or less (PM) is of particular importance, with regulations to limit exposure to PM  estimated to account for over half of the benefits and a substantial portion of the costs of all monetized federal regulations (Office of Management and Budget, 2013). A large contributor to ambient PM  in the US is the power generating sector, in particular coal-fired power plants. These plants emit PM  directly into the atmosphere, but are also major sources of sulfur dioxide (SO) and nitrogen oxides (NO) that, once emitted into the atmosphere, contribute to secondary formation of PM  through chemical reaction, coagulation and other mechanisms. The amount PM  formation initiated by emissions of SO  and NO  depends largely on atmospheric conditions such as temperature (Hodan & Barnard, 2004). Power plants are also major sources of CO  emissions.

A variety of regulatory programs under the purview of the Clean Air Act (e.g., the Acid Rain Program) are designed to reduce emissions from power plants, with one goal of reducing population exposure to ambient PM. One key strategy for achieving this reduction is the installation of SO  control technologies such as flue-gas desulfurization scrubbers (henceforth, “scrubbers”), on power plant smokestacks to reduce SO  emissions and, in turn PM. Estimates of the annualized human health benefits of regulatory polices such as the Acid Rain Program rely heavily on presumed relationships between such control strategies, emissions, ambient PM, and human health. While the underlying physical and chemical understanding of the link between power plant emissions and PM  is well established, there remains considerable uncertainty about the effectiveness of specific strategies for reducing harmful pollution amid the realities of actual regulatory implementation. Accordingly, the EPA and other stakeholders have increasingly emphasized the need to provide evidence of which specific air pollution control strategies are most effective or efficient for reducing population exposures to PM(HEI Accountability Working Group, 2003; U.S. EPA, 2013).

The goal of this paper is to deploy newly-developed statistical methods to examine the causal effect of scrubbers installed at coal-fired power plants on the ambient concentration of ambient PM  using observed data on power plant emissions and ambient pollution. Physical and chemical understanding of these processes provide strong support for the expectation that scrubbers reduce ambient PM“through” reducing emissions of SO, but this relationship has never been empirically verified using observed data in the context of regulations that may simultaneously impact a variety of factors. A key statistical challenge to verifying this relationship derives from the fact that SO  emissions are highly correlated with emissions of NO  and CO  and NO  is known to play an important role in the formation of ambient PM, possibly through interactions with SO . Thus, the question will be formally framed as one of mediation analysis: To what extent is the causal effect of a scrubber (the “treatment”) on ambient PM  (the “outcome”) mediated through reduced emissions of SO, NO  and CO  (the “mediators”)? Recovering a statistical answer to this question amid the problem of multiple highly correlated and possibly interacting mediators that are measured contemporaneously requires new methods development and would also serve to bolster the promise of statistical methods in studies of air pollution that have historically relied on physical and chemical knowledge and not on statistical analysis.

To answer this question, we develop new methods that draw from two frameworks for estimating causal effects in the presence of mediating variables: (1) principal stratification (Frangakis & Rubin, 2002) and (2) causal mediation analysis (Robins & Greenland, 1992). The methodological contributions of this paper come in three areas. First, we develop new methods to accommodate multivariate mediating variables that are measured contemporaneously (not sequentially), are correlated, and may interact with each to impact the outcome (see Figure 1. for a an illustrative directed acyclic graph). This is essential for evaluating scrubbers because power plants simultaneously emit multiple pollutants that may interact through atmospheric processes to impact ambient PM. Existing methods in the literature for both principal stratification and mediation analysis have primarily focused on settings with a single mediator (e.g., Baron & Kenny (1986); Frangakis & Rubin (2002); VanderWeele (2009); Joffe & Greene (2009); Daniels et al. (2012)) and existing extensions to cases with multiple mediating variables cannot accommodate the setting of power plant emissions where mediators may simultaneously and jointly impact the outcome (Wang et al., 2013; Imai & Yamamoto, 2013; VanderWeele & Vansteelandt, 2014; Daniel et al., 2015). Our second methodological contribution is the use of Bayesian nonparametric approaches to model the observed distribution of emissions and pollution outcomes, making use of a multivariate Gaussian copula model to link flexibly-modeled marginal distributions of observed outcomes to a joint distribution of potential outcomes. Similar strategies with a single mediator have received recent attention in the principal stratification literature (Bartolucci & Grilli (2011); Ma et al. (2011); Schwartz et al. (2011); Conlon et al. (2014)) and are emerging for causal mediation analysis (Daniels et al., 2012; Kim et al., 2016). These approaches are important for confronting continuous mediators and infinitely many principal strata, and are deployed here in a novel way to address the problem of multiple mediators while flexibly modeling the observed-data distributions of both mediators and outcomes. Finally, we provide a unification of principal stratification and causal mediation analysis. While the mathematical relationships between these two approaches are well understood (Mealli & Rubin, 2003; VanderWeele, 2011; Mattei & Mealli, 2011), there has not been, to our knowledge, a comprehensive deployment of both perspectives in a complementary fashion to illuminate the scientific underpinnings of a specific problem. Baccini et al. (2015) made important progress in this direction using different observed-data models to estimate principal effects and mediation effects in a problem with a single mediator. In contrast, the approach developed here uses the exact same observed-data models to ground both perspectives, proposes a common set of basic assumptions for estimating both principal effects and mediating effects, modularizes an additional assumption required to augment a principal stratification analysis in order to obtain estimates of natural direct and indirect effects, and considers settings with multiple mediating variables. Ultimately, we provide a new dimension of quantitative, statistical evidence for supporting air policy regulatory decisions.

## 2 Scrubber Installation and Linked Data Sources

Title IV of the Clean Air Act established the Acid Rain Program (ARP), which required major emissions reductions of SO  (and other emissions) by ten million tons relative to 1980 levels. This reduction was achieved mostly through cutting emissions from power plants, or more formally, electricity-generating units (EGUs). Impacts of the ARP have been evaluated extensively, and the program is generally lauded as a success due to marked national decreases in SO  and NO  coming at relatively low cost. Estimates of the annualized human health benefits of the entire ARP range from $50 billion to$100 billion (Chestnut & Mills, 2005), but rely heavily on presumed relationships between power plant emissions, ambient PM, and human health.

While power plants under the ARP had latitude to elect a variety of strategies to reduce emissions, one key strategy is the installation of a scrubber to reduce emissions. The precise extent to which installation of a scrubber reduces ambient through reducing emissions remains unknown, and has never been estimated empirically amid the realities of actual regulatory implementation where pollution controls may impact a variety of factors that are also related to the formation of . Knowledge of these relationships is complicated by the fact that power plants emit more than just SO, and emissions of a variety of pollutants likely interact in the surrounding atmosphere to form ambient PM.

To provide refined evidence of the extent to which scrubbers reduce emissions and cause improvements to ambient air quality, we assembled a national database of ambient air quality measures, weather conditions, and information on power plants. Specifically, we assembled data on 258 coal-fired power plants from the EPA Air Markets Program Data and the Energy Information Administration, with information on plant characteristics, emissions control technologies installed (if any), and emissions of SO, NO, and CO  during 2005, five years after promulgation of an important phase of regulations under the Acid Rain Program. For each power plant, we augment the data set with annual average ambient PM  concentrations in 2005 and baseline meteorologic conditions in 2004 measured at all monitoring stations in the EPA Air Quality System that are located within 150km. The 150km range was chosen both to acknowledge that atmospheric processes carry power plant emissions across distances at least this great, but also to minimize the number of monitoring stations considered within range of more than one power plant. We regard any power plant as “treated” with scrubbers in 2005 if at least 10% of the plant’s total heat input was attributed to a portion of the plant equipped with a scrubber as of January 2005. Note that this proportion was nearly 0% or nearly 100% for the vast majority of plants, indicating robustness to this 10% cutoff. Other power plant characteristics are listed in Table 1. The data files and programs to assemble the analysis data set are available at https://dataverse.harvard.edu/dataverse/mmediators and https://github.com/lit777/MultipleMediators, respectively.

## 3 Causal Mediation Analysis and Principal Stratification

### 3.1 Mediation Analysis with a Single Mediator

To fix ideas, consider the single mediator case. Let indicate the presence of the intervention of interest, here, whether power plant has installed scrubbers in January 2005 () and let be the vector of intervention indicators for power plants . Using potential-outcomes notation (Rubin, 1974), let denote the potential emissions that the -th power plant would be generated under the vector of scrubber assignments , and let denote the potential ambient PM  outcome that could, in principle, be defined for any scrubber assignment vector and any vector of intermediate emissions values . Throughout the paper, we adopt the stable unit treatment value assumption (SUTVA; Rubin 1980) which implies 1) there is no “interference” in the sense that potential intermediate and outcome values from power plant do not depend on scrubber treatments and emissions intermediates of other power plants (i.e, and ) and 2) there are “no multiple versions” of scrubber treatments such that whenever , and . For reasons that will become clear later, we augment the standard SUTVA to also assume “no multiple versions” of emissions intermediates which states, if , then (Forastiere et al. 2016). We revisit possible violations of SUTVA in Section 8, but note here that the linkage of power plants to monitors within 150km provides some justification for this assumption.

The natural direct effect (Pearl, 2001) is defined by , representing the effect of the intervention obtained when setting the mediator to its ‘natural’ value ; i.e., its realization in the absence of the intervention. The natural indirect effect is defined as , representing the effect of holding the intervention status fixed at but changing the value of the mediator from to . The total causal effect of the intervention on the outcome can then be defined as . Similar controlled effects could also be defined to represent causal effects at specific values of (Pearl, 2001; Robins & Greenland, 1992).

Implicit in the definition of these effects is the conceptualization of hypothetical interventions that could independently manipulate values of both and to, for example, “block” the effect on the mediator. Thus, it is important to note that potential outcomes of the form are purely hypothetical for , and can never be observed for any observational unit. Such unobservable potential outcomes have been referred to as a priori counterfactuals (Robins & Greenland, 1992; Rubin, 2004). We revisit conceptualization of a priori counterfactuals in the context of the power plant study in Section 4.1, but note here the distinction between a priori counterfactuals and potential outcomes of the form that are observable and actually observed for some units.

### 3.2 Principal Stratification

A distinct but related framework for defining causal effects in the presence of intermediate variables is principal stratification (Frangakis & Rubin, 2002). Continuing with the single-mediator case, principal stratification considers only a single intervention and relies on definition of two causal effects: the effect of on , defined as , and the effect of on , defined as . The objective is to estimate principal effects, which are average causal effects of on within principal strata of the population defined by .

With principal stratification, dissociative effects are defined to quantify the extent to which the intervention causally affects outcomes when the intervention does not causally affect the mediator, for example, . Dissociative effects are similar to direct effects in a mediation analysis in that they represent causal effects of an intervention on the outcome among the subpopulation where there is no causal effect on the mediator, but they refer only to the specific subpopulation with . VanderWeele (2008) and Mealli & Mattei (2012) show that dissociative effects represent a quantity that is only one contributor to the NDE, with the amount of contribution tied to the size of the subpopulation with .

Associative effects are defined to quantify the causal effect of the intervention on the outcome among those for which the intervention does causally affect the mediator, for example, . An associative effect that is large in magnitude relative to the dissociative effect indicates that the causal effect of the intervention on the outcome is greater among those for which the mediator is causally affected, compared to those for which the mediator is not affected. This could be interpreted as suggestive of a causal pathway whereby the intervention impacts the outcome through changing the mediator, but note that associate effects are generally a combination of the NDE and NIE for a defined subpopulation.

Dissociative effects that are similar in magnitude to associative effects indicate that the intervention effect on the outcome is similar among observations that do and do not exhibit causal effects on the mediator, which could be interpreted as suggestive of other causal pathways through which affects .

A primary distinction between principal stratification and causal mediation analysis is that principal effects only pertain to population subgroups comprised of observations with particular values of , whereas natural direct and indirect effects are defined for the whole population (as discussed in detail in Mealli & Mattei (2012)). Importantly, note that the a priori counterfactuals of the form for do not appear in the definition of principal effects, which rely only on the definition of observable potential outcomes . Thus, there is no conception in principal stratification of a hypothetical intervention acting on independently from , and there is no definition of a causal effect of on that is mediated through . From a modeling perspective, principal effects can be estimated when an outcome model is specified conditional on both potential mediators (intermediate outcomes), and while causal mediation analysis has tended to rely on an outcome model that depends on the observed mediator. The differences in modeling strategies that are typically employed in principal stratification and causal mediation analysis complicate comparisons, as results of such analyses have typically been driven in part by different modeling assumptions. In Section 5, we will propose a new set of assumptions to build a common observed-data model for principal stratification and causal mediation analysis.

### 3.3 Existing Considerations for Multiple Mediators

Extensions of the causal mediation ideas outlined in Section 3.1 to settings of multiple mediating variables are emerging. For contemporaneously observed mediators, straightforward extensions of the Baron and Kenny (1986) regression-based structural equation model approach (MacKinnon, 2008) have been proposed. For each of contemporaneous mediators , a series of regression models is used to estimate mediator-specific NIEs in a manner that implies additivity of indirect effects:

 JNIE=K∑k=1NIEkandTE=% NDE+JNIE, (1)

where JNIE is used to denote the joint natural indirect effect due to changes in all mediators, and NIE represents the natural indirect effect of the -th mediator. These approaches assume that each mediates the treatment effect independently of the other mediators, without interactions among mediators (i.e., the mediators are causally independent or parallel). Figure 1.a without dashed lines illustrates this case. Wang et al. (2013) propose an alternative modeling approach under the setting of causally independent mediators. If the mediators interact with each other in terms of their impact on the outcome, then additivity of indirect effects as in the above cannot hold; and estimation of multivariate mediated effects can then be further complicated by correlations among the mediators.

Dependence among mediators has been considered when are observed sequentially (i.e., sequential mediators; Figure 1.b), as in Imai & Yamamoto (2013). Albert & Nelson (2011), and Daniel et al. (2015) propose approaches for either sequentially dependent mediators or mediators that do not affect nor interact with each other. These approaches offer a decomposition of the JNIE in the case of sequential dependence, and assume additivity of natural indirect effects otherwise. VanderWeele & Vansteelandt (2014) discuss an approach to decompose the JNIE further when the mediators simultaneously affect each other; however, their approach does not evaluate the impact of each individual mediator (see Section 4.3).Taguri et al. (2015) propose an approach for contemporaneous, non-ordered mediators, but rely on an assumption that the mediators are conditionally independent given observed covariates, which does not fully represent the possibility of contemporaneous interactions among the mediators, as may be the case with multiple emissions (in particular SO  and NO ) and the formation of ambient PM. Section 6 examines the possibility of contemporaneous interactions among (possibly correlated) mediators in the context of the scrubber study.

In summary, existing methods for multiple mediators rely on either assumed causal independence of (parallel) mediators and additivity of indirect effects, sequential dependence of mediators, or on restrictive assumptions of conditional independence among mediators. VanderWeele & Vansteelandt (2014) point out that if there are interactions between the effects of (non-sequential) multiple mediators on the outcome, the joint indirect effect may not be the sum of all three indirect effects. They note that, in principle, an analysis could proceed with an outcome model including interactions for all combinations combined with models for (). However, this approach would lead to issues of model compatibility between the models for and and that for the product . The lack of satisfactory methods for more general settings of multiple contemporaneously-measured mediators motivates the methods developed herein, where we offer a new decomposition of the joint natural indirect effect into individual indirect effects that may not affect the outcome additively.

## 4 New Methods for Causal Mediation Analysis and Principal Stratification with Multiple Contemporaneous Mediators

### 4.1 Notation for Multiple Mediating Variables

Suppressing the subscript indexing power plants, let denote the potential emissions of pollutants that would occur if a power plant were to have scrubber status , for . While much of our development is general for any , we focus on the case so that denotes the potential emissions of SO, NO, and CO, respectively. The causal effect of the scrubber on emission can then be defined as a comparison between and . Let denote potential emissions under a set of three scrubber statuses .

We similarly define potential PM  outcomes, but extend the notation to define potential concentrations under different values of scrubber status, , and different possible values of emissions, . Thus, in full generality, each power plant has a set of potential outcomes for PM, , which denote potential values of PM  that would be observed under intervention with pollutant emissions set at values under interventions . Definition of all potential PM  concentrations is required for definition of natural direct and indirect effects and entails a priori counterfactuals. For example, would represent the potential ambient PM  concentration near a plant under the hypothetical scenario where the plant installs a scrubber (), but where emissions of SO  and NO  are set to what they would be without the scrubber () and emissions of CO  are set to what they would be with the scrubber (). This may be conceptualized as a setting where a power plant installs a scrubber, but offsets the cost of the technology by burning coal with a higher sulfur content and discontinuing use of a different NO  control, thus “blocking” the intervention and maintaining SO  and NO  emissions at levels that would have occurred without the SO  technology. Principal stratification will only rely on potential outcomes with that are observable from the data, such as and observed for any power plant that installs a scrubber. Finally, let denote a vector of baseline covariates measured at the power plant or the surrounding area.

### 4.2 Observable Outcomes: Principal Causal Effects for Multiple Mediators

Extending principal stratification to settings where the intermediate variable is multivariate is conceptually straightforward. Principal stratification defines a principal stratum for every combination of the joint vector , and principal causal effects are defined as comparisons between and within principal strata.

For any subset , let denote the element-wise absolute differences between emissions of the subset of pollutants in , e.g., . Definitions of quantities such as average associative and dissociative effects can proceed following Zigler et al. (2012) by defining:

 EDEK = E[Y(1;M(1,1,1))−Y(0;M(0,0,0))∣∣|(M(1,1,1)−M(0,0,0))|KCAK],

where denotes a vector of thresholds beyond which a change in each emission in is considered meaningful, is a vector of thresholds below which changes in these emissions are considered not meaningful, and and represent element-wise comparisons. Note that the dissociate effect is now defined on principal strata where potential changes (or differences) in the intermediate variables are less than some vector of thresholds instead of principal stratum with strict equality to accommodate continuous intermediate values. For example, would be used to define the associative (dissociative) effect in the subpopulation exhibiting an effect on SO  and CO  in excess of (below ), without regard to the effect on NO. For the data analysis in Section 7, we divide the EAE defined above into two parts: EAE will denote the average associative effects among power plants where all emissions in are causally increased in excess of , while EAE will denote the average associative effect in power plants where all emissions in were causally reduced in excess of . Note that these summary quantities only consider a subset of principal strata that may be of interest. For example, analogous average principal effects could be calculated among strata where some emissions are decreased and others are increased. We avoid burdensome notation for such summaries, but will revisit estimates in additional principal strata in the context of the data analysis in Section 7.

In addition to estimating average dissociative and associative effects for different as defined above, interest may lie in entire surfaces of, for example, how the causal effect on PM  varies as a function of the causal effect on each emission (“causal effect predictiveness” surface (Gilbert & Hudgens, 2008)).

### 4.3 Observable and a priori Counterfactual Outcomes: Natural Direct and Indirect Effects for Multiple Mediators

Extending definitions of natural direct and indirect effects to the multiple mediator setting is somewhat more complicated. The natural direct effect is defined as NDE = , representing the causal effect of on that is “direct” in the sense that it is not attributable to changes in any of the emissions. The joint natural indirect effect of all mediators, , is derived by subtracting the natural direct effect from the total effect, .

In addition to , we introduce a decomposition into the natural indirect effects attributable to changes in different combinations of the mediators. Maintaining focus on the case where , the can be decomposed into emission-specific indirect effects and the joint indirect effects of all possible pairs of emissions. See Figure 5 in the Web Appendix for a graphical representation.

We define the mediator-specific NIE for the -th emission as a comparison between the potential PM  outcome under scrubbers and the analogous outcome with the value of the -th emission fixed to the natural potential value that would be observed without scrubbers. Specifically, for emissions of SO, NO, and CO  define:

 NIE1=E[Y(1;M(1,1,1))−Y(1;M(0,1,1))], NIE2=E[Y(1;M(1,1,1))−Y(1;M(1,0,1))], (2) NIE3=E[Y(1;M(1,1,1))−Y(1;M(1,1,0))].

In a similar fashion we can define the joint natural indirect effect attributable to subsets of mediators and for as differences between the observable potential PM  outcomes under scrubbers and the analogous a priori counterfactual with values of pollutants and set to their natural values that would be observed without scrubbers. For example, defines the joint natural indirect effects of mediators 1 (SO) and 2 (NO) as

 JNIE12=E[Y(1;M(1,1,1))−Y(1;M(0,0,1))].

Values of JNIE for other pairs of mediators can be defined analogously, and all such pairs correspond to the second row in Figure 5 in the Web Appendix. Note that the joint natural indirect effect of each pair of mediators is not equal to the sum of corresponding mediator-specific NIEs unless there is no overlap between mediator-specific NIEs (additivity). For example, we can represent the relationship between and the mediator-specific effects and as

 (NIE1+NIE2)−JNIE12 = E[Y(1;M(1,1,1))−Y(1;M(0,1,1))−Y(1;M(1,0,1))+Y(1;M(0,0,1))].

Thus, if this quantity is not equal to , we argue that additivity of mediator-specific NIEs does not hold. Note that the above decomposition of JNIE differs from VanderWeele & Vansteelandt (2014), which considers the portion of the JNIE mediated through , then sequentially considers the additional contribution of each mediator in the presence of the others. This presumed ordering of mediators precludes estimation of the effect through different pairs of mediators such as or , the availability of which is a benefit of our proposed decomposition. Our decomposition also differs from Daniel et al. (2015) who only allow interacting overlap between mediator-specific NIEs when one mediator causally affects another.

Note that alternative definitions of NIE could use contrasts of the form: . Such a strategy is also considered in Daniel et al. (2015), but defining NIE in this way would rely entirely on a priori counterfactuals, whereas a benefit of using the definitions in (2) is that each definition uses the observable potential outcome , comparing against only one a priori counterfactual (e.g., ).

## 5 Flexible Bayesian Models Assumptions and Estimation

Under the assumptions developed in this section, Bayesian inference for the causal effects defined in Section 4 follows from specifying models for the joint distribution of all potential mediators (conditional on covariates) and the outcome model conditional on all potential mediators and covariates, and prior distributions for unknown parameters. Posterior distributions cannot be computed directly from observed data because potential outcomes are never jointly observed in both the presence and absence of a scrubber and a priori counterfactuals are never observed. Our estimation strategy consists of three steps. First, we specify nonparametric models for the observed data. The marginal distribution of each observed mediator (i.e., observed for power plants that did not install scrubbers and observed for those that did) is specified separately and then linked into a coherent joint distribution using a Gaussian copula model (Nelsen, 1999). The models for the potential outcomes and are specified conditional on covariates and all potential mediators ( and ) that are never observed simultaneously. Thus, the conditional outcome models are estimated via the data augmentation for unobserved potential mediators. Second, we introduce two assumptions for estimating the TE and the associative and dissociative effects. Third, we employ an additional assumption to equate the distributions of a priori counterfactuals to those of the observed potential outcomes under intervention to allow estimation of the natural direct and indirect effects. We also provide optional modeling assumptions to sharpen posterior inference for the power plant evaluation. Throughout, we estimate the distribution of the covariates, , using the empirical distribution.

### 5.1 Models for the Observed Data

We specify Dirichlet process mixtures for the marginal distribution of each mediator (Müller et al., 1996). For each intervention , and baseline covariates , the conditional distribution of the -th observed mediator is specified as

 Mk,i|Zi=z,Xi=xi ∼ N(βzk0,i+x⊤iβzk1,τzk,i),Mk,i≥0;i=1,⋯,nz βzk0,i,τzk,i ∼ Fzk, Fzk ∼ DP(λzk,Fzk),

where denotes the observations with and indicates the -th mediator. We bound the mediator from below (0) using a truncated normal kernel (within the interval [0, )). and denote the intercept and precision parameters for the -th emission at the -th power plant that received intervention . Here, denotes the Dirichlet process with two parameters, a mass parameter () and a base measure (). To not overly complicate the model we only ‘mixed’ over the intercept and precision parameters in the conditional distributions, and . The base distribution is taken to be the normal-Gamma distribution, . Details including hyper prior specification are given in Section A of the Web Appendix.

The marginal distributions of each mediator under each are linked to model the joint distribution of with Gaussian copula models of the form:

 FM(z,z,z)(mz,z,z)=Φ3[Φ−11{FM1(z)(m1)},Φ−11{FM2(z)(m2)},Φ−11{FM3(z)(m3)}],

where are values of potential mediators under intervention and is the -variate standard normal CDF. Note that we elect to model the marginal distribution of each univariate random variable separately, and then combine with the Gaussian copula model, rather than directly model the joint distributions of . Thus, we allow full flexibility using DP mixtures of (truncated) normals for the marginal distributions (the fit of which can be checked empirically) and use the Gaussian copula to link them to construct the joint distribution of potential mediators. The Gaussian copula model implies some (correlation) structure to the joint distribution of all observable potential outcomes, without implying any specific causal structure. Flexibility of this structure derives from the fact that each marginal distribution is modeled as nonparametric with infinite dimensional parameter spaces. The strategy is designed to coalesce with the modeling strategy in Section 5.2. Note that other potential alternatives to link the fixed marginal distributions such as mixtures of marginals (e.g. or ) do not specify the full joint distribution distribution of (Nelsen, 1999)) and our method does not limit the number of the mediators in general. While the joint distribution of all potential mediators ( and ) is also modeled via the same Gaussian copula model, this entails modeling unobserved potential mediators and will be discussed as a part of the assumptions in Section 5.2

To model the distributions of the potential outcomes for each conditional on all potential mediators and covariates, we use a locally weighted mixture of normal regression models (Müller et al., 1996) that is induced by specifying a DP mixture of normals for the joint distribution of the outcome, all mediators and covariates. For each intervention , potential values of all (counterfactual) mediators and baseline covariates , the conditional distribution of the observed outcome is specified as

 f(yi|mi(0,0,0),mi(1,1,1),xi,Zi=z ) = ∞∑l=1ωzlN(yi,mi(0,0,0),mi(1,1,1),xi|μzl,Σzl)

where and denotes all elements of mean parameters except for . Similarly, denotes a submatrix of covariance matrix formed by deleting the the first row and the first column. The weight involves the parameter where and . This flexible conditional model specification is a necessary feature in our case since we allow the outcome model to capture nonlinear and/or interaction effects of the mediators. Note again that this outcome model is conditional on all potential mediators which cannot be observed at the same time. We use a similar approach to that used in Schwartz et al. (2011) to model the observed outcome distribution conditional on partly missing potential intermediate variables by constructing complete intermediate data. Here, we impute unobserved potential mediators for each unit with a data-augmentation approach based on the joint distribution of all potential mediators specified above. Details about hyper prior specification and posterior computation are given in the Web Appendix.

### 5.2 Assumptions for Estimation of Causal Effects

To estimate causal effects based on the model for the observed data specified in Section 5.1, we formulate assumptions relating observed quantities to both observable outcomes and a priori counterfactuals. Denote the conditional distribution with where is a vector of hypothetical values of the mediators under the interventions . The conditional distribution is denoted by . Other conditional distributions are defined analogously, and we henceforth omit conditioning on covariates to simplify notation.

#### 5.2.1 Assumptions for principal causal effects

We begin with an ignorability assumption stating that, conditional on covariates, “assignment” to scrubbers is unrelated to the observable potential outcomes: Assumption 1 (Ignorable treatment assignment)

 {Y(z;M(z,z,z)),M(0,0,0),M(1,1,1)}to0.0pt$⊥$⊥to0.0pt$⊥$⊥Z|X=x,

for . This assumption permits estimation of the distributions of potential outcomes under intervention with observed data on ambient PM  and emissions under the same intervention.

We adopt a Gaussian copula model to link the distributions of for into a single joint distribution of observable potential outcomes.

Assumption2: The joint distribution of all potential mediators conditional on covariates follows a Gaussian copula model (Nelsen, 1999):

 FM(0,0,0),M(1,1,1)(m0,0,0,m1,1,1)= Φ6[Φ−11{FM1(0)(m1)},Φ−11{FM2(0)(m2)},Φ−11{FM3(0)(m3)},Φ−11{FM1(1)(m1)}, Φ−11{FM2(1)(m2)},Φ−11{FM3(1)(m3)}]

where is the multivariate normal CDF with mean and a correlation matrix .

Assumption 2 implies a joint distribution of all observable potential mediators in a manner consistent with the models for described in Section 5.1. However, this entire joint distribution of potential mediators under both interventions is not fully identified from the data since potential mediators under different interventions are never jointly observed. Specifically, entries of the correlation matrix corresponding to, for example, the correlation between and , are not identifiable in the sense that no amount of data can estimate unique values for these parameters. Nonetheless, proper prior distributions for these parameters can still permit inference from proper posterior distributions. Such parameters are sometimes referred to as “partially identifiable” in the sense that increasing amounts of data may lead the supports of posterior distributions to converge to sets of values that are smaller than those specified in the prior distribution (Gustafson, 2010; Mealli & Pacini, 2013). This can arise due to restrictions on the joint distribution implied by the models for the marginal distributions (e.g., the positive-definiteness restriction on may exclude some possible values for its entries). We discuss two prior specifications for the partially-identified parameters in , noting that further details of partial identifiability in the principal stratification context appear in Schwartz et al. (2011).

#### 5.2.2 Assumptions for Mediation Effects

Towards estimation of natural direct and indirect effects, we augment the assumptions of Section 5.2.1 with one relating observable outcomes to a priori counterfactual outcomes.

Assumption 3: For intervention , the conditional distribution of the potential outcome given values of all potential mediators (and covariates) is the same regardless of whether the mediator values were induced by or .

This assumption implies that the a priori counterfactual and the observable potential outcomes have the same conditional distribution,

 f1,M(0,0,0)(y|M(0,0,0)=m,M(1,1,1),x) =

This assumption also applies to any two mediators in the absence of the intervention. For instance, the a priori counterfactual of , , and have the same conditional distribution regardless of whether corresponding emissions values arose under a scrubber () or absent a scrubber (),

 f1,M(0,1,0)(y|M(0,1,0)=m,M(1,0,1),x) =

The key point is that the distribution of PM  under a given (unobservable) combination of mediators () only depends on the values of the mediators and not the intervention that led to those mediators. Asserting this assumption in this case relies in part on what is known about the underlying chemistry relating SO, NO, and CO  emissions to PM. Note that such an assumption may be more difficult to justify in, say, a clinical study where assumptions about a priori counterfactuals might pertain to choices of study participants.

The above assumption can be cast as two homogeneity assumptions of the form proposed in Forastiere et al. (2016). For example, one implication of Assumption 3 is that the a priori counterfactual is homogeneous across all principal strata with , regardless of the value of . Viewing Assumption 3 in terms of the implied homogeneity across principal strata aids interpretation and justification in the context of the power plant example. Homogeneity across strata implies that the potential ambient air quality value in the area surrounding a power plant is related to (possibly counterfactual) emission levels only, and not to the power plant characteristics that govern effectiveness of scrubbers for reducing emissions (i.e., the power plant characteristics that determine the exact principal stratum membership). This underscores the importance of including covariates in that capture characteristics of the monitoring locations (e.g., temperature and barometric pressure). Appendix D provides details of the relationship between Assumption 3 and assumptions of homogeneity across principal strata. While Assumption 3 implies homogeneity assumptions, the converse is not true in the case of multiple mediators due to the connection of Assumption 3 to a priori counterfactuals defined to have mediator values induced by different interventions (e.g., ). We discuss a sensitivity analysis to this assumption in Web Appendix J.

#### 5.2.3 Optional Modeling Assumptions to Sharpen Posterior Inference

With the above model specification, the partial identifiability of the model parameters in warrants careful attention. Proper but noninformative prior distributions for these parameters could be specified marginally for these parameters as , or equivalently, as conditionally uniform on intervals satisfying positive definiteness restrictions for the correlation matrix. In either case, posterior inference may exhibit large uncertainty.

We consider in detail an alternative prior specification similar to that in Zigler et al. (2012) to sharpen posterior inference. Specifically, the correlations between mediators under different interventions are specified as follows:

 Cor(Mj(0),Mk(1))=Cor(Mj(0),Mk(0))+Cor(Mj(1),Mk(1))2×ρ, for j,k=1,2,3,

with a sensitivity parameter. This strategy implies that (a) the correlation between the same mediator under opposite interventions is , and (b) the correlation between different mediators under opposite interventions is an attenuated version of the correlation observed separately under each intervention. Section B of the Web Appendix provides a correlation matrix implied by this assumption in the case of 2 mediators. We assume a single and specify a uniform prior distribution, , but a different parameter could be specified for each mediator.

As an additional assumption to sharpen posterior inference, we assume that the correlations between emissions (mediators) are all positive. Support for this assumption comes from observed-data estimates of these conditional correlations that are all positive.

In summary, assumptions 1-2 are sufficient to estimate the principal causal effects, and pertain only to observable potential outcomes. Adding assumption 3 relating observed quantities to a priori counterfactuals permits estimation of direct and indirect effects for mediation analysis. The optional assumptions here in Section 5.2.3 are designed to sharpen posterior inference in the power plant analysis.

#### 5.2.4 Posterior Inference

A Markov chain Monte Carlo (MCMC) algorithm is used to sample from this posterior distribution and estimate causal effects using the following steps: (1) sampling parameters from each marginal distribution for potential mediators and conditional distribution for potential outcomes defined in Section 5.1; (2) sampling parameters from the correlation matrix of the Gaussian copula; (3) sampling via data augmentation a priori counterfactual mediators from the joint distribution; (4) computing causal effects based on all potential mediators and outcomes including imputed a priori outcomes and mediators; (5) iterate Steps 1-4. The specifics of estimation (conditional on our specific model formulation) are based on the existing literature on Bayesian estimation of causal effects (and principal causal effects in particular), for example, in Mattei & Mealli (2011); Zigler et al. (2012); Daniels et al. (2012).

The Web Appendix contains details of the MCMC procedure (Section F), prior specification for all other model hyper-parameters (Section A), and the procedure for computing the principal causal effects and the mediation effects from the posterior distributions of model parameters (Section C).

## 6 Numerical Study

We examine the performance of the proposed model under combinations of the following two data generating scenarios: (1) correlations among the mediators (Case 1: uncorrelated mediators vs. Case 2: correlated mediators) and (2) interaction terms between the mediators in the outcome model (Case A: interaction term between and vs. Case B: interaction terms between and , and between and ). Data sets of size are simulated for each of the four cases (1/A, 1/B, 2/A, 2/B), each with three continuous confounders. In all cases, the three mediators are generated based on a multivariate normal distribution. See the Web Appendix (Section G) for the exact data generating mechanism.

We compare our method for estimating mediation effects to a regression-based model(MacKinnon, 2008):

 M1 = α01+α11Z+X⊤α1+ϵ1 M2 = α02+α12Z+X⊤α2+ϵ2 M3 = α03+α13Z+X⊤α3+ϵ3 Y = β0+β1Z+β2M1+β3M2+β4M3+X⊤β+ϵY

where and are all independently distributed as .

Table 2 summarizes the results based on 400 replications for each of the four scenarios. It shows that our proposed model (BNP) performs well in terms of bias and MSE for all cases. Note that the true effects change when the mediators are correlated in the presence of interaction term(s) in the outcome model. Thus, with any interaction effects of the mediators, it is desirable to capture the correlation structure of the mediators, which our method does by flexibly modeling the joint distribution of all potential mediators. Also, the flexible Bayesian nonparametric model can capture both complex relationships/interactions among the mediators and non-additive and nonlinear forms of mediators and/or confounders in the outcome model. In each scenario, interaction terms in the outcome model introduce non-additivity in the joint natural indirect effect (e.g., ) and the traditional regression model has larger biases (and larger MSEs) for mediation effects.

## 7 Analysis of Power Plant Scrubbers in the Acid Rain Program

Here we estimate causal effects of having scrubbers installed in January 2005 () on annual average emissions of SO, NO, and CO  in 2005 () and on the 2005 annual average ambient PM  concentration within 150km of a power plant (). Emissions are log-transformed. Before reporting results, note that basic checks of the fit of marginal nonparametric models appear in Web Appendix I, indicating fit that is clearly superior to simple parametric models.

A simple comparison of means indicates that the 150km area around power plants with scrubbers installed () had average ambient PM  that was lower, on average, than the areas surrounding power plants without scrubbers (12.4 vs. 13.7 ). Similarly, the power plants with scrubbers also emitted less SO, more NO, and more CO  than the plants without scrubbers. Table 1 lists the covariates in to adjust for confounding and presents summary statistics for scrubber and non-scrubber power plants.

We present an analysis with the proposed method using the constrained prior specification in Section 5.2.3. Analysis using uniform prior distributions on all elements of the correlation matrix appears in the Web Appendix. All reported estimates are listed as posterior means (95% posterior intervals). The analysis estimates that having scrubbers installed causes SO  emissions to be -1.17 (-1.86, 1.55) 1000 tons lower, on average, than they would be without the scrubber. The analogous causal effects for NO  and CO  emissions were 0.04 (0.00, 0.07) 1000 tons and 0.001 (-0.00, 0.004) million tons, respectively, indicating that scrubbers did not significantly affect these emissions, on average. The total effect (TE) of having scrubbers installed on ambient PM  within 150km is estimated to be -1.12 (-2.07, -0.29) , suggesting a reduction amounting to approximately 10% of the national annual regulatory standard for PM.

### 7.1 Principal Causal Effects

For the -th emission, let denote the posterior standard deviation of the estimated individual-level causal effect of a scrubber on , with posterior mean estimates , , . Let denote the vector of for the emissions in . To summarize dissociative effects, we set to estimate EDE among power plants where the scrubber effect on emissions in is within one-fourth of a standard deviation of the effect in the population. Similarly, we summarize associative effects with to estimate EAE (EAE) among power plants where the scrubber causally reduces (increases) emissions in more than one-fourth of a standard deviation of the effect in the population.

Before providing estimates of specific principal effects, we first examine 3-D surface plots in Figure 2. For each emission separately (), Figure 2 depicts estimated scrubber effects on PM  across varying effects on emissions determined by values of simulated from the model. Note the pattern for all emissions that the surfaces are sloped downward in the direction of increasing and (sloped towards the viewer), indicating larger effects on PM  among plants with larger emissions values under both scrubber statuses, i.e., larger plants.

In Figure 2LABEL:sub@so2 for SO, the dots in the -plane lie almost entirely in the region where , indicating as expected that scrubbers predominantly decrease SO  emissions. Associative effects for SO  are indicated by the downward slope of the surface in the direction of decreasing (towards the left of the viewer), indicating that larger decreases (increases) in SO  are associated with larger decreases (increases) in PM.

The analogous surfaces for NO  and CO  appear in Figures 2LABEL:sub@nox and 2LABEL:sub@co2, respectively. In contrast to the surface for SO, the dots in the -plane fall more closely and symmetrically around the line , reflecting that scrubbers do not affect these emissions, on average. The surface for NO  exhibits some evidence of associative effects in the opposite direction of those for SO; there is some downward slope of the surface in the direction of increasing (towards the right of the viewer), indicating that larger increases (decreases) in these emissions are associated with larger decreases (increases) in PM.

Table 5 lists posterior mean and standard deviation of EDE, EAE, and EAE for all possible .

Estimates of EDE for all indicate little to no reduction in PM  among plants where emissions were not affected in excess of , with the exception of some pronounced estimates of EDE for NO and CO. Estimates of EAE and EAE tend to be less than zero. The most pronounced estimate of EAE -1.19 (0.46) for SO suggests that PM  was reduced among power plants where SO  emissions were substantially reduced, which corresponds to the contour of the surface in Figure 2LABEL:sub@so2 and is consistent with the anticipated causal pathway whereby scrubbers reduce PM  through reducing SO  emissions. In accordance with the opposite sloping surface in Figures 2LABEL:sub@nox, the estimate of EAE is most pronounced for NO, and NO, CO, indicating that ambient PM  is decreased among plants with substantial increases in NO  emissions.

Recall that the estimates in Table 5 represent average principal effects over only a subset of principal strata, in particular those where changes in multiple emissions are concordant (i.e., all decreasing, all increasing, or none changing). Other strata may be of interest. Figure 3 provides estimates of principal effects in a cross-classification of strata defined by changes in CO  and SO, with changes defined as increases, decreases, or no change in reference to and . For example, the third column of Figure 3 subdivides the stratum defined by causal increases in CO  into three substrata: those where CO  increases and SO  (1) decreases (in excess of ); (2) does not substantially change (beyond ) ; or (3) increases (in excess of ). Principal causal effect estimates for these three substrata appear along with their relative proportion among the stratum defined by CO  increases, indicated by the size of the plotting symbol. The light grey dot corresponds to EAE for SO, CO as reported in Table 5, but note that only 4% of the CO-increase stratum exhibits SO  increases. The dark grey dot corresponds to the principal effect among the 21% of the CO-increase stratum in substratum (2) where SO  does not change, with a principal effect estimate of -0.13 (0.99). The remaining proportion (75%) of the CO-increase stratum belongs to substratum (3) where the plants exhibiting decreases in SO  and a corresponding principal effect estimate of -1.21 (0.73). Thus, for CO, the negative estimate of EAE from Table 5 is revealed to be generated in large part by strata where SO  decreases and there is a pronounced negative effect on PM. Analogously, the second column of Figure 3 considering the stratum where CO  emissions do not substantially change (used to estimate EDE) reveals that 63% of this strata exhibited causal reduction in SO  and a causal reduction in PM  of -0.87 (0.49), explaining in large part the negative estimate of EDE for CO in Table 5. Analogous cross-classification of strata by changes in NO  and SO  appears very similar to Figure 3 and is not presented.

The main conclusions from the principal stratification analysis are that 1) scrubbers reduce SO  on average, but not NO  or CO, 2) there is some evidence of a nonzero dissociative effect for SO, 3) associative effects for SO  are more pronounced than dissociative effects, with PM  reduced more around plants where scrubbers cause large reductions in SO; 4) associative effects for NO  and CO  are more pronounced than dissociative effects, with PM  reduced more around plants where scrubbers cause larger increases in these emissions; but that 5) strata defined by increases (or no change) in NO  and/or CO  are comprised in large part by substrata where SO  and PM  were causally reduced. This analysis points towards (but cannot confirm) the conclusion that scrubbers affect PM  among plants where emissions are not changed, and that scrubber effects on PM  are mediated in part through effects on SO, with less evidence of a mediating role of NO  and CO.

### 7.2 Mediation Effects

To estimate direct and indirect effects, we augment the principal stratification analysis with Assumption 3 in Section 5.2.2 about a priori counterfactuals. Figure 1 (top) in the Web Appendix depicts boxplots of the posterior distributions of TE, NDE, JNIE, JNIE, JNIE, JNIE, and the individual NIEs. The estimated NDE, representing the direct effect of a scrubber on ambient PM  that is not mediated through any emissions changes, is -0.53 (-1.51, 0.39) , indicating no evidence of a direct effect of scrubbers on PM  that is not mediated through SO, NO, or CO. The NIEs for NO  (NIE) and CO  (NIE) are estimated to be very close to 0, -0.02 (-0.26, 0.21) and -0.04 (-0.33, 0.23), respectively. The estimated NIE for SO  (NIE) is -0.54 (-1.20, -0.01), indicating a significant indirect effect. The joint natural indirect effects involving SO