# On the collapsibility of measures of effect in the counterfactual causal framework

###### Abstract

The relationship between collapsibility and confounding has been subject to an extensive and ongoing discussion in the methodological literature. We discuss two subtly different definitions of collapsibility, and show that by considering causal measures of effect based on counterfactual variables it is possible to separate out the component of non-collapsibility which is due to the mathematical properties of the effect measure, from the components that are due to structural bias such as confounding. We provide weights such that the causal risk difference and the causal risk ratio are collapsible over arbitrary baseline covariates, and demonstrate that such general weights do not exist for the odds ratio.

## 1 Introduction

A measure of association (such as the risk difference or the risk ratio) is said to be collapsible if the marginal measure of association is equal to a weighted average of the stratum-specific measures of association [1]. The relationship between collapsibility and confounding has been subject to an extensive and ongoing discussion in the literature [2]. In this paper, we argue that the concept of collapsibility can be made clearer by framing the discussion in terms of causal effect measures based on counterfactual variables.

In all the examples, we are interested in the effect of a binary exposure $A$ (e.g. a drug) on a binary outcome $Y$ (e.g. a side effect). We use superscripts to denote counterfactual variables [3]. For example, $Y^{a=1}$ is an indicator for whether an individual would have got the outcome if, possibly contrary to fact, she had been exposed to the drug. We will make a distinction between measures of association, which compare the distribution of the outcomes in the exposed with the distribution of outcomes in the unexposed; and causal measures of effect, which compare the counterfactual distribution under exposure (across everyone) with the counterfactual distribution under the absence of exposure (across everyone). For example, the associational risk difference is $\Pr(Y=1 \mid A=1) - \Pr(Y=1 \mid A=0)$, whereas the causal risk difference ($RD$) is $\Pr(Y^{a=1}=1) - \Pr(Y^{a=0}=1)$. These effect measures may be defined within levels of covariates $V$. We denote this using a subscript on the effect measure: $RD_v = \Pr(Y^{a=1}=1 \mid V=v) - \Pr(Y^{a=0}=1 \mid V=v)$.

## 2 Definitions of Collapsibility

We will adopt Pearl’s definition of collapsibility for measures of association [4]:

###### Definition 1.

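(Collapsibility of a Measure of Association) Let $g[P(x, y)]$ be any functional that measures the association between $Y$ and $X$ in the joint distribution $P(x, y)$. We say that $g$ is collapsible on a variable $V$ with weights $w_v$ if

$$\frac{\sum_v g[P(x, y \mid V=v)]\, w_v}{\sum_v w_v} = g[P(x, y)]$$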

Newman [5] showed conditions under which the associational risk difference, risk ratio, and odds ratio are collapsible according to this definition. He also provided corresponding weights. Briefly, we note that: the associational risk difference is collapsible with weights if is not associated with the outcome in the unexposed, or if is not associated with the exposure; the associational risk ratio is collapsible with weights under similar conditions; and the associational odds ratio is collapsible with weights under certain very limiting conditions, for example if is equal for all values of . A full discussion of the graphical and probabilistic conditions that lead to collapsibility under this definition is provided by Greenland and Pearl [6].

From these results, it follows that general statements about the collapsibility properties of effect measures (e.g. “the risk difference is collapsible”) must either be qualified by the specification of the conditions that are being assumed, or alternatively taken to refer to some other definition of collapsibility. We propose a suitable definition: a causal measure of effect is collapsible if the marginal effect measure is equal to a weighted average of the stratum-specific causal effect measures. This is a formalization of the definition used in Fine Point 4.3 in Hernán and Robins' textbook Causal Inference [3]:

###### Definition 2.

(Collapsibility of a Measure of Causal Effect)

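Let $g[P(y^{a=0}, y^{a=1})]$ be any functional that measures the association between $Y^{a=0}$ and $Y^{a=1}$ in the joint distribution $P(y^{a=0}, y^{a=1})$. We say that $g$ is collapsible on a variable $V$ with weights $w_v$ if

$$\frac{\sum_v g[P(y^{a=0}, y^{a=1} \mid V=v)]\, w_v}{\sum_v w_v} = g[P(y^{a=0}, y^{a=1})]$$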

Under Definition 2, collapsibility is understood as a mathematical property of the effect measure, rather than a consequence of certain graphical or probabilistic structures in the data set. Consequently, results from Greenland and Pearl do not apply under Definition 2, and measures of effect may be collapsible over $V$ even if $V$ is a confounder. Definitions 1 and 2 are not generally equivalent: a set of weights that satisfies Definition 1 may not satisfy Definition 2, and conversely a set of weights that satisfies Definition 2 may not satisfy Definition 1. The definitions are however equivalent if there is both no confounding conditional on $V$ and no confounding unconditionally (i.e. if $Y^{a} \perp\!\!\!\perp A \mid V$ and $Y^{a} \perp\!\!\!\perp A$ for all values of $a$).

Finally, we consider a third related concept, discussed by Miettinen [7], who stated (correctly, but without proof) that the “standardized risk ratio” (SRR), which is constructed by standardizing the risk in the exposed and the risk in the unexposed separately with weights $\Pr(V=v)$ and reporting the ratio of these measures (Formula 4 in Miettinen), is equal to a weighted average of the stratum-specific risk ratios under the weights $\Pr(V=v) \times \Pr(Y=1 \mid A=0, V=v)$ (Formula 6 in Miettinen). Since Miettinen’s SRR is equal to the causal risk ratio if there is no unmeasured confounding, Miettinen’s weights satisfy Definition 2 in the special case of no confounding conditional on $V$.

## 3 Collapsibility of Measures of Causal Effect

### 3.1 Risk difference

The causal risk difference is collapsible over covariates $V$ with respect to weights $w_v$ if $RD = \frac{\sum_v RD_v\, w_v}{\sum_v w_v}$. We next proceed to show that the causal risk difference is collapsible over arbitrary covariates if we use the weights $w_v = \Pr(V=v)$.

First note that the sum of the weights is 1, allowing the denominator to be ignored. Next,

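$$
\begin{aligned}
\sum_v RD_v \Pr(V=v) &= \sum_v \left[\Pr(Y^{a=1}=1 \mid V=v) - \Pr(Y^{a=0}=1 \mid V=v)\right] \Pr(V=v) \\
&= \sum_v \Pr(Y^{a=1}=1 \mid V=v) \Pr(V=v) - \sum_v \Pr(Y^{a=0}=1 \mid V=v) \Pr(V=v) \\
&= \Pr(Y^{a=1}=1) - \Pr(Y^{a=0}=1) \\
&= RD
\end{aligned}
\tag{1}
$$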

Also note that if the risk difference is the same in every stratum (i.e. in the absence of effect modification) the stratum-specific risk differences will also be equal to the marginal risk difference, and the risk difference is collapsible with any weights. It can be shown that this is true for any measure of effect for which there exist weights that guarantee collapsibility over arbitrary covariates.
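As a numerical sanity check of the result above, the following Python sketch compares the marginal causal risk difference with the $\Pr(V=v)$-weighted average of the stratum-specific risk differences. The three strata and their counterfactual risks are hypothetical values chosen for illustration (they are not taken from any data set), and `weighted_average` is a helper name introduced here.

```python
# Hypothetical stratum-level inputs (illustrative values only).
p_v = [0.2, 0.5, 0.3]        # Pr(V = v)
risk1 = [0.20, 0.45, 0.90]   # Pr(Y^{a=1} = 1 | V = v)
risk0 = [0.10, 0.30, 0.60]   # Pr(Y^{a=0} = 1 | V = v)


def weighted_average(values, weights):
    """Weighted average with explicit normalization by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Stratum-specific causal risk differences RD_v.
rd_v = [r1 - r0 for r1, r0 in zip(risk1, risk0)]

# Marginal counterfactual risks are obtained by averaging over Pr(V = v).
marginal_rd = weighted_average(risk1, p_v) - weighted_average(risk0, p_v)

# Weighted average of the stratum-specific risk differences, weights Pr(V = v).
collapsed_rd = weighted_average(rd_v, p_v)

print(marginal_rd, collapsed_rd)
```

Under these assumed inputs the two printed quantities agree up to floating-point error, which is what equation (1) predicts.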

### 3.2 Risk Ratio

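The risk ratio is asymmetric with respect to the coding of the outcome, so it is necessary to consider each risk ratio model separately. These are defined as follows:

$$RR(-) = \frac{\Pr(Y^{a=1}=1)}{\Pr(Y^{a=0}=1)}, \qquad RR(+) = \frac{\Pr(Y^{a=1}=0)}{\Pr(Y^{a=0}=0)}$$

The two risk ratio models require different sets of weights for collapsibility. We next show that the causal risk ratio $RR(-)$ is collapsible over arbitrary covariates if we use the weights $\Pr(V=v \mid Y^{a=0}=1)$, i.e. weights determined by the distribution of the baseline covariates among those individuals who would have been cases if they, possibly contrary to fact, were not treated with drug A.

Our goal is to show that

$$RR(-) = \frac{\sum_v RR(-)_v \Pr(V=v \mid Y^{a=0}=1)}{\sum_v \Pr(V=v \mid Y^{a=0}=1)}$$

Again, we note that the sum of the weights is 1, and that the denominator can therefore be ignored.

$$
\begin{aligned}
\sum_v RR(-)_v \Pr(V=v \mid Y^{a=0}=1) &= \sum_v \frac{\Pr(Y^{a=1}=1 \mid V=v)}{\Pr(Y^{a=0}=1 \mid V=v)} \Pr(V=v \mid Y^{a=0}=1) \\
&= \sum_v \frac{\Pr(Y^{a=1}=1 \mid V=v)}{\Pr(Y^{a=0}=1 \mid V=v)} \times \frac{\Pr(Y^{a=0}=1 \mid V=v) \Pr(V=v)}{\Pr(Y^{a=0}=1)} && \text{(Bayes Theorem)} \\
&= \frac{\sum_v \Pr(Y^{a=1}=1 \mid V=v) \Pr(V=v)}{\Pr(Y^{a=0}=1)} \\
&= \frac{\Pr(Y^{a=1}=1)}{\Pr(Y^{a=0}=1)} = RR(-)
\end{aligned}
\tag{2}
$$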

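This proof is not invariant to the coding of the exposure or outcome variables, and the correct weights will therefore depend on the exact specification of the risk ratio parameter. Analogous proofs can be provided to show that the weights for the risk ratio $\Pr(Y^{a=1}=0)/\Pr(Y^{a=0}=0)$ are given by $\Pr(V=v \mid Y^{a=0}=0)$, the weights for $\Pr(Y^{a=0}=1)/\Pr(Y^{a=1}=1)$ are given by $\Pr(V=v \mid Y^{a=1}=1)$, and the weights for $\Pr(Y^{a=0}=0)/\Pr(Y^{a=1}=0)$ are given by $\Pr(V=v \mid Y^{a=1}=0)$.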
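To illustrate what such an analogous proof might look like, the derivation below sketches the case of the risk ratio for the complement of the outcome, $\Pr(Y^{a=1}=0)/\Pr(Y^{a=0}=0)$, with candidate weights $\Pr(V=v \mid Y^{a=0}=0)$. It follows the same steps as the argument for the outcome-coded risk ratio and is offered as a worked example, not as the authors' own derivation.

$$
\begin{aligned}
\sum_v \frac{\Pr(Y^{a=1}=0 \mid V=v)}{\Pr(Y^{a=0}=0 \mid V=v)} \Pr(V=v \mid Y^{a=0}=0)
&= \sum_v \frac{\Pr(Y^{a=1}=0 \mid V=v)}{\Pr(Y^{a=0}=0 \mid V=v)} \times \frac{\Pr(Y^{a=0}=0 \mid V=v) \Pr(V=v)}{\Pr(Y^{a=0}=0)} \\
&= \frac{\sum_v \Pr(Y^{a=1}=0 \mid V=v) \Pr(V=v)}{\Pr(Y^{a=0}=0)} \\
&= \frac{\Pr(Y^{a=1}=0)}{\Pr(Y^{a=0}=0)}
\end{aligned}
$$

In each case the weights condition on the counterfactual event that appears in the denominator of the ratio, which is why recoding the exposure or the outcome changes the weights.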

Note that the marginal causal risk ratio is generally not equal to a weighted average of the conditional causal risk ratios if the weights are determined by the marginal distribution of the covariates, $\Pr(V=v)$. Exceptions occur in special situations, such as when the risk ratio is equal in every stratum (i.e. when there is no effect modification on the risk ratio scale).
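The contrast between the two choices of weights can be checked numerically. The Python sketch below uses hypothetical counterfactual risks in three strata (illustrative values only) and compares the marginal causal risk ratio with two weighted averages of the stratum-specific risk ratios: one using weights $\Pr(V=v \mid Y^{a=0}=1)$, obtained from Bayes' theorem, and one using the marginal weights $\Pr(V=v)$. All variable names are introduced here for illustration.

```python
# Hypothetical stratum-level inputs (illustrative values only).
p_v = [0.2, 0.5, 0.3]        # Pr(V = v)
risk1 = [0.20, 0.45, 0.90]   # Pr(Y^{a=1} = 1 | V = v)
risk0 = [0.10, 0.30, 0.60]   # Pr(Y^{a=0} = 1 | V = v)

# Marginal counterfactual risks and the marginal causal risk ratio.
marginal_risk1 = sum(r * p for r, p in zip(risk1, p_v))
marginal_risk0 = sum(r * p for r, p in zip(risk0, p_v))
marginal_rr = marginal_risk1 / marginal_risk0

# Stratum-specific causal risk ratios.
rr_v = [r1 / r0 for r1, r0 in zip(risk1, risk0)]

# Weights Pr(V = v | Y^{a=0} = 1), computed with Bayes' theorem.
case_weights = [r0 * p / marginal_risk0 for r0, p in zip(risk0, p_v)]

# Weighted average using the counterfactual-case weights.
collapsed_rr = sum(rr * w for rr, w in zip(rr_v, case_weights))

# Weighted average using the marginal covariate distribution Pr(V = v).
naive_rr = sum(rr * p for rr, p in zip(rr_v, p_v))

print(marginal_rr, collapsed_rr, naive_rr)
```

With these assumed inputs, the average based on $\Pr(V=v \mid Y^{a=0}=1)$ reproduces the marginal risk ratio, while the average based on $\Pr(V=v)$ does not, because the risk ratio differs between strata.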

### 3.3 Odds Ratio

For all the previously discussed parameters, we have shown that for any baseline covariates V, there exist weights such that the marginal effect measure is equal to a weighted average of the stratum-specific effects. We will now show that this does not hold for the odds ratio by considering the following simple counterexample:

Consider a population with 25 percent men and 75 percent women, in which a randomized trial is conducted on the effect of drug A. The hypothetical results are shown in Table 1. The randomization probability is equal in men and women and the sample size is infinite; there is therefore no confounding.

| | Average Counterfactual Risk of Outcome (Placebo) | Average Counterfactual Risk of Outcome (Treatment) | Odds Ratio |
|---|---|---|---|
| Men (25 Percent) | 0.5 | 0.75 | 3 |
| Women (75 Percent) | 0.25 | 0.5 | 3 |
| Overall | 0.3125 | 0.5625 | 2.83 |

This table shows that for the variable sex, the stratum-specific causal odds ratios are equal between men and women, but the overall causal odds ratio is different from the stratum-specific odds ratios. Moreover, since any weighted average of the stratum-specific odds ratios is 3, there does not exist any set of weights that makes the odds ratio collapsible over sex. This counterexample shows that no generally applicable weights such as those for the risk difference and the risk ratio can be provided for the odds ratio.
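The arithmetic behind the counterexample can be reproduced with a few lines of Python. The sketch below takes the stratum shares and counterfactual risks from Table 1 and recomputes the overall risks and the three odds ratios; the function name `odds_ratio` and the dictionary layout are choices made here for illustration.

```python
def odds_ratio(risk_treated, risk_placebo):
    """Odds ratio comparing the risk under treatment with the risk under placebo."""
    odds_treated = risk_treated / (1 - risk_treated)
    odds_placebo = risk_placebo / (1 - risk_placebo)
    return odds_treated / odds_placebo


# Table 1: (population share, risk under placebo, risk under treatment).
strata = {
    "men": (0.25, 0.50, 0.75),
    "women": (0.75, 0.25, 0.50),
}

# Overall counterfactual risks are share-weighted averages of the stratum risks.
overall_placebo = sum(share * placebo for share, placebo, _ in strata.values())
overall_treated = sum(share * treated for share, _, treated in strata.values())

# Stratum-specific and marginal causal odds ratios.
stratum_or = {
    name: odds_ratio(treated, placebo)
    for name, (_, placebo, treated) in strata.items()
}
marginal_or = odds_ratio(overall_treated, overall_placebo)

print(stratum_or, overall_placebo, overall_treated, marginal_or)
```

Both stratum-specific odds ratios equal 3 (up to floating-point error), whereas the marginal odds ratio is 99/35, roughly 2.83. Because every weighted average of two values equal to 3 is itself 3, no choice of weights can recover the marginal value.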

## 4 Identification of the Weights

If the investigator intends to report an average of the stratum-specific effects as an estimate of the marginal effect, it is not enough to know that the effect is collapsible in principle; it is also necessary to construct appropriate weights, identify them from the data and apply them in the analysis. The weights for the risk ratio, $\Pr(V=v \mid Y^{a=0}=1)$, have a counterfactual variable in the conditioning event, and may not be identified from the data. However, we proceed to show that the weights are identified in the absence of unmeasured confounding, i.e. if $Y^{a} \perp\!\!\!\perp A \mid V$.

###### Proof.

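$$
\begin{aligned}
\Pr(V=v \mid Y^{a=0}=1) &= \frac{\Pr(Y^{a=0}=1 \mid V=v) \Pr(V=v)}{\Pr(Y^{a=0}=1)} && \text{(Bayes Theorem)} \\
&= \frac{\Pr(Y^{a=0}=1 \mid A=0, V=v) \Pr(V=v)}{\Pr(Y^{a=0}=1)} && \text{(Exchangeability)} \\
&= \frac{\Pr(Y=1 \mid A=0, V=v) \Pr(V=v)}{\Pr(Y^{a=0}=1)} && \text{(Consistency)}
\end{aligned}
\tag{3}
$$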

∎

$\Pr(Y^{a=0}=1)$ is constant over $v$ and can therefore be factored out of the weights. In the absence of confounding, the weights $\Pr(V=v \mid Y^{a=0}=1)$ are therefore equivalent to Miettinen’s weights $\Pr(Y=1 \mid A=0, V=v) \times \Pr(V=v)$ as discussed earlier.
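The identification result can be illustrated with a small hypothetical observational population in which exposure depends on a measured covariate. The Python sketch below assumes no unmeasured confounding given $V$, so that the observed stratum-specific risks stand in for the counterfactual ones. It builds weights proportional to $\Pr(Y=1 \mid A=0, V=v) \times \Pr(V=v)$ and compares the resulting weighted average of stratum-specific risk ratios with the standardized risk ratio and with the crude (unadjusted) risk ratio. All numbers and variable names are illustrative assumptions.

```python
# Hypothetical observational population with a measured confounder V.
p_v = [0.2, 0.5, 0.3]                # Pr(V = v)
p_a_given_v = [0.8, 0.5, 0.2]        # Pr(A = 1 | V = v): exposure depends on V
risk_exposed = [0.20, 0.45, 0.90]    # Pr(Y = 1 | A = 1, V = v)
risk_unexposed = [0.10, 0.30, 0.60]  # Pr(Y = 1 | A = 0, V = v)

# Assumption: no unmeasured confounding given V, so by exchangeability and
# consistency Pr(Y^{a} = 1 | V = v) = Pr(Y = 1 | A = a, V = v).

# Weights proportional to Pr(Y = 1 | A = 0, V = v) * Pr(V = v), normalized to sum to 1.
raw = [r0 * p for r0, p in zip(risk_unexposed, p_v)]
weights = [w / sum(raw) for w in raw]

# Weighted average of the stratum-specific risk ratios.
rr_v = [r1 / r0 for r1, r0 in zip(risk_exposed, risk_unexposed)]
weighted_rr = sum(rr * w for rr, w in zip(rr_v, weights))

# Standardized risk ratio: both risks standardized to Pr(V = v).
standardized_rr = (
    sum(r1 * p for r1, p in zip(risk_exposed, p_v))
    / sum(r0 * p for r0, p in zip(risk_unexposed, p_v))
)

# Crude risk ratio, which ignores V and is confounded in this population.
p_exposed = sum(p * pa for p, pa in zip(p_v, p_a_given_v))
crude_risk1 = sum(r1 * p * pa for r1, p, pa in zip(risk_exposed, p_v, p_a_given_v)) / p_exposed
crude_risk0 = sum(
    r0 * p * (1 - pa) for r0, p, pa in zip(risk_unexposed, p_v, p_a_given_v)
) / (1 - p_exposed)
crude_rr = crude_risk1 / crude_risk0

print(weighted_rr, standardized_rr, crude_rr)
```

In this constructed example the weighted average and the standardized risk ratio coincide, while the crude risk ratio is close to 1 and far from both, reflecting confounding by $V$.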

An alternative identification of the weights can be used if standardizing experimental results to a population where everyone is unexposed. In such situations, $Y = Y^{a=0}$ in all individuals by consistency, and the weights in the target population are identified as $\Pr(V=v \mid Y=1)$.

## 5 Discussion

We have reviewed well-established results from previous work on the collapsibility of measures of association, and shown corresponding results for causal measures of effect. With these causal effect measures, one is able to disentangle the components of non-collapsibility that are due to the mathematical properties of the effect measure from the components that are due to structural bias and the probabilistic structure of the dataset. We have provided new, simple weights for the causal risk ratio, which guarantee collapsibility over arbitrary baseline covariates, and showed that such weights do not exist for the causal odds ratio.

Our weights for the risk ratio are equivalent to the weights previously discussed by Miettinen when there is no unmeasured confounding; in other words, in all situations where standardizing over V provides a valid estimate of the causal effect. However, our formulation allows much simpler presentation of the weights and of the proofs. Furthermore, our formulation highlights pitfalls of using weighted averages: When conditioning on V, the correct weights cannot be estimated from the data if unmeasured confounding is present. In such scenarios, using erroneous weights may amplify the bias that is caused by unmeasured confounding within the strata.

Finally, we note that in many cases it is possible to sidestep the collapsibility of the effect measure entirely, by standardizing the distributions of $Y^{a=1}$ and $Y^{a=0}$ separately. One way to do this is by reporting the overall marginal risk ratio as

$$RR = \frac{\sum_v \Pr(Y=1 \mid A=1, V=v) \Pr(V=v)}{\sum_v \Pr(Y=1 \mid A=0, V=v) \Pr(V=v)}$$

Since this procedure does not depend on collapsibility, analogous procedures are valid for any effect measure, including the odds ratio.
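A minimal Python sketch of this approach is given below: each arm's risk is standardized to the covariate distribution first, and the effect measure is computed from the two standardized risks afterwards. The inputs are hypothetical stratum-specific observed risks, assumed to be free of unmeasured confounding given $V$, and `standardized_risk` is a helper name introduced here.

```python
def standardized_risk(stratum_risks, p_v):
    """Average the stratum-specific risks over the covariate distribution Pr(V = v)."""
    return sum(r * p for r, p in zip(stratum_risks, p_v))


# Hypothetical inputs (illustrative values only).
p_v = [0.2, 0.5, 0.3]                # Pr(V = v)
risk_exposed = [0.20, 0.45, 0.90]    # Pr(Y = 1 | A = 1, V = v)
risk_unexposed = [0.10, 0.30, 0.60]  # Pr(Y = 1 | A = 0, V = v)

# Standardize each arm separately.
risk1 = standardized_risk(risk_exposed, p_v)
risk0 = standardized_risk(risk_unexposed, p_v)

# Any effect measure can then be computed from the two standardized risks.
marginal_rd = risk1 - risk0
marginal_rr = risk1 / risk0
marginal_or = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))

print(marginal_rd, marginal_rr, marginal_or)
```

No averaging of stratum-specific effect measures takes place in this sketch, so the question of which weights make a given measure collapsible never arises.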

## References

- [1] Alice S Whittemore. Collapsibility of Multidimensional Contingency Tables. Journal of the Royal Statistical Society. Series B, 1978.
- [2] Sander Greenland, Judea Pearl, and James M. Robins. Confounding and Collapsibility in Causal Inference. Statistical Science, 14(1):29–46, 1999.
- [3] Miguel A Hernán and James M Robins. Causal Inference. Chapman & Hall/CRC, Boca Raton, 2016.
- [4] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, Cambridge, 2nd edition, 2009.
- [5] Stephen C. Newman. Biostatistical methods in epidemiology. John Wiley & Sons, 2001.
- [6] Sander Greenland and Judea Pearl. Adjustments and their Consequences-Collapsibility Analysis using Graphical Models. International Statistical Review, 79(3):401–426, 2011.
- [7] Olli S Miettinen. Standardization of risk ratios. American Journal of Epidemiology, 96(6):383–8, 1972.

## Acknowledgement

The authors thank James Robins for pointing out the link between the weights proposed in this paper, and previously published weights due to Miettinen.

## Author Contributions

AH had the original idea, provided the original version of the theorems and proofs, wrote the first draft of the manuscript and coordinated the research project. MJS and ES contributed original intellectual content and extensively restructured and revised the manuscript. All authors approved the final version of the manuscript.