Additive hazard and MSMs in continuous time

# The additive hazard estimator is consistent for continuous time marginal structural models

Pål C. Ryalen, Mats J. Stensrud, and Kjetil Røysland

Department of Biostatistics, University of Oslo, Domus Medica Gaustad, Sognsvannsveien 9, 0372 Oslo, Norway

July 11, 2019
###### Abstract.

Marginal structural models (MSMs) allow for causal interpretations of longitudinal data. The standard MSM is based on discrete time models, but the continuous time MSM is a conceptually appealing alternative for survival analysis. In particular, the additive hazard model allows for flexible estimation of treatment weights in continuous time MSMs. In applied analyses, it is often assumed that the theoretical treatment weights are known, but usually these weights are fundamentally unknown and must be estimated from the data. Here we provide a sufficient condition for continuous time MSMs to be consistent even when the weights are estimated, and we show how additive hazard models can be used to estimate such weights. Our results suggest that continuous time weights perform better than discrete weights when the underlying process is continuous. Furthermore, we may wish to transform effect estimates of hazards to other scales that are easier to interpret causally, and here we show that a general transformation strategy can also be used on weighted cumulative hazard estimates. Finally, we explain how this strategy can be applied to data using our R package ahw.

## 1. Outline

Marginal structural models (MSMs) can yield causal effect estimates in the presence of confounders, which may, for example, be time-dependent [robins2000marginal]. The procedure is particularly appealing because it allows for a sharp distinction between confounder adjustment and model selection [joffe2004model]: First, we adjust for observed confounders by weighting the observed data to obtain balanced pseudopopulations. Then, we calculate marginal effect estimates from these pseudopopulations based on our structural model.
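The weighting step can be illustrated with a deliberately simple point-treatment toy example. This is only a sketch: it uses a single binary treatment and one binary confounder rather than the continuous time weights studied in this article, the data are made up, and the names `iptw_weights` and `confounder_mean_in_arm` are hypothetical. It also assumes that every confounder level contains both treated and untreated subjects.

```python
from collections import defaultdict


def iptw_weights(treatment, confounder):
    """Unstabilized weights 1 / P(A = a_i | L = l_i), with the conditional
    probabilities estimated by empirical frequencies within confounder levels."""
    counts = defaultdict(lambda: [0, 0])  # level -> [n untreated, n treated]
    for arm, level in zip(treatment, confounder):
        counts[level][arm] += 1
    weights = []
    for arm, level in zip(treatment, confounder):
        n_level = sum(counts[level])
        # n_level / count equals 1 / (empirical P(A = arm | L = level))
        weights.append(n_level / counts[level][arm])
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up data: treatment is more common when the confounder equals 1.
confounder = [0, 0, 0, 0, 1, 1, 1, 1]
treatment = [1, 0, 0, 0, 1, 1, 1, 0]
weights = iptw_weights(treatment, confounder)


def confounder_mean_in_arm(arm, use_weights):
    idx = [i for i, a in enumerate(treatment) if a == arm]
    w = [weights[i] if use_weights else 1.0 for i in idx]
    return weighted_mean([confounder[i] for i in idx], w)


# Difference in confounder prevalence between treated and untreated subjects.
raw_gap = confounder_mean_in_arm(1, False) - confounder_mean_in_arm(0, False)
weighted_gap = confounder_mean_in_arm(1, True) - confounder_mean_in_arm(0, True)
```

In the raw data the confounder is unevenly distributed across the two arms, whereas in the weighted pseudopopulation the difference vanishes up to rounding, which is what "balanced" means in this toy setting.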

Traditional MSM techniques for survival analysis have treated time as discrete [hernan2000marginal]. In particular, inverse probability of treatment weights (IPTWs) are used to create the pseudopopulations, and then, for example, several successive logistic regressions are fitted over discrete time intervals to mimic a proportional hazards model.

However, time is naturally perceived as continuous, and it also seems natural to analyse time-to-event outcomes with continuous time models. Inspired by the discrete time MSMs, Røysland [roysland2011] therefore suggested a continuous time analogue to MSMs. As for the discrete MSMs, we have shown that the continuous MSM may obtain consistent effect estimates when the theoretical treatment weights are known [roysland2011]. In particular, additive hazard regressions can be weighted with the theoretical continuous time weights to yield consistent effect estimates [ryalen2017transforming]. Nevertheless, the weights are usually unknown in real life, and must be estimated from the data. To the best of our knowledge, the performance of continuous time MSMs when the weights are estimated remains to be elucidated.

In this article, we show that continuous time MSMs also perform desirably when the treatment weights are estimated from the data: We provide a sufficient condition to ensure that weighted additive hazard regressions are consistent. Furthermore, we show how such weighted hazard estimates may be consistently transformed to other parameters that are easier to interpret causally. To do this, we use stability theory of SDEs, which allows us to find a range of parameters expressed as solutions of ordinary differential equations; many examples can be found in [ryalen2017transforming]. This is immediately appealing for causal survival analysis: First, we can use hazard models, which are convenient for regression modeling, to obtain weights. Second, although estimates on the hazard scale are hard to interpret causally per se [robins1989probability, hernan2010hazards, aalen2015does, stensrud2017exploring], we present a generic method to consistently transform these effect estimates to several other scales that are easier to interpret.
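To make the transformation idea concrete, consider the survival function, which solves the standard equation $S_t = 1 - \int_0^t S_{s-} \, dA_s$ for a cumulative hazard $A$. The sketch below plugs a piecewise constant cumulative hazard estimate into this equation and solves it jump by jump. It is a minimal illustration under assumptions: the function name `survival_from_cumulative_hazard` and the increments are hypothetical, and the general method of [ryalen2017transforming] covers a much wider class of parameters.

```python
def survival_from_cumulative_hazard(increments):
    """Solve S_t = 1 - int_0^t S_{s-} dA_s along the jumps of a piecewise
    constant cumulative hazard estimate A with the given increments."""
    survival = 1.0
    path = []
    for jump in increments:
        # dS = -S_{-} dA: the left limit of S multiplies the hazard increment.
        survival += -survival * jump
        path.append(survival)
    return path


# Hypothetical increments of an estimated cumulative hazard at three event times.
hazard_increments = [0.1, 0.2, 0.25]
survival_path = survival_from_cumulative_hazard(hazard_increments)
```

With these increments the survival path is approximately 0.9, 0.72 and 0.54. The same recursion applies unchanged when the increments come from a weighted cumulative hazard estimate.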

The continuous time weights and the causal parameters can be estimated with the R package ahw. We show that the ahw weight estimator, which is based on additive hazard models, satisfies the criterion that ensures consistency in Theorem LABEL:thm:ahwConsist. The ahw package makes continuous time marginal structural modeling simple to implement for applied researchers.

## 2. Weighted additive hazard regression

### 2.1. Hypothetical scenarios and likelihood ratios

Suppose we have observational event-history data in which we follow $n$ i.i.d. subjects over time, and for each subject $i$ let one counting process count treatment and another, $N^{i,D}$, count the outcome of interest. Each of these counting processes has an associated at-risk process. We let $V_0$ be the collection of baseline variables, as well as the treatment and outcome processes. $L$ denotes the processes that may influence or be influenced by the components of $V_0$, but that are not of immediate interest for the outcome; i.e. we want to marginalize over $L$. We could allow for possibly dependent censoring, see [Andersen, III.2.1], but for ease of presentation we initially consider independent censoring. We will show how our methods can be applied in some scenarios with dependent censoring in Section LABEL:sec:censoring_weights.

We let $\mathcal{F}^{i, V_0 \cup L}_t$ denote the filtration generated by all the observable events for individual $i$. Moreover, let $P^i$ denote the probability measure that governs the frequency of observations of these events, and consider the intensity of $N^{i,D}$ with respect to $P^i$ and the filtration $\mathcal{F}^{i, V_0 \cup L}_t$.

We are not primarily interested in the observed risk of individual $i$ having an event of type $D$. Instead, we want to estimate the risk of such an event in a hypothetical situation where we had intervened according to some specified strategy. Suppose that the frequency of observations we would have seen in this hypothetical scenario had been governed by another probability measure $\tilde{P}^i$. Furthermore, we assume that all the individuals are also i.i.d. in this hypothetical scenario, and that $\tilde{P}^i \ll P^i$, i.e. that there exists a likelihood ratio

$$
R^i_t := \frac{d\tilde{P}^i|_{\mathcal{F}^{i, V_0 \cup L}_t}}{dP^i|_{\mathcal{F}^{i, V_0 \cup L}_t}}
$$

for each time $t$. We will later describe how a detailed form of $R^i_t$ can be obtained. It relies on the assumption that the underlying model is causal, a concept we define in Section LABEL:section:causal_validity. For the moment we will not require this, only that the same process defines the intensity of $N^{i,D}$ with respect to $\mathcal{F}^{i, V_0 \cup L}_t$ under both $P^i$ and $\tilde{P}^i$; that is, the functional form of this intensity is identical under both measures.

Suppose that $N^{i,D}$ has an additive hazard with respect to $\tilde{P}^i$ and the filtration $\mathcal{F}^{i, V_0}_t$ generated by the variables and processes in $V_0$ together with the at-risk process for the outcome. We stress that we consider the intensity process marginalized over $L$, so that it is defined with respect to $\mathcal{F}^{i, V_0}_t$, and not $\mathcal{F}^{i, V_0 \cup L}_t$. In other words, we assume that this hazard is of the (additive) form:

$$
X^{i\intercal}_{t-} b_t, \tag{1}
$$

where $b_t$ is a bounded and continuous vector-valued function, and the components of $X^i$ are covariate processes or baseline variables from $V_0$.
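As a minimal worked instance of form (1), assume (hypothetically) that the covariate vector consists of an intercept and a baseline treatment indicator, $X^i_t = (1, A^i)^\intercal$. Neither this choice nor the symbol $A^i$ is taken from the text above; it is only meant to show how the additive form splits the hazard into interpretable pieces:

$$
X^{i\intercal}_{t-} b_t = b^0_t + b^1_t A^i, \qquad b_t = (b^0_t, b^1_t)^\intercal.
$$

Under this assumption $b^0_t$ is the hazard of an untreated subject, $b^1_t$ is the hazard difference between treated and untreated subjects at time $t$, and $\int_0^t b^1_s \, ds$ is the corresponding cumulative hazard difference.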

### 2.2. Re-weighted additive hazard regression

Our main goal is to estimate the cumulative coefficient function in (1), i.e.

$$
B_t := \int_0^t b_s \, ds \tag{2}
$$

from the observational data distributed according to $P^i$. If we had known all the true likelihood ratios $R^i_t$, we could try to estimate (2) by re-weighting each individual in Aalen's additive hazard regression [Andersen, VII.4] according to its likelihood ratio. However, the true weights are unlikely to be known, even if the model is causal. In real-life situations, we can only hope to have consistent estimators for these weights. We therefore assume that for each $n$ we have adapted estimates of $R^i_t$ that converge to $R^i_t$ in probability as $n$ increases. We will see that, under relatively weak assumptions, Aalen's additive hazard regression, re-weighted according to these estimates, indeed gives consistent estimates of the cumulative hazard we would have seen in the hypothetical scenario. The estimator we will consider is defined as follows: Let $N^{(n)}$ be the vector of counting processes and $X^{(n)}$ the matrix containing the $X^i$'s, that is,

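$$
N^{(n)}_t := \begin{pmatrix} N^{1,D}_t \\ \vdots \\ N^{n,D}_t \end{pmatrix}
\quad \text{and} \quad
X^{(n)}_s := \begin{pmatrix} X^{1,1}_s & \dots & X^{1,p}_s \\ \vdots & & \vdots \\ X^{n,1}_s & \dots & X^{n,p}_s \end{pmatrix}, \tag{3}
$$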

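and let $Y^{(n),D}_s$ denote the $n$-dimensional diagonal matrix whose $i$'th diagonal element is the estimated weight for subject $i$ multiplied by that subject's at-risk indicator for the outcome. The weighted additive hazard regression is given by: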

$$
B^{(n)}_t := \int_0^t \left( X^{(n)\intercal}_{s-} Y^{(n),D}_s X^{(n)}_{s-} \right)^{-1} X^{(n)\intercal}_{s-} Y^{(n),D}_s \, dN^{(n)}_s. \tag{4}
$$
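Estimator (4) is a weighted least squares step carried out at every observed event time and then summed. The following sketch shows that computation under simplifying assumptions that are not part of the text above: covariates and weights are constant in time, there are no tied event times, and the diagonal of the weight matrix is taken to be the weight times the at-risk indicator. The function names and the data are hypothetical, and this is not the implementation used in the ahw package.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design: too few subjects at risk")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[k][size] / aug[k][k] for k in range(size)]


def weighted_aalen(times, events, covariates, weights):
    """Return (event_time, cumulative coefficient vector) pairs.

    times[i]      : follow-up time of subject i
    events[i]     : 1 if the outcome occurred at times[i], 0 if censored
    covariates[i] : covariate vector of subject i (time-constant here)
    weights[i]    : estimated weight of subject i (time-constant here)
    """
    n, p = len(times), len(covariates[0])
    cumulative = [0.0] * p
    path = []
    event_subjects = sorted((i for i in range(n) if events[i] == 1),
                            key=lambda i: times[i])
    for j in event_subjects:
        at_risk = [i for i in range(n) if times[i] >= times[j]]
        # X^T W X over the risk set, with W = diag(weight * at-risk indicator).
        gram = [[sum(weights[i] * covariates[i][a] * covariates[i][b]
                     for i in at_risk) for b in range(p)] for a in range(p)]
        # X^T W dN: only subject j jumps at this event time.
        score = [weights[j] * covariates[j][a] for a in range(p)]
        increment = solve_linear_system(gram, score)
        cumulative = [c + d for c, d in zip(cumulative, increment)]
        path.append((times[j], list(cumulative)))
    return path


# Made-up data: intercept plus a binary treatment indicator, unit weights.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 0]
treatment = [0, 1, 0, 1, 0, 1]
covariates = [[1.0, float(a)] for a in treatment]
weights = [1.0] * 6
path = weighted_aalen(times, events, covariates, weights)
```

With unit weights and an intercept only, each increment reduces to one over the number at risk, i.e. the Nelson-Aalen estimator; non-unit weights replace those counts by weighted sums. In practice the weights would vary over time and be evaluated just before each event time.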

#### 2.2.1. Parameters that are transformations of cumulative hazards

It has recently been emphasized, see e.g. [hernan2010hazards], that the common interpretation of the hazard at time $t$ as the risk of death during a short interval following $t$, for an individual who is alive at $t$, is often not valid. A simple example in [aalen2015does] shows that this can also be a problem in RCTs. Consider a counting process that jumps at the time of the event of interest, a randomized treatment, and an unobserved frailty; we could imagine that this situation would be described by the causal diagram:
