Validity of time reversal for testing Granger causality
Abstract
Inferring causal interactions from observed data is a challenging problem, especially in the presence of measurement noise. To alleviate the problem of spurious causality, Haufe et al. (2013) proposed to contrast measures of information flow obtained on the original data against the same measures obtained on time-reversed data. They show that this procedure, time-reversed Granger causality (TRGC), robustly rejects causal interpretations on mixtures of independent signals. While promising results have been achieved in simulations, it was so far unknown whether time reversal leads to valid measures of information flow in the presence of true interaction. Here we prove that, for linear finite-order autoregressive processes with unidirectional information flow between two variables, the application of time reversal for testing Granger causality indeed leads to correct estimates of information flow and its directionality. Using simulations, we further show that TRGC is able to infer correct directionality with similar statistical power as the net Granger causality between two variables, while being much more robust to the presence of measurement noise.
I Introduction
The estimation of causal relations between time series is a signal processing topic promising to enhance our understanding of dynamical systems in numerous application domains. For data with time structure, the concept of Granger causality (GC) has gained popularity as a simple testable definition of causality based on temporal precedence. Signal processing techniques based on Granger causality have been studied in a variety of fields such as econometrics [1], neuroscience [2, 3, 4, 5], and climate science [6, 7].
In its original formulation, a time series $x$ is said to Granger-cause a time series $y$ if the past of $x$ helps to predict $y$ above what can be predicted by using ‘all other information in the universe’ besides the past of $x$ [8]. In the bivariate framework, it is common to consider only the information contained in the past of $x$ and $y$ (cf. [9]).
A serious problem for the estimation of information flow using Granger causality is that spurious Granger causality can occur due to measurement noise. On the one hand, if two sensors measuring the same signal are superimposed with noise, they mutually help to predict each other’s future [10, 11]. This is a problem especially in the study of brain connectivity using noninvasive electrophysiology, where the activity at a given sensor is typically a mixture of contributions from several neuronal sources due to the volume conduction of electric currents in the head [12, 13, 14, 15, 16]. On the other hand, noise that is correlated across sensors has a similar adverse effect on estimates of directed interaction even if the actual signals of interest are not mixed into different sensors [17, 18]. Such spurious causality can occur in any measure based on the concept of Granger causality, including multivariate [19, 20] and nonlinear [21, 22, 23] variants.
Recently, a number of ways to make causality estimates more robust to the presence of mixed signals and noise have been proposed. These include novel measures of directed information flow [12, 10, 11] as well as novel ways of assessing their statistical significance [24, 23, 17, 16, 18]. In particular, Haufe et al. [17, 16] suggested contrasting causality scores obtained on the original time series with those obtained on time-reversed signals. The intuitive idea behind this approach is that, if temporal order is crucial to tell a driver from a recipient, directed information flow should be reduced (if not reversed) if the temporal order is reversed. In fact, Haufe et al. showed that for correlated, but non-interacting signals, the use of time reversal for testing Granger causality scores (here referred to as time-reversed Granger causality, TRGC) and other metrics based on cross-spectral estimates or linear autoregressive modeling correctly leads to rejection of causal interpretations. This was confirmed for Granger causality in an independent simulation study [18] showing that TRGC leads to a much smaller fraction of false positive detections compared to the original Granger causality index, and also compares favorably against the Phase Slope Index (PSI) [11].
While time-reversed Granger causality thus displays an intriguing noise robustness property, and yields very encouraging results in simulations, its behavior in the presence of causal interactions is still poorly understood. In particular, it is currently unclear how Granger causality scores computed on time-reversed signals link to the causal interactions on the original time series, and therefore whether TRGC correctly indicates the direction of causality. Theoretical guarantees have only been derived for special cases in which either the signal’s auto- and cross-covariances are very small in magnitude, or in which both signals have very similar autocorrelations [18].
The aim of this paper is twofold. In the theory section, we provide new theoretical insights on time reversal for testing Granger causality between two variables. After introducing the concepts of linear autoregressive modeling, Granger causality, and time-reversed Granger causality (Sections II-A and II-B), we elaborate on the existing result of Haufe et al. [16] showing that, for mixtures of independent signals, causality measures based on cross-covariances are invariant to the reversal of the temporal order (Section II-C). This is the theoretical basis for the noise-robustness property of time-reversal testing of causality scores. We then investigate the time reversal of a process fulfilling the assumptions typically made by Granger causality estimators: a finite-order vector autoregressive (VAR) process that is unaffected by measurement noise. We review what is known about the time reversal of a VAR process (Section II-D), based on which we provide an analytic description of Granger causality scores of time-reversed signals in terms of their autoregressive coefficients (Section II-E) and a minimal example (Section II-F). Using these insights, we prove our main result stating that, in the case of unambiguous unidirectional information flow from $x$ to $y$, time reversal leads to a decrease of the Granger-causal net information flow relative to the original time series. The difference of net Granger causality scores obtained on original and time-reversed data thus indicates the correct direction of interaction (Section II-G).
In the second part of the paper (Section III), we revisit scenarios known to cause problems for conventional Granger causality. Using simulations, we illustrate when and how the theoretical guarantees of TRGC lead to measurable performance increases in practice. We point out the implications of our theoretical and empirical results in Section IV, along with a discussion of ambiguities in causal interpretation caused by the presence of correlated residuals in VAR models.
II Theory
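Vectors are considered to be column vectors (unless otherwise stated), and are generally typed in bold. The symbol $\top$ denotes the transpose operator, $I$ the identity matrix, and $[\cdot,\cdot]$ concatenation. The symbol $\otimes$ refers to the Kronecker product, and $\operatorname{vec}$ to the vectorization operator, which converts a matrix into a column vector. The symbol $\mathbb{E}$ denotes expectation. The cross-covariance matrices of a stationary (zero-mean) process $\mathbf{z}_t$ are denoted by
$$\Gamma_h := \mathbb{E}\left[\mathbf{z}_t \mathbf{z}_{t-h}^\top\right].$$
We use the notation $\mathbf{z}_t$ both for an observed time series and its underlying data generating process. We denote all quantities related to the time-reversed process with a tilde.
A process $\boldsymbol{\epsilon}_t$ is said to be white noise if it is stationary with mean zero, finite covariance and zero autocorrelation; that is, if $\mathbb{E}[\boldsymbol{\epsilon}_t] = 0$, $\mathbb{E}[\boldsymbol{\epsilon}_t \boldsymbol{\epsilon}_t^\top] = \Sigma$ is finite, and $\mathbb{E}[\boldsymbol{\epsilon}_t \boldsymbol{\epsilon}_{t-h}^\top] = 0$ for all $h \neq 0$. Note that the covariance matrix $\Sigma$ is not necessarily diagonal, and that neither independence nor joint Gaussianity is required.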
II-A Granger causality and the linear VAR model
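Consider a stable bivariate vector autoregressive process of lag order $P$ (VAR($P$) process), $\mathbf{z}_t = [x_t, y_t]^\top$,
$$\mathbf{z}_t = \sum_{p=1}^{P} A_p \mathbf{z}_{t-p} + \boldsymbol{\epsilon}_t, \tag{1}$$
where $\boldsymbol{\epsilon}_t$ is a 2-dimensional white noise process (that is, $\mathbb{E}[\boldsymbol{\epsilon}_t] = 0$, $\mathbb{E}[\boldsymbol{\epsilon}_t \boldsymbol{\epsilon}_s^\top] = \Sigma$ for $s = t$, and $\mathbb{E}[\boldsymbol{\epsilon}_t \boldsymbol{\epsilon}_s^\top] = 0$ for $s \neq t$) with residual covariance matrix
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \tag{2}$$
The noise variables $\boldsymbol{\epsilon}_t$ are also called innovations or residuals. Stability requires that $\det(I - A_1 \lambda - \dots - A_P \lambda^P) \neq 0$ for all $\lambda \in \mathbb{C}$ with $|\lambda| \leq 1$.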
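Following [25], $x_t$ and $y_t$ themselves possess autoregressive (AR) representations, which we denote by
$$x_t = \sum_{p=1}^{\infty} a^{x}_{p}\, x_{t-p} + \epsilon^{x}_{t}, \qquad \operatorname{Var}(\epsilon^{x}_{t}) =: \Sigma^{x}, \tag{3}$$
$$y_t = \sum_{p=1}^{\infty} a^{y}_{p}\, y_{t-p} + \epsilon^{y}_{t}, \qquad \operatorname{Var}(\epsilon^{y}_{t}) =: \Sigma^{y}. \tag{4}$$
The residuals $\epsilon^{x}_{t}$ and $\epsilon^{y}_{t}$ of these two univariate processes are each serially uncorrelated, but may be correlated with each other at various lags. Importantly, even though the bivariate autoregressive process (1) is of finite order, the univariate processes (3) and (4) are in general of infinite order. We refer to (1) as the unrestricted or full model, while (3) and (4) contain the restricted models.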
Directed Granger-causal information flow is defined based on the so-called Granger scores [25]
$$F_{x \to y} := \log\frac{\Sigma^{y}}{\Sigma_{22}}, \qquad F_{y \to x} := \log\frac{\Sigma^{x}}{\Sigma_{11}}. \tag{5}$$
Granger causality from $x$ to $y$ implies that information from the past of $x$ improves the prediction of the present of $y$ compared to what can be predicted by the past of $y$ alone. That is, the residual variance $\Sigma_{22}$ of the unrestricted model is required to be smaller than the residual variance $\Sigma^{y}$ of the restricted model. Under the assumption of Gaussian-distributed residuals, $F_{x \to y}$ and $F_{y \to x}$ are asymptotically $\chi^2$-distributed, giving rise to an analytical test of their significance [25]. An asymptotically equivalent test is given by an F-test of the goodness-of-fit of the two models (cf. [9, 4]). We refer to this approach as standard Granger causality (standard GC).
As variables in physical systems often mutually influence each other, it is also of interest to determine the net driver of the interaction by assessing whether more information is flowing from $x$ to $y$ than from $y$ to $x$ or vice versa. Following [11, 26], net Granger causality (NetGC) is defined as the difference of the Granger causality scores, that is
$$F^{\text{net}}_{x \to y} := F_{x \to y} - F_{y \to x}. \tag{6}$$
As the analytical distributions of these differences are unknown, statistical significance of NetGC scores needs to be assessed using resampling methods as outlined in Section III-A.
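The Granger scores and their net difference can be estimated from data by comparing least-squares residual variances of the restricted and full models. The following is a minimal sketch, not the estimator used in the experiments: it assumes zero-mean data, uses the same finite order for the restricted and full models (the restricted models are in general of infinite order), and the function names and the simulated VAR(1) coefficients are hypothetical choices made for illustration.

```python
import numpy as np

def lagged(series, order):
    """Columns are series[t-1], ..., series[t-order] for t = order, ..., T-1."""
    T = len(series)
    return np.column_stack([series[order - p:T - p] for p in range(1, order + 1)])

def residual_variance(target, predictors):
    """Mean squared residual of a least-squares fit without intercept."""
    coef, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    return np.mean((target - predictors @ coef) ** 2)

def granger_scores(x, y, order):
    """Return (F_xy, F_yx, F_net) as log-ratios of residual variances."""
    X, Y = lagged(x, order), lagged(y, order)
    full = np.hstack([X, Y])              # past of both series (full model)
    xt, yt = x[order:], y[order:]
    F_xy = np.log(residual_variance(yt, Y) / residual_variance(yt, full))
    F_yx = np.log(residual_variance(xt, X) / residual_variance(xt, full))
    return F_xy, F_yx, F_xy - F_yx

# Illustrative data: x drives y, y does not drive x (coefficients are assumptions).
rng = np.random.default_rng(0)
T = 5000
x, y = np.zeros(T), np.zeros(T)
ex, ey = rng.standard_normal(T), rng.standard_normal(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + ex[t]
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 1] + ey[t]

F_xy, F_yx, F_net = granger_scores(x, y, order=1)
print(F_xy, F_yx, F_net)
```

In this sketch the score from $x$ to $y$ should be clearly positive, while the score in the opposite direction stays close to zero, so the net score points from $x$ to $y$.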
II-B Time-reversed Granger causality (TRGC)
To avoid false detections of causal interactions, Haufe et al. proposed to contrast causality measures applied to the original time series with the same measures obtained from time-reversed signals [17, 16]. Here, we formalize this idea in the context of Granger causality.
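Given a bivariate VAR($P$) process $\mathbf{z}_t$, its time-reversed process $\tilde{\mathbf{z}}_t$ also possesses a VAR($P$) representation, which we derive in Section II-D. We denote the residual covariance matrix of the time-reversed process by
$$\tilde{\Sigma} = \begin{pmatrix} \tilde{\Sigma}_{11} & \tilde{\Sigma}_{12} \\ \tilde{\Sigma}_{21} & \tilde{\Sigma}_{22} \end{pmatrix}. \tag{7}$$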
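The restricted AR models of the time-reversed data have a simple structure, as they are concerned with univariate time series. The autocovariance function of a univariate time series is symmetric, i.e., we have $(\Gamma_h)_{11} = (\Gamma_{-h})_{11}$ and $(\Gamma_h)_{22} = (\Gamma_{-h})_{22}$ for all $h$. As a result of this and (19) (Section II-C), the time-reversed signals $\tilde{x}_t$ and $\tilde{y}_t$ have the same autocovariances as the original series. Because the AR representation is uniquely determined by the autocovariance function (cf. Section II-D1), they also share the same AR representation. The restricted models of the time-reversed univariate processes are thus given by
$$\tilde{x}_t = \sum_{p=1}^{\infty} a^{x}_{p}\, \tilde{x}_{t-p} + \tilde{\epsilon}^{x}_{t}, \qquad \operatorname{Var}(\tilde{\epsilon}^{x}_{t}) =: \tilde{\Sigma}^{x}, \tag{8}$$
$$\tilde{y}_t = \sum_{p=1}^{\infty} a^{y}_{p}\, \tilde{y}_{t-p} + \tilde{\epsilon}^{y}_{t}, \qquad \operatorname{Var}(\tilde{\epsilon}^{y}_{t}) =: \tilde{\Sigma}^{y}, \tag{9}$$
with
$$\tilde{\Sigma}^{x} = \Sigma^{x}, \qquad \tilde{\Sigma}^{y} = \Sigma^{y}. \tag{10}$$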
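In analogy to the original time series, we define the time-reversed Granger scores as
$$\tilde{F}_{x \to y} := \log\frac{\tilde{\Sigma}^{y}}{\tilde{\Sigma}_{22}}, \qquad \tilde{F}_{y \to x} := \log\frac{\tilde{\Sigma}^{x}}{\tilde{\Sigma}_{11}}, \tag{11}$$
and the net Granger causality scores as
$$\tilde{F}^{\text{net}}_{x \to y} := \tilde{F}_{x \to y} - \tilde{F}_{y \to x}. \tag{12}$$
Finally, the differences of the Granger scores obtained on original and time-reversed signals are given by
$$\tilde{D}_{x \to y} := F_{x \to y} - \tilde{F}_{x \to y}, \tag{13}$$
$$\tilde{D}_{y \to x} := F_{y \to x} - \tilde{F}_{y \to x}, \tag{14}$$
$$\tilde{D}^{\text{net}}_{x \to y} := F^{\text{net}}_{x \to y} - \tilde{F}^{\text{net}}_{x \to y}. \tag{15}$$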
Timereversed Granger causality can be applied in the following variants.
Conjunction-based time-reversed Granger causality (ConjTRGC)
Here, net information flow from $x$ to $y$ is inferred if
$$F^{\text{net}}_{x \to y} > 0 \quad \text{and} \quad \tilde{F}^{\text{net}}_{x \to y} < 0, \tag{16}$$
that is, if the directionality of net Granger causality reverses for time-reversed signals. This variant has been investigated in [18].
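Difference-based time-reversed Granger causality (DiffTRGC)
Here, net information flow from $x$ to $y$ is inferred if
$$\tilde{D}^{\text{net}}_{x \to y} = F^{\text{net}}_{x \to y} - \tilde{F}^{\text{net}}_{x \to y} > 0, \tag{17}$$
that is, if net Granger causality is reduced for time-reversed signals.
Conjunction of NetGC and DiffTRGC
Finally, we can require both the time-reversed net difference and the net Granger score to be significantly larger than zero in order to infer net information flow from $x$ to $y$, that is
$$F^{\text{net}}_{x \to y} > 0 \quad \text{and} \quad F^{\text{net}}_{x \to y} - \tilde{F}^{\text{net}}_{x \to y} > 0. \tag{18}$$
Just as for NetGC, statistical significance of ConjTRGC and DiffTRGC, as well as the combination of NetGC and DiffTRGC, can be assessed using resampling techniques (see Section III-A).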
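The three variants above differ only in how the net scores of the original and the time-reversed data are combined. The sketch below illustrates this on point estimates; it deliberately omits the resampling-based significance tests, assumes zero-mean data and finite-order least-squares fits, and all function names and simulation parameters are hypothetical.

```python
import numpy as np

def net_granger(x, y, order):
    """Net Granger score F_xy - F_yx from least-squares fits without intercept."""
    T = len(x)

    def lag(s):
        return np.column_stack([s[order - p:T - p] for p in range(1, order + 1)])

    def mse(target, preds):
        coef, *_ = np.linalg.lstsq(preds, target, rcond=None)
        return np.mean((target - preds @ coef) ** 2)

    X, Y = lag(x), lag(y)
    XY = np.hstack([X, Y])
    F_xy = np.log(mse(y[order:], Y) / mse(y[order:], XY))
    F_yx = np.log(mse(x[order:], X) / mse(x[order:], XY))
    return F_xy - F_yx

def trgc_scores(x, y, order):
    """Net scores on original and time-reversed data, and their difference."""
    F_net = net_granger(x, y, order)
    F_net_rev = net_granger(x[::-1], y[::-1], order)   # time reversal = flipping the arrays
    return F_net, F_net_rev, F_net - F_net_rev

# Illustrative data with unidirectional flow from x to y (coefficients are assumptions).
rng = np.random.default_rng(0)
T = 5000
x, y = np.zeros(T), np.zeros(T)
ex, ey = rng.standard_normal(T), rng.standard_normal(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + ex[t]
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 1] + ey[t]

F_net, F_net_rev, D_net = trgc_scores(x, y, order=1)
conj_trgc = (F_net > 0) and (F_net_rev < 0)      # conjunction-based rule
diff_trgc = D_net > 0                            # difference-based rule
netgc_and_diff = (F_net > 0) and (D_net > 0)     # conjunction of NetGC and DiffTRGC
print(F_net, F_net_rev, D_net, conj_trgc, diff_trgc, netgc_and_diff)
```

In practice each comparison with zero would be replaced by a significance test on the corresponding score.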
II-C Robustness of time-reversed Granger causality (TRGC)
In [16] it is pointed out that time-reversed Granger causality robustly rejects causal interpretations for mixtures of non-interacting signals such as correlated noise sources. The mathematical basis for this noise robustness property is the fact that the cross-covariance matrices of the time-reversed signals are equal to the transposed cross-covariance matrices of the original signals, that is
$$\tilde{\Gamma}_h = \Gamma_{-h} = \Gamma_h^\top \tag{19}$$
for all $h$. If a series only contains a mixture of independent signals, all its cross-covariance matrices are symmetric [27]: consider $\mathbf{z}_t = M \mathbf{s}_t$, where $\mathbf{s}_t$ contains a number of independent sources. Then, $\Gamma_h = M\, \mathbb{E}[\mathbf{s}_t \mathbf{s}_{t-h}^\top]\, M^\top$ for all $h$, where $\mathbb{E}[\mathbf{s}_t \mathbf{s}_{t-h}^\top]$ is diagonal because the sources are independent, and thus $\Gamma_h$ is symmetric. For mixtures of independent noise sources, any causality measure that is solely based on a series’ cross-covariance matrices therefore yields the same result on the original and the time-reversed signals. This includes Granger causality, but also other popular variants such as directed transfer function (DTF) [19] and partial directed coherence (PDC) [20]. Given sufficient amounts of data, the conditions for ConjTRGC and DiffTRGC cannot be fulfilled for mixtures of independent sources using these measures, preventing the detection of spurious interaction.
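The symmetry argument can be checked numerically at the level of population covariances. The sketch below assumes two independent AR(1) sources with unit innovation variance and an arbitrary mixing matrix; the coefficient values and variable names are hypothetical and serve only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([0.9, -0.4])            # AR(1) coefficients of the two sources (assumed values)
var_s = 1.0 / (1.0 - a ** 2)         # stationary variances for unit innovation variance
M = rng.standard_normal((2, 2))      # arbitrary mixing matrix

def source_cov(h):
    """Lag-h covariance of the sources; diagonal because the sources are independent."""
    return np.diag(var_s * a ** abs(h))

def mixed_cov(h):
    """Lag-h cross-covariance of the mixture z_t = M s_t."""
    return M @ source_cov(h) @ M.T

symmetric = []
for h in range(5):
    G = mixed_cov(h)
    G_rev = G.T                      # time reversal transposes the cross-covariances
    symmetric.append(np.allclose(G, G_rev))
print(symmetric)
```

Because every cross-covariance matrix equals its transpose, any measure computed from them alone cannot distinguish the mixture from its time-reversed version.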
II-D The VAR representation of a time-reversed process
There is so far no theoretical argument guaranteeing that time-reversed Granger causality correctly indicates the presence of information flow as well as its direction in the presence of actual interaction. In order to provide such a guarantee, we here study the time reversal of (linear) finite-order VAR processes. Note that studying this case is sufficient since, as a result of Wold’s decomposition theorem, every stationary, purely nondeterministic process can be approximated well by a finite-order VAR process [28, 1].
We start by briefly revisiting the link between cross-covariance matrices and VAR representation, which we use throughout the paper, in Section II-D1. In Sections II-D2 and II-D3, we then review the theoretical result of Andel [29] stating that the time-reversed signal of any VAR($P$) process has again a VAR($P$) representation that can be expressed analytically in terms of the original process. As the description for $P > 1$ is mathematically involved, we only treat the case $P = 1$ in the main paper, while the proof for arbitrary $P$ is presented in Appendix A.
We use these results to provide an analytic description of difference-based TRGC scores in terms of their autoregressive coefficients (Section II-E), give a minimal example (Section II-F), and prove our main result stating that, in the case of unambiguous unidirectional information flow, difference-based time-reversed Granger causality indeed yields the correct result (Section II-G).
II-D1 The cross-covariance function of a VAR process
Most of the insights in this paper are based on the direct link between autoregressive coefficient matrices and residual covariance matrices on the one hand, and cross-covariance matrices on the other hand. This link is established by the Yule-Walker equations as follows (see e.g. [1]). For a VAR(1) process
$$\mathbf{z}_t = A \mathbf{z}_{t-1} + \boldsymbol{\epsilon}_t, \tag{20}$$
the Yule-Walker equations read
$$\Gamma_0 = A \Gamma_0 A^\top + \Sigma, \tag{21}$$
$$\Gamma_h = A \Gamma_{h-1}, \qquad h \geq 1. \tag{22}$$
Given $A$ and $\Sigma$, the cross-covariance $\Gamma_0$ is uniquely determined from (21) through
$$\operatorname{vec}(\Gamma_0) = (I - A \otimes A)^{-1} \operatorname{vec}(\Sigma), \tag{23}$$
while higher-order cross-covariances can be recursively computed using (22). Conversely, $A$ and $\Sigma$ are uniquely determined by the cross-covariances through
$$A = \Gamma_1 \Gamma_0^{-1}, \tag{24}$$
$$\Sigma = \Gamma_0 - A \Gamma_0 A^\top. \tag{25}$$
Results on VAR(1) processes can typically be extended to higher-order VAR($P$) processes by reducing VAR($P$) processes to their VAR(1) form. The VAR(1) representation of a VAR($P$) process as well as the Yule-Walker equations for general VAR($P$) processes are provided in Appendix A-A.
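The two directions of the Yule-Walker link for a VAR(1) process can be traced in a few lines. This is a small numerical sketch using an arbitrarily chosen stable coefficient matrix and identity residual covariance (both assumptions made for illustration); it uses the column-major convention for the vectorization operator.

```python
import numpy as np

A = np.array([[0.5, 0.0],
              [0.5, 0.5]])       # assumed stable coefficient matrix
Sigma = np.eye(2)                # assumed residual covariance

# Forward direction: vec(Gamma_0) = (I - A kron A)^{-1} vec(Sigma)
vec_gamma0 = np.linalg.solve(np.eye(4) - np.kron(A, A), Sigma.flatten(order="F"))
Gamma0 = vec_gamma0.reshape(2, 2, order="F")
Gamma1 = A @ Gamma0              # lag-1 cross-covariance from the recursion

# Backward direction: recover A and Sigma from the cross-covariances
A_rec = Gamma1 @ np.linalg.inv(Gamma0)
Sigma_rec = Gamma0 - A_rec @ Gamma0 @ A_rec.T

print(Gamma0)
print(np.allclose(A_rec, A), np.allclose(Sigma_rec, Sigma))
```

For these values the stationary covariance works out to $\Gamma_0 = \begin{pmatrix} 4/3 & 4/9 \\ 4/9 & 56/27 \end{pmatrix}$, and the recovered parameters coincide with the ones the computation started from.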
II-D2 The VAR representation of a time-reversed VAR(1) process
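The time-reversed autoregressive representation of a VAR(1) process was derived by Bartlett in 1955 [30]. Suppose we generate an infinite sequence of $\mathbf{z}_t$ according to the VAR(1) process (20). The VAR representation of the time-reversed or backward process $\tilde{\mathbf{z}}_t := \mathbf{z}_{-t}$ is given by
$$\tilde{\mathbf{z}}_t = \tilde{A} \tilde{\mathbf{z}}_{t-1} + \tilde{\boldsymbol{\epsilon}}_t, \tag{26}$$
where
$$\tilde{A} = \Gamma_0 A^\top \Gamma_0^{-1}, \tag{27}$$
and where the reversed residuals are calculated from $\mathbf{z}_t$ as
$$\tilde{\boldsymbol{\epsilon}}_t = \tilde{\mathbf{z}}_t - \tilde{A} \tilde{\mathbf{z}}_{t-1} = \mathbf{z}_{-t} - \tilde{A} \mathbf{z}_{-t+1}, \tag{28}$$
with residual covariance matrix
$$\tilde{\Sigma} = \Gamma_0 - \tilde{A} \Gamma_0 \tilde{A}^\top = \Gamma_0 - \Gamma_0 A^\top \Gamma_0^{-1} A \Gamma_0. \tag{29}$$
It is easy to show that the sequence $\tilde{\boldsymbol{\epsilon}}_t$ is indeed white noise, that is, for all $t$: $\mathbb{E}[\tilde{\boldsymbol{\epsilon}}_t] = 0$, and for all $s \neq t$: $\mathbb{E}[\tilde{\boldsymbol{\epsilon}}_t \tilde{\boldsymbol{\epsilon}}_s^\top] = 0$.
From (27), we see that the time-reversed coefficient matrix $\tilde{A}$ is similar to $A^\top$, and thus shares some of the properties of $A$, notably its eigenvalues, determinant, trace and rank. However, in the context of Granger causality, it is important to note that many properties of $A$ do not transfer to $\tilde{A}$. In particular, if $A$ is triangular, diagonal, or symmetric, this is not generally the case for $\tilde{A}$.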
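The following sketch computes the time-reversed coefficient matrix and residual covariance of a VAR(1) process from its forward parameters, using the similarity transform with the stationary covariance described above. The helper names and the example coefficient matrix are hypothetical; the example illustrates that a triangular forward matrix yields a non-triangular backward matrix with the same trace and determinant.

```python
import numpy as np

def stationary_cov(A, Sigma):
    """Stationary covariance Gamma_0 of a stable VAR(1) process (column-major vec)."""
    n = A.shape[0]
    vec_g0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Sigma.flatten(order="F"))
    return vec_g0.reshape(n, n, order="F")

def time_reversed_var1(A, Sigma):
    """Coefficient matrix and residual covariance of the backward VAR(1) process."""
    G0 = stationary_cov(A, Sigma)
    A_rev = G0 @ A.T @ np.linalg.inv(G0)      # similar to the transpose of A
    Sigma_rev = G0 - A_rev @ G0 @ A_rev.T
    return A_rev, Sigma_rev

A = np.array([[0.5, 0.0],
              [0.5, 0.5]])                    # lower triangular: x drives y (assumed values)
A_rev, Sigma_rev = time_reversed_var1(A, np.eye(2))
print(A_rev)       # upper-right entry is no longer zero
print(Sigma_rev)   # no longer diagonal
```

The loss of the triangular structure is what makes the interpretation of Granger scores on time-reversed data non-trivial.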
II-D3 The VAR representation of a time-reversed VAR($P$) process
The result of Bartlett on the time-reversed VAR(1) process was generalized to VAR($P$) processes by Andel in 1972 [29], in a paper that has so far received little attention. Andel showed that any stable VAR($P$) process (1) has a time-reversed representation
$$\tilde{\mathbf{z}}_t = \sum_{p=1}^{P} \tilde{A}_p \tilde{\mathbf{z}}_{t-p} + \tilde{\boldsymbol{\epsilon}}_t \tag{30}$$
that is again of order $P$ with uniquely defined autoregressive coefficients $\tilde{A}_1, \dots, \tilde{A}_P$ and residual covariance matrix $\tilde{\Sigma}$. We reproduce this result in Appendix A-B. Note that, while we only treat bivariate VAR processes in this paper, the analytic description of the time-reversed VAR process holds for processes of arbitrary dimensionality.
II-E Analytic description of DiffTRGC
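Contrasting Granger scores obtained on original with those obtained on time-reversed signals is simplified by the fact that the AR representation of a univariate time series does not depend on the direction of time. It follows immediately from (10) that the differences of the Granger scores related to original and time-reversed data do not depend on the restricted models:
$$\tilde{D}_{x \to y} = \log\frac{\tilde{\Sigma}_{22}}{\Sigma_{22}}, \qquad \tilde{D}_{y \to x} = \log\frac{\tilde{\Sigma}_{11}}{\Sigma_{11}}, \qquad \tilde{D}^{\text{net}}_{x \to y} = \log\frac{\tilde{\Sigma}_{22}}{\Sigma_{22}} - \log\frac{\tilde{\Sigma}_{11}}{\Sigma_{11}}. \tag{31}$$
The Granger score differences $\tilde{D}_{x \to y}$, $\tilde{D}_{y \to x}$ and $\tilde{D}^{\text{net}}_{x \to y}$ thus only depend on the residual covariance matrices of the full models of the original and time-reversed data. For the VAR(1) process, these are given in (25) and (29). For VAR($P$) processes, the residual covariance matrices can be obtained through (56) and (59) as described in Appendix A-A and A-B.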
II-F A minimal example
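It is not intuitive to see how the residual variance of the time-reversed process, and thus Granger causality, depends on the autoregressive coefficients of the model. Interpretation is made difficult by the occurrence of $\Gamma_0^{-1}$ in (29).
Let us therefore consider the following minimal case: a VAR(1) process with $\Gamma_0 = I$. In that case, $\Gamma_1 = A$ and $\Gamma_h = A^h$ for all $h \geq 0$ (from (22)). All asymmetries in the cross-covariance matrices are thus due to asymmetries in $A$.
Furthermore, time-reversing the signal leads to transposition of the autoregressive coefficient matrix as a result of (27), that is, $\tilde{A} = A^\top$. The residual covariance matrices (25) and (29) are now given by
$$\Sigma = I - A A^\top, \qquad \tilde{\Sigma} = I - A^\top A.$$
Denote with $a_{ij}$ the autoregressive coefficients, that is, $A = (a_{ij})_{i,j = 1,2}$. We then have
$$\Sigma_{11} = 1 - a_{11}^2 - a_{12}^2, \quad \tilde{\Sigma}_{11} = 1 - a_{11}^2 - a_{21}^2, \quad \Sigma_{22} = 1 - a_{21}^2 - a_{22}^2, \quad \tilde{\Sigma}_{22} = 1 - a_{12}^2 - a_{22}^2,$$
and
$$\log\frac{\tilde{\Sigma}_{22}}{\Sigma_{22}} > 0 \;\Leftrightarrow\; a_{21}^2 > a_{12}^2 \;\Leftrightarrow\; \log\frac{\tilde{\Sigma}_{11}}{\Sigma_{11}} < 0,$$
where the two logarithms are the differences between original and time-reversed Granger scores from $x$ to $y$ and from $y$ to $x$, respectively.
The difference of the Granger scores computed on the original and time-reversed time series thus indicates the correct net direction of information flow. We will in general not be able to infer whether $x$ has a Granger-causal influence on $y$. However, we will be able to tell whether $x$ Granger-causes $y$ more than $y$ Granger-causes $x$, or vice versa.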
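The minimal case can be reproduced numerically. In the sketch below, the coefficient values are hypothetical, and the residual covariance is chosen as the identity minus the product of the coefficient matrix with its transpose so that the identity matrix is the stationary covariance of the process.

```python
import numpy as np

A = np.array([[0.3, 0.1],
              [0.6, 0.2]])                 # a21 = 0.6 (x -> y) dominates a12 = 0.1 (y -> x)
Sigma = np.eye(2) - A @ A.T                # makes Gamma_0 = I the stationary covariance
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # Sigma must be a valid covariance matrix

Sigma_rev = np.eye(2) - A.T @ A            # residual covariance of the reversed process

D_xy = np.log(Sigma_rev[1, 1] / Sigma[1, 1])   # score difference for x -> y
D_yx = np.log(Sigma_rev[0, 0] / Sigma[0, 0])   # score difference for y -> x
print(D_xy, D_yx, D_xy - D_yx)
```

Here the stronger coupling from $x$ to $y$ shows up as a positive difference for $x \to y$ and a negative one for $y \to x$, in line with the sign conditions on $a_{21}^2$ and $a_{12}^2$.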
While this simple case will almost never occur in practice, we give theoretical guarantees for more general cases in the next section.
II-G Validity of TRGC for unidirectional information flow
We now prove our main result, the validity of difference-based time-reversed Granger causality in the presence of unidirectional information flow. Consider a bivariate VAR($P$) process with unambiguous unidirectional information flow. This is the case when all coefficient matrices $A_p$ are triangular and the residual covariance matrix $\Sigma$ is diagonal. Then the following theorem holds.
Theorem 1.
Let $\mathbf{z}_t = [x_t, y_t]^\top$ be a stable bivariate VAR($P$) process (1) with the time-reversed representation (30). Under the assumptions
(A1) $A_1, \dots, A_P$ are lower triangular matrices (i.e., $x$ may Granger-cause $y$, but $y$ does not Granger-cause $x$), and
(A2) $\Sigma$ is a diagonal matrix, i.e. $\Sigma_{12} = \Sigma_{21} = 0$ (the residuals are uncorrelated), and

(A3) $A_P$ is invertible,
it holds that
$$\tilde{\Sigma}_{11} \leq \Sigma_{11} \tag{32}$$
and that
$$\tilde{\Sigma}_{22} \geq \Sigma_{22}. \tag{33}$$
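Corollary 1.
Under assumptions (A1)–(A3), Theorem 1 and (31) immediately imply the following inequalities for the differences of Granger scores:
$$\tilde{D}_{x \to y} = \log\frac{\tilde{\Sigma}_{22}}{\Sigma_{22}} \geq 0, \tag{34}$$
$$\tilde{D}_{y \to x} = \log\frac{\tilde{\Sigma}_{11}}{\Sigma_{11}} \leq 0, \tag{35}$$
$$\tilde{D}^{\text{net}}_{x \to y} = \tilde{D}_{x \to y} - \tilde{D}_{y \to x} \geq 0. \tag{36}$$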
As a result of Corollary 1, net Granger-causal information flow from $x$ to $y$ is reduced or remains the same when the signal is time-reversed. Thus, in the case of unambiguous unidirectional information flow, difference-based time-reversed Granger causality yields the correct result. Note that it is not true in general that the net flow between the time-reversed signals $\tilde{x}$ and $\tilde{y}$, $\tilde{F}^{\text{net}}_{x \to y}$, is negative (reverses compared to the original series). That is, conjunction-based TRGC might in some cases incorrectly reject the presence of true causal interaction.
Corollary 1 states that each of the three difference scores, $\tilde{D}_{x \to y}$, $\tilde{D}_{y \to x}$, and $\tilde{D}^{\text{net}}_{x \to y}$, alone is sufficient to infer the correct directionality under assumptions (A1)–(A3). As (A1) requires information flow to be unidirectional, the individual scores $\tilde{D}_{x \to y}$ and $\tilde{D}_{y \to x}$ only indicate net information flow, which is what is also observed in Section II-F.
The three scores will behave differently if the assumption of uncorrelated residuals (A2) is violated. Then, (32) and (35) still hold, but the inequalities (33), (34) and (36) are no longer guaranteed. On average, the net difference $\tilde{D}^{\text{net}}_{x \to y}$ (which equals $\tilde{D}_{x \to y} - \tilde{D}_{y \to x}$) is less affected by the presence of correlations in the residuals than any of the individual scores, which is why we defined difference-based TRGC based on $\tilde{D}^{\text{net}}_{x \to y}$ in (17). Nevertheless, all three scores are valid measures for net information flow, as residuals should be uncorrelated if the VAR model accurately describes a physical process. The significance of uncorrelated as opposed to correlated residuals is discussed in Section IV-A.
Sketch of the proof. The first inequality (32) is relatively easy to prove. The intuition is the following: Since $y$ does not Granger-cause $x$, the prediction of $x_t$ is only based on past $x$. In contrast, the coefficient matrices $\tilde{A}_1$, …, $\tilde{A}_P$ of the time-reversed representation are in general not triangular. This means that prediction of the time-reversed signal $\tilde{x}_t$ is not only based on past $\tilde{x}$, but can also use information from past $\tilde{y}$. We would thus expect that $\tilde{x}$ can be better predicted than $x$, and that the corresponding residuals are smaller.
The proof of the second inequality (33) is more involved. The intuition is the following: we would expect that the ‘amount’ of unexplainable variance is the same for both the original and the time-reversed process. Thus, since the residual variance of $x$ decreases, the residual variance of $y$ should increase. Mathematically, we prove that
$$\det(\tilde{\Sigma}) = \det(\Sigma). \tag{37}$$
The proof of (37) is the only part that requires the analytic description of $\tilde{\Sigma}$, and is the main difficulty of the overall proof. It is not straightforward, because $\tilde{\Sigma}$ depends on the inverse of the covariance matrix $\Gamma_0$, while we only have an analytic description of $\Gamma_0$ itself. From (37), it is easy to infer $\tilde{\Sigma}_{22} \geq \Sigma_{22}$, which completes the proof. It is only in this final step that we need assumption (A2) that $\Sigma$ is diagonal.
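Proof of Part 1 (32):
As $A_1, \dots, A_P$ are lower triangular matrices (assumption (A1)), $x_t$ is an autoregressive process of order $P$,
$$x_t = \sum_{p=1}^{P} (A_p)_{11}\, x_{t-p} + (\boldsymbol{\epsilon}_t)_1, \qquad \operatorname{Var}\big((\boldsymbol{\epsilon}_t)_1\big) = \Sigma_{11}. \tag{38}$$
Its time-reversed representation (cf. Section II-B) is
$$\tilde{x}_t = \sum_{p=1}^{P} (A_p)_{11}\, \tilde{x}_{t-p} + \tilde{\epsilon}^{x}_{t}, \qquad \operatorname{Var}(\tilde{\epsilon}^{x}_{t}) = \Sigma_{11}. \tag{39}$$
The full time-reversed model (30) predicts $\tilde{x}_t$ from the past of both $\tilde{x}$ and $\tilde{y}$, so its residual variance cannot exceed that of the univariate model (39). This gives $\tilde{\Sigma}_{11} \leq \Sigma_{11}$.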
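Proof of Part 2 (33):
As mentioned in the proof sketch, we need to derive (37), the equality of the determinants $\det(\Sigma)$ and $\det(\tilde{\Sigma})$. To improve readability, we here treat only the case $P = 1$, and derive (37) for general $P$ in Appendix A-C.
The proof relies on Sylvester’s determinant theorem [31], which states that for any matrices $U \in \mathbb{R}^{m \times n}$, $V \in \mathbb{R}^{n \times m}$:
$$\det(I_m + U V) = \det(I_n + V U). \tag{41}$$
We then have, using (29), (41) with $U = -A^\top$ and $V = \Gamma_0^{-1} A \Gamma_0$, and (25):
$$\det(\tilde{\Sigma}) = \det(\Gamma_0)\det\!\big(I - A^\top \Gamma_0^{-1} A \Gamma_0\big) = \det(\Gamma_0)\det\!\big(I - \Gamma_0^{-1} A \Gamma_0 A^\top\big) = \det\!\big(\Gamma_0 - A \Gamma_0 A^\top\big) = \det(\Sigma).$$
From the result of Part 1 (32), the equality of residual covariance determinants (37) (derived for general $P$ in Appendix A-C), and assumption (A2) of uncorrelated residuals in $\Sigma$, we then obtain:
$$\Sigma_{11}\Sigma_{22} = \det(\Sigma) = \det(\tilde{\Sigma}) = \tilde{\Sigma}_{11}\tilde{\Sigma}_{22} - \tilde{\Sigma}_{12}^2 \leq \tilde{\Sigma}_{11}\tilde{\Sigma}_{22} \leq \Sigma_{11}\tilde{\Sigma}_{22},$$
and hence $\tilde{\Sigma}_{22} \geq \Sigma_{22}$, since $\Sigma_{11} > 0$.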
∎
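As a numerical sanity check of the two inequalities and of the determinant equality in the VAR(1) case, the following sketch draws random lower triangular coefficient matrices and diagonal residual covariances and evaluates the residual covariance of the time-reversed process. The sampling ranges, tolerances and helper name are assumptions made for illustration; the check complements the proof and does not replace it.

```python
import numpy as np

def reversed_residual_cov(A, Sigma):
    """Residual covariance of the time-reversed VAR(1) process."""
    n = A.shape[0]
    vec_g0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Sigma.flatten(order="F"))
    G0 = vec_g0.reshape(n, n, order="F")           # stationary covariance Gamma_0
    A_rev = G0 @ A.T @ np.linalg.inv(G0)           # time-reversed coefficient matrix
    return G0 - A_rev @ G0 @ A_rev.T

rng = np.random.default_rng(0)
checks = []
for _ in range(200):
    # Lower triangular coefficients; the diagonal holds the eigenvalues,
    # so entries in (-0.9, 0.9) give a stable (and almost surely invertible) matrix.
    A = np.tril(rng.uniform(-0.9, 0.9, size=(2, 2)))
    # Diagonal residual covariance (uncorrelated residuals).
    Sigma = np.diag(rng.uniform(0.1, 1.0, size=2))
    S_rev = reversed_residual_cov(A, Sigma)
    checks.append((
        S_rev[0, 0] <= Sigma[0, 0] + 1e-9,                         # first inequality
        S_rev[1, 1] >= Sigma[1, 1] - 1e-9,                         # second inequality
        np.isclose(np.linalg.det(S_rev), np.linalg.det(Sigma)),    # equal determinants
    ))

all_hold = all(all(c) for c in checks)
print(all_hold)
```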
Strict inequality. Let us further note that inequality (36) for difference-based TRGC is strict in the presence of causal interaction. The following theorem holds.
Theorem 2.
Under assumptions (A1)–(A3), it holds that
(42) 
The proof is provided in Appendix A-D. Combined with Corollary 1, Theorem 2 immediately implies that $\tilde{D}^{\text{net}}_{x \to y} > 0$ in the presence of unidirectional information flow from $x$ to $y$. That is, net Granger-causal information flow from $x$ to $y$ is truly reduced and cannot remain the same when the signal is time-reversed.
III Experiments
In this section, we provide an empirical investigation of model violations and other factors influencing the performance of Granger-causal measures using numerical simulations. After describing the tested methods and performance measures (Section III-A), we compare several variants of TRGC in either the presence or absence of noise (Section III-B). We then investigate the influence of common drivers, various types of noise (Sections III-C and III-D) and downsampling (Section III-E) on standard Granger causality and DiffTRGC.
III-A Experimental setup
We consider bivariate time series in the presence of unidirectional information flow ($x \to y$) as well as in the absence of causal interaction. Unless otherwise stated, time series of length are generated from stationary VAR(5) processes, whose autoregressive coefficients are drawn from a normal distribution with mean and standard deviation . The absence of causal interaction is modeled by setting the respective AR coefficients to zero. Residuals are generated from a normal distribution with diagonal covariance matrix, whose entries are drawn from the standard uniform distribution.
We compare standard GC as well as NetGC to DiffTRGC (see (17)). In Section III-B, we also include ConjTRGC (see (16)), the conjunction of NetGC and DiffTRGC (see (18)), and a variation of DiffTRGC, in which the net difference score is computed using only the full bivariate models according to (31). This variant is denoted by DiffTRGC (full).
All statistical tests are performed at significance level . For standard GC, we perform two separate F-tests, one to assess whether $x$ Granger-causes $y$, and one to assess whether $y$ Granger-causes $x$. It is possible that both variables are estimated to Granger-cause each other. In contrast, all other metrics indicate net directionality. We assess their statistical significance by bootstrapping residuals from the regression model: We regress on its past and future values , and retain the fitted values and residuals . In each bootstrap repetition, causality metrics are computed on synthetic variables , where is selected randomly for each . Percentile confidence intervals are then constructed from the bootstrap sampling distribution. A score is considered significant if its confidence interval does not contain 0. We use 500 bootstrap samples and select the number of lags as the minimizer of Schwarz’s Bayesian Information Criterion (BIC) [32].
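The final step of the bootstrap procedure, deciding significance from a percentile confidence interval, can be sketched as follows. The function name is hypothetical, the bootstrap scores are synthetic stand-ins rather than outputs of the residual bootstrap, and the level `alpha=0.05` is a placeholder assumption, not a value taken from the text.

```python
import numpy as np

def percentile_significant(boot_scores, alpha):
    """Two-sided percentile interval; significant if the interval excludes zero."""
    lo, hi = np.percentile(boot_scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return bool(lo > 0 or hi < 0), (lo, hi)

# Synthetic stand-ins for 500 bootstrap replicates of a causality score.
clearly_positive = np.linspace(0.1, 0.5, 500)
straddling_zero = np.linspace(-0.2, 0.3, 500)

sig_pos, ci_pos = percentile_significant(clearly_positive, alpha=0.05)
sig_zero, ci_zero = percentile_significant(straddling_zero, alpha=0.05)
print(sig_pos, ci_pos)
print(sig_zero, ci_zero)
```

The sign of the interval then indicates the inferred net direction for scores such as the net Granger score or the net difference score.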
All experiments are repeated 300 times. In each run, a true positive (TP) is defined as a significant detection of the true direction of interaction. The true positive rate (TPR) is the fraction of true positives among all runs. It is here also referred to as the sensitivity or power. A false positive (FP) is defined as a significant detection of the wrong direction of interaction, or a significant detection of causal interaction in the absence of any causal interaction. The false positive rate (FPR) is the fraction of false positives among all tested runs.
III-B Comparison of TRGC variants under interaction

We assess Granger causality and time-reversed Granger causality in the presence of unidirectional interaction considering differing sample sizes, standard deviations of the AR parameters, noise types and signal-to-noise ratios (SNR).
In a first experiment, we consider the noiseless case, and vary the sample size from 400 to 4000 for a fixed standard deviation of the AR coefficients. In a second experiment, we vary the standard deviation at a constant sample size of . This experiment thus tests the impact of the strength of the causal connections relative to the innovation noise. The standard deviations tested are 0.05, 0.1, 0.2, …, and 0.6. Finally, for a fixed standard deviation , and a fixed sample size , we add linearly mixed, autocorrelated measurement noise to each system according to
(43) 
where the subscript denotes the underlying latent variables and defines the signal-to-noise ratio (SNR). Noise is generated by multiplying two independent AR(5) time series with a random matrix , with . We consider the signal-to-noise ratios 0, 0.25, 0.5, 0.75, 0.9 and 1.
The TP and FP rates attained in the three experiments are depicted in Figure 1. From Figure 1(a), we see that DiffTRGC (full), which computes the difference score only using the full model according to (31), seems to be suboptimal for finite samples. While we have demonstrated the equivalence of (31) to the original definition (15) for infinite samples in Section IIE, this equivalence does not hold for the finite samples studied here. Estimating residuals from the restricted models increased the power of the test for all investigated parameter settings.
ConjTRGC has lower power relative to DiffTRGC. This is particularly so for high standard deviations of the AR coefficients, which correspond to a dominance of the dynamical and causal aspects of the model comprised in the AR coefficients relative to the innovation noise. This result is not unexpected, as time-reversing the signals does not necessarily reverse the direction of information flow. Note that, on the other hand, ConjTRGC is the more conservative measure compared to DiffTRGC and could be expected to produce fewer spurious results in the presence of noise. However, as we see in Figure 1(b), both variants yield almost no spurious results in the presence of measurement noise. We will therefore use DiffTRGC in the remaining experiments.
III-C Impact of latent variables and measurement noise in the absence of causal interaction
Granger himself already pointed out that standard Granger causality can lead to spurious results if not all relevant variables are incorporated in the model [8]. In a bivariate system, GC cannot determine whether the observed variables $x$ and $y$ are both driven by a third common cause. This argument extends to multivariate systems, if a relevant confounding variable is not part of the measurement. Furthermore, standard GC is susceptible to measurement noise [33, 10, 11, 26, 34, 18] and to instantaneous linear mixing of activity, which is a major problem for example in the analysis of electroencephalographic (EEG) recordings [13, 14, 16]. We demonstrate these effects here in additional simulations, in all of which no actual interaction occurs. We consider three different scenarios.
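(A) Linear mixing. The observed time series $x$ and $y$ are a linear mixture of two independent signals $s^{(1)}$, $s^{(2)}$, that is
$$[x_t, y_t]^\top = M\, [s^{(1)}_t, s^{(2)}_t]^\top, \tag{44}$$
where $M$ denotes the mixing matrix. $s^{(1)}$ and $s^{(2)}$ were generated as two independent univariate AR(5) processes.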
(B) Common hidden cause. The observed time series $x$ and $y$ are driven by a common unobserved cause. Time series $x$, $y$ and the hidden cause are generated from a three-dimensional VAR(5) model with , in which the hidden cause Granger-causes $x$ and $y$, with no causal interaction between $x$ and $y$ as modeled by the respective AR coefficients being set to zero.
(C) Additive noise. The observed time series and are a superposition of two independent univariate AR(5) processes , and additive noise as in (43), with adjusting the SNR. We consider three different types of noise. Independent white noise is generated from a normal distribution with diagonal covariance matrix, whose entries are drawn from the standard uniform distribution. Mixed white noise is created by multiplying independent noise with a random matrix with . Mixed autocorrelated noise is created by multiplying two independent AR(5) timeseries with .
Figure 2 illustrates the behavior of standard Granger causality, NetGC and DiffTRGC in the various simulation settings. Values on the yaxis indicate the FP rate at significance level . As all experiments are characterized by the absence of any interaction between and , any significant detection of information flow either from to or to is counted as a false positive.
It is apparent from Figure 2 that standard GC and NetGC lead to spurious detection of causality in all tested scenarios. Their behavior in the presence of noise (panel C) depends on the properties of that noise. Mixed noise (left and center plots of panel C) is generally very problematic, especially if it is also autocorrelated (left part). As and are already independent, adding independent noise (obviously) does not pose a problem here (right part of panel C).
In contrast to standard GC and NetGC, time-reversed Granger causality implemented through DiffTRGC is insensitive to mixtures of independent sources regardless of their spatial and temporal correlation structure (see panels A and C). This behavior thus reflects its known theoretical properties discussed in Section II-C. The presence of a hidden common confounder, however, cannot be ruled out by using time-reversed Granger causality (panel B).
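The three measures compared above can be sketched with ordinary least squares. This is a minimal illustration under stated assumptions, not the estimator or the statistical test used for the figures: Granger causality is taken as the log-ratio of restricted to full residual variances, net GC as the difference of the two directed scores, and DiffTRGC as the difference between net GC on the original and on the time-reversed data. The function names and the toy VAR(1) system are hypothetical.

```python
import numpy as np

def lagged_design(data, order):
    """Stack [data(t-1), ..., data(t-order)] as regressors for t = order..T-1."""
    T = data.shape[0]
    return np.hstack([data[order - k:T - k] for k in range(1, order + 1)])

def residual_variance(target, regressors):
    """Variance of the least-squares prediction error."""
    beta, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return np.var(target - regressors @ beta)

def granger(source, target, order):
    """Log-ratio Granger score from source to target (bivariate model)."""
    source = source - source.mean()
    target = target - target.mean()
    full = lagged_design(np.column_stack([target, source]), order)
    restricted = lagged_design(target[:, None], order)
    y = target[order:]
    return np.log(residual_variance(y, restricted) / residual_variance(y, full))

def net_gc(x, y, order):
    """Net Granger causality from x to y."""
    return granger(x, y, order) - granger(y, x, order)

def diff_trgc(x, y, order):
    """Net GC on the original data minus net GC on the time-reversed data."""
    return net_gc(x, y, order) - net_gc(x[::-1], y[::-1], order)

# Toy system (hypothetical coefficients): x drives y with one sample delay.
rng = np.random.default_rng(0)
T = 5000
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.5 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

net = net_gc(x, y, order=1)
diff = diff_trgc(x, y, order=1)
```

For this unidirectional toy system both `net` and `diff` come out positive, pointing from `x` to `y`. Turning either score into a false positive rate additionally requires a significance test, which this sketch omits.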
III-D Impact of noise in the presence of causal interaction
We further study the behavior of standard GC, NetGC and DiffTRGC in the presence of unidirectional causal interactions superimposed with noise. Four different scenarios are considered. In all cases, data are generated according to (43) with Granger-causing . In the first three scenarios, (A-C), interacting signals from bivariate VAR(5) models are superimposed with noise. As in Section III-C, we use mixed autocorrelated noise (scenario A), mixed white noise (B), and independent white noise (C). The same signal-to-noise ratios as in Section III-C are used.
In the fourth scenario, (D), we simulate the following VAR(1) process with long memory:
(45)  
adopted from [26], where denotes the normal distribution.
True positive and false positive rates as estimated from 300 simulation runs are reported in Figure 3 (A-D). Just as in the absence of causality (cf. Section III-C), we observe that linearly mixed, autocorrelated noise leads to the highest number of false detections for standard GC, while independent white noise leads to the lowest FP rates. DiffTRGC is characterized by negligible numbers of false positives in all cases, at the cost of slightly decreased sensitivity compared to standard GC in scenarios (A-C). Interestingly, NetGC behaves very similarly to DiffTRGC in the presence of non-autocorrelated noise, both in terms of sensitivity and specificity (B-C). In these settings, spurious causality could already be almost entirely eliminated by testing for net Granger causality. This result, however, does not imply that NetGC cannot be affected by non-autocorrelated noise in general. A counterexample is the system with long memory studied in scenario (D). Here, NetGC (as well as standard GC) fails, because contains delayed but cleaner information about than itself and thus may help to predict future . DiffTRGC, however, robustly identifies as the driver.
Our examples show that time-reversed Granger causality almost completely eliminates spurious causalities arising from any of the tested kinds of additive noise. At the same time, it exhibits similar statistical power to net Granger causality. We also observe that net Granger causality is typically more robust with respect to additive noise than standard Granger causality.
III-E Impact of downsampling and temporal aggregation
Spurious Granger causality has also been reported to arise due to downsampling and temporal aggregation [35, 36, 37], posing serious problems, for example, in functional magnetic resonance imaging (fMRI) [38, 39].
We generate data using a VAR(5) model with random coefficients with , in which Granger-causes . These data are decimated at different factors in two ways. In the downsampling scenario (E), causal measures are applied to time series of length constructed from the original time series by skipping time points in between sampled data points. In the temporal aggregation scenario (F), time series of length are constructed from the original time series by averaging over data points. No noise was added.
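The two decimation schemes can be written compactly as follows. This is a sketch only: the function names and the argument `k` for the decimation factor are chosen here for illustration, and samples that do not fill a complete block are simply dropped in the aggregation case.

```python
import numpy as np

def downsample(data, k):
    """Scenario (E): keep every k-th sample of a (time, channels) array."""
    return data[::k]

def aggregate(data, k):
    """Scenario (F): average over consecutive, non-overlapping blocks of k samples."""
    n_blocks = data.shape[0] // k
    trimmed = data[:n_blocks * k]  # drop an incomplete trailing block
    return trimmed.reshape(n_blocks, k, -1).mean(axis=1)

# Small deterministic example with 6 time points and 2 channels.
data = np.arange(12, dtype=float).reshape(6, 2)
down = downsample(data, 3)   # rows 0 and 3 of the original
aggr = aggregate(data, 3)    # means of rows 0-2 and rows 3-5
```

Both operations shorten the series by roughly the decimation factor, so the number of samples available for model fitting shrinks accordingly.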
Figure 3 (E-F) depicts TP and FP rates attained in the two scenarios as a function of . We see that NetGC and DiffTRGC are more robust than standard GC. Neither NetGC nor DiffTRGC resulted in spurious causality.
IV Discussion
We established the theoretical guarantee that difference-based time-reversed Granger causality (DiffTRGC) indicates the correct direction of causality in bivariate autoregressive processes characterized by unambiguous unidirectional information flow. Our results complement previous work [16, 17] showing that TRGC in general correctly rejects causal interpretations for mixtures of non-interacting sources (that is, in the absence of any causality). While further compelling intuitive ideas for robust causality measures have been presented [11, 23, 18], our result provides, to the best of our knowledge, the first proof of the correctness of one such technique (DiffTRGC) for a relatively general class of time series models.
Our theory is accompanied by simulations, in which we confirmed that time-reversed Granger causality robustly detects the presence of true causal interactions in various realistic scenarios including mixed noise and downsampling. We showed that DiffTRGC is able to infer correct directionality with similar power to net Granger causality, while at the same time producing fewer (in most cases, negligible numbers of) false alarms than NetGC and standard GC. We therefore suggest using DiffTRGC whenever the data under study are likely to be corrupted by noise.
IV-A Correlated residuals
To define an unambiguous unidirectional information flow, our theory assumes uncorrelated residuals, as is common in the literature. Correlated residuals indicate instantaneous effects that the variables exert on each other. While we would not expect correlated residuals if the VAR model accurately describes the data-generating process, such effects are likely to occur in practice (e.g., if the sampling rate of the acquired data falls below the time scale of the causal interactions). They pose severe problems for causal estimation, because they can be explained by several possible data-generating models, the coefficients of which cannot be uniquely identified using second-order information only.
Data-generating models. Instantaneous interactions can be modeled implicitly through correlated residuals in classical VAR processes, or explicitly, for example using so-called ‘structural’ VAR (SVAR) processes [1, 40, 41]. By augmenting the VAR model with an instantaneous mixing matrix , the SVAR model
(46) 
ensures that the residuals are uncorrelated. Here, the diagonal of is assumed to be zero.
Correlated residuals emerge naturally in electrophysiological neuroimaging data, where the signals observable at the sensors (e.g., EEG electrodes) are a linear mixture of the latent activity of possibly interacting neuronal populations within the brain. A model for such mixtures of potentially interacting sources is given by
(47) 
where denotes the observed data, denotes the activity of underlying latent variables (e.g., brain sources) following a VAR() process with uncorrelated residuals , and is an unknown mixing matrix (representing, e.g., the volume conduction effect of the human head). We call (47) the ‘mixture of interacting sources’ model.
Note that VAR models with correlated residuals, SVAR models, and mixture of interacting sources models can be used interchangeably to represent the same statistical process. For example, an interacting sources model (47) can be equivalently written as a VAR() process (1) with coefficients
(48) 
and correlated residuals . Likewise, an SVAR() process (46) can be converted into a VAR() process with correlated residuals and coefficients
(49) 
The reverse transformations from VAR models to SVAR or interacting source models, as well as the transformations between SVAR and interacting source models, are, however, not unique (see Model identifiability).
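The forward conversions into a VAR process with correlated residuals can be checked numerically. The sketch below assumes the standard forms of the two models: observations equal a mixing matrix times VAR sources for the mixture model, and an SVAR written with the instantaneous term on the right-hand side and a zero diagonal. All symbols (`M`, `A_s`, `B0`, `B`) and numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 200, 2

# Hypothetical stable source VAR(2) coefficients and an invertible mixing matrix.
A_s = [np.array([[0.5, 0.0], [0.3, 0.4]]),
       np.array([[-0.2, 0.0], [0.1, -0.1]])]
M = np.array([[1.0, 0.5], [0.2, 1.0]])

# Mixture of interacting sources: s follows a VAR(2), x = M s.
eps = rng.standard_normal((T, 2))
s = np.zeros((T, 2))
for t in range(p, T):
    s[t] = sum(A_s[k] @ s[t - 1 - k] for k in range(p)) + eps[t]
x = s @ M.T

# Equivalent sensor-level VAR: coefficients M A_k M^{-1}, residuals M eps.
M_inv = np.linalg.inv(M)
A_x = [M @ A @ M_inv for A in A_s]
resid_x = np.array([x[t] - sum(A_x[k] @ x[t - 1 - k] for k in range(p))
                    for t in range(p, T)])
mixture_ok = np.allclose(resid_x, eps[p:] @ M.T)

# SVAR(1) in the assumed form x(t) = B0 x(t) + B1 x(t-1) + eps(t), diag(B0) = 0.
B0 = np.array([[0.0, 0.0], [0.6, 0.0]])
B = [np.array([[0.5, 0.0], [0.0, 0.4]])]
G = np.linalg.inv(np.eye(2) - B0)
A_var = [G @ Bk for Bk in B]   # VAR coefficients (I - B0)^{-1} B_k
resid_cov = G @ G.T            # residual covariance for unit-variance eps
```

In the SVAR part, the lagged coefficient matrix is diagonal, yet the converted VAR coefficient `A_var[0]` has a nonzero off-diagonal entry and `resid_cov` is no longer diagonal. The same process therefore looks causally different depending on which representation is read.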
Ambiguous causal interpretations can emerge in cases where one of the three models indicates time-delayed causal interactions through nonzero off-diagonal coefficients in the , or , while another one does not. This ambiguity can in general only be resolved if the model generating the data is known a priori. In the case of EEG data, for example, (47) reflects the true data-generating process. Therefore, only the parameters of the source VAR process (47) permit meaningful causal interpretation (with respect to the source variables ), while, for example, the VAR parameters in (48) are distorted by the mixing matrix .
Model identifiability. A further complication in the presence of instantaneous effects in the data is that, for mixture of interacting sources as well as SVAR models, the parameters are not uniquely determined by second-order information alone. This is best seen for the mixture of interacting sources model (47). Identifying the model parameters requires the estimation of a full factorization of the data into a mixing matrix and source time series . This means that the estimation problem falls into the blind source separation (BSS) setting, in which Gaussianity of the factors is not sufficient for their identification. The classical approach to BSS, independent component analysis (ICA), assumes statistical independence and non-Gaussianity of the sources to ensure identifiability. This concept can be adopted in the context of source AR models by enforcing independence/non-Gaussianity of the residuals of the source AR process in (47) [13, 15, 42]. In a similar way, independence of residuals has been used in the identification of SVAR models [40, 43].
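The non-uniqueness under second-order information can be illustrated with a rotation argument. The sketch below assumes unit-variance, uncorrelated source residuals. Rotating the sources by any orthogonal matrix then yields a second mixture model with identical sensor-level VAR coefficients and residual covariance, but with different source-level coefficients. The matrices `M`, `A_s` and `Q` are hypothetical values chosen for illustration.

```python
import numpy as np

# Two independent AR(1) sources (diagonal coefficient matrix) and a mixing matrix.
A_s = np.diag([0.9, 0.5])
M = np.array([[1.0, 0.5], [0.0, 1.0]])

# Any orthogonal Q defines alternative sources s' = Q^T s with mixing M Q.
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
M_alt = M @ Q
A_alt = Q.T @ A_s @ Q  # source coefficients of the alternative model

# Both models imply the same sensor-level VAR coefficients ...
A_x = M @ A_s @ np.linalg.inv(M)
A_x_alt = M_alt @ A_alt @ np.linalg.inv(M_alt)

# ... and the same residual covariance (unit-variance source residuals assumed).
cov_x = M @ M.T
cov_x_alt = M_alt @ M_alt.T

same_second_order = np.allclose(A_x, A_x_alt) and np.allclose(cov_x, cov_x_alt)
```

Here `A_s` describes two independent sources, while `A_alt` has nonzero off-diagonal entries and would be read as interaction between sources, although both models generate the same second-order statistics at the sensors.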
Example. Consider the following VAR(1) process with correlated residuals:
This process can also be represented by the SVAR(1) model
as well as the mixtures of interacting sources model
with uncorrelated residuals , . Note that both the VAR(1) and the SVAR(1) representation indicate unidirectional causal interaction between the observed variables and , whereas the mixture model suggests that the observed data can also arise from a mixture of two independent latent sources and . However, another equivalent mixture model
with suggests bidirectional information flow on the source level. Similarly, the following SVAR(1) model indicates bidirectional flow
IV-B Future work
Further effort is required to investigate the behavior of TRGC in the presence of bidirectional information flow. Also, our theoretical analysis only covers the bivariate framework. Both Granger causality and TRGC can result in spurious causality when relevant variables are not included (cf. Fig. 2, panel B). Therefore, an extension of the analysis of time reversal to general multivariate signals would be very interesting. Furthermore, it would be desirable to obtain theoretical guarantees for the performance of TRGC in the presence of true interaction superimposed with noise, in the form of bounds on the false positive rate. A major difficulty here is to obtain the residual covariance of the superposition of a VAR process and additive noise. Analytically computing Granger causality in the presence of noise is mathematically involved even for special cases [44].
Finally, [16] showed that for any causality measure based on cross-covariances, differences of the scores obtained on the original and time-reversed signals correctly indicate the absence of causality on mixtures of independent sources. While we focused here on Granger causality, it remains to be shown whether the validity of time reversal in the presence of causal interaction can also be demonstrated for other causality measures.
Appendix A Proofs for VAR()
A-A The VAR() process and its cross-covariance function
Consider a stable bivariate VAR() process, , as defined in (1),
where is a dimensional white noise process (i.e.