The dynamic importance of nodes is poorly predicted by static topological features

One of the most central questions in network science is: which nodes are most important? Often this question is answered using topological properties such as high connectedness or centrality in the network. However, it is unclear whether topological connectedness translates directly into dynamical impact. To this end, we simulate the kinetic Ising spin model on generated and real-world networks with weighted edges. The extent of the dynamic impact is assessed by causally intervening on a node's state and measuring the effect on the systemic dynamics. The results show that topological features such as network centrality or connectedness are in fact poor predictors of the dynamical impact of a node on the rest of the network. A solution is offered in the form of an information-theoretical measure named information impact. The metric accurately reflects the dynamic importance of nodes in networks under natural dynamics using observations only, and is validated using causal interventions. We conclude that the most dynamically impactful nodes are usually not the most well-connected or central nodes. This implies that the common assumption that topologically central or well-connected nodes are also dynamically important is false, and that abstracting away the dynamics from a network before analyzing it is not advised.




Keywords: Information theory · Causality · Driver-node identification

1 Introduction

Understanding complex dynamical systems is a fundamental problem for the 21st century [Mercury2000]. A complex system is, per definition, a system with a heterogeneous structure of interactions. When governed by a certain dynamic, these systems are said to form a complex dynamical system. A key feature of many complex dynamical systems is the absence of a central control unit, i.e. the complex systemic behavior is said to 'emerge' from the network through local, often non-linear, interactions. Emergent behavior exists by virtue of these interactions and cannot be found in isolated elements. Importantly, emergent behavior pervades nature at all spatio-temporal scales, such as galaxies emerging from their stars [Penrose2010], ecosystems emerging from food-webs [Breckling2005], human cognition from the interaction of populations of neurons [Thompson2004, Kandel2000], cellular regulatory processes from protein-protein interactions [Alberghina2009], or atoms from elementary particles [Penrose2010]. Understanding the inner workings of these systems is a central issue for many scientific disciplines and is crucially important for the public at large.

Obtaining explanations in complex dynamic systems may start with asking 'Which node is dynamically most important?', i.e. which node drives the systemic behavior. For example, it is desirable to understand how to effectively stop a viral infection from becoming a pandemic by shutting down the one airport whose continued operation would lead to catastrophic results, which brain area is dynamically most important for a cognitive function, or how traffic could be rerouted to attenuate traffic jams. However, currently available theories and algorithms for inferring the causal influence of dynamic variables are not readily applicable to complex systems. Previous studies fail to address three primary issues relating to (i) assumptions on the mechanisms that generate system dynamics, (ii) intervention methods, and (iii) the applicability of structural methods and metrics when considering form versus function in complex systems.

First, most methods for analyzing complex systems are developed using at least one of the following assumptions (see [Wang2016] for an overview): stationarity of system state dynamics, (local) linearity of dynamics, time-independence of the interactions, or dynamics having reached equilibrium. An advantage of these assumptions is that they often lead to analytical expressions and/or closed-form solutions of the problem [Penrose2010]. However, real-world dynamical systems are notoriously non-linear in general, where small variations in the input of a system may result in non-intuitive systemic behavior. The aforementioned assumptions on dynamics offer a temporary gain in the reduction of complexity and provide insights into how simpler systems work [Penrose2010]. The translation of the simpler model to its real-world counterpart offers a varying degree of accuracy regarding how the real-world system functions. This poses a problem for theory building and providing causal explanations in the long run.

Second, the dynamic importance of nodes is often determined through overwhelming interventions. An overwhelming intervention is an external influence similar to knocking out a gene from a cell's regulatory process or replacing a signal altogether [Cowan2012, Yan2017, Pasqualetti2013, Liu2016, Gates2016, Yan2012]. Interventions are essential in determining causal influence as they allow for experimental control in determining cause from effect, i.e. they are crucial in obtaining causal explanations for observed systemic behavior [Pearl2000, Woodward2005, Woodward2014, Woodward2015]. One of the aims of science is to provide causal explanations for natural phenomena; the aim is to understand what mechanism $M$ caused behavior $B$. This is achieved by performing controlled interventions on $M$ such that the perturbed mechanism $M'$ induces a changed systemic behavior $B'$. In the worst case, overwhelming interventions change $M$ in such a way that $M'$ yields behavior $B'$ that is independent of $B$. As a consequence, the intervention does not provide any novel insight into which parts of $M$ were relevant for $B$. The smaller the intervention, the higher the correlation between $B$ and $B'$, and as such the stronger the claims about $M$.

Lastly, dynamic importance is often studied from a structural perspective; the most important node is identified based on the intuition of flow, i.e. the more outward connections a node has, the more opportunity it has for spreading perturbations in the network [Harush2017, Lu2016, Ay2008, Gu2015, Chen2017, Yin2016, Kim2017a]. In recent years, applying structural methods to complex systems has gained popularity. Specifically, centrality measures have been used in a variety of different systems ranging from pandemics [Sikic2013], social networks [Freeman1979], and brain networks [Joyce2010] to networks of psychological symptoms [Fried2015]. These measures rank nodes using a real-valued function based on some structural property [Borgatti2005, Borgatti2006, Fried2015, Bringmann2018, Sikic2013]. Borgatti et al. argued that from a graph-theoretical perspective all centrality metrics quantify the walk structure of a network [Borgatti2005, Borgatti2006]. This can be interpreted from a complex-system perspective as assuming a particular type of dynamics which the nodes use to pass on information. For example, betweenness centrality ranks nodes based on how often a node acts as a bridge along the shortest path between any two other nodes in the network. In other words, betweenness centrality assumes that information flows along the shortest path. However, not all dynamics in a network follow shortest-path dynamics. If a river overflows, the water may not follow the shortest path to the sea; rather, it follows the path of least resistance. Depending on the complex system considered, the network structure alone may not be indicative of the types of dynamics that are exerted on the network. In other words, the connectedness of nodes may not be indicative of their dynamical importance. The relation between network structure and dynamic importance was recently shown by Harush et al. [Harush2017].
By varying the dynamics on the network but keeping the structure constant, their results showed that the steady-state dynamics embody a non-linear relation between the network's structure and the governing systemic dynamics. Importantly, the dynamic importance of a node could drastically change as the dynamics of the system are changed. This implies that the interaction of system structure with the systemic dynamics is the cause of failure of many of the structural methods applied to complex dynamic systems. Therefore, if there is no a priori knowledge about the dynamics governing the system, nor an approximation that could lead to reliable estimates of systemic behavior, it would generally be unsafe to use centrality measures or any other structural method as a guide for finding driver-nodes.

Thus, there is a need for a model-free approach that is able to capture the underlying causal mechanism of the system behavior reliably, without assumptions on the dynamics or structure of the network. One promising approach was proposed by Ay and Polani [Ay2008]. Their method relates concepts from information theory to Pearl's do-formalism [Pearl2000]; conditional Shannon mutual information was able to deduce the causal hierarchy in directed acyclic graphs. Similar approaches include transfer entropy [Schreiber2000], the information bottleneck [Buisson2018], Granger causality [Granger1969], and relative entropy [Cover2005]. The main aim of these approaches is to reconstruct the complete network of causal interactions by evaluating the (short-term) causal influence between every pair of nodes [Wang2016]. However, many of these approaches either overestimate or underestimate the influence of so-called synergistic and redundant information [James2016a, Quax2017, Cover2005]. Estimating synergistic information is currently an active area of research, and it remains an open question what the proper approach is to quantify and study this phenomenon, e.g. see [James2016a].

Quax et al. offered a solution by not conditioning on a node or set of nodes, but rather comparing the influence of a node with the entire system state, i.e. using the regular non-conditional mutual information [Quax2013, Quax2013a]. The non-linear interaction between nodes or sets of nodes would be captured by the total system entropy. They introduced the concept of information diffusion time, which embodies the non-linear correlation of a node with the system dynamics using time-delayed Shannon mutual information. Dynamic importance can therefore be quantified by the amount of information a node shares with the system state over time. Importantly, their analytical results show that for infinitely sized, unit-weight, scale-free networks, the nodes with the highest degree were not dynamically most important [Quax2013]. Rather, intermediately connected nodes were found to be more dynamically relevant. This striking result calls into question the assumption that topologically central or well-connected nodes in a network necessarily correspond to dynamically important nodes. However, for practical purposes a network is never infinitely sized, or necessarily tree-like, and it should be investigated whether this result holds for real-world networks of arbitrary size.

The aim of this paper is to test the hypothesis that well-connectedness translates to dynamic importance in a real-world weighted network consisting of psychological symptoms obtained from [Fried2015]. Temporal data are simulated using Glauber dynamics. It will be shown that structural metrics provide no reliable predictive power in determining the driver-node. A solution is offered by means of a novel metric based on time-delayed Shannon mutual information, named information impact, which makes no assumptions on dynamics or structural dependencies. The metric is validated using causal interventions of varying intensity. The results of this study provide scientists of all fields a novel, reliable and accurate metric for the identification of driver-nodes, and enable them to climb the ladder of causation [Pearl2018].

2 Material and methods

2.1 Terminology

A system is defined as a set of nodes $S = \{s_1, \dots, s_N\}$ with a static network structure indicated by edges $E$, where each node in the system is governed by dynamics $D$. Each node state is determined through nearest-neighbor interactions, i.e. node $s_i$ chooses its next state with probability $p(s_i^{t+1} \mid s_{\partial i}^t)$, where $s_{\partial i}$ denotes the nearest neighbors of $s_i$. This is also known as a Markov network.

Node dynamics

For dynamics, this paper considers kinetic Ising spin dynamics. The kinetic Ising model corresponds to one of the simplest models of real complex systems and is believed to provide a sensible description of a large number of physical systems. The model was originally developed to study the behavior of ferromagnetism in statistical mechanics [Brush1967]. A prominent property of the Ising model in higher dimensions (two or more) is the phase transition from an ordered phase to a disordered phase as the noise parameter increases (fig. 1). The increase in noise allows the probabilistic local interactions to produce a macroscopic qualitative change of behavior, from nodes tending to align their states with their neighbors (ordered phase) to being more independent of their neighbors (disordered phase). Both the simplicity of the model and its phase transition have led researchers to successfully model a variety of behaviors, ranging from consensus emerging through social interactions [Grabowski2006, Kandiah2012] and the behavior of lattice gases and fluids [Glauber1963] to the behavior of neurons [Izhikevich2007].

The Ising model consists of binary distributed variables dictated by the Gibbs distribution that interact through nearest-neighbor interactions:

$$p(S) = \frac{1}{Z} \exp\Big(\beta \sum_{\langle i, j \rangle} J_{ij}\, s_i s_j + \beta \sum_i H_i s_i\Big),$$

where $s_i \in \{-1, +1\}$ is a node state, $S$ is the system state, $\beta = 1/T$ is the inverse temperature, $J_{ij}$ are the interaction strengths (edge weights) between nodes $i$ and $j$, $H_i$ represents the external influence on node $i$, and $Z$ is the partition function.
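To make this concrete, the per-node conditional distribution implied by the Gibbs measure can be sampled with Glauber dynamics. The sketch below is illustrative, not the simulation code used in this study; the coupling matrix, temperature, and number of update steps are arbitrary choices:

```python
import numpy as np

def glauber_step(states, J, H, beta, rng):
    """One asynchronous Glauber update: pick a node uniformly at random
    and resample its state from the conditional Gibbs distribution."""
    i = rng.integers(len(states))
    h = J[i] @ states + H[i]                      # local field from weighted neighbors
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))  # p(s_i = +1 | neighbors) for +/-1 spins
    states[i] = 1 if rng.random() < p_up else -1
    return states

# toy symmetric weighted coupling matrix (3 nodes), no external field
J = np.array([[0.0, 0.5, 0.2],
              [0.5, 0.0, 1.0],
              [0.2, 1.0, 0.0]])
H = np.zeros(3)
rng = np.random.default_rng(0)
states = rng.choice([-1, 1], size=3)
for _ in range(1000):
    states = glauber_step(states, J, H, beta=2.0, rng=rng)
```

Here `p_up` is the conditional probability $p(s_i = +1 \mid s_{\partial i})$ that follows from the Gibbs distribution for $\pm 1$ spins.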

Figure 1: Absolute magnetization as a function of temperature. The red dots indicate the temperatures matched to fractions [0.8, 0.5, 0.2] of the maximum magnetization. It is noteworthy that the network seems to be unstable for low temperatures, as indicated by the outliers around T=0. The results are strikingly different from fig. S11, which shows the traditional range of behavior [Glauber1963]. The difference is determined by weighted vs. unweighted graphs.

Information as a measure of dynamic impact

Each node in the system can be considered as an information storage unit. Over time, the information stored in a node will percolate throughout the system while at the same time decaying due to noise. The longer the information of a node stays in the system, the longer it can affect the systemic dynamics. Therefore, dynamic impact of a node can be measured by the amount of information a node shares with the entire system [Quax2013, Quax2013a, Quax2017].

How does one measure the information stored in a node? A node dictated by some dynamic can be considered a random variable that is able to assume different states. In information theory, information is quantified in bits, i.e. yes/no questions concerning the outcome of a random variable. The average information a random variable can encode is called entropy and is defined as:

$$H(X) = -\sum_{x} p(x) \log p(x).$$

Note that all logarithms are base 2 in this paper unless specified otherwise.

Entropy can also be interpreted as the amount of uncertainty of a random variable. In the extremes, the random variable either conveys no uncertainty (i.e. a node always assumes the same state), or its state is chosen uniformly at random among all possible states (uniform distribution). For example, consider a coin flip. How much information does a single coin flip encode? If the coin is fair, i.e. there is equal probability of the outcome being heads or tails, the number of questions needed to determine the outcome is exactly 1. In other words, a fair coin encodes 1 bit of information. However, when the coin is unfair, the information encoded is less than one bit. In the extreme case where the coin always turns up heads, the entropy is exactly 0.
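The coin-flip example can be checked directly with a small entropy function (a minimal sketch using base-2 logarithms, as in the rest of this paper):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

fair = entropy([0.5, 0.5])      # fair coin: 1 bit
biased = entropy([0.9, 0.1])    # unfair coin: less than 1 bit
certain = entropy([1.0, 0.0])   # always heads: 0 bits
```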

The information shared between a node state and a system state can be quantified by mutual information [Quax2013, Cover2005, Quax2013a, Quax2017, James2017]. Mutual information can be informally thought of as a non-linear correlation function. Formally, mutual information quantifies the reduction in uncertainty of random variable $X$ by knowing the outcome of random variable $Y$ [Cover2005]:

$$I(X ; Y) = H(X) - H(X \mid Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)},$$

where $p(x)$ and $p(y)$ are the marginals of $p(x, y)$ over $X$ and $Y$ respectively, and $H(X \mid Y)$ is the conditional entropy of $X$ given $Y$. The conditional entropy is similar to the entropy; it quantifies the uncertainty remaining about the outcome of $X$ after knowing the outcome of $Y$. Please note that the yes/no question interpretation even applies to continuous variables, although it may take an infinite number of questions to determine the outcome of a continuous random variable.
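A minimal sketch of this definition, computing mutual information from an explicit joint distribution (the two toy distributions below are illustrative):

```python
import math

def mutual_information(joint):
    """Mutual information in bits, from a joint distribution
    given as a dict mapping (x, y) outcomes to probabilities."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# two perfectly coupled bits share 1 bit; independent bits share 0
coupled = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
```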

2.2 Information impact

The driver-node is the node whose mutual information with the system is the largest over time. This can be computed using the mutual information while shifting one random variable with respect to the other in time. As such, information impact is defined as the integral of the time-delayed mutual information of a node with the system state:

$$\mu_i = \int_0^{\infty} I(s_i^{t_0} ; S^{t_0 + t}) \, dt,$$

where $S^{t_0+t}$ is the system state at time $t_0 + t$ and $s_i^{t_0}$ is the state of node $i$ a delay $t$ away from that system state. At $t = 0$ the value equals $H(s_i)$ for any node, and for ergodic Markovian systems the delayed mutual information will always decay to zero as $t \to \infty$ [Quax2013, Cover2005]. The question is how fast this decay takes place for each node, and consequently how much information impact the node will have on the system. This property is also known as the data-processing inequality [Cover2005], which states that information can only decrease in Markov chains without external information injection; a proof is provided in appendix A.
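In practice the delayed mutual information must be estimated from simulated trajectories. The sketch below is a naive plug-in estimator, assuming discrete node states and a discrete summary of the system state (e.g. the sign of the magnetization); it illustrates the definition and is not the estimator used in this study:

```python
import math

def delayed_mi(node_traj, sys_traj, delay):
    """Empirical time-delayed mutual information (bits) between the node
    state at time t and a summary of the system state at time t + delay."""
    pairs = list(zip(node_traj[:len(node_traj) - delay], sys_traj[delay:]))
    joint, px, py = {}, {}, {}
    for pair in pairs:
        joint[pair] = joint.get(pair, 0.0) + 1.0 / len(pairs)
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items())

def information_impact(node_traj, sys_traj, max_delay):
    """Riemann-sum approximation of the integral of delayed MI over time."""
    return sum(delayed_mi(node_traj, sys_traj, d) for d in range(max_delay))
```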

Figure 2: Example of non-causal correlation. The directed graph has one main driver-node, 0, as this node has the most opportunity to influence downstream nodes. The star (*) at 0 means it has a self-edge, which is used to stabilize the nodal activity. Non-causal correlation may occur in node 5. Node 5 has no downstream nodes and as such cannot influence the system. However, due to its direct input from node 0, it will yield mutual information values similar to those of node 1. This graph illustrates that the mutual information value for node 5 is inflated, as it shows similar decay to node 1.

The observant reader may have noticed that the definitions of causal impact and information impact are ambiguous with respect to whether the time delay is positive or negative, namely whether the node state is captured forward in time or backward in time with respect to some system state. For undirected graphs there exists time symmetry with respect to how causal influence flows through the network (see appendix D). However, for directed graphs this is not the case. Numerically, it is more convenient to apply a forward method than a backward method. As the network analyzed here is undirected, the forward method was used.

Correlation and mutual information

The driver-node is the node that has the most dynamic impact on the system state. Consequently, it shares the most mutual information with the system over time. However, for all other nodes it is possible for the mutual information value to be inflated due to non-causal correlations. Mutual information can be decomposed into two parts: information that is due to a causal relation between the state variables, and information that is due to spurious correlations and does not overlap with the causal information. Non-causal correlation may occur if a node and the system state both causally depend on a third, confounding variable. This can lead to non-zero mutual information between two units even if the two units do not directly depend on each other in a causal manner.

It is not possible for the node with the highest information impact to have non-causal long-term correlation, as there exists no other node in the system which influenced both the driver-node and the system state. If such a node existed, it would share more information with the system state over time, and as such would itself have yielded a larger information impact.

To give a better intuition for the cause of non-causal correlations, consider a Markov system consisting of two disjoint directed path graphs with a common source (fig. 2). A directed edge from node $i$ to node $j$ means the new state probabilities of node $j$ depend on the current state of node $i$. Due to this causal dependency, the state of node 1 will store information about the previous state of node 0. Similarly, node 2 will store information about node 1, and so on. The information about the states of nodes 4 and 5 will be immediately lost since they cannot influence any other node state; they have no outgoing arrows to any other node in the system. Their states are thus 'overwritten' each simulation step. From the network structure, we can deduce that the causal influence will be the highest for node 0; its information will be lost after at most 5 time steps for sufficiently low temperature. The non-causal correlation issue arises for node 5, which will strongly correlate with node 0. Namely, both 1 and 5 will share a significant, non-zero amount of information about node 0. For this reason the information impact of node 5 will be similar to that of node 1, even though 5 has no ability to influence the system directly. If the network structure is unknown, then there is in general no way of knowing which node's correlations are causal or non-causal. Even if the network structure could be observed, it would still be challenging to correctly identify all the information impact values other than the largest. Consider, for example, adding an edge from node 5 to node 2. There is currently no known method of determining exactly how much information in node 2 is uniquely from 5, uniquely from 1, or jointly from 1 and 5 [Quax2017, Lizier2013, Olbrich2015, James2016a]. This is known as the information decomposition problem and is still an active area of research.

Therefore, we can only be certain that the node with the largest information impact corresponds to the node with the largest causal impact (the driver-node), whereas for all other nodes the information impact could be significantly inflated by non-causal correlations.

2.3 Causal impact

How does one quantify causal influence? A common idea entails that a cause raises the probability of its effect, i.e. $p(\text{effect} \mid \text{cause}) > p(\text{effect})$. However, this does not hold in the case of a confounder when only observations are used (fig. 2). Pearl and Woodward noted that the only way to disentangle spurious causal relations is by means of intervening on the system [Pearl2000, Woodward2005]. To illustrate this, consider a simple barometer. A barometer is a device which measures atmospheric pressure. It can be observed that whenever the barometer levels drop, it starts to rain. Consequently, conditioning on the barometer level, one may conclude that it causes rain. Rain, however, is caused by a fall in atmospheric pressure. Consequently, the barometer reading is not a cause of rain. By physically intervening on the barometer reading, e.g. by setting the needle of the barometer and observing whether it rains or not, one is able to falsify the claim that low barometer readings cause rain.

An intervention is an external influence that changes the distribution of a random variable. External in this context means not part of the closed system. For the Ising model this can be conceptualized as an unobserved node with a directed edge to a node that is part of the model. Recall that in Markovian systems the node state probability is given as

$$p(s_i^{t+1} \mid S^t) = p(s_i^{t+1} \mid s_{\partial i}^t),$$

where $s_{\partial i}$ represents the nearest neighbors of $s_i$. An intervention $\xi$ changes this distribution to

$$p_{\xi}(s_i^{t+1} \mid s_{\partial i}^t),$$

where $p_{\xi} \neq p$. The effect of the intervention will percolate throughout the network over time. A node with large causal influence will cause a large change in the system behavior. Given that the dynamics of the closed system correspond to $p$, the dynamics under intervention can be understood as the system using a different mechanism to generate system behavior. Alternatively, one can interpret this as the system using a different code. Consequently, the question arises: 'How much information does the intervention encode?' Nodes important for the system dynamics will encode more information, whereas nodes that have no causal influence will yield no information. As such, we define the causal influence of node $i$ under intervention $\xi_i$ at time $t$ as

$$\gamma_i(t) = D_{KL}\big(p(S^t) \,\|\, p_{\xi_i}(S^t)\big),$$

where $D_{KL}$ is the Kullback-Leibler divergence (KL-divergence)

$$D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}.$$

KL-divergence is also known as relative entropy and represents the extra number of bits needed to identify a value drawn from $p$ if a code was used corresponding to $q$ rather than the distribution $p$. Alternatively, KL-divergence can be understood in terms of Bayesian inference; it represents the updating of one's belief from the prior distribution $q$ to the posterior distribution $p$. When the new code is the same as the old code, KL-divergence yields 0. This represents the case where an intervention yields no change in the mechanisms driving system behavior.
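A minimal sketch of the KL-divergence for discrete distributions (the two example distributions below are hypothetical stand-ins for an unperturbed and an intervened system-state distribution):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits between two
    discrete distributions over the same outcomes."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

unperturbed = [0.9, 0.1]   # hypothetical system-state distribution
perturbed = [0.5, 0.5]     # distribution after an intervention
```

Note that swapping the arguments changes the value, illustrating the asymmetry discussed below.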

KL-divergence has some desirable properties for an expression of causal influence. In particular, it is non-negative and invariant under parameter transformations. Additionally, KL-divergence is asymmetric, i.e. generally $D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p)$. The asymmetry is a non-issue for the aims of this paper, as the rationale is that the first argument represents the true dynamics of the system, i.e. the unperturbed system. As such, we are interested in finding the driver-node for the unperturbed dynamics and not the perturbed dynamics. Lastly, it should be noted that under certain conditions mutual information can be expressed in terms of KL-divergence and has been used in other causality work using directed acyclic graphs (see appendix C, and [Janzing2013]).

Next, we define causal impact as the integral of causal influence over time:

$$\Gamma_i = \int_0^{\infty} D_{KL}\big(p(S^t) \,\|\, p_{\xi_i}(S^t)\big)\, dt,$$

where $p$ and $p_{\xi_i}$ are the system-state distributions of the unperturbed system and of the system under intervention $\xi_i$ on node $i$, respectively. Causal impact embodies the combined effect of the intervention on node $i$ on the system over time.

Figure 3: Illustration of an abstract system consisting of two variables. The dynamics of this system have two limit cycles, one stable and one unstable, as well as an unstable point at the origin. Within the unstable limit cycle, the system converges to the stable limit cycle (e.g. the dark blue state trajectory). If, however, some external influence pushes the system beyond the unstable limit cycle, it will produce boundless behavior (orange state trajectory).

Intervention size

In many experiments concerned with measuring causal flows in networks, overwhelming interventions are used to determine the causal impact of nodes [Zhang2017, Liu2016a, Gates2016, Yan2017]. In complex dynamic systems, the size of the intervention is crucial for the observed systemic behavior. Consider for example a system as depicted in fig. 3. One can map the state dynamics in a so-called phase plot that shows the relation among the variables. From the phase plot in fig. 3, we observe an unstable point at the origin, a stable limit cycle, and, at a larger radius, an unstable limit cycle. A stable limit cycle is a trajectory to which any system state within a certain radius will converge as $t \to \infty$.

An intervention can be conceptualized as a small perturbation of the system state vector, i.e. its position in state space relative to the unperturbed dynamics. Small perturbations will not drastically alter the system dynamics. Consider for example a trajectory starting out near the origin. If the intervention sets the state to a point inside the unstable limit cycle (dark blue trajectory), the system will converge to the stable limit cycle as $t \to \infty$. If, however, the intervention sets the state outside the unstable limit cycle (orange trajectory), the system may become unstable and change boundlessly. The main point here is that, depending on how the system reacts to an intervention, it may or may not lead to dynamics that are relevant to the unperturbed dynamics. It is more likely, however, that smaller interventions stay closer to the original dynamics. As such, our secondary aim is to test whether there is a relation between intervention strength and dynamic importance. Recall that an intervention in the Ising model can be conceptualized as an external node which injects energy into a node but does not interact with the rest of the system. We test two different levels of external intervention: underwhelming and overwhelming. The underwhelming condition corresponds to a low amount of injected energy (a small external field), whereas the overwhelming intervention corresponds to injecting an infinite amount of energy (an infinite external field). The latter effectively pins the state of the node in Ising models, whereas the former only slightly biases the node states.
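The two intervention regimes can be illustrated by extending a Glauber update with an extra external field on a target node. This is a hypothetical sketch, with `nudge` standing in for the injected energy:

```python
import numpy as np

def glauber_intervened(states, J, H, beta, target, nudge, rng):
    """One Glauber update in which node `target` receives an extra external
    field `nudge`: np.inf pins the node to +1 (overwhelming intervention),
    while a small finite nudge only biases it (underwhelming intervention)."""
    i = rng.integers(len(states))
    if i == target and np.isinf(nudge):
        states[i] = 1          # infinite field: the state is pinned
        return states
    h = J[i] @ states + H[i] + (nudge if i == target else 0.0)
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
    states[i] = 1 if rng.random() < p_up else -1
    return states
```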

2.4 Structural methods

Network analysis has traditionally centered on analyzing the structure of the graph. A fundamental concept within network science is centrality, and measuring the centrality of nodes has become an essential part of understanding networked systems such as social networks, the internet, biological networks, traffic and ecological networks. At its core, a centrality measure quantifies the 'importance' of a node based on some structural property. It allows one to rank nodes based on a real-valued function.

There is, however, a long-standing debate concerning what centrality metrics actually measure for networked systems [Bringmann2018, Borgatti2005, Borgatti2006, Sikic2013]. From a graph theoretical perspective most centrality measures, e.g. betweenness, closeness, eigenvector and degree centrality, essentially classify the ‘walk structure’ of a network [Borgatti2005, Borgatti2006]. A walk from node to node is a sequence of adjacent nodes that begins with and ends with . The structure of walks can be divided along different criteria. For example a trail is a walk in which no edge (i.e. adjacent pair of nodes) is repeated. In contrast, a path is a trail in which no node is visited more than once. Similarly, one could define a walk structure by only using the shortest path from one node to another, or by using random movements between nodes (random walks). The graph theory interpretation can also be interpreted in a complex system framework. Namely, the centrality metrics implicitly assume dynamics on the graph. Betweenness centrality for example, computes centrality based on how often a node acts as a bridge along the shortest path between two other nodes. If one assumes that the network has dynamics where information between nodes follows shortest path, this metric may be a valid description to use and identify dynamically important nodes.

In the best case, a centrality metric is fully predictive for identifying important nodes in a complex system. Consequently, the centrality metric can be used to understand the system. However, an issue with the use of centrality metrics is determining which centrality metric to use. Consider for example fig. 4 and fig. S8; different centrality metrics can identify different nodes as most central. This has led to the common observation that some centrality measures can 'get it wrong' when the aim is to predict dynamically important structure in networked systems. Additionally, the ranking produced by a centrality metric does not quantify inter-rank differences. This potentially leads to underestimation of nodal influence when used in a dynamic context [Sikic2013].

We will show that centrality measures have no meaningful predictive power for identifying the most causal node in systems dictated by the Gibbs measure. We are aware that centrality measures do not embody the full extent of what structural methods, or network science in particular, have to offer. However, many structural methods share the common characteristic listed above, i.e. they quantify the walk structure of a graph. For our analysis, we used the weighted variants of degree centrality, betweenness centrality, information centrality, and eigenvector centrality. What follows is a brief description of these commonly used centrality metrics.

Degree centrality

Degree centrality is the best-known of all the centrality measures. It is often thought that degree centrality is indicative of the dynamic importance of a node. This intuition is based on the concept of flow: the more connections a node has, the more interaction potential that node has, and therefore the more important that node must be. Freeman defined this centrality measure as the count of the number of edges incident upon a given node [Freeman1979]:

$$c_D(i) = \sum_j A_{ij},$$

where $A_{ij}$ is the entry in the row of node $i$ of the adjacency matrix $A$ of the network. Please note that the entries are weighted and not binary.
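As a minimal sketch (not the paper's toolbox), weighted degree centrality reduces to row sums of the weighted adjacency matrix; the toy matrix below is hypothetical:

```python
import numpy as np

def weighted_degree_centrality(A):
    """Weighted degree centrality: row sums of the (weighted) adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return A.sum(axis=1)

# Toy 3-node weighted network
A = np.array([[0.0, 0.5, 2.0],
              [0.5, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
print(weighted_degree_centrality(A))
```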

Betweenness centrality

Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. It was introduced by Freeman as a measure for quantifying the control of communication among humans in social networks [Freeman1979]. Nodes that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness. Formally, this can be written as:

$$c_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where $\sigma_{st}$ represents the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(v)$ is the subset that goes through node $v$. We use the normalized version of betweenness, which divides the betweenness score by the number of pairs of vertices (not including node $v$):
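For intuition, normalized betweenness can be computed from breadth-first-search path counts. The sketch below handles small unweighted graphs only (the study itself used the weighted variant):

```python
from collections import deque

def bfs_paths(adj, s):
    """Shortest-path distances and shortest-path counts from source s."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                q.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    nodes = list(adj)
    info = {s: bfs_paths(adj, s) for s in nodes}
    bc = {v: 0.0 for v in nodes}
    for s in nodes:
        ds, ss = info[s]
        for t in nodes:
            if t == s or t not in ds:
                continue
            for v in nodes:
                if v in (s, t) or v not in ds:
                    continue
                dv, sv = info[v]
                # v lies on a shortest s-t path iff the distances add up
                if t in dv and ds[v] + dv[t] == ds[t]:
                    bc[v] += ss[v] * sv[t] / ss[t]
    n = len(nodes)
    # normalize by the number of ordered (s, t) pairs excluding v
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}

# Path graph 0-1-2-3: the middle nodes carry all shortest paths
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(betweenness(adj))
```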


Information centrality

Information centrality, also known as current-flow betweenness centrality and random-walk betweenness, was developed by Ulrik Brandes and Daniel Fleischer [Brandes2005, Stephenson1989]. The metric is similar to other centrality measures such as betweenness and closeness in the sense that it assumes some sort of flow process on the network structure. Rather than information spreading along shortest paths (as is the case with closeness and betweenness), it is assumed that ‘information spreads efficiently like an electrical current’ [Brandes2005]. Information centrality thus implicitly models how current would flow through a network and is defined for node $v$ as:

$$c_{CF}(v) = \frac{1}{n_B} \sum_{s,t \in V} \tau_{st}(v),$$

with normalization constant $n_B$, where $\tau_{st}(v)$ represents the ‘current’ through node $v$ when a unit current is injected at $s$ and extracted at $t$.
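The flow analogy can be made concrete with the graph Laplacian. The following sketch follows the Brandes-Fleischer idea (unit current injected per node pair, node throughput from incident edge currents); it is illustrative only and the normalization is an assumption:

```python
import numpy as np

def current_flow_betweenness(A):
    """Current-flow (information) centrality via the Laplacian pseudo-inverse."""
    A = np.asarray(A, float)
    n = len(A)
    L = np.diag(A.sum(1)) - A          # weighted graph Laplacian
    Linv = np.linalg.pinv(L)
    score = np.zeros(n)
    for s in range(n):
        for t in range(s + 1, n):
            b = np.zeros(n)
            b[s], b[t] = 1.0, -1.0     # unit current in at s, out at t
            p = Linv @ b               # node potentials
            flow = A * np.abs(p[:, None] - p[None, :])  # |current| per edge
            thru = 0.5 * flow.sum(1)   # throughput per node
            thru[s] -= 0.5             # discount the injected unit
            thru[t] -= 0.5             # at both endpoints
            score += thru
    return score / ((n - 1) * (n - 2) / 2)

# Path graph 0-1-2-3: on a tree, current flow equals shortest-path flow
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
print(current_flow_betweenness(A))
```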

Eigenvector centrality

Eigenvector centrality is the most difficult centrality measure to give an intuitive feeling for. Where $A$ is the adjacency matrix of the system, the eigenvector centrality $x_i$ of node $i$ is defined as:

$$x_i = \frac{1}{\lambda} \sum_j A_{ij} x_j, \qquad \text{or in matrix form} \qquad A x = \lambda x.$$

For any square matrix of rank $r$, there are at most $r$ eigenvector-eigenvalue pairs. A common choice for eigenvector centrality, motivated by the Perron-Frobenius theorem, is the eigenvector belonging to the largest eigenvalue [Debye1918, Frobenius1912]. This has the desired property that if $A$ is irreducible, or equivalently if the graph is strongly connected, the eigenvector is both unique and positive.

The sign and size of an eigenvalue matter for the relation between the eigenvector entries and the importance of a node. In linear differential equations, negative eigenvalues correspond to non-oscillatory, exponentially stable solutions, whereas in difference equations they indicate oscillatory behavior. Geometrically speaking, a negative eigenvalue embodies a reflection across some axis.

Intuitively speaking, eigenvector centrality quantifies the influence of a node in the network. It assigns relative scores to all nodes based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. A high eigenvector score implies that the node is connected to many other nodes that themselves have high scores. Google PageRank and Katz centrality are variants of eigenvector centrality [Langville2005]. A node with high eigenvector centrality does not necessarily have many connections (incoming or outgoing); for example, a node may have a high eigenvector centrality if it has few connections, but those connections are to nodes of high importance.
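Concretely, the defining relation $Ax = \lambda x$ can be solved with power iteration; the sketch below assumes a connected, non-bipartite graph so that the iteration converges to the Perron eigenvector:

```python
import numpy as np

def eigenvector_centrality(A, iters=200):
    """Power iteration converging to the Perron (largest-eigenvalue)
    eigenvector of a non-negative adjacency matrix."""
    A = np.asarray(A, float)
    x = np.ones(len(A))
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x

# Triangle (0-1-2) with a pendant node 3 attached to node 0
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], float)
c = eigenvector_centrality(A)
print(c.round(3))  # node 0 highest; nodes 1 and 2 tied; pendant lowest
```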

2.5 Data

The data originate from the Changing Lives of Older Couples (CLOC) study, which compared depressive symptomatology, assessed via the 11-item Center for Epidemiologic Studies Depression Scale (CES-D), among those who lost their partner (N=241) with a still-married control group (N=274) [Fried2015]. Each of the CES-D items was binarized with the aid of a causal search algorithm using the Ising model developed by [VanBorkulo2014] and represented as a node with weighted connections (fig. 4). For more information on the procedure see [VanBorkulo2014, Epskamp2017, Fried2015]. The 11 CES-D items are (abbreviated names used in the remainder of this text in brackets): ‘I felt depressed’ (depr), ‘I felt that everything I did was an effort’ (effort), ‘My sleep was restless’ (sleep), ‘I was happy’ (happy), ‘I felt lonely’ (lonely), ‘People were unfriendly’ (unfr), ‘I enjoyed life’ (enjoy), ‘My appetite was poor’ (appet), ‘I felt sad’ (sad), ‘I felt that people disliked me’ (dislike), and ‘I could not get going’ (getgo).

Figure 4: The psycho-symptom network obtained from [Epskamp2017] weighted by different metrics. The metrics enclosed by the box represent structural metrics highlighting different topological features (see main text). The size of each circle is proportional to the importance assigned by the metric in each subplot; the larger the radius, the more important the node. The right column depicts the ground truth as validated by causal interventions. The red and green edges represent negative and positive weights, respectively. Additionally, the thickness of the edges is proportional to the edge strength. Most notably, the information impact of each node closely mirrored low causal impact, but not high causal impact. This indicates that information impact can reliably identify the driver-node under low causal influence. None of the structural metrics showed similar predictive power.

2.6 Numerical methods

Magnetization matching

In the kinetic Ising model the temperature parameter embodies the external noise. The phase change from congruent to incongruent behavior is not caused by this external noise alone (fig. 1): in addition to the temperature, the connections between nodes also determine the exact shape of the phase change. Consequently, the absolute magnetization is representative of both the external noise induced by temperature and the inter-node interactions. As such, we matched the noise level in the system based on the absolute mean magnetization to get an estimate of the accuracy of information impact on causal impact. Specifically, the temperature was matched with a fraction of the maximum magnetization by means of regression (see fig. 1). A sigmoid kernel was used to estimate the required temperature for the given magnetization levels listed above.

For unitary-weight kinetic Ising models the magnetization decays sigmoidally from full magnetization at $T = 0$ to zero as $T \to \infty$ (fig. 1) [Glauber1963]. Simulations were performed for the Ising model over a temperature range spanning this decay, at fixed resolution.
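This matching step can be sketched as a logistic kernel fit followed by its analytic inverse. The data below are synthetic stand-ins for a simulated magnetization curve, and the kernel form is an assumption:

```python
import numpy as np
from scipy.optimize import curve_fit

# Logistic decay kernel: |M|(T) near 1 for small T, decaying to 0 for large T
def sigmoid(T, a, b):
    return 1.0 / (1.0 + np.exp(a * (T - b)))

# Synthetic magnetization curve standing in for simulated data
T = np.linspace(0.1, 5, 50)
M = sigmoid(T, 2.5, 1.5) + 0.01 * np.random.default_rng(0).normal(size=T.size)

(a, b), _ = curve_fit(sigmoid, T, M, p0=(1.0, 1.0))

# Invert the fitted kernel to find the temperature at a target fraction
# of the maximum magnetization, e.g. 80%
target = 0.8
T_match = b + np.log(1 / target - 1) / a
print(round(T_match, 2))
```

The inversion follows directly from solving sigmoid(T) = target for T, so the fitted kernel evaluates exactly to the target magnetization at the matched temperature.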

Figure 5: Simulation setup. Independent Markov chains were run for a fixed number of steps. The last sample in each chain was used to construct the state distribution, as illustrated by the dashed box. For each of the unique states, the conditional distribution was constructed in parallel.

Base procedure

For each temperature, independent Markov chains are run for a fixed number of simulation steps (fig. 5). Every simulation step follows Glauber dynamics:

  1. Pick a node at random from the system with equal probability;

  2. Compute energy using Eq. (2);

  3. Flip the node state with the probability given by Eq. (1).

From this set, the distribution over states was constructed. For each of the unique states, Monte Carlo methods were used to construct the conditional distribution using repeated simulations over a fixed number of time-steps.
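The procedure above (independent chains, Glauber updates, last-sample state distribution) can be sketched as follows. All parameters, couplings and sizes here are illustrative, not the study's actual settings:

```python
import numpy as np

def glauber_step(state, J, beta, rng):
    """One Glauber update: pick a random spin, flip it with the heat-bath
    probability 1 / (1 + exp(beta * dE))."""
    i = rng.integers(len(state))
    # Energy change for flipping spin i in the (weighted) Ising model
    dE = 2.0 * state[i] * (J[i] @ state)
    if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
        state[i] = -state[i]
    return state

rng = np.random.default_rng(42)
J = np.ones((5, 5)) - np.eye(5)  # toy uniform couplings, no self-coupling

# Many independent short chains; keep the last sample of each chain
samples = []
for _ in range(200):                  # independent Markov chains
    s = rng.choice([-1, 1], size=5)
    for _ in range(50):               # simulation steps per chain
        s = glauber_step(s, J, beta=0.5, rng=rng)
    samples.append(tuple(s))

print(len(set(samples)), "unique states observed")
```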


Interventions are performed at two different strengths (underwhelming and overwhelming) independently. Each intervention is applied for a fixed number of time steps to enhance the difference between the intervened and natural state distributions. After this period the intervention is released for the remaining steps (fig. 5). The duration of the decay is proportional to the dynamic impact of each node. This entire procedure was repeated over multiple independent trials.


A general toolbox was developed for analyzing any discrete system using information impact, e.g. Susceptible-Infected-Recovered [Matsuda1994] or random Boolean networks [Harvey1997]. The core engine is written in Cython 0.28.5 with Python 3.7.2 and offers C/C++ level performance1; for more information see appendix H. As such, we invite scientists from all disciplines to easily include information impact in past and future experiments.

2.7 Data pre-processing

Area under the curve estimation

The mutual information over time and KL-divergence over time were rescaled to the range (0, 1) per trial set. This transformation does not affect the relative ordering of the nodal decay curves, as the sample data is merely multiplied by a scalar. A double exponential was fit to estimate the causal and information impact (Eq. (5) and (10)) using least-squares regression.

The kernel proved to be a good fit in general, as indicated by the low fit error (fig. S1).
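As an illustration of this fitting step, assume a kernel of the hypothetical form $f(t) = a e^{-bt} + c e^{-dt}$ (the paper's exact parameterization is not reproduced here); the area under the curve then follows analytically from the fitted parameters:

```python
import numpy as np
from scipy.optimize import curve_fit

def double_exp(t, a, b, c, d):
    # Hypothetical double-exponential decay kernel
    return a * np.exp(-b * t) + c * np.exp(-d * t)

t = np.linspace(0, 10, 100)
y = double_exp(t, 0.7, 2.0, 0.3, 0.2)  # synthetic rescaled decay curve

(a, b, c, d), _ = curve_fit(double_exp, t, y, p0=(0.6, 1.5, 0.4, 0.3),
                            maxfev=5000)

# Closed-form area under the fitted curve on [0, inf):
# integral of a*exp(-b*t) is a/b, likewise for the second term
auc = a / b + c / d
print(round(auc, 2))
```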

Sampling bias correction

Empirical estimates of mutual information are inherently contaminated by sampling bias. In order to correct for this, the Panzeri-Treves correction was applied [Panzeri2007]. This method offers good performance in terms of both signal-to-noise ratio and computational complexity.

Outlier rejection

Estimating the area under the curve is affected by some estimation noise, due either to the numerical methods or to sampling bias. Outlier rejection was performed using a Minimum Covariance Determinant estimator on the causal impact and information impact estimates [Rousseeuw1999]. The procedure involves computing the Mahalanobis distance of the nodal estimates and rejecting anything more than the chosen number of standard deviations above the mean (fig. S2). The values of the outliers were set to the mean of the in-group set of the covariance estimates.

On average 15.9% of the data was rejected (fig. S3).
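A sketch of this rejection step using scikit-learn's MinCovDet; the synthetic data, the 2-standard-deviation threshold, and the replacement rule are illustrative assumptions:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
# Paired (causal impact, information impact) estimates; three injected outliers
X = rng.normal(0, 1, size=(100, 2))
X[:3] += 8

mcd = MinCovDet(random_state=0).fit(X)
d = np.sqrt(mcd.mahalanobis(X))  # robust Mahalanobis distances

# Reject anything more than 2 std. devs above the mean distance (threshold assumed)
thresh = d.mean() + 2 * d.std()
outliers = d > thresh
X_clean = X.copy()
X_clean[outliers] = X[~outliers].mean(axis=0)  # replace with in-group mean
print(outliers.sum(), "outliers replaced")
```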

2.8 Classification analysis

Mathematical formulation

Each of the independent variables (information impact and the centrality measures) is considered a predictor. For each predictor the maximum value was extracted; for example, in each trial the node corresponding to the highest information impact was extracted. A random forest (RNF) classifier was trained on each of the intervention magnitudes to estimate which predictor offers the best predictions of the causal impact [Breiman1984]. The dependent variable was binarized into congruency (correct / false) for each of the predictors. The data can thus be described as:


where each data-pair consists of a vector of regression predictors at a given trial, i.e. the output of the centrality metrics and information impact, paired with a binary output vector whose length equals the number of regressors. The binary vector represents whether the ground truth was congruent or incongruent with the output provided by the different regressors.

A decision tree is built by recursively partitioning the space such that samples with the same labels are grouped together. Let the data at node $m$ be represented by $Q_m$. For each candidate split $\theta = (j, t_m)$ consisting of a feature $j$ and threshold $t_m$, partition the data into $Q_m^{left}(\theta)$ and $Q_m^{right}(\theta)$ subsets:

$$Q_m^{left}(\theta) = \{(x, y) \mid x_j \leq t_m\}, \qquad Q_m^{right}(\theta) = Q_m \setminus Q_m^{left}(\theta),$$

with the impurity at $m$ being computed by impurity function $H$:

$$G(Q_m, \theta) = \frac{n^{left}}{n_m} H\left(Q_m^{left}(\theta)\right) + \frac{n^{right}}{n_m} H\left(Q_m^{right}(\theta)\right).$$

$\theta$ is chosen such that it minimizes the impurity:

$$\theta^* = \operatorname{argmin}_\theta \; G(Q_m, \theta).$$

This procedure is repeated recursively for the subsets $Q_m^{left}(\theta^*)$ and $Q_m^{right}(\theta^*)$ until the maximum allowed depth is reached or $n_m = 1$.


where the class proportion $p_{mk} = \frac{1}{n_m} \sum_{x_i \in Q_m} \mathbb{1}(y_i = k)$ is computed for each target class $k$ of node $m$, representing a region with $n_m$ samples. Classification is achieved using the majority class $\operatorname{argmax}_k p_{mk}$.

Conversely, misclassification is defined as $1 - \max_k p_{mk}$, where $Q_m$ is the training data in node $m$. For the random forest classifier an ensemble of such trees was built and the final estimate was decided by majority voting.
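The split search described above can be illustrated for a single feature with Gini impurity (a toy sketch, not scikit-learn's optimized implementation):

```python
import numpy as np

def gini(y):
    """Gini impurity H(Q) = 1 - sum_k p_k^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_split(x, y):
    """Exhaustively search thresholds on one feature, minimizing the
    weighted impurity of the two induced subsets."""
    best = (None, np.inf)
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if g < best[1]:
            best = (t, g)
    return best

x = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_split(x, y))  # perfect split at t = 0.3 with impurity 0.0
```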


The RNF classifier was provided by the Python package scikit-learn and fitted separately for the different intervention levels (underwhelming, overwhelming). The classifier has a number of hyperparameters that can be controlled, with varying impact on feature bias and prediction performance. Probst et al. recently showed that in general the effect of hyperparameter selection on random forests is small compared to other machine learning methods [Probst2018]. Therefore, the default parameters were used.

The aim here is two-fold: (a) to find the feature that is most important for predicting causal impact, i.e. explanatory modeling, and (b) to find the feature that leads to the best prediction of causal impact, i.e. predictive modeling [Shmueli2003]. Since the hyperparameter settings are deemed stable regardless of the exact configuration, the goal here is to quantify the bias in feature selection. To this end, leave-one-out cross-validation was used to compute the accuracy score on the features (information impact, weighted degree centrality, closeness centrality, information centrality, and eigenvector centrality).

RNF classifiers are prone to bias when features lack variance [Strobl2007]. Since the structural methods provide the same estimate over trials, a sensitivity analysis was performed to evaluate the reliability of the feature importance. This was achieved by shuffling the values of one feature at a time for every fold in the cross-validation. A feature with large predictive power will be associated with a large reduction in accuracy score. In contrast, if a feature has no impact on the prediction accuracy, shuffling it will yield no difference from the non-shuffled score. The reduction was quantified as a delta score indicating the percentage loss due to the shuffling procedure compared to the non-shuffled accuracy score. For each of the folds, a shuffled score was computed for each feature.
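The delta-score idea can be sketched on toy data in which only one of four features is informative, standing in for information impact versus the constant structural features; the sample size and forest size are reduced for brevity:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
n = 40
X = rng.normal(size=(n, 4))
y = (X[:, 0] > 0).astype(int)  # only feature 0 carries signal

clf = RandomForestClassifier(n_estimators=20, random_state=0)
base = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

deltas = []
for j in range(4):
    Xs = X.copy()
    Xs[:, j] = rng.permutation(Xs[:, j])  # destroy feature j
    shuffled = cross_val_score(clf, Xs, y, cv=LeaveOneOut()).mean()
    deltas.append(base - shuffled)        # accuracy lost by shuffling j

print([round(d, 2) for d in deltas])  # only the first delta is large
```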

The default settings were used for the RandomForestClassifier (scikit-learn 0.20.2), except for n_estimators, which was set to 100. n_estimators is the number of trees to fit; generally speaking, more trees improve the accuracy of the classifier at the cost of increased run time.

Statistical analysis

Two hypotheses were tested: (a) that the accuracy score differed significantly from random choice, and (b) that the feature importance differed significantly from a uniform distribution. The first was evaluated using a binomial test over the trials. The second was evaluated with a goodness-of-fit test of the feature importance against a uniform distribution. Additional post hoc tests were applied to test whether information impact differed significantly from the aggregated feature importance of the centrality measures. The chosen alpha level was maintained unless specified otherwise, and p-values were Bonferroni corrected for the number of tests.
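These tests map directly onto scipy.stats; the counts below are hypothetical, for illustration only:

```python
from scipy import stats

# Hypothetical counts: classifier correct in 89 of 100 trials vs. p = 0.5 chance
binom = stats.binomtest(89, 100, p=0.5, alternative='greater')

# Goodness-of-fit of (hypothetical) feature-importance counts vs. a uniform split
observed = [70, 10, 8, 7, 5]
chi2, p = stats.chisquare(observed)  # expected frequencies: uniform by default

alpha = 0.05 / 2  # Bonferroni correction over the two tests (alpha assumed)
print(binom.pvalue < alpha, p < alpha)  # True True
```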

Figure 6: Random forest classifier performance. Dotted lines indicate the random choice model. (top) Normalized accuracy score per temperature and intervention condition. Accuracy was significantly different from random choice for both underwhelming and overwhelming interventions. (middle) Normalized feature importance per predictor and condition; a value of 1 indicates most important. Feature importance significantly differed from a uniform distribution in both the underwhelming and overwhelming conditions. (bottom) The delta score was used to validate the feature importance score. Only shuffling the information impact regressor had an effect on the prediction accuracy. Statistical results for this graph are in tables LABEL:table:chi_scores, LABEL:table:rnf_scores.

3 Results

The RNF classifier yielded high accuracy scores for underwhelming (78.73%) as well as overwhelming interventions (100%, fig. 6). The binomial tests revealed the classifier to be significantly better at predicting the causal impact than random choice for both underwhelming and overwhelming interventions. Importantly, the feature importance scores indicate that information impact was the only feature that significantly contributed to the accuracy scores. The analysis revealed that the observed feature importance scores were significantly different from a uniform distribution (LABEL:table:chi_scores) in both underwhelming and overwhelming interventions. In addition, the delta scores showed that the shuffle procedure only significantly affected the accuracy score when the information impact vector was shuffled (34%); in contrast, shuffling the structural features had no effect (0%) on the obtained prediction accuracy. The perfect prediction accuracy for overwhelming interventions is misleading, however: for overwhelming interventions, the node with the highest information impact never matched the node with the highest causal impact (see the observed column in LABEL:table:chi_scores), so the RNF classifier was 100% correct in predicting ‘false’. The combined results lead to the conclusion that information impact is predictive for low intervention size. In contrast, none of the structural methods showed any predictive power (fig. S7 and S6). The three-way relation between causal impact, information impact and centrality measures is depicted in fig. 10.

For closed systems, it was hypothesized that mutual information over time would be highest for the driver-node in systems without confounders. The results imply that this hypothesis was wrong. From fig. 8, however, it can be observed that the lack of performance is not due to erroneous prediction based on false driver-node detection; rather, it is due to a lack of resolution. Namely, from the time plots in fig. 9 three crucial observations can be made. First, causal influence over time has decay curves similar to delayed Shannon mutual information. In fact, the causal impact varies linearly with information impact (figs. 11 and 8); no linear relation is found for overwhelming interventions (figs. S7 and S6). Second, there is an interaction between the magnetization ratio and the largest driver-node: for low noise, ‘sleep’ is the most causal node, whereas for higher noise levels ‘sad’ is the most causal node. Importantly, the switch in driver-node is mirrored by Shannon mutual information. Third, as the magnetization ratio decreases, the curves tend to collapse: the distance between the largest driver-node and the second largest driver-node decreases significantly as the magnetization ratio decreases (fig. 7). Consequently, it was this reduction in resolution that prevented the classifier from reaching 100% accuracy, not a false assertion that the driver-node lacked the highest information impact score. Since the definition of the driver-node targets the maximum value only, the prediction accuracy could be increased by adjusting the metric to account for cases where the causal influence of several nodes is similar.

Figure 7: Mean distance (with standard deviations) from the largest driver-node to the second largest driver-node. The distance between the driver-node and the node with the second highest value decreases as the magnetization ratio decreases. This reduces the ability to resolve the most causal node, as a group of nodes with similar causal influence emerges from middle to high thermal noise.

Information impact varies linearly with low intervention strength

Unexpectedly, the results indicated that information impact shares a linear relation with causal impact at low intervention strength. As such, a post hoc multiple linear regression was performed to quantify the main effects between the regressors and low-intervention-strength causal impact (fig. 11). Information impact had a high, significant regression coefficient. In contrast, no centrality metric had a significant regression coefficient (table A7). No tests were performed on high intervention strength due to its non-linear dependency. The linear relation is striking, as it implies that information impact could be leveraged to provide a direct estimate of dynamic importance for all nodes in the network. Although the claim can certainly be made for this particular system, no generalization can be made to all network structures due to the aforementioned confounding of mutual information.

Figure 8: Causal impact as a function of information impact per node in different conditions. From top to bottom the temperature increases from low to high thermal noise. The left column reflects the causal impact under an underwhelming intervention, and the right under an overwhelming intervention.

Previous research showed that nodes with high degree have a diminishing role in scale-free networks [Quax2013]. The dynamic importance as measured by information diffusion time showed a positive skew with degree centrality: similar to a hill climb, the dynamic importance gradually increased with degree, was largest for nodes with intermediate degree centrality, and diminished for high degree. The degree distribution of the network used here would fall within the linear phase of the results in [Quax2013]. However, we do not observe a linear relation between degree and information impact. The difference may be due to the difference in metric (time-based versus integral-based), differences in network structure, and/or network size.

Future studies should determine whether the linear relation between causal impact and information impact holds for larger network structures of arbitrary size governed by Gibbs dynamics.

4 Discussion

4.1 Centrality measures fail to identify the driver-node

The structure of a network is crucial for the observed behavior of a system. This has led to the implicit assumption that the connectedness of a node can be related to its dynamic importance [Pequito2017, Liu2016a, Liu2011a, Lu2016, Cornelius2013, Zanudo2017]. The results show that this assumption does not always hold for complex systems. In order to understand the system behavior, a metric needs to account for the wide range of behavior that the system exhibits. For example, fig. 9 shows how the driver-node changes as a function of noise. For low noise levels ‘sleep’ is the driver-node. However, as the noise level increases, a set of nodes obtains similar causal influence: namely ‘depr’, ‘lonely’, and ‘sad’. By definition, structural methods do not have the ability to change their prediction of the driver-node based on a change in dynamics alone, i.e. they always produce the same prediction regardless of the behavior of the system. Consequently, this raises the question whether applying centrality metrics to complex systems is appropriate in general.

In particular, the network used in this study stems from bereavement scores in elderly people showing depressive symptoms (see [Fried2015] and appendix G). The symptom ‘appet’ has a relatively high betweenness score. Recall that betweenness centrality ranks nodes based on how often a node acts as a bridge between other nodes; in other words, a high betweenness score indicates that the node can be used to quickly travel between distant nodes of the network. However, ‘appet’ is not associated with a high causal impact score in general. As a consequence, centrality metrics yield inappropriate interpretations if they are assumed to provide information on dynamic importance when the dynamics governing the system are unknown or cannot be assumed. Concerns regarding the validity of structural metrics in psychological networks were recently highlighted by Bringmann et al. [Bringmann2018], who argue that from the onset it is not clear what centrality metrics actually measure in networks of psychological symptoms. The issues raised by Bringmann et al. were on a conceptual level, without actual data. The results from this study, combined with the theoretical results from [Quax2013], strengthen their case with quantitative data on real-world and artificial networks.

4.2 Information impact: an excellent predictor for unperturbed dynamics

One of the primary aims of science is to provide causal explanations of natural phenomena. As a consequence, a scientist’s goal is to understand what causes the observed systemic behavior. Interventions allow for a controlled approach to determine cause from effect and are essential to the scientific method [Woodward2005, Woodward2014, Woodward2015, Pearl2000, Pearl2018]. Overwhelming interventions are ‘drastic’ changes to a system, similar to knocking out a gene or replacing a systemic signal altogether [Pequito2017, Liu2016a, Liu2011a, Lu2016, Cornelius2013, Zanudo2017, Izhikevich2007, Rabinovich2006, Rabinovich2006a, Rabinovich2008]. They are often preferred as they maximize the experimental effect, and by extension the information gained from the experiment. The effect of intervention size can be clearly seen in fig. 9: large causal interventions generated a different ordering in the causal structure compared to low interventions, and, importantly, the driver-node changed as well. This raises the question whether the system after an overwhelming intervention is still similar to the unperturbed system. If the goal is to provide causal explanations for the unperturbed system, overwhelming interventions are inappropriate, as they induce artificial system dynamics that do not occur in the unperturbed system.

Notably, information impact mirrored the driver-node identified by low causal influence only. Consequently, information impact can be used as a tool to provide insights into what drives systemic behavior. Additionally, as information impact is computed from observations only, it can be used in situations where direct intervention is difficult, e.g. in systems that operate on time-scales exceeding a human lifetime or that are physically hard to observe.

Figure 9: Mean causal influence and information decay as a function of time, with 2 standard deviations as confidence interval. The driver-node for each subplot, based on largest causal impact or information impact, is indicated by the color of the dot in the upper right corner. The left column represents the delayed Shannon mutual information with Panzeri-Treves correction as a function of time without any external interventions (unperturbed); the middle and right columns depict the causal influence as a function of time for the underwhelming and overwhelming interventions, as measured by a delayed version of the Kullback-Leibler divergence. Each row represents a different magnetization ratio as indicated in fig. 1.
Figure 10: The relation between causal impact, information impact and centrality measures. Each column represents the data in a different intervention condition, whereas each row shows a different magnetization ratio (see main text). In each subplot the colors represent the nodes in the graph, and different markers indicate centrality measures. The data were rescaled for plotting purposes only and do not reflect the original data range. For underwhelming interventions there is no clear relation between causal impact and any centrality metric; there is, however, a linear dependency between information impact and causal impact. In contrast, for overwhelming interventions some centrality metrics do seem to vary linearly with causal impact, although the relation is not as clear as in the underwhelming case. Additionally, the system dynamics under overwhelming interventions do not reflect the unperturbed dynamics.
Figure 11: Multiple linear regression on underwhelming causal intervention. The data for each subplot contain all values regardless of magnetization ratio, for underwhelming causal interventions only. The significant regression coefficients are shown in boxes with colors matching the scatter points. The dotted lines are the fitted lines obtained from the model with standard error. The $R^2$ value of the model is shown in the upper right corner of the top left plot. There is a clear linear relation between causal impact and information impact regardless of temperature. For none of the centrality metrics does a similar relation hold.

4.3 Future directions

Generalization to other types of dynamics and graph structures

The results show that for Gibbsian dynamics on a weighted real-world network, structural methods are not predictive of causal influence. Although Gibbsian dynamics represent a general class of dynamics, future studies should investigate whether this holds for other types of dynamics, such as epidemic, biochemical, regulatory or population dynamics. The advantage of the proposed information theoretical method is that it allows for direct comparison of different types of dynamics, under the condition that they can be expressed in terms of probabilities.

Additionally, future studies should investigate the interaction between network structure and the dynamics governing the nodes, similar to [Harush2017]. Preliminary results using different graph structures show that information impact generalizes well to other graph structures (see appendix F): information impact correctly identifies the driver-node for low causal interventions but not for high causal interventions. However, no comparisons were made using different dynamics on the same graph structure. Future studies should investigate how information impact performs as a function of both dynamics and graph structure.

Detecting transient dynamical structures

Information impact was originally derived from a time-based measure using delayed mutual information [Quax2013]. The area under the curve removes time in favor of a single value for comparison. Complex dynamical systems can behave at different time scales. An interesting direction would be to determine how, in systems with varying time scales, information impact can be used to detect transient systemic behavior by shifting the integration range. For example, in Ising models it is commonly observed that nodes with high degree (hubs) can flip sporadically over time, with large-scale effects on systemic properties such as the mean magnetization. Quax et al. postulated that such a flip is caused by bottom-up interactions, where nodes with lower degree flip, causing a chain reaction that moves as a ripple through the network, eventually causing a node with higher degree to flip [Quax2016]. The exact nature and conditions under which such a flip occurs may provide insights into riot dynamics, swings of the popular vote in elections, or how damage to DNA can cause cellular failure.

Unraveling a larger causal structure

As mentioned, mutual information values can be inflated for nodes other than the driver-node. However, for a large enough network it is highly likely that two nodes have equal causal influence (fig. 8), in which case not all information impact values will be confounded. Equal causal influence of nodes may be resolved by computing the mutual information between the nodes directly. For example, consider two nodes with equal causal influence and correspondingly similar information impact. By computing the mutual information between them, one can conclude whether they are separate driver-nodes: if it is low, the two nodes are in fact separate driver-nodes. This raises the question how much of the causal structure can be extracted using observed data and information impact.

5 Conclusion

The goal of this paper was to show that structural methods provide unreliable estimates of the driver-node in complex dynamical systems. The results from this study show that the common assumption that topologically central or well-connected nodes are dynamically most important is false. Furthermore, this implies that we cannot abstract away the dynamics of a complex dynamical system before analyzing it. The proposed novel metric, information impact, was able to reliably identify the driver-node under natural dynamics in complex systems, and enables scientists to climb the ladder of causation [Pearl2018].



Appendix A Data-processing inequality

The data-processing inequality can be used to show that no clever manipulation of the data can improve the inferences made from that data.

Definition 1

Random variables $X, Y, Z$ are said to form a Markov chain $X \to Y \to Z$ if the conditional distribution of $Z$ depends only on $Y$ and is conditionally independent of $X$. Specifically, $X, Y, Z$ form a Markov chain if the joint probability can be written as:

$$p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y).$$

Theorem 1

(Data-processing inequality) If $X \to Y \to Z$, then $I(X;Y) \geq I(X;Z)$.

Proof: By the chain rule, the mutual information can be expanded in two different ways:

$$I(X; Y, Z) = I(X;Z) + I(X;Y \mid Z) = I(X;Y) + I(X;Z \mid Y).$$

Since $X$ and $Z$ are conditionally independent given $Y$, we have $I(X;Z \mid Y) = 0$, and since $I(X;Y \mid Z) \geq 0$, this gives

$$I(X;Y) \geq I(X;Z).$$

Thus we have equality if and only if $I(X;Y \mid Z) = 0$ for Markov chains. Similarly, one can prove that $I(Y;Z) \geq I(X;Z)$.

Corollary 1


Proof: If $Z = g(Y)$, then $X \to Y \to g(Y)$ forms a Markov chain, and the data-processing inequality gives $I(X;Y) \geq I(X; g(Y))$.

This result implies that no function of the data $Y$ can increase the information about $X$.

Corollary 2

If X → Y → Z, then I(X; Y | Z) ≤ I(X; Y).

From the chain-rule expansion above it follows that I(X; Z | Y) = 0 by the conditional independence in the Markov chain, and I(X; Z) ≥ 0. Therefore:

I(X; Y | Z) = I(X; Y) - I(X; Z) ≤ I(X; Y).

The dependence of X and Y is decreased or remains unchanged by the observation of a "downstream" random variable Z. The observant reader may recognize that I(X; Y | Z) can exceed I(X; Y) when the set X, Y, Z does not form a Markov chain. To illustrate, let X and Y be independent fair binary random variables with Z = X + Y. Then I(X; Y) = 0, but I(X; Y | Z) = 1/2 bit.
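The counterexample can be verified numerically; the helper below is a straightforward implementation of the conditional mutual information from a joint pmf (illustrative, not part of the toolbox):

```python
import itertools
import numpy as np

def cond_mi(pxyz):
    """I(X; Y | Z) in bits from a joint pmf indexed as pxyz[x, y, z]."""
    pz = pxyz.sum(axis=(0, 1))
    pxz = pxyz.sum(axis=1)  # p(x, z)
    pyz = pxyz.sum(axis=0)  # p(y, z)
    total = 0.0
    for x, y, z in itertools.product(*map(range, pxyz.shape)):
        p = pxyz[x, y, z]
        if p > 0:
            total += p * np.log2(p * pz[z] / (pxz[x, z] * pyz[y, z]))
    return total

# X, Y independent fair bits, Z = X + Y: every (x, y) pair has mass 1/4.
p = np.zeros((2, 2, 3))
for x, y in itertools.product(range(2), repeat=2):
    p[x, y, x + y] = 0.25

print(cond_mi(p))  # 0.5: conditioning on Z creates half a bit of dependence
```

The half bit arises entirely from the ambiguous outcome Z = 1, which occurs with probability 1/2 and makes X and Y perfectly anti-correlated.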

Appendix B Data correction and fit errors

Figure S1: Fit error per temperature and condition for decay curves. A double exponential was fit to the decay curves to extract the area under the curves. This figure depicts the fit error for the kernel per node in all conditions and noise levels. In general the fit error is extremely low.
Figure S2: Causal impacts as a function of information impact per node in different conditions. From top to bottom the temperature increases from low to high noise. The left column reflects the causal impact with an underwhelming intervention, and the right with an overwhelming intervention. See the rejection rate in fig. S3.
Figure S3: Rejection rate for different nodes per condition and temperature. The rejection threshold was set at 15.9% of the standard deviation of the Mahalanobis distance.
Figure S4: Normalized covariance between information impact and centrality measures in the psychonetwork. The covariance indicates two things. First, the information impact metric is relatively independent of the centrality metrics. Second, there exists a correlation among the centrality metrics themselves.

Appendix C Mutual information and causality

Under certain conditions causal influence of nodes reduces to mutual information. In this section we will show when that is the case.

In a Markov system each node is updated as

where represents the inputs or parents of node . Hence we have

The causal influence from can be defined as

with representing the Kullback-Leibler divergence (KL-divergence).

Theorem 2

The causal influence of s_i on s_j reduces to the mutual information I(s_i(t); s_j(t+1)) when s_i and s_j have no common neighbors.

Proof: For any Markov system the Markov condition holds, i.e. p(s_j(t+1) | S(t)) = p(s_j(t+1) | s_{n_j}(t)). When s_i and s_j have no common neighbors there is no confounding between them, and the interventional distribution coincides with the observational one: p(s_j(t+1) | do(s_i(t))) = p(s_j(t+1) | s_i(t)). Therefore we can write:

E_{s_i} [ D_KL( p(s_j(t+1) | s_i(t)) ‖ p(s_j(t+1)) ) ] = I(s_i(t); s_j(t+1)).
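The identity underlying this reduction, namely that the KL-divergence between conditional and marginal distributions averaged over the conditioning node equals the mutual information, can be checked numerically on an arbitrary joint distribution (a sketch with illustrative names, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(1)
# A random joint pmf over binary states (s_i, s_j); illustrative, not the paper's model.
joint = rng.random((2, 2))
joint /= joint.sum()
p_i = joint.sum(axis=1)             # p(s_i)
p_j = joint.sum(axis=0)             # p(s_j)
p_j_given_i = joint / p_i[:, None]  # p(s_j | s_i)

# Average influence: sum over s_i of p(s_i) * D_KL(p(s_j | s_i) || p(s_j)).
kl = np.sum(p_j_given_i * np.log2(p_j_given_i / p_j), axis=1)
avg_influence = float(np.sum(p_i * kl))

# Mutual information computed directly from the joint distribution.
mi = float(np.sum(joint * np.log2(joint / np.outer(p_i, p_j))))
print(np.isclose(avg_influence, mi))  # True
```

This equality holds for any joint distribution; the role of the no-common-neighbors condition is only to justify replacing the interventional conditional with the observational one.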
Appendix D Mutual information and time symmetry

The methods applied in the main text imply that the metric can be used in a symmetric manner. For practical purposes, mutual information was computed in a 'forward' manner. Namely, the system state was simulated for positive times from some initial state. For undirected graphs there is a symmetry with regard to where information flows. Information is not bounded by any directionality of edges (fig. 4(b)).

It is important to emphasize that this is (generally) not the case for directed graphs. If information is constricted to flow in one direction, the direction of time in the simulation is crucial. Additionally, directed graphs show that the metric can be applied for different purposes. This can be seen in fig. 4(a), where forward simulation gives 'information sinks' and backward simulation provides 'information sources'. Information impact in directed graphs will provide information about which nodes receive the most information over time. In contrast, simulating backwards shows which nodes have the most impact on the instantaneous state of the system. The different properties of this finding shall be the subject of further studies.
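The asymmetry for directed graphs can be illustrated with a toy two-node system in which x drives y but not vice versa; this is a minimal sketch, not the kinetic Ising model used in the main text:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50_000
# Directed two-node chain x -> y: x is driven by fresh noise, y copies x with 10% flips.
x = rng.integers(0, 2, T)
y = np.empty(T, dtype=int)
y[0] = rng.integers(0, 2)
y[1:] = x[:-1] ^ (rng.random(T - 1) < 0.1)

def mi(a, b):
    """Plug-in estimate of I(A; B) in bits for binary sample arrays."""
    joint = np.array([[np.mean((a == i) & (b == j)) for j in range(2)] for i in range(2)])
    pa, pb = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))

forward = mi(x[:-1], y[1:])   # information x sends into y's future: ~0.53 bit
backward = mi(y[:-1], x[1:])  # information y sends into x's future: ~0 bit
print(forward > backward)     # True: the direction of simulation matters
```

In an undirected version of this system the two delayed mutual informations would coincide, which is exactly the symmetry shown in fig. 4(b).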

Figure S5: Example of time symmetry in directed and undirected graphs. 4(a) shows the asymmetry that occurs when information flow is directed. The time before the system state can be interpreted as information sending: nodes that have the most impact on the current system state. In contrast, the information afterwards can be interpreted as information receiving: nodes that receive information from the current system state. The most striking example is node 4, which has a sharp decay before but a relatively fat tail after. This change is due to the difference in meaning of the information impact measure, e.g. sending vs receiving.
4(b) shows that for undirected graphs there is no difference between node importance before or after; information flows in both directions.

Appendix E Statistical result tables

Dep. Variable: Underwhelming R-squared: 0.937
Model: OLS Adj. R-squared: 0.936
Method: Least Squares F-statistic: 2119.
Prob (F-statistic): 0.00
Log-Likelihood: -1084.0
No. Observations: 720 AIC: 2180.
Df Residuals: 714 BIC: 2207.
Df Model: 5
coef std err t P>|t| [0.025 0.975]
intercept 5.4800 0.041 134.280 0.000 5.400 5.560
μ 4.1846 0.044 95.871 0.000 4.099 4.270
deg 0.1061 0.091 1.165 0.244 -0.073 0.285
bet 0.1246 0.068 1.824 0.069 -0.010 0.259
ic -0.0473 0.071 -0.664 0.507 -0.187 0.093
ev 0.0060 0.093 0.065 0.948 -0.177 0.189
Omnibus: 88.651 Durbin-Watson: 2.080
Prob(Omnibus): 0.000 Jarque-Bera (JB): 728.890
Skew: -0.156 Prob(JB): 5.29e-159
Kurtosis: 7.919 Cond. No. 5.26

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Table A4: Multiple linear regression for underwhelming causal impact using information impact (μ), weighted degree (deg), betweenness (bet), information centrality (ic), and eigenvector centrality (ev) as regressors.

Supplementary figures

Figure S6: Causal impact with underwhelming intervention as a function of different centrality measures. From top to bottom the centralities are: degree (deg), closeness (close), betweenness (bet), and eigenvector (ev). The size of the dots reflects the relative value of the information impact; a larger size means higher information impact. No clear relation can be seen between any of the centrality metrics across temperatures.
Figure S7: Causal impact with overwhelming intervention as a function of different centrality measures. From top to bottom the centralities are: degree (deg), closeness (close), betweenness (bet), and eigenvector (ev). The size of the dots reflects the relative value of the information impact; a larger size means higher information impact. The centrality metrics have a varying relation with overwhelming causal impact. Degree centrality and eigenvector centrality seem mostly linear; however, information centrality and betweenness have an unclear relation.

Appendix F Krackhardt kite graph

As proof of principle, the simulations were repeated for a unitary-weight undirected graph, namely the Krackhardt kite graph (fig. S8). In practice there is a high correlation among centrality measures (fig. S10). This is in part due to the fact that centrality measures tend to share assumptions [Borgatti2005, Borgatti2006]. The kite graph is famous not only because of its layout, but also because different centrality measures rank different nodes as number one (fig. S8).

Figure S8: Krackhardt kite graph with loadings of weighted degree centrality, betweenness, information centrality, and eigenvector centrality. In each subplot the size of a node is scaled according to its ranked centrality value; the larger the radius of the circle, the higher the node is ranked. The numbers indicate the node labels and do not reflect a ranking of any kind.

For unitary-weight graphs equilibrium occurs naturally in Ising models due to the tendency of the nodes to align their states with their nearest neighbors for low enough temperatures. As the snapshots are calculated based on Markov chains initiated from zero (see sections H and 2.6), this will yield inflated decay rates for hubs as a function of time. This can be understood from the ferromagnetic behavior of Ising models; for low enough temperatures the node states tend to align with their nearest neighbors (fig. S11). As such, hubs will be relatively frozen over time and therefore yield the slowest decay rates. However, short-term causal interventions will not reflect these hubs as causally important on short timescales. Hubs can be considered stabilizing factors, much like the earth orbiting the sun, where short-term interactions can be seen as the temperature differences between night and day. Therefore, only one side of the magnetization curve was sampled to align the causal curves with their actual decay rates (see section H for more information). For undirected, unitary-weight graphs this method has no effect on estimating the true causal importance; rather, it can be seen as a necessary step to measure short-term causal interactions.

The results show that information impact remains a good predictor for the most causal node with moderate levels of noise (fig. S9). For high noise, the decay curves collapse for both causal impact and information impact (fig. S12). This is to be expected as the nodes behave more randomly with respect to their neighbors. Compared to the psychonetwork, the accuracy score for the low intervention was higher (see LABEL:table:kite_rnf_scores). Other graphs have also been studied with similar effects, i.e. in all cases information impact was a reliable predictor for causal impact for underwhelming nudges but not for overwhelming nudges. Future studies will need to quantify this exact behavior.

F.1 Figures kite graph

Figure S9: Kite graph. The relation between causal impact, information impact, and centrality measures. Each column represents the data in a different intervention condition, whereas each row shows a different temperature (see main text). In each subplot the colors represent the nodes in the graph, and different markers are used to indicate centrality measures. The data were rescaled for plotting purposes only and do not reflect the original data range.
Figure S10: Normalized covariance among regressors in the kite graph. Compared to fig. S4, information impact shows a negative correlation with the centrality metrics. Additionally, there is a stronger positive correlation among most of the centrality metrics.
Figure S11: Kite graph. Absolute magnetization as a function of temperature. The red dots indicate the matched magnetization for fractions [0.8, 0.5, 0.2] of the maximum magnetization.
Figure S12: Kite graph results. Mean causal and information decay as a function of time, with 2 standard deviations as confidence interval. The left column represents the delayed Shannon mutual information with Panzeri-Treves correction as a function of time without any external interventions (control); the middle and right columns depict the causal impacts as a function of time for the underwhelming and overwhelming intervention as measured by a delayed version of the Kullback-Leibler measure.
Figure S13: Causal impacts as a function of information impact per node in different conditions. From top to bottom the temperature increases from low to high noise. The left column reflects the causal impact with an underwhelming intervention, and the right with an overwhelming intervention.
Figure S14: Causal impact with underwhelming intervention as a function of different centrality measures. From top to bottom the centralities are: degree (deg), closeness (close), betweenness (bet), and eigenvector (ev). The size of the dots reflects the relative value of the information impact; a larger size means higher information impact.
Figure S15: Causal impact with overwhelming intervention as a function of different centrality measures. From top to bottom the centralities are: degree (deg), closeness (close), betweenness (bet), and eigenvector (ev). The size of the dots reflects the relative value of the information impact; a larger size means higher information impact.
Figure S16: Kite graph. Random forest classifier performance. (TOP) Accuracy score per temperature and intervention condition. (MIDDLE) Feature importance per predictor and condition. (BOTTOM) Delta score ((accuracy score - shuffled score) / accuracy score) per feature and intervention. Statistical results accompanying this graph are in LABEL:table:kite_rnf_scores.

F.2 Tables kite graph

Dep. Variable: Underwhelming R-squared: 0.948
Model: OLS Adj. R-squared: 0.947
Method: Least Squares F-statistic: 1062.
Date: Mon, 25 Mar 2019 Prob (F-statistic): 8.60e-186
Time: 15:10:16 Log-Likelihood: 16.422
No. Observations: 300 AIC: -20.84
Df Residuals: 294 BIC: 1.378
Df Model: 5
coef std err t P>|t| [0.025 0.975]
intercept -5.109e-16 0.013 -3.82e-14 1.000 -0.026 0.026
μ 0.9448 0.017 54.125 0.000 0.910 0.979
deg 0.3633 0.184 1.973 0.049 0.001 0.726
bet -0.1972 0.091 -2.177 0.030 -0.375 -0.019
ic 0.4311 0.183 2.361 0.019 0.072 0.790
ev -0.8008 0.345 -2.321 0.021 -1.480 -0.122
Omnibus: 25.336 Durbin-Watson: 1.722
Prob(Omnibus): 0.000 Jarque-Bera (JB): 35.049
Skew: 0.593 Prob(JB): 2.45e-08
Kurtosis: 4.181 Cond. No. 59.8

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Table A7: Multiple linear regression for underwhelming causal impact using information impact (μ), weighted degree (deg), betweenness (bet), information centrality (ic), and eigenvector centrality (ev) as regressors.
Figure S17: Multiple linear regression on underwhelming causal intervention. The data for each subplot contain all the values, regardless of magnetization ratio, for the underwhelming causal intervention only. The significant regression coefficients are shown in boxes with colors matching the scatter points. The dotted lines are the fitted lines obtained from the model with standard error. The R-squared value for the model is shown in the upper right corner of the top left plot. Here the OLS results for centrality metrics are biased due to outliers; no clear relation holds. In contrast, information impact is highly linear with causal impact.

Appendix G Validation with Fried et al. [Fried2015]

From the results, the most causal node 'sleep' was correctly identified by information impact for low noise. As the noise in the system increases, nodes that at first were not causally relevant began to drive the system, specifically 'depr', 'lonely' and 'sad'. In the original study, the bereavement score was most affected by 'lonely', and showed weak negative associations with 'happy' and 'effort' (fig. S18, adapted from [Fried2015]). Consequently, it seems that medium to high thermal noise is most congruent with the original study. Fried et al. postulated that 'lonely' was the gateway from which information spreads through the network, i.e. bereavement was embodied mainly by 'loneliness', which then percolated its effect to the other symptoms. Since the nature of the data was cross-sectional, the comparison with the results from this study relies on the assumption that binary dynamics are representative of the absence and presence of psychological symptoms. If correct, the results from this study give a causal perspective on the associative results from [Fried2015]. The results from this study suggest that 'depr', 'lonely' and 'sad' have similar causal effect for moderate to high thermal noise.

Figure S18: Main results from Fried et al. [Fried2015]. The graph represents the output from a Multiple Indicators Multiple Causes (MIMIC) model. The red lines indicate significant direct effects of spousal loss on Center for Epidemiological Studies Depression Scale (CES-D) items; standardized estimates of these effects are shown in red below the symptoms. There was no significant loading of loss on the latent factor. For more info see [Fried2015].

It is important to emphasize that the quantification is given in terms of absolute effect size and not directed effects. This means that nudging, for instance, 'sleep' has some effect on the psycho-symptom network, but the direction of that effect, i.e. whether it has a positive or negative effect on the bereavement score / cognitive load of the patient, is not clear and should be the subject of future studies.

As a final note, the field of psychometrics is concerned with how observables (e.g. behavior, responses on questionnaires, etc.) relate to theoretical cognitive constructs such as intelligence or mental disorders. A common approach to understanding high-level phenomena such as depression is to use a latent variable model, i.e. assuming some abstract feature to be the cause of the observables (or vice versa). Only recently has this paradigm shifted from a latent variable model to a network-based approach [Waldorp2011, Epskamp2018, Borsboom2011]. Marsman et al. recently reconciled these two approaches by showing statistical equivalence between the Ising model and latent variable models canonically used in psychometrics [Marsman2018]. The two approaches thus highlight different aspects in theory building; measurement invariance and correlation structure may be interesting from a common-cause approach but not from a network perspective, which is more interested in dynamical aspects of the system. Both approaches, however, aid in highlighting different aspects of psychological constructs.

Appendix H Code Manual

Accompanying this paper, I developed a general framework for analyzing discrete systems using information impact. The code is written in python 3.7.2 and uses cython 0.28.2 for c/c++ level performance. The code is freely available online; the repository includes the latest build instructions. What follows here is a brief overview of the framework.

Design philosophy and overview

From the onset the core idea was to develop a toolbox which would enable scientists to easily adapt the code to include their own models for specific needs. As such, python's object-oriented tools were capitalized on. The toolbox can be divided into three main components:

  1. Information toolbox. Includes Monte-Carlo methods for constructing the required state distributions and computing the delayed Shannon mutual information. The functions utilize concurrent threading to capitalize on multi-core systems.

  2. Models. The Models module contains a 'Model' archetype on which all user-defined models must be based. As an example some models are provided; e.g. fastIsing.pyx was used in generating the results of this study.

  3. Utils. In Utils various functions can be found that are used in extracting and loading data, statistical analysis, and visualization.

For the end-user the most interesting part is the Models submodule. Users can provide their own models with limited base requirements to run and extract information impact temporal dynamics.

An additional design consideration was to enable excellent performance with a requirement on multicore utilization. This was achieved by leveraging low-level threading support using openmp through cython. Typed memory views are used to maintain a fast high-level interface for easily defining matrices and vectors. The provided Ising implementation can either be used to analyze Ising models directly, or adapted to fit the model needs specified by the user. In order to reproduce the graphs in this paper, the accompanying script can be run.


The Ising model class takes as input a graph structure, a temperature, and the unique agent states the nodes can assume. The graph structure can be any graph object provided by the package networkx. The class internally converts the graph structure into an efficient adjacency list to (a) prevent the memory load of a dense matrix, and (b) capitalize on the amortized lookup speed of c++'s unordered maps.
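A minimal sketch of such a conversion in pure python might look as follows; the function name is illustrative, and the real toolbox performs this step at the cython level with c++ unordered maps:

```python
import networkx as nx

def to_adjacency_list(graph):
    """Convert a networkx graph into a sparse mapping node -> {neighbor: weight},
    avoiding the memory cost of a dense adjacency matrix."""
    return {
        node: {nbr: data.get("weight", 1.0) for nbr, data in graph[node].items()}
        for node in graph.nodes()
    }

g = nx.krackhardt_kite_graph()  # the unit-weight graph from Appendix F
adj = to_adjacency_list(g)
print(len(adj[3]))  # node 3 has the most neighbors in the kite: 6
```

For weighted graphs the edge attribute 'weight' is kept; unweighted edges default to 1.0, matching the unitary-weight setting used for the kite graph.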

The function self.sampleNodes samples the node indices that are used in self.updateState. In general the Models archetype (parent class) allows for four different sampling methods:

  1. Serial: samples nodes from the sorted node indices, similar to how the scan lines of a CRT TV work.

  2. Single: each simulation step a single node is sampled at random and considered for flipping.

  3. Async: each simulation step the nodes are updated one at a time in random order, such that each update sees the effect of the previous ones.

  4. Sync: the system state is frozen and each node is updated according to this frozen state.

For larger systems we recommend using the async option for updating, as this will reduce the amount of data collected, with the added benefit of increasing the signal-to-noise ratio, i.e. reducing the correlations experienced with single updates, which force one to sample for longer.
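The four sampling schemes can be sketched in plain python as follows; this is an illustrative reimplementation, not the cython source, and the serial/sync distinction lies in how the returned indices are consumed by the update step:

```python
import numpy as np

def sample_nodes(n_nodes, method, rng):
    """Illustrative sketch of the four sampling schemes (not the cython source).
    Returns the node indices considered for updating in one simulation step."""
    if method == "serial":  # sorted indices, like the scan lines of a CRT TV
        return np.arange(n_nodes)
    if method == "single":  # one randomly chosen node per step
        return rng.integers(0, n_nodes, size=1)
    if method == "async":   # all nodes in random order, applied one after another
        return rng.permutation(n_nodes)
    if method == "sync":    # all nodes, each evaluated against the frozen state
        return np.arange(n_nodes)
    raise ValueError(f"unknown sampling method: {method}")

rng = np.random.default_rng(3)
print(sorted(sample_nodes(5, "async", rng).tolist()))  # [0, 1, 2, 3, 4]
```

In the async case each sampled node would be updated immediately, whereas in the sync case all updates are computed against the same frozen system state.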

Additionally, the model allows the user to specify whether to measure only one side of the magnetization, i.e. use out-of-equilibrium short-timescale dynamics. The parameter magSide can be set to:

  1. ’pos’, for only positive side of magnetization;

  2. ’neg’, for only negative side of the magnetization;

  3. '' (empty string): equilibrium dynamics are assumed.

In undirected, unitary-weight network structures, constraining the magnetization side does not affect the stability of the system, but it is important to consider for weighted graphs. The magSide option calculates the average magnetization at every update step and flips the entire system state if the magnetization is positive while magSide is 'neg', and vice versa for 'pos'.
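A sketch of this constraint, assuming spin states in {-1, +1} (illustrative, not the toolbox's implementation):

```python
import numpy as np

def constrain_magnetization(states, mag_side):
    """Sketch of the magSide constraint (illustrative): after an update, flip the
    whole system state when the mean magnetization sits on the excluded side."""
    mag = states.mean()
    if mag_side == "pos" and mag < 0:
        return -states
    if mag_side == "neg" and mag > 0:
        return -states
    return states  # '' -> equilibrium dynamics, leave the state untouched

s = np.array([1, 1, 1, -1])  # spins in {-1, +1}, magnetization +0.5
print(constrain_magnetization(s, "neg").mean())  # -0.5 after the flip
```

Flipping the whole state exploits the global spin-flip symmetry of the Ising model, so the dynamics are unchanged apart from being confined to one side of the magnetization curve.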

Additionally, the c-extension class has wrappers for its internally defined c-level functions. This enables the class to be used as any normal python class when testing.


For the latest examples, see the repository, which contains various jupyter notebooks on different facets of the software.
