The dynamic importance of nodes is poorly predicted by static topological features
Abstract
One of the most central questions in network science is: which nodes are most important? Often this question is answered using topological properties such as high connectedness or centrality in the network. However, it is unclear whether topological connectedness translates directly into dynamical impact. To this end, we simulate the kinetic Ising spin model on generated networks and on a real-world network with weighted edges. The extent of the dynamic impact is assessed by causally intervening on a node state and measuring the effect on the systemic dynamics. The results show that topological features such as network centrality or connectedness are in fact poor predictors of the dynamical impact of a node on the rest of the network. A solution is offered in the form of an information-theoretic measure named information impact. The metric accurately reflects the dynamic importance of nodes in networks under natural dynamics using observations only, and is validated using causal interventions. We conclude that the most dynamically impactful nodes are usually not the most well-connected or central nodes. This implies that the common assumption that topologically central or well-connected nodes are also dynamically important is false, and that abstracting away the dynamics from a network before analyzing it is not advised.
Keywords: Information theory, Causality, Driver-node identification
1 Introduction
Understanding complex dynamical systems is a fundamental problem for the 21st century [Mercury2000]. A complex system is, by definition, a system with a heterogeneous structure of interactions. When governed by a certain dynamic, these systems are said to form a complex dynamical system. A key feature of many complex dynamical systems is the absence of a central control unit, i.e. the complex systemic behavior is said to ‘emerge’ from the network through local, often nonlinear, interactions. Emergent behavior exists by virtue of these interactions and cannot be found in isolated elements. Importantly, emergent behavior pervades nature at all spatiotemporal scales, such as galaxies emerging from their stars [Penrose2010], ecosystems emerging from food webs [Breckling2005], human cognition from the interaction of populations of neurons [Thompson2004, Kandel2000], cellular regulatory processes from protein-protein interactions [Alberghina2009], or atoms from elementary particles [Penrose2010]. Understanding the inner workings of these systems is a central issue for many scientific disciplines and is crucially important for the public at large.
Obtaining explanations in complex dynamic systems may start with asking ‘Which node is dynamically most important?’, i.e. which node drives the systemic behavior. For example, it is desirable to understand how to effectively stop a viral infection from becoming a pandemic by shutting down the airport whose continued operation would lead to catastrophic results, to understand which brain area is dynamically most important for cognitive function, or how traffic could be rerouted to attenuate traffic jams. However, currently available theories and algorithms for inferring the causal influence of dynamic variables are not readily applicable to complex systems. Previous studies fail to address three primary issues relating to (i) assumptions on the mechanisms that generate system dynamics, (ii) intervention methods, and (iii) the applicability of structural methods and metrics when considering form versus function in complex systems.
First, most methods for analyzing complex systems are developed under at least one of the following assumptions (see [Wang2016] for an overview): stationarity of system state dynamics, (local) linearity of dynamics, time-independence of the interactions, or dynamics having reached equilibrium. An advantage of these assumptions is that they often lead to analytical expressions and/or closed-form solutions of the problem [Penrose2010]. However, real-world dynamical systems are notoriously nonlinear in general, where small variations in the input of a system may result in non-intuitive systemic behavior. The aforementioned assumptions on dynamics offer a temporary gain in the reduction of complexity and provide insight into how simpler systems work [Penrose2010]. The translation of the simpler model to its real-world counterpart offers a varying degree of accuracy regarding how the real-world system functions. This poses a problem for theory building and providing causal explanations in the long run.
Second, the dynamic importance of nodes is often determined through overwhelming interventions. An overwhelming intervention is an external influence similar to knocking out a gene from a cell’s regulatory process, or replacing a signal altogether [Cowan2012, Yan2017, Pasqualetti2013, Liu2016, Gates2016, Yan2012]. Interventions are essential in determining causal influence as they allow for experimental control in determining cause from effect, i.e. they are crucial in obtaining causal explanations for observed systemic behavior [Pearl2000, Woodward2005, Woodward2014, Woodward2015]. One of the aims of science is to provide causal explanations for natural phenomena, i.e. to understand what mechanism caused the observed behavior. This is achieved by performing controlled interventions on the mechanism that induce a change in systemic behavior. In the worst case, overwhelming interventions alter the mechanism in such a way that the resulting behavior is independent of the original dynamics. As a consequence, they do not provide any novel insight into which parts of the mechanism were relevant for the behavior. The smaller the intervention, the higher the correlation between the perturbed and unperturbed behavior, and as such the stronger the claims that can be made about the original mechanism.
Lastly, dynamic importance is often studied from a structural perspective; the most important node is identified based on the intuition of flow, i.e. the more outward connections a node has, the more opportunity it has for spreading perturbations in the network [Harush2017, Lu2016, Ay2008, Gu2015, Chen2017, Yin2016, Kim2017a]. In recent years, applying structural methods to complex systems has gained popularity. Specifically, centrality measures have been applied to a variety of different systems ranging from pandemics [Sikic2013], social networks [Freeman1979] and brain networks [Joyce2010] to networks of psycho-symptoms [Fried2015]. These measures rank nodes using a real-valued function based on some structural property [Borgatti2005, Borgatti2006, Fried2015, Bringmann2018, Sikic2013]. Borgatti et al. argued that from a graph-theoretical perspective all centrality metrics quantify the walk structure of a network [Borgatti2005, Borgatti2006]. From a complex-systems perspective, this can be interpreted as assuming a particular type of dynamics which the nodes use to pass on information. For example, betweenness centrality ranks nodes based on how often a node acts as a bridge along the shortest path between any two other nodes in the network. In other words, betweenness centrality assumes that information flows along the shortest path. However, not all dynamics in a network follow shortest-path dynamics. If a river overflows, the water may not follow the shortest path to the sea; rather, it follows the path of least resistance. Depending on the complex system considered, the network structure alone may not be indicative of the types of dynamics that are exerted on the network. In other words, the connectedness of nodes may not be indicative of their dynamical importance. The relation between network structure and dynamic importance was recently shown by Harush et al. [Harush2017].
By varying the dynamics on the network while keeping the structure constant, their results showed that the steady-state dynamics embody a nonlinear relation between the network’s structure and the governing systemic dynamics. Importantly, the dynamic importance of a node could drastically change as the dynamics of the system are changed. This implies that the interaction of the system structure with the systemic dynamics is the cause of failure of many of the structural methods applied to complex dynamic systems. Therefore, if there is no a priori knowledge about the dynamics governing the system, nor a substitute approximation that could lead to reliable estimates of systemic behavior, it would generally be unsafe to use centrality measures or any other structural method as a guide for finding driver nodes.
Thus, there is a need for a model-free approach that is able to reliably capture the underlying causal mechanism of the system behavior without assumptions on the dynamics or the structure of the network. One promising approach was proposed by Ay and Polani [Ay2008]. Their method relates concepts from information theory to Pearl’s do-formalism [Pearl2000]. Conditional Shannon mutual information was shown to deduce the causal hierarchy in directed acyclic graphs. Similar approaches include transfer entropy [Schreiber2000], the information bottleneck [Buisson2018], Granger causality [Granger1969], and relative entropy [Cover2005]. The main aim of these approaches is to reconstruct the complete network of causal interactions by evaluating the (short-term) causal influence between every pair of nodes [Wang2016]. However, many of these approaches either overestimate or underestimate the influence of so-called synergistic and redundant information [James2016a, Quax2017, Cover2005]. Estimating synergistic information is currently an active area of research, and it remains an open question what the proper approach is to quantify and study this phenomenon, e.g. see [James2016a].
Quax et al. offered a solution by not conditioning on a node or set of nodes, but rather comparing the influence of a node with the entire system state, i.e. using the regular non-conditional mutual information [Quax2013, Quax2013a]. The nonlinear interactions between nodes or sets of nodes are then captured by the total system entropy. They introduced the concept of information diffusion time, which embodies the nonlinear correlation of a node with the system dynamics using time-delayed Shannon mutual information. Dynamic importance can therefore be quantified by the amount of information a node shares with the system state over time. Importantly, their analytical results show that in infinitely sized scale-free networks with unit edge weights, the nodes with the highest degree were not the most dynamically important [Quax2013]. Rather, intermediately connected nodes were found to be more dynamically relevant. This striking result calls into question the assumption that topologically central or well-connected nodes in a network necessarily correspond to dynamically important nodes. However, for practical purposes a network is never infinitely sized, or necessarily tree-like, and it should be investigated whether this result holds for real-world networks of arbitrary size.
The aim of this paper is to test the hypothesis that well-connectedness translates to dynamic importance in a real-world weighted network of psychological symptoms obtained from [Fried2015]. Temporal data is simulated using Glauber dynamics. It will be shown that structural metrics provide no reliable predictive power in determining the driver node. A solution is offered by means of a novel metric based on time-delayed Shannon mutual information, named information impact, which makes no assumptions on the dynamics or structural dependencies. The metric is validated using causal interventions of varying intensity. The results of this study provide scientists of all fields a novel, reliable and accurate metric for the identification of driver nodes, and enable them to climb the ladder of causation [Pearl2018].
2 Material and methods
2.1 Terminology
A system is defined as a set of nodes $S = \{s_1, \dots, s_N\}$ with a static network structure indicated by (weighted) edges $A$, where each node in the system is governed by dynamics $D$. Each node state is determined through nearest-neighbor interaction, i.e. node $s_i$ chooses its next state based only on the current states of its neighbors $n_i$, with $p(s_i^{t+1} \mid S^t) = p(s_i^{t+1} \mid n_i^t)$. This is also known as a Markov network.
Node dynamics
For dynamics, this paper considers kinetic Ising spin dynamics. The kinetic Ising model is one of the simplest models for real complex systems and is believed to provide a sensible description of a large number of physical systems. The model was originally developed to study the behavior of ferromagnetism in statistical mechanics [Brush1967]. A prominent property of the Ising model in higher dimensions (two or more) is the phase transition from an ordered to a disordered phase as the noise parameter increases (fig. 1). The increase in noise allows the probabilistic local interactions to produce a macroscopic, qualitative change of behavior: from nodes tending to align their states with their neighbors (ordered phase) to being more independent of their neighbors (disordered phase). Both the simplicity of the model and its phase transition have led researchers to successfully model a variety of different behaviors, ranging from consensus emerging through social interactions [Grabowski2006, Kandiah2012] and the behavior of lattice gases and fluids [Glauber1963] to the behavior of neurons [Izhikevich2007].
The Ising model consists of binary variables that interact through nearest-neighbor interactions, with system states distributed according to the Gibbs distribution:

$p(S) = \frac{e^{-\beta \mathcal{H}(S)}}{Z}$ (1)

$\mathcal{H}(S) = -\sum_{\langle i,j \rangle} J_{ij}\, s_i s_j - \sum_i h_i s_i$ (2)

where $s_i \in \{-1, +1\}$ is a node state, $S$ is the system state, $\beta$ is the inverse temperature, $J_{ij}$ are the interaction strengths (edge weights) between $s_i$ and $s_j$, $h_i$ represents the external influence on node $i$, and $Z = \sum_S e^{-\beta \mathcal{H}(S)}$ is the partition function.
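A minimal sketch of how such Gibbs dynamics can be simulated with asynchronous single-spin (Glauber) updates; the 3-node coupling matrix, fields, and $\beta$ below are hypothetical toy values, not the networks analyzed in this study.

```python
import numpy as np

def glauber_step(s, J, h, beta, rng):
    """One asynchronous Glauber update of the spin vector s (entries +/-1).

    J is a symmetric weighted coupling matrix, h the external fields,
    beta the inverse temperature.
    """
    i = rng.integers(len(s))              # pick a node uniformly at random
    dE = 2 * s[i] * (J[i] @ s + h[i])     # energy cost of flipping s_i
    if rng.random() < 1.0 / (1.0 + np.exp(beta * dE)):
        s[i] = -s[i]                      # flip with Glauber acceptance probability
    return s

rng = np.random.default_rng(0)
J = np.array([[0.0, 1.0, 0.0],            # toy weighted 3-node chain
              [1.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
h = np.zeros(3)
s = rng.choice([-1, 1], size=3)
for _ in range(1000):
    s = glauber_step(s, J, h, beta=2.0, rng=rng)
print(s)  # at this beta the spins tend to align
```

The flip probability $1/(1 + e^{\beta \Delta E})$ leaves the Gibbs distribution of eq. (1) invariant, which is why Glauber updates are a standard sampler for this model.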
Information as a measure of dynamic impact
Each node in the system can be considered as an information storage unit. Over time, the information stored in a node will percolate throughout the system while at the same time decaying due to noise. The longer the information of a node stays in the system, the longer it can affect the systemic dynamics. Therefore, dynamic impact of a node can be measured by the amount of information a node shares with the entire system [Quax2013, Quax2013a, Quax2017].
How does one measure the information stored in a node? A node dictated by some dynamic can be considered a random variable $X$ that is able to assume different states $x$. In information theory, information is quantified in bits, i.e. yes/no questions concerning the outcome of a random variable. The average information a random variable can encode is called entropy and is defined as:
$H(X) = -\sum_x p(x) \log_2 p(x)$ (3)
Note that all logarithms are base 2 in this paper unless specified otherwise.
Entropy can also be interpreted as the amount of uncertainty of a random variable. In the extremes, the random variable either conveys no uncertainty (i.e. a node always assumes the same state), or its state is chosen uniformly at random among all possible states (uniform distribution). For example, consider a coin flip. One may ask: how much information does a single coin flip encode? If the coin is fair, i.e. there is equal probability of the outcome being heads or tails, the number of questions needed to determine the outcome is exactly 1. In other words, a fair coin encodes 1 bit of information. However, when the coin is unfair, the information encoded is less than one bit. In the extreme case where the coin always turns up heads, the entropy is exactly 0.
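The coin-flip example can be checked directly with a small plug-in entropy function (an illustration, not part of the original analysis):

```python
import math

def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: exactly 1 bit
print(entropy([1.0, 0.0]))   # always heads: zero entropy
print(entropy([0.9, 0.1]))   # biased coin: about 0.47 bits
```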
The information shared between a node state and a system state can be quantified by mutual information [Quax2013, Cover2005, Quax2013a, Quax2017, James2017]. Mutual information can be informally thought of as a nonlinear correlation function. Formally, mutual information quantifies the reduction in uncertainty of random variable $X$ by knowing the outcome of random variable $Y$ [Cover2005]:
$I(X : Y) = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\, p(y)} = H(X) - H(X \mid Y)$ (4)
where $p(x)$ and $p(y)$ are the marginals of $p(x, y)$ over $X$ and $Y$ respectively, and $H(X \mid Y)$ is the conditional entropy of $X$ given $Y$. The conditional entropy is similar to the entropy; it quantifies the uncertainty remaining about the outcome of $X$ once the outcome of $Y$ is known. Please note that the yes/no question interpretation even applies to continuous variables, although it may take an infinite number of questions to determine the outcome of a continuous random variable.
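As a sketch, mutual information can be computed directly from a joint distribution table (the distributions below are toy examples):

```python
import math

def mutual_information(joint):
    """I(X:Y) in bits from a joint distribution p(x, y) given as a nested list."""
    px = [sum(row) for row in joint]               # marginal p(x)
    py = [sum(col) for col in zip(*joint)]         # marginal p(y)
    mi = 0.0
    for x, row in enumerate(joint):
        for y, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[x] * py[y]))
    return mi

print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))     # Y = X: 1 bit
print(mutual_information([[0.25, 0.25], [0.25, 0.25]])) # independent: 0 bits
```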
2.2 Information impact
The driver node would be the node whose mutual information with the system is the largest over time. This can be measured by shifting one random variable with respect to the other over time. As such, information impact is defined as the integral of the mutual information of a node with the system state over time:
$\mu_i = \int_0^\infty I(s_i^{t_0 + \tau} : S^{t_0})\, d\tau$ (5)
where $S^{t_0}$ is the system state at some time $t_0$ and $s_i^{t_0 + \tau}$ is the state of node $i$ a delay $\tau$ away from that system state. At $\tau = 0$ the value equals $H(s_i)$ for any node, and for ergodic Markovian systems the delayed mutual information will always decay to zero as $\tau \to \infty$ [Quax2013, Cover2005]. The question is how fast this decay takes place for each node, and consequently how much information impact the node will have on the system. This decay property is also known as the data-processing inequality [Cover2005], which states that information can only decrease in Markov chains without external information injection; a proof is provided in appendix A.
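In practice eq. (5) must be estimated from simulated trajectories. A minimal sketch, assuming the integral is approximated by a sum over discrete delays and the probabilities by plug-in frequency counts (both simplifications relative to the full procedure):

```python
from collections import Counter
import math

def delayed_mi(pairs):
    """Plug-in estimate of I(X:Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def information_impact(trajectories, node, max_tau):
    """Approximate eq. (5) as a sum of delayed MI values over tau = 0..max_tau.

    `trajectories` is a list of state sequences from independent simulation
    runs; each sequence is a list of system states (tuples of node states).
    """
    return sum(
        delayed_mi([(run[tau][node], run[0]) for run in trajectories])
        for tau in range(max_tau + 1))
```

Note that the plug-in estimator is biased for small sample sizes, so in practice many independent runs are needed before the delayed MI curve is trustworthy.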
The observant reader may have noticed that the definition of causal impact and information impact is ambiguous as to whether the delay $\tau$ is positive or negative, i.e. whether the node state is captured forward or backward in time with respect to the system state. For undirected graphs there exists time symmetry with respect to how causal influence flows through the network (see appendix D). However, for directed graphs this is not the case. Numerically, it is more convenient to apply a forward method than a backward method. As the network analyzed here is undirected, the forward method was used.
Correlation and mutual information
The driver node is the node that has the most dynamic impact on the system state. Consequently, it shares the most mutual information with the system over time. However, for all other nodes it is possible for the mutual information value to be inflated due to non-causal correlations. Mutual information can be decomposed in two parts: a part that is due to a causal relation between the state variables, and a part that is due to spurious correlations and does not overlap with the causal information. Non-causal correlation may occur if a node and the system state both causally depend on a third confounder variable. This can lead to a non-zero mutual information between the two, even if they do not directly depend on each other in a causal manner.
It is not possible for the node with the highest information impact to have non-causal long-term correlation, as there exists no other node in the system which influenced both the driver node and the system state. If such a node existed, it would share more information with the system state over time, and as such it would itself have yielded a larger information impact.
To give a better intuition for the cause of non-causal correlations, consider a Markov system consisting of two disjoint directed path graphs with a common source (fig. 2). A directed edge from one node to another means the new state probabilities of the target node depend on the current state of the source node. Due to this causal dependency, the state of node 1 will store information about the previous state of node 0. Similarly, node 2 will store information about node 1, and so on. The information about the states of nodes 4 and 5 will be immediately lost since they cannot influence any other node state; they have no outgoing arrows to any other node in the system. Their states are thus ‘overwritten’ each simulation step. From the network structure, we can deduce that the causal influence will be the highest for node 0; its information will be lost after at most 5 time steps for sufficiently low temperature. The non-causal correlation issue arises for node 5, which will strongly correlate with node 0. Namely, both node 1 and node 5 will share a significant, non-zero amount of information about node 0. For this reason the information impact of node 5 will be similar to that of node 1, even though node 5 has no ability to influence the system directly. If the network structure is unknown, then there is no way of knowing in general which node’s correlations are causal or non-causal. Even if the network structure could be observed, it would still be challenging to correctly identify all the information impact values other than the largest. Consider, for example, adding an edge between node 5 and node 2. There is currently no known method of determining exactly how much information in node 2 is uniquely from node 5, uniquely from node 1, or jointly from nodes 1 and 5 [Quax2017, Lizier2013, Olbrich2015, James2016a]. This is known as the information decomposition problem and is still an active area of research.
Therefore, we can only be certain that the node with the largest information impact will correspond to the node with the largest causal impact (the driver node), whereas for all other nodes the information impact could be significantly inflated by non-causal correlations.
2.3 Causal impact
How does one quantify causal influence? A common idea entails that a cause raises the probability of its effects, i.e. $p(\text{effect} \mid \text{cause}) > p(\text{effect})$. However, this does not hold in the case of a confounder when only observations are used (fig. 2). Pearl and Woodward noted that the only way to disentangle spurious causal relations is by means of intervening on the system [Pearl2000, Woodward2005]. To illustrate this, consider a simple barometer. A barometer is a device which measures the atmospheric pressure. It can be observed that whenever the barometer levels drop, it starts to rain. Consequently, conditioning on the barometer level, one may conclude that it causes rain. Rain, however, is caused by a fall in atmospheric pressure. Consequently, the barometric reading is not causally related to rain. By physically intervening on the barometer reading, e.g. by setting the needle of the barometer and observing whether it rains or not, one is able to falsify the claim that low barometer readings cause rain.
An intervention is an external influence that changes the distribution of a random variable. External in this context means not part of the closed system. For the Ising model this can be conceptualized as an unobserved node with a directed edge to a node that is part of the model. Recall that in Markovian systems the node probability is given as
$p(s_i^{t+1} \mid S^t) = p(s_i^{t+1} \mid n_i^t)$ (6)
where $n_i$ represents the nearest neighbors of $s_i$. An intervention $\xi$ changes this distribution to
$p_{\xi}(s_i^{t+1} \mid n_i^t) \neq p(s_i^{t+1} \mid n_i^t)$ (7)
where $\xi$ denotes the external intervention; in the Ising model it corresponds to a non-zero external field $h_i$ in eq. (2). The effect of the intervention will percolate throughout the network over time. A node with large causal influence will cause a large change in the system behavior. Given that the dynamics of the closed system correspond to the distribution $p$, the dynamics under intervention, $p_\xi$, can be understood as the system using a different mechanism to generate system behavior. Alternatively, one can interpret this as the system using a different code. Consequently, the question arises: ‘How much information does the intervention encode?’ Nodes important for the system dynamics will encode more information, whereas nodes that have no causal influence will yield no information. As such, we define the causal influence of node $i$ under intervention $\xi$ at time $t$ as
$\gamma_i(t) = D_{KL}\big(p(S^t) \,\|\, p_{\xi_i}(S^t)\big)$ (8)
where $D_{KL}$ is the Kullback-Leibler divergence (KL-divergence):
$D_{KL}(p \,\|\, q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$ (9)
KL-divergence is also known as relative entropy and represents the extra number of bits needed to identify a value drawn from $p$ if a code was used corresponding to $q$ rather than to $p$. Alternatively, KL-divergence can be understood in terms of Bayesian inference; it represents the updating of one’s belief from the prior distribution $q$ to the posterior distribution $p$. When the new code $q$ is the same as the old code $p$, the KL-divergence yields zero. This represents the case in which an intervention yields no change in the mechanisms driving system behavior.
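A direct computation of eq. (9), illustrating the zero-divergence case and the asymmetry of the measure (toy distributions for illustration):

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # identical codes: 0 bits
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))  # about 0.53 bits
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # about 0.74 bits: asymmetric
```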
KL-divergence has some desired properties for an expression of causal influence. In particular, it is non-negative and invariant under parameter transformation. Additionally, KL-divergence is asymmetric, i.e. generally $D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p)$. The asymmetry is a non-issue for the aims of this paper, as the rationale is that $p$ represents the true dynamics of the system, i.e. the unperturbed system. As such, we are interested in finding the driver node for the unperturbed dynamics $p$ and not for the perturbed dynamics $p_\xi$. Lastly, it should be noted that under certain conditions mutual information can be expressed in terms of KL-divergence, and KL-divergence has been used in other causality work on directed acyclic graphs (see appendix C, and [Janzing2013]).
Next, we define causal impact as the integral of causal influence over time:

$\Gamma_i = \int_0^\infty \gamma_i(t)\, dt$ (10)
Causal impact embodies the combined effect over time of using intervention $\xi$ on node $i$.
Intervention size
In many experiments concerned with measuring causal flows in networks, overwhelming interventions are used to determine the causal impact of nodes [Zhang2017, Liu2016a, Gates2016, Yan2017]. In complex dynamic systems the size of the intervention is crucial for the observed systemic behavior. Consider for example a system as depicted in fig. 3. One can map the state dynamics in a so-called phase plot that shows the relation among variables. From the phase plot in fig. 3, we observe an unstable point at the origin, a stable limit cycle, and at a larger radius an unstable limit cycle. A stable limit cycle is a trajectory to which any system state within a certain radius will converge as $t \to \infty$.
An intervention can be conceptualized as a small perturbation of the system's input vector, i.e. of its position in state space under the unperturbed dynamics. Small perturbations will not drastically alter the system dynamics. Consider for example a trajectory starting out near the origin. If the intervention displaces the state only slightly (dark blue trajectory), the system will still converge to the stable limit cycle as $t \to \infty$. If, however, the intervention pushes the state outside the unstable limit cycle (orange trajectory), the system may be unstable and change boundlessly. The main point here is that, depending on how the system reacts to an intervention, it may or may not lead to dynamics that are relevant to the unperturbed dynamics. It is more likely, however, that smaller interventions remain closer to the original dynamics. As such, our secondary aim is to test whether there is a relation between intervention strength and dynamic importance. Recall that an intervention in the Ising model can be conceptualized as an external node which inputs energy into a node but does not interact with the rest of the system. We test two different levels of external intervention: underwhelming and overwhelming. The underwhelming condition corresponds to a low amount of injected energy (a small external field), whereas the overwhelming intervention corresponds to injecting an infinite amount of energy ($h \to \infty$). The latter effectively pins the state of the node in Ising models, whereas the former slightly biases the node states.
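The effect of intervention strength on a single Glauber flip probability can be sketched as follows; the neighbor field and the 'small' field magnitude are hypothetical values chosen for illustration, since the exact underwhelming magnitude is not specified here.

```python
import math

def flip_prob(s_i, local_field, h, beta):
    """Glauber probability that spin s_i flips, given the weighted field of
    its neighbors plus an external intervention field h."""
    dE = 2 * s_i * (local_field + h)      # energy cost of the flip
    return 1.0 / (1.0 + math.exp(beta * dE))

# node currently at -1, neighbors weakly favoring +1
print(flip_prob(-1, 0.5, h=0.0, beta=1.0))   # natural dynamics
print(flip_prob(-1, 0.5, h=0.1, beta=1.0))   # underwhelming: slight bias upward
print(flip_prob(-1, 0.5, h=1e6, beta=1.0))   # overwhelming: state pinned at +1
```

As the external field grows, the flip probability toward the field's direction saturates at one, which is exactly the pinning behavior described above.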
2.4 Structural methods
Network analysis has traditionally focused on analyzing the structure of the graph. A fundamental concept within network science is centrality, and how to measure the centrality of nodes has become an essential part of understanding networked systems such as social networks, the internet, biological networks, traffic and ecological networks. At its core, a centrality measure quantifies the ‘importance’ of a node based on some structural property. It allows nodes to be ranked using a real-valued function.
There is, however, a longstanding debate concerning what centrality metrics actually measure for networked systems [Bringmann2018, Borgatti2005, Borgatti2006, Sikic2013]. From a graph-theoretical perspective most centrality measures, e.g. betweenness, closeness, eigenvector and degree centrality, essentially classify the ‘walk structure’ of a network [Borgatti2005, Borgatti2006]. A walk from one node to another is a sequence of adjacent nodes that begins with the first and ends with the second. The structure of walks can be divided along different criteria. For example, a trail is a walk in which no edge (i.e. adjacent pair of nodes) is repeated. In contrast, a path is a trail in which no node is visited more than once. Similarly, one could define a walk structure by only using the shortest path from one node to another, or by using random movements between nodes (random walks). The graph-theoretical view can also be interpreted in a complex-systems framework. Namely, the centrality metrics implicitly assume dynamics on the graph. Betweenness centrality, for example, computes centrality based on how often a node acts as a bridge along the shortest path between two other nodes. If one assumes that the network has dynamics where information between nodes follows the shortest path, this metric may be a valid tool to identify dynamically important nodes.
In the best case, a centrality metric is fully predictive for identifying important nodes in a complex system. Consequently, the centrality metric can be used to understand the system. However, an issue with the use of centrality metrics is determining which centrality metric to use. Consider for example fig. 4 and fig. S8; different centrality metrics can identify different nodes as most central. This has led to the common observation that some centrality measures can ‘get it wrong’ when the aim is to predict dynamically important structure in networked systems. Additionally, the ranking produced by a centrality metric does not quantify inter-rank differences. This potentially leads to underestimation of nodal influence when used in a dynamic context [Sikic2013].
We will show that centrality measures have no meaningful predictive power for identifying the most causal node in systems dictated by the Gibbs measure. We are aware that centrality measures do not embody the full extent of what structural methods, or network science in general, have to offer. However, many structural methods share the common characteristic listed above, i.e. they quantify the walk structure of a graph. For our analysis, we used the weighted variants of degree centrality, betweenness centrality, information centrality, and eigenvector centrality. What follows is a brief description of these commonly used centrality metrics.
Degree centrality
Degree centrality is the best-known of all the centrality measures. It is often thought that degree centrality is indicative of the dynamic importance of a node. This intuition is based on the concept of flow: the more connections a node has, the more interaction potential that node has, and therefore the more important the node must be. Freeman defined the degree centrality of a node as the count of the number of edges incident upon that node [Freeman1979]:
$c_D(i) = \sum_j A_{ij}$ (11)
where $A_i$ is the row/column of node $i$ in the adjacency matrix $A$ of the network. Please note that the entries are weighted and not binary.
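Weighted degree centrality (eq. 11) is simply a row sum of the weighted adjacency matrix; a toy example with hypothetical edge weights:

```python
import numpy as np

# weighted adjacency matrix of a hypothetical 4-node network
A = np.array([[0.0, 0.8, 0.3, 0.0],
              [0.8, 0.0, 0.5, 0.2],
              [0.3, 0.5, 0.0, 0.0],
              [0.0, 0.2, 0.0, 0.0]])

degree = A.sum(axis=1)   # eq. (11): row sums of the weighted adjacency
print(degree)            # node 1 has the largest weighted degree
```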
Betweenness centrality
Betweenness centrality quantifies the number of times a node acts as a bridge along the shortest path between two other nodes. It was introduced by Freeman as a measure for quantifying the control of communication among humans in social networks [Freeman1979]. Nodes that have a high probability of occurring on a randomly chosen shortest path between two randomly chosen vertices have a high betweenness. Formally, this can be written as:
$c_B(i) = \sum_{s \neq i \neq t} \frac{\sigma_{st}(i)}{\sigma_{st}}$ (12)
where $\sigma_{st}$ represents the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(i)$ is the subset of those paths that pass through node $i$. We use the normalized version of betweenness, which divides the betweenness score by the number of pairs of vertices (not including node $i$):
$c_B'(i) = \frac{c_B(i)}{(N - 1)(N - 2)/2}$ (13)
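For illustration, normalized betweenness (eqs. 12-13) can be computed exactly on a small unweighted graph by counting shortest paths with breadth-first search; this is a simplified sketch, not the weighted variant used in the analysis:

```python
from collections import deque

def betweenness(adj):
    """Normalized betweenness for an unweighted, undirected graph given as
    an adjacency list."""
    n = len(adj)

    def bfs(s):
        """Distances and shortest-path counts from source s."""
        dist, sigma = [-1] * n, [0] * n
        dist[s], sigma[s] = 0, 1
        q = deque([s])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if dist[w] == -1:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]   # accumulate paths arriving via v
        return dist, sigma

    dists, sigmas = zip(*(bfs(s) for s in range(n)))
    scores = []
    for i in range(n):
        total = sum(
            sigmas[s][i] * sigmas[t][i] / sigmas[s][t]   # sigma_st(i) / sigma_st
            for s in range(n) for t in range(s + 1, n)
            if i not in (s, t) and sigmas[s][t] > 0
            and dists[s][i] + dists[t][i] == dists[s][t])
        scores.append(total / ((n - 1) * (n - 2) / 2))   # eq. (13) normalization
    return scores

print(betweenness([[1], [0, 2], [1]]))  # path 0-1-2: middle node scores 1
```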
Information centrality
Information centrality, also known as current-flow betweenness centrality and random-walk betweenness, was developed by Ulrik Brandes and Daniel Fleischer [Brandes2005, Stephenson1989]. The metric is similar to other centrality measures such as betweenness and closeness in the sense that it assumes some sort of flow process on the network structure. Rather than information spreading along its shortest path (as is the case with closeness and betweenness), it is assumed that ‘information spreads efficiently like an electrical current’ [Brandes2005]. Information centrality thus implicitly models how current would flow through a network and is defined for node $i$ as:
(14) 
with , and represents the ‘current’ through node .
Eigenvector centrality
Eigenvector centrality is the most difficult centrality measure to give an intuitive feeling for. With $A$ the adjacency matrix of the system, the eigenvector centrality $x_i$ of node $i$ is defined as:

$$x_i = \frac{1}{\lambda} \sum_{j} A_{ij} x_j, \quad \text{or in matrix form} \quad A x = \lambda x \qquad (15)$$

For any square matrix of rank $k$, the matrix will have at most $k$ eigenvector-eigenvalue pairs. A common choice for eigenvector centrality, motivated by the Perron-Frobenius theorem, is the eigenvector belonging to the largest eigenvalue [Debye1918, Frobenius1912]. This has the desired property that if $A$ is irreducible, or equivalently if the graph is strongly connected, the eigenvector is both unique and positive.
The sign and size of the eigenvalue are important for the relation between the eigenvector values and the importance of a node. In linear differential equations, negative eigenvalues correspond to non-oscillatory, exponentially stable solutions. In difference equations, by contrast, they indicate oscillatory behavior. Geometrically speaking, a negative eigenvalue embodies a linear transformation that includes a reflection across some axis.
Intuitively speaking, eigenvector centrality quantifies the influence of a node in the network. It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. A high eigenvector score implies that the node is connected to many other nodes that themselves have high scores. Google PageRank and Katz centrality are variants of eigenvector centrality [Langville2005]. A node with high eigenvector centrality is not necessarily a node with many connections (incoming or outgoing). For example, a node may have a high eigenvector centrality if it has few connections, provided those connections lead to nodes of high importance.
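The leading (Perron-Frobenius) eigenvector can be computed by power iteration. A minimal sketch on a hypothetical toy graph in which node 0 acts as the hub:

```python
import numpy as np

# Toy graph (hypothetical): node 0 is the hub; 1 and 2 also share an edge.
A = np.array([
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
], dtype=float)

def eigenvector_centrality(A, iters=200):
    """Power iteration: repeatedly apply A and renormalize; for a connected
    non-bipartite graph this converges to the eigenvector of the largest
    eigenvalue (the Perron-Frobenius vector)."""
    x = np.ones(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x

x = eigenvector_centrality(A)
print(int(x.argmax()))  # 0: the hub dominates
# Nodes 1 and 3 both have edges to the hub, but node 1 also links to
# another well-scored node (2), so it outranks the pendant node 3.
print(x[1] > x[3])      # True
```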
2.5 Data
The data originates from the Changing Lives of Older Couples (CLOC) study, which compared depressive symptomatology, assessed via the 11-item Center for Epidemiologic Studies Depression Scale (CES-D), among those who lost their partner (N=241) with a still-married control group (N=274) [Fried2015]. Each of the CES-D items was binarized and, with the aid of the Ising-model-based causal search algorithm developed by [VanBorkulo2014], represented as a node with weighted connections (fig. 4). For more information on the procedure see [VanBorkulo2014, Epskamp2017, Fried2015]. The 11 CES-D items are (abbreviated names used in the remainder of this text in brackets): 'I felt depressed' (depr), 'I felt that everything I did was an effort' (effort), 'My sleep was restless' (sleep), 'I was happy' (happy), 'I felt lonely' (lonely), 'People were unfriendly' (unfr), 'I enjoyed life' (enjoy), 'My appetite was poor' (appet), 'I felt sad' (sad), 'I felt that people disliked me' (dislike), and 'I could not get going' (getgo).
2.6 Numerical methods
Magnetization matching
In the kinetic Ising model the temperature parameter embodies the external noise. The phase change from congruent to incongruent behavior is not caused by this external noise alone (fig. 1): in addition to the temperature, the connections between nodes also determine the exact shape of the phase change. Consequently, the absolute magnetization reflects both the external noise induced by temperature and the inter-node interactions. As such, we matched the noise level in the system based on the absolute mean magnetization to get an estimate of the accuracy of information impact relative to causal impact. Specifically, the temperature was matched to given fractions of the maximum magnetization by means of regression (fig. 1). A sigmoid kernel was used to estimate the required temperature for the given magnetization levels.
For unitary-weight kinetic Ising models the magnetization decays sigmoidally from full magnetization at $T = 0$ to zero as $T \to \infty$ (fig. 1) [Glauber1963]. Simulations of the Ising model were performed over a range of temperatures at a fixed resolution.
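The magnetization-matching step can be sketched as fitting a sigmoid kernel to the magnetization curve and then inverting it. The kernel form and all parameter values below are illustrative assumptions, not the values used in the study:

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed sigmoid kernel: magnetization decays from 1 to 0 with temperature.
def sigmoid(t, a, b):
    return 1.0 / (1.0 + np.exp(a * (t - b)))

T = np.linspace(0.1, 5, 50)
m = sigmoid(T, 3.0, 2.0)            # synthetic magnetization curve

(a, b), _ = curve_fit(sigmoid, T, m, p0=[1, 1])

def temperature_for(m_target, a, b):
    """Invert the fitted kernel: temperature giving a target fraction
    of the maximum magnetization."""
    return b + np.log(1.0 / m_target - 1.0) / a

t80 = temperature_for(0.8, a, b)
print(round(float(sigmoid(t80, a, b)), 3))  # 0.8 by construction
```

The inversion holds for any fitted $(a, b)$, so the matched temperature is exact with respect to the fitted kernel.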
Base procedure
For each temperature, a set of independent Markov chains is run for a fixed number of simulation steps (fig. 5). Every simulation step follows Glauber dynamics: a randomly chosen node $i$ is assigned state $+1$ with the heat-bath probability

$$p\left(s_i^{t+1} = 1\right) = \frac{1}{1 + \exp\left(-2\beta \sum_j J_{ij} s_j^t\right)}$$
From this set, the distribution over states was constructed. For each of the unique states, Monte Carlo methods were used to construct the conditional distribution over future states using repeated simulations over a fixed number of time steps.
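A single-spin Glauber (heat-bath) update can be sketched as follows; the coupling matrix, temperature, and step count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric weighted couplings; spins take values +/-1.
J = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.5],
              [0.5, 0.5, 0.0]])

def glauber_step(s, J, beta, rng):
    """Pick a random node and set it to +1 with the heat-bath
    probability 1 / (1 + exp(-2*beta*h_i)), where h_i is the local field."""
    i = rng.integers(len(s))
    h = J[i] @ s
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
    s[i] = 1 if rng.random() < p_up else -1
    return s

s = rng.choice([-1, 1], size=3)
for _ in range(1000):
    s = glauber_step(s, J, beta=5.0, rng=rng)
# At low temperature (high beta) the ferromagnetic couplings align all spins.
print(abs(int(s.sum())))  # 3
```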
Interventions
Interventions are performed at two different strengths (underwhelming and overwhelming), each independently. Each intervention is applied for a fixed number of time steps to enhance the difference between the intervened and unperturbed distributions. After this period the intervention is released for the remaining steps (fig. 5). The duration of the decay will be proportional to the dynamic impact of each node. This entire procedure was repeated a number of times.
Software
A general toolbox was developed for analyzing any discrete system using information impact, e.g. Susceptible-Infected-Recovered [Matsuda1994] or random Boolean networks [Harvey1997]. The core engine is written in Cython 0.28.5 with Python 3.7.2 and offers C/C++-level performance.
2.7 Data preprocessing
Area under the curve estimation
The mutual information over time and the KL-divergence over time were rescaled to the range (0,1) per trial set. This transformation does not affect the relative ordering of the nodal decay curves, as the sample data is only multiplied by a scalar. A double exponential of the form $f(t) = \alpha e^{-\beta t} + \gamma e^{-\delta t}$ was fit to estimate the causal and information impact (Eq. (5) and (10)) using least-squares regression.
The kernel proved to be a good fit in general, as indicated by the low fit error (fig. S1).
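The fit-then-integrate step can be sketched with a least-squares fit of an assumed double-exponential kernel; the symbol names, synthetic decay curve, and parameter values are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def double_exp(t, a, b, c, d):
    # Assumed kernel: sum of two decaying exponentials.
    return a * np.exp(-b * t) + c * np.exp(-d * t)

t = np.linspace(0, 10, 200)
y = double_exp(t, 0.7, 2.0, 0.3, 0.2)   # synthetic, noiseless decay curve

params, _ = curve_fit(double_exp, t, y, p0=[1, 1, 1, 0.1])
a, b, c, d = params

# Area under the fitted kernel over [0, inf) has the closed form a/b + c/d,
# which is invariant to swapping the two exponential terms.
auc = a / b + c / d
print(round(float(auc), 3))  # true value: 0.7/2.0 + 0.3/0.2 = 1.85
```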
Sampling bias correction
Empirical estimates of mutual information are inherently biased due to finite sampling. To correct for this, the Panzeri-Treves correction was applied [Panzeri2007]. This method offers good performance in terms of signal-to-noise ratio and computational complexity.
Outlier rejection
Estimating the area under the curve is affected by estimation noise, due either to the numerical methods or to sampling bias. Outlier rejection was performed using a Minimum Covariance Determinant estimator on the causal impact and information impact estimates [Rousseeuw1999]. The procedure involves computing the Mahalanobis distance of the nodal estimates and rejecting anything more than a threshold number of standard deviations above the mean (fig. S2). The values of the outliers were set to the mean of the in-group set of the covariance estimates.
On average 15.9% of the data was rejected (fig. S3).
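The rejection step can be sketched with scikit-learn's MinCovDet estimator; the synthetic data, the planted outliers, and the exact cutoff rule below are illustrative assumptions:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
# Synthetic 2-D (causal impact, information impact) estimates
# with two planted outliers at rows 0 and 1.
X = rng.normal(0, 1, size=(50, 2))
X[:2] += 10

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)               # squared Mahalanobis distances
threshold = d2.mean() + 3 * d2.std()  # illustrative cutoff
outliers = np.where(d2 > threshold)[0]
print(sorted(int(i) for i in outliers))  # [0, 1]
```

MinCovDet estimates location and scatter from the most concentrated subset of points, so the planted outliers do not contaminate the distance metric itself.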
2.8 Classification analysis
Mathematical formulation
Each of the independent variables (information impact and the centrality measures) is considered a predictor. For each predictor the maximum value was extracted; for example, in each trial the node corresponding to the highest information impact was extracted. A random forest (RNF) classifier was trained on each of the intervention magnitudes to estimate which predictor best determines the causal impact [Breiman1984]. The dependent variable was binarized into congruency (correct / false) for each of the predictors. The data can thus be described as:

$$\mathcal{D} = \{(x_t, y_t)\}_{t=1}^{T} \qquad (16)$$

where each data pair consists of a vector $x_t$ of regression predictor outputs at trial $t$, i.e. the output of the centrality metrics and information impact, and an output vector $y_t \in \{0, 1\}^m$, where $m$ is the number of regressors. The binary vector represents whether the ground truth was congruent or incongruent with the output provided by the different regressors.
A decision tree is built by recursively partitioning the feature space such that samples with the same labels are grouped together. Let the data at node $m$ be represented by $Q_m$ with $n_m$ samples. For each candidate split $\theta = (j, t_m)$ consisting of a feature $j$ and threshold $t_m$, partition the data into $Q_m^{left}(\theta)$ and $Q_m^{right}(\theta)$ subsets:

$$Q_m^{left}(\theta) = \{(x, y) \mid x_j \le t_m\}, \qquad Q_m^{right}(\theta) = Q_m \setminus Q_m^{left}(\theta) \qquad (17)$$

with the impurity at $m$ being computed by an impurity function $H$:

$$G(Q_m, \theta) = \frac{n_m^{left}}{n_m} H\!\left(Q_m^{left}(\theta)\right) + \frac{n_m^{right}}{n_m} H\!\left(Q_m^{right}(\theta)\right) \qquad (18)$$

$\theta$ is chosen such that it minimizes the impurity:

$$\theta^* = \operatorname*{argmin}_{\theta} G(Q_m, \theta) \qquad (19)$$

This procedure is repeated recursively for the subsets $Q_m^{left}(\theta^*)$ and $Q_m^{right}(\theta^*)$ until the maximum allowed depth is reached or $n_m = 1$.
Let

$$p_{mk} = \frac{1}{n_m} \sum_{(x, y) \in Q_m} I(y = k)$$

be the proportion of observations of class $k$ in node $m$, representing a region with $n_m$ samples. Classification is achieved using the majority class $\operatorname*{argmax}_k p_{mk}$. Conversely, misclassification is defined as $H(Q_m) = 1 - \max_k p_{mk}$, where $Q_m$ is the training data in node $m$. For the random forest classifier, an ensemble of such trees was built and the estimate was decided by majority voting.
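The split-selection step above can be sketched with Gini impurity on a single toy feature (a sketch of the partitioning criterion, not scikit-learn's implementation):

```python
# Minimal Gini-impurity split search on one feature.
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Scan thresholds on a single feature; return the one minimizing the
    weighted impurity G = (n_l/n) * H(left) + (n_r/n) * H(right)."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if g < best[1]:
            best = (t, g)
    return best

xs = [0.1, 0.2, 0.8, 0.9]
ys = [0, 0, 1, 1]
print(best_split(xs, ys))  # (0.2, 0.0): a perfectly pure split
```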
Cross-validation
The RNF classifier was provided by the Python package scikit-learn and fitted separately for the different intervention levels (underwhelming, overwhelming). The classifier has a number of hyperparameters that can be controlled, with varying impact on feature bias and prediction performance. Probst et al. recently showed that, in general, the effect of hyperparameter selection is small compared to other machine learning methods [Probst2018]. Therefore the default parameters were used.
The aim here is twofold: (a) to find the feature that is most important for predicting causal impact, i.e. explanatory modeling, and (b) to find the feature that leads to the best prediction of causal impact, i.e. predictive modeling [Shmueli2003]. Since the hyperparameter settings are deemed stable regardless of the exact configuration, the goal here is to quantify the bias in feature selection. To this end, leave-one-out cross-validation was used to compute the accuracy score on the features (information impact, weighted degree centrality, closeness centrality, information centrality, and eigenvector centrality).
RNF classifiers are prone to bias when features lack variance [Strobl2007]. Since the structural methods provide the same estimate over trials, a sensitivity analysis was performed to evaluate the reliability of the feature importance. This was achieved by shuffling the $i$-th feature for every fold in the cross-validation. A feature with large predictive power will be associated with a large reduction in accuracy score. In contrast, a feature with no impact on the prediction accuracy will yield zero difference relative to the non-shuffled score. The reduction was quantified as a delta score indicating the percentage loss due to the shuffling procedure compared to the non-shuffled accuracy score. For each of the folds, a shuffled score was computed for each feature.
The default settings were used for the RandomForestClassifier (scikit-learn 0.20.2), except for n_estimators, which was set to 100. n_estimators sets the number of trees to fit; generally speaking, more trees improve the accuracy of the classifier at the cost of increased run time.
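The shuffle-based sensitivity analysis can be sketched as follows. The synthetic data, the constant "structural" feature standing in for a centrality score, and the reduced tree count (for speed) are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(2)
# Feature 0 is informative; feature 1 is constant across trials,
# mimicking a structural metric that never changes.
X = np.column_stack([rng.normal(size=60), np.ones(60)])
y = (X[:, 0] > 0).astype(int)

def loo_accuracy(X, y):
    """Leave-one-out cross-validated accuracy of a random forest."""
    hits = 0
    for train, test in LeaveOneOut().split(X):
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit(X[train], y[train])
        hits += clf.predict(X[test])[0] == y[test][0]
    return hits / len(y)

base = loo_accuracy(X, y)
X_shuf = X.copy()
X_shuf[:, 0] = rng.permutation(X_shuf[:, 0])  # destroy the informative feature
delta = base - loo_accuracy(X_shuf, y)        # accuracy lost by shuffling
print(base > 0.9, delta > 0.2)
```

Shuffling the informative feature collapses the accuracy toward chance, which is exactly the delta-score signal described above.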
Statistical analysis
Two hypotheses were tested: (a) that the accuracy score differs significantly from random choice, and (b) that the feature importance differs significantly from a uniform distribution. The first was evaluated using a binomial test over the trials. The second was evaluated with a goodness-of-fit test of the feature importance against a uniform distribution. Additional post hoc tests were applied to test whether information impact differed significantly from the aggregated feature importance of the centrality measures. A fixed alpha value was maintained unless specified otherwise, and p-values were Bonferroni corrected for the number of tests.
3 Results
The RNF classifier yielded high accuracy scores for both underwhelming (78.73%) and overwhelming interventions (100%, fig. 6). The binomial tests revealed the classifier to be significantly better at predicting the causal impact than random choice for both underwhelming and overwhelming interventions. Importantly, the feature importance scores indicate that information impact was the only feature that significantly contributed to the accuracy scores. The analysis revealed that the observed feature importance scores were significantly different from a uniform distribution (LABEL:table:chi_scores) for both underwhelming and overwhelming interventions. In addition, the delta scores showed that the shuffle procedure significantly affected the accuracy score only when the information impact vector was shuffled (34%). In contrast, shuffling the structural features had no effect (0%) on the obtained prediction accuracy. The perfect prediction accuracy for overwhelming interventions is misleading, however. For overwhelming interventions, the node with the highest information impact never matched the node with the highest causal impact (see the observed column in LABEL:table:chi_scores); the RNF classifier was therefore 100% correct in predicting false. The combined results lead to the conclusion that information impact is predictive for low intervention size. In contrast, none of the structural methods showed any predictive power (figs. S7 and S6). The three-way relation between causal impact, information impact and centrality measures is depicted in fig. 10.
For closed systems, it was hypothesized that mutual information over time would be highest for the driver node in systems without confounders. The results imply that this hypothesis was wrong. From fig. 8, however, it can be observed that the lack of performance is not due to erroneous prediction based on false driver-node detection; rather, it is due to a lack of resolution. Namely, from the time plots in fig. 9 three crucial observations can be made. First, causal influence over time has decay curves similar to delayed Shannon mutual information. In fact, the causal impact varies linearly with information impact (figs. 11 and 8). No linear relation is found for overwhelming interventions (figs. S7 and S6). Second, there is an interaction between the magnetization ratio and the largest driver node: for low noise, 'sleep' is the most causal node, whereas for higher noise levels 'sad' is the most causal node. Importantly, the switch in driver node is mirrored by Shannon mutual information. Third, as the magnetization ratio decreases, the curves tend to collapse: the distance between the largest and the second-largest driver node decreases significantly as the magnetization ratio decreases (fig. 7). Consequently, it was this reduction in resolution that kept the classifier from 100% accuracy, not a false assertion that the driver node lacked the highest information impact score. Since the definition of the driver node targets the maximum value only, the prediction accuracy could be increased by adjusting the metric to account for cases where causal influences are similar.
Information impact varies linearly with causal impact at low intervention strength
Unexpectedly, the results indicated that information impact shares a linear relation with causal impact at low intervention strength. A post hoc multiple linear regression was therefore performed to quantify the main effects between the regressors and low-intervention-strength causal impact (fig. 11). Information impact had a high regression coefficient. In contrast, no centrality metric had a significant regression coefficient (table A7). No tests were performed on high intervention strength due to its nonlinear dependency. The linear relation is striking, as it implies that information impact could be leveraged to provide a direct estimate of dynamic importance for all nodes in the network. Although this claim can certainly be made for this particular system, no generalization can be made to all network structures due to the aforementioned confounding of mutual information.
Previous research showed that nodes with high degree play a diminishing role in scale-free networks [Quax2013]. The dynamic importance as measured by information diffusion time showed a positive skew with degree centrality: similar to a hill climb, the dynamic importance gradually increased with degree, the most dynamically important nodes being those of intermediate degree centrality, while for high degree the dynamic importance diminished. The degree distribution of the network used here would fall within the linear phase of the results in [Quax2013]. However, we do not observe a linear relation between degree and information impact. The difference may be due to the difference in metric (time-based versus integral-based), differences in network structure, and/or network size.
Future studies should determine whether the linear relation between causal impact and information impact holds for larger network structure of arbitrary size governed by Gibbs dynamics.
4 Discussion
4.1 Centrality measures fail to identify the driver node
The structure of a network is crucial for the observed behavior of a system. This has led to the implicit assumption that the connectedness of a node can be related to its dynamic importance [Pequito2017, Liu2016a, Liu2011a, Lu2016, Cornelius2013, Zanudo2017]. The results show that this assumption does not always hold for complex systems. In order to understand the system behavior, a metric needs to account for the wide range of behavior that the system exhibits. For example, fig. 9 shows how the driver node changes as a function of noise. For low noise levels 'sleep' is the driver node. However, as the noise levels increase, a set of nodes obtains similar causal influence: 'depr', 'lonely', and 'sad'. By definition, the structural methods do not have the ability to change their predictions for the driver node based on a change in dynamics alone, i.e. they always produce the same prediction regardless of the behavior of the system. Consequently, this raises the question of whether applying centrality metrics to complex systems is appropriate in general.
In particular, the network used in this study stems from bereavement scores in elderly people showing depressive symptoms (see [Fried2015] and appendix G). The symptom 'appet' has a relatively high betweenness score. Recall that betweenness centrality ranks nodes based on how often a node acts as a bridge between other nodes; in other words, a high betweenness score indicates that the node can be used to quickly travel between distant nodes of the network. However, 'appet' is not associated with a high causal impact score in general. As a consequence, centrality metrics provide inappropriate interpretations if they are assumed to provide information regarding dynamic importance when no assumptions on the dynamics governing the system are known or can be made. Concerns regarding the validity of structural metrics in psychological networks were recently highlighted by Bringmann et al. [Bringmann2018], who argue that from the onset it is not clear what centrality metrics actually measure in networks of psychological symptoms. The issues raised by Bringmann et al. were on a conceptual level, without actual data. The results from this study, combined with the theoretical results from [Quax2013], strengthen their case with quantitative data on real-world and artificial networks.
4.2 Information impact: an excellent predictor for unperturbed dynamics
One of the primary aims of science is to provide causal explanations of natural phenomena. As a consequence, a scientist's goal is to understand what causes the observed systemic behavior. Interventions allow for a controlled approach to determine cause from effect and are essential to the scientific method [Woodward2005, Woodward2014, Woodward2015, Pearl2000, Pearl2018]. Overwhelming interventions are 'drastic' changes to a system, similar to knocking out a gene or replacing a systemic signal altogether [Pequito2017, Liu2016a, Liu2011a, Lu2016, Cornelius2013, Zanudo2017, Izhikevich2007, Rabinovich2006, Rabinovich2006a, Rabinovich2008]. They are often preferred as they maximize the experimental effect, and by extension maximize the information gained from the experiment. The effect of intervention size can be clearly seen in fig. 9: large causal interventions generated a different ordering in the causal structure compared to low interventions, and importantly, the driver node changed as well. This raises the question of whether the system after an overwhelming intervention is still similar to the unperturbed system. If the goal is to provide causal explanations for the unperturbed system, overwhelming interventions are inappropriate, as they induce artificial system dynamics that do not occur in the unperturbed system.
Notably, information impact mirrored the driver node identified by low causal intervention only. Consequently, information impact can be used as a tool to provide insight into what drives systemic behavior. Additionally, as information impact is computed from observations only, it can be used in situations where direct intervention is difficult, e.g. in systems that operate on timescales exceeding a human lifetime or that are physically hard to observe.
4.3 Future directions
Generalization to other types of dynamics and graph structures
The results show that for Gibbsian dynamics on a weighted real-world network, structural methods are not predictive of causal influence. Although Gibbsian dynamics represent a general class of dynamics, future studies should investigate whether this holds for other types of dynamics such as epidemic, biochemical, regulatory, or population dynamics. The advantage of the proposed information-theoretical method is that it allows for direct comparison of different types of dynamics, provided they can be expressed in terms of probabilities.
Additionally, future studies should investigate the interaction of network structure and the dynamics governing the nodes, similar to [Harush2017]. Preliminary results using different graph structures show that information impact generalizes well to other graph structures (see appendix F): it correctly identifies the driver node for low causal interventions but not for high causal interventions. However, no comparisons were made using different dynamics on the same graph structure. Future studies should investigate how information impact performs as a function of dynamics and graph structure.
Detecting transient dynamical structures
Information impact was originally derived from a time-based measure using delayed mutual information [Quax2013]. The area under the curve removes time in favor of a single value for comparison. Complex dynamical systems can behave at different temporal scales. An interesting direction would be to determine, in systems with varying timescales, how information impact can be used to detect transient systemic behavior by shifting the integration range. For example, in Ising models it is commonly observed that high-degree nodes (hubs) can flip sporadically over time, with large-scale effects on systemic properties such as the mean magnetization. Quax et al. postulated that such a flip is caused by bottom-up interactions: lower-degree nodes flip, causing a chain reaction that ripples through the network and eventually causes a higher-degree node to flip [Quax2016]. The exact nature and conditions under which such a flip occurs may provide insights into riot dynamics, swinging the popular vote in elections, or how damage to DNA can cause cellular failure.
Unraveling a larger causal structure
As mentioned, mutual information values can be inflated for all nodes other than the driver node. However, given a large enough network, it is highly likely that two nodes have equal causal influence (fig. 8). As such, not all information impact values will be confounded. Equal causal influence of nodes may be resolved by computing the mutual information between the nodes directly. For example, consider two nodes $i$ and $j$ with equal causal influence and correspondingly similar information impact. By computing $I(s_i; s_j)$ one can conclude whether $i$ and $j$ are separate driver nodes: if the mutual information between them is low, it can be concluded that $i$ and $j$ are in fact separate driver nodes. This raises the question of how much of the causal structure can be extracted using observed data and information impact.
5 Conclusion
The goal of this paper was to show that structural methods provide unreliable estimates of the driver node in complex dynamical systems. The results from this study undeniably show that the common assumption of topologically central or well-connected nodes being dynamically most important is false. Furthermore, this implies that we cannot abstract away the dynamics of a complex dynamical system before analyzing it. The proposed novel metric, information impact, was able to reliably identify the driver node for natural dynamics in complex systems, and enables scientists to climb the ladder of causation [Pearl2018].
Appendix
Appendix A Dataprocessing inequality
The data-processing inequality can be used to show that no clever manipulation of the data can improve the inferences made from that data.
Definition 1
Random variables $X, Y, Z$ are said to form a Markov chain $X \to Y \to Z$ if the conditional distribution of $Z$ depends only on $Y$ and is conditionally independent of $X$. Specifically, $X, Y, Z$ form a Markov chain if the joint probability can be written as:

$$p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y) \qquad (20)$$
Theorem 1
(Data-processing inequality) If $X \to Y \to Z$, then $I(X;Y) \ge I(X;Z)$.
Proof: By the chain rule, the mutual information can be expanded in two different ways:

$$I(X; Y, Z) = I(X;Z) + I(X;Y \mid Z) = I(X;Y) + I(X;Z \mid Y) \qquad (21)$$

Since $X$ and $Z$ are conditionally independent given $Y$, we have $I(X;Z \mid Y) = 0$. Since $I(X;Y \mid Z) \ge 0$, this gives

$$I(X;Y) \ge I(X;Z) \qquad (22)$$

Thus we have equality if and only if $I(X;Y \mid Z) = 0$ for Markov chains. Similarly, one can prove that $I(Y;Z) \ge I(X;Z)$.
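The inequality can be verified numerically for a concrete Markov chain, here two cascaded binary symmetric channels (the flip probabilities are illustrative choices):

```python
from math import log2

def mi(pxy):
    """Mutual information (bits) from a joint distribution dict {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)

def bsc(b, eps):
    """Binary symmetric channel: output distribution given input bit b."""
    return {b: 1 - eps, 1 - b: eps}

# Markov chain X -> Y -> Z: uniform X, Y = X through BSC(0.1), Z = Y through BSC(0.2).
pxy, pxz = {}, {}
for x in (0, 1):
    for y, py_ in bsc(x, 0.1).items():
        pxy[(x, y)] = pxy.get((x, y), 0) + 0.5 * py_
        for z, pz_ in bsc(y, 0.2).items():
            pxz[(x, z)] = pxz.get((x, z), 0) + 0.5 * py_ * pz_

# Processing through the second channel can only lose information about X.
print(mi(pxy) >= mi(pxz))  # True
```

Here $I(X;Y) = 1 - H(0.1) \approx 0.531$ bits, while $I(X;Z)$ corresponds to an effective flip probability of $0.26$ and is therefore smaller.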
Corollary 1
If $Z = g(Y)$, then $I(X;Y) \ge I(X; g(Y))$.
Proof: $X \to Y \to g(Y)$ forms a Markov chain.
This result implies that no function of the data $Y$ can increase the information about $X$.
Corollary 2
If $X \to Y \to Z$, then $I(X;Y \mid Z) \le I(X;Y)$.
From Eq. (21) it is noted that $I(X;Z \mid Y) = 0$, due to the conditional independence in the Markov chain, and $I(X;Z) \ge 0$. Therefore:

$$I(X;Y \mid Z) \le I(X;Y) \qquad (23)$$

The dependence of $X$ and $Y$ is decreased or remains unchanged by the observation of a "downstream" random variable $Z$. The observant reader may recognize that $I(X;Y \mid Z) > I(X;Y)$ is possible when the variables do not form a Markov chain. To illustrate, let $X$ and $Y$ be independent fair binary random variables with $Z = X + Y$. Then $I(X;Y) = 0$, but $I(X;Y \mid Z) = 1/2$ bit.
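The counterexample above can be checked by computing the conditional mutual information directly from the joint distribution:

```python
from math import log2

# X, Y independent fair coins; Z = X + Y, so (X, Y, Z) is not a Markov chain.
pxyz = {}
for x in (0, 1):
    for y in (0, 1):
        pxyz[(x, y, x + y)] = 0.25

def cmi(pxyz):
    """I(X;Y|Z) in bits, computed from the joint distribution."""
    pz, pxz, pyz = {}, {}, {}
    for (x, y, z), p in pxyz.items():
        pz[z] = pz.get(z, 0) + p
        pxz[(x, z)] = pxz.get((x, z), 0) + p
        pyz[(y, z)] = pyz.get((y, z), 0) + p
    return sum(p * log2(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), p in pxyz.items() if p > 0)

print(cmi(pxyz))  # 0.5 bit, even though I(X;Y) = 0
```

Conditioning on the sum couples the two otherwise independent coins: when $Z = 1$, knowing $X$ fully determines $Y$.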
Appendix B Data correction and fit errors
Appendix C Mutual information and causality
Under certain conditions causal influence of nodes reduces to mutual information. In this section we will show when that is the case.
In a Markov system each node $i$ is updated as

$$s_i^{t+1} \sim p\!\left(s_i^{t+1} \mid s_{\pi(i)}^{t}\right)$$

where $\pi(i)$ represents the inputs or parents of node $i$. Hence the joint transition probability factorizes over the nodes. The causal influence of node $i$ on the system state $S^t$ can be defined as

$$C_i = \sum_{s_i^0} p(s_i^0)\, D_{KL}\!\left(p(S^t \mid do(s_i^0)) \,\|\, p(S^t)\right)$$

with $D_{KL}$ representing the Kullback-Leibler divergence (KL-divergence).
Theorem 2
$C_i = I(s_i^0; S^t)$ when node $i$ and the remaining nodes have no common neighbors.
Proof: For any Markov system the Markov condition holds; in the absence of common neighbors (confounders) the interventional and observational conditionals coincide, i.e. $p(S^t \mid do(s_i^0)) = p(S^t \mid s_i^0)$. Therefore we can write:

$$C_i = \sum_{s_i^0} p(s_i^0)\, D_{KL}\!\left(p(S^t \mid s_i^0) \,\|\, p(S^t)\right) = \sum_{s_i^0, S^t} p(s_i^0, S^t) \log \frac{p(S^t \mid s_i^0)}{p(S^t)} = I(s_i^0; S^t) \qquad (24)$$
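The identity used in the final step, namely that the input-averaged KL divergence between the conditional and the marginal equals the mutual information, can be checked numerically on a toy joint distribution (the probabilities are illustrative):

```python
from math import log2

# Toy joint distribution over (x, y) for a two-node system.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {0: 0.5, 1: 0.5}
py = {0: 0.6, 1: 0.4}

def kl(p, q):
    """KL divergence (bits) between two distributions given as lists."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# E_x[ KL( p(y|x) || p(y) ) ]
avg_kl = sum(px[x] * kl([pxy[(x, y)] / px[x] for y in (0, 1)],
                        [py[y] for y in (0, 1)]) for x in (0, 1))

# I(X;Y) computed directly from the joint.
mi = sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items())

print(abs(avg_kl - mi) < 1e-12)  # True
```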
Appendix D Mutual information and time symmetry
The methods applied in the main text imply that the metric can be used in a symmetric manner. For practical purposes, mutual information was computed in a 'forward' manner: the system state was simulated for positive $t$ from some initial time. For undirected graphs there is a symmetry with regard to where information flows; information is not bounded by any directionality of edges (fig. 4(b)).
It is important to emphasize that this is generally not the case for directed graphs. If information is constrained to flow in one direction, the direction of time in the simulation is crucial. Additionally, directed graphs show that the metric can be applied for different purposes. This can be seen in fig. 4(a), where forward simulation gives 'information sinks' and backward simulation gives 'information sources'. Information impact in directed graphs under forward simulation indicates which nodes receive the most information over time. In contrast, simulating backwards shows which nodes have the most impact on the instantaneous state of the system. The properties of this finding shall be the subject of further studies.
Fig. 4(b) shows that for undirected graphs there is no difference in node importance before or after the reference time point; information flows in both directions.
Appendix E Statistical result tables
Dep. Variable: Underwhelming      R-squared:           0.937
Model: OLS                        Adj. R-squared:      0.936
Method: Least Squares             F-statistic:         2119
Prob (F-statistic): 0.00          Log-Likelihood:      -1084.0
No. Observations: 720             AIC:                 2180
Df Residuals: 714                 BIC:                 2207
Df Model: 5

                coef    std err        t    P>|t|    [0.025    0.975]
intercept     5.4800      0.041  134.280    0.000     5.400     5.560
              4.1846      0.044   95.871    0.000     4.099     4.270
              0.1061      0.091    1.165    0.244    -0.073     0.285
bet           0.1246      0.068    1.824    0.069    -0.010     0.259
ic           -0.0473      0.071   -0.664    0.507    -0.187     0.093
ev            0.0060      0.093    0.065    0.948    -0.177     0.189

Omnibus:        88.651    Durbin-Watson:        2.080
Prob(Omnibus):   0.000    Jarque-Bera (JB):   728.890
Skew:            0.156    Prob(JB):         5.29e-159
Kurtosis:        7.919    Cond. No.:             5.26
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Supplementary figures
Appendix F Krackhardt kite graph
As a proof of principle, the simulations were repeated for a unitary-weight undirected graph, namely the Krackhardt kite graph (fig. S8). In practice there is a high correlation among centrality measures (fig. S10), in part because centrality measures tend to share assumptions [Borgatti2005, Borgatti2006]. The kite graph is a famous network not only because of its layout, but also because different centrality measures rank different nodes as number one (fig. S8).
For unitary-weighted graphs, equilibrium occurs naturally in Ising models due to the tendency of the nodes to align their states with their nearest neighbors at low enough temperatures (fig. 1). As the snapshots are calculated from Markov chains initiated from zero (see section H and 2.6), this yields inflated decay rates for hubs as a function of time. This can be understood from the ferromagnetic behavior of Ising models: for low enough temperatures the node states tend to align with their nearest neighbors (fig. S11). As such, hubs will be relatively frozen over time and therefore yield the slowest decay rates. However, short-term causal interactions will not reflect these hubs as causally important on short timescales. Hubs can be considered stabilization factors, much as the earth orbiting the sun stabilizes short-term interactions such as the temperature differences between night and day. As such, we sampled only one side of the magnetization curve to align the causal curves with their actual decay rates (see section H for more information). For undirected, unitary-weight graphs this method has no effect on estimating the true causal importance; rather, it can be seen as a necessary step to measure short-term causal interactions.
The results show that information impact remains a good predictor of the most causal node at moderate levels of noise (fig. S9). For high noise, the decay curves collapse for both causal impact and information impact (fig. S12). This is to be expected, as the nodes behave more randomly with respect to their neighbors. Compared to the psycho-network, the accuracy score for low intervention was higher (see LABEL:table:kite_rnf_scores). Other graphs have also been studied with similar results: in all cases information impact was a reliable predictor of causal impact for underwhelming nudges but not for overwhelming nudges. Future studies will need to quantify this behavior exactly.
F.1 Figures kite graph
Tables kite graph
Dep. Variable: Underwhelming      R-squared:           0.948
Model: OLS                        Adj. R-squared:      0.947
Method: Least Squares             F-statistic:         1062
Date: Mon, 25 Mar 2019            Prob (F-statistic):  8.60e-186
Time: 15:10:16                    Log-Likelihood:      16.422
No. Observations: 300             AIC:                 -20.84
Df Residuals: 294                 BIC:                 1.378
Df Model: 5

                coef    std err         t    P>|t|    [0.025    0.975]
intercept  5.109e-16      0.013  3.82e-14    1.000    -0.026     0.026
              0.9448      0.017    54.125    0.000     0.910     0.979
              0.3633      0.184     1.973    0.049     0.001     0.726
             -0.1972      0.091    -2.177    0.030    -0.375    -0.019
              0.4311      0.183     2.361    0.019     0.072     0.790
             -0.8008      0.345    -2.321    0.021    -1.480    -0.122

Omnibus:        25.336    Durbin-Watson:      1.722
Prob(Omnibus):   0.000    Jarque-Bera (JB):  35.049
Skew:            0.593    Prob(JB):        2.45e-08
Kurtosis:        4.181    Cond. No.:           59.8
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Appendix G Validation with Fried et al. [Fried2015]
From the results, the most causal node, 'sleep', was correctly identified by information impact for low noise. As the noise in the system increased, nodes that were at first not causally relevant began to drive the system, specifically 'depr', 'lonely' and 'sad'. In the original study, the bereavement score was most affected by 'lonely' and showed weak negative associations with 'happy' and 'effort' (fig. S18, adapted from [Fried2015]). Consequently, medium to high thermal noise seems most congruent with the original study. Fried et al. postulated that 'lonely' was the gateway through which information spreads through the network, i.e. bereavement was embodied mainly by 'loneliness', which then percolated its effect to the other symptoms. Since the data were cross-sectional, the comparison with the results from this study relies on the assumption that binary dynamics are representative of the absence and presence of psychological symptoms. If correct, the results from this study give a causal perspective on the associative results from [Fried2015]. The results from this study postulate that 'depr', 'lonely' and 'sad' have a similar causal effect for moderate to high thermal noise.
It is important to emphasize that the quantification is given in terms of absolute effect size, not directed effects. This means that nudging, for instance, 'sleep' has some effect on the psycho-symptom network, but the direction of that effect, i.e. whether it affects the bereavement score / cognitive load of the patient positively or negatively, is not clear and should be the subject of future studies.
As a final note, the field of psychometrics is concerned with how observables (e.g. behavior, responses to questionnaires, etc.) relate to theoretical cognitive constructs such as intelligence or mental disorders. A common approach to understanding high-level phenomena such as depression is to use a latent variable model, i.e. assuming some abstract feature to be the cause of the observables (or vice versa). Only recently has this paradigm shifted from a latent variable model to a network-based approach [Waldorp2011, Epskamp2018, Borsboom2011]. Marsman et al. recently reconciled these two approaches by showing statistical equivalence between the Ising model and latent variable models canonically used in psychometrics [Marsman2018]. The two approaches thus highlight different aspects of theory building; measurement invariance and correlation structure may be interesting from a common-cause approach but not from a network perspective, which is more interested in the dynamical aspects of the system. Both approaches, however, aid in highlighting different aspects of psychological constructs.
Appendix H Code Manual
Accompanying this paper, I developed a general framework for analyzing discrete systems using information impact. The code is written in Python 3.7.2 and uses Cython 0.28.2 for C/C++-level performance. The code is freely available at github.com/cvanelteren/information_impact, which includes the latest build instructions. What follows is a brief overview of the framework.
Design philosophy and overview
From the onset, the core idea was to develop a toolbox that would enable scientists to easily adapt the code to include their own models for specific needs. As such, Python's object-oriented tools were leveraged. The toolbox can be divided into three main components:

Information toolbox. Includes Monte Carlo methods for constructing state snapshots and computing the delayed Shannon mutual information. The functions utilize concurrent threading to capitalize on multi-core systems.

Models. The models module contains a 'Model' archetype on which all user-defined models must be based. As an example, some models are provided; fastIsing.pyx, for instance, was used in generating the results of this study.

Utils. In Utils, various functions can be found for extracting and loading data, statistical analysis, and visualization.
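The core quantity computed by the information toolbox is the delayed Shannon mutual information between a node's state and the system state some time later. A minimal sketch of how this can be estimated from sampled state pairs with a plug-in estimator is shown below; the function names are illustrative and not the toolbox's actual API:

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from observed (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)                 # empirical joint distribution
    px = Counter(x for x, _ in pairs)      # marginal of X
    py = Counter(y for _, y in pairs)      # marginal of Y
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def delayed_mi(node_states, system_states, delay):
    """Pair the node state at time t with the system state at time t + delay."""
    pairs = list(zip(node_states[:-delay] if delay else node_states,
                     system_states[delay:]))
    return mutual_information(pairs)

# Perfectly correlated binary states carry exactly 1 bit.
print(mutual_information([(0, 0), (1, 1)] * 50))  # → 1.0
```

In practice the toolbox estimates these distributions with Monte Carlo sampling over many chains, but the information-theoretic computation reduces to this form on the collected snapshot statistics.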
For the end-user, the most interesting part is the Models submodule. Users can provide their own models with limited base requirements to run and extract information impact temporal dynamics.
An additional design consideration was excellent performance through multi-core utilization. This was achieved by leveraging low-level threading support using OpenMP through Cython. Typed memory views are used to maintain a fast, high-level interface for easily defining matrices and vectors.
run.py can either be used to analyze Ising models or adapted to fit a model specified by the user. To reproduce the graphs in this paper, analyze.py can be run.
fastIsing.pyx
The Ising model class takes as input a graph structure, a temperature, and the unique agent states the nodes can assume. The graph structure can be any graph object provided by the package networkx. The class internally converts the graph structure into an efficient adjacency list to (a) prevent the memory load of a dense matrix, and (b) capitalize on the amortized lookup speed of C++'s unordered maps.
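The conversion described above can be sketched as follows. This is a plain-Python stand-in for the internal C++ unordered-map version, with a simple edge list playing the role of the networkx graph:

```python
# Edge list of a small weighted, undirected graph; illustrative data only.
edges = [("a", "b", 1.0), ("b", "c", 0.5), ("a", "c", 2.0)]

# Build an adjacency list: per node, only existing neighbors are stored,
# avoiding the O(N^2) memory footprint of a dense adjacency matrix. A dict
# gives amortized O(1) neighbor lookup, analogous to C++ unordered maps.
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w  # undirected: store both directions

print(adj["a"])  # → {'b': 1.0, 'c': 2.0}
```

For sparse networks this representation stores only the edges that exist, which is what makes per-update neighbor sums cheap in the Cython implementation.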
The function self.sampleNodes samples the node indices that are used in self.updateState. In general the Models archetype (parent class) allows for four different sampling methods:

Serial: samples nodes from the sorted node indices, similar to how CRT-TV scan lines work.

Single: each simulation step, a single node is sampled at random and considered for flipping.

Async: each simulation step, single-node updates are performed sequentially with mutual interaction, i.e. later updates see the effects of earlier ones.

Sync: the system state is frozen and each node is updated according to this frozen state.
For larger systems we recommend the async option for updating, as this reduces the amount of data that needs to be collected with the added benefit of an increased signal-to-noise ratio; single updates suffer from correlations that force one to sample for longer.
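The distinction between the sync and async schemes can be illustrated with a deliberately simple copy dynamic, where each node adopts its left neighbor's state on a ring. This is a toy stand-in, not the Glauber rule used in fastIsing.pyx:

```python
def sync_update(state):
    # Sync: every node reads the same frozen snapshot of the state.
    n = len(state)
    return [state[(i - 1) % n] for i in range(n)]

def async_update(state):
    # Async: updates apply immediately, so later nodes in the sweep
    # already see the effect of earlier updates (mutual interaction).
    state = list(state)
    n = len(state)
    for i in range(n):
        state[i] = state[(i - 1) % n]
    return state

state = [1, -1, -1, 1]
print(sync_update(state))   # → [1, 1, -1, -1]
print(async_update(state))  # → [1, 1, 1, 1]
```

The frozen snapshot merely shifts the pattern, whereas the in-place sweep lets the first node's state propagate through the whole ring within a single step; this within-sweep interaction is why async updates decorrelate faster.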
Additionally, the model allows the user to specify whether to measure only one side of the magnetization, i.e. to use out-of-equilibrium, short-timescale dynamics. The parameter magSide can be set to:

'pos', for only the positive side of the magnetization;

'neg', for only the negative side of the magnetization;

'' (empty string), in which case equilibrium dynamics are assumed.
In undirected network structures with unitary weights, constraining the magnetization side does not affect the stability of the system, but it is important to consider for weighted graphs. The magSide option computes the average magnetization at every update step and flips the entire system state if the magnetization is positive while magSide is 'neg', and vice versa for 'pos'.
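The magSide check can be sketched as follows; this is a minimal stand-in for the internal logic, with illustrative names:

```python
def constrain_side(state, mag_side):
    """Flip the whole system if its mean magnetization is on the wrong side."""
    mean_mag = sum(state) / len(state)
    if (mag_side == "neg" and mean_mag > 0) or \
       (mag_side == "pos" and mean_mag < 0):
        # A global spin flip preserves the Ising energy (no external field),
        # so it constrains the side without perturbing the dynamics.
        return [-s for s in state]
    return state  # '' or matching side: leave the state untouched

print(constrain_side([1, 1, -1], "neg"))  # → [-1, -1, 1]
print(constrain_side([1, 1, -1], "pos"))  # → [1, 1, -1]
```

Because the Ising Hamiltonian without an external field is symmetric under a global spin flip, this operation pins the trajectory to one arm of the magnetization curve without changing the energy landscape the system explores.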
Additionally, the C-extension class has wrappers for its internally defined C-level functions. This enables the class to be used as any normal Python class when testing.
Example
For the latest examples, go to github.com/cvanelteren/information_impact, which contains various Jupyter notebooks on different facets of the software.