A topological criterion for filtering information in complex brain networks
Abstract
In many biological systems, the network of interactions between the elements can only be inferred from experimental measurements. In neuroscience, noninvasive imaging tools are extensively used to derive either structural or functional brain networks invivo. As a result of the inference process, we obtain a matrix of values corresponding to an unrealistic fully connected and weighted network. To turn this into a useful sparse network, thresholding is typically adopted to cancel a percentage of the weakest connections. The structural properties of the resulting network depend on how much of the inferred connectivity is eventually retained. However, how to fix this threshold is still an open issue. We introduce a criterion, the efficiency cost optimization (ECO), to select a threshold based on the optimization of the tradeoff between the efficiency of a network and its wiring cost. We prove analytically and we confirm through numerical simulations that the connection density maximizing this tradeoff emphasizes the intrinsic properties of a given network, while preserving its sparsity. Moreover, this density threshold can be determined apriori, since the number of connections to filter only depends on the network size according to a powerlaw. We validate this result on several brain networks, from micro to macroscales, obtained with different imaging modalities. Finally, we test the potential of ECO in discriminating brain states with respect to alternative filtering methods. ECO advances our ability to analyze and compare biological networks, inferred from experimental data, in a fast and principled way.
* Institut du Cerveau et de la Moelle épiniere
47, Boulevard de l’Hopital
75013, Paris, France
Email. fabrizio.devicofallani@gmail.com
Introduction
Imaging connectomics uses neuroimaging techniques to map connections and/or interactions between different brain sites. Combined with tools from graph theory, imaging connectomics has considerably advanced our understanding of the brain structure and function from a system perspective [1].
Noninvasive neuroimaging is particularly attractive as it allows to map connectomes invivo and quantify network organizational mechanisms underlying behavior, cognition, development as well as disease [2, 3]. Magnetic resonance imaging (MRI) and electro/magnetoencephalography (E/MEG) are frequently used to derive macroscale connectomes whose nodes, or units, correspond to spatially remote brain sites [4]. Highfield MRI and genetically encoded calcium indicators are promising tools to image connectomes respectively at the scale of neuronal ensembles (mesoscale) and single neurons (microscale) [5]. Brain connectivity methods are typically used to estimate the links between the nodes. While anatomical connectivity (AC) measures the probability to find axonal pathways between brain areas, typically from diffusion MRI, functional connectivity (FC) rather calculates the temporal dependence between their neural processes as recorded, for instance, by functional MRI, EEG or MEG [1].
At this stage, the resulting connectomes correspond to maximally dense networks whose weighted links code for the strength of the connections between different brain nodes. Common courses in brain network analysis use thresholding procedures to filter information in these raw connectomes by retaining and binarizing a certain percentage of the strongest links (Supplementary Fig. LABEL:fig:SF1). Despite the consequent information loss, these procedures are often adopted to mitigate the incertainty carried by the weakest links and facilitate the interpretation of the network topology [6]. In addition, they enable to use all the graph theoretic tools, which are mainly conceived for sparse and binary networks [7]. At present, the choice of the specific value for such threshold remains arbitrary, so that scientists are obliged to explore brain network properties across a wide range of different candidates and eventually select one representative aposteriori [8]. These approaches are extremely timeconsuming for large connectomes and make difficult the comparison between many individuals or samples [9].
We introduce a criterion to select apriori an optimal threshold which captures the essential topology of a connectome while preserving its sparsity. Based on the perfect tradeoff between two desirable but incompatible features  namely high global and local integration between nodes, and low connection density  this method is inherently motivated by the principle of efficiency and economy observed in many complex systems [10], including the brain [11].
Results
Filtering information as a network optimization problem
Global and localefficiency [12] have revealed to be important graph quantities to characterize the structure of brain networks in terms of integration and segregation of information [13]. Both anatomical and functional brain networks tend to exhibit relatively high values of global and localefficiency. At the same time they also tend to minimize, for economical reasons, the number of their links leading to sparse networks [11].
Hence, we propose to determine a density threshold that filters out the weakest links and maximizes the ratio between the overall efficiency of a connectome and its wiring cost. We formally introduce a criterion to filter information in a given network by finding the optimal connection density that maximizes the quality function:
(1) 
where and represent respectively the global and localefficiency of a network (Online Method 1).
For both regular lattices and random networks, we proved analytically that the optimal density deriving from the maximization of reads , where is the network size, i.e., the number of nodes in the network (Supplementary Text). We confirmed this result (Supplementary Fig. LABEL:fig:SF2a,b) through extensive numerical simulations (Online Methods 2), showing that it held true also in more realistic network models, such as in smallworld networks [14] (Fig. 1a) and in scalefree networks [15] (Fig. 1b). Notably, the optimal density emphasized the intrinsic structural properties of all the implemented synthetic networks in terms of global and localefficiency (Fig. 1d,e and Supplementary Fig. LABEL:fig:SF2d,e).
Optimal density in connectomes derived from neuroimaging
We computed the quality function in both micro and macroscale brain networks and we evaluated how the optimal density scaled as a function of the network size. We considered connectomes used in previously published studies that were obtained with different imaging modalities, from calcium imaging to EEG, and constructed with different brain connectivity methods, from Pearsoncorrelation to Grangercausality (Table 1).
For each connectome we applied a standard densitybased thresholding. We started with the empty network by removing all the links (). Then, we reinserted and binarized one link at time, from the strongest to the weakest, until we obtained the maximally dense network (). At each step we computed and we recorded its profile as a function of (Fig. 1f). If a study presented more samples (individuals) with connectomes of the same size, we considered the groupaveraged profile to improve the quality of the estimation (Supplementary Fig. LABEL:fig:SF2c,f). The pooled density values, as returned by the maximization of , followed the same scaling relationship that we reported for synthetic networks (Fig. 1c). This result confirms that also for brain networks we can assume that the optimal density threshold depends on the network size according to the same rule .
In conclusion, we introduced a criterion, named efficiency cost optimization (ECO), to select a threshold leading to sparse, yet informative brain networks. Such a threshold does depend neither on how the connectome is constructed nor on its underlying structure, and can be therefore selected apriori.
ECO discriminated network properties of different brain states
To illustrate the methodology, we considered connectomes from four different imaging modalities, namely EEG, MEG, fMRI, and DTI (Table 1). Because we do not know the true structure for these connectomes, we evaluated the ability of ECO to discriminate network properties of different brain states, i.e., healthy versus diseased, at individual level.
We characterized brain networks by calculating graph quantities at different topological scales, i.e., large (global and localefficiency, and ), intermediate (community partition, ; and modularity, ), and small (node degree, ; and betwenness, ) (Online Method 3). To assess network differences between brain states, we measured distances between the values of the graph quantities obtained in the healthy group and those in the diseased group. We adopted the Mirkin index () to measure distances between community partitions, and the divergent coefficient () for other graph quantities (Online Method 4).
We explored a wide range of density thresholds and, as expected, the value of the threshold affected the ability to separate network properties of different brain states (Fig. 2, Supplementary Fig. LABEL:fig:SF3 and LABEL:fig:SF4a). Notably, the choice resulted among the best candidates in producing larger distances and improving discrimination. Furthermore, ECO overall outperformed alternative methods such as the minimum spanning tree (MST) and the planar maximally filtered graph (PMFG) [16] (Fig. 3, Supplementary Fig. LABEL:fig:SF4b, Supplementary Table LABEL:tab:ST2).
Discussion
We introduced ECO to filter information in imaging connectomes whose links are statistical or probabilistic measures of brain connectivity. Conventional graph approaches evaluate brain network properties across a large and arbitrary number of thresholds [17]. Eventually, they select a representative threshold aposteriori that maximizes the separation between different brain states [6]. ECO provides a theoretically grounded criterion to select an optimal threshold apriori, drastically reducing the computational burden. Other approaches, similar in purpose to ECO, impose unnatural constraints on the filtered connectome. The minimum spanning tree (MST), for instance, leads to brain networks with a null clustering coefficient [18]. The planar maximally filtered graph (PMFG) tries to alleviate this bias by allowing closed loops, but still forces planarity [16]. Conversely, ECO does not impose any constraint and lets the intrinsic network structure of a connectome to emerge.
In addition, ECO is based on fundamental principles of complex systems [14, 15]. Maximizing global and localefficiency with respect to connection density means emphasizing the integration and segregation properties of a connectome [19] while keeping a biologically plausible wiring cost. This rationale dovetails with current evidence showing that advantageous topological properties, such as economic smallworld architectures [11], tend to be maximized in brain networks, and that, in general, sparsity increases robustness of complex systems [20]. Other combinations could have been considered when conceiving the quality function . For example, in [21] authors introduced the costefficiency , which, however, did not include the clustering counterpart. That quality function, as well as other ones that we tested, did not exhibit an optimal density and was therefore not considered (Supplementary Text).
ECO makes use of density thresholds. Hence, filtered connectomes having same number of nodes, will have same number of links. On the one hand, this ensures that differences between brain network properties are not merely due to differences in the connection density [22]. On the other hand, ECO does not allow a direct evaluation of neural processes altering the number of links; yet it does inform on the consequent (re)organizational mechanisms. ECO is based on a graph topological criterion and cannot filter out possible false positives (i.e., spurious links) resulting from biased brain connectivity estimates [4, 6]. This method assumes that the weighted links of the raw connectomes had been previously statistically validated, either maintained or canceled. We conceived ECO to filter imaging connectomes with applications ranging from cognitive to clinical and computational neuroscience. Given its generality and simplicity, we anticipate that ECO will facilitate the analysis of interconnected systems where the need of sparsity is plausible and links are weighted estimates of connectivity. This is, for example, the case of functional networks in system biology, where links are derived from transcriptional or phenotypic profiling, and genetic interactions [23].
Online Methods
1. Topological properties of the quality function
The proposed quality function can be seen as a particular case of a general family of functions of the form . When combining different graph quantities, proper normalizing procedures need to be used [24]. By definition, each of the three quantities , and is normalized in the range , and and are nondecreasing functions of . However, since the efficiency is based on shortest paths between nodes [12], a concept which is not directly captured by the density, a scaling factor might be necessary to normalize changes among those quantities.
We therefore considered a more general form of than that in Eq. (1) by introducing two distinct dependencies on the connection density, i.e., and , where is a tunable control parameter. For values around , the parameter had no influence on the returned optimal density (Supplementary Text 3). Hence, for the sake of simplicity, we deliberately chose that corresponds to the original expression of the quality function in Eq. (1), i.e., .
When in Eq.(1), then both global and localefficiency are null leading to an indefinite form. As density slightly increases (, with sufficiently small) it can be demonstrated that tends to . In fact, in this range, the probability to find at least three nodes connected together (a triangle) is extremely low. By definition, in absence of at least one triangle [12] and therefore . By considering the definitions of and , this quantity can be rewritten as , where is the number of existing links and is the distance between the nodes and . In a generic network with links there are at least pairs of nodes directly connected (i.e., ). This means that the sum in the latter equation is bounded from below by in the case of isolated pairs of connected nodes () or in the trivial case of . It follows that when there are relatively few links in a network.
When tends to , it is trivial to see from equation (1) that , as both and tend to one. For intermediate density ranges () the analytical estimate of is not trivial since and depend on the network topology which is, in general, unknown.
2. Numerical simulations for smallworld and scalefree networks
Smallworld networks were generated according to the WattsStrogatz (WS) model [14] with a rewiring probability . Scalefree networks were generated according to the BarabasiAlbert (BA) model [15].
In the first simulation, we considered undirected networks. We varied both the network size and the average node degree, i.e., and . In the WS models, is even accounting for the number of both left and right neighbors of the nodes in the initial lattice. To obtain smallworld networks with odd, we first generated lattices with even and then, for each odd node (e.g., , , …), we removed the link with its left farthest neighbor. This procedure removes in total links leading to a new average node degree , while keeping a regular structure. As for BA models, we set the number of links in the preferential attachment and the initial seed was a fully connected network of nodes. This setting generated scalefree networks with , that is regardless of the selected network size. We then removed at random the exceeding number of links until we reached the desired value. This procedure had the advantage to preserve the original scalefree structure.
In the second simulation, we considered directed networks to confirm and extend the results we obtained for undirected WS and BA networks. We selected eight representative network sizes, i.e., covering the typical size of most current imaging connectomes, and we varied the connection density. Specifically, we performed a twostep procedure:

We fixed onehundred values quadratically spaced within the entire available density interval.

After having identified the optimal , we performed a refined research among onehundred new values linearly spaced between the density values, in step 1, before and after .
For WS models, initial lattices had equal to the nearest even integer equal or higher than , with . For BA models, the number of attaching links was to ensure an initial relatively high density; the seed was a fully connected network of nodes. By construction , where is total number of links in the initial seed. For both models, we then removed at random the exceeding links until we reached the desired density value. For both simulation we generated onehundred sample networks.
3. Graph analysis of brain networks
Complex networks can be analyzed by a plethora of graph quantities characterizing different topological properties [25]. Here, we considered a subset of representative ones which have been shown to be relevant for brain network analysis [26]. To characterize the entire brain network (i.e., largescale topology), we used global and localefficiency, which respectively read:
(2) 
where is the length of the shortest path between nodes and , and is the globalefficiency of the th subgraph of the network [12].
To characterize modules, or clusters, of brain regions (i.e., midscale topology), we extracted the community partition of the brain network by means of the Newman’s spectral algorithm maximizing the modularity:
(3) 
where G is the (nonsquare) matrix having elements if node belongs to cluster and zero otherwise, and M is the socalled modularity matrix [27].
To characterize individual brain areas (i.e., smallscale topology), we measured the centrality of the nodes in the brain network by means of the node degree and of the node betwenness, which respectively read:
(4) 
where the element of the adjacency matrix if there is a link between node and , zero otherwise; and where is the total number of shortest paths between nodes and , while is the number of those paths that pass through .
4. Distances between samples and statistical analysis
To assess brain network differences between individuals (or samples) in the two groups, we measured the distance between the respective values obtained for each graph quantity. We used the Mirkin index to compute distances between two network partitions and :
(5) 
where is the number of pairs of nodes that are in different clusters under but not in the same one under ; and is the number of pairs that are in the same cluster under but in different ones under [28]. For all other graph quantities, we used the divergent coefficient [29]:
(6) 
where and , contain the value(s) of the graph quantity for the th and th sample. Notably, for global, localefficiency and modularity (i.e., , , ). for the node degree vector and the node betweenness vector .
We used Kruskal–Wallis oneway analysis of variance to evaluate the overall effect of different thresholds, or filtering methods (i.e., MST, PMFG) on distances between individuals. A TukeyKramer multiple comparison test was then used to determine specific differences between pairs of thresholds or methods [30].
Acknowledgements
We thank all the colleagues who shared their databases (see Table 1) allowing to validate the proposed method. We are grateful to Sophie Achard, Vincenzo Nicosia, Jonas Richiardi, and Miguel Valencia for their useful comments and suggestions. V.L. and M.C. acknowledge support by the European Commission Project LASAGNE Grant 318132; V.L. acknowledges support from EPSRC project GALE Grant EP/K020633/1; F.D. and M.C. acknowledge support by French program ”Investissements d’avenir” ANR10IAIHU06; F.D. acknowledges support by the project NETBCI Grant ANR15NEUC000602.
Author contributions statement
F.D. conceived the study and performed data analysis; V.L. and M.C. contributed to the analytical proofs and simulations. All authors wrote and reviewed the manuscript.
Competing financial interests
The authors declare no competing financial interest.
Materials and correspondence
Specific inquiries on experimental data can be addressed to the corresponding authors listed in Table 1.
References
 1. Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience 10, 186–198 (2009).
 2. Park, H.J. & Friston, K. Structural and Functional Brain Networks: From Connections to Cognition. Science 342, 1238411 (2013).
 3. Stam, C. J. Modern network science of neurological disorders. Nature Reviews. Neuroscience 15, 683–695 (2014).
 4. Craddock, R. C. et al. Imaging human connectomes at the macroscale. Nature Methods 524 (2013).
 5. Alivisatos, A. P. et al. The Brain Activity Map Project and the Challenge of Functional Connectomics. Neuron 74, 970–974 (2012).
 6. De Vico Fallani, F., Richiardi, J., Chavez, M. & Achard, S. Graph analysis of functional brain networks: practical issues in translational neuroscience. Philosophical Transactions of the Royal Society B: Biological Sciences 369, 20130521 (2014).
 7. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D.U. Complex networks: Structure and dynamics. Physics Reports 424, 175–308 (2006).
 8. Garrison, K. A., Scheinost, D., Finn, E. S., Shen, X. & Constable, R. T. The (in)stability of functional brain network measures across thresholds. NeuroImage 118, 651–661 (2015).
 9. Sporns, O. Making sense of brain network data. Nature Methods 10, 491–493 (2013).
 10. Latora, V. & Marchiori, M. Economic smallworld behavior in weighted networks. The European Physical Journal B  Condensed Matter 32, 249–263 (2003).
 11. Bullmore, E. & Sporns, O. The economy of brain network organization. Nature reviews. Neuroscience 13, 336–349 (2012).
 12. Latora, V. & Marchiori, M. Efficient behavior of smallworld networks. Physical Review Letters 87, 198701/1–198701/4 (2001).
 13. Bassett, D. S. & Bullmore, E. Smallworld brain networks. The Neuroscientist 12, 512–523 (2006).
 14. Watts, D. & Strogatz, S. Collective dynamics of ’smallworld’ networks. Nature 393, 440–442 (1998).
 15. Barabási, A.L. & Albert, R. Emergence of Scaling in Random Networks. Science 286, 509–512 (1999).
 16. Tumminello, M., Aste, T., Matteo, T. D. & Mantegna, R. N. A tool for filtering information in complex systems. Proceedings of the National Academy of Sciences 102, 10421–10426 (2005).
 17. Fornito, A., Zalesky, A. & Breakspear, M. Graph analysis of the human connectome: Promise, progress, and pitfalls. NeuroImage 80, 426–444 (2013).
 18. Tewarie, P., van Dellen, E., Hillebrand, A. & Stam, C. J. The minimum spanning tree: An unbiased method for brain network analysis. NeuroImage 104, 177–188 (2015).
 19. Tononi, G., Sporns, O. & Edelman, G. A measure for brain complexity: Relating functional segregation and integration in the nervous system. Proceedings of the National Academy of Sciences 91, 5033–5037 (1994).
 20. Leclerc, R. D. Survival of the sparsest: robust gene networks are parsimonious. Molecular Systems Biology 4, 213 (2008).
 21. Achard, S., Salvador, R., Whitcher, B., Suckling, J. & Bullmore, E. A resilient, lowfrequency, smallworld human brain functional network with highly connected association cortical hubs. Journal of Neuroscience 26, 63–72 (2006).
 22. van Wijk, B. C. M., Stam, C. J. & Daffertshofer, A. Comparing Brain Networks of Different Size and Connectivity Density Using Graph Theory. PLoS ONE 5, e13701 (2010).
 23. Vidal, M., Cusick, M. & Barabási, A.L. Interactome Networks and Human Disease. Cell 144, 986–998 (2011).
 24. Souza, F. S. H., Mateus, G. R. & Cunha, A. S. d. Optimization in Designing Complex Communication Networks. In Thai, M. T. & Pardalos, P. M. (eds.) Handbook of Optimization in Complex Networks, no. 57 in Springer Optimization and Its Applications, 3–37 (Springer US, 2012).
 25. Costa, L. et al. Analyzing and modeling realworld phenomena with complex networks: A survey of applications. Advances in Physics 60, 329–412 (2011).
 26. Rubinov, M. & Sporns, O. Complex network measures of brain connectivity: Uses and interpretations. NeuroImage 52, 1059–1069 (2010).
 27. Newman, M. E. J. Modularity and community structure in networks. Proceedings of the National Academy of Sciences 103, 8577–8582 (2006).
 28. Meilă, M. Comparing clusterings—an information based distance. Journal of Multivariate Analysis 98, 873–895 (2007).
 29. Cox, M. A. A. & Cox, T. F. Multidimensional Scaling. In Handbook of Data Visualization, Springer Handbooks Comp.Statistics, 315–347 (Springer Berlin Heidelberg, 2008).
 30. McDonald, J. Handbook of Biological Statistics (Sparky House Publishing, 2009).
 31. De Vico Fallani, F., Corazzol, M., M., Sternberg, J., Wyart, C. & Chavez, M. Hierarchy of Neural Organization in the Embryonic Spinal Cord: GrangerCausality Graph Analysis of Calcium Imaging Data. IEEE Transactions on Neural Systems and Rehabilitation Engineering PP, 1–1 (2014).
 32. Plomp, G., Quairiaux, C., Michel, C. M. & Astolfi, L. The physiological plausibility of timevarying Grangercausal modeling: Normalization and weighting by spectral power. NeuroImage 97, 206–216 (2014).
 33. Teller, S. et al. Emergence of Assortative Mixing between Clusters of Cultured Neurons. PLoS Computational Biology 10, e1003796 (2014).
 34. Niu, H. et al. TestRetest Reliability of Graph Metrics in Functional Brain Networks: A RestingState fNIRS Study. PLoS ONE 8, e72425 (2013).
 35. O’Reilly, J. X. et al. Causal effect of disconnection lesions on interhemispheric functional connectivity in rhesus monkeys. Proceedings of the National Academy of Sciences 110, 13982–13987 (2013).
 36. De Vico Fallani, F. et al. Multiscale topological properties of functional brain networks during motor imagery after stroke. NeuroImage 83, 438–449 (2013).
 37. Achard, S. et al. Hubs of brain functional networks are radically reorganized in comatose patients. Proceedings of the National Academy of Sciences 109, 20608–20613 (2012).
 38. Chavez, M., Valencia, M., Navarro, V., Latora, V. & Martinerie, J. Functional Modularity of Background Activities in Normal and Epileptic Brain Networks. Physical Review Letters 104, 118701 (2010).
 39. Besson, P. et al. Structural connectivity differences in left and right temporal lobe epilepsy. NeuroImage 100, 135–144 (2014).
 40. De Vico Fallani, F. et al. Community structure in largescale cortical networks during motor acts. Chaos, Solitons & Fractals 45, 603–610 (2012).
Figures
Tables
Imaging modality  Group(s)  Species  Samples x Group  Condition  Nodes  Connectivity method  Domain  Links 
[31] Ca^{+}  Healthy  Zebrafish  5  Spontaneuous  [9,21]  Granger causality  Time  Directed 
[32] eEEG  Healthy  Rodent  1  Evoked potential  15  Partial directed coherence  Time/Freq. (8 ms/1429 Hz)  Directed 
[33] Ca^{+}    Culture  2  Spontaneous  [19,32]  Time delay  Time  Directed 
[34] fNIRS  Healthy  Human  2  Resting state  46  Pearson’s correlation  Time  Undirected 
[35] fMRI  Healthy  Primate  3  Resting state  56  Pearson’s correlation  Time  Undirected 
[36] EEG  Healthy, Stroke  Human  20  Motor imagery  61  Imaginary coherence  Frequency (1429 Hz)  Undirected 
[37] fMRI  Healthy, Coma  Human  17  Resting state  90  Wavelet correlation  Time  Undirected 
[38] MEG  Healthy, Epilepsy  Human  5  Resting state  149  Spectral coherence  Frequency (514 Hz)  Undirected 
[39] DTI  Healthy, Epilepsy  Human  19    164  Fractional anisotropy    Undirected 
[37] fMRI  Healthy, Coma  Human  17  Resting state  417  Wavelet correlation  Time  Undirected 
[33] Ca^{+}    Culture  6  Spontaneous  [562,1107]  Time delay  Time  Directed 
[40] EEG  Healthy  Human  5  Motor execution  4094  Imaginary Coherence  Frequency (1330 Hz)  Undirected 