# Connecting Network Science and Information Theory

###### Abstract

A framework integrating information theory and network science is proposed, giving rise to a potentially new area. By incorporating and integrating concepts such as complexity, coding, topological projections and network dynamics, the proposed network-based framework paves the way not only for extending traditional information science, but also for modeling, characterizing and analyzing a broad class of real-world problems, from language communication to DNA coding. In this framework, an original network is transmitted, with or without compaction, as a sequence of symbols or time series obtained by sampling its topology through some network dynamics, such as random walks. We show that the degree of compression is ultimately related to the ability to predict the frequency of symbols from the topology of the original network and the adopted dynamics. The potential of the proposed approach is illustrated with respect to the efficiency of transmitting several types of topologies using a variety of random walks. Several interesting results are obtained, including the behavior of the Barabási-Albert model, which oscillates between high and low performance depending on the considered dynamics, and the distinct performances obtained for two geographical models.


## I Introduction

A great deal of effort in science and technology has been devoted to the study of information theory Cover and Thomas (2006) and network science Newman (2010); Barabási (2016), two seemingly independent realms. In information theory, probabilities are assigned to symbols and used to derive important results, such as minimum bandwidth and minimal sampling rates. In network science Newman (2010), on the other hand, the focus is on understanding the intricate topology of complex networks and its relationship with various types of dynamics. Interestingly, these two perspectives, broadly related to time-series compaction and to the study of topological and dynamical complexity, can be shown to be ultimately intertwined and complementary. For instance, information theory has been used to define causal relationships between nodes Borge-Holthoefer et al. (2016); Sun et al. (2015), characterize networks according to their compressibility Ahnert (2014), define topological similarity De Domenico and Biamonte (2016), map time series to networks Lacasa et al. (2014), quantify the diversity of ecological networks Ulanowicz (2011), and characterize network dynamics Andjelković et al. (2015).

The main purpose of the current article is a systematic integration of information theory and network science that provides a unified scientific approach. The basic idea is to regard a sequence of symbols or a time series as a projection of an original network, obtained for example through some sampling dynamics (such as random walks); this sequence is transmitted, and the network is then reconstructed at the receiver with some accuracy. This basic framework is illustrated in Fig. 1.

Underlying such an approach is the hypothesis that every time series or sequence of symbols is produced by some discrete system that can be represented as a complex network. Consequently, the generated series and sequences inherit, to a great extent, the properties of the generating networks. The second underlying hypothesis of the present work is that interaction between such complex network systems has to proceed through communication channels, which necessarily have limited bandwidth and/or are noisy. It therefore becomes important to devise methods for more effective and robust transmission of the networks, such as compression, which is the main original motivation behind information theory. At the same time, these series of symbols are a byproduct of the interaction between the topology and the dynamics unfolding in the original network. Therefore, the proposed approach shifts the design of effective communication methodologies from the level of the time series alone to the level of the network.

The potential of such an integration is broad, as several important problems can be naturally conceptualized and represented within the proposed framework. Possible applications include, but are not limited to: the transformation of thoughts into language (encompassing the whole of literature in the process), the creation and reception of artistic pieces such as music, routing in transportation and computing systems, optimal data allocation in distributed computing, teaching and the planning of syllabuses, economic and financial indices, the codification of proteins and genes into linear chains, and even WWW surfing and the flow of consciousness. An interesting aspect shared by all these cases is that the linearization of a higher-dimensional structure (a network) derives from imposed constraints such as finite-bandwidth channels, storage systems, etc. Intermediate representations with dimension higher than one (i.e., beyond time series) are also possible and are naturally incorporated into the proposed theory. In addition, the efficiency of coding by projections and of the respective transmission is related to the complexity of the original network, which highlights another critical issue shared by information theory and network science. Although we have so far been restricted to point-to-point communication, the proposed framework also extends naturally to larger systems involving many such pairwise interactions, as illustrated in Fig. 2. Several real-world systems, such as opinion spreading and the evolution of scientific ideas, can be represented and studied within such a framework. In such cases, each of the large nodes in Fig. 2 would correspond to an agent transmitting its beliefs, originally represented by the network inside the node, through time series along a network with a particular topology.

The transmission can be optimized by compacting the time series. In information theory, this is typically achieved by using the frequency of symbols to derive optimal code words. For instance, Huffman coding Cormen et al. (2001) provides lossless, optimal symbol-by-symbol coding of a time series, given the probability of each symbol. In the proposed framework, optimization requires accurate prediction of the symbol frequencies from the topology of the network and the probing dynamics, instead of from the sequence of symbols alone. For instance, when symbols are taken individually from a connected undirected graph by a traditional random walk, it is known that the frequency of each symbol converges to a value proportional to the respective node degree, so that it can be perfectly predicted from the degrees in the long run Costa et al. (2007). The precision of such a prediction can be expressed in terms of the Pearson correlation coefficient between the frequency of visits and the degree (or another topological property). Because it characterizes how much the dynamics is affected by the topological features of the network, this measurement is henceforth called the steering coefficient of the topology over the dynamics. All in all, the efficiency of coding for a given network and dynamics will probably depend on the value of the steering coefficient, which can vary widely among network topologies and dynamics.
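As a minimal sketch of how the steering coefficient could be estimated, the snippet below runs a traditional random walk on a small toy graph and computes the Pearson correlation between each node's visit frequency and its degree. The graph, walk length, seed and function names are illustrative assumptions, not part of the original study.

```python
import random
from collections import Counter


def random_walk(adj, start, steps, rng):
    """Traditional random walk: the next node is chosen uniformly among neighbours."""
    node = start
    seq = [node]
    for _ in range(steps):
        node = rng.choice(adj[node])
        seq.append(node)
    return seq


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def steering_coefficient(adj, seq):
    """Correlation between the visit frequency and the degree of every node."""
    counts = Counter(seq)
    nodes = sorted(adj)
    freq = [counts[v] / len(seq) for v in nodes]
    deg = [len(adj[v]) for v in nodes]
    return pearson(freq, deg)


# Hypothetical toy undirected graph given as adjacency lists.
adj = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
    4: [0],
}
rng = random.Random(42)
seq = random_walk(adj, start=0, steps=100_000, rng=rng)
print(round(steering_coefficient(adj, seq), 3))
```

Replacing `random_walk` with a degree-biased or self-avoiding walk gives the corresponding steering coefficient for those dynamics.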

The present work illustrates and explores the potential of the proposed framework with respect to synthetic networks, which allows the consideration of several topologies and different sampling dynamics. This procedure allows us to investigate questions such as: (i) how does the ability to predict the frequency of symbols impact the transmission? (ii) how do the topological features of different graph models affect the performance? (iii) how do the considered types of dynamics compare regarding the time of exploration? These questions are tackled by taking as objective the recovery of the original graph up to a given accuracy.

## II Methodology

### Adopted dynamics

In order to investigate the proposed framework, we adopted four random walk dynamics and used them to generate respective sequences of symbols. The considered dynamics are: the random walk (RW) Lovász (1993), a variation in which the transition probabilities are biased toward nodes with higher degree (RWD) Bonaventura et al. (2014), another variation in which the inverse of the node degree is considered (RWID) Bonaventura et al. (2014), and the true self-avoiding random walk (TSAW) Amit et al. (1983); Kim et al. (2016).

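In a traditional RW dynamics, the next node to be visited by the agent is selected uniformly among the neighbors of the current node. In a degree-biased random walk, the probability that the agent goes from node $i$ to node $j$ depends on the degree of each neighboring node. Here we consider a dependence of the form

$$P_{i \to j} = \frac{k_j^{\alpha}}{\sum_{l \in \mathcal{N}(i)} k_l^{\alpha}}, \qquad j \in \mathcal{N}(i), \tag{1}$$

where $k_j$ is the degree of node $j$ and $\mathcal{N}(i)$ is the set of nodes connected to node $i$. When $\alpha = 1$ we have the RWD dynamics, while $\alpha = -1$ results in the RWID case; $\alpha = 0$ recovers the traditional RW. An interesting property of the RW dynamics is that, on connected undirected networks, as the generated sequence grows, the frequencies of visits to nodes, as inferred from the number of times each node has been visited so far, become directly proportional to the node degrees Costa et al. (2007). For the degree-biased case, the steady-state probabilities can be written as Gómez-Gardeñes and Latora (2008)

$$\pi_i = \frac{c_i\, k_i^{\alpha}}{\sum_{j} c_j\, k_j^{\alpha}}, \qquad c_i = \sum_{l \in \mathcal{N}(i)} k_l^{\alpha}. \tag{2}$$

If the degrees of the neighbors of node $i$ can be approximated by the average degree $\langle k \rangle$ of the network (which happens, for instance, for narrow degree distributions), then $c_i \approx k_i \langle k \rangle^{\alpha}$ and the numerator of Equation 2 can be written as $k_i^{\alpha+1} \langle k \rangle^{\alpha}$, leading to

$$\pi_i \approx \frac{k_i^{\alpha+1}}{\sum_{j} k_j^{\alpha+1}}. \tag{3}$$

This means that the probabilities become related to the degree raised to the power $\alpha + 1$. In particular, for the RWID ($\alpha = -1$) they become approximately uniform and thus independent of the degree.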
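The relations above can be checked numerically. The sketch below assumes that Eq. (1) has the form $P_{i\to j} = k_j^{\alpha}/\sum_{l\in\mathcal{N}(i)} k_l^{\alpha}$, with $k$ the degree and $\mathcal{N}(i)$ the neighbors of $i$, and that Eq. (2) reads $\pi_i \propto c_i k_i^{\alpha}$ with $c_i = \sum_{l\in\mathcal{N}(i)} k_l^{\alpha}$. It compares this closed form and the mean-field approximation $\pi_i \approx k_i^{\alpha+1}/\sum_j k_j^{\alpha+1}$ with a power iteration of the transition matrix on a hypothetical toy graph.

```python
# Hypothetical toy undirected graph; adjacency lists keyed by node label.
ADJ = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
    4: [0],
}


def transition_probs(adj, i, alpha):
    """Degree-biased step from node i: P(i -> j) proportional to k_j**alpha."""
    weights = {j: len(adj[j]) ** alpha for j in adj[i]}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def stationary_closed_form(adj, alpha):
    """pi_i proportional to c_i * k_i**alpha, with c_i = sum of k_l**alpha over neighbours."""
    score = {}
    for i in adj:
        c_i = sum(len(adj[l]) ** alpha for l in adj[i])
        score[i] = c_i * len(adj[i]) ** alpha
    z = sum(score.values())
    return {i: s / z for i, s in score.items()}


def stationary_mean_field(adj, alpha):
    """Mean-field approximation: pi_i proportional to k_i**(alpha + 1)."""
    score = {i: len(adj[i]) ** (alpha + 1) for i in adj}
    z = sum(score.values())
    return {i: s / z for i, s in score.items()}


def stationary_power_iteration(adj, alpha, iters=2000):
    """Iterate pi <- pi P starting from the uniform distribution."""
    pi = {i: 1.0 / len(adj) for i in adj}
    for _ in range(iters):
        new = dict.fromkeys(adj, 0.0)
        for i in adj:
            for j, p in transition_probs(adj, i, alpha).items():
                new[j] += pi[i] * p
        pi = new
    return pi


for alpha in (1, 0, -1):  # RWD, RW, RWID
    exact = stationary_closed_form(ADJ, alpha)
    approx = stationary_mean_field(ADJ, alpha)
    print(alpha, [round(exact[i], 3) for i in ADJ], [round(approx[i], 3) for i in ADJ])
```

On this small graph the mean-field values differ from the closed form because neighboring degrees are far from uniform; on a regular graph the two coincide.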

In the TSAW, the memory of the path already taken by the agent is kept and considered when determining its next step. Here, we opted to consider the frequency of visits to edges rather than to nodes Kim et al. (2016), so that edges already traversed many times by the agent are avoided. Thus, the probability that the agent, currently at node $i$, moves through the edge $(i, j)$ in its immediate neighborhood is

$$P_{i \to j} = \frac{e^{-\beta n_{ij}}}{\sum_{l \in \mathcal{N}(i)} e^{-\beta n_{il}}}, \tag{4}$$

where $\beta$ is a parameter of the dynamics and $n_{ij}$ is the frequency of visits to edge $(i, j)$. Note that self-avoiding behaviour is achieved for $\beta > 0$. Because unvisited connections are prioritized, the TSAW is usually expected to cover a network faster than the RW Kim et al. (2016). For a large number of iterations, the dynamics tends to behave like a diffusion, i.e. similarly to the RW dynamics Amit et al. (1983). In the analysis we set $\beta =$ .
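A minimal sketch of the edge-based TSAW step follows. It assumes that Eq. (4) takes the exponential form $P_{i\to j} \propto e^{-\beta n_{ij}}$ and that, for undirected networks, traversals of an edge in either direction add to the same counter $n_{ij}$; both are assumptions of this sketch. Subtracting the smallest count before exponentiating leaves the probabilities unchanged but avoids numerical underflow when $\beta n_{ij}$ becomes large. The function name and the ring example are hypothetical.

```python
import math
import random


def tsaw(adj, start, steps, beta, rng):
    """Edge-based true self-avoiding walk on an undirected graph.

    From node i, the walker crosses edge {i, j} with probability
    proportional to exp(-beta * n_ij), where n_ij counts previous
    traversals of that edge in either direction.
    """
    counts = {}  # frozenset({i, j}) -> number of traversals so far
    node = start
    seq = [node]
    for _ in range(steps):
        nbrs = adj[node]
        n = [counts.get(frozenset((node, j)), 0) for j in nbrs]
        n_min = min(n)  # shift counts: same probabilities, no underflow
        weights = [math.exp(-beta * (c - n_min)) for c in n]
        nxt = rng.choices(nbrs, weights=weights, k=1)[0]
        edge = frozenset((node, nxt))
        counts[edge] = counts.get(edge, 0) + 1
        node = nxt
        seq.append(node)
    return seq


# Hypothetical example: a ring of 10 nodes and a strongly self-avoiding walker.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
seq = tsaw(ring, start=0, steps=10, beta=50.0, rng=random.Random(1))
covered = {frozenset(e) for e in zip(seq, seq[1:])}
print(len(covered))
```

With such a large $\beta$ the walker essentially never backtracks on the ring, so all ten edges are covered in ten steps, whereas a traditional RW would typically need many more.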

### Complex network models

Six network models were used to investigate the proposed framework, namely the Erdős-Rényi (ER) Erdős and Rényi (1960), Barabási-Albert (BA) Barabási and Albert (1999), Watts-Strogatz (WS) Watts and Strogatz (1998), Waxman (WAX) Waxman (1988), random geometric (GEO) Dall and Christensen (2002) and Knitted (KN) Costa (2007) models. Given its nature, the latter model is used only in experiments involving directed networks. The ER model generates small-world networks with a binomial degree distribution Newman (2010), which means that most nodes in the network have similar degrees. In contrast, networks generated by the BA model have a power-law degree distribution Barabási and Albert (1999), implying that a few nodes, called hubs Barabási (2016), have large degrees while most of the nodes have low degrees. The WS model can be used to generate networks having the small-world property while also possessing large clustering coefficient values Watts and Strogatz (1998). We adopt a variation of this model in which, instead of a ring, one starts with a lattice network whose edges are rewired with probability $p$. We consider two rewiring probability values for the WS model: (WS1) and (WS2). In the GEO model, nodes are placed uniformly at random in a two-dimensional space, and pairs of nodes are connected if their distance is smaller than a given value. The WAX model begins with the same node placement procedure as the GEO model, but pairs of nodes are connected with a probability that decays exponentially with the distance between them. Networks generated by the GEO and WAX models tend to have large diameters. KN networks are formed by threading paths through the nodes. Initially, a set of unconnected nodes is created. Then, a sequence of distinct nodes is randomly selected and visited until a stopping criterion is reached, and adjacent nodes in the sequence are connected through respective directed links. The process can be repeated many times until a desired average degree is reached. KN networks are peculiar among the considered models because they can be understood as being generated by a random walk. Such networks can be used for modeling co-occurrence networks in texts and other real-world situations involving sequential, uninterrupted visits to nodes Cohen et al. (2005); Liu and Cong (2013); Amancio (2015); Amancio et al. (2012).
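The KN construction described above can be sketched as follows. The description leaves the stopping criterion and the exact notion of average degree open, so this sketch assumes (hypothetically) that every path contains a fixed number `path_len` of distinct nodes and that paths are added until the mean out-degree, i.e. the number of directed edges divided by the number of nodes, reaches `target_avg_degree`.

```python
import random


def knitted_network(n, target_avg_degree, path_len, rng):
    """Sketch of a knitted (KN) network built by threading random paths.

    Hypothetical choices: a fixed path length as the stopping criterion and
    the mean out-degree (edges / n) as the notion of average degree.
    """
    if not 2 <= path_len <= n:
        raise ValueError("path_len must be between 2 and n")
    if target_avg_degree >= n - 1:
        raise ValueError("target degree not reachable without self-loops")
    edges = set()  # directed edges (u, v); repeated links are merged
    while len(edges) < target_avg_degree * n:
        path = rng.sample(range(n), path_len)  # distinct nodes in random order
        for u, v in zip(path, path[1:]):
            edges.add((u, v))  # link consecutive nodes of the path
    return edges


edges = knitted_network(n=100, target_avg_degree=4, path_len=20, rng=random.Random(0))
out_deg = [0] * 100
in_deg = [0] * 100
for u, v in edges:
    out_deg[u] += 1
    in_deg[v] += 1
balanced = sum(1 for a, b in zip(in_deg, out_deg) if a == b)
print(len(edges), balanced)
```

Interior nodes of each path gain one incoming and one outgoing link, which is why in- and out-degrees tend to be close in such networks; path endpoints and merged duplicate links introduce small imbalances.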

The network models presented above, with the exception of the KN model, produce undirected networks. In this study we also considered directed networks, obtained by assigning directions to the edges of the original undirected networks. First, we defined a parameter $r$, called reciprocity, which is the probability of an edge having both directions (i.e., being reciprocal). For each original edge, a random number $x$ in the range $[0, 1]$ was generated with uniform probability. If $x < r$, the original edge was split into two edges (in and out); otherwise, a single direction was randomly selected with equal probability. In addition, only the largest strongly connected component of the network was considered, so as to prevent the random walker from becoming trapped. In order to keep the original size of the network as much as possible while incorporating a considerable degree of directionality, we set $r =$ .
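The direction-assignment step and the extraction of the largest strongly connected component can be sketched as below. The helper names are hypothetical; components are found with a simple forward/backward reachability test, which is quadratic in the worst case and only illustrates the procedure rather than aiming at efficiency.

```python
import random


def to_directed(edges, r, rng):
    """Keep both directions with probability r; otherwise pick one direction at random."""
    directed = set()
    for u, v in edges:
        if rng.random() < r:
            directed.add((u, v))
            directed.add((v, u))
        elif rng.random() < 0.5:
            directed.add((u, v))
        else:
            directed.add((v, u))
    return directed


def _reach(start, nbrs):
    """All nodes reachable from start following the given adjacency (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in nbrs.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def largest_scc(nodes, directed):
    """Largest strongly connected component: forward reach ∩ backward reach."""
    out, inn = {}, {}
    for u, v in directed:
        out.setdefault(u, []).append(v)
        inn.setdefault(v, []).append(u)
    remaining = set(nodes)
    best = set()
    while remaining:
        s = next(iter(remaining))
        comp = _reach(s, out) & _reach(s, inn)  # the SCC containing s
        remaining -= comp
        if len(comp) > len(best):
            best = comp
    return best


rng = random.Random(0)
undirected = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
directed = to_directed(undirected, r=0.5, rng=rng)
core = largest_scc(range(5), directed)
print(sorted(directed), sorted(core))
```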

### Huffman algorithm

In digital media, a message can be encoded as a sequence of symbols, each stored using a fixed number of bits. For example, texts are formed of symbols that can be represented as fixed-length characters (e.g., 8 bits each). In order to store or transmit messages effectively, several lossless compression algorithms have been proposed in the literature Salomon and Motta (2009).

The Huffman code is a particular data compression algorithm based on information theory Cover and Thomas (2006). This method generates a dictionary of bit sequences used to represent each symbol in a message. Compression is achieved by associating shorter bit sequences with more frequent symbols and longer sequences with symbols that appear more rarely in the message. To do so, the Huffman algorithm builds a binary tree whose leaves represent the symbols: starting from one leaf per symbol, the two nodes with the smallest frequencies are repeatedly merged into a new node whose frequency is their sum, until a single root remains. Every edge of the tree is then associated with a bit; typically, left and right children are associated with the bits 0 and 1, respectively. The code of each symbol is obtained by concatenating the bits along the path from the root to the respective leaf. Because no code is a prefix of another, the resulting structure can be used as a dictionary both to encode the original message and to decode it unambiguously.
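A compact sketch of the procedure, using Python's standard `heapq` module as the priority queue, is shown below. It is the textbook Huffman construction, not necessarily the exact implementation used in the experiments; the function names and the example message are illustrative.

```python
import heapq
import itertools


def huffman_code(freqs):
    """Return a prefix-free Huffman code {symbol: bit string} for {symbol: frequency}."""
    tie = itertools.count()  # unique tie-breaker so heapq never compares subtrees
    # Leaves are 1-tuples (symbol,); internal nodes are 2-tuples (left, right).
    heap = [(f, next(tie), (sym,)) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a one-bit code
        return {heap[0][2][0]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)  # the two least frequent subtrees...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))  # ...are merged
    codes = {}

    def assign(node, prefix):
        if len(node) == 2:  # internal node: 0 for the left child, 1 for the right
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:  # leaf: the path from the root is the code word
            codes[node[0]] = prefix

    assign(heap[0][2], "")
    return codes


def encode(message, codes):
    return "".join(codes[s] for s in message)


def decode(bits, codes):
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free, so the first match is a whole code word
            out.append(inverse[current])
            current = ""
    return out


message = list("abracadabra")
codes = huffman_code({s: message.count(s) for s in set(message)})
bits = encode(message, codes)
print(codes, len(bits), "bits vs", 3 * len(message), "with a fixed 3-bit code")
```

Ties between equal frequencies may produce different code words from run to run, but the total encoded length is the same for every valid Huffman tree.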

### Network reconstruction

As the generated time series reaches the receiver, it is used to progressively reconstruct the original network being transmitted. This can be easily achieved by starting with a set of isolated nodes (the symbols) and adding to the network each newly received edge, defined by a pair of consecutive symbols in the time series. We expect the reconstruction to depend on: i) the length of the time series; ii) the considered dynamics; iii) the network topology. For the considered random walks, which eventually traverse every edge of a (strongly) connected network, perfect reconstruction is expected after a sufficiently long time.
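The reconstruction rule can be sketched as a function that reads consecutive symbols as edges and reports how many symbols must be received before a given fraction of the original edges is recovered. The function name and the `directed` switch are hypothetical conveniences of this sketch.

```python
def time_to_recover(sequence, true_edges, fraction=0.9, directed=False):
    """Symbols that must be received before `fraction` of the edges is recovered.

    Consecutive symbols (u, v) are read as an edge; returns None if the
    sequence is too short to reach the requested fraction.
    """
    def key(u, v):
        return (u, v) if directed else frozenset((u, v))

    target = {key(u, v) for u, v in true_edges}
    needed = fraction * len(target)
    recovered = set()
    for t in range(1, len(sequence)):
        edge = key(sequence[t - 1], sequence[t])
        if edge in target:
            recovered.add(edge)
        if len(recovered) >= needed:
            return t + 1  # sequence[0..t] has been received
    return None


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(time_to_recover([0, 1, 2, 3, 0], square))       # -> 5 (all 4 edges needed for 90%)
print(time_to_recover([0, 1, 2, 3, 0], square, 0.5))  # -> 3
```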

## III Results and Discussion

The main question investigated experimentally is how effectively each of the network models can be recovered from a respectively generated time series, with or without compression. For every type of network, each of the considered random walks is applied in order to probe the respective topology by sequentially visiting its nodes. A time series is generated in the process, from which, at each time instant, a reconstruction of the original network is obtained by considering the edges and nodes already transmitted. The efficiency of the transmission can therefore be quantified in terms of the time-series length (which is proportional to the transmission time) required for reconstructing 90% of the original network (measured in terms of number of edges). This critical time is henceforth referred to as the transmission time. The effect of compressing the time series by using the frequencies of visits to nodes predicted from the respective degrees, which yields the compressed transmission time, is also considered, giving rise to another series of experiments. Furthermore, we also calculated the long-term transmission rate over a sufficiently long time series (a total of one million symbols was used in the reported experiments). In principle, a combination of topology and dynamics that allows a large compression ratio should lead to faster reconstruction.
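To make the idea of compression from degree-predicted frequencies concrete, the sketch below assigns each node $i$ the probability $k_i/\sum_j k_j$ expected for the traditional RW, where $k_i$ is the degree, derives Huffman code lengths from these probabilities, and compares the resulting number of bits with a fixed-length code of $\lceil \log_2 N \rceil$ bits per symbol for $N$ nodes. Defining the compression ratio as fixed-length bits divided by Huffman bits is an assumption of this sketch, and the function names and star-graph example are hypothetical.

```python
import heapq
import itertools
import math


def huffman_lengths(probs):
    """Code length of each symbol in a Huffman code for the given probabilities."""
    if len(probs) == 1:
        return {s: 1 for s in probs}
    tie = itertools.count()
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every symbol under the merged node gets one more bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), s1 + s2))
    return lengths


def compression_ratio(sequence, adj):
    """Fixed-length bits / Huffman bits, with probabilities predicted from degrees."""
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    probs = {v: len(adj[v]) / total_degree for v in adj}
    lengths = huffman_lengths(probs)
    fixed_bits = math.ceil(math.log2(len(adj))) * len(sequence)
    huffman_bits = sum(lengths[s] for s in sequence)
    return fixed_bits / huffman_bits


# Hypothetical star graph: hub 0 linked to leaves 1..7, and a short RW on it.
star = {0: list(range(1, 8)), **{leaf: [0] for leaf in range(1, 8)}}
walk = [0, 1, 0, 2, 0, 3, 0, 4]
print(huffman_lengths({v: len(star[v]) / 14 for v in star}))
print(round(compression_ratio(walk, star), 3))
```

The hub receives a one-bit code because a random walk on a star visits it every other step, which illustrates why heterogeneous degree distributions favor compaction.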

Fig. 3 shows the parallel-coordinate plots obtained for the undirected network models. A total of 30 simulations was performed for each model and dynamics, and 30 network realizations were used for each network model. The considered networks had approximately 1000 nodes and average degree near 8. The overall best transmission times were obtained for the TSAW random walk. The BA model required a substantially longer transmission time for the RWD dynamics (Fig. 3(a)). This is probably a consequence of the fact that, in this dynamics, the moving agent tends to alternate between hubs, overlooking nodes with small degree. As shown in Fig. 3(b), similar results were obtained when considering compressed times, although in this case the values for the RWD dynamics in the BA model are not as prominent as those obtained without compression. The results for the WAX and GEO models differed significantly, with the former being transmitted much more effectively. This result is surprising because both models share a geographical nature, in the sense that nodes that are spatially close to one another tend to be connected.

Also shown in Fig. 3 is the steering coefficient. Similar values were obtained for the RW, RWD and TSAW dynamics. In the RWID case, the degree is not a good predictor of the frequency of visits, as indicated by Eq. 3 (the visit probabilities become approximately independent of the degree), which leads to a low steering coefficient. The WS1 and WS2 models usually led to low steering coefficient values. This probably happens because most nodes in these networks have the same degree and therefore similar frequencies of visits, so that the remaining fluctuations in the frequencies are only weakly correlated with the small degree differences. The BA model always resulted in the largest steering coefficient. Such an effect is possibly a consequence of the power-law nature of the degree distribution, in which nodes with small degree tend to be scarcely visited while the opposite happens for nodes with large degree. A more in-depth discussion of the encoding efficiency is presented in Section S1 of the supplementary material, where we compare the compression achieved when estimating the symbol probabilities from the node degrees with that achieved when using individual messages to estimate the probabilities. Remarkably, in most cases we found that predicting the symbol probabilities from the network (instead of from the sequence of symbols) yielded better compaction.

The long-term steering coefficient values are shown in Fig. 3(d). The TSAW dynamics led to the highest values, followed closely by the RWD. A variety of behaviors was observed for the RWID, all of them yielding values smaller than those obtained for the other dynamics.

Fig. 3(e,f) shows the compression ratios obtained for 90% network recovery and for long-term exploration, respectively, for the several network models and dynamics. Similar results were obtained in most cases, except for the BA model with the RW, TSAW and RWD dynamics, which yielded better compression ratios. Interestingly, the GEO and WAX models, which had produced substantially different compression and transmission times in the previous experiment, implied similar compression ratios.

The experimental results for the directed cases are shown in Fig. 4. This experiment includes the KN model (intrinsically directed), which led to low reconstruction times in all cases. The better transmission obtained for the KN networks is possibly related to the network construction process, which can be understood as a kind of random walk. This type of network has two key aspects: (i) for every node, the in-degree ($k^{\text{in}}$) is equal to the out-degree ($k^{\text{out}}$); and (ii) its reciprocity is smaller than that of the other considered networks. In Section S2 of the supplementary material we compare the transmission times of KN networks with those obtained for networks generated by a directed ER model (thus having reciprocity close to zero) and by a configuration model Newman (2010) with the constraint $k^{\text{in}} = k^{\text{out}}$. The results indicate that these two aspects are responsible for the better transmission of KN networks. Returning to Fig. 4, the results were again similar with and without compression, with the exception of the BA model, which has a large transmission time for the RWD dynamics. Overall, the values of the steering coefficient showed trends similar to those of the undirected case. The long-term steering coefficients were smaller than in the undirected case, with the exception of the BA model, which was similar to that case.

The compression ratios obtained for the directed networks, for 90% recovery and for long-term exploration, are shown in Fig. 4(e,f). These results are generally similar to those obtained for undirected networks.

## IV Conclusions

The areas of information theory and complex networks have developed in a mostly independent way. However, as argued in the present work, these two areas present several shared and complementary elements which, when integrated, can be used to model, characterize and analyze a broad range of important real-world problems, from spoken and written language to DNA sequences. We have reported a formal framework, potentially leading to a new area, that involves the transmission of an original network by means of a sampling dynamics, such as random walks, which produces a sequence of symbols or time series that a receiver can use to reconstruct the original network. More effective transmission demands compaction of the time series which, we argue, is directly related to the topology of the original network. We also showed that the critical issue of compaction is directly related to one of the central paradigms in network science, namely the relationship between topology and dynamics, and more specifically to the ability to predict the frequency of symbols from the very topology of the original network. Interestingly, the quality of such a prediction depends on the interplay between the topology of the original network and the adopted dynamics.

The proposed basic framework can also be directly extended to model more sophisticated systems involving several transmitter-receiver pairs, which are themselves organized as complex networks. In addition to proposing the systematic integration of network science and information theory, we illustrated a typical problem that can be tackled in this area, namely the efficiency of transmitting several types of networks by using different kinds of random walks. A number of interesting results have been reported. First, we confirmed that different network topologies and dynamics can lead, irrespective of compaction, to rather distinct performances. Interestingly, the BA model exhibited a markedly distinct behavior, oscillating between the best and worst performances depending on the probing random walk. On the other hand, the KN model led, in almost all cases, to the best performance for directed networks. In addition, the two adopted geographical networks, namely WAX and GEO, despite their seemingly analogous spatial organization, yielded rather different results in the case of undirected networks. It is particularly interesting to observe that the BA model, which is topologically very complex (non-uniform degree distribution), led to the best overall compaction in most cases, except for the RWID. This is because the power-law degree distribution implies a highly asymmetric distribution of symbol frequencies and, therefore, more effective Huffman coding.

Given the generality of the proposed framework with respect to theoretical and practical aspects, the prospects for future work are particularly wide, and a complete list of possibilities would be beyond the scope of this work. Some particularly promising application venues include the modeling of opinion spreading, syllabus planning, and language evolution. It would also be interesting to consider noisy transmission, as well as higher-order statistical coding of symbols. Regarding network topology, it would be particularly interesting to investigate how modular structure impacts the transmission. Other types of dynamics can also be considered, especially those related to neuronal signal propagation.

## Acknowledgements

The authors acknowledge financial support from Capes-Brazil, São Paulo Research Foundation (FAPESP) (grant no. 2016/19069-9, 2015/08003-4, 2015/18942-8, 2014/20830-0 and 2011/50761-2), CNPq-Brazil (grant no. 307333/2013-2) and NAP-PRP-USP.

## References

- Cover and Thomas (2006) T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing) (Wiley-Interscience, 2006).
- Newman (2010) M. Newman, Networks: An Introduction (Oxford University Press, Inc., New York, NY, USA, 2010).
- Barabási (2016) A.-L. Barabási, Network science (Cambridge University Press, 2016).
- Borge-Holthoefer et al. (2016) J. Borge-Holthoefer, N. Perra, B. Gonçalves, S. González-Bailón, A. Arenas, Y. Moreno, and A. Vespignani, Science Advances 2, e1501158 (2016).
- Sun et al. (2015) J. Sun, D. Taylor, and E. M. Bollt, SIAM Journal on Applied Dynamical Systems 14, 73 (2015).
- Ahnert (2014) S. E. Ahnert, Scientific Reports 4 (2014).
- De Domenico and Biamonte (2016) M. De Domenico and J. Biamonte, Physical Review X 6, 041062 (2016).
- Lacasa et al. (2014) L. Lacasa, V. Nicosia, and V. Latora, arXiv preprint arXiv:1408.0925 (2014).
- Ulanowicz (2011) R. Ulanowicz, Treatise on Estuarine and Coastal Science pp. 35–57 (2011).
- Andjelković et al. (2015) M. Andjelković, N. Gupte, and B. Tadić, Physical Review E 91, 052817 (2015).
- Cormen et al. (2001) T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson, Introduction to Algorithms (McGraw-Hill Higher Education, 2001), 2nd ed.
- Costa et al. (2007) L. F. Costa, O. Sporns, L. Antiqueira, M. G. V. Nunes, and O. N. Oliveira Jr., Applied Physics Letters 91, 054107 (2007).
- Lovász (1993) L. Lovász, Random walks on graphs: a survey, vol. 1 of Combinatorics, Paul Erdős is Eighty (János Bolyai Mathematical Society, Hungary, 1993).
- Bonaventura et al. (2014) M. Bonaventura, V. Nicosia, and V. Latora, Physical Review E 89, 012803 (2014).
- Amit et al. (1983) D. J. Amit, G. Parisi, and L. Peliti, Physical Review B 27, 1635 (1983).
- Kim et al. (2016) Y. Kim, S. Park, and S.-H. Yook, Physical Review E 94, 042309 (2016).
- Gómez-Gardeñes and Latora (2008) J. Gómez-Gardeñes and V. Latora, Phys. Rev. E 78, 065102 (2008).
- Erdős and Rényi (1960) P. Erdős and A. Rényi, Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17 (1960).
- Barabási and Albert (1999) A.-L. Barabási and R. Albert, Science 286, 509 (1999).
- Watts and Strogatz (1998) D. Watts and S. Strogatz, Nature 393, 440 (1998).
- Waxman (1988) B. M. Waxman, IEEE Journal on Selected Areas in Communications 6, 1617 (1988).
- Dall and Christensen (2002) J. Dall and M. Christensen, Physical Review E 66, 016121 (2002).
- Costa (2007) L. F. Costa, arXiv:0711.2736 (2007).
- Cohen et al. (2005) A. Cohen, W. Hersh, C. Dubay, and K. Spackman, BMC Bioinformatics 6, 103 (2005).
- Liu and Cong (2013) H. Liu and J. Cong, Chinese Science Bulletin 58, 1139 (2013).
- Amancio (2015) D. R. Amancio, Journal of Statistical Mechanics: Theory and Experiment 2015, P03005 (2015).
- Amancio et al. (2012) D. R. Amancio, O. N. Oliveira Jr., and L. F. Costa, EPL (Europhysics Letters) 98, 18002 (2012).
- Salomon and Motta (2009) D. Salomon and G. Motta, Handbook of Data Compression (Springer Publishing Company, Incorporated, 2009), 5th ed.

## Supplementary material

### S1. Efficiency of network encoding

As noted in the main text, the steering coefficient indicates how well the symbol probabilities of the transmitted message can be predicted from some topological property of the network. A combination of network and dynamics with a large steering coefficient should therefore lead to a good prediction of the statistics of the message. In order to verify whether this is indeed the case, we compared the compression ratios achieved when exploring 90% of the network with those obtained when one of the transmitted messages is used to estimate the symbol probabilities for a whole set of transmitted messages. The results for the undirected networks are shown in Fig. S1. Estimating the probabilities from the topology of the system always resulted in better compression ratios than the single-message approach. Also, observe that the GEO model displays a large variance of values. This is caused by the large diameter of this network, which leads to substantial parts not being visited by the walker while the symbol probabilities are being estimated. As a consequence, low compression ratios are obtained for messages starting around nodes that were not visited during the probability estimation, while larger compression ratios are obtained for messages starting closer to the region explored by the message used for constructing the symbol dictionary.

The comparison of the two compression ratios obtained for directed networks is shown in Fig. S2. In this case, similar values were obtained for both quantities. This is likely due to the slower exploration of edges by the random walk in directed networks, which in turn leads to a better estimation of the node probabilities, since the walker is more likely to reach nodes that would otherwise not be visited if the network were undirected.

### S2. Better exploration times of knitted networks

In order to understand why the simulations executed on knitted (KN) networks resulted in better exploration times, we compared the results of this model with those of a set of ER networks having distinct reciprocity values and different restrictions regarding the in-degree ($k^{\text{in}}$) and out-degree ($k^{\text{out}}$) of each node. First, networks were created with a range of reciprocity values, which are approximately , and . The methodology used for changing the reciprocity of the networks is described in Section II of the main text. The four random walk dynamics considered in the main text were applied to these networks. The obtained exploration times, steering coefficients and compression ratios are shown in Fig. S3. Note that the exploration times increase with the reciprocity. Furthermore, Fig. S3 also shows the results for ER networks built so that the reciprocity is close to zero and $k^{\text{in}}$ is equal to $k^{\text{out}}$ for each node (ERE). For all the considered random walks, the exploration times of the KN networks were mostly similar to those obtained for ERE networks. Therefore, we believe that the main characteristics leading to efficient exploration in KN networks are the low reciprocity and the fact that, by construction, $k^{\text{in}}$ is equal to $k^{\text{out}}$ for almost all nodes.