Low Algorithmic Complexity Entropy-deceiving Graphs

Hector Zenil hector.zenil@algorithmicnaturelab.org Information Dynamics Lab, Unit of Computational Medicine, Department of Medicine Solna, Center for Molecular Medicine, SciLifeLab, Karolinska Institute, Stockholm, Sweden Department of Computer Science, University of Oxford, U.K. Algorithmic Nature Group, LABoRES, Paris, France http://algorithmicnature.org/    Narsis A. Kiani Information Dynamics Lab, Unit of Computational Medicine, Department of Medicine Solna, Center for Molecular Medicine, SciLifeLab, Karolinska Institute, Stockholm, Sweden Algorithmic Nature Group, LABoRES, Paris, France    Jesper Tegnér Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia Unit of Computational Medicine, Department of Medicine Solna, Center for Molecular Medicine, SciLifeLab, Karolinska Institute, Stockholm, Sweden
July 15, 2019
Abstract

In estimating the complexity of objects, in particular of graphs, it is common practice to rely on graph- and information-theoretic measures. Here, using integer sequences with properties such as Borel normality, we explain how these measures are not independent of the way in which an object, such as a graph, can be described or observed. From observations that can reconstruct the same graph and are therefore essentially translations of the same description, we will see that when applying a computable measure such as Shannon Entropy, not only is it necessary to pre-select a feature of interest where there is one, and to make an arbitrary selection where there is not, but also that more general properties, such as the causal likelihood of a graph (as opposed to its randomness), can be largely misrepresented by computable measures such as Entropy and Entropy rate. We introduce recursive and non-recursive (uncomputable) graphs and graph constructions based on these integer sequences, whose different lossless descriptions have disparate Entropy values, thereby enabling the study and exploration of a measure's range of applications and demonstrating the weaknesses of computable measures of complexity.

Keywords: graph algorithmic complexity, Shannon Entropy, Borel normality, Kolmogorov-Chaitin complexity, algorithmic randomness, graph algorithmic probability
The first two authors contributed equally.

I The use of Shannon Entropy in network profiling

One of the major challenges in modern physics is to provide proper and suitable representations of network systems for use in fields ranging from physics boccaletti () to chemistry chen2014entropy (). A common problem is the description of order parameters with which to characterize the ‘complexity of a network’. Graph complexity has traditionally been characterized using graph-theoretic measures such as degree distribution, clustering coefficient, edge density, and community or modular structure.

More recently, networks have also been characterized using classical information theory. One problem in this area is the interdependence of many graph-theoretic properties, which makes measures more sophisticated than single-property measurements orsini () difficult to come by. The standard way to address this is to generate graphs that have a certain specific property while being random in all other aspects, in order to check whether or not the property in question is typical among an ensemble of graphs with otherwise seemingly different properties.

Approaches using measures based upon Shannon Entropy that claim to quantify the information content of a network bianconi2007entropy () as an indication of its 'typicality' rest on an assumption about the associated ensemble provided by the Entropy evaluation: the more random, the more typical. The claim is that one can construct a "null model" that captures some aspects of a network (e.g. graphs that have the same degree distribution) and see how different the network is from the null model as regards particular features, such as clustering coefficient, graph distance, or other features of interest. The procedure aims at producing an intuition of an ensemble of graphs that are assumed to have been sampled uniformly at random from the set of all graphs with the same property, in order to determine whether such a property occurs with high or low probability. If the graph is not significantly different, statistically, from the null model, then the graph is claimed to be as "simple" as the null model; otherwise, the measure is claimed to be a lower bound on the "complexity" of the graph, as an indication of its random versus causal nature.

Here we highlight some serious limitations of these approaches that are often neglected, and provide pointers to approaches that are complementary to Shannon Entropy, in order to partially circumvent some of the aforesaid limitations by combining it with a measure of local algorithmic complexity that better captures the recursive and thus causal properties of an object–in particular a network–beyond statistical properties.

One of the most popular applications of Entropy is to graph degree distribution, as first suggested and introduced by korner1988random (). Similar approaches have been adopted in areas such as chemical graph theory and computational systems biology dehmer2008entropy () as functions of layered graph degree distribution under certain layered coarse-graining operations (sphere covers), leading to the hierarchical application of Entropy, a version of graph traversal Entropy rate. In chemistry, for example, Shannon Entropy over a function of degree sequence has been used as a profiling tool to characterize–so it is claimed–molecular complexity.

While the application of Entropy to graph degree distributions has been relatively more common, the same Entropy has also been applied to other graph features, such as functions of their adjacency matrices estrada2014walk (), and to distance and Laplacian matrices Dehmer3 ().

Even more recently, Shannon Entropy on adjacency matrices was used to attempt the discovery of CRISPR regions, in an interesting transformation of DNA sequences into graphs sengupta2016application (). A survey contrasting adjacency-matrix-based (walk) entropies and other entropies (e.g. on degree sequence) is offered in estrada2014walk (). It finds that adjacency-based entropies are more robust vis-à-vis graph size and are correlated to algebraic graph properties, as these are also based on the adjacency matrix (e.g. the graph spectrum).

Finally, hybrid measures have been used, such as the graph heterogeneity index Estrada2 () as a function of degree sequence, and the Laplacian matrix, where some of the limitations of quantifying only the diversity of the degree distribution, i.e. its Entropy (or of any graph measure as a function of the Entropy of the degree distribution), have been identified.

It is thus of the greatest interest to researchers in physics, chemistry and biology to understand the reach, limits and interplay of measures of entropy, in particular as applied to networks, and likewise to understand how unserviceable the use of entropy as a measure of randomness, complexity or information content can be for extracting causal content (as opposed to randomness). The use of entropy has nevertheless spread, because its numerical calculation is computationally very cheap compared to richer, but more difficult to approximate, universal measures of complexity that are better qualified to capture more general properties of graphs. Some of these properties are related to the nature of the graph-generating mechanisms, which is what most of the previously utilized measures were supposed to quantify in the first place, in one way or another, from the introduction of the first random graph model by Erdös and Rényi erdos1960evolution () to the most popular models such as 'scale-freeness' barabasi (), and more recent ones such as network randomness typicality bianconi2007entropy ().

II Notation and basic definitions

Definition II.1.

A graph $G = (V, E)$ is an ordered pair comprising a set $V$ of nodes or vertices and a set $E$ of edges or links, which are 2-element subsets of $V$.

Definition II.2.

A graph is labelled when the vertices are distinguished by labels $v_1, v_2, \ldots, v_n$, with $n = |V(G)|$ the cardinality of the set $V(G)$.

Definition II.3.

Graphs $G$ and $H$ are isomorphic if there is a bijection $f$ between the vertex sets of $G$ and $H$, such that any two vertices $u$ and $v$ are adjacent in $G$ if and only if $f(u)$ and $f(v)$ are adjacent in $H$.

Definition II.4.

The degree of a node $v$, denoted by $d(v)$, is the number of (both incoming and outgoing) links to other nodes, and the degree sequence of $G$ is the unordered list of all $d(v)$ for $v \in V(G)$.

Definition II.5.

An E-R graph $G(n, p)$ is a graph of size $n$ constructed by connecting nodes randomly with probability $p$, independently of every other edge.

Usually E-R graphs are assumed to be non-recursive (i.e. truly random), but E-R graphs can be constructed recursively using pseudo-random generating algorithms.
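As a minimal illustration of this last point (a Wolfram Language sketch, not taken from the paper; the seed and parameters are arbitrary), an 'E-R' graph can itself be produced by a fully deterministic, recursive procedure once the pseudo-random generator is seeded:

SeedRandom[42];   (* fixing the seed makes the construction fully reproducible, hence recursive *)
g = RandomGraph[BernoulliGraphDistribution[50, 0.5]]   (* an E-R G(n,p) graph with n = 50, p = 0.5 *)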

II.1 Graph Entropy

One of the main objectives behind the application of Shannon Entropy is the characterization of the randomness or ‘information content’ of an object such as a graph. Here we introduce graphs with interesting deceptive properties, particularly disparate Entropy (rate) values for the same object when looked at from different perspectives, revealing the inadequacy of classical information-theoretic approaches to graph complexity.

Central to information theory is the concept of Shannon’s information Entropy, which quantifies the average number of bits needed to store or communicate the statistical description of an object.

For an ensemble $X(R, p(x_i))$, where $R$ is the set of possible outcomes (the random variable) with $n = |R|$, and $p(x_i)$ is the probability of an outcome $x_i$ in $R$, the Shannon Entropy of $X$ is then given by

Definition II.6.
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \qquad (1)$$

This implies that to calculate $H(X)$ one has to know or assume the probability mass distribution of the ensemble $X$. One caveat regarding Shannon's Entropy is that one is forced to make an arbitrary choice regarding granularity. Take for example the bit string 01010101010101. The Shannon Entropy of the string at the level of single bits is maximal, as there are the same number of 1s and 0s, but the string is clearly regular when 2-bit (non-overlapping) blocks are taken as basic units, in which instance the string has minimal complexity because it contains only 1 symbol (01) from among 4 possible ones (00, 01, 10, 11). A generalization consists in taking into consideration all possible 'granularities', i.e. the Entropy rate:
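The following short Wolfram Language sketch (not part of the original text; the string is the example above) illustrates this dependence on granularity:

s = Characters["01010101010101"];
N[Entropy[2, s]]                 (* 1 bit: maximal at the single-bit level *)
N[Entropy[2, Partition[s, 2]]]   (* 0 bits: only the block 01 ever occurs at granularity 2 *)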

Definition II.7.

Let $P(x_1, x_2, \ldots, x_n)$ with $x_i \in \Sigma$ denote the joint probability over blocks of $n$ consecutive symbols from an alphabet $\Sigma$. Let the Shannon Entropy rate shannon () (also known as granular Entropy, $n$-gram Entropy) of a block of $n$ consecutive symbols–denoted by $H_n$–be:

$$H_n(X) = -\sum_{(x_1, \ldots, x_n) \in \Sigma^n} P(x_1, \ldots, x_n) \log_2 P(x_1, \ldots, x_n) \qquad (2)$$

Thus to determine the Entropy rate of the sequence, we estimate the limit of $H_n/n$ when $n \to \infty$. It is not hard to see, however, that $H_n$ will diverge as $n$ tends to infinity if the number of symbols increases, but if applied to a binary string $X$, it will reach a minimum at the granularity for which a statistical regularity is revealed.

The Shannon Entropy shannon () of an object is simply $H_n$ for fixed block size $n = 1$, so we can drop the subscript.

We can define the Shannon Entropy of a graph $G$, with respect to a feature $F(G)$, by:

Definition II.8.
$$H_F(G) = -\sum_{i} P(F(G)_i) \log_2 P(F(G)_i) \qquad (3)$$

where $P$ is a probability distribution over the values of $F(G)$, and $F(G)$ is a feature of interest of $G$, e.g. edge density, degree sequence, number of over-represented subgraphs/graphlets (graph motifs), and so on. When $P$ is the uniform distribution (every graph of the same size is equally likely), it is usually omitted as a parameter of $H$.
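As a small illustration (a Wolfram Language sketch written for this presentation; the cycle graph is an arbitrary choice), the same graph yields different Entropy values for different choices of the feature $F$:

g = CycleGraph[20];
N[Entropy[2, Flatten[Normal[AdjacencyMatrix[g]]]]]   (* adjacency-matrix entries: a function of edge density *)
N[Entropy[2, VertexDegree[g]]]                        (* degree sequence: 0, since every degree equals 2 *)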

The most common applications of Entropy to graphs are to degree sequence distribution and edge density (adjacency matrix), which are labelled graph invariants. In molecular biology, for example, a common application of Entropy is to count the number of 'branchings' mowshowitz () per node by, e.g., randomly traversing a graph starting from a random point. The more extensive the branching, the greater the uncertainty of a graph's path being traversed in a unique fashion, and the higher the Entropy. Thorough surveys of graph Entropy are available in mowshowitz (); simonyi (); mowshowitz2 (), so we will avoid providing yet another one. In most, if not all, of these applications of Entropy, very little attention is paid to the fact that Entropy can lead to completely disparate results depending on the ways in which the same objects of study are described, that is, to the fact that Entropy is not a graph invariant–either for labelled or unlabelled graphs–vis-à-vis object description, a major drawback for a complexity measure zenildata (); zenilbdm () of typicality, randomness, and causality. In the survey mowshowitz (), it is suggested that there is no 'right' definition of Entropy. Here we formally confirm this to be the case in a fundamental sense.

Indeed, Entropy requires a pre-selection of a graph invariant, but it is itself not a graph invariant. This is because ignorance of the probability distribution makes Entropy necessarily dependent on the chosen graph-invariant description, there being no such thing as an Invariance theorem solomonoff (); kolmogorov (); chaitin () for Shannon Entropy to provide a convergence of values independent of description language, as there is in algorithmic information theory for algorithmic (Kolmogorov-Chaitin) complexity.

Definition II.9.

The algorithmic complexity $K(s)$ of an object $s$ is the length of its shortest computational description (computer program) $p$ in a reference language (of which it is asymptotically independent), such that the shortest generating computer program fully reconstructs $s$ solomonoff (); kolmogorov (); chaitin (); levin ().

III Construction of Entropy-deceiving graphs

If we can show that we can artificially fool Entropy, we will have shown how Entropy may fail to characterize natural or socially occurring networks, especially because, as we will demonstrate, different values of Shannon Entropy can be retrieved for the same graph as functions of different features of interest of said graph, thereby showing that there is no such thing as the 'Shannon Entropy of a graph' but rather 'the Shannon Entropy of an identified property of a graph', which can easily be replaced by a function that simply quantifies such a property directly.

III.1 Entropy of pseudo-random graphs

By using integer sequences, in particular Borel-normal irrational numbers, one can construct pseudo-random graphs, which can in turn be used to construct networks.

Definition III.1.

A real number $r$ is said to be normal in base $b$ if all $n$-tuplets of digits in $r$'s digital expansion are equally likely, and $r$ is thereby of maximal $n$-order Entropy rate, by definition of Borel normality.

For example, the mathematical constant $\pi$ is believed to be an absolute Borel normal number (Borel normal in every base $b$), and so one can take the digits of $\pi$ in any base and take $n^2$ digits as the entries of a graph adjacency matrix of size $n \times n$, by taking consecutive segments of $n$ digits as rows. The resulting graph will have $n$ nodes and an edge density of about 0.5, because the occurrence of 1 or 0 in $\pi$ in binary has probability 0.5 (the same as in decimals after transformation of digits to 0 if the digit is $< 5$ and 1 otherwise, or $< b/2$ and 1 otherwise in general for any base $b$), thus complying with the definition of an Erdös-Rényi (E-R) graph (albeit of high density).
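A minimal sketch of this construction in the Wolfram Language (not the authors' code; the size $n$ and the symmetrization choice are illustrative assumptions):

n = 64;
bits = First[RealDigits[Pi, 2, n^2]];                              (* the first n^2 binary digits of Pi *)
m = Partition[bits, n];                                            (* fill an n x n 0/1 matrix row by row *)
sym = Table[If[i <= j, m[[i, j]], m[[j, i]]], {i, n}, {j, n}];     (* symmetrize for an undirected graph *)
g = AdjacencyGraph[sym - DiagonalMatrix[Diagonal[sym]]];           (* drop self-loops *)
N[Mean[Flatten[sym]]]                                              (* edge density close to 0.5 *)
Histogram[VertexDegree[g]]                                         (* degrees cluster around n/2, cf. Fig. 1 *)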





Figure 1: Histograms of degree distributions of networks using 10 000 (left) and (right) digits of $\pi$ in base 2 (A) and in base 10 (B), undirected and with no self-loops. C: A graph based on the 64 calculated bits of a partially computable Chaitin $\Omega$ number calude (). It appears to have some structure, but any regularity will eventually vanish, as $\Omega$ is a Martin-Löf algorithmic random number martinlof ().

As theoretically predicted and numerically demonstrated in Fig. 1(A and B), the degree distribution will approximate a normal distribution around $n/2$. This means that the graph adjacency matrix will have maximal Entropy (if $\pi$ is Borel normal) but low degree-sequence Entropy, because all values are around $n/2$ and do not span all the possible node degrees (in particular, low degrees). This means that algorithmically constructing a graph can give rise to an object with a different Entropy when the feature of interest of the said graph is changed.

A graph does not have to be of low algorithmic complexity to yield incompatible observer-dependent Entropy values. One can take the digits of a Chaitin $\Omega$ number (the halting probabilities of optimal Turing machines with prefix-free domains), some of the digits of which are uncomputable. In Fig. 1(C) we show a graph based on the first 64 digits of a Chaitin $\Omega$ number calude (), thus a highest-algorithmic-complexity graph in the long run (it is ultimately uncomputable). Since algorithmic randomness implies Borel normality martinlof (), the adjacency matrix has maximal Entropy, but for the same reasons as obtain in the case of the $\pi$ graphs, it will have low degree-sequence Entropy. For algorithmic complexity, in contrast, as we will see in Theorem III.6, a graph has about the same algorithmic complexity regardless of its (lossless) descriptions (e.g. adjacency matrix or degree sequence), as long as the same and only the same graph (up to isomorphism) can be reconstructed from those descriptions.

Figure 2: A regular antelope graph (left) and an Erdös-Rényi (E-R) graph (right) with the same number of edges and nodes, and therefore the same adjacency matrix dimension and exactly the same edge density, can have very different properties. Specifically, one can be recursively (algorithmically) generated while the other is random-looking. One would wish to capture this essential difference.

One can also start from completely different graphs. For example, Fig. 2 shows two graphs for which Shannon Entropy, applied directly to the adjacency matrix, is a function of edge density alone, so that the same Entropy values are retrieved despite their very different (dis)organization.

The Entropy rate will be low for the regular antelope graph, and higher, but still far removed from randomness, for the E-R graph, because by definition the degree-sequence variation of an E-R graph is small. In scale-free graphs, however, the degree distribution is artificially scaled, spanning a large number of different degrees as a function of the number of connected edges per added node, and resulting in an over-estimation of their degree-sequence Entropy, as can be numerically verified in Fig. 3. Degree-sequence Entropy thus points in the opposite direction to the entropic estimation of the same graphs arrived at by looking at their adjacency matrices, when in reality scale-free networks produced by, e.g., the Barabási-Albert preferential attachment algorithm barabasi (), are recursive (algorithmic and deterministic, even if probabilities are involved), as opposed to the E-R construction built (pseudo-)randomly. The Entropy of the degree sequence of scale-free graphs would suggest that they are almost as random as, or even more random than, E-R graphs for exactly the same edge densities. To circumvent this, ad-hoc measures of modularity have been introduced sole (), precisely to capture how removed a graph is from 'scale-freeness' by comparing any graph to a scale-free randomized version of itself, thereby compelling consideration of a pre-selected feature of interest ('scale-freeness').

Furthermore, an E-R graph can be recursively (algorithmically) generated or not, and so its Shannon Entropy has no connection to the causal, algorithmic information content of the graph, and can only provide clues for low Entropy graphs that can be characterized by other graph-theoretic properties, without need of an entropic characterization.
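A hypothetical numerical check in the spirit of Fig. 3 (a Wolfram Language sketch; the size and attachment parameters are illustrative, not those of the figure):

n = 100; k = 4;
ba = RandomGraph[BarabasiAlbertGraphDistribution[n, k]];
er = RandomGraph[{n, EdgeCount[ba]}];    (* an E-R graph with the same number of nodes and edges *)
N[Entropy[2, VertexDegree[ba]]]          (* degree-sequence Entropy of the B-A graph *)
N[Entropy[2, VertexDegree[er]]]          (* typically comparable or lower for the E-R graph, cf. Fig. 3 *)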


Figure 3: Box plot of Entropy values applied to the degree-sequence distribution of 10 scale-free (B-A) and 10 E-R graphs with the same number of nodes and the same parameters. Results may mislead as to the generative quality of each group of graphs, suggesting that B-A graphs are as random as, or more random than, E-R graphs, despite their recursive (causal/algorithmic and deterministic) nature, which should in fact make B-A networks less random than E-R graphs. Here the E-R graphs have exactly the same edge density as the B-A graphs for 4 and 5 preferentially attached edges per node. This plot illustrates how, for all purposes, Entropy can easily be fooled and cannot tell apart higher causal content from apparent randomness. One can always update the ensemble distribution to accommodate special cases, but only after gaining knowledge by other methods.

III.2 A low complexity and high Entropy graph

We introduce a method to build a family of recursive graphs with maximal Entropy but low algorithmic complexity, hence graphs that appear statistically random but are, however, of low algorithmic randomness and thus causally (recursively) generated. Moreover, these graphs may have maximal Entropy for some lossless descriptions but minimal Entropy for other lossless descriptions of exactly the same objects, with both descriptions characterizing the same object and only that object, thereby demonstrating how Entropy fails at unequivocally and unambiguously characterizing a graph independent of a particular feature of interest. We denote by ‘ZK’ the graph (unequivocally) constructed as follows:

  1. Let $G$ be a starting graph connecting a node with label 1 to a node with label 2. If a node with label $n$ has degree $n$, we call it a core node; otherwise, we call it a supportive node.

  2. Iteratively add a node to $G$ such that the number of core nodes in $G$ is maximized. The resulting graph is typified by the one in Fig. 4.



Figure 4: Tree-like (a) and radial (b) representations of the same ZK graph, which has a maximal-Entropy degree sequence by construction, starting from iteration 2 and proceeding to 8, adding a node at a time.

III.3 Properties of the ZK graph

The degree sequence of the labelled nodes is given by the Champernowne constant champernowne () in base 10, a transcendental real whose decimal expansion is Borel normal borel (), constructed by concatenating the decimal representations of the successive integers $1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, \ldots$, whose digits are the labelled node degrees of the ZK graph over the iterations (sequence A033307 in the OEIS).

The sequence of edge counts per iteration follows a recurrence relation built upon previous iteration values between core and supportive nodes, involving the golden ratio $\varphi$ and the floor function $\lfloor \cdot \rfloor$ (sequence A183136 in the OEIS), whose values are 1, 2, 4, 7, 10, 14, 18, 23, 29, 35, 42, 50, 58, 67, 76, 86, 97, 108, 120, 132, 145, \ldots

Figure 5: Basic node and link growth properties and corresponding fitted (polynomial) curves. The relation between node and link growth determines the edge density, which in the limit is 0.
Figure 6: Graph-theoretic and dynamic properties of the recursive 'ZK' graph. Despite the trivial construction of the recursive network, it displays all sorts of interesting convergent and divergent non-trivial graph-theoretic, dynamic and complexity properties. For example, the clustering coefficient of the undirected graph asymptotically converges to 0.65, and some properties grow or decrease linearly while others do so polynomially. Entropy values for different graph descriptions (even for fully accurate descriptions, and not because of a lack of information from the observer's point of view) diverge and become trivially dependent on other simple functions (e.g. edge density or degree-sequence normality). In contrast, methods based on algorithmic probability (cf. Section V) assign lower complexity to the graph than both Entropy and lossless compression algorithms (e.g. Compress, depicted here) that are based on Entropy rate (word repetition). While useful for quantifying specific features of the graph that may appear interesting, no graph-theoretic or entropic measure can account for the low (algorithmic) randomness and therefore (high) causal content of the network.
Figure 7: ZK randomness and information content according to lossless compression Entropy and a technique, other than compression, that uses the concept of algorithmic probability to approximate algorithmic complexity kolmo2d (); zenilgraph (); zenilmethodsbiology (). This means that randomness characterizations by algorithmic complexity are robust, as they are independent of object description, and are therefore, in an essential way, parameter-free, meaning that there is no need for pre-selection or arbitrary selection of features of interest for proper graph profiling.
Definition III.2.

$G_m$ is a graph with at least one node of degree $m$.

$G_{m,t}$ has been used where we want to emphasize the number of generation- or time-steps $t$ in the process of constructing $G_m$. The symbol $m$ denotes the maximum degree of the graph. Nodes in the ZK graph belong to 2 types: core and supportive nodes.

Definition III.3.

A node $v_n$ with label $n$ is a core node iff there is a construction step $t$ such that $\deg(v_n) = n$ in $G_{m,t}$. Otherwise it is a supportive node.

Theorem III.1.

To convert $G_m$ to $G_{m+1}$, we need to add 2 supportive nodes to $G_m$ if $m$ is odd, or one supportive node if $m$ is even.

Proof.

By induction:

The basis: $G_3$ has 3 core nodes, denoted by $v_1, v_2, v_3$, and 2 supportive nodes, denoted by $v_4, v_5$. As described in the construction procedure, to convert $G_3$ to $G_4$ we choose a supportive node with maximum degree. Here, since we have only the nodes $v_4$ and $v_5$, their degree is one. So we need to connect $v_4$ to 3 other supportive nodes. As we have only one left ($v_5$), we need to add 2 supportive nodes. Now, $G_4$ has 3 supportive nodes, 2 of them new, $v_6$ and $v_7$, and one old, $v_5$. The old one is of degree 2, and we need to convert it to degree 5; we have 2 other supportive nodes left, so we need one new supportive node $v_8$. Therefore, the assumption is true for $m = 3$ and $m = 4$ (the basis).

Inductive step: assuming that the statement is true for $m$, we show that it is true for $m+1$.

We consider 2 cases:

  1. $m$ is odd

  2. $m$ is even

Case one: If $m$ is odd then $m-1$ is even, which means that in the previous step we added one supportive node with degree one, and to convert $G_m$ to $G_{m+1}$ we need a core node with degree $m+1$. Given the maximum degree attainable by a supportive node, only one supportive node is not connected to the core candidate node, which implies that the core candidate node must receive its remaining edges from new nodes, and we would need to add 2 extra supportive nodes to our graph.

Case two: If $m$ is even then $m-1$ is odd, and therefore $G_m$ has 2 supportive nodes with degree one (they have only been connected to the last core node). So we would need to add only one node to convert the supportive node with maximum degree into a core node with degree $m+1$. ∎

Corollary III.1.
  1. If $m$ is odd then $|V(G_{m+1})| = |V(G_m)| + 2$.

  2. If $m$ is even then $|V(G_{m+1})| = |V(G_m)| + 1$.

Theorem III.2.

For any degree value $k$, there is a maximum of 3 nodes with degree $k$ in $G_m$.

Proof.

By induction:

The basis: The assumption is true for $G_3$.

Inductive step: If we assume that there is a maximum of 3 nodes with any given degree in $G_m$, then there is a maximum of 3 nodes with any given degree in $G_{m+1}$.

The proof is direct using Theorem III.1. To generate $G_{m+1}$, we add a maximum of 2 supportive nodes. These nodes have degree one, and there is no node with degree one except the first core node (the core node with degree 1). Thus we have a maximum of 3 nodes with degree one. The degree of all other supportive nodes will be increased by one, which, based on the induction hypothesis, cannot yield a degree value that is repeated more than 3 times. ∎

Theorem III.3.

ZK is of maximal degree-sequence Entropy.

Proof.

The degree sequence of the ZK graph can be divided into 2 parts:

  1. A dominating degree subsequence associated with the core nodes (always longer than the subsequence 2 of supportive nodes), generated by the infinite concatenation of the successive integers $1, 2, 3, 4, 5, \ldots$ that produces the Champernowne constant $C_{10}$, which is Borel normal borel (); champernowne ().

  2. A second degree subsequence associated with the supportive nodes, whose digits do not repeat more than 3 times and which therefore, by Theorem III.2, has a maximal $n$-order Entropy rate for $n = 1$ and a high Entropy rate for $n > 1$.

Therefore, the degree sequence of ZK is asymptotically of maximal Entropy rate. ∎

Theorem III.4.

The ZK graph is of low algorithmic (Kolmogorov-Chaitin) complexity.

Proof.

By demonstration: a computer program generating the ZK graph, written in the Wolfram Language, is:

(* Each application turns the node labelled Max[VertexDegree]+1 into a new core node by
   connecting it to the required number of following (possibly new) supportive nodes. *)
AddEdges[graph_] :=
 EdgeAdd[graph,
  Rule@@@Distribute[{Max[VertexDegree[graph]] + 1,
     Table[i, {i, (Max[VertexDegree[graph]] +
         2), (Max[VertexDegree[graph]] +
          1) + (Max[VertexDegree[graph]] + 1) -
        VertexDegree[graph, Max[VertexDegree[graph]] + 1]}]}, List]]

The graph can be constructed recursively for any number of nodes by nesting the AddEdges[] function as follows:

Nest[AddEdges, Graph[{1 -> 2}], n]

starting from the graph defined by the single edge 1 -> 2 as initial condition.

The length, in bytes, of the program above (Nest with AddEdges and the initial condition) gives an upper bound on the algorithmic complexity of ZK, which grows by only $O(\log n)$ (the bits needed to specify the number of iterations $n$) and is therefore of low algorithmic randomness. ∎

We now show that we can fully reconstruct ZK from its degree sequence. Since we know that we can also reconstruct ZK from its adjacency matrix (denoted by $A(ZK)$), it follows that both are lossless descriptions from which ZK can be fully reconstructed, and for which Entropy provides contradictory values depending on the feature of interest.
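The following hypothetical check (a Wolfram Language sketch building on the program above; the number of iterations is arbitrary) illustrates the disparity between the two descriptions:

zk = Nest[AddEdges, Graph[{1 -> 2}], 8];
N[Entropy[2, Flatten[Normal[AdjacencyMatrix[zk]]]]]   (* a function of the (low) edge density *)
N[Entropy[2, VertexDegree[zk]]]                       (* near-maximal: almost every degree value occurs only a few times *)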

Theorem III.5.

For any $m$, all instances of $G_m$ are isomorphic.

Proof.

The only degree of freedom in the graph reconstruction is the selection of a supportive node to convert to a core node when there are several supportive nodes of maximal degree. As has been proven in Theorem III.1, the number of nodes added to the graph is independent of which supportive node is selected for conversion to a core node. In any two instances of the graph the numbers of nodes and edges are equal, and it is clear that by mapping the node selected at each step in one instance of the graph to the node selected at the corresponding step in the other instance we obtain a mapping $f$ between the two vertex sets such that $f$ is a bijection (both one-to-one and onto). ∎

Finally, we prove that all isomorphic graphs have about the same (e.g. low) algorithmic complexity:

Theorem III.6.

Let $G'$ be an isomorphic graph of $G$. Then $|K(G) - K(G')| < c$ for all $G' \in A(G)$, where $A(G)$ is the automorphism group of $G$ and $c$ is a constant independent of $G$.

Proof.

The idea is that if there were a significantly shorter program $p'$ for generating $G'$ compared to a program $p$ generating $G$, we could use $p'$ to generate $G$ via $G'$ and a relatively short program that tries, e.g., all permutations and checks for isomorphism. Let's assume that there exists a program $p'$ generating $G'$ such that $|p| - |p'|$ is not bounded by any constant, and that $|p'| < |p| = K(G)$. We can then replace $p$ by $p'$ followed by the permutation-and-isomorphism check to generate $G$, such that $K(G) \leq |p'| + c$, where $c$ is a constant independent of $G'$ that represents the size of the shortest program that generates $G$ given any $G' \in A(G)$. Then we have $K(G) \leq K(G') + c$, which is contrary to the assumption. ∎

The number of Borel-normal numbers that can be used as the degree sequence of a graph is determined by the necessary and sufficient conditions in kim (); kim2 () and is denumerably (countably) infinite.

III.4 Degree-sequence targeted Entropy-deceiving graph construction

Taking advantage of the correlation between 2 random variables $X$ and $Y$ (starting independently) with the same probability distribution, let $A$ be a matrix with rows normalized to 1, and consider the random variables obtained by applying $A$ to the vector $(X, Y)$. The correlation between the two resulting variables is just the inner product between the two rows of $A$. This can be used to generate a degree distribution of a graph with any particular Entropy, provided the resulting degree sequence complies with, or is completed according to, the necessary and sufficient conditions for building a graph kim (); kim2 ().
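A hypothetical sketch of the last step (a Wolfram Language illustration, not the authors' construction; the target sequences are arbitrary choices) showing how a prescribed, graphical degree sequence can be realized as a graph with essentially any degree-sequence Entropy:

lowH = ConstantArray[4, 20];                    (* every node has degree 4: degree-sequence Entropy 0 *)
highH = {1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6};   (* a more diverse, still graphical degree sequence *)
g1 = RandomGraph[DegreeGraphDistribution[lowH]];
g2 = RandomGraph[DegreeGraphDistribution[highH]];
{N[Entropy[2, VertexDegree[g1]]], N[Entropy[2, VertexDegree[g2]]]}   (* 0 versus about 2.58 bits *)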

IV Graph Entropy versus Graph Algorithmic Complexity

The ensemble of graphs compatible with the ZK graph, for the Entropy of its degree distribution, thus consists of the set of networks that have a near-maximal-Entropy degree sequence, as the sequence distribution is uninformative (nearly every degree appears only once) and thus does not reduce statistical uncertainty, despite the algorithmic nature of the ZK graph (and assuming one does not know that the graph is deterministically generated, a reasonable assumption of ignorance characteristic of the general observer in a typical, realistic case). The size of the ensemble is thereby close to the number of permutations of the elements of the degree distribution of the ZK graph, constrained by the number of sequences that can actually construct a graph kim (); kim2 (). This means that, without loss of generality, any Entropy-based measure (in this case applied to the degree sequence) will be misleading, assigning high randomness (on the basis of a large ensemble of equally high Entropy values) to what is in fact a simple recursive graph, thereby illustrating the limits of classical information theory for graph profiling.

IV.1 Algorithmic complexity invariance vis-à-vis full object description

While this paper does not focus on alternatives to graph Entropy, alternative and complementary directions for exploring robust (if semi-computable) approaches to graph complexity have been introduced zenilgraph (), together with numerical methods showing that one can not only robustly define the algorithmic complexity (even when semi-computable) of labelled graphs more independently of description language, but also of unlabelled graphs, as set forth in zenilmethodsbiology (), in particular:

Definition IV.1.

Algorithmic Complexity of unlabelled graphs: Let $d(G)$ be a lossless description of a graph $G$ and $A(G)$ its automorphism group. Then,

$$K(A(G)) = \min\{K(d(g)) \mid g \in A(G)\}$$

where $K$ is the algorithmic (Kolmogorov-Chaitin) complexity of the graph as introduced in zenilgraph (); zenilmethodsbiology () (the length of the shortest computer program that produces the graph upon halting) and $d$ ranges over the set of lossless descriptions of the graphs in $A(G)$, independent of the description language (per the Invariance theorem). This measure, unlike graph Entropy, is robust zenilmethodsbiology (). In zenilgraph (), it was in fact shown that the algorithmic complexity estimation of a labelled graph is a good approximation of the algorithmic complexity of its automorphism group (i.e. the unlabelled graph complexity), and is correlated in one direction to the automorphism group count.

IV.2 The fragility of Entropy and computable measures vis-à-vis object description

In contrast to algorithmic complexity, no computable measure of complexity can test for all (Turing) computable regularities in a dataset martinlof (). That is, there is no test that can be implemented as a Turing machine that takes the data as input and indicates whether it has a regularity upon halting (regularities such as “every 5th place is occupied by a consecutive prime number”, to mention one example among an infinite number of possibilities).

Definition IV.2.

A computable regularity is a regularity for which a test can be set as a computer program running on a specific-purpose Turing machine testing for the said regularity.
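As a toy instance of Definition IV.2 (a Wolfram Language sketch written for this presentation, using the example regularity mentioned above), a special-purpose test for the regularity "every 5th place is occupied by a consecutive prime number" is itself computable:

regularityQ[digits_List] :=
  Module[{fifth = digits[[5 ;; Length[digits] ;; 5]]}, fifth === Prime[Range[Length[fifth]]]]
regularityQ[Flatten[Table[{0, 0, 0, 0, Prime[k]}, {k, 6}]]]   (* True: the regularity holds *)
regularityQ[RandomInteger[9, 30]]                             (* almost surely False *)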

Common statistical tests, for example, are computable because they are designed to be effective, but no computable universal measure of complexity can test for every computable regularity. In other words, for every computable measure capturing a data feature intended to quantify the random content of the data, one can devise a mechanistic (deterministic) procedure producing an object that deceives the said measure, appearing random to it while remaining simple with respect to other features.

Moreover, for every effective feature one can devise an effective measure to test for it, but there is no computable measure able to implement a universal statistical test martinlof (). This means that for every effective (computable) property/feature $F$ of a computable object $s$, there is a computable measure to test for $F$ in $s$ (or in any object like $s$), but no computable measure exists to test for every feature of $s$ (and of all the effectively enumerable computable objects like $s$).

Let $d(s)$ be a lossless description of an object $s$, meaning that $s$ can be reconstructed from $d(s)$ without any loss of information. Then there is no essential distinction between $s$ and $d(s)$ from the algorithmic point of view, because $|K(s) - K(d(s))| < c$, where $c$ is the length (in bits) of the translation program between $d(s)$ and $s$.

Theorem IV.1.

For a computable measure $S$, such as Shannon Entropy, there is in general no constant $c$ such that $|S(s) - S(d(s))| < c$, nor any logarithmic (or otherwise slowly growing) term bounding the difference as a function of the size of $s$.

In other words, as we have proven by exhibiting a counter-example (the ZK graph), the Shannon Entropy of an object may diverge when applied to different lossless descriptions of the same object, and cannot therefore be considered a robust measure of complexity. A measure of complexity should thus look not for a single property in (any possible) object, but for a potentially unbounded (and potentially unidentified) number of possible properties in any object $s$.

A sound characterization of a complexity measure can thus be established as a function that captures strictly more information about (any) $s$ than any single (computable) function. Computable functions are therefore not good candidates for universal measures of complexity, as each can be replaced by a measure defined as a function of the property (or combination of properties) of interest and nothing else.

IV.3 Dependence on assumed distributions

An argument against the claim that Entropy yields contradictory values when used to profile randomness (even statistical randomness) is that one can change the domain of the Entropy measure in such a way as to make Entropy consistent with any possible description of a graph. For example, because we have proven that the ZK algorithm is deterministic and can only produce a single ZK graph, it follows that there is no uncertainty in the production of the object, there being only one graph for the formula. In this way, building a distribution of all formulae generating the ZK graph will always lead to a Shannon Entropy of 0 for the 'right' description using the 'right' ensemble containing only the ZK formula(e).

According to the same argument, the digits of the mathematical constant $\pi$ (to mention only the most trivial example) would have Shannon Entropy 0, because the digits are produced deterministically, and the 'right' ensemble for $\pi$ should be the one containing only formulae deterministically generating the digits of $\pi$.

Directly changing the ensemble on which Entropy operates for a specific object only facilitates conformity to some arbitrary Entropy value dictated by an arbitrary expectation, e.g. Entropy 0 for any initial segment of $\pi$ of length $n$ (entailing an Entropy rate of 0 as well), because $\pi$ is deterministic and therefore no digit is surprising at all; or, alternatively, maximal Entropy, if Shannon Entropy is supposed to measure statistical randomness. Moreover, this misbehaviour has to do not with a lack of knowledge but with the lack of an invariance theorem, because $\pi$ is deterministically generated and hence its digits do not fundamentally reduce uncertainty. But if one assumes that the digits of $\pi$ are not stochastic in order to assign them a Shannon Entropy equal to zero, then one is forced to concede that even perfect statistical randomness, produced by a supposedly Borel-normal number, has, in objective terms, a Shannon Entropy (and Entropy rate) equal to zero, but the highest Shannon Entropy (and Entropy rate) from an observer perspective (as it will never be certain that the streaming digits are truly those of $\pi$). In other words, the asymptotic behaviour after taking into consideration more and more digits of $\pi$ approximates maximum Shannon Entropy, but $\pi$ itself has a Shannon Entropy of zero.
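A small observer-perspective illustration (a Wolfram Language one-liner; the number of digits is arbitrary): the empirical digit distribution of $\pi$ looks maximally random even though $\pi$ is deterministically generated:

N[Entropy[2, First[RealDigits[Pi, 2, 100000]]]]   (* close to 1 bit per binary digit *)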

IV.4 An algorithmic Maximum Entropy Model

Following the statistical mechanics approach bianconi2007entropy (), a typical recursively generated graph such as the ZK graph would, based on its degree sequence, be characterized as being typically random from the observer perspective–because Shannon Entropy will find the graph to be statistically random and thus just as random as any member of the set of all graphs with (near) maximal degree sequence Entropy–thus giving no indication of the actual recursive nature of the ZK graph and misleading the observer.

In contrast, the type of approach introduced in zenilgraph (), based upon trying to find clues to the recursive nature of an object such as a graph, would asymptotically find the causal nature of a recursively-generating object such as the ZK graph, independent of probability distributions, even if it is more difficult to estimate.

Rectifying the approaches based on models of maximum entropy involves updating and replacing the assumption of the maximum entropy ensemble. An example illustrating how to achieve this in the context of, e.g., a Bayesian approach, has been provided in algorithmiccognition1 () and consists in replacing the uninformative prior by the uninformative algorithmic probability distribution, the so-called Universal Distribution, as introduced by Levin levin (). The general approach has already delivered some important results algorithmiccognition2 () by, e.g., quantifying the degree of human cognitive randomness that previous statistical approaches and measures such as Entropy made it impossible to quantify. Animated videos have been made available explaining applications to graph complexity (https://youtu.be/E238zKsPCgk) and to cognition in the context of random generation tasks (https://youtu.be/E-YjBE5qm7c). A tool has also been placed online (http://complexitycalculator.com/) for sequences and arrays, and thus the reader can experiment with an actual numerical tool and explore the differences between the statistical and the algorithmic approaches.

V Conclusions

The methods introduced here allow the construction of 'Borel-normal pseudo-random graphs', uncomputable-number-based graphs and algorithmically produced graphs, while illustrating the shortcomings of computable graph-theoretic and Entropy-based approaches to graph complexity beyond arbitrary feature selection, and their failure when it comes to profiling randomness, and hence causal content (as opposed to randomness).

We have shown that Entropy is highly observer-dependent even in the face of full accuracy and access to lossless object descriptions and thus has to be complemented by measures of algorithmic content. We have produced specific complexity-deceiving graphs for which Entropy retrieves disparate values when an object is described differently (thus with different underlying distributions), even when the descriptions reconstruct exactly the same, and only the same, object. This drawback of Shannon Entropy, ultimately related to its dependence on distribution, is all the more serious because it is easily overlooked in the case of objects other than strings, for instance, graphs. For an object such as a graph, we have shown that changing the descriptions may not only change the values but actually produce divergent, contradictory values.

We constructed a graph, ZK, about which the following is true when it is described by its adjacency matrix $A(ZK)$: the Entropy of $A(ZK)$ tends to its minimum for growing graph size (the edge density vanishes in the limit). Contradictorily, considering the degree sequence $D(ZK)$ of the same ZK graph, we found that its Entropy tends to its maximum for the same growth rate, even though both $A(ZK)$ and $D(ZK)$ are lossless descriptions of the same graph that reconstruct exactly the same ZK graph, and only a ZK graph.

This means that not only does one need to choose a description of interest in order to apply a definition of Entropy, such as the adjacency matrix of a network (or its incidence or Laplacian) or its degree sequence, but that as soon as the choice is made, Entropy becomes a trivial counting function of the specific feature of interest, and of that feature alone. In the case of, for example, the adjacency matrix of a network (or any related matrix associated with the graph, such as the incidence or Laplacian matrices), Entropy becomes a function of edge density, while for degree sequence, Entropy becomes a function of sequence normality. Entropy can thus trivially be replaced by such functions without any loss, but it cannot be used to profile the object (randomness, or information content) in any way independent of an arbitrary feature of interest.

These results and observations have far-reaching consequences. For example, recent literature appears contradictory, by turns suggesting that cancer cells display an increase in Entropy teschendorff (), and also reporting that cancer cells display a decrease in Entropy west (), in both cases applied to a function of degree distribution over networks of molecular interactions. Cells are also believed to be in a state of criticality between evolvability and robustness aldana (); csermely () that may make them look random though they are not. This means that Entropy may be overestimating randomness in the best case or misleading in the worst case, as we have found in the instance of disparate values for the same objects, thus suggesting that additional safeguards are needed to achieve consistency and soundness.

New developments zenilgraph (); zenilmethodsbiology () promise more robust complementary measures of (graph) complexity less dependent on object description, measures based upon the mathematical theory of randomness and algorithmic probability which are better equipped to profile causality and algorithmic information content and cover statistical randomness and thus can be considered an observer-improved generalization of Shannon Entropy.

References

  • (1) Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. science, 286(5439):509–512, 1999.
  • (2) Ginestra Bianconi. The entropy of randomized network ensembles. EPL (Europhysics Letters), 81(2):28005, 2007.
  • (3) S. Boccaletti et al. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1–122, 2014.
  • (4) E Borel. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo (1884-1940), 27(1):247–271, 1909.
  • (5) Cristian S Calude, Michael J Dinneen, Chi-Kou Shu, et al. Computing a glimpse of randomness. Experimental Mathematics, 11(3):361–370, 2002.
  • (6) Gregory J Chaitin. On the length of programs for computing finite binary sequences. Journal of the ACM (JACM), 13(4):547–569, 1966.
  • (7) David G Champernowne. The construction of decimals normal in the scale of ten. Journal of the London Mathematical Society, 1(4):254–260, 1933.
  • (8) Zengqiang Chen, Matthias Dehmer, Frank Emmert-Streib, and Yongtang Shi. Entropy bounds for dendrimers. Applied Mathematics and Computation, 242:462–472, 2014.
  • (9) Peter Csermely et al. Cancer stem cells display extremely large evolvability: alternating plastic and rigid networks as a potential mechanism. Seminars in Cancer Biology, 30:42–51, 2015.
  • (10) Matthias Dehmer, Stephan Borgert, and Frank Emmert-Streib. Entropy bounds for hierarchical molecular networks. PLoS One, 3(8):e3079, 2008.
  • (11) Matthias Dehmer and Abbe Mowshowitz. A history of graph entropy measures. Information Sciences, 181(1):57–78, 2011.
  • (12) Paul Erdos and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.
  • (13) Ernesto Estrada. Quantifying network heterogeneity. Physical Review E, 82(6):066102, 2010.
  • (14) Ernesto Estrada, A José, and Naomichi Hatano. Walk entropies in graphs. Linear Algebra and its Applications, 443:235–244, 2014.
  • (15) Orsini et al. Quantifying randomness in real networks. Nature Communications, 6:8627, 2015.
  • (16) H. Kim, C. I. Del Genio, K. E. Bassler, and Z. Toroczkai. Degree-based graph construction. New Journal of Physics, 14:023012, 2012.
  • (17) Hyunju Kim, Zoltán Toroczkai, Péter L Erdős, István Miklós, and László A Székely. Degree-based graph construction. Journal of Physics A: Mathematical and Theoretical, 42(39):392001, 2009.
  • (18) Andrei Nikolaevich Kolmogorov. Three approaches to the quantitative definition of information*. International Journal of Computer Mathematics, 2(1-4):157–168, 1968.
  • (19) Janos Korner and Katalin Marton. Random access communication and graph entropy. IEEE transactions on information theory, 34(2):312–314, 1988.
  • (20) Leonid A Levin. Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Problemy Peredachi Informatsii, 10(3):30–35, 1974.
  • (21) Ming Li and Paul Vitányi. An introduction to Kolmogorov complexity and its applications. Springer Science & Business Media, 2009.
  • (22) Guoxiang Lu, Bingqing Li, and Lijia Wang. Some new properties for degree-based graph entropies. Entropy, 17(12):8217–8227, 2015.
  • (23) Per Martin-Löf. The definition of random sequences. Information and control, 9(6):602–619, 1966.
  • (24) Ron Milo, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
  • (25) Abbe Mowshowitz. Entropy and the complexity of graphs: I. an index of the relative complexity of a graph. The bulletin of mathematical biophysics, 30(1):175–204, 1968.
  • (26) Abbe Mowshowitz and Matthias Dehmer. Entropy and the complexity of graphs revisited. Entropy, 14(3):559–570, 2012.
  • (27) Dipendra C Sengupta and Jharna D Sengupta. Application of graph entropy in CRISPR and repeats detection in DNA sequences. Computational Molecular Bioscience, 6(03):41, 2016.
  • (28) C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656, 1948.
  • (29) Shai S Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. Network motifs in the transcriptional regulation network of Escherichia coli. Nature genetics, 31(1):64–68, 2002.
  • (30) Gábor Simonyi. Graph entropy: a survey. Combinatorial Optimization, 20:399–441, 1995.
  • (31) Ricard V Solé and Sergi Valverde. Spontaneous emergence of modularity in cellular networks. Journal of The Royal Society Interface, 5(18):129–133, 2008.
  • (32) Fernando Soler-Toscano, Hector Zenil, Jean-Paul Delahaye, and Nicolas Gauvrit. Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines. PloS one, 9(5):e96223, 2014.
  • (33) Ray J Solomonoff. A formal theory of inductive inference. part i. Information and control, 7(1):1–22, 1964.
  • (34) Andrew E Teschendorff and Simone Severini. Increased entropy of signal transduction in the cancer metastasis phenotype. BMC Systems Biology, 4(104), 2010.
  • (35) C. Torres-Sosa, S. Huang, and M. Aldana. Criticality is an emergent property of genetic networks that exhibit evolvability. PLoS Comput Biol, 8(9):e1002669, 2012.
  • (36) Ernesto Trucco. A note on the information content of graphs. Bulletin of Mathematical Biology, 18(2):129–135, 1956.
  • (37) James West, Simone Severini, Ginestra Bianconi, and Andrew E. Teschendorff. Differential network entropy reveals cancer system hallmarks. Scientific Reports, 2(802), 2012.
  • (38) Hector Zenil. Small data matters, correlation versus causation and algorithmic data analytics. Predictability in the world: philosophy and science in the complex world of Big Data, Springer Verlag (forthcoming), 2013.
  • (39) Hector Zenil, Narsis A Kiani, and Jesper Tegnér. Methods of information theory and algorithmic complexity for network biology. Seminars in cell & developmental biology, 51:32–43, 2016.
  • (40) Hector Zenil, Fernando Soler-Toscano, Jean-Paul Delahaye, and Nicolas Gauvrit. Two-dimensional kolmogorov complexity and an empirical validation of the coding theorem method by compressibility. PeerJ Computer Science, 1:e23, 2015.
  • (41) Hector Zenil, Fernando Soler-Toscano, Kamaludin Dingle, and Ard A Louis. Correlation of automorphism group size and topological properties with program-size complexity evaluations of graphs and complex networks. Physica A: Statistical Mechanics and its Applications, 404:341–358, 2014.
  • (42) Hector Zenil, Francisco Soler-Toscano, Narsis A. Kiani, Santiago Hernández-Orozco, and Antonio Rueda-Toicen. A decomposition method for global evaluation of Shannon entropy and local estimations of algorithmic complexity. arXiv:1609.00110 [cs.IT], 2016.
  • (43) Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE transactions on Information Theory, 24(5):530–536, 1978.
  • (45) N. Gauvrit, H. Zenil, and J. Tegnér. The Information-theoretic and Algorithmic Approach to Human, Animal and Artificial Cognition. Springer Verlag (in press).
  • (46) N. Gauvrit, H. Zenil, F. Soler-Toscano, J.-P. Delahaye, and P. Brugger. Human behavioral complexity peaks at age 25. PLoS Comput Biol, 13(4):e1005408, 2017.