Low-Algorithmic-Complexity Entropy-Deceiving Graphs
Abstract
In estimating the complexity of objects, in particular of graphs, it is common practice to rely on graph- and information-theoretic measures. Here, using integer sequences with properties such as Borel normality, we explain how these measures are not independent of the way in which an object, such as a graph, can be described or observed. From observations that can reconstruct the same graph and are therefore essentially translations of the same description, we will see that when applying a computable measure such as Shannon Entropy, not only is it necessary to pre-select a feature of interest where there is one, and to make an arbitrary selection where there is not, but also that more general properties, such as the causal likelihood of a graph as a measure (as opposed to randomness), can be largely misrepresented by computable measures such as Entropy and Entropy rate. We introduce recursive and non-recursive (uncomputable) graphs and graph constructions based on these integer sequences, whose different lossless descriptions have disparate Entropy values, thereby enabling the study and exploration of a measure’s range of applications and demonstrating the weaknesses of computable measures of complexity.
I. The use of Shannon Entropy in network profiling
One of the major challenges in modern physics is to provide proper and suitable representations of network systems for use in fields ranging from physics [boccaletti] to chemistry [chen2014entropy]. A common problem is the description of order parameters with which to characterize the ‘complexity of a network’. Graph complexity has traditionally been characterized using graph-theoretic measures such as degree distribution, clustering coefficient, edge density, and community or modular structure.
More recently, networks have also been characterized using classical information theory. One problem in this area is the interdependence of many graph-theoretic properties, which makes measures more sophisticated than single-property measurements [orsini] difficult to come by. The standard way to address this is to generate graphs that have a certain specific property while being random in all other aspects, in order to check whether or not the property in question is typical among an ensemble of graphs with otherwise seemingly different properties.
Approaches that use Shannon-Entropy-based measures to quantify the information content of a network [bianconi2007entropy], as an indication of its ‘typicality’, rest on an assumption about the ensemble associated with the Entropy evaluation: the more random, the more typical. The claim is that one can construct a “null model” that captures some aspects of a network (e.g. graphs that have the same degree distribution) and see how different the network is from the null model as regards particular features, such as clustering coefficient, graph distance, or other features of interest. The procedure aims at producing an intuition of an ensemble of graphs that are assumed to have been sampled uniformly at random from the set of all graphs with the same property, in order to determine whether such a property occurs with high or low probability. If the graph is not statistically different from the null model, then the graph is claimed to be as “simple” as the null model; otherwise, the measure is claimed to be a lower bound on the “complexity” of the graph, as an indication of its random versus causal nature.
Here we highlight some serious and often neglected limitations of these approaches, and provide pointers to approaches that are complementary to Shannon Entropy, which can partially circumvent some of the aforesaid limitations by combining Entropy with a measure of local algorithmic complexity that better captures the recursive, and thus causal, properties of an object (in particular a network) beyond its statistical properties.
One of the most popular applications of Entropy is to graph degree distribution, as first suggested and introduced in [korner1988random]. Similar approaches have been adopted in areas such as chemical graph theory and computational systems biology [dehmer2008entropy], as functions of layered graph degree distribution under certain layered coarse-graining operations (sphere covers), leading to the hierarchical application of Entropy, a version of graph-traversal Entropy rate. In chemistry, for example, Shannon Entropy over a function of degree sequence has been used as a profiling tool to characterize (so it is claimed) molecular complexity.
While the application of Entropy to graph degree distributions has been relatively more common, the same Entropy has also been applied to other graph features, such as functions of their adjacency matrices [estrada2014walk], and to distance and Laplacian matrices [Dehmer3].
Even more recently, Shannon Entropy on adjacency matrices was used to attempt the discovery of CRISPR regions, in an interesting transformation of DNA sequences into graphs [sengupta2016application]. A survey contrasting adjacency-matrix-based (walk) entropies and other entropies (e.g. on degree sequence) is offered in [estrada2014walk]. It finds that adjacency-based entropies are more robust vis-à-vis graph size and are correlated to graph algebraic properties, as these are also based on the adjacency matrix (e.g. graph spectrum).
Finally, hybrid measures have been used, such as the graph heterogeneity index [Estrada2] as a function of degree sequence and the Laplacian matrix, where some of the limitations of quantifying only the diversity of the degree distribution, i.e. its Entropy (or of any graph measure as a function of the Entropy of the degree distribution), have been identified.
It is thus of the greatest interest to researchers in physics, chemistry and biology to understand the reach, limits and interplay of measures of Entropy, in particular as applied to networks, and likewise to understand how unserviceable Entropy can be as a measure of randomness, complexity or information content when the aim is to extract causal content, as opposed to randomness. The use of Entropy has nevertheless spread, because its numerical calculation is computationally very cheap compared to richer, but harder-to-approximate, universal measures of complexity that are better qualified to capture more general properties of graphs. Some of these properties are related to the nature of the graph-generating mechanisms, which is what most of the previously utilized measures were supposed to quantify in the first place, in one way or another, from the introduction of the first random graph model by Erdős and Rényi [erdos1960evolution] to the most popular models such as ‘scale-freeness’ [barabasi], and more recent ones such as network randomness typicality [bianconi2007entropy].
II. Notation and basic definitions
Definition II.1.
A graph $G = (V, E)$ is an ordered pair comprising a set $V$ of nodes or vertices and a set $E$ of edges or links, which are 2-element subsets of $V$.
Definition II.2.
A graph is labelled when the vertices are distinguished by labels $v_1, v_2, \ldots, v_n$, with $n = |V|$ the cardinality of the set $V$.
Definition II.3.
Graphs $G$ and $H$ are isomorphic if there is a bijection $f$ between the vertex sets of $G$ and $H$, such that any two vertices $u$ and $v$ are adjacent in $G$ if and only if $f(u)$ and $f(v)$ are adjacent in $H$.
Definition II.4.
The degree of a node $v$, denoted by $d(v)$, is the number of (both incoming and outgoing) links to other nodes, and the degree sequence of $G$ is the unordered list of all $d(v)$ for $v \in V$.
Definition II.5.
An ER graph $G(n, p)$ is a graph of size $n$ constructed by connecting nodes randomly with probability $p$, independent of every other edge.
Usually ER graphs are assumed to be non-recursive (i.e. truly random), but ER graphs can be constructed recursively using pseudo-random generating algorithms.
II.1 Graph Entropy
One of the main objectives behind the application of Shannon Entropy is the characterization of the randomness or ‘information content’ of an object such as a graph. Here we introduce graphs with interesting deceptive properties, particularly disparate Entropy (rate) values for the same object when looked at from different perspectives, revealing the inadequacy of classical informationtheoretic approaches to graph complexity.
Central to information theory is the concept of Shannon’s information Entropy, which quantifies the average number of bits needed to store or communicate the statistical description of an object.
Consider an ensemble $X(R, p(x_i))$, where $R$ is the set of possible outcomes (the random variable) and $p(x_i)$ is the probability of an outcome $x_i \in R$. The Shannon Entropy of $X$ is then given by
Definition II.6.
$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$   (1)
This implies that in order to calculate $H(X)$ one has to know or assume the probability mass distribution of the ensemble $X$. One caveat regarding Shannon’s Entropy is that one is forced to make an arbitrary choice regarding granularity. Take for example the bit string 01010101010101. The Shannon Entropy of the string at the level of single bits is maximal, as there are the same number of 1s and 0s, but the string is clearly regular when 2-bit (non-overlapping) blocks are taken as basic units, in which instance the string has minimal complexity because it contains only 1 symbol (01) out of 4 possible ones (00, 01, 10, 11). A generalization consists in taking into consideration all possible “granularities”, i.e. the Entropy rate:
Definition II.7.
Let $X = x_1 x_2 \ldots$ be a sequence and let $P(x_1, \ldots, x_n)$ denote the joint probability over blocks of $n$ consecutive symbols. Let the Shannon Entropy rate [shannon] (also known as granular Entropy, or $n$-gram Entropy) of a block of $n$ consecutive symbols, denoted by $H_n$, be:
$H_n = -\sum_{x_1, \ldots, x_n} P(x_1, \ldots, x_n) \log_2 P(x_1, \ldots, x_n)$   (2)
Thus, to determine the Entropy rate of the sequence, we estimate the limit of $H_n / n$ when $n \to \infty$. It is not hard to see, however, that $H_n$ will diverge as $n$ tends to infinity if the number of symbols increases, but when applied to a binary string it will reach a minimum at the granularity for which a statistical regularity is revealed.
The Shannon Entropy [shannon] of an object is simply $H_n$ for fixed block size $n = 1$, so we can drop the subscript and write $H$.
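The granularity dependence described above can be made concrete with a short script (an illustrative sketch, not from the original text; the function name `block_entropy` is ours), computing the per-block Entropy of the string 01010101010101 at block sizes 1 and 2:

```python
from collections import Counter
from math import log2

def block_entropy(s, n):
    """Shannon Entropy over non-overlapping blocks of length n."""
    blocks = [s[i:i + n] for i in range(0, len(s) - n + 1, n)]
    total = len(blocks)
    return -sum((c / total) * log2(c / total) for c in Counter(blocks).values())

s = "01" * 7  # the string 01010101010101
print(block_entropy(s, 1))  # maximal: 1.0, as many 0s as 1s
print(block_entropy(s, 2))  # minimal: only the block "01" ever occurs
```

The same string is thus maximally random at one granularity and maximally regular at another.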
We can define the Shannon Entropy of a graph $G$, with respect to a feature $F$, by:
Definition II.8.
$H(G, F) = -\sum_{i=1}^{n} P(F(G)_i) \log_2 P(F(G)_i)$   (3)
where $P$ is a probability distribution over $F(G)$, and $F$ is a feature of interest of $G$, e.g. edge density, degree sequence, number of over-represented subgraphs/graphlets (graph motifs), and so on. When $P$ is the uniform distribution (every graph of the same size is equally likely), it is usually omitted as a parameter of $H$.
The most common applications of Entropy to graphs are to degree-sequence distribution and edge density (adjacency matrix), which are labelled graph invariants. In molecular biology, for example, a common application of Entropy is to count the number of ‘branchings’ [mowshowitz] per node by, e.g., randomly traversing a graph starting from a random point. The more extensive the branching, the greater the uncertainty of a graph’s path being traversed in a unique fashion, and the higher the Entropy. Thorough surveys of graph Entropy are available in [mowshowitz, simonyi, mowshowitz2], so we will avoid providing yet another one. In most, if not all, of these applications of Entropy, very little attention is paid to the fact that Entropy can lead to completely disparate results depending on the way in which the same objects of study are described, that is, to the fact that Entropy is not a graph invariant (for either labelled or unlabelled graphs) vis-à-vis object description, a major drawback for a complexity measure [zenildata, zenilbdm] of typicality, randomness, and causality. In the survey [mowshowitz], it is suggested that there is no ‘right’ definition of Entropy. Here we formally confirm this to be the case in a fundamental sense.
Indeed, Entropy requires the pre-selection of a graph invariant, but it is itself not a graph invariant. This is because ignorance of the probability distribution makes Entropy necessarily dependent on the chosen description of the graph, there being no such thing as an Invariance theorem [solomonoff, kolmogorov, chaitin] in Shannon Entropy to provide a convergence of values independent of description language, as there is in algorithmic information theory for algorithmic (Kolmogorov-Chaitin) complexity.
Definition II.9.
The algorithmic complexity $K(s)$ of an object $s$ is the length of its shortest computational description (computer program) in a reference language (of which it is independent), such that the shortest generating computer program fully reconstructs $s$ [solomonoff, kolmogorov, chaitin, levin].
III. Construction of Entropy-deceiving graphs
If we can show that Entropy can be artificially fooled, we will have shown how Entropy may fail to characterize natural or socially occurring networks, especially because, as we will demonstrate, different values of Shannon Entropy can be retrieved for the same graph as functions of different features of interest of said graph. There is thus no such thing as ‘the Shannon Entropy of a graph’, but rather ‘the Shannon Entropy of an identified property of a graph’, which can easily be replaced by a function that simply quantifies such a property directly.
III.1 Entropy of pseudo-random graphs
By using integer sequences, in particular Borel-normal irrational numbers, one can construct pseudo-random graphs, which can in turn be used to construct networks.
Definition III.1.
A real number $r$ is said to be normal in a given base if all $n$-tuples in $r$’s digital expansion are equally likely, and it is thereby of natural maximal-order Entropy rate by definition of Borel normality.
For example, the mathematical constant $\pi$ is believed to be an absolutely Borel normal number (Borel normal in every base), and so one can take the digits of $\pi$ in any base $b$ and use them as the entries of a graph adjacency matrix of size $n \times n$ by taking consecutive segments of $n^2$ digits. The resulting graph will have $n$ nodes and an edge density of about 0.5, because the occurrence of 1 or 0 in the binary expansion of $\pi$ has probability 0.5 (the same as in decimals after transformation of each digit to 0 if the digit is less than 5 and 1 otherwise, or, in general for any base $b$, to 0 if less than $b/2$ and 1 otherwise), thus complying with the definition of an Erdős-Rényi (ER) graph (albeit of high density).
As theoretically predicted and numerically demonstrated in Fig. 1(A and B), the degree distribution will approximate a normal distribution around $n/2$. This means that the graph adjacency matrix will have maximal Entropy (if $\pi$ is Borel normal) but low degree-sequence Entropy, because all values are around $n/2$ and they do not span all the possible node degrees (in particular, low degrees). This means that algorithmically constructing a graph can give rise to an object with a different Entropy when the feature of interest of the said graph is changed.
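This disparity can be checked numerically. The following sketch (ours; it uses a seeded pseudo-random bit stream as a stand-in for the binary digits of a Borel-normal number such as $\pi$, and treats row sums as degrees for simplicity) compares the Entropy of the adjacency entries, which is near-maximal, with the Entropy of the degree sequence, which stays well below its maximum of $\log_2 n$ because degrees concentrate around $n/2$:

```python
import random
from collections import Counter
from math import log2

def entropy(seq):
    """Shannon Entropy (base 2) of the empirical distribution of seq."""
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in Counter(seq).values())

random.seed(0)  # seeded PRNG bits stand in for a Borel-normal digit stream
n = 64
bits = [random.randint(0, 1) for _ in range(n * n)]
adj = [bits[i * n:(i + 1) * n] for i in range(n)]  # n x n adjacency matrix
degrees = [sum(row) for row in adj]  # ~ Binomial(n, 1/2), centred on n/2

print(entropy(bits))     # close to 1 bit/entry, maximal for a binary alphabet
print(entropy(degrees))  # well below log2(n) = 6: degrees cluster around 32
```

The very same graph thus looks maximally random under its adjacency-matrix description and far from maximally random under its degree-sequence description.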
A graph does not have to be of low algorithmic complexity to yield incompatible observer-dependent Entropy values. One can take the digits of a Chaitin $\Omega$ number (the halting probability of an optimal Turing machine with prefix-free domain), whose digits are uncomputable. In Fig. 1(C) we show a graph based on the first 64 digits of a Chaitin $\Omega$ number [calude], thus a highest-algorithmic-complexity graph in the long run (it is ultimately uncomputable). Since algorithmic randomness implies Borel normality [martinlof], the adjacency matrix has maximal Entropy, but for the same reasons as obtain in the case of the $\pi$ graphs, it will have low degree-sequence Entropy. For algorithmic complexity, in contrast, as we will see in Theorem III.6, all graphs have the same algorithmic complexity regardless of their (lossless) descriptions (e.g. adjacency matrix or degree sequence), as long as the same and only the same graph (up to an isomorphism) can be reconstructed from these descriptions.
One can also start from completely different graphs. For example, Fig. 2 shows how Shannon Entropy applied directly to the adjacency matrix is a function of edge density alone, with the same Entropy values retrieved for graphs despite their very different (dis)organization.
The Entropy rate will be low for the regular antelope graph, and higher, but still far removed from randomness, for the ER graph, because by definition the degree-sequence variation of an ER graph is small. In scale-free graphs, however, the degree distribution is artificially scaled, spanning a large number of different degrees as a function of the number of connected edges per added node, and resulting in an overestimation of their degree-sequence Entropy, as can be numerically verified in Fig. 3. Degree-sequence Entropy points in the opposite direction to the entropic estimation of the same graphs arrived at by looking at their adjacency matrices, when in reality scale-free networks produced by, e.g., Barabási-Albert preferential attachment [barabasi] are recursive (algorithmic and deterministic, even if probabilities are involved), as opposed to the ER construction, which is built (pseudo-)randomly. The Entropy of the degree sequence of scale-free graphs would suggest that they are almost as random as, or even more random than, ER graphs of exactly the same edge density. To circumvent this, ad-hoc measures of modularity have been introduced [sole] precisely to capture how far removed a graph is from ‘scale-freeness’, by comparing any graph to a scale-free randomized version of itself, thereby compelling consideration of a pre-selected feature of interest (‘scale-freeness’).
Furthermore, an ER graph may or may not be recursively (algorithmically) generated, and so its Shannon Entropy has no connection to the causal, algorithmic information content of the graph; it can only provide clues for low-Entropy graphs that can be characterized by other graph-theoretic properties, without need of an entropic characterization.
III.2 A low-complexity, high-Entropy graph
We introduce a method to build a family of recursive graphs with maximal Entropy but low algorithmic complexity, hence graphs that appear statistically random but are of low algorithmic randomness and are thus causally (recursively) generated. Moreover, these graphs may have maximal Entropy for some lossless descriptions but minimal Entropy for other lossless descriptions of exactly the same objects, with both descriptions characterizing the same object and only that object, thereby demonstrating how Entropy fails to unequivocally and unambiguously characterize a graph independent of a particular feature of interest. We denote by ‘ZK’ the graph (unequivocally) constructed as follows:

1. Let $G$ be a starting graph connecting a node with label 1 to a node with label 2. If a node with label $n$ has degree $n$, we call it a core node; otherwise, we call it a supportive node.

2. Iteratively add a node to $G$ such that the number of core nodes in $G$ is maximized. The resulting graph is typified by the one in Fig. 4.
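The two steps above can be implemented directly. The following Python sketch (our translation of the procedure, mirroring the Wolfram Language program given with Theorem III.4; all names are ours) grows the graph by repeatedly promoting the node labelled $m + 1$, where $m$ is the current maximum degree, to a core node:

```python
from collections import defaultdict

def zk_graph(steps):
    """Grow the ZK graph. Start from the single edge (1, 2); at each step,
    let m be the current maximum degree and promote the node labelled
    t = m + 1 to a core node (degree t) by connecting it to the
    consecutively labelled nodes t + 1, t + 2, ... (created if absent)."""
    deg = defaultdict(int)
    edges = [(1, 2)]
    deg[1] = deg[2] = 1
    for _ in range(steps):
        t = max(deg.values()) + 1
        for v in range(t + 1, t + 1 + (t - deg[t])):
            edges.append((t, v))
            deg[t] += 1
            deg[v] += 1
    return edges, deg

edges, deg = zk_graph(10)
# Core nodes have degree equal to their label; reading off their degrees
# spells out the digits of the Champernowne constant: 1234567891011...
print("".join(str(deg[i]) for i in range(1, 12)))
```

After 10 steps the nodes labelled 1 through 11 are core nodes, and their degree sequence already spells out the first digits of the Champernowne constant discussed below.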
III.3 Properties of the ZK graph
The degree sequence of the labelled nodes is given by the digits of the Champernowne constant [champernowne] in base 10, a transcendental real number whose decimal expansion is Borel normal [borel], constructed by concatenating the representations of the successive integers 1, 2, 3, 4, 5, ..., whose digits are the labelled node degrees of the graph over successive iterations (sequence A033307 in the OEIS).
The sequence of edge counts is a recurrence relation built upon previous iteration values between core and supportive nodes, involving the golden ratio $\varphi$ and the floor function $\lfloor \cdot \rfloor$ (sequence A183136 in the OEIS), whose values are 1, 2, 4, 7, 10, 14, 18, 23, 29, 35, 42, 50, 58, 67, 76, 86, 97, 108, 120, 132, 145, ...
Definition III.2.
ZK$_m$ is a graph with at least one node with degree $m$, where $m \geq 1$.
The notation ZK$_{m,t}$ has been used where we want to emphasize the number $t$ of generation or time steps in the process of constructing ZK$_m$. The symbol $m$ denotes the maximum degree of the graph. Nodes in the ZK graph belong to 2 types: core and supportive nodes.
Definition III.3.
A node $v_n$ with label $n$ is a core node iff $d(v_n) = n$. Otherwise it is a supportive node.
Theorem III.1.
To convert ZK$_m$ to ZK$_{m+1}$, we need to add 2 supportive nodes to ZK$_m$ if $m$ is odd, or one supportive node if $m$ is even.
Proof.
By induction:
The basis: ZK$_3$ has 3 core nodes, denoted by $v_1, v_2, v_3$, and 2 supportive nodes, denoted by $v_4, v_5$. As described in the construction procedure, to convert ZK$_3$ to ZK$_4$ we choose a supportive node with maximum degree. Here, since both supportive nodes have degree one, we need to connect the chosen node to 3 other supportive nodes. As we have only one left, we need to add 2 new supportive nodes. Now the graph has 3 supportive nodes, 2 of them new and one old. The old one is of degree 2, and we need to convert its degree to 5; we have 2 other supportive nodes left, so we need one new supportive node. Therefore, the assumption is true for $m = 3$ and $m = 4$ (the basis).
Inductive step: We now show that if the assumption is true for $m$, then it is true for $m + 1$.
We consider 2 cases:

1. $m$ is odd

2. $m$ is even
Case one: If $m$ is odd then $m - 1$ is even, which means that we have added one supportive node with degree one, and to convert ZK$_m$ to ZK$_{m+1}$ we need to obtain a core node with degree $m + 1$. We have only one supportive node which is not connected to the core candidate node, which implies that the core candidate node will fall short of the required degree, and we would need to add 2 extra supportive nodes to our graph.
Case two: If $m$ is even then $m - 1$ is odd, and therefore ZK$_m$ has 2 supportive nodes with degree one (they have only been connected to the last core node). So we would need to add only one node to convert the supportive node with maximum degree to a core node with degree $m + 1$. ∎
Corollary III.1.

1. if $m$ is odd, then the number of nodes of ZK$_{m+1}$ exceeds that of ZK$_m$ by 2;

2. if $m$ is even, it exceeds it by 1.
Theorem III.2.
For every degree value $d$, there is a maximum of 3 nodes with degree $d$ in ZK.
Proof.
By induction:
The basis: The assumption is true for ZK$_3$.
Inductive step: If we assume that there is a maximum of 3 nodes with any given degree in ZK$_m$, then there is a maximum of 3 nodes with any given degree in ZK$_{m+1}$.
The proof is direct, using Theorem III.1. To generate ZK$_{m+1}$ we add a maximum of 2 supportive nodes. These nodes have degree one, and there is no node with degree one except the first core node (the core node with degree 1). Thus we have a maximum of 3 nodes with degree one. The degree of all other supportive nodes will be increased by one which, based on the hypothesis of induction, has not been repeated more than 3 times. ∎
Theorem III.3.
ZK is of maximal degree-sequence Entropy.
Proof.
The degree sequence of the ZK graph can be divided into 2 parts:

1. A dominating degree subsequence associated with the core nodes (always longer than the subsequence of supportive nodes), generated by concatenating the successive integers 1, 2, 3, 4, ..., which produces the digits of the Champernowne constant, which is Borel normal [borel, champernowne].

2. A second degree subsequence, associated with the supportive nodes, whose values do not repeat more than 3 times and which therefore, by Theorem III.2, has a maximal-order Entropy rate for some granularities and a high Entropy rate for the others.

Therefore, the degree sequence of ZK is asymptotically of maximal Entropy rate. ∎
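The near-maximal Entropy of the dominating (Champernowne) subsequence can be checked empirically in a few lines (an illustrative sketch; digit frequencies over a finite prefix only approximate the asymptotic value):

```python
from collections import Counter
from math import log2

def entropy(seq):
    """Shannon Entropy (base 2) of the empirical distribution of seq."""
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in Counter(seq).values())

# Prefix of the base-10 Champernowne digit sequence: concatenate 1, 2, 3, ...
digits = "".join(str(i) for i in range(1, 2000))
print(entropy(digits))  # close to log2(10) ~ 3.32, the maximum for 10 symbols
```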
Theorem III.4.
The ZK graph is of low algorithmic (Kolmogorov-Chaitin) complexity.
Proof.
By demonstration: a computer program generating the ZK graph, written in the Wolfram Language, is:
AddEdges[graph_] := EdgeAdd[graph,
  Rule @@@ Distribute[{Max[VertexDegree[graph]] + 1,
    Table[i, {i, Max[VertexDegree[graph]] + 2,
      (Max[VertexDegree[graph]] + 1) + (Max[VertexDegree[graph]] + 1) -
        VertexDegree[graph, Max[VertexDegree[graph]] + 1]}]}, List]]
The graph can be constructed recursively for any number of nodes by nesting the AddEdges[] function as follows:
Nest[AddEdges, Graph[{1 -> 2}], n]
starting from the graph defined by the single edge 1 -> 2 as initial condition.
The length of Nest with AddEdges and the initial condition, in bytes, is an upper bound on the algorithmic complexity of ZK, which grows only by the size of the integer $n$ (i.e. by $O(\log n)$ bits) and is therefore of low algorithmic randomness. ∎
We now show that we can fully reconstruct ZK from its degree sequence. As we know that we can also reconstruct ZK from its adjacency matrix, we therefore have it that both are lossless descriptions from which ZK can be fully reconstructed and for which Entropy provides contradictory values depending on the feature of interest.
Theorem III.5.
For all $n$, all instances of ZK$_n$ are isomorphic.
Proof.
The only degree of freedom in the graph reconstruction is the selection of a supportive node to convert to a core node when there are several supportive nodes of maximal degree. As has been proven in Theorem III.1, the number of nodes added to the graph is independent of which supportive node is selected for conversion to a core node. In any two instances of the graph the numbers of nodes and edges are equal, and it is clear that by mapping the node selected at each step in one instance to the node selected at the corresponding step in the other instance, we obtain a mapping $f$ such that $f$ is a bijection (both one-to-one and onto). ∎
Finally, we prove that all isomorphic graphs have about the same (e.g. low) algorithmic complexity:
Theorem III.6.
Let $G'$ be a graph isomorphic to $G$. Then $|K(G) - K(G')| < c$ for a constant $c$ and for all $G' \in \mathrm{Aut}(G)$, where $\mathrm{Aut}(G)$ is the automorphism group of $G$.
Proof.
The idea is that if there were a significantly shorter program for generating $G'$ compared to a program generating $G$, we could use $G'$ to generate $G$ via $G'$ plus a relatively short program that, e.g., tries all permutations and checks for isomorphism. Let’s assume that there exists a program $q$ generating $G'$ such that $|q| \ll |p|$ for any program $p$ generating $G$, i.e. the difference is not bounded by any constant. We can then replace $p$ by $q$ to generate $G$, such that $K(G) \leq |q| + c$, where $c$ is a constant independent of $G$ that represents the size of the shortest program that generates $G$ given any $G' \in \mathrm{Aut}(G)$. Then we have $K(G) - K(G') \leq c$, which is contrary to the assumption. ∎
III.4 Degree-sequence-targeted Entropy-deceiving graph construction
Taking advantage of the correlation between 2 variables $X$ and $Y$ (starting independently) with the same probability distribution, let $M$ be a matrix with rows normalized to 1, and consider the random variables obtained by mixing $X$ and $Y$ through the rows of $M$. The correlation between the resulting variables is just the inner product between the two rows of $M$. This can be used to generate a degree distribution of a graph with any particular Entropy, provided the resulting degree sequence complies with, or is completed according to, the necessary and sufficient conditions for building a graph [kim, kim2].
IV. Graph Entropy versus Graph Algorithmic Complexity
The ensemble of graphs compatible with the ZK graph, as regards the Entropy of its degree distribution, thus consists of the set of networks that have near-maximal degree-sequence Entropy, as the degree distribution is uninformative (nearly every degree appears only once) and does not reduce statistical uncertainty, despite the algorithmic nature of the ZK graph (and assuming one does not know that the graph is deterministically generated, a reasonable assumption of ignorance characteristic of the general observer in a typical, realistic case). The size of the ensemble is thereby close to the number of permutations of the elements of the degree distribution of the ZK graph, constrained by the number of sequences that can actually construct a graph [kim, kim2]. This means that, without loss of generality, any Entropy-based measure (in this case applied to the degree sequence) will be misleading, assigning high randomness via a large ensemble of equally high Entropy values when ZK is in fact a simple recursive graph, thereby illustrating the limits of classical information theory for graph profiling.
IV.1 Algorithmic complexity invariance vis-à-vis full object description
While this paper does not focus on alternatives to graph Entropy, alternative and complementary directions for exploring robust (if semi-computable) approaches to graph complexity have been introduced in [zenilgraph], together with numerical methods showing that one can robustly define the algorithmic complexity (even when semi-computable) not only of labelled graphs, more independently of description language, but also of unlabelled graphs, as set forth in [zenilmethodsbiology]. In particular:
Definition IV.1.
Algorithmic complexity of unlabelled graphs: Let $d(G)$ be a lossless description of $G$ and $\mathrm{Aut}(G)$ its automorphism group. Then the algorithmic complexity of the unlabelled graph is the minimum of $K(d(G'))$ over all $G' \in \mathrm{Aut}(G)$,
where $K$ is the algorithmic (Kolmogorov-Chaitin) complexity of the graph as introduced in [zenilgraph, zenilmethodsbiology] (the length of the shortest computer program that produces the graph upon halting), taken over the set of all descriptions of all graphs in $\mathrm{Aut}(G)$, independent of the description language (per the Invariance theorem). Unlike graph Entropy, this measure is robust [zenilmethodsbiology]. In [zenilgraph] it was in fact shown that the algorithmic complexity estimation of a labelled graph is a good approximation of the algorithmic complexity of the graph automorphism group (i.e. the unlabelled graph complexity), and is correlated in one direction to the automorphism group count.
IV.2 The fragility of Entropy and computable measures vis-à-vis object description
In contrast to algorithmic complexity, no computable measure of complexity can test for all (Turing-)computable regularities in a dataset [martinlof]. That is, there is no test implementable as a Turing machine that takes the data as input and, upon halting, indicates whether the data has any regularity (regularities such as “every 5th place is occupied by a consecutive prime number”, to mention one example among an infinite number of possibilities).
Definition IV.2.
A computable regularity is a regularity for which a test can be set as a computer program running on a specific-purpose Turing machine testing for the said regularity.
Common statistical tests, for example, are computable because they are designed to be effective, but no computable universal measure of complexity can test for every computable regularity. In other words, for every computable measure capturing a data feature and intended to quantify the random content of the data, one can devise a mechanistic procedure producing an object that deceives the said measure with respect to all other features.
Moreover, for every effective feature one can devise/conceive an effective measure to test for it, but there is no computable measure able to implement a universal statistical test [martinlof]. This means that for every effective (computable) property/feature of a computable object $s$, there is a computable measure to test for that property in $s$ (or any object like $s$), but no computable measure exists to test for every feature of $s$ (and of all the effectively enumerable computable objects like $s$).
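For instance, the regularity “every 5th place is occupied by a consecutive prime number” mentioned above is computable: a special-purpose test for it is a few lines long (our sketch; the function names are illustrative). What is impossible is a single computable test covering every such regularity at once:

```python
def primes():
    """Generate the primes 2, 3, 5, 7, ... by trial division."""
    p = 2
    while True:
        if all(p % d for d in range(2, int(p ** 0.5) + 1)):
            yield p
        p += 1

def every_5th_is_consecutive_prime(seq):
    """Special-purpose computable test for ONE fixed regularity:
    positions 5, 10, 15, ... hold the consecutive primes 2, 3, 5, ..."""
    gen = primes()
    return all(seq[i] == next(gen) for i in range(4, len(seq), 5))

print(every_5th_is_consecutive_prime([9, 9, 9, 9, 2, 9, 9, 9, 9, 3]))  # True
print(every_5th_is_consecutive_prime([9, 9, 9, 9, 4]))                 # False
```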
Let $d(s)$ be a lossless description of an object $s$, meaning that $s$ can be reconstructed from $d(s)$ without any loss of information. Then there is no essential distinction between $s$ and $d(s)$ from the algorithmic point of view, because $|K(s) - K(d(s))| < c$, where $c$ is the length (in bits) of the translation program between $d(s)$ and $s$.
Theorem IV.1.
For a computable measure $S$, such as Shannon Entropy, there is no constant $c$ or logarithmic term such that $|S(s) - S(d(s))| < c$ for all $s$, nor any term bounding the difference as a function of the size of $s$.
In other words, as we have proven by exhibiting a counterexample (the ZK graph), the Shannon Entropy of an object may diverge when applied to different lossless descriptions of the same object, and cannot therefore be considered a robust measure of complexity. A measure of complexity should thus look not for a single property in (any possible) object, but for a potentially unbounded (and potentially unidentified) number of possible properties in any object $s$.
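A minimal non-graph instance of the same phenomenon (our sketch): the string of 50 zeros followed by 50 ones and its run-length encoding are both lossless descriptions of the same object, yet their per-symbol Entropies are maximal and minimal respectively:

```python
from collections import Counter
from itertools import groupby
from math import log2

def entropy(seq):
    """Shannon Entropy (base 2) of the empirical distribution of seq."""
    total = len(seq)
    return -sum((c / total) * log2(c / total) for c in Counter(seq).values())

s = "0" * 50 + "1" * 50                       # description 1: the raw string
runs = [len(list(g)) for _, g in groupby(s)]  # description 2: run lengths [50, 50]

# (s[0], runs) is lossless: alternating symbols from s[0] rebuild s exactly
rebuilt = "".join("01"[(int(s[0]) + i) % 2] * k for i, k in enumerate(runs))
assert rebuilt == s

print(entropy(s))     # 1.0: maximal per-bit Entropy (half 0s, half 1s)
print(entropy(runs))  # minimal: the run-length description repeats one symbol
```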
A sound characterization of a complexity measure can thus be established as a function that captures strictly more information about (any) $s$ than any computable function. Computable functions are therefore not good candidates for universal measures of complexity, as each can be replaced by a measure defined as a function of the property (or combination of properties) of interest and nothing else.
IV.3 Dependence on assumed distributions
An argument against the claim that Entropy yields contradictory values when used to profile randomness (even statistical randomness) is that one can change the domain of the Entropy measure in such a way as to make Entropy consistent with any possible description of a graph. For example, because we have proven that the ZK algorithm is deterministic and can only produce a single ZK graph, it follows that there is no uncertainty in the production of the object, there being only one graph for the formula. In this way, building a distribution of all formulae generating the ZK graph will always lead to a Shannon Entropy of 0 for the ‘right’ description, using the ‘right’ ensemble containing only the ZK formula(e).
According to the same argument, the digits of the mathematical constant π (to mention only the most trivial example) would have Shannon Entropy 0, because the digits are produced deterministically and the ‘right’ ensemble for π should be that containing only formulae deterministically generating the digits of π.
Directly changing the ensemble on which Entropy operates for a specific object only facilitates conformity to some arbitrary Entropy value dictated by an arbitrary expectation: e.g. an Entropy of 0 for any initial segment of π (entailing an Entropy rate of 0 as well), because π is deterministic and therefore no digit is surprising at all; or, alternatively, a maximal Entropy, if Shannon Entropy is supposed to measure statistical randomness. Moreover, this misbehaviour has to do not with a lack of knowledge but with the lack of an invariance theorem, because π is deterministically generated and hence its digits do not fundamentally reduce uncertainty. But if one assumes that the digits of π are not stochastic in order to assign π a Shannon Entropy equal to zero, then one is forced to concede that even perfect statistical randomness, produced by a supposedly Borel-normal number, has, in objective terms, a Shannon Entropy (and Entropy rate) equal to zero, but the highest Shannon Entropy (and Entropy rate) from an observer perspective (as it will never be certain that the streaming digits are truly those of π). In other words, the asymptotic behaviour after taking into consideration the digits of π approximates maximum Shannon Entropy, but π itself has a Shannon Entropy of zero.
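This observer-level tension can be made concrete with the Champernowne constant champernowne () in place of π, since its digits are trivial to generate: a one-line deterministic program produces a digit stream whose empirical digit Entropy nonetheless approaches the maximal log2(10) ≈ 3.32 bits per digit, by Borel normality in base 10. A minimal sketch in Python (the cutoff of 10,000 is an arbitrary choice):

```python
import math
from collections import Counter

def shannon_entropy(seq):
    """Empirical Shannon Entropy (bits per symbol) of the 1-gram distribution."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

# One-line deterministic generation of the base-10 Champernowne digits:
digits = "".join(str(i) for i in range(1, 10000))   # "123456789101112..."

h = shannon_entropy(digits)
print(h, math.log2(10))   # empirical digit Entropy vs. the maximal value
```

To an observer the stream looks near-maximally random, while the generating ensemble, containing a single deterministic formula, would assign it Entropy 0.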
IV.4 An algorithmic Maximum Entropy Model
Following the statistical mechanics approach bianconi2007entropy (), a typical recursively generated graph such as the ZK graph would, based on its degree sequence, be characterized as typically random from the observer perspective, because Shannon Entropy will find the graph to be statistically random, and thus just as random as any member of the set of all graphs with (near) maximal degree-sequence Entropy, giving no indication of the actual recursive nature of the ZK graph and misleading the observer.
In contrast, the type of approach introduced in zenilgraph (), based upon seeking clues to the recursive nature of an object such as a graph, would asymptotically identify the causal nature of a recursively generated object such as the ZK graph, independently of probability distributions, even if it is more difficult to estimate.
Rectifying the approaches based on models of maximum entropy involves updating and replacing the assumption of the maximum-entropy ensemble. An example illustrating how to achieve this in the context of, e.g., a Bayesian approach has been provided in algorithmiccognition1 (), and consists in replacing the uninformative prior with the uninformative algorithmic probability distribution, the so-called Universal Distribution, as introduced by Levin levin (). The general approach has already delivered some important results algorithmiccognition2 (), e.g. by quantifying the degree of human cognitive randomness that previous statistical approaches and measures such as Entropy could not quantify. Animated videos have been made available explaining applications to graph complexity (https://youtu.be/E238zKsPCgk) and to cognition in the context of random generation tasks (https://youtu.be/EYjBE5qm7c). A tool for sequences and arrays has also been placed online (http://complexitycalculator.com/), so the reader can experiment with an actual numerical tool and explore the differences between the statistical and the algorithmic approaches.
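A crude, computable illustration of the gap such algorithmic measures address (using an off-the-shelf LZ compressor only as a weak proxy for algorithmic content, not the CTM/BDM method of the references above) is to compare a deterministically generated digit string with a statistically random one of the same length: first-order Entropy barely distinguishes them, while compressed length separates them clearly. A sketch in Python:

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(seq):
    """Empirical Shannon Entropy (bits per symbol) of the 1-gram distribution."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

structured = "".join(str(i) for i in range(1, 10000))   # deterministic Champernowne-like digits
random.seed(0)
noise = "".join(random.choice("0123456789") for _ in range(len(structured)))

# Nearly identical first-order Entropy for both digit streams...
h_structured, h_noise = shannon_entropy(structured), shannon_entropy(noise)
print(h_structured, h_noise)

# ...but compressed length, a computable (if weak) proxy for algorithmic
# content, separates the deterministic stream from the random one.
c_structured = len(zlib.compress(structured.encode(), 9))
c_noise = len(zlib.compress(noise.encode(), 9))
print(c_structured, c_noise)
```

Even this rough proxy catches structure that Entropy misses, which is the direction in which the algorithmic measures discussed above generalize it.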
V Conclusions
The methods introduced here allow the construction of ‘Borel-normal pseudo-random graphs’, graphs based on uncomputable numbers and algorithmically produced graphs, while illustrating the shortcomings of computable graph-theoretic and Entropy approaches to graph complexity beyond random feature selection, and their failure when it comes to profiling randomness and hence causal content (as opposed to randomness).
We have shown that Entropy is highly observer-dependent, even in the face of full accuracy and access to lossless object descriptions, and thus has to be complemented by measures of algorithmic content. We have produced specific complexity-deceiving graphs for which Entropy retrieves disparate values when the object is described differently (thus with different underlying distributions), even when the descriptions reconstruct exactly the same object, and only the same object. This drawback of Shannon Entropy, ultimately related to its dependence on distribution, is all the more serious because it is easily overlooked for objects other than strings, for instance graphs. For an object such as a graph, we have shown that changing the description may not only change the values but actually produce divergent, contradictory values.
We constructed a graph ZK about which the following is true when it is described by its adjacency matrix A(ZK): H(A(ZK)) tends to one limit for growing graph size. Contradictorily, considering the degree sequence d(ZK) of the same ZK graph, we found that H(d(ZK)) tends to a divergent value for the same growth rate, even though both A(ZK) and d(ZK) are lossless descriptions of the same graph that construct exactly the same ZK graph, and only a ZK graph.
This means not only that one needs to choose a description of interest in order to apply a definition of Entropy, such as the adjacency matrix of a network (or its incidence or Laplacian matrix) or its degree sequence, but that as soon as the choice is made, Entropy becomes a trivial counting function of the specific feature of interest, and of that feature alone. In the case of the adjacency matrix of a network (or any related matrix associated with the graph, such as the incidence or Laplacian matrices), Entropy becomes a function of edge density, while for degree sequences, Entropy becomes a function of sequence normality. Entropy can thus trivially be replaced by such functions without any loss, but it cannot be used to profile the object (its randomness or information content) in any way that is independent of an arbitrary feature of interest.
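The collapse of adjacency-matrix Entropy into a function of edge density alone is easy to verify numerically: any two graphs with the same numbers of vertices and edges receive exactly the same bitwise adjacency-matrix Entropy, however different their structures. A minimal sketch in Python (the graph sizes are arbitrary choices):

```python
import math
import random
from collections import Counter
from itertools import combinations

def shannon_entropy(seq):
    """Empirical Shannon Entropy (bits per symbol) of the 1-gram distribution."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

def adjacency_bits(edges, n):
    """Upper-triangular adjacency-matrix bits of an undirected graph on n vertices."""
    eset = {tuple(sorted(e)) for e in edges}
    return [1 if pair in eset else 0 for pair in combinations(range(n), 2)]

n, m = 20, 30
all_pairs = list(combinations(range(n), 2))

clique_like = all_pairs[:m]                # highly regular: the first m pairs lexicographically
random.seed(1)
er_like = random.sample(all_pairs, m)      # Erdős–Rényi-style: m pairs chosen at random

h1 = shannon_entropy(adjacency_bits(clique_like, n))
h2 = shannon_entropy(adjacency_bits(er_like, n))

# Same vertex and edge counts => same edge density => identical Entropy,
# regardless of how differently the two graphs are structured.
print(h1, h2)
```

Both values equal the binary Entropy of the density m / (n(n-1)/2), confirming that, over adjacency matrices, Entropy is a counting function of edge density and nothing else.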
These results and observations have far-reaching consequences. For example, the recent literature appears contradictory, by turns suggesting that cancer cells display an increase in Entropy teschendorff () and reporting that cancer cells display a decrease in Entropy west (), in both cases applying Entropy to a function of degree distribution over networks of molecular interactions. Cells are also believed to be in a state of criticality between evolvability and robustness aldana (); csermely () that may make them look random though they are not. This means that Entropy may be overestimating randomness in the best case or be misleading in the worst case, as we have found in the instance of disparate values for the same objects, suggesting that additional safeguards are needed to achieve consistency and soundness.
New developments zenilgraph (); zenilmethodsbiology () promise more robust complementary measures of (graph) complexity that are less dependent on object description. Based upon the mathematical theory of randomness and algorithmic probability, these measures are better equipped to profile causality and algorithmic information content, they cover statistical randomness, and they can thus be considered an observer-improved generalization of Shannon Entropy.
References
 (1) Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
 (2) Ginestra Bianconi. The entropy of randomized network ensembles. EPL (Europhysics Letters), 81(2):28005, 2007.
 (3) S. Boccaletti et al. The structure and dynamics of multilayer networks. Physics Reports, 544(1):1–122, 2014.
 (4) E Borel. Les probabilités dénombrables et leurs applications arithmétiques. Rendiconti del Circolo Matematico di Palermo (1884–1940), 27(1):247–271, 1909.
 (5) Cristian S Calude, Michael J Dinneen, Chi-Kou Shu, et al. Computing a glimpse of randomness. Experimental Mathematics, 11(3):361–370, 2002.
 (6) Gregory J Chaitin. On the length of programs for computing finite binary sequences. Journal of the ACM (JACM), 13(4):547–569, 1966.
 (7) David G Champernowne. The construction of decimals normal in the scale of ten. Journal of the London Mathematical Society, 1(4):254–260, 1933.
 (8) Zengqiang Chen, Matthias Dehmer, Frank Emmert-Streib, and Yongtang Shi. Entropy bounds for dendrimers. Applied Mathematics and Computation, 242:462–472, 2014.
 (9) Peter Csermely et al. Cancer stem cells display extremely large evolvability: alternating plastic and rigid networks as a potential mechanism. Seminars in Cancer Biology, 30:42–51, 2015.
 (10) Matthias Dehmer, Stephan Borgert, and Frank Emmert-Streib. Entropy bounds for hierarchical molecular networks. PLoS One, 3(8):e3079, 2008.
 (11) Matthias Dehmer and Abbe Mowshowitz. A history of graph entropy measures. Information Sciences, 181(1):57–78, 2011.
 (12) Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.
 (13) Ernesto Estrada. Quantifying network heterogeneity. Physical Review E, 82(6):066102, 2010.
 (14) Ernesto Estrada, A José, and Naomichi Hatano. Walk entropies in graphs. Linear Algebra and its Applications, 443:235–244, 2014.
 (15) Orsini et al. Quantifying randomness in real networks. Nature Communications, 6:8627, 2015.
 (16) H. Kim, C. I. Del Genio, K. E. Bassler, and Z. Toroczkai. Degree-based graph construction. New Journal of Physics, 14:023012, 2012.
 (17) Hyunju Kim, Zoltán Toroczkai, Péter L Erdős, István Miklós, and László A Székely. Degree-based graph construction. Journal of Physics A: Mathematical and Theoretical, 42(39):392001, 2009.
 (18) Andrei Nikolaevich Kolmogorov. Three approaches to the quantitative definition of information. International Journal of Computer Mathematics, 2(1-4):157–168, 1968.
 (19) Janos Korner and Katalin Marton. Random access communication and graph entropy. IEEE Transactions on Information Theory, 34(2):312–314, 1988.
 (20) Leonid A Levin. Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Problemy Peredachi Informatsii, 10(3):30–35, 1974.
 (21) Ming Li and Paul Vitányi. An introduction to Kolmogorov complexity and its applications. Springer Science & Business Media, 2009.
 (22) Guoxiang Lu, Bingqing Li, and Lijia Wang. Some new properties for degree-based graph entropies. Entropy, 17(12):8217–8227, 2015.
 (23) Per Martin-Löf. The definition of random sequences. Information and Control, 9(6):602–619, 1966.
 (24) Ron Milo, Shai ShenOrr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
 (25) Abbe Mowshowitz. Entropy and the complexity of graphs: I. an index of the relative complexity of a graph. The bulletin of mathematical biophysics, 30(1):175–204, 1968.
 (26) Abbe Mowshowitz and Matthias Dehmer. Entropy and the complexity of graphs revisited. Entropy, 14(3):559–570, 2012.
 (27) Dipendra C Sengupta and Jharna D Sengupta. Application of graph entropy in CRISPR and repeats detection in DNA sequences. Computational Molecular Bioscience, 6(03):41, 2016.
 (28) C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656, 1948.
 (29) Shai S Shen-Orr, Ron Milo, Shmoolik Mangan, and Uri Alon. Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31(1):64–68, 2002.
 (30) Gábor Simonyi. Graph entropy: a survey. Combinatorial Optimization, 20:399–441, 1995.
 (31) Ricard V Solé and Sergi Valverde. Spontaneous emergence of modularity in cellular networks. Journal of The Royal Society Interface, 5(18):129–133, 2008.
 (32) Fernando Soler-Toscano, Hector Zenil, Jean-Paul Delahaye, and Nicolas Gauvrit. Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines. PLoS ONE, 9(5):e96223, 2014.
 (33) Ray J Solomonoff. A formal theory of inductive inference. Part I. Information and Control, 7(1):1–22, 1964.
 (34) Andrew E Teschendorff and Simone Severini. Increased entropy of signal transduction in the cancer metastasis phenotype. BMC Systems Biology, 4:104, 2010.
 (35) C. Torres-Sosa, S. Huang, and M. Aldana. Criticality is an emergent property of genetic networks that exhibit evolvability. PLoS Computational Biology, 8(9):e1002669, 2012.
 (36) Ernesto Trucco. A note on the information content of graphs. Bulletin of Mathematical Biology, 18(2):129–135, 1956.
 (37) James West, Simone Severini, Ginestra Bianconi, and Andrew E. Teschendorff. Differential network entropy reveals cancer system hallmarks. Scientific Reports, 2:802, 2012.
 (38) Hector Zenil. Small data matters, correlation versus causation and algorithmic data analytics. Predictability in the world: philosophy and science in the complex world of Big Data, Springer Verlag (forthcoming), 2013.
 (39) Hector Zenil, Narsis A Kiani, and Jesper Tegnér. Methods of information theory and algorithmic complexity for network biology. Seminars in Cell & Developmental Biology, 51:32–43, 2016.
 (40) Hector Zenil, Fernando Soler-Toscano, Jean-Paul Delahaye, and Nicolas Gauvrit. Two-dimensional Kolmogorov complexity and an empirical validation of the coding theorem method by compressibility. PeerJ Computer Science, 1:e23, 2015.
 (41) Hector Zenil, Fernando SolerToscano, Kamaludin Dingle, and Ard A Louis. Correlation of automorphism group size and topological properties with programsize complexity evaluations of graphs and complex networks. Physica A: Statistical Mechanics and its Applications, 404:341–358, 2014.
 (42) Hector Zenil, Fernando Soler-Toscano, Narsis A. Kiani, Santiago Hernández-Orozco, and Antonio Rueda-Toicen. A decomposition method for global evaluation of Shannon entropy and local estimations of algorithmic complexity. arXiv:1609.00110 [cs.IT], 2016.
 (43) Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530–536, 1978.
 (45) N. Gauvrit, H. Zenil, and J. Tegnér. The Information-theoretic and Algorithmic Approach to Human, Animal and Artificial Cognition. Springer Verlag (in press).
 (46) N. Gauvrit, H. Zenil, F. Soler-Toscano, J.-P. Delahaye, and P. Brugger. Human behavioral complexity peaks at age 25. PLoS Computational Biology, 13(4):e1005408, 2017.