Coding-theorem Like Behaviour and Emergence of the Universal Distribution from Resource-bounded Algorithmic Probability

Hector Zenil (corresponding author: hector.zenil [at] algorithmicnaturelab [dot] org), Liliana Badillo, Santiago Hernández-Orozco, and Francisco Hernández-Quiroz
Abstract

Introduced by Solomonoff and Levin, the seminal concept of Algorithmic Probability (AP) and the Universal Distribution (UD) predicts the way in which strings distribute as the result of running ‘random’ computer programs. The UD has been referred to as ‘miraculous’ because of its surprisingly powerful properties and its role as the optimal theoretical solution to the challenge of induction and inference, and approximations to AP and the UD are therefore of the greatest importance in computer science and in science in general. Here we are interested in the emergence, the rates of convergence, and coding-theorem-like behaviour as a marker of AP at work in subuniversal models of computation. To this end, we investigate empirical distributions of the output of computer programs of weaker computational power according to the Chomsky hierarchy. We introduce measures of algorithmic probability and algorithmic complexity based upon resource-bounded computation and compare them to previously thoroughly investigated distributions produced from the output of Turing machines. The approach allows numerical approximations to algorithmic (Kolmogorov-Chaitin) complexity-based estimations at each level of a computational hierarchy. We demonstrate that all these estimations are correlated in rank and that they converge, in rank and in value, as a function of computational power, despite the fundamental differences between the computational models.

Keywords: algorithmic coding-theorem-like behaviour, Solomonoff’s induction, Levin’s semi-measure, computable algorithmic complexity, finite-state complexity, transducer complexity, context-free grammar complexity, linear-bounded complexity, time resource-bounded complexity.

Department of Computer Science, University of Oxford, Oxford, U.K.
Algorithmic Dynamics Lab, Unit of Computational Medicine, SciLifeLab, Centre for Molecular Medicine, Department of Medicine Solna, Karolinska Institute, Stockholm, Sweden.
Algorithmic Nature Group, LABORES, Paris, France.
Posgrado en Ciencias e Ingeniería de la Computación, Universidad Nacional Autónoma de México (UNAM).
Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México.

1 Motivation and Significance

An algorithmic ‘law’ regulates the behaviour of the output distribution of computer programs. The Universal Distribution is the probability distribution that establishes how the output strings of a universal computer running random computer programs distribute. The Algorithmic Probability of a string $s$ is formally defined by:

$m(s) = \sum_{p : U(p) = s} 2^{-|p|}$   (1)

where the sum is over all halting programs $p$ for which $U$, a prefix-free universal Turing machine, outputs the string $s$. A prefix-free universal Turing machine defines a set of valid programs such that the sum is bounded by Kraft’s inequality [17] and is not greater than 1 ($m$ is also called a semi-probability measure because some programs will not halt, and thus the sum of the probabilities never actually reaches 1).

An invariance theorem establishes that the choice of reference universal Turing machine introduces a bias that becomes negligible as a function of string size and is thus asymptotically irrelevant:

$|K_{U_1}(s) - K_{U_2}(s)| < c_{U_1 U_2}$   (2)

where $K_{U_i}(s)$ is the algorithmic complexity of $s$ with respect to the universal machine $U_i$ (formally connected to $m$ in Section 1.1) and $c_{U_1 U_2}$ is a constant that depends on $U_1$ and $U_2$ (think of the size of a compiler translating in both directions) but is independent of $s$, so the reference machine can safely be dropped in the long term. The invariance theorem, however, tells us nothing about the rate of convergence, which makes numerical experiments of this kind all the more relevant and necessary.

According to R. Solomonoff, a co-founder of algorithmic information theory, Algorithmic Probability and the Universal Distribution represent the theoretically optimal solution to the challenge of induction and inference [32, 33, 34].

More recently, at a panel discussion at the World Science Festival in New York City on Dec 14, 2014, Marvin Minsky, one of the founding fathers of AI, said [own transcription]:

It seems to me that the most important discovery since Gödel was the discovery by Chaitin, Solomonoff and Kolmogorov of the concept called Algorithmic Probability which is a fundamental new theory of how to make predictions given a collection of experiences and this is a beautiful theory, everybody should learn it, but it’s got one problem, that is, that you cannot actually calculate what this theory predicts because it is too hard, it requires an infinite amount of work. However, it should be possible to make practical approximations to the Chaitin, Kolmogorov, Solomonoff theory that would make better predictions than anything we have today. Everybody should learn all about that and spend the rest of their lives working on it.

The Universal Distribution has also been referred to as miraculous because of its properties in inference and prediction [23]. However, neither AP nor the UD is computable. This has meant that for decades after their discovery little to no attempt was made to apply Algorithmic Probability to problems in general science. Nonetheless, $m$ is not only uncomputable but, more precisely, lower semi-computable, which means that it can be approximated from below. It is thus of fundamental interest to science, in its application and contribution to the challenges of complexity, inference and causality, to keep pushing the boundaries towards better methods for approximating algorithmic probability. A new framework and pipeline of numerical methods to estimate these measures has recently been advanced and proven successful in many areas of application, ranging from cognition to graph complexity [9, 12, 7, 39, 40].

There are many properties of AP that make it optimal [32, 33, 34, 23]. For example, the same Universal Distribution will work for any problem within a convergent error; it can deal with missing and multidimensional data; the data need not be stationary or ergodic; there is no under-fitting or over-fitting, because the method is parameter-free and thus the data need not be divided into training and test sets; and it is the gold standard for a Bayesian approach in the sense that it updates the distribution in the most efficient and accurate way possible, with no assumptions.

Several interesting extensions of resource-bounded Universal Search approaches have been introduced to make algorithmic probability more useful in practice [27, 28, 20, 14], some of which provide theoretical bounds [1]. Other approaches have explored relaxing some of the conditions (e.g. universality) on which Levin’s Universal Search is fundamentally based [36], or have introduced domain-specific versions (and thus versions of conditional AP). Here we explore the behaviour of explicitly weaker models of computation of increasing computational power, in order to investigate the asymptotic behaviour and emergence of the Universal Distribution, the ability of the different models to approximate it, and the actual empirical distributions that such models produce.

The so-called Universal Search [15] is based on dovetailing over all possible programs and their runtimes, such that the fraction of time allocated to program $p$ is $2^{-\ell(p)}$, where $\ell(p)$ is the size of the program in bits. Despite the algorithm’s simplicity and remarkable theoretical properties, a potentially huge constant slowdown factor has kept it from being used much in practice. Some of the approaches to speed it up have introduced bias and made the search domain-specific, which has at the same time limited the power of Algorithmic Probability.
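A minimal sketch of this dovetailing schedule is given below, assuming a user-supplied interpreter `run_program(p, budget)` (a hypothetical interface, not a real API) that runs the binary program p from scratch for at most `budget` steps and returns its output if it halts, and None otherwise.

```python
from itertools import product

def binary_programs(max_length):
    """All binary programs of length 1..max_length, shortest first."""
    for n in range(1, max_length + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def universal_search(run_program, target, max_phase=20):
    """Levin-style Universal Search: in phase k every program p with
    l(p) <= k is (re)run for 2**(k - l(p)) steps, so the total fraction
    of time allocated to p is proportional to 2**(-l(p))."""
    for k in range(1, max_phase + 1):
        for p in binary_programs(k):
            if run_program(p, 2 ** (k - len(p))) == target:
                return p, k  # a program producing `target`, and the phase in which it was found
    return None
```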

There are practical applications of AP that make it very relevant. If one could translate some of the power of Algorithmic Probability to decidable models (thus below Type-0 in the Chomsky hierarchy), without having to deal with the uncomputability of algorithmic complexity and algorithmic probability, it would effectively be possible to trade computing power for predictive power. While trade-offs must exist for this to be possible (full predictability and uncomputability are incompatible), finding a threshold at which the coding theorem starts to apply would be key to recovering some of this power by relaxing the computational power of the model. If a smooth trade-off is found before the undecidability border of Turing-completeness, it means that the partial advantages of Algorithmic Information Theory can be recovered from simpler models of computation in exchange for accuracy. Such simpler models of computation may describe physical processes that are computationally rich but subject to noise or bounded by resources. More realistic approaches may then lead to applications such as the reduction of conformational distributions in protein folding [6, 10], within a framework that may favour or rule out certain paths, thereby helping to predict the most likely (algorithmic) final configuration. If the chemical and thermodynamic laws that these processes are subject to are considered algorithmic in any way, even under random interactions (e.g. molecular Brownian motion), the Universal Distribution may help quantify the most likely regions if, in any way, those laws constitute forms of computation below or at the Turing level that we explore here. This is all the more plausible if one considers that probabilistic bias affects convergence [19] and that we have demonstrated [11] that finite approximations to the Universal Distribution may explain some of the phenomenology of natural selection.

1.1 Uncomputability in complexity

Here we explore the middle ground at this boundary and study the interplay between computable and non-computable measures of algorithmic probability connected to algorithmic complexity. Indeed, a deep connection between the algorithmic (Kolmogorov-Chaitin) complexity $K(s)$ of an object $s$ and its algorithmic probability $m(s)$ was found and formalized by way of the algorithmic Coding theorem. The theorem establishes that the probability of $s$ being produced by a random algorithm is inversely (and exponentially) related to its algorithmic complexity, up to a constant [16]:

$K(s) = -\log_2 m(s) + O(1)$   (3)
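As a purely illustrative numerical reading of relation (3): if the programs producing a string $s$ accumulate an algorithmic probability of $m(s) = 2^{-10}$, then $K(s) = -\log_2 2^{-10} + O(1) = 10 + O(1)$ bits, whereas a string $s'$ with $m(s') = 2^{-100}$ has $K(s') \approx 100$ bits; strings frequently produced by random programs are thus precisely the algorithmically simple ones.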

Levin proved that the output distribution established by Algorithmic Probability dominates (up to multiplicative scaling) any other distribution produced by algorithmic means as long as the executor is a universal machine, hence giving the distribution its ‘universal’ character (and its name, the ‘Universal Distribution’).

This so-called Universal Distribution is a signature of Turing-completeness. However, many processes that model or regulate natural phenomena may not be Turing-universal. For example, some models of self-assembly may not be powerful enough to reach Turing-completeness, yet they display output distributions similar to those predicted by the Universal Distribution via the algorithmic Coding theorem, with simplicity highly favoured by production frequency. Noise is another source of power degradation that may hamper universality and therefore the scope and application of algorithmic probability. However, if some sub-universal systems approach coding-theorem behaviour, this gives us great predictive capabilities together with less powerful but computable measures of algorithmic complexity. Here we ask whether such distributions can be partially or totally explained by importing the relation established by the coding theorem, and under what conditions non-universal systems display algorithmic coding-theorem-like behaviour.

Here we produce empirical distributions of systems at each of the computing levels of the Chomsky hierarchy, starting from transducers (Type-3) as defined in [18], context-free grammars (Type-2) as defined in [35], linear-bounded non-deterministic Turing machines (Type-1) as approximations to bounded Kolmogorov-Chaitin complexity, and a universal procedure from an enumeration of Turing machines (Type-0) as defined in [5, 21]. We report the results of the experiments and comparisons, showing gradual coding-theorem-like behaviour at the boundary between decidable and undecidable systems.

2 Methods

We will denote by $D(n,k)$, or just $D(n)$, the set of all strings produced by all the Turing machines with $n$ states and $k$ symbols.

2.1 The Chomsky Hierarchy

The Chomsky hierarchy is a strict containment hierarchy of classes of formal grammars equivalent to different computational models of increasing computing power. At each of the 4 levels, grammars and automata compute a larger set of possible languages and strings. From weaker to stronger computational power:

  1. [Type-3] The most restricted grammars, which generate the regular languages, consisting of rules with a single non-terminal symbol on the left-hand side and terminal or non-terminal symbols on the right-hand side. We study this level by way of finite-state transducers, a generalization of finite-state automata that produce an output at every step, generating a set of relations on the output tape; FSTs do not recognize a larger class of languages than FSAs and thus represent this level. We used the enumeration of transducers introduced in [18], where an invariance theorem is also proven, demonstrating that the choice of enumeration is invariant (up to a constant).

  2. [Type-2] Grammars that generate the context-free languages. These grammars are extensively used in linguistics. The languages generated by context-free grammars are exactly the languages that can be recognized by a non-deterministic pushdown automaton. We denote this level by CFG. We generated production rules for 40 000 grammars according to a sound scheme introduced in [35].

  3. [Type-1] Grammars that generate the context-sensitive languages. The languages described by these grammars are exactly those that can be recognized by a linear-bounded automaton, i.e. a non-deterministic Turing machine whose tape is bounded by a constant times the length of the input, which we denote by LBA. An AP-based variation is introduced here and denoted by LBA/AP.

  4. [Type-0] The least restricted grammars, which generate the languages that can be recognized by Turing machines, the recursively enumerable languages. This is the level at which Turing universality is achieved or required. We used the previously generated distributions from [5, 21].

We also explore the consequences of relaxing the halting configuration (state) condition in models of universal computation (Type-0) when comparing their output distributions.

2.2 Finite-state complexity

Formal language theory and algorithmic complexity have traditionally been disconnected, except in connection with the number of states, or the number of transitions, in a minimal finite automaton accepting a regular language. In [3] a connection was established by extending the notions of Blum static complexity and of encoded function space. The main reason for this lack of connection was that languages are sets of strings, rather than the individual strings used in measures of algorithmic complexity, and a meaningful definition of the complexity of a language was lacking, as was a definition of finite-state algorithmic complexity. In [18], however, a version of algorithmic complexity was developed by replacing Turing machines with finite transducers; the complexity induced is called Finite-state complexity (FSA). Despite the fact that the Universality Theorem (true for Turing machines) is false for finite transducers, rather surprisingly the invariance theorem holds true for Finite-state complexity and, in contrast with the descriptional complexities (plain and prefix-free), Finite-state complexity is computable.

Finite-state complexity, defined in [18], is analogous to the core concept of Algorithmic Information Theory (AIT), Kolmogorov-Chaitin complexity, but is based on finite transducers instead of Turing machines. Finite-state complexity is computable and there is no a priori upper bound on the number of states used in minimal descriptions of arbitrary strings.

Consider a transducer with a finite set of states. Its transition function is encoded by a binary string $\sigma$ (see [18] for details); the transducer encoded by $\sigma$ is denoted $T_\sigma$, and we write $S$ for the set of all strings that are valid encodings of this form.

In [18] it was shown that the set of all transducers can be enumerated by a regular language and that there exists a hierarchy of more general computable encodings. For this experiment we fix one such encoding, the set $S$ just described.

As in traditional AIT, where Turing machines are used to describe binary strings, transducers describe strings in the following way: we say that a pair $(\sigma, p)$, with $\sigma \in S$ and $p \in \{0,1\}^*$, is a description of the string $x$ if and only if $T_\sigma(p) = x$. The size of the description is $|\sigma| + |p|$, which leads to the following definition.

Definition 2.1.

([18]) The Finite-state complexity of a string $x$ (that we will identify as FSA in the results) with respect to the encoding $S$ is defined as the length of its shortest description, $C_S(x) = \min\{|\sigma| + |p| : T_\sigma(p) = x\}$.

An important characteristic of traditional AIT is the invariance theorem, which states that the complexity measure is optimal up to an additive constant and relies on the existence of a universal Turing machine (the additive constant being, in fact, its size). In contrast with AIT, due to the non-existence of a “universal transducer”, Finite-state complexity includes the size of the transducer as part of the encoding length. Nevertheless, the invariance theorem holds true for Finite-state complexity. An interesting consequence of the invariance theorem for Finite-state complexity is the existence of an upper bound $C_S(x) \le |x| + c$ for all $x$, where $c$ is the length of the string that encodes the identity transducer. Hence $C_S$ is computable. If $S_1$ and $S_2$ are two encodings, then $C_{S_1}$ and $C_{S_2}$ are related by a computable function [18].

An alternative definition of Finite-state complexity based on Algorithmic Probability is as follows:

Definition 2.2.

([18]) The Finite-state complexity based on Algorithmic Probability (denoted by FSA/AP in the results) of a string $x$ with respect to the encoding $S$ is defined in terms of the number of descriptions $(\sigma, p)$ such that $T_\sigma(p) = x$.

That is, the number of times that a string is produced by a transducer (in this case, as reported in the results, for encodings of size 8 to 22).

2.2.1 Building a Finite-state empirical distribution

We now define the construction of an empirical distribution using Finite-state complexity. We introduce our alternative definition of algorithmic probability using transducers.

Definition 2.3.

(Finite-state Algorithmic Probability) Let be the set of encodings of all transducers by a binary string in the form of . We then define the algorithmic probability of a string as follows

For any string $x$, this gives the algorithmic probability of $x$ computed over the set of encodings considered. In the construction of the empirical distribution for Finite-state complexity, we consider the set of descriptions up to a bounded total length. Following [22], we define the empirical distribution function (i.e. the probability distribution) as follows.

Definition 2.4.

(Finite-state Distribution, FSA)

In other words, the distribution considers all binary strings up to a given length, determines whether each of them encodes a valid pair $(\sigma, p)$, computes $T_\sigma(p)$, and counts the number of times each output string is produced (since the set of encodings is in fact regular, we could use an enumeration of it directly, but for this work we analyze all binary strings up to the given length).

We note that in the encoding, a string occurring as the output of a transition contributes to the size of a description of a string. The decision to consider such strings was based on the fact that, in the encoding of the smallest transducer, the one with the transition function

(4)

where the output is the empty string, the string that occurs as the output of its transitions contributes to the size of the description of a string.
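A minimal sketch of the construction just described is given below, assuming a hypothetical helper `decode_transducer(sigma)` that returns a callable transducer for a valid encoding $\sigma$ in the sense of [18] and None otherwise (the encoding format itself is not reproduced here); whether every split of a string into $\sigma$ and $p$ should be counted, or only a canonical one, depends on that encoding, and this sketch simply tries all splits.

```python
from collections import Counter
from itertools import product

def fsa_empirical_distribution(length, decode_transducer):
    """Tally the outputs T_sigma(p) over all binary strings w of the
    given length and every split w = sigma + p in which sigma is a
    valid transducer encoding (decode_transducer is an assumed helper
    returning a function p -> output string, or None)."""
    counts = Counter()
    for bits in product("01", repeat=length):
        w = "".join(bits)
        for cut in range(1, len(w) + 1):   # sigma = w[:cut], p = w[cut:]
            transducer = decode_transducer(w[:cut])
            if transducer is not None:
                counts[transducer(w[cut:])] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()} if total else {}
```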

2.3 Context-free grammars

In [35] Wharton describes an algorithm for a general-purpose grammar enumerator which is adaptable to different classes of grammars (e.g. regular, context-free, etc.). We implemented this algorithm in the Wolfram Language with the purpose of enumerating context-free grammars in Chomsky Normal Form over the terminal vocabulary $\{0,1\}$. Before describing the implementation of the algorithm we define the following terminology:

  • A grammar $G$ is a 4-tuple $(N, T, P, S)$.

  • $N$ is the non-terminal vocabulary.

  • $T$ is the terminal vocabulary.

  • $P$ is the set of productions.

  • $S$ is the start symbol.

  • $V = N \cup T$ is the vocabulary of $G$.

  • For any grammar $G$, $|N|$ and $|P|$ denote the cardinalities of $N$ and $P$ respectively.

First, we define the structure of a grammar. Let $G = (N, T, P, S)$ be any grammar, and suppose we are given the non-terminal vocabulary $N$ with an arbitrary ordering such that the first non-terminal is the start symbol $S$. The grammar $G$ has a structure consisting of a list of integers, where each integer is the number of productions having the corresponding non-terminal on the left-hand side (according to the ordering of $N$). Hence the cardinality of the set of productions equals the sum of these integers.

Now, let $\mathcal{C}$ be a class of grammars over a terminal vocabulary $T$. By $\mathcal{C}_c$ we denote the grammars in $\mathcal{C}$ with complexity $c$. We then enumerate $\mathcal{C}$ by increasing complexity $c$. To every complexity class corresponds a set of structure classes determined by $|N|$ and $|P|$. Therefore a complexity class is enumerated by enumerating each of its structure classes (i.e. every structure that constitutes it). In addition, we need to define an ordered sequence $R$ consisting of all possible right-hand sides for the production rules. The sequence is ordered lexicographically (first terminals, then non-terminals) and is defined according to the class of grammars we want to enumerate. For example, if we are interested in enumerating the class of Chomsky Normal Form grammars over the terminal vocabulary $\{0,1\}$ and a given non-terminal vocabulary, we set $R$ to be the terminals 0 and 1 followed by all pairs of non-terminals in lexicographic order.

2.3.1 Implementation

Given a complexity $c$, the algorithm described below (which we implemented in the Wolfram Language running in Mathematica) enumerates all the grammars in each corresponding structure class, according to [35].

  1. The complexity measure $c$ is provided by a pairing function of $|N|$ and $|P|$. In other words, given $c$, we apply the inverse of this pairing function in order to recover the values of $|N|$ and $|P|$. The inverse is implemented by the function pairingInverse[].

  2. The set of non-terminals $N$ is generated by the function generateSetN[].

  3. The ordered sequence $R$ is generated from the set of non-terminals by the function generateSetR[].

  4. The different structure classes that correspond to complexity $c$ are generated by the function generateStructureClasses[].

  5. All the possible grammars with the structure classes defined in the previous step are then generated; each grammar has an associated structure matrix. This is performed by the function generateStructureMatricesA[].

  6. The sequence $R$ is used to generate the production rules of the grammars by the function generateGrammars[]. A schematic sketch of this pipeline is given after this list.
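The sketch below is a schematic Python counterpart of the pipeline above, written under explicit assumptions: the pairing function and the exact ordering and constraints of [35] are not reproduced, so the composition-based enumeration of structure classes and the CNF right-hand-side ordering shown here are illustrative only.

```python
from itertools import product

def ordered_rhs(terminals, nonterminals):
    """Ordered sequence R of possible right-hand sides for a grammar in
    Chomsky Normal Form: first the terminals, then all pairs of
    non-terminals in lexicographic order (an assumed convention)."""
    return list(terminals) + [a + b for a in nonterminals for b in nonterminals]

def structure_classes(num_nonterminals, num_productions):
    """All ways of distributing `num_productions` productions among the
    non-terminals (the 'structure' of a grammar): a tuple of integers,
    one per non-terminal, summing to num_productions."""
    if num_nonterminals == 1:
        yield (num_productions,)
        return
    for first in range(num_productions + 1):
        for rest in structure_classes(num_nonterminals - 1, num_productions - first):
            yield (first,) + rest

def grammars(nonterminals, terminals, num_productions):
    """Yield grammars as lists of productions (lhs, rhs), for every
    structure class and every assignment of right-hand sides from R."""
    R = ordered_rhs(terminals, nonterminals)
    for structure in structure_classes(len(nonterminals), num_productions):
        lhs_seq = [nt for nt, k in zip(nonterminals, structure) for _ in range(k)]
        for rhs_choice in product(R, repeat=num_productions):
            yield list(zip(lhs_seq, rhs_choice))

# illustrative example: CNF grammars with non-terminals S, A over {0, 1} and 2 productions
print(next(grammars(["S", "A"], ["0", "1"], 2)))
```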

2.3.2 The CYK algorithm

A procedure to decide whether a bit string is generated by a grammar was implemented according to the Cocke-Younger-Kasami (CYK) algorithm. CYK is an efficient worst-case parsing algorithm that operates on grammars in Chomsky Normal Form (CNF) in time $O(n^3 \cdot |G|)$, where $n$ is the length of the parsed string and $|G|$ is the size of the CNF grammar $G$. The algorithm considers every possible substring of the input string and decides whether the string belongs to the language generated by $G$. The implementation was adapted from [25].
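For concreteness, here is a compact Python sketch of the CYK membership test on a CNF grammar (independent of the Wolfram Language implementation adapted from [25]); it assumes single-character terminal and non-terminal names, with each production given as a pair (lhs, rhs) whose rhs is either one terminal or two non-terminals.

```python
def cyk_accepts(grammar, start, s):
    """Decide whether string s is generated by a CNF grammar.
    `grammar` is a list of (lhs, rhs) productions where rhs is either a
    single terminal (e.g. '0') or two non-terminals (e.g. 'AB')."""
    n = len(s)
    if n == 0:
        return False  # the empty string would need a special S -> epsilon rule
    # table[i][l] = set of non-terminals deriving the substring s[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    terminal_rules = [(lhs, rhs) for lhs, rhs in grammar if len(rhs) == 1]
    binary_rules = [(lhs, rhs) for lhs, rhs in grammar if len(rhs) == 2]
    for i, c in enumerate(s):
        for lhs, rhs in terminal_rules:
            if rhs == c:
                table[i][1].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for lhs, rhs in binary_rules:
                    if rhs[0] in table[i][split] and rhs[1] in table[i + split][length - split]:
                        table[i][length].add(lhs)
    return start in table[0][n]

# example with a hypothetical grammar S -> AB, A -> 0, B -> 1
print(cyk_accepts([("S", "AB"), ("A", "0"), ("B", "1")], "S", "01"))  # True
```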

2.4 CFG Algorithmic Probability

We can now define the Algorithmic Probability of a string according to CFG as follows:

Definition 2.5.

(Context-free Algorithmic Probability, CFG)

And its respective distribution:

where, as defined in Section 2.3.1, $L(G)$ is the language generated by the grammar $G$, and the normalization is by the cardinality of the sample set of grammars considered. For the results reported here the sample consisted of 40 000 grammars, each of bounded complexity and belonging to a structure class as defined in [35].

2.5 Linear-bounded complexity

In [1] it is shown that the time-bounded Kolmogorov distribution is universal (convergent), and the question of an analogue to the algorithmic Coding theorem is described there as an open problem, likely to be solved by exploiting this universality result. On the other hand, in [2, 13] it has been shown that time-bounded algorithmic complexity (being computable) is a Solovay function. Such functions are upper bounds of algorithmic complexity (in its prefix-free version) and give the same value for almost all strings.

In [5, 21] we described a numerical approach to the problem of approximating the Kolmogorov complexity of short strings. This approach performs an exhaustive execution of all deterministic 2-symbol Turing machines with a small number of states, constructs an output frequency distribution, and then applies the Coding theorem to approximate the algorithmic complexity of the strings produced.
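The following is a minimal sketch of this kind of procedure, not the exact implementation of [5, 21]: the machine encoding, the state names and the output convention (here, the contents of the visited tape cells upon halting) are assumptions, and the optional `tape_space` bound lets the same routine emulate the space-bounded (LBA-like) and time-bounded experiments of the following sections.

```python
from collections import Counter
import math

def run_tm(rules, max_steps, tape_space=None, start_state="A", halt_state="H", start_pos=0):
    """Run a Turing machine on a blank (all-0) tape.

    `rules` maps (state, read_symbol) -> (new_state, write_symbol, move),
    with move in {-1, +1} and a total transition table (the halting state
    is reached through an ordinary transition). Returns the output string
    if the machine halts within `max_steps` (and, when `tape_space` is
    given, without leaving a tape of that size), and None otherwise."""
    tape = {}
    state, pos, steps = start_state, start_pos, 0
    visited = {pos}
    while state != halt_state and steps < max_steps:
        new_state, write, move = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if tape_space is not None and not (0 <= pos < tape_space):
            return None                      # violates the LBA-like space bound
        visited.add(pos)
        state, steps = new_state, steps + 1
    if state != halt_state:
        return None                          # did not halt within the step budget
    # assumed output convention: contents of the cells the head visited
    return "".join(str(tape.get(i, 0)) for i in range(min(visited), max(visited) + 1))

def output_distribution(machines, max_steps, tape_space=None):
    """Empirical output frequency distribution over a collection of machines."""
    counts = Counter()
    for rules in machines:
        out = run_tm(rules, max_steps, tape_space)
        if out is not None:
            counts[out] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def ctm_estimate(distribution, s):
    """Coding-theorem-style estimate: K(s) approximated by -log2 of the
    empirical probability of s."""
    return -math.log2(distribution[s])
```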

For this experiment we follow the same approach but consider the less powerful model of computation of linear-bounded automata (LBA). An LBA is essentially a single-tape Turing machine that never leaves the cells on which the input was placed [22]. It is well known that the class of languages accepted by LBAs is exactly that of the context-sensitive languages [22].

We use the same Turing machine formalism as in [5, 21], which is that of the Busy Beaver problem introduced by Rado [8], and we use the known values of the Busy Beaver functions.

2.6 Time complexity

Here we consider the class of Turing machines that produce an output in polynomial time with respect to the size of their input. It is easy to see that this class is contained in the class defined by linear-bounded automata (LBA): if the number of transitions is bounded by a linear function, so is the number of cells the machine can visit; it is also important to note that LBAs are not time-restricted and can use non-deterministic transitions. Now, given that Turing machines can decide context-free languages in polynomial time (e.g. via the CYK algorithm), this class is higher in the hierarchy than the Type-2 languages.

Within the context of this article, we represent this class by the set of Turing machines with 4 states and 2 symbols, run with no input, whose execution time is bounded above by a fixed constant. We cap the execution time at 27, 54, 81 and 107 steps, for a total of 44 079 842 304 Turing machines, where 107 is the Busy Beaver value for this set.

2.7 The Chomsky hierarchy bounding execution time

The definition of bounded algorithmic complexity is a variation of the unbounded version as follows:

Definition 2.6.

(Linear-bounded Algorithmic Complexity, LBA)

Being bounded to polynomial-sized tapes, the Turing machines that decide context-sensitive grammars (Type-1) can be captured in exponential time by deterministic Turing machines.

Exactly where each class of the Chomsky hierarchy sits with respect to time-based computational complexity classes is related to seminal open problems. For instance, an equality between the languages recognized by linear-bounded automata and those recognized in exponential time would settle the question. Nevertheless, varying the allowed computation time for the CTM algorithm allows us to capture approximations to the descriptive complexity of an object with lower computing resources, in a similar way as does considering each member of the Chomsky hierarchy.

2.8 Non-halting models

We also considered models with no halting configuration, such as cellular automata (nonH-CA) and Turing machines (nonH-TM) with no halting state, as defined in [37], in order to assess whether or not they converge to the Universal Distribution defined over machines with a halting condition. For cellular automata, we exhaustively ran all 256 Elementary Cellular Automata [37] (i.e. the closest neighbours and the centre cell are taken into consideration) and all 65 536 so-called General Cellular Automata [37] (that is, with two neighbours on one side, one on the other, and the centre cell). For Turing machines, we ran all 4096 (2,2) Turing machines with no halting state, and a sample of 65 536 (the same number as the CAs) Turing machines in (3,2), also with no halting state.
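As a minimal sketch of how an output can nevertheless be read off such non-halting systems, the following runs an Elementary Cellular Automaton for a fixed number of steps and takes the resulting row as the 'output'; the initial condition, boundary treatment and snapshot convention are illustrative assumptions, not the exact protocol used for the nonH-CA and nonH-TM experiments.

```python
def eca_step(rule, row):
    """One synchronous step of an Elementary Cellular Automaton (centre
    cell plus closest neighbours), with periodic boundary conditions
    (an assumption); `rule` is the Wolfram rule number, 0..255."""
    n = len(row)
    new_row = []
    for i in range(n):
        neighbourhood = (row[(i - 1) % n] << 2) | (row[i] << 1) | row[(i + 1) % n]
        new_row.append((rule >> neighbourhood) & 1)
    return new_row

def eca_output(rule, width=21, steps=10):
    """Run the rule from a single 1 in the centre of a row of the given
    width and return the row after `steps` steps as a binary string.
    Taking a fixed-time snapshot as the 'output' is an illustrative
    convention only, since these systems have no halting configuration."""
    row = [0] * width
    row[width // 2] = 1
    for _ in range(steps):
        row = eca_step(rule, row)
    return "".join(str(c) for c in row)

# example: rule 110 after 10 steps
print(eca_output(110))
```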

2.9 Consolidating the empirical distributions

In order to perform comparisons among the distributions at each of the Chomsky hierarchy levels, we need to consolidate the bias imposed by arbitrary choices in each model (e.g. starting from a tape filled with 0s or with 1s). This is because, for example, the string 0000 should occur exactly the same number of times as 1111, given that 0000 and 1111 have exactly the same algorithmic complexity. If $s$ is a string and $f(s)$ its production frequency, we therefore consolidate the algorithmic probability of $s$ by grouping $f(s)$ with the frequencies of the reversal of $s$ (e.g. 0001 becomes 1000) and the negation of $s$ (e.g. 0001 becomes 1110), for all empirical distributions for FSA, CFG, LBA and TM. It is worth noticing that reversal and negation increase the algorithmic complexity of $s$ only by a very small constant, and thus there is no reason to expect either the models or the strings to assign these variants different algorithmic complexity. Greater detail on the counting method is given in [5].
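A minimal sketch of one way to carry out this consolidation is given below, assuming that the production frequencies of a string, its reversal, its negation and the negated reversal are simply pooled; the exact combination used for the reported distributions is not reproduced here.

```python
from collections import Counter

def negate(s):
    """Bit-wise negation of a binary string, e.g. 0001 -> 1110."""
    return "".join("1" if c == "0" else "0" for c in s)

def consolidate(freq):
    """Pool the frequency of each string with that of its reversal and
    negation, and assign the pooled value to every member of the group."""
    pooled = Counter()
    for s, f in freq.items():
        group = {s, s[::-1], negate(s), negate(s[::-1])}
        pooled[min(group)] += f            # min(group) acts as a canonical representative
    return {s: pooled[min({s, s[::-1], negate(s), negate(s[::-1])})] for s in freq}

# example: 0001 and its reversed negation 0111 end up with the same count
print(consolidate({"0001": 3, "0111": 1}))   # {'0001': 4, '0111': 4}
```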

3 Results

Table 1 reports the number of strings produced by each model.

Chomsky Type | Computational Model | No. Strings
3 | FSA(8-22) | 294
3 | FSA/AP(8-22) | 1038
2 | CFG(40K) | 496
(2,0) | LBA(27) | 847
(2,0) | LBA(54) | 1225
(2,0) | LBA(81) | 1286
0 | LBA(107) = TM(4,2) | 1302
0 | TM(5,2) | 8190
Table 1: Strings of at most 12 bits produced at each level, out of a total of 8190 possible strings (all of which are generated by TM(5,2)). LBA was simulated by halting-time cutoffs, which is strictly less powerful than the TM model (Type-0) but strictly more powerful than CFG (Type-2); the ‘(2,0)’ entries indicate this intermediate position. Each LBA is followed by its runtime cutoff. The cutoff value of 107 steps is the Busy Beaver value for (4,2); given that the LBAs are emulated by TMs in (4,2), letting them run for up to 107 steps means that they exhaust the full TM (4,2) space and are thus equivalent to it. The empirical distribution for CFG was obtained by producing 40 000 grammars and checking whether each of the 1302 strings occurring in LBA(107) could be generated by any of those grammars.
Figure 1: Correlation plots (A) between the 2 models of Algorithmic Probability for FSA using the same enumeration (B) between FSA (both plain and FSA/AP-based) and (C) both FSA and FSA/AP separately against Turing machines.
Figure 2: Correlation plots of the four Chomsky hierarchy types. (A) FSA vs TM. (B) How FSA approximates TM before converging to a Kendall ranking correlation value below 0.9 (with LBA able to reach 1, see Fig. 5). (C) CFG vs TM and (D) LBA vs TM.

3.1 Finite-state complexity

The experiment consists of a thorough analysis of all the strings that satisfy the encoding condition: if a string can be decomposed into a valid transducer encoding $\sigma$ followed by an input $p$, we compute $T_\sigma(p)$ and add it to the set of output strings. A frequency distribution is then constructed from the set of output strings. Separately, we compute the Finite-state complexity of strings up to a bounded length (this is an arbitrary decision).

3.1.1 Produced distributions per encoding length

The results given in Table 2 indicate, per string length, how many strings can be decomposed into a part that encodes the transition table of some transducer and a part that serves as an input to it.

Running FSAs is very fast: there is no halting problem and all of them stop very quickly. However, while FSA and CFG preserve some of the ranking of more powerful computational models and accelerate the appearance of ‘jumpers’ (long strings of low complexity), these weak models of computation do not generate the high algorithmic complexity strings that populate the tails of the distributions of more powerful models, as shown in Fig. 4.

Size Strings Transducers
8 256 1
9 512 2
10 1024 6
11 2048 12
12 4096 34
13 8192 68
14 16384 156
15 32768 312
16 65536 677
17 131072 1354
18 262144 2814
19 524288 5628
20 1048576 11474
21 2097152 22948
22 4194304 46332
Table 2: Transducers per string length.

For example, out of the 256 binary strings of length 8 there is only one that encodes a transducer, namely the transducer with the transition function (4) (we refer to it as the smallest transducer).

In the case of length 9, we found that out of 512 strings only two encode a transducer, namely the smallest transducer with “0” and “1” as input. Again, the only string produced by this distribution is the empty string.

The distribution for encoding length 16 is the first in which one of the strings encodes a transducer with two states. The Finite-state complexity of the strings it produces is shown in Table 16 and Table 17 in the Supplementary Material.

The distribution for encoding length 18 consists of 2814 transducers (see Table 3 and Table 4). The longest string produced is of length 6.

String Probability
(empty string) 0.82800
00 0.02701
11 0.02701
000 0.01990
111 0.01990
0 0.01848
1 0.01848
0000 0.01279
1111 0.01279
00000 0.00426
11111 0.00426
01 0.00213
10 0.00213
000000 0.00071
0101 0.00071
1010 0.00071
111111 0.00071
Table 3: Example of a probability distribution: the distribution for encoding length 18.
Complexity Frequency
14 1024
12 640
13 512
11 320
10 212
9 106
Table 4: Frequency of Finite-state complexity values for the strings produced by the distribution for encoding length 18.

The remaining tables are reported in the Supplementary Material.

3.2 Computing Finite-state complexity

We performed another experiment in order to further analyze the characteristics of Finite-state complexity for all strings of length at most 8. Table 5 summarizes the results obtained when computing the Finite-state complexity of each such string, whereas Table 6 shows the strings that encode the transducers occurring in these minimal descriptions, together with their frequencies.

Complexity Frequency
4 1
7 2
8 2
9 4
10 4
11 10
12 22
13 32
14 56
15 126
16 252
Table 5: Frequency of Finite-state complexity values for strings of length at most 8.
Transducer Frequency
0000 1
000100 8
0001010110 1
00010110 4
0001011100 1
0001011110 1
000110 8
00011100 4
0001110100 1
0001110110 1
0001111100 1
01000110 480
Table 6: Frequencies of the strings that encode the transducers occurring in these minimal descriptions.

3.3 Context-free grammar distribution

We created production rules for 298 233 grammars with up to 26 non-terminal symbols and used the first 40 000 of them on a set of 155 strings whose frequencies we also had at all other levels (FSA, LBA and TM). Table 7 shows the top 20 most frequently produced strings.

String Frequency
0 5131
00 5206
000 5536
0000 5480
00000 5508
000000 5508
00001 2818
0001 2810
00010 2754
00011 2764
001 2812
0010 2692
00100 2750
00101 2744
0011 2736
00110 2730
00111 2736
01 2688
010 2748
0100 2742
Table 7: Top 20 most frequent strings generated by context-free grammars (CFG). Assuming that the algorithmic Coding theorem holds, we can transform the frequencies into a computable CFG-based estimation of algorithmic complexity for comparison.

3.4 Linear-bounded automata distribution

Table 8 shows the different values that we considered for the experiments and the number of strings produced by all LBAs with 2 and 3 states.

States Tape space Steps Initial position Strings produced
2 7 7 3 11
2 7 7 4 15
2 9 7 5 15
2 9 10 5 15
2 11 10 5 15
2 11 10 7 15
3 9 22 5 78
3 11 22 7 78
3 13 22 9 78
Table 8: Experiments with LBAs with 2 and 3 states.

Because of limitations of computational power, we randomly generated LBAs with 4 states. According to the assumptions explained above, the tape space should be 16; however, we took the tape space to be 17, since that allows us to place the initial head position right in the middle of the tape. Table 9 shows the experiments we performed.

As the results demonstrate, by varying the allowed execution time for the space of Turing machines we can approximate the CTM distribution corresponding to each Chomsky hierarchy level. For instance, regular languages (Type-3) can be decided in linear time, given that each transition and state of a finite automaton can be encoded by a corresponding state and transition of a Turing machine. Context-free languages (Type-2) can be decided in polynomial time with parsing algorithms such as CYK.

States  Tape space  Steps  Initial position  Random LBA's  No. strings produced
4 17 108 8 100000 93
4 17 108 8 200000 110
4 17 108 8 300000 125
4 17 108 8 350000 119
4 17 108 8 400000 134
4 17 108 8 500000 138
4 17 108 8 600000 143
Table 9: Experiments with randomly generated LBAs with 4 states.

3.5 Emergence of the Universal Distribution

Figure 3: Comparison of empirical distributions of different computational power, with TM in log plot (thus the only exponential), followed by LBA and then FSA and CFG when taking the coefficients of their fitted slopes. Running CFG is very expensive because, for every string, we need to test whether it can be generated by each of the produced grammars, in our case 1302 × 40 000 tests, and while the CYK algorithm runs in polynomial time, the combinatorial explosion makes the AP-based model by CFG intractable.
Figure 4: Each model misses the most random strings according to the next model of increasing power, with the greatest discrepancy against TM(5,2). For example, FSA and FSA/AP produce the same strings but assign different values to each string. The plot shows that FSA/AP outperforms FSA alone by more accurately identifying the strings of greatest randomness, and therefore missing fewer of them compared to the next level, CFG. The plot also shows that CFG is more powerful than LBA with halting runtime 27 and similar to LBA 54. The progression of LBA towards the full power of TM is also noticeable.

3.5.1 Time-bounded Emergence

Fig. 5 shows how LBAs asymptotically approximate the Universal Distribution.

Figure 5: Trading computational power for accuracy. (A) Number of halting machines per maximum imposed runtime. (B) Number of strings produced among all possible strings, showing that all strings up to length 9 were generated. (C) Output frequency distribution per running time, from highest to lowest frequency. (D) Smooth density plot maximizing the differences between the output distributions at each runtime, approximating the full power of Turing machines versus strictly lower computational power when bounding the runtime.

3.5.2 Rate of convergence of the distributions

One opportunity offered by this analysis is the assessment of the way in which other methods distribute strings by (statistical) randomness, such as Shannon entropy, and of the performance of other means of approximating algorithmic complexity, such as lossless compression algorithms, in particular one of the most popular, based on LZW (Compress). We can then compare these two methods against the estimation of the Universal Distribution produced by TM(4,2). The results (see Fig. 6) for both entropy and compression conform to the theoretical expectation. Entropy correlates best with the first level of the Chomsky hierarchy, that of FSAs, stressing that the discriminatory power of entropy to tell randomness apart from pseudo-randomness is limited to statistical regularities of the kind that regular languages capture. Lossless compression, at least as assessed by one of the most popular methods underlying other popular lossless compression formats, outperformed Shannon entropy, though not by much, and was at best most correlated with the output distribution generated by CFG. This does not come as a surprise, given that popular implementations of lossless compression are variations of Shannon entropy generalized to blocks (a variable-width window) that capture repetitions, often followed by a remapping that assigns shorter codes to values with higher probabilities (dictionary encoding, Huffman coding), and are thus effectively basic grammars based on simple rewriting rules. We also found that, while non-halting models approximate the Universal Distribution, they start diverging from TM and remain correlated with LBAs of lower runtime despite increasing the number of states. This may be expected from the over-representation of strings that halting machines would otherwise skip (output being defined, for machines with a halting configuration, as what is produced upon halting).
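A minimal sketch of this kind of comparison follows: Shannon entropy over the bit frequencies of a string, a general-purpose lossless compressor as a stand-in for the LZW-based Compress (zlib/DEFLATE here), and a hand-rolled Kendall rank correlation between the two resulting complexity rankings; the particular strings and measures below are illustrative only.

```python
import math
import zlib

def shannon_entropy(s):
    """Shannon entropy (bits per symbol) of the 0/1 frequencies of s."""
    p1 = s.count("1") / len(s)
    return sum(-p * math.log2(p) for p in (p1, 1 - p1) if p > 0)

def compressed_length(s):
    """Length in bytes of the zlib-compressed string, a rough stand-in
    for the LZW-based Compress referred to in the text."""
    return len(zlib.compress(s.encode()))

def kendall_tau(a, b):
    """Kendall rank correlation (tau-a, ignoring tie corrections)
    between two equally long lists of scores."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            concordant += prod > 0
            discordant += prod < 0
    return (concordant - discordant) / (n * (n - 1) / 2)

# illustrative ranking comparison on a handful of strings
strings = ["000000000000", "010101010101", "001101011101", "110100100110"]
entropies = [shannon_entropy(s) for s in strings]
sizes = [compressed_length(s) for s in strings]
print(kendall_tau(entropies, sizes))
```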

Figure 6: Coding-theorem-like behaviour and emergence of the Universal Distribution. Correlation of the empirical output distributions with the output distribution of TM(5,2) as introduced and thoroughly investigated in [21]. A progression towards greater correlation is noticeable as a function of increasing computational power. Bold black labels are placed at their Chomsky level and gray labels are placed within the most highly correlated level. Shannon entropy and lossless compression (Compress) distribute values below or at about the first two Chomsky types, as expected. It is not surprising to see LBA with runtime 107 depart further in ranking, because LBA after 27 steps produces the highest-frequency strings, which are expected to converge faster; eventually LBA 107 (which is nothing but TM(4,2)) will converge to TM(5,2). An empirical bound for non-halting models appears to be low-runtime LBA, even when increasing the number of states (or symbols for CA).

4 Some open questions

4.1 Tighter bounds and suprema values

We have provided upper and lower bounds for each model of computation, but the current intervals seem to overlap for some measures of correlation. One open question concerns the exact boundaries, in particular how much closer to the Universal Distribution the supremum for each model of computation can take us; in other words, finding tighter bounds for the intervals at each level.

4.2 Computable error corrections

There are some very interesting open questions to explore further in the future, for example whether computable corrections can be made to sub-universal distributions, e.g. those calculated by context-free grammars, in order to correct for trivial (and also computable) biases such as string length. Indeed, while CFG produced an interesting distribution, closer to that of LBAs and better than that of FSAs, there is, for example, an over-representation of long but trivial strings which can easily be corrected. The suspicion is that, while one can apply these corrections and increase the speed of convergence to TM, the interesting cases are non-computable. A theoretical framework and a numerical analysis of such corrections would be interesting to develop.

4.3 Sensitivity to choice of enumeration

The choice of enumeration is equivalent to the choice of programming language or of reference universal Turing machine. An interesting question is that of the stability of the chosen enumerations for the different models of computation, both at the same and at different levels of computational power. A striking outcome of the results reported here is not only that increasing computing power better approximates the estimation of the distribution produced by Turing machines, but that completely different models of computation, differing not only in language description but also in computational power, are rather independent of the enumeration. While the enumeration that we followed for each computational model is not arbitrary, since it follows a criterion of increasing program size, it is clear that one can devise enumerations that produce completely different behaviour. We have pointed out this stability before in [38], with experiments on Turing machines, cellular automata and Post tag systems, and noted how these results suggest some sort of ‘natural behaviour’, defined as behaviour that is not artificially introduced with the purpose of producing a different-looking initial distribution (before converging, per the invariance theorem, for systems with that property). In this sense, in [4] we proposed a measure of ‘algorithmicity’ in the sense of the Universal Distribution, quantifying how close to or removed from other approximations a method producing a distribution is, in particular from one of the most standard Turing machine models, the one used for the Busy Beaver problem [8], which we have shown not to be a special case, since several other variations of this model and completely different models of computation produce similar output distributions [41, 42, 4], including the results reported in this paper. One further open question is to enumerate systems in different ways and to numerically quantify how many of them are convergent or divergent, how much and for how long the divergent ones diverge and under what conditions, and whether the convergent ones dominate.

4.4 Missing strings from non-halting models

We have seen that, for halting models of computation, decreasing the computational power has the effect of missing the most algorithmically random strings of the next model up in computational power. As we have also seen, non-halting models seem to converge to the lower-runtime distributions of LBAs; even though these are highly correlated with TMs, the non-halting models do not appear to approach TM but rather remain stable, producing output distributions similar to LBA. An interesting question to explore is what kind of strings are missed by non-halting machines. As opposed to the strings missed by halting machines below the Turing-universal limit, Turing-universal models of computation without a halting configuration may miss other kinds of strings. Do they miss more or fewer random (or simple) strings compared to halting models?

5 Conclusions

Different sub-universal systems may produce different empirical distributions, not only in practice but also in principle, i.e. asymptotically they may diverge. However, we have here provided the means to make meaningful comparisons, especially against an empirical distribution that has been found to be stable and apparently convergent. It is interesting to explore and seek coding-theorem-like behaviour in sub-universal systems in order to better understand the landscape of algorithmic probability and complexity for all types of computational systems, a task to which we have here contributed.

The results reported here show that, indeed, the closer a system is to Turing universality, the more similar its output distribution is to the empirical distribution of universal systems, and that finite approximation of algorithmic complexity from finite sub-universal systems [26] is an interesting approach, departing from the use of limited computable measures, for approximating the power of universal measures of complexity.

The results also show improvements over the currently dominant tools for approximating algorithmic complexity, such as lossless compression algorithms. To our knowledge, it was previously not possible to quantify or even compare lossless compression algorithms in this role, as there was no standard or numerical alternative for approximating algorithmic complexity. The construction of empirical distributions based on Algorithmic Probability does provide the means, and constitutes an approach to evaluating performance, in what we have called an algorithmicity test [38].

At least for our implementations (which may or may not be indicative of the best algorithms, in terms of time complexity, for emulating these computational models and thus producing their output distributions), it is not particularly faster to produce distributions from weaker models when the interest lies in producing strings of high algorithmic complexity for evaluation, but it is faster in cases where only finer-grained values for low-complexity strings are needed. Compared to entropic and lossless compression approximations, producing partial distributions from finite approximations of Algorithmic Probability, even over weak models of computation, constitutes a major improvement for strings that would otherwise be assigned a greater randomness content by traditional methods such as Shannon entropy and equivalent statistical formulations.

References

  • [1] L. Antunes and L. Fortnow, Time-Bounded Universal Distributions, Electronic Colloquium on Computational Complexity, Report No. 144, 2005.
  • [2] L. Bienvenu and R. Downey. Kolmogorov complexity and Solovay functions. In STACS, volume 3 of LIPIcs, pages 147–158. Schloss Dagstuhl- Leibniz-Zentrum fuer Informatik, 2009.
  • [3] C. Campeanu, K. Culik II, K. Salomaa, S. Yu, State complexity of basic operations on finite languages. In: O. Boldt, H. Jürgensen (eds.) WIA 1999, LNCS, vol. 2214, pp. 60–70. Springer, Heidelberg, 2001.
  • [4] J.-P. Delahaye, H. Zenil, Towards a stable definition of Kolmogorov-Chaitin complexity, arXiv:0804.3459 [cs.IT], 2008.
  • [5] J.-P. Delahaye and H. Zenil, Numerical Evaluation of the Complexity of Short Strings: A Glance Into the Innermost Structure of Algorithmic Randomness, Applied Mathematics and Computation 219, pp. 63–77, 2012.
  • [6] K. Dingle, S. Schaper, and A.A. Louis, The structure of the genotype-phenotype map strongly constrains the evolution of non-coding RNA, Interface Focus 5: 20150053, 2015.
  • [7] N. Gauvrit, F. Soler-Toscano, H. Zenil, Natural Scene Statistics Mediate the Perception of Image Complexity, Visual Cognition, vol. 22:8, pp. 1084–1091, 2014.
  • [8] T. Rado, “On non-computable functions” Bell System Technical Journal 41:3, 877–884, 1962.
  • [9] N. Gauvrit, H. Zenil, F. Soler-Toscano, J.-P. Delahaye, P. Brugger, Human Behavioral Complexity Peaks at Age 25, PLoS Comput Biol 13(4): e1005408, 2017.
  • [10] S.F. Greenbury, I.G. Johnston, A.A. Louis, S.E. Ahnert, J. R., A tractable genotype-phenotype map for the self-assembly of protein quaternary structure, Soc. Interface 11, 20140249, 2014.
  • [11] S. Hernández-Orozco, H. Zenil, N.A. Kiani, Algorithmically probable mutations reproduce aspects of evolution such as convergence rate, genetic memory, modularity, diversity explosions, and mass extinction, arXiv:1709.00268 [cs.NE]
  • [12] V. Kempe, N. Gauvrit, D. Forsyth, Structure emerges faster during cultural transmission in children than in adults, Cognition, 136, 247–254, 2014.
  • [13] R. Hölzl, T. Kräling, W. Merkle, Time-bounded Kolmogorov complexity and Solovay functions, Theory Comput. Syst. 52:1, 80–94, 2013.
  • [14] M. Hutter, A Theory of Universal Artificial Intelligence based on Algorithmic Complexity, Springer, 2000.
  • [15] L.A Levin, Universal sequential search problems. Problems of Information Transmission, 9:265–266, 1973.
  • [16] L.A. Levin. Laws of information conservation (non-growth) and aspects of the foundation of probability theory, Problems Information Transmission, 10(3):206–210, 1974.
  • [17] L.G. Kraft, A device for quantizing, grouping, and coding amplitude modulated pulses, Cambridge, MA: MS Thesis, Electrical Engineering Department, Massachusetts Institute of Technology, 1949.
  • [18] Calude, C. S., Salomaa, K., & Roblot, T. K. (2011). Finite-state complexity. Theoretical Computer Science, 412(41), 5668-5677.
  • [19] S. Schaper and A.A. Louis, The arrival of the frequent: how bias in genotype-phenotype maps can steer populations to local optima PLoS ONE 9(2): e86635, 2014.
  • [20] B.R. Steunebrink, J. Schmidhuber, Towards an Actual Gödel Machine Implementation. In P. Wang, B. Goertzel, eds., Theoretical Foundations of Artificial General Intelligence, Springer, 2012.
  • [21] Soler-Toscano, F., Zenil, H., Delahaye, J. P. & Gauvrit, N. (2014). Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines. PloS one, 9(5), e96223.
  • [22] J.E. Hopcroft, & J.D. Ullman, Formal languages and their relation to automata, 1969.
  • [23] W. Kirchherr and M. Li and P. Vitányi, The Miraculous Universal Distribution, Mathematical Intelligencer, 19, 7–15, 1997.
  • [24] B.Y. Peled, V.K. Mishra, A.Y. Carmi, Computing by nowhere increasing complexity, arXiv:1710.01654 [cs.IT]
  • [25] J. Rangel-Mondragon, Recognition and Parsing of Context-Free, http://library.wolfram.com/infocenter/MathSource/3128/ Accessed on Aug 15, 2017.
  • [26] F. Soler-Toscano, H. Zenil, A Computable Measure of Algorithmic Probability by Finite Approximations with an Application to Integer Sequences, Complexity (accepted).
  • [27] J. Schmidhuber, Optimal Ordered Problem Solver, Machine Learning, 54, 211–254, 2004.
  • [28] J. Schmidhuber, V. Zhumatiy, M. Gagliolo, Bias-Optimal Incremental Learning of Control Sequences for Virtual Robots. In Groen, et al. (eds) Proceedings of the 8th conference on Intelligent Autonomous Systems, IAS-8, Amsterdam, The Netherlands, pp. 658–665, 2004.
  • [29] F. Soler-Toscano, H. Zenil, J.-P. Delahaye, and N. Gauvrit, Small Turing Machines with Halting State: Enumeration and Running on a Blank Tape. http://demonstrations.wolfram.com/SmallTuringMachinesWithHaltingStateEnumerationAndRunningOnAB/. Wolfram Demonstrations Project. Published: January 3, 2013.
  • [30] R.J. Solomonoff. A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.
  • [31] M. Li, and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed, Springer, N.Y., 2008.
  • [32] R.J. Solomonoff, Complexity–Based Induction Systems: Comparisons and Convergence Theorems, IEEE Trans. on Information Theory, vol IT–24, No. 4, pp. 422–432, 1978.
  • [33] R.J. Solomonoff, The Application of Algorithmic Probability to Problems in Artificial Intelligence, in L.N. Kanal and J.F. Lemmer (eds.), Uncertainty in Artificial Intelligence, pp. 473–491, Elsevier, 1986.
  • [34] Solomonoff, R.J. A System for Incremental Learning Based on Algorithmic Probability, Proceedings of the Sixth Israeli Conference on Artificial Intelligence, Computer Vision and Pattern Recognition, Dec. 1989, pp. 515–527.
  • [35] R.M. Wharton, Grammar enumeration and inference, Information and Control, 33(3), 253–272, 1977.
  • [36] M. Wiering and J. Schmidhuber. Solving, POMDPs using Levin search and EIRA, In Proceedings of the International Conference on Machine Learning (ICML), pages 534–542, 1996.
  • [37] S. Wolfram, A New Kind of Science, Wolfram Media, Champaign, IL., 2002.
  • [38] H. Zenil and J-P. Delahaye, On the Algorithmic Nature of the World, In G. Dodig-Crnkovic and M. Burgin (eds), Information and Computation, World Scientific Publishing Company, 2010.
  • [39] H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Correlation of Automorphism Group Size and Topological Properties with Program-size Complexity Evaluations of Graphs and Complex Networks, Physica A: Statistical Mechanics and its Applications, vol. 404, pp. 341–358, 2014.
  • [40] H. Zenil, N.A. Kiani and J. Tegnér, Methods of Information Theory and Algorithmic Complexity for Network Biology, Seminars in Cell and Developmental Biology, vol. 51, pp. 32-43, 2016.
  • [41] H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility, PeerJ Computer Science, 1:e23, 2015.
  • [42] F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit , Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures, Computability, vol. 2, no. 2, pp 125-140, 2013.

Supplementary Material

.0.1 Empirical distributions for FSA

For encoding length 10 there are six strings that encode a transducer; in fact, two of them are different from the smallest transducer. However, as in the previous cases, the only string produced is the empty string.

The distribution for length 11 consists of 12 strings, with inputs of length one and three, but the output of these transducers is still the empty string.

The distribution for length 12 contains 34 transducers, with inputs whose lengths range from two to four. The output distribution still contains only the empty string.

The distribution for length 13 is more interesting. It consists of 68 strings whose inputs have lengths 1, 3 and 5. Table 10 shows the probability distribution of the strings produced, and the Finite-state complexity of the strings that comprise it is summarized in Table 11.

String Probability
0.94118
0 0.02941
1 0.02941
Table 10: Probability distribution of .
Complexity Frequency
9 32
7 20
8 16
Table 11: Frequency of Finite-state complexity for strings produced by .

The distribution for length 14 is richer than the previous one, since it contains 156 strings that encode different transducers. Table 12 shows the different strings produced by this distribution.

String Probability
0.92308
0 0.02564
1 0.02564
00 0.01282
11 0.01282
Table 12: Probability distribution of .

We note the following facts:

  • The length of the longest string produced is two.

  • The empty string remains the one with the highest probability.

  • The Finite-state complexity of the strings produced ranges from 7 to 10 (see Table 13).

  • The distribution produces two of the four strings of length 2, namely “00” and “11”.

Complexity Frequency
10 64
8 40
9 32
7 20
Table 13: Frequency of Finite-state complexity for strings produced by .

The distribution for length 15 is quite similar to the previous one in terms of the strings it produces (see Table 12 and Table 15).

String Probability
0.88462
0 0.03205
1 0.03205
00 0.01923
11 0.01923
000 0.00641
111 0.00641
Table 14: Probability distribution of .
Complexity Frequency
11 128
9 80
10 64
8 40
Table 15: Frequency of Finite-state complexity for strings produced by .
String Probability
0.87592
0 0.02363
00 0.02363
1 0.02363
11 0.02363
000 0.01182
111 0.01182
0000 0.00295
1111 0.00295
Table 16: Probability distribution of the output strings.
Complexity Frequency
12 256
10 160
11 128
9 80
8 53
Table 17: Frequency of Finite-state complexity for the strings produced.

The next distribution shows an even more diverse set of produced strings (see Table 18). We note the following facts:

  • The longest string produced is of length 5.

  • For the first time, a distribution produces all strings of length 2.

String Probability
ε 0.83752
0 0.02806
1 0.02806
00 0.02511
11 0.02511
000 0.01773
111 0.01773
0000 0.00739
1111 0.00739
00000 0.00148
01 0.00148
10 0.00148
11111 0.00148
Table 18: Probability distribution of the output strings.
Complexity Frequency
13 512
11 320
12 256
10 160
9 106
Table 19: Frequency of Finite-state complexity for the strings produced.

.0.2 Empirical distributions for larger encoding lengths

Here are the strings that comprise each one of these distributions.

String Probability
ε 0.80597
000 0.02345
111 0.02345
00 0.02274
11 0.02274
0 0.01812
1 0.01812
0000 0.01706
1111 0.01706
00000 0.00817
11111 0.00817
000000 0.00284
111111 0.00284
01 0.00178
10 0.00178
0101 0.00107
1010 0.00107
0000000 0.00036
001 0.00036
010 0.00036
010101 0.00036
011 0.00036
100 0.00036
101 0.00036
101010 0.00036
110 0.00036
1111111 0.00036
Table 20: Probability distribution of the output strings. ‘Jumpers’, as defined in [5] and [21], are apparent: simple strings of much greater length that nevertheless climb the complexity ladder.
Complexity Frequency
15 2048
13 1280
14 1024
12 640
11 424
10 212
Table 21: Frequency of Finite-state complexity for the strings produced.
String Probability
ε 0.80024
0000 0.02144
1111 0.02144
00 0.02092
000 0.02092
11 0.02092
111 0.02092
0 0.01185
00000 0.01185
1 0.01185
11111 0.01185
000000 0.00593
111111 0.00593
01 0.00174
10 0.00174
0101 0.00157
1010 0.00157
0000000 0.00139
1111111 0.00139
010101 0.00070
101010 0.00070
00000000 0.00035
11111111 0.00035
0001 0.00017
0010 0.00017
0011 0.00017
0100 0.00017
01010101 0.00017
0110 0.00017
0111 0.00017
1000 0.00017
1001 0.00017
10101010 0.00017
1011 0.00017
1100 0.00017
1101 0.00017
1110 0.00017
Table 22: Probability distribution of the output strings.
Complexity Frequency
16 4096
14 2560
15 2048
13 1280
12 848
11 424
10 218
Table 23: Frequency of Finite-state complexity for the strings produced.
String Probability
ε 0.78630
0000 0.02109
1111 0.02109
000 0.02066
111 0.02066
00 0.01682
11 0.01682
00000 0.01665
11111 0.01665
0 0.01063
1 0.01063
000000 0.00959
111111 0.00959
0000000 0.00331
1111111 0.00331
01 0.00166
10 0.00166
0101 0.00139
1010 0.00139
00000000 0.00122
11111111 0.00122
010101 0.00105
101010 0.00105
01010101 0.00044
10101010 0.00044
001 0.00026
010 0.00026
011 0.00026
100 0.00026
101 0.00026
110 0.00026
000000000 0.00009
0000000000 0.00009
00001 0.00009
00010 0.00009
00011 0.00009
00100 0.00009
00101 0.00009
00110 0.00009
00111 0.00009
01000 0.00009
01001 0.00009
01010 0.00009
0101010101 0.00009
01011 0.00009
01100 0.00009
01101 0.00009
01110 0.00009
01111 0.00009
10000 0.00009
10001 0.00009
10010 0.00009
10011 0.00009
10100 0.00009
10101 0.00009
1010101010 0.00009
10110 0.00009
10111 0.00009
11000 0.00009
11001 0.00009
11010 0.00009
11011 0.00009
11100 0.00009
11101 0.00009
11110 0.00009
111111111 0.00009
1111111111 0.00009
Table 24: Probability distribution of the output strings.
Complexity Frequency
17 8192
15 5120
16 4096
14 2560
13 1696
12 848
11 436
Table 25: Frequency of Finite-state complexity for the strings produced.

String Probability
ε 0.78313
0000 0.02180
1111 0.02180
000 0.01744
111 0.01744
00000 0.01727
11111 0.01727
000000 0.01442
111111 0.01442
00 0.01416
11 0.01416
0 0.00622
1 0.00622
0000000 0.00587
1111111 0.00587
00000000 0.00276
11111111 0.00276
0101 0.00160
1010 0.00160
01 0.00138
10 0.00138
010101 0.00125
101010 0.00125
01010101 0.00073
10101010 0.00073
000000000 0.00043
111111111 0.00043
0000000000 0.00030
1111111111 0.00030
0101010101 0.00026
1010101010 0.00026
001 0.00017
010 0.00017
011 0.00017
100 0.00017
101 0.00017
110 0.00017
0001 0.00009
0010 0.00009
001001 0.00009
0011 0.00009
0100 0.00009
010010 0.00009
0110 0.00009
011011 0.00009
0111 0.00009
1000 0.00009
1001 0.00009
100100 0.00009
1011 0.00009
101101 0.00009
1100 0.00009
1101 0.00009
110110 0.00009
1110 0.00009
000000000000 0.00004
000001 0.00004
000010 0.00004
000011 0.00004
000100 0.00004
000101 0.00004
000110 0.00004
000111 0.00004
001000 0.00004
001010 0.00004
001011 0.00004
001100 0.00004
001101 0.00004
001110 0.00004
001111 0.00004
010000 0.00004
010001 0.00004
010011 0.00004
010100 0.00004
010101010101 0.00004
010110 0.00004
010111 0.00004
011000 0.00004
011001 0.00004
011010 0.00004
011100 0.00004
011101 0.00004
011110 0.00004
011111 0.00004
100000 0.00004
100001 0.00004
100010 0.00004
100011 0.00004
100101 0.00004
100110 0.00004
100111 0.00004
101000 0.00004
101001 0.00004
101010101010 0.00004
101011 0.00004
101100 0.00004
101110 0.00004
101111 0.00004
110000 0.00004
110001 0.00004
110010 0.00004
110011 0.00004
110100 0.00004
110101 0.00004
110111 0.00004
111000 0.00004
111001 0.00004
111010 0.00004
111011 0.00004
111100 0.00004
111101 0.00004
111110 0.00004
111111111111 0.00004
Table 26: Probability distribution of the output strings.

Complexity Frequency
18 16384
16 10240
17 8192
15 5120
14 3392
13 1696
12 872
11 436
Table 27: Frequency of Finite-state complexity for the strings produced.

.0.3 Code in Python for finite-state complexity

The program distributionTransducers.py analyzes all strings of a given range of lengths to determine whether each one is a valid encoding of a transducer together with an input string and, if so, computes the corresponding output. The program generates a set of output strings (result-experiment-distribution.csv) from which we can construct an output frequency distribution; a sketch of how this file can be post-processed is given after the column list below.

Example of execution:

  • python distributionTransducers.py 8 10 analyzes all strings from length 8 up to length 10.

  • python distributionTransducers.py 8 8 analyzes all strings of length 8.

The file result-experiment-distribution.csv contains the following columns:

  • string, the candidate string being analyzed (as discussed above).

  • valid-encoding, takes the value 1 if the string is a valid transducer encoding and 0 otherwise.

  • sigma, the substring that encodes the transducer.

  • string-p, the substring that encodes the input to the transducer.

  • num-states, the number of states of the encoded transducer.

  • output, the string output by the encoded transducer on the given input.

  • output-complexity, the finite-state complexity of the output string.
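As mentioned above, the output frequency distribution can be reconstructed directly from this file. The following is a minimal sketch, assuming the column names listed above and a standard comma-separated layout with a header row (an assumption about the exact file format):

import csv
from collections import Counter

def output_distribution(path="result-experiment-distribution.csv"):
    """Empirical probability of each output string among the valid encodings."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["valid-encoding"] == "1":   # keep only valid encodings
                counts[row["output"]] += 1
    total = sum(counts.values())
    return {s: n / total for s, n in counts.most_common()}

Sorting by decreasing count reproduces the ordering used in the probability tables above, with the empty string at the top.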

The program computeComplexityStrings.py computes the finite-state complexity of all strings up to a given length (this is an implementation of the algorithm described in [18]). The program generates the file result-complexity.csv, which contains the following columns (a sketch combining the two CSV files is given after the list):

  • x, the string whose finite-state complexity is being computed.

  • complexity, the finite-state complexity of x.

  • sigma, the transducer part of a minimal description of x, i.e. the encoding of a transducer that produces x.

  • string-p, the input part of the minimal description, i.e. the string on which the encoded transducer produces x.
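A frequency table of finite-state complexity values for the produced strings (as in Tables 11, 13, and so on) can then be obtained by joining the two files. The sketch below assumes the column names described above; alternatively, the output-complexity column of result-experiment-distribution.csv can be tallied directly.

import csv
from collections import Counter

def complexity_frequencies(dist_path="result-experiment-distribution.csv",
                           cplx_path="result-complexity.csv"):
    """Frequency of finite-state complexity values among the produced strings."""
    # map each string x to its finite-state complexity
    with open(cplx_path, newline="") as f:
        complexity = {row["x"]: int(row["complexity"]) for row in csv.DictReader(f)}
    freq = Counter()
    with open(dist_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["valid-encoding"] != "1":
                continue
            c = complexity.get(row["output"])
            if c is not None:   # skip outputs not covered by result-complexity.csv
                freq[c] += 1
    return dict(freq.most_common())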
