Coding-theorem Like Behaviour and Emergence of the Universal Distribution from Resource-bounded Algorithmic Probability
Abstract
Introduced by Solomonoff and Levin, the seminal concept of Algorithmic Probability (AP) and the Universal Distribution (UD) predicts the way in which strings distribute as the output of running 'random' computer programs. Previously referred to as 'miraculous' because of its surprisingly powerful properties and its applications as the optimal theoretical solution to the challenge of induction and inference, approximations to AP and the UD are of the greatest importance in computer science and in science in general. Here we are interested in the emergence, rates of convergence, and Coding-theorem-like behaviour as a marker of algorithmic probability at work in sub-universal models of computation. To this end, we investigate empirical distributions of computer programs of weaker computational power according to the Chomsky hierarchy. We introduce measures of algorithmic probability and algorithmic complexity based upon resource-bounded computation, and compare them to previously thoroughly investigated distributions produced from the output of Turing machines. The approach allows for numerical approximations to algorithmic (Kolmogorov-Chaitin) complexity-based estimations at each level of a computational hierarchy. We demonstrate that all these estimations are correlated in rank and that they converge in rank as a function of computational power, despite the fundamental differences between the computational models.
Keywords: algorithmic coding-theorem-like behaviour, Solomonoff's induction, Levin's semi-measure, computable algorithmic complexity, finite-state complexity, transducer complexity, context-free grammar complexity, linear-bounded complexity, time resource-bounded complexity.
Department of Computer Science, University of Oxford, Oxford, U.K.
Algorithmic Dynamics Lab, Unit of Computational Medicine, SciLifeLab, Centre for Molecular Medicine, Department of Medicine Solna, Karolinska Institute, Stockholm, Sweden.
Algorithmic Nature Group, LABORES, Paris, France.
Posgrado en Ciencia e Ingeniería de la Computación, Universidad Nacional Autónoma de México (UNAM).
Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, México.
1 Motivation and Significance
An algorithmic 'law' regulates the behaviour of the output distribution of computer programs. The Universal Distribution is the probability distribution that establishes how the output strings resulting from running 'random' computer programs on a universal computer distribute. The Algorithmic Probability of a string $s$ is formally defined by
$$m(s) = \sum_{p\,:\,U(p) = s} \frac{1}{2^{|p|}} < 1, \qquad (1)$$
where the sum is over all halting programs $p$ for which $U$, a prefix-free universal Turing machine, outputs the string $s$. A prefix-free universal Turing machine defines a set of valid programs such that the sum is bounded, by Kraft's inequality [17], and not greater than 1 ($m$ is also called a semi-probability measure, because some programs do not halt and thus the sum of the probabilities never actually reaches 1).
An invariance theorem establishes that the choice of reference universal Turing machine introduces a bias that vanishes as a function of string size and is thus asymptotically negligible:
$$\left|\log_2 m_U(s) - \log_2 m_{U'}(s)\right| < c_{U,U'}, \qquad (2)$$
where $c_{U,U'}$ is a constant that depends on $U$ and $U'$ (think of the size of a compiler translating in both directions) but is independent of $s$, so that the reference machine can safely be dropped in the long term. Yet this invariance theorem says nothing about the rate of convergence, which makes numerical experiments such as these all the more relevant and necessary.
Algorithmic Probability and the Universal Distribution represent the theoretically optimal solution to the challenge of induction and inference, according to R. Solomonoff, a co-founder of algorithmic information theory [32, 33, 34].
More recently, at a panel discussion at the World Science Festival in New York City on Dec 14, 2014, Marvin Minsky, one of the founding fathers of AI, said [own transcription]:
It seems to me that the most important discovery since Gödel was the discovery by Chaitin, Solomonoff and Kolmogorov of the concept called Algorithmic Probability which is a fundamental new theory of how to make predictions given a collection of experiences and this is a beautiful theory, everybody should learn it, but it’s got one problem, that is, that you cannot actually calculate what this theory predicts because it is too hard, it requires an infinite amount of work. However, it should be possible to make practical approximations to the Chaitin, Kolmogorov, Solomonoff theory that would make better predictions than anything we have today. Everybody should learn all about that and spend the rest of their lives working on it.
The Universal Distribution has also been referred to as 'miraculous', because of its properties in inference and prediction [23]. However, neither AP nor the UD is computable. This has meant that for decades after its discovery little to no attempt was made to apply Algorithmic Probability to problems in general science. Nonetheless, $m$ is not only uncomputable but, more precisely, lower semi-computable, which means that it can be approximated from below. It is thus of fundamental interest to science, in its application and contribution to the challenges of complexity, inference and causality, to find and keep pushing the boundaries towards better methods for approximating algorithmic probability. A recent framework and pipeline of numerical methods to estimate them has been advanced and proven successful in many areas of application, ranging from cognition to graph complexity [9, 12, 7, 39, 40].
There are many properties of AP that make it optimal [32, 33, 34, 23]. For example, the same Universal Distribution will work for any problem within a convergent error; it can deal with missing and multidimensional data; the data need not be stationary or ergodic; there is no underfitting or overfitting, because the method is parameter-free and thus the data need not be divided into training and test sets; and it is the gold standard for a Bayesian approach, in the sense that it updates the distribution in the most efficient and accurate way possible with no assumptions.
Several interesting extensions of resource-bounded Universal Search approaches have been introduced to make algorithmic probability more useful in practice [27, 28, 20, 14], some of which provide theoretical bounds [1]. Some approaches have explored relaxing some of the conditions (e.g. universality) on which Levin's Universal Search is fundamentally based [36], or have introduced domain-specific versions (and thus versions of conditional AP). Here we explore the behaviour of explicitly weaker models of computation of increasing computational power, in order to investigate the asymptotic behaviour and emergence of the Universal Distribution, the capabilities of the different models to approximate it, and the actual empirical distributions that such models produce.
The so-called Universal Search [15] is based on dovetailing over all possible programs and their runtimes, such that the fraction of time allocated to program $p$ is $2^{-|p|}$, where $|p|$ is the size of the program in bits. Despite the algorithm's simplicity and remarkable theoretical properties, a potentially huge constant slowdown factor has kept it from being used much in practice. Approaches to speeding it up have included introducing bias and making the search domain-specific, which has at the same time limited the power of Algorithmic Probability.
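To make the time-allocation scheme concrete, here is a minimal Python sketch of the dovetailing idea, in which a program of length $\ell$ receives a budget of $2^{k-\ell}$ steps in phase $k$ (i.e. a fraction of the total time proportional to $2^{-|p|}$). The interpreter run is a toy stand-in introduced purely for illustration; it is not Levin's construction nor any implementation used in this paper.

from itertools import count, product

def run(program: str, target: str, max_steps: int):
    """Toy stand-in interpreter: 'executes' one bit per step and succeeds
    when the program spells out the target string."""
    if len(program) <= max_steps and program == target:
        return target
    return None

def universal_search(target: str, max_phase: int = 16):
    for k in count(1):                      # each phase doubles the total budget
        if k > max_phase:
            return None                     # give up (the real search never does)
        for length in range(1, k):
            budget = 2 ** (k - length)      # time share proportional to 2**-|p|
            for bits in product("01", repeat=length):
                p = "".join(bits)
                if run(p, target, budget) is not None:
                    return p                # a short and fast program was found

print(universal_search("101"))              # -> '101' under the toy semantics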
There are practical applications of AP that make it very relevant. If one could translate some of the power of Algorithmic Probability to decidable models (thus below Type-0 in the Chomsky hierarchy) without having to deal with the uncomputability of algorithmic complexity and algorithmic probability, it would effectively be possible to trade computing power for predictive power. While trade-offs must exist for this to be possible (full predictability and uncomputability are incompatible), finding a threshold at which the Coding theorem starts to apply would be key to transferring some of this power by relaxing the computational power of an algorithm. If a smooth trade-off is found before the undecidability border of Turing-completeness, it means that the partial advantages of Algorithmic Information Theory can be found in, and partially recovered from, simpler models of computation in exchange for accuracy. Such simpler models of computation may model physical processes that are computationally rich but are subject to noise or bounded by resources. More real-world approaches may then lead to applications such as the reduction of conformational distributions in protein folding [6, 10], in a framework that may favour or rule out certain paths, thereby helping to predict the most likely (algorithmic) final configuration. If the chemical and thermodynamic laws that these processes are subject to are considered algorithmic in any way, even under random interactions (e.g. molecular Brownian motion), the Universal Distribution may help shed some light on quantifying the most likely regions if, in any way, those laws constitute forms of computation below or at the Turing level that we explore here. This can be considered all the more plausible if one considers that probabilistic bias affects convergence [19] and that we have demonstrated [11] that finite approximations to the Universal Distribution may explain some of the phenomenology of natural selection.
1.1 Uncomputability in complexity
Here we explore the middle ground at the boundary, and study the interplay between computable and non-computable measures of algorithmic probability connected to algorithmic complexity. Indeed, a deep connection between the algorithmic (Kolmogorov-Chaitin) complexity $K(s)$ of an object $s$ and its algorithmic probability $m(s)$ was found and formalized by way of the algorithmic Coding theorem. The theorem establishes that the probability of $s$ being produced by a random algorithm is inversely proportional to its algorithmic complexity (up to a constant) [16]:
$$m(s) = 2^{-K(s) + O(1)}. \qquad (3)$$
Levin proved that the output distribution established by Algorithmic Probability dominates (up to a multiplicative constant) any other distribution produced by algorithmic means, so long as the executor is a universal machine, hence giving the distribution its 'universal' character (and its name, the 'Universal Distribution').
This so-called Universal Distribution is a signature of Turing-completeness. However, many processes that model or regulate natural phenomena may not necessarily be Turing-universal. For example, some models of self-assembly may not be powerful enough to reach Turing-completeness, yet they display output distributions similar to those predicted by the Universal Distribution by way of the algorithmic Coding theorem, with simplicity highly favoured in production frequency. Noise is another source of power degradation that may hamper universality, and therefore the scope and application of algorithmic probability. However, if some sub-universal systems approach coding-theorem behaviour, this would give us great predictive capabilities together with less powerful but computable measures of algorithmic complexity. Here we ask whether such distributions can be partially or totally explained by importing the relation established by the Coding theorem, and under what conditions non-universal systems can display algorithmic coding-theorem-like behaviour.
Here we produce empirical distributions of systems at each of the computing levels of the Chomsky hierarchy: transducers (Type-3) as defined in [18], context-free grammars (Type-2) as defined in [35], linear-bounded nondeterministic Turing machines (Type-1) as approximations to bounded Kolmogorov-Chaitin complexity, and a universal procedure from an enumeration of Turing machines (Type-0) as defined in [5, 21]. We report the results of the experiments and comparisons, showing the gradual coding-theorem-like behaviour at the boundary between decidable and undecidable systems.
2 Methods
We will denote by $TM(n, m)$, or just $TM(n)$ when $m = 2$, the set of all strings produced by all the Turing machines with $n$ states and $m$ symbols.
2.1 The Chomsky Hierarchy
The Chomsky hierarchy is a strict containment hierarchy of classes of formal grammars equivalent to different computational models of increasing computing power. At each of the 4 levels, grammars and automata compute a larger set of possible languages and strings. From weaker to stronger computational power:
[Type 3] The most restricted grammars, generating the regular languages: rules have a single non-terminal symbol on the left-hand side and terminal or non-terminal symbols on the right-hand side. This level is studied by way of finite-state transducers (FSTs), a generalization of finite-state automata (FSAs) that produce an output at every step, generating a set of relations on the output tape; FSTs do not recognize a larger set of languages than FSAs, and thus they represent this level. We used an enumeration of transducers introduced in [18], where an invariance theorem is also proved, demonstrating that the choice of enumeration is invariant (up to a constant).

[Type 2] Grammars that generate the context-free languages. This kind of grammar is extensively used in linguistics. The languages generated by CFGs are exactly those that can be recognized by a nondeterministic pushdown automaton. We denote this level by CFG. We generated production rules for 40 000 grammars according to a sound scheme introduced in [35].

[Type 1] Grammars that generate the context-sensitive languages. The languages described by these grammars are exactly those that can be recognized by a linear-bounded automaton, a nondeterministic Turing machine whose tape is bounded by a constant times the length of the input; we denote this level by LBA. An AP-based variation is introduced here, which we denote by LBA/AP.
We also explore the consequences of relaxing the halting configuration (state) condition in models of universal computation (Type-0) when comparing their output distributions.
2.2 Finite-state complexity
Formal language theory had traditionally been disconnected from algorithmic complexity, except in connection with the number of states, or the number of transitions, of a minimal finite automaton accepting a regular language. In [3] a connection was established by extending the notions of Blum static complexity and of encoded function space. The main reason for this lack of connection was that languages are sets of strings, rather than the individual strings used in measures of algorithmic complexity; a meaningful definition of the complexity of a language was lacking, as was a definition of finite-state algorithmic complexity. In [18], however, a version of algorithmic complexity was developed by replacing Turing machines with finite transducers; the complexity induced is called Finite-state complexity (FSA). Despite the fact that the Universality Theorem (true for Turing machines) is false for finite transducers, rather surprisingly the invariance theorem holds true for Finite-state complexity and, in contrast with descriptional complexities (plain and prefix-free), Finite-state complexity is computable.
Finite-state complexity, defined in [18], is the analogue of the core concept of Algorithmic Information Theory (AIT), Kolmogorov-Chaitin complexity, based on finite transducers instead of Turing machines. Finite-state complexity is computable, and there is no a priori upper bound on the number of states used in minimal descriptions of arbitrary strings.
Consider a transducer with the finite set of states $Q$. The transition function of the transducer is encoded by a binary string $\sigma$ (see [18] for details), and the transducer encoded by $\sigma$ is denoted $T_\sigma$; we write $S$ for the set of all strings of this form, i.e. all valid transducer encodings.
In [18] it was shown that the set of all transducers can be enumerated by a regular language, and that there exists a hierarchy of more general computable encodings. For this experiment we fix one such encoding $S$.
As in traditional AIT, where Turing machines are used to describe binary strings, transducers describe strings in the following way: we say that a pair $(\sigma, u)$, with $\sigma \in S$ and $u$ a binary string, is a description of the string $x$ if and only if $T_\sigma(u) = x$. The size of such a description is $|\sigma| + |u|$, which leads to the following definition.
Definition 2.1.
([18]) The Finite-state complexity of $x$ (identified as FSA in the results) with respect to the encoding $S$ is defined by

$$C_S(x) = \min\{|\sigma| + |u| : \sigma \in S,\ T_\sigma(u) = x\}.$$
An important characteristic of traditional AIT is the invariance theorem, which states that the complexity is optimal up to an additive constant and relies on the existence of a universal Turing machine (the additive constant being, in fact, its size). In contrast with AIT, owing to the non-existence of a 'universal transducer', Finite-state complexity includes the size of the transducer as part of the encoding length. Nevertheless, the invariance theorem holds true for Finite-state complexity. An interesting consequence of the invariance theorem for Finite-state complexity is the existence of an upper bound $C_S(x) \le |x| + c$ for all $x$, where $c$ is the length of the string which encodes the identity transducer. Hence $C_S$ is computable. If $S$ and $S'$ are two encodings, then $C_S$ and $C_{S'}$ differ by at most a computable function [18].
An alternative definition of Finite-state complexity, based on Algorithmic Probability, is as follows:
Definition 2.2.
([18]) The Finite-state complexity based on Algorithmic Probability (denoted by FSA/AP in the results) of $x$ with respect to the encoding $S$ is defined by

$$C^{AP}_S(x) = |\{(\sigma, u) : \sigma \in S,\ T_\sigma(u) = x\}|,$$

that is, the number of times that the string $x$ is produced by a transducer (in this case, as reported in the results, for encodings of size 8 to 22).
2.2.1 Building a Finite-state empirical distribution
We now define the construction of an empirical distribution using Finite-state complexity. We introduce our alternative definition of algorithmic probability using transducers.
Definition 2.3.
(Finite-state Algorithmic Probability) Let $S$ be the set of encodings of all transducers as binary strings. We then define the algorithmic probability of a string $x$ as

$$AP(x) = \sum_{(\sigma, u)\,:\,T_\sigma(u) = x} \frac{1}{2^{|\sigma| + |u|}}.$$
For any string $x$, $AP(x)$ is the algorithmic probability of $x$, computed over the set of encodings $S$. In the construction of the empirical distribution for Finite-state complexity we consider the set of strings $s$ such that $s = \sigma u$, $\sigma \in S$, and $8 \le |s| \le 22$. Following [22], we define the empirical distribution function (the probability distribution) as
Definition 2.4.
(Finite-state Distribution, FSA)

$$D(n)(x) = \frac{|\{(\sigma, u) : |\sigma u| = n,\ \sigma \in S,\ T_\sigma(u) = x\}|}{|\{(\sigma, u) : |\sigma u| = n,\ \sigma \in S\}|}.$$

In other words, $D(n)$ considers all strings $s$ of length $n$ and determines which of them can be decomposed as $s = \sigma u$ with $\sigma \in S$. It then computes $T_\sigma(u)$ and counts the number of times each output string $x$ is obtained. (Since $S$ is in fact regular we could indeed use an enumeration of $S$, but for this work we analyze all binary strings of length $n$.)
We note that in the encoding $\sigma$, a string occurring as the output of a transition of $T_\sigma$ contributes its length to the size of a description of a string $x$. The decision to consider strings with $|s| \ge 8$ was made based on the fact that the encoding of the smallest transducer, the one with the transition function

$$\delta(1, 0) = (1, \lambda), \qquad \delta(1, 1) = (1, \lambda), \qquad (4)$$

where $\lambda$ is the empty string (occurring as the output of both transitions, and thus contributing nothing to the size of a description), is the shortest valid encoding.
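The counting logic behind Definition 2.4 can be sketched in a few lines of Python. The helpers parse (splitting a string $s$ into $(\sigma, u)$ when $\sigma$ is a valid transducer encoding) and T (running the decoded transducer on $u$) are hypothetical stand-ins for the encoding machinery of [18], which is not reproduced here; only the tallying is shown.

from collections import Counter
from itertools import product

def empirical_distribution(n, parse, T):
    """Tally T_sigma(u) over all binary strings s of length n with s = sigma.u."""
    counts, total = Counter(), 0
    for bits in product("01", repeat=n):    # all 2**n binary strings of length n
        s = "".join(bits)
        decomposition = parse(s)            # None when s is not of the form sigma.u
        if decomposition is None:
            continue
        sigma, u = decomposition
        counts[T(sigma, u)] += 1            # one more description of this output
        total += 1
    # normalise the tallies into the output probability distribution D(n)
    return {} if total == 0 else {x: c / total for x, c in counts.items()}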
2.3 Context-free grammars
In [35] Wharton describes an algorithm for a general-purpose grammar enumerator, adaptable to different classes of grammars (e.g. regular, context-free, etc.). We implemented this algorithm in the Wolfram Language with the purpose of enumerating context-free grammars, in Chomsky Normal Form, over the terminal vocabulary $\{0, 1\}$. Before describing the implementation of the algorithm we define the following terminology:
- A grammar is a 4-tuple $G = (N, T, P, S)$, where:
- $N$ is the non-terminal vocabulary.
- $T$ is the terminal vocabulary.
- $P$ is the set of productions.
- $S$ is the start symbol.
- $V = N \cup T$ is the vocabulary of $G$.
- For any grammar $G$, $|N|$ and $|P|$ denote the cardinalities of $N$ and $P$ respectively.
First, we define the structure of a grammar. Let $G$ be any grammar. Suppose we are given the non-terminal vocabulary $N = \{A_1, \dots, A_{|N|}\}$ with an arbitrary ordering such that the first non-terminal $A_1$ is the start symbol $S$. The grammar has a structure which consists of a list of integers $(p_1, \dots, p_{|N|})$: each integer $p_i$ is the number of productions having the non-terminal $A_i$ on the left-hand side (according to the ordering of $N$). Hence the cardinality of the set of productions satisfies $|P| = p_1 + \dots + p_{|N|}$.
Now, let $\mathcal{C}$ be a class of grammars over a terminal vocabulary $T$. By $\mathcal{C}_c$ we denote the grammars in $\mathcal{C}$ with complexity $c$. We then enumerate $\mathcal{C}$ by increasing the complexity $c$. To every complexity class corresponds a set of structure classes, determined by $c$ and $|N|$. Therefore a complexity class is enumerated by enumerating each of its structure classes (i.e. every structure that constitutes $\mathcal{C}_c$). In addition, we need to define an ordered sequence $R$ which consists of all possible right-hand sides for the production rules. The sequence is ordered lexicographically (first terminals, then non-terminals) and is defined according to the class of grammars we want to enumerate. For example, suppose we are interested in enumerating the class of Chomsky Normal Form grammars over the terminal vocabulary $T = \{0, 1\}$ and the non-terminal vocabulary $N = \{A_1, \dots, A_{|N|}\}$; we then set $R = (0, 1, A_1A_1, A_1A_2, \dots, A_{|N|}A_{|N|})$.
2.3.1 Implementation
Given a complexity $c$, the algorithm described below (which we implemented in the Wolfram Language, running on Mathematica) enumerates, following [35], all the grammars in a structure class.
- The complexity measure $c$ is provided by a pairing function of $|N|$ and $|P|$: given $c$, we apply the inverse of this function in order to recover the values of $|N|$ and $|P|$. This is implemented by the function pairingInverse[] (a worked illustration of this step follows the list).
- The set of non-terminals $N$ is generated by the function generateSetN[].
- The ordered sequence $R$ is generated from the set of non-terminals by the function generateSetR[].
- The different structure classes that correspond to the complexity $c$ are generated by the function generateStructureClasses[].
- All the possible grammars with the structure classes defined at the previous step are then generated; each grammar has an associated structure matrix. This is performed by the function generateStructureMatricesA[], which takes the structure classes and Length[R] as arguments.
- The sequence $R$ is used to generate the rules of the grammars by the function generateGrammars[], which takes the structure matrices and $R$ as arguments.
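The exact pairing function used by pairingInverse[] is not reproduced above; purely as an illustration of the idea (an assumption, not necessarily the function used in our implementation), the classic Cantor pairing of two integers and its inverse behave as follows:

import math

def pair(n, p):
    """Cantor pairing: a bijection between pairs of naturals and naturals."""
    return (n + p) * (n + p + 1) // 2 + p

def pair_inverse(c):
    """Recover the unique (n, p) with pair(n, p) == c."""
    w = (math.isqrt(8 * c + 1) - 1) // 2    # largest w with w*(w+1)/2 <= c
    p = c - w * (w + 1) // 2
    return w - p, p

print(pair(3, 5), pair_inverse(pair(3, 5)))  # -> 41 (3, 5)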
2.3.2 The CYK algorithm
A procedure to decide whether a bit string is generated by a grammar was implemented according to the Cocke–Younger–Kasami (CYK) algorithm. CYK is an efficient worst-case parsing algorithm that operates on grammars in Chomsky Normal Form (CNF) in time $O(n^3 \cdot |G|)$, where $n$ is the length of the parsed string and $|G|$ is the size of the CNF grammar $G$. The algorithm considers every possible substring of the input string $s$ and decides whether $s \in L(G)$, where $L(G)$ is the language generated by $G$. The implementation was adapted from [25].
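For reference, a compact Python rendering of the CYK membership test on a CNF grammar is sketched below; our actual implementation is in the Wolfram Language, adapted from [25], and the rule representation here is an assumption chosen for brevity.

def cyk(string, start, unit_rules, binary_rules):
    """CYK membership test. unit_rules: {A: set of terminals};
    binary_rules: {A: set of (B, C) pairs}. Returns True iff start derives string."""
    n = len(string)
    if n == 0:
        return False                      # CNF without the S -> empty special case
    # table[i][l] = set of non-terminals deriving the substring of length l+1 at i
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(string):
        table[i][0] = {A for A, ts in unit_rules.items() if ch in ts}
    for l in range(1, n):                 # substring length minus one
        for i in range(n - l):
            for k in range(l):            # split point inside the substring
                left, right = table[i][k], table[i + k + 1][l - k - 1]
                for A, pairs in binary_rules.items():
                    if any(B in left and C in right for (B, C) in pairs):
                        table[i][l].add(A)
    return start in table[0][n - 1]

# Example: S -> A B, A -> '0', B -> '1' generates exactly the string "01".
print(cyk("01", "S", {"A": {"0"}, "B": {"1"}}, {"S": {("A", "B")}}))  # True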
2.4 CFG Algorithmic Probability
We can now define the Algorithmic Probability of a string $s$ according to CFG as follows:

Definition 2.5.

(Context-free Algorithmic Probability, CFG)

$$AP_{CFG}(s) = \frac{|\{G_i : s \in L(G_i)\}|}{|A|},$$

and its respective distribution is obtained by computing $AP_{CFG}(s)$ for every string $s$ produced, where, as defined in Section 2.3.1, $L(G_i)$ is the language generated by the grammar $G_i$, with $G_i$ a grammar of complexity at most $c$ according to a structure class [35], and $|A|$ denotes the cardinality of the sample set of grammars considered. For the results reported here, $|A| = 40\,000$.
2.5 Linear-bounded complexity
In [1] it is shown that the time-bounded Kolmogorov distribution is universal (in the sense of convergence), and the question of an analogue of the algorithmic Coding theorem is described there as an open problem, one likely to be solved by exploiting that universality result. On the other hand, in [2, 13] it has been shown that time-bounded algorithmic complexity (being computable) is a Solovay function. Such functions are an upper bound on algorithmic complexity (in its prefix-free version) and give the same value for almost all strings.
In [5, 21] we described a numerical approach to the problem of approximating the Kolmogorov complexity of short strings. This approach performs an exhaustive execution of all deterministic 2-symbol Turing machines, constructs an output frequency distribution, and then applies the Coding theorem to approximate the algorithmic complexity of the strings produced.
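The final step is simple enough to state as code: once the output frequency distribution is in hand, the algorithmic complexity of each string is estimated as the negative base-2 logarithm of its empirical probability, per Eq. (3). The counts below are toy values for illustration, not actual machine data.

import math

def ctm_estimates(frequency):
    """frequency: dict mapping an output string to how many machines produced it.
    Returns the Coding-theorem estimate -log2(freq/total) for each string."""
    total = sum(frequency.values())
    return {x: -math.log2(c / total) for x, c in frequency.items()}

# Frequent strings receive low estimated complexity, rare strings high.
print(ctm_estimates({"0": 500, "1": 500, "01": 20, "0110": 1}))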
For this experiment we follow the same approach, but we consider the less powerful model of computation of linear-bounded automata (LBAs). An LBA is basically a single-tape Turing machine which never leaves the cells on which the input was placed [22]. It is well known that the class of languages accepted by LBAs is in fact that of the context-sensitive languages [22].
2.6 Time complexity
$P$ is the class of Turing machines that produce an output in polynomial time with respect to the size of their input. It is easy to see that this class is contained in the class defined by linear-bounded automata (LBAs): if the number of transitions is bounded by a linear function, so is the number of cells the machine can visit; but it is important to note that LBAs are not time-restricted and can use nondeterministic transitions. Now, given that Turing machines can decide context-free grammars in polynomial time (e.g. with the CYK algorithm), $P$ is higher in the hierarchy than the Type-2 languages.
Within the context of this article, we will represent this class by the set of Turing machines with 4 states and 2 symbols, run with no input, whose execution time is upper-bounded by a fixed constant. We cap the execution time at 27, 54, 81 and 107 steps, for a total of 44 079 842 304 Turing machines, where 107 is the Busy Beaver value of the set.
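A minimal sketch of running one such machine on a blank tape with a hard step cap is given below; the rule format and output convention are assumptions for illustration and do not reproduce the exact enumeration of [5, 21].

def run_tm(rules, max_steps):
    """rules: {(state, read_symbol): (write_symbol, move, next_state)};
    state 0 is the halting state. Returns the visited tape segment as a
    string if the machine halts within max_steps, otherwise None."""
    tape, pos, state = {}, 0, 1
    visited = {0}
    steps = 0
    while state != 0 and steps < max_steps:
        write, move, state = rules[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move                                  # move is -1 or +1
        visited.add(pos)
        steps += 1
    if state != 0:
        return None                                  # did not halt within the cap
    lo, hi = min(visited), max(visited)
    return "".join(str(tape.get(i, 0)) for i in range(lo, hi + 1))

# One-state machine: write 1, move right, then halt.
print(run_tm({(1, 0): (1, 1, 0)}, max_steps=27))     # -> '10'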
2.7 The Chomsky hierarchy and bounded execution time
The definition of bounded algorithmic complexity is a variation of the unbounded version as follows:
Definition 2.6.
(Linear-bounded Algorithmic Complexity, LBA)

$$K^t(s) = \min\{|p| : U(p) = s \text{ in at most } t \text{ steps}\},$$

where the time bound $t$ caps the number of steps the machine is allowed to run.
By being bounded to polynomial-sized tapes, the Turing machines that decide context-sensitive grammars (Type-1) can be captured in exponential time by deterministic Turing machines.
Exactly where each class of the Chomsky hierarchy sits with respect to the time-based computational complexity classification is related to seminal open problems. For instance, a set equality between the languages recognized by linear-bounded automata and those recognized in exponential time would solve the question. Nevertheless, varying the allowed computation time for the CTM algorithm allows us to capture approximations to the descriptive complexity of an object with lower computing resources, in a similar way as does considering each member of the Chomsky hierarchy.
2.8 Non-halting models
We also considered models with no halting configuration, such as cellular automata (non-HCA) and Turing machines (non-HTM) with no halting state as defined in [37], in order to assess whether or not they converge to the Universal Distribution defined over machines with a halting condition. For cellular automata we exhaustively ran all 256 Elementary Cellular Automata [37] (i.e. the closest neighbours and the centre cell are taken into consideration) and all 65 536 so-called General Cellular Automata [37] (that is, with two neighbours to one side, one to the other, and the centre cell). For Turing machines, we ran all 4096 (2,2) Turing machines with no halting state, and a sample of 65 536 (the same number as for CA) Turing machines in (3,2), also with no halting state.
2.9 Consolidating the empirical distributions
In order to perform the comparisons among the distributions at each of the Chomsky hierarchy levels, we need to consolidate away the bias imposed by arbitrary choices in the chosen model (e.g. starting from a tape filled with 0s rather than 1s). This is because, for example, the string 0000 should occur exactly the same number of times as 1111, given that 0000 and 1111 must have exactly the same algorithmic complexity. If $s$ is the string and $f(s)$ its frequency of production, we thus consolidate the algorithmic probability of $s$, denoted by $AP(s)$, as follows:

$$AP(s) \propto f(s) + f(\mathrm{rev}(s)) + f(\mathrm{neg}(s)) + f(\mathrm{rev}(\mathrm{neg}(s))),$$
where $\mathrm{rev}(s)$ is the reversal of $s$ (e.g. 0001 becomes 1000) and $\mathrm{neg}(s)$ is the negation of $s$ (e.g. 0001 becomes 1110), for all empirical distributions for FSA, CFG, LBA and TM. It is worth noticing that $\mathrm{rev}$ and $\mathrm{neg}$ increase the algorithmic complexity of $s$ by at most a very small constant, and thus there is no reason to expect strings related by them to have different algorithmic complexity. Greater detail on the counting method is given in [5].
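A minimal Python sketch of this consolidation, pooling each string's frequency with those of its reversal, its negation and their composition, and keeping one canonical representative per group:

from collections import Counter

def consolidate(counts):
    """Pool the frequency of each string with rev(s), neg(s) and rev(neg(s))."""
    neg = lambda s: s.translate(str.maketrans("01", "10"))
    out = Counter()
    for s in counts:
        variants = {s, s[::-1], neg(s), neg(s)[::-1]}
        out[min(variants)] = sum(counts.get(v, 0) for v in variants)
    return out

print(consolidate(Counter({"0001": 3, "1000": 1, "1110": 2})))
# Counter({'0001': 6})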
3 Results
Table 1 reports the number of strings produced by each model.
Chomsky Type  Computational Model  No. Strings

3  FSA(8-22)  294
3  FSA/AP(8-22)  1038
2  CFG(40K)  496
(2,0)  LBA(27)  847
(2,0)  LBA(54)  1225
(2,0)  LBA(81)  1286
0  LBA(107) = TM(4,2)  1302
0  TM(5,2)  8190
3.1 Finite-state complexity
The experiment consists of a thorough analysis of all strings $s$ that satisfy $s = \sigma u$ with $\sigma \in S$. If the string satisfies $s = \sigma u$ (for some transducer encoding $\sigma$), then we compute $T_\sigma(u)$ and generate a set of output strings. A frequency distribution is then constructed from the set of output strings. Separately, we compute the Finite-state complexity of all strings of length at most 8 (this is an arbitrary decision).
3.1.1 Produced distributions for $D(8)$ to $D(22)$
The results given in Table 2 indicate how many strings $s = \sigma u$ (such that $\sigma$ encodes the transition table of some transducer and $u$ is an input for it) exist per string length.
Running FSAs is very fast: there is no halting problem and all runs stop very quickly. However, while FSA and CFG preserve some of the ranking of more powerful computational models and accelerate the appearance of 'jumpers' (long strings of low complexity), these weak models of computation do not generate the highest algorithmic complexity strings found in the tail of the distributions of more powerful models, as shown in Fig. 4.
Size  Strings  Transducers 

8  256  1 
9  512  2 
10  1024  6 
11  2048  12 
12  4096  34 
13  8192  68 
14  16384  156 
15  32768  312 
16  65536  677 
17  131072  1354 
18  262144  2814 
19  524288  5628 
20  1048576  11474 
21  2097152  22948 
22  4194304  46332 
For example, there is only one binary string of length 8 that encodes a transducer, out of the 256 strings in $D(8)$; it is the transducer with the transition function (4) (we refer to it as the smallest transducer).
In the case of $D(9)$, we found that out of 512 strings only two encode a transducer, namely the smallest transducer with "0" and "1" as input. Again, the only string produced by this distribution is the empty string $\lambda$.
$D(16)$ is the first distribution in which one of the strings encodes a transducer with two states. The Finite-state complexity of the strings produced by $D(16)$ is shown in Tables 16 and 17 in the Supplementary Material.
$D(18)$ consists of 2814 transducers (see Tables 3 and 4). The longest string produced is of length 6.
String  Probability 

λ  0.82800
00  0.02701 
11  0.02701 
000  0.01990 
111  0.01990 
0  0.01848 
1  0.01848 
0000  0.01279 
1111  0.01279 
00000  0.00426 
11111  0.00426 
01  0.00213 
10  0.00213 
000000  0.00071 
0101  0.00071 
1010  0.00071 
111111  0.00071 
Complexity  Frequency 

14  1024 
12  640 
13  512 
11  320 
10  212 
9  106 
The rest of the tables are reported in the Supplementary Material.
3.2 Computing Finite-state complexity
We performed another experiment in order to further analyze the characteristics of the Finite-state complexity of all strings of length at most 8. We summarize the results obtained for computing the Finite-state complexity of each such string in Table 5, whereas Table 6 shows the strings that encode the transducers occurring in these minimal descriptions, together with their frequencies.
Complexity  Frequency 

4  1 
7  2 
8  2 
9  4 
10  4 
11  10 
12  22 
13  32 
14  56 
15  126 
16  252 
Transducer  Frequency 

0000  1 
000100  8 
0001010110  1 
00010110  4 
0001011100  1 
0001011110  1 
000110  8 
00011100  4 
0001110100  1 
0001110110  1 
0001111100  1 
01000110  480 
3.3 Context-free grammar distribution
We created production rules for 298 233 grammars with up to 26 non-terminal symbols and used the first 40 000 of them on a set of 155 strings for which we also had frequencies at all the other levels (FSA, LBA and TM). Table 7 shows the top 20 strings produced.
String  Frequency 

0  5131 
00  5206 
000  5536 
0000  5480 
00000  5508 
000000  5508 
00001  2818 
0001  2810 
00010  2754 
00011  2764 
001  2812 
0010  2692 
00100  2750 
00101  2744 
0011  2736 
00110  2730 
00111  2736 
01  2688 
010  2748 
0100  2742 
3.4 Linear-bounded automata distribution
Table 8 shows the different values that we considered for the experiments and the number of strings produced by all LBAs with 2 and 3 states.
States  Tape space  Steps  Initial position  Strings produced 
2  7  7  3  11 
2  7  7  4  15 
2  9  7  5  15 
2  9  10  5  15 
2  11  10  5  15 
2  11  10  7  15 
3  9  22  5  78 
3  11  22  7  78 
3  13  22  9  78 
Because of limitations of computational power, we randomly generated LBAs with 4 states. According to the assumptions explained above, the tape space should be 16. However, we took the tape space to be 17, since that allows us to place the initial head position right in the middle of the tape. Table 9 shows the experiments we performed.
As the results demonstrate, by varying the allowed execution time over the space of Turing machines we can approximate the CTM distribution corresponding to each Chomsky hierarchy level. For instance, regular languages (Type-3 grammars) can be decided in linear time, given that each transition and state in a finite automaton can be encoded by a corresponding state and transition of a Turing machine. Context-free grammars (Type-2) can be decided in polynomial time with parsing algorithms such as CYK.
States  Tape space  Steps  Initial position  Random LBA's  No. strings produced
4  17  108  8  100000  93 
4  17  108  8  200000  110 
4  17  108  8  300000  125 
4  17  108  8  350000  119 
4  17  108  8  400000  134 
4  17  108  8  500000  138 
4  17  108  8  600000  143 
3.5 Emergence of the Universal Distribution
3.5.1 Time-bounded Emergence
Fig. 5 shows how the LBA distributions asymptotically approximate the Universal Distribution.
3.5.2 Rate of convergence of the distributions
One opportunity offered by this analysis is the assessment of the way in which other methods rank strings by (statistical) randomness, such as Shannon entropy, and of the performance of other means of approximating algorithmic complexity, such as lossless compression algorithms, in particular one of the most popular, based on LZW (Compress). We can then compare these two methods against the estimation of a Universal Distribution produced by TM(4,2). The results (see Fig. 6) for both entropy and compression conform to the theoretical expectation. Entropy correlates best at the first level of the Chomsky hierarchy, that of FSAs, stressing that the algorithmic discriminatory power of entropy, its ability to tell randomness apart from pseudo-randomness, is limited to statistical regularities of the kind that regular languages would capture. Lossless compression, at least as assessed by one of the most popular methods behind popular lossless compression formats, outperformed Shannon entropy, but not by much, and it correlated best with the output distribution generated by CFG. This does not come as a surprise, given that popular implementations of lossless compression are a variation of Shannon entropy generalized to blocks (a variable-width window) that capture repetitions, often followed by a remapping that uses shorter codes for values with higher probabilities (dictionary encoding, Huffman coding), thus effectively a basic grammar based on a simple rewriting-rule system. We also found that, while non-halting models approximate the Universal Distribution, they start diverging from TM and remain correlated with LBAs of lower runtime despite the increasing number of states. This may be expected from the over-representation, in non-halting models, of strings that would otherwise be skipped (output being defined as what is produced upon halting for machines with a halting configuration).
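For reference, the two statistical baselines can be computed along the following lines in Python; zlib's deflate is used here merely as a stand-in for the LZW-based Compress used in the experiments, and the entropy is the bit-level (block size 1) Shannon entropy.

import math, zlib

def shannon_entropy(s):
    """Shannon entropy (bits per symbol) of the 0/1 distribution of s."""
    p1 = s.count("1") / len(s)
    if p1 in (0.0, 1.0):
        return 0.0
    return -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))

def compressed_len(s):
    """Length in bytes of the deflate-compressed string (a compression proxy)."""
    return len(zlib.compress(s.encode()))

# Compare a constant string, a periodic string and a less regular string.
for s in ["0000000000000000", "0101010101010101", "0110100110010110"]:
    print(s, round(shannon_entropy(s), 3), compressed_len(s))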
4 Some open questions
4.1 Tighter bounds and suprema values
We have provided upper and lower bounds for each model of computation, but the current intervals seem to overlap for some measures of correlation. One question concerns the exact boundaries, especially how close to the Universal Distribution the supremum of each model of computation can take us. In other words, to find tighter bounds for the intervals at each level.
4.2 Computable error corrections
There are some very interesting open questions to explore further in the future; for example, whether computable corrections can be made to sub-universal distributions, e.g. as calculated by context-free grammars, in order to correct for trivial (and also computable) biases such as string length. Indeed, while CFG produced an interesting distribution, closer to that of the LBAs and better than that of the FSAs, there is, for example, an over-representation of trivial long strings which can easily be corrected. The suspicion is that, while one can apply these corrections and increase the speed of convergence to TM, the interesting cases are non-computable. A theoretical framework and a numerical analysis would be interesting to develop.
4.3 Sensitivity to choice of enumeration
This question is equivalent to that of the choice of programming language or reference universal Turing machine. Another interesting question is that of the stability of the chosen enumerations for the different models of computation, both at the same and at different levels of computational power. A striking outcome of the results reported here is not only that an increase in computing power better approximates the estimations of the distribution produced by Turing machines, but that completely different models of computation, differing not only in description language but also in computational power, are rather independent of the enumeration. While the enumerations that we have followed for each of the computational models are not arbitrary, because they follow a criterion of increasing program size, it is clear that one could devise enumerations producing completely different behaviour. We have pointed out this stability before in [38], with experiments on Turing machines, cellular automata and Post tag systems, and noted how these results suggest some sort of 'natural behaviour', defined as behaviour that is not artificially introduced with the purpose of producing a different-looking initial distribution (before converging, per the invariance theorem, for systems with such a property). In this sense, in [4] we proposed a measure of 'algorithmicity' in the sense of the Universal Distribution, quantifying how close or removed a method producing a distribution is from other approximations, in particular from that of one of the most standard Turing machine models, the one used for the Busy Beaver [8], which we have shown is not a special case, since several other variations of this model and completely different models of computation produce similar output distributions [41, 42, 4], including the results reported in this paper. However, one other open question is to enumerate systems in different ways and numerically quantify how many of them are convergent or divergent, how much and for how long the divergent ones diverge and under what conditions, and whether the convergent ones dominate.
4.4 Missing strings from non-halting models
We have seen that, for halting models of computation, decreasing the computational power has the effect of missing the most algorithmically random strings of the next model up in computational power. As we have also seen, non-halting models seem to converge to the lower-runtime distributions of LBAs, which, even though highly correlated with TMs, do not appear to approach TM but rather remain stable, producing output distributions similar to LBA. An interesting question to explore is what kind of strings are missed by non-halting machines. As opposed to the strings missed by halting machines below the Turing-universal limit, Turing-universal models of computation without a halting configuration may miss other kinds of strings. Do they miss more or fewer random or simple strings as compared to halting models?
5 Conclusions
Different sub-universal systems may produce different empirical distributions, not only in practice but also in principle, i.e. asymptotically they may diverge. However, we have here provided the means to make meaningful comparisons, especially against an empirical distribution that has been found to be stable and apparently convergent. It is interesting to explore and seek Coding-theorem-like behaviour in sub-universal systems in order to better understand the landscape of algorithmic probability and complexity for all types of computational systems, to which we have here contributed.
The results reported here show that, indeed, the closer a system is to Turing-universality, the more closely its output distribution resembles the empirical distribution of universal systems, and that finite approximation of algorithmic complexity from finite sub-universal systems [26] is an interesting approach, providing limited but computable measures that can approximate the power of universal measures of complexity.
The results also show improvements over current major tools for approximating algorithmic complexity, such as lossless compression algorithms. To our knowledge, it was previously not possible to quantify or even compare the performance of lossless compression algorithms in this respect, as there were no standard numerical alternatives for approximating algorithmic complexity. The construction of empirical distributions based on Algorithmic Probability does provide the means, and constitutes an approach for evaluating performance, in what we have named an algorithmicity test [38].
At least for our implementations (which may or may not be indicative of the best algorithms, in terms of time complexity, for emulating these computational models and thus producing their output distributions), it is not particularly faster to produce distributions from weaker models when one is interested in producing high algorithmic complexity strings for evaluation, but it is when only finer-grained values for low-complexity strings are needed, in exchange for a faster calculation. Compared to entropic and lossless compression approximations, producing partial distributions from finite approximations of Algorithmic Probability, even over weak models of computation, constitutes a major improvement for strings that are otherwise assigned a greater randomness content by traditional methods such as Shannon entropy and equivalent statistical formulations.
References
 [1] L. Antunes and L. Fortnow, Time-Bounded Universal Distributions, Electronic Colloquium on Computational Complexity, Report No. 144, 2005.
 [2] L. Bienvenu and R. Downey, Kolmogorov complexity and Solovay functions, in STACS, volume 3 of LIPIcs, pp. 147–158, Schloss Dagstuhl Leibniz-Zentrum fuer Informatik, 2009.
 [3] C. Campeanu, K. Culik II, K. Salomaa, S. Yu, State complexity of basic operations on finite languages, in O. Boldt, H. Jürgensen (eds.), WIA 1999, LNCS, vol. 2214, pp. 60–70, Springer, Heidelberg, 2001.
 [4] J.-P. Delahaye, H. Zenil, Towards a stable definition of Kolmogorov-Chaitin complexity, arXiv:0804.3459 [cs.IT], 2008.
 [5] J.-P. Delahaye and H. Zenil, Numerical Evaluation of the Complexity of Short Strings: A Glance Into the Innermost Structure of Algorithmic Randomness, Applied Mathematics and Computation, 219, pp. 63–77, 2012.
 [6] K. Dingle, S. Schaper and A.A. Louis, The structure of the genotype-phenotype map strongly constrains the evolution of non-coding RNA, Interface Focus, 5: 20150053, 2015.
 [7] N. Gauvrit, F. Soler-Toscano, H. Zenil, Natural Scene Statistics Mediate the Perception of Image Complexity, Visual Cognition, vol. 22:8, pp. 1084–1091, 2014.
 [8] T. Rado, On non-computable functions, Bell System Technical Journal, 41:3, pp. 877–884, 1962.
 [9] N. Gauvrit, H. Zenil, F. Soler-Toscano, J.-P. Delahaye, P. Brugger, Human Behavioral Complexity Peaks at Age 25, PLoS Comput Biol, 13(4): e1005408, 2017.
 [10] S.F. Greenbury, I.G. Johnston, A.A. Louis, S.E. Ahnert, A tractable genotype-phenotype map for the self-assembly of protein quaternary structure, J. R. Soc. Interface, 11, 20140249, 2014.
 [11] S. Hernández-Orozco, H. Zenil, N.A. Kiani, Algorithmically probable mutations reproduce aspects of evolution such as convergence rate, genetic memory, modularity, diversity explosions, and mass extinction, arXiv:1709.00268 [cs.NE].
 [12] V. Kempe, N. Gauvrit, D. Forsyth, Structure emerges faster during cultural transmission in children than in adults, Cognition, 136, pp. 247–254, 2014.
 [13] R. Hölzl, T. Kräling, W. Merkle, Time-bounded Kolmogorov complexity and Solovay functions, Theory Comput. Syst., 52:1, pp. 80–94, 2013.
 [14] M. Hutter, A Theory of Universal Artificial Intelligence based on Algorithmic Complexity, Springer, 2000.
 [15] L.A. Levin, Universal sequential search problems, Problems of Information Transmission, 9, pp. 265–266, 1973.
 [16] L.A. Levin, Laws of information conservation (non-growth) and aspects of the foundation of probability theory, Problems of Information Transmission, 10(3), pp. 206–210, 1974.
 [17] L.G. Kraft, A device for quantizing, grouping, and coding amplitude modulated pulses, MS Thesis, Electrical Engineering Department, Massachusetts Institute of Technology, Cambridge, MA, 1949.
 [18] C.S. Calude, K. Salomaa, T.K. Roblot, Finite-state complexity, Theoretical Computer Science, 412(41), pp. 5668–5677, 2011.
 [19] S. Schaper and A.A. Louis, The arrival of the frequent: how bias in genotype-phenotype maps can steer populations to local optima, PLoS ONE, 9(2): e86635, 2014.
 [20] B.R. Steunebrink, J. Schmidhuber, Towards an Actual Gödel Machine Implementation, in P. Wang, B. Goertzel (eds.), Theoretical Foundations of Artificial General Intelligence, Springer, 2012.
 [21] F. Soler-Toscano, H. Zenil, J.-P. Delahaye, N. Gauvrit, Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines, PLoS ONE, 9(5), e96223, 2014.
 [22] J.E. Hopcroft and J.D. Ullman, Formal Languages and Their Relation to Automata, Addison-Wesley, 1969.
 [23] W. Kirchherr, M. Li and P. Vitányi, The Miraculous Universal Distribution, Mathematical Intelligencer, 19, pp. 7–15, 1997.
 [24] B.Y. Peled, V.K. Mishra, A.Y. Carmi, Computing by nowhere increasing complexity, arXiv:1710.01654 [cs.IT].
 [25] J. Rangel-Mondragon, Recognition and Parsing of Context-Free, http://library.wolfram.com/infocenter/MathSource/3128/, accessed on Aug 15, 2017.
 [26] F. Soler-Toscano, H. Zenil, A Computable Measure of Algorithmic Probability by Finite Approximations with an Application to Integer Sequences, Complexity (accepted).
 [27] J. Schmidhuber, Optimal Ordered Problem Solver, Machine Learning, 54, pp. 211–254, 2004.
 [28] J. Schmidhuber, V. Zhumatiy, M. Gagliolo, Bias-Optimal Incremental Learning of Control Sequences for Virtual Robots, in Groen et al. (eds.), Proceedings of the 8th Conference on Intelligent Autonomous Systems, IAS-8, Amsterdam, The Netherlands, pp. 658–665, 2004.
 [29] F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Small Turing Machines with Halting State: Enumeration and Running on a Blank Tape, Wolfram Demonstrations Project, http://demonstrations.wolfram.com/SmallTuringMachinesWithHaltingStateEnumerationAndRunningOnAB/, published January 3, 2013.
 [30] R.J. Solomonoff, A formal theory of inductive inference: Parts 1 and 2, Information and Control, 7, pp. 1–22 and 224–254, 1964.
 [31] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed., Springer, New York, 2008.
 [32] R.J. Solomonoff, Complexity-Based Induction Systems: Comparisons and Convergence Theorems, IEEE Trans. on Information Theory, vol. IT-24, no. 4, pp. 422–432, 1978.
 [33] R.J. Solomonoff, The Application of Algorithmic Probability to Problems in Artificial Intelligence, in L.N. Kanal and J.F. Lemmer (eds.), Uncertainty in Artificial Intelligence, pp. 473–491, Elsevier, 1986.
 [34] R.J. Solomonoff, A System for Incremental Learning Based on Algorithmic Probability, in Proceedings of the Sixth Israeli Conference on Artificial Intelligence, Computer Vision and Pattern Recognition, pp. 515–527, Dec. 1989.
 [35] R.M. Wharton, Grammar enumeration and inference, Information and Control, 33(3), pp. 253–272, 1977.
 [36] M. Wiering and J. Schmidhuber, Solving POMDPs using Levin search and EIRA, in Proceedings of the International Conference on Machine Learning (ICML), pp. 534–542, 1996.
 [37] S. Wolfram, A New Kind of Science, Wolfram Media, Champaign, IL, 2002.
 [38] H. Zenil and J.-P. Delahaye, On the Algorithmic Nature of the World, in G. Dodig-Crnkovic and M. Burgin (eds.), Information and Computation, World Scientific Publishing Company, 2010.
 [39] H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Correlation of Automorphism Group Size and Topological Properties with Program-size Complexity Evaluations of Graphs and Complex Networks, Physica A: Statistical Mechanics and its Applications, vol. 404, pp. 341–358, 2014.
 [40] H. Zenil, N.A. Kiani and J. Tegnér, Methods of Information Theory and Algorithmic Complexity for Network Biology, Seminars in Cell and Developmental Biology, vol. 51, pp. 32–43, 2016.
 [41] H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility, PeerJ Computer Science, 1:e23, 2015.
 [42] F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures, Computability, vol. 2, no. 2, pp. 125–140, 2013.
Supplementary Material
.0.1 Empirical distributions for FSA
For $D(10)$ there are six strings that encode a transducer. In fact, two of them are different from the smallest transducer. However, as in the previous cases, the only string produced is the empty string $\lambda$.
$D(11)$ consists of 12 strings with inputs of lengths one and three, but the only output of these transducers is $\lambda$.
$D(12)$ contains 34 transducers with inputs whose lengths range from two to four. The output distribution still contains only the string $\lambda$.
$D(13)$ is a more interesting distribution. It consists of 68 strings whose inputs have lengths 1, 3 and 5. Table 10 shows the probability distribution of the strings produced. The Finite-state complexity of the strings that comprise $D(13)$ is summarized in Table 11.
String  Probability 

λ  0.94118
0  0.02941 
1  0.02941 
Complexity  Frequency 
9  32 
7  20 
8  16 
$D(14)$ is a richer distribution than the previous ones, since it contains 156 strings that encode different transducers. Table 12 shows the strings produced by this distribution.
String  Probability 

λ  0.92308
0  0.02564 
1  0.02564 
00  0.01282 
11  0.01282 
We note the following facts:
- The length of the longest string produced is two.
- The string λ remains the one with the highest probability.
- The Finite-state complexity of the strings produced ranges from 7 to 10 (see Table 13).
- $D(14)$ produces two strings of length 2 out of the four possible, namely "00" and "11".
Complexity  Frequency 
10  64 
8  40 
9  32 
7  20 
String  Probability 

λ  0.88462
0  0.03205 
1  0.03205 
00  0.01923 
11  0.01923 
000  0.00641 
111  0.00641 
Complexity  Frequency 

11  128 
9  80 
10  64 
8  40 
String  Probability 

λ  0.87592
0  0.02363 
00  0.02363 
1  0.02363 
11  0.02363 
000  0.01182 
111  0.01182 
0000  0.00295 
1111  0.00295 
Complexity  Frequency 

12  256 
10  160 
11  128 
9  80 
8  53 
$D(17)$ shows an even more diverse set of strings produced (see Table 18). We note the following interesting facts:
- The longest string produced is of length 5.
- For the first time, a distribution produces all strings of length 2.
String  Probability 

λ  0.83752
0  0.02806 
1  0.02806 
00  0.02511 
11  0.02511 
000  0.01773 
111  0.01773 
0000  0.00739 
1111  0.00739 
00000  0.00148 
01  0.00148 
10  0.00148 
11111  0.00148 
Complexity  Frequency 

13  512 
11  320 
12  256 
10  160 
9  106 
.0.2 $D(19)$, $D(20)$, $D(21)$ and $D(22)$
Here are the strings that comprise each one of these distributions.
String  Probability 

λ  0.80597
000  0.02345 
111  0.02345 
00  0.02274 
11  0.02274 
0  0.01812 
1  0.01812 
0000  0.01706 
1111  0.01706 
00000  0.00817 
11111  0.00817 
000000  0.00284 
111111  0.00284 
01  0.00178 
10  0.00178 
0101  0.00107 
1010  0.00107 
0000000  0.00036 
001  0.00036 
010  0.00036 
010101  0.00036 
011  0.00036 
100  0.00036 
101  0.00036 
101010  0.00036 
110  0.00036 
1111111  0.00036 
Complexity  Frequency 

15  2048 
13  1280 
14  1024 
12  640 
11  424 
10  212 
String  Probability 

λ  0.80024
0000  0.02144 
1111  0.02144 
00  0.02092 
000  0.02092 
11  0.02092 
111  0.02092 
0  0.01185 
00000  0.01185 
1  0.01185 
11111  0.01185 
000000  0.00593 
111111  0.00593 
01  0.00174 
10  0.00174 
0101  0.00157 
1010  0.00157 
0000000  0.00139 
1111111  0.00139 
010101  0.00070 
101010  0.00070 
00000000  0.00035 
11111111  0.00035 
0001  0.00017 
0010  0.00017 
0011  0.00017 
0100  0.00017 
01010101  0.00017 
0110  0.00017 
0111  0.00017 
1000  0.00017 
1001  0.00017 
10101010  0.00017 
1011  0.00017 
1100  0.00017 
1101  0.00017 
1110  0.00017 
Complexity  Frequency 

16  4096 
14  2560 
15  2048 
13  1280 
12  848 
11  424 
10  218 
String  Probability 
λ  0.78630
0000  0.02109 
1111  0.02109 
000  0.02066 
111  0.02066 
00  0.01682 
11  0.01682 
00000  0.01665 
11111  0.01665 
0  0.01063 
1  0.01063 
000000  0.00959 
111111  0.00959 
0000000  0.00331 
1111111  0.00331 
01  0.00166 
10  0.00166 
0101  0.00139 
1010  0.00139 
00000000  0.00122 
11111111  0.00122 
010101  0.00105 
101010  0.00105 
01010101  0.00044 
10101010  0.00044 
001  0.00026 
010  0.00026 
011  0.00026 
100  0.00026 
101  0.00026 
110  0.00026 
000000000  0.00009 
0000000000  0.00009 
00001  0.00009 
00010  0.00009 
00011  0.00009 
00100  0.00009 
00101  0.00009 
00110  0.00009 
00111  0.00009 
01000  0.00009 
01001  0.00009 
01010  0.00009 
0101010101  0.00009 
01011  0.00009 
01100  0.00009 
01101  0.00009 
01110  0.00009 
01111  0.00009 
10000  0.00009 
10001  0.00009 
10010  0.00009 
10011  0.00009 
10100  0.00009 
10101  0.00009 
1010101010  0.00009 
10110  0.00009 
10111  0.00009 
11000  0.00009 
11001  0.00009 
11010  0.00009 
11011  0.00009 
11100  0.00009 
11101  0.00009 
11110  0.00009 
111111111  0.00009 
1111111111  0.00009 
Complexity  Frequency 

17  8192 
15  5120 
16  4096 
14  2560 
13  1696 
12  848 
11  436 
String  Probability 
λ  0.78313
0000  0.02180 
1111  0.02180 
000  0.01744 
111  0.01744 
00000  0.01727 
11111  0.01727 
000000  0.01442 
111111  0.01442 
00  0.01416 
11  0.01416 
0  0.00622 
1  0.00622 
0000000  0.00587 
1111111  0.00587 
00000000  0.00276 
11111111  0.00276 
0101  0.00160 
1010  0.00160 
01  0.00138 
10  0.00138 
010101  0.00125 
101010  0.00125 
01010101  0.00073 
10101010  0.00073 
000000000  0.00043 
111111111  0.00043 
0000000000  0.00030 
1111111111  0.00030 
0101010101  0.00026 
1010101010  0.00026 
001  0.00017 
010  0.00017 
011  0.00017 
100  0.00017 
101  0.00017 
110  0.00017 
0001  0.00009 
0010  0.00009 
001001  0.00009 
0011  0.00009 
0100  0.00009 
010010  0.00009 
0110  0.00009 
011011  0.00009 
0111  0.00009 
1000  0.00009 
1001  0.00009 
100100  0.00009 
1011  0.00009 
101101  0.00009 
1100  0.00009 
1101  0.00009 
110110  0.00009 
1110  0.00009 
000000000000  0.00004 
000001  0.00004 
000010  0.00004 
000011  0.00004 
000100  0.00004 
000101  0.00004 
000110  0.00004 
000111  0.00004 
001000  0.00004 
001010  0.00004 
001011  0.00004 
001100  0.00004 
001101  0.00004 
001110  0.00004 
001111  0.00004 
010000  0.00004 
010001  0.00004 
010011  0.00004 
010100  0.00004 
010101010101  0.00004 
010110  0.00004 
010111  0.00004 
011000  0.00004 
011001  0.00004 
011010  0.00004 
011100  0.00004 
011101  0.00004 
011110  0.00004 
011111  0.00004 
100000  0.00004 
100001  0.00004 
100010  0.00004 
100011  0.00004 
100101  0.00004 
100110  0.00004 
100111  0.00004 
101000  0.00004 
101001  0.00004 
101010101010  0.00004 
101011  0.00004 
101100  0.00004 
101110  0.00004 
101111  0.00004 
110000  0.00004 
110001  0.00004 
110010  0.00004 
110011  0.00004 
110100  0.00004 
110101  0.00004 
110111  0.00004 
111000  0.00004 
111001  0.00004 
111010  0.00004 
111011  0.00004 
111100  0.00004 
111101  0.00004 
111110  0.00004 
111111111111  0.00004 
Complexity  Frequency 

18  16384 
16  10240 
17  8192 
15  5120 
14  3392 
13  1696 
12  872 
11  436 
.0.3 Code in Python for Finite-state complexity
The program distributionTransducers.py is used to analyze all strings $s$ of a given length, determining whether $s$ satisfies $s = \sigma u$ (for some $\sigma \in S$) and, if so, computing $T_\sigma(u)$. This program generates a set of output strings (resultexperimentdistribution.csv) from which we can construct an output frequency distribution.
Example of execution:
- python distributionTransducers.py 8 10 analyzes all strings of length 8 up to length 10.
- python distributionTransducers.py 8 8 analyzes all strings of length 8.
The file resultexperimentdistribution.csv contains the following columns:
- string: the string $s$ discussed above.
- validencoding: takes value 1 in case $s = \sigma u$ and 0 otherwise.
- sigma: the string $\sigma$ such that $s = \sigma u$.
- stringp: the string $u$ such that $s = \sigma u$.
- numstates: the number of states of the transducer $T_\sigma$.
- output: the output string $T_\sigma(u)$.
- outputcomplexity: the Finite-state complexity of the output string $T_\sigma(u)$.
The program computeComplexityStrings.py computes the Finite-state complexity of all strings up to a given length (this is the implementation of the algorithm described in [18]). This program generates the file resultcomplexity.csv, which contains the following columns:
- x: the string whose Finite-state complexity is being calculated.
- complexity: the Finite-state complexity of the string $x$.
- sigma: the string $\sigma$ of a minimal description $(\sigma, u)$ of $x$, i.e. such that $T_\sigma(u) = x$.
- stringp: the corresponding input string $u$ of the minimal description.