Theoretical Robopsychology: Samu Has Learned Turing Machines

Norbert Bátfai
Department of Information Technology
University of Debrecen

From a programmer's point of view, robopsychology is a synonym for the activity carried out by developers to implement their machine learning applications. This robopsychological approach raises some fundamental theoretical questions of machine learning. Our discussion of these questions is constrained to Turing machines. Alan Turing gave an algorithm (also known as the Turing machine) to describe algorithms. Applying it to describe itself brings us to Turing’s notion of the universal machine. In the present paper, we investigate algorithms that write algorithms. From a pedagogical point of view, this way of writing programs can be considered a combination of learning by listening and learning by doing, because it is based on applying agent technology and machine learning. As the main result, we introduce the problem of learning and show that it cannot easily be handled in reality; therefore it is reasonable to use machine learning algorithms for learning Turing machines.

1 Introduction

Samu is a disembodied developmental robotic experiment to develop a family chatterbot agent that will be able to talk in a natural language as humans do [Bát15a]. At the moment this is only a utopian idea of the Samu project. The practical purpose of the Samu projects is to develop computational mental organs that can support software agents in acquiring higher-order knowledge from their input [BB16]. The activities conducted during the development of such mental organs may be considered first efforts to create, on demand, the Asimovian profession called robopsychology [Bát16a].

The roots of this paper lie in the two new software experiments Samu Turing [Bát16c] and Samu C. Turing [Bát16b]. These are very simplified versions of Samu's former learning projects based on habituation-sensitization [CS14], such as SamuBrain [Bát16d] or SamuKnows [Bát16e]. Their common feature is that they use the same COP-based Q-learning engine as the chatbot Samu does. To be more precise, the mental organs use the same code as the chatbot does. The term “COP-based” (Consciousness Oriented Programming [Bát11]) means that the engine predicts its future input. The engine itself is based on Q-learning, and it receives positive reinforcement if the chatbot (or a mental organ) successfully forecasts the next input of the Q-learning in the current step. In this case the previous output (the previous prediction) is the same as the current input; for precise details see [Bát15a] and [BB16]. In the two new experiments in question, the transition rules of Turing machines (TMs) are learned, as illustrated in Fig. 1. It should be noted that neither these experiments nor this paper focuses on habituation-based learning, because the learning agent knows the model (the TM) that generates the reality.
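The prediction-as-action idea described above can be illustrated with a minimal sketch. It is not the Samu engine: it is a plain tabular Q-learning loop in which the action is a guess of the next input symbol and the reward is positive exactly when the guess matches. The names `train_predictor` and `predict`, the negative reward for a wrong guess and all parameter values are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def train_predictor(stream, actions, episodes=200, alpha=0.5, gamma=0.0,
                    epsilon=0.1, seed=0):
    """Tabular Q-learning where each action is a prediction of the next input."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (current input, predicted next input) -> value
    for _ in range(episodes):
        for current, nxt in zip(stream, stream[1:]):
            # Epsilon-greedy choice of the prediction.
            if rng.random() < epsilon:
                guess = rng.choice(actions)
            else:
                guess = max(actions, key=lambda a: q[(current, a)])
            # Assumed reward scheme: +1 for a correct forecast, -1 otherwise.
            reward = 1.0 if guess == nxt else -1.0
            best_next = max(q[(nxt, a)] for a in actions)
            target = reward + gamma * best_next
            q[(current, guess)] += alpha * (target - q[(current, guess)])
    return q


def predict(q, actions, current):
    """Greedy prediction of the next input after training."""
    return max(actions, key=lambda a: q[(current, a)])


if __name__ == "__main__":
    symbols = ["a", "b", "c"]
    table = train_predictor(list("abcabcabcabc"), symbols)
    print({s: predict(table, symbols, s) for s in symbols})
```

With the default `gamma=0.0` the update reduces to learning the immediate reward of each guess, which is enough for a deterministic input stream; a nonzero discount is only a parameter change.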

Figure 1: This is a screenshot from the project Samu Turing. The reality shown in the left side is generated by the operation of a given Turing machine. The right side shows the predicted configurations of the investigated Turing machine.

Our motivation to write this paper stems from the last paragraph of von Neumann's work on the general theory of automata [vN51], where he suggested that there is a complexity level above which machines can reproduce themselves and even more complicated ones. Von Neumann investigated self-reproducing automata [vN51] roughly a decade after Alan Turing had published his work on the universal simulation theorem [Tur36]. The Turing machine is a precise form of the informal notion of an algorithm for describing algorithms. Applying this description algorithm to describe itself brings us to Turing’s notion of the universal machine. In an intuitive sense we can say that von Neumann replaced Turing’s notion of simulation with the notion of reproduction. In this work we would like to replace reproduction with learning. To be more precise, we investigate algorithms that write algorithms. For simplicity, the scope of this paper is constrained to Turing machines. It should be noted that we could have used other universal computing models such as cellular automata. For example, the first mental organs learned Conway’s Game of Life [BB16]. In spite of this, we chose Turing machines because they are closer to the programmers’ intuition.

The structure of this paper is as follows: the next section introduces the basic notations. Then, in Sect. 3 we present the results of two Samu-based developmental robotic software experiments to learn how Turing machines operate. Here we investigate some specific TMs. It should be noted that some of them, such as the machines of Schult and Uhing or Marxen and Buntrock’s BB5 champion machine, are famous in the field of Tibor Radó’s Busy Beaver problem [LV08]. Although this is a very interesting problem of theoretical computer science, we do not address it in this paper. We then introduce the learning problem and give the basic notions of this subject. Finally, we present a new complexity measure called self-reproduction complexity and show in Subsect. 3.2.3 that it is reasonable to use machine learning algorithms for learning Turing machines. The paper closes with a short conclusion in which some possible directions for further work are pointed out.

2 Notations and Technical Background

Throughout both this article and our software experiments we use the definition of the Turing machine (TM) that was introduced in [Bát09] and also used in [Bát15b] where the Turing machine was defined by a quadruple where is a partial transition function and is the starting state. As usual a configuration determines the actual state of the head, the position of the head and the contents of the tape. With the notation of [Bát09] a configuration can be written in the form , where and .

In some proofs, for simplicity’s sake, we use multitape Turing machines or the blank symbol ␣ on the tape (that is, the tape alphabet is extended by the symbol ␣). In addition, without loss of generality, we may assume that halting Turing machines (with a given input) do not contain unused transition rules. The notation denotes that the machine with the input halts.

Definition 2.0.1 (configN).

The word over the alphabet where is referred to as a configN configuration if there is a configuration such that .

Remark 2.0.1 (config).

In some cases, see for example Remark 3.2.1, we extend the definition of the configuration as follows . In this sense a usual configuration corresponds to a config configuration where the empty word.

We may note that the release of the project Samu C. Turing used in Fig. 2 uses config4 configurations.

3 Learning by Listening and Doing

In the aforementioned projects Samu Turing and Samu C. Turing, we programmed the Samu agent to work in a similar way as, for example, Professor James Harland did in his work [Har16], where he observed and studied the configurations of Marxen and Buntrock’s Busy Beaver champion machines [MB90]. In our experiments the agent Samu observes (listening) the consecutive subconfigurations of a given Turing machine and tries to predict (doing) the next rule that the machine will apply. From this viewpoint the whole learning process can be seen as a way of learning by listening and doing, where listening is the sensation of the agent and doing is the prediction of the agent. But the question naturally arises: why should we use agent technology and machine learning algorithms to learn Turing machines? Our explicit answer is based on the following intuitive results and can be found in Subsect. 3.2.3.

3.1 Some Intuitive Results

Figure 2: This figure shows the usual running time (time complexity) of some given machines and the learning time of these machines. The blue curve shows the usual time complexities and the red one the running times of the learning. The x-axis is labeled with the number of ones printed by the Turing machines: “26”, “14”, “21”, “32”, “160”, Schult (“501”), Uhing (“1915”), Uhing (“1471”) and Marxen-Buntrock (“4097”). For more precise details see Table 1.

Fig. 2 summarizes and compares some running results produced by the project Samu C. Turing. The two kinds of running time (the usual time complexity and the “learning complexity”, see the caption of the figure for details) are not directly comparable because their y-axis values are measured in different units. One of the two curves counts the steps of a Turing machine, the other counts the sensory-action pairs of the reinforcement learning agent Samu C. Turing. The exact values can be found in Table 1. One of the notions of cognitive complexity defined in Subsect. 3.2.3 is based on this intuitive “learning complexity”. In Fig. 2, the growth rate of the learning time seems to be related to the running time. It is worth comparing this with Fig. 6, where the growth rate of another (the “self-reproducing”) complexity is separated from the running time.
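The difference between the two units can be made concrete with a small sketch. It does not reproduce the Samu C. Turing agent: the Q-learning agent is replaced by a simple memorizing learner that replays the run until one complete pass is predicted without error. The dictionary encoding of rules, the start state 0, the all-zero initial tape, halting on a missing rule, the example machine `EXAMPLE` and the function names are assumptions of this sketch.

```python
# Rules map (state, read symbol) -> (new state, written symbol, head move).
# EXAMPLE is a small three-rule machine chosen only for illustration.
EXAMPLE = {
    (0, 0): (1, 1, +1),
    (0, 1): (1, 1, -1),
    (1, 0): (0, 1, -1),
}


def trace(rules, max_steps=10_000):
    """Run the machine and return its (observation, applied rule) pairs."""
    state, head, tape = 0, 0, {}
    pairs = []
    for _ in range(max_steps):
        observation = (state, tape.get(head, 0))
        if observation not in rules:  # partial transition function: halt
            return pairs
        new_state, write, move = rules[observation]
        pairs.append((observation, rules[observation]))
        tape[head] = write
        head += move
        state = new_state
    raise RuntimeError("step budget exhausted; the machine may not halt")


def learning_time(pairs):
    """Count sensory-action pairs until a whole pass is predicted correctly."""
    memory = {}
    seen = 0
    while True:
        errors = 0
        for observation, rule in pairs:
            seen += 1
            if memory.get(observation) != rule:  # wrong or missing prediction
                errors += 1
                memory[observation] = rule
        if errors == 0:
            return seen


if __name__ == "__main__":
    run = trace(EXAMPLE)
    print("running time (steps):", len(run))
    print("learning time (sensory-action pairs):", learning_time(run))
```

For this example machine the run takes 5 steps, while the memorizing learner needs 10 sensory-action pairs, so the two quantities are counted on different scales even for a trivial learner.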

3.2 The Basic Notions of the Subject

From the observations of the two experiments above, we can build the abstract model of learning that is referred to as the learning problem. The learning problem of learning TMs is divided into two parts. The first is a simulation of the TM to be learned. The second is the actual learning problem itself. Fig. 3 shows the schematic of the learning problem where the UTM takes the description of the machine and an input of . Then has collected the configurations of whilst it is simulating with . After the simulation takes the collected configurations and it must try to figure out what TM was actually simulating.






Figure 3: This figure shows the schematic of the learning problem. The universal machine takes two input parameters the description of a TM and the input of the machine . The machine computes the sequence of configurations occurred during the execution of the machine with its input . Then the learning machine takes this sequence and finally has to figure out from this input sequence what was actually simulated by the machine .

3.2.1 The Running Problem

It is obvious that the running problem trivially contains the halting problem. Therefore we may notice that similar undecidable statements can be made for this case as well but in this paper we only focus on halting machines.

Lemma 3.2.1.

Apart from the trivial case of the empty tapes, the transition rule between two consecutive configurations and is uniquely determined by the configurations and .


Suppose that there are two transition rules and where , , and then we show that , and .

Let where , ,

Then the following cases are possible

Remark 3.2.1.

We note that an even simpler lemma and proof could be given using the usual and instead of and . We use the latter because they are closer to the programmers’ intuition.

Theorem 3.2.2 (Universal Learning).

There exist a universal running machine and a learning machine such that, for all halting Turing machines , it holds that .


The proof is divided into two parts. In the first part, we modify the usual proof of Turing’s universal simulation theorem (see for example the textbook [ISR00]) so that the universal machine produces the sequence of configurations of . In the second part, we focus on the learning of by using the previous lemma.

We provide only an outline of the first part. We use a multitape TM for the implementation of . Fig. 4 shows the preparation of the tapes before starting the simulation of . The tapes are shown in Fig. 5 after the simulation of the i-th step of .

Figure 4: This figure shows the preparation of the tapes of . On the second last tape denotes the used cells with the symbols and . From the point of view of these symbols are interpreted as the blank symbol ␣ on the tape. But from the point of view of they may be “interpreted” as from left and from right.

Figure 5: This figure shows that the denoted configuration is copied (and collected) to the output tape after the simulation of the i-th step of .

Then the theorem follows from Lemma 3.2.1. ∎
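The statement of the theorem can be illustrated by a round trip in code. This is only a sketch under stated assumptions, not the multitape construction of the proof: configurations are stored as (state, head position, tape) triples with an absolute head position, the start state is 0, the initial tape is all zeros, and the machine halts when no rule applies. The names `run`, `recover_rule`, `learn` and the machine `EXAMPLE` are hypothetical.

```python
# Rules map (state, read symbol) -> (new state, written symbol, head move).
EXAMPLE = {
    (0, 0): (1, 1, +1),
    (0, 1): (1, 1, -1),
    (1, 0): (0, 1, -1),
}


def run(rules, max_steps=10_000):
    """Running machine: simulate and collect every configuration."""
    state, head, tape = 0, 0, {}
    configs = [(state, head, dict(tape))]
    for _ in range(max_steps):
        key = (state, tape.get(head, 0))
        if key not in rules:  # no applicable rule: the machine halts
            return configs
        new_state, write, move = rules[key]
        tape[head] = write
        head += move
        state = new_state
        configs.append((state, head, dict(tape)))
    raise RuntimeError("step budget exhausted; the machine may not halt")


def recover_rule(before, after):
    """Read the applied rule off two consecutive full configurations."""
    state, head, tape = before
    new_state, new_head, new_tape = after
    read = tape.get(head, 0)
    write = new_tape.get(head, 0)  # only the cell under the head can change
    return (state, read), (new_state, write, new_head - head)


def learn(configs):
    """Learning machine: rebuild the rule table from the collected run."""
    rules = {}
    for before, after in zip(configs, configs[1:]):
        key, value = recover_rule(before, after)
        rules[key] = value
    return rules


if __name__ == "__main__":
    print(learn(run(EXAMPLE)) == EXAMPLE)
```

Because the sketch keeps an absolute head position, the empty-tape exception of Lemma 3.2.1 does not arise here; the round trip returns the original table only when the machine has no unused rules, in line with the assumption made in Sect. 2.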

3.2.2 The Learning Problem

The previous theorem shows that there is no problem with learning if we use config (or the usual) configurations. Otherwise, as the following two simple examples of config2 configurations show (Examples 3.2.1 and 3.2.2), the transition rule applied between two consecutive configN configurations may not be uniquely determined by those configN configurations. In other words, if we use configN configurations instead of the usual or config configurations, then Lemma 3.2.1 does not hold. A notion of complexity introduced in the next subsection is based exactly on this property.

Example 3.2.1.

Let be a config configuration and be a corresponding config2 configuration. Then the rules , , and yield the same config2 configuration.

Example 3.2.2.

Let be a config configuration and be a corresponding config2 configuration. Then the rules and yield the same config2 configuration.
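The kind of ambiguity shown in these examples can be reproduced with a short enumeration. The sketch below assumes that a configN observation consists of the state and a window of N tape cells around the head; this window convention, the function names and the rule encoding are assumptions of the sketch and may differ from Definition 2.0.1.

```python
from itertools import product


def observe(state, head, tape, n):
    """Hypothetical configN: the state plus n tape cells around the head."""
    left = n // 2
    cells = tuple(tape.get(head + i, 0) for i in range(-left, n - left))
    return (state,) + cells


def apply_rule(head, tape, rule):
    """Apply (new state, written symbol, head move) to a full configuration."""
    new_state, write, move = rule
    new_tape = dict(tape)
    new_tape[head] = write
    return new_state, head + move, new_tape


def collisions(states, head, tape, n):
    """Group candidate rules that lead to the same windowed observation."""
    groups = {}
    for rule in product(states, (0, 1), (-1, 0, 1)):
        seen = observe(*apply_rule(head, tape, rule), n)
        groups.setdefault(seen, []).append(rule)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Blank tape, head at cell 0, window of two cells.
    for group in collisions((0, 1), 0, {}, 2):
        print(group)
```

On a blank tape with a two-cell window, four different rules per target state produce the same observation in this sketch, so an observer restricted to the window cannot tell which of them was applied.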

3.2.3 Cognitive Complexities

As has already been mentioned in Sect. 3.1 we intuitively use the running time of the learning machines as a complexity measure that may be formulated as follows

but it does not seem very helpful because it is probably correlated with the usual time complexity of the machine, as suggested by Fig. 2. The next type of complexity tells us the first finite N for which Lemma 3.2.1 holds when configN configurations are used. To be more precise, it is defined as

which shows different behavior from the previous one, as can be seen in Fig. 6. The growth rate of the investigated values appears to be related to the number of ones rather than to the running time (see “14”, “21” and “1471”).
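A search for such a smallest N can be sketched as follows. The sketch uses its own assumptions: a configN observation is the state plus N cells around the head, and a window size is accepted when, at every step of the halting run, no other candidate rule leads to the same observation as the rule actually applied. These conventions, the names `is_unambiguous` and `smallest_window` and the machine `EXAMPLE` are hypothetical and are not taken from the Samu C. Turing code.

```python
from itertools import product

EXAMPLE = {
    (0, 0): (1, 1, +1),
    (0, 1): (1, 1, -1),
    (1, 0): (0, 1, -1),
}


def observe(state, head, tape, n):
    """Hypothetical configN: the state plus n tape cells around the head."""
    left = n // 2
    cells = tuple(tape.get(head + i, 0) for i in range(-left, n - left))
    return (state,) + cells


def successor(head, tape, rule):
    """Apply (new state, written symbol, head move) to a full configuration."""
    new_state, write, move = rule
    new_tape = dict(tape)
    new_tape[head] = write
    return new_state, head + move, new_tape


def is_unambiguous(rules, n, max_steps=10_000):
    """True if every step of the run is identifiable from n-cell windows."""
    states = sorted({s for s, _ in rules} | {r[0] for r in rules.values()})
    state, head, tape = 0, 0, {}
    for _ in range(max_steps):
        key = (state, tape.get(head, 0))
        if key not in rules:  # the machine halts
            return True
        actual = rules[key]
        target = observe(*successor(head, tape, actual), n)
        for candidate in product(states, (0, 1), (-1, 0, 1)):
            if candidate == actual:
                continue
            if observe(*successor(head, tape, candidate), n) == target:
                return False
        state, head, tape = successor(head, tape, actual)
    raise RuntimeError("step budget exhausted; the machine may not halt")


def smallest_window(rules, max_n=64):
    """Linear search for the first window size that removes all ambiguity."""
    for n in range(1, max_n + 1):
        if is_unambiguous(rules, n):
            return n
    raise ValueError("no window size up to max_n separates the rules")


if __name__ == "__main__":
    print(smallest_window(EXAMPLE))
```

The windows in this sketch are nested, so a size that works keeps working when it is enlarged; under that assumption a binary search over N would find the same value as the linear search used here.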

Figure 6: This figure shows the values of machines of Fig. 2. The values are computed by the version of the project Samu C. Turing that is tagged self-reproducing_complexity; a manual binary search was also used to determine the last three values. The x-axis is exactly the same as in Fig. 2.

The results shown in Fig. 6 also suggest that it is hopeless to handle the learning problem with the universal learning machine of Lemma 3.2.1. This justifies the use of agent technology (an agent observes the operation of the investigated TMs) and machine learning algorithms (such as Q-learning) to learn Turing machines, instead of searching for suitable configNs for any universal learning machine.

Table 1: This table shows the numerical values for the investigated machines. The first line of each entry gives the TM in rule-index notation [Bát15b]; the second line gives its running time (steps), the number of 1s it prints, the learning time and the self-reproducing complexity.

Machine (rule-index notation)
    Running time    1s printed    Learning time    Self-reproducing complexity

9, 0, 9, 1, 11, 2, 17, 3, 21, 4, 19, 5, 29, 6, 5, 7, 6, 8, 8
    264             26            95048            24

9, 0, 9, 1, 11, 2, 5, 3, 20, 4, 17, 5, 24, 7, 29, 8, 15, 9, 1
    314             14            60872            7

9, 0, 9, 1, 11, 2, 15, 3, 20, 4, 21, 5, 27, 6, 4, 7, 2, 8, 12
    515             21            463558           12

9, 0, 21, 1, 9, 2, 24, 3, 6, 4, 3, 5, 20, 6, 17, 7, 0, 9, 15
    583             32            535050           41

9, 0, 9, 1, 12, 2, 15, 3, 21, 4, 29, 5, 1, 7, 24, 8, 2, 9, 27
    20928           160           512623           160

9, 0, 11, 1, 12, 2, 17, 3, 23, 4, 3, 5, 8, 6, 26, 8, 15, 9, 5 (Schult’s machine)
    134467          501           1685939          664

9, 0, 11, 1, 15, 2, 0, 3, 18, 4, 3, 6, 9, 7, 29, 8, 20, 9, 8 (Uhing’s machine)
    2133492         1915          4365184          3816

9, 0, 11, 2, 15, 3, 17, 4, 26, 5, 18, 6, 15, 7, 6, 8, 23, 9, 5 (Uhing’s machine)
    2358064         1471          8368208          1961

9, 0, 11, 1, 15, 2, 17, 3, 11, 4, 23, 5, 24, 6, 3, 7, 21, 9, 0 (Marxen and Buntrock’s BB5 champion machine)
    47176870        4097          9833455          12287

4 Conclusion

In this paper, we started with two developmental robotic software experiments, Samu Turing [Bát16c] and Samu C. Turing [Bát16b], to learn how Turing machines operate. The subject of the experiments itself enabled us to investigate the theoretical properties of learning. First, we eliminated the developmental robotic processes (for example the habituation-sensitization parts) from our software experiments, and then we introduced the problem of learning and some complexity measures based on it. For some given TMs we also determined these complexities. The self-reproducing complexity of machines of greater sophistication cannot easily be computed by the universal learning machine of Theorem 3.2.2. This justifies the use of agent technology and machine learning for learning Turing machines. We have provided only an outline of the proof of Theorem 3.2.2; completing it may be a subject of further theoretical computer science work. Further work of a practical robopsychological nature is also needed. For example, we are going to investigate using Samu’s neural architecture [Bát15a], Samu's mental organs (like MPUs) [BB16] and deep learning to learn how TMs operate.

To return to Neumann’s train of thought mentioned in the introduction it seems to be interesting to study when the learning algorithm has been applied to write itself. Let’s start from a machine that halts with . It follows from Theorem 3.2.2 that and . But then we can also learn this learning of , that is and . And then we can learn again the learning of learning of , that is, to be more precise and so on. If we introduce the notation

then we can easily write that because but the similar relation between and is an open question at this moment.

It is clear, of course, that further work of a theoretical robopsychological nature is required as well. For example, we are going to look for possible relations among the time, space, Kolmogorov and cognitive complexities. We believe that this is a necessary step towards achieving the situation described as “Programs hacking programs” by Neo in the movie “The Matrix Reloaded”. In the framework of Turing machines and the Busy Beaver problem this quotation has a special meaning, namely: can we program a computer program not only to discover a BB machine but to build it from scratch?

5 Acknowledgment

The author would like to thank his students in the “High Level Programming Languages” course in the spring semester of 2015/2016 at the University of Debrecen for testing the Samu projects. He would also like to thank the members of some AI-specific communities on Facebook, Google+ and LinkedIn, and especially his group called DevRob2Psy, for their interest.

References

  • [Bát09] Norbert Bátfai. On the Running Time of the Shortest Programs. CoRR, abs/0908.1159, 2009.
  • [Bát11] Norbert Bátfai. Conscious Machines and Consciousness Oriented Programming. CoRR, abs/1108.2865, 2011.
  • [Bát15a] Norbert Bátfai. A disembodied developmental robotic agent called Samu Bátfai. CoRR, abs/1511.02889, 2015.
  • [Bát15b] Norbert Bátfai. Are there intelligent Turing machines? CoRR, abs/1503.03787, 2015.
  • [Bát16a] Norbert Bátfai. How to Become a Robopsychologist. GitHub Project, 2016.
  • [Bát16b] Norbert Bátfai. Samu C. Turing. GitHub Project (visited: 2016-06-04), 2016.
  • [Bát16c] Norbert Bátfai. Samu Turing. GitHub Project (visited: 2016-06-04), 2016.
  • [Bát16d] Norbert Bátfai. SamuBrain. GitHub Project (visited: 2016-06-04), 2016.
  • [Bát16e] Norbert Bátfai. SamuKnows. GitHub Project (visited: 2016-06-04), 2016.
  • [BB16] Norbert Bátfai and Renátó Besenczi. Robopsychology Manifesto: Samu in His Prenatal Development. submitted manuscript, 2016.
  • [CS14] Angelo Cangelosi and Matthew Schlesinger. Developmental Robotics: From Babies to Robots. The MIT Press, 2014.
  • [Har16] James Harland. Busy Beaver Machines and the Observant Otter Heuristic (or How to Tame Dreadful Dragons). CoRR, abs/1602.03228, 2016.
  • [ISR00] Gábor Ivanyos, Réka Szabó, and Lajos Rónyai. Algoritmusok. Typotex, 2000.
  • [LV08] Ming Li and Paul M.B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer Publishing Company, Incorporated, 3rd edition, 2008.
  • [MB90] Heiner Marxen and Jürgen Buntrock. Attacking the busy beaver 5. Bull EATCS, 40:247–251, 1990.
  • [Tur36] Alan Turing. On computable numbers, with an application to the “Entscheidungsproblem”. Proceedings of the London Mathematical Society, 1936.
  • [vN51] John von Neumann. The general and logical theory of automata. In L. A. Jeffress, editor, Cerebral Mechanisms in Behaviour – The Hixon Symposium, pages 1–31. Wiley, 1951.