A Parsimonious Dynamical Model for Structural Learning in the Human Brain
Abstract
The human brain is capable of diverse feats of intelligence. A particularly salient example is the ability to deduce structure from time-varying auditory and visual stimuli, enabling humans to master the rules of language and to build rich expectations of their physical environment. The broad relevance of this ability for human cognition motivates the need for a first-principles model explicating putative mechanisms. Here we propose a general framework for structural learning in the brain, composed of an evolving, high-dimensional dynamical system driven by external stimuli or internal processes. We operationalize the scenario in which humans learn the rules that generate a sequence of stimuli, rather than the exemplar stimuli themselves. We model external stimuli as seemingly disordered chaotic time series generated by complex dynamical systems; the underlying structure being deduced is then that of the corresponding chaotic attractor. This approach allows us to demonstrate and theoretically explain the emergence of five distinct phenomena reminiscent of cognitive functions: (i) learning the structure of a chaotic system purely from time series, (ii) generating new streams of stimuli from a learned chaotic system, (iii) switching stream generation among multiple learned chaotic systems, either spontaneously or in response to external perturbations, (iv) inferring missing data from sparse observations of a chaotic system, and (v) deciphering superimposed input from different chaotic systems. Numerically, we show that these phenomena emerge naturally from a recurrent neural network of Erdős–Rényi topology in which the synaptic strengths adapt in a Hebbian-like manner. Broadly, our work blends chaos theory and artificial neural networks to answer the long-standing question of how neural systems can learn the structure underlying temporal sequences of stimuli.

Department of Bioengineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA, 19104

Department of Physics & Astronomy, College of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, 19104

Department of Electrical and Systems Engineering, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA, 19104

Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104

To whom correspondence should be addressed: dsb@seas.upenn.edu
The brain perpetually processes streams of time-varying sensory stimuli that bear different sorts of information. One type of information concerns the statistics of how objects, concepts, or other features are ordered or arranged in the stream, while another type concerns the structure underlying the stream, and the rules by which that structure is sampled to obtain the stream. In poetry, for example, the former type of information lies in the choice and arrangement of words, while the latter lies in the rules of grammar. We refer to the learning of the latter type of information as structural learning, and note that it underlies diverse cognitive functions. As humans are exposed to complex rule-governed stimuli, such as music with harmonically related chords or language constrained by the principles of grammar, they acquire knowledge about the material's structure without being able to formulate any explicit rules.[1, 2, 3, 4, 5] When multiple structures are learned, humans appear to seamlessly switch between performing tasks based on one structure and performing tasks based on another, either spontaneously[6] or when driven by an external stimulus.[7] The human ability to learn structure in time-varying stimuli also extends to a marked capacity to fill in missing signals in acoustic or visual inputs,[8, 9] and to resolve distinct structures underlying mixed inputs.[10, 11, 12, 13] Although of great relevance for human cognition, the exact mechanisms of structural learning remain unclear.
Over the past two decades, several computational models of neuronal systems have been proposed based on recurrent neural networks (RNNs).[14, 15, 16, 17, 18, 19, 20, 21, 22, 23] Notably, these models can successfully learn to generate desired signals with complex dynamical patterns, such as orbits of a damped oscillator,[17] movements of a two-link arm,[16] trajectories from chaotic dynamics,[19, 20, 21, 22, 23] and even recurrent spiking dynamics consistent with empirical observations in neural systems.[14, 15] Collectively, these prior studies suggest that such models reflect promising candidate mechanisms for learning in the human brain. Here we seek to identify the requisite mechanisms for structural learning, specifically by building and testing a dynamical systems theory in which the brain is exposed to rule-governed signals generated by chaotic dynamical systems. By developing new theory and performing computational simulations, we demonstrate how and why structural learning and its associated cognitive functions naturally emerge from a unified dynamical systems framework.
Dynamical Systems and a General Structural Learning Framework
A dynamical system consists of a state space and a dynamical rule that determines how the state variable, an instantaneous description of the system, evolves in the state space as a function of time. A trajectory of the dynamical system is a path along which the state variable travels in the state space as time evolves. An attractor is a subset of the state space towards which trajectories tend to evolve. Unlike trajectories on fixed points or periodic orbits, trajectories on a chaotic attractor are sensitive to arbitrarily small differences or changes in initial conditions, yet remain on the attractor throughout all future time points. As a result, a chaotic attractor contains infinitely many distinct trajectories, all obeying the same dynamical rule. Intuitively, different trajectories on a chaotic attractor can be informally likened to different pieces of music that all follow the same rules of composition, or to different spoken narrations or written passages that all follow the same rules of grammar. Importantly, this relation to dynamical systems is not simply analogical; extensive prior work provides evidence that chaotic attractors are useful generative models for both music and language.[24, 25, 26, 27, 28, 29]
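The sensitivity to initial conditions described above is easy to demonstrate numerically. The following minimal sketch (our illustration, not code from this work) integrates two Lorenz trajectories from initial conditions differing by $10^{-8}$ in a single coordinate, and confirms that they diverge to macroscopic separation while both remain bounded on the attractor:

```python
def lorenz_step(state, dt=0.002, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations (Lorenz, 1963)."""
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-8, 1.0, 1.0)  # same state, perturbed by 1e-8 in x only
for _ in range(20000):      # integrate both copies for 40 time units
    a, b = lorenz_step(a), lorenz_step(b)

separation = max(abs(p - q) for p, q in zip(a, b))
bounded = all(abs(coord) < 100 for coord in a + b)
print(separation, bounded)  # macroscopic separation; both still bounded
```

With a largest Lyapunov exponent near 0.9, the initial offset is amplified by many orders of magnitude over 40 time units, so `separation` saturates at roughly the attractor's diameter even though neither trajectory ever leaves the bounded Lorenz butterfly.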
Here we use this analytical framework to define the sensory input to the brain as a set of trajectories $\mathbf{u}(t)$ on a chaotic attractor $\mathcal{A}$. In other words, the structure of the sensory input is the dynamical rule on the attractor $\mathcal{A}$. For concreteness, we choose a three-dimensional trajectory $\mathbf{u}(t)$ from a Lorenz attractor[30] (Fig. 1(a,b)). This sensory input is received by the brain, which in turn is modeled as a sophisticated high-dimensional dynamical system governed by biophysical laws. We seek to understand how the brain can deduce the underlying dynamical rules of attractors from exemplary trajectories. Furthermore, we seek to understand how the brain can then utilize the acquired dynamical rule to independently generate new trajectories (Fig. 1(c)).
Informing the Structural Learning Framework with Underlying Biology
We seek to build a model that encapsulates three distinct phenomenological features of human structural learning. First, we consider the reinstatement hypothesis, which posits that the content-specific cortical activity at encoding is reinstated as the encoded information is being retrieved.[31, 32] Recent evidence supports this idea and suggests that different types of sensory information are encoded in different cortical regions that are reactivated during information retrieval.[33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44] In our model, we therefore recruit a central dynamical system to play the role of the brain regions where sensory information is encoded and stored, and we define its state as $\mathbf{r}(t)$ (Fig. 1(b)). We operationalize the encoding period as a learning phase, and we operationalize the retrieval period as a testing phase. The central system encodes information during the learning phase as it receives sensory input and evolves following
$$\dot{\mathbf{r}}(t) = f\big(\mathbf{r}(t), \mathbf{u}(t)\big), \qquad (1)$$
where the future state is determined by a function $f$ of both the current state $\mathbf{r}(t)$ and the sensory input $\mathbf{u}(t)$.
Second, we consider the presence of an internally generated output, a process thought to enhance and consolidate learned information by interacting with working memory [45, 46] (colloquially, this is sometimes called an inner voice [45, 47, 46, 48, 49, 50]). We model this internally generated output
$$\mathbf{v}(t) = g\big(\mathbf{r}(t)\big), \qquad (2)$$
as being determined by the concurrent state $\mathbf{r}(t)$ of the central system, where the output $\mathbf{v}(t)$ has the same dimension as the input $\mathbf{u}(t)$. We let the output function $g$ adapt during the learning phase such that the output can imitate the concurrent input (as shown in Fig. 1(a)). This formulation also becomes useful later, as it allows the output to support the dynamics of the central system during the testing phase, when the external input $\mathbf{u}(t)$ is omitted.
Third, we consider the recruitment of other cortical regions that support the reinstatement of information during the retrieval process when the external sensory inputs are absent. Beyond the cortical regions that are reactivated as a human mentally rehearses and retrieves sensory information in the absence of exogenous input, other cortical regions also become activated when retrieval begins.[51, 52, 53, 54, 33] Thus, distinct from the dynamics stipulated by Eq. 1, we model the testing phase dynamics of the central system by the autonomous equation
$$\dot{\mathbf{r}}(t) = f\big(\mathbf{r}(t), g(\mathbf{r}(t))\big), \qquad (3)$$
where the central system is now driven by its own output $\mathbf{v}(t) = g(\mathbf{r}(t))$. This formulation can be thought of intuitively as using one set of connections during the learning phase, and a different set of connections during the testing phase (Fig. 1(c)).
Instantiating the Informed Structural Learning Framework in Silico
To exercise this general structural learning framework, we let the central system be an $N$-node recurrent neural network akin to a reservoir computer:[55, 56, 22] that is,
$$\dot{\mathbf{r}}(t) = -\mathbf{r}(t) + \tanh\big(A\,\mathbf{r}(t) + B\,\mathbf{u}(t) + \mathbf{c}\big), \qquad (4)$$
with $A$ an adjacency matrix, $B$ an input coefficient matrix, and $\mathbf{c}$ a constant vector. We also adopt a linear output scheme, $\mathbf{v}(t) = W\,\mathbf{r}(t)$, where the output coefficient matrix $W$ is adaptively updated. Specifically, in the learning phase, we evolve the central system following Eq. 1, where $\mathbf{u}(t)$ is a trajectory on the Lorenz attractor $\mathcal{A}$ (Fig. 1(b)), and we adapt the matrix $W$ in a Hebbian manner according to
$$\dot{W}_{ij}(t) = \eta\, r_j(t)\,\big[u_i(t) - v_i(t)\big], \qquad (5)$$
such that the discrepancy between the input and the output, $\mathbf{u}(t) - \mathbf{v}(t)$, is reduced.
Intuitively, the output synaptic strength $W_{ij}$ from neuron $j$ to the $i$th output $v_i$ is modified in proportion to the neuronal activity $r_j$ as well as the discrepancy $u_i - v_i$, where $\eta$ is the learning rate. Once $W$ converges to a matrix $W^{*}$ for which $\mathbf{v}(t) \approx \mathbf{u}(t)$, we remove the external input and evolve the central system following the testing phase dynamics stipulated by Eq. 3. The structural learning is successful if the output that is autonomously generated by the central system follows the same dynamical rule as that of the learned attractor $\mathcal{A}$. After instantiating this system in silico, we observe that the testing phase output trajectory $\mathbf{v}(t)$, although different from the exemplary trajectory in the learning phase, evolves and remains on an attractor similar to the Lorenz attractor (compare Fig. 1(b) and Fig. 1(d)).
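The learning phase just described can be sketched compactly. The snippet below is an illustrative discrete-time variant of Eqs. 1, 4, and 5; the network size, sparsity, input scaling, and learning rate are our own assumptions rather than values from this work. It verifies only the core claim of Eq. 5: the Hebbian-like update drives the readout $\mathbf{v} = W\mathbf{r}$ toward the Lorenz input, so the readout error shrinks during learning.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dt, eta = 300, 0.01, 5e-3  # illustrative size, step, and learning rate

# Sparse random (Erdos-Renyi-like) adjacency, rescaled to spectral radius 0.9.
A = rng.normal(size=(N, N)) * (rng.random((N, N)) < 0.05)
A *= 0.9 / max(abs(np.linalg.eigvals(A)))
B = rng.uniform(-0.5, 0.5, size=(N, 3))  # input coefficient matrix
c = rng.uniform(-0.5, 0.5, size=N)       # constant bias vector

def lorenz(u):
    """One Euler step of the Lorenz drive system."""
    x, y, z = u
    return u + dt * np.array([10 * (y - x), x * (28 - z) - y, x * y - 8 / 3 * z])

u = np.array([1.0, 1.0, 25.0])  # drive state
r = np.zeros(N)                 # central system state
W = np.zeros((3, N))            # adaptive output matrix, v = W r
errors = []
for _ in range(6000):
    u = lorenz(u)
    r = np.tanh(A @ r + B @ (u / 20) + c)  # discrete analogue of Eq. 4
    err = u / 20 - W @ r                   # input-output discrepancy
    W += eta * np.outer(err, r)            # Hebbian-like update, Eq. 5
    errors.append(np.sum(err ** 2))

early, late = np.mean(errors[:100]), np.mean(errors[-1000:])
print(early, late)  # the readout error shrinks substantially
```

This checks only that the delta-like rule converges; demonstrating the full autonomous testing phase of Eq. 3 additionally requires the stability conditions discussed below.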
The Structural Learning Function
Here we provide a simple and parsimonious explanation for how structural learning is achieved in our model, akin to a recent study in model-free attractor reconstruction.[57] During the learning phase (Eq. 1), the input trajectory $\mathbf{u}(t)$ on the chaotic attractor $\mathcal{A}$ is externally generated by an autonomously evolving input dynamical system, which, together with the central dynamical system, forms a one-way coupled drive-response system. Successful information encoding is guaranteed by the occurrence of an invertible generalized synchronization between the input system and the central system:[58, 59, 60] the state $\mathbf{r}(t)$ of the response system, after a transient time from an arbitrary initial condition, becomes uniquely determined by the state $\mathbf{s}(t)$ of the drive system,
$$\mathbf{r}(t) = \phi\big(\mathbf{s}(t)\big), \qquad (6)$$
where $\phi$ is a mapping from the drive system to the response system. The image of the attractor $\mathcal{A}$ under this mapping, $\phi(\mathcal{A})$, reflects the central system's internal representation of $\mathcal{A}$, which we denote by $\mathcal{M}$ (Fig. 1(f)).
As $\mathbf{s}(t)$ evolves on the attractor $\mathcal{A}$, the trajectory $\mathbf{r}(t)$ of the central system in the learning phase evolves toward the manifold $\mathcal{M}$ (Fig. 1(f)). When $\phi$ in Eq. 6 is an invertible function on $\mathcal{A}$, the encoding of $\mathcal{A}$ in the internal representation $\mathcal{M}$ is lossless. The discrepancy between input and output can then be eliminated by adapting the output function $g$ in Eq. 2 towards
$$g^{*}(\mathbf{r}) = \phi^{-1}(\mathbf{r}), \quad \mathbf{r} \in \mathcal{M}. \qquad (7)$$
From the form of this output function, it is clear that the dynamics of the learning phase (Eq. 1) is identical to that of the testing phase (Eq. 3) as long as $\mathbf{r}(t)$ lies on $\mathcal{M}$, since then $\mathbf{v}(t) = g^{*}(\mathbf{r}(t)) = \mathbf{u}(t)$. Thus, the output $\mathbf{v}(t)$ during the testing phase can evolve following the same dynamical rules as the input $\mathbf{u}(t)$. To ensure that $\mathbf{r}(t)$ remains on the manifold $\mathcal{M}$, we require that $\mathcal{M}$ is an attractor of the testing phase dynamics (Eq. 3): that is, the dynamics of the central system is stable to perturbations transverse to the manifold $\mathcal{M}$, which can be guaranteed by having all transverse Lyapunov exponents be negative (see Supplement).
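Generalized synchronization (Eq. 6) has a simple numerical signature: when the conditional Lyapunov exponents are negative, two copies of the central system started from different initial states but driven by the identical input collapse onto the same response trajectory, so the state becomes determined by the drive alone. A sketch under illustrative discrete-time reservoir assumptions (parameters are ours, not this work's):

```python
import numpy as np

rng = np.random.default_rng(2)
N, dt = 300, 0.01
A = rng.normal(size=(N, N)) * (rng.random((N, N)) < 0.05)
A *= 0.8 / max(abs(np.linalg.eigvals(A)))  # spectral radius < 1 favors contraction
B = rng.uniform(-0.5, 0.5, size=(N, 3))
c = rng.uniform(-0.5, 0.5, size=N)

def lorenz(u):
    """One Euler step of the Lorenz drive system."""
    x, y, z = u
    return u + dt * np.array([10 * (y - x), x * (28 - z) - y, x * y - 8 / 3 * z])

u = np.array([1.0, 1.0, 25.0])
r1, r2 = rng.normal(size=N), rng.normal(size=N)  # two different response states
gap0 = np.linalg.norm(r1 - r2)
for _ in range(500):
    u = lorenz(u)
    drive = B @ (u / 20) + c                     # identical drive for both copies
    r1, r2 = np.tanh(A @ r1 + drive), np.tanh(A @ r2 + drive)
gap = np.linalg.norm(r1 - r2)
print(gap0, gap)  # the two copies collapse onto one drive-determined trajectory
```

The rate of this collapse estimates the largest conditional Lyapunov exponent; the transverse exponents invoked above play the analogous role for stability of the autonomous testing phase.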
Multistability in Task Switching between Multiple Learned Structures
The occurrence of invertible generalized synchronization permits the structural learning of multiple chaotic attractors $\mathcal{A}_k$, $k = 1, \dots, K$. As the central system is driven by the input trajectory $\mathbf{u}_k(t)$ during the learning phase, the generalized synchronization guarantees that the central system state evolves correspondingly onto an internal representation manifold
$$\mathcal{M}_k = \phi_k(\mathcal{A}_k), \qquad (8)$$
where $\phi_k$ is the synchronization function induced by the $k$th input system. If these functions coincide and the representation manifolds are disjoint, $\mathcal{M}_j \cap \mathcal{M}_k = \emptyset$ for $j \neq k$, we can neglect the subscript of $\phi$ in Eq. 8. The ideal output function in Eq. 2 for this multi-task learning is the one that maps each $\mathcal{M}_k$ back onto the corresponding $\mathcal{A}_k$, i.e., $g^{*}(\mathbf{r}) = \phi^{-1}(\mathbf{r})$ for $\mathbf{r} \in \mathcal{M}_k$. In addition to this ideal output function, if each representation manifold $\mathcal{M}_k$ is an attractor for the testing phase dynamics (Eq. 3), then the central system learns these tasks such that output trajectories on $\mathcal{A}_k$ can be stably generated. After instantiating this system in silico, we observe that the recurrent neural network from Fig. 1(c) can successfully generate output trajectories on the Lorenz attractor and on the Rössler attractor (Fig. 2(a)).
Once multiple attractors are learned, it is of interest to consider the phenomenon of task switching by generating output trajectories from different attractors. When each attractor is stably learned, each representation manifold $\mathcal{M}_k$ becomes an attractor in the dynamics of the testing phase (Eq. 3). Thus, the output trajectory cannot spontaneously depart from one attractor to another. In this case, we can consider explicitly triggering a switch from one task to another (i.e., from one attractor to another attractor). Notice that the generalized synchronization guarantees that, when the central system is driven by input from the $k$th system, its state converges onto $\mathcal{M}_k$ after a short period of time. Thus, we can use a short external input cue $\mathbf{u}_k(t)$ to lead the central system from an arbitrary state to the desired target manifold $\mathcal{M}_k$.
Instantiating these ideas in silico, we begin the testing phase dynamics from a random initial state $\mathbf{r}(0)$. In Fig. 2(a), the output evolves following the black dotted line and converges onto the Lorenz attractor. Then, for a very short period of time, we provide an external input from a Rössler system. We observe that this external input successfully induces a switch from the Lorenz attractor to the Rössler attractor, as shown by the green dotted line. Thereafter, the central system evolves autonomously without any external input for a long period of time, generating the output trajectory of the Rössler attractor. We then use a short Lorenz input to drive the central system away from the Rössler attractor and towards the Lorenz attractor, as shown by the yellow dotted line. Thereafter, the central system evolves autonomously with the testing phase dynamics and generates a very long output trajectory on the Lorenz attractor (red).
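The cue-triggered switching described above relies only on multistability: a brief input carries the state across a basin boundary, and the autonomous dynamics then holds it on the new attractor. The principle can be isolated in the simplest bistable system, $\dot{x} = x - x^{3} + u(t)$, a toy stand-in (not the RNN of this work) for the two representation manifolds:

```python
def evolve(x, steps, pulse=0.0, dt=0.01):
    """Euler-integrate dx/dt = x - x^3 + pulse for a number of steps."""
    for _ in range(steps):
        x += dt * (x - x ** 3 + pulse)
    return x

x = evolve(-0.9, 2000)         # autonomous: settles into the left well near x = -1
stayed = x < -0.9
x = evolve(x, 300, pulse=3.0)  # brief external cue pushes x over the barrier at 0
x = evolve(x, 2000)            # autonomous again: settles into the right well
switched = x > 0.9
print(stayed, switched)
```

Without the pulse the state never leaves its well, mirroring the observation that a stably learned manifold cannot be departed spontaneously; the short cue plays the role of the brief Rössler (or Lorenz) input.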
Next, it is interesting to consider the case in which a few attractors are stably learned while others are not. In this scenario, we would expect spontaneous task switching, independent of any external input. In Fig. 2(b), we consider a case in which the Lorenz system is stably learned but the Rössler system is not. Here we show that, starting from an initial condition $\mathbf{r}(0)$, the output of the testing phase remains on the Rössler attractor (blue) for an extended period of time and then spontaneously departs from the Rössler attractor and converges onto the Lorenz attractor (red). Relatedly, it is interesting to consider the scenario in which a Rössler attractor is only partially learned. In Fig. 2(c), we observe that the output of the testing phase exhibits a transient excursion as it evolves on the unstably learned Rössler attractor, in contrast to the dynamics observed in spontaneous task switching.
Network Mechanics Underlying Multiple Learned Structures
Critically, because the learning of multiple structures is governed by a single equation (Eq. 3), different tasks are executed simply by letting the central system state evolve onto different attractors $\mathcal{M}_k$. An important corollary of this fact is that no synapses between units in the RNN are altered, either in their location or in their weight. We will refer to these synapses as structural connections, and note that they are encoded in the adjacency matrix $A$. While we cannot explain the learning of multiple structures with the pattern of structural connections, it is possible that there is explanatory content in the emergent pattern of functional connections, which are defined as statistical similarities between neuronal time series.
To investigate this possibility, we consider the structural connectivity encoded in the random adjacency matrix $A$, and the functional connectivity encoded in the Pearson correlation matrix of the recorded neuronal activity during the Lorenz (Rössler) task, which we denote by $F^{L}$ ($F^{R}$). To summarize the emergent patterns of functional connectivity, we apply a commonly used community detection technique known as modularity maximization to identify groups of neurons that show similar time series. We found strong but distinct community structure in $F^{L}$ and $F^{R}$. In Fig. 3(e), we show $F^{L}$ (red) and $F^{R}$ (blue) with all nodes sorted by the community structure identified in $F^{L}$, and in Fig. 3(f), we show $F^{L}$ (red) and $F^{R}$ (blue) with nodes sorted by the community structure identified in $F^{R}$. These observations indicate that while neurons remain identically structurally connected in both tasks, their emergent collective dynamics differ.
To quantify these observations more fully, we constructed an ensemble of randomly organized and trained neural networks, and for each we calculated the average functional connectivity among pairs of neurons that are within versus between communities identified from either $F^{L}$ or $F^{R}$ (labeled “WL”, “BL”, “WR”, and “BR”, respectively, in Fig. 3(e,f)). In Figs. 3(g,h), we observe that the functional connectivity estimated from the Lorenz task and averaged within the communities identified from the Lorenz task data (WL) is significantly larger than the functional connectivity estimated from the Lorenz task and averaged between the communities identified from the Lorenz task data (BL). Similarly, for the Rössler task, WR is significantly larger than BR. In contrast, the functional connectivity of the Lorenz task averaged within the Rössler communities is not significantly larger than that averaged between the Rössler communities, and vice versa; this supports our qualitative observation that the community structure in the functional connectivity matrix of the Lorenz task is distinct from the community structure in the functional connectivity matrix of the Rössler task.
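The within- versus between-community comparison above can be illustrated with a self-contained sketch on synthetic time series (not the trained RNN's activity, which we do not reproduce here): neurons sharing a common driver form a functional community with large within-community Pearson correlations, while correlations between communities remain near zero.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 2000, 40
driver1, driver2 = rng.normal(size=T), rng.normal(size=T)

# Neurons 0-19 follow one shared driver, neurons 20-39 another, plus noise.
ts = np.empty((n, T))
ts[:20] = driver1 + 0.8 * rng.normal(size=(20, T))
ts[20:] = driver2 + 0.8 * rng.normal(size=(20, T))

F = np.corrcoef(ts)                      # functional connectivity (Pearson)
np.fill_diagonal(F, np.nan)              # exclude trivial self-correlations
mask = np.zeros((n, n), dtype=bool)
mask[:20, :20] = mask[20:, 20:] = True   # within-community pairs
within, between = np.nanmean(F[mask]), np.nanmean(F[~mask])
print(within, between)  # within-community correlations dominate
```

In the full analysis the communities are not fixed in advance but discovered by modularity maximization; the sketch shows only the statistic being compared once the communities are in hand.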
Lastly, we asked whether there existed any structural basis for the observed emergent functional communities. Critically, random networks are far from homogeneous, and can display locally dense areas as well as locally sparse areas simply by chance. It is therefore intuitively possible that the random network contains degenerate weak community structure that supports the distinct patterns of emergent dynamics. To investigate this possibility, we calculated the average structural connectivity within the Lorenz communities, within the Rössler communities, between the Lorenz communities, and between the Rössler communities. We observed greater average structural connectivity within communities than between communities: that is, the within-community averages were significantly larger than the between-community averages (Fig. 3(g,h)). This observation motivates two open questions: (i) whether some structural networks more easily (or less easily) support diverse functional community structures, and thus the learning of multiple systems; and (ii) whether, for a given structural network, one can predict the number of possible emergent community structures, and therefore the number of systems that can be learned.
Inferring Missing Variables with the Learned Structure
Next we consider the problem of inferring missing variables using the learned structure. Given a central system that successfully learns the structures of different chaotic attractors from input trajectories, we consider the case in which a new trajectory on attractor $\mathcal{A}$ is given, but with some of the variables missing. The goal is to use the learned structure of $\mathcal{A}$, together with the remaining observed variables, to infer the values of the missing variables. To perform this inference, we evolve the central system following the input-driven dynamics of Eq. 1, and we replace the missing variables in $\mathbf{u}(t)$ by the corresponding output variables in $\mathbf{v}(t)$ obtained from Eq. 2 (Fig. 4(a-c)). If the central system driven by the available variables maintains the generalized synchronization with the dynamical system that generated the input trajectory, then the inference is expected to be successful.
To instantiate this problem in silico, we train an RNN to learn the structures of both the Lorenz and the Rössler systems. Then, we test the inference of this RNN in three scenarios, where zero, one, and two variables of $\mathbf{u}(t)$ are missing, as schematically depicted in Fig. 4(a-c). The inferred trajectories are given by the central system output $\mathbf{v}(t)$ for the Lorenz and the Rössler tasks, respectively. In agreement with intuition, more missing variables lead to poorer inference quality, where the quality is quantified by the normalized mean squared error (NMSE). To show that the inference quality is related to the quality of the generalized synchronization, we train an ensemble of RNNs, and we calculate both their inference error as well as the largest conditional Lyapunov exponent $\Lambda$. A more negative $\Lambda$ suggests a stronger generalized synchronization. In Fig. 4(d), we observe that, for all three scenarios, the inference error is indeed higher whenever the generalized synchronization is weaker; that is, whenever the largest conditional Lyapunov exponent of the central system is less negative.
Deciphering Superimposed Input from Different Dynamical System Sources
Thus far, we have considered cases in which the external input is a trajectory generated by a single chaotic system. However, the human brain often processes mixed sensory input: a superposition of multiple input trajectories with different structures from distinct sources. Here we show that the mechanism of information encoding in this structural learning framework allows the system to decipher and separate trajectories on different chaotic attractors when the input is their superposition (Fig. 5(a)). Specifically, we consider the three-dimensional sensory input $\mathbf{u}(t) = \mathbf{u}^{L}(t) + \mathbf{u}^{R}(t)$, where $\mathbf{u}^{L}(t)$ and $\mathbf{u}^{R}(t)$ are trajectories on the Lorenz attractor $\mathcal{A}^{L}$ and on the Rössler attractor $\mathcal{A}^{R}$, respectively. Again the central system is modeled as an RNN (Eq. 4) and is evolved following the learning phase dynamics (Eq. 1) with the external input $\mathbf{u}(t)$. In this case, two three-dimensional outputs $\mathbf{v}^{L}(t)$ and $\mathbf{v}^{R}(t)$ are learned to match the actual Lorenz trajectory $\mathbf{u}^{L}(t)$ and the actual Rössler trajectory $\mathbf{u}^{R}(t)$. This is done by adapting the weight matrices $W^{L}$ and $W^{R}$ following Eq. 5, where the errors are correspondingly $\mathbf{u}^{L}(t) - \mathbf{v}^{L}(t)$ and $\mathbf{u}^{R}(t) - \mathbf{v}^{R}(t)$. After the learning phase, we start the system from a random initial state and evolve the central system following Eq. 1 with a mixed input that is distinct from the exemplary trajectory used in the learning phase (Fig. 5(b)). In Fig. 5(c,d), we observe that the central system successfully separates the mixed trajectory into outputs $\mathbf{v}^{L}(t)$ and $\mathbf{v}^{R}(t)$, which are good estimations of the actual Lorenz trajectory $\mathbf{u}^{L}(t)$ and the actual Rössler trajectory $\mathbf{u}^{R}(t)$.
As observed empirically,[10] although the input is mixed, the streams of stimuli from distinct underlying structures are encoded separately, thereby supporting selective listening. To demonstrate this same phenomenon in our theoretical framework, we consider the direct product of the Lorenz system and the Rössler system as a combined dynamical system. The state variable of this combined system is $\mathbf{s}(t) = \big(\mathbf{s}^{L}(t), \mathbf{s}^{R}(t)\big)$. As $\mathbf{s}^{L}(t)$ and $\mathbf{s}^{R}(t)$ evolve onto the Lorenz attractor $\mathcal{A}^{L}$ and the Rössler attractor $\mathcal{A}^{R}$, respectively, the combined system state evolves onto an attractor that is the Cartesian product of the two attractors, $\mathcal{A}^{L} \times \mathcal{A}^{R}$. Up to a simple coordinate transformation, the mixed input $\mathbf{u}(t) = \mathbf{u}^{L}(t) + \mathbf{u}^{R}(t)$ is essentially a three-dimensional projection of the six-dimensional combined state $\mathbf{s}(t)$. Although the central system is one-way coupled to the combined dynamics through the mixed input rather than the full state variable $\mathbf{s}(t)$, the generalized synchronization[58, 59, 60] can guarantee that the central system state converges to $\phi(\mathbf{s}(t))$. If the generalized synchronization function $\phi$ is invertible, the internal representation separately encodes the information in $\mathbf{u}^{L}(t)$ and $\mathbf{u}^{R}(t)$, rather than only their superposition. Thus, the information of either the Lorenz or the Rössler attractor can be retrieved when the system adapts the output weight matrices towards $W^{L*}$ and $W^{R*}$, whose linear readouts approximate the corresponding components of the inverse function $\phi^{-1}$.
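A sketch of this separation setup follows, again with an offline least-squares fit standing in for the online rule of Eq. 5 and with parameters that are our own assumptions: the reservoir is driven only by the sum of a Lorenz and a Rössler trajectory, and two separate linear readouts are fit to recover each component from the internal state, as the generalized-synchronization argument predicts should be possible.

```python
import numpy as np

rng = np.random.default_rng(5)
N, dt = 500, 0.01
A = rng.normal(size=(N, N)) * (rng.random((N, N)) < 0.05)
A *= 0.9 / max(abs(np.linalg.eigvals(A)))
B = rng.uniform(-0.5, 0.5, size=(N, 3))
bias = rng.uniform(-0.5, 0.5, size=N)

def lorenz(u):
    x, y, z = u
    return u + dt * np.array([10 * (y - x), x * (28 - z) - y, x * y - 8 / 3 * z])

def rossler(u, a=0.2, b=0.2, c=5.7):
    x, y, z = u
    return u + dt * np.array([-y - z, x + a * y, b + z * (x - c)])

uL, uR, r = np.array([1.0, 1.0, 25.0]), np.array([1.0, 1.0, 1.0]), np.zeros(N)
R, UL, UR = [], [], []
for t in range(8000):
    uL, uR = lorenz(uL), rossler(uR)
    mixed = uL / 20 + uR / 10               # superimposed, rescaled input
    r = np.tanh(A @ r + B @ mixed + bias)
    if t > 500:                              # discard the transient
        R.append(r.copy())
        UL.append(uL / 20)
        UR.append(uR / 10)
R = np.array(R)

def readout_nmse(target):
    """Ridge-fit a readout for one source; return its normalized MSE."""
    target = np.array(target)
    W = np.linalg.solve(R.T @ R + 1e-4 * np.eye(N), R.T @ target)
    err = target - R @ W
    return np.mean(err ** 2) / np.mean((target - target.mean(0)) ** 2)

nmse_L, nmse_R = readout_nmse(UL), readout_nmse(UR)
print(nmse_L, nmse_R)  # small values: each source recovered from the mixture
```

This checks only in-sample recovery; the full demonstration additionally evaluates the trained readouts on a new mixed trajectory, as in Fig. 5(b-d).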
Conclusion
Humans appear to effortlessly learn abstract structures from raw time-evolving material and then use those structures to compose new data, to infer missing data, or even to decipher mixed data. Many computational models have been proposed under a Bayesian-brain hypothesis that shed light on how humans might learn hidden properties from a few relevant experiences.[61, 62, 63, 64, 65, 66, 67, 68] Complementing these probabilistic models, we propose a general dynamical-systems-based model derived from first principles. Surprisingly, this simple dynamical model produces successful learning of underlying structures (i.e., dynamical properties of chaotic attractors) from raw materials (i.e., exemplary trajectories on these attractors). We show that this dynamical system can create new streams of data with the learned structure, infer missing data, and both learn and operate different tasks by visiting different attractors in the system's representational space. Given its generality, we believe that this framework serves as a promising preliminary model to which more biological detail could be added in the future to improve performance, such as rules of adaptation that may lead to better encoding as well as larger learning capacity.
 1. Bly, B. M., Carrión, R. E. & Rasch, B. Domain-specific learning of grammatical structure in musical and phonological sequences. Memory & cognition 37, 10–20 (2009).
 2. Tillmann, B., Bharucha, J. J. & Bigand, E. Implicit learning of tonality: a self-organizing approach. Psychological review 107, 885 (2000).
 3. Aldwell, E. & Schachter, C. Harmony & Voice Leading (Schirmer, 2003). ISBN: 0155062425.
 4. Kostka, S. M., Payne, D. & Almén, B. Tonal harmony, with an introduction to twentieth-century music, 4th edn (2000).
 5. Besson, M. & Schön, D. Comparison between language and music. Annals of the New York Academy of Sciences 930, 232–258 (2001).
 6. Kessler, Y., Shencar, Y. & Meiran, N. Choosing to switch: Spontaneous task switching despite associated behavioral costs. Acta Psychologica 131, 120–128 (2009).
 7. Koch, I. & Allport, A. Cue-based preparation and stimulus-based priming of tasks in task switching. Memory & cognition 34, 433–444 (2006).
 8. Constantino, F. C. & Simon, J. Z. Dynamic cortical representations of perceptual filling-in for missing acoustic rhythm. Scientific reports 7, 17536 (2017).
 9. Komatsu, H. The neural mechanisms of perceptual filling-in. Nature reviews neuroscience 7, 220 (2006).
 10. Ding, N. & Simon, J. Z. Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences 109, 11854–11859 (2012).
 11. Xiang, J., Simon, J. & Elhilali, M. Competing streams at the cocktail party: exploring the mechanisms of attention and temporal integration. Journal of Neuroscience 30, 12084–12093 (2010).
 12. Luo, H. & Poeppel, D. Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron 54, 1001–1010 (2007).
 13. Mesgarani, N. & Chang, E. F. Selective cortical representation of attended speaker in multitalker speech perception. Nature 485, 233 (2012).
 14. Rajan, K., Harvey, C. D. & Tank, D. W. Recurrent network models of sequence generation and memory. Neuron 90, 128–142 (2016).
 15. Alemi, A., Machens, C., Denève, S. & Slotine, J.-J. Learning arbitrary dynamics in efficient, balanced spiking networks using local plasticity rules. arXiv preprint arXiv:1705.08026 (2017).
 16. Gilra, A. & Gerstner, W. Predicting nonlinear dynamics: a stable local learning scheme for recurrent spiking neural networks. arXiv preprint arXiv:1702.06463 (2017).
 17. Denève, S., Alemi, A. & Bourdoukan, R. The brain as an efficient and robust adaptive learner. Neuron 94, 969–977 (2017).
 18. Abbott, L., DePasquale, B. & Memmesheimer, R.-M. Building functional networks of spiking model neurons. Nature neuroscience 19, 350 (2016).
 19. Maass, W., Natschläger, T. & Markram, H. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural computation 14, 2531–2560 (2002).
 20. Jaeger, H. & Haas, H. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80 (2004).
 21. Sussillo, D. & Abbott, L. F. Generating coherent patterns of activity from chaotic neural networks. Neuron 63, 544–557 (2009).
 22. Pathak, J., Lu, Z., Hunt, B. R., Girvan, M. & Ott, E. Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data. Chaos: An Interdisciplinary Journal of Nonlinear Science 27, 121102 (2017).
 23. Pathak, J., Hunt, B., Girvan, M., Lu, Z. & Ott, E. Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Physical Review Letters 120, 024102 (2018).
 24. Mayer-Kress, G., Bargar, R. & Choi, I. Musical structures in data from chaotic attractors (University of Illinois at Urbana-Champaign, 1992).
 25. Elman, J. L. Language as a dynamical system. Mind as motion: Explorations in the dynamics of cognition 195–225 (1995).
 26. Winters, R. M. Musical mapping of chaotic attractors. Wooster, Ohio, EE. UU.: Physics Department, The College of Wooster (2009).
 27. Bidlack, R. Chaotic systems as simple (but complex) compositional algorithms. Computer Music Journal 16, 33–47 (1992). URL http://www.jstor.org/stable/3680849.
 28. Castilho, P. L. Chaotic systems as compositional algorithms mus15 (2015).
 29. Mackenzie, J. P. Chaotic predictive modelling of sound. In ICMC (Citeseer, 1995).
 30. Lorenz, E. N. Deterministic nonperiodic flow. Journal of the atmospheric sciences 20, 130–141 (1963).
 31. James, W. The Principles of Psychology (New York: Henry Holt and Company, 1890).
 32. Tulving, E. & Thomson, D. M. Encoding specificity and retrieval processes in episodic memory. Psychological review 80, 352 (1973).
 33. Wheeler, M. E., Petersen, S. E. & Buckner, R. L. Memory’s echo: vivid remembering reactivates sensory-specific cortex. Proceedings of the National Academy of Sciences 97, 11125–11129 (2000).
 34. Kragel, J. E. et al. Similar patterns of neural activity predict memory function during encoding and retrieval. NeuroImage 155, 60–71 (2017).
 35. Nyberg, L., Habib, R., McIntosh, A. R. & Tulving, E. Reactivation of encodingrelated brain activity during memory retrieval. Proceedings of the National Academy of Sciences 97, 11120–11124 (2000).
 36. Düzel, E. et al. Human hippocampal and parahippocampal activity during visual associative recognition memory for spatial and nonspatial stimulus configurations. Journal of Neuroscience 23, 9439–9444 (2003).
37. Khader, P., Burke, M., Bien, S., Ranganath, C. & Rösler, F. Content-specific activation during associative long-term memory retrieval. NeuroImage 27, 805–816 (2005).
 38. Ranganath, C., Heller, A., Cohen, M. X., Brozinsky, C. J. & Rissman, J. Functional connectivity with the hippocampus during successful memory formation. Hippocampus 15, 997–1005 (2005).
39. Woodruff, C. C., Johnson, J. D., Uncapher, M. R. & Rugg, M. D. Content-specificity of the neural correlates of recollection. Neuropsychologia 43, 1022–1032 (2005).
 40. Slotnick, S. D. & Schacter, D. L. The nature of memory related activity in early visual areas. Neuropsychologia 44, 2874–2886 (2006).
41. Johnson, J. D. & Rugg, M. D. Recollection and the reinstatement of encoding-related cortical activity. Cerebral Cortex 17, 2507–2515 (2007).
 42. Diana, R. A., Yonelinas, A. P. & Ranganath, C. Parahippocampal cortex activation during context reinstatement predicts item recollection. Journal of Experimental Psychology: General 142, 1287 (2013).
 43. Rugg, M. D. & Vilberg, K. L. Brain networks underlying episodic memory retrieval. Current opinion in neurobiology 23, 255–260 (2013).
 44. Bosch, S. E., Jehee, J. F., Fernández, G. & Doeller, C. F. Reinstatement of associative memories in early visual cortex is signaled by the hippocampus. Journal of Neuroscience 34, 7493–7500 (2014).
 45. Marvel, C. L. & Desmond, J. E. From storage to manipulation: how the neural correlates of verbal working memory reflect varying demands on inner speech. Brain and language 120, 42–51 (2012).
46. Perrone-Bertolotti, M., Rapin, L., Lachaux, J.-P., Baciu, M. & Loevenbruck, H. What is that little voice inside my head? Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring. Behavioural brain research 261, 220–239 (2014).
 47. Baddeley, A., Hitch, G. & Bower, G. Recent advances in learning and motivation. Working memory 8, 647–667 (1974).
48. Morin, A., Uttl, B. & Hamper, B. Self-reported frequency, content, and functions of inner speech. Procedia-Social and Behavioral Sciences 30, 1714–1718 (2011).
49. Williams, D. M., Bowler, D. M. & Jarrold, C. Inner speech is used to mediate short-term memory, but not planning, among intellectually high-functioning adults with autism spectrum disorder. Development and psychopathology 24, 225–239 (2012).
50. Alderson-Day, B. & Fernyhough, C. Inner speech: development, cognitive functions, phenomenology, and neurobiology. Psychological bulletin 141, 931 (2015).
 51. Addis, D. R., Wong, A. T. & Schacter, D. L. Remembering the past and imagining the future: common and distinct neural substrates during event construction and elaboration. Neuropsychologia 45, 1363–1377 (2007).
 52. Kosslyn, S. M., Ganis, G. & Thompson, W. L. Neural foundations of imagery. Nature Reviews Neuroscience 2, 635 (2001).
53. Halpern, A. R. & Zatorre, R. J. When that tune runs through your head: a PET investigation of auditory imagery for familiar melodies. Cerebral cortex 9, 697–704 (1999).
54. Buckner, R. L. & Carroll, D. C. Self-projection and the brain. Trends in cognitive sciences 11, 49–57 (2007).
55. Lu, Z. et al. Reservoir observers: Model-free inference of unmeasured variables in chaotic systems. Chaos: An Interdisciplinary Journal of Nonlinear Science 27, 041102 (2017).
56. Pathak, J., Hunt, B., Girvan, M., Lu, Z. & Ott, E. Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Phys. Rev. Lett. 120, 024102 (2018). URL https://link.aps.org/doi/10.1103/PhysRevLett.120.024102.
57. Lu, Z., Hunt, B. R. & Ott, E. Attractor reconstruction by machine learning. Chaos: An Interdisciplinary Journal of Nonlinear Science 28, 061104 (2018). URL https://doi.org/10.1063/1.5039508.
 58. Afraimovich, V., Verichev, N. & Rabinovich, M. I. Stochastic synchronization of oscillation in dissipative systems. Radiophysics and Quantum Electronics 29, 795–803 (1986).
 59. Pecora, L. M. & Carroll, T. L. Synchronization in chaotic systems. Phys. Rev. Lett. 64, 821–824 (1990). URL https://link.aps.org/doi/10.1103/PhysRevLett.64.821.
 60. Rulkov, N. F., Sushchik, M. M., Tsimring, L. S. & Abarbanel, H. D. Generalized synchronization of chaos in directionally coupled chaotic systems. Physical Review E 51, 980 (1995).
61. Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015).
 62. Battaglia, P. W., Hamrick, J. B. & Tenenbaum, J. B. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences 110, 18327–18332 (2013).
63. Tenenbaum, J. B., Kemp, C., Griffiths, T. L. & Goodman, N. D. How to grow a mind: Statistics, structure, and abstraction. Science 331, 1279–1285 (2011).
 64. Chater, N., Oaksford, M., Hahn, U. & Heit, E. Bayesian models of cognition. Wiley Interdisciplinary Reviews: Cognitive Science 1, 811–823 (2010).
 65. Griffiths, T. L., Kemp, C. & Tenenbaum, J. B. Bayesian models of cognition (2008).
 66. Kemp, C. & Tenenbaum, J. B. The discovery of structural form. Proceedings of the National Academy of Sciences 105, 10687–10692 (2008).
 67. Chater, N., Tenenbaum, J. B. & Yuille, A. Probabilistic models of cognition: Conceptual foundations (2006).
68. Tenenbaum, J. B., Griffiths, T. L. & Kemp, C. Theory-based Bayesian models of inductive learning and reasoning. Trends in cognitive sciences 10, 309–318 (2006).

We thank Elisabeth A. Karuza, Christopher W. Lynn, Jason Z. Kim, and Arian Ashourvan for helpful comments on earlier versions of this manuscript. ZL and DSB acknowledge support from the John D. and Catherine T. MacArthur Foundation, the Alfred P. Sloan Foundation, the ISI Foundation, the Paul Allen Foundation, the Army Research Laboratory (W911NF-10-2-0022), the Army Research Office (Bassett-W911NF-14-1-0679, Grafton-W911NF-16-1-0474, DCIST-W911NF-17-2-0181), the Office of Naval Research, the National Institute of Mental Health (2-R01-DC-009209-11, R01-MH112847, R01-MH107235, R21-MH106799), the National Institute of Child Health and Human Development (1R01HD086888-01), the National Institute of Neurological Disorders and Stroke (R01 NS099348), and the National Science Foundation (BCS-1441502, BCS-1430087, NSF PHY-1554488, and BCS-1631550). The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

The authors declare that they have no competing financial interests.

Correspondence and requests for materials should be addressed to dsb@seas.upenn.edu.
A Parsimonious Dynamical Model for Structural Learning in the Human Brain
Supplementary Information
In this document, we further discuss the general structural learning framework from the perspective of dynamical systems, and we provide technical details regarding the in silico instantiations. Specifically, in Sec. A we discuss information encoding during the learning phase with an emphasis on invertible generalized synchronization. In Sec. B, we discuss how transverse stability affects the system’s performance during the testing phase. In Secs. C–E, we provide the technical details of how we implement recurrent neural networks to learn structures of the Lorenz and the Rössler attractors. In Sec. F we compare our study to prior work.
Appendix A Information Encoding in the Learning Phase
Successful information encoding is guaranteed by the invertible generalized synchronization between the input system and the central system. During the learning phase, the central system is one-way coupled to the input dynamical system,

(9) $s(t+1) = f(s(t))$,

where $s(t) \in \mathbb{R}^{m}$ and the time $t$ is discrete. For inputs that are generated by continuous-time dynamical systems (e.g., the Lorenz system or the Rössler system), Eq. (9) is still a valid description, as $f$ can be interpreted as an evolution function that maps $s(t)$ forward along its trajectory by a time step $\tau$. The central system in the learning phase evolves nonautonomously following

(10) $x(t+1) = g(x(t), s(t))$,

as it is driven by the input $s(t)$ generated by Eq. (9). In the main manuscript, we consider cases where $s(t)$ evolves onto an invariant manifold $\mathcal{A}$ that is a strange attractor. However, we note that the theory also works for simpler invariant manifolds such as limit cycles and fixed points.
a.1 Generalized synchronization.
Encoding the input into the central system requires that the central system state becomes uniquely determined by the concurrent input system state, i.e., $x(t) = \phi(s(t))$ for some mapping function $\phi$. When this holds, we say that generalized synchronization occurs between the input system and the central system.[58, 59, 60]
a.2 The largest conditional Lyapunov exponent.
One common criterion for generalized synchronization is the sign of the largest conditional Lyapunov exponent of the nonautonomous response system.[59] Thus, in our study, we calculate the largest conditional Lyapunov exponent $\lambda$ of the nonautonomous central system (Eq. (10)) as a criterion for information encoding during the learning phase. To calculate $\lambda$, we evolve the learning-phase central system (Eq. (10)) with a particular input trajectory $s(t)$, starting from two nearby random initial states $x_a(0)$ and $x_b(0)$ with $\|x_b(0) - x_a(0)\| \ll 1$. The largest conditional Lyapunov exponent is then the exponential convergence or divergence rate of the distance $\Delta x(t) = x_b(t) - x_a(t)$ between these two trajectories,

(11) $\lambda = \lim_{t \to \infty} \frac{1}{t} \ln \frac{\|\Delta x(t)\|}{\|\Delta x(0)\|}$.

Notice that, for small separations, the distance evolves following

(12) $\Delta x(t+1) = Dg(x(t), s(t))\, \Delta x(t)$,

where $Dg$ is the Jacobian matrix of $g$ with respect to the central system state,

(13) $[Dg(x, s)]_{ij} = \partial g_i(x, s) / \partial x_j$.

With Eqs. (11)–(13), we estimate the largest conditional Lyapunov exponent $\lambda$, given a fairly long trajectory $x(t)$ and the accompanying input trajectory $s(t)$. If $\lambda < 0$, we say that the central system is generally synchronized to the input system, and the input information is encoded into the central system in the form of $x(t) = \phi(s(t))$.
a.3 Lossless encoding.
To guarantee that the information encoded in the central system is lossless, the generalized synchronization mapping function $\phi$ should be one-to-one from $\mathcal{A}$ to $\mathcal{M}$, where $\mathcal{A} \subset \mathbb{R}^{m}$ is the attractor of the input system and $\mathcal{M} = \phi(\mathcal{A}) \subset \mathbb{R}^{n}$ is its image in the state space of the central system. In the main text, we enforce this to be true by judiciously designing the central system. In practice, this one-to-one encoding is likely to be achieved when one employs a high-dimensional central system with $n \gg m$. The intuition behind this choice of large $n$ is that, based on the weak Whitney embedding theorem, the function $\phi$ is likely to be one-to-one if the dimension $n$ of $x(t)$ is greater than twice the dimension of the manifold $\mathcal{A}$.
Appendix B Transverse Stability and the Testing Phase Performance
When the central system changes from the learning-phase architecture to the testing-phase architecture, it becomes an autonomous dynamical system,

(14) $x(t+1) = g(x(t), W x(t))$,

as it replaces the external input $s(t)$ in Eq. (10) with the self-generated output $\hat{s}(t) = W x(t)$. Ideally, the learning adapts the output matrix $W$ towards an ideal $W^{*}$ such that $W^{*} \phi(s(t)) = s(t)$ for any trajectory $s(t)$ on $\mathcal{A}$ generated by the input system. The attractor $\mathcal{A}$ is then said to be embedded into the autonomous central system as an invariant manifold $\mathcal{M} = \phi(\mathcal{A})$ because, for any input trajectory $s(t)$ on $\mathcal{A}$, the trajectory $x(t) = \phi(s(t))$ is a solution of the autonomous central system, Eq. (14). In practice, however, there always exists noise in the central system, as well as a mismatch between $W$ and the ideal $W^{*}$. Thus, to make sure the central system evolves stably on $\mathcal{M}$, we require that $\mathcal{M}$ is not only an invariant manifold but also an attractor of Eq. (14). If that is the case, then all of the Lyapunov exponents of the autonomous central system should be negative, except for those nonnegative exponents that are inherited directly from the chaotic attractor $\mathcal{A}$.[57]
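The architectural switch from the driven update of Eq. (10) to the closed loop of Eq. (14) can be sketched as follows. This is an illustration only: the matrices below are untrained placeholders (not a learned readout), so the generated stream is not a reconstructed attractor; the sketch shows only how the readout is fed back in place of the external input.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 3                                   # illustrative sizes
A = 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)  # placeholder recurrent weights
B = rng.standard_normal((n, m))                 # placeholder input weights
b = rng.uniform(-1.0, 1.0, n)                   # placeholder bias
W = rng.standard_normal((m, n)) / np.sqrt(n)    # placeholder (untrained) readout

def step_driven(x, s):
    # Learning phase, Eq. (10): the central system is driven by external input s.
    return np.tanh(A @ x + B @ s + b)

def step_autonomous(x):
    # Testing phase, Eq. (14): the self-generated output W x replaces the input.
    return step_driven(x, W @ x)

x = np.zeros(n)
for _ in range(100):
    x = step_autonomous(x)   # closed-loop evolution of the central system
s_hat = W @ x                # one sample of the self-generated output stream
```

Because the tanh nonlinearity bounds the state, the closed loop cannot diverge; whether it evolves stably on the embedded manifold $\mathcal{M}$ is exactly the transverse-stability question discussed above.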
Appendix C Preparing Input Trajectories from Chaotic Systems
In this paper, we consider two widely studied three-dimensional chaotic dynamical systems: the Lorenz system and the Rössler system.
c.1 Input trajectory from the Lorenz system.
The input trajectory from the Lorenz system is generated as follows. We integrate the differential equations of the Lorenz system

(15) $\dot{x} = \sigma(y - x)$, $\dot{y} = x(\rho - z) - y$, $\dot{z} = xy - \beta z$,

using a fourth-order Runge–Kutta integrator with time step $\Delta t$ from a random initial state. Each of the three variables $x$, $y$, and $z$ of the trajectory is then normalized to have a mean of zero and a variance of one. This renormalized trajectory is then saved as the Lorenz-task input trajectory with time resolution $\tau$.
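The procedure above can be sketched in a few lines. This is a minimal sketch, not the paper's code: the time step `dt=0.01`, the transient length, and the textbook Lorenz parameters ($\sigma=10$, $\rho=28$, $\beta=8/3$) are assumptions, since the paper's exact values are not reproduced here.

```python
import numpy as np

def lorenz_rhs(v, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Textbook Lorenz parameters (assumed, not taken from the paper).
    x, y, z = v
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_trajectory(rhs, v0, dt, n_steps):
    """Integrate v' = rhs(v) with the classical fourth-order Runge-Kutta method."""
    traj = np.empty((n_steps, len(v0)))
    v = np.array(v0, dtype=float)
    for i in range(n_steps):
        k1 = rhs(v)
        k2 = rhs(v + 0.5 * dt * k1)
        k3 = rhs(v + 0.5 * dt * k2)
        k4 = rhs(v + dt * k3)
        v = v + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = v
    return traj

# Generate, discard an initial transient, then z-score each variable
# so that every component has zero mean and unit variance (Sec. C.1).
traj = rk4_trajectory(lorenz_rhs, v0=[1.0, 1.0, 1.0], dt=0.01, n_steps=20000)
traj = traj[2000:]                                    # drop transient
traj = (traj - traj.mean(axis=0)) / traj.std(axis=0)  # per-variable z-score
```

The z-scoring matters in practice: it puts all three input channels on a common scale before they are fed into the recurrent network.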
c.2 Input trajectory from the Rössler system.
Similar to the Lorenz system, the input trajectory from the Rössler system is generated as follows. We integrate the differential equations of the Rössler system

(16) $\dot{x} = -y - z$, $\dot{y} = x + ay$, $\dot{z} = b + z(x - c)$,

using a fourth-order Runge–Kutta integrator with time step $\Delta t$ from a random initial state. The coefficients of the Rössler system in Eq. (16) are chosen such that the system has a time scale similar to that of the Lorenz system. We then renormalize each of the three variables, $x$, $y$, and $z$, such that they all have a mean of zero and a variance of one. We then save the renormalized trajectory as the Rössler-task input trajectory with time resolution $\tau$.
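For completeness, the Rössler vector field has the same three-variable structure as the Lorenz field in the previous sketch. The parameter values below are the textbook ones and are assumptions here; the paper rescales its coefficients to match the Lorenz time scale, and those values are not reproduced.

```python
import numpy as np

def rossler_rhs(v, a=0.2, b=0.2, c=5.7):
    # Textbook Rossler parameters (assumed); the paper uses rescaled
    # coefficients chosen to match the Lorenz time scale.
    x, y, z = v
    return np.array([-y - z, x + a * y, b + z * (x - c)])

# The same fourth-order Runge-Kutta integration and per-variable z-scoring
# used for the Lorenz input would then be applied to this vector field.
```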
c.3 Avoiding overlapping attractors in multitask learning.
For the multi-task learning cases depicted in Figs. 3 and 4 of the main text, we further modify the input trajectories to prevent any overlap between the Lorenz attractor and the Rössler attractor: we shift the Lorenz-task input so that it has a mean $\mu_{L}$, and we shift the Rössler-task input so that it has a distinct mean $\mu_{R}$.
Appendix D The Recurrent Neural Network Model
In all of the systems that we instantiated in silico, we model the central system as a random recurrent neural network with $n$ neurons,

(17) $x(t+1) = \tanh(A x(t) + B s(t) + b)$,

where $\tanh(\cdot)$ operating on a vector returns a vector of the same shape that satisfies $[\tanh(v)]_i = \tanh(v_i)$. The matrix $A \in \mathbb{R}^{n \times n}$ is the weighted adjacency matrix of the recurrent neural network. The input weight matrix $B \in \mathbb{R}^{n \times 3}$ propagates the input variables to the neurons. The vector $b$ is a random bias vector with its elements drawn uniformly from an interval symmetric about zero. We choose a simple topology for the recurrent neural network in which the adjacency matrix $A$ is a sparse Erdős–Rényi random matrix. The sparseness of $A$ (the fraction of nonzero elements) is fixed at a small value. All nonzero elements in $A$ are drawn uniformly from $[-a, a]$, and the value of $a$ is determined by requiring the spectral radius of $A$ (the magnitude of its largest eigenvalue) to take a prescribed value. We construct the input weight matrix $B$ such that each neuron receives one and only one input variable from the three-dimensional input. Each input connection strength (each nonzero element of $B$) is drawn uniformly at random from an interval symmetric about zero.
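The construction above can be sketched as follows. This is a minimal sketch under stated assumptions: the sparsity, spectral radius, and weight intervals are illustrative placeholders, since the paper's numerical values are not reproduced here.

```python
import numpy as np

def build_reservoir(n=500, m=3, sparsity=0.02, spectral_radius=0.9,
                    input_scale=1.0, seed=0):
    """Construct the fixed random RNN of Eq. (17): a sparse Erdos-Renyi
    adjacency matrix A rescaled to a target spectral radius, an input
    matrix B giving each neuron exactly one input variable, and a random
    bias vector b. All parameter values here are illustrative."""
    rng = np.random.default_rng(seed)
    # Sparse Erdos-Renyi weights: each entry is nonzero with prob. `sparsity`.
    A = np.where(rng.random((n, n)) < sparsity,
                 rng.uniform(-1.0, 1.0, (n, n)), 0.0)
    # Rescale so the magnitude of the largest eigenvalue equals the target.
    A *= spectral_radius / np.max(np.abs(np.linalg.eigvals(A)))
    # Each neuron receives exactly one of the m input variables.
    B = np.zeros((n, m))
    B[np.arange(n), rng.integers(0, m, n)] = rng.uniform(-input_scale,
                                                         input_scale, n)
    b = rng.uniform(-1.0, 1.0, n)   # bias interval is an assumption
    return A, B, b

def step(x, s, A, B, b):
    # One update of the central system, Eq. (17).
    return np.tanh(A @ x + B @ s + b)
```

Scaling the spectral radius below one keeps the undriven network contracting, which favors a negative conditional Lyapunov exponent and hence generalized synchronization.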
Appendix E Learning by Adapting the Output Weight Matrices
Throughout the examples given in the main text, the exemplary Lorenz and Rössler input trajectories all have a length of $T$ time units ($T/\tau$ data points based on the sampling interval $\tau$). During each learning phase, the output matrix $W$ adapts following

(18) $W \leftarrow W + \eta\, [s(t) - W x(t)]\, x(t)^{\mathsf{T}}$,

where $\eta$ is the learning rate. Notice that there is a transient period, starting from the initial central-system state, before generalized synchronization occurs. We thus freeze $W$ for an initial number of time points during each learning phase. We let the system learn each task multiple times. For cases where the RNN is asked to learn both the Lorenz task and the Rössler task (Figs. 3 and 4 in the main text), we let it learn the two tasks in an alternating manner. To ensure that $W$ converges, we empirically choose a larger learning rate for the first repetitions and then decrease the learning rate afterward, so that the later adaptations can fine-tune $W$.
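The online adaptation of Eq. (18), with the readout frozen during the synchronization transient, can be sketched as follows. This is a toy illustration, not the paper's pipeline: the central-system state is replaced by a stand-in linear map $x(t) = P s(t)$ (so a perfect readout exists), and the sizes, learning rate, and transient length are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 3                        # state and input dimensions (illustrative)
T, transient, eta = 5000, 500, 0.02  # illustrative values, not the paper's

# Stand-in for generalized synchronization: here x(t) = P s(t) for a fixed
# random linear map P, so an ideal readout W* with W* P = I exists.
P = rng.standard_normal((n, m)) / np.sqrt(n)
W = np.zeros((m, n))

for t in range(T):
    s = rng.standard_normal(m)        # stand-in input sample s(t)
    x = P @ s                         # stand-in central-system state x(t)
    if t < transient:
        continue                      # freeze W until synchronization sets in
    err = s - W @ x                   # readout prediction error
    W += eta * np.outer(err, x)       # Eq. (18): Hebbian-like delta rule
```

The update is local (an outer product of the error and the presynaptic state), which is what makes it Hebbian-like; with a small, decaying learning rate it converges toward the ideal readout $W^{*}$.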
Appendix F Comparison with Previous Work
Previous studies have demonstrated that recurrent neural networks can be trained to accomplish many tasks. For example, reservoir computing networks have been trained to generate periodic patterns[21] and even to replicate long-term dynamics on a chaotic attractor.[21, 22, 56] Our structural learning framework incorporates the idea that a recurrent neural network learns chaotic dynamics by reconstructing the chaotic attractor.[57] Using this framework, we show that the system can adaptively learn dynamics on multiple chaotic attractors. A complementary adaptive learning scheme called “FOLLOW” shows that a recurrent spiking neural network can learn nonlinear dynamics by adjusting the synapses between neurons.[16] A unique feature of our model is that the recurrent network and the internal synapses that comprise it are kept fixed, such that different external inputs drive the dynamics of the same RNN. We made this choice because it allows for an easy implementation and interpretation of the learning of more than one chaotic attractor. In fact, the adaptation of the output connections also allows the learning of other downstream functions (such as separating mixed input) without adversely affecting the perception manifold.