# Repeated sequential learning increases memory capacity via effective decorrelation in a recurrent neural network

###### Abstract

Memories in neural systems are shaped through the interplay of neural and learning dynamics under external inputs. By introducing a simple local learning rule into a neural network, we found that the memory capacity is drastically increased by sequentially repeating the learning steps of input-output mappings. The origin of this enhancement is attributed to the generation of a pseudo-inverse correlation in the connectivity. This is associated with the emergence of spontaneous activity that intermittently exhibits neural patterns corresponding to embedded memories. Stabilization of memories is achieved by a distinct bifurcation from the spontaneous activity under the application of each input.

Through sequential learning, the brain learns to respond appropriately to various inputs. In neural systems, synaptic connections are modified to shape neural dynamics such that the applied stimulus and the desired response are adequately represented therein. After learning, the stimulus is represented according to the shaped neural dynamics Blumenfeld et al. (2006); Bernacchia and Amit (2007); McKenzie et al. (2013); Dunsmoor et al. (2015); Driscoll et al. (2017). How memories are successively embedded into neural dynamics through the interplay between the neural dynamics and the learning process is a crucial question in neuroscience.

To understand the representation of memories in neural systems, associative memory models are often studied. In conventional models Amari (1977); Hopfield (1984); Amit et al. (1987), multiple memories are designed to be embedded into corresponding attractors and are generated by a simple learning rule. Despite their success, however, the interplay between neural dynamics and learning has not been taken into account: when learning a new memory, the change in a connection is determined only by the memory pattern, independently of the already-shaped neural dynamics.

In contrast, we previously proposed a novel associative memory model Kurikawa and Kaneko (2012, 2013) that incorporates interactions between neural dynamics and the learning process. In those studies, however, each pattern was presented only once during learning, and existing memories were gradually eroded as new patterns were learned.

In the present Letter, we first introduce a theoretical formulation for a sequential and repeated learning process that interacts with neural dynamics. By studying this learning process, we investigate whether all memories can be successfully stored by repeated learning. If so, we then address what kind of neural-network structure enables such enhancement and how memories are represented in neural dynamics upon input. We also study spontaneous dynamics without input, which has been suggested to be involved in computation in neural systems Orbán et al. (2016); Berkes et al. (2011); Buesing et al. (2011); Ma et al. (2006); Litwin-Kumar and Doiron (2012); Hennequin et al. (2018).

We consider a model that consists of continuous rate-coding neurons that memorize input-output (I/O) mappings. The activity is set between -1 and 1 and evolves according to

(1) |

where denotes a connection from the -th to the -th neuron, an -dimensional vector is an input pattern, is its strength, and is the index of the I/O mappings to be learned. In the following, and are set at 1.0 and 4.0, respectively, unless otherwise stated.
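The equation above and its symbols did not survive conversion, so the following is only a minimal sketch of rate dynamics of this general family, not the paper's exact Eq. 1; all names (`x`, `J`, `beta`, `gamma`, `xi`) are illustrative assumptions:

```python
import numpy as np

def step(x, J, xi, beta=4.0, gamma=1.0, dt=0.05):
    # One Euler step of dx/dt = tanh(beta * (J @ x + gamma * xi)) - x.
    # The tanh bounds the drive, so each rate stays in (-1, 1).
    u = J @ x + gamma * xi
    return x + dt * (np.tanh(beta * u) - x)

rng = np.random.default_rng(0)
N = 100
J = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))  # random initial connectivity
np.fill_diagonal(J, 0.0)
xi = rng.choice([-1.0, 1.0], size=N)           # one random binary input pattern
x = np.zeros(N)
for _ in range(2000):
    x = step(x, J, xi)
```

Because each update is a convex combination of the current state and a tanh-bounded drive, the activity remains in (-1, 1) for any connectivity, matching the stated range of the model.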

For each input , we set an -dimensional vector as a target. These input and target patterns are generated as random -bit binary patterns with probabilities . In the presence of each input , the corresponding target is required to be recalled, i.e., an attractor matching is generated. The learning process is required to modify the connectivity so as to enable the network to recall the targets.

Previously Kurikawa and Kaneko (2013), we showed that such a memory structure is formed through a simple learning rule. To enable repeated sequential learning, we add a decay term

(2) |

where . We use a learning rate unless otherwise stated. According to this learning rule, . Thus, this rule preserves if initially : takes a binary value with probabilities before learning. Diagonal elements of are set at zero during the whole process. The learning process stops automatically when the neural activity matches the target, because ; otherwise, the learning process continues. Here, we impose the I/O maps successively: an input is applied to learn a target, and after learning of the map is completed, another input is applied to learn another target. The learning process for each single I/O map is called a learning step and denoted . During the learning process, the maps are applied in order for steps. Then, they are applied in random order for steps.
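As a hedged illustration of an error-driven local rule with weight decay (not the paper's exact Eq. 2, whose symbols are lost in conversion; all names here are our assumptions), one could write:

```python
import numpy as np

def learn_step(J, x, target, eta=0.01, lam=0.01):
    # Local error-driven term (target_i - x_i) * x_j plus weight decay.
    J = J + eta * (np.outer(target - x, x) - lam * J)
    np.fill_diagonal(J, 0.0)   # diagonals are kept at zero throughout
    return J

rng = np.random.default_rng(0)
N = 50
J = rng.normal(0.0, 0.1, (N, N))
np.fill_diagonal(J, 0.0)
t = rng.choice([-1.0, 1.0], size=N)
# If the state already equals the target, the error term vanishes and
# only the decay shrinks the weights: learning effectively stops.
J2 = learn_step(J, t, t)
```

This mirrors the automatic stopping described above: once the activity matches the target, the Hebbian-like term is zero and the update reduces to a slow uniform decay.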

Fig. 1A shows the recall processes in response to two input patterns after learning. is applied from to . Under this input, the neural dynamics are modified and the neural state converges to . When is applied instead of input 1, the neural dynamics are modified differently, and the neural state then converges to . A neural state that provides a desired target pattern is an attractor. Here, these two I/O maps are successfully recalled.

We first analyzed how repeated learning enhances the memory capacity. For this purpose, we computed the temporal average of the overlap of the neural activity with a target in the presence of input ; and represent the temporal average and the average over networks and trials in Fig. 1B. At this stage, networks can recall only one or two targets perfectly, and overlaps with the other targets decrease rapidly, independently of . After learning these targets more times (), however, recall performance increases and about 60 targets are recalled perfectly. Networks fail to memorize target patterns beyond . Thus, indicates the memory limit.
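The overlap used as a recall measure is a normalized dot product between the network state and a target; a small sketch (names assumed):

```python
import numpy as np

def overlap(x, target):
    # Normalized dot product: 1 for perfect recall, -1 for the
    # sign-flipped (parity) partner of the target.
    return float(x @ target) / len(x)

t = np.array([1.0, -1.0, 1.0, -1.0])
m_perfect = overlap(t, t)      # state exactly matches the target
m_flipped = overlap(-t, t)     # parity partner of the target
```

In the paper this quantity is additionally averaged over time, networks, and trials; the time average matters because the state may oscillate rather than sit at a fixed point.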

To evaluate this memory capacity in detail, we calculated the overlap averaged over maps, networks, and trials, and plotted it for different in Fig. 1C, where represents the average over the index of I/O maps. After learning steps, the average overlap decreases rapidly, while after learning steps, the overlap is maintained at around unity up to . Therefore, the capacity of the present model is estimated to be . To explore the dependence of the memory on , we examined for different . We found that the memory capacity increases monotonically with and saturates around (see Fig. S1). Thus, we studied the behavior for and as typical samples of the earlier and later stages of learning.

The enhancement in the memory capacity after iterative learning is not trivial, but depends on the learning speed and . As shown in Fig. S1, the memory capacity decreases as increases and decreases. In particular, the memory capacity for and is almost one, and it cannot be increased by repeated learning. This result indicates that an adequate relation between the timescales of the neural and learning dynamics, as well as the nature of the neural dynamics, is important for enhancing memory capacity through repeated learning.

Next, we examined the nature of the connectivity shaped throughout the learning process and its relevance to the enhancement in the memory capacity.
To this end, we calculated the singular values of the connectivity at different learning steps.
A learned connectivity is decomposed as .
Here, is the transpose of , and is a diagonal matrix whose elements are the singular values.
The singular values are plotted in order of magnitude for in Fig. 2A.
They decrease continuously at earlier learning steps, while after long learning a large discontinuity appears at 60.
For different , the singular values always show a discontinuous drop at at the later learning stage.
This means that left and right singular vectors become dominant in the connectivity throughout learning. (The remaining singular vectors still enhance the memory performance: a matrix consisting of only the dominant vectors shows a small decrease in performance, as shown in Fig. S2.)
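This kind of singular-spectrum gap can be reproduced on a toy connectivity; the construction below (a rank-M structure built from random target/input pairs plus weak full-rank noise) is our illustrative stand-in for a learned matrix, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 200, 20
# Toy "learned" connectivity: rank-M pattern structure plus weak noise.
targets = rng.choice([-1.0, 1.0], (M, N))
inputs_ = rng.choice([-1.0, 1.0], (M, N))
J = targets.T @ inputs_ / N + 0.002 * rng.normal(size=(N, N))

s = np.linalg.svd(J, compute_uv=False)   # sorted in descending order
gap = s[M - 1] / s[M]                    # ratio across the expected drop at M
```

With `M` embedded pattern pairs, the first `M` singular values stay of order one while the rest collapse to the noise floor, producing the discontinuous drop at the number of stored maps.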

Recalling the connections in the Hopfield model Hopfield (1984) () and
our previous models Kurikawa and Kaneko (2012, 2013) (),
we hypothesize that these vectors consist mainly of linear combinations of and , and that the other left and right singular vectors lie in the space normal to these combinations.
To examine this hypothesis, we used and .
Here, and are the -th elements of the -th left and right singular vectors, respectively.
The contributions of and to () are roughly estimated by and (, and ), respectively.
(Here, "roughly" means that , and we discarded terms of , since the targets and inputs are not an exact orthonormal basis ().)
We measured , the average contribution of the targets to one of the dominant left singular vectors, and
also the corresponding quantities for and , in Fig. 2B.
All of these values are much higher than chance level, meaning that the dominant vectors mainly consist of targets and inputs.
In particular, and increase with learning.

We also found that is highly correlated with , while is correlated with (Fig. S2). Thus, the dominant left and right singular vectors are decomposed as

(3) |

where () is the correlation coefficient between and ( and ). We also found that is highly correlated with across for a given , but not with (Fig. S2). In total, this analysis shows that is decomposed as

(4) |

where and is the -th singular value. Note that, to enhance recall performance, the non-diagonal terms () should be small. In our model, they are indeed much smaller than the diagonal ones of , since there is no correlation between and . Additionally, these non-diagonal terms are further reduced as learning progresses, as shown in Fig. 2C.

To achieve optimal memory capacity, it is generally believed that the inverse of the correlation matrix between targets has to be incorporated into the connectivity Personnaz et al. (1986); Kanter and Sompolinsky (1987); Diederich and Opper (1987) to reduce the interference due to correlations between targets. In the case of our model, , where the interference term is defined as ( and ).
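The effect of incorporating the inverse correlation matrix can be checked numerically. The sketch below contrasts a Hebbian matrix with a pseudo-inverse one, following the standard constructions from the cited works (variable names are ours, and diagonals are not zeroed, for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 200, 40
Xi = Xi_patterns = rng.choice([-1.0, 1.0], (M, N))   # M random target patterns

# Hebbian connectivity: cross-talk between patterns grows with the load M/N.
J_hebb = Xi.T @ Xi / N

# Pseudo-inverse connectivity: the inverse of the pattern correlation matrix
# C decorrelates the targets, making each one an exact eigenvector of J.
C = Xi @ Xi.T / N
J_pi = Xi.T @ np.linalg.inv(C) @ Xi / N

err_hebb = np.abs(J_hebb @ Xi[0] - Xi[0]).max()   # interference survives
err_pi = np.abs(J_pi @ Xi[0] - Xi[0]).max()       # interference cancelled
```

For the pseudo-inverse matrix, the identity holds exactly, so the retrieval error is at numerical precision, while the Hebbian matrix leaves cross-talk of order the square root of the load.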

Instead of obtaining the exact form of the connectivity (for which a detailed analysis to the order of would be necessary, which is not easy owing to noise of the same order generated by the correlation between any two random patterns), we focus on whether the learned connectivity effectively decorrelates the patterns.
Recall that, in the standard Hopfield network, corresponding to the case in which is a diagonal matrix, the standard deviation of () follows , whereas it follows for the pseudo-inverse correlation matrix Personnaz et al. (1986); Kanter and Sompolinsky (1987); Diederich and Opper (1987).
We hence measured () and estimated its dependence on and in Fig. 2D and Fig. S2 for the present connection matrix shaped by learning.
We found that at the earlier stage of learning (at ) the standard deviation follows ,
but at the later stage of learning it turns out to follow (for ).
This result implies that our learning rule effectively shapes the inverse correlation matrix into the connectivity throughout the learning process, optimally reducing interference.
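The square-root scaling of the raw interference follows from the central limit theorem applied to overlaps of independent random patterns; a quick numerical check under an assumed setup:

```python
import numpy as np

def pair_overlap_sd(N, trials, rng):
    # SD of the normalized overlap (1/N) * xi . eta between pairs of
    # independent random +-1 patterns; the CLT gives SD = 1/sqrt(N).
    a = rng.choice([-1.0, 1.0], (trials, N))
    b = rng.choice([-1.0, 1.0], (trials, N))
    return float(np.std((a * b).sum(axis=1) / N))

rng = np.random.default_rng(3)
sd_100 = pair_overlap_sd(100, 5000, rng)
sd_400 = pair_overlap_sd(400, 5000, rng)
ratio = sd_100 / sd_400   # expected near sqrt(400/100) = 2
```

Quadrupling the pattern dimension halves the overlap fluctuations, which is the square-root scaling that the effective decorrelation in the learned connectivity improves upon.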

Next, we analyzed how memories are represented after the inverse correlation matrix is shaped. We first focus on the modification of the neural dynamics as a function of the input strength . In Fig. 3A, we plot a bifurcation diagram against for . The neural activity for , i.e., the spontaneous neural activity, oscillates around the origin. As increases, it moves toward a target while maintaining its oscillation amplitude. At a certain strength, the attractor of the neural dynamics bifurcates from an oscillation to a fixed point corresponding to the target. The neural dynamics projected onto a 2-D plane are plotted around the bifurcation point in Fig. 3. The large-amplitude oscillation collapses into a fixed point corresponding to the target between and . Beyond the bifurcation point, the fixed point stays around the target as is increased. Thus, neural activity corresponding to target recall is clearly distinguished from other activities through a bifurcation and is stable against changes in beyond the bifurcation point.
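A hedged sketch of such an input-strength sweep, using illustrative rate dynamics and a toy pseudo-inverse connectivity (for simplicity the input pattern here equals the target, unlike in the paper, and all names are assumptions):

```python
import numpy as np

def relax(J, xi, gamma, beta=4.0, dt=0.05, steps=3000):
    # Relax illustrative rate dynamics under an input of strength gamma
    # and return the final overlap with the pattern xi.
    x = np.zeros(len(xi))
    for _ in range(steps):
        x = x + dt * (np.tanh(beta * (J @ x + gamma * xi)) - x)
    return float(x @ xi) / len(xi)

rng = np.random.default_rng(4)
N, M = 100, 10
Xi = rng.choice([-1.0, 1.0], (M, N))
C = Xi @ Xi.T / N
J = Xi.T @ np.linalg.inv(C) @ Xi / N      # toy pseudo-inverse connectivity
overlaps = {g: relax(J, Xi[0], g) for g in (0.0, 0.5, 2.0)}
```

With no input the state stays at the quiescent origin, while a sufficiently strong input pins the network onto the corresponding pattern; scanning a fine grid of strengths would trace out a diagram of this kind, though the oscillatory regime of the paper requires its exact dynamics.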

We then explored the behavior of the neural activity under a mixture of two learned inputs. As an example, the phase diagram of against the strengths of the two learned inputs () is shown in Fig. 3B. The fixed point corresponding to forms a distinctive phase, at the boundary of which a bifurcation from the target fixed point to oscillating dynamics occurs (Fig. S3). The fixed point of the other target () provides a similar phase diagram (Fig. S3). As the input pattern is changed to increase in the form (), the attractor bifurcates from the fixed point of to oscillating neural activity close to , then to that close to , and finally to the fixed point of . These results show that a target is represented as a distinctive fixed-point phase that is separated by bifurcations from the oscillating attractor.

We asked how robust these memories are against perturbations of the inputs. To examine this robustness, we applied quenched random noise of strength to the original input patterns, as ( is an -dimensional vector whose elements are random numbers drawn from the uniform distribution ), and analyzed the stability of the neural activity that recalls the target (Fig. 3C). For small , the fixed point of the target is insensitive to the noise, and remains around unity. Beyond the bifurcation point, the fixed point collapses into oscillating neural activity.

To close the analysis of the neural dynamics, we explored how the spontaneous activity is related to recall performance through learning. At earlier learning steps, the spontaneous activity shows chaotic dynamics that intermittently approach and depart from targets (Fig. S4A). Here, only a few targets, each of which is successfully recalled upon input, are approached (as well as their opposite patterns, due to the parity symmetry of our model) (Fig. 4A and Fig. S4B). At later steps, the spontaneous activity also approaches targets, but many targets are approached more equally. We further analyzed the neural dynamics using principal component (PC) analysis and by measuring the Lyapunov dimension (Fig. 4B). We found that the variation of the spontaneous activity becomes larger and more chaotic as learning progresses and recall performance improves. Thus, for lower recall performance the spontaneous activity is constrained to a lower dimension along the axes connecting target patterns, whereas for higher performance it is distributed more isotropically with respect to the target patterns across higher dimensions.
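The effective dimensionality of a trajectory can be summarized from its PCA spectrum by the participation ratio; this is an assumed proxy for the PC-variance and Lyapunov-dimension measures used in the paper:

```python
import numpy as np

def participation_ratio(X):
    # Effective dimensionality from the PCA spectrum of trajectory X
    # (rows = time points): PR = (sum lam)^2 / sum lam^2.
    lam = np.linalg.eigvalsh(np.cov((X - X.mean(axis=0)).T))
    lam = np.clip(lam, 0.0, None)   # clip tiny negative numerical noise
    return float(lam.sum() ** 2 / (lam ** 2).sum())

rng = np.random.default_rng(5)
T, N = 2000, 50
iso = rng.normal(size=(T, N))                            # isotropic, high-D
low = rng.normal(size=(T, 2)) @ rng.normal(size=(2, N))  # confined to a 2-D plane
pr_iso = participation_ratio(iso)
pr_low = participation_ratio(low)
```

A trajectory confined near a few axes (as at earlier learning stages) yields a participation ratio near that small rank, whereas an isotropically distributed one (as at later stages) yields a value near the full dimension.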

To confirm this relation between spontaneous activity and recall performance more generally, we examined the spontaneous activity for different (Fig. S4). For smaller , recall performance is higher, and the spontaneous activity shows a high-dimensional distribution that is close to all of the targets. As increases, in contrast, recall performance decreases and the spontaneous activity becomes low-dimensional, approaching only a few targets that are perfectly recalled. Finally, for quite large (=5), the spontaneous activity reduces to a few fixed points corresponding to target patterns, and only these targets are successfully retrieved. These results support the relation between spontaneous activity approaching the targets and recall performance.

In summary, by studying neural networks that memorize I/O maps, we have shown how repeated learning stabilizes each memorized state and enhances memory capacity via the interplay between neural dynamics and learning. In ordinary sequential learning, e.g., gradient descent methods Rumelhart et al. (1985); Williams and Zipser (1989) and palimpsest memory Amit and Fusi (1994); Brunel et al. (1998); Fusi and Abbott (2007), connections are shaped slowly: the network's output moves in the direction of the desired target, but does not match it after a single step. In contrast, in the present study, connections are modified such that the network generates the correct target in one shot after each step. Thus, we can analyze how targets are embedded in neural dynamics and how the representation of these targets changes through learning. The interaction between neural dynamics and learning has been investigated to reveal how neural representations are shaped in several studies Bernacchia and Amit (2007); Bernacchia (2014); Blumenfeld et al. (2006); Siri et al. (2008); Galtier et al. (2011); Kim et al. (2008). These studies, however, did not focus on the parametric effects of the neural dynamics (e.g., the gain parameter) and the learning (e.g., the learning speed) on learning performance and the representation of memories.

Spontaneous activity that intermittently reproduces stimulus-evoked patterns is commonly reported in visual Kenet et al. (2003); Berkes et al. (2011) and auditory Luczak et al. (2009) cortices. Theoretical studies Zenke et al. (2015); Hartmann et al. (2015); Miconi et al. (2016); Litwin-Kumar and Doiron (2014); Bernacchia (2014) demonstrated how such spontaneous activity is shaped through learning. Our study provides another simple learning rule that forms such spontaneous activity. Further, we showed a relation between features of the spontaneous activity and recall performance, consistent with its interpretation as a prior distribution in terms of probabilistic inference Orbán et al. (2016); Berkes et al. (2011); Buesing et al. (2011); Ma et al. (2006). More generally, the properties of neural dynamics relevant for information processing have been investigated Toyoizumi and Abbott (2011); Bertschinger and Natschläger (2004); Legenstein and Maass (2007); Sussillo and Abbott (2009), and the edge of chaos has been suggested as an appropriate regime. Our model suggests that high-dimensional chaos with intermittent visits to learned patterns is suitable for producing appropriate targets in response to inputs. The role of such itinerant dynamics Kaneko and Tsuda (2003) has been discussed over decades Tsuda (1992); Skarda and Freeman (1987); Rabinovich et al. (2008), and the present study clearly demonstrates it.

The pseudo-inverse model Personnaz et al. (1986); Kanter and Sompolinsky (1987); Diederich and Opper (1987) can achieve a higher memory capacity than the standard Hopfield network Hopfield (1984); Amit et al. (1987). In this model, the inverse correlation matrix of the memories is included in the connectivity to reduce interference among memories during recall, and non-local information is required to shape this connection. Furthermore, Diederich Diederich and Opper (1987) proved that a local learning rule can shape such connectivity after repeated learning. In our model, if we focus only on the relaxation dynamics in the vicinity of , a fixed point of the neural dynamics in Eq. 1, which is given by , then the learning rule in Eq. 2 takes a form similar to the Diederich rule when the decay term is neglected. This may partially explain why our local, repeated learning shapes the connection matrix to include the inverse correlation matrix and enhances the memory capacity.

In the present study, in contrast to ordinary associative memory Hopfield (1984); Amit et al. (1987); Amit and Fusi (1994); Brunel et al. (1998); Fusi and Abbott (2007), each memory is recalled through an input-induced bifurcation from the spontaneous neural activity. After repeated learning, the spontaneous activity and the fixed point of the recalled memory state are distinguished discontinuously through this bifurcation, resulting in the stability of the memory against perturbed input patterns. Although the modulation of neural dynamics by input has been analyzed in some studies Minai and Anand (1998); Rajan et al. (2010); Rubin et al. (2015), our study suggests that the memory state is represented as a robust and distinct phase in the parameter space of input strength. Such discrete representations of memory are often observed in the auditory Bathellier et al. (2012) and olfactory Niessing and Friedrich (2010) cortices and in the hippocampus Wills et al. (2005). In these areas, neural activity patterns switch discretely between two memory states depending on the intensity of sensory inputs and/or the mixture ratio of two different inputs. Our model provides a simple learning rule to form such memory representations and gives a prediction in terms of the relation between spontaneous-activity properties and memory performance.

## Acknowledgements

We thank David Colliaux for fruitful discussions. This work was partly supported by KAKENHI (No. 18K15343) and by Hitachi-The University of Tokyo funding.

## References

- Blumenfeld et al. (2006) B. Blumenfeld, S. Preminger, D. Sagi, and M. Tsodyks, Neuron, 52, 383 (2006), ISSN 0896-6273.
- Bernacchia and Amit (2007) A. Bernacchia and D. J. Amit, Proceedings of the National Academy of Sciences of the United States of America, 104, 3544 (2007), ISSN 0027-8424.
- McKenzie et al. (2013) S. McKenzie, N. T. M. Robinson, L. Herrera, J. C. Churchill, and H. Eichenbaum, The Journal of neuroscience : the official journal of the Society for Neuroscience, 33, 10243 (2013), ISSN 1529-2401.
- Dunsmoor et al. (2015) J. E. Dunsmoor, V. P. Murty, L. Davachi, and E. a. Phelps, Nature (2015), ISSN 0028-0836, doi:10.1038/nature14106.
- Driscoll et al. (2017) L. N. Driscoll, N. L. Pettit, M. Minderer, S. N. Chettih, and C. D. Harvey, Cell, 170, 986 (2017), ISSN 00928674.
- Amari (1977) S.-i. Amari, Biological Cybernetics, 26, 175 (1977).
- Hopfield (1984) J. J. Hopfield, Proceedings of the National Academy of Sciences of the United States of America, 81, 3088 (1984), ISSN 0027-8424.
- Amit et al. (1987) D. J. Amit, H. Gutfreund, and H. Sompolinsky, Annals of Physics, 173, 30 (1987).
- Kurikawa and Kaneko (2012) T. Kurikawa and K. Kaneko, EPL (Europhysics Letters), 98, 48002 (2012), ISSN 0295-5075.
- Kurikawa and Kaneko (2013) T. Kurikawa and K. Kaneko, PLoS computational biology, 9, e1002943 (2013), ISSN 1553-7358.
- Orbán et al. (2016) G. Orbán, P. Berkes, J. Fiser, and M. Lengyel, Neuron, 530 (2016).
- Berkes et al. (2011) P. Berkes, G. Orbán, M. Lengyel, and J. Fiser, Science (New York, N.Y.), 331, 83 (2011), ISSN 1095-9203.
- Buesing et al. (2011) L. Buesing, J. Bill, B. Nessler, and W. Maass, PLoS Computational Biology, 7, e1002211 (2011), ISSN 1553-7358.
- Ma et al. (2006) W. J. Ma, J. M. Beck, P. E. Latham, and A. Pouget, Nature neuroscience, 9, 1432 (2006), ISSN 1097-6256.
- Litwin-Kumar and Doiron (2012) A. Litwin-Kumar and B. Doiron, Nature neuroscience, 15, 1498 (2012), ISSN 1546-1726.
- Hennequin et al. (2018) G. Hennequin, Y. Ahmadian, D. B. Rubin, M. Lengyel, and K. D. Miller, Neuron, 98, 846 (2018), ISSN 08966273.
- Personnaz et al. (1986) L. Personnaz, I. Guyon, and G. Dreyfus, Physical Review A, 34 (1986).
- Kanter and Sompolinsky (1987) I. Kanter and H. Sompolinsky, Physical Review A, 35, 380 (1987).
- Diederich and Opper (1987) S. Diederich and M. Opper, Physical review letters, 58, 949 (1987).
- Rumelhart et al. (1985) D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Technical rept. Mar-Sep 1985 (1985).
- Williams and Zipser (1989) R. J. Williams and D. Zipser, Neural Computation, 1, 270 (1989), ISSN 0899-7667.
- Amit and Fusi (1994) D. J. Amit and S. Fusi, Neural Computation, 6, 957 (1994).
- Brunel et al. (1998) N. Brunel, F. Carusi, and S. Fusi, Network (Bristol, England), 9, 123 (1998), ISSN 0954-898X.
- Fusi and Abbott (2007) S. Fusi and L. F. Abbott, Nature Neuroscience, 10, 485 (2007), ISSN 1097-6256.
- Bernacchia (2014) A. Bernacchia, Frontiers in Synaptic Neuroscience, 6, 1 (2014), ISSN 1663-3563.
- Siri et al. (2008) B. Siri, H. Berry, B. Cessac, B. Delord, and M. Quoy, Neural computation, 20, 2937 (2008), ISSN 0899-7667, arXiv:0705.3690v1 .
- Galtier et al. (2011) M. N. Galtier, O. D. Faugeras, and P. C. Bressloff, Neural computation, 24, 2346 (2011), ISSN 0899-7667, arXiv:1102.0166 .
- Kim et al. (2008) Y. Kim, B. B. Vladimirskiy, and W. Senn, Frontiers in computational neuroscience, 2, 1 (2008), ISSN 1662-5188.
- Kenet et al. (2003) T. Kenet, D. Bibitchkov, M. Tsodyks, A. Grinvald, and A. Arieli, Nature, 425, 954 (2003).
- Luczak et al. (2009) A. Luczak, P. Bartho, and K. D. Harris, Neuron, 62, 413 (2009).
- Zenke et al. (2015) F. Zenke, E. J. Agnes, and W. Gerstner, Nature Communications, 6, 6922 (2015), ISSN 2041-1723.
- Hartmann et al. (2015) C. Hartmann, A. Lazar, B. Nessler, and J. Triesch, PLoS Computational Biology, 11, 1 (2015), ISSN 15537358.
- Miconi et al. (2016) T. Miconi, J. L. McKinstry, and G. M. Edelman, Nature Communications, 7, 13208 (2016), ISSN 2041-1723.
- Litwin-Kumar and Doiron (2014) A. Litwin-Kumar and B. Doiron, Nature Communications, 5, 5319 (2014), ISSN 2041-1723.
- Toyoizumi and Abbott (2011) T. Toyoizumi and L. F. Abbott, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 84, 1 (2011), ISSN 15393755.
- Bertschinger and Natschläger (2004) N. Bertschinger and T. Natschläger, Neural Computation, 16, 1413 (2004).
- Legenstein and Maass (2007) R. Legenstein and W. Maass, Neural networks : the official journal of the International Neural Network Society, 20, 323 (2007), ISSN 0893-6080.
- Sussillo and Abbott (2009) D. Sussillo and L. F. Abbott, Neuron, 63, 544 (2009), ISSN 1097-4199.
- Kaneko and Tsuda (2003) K. Kaneko and I. Tsuda, Chaos (Woodbury, N.Y.), 13, 926 (2003), ISSN 1054-1500.
- Tsuda (1992) I. Tsuda, Neural Networks, 5, 313 (1992), ISSN 08936080.
- Skarda and Freeman (1987) C. A. Skarda and W. J. Freeman, Behavioral and brain sciences, 10, 161 (1987).
- Rabinovich et al. (2008) M. I. Rabinovich, R. Huerta, P. Varona, and V. S. Afraimovich, PLoS Comput Biol, 4, e1000072 (2008).
- Minai and Anand (1998) A. Minai and T. Anand, Biological cybernetics, 79, 87 (1998), ISSN 0340-1200.
- Rajan et al. (2010) K. Rajan, L. F. Abbott, and H. Sompolinsky, Physical Review E, 82, 1 (2010), ISSN 1539-3755.
- Rubin et al. (2015) D. B. Rubin, S. D. VanHooser, and K. D. Miller, Neuron, 85, 402 (2015), ISSN 10974199.
- Bathellier et al. (2012) B. Bathellier, L. Ushakova, and S. Rumpel, Neuron, 76, 435 (2012), ISSN 1097-4199.
- Niessing and Friedrich (2010) J. Niessing and R. W. Friedrich, Nature, 465, 47 (2010).
- Wills et al. (2005) T. J. Wills, C. Lever, F. Cacucci, N. Burgess, and J. O’Keefe, Science (New York, N.Y.), 308, 873 (2005), ISSN 1095-9203.

## Supplemental materials

### .1 Change in recall performance during learning process

We studied how memory performance changes through learning. We plotted against with increasing , sorted in order of the magnitude of the overlap, in Fig. S1A(i). At the early learning stage, only a few targets are stored, while at the later stage the number of perfectly recalled targets increases rapidly. We measured as the recall performance in A(ii). For and , we plot the recall performance against the learning step . The capacity increases rapidly up to and almost saturates at .

### .2 Recall performance for different and

For and , the memory capacity is enhanced through repeated learning. We explored its dependence on different parameters, especially and . Here, is the timescale of the learning process relative to that of the neural dynamics. We plotted the capacity curve for various in Fig. S1B. As increases, the number of patterns that are successfully recalled decreases, and for , only one pattern is recalled. We also explored the dependence of the recall performance on . Generally, in randomly coupled neural-network models, attractors change from fixed points to chaos as increases. We plotted the recall performance for different in Fig. S1C. As increases, the recall performance increases. For , only one or two memories are recalled successfully. These results show that the relationship between the timescales of the neural dynamics and the learning process is crucial for shaping successful memories.

### .3 Relationship between and

A scatter plot of is displayed in Fig. S2A(i); is negatively correlated with . We also show a scatter plot of () in Fig. S2A(ii); is positively correlated with . In the right panel of Fig. S2A, we plot () (); there is no correlation between them.

### .4 Dependence of on

We plot the standard deviation (SD) of for different in Fig. S2B. The SDs in both the earlier and later stages of learning scale as .

### .5 Representation of targets

We explored the behavior of the overlap with the target against the inputs. In Fig. S3A, we show the bifurcation of the overlap with target 48 against the strength of input 48 in the presence of input 49, complementing Fig. 3B. The bifurcation of the overlap with target 49 is plotted in Fig. S3C, whereas the bifurcation diagram for inputs 48 and 49 is shown in Fig. S3B. All the results support that recall of the target pattern is represented as a distinctive phase of the corresponding fixed-point attractor, separated from oscillating neural activity.

### .6 Spontaneous activity

We analyzed how the nature of the spontaneous activity changes through learning. The spontaneous activity shows chaotic behavior that intermittently approaches some targets (Fig. S4A). For earlier learning, we found a clear correlation between recall performance and the maximum overlap, as shown in Fig. S4B. A few targets showing nearly perfect recall performance are closely approached by the spontaneous activity. For later learning, in contrast, no clear correlation appears: almost all targets show perfect recall performance, and their closeness (the maximum overlap) is distributed around intermediate values.

Next, we explored the spontaneous activity for different . As decreases and recall performance increases (Fig. S1B and Fig. S4E), the spontaneous activity is distributed more broadly (Fig. S4D and Fig. S4F) and is more chaotic (Fig. S4F). This relation between the spontaneous activity and recall performance is consistent with that found for different learning steps. For quite large , a few fixed points are shaped instead of chaotic dynamics, one of which corresponds to the most recently learned target (Fig. S4G).