[

[

[
Abstract

Recurrent neural networks (RNN) are frequently used to model different aspects of brain regions. We studied the properties of RNN trained to perform temporal and flow control tasks with temporal stimuli. We present results regarding three aspects: the set of inner configurations, memory capacity as a function of scale, and the immunity of trained networks to induced damage. Our results allow us to quantify different aspects of these physical models, which are normally used as black boxes and must be understood before modeling the biological response of the cerebral cortex.


1 Keywords:

trained RNN, temporal tasks, flow control tasks, cerebral cortex, Keras


A detailed study of RNN] A detailed study of recurrent neural networks used to model tasks in the cerebral cortex

C. Jarne and R. Laje]Jarne C. , Laje R.  \correspondance


2 Introduction:

Recurrent neural networks are models that have been used during the last 35 years to understand the fundamental characteristics of a great variety of dynamical systems, including the brain, and they have also been used in other fields. This is mainly the result of the work presented in foundational research papers such as those written by Hopfield Hopfield3088, Elman ELMAN1990179 and Funahashi and Nakamura DBLP:journals/nn/FunahashiN93; DBLP:journals/nn/Funahashi89.

Recurrent neural networks (RNN) have been widely used in applications ranging from modeling brain processes Gerstner60; 10.1371/journal.pcbi.1004792; 10.1371/journal.pcbi.1005175; 10.3389/fncom.2013.00070; nature_com; nature_03; reminton_02, through stability and control of systems DENG2013281; DINH201444; 7966138; PhysRevLett.118.258101, to machine learning DBLP:journals/ijon/GallicchioMP17; nature_02; libro_keras; deep-learning.

Currently, the fields involving RNN that stand out the most are machine learning and computational neuroscience. In the first, the main goal is to produce efficient network topologies and training methods that can predict the future state of time series data (forecasting) in order to make decisions at minimum computational cost. In computational neuroscience the goal is different. Since the brain is characterized by massively recurrent connectivity, RNN are used to model and explain the mechanisms of different brain areas or processes. This is because it is well known that dynamical computation underlies a variety of models of information processing and memory function in the brain.

For that reason, the main goal of the present work is to explore the inner properties of a simple general model that performs a set of cognitively inspired tasks and to discuss the results of the analysis of the model. We studied RNN trained to perform a set of different tasks processing temporal stimuli.

The main motivation for this work is that the considered models are frequently used to describe different processes in the brain.

RNN constitute a versatile model for neuroscience research. In general, as a first step one idealizes a task by describing the essence of the behavior as an input-output transformation. In the language of dynamical systems, the network should represent an abstract variable that obeys a low-dimensional equation. These dynamics are then translated into the connectivity of an RNN.

Every value of the variable corresponds to a certain point in the N-dimensional state space. The collection of such points forms a line attractor. Different tasks give rise to different dynamical objects in the state space. Interestingly, there are still many fundamental gaps in the theory of RNN BARAK20171. This work tries to elucidate some aspects of the network configuration.

One problem of particular interest in computational neuroscience is to model the dynamics of the cerebral cortex and how it processes the flow of information 10.3389/fncom.2011.00001. Recent experimental studies of neurons recorded from cortex reveal complex temporal dynamics SUSSILLO2014156; DBLP:journals/neco/SussilloB13; nature_01; nature_com, where different mechanisms for controlling the information flow could be present and coexist. The response of cortical circuits to sensory stimulation can be both multistable and graded. A simple model could perform computations that are similar to stimulus selection, gain modulation and temporal pattern generation in the cortex nature_letter_hahnloser.

Recurrent neural networks can be trained to process information and perform different tasks such as flow control with a different type of operations that roughly model brain areas.

Currently, some aspects of general models are used to describe the experimental results observed in different studies carnevale; 10.3389/fncom.2017.00112; nature_04. For instance, it has recently been reported that recurrent circuits in the brain may play a role in object identification PMID:31036945.

In this work, we studied the properties of the model that is generally used to represent a brain area or subgroup of neurons that perform a sensory response to the stimulus. We trained a simple minimal model using recurrent neural networks to perform a set of tasks of interest. On one hand, we chose those tasks that are relevant in processing information and flow control. On the other hand, we chose tasks that traditionally were used in previous works to model the behavior of different brain areas, particularly cortex SUSSILLO2014156.

Another aspect to consider is that the computational principles that allow decisions and actions to be flexible in time are currently under study. A recent review review-motor uses a dynamical systems perspective to study such flexibility and shows how it can be achieved through manipulations of inputs and initial conditions. The temporal aspects of RNN, the constraints on parameters and topologies, and the different parts of the computation deserve to be studied, and doing so will help to improve current neuronal models.

Trained networks serve as a source of mechanistic hypotheses and also as a testing ground for data analyses that could link neural computation to behavior. RNN are a valuable platform for theoretical investigation.

We focus on the study of networks trained for the following list of tasks regarding the processing of stimuli as temporal inputs:

  1. Memorizing and reproducing a stimulus with a time delay.

  2. Binary basic operations between input stimuli with AND, OR, NOT, XOR.

  3. Flip-Flop task, i.e. memorizing and forgetting a stimulus.

  4. A stimulus that triggers an output oscillation for a certain time.

It should be noted that the tasks described in item 2 are not related to those performed by static feedforward networks such as libro_static. In all the tasks, the focus is on the processing of temporal signals, similar to the temporal Xor task implemented in ELMAN1990179. Also, we want to point out that in item 3 we refer to a network learning the “Flip-Flop rule” as in DBLP:journals/neco/SussilloB13 with two inputs. We are not referring to the concept of “Flip Flop neurons” such as the one proposed in 7727548.

The RNN model emulates a “cognitive-type” cortical circuit such as the prefrontal cortex, which receives converging inputs from multiple sensory pathways and projects downstream to other areas. We designed our network architecture to be general enough for all the tasks mentioned above.

We used dimensionality reduction methods to study the inner state of the network during and after training, and we discuss specifically the results and observations regarding each task dim_red_nature. In particular, we chose PCA because it has been widely used in the study of simulated as well as experimental high-dimensional neural state spaces 10.1371/journal.pcbi.1002057.

For the network implementation and training, we propose to use the Keras library chollet2015keras and Tensorflow tensorflow2015-whitepaper as frameworks, where traditionally Matlab thompson1995image or Theano 2016arXiv160502688short have been used, as in works such as 10.3389/fncom.2018.00083. The reason for our selection is that these newer scientific libraries are open source and their use is growing rapidly.

In the case of Keras, this is the first time that it has been used for this kind of study. In the case of Tensorflow, we found in the literature a few recent works where it has started to be used as a computational tool, among which the work of williams stands out.

We present the results of three studies: the first concerns the initial conditions and the possible final configurations, the second concerns the scale and memory of the networks, and the third concerns induced damage on a trained network.

The rest of the paper is organized as follows. In Section 3 we describe the network model, training method and the code implementation. In Section 4.3 we explain each task listed above and describe how it is implemented. In Section 5 we present and discuss the results, and finally in Section 6 we present the conclusions.

3 A general description

3.1 From continuous to discrete time: walking with small steps

Equation 1 rules the dynamics of the $N$ interconnected units in analog neural networks, where $i = 1, 2, \dots, N$. Originally, equation 1 was used to model the state of the membrane potential. This equation has been used in numerous published works with different variants since Hopfield Hopfield3088.

$$\frac{dh_i(t)}{dt} = -\frac{h_i(t)}{\tau} + \sigma\left(\sum_{j} w^{Rec}_{ij}\, h_j(t) + \sum_{j} w^{in}_{ij}\, x_j(t)\right) \qquad (1)$$

In this equation, $\tau$ represents the time constant of the system, $\sigma$ is the activation function, and $x_j$ are the components of the vector $X$ of the input signal. The matrix elements $w^{Rec}_{ij}$ are the synaptic connection strengths of the matrix $W^{Rec}$, and $w^{in}_{ij}$ are the matrix elements of $W^{in}$, which connects the input units to the recurrent units. In order to read out the network activity it is common to include a readout in terms of the matrix elements $w^{out}_{ij}$ of $W^{out}$ as:

$$z_i(t) = \sum_{j} w^{out}_{ij}\, h_j(t) \qquad (2)$$

In a more modern and general definition, $h_i$ can be viewed as the summed and filtered synaptic currents at the soma of a biological neuron. The continuous variable $\sigma(h_i)$ is the instantaneous “firing rate” and is a saturating nonlinear function of $h_i$. Thus the recurrent neural network describes firing rates and does not explicitly model potentials SUSSILLO2014156.

In vector form, equations 1 and 2 can be written as:

$$\frac{dH(t)}{dt} = -\frac{H(t)}{\tau} + \sigma\left(W^{Rec} H(t) + W^{in} X(t)\right) \qquad (3)$$

and respectively:

$$Z(t) = W^{out} H(t) \qquad (4)$$

The units represent an approximation to fully connected biological neurons in which a simplified set of important computational properties is retained.

Recurrent neural networks are powerful tools since it has been proven that, given enough units, they can be trained to approximate any dynamical system con_01; con_02; Gallacher2000; Chow2000. It has been well established that RNN can display arbitrarily complex dynamics, including attractors, limit cycles and chaos SUSSILLO2014156.

Traditionally the system represented by Equation 1 is approximated using Euler’s method with a time step $\Delta t$. We considered $\Delta t = \tau = 1$. Then the dynamics of the discrete-time RNN, and its implementation on a highly parallel architecture, is given by:

$$H(t+1) = \sigma\left(W^{Rec} H(t) + W^{in} X(t)\right) \qquad (5)$$

with no further discussion regarding the discrete or continuous nature of the system. Nevertheless, very early works proved that it is possible to use a discrete-time RNN to uniformly approximate, to any degree of precision, a discrete-time state-space trajectory produced by either a dynamical system or a continuous-time function 488134.
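The passage from continuous to discrete time can be checked numerically. The following is a minimal sketch, not the authors' code: the names `W_rec`, `W_in`, `euler_step` and `discrete_step`, the tiny network size and the random weights are all assumptions made for illustration, and the activation is taken to be the hyperbolic tangent. It applies one forward-Euler step to the continuous-time rate equation and shows that, when the time step equals the time constant, the leak term cancels the previous state and the step coincides with the discrete-time update.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def drive(W_rec, W_in, h, x):
    """Activation term sigma(W_rec h + W_in x), with sigma = tanh."""
    rec = matvec(W_rec, h)
    inp = matvec(W_in, x)
    return [math.tanh(r + i) for r, i in zip(rec, inp)]

def euler_step(W_rec, W_in, h, x, dt, tau):
    """One forward-Euler step of dh/dt = -h/tau + sigma(W_rec h + W_in x)."""
    s = drive(W_rec, W_in, h, x)
    return [hi + dt * (-hi / tau + si) for hi, si in zip(h, s)]

def discrete_step(W_rec, W_in, h, x):
    """Discrete-time update h(t+1) = sigma(W_rec h(t) + W_in x(t))."""
    return drive(W_rec, W_in, h, x)

random.seed(0)
n_units, n_inputs = 4, 2  # tiny sizes, for illustration only
W_rec = [[random.gauss(0.0, 0.5) for _ in range(n_units)] for _ in range(n_units)]
W_in = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_units)]
h0 = [random.uniform(-0.1, 0.1) for _ in range(n_units)]
x0 = [1.0, 0.0]

# With dt = tau = 1 the Euler step reduces to the discrete-time update
h_euler = euler_step(W_rec, W_in, h0, x0, dt=1.0, tau=1.0)
h_discrete = discrete_step(W_rec, W_in, h0, x0)
```

With a smaller step, for example `dt=0.1`, the same function follows the continuous dynamics more closely, at the price of more updates per unit of time.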

4 Materials and methods

When the model is implemented computationally, it is always necessary to pass from the continuous-time system to a system with discrete time steps, with as much precision as is necessary for modeling the considered problem.

Modern open-source scientific libraries, such as Tensorflow for high-performance numerical computation and particularly Keras, allow architectures such as equation 5 to be implemented directly with a high-level neural network API written in Python and capable of running on top of TensorFlow. The Keras framework has a large set of architectures and training algorithms that have already been tested by the machine learning community, with detailed documentation chollet2015keras.

4.1 Regarding training methods

A great variety of algorithms coexist to train recurrent neural networks. For instance, one of the most outstanding is the one developed by Sussillo and Abbott. They developed a method called FORCE that allows them to reproduce complex output patterns, including human motion capture data susillo_2009. Subsequently, this algorithm and modifications of it have been applied successfully in various applications 10.1371/journal.pone.0191527.

In more recent work, a very detailed survey on RNNs with new advances for training algorithms and modern recurrent architectures is presented in DBLP:journals/corr/abs-1801-01078.

Our selection of the training algorithm was inspired by DBLP:journals/corr/abs-1801-01078 and TRISCHLER201667. In TRISCHLER201667 the authors use the Adam method to train networks and obtain numerical examples. Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions. It is based on adaptive estimates of lower-order moments. This method is straightforward to implement and computationally efficient. It has small memory requirements and is invariant to a diagonal rescaling of the gradients. This algorithm is well suited for problems that are large in terms of data and/or parameters. Adam is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients DBLP:journals/corr/KingmaB14. It has not yet been widely used in neuroscience; a recent example of its use appears in the methods of russo-2018.

The Adam algorithm calculates an exponential moving average of the gradient and of the squared gradient, and two parameters, $\beta_1$ and $\beta_2$, control the decay rates of these moving averages. By keeping track of the mean and the uncentered variance of the gradient over past time steps, the algorithm makes an informed guess about the path being traced through the weight space that minimizes the selected loss function. This is done by taking an exponentially weighted sum of the past gradients and of the past squared gradients. Given a calculated gradient $g_t$ at time $t$, the estimate for the first moment (mean), $m_t$, is:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \qquad (6)$$

and for the second moment (mean of the square), $v_t$, is:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \qquad (7)$$

These two estimates determine the direction in which to move at time step $t$.
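As an illustration of how the two moving averages are used, here is a minimal sketch of Adam applied to a one-dimensional toy problem. It is not the Keras implementation used in this work: the function name `adam_minimize`, the toy objective and the learning rate are assumptions, and the bias-correction and update steps follow the standard description in DBLP:journals/corr/KingmaB14 rather than anything stated in the text above.

```python
import math

def adam_minimize(grad, w0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar function with Adam, given its gradient `grad`."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g      # first moment: moving mean of the gradient
        v = beta2 * v + (1.0 - beta2) * g * g  # second moment: moving mean of the squared gradient
        m_hat = m / (1.0 - beta1 ** t)         # bias correction for the zero initialization
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 (w - 3)
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0, lr=0.05, steps=2000)
```

Dividing by the square root of the second moment makes the step size roughly independent of the gradient scale, which is one reason the method is robust to noisy gradients.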

4.2 Network implementation and training protocol

A simple RNN model, like the one implemented in the present work, has three layers: the input, recurrent hidden, and output layers. Figure 1 shows a representation of such a model. The input to the input layer is a sequence of vectors through time, as described in equation 1, whose components are $x_j$. The input units in a fully connected RNN are connected to the hidden units in the hidden layer, where the connections are defined by a weight matrix $W^{in}$. The hidden layer has $N$ hidden units that are fully connected to each other through time with recurrent connections. Initializing the hidden units with small non-zero elements can improve the overall performance and stability of the network.

In the present work, we implemented a fully connected recurrent neural network with 50 units in each study, unless otherwise indicated. We used the hyperbolic tangent as the activation function. The weight matrix for the linear transformation of the inputs ($W^{in}$) is initialized with a random uniform distribution. For the transformation of the recurrent state, the weight matrix is initialized as a random orthogonal matrix derived from normally distributed weights DBLP:journals/corr/SaxeMG13. This choice allows us to use Adam as the training method at low computational cost and to mitigate the vanishing gradient problem for this simple implementation.
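To make the orthogonal initial condition concrete, the sketch below builds a random orthogonal matrix by applying Gram-Schmidt orthonormalization to Gaussian vectors. This is only an illustration in plain Python: the function names are hypothetical, and the construction is not claimed to reproduce the initializer used by Keras. An orthogonal matrix preserves the norm of any vector it multiplies, so all of its eigenvalues have modulus one.

```python
import math
import random

def random_orthogonal(n, seed=0):
    """Build an n x n orthogonal matrix by Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components along the rows already accepted
        for q in rows:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # skip the (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows

def gram(W):
    """Return W W^T, which is the identity for an orthogonal matrix."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in W] for r1 in W]

n_units = 50  # network size used in most of the studies
W_rec = random_orthogonal(n_units)
G = gram(W_rec)
max_dev = max(abs(G[i][j] - (1.0 if i == j else 0.0))
              for i in range(n_units) for j in range(n_units))

# Norm preservation: |W h| equals |h| for an arbitrary state h
h = [math.sin(k) for k in range(n_units)]
Wh = [sum(w * x for w, x in zip(row, h)) for row in W_rec]
norm_h = math.sqrt(sum(x * x for x in h))
norm_Wh = math.sqrt(sum(x * x for x in Wh))
```

Because the recurrent matrix starts out norm-preserving, repeated multiplication neither shrinks nor amplifies the state, which is the usual argument for why this initialization helps gradients propagate through time.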

Figure 1: Neural network schema and one arbitrary value for the input and output.

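The loss function used to train the model is the mean squared error, defined as:

$$E = \frac{1}{T} \sum_{t=1}^{T} \left(z^{target}(t) - z(t)\right)^2 \qquad (8)$$

where $z^{target}(t)$ is the desired target value, $z(t)$ is the current output state and $T$ is the number of time steps.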
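The loss can be written in a few lines. This is a hedged sketch for a single output series: the function name and the example values are hypothetical, and averaging over the time steps is an assumption consistent with the term "mean" rather than a detail confirmed by the text.

```python
def mean_squared_error(target, output):
    """Mean of the squared differences between a target and an output series."""
    if len(target) != len(output):
        raise ValueError("target and output must have the same length")
    return sum((z_t - z) ** 2 for z_t, z in zip(target, output)) / len(target)

# Hypothetical four-step example: the output is close to the target step
target = [0.0, 0.0, 1.0, 1.0]
output = [0.1, -0.1, 0.8, 1.0]
loss = mean_squared_error(target, output)
```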

For all tasks described in the previous section, we trained the model with no fewer than 15,000 input samples. Each sample is a time series that may or may not contain a stimulus signal with noise. The noise is modeled as Gaussian noise added to the primary square pulse, with an amplitude of 10% of the pulse amplitude.
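A training sample of this kind could be generated as in the following sketch. It is an assumption-laden illustration, not the code from the repository: the function name, the pulse position, and the choice to interpret "10% of the amplitude" as the standard deviation of the Gaussian noise are all hypothetical, and the target is taken to be a copy of the pulse that starts a fixed delay after the falling edge of the input.

```python
import random

def make_sample(length=200, pulse_start=60, pulse_len=20, delay=20,
                amplitude=1.0, noise_frac=0.1, with_stimulus=True, seed=None):
    """Return (noisy_input, target), two lists with one value per mS."""
    if pulse_start + pulse_len > length:
        raise ValueError("the stimulus must fit inside the time series")
    rng = random.Random(seed)
    clean = [0.0] * length
    target = [0.0] * length
    if with_stimulus:
        for t in range(pulse_start, pulse_start + pulse_len):
            clean[t] = amplitude
        # Target pulse: starts `delay` mS after the falling edge of the input
        target_start = pulse_start + pulse_len + delay
        for t in range(target_start, min(target_start + pulse_len, length)):
            target[t] = amplitude
    # Gaussian noise with standard deviation noise_frac * amplitude (assumption)
    noisy = [c + rng.gauss(0.0, noise_frac * amplitude) for c in clean]
    return noisy, target

x, y = make_sample(seed=1)                                    # sample with a stimulus
x_empty, y_empty = make_sample(with_stimulus=False, seed=2)   # noise only
```

A full training set would repeat this call many times with random pulse positions and a mix of samples with and without stimulus.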

In each experiment, we save the initial and final instances of the network weights in order to study how the weight matrix changes after training. The training objective is to adjust all the network parameters and obtain a network that can reproduce the task for which it was trained.

The model was implemented in Python using Keras and Tensorflow. That allows us to make use of all the algorithms and optimizations developed by the machine learning community. Code to train the networks and produce the figures in this paper is available in the following repository: https://github.com/katejarne/RNN_study_with_keras

4.3 Time scale and general aspects of the tasks

In the tasks that we implemented, input and output stimulus signals were modeled as time series. At some point in the considered time series, a stimulus of a certain duration is produced and the network responds according to the rule that it was trained on, given the data set. We study the obtained networks and how they are able to learn the target task in each case.

The time scales considered were of the order of hundreds of milliseconds in each case, to match the range of interest of signals processed by the cerebral cortex Goel20120460. The value of the considered time step for the time evolution is 1 mS. Then, from equation 5, the state of the output of the units is updated as:

$$h_i(t + 1\,\mathrm{mS}) = \sigma\left(\sum_{j} w^{Rec}_{ij}\, h_j(t) + \sum_{j} w^{in}_{ij}\, x_j(t)\right) \qquad (9)$$

Initially, we considered the training data set as a low-edge-triggered response to the input signals with a delay of 20 mS. In this way, we try to model a possible time delay representing a cortical response. In each task presented in the following section, we show a trained network responding to different testing samples. Each stimulus at the input is presented as a green line (input 1) and a blue line (input 2) for those tasks with two inputs. The desired output is shown as a solid black line and the network response in red.

5 Results and discussion

5.1 Memorizing and reproducing a stimulus

Let us begin by describing a simple temporal task. In this task, when the network has a stimulus at the input (a Gaussian pulse with noise), it has to respond with a pulse at the output matching the input, and with no response otherwise.

We trained the network to have a fixed delay between the signal and the output response that we proposed. If there is no stimulus, the output has to be close to zero. We considered a time series of 200 mS length, but the length and the position of the stimulus are arbitrary.

In Figure 2 we show 6 samples of testing data set and the response of the network that was successfully trained to perform the task. The green line in the plot represents the input. The black thick line is the target signal and red line the network output. The network was trained as described in the previous section.

This task was used in the scale studies presented in section 5.6.

Figure 2: A trained neural network response to 6 testing samples for the “memorizing” task. The green line in the plot represents the input. The black thick line is the target signal and the red line the network output. Time is in mS and amplitude in arbitrary units.

5.2 Binary basic operations between input stimulus with AND, OR, XOR and NOT

In this set of tasks, the network has to perform different binary-inspired operations on temporal stimuli at the input (or inputs), with the result of changing or not the level of the output state. The input or inputs are square signals with a duration of 20 mS and Gaussian noise of 10% of the total amplitude. We considered as our data set random time series of 200 mS length with or without a stimulus at the input. When the task involves two stimuli, these are simultaneous in time. The network has to decide the state of the output that matches the training set rule in each considered task.

For each of the following tasks, we describe the rule that was used as the target for the output.

5.2.1 “And” task

The boolean “And” task consists of turning the output to the “High level” state when both inputs are “High level”. Table 1 summarizes the output state. We required that the output does not change anymore after the stimuli. In panel a of Figure 3 we show the neural network output response to 6 testing samples for “And” between two stimuli.

Input 1   Input 2   Output (And)
0         0         0
0         1         0
1         0         0
1         1         1
Table 1: “And” state of the output with respect to the input states.

5.2.2 “Or” task

The boolean “Or” task consists of turning the output to the “High level” state when at least one of the inputs is “High level”. Table 2 summarizes the output state. As in the previous task, we required that the output does not change anymore after the stimulus. In panel b of Figure 3, we show the neural network output response to 6 testing samples for the “Or” operation between two stimuli.

Input 1   Input 2   Output (Or)
0         0         0
0         1         1
1         0         1
1         1         1
Table 2: “Or” state of the output with respect to the input states.

5.2.3 “Xor” task

The boolean “Xor” task consists of turning the output to the “High level” state when exactly one of the inputs is “High level”. Table 3 summarizes the output state. As in the previous task, we required that the output does not change anymore after the stimulus. In panel c of Figure 3 we show the neural network output response to 6 testing samples for “Xor” between two stimuli.

Input 1   Input 2   Output (Xor)
0         0         0
0         1         1
1         0         1
1         1         0
Table 3: “Xor” state of the output with respect to the input states.

5.2.4 “Not” task

The boolean “Not” task consists of turning the output to the “High level” state when the input is in the “Low level” state and vice versa. In panel d of Figure 3 we show the state of the output compared with the input.

Input   Output (Not)
0       1
1       0
Table 4: “Not” state of the output with respect to the input state.
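The rules in Tables 1 to 4 can be turned into target time series as sketched below. The function names, the pulse position and the switching time (the end of the stimulus plus the 20 mS delay) are assumptions made for illustration. For the two-input tasks the output steps to the high level and stays there, as required in the text; the “Not” target is shown without any delay for simplicity.

```python
RULES = {
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "xor": lambda a, b: a != b,
}

def binary_target(rule, a_high, b_high, length=200, pulse_start=60,
                  pulse_len=20, delay=20):
    """Target series for a two-input task: steps to 1.0 and stays there."""
    level = 1.0 if RULES[rule](a_high, b_high) else 0.0
    switch = pulse_start + pulse_len + delay  # assumed switching time in mS
    return [level if t >= switch else 0.0 for t in range(length)]

def not_target(input_series, threshold=0.5):
    """Target for the "Not" task: high where the input is low, and vice versa."""
    return [0.0 if x > threshold else 1.0 for x in input_series]

# Final output level for every input combination of the "Xor" rule
xor_table = {(a, b): binary_target("xor", bool(a), bool(b))[-1]
             for a in (0, 1) for b in (0, 1)}
```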

a) b)

c) d)

Figure 3: Responses of four trained neural networks to 6 testing samples for: a) “And” operation between two stimuli, b) “Or” operation, c) “Xor” operation and d) “Not” operation applied to the input. Time is in mS and amplitude in arbitrary units.

5.3 “Flip-Flop” task: memorizing and forgetting a stimulus

In this study, we trained a network with two inputs, each with a different function. One is a “Set” signal and the other is a “Reset” signal. If the network receives a stimulus in the S-input, the output turns to a high level. If the network receives a stimulus in the R-input, the output returns to a low level. Two consecutive S-signals or R-signals do not change the output state. Table 5 summarizes the rule learned by the network. Time series are 600 mS long, to show different changes in the inputs during the same time lapse. Figure 4 shows the response of a neural network successfully trained to perform the task with 6 different samples of the testing set.

The training data set consists of trains of random stimuli at the S-input and R-input with noise, with the target output given by Table 5.

Set   Reset   Output state
0     0       unchanged
0     1       0
1     0       1
1     1       forbidden
Table 5: Flip Flop task table. “Unchanged” means that the output remains at the previous state. “Forbidden” means that the combination is excluded from the data set.
Figure 4: Response of a trained neural network for 6 testing samples for the “Flip Flop” task. S-Signal in blue corresponding to input A, R-Signal in green corresponding to input B. Black thick line is the Target output and red is the Network response.
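The Flip-Flop rule amounts to a small state machine, sketched here under the assumption that inputs are thresholded at half the pulse amplitude and that the output starts at the low level. The function name and the example series are hypothetical.

```python
def flip_flop_target(set_series, reset_series, threshold=0.5):
    """Target output of the Flip-Flop rule for a pair of input series."""
    state = 0.0  # assumed initial output level
    target = []
    for s, r in zip(set_series, reset_series):
        s_high, r_high = s > threshold, r > threshold
        if s_high and r_high:
            raise ValueError("Set and Reset high at the same time is forbidden")
        if s_high:
            state = 1.0      # "Set" turns the output to the high level
        elif r_high:
            state = 0.0      # "Reset" returns the output to the low level
        target.append(state)  # otherwise the previous state is kept
    return target

set_in = [0, 1, 0, 0, 1, 0, 0, 0]
reset_in = [0, 0, 0, 0, 0, 0, 1, 0]
out = flip_flop_target(set_in, reset_in)
```

In this example the second “Set” pulse leaves the output unchanged, which matches the statement that two consecutive S-signals do not change the output state.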

5.4 A stimulus that causes an oscillation during certain time

For this task, we trained a network to return an oscillation of a fixed frequency when the input receives a stimulus of 20 mS length. We chose this frequency to be in the range of brain oscillations. When the network receives the stimulus, the output behaves as shown in Figure 5. If the network receives no stimulus, the output remains at “Low level”.

Figure 5: A trained neural network response of the output to 6 testing samples for the “Oscillatory” task. Input signal in green, black thick line is the target output and red line is the network response.

In the following sections we present the different results of the studies performed on the trained recurrent neural network regarding the tasks presented in this Section.

5.5 Studies performed on network’s structure and dynamics

We will show different properties of the network activity and connectivity based on population analysis.

We started by training a set of 20 networks to perform each of the considered tasks. Each network was trained with random initial parameters distributed according to the description in Section 4.2. We considered two cases: the random normal distribution of weights and the orthogonal matrix with a random normal distribution of weights. We trained 20 different initial conditions for each case. We studied the eigenvalue spectrum of the recurrent matrix. We observed differences in the spectrum of the initial conditions between the two cases, as shown in the upper and lower left panels of Figure 6. Prior to training (upper left panel of Figure 6), for the random normal distribution, we observe a distribution which is consistent with the random matrix theory of Girko’s circle law doi:10.1137/1129095, which states that, for large N, the eigenvalues of an asymmetric random matrix lie uniformly within the unit circle in the complex plane when the elements are chosen from a distribution with zero mean and variance $1/N$.

As a result of the training, the eigenvalues are pushed out of the circle to achieve a configuration able to perform each of the considered tasks. In the case of the networks shown in Figure 6, both final configurations correspond to fixed-point configurations. Figure 6 compares the initial state (left panels) with the post-training state (right panels).

For each task and the two different initial conditions, we estimated the fraction of networks that successfully pass the training. The results are shown in Table 6. This table compares the success rate for the orthogonal condition with that for the random normal condition, showing that the first one improves the success rate for each task except the time pulse task.

We observed that in almost all tasks our results are consistent with studies previously conducted by Vorontsov2017OnOA. A possible explanation for the difference in success rate between the two initial conditions is that at the training stage it is “easier” to pull out the eigenvalues when they are placed on the edge of the circle, as in the orthogonal condition.

The exception is the time pulse memorization task, which also has the highest success rate. We think that this is because the task is very simple for the network to learn. That also makes this task a good candidate for scale studies.


Task          Initial orthogonal   Initial random normal
And           85%                  65%
Or            90%                  80%
Xor           90%                  55%
Not           90%                  45%
Flip Flop     95%                  65%
Oscillatory   90%                  65%
Time pulse    100%                 100%
Table 6: Success rate for the training of 20 networks for the orthogonal initial condition compared with the random normal initial condition.
Figure 6: Left. Eigenvalue distribution for a neural network with random normal (upper panel) and orthogonal (bottom panel) initial configuration. Right. Eigenvalue distribution after training for each initial condition.

We now consider networks trained on the “And” task. We want to show the behavior of the recurrent units. We study the response to the stimuli for the 4 different input configurations of Table 1. We considered the response to noiseless stimuli in order to show the trajectory in the network state space in more detail, but all networks were trained with noisy inputs as previously described.

We plot the components of the state vector from Equation 5, that is, the temporal evolution of all recurrent units. We applied principal component analysis to this set.

Figures 7 and 8 show the plots for each set of stimuli applied to a trained network. The left side of each figure shows the output and inputs vs. time in the upper panel and some trajectories of individual units in the lower panel. The right panel shows the time evolution of the principal components in the network state space.

Figure 7 corresponds to “High-High” and “Low-Low”, and Figure 8 to the other combinations, “Low-High” and “High-Low”. The colored segments in the PCA figures correspond to the different parts of the temporal signal: the initial part, the stimulus, the waiting gap, and the final state.

When no signal is applied to the input, the response of each neuron is at zero level. When the network receives an individual stimulus, the neuron states are disturbed and then the system migrates to a fixed state that is different from zero. When both inputs receive a stimulus, the units change regime to match the learned behavior of a High-level output, with another final internal state. We repeated this procedure for all trained networks.

We found that the way to achieve the desired trained rule is not unique. In fact, we identified different internal regimes for learning the same rule. We show in Figures 9 and 10 another example of a trained network. This one corresponds to a different internal configuration for the input states: one fixed point and two oscillatory regimes. For the “one stimulus” state the internal state is oscillatory, and for the “two stimuli” state it is a fixed point. In this case, the set of leading eigenvalues of the recurrent matrix has one real value and a complex conjugate pair.

Figure 7: Left. Neural network response of the output in the “And” task (“Low-Low” input and “High-High” input). Right: PCA analysis for the same dataset.
Figure 8: Left. Neural network response of the output in the “and” task (“Low -High” input and “High - Low” input). Right: PCA analysis for the same dataset. Top left panel shows inputs, target and output. Bottom left panel shows 25 individual states.
Figure 9: Left. Neural network response of the output in the “and” task (“Low -Low” input and “High - High” input). Right: PCA analysis for the same dataset. Top left panel shows inputs, target and output. Bottom left panel shows 25 individual states.
Figure 10: Left. Neural network response of the output in the “and” task (“Low -High” input and “High - Low” input). Right: PCA analysis for the same dataset. Top left panel shows inputs, target and output. Bottom left panel shows 25 individual states.

Each trained network acquires a different configuration of the connection weights and has different individual neuron states to achieve the same general behavior of the output, showing that different configurations lead to the same learned rule.

Interestingly, if we change the size of the network (for example by one order of magnitude, i.e. 500 units), the behavior is still ruled by a small set of dominant eigenvalues outside the circle, and it is again possible to find different regimes.

The same situation occurs for all learned tasks, except for the oscillatory task, where we could not find fixed-point states in the trained networks. The network always finds an oscillatory regime, given the nature of the output, and we obtain results analogous to those presented by DBLP:journals/neco/SussilloB13.

We also show an example of another task. In this case, a network that learned the Flip-Flop Task in Figure11.

The input (input A) represents the “Set” Signal and the (input B) represents “Reset”. The upper panel represents an example where the “Set” signal came before “Reset”, and the lower panel one were first came the “Reset”. The high-level output is a fixed point state while the low level is an oscillatory state.

It is interesting to note that a reset signal takes the system to a new state even if the output must remain at zero. This behavior is also ruled by the three leading eigenvalues: one real and one complex-conjugate pair, as shown in the bottom panel of Figure 11.

Figure 11: Left: Neural network response of the output in the “Flip-Flop” task. Right: PCA analysis for the same dataset. Top left panel shows inputs, target, and output. Bottom left panel shows 25 individual states. Bottom panel: Eigenvalue spectrum.

Other examples of internal behavior in trained networks, for all the studied tasks and different initial conditions, are available in the repository. They show the different internal states achieved with different initial conditions for the training (Supplementary materials).

To summarize the results of this study, a simple fully connected RNN with a small number of units can perform all the considered tasks. The RNNs were trained by applying back-propagation through time with the ADAM minimization algorithm, with a good rate of success for the considered initial conditions. The final state of the network was not uniquely determined by the learned rule: different internal configurations could lead to the same learned behavior.

Finally, we were able to reproduce the tasks considered in previous studies by using these small fully connected networks. Time scales were selected in these studies to match those of interest in the study of different cortical processes.

5.6 Time invariance, memory and scale studies

An interesting aspect of the trained networks is that the network response is invariant to the time at which the stimulus is applied. This situation is shown in Figure 12. The network changes the inner state of its units when it receives the stimulus.

The same holds for the other tasks. An example for the “And” task is available in the Supplementary information.
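
One way to check this invariance numerically is to delay the stimulus by a number of time steps and compare the response with the equally delayed original response. The sketch below illustrates the comparison only. `toy_response` is a hypothetical stand-in for the prediction of a trained network (it emits a pulse a fixed delay after the input pulse), and the names and parameter values are assumptions.

```python
import numpy as np

def shift_right(x, k):
    """Delay signal x by k samples, padding the start with zeros."""
    out = np.zeros_like(x)
    out[k:] = x[:len(x) - k]
    return out

def invariance_error(respond, stimulus, k):
    """Largest difference between 'respond to delayed input' and 'delayed response'."""
    delayed_then_respond = respond(shift_right(stimulus, k))
    respond_then_delayed = shift_right(respond(stimulus), k)
    return float(np.max(np.abs(delayed_then_respond - respond_then_delayed)))

def toy_response(x, delay=20, width=10):
    """Stand-in for a trained network: output pulse `delay` steps after the input."""
    kernel = np.zeros(delay + width)
    kernel[delay:] = 1.0
    return np.clip(np.convolve(x, kernel)[:len(x)], 0.0, 1.0)

stimulus = np.zeros(300)
stimulus[30:40] = 1.0                     # input pulse
error = invariance_error(toy_response, stimulus, k=50)
print(error)
```

An error close to zero for several delays would indicate that the response depends on the stimulus and not on its absolute position in the time series.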

Figure 12: Time-translational invariance of the stimulus for the “Time memorization” task. The trained network is stimulated with a time series in which the stimulus occurs at different moments. The pink line represents the state of the input, the grey line the output response, and the thick red line the output target.

Given the nature of the network, a trained network can process time series of arbitrary length. We now ask how much time can elapse between the stimulus and the response for a particular network size. This problem is related to the well-known vanishing gradient problem for long-time dependencies. In the context of machine learning, this problem was addressed for time-series processing by using other recurrent network architectures such as LSTM and GRU.

To show the temporal limitations of our model, we trained a set of networks for each response time in the pulse memorization task and measured the rate of success in terms of the Euclidean distance between target and output. The top panel of Figure 13 shows the result of this study. Each point in the plot is the average of the distances obtained for the set of networks trained for that response time interval.
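
The distance measure can be sketched as follows, assuming output and target are sampled on the same time grid and that no additional normalization is applied (the exact normalization used for Figure 13 is not stated here). The function names and the toy signals are illustrative.

```python
import numpy as np

def output_target_distance(output, target):
    """Euclidean distance between one network output and the target series."""
    return float(np.linalg.norm(np.asarray(output) - np.asarray(target)))

def mean_distance(outputs, target):
    """Average distance over a set of networks trained for the same response time."""
    return float(np.mean([output_target_distance(o, target) for o in outputs]))

# Toy example (assumption): the target is a 20-sample pulse.
target = np.zeros(200)
target[120:140] = 1.0
outputs = [target.copy(),      # a network that reproduces the target exactly
           np.zeros(200)]      # a network that never responds

average = mean_distance(outputs, target)
print(average)
```

A smaller value means that, on average, the outputs of the set lie closer to the target.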

Our results show that for a temporal response longer than 50 ms the mean distance decreases.

We also considered a 150 ms delay between the input and the target response, where the success rate is low, and studied what happened to the distance when we increased the number of units. The results are shown in the bottom panel of Figure 13. For a fixed time interval, the memory capacity improves with the size of the network, as expected.

Figure 13: Upper panel: Rate of success vs. time between stimulus and response of the learned task. Bottom panel: Rate of success vs. network size for a fixed time between stimulus and response of the learned task.

With this study, we were able to characterize the memory limits of a trained network of fixed size and the dependence of memory on network size for simple tasks.

5.7 Applying selective connectivity damage into a trained network

Now we present the results of our last study. We induced post-training “damage” in a network that had previously been trained successfully to perform one task. In the trained network, we removed connections and studied up to which degree of damage the network was still able to perform the learned task.

We considered a set of 10 trained networks with 50 units that perform the “And” task. Since we are using a fully connected network, the total number of connections is 2500, including positive and negative connections. The weight distribution of the trained network is given by a Normal distribution with a and , as shown on the right side of Figure 14.

We removed the connections gradually and symmetrically by taking different percentage slices of connectivity strength. The result of this study is shown in the first panel of Figure 14. Each line color in the plot represents a different input configuration of Table 1. Each point is the average target-output distance obtained after removing the indicated percentage of the lowest connections.

For this task, when we removed all the connections (positive and negative) with strength higher than 14%, all possible output states deteriorate. In terms of distance, this corresponds to values higher than 1. When the value is over 1.5, the states are destroyed.

In panel b of Figure 14 we show the result of removing only slices of positive connections, and in panel c only the negative ones. The states are also destroyed if we remove only positive or only negative connections. Both types of connection are equally necessary to perform the considered task, with no preference for a connection sign. Removing the strongest connections, even when the percentage is small, destroys the learned task.
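
The cumulative removal can be sketched as a masking operation on the recurrent weight matrix. The sketch assumes that the percentage is taken relative to the largest absolute weight and that connections below the threshold are set to zero; this reading of the procedure, the function name `remove_weak_connections`, and the toy matrix are assumptions for illustration.

```python
import numpy as np

def remove_weak_connections(w, fraction, sign="both"):
    """Zero the connections with |w| below fraction * max|w|.

    sign selects which connections may be removed: "both", "positive" or "negative".
    """
    w = np.array(w, dtype=float)              # work on a copy
    threshold = fraction * np.max(np.abs(w))
    mask = np.abs(w) < threshold
    if sign == "positive":
        mask &= w > 0
    elif sign == "negative":
        mask &= w < 0
    elif sign != "both":
        raise ValueError("sign must be 'both', 'positive' or 'negative'")
    w[mask] = 0.0
    return w

# Toy 2x2 weight matrix (assumption), largest absolute weight equal to 1.0.
w_toy = np.array([[0.05, -0.5],
                  [1.0, -0.08]])
damaged_both = remove_weak_connections(w_toy, 0.10)
damaged_positive = remove_weak_connections(w_toy, 0.10, sign="positive")
damaged_negative = remove_weak_connections(w_toy, 0.10, sign="negative")
print(damaged_both)
```

The damaged matrix would then replace the recurrent weights of the trained network before the target-output distance is evaluated again.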

Next we studied what happens if we remove only a slice of connections around each strength value. The states deteriorate for slices above 10% of the values, indicating the importance of connections of particular strengths, but the effect is stronger when connections are removed cumulatively. These results are shown in panels d and e of Figure 14.
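
Removing a slice differs from the cumulative case in that only connections inside a band of strengths are set to zero. A minimal sketch follows, assuming the band is defined by a center and a width given as fractions of the largest absolute weight; these parameters and the name `remove_strength_slice` are hypothetical.

```python
import numpy as np

def remove_strength_slice(w, center, width):
    """Zero the connections whose |w| lies in a band around center * max|w|."""
    w = np.array(w, dtype=float)              # work on a copy
    scale = np.max(np.abs(w))
    lower = (center - width / 2.0) * scale
    upper = (center + width / 2.0) * scale
    mask = (np.abs(w) >= lower) & (np.abs(w) < upper)
    w[mask] = 0.0
    return w

# Toy 2x2 weight matrix (assumption), largest absolute weight equal to 1.0.
w_toy = np.array([[0.05, -0.5],
                  [1.0, -0.08]])
sliced = remove_strength_slice(w_toy, center=0.5, width=0.2)
print(sliced)
```

Only the connection of strength 0.5 falls inside the band from 0.4 to 0.6, so the weaker and stronger connections are left untouched.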

Figure 14: Result of removing the connections with strength lower than a certain percentage. Panel a) removing positive and negative connections, panel b) removing only positive and panel c) only negative connections. Panels d) and e) correspond to the cases of removing a slice around the indicated strength.

6 Conclusions

We have presented the results of a set of studies performed on RNNs trained to perform different temporal and flow control tasks. We showed that a fully connected model with a small number of units is adequate to perform simple tasks with temporal inputs.

We showed that, given the nature of the system, the inner state is not unique for a given rule to be learned. We successfully determined different properties of the networks that are important when studying more complex tasks.

We also performed studies of the memory capacity of such models, showing the limitations in time scales of interest in cerebral cortex processes.

Finally, we showed how much induced damage a trained network can tolerate.

This set of studies is useful, particularly because these models are widespread and used to model different processes.

Further steps in our studies will include a description of networks with excitatory and inhibitory units.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Author Contributions

CJ wrote the code, ran the simulations, and took part in the production of text and figures. RL coordinated the modeling and simulation activity, directed the work, and edited the manuscript.

Funding

This research was supported by CONICET.

Data Availability Statement

The datasets for this study can be found at: http://ceciliajarne.web.unq.edu.ar/network_weights. The code is available at: https://github.com/katejarne/RNN_study_with_keras.

References
