S4NN: temporal backpropagation for spiking neural networks with one spike per neuron
Abstract
We propose a new supervised learning rule for multilayer spiking neural networks (SNNs) that use a form of temporal coding known as rank-order coding. With this coding scheme, all neurons fire at most one spike per stimulus, but the firing order carries information. In particular, in the readout layer, the first neuron to fire determines the class of the stimulus. We derive a new learning rule for this sort of network, named S4NN, akin to traditional error backpropagation, yet based on latencies. We show how approximated error gradients can be computed backward in a feedforward network with any number of layers. This approach reaches state-of-the-art performance among supervised SNNs with multiple fully connected layers: a test accuracy of 97.4% on the MNIST dataset and 99.2% on the Caltech Face/Motorbike dataset. Yet the neuron model that we use, the non-leaky integrate-and-fire neuron, is much simpler than those used in all previous works. The source code of the proposed S4NN is publicly available at https://github.com/SRKH/S4NN.
1 Introduction
Biological neurons communicate via short stereotyped electrical impulses called “spikes”, or “action potentials”. Each neuron integrates incoming spikes from the presynaptic neurons and whenever its membrane potential reaches a certain threshold, it also sends an outgoing spike to the downstream neurons. In the brain, the spike times, in addition to the spike rates, are known to play an important role in how neurons process information [1, 2]. SNNs are thus more biologically realistic than artificial neural networks (ANNs) [3, 4, 5, 6], and as SNNs use sparse and asynchronous binary signals processed in a massively parallel fashion, they are one of the best available options to study how the brain computes at the neuronal description level. But SNNs are also appealing for artificial intelligence technology, especially for edge computing, since their implementations on so-called neuromorphic chips can be far less energy-hungry than ANN implementations (typically done on GPUs or similar hardware), mostly because they can leverage efficient event-based computations [4, 7, 8, 9, 10, 11, 12].
Recently, numerous researchers have made an extensive effort to develop direct supervised learning algorithms for SNNs [7]. The main challenge is the non-differentiability of the thresholding activation function of spiking neurons at firing times. One solution to this problem is to consider spike rates instead of exact firing times [13, 14, 15]. The second approach is to use smoothed spike functions that are differentiable with respect to time [16]. The third set of methods uses surrogate gradients at the firing times [8, 17, 18, 19, 20, 21, 22]. The last approach, known as latency learning, is the main focus of this paper. In this approach, the firing time of the neuron is defined as a function of its membrane potential or the firing time of presynaptic neurons [23, 24, 25]. In this way, the derivative of the thresholding activation function is no longer required.
More specifically, our goal is to classify static inputs (e.g., images) with an SNN in which neurons fire at most once, and the most strongly activated neurons fire first [26, 27, 28, 29, 30, 31, 24, 32, 33, 25, 34, 35, 36]. Thus, the spike latencies, or firing order, carry information. Here, we used simple non-leaky integrate-and-fire neurons [37] in all the layers of the proposed SNN. Each neuron simply integrates weighted input spikes (received from instantaneous synapses) through time with no leak and emits only one spike right after crossing its threshold for the first time, or no spike if this threshold is never reached. In the readout layer, there is one neuron per category. As soon as one of these neurons fires, the network assigns the corresponding category to the input, and the computations can stop when only a few neurons have fired. This coding scheme is thus extremely economical in the number of spikes.
In this work, we adapted the well-known backpropagation algorithm [38], originally designed for ANNs, to this sort of SNN. Backpropagation has been shown to solve extremely difficult classification problems in ANNs with many layers, leading to the so-called “deep learning” revolution [39]. The tour de force of backpropagation is to solve the multilayer credit assignment problem [40]. That is, it finds what the hidden layers should do to minimize the loss in the readout layer. This motivated us, and others [23, 24, 25, 34], to adapt backpropagation to single-spike SNNs by using the latencies instead of the firing rates. The main strength of our approach with respect to the above-mentioned ones is the use of a much simpler neuron model: a non-leaky integrate-and-fire neuron with instantaneous synapses. Yet it reaches a comparable accuracy on the MNIST dataset [41].
2 Methods
The proposed single-spike supervised spiking neural network (S4NN) is comprised of an input layer converting input data into a spike train and feeding it into the network, followed by one or more hidden layers of non-leaky integrate-and-fire (IF) neurons processing the input spikes, and finally, an output layer of non-leaky IF neurons with one neuron per category. Figure 1 shows an S4NN with two hidden layers. Here, we use a temporal (i.e., rank-order) coding called time-to-first-spike in the input layer, which is very sparse and produces at most one spike for each input value. The subsequent neurons are also limited to fire at most once.
To train the network, a temporal version of the backpropagation algorithm is used. We assume an image categorization task with several images per category. First, the network decision on the category of the input image is made by considering the first output neuron to fire. Then, the error of each output neuron is computed by comparing its actual firing time with a target firing time (see Subsection 2.5). Finally, these errors are backpropagated through the layers and the weights are updated through stochastic gradient descent. Temporal backpropagation, however, confronts two challenges: defining the target firing time and computing the derivative of the neuron firing time with respect to its membrane potential. To overcome these challenges, the proposed learning algorithm uses relative target firing times and approximated derivatives.
2.1 Time-to-first-spike coding
The first step of an SNN is to convert the analog input signal into a spike train representing the same information. The neural processing in the following neurons should be compatible with this coding scheme to be able to decipher the information encoded in the input spikes. Here, we use a time-to-first-spike coding for the entry layer (in which a larger input value corresponds to an earlier spike) and IF neurons in subsequent layers that fire at most once.
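Consider a gray image with pixel intensity values in the range $[0, I_{max}]$. Each input neuron encodes its corresponding pixel value as a single spike time in the range $[0, t_{max}]$. The firing time of the $i$-th input neuron, $t_i$, is computed from the pixel intensity value, $I_i$, as follows:
(1) $$t_i = \left\lfloor \frac{I_{max} - I_i}{I_{max}}\, t_{max} \right\rfloor$$
Therefore, the spike train of the $i$-th neuron in the input layer (layer zero) is defined as
(2) $$S_i^0(t) = \begin{cases} 1 & \text{if } t = t_i \\ 0 & \text{otherwise} \end{cases}$$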
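The intensity-to-latency conversion described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: it assumes a linear mapping with floor rounding, 8-bit pixels (`i_max = 255`) and `t_max = 256`, and the function names are hypothetical.

```python
def encode_time_to_first_spike(pixels, i_max=255, t_max=256):
    """Map each pixel intensity in [0, i_max] to a spike time in [0, t_max].

    Assumed linear code with floor rounding: brighter pixels spike earlier.
    """
    return [((i_max - p) * t_max) // i_max for p in pixels]


def spike_train(firing_time, t_max=256):
    """Binary spike train over time steps 0..t_max containing a single spike."""
    return [1 if t == firing_time else 0 for t in range(t_max + 1)]


pixels = [255, 128, 0]  # bright, mid-gray, black
times = encode_time_to_first_spike(pixels)
print(times)                       # [0, 127, 256]
print(sum(spike_train(times[1])))  # 1, i.e. one spike per input neuron
```

The brightest pixel spikes at time 0 and a black pixel spikes only at the very last time step, so the code stays sparse: one spike per pixel regardless of intensity.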
Notably, this simple intensity-to-latency code does not need any preprocessing steps like applying Gabor or DoG filters that are commonly used in SNNs, especially in those with STDP learning rules, which cannot handle homogeneous surfaces [30, 31, 42]. Also, it produces only one spike per pixel, and hence the obtained spike train is much sparser than what is common in rate codes.
Neurons at the subsequent layers fire as soon as they reach their threshold, and the first neuron to fire in the output layer determines the network decision. Hence, the network decision depends on the earliest spikes throughout the network. In other words, neural information in all the layers is encoded in the spike times of the earliest neurons to fire. Therefore, one can say that the time-to-first-spike information coding is at work in subsequent layers as well.
2.2 Forward path
S4NN consists of multiple layers of non-leaky IF neurons, and there is no limitation on the number of layers, so S4NN can be implemented with an arbitrary number of hidden layers. The membrane potential of the $j$-th neuron in the $l$-th layer at time point $t$, $V_j^l(t)$, is computed as
(3) $$V_j^l(t) = \sum_i w_{ji}^l \sum_{\tau=0}^{t} S_i^{l-1}(\tau)$$
where $S_i^{l-1}$ and $w_{ji}^l$ are, respectively, the input spike train and the input synaptic weight from the $i$-th presynaptic neuron in the previous layer to neuron $j$. The IF neuron emits a spike the first time its membrane potential reaches the threshold, $\theta$,
(4) $$S_j^l(t) = \begin{cases} 1 & \text{if } V_j^l(t) \geq \theta \text{ and } S_j^l(<t) \neq 1 \\ 0 & \text{otherwise} \end{cases}$$
where $S_j^l(<t) \neq 1$ checks if the neuron has not fired at any previous time step.
As explained in the previous section, the input image is transformed into a spike train, $S^0(t)$, in which each input neuron emits a spike with a delay, in the range $[0, t_{max}]$, negatively proportional to the corresponding pixel value. These spikes are propagated toward the first layer of the network, where each neuron receives incoming spikes and updates its membrane potential until it reaches its threshold and sends a spike to the neurons in the next layer. For each input image, the simulation starts by resetting all the membrane voltages to zero and continues for $t_{max}$ time steps. Note that during a simulation, each neuron at any layer is allowed to fire at most once. In the training phase, we need to know the firing time of all neurons (see Eq. 15 and Eq. 9); hence, if a neuron was silent, we assume that it fires a fake spike at the last time step, $t_{max}$. During the test phase, neurons can be silent or fire at most once. Finally, given the time-to-first-spike coding deployed in our network, the output neuron that fires earlier than the others determines the category of the input stimulus.
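The forward path of one layer can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: a spike arriving at time step `t` is assumed to contribute to the potential at that same step, ties between output neurons are resolved in favor of the lowest index, and the names `forward_layer` and `decide` are hypothetical.

```python
def forward_layer(in_times, weights, theta, t_max):
    """Firing time of every neuron in one non-leaky IF layer.

    in_times[i]   -- firing time of presynaptic neuron i (each fires once)
    weights[j][i] -- synaptic weight from presynaptic neuron i to neuron j
    Silent neurons are given a fake spike at t_max, as in the training phase.
    """
    event_times = sorted(set(in_times))
    out_times = []
    for w_j in weights:
        potential = 0.0
        fired_at = t_max  # fake spike unless the threshold is reached
        for t in event_times:
            # Without leak the potential only changes when spikes arrive,
            # so it is enough to visit the presynaptic spike times in order.
            potential += sum(w for w, t_i in zip(w_j, in_times) if t_i == t)
            if potential >= theta:
                fired_at = t  # first threshold crossing: the neuron's only spike
                break
        out_times.append(fired_at)
    return out_times


def decide(out_times):
    """Index of the output neuron that fires first (ties: lowest index)."""
    return out_times.index(min(out_times))


in_times = [0, 2, 5]
weights = [[60.0, 50.0, 10.0], [10.0, 20.0, 80.0], [10.0, 10.0, 10.0]]
out_times = forward_layer(in_times, weights, theta=100.0, t_max=10)
print(out_times)          # [2, 5, 10]
print(decide(out_times))  # 0
```

The third neuron never reaches the threshold and therefore receives the fake spike at `t_max`. Stacking calls to `forward_layer` gives a multilayer forward path.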
2.3 IF approximating ReLU
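In traditional ANNs with the Rectified Linear Unit (ReLU) [43] activation function, the output of a neuron in layer $l$ with index $j$ is computed as
(5) $$y_j^l = \max\left(0, z_j^l\right), \qquad z_j^l = \sum_i w_{ji}^l\, y_i^{l-1}$$
where $y_i^{l-1}$ and $w_{ji}^l$ are the input and connection weight, respectively. Thus, the ReLU neuron with a larger $z_j^l$ has a larger output value, $y_j^l$. Generally, the main portion of this integration value is due to the large inputs with large connection weights. In our time-to-first-spike coding, larger values correspond to earlier spikes, and hence, if an IF neuron receives these early spikes through strong synaptic weights, it will also fire earlier. Note that, as the network decision is based on the first spike in the output layer, earlier spikes carry more information. In this way, the time-to-first-spike coding is preserved in the hidden and output layers. Therefore, for the same inputs and synaptic weights, we can assume an equivalence relation between the output of the ReLU neuron, $y_j^l$, and the firing time of the corresponding IF neuron, $t_j^l$,
(6) $$y_j^l \sim t_{max} - t_j^l$$
and we know that
(7) $$\frac{\partial y_j^l}{\partial w_{ji}^l} = \frac{\partial y_j^l}{\partial z_j^l}\,\frac{\partial z_j^l}{\partial w_{ji}^l} = \begin{cases} y_i^{l-1} & \text{if } y_j^l > 0 \\ 0 & \text{otherwise} \end{cases}$$
where $\partial y_j^l / \partial z_j^l = 1$ if $y_j^l > 0$.
Given that in the IF neuron $t_j^l$ is not a function of $V_j^l$, we cannot compute $\partial t_j^l / \partial V_j^l$. Therefore, according to Eq. (6), we assume that $\partial t_j^l / \partial V_j^l = -1$ if $t_j^l < t_{max}$ (see Eq. (7)). Note that according to Eq. (3), we have $\partial V_j^l / \partial w_{ji}^l = \sum_{\tau=0}^{t_j^l} S_i^{l-1}(\tau)$. Thus, we have
(8) $$\frac{\partial t_j^l}{\partial w_{ji}^l} = \frac{\partial t_j^l}{\partial V_j^l}\,\frac{\partial V_j^l}{\partial w_{ji}^l} = \begin{cases} -\sum_{\tau=0}^{t_j^l} S_i^{l-1}(\tau) & \text{if } t_j^l < t_{max} \\ 0 & \text{otherwise} \end{cases}$$
where $\partial t_j^l / \partial V_j^l = -1$ if $t_j^l < t_{max}$.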
2.4 Backward path
We assume that in a categorization task with $C$ categories, each output neuron is assigned to a different category. After completing the forward path over the input pattern, each output neuron may fire at a different time point. As mentioned before, the category of an input image is predicted as the category assigned to the winner output neuron (the output neuron that fires earlier than the others).
Hence, to be able to train the network, we define a temporal error function as
(9) $$e_j = \frac{T_j^o - t_j^o}{t_{max}}$$
where $T_j^o$ and $t_j^o$ are the target and actual firing times of the $j$-th output neuron, respectively. The target firing times should be defined in a way that the correct neuron fires earlier than the others. We use a relative target firing calculation that is fully explained in Section 2.5. Here, we assume that $T_j^o$ is known.
During the learning phase, we use the stochastic gradient descent (SGD) [38] and backpropagation algorithms to minimize the “squared error” loss function. For each training sample, the loss is defined as
(10) $$L = \frac{1}{2} \sum_{j=1}^{C} e_j^2$$
and, hence, we need to compute its gradient with respect to each synaptic weight. To update $w_{ji}^l$, the synaptic weight between the $i$-th neuron of layer $l-1$ and the $j$-th neuron of layer $l$, we have
(11) $$w_{ji}^l \leftarrow w_{ji}^l - \eta\, \frac{\partial L}{\partial w_{ji}^l}$$
where $\eta$ is the learning rate parameter.
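Let’s define
(12) $$\delta_j^l = \frac{\partial L}{\partial t_j^l}$$
therefore, by considering Eq. (8) and Eq. (12), we have
(13) $$\frac{\partial L}{\partial w_{ji}^l} = \frac{\partial L}{\partial t_j^l}\,\frac{\partial t_j^l}{\partial w_{ji}^l} = \begin{cases} -\delta_j^l \sum_{\tau=0}^{t_j^l} S_i^{l-1}(\tau) & \text{if } t_j^l < t_{max} \\ 0 & \text{otherwise} \end{cases}$$
where for the output layer (i.e., $l = o$) we have
(14) $$\delta_j^o = \frac{\partial L}{\partial e_j}\,\frac{\partial e_j}{\partial t_j^o} = -\frac{e_j}{t_{max}}$$
and for the hidden layers (i.e., $l \neq o$), according to the backpropagation algorithm, we have
(15) $$\delta_j^l = \sum_k \frac{\partial L}{\partial t_k^{l+1}}\,\frac{\partial t_k^{l+1}}{\partial V_k^{l+1}}\,\frac{\partial V_k^{l+1}}{\partial t_j^l} = \sum_k \delta_k^{l+1}\, w_{kj}^{l+1}\, \left[t_j^l \leq t_k^{l+1}\right]$$
where $k$ iterates over neurons in layer $l+1$ and the bracket equals 1 when the condition holds and 0 otherwise. Note that regarding Eq. (12) we have $\partial L / \partial t_k^{l+1} = \delta_k^{l+1}$, and as explained in Section 2.3 we approximate $\partial t_k^{l+1} / \partial V_k^{l+1} = -1$. To compute $\partial V_k^{l+1} / \partial t_j^l$, we should note that reducing $t_j^l$ will increase $V_k^{l+1}$ by $w_{kj}^{l+1}$ earlier in time; hence we approximate $\partial V_k^{l+1} / \partial t_j^l = -w_{kj}^{l+1}$ if and only if $t_j^l \leq t_k^{l+1}$.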
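The backward path can be illustrated with a small Python sketch. It is written under explicit assumptions that should be checked against the reference implementation: the error is taken as $e_j = (T_j - t_j)/t_{max}$, the loss as half the sum of squared errors, $\partial t / \partial V$ is approximated by $-1$, and each presynaptic neuron fires exactly once, so the sum over its spike train reduces to an indicator. Gradient normalization and regularization are omitted, and all function names are hypothetical.

```python
def output_deltas(targets, times, t_max):
    """delta_j = dL/dt_j for L = 0.5 * sum(e_j**2), e_j = (T_j - t_j) / t_max."""
    return [-((T - t) / t_max) / t_max for T, t in zip(targets, times)]


def hidden_deltas(delta_next, w_next, t_hidden, t_next):
    """delta_j = sum_k delta_k * w_next[k][j], over k with t_j <= t_k."""
    return [
        sum(d_k * w_k[j]
            for d_k, w_k, t_k in zip(delta_next, w_next, t_next)
            if t_j <= t_k)
        for j, t_j in enumerate(t_hidden)
    ]


def update_weights(weights, delta, t_post, t_pre, eta, t_max):
    """In-place SGD step: w_ji -= eta * dL/dw_ji.

    dL/dw_ji = -delta_j if presynaptic neuron i fired no later than neuron j,
    and 0 if neuron j stayed silent (fake spike at t_max).
    """
    for w_j, d_j, t_j in zip(weights, delta, t_post):
        if t_j >= t_max:
            continue  # no gradient through a fake spike
        for i, t_i in enumerate(t_pre):
            if t_i <= t_j:
                w_j[i] -= eta * (-d_j)


# One output neuron fired at t=5 although its target is T=3 (too late).
delta_out = output_deltas(targets=[3], times=[5], t_max=10)
w_out = [[1.0, 1.0]]
update_weights(w_out, delta_out, t_post=[5], t_pre=[2, 7], eta=1.0, t_max=10)
print(w_out)  # the weight of the input that fired at t=2 grows; the other is unchanged
```

In this example the neuron fires later than its target, so the weight of the input that contributed to its spike is increased, which pulls the firing time earlier. The input that fired after the neuron's own spike receives no update.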
To avoid the exploding and vanishing gradient problems during backpropagation, we use normalized gradients. Specifically, at any layer $l$, we normalize the backpropagated gradients before updating the weights,
(16) 
To avoid overfitting, we added an $L_2$-norm regularization term (over all the synaptic weights in all the layers) to the “squared error” loss function in Eq. (10). The parameter $\lambda$ is the regularization parameter accounting for the degree of weight penalization.
2.5 Relative target firing time
As the proposed network works in the temporal domain, for each input image, we need to define the target firing time of the output neurons regarding its category label.
One possible scenario is to define a fixed and predefined vector of target firing times for each category, in a way that the correct neuron has a shorter target firing time than the others. For instance, if the input image belongs to the $i$-th category, then one can define $T_i^o = \tau$ and $T_j^o = t_{max}$ for $j \neq i$, where $\tau$ is the desired firing time for the winner neuron. In this way, the correct output neuron is encouraged to fire early at time $\tau$, while the others are forced to withhold firing until the end of the simulation.
Such strict approaches have several drawbacks. For instance, let’s assume an input image belonging to the $i$-th category with $t_i^o < \tau$. The correct neuron then has a nonzero error (see Eq. 9), and the backward path will update the weights to make this neuron fire later, which means the network should forget what has helped the correct neuron to fire quickly. This is not desirable, as we want the network to respond as quickly as possible.
The other scenario is to use a dynamic method to determine the target firing times for each input image independently. Here, we propose a relative method that takes the actual firing times into account. Let’s assume an input image of the $i$-th category is fed to the network and the firing times of the output neurons are obtained. First, we compute the minimum output firing time as $\tau = \min\{t_j^o \mid 1 \leq j \leq C\}$ and then we set the target firing time of the $j$-th output neuron as
(17) $$T_j^o = \begin{cases} \tau & \text{if } j = i \\ \tau + \gamma & \text{if } j \neq i \text{ and } t_j^o < \tau + \gamma \\ t_j^o & \text{if } j \neq i \text{ and } t_j^o \geq \tau + \gamma \end{cases}$$
where $\gamma$ is a positive constant term penalizing output neurons with firing times close to $\tau$. Other neurons that have fired well after $\tau$ are not penalized, and the correct output neuron is encouraged to fire earlier than the others at the minimum firing time, $\tau$.
In a special case where all output neurons are silent during the simulation and their firing time is manually set to $t_{max}$, we compute the target firing times as
(18) 
to encourage the correct output neuron to fire during the simulation.
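The relative target calculation for the general case can be sketched as below. This is a hedged illustration: the function name `relative_targets` is hypothetical, and the special case in which every output neuron is silent is deliberately not handled here.

```python
def relative_targets(out_times, correct, gamma):
    """Relative target firing times for one input image.

    out_times -- actual firing times of the output neurons
    correct   -- index of the output neuron assigned to the true category
    gamma     -- positive margin penalizing neurons that fire close to tau
    """
    tau = min(out_times)  # earliest output spike
    targets = []
    for j, t_j in enumerate(out_times):
        if j == correct:
            targets.append(tau)          # correct neuron should fire first
        elif t_j < tau + gamma:
            targets.append(tau + gamma)  # too close to tau: push later
        else:
            targets.append(t_j)          # already late enough: zero error
    return targets


print(relative_targets([12, 10, 11, 30], correct=0, gamma=3))  # [10, 13, 13, 30]
```

In the example, the correct neuron (index 0) is asked to fire at the current minimum, the two close competitors are pushed to `tau + gamma`, and the neuron that already fires late keeps its own firing time as target and so contributes no error.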
2.6 Learning procedure
As mentioned before, the proposed network employs a temporal version of SGD and backpropagation to train the network. During a training epoch, images are converted into input spike trains by the time-to-first-spike coding (see Section 2.1) and fed to the network one by one. Through the forward path, each IF neuron at any layer receives incoming spikes and emits a spike when it reaches its threshold (see Section 2.2). Then, after computing the relative target output firing times (encouraging the correct output neuron to fire earlier, see Section 2.5), we update the synaptic weights in all the layers using temporal error backpropagation (see Section 2.4). Note that we force neurons to fire a fake spike at the last time step if they could not reach the threshold during the simulation (this is necessary for the learning rule). After completing the forward and backward processes on the current input image, the membrane potentials of all the IF neurons are reset to zero and the network is ready to process the next input image. Notably, each neuron is allowed to fire only once during the processing of each input image.
As stated before, except for the fake spikes, IF neurons fire if and only if they reach their threshold. Let us consider an IF neuron that has decreased its weights (during the weight update process) in a way that it cannot reach its threshold for any of the training images. It is now a dead neuron and only emits fake spikes. Hence, if a neuron dies and does not fire real spikes during a training epoch, we reuse it by resetting its synaptic weights to a new set of random values drawn from a uniform distribution in the same range as the initial weights. Although this happens rarely, it helps the network to use all its learning capacity.
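The dead-neuron reset can be sketched as follows. The bookkeeping flag `fired_real_spike` and the function name are hypothetical; the bounds `low` and `high` stand for whatever initial-weight range the layer uses.

```python
import random


def revive_dead_neurons(weights, fired_real_spike, low, high, rng=random):
    """Re-draw the incoming weights of every neuron that never fired a real spike.

    weights[j]          -- incoming weights of neuron j (modified in place)
    fired_real_spike[j] -- True if neuron j emitted at least one real spike
                           during the last training epoch
    low, high           -- bounds of the uniform initial-weight distribution
    Returns the indices of the neurons that were re-initialized.
    """
    revived = []
    for j, alive in enumerate(fired_real_spike):
        if not alive:
            weights[j] = [rng.uniform(low, high) for _ in weights[j]]
            revived.append(j)
    return revived


rng = random.Random(0)  # seeded only to make the example repeatable
weights = [[-3.0, -2.0, -1.0], [0.4, 0.1, 0.7]]
print(revive_dead_neurons(weights, [False, True], low=0.0, high=1.0, rng=rng))  # [0]
```

Only the first neuron, which never produced a real spike, gets fresh weights; the second neuron keeps what it has learned.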
3 Results
We first use the Caltech 101 face/motorbike dataset to better demonstrate the learning process in S4NN and its capacity to work on large-scale and natural images. Afterward, we evaluate S4NN on the MNIST dataset, which is one of the most widely used benchmarks in the area of spiking neural networks [7], to demonstrate its capability to handle large and multi-class problems. The parameter settings of the S4NN models used for the Caltech face/motorbike and MNIST datasets are provided in Table 1.
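Table 1: Parameter settings of the S4NN models used for the Caltech face/motorbike and MNIST datasets.
Dataset | Layer size (input) | Layer size (hidden) | Layer size (output) | Initial weights (hidden) | Initial weights (output) | $t_{max}$ | $\theta$ | $\eta$ | $\gamma$ | $\lambda$
Caltech face/motorbike | $160 \times 250$ | 4 | 2 |  |  | 256 | 100 | 0.1 | 3 |
MNIST | $28 \times 28$ | 400 | 10 |  |  | 256 | 100 | 0.2 | 3 |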
3.1 Caltech face/motorbike dataset
We evaluated S4NN on the Caltech 101 face/motorbike dataset available at http://www.vision.caltech.edu. Some sample images are provided in Figure 2. We trained the network on 200 randomly selected images per category. Also, we selected 50 random images from each category as the validation set. The remaining images were used in the test phase. We converted all images to grayscale and rescaled them to $160 \times 250$ pixels.
In the first experiment, we use a fully connected architecture with a hidden layer of four IF neurons. The input layer has the same size as the input images (i.e., $160 \times 250$) and the firing time of each input neuron is calculated by the time-to-first-spike coding explained in Section 2.1. The output layer is comprised of two output IF neurons (the face and the motorbike neurons) corresponding to the image categories. We set the maximum simulation time as $t_{max} = 256$ and initialize the input-hidden and hidden-output synaptic weights with random values drawn from uniform distributions in range and , respectively. We also set the learning rate as $\eta = 0.1$, the penalty term in the target firing time calculation as $\gamma = 3$, and the regularization parameter as . The threshold of all neurons in all layers, $\theta$, is set to 100.
Table 2: Recognition accuracy of SNNs on the Caltech face/motorbike dataset.
Model | Learning method | Classifier | Accuracy (%)
Masquelier et al. (2007) [28] | Unsupervised STDP | RBF | 99.2
Kheradpisheh et al. (2018) [30] | Unsupervised STDP | SVM | 99.1
Mozafari et al. (2018) [31] | Reward-modulated STDP | Spike-based | 98.2
S4NN (This paper) | Backpropagation | Spike-based | 99.2
Figure 3 shows the trajectory of the mean sum-of-squared-error (MSSE) for the training and validation samples through the training epochs. The sudden jumps in the early part of the MSSE curves are mainly due to the enormous weight changes in the first training epochs, which may keep any of the output neurons silent (emitting fake spikes only) for a while; this is resolved during the following epochs. Finally, after some epochs, the network overcomes this challenge and decreases the MSSE below 0.1.
The proposed S4NN reached 99.75% ± 0.1% recognition accuracy (i.e., the percentage of correctly classified samples) on training samples and 99.2% ± 0.2% recognition accuracy on testing samples, which matches the best previously reported SNN result on this dataset (see Table 2). In Masquelier et al. (2007) [28], a two-layer convolutional SNN trained by unsupervised STDP followed by a supervised potential-based radial basis function (RBF) classifier reached 99.2% accuracy on this dataset. This network uses four Gabor filters and four scales in the first layer and extracts ten different filters for the second layer. Also, it does not make decisions by the spike times; rather, it uses the neurons’ membrane potentials to do the classification. In Kheradpisheh et al. (2018) [30], an STDP-based SNN with three convolutional layers (respectively consisting of 4, 20, and 10 filters) and an SVM classifier reached 99.1% accuracy on this dataset. This model also used the membrane potentials of neurons in the last layer to do the classification. To do a spike-based classification, the authors in Mozafari et al. (2018) [31] proposed a two-layer convolutional network with four Gabor filters in the first layer and 20 filters learned by reward-modulated STDP in the second layer. Each of the 20 filters was assigned to a specific category and a decision was made by the first neuron to fire. It reached 98.2% accuracy on the Caltech face/motorbike dataset. The important feature of this network was the spike-time-based decision-making achieved through reinforcement learning. The proposed S4NN also makes decisions by the spike times and reached 99.2% accuracy using only four hidden and two output neurons.
As explained in Section 2.2, each output neuron is assigned to a category and the network decision is made based on the first output neuron to fire. During the learning phase, guided by the relative target firing time (see Section 2.5), the network adjusts its weights to make the correct output neuron fire first (see Section 2.4). Figure 4 provides the firing times of both the face and motorbike output neurons (over the training and validation images) at the beginning and end of the learning phase. As seen in Figure 4A, at the beginning of the learning, the distributions of the firing times of both output neurons (regardless of the image category) are interleaved, which leads to a poor classification accuracy around the chance level. But as the learning phase proceeds and the network learns to solve the task, the correct output neuron tends to fire earlier.
As shown in Figure 4B, at the end of the learning phase, for each image category, its corresponding output neuron fires at the early time steps while the other neuron fires long after. Note that, during the training phase, we force neurons to emit a fake spike at the last time step if they have not fired during the simulation. In the test phase, however, we do not need to continue the simulation after the emission of the first spike in the output layer. Figure 5 shows the distributions of the firing times of the winner neurons. The mean firing time of the winner neuron is 27.4 (shown by the red line), and in 78% of the images the winner neuron has fired within the first 40 time steps. It means that the network makes its decision very quickly (compared to the maximum possible simulation time, $t_{max} = 256$) and accurately (with only a 0.8% error rate).
As the employed network has only one hidden layer of fully connected neurons, we can simply reconstruct the pattern learned by each hidden neuron by plotting its synaptic weights. Figure 6 depicts the synaptic weights of the four hidden neurons at the end of the learning phase. As seen, neurons #2 to #4 became selective to different shapes of motorbikes, covering the shape variety of motorbikes. Neuron #1 has learned a combination of faces appearing at different locations and consequently responds only to face images. Because of the competition between the output neurons to fire first, hidden and output neurons should learn and rely on the early spikes received from the input layer (not all of them). This is the reason why the learned features in the hidden layer are not easy to recognize visually. The distributions of the synaptic weights of the four hidden neurons are plotted in Figure 7. As seen, the initial uniform distribution of the weights is transformed into a normal distribution with zero mean. Here, positive weights encourage neurons to fire for their learned patterns and negative weights prevent them from firing for other patterns. Negative weights help the network to decrease the chance of unwanted spikes. For instance, a negative synaptic weight from a motorbike-selective hidden neuron to the face output neuron significantly decreases the chance of an unwanted spike by the face neuron.
Furthermore, we evaluated the robustness of the trained S4NN to jitter noise. To this end, during the test phase, we add random integers drawn from a uniform distribution in the range $[-J, J]$ to the pixels of the input images. We changed the jitter parameter, $J$, from 0 to 240 with a step size of 20. Figure 8 shows the recognition accuracy of the S4NN trained on the face/motorbike dataset over the test samples contaminated by different levels of jitter. Interestingly, even for , the S4NN accuracy drops by at most 5%. This shows that S4NN is robust to even intense noise levels. Indeed, neurons in the hidden layer have strong (positive or negative) synaptic weights only to those input neurons that contribute to the face/motorbike categorization task (see Figure 6), while the large majority of the remaining inputs have very small synaptic weights (see Figure 7) and do not contribute much to the neural processing. Hence, because the jitter noise just changes the order of spikes, it cannot much affect the behavior of IF neurons. Note that IF neurons are perfect integrators without leak and are less sensitive to the order of inputs than leaky neurons.
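The jitter test can be sketched as below. This is only an illustration: it assumes the noise range is symmetric around zero, assumes 8-bit pixels, and clips noisy values back into the valid intensity range, which is a choice of this sketch since the text does not say how out-of-range values are handled. The name `add_jitter` is hypothetical.

```python
import random


def add_jitter(pixels, J, i_max=255, rng=random):
    """Add a random integer drawn uniformly from [-J, J] to every pixel.

    Clipping to [0, i_max] is an assumption of this sketch.
    """
    return [min(i_max, max(0, p + rng.randint(-J, J))) for p in pixels]


rng = random.Random(0)  # seeded only to make the example repeatable
clean = [0, 64, 128, 255]
noisy = add_jitter(clean, J=20, rng=rng)
print(all(abs(n - c) <= 20 for n, c in zip(noisy, clean)))  # True
```

Because pixel intensity is mapped to spike latency, perturbing intensities in this way shifts input spike times and can reorder nearby spikes, which is the effect the robustness experiment probes.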
To assess the capacity of the proposed temporal backpropagation algorithm to be used in deeper architectures, we did another experiment on the Caltech face/motorbike dataset with a three-layer network. The deep network is comprised of two hidden layers, each of which consists of four IF neurons, followed by an output layer with two IF neurons. We initialized the input-hidden1, hidden1-hidden2, and hidden2-output weights with random values drawn from uniform distributions in range , , and , respectively. Other parameters are the same as in the aforementioned network with one hidden layer. After 25 training epochs, the network reached 99.1% ± 0.2% accuracy on testing images with a mean firing time of 32.1 for the winner neuron. Although the accuracy of the shallower network is 0.1% higher than that of the deeper network on average, this difference is not statistically significant (paired t-test on the accuracies of ten different runs for each network; $p$-value ).
3.2 MNIST Dataset
Table 3: Categorization accuracies of S4NN and other recent SNNs with spike-time-based supervised learning rules on the MNIST dataset.
Model | Coding | Neuron model | Learning method | Hidden neurons | Acc. (%)
Mostafa (2017) [24] | Temporal | IF (exponential synaptic current) | Temporal backpropagation | 800 | 97.2
Tavanaei et al. (2019) [44] | Rate | IF (instantaneous synaptic current) | STDP-based backpropagation | 1000 | 96.6
Comsa et al. (2019) [25] | Temporal | SRM (exponential synaptic current) | Temporal backpropagation | 340 | 97.4
ANN | - | ReLU | Backpropagation with Adam | 400 | 98.1
S4NN (This paper) | Temporal | IF (instantaneous synaptic current) | Temporal backpropagation | 400 | 97.4
MNIST [41] is a benchmark dataset that has been widely used in the SNN literature [7]. We also evaluated the proposed S4NN on the MNIST dataset, which contains 60,000 training and 10,000 test handwritten single-digit images. Each image is of size $28 \times 28$ pixels and contains one of the digits 0–9. To this end, we used an S4NN with one hidden and one output layer containing 400 and 10 IF neurons, respectively. The input layer is of the same size as the input images, where the firing time of each input neuron is determined by the time-to-first-spike coding explained in Section 2.1 with the maximum simulation time of $t_{max} = 256$. The input-hidden and hidden-output layers’ synaptic weights are randomly drawn from uniform distributions in ranges and , respectively. The threshold for all the neurons in all the layers was set to $\theta = 100$. We set the learning rate as $\eta = 0.2$, the penalty term in the target firing time calculation as $\gamma = 3$, and the regularization parameter as .
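Table 4: Mean firing time of the correct output neuron and mean number of required spikes for each MNIST digit category.
Digit | ’0’ | ’1’ | ’2’ | ’3’ | ’4’ | ’5’ | ’6’ | ’7’ | ’8’ | ’9’
Mean firing time step | 97.2 ± 40.0 | 44.1 ± 24.4 | 75.3 ± 33.9 | 98.1 ± 40.3 | 118.5 ± 34.7 | 81.2 ± 38.4 | 90.9 ± 36.7 | 100.1 ± 36.2 | 115.6 ± 36.9 | 75.6 ± 34.1
Mean required spikes | 221.0 ± 42.8 | 172.6 ± 43.2 | 226.4 ± 42.7 | 220.5 ± 41.5 | 233.2 ± 40.5 | 220.7 ± 43.3 | 224.0 ± 42.7 | 224.6 ± 43.0 | 233.6 ± 40.6 | 213.4 ± 43.6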
Table 3 provides the categorization accuracies of the proposed S4NN (97.4 ± 0.2%) and other recent SNNs with spike-time-based supervised learning rules on the MNIST dataset. In Mostafa (2017) [24], the use of 800 IF neurons with alpha functions complicates the neural processing and the learning procedure of the network. In Tavanaei et al. (2019) [44], the network computational cost is quite large due to the use of rate coding and 1000 hidden neurons. In Comsa et al. (2019) [25], the use of a complicated SRM neuron model with exponential synaptic current makes an event-based implementation difficult. Comsa et al. (2019) implemented their model in two versions. Their fast model, similar to ours, decides by the first spike at the output layer and reached 97.4% accuracy on MNIST, while the slow version needs to wait for all the hidden neurons to fire before making its decision and reached 97.9% accuracy on MNIST. The advantages of S4NN are the use of a simple neuron model (IF with an instantaneous synaptic current), temporal coding with at most one spike per neuron, and a simple supervised temporal learning rule. Also, we used only 400 neurons in the hidden layer, which makes it lighter than other networks.
We have also implemented a three-layer ANN (input-hidden-output) with 400 hidden units. We used the ReLU activation function for both hidden and output layers and employed mean squared error (MSE) as the loss function. We trained the network with the Adam optimizer and reached 98.1% accuracy on MNIST. Although the ANN outperforms all the SNN models in Table 3, the advantage of SNNs is their energy efficiency and hardware friendliness.
Figure 9 shows the mean firing time of each output neuron on images of different digit categories. As seen, for each digit category, there is a huge gap between the mean firing time of the correct output neuron and the others. Digits ’1’ and ’4’, with firing times of 44.1 and 118.5, have the minimum and maximum mean firing times, respectively. Presumably, recognition of digit ’1’ relies on far fewer spikes than other digits and therefore has a much faster response, while digit ’4’ (or digit ’8’ with a mean firing time of 115.6) needs many more input spikes to be correctly distinguished from other (and similar) digits. Interestingly, on average, the network needs 172.69 spikes to recognize digit ’1’ and 233.22 spikes for digit ’4’. Table 4 presents the mean firing time of the correct output neurons along with the mean required number of spikes. Note that the required spikes are obtained by counting the number of spikes in all three layers (input, hidden, and output) until the emission of the first spike at the output layer.
On average, the proposed S4NN makes its decisions with 97.4% precision in 89.7 time steps (35.17% of maximum simulation time) with only 218.3 spikes (18.22% of 784+400+10 possible spikes). Note that, on average, hidden neurons emit 132.2 ± 6.7 spikes until the network makes its decision. Therefore, the proposed network works in a fast, accurate, and sparse manner.
In a further experiment, we assessed the speed-accuracy trade-off in S4NN. To do so, we first trained S4NN (with the threshold 100 for all neurons) on MNIST and froze its weights; then we changed the threshold of all of its hidden and output neurons from 10 to 150 and evaluated it on the test set. Figure 10 shows the accuracy and the mean firing time of the winner output neurons (i.e., the response time) over different threshold values. As seen, by increasing the threshold, the accuracy increases, goes above 94% after threshold 70, and peaks at threshold 100. Also, it can be seen that the mean response time grows rapidly after threshold 70. The mean response time is around 15 time steps for threshold 70 and around 89 time steps for threshold 100. Hence, one can get a faster but slightly less accurate response from S4NN by lowering the threshold of a pre-trained network.
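Why the response time grows with the threshold can be seen on a single non-leaky IF neuron with frozen weights. The sketch below is a toy illustration with made-up spike times and weights, not a reproduction of the MNIST experiment; the function name `first_spike_time` is hypothetical.

```python
def first_spike_time(in_times, weights, theta, t_max):
    """First time step at which a non-leaky IF neuron reaches theta.

    Returns t_max if the threshold is never reached.
    """
    potential = 0.0
    for t in sorted(set(in_times)):
        potential += sum(w for w, t_i in zip(weights, in_times) if t_i == t)
        if potential >= theta:
            return t
    return t_max


in_times = [0, 3, 6, 9]            # made-up presynaptic spike times
weights = [40.0, 40.0, 40.0, 40.0]  # frozen weights
for theta in (10, 50, 100, 150):
    print(theta, first_spike_time(in_times, weights, theta, t_max=20))
# prints: 10 0, 50 3, 100 6, 150 9 (one pair per line)
```

For a single neuron with fixed inputs, a lower threshold can only make the first crossing earlier or leave it unchanged, since the potential that reaches the higher threshold has necessarily reached the lower one by then. A lower threshold also means the decision rests on fewer input spikes, which is consistent with the loss of accuracy reported above.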
4 Discussion
SNNs are becoming increasingly popular [45, 46, 47, 48, 49, 50] and are among the best tools to study computations in the brain [51, 52, 53, 54, 55, 56, 57, 58, 59]. In this paper, we proposed an SNN (called S4NN) comprised of multiple layers of non-leaky IF neurons with time-to-first-spike coding and temporal error backpropagation. Given the fast processing of objects in the visual cortex (often in the range of 100 to 150 ms) and the fact that there are at least 10 synapses from the photoreceptors in the retina to the object-responsive neurons in the inferotemporal (IT) cortex, each neuron has only about 10 to 15 ms to perform its computation, which is not enough for rate coding [60]. Also, it has been shown that the first wave of spikes in the IT cortex, around 100 ms after image presentation, carries enough information for object recognition [61], indicating the importance of early spikes. In addition, there is much other neurophysiological [62, 63] and computational [26, 27] evidence supporting the importance of first-spike coding.
In the temporal coding we employ, each input neuron emits a spike with a latency that decreases as the corresponding pixel value increases, and neurons in the subsequent layers are allowed to fire at most once. The proposed temporal error backpropagation pushes the correct output neuron to fire earlier than the others. It forces the network to make quick and accurate decisions with few spikes (high sparsity). Our experiments on the Caltech face/motorbike (99.2% accuracy) and MNIST (97.4% accuracy) datasets show that S4NN can accurately solve object recognition tasks with a simpler neuron model (i.e., non-leaky IF) than other recent supervised SNNs with temporal learning rules.
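The input coding summarized above can be sketched in a few lines of Python. The linear mapping, the helper name `intensity_to_latency`, the 8-bit intensity range, and `t_max = 255` are assumptions made for illustration. The only property taken from the text is that a larger pixel value gives an earlier spike and that each input neuron fires once.

```python
def intensity_to_latency(pixel, max_pixel=255, t_max=255):
    """Map a pixel intensity to a single spike time (hypothetical helper).

    Assumed linear coding: the brightest pixel fires at step 0 and the
    darkest pixel fires at step t_max.
    """
    if not 0 <= pixel <= max_pixel:
        raise ValueError("pixel value out of range")
    return (max_pixel - pixel) * t_max // max_pixel


image = [255, 128, 0]  # toy one-dimensional "image"
latencies = [intensity_to_latency(p) for p in image]
print(latencies)  # [0, 127, 255]
```

Under this assumed mapping the spike order of the input layer is simply the reverse of the intensity order, so the most salient pixels are the first to reach the hidden layer.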
Let’s assume an S4NN model with $L$ layers, where $N$ is the number of neurons in the largest layer of the network. In a clock-based implementation, for any layer, the membrane potentials of all neurons at any time step can be updated in $O(N^2)$. Therefore, the feedforward path of S4NN can be performed in $O(T L N^2)$, where $T$ is the time step of the first spike in the output layer. Note that the proposed temporal backpropagation forces the network to respond as accurately and as early as possible. Hence, the required number of time steps, $T$, would be much smaller than the maximum simulation time. Note that the actual computational time of S4NN could be shorter when the time step period is shorter.
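The clock-based feedforward pass discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released S4NN code: the function name `forward`, the nested-list weight layout, a single threshold shared by all neurons, same-step propagation between layers, and the tie-breaking rule (lowest index wins) are all choices made for the example. At each time step every connection of every layer is visited once, and the loop stops as soon as an output neuron fires.

```python
def forward(input_times, weights, threshold, t_max):
    """Clock-based pass through fully connected non-leaky IF layers.

    input_times: spike time of each input neuron.
    weights[k][j][i]: weight from neuron i of layer k to neuron j of layer k+1.
    Returns (winner index or None, decision time step, firing times per layer).
    """
    times = [list(input_times)]
    potentials = []
    for layer in weights:
        potentials.append([0.0] * len(layer))
        times.append([None] * len(layer))

    for t in range(t_max + 1):
        for k, layer in enumerate(weights):
            pre, post, v = times[k], times[k + 1], potentials[k]
            for j, row in enumerate(layer):
                if post[j] is not None:
                    continue  # at most one spike: a fired neuron is skipped
                v[j] += sum(w for w, tp in zip(row, pre) if tp == t)
                if v[j] >= threshold:
                    post[j] = t
        fired = [j for j, tj in enumerate(times[-1]) if tj is not None]
        if fired:
            return fired[0], t, times  # stop at the first output spike
    return None, t_max, times


# Toy network (assumed values): 3 inputs, 2 hidden neurons, 2 outputs.
weights = [
    [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]],  # input -> hidden
    [[2.0, 0.0], [0.0, 2.0]],            # hidden -> output
]
winner, t_decision, times = forward([0, 2, 5], weights, threshold=2.0, t_max=10)
print(winner, t_decision)  # 0 2
```

In this toy run the decision is taken at time step 2, before the third input neuron (spike time 5) has fired, which illustrates why the number of simulated time steps can stay well below the maximum simulation time.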
Hardware implementations are out of the scope of this paper. However, S4NN has some important features that might make it more (digital) hardware-friendly. First, computation is restricted to at most one spike per neuron, and in practice, a decision is made before most neurons have fired. Conversely, spike-rate-based SNNs require a longer time to accumulate enough output spikes to make a confident decision. Our approach is thus advantageous in terms of latency, but also in terms of energy, since on most neuromorphic chips energy consumption is mainly caused by spikes [10]. Second, our approach is memory efficient, as we can forget the state of a neuron as soon as it has fired, and reuse the corresponding memory for other neurons. Note that other approaches with at most one spike per neuron also share these advantages in latency, energy, and memory [24, 25, 34, 64]. Yet our neuron model is much simpler: there is no leak, and the synapses are instantaneous, which, as explained below, makes it more hardware-friendly. Here we have shown for the first time that backpropagation can be adapted to this simple neuron model, even if this requires some approximation (Eq. 6).
Although a leak can be efficiently implemented in analog hardware using the physics of transistors or capacitors [9], it is always costly in digital hardware. Two approaches have been proposed. In the first, the potential of all neurons is decreased periodically, for example, every millisecond (see e.g. [65]). Obviously, this approach is energy-hungry. In the second, the leak is handled in an event-based manner: leakage is taken into account when an input spike is received, based on the time elapsed since the last input spike (see e.g. [66, 67]). But this requires storing the last input spike time for each neuron, which increases the memory footprint. Finally, instantaneous synapses are by far the simplest synapses to handle: each input spike causes a single, instantaneous potential increment. Current-based or conductance-based synapses require a state variable, and each input spike causes the potential to be updated over several consecutive time steps.
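The difference in per-neuron state between an event-based leak and a non-leaky neuron with instantaneous synapses can be illustrated with the sketch below. The class names, the exponential form of the leak, and the time constant `tau` are assumptions chosen for the example; the cited hardware designs may implement the leak differently.

```python
import math


class EventDrivenLeakyIF:
    """Leaky IF neuron whose leak is applied only when a spike arrives.
    The exponential decay with time constant tau is an assumed leak model."""

    def __init__(self, tau, threshold):
        self.tau = tau
        self.threshold = threshold
        self.v = 0.0
        self.last_t = 0.0  # extra stored value needed by the event-based leak

    def receive(self, t, w):
        # Decay the potential for the time elapsed since the last input spike.
        self.v *= math.exp(-(t - self.last_t) / self.tau)
        self.last_t = t
        self.v += w
        return self.v >= self.threshold


class NonLeakyIF:
    """Non-leaky IF neuron with instantaneous synapses: one state value."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.v = 0.0

    def receive(self, t, w):
        self.v += w  # a single increment per input spike, no timing state
        return self.v >= self.threshold


leaky = EventDrivenLeakyIF(tau=5.0, threshold=1.0)
plain = NonLeakyIF(threshold=1.0)
for t, w in [(0.0, 0.6), (10.0, 0.6)]:
    leaky_fired = leaky.receive(t, w)
    plain_fired = plain.receive(t, w)
print(leaky_fired, plain_fired)  # False True
```

The leaky neuron has to keep `last_t` in addition to its potential and evaluate an exponential on every input spike, whereas the non-leaky neuron only performs one addition and one comparison.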
Due to the non-differentiability of the thresholding activation function of spiking neurons at their firing times, applying gradient descent and backpropagation to SNNs has always been a big challenge. Different studies have proposed different techniques, including rate-based differentiable activation functions [13, 14, 15], smoothed spike generators [16], and surrogate gradients [8, 17, 18, 19, 20, 21]. None of these approaches deals with spike times. In a fourth approach, known as latency learning, neuronal activity is defined by the firing time (usually of the first spike) and, contrary to the three previous approaches, the derivative of the thresholding activation function is not needed. However, these methods need to define the firing time of a neuron as a function of its membrane potential or of the firing times of its presynaptic neurons, and to use the derivative of this function in the backpropagation process. For instance, in SpikeProp [23], the authors use a linear approximation that relies on the changes of the membrane potential around the firing time (hence, they cannot use the IF neuron model). Also, in Mostafa (2017) [24], by using exponentially decaying synapses, the author defines the firing time of a neuron directly as a function of the firing times of its presynaptic neurons. Here, by assuming a monotonically increasing linear relation between the firing time and the membrane potential, we could use IF neurons with instantaneous synapses in the proposed S4NN model.
SNNs with latency learning use single-spike-time coding, and hence there is a problem if neurons do not reach their threshold, because then the latency is not defined. There are different approaches to deal with this problem. In Mostafa (2017) [24], the author uses non-leaky neurons and makes sure that the sum of the weights is more than the threshold, while in Comsa (2019) [25], the authors use fake input “synchronization pulses” to push neurons over the threshold. In the proposed S4NN, we assume that if a neuron has not fired during the simulation, it will fire sometime after the simulation; thus, we force it to emit a fake spike at the last time step.
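The fake-spike convention described above amounts to a one-line post-processing step on the firing times. The sketch below uses `None` to mark a silent neuron; the helper names `resolve_silent` and `winner`, the value `t_max = 255`, and the tie-breaking rule are illustrative assumptions.

```python
def resolve_silent(firing_times, t_max):
    """Give every silent neuron (None) a fake spike at the last time step."""
    return [t_max if t is None else t for t in firing_times]


def winner(firing_times):
    """Index of the earliest-firing neuron (ties go to the lowest index)."""
    return min(range(len(firing_times)), key=lambda j: firing_times[j])


output_times = [None, 42, None, 17]
filled = resolve_silent(output_times, t_max=255)
print(filled)          # [255, 42, 255, 17]
print(winner(filled))  # 3
```

After this step every neuron has a well-defined latency, so latency-based errors can be computed for all of them, including those that never reached their threshold.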
Here, we only tested S4NN on image categorization tasks; future studies can test it on other data modalities. As shown on the Caltech face/motorbike dataset, the proposed learning rule is scalable and can be used in deeper S4NN architectures. It can also be used in convolutional spiking neural networks (CSNNs). Current CSNNs are mainly converted from traditional CNNs with rate [68, 69, 70, 71] or temporal coding [72]. Although these networks perform well in terms of accuracy, they might not be efficient in terms of computational cost or time. Recent efforts to develop CSNNs with spike-based backpropagation have led to impressive results on different datasets [73, 74]; however, they use costly neuron models and rate coding schemes. Hence, extending the proposed S4NN to convolutional architectures could provide large computational benefits. The main challenges in this direction are preventing vanishing/exploding gradients and learning under the weight-sharing constraint in convolutional layers. But contrary to rate-based CSNNs, the max-pooling operation can simply be done by propagating the first spike emerging inside the receptive field of each pooling neuron.
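The first-spike pooling mentioned above reduces to taking the minimum firing time inside each pooling window, because under first-spike coding the earliest spike corresponds to the strongest response. The following sketch assumes non-overlapping square windows and uses `None` for neurons that did not fire; the function name `first_spike_pool` and the toy firing-time map are hypothetical.

```python
def first_spike_pool(times, size):
    """Pool a 2D map of firing times with non-overlapping size x size windows.

    Each pooling neuron propagates the earliest spike in its receptive
    field; a window with no spike yields None.
    """
    rows, cols = len(times), len(times[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        out_row = []
        for c in range(0, cols - size + 1, size):
            window = [times[r + dr][c + dc]
                      for dr in range(size) for dc in range(size)]
            fired = [t for t in window if t is not None]
            out_row.append(min(fired) if fired else None)
        pooled.append(out_row)
    return pooled


firing_times = [
    [5, 9, None, None],
    [7, 3, None, 12],
    [None, None, 1, 2],
    [None, None, 4, 8],
]
print(first_spike_pool(firing_times, size=2))  # [[3, 12], [None, 1]]
```

In an event-driven setting the same result can be obtained without any comparison at all: the pooling neuron simply fires on the first spike it receives and ignores the rest.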
Moreover, although SNNs are more hardware-friendly than traditional ANNs, the backpropagation process in supervised SNNs is not easy to implement in hardware. Recent efforts to approximate backpropagation using spikes [75] could be applied to S4NN to make it more suitable for hardware implementation.
Acknowledgments
This research was partially supported by the French Agence Nationale de la Recherche (grant: Beating Roger Federer ANR-16-CE28-0017-01). The authors would like to thank Dr. A. Yousefzadeh for his valuable comments and discussions and Dr. J. P. Jaffrézou for proofreading the manuscript.
Footnotes
 Corresponding Author
Email addresses:
s_kheradpisheh@sbu.ac.ir (SRK),
timothee.masquelier@cnrs.fr (TM)
References
[1] R. VanRullen, R. Guyonneau and S. J. Thorpe, Spike times make sense, Trends in Neurosciences 28(1) (2005) 1–4.
[2] R. Brette, Philosophy of the spike: rate-based vs. spike-based theories of the brain, Frontiers in Systems Neuroscience 9 (2015) p. 151.
[3] A. Taherkhani, A. Belatreche, Y. Li, G. Cosma, L. P. Maguire and T. M. McGinnity, A review of learning in biologically plausible spiking neural networks, Neural Networks 122 (2020) 253–272.
[4] M. Pfeiffer and T. Pfeil, Deep learning with spiking neurons: opportunities and challenges, Frontiers in Neuroscience 12 (2018) p. 774.
[5] S. Ghosh-Dastidar and H. Adeli, Spiking neural networks, International Journal of Neural Systems 19(04) (2009) 295–308.
[6] B. Illing, W. Gerstner and J. Brea, Biologically plausible deep learning–but how far can we go with shallow networks?, Neural Networks (2019).
[7] A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier and A. Maida, Deep learning in spiking neural networks, Neural Networks 111 (2019) 47–63.
[8] E. O. Neftci, H. Mostafa and F. Zenke, Surrogate gradient learning in spiking neural networks, arXiv (2019) p. 1901.09948.
[9] K. Roy, A. Jaiswal and P. Panda, Towards spike-based machine intelligence with neuromorphic computing, Nature 575 (nov 2019) 607–617.
[10] M. Oster, R. Douglas and S.-C. Liu, Quantifying input and output spike statistics of a winner-take-all network in a vision system, 2007 IEEE International Symposium on Circuits and Systems, IEEE, 2007, pp. 853–856.
[11] R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gómez-Rodríguez, L. Camuñas-Mesa, R. Berner, M. Rivas-Pérez, T. Delbruck et al., CAVIAR: A 45k neuron, 5M synapse, 12G connects/s AER hardware sensory–processing–learning–actuating system for high-speed visual object recognition and tracking, IEEE Transactions on Neural Networks 20(9) (2009) 1417–1438.
[12] C. Posch, T. Serrano-Gotarredona, B. Linares-Barranco and T. Delbruck, Retinomorphic event-based vision sensors: bioinspired cameras with spiking output, Proceedings of the IEEE 102(10) (2014) 1470–1484.
[13] E. Hunsberger and C. Eliasmith, Spiking deep networks with LIF neurons, arXiv (2015) p. 1510.08829.
[14] J. H. Lee, T. Delbruck and M. Pfeiffer, Training deep spiking neural networks using backpropagation, Frontiers in Neuroscience 10 (2016) p. 508.
[15] E. O. Neftci, C. Augustine, S. Paul and G. Detorakis, Event-driven random back-propagation: Enabling neuromorphic deep learning machines, Frontiers in Neuroscience 11 (2017) p. 324.
[16] D. Huh and T. J. Sejnowski, Gradient descent for spiking neural networks, Advances in Neural Information Processing Systems, 2018, pp. 1433–1443.
[17] S. M. Bohte, Error-backpropagation in networks of fractionally predictive spiking neurons, International Conference on Artificial Neural Networks, Springer, 2011, pp. 60–68.
[18] S. K. Esser, P. A. Merolla, J. V. Arthur, A. S. Cassidy, R. Appuswamy, A. Andreopoulos, D. J. Berg, J. L. McKinstry, T. Melano, D. R. Barch, C. d. Nolfo, P. Datta, A. Amir, B. Taba, M. D. Flickner and D. S. Modha, Convolutional networks for fast energy-efficient neuromorphic computing, Proceedings of the National Academy of Sciences of USA 113(41) (2016) 11441–11446.
[19] S. B. Shrestha and G. Orchard, SLAYER: Spike layer error reassignment in time, Advances in Neural Information Processing Systems, 2018, pp. 1412–1421.
[20] F. Zenke and S. Ganguli, SuperSpike: Supervised learning in multilayer spiking neural networks, Neural Computation 30(6) (2018) 1514–1541.
[21] G. Bellec, D. Salaj, A. Subramoney, R. Legenstein and W. Maass, Long short-term memory and learning-to-learn in networks of spiking neurons, Advances in Neural Information Processing Systems, 2018, pp. 787–797.
[22] R. Zimmer, T. Pellegrini, S. Singh Fateh and T. Masquelier, Technical report: supervised training of convolutional spiking neural networks with PyTorch, arXiv (nov 2019).
[23] S. M. Bohte, H. La Poutré and J. N. Kok, Error-Backpropagation in Temporally Encoded Networks of Spiking Neurons, Neurocomputing 48 (2000) 17–37.
[24] H. Mostafa, Supervised learning based on temporal coding in spiking neural networks, IEEE Transactions on Neural Networks and Learning Systems 29(7) (2017) 3227–3235.
[25] I. M. Comsa, K. Potempa, L. Versari, T. Fischbacher, A. Gesmundo and J. Alakuijala, Temporal coding in spiking neural networks with alpha synaptic function, arXiv (2019) p. 1907.13223.
[26] S. J. Thorpe and J. Gautrais, Rank Order Coding, Computational Neuroscience: Trends in Research, ed. J. M. Bower (New York: Plenum Press, 1998), pp. 113–118.
[27] S. Thorpe, A. Delorme and R. VanRullen, Spike-based strategies for rapid processing, Neural Networks 14(6-7) (2001) 715–725.
[28] T. Masquelier and S. J. Thorpe, Unsupervised learning of visual features through spike timing dependent plasticity, PLoS Computational Biology 3(2) (2007) p. e31.
[29] S. R. Kheradpisheh, M. Ganjtabesh and T. Masquelier, Bio-inspired unsupervised learning of visual features leads to robust invariant object recognition, Neurocomputing 205 (sep 2016) 382–392.
[30] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe and T. Masquelier, STDP-based spiking deep convolutional neural networks for object recognition, Neural Networks 99 (2018) 56–67.
[31] M. Mozafari, S. R. Kheradpisheh, T. Masquelier, A. Nowzari-Dalini and M. Ganjtabesh, First-spike-based visual categorization using reward-modulated STDP, IEEE Transactions on Neural Networks and Learning Systems 29(12) (2018) 6178–6190.
[32] M. Mozafari, M. Ganjtabesh, A. Nowzari-Dalini, S. J. Thorpe and T. Masquelier, Bio-inspired digit recognition using reward-modulated spike-timing-dependent plasticity in deep convolutional networks, Pattern Recognition 94 (2019) 87–95.
[33] M. Mozafari, M. Ganjtabesh, A. Nowzari-Dalini and T. Masquelier, SpykeTorch: Efficient Simulation of Convolutional Spiking Neural Networks With at Most One Spike per Neuron, Frontiers in Neuroscience 13 (jul 2019) 1–12.
[34] J. Göltz, A. Baumbach, S. Billaudelle, O. Breitwieser, D. Dold, L. Kriener, A. F. Kungl, W. Senn, J. Schemmel, K. Meier and M. A. Petrovici, Fast and deep neuromorphic learning with time-to-first-spike coding, arXiv (dec 2019).
[35] R. Vaila, J. Chiasson and V. Saxena, Feature Extraction using Spiking Convolutional Neural Networks, Proceedings of the International Conference on Neuromorphic Systems - ICONS ’19, (ACM Press, New York, New York, USA, 2019), pp. 1–8.
[36] P. Falez, P. Tirilly, I. M. Bilasco, P. Devienne and P. Boulet, Multi-layered spiking neural network with target timestamp threshold adaptation and STDP, 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, 2019, pp. 1–8.
[37] A. N. Burkitt, A review of the integrate-and-fire neuron model: II. Inhomogeneous synaptic input and network properties, Biological Cybernetics 95(2) (2006) 97–112.
[38] I. Goodfellow, Y. Bengio and A. Courville, Deep learning (MIT Press, 2016), ch. 6.5, pp. 200–220.
[39] Y. LeCun, Y. Bengio and G. Hinton, Deep learning, Nature 521(7553) (2015) 436–444.
[40] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85–117.
[41] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11) (1998) 2278–2324.
[42] R. Vaila, J. Chiasson and V. Saxena, Deep convolutional spiking neural networks for image classification, arXiv preprint arXiv:1903.12272 (2019).
[43] A. Krizhevsky, I. Sutskever and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[44] A. Tavanaei and A. Maida, BP-STDP: Approximating backpropagation using spike timing dependent plasticity, Neurocomputing 330 (2019) 39–47.
[45] T. Wu, F.-D. Bîlbîe, A. Păun, L. Pan and F. Neri, Simplified and yet Turing universal spiking neural P systems with communication on request, International Journal of Neural Systems 28(08) (2018) p. 1850013.
[46] M. Bernert and B. Yvert, An attention-based spiking neural network for unsupervised spike-sorting, International Journal of Neural Systems (2018) 1850059–1850059.
[47] F. Galán-Prado, A. Morán, J. Font, M. Roca and J. L. Rosselló, Compact hardware synthesis of stochastic spiking neural networks, International Journal of Neural Systems (2019) 1950004–1950004.
[48] R. Hu, Q. Huang, H. Wang, J. He and S. Chang, Monitor-based spiking recurrent network for the representation of complex dynamic patterns, International Journal of Neural Systems (2019) 1950006–1950006.
[49] A. Geminiani, C. Casellato, A. Antonietti, E. D’Angelo and A. Pedrocchi, A multiple-plasticity spiking neural network embedded in a closed-loop control system to model cerebellar pathologies, International Journal of Neural Systems 28(05) (2018) p. 1750017.
[50] X. Zhang, G. Foderaro, C. Henriquez and S. Ferrari, A scalable weight-free learning algorithm for regulatory control of cell activity in spiking neuronal networks, International Journal of Neural Systems 28(02) (2018) p. 1750015.
[51] G. Antunes, S. F. Faria da Silva and F. M. Simoes de Souza, Mirror neurons modeled through spike-timing-dependent plasticity are affected by channelopathies associated with autism spectrum disorder, International Journal of Neural Systems 28(05) (2018) p. 1750058.
[52] A. Antonietti, J. Monaco, E. D’Angelo, A. Pedrocchi and C. Casellato, Dynamic redistribution of plasticity in a cerebellar spiking neural network reproducing an associative learning task perturbed by TMS, International Journal of Neural Systems 28(09) (2018) p. 1850020.
[53] S. Ghosh-Dastidar and H. Adeli, A new supervised learning algorithm for multiple spiking neural networks with application in epilepsy and seizure detection, Neural Networks 22(10) (2009) 1419–1431.
[54] S. Ghosh-Dastidar and H. Adeli, Improved spiking neural networks for EEG classification and epilepsy and seizure detection, Integrated Computer-Aided Engineering 14(3) (2007) 187–212.
[55] H. Adeli and S. Ghosh-Dastidar, Automated EEG-based diagnosis of neurological disorders: Inventing the future of neurology (CRC Press, 2010).
[56] H. Peng, J. Yang, J. Wang, T. Wang, Z. Sun, X. Song, X. Luo and X. Huang, Spiking neural P systems with multiple channels, Neural Networks 95 (2017) 66–71.
[57] T. Wu, A. Păun, Z. Zhang and L. Pan, Spiking neural P systems with polarizations, IEEE Transactions on Neural Networks and Learning Systems 29(8) (2017) 3349–3360.
[58] L. Pan, G. Păun, G. Zhang and F. Neri, Spiking neural P systems with communication on request, International Journal of Neural Systems 27(08) (2017) p. 1750042.
[59] H. Peng, J. Wang, P. Shi, M. J. Pérez-Jiménez and A. Riscos-Núñez, An extended membrane system with active membranes to solve automatic fuzzy clustering problems, International Journal of Neural Systems 26(03) (2016) p. 1650004.
[60] S. J. Thorpe, Spike arrival times: A highly efficient coding scheme for neural networks, Parallel Processing in Neural Systems (1990) 91–94.
[61] C. P. Hung, G. Kreiman, T. Poggio and J. J. DiCarlo, Fast readout of object identity from macaque inferior temporal cortex, Science 310(5749) (2005) 863–866.
[62] F. Bengtsson, R. Brasselet, R. S. Johansson, A. Arleo and H. Jörntell, Integration of sensory quanta in cuneate nucleus neurons in vivo, PLoS ONE 8(2) (2013) p. e56630.
[63] R. Brasselet, R. S. Johansson and A. Arleo, Quantifying neurotransmission reliability through metrics-based information analysis, Neural Computation 23(4) (2011) 852–881.
[64] C. Stöckl and W. Maass, Recognizing Images with at most one Spike per Neuron, arXiv (dec 2019) 1–14.
[65] A. Yousefzadeh, T. Masquelier, T. Serrano-Gotarredona and B. Linares-Barranco, Hardware implementation of convolutional STDP for on-line visual feature learning, 2017 IEEE International Symposium on Circuits and Systems (ISCAS) (may 2017) 1–4.
[66] A. Yousefzadeh, T. Serrano-Gotarredona and B. Linares-Barranco, Fast Pipeline 128x128 pixel spiking convolution core for event-driven vision processing in FPGAs, 2015 International Conference on Event-based Control, Communication, and Signal Processing (EBCCSP), (IEEE, jun 2015), pp. 1–8.
[67] G. Orchard, C. Meyer, R. Etienne-Cummings, C. Posch, N. Thakor and R. Benosman, HFirst: A Temporal Approach to Object Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (2015).
[68] Y. Cao, Y. Chen and D. Khosla, Spiking deep convolutional neural networks for energy-efficient object recognition, International Journal of Computer Vision 113(1) (2015) 54–66.
[69] P. U. Diehl, G. Zarrella, A. Cassidy, B. U. Pedroni and E. Neftci, Conversion of artificial recurrent neural networks to spiking neural networks for low-power neuromorphic hardware, 2016 IEEE International Conference on Rebooting Computing (ICRC), IEEE, 2016, pp. 1–8.
[70] A. Sengupta, Y. Ye, R. Wang, C. Liu and K. Roy, Going deeper in spiking neural networks: VGG and residual architectures, Frontiers in Neuroscience 13 (2019) p. 95.
[71] B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer and S.-C. Liu, Conversion of continuous-valued deep networks to efficient event-driven networks for image classification, Frontiers in Neuroscience 11 (2017) p. 682.
[72] B. Rueckauer and S.-C. Liu, Conversion of analog to spiking neural networks using sparse temporal coding, 2018 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2018, pp. 1–5.
[73] Y. Wu, L. Deng, G. Li, J. Zhu, Y. Xie and L. Shi, Direct training for spiking neural networks: Faster, larger, better, Proceedings of the AAAI Conference on Artificial Intelligence, 33, 2019, pp. 1311–1318.
[74] C. Lee, S. S. Sarwar and K. Roy, Enabling spike-based backpropagation in state-of-the-art deep neural network architectures, arXiv (2019) p. 1903.06379.
[75] J. C. Thiele, O. Bichler and A. Dupret, SpikeGrad: An ANN-equivalent computation model for implementing backpropagation with spikes, arXiv (2019) p. 1906.00851.