Stochastic Computing for Hardware Implementation of Binarized Neural Networks
Abstract
Binarized Neural Networks, a recently proposed class of neural networks with minimal memory requirements and no reliance on multiplication, are an outstanding opportunity for the realization of compact and energy-efficient inference hardware. However, such neural networks are generally not entirely binarized: their first layer retains fixed-point inputs. In this work, we propose a stochastic computing version of Binarized Neural Networks, where the input is also binarized. Simulations on the FashionMNIST and CIFAR10 datasets show that such networks can approach the performance of conventional Binarized Neural Networks. We evidence that the training procedure should be adapted for use with stochastic computing. Finally, the ASIC implementation of our scheme is investigated, in a system that closely associates logic and memory, implemented by Spin Torque Magnetoresistive Random Access Memory. This analysis shows that the stochastic computing approach enables considerable area savings with regards to conventional Binarized Neural Networks on the FashionMNIST task. It also enables important savings in energy consumption, if we accept a reasonable reduction of FashionMNIST test accuracy. These results highlight the high potential of Binarized Neural Networks for hardware implementation, and show that adapting them to hardware constraints can provide important benefits.
Date of publication 2019. Digital Object Identifier TBA
TIFENN HIRTZLIN^{1} (Student Member, IEEE), BOGDAN PENKOVSKY^{1}, MARC BOCQUET^{2}, JACQUES-OLIVIER KLEIN^{1} (Member, IEEE), JEAN-MICHEL PORTAL^{2} and DAMIEN QUERLIOZ^{1} (Member, IEEE)
^{1}Centre de Nanosciences et de Nanotechnologies, Univ. Paris-Sud, CNRS, France
^{2}Institut Matériaux Microélectronique Nanosciences de Provence, Univ. Aix-Marseille et Toulon, CNRS, France
Corresponding authors: Tifenn Hirtzlin (email: tifenn.hirtzlin@c2n.upsaclay.fr), Damien Querlioz (email: damien.querlioz@c2n.upsaclay.fr)
This work was supported by the European Research Council Starting Grant NANOINFER (715872) and Agence Nationale de la Recherche grant NEURONIC (ANR-18-CE24-0009).
©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
INDEX TERMS Binarized Neural Network, Stochastic Computing, Embedded System, MRAM, In-Memory Computing
I Introduction
Recent advances in deep learning have transformed the field of machine learning, with numerous achievements in image and speech recognition, machine translation and other areas. However, a considerable challenge of deep neural networks remains their energy consumption, which limits their use within embedded systems [editorial_big_2018]. The hardware implementation of deep neural networks is a widely investigated approach to increase their energy efficiency. A particularly exciting opportunity is to rely on in-memory or near-memory computing implementations [yu2018neuro, ielmini2018memory, querlioz2015bioinspired, burr2017neuromorphic, giacomin2018robust], which are highly energy efficient as they avoid the von Neumann bottleneck entirely. This idea takes special meaning today with the emergence of novel memories such as Resistive and Magnetoresistive Random Access Memories (RRAMs and MRAMs). Such memories are fast and compact nonvolatile memories, which can be embedded at the core of CMOS processes, and therefore provide an ideal technology for realizing in-memory neural networks [yu2018neuro, ielmini2018memory, burr2017neuromorphic].
A considerable challenge of this approach is that modern neural networks require large amounts of memory [canziani2016analysis], which is not necessarily compatible with hardware in-memory computing approaches. Multiple routes have been explored to reduce the precision and memory requirements of neural networks. The quantization of the weights used for inference is the most natural route [hubara2017quantized]. Architectural optimization can result in considerable reductions in the number of parameters and arithmetic operations, with only a modest reduction in accuracy [sandler2018mobilenetv2]. Network pruning [reagen2016minerva] or network compression [chen2015compressing, han2015deep] techniques, sometimes combining different methods, can allow implementing hardware neural networks with reduced memory access and therefore higher energy efficiency.
Binarized Neural Networks (BNNs) have recently appeared as one of the most extreme forms of low-precision neural networks, as they go further than these approaches [courbariaux2016binarized, rastegari2016xnor]. In these simple deep neural networks, synaptic weights as well as neuron activations assume Boolean values. These models can nevertheless achieve state-of-the-art performance on image recognition, while being multiplierless and relying only on simple binary logic functions. First hardware implementations have already been investigated and have shown highly promising results [bocquet2018memory, nurvitadhi2016accelerating, yu2018neuro, giacomin2018robust].
However, BNNs are not entirely binarized: the first-layer input is usually coded as a fixed-point real number. This fact is not a significant issue for operating BNNs on graphics processing units (GPUs) [courbariaux2016binarized], as they feature extensive arithmetic units. Research aimed at implementing binarized neural networks on Field Programmable Gate Arrays (FPGAs) [zhao2017accelerating] has also not specifically investigated the question of the non-binarized first layer: these works usually use the Digital Signal Processing (DSP) blocks of the FPGA to process the associated operations. However, in an application-specific integrated circuit (ASIC) implementation, the non-binarization of the first layer implies that this layer needs a specific design, which consumes more energy and area than the design used for the purely binary layers.
For this reason, in this work, we introduce a stochastic computing implementation of BNNs, which allows implementing them in an entirely binarized fashion. The network functions by presenting several stochastically binarized versions of the images to the BNN, in a way reminiscent of the historic concept of stochastic computing [gaines1969stochastic]. After presenting the background of the work (section II), the paper describes the following contributions.

We show that this stochastic computing implementation of BNNs achieves high network performance in terms of recognition rate on the FashionMNIST and CIFAR10 datasets. The stochastic BNN quickly approaches standard BNN performance when several stochastic binarized images are presented to the network. We also evidence that the strategy for training stochastic computing BNNs should differ from the one used for conventional BNNs (section III).

We design a full hardware ASIC in-memory BNN, which shows that the stochastic computing BNN strategy can provide important savings in area, as well as in energy if a modest accuracy reduction with regards to a standard BNN is accepted, on the FashionMNIST task (section IV). These numbers are discussed with regards to different alternative implementations.
II Background of the Work
II-A Binarized Neural Networks
In this section, we first introduce the general principles of Binarized Neural Networks, an approach to considerably reduce the computation cost of inference in neural networks [courbariaux2016binarized, rastegari2016xnor]. In a conventional neural network with $L$ layers, the activation values of the neurons of layer $l$, $a^l$, are obtained by applying a nonlinear activation function $f$ to the matrix product between the real-valued synaptic weight matrix $W^l$ and the real-valued activations of the previous layer of neurons $a^{l-1}$:

$$a^l = f\left(W^l\, a^{l-1}\right). \qquad (1)$$
In a BNN, excluding the first layer, neuron activation values as well as synaptic weights assume binary values, meaning $W^l_{ji} \in \{-1,+1\}$ and $a^l_j \in \{-1,+1\}$. The products between weights and neuron activation values in Eq. (1) then simply become logic XNOR operations. The sum in Eq. (1) is replaced by the $\mathrm{popcount}$ operation, the basic function that counts the number of ones in a data vector. The resulting value is then converted to a binary value by comparing it to a trained threshold value $T^l_j$. Eq. (1) therefore becomes:

$$a^l_j = \mathrm{sign}\left(\mathrm{popcount}_i\left(\mathrm{XNOR}\left(W^l_{ji}, a^{l-1}_i\right)\right) - T^l_j\right), \qquad (2)$$

where $\mathrm{sign}$ is the sign function.
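As an illustration, the per-layer computation of Eq. (2) can be sketched in a few lines of NumPy. This is a sketch under our own conventions, not the paper's code: Boolean values are encoded here as {0, 1} rather than {-1, +1} (XNOR and popcount behave identically under either encoding), and all names and shapes are illustrative.

```python
import numpy as np

def binarized_layer(a_prev, W, T):
    """One hidden layer of a BNN, following Eq. (2): XNOR, popcount, threshold.

    a_prev : (n_in,) array of 0/1 activations from the previous layer
    W      : (n_out, n_in) array of 0/1 binary weights
    T      : (n_out,) array of trained integer thresholds
    """
    xnor = np.logical_not(np.logical_xor(W, a_prev))  # elementwise XNOR with broadcasting
    popcount = xnor.sum(axis=1)                       # count the ones per output neuron
    return (popcount >= T).astype(np.uint8)           # sign/threshold, giving 0/1 outputs
```

Note that no multiplication appears anywhere: the whole layer reduces to bitwise logic, counting, and one comparison per neuron.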
Ordinarily, in binarized neural networks, the first-layer input $X$ is not binarized. The implementation of the operations computing the first-layer activations is therefore more complex than the basic XNOR and popcount operations:

$$a^1_j = \mathrm{sign}\left(\sum_i W^1_{ji}\, X_i - T^1_j\right). \qquad (3)$$
Additionally, the thresholding operation is not performed on the last layer of the neural network. Instead, for the last layer, we identify the neuron with the maximum value (i.e., the argmax of the last-layer neurons), which gives the output of the neural network. The whole inference process of a conventional BNN is described with vectorized notations in Algorithm 1.
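The full forward pass can be restated in software as follows. This is an illustrative sketch of the inference procedure described in the text (hidden layers thresholded, argmax at the output), not the paper's Algorithm 1 itself; Boolean values are encoded as {0, 1}, and names and shapes are ours.

```python
import numpy as np

def bnn_inference(x_bin, weights, thresholds):
    """Full BNN forward pass: XNOR/popcount/threshold for hidden layers,
    argmax over raw popcount values for the last layer.

    x_bin      : (n_in,) binary input vector in {0, 1}
    weights    : list of (n_out, n_in) binary weight matrices, one per layer
    thresholds : list of (n_out,) threshold vectors for the hidden layers only
    """
    a = x_bin
    for W, T in zip(weights[:-1], thresholds):
        popcount = np.logical_not(np.logical_xor(W, a)).sum(axis=1)
        a = (popcount >= T).astype(np.uint8)
    # last layer: no thresholding, the predicted class is the argmax
    scores = np.logical_not(np.logical_xor(weights[-1], a)).sum(axis=1)
    return int(np.argmax(scores))
```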
The performance of BNNs is quite impressive. A fully connected BNN with two hidden layers of 1024 neurons, using dropout during training [srivastava2014dropout], obtains a low error rate on the test dataset of the canonical MNIST handwritten digits task [lecun1998gradient] after 300 epochs, close to the test error rate obtained after 300 epochs by a conventional non-binarized neural network with the same architecture and number of neurons. Similarly, on more complex datasets such as CIFAR10 or ImageNet, near-equivalent performance is obtained by BNNs and conventional neural networks [courbariaux2016binarized, rastegari2016xnor, lin2017towards]. The low memory requirements of BNNs (one bit per synapse), as well as the fact that they do not require any multiplication, make them extremely well adapted for inference hardware [nurvitadhi2016accelerating, sun2018xnor, tang2017binary, yu2018neuro].
The training process of BNNs is recalled in Appendix A. Unlike inference, the training process requires real-valued weights and real arithmetic: training a BNN is not easier than training a conventional neural network. Therefore, a natural vision is to train BNNs on standard GPUs, and to use specialized ultra-efficient hardware only for inference.
In this work, we investigate how the first layer can be approximated by a stochastic input to decrease computing resources. This approach could also allow processing stochastic data for near-sensor computing, which is a way to considerably reduce data transfers between sensors and data processing. In addition, because binarization can be implemented from the first layer on, the model can be completely generic, with exactly the same architecture across layers, which allows reducing chip area.
II-B Stochastic Computing
Stochastic computing is an approximate computing paradigm, known since the early days of computing [gaines1969stochastic, alaghi2013survey]. Nevertheless, hardware engineers have rarely exploited this computing scheme for processor design, as it requires applications that map easily onto approximate computing. The principle is based on encoding all data as probabilities, represented as temporal stochastic bitstreams: the proportion of ones in the bitstream represents the encoded probability. The main advantage of this encoding scheme is that mathematical functions can be easily approximated with simple logic gates. For instance, a product is implemented with a single AND gate, and a weighted adder can be implemented with a multiplexer [alaghi2013survey]. Many arithmetic operations are therefore easy to implement, with low power and a small footprint. Despite these benefits, stochastic computing has drawbacks: its limitation to low-precision arithmetic, and the need to generate random bits. Random number generation can account for a major part of the energy consumption, and, moreover, the generated random bits need to be uncorrelated.
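The AND-gate product mentioned above is easy to check numerically. The sketch below is illustrative only: `to_bitstream` is a hypothetical helper name of ours, and the bitstream length and seed are arbitrary choices.

```python
import numpy as np

def to_bitstream(p, n, rng):
    """Encode probability p as a stochastic bitstream of n random bits:
    each bit is 1 with probability p."""
    return (rng.random(n) < p).astype(np.uint8)

rng = np.random.default_rng(0)
n = 100_000
a = to_bitstream(0.6, n, rng)  # encodes 0.6
b = to_bitstream(0.5, n, rng)  # encodes 0.5, uncorrelated with a
# ANDing the two uncorrelated streams approximates the product 0.6 * 0.5 = 0.3
estimate = np.logical_and(a, b).mean()
```

The accuracy of `estimate` improves only as the square root of the bitstream length, which is the low-precision limitation mentioned above; correlated streams would bias the result, hence the need for independent random bits.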
Random bits have also found applications in the field of neural networks. The most widely used neural networks that intrinsically exploit stochasticity are restricted Boltzmann machines, where each neuron is binary valued with a probability that depends on the states of the previous-layer neurons [hinton2006fast]. An alternative technique to exploit stochasticity in neural networks is to approximate standard neural network architectures with stochastic computing. This approach has been proposed as early as the 1990s [bade1994fpga], and is currently being revisited for modern deep neural networks [ardakani2017vlsi, ren2017sc, canals2016new]. These works have shown promising results in terms of area and energy consumption. Typically, the largest challenge is the implementation of the nonlinear activation function within the stochastic computing framework.
In this article, we suggest that stochastic computing is particularly adapted to the case of binarized neural networks, as they work naturally with bitstreams, and as the activation function is replaced by a simple thresholding operation.
III Stochastic Computing-Based Binarized Neural Network
To evaluate the stochastic computing approach, we use the FashionMNIST dataset, which has the same format as MNIST, but presents grayscale images of fashion items [xiao2017fashion], and constitutes a harder task. The canonical MNIST dataset would not be appropriate for this study, as it consists of images that are mostly black and white. As in the MNIST dataset, each image in FashionMNIST has 28x28 pixels, and belongs to one of ten classes. The dataset contains 60,000 training examples and 10,000 test examples. Conventional BNNs (non-binarized first layer and no use of stochastic computing) perform very well on this task. With a fully connected BNN whose first-layer input is coded with eight-bit fixed-point numbers, with two hidden layers of 1024 neurons each and dropout, a high classification accuracy can be obtained after 300 epochs, comparable with the test accuracy obtained by a conventional real-valued neural network with the same architecture.
III-A Stochastic Computing with Regular Training Procedure
A first approach to design a stochastic computing BNN is to reuse the synaptic weights of a conventional BNN, trained with grayscale pictures. In the inference phase, however, we approximate the computation of the first layer by using stochastic image presentations instead of grayscale images. The full inference algorithm is presented, in vectorized form, in Algorithm 2. An input is transformed into binarized stochastic inputs by taking the value of each grayscale pixel (between zero and one) as the probability for the corresponding pixel in the stochastic input to be one. The network then computes the first layer, and sums the result of this computation over a number $N$ of stochastic versions of the input. Finally, the output of the layer is thresholded to obtain a binary value, and the rest of the neural network is computed in one pass in a fully binarized fashion.
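The first-layer procedure just described can be sketched as follows. This is our own minimal restatement of the idea behind Algorithm 2, not the paper's code: Boolean values are encoded as {0, 1}, the accumulated popcount is compared against $N$ times the threshold, and all names are illustrative.

```python
import numpy as np

def stochastic_first_layer(x, W1, T1, N, rng):
    """First-layer inference with N stochastic binarized presentations.

    x  : (n_pixels,) grayscale image scaled to [0, 1]
    W1 : (n_out, n_pixels) binary first-layer weights in {0, 1}
    T1 : (n_out,) trained thresholds
    N  : number of stochastic presentations of the image
    """
    acc = np.zeros(W1.shape[0])
    for _ in range(N):
        # each pixel value is the probability of drawing a 1
        x_bin = (rng.random(x.shape) < x).astype(np.uint8)
        xnor = np.logical_not(np.logical_xor(W1, x_bin))
        acc += xnor.sum(axis=1)                 # accumulate popcounts over presentations
    return (acc >= N * T1).astype(np.uint8)     # threshold once, after accumulation
```

The binary output of this function then feeds the purely binarized hidden layers, which are computed in a single pass.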
The quality of the results depends on the number of image presentations $N$. In Fig. 2, the navy blue curve shows the network test error as a function of $N$. We can see that after 100 stochastic image presentations, the accuracy is nearly equivalent to the one obtained with grayscale images. With eight image presentations, the test accuracy is moderately reduced; with a single presentation, it drops considerably.
III-B Adapted Training Procedure
We now try a second strategy, where we train the neural network with binarized stochastic image presentations instead of grayscale images. To do this, during training, we use the conventional BNN training technique of Appendix A, but instead of using the normal grayscale FashionMNIST images, we use stochastic binarized ones, with the same number of presentations as will be used during inference. The inference technique then remains identical to the one described in section III-A. In Fig. 2, in cyan, we plot the test error rate as a function of the number of presentations of the same image with this scheme. We see that the test accuracy is equivalent to the one obtained with grayscale images for high numbers of image presentations. On the other hand, with few stochastic presentations (one to five), the adapted input training technique already allows reaching a quite high accuracy. If a single presentation is used at inference time, the network test accuracy is equivalent to the one obtained when training a BNN with non-stochastic black-and-white versions of the FashionMNIST dataset (dashed black line in Fig. 2). If three image presentations are used, the network test accuracy increases further.
These results show that when using the stochastic computing version of BNNs, the adapted training procedure should be used.
III-C Choice of the Accumulation Layer for Stochastic Samples
Until now, at inference time, we have accumulated the outputs of the first layer over several presentations of the same image, then propagated the binarized output of the first layer to the other layers. An alternative strategy can be to perform the accumulation over the realizations of the input images at another layer. If the accumulation is done at the last layer, this procedure corresponds to using stochastic computing in the whole depth of the neural network.
Fig. 3 presents the test accuracy of the neural network on the FashionMNIST dataset, as a function of the number of presented realizations of the input images, for the different accumulation strategies, in networks trained with the adapted training strategy. This Figure shows that the different accumulation strategies lead to equivalent accuracy, consistently with the principles of stochastic computing. The strategy of accumulation at the first layer is retained for the rest of the paper, as it allows for the minimum energy consumption.
III-D Extension to the CIFAR10 Dataset
We now apply this strategy to the more advanced CIFAR10 dataset. We use a convolutional neural network with six convolutional layers, with kernel sizes of three by three and a stride of one (numbers of filters 384, 384, 384, 768, 768 and 1536) and three fully connected layers (numbers of neurons 1024, 1024 and 10). Training is done in the same conditions as in the FashionMNIST case, using dropout and the Adam optimizer, within the pytorch deep learning framework. In the stochastic computing BNN, CIFAR10 images are presented with binarized channels: each RGB channel of a pixel presents a value of zero or one, chosen randomly with a probability equal to the corresponding RGB value of the image. Accumulation of the stochastic realizations is performed at the first layer, as described in section III-C.
Fig. 4 shows that the results on CIFAR10 are very similar to the ones on FashionMNIST (Fig. 2). It presents results obtained using the weights trained with full color images, and weights obtained with the adapted training approach. In both cases, the stochastic BNN results approach regular BNN results when the number of presentations of stochastic images is increased. The adapted training nevertheless gives highly superior results and should be preferred. This highlights that the stochastic BNN approach can be applicable to more complicated tasks than FashionMNIST.
We now consider a variation of this scheme, a partially binarized neural network. Fully connected layers of neural networks are particularly adapted for in-memory BNN implementation [yu2018neuro, bocquet2018memory], as these layers involve large quantities of memory. Convolutional layers are less memory intensive, and thus benefit less from binarization, while requiring an increased number of channels when binarized [courbariaux2016binarized]. In a hardware implementation, it can therefore be attractive to binarize only the classifier (fully connected) layers. In that case, the input of the classifier is real valued, and is processed with the stochastic BNN approach. This is of special interest as the first fully connected layer of a convolutional neural network is usually the layer involving the highest number of additions, and can therefore benefit significantly from a hardware implementation with the stochastic approach.
We consider a neural network with the same architecture as the fully binarized one, a reduced number of filters (128, 128, 128, 256, 256 and 512) and the same number of neurons in the fully connected layers (1024, 1024 and 10). Without the stochastic approach, this neural network has the same CIFAR10 recognition rate as the fully binarized one. Fig. 5 shows the results of the stochastic BNN with this approach. If the same weights are used as in a non-stochastic BNN, the results look similar to the fully binarized approach of Fig. 4. On the other hand, if the classifier weights are retrained with the stochastic binarized inputs to the classifier, the stochastic results are very impressive. Even with a single image presentation, the network approaches the performance of the non-stochastic network. The stochastic BNN approach therefore appears especially effective in this situation.
IV Hardware Implementation of the Stochastic Computing-Based Binarized Neural Network
In order to investigate the potential of the stochastic BNN approach, we designed a digital ASIC version of it using standard integrated circuit design tools. The architecture, presented in Fig. 6, allows performing the inference of a fully connected binary neural network of any size (up to 1024 neurons for each layer). The only parameter constrained by the hardware design is the number of weights that can be stored.
IV-A Design of the Architecture
Our architecture is inspired by the work of [ando2017brein], with Static RAMs replaced by Spin Torque MRAM [shum2017cmos], and adapted to stochastic computing. This architecture aims at performing inference of binarized neural networks with minimal energy consumption. To achieve this goal, it brings memory and computation as close as possible, to limit the energy consumption related to data transfer. Such an architecture is of special interest with the emergence of new nonvolatile memory components such as Spin Torque MRAM, which can be integrated within the CMOS manufacturing process, and which we consider here.
The architecture is described in detail in Appendix B, and can compute following a parallel or a sequential structure. The full design is made of a basic cell repeated 32x32 times (Fig. 6 (b-c)) that can perform both sequential and parallel calculation. It includes a 2-kbit memory array that stores weights, as well as XNOR gates and popcount logic.
We designed this system using the design kit of a commercial 28-nanometer technology. Digital circuits were described in synthesizable SystemVerilog. MRAM memory arrays are modeled in a behavioral fashion, and their characteristics (area, energy consumption) are inspired by [chun2013scaling]. The system was synthesized to estimate its area and energy consumption. For energy consumption, we employed Value Change Dumps extracted from a FashionMNIST inference task, and performed the estimation using the Cadence Encounter tool.
IV-B Energy Consumption and Area Results
Fig. 7(a) shows the area of a basic cell of our architecture (Fig. 6(b-c)), in the case of binary input (one operating bit), and in situations where the input is coded in fixed-point representation (two, four and eight operating bits), as is required in the first layer of a conventional BNN. This Figure separates the area used by registers, logic and MRAM. A cell with binary input uses six times less area than a cell designed for eight-bit input. Interestingly, the difference is mostly due to the popcount circuits, which need more depth when the input is non-binary. Similarly, as seen in Fig. 7(b), a cell with binary input uses several times less energy per cycle than the corresponding one with eight-bit input. Again, the difference is mostly due to the popcount circuits.
The savings in terms of area transfer directly to the system level. We now consider the whole neural network used for FashionMNIST classification throughout section III. Using our architecture, a full BNN with an eight-bit first layer occupies considerably more area than the BNN with a stochastic binarized first layer. These area values were extracted from a system designed for a value of $N$ of eight.
Fig. 8 plots the energy consumption for recognizing an image with our ASIC architecture, as a function of the number of presented stochastic images. This is compared with the energy cost of the same architecture, but using a non-stochastic first layer with eight-bit input. We see that the system with a stochastic first layer is more energy efficient than the system with a non-binary first layer if fewer than eight presentations are used.
The previous curves do not include the cost of random bit generation. If we use a simple eight-bit Linear Feedback Shift Register (LFSR) pseudo-random number generator, the added energy and area are both negligible. It has also been shown that Spin Torque MRAM technology can be adapted to provide very low energy true random numbers [vodenicarevic2017low]. If such a technology were used, based on the numbers of [vodenicarevic2017low], the energy cost of random bit generation would remain negligible, and the area much smaller than that of the LFSR. The energy cost of random number generation is therefore negligible with regards to the consumption of the system seen in Fig. 8.
These energy numbers are very attractive with regards to non-binarized implementations at equivalent recognition rate. Non-binarized neural networks require fewer neurons and synapses than BNNs to achieve an equivalent recognition rate. For example, to match the performance obtained in Fig. 2 on FashionMNIST with three image presentations, one only needs a non-binarized neural network with eight-bit synapses and two layers of 500 neurons, while the BNN needs 1024 neurons per layer. However, in an ASIC, the non-binarized neural network requires energy-hungry eight-bit multiplications and additions. Even taking into account only these arithmetic operations, the energy consumption for recognizing a FashionMNIST image at this accuracy exceeds that of the stochastic BNN with three image presentations, which includes the whole system (Fig. 8).
As a conclusion, this work highlights that the stochastic computing approach is attractive in terms of area occupancy. In terms of energy efficiency, it is very attractive if a relatively small number of presentations is used. Therefore, it appears preferable to rely on the stochastic training approach seen in section III-B, and to use few stochastic image presentations for inference. For example, if three image presentations are used, a substantial factor can be saved on the energy consumption on FashionMNIST, with a modest reduction of test accuracy with regards to the best accuracy obtained by a BNN (dashed red line in Fig. 2). It should be noticed that the benefits of stochastic computing would be reduced on very deep neural networks, where the first layer plays a smaller role. Our approach is therefore most promising for Internet-of-Things or sensor-network applications, where relatively small neural networks can provide sufficient intelligence, but circuit cost and energy consumption are the most critical issues. On deep neural networks, nevertheless, the approach of implementing only the classifier with a stochastic BNN, as mentioned in section III-D, can be of high interest.
V Conclusion
In this work, we presented a stochastic computing approach to Binarized Neural Networks. It allows implementing them in an entirely binarized fashion, whereas in conventional BNNs, the first layer is not binary. We showed that the stochastic computing approach can reach recognition results similar to the conventional approach. We identified that for highest accuracy, the neural network should not be trained with regular images as conventional BNNs are: it is more beneficial to train stochastic BNNs with stochastic binarized images, using the same number of image presentations as will be used during inference. The design of a full BNN ASIC relying on in-memory computing then highlighted the benefits of BNNs in terms of area and energy consumption. Stochastic BNNs allow using the same compact architecture for all layers, which leads to strong benefits in terms of area in the case of FashionMNIST classification. In terms of energy, the benefits can be very strong if we accept a slight reduction in classification accuracy.
These results highlight the high potential of BNNs for implementing compact and energy-efficient in-memory neural networks, and the potential of stochastic approaches for hardware artificial intelligence. Future works should focus on the physical implementation of the proposed scheme, as well as the extension of the approach to tasks other than vision, such as medical tasks, where energy efficiency can be a particularly important concern.
A Training Algorithm
Throughout the paper, neural networks are trained with the algorithm proposed by Courbariaux et al. in [courbariaux2016binarized]. This algorithm relies on two fundamental principles. First, the hard-tanh function is used instead of the sign function in the backpropagation phase, as it can be differentiated. Second, the binarized weights are not directly modified during backpropagation: their modification is done indirectly through the modification of the real weight associated with each synapse.
Our design includes two modifications with regards to the work of [courbariaux2016binarized]. First, in the original paper, the multilayer perceptron trained on MNIST consisted of hidden layers of binarized units, topped by an L2-SVM output layer; here, we used a simpler output layer. Second, the scale and shift parameters used for batch normalization were not trained, and were kept at fixed values instead. The complete algorithm that we used is presented in Algorithm 3.
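The two training principles above can be sketched as follows. This is a minimal illustrative sketch of one weight update, under our own names and a plain gradient step; the full Algorithm 3 also includes batch normalization and the Adam optimizer.

```python
import numpy as np

def binarize(w_real):
    """sign() binarization used in the forward pass."""
    return np.where(w_real >= 0, 1.0, -1.0)

def ste_weight_update(w_real, grad_wrt_wbin, lr=0.01):
    """One straight-through-estimator step: the gradient computed with
    the binarized weights updates the real-valued shadow weights, which
    are then clipped to [-1, 1] (consistent with the hard-tanh used in
    place of sign during backpropagation)."""
    w_real = w_real - lr * grad_wrt_wbin  # update the real weights, not the binary ones
    return np.clip(w_real, -1.0, 1.0)     # keep shadow weights bounded
```

At inference time, only `binarize(w_real)` is kept; the real-valued shadow weights exist solely during training.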
B Description of the ASIC BNN Architecture
The architecture for hardware implementation of BNN inference is presented in Fig. 6. The basic function of a BNN is to compute Eq. (2). To perform this function, first, the system needs to perform the XNOR between the inputs and the weights, stored in the Spin Torque MRAM memory blocks. Second, it needs to perform the popcount function, and then compare this value with a threshold.
To achieve this goal, the architecture is made of basic cells (Fig. 6 (b-c)), each composed of a 2-kbit memory array that stores weights, 32 XNOR logic gates that perform the XNOR between the 32 weight bits and the 32 received data bits, and a 32-bit to 5-bit popcount module composed of basic tree adders. The basic cell is repeated 32x32 times.
The architecture can be operated with a “parallel to sequential” structure, or a “sequential to parallel” structure. The sequential to parallel structure allows dealing with long input data sequences, and outputs limited parallel output data. By contrast, the parallel to sequential structure allows dealing with limited parallel input data, and outputs long data sequences. The basic cells of Fig. 6 (b-c) can perform both sequential and parallel calculation. The output of the popcount can be given either to the sequential part of the cell, or to the parallel part of the system, which performs the popcount over the whole column with a “popcount tree” module shared by all the cells of the column. The sequential section of the cell that receives the popcount output performs the full popcount operation sequentially, by summing the popcount outputs using a register.
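The logic of one basic cell step (32-bit XNOR followed by the 32-to-5-bit popcount) can be modeled in software with integer bitwise operations. This is a behavioral sketch of ours for illustration, not the SystemVerilog design; the function name is hypothetical.

```python
def cell_popcount(weight_word: int, input_word: int) -> int:
    """Behavioral model of one basic-cell step: XNOR a 32-bit weight word
    with a 32-bit input word, then popcount the result (a 0..32 value,
    which fits the 5-bit-plus-carry output of the tree adders)."""
    MASK32 = (1 << 32) - 1
    xnor = ~(weight_word ^ input_word) & MASK32  # bitwise XNOR, kept to 32 bits
    return bin(xnor).count("1")                  # popcount of the XNOR result
```

Successive calls on consecutive 32-bit words, summed in an accumulator, reproduce the sequential mode described above; summing the outputs of the 32 cells of a column reproduces the parallel "popcount tree" mode.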
To perform the activation function of the neuron, the system stores in each cell the threshold values in a memory array. The sign bit of the difference between the popcount value saved in the register and the threshold gives the activation value. The same operation is performed with the output of the popcount tree shared along the column.
References
Tifenn Hirtzlin is a PhD student in Electrical Engineering at Université Paris-Sud. He received the M.S. degree in Nanosciences and Electronics from the University Paris-Sud, France, in 2017. His work focuses on designing intelligent memory chips for low-energy hardware data processing, using bio-inspired concepts such as probabilistic approaches to brain function, as well as more classical neural network approaches.
Bogdan Penkovsky is a postdoctoral CNRS researcher at Paris-Sud University. He received his M.S. degree in Applied Mathematics from the National University of Kyiv-Mohyla Academy, Ukraine, in 2013 and the Ph.D. degree in optics and photonics applied to neuromorphic computing from the University of Burgundy - Franche-Comté, France, in 2017. His work is on intelligent, low-energy hardware design for biomedical applications.
Marc Bocquet is an Associate Professor in the Institute of Materials, Microelectronics and Nanosciences of Provence, IM2NP, at Aix-Marseille University. He received the M.S. degree in electrical engineering in 2006 and the Ph.D. degree in electrical engineering in 2009, both from the University of Grenoble, France. His research interests include memory models, memory design, characterization and reliability.
Jacques-Olivier Klein (M'90) received the Ph.D. degree from Univ. Paris-Sud, France, in 1995. He is currently Full Professor at Univ. Paris-Sud, where he focuses on the architecture of circuits and systems based on emerging nanodevices in the fields of nanomagnetism and bio-inspired nanoelectronics. In addition, he is a lecturer at the Institut Universitaire de Technologie (IUT) of Cachan. He is the author of more than one hundred technical papers.
Jean-Michel Portal (M'87) is a Full Professor in the Institute of Materials, Microelectronics and Nanosciences of Provence, IM2NP, at Aix-Marseille University. He received the Ph.D. degree in 1999 from the University of Montpellier 2, France. From 1999 to 2000, he was a temporary researcher at the University of Montpellier 2 in the field of FPGA design and test. From 2000 to 2008, he was an assistant professor at the Univ. of Provence, Polytech Marseille, and conducted research activities in L2MP in the fields of memory testing and diagnosis, test structure design and design for manufacturing. In this position, he participated in industrial projects on nonvolatile memory testing and diagnosis with STMicroelectronics. In 2008, he became Full Professor at Aix-Marseille Univ., and since 2009 he has headed the “Memories Team” of the IM2NP. His research fields cover design for manufacturing and memory design, test and reliability.
Damien Querlioz (M'08) is a CNRS Research Scientist at Université Paris-Sud. He received his predoctoral education at Ecole Normale Supérieure, Paris and his PhD from Université Paris-Sud in 2008. After postdoctoral appointments at Stanford University and CEA, he became a permanent researcher at the Centre for Nanoscience and Nanotechnology of Université Paris-Sud. He focuses on novel usages of emerging nonvolatile memory, in particular relying on inspirations from biology and machine learning. Damien Querlioz coordinates the INTEGNANO interdisciplinary research group. In 2016, he was the recipient of a European Research Council Starting Grant to develop the concept of natively intelligent memory. In 2017, he received the CNRS Bronze medal.