On-chip learning for domain wall synapse based Fully Connected Neural Network

Apoorv Dankar1,*, Anand Verma1,*, Utkarsh Saxena1,*, Divya Kaushik1, Shouri Chatterjee1, and Debanjan Bhowmik1
1Department of Electrical Engineering, Indian Institute of Technology Delhi, New Delhi 110016, India
*These authors contributed equally to the work.
Corresponding author: D. Bhowmik (email: debanjan@ee.iitd.ac.in).
Manuscript submitted on Nov 5, 2018 for review.
Abstract

Spintronic devices are considered promising candidates for implementing neuromorphic systems or hardware neural networks, which are expected to perform better than other existing computing systems for certain data classification and regression tasks. In this paper, we design a feedforward Fully Connected Neural Network (FCNN) with no hidden layer, using spin orbit torque driven domain wall devices as synapses and transistor based analog circuits as neurons. A feedback circuit is also designed using transistors; at every iteration it computes the change in the weights of the synapses needed to train the network using the Stochastic Gradient Descent (SGD) method. Subsequently, it sends write current pulses to the domain wall based synaptic devices, which move the domain walls and update the weights of the synapses. Through a combination of micromagnetic simulations, analog circuit simulations and numerical solution of the FCNN training equations, we demonstrate "on-chip" training of the designed FCNN on the MNIST database of handwritten digits. We report the training and test accuracies, the energy consumed in the synaptic devices during training, and possible issues with hardware implementation of the FCNN that can limit its test accuracy.

Spintronics, Neuromorphic Computing, Hardware Neural Networks, Domain Wall Synapses

I Introduction

Artificial Neural Network (ANN) algorithms are currently widely used by the machine learning and data sciences community to solve several kinds of data classification and regression problems [1]. These ANN algorithms, inspired by the working of the human brain, inherently have memory and computing intertwined in them, just like the brain. For example, in a feedforward Fully Connected Neural Network (FCNN) with no hidden layer, signals at the nodes of the input layer are multiplied by specific values called weights and then added up, followed by the operation of an activation function on them, which leads to the signals at the nodes of the output layer (Fig. 1(a)). Storing these weights in the network constitutes the memory functionality of the algorithm, while the calculation of the products between the input signals and the weights, the summation of the products and the operation of the activation function on the sum constitute the forward computation functionality of the algorithm. The network is trained to perform specific regression and classification tasks, under the supervised learning scheme, by updating these weights after every iteration on the training examples using the Stochastic Gradient Descent (SGD) method until the error at the output is minimized [1]. When such ANN algorithms are implemented in software, as is mostly the case currently, the algorithms are still executed on traditional computer hardware in which memory and computing are separate, following the von Neumann architecture [2, 3]. Thus the memory-computing entanglement inherent in these algorithms cannot be properly exploited.

In neuromorphic computing systems, however, specialized hardware in which memory and computing are embedded together is designed to implement such ANN algorithms. This can enhance the performance of the computing system with respect to speed and energy consumption [2, 3, 4]. Fig. 1(b) shows an analog hardware implementation of a single layer FCNN. Weights are stored and updated as conductances of the synaptic devices, i.e., devices that mimic synapses in the brain [5]. Forward computation takes place by adding the currents flowing out of these synaptic devices, arranged in a crossbar architecture, followed by the operation of a tan-sigmoid activation function on them through a transistor based circuit, partly mimicking the neurons in the brain.

Spintronic devices, owing to their non-volatile nature, are particularly suitable as synaptic devices in neuromorphic computing systems [5, 6, 7]. If a domain wall is created in the ferromagnetic free layer of a heavy metal/ ferromagnetic metal (free layer)/ oxide/ ferromagnetic metal (fixed layer) heterostructure, which forms a Magnetic Tunnel Junction (MTJ), then as long as the domain wall does not move the Tunneling Magneto Resistance (TMR) of the MTJ structure does not change (Fig. 2(a)) [8, 9, 10, 11, 12]. Hence the device can act as a synapse in a hardware ANN and store the corresponding weight of the synapse as its conductance. Further, in order to update the weight after every iteration with the goal of training the network, "write" current pulses can be applied through the heavy metal layer of the device such that spin orbit torque from the current pulses moves the domain wall and brings about the required change in the conductance of the device and hence in the weight [8, 9, 10, 13, 14]. However, a dedicated circuit making use of the SGD method [1] is needed to generate the suitable current pulses that can eventually train the network. This scheme of training the network in hardware, along with the forward computation, is known as "on-chip learning".

\includegraphics[width=3in]Figure1_LR.jpg

Figure 1: (a) Schematic of a Fully Connected Neural Network (FCNN) without a hidden layer. Each x_m corresponds to an input feature (intensity of each of the 28 x 28 = 784 pixels of an image from the MNIST character dataset, say), w_nm corresponds to the weight matrix, f to the activation function and each y_n corresponds to an output of the network. For the node corresponding to the digit (0-9) that the input image belongs to, the desired output t_n = 1; for all other nodes t_n = -1. (b) Implementation of the FCNN in analog hardware. A domain wall based synaptic device stores weight w_nm, a transistor based neuron circuit evaluates y_n and a transistor based feedback circuit updates w_nm using the Stochastic Gradient Descent (SGD) method.

To the best of our knowledge, simulation of "on-chip learning" on a spintronic FCNN system has not been reported before. Though some simulation based reports of ferromagnetic domain wall device based implementation of FCNN have been published recently, they do not implement "on-chip learning" in their networks, i.e., the weight update method is not implemented in hardware [5, 15]. Essentially, in those reports, several iterations of forward computation and weight update are first run on a standard computer to obtain the final weight values of the synapses for the trained network. Then current pulses are applied on the domain wall based synaptic devices such that their conductances are proportional to the final synaptic weights. Subsequently the forward computation is implemented in hardware. Thus learning is "off-chip" in this hardware implementation of the neural network, and hence proper advantage is not taken of the memory-computing intertwining present in a hardware ANN. In this paper, in contrast, we employ a combination of micromagnetic simulations and transistor based circuit simulations to implement "on-chip learning" in such spintronic neural networks. We use the spin orbit torque driven ferromagnetic domain wall devices of [5, 16] as the synaptic devices in our network. We use Metal Oxide Semiconductor Field Effect Transistors (MOSFET) and MOSFET based operational amplifier circuitry to implement the activation function (neuron) and to generate current pulses in feedback, using the SGD method, that move the domain walls in the synaptic devices and change their weights after every iteration (Fig. 1(b)). It is to be noted that spintronic devices, domain wall based devices in this case, are only used as synapse/ memory elements in our network, owing to their non-volatility. Implementing a synapse with existing transistor technology is problematic because of the large number of transistors needed to represent one synapse (around 6-8) and because a transistor is a volatile device, which leads to high power consumption to retain a weight value at the synapse during training of the network [17]. However, for every other functionality in the network where non-volatility is not needed, be it the neuron or the SGD calculation circuitry, transistor based circuits are used in our work, since existing technology facilitates much easier fabrication of silicon based transistor circuitry compared to magnetic material based spintronic circuitry [18].

It is also to be noted that though "on-chip learning" for domain wall synapse based ANN is simulated in [6, 19], the ANN simulated there is of spiking type, unlike ours. Also unlike our work, the synapse there follows a local learning rule, Spike Time Dependent Plasticity (STDP), for weight update [20, 21], the neuron follows the Leaky Integrate and Fire (LIF) model [22] and unsupervised learning is used for training [23]. Though such an STDP enabled spiking network is closer to the functioning of the brain, the machine learning and data sciences community currently uses non-spiking ANN with SGD based weight update much more than STDP enabled spiking ANN for various tasks. Hence it is important to study "on-chip learning" of such FCNN in spintronic hardware, which we have done in this paper.

Section II discusses the micromagnetic simulations performed to obtain the current controlled conductance characteristics of the domain wall based devices used as synapses in the implemented FCNN. Section III discusses the design of the FCNN in hardware and how the forward computation and backpropagation algorithm are executed in it. Section IV evaluates the performance of the network when it is trained and tested on the MNIST database of handwritten digits, abundantly used by the machine learning community to benchmark the performance of different algorithms. We see from the signal flow in the circuit for different inputs that the operational amplifier based SGD computation circuitry designed here is indeed capable of sending the appropriate current pulses to the spintronic synaptic devices and updating their weights to successfully train the network. In Section V we summarize and comment on our results and conclude the paper.

II Synaptic Device Characteristic

A spin orbit torque driven ferromagnetic domain wall device was proposed as a synaptic element in hardware ANN by Sengupta et al. in [5]. In this work, we simulate such a device in the micromagnetic simulation package "mumax3" [24] and obtain its synaptic characteristic: conductance of the vertical Magnetic Tunnel Junction (MTJ) structure as a function of the "write" current flowing horizontally through the heavy metal layer (Fig. 2(a)), which can move the domain wall through the spin orbit torque it exerts on the magnetic moments inside the wall [16].

\includegraphics[width=3in]Figure2_LR.jpg

Figure 2: (a) Schematic of the domain wall based synaptic device. (b) Conductance of the vertical MTJ structure in the device (G) after application of write current (I_write) pulses of different magnitudes horizontally through the heavy metal layer, computed through micromagnetic simulations.

The lateral dimensions of our synaptic device (Fig. 2(a)) are taken to be 500 nm in length and 50 nm in width. The ferromagnetic free layer, in which the domain wall is formed, is taken to be 1 nm thick in our micromagnetic simulations. We take the saturation magnetization (M_s), the exchange correlation constant (A) and the damping factor (0.02) to be uniform throughout the free layer. The Perpendicular Magnetic Anisotropy (PMA) constant (K) is chosen considering a perpendicularly magnetized CoFeB/MgO structure [11]. The Dzyaloshinskii-Moriya Interaction (DMI) constant is taken from [13]. A Néel domain wall is stabilized at this value of DMI in our simulations.

The dynamics of the domain wall formed in the ferromagnetic free layer are simulated in the presence of the vertical spin current that acts upon the magnetic moments due to the charge current flowing horizontally in the heavy metal layer under the ferromagnetic free layer (Fig. 2(a)). Spin current density = spin Hall angle x charge current density, where charge current density = charge current / cross-sectional area, and cross-sectional area = width (50 nm) x thickness of the heavy metal layer [11, 25]. To obtain this expression for the spin current density, it is assumed that the thickness of the heavy metal layer is greater than the spin diffusion length inside the heavy metal, so that the spin current becomes independent of the thickness of the heavy metal layer [25, 26]. We consider platinum (Pt) as the heavy metal here. Since the spin diffusion length in Pt has been reported experimentally to be 2-4 nm [27, 28] and the thickness of the heavy metal layer considered here is 10 nm, this assumption holds in our case. The value of the spin Hall angle of Pt is taken to be 0.07 in our work [29, 30].
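As a small illustration of this conversion (the 50 nm width, the 10 nm heavy metal thickness and the spin Hall angle of 0.07 are the values quoted in this section; the example charge current is only a placeholder), a minimal Python sketch:

width = 50e-9       # m, width of the heavy metal layer (from the text)
t_hm = 10e-9        # m, thickness of the heavy metal layer (> spin diffusion length of Pt)
theta_SH = 0.07     # spin Hall angle of Pt [29, 30]

def spin_current_density(i_charge):
    """Vertical spin current density (A/m^2) for a charge current i_charge (A)
    flowing horizontally through the heavy metal layer."""
    j_charge = i_charge / (width * t_hm)    # charge current density
    return theta_SH * j_charge              # spin Hall conversion

print(spin_current_density(400e-6))         # e.g. a placeholder 400 uA write current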

The conductance vs. "write" current characteristic of the simulated device is shown in Fig. 2(b). Starting from a domain wall at the center of the device, a current pulse of 0.5 ns duration and about 400 μA magnitude is needed to move the domain wall all the way to one edge, corresponding to anti-parallel alignment of the magnetic moments of the free and fixed layers and hence the minimum conductance of the MTJ (G_min). About -400 μA is needed to move the domain wall to the other edge, corresponding to parallel alignment of the magnetic moments of the free and fixed layers and hence the maximum conductance (G_max) (Fig. 2(b)). Intermediate conductance values are obtained by applying current pulses of magnitude between -400 μA and +400 μA and duration 0.5 ns. Such conductance values correspond to the different values of weight that the device can store as a synaptic element in the neural network. For the conductance calculation, the Resistance-Area (RA) product of the MTJ is taken from [31] and the TMR value is taken to be 120 percent [8]. This leads to the values of G_max and G_min shown in Fig. 2(b). Intermediate conductance values are calculated using the expression: G = (G_max + G_min)/2 + (G_max - G_min) m_z/2, where m_z represents the average perpendicular component of the magnetization of the free layer.
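The conductance bookkeeping described above can be summarized in a short Python sketch; since the numerical RA product is not reproduced here, G_min below is an illustrative placeholder, while the relation between G_max, G_min and the 120 percent TMR, and the m_z interpolation formula, follow the text:

TMR = 1.2                      # 120 percent TMR [8]
G_min = 1.0e-3                 # S, placeholder anti-parallel conductance (depends on the RA product)
G_max = (1 + TMR) * G_min      # parallel conductance implied by the TMR definition

def mtj_conductance(m_z):
    """MTJ conductance for an average out-of-plane free layer magnetization m_z
    (m_z = +1: parallel, m_z = -1: anti-parallel)."""
    return (G_max + G_min) / 2 + (G_max - G_min) * m_z / 2

print(mtj_conductance(0.0))    # domain wall at the center -> mid-range conductance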

It is to be noted that the device is non-volatile. Once a current pulse is applied to obtain conductance G1, corresponding to weight w1, and then removed, the conductance remains G1. Thus the weight of the corresponding synapse in the network continues to be w1. However, in order to train the network, the weight may need to be updated to w2 at a certain iteration. In that case, a current pulse of strength ΔI needs to be applied horizontally through the heavy metal layer of the device to change the conductance of the MTJ structure to G2, corresponding to weight w2. Thus, to change the conductance by ΔG = G2 - G1, only a small current ΔI needs to be applied. If the device were not non-volatile, a much larger current would be needed for the same weight update from w1 to w2.

Thus, for the domain wall based synaptic device simulated here,

ΔG = κ (G_max - G_min) I_write   (1)

where κ is the proportionality constant of the linear conductance vs. write current characteristic of Fig. 2(b), in units of A^-1.

From the micromagnetic simulations we perform in "mumax3", κ is extracted for a current pulse duration of 0.5 ns (Fig. 2(b)). When the duration of the pulse is 5 ns, smaller magnitudes of current are needed to achieve the same conductance states, because the domain wall velocity is proportional to the current density in our simulations, which is also confirmed experimentally [9, 10, 16, 32]. Hence, from our micromagnetic simulations, κ is correspondingly larger when the duration of the pulse is 5 ns.
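Assuming the linear characteristic of equation (1), the write current needed to produce a desired conductance change can be estimated as below; the value of κ and the conductance limits are illustrative placeholders, since the numerical slope of Fig. 2(b) is not reproduced here:

kappa = 2.5e3                        # 1/A, placeholder slope for 0.5 ns pulses
G_max, G_min = 2.2e-3, 1.0e-3        # S, placeholder conductance limits (see Fig. 2(b))

def write_current(delta_G, kappa):
    """Write current (A) through the heavy metal layer that changes the MTJ
    conductance by delta_G, assuming delta_G = kappa * (G_max - G_min) * I_write."""
    return delta_G / (kappa * (G_max - G_min))

# example: move the conductance by 1 percent of its full range
print(write_current(0.01 * (G_max - G_min), kappa))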

In the following section, we discuss how several such synaptic devices can form a feedforward FCNN to generate ”read” currents at the nodes of the output layer of the network and how a transistor based feedback circuitry we designed applies ”write” currents on the heavy metal layers of the synaptic devices to change their weights by required amounts at every iteration to eventually train the network.

III Design of feedforward network and feedback circuitry

III-A Feedforward network

We simulate a crossbar architecture of the spin orbit torque driven domain wall synaptic devices of Fig. 2(a) to form a feedforward Fully Connected Neural Network (FCNN), as shown in Fig. 1(b) [5, 16]. The architecture of a standard FCNN without any hidden layer is shown in Fig. 1(a). In this paper, we train the FCNN to identify digits from 0 to 9 from the standard MNIST handwritten digit database [34]. So the number of nodes in the input layer = number of pixels of each input image of a digit = 28 x 28 = 784. The inputs x_m to the nodes of the input layer correspond to the intensities of the pixels. The number of nodes in the output layer = number of digits = 10. The desired output vector t when the input image is of digit 0 is (1, -1, -1, ..., -1), for digit 1 it is (-1, 1, -1, ..., -1), and so on. The target of training the network is that, for a given input, the output y at the output layer of the network matches the desired output t. Once the network is trained and gives a high accuracy for inputs from the training set (training accuracy), its accuracy needs to be tested on a fresh set of inputs (test accuracy). Following the standard FCNN training algorithm [1, 33], the output y_n at any node n is given by:

y_n = f( Σ_{m=1}^{784} w_nm x_m + w_nb ) = f(z_n)   (2)

where f is the activation function, the w_nm are the synaptic weights and w_nb is the bias weight. Equation (2) is essentially a matrix-vector multiplication, the matrix being the weight matrix w and the vector being the input x, to obtain the vector z, followed by the operation of a non-linear function f, "tan-sigmoid" in this case [33], on every element of z to obtain the output vector y.
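As a point of reference for the hardware mapping described next, equation (2) can be written in a few lines of Python (numpy); the weights below are random placeholders:

import numpy as np

def forward(x, w, w_b):
    """Forward pass of the no-hidden-layer FCNN of equation (2).
    x: (784,) pixel intensities, w: (10, 784) weight matrix, w_b: (10,) bias weights."""
    z = w @ x + w_b              # weighted sum z_n at each output node
    return np.tanh(z)            # "tan-sigmoid" activation, y_n = f(z_n)

# example with placeholder weights and a dummy image
rng = np.random.default_rng(0)
y = forward(rng.random(784), rng.normal(scale=0.01, size=(10, 784)), np.zeros(10))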

In order to implement equation (2) in hardware, voltages are applied on the crossbar architecture of the domain wall based synaptic devices as shown in Fig. 1(b). Since the conductance of the devices (G_nm) only takes positive values, ranging between G_min and G_max (Fig. 2(b)), while the corresponding weights can take both positive and negative values, an extra conductance (G_0) is added in parallel to each synapse, and the negative of the voltage applied on the synapse is applied on it. The relation between the conductance G_nm of the synapse connecting input node m with output node n and its corresponding weight w_nm is as follows:

G_nm = (G_max + G_min)/2 + [(G_max - G_min)/2] (w_nm / w_max)   (3)

where w_max is the magnitude of the maximum weight value in the network, and the extra conductance is chosen to be:

G_0 = (G_max + G_min)/2   (4)

For a voltage x_m V_0 applied on the input node m (and -x_m V_0 on the corresponding extra conductance), the net current flowing through the combination of the domain wall synapse device and the extra conductance (G_0), connecting input node m with output node n, is given by:

I_nm = G_nm x_m V_0 - G_0 x_m V_0 = [(G_max - G_min)/(2 w_max)] w_nm x_m V_0   (5)

for m = 1 to 784. It is to be noted that this current, which we call the "read" current in this paper, flows through the vertical MTJ structure of the synaptic device (Fig. 2(a)) and is hence proportional to the conductance of the MTJ. It is not the "write" current that flows horizontally through the heavy metal layer to move the domain wall and change the conductance of the MTJ (Fig. 2(a)). The magnitude of the "read" current is proportional to V_0. The value of V_0 is chosen such that the maximum "read" current flowing through the synaptic device is not large enough to move the domain wall and change the weight value it is storing. For the circuit we design here, V_0 is chosen to be 1 mV.

Corresponding to the bias weight w_nb there is a bias synapse with conductance G_nb (Fig. 1(b)), and the "read" current flowing through it is given by:

I_nb = [(G_max - G_min)/(2 w_max)] w_nb V_0   (6)

At the output node n, the "read" currents from all connected synapses add up following Kirchhoff's Current Law (Fig. 1(b)) to yield the total "read" current:

I_n = Σ_{m=1}^{784} I_nm + I_nb = [(G_max - G_min) V_0 / (2 w_max)] ( Σ_{m=1}^{784} w_nm x_m + w_nb ) = [(G_max - G_min) V_0 / (2 w_max)] z_n   (7)

Thus the matrix-vector multiplication of equation (2) is accomplished in hardware, as shown by equation (7), with an extra scaling factor (G_max - G_min) V_0 / (2 w_max) coming from the circuit implementation.
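A software sketch of this weight-to-conductance mapping and of the resulting read currents (equations (3)-(7)) is given below; G_max, G_min and w_max are illustrative placeholders, while V_0 = 1 mV and G_0 = (G_max + G_min)/2 follow the text:

import numpy as np

G_max, G_min = 2.2e-3, 1.0e-3    # S, placeholder MTJ conductance limits (Fig. 2(b))
w_max = 1.0                      # placeholder magnitude of the largest weight
V0 = 1e-3                        # V, read voltage scale chosen in the text
G0 = (G_max + G_min) / 2         # extra conductance in parallel with each synapse, eq. (4)

def weight_to_conductance(w):
    """Equation (3): map a signed weight to a positive device conductance."""
    return G0 + (G_max - G_min) / 2 * w / w_max

def read_current(w, w_b, x):
    """Equations (5)-(7): total read current at each output node.
    w: (10, 784) weights, w_b: (10,) bias weights, x: (784,) inputs in [0, 1]."""
    I_syn = (weight_to_conductance(w) - G0) * (x * V0)   # synapse + extra conductance pair, eq. (5)
    I_bias = (weight_to_conductance(w_b) - G0) * V0      # bias synapse, eq. (6)
    return I_syn.sum(axis=1) + I_bias                    # Kirchhoff's current law, eq. (7)

# scaling factor relating I_n to z_n in equation (7)
scale = (G_max - G_min) * V0 / (2 * w_max)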

\includegraphics[width=3in]Figure3_LR.jpg

Figure 3: (a) Transistor based neuron circuit present at each output node of the feedforward network, consisting of an op-amp amplifier and a differential circuit that takes a voltage as input. (b) Output voltage of the neuron circuit (V_out,n) as a function of the input current (I_n), obtained from circuit simulation as well as analytical calculation (equation 9).

\includegraphics[width=3in]Figure4_LR.jpg

Figure 4: (a) Alternative transistor based neuron circuit, consisting of a differential circuit that takes a current as input. (b) Output voltage of the neuron circuit (V_out,n) as a function of the input current (I_n), obtained from circuit simulation.

The activation function of equation (2) is implemented at each output node of the circuit (Fig. 1(b)) using the transistor based "neuron" circuit of Fig. 3(a). The net "read" current I_n at each output node n, given by equation (7), is first passed through a very low resistance R_L (1 Ω in this case). R_L is chosen to be so low that the voltage at the output node stays close to 0 and the expression for the "read" current in equation (7) remains valid. The voltage across R_L is next amplified through an op-amp (transistor based high gain amplifier) circuit [35], to eliminate the extra scaling factor of equation (7) and generate an output voltage (V_amp,n):

V_amp,n = A I_n R_L = [2 w_max / (β_ckt (G_max - G_min) V_0 R_L)] I_n R_L = z_n / β_ckt   (8)

This voltage is next fed to one of the two inputs of the MOSFET based differential amplifier circuit of Fig. 3(a), designed by us, which operates the tan-sigmoid function on it [35, 36]. The factor 1/β_ckt arises in equation (8) because the slope parameter of the tan-sigmoid in our FCNN algorithm of equation (2) is 1, while the corresponding factor of the differential amplifier based tan-sigmoid circuit we design (β_ckt) is 6. The amplification factor A of equation (8) turns out to be 42000. Such high amplification is carried out with three stages of op-amps, as shown in Fig. 3(a).

The output of the differential amplifier circuit, and hence the ”neuron” circuit of Fig. 3(a) is expected to be:

V_out,n = tanh(β_ckt V_amp,n) = tanh(z_n) = y_n   (9)

We plot V_out,n as a function of I_n from equation (9), using the appropriate values of the parameters, in Fig. 3(b) (analytical plot). We next simulate the circuit of Fig. 3(a) on the Cadence Virtuoso circuit simulator to also obtain V_out,n as a function of I_n, as plotted in Fig. 3(b). The United Microelectronics Corporation (UMC) 65 nm technology node library is used. The length of each transistor is chosen to be 80 nm and the width 60 nm. We see that the analytical plot and the plot from circuit simulations match quite well, which means the differential amplifier circuit works as per our expectations in terms of generating the tan-sigmoid function. Also, from equation (9) we see that V_out,n in hardware represents the output y_n at node n of the FCNN of equation (2), without any extra factor coming from the hardware.
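As a consistency check on this scaling, under our reading of equations (8) and (9) (the op-amp gain A is chosen to cancel the read current scale of equation (7); the numerical device values below are placeholders, not the actual device parameters), a short Python sketch:

import numpy as np

# Placeholder constants (same illustrative G_max, G_min, w_max, V0 as in the earlier sketch).
G_max, G_min, w_max, V0 = 2.2e-3, 1.0e-3, 1.0, 1e-3
R_L = 1.0           # ohm, low load resistance at the output node
beta_ckt = 6.0      # slope factor of the differential-pair tan-sigmoid circuit

# Gain chosen so that beta_ckt * A * R_L cancels the read-current scale of eq. (7)
# (the paper quotes A = 42000 for its actual device and circuit values).
A = 2 * w_max / (beta_ckt * (G_max - G_min) * V0 * R_L)

def neuron(I_n):
    V_amp = A * I_n * R_L                  # op-amp stage, eq. (8)
    return np.tanh(beta_ckt * V_amp)       # differential pair, eq. (9)

z_n = 0.7
I_n = (G_max - G_min) * V0 / (2 * w_max) * z_n     # read current corresponding to this z_n
assert np.isclose(neuron(I_n), np.tanh(z_n))       # hardware output equals y_n = tanh(z_n)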

The circuit of Fig. 3(a) has the drawback that it needs a high voltage gain amplifier circuit, which can give erroneous results in a noisy environment. An alternative circuit is presented in Fig. 4(a) [37]. Here, the transistor based differential circuit takes the "read" current as input directly, unlike the circuit of Fig. 3(a). Hence, unlike the previous case, the "read" current does not need to flow through a load resistance, followed by amplification of the voltage across the resistance, and the very high gain op-amp circuit of Fig. 3(a) is avoided. The output voltage of the circuit as a function of the input "read" current, obtained from simulation of this circuit on the Cadence Virtuoso simulator, is plotted in Fig. 4(b).

III-B Feedback Circuitry

\includegraphics[width=6.0in]Figure5_LR.jpg

Figure 5: (a) Schematic of the feedback circuit at every output node of the FCNN, which evaluates the change in synaptic weight using the SGD method (equations 11 and 12) and generates the write current pulse needed to bring about the change in weight. It is to be noted that the write current generated here is for the bias synapse connected to that output node. For synapses that connect the input nodes with this output node, the write current generated here has to be multiplied by the input signal at the corresponding input node, as given in equation (11). (b) Transistor based implementation of the multiplier circuit we design here. (c) Op-amp based implementation of the subtractor circuit we design here.

We next describe how the weights of the ANN are updated to train the network using the Stochastic Gradient Descent (SGD) algorithm followed here. For a given training example x, the output y is generated using the feedforward computation of equation (2). Since a supervised learning algorithm is followed, the desired output t for that training example is known, and the error δ_n at output node n is calculated as follows:

δ_n = t_n - y_n   (10)

Following the SGD method [1, 33], the weight w_nm of the synapse connecting input node m with output node n is updated as follows between iterations i and i+1:

w_nm(i+1) = w_nm(i) + η δ_n (1 - y_n^2) x_m   (11)

and the weight w_nb of the bias synapse for output node n is updated as follows:

w_nb(i+1) = w_nb(i) + η δ_n (1 - y_n^2)   (12)

where η is the learning rate, equal to 0.1 in our simulations.

The training sample is changed at every iteration to exhaust all examples in the training set. Then this process is repeated several times, each repetition being called an epoch. Thus, total number of iterations = number of epochs x number of training samples.

Corresponding to the calculation of y_n in the algorithm for the training example x, voltages V_out,n are generated at the output nodes of the corresponding feedforward computation circuit of Fig. 1(b), as described in the previous subsection. At each output node, V_out,n = y_n, as we have already shown. Now, this V_out,n is fed to the feedback circuit at that node (Fig. 1(b)), which evaluates the weight changes at that iteration for all the synapses connecting that output node with all input nodes from m = 1 to 784 and for the bias synapse. Details of the feedback circuit that we have designed for the purpose and simulated on Cadence Virtuoso are shown in Fig. 5(a). V_out,n is split into two branches. In one branch it is subtracted from the desired output signal t_n, using the op-amp based subtractor circuit of Fig. 5(c), to generate the (t_n - y_n) term. In the other branch it is multiplied with itself using the MOSFET based Gilbert cell multiplier circuit of Fig. 5(b) [38] and then subtracted from a constant voltage of 1 V, using the subtractor circuit of Fig. 5(c), to generate the (1 - y_n^2) term. Then the two terms are multiplied using the Gilbert cell multiplier of Fig. 5(b) and amplified by the factor η to generate Δw_nb. Then it is multiplied by the (appropriately scaled) input signal at each input node m to generate Δw_nm, as shown in Fig. 5(a).
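In software terms, the quantity computed by this chain of subtractor, multipliers and gain stage corresponds to the following sketch of equations (10)-(12):

def weight_updates(y_n, t_n, x, eta=0.1):
    """Software equivalent of the feedback chain of Fig. 5(a) for one output node.
    y_n: neuron output, t_n: desired output (+1 or -1), x: (784,) input intensities."""
    err = t_n - y_n              # subtractor branch, delta_n of eq. (10)
    deriv = 1.0 - y_n * y_n      # square-and-subtract-from-1-V branch
    dw_bias = eta * err * deriv  # bias synapse update, eq. (12)
    dw = dw_bias * x             # per-synapse updates, eq. (11)
    return dw, dw_bias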

Using equation (3) the required change in the conductance of the corresponding domain wall based synaptic device is given by:

ΔG_nm = [(G_max - G_min) / (2 w_max)] Δw_nm   (13)

From equation (1) the ”write” current that needs to be applied through the heavy metal layer of the synaptic devices to bring about the required change in conductance, and hence change in weight, is given by:

I_write,nm = ΔG_nm / [κ (G_max - G_min)] = Δw_nm / (2 κ w_max)   (14)

The feedback circuit of Fig. 5(a) amplifies the Δw_nm and Δw_nb it computes by the appropriate scaling factor of equation (14) to obtain the corresponding "write" currents I_write,nm and I_write,nb. Then it applies a voltage I_write,nm R_HM on the heavy metal layer of the synaptic device connecting input node m with output node n, and a voltage I_write,nb R_HM on the heavy metal layer of the bias synapse device at output node n (Fig. 1(b)), to bring about the changes Δw_nm and Δw_nb in the weights of the corresponding synapses. R_HM is the resistance of the heavy metal layer. This process, repeated over a certain number of iterations, trains the network, and "on-chip learning" is achieved.
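Combining equations (13) and (14), the conversion from a computed weight change to the write pulse applied on the heavy metal layer can be sketched as below; κ is a placeholder slope of equation (1), while R_HM = 100 Ω is the heavy metal resistance quoted in Section IV:

G_max, G_min, w_max = 2.2e-3, 1.0e-3, 1.0   # placeholder device/network constants as above
kappa = 2.5e3                               # 1/A, placeholder slope of equation (1)
R_HM = 100.0                                # ohm, heavy metal layer resistance (Pt, this geometry)

def write_pulse(dw):
    """Equations (13)-(14): weight change -> write current through the heavy metal
    layer and the voltage the feedback circuit applies across it."""
    dG = (G_max - G_min) / (2 * w_max) * dw     # required conductance change, eq. (13)
    I_write = dG / (kappa * (G_max - G_min))    # write current, eq. (14)
    return I_write, I_write * R_HM              # (current, applied voltage)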

In the following section, we present the results of our simulation of "on-chip learning" for the designed hardware FCNN on the MNIST character dataset, using the method described in this section.

IV Performance of designed hardware network

\includegraphics[width=3in]Figure6.jpg

Figure 6: Training accuracy and test accuracy plotted as a function of epoch number during training of the designed FCNN on the MNIST dataset. Accuracy is determined by the fraction of inputs for which the FCNN generates the desired output.

\includegraphics[width=3.5in]Figure7.jpg

Figure 7: (a) Read current at output node 1, corresponding to digit '0', for the first 50 training samples of the first epoch, plotted as a function of time. Training for each sample lasts 0.5 ns. (b) Write current generated by circuit simulation of the neuron circuit and SGD calculation circuit at node 1, to be sent back to the bias synapse at that node (solid line). The neuron circuit considered here is the combination of the op-amp voltage amplifier and the transistor differential circuit of Fig. 3(a). Training for each sample lasts 0.5 ns. The same quantity obtained analytically is also plotted (dashed line). (c) Write current generated by circuit simulation of the neuron circuit and SGD calculation circuit at node 1, to be sent back to the bias synapse at that node (solid line), where the neuron circuit is the differential amplifier that takes current as input, from Fig. 4(a). Training for each sample lasts 5 ns. Thus the write current for the first 50 training samples of the first epoch is shown here as well. The same quantity obtained analytically is also plotted (dashed line).

\includegraphics[width=2.5in]Figure8_LR.jpg

Figure 8: (a) Energy dissipation across all synaptic devices per epoch as a function of epoch number during on-chip learning of the designed FCNN on the MNIST dataset, when the duration of the write current pulse is 0.5 ns. (b) Energy dissipation across all synaptic devices per epoch as a function of epoch number when the duration of the write current pulse is 5 ns.

In order to train the FCNN on the MNIST dataset in software, equations (2), (10), (11) and (12) are solved iteratively over several epochs in Python. For 5000 examples in the training set and 10,000 examples in the test set, accuracy is plotted as a function of the number of epochs in Fig. 6. After 200 epochs, the training accuracy is 92 percent. Thus the network has been trained very well on the training set. The test accuracy turns out to be 72 percent, which can be further improved by inserting hidden layers in the network [33, 39, 40]. However, if we insert hidden layers in our FCNN, then after applying the chain rule the weight update expressions of equations (11) and (12) for the next iteration will depend upon the present weights of the synapses in the different layers [1]. In hardware, since the weight of a synapse is stored as its conductance, its value can only be retrieved by passing a current through it, which the feedforward circuit can do but the feedback circuit for weight update cannot (Fig. 1(b)). Hence we do not insert hidden layers in our FCNN.

The maximum magnitude that the weight of any synapse in the network takes during the training process is also obtained and used in the corresponding equations for hardware as w_max. The equations for hardware training of the FCNN (equations 3-14) are next solved iteratively in Python. The read currents I_n at the output layer and the write currents I_write sent by the feedback circuit to the synapses are obtained as functions of iteration. Considering that this training happens in real time in hardware and that the time duration of every iteration is equal to the duration of the write current pulse for the synaptic device, I_n and I_write are next obtained as functions of time. We call this our analytical result. Now, the neuron circuits of Fig. 3(a) and Fig. 4(a) are expected to execute equations (8) and (9), and the SGD calculating feedback circuit of Fig. 5 is expected to execute equations (10)-(14). In order to make sure these circuits we design work as expected, I_n obtained analytically for the case of training on MNIST at output node 1 is fed to the neuron circuit of Fig. 3(a), followed by the SGD calculation circuit of Fig. 5. The duration of the write current pulse to the synaptic devices is taken to be 0.5 ns (Fig. 2(b)). Hence every iteration lasts 0.5 ns. Time dependent simulation of the circuits is carried out on the Cadence Virtuoso simulator. The write current generated by the SGD circuit, to be fed to the synapse connecting input node m with output node n, is obtained from the circuit simulation and compared with the analytical result.
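For reference, the per-iteration sequence followed by this analytical (Python) model of the hardware can be written compactly as below, reusing the helper functions and placeholder constants from the sketches in Section III (read_current, weight_updates, write_pulse); MNIST loading and accuracy evaluation are omitted:

import numpy as np

def train_on_chip(images, targets, w, w_b, epochs=200, eta=0.1):
    """Iterate the hardware equations (3)-(14) over the training set.
    images: (N, 784) pixel intensities in [0, 1]; targets: (N, 10) with entries +1/-1."""
    scale = (G_max - G_min) * V0 / (2 * w_max)      # read-current scale of eq. (7)
    for epoch in range(epochs):
        for x, t in zip(images, targets):
            I = read_current(w, w_b, x)             # crossbar read, eqs. (5)-(7)
            y = np.tanh(I / scale)                  # neuron circuit output, eq. (9)
            for n in range(10):                     # feedback circuit at every output node
                dw, dwb = weight_updates(y[n], t[n], x, eta)
                I_write, V_write = write_pulse(dw)  # pulses sent to the synapses, eq. (14)
                w[n] += dw                          # domain walls moved -> updated weights
                w_b[n] += dwb
    return w, w_b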

Fig. 7(a) shows such a read current waveform at node 1, corresponding to digit '0', for a certain time window, 0 to 25 ns, which corresponds to the first 50 iterations, i.e., the first 50 training samples in the first epoch. The corresponding write current at the bias synapse connected to node 1, obtained analytically as well as through circuit simulations, is plotted in Fig. 7(b). We observe a significant match between the analytical and circuit simulation results. The same process is repeated to obtain the results in Fig. 7(c), except that the circuit of Fig. 4(a) is used as the neuron instead of the circuit of Fig. 3(a) and the duration of each iteration is taken to be 5 ns, corresponding to a 5 ns long "write" current pulse. As a result, a smaller magnitude of write current is needed to bring about the required weight update. We observe a significant match between the analytical and circuit simulation results in this case too, showing that we have indeed been able to design the complete network in hardware to carry out "on-chip" learning on the MNIST dataset.

From I_write,nm, obtained analytically as a function of iteration during the training process for the synaptic device connecting output node n with input node m, the corresponding heat energy (E_nm) dissipated in the heavy metal layer of the domain wall synaptic device can be obtained from the following expression:

E_nm = Σ_i [I_write,nm(i)]^2 R_HM t_pulse   (15)

where the sum runs over all training iterations i and t_pulse is the duration of each write current pulse. Considering the heavy metal to be Pt and the device dimensions used in the simulations, R_HM turns out to be 100 Ω. Adding the energies for all the synaptic devices, the total energy dissipation in the synaptic devices during weight update is calculated and plotted as a function of epoch number in Fig. 8. Fig. 8(a) corresponds to pulses of 0.5 ns duration, while Fig. 8(b) corresponds to pulses of 5 ns duration. Since I_write is lower in the latter case (Fig. 7), the write energy is orders of magnitude lower when the pulse duration is longer. Also, the maximum energy is dissipated during the initial epochs. Once the network starts getting trained, i.e., the accuracies start saturating (Fig. 6), the energy dissipation per epoch also reduces, because the weights start converging to their trained values. Summing over 200 epochs, the total energy dissipated in all the synaptic devices, and hence the energy dissipated per synapse (there are 7850 synapses in the network) over the entire training, is very small, and it is lower for the 5 ns long pulse than for the 0.5 ns long pulse.
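In the same notation as the sketches above, the per-epoch write energy of equation (15) is simply a sum of I^2 R_HM t_pulse terms over all synapses and all iterations of that epoch:

def write_energy_per_epoch(write_currents, R_HM=100.0, t_pulse=0.5e-9):
    """Equation (15): heat dissipated in all heavy metal layers during one epoch.
    write_currents: iterable of arrays of write currents (A), one array per iteration."""
    return sum((I ** 2).sum() * R_HM * t_pulse for I in write_currents)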

V Conclusion

In conclusion, we have designed a feedforward FCNN using domain wall based devices as synaptic devices and transistor based differential amplifier circuits as neurons. We have also designed a feedback circuit using analog electronics that sends write currents to the synaptic devices and updates the corresponding weights. We have simulated the feedforward and feedback circuits together to update the weights of the synapses at every iteration and train the network on the MNIST dataset. We have also reported the performance of this "on-chip learning" through training and test accuracy numbers. The hardware limitation connected with inserting hidden layers limits the test accuracy, which can be a subject of future research. The circuits we design here and the accuracy numbers we report, along with the hardware limitations, are applicable not only to domain wall synapse based FCNN but also, with slight device specific modifications, to other analog implementations of FCNN that use other kinds of spintronic devices, e.g., skyrmionic devices [16, 21, 22], or non-spintronic devices, e.g., memristors [41] and phase change memory [42], as synapses.

Acknowledgment

The authors would like to thank Atul Thakur and Prof. A. P. Pratosh, both from Indian Institute of Technology Delhi, for help with simulations and related discussions. This work is partly supported by a DST INSPIRE Faculty Fellowship awarded to Debanjan Bhowmik.

References

  • [1] Y. LeCun, Y. Bengio and G. Hinton, ”Deep Learning,”Nature, vol. 521, pp 436-444, 2015.
  • [2] J. Misra and I. Saha, ”Artificial Neural Networks in hardware: A survey of two decades of progress.” Neurocomputing vol. 74, pp. 239-255, 2010.
  • [3] C. D. Schuman et al, ”A Survey of Neuromorphic Computing and Neural Networks in Hardware”, arxiv. 1705.06963v1, 2017.
  • [4] A. Diamond, T. Nowotny and M. Schumaker, ”Comparing Neuromorphic Solutions in Action: Implementing a Bio-Inspired Solution to a Benchmark Classification Task on Three Parallel-Computing Platforms”, vol. 9, no. 491, 2016.
  • [5] A. Sengupta, Y. Shim and K. Roy, ”Proposal for an All-Spin Artificial Neural Network: Emulating Neural and Synaptic Functionalities Through Domain Wall Motion in Ferromagnets,”IEEE Transactions on Biomedical Circuits and Systems, vol. 10, no. 6, Dec. 2016.
  • [6] A. Sengupta and K. Roy, ”Encoding neural and synaptic functionalities in electron spin: A pathway to efficient neuromorphic computing,” Applied Physics Reviews, vol. 4 , p. 041105, 2017.
  • [7] J. Grollier, D. Querlioz and M. D. Stiles, ”Spintronic Nanodevices for Bioinspired Computing,” Proceedings of the IEEE, vol. 104, no. 10, 2016.
  • [8] S. Ikeda et al, ”A perpendicular-anisotropy CoFeB–MgO magnetic tunnel junction,” Nature Materials vol. 9, pp. 721-724, 2010.
  • [9] S. Emori, U. Bauer, S.M. Ahn, E. Martinez and G. S. D. Beach, ”Current-driven dynamics of chiral ferromagnetic domain walls,”Nature Materials vol. 12, pp. 611-616, 2013.
  • [10] K. S. Ryu, L. Thomas, S. H. Yang and S. Parkin, ”Chiral spin torque at magnetic domain walls,”Nature Nanotechnology vol. 8, pp. 527-533, 2013.
  • [11] D. Bhowmik, M. E. Nowakowski, L. You, O. Lee, D. Keating, M. Wong,et al., ”Deterministic Domain Wall Motion Orthogonal To Current Flow Due To Spin Orbit Torque,”Scientific Reports vol. 5, p. 11823, 2015.
  • [12] X. Zhang et al. ”Skyrmions in Magnetic Tunnel Junctions,” ACS Appl. Mater. Interfaces vol. 10, pp. 16887-16892, 2018.
  • [13] J. Sampaio, V. Cros, S. Rohart, A. Thiaville and A. Fert, ”Nucleation, stability and current-induced motion of isolated magnetic skyrmions in nanostructures,” Nature Nanotechnology vol. 8, pp. 839-844, 2013.
  • [14] D. Maccariello, W. Legrand, N. Reyren, K. Garcia, K. Bouzehouane, S. Collin, et al., ”Electrical detection of single magnetic skyrmions in metallic multilayers at room temperature,” Nature Nanotechnology 13, pp. 233-237, 2018.
  • [15] A. Sengupta, B. Han and K. Roy, ”Toward a spintronic deep learning spiking neural processor”, IEEE Biomedical Circuits and Systems Conference (BioCAS) 7833852, 2016.
  • [16] U. Saxena, D. Kaushik, M. Bansal, U. Sahu and D. Bhowmik, ”Low energy implementation of feed-forward neural network with back-propagation algorithm using a spin orbit torque driven skyrmionic device”, IEEE Transactions on Magnetics, vol. 54, no. 11, 2018.
  • [17] A. Sengupta and K. Roy, ”A Vision for All-Spin Neural Networks: A Device to System Perspective”, IEEE Transactions on Circuits and Systems- I, vol. 63, no. 12, 2016.
  • [18] A. Hirohata et al, ”Roadmap for Emerging Materials for Spintronic Device Applications”, IEEE Transactions on Magnetics vol. 51, no. 10, 2015.
  • [19] A. Sengupta, A. Banerjee, and K. Roy, ”Hybrid Spintronic-CMOS Spiking Neural Network with On-Chip Learning: Devices, Circuits, and Systems”, Phys. Rev. Appl. vol. 6, no. 064003, 2016.
  • [20] G. Q. Bi and M. M. Poo, ”Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type”, J. Neurosci. 18, pp. 10464–10472, 1998.
  • [21] Y. Huang, W. Kang, X. Zhang, Y. Zhou and W. Zhao, ”Magnetic skyrmion-based synaptic devices,” Nanotechnology vol. 28, p. 08LT02, 2017
  • [22] S. Li, W. Kang, Y. Huang, X. Zhang, Y. Zou and W. Zhao, ”Magnetic skyrmion-based artificial neuron device,” Nanotechnology vol. 28, p. 31LT01, 2017
  • [23] P. U. Diehl and M. Cook, ”Unsupervised Learning of Digit Recognition Using Spike-Timing-Dependent Plasticity”, Frontiers in Computational Neuroscience vol. 9, no. 99, 2015.
  • [24] A. Vansteenkiste et al. ”The design and verification of MuMax3,” AIP Advances vol. 4, p. 107133, 2014.
  • [25] L. Liu, C.F. Pai, Y. Li, H.W. Tseng, D.C.Ralph and R.A. Buhrman, ”Spin-Torque Switching with Giant Spin Hall Effect of Tantalum,” Science vol. 336, p. 6081, 2012.
  • [26] S. Zhang, ”Spin Hall Effect in the Presence of Spin Diffusion”, Phys. Rev. Lett. vol. 85, no. 2, 2000.
  • [27] A. Berger, E. R. J. Edwards, H. T. Nembach, O. Karis, M. Weiler and T. J. Silva, ”Determination of the spin Hall effect and the spin diffusion length of Pt from self-consistent fitting of damping enhancement and inverse spin-orbit torque measurements”, Phys. Rev. B vol. 98, no. 024402, 2018.
  • [28] D. Qu, S. Y. Huang, B. F. Miao, S. X. Huang and C. L. Chien, ”Self consistent determination of spin Hall angles in selected 5d metals by thermal spin injection”, Phys. Rev. B vol. 89, no. 140407(R), 2014.
  • [29] L. Q. Liu, T. Moriyama, D. C. Ralph and R. A. Buhrman, “Spin Torque Ferromagnetic Resonance Induced by the Spin Hall Effect,” Phys. Rev. Lett., vol. 106, no. 036601, 2011.
  • [30] L. Q. Liu, O. J. Lee, T. J. Gudmundsen, D. C. Ralph and R. A. Buhrman, “Current-induced switching of perpendicularly magnetized magnetic layers using spin torque from the spin Hall effect,” Phys. Rev. Lett., vol. 109, no. 096602, 2012.
  • [31] J. G. Zhu and C. Park. ”Magnetic Tunnel Junctions,” Materials Today, vol. 9, no. 11, pp. 36-45, 2006.
  • [32] E. Martinez, G. Finocchio, L. Torres and L. Lopez-Diaz, AIP Advances, vol. 3, no. 072109, 2013.
  • [33] S. Haykin, Neural Networks, Pearson, 1994
  • [34] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, ”Gradient-based learning applied to document recognition,” Proceedings of the IEEE, 86, pp. 2278–2324, 1998.
  • [35] B. Razavi, ”Design of Analog Integrated Circuits,” Chapter 9, Tata McGraw Hill, 2002.
  • [36] B. D. Yammenavar, V. R. Gurunaik, R. N. Bevinagidad and V. U. Gandage, "Design and Analog VLSI Implementation of Artificial Neural Network," International Journal of Artificial Intelligence and Applications (IJAIA), vol. 2, no. 3, 2011.
  • [37] G. Khodabandehloo, M. Mirhassani and M. Ahmadi, ”Analog Implementation of a Novel Resistive-Type Sigmoidal Neuron,” IEEE Trans. on VLSI Systems, vol. 20, no. 4, 2012.
  • [38] A. S. Nandini, S. Madhavan and C. Sharma, ”Design and implementation of analog multiplier with improved linearity,” International Journal of VLSI design and Communication Systems (VLSICS), vol. 3, no. 5, 2012.
  • [39] G. Cybenko, ”Approximation by superpositions of a sigmoidal function,” Math. Control Signals Systems, vol. 2, pp. 303-314, 1989.
  • [40] K. Hornik, M. Stinchcombe and H. White, ”Multilayer feedforward networks are universal approximators,” Neural Networks vol. 2, pp. 359-366, 1989.
  • [41] C .Li, D. Belkin, Y. Li, P. Yan, M. Hu, N. Ge et al., ”Efficient and self-adaptive in-situ learning in multilayer memristor neural networks”, Nature Communications vol. 9, no. 2385, 2018.
  • [42] I. Boybat, M. L. Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma et al. ”Neuromorphic computing with multi-memristive synapses,” Nature Communications, vol. 9, no. 2514, 2018.