# Voltage-driven Building Block for Hardware Belief Networks

Orchi Hassan, Kerem Y. Camsari, and Supriyo Datta
School of Electrical and Computer Engineering, Purdue University, IN, 47907
###### Abstract

Probabilistic spin logic (PSL) based on networks of binary stochastic neurons (or p-bits) has been shown to provide a viable framework for many functionalities including Ising computing, Bayesian inference, invertible Boolean logic and image recognition. This paper presents a hardware building block for the PSL architecture, consisting of an embedded MTJ and a capacitive voltage adder of the type used in neuMOS. We use SPICE simulations to show how identical copies of these building blocks (or weighted p-bits) can be interconnected with wires to design and solve a small instance of the NP-complete Subset Sum Problem fully in hardware.
Keywords - Probabilistic computing, Embedded MTJ, p-bits, p-circuits, Invertible Boolean logic, Subset Sum Problem

## I Introduction

Probabilistic spin logic (PSL) has been shown to provide a viable framework for Ising computing Sutton et al. (2017); Behin-Aein et al. (2016); Shim et al. (2017), Bayesian inference Behin-Aein et al. (2016); Faria et al. (2018), invertible Boolean logic Camsari et al. (2017a), and image recognition Zand et al. (2017). The PSL model is defined by two equations Camsari et al. (2017a), loosely analogous to a neuron and a synapse. The former is what we call the p-bit, whose output $m_i$ is related to its dimensionless input $I_i$ by the relation

$$m_i(t+\Delta t) = \mathrm{sgn}\{\mathrm{rand}(-1,1) + \tanh(I_i(t))\} \tag{1a}$$

where rand(−1,+1) is a random number uniformly distributed between −1 and +1, and $t$ is the normalized time unit. The synapse generates the input $I_i$ from a weighted sum of the states of other p-bits according to the relation

$$I_i(t) = I_0\Big(h_i(t) + \sum_j J_{ij}\, m_j\Big) \tag{1b}$$

where $h_i$ is the on-site bias, $J_{ij}$ is the weight of the coupling from the $j$-th p-bit to the $i$-th p-bit, and $I_0$ is a dimensionless constant. The objective of this paper is to present a voltage-driven hardware building block using present-day device technologies such as embedded MRAM Lin et al. (2009) and floating-gate MOS transistors, such that identical copies of the same block can be interconnected with wires to implement Eqs. 1.
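As a behavioral reference for Eqs. 1a and 1b, the following is a minimal Python sketch, not the authors' SPICE model. The function names (`pbit_update`, `synapse`, `run_psl`) are hypothetical, and the small AND-gate weights `J_AND`, `h_AND` are an illustrative assumption chosen because their four lowest-energy states can be checked by hand; they are not taken from this paper. P-bits are updated in random but sequential order.

```python
import math
import random

def pbit_update(I_i, rng):
    """Eq. 1a: return +1 or -1; the average output is tanh(I_i)."""
    return 1 if rng.uniform(-1.0, 1.0) + math.tanh(I_i) > 0 else -1

def synapse(i, m, J, h, I0):
    """Eq. 1b: dimensionless input to p-bit i from the other p-bits."""
    return I0 * (h[i] + sum(J[i][j] * m[j] for j in range(len(m))))

def run_psl(J, h, I0, sweeps, seed=0):
    """Sample the network; returns a histogram {state tuple: count}."""
    rng = random.Random(seed)
    n = len(h)
    m = [rng.choice((-1, 1)) for _ in range(n)]
    counts = {}
    for _ in range(sweeps):
        # Visit every p-bit once per sweep, in a freshly shuffled order.
        for i in rng.sample(range(n), n):
            m[i] = pbit_update(synapse(i, m, J, h, I0), rng)
        counts[tuple(m)] = counts.get(tuple(m), 0) + 1
    return counts

# Illustrative (assumed) weights for p-bits (A, B, C) encoding C = A AND B,
# with logic 0/1 represented by m = -1/+1.
J_AND = [[0, -1, 2],
         [-1, 0, 2],
         [2, 2, 0]]
h_AND = [1, 1, -2]

SWEEPS = 20000
counts = run_psl(J_AND, h_AND, I0=1.0, sweeps=SWEEPS)
valid = {(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)}
valid_fraction = sum(c for s, c in counts.items() if s in valid) / SWEEPS
print(f"fraction of samples on truth-table rows: {valid_fraction:.3f}")
```

With these assumed weights the sampler should spend most of its time on the four truth-table rows; raising `I0` sharpens the histogram at the cost of slower hopping between rows.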

The paper is organized as follows: We first show a complete hardware mapping for the weighted p-bit by augmenting a recently introduced Magnetic Random Access Memory (MRAM) type stochastic unit with a floating-gate MOS-based capacitive network. We then show how the results of a fully interconnected p-bit circuit closely approximate the ideal equations, using the example of an “invertible” Full Adder that can perform 1-bit addition and subtraction. Finally, we show how such invertible Full Adders can be interconnected to solve a simple instance of the NP-complete Subset Sum Problem.

Each example in this paper has been obtained using the full SPICE model which simply uses transistors, capacitors and resistors without any additional complex circuitry or processing.

## II Building block

Figure 1: (a) The voltage-driven building block has two components corresponding to Eqs. 2a and 2b. The first is the p-bit on the right, implemented through an embedded low-barrier unstable MTJ Camsari et al. (2017a) with two inverters added to give positive and negative outputs. The low-barrier MTJ can be designed using low-barrier or circular nanomagnets. The second is the capacitive voltage adder with an inverter structure on the left, similar to the floating-gate MOS transistors used in neuMOS devices Shibata and Ohmi (1992). We call this combination of a p-bit and its weight logic a weighted p-bit (Wp-bit) and denote it with the block diagram in (b). (c) Shows how an inverter helps amplify the input ($\bar{V}_i$) of the capacitive network to give $V_{in,i}$ at the gate of the embedded MTJ. (d) Shows the relation of the input gate voltage of the NMOS ($V_{in,i}$) to the output ($V_{OUT+}$). (d) Shows the transfer characteristics of the Wp-bit as a whole. The input in each case is swept from −0.4 V to +0.4 V in 1 μs. The yellow dots are time-averaged values at each point over 300 ns and the solid blue line is a tanh fit with the fitting parameters $\nu_0$ and $V_0$ using the magnet parameters in Camsari et al. (2017b). All transistors were modeled using minimum-size ($n_{fin}=1$) 14 nm HP-FinFET Predictive Technology Models.

Our building block has two components corresponding to the two Eqs. 1a,b. The one on the right is an embedded low-barrier unstable MTJ Camsari et al. (2017a) which provides a stochastic output whose average value is controlled by the input voltage:

$$V_{out,i} = \frac{V_{DD}}{2}\,\mathrm{sgn}\left(\mathrm{rand}(-1,+1) + \tanh\frac{V_{in,i}}{V_0}\right) \tag{2a}$$

where $\pm V_{DD}/2$ are the supply voltages, and $V_0$ is a parameter (∼22 mV) describing the width of the sigmoidal response. The value of $V_0$ depends on the details of the 1T/1 MTJ in the embedded MRAM structure Camsari et al. (2017b). $G_0$ of the MTJ is matched to center the transfer characteristics of the whole Wp-bit as shown in Fig. 1d. To do that, a DC analysis is performed, where an input voltage of “0” ($V_{GS} = 0.4$ V to turn ON the transistor) is applied at $\bar{V}_i$ and $G_0$ is swept to observe $V_{OUT+}$ and $V_{OUT-}$. The value of $G_0$ for which $V_{OUT+} = V_{OUT-} = 0$ (when $V_{DS} = 0.4$ V) is the value picked for the MTJ conductance. In this case it is $1/G_0 \approx 62$ kΩ, and it seems reasonable considering the RA-products of modern MTJs. The value of $V_0$ depends on the choice of $G_0$ among other factors including transistor characteristics.

The second of Eqs. 1 is implemented by the component on the left of Fig. 1, which is a capacitive voltage adder just like those used in neuMOS devices Shibata and Ohmi (1992); Nakamura et al. (2015). We can write

$$\bar{V}_i = \frac{V_{bias,i}\,C_{b,i} + \sum_j V_{out,j}\,C_{ij}}{C_g + C_{z,i} + C_{b,i} + \sum_j C_{ij}} \tag{2b}$$

Note that the capacitive voltage divider typically attenuates the voltage $\bar{V}_i$ at its output, and the inverter is intended to scale it up to $V_{in,i}$, the two being related approximately by

$$V_{in,i} \approx \frac{V_{DD}}{2}\tanh\frac{\bar{V}_i}{\nu_0} \approx \frac{V_{DD}}{2\nu_0}\,\bar{V}_i \quad \text{if} \quad \bar{V}_i \ll \nu_0 \tag{2c}$$

where $\nu_0$ is a parameter characteristic of the inverter. Eqs. 2a,b can be mapped onto the PSL Eqs. 1a,b by defining

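$$m_i = \frac{V_{out,i}}{V_{DD}/2}, \qquad I_i = \frac{V_{in,i}}{V_0} \tag{3a}$$

$$C_{b,i} = b_i C_0, \qquad C_{z,i} = z_i C_0 \tag{3b}$$

$$h_i = b_i\,\frac{V_{bias,i}}{V_{DD}/2}, \qquad J_{ij} = \frac{C_{ij}}{C_0} \tag{3c}$$

$$I_0 = \frac{\left(\dfrac{V_{DD}}{2\nu_0}\right)\left(\dfrac{V_{DD}}{2V_0}\right)}{(C_g/C_0) + z_i + b_i + \sum_j J_{ij}} \tag{3d}$$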

The significance of $C_0$ is that we assume the input is composed of many identical capacitors $C_0$, and that the weights $J_{ij}$ have been designed to have integer values such that $C_{ij}$ can be implemented by connecting $J_{ij}$ elementary capacitors in parallel. The other coefficients $b_i$, $z_i$ are also integers, and we adjust the number of grounded capacitors $z_i$ to make $z_i + b_i + \sum_j J_{ij}$ a constant $K$, so that $I_0$ is independent of the index $i$:

$$I_0 = \frac{\left(\dfrac{V_{DD}}{2\nu_0}\right)\left(\dfrac{V_{DD}}{2V_0}\right)}{(C_g/C_0) + K} \tag{4}$$

Note that $K$ is usually a fairly large number equal to the sum of all the weights, and to implement a sufficiently large $I_0$ it is important to keep the factor $V_{DD}/2\nu_0$ much greater than 1. This is the reason for using an inverter between the capacitive voltage adder and the p-bit.
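The mapping from the circuit equations to the PSL constant $I_0$ can be checked numerically. The sketch below evaluates Eq. 2b for one row, applies the linearized form of Eq. 2c, and compares the result with $I_0(h_i + \sum_j J_{ij} m_j)$ using Eq. 4. The values of `NU0` and `CG` are hypothetical placeholders (the text does not quote them), `VDD = 0.8` V is inferred from the ±0.4 V sweep, and the weights are non-negative here because the sign of a coupling is set by the choice of output polarity rather than by the capacitor.

```python
import math

VDD = 0.8       # assumed: supplies are +/-VDD/2 and the sweep is +/-0.4 V
V0 = 0.022      # ~22 mV, quoted in the text
NU0 = 0.05      # hypothetical inverter parameter nu_0 (not given in the text)
C0 = 100e-18    # unit capacitor, 100 aF (Fig. 2 caption)
CG = 2 * C0     # hypothetical gate capacitance C_g

def v_bar(v_bias, b, z, v_out, weights):
    """Eq. 2b with C_b = b*C0, C_z = z*C0 and C_ij = weights[j]*C0."""
    num = v_bias * b * C0 + sum(v * w * C0 for v, w in zip(v_out, weights))
    den = CG + (z + b + sum(weights)) * C0
    return num / den

def i0_eq4(K):
    """Eq. 4: I0 for a row whose capacitors add up to K unit capacitors."""
    return (VDD / (2 * NU0)) * (VDD / (2 * V0)) / (CG / C0 + K)

# One row: integer weights, one bias capacitor, two grounded capacitors.
weights, b, z = [1, 2, 4], 1, 2
K = z + b + sum(weights)
m = [1, -1, 1]                          # states of the driving p-bits
v_out = [mj * VDD / 2 for mj in m]      # Eq. 3a
h = b * (VDD / 2) / (VDD / 2)           # Eq. 3c with V_bias = +VDD/2

vb = v_bar(VDD / 2, b, z, v_out, weights)
I_hardware = (VDD / (2 * NU0)) * vb / V0          # linearized Eq. 2c, then Eq. 3a
I_psl = i0_eq4(K) * (h + sum(w * mj for w, mj in zip(weights, m)))  # Eq. 1b

print(f"K = {K}, I0 = {i0_eq4(K):.3f}")
print(f"I_i from circuit equations: {I_hardware:.4f}, from Eq. 1b: {I_psl:.4f}")
```

The two values of $I_i$ agree only within the small-signal approximation of Eq. 2c; the full tanh response of the inverter is not modeled here. The sketch also shows the trade-off noted above: a larger `K` lowers `i0_eq4(K)` unless the inverter gain factor is large.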

Fig. 1b shows the icon we use to represent our building block, which we call a weighted p-bit. The input consists of three types of inputs designated S, D and Q having capacitances $C_0$, $2C_0$ and $4C_0$. Combinations of these are used to implement different weights $J_{ij}$ and different biases $h_i$. Each block has two outputs, $V_{OUT+}$ and $V_{OUT-}$. The choice of output depends on the sign of the corresponding $J_{ij}$. Similarly, different signs of $h_i$ are implemented by choosing $V_{bias,i}$ to be $+V_{DD}/2$ or $-V_{DD}/2$.

In PSL, any given truth table can be implemented using Eq. 1 by choosing appropriate $[J]$ and $h$ matrices Camsari et al. (2017a). Here we show how $[J]$ and $h$ are mapped onto physical hardware with our proposed building block, using only transistors, resistors and capacitors.

Figure 2: Invertible Full Adder with Wp-bits: (a) $[J]$ matrix for implementing a Full Adder, based on Camsari et al. (2017a). (b) Explicitly shows the hardware connections made to one of the inputs (A) (row 12) from the other p-bits, where $1C_0$, $2C_0$, and $4C_0$ represent capacitors in units of $C_0 = 100$ aF. (c) Shows the subcircuit representation of the Full Adder with its input/output terminals: $C_i$, $B$, $A$ input and $S$, $C_o$ output read terminals, and separate corresponding clamping terminals $h_{C_i}$, $h_B$, $h_A$, $h_S$, $h_{C_o}$. We used $8C_0$ for the clamping terminals to ensure that the inputs and outputs follow what is dictated by the external signals.

A Full Adder can be implemented in PSL using the $[J]$ matrix shown in Fig. 2a. $[J]$ defines the interconnections between the 14 p-bits used to design the Full Adder in hardware. Each row of the matrix is realized in terms of capacitive coupling to the gate of the associated terminal.

To ensure that a uniform $I_0$ is applied to each p-bit (Eq. 4), the same weighting factor $K$ needs to be used for all p-bits. To choose a given $K$, we first find the largest total row weight in $[J]$, and then ground enough unit capacitances at every terminal to bring each row up to a common total. The number of extra grounded unit capacitances can be used to control $I_0$, a larger number causing a smaller $I_0$. Fig. 2b shows the explicit connections made to one of the inputs, “A”, and Fig. 2c shows the subcircuit of the Full Adder with $C_i$, $B$, $A$ as inputs, $S$, $C_o$ as the outputs, and $h_{C_i}$, $h_B$, $h_A$, $h_S$, $h_{C_o}$ as the clamping pins.
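The padding step can be sketched as follows. This is an illustrative helper, not the authors' design flow: the name `grounded_caps`, the `extra` parameter, and the example matrix are hypothetical. It assumes the capacitor count of a coupling is $|J_{ij}|$, since the sign is implemented by selecting the positive or negative output of the driving block.

```python
def grounded_caps(J, b, extra=0):
    """Return (K, z): the common total K and the grounded-capacitor count z_i
    that makes z_i + b_i + sum_j |J_ij| equal to K for every row i."""
    loads = [b_i + sum(abs(w) for w in row) for row, b_i in zip(J, b)]
    K = max(loads) + extra          # 'extra' pads every row and lowers I0
    z = [K - load for load in loads]
    return K, z

# Hypothetical 3 p-bit example with integer weights and bias counts.
J_example = [[0, -1, 2],
             [-1, 0, 2],
             [2, 2, 0]]
b_example = [1, 1, 2]

K0, z0 = grounded_caps(J_example, b_example)            # tightest padding
K3, z3 = grounded_caps(J_example, b_example, extra=3)   # weaker I0
print(K0, z0)
print(K3, z3)
```

Every row ends up with the same total number of unit capacitors, so Eq. 4 gives one $I_0$ for the whole network.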

Fig. 3 shows the operation of a Full Adder in the usual forward mode with $(C_i, B, A)$ clamped to the values (0,1,1), which forces $(S, C_o)$ to (0,1) according to the truth table. In the invertible mode $(S, C_o)$ are clamped to (0,1) and the circuit stochastically searches for consistent combinations of $(C_i, B, A)$ that satisfy the truth table: (0,1,1), (1,0,1), and (1,1,0). Fig. 3 shows steady-state (t = 1 μs) histogram plots, after thresholding, of the Full Adder operation in direct and inverted modes side by side with results from the PSL model based on Eq. 1.
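For reference, the input combinations that the invertible mode is expected to visit can be listed by brute force from the Full Adder truth table. The sketch below is a deterministic check, not a model of the stochastic hardware; the function names are hypothetical.

```python
from itertools import product

def full_adder(ci, b, a):
    """Return (S, Co) for a 1-bit full adder."""
    total = ci + b + a
    return total & 1, total >> 1

def consistent_inputs(s, co):
    """All (Ci, B, A) whose outputs match the clamped (S, Co)."""
    return [bits for bits in product((0, 1), repeat=3)
            if full_adder(*bits) == (s, co)]

forward = full_adder(0, 1, 1)          # forward mode: inputs clamped
inverse = consistent_inputs(0, 1)      # invertible mode: outputs clamped
print(forward)
print(inverse)
```

Clamping $(S, C_o) = (0,1)$ leaves exactly the three input rows with two ones, which are the states a correct histogram should concentrate on.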

The good agreement between the ideal PSL model and the coupled SPICE simulation, which solves PTM-based transistor models together with stochastic LLG equations, validates the hardware mapping of the ideal p-bit equations onto the weighted p-bits.

Figure 3: Full SPICE implementation of an invertible Full Adder (14 Wp-bits): The 14 Wp-bit invertible Full Adder circuit is simulated in (a) directed and (b) invertible modes. The clamping values are indicated. All biasing terminals that are not clamped to 1 or 0 are grounded. The histogram for $[C_i\,B\,A\,S\,C_o]$ is obtained after thresholding the voltages ($V<0 \equiv -1$, $V>0 \equiv +1$). The SPICE model (run for 1 μs) is compared with the PSL equations, where each p-bit is updated in random but sequential order Camsari et al. (2017a). In this example a relatively low $I_0 = 1/3$ is chosen to emphasize how the models are in good agreement even in the magnitudes of the minor peaks of the histogram.

Figure 4: SPICE simulation of a 4-bit 3SUM problem (9 × 14 = 126 Wp-bit network): (a) The circuit is constructed by interconnecting two rows of invertible Full Adders (FA) to construct a 3-number, 4-bit adder. The sum S is clamped to the desired value and A, B, C resolve themselves to create all the possible 3-number subsets out of all non-negative numbers 0 to $2^4-1$ that satisfy A+B+C=S. (b) Shows the results when S is clamped to 15. Note how A, B and C get correlated to satisfy the sum with different combinations. In this example, the inputs A, B, C are unconstrained and can take on any value between 0 and 15.

## IV 3SUM Problem

3SUM is a decision problem in complexity theory that asks whether three elements of a given set can sum up to zero. A variant of the problem requires the three numbers to add up to a given constant. This problem has a polynomial-time solution and is not known to be NP-complete. In this section, we show how the invertibility feature of the Full Adders can be exploited to design a hardware 3SUM solver. In the next section, we show how the 3SUM hardware can be modified to design a general solver for the NP-complete Subset Sum Problem.

The invertibility property of the Full Adders ensures that, given the sum, the circuit can provide the possible input combinations for that sum, as shown in Fig. 4a. An n-bit, 3-number adder circuit implemented in PSL can therefore provide solution sets for the 3SUM problem when the sum is clamped to a given value.

Fig. 4a shows the circuit constructed out of Full Adders to solve a 4-bit 3SUM problem. Each of the Full Adders in the circuit is a 14 p-bit invertible adder of the kind shown in Fig. 3. The first row of adders adds the two 4-bit numbers A and B and feeds its output X to the next row of adders, which adds X and C to give the sum S. Because p-circuits are invertible, if we clamp the sum S, the circuit naturally explores all possible sets and multisets of the integers from 0 to $2^4-1$ that add up to S. The given set for the problem could be implemented by clamping certain bits of A, B and C, or external circuitry could be used to detect only the results that belong to the given set. Fig. 4b shows how A, B and C fluctuate between values that satisfy the clamped sum 15.
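The state space that the clamped circuit explores can be enumerated directly, which gives a reference against which sampled (A, B, C) values could be checked. The sketch below mirrors the two-row structure (X = A + B, then S = X + C) in ordinary integer arithmetic; it is a brute-force illustration with hypothetical names, not a model of the p-circuit dynamics.

```python
def three_sum_states(target, n_bits=4):
    """All ordered (A, B, C) with n_bits-wide entries and A + B + C == target."""
    top = 2 ** n_bits
    states = []
    for a in range(top):
        for b in range(top):
            x = a + b              # first row of adders: X = A + B
            c = target - x         # second row: X + C = S, solved for C
            if 0 <= c < top:
                states.append((a, b, c))
    return states

states_15 = three_sum_states(15)
print(len(states_15))
print(states_15[:3])
```

For a clamped sum of 15 there are 136 ordered triples, so a long enough run of the circuit should wander among many distinct states rather than settle on one.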

## V Subset-sum Problem (SSP)

Figure 5: SPICE simulation of a 3-input, 3-bit Subset Sum Problem (7 × 14 = 98 Wp-bit network): (a) A 3-input, 3-bit binary adder that adds three numbers A, B, C. Unlike the 3SUM case, the inputs are constrained to given values specified by the set, chosen to be G = {1,2,4} in this example. Then a target S is selected and the outputs of the adders are clamped to the target value as shown in (b). (c) Shows three different instances of a target where the inputs find a consistent combination (the correct subset of G) to satisfy the target. Histograms show that the most probable state is the correct subset in a ≈ 16 ns simulation. Another important difference from the 3SUM circuit is that the information flow is directed from the target (second layer of adders) to the first layer of adders.

In this section we show how the hardware circuit that was designed for the 3SUM problem can be modified to solve a small instance of the subset-sum problem (SSP) Cormen (2009), which is believed to be a fundamentally difficult problem in computer science (NP-complete). In the SSP, a set G with a finite number of positive numbers is defined. The decision problem is then to ask whether there is a subset $S' \subseteq G$ whose elements sum to a specified target. For example, Fig. 5 shows a circuit that is programmed with the set G = {1, 2, 4} and a target that is defined by 4 bits. In the 3SUM circuit the input bits (A, B, C) were left “floating”; here, each input is constrained to a given number (1, 2, 4) by clamping the remaining bits of that input. For example, the two least significant bits of A are clamped to zero to make A either 4 or 0. Under these conditions, clamping the output to a specified target makes the circuit search for a consistent input combination to find a subset that satisfies the clamped target. Fig. 5c shows three example targets where the inputs get correlated to satisfy the clamped sum. The invertibility feature that is utilized to solve the SSP in this hardware is similar to that discussed in the context of memcomputing Traversa and Di Ventra (2017); however, the physical mechanisms are completely different.
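The answer the clamped circuit should converge to can be written down by exhaustive search for such a small instance. The sketch below, with the hypothetical helper `subsets_hitting`, treats each input as either its set element or 0, which is the constraint imposed by clamping the unused bits; it is a reference check only and does not scale, unlike the hardware search it is compared against.

```python
from itertools import product

def subsets_hitting(G, target):
    """All subsets of G (as lists) whose elements sum to target."""
    hits = []
    for mask in product((0, 1), repeat=len(G)):
        chosen = [g for g, keep in zip(G, mask) if keep]
        if sum(chosen) == target:
            hits.append(chosen)
    return hits

G = [1, 2, 4]
for target in (3, 5, 7):
    print(target, subsets_hitting(G, target))
```

Because the elements of G = {1, 2, 4} are distinct powers of two, every reachable target has exactly one subset, so each histogram is expected to show a single dominant peak.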

One striking difference between the SSP design we considered and the 3SUM hardware is the direction of information flow. In 3SUM the connections were from the first layer of Full Adders to the second, as in normal addition (Fig. 4a). In the SSP, we observed that reversing these connections, so that they run from the second layer of adders to the first layer, drastically improves the accuracy of the solution (Fig. 5a). A similar observation regarding the directional flow of information for another inverse problem using p-circuits (integer factorization) was made in Camsari et al. (2017a). Here we have limited the discussion to a small instance of the SSP, which would in general require more layers of Full Adders in both the vertical and horizontal directions to account for more elements in the set and for larger element sizes. This example illustrates how invertibility can be combined with standard digital VLSI design to construct any general “cost function” for hard problems of computer science in an asynchronously running hardware platform, without any external clocking.

## VI Conclusion

In this paper we have proposed a compact building block for Probabilistic Spin Logic (PSL) that combines a recently proposed embedded MRAM-based p-bit with a capacitive network that can be implemented using Floating Gate MOS (FGMOS) transistors, similar to the neuMOS concept. We have shown by extensive SPICE simulations that the results of the hardware model for the weighted p-bit agree well with the behavioral equations of PSL. Even though an FGMOS-based capacitive network to do the voltage addition seems like a natural option for the p-bit, we note that the device equations for a conductance network would have been essentially the same. Moreover, our discussion was only about static weights, but an FPGA-like reconfigurable weighting scheme can also be employed, either by using transistor-based gates or by additional multiplexing circuitry, to perform online learning or to redesign p-circuits. Finally, using the basic building block, we have shown how a small instance of a hardware solver for the NP-complete Subset Sum Problem can be designed using the unique invertibility feature of p-circuits.

## Acknowledgment

This work was supported by the National Science Foundation (NSF) through E2CDA.

## References

• Sutton et al. (2017) B. Sutton, K. Y. Camsari, B. Behin-Aein, and S. Datta, Scientific Reports 7 (2017).
• Behin-Aein et al. (2016) B. Behin-Aein, V. Diep, and S. Datta, Scientific Reports 6 (2016).
• Shim et al. (2017) Y. Shim, A. Jaiswal, and K. Roy, Journal of Applied Physics 121, 193902 (2017), http://dx.doi.org/10.1063/1.4983636.
• Faria et al. (2018) R. Faria, K. Y. Camsari, and S. Datta, arXiv preprint arXiv:1801.00497 (2018).
• Camsari et al. (2017a) K. Y. Camsari, R. Faria, B. M. Sutton, and S. Datta, Phys. Rev. X 7, 031014 (2017a).
• Zand et al. (2017) R. Zand, K. Y. Camsari, I. Ahmed, S. D. Pyle, C. H. Kim, S. Datta, and R. F. DeMara, arXiv preprint arXiv:1710.00249 (2017).
• Lin et al. (2009) C. Lin, S. Kang, Y. Wang, K. Lee, X. Zhu, W. Chen, X. Li, W. Hsu, Y. Kao, M. Liu, et al., in Electron Devices Meeting (IEDM), 2009 IEEE International (IEEE, 2009) pp. 1–4.
• Shibata and Ohmi (1992) T. Shibata and T. Ohmi, IEEE Transactions on Electron Devices 39, 1444 (1992).
• Camsari et al. (2017b) K. Y. Camsari, S. Salahuddin, and S. Datta, IEEE Electron Device Letters 38, 1767 (2017b).
• Nakamura et al. (2015) N. Nakamura, K. Shimada, T. Matsuda, and M. Kimura, in Future of Electron Devices, Kansai (IMFEDK), 2015 IEEE International Meeting for (IEEE, 2015) pp. 90–91.
• Cormen (2009) T. H. Cormen, Introduction to Algorithms (MIT Press, 2009).
• Traversa and Di Ventra (2017) F. L. Traversa and M. Di Ventra, Chaos: An Interdisciplinary Journal of Nonlinear Science 27, 023107 (2017).