MorphIC: A 65-nm 738k-Synapse/mm² Quad-Core Binary-Weight Digital Neuromorphic Processor with Stochastic Spike-Driven Online Learning


Charlotte Frenkel, Jean-Didier Legat, and David Bol
Abstract

Recent trends in the field of artificial neural networks (ANNs) and convolutional neural networks (CNNs) investigate weight quantization as a means to increase the resource- and power-efficiency of hardware devices. As full on-chip weight storage is necessary to avoid the high energy cost of off-chip memory accesses, memory reduction requirements for weight storage pushed toward the use of binary weights, which were demonstrated to have a limited accuracy reduction on many applications when quantization-aware training techniques are used. In parallel, spiking neural network (SNN) architectures are explored to further reduce power when processing sparse event-based data streams, while on-chip spike-based online learning appears as a key feature for applications constrained in power and resources during the training phase. However, designing power- and area-efficient SNNs still requires the development of specific techniques in order to leverage on-chip online learning on binary weights without compromising the synapse density. In this work, we demonstrate MorphIC, a quad-core binary-weight digital neuromorphic processor embedding a stochastic version of the spike-driven synaptic plasticity (S-SDSP) learning rule and a hierarchical routing fabric for large-scale chip interconnection. The MorphIC SNN processor embeds a total of 2k leaky integrate-and-fire (LIF) neurons and more than two million plastic synapses for an active silicon area of 2.86 mm² in 65-nm CMOS, achieving a high density of 738k synapses/mm². MorphIC demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff on the MNIST classification task compared to previously-proposed SNNs, while keeping a competitive energy-accuracy tradeoff.

Neuromorphic engineering, spiking neural networks, binary weights, synaptic plasticity, hierarchical networks-on-a-chip, online learning, stochastic computing, event-based processing, CMOS digital integrated circuits, low-power design.

I Introduction

The massive deployment of neural network accelerators as inference devices is currently hindered by the memory footprint and power consumption required for high-accuracy classification [Whatmough17]. Two trends are being explored in order to solve this issue. The first trend consists in optimizing current artificial neural network (ANN) and convolutional neural network (CNN) architectures. Weight quantization down to binarization is a promising approach, as it simplifies the operations and minimizes the memory footprint, thus avoiding the high energy cost of off-chip memory accesses if all the weights can be stored in on-chip memory [Moons17]. The accuracy drop induced by quantization can be mitigated to acceptable levels for many applications with the use of quantization-aware training techniques that propagate binary weights during the forward pass and keep full-resolution weights for backpropagation updates [Courbariaux16]. The associated off-chip learning setup for quantization-aware training is shown in Fig. 1(a): this strategy allows binary-weight neural networks to perform inference with a favorable energy-area-accuracy tradeoff, as recently demonstrated by binary CNN chips (e.g., [Andri18, Moons18, Bankman18]).

Fig. 1: Learning strategies for binary-weight neural networks. (a) Quantization-aware off-chip learning setup: binary weights are used during the forward pass while full-resolution weights are kept for backpropagation updates [Courbariaux16]. Training is carried out in an off-chip high-performance optimizer, while inference is carried out in the power- and resource-constrained device. (b) On-chip online learning setup, where data-driven weight updates are carried out in parallel with inference in the power- and resource-constrained device. A teacher signal is required for supervised online learning, whereas teacher-less learning is unsupervised.

The second trend consists in changing the neural network architecture and data representation, which is currently being explored with bio-inspired spiking neural networks (SNNs) as a power-efficient neuromorphic processing alternative for sparse event-based data streams [Poon11]. Embedded online learning is a key feature in SNNs as it enables on-the-fly adaptation to the environment [Azghadi14]. Moreover, by avoiding the use of an off-chip optimizer, on-chip online learning allows SNNs to target applications that are power- and resource-constrained during both the training and the inference phases, as shown in Fig. 1(b). Spike-based online learning is an active research area, both in the development of new rules for high-accuracy learning in multi-layer networks (e.g., [Zheng17, Mostafa17, Neftci17, Zenke18]) and in the demonstration of silicon implementations in applications such as unsupervised learning for image denoising and reconstruction [Knag15, Chen18]. However, these approaches currently rely on multi-bit weights.

These two trends mostly evolve in parallel, as only three chips have been proposed previously to leverage the density and power advantage of binary weights with SNNs. First, the TrueNorth chip proposed by IBM is the largest-scale neuromorphic chip with 1M neurons and 256M 1-bit synapses; however, it does not embed online learning [Akopyan15]. Second, the Loihi chip recently proposed by Intel has a configurable synaptic resolution that can be reduced to 1 bit and embeds a programmable co-processor for on-chip learning, though, to the best of our knowledge, this learning has not been demonstrated with a binary synaptic resolution [Davies18]. Finally, Seo et al. propose a stochastic version of the spike-timing-dependent plasticity (S-STDP) rule for online learning in binary synapses [Seo11]. However, S-STDP requires the design of a custom transpose SRAM memory with both row and column accesses, which severely degrades the density advantage of their approach.

It has been demonstrated in [Frenkel17] that the spike-dependent synaptic plasticity (SDSP) learning rule proposed by Brader et al. in [Brader07] allows for more efficient resource usage than STDP: all the information necessary for learning is available in the post-synaptic neuron at pre-synaptic spike time. SDSP requires neither an expensive local synaptic storage of spike timings nor a custom SRAM with both row and column accesses. Therefore, in this work, we propose an efficient stochastic implementation of SDSP compatible with standard high-density foundry SRAMs in order to leverage embedded online learning in binary-weight SNNs.

Beyond plasticity, a second key aspect of spiking neural networks lies in connectivity. The organization of the brain into small-world networks, with dense local connectivity and sparse long-range wiring, leads to efficient clustering of neuronal activity and hierarchical information encoding [Bassett06]. Network-on-chip (NoC) design applied to multi-core SNNs is thus an active research topic [Akopyan15, Davies18, Moradi18, Park17, Navaridas09, Benjamin14, Schemmel10]. In this work, we propose a hierarchical combination of mesh-based routing for inter-chip connectivity, star-based routing for intra-chip inter-core connectivity and crossbar-based routing for local intra-core connectivity. We store all the connectivity information locally in the neuron memory to enable memory-less routers that do not require local mapping table accesses. With only 27 connectivity bits per neuron, this low-memory hierarchical connectivity allows reaching biologically-realistic fan-in and fan-out values of 1k and 2k neurons, respectively.

This paper extends [Frenkel19b] and demonstrates this two-fold approach with MorphIC, a quad-core digital neuromorphic processor: stochastic SDSP (S-SDSP) is combined with a hierarchical routing fabric for large-scale plastic connectivity. MorphIC was prototyped in 65-nm CMOS and embeds 2k leaky integrate-and-fire (LIF) neurons and more than 2M synapses in an active silicon area of 2.86 mm², therefore achieving a high density of 738k 1-bit online-learning synapses per mm². This results in an order-of-magnitude density improvement compared to the only previously-proposed binary-weight online-learning SNN processor, from Seo et al. On the MNIST image recognition task [LeCun98], MorphIC achieves an accuracy of 97.8%. It demonstrates an order-of-magnitude improvement in the area-accuracy tradeoff compared to other SNNs, while keeping a competitive energy-accuracy tradeoff using rank order coding.

The remainder of this paper is structured as follows. The architecture and implementation of the MorphIC SNN processor are provided in Section II, together with detailed descriptions of the hierarchical event routing infrastructure and S-SDSP learning rule. The specifications, measurements and benchmarking results are presented in Section III. Finally, the presented results are discussed in Section IV.

Fig. 2: Block diagram of the MorphIC quad-core neuromorphic processor.
Fig. 3: Block diagram of a MorphIC core. Each core features 512 LIF neurons and 528k binary-weight synapses with embedded S-SDSP-based online learning.
Fig. 5: Architecture of the hierarchical three-level event routing fabric of MorphIC. (a) The level-2 (L2) router handles high-level inter-chip connectivity with four bidirectional address-event representation (AER) links; events are dispatched following a unicast mesh-based strategy. Packet buffering in FIFOs ensures that all links can operate independently. (b) The level-1 (L1) router handles mid-level intra-chip inter-core connectivity with four local links, one for each MorphIC core. Events are dispatched following a multicast star-based strategy. (c) The level-0 (L0) router handles low-level connectivity: it decodes incoming packets and sorts them toward either the controller or the scheduler of the local core. When a local neuron configured for L1 and/or L2 outward connectivity spikes, all its connectivity information is encapsulated in a routing packet before exiting the L0 router. Event types indicated in light blue are testbench-type events that cannot be generated by MorphIC chips.
Fig. 4: Timing diagram of the crossbar operation in a MorphIC core, adapted from the time-multiplexing scheme we previously proposed for the ODIN SNN in [Frenkel19], illustrating the time-multiplexed crossbar operation for a spike event from a source neuron with a 9-bit address, leading to 512 synaptic operations (SOPs). Each SOP lasts two clock cycles. The core controller goes sequentially through all 512 local neurons: it first reads their state in the local SRAM memory and then writes back the updated state retrieved from the leaky integrate-and-fire (LIF) update logic. The synapse SRAM has 128-bit words for density purposes: as MorphIC has 1-bit synapses, 128 synapses are handled per access, and stochastic SDSP (S-SDSP) updates are buffered before being written back to the synapse SRAM memory. Depending on whether the source neuron was on the local core (L0 connectivity) or on another core from the same MorphIC chip (L1 connectivity), the MSB of the synapse SRAM address (L01 flag bit) selects whether L0 or L1 synapses are accessed.

II Architecture and Implementation

A block diagram of the MorphIC quad-core spiking neuromorphic processor is shown in Fig. 2, illustrating its hierarchical routing fabric for large-scale chip interconnection. Level-2 (L2) routers handle inter-chip connectivity, level-1 (L1) routers handle inter-core connectivity and level-0 (L0) routers handle intra-core connectivity (Section II-A). The clock can be either provided externally or generated internally using a configurable-length ring oscillator. A block diagram of the MorphIC core is shown in Fig. 3: each core embeds 512 leaky integrate-and-fire (LIF) neurons configured as a crossbar array with 256k L0 1-bit synapses and 256k L1 1-bit synapses, while 16k L2 synapses can be accessed independently. Each synapse embeds online learning with a stochastic implementation of the spike-dependent synaptic plasticity (S-SDSP) learning rule (Section II-B). Each axon can be configured to multiply its associated synaptic weights by a factor of 1, 2, 4 or 8. Time multiplexing is used to increase the neuron and synapse densities by using shared update circuits and storing neuron and synapse states in local SRAM memory, based on the strategy we previously proposed for the ODIN SNN in [Frenkel19]. Fig. 4 illustrates the time-multiplexed crossbar operation of a MorphIC core when it processes a spike event from a neuron in the local core (L0 connectivity) or from a neuron in another core of the same chip (L1 connectivity). The core controller goes sequentially through all 512 local neurons, leading to 512 synaptic operations (SOPs), and handles the local SRAM memory accesses accordingly. As L2 events target a specific synapse of a neuron (Section II-A), they lead to a single SOP.
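The time-multiplexed crossbar processing of a single L0 or L1 event can be illustrated with a short behavioral model. This is a minimal sketch under stated assumptions, not MorphIC's actual logic: it models a single synapse bank (the L01 flag bit would select the L0 or L1 bank in hardware), assumes a hypothetical memory layout in which each source neuron owns four consecutive 128-bit words (one bit per destination neuron), and uses a simplified integrate-and-fire update without leak, with a hypothetical `THRESHOLD` and reset-to-zero behavior. The function name `process_crossbar_event` is illustrative.

```python
N_NEURONS = 512                          # LIF neurons per core
WORD_BITS = 128                          # synapse SRAM word width
WORDS_PER_ROW = N_NEURONS // WORD_BITS   # 4 words hold the 512 synapses of one source neuron
THRESHOLD = 100                          # hypothetical firing threshold


def process_crossbar_event(src, synapse_sram, membranes, axon_gain=1):
    """Apply sequentially the 512 SOPs triggered by one event from source neuron `src`.

    synapse_sram: list of 128-bit integers, WORDS_PER_ROW words per source neuron
                  (hypothetical layout).
    membranes:    list of 512 membrane potentials, updated in place.
    axon_gain:    per-axon weight multiplier (1, 2, 4 or 8).
    Returns the list of destination neurons that spiked.
    """
    assert axon_gain in (1, 2, 4, 8)
    spikes = []
    word = 0
    for neuron in range(N_NEURONS):
        if neuron % WORD_BITS == 0:
            # One synapse SRAM read serves the next 128 SOPs.
            word = synapse_sram[src * WORDS_PER_ROW + neuron // WORD_BITS]
        weight = (word >> (neuron % WORD_BITS)) & 1   # binary synaptic weight
        membranes[neuron] += weight * axon_gain
        if membranes[neuron] >= THRESHOLD:
            membranes[neuron] = 0                      # hypothetical reset-to-zero
            spikes.append(neuron)
    return spikes
```

With two clock cycles per SOP, one such event occupies the core for 1024 cycles, which corresponds to a core throughput of half the clock frequency in SOP/s (e.g., 27.5 MSOP/s at 55 MHz).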

II-A Hierarchical event routing

Fig. 6: Examples of L0, L1 and L2 connectivity handling at the core level. Blue: L0 connectivity inside core 0, following a typical crossbar operation. Orange: L1 connectivity from neurons in cores 1 and 2 to cores 0 and 3, following a crossbar operation in the destination cores. In this example, as the source neurons have identical 9-bit addresses, they map to the same L1 synapses in the destination cores. Gray: L2 connectivity from a neuron in another MorphIC chip to a specific L2 synapse of a target neuron, broadcast to cores 1, 2 and 3 of the destination chip.

Clustering groups of neurons with dense local and sparse long-range connectivity allows minimizing memory requirements while keeping flexibility and scalability [Moradi18]. This organization, found in the brain, is known as a small-world network. Hierarchy is therefore a key concept in SNN event routing infrastructures for large-scale networks [Akopyan15, Davies18, Moradi18, Park17, Navaridas09, Benjamin14, Schemmel10]. MorphIC uses a heterogeneous hierarchical routing fabric with different router types at each level, as shown in Fig. 5: the L2 router follows a unicast mesh-based dimension-ordered destination-driven operation (Section II-A1), the L1 router follows a multicast star-based source-driven operation (Section II-A2), while the L0 router handles decoding and encoding of the different packet types for local core crossbar-based processing (Section II-A3). Such a heterogeneous event routing infrastructure is deadlock-free and allows for the three connectivity patterns illustrated in Fig. 6, depending on the source neuron location:

  • The source neuron targets neurons in the same core (L0 connectivity): the time-multiplexed crossbar approach of Fig. 4 is followed with the local L0 synapses (e.g., blue pattern in core 0 in Fig. 6).

  • The source neuron targets neurons in any combination of other cores in the same chip (L1 connectivity): the time-multiplexed crossbar approach of Fig. 4 is followed with the L1 synapses of the destination cores. The same L1 synapses are shared with up to three cores (e.g., orange pattern from source neurons in cores 1 and 2 to destination cores 0 and 3 in Fig. 6).

  • The source neuron is located in another MorphIC chip (L2 connectivity): the target is a specific L2 synapse address in any combination of cores in one destination chip (e.g., gray pattern from a source neuron retrieved from the West link toward identical L2 synapse addresses in cores 1, 2 and 3 in Fig. 6). As each neuron has 32 L2 synapses, an L2 synapse address has a width of 14 bits (9 bits for the neuron, 5 bits for the L2 synapse).

Fig. 7: Neuron memory map: structure of a word in the neuron SRAM memory. Each word contains the parameters, state, outward connectivity and the 32 1-bit input L2 synapses of a neuron. At runtime, only the LIF state (i.e., an 11-bit membrane potential, a 4-bit Calcium variable and a 4-bit counter to emulate Calcium leakage) and the L2 synapses can be modified by the LIF and S-SDSP update logic blocks (Fig. 3), respectively. The L2 and L1 connectivity fields occupy a total of only 27 bits per neuron.

Each neuron of MorphIC can use any combination of the aforementioned L0, L1 and L2 connectivity types, which allows reaching a fan-in of 512 (L0) + 512 (L1) + 32 (L2) = 1056 and a fan-out of 512 (L0) + 3×512 (L1) + 4 (L2) = 2052.

The entire connectivity of a network of MorphIC chips is determined by only 27 connectivity bits per neuron, which are stored in the 8-kB neuron SRAM located inside each core (Fig. 3). This SRAM consists of 512 128-bit words, one for each of the 512 LIF neurons of the core, whose structure is outlined in Fig. 7. Destination-based L2 connectivity requires 24 bits in total: the 6-bit chip field stores the 3-bit dx and dy fields encoding the destination chip (Section II-A1), the 4-bit cores field encodes the combination of target cores, and the 5-bit syn and 9-bit neur fields encode the 14-bit L2 synapse address. Source-based L1 connectivity requires only 3 bits per neuron in order to target any combination of the other cores in a MorphIC chip. Unless disabled in the core parameter bank, L0 crossbar connectivity is automatic and does not require further connectivity information. As all the connectivity information is decentralized next to the neurons and then encapsulated in the event packets, the routers do not require local or external mapping tables: they are memory-less beyond simple packet buffering. Let us now discuss the architectural details of the L2, L1 and L0 routers.
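The 27-bit budget can be checked by packing the fields into a single integer. The sketch below is illustrative only: the field widths come from the text (3-bit dx, 3-bit dy, 4-bit cores, 5-bit syn, 9-bit neur, 3-bit L1 mask), but the field order inside the neuron word, the sign-magnitude encoding details and the `Connectivity` class are assumptions, not the actual Fig. 7 layout.

```python
from dataclasses import dataclass

# Hypothetical field order (MSB to LSB) and widths of the 27 connectivity bits.
FIELDS = (("dx", 3), ("dy", 3), ("cores", 4), ("syn", 5), ("neur", 9), ("l1", 3))
SIGNED = {"dx", "dy"}  # 3-bit sign-magnitude chip offsets: one sign bit, two magnitude bits


def encode_offset(d):
    assert -3 <= d <= 3
    return (4 if d < 0 else 0) | abs(d)


def decode_offset(v):
    return -(v & 3) if v & 4 else v & 3


@dataclass
class Connectivity:
    dx: int     # L2: destination chip offset along East/West
    dy: int     # L2: destination chip offset along North/South
    cores: int  # L2: 4-bit mask of target cores in the destination chip
    syn: int    # L2: 5-bit synapse index (32 L2 synapses per neuron)
    neur: int   # L2: 9-bit target neuron address
    l1: int     # L1: 3-bit mask of the other local cores

    def pack(self):
        word = 0
        for name, width in FIELDS:
            value = getattr(self, name)
            if name in SIGNED:
                value = encode_offset(value)
            assert 0 <= value < (1 << width), name
            word = (word << width) | value
        return word

    @classmethod
    def unpack(cls, word):
        values = {}
        for name, width in reversed(FIELDS):
            value = word & ((1 << width) - 1)
            word >>= width
            values[name] = decode_offset(value) if name in SIGNED else value
        return cls(**values)

    def l2_synapse_address(self):
        # 14-bit L2 synapse address: 9-bit neuron address followed by 5-bit synapse index.
        return (self.neur << 5) | self.syn
```

For example, `Connectivity(dx=-2, dy=3, cores=0b1110, syn=17, neur=300, l1=0b101).pack()` fits in 27 bits and unpacks back to the same fields.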

II-A1 Level-2 (L2) router

The L2 router (Fig. 5(a)) handles high-level inter-chip connectivity with four links along the North, South, East and West directions that operate independently and in parallel. Events from/to the four chip-level links and from/to the L1 router are buffered into FIFOs before being dispatched following a standard unicast mesh-based strategy with dimension-ordered routing (i.e., packets are routed along one dimension before the other). Two fields in the chip-level packet, dx and dy, contain the information necessary for destination-based routing. dx and dy have a 3-bit width each (one sign bit, two data bits), which allows routing packets to up to three MorphIC chips in any direction. At each East or West (resp. North or South) hop, the L2 router decrements the value of the dx (resp. dy) data field. When both dx and dy are zero, the packet is forwarded to the L1 router. Distance information is also maintained separately in the event packet: it is 0 for local L0 events and 1 for events received from local L1 connectivity, and then increases by one for each L2 hop, up to a maximum of 7 for events received from a chip located at |dx| = 3 and |dy| = 3. As synapses at all routing levels of MorphIC embed online learning (Section II-B), the probability of synaptic weight update can be modulated by the distance information, following a small-world network modeling strategy. To the best of our knowledge, this is the first SNN to propose online hierarchical learning.
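The dimension-ordered routing and distance tracking can be illustrated as follows. This is a sketch under assumptions not stated in the text: positive dx points East, positive dy points North, and the East/West dimension is resolved first. The functions `next_hop` and `route` are hypothetical names.

```python
def next_hop(dx, dy):
    """One L2 routing decision for a packet with remaining signed chip offsets (dx, dy).

    Returns (port, dx, dy), where the offset of the traversed dimension has moved
    one step toward zero. Assumptions: positive dx = East, positive dy = North,
    East/West resolved before North/South.
    """
    if dx != 0:
        port = "East" if dx > 0 else "West"
        return port, dx - (1 if dx > 0 else -1), dy
    if dy != 0:
        port = "North" if dy > 0 else "South"
        return port, dx, dy - (1 if dy > 0 else -1)
    return "L1", dx, dy  # both offsets are zero: hand the packet to the local L1 router


def route(dx, dy):
    """Follow a packet from its source chip; return the ports taken and the final distance."""
    assert -3 <= dx <= 3 and -3 <= dy <= 3  # 3-bit sign-magnitude fields
    ports = []
    distance = 1  # value for an event leaving its source core through the L1 router
    while True:
        port, dx, dy = next_hop(dx, dy)
        ports.append(port)
        if port == "L1":
            return ports, distance
        distance += 1  # one more inter-chip L2 hop
```

For a destination chip at (3, 3), the packet takes six hops and arrives with the maximum distance value of 7, matching the range that fits in the 3-bit distance modulation.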

The mesh-based dispatcher is controlled by an arbiter, which can be configured either for round-robin or for priority-based operation. Round-robin operation, by cycling through each link independently of the FIFO usage, guarantees a maximum latency for packet processing, while priority-based operation is a greedy approach that allocates processing time to the most active links based on the current FIFO usage.
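The two arbitration policies can be contrasted with a small behavioral model. It is a sketch, not the actual arbiter: the class name `Arbiter`, the tie-breaking toward the lowest index in priority mode and the starting link in round-robin mode are assumptions.

```python
from collections import deque


class Arbiter:
    """Selects which input FIFO the dispatcher serves next (behavioral sketch)."""

    def __init__(self, n_links, mode="round_robin"):
        assert mode in ("round_robin", "priority")
        self.n_links = n_links
        self.mode = mode
        self.last = n_links - 1  # so that link 0 is checked first

    def grant(self, fifos):
        """Return the index of the FIFO to serve, or None if all FIFOs are empty."""
        if self.mode == "priority":
            # Greedy: serve the most-occupied FIFO (ties go to the lowest index).
            best = max(range(self.n_links), key=lambda i: len(fifos[i]))
            return best if fifos[best] else None
        # Round-robin: cycle through the links after the last grant, regardless of
        # occupancy, which bounds the wait of any non-empty FIFO.
        for step in range(1, self.n_links + 1):
            i = (self.last + step) % self.n_links
            if fifos[i]:
                self.last = i
                return i
        return None
```

With FIFOs holding 1, 3, 0 and 2 packets, round-robin grants links 0, 1, 3, 0 in turn, whereas priority mode always picks link 1 until its occupancy drops.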

Fig. 8: 32-bit packet transmission multiplexed into four 8-bit AER transactions at the L2 level. As double-latching synchronization barriers are used on the receiver REQ, the ADDR data can be asserted at the same time as the REQ line on the sender side.

Links in each direction consist of two address-event representation (AER) busses, a sender and a receiver, for a total of eight AER busses per MorphIC chip. AER is a de facto standard for spiking neural network connectivity as it allows high-speed asynchronous communication of spike events between chips using a four-phase handshake protocol [Mortara94, Boahen00]. The MorphIC design being pad-limited, the width of the AER busses has been reduced to 8 bits. Transmission and reception of 32-bit event packets are thus multiplexed into four 8-bit AER transactions, as illustrated in Fig. 8. In order to ensure an asynchronous operation of the AER busses between MorphIC chips, double-latching synchronization barriers have been placed on the receiver REQ and sender ACK handshake lines to limit metastability issues. Due to the increased latency of off-chip packet routing, L2 packet activity should be sparse compared to L1 and L0 activity. L2 events should thus represent high-level features, as illustrated in the experiments outlined in Section III.
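The multiplexing of a 32-bit packet onto an 8-bit AER bus can be sketched as below. The byte order (most significant byte first) is an assumption, and the handshake trace only lists the four phases of each transaction, with the address asserted together with REQ as allowed by the receiver-side synchronization described in the Fig. 8 caption.

```python
def split_packet(packet):
    """Split a 32-bit event packet into four 8-bit AER payloads (MSB first, assumed)."""
    assert 0 <= packet < (1 << 32)
    return [(packet >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def merge_bytes(payloads):
    """Reassemble the 32-bit packet on the receiver side."""
    assert len(payloads) == 4
    packet = 0
    for b in payloads:
        assert 0 <= b <= 0xFF
        packet = (packet << 8) | b
    return packet


def four_phase_transfer(payloads):
    """Trace of the four-phase REQ/ACK handshake for each 8-bit transaction."""
    trace = []
    for b in payloads:
        trace += [f"REQ=1 ADDR=0x{b:02X}", "ACK=1", "REQ=0", "ACK=0"]
    return trace
```

Each packet therefore costs four full handshakes, which is one reason why L2 traffic should remain sparse compared to L1 and L0 traffic.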

II-A2 Level-1 (L1) router

The L1 router (Fig. 5(b)) handles mid-level intra-chip inter-core connectivity with the four local MorphIC cores. This router is based on a star topology and relies on a simple dispatcher that multicasts events to local cores following a source-based approach. It does not contain any FIFO buffering, as pending packets are already buffered in the L2 and L0 routers. An arbiter controls the dispatcher following a configurable round-robin or greedy priority-based operation, similarly to the L2 router.

The L1 router is at the center of the hierarchy. For neuron events from local cores (i.e. ascending-hierarchy events), it handles multicasting to any combination of the other cores toward L1 synapses and/or forwarding to the L2 router toward another MorphIC chip. For events retrieved from the L2 router (i.e. descending-hierarchy events), it handles multicasting to any combination of the MorphIC cores toward L2 synapses.
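Both dispatch directions can be sketched as follows. The mapping of the 3-bit L1 mask onto the three other cores (bit k selects the k-th other core in increasing index order) and the `l2_enabled` flag standing in for the L2 forwarding condition are hypothetical; the text only states that 3 bits suffice to target any combination of the other cores.

```python
def other_cores(src_core, n_cores=4):
    return [c for c in range(n_cores) if c != src_core]


def dispatch_ascending(src_core, l1_mask, l2_enabled):
    """Destinations of an event from a local neuron in core `src_core`.

    Hypothetical mapping: bit k of the 3-bit l1_mask selects the k-th other core.
    `l2_enabled` stands for the condition that makes the L1 router forward the
    packet to the L2 router.
    """
    dests = [("L1_synapses", core)
             for k, core in enumerate(other_cores(src_core))
             if (l1_mask >> k) & 1]
    if l2_enabled:
        dests.append(("L2_router", None))
    return dests


def dispatch_descending(cores_mask):
    """Multicast an event received from the L2 router to the cores in the 4-bit mask."""
    return [("L2_synapse", core) for core in range(4) if (cores_mask >> core) & 1]
```

Under this mapping, `dispatch_ascending(1, 0b101, False)` reaches cores 0 and 3, as in the orange example of Fig. 6, and `dispatch_descending(0b1110)` reaches cores 1, 2 and 3, as in the gray example.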

II-A3 Level-0 (L0) router

The L0 router (Fig. 5(c)) handles low-level intra-core connectivity. This router is divided into two blocks: an interface and a scheduler. The interface handles packet decoding and encoding from/to the L1 router. The packet decoder sorts input packets into different types:

  • configuration packets are used to program the local neuron and synapse SRAMs and the core parameter bank (Fig. 3), they are handled by the controller,

  • monitoring request packets query one byte from the neuron or synapse SRAM, they are handled by the controller,

  • scheduler events are buffered by a FIFO in the core scheduler, they include L2 events targeting a single L2 synapse, L1 events targeting L1 synapses, L0 events targeting L0 synapses, virtual events that directly update a neuron without accessing any physical synapse, teacher events that control the S-SDSP supervision mechanism through the neuron Calcium variables (Section II-B) and the leak events that drive the LIF leakage time constant.

Locally-generated L0 events are buffered directly in a scheduler FIFO; they are not visible from the L1/L2 router hierarchy. Locally-generated events that need to go up the router hierarchy are handled by the packet encoder:

  • monitoring reply packets contain the neuron or the synapse state byte previously queried by a monitoring request packet,

  • L1/L2 events forward the L1 and L2 connectivity information of a source neuron to the L1 router.
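The split of incoming packets between the controller and the scheduler can be sketched as a simple dispatch on the packet type. The enum names below are hypothetical labels for the packet types listed above; the actual bit-level encodings are not modeled.

```python
from collections import deque
from enum import Enum, auto


class PacketType(Enum):
    # Hypothetical names for the incoming packet types; encodings are not modeled.
    CONFIG = auto()
    MONITOR_REQUEST = auto()
    L2_EVENT = auto()
    L1_EVENT = auto()
    L0_EVENT = auto()
    VIRTUAL_EVENT = auto()
    TEACHER_EVENT = auto()
    LEAK_EVENT = auto()


# Configuration and monitoring requests go to the controller; all event types
# are buffered in the scheduler FIFO.
CONTROLLER_TYPES = {PacketType.CONFIG, PacketType.MONITOR_REQUEST}


def sort_incoming(packets):
    """Route decoded (type, payload) packets to the controller or the scheduler FIFO."""
    controller, scheduler_fifo = [], deque()
    for ptype, payload in packets:
        if ptype in CONTROLLER_TYPES:
            controller.append((ptype, payload))
        else:
            scheduler_fifo.append((ptype, payload))
    return controller, scheduler_fifo
```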

Fig. 9: Time-multiplexed S-SDSP update logic. up and down signals represent the values of the S-SDSP update conditions in Eq. (2).
Fig. 10: (a) Circuit diagram of a Galois 17-bit LFSR with characteristic polynomial . (b) Equivalent compact representation. (c) 9-unfolded 17-bit Galois LFSR for single-cycle 9-bit pseudo-random word generation for the S-SDSP online learning rule.

II-B Stochastic spike-dependent synaptic plasticity (S-SDSP)

As the spike-timing-dependent plasticity (STDP) learning rule relies on the relative timing between pre- and post-synaptic spikes, it requires a local synaptic buffering of spike timings, which leads to critical overheads as buffering circuitry has to be replicated inside each synapse [Frenkel17]. In order to avoid this problem, the stochastic binary approach proposed by Seo et al. in [Seo11] involves the design of a custom transpose SRAM with both row and column accesses to carry out STDP updates each time pre- and post-synaptic spikes occur. However, beyond increasing the design time, custom SRAMs do not benefit from DRC pushed rules for foundry bitcells and induce a strong area penalty compared to single-port high-density foundry SRAMs [Frenkel17]. Therefore, STDP cannot be implemented efficiently in high-density digital designs based on standard foundry SRAMs.

The spike-dependent synaptic plasticity (SDSP) learning rule [Brader07] avoids this drawback: the synaptic weight $w$ is updated each time a pre-synaptic event occurs, according to Eq. (1). The update depends solely on the state of the post-synaptic neuron at the time $t_{pre}$ of the pre-synaptic spike, i.e., the membrane potential $V_{mem}$ compared to threshold $\theta_m$ and the Calcium concentration $Ca$ compared to thresholds $\theta_1$, $\theta_2$ and $\theta_3$. The Calcium concentration represents an image of the recent firing activity of the neuron: it disables SDSP updates for high and low post-synaptic neuron activities and helps prevent overfitting [Brader07]. A single-port high-density foundry SRAM can therefore be used for high-density time-multiplexed implementations. However, as SDSP relies on discrete positive and negative steps, it cannot be applied directly to binary weights.

$$
w \rightarrow
\begin{cases}
w + 1 & \text{if } V_{mem}(t_{pre}) \geq \theta_m \text{ and } \theta_1 \leq Ca(t_{pre}) < \theta_3,\\
w - 1 & \text{if } V_{mem}(t_{pre}) < \theta_m \text{ and } \theta_1 \leq Ca(t_{pre}) < \theta_2.
\end{cases}
\tag{1}
$$

Senn and Fusi proposed a bio-inspired stochastic learning rule for binary synapses in [Senn05], where the update conditions rely on the total synaptic input of the post-synaptic neuron at the time of the pre-synaptic spike. However, this information is not easily available in time-multiplexed implementations: as shown in Fig. 4, the destination neurons are processed sequentially, while obtaining the total post-synaptic input of a neuron would require sequential processing of the source neurons instead, which is incompatible with an event-driven operation. Therefore, we propose a stochastic spike-dependent synaptic plasticity (S-SDSP) learning rule suitable for binary weights, as formulated in Eq. (2). It results from the fusion of the stochastic mechanism proposed in [Senn05] with the SDSP update conditions. $X_{up}$ and $X_{down}$ are binary random variables with probabilities $p_{up}$ and $p_{down}$ of being at 1, respectively. The synaptic weight therefore goes from 0 to 1 (resp. 1 to 0) with probability $p_{up}$ (resp. $p_{down}$), depending on the update conditions. The Calcium concentration is implemented as a 4-bit variable; it is stored next to all S-SDSP parameters in the neuron SRAM (Fig. 7).

$$
w \rightarrow
\begin{cases}
1 & \text{if } V_{mem}(t_{pre}) \geq \theta_m,\ \theta_1 \leq Ca(t_{pre}) < \theta_3 \text{ and } X_{up} = 1,\\
0 & \text{if } V_{mem}(t_{pre}) < \theta_m,\ \theta_1 \leq Ca(t_{pre}) < \theta_2 \text{ and } X_{down} = 1.
\end{cases}
\tag{2}
$$
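The S-SDSP update of Eq. (2) can be sketched as a pure function evaluated at pre-synaptic spike time. This is a minimal sketch: the parameter names (`theta_m`, `theta_1`, `theta_2`, `theta_3`) are illustrative, the Calcium windows [θ1, θ3) for potentiation and [θ1, θ2) for depression follow the SDSP update conditions as written above, and Python's `random` module stands in for the LFSR-based generator used in hardware.

```python
import random


def s_sdsp_update(w, v_mem, ca, theta_m, theta_1, theta_2, theta_3, x_up, x_down):
    """Binary S-SDSP update at pre-synaptic spike time.

    w: current binary weight; v_mem, ca: post-synaptic membrane potential and
    Calcium variable; x_up, x_down: Bernoulli samples (1 with probability
    p_up / p_down).
    """
    up = v_mem >= theta_m and theta_1 <= ca < theta_3
    down = v_mem < theta_m and theta_1 <= ca < theta_2
    if up and x_up:
        return 1
    if down and x_down:
        return 0
    return w  # no update condition met, or the random draw rejected the update


def sample(p, rng):
    """Draw a binary random variable that is 1 with probability p."""
    return int(rng.random() < p)
```

A typical call draws `x_up = sample(p_up, rng)` and `x_down = sample(p_down, rng)` from a seeded `random.Random` instance before applying the update, so that the weight flips only with the configured probabilities.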

The proposed S-SDSP update logic is shown in Fig. 9. The binary random variables $X_{up}$ and $X_{down}$ can be generated with probabilities $p_{up}$ and $p_{down}$ using linear feedback shift register (LFSR)-based pseudo-random number generation. In order to generate $p_{up}$ and $p_{down}$ with a resolution similar to the probabilities down to 0.01 used in [Senn05], approximately 6 bits of resolution are required ($2^{-6} \approx 0.016$). Distance-based modulation of $p_{up}$ and $p_{down}$ from small-world network modeling requires another 3 bits of resolution, as the distance information ranges from 0 to 7 (Section II-A). Therefore, we selected a 9-bit resolution for probabilities. As S-SDSP updates must be computed in a single clock cycle, a new 9-bit pseudo-random word is needed at each cycle. Successive iterations of an LFSR can be parallelized by using the unfolding algorithm from [Parhi99], as suggested in [Cheng06], which avoids instantiating parallel LFSRs and saves switching power. The number of parallelized successive iterations is governed by the unfolding factor, which is 9 in this case. The unfolding process and the resulting unfolded LFSR are illustrated in Fig. 10. Unfolding multiplies the combinational logic resources (here, a single XOR gate) by the unfolding factor, while the LFSR period is divided by the unfolding factor. In order to avoid inducing correlation between synapses, the period of the unfolded LFSR must be at least one order of magnitude higher than the number of synapses per neuron. We thus selected a 17-bit depth for the LFSR to be unfolded (Fig. 10(a-b)), giving an unfolded period of about $(2^{17}-1)/9 \approx 14.6$k cycles, compared to 1056 synapses per neuron. The 9-unfolded LFSR is shown in Fig. 10(c). The overhead incurred by the resulting S-SDSP update logic is negligible, as it is shared through time multiplexing among all the L0, L1 and L2 synapses of a MorphIC core.
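The behavior of a 9-unfolded 17-bit Galois LFSR can be modeled in software by performing nine iterations per call. This sketch assumes the maximal-length polynomial $x^{17} + x^{14} + 1$ (the characteristic polynomial of Fig. 10 is not recoverable here), assumes that the 9-bit word is formed from the nine successive output bits, and uses a simple comparison against a 9-bit threshold to obtain a binary random variable; the hardware performs the nine iterations in one cycle with unfolded combinational logic.

```python
WIDTH = 17
TAPS_MASK = (1 << 16) | (1 << 13)  # assumed polynomial x^17 + x^14 + 1 (maximal length)


def galois_step(state):
    """One iteration of a right-shifting 17-bit Galois LFSR; returns (new_state, output_bit)."""
    out = state & 1
    state >>= 1
    if out:
        state ^= TAPS_MASK
    return state, out


def unfolded_step(state, factor=9):
    """Software model of a 9-unfolded LFSR: `factor` successive iterations per call,
    collecting the output bits into one pseudo-random word."""
    word = 0
    for _ in range(factor):
        state, bit = galois_step(state)
        word = (word << 1) | bit
    return state, word


def bernoulli(word, p, bits=9):
    """Return 1 with probability ~p by comparing a 9-bit word to a 9-bit threshold."""
    return int(word < round(p * (1 << bits)))
```

The seed must be non-zero, since the all-zero state is a fixed point of the LFSR. With a period of $2^{17}-1 = 131071$ bits, consuming 9 bits per cycle yields about 14.6k cycles of fresh bits, i.e., roughly 14 times the 1056 synapses of a neuron.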

III Measurements and Benchmarking Results

MorphIC was prototyped in the UMC 8-metal 65-nm low-power (LP) CMOS process. A chip microphotograph is presented in Fig. 11, while specifications and measurement results are provided in Table III. A detailed area breakdown is provided in Table LABEL:table_area. As derived in [Frenkel19], the power consumption of time-multiplexed digital SNN architectures can be modeled by

Fig. 11: MorphIC chip microphotograph, illustrating the floorplan of the neuron and synapse SRAM macros of each core.
| Specification | Value |
|---|---|
| Technology | 65-nm LP CMOS |
| Implementation | Digital |
| Area | |
| Number of cores | 4 |
| Total # neurons (type) | 2k (LIF) |
| Total # synapses (hier.) | 2112k (L0: 1024k, L1: 1024k, L2: 64k) |
| Fan-in (hier.) | 1056 (L0: 512, L1: 512, L2: 32) |
| Fan-out (hier.) | 2052 (L0: 512, L1: 3×512, L2: 4) |
| Online learning | S-SDSP |
| Time constant | |

| Supply voltage | 0.8 V | 1.2 V |
|---|---|---|
| Max. clock frequency | 55 MHz | 210 MHz |
| Leakage power | 45 µW | 190 µW |
| Idle power | 41.3 µW/MHz | 94.0 µW/MHz |
| Energy per SOP | 30 pJ | 65 pJ |
| Energy per L2 hop | 9.0 pJ | 20.3 pJ |
| Energy per L1 hop | 1.7 pJ | 3.8 pJ |
| L2 router bandwidth | 2.3 Mpackets/s/link | 5.7 Mpackets/s/link |
| L1 router bandwidth | 55 Mpackets/s | 210 Mpackets/s |
| Core bandwidth | 27.5 MSOP/s/core | 105 MSOP/s/core |
TABLE I: Specifications and measurements of MorphIC.
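The measured leakage, idle and energy-per-SOP values can be combined into a rough power estimate. The power model derived in [Frenkel19] is not reproduced in this section; the sketch below assumes it takes the form $P = P_{leak} + P_{idle} \cdot f_{clk} + E_{SOP} \cdot r_{SOP}$, which matches the three corresponding rows of the specifications table. It also assumes that all four cores can process SOPs in parallel at two clock cycles per SOP. The names `OPERATING_POINTS` and `power_uw` are illustrative.

```python
# Measured values from the specifications table, per supply voltage (V).
OPERATING_POINTS = {
    0.8: {"f_max_mhz": 55, "p_leak_uw": 45.0, "p_idle_uw_per_mhz": 41.3, "e_sop_pj": 30.0},
    1.2: {"f_max_mhz": 210, "p_leak_uw": 190.0, "p_idle_uw_per_mhz": 94.0, "e_sop_pj": 65.0},
}
N_CORES = 4
CYCLES_PER_SOP = 2


def power_uw(vdd, f_mhz, sop_rate_msop):
    """Estimated power in microwatts, assuming P = P_leak + P_idle * f_clk + E_SOP * r_SOP."""
    op = OPERATING_POINTS[vdd]
    assert 0 < f_mhz <= op["f_max_mhz"]
    max_rate = N_CORES * f_mhz / CYCLES_PER_SOP  # MSOP/s with all cores busy
    assert 0 <= sop_rate_msop <= max_rate
    # pJ x MSOP/s = 1e-12 J x 1e6 /s = 1e-6 W, i.e. microwatts.
    return op["p_leak_uw"] + op["p_idle_uw_per_mhz"] * f_mhz + op["e_sop_pj"] * sop_rate_msop
```

Under these assumptions, running all four cores at full SOP throughput at 0.8 V and 55 MHz gives about 45 + 2271.5 + 3300 ≈ 5.6 mW.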