# Neuro-RAM Unit with Applications to Similarity Testing and Compression in Spiking Neural Networks

Nancy Lynch
MIT
lynch@csail.mit.edu
Cameron Musco
MIT
cnmusco@mit.edu
Merav Parter
MIT
parter@mit.edu
###### Abstract

We study distributed algorithms implemented in a simplified biologically inspired model for stochastic spiking neural networks. We focus on tradeoffs between computation time and network complexity, along with the role of noise and randomness in efficient neural computation.

It is widely accepted that neural spike responses, and neural computation in general, are inherently stochastic. In recent work, we explored how this stochasticity could be leveraged to solve the ‘winner-take-all’ leader election task. Here, we focus on using randomness in neural algorithms for similarity testing and compression. In the most basic setting, given two length-$n$ patterns of firing neurons, we wish to distinguish if the patterns are equal or $\epsilon$-far from equal.

Randomization allows us to solve this task with a very compact network, using $O\!\left(\frac{\sqrt{n}\log n}{\epsilon}\right)$ auxiliary neurons, which is sublinear in the input size. At the heart of our solution is the design of a $t$-round neural random access memory, or indexing network, which we call a neuro-RAM. This module can be implemented with $O(n/t)$ auxiliary neurons and is useful in many applications beyond similarity testing – e.g., we discuss its application to compression via random projection.

Using a VC dimension-based argument, we show that the tradeoff between runtime and network size in our neuro-RAM is nearly optimal. To the best of our knowledge, we are the first to apply these techniques to stochastic spiking networks. Our result has several implications – since our neuro-RAM can be implemented with deterministic threshold gates, it shows that, in contrast to similarity testing, randomness does not provide significant computational advantages for this problem. It also establishes a separation between feedforward networks whose gates spike with sigmoidal probability functions, and well-studied deterministic sigmoidal networks, whose gates output real number sigmoidal values, and which can implement a neuro-RAM much more efficiently.

## 1 Introduction

Biological neural networks are arguably the most fascinating distributed computing systems in our world. However, while studied extensively in the fields of computational neuroscience and artificial intelligence, they have received little attention from a distributed computing perspective. Our goal is to study biological neural networks through the lens of distributed computing theory. We focus on understanding tradeoffs between computation time, network complexity, and the use of randomness in implementing basic algorithmic primitives, which can serve as building blocks for high level pattern recognition, learning, and processing tasks.

##### Spiking Neural Network (SNN) Model

We work with biologically inspired spiking neural networks (SNNs) [Maa96, Maa97, GK02, Izh04], in which neurons fire in discrete pulses in synchronous rounds, in response to a sufficiently high membrane potential. This potential is induced by spikes from neighboring neurons, which can have either an excitatory or inhibitory effect (increasing or decreasing the potential). As observed in biological networks, neurons are either strictly inhibitory (all outgoing edge weights are negative) or excitatory. As we will see, this restriction can significantly affect the power of these networks.

A key feature of our model is stochasticity – each neuron is a probabilistic threshold unit, spiking with probability given by applying a sigmoid function to its potential. While a rich literature focuses on deterministic circuits [MP69, HT86] we employ a stochastic model as it is widely accepted that neural computation is stochastic [AS94, SN94, FSW08].

##### Computational Problems in SNNs

We consider an $n$-bit binary input vector $X$, which represents the firing status of a set of input neurons. Given a (possibly multi-valued) function $f$, we seek to design a network of spiking neurons that converges to the output vector $f(X)$ (or to any valid output if $f$ is multi-valued) as quickly as possible, using few auxiliary (non-input or output) neurons.

The number of auxiliary neurons used corresponds to the “node complexity” of the network [HH94]. Designing circuits with small node complexity has received a lot of attention – e.g., the work of [FSS84] on PARITY, along with [All89]. Much less is known, however, about what is achievable in spiking neural networks. For most of the problems we study, there is a trivial solution that uses $O(n)$ auxiliary neurons for inputs of size $n$. Hence, we primarily focus on designing sublinear size networks – with $O(n^{1-\delta})$ auxiliary neurons for some $\delta > 0$.

##### Past Work: WTA

Recently, we studied the ‘winner-take-all’ (WTA) leader election task in SNNs [LMP17]. Given a set of firing input neurons, the network is required to converge to a single firing output – corresponding to the ‘winning’ input. In that work, we critically leveraged the noisy behavior of our spiking neuron model: randomness is key in breaking the symmetry between initially identical firing inputs.

##### This Paper: Similarity Testing and Compression

In this paper, we study the role of randomness in a different setting: similarity testing and compression. Consider the basic similarity testing problem: given $X^1, X^2 \in \{0,1\}^n$, we wish to distinguish the case when $X^1 = X^2$ from the case when the Hamming distance between the vectors is large – i.e., $\Delta(X^1, X^2) \ge \epsilon n$ for some parameter $\epsilon$. This problem can be solved very efficiently using randomness – it suffices to sample $O\!\left(\frac{\log n}{\epsilon}\right)$ indices and compare $X^1$ and $X^2$ at these positions to distinguish the two cases with high probability. Beyond similarity testing, similar compression approaches using random input subsampling or hashing can lead to very efficient routines for a number of data processing tasks.
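The sampling strategy can be sketched in a few lines of plain Python (an illustration of the statistical idea only, not of the neural implementation; the function name and the constant `c` are our own):

```python
import math
import random

def similarity_test(x1, x2, eps, c=3):
    """Distinguish x1 == x2 from Hamming distance >= eps * n by
    comparing the vectors at O(log n / eps) random positions."""
    n = len(x1)
    k = math.ceil(c * math.log(n) / eps)  # number of sampled indices
    for _ in range(k):
        i = random.randrange(n)
        if x1[i] != x2[i]:
            return "far"        # sampled a witness index
    return "equal"              # no difference seen at any sample

# If the vectors are eps-far, each sample misses every differing position
# with probability at most (1 - eps), so the test errs with probability
# at most (1 - eps)^k <= n^(-c); equal vectors are never misclassified.
```

Equal inputs are always accepted; the only error is declaring $\epsilon$-far inputs equal, which happens with probability at most $n^{-c}$.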

### 1.1 A Neuro-RAM Unit

To implement the randomized similarity testing approach described above, and to serve as a foundation for other random compression methods in spiking networks, we design a basic indexing module, or random access memory, which we call a neuro-RAM. This module solves:

###### Definition 1 (Indexing).

Given $X \in \{0,1\}^n$ and $Y \in \{0,1\}^{\log n}$, which is interpreted as an integer in $\{0,\dots,n-1\}$, the indexing problem is to output the value of the $dec(Y)^{th}$ bit of $X$. (Here, and throughout, for simplicity we assume $n$ is a power of $2$ so that $\log n$ is an integer.)

Our neuro-RAM uses a sublinear number of auxiliary neurons and solves indexing with high probability on any input. We focus on characterizing the trade-off between the convergence time and network size of the neuro-RAM, giving nearly matching upper and lower bounds.

Generally, our results show that a compressed representation (e.g., the index $Y$) can be used to access a much larger datastore (e.g., $X$), using a very compact neural network. While binary indexing is not very ‘neural’, we can imagine similar ideas extending to more natural coding schemes used, for example, for memory retrieval, scent recognition, or other tasks.
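As a concrete (non-neural) illustration of Definition 1, with a most-significant-bit-first convention chosen arbitrarily for the example:

```python
def indexing(X, Y):
    """Return X[dec(Y)]: the bit of X at the position encoded by Y.
    Y is read most-significant-bit first (an illustrative convention)."""
    idx = 0
    for bit in Y:
        idx = 2 * idx + bit   # accumulate the binary encoding
    return X[idx]

X = [0, 1, 1, 0, 1, 0, 0, 1]   # n = 8 stored bits
Y = [1, 0, 1]                  # log n = 3 index bits, encoding 5
assert indexing(X, Y) == X[5]
```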

##### Relation to Prior Work

Significant work has employed random synaptic connections between neurons – e.g., the Johnson-Lindenstrauss compression results of [AZGMS14] and the work of Valiant [Val00]. While it is reasonable to assume that the initial synapses are random, biological mechanisms for changing connectivity (functional plasticity) act over relatively large time frames and cannot provide a new random sample of the network for each new input. In contrast, stochastic spiking neurons do provide fresh randomness for each computation. In general, transforming a network with random edges into a network with fixed edges and stochastic neurons requires $\Omega(n)$ auxiliary neurons, and thus fails to fulfill our sublinearity goal, as there is typically at least one possible outgoing edge from each input. Our neuro-RAM can be thought of as improving on this naive simulation – by reading a random entry of the input, we simulate a random edge from the specified neuron. Beyond similarity testing, we outline how our result can be used to implement Johnson-Lindenstrauss compression similar to [AZGMS14] without assuming random connectivity.

### 1.2 Our Contributions

#### 1.2.1 Efficient Neuro-RAM Unit

Our primary upper bound result is the following:

###### Theorem 2 (t-round Neuro-RAM).

For every integer $t \le \sqrt{n}$, there is a (recurrent) SNN with $O(n/t)$ auxiliary neurons that solves the indexing problem in $O(t)$ rounds with high probability. In particular, there exists a neuro-RAM unit that contains $O(\sqrt{n})$ auxiliary neurons and solves the indexing problem in $O(\sqrt{n})$ rounds.

Above, and throughout the paper, ‘with high probability’ or w.h.p. denotes with probability at least $1 - 1/n^c$ for some constant $c$. Theorem 2 is proven in Section 3.

##### Neuro-RAM Construction

The main idea is to first ‘encode’ the firing pattern of the $n$ input neurons into the potentials of $O(n/t)$ neurons. These encoding neurons will spike with some probability dependent on their potential. However, simply recording the firing rates of the neurons to estimate this probability is too inefficient. Instead, we use a ‘successive decoding strategy’, in which the firing rates of the encoding neurons are estimated at finer and finer levels of approximation, and adjusted through recurrent excitation or inhibition as decoding progresses. The strategy converges in $O(t)$ rounds – the smaller the network, the more information is contained in the potential of a single neuron, and the longer decoding takes.
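Stripped of the spiking machinery, successive decoding is a bit-by-bit comparison against a falling threshold, with the ‘inhibition’ step cancelling each bit after it is read (a sketch under our own naming; the weights $2^{k-j}$ mirror the encoding used in Section 3):

```python
def encode(bits):
    """Potential induced by a k-bit pattern: bit j carries weight 2^(k-j)."""
    k = len(bits)
    return sum(b * 2 ** (k - j) for j, b in enumerate(bits))

def successive_decode(P, k):
    """Read the k encoded bits back, one per step. Bit j is set iff the
    remaining potential reaches 2^(k-j), since all later weights sum to
    less than that; 'inhibition' then cancels the bit just read."""
    out = []
    for j in range(k):
        fired = P >= 2 ** (k - j)
        out.append(int(fired))
        if fired:
            P -= 2 ** (k - j)   # cancel the contribution of bit j
    return out

assert successive_decode(encode([1, 0, 1, 1]), 4) == [1, 0, 1, 1]
```

Each loop iteration corresponds to a constant number of rounds of the network, so decoding $k$ bits from one potential takes $\Theta(k)$ rounds.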

Theorem 2 shows a significant separation between our networks and traditional feedforward circuits where significantly sublinear sized indexing units are not possible.

###### Fact 3 (See Lower Bounds in [Koi96]).

A circuit solving the indexing problem that consists of AND/OR gates connected in a feedforward manner requires $\Omega(n)$ gates. A feedforward circuit using linear threshold gates requires $\Omega\!\left(\frac{n}{\log n}\right)$ gates.

We note, however, that our indexing mechanism does not exploit the randomness of the spiking neurons, and in fact can also be implemented with deterministic linear threshold gates. Thus, the separation between Theorem 2 and Fact 3 is entirely due to the recurrent (non-feedforward) layout of our network. Since any recurrent network using $s$ neurons and converging in $t$ rounds can be ‘unrolled’ into a feedforward circuit using $O(t \cdot s)$ neurons, Fact 3 shows that the tradeoff between network size and runtime in Theorem 2 is optimal up to an $O(\log n)$ factor, as long as we use our spiking neurons in a way that can also be implemented with deterministic threshold gates. However, it does not rule out improvements using more sophisticated randomized strategies.

#### 1.2.2 Lower Bound for Neuro-RAM in Spiking Networks

Surprisingly, we are able to show that despite the restricted way in which we use our spiking neuron model, significant improvements are not possible:

###### Theorem 4 (Lower Bound for Neuro-RAM in SNNs).

Any SNN that solves indexing in $t$ rounds with high probability in our model must use at least $\Omega\!\left(\frac{n}{t \log^2 n}\right)$ auxiliary neurons.

Theorem 4, whose proof is in Section 4, shows that the tradeoff in Theorem 2 is within a polylogarithmic factor of optimal. It matches the lower bound of Fact 3 for deterministic threshold gates up to polylogarithmic factors, showing that there is not a significant difference in the power of stochastic neurons and deterministic gates in solving indexing.

##### Reduction from SNNs to Deterministic Circuits

We first argue that the output distribution of any SNN is identical to the output distribution of an algorithm that first chooses a deterministic threshold circuit from some distribution and then applies it to the input. This is a powerful observation, as it lets us apply Yao’s principle: an SNN lower bound can be shown via a lower bound for deterministic circuits on a suitably chosen input distribution [Yao77].

##### Deterministic Circuit Lower Bound via VC Dimension

We next show that any deterministic circuit that succeeds with high probability on the uniform input distribution cannot be too small. The bound is via a VC dimension-based argument, which extends the work of [Koi96] on indexing circuits. As far as we are aware, we are the first to give a VC dimension-based lower bound for probabilistic and biologically plausible network architectures and we hope our work significantly expands the toolkit for proving lower bounds in this area. In contrast to our lower bounds on the WTA problem [LMP17], which rely on indistinguishability arguments based on network structure, our new techniques allow us to give more general bounds for any network architecture.

##### Separation of Network Models

Aside from showing that randomness does not give significant advantages in constructing a neuro-RAM (contrasting with its importance in WTA and similarity testing), our proof of Theorem 4 establishes a separation between feedforward spiking networks and deterministic sigmoidal circuits. Our neurons spike with probability computed as a sigmoid of their membrane potential. In sigmoidal circuits, neurons output real numbers, equivalent to our spiking probabilities. A neuro-RAM can be implemented very efficiently in these networks:

###### Fact 5 (See [Koi96], along with [Maa97] for similar bounds).

There is a feedforward sigmoidal circuit solving the indexing problem using $O(\sqrt{n})$ gates. (Note that [MSS91] shows that general deterministic sigmoidal circuits can be simulated by our spiking model. However, the simulation blows up the circuit size polynomially, and so does not yield a sublinear-size spiking neuro-RAM.)

In contrast, via an unrolling argument, the proof of Theorem 4 shows that any feedforward spiking network requires $\tilde{\Omega}(n)$ gates to solve indexing with high probability.

It has been shown that feedforward sigmoidal circuits can significantly outperform standard feedforward linear threshold circuits [MSS91, Koi96]. However, previously it was not known that restricting gates to spike with a sigmoid probability function rather than output the real value of this function significantly affected their power. Our lower bound, along with Fact 5, shows that in some cases it does. This separation highlights the importance of modeling spiking neuron behavior in understanding complexity tradeoffs in neural computation.

#### 1.2.3 Applications to Randomized Similarity Testing and Compression

As discussed, our neuro-RAM is widely applicable to algorithms that require random sampling of inputs. In Section 5 we discuss our main application, to similarity testing – i.e., testing if $X^1 = X^2$ or if $\Delta(X^1, X^2) \ge \epsilon n$. It is easy to implement an exact equality tester using $O(n)$ auxiliary neurons. Alternatively, one can solve exact equality with three auxiliary neurons using mixed positive and negative edge weights for the outgoing edges of inputs. However, this is not biologically plausible – neurons typically have either all positive (excitatory) or all negative (inhibitory) outgoing edges, a restriction included in our model. Designing sublinear sized exact equality testers under this restriction seems difficult – simulating the three neuron solution requires at least $\Omega(n)$ auxiliary neurons – one for each input.

By relaxing to similarity testing and applying our neuro-RAM, we can achieve sublinear sized networks. We use $O\!\left(\frac{\log n}{\epsilon}\right)$ neuro-RAMs, each with $O(\sqrt{n})$ auxiliary neurons, to check equality at $O\!\left(\frac{\log n}{\epsilon}\right)$ random positions of $X^1$ and $X^2$, distinguishing whether $X^1 = X^2$ or $\Delta(X^1, X^2) \ge \epsilon n$ with high probability. This is the first sublinear solution for this problem in spiking neural networks. In Section 5, we discuss possible additional applications of our neuro-RAM to Johnson-Lindenstrauss random compression, which amounts to multiplying the input by a sparse random matrix – a generalization of input sampling.

## 2 Computational Model and Preliminaries

### 2.1 Network Structure

We now give a formal definition of our computational model. A Spiking Neural Network (SNN) consists of input neurons $X = \{x_1,\dots,x_n\}$, output neurons $Z = \{z_1,\dots,z_m\}$, and auxiliary neurons $A = \{a_1,\dots,a_s\}$. The directed, weighted synaptic connections between $X$, $Z$, and $A$ are described by the weight function $w : [X \cup Z \cup A] \times [X \cup Z \cup A] \to \mathbb{R}$. A weight $w(u,v) = 0$ indicates that a connection is not present between neurons $u$ and $v$. Finally, for any neuron $u$, $b(u) \in \mathbb{R}$ is the activation bias – as we will see, roughly, $u$’s membrane potential must reach $b(u)$ for a spike to occur with good probability.

The weight function defining the synapses in our networks is restricted in a few notable ways. The in-degree of every input neuron is zero. That is, $w(u,x) = 0$ for every $u$ and every $x \in X$. This restriction bears in mind that the input layer might in fact be the output layer of another network, and so incoming connections are avoided to allow for the composition of networks in higher level modular designs. Additionally, each neuron is either inhibitory or excitatory: if $u$ is inhibitory, then $w(u,v) \le 0$ for every $v$, and if $u$ is excitatory, then $w(u,v) \ge 0$ for every $v$. All input and output neurons are excitatory.

### 2.2 Network Dynamics

An SNN evolves in discrete, synchronous rounds as a Markov chain. The firing probability of every neuron at time $t$ depends on the firing status of its neighbors at time $t-1$, via a standard sigmoid function, with details given below.

For each neuron $u$, and each time $t \ge 0$, let $u^t = 1$ if $u$ fires (i.e., generates a spike) at time $t$, and $u^t = 0$ otherwise. Let $u^0$ denote the initial firing state of the neuron. Our results will specify the initial input firing states and assume that $u^0 = 0$ for all non-input neurons $u$. For each non-input neuron $u$ and every $t \ge 1$, let $pot(u,t)$ denote the membrane potential at round $t$ and $p(u,t)$ denote the corresponding firing probability ($\Pr[u^t = 1]$). These values are calculated as:

$$pot(u,t) = \sum_{v \in X \cup Z \cup A} w_{v,u} \cdot v^{t-1} - b(u) \qquad \text{and} \qquad p(u,t) = \frac{1}{1 + e^{-pot(u,t)/\lambda}} \tag{1}$$

where $\lambda > 0$ is a temperature parameter, which determines the steepness of the sigmoid. It is easy to see that $\lambda$ does not affect the computational power of the network: a network can be made to work with any $\lambda$ simply by scaling the synapse weights and biases appropriately.

For simplicity we assume throughout that $\lambda = \frac{1}{c \ln n}$ for a sufficiently large constant $c$. Thus by (1), if $pot(u,t) \ge 1$, then $u$ fires in round $t$ w.h.p., and if $pot(u,t) \le -1$, $u$ does not fire w.h.p. (recall that w.h.p. denotes with probability at least $1 - 1/n^c$ for some constant $c$). Aside from this fact, the only other consequence of (1) we use in our network constructions is that $p(u,t) = 1/2$ when $pot(u,t) = 0$. That is, we will use our spiking neurons entirely as random threshold gates, which fire w.h.p. when the incoming potential from their neighbors’ spikes exceeds their bias, don’t fire w.h.p. when the potential is below the bias, and fire randomly when the potential equals the bias. It is an interesting open question whether there are problems that require using the full power of the sigmoidal probability function.

For any vector $V$ we let $V[i]$ denote the value at its $i^{th}$ position, starting from $i = 0$. Given binary $V \in \{0,1\}^k$, we use $dec(V)$ to indicate the integer encoded by $V$. That is, $dec(V) = \sum_{i=0}^{k-1} V[i] \cdot 2^i$. Given an integer $j$ we use $bin(j)$ to denote its binary encoding, where the number of digits used in the encoding will be clear from context. We will often think of the firing pattern of a set of neurons as a binary string. If $U$ is a set of neurons then $U^t$ is the binary string corresponding to their firing pattern at time $t$. Since the input is typically fixed for some number of rounds, we often just write $X$ to refer to the $n$-bit string corresponding to the input firing pattern.

##### Boolean Circuits.

We mention that SNNs are similar to boolean circuits, which have received enormous attention in theoretical computer science. A circuit consists of gates (e.g., threshold gates, probabilistic threshold gates) connected in a directed acyclic graph. This restriction means that a circuit does not have feedback connections or self-loops, which we do use in our SNNs. While we do not work with circuits directly, for our lower bound, we show a transformation from an SNN to a linear threshold circuit. We sometimes refer to circuits as feedforward networks, indicating that their connections are cycle-free.

## 3 Neuro-RAM Network

In this section we prove our main upper bound:

###### Theorem 6 (Efficient Neuro-RAM Network).

There exists an SNN with $O(\sqrt{n})$ auxiliary neurons that solves indexing in $O(\sqrt{n})$ rounds. Specifically, given inputs $X \in \{0,1\}^n$ and $Y \in \{0,1\}^{\log n}$, which are fixed for all rounds $t \ge 0$, the output neuron $z$ satisfies: if $X[dec(Y)] = 1$, then w.h.p. $z$ fires within $O(\sqrt{n})$ rounds. Otherwise, if $X[dec(Y)] = 0$, w.h.p. $z$ does not fire in these rounds.

Theorem 6 easily generalizes to other network sizes, giving Theorem 2, which states the general size-time tradeoff. Here we discuss the basic construction and the intuition behind it. The full details and proofs are given in Appendices A.1 and A.2.

We divide the input neurons $X$ into $\sqrt{n}$ buckets, each containing $\sqrt{n}$ neurons (throughout we assume for simplicity that $n = 2^{2k}$ for some integer $k$; this ensures that $\sqrt{n}$, $\log n$, and $\frac{\log n}{2}$ are integers; if this is not the case, we can simply pad the input, which only affects our time and network size bounds by constant factors):

$$X_0 = \{x_0,\dots,x_{\sqrt{n}-1}\},\;\dots,\; X_{\sqrt{n}-1} = \{x_{(\sqrt{n}-1)\sqrt{n}},\dots,x_{n-1}\}.$$

Throughout, all our indices start from $0$. We encode the firing pattern of each bucket $X_i$ via the potential of a single neuron $e_i$. Set $w_{x_{i\sqrt{n}+j},\,e_i} = 2^{\sqrt{n}-j}$ for all $j \in \{0,\dots,\sqrt{n}-1\}$. In this way, for every round $t$, the total potential contributed to $e_i$ by the firing of the inputs in bucket $X_i$ is equal to:

$$\sum_{j=0}^{\sqrt{n}-1} x_{i\sqrt{n}+j} \cdot 2^{\sqrt{n}-j} = 2 \cdot dec(\bar{X}_i), \tag{2}$$

where $\bar{X}_i$ is the reversal of $X_i$ and $dec(\cdot)$ gives the decimal value of a binary string, as defined in the preliminaries. We defer the precise setting of the biases and detailed discussion of the remaining connections for now, first giving a general description of the network construction.

In addition to the $\sqrt{n}$ encoding neurons $e_0,\dots,e_{\sqrt{n}-1}$, we have three decoding neurons for each of the $\sqrt{n}$ bucket positions ($O(\sqrt{n})$ neurons total). The idea is to select a bucket (via its encoding neuron) using the first $\frac{\log n}{2}$ bits of the index $Y$. Let $Y_1$ and $Y_2$ be the higher and lower order halves of $Y$ respectively. It is not hard to see that using $O(\sqrt{n})$ neurons we can construct a network that processes $Y_1$ and uses it to select the encoding neuron $e_i$ with $i = dec(Y_1)$. When a bucket is selected, the potential of any $e_j$ with $j \ne i$ is significantly depressed compared to that of $e_i$, and so after this selection stage, only $e_i$ fires.

We will then use the decoding neurons to ‘read’ each bit of the potential encoded in $e_i$. The final output is selected using the lower order bits $Y_2$, which can again be done efficiently with $O(\sqrt{n})$ neurons. We call this phase the decoding phase since the bucket neuron $e_i$ encodes the value (in decimal) of its bucket $X_i$, and we need to decode from that value the bit of the appropriate neuron inside the bucket.

The decoding process works as follows: initially, $e_i$ will fire only if the first bit of bucket $X_i$ is on. Note that the weight from this bit to $e_i$ is $2^{\sqrt{n}}$, and is thus more than double the weight from any other input bit – in fact, it exceeds the sum of all the other weights. Thus, by appropriately setting $b(e_i)$, we can ensure that the setting of this single bit determines whether $e_i$ fires initially.

If the first bit is the correct bit to output (i.e., if the last $\frac{\log n}{2}$ bits of the index encode position $0$), this will trigger the output to fire. Otherwise, we iterate. If $e_i$ in fact fired, this triggers inhibition that cancels out the potential due to the first bit of bucket $X_i$. Thus, $e_i$ will now only fire if the second bit of $X_i$ is on. If $e_i$ did not fire, the opposite will happen: further excitation will be given to $e_i$, again ensuring that it can fire as long as the second bit of $X_i$ is on. The network iterates in this way, successively reading each bit, until we reach the one encoded by $Y_2$ and the output is set accordingly. The first decoding neuron for position $j$ is responsible for triggering the output to fire if $j$ is the position encoded by $Y_2$. The second decoding neuron is responsible for providing excitation when $e_i$ does not fire. Finally, the third decoding neuron provides inhibition when $e_i$ does fire.
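The two phases can be mimicked arithmetically: bucket selection by the high-order index bits, then successive decoding of the selected bucket's potential. This is a sketch of the logic only, not of the spiking dynamics; the naming and bit-order conventions are our own:

```python
import math

def neuro_ram(X, Y):
    """Return X at the position encoded by Y, via bucket-select then
    successive decoding (an arithmetic sketch of the Section 3 scheme)."""
    n = len(X)
    r = math.isqrt(n)                  # sqrt(n) buckets of sqrt(n) bits
    assert r * r == n
    half = (n.bit_length() - 1) // 2   # log(n)/2 index bits per half
    i = int("".join(map(str, Y[:half])), 2)   # bucket index (from Y1)
    j = int("".join(map(str, Y[half:])), 2)   # offset in bucket (from Y2)
    # 'encoding neuron' potential of bucket i, as in Eq. (2)
    P = sum(X[i * r + t] * 2 ** (r - t) for t in range(r))
    # successive decoding: one bit per clock tick
    for t in range(r):
        fired = P >= 2 ** (r - t)      # bit t of the bucket is on
        if t == j:
            return int(fired)          # the queried position: output it
        if fired:
            P -= 2 ** (r - t)          # inhibition cancels the read bit
```

Each iteration of the loop corresponds to a constant number of network rounds, giving the $O(\sqrt{n})$ runtime of Theorem 6.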

In Appendix A.1, we describe the first stage, in which we use the first $\frac{\log n}{2}$ index bits to select the bucket to which the desired index belongs.

In Appendix A.2, we discuss the second phase, where we use the last $\frac{\log n}{2}$ bits of $Y$ to select the desired index inside the chosen bucket. Our successive decoding process is synchronized by a clock mechanism, shown in Appendix A.2.1. This clock mechanism consists of a chain of neurons that governs the timing of the steps of our decoding scheme. Roughly speaking, traversing the bits of the chosen bucket from left to right, we spend $O(1)$ rounds checking whether the current index is the one encoded by $Y_2$. If yes, we output the value at that index; if not, the clock “ticks” and we move to the next candidate. This successive decoding scheme is explained in Appendix A.2.2.

Note that our model and the proof of Theorem 6 assume that no auxiliary neurons or the output neuron fire in round $0$. However, in applications it will often be desirable to run the Neuro-RAM for multiple inputs, with execution not necessarily starting at round $0$. We can easily add a mechanism that ‘clears’ the network once it outputs, giving:

###### Observation 7 (Running Neuro-RAM for Multiple Inputs).

The Neuro-RAM of Theorem 6 can be made to run correctly given a sequence of multiple inputs.

## 4 Lower Bound for Neuro-RAM in Spiking Networks

In this section, we show that our neuro-RAM construction is nearly optimal. Specifically:

###### Theorem 8.

Any SNN solving indexing with high probability in $t$ rounds must use $\Omega\!\left(\frac{n}{t \log^2 n}\right)$ auxiliary neurons.

This result matches the lower bound for deterministic threshold gates of Fact 3 up to polylogarithmic factors, demonstrating that the use of randomness cannot give significant runtime advantages for the indexing problem. We note that the lower bound applies (with a small loss) even if one just desires a constant (e.g., $2/3$) probability of success. By replicating any network with success probability $2/3$ $\Theta(\log n)$ times and taking the majority output (which can be computed with just a single additional auxiliary neuron), we obtain a network that solves the problem w.h.p. We thus have:

###### Corollary 9.

Any SNN solving indexing with probability $2/3$ in $t$ rounds must use $\Omega\!\left(\frac{n}{t \log^3 n}\right)$ auxiliary neurons.
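The replication-and-majority amplification used above is standard; a quick simulation (with a hypothetical $2/3$-correct ‘network’ and a replica count of our own choosing) illustrates it:

```python
import random

def majority_amplify(weak_trial, reps, rng):
    """Run `reps` independent copies of a weak boolean procedure and
    return the majority vote of their outputs."""
    votes = sum(weak_trial(rng) for _ in range(reps))
    return int(votes * 2 > reps)

# a stand-in 'network' whose output bit is correct with probability 2/3
weak = lambda rng: 1 if rng.random() < 2 / 3 else 0   # correct answer is 1

rng = random.Random(0)
runs = [majority_amplify(weak, reps=301, rng=rng) for _ in range(100)]
# by a Chernoff bound, each amplified run errs with probability e^(-Omega(reps))
assert all(r == 1 for r in runs)
```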

The proof of Theorem 8 proceeds in a number of steps, which we overview here.

### 4.1 High Level Approach and Intuition

##### Reduction to Deterministic Indexing Circuit.

We first observe that a network with $s$ auxiliary neurons solving the indexing problem in $t$ rounds can be unrolled into a feedforward circuit with $t$ layers and $O(s)$ neurons per layer. We then show that the output distribution of a feedforward stochastic spiking circuit is identical to the output distribution if we first draw a deterministic linear threshold circuit (still with $t$ layers and $O(s)$ neurons per layer) from a certain distribution, and evaluate our input using this random circuit.

This equivalence is powerful since it allows us to apply Yao’s minimax principle [Yao77]: assuming the existence of a feedforward SNN solving indexing w.h.p., given any distribution over the inputs $(X, Y)$, there must be some deterministic linear threshold circuit $C$ which solves indexing w.h.p. over this distribution.

If we consider the uniform distribution over $(X, Y)$, this success probability (at least $1 - 1/n^2$, say) ensures via an averaging argument that for at least half of the possible values of $X$, $C$ succeeds on all but a $2/n^2$ fraction of index values. Note, however, that the index $Y$ can only take on $n$ possible values – thus this ensures that for these values of $X$, $C$ succeeds for all possible values of the index $Y$. Let $G$ be the set of ‘good inputs’ $X$ for which $C$ succeeds on every index.

##### Lower Bound for Deterministic Indexing on a Subset of Inputs.

We have now reduced our problem to giving a lower bound on the size of a deterministic linear threshold circuit which solves indexing on an arbitrary subset of inputs. We do this using VC dimension techniques inspired by the indexing lower bound of [Koi96].

The key idea is to observe that if we fix some input $X$, then given $Y$, $C$ evaluates the function $f_X : \{0,1\}^{\log n} \to \{0,1\}$ whose truth table is given by $X$. Thus $C$ can be viewed as a circuit for evaluating any function $f_X$ for $X \in G$, where the $X$ inputs are ‘programmable parameters’, which effectively change the thresholds of some gates.

It can be shown that the VC dimension of the class of functions computable by a fixed linear threshold circuit with $N$ gates and variable thresholds is $O(N \log N)$. Thus for a circuit with $t$ layers and $s$ gates per layer, the VC dimension is $O(ts \log(ts))$ [BH89]. Further, as a consequence of Sauer’s Lemma [Sau72, She72, AB09], defining the class of functions $\mathcal{F} = \{f_X : X \in G\}$, since $|\mathcal{F}| \ge 2^{n-1}$, we have $VC(\mathcal{F}) = \Omega(n/\log n)$. These two VC dimension bounds, in combination with the fact that we know $C$ can compute any function in $\mathcal{F}$ if its $X$ input bits are fixed appropriately, imply that $ts \log(ts) = \Omega(n/\log n)$. Rearranging (using $\log(ts) = O(\log n)$) gives $s = \Omega\!\left(\frac{n}{t \log^2 n}\right)$, completing Theorem 8.
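In symbols, the chain of bounds (suppressing constants, and assuming $ts \le \mathrm{poly}(n)$ so that $\log(ts) = O(\log n)$) reads:

```latex
\begin{align*}
\text{(Sauer)}\quad & |\mathcal{F}| \le (en)^{\mathrm{VC}(\mathcal{F})}
  \;\;\Longrightarrow\;\;
  \mathrm{VC}(\mathcal{F}) \ge \frac{\log|\mathcal{F}|}{\log(en)}
  = \Omega\!\left(\frac{n}{\log n}\right),\\[2pt]
\text{(circuit bound)}\quad & \mathrm{VC}(\mathcal{F}) \le O\big(ts\log(ts)\big),\\[2pt]
\text{(combining)}\quad & ts\log(ts) \ge \Omega\!\left(\frac{n}{\log n}\right)
  \;\;\Longrightarrow\;\;
  s \ge \Omega\!\left(\frac{n}{t\log^2 n}\right).
\end{align*}
```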

### 4.2 Reduction to Deterministic Indexing Circuit

We now give the argument explained above in detail, first describing how any SNN that solves indexing w.h.p. implies the existence of a deterministic feedforward linear threshold circuit which solves indexing for a large fraction of possible inputs $X$.

###### Lemma 10 (Conversion to Feedforward Network).

Consider any SNN $\mathcal{N}$ with $s$ auxiliary neurons, which given input $X$ that is fixed for rounds $0,\dots,t$, has output $z$ with some firing distribution in round $t$. Then there is a feedforward SNN $\mathcal{N}'$ (an SNN whose directed edges form an acyclic graph) with $O(t \cdot s)$ auxiliary neurons whose output has an identical firing distribution in round $t$ when given $X$, fixed for rounds $0,\dots,t$.

###### Proof.

Let $B = A \cup Z$ – all non-input neurons. We simply produce $t$ duplicates of each auxiliary neuron and of the output $z$, split into layers $B^1,\dots,B^t$. For each edge from a neuron $v \in B$ to $u \in B$ and each $i \in \{1,\dots,t-1\}$, we add an identical edge from $v$’s copy in $B^i$ to $u$’s copy in $B^{i+1}$ (this may be an edge between $u$’s own copies if there is a self-loop on $u$ in $\mathcal{N}$). Any incoming edges from input neurons to $u$ are added to $u$’s copy in $B^i$ for all $i$. The output of $\mathcal{N}'$ is the copy of $z$ in $B^t$.

In round $1$, the joint distribution of the spikes in $B^1$ is identical to the distribution of $B$’s spikes in round $1$ of $\mathcal{N}$, since these neurons have identical incoming connections from the inputs, and since any incoming connections from other auxiliary neurons are not triggered in round $1$, as none of these neurons fire at time $0$.

Assuming via induction that the firing pattern of $B^i$ in round $i$ is identically distributed to that of $B$ in round $i$ of $\mathcal{N}$, since $B^{i+1}$ only has incoming connections from $B^i$ and the inputs, which are fixed, the distribution of $B^{i+1}$’s firing in round $i+1$ is identical to that of $B$ in round $i+1$. Thus the firing of $B^t$ in round $t$ is identically distributed to that of $B$ in round $t$, and since the output of $\mathcal{N}'$ is the copy of $z$ in $B^t$, its distribution in round $t$ is the same as in $\mathcal{N}$. ∎
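The layering in the proof is mechanical; a toy version of the edge duplication (using our own minimal representation of the recurrent graph) looks like:

```python
def unroll(edges, t):
    """Unroll recurrent synapses among non-input neurons into t feedforward
    layers: an edge u -> v becomes (u, layer i) -> (v, layer i+1).
    Edges from input neurons are simply replicated to every layer
    (omitted here for brevity)."""
    return {((u, i), (v, i + 1)) for (u, v) in edges for i in range(1, t)}

# a self-loop on 'a' becomes edges between consecutive copies of 'a'
L = unroll({("a", "a"), ("a", "z")}, t=3)
assert (("a", 1), ("a", 2)) in L and (("a", 2), ("z", 3)) in L
assert len(L) == 4   # two recurrent edges, duplicated across two layer gaps
```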

###### Lemma 11 (Conversion to Distribution over Deterministic Threshold Circuits).

Consider any spiking sigmoidal network $\mathcal{N}$ with $s$ auxiliary neurons, which given input $X$ that is fixed for rounds $0,\dots,t$, has output neuron $z$. Then there is a distribution $\mathcal{D}$ over feedforward deterministic threshold circuits with $O(t \cdot s)$ auxiliary gates such that, for $C$ drawn from $\mathcal{D}$ with output gate $\bar{z}$, $\Pr_{\mathcal{D}}[\bar{z} = 1] = \Pr[z^t = 1]$ when presented input $X$.

###### Proof.

We start with the feedforward network $\mathcal{N}'$ obtained from Lemma 10. This network has $t$ layers of neurons $B^1,\dots,B^t$. Given input $X$ that is fixed for rounds $0,\dots,t$, the output of $\mathcal{N}'$ fires in round $t$ with the same probability as the output of $\mathcal{N}$.

Let $\mathcal{D}$ be a distribution on deterministic threshold circuits that have identical edge weights to $\mathcal{N}'$. Additionally, for any (non-input) neuron $u$, letting $\bar{u}$ be the corresponding gate in the deterministic circuit, set the bias $b(\bar{u}) = \eta$, where $\eta$ is distributed according to a logistic distribution with mean $b(u)$ and scale $\lambda$. The random bias is chosen independently for each gate. It is well known that the cumulative density function of this distribution is equal to the sigmoid function. That is:

$$\Pr[\eta \le x] = \frac{1}{1 + e^{-(x - b(u))/\lambda}}. \tag{3}$$

Consider $C$ drawn from $\mathcal{D}$ and any neuron $u$ in the first layer $B^1$ of $\mathcal{N}'$. $u$ only has incoming edges from the input neurons $X$. Thus, its corresponding gate $\bar{u}$ in $C$ also only has incoming edges from the input neurons. Let $W = \sum_{x \in X} w_{x,u} \cdot x^0$. Then we have:

$$\begin{aligned} \Pr_{\mathcal{D}}[\bar{u} = 1] = \Pr[W - \eta \ge 0] &= \Pr[\eta \le W] &&\text{(deterministic threshold)}\\ &= \frac{1}{1 + e^{-(W - b(u))/\lambda}} &&\text{(logistic distribution CDF (3))}\\ &= \Pr[u^1 = 1]. &&\text{(spiking sigmoid dynamics (1))}\end{aligned}$$

Let $\bar{B}^1$ denote the gates in $C$ corresponding to the neurons in $B^1$. Since in round $1$ all neurons in $B^1$ fire independently, and since all gates in $\bar{B}^1$ fire independently as their random biases are chosen independently, the joint firing distribution of $\bar{B}^1$ is identical to that of $B^1$.

By induction, assume that $\bar{B}^i$ is identically distributed (over the random choice of deterministic circuit $C$) to the firing pattern of $B^i$ in round $i$. Then for any $u \in B^{i+1}$ we have, by the same argument as above, conditioning on some fixed firing pattern $V$ of $B^i$ in round $i$:

$$\Pr_{\mathcal{D}}[\bar{u} = 1 \mid \bar{B}^i = V] = \Pr[u^{i+1} = 1 \mid B^i = V].$$

Conditioned on $B^i = V$, the neurons in $B^{i+1}$ fire independently in round $i+1$. So do the gates of $\bar{B}^{i+1}$, due to their independent choices of random biases. Thus, the above implies that the distribution of $\bar{B}^{i+1}$ conditioned on $\bar{B}^i = V$ is identical to the distribution of $B^{i+1}$’s firing conditioned on $B^i = V$. This holds for all $V$, so the full joint distribution of $\bar{B}^{i+1}$ is identical to that of $B^{i+1}$.

We conclude by noting that the same argument applies to the outputs of $\mathcal{N}'$ and $C$, since $\bar{B}^t$ is identically distributed to $B^t$. ∎
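The equivalence at the heart of the proof – a sigmoid-probability spike matches a deterministic threshold with a logistic random bias – can be checked empirically (toy constants of our own choosing):

```python
import math
import random

lam, bias, W = 0.5, 1.2, 2.0   # temperature, bias b(u), input potential W

def sigmoid_fire(rng):
    """Spiking neuron of Eq. (1): fire with probability sigmoid((W - b)/lam)."""
    p = 1.0 / (1.0 + math.exp(-(W - bias) / lam))
    return rng.random() < p

def logistic_threshold_fire(rng):
    """Deterministic gate with random bias eta ~ Logistic(bias, lam), as in
    the proof: fire iff the potential W reaches the sampled bias."""
    u = rng.random()
    eta = bias + lam * math.log(u / (1.0 - u))   # inverse logistic CDF
    return W >= eta

rng = random.Random(7)
trials = 200_000
a = sum(sigmoid_fire(rng) for _ in range(trials)) / trials
b = sum(logistic_threshold_fire(rng) for _ in range(trials)) / trials
assert abs(a - b) < 0.01   # the two empirical firing rates coincide
```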

Lemma 11 is simple but powerful – it demonstrates the following:

> The output distribution of a spiking sigmoid network is identical to the output distribution of a deterministic feedforward threshold circuit drawn from some distribution .

Thus, the performance of any spiking sigmoid network is equivalent to that of a randomized algorithm which first selects a linear threshold circuit using and then applies this circuit to the input. This equivalence allows us to apply Yao’s minimax principle:

###### Lemma 12 (Application of Yao’s Principle).

Assume there exists an SNN with auxiliary neurons, which given any inputs and which are fixed for rounds , solves indexing with probability in rounds. Then there exists a feedforward deterministic linear threshold circuit with auxiliary gates which solves indexing with probability given drawn uniformly at random.

###### Proof.

This follows from Yao’s principle [Yao77]. In short, given drawn uniformly at random, solves indexing with probability (since by assumption, it succeeds with this probability for any ). By Lemma 11, performs identically to an algorithm which selects a deterministic circuit from some distribution and then applies it to the input. Since the success probability of on the uniform input distribution is just an average over the success probabilities of the deterministic circuits in the support of , at least one circuit in this support must succeed with probability on drawn uniformly at random. ∎
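The averaging step of this argument can be illustrated with a toy instance (the circuit names and probabilities below are made up for illustration): a randomized algorithm's success probability on a fixed input distribution is a weighted average over its deterministic circuits, so some circuit in the support does at least as well as the average.

```python
# Toy illustration of the averaging step in Yao's principle.
# A "randomized circuit" draws one of three deterministic circuits, each
# with a known success probability on the uniform input distribution
# (all numbers are fabricated for illustration).
circuit_success = {"C1": 0.90, "C2": 0.99, "C3": 0.96}
draw_prob = {"C1": 0.25, "C2": 0.50, "C3": 0.25}

# Success of the randomized algorithm = average of deterministic successes.
randomized_success = sum(draw_prob[c] * circuit_success[c] for c in circuit_success)

# At least one circuit in the support does as well as the average,
# so a deterministic circuit matching the randomized guarantee must exist.
best = max(circuit_success.values())
print(best >= randomized_success)
```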

From Lemma 12 we have a corollary which concludes our reduction from our spiking sigmoid lower bound to a lower bound on deterministic indexing circuits.

###### Corollary 13 (Reduction to Deterministic Indexing on a Subset of Inputs).

Assume there exists an SNN with auxiliary neurons, which, given inputs and which are fixed for rounds , solves indexing with probability in rounds. Then there exists some subset of inputs with and a feedforward deterministic linear threshold circuit with auxiliary gates which solves indexing given any and any index .

###### Proof.

Applying Lemma 12 yields which solves indexing on uniformly random with probability . Let if solves indexing correctly on and otherwise. Then:

 1 − 1/(2n) ≤ (1/(n·2^n)) · Σ_{X∈{0,1}^n} Σ_{Y∈{0,1}^{log n}} I(X,Y) = E_{X uniform over {0,1}^n}[ (1/n) Σ_{Y∈{0,1}^{log n}} I(X,Y) ]

which in turn implies:

 E_{X uniform over {0,1}^n}[ (1/n) Σ_{Y∈{0,1}^{log n}} (1 − I(X,Y)) ] ≤ 1/(2n). (4)

If errs on even one index Y for input X, then (1/n) · Σ_Y (1 − I(X,Y)) ≥ 1/n, just by the fact that the inner sum is a nonzero integer. Thus, for (4) to hold, the inner sum must be zero for at least half of the inputs X ∈ {0,1}^n. That is, solves indexing for every input index on some subset S with |S| ≥ 2^{n−1}. ∎
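The integrality-plus-Markov step above can be sanity-checked on a fabricated failure table (the sizes below are small stand-ins, not the paper's parameters): whenever the average fraction of failed indices per input is at most 1/(2n), and any failing input contributes at least 1/n, at least half of the inputs must have no failures at all.

```python
import random

n = 8                 # pattern length, so indices Y range over n values
rng = random.Random(1)

# Fabricate a failure table: most inputs are perfect, a quarter of them
# fail on exactly one of the n indices, keeping the global average
# failure fraction below 1/(2n).
num_inputs = 2 ** 6   # a small stand-in for 2^n
failures = {x: 0 for x in range(num_inputs)}
for x in rng.sample(range(num_inputs), num_inputs // 4):
    failures[x] = 1   # one failed index out of n

avg_fail_fraction = sum(f / n for f in failures.values()) / num_inputs
assert avg_fail_fraction <= 1 / (2 * n)

# Markov: any input with a failure contributes >= 1/n to the average,
# so at most half the inputs can have any failure at all.
perfect = sum(1 for f in failures.values() if f == 0)
print(perfect >= num_inputs // 2)
```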

### 4.3 Lower Bound for Deterministic Indexing on a Subset of Inputs

With Corollary 13 in place, we now turn to lower bounding the size of a deterministic linear threshold circuit which solves the indexing problem on some subset of inputs with . To do this, we employ VC dimension techniques first introduced for bounding the size of linear threshold circuits computing indexing on all inputs [Koi96].

Consider fixing some input , such that the output of is just a function of the index . Specifically, with fixed, computes the function whose truth table is given by . Note that the output of with fixed is equivalent to the output of a feedforward linear threshold circuit where each gate with an incoming edge from has its threshold adjusted to reflect the weight of this edge if .

We define two sets of functions. Let be the set of all functions computable using some as defined above. Further, let be the set of all functions computable by any circuit which is generated by removing the input gates of and adjusting the threshold on each remaining gate to reflect the effects of any inputs with . We have and hence, letting denote the VC dimension of a set of functions, we have . We can now apply two results. The first gives a lower bound :

###### Lemma 14 (Corollary 3.8 of [AB09] – Consequence of Sauer’s Lemma [Sau72, She72]).

For any set of boolean functions with :

 VC(H) ≥ log|H| / (log n + log e).
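For intuition, the bound of Lemma 14 can be checked by brute force on a tiny class (the class of one-dimensional threshold functions below is our own toy choice, not from the paper): it has n + 1 distinct functions and VC dimension 1, and the bound log|H| / (log n + log e) is indeed below 1.

```python
import math
from itertools import combinations

def vc_dimension(H, domain):
    """Brute-force VC dimension of a finite class H over a finite domain.

    Each h in H is a tuple of boolean values, one per domain point. A set S
    of k points is shattered iff the restrictions of H to S realize all 2^k
    patterns.
    """
    d = 0
    for k in range(1, len(domain) + 1):
        if any(len({tuple(h[i] for i in S) for h in H}) == 2 ** k
               for S in combinations(range(len(domain)), k)):
            d = k
    return d

# Toy class: threshold functions x -> [x >= t] on {0, ..., n-1}.
n = 16
domain = list(range(n))
H = list({tuple(1 if x >= t else 0 for x in domain) for t in range(n + 1)})

lower_bound = math.log2(len(H)) / (math.log2(n) + math.log2(math.e))
print(vc_dimension(H, domain), lower_bound <= vc_dimension(H, domain))
```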

We next upper bound . We have the following, whose proof is in Appendix B:

###### Lemma 15 (Linear Threshold Circuit VC Bound).

Let be the set of all functions computed by a fixed feedforward linear threshold circuit with gates (i.e. fixed edges and weights), where each gate has a variable threshold. Then:

Applying the bounds of Lemmas 14 and 15 along with gives:

###### Lemma 16 (Deterministic Circuit Lower Bound).

For any set with , any feedforward deterministic linear threshold circuit with non-input gates which solves indexing given any and any index must have

###### Proof.

Let and be as defined at the beginning of the section. We have . At the same time, by Lemma 14 we have

for some fixed constant . By Lemma 15 we have . We can thus conclude that , and so . ∎

We conclude by proving our main lower bound:

###### Proof of Theorem 8.

The existence of a spiking sigmoidal network with auxiliary neurons, solving indexing with probability in rounds implies via Corollary 13 the existence of a feedforward deterministic linear threshold circuit with non-input gates solving indexing on some subset of inputs with . Thus by Lemma 16 we must have . ∎

## 5 Applications to Similarity Testing and Compression

### 5.1 Similarity Testing

###### Theorem 17 (Similarity Testing).

There exists an SNN with auxiliary neurons that solves the approximate equality testing problem in rounds. Specifically, given inputs which are fixed for all rounds , the output satisfies w.h.p. if . Further, if , then w.h.p.

Our similarity testing network will use copies of our neuro-RAM from Theorem 6, labeled and for all . The idea is to employ auxiliary neurons whose values encode a random index . By feeding the inputs and into and , we can check whether and match at position . Checking different random indices suffices to identify w.h.p. whether . Additionally, if , the patterns never differ at any of the checks, and so the output is never triggered. We use the following:

###### Observation 18.

Consider with . Let be chosen independently and uniformly at random in . Then for ,

 Pr[(X1)_{i_t} = (X2)_{i_t} for all t ∈ {1, ..., T}] ≤ 1/n^c.
###### Proof.

For any fixed , as we select indices uniformly at random and and differ on at least an fraction of their positions. Additionally, each of these events is independent since are chosen independently, so the probability that all sampled positions agree is at most , which gives the claimed bound. ∎
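Observation 18 is easy to simulate. The sketch below (with illustrative values n = 1024, ε = 0.25, c = 2, and T = ⌈c·ln(n)/ε⌉ sampled indices, all our own choices) builds a pair of ε-far patterns and counts how often T uniformly random index checks fail to find a difference; misses are essentially never observed.

```python
import math
import random

def differs_at_sampled_index(X1, X2, T, rng):
    """Check T independent uniformly random positions; report any mismatch."""
    n = len(X1)
    return any(X1[i] != X2[i] for i in (rng.randrange(n) for _ in range(T)))

rng = random.Random(2)
n, eps, c = 1024, 0.25, 2
X1 = [rng.randint(0, 1) for _ in range(n)]
X2 = list(X1)
for i in rng.sample(range(n), int(eps * n)):  # make X2 eps-far from X1
    X2[i] ^= 1

T = math.ceil(c * math.log(n) / eps)  # O(log n / eps) random checks
miss = sum(not differs_at_sampled_index(X1, X2, T, rng) for _ in range(2000))
print(miss)  # misses occur with probability roughly (1 - eps)^T per trial

# Identical patterns, by contrast, can never trigger a false difference.
false_alarm = differs_at_sampled_index(X1, X1, T, rng)
print(false_alarm)
```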

#### 5.1.1 Implementation Sketch

It is clear that the above strategy can be implemented in the spiking sigmoidal network model – we sketch the construction here. By Theorem 6, we require auxiliary neurons for the neuro-RAMs employed, which dominates all other costs.

It suffices to present a random index to each pair of neuro-RAMs and for rounds (the number of rounds required for the network of Theorem 6 to process an -bit input). To implement this strategy, we need two simple mechanisms, described below.

##### Random Index Generation:

For each of the index neurons in we set and add a self-loop . In round , since they have no inputs, each neuron has potential and fires with probability . Thus, represents a random index in . To propagate this index we can use a single auxiliary inhibitory neuron , which has bias and for every input neuron . Thus, fires w.h.p. in round and continues firing in all later rounds, as long as at least one input fires.

We add an inhibitory edge from to for all with weight . The inhibitory edges from will keep the random index ‘locked’ in place. The inhibitory weight of prevents any without an active self-loop from firing w.h.p. but allows any with an active self-loop to fire w.h.p. since it will still have potential .

If both inputs are , will not fire w.h.p. However, in this case our network can simply output , since , so it does not matter whether the random indices stay fixed.
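The locking mechanism can be sketched as a toy simulation (idealized, noise-free weights; the self-loop weight 2 and inhibition 1 below are illustrative stand-ins for the w.h.p. behavior of the actual stochastic network): each index neuron fires with probability 1/2 in round 1, and from then on a strong self-loop minus the inhibitor's edge keeps exactly the initially firing neurons active.

```python
import random

def run_index_lock(num_neurons, rounds, rng):
    """Toy simulation of the random-index locking gadget.

    Round 1: each index neuron fires independently with probability 1/2
    (potential 0, sigmoid(0) = 1/2). Later rounds: potential is
    self_loop_weight * prev_state - inhibition, so a neuron fires iff it
    fired in the previous round, freezing the sampled index in place.
    """
    state = [1 if rng.random() < 0.5 else 0 for _ in range(num_neurons)]
    history = [list(state)]
    for _ in range(rounds - 1):
        state = [1 if (2 * s - 1) > 0 else 0 for s in state]  # 2*prev - inhibitor
        history.append(list(state))
    return history

rng = random.Random(3)
hist = run_index_lock(num_neurons=10, rounds=5, rng=rng)
print(all(h == hist[0] for h in hist))  # the sampled index stays locked
```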

##### Comparing Outputs:

We next handle comparing the outputs of and to perform equality checking. We use two auxiliary neurons – and . is excitatory and fires w.h.p. as long as at least one of or has an active output. is an inhibitor that fires only if both and have active outputs. We then connect to our output with weight and connect with weight for all . We set . In this way, fires in round w.h.p. if for some , exactly one of or has an active output in round and hence an inequality is detected. Otherwise, does not fire w.h.p. This behavior gives the output condition of Theorem 17.
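The comparison gadget computes an XOR of the two neuro-RAM outputs from an OR gate and an AND gate. A deterministic sketch (abstracting away the w.h.p. stochastic thresholds; the unit weights and biases below are illustrative):

```python
def threshold(potential, bias):
    """Deterministic stand-in for a (w.h.p.) spiking threshold decision."""
    return 1 if potential >= bias else 0

def inequality_detector(out1, out2):
    """Comparison gadget: the excitatory neuron ORs the two outputs, the
    inhibitory neuron ANDs them, and the output neuron receives excitation
    minus inhibition, so it fires iff exactly one output is active (XOR)."""
    a = threshold(out1 + out2, 1)  # excitatory: at least one output active
    b = threshold(out1 + out2, 2)  # inhibitory: both outputs active
    return threshold(a - b, 1)     # excitation from a, inhibition from b

print([inequality_detector(x, y) for x in (0, 1) for y in (0, 1)])
# -> [0, 1, 1, 0]: fires exactly when the two outputs disagree
```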

### 5.2 Randomized Compression

We conclude by discussing informally how our neuro-RAM can be applied beyond similarity testing to other randomized compression schemes. Consider the setting where we are given input vectors . Let denote the matrix of all inputs. Think of as being a large ambient dimension, which we would like to reduce before further processing.

One popular technique is Johnson-Lindenstrauss (JL) random projection, where is multiplied by a random matrix with to give the compressed dataset . Regardless of the initial dimension , if is set large enough, preserves significant information about . is enough to preserve the distances between all points w.h.p. [KN14], is enough to use for approximate -means clustering or -rank approximation [BZD10, CEM15], and preserves the full covariance matrix of the input and so can be used for approximate regression and many other problems [CW13, Sar06].

JL projection has been suggested as a method for neural dimensionality reduction [AZGMS14, GS12], where is viewed as a matrix of random synapse weights, which connect the input neurons representing to the output neurons representing . While this view is quite natural, we often want to draw with fresh randomness for each input . This is not possible using changing synapse weights, which evolve over a relatively long time scale. Fortunately, it is possible to simulate these random connections using our neuro-RAM module.

Typically, is sparse so that it can be multiplied by efficiently. In one of the most efficient constructions [CW13], it has just a single nonzero entry in each row, which is chosen randomly to be and placed in a uniformly random position in the row. Thus, computing a single bit of requires selecting on average random columns of , multiplying their entries by a random sign, and summing them together. This can be done with a set of neuro-RAMs, each using auxiliary neurons, which select the random columns of . In total we will need networks – the maximum column sparsity of with high probability, yielding auxiliary neurons total. In contrast, a naive simulation of random edges using spiking neurons would require auxiliary neurons, which is less efficient whenever . Additionally, our neuro-RAMs can be reused to compute multiple entries of , which is not the case for the naive simulation.
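A sparse sketch of this kind is simple to simulate numerically. The sketch below (dimensions n = 4096 and m = 256 are illustrative choices; this is a count-sketch-style construction in the spirit of [CW13], not the paper's network) routes each input coordinate to one uniformly random output row with a random ±1 sign, and checks that the Euclidean norm of a random vector is approximately preserved.

```python
import math
import random

def sparse_jl(x, m, rng):
    """Sparse JL sketch: each coordinate of x lands in one uniformly random
    output row with a random +-1 sign, so each output row sums ~ len(x)/m
    signed input entries."""
    y = [0.0] * m
    for xi in x:
        y[rng.randrange(m)] += rng.choice((-1.0, 1.0)) * xi
    return y

rng = random.Random(4)
n, m = 4096, 256  # illustrative ambient and compressed dimensions
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

norm = math.sqrt(sum(v * v for v in x))
sketch_norm = math.sqrt(sum(v * v for v in sparse_jl(x, m, rng)))
print(abs(sketch_norm / norm - 1.0) < 0.3)  # norm preserved up to small error
```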

Traditionally, the value of an entry of is a real number, which cannot be directly represented in a spiking neural network. In our construction, the value of the entry is encoded in its potential, and we leave as an interesting open question how this potential should be decoded or otherwise used in downstream applications of the compression.

### Acknowledgments

We are grateful to Mohsen Ghaffari. Some of the ideas of this paper came up while visiting him at ETH. We would like to thank Sergio Rajsbaum, Ron Rothblum and Nir Shavit for helpful discussions.

## References

• [AB09] Martin Anthony and Peter L Bartlett. Neural network learning: Theoretical foundations. Cambridge University Press, 2009.
• [All89] Eric Allender. A note on the power of threshold circuits. In FOCS, 1989.
• [AS94] Christina Allen and Charles F Stevens. An evaluation of causes for unreliability of synaptic transmission. PNAS, 1994.
• [AZGMS14] Zeyuan Allen-Zhu, Rati Gelashvili, Silvio Micali, and Nir Shavit. Sparse sign-consistent Johnson–Lindenstrauss matrices: Compression with neuroscience-based constraints. PNAS, 2014.
• [BH89] Eric B Baum and David Haussler. What size net gives valid generalization? In NIPS, 1989.
• [BZD10] Christos Boutsidis, Anastasios Zouzias, and Petros Drineas. Random projections for -means clustering. In NIPS, 2010.
• [CEM15] Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu. Dimensionality reduction for k-means clustering and low rank approximation. In STOC, 2015.
• [CW13] Kenneth L Clarkson and David P Woodruff. Low rank approximation and regression in input sparsity time. In STOC, 2013.
• [FSS84] Merrick Furst, James B Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Theory of Computing Systems, 17(1):13–27, 1984.
• [FSW08] A Aldo Faisal, Luc PJ Selen, and Daniel M Wolpert. Noise in the nervous system. Nature Reviews Neuroscience, 9(4):292–303, 2008.
• [GK02] Wulfram Gerstner and Werner M Kistler. Spiking neuron models: Single neurons, populations, plasticity. Cambridge University Press, 2002.
• [GS12] Surya Ganguli and Haim Sompolinsky. Compressed sensing, sparsity, and dimensionality in neuronal information processing and data analysis. Annual Review of Neuroscience, 2012.
• [HH94] Bill G Horne and Don R Hush. On the node complexity of neural networks. Neural Networks, 7(9):1413–1426, 1994.
• [HT86] John J Hopfield, David W Tank, et al. Computing with neural circuits- a model. Science, 233(4764):625–633, 1986.
• [Izh04] Eugene M Izhikevich. Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5):1063–1070, 2004.
• [KN14] Daniel M Kane and Jelani Nelson. Sparser Johnson-Lindenstrauss transforms. JACM, 2014.
• [Koi96] Pascal Koiran. VC dimension in circuit complexity. In CCC, 1996.
• [LMP17] Nancy Lynch, Cameron Musco, and Merav Parter. Computational tradeoffs in biological neural networks: Self-stabilizing winner-take-all networks. In ITCS, 2017.
• [Maa96] Wolfgang Maass. On the computational power of noisy spiking neurons. In NIPS, pages 211–217, 1996.
• [Maa97] Wolfgang Maass. Networks of spiking neurons: the third generation of neural network models. Neural Networks, 10(9):1659–1671, 1997.
• [MP69] Marvin Minsky and Seymour Papert. Perceptrons. 1969.
• [MSS91] Wolfgang Maass, Georg Schnitger, and Eduardo D Sontag. On the computational power of sigmoid versus boolean threshold circuits. In FOCS, 1991.
• [Sar06] Tamas Sarlos. Improved approximation algorithms for large matrices via random projections. In FOCS, 2006.
• [Sau72] Norbert Sauer. On the density of families of sets. Journal of Combinatorial Theory, Series A, 13(1):145–147, 1972.
• [She72] Saharon Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages. Pacific Journal of Mathematics, 41(1):247–261, 1972.
• [SN94] Michael N Shadlen and William T Newsome. Noise, neural codes and cortical organization. Current Opinion in Neurobiology, 4(4):569–579, 1994.
• [Val00] Leslie G Valiant. Circuits of the Mind. Oxford University Press on Demand, 2000.
• [Yao77] Andrew Chi-Chin Yao. Probabilistic computations: Toward a unified measure of complexity. In FOCS, 1977.

## Appendix A Missing Details for Implementing Neuro-RAM

### a.1 First Stage: Bucket Selection

To implement the bucket selection stage, we connect the neurons in to each such that if the potential of is increased significantly, and if the potential of remains the same. By setting this potential increase to a very large value and the bias to a correspondingly very large value, will not fire with high probability unless . This selection phase can be implemented with auxiliary neurons.

Specifically, for each , we have two neurons , connected as follows:

• is an excitatory neuron with and . In this way, if , and so w.h.p.

• is an inhibitory neuron with and . So again, if , and so w.h.p.

The behavior of and can be summarized as:

###### Lemma 19.

For any , if then w.h.p. .

For each , we then have an auxiliary excitatory neuron , connected as follows:

• if and otherwise.

• if and otherwise.

• .

• .

We have the following lemma:

###### Lemma 20.

For any , if then w.h.p. and for all .

###### Proof.

The connections to and and Lemma 19 ensure that if , w.h.p.

 pot(g_i, t+1) = −b(g_i) + Σ_{j=0}^{(log n)/2 − 1} [w(y_{j,1}, g_i) · y_{j,1}^{t+1} + w(y_{j,1'}, g_i) · y_{j,1'}^{t+1}]
              = 1 − 2∥bin(i)∥₁ + 2 Σ_{j=0}^{(log n)/2 − 1} [bin(i)_j · y_j^t + (bin(i)_j − 1) · y_j^t]
              = 1 − 2∥bin(i)∥₁ + 2 Σ_{j=0}^{(log n)/2 − 1} [bin(i)_j² + (bin(i)_j − 1) · bin(i)_j]
              = 1 − 2∥bin(i)∥₁ + 2∥bin(i)∥₁ = 1.

Thus, if , w.h.p. Similarly, if then for any