# A Basic Compositional Model for Spiking Neural Networks

Nancy Lynch
MIT
lynch@csail.mit.edu
Cameron Musco
Microsoft Research
camusco@microsoft.com

## 1 Introduction

This paper is part of a project on developing an algorithmic theory of brain networks, based on stochastic Spiking Neural Network (SNN) models. Inspired by tasks that seem to be solved in actual brains, we are defining abstract problems to be solved by these networks. In our work so far, we have developed models and algorithms for the Winner-Take-All problem from computational neuroscience [LMP17a, Mus18], and problems of similarity detection and neural coding [LMP17b]. We plan to consider many other problems and networks, including both static networks and networks that learn.

This paper is about basic theory for the stochastic SNN model. In particular, we define a simple version of the model. This version assumes that the neurons’ only state is a Boolean, indicating whether the neuron is firing or not. In later work, we plan to develop variants of the model with more elaborate state; we expect that our results should extend to these variants as well, but this remains to be worked out. We also define an external behavior notion for SNNs, which can be used for stating requirements to be satisfied by the networks.

We then define a composition operator for SNNs. We prove that our external behavior notion is “compositional”, in the sense that the external behavior of a composed network depends only on the external behaviors of the component networks. We also define a hiding operator that reclassifies some output behavior of an SNN as internal. We give basic results for hiding.

Finally, we give a formal definition of a problem to be solved by an SNN, and give basic results showing how composition and hiding of networks affect the problems that they solve. We illustrate our definitions with three examples: building a circuit out of gates, building an “Attention” network out of a “Winner-Take-All” network and a “Filter” network, and a toy example involving combining two networks in a cyclic fashion.

## 2 The Model

For our model definitions, we first specify the structure of our networks. Then we describe how the networks execute; this involves defining individual (non-probabilistic) executions and then defining probabilistic behavior. Next we define the external behavior of a network. Finally, we give two examples: a Boolean circuit and a Winner-Take-All network.

### 2.1 Network structure

Assume a universal set $U$ of neuron names. A firing pattern for a set $V \subseteq U$ of neuron names is a mapping from $V$ to $\{0,1\}$. Here, $1$ represents “firing” and $0$ represents “not firing”.

A neural network consists of:

• $N$, a subset of $U$, partitioned into input neurons $N_{in}$, output neurons $N_{out}$, and internal (auxiliary) neurons $N_{int}$. We sometimes write $N_{ext}$ as shorthand for $N_{in} \cup N_{out}$, and $N_{lc}$ as shorthand for $N_{out} \cup N_{int}$. (Here, $lc$ stands for “locally controlled”.)

Each non-input neuron $u \in N_{lc}$ has an associated bias, $bias(u)$; this can be any real number, positive, negative, or 0.

• $E$, a set of directed edges between neurons. We permit self-loops.

Each edge $e \in E$ has a weight, $weight(e)$, which is a nonzero real number.

• $F_0$, an initial firing pattern for the set $N_{lc}$ of non-input neurons.

We assume that input neurons have no incoming edges, not even self-loops. Output neurons may have incoming or outgoing edges, or both.

### 2.2 Executions and probabilistic executions

#### 2.2.1 Executions and traces

A configuration of a neural network $\mathcal{N}$ is a firing pattern for $N$, the set of all the neurons in the network. We consider several related definitions:

• An input configuration is a firing pattern for the input neurons, $N_{in}$. An output configuration is a firing pattern for the output neurons, $N_{out}$. An internal configuration is a firing pattern for the internal neurons, $N_{int}$.

• A non-input configuration is a firing pattern for the internal and output neurons, $N_{lc}$.

• An external configuration is a firing pattern for the input and output neurons, $N_{ext}$.

We define projections of configurations onto subsets of $N$. Thus, if $C$ is a configuration and $M \subseteq N$, then $C\lceil M$ is the firing pattern for $M$ obtained by projecting $C$ onto the neurons in $M$. In particular, we have $C\lceil N_{in}$ for the projection of $C$ on the input neurons, $C\lceil N_{out}$ for the output neurons, $C\lceil N_{int}$ for the internal neurons, $C\lceil N_{ext}$ for the external neurons, and $C\lceil N_{lc}$ for the non-input neurons.

An initial configuration is a configuration $C$ such that $C\lceil N_{lc} = F_0$. The values for the input neurons are arbitrary.

An execution of $\mathcal{N}$ is a (finite or infinite) sequence of configurations, $C_0, C_1, \ldots$, where $C_0$ is an initial configuration. The length of a finite execution $\alpha = C_0, \ldots, C_t$, written $length(\alpha)$, is defined to be $t$. The length of an infinite execution is defined to be $\infty$.

We define projections of executions: If $\alpha = C_0, C_1, \ldots$ is an execution of $\mathcal{N}$ and $M \subseteq N$, then $\alpha\lceil M$ is the sequence $C_0\lceil M, C_1\lceil M, \ldots$. We define an $M$-execution of $\mathcal{N}$ to be $\alpha\lceil M$ for any execution $\alpha$ of $\mathcal{N}$. Note that an $M$-execution restricts the initial firing states of only the non-input neurons that are in $M$, that is, the neurons in $M \cap N_{lc}$. We define an input execution to be an $M$-execution where $M = N_{in}$, and similarly for an output execution ($M = N_{out}$), an internal execution ($M = N_{int}$), an external execution ($M = N_{ext}$), and a locally-controlled execution, or lc-execution ($M = N_{lc}$).

For an execution $\alpha$, we sometimes write $trace(\alpha)$ to denote $\alpha\lceil N_{ext}$, the projection of $\alpha$ on the external neurons. We define a trace of $\mathcal{N}$ to be the trace of any execution of $\mathcal{N}$.

If $\alpha$ is any finite $M$-execution, for $M \subseteq N$, then we define $A(\alpha)$ to be the set of infinite executions $\alpha'$ of $\mathcal{N}$ such that $\alpha$ is a prefix of $\alpha'\lceil M$. This means that $\alpha'$ can have any firing states for the neurons that are not in $M$, except for the initial states of neurons in $N_{lc}$, which are determined by $F_0$. We will often consider the special case where $M = N_{ext}$, i.e., where $\alpha$ is a trace of $\mathcal{N}$.
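To make projections and cones concrete, here is a minimal Python sketch. It assumes (our choice, not the paper's) that a configuration is a dict from neuron names to 0/1 and that an execution is a list of such dicts. Because an infinite execution cannot be stored, the hypothetical helper `in_cone` tests cone membership against a finite prefix that is at least as long as the given $M$-execution.

```python
def project(execution, neurons):
    """alpha⌈M: restrict every configuration of the execution to the neurons in M."""
    return [{u: config[u] for u in neurons} for config in execution]


def in_cone(m_execution, execution_prefix, neurons):
    """Check whether m_execution is a prefix of execution_prefix⌈M.

    execution_prefix stands in for an infinite execution, so it must be at least
    as long as m_execution for the answer to be meaningful.
    """
    k = len(m_execution)
    if len(execution_prefix) < k:
        raise ValueError("execution prefix is too short to decide membership")
    return project(execution_prefix[:k], neurons) == m_execution
```

For example, projecting an execution over neurons `x`, `z`, `y` onto `["x", "y"]` yields its trace when `x` is the input and `y` the output.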

###### Lemma 1

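Let $\alpha$ and $\alpha'$ be finite executions of $\mathcal{N}$.

1. If neither $\alpha$ nor $\alpha'$ is an extension of the other, then $A(\alpha)$ and $A(\alpha')$ are disjoint.

2. If $\alpha$ is an extension of $\alpha'$, then $A(\alpha) \subseteq A(\alpha')$.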

#### 2.2.2 Probabilistic executions

We define a unique “probabilistic execution” for any particular infinite input execution $\alpha_{in}$. Formally, such a probabilistic execution is a probability distribution $P$ on the sample space of infinite executions $\alpha$ of the network such that $\alpha\lceil N_{in} = \alpha_{in}$; we say that such executions are consistent with $\alpha_{in}$. Note that all of these executions have the same initial configuration; call it $C_0$. This is constructed from the first element of $\alpha_{in}$ and the initial non-input firing pattern for the network, $F_0$.

The $\sigma$-algebra of measurable sets is generated from the “cones”, each of which is the set of infinite executions that extend a particular finite execution. Formally, if $\alpha$ is a finite execution such that $\alpha\lceil N_{in}$ is a prefix of $\alpha_{in}$, then the “cone” of $\alpha$ is simply $A(\alpha)$, as defined earlier. The other measurable sets in the $\sigma$-algebra are obtained by starting from these cones and closing under countable union, countable intersection, and complement.

Now we define the probabilities for the measurable sets. We start by explicitly defining the probabilities for the cones, $P(A(\alpha))$. Based on these, we can derive the probabilities of the other measurable sets in a unique way, using general measure extension theorems. Segala presents a similar construction for probabilistic executions in his PhD thesis, Chapter 4 [Seg95].

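We compute the probabilities recursively based on the length of $\alpha$ (which is here always assumed to be consistent with $\alpha_{in}$):

1. $\alpha$ is of length $0$.
Then $\alpha$ consists of just the initial configuration $C_0$; define $P(A(\alpha)) = 1$.

2. $\alpha$ is of length $t$, $t > 0$.
Let $\alpha'$ be the length-$(t-1)$ prefix of $\alpha$. We determine the probability of extending $\alpha'$ to $\alpha$. Then $P(A(\alpha))$ is simply $P(A(\alpha'))$ times this extension probability.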

Let $C'$ be the final configuration of $\alpha'$ and $C$ the final configuration of $\alpha$. Then for each neuron $u \in N_{lc}$ separately, use $C'$ and the weights of $u$’s incoming edges to compute the potential and then the firing probability for neuron $u$. In more detail: For each $u \in N_{lc}$, we first calculate a potential, $pot_u$, defined as

$$pot_u = \sum_{(v,u) \in E} C'(v)\, weight(v,u) - bias(u).$$

We then convert $pot_u$ to a firing probability using the standard sigmoid function:

$$p_u = \frac{1}{1 + e^{-pot_u/\lambda}},$$

where $\lambda$ is a positive real number “temperature” parameter.¹ ² We combine all those probabilities to compute the probability of generating $C$ from $C'$: for each $u \in N_{lc}$ such that $C(u) = 1$, use the calculated probability $p_u$, and for each $u \in N_{lc}$ for which $C(u) = 0$, use $1 - p_u$. The product

$$\prod_{u \in N_{lc}:\, C(u)=1} p_u \times \prod_{u \in N_{lc}:\, C(u)=0} (1 - p_u)$$

is the probability of generating $C$ from $C'$, which is the needed probability of extending $\alpha'$ to $\alpha$.

¹ We assume a standard sigmoid function. However, the results of this paper don’t appear to depend much on the precise function definition. Different functions could be used, subject to some basic constraints.

² We will try to generalize our model to include other state besides just firing status. For example, we might remember history of firing, or history of incoming potentials.
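The one-step rule above can be sketched in Python as follows. This is an illustration under our own assumptions: configurations are dicts from neuron names to 0/1, edge weights are a dict keyed by `(source, target)` pairs, biases are a dict over the non-input neurons, and all function names are hypothetical.

```python
import math


def potential(u, prev_config, weight, bias):
    """pot_u: sum of C'(v) * weight(v, u) over incoming edges (v, u), minus bias(u)."""
    incoming = sum(w * prev_config[v] for (v, tgt), w in weight.items() if tgt == u)
    return incoming - bias[u]


def firing_prob(pot, lam):
    """Sigmoid p_u = 1 / (1 + e^(-pot_u / lambda))."""
    return 1.0 / (1.0 + math.exp(-pot / lam))


def step_probability(prev_config, next_config, lc_neurons, weight, bias, lam):
    """Probability of generating next_config from prev_config.

    Multiplies p_u for each non-input neuron that fires in next_config and
    1 - p_u for each one that does not.
    """
    prob = 1.0
    for u in lc_neurons:
        p_u = firing_prob(potential(u, prev_config, weight, bias), lam)
        prob *= p_u if next_config[u] == 1 else 1.0 - p_u
    return prob
```

For instance, with a single edge from `x` to `y` of weight $2\ln 9$, bias $\ln 9$ for `y`, and $\lambda = 1$, `step_probability({"x": 1, "y": 0}, {"y": 1}, ["y"], weight, bias, 1.0)` evaluates to 0.9 up to floating-point rounding.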

We will often consider conditional probabilities of the form $P(A(\alpha) \mid A(\alpha'))$. Because we use a sigmoid function, we know that $P(A(\alpha'))$ cannot be $0$, and so this conditional probability is actually defined.³

³ One useful property of our sigmoid functions is that the probabilities are never exactly $0$ or $1$, which makes it unnecessary to worry about $0$-probability sets when conditioning. We have to be careful to retain this property if we consider different functions.

From now on in this subsection, we assume a particular $\mathcal{N}$ and $\alpha_{in}$. The following lemma follows immediately from Lemma 1.

###### Lemma 2

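Let $\alpha$ and $\alpha'$ be finite executions of $\mathcal{N}$ that are consistent with $\alpha_{in}$.

1. If neither $\alpha$ nor $\alpha'$ is an extension of the other, then $P(A(\alpha) \mid A(\alpha')) = 0$.

2. If $\alpha$ is an extension of $\alpha'$, then $P(A(\alpha) \mid A(\alpha')) = P(A(\alpha)) / P(A(\alpha'))$.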

So we can easily compute the conditional probabilities from the absolute probabilities. Conversely, we can easily compute the absolute probabilities from the conditional ones, by unwinding the recursive definition above:

###### Lemma 3

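Let $\alpha$ be a length-$t$ execution of $\mathcal{N}$, $t \ge 0$. Let $\alpha_j$, $0 \le j \le t$, be the successive prefixes of $\alpha$ (so that $\alpha_t = \alpha$). Then

$$P(A(\alpha)) = P(A(\alpha_1) \mid A(\alpha_0)) \times P(A(\alpha_2) \mid A(\alpha_1)) \times \cdots \times P(A(\alpha_t) \mid A(\alpha_{t-1})).$$

Notice that in the above expression, we did not start with a term for $P(A(\alpha_0))$. This is not needed because we are considering only executions in which $C_0$ is obtained from $\alpha_{in}$ and the initial assignment $F_0$. So $\alpha_0$ is determined, and $P(A(\alpha_0)) = 1$.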
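Lemma 3 suggests a direct way to compute the probability of a finite execution: multiply the one-step extension probabilities along the chain of prefixes. The sketch below assumes the same hypothetical dict representation as before (configurations as dicts, weights keyed by `(source, target)`), and it assumes that the first configuration already agrees with $\alpha_{in}$ and $F_0$, so no factor is needed for $\alpha_0$.

```python
import math


def step_prob(prev, nxt, lc, weight, bias, lam):
    """One-step probability of generating configuration nxt from prev."""
    prob = 1.0
    for u in lc:
        pot = sum(w * prev[v] for (v, tgt), w in weight.items() if tgt == u) - bias[u]
        p_u = 1.0 / (1.0 + math.exp(-pot / lam))
        prob *= p_u if nxt[u] == 1 else 1.0 - p_u
    return prob


def execution_prob(execution, lc, weight, bias, lam):
    """P(A(alpha)) as the product of one-step conditional probabilities (Lemma 3).

    execution is a list of configurations C_0, ..., C_t; C_0 is assumed to be
    the initial configuration determined by alpha_in and F_0.
    """
    prob = 1.0
    for prev, nxt in zip(execution, execution[1:]):
        prob *= step_prob(prev, nxt, lc, weight, bias, lam)
    return prob
```

As a sanity check, summing `execution_prob` over all executions of a fixed length that share the same input sequence gives 1, as it should for a probability distribution over cones of equal length.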

Since we can compute the conditional and absolute probabilities from each other, either can be used to characterize the probabilistic execution.

##### Tree representation:

The probabilistic execution for $\alpha_{in}$ can be visualized as an infinite tree of configurations, where the tree nodes at level $t$ represent the configurations that might occur at time $t$ (with the given input execution). The configuration at the root of the tree is the initial configuration $C_0$. Each infinite branch of the tree represents an infinite execution of the network, and finite initial portions of branches represent finite executions. If $\alpha$ is a finite branch in the tree, then we can associate the probability $P(A(\alpha))$ with the node at the end of the branch; this is simply the probability of reaching the node during probabilistic operation of the network, using the inputs from $\alpha_{in}$.

#### 2.2.3 Probabilistic traces

Now we define a unique “probabilistic trace” for any particular infinite input execution $\alpha_{in}$. Formally, such a probabilistic trace is a probability distribution on the sample space of infinite traces $\beta$ of the network such that $\beta\lceil N_{in} = \alpha_{in}$. All of these traces have the same initial configuration, constructed from the first element of $\alpha_{in}$ and the initial output firing pattern for the network, $F_0\lceil N_{out}$.

The basic measurable sets are the sets of traces that extend a particular finite trace. For a particular finite trace $\beta$, we define

$$B(\beta) = \{trace(\alpha) : \beta \text{ is a prefix of } trace(\alpha)\},$$

where $\alpha$ ranges over infinite executions consistent with $\alpha_{in}$.

To define probabilities for the sets $B(\beta)$, we rely on the probabilistic execution for $\alpha_{in}$. If $\beta$ is a finite trace of $\mathcal{N}$, then $P(A(\beta))$ has already been defined. Then define the probability of $B(\beta)$ to be simply $P(A(\beta))$.

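The following lemma expands the probability $P(A(\beta))$ in terms of probabilities for the relevant executions.

###### Lemma 4

If $\beta$ is a finite trace of $\mathcal{N}$, then

$$A(\beta) = \bigcup_{\alpha:\, trace(\alpha) = \beta} A(\alpha), \quad \text{and} \quad P(A(\beta)) = \sum_{\alpha:\, trace(\alpha) = \beta} P(A(\alpha)).$$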
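Lemma 4 can be checked by brute force on a small network: enumerate every assignment of internal firing states consistent with a given trace and add up the execution probabilities. The sketch below is ours, not the paper's; it reuses the hypothetical dict representation, takes the internal neurons' time-0 states from `init_internal` (playing the role of $F_0$ on $N_{int}$), and is exponential in the number of internal neurons and steps, so it is only meant for tiny examples.

```python
import itertools
import math


def step_prob(prev, nxt, lc, weight, bias, lam):
    """One-step probability of generating configuration nxt from prev."""
    prob = 1.0
    for u in lc:
        pot = sum(w * prev[v] for (v, tgt), w in weight.items() if tgt == u) - bias[u]
        p_u = 1.0 / (1.0 + math.exp(-pot / lam))
        prob *= p_u if nxt[u] == 1 else 1.0 - p_u
    return prob


def trace_prob(trace, internal, init_internal, lc, weight, bias, lam):
    """P(A(beta)) as a sum of P(A(alpha)) over executions with trace(alpha) = beta.

    trace: list of external configurations (input and output neurons), times 0..t.
    internal: list of internal neuron names, fixed at time 0 by init_internal.
    """
    t = len(trace) - 1
    total = 0.0
    # Enumerate every assignment of internal firing states at times 1..t.
    for bits in itertools.product((0, 1), repeat=t * len(internal)):
        execution = [{**trace[0], **init_internal}]
        for k in range(1, t + 1):
            chunk = bits[(k - 1) * len(internal):k * len(internal)]
            execution.append({**trace[k], **dict(zip(internal, chunk))})
        prob = 1.0
        for prev, nxt in zip(execution, execution[1:]):
            prob *= step_prob(prev, nxt, lc, weight, bias, lam)
        total += prob
    return total
```

Applied to a Not network like the one in Section 2.4.1 (with $\delta = 0.1$, $\lambda = 1$), the sum over the internal neuron's states gives the probability of an input/output trace without ever fixing the internal neuron's behavior.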

The next lemma describes conditional probabilities for one-step extensions:

###### Lemma 5

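Let $\alpha$ be a finite execution of $\mathcal{N}$ of length $t > 0$ that is consistent with $\alpha_{in}$. Suppose $\alpha'$ is its one-step prefix. Let $\beta = trace(\alpha)$, and $\beta' = trace(\alpha')$. Then $\alpha'$, $\beta$, and $\beta'$ are also consistent with $\alpha_{in}$,⁴ and

1. $A(\alpha) \subseteq A(\alpha')$, and $P(A(\alpha) \mid A(\alpha')) = P(A(\alpha)) / P(A(\alpha'))$.

2. $A(\alpha) \subseteq A(\beta)$, and $P(A(\alpha) \mid A(\beta)) = P(A(\alpha)) / P(A(\beta))$.

3. $A(\alpha) \subseteq A(\beta')$, and $P(A(\alpha) \mid A(\beta')) = P(A(\alpha)) / P(A(\beta'))$.

4. $A(\alpha') \subseteq A(\beta')$, and $P(A(\alpha') \mid A(\beta')) = P(A(\alpha')) / P(A(\beta'))$.

5. $A(\beta) \subseteq A(\beta')$, and $P(A(\beta) \mid A(\beta')) = P(A(\beta)) / P(A(\beta'))$.

⁴ As before, this means that their projections on $N_{in}$ are prefixes of $\alpha_{in}$.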

###### Lemma 6

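Let $\alpha$, $\alpha'$, $\beta$, and $\beta'$ be as in Lemma 5. Then

$$P(A(\alpha) \mid A(\beta)) = P(A(\alpha) \mid A(\beta')) \times \frac{P(A(\beta'))}{P(A(\beta))} = \frac{P(A(\alpha) \mid A(\beta'))}{P(A(\beta) \mid A(\beta'))}.$$

Proof. By Lemma 5.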

The next lemma gives some simple equivalent formulations of a one-step extension of traces, by unwinding definitions in terms of executions.

###### Lemma 7

Suppose that $\beta$ is a finite trace of length $t > 0$ that is consistent with $\alpha_{in}$. Suppose that $\beta'$ is the length-$(t-1)$ prefix of $\beta$. Then $P(A(\beta) \mid A(\beta'))$ is equal to all of the following:

1. .

2. .

3. .

4. .

5. .

6. .

We can also give a lemma about repeated conditioning, as for probabilistic executions:

###### Lemma 8

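Let $\beta$ be a length-$t$ trace of $\mathcal{N}$, $t \ge 0$. Let $\beta_j$, $0 \le j \le t$, be the successive prefixes of $\beta$ (so that $\beta_t = \beta$). Then

$$P(A(\beta)) = P(A(\beta_1) \mid A(\beta_0)) \times P(A(\beta_2) \mid A(\beta_1)) \times \cdots \times P(A(\beta_t) \mid A(\beta_{t-1})).$$

As before, in the above expression, we did not use a separate term for $P(A(\beta_0))$. This is not needed because we are considering only traces whose initial configuration is obtained from $\alpha_{in}$ and the initial assignment $F_0\lceil N_{out}$. So $\beta_0$ is determined, and $P(A(\beta_0)) = 1$.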

We will need some other easy facts about executions and traces, for example:

###### Lemma 9

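Let $\alpha$ be a finite execution of $\mathcal{N}$ of length $t > 0$ that is consistent with $\alpha_{in}$. Let $\alpha'$ be the one-step prefix of $\alpha$ and $\beta' = trace(\alpha')$. Then $P(A(\alpha) \mid A(\beta')) = P(A(\alpha) \mid A(\alpha')) \times P(A(\alpha') \mid A(\beta'))$.

Proof. By Lemma 5, we see that

$$P(A(\alpha) \mid A(\beta')) = \frac{P(A(\alpha))}{P(A(\beta'))} = \frac{P(A(\alpha))}{P(A(\alpha'))} \times \frac{P(A(\alpha'))}{P(A(\beta'))} = P(A(\alpha) \mid A(\alpha')) \times P(A(\alpha') \mid A(\beta')).$$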

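###### Lemma 10

1. Suppose that $\alpha$ is a finite execution of $\mathcal{N}$ that is consistent with $\alpha_{in}$. Then $P(A(\alpha)) = P(A(\alpha\lceil N_{lc}))$.

2. Suppose that $\beta$ is a finite trace of $\mathcal{N}$ that is consistent with $\alpha_{in}$. Then $P(A(\beta)) = P(A(\beta\lceil N_{out}))$.

Proof. Since the input execution is already fixed at $\alpha_{in}$, the probability for $\alpha$ is just the probability for the projection of $\alpha$ on the non-input neurons. Similarly for $\beta$.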

### 2.3 External behavior of a network

So far we have talked about individual probabilistic traces, which depend on a fixed input execution. Now we define the external behavior of a network, to capture its visible behavior for all possible inputs. Later in the paper, in Section 5, we will show that our notion of external behavior is compositional, which implies that the external behavior of the composition of two networks $\mathcal{N}^1$ and $\mathcal{N}^2$ is unambiguously determined by the external behavior of $\mathcal{N}^1$ and the external behavior of $\mathcal{N}^2$.

##### Behavior Definition:

Our definition of external behavior is based on the entire collection of probabilities for the cones of all finite traces. Namely, the external behavior $Beh(\mathcal{N})$ is the mapping that maps each infinite input execution $\alpha_{in}$ of $\mathcal{N}$ to the collection of probabilities $P(A(\beta))$, where $\beta$ is a finite trace of $\mathcal{N}$ that is consistent with $\alpha_{in}$.⁵

⁵ Formally, this collection is a mapping from finite traces $\beta$ to probabilities $P(A(\beta))$, but the use of two mappings here may look slightly confusing.

Other definitions of external behavior might be possible. Any such definition would have to assign some “behavior object” to each network. In general, we define two external behavior notions $B_1$ and $B_2$ to be equivalent provided that the following holds. Suppose that $\mathcal{N}$ and $\mathcal{N}'$ are two networks with the same input neurons and the same output neurons. Then $B_1(\mathcal{N}) = B_1(\mathcal{N}')$ if and only if $B_2(\mathcal{N}) = B_2(\mathcal{N}')$.

In this paper, we find it useful to define a second, “auxiliary” external behavior notion, based on one-step conditional probabilities. This will be useful in our proofs for compositionality.

##### Auxiliary Behavior Definition:

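$Beh_2(\mathcal{N})$ is the mapping that maps each infinite input execution $\alpha_{in}$ of $\mathcal{N}$ to the collection of conditional probabilities $P(A(\beta) \mid A(\beta'))$, where $\beta$ is a finite trace of $\mathcal{N}$ with length $> 0$ that is consistent with $\alpha_{in}$, and $\beta'$ is the one-step prefix of $\beta$.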

###### Lemma 11

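The two behavior notions $Beh$ and $Beh_2$ are equivalent.

Proof. Suppose that $\mathcal{N}$ and $\mathcal{N}'$ are two networks with the same input neurons and the same output neurons. We show that $Beh(\mathcal{N}) = Beh(\mathcal{N}')$ if and only if $Beh_2(\mathcal{N}) = Beh_2(\mathcal{N}')$, by arguing the two directions separately:

1. If $Beh(\mathcal{N}) = Beh(\mathcal{N}')$ then $Beh_2(\mathcal{N}) = Beh_2(\mathcal{N}')$. This follows because the conditional probability $P(A(\beta) \mid A(\beta'))$ is determined as a function of the unconditional probabilities $P(A(\beta))$ and $P(A(\beta'))$; see Lemma 5, Part 5.

2. If $Beh_2(\mathcal{N}) = Beh_2(\mathcal{N}')$ then $Beh(\mathcal{N}) = Beh(\mathcal{N}')$. This follows because the unconditional probability $P(A(\beta))$ is determined as a function of the conditional probabilities; see Lemma 8.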
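The two directions of this proof are simple enough to write down as functions. In the sketch below, which is an illustration and not part of the paper, the input execution is held fixed and each finite trace is represented (by our own assumption) as a tuple of output firing values for a single output neuron, so a trace of length $t$ is a tuple of $t+1$ entries; the function names are hypothetical.

```python
def to_conditional(beh):
    """Beh -> Beh_2: divide each cone probability by that of its one-step prefix."""
    # Traces of length > 0 have at least two entries (times 0 and 1).
    return {beta: p / beh[beta[:-1]] for beta, p in beh.items() if len(beta) > 1}


def to_absolute(beh2, beta0):
    """Beh_2 -> Beh: multiply one-step conditionals along the chain of prefixes (Lemma 8)."""
    beh = {beta0: 1.0}  # the length-0 trace is determined, so P(A(beta_0)) = 1
    for beta in sorted(beh2, key=len):  # prefixes are processed before their extensions
        beh[beta] = beh[beta[:-1]] * beh2[beta]
    return beh
```

For example, for the Identity network of Section 2.4.1 with $\delta = 0.1$ and an input that fires at time 0 and not at time 1, the cone probabilities of the output traces `(0, 1, 0)`, `(0, 1, 1)`, `(0, 0, 0)`, `(0, 0, 1)` are 0.81, 0.09, 0.09, 0.01; converting them to conditionals and back recovers the same numbers.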

### 2.4 Examples

In this subsection we give two fundamental examples to illustrate our definitions so far.

#### 2.4.1 Simple Boolean gate networks

Figure 1 depicts the structure of simple SNNs that represent and-gates, or-gates, and not-gates. For completeness, we also include an SNN representing the identity computation.

We describe the operation of each of these types of networks, in turn. Fix a value $\lambda$ for the temperature parameter of the sigmoid function. Fix an error probability $\delta$, $0 < \delta < 1/2$. Assume for each case below that the initial firing status for the non-input neurons is $0$.

Throughout this section, we use the abbreviation $L$ for the quantity $\ln\left(\frac{1-\delta}{\delta}\right)$.

##### Identity network:

This has one input neuron and one output neuron, connected by an edge with weight $w$. The output neuron has bias $b$. We define $b = \lambda L$ and $w = 2b$. Then we have

$$e^{b/\lambda} = \frac{1-\delta}{\delta} = \frac{1}{\delta} - 1.$$

With these settings, we get potential $w - b = b$ and firing probability $\frac{1}{1+e^{-b/\lambda}} = 1 - \delta$ when the input firing state is $1$, and potential $-b$ and firing probability $\frac{1}{1+e^{b/\lambda}} = \delta$ when the input firing state is $0$. More precisely, consider just the input firing state at time $0$. Whether it is $0$ or $1$, the probability that the output firing state at time $1$ is the same is exactly $1 - \delta$.
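A quick numerical check of these settings, as a minimal sketch: the helper names below are hypothetical, and the code simply evaluates the sigmoid at the two possible potentials for the bias and weight defined above.

```python
import math


def identity_params(delta, lam):
    """Bias b and weight w for the Identity network: e^(b/lam) = (1 - delta)/delta, w = 2b."""
    b = lam * math.log((1.0 - delta) / delta)
    return b, 2.0 * b


def identity_output_prob(x_prev, delta, lam):
    """Probability that the output fires at time t, given input state x_prev at time t - 1."""
    b, w = identity_params(delta, lam)
    pot = w * x_prev - b
    return 1.0 / (1.0 + math.exp(-pot / lam))
```

The result does not depend on $\lambda$: for any positive temperature, the output copies the input with probability exactly $1 - \delta$, because $b$ scales with $\lambda$.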

Our model also describes what happens with an arbitrary infinite input firing sequence, not just the initial inputs. Let $\alpha_{in}$ be an arbitrary infinite firing sequence for the input neuron.

Let $\beta$ be a trace of length $t$ that is consistent with $\alpha_{in}$. Suppose further that, for every $t'$, $1 \le t' \le t$, the output status at time $t'$ in $\beta$ is equal to the input status at time $t'-1$. Then by repeated use of the argument above, we get that $P(A(\beta)) = (1-\delta)^t$.

Now suppose that $\beta$ is a length-$t$ trace as above, $t \ge 1$. Suppose that the output firing status at time $t$ in $\beta$ is equal to the input status at time $t-1$, but the output status values for all earlier times are arbitrary. Suppose that $\beta'$ is the one-step prefix of $\beta$. Then we can show that $P(A(\beta) \mid A(\beta')) = 1 - \delta$. It follows that, for every time $t \ge 1$, the probability that the output at time $t$ is equal to the input at time $t-1$ is $1 - \delta$. This uses the law of Total Probability, over all the possible length-$(t-1)$ output firing sequences.

##### k-input And network:

This has $k$ input neurons and one output neuron. Each input neuron is connected to the output neuron by an edge with weight $w$. The output neuron has bias $b$. The Identity network is a special case of this network, where $k = 1$.

The idea here is to treat this as a threshold problem, and set $w$ and $b$ so that being over or under the threshold gives value $1$ or $0$, respectively, in each case with probability at least $1 - \delta$. For a $k$-input And network, the output neuron should fire with probability at least $1 - \delta$ if all $k$ input neurons fire, and with probability at most $\delta$ if at most $k-1$ input neurons fire.

The settings for $b$ and $w$ generalize those for the Identity network. Namely, define $b = (2k-1)\lambda L$ and $w = 2\lambda L$. When all $k$ input neurons fire, the potential is $2k\lambda L - (2k-1)\lambda L = \lambda L$, and (expanding $L$ and plugging into the sigmoid function), the firing probability is just $1 - \delta$. When $k-1$ input neurons fire, the potential is $-\lambda L$, and the firing probability is just $\delta$. If fewer than $k-1$ fire, the potential and the firing probability are smaller. Claims about multi-round computations similar to those we argued for the Identity network also hold for the And network.
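The threshold settings can be checked numerically. This is a minimal sketch with a hypothetical helper name; it evaluates the output neuron's firing probability for a given number of firing inputs under $b = (2k-1)\lambda L$ and $w = 2\lambda L$.

```python
import math


def and_gate_fire_prob(num_firing, k, delta, lam):
    """P(output fires) for the k-input And network, given how many inputs fired."""
    L = math.log((1.0 - delta) / delta)
    b = (2 * k - 1) * lam * L  # output bias
    w = 2 * lam * L            # weight of each input edge
    pot = w * num_firing - b
    return 1.0 / (1.0 + math.exp(-pot / lam))
```

With $k = 3$ and $\delta = 0.05$, all three inputs firing gives $0.95$, two inputs give $0.05$, and no inputs give a probability far below $\delta$.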

##### k-input Or network:

This has the same structure as the $k$-input And network. The $k$-input Or network also generalizes the Identity network, which is the same as the $1$-input Or network. Now the output neuron should fire with probability at least $1 - \delta$ if one or more of the input neurons fire, and with probability at most $\delta$ if no input neurons fire. This time we set $b = \lambda L$ and $w = 2\lambda L$. When one input neuron fires, the potential is $\lambda L$ and the firing probability is $1 - \delta$. If more than one fire, then the firing probability is even greater. When no input neurons fire, the potential is $-\lambda L$, and the firing probability is $\delta$. Again, similar claims about multi-round computations hold for the Or network.
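The Or settings differ from the And settings only in the bias, which no longer depends on $k$. The sketch below (hypothetical helper name) evaluates the firing probability as a function of how many inputs fire.

```python
import math


def or_gate_fire_prob(num_firing, delta, lam):
    """P(output fires) for a k-input Or network (b = lam*L, w = 2*lam*L), any k."""
    L = math.log((1.0 - delta) / delta)
    pot = 2 * lam * L * num_firing - lam * L
    return 1.0 / (1.0 + math.exp(-pot / lam))
```

Because the bias does not depend on $k$, the same function covers every fan-in: zero firing inputs give $\delta$, and one or more give at least $1 - \delta$.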

##### Not network:

This network has one input, one output, and one internal neuron, which acts as an inhibitor for the output neuron.⁶ The network contains two edges, one from the input neuron to the internal neuron with weight $w$, and one from the internal neuron to the output neuron with weight $w'$. The internal neuron has bias $b$ and the output neuron has bias $b'$.

⁶ We generally classify neurons into two categories: excitatory neurons, all of whose outgoing edges have positive weights, and inhibitory neurons, all of whose outgoing edges have negative weights. However, this classification is not needed for the results in this paper.

The assembly consisting of the input and internal neurons acts like the identity gate, with settings of $b$ and $w$ as before: $b = \lambda L$ and $w = 2\lambda L$. So, for example, if we consider just the input firing state at time $0$, the probability that the internal neuron’s firing state at time $1$ is the same is exactly $1 - \delta$.

Let $b'$, the bias of the output neuron, be $-\lambda L$, and let $w'$, the weight of the outgoing edge of the inhibitor, be $-2\lambda L$. Then if the inhibitor fires at time $1$, the output fires at time $2$ with probability $\delta$, and if the inhibitor does not fire at time $1$, the output fires at time $2$ with probability $1 - \delta$. This yields probability $1 - \delta$ of correct inhibition, which then yields probability at least $(1-\delta)^2$ that the output at time $2$ gives the correct answer for the Not network. Similar claims about multi-round computations also hold for the Not network, except that the Not network has a delay of $2$ instead of $1$.
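The bound $(1-\delta)^2$ can be compared with the exact probability by summing over the inhibitor's state at time 1. The sketch below is illustrative (the function name is hypothetical) and uses the parameter settings given above.

```python
import math


def not_gate_correct_prob(x0, delta, lam):
    """Exact P(output at time 2 == NOT x0) for the Not network, summing over the inhibitor."""
    L = math.log((1.0 - delta) / delta)

    def fire(pot):
        return 1.0 / (1.0 + math.exp(-pot / lam))

    b, w = lam * L, 2 * lam * L              # inhibitor: Identity-style settings
    b_out, w_out = -lam * L, -2 * lam * L    # output bias and inhibitory edge weight
    p_z = fire(w * x0 - b)                   # P(inhibitor fires at time 1)
    want = 1 - x0
    total = 0.0
    for z1, pz in ((1, p_z), (0, 1.0 - p_z)):
        p_y = fire(w_out * z1 - b_out)       # P(output fires at time 2 | z1)
        total += pz * (p_y if want == 1 else 1.0 - p_y)
    return total
```

The exact value is $(1-\delta)^2 + \delta^2$: besides the path where both neurons behave correctly, the output is also correct when both make an error.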

#### 2.4.2 Winner-Take-All circuits

This example is a simple Winner-Take-All network for $n$ inputs and $n$ corresponding outputs. It is based on a network presented in [LMP17a] and Chapter 5 of [Mus18]. Assume that some nonempty subset of the input neurons fire, in a stable manner. The output firing behavior is supposed to converge to a configuration in which exactly one of the outputs corresponding to the firing inputs fires. We would like this convergence to occur quickly, in some fairly short time $t_c$. And we would like the resulting configuration to remain stable for a fairly long time $t_s$. Figure 2 depicts the structure of the network.

So fix $\alpha_{in}$ to be an infinite input firing sequence, in which all the input configurations are the same, and at least one input neuron is firing. Let $P$ be the resulting probabilistic execution. In [LMP17a, Mus18] we prove that, for certain values of $t_c$ and $t_s$, the probability of convergence within time $t_c$ to an output configuration that remains stable for time $t_s$ is at least $1 - \delta$.

The formal theorem statement is as follows. Here, $\gamma$ is the weighting factor used in the biases and edge weights in the network, $\delta$ is a bound on the failure probability, and $c_1$ and $c_2$ are particular small constants.

###### Theorem 12

Assume . Then starting from any configuration, with probability , the network converges, within time , to a single firing output corresponding to a firing input, and remains stable for time . and are universal constants, independent of , , and .

The proof appears in [Mus18], based on work in [LMP17a]. The basic idea is that, when more than one output is firing, both inhibitors are triggered to fire. When they both fire, they cause each firing output to continue firing with probability $1/2$. This serves to reduce the number of firing outputs at a predictable rate. Once only a single output fires, only one inhibitor continues to fire; its effect is sufficient to prevent other non-firing outputs from beginning to fire, but not sufficient to stop the firing output from firing. All this, of course, is probabilistic.

Note that the network is symmetric with respect to the outputs. Therefore, we can refine the theorem above to assert that, for any particular output neuron that corresponds to a firing input neuron, the probability that this output neuron is the eventual firing output neuron is at least $(1-\delta)/k$, where $k$ is the number of firing input neurons.

## 3 Composition

In this section, we define composition of networks. We focus on composing two networks, but the ideas should extend easily to any finite number of networks.

### 3.1 Composition of two networks

Networks that are composed must satisfy some basic compatibility requirements. These are analogous to those used for I/O automata and similar models. Namely, two networks $\mathcal{N}^1$ and $\mathcal{N}^2$ are said to be compatible provided that:

1. No internal neuron of $\mathcal{N}^1$ is a neuron of $\mathcal{N}^2$.

2. No internal neuron of $\mathcal{N}^2$ is a neuron of $\mathcal{N}^1$.

3. No neuron is an output neuron of both $\mathcal{N}^1$ and $\mathcal{N}^2$.

On the other hand, we are allowed to have common input neurons, and also output neurons of one of the networks that are also input neurons of the other.⁷ ⁸ Assuming $\mathcal{N}^1$ and $\mathcal{N}^2$ are compatible, we define their composition $\mathcal{N}$ as follows:

⁷ In the brain setting, common input neurons for two different networks seem to make sense: the same neuron might have two separate sets of outgoing edges (synapses), leading to different neurons in the two different networks.

⁸ We can prove from these requirements that $\mathcal{N}^1$ and $\mathcal{N}^2$ cannot have any edge in common. For if they had a common edge, then it would have to have the same source neuron and the same target neuron in both sub-networks. Since the target neuron is shared, it would have to be an input neuron of at least one of the networks. But then that network would have an edge leading to one of its input neurons, which is forbidden by our network definition.

• $N$, the set of neurons of $\mathcal{N}$, is the union of $N^1$ and $N^2$, which are the sets of neurons of the two respective sub-networks. Note that common neurons are inserted only once. Each neuron inherits its bias from its original sub-network. This definition of bias is unambiguous: If a neuron belongs to both sub-networks, it must be an input of at least one of them, and input neurons do not have biases.

Thus, when an input of one sub-network is combined with an output of the other sub-network, the resulting neuron acquires the bias from its output “precursor”.

• $E$, the set of edges, is defined as follows. If $e$ is an edge from neuron $u$ to neuron $v$ in either $E^1$ or $E^2$, then we include $e$ also in $E$. Each edge inherits its weight from its original sub-network. This definition of weight is unambiguous, since, as noted earlier, $e$ cannot be an edge of both sub-networks.

Thus, if the source neuron $u$ is an input of both sub-networks, then it has edges in $E$ to all the nodes to which its “precursors” have edges in the two sub-networks. If $u$ is an output of $\mathcal{N}^1$ and an input of $\mathcal{N}^2$, then in $\mathcal{N}$, it has all the incoming and outgoing edges it has in $\mathcal{N}^1$ as well as the outgoing edges it has in $\mathcal{N}^2$.

On the other hand, the target neuron $v$ cannot be an input of both networks since it has an incoming edge in one of them. So $v$ must be an output of one, say $\mathcal{N}^1$, and an input of the other, say $\mathcal{N}^2$. Then in $\mathcal{N}$, $v$ has all the incoming and outgoing edges it had in $\mathcal{N}^1$ as well as the outgoing edges it has in $\mathcal{N}^2$.

• $F_0$, the initial non-input firing pattern of $\mathcal{N}$, gets inherited directly from the two sub-networks’ initial non-input firing patterns. Since the two sub-networks have no non-input neurons in common, this is well-defined.

In the composed network $\mathcal{N}$, the neurons retain their classification as input/output/internal, except that a neuron that is an input of one sub-network and an output of the other gets classified as an output neuron of $\mathcal{N}$.
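The composition operator can be sketched as a function on a simple dict encoding of networks. The encoding (keys `"in"`, `"out"`, `"int"`, `"bias"`, `"weight"`, `"init"`) and the function name are our own assumptions. The function checks the three compatibility conditions, then merges neurons, biases, edges, and initial states, reclassifying any neuron that is an input of one network and an output of the other as an output.

```python
def compose(n1, n2):
    """Compose two compatible networks given as dicts.

    Keys: "in", "out", "int" (sets of neuron names), "bias" (dict over non-input
    neurons), "weight" (dict keyed by (source, target)), "init" (dict over
    non-input neurons, playing the role of F_0).
    """
    all1 = n1["in"] | n1["out"] | n1["int"]
    all2 = n2["in"] | n2["out"] | n2["int"]
    if n1["int"] & all2 or n2["int"] & all1 or n1["out"] & n2["out"]:
        raise ValueError("networks are not compatible")
    outputs = n1["out"] | n2["out"]
    return {
        # An input of one network that is an output of the other becomes an output.
        "in": (n1["in"] | n2["in"]) - outputs,
        "out": outputs,
        "int": n1["int"] | n2["int"],
        # Non-input neurons of the two networks are disjoint, so these merges never clash.
        "bias": {**n1["bias"], **n2["bias"]},
        "weight": {**n1["weight"], **n2["weight"]},  # edge sets are disjoint
        "init": {**n1["init"], **n2["init"]},
    }
```

For example, composing an Identity network from `x` to `y` with another from `y` to `z` yields a network with input `x`, outputs `y` and `z`, and both edges.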

The probabilistic executions and probabilistic traces of the new network $\mathcal{N}$ are defined as usual. In Sections 4 and 5, we show how to relate these to the probabilistic executions and probabilistic traces of $\mathcal{N}^1$ and $\mathcal{N}^2$.

Here are some basic lemmas analogous to those for general probabilistic executions and traces. For these lemmas, fix $\mathcal{N}$ and a particular input execution $\alpha_{in}$ of $\mathcal{N}$, which yields a particular probabilistic execution $P$.

###### Lemma 13

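Let $\alpha$ be a finite execution of $\mathcal{N}$ of length $t > 0$ that is consistent with $\alpha_{in}$. Suppose that $\alpha'$ is its one-step prefix. Let $\beta = trace(\alpha)$ and $\beta' = trace(\alpha')$. Let $j \in \{1, 2\}$. Let $\alpha_j = \alpha\lceil N^j$, $\alpha'_j = \alpha'\lceil N^j$, $\beta_j = \beta\lceil N^j$, and $\beta'_j = \beta'\lceil N^j$. Then

1. $A(\alpha_j) \subseteq A(\alpha'_j)$, and $P(A(\alpha_j) \mid A(\alpha'_j)) = P(A(\alpha_j)) / P(A(\alpha'_j))$.

2. $A(\alpha_j) \subseteq A(\beta_j)$, and $P(A(\alpha_j) \mid A(\beta_j)) = P(A(\alpha_j)) / P(A(\beta_j))$.

3. $A(\alpha_j) \subseteq A(\beta'_j)$, and $P(A(\alpha_j) \mid A(\beta'_j)) = P(A(\alpha_j)) / P(A(\beta'_j))$.

4. $A(\alpha'_j) \subseteq A(\beta'_j)$, and $P(A(\alpha'_j) \mid A(\beta'_j)) = P(A(\alpha'_j)) / P(A(\beta'_j))$.

5. $A(\beta_j) \subseteq A(\beta'_j)$, and $P(A(\beta_j) \mid A(\beta'_j)) = P(A(\beta_j)) / P(A(\beta'_j))$.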

###### Lemma 14

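Let $\alpha_j$, $\alpha'_j$, $\beta_j$, and $\beta'_j$ be as in Lemma 13. Then

$$P(A(\alpha_j) \mid A(\beta_j)) = P(A(\alpha_j) \mid A(\beta'_j)) \times \frac{P(A(\beta'_j))}{P(A(\beta_j))} = \frac{P(A(\alpha_j) \mid A(\beta'_j))}{P(A(\beta_j) \mid A(\beta'_j))}.$$

###### Lemma 15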

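Let $\alpha$ be a finite execution of $\mathcal{N}$ of length $t > 0$ that is consistent with $\alpha_{in}$. Let $\alpha'$ be the one-step prefix of $\alpha$ and $\beta' = trace(\alpha')$. Let $j \in \{1, 2\}$. Then $P(A(\alpha\lceil N^j_{lc}) \mid A(\beta'\lceil N^j)) = P(A(\alpha\lceil N^j_{lc}) \mid A(\alpha'\lceil N^j)) \times P(A(\alpha'\lceil N^j) \mid A(\beta'\lceil N^j))$.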

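Proof. We have that

$$P(A(\alpha\lceil N^j_{lc}) \mid A(\beta'\lceil N^j)) = P\big(A(\alpha\lceil N^j_{lc}) \cap A(\beta'\lceil N^j) \mid A(\beta'\lceil N^j)\big),$$

by basic conditional probability, which is equal to

$$P\big(A(\alpha\lceil N^j_{lc}) \cap A(\alpha'\lceil N^j) \mid A(\beta'\lceil N^j)\big),$$

because $\alpha\lceil N^j_{lc}$ already fixes all the firing patterns for neurons in $N^j_{lc}$, and $\beta'\lceil N^j$ fixes those for the remaining (external) neurons of $N^j$ through time $t-1$. Since $A(\alpha'\lceil N^j) \subseteq A(\beta'\lceil N^j)$, this last expression is equal to

$$\frac{P\big(A(\alpha\lceil N^j_{lc}) \cap A(\alpha'\lceil N^j)\big)}{P(A(\beta'\lceil N^j))},$$

which is equal to

$$\frac{P\big(A(\alpha\lceil N^j_{lc}) \cap A(\alpha'\lceil N^j)\big)}{P(A(\alpha'\lceil N^j))} \times \frac{P(A(\alpha'\lceil N^j))}{P(A(\beta'\lceil N^j))}.$$

This last expression is equal to $P(A(\alpha\lceil N^j_{lc}) \mid A(\alpha'\lceil N^j)) \times P(A(\alpha'\lceil N^j) \mid A(\beta'\lceil N^j))$, as needed.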

### 3.2 A special case: acyclic composition

An important special case of composition is acyclic composition, in which outputs of $\mathcal{N}^2$ are not inputs to $\mathcal{N}^1$. That is, $\mathcal{N}^1$ may have inputs only from the outside world, and its outputs can go to $\mathcal{N}^1$, $\mathcal{N}^2$, and the outside world. $\mathcal{N}^2$ may have inputs from the outside world and from $\mathcal{N}^1$, and its outputs go just to $\mathcal{N}^2$ and the outside world. Formally, the definition of acyclic composition is the same as the general definition of composition, except for the additional restriction that $N^2_{out} \cap N^1_{in} = \emptyset$.
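The acyclicity restriction is a single set-intersection test. The sketch below assumes the same hypothetical dict encoding used for composition (only the `"in"` and `"out"` keys matter here) and also tries both orders, since the definition is asymmetric in $\mathcal{N}^1$ and $\mathcal{N}^2$.

```python
def is_acyclic_composition(n1, n2):
    """True if no output neuron of n2 is an input neuron of n1 (N^2_out ∩ N^1_in is empty)."""
    return not (n2["out"] & n1["in"])


def acyclic_order(n1, n2):
    """Return the pair ordered so that the composition is acyclic, or None if neither order is."""
    if is_acyclic_composition(n1, n2):
        return n1, n2
    if is_acyclic_composition(n2, n1):
        return n2, n1
    return None
```

For example, an And network feeding a Not network is acyclic only in that order, while two networks that feed each other's inputs (as in the cyclic example below) admit no acyclic order.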

### 3.3 Examples

Here we give three examples. The first two represent acyclic composition, and the third is a toy example that involves cycles.

#### 3.3.1 Boolean circuits

Figure 3 contains a circuit which is a composition of four Boolean gate circuits of the types described in Section 2.4.1: two And networks, one Or network, and a Not network. We compose these networks into a larger network that is intended to compute an Xor function.

In terms of the binary composition operator, we can compose the four networks in three steps, as follows:

1. Compose one of the And networks and the Not network to get a network with 2 inputs, 2 outputs, and 1 internal neuron, by identifying the output neuron of the And network with the input neuron of the Not network. Note that the composed network has two outputs because the And gate's output neuron remains an output; the composition operator does not reclassify it as an internal neuron. The composed network is intended to compute the Nand of the two inputs (as well as the And).

2. Compose the network produced in Step 1 with the Or network to get a 2-input, 3-output, 1-internal network, by identifying the corresponding inputs in the two networks. The resulting network has outputs corresponding to the Nand and the Or of the two inputs (as well as the And).

3. Finally, compose the network produced in Step 2 with the second And network, by identifying its Nand output neuron and its Or output neuron with the two input neurons of that And network. The resulting network has an output corresponding to the Xor of the two original inputs (as well as outputs for the first And, the Nand, and the Or).

To state a simple guarantee for this composed circuit, let’s assume that the inputs fire consistently, in an unchanged firing pattern. Then, working from the previously-shown guarantees of the individual networks, we can say that the probability that the final output neuron produces its required Xor value at time is at least . We will say more about this later, in Section 4.2.
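A Monte Carlo sketch can make this kind of guarantee tangible. The encoding below is our own assumption, not the figure's: the neuron names `a`, `b`, `and1`, `inh`, `nand`, `or`, `xor` are hypothetical, the gate parameters follow Section 2.4.1 with $k = 2$, all non-input neurons start at 0, and we read the Xor output at time 4, because the Nand path (And, then the two-step Not) takes three steps and the final And adds one more. The code only estimates the fraction of runs with a correct Xor output; it does not compute the paper's bound.

```python
import math
import random


def xor_circuit(delta, lam):
    """Hypothetical encoding of the Xor circuit: biases for non-input neurons and edge weights."""
    unit = lam * math.log((1 - delta) / delta)  # lambda * L
    bias = {"and1": 3 * unit, "inh": unit, "nand": -unit, "or": unit, "xor": 3 * unit}
    weight = {
        ("a", "and1"): 2 * unit, ("b", "and1"): 2 * unit,     # first And gate (k = 2)
        ("and1", "inh"): 2 * unit, ("inh", "nand"): -2 * unit,  # Not gate: inhibitor, then output
        ("a", "or"): 2 * unit, ("b", "or"): 2 * unit,          # Or gate
        ("nand", "xor"): 2 * unit, ("or", "xor"): 2 * unit,    # second And gate
    }
    return bias, weight


def run(bias, weight, inputs, steps, lam, rng):
    """Simulate synchronous firing with fixed inputs; return the configuration after `steps` rounds."""
    config = dict(inputs, **{u: 0 for u in bias})  # non-input neurons start at 0
    for _ in range(steps):
        nxt = dict(inputs)  # inputs fire in a stable pattern
        for u, b_u in bias.items():
            pot = sum(w * config[v] for (v, tgt), w in weight.items() if tgt == u) - b_u
            nxt[u] = 1 if rng.random() < 1 / (1 + math.exp(-pot / lam)) else 0
        config = nxt
    return config


def estimate_xor(a, b, delta=0.01, lam=1.0, trials=2000, seed=0):
    """Fraction of simulated runs whose Xor output at time 4 equals a XOR b."""
    rng = random.Random(seed)
    bias, weight = xor_circuit(delta, lam)
    hits = sum(run(bias, weight, {"a": a, "b": b}, 4, lam, rng)["xor"] == (a ^ b)
               for _ in range(trials))
    return hits / trials
```

With $\delta = 0.01$, each of the five gate neurons on the relevant paths errs with probability about $\delta$, so the estimates should come out around 0.95 or higher for every input pair.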

#### 3.3.2 Attention using WTA

Figure 4 depicts the composition of our WTA network from Section 2.4.2 with a $2n$-input, $n$-output Filter network. The Filter network is, in turn, a composition of $n$ disjoint And gates. The composition is acyclic since information can flow from WTA to Filter but not vice versa.

The Filter network is designed to fire each of its outputs right after the corresponding input fires, provided that the matching input coming from the WTA (an output of the WTA) also fires. In this way, the WTA network is used to select particular outputs for the Filter network to fire: those that are “reinforced” by the inputs from the WTA.

When the WTA and Filter networks are composed, and the WTA inputs fire stably, with at least one input firing, the WTA network should soon stabilize as we described in Section 2.4.2, to a configuration with a single firing output , which is equally likely to be any of the outputs. That configuration should persist for a fairly long time. The detailed bounds are given in Theorem 12. After the WTA stabilizes, it reinforces only a particular input for the Filter. From that point on, the Filter’s outputs should mirror its inputs, and no other outputs should fire. The probability of such mirroring should be at least , if denotes the failure probability for an And gate. (Recall the definition of from Example 2.4.2.) In this way, the composition can be viewed as an “Attention” circuit, which pays attention to just a single input stream.

Note that the composed network behaves on two different time scales: the WTA takes some time to converge, but after that, the responses to the selected input stream will be essentially immediate.
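The Filter's behavior after the WTA has stabilized can be sketched in a few lines. This rests on two simplifying assumptions. First, the WTA is taken to have already converged to a fixed output `winner`; its convergence is not modeled. Second, each of the n And gates is modeled as computing its value from the previous step and then being flipped with a hypothetical failure probability `delta`. With `delta = 0`, Filter output i at time t + 1 fires exactly when stream input i fired at time t and i is the winner. The outputs therefore mirror the selected stream with a one-step delay, and all other outputs stay silent.

```python
import random
from typing import List


def run_filter(streams: List[List[bool]], winner: int, delta: float = 0.0, seed: int = 0) -> List[List[bool]]:
    """Apply n independent And gates to each time step of the input streams.

    streams[t][i] says whether stream input i fires at time t. The WTA is assumed to have
    stabilized so that only its output `winner` fires. Element t of the result holds the
    Filter outputs at time t + 1.
    """
    rng = random.Random(seed)
    n = len(streams[0])
    wta = [i == winner for i in range(n)]  # assumed stable WTA configuration: a single firing output
    outputs = []
    for inputs in streams:
        # Gate i fires iff its stream input and its WTA input both fired; flip with probability delta.
        outputs.append([(s and w) != (rng.random() < delta) for s, w in zip(inputs, wta)])
    return outputs


if __name__ == "__main__":
    streams = [[True, False, True], [False, True, True], [True, True, False]]
    for row in run_filter(streams, winner=2):
        print(row)
```

Running the example prints `[False, False, True]`, `[False, False, True]`, and `[False, False, False]`: only the third output fires, and only when the third stream fired one step earlier.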

#### 3.3.3 Cyclic composition

In this section we give a toy example, consisting of two networks that affect each other’s behavior in a simple way. Throughout this section, we use the abbreviation for the quantity , just as we did in Section 2.4.1.

Figure 5 shows a network with one input neuron , one output neuron , and one internal neuron . It has edges from to , from to , and from to itself (a self-loop). The biases of and are and the weights on all edges are .

Network behaves so that, at any time , the firing probability for the internal neuron is exactly if fires at time , and is exactly if does not fire at time . This matches the behavior of the Identity network in Section 2.4.1. The firing probability of the output neuron is:

• , if neither nor fires at time .

• , if exactly one of and fires at time .

• if both and fire at time .

Thus, if input fires, output will be likely to fire times later (with probability at least ). Without any additional input firing, the firing of is sustained only by the self-loop, which means that the firing probability decreases steadily over time, by a factor of at each time. Eventually, the firing should “die out”.
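The following sketch shows where the steady decay comes from. Its parameters are assumptions about the elided values, not figures taken from the text. It uses a sigmoid firing function whose potential is the weighted count of firing in-neighbors minus the bias, with bias b on each neuron and weight 2b on each edge. These choices reproduce the Identity-style probabilities: a neuron with no firing in-neighbor fires with probability δ = 1/(1 + e^b), and a neuron with exactly one firing in-neighbor fires with probability 1 - δ. Write z for the internal neuron and y for the output neuron (names chosen here for illustration). While z stays silent, y keeps firing for k further consecutive steps with probability (1 - δ)^k, so a self-sustained run lasts (1 - δ)/δ further steps on average.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fire_prob(num_firing_in_neighbors: int, b: float) -> float:
    """Firing probability under the assumed parameters: bias b, weight 2b on every edge."""
    return sigmoid(2 * b * num_firing_in_neighbors - b)


def delta(b: float) -> float:
    """Assumed meaning of the abbreviation: the 'wrong' firing probability 1 / (1 + e^b)."""
    return 1.0 / (1.0 + math.exp(b))


def self_sustained(k: int, b: float) -> float:
    """P(y fires at each of the next k steps | y fired, z silent throughout): only the self-loop helps."""
    return fire_prob(1, b) ** k


if __name__ == "__main__":
    b = 5.0
    d = delta(b)
    print("neither fires:", fire_prob(0, b))      # equals delta
    print("exactly one fires:", fire_prob(1, b))  # equals 1 - delta
    print("both fire:", fire_prob(2, b))          # larger than 1 - delta
    print("expected further run length:", (1 - d) / d)
```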

Network is similar, replacing , , and by , , and , respectively. However, we omit the self-loop edge on . The biases are and the weights on the two edges are .

Network behaves so that, at any time , the firing probability for the internal neuron is exactly if fires at time , and is exactly if does not fire at time . Likewise, the firing probability for the output neuron is exactly if fires at time and if does not fire. Thus, if input fires, then output will be likely to fire times later (with probability at least ). However, the firing of is not sustained.

Now consider the composition of and , identifying the output of with the input of , and the output of with the input of . The behavior of the composition depends on the starting firing pattern. Let us suppose that neither nor fires initially; we consider the behavior for the various starting firing patterns for and . We assume that is “sufficiently small”.

If neither nor fires at time , then with “high probability”, none of the four neurons will fire for a long time. If instead one or both of and fire at time , then with “high probability”, they will trigger all the neurons to fire and continue to fire for a long time. We give some details in Section 5.4.1.
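This dichotomy can be explored with a Monte Carlo sketch of the composed four-neuron network. The neuron names y1, z1, y2, z2 are hypothetical labels for the elided symbols. The output y1 of the first network serves as the input of the second network, and y2 serves as the input of the first. The firing rule is an assumption about the elided parameters: a sigmoid of 2b times the number of firing in-neighbors minus b. Under this rule, a neuron with one firing in-neighbor fires with probability 1 - δ and a neuron with none fires with probability δ = 1/(1 + e^b). With b = 8, δ is about 3.4 × 10^-4, which stands in here for “sufficiently small”. Starting with both internal neurons silent, the sketch reports how often, after a fixed number of steps, all four neurons are silent and how often all four are firing.

```python
import math
import random

# Composed network, with x1 identified with y2 and x2 identified with y1.
# Each neuron lists its in-neighbors (hypothetical labels for the elided symbols).
IN_NEIGHBORS = {
    "z1": ["y2"],        # x1 -> z1, where x1 is y2
    "y1": ["z1", "y1"],  # z1 -> y1 plus the self-loop on y1
    "z2": ["y1"],        # x2 -> z2, where x2 is y1
    "y2": ["z2"],        # z2 -> y2 (no self-loop in the second network)
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate(y1: bool, y2: bool, b: float, steps: int, rng: random.Random) -> dict:
    """Run the composed network from the given outputs with z1, z2 silent; return the final state."""
    state = {"y1": y1, "z1": False, "y2": y2, "z2": False}
    for _ in range(steps):
        # Synchronous update: every neuron reads only the previous state.
        state = {
            v: rng.random() < sigmoid(2 * b * sum(state[u] for u in ins) - b)
            for v, ins in IN_NEIGHBORS.items()
        }
    return state


def outcome_rates(y1: bool, y2: bool, b: float = 8.0, steps: int = 40, trials: int = 1000, seed: int = 0):
    """Fractions of trials ending with all four neurons silent and with all four firing."""
    rng = random.Random(seed)
    silent = active = 0
    for _ in range(trials):
        final = simulate(y1, y2, b, steps, rng)
        silent += not any(final.values())
        active += all(final.values())
    return silent / trials, active / trials


if __name__ == "__main__":
    for start in [(False, False), (True, False), (False, True), (True, True)]:
        print(start, outcome_rates(*start))
```

Any single spurious firing tends to propagate around the cycle and is then sustained by the self-loop, so the "quiet" start is only quiet for a time on the order of 1/δ steps.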

### 3.4 Compositionality definitions

We have defined a specific external behavior notion for our networks. We have also allowed the possibility of other external behavior notions. Here we define compositionality for general behavior notions. Later in the paper, in Section 5.3, we will show that our particular behavior notion is compositional.

In general, we define an external behavior notion to be compositional provided that the following holds: Consider any four networks , , , and , where and have the same sets of input and output neurons, and have the same sets of input and output neurons, and are compatible, and and are compatible. Suppose that and . Then .

We show that, in general, if two external behavior definitions are equivalent and one is compositional, then so is the other. This will provide us with a method that will be helpful in Section 5.3 for showing compositionality.

###### Theorem 16

If and are two equivalent external behavior notions for stochastic SNNs and is compositional, then also is compositional.

Proof. Suppose that and are two external behavior notions and is compositional. We show that is compositional. For this, consider any four networks , , , and , where and have the same sets of input and output neurons, and have the same sets of input and output neurons, and are compatible, and and are compatible. Suppose that and . We must show that .

Since and are equivalent and , we have that . Likewise, since , we have that . Since is assumed to be compositional, this implies that . Then since and are equivalent, we get that , as needed.
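The proof of Theorem 16 uses only the definition of compositionality and the fact that equivalence lets equalities move back and forth between the two notions. The Lean 4 sketch below makes that structure explicit under a simplifying assumption: it omits the side conditions on input/output sets and compatibility. Networks form an arbitrary type `Net`, composition is an arbitrary binary operation `comp`, and two behavior notions are called equivalent when they identify exactly the same pairs of networks. All names are chosen here for illustration.

```lean
variable {Net : Type} (comp : Net → Net → Net)

/-- A behavior notion `B` is compositional for `comp` (side conditions omitted). -/
def Compositional {γ : Type} (B : Net → γ) : Prop :=
  ∀ N₁ N₁' N₂ N₂' : Net,
    B N₁ = B N₁' → B N₂ = B N₂' → B (comp N₁ N₂) = B (comp N₁' N₂')

/-- Two behavior notions are equivalent if they identify the same pairs of networks. -/
def Equivalent {α β : Type} (B₁ : Net → α) (B₂ : Net → β) : Prop :=
  ∀ N N' : Net, B₁ N = B₁ N' ↔ B₂ N = B₂ N'

theorem compositional_of_equivalent {α β : Type} (B₁ : Net → α) (B₂ : Net → β)
    (hE : Equivalent B₁ B₂) (h : Compositional comp B₁) : Compositional comp B₂ := by
  intro N₁ N₁' N₂ N₂' h₁ h₂
  exact (hE (comp N₁ N₂) (comp N₁' N₂')).mp
    (h N₁ N₁' N₂ N₂' ((hE N₁ N₁').mpr h₁) ((hE N₂ N₂').mpr h₂))
```

The two `mpr` steps correspond to the first two sentences of the second proof paragraph, and the final `mp` step corresponds to its last sentence.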

###### Lemma 17

An external behavior notion is compositional if and only if, for all compatible pairs of networks and , is determined by and .

Proof. Straightforward.

## 4 Theorems for Acyclic Composition

Our general composition results appear in Section 5. Those are a bit complicated, mainly because of the possibility of connections in both directions between the sub-networks. Acyclic composition is a very important special case of general composition; in fact, most interesting examples seem to satisfy the acyclic restriction. Since the results for this case are much simpler, we present those first.

For this section, fix the notation , and assume that we have no edges from to , that is, that .
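The acyclic restriction is a purely syntactic condition on the edges of the composed network. The sketch below assumes a hypothetical representation in which every non-input neuron is owned by exactly one component: the upstream component's neurons go in one set and the downstream component's in another. The composition counts as acyclic when no edge leads from a downstream neuron into an upstream one. The small instance loosely mimics the Attention example, where information flows from the WTA to the Filter only. Its WTA edges are placeholders, not the paper's WTA construction.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def is_acyclic_composition(edges: Iterable[Edge], upstream: Set[str], downstream: Set[str]) -> bool:
    """True iff no edge goes from a neuron owned by the downstream network to one owned by the upstream network."""
    return not any(src in downstream and dst in upstream for src, dst in edges)


if __name__ == "__main__":
    # Hypothetical two-stream instance: WTA outputs w0, w1; Filter outputs f0, f1;
    # external stream inputs s0, s1 belong to neither component.
    wta = {"w0", "w1"}
    filt = {"f0", "f1"}
    edges = {("w0", "w0"), ("w0", "w1"), ("w1", "w0"), ("w1", "w1"),  # placeholder WTA edges
             ("s0", "f0"), ("w0", "f0"), ("s1", "f1"), ("w1", "f1")}
    print(is_acyclic_composition(edges, wta, filt))                   # True
    print(is_acyclic_composition(edges | {("f0", "w1")}, wta, filt))  # False
```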

In this section, and from now on in the paper, we will mostly avoid writing the cone notation . Thus, instead of , we will write just . We hope this does not cause much confusion.

### 4.1 Compositionality

We have not formally defined “compositionality” for the special case of acyclic composition. So instead of proving “compositionality” here, we will simply show how to express in terms of and .[^9]

[^9]: We could, presumably, define compositionality for this special case as before, but based on a modified definition of compatibility, one that includes the extra acyclic condition. Then we could give a characterization similar to Lemma 17 and use our result to show this version of compositionality. We leave this for later.

Specifically, we fix any particular input execution of , which generates a particular probabilistic execution of . Then we consider an arbitrary finite trace of that is consistent with . We show how to express in terms of probability distributions and that are generated by and , respectively, from certain input executions.

We begin by deriving a simple expression for , for an arbitrary finite trace of that is consistent with .

###### Lemma 18

Let be a finite trace of