# Parallel computation using active self-assembly

mpchen@caltech.edu, dorx@alumni.caltech.edu, woods@caltech.edu. Supported by National Science Foundation grants CCF-1219274, 0832824 (The Molecular Programming Project), and CCF-1162589.
A preliminary version appeared at The 19th International Conference on DNA Computing and Molecular Programming (DNA 19).

###### Abstract

We study the computational complexity of the recently proposed nubot model of molecular-scale self-assembly. The model generalises asynchronous cellular automata to have non-local movement where large assemblies of molecules can be pushed and pulled around, analogous to millions of molecular motors in animal muscle effecting the rapid movement of macroscale arms and legs. We show that the nubot model is capable of simulating Boolean circuits of polylogarithmic depth and polynomial size, in only polylogarithmic expected time. In computational complexity terms, we show that any problem from the complexity class NC is solvable in polylogarithmic expected time and polynomial workspace using nubots.

Along the way, we give fast parallel nubot algorithms for a number of problems including line growth, sorting, Boolean matrix multiplication and space-bounded Turing machine simulation, all using a constant number of nubot states (monomer types). Circuit depth is a well-studied notion of parallel time, and our result implies that the nubot model is a highly parallel model of computation in a formal sense. Asynchronous cellular automata are not capable of this parallelism, and our result shows that adding a rigid-body movement primitive to such a model, to get the nubot model, drastically increases parallel processing abilities.

## 1 Introduction

We study the theory of molecular self-assembly, working within the recently-introduced nubot model by Woods, Chen, Goodfriend, Dabby, Winfree and Yin [51]. Do we really need another new model of self-assembly? Consider the biological process of embryonic development: a single cell growing into an organism of astounding complexity. Throughout this active, fast and robust process there is growth and movement. For example, at an early stage in the development of the fruit fly Drosophila, the embryo contains approximately 6,000 large cells arranged on its ellipsoid-shaped surface. Then, in just four minutes, the embryo rapidly changes shape to become invaginated, creating a large structure that becomes the mesoderm, and ultimately muscle. How does this fast rearrangement occur? A large fraction of these cells undergo a rapid, synchronised and highly parallel rearrangement of their internal structure where, in each cell, one end of the cell contracts and the other end expands. This is achieved by a mechanism that seems to crucially involve thousands of molecular-scale myosin motors pulling and pushing the cellular cytoskeleton to quickly effect this rearrangement [30]. At an abstract level one can imagine this as being analogous to how millions of molecular motors in a muscle, each taking a tiny step but acting in a highly parallel fashion, effect rapid long-distance muscle contraction. This rapid parallel movement, combined with the constraint of a fixed cellular volume, as well as variations in the elasticity properties of the cell membrane, can explain this key step in embryonic morphogenesis. Indeed, molecular motors that together, in parallel, produce macro-scale movement are a ubiquitous phenomenon in biology.

We wish to understand, at a high level of abstraction, the ultimate limitations and capabilities of such molecular scale rearrangement and growth. We do this by studying the computational power of a theoretical model that includes these capabilities. As a first step towards such understanding, we show in this paper that large numbers of tiny motors (that can each pull or push a tiny amount) coupled with local state changes on a grid, are sufficient to quickly solve inherently parallelisable problems. This result, described formally below in Section 1.2, demonstrates that the nubot model is a highly parallel computer in a computational complexity-theoretic sense.

Another motivation, and potential test-bed for our theoretical model and results, is the fabrication of active molecular-scale structures. Examples include DNA-based walkers, DNA origami that reconfigure, and simple structures called molecular motors [53] that transition between a small number of discrete states (see [51] for references). In these systems the interplay between structure and dynamics leads to behaviours and capabilities that are not seen in static structures, nor in other unstructured but active, well-mixed chemical reaction network type systems. Our theoretical results here, and those in [51], provide a sound basis to motivate the experimental investigation of large-scale active DNA nanostructures.

There are a number of theoretical models of molecular-scale algorithmic self-assembly processes [39]. For example, the abstract Tile Assembly Model, where individual square DNA tiles attach to a growing assembly lattice one at a time [48, 42, 16], the two-handed (hierarchical) model, where large multi-tile assemblies come together [1, 7, 11, 14], and the signal tile model, where DNA origami tiles form an “active” lattice with DNA strand displacement signals running along it [23, 36, 37]. Other models enable one to program tile geometry [12, 18], temperature [1, 25, 46], concentration [5, 8, 15, 26], mixing stages [11, 13] and connectivity/flexibility [24].

The well-studied abstract Tile Assembly Model [48] is an asynchronous, and nondeterministic, cellular automaton with the restriction that state changes are irreversible and happen only along a crystal-like growth frontier. The nubot model is a generalisation of an asynchronous and nondeterministic cellular automaton, where the generalisation is that we have a non-local movement primitive. Since the nubot model is intended to be a model of molecular-scale phenomena it ignores friction and gravity, allows for the creation/destruction of monomers (we assume an invisible “fuel” source) and has a notion of random uncontrolled motion (called agitation, but not used in this paper). Instances of the model evolve as continuous time Markov processes, and time is modelled as in stochastic chemical kinetics [19, 45]. The nubot style of rigid-body movement is analogous to that seen in reconfigurable robotics [6, 43, 32], and indeed results in these robotics models show that non-local movement can be used to effect fast global reconfiguration [4, 3, 41]. The nubot model includes features seen in cellular automata, Lindenmayer systems [40] and graph grammars [28]. See [51] for a more detailed comparison of the similarities and differences with these models.

### 1.1 Previous work on active self-assembly with movement

Previous work on the nubot model [51] showed that it is capable of building large shapes and patterns exponentially quickly: e.g. lines and squares in time logarithmic in their size. The same paper goes on to describe a general scheme to build arbitrary computable (connected, 2D) size-n shapes in time, and with a number of monomer states (types), polylogarithmic in n, plus the time and states required for Turing machine simulation due to the inherent algorithmic complexity of the shape. Furthermore, 2D coloured patterns, where the colour choice for each pixel is computable in time polynomial in the length of the binary description of the pixel indices, are nubot-computable in time and with a number of monomer types polylogarithmic in the pattern size [51]. The latter result is achieved without going outside the pattern boundary and in a completely asynchronous fashion. These results show that the nubot model is capable of parallelism not seen in many other models of self-assembly. The goal of the present paper is to characterise the kind of parallelism seen in the nubot model by formally relating it to the computational complexity of classical decision problems.

Dabby and Chen [10] study a 1D model, where monomers insert between, and push apart, other monomers. Their model is closely related to a 1D restriction of the nubot model without state changes, and they build length lines in expected time and monomer types. They also show that the set of 1D polymers produced by any instance of their model is a context-free language, and give a design for implementation with DNA molecules. Malchik and Winslow [29] first show that any context-free language can be expressed as an instance of this model, and then give an asymptotically tight bound of on the length of polymers produced using monomer types (in merely expected time), thus characterising two aspects of the model.

### 1.2 Main result

In the nubot model a program is specified as a finite set of nubot rules and is said to decide a language L if, beginning with a word w encoded as a sequence of “binary monomers”, the system eventually reaches a configuration containing exactly the 1 monomer if w ∈ L, and the 0 monomer otherwise.
Let NC denote the (well-known) class of problems solved by uniform polylogarithmic depth and polynomial size Boolean circuits. (NC, or “Nick’s class”, is named after Nicholas Pippenger.) Our main result is stated as follows.

###### Theorem 1.

For each language L ∈ NC, there is a set of nubot rules that decides L in polylogarithmic expected time, with a constant number of monomer states, and polynomial space in the input string length n. Moreover, NC is contained in the class of languages decided by nubots running in polylogarithmic expected time, with a constant number of monomer states and polynomial space in the input length n.

NC problems are solved by circuits of shallow depth, hence they can be thought of as those problems that can be solved on a highly parallel architecture (simply run each layer of the circuit on a bunch of parallel processors; after polylogarithmic parallel steps we are done). NC is contained in P—the class of problems solved by polynomial time Turing machines—and this follows from the fact that NC circuits are of polynomial size. Problems in NC, and the analogous function class, include sorting, Boolean matrix multiplication, various kinds of maze solving and graph reachability, and integer addition, multiplication and division. Besides its circuit depth definition, NC has been characterised by a large number of other parallel models of computation including parallel random access machines, vector machines, and optical computers [20, 52, 49]. It is widely conjectured, but unproven, that NC is strictly contained in P. In particular, problems complete for P (such as Turing machine and cellular automata [35] prediction, context-free grammar membership and many others [20]) are believed to be “inherently sequential”—it is conjectured that these problems are not solvable by parallel computers that run for polylogarithmic time on a polynomial number of processors [20, 9].

Thus our main result gives a formal sense in which the nubot model is highly parallel: for any highly parallelisable (NC) problem our proof gives a nubot algorithm to efficiently solve it in only polylogarithmic expected time and a constant number of states. This stands in contrast to sequential machines like Turing machines, which cannot read all of an n-bit input string in polylogarithmic time, and to “somewhat parallel” models like cellular automata and the abstract Tile Assembly Model, which cannot have all n bits influence a single bit decision in polylogarithmic time [27]. Thus, adding a movement primitive to an asynchronous nondeterministic cellular automaton, as in the nubot model, drastically increases its parallel processing abilities.

We finish this discussion with a technical remark. Previous results [51] on the nubot model were of the form: for each n there is a set of nubot rules (i.e. the number of rules is a function of n) to carry out some task parameterised by n (examples: quickly grow a line of length n or an n × n square, or grow some complicated computable pattern or shape whose size is parameterised by n, etc.). For each problem in NC, our main result here gives a single set of rules (i.e. of constant size) that works for all problem instances.

### 1.3 Overview of results and paper structure

Section 1 contains the statement of our main result, the overall proof structure and some future work directions. Section 2 gives the full definition of the nubots model and relevant complexity classes. Section 3 serves as an introduction to the nubots model by giving a simple nubots algorithm to double the length of a line of monomers; we suggest the reader begin there.
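As a back-of-envelope illustration of why doubling-based growth is fast (purely illustrative, not the construction of Section 3; the function name is mine): a structure that doubles its length in each round reaches length n after only ⌈log₂ n⌉ rounds.

```python
def doubling_rounds(n):
    """Number of doubling rounds needed to grow a length-1 line
    to length at least n."""
    length, rounds = 1, 0
    while length < n:
        length *= 2  # every monomer spawns a copy: the line doubles
        rounds += 1
    return rounds

# A length-1,000,000 line needs only 20 doubling rounds.
```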

#### 1.3.1 New synchronization and line growth algorithms

In Section 4 we describe a fast signalling method for nubots from [51], here called shift synchronization, and give a new variant on this called lift synchronization. These signalling mechanisms are used throughout our constructions as a method to quickly send a bit, 0 or 1, over long distances, with the choice of 0 or 1 being encoded by the use of shift or lift synchronization respectively.

The line growth algorithm given in [51] grows a line of length n quickly, but uses a number of monomer states that grows with n, and starts from a single monomer on the grid. Section 5 gives a new line-growth algorithm that completes in polylogarithmic expected time and starts from a small line of monomers on the grid. A key feature of our algorithm is that it uses only a constant number of states. This helps us achieve our main result, which requires a single set of nubots rules that accept any word from some, possibly infinite, language: as part of our circuit simulation we need to build longer and longer lines to simulate larger and larger circuits, all with a single set of nubots rules.

#### 1.3.2 Parallel sorting, Boolean matrix multiplication & space bounded Turing machine simulation

Section 6 shows that the nubots model is capable of fast parallel sorting: n numbers can be sorted in expected time polylogarithmic in n. More precisely, n distinct natural numbers, presented as unordered “strings” of binary (0 or 1) monomers on the grid, can be sorted into increasing numerical order in polylogarithmic expected time, polynomial space, and with a constant number of monomer states. Our sorting routine is used throughout our main construction and is inspired by mechanisms, such as gel electrophoresis, that sort via spatial organization based on physical quantities, such as mass and charge [33].
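The nubot sorting routine itself is geometric; as a point of comparison (not the paper's construction), a classical parallel sorting network also sorts in a polylogarithmic number of rounds. The sketch below is Batcher's bitonic sorter: the two outer loops give O(log² n) rounds, and within a round every compare-exchange is independent and could run in parallel. The function name and the power-of-two length assumption are mine.

```python
def bitonic_sort(a):
    """Batcher's bitonic sorting network (ascending order).
    len(a) must be a power of two."""
    n = len(a)
    k = 2
    while k <= n:          # stage: merge bitonic runs of length k
        j = k // 2
        while j > 0:       # round: compare-exchange at distance j
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # swap when the pair is out of order for its direction
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

By the 0-1 principle for comparator networks, correctness on all 0-1 inputs implies correctness on arbitrary inputs, duplicates included.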

Section 7 shows that two Boolean matrices can be multiplied in polylogarithmic expected time, polynomial space and with a constant number of monomer states. This immediately implies that problems reducible to Boolean matrix multiplication, such as directed graph reachability, and indeed any problem in the complexity class NL, of languages accepted by nondeterministic logarithmic space bounded Turing machines, can be solved in polylogarithmic expected time on nubots.
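The reduction mentioned above can be illustrated in ordinary sequential code (this is not the nubot construction; names are mine): Boolean matrix multiplication plus O(log n) repeated squarings of the reflexive adjacency matrix computes directed graph reachability.

```python
def bool_matmul(A, B):
    """Boolean product of two n x n 0/1 matrices."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def reachability(adj):
    """Transitive-reflexive closure via O(log n) Boolean squarings."""
    n = len(adj)
    # start from (I OR adj): paths of length <= 1
    R = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    steps = 1
    while steps < n:
        R = bool_matmul(R, R)  # doubles the path length covered
        steps *= 2
    return R
```

Each squaring is itself a highly parallel operation, which is why this trick fits a polylogarithmic-time model.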

Indeed, in Section 8.1 we go on to generalise this result by showing that any nondeterministic logarithmic space bounded Turing machine that computes a function (as opposed to merely deciding a language) can also be simulated in polylogarithmic expected time. This involves modifying the usual matrix multiplication method to keep track of the contents of the output tape of the Turing machine, and correctly reassembling the encoded tape contents on the 2D grid.

These results show that the model is capable of fast parallel solution of many problems, in particular all of those in NL. Recall that NL ⊆ NC, so we are not done yet. Indeed these techniques form part of our more general result: polylogarithmic expected time solution of problems in NC via efficient simulation of uniform Boolean circuits, as described next.

#### 1.3.3 Proof overview of main result: Theorem 1

Let L ∈ NC; in other words, L is decidable by a logspace-uniform family of Boolean circuits of polylogarithmic depth and polynomial size. To prove Theorem 1, we show that for each such L there exists a finite set of nubots rules that decides L. L being in logspace-uniform NC implies that there is a deterministic logarithmic space (in input size) Turing machine M that, on input 1^n, outputs a description of the unique Boolean circuit in the family that has n input gates. Our initial nubots configuration consists of a length-n line of binary nubots monomers that represents some input word w (as described in Definition 3). From this we create (copy) another length-n line of monomers that encodes the unary string 1^n to be given as input to a nubots simulator of M. The rule set includes a description of M, and the system first generates a circuit by simulating the computation of M on input 1^n, which produces a nubots configuration (collection of monomers in a connected component) that represents the circuit. The circuit is then simulated on input w. Both of these tasks present a number of challenges.

##### Circuit Generation.

Logspace Turing machines run in at most polynomial time in their input length (otherwise they repeat a configuration), but here we wish to generate the circuit in merely polylogarithmic time. To achieve this, our simulation of the logspace machine works in a highly parallel fashion. This uses a number of techniques. First, in nubots, we implement the (known) trick of space-bounded Turing machine simulation by fast iterated matrix multiplication, which in turn is used to solve reachability on the directed graph of all possible configurations of the Turing machine. One of the main challenges here is to carry out matrix multiplication on the 2D grid sufficiently fast but without monomers unintentionally colliding with each other. Second, although iterated matrix multiplication is sufficient to simulate a Turing machine that decides a language, here we wish to simulate a Turing machine that computes a function. To do this, our parallel matrix multiplication algorithm keeps track of any symbols written to the output tape by both valid (reachable) and invalid (unreachable) configurations, and at the end deletes those symbols written by invalid configurations, leaving only the valid output symbols. These valid output symbols are then arranged into the correct order by our fast parallel sorting routine. This results in a string of monomers that encodes the circuit. These monomers then rearrange themselves in the plane, to lay out the circuit with each row of gates layered one on top of the next as shown in Figure 1. (Note that for convenience and to save space we sometimes draw figures on a square grid, although the nubots model is formally defined on the hexagonal grid.)

##### Circuit Simulation.

As already described, the input word is encoded as a line of binary monomers, and the entire circuit is “grown” from this line. The input monomers now move to the first (bottom) row of the encoded circuit (Figure 1(c)) and position themselves so that each gate can “read” its 1 or 2 input bit monomers. After each gate computes a “result” bit, the layer “synchronizes” via a fast synchronization routine.

Next, we wish to send the “result” bits from each layer up to the next. Circuits are not necessarily planar, so we need to handle wire crossings. We use our fast parallel sorting routine: the outputs from the first circuit layer are sorted, from left to right in increasing order, using their “to” address as a key. For example, a layer 1 result bit that is destined for gate 5 in layer 2 will be placed to the left of a layer 1 result bit that is destined for gate 6 in layer 2. Using this sorting routine, the blue “wire address” regions in the circuit (Figure 1(d)) are sorted in increasing order from left to right, then appropriately padded with empty space in between (using counters), and are passed up to the next level. Layer 1 then destroys itself. The entire circuit is simulated, level by level, from bottom to top, in this manner. After the “output gate” monomer computes its output bit it destroys itself, leaving a single monomer in state 1 or 0. No more rules are applicable and so the system has halted with its answer. This completes the overview of the simulation.
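The level-by-level scheme above can be sketched in ordinary sequential code (names and encoding are mine; the nubot construction additionally routes wires by sorting on destination addresses and runs all gates of a layer in parallel, which this sketch does not model):

```python
def eval_layered_circuit(layers, inputs):
    """Evaluate a layered Boolean circuit bottom-up.

    layers: list of layers; each gate is (op, [indices of its inputs
    in the previous layer's outputs]).  Mirrors the paper's scheme of
    passing each layer's result bits up to the next layer.
    """
    values = list(inputs)
    for layer in layers:
        results = []
        for op, srcs in layer:
            bits = [values[s] for s in srcs]
            if op == 'AND':
                results.append(int(all(bits)))
            elif op == 'OR':
                results.append(int(any(bits)))
            elif op == 'NOT':
                results.append(1 - bits[0])
            else:  # 'ID': a plain wire
                results.append(bits[0])
        values = results  # previous layer is discarded ("destroys itself")
    return values[0]      # the output gate's bit

# XOR as a depth-3 layered circuit: x XOR y = (x OR y) AND NOT (x AND y)
XOR = [
    [('OR', [0, 1]), ('AND', [0, 1])],
    [('ID', [0]), ('NOT', [1])],
    [('AND', [0, 1])],
]
```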

This overview ignores many details. In particular the nubots model is asynchronous, that is, rule updates happen independently as discrete events in continuous time with no two events happening at the same time (as in stochastic chemical kinetics). The construction includes a large number of synchronization steps and signal passing to ensure that all parts of the construction are appropriately staged, but yet the construction is free to carry out many fast, asynchronous, parallel steps between these “sequential” synchronization steps.

### 1.4 Future work and open questions

The line growth algorithm in [51] runs in expected time , uses states and space . In Section 5 we give another line growth algorithm that runs in expected time , uses states and space . Is there a line-growth algorithm that does better than time space states ? To keep the game fair, the input should be a collection of monomers with space states .

Theorem 1 gives a lower bound on the power of the nubot model. What are the best lower and upper bounds on the power of confluent polylogarithmic expected time nubots? (By confluent we mean a kind of determinism where the system, that is, the rules together with the input, is assumed to always produce a unique terminal assembly.) One challenge involves finding better Turing machine space, or circuit depth, bounds on computing multiple applications of the movable set (see Section 2) on a polynomial size (or larger) nubot grid.

Synchronization is a signalling method we use to quickly send signals in a non-local fashion. In this paper it is used extensively to compose nubot algorithms. What conditions are necessary and sufficient for composition of arbitrary nubot algorithms that do not use synchronization? Theorem 7.1 in [51] shows that a wide class of patterns can be grown without synchronization, and its proof gives examples of composition without synchronization. It would be interesting to formalise this notion of composition in our distributed systems without the long-range fast signalling that synchronization provides.

Agitation is a kind of undirected, or random, movement that was defined for the nubot model in [51] and is intended to model a nanoscale environment where there are uncontrolled movements and turbulent fluid flows in all directions interacting with each monomer. Is it possible to simulate nubot-style movement using agitation? As motivation, note that every self-assembled molecular-scale structure was made under conditions where agitation is a dominant source of movement! Our question asks if we can programmably exploit this random molecular motion to build structures quicker than without it.

Is the nubot model intrinsically universal? More precisely, does there exist a single set of monomer rules such that any nubot system can be simulated by “seeding” it with a suitable initial configuration? The notion of intrinsic universality is giving rise to interesting characterisations, and separations, in a variety of tile assembly models [16, 17, 12, 14, 31, 21, 22]; for an overview see the survey [50]. Our hope would be that intrinsic universality, with its tight notion of simulation, could be used to tease apart the power of different notions of movement (for example, to understand whether nubot-style movement is weaker or stronger than other notions of movement).

Other open problems and further directions can be found in [51].

## 2 The nubot model and other definitions

In this section we formally define the nubot model. Figure 2 gives an overview of the model and rules, and Figure 3 gives an example of the movement rule. An example nubot construction for “line-doubling” is given in Section 3 which may aid the reader at this point. Let .

The model uses a two-dimensional triangular grid with a coordinate system using axes x and y as shown in Figure 2(a). A third axis, w, is defined as running through the origin, but we use only the x and y coordinates to define position. The axial directions are the unit vectors along the three axes and their negations. Each grid point has a set of six neighbours, one in each axial direction. Let S be a finite set of monomer states. A nubot monomer is a pair (s, p) where s ∈ S is a state and p is a grid point. Two monomers on neighbouring grid points are either connected by a flexible or rigid bond, or else have no bond (called a null bond). Bonds are described in more detail below. A configuration is a finite set of monomers along with the bonds between them.

One configuration transitions to another via the application of a single rule that acts on one or two monomers. (In reference [51] the nubot model includes “agitation”: each monomer is repeatedly subjected to random movements intended to model a nano-scale environment where there is Brownian motion, uncontrolled movements and turbulent fluid flows in all directions. Our constructions in this paper work with or without agitation, hence they are robust to random uncontrolled movements, but we choose not to formally define agitation, for ease of presentation.) The left and right sides of a rule respectively represent the contents of two monomer positions before and after its application.
The left-hand side of a rule gives two monomer states, at most one of which may be the empty state (denoting the lack of a monomer), the bond type between them, and the relative position of the second monomer to the first. If either state is empty then the bond between them must be null (monomers cannot be bonded to empty space). The right-hand side is defined similarly, although there are some further restrictions on valid rules (involving movement) described below.
A rule is only applicable in the orientation it specifies, and so rules are not rotationally invariant.

A rule may involve a movement (translation), or not. First, consider the case of no movement, where the relative position of the two monomers is unchanged by the rule. From above, at most one of the left-hand-side states is empty, hence we disallow spontaneous generation of monomers from empty space. State changes and bond changes occur in a straightforward way; examples are shown in Figure 2(b). If a left-hand-side state is empty and the corresponding right-hand-side state is not, then the rule induces the appearance of a new monomer at the empty location specified by the rule. If one or both monomer states go from non-empty to empty, the rule induces the disappearance of monomer(s) at the specified orientation(s).

For a movement rule, the relative position of the two monomers changes: the old and new relative positions must be at Manhattan distance one from each other on the triangular grid. If we fix the old relative position, there are exactly two new relative positions that satisfy this requirement. A movement rule is applicable if it can be applied both (i) locally and (ii) globally, as follows.

(i) Locally, the pair of monomers should be in the states, share the bond, and have the relative orientation specified by the left-hand side of the rule. Then, one of the two monomers is chosen nondeterministically to be the base (which remains stationary); the other is the arm (which moves). If the first monomer is chosen as the arm, it moves from its current position to the new position given by the rule’s new relative orientation, as illustrated in Figure 2(b). Analogously, if the second monomer is chosen as the arm, it moves so that the new relative orientation is achieved. Bonds and states may change during the movement.

(ii) Globally, the movement rule may push and/or pull other monomers, or if it cannot then it is not applicable. This is formalised as follows, and an example is shown in Figure 3. Let v be a unit vector. The v-boundary of a set of monomers S is defined to be the set of grid points outside S that are unit distance in the direction v from monomers in S. Let C be a configuration containing adjacent monomers A and B, and let C′ be C except that the bond between A and B is null in C′ if not null in C. The movable set is the smallest subset of C′ that contains A but not B and can be translated by v to give a configuration in which: (a) monomer pairs that are joined by rigid bonds have the same relative position as before the translation, (b) monomer pairs that are joined by flexible bonds remain neighbours, and (c) the v-boundary of the movable set contains no monomers. If there is no such set, then we define the movable set to be empty. If the movable set is nonempty, then the movement where A is the arm (which should be translated by v) and B is the base (which should not be translated) is applied as follows: (1) the movable set moves unit distance along v; (2) the states of, and the bond between, A and B are updated according to the rule; (3) the states of all monomers besides A and B remain unchanged and pairwise bonds remain intact (although monomer positions and flexible/null bond orientations may change). If the movable set is empty, the movement rule is inapplicable (the rule is “blocked” and in particular A is prevented from translating).
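The closure process behind the movable set can be sketched as a simple graph search. The sketch below is a deliberate simplification under my own naming: it handles rigid bonds and pushed monomers only, ignores flexible bonds and the final boundary re-check, and reports a blocked move when the base itself would have to translate.

```python
def movable_set(positions, rigid_bonds, arm, base, v):
    """Simplified movable-set computation: starting from the arm, close
    under (a) rigidly bonded neighbours, which must translate together,
    and (b) monomers sitting in the way of the translation v, which must
    be pushed.  Returns None if the base ends up in the set (blocked).

    positions: dict monomer_id -> (x, y)
    rigid_bonds: dict monomer_id -> iterable of bonded monomer_ids
    v: unit translation vector, e.g. (1, 0)
    """
    occupied = {pos: m for m, pos in positions.items()}
    S = {arm}
    frontier = [arm]
    while frontier:
        m = frontier.pop()
        x, y = positions[m]
        for other in rigid_bonds.get(m, ()):   # rigid bonds move as one body
            if other not in S:
                S.add(other)
                frontier.append(other)
        blocker = occupied.get((x + v[0], y + v[1]))  # monomer to be pushed
        if blocker is not None and blocker not in S:
            S.add(blocker)
            frontier.append(blocker)
    return None if base in S else S
```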

A nubot system is a pair (c0, R) where c0 is the initial configuration and R is the set of rules. If configuration c transitions to c′ by some rule we write c → c′. A trajectory is a finite sequence of configurations beginning at the initial configuration, in which each configuration transitions to the next. A nubot system is said to assemble a target configuration if, starting from the initial configuration c0, every trajectory evolves to a translation of the target configuration.

A nubot system evolves as a continuous time Markov process. The rate for each rule application is 1. If there are k applicable transitions for a configuration (i.e. k is the sum of the number of rule and agitation steps that can be applied to all monomers), then the probability of any given transition being applied is 1/k, and the time until the next transition is applied is an exponential random variable with rate k (i.e. the expected time is 1/k). The probability of a trajectory is then the product of the probabilities of each of the transitions along the trajectory, and the expected time of a trajectory is the sum of the expected times of each transition in the trajectory. Thus, the expected time for the system to evolve from one configuration to another is the sum, over the set of all trajectories from the first configuration to any translation of the second, of the probability of a trajectory multiplied by its expected time.
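The timing model above is exactly Gillespie-style stochastic kinetics. A minimal sketch of one event (function name mine), assuming every applicable transition has rate 1:

```python
import random

def gillespie_step(applicable, rng):
    """One event of the continuous time Markov process: with k applicable
    transitions, each at rate 1, the waiting time is exponential with
    rate k (mean 1/k) and the transition is chosen uniformly at random."""
    k = len(applicable)
    dt = rng.expovariate(k)          # time to next event ~ Exp(k)
    chosen = rng.choice(applicable)  # each transition has probability 1/k
    return chosen, dt
```

Repeating this step and summing the dt values gives the elapsed time of a trajectory, matching the definition of expected trajectory time above.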

The complexity measure number of monomers is the maximum number of monomers that appears in any configuration. The number of states is the total number of distinct monomer states that appear in the rule set. Space is the maximum area, over the set of all reachable configurations, of the minimum area rectangle (on the triangular grid) that, up to translation, contains all monomers in the configuration.

The following lemma is used to analyse some of our constructions and was proven in [51].

###### Lemma 2 ([51]).

In a nubot system, if there are rule applications that must happen, and (1) the desired configuration is reached as soon as all rule applications happen, (2) for any specific rule application among those rule applications, there exist at most rule applications such that and for all , can be applied directly after have been applied, regardless of whether other rule applications have happened or not, (3) for some constant , then the expected time to reach the desired configuration is .

### 2.1 Nubots and decision problems

A line is a finite-length horizontal segment of nubot monomers. Given a binary string w, a line encoding w is a line segment of nubot monomers that represents w using one of two “binary” monomer states; its length is the number of monomers it contains. Given a line of monomers composed of line segments, we index first by segment and then by monomer (or sometimes by the bit encoded by that monomer) within a segment. We next define what it means to decide a language (or problem) using nubots.

###### Definition 3.

A finite set of nubot rules decides a language L if for every word w there is an initial configuration consisting only of the horizontal line of monomers encoding w, where by applying the rule set, the system always eventually reaches a configuration containing only a single “answer” monomer which is in one of two states: (a) “1” if w ∈ L, or (b) “0” if w ∉ L. Further, from the time it first appears, the answer monomer never changes its state.

### 2.2 Boolean circuits and the class

We define a Boolean circuit to be a directed acyclic graph, where the nodes are called gates and each node has a label that is one of: input (in-degree 0), constant 0 (in-degree 0), constant 1 (in-degree 0), OR (in-degree 1 or 2), AND (in-degree 1 or 2), NOT (in-degree 1). One of the gates is identified as the output gate, which has out-degree 0. The depth of a circuit is the length of the longest path from an input gate to the output gate. The size of a circuit is the number of gates it contains. Besides the output gate, all other gates have out-degree bounded by the circuit size. We work with layered circuits: gates on layer i feed into gates on layer i + 1. A circuit computes a Boolean (no/yes) function on a fixed number of Boolean variables, with the inputs and constants defining the output gate value in the standard way. In order to compute functions over an arbitrary number of variables, we define (usually infinite) families of circuits. We say that a family of circuits decides a language L if for each n the circuit with n input gates, on input w of length n, outputs 1 if w ∈ L and 0 if w ∉ L.

In a non-uniform family of circuits there is no required similarity, or relationship, between family members. In order to specify such a requirement we use a uniformity function that algorithmically specifies some similarity between members of a circuit family. Roughly speaking, a uniform circuit family is an infinite sequence of circuits with an associated function that generates members of the family and is computable within some resource bound. Here we care about logspace-uniform circuit families:

###### Definition 4 (logspace-uniform circuit family).

A circuit family C = {c_n | n ∈ ℕ} is logspace-uniform if there is a function f : {1}* → {0,1}* that is computable on a deterministic logarithmic space Turing machine, where for all n, f(1^n) = ⟨c_n⟩, and ⟨c_n⟩ is a description of a circuit with n input gates.

Without going into details, we assume reasonable descriptions (encodings) of circuits as strings. We note that there are stricter, but more technical to state, notions of uniformity in the literature, such as AC⁰- and DLOGTIME-uniformity [2, 20, 34]. We do not require anything more restrictive than logspace uniformity here: our main result is a lower bound on nubots' power, hence the more expressive the uniformity condition on circuits, the better (although most of the common circuit classes are reasonably robust under these more restrictive definitions anyway).

Define NC^i to be the class of all languages that are decided by O(log^i n)-depth, polynomial-size, logspace-uniform Boolean circuit families. Define NC = ⋃_{i ≥ 1} NC^i; in other words, NC is the class of languages decided by polylogarithmic-depth, polynomial-size, logspace-uniform Boolean circuit families. Since circuits are of polynomial size, they can be simulated by polynomial time Turing machines, and so NC ⊆ P. It remains open whether this containment is strict [20]. See [47] for more on circuits.

The complexity class NL is the set of languages accepted by nondeterministic Turing machines that have a read-only input tape and a single worktape of length logarithmic in the input length.

## 3 Example: A nubots line doubling routine

This section describes a simple construction with the goal of familiarising the reader with the nubot model. We give an algorithm for doubling the length of a length-n line of monomers in O(log n) expected time. This algorithm is essentially a simplification of the line growth algorithm in [51], and it will be used in later sections of the paper. We first describe the algorithm, then provide a proof of correctness and a time and space analysis.

We require that the input line consist of monomers in alternating states: every monomer in the input line is in one of two designated states, and no two adjacent monomers are in the same state. This property of the line is preserved at the end of the line doubling routine.

###### Lemma 5.

A length-n line of monomers can be doubled to length 2n in O(log n) expected time, using O(1) states and O(n) space.

###### Proof.

Algorithm description. The algorithm uses concurrent applications of the pair doubling subroutine (PDS) described in Figure 4. As described in more detail below, the algorithm treats the input line of n monomers as a line of n/2 monomer pairs that can double in length independently of each other, for even n. After the execution of the subroutine, a monomer pair is transformed into two monomer pairs in alternating states different from the original pair. This ensures that each pair of monomers in the input line can only double in length once during the course of the entire algorithm execution. Thus, the length of the input line is doubled by the end of the algorithm, which terminates when every monomer pair in the input has been doubled in length via the subroutine. For odd n, the same thing happens for the first n − 1 monomers, and the rightmost monomer simply adds a single new monomer to its right.

PDS begins with a pair of monomers in left-right states and ends with four monomers alternating between two new left and right states. Figure 4a provides an example input and output of the line doubling algorithm, where monomers are shown as left (purple), right (blue) pairs. The rules for PDS are given in Figure 4b and an example execution is shown in Figure 4c. Each monomer on the line assumes either the "left" or the "right" state: left is colored purple, right is colored blue. The initial monomers send themselves to new left-right states while inserting two new monomers, giving two alternating left-right pairs. To achieve this, the initial pair of monomers create a "bridge" of 2 monomers on top and, by using movement and appearance rules, two new monomers are inserted. The bridge monomers are then deleted and we are left with four monomers. Throughout the execution, all monomers are connected by rigid bonds so the entire structure is connected. PDS completes in constant expected time 13, since there are a total of 13 rules for PDS that must be applied sequentially, as shown in Figures 4b and 4c.

PDS has the following properties: (i) during the application of its rules to an initial pair of monomers it does not interact with any monomers outside of this pair, and (ii) a left-right pair creates two adjacent left-right pairs. These properties imply that along a partially formed line, multiple subroutines can execute asynchronously and in parallel, on disjoint left-right pairs, without interfering with each other.
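The state bookkeeping behind properties (i) and (ii) can be mirrored in a toy simulation. Here 'L', 'R' stand for the input left-right states and 'l', 'r' for the fresh output states; the state names are ours, not the paper's:

```python
def double_line(line):
    """Line doubling via PDS: each input (L, R) pair is independently
    replaced by two (l, r) pairs in fresh states, so no pair can be
    doubled twice and disjoint pairs never interfere."""
    assert len(line) % 2 == 0
    assert all(s == ('L' if i % 2 == 0 else 'R') for i, s in enumerate(line))
    out = []
    for _ in range(len(line) // 2):
        out.extend(['l', 'r', 'l', 'r'])  # PDS: one pair becomes two pairs
    return out
```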

##### Correctness.

To demonstrate that the algorithm doubles the length of the line correctly, it is sufficient to demonstrate that the following invariant holds throughout the algorithm execution and that the algorithm terminates. Every left/right pair of monomers in states the input becomes replaced by two left/right monomer pair in states . Locally, the invariant holds from the fact that PDS takes a pair of left/right monomers in states as shown in Figure 4a.1 and outputs four monomers in states as shown in Figure 4a.2, with Figure 4c demonstrating that PDS does this correctly. Since PDS can be applied to each monomer pair independently of any other pair, adjacent concurrent applications of PDS will not block each other. To see that the algorithm terminates, we note that since the input and the output of PDS assume different states and PDS can only double monomer pairs in the input states, each pair of monomers in the original input line can undergo PDS exactly once.

##### Time and space analysis.

As shown in Figure 4c, the space complexity of PDS is O(1). Since PDS only attaches monomers on top of the input monomers as per the rules, adjacent monomer pairs in the input of the line doubling algorithm remain on the same axis (i.e. maintain their y-coordinates on the triangle grid shown in Figure 2a). Thus, the space complexity of the line doubling algorithm is O(n). We have established above that the expected time for PDS is O(1). Each applicable rule is applied as an independent Poisson process of rate 1; therefore, the expected time for a single rule application event to take place is 1/k, where k is the number of positions at which a rule is currently applicable. Let T be the time it takes for the line doubling algorithm to terminate on an input of length n; then the expected value of T is O(log n). ∎
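The O(log n) figure is an instance of a standard fact about the maximum of independent exponential waiting times; a sketch of the calculation, for the idealised case where each of the k = n/2 subroutines is a single rate-1 exponential step:

```latex
\mathbb{E}\Big[\max_{1 \le i \le k} X_i\Big]
  \;=\; \sum_{j=1}^{k} \frac{1}{j} \;=\; H_k \;=\; \Theta(\log k),
\qquad X_1, \dots, X_k \sim \mathrm{Exp}(1)\ \text{i.i.d.}
```

Since each PDS is in fact a constant-length sequence of such steps, the same bound holds up to a constant factor, giving an expected completion time of O(log n).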

## 4 Using synchronization to communicate quickly

In previous work [51] a fast signalling method, called synchronization, was introduced for the nubot model. Here, we use the term “shift synchronization” for this technique, and introduce another kind of synchronization called “lift synchronization”. With these two synchronization mechanisms, we can send one of two distinct messages (bits) to all monomers on a line in expected time that is merely logarithmic in the line length.

###### Lemma 6 (Communication via synchronization).

Let ℓ be a length-n line of monomers, where each monomer in ℓ is in one of two distinct states, with each adjacent pair of monomers in distinct states. A bit can be communicated to all monomers on the line in O(log n) expected time, O(1) monomer states and O(n) space.

###### Proof.

We first give a brief overview of shift synchronization using Figure 5; more details can be found in [51]. Each monomer on the line attaches a new synchronization monomer below itself, in a synchronization state and with a rigid bond. When a synchronization monomer senses a new horizontally adjacent neighbouring synchronization monomer, it forms a rigid (horizontal) bond with this monomer. After connecting to both neighbouring synchronization monomers, the synchronization monomer removes the bond between it and its parent monomer above.

The rightmost and leftmost synchronization monomers are treated differently. At the rightmost end of the line, the new monomer requires only one bonded neighbour (to the left) before removing its bond to its parent monomer. The leftmost synchronization monomer is called the "shift monomer". This shift monomer attempts to push the (new) synchronization row to the right. However, by definition of the movement rule, the shift monomer can move only after all of the vertical rigid bonds between the synchronization row and the original line have been removed. Also, due to the order in which bonds are formed and removed, this can only happen after the entire synchronization row has grown. At some point, we are guaranteed to reach the configuration in Figure 5(g), where the shift monomer is free to push right. After the move (Figure 5(h)), the relative position of the synchronization monomers to their generating monomers has changed. Thus, the monomers of the original line are free to detect that synchronization has occurred, and a 0 bit has been communicated to all of them.

To send a 1 bit we use a similar method, called lift synchronization, shown in Figure 6. In lift synchronization the synchronization row is lifted vertically down, and away, from the original line, rather than being shifted right. As with shift synchronization this can only occur after the entire synchronization row has been built and all bonds are in their final form. After the move (Figure 6(h)), the monomers on the original line detect the new empty space below, and thus detect that a 1 bit has been communicated to them.

In this way, for a line in any of the 6 rotations, it is possible to communicate a 0 or a 1 bit, depending on whether shift or lift synchronization is used. The expected time to send the bit is O(log n), as (a) all new monomers are created independently and in parallel, and (b) each monomer needs only to wait on a constant number of neighbours in order to get its bond structure into the final configuration. The space and states bounds are straightforward to see. ∎

## 5 Fast line growth using O(1) states

The line growth algorithm given in [51] grows a line of length n in O(log n) expected time, using O(log n) monomer states and starting from a single monomer on the grid. Here, we provide an alternative line growth algorithm that completes in O(log² n) expected time, using O(1) monomer states and starting from O(log n) monomers on the grid. Although our construction is a log n factor slower than that in [51], it uses only O(1) states while maintaining the property that all growth is contained within an O(n) region. The latter two properties are both requirements in achieving our main theorem via the other constructions in this paper, which extensively use this line growth algorithm.

###### Problem 7 (Binary Line Growth problem).

Input: A line of monomers, each in one of two binary states, that encodes the integer n in binary in the standard way.

Output: A line of n monomers.

###### Theorem 8 (Binary Line Growth).

There is a nubot algorithm that solves the Binary Line Growth problem in expected time O(log² n), space O(n), and with O(1) states.

###### Proof.

As described in the problem statement, the input is encoded as a line of k = O(log n) monomers where the ith monomer encodes bit x_i of the binary string x, and where x encodes n in the usual way. The construction proceeds iteratively: at iteration i, where 1 ≤ i ≤ k, bit x_i is read from the input, and if x_i = 1 the partially grown line is increased in length by 2^{i−1}; otherwise the length of the line remains unchanged. The idea is described at a high level in the algorithm in Figure 7; below we show that the integer variables in that algorithm can be implemented as lines of the corresponding integer lengths, and that these can be acted upon in a way that quickly builds the length-n line.

##### Construction details.

During construction, the line-growing configuration is composed of three main regions. The first is the "input", as described above; at iteration i of the algorithm the least significant remaining bit of the input is read (stored), and deleted. Then we have a working region containing two lines, respectively called the "generator" and the "mask", each of which has length 2^{i−1} at iteration i. Finally we have the "line" under construction: after iteration i, the line length is given by the binary number encoded by the first i bits (LSBs) of the input.

The construction begins with the rightmost of the input monomers growing a small, constant-size, hardcoded structure containing both the generator and the mask, both initialised to be of length 1.

Figure 7 describes a (seemingly overcomplicated, but analogous to our construction) algorithm for generating the integer n from a bit string x. Our construction implements this algorithm, but where the integer variables "line", "generator" and "mask" are encoded in unary as lines of monomers of the corresponding lengths. It is straightforward to verify, via induction on the number of bits read, that upon input of the string x that encodes n, the algorithm in Figure 7 returns the integer n. Our nubot implementation of one iteration of this algorithm is shown in Figure 8. Figure 8 uses a high-level notation where lines of nubot monomers are represented as colored lines drawn on the grid. We describe the construction by describing the main primitives it uses to implement the algorithm in Figure 8: line doubling or tripling implement multiplying by 2 or 3; synchronization communicates a bit (and thus which instruction to execute next) to all monomers; and masking implements taking differences.
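The arithmetic carried out by the unary lines can be summarised as follows. This is a sketch of the simple doubling semantics described above; the variable names, and the omission of the mask bookkeeping, are ours:

```python
def grow_line(bits_lsb_first):
    """Compute n from its bits, mirroring the unary-line construction:
    'line' and 'gen' are unary lines, represented here by their integer
    lengths; gen holds 2**(i-1) at iteration i."""
    line, gen = 0, 1
    for b in bits_lsb_first:
        if b == 1:
            line += gen   # append a generator-length piece to the line
        gen *= 2          # line doubling (Section 3) on the generator
    return line
```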

##### Line doubling and tripling.

Line doubling takes a line of length m and generates a line of length 2m, as described in Section 3. Line tripling takes a line of length m and generates a line of length 3m, using a similar technique (rather than inserting 2 monomers, we insert 1, synchronize, then insert 1 again), hence we omit the details.

##### Synchronization & communicating a bit.

We use a synchronization algorithm to simultaneously switch a line of monomers into a single shared state. As described in Section 4, we have the two methods of lift and shift synchronization: we use one to communicate a 0 bit and the other to communicate a 1 bit to monomers in the generator and mask.

##### Masking.

For two lines of different lengths m < m′, masking communicates their difference to the line of greater length m′. The lines are assumed to be orientated parallel, touching, and horizontal, with their leftmost monomers at the same horizontal position. Assume the shorter line is on top: it synchronizes (by growing a new synchronization row on top), then the longer line synchronizes (by growing a new synchronization row on the bottom). Then the monomers in the longer line detect the presence or absence of monomers on the shorter line above: if there is a monomer above, the longer-line monomer goes to one designated state; if not, it goes to another. See Figure 9(d)-(f) for an outline.
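The effect of masking on the longer line can be summarised with a two-state sketch; the state names 'covered'/'exposed' are ours:

```python
def mask_states(short_len, long_len):
    """After both lines synchronize, each monomer of the longer line
    checks the cell above it: monomers under the shorter line become
    'covered', and the remaining suffix of length long_len - short_len
    becomes 'exposed', communicating the difference in lengths."""
    assert short_len < long_len
    return ['covered'] * short_len + ['exposed'] * (long_len - short_len)
```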

##### Final steps.

The final bit of the input to be read is 1 (the MSB of a binary number is always 1), and just before reading it the line length is n − 2^{k−1}. Upon reading the final bit, some message passing occurs (via synchronizations) to trigger the deletion of the mask and to cause the generator monomers to change state so that they are now part of the line. This latter step adds the generator length 2^{k−1} to the line, giving the desired line length of n.

##### Time, space, and states analysis.

Line doubling/tripling of a length-m line happens in expected time O(log m), as does synchronization. There are O(log n) iterations, each with a constant number of doublings/triplings and synchronizations, hence the total expected time is O(log² n). The three lines (mask, generator, line) are of length at most n, and with their synchronization rows the height needed is 4, giving a space bound of O(n). A straightforward analysis of the algorithm shows that O(1) states are sufficient. ∎

## 6 Fast parallel sorting

In this section we show how, on the nubot model, to sort n distinct binary numbers, each of O(log n) bits, in polylogarithmic expected time and a constant number of states. Our sorting algorithm is loosely inspired by the work of Murphy et al. [33], who show that physical techniques can be used to sort numbers that are represented as the magnitude of some physical quantity. They show that a variety of physical mechanisms can be thought of as implementations of fast parallel sorting, including gel electrophoresis and chromatography (molecular weight), rainbow sort [44] (frequency), and mass spectrometry (mass-to-charge ratio). However, our construction needs to take care of the fact that ours is a robotic-style geometric model that needs to implement fast growth while handling blocking and other geometric constraints. A similar algorithm works for variations on this problem, but we omit the details.

We first define the distinct element sorting problem and then formally state the result.

###### Problem 9 (Distinct element sorting problem).

Input: A line of monomers, denoted ℓ, composed of n contiguous line segments, where each line segment ℓ(i) is of length ⌈log n⌉ + 1 and encodes a distinct binary number from {1, …, n}; specifically, for all i, each monomer ℓ(i, j) is in one of two binary states and the end-of-segment monomer is in one of two binary end-of-segment states.

Output: A line consisting of the binary line segments sorted in increasing order of the standard lexicographical ordering of their binary sequences.

###### Theorem 10 (Distinct element sorting).

Any instance of the distinct element sorting problem is solvable on the nubot model in expected time O(log² n), space O(n log n) × O(n), and O(1) monomer states.

###### Proof.

The general idea is as follows. For each element (encoded as a "head") to be sorted, we grow a line of monomers (a "rod") to a length equal to the number encoded by the head, as shown in Figure 10(b). After doing so, the relative heights of the heads give their order. We then move each head horizontally left, through a sequence of parallel merging steps, so that all heads are vertically aligned (Figure 10(c)). Finally, the heads are rotated and translated so that they lie along a vertical line as shown in Figure 10(d), in increasing order. The details are described next.

### 6.1 Sorting details: rod growth and labeling

We begin with an instance of the distinct element sorting problem, an example of which is shown in Figure 11(a).

##### Initialization.

The monomers begin in binary states as described in Problem 9. Growth begins at each of the blue heads: the head is copied and rotated down to vertical as shown in Figure 11(b). This rotation of a length-O(log n) line takes O(log n) expected time to complete using the parallel "arm rotation" method in [51]; that is, each monomer independently rotates by one position relative to its leftmost neighbour. After rotation (Figure 11(b)), each blue-green line independently synchronizes, then makes a copy of itself which is in turn rotated down to become one of the horizontal light-grey line segments shown in Figure 11(c). After all light-grey segments are horizontal, they bond to each other and synchronize. This entire process completes in expected time O(log n), using Lemma 2, and is dominated by the synchronization process.

##### Grow rods.

After this synchronization step, as shown in Figure 11(c), the rightmost grey line segment is copied to form a dark grey segment that is copied down to vertical in Figure 11(d). Also in Figure 11(d), and triggered by the previous synchronization, the blue-green rods, in O(log n) expected time, signal the heads to disconnect from each other, and the blue-green rods then begin "growing upwards". This vertical growth of the rods implements a form of counting: we want each rod to grow to the height encoded by its blue head. This is carried out by using the line-growth algorithm in Section 5, which takes O(log² n) expected time (an alternative method would be to use a suitable counter, such as the one described below). After a rod has grown to the value encoded in its head, shown in Figure 11(e), the rod synchronizes, this latter step taking expected time O(log n). After O(log² n) expected time all rods have synchronized.

##### Label growth.

Rod growth occurs above the light-grey line. Below that line another process takes place, the purpose of which is to label each rod with its position (counting from the right), as a binary number in purple. Here the dark-grey line (Figure 11(d), on the right) grows a "padded" counter, from right to left. The result of this counter is shown in Figure 11(e) and is a rectangle where each of the purple columns, from right to left, encodes a distinct value from n down to 1, with the grey regions in between being there for padding purposes only.

This counter works as follows. The counter is a modified version of the one used in Section 6.2 of [51]; the counter in [51] used O(log n) states, here we use O(1) states. First note that the dark grey strip is of height ⌈log n⌉; it begins counter growth by converting each of its monomers to a state that represents the bit 1, giving the binary representation of the number 2^{⌈log n⌉} − 1. Let j = ⌈log n⌉ − 1, and we begin from the single dark grey column, applying the following procedure iteratively to each new column until j < 0. Each column copies itself to the left, and in the new column the jth bit is flipped. Both columns then decrement their value of j, and both iterate the copy-and-bit-flip procedure. As is the case in [51], this process happens asynchronously and independently in all columns. After this happens we have a rectangle containing all of the purple columns. We are not done yet: we wish for the purple counter columns to align themselves with the green rods, which are distance ⌈log n⌉ + 1 apart, as in Figure 11(e). To achieve this, another round of column insertion (i.e. counting) begins, so that between each pair of counter columns, exactly ⌈log n⌉ new columns are inserted (between each pair of purple columns we are implementing a counter that counts from ⌈log n⌉ down to 1; note that the integer ⌈log n⌉ is available since the purple counter columns are of height ⌈log n⌉). Now the purple counter columns are exactly the required distance apart. When the process is complete the bottom row of the entire rectangle synchronizes, to give the structure illustrated in Figure 11(e) (although the grey rectangle extends further to the left than shown).
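The copy-and-flip recursion generates every h-bit column exactly once; a sketch in which the bit order and orientation conventions are ours:

```python
def counter_columns(h):
    """Generate the 2**h counter columns by the copy-and-flip process:
    starting from the all-ones column with j = h - 1, each column
    copies itself to the left with bit j flipped, then both columns
    decrement j and recurse."""
    def expand(col, j):
        if j < 0:
            return [col]
        flipped = col[:j] + [1 - col[j]] + col[j + 1:]
        return expand(flipped, j - 1) + expand(col, j - 1)
    return expand([1] * h, h - 1)  # returned leftmost column first
```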

To give a straightforward time analysis, we assume that the copying and decrementing for an individual column happens sequentially and so takes expected time O(log n). Then in the completed counter, each counter column is the result of no more than O(log n) column-copying operations, hence any monomer in the final rectangle depends on the application of O(log² n) rules. Applying Lemma 2 gives an expected time of O(log² n). The final synchronization step costs O(log n) expected time, giving a total expected time of O(log² n).

##### Deletion and synchronization.

All columns of the dark grey region then delete themselves, except those that are directly below a green rod. The deletion events happen in O(log n) expected time. The purple regions that remain are counter columns that encode distinct binary numbers. After each green rod (above) has synchronized, it signals to the light-grey line. As each counter column below completes deletion, it too signals to the light-grey line. The light-grey line then undergoes a lift synchronization. The system is now in the configuration shown in Figure 11(f).

##### Analysis.

As already discussed, rod growth and the subsequent synchronization of all rods take expected time O(log² n), and label growth takes expected time O(log² n).

### 6.2 Sorting details: merging

Now that all rods have grown and are labeled, we merge them as shown in Figure 10(c).

##### Main idea.

Intuitively, we would like to simply shift all of the heads to the left, deleting any rods that get in the way. However, if we are not careful, rods can block each other and significantly slow down the process so that it no longer runs in time polylogarithmic in n (consider the worst case, where the shortest rod is the rightmost one, and we wish to move all heads to the left). Our merging algorithm gets around this issue by merging in a pairwise fashion. Every second pair of heads merges, deleting one of the rods, and then the light-grey line synchronizes. We are left with n/2 rods, each having two heads. Then every second pair of those merge, and so on for O(log n) iterations. To organise the correct order of mergings, we use the purple labels, specifically their binary sequences, which are shown in Figure 11(f).

##### Merging algorithm.

The following procedure is iterated until there is exactly one rod left. Each rod checks its label: if the least significant bit (LSB) of the purple label is 1 (in Figure 11(f) the LSB is on top), then the rod attempts to merge with the rod to its left, by "moving" its head to the left. This pairwise merging process is described in the caption of Figure 12. The right rod of the pair, and its label (but not its head), get deleted in the process. After merging of the pair of rods is complete, the surviving rod deletes its LSB, thus shortening its purple region by 1. This rod now has two heads, and signals to the light-grey line that it is done. After all pairs have merged, we have n/2 rods, with 2 heads each. At this point the light-grey line synchronizes and the process iterates. After O(log n) rounds of pairwise merging we are left with one rod, which has no label and is carrying all heads.

When a pair of rods is merging, the heads of the right rod need to move left to the neighbouring rod. In the worst case there are collisions between the heads of a pair of rods; however, these are all resolved in parallel as described in Figure 12. So we have up to n heads that each need to independently walk a bounded distance to the left, which naïvely takes O(log n) expected time per head, and by applying Lemma 2 all heads complete in O(log n) expected time per round.
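The merging schedule (who merges when, driven by label LSBs) can be simulated abstractly. This toy model tracks only labels and head counts, not the geometry; all names are ours:

```python
def merge_rounds(n):
    """Rods labeled 1..n, rightmost first. Each round, a rod whose
    label has LSB 1 merges its heads into its left neighbour and is
    deleted; every surviving rod then drops its LSB (label //= 2).
    Returns (number of rounds, heads on the final rod)."""
    rods = [{'label': i, 'heads': 1} for i in range(1, n + 1)]
    rounds = 0
    while len(rods) > 1:
        nxt, k = [], 0
        while k < len(rods):
            if k + 1 < len(rods) and rods[k]['label'] % 2 == 1:
                rods[k + 1]['heads'] += rods[k]['heads']  # right merges into left
                rods[k + 1]['label'] //= 2                # delete the LSB
                nxt.append(rods[k + 1])
                k += 2
            else:
                rods[k]['label'] //= 2
                nxt.append(rods[k])
                k += 1
        rods = nxt
        rounds += 1
    return rounds, rods[0]['heads']
```

For n a power of two this runs exactly log₂ n rounds, matching the O(log n) iteration bound above.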

##### Final steps.

After merging is complete, the heads are on a single rod, sorted vertically upwards in increasing order of head value. The heads rearrange themselves on the rod so that they are separated by vertical distance exactly ⌈log n⌉ + 1, and then rotate down into a line configuration, giving the sorted list as shown in Figure 10(d).

### 6.3 Sorting details: time, space and states analysis

The expected time to complete the various stages of sorting was given above, and is dominated by growing and synchronizing the rods, which takes O(log² n) expected time. For the space analysis, note that the length of the light-grey line is O(n log n) (giving the horizontal space bound). The rods are of height at most n, and the purple labels are of height O(log n), giving a vertical space bound of O(n). Hence we get a space bound of O(n log n) × O(n). All counters and line growth algorithms use a number of states that is constant, which can be seen by a careful analysis of each part of the construction. ∎

## 7 Fast Boolean matrix multiplication

Let A and B be n × n Boolean matrices. Let A[i, j] denote the element at row i and column j of A, and let AB denote the matrix product of A and B. The following two definitions are illustrated in Figure 13 and describe our encoding of a square matrix as an addressed line of monomers. (Our choice of a 1D, rather than 2D, encoding simplifies our constructions. It would also be possible to use a more direct 2D square encoding, which, it turns out, can be unfolded to and from our line encoding quickly; we omit the details.)

###### Definition 11 (Matrix element encoding).

An element A[i, j] of a Boolean matrix A is encoded in nubot monomers as a line of monomers e_{i,j} = b · r_i · c_j, where b is a nubot monomer that encodes the bit A[i, j], and r_i and c_j are lines of binary monomers of length ⌈log n⌉ that encode the numerical values i and j, respectively (the segments r_i and c_j are each terminated by delimiter monomers).

###### Definition 12 (Monomer encoded Boolean matrix).

An n × n Boolean matrix A is encoded in nubot monomers as a line ℓ_A consisting of all e_{i,j} for i, j ∈ {1, …, n}, ordered from left to right, first by i, then by j.

The main result of this section, Theorem 14, is a fast parallel algorithm for Boolean matrix multiplication.

###### Problem 13 (Monomer encoded Boolean matrix multiplication problem).

Input: Monomer encoded Boolean matrices ℓ_A and ℓ_B, that represent n × n Boolean matrices A and B.

Output: Monomer encoding of the n × n Boolean matrix AB.

###### Theorem 14.

The monomer encoded Boolean matrix multiplication problem can be solved in O(log² n) expected time, polynomial space and with O(1) monomer states.

### 7.1 Parallel function evaluation in 2D

Before proving Theorem 14 we give a useful lemma that formalises a notion of nubots efficiently computing many (here, m²) functions in parallel, where each function acts on two length-k inputs. Figure 14 illustrates the proof.

###### Lemma 15 (Parallel function evaluation in 2D).

Let f be any function that maps a pair of length-k adjacent parallel horizontal monomer lines to a length-k horizontal monomer line, that is, f(ℓ₁, ℓ₂) = ℓ₃, and moreover is nubot computable in expected time t, O(k) × O(k) space, and O(1) states. Let ℓ_A and ℓ_B be monomer lines, each composed of m consecutive length-k monomer lines (called "line segments") A_1, …, A_m and B_1, …, B_m. Then, given ℓ_A and ℓ_B as input, the line consisting of all f(A_i, B_j) for i, j ∈ {1, …, m} is computable using nubots in O(t + log mk) expected time, O(mk) × O(mk) space, and O(1) states.

###### Proof of Lemma 15.

Figure 14 gives an overview of the construction. From an initial configuration with ℓ_A and ℓ_B adjacent as in Figure 14(a), ℓ_B rotates down to vertical (Figure 14(b)). A copy of ℓ_B is made from the grey line, which rotates down to horizontal, as shown in Figure 14(c). In Figure 14(d.1), we duplicate each line segment A_i, for i ∈ {1, …, m}, m times, down to the grey vertical line, which acts as a barrier to stop the duplication. A one-monomer horizontal gap is inserted between adjacent green columns (of line segments), which triggers a vertical synchronization, shown as a vertical red line in Figure 14(d.2), of each completed green column. Next, monomer-to-monomer messages are passed, horizontally from right to left, within each green line segment to signify that monomers should change from being "vertically connected" to being "horizontally connected". After this, the vertical red synchronization lines carry out another synchronization and then delete themselves in a way that keeps all green monomers horizontally connected. In Figure 14(d.3), each purple segment B_i inserts a one-monomer vertical gap between it and its neighbour B_{i+1}. After all gaps are inserted, the purple vertical line synchronizes, and then horizontal synchronizations happen which tell excess duplicates of ℓ_A to delete themselves, to give the configuration in Figure 14(d.4).

Next, a duplication and deletion process occurs with the B_j line segments, as shown in Figure 14(e) (similar to what we did before, but now horizontally rather than vertically). The B_j's duplicate until they hit the vertical grey barrier on the right, at which point the system synchronizes. After this occurs, excess segments are deleted (using direct monomer-to-monomer message transfer as before). When this process is complete, we are at Figure 14(e.3).

Next, the duplicates of each B_j rotate up to horizontal as shown, and the leftmost copy deletes itself in a way that vertically "shrinks" the assembly, to get Figure 14(f). During this process we make grey synchronization rows, also shown in Figure 14(f). From Figure 14(f) to Figure 14(g), f is computed on each of the m² pairs of line segments (independently and in parallel), and by the lemma hypotheses this can be done in the allotted space. The horizontal red lines synchronize, and then the vertical red line synchronizes. After this occurs, we can delete the grey synchronization rows and unfold the result into a line, consisting of m² length-k segments, to get the final configuration in Figure 14(h), using the technique shown in Figure 15.

##### Space, state and time analysis.

By stepping through the construction (and Figure 14), it is straightforward to check that the entire construction is contained within O(mk) × O(mk) space, and uses O(1) states.

For the time analysis, we first observe that rotation, and copying, of a length-p line can each be done in O(log p) expected time via a straightforward analysis [51]. Steps (b) and (c) of Figure 14 involve rotations and copying of lines of length O(mk): this completes in O(log mk) expected time. The duplication processes of green and purple segments in Figures 14(d) and (e) take O(log mk) expected time. Each application of f takes expected time t, and we apply it independently in parallel m² times, hence via Lemma 2, all applications complete in merely O(t + log m) expected time. There are a number of other places where independent processes, each with expected time O(log mk), take place (deletions in Figures 14(d.4) and (e.3), and rotations in (f)), and by Lemma 2 they take O(log mk) expected time. In each of Figures 14(d.2), (e.2) and (g) there are many lines, each of length O(mk), that need to be synchronized. For example, in Figure 14(d.2), synchronization for each red vertical line takes O(log mk) expected time, and since we must wait until all vertical lines are synchronized (independently), and only then synchronize the horizontal line, this takes O(log mk) expected time. Finally, the rearrangement from Figure 14(g) to (h) (given in detail in Figure 15) takes O(log mk) expected time: each insertion line must grow before a level is moved up, and since the insertion lines work independently, Lemma 2 bounds the expected time to finish. Besides computing f, the slowest parts of the construction run in O(log mk) expected time, and there are at most a constant number of such parts, so the entire construction finishes in O(t + log mk) expected time. This concludes the proof of Lemma 15. ∎

### 7.2 Proof of Theorem 14: fast Boolean matrix multiplication

###### Proof of Theorem 14.

The multiplication of two Boolean matrices A and B is defined as (AB)[i, j] = ⋁_{k=1}^{n} (A[i, k] ∧ B[k, j]). To calculate this, we begin by defining the function f, which acts on two encoded matrix elements e_{i,k} (from ℓ_A) and e_{k′,j} (from ℓ_B) as follows:

f(e_{i,k}, e_{k′,j}) = b · r_i · c_j if k = k′, and a line of ∅ monomers of the same length if k ≠ k′,

where ∅ is a special monomer state denoting "no useful data here", b is the monomer encoding for the (useful) bit A[i, k] ∧ B[k, j] when k = k′, and as usual r_i, c_k, r_{k′}, c_j denote the binary monomer line segments encoding i, k, k′, j.
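The role of f is to localise the products A[i,k] ∧ B[k′,j]: only index-matched pairs carry data, and collecting the useful bits with OR yields AB. A sketch of the same computation in conventional terms:

```python
from itertools import product

def bool_matmul(A, B):
    """All-pairs evaluation in the style of f, followed by OR-collection.
    A, B are n x n 0/1 matrices given as lists of lists."""
    n = len(A)
    # f over every pair of addressed elements (i, k) and (k2, j):
    # a pair is 'useful' only when the inner indices match (k == k2).
    useful = [((i, j), A[i][k] & B[k2][j])
              for (i, k), (k2, j) in product(product(range(n), repeat=2),
                                             repeat=2)
              if k == k2]
    C = [[0] * n for _ in range(n)]
    for (i, j), bit in useful:   # OR together the n useful bits per (i, j)
        C[i][j] |= bit
    return C
```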

We now apply Lemma 15 to ℓ_A and ℓ_B, setting the function to f. This gives a line of monomers with n⁴ segments, each of length O(log n), n³ of which encode useful data. The remainder are the segments for which k ≠ k′. The entire line synchronizes and begins the process of deleting the useless line segments as follows. Each segment encodes an O(log n)-bit number (as the concatenation of the bit strings for its indices). The digits of this number are used to organise the deletion of the segments. If the LSB of this number in segment