Search Methods for Tile Sets in Patterned
Mika Göös, Tuomo Lempiäinen, Eugen Czeizler, and Pekka Orponen
Department of Information and Computer Science and
Helsinki Institute for Information Technology HIIT
Aalto University, Finland
Abstract. The Pattern self-Assembly Tile set Synthesis (PATS) problem, which arises in the theory of structured DNA self-assembly, is to determine a set of coloured tiles that, starting from a bordering seed structure, self-assembles to a given rectangular colour pattern. The task of finding minimum-size tile sets is known to be NP-hard. We explore several complete and incomplete search techniques for finding minimal, or at least small, tile sets and also assess the reliability of the solutions obtained according to the kinetic Tile Assembly Model.
Keywords. DNA self-assembly, tilings, Tile Assembly Model, pattern assembly, tile set synthesis, reliable self-assembly
This is the author’s version of a work that was accepted for publication in Journal of Computer and System Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Computer and System Sciences, volume 80, issue 1, pages 297–319, February 2014, doi:10.1016/j.jcss.2013.08.003.
Preliminary versions of parts of this work have appeared in the Proceedings of the 16th and 17th International Conference on DNA Computing and Molecular Programming (Hong Kong, China, June 2010 and Pasadena, CA, USA, September 2011, respectively) [13, 18].
Department of Computer Science, University of Toronto
Helsinki Institute for Information Technology HIIT and Department of Computer Science, University of Helsinki
Algorithmic assembly of nucleic acids (DNA and RNA) has advanced extensively in the past 30 years, from a seminal idea to the current designs and experimental implementations of complex nanostructures and nanodevices with dynamic, programmable evolution and machinelike properties. Recent developments in the field include fundamental constructions such as in vitro complex 3D pattern formation and functionalisation [4, 16], robotic designs such as mobile arms, walkers, motors [24, 42], computational primitives [29, 30], and also applications to in vivo biosensors  and potential drug delivery mechanisms and therapeutics .
Self-assembly of nucleic acids can be seen both as a form of structural nanotechnology and as a model of computation. As a computational model, one first encodes the input of a computational problem into an algorithmically designed (DNA) pattern or shape. Then, by making use of both the initial oligomer design and the intrinsic properties of the self-assembly system, one manipulates the structure to produce a new architecture that encodes the desired output.
As a nanotechnology, the goal of algorithmic (DNA/RNA) self-assembly is to design oligomer sequences that in solution would autonomously (or with as little interaction as possible) assemble into complex polymer structures. These may have both static and dynamic properties, may bind other molecules such as gold nanoparticles or various proteins, may act as fully addressable scaffolds, or may be used for further manipulation. Such molecular constructions can be composed of from only a couple of DNA strands to more than 200 and, in some cases, can change their conformation and achieve distinct functionalities.
In recent years there has been a growing interest in integrating these two directions, in order to obtain complex supramolecular constructions with interdependencies between computational functions and conformational switching. Such approaches are envisioned due to a key property of nucleic acid scaffolds, viz. their modularity: multiple functional units can be attached to a common scaffold, thus giving rise to multifunctional devices. Thus, the self-assembly of nanostructures templated on synthetic DNA has been proposed by several authors as a potentially ground-breaking technology for the manufacture of next-generation circuits, devices and materials [15, 26, 40, 41]. Also laboratory techniques for synthesising the requisite 2D DNA template lattices, many based on Rothemund’s  DNA origami tiles, have recently been demonstrated by many groups [22, 31].
In order to support the manufacture of aperiodic structures, such as electronic circuit designs, these DNA templates need to be addressable. When the template is constructed as a tiling from a family of DNA origami (or other kinds of) tiles, one can view the base tiles as being “coloured” according to their different functionalities, and the completed template implementing a desired colour pattern.111For examples of such tile-based high-level designs for nano-electric circuits cf. Appendix A, which summarises a scheme from Czeizler et al. . Now, a given target pattern can be assembled from many different families of base tiles, and to improve the laboratory synthesis it is advantageous to try to minimise the number of tile types needed and/or maximise the probability that they self-assemble to the desired pattern, given some characteristics of tiling errors.
The task of minimising the number of DNA tile types required to implement a given 2D pattern was identified by Ma and Lombardi , who formulated it as a combinatorial optimisation problem, the Pattern self-Assembly Tile set Synthesis (PATS) problem, and also proposed two greedy heuristic algorithms for solving the task. The problem was recently proved to be NP-hard [2, 36], and hence finding an absolutely minimum-size tile set for a given pattern most likely requires an exponential amount of time in the worst case. Thus the problem needs to be addressed either with complete methods yielding optimal tile sets for small patterns, or incomplete methods that work also for larger patterns but do not guarantee that the tile sets produced are of minimal size. In this work, we present search algorithms covering both approaches and assess their behaviour experimentally using both randomly generated and benchmark pattern test sets. We attend both to the running time of the respective algorithms, and to the size and assembly reliability of the tile sets produced.
In the following, we first in Section 2 present an overview of the underlying tile assembly model [39, 33] and the PATS problem , and then in Section 3 discuss the search space of pattern-consistent tile sets (viewed abstractly as partitions of the ambient rectangular grid). In Section 4 we proceed to describe our exhaustive partition-search branch-and-bound algorithm (PS-BB) to find tile sets of absolutely minimum cardinality. The algorithm makes use of a search tree in the lattice of grid partitions, and an efficient bounding function to prune this search tree.
While the PS-BB algorithm can be used to find certifiably minimal tile sets for small patterns, the size of the search space grows so rapidly that the algorithm hits a complexity barrier at approximately pattern sizes of tiles, for random test patterns. Thus, in a second approach, presented in Section 5, we tailor the basic partition-search framework of the PS-BB algorithm towards the goal of finding small, but not necessarily minimal tile sets. Instead of a systematic branch-and-bound pruning and traversal of the complete search space, the modified algorithm PS-H applies heuristics which attempt to optimise the order of the directions in which the space is explored.
It is well known in the heuristic optimisation community [12, 23] that when the runtime distribution of a randomised search algorithm has a large variance, it is with high probability more efficient to run several independent short runs (“restarts”) of the algorithm than a single long run. Correspondingly, we investigate the efficiency of the PS-H algorithm for a number of parallel executions ranging from 1 to 32, and note that indeed this number has a significant effect on the success rate of the algorithm in finding small tile sets.
As a third alternative, presented in Section 6, we formulate the PATS problem as an Answer Set Programming (ASP) task , and apply a generic ASP solver to find solutions to it. Here our experimental results indicate that for patterns with a small optimal solution, the ASP approach indeed works well in discovering that solution.
Given the inherently stochastic nature of the DNA self-assembly process, it is important also to assess the reliability of a given tile set, i.e. the probability of its error-free self-assembly to the desired target pattern. In Section 7 we introduce a method for estimating this quantity, based on Winfree’s analysis of the kinetic Tile Assembly Model . We present experimental data on the reliability of tile sets found by the PS-BB and PS-H algorithms and find that also here the heuristic optimisations introduced in the PS-H approach result in a notable improvement over the basic PS-BB method.
2.1 The Abstract Tile Assembly Model
The aTAM is a custom-made generalisation of Wang tile systems [38, 14], designed for the study of self-assembly systems. The basic components of the aTAM are non-rotatable unit square tiles, uniquely defined by the sets of four “glues” assigned to their edges. The glues come from a finite alphabet, and each pair of two glues is associated a strength value that determines the stability of a link between two tiles having these glues on the abutting edges. In most cases, it is assumed that the strength of two distinct glues is zero, while a pair of matching glues has strength either 1 or 2.
Let be the set of four functions corresponding to the four cardinal directions:222In many cases, we use the elements of just as direction labels and do not interpret them as functions. However, in these cases too, we identify and . , , and . Let be a finite set of glue types and a glue strength function such that, unless otherwise specified, only if . A tile type is a quadruple of glue types for each side of the unit square. A tile system is a finite collection of different tile types.
A (tile) assembly is a partial mapping that assigns tiles to locations in the two-dimensional grid. A tile assembly system (TAS) consists of a tile system , a seed assembly , a glue strength function and a temperature (we use ). The seed structure can be either an individual tile or a connected, finite assembly. Given an existing (connected) assembly , such as the seed structure , a tile from can adjoin the assembly if the total strength of the binding, given by the sum of all strength function values among the glues placed on the boundary between the tile and the assembly, reaches or surpasses the temperature threshold . Note that tiles of the seed assembly do not need to be in the tile system , but that can be extended only by tiles from .
Formally, we say that assembly produces directly assembly , denoted , if there exists a site and a tile such that , where the union is disjoint, and
where ranges over those directions in for which is defined.
In Figure 1 we present a TAS with seven tile types and temperature which, starting from the seed tile, assembles a continuously growing structure that corresponds to a binary counter pattern (see Figure 9). Out of the seven tile types in Figure 1(a), one can distinguish the tile s used as a seed, two tile types which assemble the boundary of the structure, and four rule-tile types (two of which are distinguished by x and y), which fill the area in between the L-shaped boundary. Considering the partial assembly presented in Figure 1(b), a tile of type y can adjoin the assembly at position since
while a tile of type x cannot adjoin the assembly at the same position (i.e. ) since
Let be the reflexive transitive closure of . A TAS produces an assembly if is an extension of the seed assembly , that is, . Denote by the set of all assemblies produced by . A TAS is deterministic if for any assembly and for every there exists at most one such that can be extended with at site . Then the pair forms a partially ordered set, which is a lattice if and only if is deterministic. The maximal elements in , i.e. the assemblies for which there does not exist any satisfying , are called terminal assemblies. Denote by the set of terminal assemblies of . In case of finite assemblies, an equivalent definition of determinism is that all assembly sequences terminate and for some assembly . In this case we say that uniquely produces .
2.2 The PATS Problem
Let the dimensions and be fixed. A mapping from onto defines a -colouring or a -coloured pattern. To build a given pattern, we start with boundary tiles in place for the west and south borders of the by rectangle and keep extending this assembly by tiles with strength-1 glues.
Definition 1 (Pattern self-Assembly Tile set Synthesis (PATS) ).
Given: A -colouring . Find: A tile assembly system such that The tiles in have glue strength 1. The domain of is and all the terminal assemblies have domain . There exists a tile colouring such that each terminal assembly satisfies for all .
Finding minimal solutions (in terms of ) to the PATS problem was claimed to be NP-hard by Ma and Lombardi  and proved to be so by Czeizler and Popa .333An improvement of the result to use only a constant number of tile colours is due to Seki . Without loss of generality, we consider only TASs in which every tile type participates in some terminal assembly of .
In the literature, the seed assembly of a TAS is often taken to be a single seed tile  whereas we consider an L-shaped seed assembly. The boundaries can always be self-assembled using different tiles with strength-2 glues, but we wish to make a clear distinction between the complexity of constructing the boundaries and the complexity of the 2D pattern itself. Moreover, in some experimental designs for DNA tile assembly systems, such as that by Fujibayashi et al. , the implementation of seed structures by the DNA origami technique  allows the creation of such complete boundary conditions in a natural way.
Due to constraint P1 the self-assembly process proceeds in a uniform manner directed from south-west to north-east. This paves the way for a simple characterisation of deterministic TASs in the context of the PATS problem.
Solutions of the PATS problem are deterministic precisely when for each pair of glue types there is at most one tile type such that and .
A simple observation reduces the work needed in finding minimal solutions of the PATS problem.
The minimal solutions of the PATS problem are deterministic TASs.
For the sake of contradiction, suppose that is a minimal solution to a PATS problem instance and that is not deterministic. By the above proposition, let tiles be such that and . Consider the simplified TAS . We show that this, too, is a solution to the PATS problem, which violates the minimality of .
Suppose . If , then some can be used to extend in . If , then could be used to extend in , so we must have . But since new tiles are always attached by binding to south and west sides of the tile, could then be extended by in . Thus, we conclude that and furthermore . This demonstrates that has property P2. The properties P1 and P3 can be readily seen to hold for as well. In terms of we have found a more optimal solution—and a contradiction. ∎
We consider only deterministic TASs in the sequel.
3 The Search Space of Consistent Tile Sets
Let be the family of partitions of the set . Partition is coarser than partition (or is a refinement of ), denoted , if
Now, is a partially ordered set, and in fact, a lattice. Note that implies .
A colouring induces a partition of the set . In addition, since every (deterministic) solution of the PATS problem uniquely produces some assembly , we associate with a partition of , . Here, in case all tiles in are used in the terminal assembly. Now condition P3 in the definition of PATS is equivalent to requiring that a TAS satisfies
A partition is constructible if for some deterministic TAS satisfying properties P1 and P2. Hence the PATS problem can be rephrased using the family of partitions as the fundamental search space.
A minimal solution to the PATS problem corresponds to a partition such that is constructible, and is minimal.
For example, the 2-coloured pattern in Figure 3(a) defines a 2-class partition . The 7-class partition in Figure 3(b) is a refinement of () and in fact, is constructible (see Figure 4(b)) and corresponds to a minimal solution of the PATS problem instance defined by the pattern .
3.1 Determining Constructibility
In this section, we give an algorithm for deciding the constructibility of a given partition in polynomial time. To do this, we use the concept of most general (or least constraining) tile assignments. For simplicity, we assume the set of glue labels to be infinite.
Given a partition of the set , a most general tile assignment (MGTA) is a function such that
When every position in is assigned a tile type according to , any two adjacent positions agree on the glue type of the side between them.
For all assignments satisfying A1 we have444To shorten the notation, we write instead of .
for all .
Given a partition and a function , we say that is obtained from by merging glues and missing if for all we have
A most general tile assignment for a partition can be found as follows. We start with a function that assigns to each tile edge a unique glue type, or in other words, a function such that the mapping is injective. Next, we go through all pairs of adjacent positions in in some order and require their matching sides to have the same glue type by merging the corresponding glues. This process generates a sequence of functions and terminates after steps.
The above algorithm generates a most general tile assignment.
By the end, we are left with a function that satisfies property A1 by construction. To see why property A2 is satisfied, we again use the language of partitions.
Any tile assignment on gives rise to a set of equivalence classes (or a partition) on : class-direction pairs that are assigned the same glue type reside in the same equivalence class. The initial assignment gives each class-direction pair a unique glue type, and thus corresponds to the initial partition . In the algorithm, any glue merging operation corresponds to the combining of two equivalence classes.
The algorithm goes through a list of pairs of elements from that are required to have the same glue type. In this way, the list records necessary conditions for property A1 to hold. This is to say that every tile assignment satisfying A1 has to correspond to a partition of that is coarser than each of the partitions in , where is the partition obtained from the initial partition by combining classes and . Since the set is a lattice, there exists a unique greatest lower bound of the partitions in . This is exactly the partition that the algorithm calculates in the form of the assignment . As a greatest lower bound, is finer than any partition corresponding to an assignment satisfying A1, but this is precisely the requirement for condition A2. ∎
The above analysis also gives the following.
For a given partition of , MGTAs are unique up to relabelling of the glue types.
Thus, for each partition , we take the MGTA for missing to be some canonical representative from the class of MGTAs for .
For efficiency purposes, it is worth mentioning that MGTAs can be generated iteratively: A partition can be obtained by repeatedly combining partition classes starting from the initial partition :
As a base case, an MGTA for can be computed by the above algorithm. An MGTA for each can be computed from an MGTA for the previous partition by just a small modification: Let an MGTA be given for and suppose is obtained from by combining classes . Now, an MGTA for can be obtained from by merging tiles and , that is, merging the glue types on the four corresponding sides.
We now give the conditions for a partition to be constructible in terms of MGTAs.
A partition is constructible iff the MGTA for is injective and the tile set is deterministic in the sense of Proposition 2.
“”: Let be constructible and let the MGTA for be given. Let be a deterministic TAS such that . The uniquely produced assembly of induces a tile assignment that satisfies property A1. Now using property A2 for the MGTA we see that any violation of the injectivity of or any violation of the determinism of the tile set would imply such violations for . But since corresponds to a constructible partition, no violations can occur for and thus none for .
“”: Let be an injective MGTA with deterministic tile set . Because is deterministic, we can choose glue types for a seed assembly so that the westernmost and southernmost tiles fall into place according to in the self-assembly process. The TAS , with appropriate glue strengths , then uniquely produces a terminal assembly that agrees with on . This gives , but since is injective, and so . ∎
In order to understand the result of Lemma 8 better, let us consider the 2-coloured pattern in Figure 5(a), and associate to it the 2-class partition generated by the colours of the pattern. We can use the result of the previous lemma to show that this partition in not constructible. Indeed, if we consider the procedure for generating an MGTA for this partition, e.g. Figure 5(b) and (c), we obtain that the two tiles of the MGTA (one coloured white, the other black) must have the same glues on all corresponding positions. Hence the MGTA is not injective, nor deterministic in the sense of Proposition 2.
4 Complete Search for Minimal Tile Sets
We now extend the techniques of Ma and Lombardi  to obtain an exhaustive branch-and-bound search method to find minimal solutions to the PATS problem. We call this approach the partition-search branch-and-bound (PS-BB) algorithm. The idea of Ma and Lombardi  (following experimental work of Park et al. ) is to start with an initial tile set that consists of different tiles, one for each of the grid positions in . Their algorithm then proceeds to merge tile types in order to minimise . We formalise this search process as an exhaustive search in the set of all partitions of the set . In the following, we let a PATS instance be given by a fixed -coloured pattern .
The PS-BB algorithm performs an exhaustive exploration of the lattice , searching for constructible partitions (see Figure 6). We start with the initial partition that is always constructible. In the search, we maintain and incrementally update MGTAs for every partition we visit. First, we describe simple branching rules to obtain a rooted directed acyclic graph search structure and later give rules to prune this DAG to a node-disjoint search tree.
The root of the DAG is taken to be the initial partition . For each partition we next define the set of children of . Our algorithm always proceeds by combining classes of the partition currently being visited, so for each we will have . Say we visit a partition . We have two possibilities:
If is not a refinement of the target pattern , that is if , we can drop this branch of the search, since no possible descendant can be a refinement of either.
In case , we can use the MGTA for to give a concrete solution to the PATS problem instance defined by the colouring . To continue the search and to find further improved solutions we consider each pair of classes in turn and recursively visit the partition where the two classes are combined. In fact, by the above analysis, it is sufficient to consider only pairs of the same colour. So, in this case,
is not constructible: In this case the MGTA for gives and for some classes . We continue the search from partition .
To guarantee that our algorithm finds the optimal solution in the case C2 above, we need the following.
Let be a non-constructible partition, the MGTA for and , , classes such that and . For all constructible we have .
Let , , and be as in the statement of the lemma. Let be a constructible partition and the MGTA for . Since is coarser than we can obtain from a tile assignment such that , where for every , is the unique class for which . The assignment has property A1 and so using A2 for the MGTA we get that
Now, since is constructible, the identities and can not hold for any two different classes . Looking at the definition of , we conclude that and for some . This demonstrates . ∎
4.1 Pruning the DAG to a Search Tree
Computational resources should be saved by not visiting any partition twice. To keep the branches in our search structure node-disjoint, we maintain a list of graphs that store restrictions on the choices the search can make.
For each partition we associate a family of undirected graphs , one for each colour class of the pattern . Every class in is represented by a vertex in the graph corresponding to the colour of the class. More formally, the vertex set of the graph is taken to be those classes for which . (So now, .) An edge indicates that the classes and are not allowed ever to be combined in the search branch in question. When we start our search with the initial partition , the edge sets are initially empty, . At each partition , the graphs have been determined inductively and the graphs for those children that we visit are defined as follows.
If is constructible: We choose some ordering , of similarly coloured pairs of classes. Define , to be the colour of the pair , so that . Now, we visit a partition if and only if . If we decide to visit a child partition , we define the edge sets as follows:
We start with the graphs and add the edges for all to their corresponding graphs. Call the resulting graphs .
Finally, as we combine the classes and to obtain the partition , we merge the vertices and in the graph (after merging, the neighbourhood of the new vertex is the union of the neighbourhoods for and in ). The graphs follow as a result.
If is not constructible: Here, the MGTA for suggests a single child partition for some . If , we terminate this branch of the search. Otherwise, we define the graphs to be the graphs , except that in the vertices and are merged.
One can see that the outcome of this pruning process is a search tree that has node-disjoint branches and one in which every possible constructible partition is still guaranteed to be found. Figure 6 presents a sketch of the search tree.
Note that we are not usually interested in finding every constructible partition , but only in finding a minimal one (in terms of ). Next, we give an efficient method to lower-bound the partition sizes of a given search branch.
4.2 The Bounding Function
Given a root of some subtree of the search tree, we ask: What is the smallest partition that can be found from this subtree? The nodes in the subtree rooted at comprise those partitions that can be obtained from by merging pairs of classes that are not forbidden by the graphs . This merging process halts precisely when all the graphs have been reduced into cliques. As is well known (and easy to see), the size of the smallest clique that a graph can be turned into by merging non-adjacent vertices is given by the chromatic number555The chromatic number of a graph is the smallest number of colours needed to colour the vertices of so that no two adjacent vertices share the same colour. of the graph . This immediately gives the following.
For every in the subtree rooted at and constrained by , we have
Determining the chromatic number of an arbitrary graph is an NP-hard problem. Fortunately, we can restrict our graphs to be of a special form: graphs that consist only of a clique and some isolated vertices. For these graphs, the chromatic numbers are given by the sizes of the cliques.
To see how to maintain graphs in this form, consider as a base case the initial partition . Here, for all , so is of our special form—it has a clique of size 1. For a general partition , we go through the branching rules D1–D2.
is constructible: Since we are allowed to choose an arbitrary ordering , , for the children , we design an ordering that preserves the special form of the graphs. For a graph of our special form, let consist of those vertices that are part of the clique in . In the algorithm, we first set for all and repeat the following process until every graph is a complete clique.
Pick some colour and an isolated vertex .
Process the pairs for all in some order. By the end, update to include all the edges that were just processed (the size of the clique in increases by one).
A moment’s inspection reveals that when the graphs are of our special form, so are all of the derived graphs passed on to the children of .
is not constructible: If the algorithm decides to continue the search from a partition , for some , we have . This means that either , in which case we are merging two isolated vertices, or one of and is part of the clique , in which case we merge an isolated vertex to the clique. In both cases, we maintain the special form in the graphs .
4.3 Traversing the Search Tree
When running a branch-and-bound algorithm we maintain a “current best solution” discovered so far as a global variable. This solution gives an upper bound for the minimal value of the tile set size and can be used to prune such search branches that are guaranteed (by the bounding function) to only yield solutions worse than the current best. There are two general strategies to traverse a branch-and-bound search tree: Depth-First Search and Best-First Search . Our description of the search tree for the lattice is general enough to allow either of these strategies to be used in the actual implementation of the algorithm. In the following section we give performance data on our DFS implementation of the PS-BB algorithm.
The running time of the PS-BB algorithm is proportional—up to a polynomial factor—to the number of partitions the algorithm visits. Hence, we measure the running time in terms of the number of merge operations performed in the search. Figure 7(a) presents the number of such merge operations in order to find a minimal solution for random 2-coloured instances of the PATS problem. The algorithm was executed for instance sizes and ; the 20th and 80th percentiles are shown alongside the median of 21 separate runs for each instance size. For the limiting case , the algorithm spent on the order of two hours of (median) computing time on a 2.61 GHz AMD processor.
Even though branch-and-bound search is an exact method, it can be used to find approximate solutions by running it for a suitable length of time. Figure 7(b) illustrates how the best solution found up to a point develops as increasingly many steps of the algorithm are run. The figure provides data on random 2-coloured instances of sizes . Because we begin our search from the initial partition, the best solution at the first step is precisely equal to the instance size. For each size, several different patterns were used. The algorithm was cut off after steps. By this time, an approximate reduction of 58% in the size of the tile set was achieved (cf. a reduction of 43.5% in Ma and Lombardi ).
Next, we consider two well known examples of structured patterns: the discrete Sierpinski triangle and the binary counter (see Figures 9 and 9 for instances of both patterns). A tile set of size 4 is optimal for both of these patterns, see e.g. Winfree  or Rothemund and Winfree . First, for the Sierpinski triangle pattern, we get a tile set reduction of well over 90% (cf. 45% in Ma and Lombardi ) in Figure 8(a). We used the same cutoff threshold and instance sizes as in Figure 7(b).
Our description of the PS-BB algorithm leaves some room for randomisation in deciding which search branch the DFS is to explore next. This randomisation does not seem to affect the search dramatically if considering the Sierpinski triangle pattern—the separate single runs in Figure 8(a) are representative of an average randomised run. By contrast, for the binary counter pattern, randomised runs for a single instance size do make a difference. Figure 8(b) depicts several separate runs for instance size . Here, each run brings about a reduction in solution size that oscillates between a reduction achieved on a random 2-coloured instance (Figure 7(b)) and a reduction achieved on the Sierpinski instance (Figure 8(a)). This suggests that, as is characteristic of DFS traversal, restarting the algorithm with different random seeds may help with large instances that have small optimal solutions. We explore this opportunity for efficiency improvement further in connection to the algorithm PS-H presented in the next section.
5 Heuristically Guided Search for Small Tile Sets
5.1 The PS-H Algorithm Scheme
The PS-BB algorithm utilises effective pruning methods to reduce the search space. Even though it offers significant reduction in the size of tile sets compared to earlier approaches, it is in most cases still too slow for patterns of practical size. Often it is not important to find a provably minimal solution, but to find a reasonably small solution in a reasonable amount of time. To address this objective, we present in the following a modification of the basic PS-BB algorithm with a number of search-guiding heuristics. We call this approach the partition-search with heuristics (PS-H) algorithm scheme.
Whereas the pruning methods of the PS-BB algorithm try to reduce the size of the search space in a “balanced” way, the PS-H algorithm attempts to “greedily” optimise the order in which the coarsenings of a partition are explored, in the hope of being directly led to close-to-optimal solutions. Such opportunism may be expected to pay off in case the success probability of the greedy exploration is sufficiently high, and the process is restarted sufficiently often, or equivalently, several runs are explored in parallel.
The basic heuristic idea is to try to minimise the effect that a merge operation has on partition classes other than those which are combined. This can be achieved by preferring to merge classes already having as many common glues as possible. In this way one hopes to extend the number of steps the search takes before it runs into a conflict. For example, when merging classes and such that and , the glues on the W and S edges of all other classes are unaffected. This way, the search avoids proceeding to a partition which is not constructible after the merge operation is completed. Secondarily, we prefer merging classes which already cover a large number of sites in . That is, one tries to grow a small number of large classes instead of growing all the classes at an equal rate.
We define the concept of the number of common glues formally as follows.
Given a partition and a MGTA for , the number of common glues between classes is defined by the function ,
where if and 0 otherwise, for all .
Except for the bounding function, the PS-BB algorithm allows an arbitrary ordering , for the children (coarsenings) of a constructible partition . In the PS-H algorithm, we choose the ordering using the following heuristics. First form the set
of class pairs of same colour, and then repeat the following process until is empty.
Maximise the number of common glues:
Maximise the size of the larger class:
Maximise the size of the smaller class:
Pick some pair at random and visit the partition .
Remove from :
The PS-H algorithm also omits the pruning process utilised by the PS-BB algorithm. That way, it aims to get to the small solutions quickly by reducing the computational resources used in a single merge operation.
Since step H5 of the heuristics above leaves room for randomisation, the PS-H algorithm performs differently with different random seeds. While some of the randomised runs may lead to small solutions quickly, others may get sidetracked into worthless expanses of the solution space. We make the best of this situation by running several executions of the algorithm in parallel, or equivalently, restarting the search several times with a different random seed. The notation PS-H denotes the heuristic partition search algorithm with parallel search threads. The solution found by the PS-H algorithm is the smallest solution found by any of the parallel threads.
In this section, we present results on the performance of the PS-H algorithm for and compare it to the previous PS-BB algorithm. We consider several different finite 2-coloured input patterns, two of which were analysed also previously using the PS-BB algorithm: the discrete Sierpinski triangles of sizes (Figure 9) and , and the binary counter of size (Figure 9). Furthermore, we introduce a 2-coloured “tree” pattern of size (Figure 9) as well as a 15-coloured pattern of size based on a CMOS full adder design (Figure 9).666For an explanation of the notation used in Figure 9, see Appendix A. While the Sierpinski triangle and binary counter patterns are known to have minimal solutions of 4 tiles, the minimal solutions for the tree pattern and the full adder pattern are unknown. The experiments were conducted on a high performance computing cluster equipped with 2.6 GHz AMD Opteron 2435 processors and Scientific Linux 6 operating system.
Figure 10 presents the evolution of the “current best solution” as a function of time for the \subreffig:size-srp32-ct and \subreffig:size-srp64-ct Sierpinski triangle patterns. To allow fair comparison, Figures 10 and 10 present the same data with respect to the total processing time taken by all the executions that run in parallel. The experiments were repeated 21 times and the median of the results is depicted. In 37% of all the individual runs777In total there were runs for each input pattern. conducted, the PS-H algorithm was able to find the optimal 4-tile solution for the Sierpinski triangle pattern in less than 30 seconds. A similar percentage for the Sierpinski triangle pattern is 34% in one hour. Remarkably, the algorithm performs only from 1030 to 1035 and from 4102 to 4107 merge steps before arriving at the optimal solution for the and patterns, respectively. In other words, the search rarely needs to backtrack. In contrast, the smallest solutions found by the PS-BB algorithm have 42 tiles, reached after merge steps, and 95 tiles, reached after merge steps.
In Figure 11 we present the corresponding results for the binary counter and tree patterns. The size of the smallest solutions found by the PS-H algorithm were 20 (cf. 307 by PS-BB) and 25 (cf. 192 by PS-BB) tiles, respectively. In the case of the tree pattern, the parallelisation brings significant advantage over a single run. Finally, Figures 12–12 show the results for the 15-colour CMOS full adder pattern. In this case, the improvement over the previous PS-BB algorithm is less clear. The PS-H algorithm is able to find a solution of 58 tiles, whereas the PS-BB algorithm gives a solution of 69 tiles.
6 Answer Set Programming for Minimal Tile Sets
6.1 An ASP Model for PATS
Answer Set Programming (ASP)  is a declarative logic programming paradigm for solving difficult combinatorial search problems. In ASP, a problem is described as a logic program, and an answer set solver is then used to compute stable models (answer sets) of the logic program. The ASP paradigm can be applied also to the PATS problem. In the following we give a brief description on how to transform the PATS problem to an ASP program using a modelling language that is accepted by ASP grounders such as lparse  or gringo .
First, we define a constant for each position of the grid , each colour, each available tile type and each available glue type. After that, a number of choice rules are introduced to associate a tile type with each position of the grid, a glue type with each of the four sides of the tile types and a colour with each of the tile types. Next, we use basic rules to make the glues of every pair of adjacent tiles match and to make the tile system deterministic, i.e. to ensure that every tile type has a unique pair of glues on its W and S edges. Finally, we compile the target pattern to a set of rules that associate every position of the grid with the desired colour.
The above-described program is given to a grounder, which computes an equivalent variable-free program. The variable-free program is forwarded to an answer set solver, which then outputs a tile type for each position of the grid, given that such a solution exists. We run the programs repeatedly and increment the number of available tile and glue types, until a solution is found.
We used grounder gringo 3.0.5  and answer set solver clasp 2.1.3  with default settings to run our experiments. A traditional solver, smodels , was also considered, but clasp proved to be significantly faster in solving instances of the PATS problem. We consider two patterns having a minimal solution of 4 tiles: the Sierpinski triangle and binary counter patterns. The programs were executed for patterns of sizes . We repeated the experiments 21 times with different random seeds and the median running time is presented in Figure 13 for the Sierpinski triangle pattern and in Figure 13 for the binary counter pattern. The results include the running time of both the grounder and the solver as well as all the incremental steps needed until a solution is found. We were able to find the minimal solution for both the Sierpinski triangle pattern and the binary counter pattern in approximately 31 minutes of (median) running time. The results were obtained on the same computing cluster as the results in Section 5.2.
Based on the above results, the ASP approach performs very well when considering patterns with a small optimal solution. However, the running time seems to increase dramatically with patterns that have a larger optimal solution. Indeed, we were not able to find solutions for the tree pattern or the CMOS full adder pattern using the ASP approach.
7 The Reliability of Tile Sets
7.1 The Kinetic Tile Assembly Model
In the following, we utilise the kinetic Tile Assembly Model (kTAM) to assess the reliability of various tile sets generated by the PS-BB and PS-H algorithms. The kTAM was introduced by Winfree  as a kinetic counterpart of the aTAM. Several variants of the kTAM exist [8, 35]. However, the main elements are similar.
The kTAM simulates two types of reactions, each involving an assembly, i.e. a crystal structure consisting of several merged tiles, and a tile: association of tiles to the assembly (forward reaction) and dissociation (reverse reaction), see e.g. Figure 14.888Note that interactions between two tiles, such as forming a new assembly, as well as interactions between two assemblies, are not taken into consideration in the initial model . However, they are studied in some of the later developed variants of the kTAM, see e.g. Schulman and Winfree . In the first type of reaction, any tile can attach to the assembly at any position (up to the assumption that tile alignment is preserved), even if only a weak bond is formed; the rate of this reaction is proportional to the concentration of free tiles in the solution. In the second type of reaction, any tile can detach from the assembly with rate , , which is exponentially correlated with the total strength of the bonds between the tile and the assembly. Thus, tiles which are connected to the assembly by fewer or weaker bonds, i.e. incorrect “sticky end” matches, are more prone to dissociation than those which are strongly connected by several bonds (well paired sticky end sequences).
In the following, we follow the notation of Winfree . For any tile type , the rate constant of the association (forward reaction) of to an existing assembly is given by
where is the concentration in solution of free tiles of type and is a temperature dependent parameter. In the case of DNA double-crossover (DX) tiles, this parameter is given by the formula
where , , , and is the temperature (in ).
In the case of dissociation (reverse reaction), for a tile which is connected to the assembly by a total bond strength , the rate constant is given by the formula
where is the standard free energy needed to break bonds. In the case of DX tiles, as the glues of the tiles are implemented using 5-base long single-stranded DNA molecules, can be estimated using the nearest-neighbour model  as
Moreover, can range with integer values from to , corresponding to the cases when the tile is totally erroneously placed in the assembly (no bond connects it to the crystal) and when the tile is fully integrated into the assembly (all its four sticky ends are correctly matched), respectively.
In order to easily represent and scale the system, the free parameters involved in the formulas of the rate constants and are re-distributed into just two dimensionless parameters, and , where the first is dependent on the initial tile concentration and the second is dependent on the assembly temperature:
where, in the case of DX tiles, is adjusted in order to take into consideration possible entropic factors, such as orientation or location of tiles. The previous parameter re-distribution is made possible as a result of the assumption made in the initial kTAM  that all tile types are provided into the solution in similar concentrations, and that the consumption in time of the free monomers is negligible compared to the initial concentration.
7.2 Computing the Reliability of a Tile Set
By choosing appropriate physical conditions, the probability of errors in the assembly process can be made arbitrarily low, at the cost of reducing the assembly rate . However, we would like to be able to compare the error probability of different tile sets producing the same finite pattern, under the same physical conditions. Given the amount of time the assembly process is allowed to take, we define the reliability of a tile set to be the probability that the assembly process of the tile system in question completes without any incorrect tiles being present in the terminal configuration. In the following, we present a method for computing the reliability of a tile set, based on Winfree’s analysis of the kTAM , and the notion of kinetic trapping introduced within.
We call the W and S edges of a tile its input edges. First, we derive the probability of the correct tile being frozen at a particular site under the condition that the site already has correct tiles on its input edges. Let and be the number of tile types having one mismatching and two mismatching input glues, respectively, between them and the correct tile type for site . Now, for a deterministic tile set , the total number of tiles is for any . Given that a site has the correct tiles on its input edges, a tile is correct for that site if and only if it has two matches on its input edges.
In what follows, we assume that correct tiles are attached at sites and . The model for kinetic trapping  gives four distinct cases in the situation preceding the site being frozen by further growth. To each of these cases we can associate an “off-rate” for the system to exit its current state: (E) An empty site, with off-rate . (C) The correct tile, with off-rate . (A) A tile with one match, with off-rate . (I) A tile with no matches, with off-rate . Additionally, we have two sink states FC and FI, which represent frozen correct and frozen incorrect tiles, respectively. The rate of a site being frozen is equal to the rate of growth . Figure 15 describes the dynamics of the system. Let denote the probability of the site being in state after seconds for all . To compute the frozen distribution, we write the rate equations for the model of kinetic trapping from Figure 15 as follows:999The notation is used to denote the derivative of with respect to time.
where . To compute the steady-state probability of the site being frozen with the correct tile, i.e. , we make use of the steady state of the related flow problem :101010By the definition of the kinetic trapping model , it is assumed that a unit amount of material is supplied into state of the system at any time point.
which gives us a system of linear equations. This system has a single solution, namely
where denotes the event of the correct tile being frozen at site .
The assembly process can be thought of as a sequence of tile addition steps where , , denotes a tile being frozen at site . Due to the fact that the assembly process of the tile systems considered here proceeds uniformly from south-west to north-east, we have that for all . We assume that tiles elsewhere in the configuration do not affect the probability. Now we can compute the probability of a finite-size pattern of size assembling without any errors, i.e. the reliability of that pattern:
We have computed the probability in terms of and . Given the desired assembly rate, we want to minimise the error probability by choosing values for and appropriately. If the assembly process is allowed to take seconds, the needed assembly rate for an pattern is approximately . In order to simplify the computations, we use the approximation
For small error probability and ,