Thermodynamic Binding Networks
Strand displacement and tile assembly systems are designed to follow prescribed kinetic rules (i.e., exhibit a specific time-evolution). However, the expected behavior in the limit of infinite time—known as thermodynamic equilibrium—is often incompatible with the desired computation. Basic physical chemistry implicates this inconsistency as a source of unavoidable error. Can the thermodynamic equilibrium be made consistent with the desired computational pathway? In order to formally study this question, we introduce a new model of molecular computing in which computation is driven by the thermodynamic driving forces of enthalpy and entropy. To ensure greatest generality we do not assume that there are any constraints imposed by geometry and treat monomers as unstructured collections of binding sites. In this model we design Boolean AND/OR formulas, as well as a self-assembling binary counter, where the thermodynamically favored states are exactly the desired final output configurations. Though inspired by DNA nanotechnology, the model is sufficiently general to apply to a wide variety of chemical systems.
Most of the models of computing that have come to prominence in molecular programming are essentially kinetic. For example, models of DNA strand displacement cascades and algorithmic tile assembly formalize desired interaction rules followed by certain chemical systems over time [9, 13]. Basing molecular computation on kinetics is not surprising given that computation itself is ordinarily viewed as a process. However, unlike electronic computation, where thermodynamics holds little sway, chemical systems operate in a Brownian environment. If the desired output happens to be a meta-stable configuration, then thermodynamic driving forces will inexorably drive the system toward error. For example, leak in most strand displacement systems occurs because the thermodynamic equilibrium of a strand displacement cascade favors the incorrect over the correct output, or does not discriminate between the two. In DNA tile assembly, we typically must find and exploit kinetic barriers to unseeded growth to enforce that growth happens only from seed assemblies; otherwise, thermodynamically favored assemblies will quickly form that are not the intended self-assembly program execution from the seed/input [11, 1].
We introduce the Thermodynamic Binding Networks (TBN) model, where information processing is due entirely to the thermodynamic tradeoff between entropy and enthalpy, and not any particular reaction pathway. In most experimental systems considered in DNA nanotechnology, thermodynamic favorability is determined by a tradeoff between: (1) the number of base pairs formed or broken (all else being equal, a state with more base pairs bound is more favorable); (2) the number of separate complexes (all else being equal, a state with more free complexes is more favorable). We use the terms enthalpy and entropy to describe (1) and (2) respectively (although this use does not perfectly align with their physical definitions, see Section 2). Intuitively, the entropic benefit of configurations with more separate complexes is due to additional microstates, each describing the independent three-dimensional positions of each complex. Although the general case of a quantitative trade-off between enthalpy and entropy is complex, we develop an elegant formulation based on the limiting case in which enthalpy is infinitely more favorable than entropy. Intuitively, this limit corresponds to increasing the strength of binding, while diluting (increasing the volume), such that the ratio of binding to unbinding rate goes to infinity. Systems studied in molecular programming can in principle be engineered to arbitrarily approach this limit. Indeed, this is the regime previously studied in the context of leak reduction for strand displacement cascades. Figure 1 shows a simple TBN, which can exist in 9 possible binding configurations. The favored (stable) configuration is the one that, among the maximally bound ones (bottom row), maximizes the number of separate complexes (bottom right).
As a central choice in seeking a general theory, we dispense with geometry: formally, we treat monomers simply as multisets of binding sites (domains). Viewed in the context of strand displacement, this abstracts away secondary structure (the order of domains on a strand), allowing us to represent arbitrary molecular arrangements such as pseudoknots, and handle non-local error modes such as spurious remote toeholds. In the context of tile self-assembly, we consider configurations in which binding does not follow the typical regular lattice structure. Since the TBN model does not rely on geometric constraints to enforce correct behavior, showing that specific undesired behavior is prevented by enthalpy and entropy alone leads to a stronger guarantee. Thus, for example, proving leaklessness in this model would imply that even if pseudoknots or other typically disallowed structures form, we would still have little leak. Indeed, by casting aside the vagaries of DNA biophysics (e.g., persistence length, number of bases per turn, sequence dependence on binding strength, etc.), our aim is to develop a general theory of programmable systems based on molecular bonds, a theory that will apply to bonds based on other substrates such as proteins, base stacking, or electric charge.
After introducing the TBN model in Section 2, we give results on Boolean circuit-based and self-assembly-based computation. In Section 3 we show how to construct AND and OR gates where the thermodynamically favored configurations encode the output. We develop provable guarantees on the entropic penalty that must be overcome to produce an incorrect output, showing how the logic gates can be designed to make the penalty arbitrarily large. Although completely modular reasoning seems particularly tough in this model, we develop a proof technique based on logically excising domains to handle the composition of Boolean gates—specifically trees of AND gates. Further work is needed to generalize these results to arbitrary circuits.
In Section 4 we look at self-assembly, beginning with questions about large assemblies. On the one hand we exhibit a class of TBNs with thermodynamically stable assemblies (with simple ‘tree’ connectivity) of size exponential in the number of constituent monomer types. On the other hand, we show that this bound is essentially tight by giving an exponential size upper bound on the size of stable assemblies in general. These self-assembly results, along with the binary counter result below, tell us that monomer-efficient self-assembly is indeed possible within this model, but that (somewhat surprisingly for a model that favors enthalpy infinitely over entropy) super-exponential size polymers are necessarily unstable, even if they are self-assemblable in kinetic-based models.
For clarity of thought in separating the computational power of thermodynamics and kinetics, throughout much of this paper we do not identify any particular kinetic pathway leading to the desired TBN stable state. Of course real-world physical systems do not operate at thermodynamic equilibrium, and might take longer than the lifetime of the universe to get there. Thus, for such ‘kinetically trapped’ systems, encoding desired output in thermodynamic equilibrium is not enough by itself. To address this, in Section E we give a kinetically and thermodynamically favoured binary counter that assembles in both the abstract Tile Assembly Model and the TBN model. Similarly, a previously proposed strand displacement AND gate can be shown to compute correctly in the TBN model. Nonetheless, more work is needed to come up with TBN schemes that have fast kinetic pathways, in addition to the provable thermodynamic guarantees.
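Let $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{Z}^+$ denote the set of nonnegative integers, integers, and positive integers, respectively. A key type of object in our definitions is a multiset, which we define in a few different ways as convenient. Let $\mathcal{A}$ be a finite set. We can define a multiset over $\mathcal{A}$ using the standard set notation, e.g., $\mathbf{c} = \{a, a, c\}$, where $a, c \in \mathcal{A}$. Formally, we view a multiset $\mathbf{c}$ as a vector assigning counts to the elements of $\mathcal{A}$. Letting $\mathbb{N}^{\mathcal{A}}$ denote the set of functions $f : \mathcal{A} \to \mathbb{N}$, we have $\mathbf{c} \in \mathbb{N}^{\mathcal{A}}$. We index entries by elements $a \in \mathcal{A}$, calling $\mathbf{c}(a)$ the count of $a$ in $\mathbf{c}$. Fixing some arbitrary ordering on the elements of $\mathcal{A} = \{a_1, \ldots, a_k\}$, we may equivalently view $\mathbf{c}$ as an element of $\mathbb{N}^k$, where for $i \in \{1, \ldots, k\}$, $\mathbf{c}(i)$ denotes $\mathbf{c}(a_i)$. Let $\|\mathbf{c}\| = \sum_{a \in \mathcal{A}} \mathbf{c}(a)$ denote the size of $\mathbf{c}$. For any vector or matrix $\mathbf{c}$, let $\mathrm{amax}(\mathbf{c})$ denote the largest absolute value of any component of $\mathbf{c}$.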
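The equivalent views of a multiset described above (a function assigning counts, or a vector under a fixed ordering) can be illustrated with a minimal sketch. The names `A`, `c`, `as_vector`, `size`, and `amax` are illustrative choices, and the sketch assumes that the size of a multiset means the sum of its counts.

```python
from collections import Counter

# Hypothetical finite set A with a fixed (arbitrary) ordering of its elements.
A = ["a", "b", "c"]

# The multiset {a, a, c}, viewed as a function from A to counts.
c = Counter({"a": 2, "c": 1})


def as_vector(multiset, ordering):
    """Return the count vector of `multiset` under the given ordering."""
    return [multiset[x] for x in ordering]


def size(multiset):
    """Size of the multiset, taken here to be the sum of all counts."""
    return sum(multiset.values())


def amax(values):
    """Largest absolute value of any component of a vector."""
    return max(abs(v) for v in values)


vec = as_vector(c, A)
```

Elements of `A` that do not occur in the multiset simply get count 0 in the vector view.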
We model molecular bonds with precise binding specificity abstractly as binding “domains”,
designed to bind only to other, specific binding domains.
Formally, consider a finite set $\mathcal{D}$ of primary domain types.
Each primary domain type $a \in \mathcal{D}$ is mapped to a complementary domain type (a.k.a. codomain type) denoted $a^*$.
Let $\mathcal{D}^* = \{a^* \mid a \in \mathcal{D}\}$ denote the set of codomain types of $\mathcal{D}$.
The mapping is assumed 1-1, so $|\mathcal{D}^*| = |\mathcal{D}|$.
We assume that a domain of primary type $a \in \mathcal{D}$ binds only to its corresponding complementary type $a^* \in \mathcal{D}^*$, and vice versa.
We assume a finite set $\mathcal{M}$ of monomer types, where a monomer type is a non-empty multiset of domain types, e.g., $\{a, b, b, c^*\}$ with $a, b, c$ being primary domain types.
A thermodynamic binding network (TBN) is a pair $\mathcal{T} = (\mathcal{D}, \mathcal{M})$ consisting of a finite set $\mathcal{D}$ of primary domain types and a finite set $\mathcal{M}$ of monomer types.
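A monomer collection $\mathbf{c}$ of a TBN $\mathcal{T} = (\mathcal{D}, \mathcal{M})$ is a multiset of monomer types from $\mathcal{M}$;
intuitively, $\mathbf{c}$ indicates how many of each monomer type from $\mathcal{M}$ there are, but not how they are bound.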
Since one monomer collection usually contains more than one copy of the same domain type,
we use the term domain to refer to each copy separately.
A single monomer collection can take on different configurations depending on how domains in monomers are bound to each other.
To formally model configurations, we first need the notion of a bond assignment.
Let $B = (U, V, E)$ be the bipartite graph describing all possible bonds in a monomer collection $\mathbf{c}$, where
$U$ is the multiset of all primary domains in all monomers in $\mathbf{c}$,
$V$ is the multiset of all codomains in all monomers in $\mathbf{c}$,
and $E$ is the set of edges between primary domains and their complementary codomains.
A bond assignment is a matching on $B$, i.e., a set of edges of $E$ in which each domain is bound to at most one other domain.
Another graph that will be useful in describing the connectivity of the monomers, independent of which exact domains are bound, is the monomer binding graph of a configuration, which is obtained by contracting each monomer edge (an edge joining two domains of the same monomer) of the configuration. In other words, its vertices are the monomers in $\mathbf{c}$, with an edge between monomers that share at least one pair of bound domains.
Which configurations are thermodynamically favored over others depends on two properties of a configuration: its bond count and entropy.
The enthalpy $H(\alpha)$ of a configuration $\alpha$ is the number of bonds it contains, and its entropy $S(\alpha)$ is the number of separate complexes (polymers).
Intuitively, configurations with higher enthalpy (more bonds formed) and higher entropy (more separate complexes) are thermodynamically favored.
What happens if there is a conflict between the two?
One can imagine capturing a tradeoff between enthalpy and entropy by some linear combination of $H(\alpha)$ and $S(\alpha)$.
In DNA nanotechnology applications,
the tradeoff can be controlled by increasing the number of nucleotides constituting a binding domain (increasing the weight on $H$), or by decreasing concentration (increasing the weight on $S$).
In the rest of this paper, we study the particularly interesting limiting case in which enthalpy is infinitely more favorable than entropy.
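To make the limiting case concrete, the following minimal sketch enumerates every bond assignment of a tiny monomer collection by brute force, counts bonds (enthalpy) and polymers (entropy), and keeps the configurations that maximize enthalpy first and entropy second. The encoding is an assumption made for illustration: domains are strings, a codomain is written with a trailing `*`, and the four-monomer collection is a hypothetical toy example rather than one taken from the figures.

```python
def polymers(num_monomers, bonds, owner):
    """Count connected components of the monomer binding graph (union-find)."""
    parent = list(range(num_monomers))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in bonds:
        parent[find(owner[u])] = find(owner[v])
    return len({find(m) for m in range(num_monomers)})


def matchings(primaries, codomains, types):
    """Yield every matching as a list of (primary, codomain) index pairs."""
    def extend(i, used, bonds):
        if i == len(primaries):
            yield list(bonds)
            return
        p = primaries[i]
        yield from extend(i + 1, used, bonds)  # leave domain p unbound
        for q in codomains:
            if q not in used and types[q] == types[p] + "*":
                used.add(q)
                bonds.append((p, q))
                yield from extend(i + 1, used, bonds)
                bonds.pop()
                used.remove(q)

    yield from extend(0, set(), [])


def stable_configurations(monomers):
    """Return ((enthalpy, entropy), list of stable bond assignments)."""
    types, owner = [], []
    for m, monomer in enumerate(monomers):
        for d in monomer:
            types.append(d)
            owner.append(m)
    primaries = [i for i, t in enumerate(types) if not t.endswith("*")]
    codomains = [i for i, t in enumerate(types) if t.endswith("*")]
    scored = [(len(b), polymers(len(monomers), b, owner), b)
              for b in matchings(primaries, codomains, types)]
    # Lexicographic maximum: enthalpy is infinitely more important than entropy.
    best = max((h, s) for h, s, _ in scored)
    return best, [b for h, s, b in scored if (h, s) == best]


# Hypothetical toy collection: {a, b}, {a*, b*}, {a}, {b}.
monomers = [["a", "b"], ["a*", "b*"], ["a"], ["b"]]
best, stable = stable_configurations(monomers)
```

The search is exponential in the number of domains, so it is only meant for very small collections; bond assignments are reported as pairs of domain indices in the flattened monomer list.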
3 Thermodynamic Boolean formulas
Fig. 2 shows an example of a TBN that performs AND computation, based on a previously published CRN strand displacement gate. Realized as a strand displacement system, it has a kinetic pathway taking the untriggered (left) to the triggered (right) configuration. The inputs are specified by the presence (logical value 1) or absence (logical value 0) of the two input monomers. We use the following output convention: the output is 1 if and only if some stable configuration has the output monomer free (not bound to any other monomer). This can be termed the weak output convention. Alternatively, in the strong output convention, output 1 implies every stable configuration has the output monomer free, and output 0 implies every stable configuration has the output monomer bound to some other monomer. More complex AND gate designs are compatible with the strong output convention (not shown).
Note that even the weak output convention, coupled with a kinetic pathway releasing the output given the correct inputs, can be used to argue that: (1) if the correct inputs are present the output will be produced (via kinetic argument), (2) if the correct inputs are not present then ultimately little output will be free (thermodynamic argument). In the context of strand displacement cascades, TBNs can explore arbitrary structures (pseudoknots, remote toeholds, etc.) since we do not impose any ordering on domains in a monomer, nor any geometry. This strengthens the conclusion of (2), showing that arbitrary (even unknown) kinetic pathways must lead to a thermodynamic equilibrium with little output.
While individual AND gates can be proven correct with respect to the above output conventions (e.g., using a SAT solver), it remains to be shown that these components can be safely composed into arbitrary Boolean circuits. Note that the input and output monomers have orthogonal binding sites. This is important for composing AND gates, where the output of one acts as an input to another. As is typical for strand displacement logic, OR gates can be trivially created when multiple AND gates have the same output. Dual-rail AND/OR circuits are sufficient to compute arbitrary Boolean functions without explicit NOT gates. Nonetheless, it is not obvious that the input convention (complete presence or absence of input monomers) matches the output convention (weak or strong). It is also not clear how statements about the stable configurations of the whole circuit can be made based on the stable configurations of the individual modules.
We now show that correct composition can be proven in certain cases. Although we believe that the gate shown in Fig. 2 is composable, the argument below relies on a different construction. We further consider a restricted case of AND gate formulas (trees).
An important concept in the argument below is the notion of “distance to stability”. This refers to the difference between the entropy of the stable configurations and the largest entropy of a saturated configuration with incorrect output. The larger the distance to stability, the larger the entropy penalty to incorrectly producing the output. Unlike the simple AND gate from Fig. 2, the constructions below can be instantiated to achieve arbitrary desired distance to stability (by increasing the redundancy parameter).
Many open questions remain. Can our techniques be generalized to arbitrary circuits, rather than just trees of AND gates? Can we prove these results for logic gates that have a corresponding kinetic pathway (like the AND gates in Fig. 2 which can be instantiated as strand displacement systems)? Finally, in our Boolean gate constructions, we assume that the monomer collection has exactly one copy of certain monomers. It remains open whether these schemes still work if there are many copies of all monomers.
3.1 Translator cascades
We begin with the simplest of circuits, translator cascades (), which simply propagate signal through layers when the input signal is present. Logically a translator gate is simply a repeater gate. The input is the presence or absence of the input monomer consisting of copies of domain . Our analysis below implies that if and only if the input is present, there is a stable configuration with copies of domain in the same polymer. The terminator gadget converts this output to the weak output convention defined above (whether or not the monomer consisting of copies of domain is free). The following Lemma shows that we can exactly compute the distance from stability of a translator cascade shown in Fig. 3. Besides being a “warm-up” for AND gate cascades, the Lemma is used in the proof of Theorem 3.2.
The intended configuration of a monomer collection representing a depth , redundancy translator cascade, without input, and with output , is saturated and has . (See Fig. 3.)
If is a saturated configuration of a monomer collection representing a depth , redundancy translator cascade, without input, and with output , then .
3.2 Trees of AND gates
In this section we motivate how Boolean logic gates can be composed such that the overall circuit has a guaranteed distance to stability, relative to a redundancy parameter . Specifically, we start with the AND gate design of Fig. 4, and we give a concrete argument for a tree of these AND gates (e.g., Fig. 5).
Consider a TBN for AND gates, with redundancy , composed into a tree of depth . If at least one of the inputs is not present, the distance to stability for any saturated configurations with output is at least .
Let be any saturated configuration of the TBN with output . Consider the missing input and define the leak path to be the linear sequence of AND gates from the missing input to and including the terminator gadget. For convenience we imagine relabelling all the domains in the leak path indexed by the position of the AND gate in the leak path. For example, Fig. 5 highlights the leak path through the tree from a missing input (“0”) to erroneous output (“1”). Specifically, the domain names as shown in Fig. 4 appear in the th AND gate (for ), where feeds into the terminator gadget. Domains connect the leak path to the rest of the tree.
Given a configuration of a monomer collection , we say we excise a domain if we create a new configuration by removing the node corresponding to and all incident edges. (Note that is a configuration of a monomer collection of a different TBN.)
Excise all domains of type and codomains of type on monomers of the leak path involved in fan-in, , yielding the new configuration . Note that if domain is on a monomer other than the leak path, then it is not excised.
The leak path in now has no domains in common with the rest of the tree (and thus no bonds). Let be the subconfiguration of the leak path, and let be the subconfiguration of the rest of the system. (Note .)
Given a saturated configuration , if you excise all domains or codomains of a particular type (or both its domains and codomains) yielding , then is saturated.
By Observation 3.3 is saturated since for every domain type and codomain type , every instance of is excised; . This implies and are also saturated.
Excise all domains of type and and all codomains of type and in , , yielding the new configuration . By Observation 3.3, is saturated.
Proof of the claim.
Entropy can only be decreased via excision if an entire monomer is excised. Since Manipulation 1 only excised domain and codomain types from the set , and those domain types only appear on monomers which also have domain instances with types not in , then no entire monomer was excised. ∎
Proof of the claim.
For every layer , , there are monomers that only contain domain and codomain types in the set . Therefore, contains at most fewer monomers than , for each of the layers. ∎
Proof of the claim.
Recognize that is a saturated configuration of a monomer collection representing a depth , redundancy translator cascade, without input, and with output . The claim follows by Lemma 1. ∎
Now, take the monomers from the leak path in , and configure them into the “untriggered configuration” (see Fig. 4, left), yielding subconfiguration . Let . Note that is saturated, and therefore is a saturated configuration of the entire tree (i.e., the same TBN as ).
Finally, consider the entropy gap between and .
Therefore, there exists a saturated configuration with output over the same TBN as , but with entropy at least larger, thus establishing the theorem. ∎
Theorem 3.2 seems to suggest that in order to maintain the bound on distance to stability for incorrect computation, the redundancy parameter should increase to compensate for an increase in circuit depth . However, a more sophisticated argument shows that manipulations and can decrease entropy by at most . Following the above argument, the distance to stability is found to be . This is optimal because a single AND gate with redundancy can be shown to have no entropy gap between output and output configurations.
4 Thermodynamic self-assembly: Assembling large polymers
TBNs can not only exhibit Boolean circuit computation, but they can also be thought of as a model of self-assembly. Here we begin to explore this connection by asking a basic question motivated by the abstract Tile Assembly Model (aTAM): how many different monomer types are required to assemble a large polymer?
Favoring enthalpy infinitely over entropy, on its face, appears to encourage large polymers. Perhaps we can imagine designing a single TBN that can assemble arbitrarily large polymers, where for each $n$ the TBN has a stable polymer composed of at least $n$ monomers. In this section we show that this is impossible: every TBN has stable polymers of size at most exponential in the number of domain types and monomer types (Theorem 4.5). The proof shows that any polymer larger than the bound can be partitioned into at least two saturated (maximally bound) polymers, which implies that the original polymer is not stable. Fig. 7 gives an example. We also show that this upper bound is essentially tight by constructing a family of systems with exponentially large stable polymers (Theorem 4.1). Taken together, the exponential lower bound of Theorem 4.1 and upper bound of Theorem 4.5 give a relatively tight bound on the maximum size achievable for stable TBN polymers.
Is it possible to construct algorithmically interesting TBN polymers that are stable? In the full version of this paper, we show that a typical binary counter construction from the aTAM model is not stable, but can be modified to become stable in our model. Importantly, this TBN binary counter demonstrates that in principle algorithmically complex assemblies could have effective assembly pathways (aTAM) as well as be thermodynamically stable (TBN).
4.1 Superlative trees: TBNs with exponentially large stable polymers
The next theorem shows that there are stable polymers that are exponentially larger than the number of domain types and monomer types required to assemble them.
For every , there is a TBN with and , having a stable polymer of size
An example of for and is shown in Fig. 7. Let and , where, for each , (i.e., 1 copy of and copies of ), , and . Define by for Then . Observe that has a unique (up to isomorphism) saturated configuration (which is therefore stable), described by a complete -ary tree: level of the tree is composed of copies of , each bound to children of type in level . ∎
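The exponential growth of the tree-shaped polymer described in the proof can be checked with a short calculation. The sketch below assumes a complete k-ary tree in which level i holds k^i monomers, all of one monomer type per level; `k` and `levels` are hypothetical parameter names, and the exact parameterization used in the theorem statement may differ.

```python
def monomers_per_level(k, levels):
    """Number of monomers at each level of a complete k-ary tree."""
    return [k ** i for i in range(levels)]


def complete_tree_size(k, levels):
    """Total number of monomers in a complete k-ary tree with the given number of levels."""
    return sum(monomers_per_level(k, levels))


# With one monomer type per level, 4 monomer types already give 40 monomers for k = 3.
example_size = complete_tree_size(3, 4)
```

Under these assumptions the polymer size grows as k^levels while the number of monomer types grows only linearly in the number of levels.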
4.2 A linear algebra framework
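We prove Theorem 4.5, the main result of Section 4, by viewing TBNs from a linear algebra perspective. Let $\mathcal{T} = (\mathcal{D}, \mathcal{M})$ be a TBN, with $d = |\mathcal{D}|$ domain types $a_1, \ldots, a_d$ and $m = |\mathcal{M}|$ monomer types $\mathbf{m}_1, \ldots, \mathbf{m}_m$. For a matrix $\mathbf{A}$, let $\mathbf{A}(i,j)$ denote the entry in the $i$'th row and $j$'th column. Define the $d \times m$ positive monomer matrix $\mathbf{M}^+$ of $\mathcal{T}$ by $\mathbf{M}^+(i,j) = \mathbf{m}_j(a_i)$. Define the negative monomer matrix $\mathbf{M}^-$ of $\mathcal{T}$ by $\mathbf{M}^-(i,j) = \mathbf{m}_j(a_i^*)$. Define the monomer matrix of $\mathcal{T}$ to be $\mathbf{M} = \mathbf{M}^+ - \mathbf{M}^-$. Note that $\mathbf{M}^+$ and $\mathbf{M}^-$ are matrices over $\mathbb{N}$, but $\mathbf{M}$ is over $\mathbb{Z}$.
The rows of the monomer matrix correspond to domain types and the columns correspond to monomer types. The mapping from a TBN to a monomer matrix is not 1-1: $\mathbf{M}(i,j)$ is the number of $a_i$ domains minus the number of $a_i^*$ domains in monomer type $\mathbf{m}_j$, which would be the same, for instance, for monomer types $\{a_i\}$ and $\{a_i, a_i, a_i^*\}$. Let $\mathbf{c}$ be a monomer collection and let $\mathbf{d} = \mathbf{M}\mathbf{c}$; for $i \in \{1, \ldots, d\}$, $\mathbf{d}(i)$ is the number of $a_i$ domains minus the number of $a_i^*$ domains in the whole monomer collection $\mathbf{c}$.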
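The matrices described above can be computed directly from a list of monomer types. The following is a minimal sketch under an assumed encoding (domains as strings, codomains marked with a trailing `*`); the function names and the example TBN are hypothetical and chosen only for illustration.

```python
def monomer_matrices(domain_types, monomer_types):
    """Return (positive, negative, net) monomer matrices as lists of rows.

    Rows correspond to primary domain types, columns to monomer types.
    """
    pos = [[m.count(a) for m in monomer_types] for a in domain_types]
    neg = [[m.count(a + "*") for m in monomer_types] for a in domain_types]
    net = [[p - n for p, n in zip(prow, nrow)] for prow, nrow in zip(pos, neg)]
    return pos, neg, net


def excess(net, counts):
    """Matrix-vector product: per-type excess of primary domains over codomains."""
    return [sum(row[j] * counts[j] for j in range(len(counts))) for row in net]


domain_types = ["a", "b"]
monomer_types = [["a", "b"], ["a*", "b*"], ["a"], ["a", "a", "a*"]]
pos, neg, net = monomer_matrices(domain_types, monomer_types)

# One copy each of the first three monomer types, none of the fourth.
d = excess(net, [1, 1, 1, 0])
```

The last two monomer types yield identical columns in the net matrix, which illustrates why the mapping from TBNs to monomer matrices is not 1-1.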
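Let $\alpha$ be a saturated configuration of a monomer collection $\mathbf{c}$, and let $\mathbf{d}(i)$ denote the number of $a_i$ domains minus the number of $a_i^*$ domains in $\mathbf{c}$; $\alpha$ can only have a domain unbound if all copies of its complement are bound, and vice versa. If $\mathbf{d}(i) > 0$, in $\mathbf{c}$ there is an excess of $a_i$ domains, and all $a_i^*$ domains are bound. If $\mathbf{d}(i) < 0$, in $\mathbf{c}$ there is an excess of $a_i^*$ domains, and all $a_i$ domains are bound. This leads to the following observation.
Let $\mathcal{T}$ be a TBN and $\mathbf{c}$ a monomer collection. Let $\mathbf{d}$ be the vector of per-type excesses defined above. Then for every configuration $\alpha$ of $\mathbf{c}$, $\alpha$ is saturated if and only if, for all $i$, if $\mathbf{d}(i) \geq 0$ (respectively, if $\mathbf{d}(i) < 0$), then $|\mathbf{d}(i)|$ is the number of unbound $a_i$ (resp., $a_i^*$) domains in $\alpha$.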
Let and be TBNs with the same set of domain types. Then we call a relabeling of if there exists a subset such that can be obtained from by starring any instance of in and unstarring any instance of in . Since this corresponds to negating the ’th row of , which negates the ’th entry of the vector , this gives the following observation.
Let be a TBN and a monomer collection. There exists a relabeling of so that .
Combining Observations 4.2 and 4.3 results in the following observation, which essentially states that for any given monomer collection , we may assume without loss of generality that domains unbound in saturated configurations are all primary domain types.
Let be a TBN and a monomer collection. There exists a relabeling of so that, letting , for all configurations , is saturated if and only if, for all , is the number of unbound primary domains of type in .
The following lemma is a key technical tool for showing that a polymer is not stable (or equivalently that a stable configuration has entropy greater than 1 and therefore cannot be a single polymer). It generalizes the idea shown in Fig. 7 that if one can find a monomer subcollection in a larger collection , and has a saturated configuration with no bonds left unbound, then one can create a saturated configuration with no bonds between and . (Thus has at least two polymers.)
Intuitively, given a monomer collection with at least as many primary domains as codomains of each type (under appropriate relabeling this holds for each type by Observation 4.3),
if we can partition it into two subcollections,
and each of them also has at least as many primary domains as codomains of each type,
then every stable configuration has at least two polymers,
since there is a saturated configuration of the whole collection in which there are no bonds between the two subcollections.
Let be a TBN, let be a monomer collection of such that , and let be a stable configuration. If there exist nonempty subcollections where 1) and 2) and , then .
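The partition condition described informally above can be checked mechanically for small collections. The sketch below is an illustration under stated assumptions: the condition is taken to be that both subcollections are nonempty and each has a nonnegative excess of primary domains for every domain type, the net monomer matrix is given as a list of rows, and the function names and example matrix are hypothetical.

```python
from itertools import product


def excess(net, counts):
    """Per-type excess of primary domains over codomains for a collection."""
    return [sum(row[j] * counts[j] for j in range(len(counts))) for row in net]


def is_valid_split(net, c1, c2):
    """Check that both parts are nonempty and have nonnegative excess everywhere."""
    nonempty = sum(c1) > 0 and sum(c2) > 0
    balanced = all(x >= 0 for x in excess(net, c1) + excess(net, c2))
    return nonempty and balanced


def find_split(net, c):
    """Brute-force search for a valid split of collection c, or None."""
    for c1 in product(*(range(n + 1) for n in c)):
        c2 = [n - x for n, x in zip(c, c1)]
        if is_valid_split(net, list(c1), c2):
            return list(c1), c2
    return None


# Hypothetical monomer types {a, b}, {a*, b*}, {a}, {b}; rows are domain types a, b.
net = [[1, -1, 1, 0],
       [1, -1, 0, 1]]

split_found = find_split(net, [1, 1, 1, 1])
no_split = find_split(net, [0, 1, 1, 1])
```

When a split exists, no single polymer containing all monomers of the collection can be stable; when the search returns `None`, this particular argument gives no obstruction. The search is exponential in the number of monomer types and is only intended for tiny examples.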
4.3 Exponential upper bound on polymer size
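We now show a converse to Theorem 4.1, namely Theorem 4.5, showing that stable polymers have size at most exponential in the number of domain and monomer types. The proof of Theorem 4.5 closely follows Papadimitriou’s proof that integer programming is contained in $\mathsf{NP}$. That proof shows, for any linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{A}$ is a given integer matrix, $\mathbf{b}$ is a given integer vector, and $\mathbf{x}$ represents the unknowns, that if the system has a nonnegative integer solution $\mathbf{x}$, then it has a “small” solution $\mathbf{x}'$. “Small” means that the entries of $\mathbf{x}'$ are at most exponential in the size of the system. The technique proceeds by showing that any sufficiently large solution $\mathbf{x}$ can be split into two nonnegative integer vectors $\mathbf{x}_1, \mathbf{x}_2$ such that $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2$, where $\mathbf{A}\mathbf{x}_1 = \mathbf{0}$, so $\mathbf{x}_2$ is also a solution: $\mathbf{A}\mathbf{x}_2 = \mathbf{A}\mathbf{x} - \mathbf{A}\mathbf{x}_1 = \mathbf{b}$. This is useful because $\mathbf{x}_1$ and $\mathbf{x}_2$ satisfy the hypothesis of Lemma 2, which tells us that all stable configurations have at least two polymers, so any single-polymer configuration of $\mathbf{x}$ is not stable.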
We include the full proof for three reasons:
2) it requires a bit of care to convert our inequality into an equality as needed for the technique,
We require the following discrete variant of Farkas’ Lemma, also proven in .
Lemma 3.
Let , , and . Then exactly one of the following statements holds:
There exist integers , not all , such that
There exists a vector such that, for all , .
Intuitively, statement (1) of Lemma 3 states that the vectors can be added, with nonnegative integer multiplicities, to get $\mathbf{0}$ (they are “directions of balanced forces”). This is false if and only if statement (2) holds: the vectors all lie on one side of some hyperplane, whose orthogonal vector would then have positive dot product with each of the vectors (thus adding any of them would move positively in the direction of that orthogonal vector and could never cancel to get $\mathbf{0}$).
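The dichotomy in this intuition can be explored by brute force on tiny inputs. The sketch below searches for a nonnegative integer combination of the vectors that sums to zero, and separately for an integer vector with positive dot product against every input vector. The search bound `bound` is a hypothetical parameter chosen for illustration; it does not reproduce the specific bounds of Lemma 3.

```python
from itertools import product


def balanced_combination(vectors, bound):
    """Find integers in 0..bound, not all zero, whose weighted sum of vectors is 0."""
    dim = len(vectors[0])
    for alphas in product(range(bound + 1), repeat=len(vectors)):
        if any(alphas) and all(
            sum(a * v[i] for a, v in zip(alphas, vectors)) == 0
            for i in range(dim)
        ):
            return list(alphas)
    return None


def separating_vector(vectors, bound):
    """Find an integer vector h (entries in -bound..bound) with h . v > 0 for all v."""
    dim = len(vectors[0])
    for h in product(range(-bound, bound + 1), repeat=dim):
        if all(sum(x * y for x, y in zip(h, v)) > 0 for v in vectors):
            return list(h)
    return None


opposing = [[1, -1], [-1, 1]]   # these two vectors cancel each other
one_sided = [[1, 0], [0, 1]]    # these lie on one side of a hyperplane

balance = balanced_combination(opposing, 2)
no_separator = separating_vector(opposing, 2)
no_balance = balanced_combination(one_sided, 2)
separator = separating_vector(one_sided, 2)
```

In both examples exactly one of the two searches succeeds, matching the "exactly one of the following statements holds" form of the lemma within the small search range used here.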
Intuitively, Theorem 4.5 states that the size of polymers in stable configurations is upper bounded by a function which is exponential in . We prove this by first defining a constant which is exponential in . If each of the individual monomer counts is less than , then we are done since no polymer in the configuration can have size bigger than . If some of the monomer counts are greater than (call these large-count monomers), we consider two cases.
For the first case, we consider the scenario where the vectors which describe the monomer types with large monomer counts are such that they can “balance” each other out with relatively small linear combination coefficients. If this is the case, then we can make a saturated subconfiguration, containing at least one polymer, from the large-count monomer types using these small coefficients as counts, since the domains and codomains completely “balance” each other out. We can then use the rest of the counts of the configuration to make another saturated subconfiguration which has at least one polymer. This is shown mathematically by applying Lemma 3 to show that the monomer counts in the polymer can be split to find a configuration consisting of two separate saturated polymers. This means that there is a saturated configuration that has at least two polymers, which contradicts the assumption that the configuration is a single stable polymer.
If there exists no such linear combination to “balance out” the vectors describing the large-count monomers, then Lemma 3 tells us that all of these vectors lie on the same side of some hyperplane. In this case, we show that the counts of the small-count monomers play a role in bounding the counts of the large-count monomers. Intuitively, if all of the vectors describing the large-count monomers lie on the same side of some hyperplane, they are missing domains and codomains which would allow them to bind together. The domains and codomains they need in order to bind together must then be found on the small-count monomers. Consequently, the size of polymers is bounded in terms of the counts of the small-count monomers (which are themselves exponentially bounded).
The proof of the following theorem is in Appendix D.
Let be a TBN with and . Let be the maximum count of any domain in any monomer. Then all polymers of every stable configuration of have size at most
Appendix A Proof of Lemma 1
By structural induction on cascade depth . Consider as a minimal element a saturated configuration having output , for a translator cascade of depth , redundancy , and without input. By assumption, the terminator monomer is free. To saturate the codomains of type , the monomer and the monomers must be in the same polymer. To saturate the codomains of type , the monomers must also be in the same polymer containing the monomers. There are therefore two polymers containing the following monomers: (1) the terminator monomer , and (2) every other monomer. Thus, .
Assume that if is a saturated configuration having output for a translator cascade of depth , redundancy , and without input, then .
Consider a saturated configuration having output , for a translator cascade of depth , redundancy , and without input. Let be the gate monomers of each layer ; . We first modify into a saturated configuration with output , such that there are no bonds between monomers in and . The only possible bonds between monomers in and is between a monomer and a monomer . Let be the number of bonds between these two types of monomers. If , we are done. Otherwise, we note that there must be free monomers in . Let be the configuration where the bonds between and monomers in are replaced by new bonds between and monomers, both from . Thus remains saturated and can be partitioned into two saturated sub-configurations: containing the monomers from and containing the remainder. Since there are no bonds between the sub-configurations then .
First, note that the monomers from can only form a single saturated configuration containing polymers, with each polymer being an monomer bound to a monomer. Thus, . Second, note that is a saturated configuration having output for a translator cascade of depth , redundancy , and without input. Thus, by the inductive assumption. Therefore, establishing the claim. ∎
Appendix B Proof of Lemma 2
Let , , and . For , is the number of unbound domains of type in every saturated configuration (and no domains are unbound in ). Let and be saturated configurations. Define the configuration by ; note that since there are no bonds between and . Excess domains of each type in are given by the vector . Thus is saturated by Observation 4.2. Since , every stable configuration of has at least two polymers as well. ∎
Appendix C Upper bound on the size of a stable polymer if it is a tree
Let be a TBN, , and be the maximum number of domains in any monomer type. Then any stable polymer of such that is acyclic has at most monomers.
Fig. 8 shows an example. A path in traverses an edge by either moving from a monomer with a codomain to a monomer with a primary domain , or vice versa. Suppose there is a simple path in of length . Then by the pigeonhole principle, some domain is traversed twice with the same primary/codomain ordering. Then by binding the first instance of to the second instance of and the second instance of to the first instance of , we introduce a cycle. However, since the number of edges has not changed, the new graph must be disconnected, i.e., split into at least two polymers. But since this new configuration is saturated, the original could not have been stable. Therefore the diameter of any acyclic stable monomer binding graph is at most .
Note that upper bounds the degree of . Any degree- acyclic graph of diameter has at most nodes: the worst case is a rooted complete -ary tree of depth (so diameter ), whose root is the only node that can have up to children since it has no parent. ∎
Appendix D Proof of Theorem 4.5
It suffices to prove the theorem in the special case where is itself a single stable polymer. Let be the monomer collection of . By Observation 4.3 we can assume without loss of generality that .
To turn this inequality into an equality, we introduce slack variables. Let be the identity matrix. Let ( concatenated horizontally with ); note has dimension . Let ; by Observation 4.4 is the number of unbound primary domains in every saturated configuration in (and no such configuration has any unbound codomains). Concatenating with to obtain