# Physical limits on cooperative protein-DNA binding and the kinetics of combinatorial transcription regulation

###### Abstract

Much of the complexity observed in gene regulation originates from cooperative protein-DNA binding. While studies of the target search of proteins for their specific binding sites on the DNA have revealed design principles for the quantitative characteristics of protein-DNA interactions, no such principles are known for the cooperative interactions between DNA-binding proteins. We consider a simple theoretical model for two interacting transcription factor (TF) species, searching for and binding to two adjacent target sites hidden in the genomic background. We study the kinetic competition of a dimer search pathway and a monomer search pathway, as well as the steady-state regulation function mediated by the two TFs over a broad range of TF-TF interaction strengths. Using a transcriptional AND-logic as exemplary functional context, we identify the functionally desirable regime for the interaction. We find that both weak and very strong TF-TF interactions are favorable, albeit with different characteristics. However, there is also an unfavorable regime of intermediate interactions where the genetic response is prohibitively slow.

Key words:
cooperative protein-DNA binding, transcription regulation, target search, monomer vs. dimer pathway, DNA-protein complex assembly

## Introduction

Cells respond to many biochemical signals by adjusting their gene expression levels, often in a combinatorial way where the transcription rate of a given gene is a nonlinear function of several inputs. The entire signal transduction cascade, beginning with the detection of the biochemical signals and culminating in a changed intracellular protein concentration, is generally believed to be under strong selective pressure for rapid and well-adjusted responses in competitive environments. An important step in this cascade involves proteins belonging to the large class of transcription factors (TFs) which convey the external signal and trigger the appropriate genetic response by binding to specific binding sites on the genomic DNA. The search process of individual TFs for their functional target sites hidden within millions of non-functional sites on the DNA is well characterized, see e.g. [1, 2, 3, 4, 5, 6, 7]. This has led to an understanding of the tradeoffs inherent in the choice of TF-DNA interaction parameters, when both a rapid search as well as sufficient equilibrium discrimination for the functional sites is required [8, 9, 10].

However, the experimental timescale for the search process, as inferred e.g. from single-molecule measurements in vivo [11], is surprisingly short compared to the timescale for significant change in gene expression levels: Whereas a TF target site is occupied within a minute even at low TF concentrations, the concentration of the protein expressed from the target gene typically changes significantly only over a timescale of several minutes, due to the slow kinetics of protein synthesis and degradation. Hence, the search time is only a fraction of the total response time, and it is unclear whether fine-tuning of TF-DNA interaction parameters is needed for kinetic reasons. On the other hand, even in bacteria many genes are co-regulated by a combination of different TFs [12, 13, 14, 15, 16, 17, 18, 19, 20], while the search process studied so far is that of a single TF species, i.e. multiple TF molecules of the same type. A salient question is whether the timescale of transcription control increases with the complexity of the implemented regulatory function.

To explore this question, we consider a simple theoretical model for the kinetics of combinatorial transcription regulation. We focus on the example of an AND-like cis-regulatory function implemented by two TFs, referred to as ‘’ and ‘’, which bind cooperatively to two adjacent target sites to activate a gene. This scenario is exemplified by the melAB promoter of E. coli, where CRP and MelR bind cooperatively to activate transcription [19]. Our model is sufficiently generic that it can be applied to a variety of cooperative protein-DNA binding situations. However, the example of the “AND-gate” is particularly well suited to illustrate the basic effects and functional tradeoffs that become apparent when the interaction parameters are varied. Compared to the well-studied case of a single TF-species, the new aspect here is the mutual interaction between the TFs (cf. Fig. 1), which is quantified by the dimensionless cooperativity . This quantity is only a measure of the interaction strength between TFs, with the effective free energy of the interaction and the energy scale of thermal fluctuations. It is not related to the Hill coefficient, which depends on the number of components involved in a cooperative complex. The strengths of direct protein-protein interactions vary over a broad range with dissociation constants between the femto and the centi-molar regime [21]. Biochemically feasible values can therefore span many orders of magnitude, from weak transient interaction with to strong dimerization with or larger. Depending on this value, the kinetics of cooperative protein-DNA binding will either be dominated by a “monomer pathway” or a “dimer pathway” [22, 23]. How do the response time and the steady-state levels of a regulatory module depend on the cooperativity? And which regime of values could be favorable in which functional context?

Our model, illustrated in Fig. 2, generalizes the classic facilitated diffusion model [1] to two interacting protein species. It incorporates the basic kinetic moves, i.e. binding to a DNA site, sliding along the DNA, and unbinding from the DNA, for monomers as well as for dimers. In addition, dimers can form or break up either in solution or while bound to the DNA. We characterize the behavior of our model using a variety of analytical and numerical approaches to calculate equilibrium and kinetic observables over a parameter range chosen to permit the exploration of functional tradeoffs in a bacterial system such as E. coli. For instance, in bacterial transcription regulation, a faster response is generally expected to be advantageous, whereas the steady-state transcription levels of a cis-regulatory function must be adjusted to yield the optimal protein concentrations for the biological conditions represented by the input signals [24, 25, 26]. Therefore, when considering different choices of , we compare regulatory systems that lead to the same steady-state levels. The exploration of our model leads us to two favorable regimes of , corresponding to weak (and often promiscuous) interactions and very strong heterodimerization, respectively. On the other hand, our model predicts that the search kinetics will be prohibitively slow at intermediate values, at least when the protein copy number is small as is usually the case for bacterial transcription factors. In the ‘Discussion’ section, we consider biological implications of these theoretical findings and discuss possible experiments to characterize the cooperative search problem and the kinetics of combinatorial transcription regulation.

## Results

### Cooperativity and regulatory function

Cooperative protein-DNA binding is employed in diverse functional contexts. For some functions, many molecules of the same protein polymerize along DNA, e.g. RecA for homologous recombination [27] or single-strand-binding-protein during DNA replication [28]. In these cases, the role of the protein-protein interaction is to enhance the probability of obtaining continuous DNA coverage rather than a patchwork of randomly positioned molecules. Here we focus on the functional context of transcription regulation where cooperative protein-DNA binding is involved in the processing of input signals. These signals are integrated and transformed into a single output, the transcription rate of a gene [29].

The cooperative binding of a transcription factor (TF) with RNA polymerase (RNAp) transfers a signal, by regulating the effective binding threshold for RNAp via the concentration of active TF (‘regulated recruitment’ [29], see Fig. 1A). When two different TFs bind cooperatively and each makes contact with RNAp to activate transcription, see Fig. 1C, two signals are effectively integrated into a single output. A similar case is depicted in Fig. 1B where TF binding is assisted by a helper protein that does not make contact with RNAp itself. This motif resembles, for instance, the regulation of the melAB promoter, which is co-dependent on the the transcription factors CRP and MelR [19]. The helper can also be another molecule of the same TF, making the response to its signal more switch-like (increased effective Hill coefficient).

The molecular function in the ‘signal transfer scenario’ of Fig. 1B is quantitatively described by the probability to find a protein bound as a function of the concentration of a protein that binds adjacently. In contrast, for the ‘signal integration scenario’, the functional activity is captured by the probability that two DNA sites and are both occupied by the matching TF proteins. In the following, we will refer to both quantities, and , simply as the ‘average activity’ for the respective scenario. We envisage that selection acts on these average activities, as well as on a characteristic time scale, the ‘response time’ , associated with the kinetics of each mechanism. Here, corresponds to the typical delay for adjusting the activity to a new average level after a change in the input signal. In a steady state, is also a characteristic time scale of spontaneous fluctuations in the activity (noise). Importantly, both the average activity as well as the response time depend on the binding cooperativity .

### Average activity

Before we introduce our full model, it is instructive to consider the average activity within the simple approximation where we focus only on two binding sites and and ignore binding of the TFs to the rest of the DNA. This consideration will be useful in particular as a guide for our detailed study of possible tradeoffs in the choice of within the full model.

We first consider the signal transfer scenario as shown in Fig. 1B. In equilibrium, the probability that site is occupied by one of available molecules of type is the normalized sum of the statistical weights for all states where is occupied [30]. In the absence of , i.e. for , this is just , with the statistical weight for an unoccupied site set to one and denoting the relative weight for to be occupied. Here, the ‘binding threshold’ , which corresponds to the number of molecules needed to obtain a average occupancy of in the absence of , is directly connected to the effective equilibrium binding constant of to and the cell volume via . In the presence of , the occupancy of increases to

(1) |

where is the average occupancy of in the absence of . Thus, the presence of boosts the statistical weight for binding by the ‘regulation factor’ [30], i.e. the factor in square brackets in Eq. 1. Intuitively, this factor may be thought of either as a boost in the local effective concentration of [29], or as a decrease in the effective binding threshold (the latter interpretation is closer to the underlying physics).

Importantly, the regulation factor cannot exceed the cooperativity value , and it reaches only if takes on its maximal value of one. As a consequence, the cooperativity is also an upper bound on the fold-change in -occupancy induced by a change in concentration, since . This constitutes a physical constraint on that arises from the equilibrium statistics of cooperative protein-DNA binding,

(2) |

i.e. the cooperativity must be larger than the required fold-change in the output signal ( for the signal transfer scenario). On the molecular level, this constraint can be implemented by a sufficiently strong direct protein-protein interaction or by indirect mechanisms of cooperativity, e.g. via collaborative competition [31] or DNA bending [32].

For the signal integration scenario in Fig. 1C, the definition of the fold-change is different, but the constraint (2) holds as well. Here, the relevant fold-change is the average activity in the presence of both inputs relative to the average activity with only a single input, or , where

(3) |

This fold-change is then transferred to the promoter activity in the example considered in Fig. 1C. Taken together, when considering steady-state activities, both the signal transfer and the signal integration function benefit from larger cooperativities, since large ’s allow for tight regulation. However, since large binding energies often lead to slow kinetics, we will explore whether a tradeoff exists between the fold-change in average activity and the response time.

### Full model

We now introduce a full kinetic model for the cooperative target search which is based on the energies of TF binding states and the transition rates between these states, as illustrated in Fig. 2. We consider a single circular genome of length (in units of base pairs) inside a cell of volume with a single pair of adjacent target sites for and . The unbound state of free TFs in solution is our reference state, with its energy set to . If and dimerize in solution, the interaction energy is gained, while entropy is lost, since the number of possible states is reduced by a factor that we write as , with a microscopic volume on the order of the size of a TF. Each TF molecule has possible binding sites on the DNA (indexed by with ) with the respective bound-state energies and . These bound-state energies are either equal to , if the protein-DNA interaction is non-specific, or they take on a lower value if the binding sequence favors specific protein-DNA contacts, . We denote by the number of base pairs on the DNA which are occupied by a bound monomer (occupied DNA is inaccessible to other TF molecules), and we posit that and can form a DNA bound dimer only when binds directly upstream of .

For the kinetic rates, we assume that all binding reactions are diffusion limited. For simplicity, we take the same rate constant for the binding of two protein molecules in solution and for the association of a TF molecule with a specified DNA site (thus, the total rate of TF binding anywhere on the DNA is , if no DNA site is occupied already). The random diffusion of TFs along the DNA contour occurs with the basal sliding rate . When neighboring sites have different energies, the sliding rate is the basal rate from the higher to the lower energy state while the reverse process occurs at the reduced rate , with the energy difference, such that detailed balance is respected (in the following we assume all energies to be in units of which amounts to setting ). The rates for all other possible reactions are similarly obtained from detailed balance. For instance, the unbinding rate of a monomer from a non-specific DNA site is determined by , and the dissociation rate of a free dimer . Note that monomers can also unbind or slide away from a DNA site while simultaneously dissociating from a cooperatively bound partner (thus disrupting the DNA-bound dimer, see Fig. 2b, top right). In that case detailed balance dictates that monomer sliding and dissociation rates decrease by a factor due to the loss of the dimerization energy .

Within the framework of this full model, we calculate the steady-state activities as described in Section S1 in the Supporting Material (this exact calculation includes the effect of the genomic background and mutual exclusion of overlapping binding sites, both neglected in the simple discussion above). We determine average search times numerically, using kinetic Monte Carlo simulations as described in Section S2, and we also develop analytical approximations further below and in Section S3.

We choose the parameters of our full model to roughly reflect the situation in a bacterium such as E. coli. We set the genome length to bp, choose a cell volume of , and consider DNA binding sites of length bp. The sliding rate can be determined from recent measurements of the one-dimensional diffusion constant for TF sliding on non-specific DNA [11, 33], which obtained values close to , corresponding to a sliding rate of about . The same experiments also determined a residence time of ms for TF molecules on non-specific DNA before dissociation. At the given genome length, this fixes our rate constant to be in the range , and we set in the following. Unless otherwise stated, we will assume, for simplicity, that the target sites , are the only specific binding sequences in the genome, both with energy . We measure all energies in units of . We set the strength of the non-specific protein-DNA interaction by requiring that a single TF spends on average equal time unbound in solution as bound somewhere on the DNA. This parameter choice corresponds to the well-characterized optimum for the search process of a single TF species [34, 4]; see also the discussion of this point further below. Within our energy model, this corresponds to a non-specific binding energy , assuming a reaction volume nm. In our model, the effective dimerization rate is increased by the presence of the DNA (which acts as a scaffold for the interaction). A similar increase was observed experimentally in a study of the Jun-Fos DNA complex [23].

### Quantitative Analysis

We now analyze how the quantitative characteristics of the two-protein-species system depend on the cooperativity . The cooperative target state where both target sites are occupied can be reached via two distinct kinetic pathways: In the “monomer pathway”, and separately search for their specific target sites in multiple rounds, alternating between one-dimensional diffusion along the DNA and three-dimensional diffusion in the cytoplasm to a new position on the DNA. In this pathway, and arrive independently, i.e., one after the other, at their specific target sites. By contrast, in the “dimer pathway”, the dimer forms beforehand, either in solution or in the DNA background, such that and reach their target sites simultaneously (cf. Fig. 2A). Clearly, we expect the monomer pathway to dominate for weak TF-TF interactions (small ), while the dimer pathway should dominate for large . But what is the behavior of the overall search time that results from the kinetic competition between the two pathways?

Before performing the kinetic analysis, we first characterize the steady state characteristics of our full model. We will focus on the signal integration scenario in the remainder of this study; the behavior in the signal transfer scenario is qualitatively similar. As discussed above, the most relevant steady state characteristic in the functional context of gene regulation is the attainable fold-change of the average activity, which determines how tightly a gene can be regulated. We assume that the expression level of the regulated gene in the high-activity state, when both TF species can bind the promoter (the “ON-state”) is constrained to its optimal level by evolutionary selection, e.g. the optimal level of a metabolic enzyme in the presence of its substrate [25, 24]. The fold-change between the ON-state and the OFF-state (in which only one of the TFs can bind) then determines how tightly the production of the protein can be suppressed under conditions when it would be useless or even detrimental. Hence, when we consider the system at different cooperativity values , we take for granted that another system parameter is adjusted to keep the ON-state level constant. Specifically, we will assume that this compensation occurs via the target binding threshold, which is programmable via the DNA sequence of the target site [10]. In other words, we compensate a weaker protein-protein interaction with a stronger protein-target interaction such that the ON-state level remains constant. In E.coli and yeast, binding sites indeed tend to deviate from the consensus motif when multiple TFs bind next to each other in the cis-regulatory region [35, 18, 15]. For simplicity, we consider a symmetric pair of TFs, which have different binding sequences, but the same energetics, such that .

Fig. 3B shows the resulting fold-change for the full model as a function of the cooperativity (on a double-logarithmic scale), with the three curves corresponding to different ON-state levels . The fold-change increases monotonously with the cooperativity, roughly as , before it saturates at a maximal level that depends slightly on the ON-state level. For , the dependence on the ON-state level is non-monotonous, with a larger for than for both and . Much of this behavior can be understood already within the simple approximation of Eqs. 1 and 3 as follows: For large , cooperative binding to the targets becomes dominant in the ON-state, such that the non-cooperative contribution in the denominator of Eq. 3 can be neglected. One then finds that , explaining the behavior in the intermediate range of Fig. 3B, i.e. the -dependence and the non-monotonous dependence on the ON-state level . However, the saturation of the fold-change at very large is beyond this simple approximation, which neglects the background DNA and assumes that the TFs hetero-dimerize only on the target. This assumption breaks down in the strongly-interacting regime, as shown in Fig. 3A, which plots the equilibrium probability to find the TFs as a hetero-dimer. Dimers become prevalent in the background when the cooperativity outweighs the entropic cost of dimerization. If the non-specific DNA interaction of monomers is optimized for independent search (see below), the dimerization probability is simply (see Section S1). Further increase of has no significant effect on the fold-change. Hence, the full model confirms our previous conclusion that a large cooperativity is generally beneficial for the steady-state response, but only up to a value of .

Next, we turn to the cooperative search process. We first consider the situation with only one molecule of each type (). Initially, both monomers are unbound. The cooperative search time corresponds to the first point in time when and are both occupied. Fig. 3C shows its mean, , as a function of , for three different ON-state levels . Here, the symbols represent simulation results, where the average is taken over a large number of simulation runs (see Section S2 for details), while the solid lines represent an analytical approximation discussed below and in Section S3. Note that is plotted in units of the monomer search time , which is defined as the average time needed by a monomer, e.g. of type , to find its target in the absence of . This kinetic ratio, is a direct measure of the slow down of cooperative regulation relative to the timescale for independent regulation. When the cooperativity is negligible (), Fig. 3C shows that the kinetic ratio is only slightly larger than one. In this regime, the second protein arrives independently and on the same timescale as the first, while each protein is stably bound by itself, such that the first protein typically does not unbind from its target before the second protein arrives. Indeed, the probability of such a “missed encounter” depends on the ON-state level and is simply when , which consistently explains the -dependence (at fixed ) in Fig. 3C.

With increasing cooperativity , the cooperative search time also becomes longer. Note that our reference time scale, the monomer search time , is independent of , such that the ratio plotted in Fig. 3C shows indeed the -dependence of the absolute timescale for cooperative search. The slow down scales with the square root of the cooperativity, . This scaling reflects the mechanism underlying the slow down, which is produced by an increasing probability of missed encounters: As the cooperativity is increased, our constraint of a constant implies that a monomer bound to its target becomes less stable and detaches more often before its partner arrives. The cooperative search time is then determined by the number of times a TF must return to its target before finding the other target occupied, which is roughly , the inverse of the probability that a single target is occupied. At intermediate , this probability scales as , leading to the observed scaling.

The increase of the search time with is not indefinite, however, because the relative importance of the dimer pathway increases with . The contribution of the dimer pathway is shown in Fig. 3D. It displays a sigmoidal form, with a narrow transition region where the cooperative search switches from the monomer mode to the dimer mode. This transition is accompanied by a peak in the cooperative search time in Fig. 3C. Note that this transition occurs at significantly smaller values than the transition in the equilibrium probability for hetero-dimerization shown in Fig. 3A.

To understand the non-monotonous behavior of the cooperative search time in Fig. 3C, it is instructive to consider the extreme case of a purely dimeric search. Fig. 4 shows the purely dimeric search time (black line and circles) as a function of the dimer binding ratio, i.e. the relative probability to find a dimer on the DNA versus in the cytoplasm (top x-axis). Here, the binding ratio is varied by changing the non-specific binding strength . For comparison, the gray line and squares show the corresponding curve for a monomer (search time for a single target; monomer binding ratio on the bottom x-axis). Both curves display the same qualitative behavior, with the well-known optimum [34, 4] where the respective binding ratio equals one, i.e. the average time spent on the DNA matches the time spent in the cytoplasm. At larger binding ratios, the local 1D search becomes too redundant, whereas at smaller binding ratios TFs spend too large a fraction of their time in solution, not searching. However, the minima of the two search time curves do not coincide, since dimers bind DNA more tightly than monomers. Consequently the protein-DNA interaction cannot be simultaneously optimized for monomer and dimer search. We generally assume that the protein-DNA interaction is optimal for monomers, since single TFs are the basic functional unit for transcription control in bacteria (see below for further discussion). At this point in Fig. 4, the purely dimeric search time is roughly a factor 10 longer than the monomer search time. Returning to Fig. 3, this factor corresponds to the level of the plateau that is reached for very large in Fig. 3C.

We now consider again the intermediate range in Fig. 3C. With increasing the monomer pathway eventually becomes slower than the dimer pathway, due to the increasing probability of missed encounters. At the same time the dimerized state is increasingly stabilized. Upon dimerization of and in the background, it becomes more likely that this dimer localizes the target before it dissociates again into monomers. The increasing predominance of the faster dimer pathway explains the regime where the cooperative search time decreases with . It also explains why the kinetic monomer-dimer transition in Fig. 3D occurs before the equilibrium monomer-dimer transition in Fig. 3A: even when the dimer fraction has not reached , the dimer pathway can be kinetically dominant. At very large , the monomer pathway is entirely negligible. The TFs form relatively stable dimers, either already in solution or when bound to non-target sites, subsequently search together for most of the time, and ultimately arrive at the target as a pair. This search mode is independent of the target binding energy and the cooperative search time then becomes independent of and equal to the pure dimer search time plotted in Fig. 4.

The cooperative search kinetics admits an analytical treatment, to quantitatively describe the kinetic competition between the monomer pathway and the dimer pathway. This description takes a coarse-grained view of the problem, with effective transition rates between only four states, as depicted in Fig. S1. The initial state has both TFs and unbound in solution (state 2 in Fig. S1), from where the proteins either enter the dimer pathway by dimerizing (state 1) at rate or one of them independently finds its target site on the DNA (state 3) at rate . From state 1, the dimer either locates its pair of target sites at rate or reverts back to state 2 at the effective dissociation rate . Along the monomer pathway, from state 3, either the other TF locates its target as well (at rate ), or the waiting TF leaves its target, leading back to state 2 at rate . In Section S3, we express the six effective rate constants in terms of the parameters of the full model, and then use the mean first passage time formalism to calculate the mean cooperative search time analytically. We have used this approach to obtain the curves in Fig. 3C and D, which agree well with the simulation data.

So far we have focused on the case of a single TF molecule of each type. We now turn to the general case where we have molecules of type and molecules of type . If we increase both molecule numbers simultaneously (), mass action drives the monomer-dimer equilibrium towards the dimerized state. Fig. S2A shows the probability for a molecule to be dimerized, , as a function of , with the different curves corresponding to different values. As expected, the dimerization threshold of the sigmoidal curves moves to smaller -values as is increased. Note that while we have treated the case of exactly one molecule for , we keep the number of proteins constant only on average for , via the chemical potential in the grand canonical ensemble (see Section S1 for details). This choice is technically motivated, but is also biologically meaningful, since proteins are constantly produced and degraded in cells and their numbers can at best be constant on average.

Fig. S2B displays the -dependence of the fold-change . In contrast to Fig. 3B, the ON-state level is now kept fixed at and instead the different curves are for different values (the fold-change is defined here with respect to the state where and ). For below the dimerization threshold, the fold-change is independent of the molecule number . However, as in Fig. 3B, increasing does no longer raise the fold-change once the dimerization threshold is reached. As the dimerization threshold decreases with , the fold-change saturates at smaller and the maximal decreases as .

The average time required for the parallel cooperative search with molecules is shown in Fig. S2C. As in Fig. 3, we have used the monomer search time as the reference time scale, but now scaled by , since the expected timescale for the parallel search of monomers is . Consequently, the fact that all curves fall on top of each other in the regimes of weak and very strong interaction shows that in these regimes the cooperative search time exhibits the simple scaling, which corresponds to a linear increase of the frequency at which the targets are visited by monomers (in the small regime) or by dimers (in the large regime). In the intermediate regime, we find a more complex dependence on , indicated by the fact that the curves do not collapse. To understand this dependence, we extend our simplified analytical expression developed above. Under the conditions of interest here, the dimerization equilibrium of Fig. S2A is reached on a timescale much shorter than the cooperative search. As detailed in Section S3, we can then approximate the search process as a parallel search of dimers and monomers of each kind, resulting in

(4) |

Here, is the independent search rate of the monomers, which indirectly depends on through the probability of missed encounters, see Section S3, while the dimer search rate is , as in Fig. 4. We used Eq. 4 to obtain the lines in Fig. S2C, which display good agreement with the full simulation, showing that the analytical approximation yields a useful description of the cooperative search kinetics.

On a more qualitative level, Fig. S2C shows how the peak in the search time at intermediate values is affected by . The peak shifts to smaller values with larger , and also becomes less pronounced. From Fig. S2D, which shows the weight of the dimer pathway in the cooperative search process according to Eq. S26, we see that the position of the peak remains determined by the switch from the monomeric to the dimeric search mode. The shifted switch to the dimeric search mode, which occurs at smaller for larger , also explains the reduction in the peak height: The dimeric search mode takes over before the slowdown of the monomeric search mode becomes dramatic. However, even with hundreds of TF molecules of each species, we still find a peak in the cooperative search time, which divides the values into three regimes, as discussed below.

## Discussion

We studied the kinetics and the equilibrium statistics of cooperative transcription factor-DNA binding to specific target sites in the genomic background. For our analysis, we considered the dimensionless cooperativity as a parameter with a broad range of biochemically feasible values, and sought to identify functional tradeoffs associated with the choice of this value. We focused on the functional context of a signal integration scenario with AND-logic, but the results hold in a similar fashion for a signal transduction scenario, see Fig. 1. From this functional context we derived the central assumption that the average activity of the regulated gene has an optimal level in the ON-state, such that there is a strong selection pressure to maintain this level fixed regardless of the value. We satisfied this constraint by compensating changes in via the target site binding energy, which is “programmable” through the binding site sequence [10]. Such a compensation has been observed in an analysis of combinatorial promoters, i.e. binding sites tend to deviate from the consensus motif when multiple TFs bind next to each other in the cis-regulatory region [35]. It is also biologically plausible as it does not interfere with the regulation of genes that are only regulated by one of the TFs or combinatorially with other TFs.

Given this functional setting, we determined which fold-change in the steady-state activity could be implemented at a given , and how the kinetic search time depends on . The fold-change quantifies the discrimination in the promoter output between the states where one or two input signals are present, while the search time is a lower limit to the response time of the regulatory system. The search process has contributions from a monomer and a dimer search pathway, the relative weights of which we determined, again as a function of . In the regime of weak protein-protein interactions, e.g. , we found a tradeoff between the kinetics and the steady-state behavior, in the sense that a higher fold-change is associated with a slower response due to a longer assembly time for the protein-complex on the target site. This tradeoff is a consequence of gene activation via the monomer pathway, where individual TFs visit their targets independently and consecutively, possibly dissociating from the target before the cooperative partner arrives (“missed encounters”). In this regime, search time and fold-change both increase as . At larger , heterodimers are more stable, increasing the probability that the target is located simultaneously by both proteins (dimer pathway). At the same time, the missed encounters further slow down the independent monomer search, to timescales larger than the dimeric search time. Thus, a transition occurs where the dimer pathway gains weight and the search time decreases again to settle at the purely dimeric search time.

### Assumptions and limitations

We made a number of simplifying assumptions in our coarse-grained theoretical model. For instance, we assumed that the DNA-binding energy of the dimer is the sum of the binding energies of the monomers. While dimerizing, the monomers may undergo conformational changes that affect the DNA-binding strength [36], possibly speeding up the dimeric search. In that case, the peak of the cooperative search time as a function of can be even more pronounced than in our model. For simplicity, we assumed identical binding properties of the two TFs and , however this assumption is without loss of generality and the extension to asymmetric cases is straightforward. We performed the analysis reported here under the assumption of a non-specific background, although we have formulated our model and the theoretical methods to also cover the more general case of a heterogeneous DNA background. A brief analysis of the heterogeneous case has shown that the most significant effect of the heterogeneous background is to slow down the search time in all regimes. For our model, we have also assumed that the cooperativity between the TFs is mediated by a direct interaction. Indirect cooperativity mediated e.g. by DNA bending or looping has the same steady-state properties as direct cooperativity in the low regime. However, these indirect mechanisms lead to different steady-state behavior at large values and to different kinetics. A detailed analysis of these mechanisms is beyond the scope of this study.

### Biological ramifications and examples

A central and robust result of our theoretical study is that one can distinguish three qualitatively distinct regimes of TF-TF interaction strengths for transcription regulation:

(i) Weak interactions, with a cooperativity , suffice to implement regulation functions with moderate fold-changes, on the order of 10-fold, in the transcription level. In this regime, the cooperative search time is only moderately elevated above the search time of a single TF (also on the order of 10-fold). In bacteria, where the search time of a single TF molecule is around one minute [11], the parallel cooperative search of copies of each TF would then still result in fast responses on the minute timescale. The principal advantage of this regime from a design point of view is that TFs with weak interactions are flexible components, which can be used to control different genes in different ways, alone or cooperatively in various combinations [37]. Each TF then only needs to be separately optimized for monomeric search (via the non-specific protein-DNA interaction), while cooperative regulation by pairs of TFs is still sufficiently fast.

(ii) Interactions of intermediate strength, with values in the approximate range of , lead to cooperative search kinetics that are prohibitively slow, due to an excessive amount of missed encounters. Recent single-molecule experiments have been able to monitor the search process of a single TF in vivo [11]. Our prediction of slow cooperative search kinetics could in principle be verified using two-color fluorescence methods. Alternatively, one could measure the transcriptional response time of a synthetically designed, cooperatively regulated gene with a rapid reporter. We also expect that TF-TF interactions within this intermediate regime are avoided by cells. A test of this implication of our study will require a large dataset quantifying a significant subset of the TF-TF interactions in a model organism. To our knowledge, a quantitative high-throughput assay for TF-TF interactions is not yet available and remains as an experimental challenge in the field. Instead, we discuss several specific biological examples below.

(iii) Strong interactions, with a cooperativity , allow high fold-changes and a passable response time at the cost of losing combinatorial flexibility: Suppose that each TF signals a different environmental cue, and a set of genes needs to be activated whenever A is present, whereas another, more specialized group of genes is to be activated only if both signals are present. In this situation, a strong heterodimer would not lead to a favorable regulatory design, since the regulation of the unconditional genes by A would be strongly affected by the presence of B. In other words, the strong cooperativity can lead to undesired crosstalk. Nevertheless, this regime of TF-TF interactions is biologically interesting: For instance, strong homodimers can exploit the cooperative stability mechanism to improve the robust function of regulatory circuits [38]. Also, in cases where the combinatorial flexibility described above is not needed, strong heterodimers could be used to perform a very sharp and AND-like signal integration. This signal integration can be made very rapid by tuning the non-specific protein-DNA interaction of the TFs into a weaker regime, such that the dimer DNA binding ratio is closer to the optimal value for search on the DNA. As Fig. 4 shows, this would lead to a concomitant decrease of the monomer binding ratio. For TFs that work in this regime, we therefore expect that monomers spend less than of their time bound on DNA. So far, the DNA binding ratios of transcription factors have not been assayed on a large scale. Such an experiment would yield interesting clues about the design and the mode of operation of these TFs.

Finally, we discuss biological examples. Currently, 383 operons in E. coli are known to be transcriptionally regulated by two or more TFs (see Section S4). However, it is not known what fraction of these regulatory interactions involves cooperative protein-DNA binding. One well-studied case of co-dependent activation is the melAB promoter, where CRP and MelR bind cooperatively and activate transcription [19]. The interaction of CRP and MelR occurs via a weak surface contact and the binding of either is found to be reduced if the binding of the partner is impeded. In the presence of both, the transcription rate is tenfold increased [19]. This case is a good example for our regime (i). It is interesting to note that the binding sites of CRP and MelR in the melAB promoter display a relatively poor match to the consensus sequence, which is consistent with our assumption that the target binding energies are evolutionarily tuned. Also, CRP is a well known global regulator that controls many other genes in different ways, and hence the combinatorial flexibility achieved with a small cooperativity appears to be amply exploited by E. coli. Other examples of prokaryotic co-activation are the ansB promoter, activated by CRP and FNR [15], and the activation of the mapEP promoter by CRP and MalT [18, 39]. More generally, the regime (i) corresponds to the regulated recruitment mechanism for transcription regulation [29], which appears to be widely used in eukaryotes. Indeed, the case of the melAB promoter described above has been described as a bacterial version of eukaryotic enhanceosomes [19]. A prokaryotic example for regime (iii) may be the RcsA/RcsB heterodimer which is required to activate capsule expression through the RcsF phosphorylation cascade [40]. Interestingly, RcsB can also from homodimers and regulate the transcription of other genes by itself, suggesting that this TF may be optimized to always search and function as a dimer (homo- or heteromeric).

## Conclusion

We reported a biophysical analysis of the design principles for TF-TF interactions. The exploration of our theoretical model lead us to two functionally favorable regimes for the cooperativity , corresponding to weak, glue-like promiscuous interactions and very strong heterodimerization, respectively. Cells appear to implement both favorable regimes, but in different biological contexts. On the other hand, our model predicts that the search kinetics will be prohibitively slow at intermediate values, at least when the protein copy number is small as is typically the case for transcription factors. Hence the intermediate -regime appears undesirable in this functional context. This prediction could be tested with experimental approaches from single-molecule biophysics. Currently, there is only limited biochemical data available for the cooperativity values involved in transcription regulation, typically from in vitro experiments with selected DNA-binding proteins. Once more data becomes available, it will be interesting to see whether the intermediate -regime is indeed avoided.

## Acknowledgements

We thank Nicolas Buchler and Karin Schnetz for helpful comments. We especially thank Terry Hwa for stimulating discussions in the initial phase of this study. Financial support from the DFG and the German Excellence Initiative via the ÔNano-Initiative Munich (NIM)Õ is gratefully acknowledged. This work was partially supported by the Spanish Ministry of Education, grant number FPU-AP-2007-00975.

## References

- Berg et al. [1981] Berg, O. G., R. B. Winter, and P. H. von Hippel, 1981. Diffusion-Driven Mechanisms of Protein Translocation on Nucleic Acids. 1. Models and Theory. Biochemistry 20:6929–6948.
- Bruinsma [2002] Bruinsma, R. F., 2002. Physics of protein-DNA interaction. Physica A 313:211–237.
- Halford and Marko [2004] Halford, S., and J. Marko, 2004. How do site-specific DNA-binding proteins find their targets? Nucl. Acids Res. 32:3040–52.
- Slutsky and Mirny [2004] Slutsky, M., and L. A. Mirny, 2004. Kinetics of Protein-DNA Interaction: Facilitated Target Location in Sequence-Dependent Potential. Biophys. J. 87:4021–4035.
- Coppey et al. [2004] Coppey, M., O. Bénichou, R. Voituriez, and M. Moreau, 2004. Kinetics of Target Site Localization of a Protein on DNA: A Stochastic Approach. Biophys. J. 87:1640–1649.
- Lomholt et al. [2005] Lomholt, M. A., T. Ambjörnsson, and R. Metzler, 2005. Optimal Target Search on a Fast-Folding Polymer Chain with Volume Exchange. Phys. Rev. Lett. 95:260603.
- Hu et al. [2006] Hu, T., A. Y. Grosberg, and B. I. Shklovskii, 2006. How proteins search for their specific sites on DNA: The role of DNA conformation. Biophys J. 90:2731–2744.
- von Hippel and Berg [1980] von Hippel, P., and O. Berg, 1980. On the specificity of DNA-protein interactions. Proc. Natl. Acad. Sci. USA 83:1608–1612.
- Stormo and Fields [1998] Stormo, G. D., and D. S. Fields, 1998. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem. Sci. 23:109–113.
- Gerland et al. [2002] Gerland, U., J. D. Moroz, and T. Hwa, 2002. Physical Constraints and Functional Characteristics of Transcription Factor DNA Interaction. Proc. Natl. Acad. Sci. USA 99:12015–12020.
- Elf et al. [2007] Elf, J., G.-W. Li, and X. S. Xie, 2007. Probing Transcription Factor Dynamics at the Single-Molecule Level in a Living Cell. Science 316:1191–1194.
- Richet et al. [1991] Richet, E., D. Vidal-Ingigliardi, and O. Raibaud, 1991. A new mechanism for coactivation of transcription initiation: Repositioning of an activator triggered by the binding of a second activator. Cell 66:1185–1195.
- Gerlach et al. [1991] Gerlach, P., L. Søgaard-Andersen, H. Pedersen, J. Martinussen, P. Valentin-Hansen, and E. Bremer, 1991. The cyclic AMP (cAMP)-cAMP receptor protein complex functions both as an activator and as a corepressor at the tsx-p2 promoter of Escherichia coli K-12. J. Bacteriol. 17:5419–5430.
- Jennings and Beacham [1993] Jennings, M., and I. Beacham, 1993. Co-dependent positive regulation of the ansB promoter of Escherichia Coli by the CRP and FNR protein: a molecular analysis. Mol. Microbiol. 9:155–164.
- Scott et al. [1995] Scott, S., S. Busby, and I. Beacham, 1995. Transcriptional co-activation at the ansB promoters: involvement of the activating regions of CRP and FNR when bound in tandem. Mol. Microbiol. 18:521–531.
- Pedersen et al. [1995] Pedersen, H., J. Dall, G. Dandanell, and P. Valentin-Hansen, 1995. Gene-regulatory modules in Escherichia coli: nucleoprotein complexes formed by cAMP-CRP and CytR at the nupG promoter. Mol. Microbiol. 5:843–853.
- Brikun et al. [1996] Brikun, I., K. Suziedelis, O. Stemmann, R. Zhong, L. Alikhanian, E. Linkova, A. Mironov, and D. Berg, 1996. Analysis of CRP-CytR interactions at the Escherichia coli udp promoter. J. Bacteriol. 6:1614–1622.
- Richet [2000] Richet, E., 2000. Synergistic transcription activation: A dual role for CRP in the activation of an Escherichia Coli pomoter depending on MalT and CRP. EMBO J. 19:5222–5232.
- Wade et al. [2001] Wade, J. T., T. A. Belyaeva, E. I. Hyde, and S. J. Busby, 2001. A simple mechanism for co-dependence on two activators at an Escherichia coli promotor. EMBO J. 20:7160–7167.
- Shin et al. [2001] Shin, M., S. Kang, S.-J. Hyun, N. Fujita, A. Ishihama, P. Valentin-Hansen, and H. E. Choy, 2001. Repression of deoP2 in Escherichia coli by CytR: conversion of a transcription activator into a repressor. EMBO J. 20:5392–5399.
- Kumar and Gromiha [2006] Kumar, M. S., and M. Gromiha, 2006. Protein-protein Interactions Thermodynamic Database. Nucleic Acids Res. 34:D195–D198.
- Kohler et al. [1999] Kohler, J. J., S. J. Metallo, T. L. Schneider, and A. Schepartz, 1999. DNA specificity enhanced by sequential binding of protein monomers. Proc. Natl. Acad. Sci. USA 96:11735–11739.
- Kohler and Schepartz [2001] Kohler, J. J., and A. Schepartz, 2001. Kinetic Studies of Fos-Jun-DNA Complex Formation: DNA Binding Prior to Dimerization. Biochemistry 40:130–142.
- Koch [1983] Koch, A. L., 1983. The protein burden of lac operon products. J. Mol. Evol. 19:455–462.
- Dekel and Alon [2005] Dekel, E., and U. Alon, 2005. Optimality and evolutionary tuning of the expression level of a protein. Nature 436:588–592.
- Lang et al. [2008] Lang, G. I., A. W. Murray, and D. Botstein, 2008. The cost of gene expression underlies a fitness trade-off in yeast. Proc. Natl. Acad. Sci. USA 106:5755–5760.
- Galletto et al. [2006] Galletto, R., I. Amitani, R. J. Baskin, and S. C. Kowalczykowski, 2006. Direct observation of individual RecA filaments assembling on single DNA molecules. Nature 443:875–878.
- Lohman and Ferrari [1994] Lohman, T. M., and M. E. Ferrari, 1994. Escherichia coli single-stranded DNA-binding protein: multiple DNA-binding modes and cooperativities. Annu. Rev. Biochem. 63:527–570.
- Ptashne and Gann [2001] Ptashne, M., and A. Gann, 2001. Genes and Signals. Cold Spring Harbor Laboratory Press, 1st edition.
- Bintu et al. [2005a] Bintu, L., N. Buchler, H. Garcia, U. Gerland, T. Hwa, J. Kondev, and R. Phillips, 2005. Transcription regulation by the numbers: models. Curr. Opin. Genet. Dev. 15:116–124.
- Miller and Widom [2003] Miller, J. A., and J. Widom, 2003. Collaborative Competition Mechanism for Gene Activation in Vivo. Mol. Cell. Biol. 23:1623–1632.
- Koslover and Spakowitz [2009] Koslover, E. F., and A. J. Spakowitz, 2009. Twist- and Tension-Mediated Elastic Coupling between DNA-Binding Proteins. Phys. Rev. Lett. 102:178102.
- Wang et al. [2009] Wang, Y., L. Guo, I. Golding, E. Cox, and N. Ong, 2009. Quantitative Transcription Factor Binding Kinetics at the Single Molecule Level. Biophys. J. 96:609–620.
- Winter et al. [1981] Winter, R. B., O. G. Berg, and P. H. von Hippel, 1981. Diffusion-Driven Mechanisms of Protein Translocation on Nucleic Acids. 2. The Escherichia coli Repressor-Operator Interaction: Kinetic Measurements and Conclusions. Biochemistry 20:6961–6977.
- Bilu and Barkai [2005] Bilu, Y., and N. Barkai, 2005. The design of transcription-factor binding sites is affected by combinatorial regulation. Genome Biol. 6:R103.
- Lefstin [1998] Lefstin, K. R., Jeffrey A.and Yamamoto, 1998. Allosteric effects of DNA binding on transcriptional regulators. Nature 392:885–888.
- Buchler et al. [2003] Buchler, N. E., U. Gerland, and T. Hwa, 2003. On schemes of combinatorial transcription logic. Proc. Natl. Acad. Sci. USA 100:5136–5141.
- Buchler et al. [2005] Buchler, N. E., U. Gerland, and T. Hwa, 2005. Nonlinear protein degradation and the function of genetic circuits. Proc. Natl. Acad. Sci. USA 102:9559–9564.
- Richet [1996] Richet, E., 1996. On the Role of the Multiple Regulatory Elements Involved in the Activation of the Escherichia coli malEP promoter. J. Mol. Biol. 264:852–862.
- Majdalani et al. [2005] Majdalani, N., M. Heck, V. Stout, and S. Gottesman, 2005. Role of RcsF in Signaling to the Rcs Phosphorelay Pathway in Escherichia coli. J. Bacteriol. 187:6770–6778.
- Bintu et al. [2005b] Bintu, L., N. Buchler, H. Garcia, U. Gerland, T. Hwa, J. Kondev, T. Kuhlmann, and R. Phillips, 2005. Transcription regulation by the numbers: applications. Curr. Opin. Genet. Dev. 15:125–135.
- Hermsen et al. [2006] Hermsen, R., S. Tans, and P. R. ten Wolde, 2006. Transcriptional Regulation by Competing Transcription Factor Modules. PLoS Comput. Biol. 2:1552–1560.

Supporting Material

## Appendix S1 Exact calculation of steady-state activities

#### Single TF molecules.

We first treat the case where the cell contains only a single molecule of each TF species, . The equilibrium statistics of the system is described by the canonical ensemble of statistical physics. The appropriate Boltzmann weight for a single TF binding to one of sites in a non-specific DNA background is (see below for the most general case with an arbitrary background and larger TF numbers). For a purely non-specific background and unbound states, the partition function is

(S1) | |||||

The first three terms describe the non-interacting states, where and are either separately bound to the DNA to non-adjacent sites, or both are free but not dimerized, or one is DNA-bound and the other is free. The fourth term corresponds to the states where and are dimerized, either on the DNA or unbound. The fraction of dimers in the background corresponds to the ratio of the weights of the dimerized states to the weight of all possible states, . Rewriting this expression in terms of the monomer DNA binding ratio , one obtains

(S2) |

For a binding ratio of one, i.e. when the monomers are optimized for independent search, , which is the case plotted in Fig. 3A. Here, a dimerization probability of is reached at , while we would have for and for .

Eq. S1 provides the binding-statistics on non-target states. To study the full system, we add the target states with weights for the full partition function

(S3) |

where the second term is the weight of a single occupied target and the third term is the weight for both targets to be occupied simultaneously. Hence the double target occupation probability is . This equation can be interpreted as a quadratic equation for for given values of and (since our analysis assumes a fixed corresponding to the optimal occupation-probability of the targets in the ON-state). Hence we obtain an explicit expression for (not shown), which we use throughout this paper to determine for the kinetic model and stochastic simulations in the case. Furthermore, to calculate the fold-change at a given we determine the probability of single TF target binding in the absence of a partner. By calculating the partition function for a system of a single TF, we find

(S4) |

For small , this probability scales as .

#### Multiple TF molecules.

For the case of multiple TF molecules, we calculate the exact equilibrium statistics of our full model using the standard transfer matrix approach from statistical physics, see e.g. [43, 44]. The calculation is based on the grand canonical ensemble, i.e. the average copy numbers , of the proteins and are set by the corresponding chemical potentials , . The total partition function of the complete system then factorizes,

(S5) |

into a product of a “DNA partition function” involving only the DNA-bound states of the TFs and a “cytosol partition function” involving only the unbound states (the factorization is possible because DNA-bound TFs do not interact with unbound TFs and because the TF numbers are not conserved in the grand canonical ensemble). Due to the low TF concentrations in the cytosol, steric exclusion between unbound TFs is negligible, and takes the simple form

(S6) |

where is the number of solvent states (i.e. the ratio of the cell volume to a characteristic TF volume, ) and the statistical weight for an unoccupied solvent state is one. For the calculation of the DNA partition function , we do take the steric exclusion of DNA-bound TFs into account. The number of base pairs covered by a single TF molecule is denoted by . Each base pair on the genome can then be in one of states: In state 0, the base pair is not covered by a TF. In state 1, it is the leftmost contact position of a TF of type , in state 2 it is the second leftmost contact position, and so on, up to state corresponding to the rightmost contact position of . States up to are analogous for . The transfer matrix describes the statistical coupling between the states of the neighboring DNA positions and . Each is a square matrix of dimension , defined such that the partition function is equal to the trace of the (ordered) product of all transfer matrices,

(S7) |

for a circular DNA with basepairs (for a linear DNA molecule, the trace operation would have to be replaced by multiplication of a row vector from the left and a column vector from the right, with the vector components properly chosen to enforce the boundary conditions). Let us denote by the element in row and column of the transfer matrix at position . It takes on a non-negative value, which corresponds to the conditional statistical weight of finding position in state , provided that position is in state . Thus, each is a Boltzmann factor that accounts for the contribution to the total configurational energy that stems from position and its interaction with position . The Boltzmann factor is zero, if the two states are incompatible (overlapping TFs or a single TF binding to non-contiguous basepairs). The non-zero entries of contain the protein-DNA binding energy landscapes and , the cooperativity , and the chemical potentials. For illustration, we show the transfer matrix for TFs of length ,

(S8) |

The entries with value one reflect the mere compatibility of neighboring states without an energetic contribution (e.g., when position is in state , position must be in state , and there is no additional energy contribution to take into account). Note that we assume a directional interaction between the TFs and (the attractive contact only occurs when is bound directly downstream from ).

From the partition function (S5), we can obtain exact expressions for the occupation probabilities of DNA sites by differentiation. For instance, the probability that a TF molecule of type is bound to the site starting at position on the DNA is

(S9) |

The derivative is straightforward to evaluate explicitly, leading to an expression of the form , where the restricted partition function has the same form as (S7), but with a projection matrix next to inside the trace. This exact expression is easily computed numerically, in particular when large parts of the binding energy landscapes and are flat (equal to the non-specific binding energy ), since large parts of the product in (S7) then reduce to matrix powers (which are quickly calculated via diagonalization). Similarly, the probability of cooperative binding at site is calculated starting from the expression

(S10) |

where the derivatives enforce that a molecule is bound directly adjacent to the molecule, such that together they cover the DNA positions from to . Finally, the average number of TF molecules in the system at given values of the chemical potentials , are obtained by summing over the occupation numbers of all states, e.g.

(S11) |

Similarly, the average number of dimers in the system is

(S12) |

from which the fraction of dimers, , in Fig. 6A is computed. The fold-change in Fig. 6B is calculated as the ratio of the dimer occupancy (S10) at the target site pair in the presence of both TFs ( such that ) to the monomer occupancy (S9) at its target site when only one TF is present ( chosen such that while is set to a large negative value such that ).

The above framework can be used to calculate any equilibrium observable exactly for our full model and it also provides a reference point for our kinetic simulations, which produce equilibrium values in the long-time average. However, it is also useful to derive a simple approximation to the exact solution of the multiple TF molecule case, which still incorporates the effect of a (nonspecific) DNA background, but neglects steric exclusion between the TFs in the background. Assuming and , and again taking a DNA binding ratio of one, such that , we find

(S13) |

which leads to the background dimerization fraction

(S14) |

that we use in Eq. 4 of the main text for the approximative form of the cooperative search time.

## Appendix S2 Stochastic simulation of cooperative search kinetics

To study the cooperative search process within the full reaction scheme of Fig. 2B, we implemented a kinetic Monte Carlo simulation based on the standard Gillespie algorithm. For our simulations, we used fixed numbers, and , of and molecules (i.e., any equilibrium values computed in these simulations correspond to thermodynamic averages in the canonical ensemble). The state of the system is specified by the state of each TF molecule, which can be either free or dimerized in solution, or bound to the DNA at position . The simulations generate stochastic continuous-time trajectories in this discrete state space. Each simulation step consists of one of the moves depicted in Fig. 2B, however the set of available moves depends on the current state of the system. In particular, moves that would violate the steric constraint that each DNA basepair can be be in contact with only a single TF molecule cannot be chosen. Thus, TF molecules can, for instance, not change the order at which they are bound along the DNA solely via sliding moves.

To measure the average cooperative search time , we perform 100 simulations for each set of model parameters. Each simulation run is initialized in the state where all molecules are unbound (this mimics the condition of a cell prior to receiving a signal that triggers allosteric activation of TF-DNA binding), and terminated once the the two adjacent target sites are both occupied simultaneously. The data points in Fig. 3C, Fig. 4, and Fig. 6C correspond to the simulation time averaged over the 100 runs. Another observable of interest here is the relative contribution of the dimer pathway to the search process, as shown in Fig. 3D and Fig. 6D. This observable corresponds to the fraction of simulation runs where the final state is reached by a dimer move, such that both targets simultaneously become occupied by their cognate TF molecule.

## Appendix S3 Analytical description of the cooperative search kinetics

Here, we develop a simplified analytical description of the cooperative search kinetics, which distinguishes only the target occupation states and the two search modes (dimeric vs. monomeric). As shown in Fig. S1, this description corresponds to a kinetic scheme with four states and six effective rates. The scheme amounts to two competing Michaelis-Menten type processes which lead to the same final state. The initial state 2 corresponds to the state of our TF-DNA system where both proteins are unbound. From there, the target state can either be reached via state 1 (dimer pathway) or via state 3 (monomer pathway). The dimer pathway is kinetically characterized by the effective dimerization rate , the effective dissociation rate , and the dimer search rate . Similarly, the monomer pathway is characterized by the three rates , , and . Since state 3 does not distinguish whether or is bound, the rate is twice the monomer search rate. In contrast, the rate corresponds to only half the search rate of a monomer because one target is already occupied and the other target is accessible from one side only. Finally, is the total rate at which a monomer dissociates from its target, either via sliding or unbinding.

We can express the three remaining undetermined rate constants , , and in terms of our underlying model parameters. For arbitrary binding energy landscapes, the effective dimerization rate is

(S15) |

where we have used the equilibrium probabilities introduced above in section A of ‘Methods’, and , denote the equilibrium probabilities for the TFs to be unbound in solution. The rates and denote the forward and backward sliding rates from position , see section ‘Full model’. Using our approximations from section A for a non-specific background, we find the simpler form for the effective dimerization rate

(S16) |

where is the probability to find a TF molecule bound to DNA. Similarly, the effective dissociation rate has the general form

(S17) |

where denotes the site-specific DNA-unbinding rate for and is the probability to find the two TFs dimerized in solution. The simplified effective dissociation rate for a non-specific background is

(S18) |

where is the total probability to find the TFs non-specifically bound to the DNA as a heterodimer. Finally, the total rate for monomer loss from a target is

(S19) |

where the index indicates that these are unbinding and sliding rates from the target site, which are slower than their bulk counterparts by the additional Boltzmann factor corresponding to the energy difference between the non-specific binding energy and the target binding energy, see section ‘Full model’.

With these rates, the average assembly time of the two TFs on the double target corresponds to the mean first passage time (MFPT) of a random walker hopping between the four sites at the given site-dependent jump rates. The random walker starts at site 2 and terminates on the target site. We use the standard MFPT formalism as described, for instance, in Ref. [45] to calculate this cooperative search time. The general formula for the MFPT starting from site on a linear lattice with sites, with the two boundary sites and both absorbing, is

(S20) |

where is the total probability to exit to site ,

(S21) |

For the problem at hand, we have and . Defining the Michaelis-Menten-type constant for state 1 and for state 3, we can rewrite the cooperative search rate, i.e. the inverse average search time, in the compact form

(S22) |

which is the expression used to obtain the lines in Fig. 3C. In the limit where vanishes, this reduces to the average search rate for two independent monomers,

(S23) |

Using the relation , we can rewrite the corresponding search time in the form

(S24) |

which best explains the effect of missed encounters where is the average number of times a TF must return to the target before finding the other target occupied. In the small regime the cooperative search process corresponds to an independent monomer search and . Given that , this form also explains the scaling of the search time at small cooperativities.

We can further simplify Eq. S22 by noting that the average search time is virtually identical (in the parameter regime considered here) when the search begins in state 1 instead of state 2. With state 1 as the initial state, we find

(S25) |

The first term corresponds to the dimer pathway, while the second term corresponds to the monomer pathway. As expected, the contribution of either pathway depends on the dimerization probability and on the search rate of the respective mode. It follows that the relative weight of the dimer pathway can be written as

(S26) |

which was used to obtain the lines in Fig. 3D. It is straightforward to generalize these equations also to the case of , where the dimerization probability becomes a function of both and , and the search rate for each mode increases by a factor of : and . In this case we obtain Eq. 4 from the main text which is used to obtain the analytical curves in Fig. S2C. Using the dimerization probability , we also extend Eq. S26 to the case of , to obtain the curves in Fig. S2D.

## Appendix S4 Additional notes

To obtain an estimate of the number of E. coli operons which are regulated by two or more transcription factors, we perused the “RegulonDB” database [46]. At the time of writing, this database lists 370 E. coli operons as regulated by a single transcription factor, while 383 operons are listed as regulated by two or more transcription factors (188 of these are believed to be regulated by exactly two transcription factors).

## References

- Schwabl [2006] Schwabl, F., 2006. Statistical Mechanics. Springer, 2nd edition.
- Teif [2007] Teif, V. B., 2007. General transfer matrix formalism to calculate DNA-protein-drug binding in gene regulation: application to OR operator in phage lambda. Nucleic Acids Res. 35:e80.
- Gardiner [2004] Gardiner, C., 2004. Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences. Springer, 3rd edition.
- Gama-Castro et al. [2011] Gama-Castro, S., H. Salgado, M. Peralta-Gil, A. Santos-Zavaleta, L. Muniz-Rascado, H. Solano-Lira, V. Jimenez-Jacinto, V. Weiss, J. S. Garcia-Sotelo, A. Lopez-Fuentes, L. Porron-Sotelo, S. Alquicira-Hernandez, A. Medina-Rivera, I. Martinez-Flores, K. Alquicira-Hernandez, R. Martinez-Adame, C. Bonavides-Martinez, J. Miranda-Rios, A. M. Huerta, A. Mendoza-Vargas, L. Collado-Torres, B. Taboada, L. Vega-Alvarado, M. Olvera, L. Olvera, R. Grande, E. Morett, and J. Collado-Vides, 2011. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 39:D98–D105.