To function as gene regulatory elements in response to environmental signals, riboswitches must adopt specific secondary structures on appropriate time scales. We employ kinetic Monte Carlo simulation to model the time-dependent folding during transcription of TPP riboswitch expression platforms. According to our simulations, riboswitch transcriptional terminators, which must adopt a specific hairpin configuration by the time they have been transcribed, fold with higher efficiency than Shine-Dalgarno sequesterers, whose proper structure is required only at the time of ribosomal binding. Our findings suggest both that riboswitch transcriptional terminator sequences have been naturally selected for high folding efficiency, and that sequesterers can maintain their function even in the presence of significant misfolding.
Abstract: \@abstract Keywords: Nucleic Acids; Hairpin; Folding; Monte Carlo; Kinetic; Simulation
The riboswitch is a mechanism of self-regulation in messenger RNA that is found primarily in metabolic genes of bacteria. Riboswitches possess an aptamer that is capable of binding a specific ligand, and an expression platform that regulates the gene’s expression according to the binding state of the aptamer . The expression platform can regulate expression through formation of an intrinsic (rho-independent) terminator hairpin, by sequestering a Shine-Dalgarno ribosomal binding site, or by cleaving the messenger. The terminator hairpin operates by halting transcription while the Shine-Dalgarno sequesterer, also a hairpin, operates by preventing translation.
Because riboswitches function through conformational changes resulting from the ligand-bound or unbound state of the aptamer, they rely on both RNA thermodynamics and structural kinetics. Of particular importance is the secondary structure, the pattern of pairing among complementary bases (see Fig. 1). This secondary structure forms both during and after mRNA transcription [2, 3], leading to a time-dependent free energy landscape for RNA folding. The importance of kinetics for the operation of some terminator-type riboswitches is supported by the presence of transcriptional pause sites following the aptamer and antiterminator .
Here we concentrate on the dynamical folding of the expression platform as it grows during transcription. A minimum free energy (MFE) structure adopted by an incomplete sequence may become metastable once the sequence is complete. The lifetime  of such a metastable structure may exceed the time allowed for the switch to function, leading to a failure of gene regulation. By comparing the folding efficiencies of transcriptional terminator-type riboswitch terminator hairpins with that of with translational sequesterers, we suggest that riboswitch transcriptional terminators have been naturally selected to fold reliably under the time constraint imposed by the mRNA transcription rate. By inspection of specific poorly folding sequesterers, we propose that even misfolded sequesterers may retain some function provided their Shine-Dalgarno sequence remains bound.
Riboswitch aptamers are highly conserved and well annotated in the RFam database . Unfortunately, the expression platform sequences are poorly conserved and are not generally annotated. Their hairpin topology is their main conserved feature, along with either a trailing poly-U pause site in a terminator (see Fig. 1) or a Shine-Dalgarno ribosomal binding site in a sequesterer. We choose to study a set of TPP (thiamine pyrophosphate, vitamin B) riboswitches whose expression platforms have been independently annotated . Out of the 135 annotated riboswitches, 73 are classified as sequester-type, 52 as terminator-type, 9 as both terminator and sequesterer, and one as neither. We choose to examine only those expression platforms with a definite classification, and we include an additional four nucleotides on each end to provide genomic context for our folding studies. All sequences studied fold to a hairpin as their MFE structure, as annotated. Some statistical properties of the resulting sequences are shown in table 1. Notice that on average the sequesterers are longer than terminators by 10 nucleotides, and also that their lengths are highly variable. To explore the dependence of folding efficiency on length, we constructed an artificial family of extended terminators by adding 5 nucleotide pairs randomly to each terminator, drawing the additional pairs from the pairs already present in the original terminators. We then randomly shuffle the pairs while preserving the topology of bulges and loop in the minimum free energy structure.
For Minimum Free Energy calculations of secondary structures, RNAfold  is used with the ViennaRNA 1.4 energy model  at temperature C. Note that we will not be able to capture the influence of tertiary contacts or of pseudoknots within the confines of this model. The default energy parameters for M NaCl are used despite cellular conditions being mM Na and mM Mg, since the energetics of the secondary structures in these conditions are similar . That is, 1M NaCl has approximately equivalent ionic strength to real cellular conditions, since the doubly-charged Mg is far more effective at compensating the phosphate backbone of nucleic acids than the singly charged Na. A suitable validated energy model for true cellular conditions is not available .
Folding is simulated at the level of secondary structure by kinetic Monte Carlo using the ViennaRNA program kinfold . The rate for transitions in kinfold is given in arbitrary units that require calibration to real time. As an estimate for the kinfold timescale, is taken to be about , from the calibration of Liu and Ou-Yang . To simulate folding during transcriptional growth, additional nucleotides are added to the end of the chain at regular time intervals. Typical bacterial transcription rates, , range from 20-80 nt/s, with 50 nt/s taken as standard. The possibility that a significant transcriptional pause might take place inside the antiterminator or the terminator is neglected though this could be important in some specific cases . Simulating at the level of secondary structure is more efficient, though less realistic, than applying molecular dynamics to coarse-grained continuum models [13, 14].
Because we focus on the competition between folding rates and transcription rates, the chief parameter governing the simulation is the product , representing Monte Carlo step performed between each nucleotide addition. Our standard value is MC steps/nt transcribed. Because of the range of transcription rates , as well as the uncertainty concerning the timescale calibration , we carry out simulations over a range of values of . Our primary result, the high efficiency of terminator folding relative to sequesterers, holds over several orders of magnitude in .
2.3 Statistical analysis of distributions
Results of this study will be presented in the form of distributions over repeated kinetic folding attempts for many individual sequences. For example, Fig. 2a displays a histogram showing the relative frequency with which terminators reach a given fraction of their MFE structures under our standard growth conditions. According to this figure, in our complete population of transcriptional terminators, each folded 100 times, 100% of the expected nt pairs are obtained in the majority of trials, and more than 60% of the expected pairs are obtained in all the trials. However, 0% of the expected pairs are obtained in a non-negligible subset of trials for Shine-Dalgarno sequesterers. It is not known what fraction of the MFE hairpin structure is required for successful termination, but one can set a threshold anywhere between 1% and 80% with almost no impact on the fractions of terminator folds that lie above and below this threshold, because the distribution nearly vanishes over this range. We use the fraction of expected pairs as a metric for termination, rather than the free energy of the folded structure, because deep metastable traps are precisely what is to be avoided for efficient folding and termination.
For a given sequence , its folding efficiency is defined as the fraction of attempted folds that form a viable hairpin. Define the viability of a fold as a function of the MFE structure fraction through the equation , where is the step function and is the viability threshold. Thus averaged over independent folding attempts. If sequence has probability distribution for folding to MFE fraction , its efficiency can be evaluated as . This is precisely the fraction of attempted folds whose folding fraction lies above the threshold . As discussed above, considerable freedom exists in the choice of the threshold , but is taken as a reasonably conservative limit because a high fraction of the MFE structure is presumably required for actual functionality of the terminator. Hence we define a structure with a fraction of its MFE base pairs below as “misfolded”.
3 Terminators vs. sequesterers
Riboswitch intrinsic terminator hairpins can be expected to fold with greater efficiencies than sequesterers because terminators act at the time of transcription. The constraint that terminators must perform within the transcription time means that terminators must fold quickly. Meanwhile, sequesterers act at the time of translation, effectively relaxing this constraint. Here the folding efficiencies of the sequesterers are compared to transcriptional terminator hairpins across a family of riboswitches. TPP-binding riboswitches are chosen, because of the availability of annotated terminator and sequesterer riboswitches .
According to Fig. 2, the terminator hairpins do indeed fold quite efficiently, with all but one of Rodionov’s annotated terminators having a folding efficiency greater than 80% under our standard growth conditions. However, the sequesterers fold with substantially lower efficiency. Table 2 enumerates the numbers of hairpins of each type folding efficiently () and inefficiently (). The -value for the null hypothesis (i.e. the assertion that the proportion of efficient sequesterers equals the proportion of efficient terminators) is (Fisher exact test), providing strong support for the claim that terminator-type riboswitch hairpins fold with higher efficiency during transcription than do sequesterer-types. Figure 3 shows the proportion of efficiently folding terminators and sequesterers for a range of timescales , allowing for and to vary over orders of magnitude without affecting the conclusion that terminators fold with higher efficiency than sequesterers.
What explains the relative folding efficiencies of terminators and sequesterers? As outlined in Table 1, some gross features of rho-independent terminator sequences differ from Shine-Dalgarno sequesterers. Perhaps the primary difference between them lies in their length distributions. TPP rho-independent terminators are 38 nucleotides long on average, while the Shine-Dalgarno sequesterers average 48 nucleotides in length. Indeed, longer sequences will tend to possess more and deeper metastable states that would compete with the MFE state. The length dependence of folding efficiency was tested by duplicating 5 base pairs in each TPP rho-independent terminator in order to mimic the lengths of sequesterers. The results shown in Fig. 2c, indicate that while there is some effect detrimental to efficient folding in the longer hairpins, this length difference alone does not account for the difference in folding efficiencies. Furthermore, while we note a weak correlation of decreasing efficiency with increasing terminator sequence length, neither the extended terminators nor the sequesterers exhibit any significant correlation between efficiency and length.
A second difference lies in the nucleotides frequencies (Table 1) and their distribution among five regions of the hairpins as illustrated in Fig. 4. Here region 3 represents the hairpin loop, with regions 1 and 2 lying along the side of the hairpin and regions 4 and 5 along the side. Terminators exhibit an excess of U in region 5 associated with the beginning of the poly-U pause site, and a weak corresponding enhancement of complementary A nucleotides in region 1. Sequesterers, in contrast, exhibit an enhancement of A and G in regions 4 and 5 corresponding to the Shine-Dalgarno consensus sequence of AGGAGG, and a corresponding enhancement of complementary C and U in regions 1 and 2. Another difference is the excess U in the loop region 3 of terminators that can be attributed to an internal pause site allowing time for aptamer and antiterminator folding  prior to completion of terminator transcription. The enhancement of the specifically-binding C and its non-complementary U in regions 1 and 2 of the sequesterer might have been expected to aid in folding efficiency, yet still the terminators, dominated in most regions by the promiscuously-binding U and G, manage to fold with relatively high efficiency. However, the weak enhancement of specifically-binding A in region 1 of the terminator, complementary to the poly-U pause site in region 5, may play some small role in terminator folding efficiency.
Overall, neither the differences in sequence length nor in nucleotide content appear capable of explaining the difference in folding efficiency between terminators and sequesterers. The most likely explanation available is simply that the folding efficiencies differ as a result of natural selection. Selection pressure apparently favors relatively short hairpins and disfavors sequences containing metastable traps in terminators that must fold under the constraint of short transcription time. This selection pressure is reduced or absent in the case of Shine-Dalgarno sequesterers. Indeed, as evidenced in Fig. 3, many sequesterers fail to fold efficiently even on very long time scales. Perhaps sequesters function in an ensemble of metastable structures, provided the Shine-Dalgarno sequence remains bound, while in contrast transcriptional terminators require very specific structures in order to function [16, 17, 18].
4 Specific examples
Here we analyze specific cases of poorly folding terminators and sequesterers. The most poorly folding terminator is the ThiD terminator of Thermoanaerobacter tengcongensis (Tte) which folds with efficiency =0.18 at the fastest transcription rate (smallest timescale) =250 MC steps/nt transcribed. Similarly, the ThiC riboswitch of Sinorhizobium meliloti (Sm) stands out for having the lowest observed efficiency () at the slowest transcription rate (largest timescale), =512000 MC steps/nt transcribed. Two alternate folds of each sequence are illustrated in Fig. 5. The most common specific fold of Tte-ThiD (Fig. 5a), which occurs in 35% of folding attempts, shares no common pairs with the MFE structure (Fig. 5b), which occurs in 6% of folding attempts. Likewise, for Sm-ThiC, the most common specific fold (Fig. 5c) occurs in 10% of attempts and shares no common pairs with the MFE structure (Fig. 5d), which occurs in 3% of attempts.
The misfolded terminator (Fig. 5a) lacks the necessary hairpin preceding the poly- pause site that terminates transcription. It is notable that the Shine-Dalgarno sequence remains sequestered in the misfolded sequesterer, suggesting that perhaps the function is preserved. This might explain how low folding efficiency sequesterers could remain functional even while misfolded on the time scale of translation initiation.
At the largest timescale, =512000, the efficiency of Tte-ThiD rises to 91%. To understand the high efficiency of Tte-ThiD relative to Sm-ThiC at long times, we compare their free energy landscapes in Fig. 6. The misfold of Tte-ThiD is relatively weakly bound (only -2.3 kcal/mol) with a barrier of 5.8 kcal/mol separating the misfold from the MFE structure. This barrier has high entropy, as it corresponds to complete unfolding followed by almost any single base pairing yielding a net energy for the saddle state of +3.5 kcal/mol. This high barrier entropy reduces the effective free energy barrier . Furthermore, as Tte is a thermophile, relatively high thermal energy is available to aid in escape from metastable traps. In contrast, Sm-ThiC is relatively strongly bound (-11.6 kcal/mol). The saddle state separating the misfold from the MFE is only partially unbound, at energy -4.0, but the net barrier of 7.6 kcal is nearly 2 kcal/mol (about ) larger than for Tte-ThiD and also is relatively low entropy.
The common misfolds of both Tte-ThiD and Sm-ThiC share a common feature - their paired nucleotides lie to the (earlier transcribed) side of the pairs comprising the MFE structures. That is, they contain structure that can form before the sequence is fully transcribed. To see how widespread this mechanism is, we examined the 16 sequesterers that fold with efficiency less that 0.5 at our standard transcription rate, =4000. In all but one case the most common misfold places the hairpin loop to the side of its location in the MFE structure. That is, they involve structures that can form earlier in time than the MFE. The sole exception, is a very short sequence for which a few missing pairs reduce the matched fraction below 0.7 even while the sequence lies in the basin of the MFE structure.
In conclusion, this study addressed whether riboswitch transcriptional terminators fold with unusually high efficiency, indicating selection for reliability of folding. It was shown that transcriptional terminators in TPP riboswitches are unusually easy to fold during transcription in comparison with Shine-Dalgarno sequesterers, resulting in a strongly significant -value for the null hypothesis. Experimental validation of this prediction might be feasible using optical tweezer studies .
Detailed examination of a specific terminator (Tte-ThiD and sequesterer (Sm-ThiC) which fold with relatively low efficiency, reveals a generic mechanism for misfolding, namely trapping into minimum free energy conformations of partially transcribed sequences that become potentially long-lived metastable states of the fully transcribed sequence. We also suggest that sequesterers may be more tolerant of misfolds than terminators, provided that the Shine-Dalgarno sequence remains bound in the misfolded structure.
The authors thank Jay Kadane, Maumita Mandal and Jon Widom for useful discussions. We acknowledge financial support of this research by the DSF Charitable Foundation.
The authors declare no conflict of interest
- Tucker and Breaker  Tucker, B.J.; Breaker, R.R. Riboswitches as Versatile Gene Control Elements. Curr. Opin. Struct. Biol. 2005, 15, 342–348.
- Pan and Sosnick  Pan, T.; Sosnick, T. RNA folding during transcription. Ann. Rev. Biophys. and Biomol. Struct. 2006, 35, 161–75.
- Proctor and Meyer  Proctor, J.R.; Meyer, I.M. CoFold: An RNA secondary structure prediction method that takes co-transcriptional folding into account. Nucleic Acids Research 2013, 41, e102.
- Wickiser et al.  Wickiser, J.; Winkler, W.; Breaker, R.; Crothers, D. The Speed of RNA Transcription and Metabolite Binding Kinetics Operate an FMN Riboswitch. Molecular Cell 2005, 18, 49–60.
- Sauerwine and Widom  Sauerwine, B.; Widom, M. Arrhenius lifetimes of RNA structures from free energy landscapes. J. Stat. Phys 2011, 142, 1337–1352.
- Griffiths-Jones et al.  Griffiths-Jones, S.; Moxon, S.; Marshall, M.; Khanna, A.; Eddy, S.R.; Bateman, A. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005, 33, D121–D124.
- Rodionov et al.  Rodionov, D.A.; Vitreschak, A.G.; Mironov, A.A.; Gelfand, M.S. Comparative genomics of thiamin biosynthesis in procaryotes: new genes and regulatory mechanisms. J. Biol. Chem 2002, 277, 48949–59.
- Hofacker et al.  Hofacker, I.L.; Fontana, W.; Stadler, P.F.; Bonhoeffer, S.; Tacker, M.; Schuster, P. Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 1994, 125, 167–188.
- Mathews et al.  Mathews, D.H.; Sabina, J.; Zuker, M.; Turner, D.H. Expanded Sequence Dependence of Thermodynamic Parameters Improves Prediction of RNA Secondary Structure. J. Mol. Biol. 1999, 288, 911–940.
- Tan and Chen  Tan, Z.J.; Chen, S.J. RNA Helix Stability in Mixed Na+/Mg2+ Solution. Biophysical Journal 2007, 92, 3615–3632.
- Liu and Ou-Yang  Liu, F.; Ou-Yang, Z.C. Monte Carlo Simulation for Single RNA Unfolding by Force. Biophysical Journal 2005, 88, 76–84.
- Flamm et al.  Flamm, C.; Fontana, W.; Hofacker, I.L.; Schuster, P. RNA Folding at Elementary Step Resolution. RNA 2000, 6, 325–338.
- Ding et al.  Ding, F.; Sharma, S.; Chalasani, P.; Demidov, V.V.; Broude, N.E.; Dokholyan, N.V. Ab initio RNA folding by discrete molecular dynamics: from structure prediction to folding mechanisms. RNA 2008, 14, 1164–73.
- Denesyuk and Thirumalai  Denesyuk, N.; Thirumalai, D. Coarse-grained model for predicting RNA folding thermodynamics. J. Phys. Chem. B 2013, 117, 4901–11.
- Crooks et al.  Crooks, G.E.; Hon, G.; Chandonia, J.M.; Brenner, S.E. WebLogo: A sequence logo generator. Genome Research 2004, 14, 1188–1190.
- Wilson and von Hippel  Wilson, K.S.; von Hippel, P.H. Transcription Termination at Intrinsic Terminators: the role of the RNA hairpin. PNAS 1995, 92, 8793–8797.
- Lesnik et al.  Lesnik, E.A.; Sampath, R.; Levene, H.B.; Henderson, T.J.; McNeil, J.A.; Ecker, D.J. Prediction of rho-independent transcriptional terminators in Escherichia coli. Nucleic Acids Research 2001, 29, 3583–94.
- Nudler and Gottesman  Nudler, E.; Gottesman, M.E. Transcription termination and anti-trmination in E. coli. Genes to Cells 2002, 7, 755–68.
- Frieda and Block  Frieda, K.L.; Block, S.M. Direct Observation of Cotranscriptional Folding in an Adenine Riboswitch. Science 2012, 338, 397–400. © 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).