Article, Discoveries

Reciprocal Nucleopeptides as the Ancestral Darwinian Self-Replicator
Eleanor F. Banwell, Bernard Piette, Anne Taormina and Jonathan Heddle
 RIKEN, Hirosawa 2-1, Wako-shi, Saitama, 351-0198 Japan
 Department for Mathematical Sciences, Durham University, South Road, Durham DH1 3LE, United Kingdom
 Bionanoscience and Biochemistry Laboratory, Malopolska Centre of Biotechnology, Jagiellonian University 30-387, Krakow, Poland.
 These authors contributed equally
 Corresponding author: Heddle, J.G. (

This is a draft version of the manuscript. The final version was published in Molecular Biology and Evolution, msx292 (2017) and is available in open access from

Even the simplest organisms are too complex to have spontaneously arisen fully-formed, yet precursors to first life must have emerged ab initio from their environment. A watershed event was the appearance of the first entity capable of evolution: the Initial Darwinian Ancestor. Here we suggest that nucleopeptide reciprocal replicators could have carried out this important role and contend that this is the simplest way to explain extant replication systems in a mathematically consistent way. We propose short nucleic-acid templates on which amino-acylated adapters assembled. Spatial localization drives peptide ligation from activated precursors to generate phosphodiester-bond-catalytic peptides. Comprising autocatalytic protein and nucleic acid sequences, this dynamical system links and unifies several previous hypotheses and provides a plausible model for the emergence of DNA and the operational code.

Keywords: Initial Darwinian Ancestor; abiogenesis; RNA world; protein world; nucleopeptide replicator; reciprocal replicator; polymerase; ribosome; evolution; early earth; Hypercycle.

In contrast to our good understanding of more recent evolution, we still lack a coherent and robust theory that adequately explains the initial appearance of life on Earth (abiogenesis). In order to be complete, an abiogenic theory must describe a path from simple molecules to the Last Universal Common Ancestor (LUCA), requiring only a gradual increase in complexity.

The watershed event in abiogenesis was the emergence of the Initial Darwinian Ancestor (IDA): the first self-replicator (ignoring dead ends) and ancestral to all life on Earth (Yarus 2011). Following the insights of von Neumann, who proposed the kinematic model of self-replication (Kemeny 1955), the necessary features of such a replicator are: (i) storage of the information for how to build a replicator; (ii) a processor to interpret that information and select parts; and (iii) an instance of the replicator.

In order to be viable, any proposal for the IDA’s structure must fit with spontaneous emergence from prebiotic geochemistry and principles of self-replication. Currently, the dominant abiogenesis theory is the “RNA world”, which posits that the IDA was a self-replicating ribozyme, i.e. an RNA-dependent RNA polymerase (Cech 2012). Although popular, this theory has problems (Kurland 2010). For example, while it is plausible that molecules with the necessary replication characteristics can exist, length requirements seem to make their spontaneous emergence from the primordial milieu unlikely. Nor does the RNA world explain the appearance of the operational code (Noller 2012; Robertson and Joyce 2012). Furthermore, it invokes three exchanges of function between RNA and other molecules to explain the coupling of polynucleotide and protein biosynthesis, namely transfer of information storage capability to DNA and polymerase activity to protein as well as gain of peptide synthesis ability. This seems an implausible situation in which no extant molecule continues in the role it initially held. Others have posited peptide and nucleopeptide worlds as solutions, but to the best of our knowledge, no single theory has emerged that parsimoniously answers the biggest questions.

Here we build on several foregoing concepts to propose an alternative theory based around a nucleopeptide reciprocal replicator that uses its polynucleotide and peptide components according to their strengths, thus avoiding the need to explain later coupling. We advocate a view of the IDA as a dynamical system, i.e. a system of equations describing the changes that occur over time in the self-replicator presented here, and we demonstrate that such an entity is both mathematically consistent and complies with all the logical requirements for life. While necessarily wide in view, we hope that this work will provide a useful framework for further investigation of this fundamental question.

Model and Results
Solving the chicken and egg problem
Given that any IDA must have been able to replicate in order to evolve, extant cellular replication machinery is an obvious source of clues to its identity. Common ancestry means that features shared by all life were part of LUCA. By examining the common replication components present in LUCA, and then extrapolating further back to their simplest form, it is possible to reach a pre-LUCA, irreducibly complex, core replicator (Figure 1).

Figure 1: Replication Schemes. (a) This simplified cellular replication schematic is common to all life today and likely reflects the ancestral form present in LUCA. Shading by molecule type (purple for nucleic acid and orange for protein) reveals a reciprocal nucleopeptide replicator. Although the ribosome is a large nucleoprotein complex, the catalytic centre has been shown to be a ribozyme (Moore and Steitz 2003) and so it is shaded purple in this scheme. (b) Comparison of the method of action of the extant ribosome with the proposed primordial analogue (components are shaded like for like). Today, tRNA molecules (mid purple) loaded with amino acids (orange) bind the mRNA (dark purple) in the ribosome (light purple), which co-ordinates and catalyses the peptidyl-transferase reaction. Although the present day modus operandi is regulated via far more complex interactions than the primordial version, the two schemes are fundamentally similar. Mixed nucleic acid structures, one performing a dual function as primordial mRNA and primordial ribosome (p-Rib) and a second functioning as a primordial tRNA (p-tRNA), provide a system wherein the former structure templates amino acid-loaded molecules of the latter.

We see that in all cells, the required functions of a replicator are not carried out by a single molecule or even a single class of molecules; rather, they are performed variously by nucleic acids (DNA, RNA) and proteins. When viewed by molecular class, the replicator has two components and is reciprocal in nature: polynucleotides rely on proteins for their polymerisation and vice versa. The question of which arose first is a chicken and egg conundrum that has dogged the field since the replication mechanisms were first elucidated (Giri and Jain 2012). In this work we suggest that, consistent with common ancestry and in contrast with the RNA world theory, the earliest replicator was a two-component rather than a one-component system, composed of peptide and nucleic acids.

Assumptions of the model
We postulate that, in a nucleopeptide reciprocal replicator, the use of each component according to its strengths could deliver a viable IDA more compatible with evolution to LUCA replication machinery. Although seemingly more complex than an individual replicating molecule, the resulting unified abiogenesis theory answers many hard questions and is ultimately more parsimonious. In constructing our model, we make the following assumptions:

  1. The existence of random sequences of short strands of mixed nucleic acids (XNA) likely consisting of ribonucleotides, deoxyribonucleotides and possibly other building blocks, as well as the existence of random amino acids and short peptides produced abiotically.

    For this first assumption we have supposed a pool of interacting amino acids, nucleotides and related small molecules as well as a supply of metal ions, other inorganic catalysts and energy. A number of potential early earth conditions and reaction pathways resulting in these outcomes have been proposed, including the formamide reaction (Saladino et al 2012a) and cyanosulfidic chemistries (Patel et al 2015). Pools of pure molecules are unlikely; instead, mixtures would likely have comprised standard and non-standard amino acids as well as XNAs with mixed backbone architectures (Trevino et al 2011; Pinheiro et al 2012). Such conditions would be conducive to the occasional spontaneous covalent attachment of nucleotides to each other to form longer polymer chains (Da Silva et al 2015).

  2. The existence of abiotically aminoacylated short XNA strands (primordial tRNAs (p-tRNAs))

    The second assumption is potentially troubling as amino acid activation is slow and thermodynamically unfavourable. However, amino acylation has been investigated in some detail and has been shown to be possible abiotically including, in some cases, the abiotic production of activated amino acids (Illangasekare et al 1995; Leman et al 2004; Giel-Pietraszuk and Barciszewski 2006; Lehmann et al 2007; Turk et al 2010; Liu et al 2014). Taken together these data suggest that multiple small amino-acylated tRNA-like primordial XNAs could have arisen. Though likely being XNA in nature, we refer to them as p-tRNA, reflecting their function. A similar nomenclature applies to p-Rib and p-mRNA.

  3. Conditions that allow a codon/anti-codon interaction between two or more charged p-tRNA for sufficient time and appropriate geometry to allow peptide bond formation, i.e. the functionality of a primordial ribosome (p-Rib)

    Our proposed p-Rib is an extreme simplification of the functionality of both the present day ribosome and mRNA (Figure 1). Initially, the p-Rib need only have been a (close to) linear assembly template for the p-tRNAs to facilitate the peptidyl transferase reaction through an increase in local concentration. This mechanism is simple enough to emerge spontaneously and matches exactly the fundamental action of the extant ribosome (Figure 1b). The idea that a p-Rib may have an internal template rather than separate mRNA molecules, and that an RNA strand could act as a way to bring charged tRNAs together, has previously been suggested (Schimmel and Henderson 1994; Wolf and Koonin 2007; Morgens 2013) and is known as an “entropy trap” (Sievers et al 2004; Ruiz-Mirazo 2014). The concept has been demonstrated to be experimentally viable (Tamura 2003), although in that case it is the primordial ribosomal rRNA strand itself that provides one of the two reacting amino acids.

    Figure 2: Models of primitive polymerization reactions. An XNA strand can function like a primordial ribosome (p-Rib) whereby one strand (+ strand) can template the production of a primordial polymerase (p-Pol) as indicated by the solid arrow. The action of this p-Pol is represented by the double-headed dotted arrow whereby it acts on the p-Rib (+ strand) to catalyse synthesis of the complementary sequence (- strand) and also on the - strand to produce more of the + strand.

    A functional operational system requires preferential charging of particular p-tRNAs to specific amino acids. Although there is evidence for such relationships in the stereochemical theory (Woese 1965; Yarus et al 2009), so far unequivocal proof has been elusive (Yarus et al 2005b; Koonin and Novozhilov 2009). However, there is sufficient evidence to suggest at least a separation along grounds of hydrophobicity and charge using just a two-base codon (Knight and Landweber 2000; Biro et al 2003; Rodin et al 2011). Furthermore only a reduced set of amino acids (Angyan et al 2014) - possibly as few as four (Ikehara 2002) - need to have been provided in this way. The “statistical protein” hypothesis proposes that such a weak separation may have been sufficient to produce populations of active peptides (Ikehara 2005; Vetsigian et al 2006). Such “primordial polymerases” (p-Pol) need only have been small (see below) and spontaneous emergence of a template coding loosely for such a sequence seems plausible. The failure rate of such syntheses would be high but a p-Rib using the outlined primordial operational code to produce statistical p-Pol peptides could have been accurate enough to ensure its own survival.

  4. The viability of a very short peptide sequence to function as an RNA-dependent RNA polymerase

    Templated ligation is often proposed as a primordial self-replication mechanism, particularly for primitive replication of nucleic acid in RNA world type scenarios. However, such scenarios are associated with a number of problems, as mentioned earlier. In addition, extant RNA/DNA synthesis proceeds via terminal elongation (Paul and Joyce 2004; Vidonne and Philip 2009). To be consistent with the mechanism present in LUCA and pre-LUCA, the p-Pol should, preferably, have used a similar process.

    During templated ligation, a parent molecule binds and ligates short substrates that must then dissociate to allow further access, but the product has greater binding affinity than the substrates and dissociation is slow. This product inhibition results in parabolic growth and limits the usefulness of templated ligation for replication (Issac and Chmielewski 2002). Conversely, in 1D sliding (or more accurately jumping), the catalyst may dock anywhere along a linear substrate and then diffuse by “hops” randomly in either direction until it reaches the reaction site; a successful ligation reaction has little impact on binding affinity and leaves the catalyst proximal to the next site. For simplicity our model assumes a single binding event between p-Pol and p-Rib followed by multiple polymerization events. A p-Pol proceeding via 1D sliding could catalyze phosphodiester bond formation between nucleotides bound by Watson-Crick base-pairing to a complementary XNA strand. Because p-Pol activity would be independent of substrate length, a relatively small catalyst could have acted on XNAs of considerable size. From inspection of present day polymerases, such a peptide may have included sequences such as DxDGD and/or GDD, known to be conserved in their active sites and consisting of the amino acids thought to be amongst the very earliest in life (Iyer et al 2003; Koonin 1991).
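The contrast between product-inhibited templated ligation and an uninhibited catalyst can be made concrete with a standard idealisation. The sketch below is not taken from the model in this article: the symbols $x$ (template concentration), $x_0$ (its initial value) and $k$ (an effective rate constant) are assumptions introduced here for illustration only. Under strong product inhibition the growth rate scales with the square root of the template concentration, which integrates to a parabola, whereas an uninhibited autocatalyst grows exponentially.

```latex
% Idealised growth laws; x, x_0 and k are hypothetical symbols.
\begin{align}
  \text{product-inhibited (parabolic):}\quad
    \frac{dx}{dt} &= k\sqrt{x}
    &&\Longrightarrow\quad x(t) = \Bigl(\sqrt{x_0} + \tfrac{1}{2}kt\Bigr)^{2}, \\
  \text{uninhibited (exponential):}\quad
    \frac{dx}{dt} &= kx
    &&\Longrightarrow\quad x(t) = x_0\,e^{kt}.
\end{align}
```

The first law grows only quadratically in time, which is why a mechanism that avoids product inhibition is attractive for a replicator.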

    In our simple system any such p-Pol must be very short to have any realistic chance of being produced by the primitive components described. We must therefore ask if there is evidence that small peptides (e.g. fewer than 11 amino acids) can have such a catalytic activity. Catalytic activity in general has been demonstrated for molecules as small as dipeptides (Kochavi et al 1997). For polymerase activity in particular, it is known that randomly produced tripeptides can bind tightly and specifically to nucleotides (Schneider et al 2000; McCleskey et al 2003). We suggest that a small peptide could arise with the ability to bind divalent metal ions, p-Rib and incoming nucleotides. It is interesting to note that small peptides can assemble into large and complex structures (Bromley et al 2008; Fletcher et al 2013) with potentially sophisticated functionality: di- and tripeptides can self-assemble into larger nanotubes and, intriguingly, it has even been suggested that these structures could have acted as primitive RNA polymerases (Carny and Gazit 2005).

    In summary, the essence of the model is that on geological timescales, short linear polynucleotides may have been sufficient to template similar base-pairing interactions to those seen in the modern ribosome with small amino-acylated adapters. Given that the majority of ribosome activity stems from accurate substrate positioning, such templating could be sufficient to catalyze peptide bond formation and to deliver phosphodiester-bond-catalytic peptides. As backbone ligation reactions are unrelated to polynucleotide sequence, these generated primordial enzymes could have acted on a large subset of the available nucleic acid substrates, in turn producing more polynucleotide templates and resulting in an autocatalytic system.

Mathematical Model

The IDA described above is attractive both for its simplicity and continuity with the existing mixed (protein/nucleic acid) replicator system in extant cells. However, the question remains as to whether such a system is mathematically consistent, could avoid collapse and instead become self-sustaining. The number of parameters and variables needed to analyse the system in its full complexity is such that one is led to consider simplified models which nevertheless capture essential features of interest. Here we consider a simple model of RNA-protein self-replication.

  1. Constituents

    The main constituents of the simplest model of XNA-protein self-replication considered here (see also Figure 1b and Figure 2) are a pool of free nucleotides and amino acids, polypeptide chains - including a family of polymerases - and polynucleotide chains as well as primordial tRNAs (p-tRNA) loaded with single amino acids.

    We introduce some notations. Generically, we consider polymer chains made of types of building blocks labelled . In our models, the polymer chains are polypeptides and polynucleotides, and the building blocks are amino acids and codons respectively. With a slight abuse of language, we call the number of constituents (building blocks) of a polymer chain its length. So hereafter, ‘lengths’ are dimensionless. The order in which these constituents appear in any chain is biologically significant, and we encode this information in finite ordered sequences of arbitrary length denoted , whose elements label the building blocks forming the chains, in the order indicated in the sequences. Each element in the sequence is an integer in the set which refers to the type of building block occupying position in the chain. There are therefore sequences of length if the model allows types of building blocks. For instance, the sequence in a model with, say, types of building blocks (amino acids or codons), corresponds to a polymer chain of length 5 whose first component is a type 1 building block, the second component is a type 4 and so on. Given a sequence , we introduce subsequences (resp. ), , whose elements are the leftmost (resp. rightmost) elements of . In particular, , and . We write
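As a hedged illustration of the sequence bookkeeping described above, the following Python sketch represents a sequence as a tuple of integer building-block types, counts the sequences of a given length, and extracts leftmost and rightmost subsequences. The function names and the example values are assumptions introduced here for illustration; they are not the article's notation.

```python
from itertools import product


def count_sequences(n_types: int, length: int) -> int:
    """Number of ordered sequences of `length` blocks drawn from `n_types` types."""
    return n_types ** length


def all_sequences(n_types: int, length: int) -> list:
    """Enumerate every sequence as a tuple of integers in 1..n_types."""
    return list(product(range(1, n_types + 1), repeat=length))


def left(seq: tuple, k: int) -> tuple:
    """Subsequence made of the k leftmost elements of seq."""
    return seq[:k]


def right(seq: tuple, k: int) -> tuple:
    """Subsequence made of the k rightmost elements of seq."""
    return seq[len(seq) - k:]


if __name__ == "__main__":
    example = (2, 1, 3)  # hypothetical chain: type 2, then type 1, then type 3
    print(count_sequences(4, 3), left(example, 2), right(example, 2))
```

Enumerating sequences explicitly is only practical for short chains, since their number grows as a power of the chain length.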

    In what follows we sometimes refer to families of polymer chains differing only by their length and obtained by removing some rightmost building blocks from a chain of maximum length . Denoting by a polymer chain of length and sequence or subsequence , both having elements with , the family of polymer chains obtained from a chain of maximal length and sequence is given by .

    In the specific case of XNA/polynucleotide chains entering our model, we use and the sequences are generically labelled as . Their elements correspond to types of codons, and the complementary codon sequences in the sense of nucleic acids complementarity are . Therefore a large class of XNA strands of length and sequence are denoted by , and in particular, is a codon of type . Besides the generic sequences introduced above, a sequence denoted , together with its subsequences and for play a specific role: they correspond to polynucleotide chains that template the polymerisation of a family of primordial peptide polymerases (p-Pol) through a process described in the next subsection, see also Figure 3. Using to denote polypeptide chains, this family of polymerases derived from of maximal length , is . These polymerases are such that , with an amino acid . We use the notation for a generic polymerase in the family. Alongside these polymerases, generic polypeptide chains of length and sequence are labelled as . Proteins of length 1, , are single amino acids of type .

  2. RNA-Protein replication scenario

    The scenario relies on three types of mechanisms:

    • Mechanism (A): The spontaneous polymerisation of polynucleotide and polypeptide chains, assumed to occur at a very slow rate, and their depolymerisation through being cleaved in two anywhere along the chains at a rate independent of where the cut occurs.

    • Mechanism (B): The non-spontaneous polypeptide polymerisation occurring through a polynucleotide chain on which several p-tRNA molecules loaded with an amino acid dock and progressively build the polypeptide chain. More precisely, each codon of type of the polynucleotide chain binds with a p-tRNA, itself linked to an amino acid of type . Note that we assume the same number of types of codons and amino acids. This leads to a chain of amino acids matching the codon sequence of the polynucleotide chain. The process is illustrated in Figure 3 for a polypeptide chain of length and amino acid sequence .

    • Mechanism (C): The duplication of a polynucleotide chain , of length , as a two-step process. In the first step, a polypeptide polymerase , obtained by polymerisation via mechanism (B) using a polynucleotide , scans the polynucleotide chain to generate its complementary polynucleotide chain . This is shown in Figure 4. The resulting polynucleotide chain is then used to generate a copy of the original polynucleotide chain via the same mechanism (C).
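The templated peptide synthesis and the two-step duplication described in this list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the complement table `COMPLEMENT` is a hypothetical pairing of four codon types, codons are assumed to map one-to-one onto amino acid types with the same label, and strand orientation is ignored.

```python
# Hypothetical pairing between four codon types (an assumption for illustration).
COMPLEMENT = {1: 2, 2: 1, 3: 4, 4: 3}


def translate(codons):
    """Templated peptide synthesis: assume a codon of type i recruits,
    via its p-tRNA, an amino acid of type i."""
    return tuple(codons)


def complement(codons):
    """One polymerase pass: build the position-wise complementary chain."""
    return tuple(COMPLEMENT[c] for c in codons)


def duplicate(codons):
    """Two-step duplication: copy the template to its complement,
    then copy the complement back to the original sequence."""
    return complement(complement(codons))


if __name__ == "__main__":
    template = (1, 3, 4, 2)  # hypothetical codon sequence
    print(translate(template), complement(template), duplicate(template))
```

Because the complement map is its own inverse, two passes return the starting sequence, which is the property the duplication step relies on.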

    Figure 3: Mechanism (B): Polypeptide polymerisation in our model. The square boxes represent the codons of a polynucleotide chain (here, of length ) and the circles represent amino acids. The p-tRNA molecules are labelled .
    Figure 4: First phase of Mechanism (C): Polymerisation of the complementary polynucleotide chain catalysed by a primordial polymerase .

    The replicator crudely operates as follows:

    • Mechanism (A) provides a small pool of polymer chains; among them, one finds short strands of XNA with dual function (p-mRNA and p-Rib)

    • Mechanism (B) provides polypeptide chains, including the polymerases (p-Pol, called here), by using the XNA produced through Mechanism (A) and Mechanism (C)

    • The polymerases (p-Pol) are involved, through Mechanism (C), in the duplication of polynucleotides present in the environment, including the strands of XNA that participate in the very production of the p-Pol themselves

  3. Reactions driving the replication and physical parameters

    For simplicity, we consider the polymerisation of polypeptide chains and the duplication of polynucleotide chains as single reactions where the reaction rates take into account all sub-processes as well as failure rates.

    This leads to the following schematic reactions:


    where denotes primordial tRNA loaded with a single amino acid.

    The parameters for these reactions are (see the Supplementary Information for more details on the estimation of the parameter values):

    • : polymerisation rate of polynucleotide chains [Equation (1)]; we have estimated the catalysed XNA polymerisation rate to be .

    • : depolymerisation rate of polynucleotide chains (hydrolysis) [Equation (2)]; taken to be .

    • : polymerisation rate of polypeptide chains [Equation (3)]; we have estimated it to be .

    • : depolymerisation rate of polypeptide chains of length and sequence [Equation (4)]; we have estimated it to be in the range .

    • : polymerisation rate of a polypeptide of length from the corresponding polynucleotide chain [Equation (5)]. It is reasonable to assume that and we have estimated to be .

    • : the rate at which a polymerase attaches to a polynucleotide chain [Equation (6)] which we have estimated to be .

    • : the rate of attachment of a free polynucleotide to a polynucleotide chain attached to a p-Pol [Equation (6)]. We have estimated it to be .

    • : the rate at which a polymerase moves by one step on the polynucleotide [Equation (6)]. We have estimated it to be in the range to .

    We now argue that the three parameters and enter the dynamical system for the polymer concentrations in our model as two physical combinations denoted and that we describe below.

    First recall that we assume the existence of a pool of nucleotides, amino acids and p-tRNA. The amount of free nucleotides and amino acids is taken to be the difference between the total amount of these molecules and the total amount of the corresponding polymerised material, ensuring total conservation.
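The conservation rule stated above can be sketched as follows. This is a minimal illustration with hypothetical names: `total` is the total monomer concentration and `polymer_conc` maps each chain length to the summed concentration of all chains of that length. A chain of a given length locks up that many monomers.

```python
def free_monomer_concentration(total: float, polymer_conc: dict) -> float:
    """Free monomers = total monomers minus monomers locked in polymers.

    polymer_conc maps chain length (>= 2) to the summed concentration of all
    chains of that length; a chain of length l contains l monomers.
    """
    bound = sum(length * conc for length, conc in polymer_conc.items())
    free = total - bound
    if free < 0:
        raise ValueError("polymerised material exceeds the total pool")
    return free


if __name__ == "__main__":
    # Hypothetical numbers, in arbitrary concentration units.
    print(free_monomer_concentration(1.0, {2: 0.1, 3: 0.05}))
```

Computing the free pool from the polymer concentrations, rather than tracking it as an independent variable, keeps the total amount of material constant by construction.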

    We denote the concentration of polypeptide and polynucleotide chains respectively by and , all expressed in . In particular, and are the concentrations of each type of free amino acids and nucleotides respectively, and we assume, for simplicity, that all types of amino acids/codons are equally available.

    We also assume that the amount of loaded p-tRNA, , remains proportional to the amount of free amino acids and that the concentration of p-tRNA is larger than so that most amino acids are loaded on a p-tRNA. With these conventions, one has


    Total reaction rate of polynucleotide polymerisation

    If a complex reaction is the result of one event at rate , and other, identical, events at rate , the average time to complete the reaction is the sum of the average times for each event. Hence the reaction rate is given by
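The statement above can be written out explicitly. In this sketch the symbols are hypothetical, chosen here for illustration: $k_1$ is the rate of the single event, $k_2$ the rate of each of the $m$ identical events, $\langle t \rangle$ the mean completion time and $k_{\mathrm{tot}}$ the resulting overall rate.

```latex
% Mean times of sequential events add; the overall rate is the inverse of the total.
\begin{equation}
  \langle t \rangle = \frac{1}{k_1} + \frac{m}{k_2},
  \qquad
  k_{\mathrm{tot}} = \frac{1}{\langle t \rangle}
                   = \frac{k_1 k_2}{k_2 + m\,k_1}.
\end{equation}
```

When $m$ is large the second term dominates and $k_{\mathrm{tot}} \approx k_2/m$, so the overall rate falls roughly inversely with the number of repeated steps.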

    One such complex reaction in our model is the polymerisation of a polynucleotide chain of length , say, from its complementary chain (second phase of Mechanism (C)). Polymerases are characterised by the polymerising efficiency which, we assume, increases with , up to . The first step in polymerisation requires a polymerase to attach itself to the template polynucleotide. This is only possible if the template polynucleotide has a minimum length, which we assume to be . In the following, we assume that polymerases can polymerise polynucleotide chains of any length greater than or equal to . The corresponding reaction rate is given by for a polymerase of length .

    The free nucleotides must then attach themselves to the polynucleotide-polymerase complex and the polymerase must move one step along the polynucleotide. The rate for each of these steps is

    and hence, the rate of polymerisation for a polynucleotide of length and polymerase of length is . However, it is assumed that polymerases of several lengths are available and therefore, the total rate is given by

    where it is understood that is the lower bound length for polymerase activity and is a quality factor given by

    Indeed, we expect long polymerases to be more efficient, so is taken to increase with in the range , while polymerases of length have the same level of activity as those with length i.e. .

    To avoid proliferation of parameters in our simulations, we have taken , where is the maximal polynucleotide chain’s length.

    Binding probability of a polynucleotide and a polymerase of length

    First note that it takes times longer to synthesise a polypeptide chain of length from its corresponding polynucleotide chain than it takes for one amino acid to bind itself to the polynucleotide. The rate is thus given by .

    We now offer some considerations on depolymerisation. We assume that if a polymer depolymerises, it does so by (potentially consecutive) cleavings. In the first step, can cleave in different positions, resulting in two smaller chains with and . This is the origin of the factor in the terms describing the depolymerisation of polymer chains in the dynamical systems equations presented in the next subsection.
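The counting of cleavage positions can be checked with a short sketch. The function name and the example chain are hypothetical; the only assumption is that a single cut falls between two adjacent building blocks, so a chain with a given number of blocks has one fewer possible cut sites.

```python
def cleavage_products(seq: tuple) -> list:
    """All (left, right) fragment pairs produced by a single cut.

    A cut falls between adjacent building blocks, so a chain of length l
    has l - 1 cut sites and the two fragment lengths always sum to l.
    """
    return [(seq[:i], seq[i:]) for i in range(1, len(seq))]


if __name__ == "__main__":
    for left_part, right_part in cleavage_products((1, 4, 2, 3)):
        print(left_part, right_part)
```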

    The concentration variations resulting from such depolymerisations must be carefully evaluated. A polymer of length and sequence , where stands for any of , or , can be obtained by cleaving a polymer of length and sequence where is a sequence of length . Similarly it can be obtained by cleaving of sequence where is also of length . If the rate of cleaving, , is assumed to be independent of the polymer length, and since there are different sequences and , where is the number of amino acid or codon types, the rate of concentration variation of polymers of length resulting from the depolymerisation of longer polymers is


    Recall that we use the same notation for the concentration of a polymer of sequence and length and the polymer itself, namely , and is supposed to be set to or in our model. As already stressed, we assume polymers have at most length . Finally, when the concentrations and are equal, [Equation (8)] can be rewritten as

    The depolymerisation of polymerase requires special treatment. When depolymerises, it generates a polymerase with . On the other hand, any can be obtained through depolymerisation of one of types of polymers of length , one of which being and the remaining being of type with , or with any of the types of amino acids. More generally, they can be obtained from and polymers of type where and with , or for any type . The same is true for the corresponding polynucleotide chains.

    When the polymerase is bound to a polynucleotide, it becomes more stable either through induced folding of a (partially) unfolded sequence, or through the inaccessibility of bound portions, or both. We thus define as the depolymerisation reduction coefficient for the bound polymerase of length , with that reduction coefficient being 1 when no depolymerisation occurs at all. We estimate it to be

    with a parameter controlling how much of the polymerase is stabilised. The term can be interpreted as a Boltzmann factor with a free energy expressed in units of . The hydrogen bond binding energy between RNA and a polypeptide is approximately (Dixit et al 1999), so assuming that the number of such hydrogen bonds between the polymerase and the polynucleotide is , one has .

    The binding rate of a polymerase to a polynucleotide of length and sequence is where is the total number of polynucleotides of length . The probability that a polymerase of length binds to a polynucleotide of length is therefore given by

    The total time the polymerase remains bound to a polynucleotide of length is estimated to be . Therefore the probability for a polymerase to be bound is given by the average binding time divided by the sum of the average binding time and the average time needed to bind:
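The verbal definition above can be written as a single ratio. The symbols here are hypothetical, introduced only for this sketch: $\tau_{\mathrm{bound}}$ is the average time the polymerase stays bound and $\tau_{\mathrm{bind}}$ is the average time it needs to bind.

```latex
% Fraction of time spent bound in a two-state (bound / unbound) cycle.
\begin{equation}
  P_{\mathrm{bound}}
    = \frac{\tau_{\mathrm{bound}}}{\tau_{\mathrm{bound}} + \tau_{\mathrm{bind}}},
  \qquad 0 \le P_{\mathrm{bound}} \le 1 .
\end{equation}
```

The ratio tends to 1 when binding is fast compared with the bound lifetime, and to 0 in the opposite limit.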

    As a result the polymerase depolymerisation rate will be

  4. Equations

    For any chain of length , our model considers the concentrations of polynucleotides and polypeptides corresponding to the polymerase sequence , its complementary sequence and the generic sequences . We assume that the concentrations of polynucleotides and polypeptides of a specific length, bar the polymerase and its complementary sequence, are identical. For the chains that share the first elements of their sequence with those of the polymerase (or its complementary chain), and differ in all other elements, this is only an approximation, but it is nevertheless justified, as the concentrations of these polymers only differ slightly from those of polymers with sequences of type , and their contribution to the variation of the polymerase concentration is expected to be small.

    The variations in polymer concentrations as time evolves are governed in our model by a system of ordinary differential equations. In the equations, is the length of the polymer chains, spanning all values in the range where is the maximal length of polypeptide and polynucleotide chains. We thus have a system of equations. We recall that is the number of codon types, assumed to be equal to the number of amino acid types.


    Alongside the seven physical parameters appearing in the differential equations above, we need to consider two parameters yielding the ‘initial’ concentrations of amino acid and nucleotide inside the system, namely and . In the absence of actual data for these quantities, we explore a range of realistic values in the analysis of our model. The concentration of free amino acids and nucleotides at any one time is then given by and respectively, with for any value of in the range and sequence .


The system of equations [Equation (9)] is non-linear and too complex to solve analytically. We therefore analyse it numerically, starting from a system made entirely of free nucleotides and amino acids, as well as charged p-tRNA, and letting the system evolve until it settles into a steady configuration.
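The numerical strategy described above (start from free monomers and integrate until the concentrations stop changing) can be illustrated on a deliberately tiny toy system. The sketch below is not the article's system of equations: it tracks only monomers and dimers, with hypothetical rate constants `k_plus` and `k_minus`, and uses a plain explicit Euler step chosen for readability rather than accuracy.

```python
def simulate_dimer(total=1.0, k_plus=1.0, k_minus=0.5,
                   dt=1e-3, tol=1e-10, max_steps=1_000_000):
    """Integrate dD/dt = k_plus * M**2 - k_minus * D until it is stationary.

    M is the free monomer concentration, fixed by conservation: M = total - 2*D.
    Returns the (monomer, dimer) concentrations at the steady state.
    """
    dimer = 0.0  # start from a pool made entirely of free monomers
    for _ in range(max_steps):
        monomer = total - 2.0 * dimer
        rate = k_plus * monomer ** 2 - k_minus * dimer
        if abs(rate) < tol:  # concentrations no longer change: steady state
            break
        dimer += dt * rate   # explicit Euler step
    return total - 2.0 * dimer, dimer


if __name__ == "__main__":
    monomer, dimer = simulate_dimer()
    print(f"monomer={monomer:.4f} dimer={dimer:.4f}")
```

For a stiff, high-dimensional system such as the one in this article, an adaptive or implicit integrator would normally be preferred over a fixed-step Euler scheme.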

The main quantities of interest are the relative concentrations of the polymerase () and of the peptide chains (). We have

and evaluate the ratios

while monitoring the evolution of each quantity over time. corresponds to the relative amount of polymerase of any length compared to other proteins (for a specific arbitrary sequence ), while corresponds to the relative amount of polymerase of length compared to an arbitrary protein of length . Unit ratios indicate that the polymerase has not been selected at all, while large values of or on the other hand indicate a good selection of the polymerase.

The complexity of the system [Equation (9)] also lies in the number of free parameters it involves. A systematic analysis of the high-dimensional parameter space is beyond the scope of this article, and we therefore concentrate on the analysis and description of results for a selection of parameter values that highlight potentially interesting behaviours of our model.

Recall that our model assumes that the number of different amino acids is equal to the number of codon types, and throughout our numerical work we have set . Note that the word ‘codon’ here is used by extension. Indeed, there are four different nucleic acids in our model and the ‘biological’ codons are made of two nucleic acids, bringing their number to sixteen. However, they split into four groups of four, each of which encodes one of the four amino acids. From a mathematical modelling point of view, this is completely equivalent. It is well accepted that early proteins were produced using a reduced set of amino acids (Angyan et al 2014). The exact identity and number are unclear, though experimental work has shown that protein domains can be made using predominantly five amino acids (Riddle et al 1997), while the helices of a 4-alpha helix bundle were made using only 4 amino acids (Regan and DeGrado 1988). We have used mostly and , but have investigated other values as well.
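The equivalence between sixteen two-nucleotide codons and four codon types can be made concrete with a short Python sketch. The nucleotide labels `ACGU` and the rule that the first nucleotide alone determines the amino acid are assumptions made for illustration only; the text states just that the sixteen codons split into four groups of four.

```python
from itertools import product

NUCLEOTIDES = "ACGU"  # assumed labels for the four nucleic acids


def doublet_code(nucleotides=NUCLEOTIDES):
    """Map each two-nucleotide codon to an amino-acid index.

    Assumed grouping: the first nucleotide alone selects the amino acid,
    so the second position is fully degenerate.
    """
    return {a + b: nucleotides.index(a)
            for a, b in product(nucleotides, repeat=2)}


code = doublet_code()

# Collect the codons that encode each amino acid.
groups = {}
for codon, amino_acid in code.items():
    groups.setdefault(amino_acid, []).append(codon)
```

Here `code` holds sixteen codons and `groups` holds four amino acids with four codons each, so for modelling purposes the system behaves as if there were four codon types.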

While these figures are somewhat arbitrary, an of 7 was chosen as it corresponds to the typical minimum number of amino acids required to produce a stable, folded alpha helix structure (Manning et al 1988). The choice of is based on the fact that while the polymer peptide chains could be significantly longer, they would need correspondingly long polynucleotide sequences to encode them, which becomes increasingly unlikely as lengths increase. Furthermore, we expected polymers of length 10 to have very low concentrations, a hypothesis confirmed by our simulations. We have nevertheless investigated larger values of as well, and found little difference, as outlined below.

In a first step, guided by data on parameter values gleaned from the literature and gathered in the Supplementary Information section, we set


We let the system evolve under a variety of initial concentrations of free amino acids and nucleotides, and , in the range , and with all polymer concentrations set to . We monitored the concentration of all polymers, in particular the concentration of polymerase and its ratio to the concentration of polypeptide chains, . In most cases we found that the nucleotides polymerised spontaneously (Mechanism (A)) in small amounts and this led, indirectly, to the polymerisation of the polypeptides, including the polymerases (Mechanism (C)). The polymerases then induced further polymerisation of the polynucleotides (Mechanism (B)) and the system slowly equilibrated.

The end result was an excess of polymerase of all lengths compared to polypeptide chains with for all initial concentrations (Figure 5). Moreover the total amount of polymerase reached, for initial concentration of free amino acids , was a concentration of approximately (as illustrated by the bottom 2 rows in Table 1). The concentration of polymerase of length 10, on the other hand, was very small for but was very large, effectively showing that the only polypeptide chain of length was the polymerase.

We found hardly any polymerisation of the polymerase when , with and , while with , we obtained and (Figure 5a). This highlights a very sharp transition at a critical concentration above which polymerases are generated. We summarise the data in Table 1.

polymerase production
Table 1: Sharp transition in the production of polymerases due to variations in the initial concentrations of free peptides and nucleotides, all other parameters kept fixed at the values [Equation (10)].

We then fixed the initial concentration to four different values and varied to identify the critical initial concentration of nucleotides necessary for the production of polymerases. The results in Table 2 show that the critical concentration is nearly constant and of the order of for a very wide range of amino acid initial concentrations.
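One way such a search for a critical concentration could be automated is a bisection on a logarithmic concentration scale. The Python sketch below is hypothetical and does not reproduce the method or the values used for Table 2: `steady_fraction` is the closed-form steady state of a toy one-variable model (not Equation (9)), and the rate constants and the 50% threshold used to define "polymer is produced" are assumptions.

```python
import math


def steady_fraction(n0, k_spont=1e-3, k_cat=1.0, k_hyd=0.1):
    """Fraction of monomer held in polymer at the steady state of the toy model
    dp/dt = k_spont*(n0 - p) + k_cat*(n0 - p)*p - k_hyd*p  (hypothetical)."""
    # Setting dp/dt = 0 gives k_cat*p**2 + b*p + c = 0; take the positive root.
    b = k_hyd + k_spont - k_cat * n0
    c = -k_spont * n0
    p = (-b + math.sqrt(b * b - 4.0 * k_cat * c)) / (2.0 * k_cat)
    return p / n0


def critical_concentration(produces_polymer, lo, hi, rel_tol=1e-6):
    """Bisect on a log scale for the concentration at which the predicate
    switches from False (at lo) to True (at hi)."""
    if produces_polymer(lo) or not produces_polymer(hi):
        raise ValueError("the transition must lie between lo and hi")
    while hi / lo > 1.0 + rel_tol:
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if produces_polymer(mid):
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


crit = critical_concentration(lambda n0: steady_fraction(n0) > 0.5, 1e-4, 1e2)
```

The geometric midpoint is used because the concentrations of interest span many orders of magnitude. For the full model, the predicate would instead run the numerical integration at each trial concentration and test whether a large polymerase concentration is reached.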

Table 2: Values of the free nucleotide initial critical concentration given the initial concentration of free peptides. The other parameters are kept fixed at the values [Equation (10)].
Figure 5: a) Time evolution of the polymerase for initial concentration , and . b) for initial concentration . Parameter values: , , .

Many of the parameters we have used were estimated or measured in conditions which, in all likelihood, were not identical to the ones existing when the polymerisation we are modelling occurred. In a second step, we departed from the set of values [Equation (10)] and found that in all cases investigated, varying these parameters modified the critical concentrations of and , but did not significantly affect the value of while remained extremely large.

More specifically, taking marginally increased the critical concentration to . Similarly, taking slightly increased the critical concentrations: . On the other hand, taking led to a decrease of the critical concentrations: . Varying to values as small as did not change the critical concentrations.

In our model we have considered the concentrations of free amino acids () and charged p-tRNA to be identical: (see [Equation (7)]). To consider other values of we only need to multiply the polymerisation rate of a peptide () by as it is p-tRNAs that bind to XNA chains, not free amino acids. We have considered a wide range of values for and found that for , the critical concentrations had not changed significantly while for , they increased to . This shows that taking much smaller values of has a very small impact on our results and that having a concentration of charged p-tRNA much smaller than that of free amino acids would only marginally increase the critical concentrations we have obtained using our original assumption.

The parameters to which the model is most sensitive are and . We found that for , and for , . Similarly, for we found that and for that . This shows that the spontaneous polymerisation of polynucleotides is essential to reach the minimum concentration of polynucleotides needed to kick-start the whole catalysis process, and that the stability of the polynucleotides plays an important role.

To investigate this, we have run simulations with for a fixed duration, , after which was set to . We found that if was long enough, the polymerisation of polypeptide and polynucleotide chains was identical to the one obtained when was not modified. When was too short, on the other hand, one was only left with short polypeptide and polynucleotide chains in an equilibrium controlled by the spontaneous polymerisation and depolymerisation parameters. The minimum value for depends on the concentrations and and the results are given in Table 3.

18000 254 12.7 years 2.2
Table 3: is the minimum duration of spontaneous polynucleotide polymerisation, given here for different initial concentrations of free nucleotides needed to induce large polymerase concentrations.

This shows that while is an important parameter in the process, what matters is to have a spontaneous generation of polynucleotides at the onset (Mechanism (A)). This then leads to the production of polypeptides, including polymerase (Mechanism (C)) and, once the concentration of polymerase is large enough, the catalysed production of polynucleotides (Mechanism (B)) dominates the spontaneous polymerisation.

We have also varied once the system had settled and we found that for , could be increased up to while still keeping a large amount of polymerase. Above that value, the polynucleotides are too unstable and one ends up again with mostly short polymer chains and .

We have also considered values of and found that the main difference is a slight increase of the critical concentrations. For example, for and , are respectively equal to and . At given concentrations and remain unchanged but decreases approximately by a factor of each time is increased by 1 unit.

We have also taken and and found that the critical concentrations were respectively and , while took the values of approximately , and . on the other hand remained constant.

A summary of the parameter values investigated outside the set [Equation (10)] and the corresponding critical concentrations are given in Table 4. Only one parameter was changed at a time.

Modified Parameter