Is modularity the reason why recombination is so ubiquitous?
Homologous recombination is an important operator in the evolution of biological organisms. However, there is still no clear, generally accepted understanding of why it exists and under what circumstances it is useful. In this paper we consider its utility in the context of an infinite population haploid model with selection and homologous recombination. We define utility in terms of two metrics - the increase in frequency of fit genotypes, and the increase in average population fitness, relative to those associated with selection only. Explicitly, we explore the full parameter space of a two-locus two-allele system, showing, as a function of the landscape and the initial population, that recombination is beneficial in terms of these metrics in two distinct regimes: a relatively landscape independent regime - the search regime - where recombination aids in the search for a fit genotype that is absent or at low frequency in the population; and the modular regime, where recombination allows for the juxtaposition of fit “modules” or Building Blocks. Thus, we conclude that the ubiquity and utility of recombination is intimately associated with the existence of modularity and redundancy in biological fitness landscapes.
Keywords: Population genetics model, Building Blocks, Fitness Landscapes, Modularity, Redundancy
The existence, prevalence and utility of genetic recombination is an old and enduring puzzle of biology maynard1971use (). Seminal works, such as FeldmanEshel (); Fisher30 (); Kondrashov88 (); Muller32 () among others, have provided theoretical justifications that add to a long list of putative mechanisms that may account for recombination’s enduring role in most higher species. Classic Felsenstein () and more contemporary reviews BartonAndCharlesworth (); Watson () on the subject summarize many of these candidates. Even though the number of potential explanations is large, none of them has been found compelling enough to have settled the debate. Additionally, some older propositions have come under more scrutiny thanks to improved experimental data Kouyos (); KeightleyAndOtto (), and it has even been suggested that the hidden value of sexual recombination might not even lie mainly in the improvement of genetic variability or fitness, or in its defining properties. As stated in OttoAndLenormand (): “…it is generally accepted that the long-term maintenance and ubiquity of Eukaryotic sex cannot be explained as an approximate consequence of the inherent properties of sex itself.”, a position exemplified in Baker76 (), where it is suggested that recombination might serve mainly as a stabilizer of mitosis, and that any drawn benefit regarding genetic inheritance is circumstantial. The plethora of proposed models ranges from simple ones that are case based FeldmanEshel (); feldman1972selection (); otto1997deleterious (), to sophisticated simulations that incorporate many-locus, multiple allele genotypes, dynamic recombination rates and sites Barton (); liberman2008evolution (); zhivotovsky1994evolution (), different levels and types of epistasis, mutation, complex and variable fitness landscapes, etc. Charlesworth (); KeightleyAndOtto (). 
Studies typically focus on measuring the effects of recombination on average fitness, but others concentrate on other quantifiable benefits; Watson (), for example, reports the virtues of recombination regarding the exploration of the fitness landscape, while in pepper2000evolution () the change over generations of the genetic linkage distance between epistatic units is discussed; and christiansen1998waiting () focuses on the mean time for a beneficial epistatic group of two alleles to appear on the same gamete with and without recombination. For a review on the experimental backing or counterevidence to theoretical explanations for the prevalence of recombination see Rice02 ().
Of course, if we are to understand the benefits of recombination in the context of a mathematical model, a requirement is that the model itself captures the very mechanisms by which it is useful in the first place. This then leads us to ask if the apparent inability to find an agreed universal advantage for recombination is due to the fact that the considered models are incapable of modeling the benefits - a defect of the model - or, rather, that the benefits are not transparent in the analyses of the models that have been studied. If the models themselves are inadequate then new models with new features must be developed. On the contrary, if the analyses themselves are at fault, one must understand why. In this paper we will start with the hypothesis that standard population genetics models are capable of showing universal mechanisms by which recombination is useful. However, by restricting to a simple two-locus two-allele model we will be able to exhaustively study the full parameter space of the model. We will show that the reason why universal mechanisms have been difficult to identify is twofold: that the benefits are more visible in terms of Building Blocks (subsets of loci defined by the recombination distribution) not genotypes, as in standard analyses, and that the benefits of recombination are particularly associated with “modular” landscapes which will be discussed below. Thus, we believe, the results of this paper link two fundamental concepts in biology - the utility and ubiquity of recombination with the existence of modularity.
2 Recombination - a Building Block Perspective
In this section we introduce the theoretical framework and the chief diagnostics we will use to examine the utility of recombination. As we are interested here in the interaction of selection and homologous recombination we will omit mutation. We will consider the evolution [Footnote 1: We will restrict attention here to a generational model with no overlap.] of a population of length-$N$ haploid sequences governed by the equation stephens1 ()
$$P_I(t+1) = P'_I(t) - p_c \sum_m p_c(m)\,\Delta_I(m,t) \tag{1}$$
where $P_I(t)$ is the expected frequency of genotype $I$ at generation $t$. In the first term on the right-hand side $P'_I(t)$ is the selection probability for the genotype $I$. For proportional selection, which is the selection mechanism we will consider here, $P'_I(t) = (f_I/\bar f(t))\,P_I(t)$, where $f_I$ is the “survival” fitness [Footnote 2: By survival fitness, in the absence of factors such as fertility, differences in mating success etc., we mean viability, the probability to reach reproductive age, in distinction to absolute fitness which measures the overall reproductive success of a type.] of genotype $I$, $\bar f(t)$ is the average population fitness in the $t$th generation and $P_I(t)$ is the proportion of genotype $I$ in the population. In the second term, $p_c$ is the probability that recombination takes place and the recombination distribution, $p_c(m)$, is modeled using the concept of a recombination mask $m = m_1 m_2 \ldots m_N$, which is such that, if $m_i = 0$, the $i$th locus of the offspring is taken from the $i$th locus of the first parental sequence, while, if $m_i = 1$, it is taken from the $i$th locus of the second parental sequence. Finally, $\Delta_I(m,t)$ is the Selection-weighted linkage disequilibrium (SWLD) coefficient stephens:03 () for the genotype $I$. Explicitly,
$$\Delta_I(m,t) = P'_I(t) - \sum_{J,K} \lambda_I^{JK}(m)\,P'_J(t)\,P'_K(t) \tag{2}$$
where is an indicator function that represents the conditional probability that the offspring genotype is formed given the parental genotypes and and the mask . For example, for two loci, , with binary alleles, and , , while . The contribution of a particular mask depends, as we can see, on all possible parental combinations. In this sense, , in the space of genotypes, is an exceedingly complicated function. In the case of diploids, the SWLD coefficient is equivalent to the functions of Nagylaki nagylaki1999convergence () and described in burger2000mathematical (). For a given target genotype and mask, is a matrix on the indices and associated with the parents. For binary alleles, for every mask there are possible combinations of parents that need to be checked to see if they give rise to the offspring . Nevertheless, only elements of the matrix are non-zero. The question is: which ones? Although, , or equivalently or , gives a complete summary of the effect of recombination in a given generation it is an exceedingly complicated function to analyze. However, the complication of in terms of genotypes is just an indication of the fact that the latter are not a natural basis for describing the action of recombination.
A more appropriate basis is the Building Block Basis (BBB) stephens:03 (); stephens_stadler (), wherein only the Building Block (BB) schemata that contribute to the formation of a genotype enter. In this case [Footnote 3: Equation (1) with the substitution of equation (3) has a long history, starting with the seminal work of Hilda Geiringer geiringer44 () who derived a version of the equation for a diploid population without selection. Versions of the equation were then rederived and discussed in altenberg1995schema (), who used it to discuss the performance of recombinative Genetic Algorithms using Price’s theorem, showing that schemata were a natural consequence of recombination; and in stephens1 (); stephens1998effective () where the Building Block Hypothesis was examined and it was discussed under what circumstances recombination led to an increase in the effective fitness of a given genotype. Also, in the latter the relation to the concept of coarse graining was emphasized and discussed.]
$$\Delta_I(m,t) = P'_I(t) - P'_{I_m}(t)\,P'_{I_{\bar m}}(t) \tag{3}$$
where $P'_{I_m}(t)$ is the selection probability of the BB $I_m$ and $I_{\bar m}$ is the complementary block such that $I_m \cup I_{\bar m} = I$. Both blocks are uniquely specified by the associated recombination mask, $m$. For instance, for three loci, $N = 3$, if $I = 111$ and $m = 001$ then $I_m = 11{*}$ and $I_{\bar m} = {*}{*}1$, where $*$ is the canonical “wildcard” symbol, familiar from Evolutionary Computation, indicating that the corresponding locus has been summed over thus leading to marginal probabilities. Thus, the probability for the schema $11{*}$ is $P_{11*} = P_{110} + P_{111}$. The selection probability for the BB schema $I_m$ is $P'_{I_m}(t) = (f_{I_m}(t)/\bar f(t))\,P_{I_m}(t)$, where the fitness of $I_m$ is $f_{I_m}(t) = \sum_{J \in I_m} f_J P_J(t)/P_{I_m}(t)$ and depends on the actual composition of the population. It is important to emphasize that the SWLD is distinct from the well-known linkage disequilibrium coefficient, $D_I(m)$, which depends only on the allele frequencies and the crossover mask $m$, and not on the fitness landscape. In the case of a flat fitness landscape, $\Delta_I(m) = D_I(m)$, but not otherwise. In particular, a population at linkage equilibrium with $D_I(m) = 0$ does not necessarily satisfy $\Delta_I(m) = 0$. Selection effects generally move the system away from the Geiringer or Robbins manifold stephens1 (); poli2002allele (), which is the set of points in the space of populations defined by $D_I(m) = 0$. In terms of BBs,
$$D_I(m) = P_I - P_{I_m}\,P_{I_{\bar m}}$$
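with $P_{I_m}$ and $P_{I_{\bar m}}$ being the frequencies, not the selection probabilities, of the BBs $I_m$ and $I_{\bar m}$. Therefore, $D_I(m) = 0$ in linkage equilibrium implies $P_I = P_{I_m} P_{I_{\bar m}}$, i.e., the probability to find any genotype is the same as the product of the probabilities to find its constituent BBs. Thus, at linkage equilibrium the SWLD coefficient is given by
$$\Delta_I(m,t) = \frac{P_{I_m}(t)\,P_{I_{\bar m}}(t)}{\bar f(t)^2}\left(f_I\,\bar f(t) - f_{I_m}(t)\,f_{I_{\bar m}}(t)\right)$$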
Note that the structure of is particularly simple when both and are BB schemata. For a given and one unique BB, , is picked out. The second BB then enters as the complement of in . This means that is skew diagonal on the indices and , with only one non-zero element on that skew diagonal for a given and . At a particular locus of the offspring, the associated allele is taken from the first or second parent according to the value of . If it is taken from the first parent, then the corresponding allele in the second parent is immaterial. As seen above, this fact is represented by the normal schema wildcard symbol . It is important to emphasize that the BBs form an alternative basis to that of the genotypes. This means that genetic dynamics can not only potentially be described without any reference to genotypes but also that with the dynamics of the BBs the dynamics of any and all genotypes can be derived. For instance, for two loci with binary alleles, and , the possible genotypes are , , and . The corresponding BBs are , , and , where we arbitrarily chose the genotype as the type around which to develop the BBB. The relationship between the two bases is given by
is the coordinate transformation matrix that transforms from one basis to another. As bases, the genotype and BBB have equivalent dynamics. However, the dynamics of recombination is fundamentally simpler in the BBB due to the immense simplification of in the latter. In other words, just as Walsh/Fourier modes goldberg1 (); vose_wright:walshII (); Weinberger:91b (); wright00:_exact_schem_theor () are the natural basis for describing mutation, so BB schemata are the natural basis for describing homologous recombination. They are the natural effective degrees of freedom of any genetic system with recombination.
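As an illustration of how simple the SWLD coefficient becomes in the Building Block basis, the following minimal Python sketch computes $\Delta_I(m) = P'_I - P'_{I_m} P'_{I_{\bar m}}$ directly from schema marginals under proportional selection. The function names, the tuple encoding of genotypes and the mask convention (bit 0 selects the first parent) are assumptions made for this example, not notation taken from the paper.

```python
from itertools import product

def selection_probs(P, f):
    """Proportional selection: P'_I = f_I * P_I / (mean fitness)."""
    fbar = sum(f[g] * P[g] for g in P)
    return {g: f[g] * P[g] / fbar for g in P}

def schema_prob(Ps, genotype, fixed_loci):
    """Marginal probability of the schema that matches `genotype` on
    `fixed_loci` and carries a wildcard at every other locus."""
    return sum(p for g, p in Ps.items()
               if all(g[i] == genotype[i] for i in fixed_loci))

def swld(P, f, genotype, mask):
    """Delta_I(m) = P'_I - P'_{I_m} * P'_{I_mbar}.
    Assumed convention: mask bit 0 takes the locus from the first parent,
    mask bit 1 takes it from the second parent."""
    Ps = selection_probs(P, f)
    first = [i for i, bit in enumerate(mask) if bit == 0]
    second = [i for i, bit in enumerate(mask) if bit == 1]
    return (Ps[genotype]
            - schema_prob(Ps, genotype, first) * schema_prob(Ps, genotype, second))

# Hypothetical example: three loci, uniform population.
genotypes = list(product((0, 1), repeat=3))
uniform = {g: 1 / 8 for g in genotypes}
additive = {g: 1.0 + sum(g) for g in genotypes}  # illustrative additive landscape
flat = {g: 1.0 for g in genotypes}               # no selection

delta_additive = swld(uniform, additive, (1, 1, 1), (0, 0, 1))
delta_flat = swld(uniform, flat, (1, 1, 1), (0, 0, 1))
```

In this example the flat landscape gives a vanishing coefficient, while the illustrative additive landscape gives a negative one, in line with the sign convention of equation (1), where a negative coefficient means that recombination adds to the frequency of the genotype.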
From Equation (1) for the time evolution of the probability distribution for the system, we may derive the time evolution of any derived quantity, such as the average population fitness, which is given by
$$\bar f(t+1) = \sum_I f_I\,P'_I(t) - p_c \sum_I f_I \sum_m p_c(m)\,\Delta_I(m,t) \tag{8}$$
2.1 Why Recombination?
As mentioned in the introduction, a great amount of work has been done on trying to understand why recombination is ubiquitous. Here, rather than trying to understand the potential benefits of homologous recombination at the most general phenomenological or conceptual level, we will restrict attention to what we may deduce purely from its mathematical representation in equation (1). Of course, it may be that the benefits of recombination are not manifest in this model. However, given that the model is the generally accepted framework for classical population genetics it behooves us to at least use it as a starting point. Further, we will analyze the model concentrating on two simple metrics for measuring the benefits of recombination, asking: i) under what circumstances can recombination lead to the generation of a higher frequency of a fit offspring than would be the case with only selection? and, relatedly, ii) under what circumstances can recombination lead to a larger increase in the average population fitness relative to selection only? From equations (1) and (8) we see that it is the SWLD coefficient that quantifies the effect in both cases.
From equation (1), we can see that if $\Delta_I(m,t) < 0$ then recombination leads, on average, to a higher frequency of the genotype $I$ than in its absence. In other words, in this circumstance, recombination is giving you more of $I$ than you would have otherwise. On the contrary, if $\Delta_I(m,t) > 0$ then the converse is true: recombination provides less of the genotype of interest than would be the case in its absence. With this in mind, as mentioned, we will consider two complementary metrics to evaluate the utility of recombination in time: the change in number of optimal genotypes from one generation to the next and the change in average population fitness. In the infinite population limit, the former is given by
$$\Delta P_{opt}(t) \equiv P_{opt}(t+1) - P_{opt}(t) = P'_{opt}(t) - P_{opt}(t) - p_c \sum_m p_c(m)\,\Delta_{opt}(m,t) \tag{9}$$
For fitness-proportional selection,
$$\Delta P_{opt}(t) = \left(\frac{f_{opt}}{\bar f(t)} - 1\right) P_{opt}(t) - p_c \sum_m p_c(m)\,\Delta_{opt}(m,t) \tag{10}$$
The first term on the right-hand side is the increase in the number of optimal genotypes due to the effect of selection only and the second term the contribution due to recombination. Now passing to the average population fitness, we can consider two reference points for measuring the effect of recombination relative to selection. The first is to consider in the infinite population limit
$$\Delta \bar f(t) \equiv \bar f(t+1) - \bar f(t) = \sum_I f_I \left(P'_I(t) - P_I(t)\right) - p_c \sum_I f_I \sum_m p_c(m)\,\Delta_I(m,t) \tag{11}$$
where, once again, the first term on the right-hand side is the contribution from selection only, and corresponds to Fisher’s Fundamental Theorem, while the second term is the contribution from recombination. In both $\Delta P_{opt}$ and $\Delta \bar f$, we are considering metrics that measure the relative contribution of recombination generation by generation, not the cumulative effect of recombination versus selection. As a measure of the latter we consider
$$\delta \bar f(t) \equiv \bar f_{s+r}(t) - \bar f_{s}(t) \tag{13}$$
Thus, if $\delta \bar f(t)$ is positive then the average fitness of the population evolving in the presence of recombination and selection (s+r) is higher than that of the same population evolving in the presence of selection only (s).
For both generation-by-generation metrics the qualitative contribution of recombination is purely controlled by the sign of $\Delta_I(m,t)$. For increasing the frequency of a fit genotype relative to the case of selection only, we see that this will be the case, passing from generation $t$ to generation $t+1$, if and only if $\Delta_{opt}(m,t) < 0$, with the sign and magnitude of $\Delta_{opt}(m,t)$ fixed completely by the fitness landscape and the actual population. So, whether recombination is beneficial or not passing from one generation to another, in this sense, is equally fixed by the fitness landscape and the actual population. Similarly, the increase in the average population fitness from one generation to the next, relative to selection only, is controlled by the fitness weighted average of $\Delta_I(m,t)$ and, hence, once again, by the fitness landscape and the current population. However, in the case of the cumulative measure we see that the potential contribution of recombination is more subtle, as besides the explicit $\Delta_I(m,t)$ term there is also the effect of the difference between $P_I^{s+r}(t)$ and $P_I^{s}(t)$, which depends implicitly on $\Delta_I$ in earlier generations.
So, once again, we are led to ask first: When is $\Delta_I(m,t) < 0$? The answer is when $P'_I(t) < P'_{I_m}(t)\,P'_{I_{\bar m}}(t)$, i.e., the probability to select the genotype is less than the probability to select its component BB schemata, where the action of recombination is modeled to be such that the blocks are selected independently. There are several distinct regimes in which $\Delta_I(m,t) < 0$, which we will explore further and which categorize the different conditions under which homologous recombination can be deemed useful. First, there is the regime in which $P_I(t) \approx 0$, i.e., the genotype is non-existent, or at a very small frequency, in the actual population. In this case $\Delta_I(m,t) \approx -P'_{I_m}(t)\,P'_{I_{\bar m}}(t) \le 0$ directly and then, remembering that we are neglecting the effects of mutation, recombination is the only mechanism by which the genotype can be generated. This regime emphasizes the search property of recombination, independent of the fitness landscape.
In general though, as emphasized, the effects of recombination depend on the fitness landscape. Taking the classic Muller’s ratchet argument as a reason why recombination exists, it has been shown that modifier genes that lead to higher recombination rates could increase in the presence of negative multiplicative epistasis Charlesworth (); feldman1970quasilinkage (); kondrashov1984deleterious (); charlesworth1993effect (). However, if the epistasis was too great the effect disappeared. Thus, in the parameter space for the landscape the advantage for recombination only appeared in a small region and therefore could not be offered as a generic explanation for the ubiquity of recombination and sex. In other work, stephens2007just (); stephens_cervantes () have provided evidence that recombination is particularly beneficial in an additive landscape with zero additive epistasis and very detrimental in a landscape with high positive additive epistasis. A simple way to see this is to eliminate any bias that comes from a particular choice of initial population and assume equal proportions for all genotypes. In this situation, it can be shown that $\Delta_I(m) < 0$ for any $m$ that does not cut an epistatic link between loci. For instance, for a genotype $I = I_1 I_2$, if $f_I = f_{I_1} + f_{I_2}$, i.e., the landscape is additive, then $\Delta_I(m) < 0$ for any $m$. This result is also valid when the $I_i$ correspond to multiple loci when recombination does not cut any epistatic link between the loci. This is the case for a modular landscape, where loci divide up into disjoint sets with epistasis between the loci in a set but not between sets. The benefit of recombination in this case is that it efficiently increases the number of fit non-epistatically linked BBs in an offspring genotype relative to the numbers present in the parental types. On the contrary, for a highly additively epistatic fitness landscape, such as “needle-in-a-haystack” (NIAH) [Footnote 4: This landscape corresponds to one optimal genotype with fitness $f_{opt}$, while other types have equal fitness, $f_0 < f_{opt}$. It has been used extensively in molecular evolution in the context of the Eigen model eigen (), where the dynamics is naturally understood in terms of quasi-species.], one can show that $\Delta_I(m) > 0$ for all $m$. As is well known, for a multiplicative landscape, $\Delta_I(m) = 0$.
One may argue, of course, that proving that over one generation for a particular choice of population and in particular fitness landscapes does not correspond to a “universal” mechanism for explaining the benefits of recombination. That is why in this paper we consider the general situation of an arbitrary fitness landscape and an arbitrary population, as well as considering multiple generations. To consider such generality, however, the price we must pay is to restrict to a small number of loci.
So, we would argue that two significant, and potentially related, regimes in which recombination is beneficial are: i) the search regime, where recombination searches for fit genotypes that presently either do not exist or are at very low frequency in the population; and ii) the modular regime, with either weak positive or negative additive epistasis, where recombination allows for the juxtaposition of distinct fit modules in different parental types into an even fitter offspring. Of course, in the search regime the question arises as to whether recombination is more efficient than mutation. This will depend on the Hamming or edit distance between parents and offspring. An example, that we will not consider in more detail, that exhibits the benefits of recombination over mutation in generating innovation, is the development of antibiotic resistance in bacteria through horizontal gene transfer. Generically, it will be the case that the Hamming or edit distance between the original parental sequences, say bacterium and virus, and the offspring sequence, bacterium with viral gene, will be potentially large. In other words, the difference between the initial and final sequences is not a single nucleotide, or even a small number of them. In this sense, recombination-like [Footnote 5: By “recombination-like” we mean any genomic change where one or more sub-sequences in one or more parental sequences are transferred to an offspring sequence. This is termed “generalized recombination” in poli_stephens_genrecomb () and comprehends unequal crossing over, transposition, translocation and related operations, as well as homologous recombination.] events are the only way to generate innovation that is associated with large genomic changes, “large” meaning that the Hamming or edit distance between parental and offspring sequences is large.
3 Modularity and Fitness Landscapes
Before considering our explicit model we wish to discuss the concept of modularity in terms of the fitness landscape. For simplicity, we restrict to binary alleles , where refers to the locus. We will consider two representations of the fitness function, a direct one where we use the directly and another one where the fitness function can be written as an expansion of the form
where represents an epistatic interaction between alleles located at loci and . The advantage of this latter representation is that the degree of epistasis between different loci and alleles can be simply deduced.
Any landscape that contains only Fourier components of is said to be an elementary landscape of order . For instance, a completely additive landscape has a fitness function of the form
and is therefore an elementary landscape of order one, as all Fourier components other than order one are zero. This is a consequence of the fact that there are no epistatic interactions between loci. Similarly, a multiplicative landscape, where
is an elementary landscape of order , as all Fourier components other than order are zero, there being epistatic interactions of order between the loci but no others. Other landscapes will be intermediate between these extremes. Once again, we emphasize here that we are measuring epistasis relative to the additive limit not the multiplicative one as has been the norm in most papers on recombination and population genetics.
A particularly interesting class of landscapes in terms of their relevance for recombination are those of “modular” type, where the loci of a genotype partition into disjoint subsets [Footnote 6: Intuitively these modules will be formed by contiguous loci such as is natural for an exon or gene.], modules, $M_1, M_2, \ldots$. We will consider two complementary notions of modularity here, one where the landscape can be decomposed as the sum of the individual fitnesses of these disjoint subsets, and one where the fitness is associated with a Boolean “OR” function on the alleles of the modules. In the first case the fitness of a genotype is given by
$$f_I = \sum_j f_{M_j}$$
the sum of the fitnesses of its constituent modules. This modularity will obviously leave an imprint in the expansion (3). For instance, if each module consists of loci and there is no epistasis between the modules then in (3) we will have for . In the second case, our notion of modularity is associated with the idea of genetic redundancy, whereby the fitness of a genotype is similar in the presence of different copy numbers of a given gene. The extreme limit of this is when the landscape is associated with an “OR” function, so that the fitness of a type is the same whether there is one or multiple copies of a gene. The intuition of a module in this context is that, in the presence of redundancy with multiple copy number, one or maybe more genes can be removed or mutated without affecting the fitness of the type. Thus, a gene acts as a module as it can be changed independently without affecting the fitness of the type. As we will see, this corresponds to a system with a maximal degree of negative epistasis.
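The two notions of modularity described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genotypes are tuples of 0/1 alleles with 1 taken as the optimal allele, and the function names, the module partition and the fitness values are hypothetical choices made for the example.

```python
def modular_additive_fitness(genotype, modules, module_fitness):
    """Fitness as the sum of the fitnesses of disjoint modules.
    `modules` is a list of disjoint tuples of locus indices and
    `module_fitness` maps the alleles of one module to a number."""
    return sum(module_fitness(tuple(genotype[i] for i in module))
               for module in modules)

def redundant_fitness(genotype, f_low=1.0, f_high=2.0):
    """Boolean OR landscape: a single optimal allele (coded 1) suffices."""
    return f_high if any(genotype) else f_low

def additive_epistasis(f00, f01, f10, f11):
    """Deviation of the double mutant from additivity for two loci."""
    return f11 + f00 - f01 - f10

def and_block(block):
    """Illustrative module fitness with epistasis inside the module only."""
    return 1.0 if all(block) else 0.0

# Hypothetical example: four loci split into two modules of two loci each.
modules = [(0, 1), (2, 3)]
full = modular_additive_fitness((1, 1, 1, 1), modules, and_block)
half = modular_additive_fitness((1, 1, 0, 1), modules, and_block)

# Additive epistasis of the two-locus OR landscape.
or_epistasis = additive_epistasis(
    *(redundant_fitness(g) for g in [(0, 0), (0, 1), (1, 0), (1, 1)]))
```

With these illustrative values the OR landscape has negative additive epistasis, since the double mutant is no fitter than either single mutant.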
As mentioned previously, a full analysis for loci with arbitrary landscape and population is prohibitively difficult, so here we will focus on the case of two loci, as in this case we can study in the context of an exactly solvable model the different regimes under which recombination can be beneficial. So, restricting ourselves to the case of two loci, , we have
For an additive (modular) landscape $f_{11} = f_{10} + f_{01} - f_{00}$. For a multiplicative landscape $f_{11} f_{00} = f_{10} f_{01}$. For a redundant (modular) landscape $f_{11} = f_{10} = f_{01} > f_{00}$ which, as mentioned, can be understood in terms of a Boolean “OR”, fitness being the same if either one or both alleles are optimal. For a NIAH landscape $f_{11} > f_{10} = f_{01} = f_{00}$ which, in contrast to the redundant landscape, corresponds to a Boolean “AND” as fitness is only different if both alleles are optimal.
4 Recombination in an exact two-locus model
4.1 Analytic results
Clearly, trying to characterize the efficacy of recombination quantitatively, and in detail, is prohibitively complicated. As we saw in section 2, however, within the confines of the model we are considering, in a given generation, it can be characterized using only one fundamental function: the SWLD coefficient. The SWLD coefficient, though, depends not only on the recombination distribution, but also on the fitness landscape and the current state of the population. In other words it is a function of a large number of parameters. To circumvent this problem we consider the case of two loci and calculate the SWLD coefficient as a function of the fitness landscape and the population. Note that by two loci here we do not necessarily imply that they represent “genes”. They may represent any two structural units, such as exons, introns or other motifs, or nucleotides themselves, that can be separated or recombined by crossover and which can be characterized, as an approximation, by a fitness landscape that is independent of the rest of the genome.
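For two loci all genotypes can be characterized by a multi-index $I = I_1 I_2$, with $I_1, I_2 \in \{0, \ldots, k-1\}$, where $k$ is the cardinality of the alphabet that labels the loci, or alleles in the case of genes. For $N = 2$, there is only one non-trivial mask [Footnote 7: The masks $00$ and $11$ correspond to cloning, where both offspring loci come from a single parent.], $01$, and its conjugate, $10$, that lead to the BBs $I_1{*}$ and ${*}I_2$. The sum over masks in the general expression for the SWLD coefficient is thus reduced to only one term:
$$\Delta_{I_1 I_2}(t) = P'_{I_1 I_2}(t) - P'_{I_1 *}(t)\,P'_{* I_2}(t)$$
Direct evaluation shows that
$$\Delta_{11} = \Delta_{00} = -\Delta_{01} = -\Delta_{10} = P'_{11} P'_{00} - P'_{01} P'_{10}$$
and thus the evolution equations in the two-allele, two-locus problem are:
$$P_{11}(t+1) = P'_{11}(t) - p_c\,\Delta_{11}(t), \qquad P_{00}(t+1) = P'_{00}(t) - p_c\,\Delta_{11}(t),$$
$$P_{01}(t+1) = P'_{01}(t) + p_c\,\Delta_{11}(t), \qquad P_{10}(t+1) = P'_{10}(t) + p_c\,\Delta_{11}(t),$$
where the probability of the non-trivial mask has been absorbed into $p_c$.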
The whole state of this system can be characterized by 3 frequencies that are naturally represented in a three dimensional simplex. Figure 1 shows typical population trajectories in the two-locus, two-allele system for a generic landscape, with arbitrarily taken as the optimum genotype and several different initial population ratios.
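Trajectories of the kind just described can be generated by iterating selection followed by recombination. The sketch below is a minimal Python version, assuming proportional selection, genotypes labelled by the strings '00', '01', '10', '11', and a single parameter `r` standing for the effective probability that crossover separates the two loci. The names `step` and `evolve` and the example landscape are illustrative choices, not the authors' code.

```python
def step(P, f, r):
    """One generation of proportional selection followed by recombination.
    P, f: dicts keyed by '00', '01', '10', '11'.
    r: assumed effective recombination probability between the two loci."""
    fbar = sum(f[g] * P[g] for g in P)
    Ps = {g: f[g] * P[g] / fbar for g in P}             # selection
    delta = Ps['11'] * Ps['00'] - Ps['10'] * Ps['01']   # SWLD of '11' and '00'
    return {'11': Ps['11'] - r * delta,
            '00': Ps['00'] - r * delta,
            '10': Ps['10'] + r * delta,
            '01': Ps['01'] + r * delta}

def evolve(P, f, r, generations):
    """Trajectory of populations, starting from P (included as element 0)."""
    trajectory = [dict(P)]
    for _ in range(generations):
        trajectory.append(step(trajectory[-1], f, r))
    return trajectory

# Hypothetical example: uniform start on an illustrative additive landscape.
start = {g: 0.25 for g in ('00', '01', '10', '11')}
additive = {'00': 1.0, '01': 1.5, '10': 1.5, '11': 2.0}
with_recombination = evolve(start, additive, r=0.5, generations=50)
selection_only = evolve(start, additive, r=0.0, generations=50)
```

Comparing `with_recombination[1]['11']` with `selection_only[1]['11']` gives the one-generation effect of recombination on the optimal genotype, and comparing the mean fitness along the two trajectories gives a cumulative comparison.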
As far as the fitness landscape is concerned the general parametrized two-locus two-allele landscape is
$$f_{00} = f_0, \qquad f_{01} = f_0 + a, \qquad f_{10} = f_0 + b, \qquad f_{11} = f_0 + a + b + \epsilon$$
where $\epsilon$ is the measure of the additive epistasis between the two loci. We take the genotype $00$ as the wild type, the genotypes $01$ and $10$ as single mutants and $11$ as a double mutant which is the optimal genotype. There are just three main landscape categories for the two-allele, two-locus model:
i) The wild type and the double mutant are the anti-optimum and optimum respectively.
ii) One of the single mutants (10 or 01) is the anti-optimum.
iii) The two lowest fitness phenotypes are the single mutants.
Any other case can be brought to one of the previous by a relabeling that doesn’t affect the dynamics. In the first two landscape types, a generic population will always eventually evolve towards the global optimum. In the third type, the population may converge to the optimum or the suboptimal wild type depending on the initial population and the recombination probability. [Footnote 8: The latter two landscape categories are known as deceptive landscapes of Type I and Type II respectively in the Genetic Algorithm literature goldberg1 (). It has been proved takahashi1998convergence () that Type I systems always converge to the global optimum whereas Type II systems converge to the optimum or the wild type depending on the population and recombination probability.]
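For the optimal genotype
$$\Delta_{11}(t) = P'_{11} P'_{00} - P'_{01} P'_{10} = \frac{f_{11} f_{00}\,P_{11}(t) P_{00}(t) - f_{01} f_{10}\,P_{01}(t) P_{10}(t)}{\bar f(t)^2} \tag{23}$$
As mentioned, the sign of $\Delta_{11}$ determines the qualitative effect of recombination in a given generation. To develop some intuition for how the characteristics of the landscape affect our metrics we will set for the moment $P_{00} = P_{01} = P_{10} = P_{11} = 1/4$, i.e., a homogeneous population with no initial bias for one genotype versus another. As the parameter $f_{00}$ just sets the scale for the landscape we can without loss of generality for fitness proportional selection set $f_{00} = 1$. We will also set $f_{01} = f_{10} = 1 + a$, so that both single mutants have the same fitness, and write $f_{11} = 1 + 2a + \epsilon$, with $\epsilon$ the additive epistasis. In this case,
$$\Delta_{11} = \frac{\epsilon - a^2}{(4 + 4a + \epsilon)^2} \tag{24}$$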
For a multiplicative landscape and , as is well known. For an additive landscape and therefore . In this case recombination leads to a higher frequency of the optimal genotype in the next generation than selection alone. For a deceptive landscape, , but and so and recombination in this region of the parameter space leads to a lower frequency of the optimal genotype in the next generation. In terms of BBs, for deceptive landscapes, the marginal fitnesses are such that and , and so the reason why recombination is unfavourable is that the necessary mutant alleles for constructing the optimal genotype are deleterious relative to the corresponding alleles of the genotype . For additive epistasis, such that , we have and recombination once again leads to a lower frequency of the optimal genotype in the next generation than selection alone. Generally, if we take as signifying negative multiplicative epistasis then we see that in such landscapes recombination has a positive effect in terms of our metric and on the contrary for positive multiplicative epistasis. Note that the additive limit corresponds to negative multiplicative epistasis. Interestingly, equation (24) shows that the greatest benefit from recombination, i.e., the minimum value of is associated with landscapes with negative additive epistasis, i.e., . Maximum negative epistasis is given by the minimum value of , . In this case .
Why would this maximum negative epistasis be associated with the utility of recombination, at least in terms of metric (10)? Examining equation (23) we see that the first term, proportional to $f_{11} f_{00}$, corresponds to elimination of the optimal genotype $11$ by recombining it with the suboptimal genotype $00$, whereas the term proportional to $f_{01} f_{10}$ corresponds to construction of $11$ via recombination of the single mutants $01$ and $10$. It is the competition between these two effects that measures the benefits of recombination in terms of (10). Additive landscapes with $\epsilon = 0$ reduce the impact of destruction without compromising the positive effect of reconstruction. Negative epistasis, on the other hand, does not affect the construction of the optimal genotype by recombining the single mutants, but it does minimize the effect of destruction of the optimal genotype. The maximal effect is when $\epsilon = -a$ and corresponds to a Boolean “OR” landscape where $f_{11} = f_{01} = f_{10}$. This is the situation where there is genetic redundancy, as the fitness of the optimal phenotype requires the presence of only one optimal allele not both. At this naive level we also see that the benefit of recombination is not restricted to small negative multiplicative epistasis but, rather, the larger the additive negative epistasis the larger the benefit conferred by it.
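The competition between destruction and construction can be checked numerically for the landscape types discussed in the text. The following Python sketch assumes a uniform population and proportional selection, so that each selection probability is the genotype fitness divided by the sum of the four fitnesses. The function name and the numerical fitness values are illustrative assumptions.

```python
def delta_opt_uniform(f00, f01, f10, f11):
    """SWLD coefficient of genotype 11 after proportional selection acting
    on a uniform population (each genotype at frequency 1/4)."""
    total = f00 + f01 + f10 + f11
    destruction = f11 * f00 / total**2    # 11 x 00 recombination breaks up 11
    construction = f10 * f01 / total**2   # 10 x 01 recombination builds 11
    return destruction - construction

# Illustrative fitness values (f00, f01, f10, f11) for each landscape type.
landscapes = {
    'additive':       (1.0, 1.5, 1.5, 2.0),
    'multiplicative': (1.0, 1.5, 1.5, 2.25),
    'redundant (OR)': (1.0, 2.0, 2.0, 2.0),
    'NIAH (AND)':     (1.0, 1.0, 1.0, 2.0),
}
deltas = {name: delta_opt_uniform(*f) for name, f in landscapes.items()}
```

For these values the coefficient is negative for the additive and OR landscapes, zero for the multiplicative one and positive for NIAH, which matches the qualitative statements above.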
In terms of the metric (11) the contribution from recombination is given by
For this term to give a positive contribution to the average population fitness we require . For this requires , which we will term weak positive additive epistasis. On the other hand, for , and recombination apparently leads to a decrease in the average population fitness, while in the additive limit, , there is no change. Together, a one generation analysis of our two metrics would indicate that there are benefits to recombination from both of them only for weakly positively additively epistatic landscapes such that and . We will characterize these landscapes as being “modular”, i.e., quasi-additive. It is important however, to go beyond a single generation, and for that we will consider metric (13) in section 5.
4.1.1 Muller’s Ratchet.
Muller’s ratchet Muller32 (), and variations thereof, have been frequently invoked in considerations of the potential benefits of recombination (a good, although somewhat dated, review of the different potential mechanisms by which recombination can be beneficial, and of Muller’s ratchet in particular, can be found in Felsenstein ()). Essentially, the argument is that recombination increases the evolvability of a population by allowing beneficial mutations on different genomes to be recombined into one more efficiently than the process of generating a double mutation. Similarly, deleterious mutations can be eliminated more efficiently from a population by having them recombined into a single genome, thus allowing selection to eliminate them more efficiently. We will consider these arguments in the context of our two locus system.
There are two regimes of interest related to Muller’s ratchet: one in which advantageous mutations appear in a population, and a second in which deleterious mutations appear. The question is: how does recombination affect the dynamics of these mutants? Considering the first case, if we take the population to be such that the fit double mutant is absent, i.e., (in this case there is an initial linkage disequilibrium, i.e., ), then . So
From Equation (26) we see that the number of fit double mutants increases from generation to generation due to the effect of recombination relative to selection only dynamics. This is, in fact, independent of the fitness landscape, being associated with the search regime of recombination alluded to in section 2.1. In contrast, in Equation (27), we see that the average population fitness will increase in the presence of recombination if and only if , which is a direct measure of the degree of additive epistasis between the two loci. As noted, for a purely additive landscape, and so recombination is neutral in this setting. As for the other genotypes, the fraction of wild types increases due to the effect of recombination, while the frequency of single mutants decreases. What happens in the case where will be considered in section 5, as the benefit from recombination then depends on the actual population as well as the landscape.
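The one-generation statements above can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes the standard haploid update in which selection acts first and recombination then shifts the frequencies by r times the post-selection linkage disequilibrium. The genotype labels "00" (wild type) to "11" (double mutant), the function names, the effective recombination probability r and the example fitness values are all illustrative assumptions.

```python
# Hypothetical two-locus, two-allele haploid model (illustrative names only).
GENOTYPES = ("00", "01", "10", "11")


def mean_fitness(x, f):
    """Average population fitness for frequencies x and fitnesses f."""
    return sum(f[g] * x[g] for g in GENOTYPES)


def select(x, f):
    """Proportional selection: frequencies after selection only."""
    fbar = mean_fitness(x, f)
    return {g: f[g] * x[g] / fbar for g in GENOTYPES}


def recombine(x, r):
    """Recombination with assumed effective probability r between the loci."""
    d = x["11"] * x["00"] - x["10"] * x["01"]  # linkage disequilibrium
    return {
        "00": x["00"] - r * d,
        "11": x["11"] - r * d,
        "01": x["01"] + r * d,
        "10": x["10"] + r * d,
    }


def one_generation_gain(x, f, r):
    """Extra optimal-genotype frequency and extra mean fitness, after one
    generation, of selection plus recombination over selection only."""
    s = select(x, f)
    sr = recombine(s, r)
    return sr["11"] - s["11"], mean_fitness(sr, f) - mean_fitness(s, f)


if __name__ == "__main__":
    # Fit double mutant absent from the initial population.
    x0 = {"00": 0.8, "01": 0.1, "10": 0.1, "11": 0.0}
    for f11 in (1.6, 2.0, 3.0):  # negative, zero, positive additive epistasis
        f = {"00": 1.0, "01": 1.5, "10": 1.5, "11": f11}
        print(f11, one_generation_gain(x0, f, r=0.5))
```

Under these assumptions the gain in the double-mutant frequency is positive for every landscape, whereas the sign of the gain in average fitness follows the sign of f11 + f00 - f10 - f01 and vanishes (up to rounding) in the additive case.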
Turning now to the case of deleterious mutants: in this case we take the wild type to be the genotype and the types and to be deleterious single mutants and to be an even more deleterious double mutant. In this case, just as for beneficial mutants, and hence the proportion of optimal wild types increases. In terms of average population fitness, the increase from generation to is given by Equation (27). In other words the change in average population fitness per generation for the case of beneficial versus deleterious mutations is identical if we are considering the same fitness landscape.
4.1.2 Asymptotic behavior of
Before going on to consider the full numerical solution of the two-locus model we will consider what can be said analytically about the asymptotic behavior of the system. Although there are 7 parameters that control the dynamics, the asymptotic behavior can be most naturally written in terms of just two parameters
where, for brevity, we use for , and
The one generation evolution equation for is
Without loss of generality we again choose to be the optimal genotype. The evolution of the genotype frequencies, , as given by equation (1), ensures the eventual dominance of one of the genotypes (Karlin, see for example karlin1975general (), section vii, has shown that there are no stable polymorphisms in the model type considered in this paper). The first part of this derivation is analogous to section 3 in feldman1970quasilinkage (). We suppose a priori that the limit
exists, which in turn implies that
With these elements in hand we can calculate the putative limit of equation (30) to find:
Solving this last equation for we obtain:
Finally, since and , we note that the negativity of is equivalent to the condition
which reduces to for . So, we can see that the asymptotic benefit of recombination in terms of increasing the fraction of optimal genotypes relative to selection only, is determined by only 2 parameters - and and is independent of the initial population.
With this formula in hand, we can easily map any fitness landscape to a range of values for and thus determine if recombination will be asymptotically favorable for that particular landscape. we have
To simplify further the visualization of the asymptotic behavior, we again assume that , i.e., that the two mutants have the same fitness. As the optimal genotype eventually dominates for non-deceptive landscapes, recombination is asymptotically neutral. However, how approaches zero depends on . Small values of and correspond to a more neutral fitness landscape, where selection effects are small. For an additive landscape and so recombination is asymptotically beneficial in that tends to zero from negative values. Small values of relative to correspond to highly positively additively epistatic landscapes and in this case and recombination is asymptotically disadvantageous in that approaches zero from positive values. The multiplicative landscape with and, hence, , separates the two classes of behavior. The dependence of the parameter (= as a function of and is shown in the next graph. Values of greater than mean that the iterates must eventually reach negative values of . The sign of is then conserved, although the magnitude approaches zero as the system reaches linkage equilibrium associated with a population dominated by the optimal genotype. The opposite happens when . Note that the locus defined by the intersection of the surfaces and is given by and corresponds to the case of multiplicative landscapes.
5 Exact Numerical Results
Turning now to the non-asymptotic behavior, we performed an exploration of the 7 dimensional parameter space of the two-locus, two-allele system to determine under which conditions recombination is beneficial in terms of our two metrics (21) and (22). In such a high dimensional space, visualization of the resulting graphs requires separation into several distinct cases. We set in all that follows, as just affects the magnitude of the effects of recombination but not whether it is beneficial, which is controlled by the sign of . (Save for the non-generic values and , there are no important qualitative changes as a function of the recombination probability.)
5.1 Recombination as a function of fitness landscape
We first consider graphs for arbitrary fitness landscapes but for a fixed initial population, with a further subdivision into cases made according to the type of initial population. As we have fixed and set we display the graphs as functions of and . The valid region, all fitnesses positive with the genotype as optimum, is given by , and . The deceptive region is given by . For ease of interpretation we also show lines associated with the multiplicative limit (yellow) and the additive limit (green). Note that both the additive and multiplicative limits require . The “needle-in-a-haystack” landscape is given by , and lies on the border that separates non-deceptive and deceptive landscapes. The point , corresponds to a flat fitness landscape where there is no selection pressure.
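To make the landscape classes named above concrete, the sketch below labels a landscape from its four genotype fitnesses. It is an illustration under assumed conventions, not the paper's definitions verbatim: additive epistasis is taken as f11 + f00 - f10 - f01, multiplicative epistasis as f11 f00 - f10 f01, and a landscape is called deceptive when both single mutants are less fit than the wild type while the double mutant is fitter than it. The function name and dictionary keys are hypothetical.

```python
def describe_landscape(f00, f10, f01, f11, tol=1e-12):
    """Classify a two-locus, two-allele landscape (assumed conventions)."""
    e_add = f11 + f00 - f10 - f01   # additive epistasis
    e_mul = f11 * f00 - f10 * f01   # multiplicative epistasis
    return {
        "additive_epistasis": e_add,
        "multiplicative_epistasis": e_mul,
        "is_additive": abs(e_add) <= tol,
        "is_multiplicative": abs(e_mul) <= tol,
        # Both single mutants worse than the wild type, double mutant better.
        "is_deceptive": f10 < f00 and f01 < f00 and f11 > f00,
        # Needle-in-a-haystack: only the double mutant differs from the rest.
        "is_needle": abs(f10 - f00) <= tol and abs(f01 - f00) <= tol
                     and f11 > f00,
        # Boolean OR: one optimal allele already gives the optimal fitness.
        "is_or": abs(f10 - f11) <= tol and abs(f01 - f11) <= tol
                 and f11 > f00,
    }


if __name__ == "__main__":
    examples = {
        "additive": (1.0, 1.5, 1.5, 2.0),
        "multiplicative": (1.0, 2.0, 2.0, 4.0),
        "deceptive": (1.0, 0.5, 0.5, 2.0),
        "needle": (1.0, 1.0, 1.0, 2.0),
        "or": (1.0, 2.0, 2.0, 2.0),
    }
    for name, fs in examples.items():
        print(name, describe_landscape(*fs))
```

With these conventions the additive example has negative multiplicative epistasis, and the needle-in-a-haystack example sits exactly on the boundary of the deceptive region, in line with the description in the text.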
Two kinds of graphs are provided, one that displays the value of the SWLD coefficient in different generations, and another that displays (Equation (13)), defined as the change in average fitness between generation and generation in a population evolving with both selection and recombination minus the change in average fitness of the same population but evolving with selection only. In the graphs we show four representative time slices - , 2, 6, and 10 generations after the initial one. The plane that separates the recombination advantageous/disadvantageous regimes is displayed (turquoise in the online version). For a given generation, those values of and where are shaded in red (below the plane), while those where correspond to a darker shading (above the plane).
5.1.1 Initial Population
In this first case we consider the dynamics when the initial population is dominated by the non-optimal wild type , with , , , . So, we are here interested in the effects of recombination on the dynamics of favourable mutations as a function of the fitness landscape and in the background of an initial population dominated by a non-optimal wild type. We fix and study the variation in as a function of and , remembering the restrictions and . The most notable feature of Figure 3 is that negative values of are most associated with additive or negatively epistatic landscapes. Note that earlier in the evolution, , the benefits of recombination are clear to see, even for quite positively epistatic interactions, with only deceptive landscapes showing a disadvantage. This, however, is due to this region being still in the search regime, as the initial frequency of optimal genotypes was zero. Gradually, the population moves away from the search regime and enters the modular regime, where we see that it is only for landscapes that are either weakly positively epistatic, additive or negatively epistatic that recombination is beneficial. Note that the relative benefit of recombination is not fixed but evolves, thus showing the dependence on the relative frequencies of the different genotypes. In terms of BBs, becomes positive when so, as the frequency of the optimal type increases, eventually recombination becomes unfavourable relative to selection only, with the point at which it becomes unfavourable, , being dependent on the fitness landscape, as well as the initial population.
Turning now to the graphs of the change in average fitness of the population: at , in the search regime, we see that recombination leads to an increase in average population fitness, over and above that of selection only, for basically all landscapes. This is due to the addition of optimal genotypes in an initial population dominated by the non-optimal wild type. Gradually, however, the effect of recombination diminishes as one enters the modular regime so that for positively epistatic landscapes the difference between selection only and recombinative dynamics is minimal. However, we note that there is still a strongly pronounced effect for either weakly positively epistatic, additive or weakly negatively epistatic landscapes.
So, how do we interpret these results in terms of BBs? Both in the search and modular regimes the advantage of recombination is associated with the fact that BBs of the optimal genotype, and , are recombined to form the type . As the graphs show, this recombination of BBs is, in fact, a more efficient process in generating optimal types and increasing overall population fitness than selection alone for weakly epistatic landscapes. In fact, the benefit in the search regime is actually relatively independent of the degree of epistasis of the landscape. Later on though, in the modular regime, the generation of optimal genotypes by recombining optimal BBs competes against the generation that evolved through pure selection effects. For positively epistatic landscapes, once there are enough optimal types selection can produce new ones as or more efficiently than recombination. For modular landscapes however, recombination retains its advantage. Indeed, this is, in fact, what characterizes the modular regime, i.e., that weakly epistatic BBs or modules are juxtaposed by recombination into even fitter genotypes leading to a faster evolution and a faster increase in average population fitness. The fact that the recombination is even more beneficial in the presence of additive negative epistasis is due to the fact that the destruction of the optimal type produces two single mutants that have fitness very similar to that of the optimal type. This is the advantage of genetic redundancy.
5.1.2 Initial Population
We now turn to the case where the initial population is dominated by the optimal genotype as the wild type, with the presence of genotypes with a single deleterious mutation and a small proportion of deleterious double mutant genotypes. Specifically, , , and . The question now is: What are the dynamics of the deleterious mutations in the population as a function of the landscape parameters? Once again, we fix and study the variation in as a function of and .
In Figure 5 the first thing to notice is that, in contrast to the case where the initial population is dominated by the non-optimal genotype, here there is no distinct behavior associated with the search regime, as the optimal genotype is already dominant in the population. Thus, for positively epistatic landscapes the difference due to recombination is small. However, for additive or negatively epistatic landscapes we see that recombination is advantageous, with the advantage being more significant in the presence of negative epistasis. This is due to the fact that in such landscapes the elimination of the suboptimal double mutant is more efficient.
Considering now the average population fitness, we see clearly in Figure 6 how the advantage of recombination manifests itself in the modular regime where epistasis is weak. Interestingly, we see how negatively epistatic landscapes are, in the early part of the evolution, associated with . This is due to the fact that for negative epistasis the overall contribution to the population fitness of a deleterious double mutant and an optimal genotype is less double mutant, selection can eliminate the mutations thereby purifying the population more efficiently than selection alone. The more modular the landscape the more efficient this process becomes.
5.1.3 Initial Population , ,
We now consider a scenario similar to that of sub-section 5.1.1, where the initial proportion of optimal genotypes is zero; now, however, the frequency of the BBs, and , represented by the beneficial mutants and , relative to the less fit wild type is much higher. Concretely, the initial population is: , , and so that the BBs and each form about a quarter of the population.
We see in Figure 7 that the graphs are qualitatively similar to those of Figure 3. The chief difference now is that recombination is even more disadvantageous in the search regime for deceptive landscapes than before and more advantageous for modular landscapes - weak or zero positive epistasis or negative epistasis. This is due to the wider availability of the BBs and thus obstructing/facilitating the construction of the optimal type according to whether the landscape is deceptive or modular. As evolution progresses, as before, we see a passage from the search regime to the modular regime, where the relative benefit of recombination is restricted to weakly positively epistatic, additive or negatively epistatic landscapes.
Similarly, Figure 8 resembles the corresponding graphs of Figure 4, the average population fitness showing a strong increase, relative to the selection only case, due to the efficient formation of the optimal type, which in turn is due to the large number of BBs in the population. Even for strongly epistatic landscapes there is a strong benefit to recombination in this regime. At later times, in the modular regime, we see that the advantage of recombination is again associated with additive, weakly positively epistatic or negatively epistatic landscapes, i.e., modular landscapes.
So, we see that the principal effect of increasing the BB frequency in the initial population is to accelerate the rate of evolution so that the frequency of the optimal genotype and the average population fitness increase more rapidly.
5.1.4 Initial Population ,
We now look at an even more extreme case, where the initial population is completely dominated by the single mutants and with the initial population being , , and . Qualitatively the results are as in sub-sections 5.1.3 and 5.1.1, the strong presence of the BBs and leading to a very efficient production of the optimal genotype . This is, in fact, another good illustration of Muller’s ratchet. Although recombination leads to the generation of optimal genotypes it also leads to the production of the sub-optimal double mutants . The latter, however, as the graphs clearly show, are flushed out by selection. In fact, as Figure 9 shows, they are produced and then flushed out most efficiently in the presence of recombination for modular landscapes when compared to selection only.
5.1.5 Initial Homogeneous Population
The final initial population type we will consider is that of a uniform initial population where all genotypes have the same initial frequency, . Here we see behaviour that is qualitatively similar to that found for other populations. The chief difference here is that given the ample presence of the optimal genotype in the initial population there is no search regime and so the dynamics begins and remains in the modular regime. With no population bias we can see the role played by the multiplicative limit with at being positive for landscapes with positive multiplicative epistasis and, particularly, deceptive landscapes. It is negative for weakly positively epistatic, additive and negatively epistatic landscapes. As evolution progresses we can see that the relative advantage diminishes such that at the advantage of recombination is only noticeable for larger negative epistasis.
In terms of average population fitness in Figure 12 we see an analogous story: at average population fitness is increased only for landscapes with negative multiplicative epistasis, up to the additive limit, but is, in fact, negative for negative additive epistasis. However, as evolution progresses, once again, we see the dominant role played by modular landscapes - i.e., weakly positively epistatic, additive and negatively epistatic landscapes.
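The first-generation behaviour described for the uniform population can be reproduced with a short calculation. The sketch below is illustrative only: it assumes the standard update in which selection is followed by recombination at an effective probability r, and the genotype labels, function names and fitness values are hypothetical choices rather than the paper's notation.

```python
# Hypothetical two-locus haploid model, uniform initial population.
GENOTYPES = ("00", "01", "10", "11")


def mean_fitness(x, f):
    """Average population fitness for frequencies x and fitnesses f."""
    return sum(f[g] * x[g] for g in GENOTYPES)


def first_generation_gain(f, r=0.5):
    """Gain from recombination over selection only, after one generation,
    starting from equal genotype frequencies (assumed update order:
    selection, then recombination)."""
    x = {g: 0.25 for g in GENOTYPES}
    fbar = mean_fitness(x, f)
    s = {g: f[g] * x[g] / fbar for g in GENOTYPES}   # selection only
    d = s["11"] * s["00"] - s["10"] * s["01"]        # post-selection LD
    sr = {"00": s["00"] - r * d, "11": s["11"] - r * d,
          "01": s["01"] + r * d, "10": s["10"] + r * d}
    return sr["11"] - s["11"], mean_fitness(sr, f) - mean_fitness(s, f)


def landscape(f11):
    """Example landscapes with f00 = 1 and f10 = f01 = 2 (illustrative)."""
    return {"00": 1.0, "01": 2.0, "10": 2.0, "11": f11}


if __name__ == "__main__":
    # f11 = 4 is multiplicative and f11 = 3 is additive for these values.
    for f11 in (2.5, 3.0, 3.5, 4.0, 5.0):
        print(f11, first_generation_gain(landscape(f11)))
```

Under these assumptions the gain in optimal-genotype frequency changes sign at the multiplicative landscape, while the gain in average fitness is positive only between the additive and multiplicative limits and negative for negative additive epistasis.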
5.2 Recombination as a function of population
Having explored the effect of recombination on the space of fitness landscapes, by varying continuously the landscape parameters and for a variety of distinct initial populations, we now take the complementary viewpoint of considering how the effect of recombination changes by varying continuously the initial population for a variety of fixed fitness landscapes. Due to the conservation of probability, the population vector is characterized by only three frequencies. For simplicity of visualization we will consider initial populations such that and consider the population dynamics as a function of and .
A general observation on almost all the graphs in this section is that, since there is generic convergence to the optimal genotype for non-deceptive landscapes, all the surfaces clearly have in the corner.
5.2.1 Additive landscape .
The first landscape we will consider is an additive landscape (). For this landscape (Figure 13) the tendency is clear, that the more BBs and the fewer optimal types there are, the more recombination helps. This is again a manifestation of the search regime. In this landscape, as can be seen at , recombination in terms of is only unfavorable when the proportion of optimal types is appropriately larger than the frequencies of the BBs, as then selection can act more efficiently to increase the frequency of the optimal type than can recombination of the single mutants. However, we see that this effect is temporary. By basically any initial population is associated with . We can see that the SWLD increases in time, approaching zero asymptotically, this regime being associated with the approach to a population completely dominated by the optimal genotype. This dynamics, in fact, shows an important universality associated with recombination, that demonstrates the role of Muller’s ratchet: that the action of recombination is to drive the system to particular frequencies for the optimal type and its BBs that correspond to quite special initial conditions at . To understand this, note that at and the proportion of optimal types is high. If we imagine the value of , for example, that is a consequence of evolution in the presence of recombination, then we can map those values such as to imagine them as initial conditions, say at , for further evolution. However, we can observe at that values of close to 1 correspond to positive values of except in a very narrow wedge where the values of are as high as possible. This wedge is associated precisely with a lower relative frequency of the suboptimal