Number of adaptive steps to a local fitness peak
We consider a population of genotype sequences evolving on a rugged fitness landscape with many local fitness peaks. The population walks uphill until it encounters a local fitness maximum. We find that the statistical properties of the walk length depend on whether the underlying fitness distribution has a finite mean. If the mean is finite, all the walk length cumulants grow with the sequence length but approach a constant otherwise. Experimental implications of our analytical results are also discussed.
Ecology and evolution; Probability theory, stochastic processes, and statistics; Random walks and Lévy flights
The evolutionary process of adaptation is common in nature, and during the last decades the dynamics of adaptation have been studied in several experiments on microbial populations. The nature of the adaptive process depends crucially on the availability of beneficial mutations that improve the fitness. If such mutations are readily available, as in populations of very large size, the dynamics are well described by a deterministic theory, while for moderately large populations a stochastic theory that accounts for multiple competing mutations can be applied. Here we work in the parameter regime where beneficial mutations are rare and a population of genotype sequences performs an adaptive walk on a fitness landscape [6, 7].
More precisely, the adaptive walk model assumes that the number of mutants produced per generation is small, so that the population is genetically homogeneous and may be represented by a single particle. Under this weak-mutation assumption, sequences that differ from the current one by more than one mutation are also inaccessible. Furthermore, sequences carrying mutations that decrease the fitness do not survive, and hence the adaptive walker always walks uphill. On a rugged fitness landscape with many local optima, the walk ends when a local fitness maximum is encountered, since a better fitness is at least two mutations away, as illustrated in Fig. 1. Remarkably, under these assumptions the model depends only on a small set of parameters, namely the sequence length and the fitness distribution underlying the fitness landscape. Recently, some theoretical predictions for the first step in the walk were tested in an experiment on an ssDNA virus, and reasonable agreement between theory and experiment was found. As the adaptive walk is a simple and biologically realistic model of adaptation, it is important to analyse it in detail to extend our present understanding of adaptation dynamics.
In this Letter, we focus on the statistical properties of the length of the adaptive walk, defined as the number of beneficial mutations accumulated until the population reaches a local fitness maximum. Recently, the walk length distribution was calculated within an approximation for the model described above, and the mean walk length was computed exactly in a simplified version of the adaptive walk. However, these studies assume that the fitness distribution has a finite mean. Here we relax this assumption and find that, in the limit of infinitely long sequences, there is a transition in the behavior of the walk length distribution: it vanishes for fitness distributions with finite mean but remains finite otherwise. For finite sequences, this result implies that the walk length diverges with the sequence length for distributions with finite mean. For such distributions, we show that all the walk length cumulants grow logarithmically with the sequence length and find the proportionality constant for the first few cumulants. Our analytical results are compared with numerical results, and their experimental implications are also discussed.
We work with binary sequences of fixed length, so that the number of neighbors of each sequence that are one mutation away equals the sequence length. As the fitness always increases in an adaptive walk (see Fig. 1), mutants that lower the current fitness of the walker are rejected, and a mutant with a given fitness is chosen with a transition probability proportional to the fitness difference. Thus the normalised transition probability is given by
where the fitnesses are independent random variables chosen from a common distribution with support on the interval . Following previous works [12, 11], we choose the fitnesses from a generalised Pareto distribution defined as
where the fitness is unbounded for and for . The distribution of the beneficial mutations is however governed by the upper tail of the fitness distribution  and hence can be one of the three universal distributions only [13, 14]. The fitness distribution lies in the domain of the extreme value distribution given by Weibull distribution for , Gumbel distribution if and Fréchet distribution for . Although much of the experimental data on distribution of beneficial mutations is consistent with [15, 9], recent works also support  and .
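As an illustration of how fitnesses of this kind can be generated numerically, the following minimal sketch draws samples by inverse-transform sampling. It assumes the common unit-scale parameterisation of the generalised Pareto distribution with shape parameter kappa, whose cumulative distribution is 1 - (1 + kappa f)^(-1/kappa) and which reduces to the exponential distribution at kappa = 0. This specific form and the function names `gpd_quantile` and `draw_gpd` are assumptions made for the example.

```python
import math
import random


def gpd_quantile(u, kappa):
    """Invert the assumed CDF F(f) = 1 - (1 + kappa*f)**(-1/kappa) at 0 <= u < 1."""
    if kappa == 0.0:
        # kappa -> 0 limit: exponential distribution with unit mean.
        return -math.log(1.0 - u)
    return ((1.0 - u) ** (-kappa) - 1.0) / kappa


def draw_gpd(kappa, rng):
    """Draw one fitness value by inverse-transform sampling."""
    return gpd_quantile(rng.random(), kappa)


rng = random.Random(0)
for kappa in (-1.0, 0.0, 2.0):
    sample = [draw_gpd(kappa, rng) for _ in range(5)]
    print(kappa, [round(x, 3) for x in sample])
```

Under this assumed parameterisation the support is bounded above by -1/kappa for negative kappa and unbounded otherwise, and kappa = -1 gives the uniform distribution on the unit interval.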
The adaptive walk in the limits is well studied theoretically. When , the adaptive walk model reduces to a greedy walk  for which the walk length distribution is finite for infinitely long sequences  while for , a random adaptive walk is obtained  for which the walk length distribution is a Poisson distribution with mean . Recently the adaptive walk model described above was studied in detail for and and the walk length distribution was computed . Here we are interested in the properties of adaptive walk when is arbitrary but finite.
Following , we consider the conditional probability that the walker takes at least steps and has a fitness at the th step given that the initial fitness is . For long sequences, one can write down the following recursion relation for :
where . The above equation expresses the fact that the walker can proceed to the next step if at least one fitness value greater than the current fitness is available which occurs with a probability . The walk length distribution that exactly steps are taken is related to according to the following relation :
This is because in order to terminate the walk at the th step, none of the mutant fitnesses at the next step should exceed the fitness at step . In the following, we set the initial fitness to be zero, which ensures that the walker does not start at a local fitness maximum.
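The walk rules described above can be sketched as a short simulation. This is a minimal illustration, not necessarily the procedure used for the numerical results reported here. It assumes that an independent set of neighbor fitnesses is drawn afresh at every step, that the walk starts at fitness zero, and that the caller supplies the fitness sampler. The names `adaptive_walk_length`, `num_neighbors` and `draw_fitness` are hypothetical.

```python
import random


def adaptive_walk_length(num_neighbors, draw_fitness, rng, start_fitness=0.0):
    """Return the number of uphill steps taken before a local maximum is hit."""
    fitness = start_fitness
    steps = 0
    while True:
        # Draw a fresh, independent set of one-mutant neighbor fitnesses.
        neighbors = [draw_fitness(rng) for _ in range(num_neighbors)]
        fitter = [h for h in neighbors if h > fitness]
        if not fitter:
            # No neighbor is fitter: the walker sits on a local maximum.
            return steps
        # Choose a fitter neighbor with probability proportional to the
        # fitness difference h - fitness.
        weights = [h - fitness for h in fitter]
        fitness = rng.choices(fitter, weights=weights, k=1)[0]
        steps += 1


rng = random.Random(1)
# Exponentially distributed fitnesses, chosen here only as an example.
lengths = [adaptive_walk_length(100, lambda r: r.expovariate(1.0), rng)
           for _ in range(1000)]
print("mean walk length:", sum(lengths) / len(lengths))
```

Passing the sampler as an argument keeps the walk rule separate from the choice of fitness distribution, so the same function can be reused for bounded, exponential or fat-tailed fitnesses.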
3 Transition in the behavior of walk length
Using a scaling analysis and extreme value theory, we now show that the qualitative behavior of walk length distribution changes at . We find that the walk length distribution vanishes for but remains finite for as . We note that the behavior of discussed above for is in accordance with our result.
where . A generating function for the distribution can be calculated (see (14)) which shows that is finite. Thus from (4), it immediately follows that as for all . Our numerical results in Fig. 2 for show that for , the distribution and for , . Thus the distribution decreases with increasing .
For , the sum in (1) cannot be replaced by an integral, as the mean of the distribution is infinite. For such fat-tailed distributions, the sum of random variables is dominated by the largest value among them [13, 14]. If at most one fitness exceeds , we have or for any . Using this result in the recursion equation (3) and changing the variable to , we find that for ,
where the proportionality constant depends on and is omitted for brevity. Since the distribution for large can be written as
it follows that for large , the fitness distribution at the th step of the adaptive walk is of the following scaling form:
4 Walk length cumulants for fitness distributions with finite mean
For , the probability that the walk terminates at the th step is zero; in other words, the walk goes on indefinitely for infinitely long sequences, and hence the mean number of adaptive steps diverges with the sequence length. We now show that all the walk length cumulants increase logarithmically with the sequence length.
where prime denotes a -derivative. The boundary conditions are given by 
As (9) is non-diagonal in , we work with a generating function which obeys the following second order differential equation:
The above differential equation does not appear to be exactly solvable due to the factor on the RHS. As this cumulative probability decreases from one to zero with increasing fitness $f$, we consider (11) by approximating this factor by $1$ for $f < \tilde{f}$ and by $r(f)$ for $f > \tilde{f}$, where as found earlier. Equation (11) has been solved by choosing in for and . Here we show that the leading order behavior of the cumulants does not depend on the choice of . For , as a result of (4), we have
whose solution is of the form where
and the constants can be determined using the boundary conditions (10) to finally yield
whose solution is of the form
where the functions obey (15) and are constants. In order to compute the walk length cumulants for large , it is sufficient to find the dependence of . This can be done by matching the solutions and and their first derivative at and we find
where are independent of .
where the integral
is independent of which can be seen using the upper bounds namely for and infinity for . Since for any , on taking the limit in (21), we find that the walk length distribution vanishes as discussed earlier.
The fact that is independent of leads to a considerable simplification of the problem and allows us to find the cumulants to leading order in sequence length. The th cumulant is defined as 
where . As the first term on the RHS of (21) decays less rapidly than the second term for any , we have . Using this, we immediately obtain the cumulants to leading order in as
where . Thus we find that all the walk length cumulants increase logarithmically with the sequence length. The first three cumulants computed using the last expression are given by
In the limit , all the above cumulants are equal to in agreement with the results for the random adaptive walk. We also recover the previous results for uniformly and exponentially distributed fitnesses. Equations (24) and (25) also match the results of in which a fixed set of mutants during the entire walk is assumed. In contrast, we have considered a more realistic mutation scheme in which a novel set of mutants is available to the population at each adaptive step. The above expressions for and have also been seen in a deterministic model of evolution, and a relationship of this model to adaptive walks has recently been elucidated. Figure 3 shows that our expressions (24)-(26) agree very well with the numerical results.
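For comparison with simulated walks, the first three cumulants of a sample of walk lengths can be estimated from its mean, variance and third central moment, to which they are respectively equal. The sketch below uses plain population moments without bias correction. The function name `first_three_cumulants` is hypothetical and the input data are made up for illustration.

```python
def first_three_cumulants(samples):
    """Return (mean, variance, third central moment) of a non-empty sample."""
    n = len(samples)
    mean = sum(samples) / n
    # Second and third central moments equal the second and third cumulants.
    second = sum((x - mean) ** 2 for x in samples) / n
    third = sum((x - mean) ** 3 for x in samples) / n
    return mean, second, third


walk_lengths = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up data for illustration
print(first_three_cumulants(walk_lengths))
```

For Poisson-distributed data the three estimates should be close to one another, since every cumulant of a Poisson distribution equals its mean.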
In this article, we studied a biologically realistic model of adaptation and showed that, to leading order for long sequences, the average walk length is a constant for fitness distributions with infinite mean but increases logarithmically with the sequence length otherwise. Our analytical results agree well with the numerical simulations.
Our broad theoretical result that adaptive walks are short (see Fig. 3) is consistent with the experiments on microbes and fungus in which adaptive substitutions have been observed. However, more detailed experimental studies are needed to test our predictions. Our result (24) shows that the walk should last longer in systems with smaller . This may be checked by measuring the mean walk length in populations with , and . To find the dependence of walk length properties on , varying the sequence length may not be experimentally viable, but it should be possible to set up experiments along the lines of and vary the initial fitness rank. If the initial ranks are of the order , we expect our analysis to hold. Experimental data for the walk length distribution showing insensitivity to the initial rank would then imply an underlying fat-tailed fitness distribution with infinite mean.
The author thanks J. Krug for useful comments on the manuscript.
- H. A. Orr. Nat. Rev. Genet., 6:119–127, 2005.
- S.F. Elena and R.E. Lenski. Nat. Rev. Genet., 4:457–469, 2003.
- K. Jain and J. Krug. Genetics, 175:1275, 2007; K. Jain, J. Krug, and S.-C. Park. Evolution, 65:1945, 2011.
- K. Jain and J. Krug. J. Stat. Mech.: Theor. Exp., P04008, 2005; D.B. Saakian and C.-K. Hu. Proc. Natl. Acad. Sci. USA, 103:4935–4939, 2006.
- P.J. Gerrish and R.E. Lenski. Genetica, 102:127–144, 1998; S.-C. Park and J. Krug. Proc. Natl. Acad. Sci. USA, 104:18135–18140, 2007.
- J. Maynard Smith. Nature, 225:563, 1970.
- J. H. Gillespie. Oxford University Press, Oxford, 1991.
- H. A. Orr. Evolution, 56:1317–1330, 2002; H. A. Orr. Evolution, 60:1113, 2006.
- D.R. Rokyta, P. Joyce, S.B. Caudle, and H.A. Wichman. Nat. Genet., 37:441–444, 2005.
- K. Jain and S. Seetharaman. arXiv:1104.5583 (to appear in Genetics)
- J. Neidhart and J. Krug. arXiv:1105.0592 (to appear in Phys. Rev. Lett.)
- P. Joyce, D.R. Rokyta, C. J. Beisel and H.A. Orr. Genetics, 180:1627-1643, 2008.
- J.-P. Bouchaud and A. Georges. Phys. Rep., 195:127-293, 1990.
- D. Sornette. Springer, Berlin, 2000.
- A. Eyre-Walker and P.D. Keightley. Nat. Rev. Genet., 8:610, 2007.
- D.R. Rokyta, C. J. Beisel, P. Joyce, M. T. Ferris, C. L. Burch, and H.A. Wichman. J Mol Evol, 69:229, 2008.
- H. A. Orr. J. theor. Biol., 220:241–247, 2003.
- H. Flyvbjerg and B. Lautrup. Phys. Rev. A, 46:6714–6723, 1992.
- C. Sire, S.N. Majumdar and D.S. Dean. J. Stat. Mech., L07001, 2006.
- S.E. Schoustra, T. Bataillon, D.R. Gifford and R. Kassen. PLoS Biol., 7:e1000250, 2009.