Universal asymptotic clone size distribution for general population growth
Abstract
Deterministically growing (wildtype) populations which seed stochastically developing mutant clones have found an expanding number of applications from microbial populations to cancer. The special case of exponential wildtype population growth, usually termed the LuriaDelbrück or LeaCoulson model, is often assumed but seldom realistic. In this article we generalise this model to different types of wildtype population growth, with mutants evolving as a birthdeath branching process. Our focus is on the size distribution of clones  that is the number of progeny of a founder mutant  which can be mapped to the total number of mutants. Exact expressions are derived for exponential, powerlaw and logistic population growth. Additionally for a large class of population growth we prove that the long time limit of the clone size distribution has a general twoparameter form, whose tail decays as a powerlaw. Considering metastases in cancer as the mutant clones, upon analysing a dataset of their size distribution, we indeed find that a powerlaw tail is more likely than an exponential one.
Keywords:
LuriaDelbrück Branching process Clone sizeCancer∎
1 Introduction
Cancerous tumours spawning metastases, bacterial colonies developing antibiotic resistance or pathogens kickstarting the immune system are examples in which events in a primary population initiate a distinct, secondary population. Regardless of the scenario under consideration, the number of individuals in the secondary population, and how they are clustered into colonies, or clones, is of paramount importance. An approach which has offered insight has been to bundle the complexities of the initiation process into a mutation rate and assume that the primary, or wildtype, population seeding the secondary, or mutant, population is a random event.
This method was pioneered by microbiologist Salvador Luria and theoretical physicist Max Delbrück Luria and Delbrück (1943). In their Nobel prize winning work, they considered an exponentially growing, virus susceptible, bacterial population. Upon reproduction, with small probability, a virus resistant mutant may arise and initiate a mutant clone. This model was contrasted with each wildtype individual developing resistance upon exposure to the virus with a constant probability per individual. By considering the variance in the total number of mutants in each case they demonstrated that bacterial evolution developed spontaneously as opposed to adaptively in response to the environment.
In the original model of Luria and Delbrück, both wildtype and mutant populations grow deterministically, with mutant initiation events being the sole source of randomness. Lea and Coulson Lea and Coulson (1949) generalised this process by introducing stochastic mutant growth in the form of the pure birth process and were able to derive the distribution of the number of mutants for neutral mutations. This was again extended by Bartlett and later Kendall Bartlett (1955); Kendall (1960), who considered both populations developing according to a birth process. An accessible review discussing these formulations is given by Zheng Zheng (1999).
Recent developments have focused on cancer modelling, where usually mutant cell death is included in the models. The main quantity of interest in these studies has been the total number of mutant cells. Explicit and approximate solutions appeared for deterministic, exponential wildtype growth, corresponding to a fixed size wildtype population Angerer (2001); Dewanji et al (2005); Iwasa et al (2006); Komarova et al (2007); Keller and Antal (2015), and fully stochastic wildtype growth either at fixed time or fixed size Durrett and Moseley (2010); Antal and Krapivsky (2011); Kessler and Levine (2015). An exciting recent application has been to model emergence of resistance to cancer treatments Kessler et al (2014); Bozic et al (2013); Bozic and Nowak (2014). The current study continues in this vein with our inspiration being primary tumours (wildtype) seeding metastases (mutant clones).
Interestingly, in the large time small mutation rate limit, the clone size distribution at a fixed wildtype population size coincide for stochastic and deterministic exponential wildtype growth Kessler and Levine (2015); Keller and Antal (2015). The intuition behind this observation is that a supercritical birthdeath branching process converges to exponential growth in the large time limit, and, for a small mutation rate, mutant clones are initiated at large times. So asymptotically the two methods are equivalent, but the deterministic description of the wildtype population has twofold advantages: (i) the calculations are much simpler in this case Keller and Antal (2015), and (ii) the method can be easily generalised to arbitrary growth functions. This is the programme that we develop in the present paper.
The present work differs from previous approaches in two ways. Firstly, motivated by populations with environmental restrictions, we move away from the assumption of exponential wildtype growth, a setting which has received limited previous consideration as discussed in Foo and Michor (2014). We shall first review and extend results for the exponential case, and then provide explicit solutions for powerlaw and logistic growth. Next we present some general results which are valid for a large class of growth functions. This extends the classic results found in Kendall (1948); Athreya and Ney (2004); Karlin and Taylor (1981); Tavare (1987) and recent work in Tomasetti (2012); Houchmandzadeh (2015) who considered the wildtype population growth rate to be timedependent but coupled with the mutant growth rate. Secondly, rather than the total number of mutants, our primary interest is on the distribution of mutant number in the clones initiated by mutation events. This complements Hanin et al (2006), which allowed deterministic wildtype and mutant growth, and the treatment of clone sizes for constant wildtype populations found in Dewanji et al (2011). Whilst we focus on clone sizes, we demonstrate that the distribution for the total number of mutants follows as a consequence, and hence results hold in that setting also.
The outline of this work is as follows. We define our model in Section 2, utilising formalism introduced in Karlin and Taylor (1981), and demonstrate a mapping between the mutant clone size distribution and the distribution for the total number of mutants. The exact timedependent size distribution is given for exponential, powerlaw and logistic wildtype growth function in Section 4. Section 5 pertains to universal features of the clone size distribution and contains our most significant results. There, for a large class of wildtype growth functions, we demonstrate a general two parameter distribution for clone sizes at large times. The distribution has powerlaw tail behaviour which corroborates previous work Iwasa et al (2006); Durrett and Moseley (2010); Williams et al (2016). Large time results are also given for the mean and variance of the clone sizes under general wildtype growth. Adopting the interpretation of the wildtype population as the primary tumour and mutant clones as metastases, we test our results regarding the tail of the distribution on empirical metastatic data in Section 6. Section 7 considers alternative methods to ours and we give some concluding remarks in Section 8.
2 Model
In our model a wildtype population gives rise to mutants during reproduction events. The arisen mutant also reproduces and so mutant clones stem from the original initiating mutant’s progeny. In many applications, the wildtype population is significantly larger than the mutant clones and so we treat the wildtype population’s growth as deterministic, with size dictated by a timedependent function . The mutant clones are smaller in comparison and so their growth is stochastic. For logistic wildtype growth a sample realisation of the process is shown in Figure 1. The exact formulation is now given.
2.1 The birthdeath process
Stochastic growth of mutants will follow a birthdeath branching process Athreya and Ney (2004). Time is scaled such that each mutant has unit birth rate and death rate . A brief note on converting our results to the case when the birth rate is arbitrary is given in Appendix B. Let be the size of a population at time , with . The forward Kolmogorov equation for the distribution is given by
(1) 
with . Its solution in terms of the generating function, given on page 76 of Bartlett (1955), is
(2) 
Due to our timescale, is the probability of eventual extinction for a mutant clone for , and is the mutant fitness. When and so the stochastic proliferation follows a pure birth or Yule process, the mutants will be denoted immortal. By expanding the generating function around , we obtain for the probability of the population size being a geometric distribution with a modified zero term
(3) 
with the shorthand notation
(4) 
For the particular case of a critical branching process, i.e. when birth and death rates are equal, the above probabilities are simplified by observing
(5) 
2.2 Mutant clone size distribution
Here we employ standard methods as outlined in, for instance, Karlin and Taylor (1981); Dewanji et al (2005). The system is observed at a fixed time and we let the number of wildtype individuals be denoted by for . Since mutants are produced by wildtype individuals, the rate of mutant clone initiations will be proportional to the product of and the mutation rate . More precisely, the process of clone initiations is an inhomogeneous Poisson process Karlin and Taylor (1998) with intensity . Let the Poisson random variable denote the number of clones that have been initiated by , which has mean
(6) 
Now, assuming , we consider a mutant clone sampled uniformly from the initiated clones and denote its size to be the random variable . The clone was initiated at the random time and as we must have , the density of is given by
(7) 
Where
(8) 
is the expected number of clones seeded when the mutation rate is unity. The size of the clone is dictated not only by the initiation time but also by its manner of growth, here the birthdeath process. Hence, by conditioning on the arrival time, we have
(9) 
An immediate consequence is that the generating function of the clone size is given by
(10) 
where is the generating function of the birthdeath process (2).
We make the following remarks on the above. (i) The mutation rate does not appear in the density for initiation times in (7), hence mutant clone sizes are independent of the mutation rate and thus all following results regarding clone sizes will be also. (ii) The integral in (9) is a convolution and as convolutions commute we may swap the arguments of the integrand functions (). (iii) If we start with wildtype individuals, so the wildtype follows , then both the numerator and denominator in (7) will have a factor of , which cancel. So henceforth, apart from when (used occasionally for analytic convenience), we set without loss of generality. (iv) By similar logic, a positive random amplitude for the wildtype growth function, i.e. for a general positive random variable , would also cancel and so our results on clone sizes hold in that case also.
3 Mapping distributions: clone size to total mutant number
This section is related to the classic LuriaDelbrück problem. Let be the total number of mutants existing at time . Then is the sum of generic clones
(11) 
where all are iid random variables specifying the clone sizes. As such, is a compound Poisson random variable, and hence its generating function is
(12) 
which can be derived by conditioning on . It follows that
(13) 
The link between the mass functions of the mutant clone size, , and the total number of mutants, , is given by
(14) 
This relationship may be found as Lemma 2 in Zheng (1999) and a short proof is provided for convenience in Appendix B, Lemma B.1. Hence while we may initially work in the setting of size distribution of a single clone, by the above discussion, results are transferable to the total number of mutants case.
Often longtime results are sought, which significantly reduces the complexity of the distributions. For any fixed positive mutation rate, in the longtime limit, an infinite number of clones will have been initiated, and thus the probability distributions of will not be tight Durrett (1996). A common solution to this problem is the Large PopulationSmall Mutation limit Keller and Antal (2015), where is kept constant. Then for exponential wildtype growth (or exponentialtype, see Section 5) the expected number of initiated clones, , tends to for large times. Hence we see that
(15) 
demonstrating that the limit of the clone size distribution is of primary concern. Furthermore, if the expected number of initiated clones is small, we have the following proposition, whose proof can be found in Appendix B.
Proposition 1.
For a small expected number of initiated clones, conditioned on survival, the size of a single clone and the total number of mutants are approximately equal in distribution. That is,
(16) 
One immediate consequence of this result is that for immortal mutants () and we have
(17) 
This agrees with intuition as for small enough , we expect only 0 or 1 clones to be initiated and hence the total number of mutants will be dictated by the clone size distribution. With exponential wildtype growth this approximation was used in Iwasa et al (2006) to investigate drug resistance in cancer.
4 Finite time clone size distributions
Three particular cases of wildtype growth function, , will be considered in detail, namely: exponential, powerlaw and logistic. Exponential and logistic growth are widely used in biological modelling Murray (2002). For the powerlaw cases, under the assumption that the radius of a spherical wildtype population is proportional to time, quadratic and cubic powerlaw growth represents mutation rates proportional to the surface area and volume respectively. In each case we give the generating function and probability mass function. We stress again that the mutation rate and an arbitrary positive prefactor for cancel in (9) and so are irrelevant for our results.
4.1 Exponential wildtype growth
Let the wildtype population grow exponentially, that is with and so from (8), . The distribution for the total number of mutants, , was treated exhaustively in Keller and Antal (2015) and we follow their notation by letting . Using (12) and the results found in section 3 of Keller and Antal (2015), the generating function is
(18) 
Similarly the mass function is
(19) 
and for
(20) 
Here is Gauss’s hypergeometric function and is the Pochhammer symbol defined in Appendix A. The above expressions are given in terms of to allow easy comparison to the formulas in Keller and Antal (2015). For these exact timedependent formulas, their form is somewhat cumbersome, however simpler longtime limit expressions are given in Section 5. A reduction in complexity is also obtained for the special case of neutral mutants () where, by using (66), the generating function in (18) simplifies to
If additionally the neutral mutants are immortal, the above expression further simplifies to
The probabilities are then concisely given by
which corresponds to the classical LeaCoulson result Lea and Coulson (1949)
with .
4.2 Powerlaw wildtype growth
Now we assume that the wildtype population grows according to a general powerlaw , for some nonnegative integer , and therefore . With powerlaw wildtype growth and stochastic mutant proliferation, the mutant clone size generating function is given by
(21) 
Here is the polylogarithm of order defined in Appendix A. Details of the derivation are given in Appendix C. For immortal mutants, the mass function may be explicitly written as
(22) 
If mutants may die, the exact mass function is most easily obtained via Cauchy’s integral formula which may be efficiently computed using the fast Fourier transform. For a brief discussion on implementation see Antal and Krapivsky (2010) and references therein.
Note for , which, while useful for analytic tractability, is unrealistic. This can be overcome by letting . Then, by splitting the integral in the generating function (10) and using the above analysis, one can obtain the mass function for any . However for practical purposes the contribution of is negligible.
4.3 Constant size wildtype
For the specific powerlaw growth when , i.e. (recall that this is equal to the general case when ), we recover some classical results for constant immigration Kendall (1948). We note that the distribution of the ordered clone size, depending on initiation time, was discussed in Jeon et al (2008). From (21) with , the generating function is
(23) 
with as given in (4). By expanding this generating function in terms of we obtain the probabilities
(24) 
Then using (12) with the clone sizes (23) we obtain the generating function of the total number of mutants
and from the binomial theorem we also get the probabilities
We recognise this as a negative binomial distribution under the interpretation that is the number of failures until successes, with failure probability . This result for was first derived by Kendall Kendall (1948) who was attempting to explain the appearance of the logarithmic distribution for species number when randomly sampling heterogeneous populations, conjectured by R.A. Fisher. From the distribution of , by an argument which may be considered a special case of Proposition 1, he derived that for constant rate initiation, the clone size conditioned on nonextinction is logarithmically distributed again with parameter , which can be obtained via (24).
Constant immigration may imply a constant size source, hence mutants with equal birth and death rates (i.e. evolving as a critical branching process) are particularly interesting. This case yields analogous formulas to those above but is replaced with the expression given in (5).
4.4 Logistic wildtype growth
Starting from a population of one and having a carrying capacity , logistic growth is given by . We assume neutral mutations, i.e. is also the wildtype growth rate. Integrating the growth function gives .
We aim to calculate the generating function using (10). Recalling the definition of we observe that
(25) 
where is an integration constant. Therefore the generating function is
(26) 
Agreeing with intuition for we recover the generating function of the constant case, and gives the generating function for exponential wildtype growth. Therefore the logistic case interpolates between the constant and exponential growth cases. The mass function can be obtained by expanding the nonlogarithmic and logarithmic function in and using the Cauchy product formula. However, this method provides little insight and numerically it is simpler to use the fast Fourier transform.
4.5 Monotone distribution and finite time cutoff
We conclude this section by demonstrating general features that exist in the clone size distribution at finite times. Again proofs are provided in Appendix C. Firstly we see that, regardless of the particular wildtype growth function, the monotone decreasing nature of the mass function for the birthdeath process is preserved in the clone size distribution.
Proposition 2.
As long as is positive for some subinterval of , then for we have for any finite .
Whether depends on and , but the inequality is typically true for long times. Note that in contrast, the mass function of the total number of mutants is not monotone in general Keller and Antal (2015).
Now restricting ourselves to the case, as an example, consider the mass function when the size of the wildtype population is constant, which is given by (24), and specifically for . For any moderate , is typically close to unity but for large , will become the dominant term in the mass function, dictating exponential decay. We term this a cutoff in the distribution which occurs at approximately . It is an artifact of the mass function for the birthdeath process (3). Hence we will have (at least) two behaviour regimes for the mass function for finite times. Here we show that this cutoff exists generally for finite times.
Theorem 4.1.
Let and be continuous and positive for . Then
(27) 
where is an unspecified subexponential factor, i.e. , and is given by (4).
Note that for finite , and exponentially fast for large . Hence the cutoff will disappear for long times and the subexponential factor, discussed in detail in Section 5, will completely determine the tail of the distribution. Also notice that the powerlaw cases, , for are not covered as, to make the analysis tractable, they artificially start at . However the generating function in this case (21) also has its closest to origin singularity at so the cutoff exists there also.
5 Universal large time features
Here we give results regarding the large time behaviour of our model which is relevant in many applications and also provides general insight. In many applications the cutoff location ()) is so large that the distribution at or above this point is of little relevance and hence for this purpose the limiting approximations now discussed are of particular interest. Using the notation of Theorem 4.1, this section investigates the large time form of . The proofs for the results presented in this section can be found in Appendix D. We highlight the powerlaw decaying, “fat” tail found in each case. Henceforth we again assume , i.e. a supercritical birthdeath process.
5.1 General wild type growth functions
To give general results we introduce the following assumption which will be assumed to hold for the remainder of this section.
Assumption 1.
For wildtype growth function we assume

for , continuous for and right continuous at .

is positive and monotone increasing for .

For the limit exists, is positive and finite.
We note that the cases discussed in Section 4 are all covered by Assumption 1. The reason for the seemingly arbitrary limit assumed in (iii) becomes clear with the following result which is an application of the theory of regular variation found in Bingham et al (1987).
Lemma 1.
For
(28) 
Often the longtime behaviour of the clone size distribution may be separated into and , and so we give the following definition Flajolet and Sedgewick (2009).
Definition 1.
Consider a real valued function such that
(29) 
holds for some . Then is of exponentialtype for and is subexponential for .
Simple examples of subexponential functions are , , while , are of exponentialtype, with .
5.2 Mean and variance
We now address the asymptotic properties of the clone size distribution by first discussing its mean and variance.
Theorem 5.1.
With subexponential functions such that
(30) 
as .
5.3 Large time clone size distribution
Turning to the distribution function we have the following result regarding the generating function at large times.
Theorem 5.2.
Let . Then for
(31) 
This result is made clearer in the next corollary, in which the cases of exponentialtype and subexponential growth are separated. This is as, for ,
(32) 
For a proof see Lemma D.1. Consequently in the exponentialtype setting, the limiting result is a proper probability distribution, whilst in the subexponential case it is not. We can interpret this as the clone sizes staying finite in the exponential case but grow to infinity for subexponential cases at large times. Henceforth, for brevity we do not impose such a separation but the reader should note that for exponentialtype growth the above limit holds and may simplify further results.
Corollary 1.
For ,
(33)  
(34) 
where the second expression is the limit of the first expression. Then for the probabilities for exponentialtype growth are
(35) 
and for subexponential growth ()
(36) 
The expressions obtained in the case also appeared as an approximation in Kessler and Levine (2015) for the total number of mutants with stochastic wildtype and mutant growth when the mean number of clones is small. This can now be interpreted as an application of Proposition 1.
The case of immortal mutants does not simplify the above expressions for subexponential growth, but for exponentialtype growth, by applying (65) then (64) to the limiting generating function, we have the following link to the YuleSimon distribution which appears often in random networks Simon (1955); Krapivsky and Redner (2001).
Corollary 2.
For immortal mutants with exponentialtype wildtype growth the clone size distribution follows a YuleSimon distribution with parameter for large times. That is, for ,
(37) 
and thus, for ,
(38) 
With immortal, neutral () mutants we have
(39) 
which is in agreement with the long time limit of (4.1). For immortal mutants and exponentialtype growth, as the clone size distribution tends to a YuleSimon distribution, we expect powerlaw tail behaviour at large times Newman (2005). Interestingly, we see that this behaviour holds when we include mutant death and have general wildtype growth.
Corollary 3.
At large times, the tail of the clone size distribution follows a powerlaw with index . More precisely,
(40) 
5.4 Large time distribution for total number of mutants
Finally, to conclude this section we give the corresponding results for the total number of mutants in the often used Large PopulationSmall Mutation limit.
Theorem 5.3.
Letting be constant and with subexponential functions as in Theorem 5.1
(41) 
as . For
(42) 
and we have the following tail result
(43) 
6 Tail behaviour in empirical metastatic data
Given the above discussion we expect, for a large class of wildtype growth functions, to see power tail behaviour on approach to the exponential cutoff in the clone size distribution. We take the first steps to verify this theoretical hypothesis by analysing an empirical metastatic data. In this setting the wildtype population is the primary tumour and mutant clones are the metastases.
Our data is sourced from the supplementary materials in Bozic et al (2013). This data is taken from 22 patients; 7 with pancreatic ductal adenocarcinomas, 11 with colorectal carcinomas, and 6 with melanomas. One patient had only a single metastasis so we discard this data. Of the 21 remaining patients the number of cells in a single metastasis ranged from to . Our theoretical model predicts a cutoff in the distribution around . Taking some sample parameters from the literature, namely /day Diaz Jr et al (2012), and years Yachida et al (2010), this leads to a cutoff around cells. Due to the enormity of this value we ignore the cutoff here. Additionally, as the minimum observed metastasis size is cells, we assume that all data points are sampled from the tail of the distribution.
For each of the datasets we examine the likelihood ratio to determine if the data is more likely sampled from a powerlaw decaying or geometrically decaying distribution. 19 of the 21 dataset return the powerlaw hypothesis as more plausible which is in agreement with the theoretical prediction. Both are single parameter distributions and maximum likelihood analysis was utilised to estimate the parameters. The methodology outlined in Clauset et al (2009) was broadly followed and brief details regarding calculating maximum likelihood estimates (MLEs) are given in Appendix E. We note that in this context the likelihood ratio point esimator returns equivalent results to the Akaike information criterion widely used in model selection Burnham and Anderson (1998). Under the powerlaw model, , for 20 of the 21 datasets we find the point estimate of the powerlaw index, , lies in . The outlier comes from the smallest dataset (3 metastases). Due to the small size of datasets, we recognise the influence of statistical fluctuations.
Details of the likelihood ratio are as follows. Let be a dataset of size . We test the hypothesis that y is drawn from a powerlaw distribution, , versus that it is sampled from a geometric distribution, , where are normalising constants and is the parameter for the geometric distribution. The loglikelihood ratio is
(44) 
where gives support to the hypothesis that the data is drawn from the powerlaw distribution with MLE exponent , over the geometric distribution with MLE parameter . The results are given in Figure 4(a).
Assuming a powerlaw distribution the maximum likelihood estimates for the exponent for each dataset are given in Figure 4(b). Due to the small sample size of our datasets and the high variance in the distribution, we do not derive confidence intervals via normal distribution approximations. Instead we show the normalised loglikelihood, , for our best dataset, with , in Figure 4(c), where is the likelihood function. Also, following Hudson (1971), we demonstrate the likelihood interval defined as
(45) 
If a large sample size was possible this interval would correspond to a confidence interval. For the dataset with we numerically determined , demonstrated as the domain between the vertical bars in Figure 4(c).
7 Alternative approaches
7.1 Deterministic approximation
In order to circumvent the complexity introduced by the birthdeath process one might be tempted to simply assume the mutant clone size grows according to , the mean of the birthdeath process. This approach corroborates our results regarding the tail of the size distribution. Indeed, the clone size density may be found to be
(46) 
which has support . This formula can also be found in Hanin et al (2006). Then, as in Section 5 under Assumption 1,
(47) 
Thus, asymptotically the density has the same behaviour as the tail of the limiting result given in Corollary 3, but with a different amplitude.
However despite this agreement the densities given by (46) for specific wildtype growth function differ significantly compared with stochastic mutant proliferation. Letting be the clone size distribution with stochastic mutant growth and be its deterministic approximation specified by (46), we may quantify the approximation error, at least for the moments, by the following theorem, whose proof can be found in Appendix F.
Theorem 7.1.
As
(48) 
7.2 Timedependent rate parameters
Some authors Houchmandzadeh (2015); Tomasetti (2012) have previously considered the case where all rates in the system are multiplied by a timedependent function, say . This is relevant in the scenario where both the wildtype and mutant populations have their growth restricted simultaneously by environmental factors, for example exposure to a chemotherapeutic agent. We observe that under a change of timescale this system is equivalent to our setting with exponential wildtype growth. This is due to the following argument.
In this setting the wildtype population is governed by
(49) 
Mutant clones are now initiated at a rate . Let be the size of a mutant population governed by the birthdeath process with timedependent rates. Once initiated, the size distribution obeys the forward Kolmogorov equation for timedependent stochastic mutant proliferation
(50) 
If we let
(51) 
then under a new timescale, , the mutant clone initiation will occur at a rate . Further, using the chain rule to express (49) and (50) in terms of we see that and that the forward Kolmogorov equation (50) becomes (1). Thus, under a timerescaling, all dynamics are equivalent to the system with exponential wildtype growth and stochastic mutant proliferation with constant birth and death rates, as studied in this article or in Keller and Antal (2015).
7.3 Poisson process characterisation of tail
Complementing Corollary 3 in Section 5, following Tavare (1987), we can also describe the size distribution for large clones at long times via a Poisson process in the following way. Let be independent copies of the birthdeath process as in Section 2 and be the points of a of Poisson process with intensity , for . The represent the clone arrival times and so is the number of less than or equal to .
Let us consider the size of the first clone. By known results about the large time behaviour of the birthdeath process Athreya and Ney (2004), as ,
(52) 
The distribution of the limiting random variable is composed of a point mass at and an exponential random variable, precisely
(53) 
Analogously, with the details given in Tavare (1987) (Theorem 3), the limiting behaviour of the timeordered clone sizes is given by
(54) 
where is as before and all are iid. The random sequence takes nonnegative real values, however if we restrict our attention to only the positive elements (that is clones that do not die), then these can be taken to be points from a nonhomogeneous Poisson process. More precisely, the set are the points (in some order) from a Poisson process on with mean measure
(55) 
The proof of the above only requires minor modification from that of Theorem 4 in Tavare (1987).
The Poisson process description of the large clones, at large times can also offer insight into further properties of the system, including links to the PoissonDirichlet distribution, see Tavare (1987); Durrett (2015). With regards to the present article, the interesting point is that for fixed , as the number of is finite almost surely, we may sample unformly from this set (i.e. ) and construct a random variable with distribution
(56) 
where is as in (55). The new variable can be related to the previously considered random variable by the following result, whose proof is contained in Appendix F.
Theorem 7.2.
For , with as above,
(57) 
Of note is the reappearance of powerlaw behaviour with a cutoff in the density of . For example in the constant wildtype case, , the density, using (55), is given by
(58) 
For exponential growth with neutral mutants, ,
(59) 
Note that the exponents in the powerlaw terms is equal to that given in Corollary 3, indicating the two approaches are complimentary.
8 Discussion
In this study we focus on the size distribution for mutant clones initiated at a rate proportional to the size of the wildtype population. The size of the wildtype population is dictated by a generic deterministic growth function and the mutant growth is stochastic. This shifts the focus from previous studies which have mostly been concerned with exponential, or mean exponential, wildtype growth, and considered the total number of mutants. Results for the total number of mutants are, however, easily obtained from the clone size distribution.
The special cases of exponential, powerlaw and logistic wildtype growth were treated in detail, due to their extensive use in models for various applications. Utilising a generating function centred approach, exact timedependent formulas were ascertained for the probability distributions in each case. Regardless of the growth function, the mass function is monotone decreasing and the distribution has a cutoff for any finite time. This cutoff goes to infinity for large times and is often enormous in practical applications, hence we focused on the approach to the cutoff.
We found that the clone size distribution behaves quite distinctly for exponentialtype versus subexponential wildtype growth. Although the probability of finding a clone of any given size stays finite as for exponentialtype growth, it tends to zero for subexponential type. Despite these differences, with a proper scaling, for a large class of growth functions we proved that the clone size distribution has a universal longtime form. This longtime form possesses a powerlaw, “fat” tail which decays as for subexponential wildtype growth, but faster for exponentialtype growth. This can be intuitively understood as the tail distribution represents clones that arrive early, and the chance that a clone is initiated early in the process is larger for a slower growing wildtype function. Hence we expect a “fatter” tail in the subexponential case.
Note that although we consider subexponential wildtype growth, surviving mutant clones will grow exponentially for large time, which can be unrealistic in some situations. Stochastic growth which accounts for environmental restrictions, for instance the logistic branching process, introduces further technical difficulties and is left for future work. We do note that, despite the drawbacks of deterministic mutant growth as discussed in Section 7, when both the wildtype and mutant populations grow deterministically as , it is easy to see that for large times the clone size distribution still displays a powerlaw tail,
An underlying motivation for this work is the scenario of primary tumours spawning metastases in cancer. We test our hypothesis regarding a powerlaw tail in metastasis size distributions by analysing empirical data. For 19 of 21 datasets the powerlaw distribution is deemed more likely than an exponentially decaying distribution. The exponent of the powerlaw decay was estimated in each case and found to lie between 1 and 2. Interpreting this in light of our theory, either the primary tumour had entered a subexponential growth phase or, if one assumes exponential primary growth, the metastatic cells had a fitness advantage compared to those in the primary. Either way we can conclude that, for the majority of patients, the metastases grew faster than the primary tumour.
Acknowledgments
We thank Peter Keller, Paul Krapivsky, Martin Nowak, Karen Ogilvie, Bartlomiej Waclaw and Bruce Worton for helpful discussions. MDN acknowledges support from EPSRC via a studentship.
Appendix A Special functions, definitions and requisite results
Required definitions and identities taken from DLMF (2016) unless otherwise stated.
With the polylogarithm of order is defined as
(60) 
Note that . A required identity (from Weisstein (2016)) is
(61) 
for . Here are the Stirling numbers of the second kind.
Gauss’s hypergeometric function also appears and for complex is defined by the power series
(62) 
and by analytic continuation elsewhere. Here denotes the Pochhammer symbol or rising factorial, that is
(63) 
Some required identities for the hypergeometric function are:
(64) 
(65) 
(66) 
and the following connection can be made to the incomplete betafunction
(67) 
For any analytic function , we denote the nh coefficient as
Theorem A.1 (Flajolet and Sedgewick (2009): Exponential Growth Formula).
If is analytic at and is the modulus of a singularity nearest the origin in the sense that Then the coefficient satisfies where .
We utilise several results from Bingham et al (1987) on the theory of regularly varying functions which we now define.
Definition 2.
Bingham et al (1987) A Lebesgue measurable function that is eventually positive is regularly varying (at infinity) if for some ,
(68) 
The notation will be used and we will denote as slowly varying functions.
Theorem A.2 (Bingham et al (1987): Characterisation Theorem).
Suppose is measurable, eventually positive, and