Cancer initiation with epistatic interactions between driver and passenger mutations
We investigate the dynamics of cancer initiation in a mathematical model with one driver mutation and several passenger mutations. Our analysis is based on a multi type branching process: We model individual cells which can either divide or undergo apoptosis. In case of a cell division, the two daughter cells can mutate, which potentially confers a change in fitness to the cell. In contrast to previous models, the change in fitness induced by the driver mutation depends on the genetic context of the cell, in our case on the number of passenger mutations. The passenger mutations themselves have no or only a very small impact on the cell’s fitness. While our model is not designed as a specific model for a particular cancer, the underlying idea is motivated by clinical and experimental observations in Burkitt Lymphoma. In this tumor, the hallmark mutation leads to deregulation of the MYC oncogene which increases the rate of apoptosis, but also the proliferation rate of cells. This increase in the rate of apoptosis hence needs to be overcome by mutations affecting apoptotic pathways, naturally leading to an epistatic fitness landscape. This model shows a very interesting dynamical behavior which is distinct from the dynamics of cancer initiation in the absence of epistasis. Since the driver mutation is deleterious to a cell with only a few passenger mutations, there is a period of stasis in the number of cells until a clone of cells with enough passenger mutations emerges. Only when the driver mutation occurs in one of those cells, the cell population starts to grow rapidly.
keywords:Cancer, modeling, somatic evolution, population dynamics
Tumors develop by accumulating different mutations within a cell, which affect the cell’s reproductive fitness (Armitage and Doll, 1954; Lengauer et al., 1998; Hanahan and Weinberg, 2000; Michor et al., 2004; Wodarz and Komarova, 2005; Sjöblom et al., 2006; Greenman et al., 2007; Wood et al., 2007; Jones et al., 2008; Attolini and Michor, 2009; Parmigiani et al., 2009; Gerstung and Beerenwinkel, 2010). As Bozic et al. (2010), we refer to the fitness of a mutated cell as the ratio between a cell’s rate to proliferate and the cell’s rate of apoptosis compared to wild type cells. The higher the fitness, the more likely it is for the cell to proliferate. For high fitness values, the population of cells growths very fast and stochastic effects play a minor role. In our model, this can be thought of as the formation of a tumor.
However, many mutations have no impact on the cell’s fitness, e.g. mutations not affecting coding or regulatory sequences. Other mutations may lead to a fitness disadvantage, which implies that the cell’s risk of apoptosis is higher than its chance of proliferation. However, the same mutations in combination with other mutations within the same cell might lead to a large fitness advantage.
We were motivated by genetic studies in Burkitt Lymphoma, a highly aggressive tumor, where a single genetic alteration has an impact on a wide range of other genes, some of them affect cell growth while others induce apoptosis. More specifically, a chromosomal translocation between the MYC protooncogene on chromosome 8 and one of three immunoglobulin (IG) genes is found in almost every case of Burkitt Lymphoma (Richter et al., 2012; Allday, 2009; Hummel et al., 2006; Sander et al., 2012). This leads to deregulated expression of the MYC RNA and in consequence, to deregulated MYC protein expression. The MYC protein acts as a transcription factor and has recently been shown to be a general amplifier of gene expression (Nie et al., 2012; Lin et al., 2012), targeting a wide range of different genes. Most importantly, MYC expression induces cell proliferation. In Burkitt Lymphoma, the IG-MYC fusion is evidently the key mutation for tumorigenesis (Salaverria and Siebert, 2011; Zech et al., 1976; Schmitz et al., 2014; Campo, 2012). However, MYC plays also a key role in inducing apoptosis (Pelengaris et al., 2002; Wang et al., 2011; Hoffman and Liebermann, 2008). Thus, the IG-MYC translocation alone would lead to cell death. Therefore, the IG-MYC translocation has to be accompanied by additional mutations, which deregulate the apoptosis pathways, such as mutations affecting e.g. TP53 or ARF (Richter et al., 2012; Allday, 2009; Sander et al., 2012). These additional mutations have probably only little direct impact on the cell’s fitness, since apoptosis is rare. Hence, these mutations cannot be considered as primary driver mutations in the context of Burkitt Lymphoma. However, in combination with the MYC mutation these additional mutations decrease the apoptosis rate. Consequently, the cells proliferate fast and the population grows accordingly, leading to tumorigenesis. Because all cells carry the MYC mutation in Burkitt Lymphoma, but fast growth does not start immediately with that mutation, it seems to confer its large fitness advantage only in a certain genetical context. Thus, interactions between different mutations may crucially affect the dynamics of cancer progression. Due to the fact that those additional mutations do not confer a direct fitness advantage, they cannot be considered as driver mutations. Nevertheless, at least some of them are necessary in order for the MYC mutation to become advantageous for the cell. Therefore, they cannot be regarded as true passenger mutations, either. Throughout this manuscript, we therefore call these additional mutations “secondary driver mutations”.
Besides Burkitt Lymphoma, epistatic effects in cancer initiation seem also to be relevant for other cancers. For example, we can think of the inactivation of a tumor suppressor gene discussed by Knudson in the context of retinoblastoma (Knudson, 1971). This inactivation is neutral for the first hit but highly advantageous for the second hit, and can hence be viewed as an interaction of genes (Nowak et al., 2002, 2004; Vogelstein and Kinzler, 2004; Iwasa et al., 2005). Another case is found in lung carcinomas, where activation of each of two oncogenes (SOX2 and PRKCI) alone is insufficient, but in concert they initiate cancer (Justilien et al., 2014). In other cases, there is clear evidence for sign epistasis: The ras family of proto-oncogenes is also discussed to underlie epistatic effects. Amplification of ras leads to senescence in the cell. Nevertheless, ras is a well known oncogenic driver gene. Hence, the ras mutation needs to be accompanied by other mutations (Elgendy et al., 2011; Serrano et al., 1997). Moreover, the difficulty to distinguish between drivers and passengers (Futreal, 2007; Fröhling et al., 2007) suggests that for a full understanding of cancer initiation it is insufficient to think of these two types of mutations only.
So far, most models have focused on the idea that passenger mutations have no effect or only a little effect, whereas each driver mutation increases the fitness of the cell (Michor et al., 2004; Beerenwinkel et al., 2007; Bozic et al., 2010; Gerstung and Beerenwinkel, 2010; Antal and Krapivsky, 2011; Reiter et al., 2013; Durrett et al., 2010; Datta et al., 2013). Other models focus on the neutral accumulation of mutations (Durrett et al., 2009; Luebeck and Moolgavkar, 2002). Moreover, different mutations are typically treated as independent, which is a strong assumption that will often not be fulfilled. In our model, mutations are interacting in an epistatic way (Wolf et al., 2000): the change in fitness induced by the driver mutation depends strongly on the genetic environment, i.e. in our case on the number of secondary driver mutations that are present in that cell. In addition we assume that the secondary driver mutations alone have almost no fitness advantage. Such a dependence between mutations can strongly affect the dynamics of cancer initiation. In evolutionary biology, epistatic systems are often analyzed regarding the structure or ruggedness of the landscape and the accessibility of different pathways (Weinreich et al., 2005; Franke et al., 2011; Szendro et al., 2013). The experimental literature also studies which factors can lead to epistasis (de Visser et al., 2011; Szappanos et al., 2011). Here, we are interested in the dynamics of such an epistatic model, which we illustrate by stochastic, individual based simulations. In addition, we derive analytical results for the average number of cells with different combinations of mutations and find a good agreement with the average dynamics in individual based computer simulations. Furthermore, we discuss the computation of the waiting time until cancer initiation. Our results show that the dynamics in such systems of epistatic interactions are distinct from previous models of cancer initiation (Michor et al., 2004; Beerenwinkel et al., 2007; Bozic et al., 2010; Gerstung and Beerenwinkel, 2010; Antal and Krapivsky, 2011; Reiter et al., 2013), which may have important consequences for the treatment of such cancers. While in previous models there is a steady increase in growth with every new mutation, in our model there is a period of stasis followed by a rapid tumor growth.
Of course, the biology of Burkitt Lymphoma is much more complex than modelled herein. To make the model more realistic one would have to distinguish between the different secondary driver mutations, since different genes contribute differently to the cells fitness, especially in a cell where the IG-MYC fusion is present. Our model is not aimed to realistically describe such a situation in detail. Instead, we focus on the extreme case of the so called all-or-nothing epistasis (Barrick and Lenski, 2013; Meyer et al., 2012) to illustrate its effect on the dynamics of cancer initiation. As there is no theoretical analysis of epistatic effects in cancer initiation so far, a well understood minimalistic model seems to be necessary in order to illustrate the potential impact of epistasis on cancer progression. Our minimalistic model clearly shows that epistasis can lead to a qualitatively different dynamics of cancer initiation.
2 Mathematical model
We analyze cancer initiation in a homogenous population of initially cells with discrete generations. In every generation, each of the cells can either die or divide. If a cell divides, its two daughter cells can mutate with mutation probabilities for the driver mutation and for secondary driver mutations (where the indicates that these would be called passenger mutations in closely related models). In principle, we could drop the assumption that these two mutation probabilities are independent on the cell of origin, but this would lead to inconvenient notation. We neglect back mutations and multiple mutations within one time step, because their probabilities are typically very small. Figure 1 summarizes the possible mutational pathways of the model. The variables denote the number of cells with or without the primary driver mutation ( or respectively), and secondary driver mutations.
A cell’s probability for apoptosis and proliferation depends on the presence of the primary driver mutation and on the number of secondary driver mutations it has accumulated. For cells with no mutations, the division and apoptosis probabilities are both equal to . This implies that the number of cells is constant on average as long as no further mutations occur. We assume that the initial number of cells is high and thus we can neglect that the population would go extinct (Haccou et al., 2005). For our parameter values, the expected extinction time of our critical branching process exceeds the average life time of the organism by far.
For cells without the primary driver mutation, each secondary driver mutation leads to a change in the cell’s fitness by . For cells with the primary driver mutation, the fitness advantage obtained with each secondary driver mutation is . The driver mutation increases both the apoptosis rate and the proliferation rate. The increase in the apoptosis rate is and the increase in the division rate is . With these parameters, the proliferation rate for cells with secondary driver mutations but without the primary driver mutation is
whereas the proliferation rate for such cells with the primary driver mutation is
The apoptosis rates, denoted as and are simply one minus the proliferation rate
Mutations occur at fixed rates and for primary and secondary drivers, respectively. For a long time, the overall fitness does not increase noticeably. For , it stays on average constant. Hence, the total number of cells stays approximately constant. Only when a cell with enough secondary driver mutations and also the primary driver mutation arises, the cell’s fitness is increased substantially beyond the fitness of other cells and its chance of proliferation is significantly increased. At that point, the total number of cells starts to increase rapidly, see Figure 2. In models presented in literature so far, the cell’s fitness is increased independently with every (driver) mutation (see e.g. Beerenwinkel et al., 2007; Bozic et al., 2010). Although the total number of cells increases exponentially, these models do not find a sudden burst in the number of cells.
In Figure 3, the total number of cells is subdivided into the number of cells with different numbers of mutations. The left panel presents the cells that have not acquired the primary driver mutation, the right one shows cells with the primary driver mutation. Cells with the primary driver mutation, but not enough secondary driver mutations, arise occasionally, but those cells die out quickly again – thus, their average abundance is small. Cells without the primary driver mutation do not die out, they also do not induce fast growth, cf. Figure 3. Only cells that have obtained enough secondary driver mutations and in addition acquire the primary driver mutation, divide so quickly that the population size increases rapidly.
The parameters in our figures have been chosen such that a cells acquires a substantial growth advantage once the primary driver mutation co-occurs with 4 secondary driver mutations. This event can occur at any time and hence, in some simulation the number of cells can increase very early, whereas in other simulations the number of cells does not undergo fast proliferation for many generations. Consequently, the rate of progression has an enormous variation. For the parameters from our figures, the time at which rapid proliferation occurs varied between and generations in 500 simulations. The distribution of these times is discussed in more detail below.
3.2 Analytical results
3.2.1 Average Number of Cells
We can calculate the average number of cells with a certain number of mutations at a given generation . The number of cells which do not have the primary driver mutation and secondary driver mutations (i.e. ) changes on average by means of the cell’s fitness and it decreases by the mutation rate
where . The solution of Equation (4) for , i.e. if the secondary driver mutations are not neutral, is
For the case , we take the limit of the -binomial coefficient (e.g. Kac and Cheung, 2002)
which is the result that is also expected if the secondary driver mutations are neutral and accumulated independently of each other.
Intuitively, the term describes the probability of obtaining exactly mutations in generations. There are different possibilities when the mutations happen, these possibilities are captured by the binomial coefficient . Thus, we have a growing polynomial term in and a declining exponential term in , since .
In the case of , the interpretation is similar. Here, additionally the fitness advantage for secondary driver mutations has to be taken into account. Since the number of cells with secondary driver mutations grows with , also the number of cells that can mutate grows. Hence, the factor is multiplied to the expression and the binomial coefficient turns into the -binomial coefficient.
|Mutation rate for secondary driver mutations|
|Mutation rate for the primary driver mutation|
|Probability for a cell without the primary driver mutation to not mutate|
|Probability for a cell with the primary driver mutation to not mutate|
|Fitness change of a secondary driver mutation (see (1), (2))|
|Fitness advantage of the primary driver mutation (see (1), (2))|
|Fitness disadvantage of the primary driver mutation (see (1), (2))|
|Fitness advantage of combination of a secondary driver and the primary|
|Fitness according to a secondary driver without the primary driver mutation|
|Fitness according to the combination of primary and secondary driver mutation|
|Fitness according to a primary driver mutation with no secondary driver|
For cells that have obtained the primary driver mutation and secondary driver mutations, the situation is slightly more complex. There are different possibilities on how to obtain secondary driver and the primary driver mutation, since some of the secondary driver mutations may have occurred before the primary driver mutation has been acquired whereas others may have occurred afterwards. Let denote the number of cells with the primary driver mutation and secondary driver mutations, when the primary driver mutation has happened in a cell with secondary driver mutations. Note that . The change in the number of cells now depends on . Using the abbreviations from Table 1 to simplify our notation, we have
To express the average number of cells in total we need to sum over all possible pathways,
if . The summation over indicates the different mutational pathways. An intuitive explanation of this somewhat lengthy equation is given in C.
Interestingly, the case for is much more challenging. The underlying problem is that the normal binomial coefficient cannot be expressed in a sum in the way the -binomial coefficient can be expressed (Koekoek et al., 2010),
When summing over all generations of the population with secondary driver mutations to derive the expression for the population of cells with secondary driver mutations, we have to calculate the sum
When we go further and try to calculate the expression for the population with secondary driver mutations, we need to apply this sum -times and hence we obtain a multi sum,
Only an analytical expression for this multi sum would allow a closed solution of the problem with . Also taking the limit of our expression for is a substantial mathematical challenge. However, we can use our solution for for arbitrarily small values of . Moreover, numerical considerations show that the result for is very close to the case of .
In Figure 4, the dynamics of the average number of cells with a certain number of mutations, is shown, both without and with the primary driver mutation. Simulation results for agree very well with the analytical result obtained for .
3.2.2 Distribution of time until cancer initiation
Next, let us calculate the distribution of the time it takes until rapid proliferation occurs. Since we use a multi type, time discrete branching process, we can make use of the probability generating function to recursively calculate the probability for a certain cell type cell to be present at a certain time (Haccou et al., 2005). Of particular interest is the probability that a tumor initiating cell is present, i.e. in the example above a cell with the primary driver mutation and 4 secondary driver mutations. Let be the probability generation function for the cell type in the branching process described above. The probability that there is no cell with the primary driver mutation and secondary driver mutations at time , when the process starts with one -cell, can be computed by recursively calculating , where
Starting with cells, we have to consider . Thus, the probability that there is at least one tumor cell at time is . If we are interested in the probability that there is at least one tumor cell exactly at time , we need to subtract the probability, that there has been a tumor cell before . Figure 5 shows the good agreement between simulations and the recursive calculation.
The time distribution for low follows a power law, as shown in the inset of Figure 5. The exponent of the power law is approximately . If all mutations were neutral, one would expect a lead coefficient of approximately 4 to accumulate five mutations, as derived by Armitage and Doll (1954). In our case, the curve increases slower. Numerical considerations show that the main reason for this is that, in contrast to (Armitage and Doll, 1954), we allow extinction: Many lineages that have accumulated mutations go extinct before the final, cancer causing mutation arises.
Most models in literature assume that each mutation leads to an independent and steady increase in the cells’ fitness (Beerenwinkel et al., 2007; Bozic et al., 2010; Gerstung and Beerenwinkel, 2010; Reiter et al., 2013; Michor et al., 2004). In this context, neutral passenger mutations have no causal impact on cancer progression. Only recently, some authors have considered passenger mutations not only as neutral byproducts of the clonal expansion of mutagenic cells, but as having a deleterious impact on the cells’ fitness (McFarland et al., 2013).
Here, we have described a model in which the fitness of the driver mutation strongly depends on the number of passenger mutations the cell has acquired. These passenger mutations, which we have termed secondary driver mutations, lead only to a small change in fitness or no change in fitness at all. As illustrated in Figures 2, 3, and 4, the number of cells stays roughly constant for a long time before it rapidly increases, despite the fact that mutations occur in the process permanently. This dynamic effect of cancer initiation is very different from models in which mutations do not interact with each other. We speculate that this kind of dynamics can have important implications for diagnosis and treatment. In principle, the dynamics presented in Figure 2 can also be the result of one highly advantageous, but very unlikely driver mutation. But in such a case, cells with the driver mutation should not be present in the population before tumorigenesis. This contradicts with current knowledge about the MYC translocation which has also been detected in humans without lymphoma (Müller et al., 1995). This effect is well captured by our model, as shown in Figure 3.
In some tumors, such as Burkitt Lymphoma, the neoplasms is only diagnosed after fast tumor growth has started. In this case, sequencing studies have shown that several mutations are present at the time of examination (Schmitz et al., 2012; Alexandrov et al., 2013; Richter et al., 2012; Love et al., 2012). Since the patients typically do not have any symptoms in before diagnosis of the cancer, it is possible that some mutations have virtually no direct impact on the cells fitness. Nevertheless, they are necessary for the initiation of the cancer, as they indirectly allow the driver mutation to initiate rapid cell growth. This agrees well with our epistatic model, where (nearly) neutral secondary driver mutations occur at a fixed rate before the cancer can be diagnosed.
Of course, not all mutations have such an epistatic effect on primary driver mutations, some might even be considered deleterious (McFarland et al., 2013). Nevertheless, our work shows that mutations that appear to be neutral in one context should not only be regarded as a neutral byproduct of the clonal expansion of mutagenic cells. Instead, in some cases passenger mutations can have a serious impact in cancer initiation, in particular when there are non-trivial interactions between different mutations. In this case the term “passenger” may not be the most appropriate one. To understand the impact of those interactions can be essential for a deeper understanding of the initiation of cancer.
We thank J. Richter for stimulating discussions on Burkitt Lymphoma and B. Werner for helpful comments on our manuscript. Generous funding by the Max-Planck Society is gratefully acknowledged. R.S. is supported the German Ministry of Education and Science (BMBF) through the MMML-MYC-SYS network Systems Biology of MYC-positive lymphomas (036166B)
Appendix A Analytic expression for the average number of cells without the primary driver mutation at generation
a.1 Secondary driver fitness advantage is unequal to zero - secondary driver mutations
We first consider the case without the primary driver mutation. We assume (and consequently ), as discussed in the main text. The rate change of cells with secondary driver mutations is
where . The solution of (16) is formulated in the following theorem:
For any integer , the number of cells with secondary driver mutations and no primary driver mutation is
If (17) is a solution, then it must satisfy (16). Since solutions for recursive functions are always unique, (17) would be the only solution. Hence, we proof (17) by inserting the equation on the right hand side of (16).
Appendix B Analytic expression for the average number of cells with the primary driver mutation at generation
We now turn to the cells which have obtained the primary driver mutation. As discussed in the main text, we only look at the case where the fitness change of the secondary driver mutation is not equal to zero, . While for cells without the primary driver mutation there is only one mutational pathway, cells with the primary driver mutation can be reached via different mutational pathways, because cells that get the primary driver mutation might have different amounts of secondary driver mutations. Hence, we need to sum over all those possible pathways. Let be the number of secondary drivers that are present in the cell which acquires the primary driver mutation. Then denotes the number of cells with the primary driver mutation and secondary driver mutations, when the primary driver mutation has happened in a cell with secondary driver mutations (). With this, the total number of cells with the primary driver mutation is
The change in the number of cells now depends on . We have
The solution of (22) is given by the following theorem:
The average number of cells with the primary driver mutation and secondary driver mutations, given that the primary driver mutation happens in a cell with secondary driver mutations, is given by
where the function is defined
For Equation (27), we have
For Equation (26), we need to insert the definition of
This concludes the proof for the case . Now we look at the case . We have
In order for this to be equal to , we need
Analogue to (26) this equation holds true if
By writing the summation as a -Pochhammer symbol, we have
This concludes the proof also for .
Appendix C Intuitive Description of Equation (11)
Here, we try to understand this equation in a more intuitive way. For each generation , the number of possibilities to distribute the secondary driver mutations over time steps is given by the -binomial coefficient . But the growth of the cells depends on the time when the secondary driver mutations are first acquired. Due to fitness advantage, the earlier the mutations have been acquired, the faster the population grows, and also the sooner the primary driver mutation can be obtained. As in Equation (5), the effect of the fitness advantage on the cells without the primary driver mutation itself is captured by multiplying . The effect on the primary driver mutation is more intricate. To capture this effect, we start from a -binomial coefficient and rewrite the -Pochhammer symbol in the numerator in terms of a sum (Koekoek et al., 2010),
To make this resemble the term in the parentheses in the second line of Equation (11), we divide the numerator by and we obtain