Multifocality and recurrence risk:a quantitative model of field cancerization

Multifocality and recurrence risk:
a quantitative model of field cancerization

Jasmine Foo , Kevin Leder 1, and Marc D. Ryser
1. School of Mathematics, and 2. Industrial and Systems Engineering,
University of Minnesota, Minneapolis, MN

3. Department of Mathematics, Duke University, Durham, NC
Partially supported by NSF grant DMS-1224362Partially supported by NIH grant R01-GM096190-02
August 19, 2019

Primary tumors often emerge within genetically altered fields of premalignant cells that appear histologically normal but have a high chance of progression to malignancy. Clinical observations have suggested that these premalignant fields pose high risks for emergence of secondary recurrent tumors if left behind after surgical removal of the primary tumor. In this work, we develop a spatio-temporal stochastic model of epithelial carcinogenesis, combining cellular reproduction and death dynamics with a general framework for multi-stage genetic progression to cancer. Using this model, we investigate how macroscopic features (e.g. size and geometry of premalignant fields) depend on microscopic cellular properties of the tissue (e.g. tissue renewal rate, mutation rate, selection advantages conferred by genetic events leading to cancer, etc). We develop methods to characterize how clinically relevant quantities such as waiting time until emergence of second field tumors and recurrence risk after tumor resection. We also study the clonal relatedness of recurrent tumors to primary tumors, and analyze how these phenomena depend upon specific characteristics of the tissue and cancer type. This study contributes to a growing literature seeking to obtain a quantitative understanding of the spatial dynamics in cancer initiation.

1 Introduction

The term ‘field cancerization’ refers to the clinical observation that certain regions of epithelial tissue have an increased risk for the development of multiple synchronous or metachronous primary tumors. This term originated in 1953 from repeated observations by Slaughter and colleagues of multiple primary oral squamous cell cancers and local recurrences within a single region of tissue [1]. This phenomenon, also known as the ‘cancer field effect’ has been documented in many organ systems including head and neck (oral cavity, oropharynx, and larynx), lung, vulva, esophagus, cervix, breast, skin, colon, and bladder [2]. Although the exact underlying mechanisms of the field effect in cancer are not fully understood, recent molecular genetic studies suggest a carcinogenesis model in which clonal expansion of genetically altered cells (possibly with growth advantages) drives the formation of a premalignant field [2, 3]. This premalignant field, which may develop in the form of one or more expanding patches, forms fertile ground for subsequent genetic transformation events, leading to intermediate cancer fields and eventually clonally diverging neoplastic growths. The presence of such premalignant fields poses a significant risk for cancer recurrence and progression even after removal of primary tumors. Importantly, these fields with genetically altered cells often appear histologically normal and are difficult to detect; thus, mathematical models to predict the extent and evolution of these fields may be useful in guiding treatment and prognosis prediction.

In this work we develop and analyze a mathematical model of the cancer field effect. This model combines spatial cellular reproduction and death dynamics in an epithelial tissue with a general framework for multi-stage genetic progression to cancer. Using this model, we investigate how microscopic cellular properties of the tissue (e.g. tissue renewal rate, mutation rate, selection advantages conferred by genetic events leading to cancer, etc) impact the process of field cancerization in a tissue. We develop methods to characterize the waiting time until emergence of second field tumors and recurrence risk after tumor resection. We also study the clonal relatedness of recurrent tumors to primary tumors by assessing whether local field recurrences (second field tumors) are more likely than distant field recurrences (second primary tumors), and analyze how these phenomena depend upon characteristics of the tissue and cancer type. The methodology developed in this work is generally applicable to understanding the field effect in many epithelial cancers; in follow-up work we will calibrate and apply this framework to study recurrence risks due to field cancerization in specific cancer types.

The current work will contribute to a growing literature on the evolutionary dynamics of cancer initiation, see e.g. [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. In particular, there have been several previous mathematical models studying the stochastic evolutionary process of cancer initiation from spatially structured tissue, e.g. [14, 15, 16, 17, 18, 19]. In 1977 Williams and Bjerknes proposed a model of clonal expansion in epithelial tissue [16]. This model is closely related to the biased voter model from particle systems theory [20], and in [21, 22] the growth properties and asymptotic shape of the process were established. Komarova studied a 1-dimensional process with mutation and showed that the probability of mutant fixation and time to obtain two-hit mutants differ from the well-mixed setting [14]; this work was generalized to higher dimensions in [19]. Later, in [17, 18] the relationships between migration, mutation, selection and invasion in a spatial stochastic evolutionary model were explored. Martens and colleagues studied the dynamics of mutation accumulation and population adaptation on a discrete time hexagonal lattice model with nearest neighbor replacement upon reproduction in one and two dimensions [23, 24]. In a recent work Antal and colleagues consider a stochastic spatial model of cancer progression where cells acquire successive fitness advantages along the edge of the tumor. In the context of this model they study the shape of the evolving tumor front as well as the number of mutations acquired in the tumor [25]. Lastly, in previous work we studied the accumulation and spread rates of advantageous mutant clones in a spatially structured population of general dimension [26]. In the current work we build upon these previous studies to develop a quantitative understanding of field cancerization and tumor recurrence risks.

The article is organized as follows: in section 2 we introduce the stochastic mathematical model and describe basic properties regarding the survival and growth rate of mutant clones. Using previously derived results on the spread of mutant clones, we introduce a mesoscopic approximation to the model. In section 3 we analyze the model to investigate the characteristics and extent of local and distant premalignant fields at the time of initiation. In particular, we determine how the spatial geometry of the field (e.g. number and size of lesions) depends on cellular and tissue properties such as mutation rate, tissue renewal rate and mutational fitness advantages. In section 4 we analyze the model to understand the risk of recurrence due to local or distant field malignancies, as a function of time and cellular parameters.

Throughout the paper we will use the following notation for the asymptotic behavior of positive functions,

Finally, we use the notation to denote that the random variable has distribution .

2 Mathematical framework and basic properties

Cancer initiation is associated with the accumulation of multiple successive genetic or epigenetic alterations to a cell [27]. A subset of these genetic events may give rise to a fitness advantage (i.e. an increase in reproductive rate of the cell or avoidance of apoptotic signals), and subsequently lead to a clonal expansion within the tissue. The expanding mutant cell populations form the background for further independent genetic events which lead to carcinogenesis. As a result of this spatial evolutionary process, by the time of cancer initiation or diagnosis the tissue field surrounding a tumor can be composed of genetically distinct premalignant lesions of various sizes and stages.

2.1 Cell-based model

To study the dynamics of this process, we consider a stochastic model which describes the accumulation and spread of a clone of cells with genetic alterations throughout a spatially structured tissue (e.g. stratified epithelium). Thus, we consider the model on a regular lattice , where and is the number of spatial dimensions of the tissue. Each location in the lattice is occupied by a single cell, and each cell reproduces at a rate according to its fitness with exponential waiting times. Whenever a cell reproduces, its offspring replaces one of its lattice neighbors at random, see Figure 1A. The type of each cell corresponds to its fitness, which is related to the number of genetic hits a cell has accumulated in a multi-step genetic model of cancer initiation. For example, type-0 cells have fitness normalized to 1 and are labeled as wild-type or normal (with no mutations). Initially our entire lattice is occupied by type 0 cells. Type-0 cells acquire the first mutation at rate to become type-1 cells. The type-1 cell will have a relative fitness advantage to type-0 cells, given by , for some constant . In general, type- cells have a fitness advantage of relative to type- cells, and they acquire the th mutation in the sequence at rate to become type- cells. The process is stopped when a cell develops mutations; we call this the time of cancer initiation. The number of mutation as well as the parameters for depend on the specific cancer type. Although many (epi)genetic events are selectively disadvantageous (i.e. they confer a selective disadvantage ), the progeny of deleterious mutants die out quickly so here we restrict our attention to the case . Note that this process can be thought of as a spatial version of the Moran process, a spatially well-mixed population model that is commonly used to describe carcinogenesis (e.g. see [8, 9, 10, 11, 12]). In addition, the spatial reproduction and death dynamics of this model (without mutation) correspond to the biased voter process which has been well-studied in physics and probability literature. In fact, a similar voter model approach was previously used to model cellular dynamics in epithelial tissue and found to correlate well with experimental predictions of clone size distribution in the mouse epithelium [28].

The total number of cells in the fixed-size population is ; in most cancer initiation settings this number is quite large (at least ), while mutation rates are quite small (orders of magnitude smaller than 1). Therefore we will, unless stated otherwise, restrict our analysis to regimes where and . In Section 2.3, we will briefly discuss the specific conditions that we impose on the relationship between these parameters. For mathematical simplicity, the lattice is equipped with periodic boundary conditions; however in most relevant biological situations the domain size (i.e. cell number) is sufficiently large so that boundary effects are negligible.

Figure 1: Lattice dynamics. (A) Schematic of spatial Moran model in : each cell divides at rate according to its fitness and replaces one of its 2 neighbors: if the light blue cell divides, its offspring replaces one of the dark blue neighbors, chosen uniformly at random. Every lattice site is occupied at all times (not shown). (B) Simulation example of the model: growth of an advantageous clone (light blue) starting from one cell with fitness advantage over the surrounding field (dark blue).

Note on dimension of the model. We analyze the general model in space dimensions . While all epithelial tissues have an intrinsically three dimensional architecture, in some situations considering may be a good approximation. For example, cancer initiation in mammary ducts of the breast, renal tubules of the kidney, and bronchi tubes of the lung could be viewed as approximately one-dimensional processes, due to the aspect ratio of tube radius versus length. On the other hand, cancer initiation in the squamous epithelium of the cervix, the bladder or the oral cavity can be viewed as two-dimensional process, since initiation occurs in the basal layer of the epithelium which is only 1-2 cells thick (e.g. see Figure 2). The validity of such approximations poses an interesting problem in itself, but will not be addressed in this work.

Figure 2: Geometry of squamous epithelium. A Basal layer (vertical perspective) before initiation with local field (left), and after initiation where the tumor is growing within the local field (right). B Sideways view of the fields before and after initiation, along the dashed lines in panel A. The proliferative cells inhabiting the two-dimensional lattice in the model reside in the basal layer of the epithelium. After initiation, malignant growth is not restricted to basal layer only.

2.2 Survival and growth of a single mutant clone

We first establish some basic behaviors of mutant cells and their clonal progeny within a tissue. Of particular interest are: (i) the survival probability of a mutant clone, and (ii) the rate of spatial expansion of the mutant clone through the tissue. In particular, how are these characteristics influenced by tissue parameters and the cellular fitness advantage conferred by a mutation? We have addressed some of these questions in a previous work [26] and restate the results here to make the paper self-contained. In addition, we perform new simulations in this work to fill in gaps where theoretical results are currently not available.

Consider the probability that a mutant cell survives to form a viable clone (i.e. does not die out due to demographic stochasticity). Let type-1 cells have fitness and type-0 cells have fitness , and let denote the type of cell at site in the lattice at time . Define

In other words, is the set of all type-1 cell locations at time . We initiate the model with a single type-1 cell at the origin surrounded by type-0 cells in all other locations:

and assume no further mutations are possible (). This simplified model is known as the Williams-Bjerknes model [16], and if then it corresponds to the biased voter model, see e.g. [29]. Let denote the number of type-1 cells in the model at time . Then we can define the extinction time of the process . The probability of survival of a single mutant clone with selective advantage over the surrounding cells is then the probability of the event . By looking at the the process only at its jump times, we note that the embedded process is a discrete time random walk that moves one up with probability or one down with probability . This can be seen by observing that the process only changes at boundaries between type-0 and type-1 cells, and the only possible resulting events are that the type-0 gets replaced by a type-1 (resulting in a jump up in ) or the type-1 gets replaced by a type-1 (resulting in a jump down in . Analysis of the overall survival probability of this random walk can then be calculated using simple gambler’s ruin analysis,

where the approximation is valid for . Thus, the probability that a mutant clone with fitness advantage survives is , and is independent of the dimension of the tissue.

It is important to understand how the expansion rate of a mutant clone depends on the selection strength of the mutant. To this end, we first state a result proven by Bramson and Griffeath [22, 21] which establishes an asymptotic shape for the type-1 clone.

Theorem 2.1 (Bramson and Griffeath).

For any , there is a such that on

where is a convex set and has the same symmetries as .

In other words, the Bramson-Griffeath shape theorem says that conditional on the clone never going extinct, the radius of the clone expands linearly. In previous work, we studied how this linear rate of expansion depends on the selection strength in the setting of weak selection strength, see Theorem 1 of [26]. More precisely, if we denote by the first unit vector in and define the growth rate such that

then as ,


where is the probability that two simple random walks started at 0 and never hit. In other words, the radius of the asymptotic shape approximating the type-1 clone grows linearly with rate on the order of . Note that as the dimension increases, the growth rate increases due to a larger clone boundary size in higher dimensions.

The previous results hold only in the regime of weak selection or small . For larger values of the selective advantage , simulations can be used to obtain for (in the process can be analyzed directly through simple random walk analysis and we obtain that ). For example, Figure 3 shows that the -dependence of the growth rate is approximately linear for ; in this case simple regression yields the estimate (). Thus, a combination of analysis and simulation gives us a complete picture of how spatial expansion rate of mutant clones in a tissue depend upon the selective advantage for a wide range of selection strengths.

Figure 3: Simulations of clonal expansion rate for large . Dependence of the growth rate on the fitness advantage . Statistics performed on samples for each -value.

2.3 Approximating with a hybrid mesoscopic model

Our results regarding the survival and growth of a single mutant clone suggest a hybrid mesoscopic model simplification that enables our analysis of the field cancerization process. In particular, each successful mutant clone can be well-approximated as growing -dimensional ball with expansion rate as calculated in the previous section. Before proceeding however, let us clarify the notion of clone ‘survival’ a.k.a. ‘success’ in the full model, where multiple mutations can arise and compete in the same finite domain. In particular, we consider a mutant clone with selective advantage over the background to be successful if it reaches size . This criterion guarantees a negligible chance of extinction in an infinite domain with no interference. In particular, if we define to be the extinction time of a biased voter model with selective advantage , one can use the embedded discrete time process and gambler’s ruin calculations to show that if the biased voter process reaches then

Consider the fate of an unsuccessful type-1 clone arising on a background of type-0 cells. The clone evolves as a supercritical () biased voter model conditioned on extinction. In [26] we showed that unsuccessful type-1 mutations typically die out by a time of order


As seen in the previous section, the survival probability in the biased voter model (starting with a single type 1 cell in a sea of type 0 cells) is , but in the more complex spatial Moran model with the possibility of multiple interacting type 1 clones, it is not immediately clear that this survival probability is still given by . However, it was shown in [26] that the above survival probability remains a good approximation as long as


If the total number of type-1 cells is always a negligible fraction of and (A0) holds, then successful type-1 mutations arrive as a Poisson arrival process with approximate rate , where is the total number of cells in the tissue. In particular, these conditions hold for all the numerical examples in this article.

We are now ready to introduce a hybrid mesoscopic model approximation as follows: Type-1 mutations arrive in the healthy tissue as a Poisson arrival process with rate , distributed uniformly at random in the spatial domain. Each mutation event has two potential outcomes:

  • with probability , the mutation is successful and we approximate the subsequent clonal expansion with a ball whose radius grows deterministically. The macroscopic growth rate is , which was derived from individual cellular growth kinetics as described in section 2.2. As a representative simulation in figure 1B suggests, the ball in standard -norm in will be utilized.

  • with probability , the mutation is unsuccessful, and the clone evolves according to the full stochastic (cellular-level) model dynamics conditioned on extinction.

Note that the remainder of the paper discusses properties of this mesoscopic model.

It will be useful to define as the volume of a ball of radius 1 in dimensions,

Note that although the stochastic fluctuations of the shape of expanding clones are lost in this approximation, one gains generality since the mesoscopic model can approximate a whole class of microscopic models that admit a shape result in the line of Theorem 2.1.

2.4 Cancer initiation behavior

Although the methodology developed in this work can be generalized to the setting of -mutation carcinogenesis models, we will consider for simplicity the classic two-mutation model of cancer initiation first introduced by Knudson [30]. Here, type-0 cells are wild-type with fitness , type-1 cells are premalignant with fitness relative to type-0 cells, and type-2 cells are initiated cancer cells with fitness relative to type-1 cells. The time of cancer initiation is defined as the time at which the first successful type-2 cell arrives. In [26], we studied the situation where and found that the timing of cancer initiation is strongly governed by the limiting value of the following meta-parameter:

Roughly speaking, represents the ratio of the rate of producing successful type-1 cells to the subsequent time it take to acquire the first successful type-2. We found that both the mechanisms and distribution of the cancer initiation time vary significantly depending on the regime of :

  • Regime 1 (R1): When , the first successful type-2 mutation occurs within the expanding clone of the first successful type-1 mutation (left panel of Figure 4). The initiation time is exponential and does not depend on the spatial dimension.

  • Regime 2: (R2) For , the first successful type-2 mutation occurs within one of several successful type-1 clones (middle panel in Figure 4). The initiation time is no longer exponential and depends explicitly upon the spatial dimension.

  • Regime 3 (R3): When , the first successful type-2 mutation occurs after many successful type-1 mutations have occurred (right panel of Figure 4). The first successful type-2 can arise from either a successful or an unsuccessful type-1 family; the initiation time represents a mixture distribution of these two events.

  • Note that for and we say that we are in borderline regimes R1/R2 and R2/R3 respectively.

We refer the reader to [26] for mathematical details of these statements. Note that these ‘regimes’ can be thought of as labels highlighting distinct types of initiation behaviors that arise as changes. In fact the system behavior continuously varies through the parameter space, and borderline cases between these regimes do exist. Figure 5 shows how the distribution of the waiting time varies with changing number of cells in . We note that as increases, the waiting time distribution shifts to the left and initiation occurs earlier. By comparing Figures 4 and 5 we see that early initiation times are associated with a diffuse premalignant field with a large number of independent lesions, whereas late initiation times are associated with a single premalignant field harboring the initiating tumor cell.

Figure 4: The three dynamic regimes. Regime 1: first successful type-2 cell (arrow) arises in the first premalignant clone, . Regime 2: several premalignant clones are present at the time of the first successful type-2 cell, . Regime 3: a large number of small premalignant clones are present by the time of the first successful type-2 cell, . Simulations obtained with parameter values as in Figure 5.
Figure 5: Waiting time until first successful type-2. Cumulative distribution function (cdf) of , the waiting time until the first successful type-2 mutation, for increasing (see Theorem 3.1). Regime 1: , Regime 2: , Regime 3: . All other parameters are fixed: , , , , .

To briefly summarize this section, we have described first a microscopic model of cellular division, mutation and death within a regularly structured epithelial tissue. Analysis of the fine-scale dynamics of this model leads to a more tractable hybrid mesoscopic model which approximates the microscopic model. In a previous work, we studied the time until cancer initiation () under this hybrid model and we found three distinct regimes of initiation behavior in which the distribution of has distinct parametric forms. In the next section, we analyze this mesoscopic model to study the characteristics and extent of premalignant fields at the stochastic time of cancer initiation or diagnosis. In the analyses throughout, we will consider parameter ranges spanning all three regimes of initiation behavior; however, for simplicity in regime 3 we will restrict ourselves to the range of parameter space in which successful type-2 mutations arise from successful type-1 mutations (i.e. that do not later die out). The behavior in the final remaining portion of the parameter space in regime 3 will be the subject of further work.

3 Characterizing the premalignant field

The time between cancer initiation and diagnosis, referred to here as , is a subject of great interest, see e.g. [31] for a review. In general, is itself a random variable and may depend on the natural history of the disease until initiation. However, if we assume that is independent of , then we can characterize the premalignant field at time of diagnosis, , by means of the field characterization at time , together with the distribution of the delay time . For this reason, even though the clinically relevant time is , we focus here on characterizing the field at . Note that mathematically, this requires us to condition our analyses upon observing at some time , i.e. condition upon the event .

The starting time of the model () is assumed to be at the end of tissue development and the start of the tissue renewal phase. However for some tissues it is difficult to estimate this time, and thus it may be difficult to ascertain the system time at the time . In such cases, it is simple to adapt our analyses to this scenario and treat as an unknown quantity, by removing the conditioning on and integrating of our results against the density of . The following theorem provides this density in terms of


which is the arrival rate of successful type-1 mutations, and




and .

Theorem 3.1.

The waiting time until birth of the first successful type-2 mutant, has probability density function


See (22) in section 7.1. ∎

3.1 Size of the local field at initation

We are first interested in characterizing the size of the local field, i.e. the region of the premalignant type-1 clone that gives rise to the first successful type-2 clone (see Figure 6). Following the nomenclature of [32], we note the distinction between two different types of recurrent tumors: if the recurrence arises from a transformed cell in the premalignant field that gave rise to the primary tumor, the recurrence is called a second field tumor, see Figure 6A. On the other hand, if the recurrence arises from a premalignant field that is clonally unrelated to the primary malignancy, it is called a second primary tumor, see Figure 6B. These two types of recurrent tumors vary in terms of their degree of clonal relatedness to the primary tumor, and this may have some implications for treatment strategies in primary vs. recurrent tumors.

Figure 6: Local and distant recurrences. Local (blue) and distant (green) premalignant fields give rise to second field tumors and second primary tumors (both red), respectively. In scenario A, there is only one premalignant field (the local field) present at time of cancer initiation (middle panel), and the recurrence occurs inside the local field. In scenario B, two unrelated precancerous fields are present at time of initiation (middle panel), and the recurrence may occur as a second primary tumor in the distant field.

We define now to be the radius of the local field at time , and its corresponding area (). Note that we will use the terminology ‘area’ do describe clone sizes in all dimensions, and reserve the use of the term ‘volume’ for space-time quantities. In the following, we are interested in determining the distributions of these two quantities at time , conditioned on the event . In other words, we are looking for the distributions of and , respectively.

At any given time, each clone produces initiating mutations at a rate proportional to its area. Hence the probability that clone (born at time ) gives rise to the initiating mutation at time is given by the ratio of clone ’s own area,

divided by the total area of type-1 clones present. In other words, the size distribution of the initiating clone is given by the distribution of a size-biased pick from the different clones present at the time the initiated mutation arises.

Definition 3.2 (Size-biased pick).

Let be a family of random variables. A size-biased pick from is defined as a random variable with conditional probability distribution

The following theorem is the main result of this section and characterizes the size-distribution of the local field at the time of initiation. This is recognized as a size-biased pick from the clones present at time , conditioned on the event .

Theorem 3.3.

The distribution of the area of the local field at time , conditioned on , is given by


for .


See section 7.1. ∎

The distribution of the local field radius follows now easily.

Corollary 3.4.

The radius of the local premalignant field at , conditioned on the event , has density

for .

Figure 7: Size-distribution of local field. The size-distribution (7) of the local field is shown for different scenarios, corresponding to the three regimes R1, R2 and R3 illustrated in Figure 4. A For varying arrival times ; B for varying type-1 mutation rates ; C for varying type-2 mutation rates ; (D) for varying type-1 fitness advantages . The non-varying parameters are held constant at , , , , and .

Note that the distribution of the local field size (7) depends on the rate of successful mutations and the growth rate , but is independent of , the arrival rate of type-1 mutations. In Figure 7A, we show how the distribution of the local field area (7) changes with arrival time of the first successful type-2 clone. As expected, the support of the distribution increases with increasing initiation time, and hence the likelihood of having a large local field increases substantially. This suggests that that tumors appearing later have a higher recurrence probability if only the malignant portion is removed during surgery. The finite support of each probability density function reflects the fact that there is a hard upper bound on the size of a premalignant field at finite time in the system.

In Figure 7B,C we illustrate the sensitivity of the size-distribution of the local field to varying mutation rates and , conditioned on observing initiation at the expected time . The mutation rates are tuned to vary across parameter Regimes 1, 2, and 3 as described in the previous section. Observe that for lower mutation rates, the local field size varies widely (and sometimes close to uniformly) over a large range of values, while elevated mutation rates in both cases signify smaller local fields. For the rate (Figure 7B), an intuitive explanation for this behavior is that as the mutation rate increases, the system moves towards regimes 2 and 3, in which the premalignant field is comprised of an increasing number of independent type-1 patches. The necessary number of man-hours to the first successful type-2 mutation is absolved faster, and hence the size of the patch that gives rise to the first successful type-2 decreases accordingly. For (Figure 7C) on the other hand, an increase in the mutation rate signifies a move towards regime 1: fewer type-1 clones are required to produce the first successful type-2, and the size of the type-1 field that yields the first type-2 decreases with increasing . Another observation to note is that the local field size varies across the same range of orders of magnitude as the mutation rates. This suggests for example, that carcinogen exposure or environmental causes changing mutation rates by one order of magnitude could result in predicted field sizes impacted similarly by an order of magnitude.

Finally, we demonstrate the sensitivity of the local field size to the selective advantage of mutant cells, see Figure 7D. For a small fitness gain of , the distribution is peaked at lower field sizes, but as increases the field size distribution shifts to the right. High fitness gains are usually associated with an aggressive tumor phenotype, and Figure 7D suggests that such tumors may also be associated with large surrounding premalignant fields and thus higher recurrence risks.

3.2 Size of the distant field at initiation

Next we are interested in analyzing the size distribution of the distant field at initiation, which is comprised of premalignant clones that are clonally unrelated to the tumor. Define the vector of areas of the distant premalignant lesions at time to be . This vector holds the areas of all premalignant clones except for the local field clone from which the tumor arises. Mathematically speaking, the goal of this section is to characterize the law of conditioned on the event . Before stating the main result some additional notation is needed. First, define the mapping as follows:

Then, we define the random variable , where

The distribution of is the joint distribution of , which characterizes the size distribution of the clones in the distant field at time . We obtain the following result.

Theorem 3.5.

The size-distribution of the distant field clones at time of the first successful type-2 mutation, conditioned on , is given by

where is defined in (24).


See section 7.2. ∎

  • From Theorem 3.5 and Corollary 3.7 below, we see that

Figure 8: The distribution of the total size of the distant field is shown for different scenarios, corresponding to the three regimes R1, R2 and R3 illustrated in Figure 4 for varying type-1 mutation rates . The non-varying parameters are held constant at , , , and .

Figure 8 shows how the probability density function of the total distant field size (i.e. the sum of all distant field patches) changes with increasing mutation rate . For a comparison to the local field size distribution at the same parameter values, we refer to Figure 7B. We note that in regimes 1 and 2 the total distant field size is on the same order of magnitude as the local field size, but in regime three the distant field size is significantly larger than the size of the local field. As will be investigated in more detail below, this suggests that secondary tumor recurrences for cancer types in regime 3 are much more likely to stem from the distant field, and thus are more likely to be clonally unrelated to the primary tumor.

3.3 Number of field patches: evolution until initiation

We next analyze the total number of premalignant lesions over time until tumor initiation. In particular, the following result holds:

Proposition 3.6.

Conditioned on , we have that for all , the number of field patches is distributed as a mixture of a Poisson and a shifted Poisson random variable. In particular,

where and . In particular,


To prove this result we calculate and then differentiate with respect to . Details of the proof are given in section 7.3. ∎

It is interesting to observe that as we see that , therefore as gets closer to time the process looks more like a shifted Poisson. This is stated in the corollary below.

Corollary 3.7.

and . In particular,


where is discussed in Lemma 7.2.

Using Proposition 3.6, we can study the expected number of field patches of a certain size over time. Figure 9 shows the temporal dynamics of clone-size distribution in each regime. In regime 1 the expected number of small clones peaks and then declines as larger clones begin to dominate (consistent with the notion that a single premalignant clone exists prior to initiation), whereas in regimes 2 and 3 we see longer coexistence of large and small clones over time.

Figure 9: Dynamic clone-size distribution. For each of the three regimes in Figure 5, the expected number of type-1 clones of sizes comprised in the corresponding intervals are shown as functions of time up to (expectations are conditioned on }. The intervals are defined as , , and . Parameter values as in Figure 5.
  • The result in Proposition 3.6 can be extended to a result about the entire process conditioned on . We present here the joint distribution of the process at multiple time points, since the proof is similar to Proposition 3.6 we do not include it. For define

    Then for any positive integer , sequence of time points and non-negative integers we have that

    where for ,

    , , and . Note that for each , and , i.e. the ’s form a probability vector.

    The above joint distribution is rather difficult to parse, so we describe how one would generate samples of the increments of the process. For , set , then we can generate the values of the vector under the measure as follows. For each sample according to a Poisson distribution with mean . Choose an integer according to the probability vector , if replace with .

    Note that in contrast to the setting of a Poisson process the random variables are not independent under .

4 Recurrence predictions

Tumor recurrence due to field cancerization poses a substantial clinical problem in many epithelial cancers [3]. We next aim to use the results of the previous section to develop a methodology for assessing the risk of tumor recurrence (as well as the likely type of tumor recurrence) after surgical removal of the primary tumor.

4.1 Local vs. distant field recurrence?

As discussed above, a recurring tumor can either arise in the same premalignant field (a second field tumor), or it can arise in a clonally unrelated field (second primary tumor). In this section we characterize the recurrence time distribution for each of these secondary tumor types, and study how the relative likelihood of local vs. distant recurrence depends upon parameters of the tissue and cancer type.

To this end, we first study the recurrence time distribution for second field tumors, which arise from the local premalignant field. Denote the second field recurrence time by , measured in time units starting from at time . The time is reset at the tumor initiation time , rather than the tumor resection time , to accommodate the possibility that a recurrence occurs prior to detection of the primary tumor. Thus if recurrence occurs at some time , then a secondary tumor already exists at the time of diagnosis of the primary tumor (but may be too small to be detectable). We assume that the primary tumor node is completely resected once it becomes detectable at time , leaving the surrounding field intact (i.e. there are no excision margins).

At time a successful type-2 cell arises from a premalignant clone of radius , whose distribution is characterized in Corollary 3.4. If , the incidence rate of successful type 2 mutations within this field is given by


where is the rate of expansion of the malignant cells into the type-1 field. A new variable is introduced to account for the fact that the malignancy may grow both upwards into the epithelial layer and sideways into the field, depending on the type of tissue involved (see also Figure 2B).

Corollary 4.1.

The probability of a second field tumor having formed before time (measured from ), conditioned on , is given by

In particular, is the probability that smaller, possibly undetectable second field tumors exist at the time of diagnosis.


See section 7.4.∎

In Figure 10A the cumulative distribution function of as calculated in Corollary 4.1 is shown, for varying values of type-2 mutation rates . As one might expect, higher mutation rates yield a decreased time to recurrence (the curves shift to the left for increasing ). However, considering that the size of the premalignant field at initiation of the primary tumor is inversely proportional to the mutation rate , see Figure 10B, the decrease in time to recurrence is a priori not obvious: a bigger precancer field increases the chance of fast recurrence. This example illustrates how a quantitative model enables us to assess the relative importance of competing aspects of the system - in this case, the impact of larger premalignant field versus higher mutation rates on recurrence likelihood.

Figure 10: Time to local recurrence. A The cumulative distribution function of the time to recurrence of a second field tumor is shown for three different scenarios, corresponding to (Regime 1), (Regime 2) and (Regime 2/3), respectively. The remaining parameters are , , , , . B Schematic of the relative initiation times of the primary tumor (yellow) and sizes of the local fields (blue), for the three scenarios in panel A. The numerical values for expected initiation time and local field size are: (a) , ; (b) , ; (c) , .

If the recurrence does not take place in the local field giving rise to the first successful type-2 clone, then it either arises from one of the type-1 clones already present at time of initiation (i.e. the distant field), or it arises in a type-1 clone formed after initiation. In the latter case, the waiting time is again distributed as , and hence we focus here on the distribution of the waiting time , defined as the time from until a second primary tumor arises from the distant field already existing at . We have the following result.

Corollary 4.2.

The probability that the distant field at the time of initiation gives rise to a second primary tumor by time (measured from ), conditioned on , is given by


and is defined in (24).


See section 7.5. ∎

Thanks to the results in this section, it is now possible to evaluate the probability of local versus distant tumor recurrences in each parameter regime. Corollary 4.1 explicitly provides the probability density function , which is the probability that a second field tumor arises at time from the same field that gave rise to the primary tumor. To obtain the corresponding probability density function for recurrence as a second primary tumor, we have to consider recurrences due to distant field lesions that have arisen before and after . While Corollary 4.2 characterizes the recurrence risk due to distant lesions already present at initiation, the time to a successful second primary tumor from a distant field not yet present at initiation is distributed as , see Theorem 3.1. Therefore, the distribution of interest is that of , which is the time of the first distant recurrence event.

In Figure 11 we study how the comparison between the probability density functions of (second field tumor, local) and (second primary tumor, distant) varies in regimes 1, 2 and 3. The likelihood of local vs. distant recurrences depends strongly upon both the timing and parameter regime of the system In regime 1, local recurrence is significantly more likely overall, but at late times the probability of distant recurrences is slightly higher than for local recurrences. In contrast, in regimes 2 and 3 the overall probability of local and distant recurrences are comparable. However, in regime 2, at early times distant field recurrences are more likely, whereas the opposite is true at later times. The same observation, but even more pronounced, holds in regime 3. These results show that fundamental tissue parameters (e.g. renewal rate, size, mutation rate) could be used to provide insights into the timing and clonal relatedness of tumor recurrences, and suggest a new strategy for prognosis prediction and risk stratification.

Figure 11: Local vs. distant recurrence. A For each of the three regimes in Figure 5, we show: the distribution of time to local recurrence , and the distribution of time to distant recurrence . The distribution of is given in Corollary 4.1 and we set to account both for contributions from type-1 clones already existing at as well as contributions from type-1 clones born after (for which time to recurrence is distributed as ). Expected times to recurrence: and (Regime 1); and (Regime 2); and (Regime 3). The parameter values are as in Figure 5.

5 Conclusions and outlook

In this study we performed a quantitative analysis of the cancer field effect by means of a spatial stochastic model of cancer initiation, which had previously been introduced in [26]. Based on this model, we investigated various characteristics of premalignant fields at the time of tumor initiation. In particular, we derived the size-distributions of the local field (the premalignant lesion that gives rise to the tumor) and the distant field (the premalignant lesions that are unrelated to the primary tumor). We calculated the dynamic clone size distribution at times leading up to initiation, and derived the probability density functions of local and distant recurrence times. Finally, we compared the relative likelihood of second field versus second primary tumors, and demonstrated how the clonal relatedness between primary and recurrent tumors depends explicitly upon tissue and cancer type parameters.

Using an example set of parameters in two space dimensions (which is appropriate for describing the cancer initiation process in the basal layer of a stratified epithelium), we found that lower mutation rates (such as in regime 1) were associated with larger local field sizes, whereas higher mutation rates (regimes 2 and 3) led to smaller local fields. We also found that higher mutation rates resulted in larger distant fields, while more aggressive cancers (high selective advantage) led to larger local fields at diagnosis. Finally, we investigated the risk of recurrence after surgical resection of the malignant portion, and found that for low mutation rates (regime 1), local recurrence is much more likely, whereas for larger mutation rates (regimes 2 and 3), the overall probability of local and distant recurrences are comparable. However, in regimes 2 and 3, early recurrences are more likely to be a second primary tumor, whereas the late recurrences are more likely to be second field tumors.

One important limitation of our approach is that the model captures a specific sequence of genetic alterations with specified and , and does currently not allow for permutations of genetic events and divergent pathways. Nevertheless, our model may provide a useful framework for comparing different biological hypotheses and disentangling divergent genetic pathways among cancer subtypes. In particular, it enables us to predict differences in observable dynamics such as initiation times and prognoses between different molecular models. Such an approach could help elucidating the sequence of genetic events during carcinogenesis, and will be the subject of future work. Another limitation of our framework is that we have assumed a static, uniform microenvironment within the tissue. The local microenvironment is in reality determined by a variety of time- and space-dependent factors such as glucose, oxygen, growth factors, drugs and cytokine concentrations. In addition to impacting the growth and mutation rates of cells within the tissue, the local microenvironment is increasingly being recognized as playing an important role in carcinogenesis through stromal signaling. Lastly, in order to apply this framework to specific cancer types (as will be done in a follow-up work), appropriate parameter values for the model need to be determined. Literature estimates can usually be obtained for tissue parameters such as compartment size and tissue regeneration rates. In addition, baseline estimates for point mutation rates are available, usually down to an uncertainty of 1-2 orders of magnitude. Estimates of the relative selective advantage conferred by each mutation are more difficult to ascertain; however some estimates can be obtained from proliferation marker staining and in vitro studies.

In summary, the analyses performed in this work contribute towards a quantitative understanding of how organ-specific physiological parameters (e.g. number of proliferative cells, tissue renewal rates), as well as pathway-specific parameters (e.g. cellular mutation rate, selective advantages conferred by each oncogenic mutation) influence the process of field cancerization and the associated risks of recurrence. We demonstrated that tumor recurrence dynamics and premalignant field characteristics are strongly dependent upon these parameters, which vary across different tissue and cancer types. Once properly calibrated for a specific tissue and cancer type, the proposed methodology can potentially be used to provide insights into key prognostic factors such as risk of multifocal lesions and tumor recurrence, surveillance guidelines, and treatment design. For example, we are able to assess the likelihood and timing of local versus distant recurrences after surgical resection. Since this distinction provides information on the level of clonal relatedness between primary and recurrent tumors, the model predictions may provide insights into whether treatment strategies effective for primary tumors will be useful for recurrent tumors in particular cancer types. In addition, our methodology can be utilized to assess the relative benefits of surgical excision margins (removal of apparently healthy tissue surrounding tumors), and to help determine the minimal margins necessary to prevent recurrence in each tissue type. We consider the work presented here as a proof of concept: a mathematical framework which, once properly calibrated, refined and validated, may provide a useful tool in molecular epidemiology (e.g. mechanistic modeling of carcinogen exposure), as well as the development of probabilistic models for personalized treatment approaches.

6 Acknowledgements

We thank Rick Durrett for insightful discussions on this project as well as his useful suggestions on the manuscript.

7 Appendix: Proofs

7.1 Proof of Theorem 3.3

To prove Theorem 3.3, we first need a few new definitions and preliminary results. Define to be the random total space-time volume covered by successful type-1 families until time ,


where represents the arrival time of the -th family, and is the total number of successful arrivals by time , which is a Poisson process with rate . Let represent the space-time volume conditioned on the event

where . In other words,


For ease of notation we replace with the more compact version . Since and the conditioned process is a compound Poisson process, we obtain that

Similarly, we define to be the total area of clones covered by successful type-1 families at time ,


and we define to be this quantity conditioned on ,


Note that


By considering the space-time volume of type-1 clones we can calculate and . Combining these two formulas and using Bayes rule we get the following result for the joint distribution of the arrival times of successful type-1 mutations, conditioned on the total number of mutations by time .

Lemma 7.1.

Conditioned on and , the arrival times of successful type-1 clones are distributed as order statistics of iid random variables as follows:

where .


The arrival process of successful type-1 mutations is represented by , which is a Poisson process with rate and arrival times . Then for any and sequence we have that




we find using Bayes’ rule

It follows then that

where is a uniform random variable on . . ∎

The distribution in Lemma 7.1 is an exponential twist of the uniform distribution. Note that if the conditioning was placed on the set instead of , then the conditional distribution would no longer have product form because of the term and the arrival times would not be the order statistics from an iid collection of random variables.

Next, we show that the random variable is Poisson if conditioned on .

Lemma 7.2.

Conditioned on


First we note that