Multifocality and recurrence risk:
a quantitative model of field cancerization
Abstract
Primary tumors often emerge within genetically altered fields of premalignant cells that appear histologically normal but have a high chance of progression to malignancy. Clinical observations have suggested that these premalignant fields pose high risks for emergence of secondary recurrent tumors if left behind after surgical removal of the primary tumor. In this work, we develop a spatiotemporal stochastic model of epithelial carcinogenesis, combining cellular reproduction and death dynamics with a general framework for multistage genetic progression to cancer. Using this model, we investigate how macroscopic features (e.g. size and geometry of premalignant fields) depend on microscopic cellular properties of the tissue (e.g. tissue renewal rate, mutation rate, selection advantages conferred by genetic events leading to cancer, etc). We develop methods to characterize how clinically relevant quantities such as waiting time until emergence of second field tumors and recurrence risk after tumor resection. We also study the clonal relatedness of recurrent tumors to primary tumors, and analyze how these phenomena depend upon specific characteristics of the tissue and cancer type. This study contributes to a growing literature seeking to obtain a quantitative understanding of the spatial dynamics in cancer initiation.
1 Introduction
The term ‘field cancerization’ refers to the clinical observation that certain regions of epithelial tissue have an increased risk for the development of multiple synchronous or metachronous primary tumors. This term originated in 1953 from repeated observations by Slaughter and colleagues of multiple primary oral squamous cell cancers and local recurrences within a single region of tissue [1]. This phenomenon, also known as the ‘cancer field effect’ has been documented in many organ systems including head and neck (oral cavity, oropharynx, and larynx), lung, vulva, esophagus, cervix, breast, skin, colon, and bladder [2]. Although the exact underlying mechanisms of the field effect in cancer are not fully understood, recent molecular genetic studies suggest a carcinogenesis model in which clonal expansion of genetically altered cells (possibly with growth advantages) drives the formation of a premalignant field [2, 3]. This premalignant field, which may develop in the form of one or more expanding patches, forms fertile ground for subsequent genetic transformation events, leading to intermediate cancer fields and eventually clonally diverging neoplastic growths. The presence of such premalignant fields poses a significant risk for cancer recurrence and progression even after removal of primary tumors. Importantly, these fields with genetically altered cells often appear histologically normal and are difficult to detect; thus, mathematical models to predict the extent and evolution of these fields may be useful in guiding treatment and prognosis prediction.
In this work we develop and analyze a mathematical model of the cancer field effect. This model combines spatial cellular reproduction and death dynamics in an epithelial tissue with a general framework for multistage genetic progression to cancer. Using this model, we investigate how microscopic cellular properties of the tissue (e.g. tissue renewal rate, mutation rate, selection advantages conferred by genetic events leading to cancer, etc) impact the process of field cancerization in a tissue. We develop methods to characterize the waiting time until emergence of second field tumors and recurrence risk after tumor resection. We also study the clonal relatedness of recurrent tumors to primary tumors by assessing whether local field recurrences (second field tumors) are more likely than distant field recurrences (second primary tumors), and analyze how these phenomena depend upon characteristics of the tissue and cancer type. The methodology developed in this work is generally applicable to understanding the field effect in many epithelial cancers; in followup work we will calibrate and apply this framework to study recurrence risks due to field cancerization in specific cancer types.
The current work will contribute to a growing literature on the evolutionary dynamics of cancer initiation, see e.g. [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. In particular, there have been several previous mathematical models studying the stochastic evolutionary process of cancer initiation from spatially structured tissue, e.g. [14, 15, 16, 17, 18, 19]. In 1977 Williams and Bjerknes proposed a model of clonal expansion in epithelial tissue [16]. This model is closely related to the biased voter model from particle systems theory [20], and in [21, 22] the growth properties and asymptotic shape of the process were established. Komarova studied a 1dimensional process with mutation and showed that the probability of mutant fixation and time to obtain twohit mutants differ from the wellmixed setting [14]; this work was generalized to higher dimensions in [19]. Later, in [17, 18] the relationships between migration, mutation, selection and invasion in a spatial stochastic evolutionary model were explored. Martens and colleagues studied the dynamics of mutation accumulation and population adaptation on a discrete time hexagonal lattice model with nearest neighbor replacement upon reproduction in one and two dimensions [23, 24]. In a recent work Antal and colleagues consider a stochastic spatial model of cancer progression where cells acquire successive fitness advantages along the edge of the tumor. In the context of this model they study the shape of the evolving tumor front as well as the number of mutations acquired in the tumor [25]. Lastly, in previous work we studied the accumulation and spread rates of advantageous mutant clones in a spatially structured population of general dimension [26]. In the current work we build upon these previous studies to develop a quantitative understanding of field cancerization and tumor recurrence risks.
The article is organized as follows: in section 2 we introduce the stochastic mathematical model and describe basic properties regarding the survival and growth rate of mutant clones. Using previously derived results on the spread of mutant clones, we introduce a mesoscopic approximation to the model. In section 3 we analyze the model to investigate the characteristics and extent of local and distant premalignant fields at the time of initiation. In particular, we determine how the spatial geometry of the field (e.g. number and size of lesions) depends on cellular and tissue properties such as mutation rate, tissue renewal rate and mutational fitness advantages. In section 4 we analyze the model to understand the risk of recurrence due to local or distant field malignancies, as a function of time and cellular parameters.
Throughout the paper we will use the following notation for the asymptotic behavior of positive functions,
Finally, we use the notation to denote that the random variable has distribution .
2 Mathematical framework and basic properties
Cancer initiation is associated with the accumulation of multiple successive genetic or epigenetic alterations to a cell [27]. A subset of these genetic events may give rise to a fitness advantage (i.e. an increase in reproductive rate of the cell or avoidance of apoptotic signals), and subsequently lead to a clonal expansion within the tissue. The expanding mutant cell populations form the background for further independent genetic events which lead to carcinogenesis. As a result of this spatial evolutionary process, by the time of cancer initiation or diagnosis the tissue field surrounding a tumor can be composed of genetically distinct premalignant lesions of various sizes and stages.
2.1 Cellbased model
To study the dynamics of this process, we consider a stochastic model which describes the accumulation and spread of a clone of cells with genetic alterations throughout a spatially structured tissue (e.g. stratified epithelium). Thus, we consider the model on a regular lattice , where and is the number of spatial dimensions of the tissue. Each location in the lattice is occupied by a single cell, and each cell reproduces at a rate according to its fitness with exponential waiting times. Whenever a cell reproduces, its offspring replaces one of its lattice neighbors at random, see Figure 1A. The type of each cell corresponds to its fitness, which is related to the number of genetic hits a cell has accumulated in a multistep genetic model of cancer initiation. For example, type0 cells have fitness normalized to 1 and are labeled as wildtype or normal (with no mutations). Initially our entire lattice is occupied by type 0 cells. Type0 cells acquire the first mutation at rate to become type1 cells. The type1 cell will have a relative fitness advantage to type0 cells, given by , for some constant . In general, type cells have a fitness advantage of relative to type cells, and they acquire the th mutation in the sequence at rate to become type cells. The process is stopped when a cell develops mutations; we call this the time of cancer initiation. The number of mutation as well as the parameters for depend on the specific cancer type. Although many (epi)genetic events are selectively disadvantageous (i.e. they confer a selective disadvantage ), the progeny of deleterious mutants die out quickly so here we restrict our attention to the case . Note that this process can be thought of as a spatial version of the Moran process, a spatially wellmixed population model that is commonly used to describe carcinogenesis (e.g. see [8, 9, 10, 11, 12]). In addition, the spatial reproduction and death dynamics of this model (without mutation) correspond to the biased voter process which has been wellstudied in physics and probability literature. In fact, a similar voter model approach was previously used to model cellular dynamics in epithelial tissue and found to correlate well with experimental predictions of clone size distribution in the mouse epithelium [28].
The total number of cells in the fixedsize population is ; in most cancer initiation settings this number is quite large (at least ), while mutation rates are quite small (orders of magnitude smaller than 1). Therefore we will, unless stated otherwise, restrict our analysis to regimes where and . In Section 2.3, we will briefly discuss the specific conditions that we impose on the relationship between these parameters. For mathematical simplicity, the lattice is equipped with periodic boundary conditions; however in most relevant biological situations the domain size (i.e. cell number) is sufficiently large so that boundary effects are negligible.
Note on dimension of the model. We analyze the general model in space dimensions . While all epithelial tissues have an intrinsically three dimensional architecture, in some situations considering may be a good approximation. For example, cancer initiation in mammary ducts of the breast, renal tubules of the kidney, and bronchi tubes of the lung could be viewed as approximately onedimensional processes, due to the aspect ratio of tube radius versus length. On the other hand, cancer initiation in the squamous epithelium of the cervix, the bladder or the oral cavity can be viewed as twodimensional process, since initiation occurs in the basal layer of the epithelium which is only 12 cells thick (e.g. see Figure 2). The validity of such approximations poses an interesting problem in itself, but will not be addressed in this work.
2.2 Survival and growth of a single mutant clone
We first establish some basic behaviors of mutant cells and their clonal progeny within a tissue. Of particular interest are: (i) the survival probability of a mutant clone, and (ii) the rate of spatial expansion of the mutant clone through the tissue. In particular, how are these characteristics influenced by tissue parameters and the cellular fitness advantage conferred by a mutation? We have addressed some of these questions in a previous work [26] and restate the results here to make the paper selfcontained. In addition, we perform new simulations in this work to fill in gaps where theoretical results are currently not available.
Consider the probability that a mutant cell survives to form a viable clone (i.e. does not die out due to demographic stochasticity). Let type1 cells have fitness and type0 cells have fitness , and let denote the type of cell at site in the lattice at time . Define
In other words, is the set of all type1 cell locations at time . We initiate the model with a single type1 cell at the origin surrounded by type0 cells in all other locations:
and assume no further mutations are possible (). This simplified model is known as the WilliamsBjerknes model [16], and if then it corresponds to the biased voter model, see e.g. [29]. Let denote the number of type1 cells in the model at time . Then we can define the extinction time of the process . The probability of survival of a single mutant clone with selective advantage over the surrounding cells is then the probability of the event . By looking at the the process only at its jump times, we note that the embedded process is a discrete time random walk that moves one up with probability or one down with probability . This can be seen by observing that the process only changes at boundaries between type0 and type1 cells, and the only possible resulting events are that the type0 gets replaced by a type1 (resulting in a jump up in ) or the type1 gets replaced by a type1 (resulting in a jump down in . Analysis of the overall survival probability of this random walk can then be calculated using simple gambler’s ruin analysis,
where the approximation is valid for . Thus, the probability that a mutant clone with fitness advantage survives is , and is independent of the dimension of the tissue.
It is important to understand how the expansion rate of a mutant clone depends on the selection strength of the mutant. To this end, we first state a result proven by Bramson and Griffeath [22, 21] which establishes an asymptotic shape for the type1 clone.
Theorem 2.1 (Bramson and Griffeath).
For any , there is a such that on
where is a convex set and has the same symmetries as .
In other words, the BramsonGriffeath shape theorem says that conditional on the clone never going extinct, the radius of the clone expands linearly. In previous work, we studied how this linear rate of expansion depends on the selection strength in the setting of weak selection strength, see Theorem 1 of [26]. More precisely, if we denote by the first unit vector in and define the growth rate such that
then as ,
(1) 
where is the probability that two simple random walks started at 0 and never hit. In other words, the radius of the asymptotic shape approximating the type1 clone grows linearly with rate on the order of . Note that as the dimension increases, the growth rate increases due to a larger clone boundary size in higher dimensions.
The previous results hold only in the regime of weak selection or small . For larger values of the selective advantage , simulations can be used to obtain for (in the process can be analyzed directly through simple random walk analysis and we obtain that ). For example, Figure 3 shows that the dependence of the growth rate is approximately linear for ; in this case simple regression yields the estimate (). Thus, a combination of analysis and simulation gives us a complete picture of how spatial expansion rate of mutant clones in a tissue depend upon the selective advantage for a wide range of selection strengths.
2.3 Approximating with a hybrid mesoscopic model
Our results regarding the survival and growth of a single mutant clone suggest a hybrid mesoscopic model simplification that enables our analysis of the field cancerization process. In particular, each successful mutant clone can be wellapproximated as growing dimensional ball with expansion rate as calculated in the previous section. Before proceeding however, let us clarify the notion of clone ‘survival’ a.k.a. ‘success’ in the full model, where multiple mutations can arise and compete in the same finite domain. In particular, we consider a mutant clone with selective advantage over the background to be successful if it reaches size . This criterion guarantees a negligible chance of extinction in an infinite domain with no interference. In particular, if we define to be the extinction time of a biased voter model with selective advantage , one can use the embedded discrete time process and gambler’s ruin calculations to show that if the biased voter process reaches then
Consider the fate of an unsuccessful type1 clone arising on a background of type0 cells. The clone evolves as a supercritical () biased voter model conditioned on extinction. In [26] we showed that unsuccessful type1 mutations typically die out by a time of order
(2) 
As seen in the previous section, the survival probability in the biased voter model (starting with a single type 1 cell in a sea of type 0 cells) is , but in the more complex spatial Moran model with the possibility of multiple interacting type 1 clones, it is not immediately clear that this survival probability is still given by . However, it was shown in [26] that the above survival probability remains a good approximation as long as
(3) 
If the total number of type1 cells is always a negligible fraction of and (A0) holds, then successful type1 mutations arrive as a Poisson arrival process with approximate rate , where is the total number of cells in the tissue. In particular, these conditions hold for all the numerical examples in this article.
We are now ready to introduce a hybrid mesoscopic model approximation as follows: Type1 mutations arrive in the healthy tissue as a Poisson arrival process with rate , distributed uniformly at random in the spatial domain. Each mutation event has two potential outcomes:

with probability , the mutation is successful and we approximate the subsequent clonal expansion with a ball whose radius grows deterministically. The macroscopic growth rate is , which was derived from individual cellular growth kinetics as described in section 2.2. As a representative simulation in figure 1B suggests, the ball in standard norm in will be utilized.

with probability , the mutation is unsuccessful, and the clone evolves according to the full stochastic (cellularlevel) model dynamics conditioned on extinction.
Note that the remainder of the paper discusses properties of this mesoscopic model.
It will be useful to define as the volume of a ball of radius 1 in dimensions,
Note that although the stochastic fluctuations of the shape of expanding clones are lost in this approximation, one gains generality since the mesoscopic model can approximate a whole class of microscopic models that admit a shape result in the line of Theorem 2.1.
2.4 Cancer initiation behavior
Although the methodology developed in this work can be generalized to the setting of mutation carcinogenesis models, we will consider for simplicity the classic twomutation model of cancer initiation first introduced by Knudson [30]. Here, type0 cells are wildtype with fitness , type1 cells are premalignant with fitness relative to type0 cells, and type2 cells are initiated cancer cells with fitness relative to type1 cells. The time of cancer initiation is defined as the time at which the first successful type2 cell arrives. In [26], we studied the situation where and found that the timing of cancer initiation is strongly governed by the limiting value of the following metaparameter:
Roughly speaking, represents the ratio of the rate of producing successful type1 cells to the subsequent time it take to acquire the first successful type2. We found that both the mechanisms and distribution of the cancer initiation time vary significantly depending on the regime of :

Regime 1 (R1): When , the first successful type2 mutation occurs within the expanding clone of the first successful type1 mutation (left panel of Figure 4). The initiation time is exponential and does not depend on the spatial dimension.

Regime 2: (R2) For , the first successful type2 mutation occurs within one of several successful type1 clones (middle panel in Figure 4). The initiation time is no longer exponential and depends explicitly upon the spatial dimension.

Regime 3 (R3): When , the first successful type2 mutation occurs after many successful type1 mutations have occurred (right panel of Figure 4). The first successful type2 can arise from either a successful or an unsuccessful type1 family; the initiation time represents a mixture distribution of these two events.

Note that for and we say that we are in borderline regimes R1/R2 and R2/R3 respectively.
We refer the reader to [26] for mathematical details of these statements. Note that these ‘regimes’ can be thought of as labels highlighting distinct types of initiation behaviors that arise as changes. In fact the system behavior continuously varies through the parameter space, and borderline cases between these regimes do exist. Figure 5 shows how the distribution of the waiting time varies with changing number of cells in . We note that as increases, the waiting time distribution shifts to the left and initiation occurs earlier. By comparing Figures 4 and 5 we see that early initiation times are associated with a diffuse premalignant field with a large number of independent lesions, whereas late initiation times are associated with a single premalignant field harboring the initiating tumor cell.
To briefly summarize this section, we have described first a microscopic model of cellular division, mutation and death within a regularly structured epithelial tissue. Analysis of the finescale dynamics of this model leads to a more tractable hybrid mesoscopic model which approximates the microscopic model. In a previous work, we studied the time until cancer initiation () under this hybrid model and we found three distinct regimes of initiation behavior in which the distribution of has distinct parametric forms. In the next section, we analyze this mesoscopic model to study the characteristics and extent of premalignant fields at the stochastic time of cancer initiation or diagnosis. In the analyses throughout, we will consider parameter ranges spanning all three regimes of initiation behavior; however, for simplicity in regime 3 we will restrict ourselves to the range of parameter space in which successful type2 mutations arise from successful type1 mutations (i.e. that do not later die out). The behavior in the final remaining portion of the parameter space in regime 3 will be the subject of further work.
3 Characterizing the premalignant field
The time between cancer initiation and diagnosis, referred to here as , is a subject of great interest, see e.g. [31] for a review. In general, is itself a random variable and may depend on the natural history of the disease until initiation. However, if we assume that is independent of , then we can characterize the premalignant field at time of diagnosis, , by means of the field characterization at time , together with the distribution of the delay time . For this reason, even though the clinically relevant time is , we focus here on characterizing the field at . Note that mathematically, this requires us to condition our analyses upon observing at some time , i.e. condition upon the event .
The starting time of the model () is assumed to be at the end of tissue development and the start of the tissue renewal phase. However for some tissues it is difficult to estimate this time, and thus it may be difficult to ascertain the system time at the time . In such cases, it is simple to adapt our analyses to this scenario and treat as an unknown quantity, by removing the conditioning on and integrating of our results against the density of . The following theorem provides this density in terms of
(4) 
which is the arrival rate of successful type1 mutations, and
(5) 
where
(6) 
and .
Theorem 3.1.
The waiting time until birth of the first successful type2 mutant, has probability density function
3.1 Size of the local field at initation
We are first interested in characterizing the size of the local field, i.e. the region of the premalignant type1 clone that gives rise to the first successful type2 clone (see Figure 6). Following the nomenclature of [32], we note the distinction between two different types of recurrent tumors: if the recurrence arises from a transformed cell in the premalignant field that gave rise to the primary tumor, the recurrence is called a second field tumor, see Figure 6A. On the other hand, if the recurrence arises from a premalignant field that is clonally unrelated to the primary malignancy, it is called a second primary tumor, see Figure 6B. These two types of recurrent tumors vary in terms of their degree of clonal relatedness to the primary tumor, and this may have some implications for treatment strategies in primary vs. recurrent tumors.
We define now to be the radius of the local field at time , and its corresponding area (). Note that we will use the terminology ‘area’ do describe clone sizes in all dimensions, and reserve the use of the term ‘volume’ for spacetime quantities. In the following, we are interested in determining the distributions of these two quantities at time , conditioned on the event . In other words, we are looking for the distributions of and , respectively.
At any given time, each clone produces initiating mutations at a rate proportional to its area. Hence the probability that clone (born at time ) gives rise to the initiating mutation at time is given by the ratio of clone ’s own area,
divided by the total area of type1 clones present. In other words, the size distribution of the initiating clone is given by the distribution of a sizebiased pick from the different clones present at the time the initiated mutation arises.
Definition 3.2 (Sizebiased pick).
Let be a family of random variables. A sizebiased pick from is defined as a random variable with conditional probability distribution
The following theorem is the main result of this section and characterizes the sizedistribution of the local field at the time of initiation. This is recognized as a sizebiased pick from the clones present at time , conditioned on the event .
Theorem 3.3.
The distribution of the area of the local field at time , conditioned on , is given by
(7) 
for .
Proof.
See section 7.1. ∎
The distribution of the local field radius follows now easily.
Corollary 3.4.
The radius of the local premalignant field at , conditioned on the event , has density
for .
Note that the distribution of the local field size (7) depends on the rate of successful mutations and the growth rate , but is independent of , the arrival rate of type1 mutations. In Figure 7A, we show how the distribution of the local field area (7) changes with arrival time of the first successful type2 clone. As expected, the support of the distribution increases with increasing initiation time, and hence the likelihood of having a large local field increases substantially. This suggests that that tumors appearing later have a higher recurrence probability if only the malignant portion is removed during surgery. The finite support of each probability density function reflects the fact that there is a hard upper bound on the size of a premalignant field at finite time in the system.
In Figure 7B,C we illustrate the sensitivity of the sizedistribution of the local field to varying mutation rates and , conditioned on observing initiation at the expected time . The mutation rates are tuned to vary across parameter Regimes 1, 2, and 3 as described in the previous section. Observe that for lower mutation rates, the local field size varies widely (and sometimes close to uniformly) over a large range of values, while elevated mutation rates in both cases signify smaller local fields. For the rate (Figure 7B), an intuitive explanation for this behavior is that as the mutation rate increases, the system moves towards regimes 2 and 3, in which the premalignant field is comprised of an increasing number of independent type1 patches. The necessary number of manhours to the first successful type2 mutation is absolved faster, and hence the size of the patch that gives rise to the first successful type2 decreases accordingly. For (Figure 7C) on the other hand, an increase in the mutation rate signifies a move towards regime 1: fewer type1 clones are required to produce the first successful type2, and the size of the type1 field that yields the first type2 decreases with increasing . Another observation to note is that the local field size varies across the same range of orders of magnitude as the mutation rates. This suggests for example, that carcinogen exposure or environmental causes changing mutation rates by one order of magnitude could result in predicted field sizes impacted similarly by an order of magnitude.
Finally, we demonstrate the sensitivity of the local field size to the selective advantage of mutant cells, see Figure 7D. For a small fitness gain of , the distribution is peaked at lower field sizes, but as increases the field size distribution shifts to the right. High fitness gains are usually associated with an aggressive tumor phenotype, and Figure 7D suggests that such tumors may also be associated with large surrounding premalignant fields and thus higher recurrence risks.
3.2 Size of the distant field at initiation
Next we are interested in analyzing the size distribution of the distant field at initiation, which is comprised of premalignant clones that are clonally unrelated to the tumor. Define the vector of areas of the distant premalignant lesions at time to be . This vector holds the areas of all premalignant clones except for the local field clone from which the tumor arises. Mathematically speaking, the goal of this section is to characterize the law of conditioned on the event . Before stating the main result some additional notation is needed. First, define the mapping as follows:
Then, we define the random variable , where
The distribution of is the joint distribution of , which characterizes the size distribution of the clones in the distant field at time . We obtain the following result.
Theorem 3.5.
The sizedistribution of the distant field clones at time of the first successful type2 mutation, conditioned on , is given by
where is defined in (24).
Proof.
See section 7.2. ∎
Figure 8 shows how the probability density function of the total distant field size (i.e. the sum of all distant field patches) changes with increasing mutation rate . For a comparison to the local field size distribution at the same parameter values, we refer to Figure 7B. We note that in regimes 1 and 2 the total distant field size is on the same order of magnitude as the local field size, but in regime three the distant field size is significantly larger than the size of the local field. As will be investigated in more detail below, this suggests that secondary tumor recurrences for cancer types in regime 3 are much more likely to stem from the distant field, and thus are more likely to be clonally unrelated to the primary tumor.
3.3 Number of field patches: evolution until initiation
We next analyze the total number of premalignant lesions over time until tumor initiation. In particular, the following result holds:
Proposition 3.6.
Conditioned on , we have that for all , the number of field patches is distributed as a mixture of a Poisson and a shifted Poisson random variable. In particular,
where and . In particular,
Proof.
To prove this result we calculate and then differentiate with respect to . Details of the proof are given in section 7.3. ∎
It is interesting to observe that as we see that , therefore as gets closer to time the process looks more like a shifted Poisson. This is stated in the corollary below.
Corollary 3.7.
Using Proposition 3.6, we can study the expected number of field patches of a certain size over time. Figure 9 shows the temporal dynamics of clonesize distribution in each regime. In regime 1 the expected number of small clones peaks and then declines as larger clones begin to dominate (consistent with the notion that a single premalignant clone exists prior to initiation), whereas in regimes 2 and 3 we see longer coexistence of large and small clones over time.

The result in Proposition 3.6 can be extended to a result about the entire process conditioned on . We present here the joint distribution of the process at multiple time points, since the proof is similar to Proposition 3.6 we do not include it. For define
Then for any positive integer , sequence of time points and nonnegative integers we have that
where for ,
, , and . Note that for each , and , i.e. the ’s form a probability vector.
The above joint distribution is rather difficult to parse, so we describe how one would generate samples of the increments of the process. For , set , then we can generate the values of the vector under the measure as follows. For each sample according to a Poisson distribution with mean . Choose an integer according to the probability vector , if replace with .
Note that in contrast to the setting of a Poisson process the random variables are not independent under .
4 Recurrence predictions
Tumor recurrence due to field cancerization poses a substantial clinical problem in many epithelial cancers [3]. We next aim to use the results of the previous section to develop a methodology for assessing the risk of tumor recurrence (as well as the likely type of tumor recurrence) after surgical removal of the primary tumor.
4.1 Local vs. distant field recurrence?
As discussed above, a recurring tumor can either arise in the same premalignant field (a second field tumor), or it can arise in a clonally unrelated field (second primary tumor). In this section we characterize the recurrence time distribution for each of these secondary tumor types, and study how the relative likelihood of local vs. distant recurrence depends upon parameters of the tissue and cancer type.
To this end, we first study the recurrence time distribution for second field tumors, which arise from the local premalignant field. Denote the second field recurrence time by , measured in time units starting from at time . The time is reset at the tumor initiation time , rather than the tumor resection time , to accommodate the possibility that a recurrence occurs prior to detection of the primary tumor. Thus if recurrence occurs at some time , then a secondary tumor already exists at the time of diagnosis of the primary tumor (but may be too small to be detectable). We assume that the primary tumor node is completely resected once it becomes detectable at time , leaving the surrounding field intact (i.e. there are no excision margins).
At time a successful type2 cell arises from a premalignant clone of radius , whose distribution is characterized in Corollary 3.4. If , the incidence rate of successful type 2 mutations within this field is given by
(10) 
where is the rate of expansion of the malignant cells into the type1 field. A new variable is introduced to account for the fact that the malignancy may grow both upwards into the epithelial layer and sideways into the field, depending on the type of tissue involved (see also Figure 2B).
Corollary 4.1.
The probability of a second field tumor having formed before time (measured from ), conditioned on , is given by
In particular, is the probability that smaller, possibly undetectable second field tumors exist at the time of diagnosis.
Proof.
See section 7.4.∎
In Figure 10A the cumulative distribution function of as calculated in Corollary 4.1 is shown, for varying values of type2 mutation rates . As one might expect, higher mutation rates yield a decreased time to recurrence (the curves shift to the left for increasing ). However, considering that the size of the premalignant field at initiation of the primary tumor is inversely proportional to the mutation rate , see Figure 10B, the decrease in time to recurrence is a priori not obvious: a bigger precancer field increases the chance of fast recurrence. This example illustrates how a quantitative model enables us to assess the relative importance of competing aspects of the system  in this case, the impact of larger premalignant field versus higher mutation rates on recurrence likelihood.
If the recurrence does not take place in the local field giving rise to the first successful type2 clone, then it either arises from one of the type1 clones already present at time of initiation (i.e. the distant field), or it arises in a type1 clone formed after initiation. In the latter case, the waiting time is again distributed as , and hence we focus here on the distribution of the waiting time , defined as the time from until a second primary tumor arises from the distant field already existing at . We have the following result.
Corollary 4.2.
The probability that the distant field at the time of initiation gives rise to a second primary tumor by time (measured from ), conditioned on , is given by
where
and is defined in (24).
Proof.
See section 7.5. ∎
Thanks to the results in this section, it is now possible to evaluate the probability of local versus distant tumor recurrences in each parameter regime. Corollary 4.1 explicitly provides the probability density function , which is the probability that a second field tumor arises at time from the same field that gave rise to the primary tumor. To obtain the corresponding probability density function for recurrence as a second primary tumor, we have to consider recurrences due to distant field lesions that have arisen before and after . While Corollary 4.2 characterizes the recurrence risk due to distant lesions already present at initiation, the time to a successful second primary tumor from a distant field not yet present at initiation is distributed as , see Theorem 3.1. Therefore, the distribution of interest is that of , which is the time of the first distant recurrence event.
In Figure 11 we study how the comparison between the probability density functions of (second field tumor, local) and (second primary tumor, distant) varies in regimes 1, 2 and 3. The likelihood of local vs. distant recurrences depends strongly upon both the timing and parameter regime of the system In regime 1, local recurrence is significantly more likely overall, but at late times the probability of distant recurrences is slightly higher than for local recurrences. In contrast, in regimes 2 and 3 the overall probability of local and distant recurrences are comparable. However, in regime 2, at early times distant field recurrences are more likely, whereas the opposite is true at later times. The same observation, but even more pronounced, holds in regime 3. These results show that fundamental tissue parameters (e.g. renewal rate, size, mutation rate) could be used to provide insights into the timing and clonal relatedness of tumor recurrences, and suggest a new strategy for prognosis prediction and risk stratification.
5 Conclusions and outlook
In this study we performed a quantitative analysis of the cancer field effect by means of a spatial stochastic model of cancer initiation, which had previously been introduced in [26]. Based on this model, we investigated various characteristics of premalignant fields at the time of tumor initiation. In particular, we derived the sizedistributions of the local field (the premalignant lesion that gives rise to the tumor) and the distant field (the premalignant lesions that are unrelated to the primary tumor). We calculated the dynamic clone size distribution at times leading up to initiation, and derived the probability density functions of local and distant recurrence times. Finally, we compared the relative likelihood of second field versus second primary tumors, and demonstrated how the clonal relatedness between primary and recurrent tumors depends explicitly upon tissue and cancer type parameters.
Using an example set of parameters in two space dimensions (which is appropriate for describing the cancer initiation process in the basal layer of a stratified epithelium), we found that lower mutation rates (such as in regime 1) were associated with larger local field sizes, whereas higher mutation rates (regimes 2 and 3) led to smaller local fields. We also found that higher mutation rates resulted in larger distant fields, while more aggressive cancers (high selective advantage) led to larger local fields at diagnosis. Finally, we investigated the risk of recurrence after surgical resection of the malignant portion, and found that for low mutation rates (regime 1), local recurrence is much more likely, whereas for larger mutation rates (regimes 2 and 3), the overall probability of local and distant recurrences are comparable. However, in regimes 2 and 3, early recurrences are more likely to be a second primary tumor, whereas the late recurrences are more likely to be second field tumors.
One important limitation of our approach is that the model captures a specific sequence of genetic alterations with specified and , and does currently not allow for permutations of genetic events and divergent pathways. Nevertheless, our model may provide a useful framework for comparing different biological hypotheses and disentangling divergent genetic pathways among cancer subtypes. In particular, it enables us to predict differences in observable dynamics such as initiation times and prognoses between different molecular models. Such an approach could help elucidating the sequence of genetic events during carcinogenesis, and will be the subject of future work. Another limitation of our framework is that we have assumed a static, uniform microenvironment within the tissue. The local microenvironment is in reality determined by a variety of time and spacedependent factors such as glucose, oxygen, growth factors, drugs and cytokine concentrations. In addition to impacting the growth and mutation rates of cells within the tissue, the local microenvironment is increasingly being recognized as playing an important role in carcinogenesis through stromal signaling. Lastly, in order to apply this framework to specific cancer types (as will be done in a followup work), appropriate parameter values for the model need to be determined. Literature estimates can usually be obtained for tissue parameters such as compartment size and tissue regeneration rates. In addition, baseline estimates for point mutation rates are available, usually down to an uncertainty of 12 orders of magnitude. Estimates of the relative selective advantage conferred by each mutation are more difficult to ascertain; however some estimates can be obtained from proliferation marker staining and in vitro studies.
In summary, the analyses performed in this work contribute towards a quantitative understanding of how organspecific physiological parameters (e.g. number of proliferative cells, tissue renewal rates), as well as pathwayspecific parameters (e.g. cellular mutation rate, selective advantages conferred by each oncogenic mutation) influence the process of field cancerization and the associated risks of recurrence. We demonstrated that tumor recurrence dynamics and premalignant field characteristics are strongly dependent upon these parameters, which vary across different tissue and cancer types. Once properly calibrated for a specific tissue and cancer type, the proposed methodology can potentially be used to provide insights into key prognostic factors such as risk of multifocal lesions and tumor recurrence, surveillance guidelines, and treatment design. For example, we are able to assess the likelihood and timing of local versus distant recurrences after surgical resection. Since this distinction provides information on the level of clonal relatedness between primary and recurrent tumors, the model predictions may provide insights into whether treatment strategies effective for primary tumors will be useful for recurrent tumors in particular cancer types. In addition, our methodology can be utilized to assess the relative benefits of surgical excision margins (removal of apparently healthy tissue surrounding tumors), and to help determine the minimal margins necessary to prevent recurrence in each tissue type. We consider the work presented here as a proof of concept: a mathematical framework which, once properly calibrated, refined and validated, may provide a useful tool in molecular epidemiology (e.g. mechanistic modeling of carcinogen exposure), as well as the development of probabilistic models for personalized treatment approaches.
6 Acknowledgements
We thank Rick Durrett for insightful discussions on this project as well as his useful suggestions on the manuscript.
7 Appendix: Proofs
7.1 Proof of Theorem 3.3
To prove Theorem 3.3, we first need a few new definitions and preliminary results. Define to be the random total spacetime volume covered by successful type1 families until time ,
(11) 
where represents the arrival time of the th family, and is the total number of successful arrivals by time , which is a Poisson process with rate . Let represent the spacetime volume conditioned on the event
where . In other words,
(12) 
For ease of notation we replace with the more compact version . Since and the conditioned process is a compound Poisson process, we obtain that
Similarly, we define to be the total area of clones covered by successful type1 families at time ,
(13) 
and we define to be this quantity conditioned on ,
(14) 
Note that
(15) 
By considering the spacetime volume of type1 clones we can calculate and . Combining these two formulas and using Bayes rule we get the following result for the joint distribution of the arrival times of successful type1 mutations, conditioned on the total number of mutations by time .
Lemma 7.1.
Conditioned on and , the arrival times of successful type1 clones are distributed as order statistics of iid random variables as follows:
where .
Proof.
The arrival process of successful type1 mutations is represented by , which is a Poisson process with rate and arrival times . Then for any and sequence we have that
(16) 
Since
(17) 
we find using Bayes’ rule
It follows then that
where is a uniform random variable on . . ∎
The distribution in Lemma 7.1 is an exponential twist of the uniform distribution. Note that if the conditioning was placed on the set instead of , then the conditional distribution would no longer have product form because of the term and the arrival times would not be the order statistics from an iid collection of random variables.
Next, we show that the random variable is Poisson if conditioned on .
Lemma 7.2.
Conditioned on
Proof.
First we note that