Lévy processes with marked jumps II:
Application to a population model with mutations at birth
Consider a population where individuals give birth at constant rate during their lifetimes to i.i.d. copies of themselves. Individuals bear clonally inherited types, but (neutral) mutations may happen at the birth events. The smallest subtree containing the genealogy of all the extant individuals at a fixed time is called the coalescent point process. We enrich this process with the history of the mutations that appeared over time, and call it the marked coalescent point process.
With the help of limit theorems for Lévy processes with marked jumps established in , we prove the convergence of the marked coalescent point process, as the population size gets large and under two possible regimes for the mutations (one of them being a classical rare mutation regime), towards a multivariate Poisson point process. This Poisson point process can be described as the coalescent point process of the limiting population at , with mutations arising as inhomogeneous regenerative sets along the lineages. Its intensity measure is further characterized thanks to the excursion theory for spectrally positive Lévy processes. In the rare mutation asymptotic, mutations arise as the image of a Poisson process by the ladder height process of a Lévy process with infinite variation. In the particular case of the critical branching process with exponential lifetimes, the limiting object is the Poisson point process of the depths of excursions of the Brownian motion, with Poissonian mutations on the lineages.
I would like to thank my supervisor, Amaury Lambert, for his very helpful advice and encouragement.
Key words and phrases: splitting tree, coalescent point process, Poisson point process, invariance principle, Lévy process, excursion theory.
AMS Classification: 60J85, 60F17 (Primary), 92D25, 60G55, 60G51, 60J55 (Secondary)
A splitting tree (, , ) describes a population of individuals with i.i.d. lifetime durations, whose distribution is not necessarily exponential, giving birth at constant rate during their lives. Each birth gives rise to a single child, who behaves as an independent copy of her parent. We consider here the extended framework of : for each individual, the birth times and lifetimes of her progeny are given by a Poisson process with intensity , where the so-called lifespan measure is a Lévy measure on satisfying . In particular, the number of children of a given individual is possibly infinite. In addition, we assume that individuals carry types, and that every time a birth occurs, a mutation may happen, giving rise to a mutant child. Mutations are assumed to be neutral, meaning that they do not affect the behaviour of individuals. In order to take this into account, we introduce marked splitting trees: to each birth event we associate a mark in {0, 1}, which codes for the absence (0) or presence (1) of a mutation. In other words, a 0-type birth means a clonal birth, and a 1-type birth produces a mutant child. The mutations experienced by the population are then described by these marks.
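As an illustration of the marked birth mechanism described above, here is a minimal Python sketch that simulates the progeny of a single individual. It assumes a finite lifespan measure (a constant birth rate together with a sampler for child lifetimes), which is only a special case of the general framework; the names `simulate_progeny`, `draw_life` and `mutation_fn` are hypothetical and do not come from the paper.

```python
import random

def simulate_progeny(lifetime, birth_rate, draw_life, mutation_fn, rng):
    """Marked births of one individual, as (birth_age, child_lifetime, mark) triples.

    Assumption: births occur at constant rate `birth_rate` during the
    parent's lifetime, child lifetimes are i.i.d. draws from `draw_life`,
    and a child with lifetime r is a mutant (mark 1) with probability
    `mutation_fn(r)`, independently of everything else.
    """
    births = []
    age = rng.expovariate(birth_rate)          # age of the parent at the first birth
    while age < lifetime:
        child_life = draw_life(rng)
        mark = 1 if rng.random() < mutation_fn(child_life) else 0
        births.append((age, child_life, mark))
        age += rng.expovariate(birth_rate)     # waiting time until the next birth
    return births

rng = random.Random(0)
births = simulate_progeny(
    lifetime=5.0,
    birth_rate=2.0,
    draw_life=lambda r: r.expovariate(1.0),    # exponential lifetimes, for illustration only
    mutation_fn=lambda life: 0.3,              # constant mutation probability
    rng=rng,
)
```

Marks equal to 0 correspond to clonal births and marks equal to 1 to mutant births; a mutation probability depending on the child's lifetime is obtained by passing a non-constant `mutation_fn`.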
Population models with mutations have inspired many works in the past, and have many applications in domains such as population genetics, phylogeny or epidemiology. Such models have been well studied in the particular case of populations with fixed size. In the Wright-Fisher and Moran models with neutral mutations, as well as in the Kingman coalescent, explicit results on the allelic partition of the population are provided by Ewens’ sampling formula (,). Relaxing the hypothesis of constant population size, branching processes with mutations at birth are studied in the monograph . More recently, results have been obtained for the allelic partition and frequency spectrum of splitting trees, with mutations appearing either at birth of individuals () or at constant rate along the lineages (, , ), and are reviewed in . The present work focuses on asymptotic results when the size of the population gets large, for the genealogy (with mutational history) of splitting trees with mutations at birth, and relies on a previous article .
Genealogy of the -th population
Let us fix some positive real number . For , consider a marked splitting tree , and condition it on having a fixed positive number of individuals alive at level . Note that we use here the word ’level’ to denote the real time in which the individuals live, whereas we reserve the word ’time’ for the index of stochastic processes. This paper follows on from a work of L. Popovic () in the critical case with exponential lifetimes, without mutations, in which she proved the convergence in distribution of the coalescent point process (i.e. the smallest subtree containing the genealogy of the extant individuals) towards a certain Poisson point process. Our aim is to provide asymptotic results as gets large, for the structure of the genealogy of the population up to level , enriched with the random levels at which marks occurred on the lineages. To this aim, after a proper rescaling of , we introduce a random point measure which we call the marked coalescent point process. This point measure has atoms ; its -th one is itself a random point measure, whose set of atoms contains all the levels where mutations occurred on the -th lineage, and the coalescence time between individuals and . This sequence of point measures is the mathematical object for which we aim to get convergence as , after having set some convergence assumptions, which we discuss later.
Our work mainly relies on the study of splitting trees with the help of the so-called jumping chronological contour process (or JCCP). This process is an exploration process of the tree (without mutations) introduced by A. Lambert in , visiting all the existence levels of all the individuals exactly once, and ending at level . He showed in this paper that the JCCP of a tree truncated up to level is a compensated compound Poisson process with no negative jumps (spectrally positive Lévy process with finite variation) reflected below and killed when hitting . In particular, the labeling of the excursions of the JCCP below provides a labeling of the extant individuals at level . Inferring properties concerning the genealogy of the alive population at level in the tree then essentially consists in studying the excursions away from of this reflected Lévy process.
We introduce in a generalization of this contour process to the framework of our rescaled marked splitting trees . We are thereby led to study a bivariate Lévy process . Roughly speaking, codes for the JCCP of (without mutations), and codes for the mutations. Namely, since a jump of corresponds to the encounter of a birth event when exploring the tree, will jump as well (with amplitude 1) if this birth was of type 1. The process is in one-to-one correspondence with the marked tree . We now want to characterize the law of the atoms of using this property. Let us first give an idea of our reasoning in the case where there are no mutations. The JCCP of , truncated up to , is distributed as reflected below . The set of levels at which births occurred on the lineage of the -th individual, up to its coalescence with the rest of the tree, is then exactly the set of values taken by the future infimum of the -th excursion of the JCCP under . First, this entails that the atoms of are i.i.d. Second, using a time reversal argument, the distribution of this set can be read from the ascending ladder height process of . A similar reasoning for the splitting tree with mutations leads to the following facts. Consider the ascending ladder height process of , and put marks on its jumps in agreement with the marks on the corresponding jumps of . Note that this implies a selection of the marks that are carried by jumps of the supremum process of . Denoting by the counting process of these marks, the bivariate process is a (possibly killed) bivariate subordinator which we call the marked ladder height process. The mutations on a lineage then form an inhomogeneous regenerative set, distributed as the image by of the jump times of under the excursion measure of away from , which finally yields a simple description of the law of the (i.i.d.) atoms of .
Obtaining an invariance principle for a population model in a large population asymptotic requires assuming that as , the population converges in a certain sense. A classical example would be the convergence of the rescaled Bienaymé-Galton-Watson process towards the Feller diffusion (). Now regardless of mutations, the JCCP offers a one-to-one correspondence between and a continuous time process. Our first assumption then arises naturally as the convergence in distribution of the properly rescaled Lévy process towards a Lévy process (with infinite variation, Assumption A). In particular, the lifetimes of individuals do not necessarily vanish in the limit. Besides, two different assumptions concerning the mutations are considered. The first one (B.1) falls within the classical asymptotic of rare mutations: every birth is of type 1 with a constant probability , and as . Asymptotic results in this framework are obtained in for the genealogical structure of alleles in a critical or subcritical Bienaymé-Galton-Watson process (contrary to ours, however, they do not concern the extant population at a fixed time horizon, but the whole population). The second one (B.2) examines the case where the probability for an individual to be a mutant is correlated with her lifetime, in the sense that mutations favor longer lifetimes.
While Assumption A alone ensures the convergence in distribution of towards the classical ladder height process of , Assumptions B.1 and B.2 are designed to allow that of the marked ladder height process. Indeed, we prove in  the convergence in law of towards a (possibly killed) bivariate subordinator , such that is the ladder height process of . Note nevertheless that in this framework there is in general no convergence of the whole mutation process, namely . In the case of Assumption B.1, and are independent, and is a Poisson process with parameter , which arises as the limit of the sequence after a proper rescaling. This means that the contribution to the mutations in the limit exclusively comes from individuals with vanishing lifetimes. This is no longer the case under Assumption B.2, yet additional independent marks can appear if has a Gaussian component. Using this convergence to deduce that of the (rescaled) law of the mutations on a lineage, the convergence of to a Poisson point measure is then a straightforward consequence of the law of rare events for null arrays (see e.g. [14, Th. 16.18]). Under B.1, its intensity measure is the law of the image by of an independent Poisson process with parameter , under the excursion measure of away from zero. A very similar but slightly more complicated result, involving the limiting marked ladder height process , is available under B.2. Besides, in the case where is a Brownian motion, is simply a drift, and thus the intensity measure is the law of a Poisson process killed at some independent random time, distributed as the depth of an excursion of the Brownian motion away from 0.
The paper is organized as follows: Section 2 sets up notation for the topological framework, and provides some background on the excursion theory for Lévy processes (see e.g. and ). Section 3 is devoted to the statement of our results, and Section 4 to their proofs. In the appendix, we prove some properties that are consequences of Assumption A, and of which we make frequent use throughout the paper.
We consider the Euclidean space and endow it with its Borel σ-field . We denote by
the space of all càd-làg functions from to . We endow the latter with the Skorokhod topology, which makes it a Polish space (see [13, VI.1.b]). In the sequel, for any function and , we will use the notation , where .
Now for any Polish space , with its Borel σ-field , the space of positive finite measures on can be endowed with the weak topology :
It is the coarsest topology for which the mappings are continuous for any continuous bounded function . In the sequel, we will use the notation .
Hence we endow here and with their respective weak topologies. The notation will be used for both weak convergence in and in , and we will use the symbol for the equality in distribution. Recall that for any sequence of -valued càd-làg processes , the weak convergence of towards a process of is equivalent to the finite dimensional convergence of towards along any dense subset , together with the tightness of . For more details about convergence in distribution in , see [13, VI.3].
From now on, we fix . We consider the space of positive point measures on , and endow it with the σ-field generated by the mappings . Then we denote by the subset of the point measures on of the form
The trace σ-field on is in particular generated by the class
2.2 Excursion theory for spectrally positive Lévy processes
We provide here some background about the excursion theory for spectrally positive Lévy processes. For the basic properties concerning spectrally positive Lévy processes that will be needed here, we refer to a summary we provide in [8, Section 2] (these properties can otherwise be found in  or , for example).
Let be a spectrally positive Lévy process with Lévy measure . We define its past supremum for all . We denote by the set of excursions of away from : is the set of the càd-làg functions with no negative jumps for which there exists , which will be called the lifetime of the excursion, and such that , has values in for and in the case where , .
The reflected process is a Markov process for which one can construct a local time at 0. We denote by its inverse, and we consider the process with values in (where is an additional isolated point), defined by :
Then according to Theorem IV.10 in , if does not drift to , then is recurrent for the reflected process, and is a Poisson point process with intensity , where is some constant depending on the choice of , and is a measure on . Else, is a Poisson point process with intensity , stopped at the first excursion with infinite lifetime.
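To make the objects of this section more concrete, the following Python sketch works on a discrete skeleton of a path (a finite list of values), not on a genuine Lévy process. It computes the past supremum, the path reflected at its supremum, and the excursions of the reflected path away from 0 together with their depths. The sign convention used here (path minus supremum, so that excursions take negative values) and all function names are assumptions made for illustration.

```python
from itertools import accumulate

def past_supremum(path):
    """Running maximum of the path."""
    return list(accumulate(path, max))

def reflected_at_supremum(path):
    """Path minus its past supremum (nonpositive values)."""
    return [x - s for x, s in zip(path, past_supremum(path))]

def excursions_away_from_zero(values):
    """Split a sequence into maximal runs of nonzero values."""
    runs, current = [], []
    for v in values:
        if v != 0:
            current.append(v)
        elif current:
            runs.append(current)
            current = []
    if current:                       # last excursion, possibly unfinished
        runs.append(current)
    return runs

def depths(excursions):
    """Depth of each excursion, i.e. the largest distance below the supremum."""
    return [-min(run) for run in excursions]

path = [0, 2, 1, 0, 3, 2, 4]
reflected = reflected_at_supremum(path)
excursions = excursions_away_from_zero(reflected)
```

On this toy path the reflected sequence returns to 0 each time a new supremum is reached, and each maximal run of nonzero values plays the role of one excursion.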
In the same way, we denote by the set of excursions of away from : is the set of the càd-làg functions with no negative jumps for which there exists , and such that , has values in for , and if . We then introduce .
Denoting by a local time at of and by its inverse, we define the process with values in
If has no Gaussian component, any excursion first visits , and we necessarily have (but possibly infinite). On the other hand, if has a Gaussian component, it can creep upwards and then .
Again, according to Theorem IV.10 in , is a Poisson point process with intensity , stopped if is subcritical at the first excursion with infinite lifetime. Here is some constant depending on the choice of and a measure on .
We have for all :
If has finite variation,
If has infinite variation and no Gaussian component (i.e. ),
Moreover, in both cases, under , the reversed excursion
is equal in law to under .
Finally, the same statement holds replacing by and by .
3 A limit theorem for splitting trees with mutations at birth
3.1 JCCP of a marked splitting tree
Formally, a splitting tree (without mutations) is a random real tree characterized by a σ-finite measure on , satisfying . Consider such a splitting tree, and assume first that there is extinction of the population. In , A. Lambert considers a contour process of this tree called JCCP (jumping chronological contour process). He establishes that the tree and its contour process are in one-to-one correspondence and characterizes the law of the latter: conditional on the first individual in the tree having life duration , its JCCP is distributed as a finite variation, spectrally positive Lévy process with drift and Lévy measure , starting at , and killed upon hitting . In the case of non-extinction, we can then consider the JCCP of the tree truncated up to level , which has the law of the Lévy process described above, starting at , and reflected below level .
As noticed in Section 1, the exploration of the tree by its JCCP defines a way of ordering the individuals. In the sequel, when we label the extant individuals at level , we refer to that order.
Consider now a marked splitting tree as defined in Section 1. We assume that the probability for a child to be a mutant can only (possibly) depend on her life span , and if we denote by this probability, where is a function from to , will be called the mutation function of the tree. Then is characterized by its mutation function and its lifespan measure .
Then, as in the case without mutations, we define the JCCP of . First assume that there is extinction of its population. Then the JCCP of the marked tree is a bivariate process from to , whose first coordinate is the JCCP of the splitting tree without marks, and whose second coordinate is the counting process of the mutations (see Figure 1).
More precisely, for every jump time of (which corresponds to the encounter of a birth event in the exploration process), jumps (with amplitude 1) if and only if this birth was a 1-type birth. Hereafter we say that a jump of occurring at time carries a mark (or a mutation) if
This bivariate process is in one-to-one correspondence with . Besides, conditional on the first individual having life duration , it is distributed as a bivariate Lévy process with drift , and Lévy measure (where denotes the Bernoulli probability measure with parameter ), starting at , and killed as soon as its first coordinate hits . As in the non-marked case, if the assumption of extinction does not hold, the law of the JCCP of the truncated tree can be obtained from the Lévy process we just described.
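The bivariate contour process can be simulated directly when the lifespan measure is finite. The Python sketch below is a hedged illustration, not the construction used in the paper: it assumes that the first coordinate decreases at unit speed between jumps (the usual convention for the JCCP), jumps at constant rate by the lifetime of the newborn, and is killed when it hits 0, while the second coordinate counts the marked jumps. The names `simulate_marked_jccp`, `draw_life` and `mutation_fn` are hypothetical.

```python
import random

def simulate_marked_jccp(first_life, birth_rate, draw_life, mutation_fn, rng,
                         max_jumps=10**6):
    """Simulate the marked contour process until its first coordinate hits 0.

    Returns (kill_time, number_of_marks, jumps), where each jump is recorded
    as (time, size, mark). Assumes slope -1 between jumps and a subcritical
    or critical tree, so that the killing time is finite.
    """
    t, y, marks = 0.0, first_life, 0
    jumps = []
    for _ in range(max_jumps):
        wait = rng.expovariate(birth_rate)
        if wait >= y:                 # the first coordinate reaches 0 before the next birth
            return t + y, marks, jumps
        t += wait
        y -= wait                     # linear decrease at unit speed
        size = draw_life(rng)         # lifetime of the newborn = jump size
        mark = 1 if rng.random() < mutation_fn(size) else 0
        y += size
        marks += mark
        jumps.append((t, size, mark))
    raise RuntimeError("max_jumps reached before the process was killed")

rng = random.Random(1)
kill_time, n_marks, jumps = simulate_marked_jccp(
    first_life=1.0,
    birth_rate=0.5,                              # subcritical: 0.5 * mean lifetime < 1
    draw_life=lambda r: r.expovariate(1.0),      # exponential lifetimes, for illustration
    mutation_fn=lambda life: 0.3,
    rng=rng,
)
```

Because the contour visits every existence level exactly once at unit speed, the killing time in this sketch equals the total length of the tree, that is, the lifetime of the first individual plus the sum of all jump sizes.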
3.2 Definitions and notation
3.2.1 Rescaling the population
Let be a sequence of measures on satisfying for all , and a sequence of continuous functions from to .
We now consider a sequence of marked splitting trees such that for all , has lifespan measure , and mutation function . Recalling that denotes the Bernoulli probability measure with parameter , we consider an independent bivariate Lévy process with finite variation, Lévy measure and drift , and make the following assumption :
Assumption A : There exists a sequence of positive real numbers such that as , the process defined by
converges in distribution to a (necessarily spectrally positive) Lévy process with infinite variation. We denote by its Lévy measure and by its Gaussian coefficient ().
For all and for all , set . With an abuse of notation, the law of conditional on , and the law of conditional on , will both be denoted by , and we write for .
Denote by the splitting tree obtained from by rescaling the branch lengths by a factor . The introduction of the process is motivated by its fundamental role in the characterization of the law of the JCCP of truncated up to level (see later Lemma 4.8).
Some notation :
The Laplace exponents of , of and of are defined by
We denote by (resp. ) the largest root of (resp. ) and by (resp. ) the inverse of (resp. ) on (resp. ). We denote by (resp. ) the scale function of (resp. ). Finally, we denote by the Lévy measure of .
Remarks about :
Writing for , , we get from the Lévy-Khintchine formula [8, (2)] that has drift , Lévy measure and Laplace exponent . In particular, this gives . We prove in the appendix that converges pointwise to as , and besides, the assumption of infinite variation of ensures . Thereby we know that necessarily as .
3.2.2 Asymptotic for the mutations
In order to allow the convergence in distribution of the mutation levels on the lineages, we have to make some technical assumptions on the mutation functions . Here we suggest two possible assumptions : in the first one, the probability of a child in to be a mutant is constant, while in the second one, this probability depends on its life duration.
Assumption B.1 :
For all , for all , , where .
As , converges to some finite real number .
Assumption B.2 :
The sequence converges uniformly to on .
There exists such that as .
Note that in B.1, necessarily as , corresponding to the classical rare mutation asymptotic. Then if we denote by the limit of the sequence , we have . Besides, in Assumption B.2 the choice of and is independent of and .
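The rare mutation regime is the setting of the classical law of rare events: when each of many births is independently a mutant with a small probability, the number of mutants is approximately Poisson. The following Python sketch illustrates this numerically by computing the total variation distance between a Binomial(m, λ/m) and a Poisson(λ) distribution. The scaling p = λ/m is an illustrative assumption and is not the normalization used in Assumption B.1; the function name is hypothetical.

```python
import math

def tv_binomial_poisson(m, lam):
    """Total variation distance between Binomial(m, lam/m) and Poisson(lam).

    Requires lam < m. Both mass functions are built by recursion to avoid
    overflow in binomial coefficients.
    """
    p = lam / m
    b = (1.0 - p) ** m            # Binomial mass at 0
    q = math.exp(-lam)            # Poisson mass at 0
    total, poisson_mass = 0.0, 0.0
    for k in range(m + 1):
        total += abs(b - q)
        poisson_mass += q
        b *= (m - k) / (k + 1) * p / (1.0 - p)   # Binomial mass at k + 1
        q *= lam / (k + 1)                       # Poisson mass at k + 1
    total += max(0.0, 1.0 - poisson_mass)        # Poisson mass beyond m
    return 0.5 * total

distances = {m: tv_binomial_poisson(m, 2.0) for m in (10, 100, 1000)}
```

The distance shrinks as m grows, in agreement with Le Cam's inequality, which bounds it by m p² = λ²/m.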
These two possible assumptions for the rescaling of the mutations have been chosen so that as , the marked coalescent point process converges. However this choice does not imply, despite Assumption A, the convergence of the bivariate process . As pointed out in , it is even never the case under B.2.
3.2.3 Marked genealogical process
From now on, we consider the sequence of rescaled marked splitting trees , and condition on having extant individuals at level , where as .
Consider a realization of , and label the individual alive at from to (according to Section 3.1). Then to the -th one we associate a simple point measure , with values in , as follows :
Consider the lineage of individual , and assume it contains -type birth events. Denote by the level where the lineage coalesces with the rest of the tree, and by the successive levels (in increasing order) where the 1-type birth events happened. Then we set
Hence the point measure is in the space , and keeps record of all the mutation events on the -th lineage, and of the coalescence level of this lineage with the rest of the tree (see Figure 2). The quantity will be called the coalescence time of the lineage (the word ’time’ is to be understood here as a duration). Note that in case the coalescence corresponds to a 1-type birth event, we have .
Now for all , we define the following random point measure on :
3.3 Main results
We first introduce some notation. To begin with, we define the mapping as follows (see Figure 3.a) : for all ,
The function has values in the point measures on , and if and , then is in the set .
For any càd-làg piecewise-constant function , if denotes the sequence of its jump times (with in case ), we will use the notation instead of .
We denote by the current supremum process of , and by the ladder height process of , where is a local time at the supremum for , which will be specified later (see Section 4.1.2), and its inverse local time. We denote by the first entrance time of in the Borel set , and write for .
Finally, we denote by the excursion measure of away from zero (see Section 2.2), and we choose the normalization of the local time according to , i.e. satisfies the equality . Recall that for , denotes its first entrance time into . Define the set of all càd-làg functions with lifetime , such that and for all . Then we define a measure on as follows (see Figure 3.b) : for all ,
and that in the case where does not drift to , the excursions of have finite lifetime, and from a time reversal argument we have for any measurable set of , , where denotes the restriction of to the interval .
Results under Assumption B.1
In this paragraph we suppose that Assumptions A and B.1 are satisfied.
Consider an independent Poisson process with parameter . We introduce , a random element of , defined on by
Then the sequence converges in distribution towards a Poisson point measure on with intensity measure where Leb denotes the Lebesgue measure, and is a measure on defined by
Denote by the set of point measures of having at least points with second coordinate in the interval , which can be interpreted here as the presence of at least mutations on a lineage. Then the measure is not necessarily finite (see Example 1).
Note that we excluded in the first lineage , for which, without additional assumptions, we cannot easily get a result similar to that for the other lineages. However, if we assume that the lifetime of the first individual in converges as towards some value greater than , we can adapt Theorem 3.2. The limiting object is then obtained by adding to a Dirac mass on .
Conditioning on survival at level
We obtain a similar result if, instead of conditioning on having extant individuals at level , we condition it on survival at level . Indeed, if we denote by the number of extant individuals in at level , we know that conditional on , follows a geometric distribution with parameter (see [17, Prop. 5.6]). Then thanks to the pointwise convergence of towards (see Proposition 4.1), we get that converges in distribution towards an exponential variable with parameter .
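The passage from a geometric to an exponential distribution used above can be checked numerically. The Python sketch below assumes a geometric variable G on {1, 2, ...} with success probability c/n, so that P(G > k) = (1 - c/n)^k, and compares the tail of G/n with the exponential tail exp(-c x). The constant c, the convention for the geometric law and the function name are illustrative assumptions, not the parameters of the paper.

```python
import math

def rescaled_geometric_tail(n, c, x):
    """P(G / n > x) for G geometric on {1, 2, ...} with success probability c / n."""
    # G / n > x  iff  G > floor(n * x), since G is integer valued.
    return (1.0 - c / n) ** math.floor(n * x)

c, x = 1.5, 2.0
errors = [abs(rescaled_geometric_tail(n, c, x) - math.exp(-c * x))
          for n in (10, 1000, 10**6)]
```

The error decreases as n grows, which is the elementary mechanism behind the convergence of the rescaled number of extant individuals to an exponential variable.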
Then the sequence converges in law to a Poisson point measure on with intensity Leb, where e is an independent exponential variable with parameter .
where is defined in Theorem 3.2 and for all . Hence in the limit, the mutations appearing on a lineage are distributed according to a point measure , where is distributed as the image of the jump times of an independent Poisson process with parameter , by the ladder height process of conditioned on , and starting at the opposite of the undershoot of an excursion with depth smaller than .
Finally, the following proposition expresses the law of under in terms of the image of an independent Poisson process by an inhomogeneous killed subordinator.
Let and be as in Theorem 3.2. For all ,
with a killed inhomogeneous subordinator with drift and jump measure , defined for all and by :
and the killing time of .
Results under Assumption B.2
We now suppose that Assumptions A and B.2 are satisfied. In this case we establish results very similar to those under B.1, although in a slightly more complicated form. Indeed, Assumption B.1 ensures the independence of and a certain process we define later (namely the subordinator that appears in the following statement), while under B.2 these two subordinators are no longer independent.
There exists a process , starting at under , such that is a (possibly killed) bivariate subordinator, and such that converges in distribution towards a Poisson point measure on with intensity measure where
The processes and are not independent unless is a Brownian motion with drift, and the law of is explicitly characterized in Theorem 4.3.
If the limiting process is a Brownian motion with drift, is a deterministic drift and hence and are automatically independent. Hence in this case, Theorem 3.2 remains valid under Assumption B.2.
As under B.1, if has no Gaussian component we can re-express the measure as follows :
where for , .
Furthermore, as in Proposition 3.6, we have for all ,
with a bivariate killed inhomogeneous subordinator, starting at under , with drift and jump measure , defined for all , and by :
and the killing time of .
We close this section by giving some explicit calculations in the cases where the limiting process is either the standard Brownian motion, or an α-stable Lévy process ().
Example 1 : The Brownian case
Consider the case where the population of has exponential life spans with mean . Then an appropriate rescaling of the JCCP of leads in the limit to the standard Brownian motion.
We set :
Then, Assumption A is satisfied : for all , we have , which converges to as , and this implies the convergence in of towards the standard Brownian motion (see [13, Th. VII.3.4]). Moreover, if we assume for some , Assumption B.1 holds with .
The genealogical structure of this process (without mutations) and its asymptotic behaviour are studied by L. Popovic in , and in particular, results taking into account a -sampling of extinct individuals (each individual in the genealogy is recorded with a probability ) are provided. The following results are presented as a consequence of Theorem 3.2 but can also be derived from , since -sampling can be directly interpreted as recording -type birth events in the genealogy.
The distribution of is completely explicit. We know that , and a.s. for all . Note that the image by of a Poisson process with parameter is itself a Poisson process, with parameter . As a consequence, if we denote by the ranked sequence of the atoms of the measure appearing in Theorem 3.2, under , conditional on , is distributed as the sequence of jump times of a Poisson process with parameter , restricted to .
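The fact that the image of a Poisson process by a deterministic drift is again a Poisson process can be illustrated by simulation. In the Python sketch below, the drift is t ↦ c·t with a hypothetical coefficient c, and the Poisson process has a hypothetical rate `rate`; by the mapping theorem the image has rate `rate / c`, so the mean gap between image points should be close to `c / rate`. None of these names or numerical values come from the paper.

```python
import random

def poisson_jump_times(rate, horizon, rng):
    """Jump times of a Poisson process with the given rate on [0, horizon)."""
    times, t = [], rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def image_by_drift(times, c):
    """Image of the jump times under the deterministic drift t -> c * t."""
    return [c * t for t in times]

def mean_gap(times):
    """Average spacing between consecutive points, starting from 0."""
    gaps = [b - a for a, b in zip([0.0] + times[:-1], times)]
    return sum(gaps) / len(gaps)

rng = random.Random(0)
rate, c = 2.0, 0.5
times = poisson_jump_times(rate, 10_000.0, rng)
image = image_by_drift(times, c)
observed, expected = mean_gap(image), c / rate
```

A single run only gives an empirical check of the mean spacing; it is not a test of the full Poisson property, which would also require independence and exponential spacings.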
Besides, from the criticality of Brownian motion, we have , and since an excursion of Brownian motion away from is such that or ,
Finally, we have
The measure can then be expressed as follows :
where denotes the space of point measures on , is the law of a Poisson process with parameter restricted to the interval , and for any , denotes the sequence of jump times of .
In other words, in the limit the mutations on a lineage are distributed as an independent Poisson process with parameter , stopped at an independent random time distributed as the depth of an excursion away from , with depth lower than . Note furthermore that simple calculations lead to and (using the notation introduced in Remark 3.3) : the number of lineages carrying at least one mutation (resp. two mutations) is a.s. infinite (resp. finite).
Moreover, contrary to what is announced in Remark 3.4, the loss of memory of the exponential distribution ensures here that there is no need to add extra assumptions to extend the result to the first lineage. In this case, the limiting object is then obtained by adding to a Dirac mass on where is an independent Poisson process on with parameter .
Finally, according to Remark 3.8, these results are still valid for any choice of a sequence of functions and of a real number satisfying B.2, replacing by .
Example 2 : The stable case
Fix and set :
then we have for all , which is the Laplace exponent of an α-stable spectrally positive Lévy process, and Assumption A is satisfied. If we now set for some , Assumption B.1 holds with .
In this case we are able to characterize explicitly the inhomogeneous killed subordinator defined in Proposition 3.6. Indeed, we know that has no Gaussian component,