Coexistence in the face of uncertainty
Over the past century, nonlinear difference and differential equations have been used to understand conditions for coexistence of interacting populations. However, these models fail to account for random fluctuations due to demographic and environmental stochasticity which are experienced by all populations. I review some recent mathematical results about persistence and coexistence for models accounting for each of these forms of stochasticity. Demographic stochasticity stems from populations and communities consisting of a finite number of interacting individuals, and often are represented by Markovian models with a countable number of states. For closed populations in a bounded world, extinction occurs in finite time but may be preceded by long-term transients. Quasi-stationary distributions (QSDs) of these Makov models characterize this meta-stable behavior. For sufficiently large “habitat sizes”, QSDs are shown to concentrate on the positive attractors of deterministic models. Moreover, the probability extinction decreases exponentially with habitat size. Alternatively, environmental stochasticity stems from fluctuations in environmental conditions which influence survival, growth, and reproduction. Stochastic difference equations can be used to model the effects of environmental stochasticity on population and community dynamics. For these models, stochastic persistence corresponds to empirical measures placing arbitrarily little weight on arbitrarily low population densities. Sufficient and necessary conditions for stochastic persistence are reviewed. These conditions involve weighted combinations of Lyapunov exponents corresponding to “average” per-capita growth rates of rare species. The results are illustrated with how climatic variability influenced the dynamics of Bay checkerspot butterflies, the persistence of coupled sink populations, coexistence of competitors through the storage effect, and stochastic rock-paper-scissor communities. Open problems and conjectures are presented.
Keywords:Random difference equations, stochastic population dynamics, coexistence, quasi-stationary distributions, demographic noise, environmental stochasticity, Markov chains
Lest men believe your tale untrue, keep probability in view. –John Gay
To appear as a refereed chapter in Recent Progress and Modern Challenges in Applied Mathematics, Modeling and Computational Science in Fields Institute Communication Series edited by Roderick Melnik, Roman Makarov, and Jacques Belair
A long standing, fundamental question in biology is “what are the minimal conditions to ensure the long-term persistence of a population or the long-term coexistence of interacting species?” The answers to this question are essential for guiding conservation efforts for threatened and endangered species, and identifying mechanisms that maintain biodiversity. Mathematical models have and continue to play an important role in identifying these potential mechanisms and, when coupled with empirical work, can test whether or not a given mechanism is operating in a specific population or ecological community [Adler et al., 2010]. Since the pioneering work of Lotka  and Volterra  on competitive and predator–prey interactions, Thompson , Nicholson and Bailey  on host–parasite interactions, and Kermack and McKendrick  on disease outbreaks, nonlinear difference and differential equations have been used to understand conditions for persistence of populations or communities of interacting species. For these deterministic models, persistence or species coexistence is often equated with an attractor bounded away from the extinction states in which case persistence holds over an infinite time horizon [Schreiber, 2006]. However (with apologies to John Gay), lest biologists believe this theory untrue, the models need to keep probability in view. That is, all natural populations exhibit random fluctuations due to mixture of intrinsic and extrinsic factors known as demographic and environmental stochasticity. The goal of this chapter is to present models that account for these random fluctuations, review some mathematical methods for analyzing these stochastic models, and illustrate how these random fluctuations hamper or facilitate population persistence and species coexistence.
Demographic stochasticity corresponds to random fluctuations due to populations consisting of a finite number of individuals whose fates aren’t perfectly correlated. That is, even if all individuals in a population appear to be identical, some undetectable differences between individuals (e.g. in their physiology or microenvironment) result in some individuals dying while others survive. To capture these “unknowable” differences, models can assign the same probabilities of dying to each individuals and treat survival amongst individuals as independent flips of a coin – heads life, tails death. Similarly, surviving individuals may differ in the number of offspring they produce despite appearing to be identical. To capture these unknowable differences, the number of offspring produced by these individuals are modeled as independent draws from the same probability distribution. The resulting stochastic models accounting for these random fluctuations typically correspond to Markov chains on a finite or countable state space111See, however, the discussion for biologically motivated uncountable state spaces. e.g. the numbers of individuals, , in a population. When these models represent populations or communities whose numbers tend to stay bounded and have no immigration, the populations in these models always go extinct in finite time [Chesson, 1978]. Hence, unlike deterministic models, the asymptotic behavior of these stochastic models is trivial: eventually no one is left. This raises the following basic question about the relationship between models accounting for demographic stochasticity and their deterministic counterparts:
“Any population allowing individual variation in reproduction, ultimately dies out–unless it grows beyond all limits, an impossibility in a bounded world. Deterministic population mathematics on the contrary allows stable asymptotics. Are these artifacts or do they tell us something interesting about quasi-stationary stages of real or stochastic populations?” – Peter Jagers 
As it turns out, there is a strong correspondence between the quasi-stationary behavior of the stochastic models and the attractors of an appropriately defined mean-field model. Moreover, this correspondence highlights a universal scaling relationship between extinction times and the size of the habitat where the species live. These results and their applications are the focus of the first half of this review.
While demographic stochasticity affects individuals independently, environmental stochasticity concerns correlated demographic responses (e.g. increased survival, growth or reproduction) among individuals. These correlations often stem from individuals experiencing similar fluctuations in environmental conditions (e.g. temperature, precipitation, winds) which impact their survival, growth, or reproduction. Models driven by randomly fluctuating parameters or brownian motions, such as random difference equations or stochastic differential equations, can capture these sources of random fluctuations. Unlike models for demographic stochasticity, these Markov chains always live on uncountable state spaces where the non-negative reals represent densities of populations of sufficiently large size that one can ignore the effects of being discrete and finite. Consequently, like their deterministic counterparts, extinction in these random difference equations only occurs asymptotically, and persistence is equated with tendency to stay away from low densities [Chesson, 1982]. Understanding what this exactly means, reviewing methods for verifying this stochastic form of persistence, and applying these methods to gain insights about population persistence and species coexistence are the focus of the second half of this review.
Of course, all population systems experience a mixture of demographic and environmental stochasticity. While the theoretical biology literature is replete with models accounting for each of these forms of noise separately, I know of no studies that rigorously blend the results presented in this review. Hence, I conclude by discussing some open problems and future challenges at this mathematical interface.
2 Demographic stochasticity
To model finite populations and account for demographic stochasticity, we consider Markov chains on a countable state space which usually is the non-negative cone of the integer lattice. Many of these stochastic models have a deterministic counterpart, sometimes called the “deterministic skeleton” or the “mean field model”. As I discuss below, these deterministic models can provide some useful insights about the transient behavior of the stochastic models and when coupled with large deviation theory provide insights into the length of these transients.
To get a flavor of the types of models being considered, lets begin with a stochastic counterpart to the discrete-time Lotka-Volterra equations. This example motivates the main results and will illustrate their applicability.
Example 1 (Poisson Lotka-Volterra Processes)
The continuous time Lotka-Volterra equations form the bedrock for much of community ecology theory. While there are various formulations of their discrete-time counterparts, a particularly pleasing one that retains several key dynamical features of the continuous-time models was studied by Hofbauer et al. . These models keep track of the densities of interacting species, where the subscripts denote the species identity and time (e.g. year or day). As with the classical continuous time equations, there is a matrix where corresponds to the “per-capita” effect of species on species and a vector of the “intrinsic per-capita growth rates” for all of the species. With this notation, the equations take on the form:
The state space for these dynamics are given by the non-negative orthant
of the -dimensional Euclidean space .
To define the Poisson Lotka-Volterra process, let be the size of the habitat in which the species live. Let denote the vector of species abundances which are integer-valued. Then the density of species is . Over the next time step, each individual replaces itself with a Poisson number of individuals with mean
If the individuals update independent of one another, then is a sum of independent Poisson random variables. Thus, is also Poisson distributed with mean
The state space for is the non-negative, dimensional integer lattice
while the state space for is the non-negative, rescaled integer lattice
Now consider a solution to deterministic model and the stochastic process initiated at the same densities . To see how likely deviates from , we use Chebyshev’s inequality. As the mean and variance of a Poisson random variable are equal, Chebyshev’s inequality implies
where denotes the variance of a random variable . In words, provided the habitat size is sufficiently large, a substantial deviation between and is unlikely. In fact, one can show that over any finite time interval , the stochastic dynamics are likely to be close to the deterministic dynamics over the time interval provided the habitat size is sufficiently large:
Figure 1 illustrates this fact for a Poisson Lotka-Volterra process with two competing species. Equation (4) is the discrete-time analog of a result derived by Kurtz  for continuous-time Markov chains. Kurtz  also provides “second-order” approximations for finite time intervals using Gaussian processes and stochastic differential equations. While these approximations are also useful for discrete-time models, we do not review them here.
Despite stochastically tracking with high probability for long periods of time, eventually their behavior diverges as Poisson Lotka-Volterra processes go extinct in finite time or exhibit unbounded growth.
Let be a Poisson Lotka-Volterra process with . Then
Furthermore, if is pre-compact i.e. for some , then
The strategy used to prove the first statement of the proposition is applicable to many models of closed populations. The key ingredients are that there is a uniform lower bound to the probability of any individual dying, and individuals die independently of one another [Chesson, 1978]. Proving, however, that extinction always occurs with probability one requires additional elements which aren’t meet by all ecological models, but is meet for “realistic” models.
For the first assertion, take any integer . Let
Next we use the following standard result in Markov chain theory [Durrett, 1996, Theorem 2.3 in Chapter 5].
Let be a Markov chain and suppose that
Let and . Proposition 2 with and implies that
The complement of the event equals the event . As is an increasing sequence of events,
where the final inequality follows from (5). This completes the proof of the first assertion.
To prove the second assertion, assume that there exists such that for all i.e. is pre-compact. Define
Applying Proposition 2 with and completes the proof of the second assertion. ∎
Equation (4) and Proposition 1 raise two fundamental questions about these stochastic, finite population models: How long before extinction occurs? Prior to extinction what can one say about the transient population dynamics? To get some insights into both of these questions, we build on the work of Freidlin and Wentzell  and Kifer  on random perturbations of dynamical systems, and Barbour  on quasi-stationary distributions.
2.1 Random perturbations and quasi-stationary distributions
The Poisson Lotka-Volterra process (Example 1) illustrates how Markovian models can be viewed as random perturbations of a deterministic model. To generalize this idea, consider a continuous, precompact222Namely, there exists such that lies in . map , where is a closed subset of . will be the deterministic skeleton of our stochastic models. A random perturbation of is a family of Markov chains on whose transition kernels
enjoy the following hypothesis:
For any ,
where denotes the -neighborhood of a point .
Hypothesis 2.1 implies that the Markov chains converge to the deterministic limit as i.e. the probability of being arbitrarily close to given is arbitrarily close to one for sufficiently small. Hence, one can view as the “deterministic skeleton” which gets clothed by the stochastic dynamic . The next example illustrates how to verify the Poisson Lotka-Volterra process is a random perturbation of the Lotka-Volterra difference equations.
Example 2 (The Poisson Lotka-Volterra processes revisited)
Consider the Poisson Lotka-Volterra processes from Example 1 where and and . For many natural choices of and , Hofbauer et al.  have shown there exists such that i.e. is pre-compact. While the corresponding Lotka-Volterra process lives on , the process can be extended to all of by allowing to be any point in and update with the transition probabilities of (2). With this extension, always lies in and is characterized by the following probabilities
and otherwise. With this extension, Hypothesis 1 for the Lotka-Volterra process follows from equation (3).
As with the Poisson Lotka-Volterra process, stochastic models of interacting populations without immigration always have absorbing states corresponding to the loss of one or more populations. Hence, we restrict our attention to models which satisfy the following standing hypothesis:
The state space can be written , where
is a closed subset of ;
and are positively -invariant, i.e and ;
the set is assumed to be absorbing for the random perturbations:
absorption occurs in finite time with probability one:
for all and .
The final bullet point implies that extinction of one or more species is inevitable in finite time. For example, Proposition 1 implies this hypothesis for Poisson Lotka-Volterra processes whenever is pre-compact.
Despite this eventual absorption, the process may spend exceptionally long periods of time in the set of transient states provided that is sufficiently small. This “metastable” behavior may correspond to long-term persistence of an endemic disease, long-term coexistence of interacting species as in the case of the Poisson Lotka-Volterra process, or maintenance of a genetic polymorphism. One approach to examining these metastable behaviors are quasi-stationary distributions which are invariant distributions when the process is conditioned on non-absorption.
A probability measure on is a quasi-stationary distribution (QSD) for provided there exists such that
Equivalently, dropping the superscript and subscripts, a QSD satisfies the identity
where denotes the law of the Markov chain , conditional to being distributed according to .
In the case that the Markov chain has a finite number of states and is the transition matrix (i.e. ), Darroch and Seneta  showed that the QSD is given by where is the normalized, dominant left eigenvector of the matrix given by removing the rows and columns of corresponding to extinction states in . In this case, is the corresponding eigenvalue of this eigenvector. For the Poisson Lotka-Volterra processes in which the unperturbed dynamic is pre-compact, Proposition 6.1 from Faure and Schreiber  implies the existence of QSDs for these processes. Examples of these QSDs for these processes are shown in Figures 2, 3, and 4. More generally, the existence of QSDs has been studied extensively by many authors as reviewed by Méléard and Villemonais .
What do these QSD’s and tell us about the behavior of the stochastic process? From the perspective of metastability, QSDs often exhibit the following property:
where the limit exists and is independent of the initial state . In words, the QSD describes the probability distribution of , conditioned on non-extinction, far into the future. Hence, the QSD provides a statistical description of the meta-stable behavior of the process. The eigenvalue, provides information about the length of the metastable behavior of . Specifically, given that the process is following the QSD (e.g. is distributed like ), and equals the probability of persisting in the next time step. Thus, the mean time to extinction is . Grimm and Wissel  call , the “intrinsic mean time to extinction” and, convincingly, argue that it is a fundamental statistic for comparing extinction risk across stochastic models.
2.2 Positive attractors, intrinsic extinction risk, and metastability
When the habitat size is sufficiently large i.e. is small, there is a strong relationship between the existence of attractors in (i.e. “positive” attractors) for the unperturbed system and the quasi-stationary distributions of . This relationship simultaneously provides information about the metastable behavior of the stochastic model and intrinsic probability of extinction, . To make this relationship mathematically rigorous, we need to strengthen Hypotheses 2.1 and 2.2. Faure and Schreiber  presents two ways to strengthen these hypothesis. We focus on their large deviation approach as it is most easily verified. This approach requires identifying a rate function that describes the probability of a large deviation between and . That is, for a sufficiently small neighborhood of a point , the rate function should have the property
Hypothesis 2.3 provides the precise definition and desired properties of .
There exists a rate function such that
is continuous on ,
if and only if ,
for any ,
for any open set , there is the lower bound
that holds uniformly for in compact subsets of whenever is an open ball in . Additionally, for any closed set , there is the uniform upper bound
Equations (7) and (9), in particular, imply that Hypothesis 2.1 holds. Furthermore, as is absorbing, equation (8) implies that for all , . Identifying the rate function typically requires making use of the Gärtner-Ellis theorem [Dembo and Zeitouni, 1993, Theorem 2.3.6] which provides large deviation estimates for sums of independent random variables. Example 3 below describes how this theorem was used for the Poisson Lotka-Volterra processes.
We strengthen Hypothesis 2.2 as follows:
For any , there exists an open neighborhood of such that
Equation (10) implies that
for sufficiently small. Namely, the probability of absorption near the boundary, at most, decays exponentially with habitat size. The following example discusses why these stronger hypotheses hold for the Poisson Lotka-Volterra process.
Example 3 (Return of the Poisson Lotka-Volterra Processe)
Using the Gärtner-Ellis theorem [Dembo and Zeitouni, 1993, Theorem 2.3.6], Faure and Schreiber [2014, Proposition 6.4] showed that is the rate function for any Poisson processes with mean including the Poisson Lotka-Volterra Process of Example 1. To see why Hypothesis holds for the Poisson Lotka-Volterra process, assume is such that for some and . Then
Hence, for any , choose sufficiently small to ensure that for all , whenever . In which case, choosing for some satisfies (10).
As many discrete distributions are used in models with demographic stochasticity (e.g. negative binomial, mixtures of bernoullis and negative binomials), an important open problem is the following:
For which types of random perturbations of an ecological model do Hypotheses 3 and 4 hold?
To relate QSDs to the attractors of the deterministic dynamics, we recall the definition of an attractor and weak* convergence of probability measures. A compact set is an attractor for if there exists a neighborhood of such that (i) and (ii) for any open set containing , for some . A weak* limit point of a family of probability measures on is a probability measure such that there exists a sequence satisfying
for all continuous functions . Namely, the expectation of any continuous function with respect to converges to its expectation with respect to as . The following theorem follows from [Faure and Schreiber, 2014, Lemma 3.9 and Theorem 3.12].
there exists a neighborhood of such that all weak* limit points of are -invariant and , and
there exists such that
Alternatively, assume that is a global attractor for the dynamics of . Then any weak*-limit point of is supported by .
Theorem 2.5 implies the existence of a positive attractor of the deterministic dynamics ensures the stochastic process exhibits metastable behavior for large habitat size, and the probability of extinction decreases exponentially with habitat size. Equivalently, the mean time to extinction increases exponential with habitat size. These conclusions are illustrated in Figure 2 with a one-dimensional Poisson Lotka-Volterra process (the Poisson Ricker process described below in Example 4).
Even if has no positive attractors, may not be a global attractor as there might be an unstable invariant set in . For example, single species models with positive feedbacks can have an uncountable number of unstable periodic orbits despite almost every initial condition going to extinction [Schreiber, 2003]. Hence, the necessary and sufficient conditions for metastability in Theorem 2.5 are not equivalent. However, if has no positive attractors, one can show that all points in can with arbitrarily small perturbations be “forced” to [Schreiber, 2006, 2007]. Hence, this raises the following open problem.
If has no positive attractors, are all the weak*-limit points of the QSDs supported by the extinction set ?
While the methodology used to prove Theorem 2.5 provides an explicit expression for , this expression is fairly abstract and only provides a fairly crude lower bound. This suggests the following questions which, if solved, may provide insights into how extinction probabilities depend on the nature of the nonlinear feedbacks within and between populations and the form of demographic stochasticity.
If has positive attractors, when does the limit
exist? If the limit exists, under what circumstances can we derive explicit expressions for ? or good explicit lower bounds for ?
Theorem 2.5 only ensures that the metastable dynamics concentrates on an invariant set for the deterministic dynamics. However, it is natural to conjecture that the QSDs should concentrate on the positive attractors of . These positive attractors, however, may coexist with complex unstable behavior. For example, the Ricker equation can have a stable periodic orbit coexisting with an infinite number of unstable periodic orbits (e.g. the case of a stable period orbit as illustrated in Fig. 3).
To identify when this intuition is correct, a few definitions from dynamical systems are required. For , let there exists such that be the -limit set for and there exist and such that and be the -limit set for . Our assumption that is precompact implies that there exists a global attractor given by the compact, -invariant set . For all , and are compact, non-empty, -invariant sets. A Morse decomposition of the dynamics of is a collection of -invariant, compact sets such that
is isolated i.e. there exists a neighborhood of such that it is the maximal -invariant set in the neighborhood, and
for every , there exist such that and .
Replacing the invariant sets by points, one can think of being gradient-like as all orbits move from lower indexed invariant sets to higher indexed invariant sets. Finally, recall that a compact invariant set is transitive if there exists an such that is dense in . Faure and Schreiber [2014, Theorem 2.7, Remark 2.8, and Proposition 5.1] proved the following result about QSDs not concentrating on the non-attractors of . The assumptions of this theorem can be verified for many ecological models.
For random perturbations of deterministic models without absorbing states (e.g. models accounting for immigration or mutations between genotypes), the work of Kifer  and Freidlin and Wentzell  can be used to show that the stationary distributions often concentrate on a unique attractor. However, due to the singularity of the rate function along the extinction set , the approach used by these authors doesn’t readily extend to the stochastic models considered here. This raises the following open problem:
If has multiple, positive attractors, under what conditions do the QSDs concentrate on a unique one of these positive attractors as ?
Lets apply some of these results to the Poisson Lotka-Volterra processes from Example 1.
Example 4 (The Ricker model)
The simplest of Poisson Lotka-Volterra processes is the stochastic Ricker model for a single species where with . Kozlovski  proved that for an open and dense set of values, the Ricker map has a Morse decomposition consisting of a finite number of unstable, intransitive sets (more specifically, hyperbolic sets) and a unique stable period orbit . As the stable periodic orbit is the only attractor, Theorem 2.6 implies the following result.
Consider the Ricker process with such that has the aforementioned Morse decomposition. Then any weak*-limit point of is supported by the unique stable periodic orbit .
Figure 3 illustrates this corollary: QSDs concentrating on the stable periodic orbit of period for , period for , period for , and period for . Remarkably, in the case of the stable orbit of period , there exists an infinite number of unstable periodic orbits which the QSDs do not concentrate on. We note that Högnäs , Klebaner et al. , Ramanan and Zeitouni  proved similar results to Corollary using inherently one dimensional methods.
Example 5 (Revenge of the Poisson Lotka-Volterra processes)
For higher dimensional Lotka-Volterra processes, we can use properties of Lotka-Volterra difference equations in conjunctions with Theorems 2.5 and 2.6 to derive two algebraically verifiable results for the stochastic models. First, if the deterministic map with is pre-compact and there is no internal fixed point (i.e. there is no strictly positive solution to ), then Hofbauer et al.  proved that the boundary of the positive orthant is a global attractor. Hence, Theorem 2.5 implies the following corollary.
Let be a Poisson Lotka-Volterra process such that is pre-compact and admits no positive fixed point. Then any weak*-limit point of is supported by , the boundary of the positive orthant of .
On the other hand, Hofbauer et al.  derived a simple algebraic condition which ensures that the deterministic dynamics of has a positive attractor. Namely, there exist such that
for any fixed point on the boundary of the positive orthant. Hence, Theorem 2.5 implies the following corollary.
Assume with is pre-compact and satisfies (12) for some choice of . If is the corresponding Poisson Lotka-Volterra process, then any weak*-limit point of is supported by where is the global, positive attractor for . Moreover, there exists such that for all sufficiently small.
Figure 4 illustrates the convergence of the QSDs to the attractor of for a Lotka-Volterra process of two competing species. Even for populations of only hundreds of individuals (), this figure illustrates that species can coexist for tens of thousands of generations despite oscillating between low and high densities, a key signature of the underlying deterministic dynamics. However, only at much larger habitat sizes (e.g. ) do the metastable behaviors clearly articulate the underlying deterministic complexities.
3 Environmental stochasticity
To understand how environmental fluctuations, in and of themselves, influence population dynamics, we shift our attention to models for which the habitat size is sufficiently large that one can approximate the population state by a continuous variable. Specifically, let denote the state of the population or community at time . The components of corresponds to densities or frequencies of subpopulations. To account for environmental fluctuations, let (for some ) be a compact set representing all possible environmental states e.g. all possible precipitation and temperature values. I assume that represents the environmental state of the system over the time interval that determines how the community state changes over that time interval. If the population or community state depends continuously on and , then
for a continuous map . If the are random variables, then (13) is known as a continuous, random dynamical system. Arnold  provides a thorough overview of the general theory of these random dynamical systems.
To state the main hypotheses about (13), recall that a sequence of random variables, is stationary if for every pair of non-negative integers and , and have the same distribution. The sequence is ergodic if with probability one all realizations of the sequence have the same asymptotic statistical properties e.g. time averages (see, e.g., Durrett  for a more precise definition).
are an ergodic and stationary sequence of random variables taking value in . Let be the stationary distribution of this sequence i.e. the probability measure on such that for all Borel sets .
This hypothesis is satisfied for a diversity of models of environmental dynamics. For example, could be given by a finite state Markov chain on a finite number of environmental states, say (e.g. wet and cool, wet and hot, dry and cool, dry and hot) with transition probabilities . If the transition matrix is aperiodic and irreducible, then is asymptotically ergodic and stationary. Alternatively, could be given by a sequence of independent and identically distributed random variables or, more generally, an autoregressive process.
Our second hypothesis simply assumes that population densities remain bounded and allows for the possibility of extinction.
There are compact sets and such that , , and where .
For example, may equal where is the maximal density of a species or may be the probability simplex where corresponds to the vector of genotypic frequencies. As in the case of demographic stochasticity, corresponds to the set where one or more populations have gone extinct. Invariance of implies that once the population has gone extinct it remains extinct i.e. the “no cats, no kittens” principle. Invariance of implies that populations can not go extinct in one time step but only asymptotically. This latter assumption is met by most (but not all) models in the population biology literature.
For these stochastic difference equations, there are several concepts of “persistence” which are reviewed in Schreiber . Here, we focus on the “typical trajectory” perspective. Namely, “how frequently does the typical population trajectory visit a particular configuration of the population state space far into the future?” The answer to this question is characterized by empirical measures for :
where and is a Borel subset of . equals the fraction of time that spends in the set over the time interval . Provided the limit exists, the long-term frequency that enters is given by . It is important to note that these empirical measures are random measures as they depend on the particular realization of the stochastic process. Figure 3 provides graphical illustrations of empirical measures for a single species model (top row) and a two species model (bottom row). For both models, the empirical measure at time can be approximated by a histogram describing the frequency spends in different parts (e.g. intervals or hexagons) of the population state space .
Stochastic persistence corresponds to the typical trajectory spending arbitrarily little time, arbitrarily near the extinction set . More precisely, for all there exists a such that
where . In contrast to the deterministic notions of uniform persistence or permanence, stochastic persistence allows for trajectories to get arbitrarily close to extinction and only requires the frequency of these events are very small. One could insist that the trajectories never get close to extinction. However, such a definition is too strict for any model where there is a positive probability of years where the population is tending to decline e.g. the models discussed in section 7. Regarding this point, Chesson  wrote
“This criterion…places restrictions on the expected frequency of fluctuations to low population levels. Given that fluctuations in the environment will continually perturb population densities, it is to be expected that any nominated population density, no matter how small, will eventually be seen. Indeed this is the usual case in stochastic population models and is not an unreasonable postulate about the real world. Thus a reasonable persistence criterion cannot hope to do better than place restrictions on the frequencies with which such events occur.”
Conditions for verifying stochastic persistence appear in papers by Benaïm and Schreiber , Schreiber et al. , Schreiber , Roth and Schreiber . As the results by Roth and Schreiber  are the most general, we focus on them. We begin with single species models and then expand to multi-species models.
3.1 Single species models
Consider a single species for which an individual can be in one of states. For example, these states may correspond to age where is the maximal age, living in one of spatial locations or “patches”, discrete behavioral states that an individual can move between, different genotypes in an asexual population coupled by mutation, or finite number of developmental stages or size classes. corresponds to population density of individuals in state and is the population state. The population state is updated by multiplication by a matrix dependent on the population and environmental state:
Assume satisfies the following hypothesis.
is a continuous mapping from to non-negative matrices. Furthermore, there exists a non-negative, primitive matrix such that has the same sign structure as for all i.e. the –-th entry of is positive if and only if the –-th entry of is positive.
The primitivity assumption implies that there is a time, , such that after time steps, individuals in every state contribute to individuals in all other states. Specifically, has only positive entries for any and . This assumption is met for most models.
To determine whether or not the population has a tendency to increase or decrease when rare, we can approximate the dynamics of (14) when by the linearized system
Iterating this matrix equation gives
Proposition 3.2 from [Ruelle, 1979] and Birkhoff’s ergodic theorem implies there is a quantity , the dominant Lyapunov exponent, such that
whenever . Following Chesson [1994, 2000a, 2000b], we call the low-density per-capita growth of the population. When , with probability one grows exponentially and we would expect the population state to increase when rare. Conversely when , with probability one converges to . Consistent with these predictions from the linear approximation, Roth and Schreiber [2014, Theorems 3.1,5.1] proved the following result.
The assumption in the partial converse is a weak form of negative-density dependence as it requires that the best conditions (in terms of magnitude of the entries of ) occurs at low densities. There are cases where this might not be true e.g. models accounting for positive density-dependence, size structured models where growth to the next stage is maximal at low densities.
Example 6 (The case of the Bay checkerspot butterflies)
The simplest case for which Theorem 3.4 applies are unstructured models where . In this case, are scalars and
The exponential corresponds to the geometric mean of the . By Jensen’s inequality, the arithmetic mean is greater than or equal to this geometric mean , with equality only if is constant with probability one. Hence, environmental fluctuations in the low-density fitnesses reduce and have a detrimental effect on population persistence.
To illustrate this fundamental demographic principle, we visit a study by McLaughlin et al.  on the dynamics of Bay checkerspot butterflies, a critically endangered species. In the 1990s, two populations of this species went extinct in Northern California. The population densities for one of these populations is shown in the left hand side of Figure 6. Both extinctions were observed to coincide with a change in precipitation variability in the 1970s (right hand side of Fig. 6): the standard deviation in precipitation is approximately 50% higher after 1971 than before 1971.
To evaluate whether this shift in precipitation variability may have caused the extinction of the checkerspots, McLaughlin et al.  developed a stochastic difference equation of the following type
where is precipitation in year . Using linear regression on a log-scale yields a model whose fit for one-year predictions are shown as red diamonds in Figure 6.
To compare the pre- and post- population dynamics of the populations, McLaughlin et al.  ran their stochastic difference equations with given by independent draws from the corresponding years of precipitation data. The resulting models satisfy all of the assumptions of Theorem 3.4. The model with random draws from the pre- precipitation data yields . Hence, Theorem 3.4 implies stochastic persistence with this form of climatic variability (left hand side of Fig. 7). In contrast, the model with random draws from the post- precipitation data yields . Hence, Theorem 3.4 implies the population is extinction bound with this form of climatic variability (right hand side of Fig. 7).
Example 7 (Spatially structured populations)
To illustrate the application of Theorem 3.4 to structured populations, consider a population in which individuals can live in one of patches (e.g. butterflies dispersing between heath meadows, pike swimming between the northern and southern basin of a lake, acorn woodpeckers flying between canyons). is the population density in patch . Let be the environmental state over where be the low-density fitness of individuals in patch . To account for within-patch competition, let be the fitness of an individual in patch where measures the strength of competition within patch . This fitness function corresponds to the Beverton-Holt model in population biology.
To couple the dynamics of the patches, let be the fraction of dispersing individuals that go with equal likelihood to any other patch. In the words of Ulysses Everett McGill in O Brother, Where Art Thou?
“Well ain’t [these patches] a geographical oddity! Two weeks from everywhere!”
Despite this odd geographic regularity, these all-to-all coupling models have proven valuable to understanding spatial population dynamics. Under these assumptions, we get a spatially structured model of the form
For this model, is the matrix whose –-th entry equals for and for .
The low density per-capita growth rate is the dominant Lyapunov exponent of the random product of the matrices . Theorem 3.4 implies this model exhibits stochastic persistence if and asymptotic extinction with probability one if . In fact, as this spatial model has some special properties (monotonicity and sublinearity), work of Benaïm and Schreiber [2009, Theorem 1] implies if , then there is a probability measure on such that
for any and any continuous function . Namely, for all positive initial conditions, the long-term behavior is statistically characterized by the probability measure that places no weight on the extinction set. When this occurs, running the model once for sufficiently long describes the long-term statistical behavior for all runs with probability one. The probability measure corresponds to the marginal of an invariant measure for the stochastic model.
But when is ? Finding explicit, tractable formulas for , in general, appears impossible. However, for sedentary populations () and perfectly mixing populations (), one has explicit expressions for . In the limit of ,
as . As varies continuously with (cf. Benaïm and Schreiber [2009, Proposition 3]), it follows that persistence for small (i.e. mostly sedentary populations) only occurs if . Equivalently, the geometric mean of the low-density fitnesses is greater than one in at least one patch.
When , the fraction of individuals going from any one patch to any other patch is . In this case, the model reduces to a scalar model for which
Namely, is equal to the geometric mean of the spatial means [Metz et al., 1983]. Applying Jensen’s inequality to the outer and inner expressions of , one gets
Hence, persistence requires that the expected fitness in one patch is greater than one (i.e. for some in the left hand side), but can occur even if all the patches are unable to sustain the population (i.e. for all on the right hand side). Hence, local populations which are tending toward extinction (i.e. in all patches) can persist if they are coupled by dispersal. Even more surprising, Schreiber  shows that stochastic persistence is possible in temporally autocorrelated environments even if for all patches.
To better understand how depends on , I make raise the following problem which has been proven have an affirmative answer for two-patch stochastic differential equation models by Evans et al. .