# An exactly solvable coarse-grained model for species diversity

## Abstract

We present novel analytical results about ecosystem species diversity that stem from a proposed coarse grained neutral model based on birth-death processes. The relevance of the problem lies in the urgency for understanding and synthesizing both theoretical results of ecological neutral theory and empirical evidence on species diversity preservation. Neutral model of biodiversity deals with ecosystems in the same trophic level where per-capita vital rates are assumed to be species-independent. Close-form analytical solutions for neutral theory are obtained within a coarse-grained model, where the only input is the species persistence time distribution. Our results pertain: the probability distribution function of the number of species in the ecosystem both in transient and stationary states; the -points connected time correlation function; and the survival probability, defined as the distribution of time-spans to local extinction for a species randomly sampled from the community. Analytical predictions are also tested on empirical data from a estuarine fish ecosystem. We find that emerging properties of the ecosystem are very robust and do not depend on specific details of the model, with implications on biodiversity and conservation biology.

## 1 Introduction

Statistical physics is decisively contributing to our understanding of ecological processes. In fact it is providing powerful theoretical tools and innovative steps towards the comprehension and the synthesis of broad empirical evidences on macro-ecological laws and emerging biodiversity patterns [1, 2, 3, 4, 5]. One field where statistical physics has been particularly successful is the neutral theory of biodiversity [6, 7, 8, 9, 10, 11, 12, 13]. This theory is based on the assumption that, within the same trophic level, per–capita vital rates are species-independent. It offers a unified theoretical framework for ecosystems dynamics by invoking solely basic ecological processes such as birth, death, migration and dispersal limitation [6, 7]. Exact solutions have been found for several ecologically relevant quantities, such as: the relative species abundance distribution (RSA) [9, 10, 14]; species spatial patterns and clustering [12, 13]; patterns of -diversity (i.e. intra- and inter-species spatial correlation) [15, 16]; species area-relationship (SAR) [4, 18]; and species’ persistence time distributions [19, 20, 21]. These results enabled the theory to be tested contrasting empirical data, and to assess the power of neutral models in predicting ecological patterns in many ecosystems.

Let consider a given trophic level (i.e. plants) in a local ecosystem. ”Local” means a community immersed in a larger one, called meta-community, that is considered as infinite in size with respect to the local community (serving as a ï¿½reservoirï¿½). We assume that each individual in the local community has a natural death rate and birth rate (linear birth and death rates). Moreover, due to speciation or diversification events (i.e. immigration from an outside region) ï¿½newï¿½ species (i.e. species not already present in the local community) enter in the system with rate .

A general formulation of the neutral theory employs the birth-death master equation (ME) for the abundance dynamics of a species in the ecosystem [8, 11, 23]. Several exact results are known for birth-death models where the population, , of the local community is strictly conserved and/or the corresponding scales as (e.g see the infinite alleles model with mutation [17] or the voter model [25, 26, 27]).

In particular, a very well known and studied pattern in ecology is the RSA of the local community, defined as the fraction of species of a given abundance. It describes key elements of biodiversity, as the frequency of rare species in the ecosystem [6, 7]. Another important quantity in conservation ecology is species richnesses, the number of different species in the community, independently of their abundance. Analytical solutions for both of these quantities can be found in the literature [8, 11, 17, 28].

Unfortunately, transient solutions for species richness and its points time correlations functions (i.e. the probability of having species at time , species at time , etc..) are not easy to calculate using this approach. Moreover, higher complications comes into play if we want to generalize the dynamics with non linear birth and death rates, i.e. birth and death rates which are not proportional to the population size [24]. For this reason, in the next section we introduce a coarse grained version of this model which will allow us to deduce in a simpler way novel analytical results about ecosystem biodiversity, where the only input is the species persistence time distribution. We note that, given the assumption of no interaction among species, no approximation is made to obtain these results, but they are limited to knowing whether a species is present/absent, as opposed to knowing how many individuals there are of that species at a given time. In particular we obtain close-form analytical results for: the probability distribution function (pdf) of the number of species in the ecosystem in transient states; the point connected time correlation function; and the survival probability, , defined as the pdf of time-spans to local extinction for a species randomly sampled among the observed assemblages at a certain given time. A comparison of analytical predictions with empirical data from a estuarine fishes ecosystem will be carried out. A set of conclusions closes then the paper.

## 2 Theoretical Framework

### 2.1 Grand-Canonical Formulation of Neutral Theory

In the natural/realistic case where the total number of species () and individuals () in the local community is not fixed, we can described the abundance dynamics of the species in the system employing a ”grand-canonical” formulation of the neutral theory. In particular, if gives the number of species with individuals, and , then the probability - of having at time , species with one individuals, species with two individuals, and so on - is univocally described once the initial condition is known and the transition rates from to due to birth/death or speciation events are given. In particular, for with birth rate (birth event within species of individuals) we have the transitions , while with death rate (death of an individual belonging to species with individuals) we have the transitions . Finally with rate the transition occurs. We remark that in our framework, the total number of individuals is not fixed and the rate at which new species enters in the system is independent of .

In our grand-canonical formulation, the stationary solution of the master equation corresponding to can be written as with

(1) |

where and . Note that represents the ratio of effective per capita birth and death rate and if it has to be less than 1 in order to avoid demographic explosion. On the other hand if , then at equilibrium there are no individuals in the community because all species eventually go extinct [8, 9].

In our theoretical framework the RSA is deduced by the first moments of , while species richness is described by the probability of having different species in the community at stationarity, that turn out to be Poisson distributed with mean :

(2) |

where is the Kronecker delta, which is when its argument zero and zero otherwise.

Since to maintain a finite local community size, must be less than one, then each species is eventually doomed to extinction. We thus can define the persistence time of a species as the time incurred between its emergence in the system (due to diversification or immigration) and its extinction [19, 20, 21]. In particular, persistence times for a species that undergoes the linear birth-death dynamics presented above, are distributed according to the species persistence time (SPT) pdf [19],

(3) |

From now on, without loss of generality, we set . From Eq. (3) we can express the mean persistence time, a key ecological quantity also known as mean extinction time [19, 17, 28], as . Finally, as expected, the mean number of species at stationarity is related to , through Eqs. (2) and (3), by the simple formula .

### 2.2 The Coarse-Grained Model

A coarse-grained view of the grand-canonical formulation of the neutral theory can be described as follows. Each species within the local community emerges at a random time with a rate , that is, the probability a new species emerges in the infinitesimal time interval is . Therefore the probability of having new species in the ecosystem up to time , , is given by (see Appendix A):

(4) |

where is the number species at time , i.e . Then the newcomer species persists for a random duration where the random variables are independent and identically distributed with a given pdf . Therefore species arrive as a Poisson process (with rate ), and then depart after some non-Poissonian waiting time .

This model is coarse grained in the sense that we do not take explicitly into account information on species abundance, implicitly contained in the functional form of (see Figure 1). This approach enable us to go beyond Eq. (2), and to generalize these results also for general birth and death rates, by taking different functional shape of the SPT pdf. We remark that the mapping is exact only with respect to the grand-canonical formulation of the neutral theory, while, as we will show, is only a good approximation of the classic birth-death ME approach in the limit of large local community ( fixed and ).

Within our framework, the number of persistent species in the ecosystem at a given time (see Figure 2) is given by:

(5) |

where is the number of new species that entered into the system in the time interval , i.e. for and for . is the Heaviside step function, which is when its argument is positive and zero otherwise. is the time when the -th species enters into the ecosystem and when it extincts. The probability of having species present in the ecosystem at time is thus , i.e.

(6) |

It is useful and customary to define the generating function of the process (the discrete Laplace transform), , and analogously for . Eqs. (4) and (6) lead to (see Appendix A)

(7) |

where

(8) |

with . If, for example, we assume that (and therefore ), then from Eq. (7) we get

(9) |

One can see that in the limit. This follows from , for all and the Lebesgue’s dominated convergence theorem (see for example [29]). Since is continuous in the interval we have in the large time limit, implying that the initial condition is forgotten in this limit, as expected. If , then and from Eq. (7) we get , or equivalently

(10) |

i.e. a Poisson pdf with mean given by the product between the species emergence rate and the mean average persistence time. As expected, we found the same result given by Eq. (2).

From Eq. (7) we can easily calculate the average numbers of species at a generic time . Indeed, if , we have that admits left derivative at and taking the left derivative at of Eq. (7) we obtain

(11) |

where again we see that the initial condition is forgotten in the large time limit. These results have been tested numerically (see Figure 3).

## 3 Generating Function Approach

We are interested in the point connected time correlation function of our process (or cumulant, see [30] ) in the stationary conditions (min) and fixed, for . The generating function is given by

(12) | |||||

where

(13) |

with .

The point correlation function is obtained by choosing

(14) |

and

(15) |

where we assume .

We will show that in the stationary conditions and if ,

(16) |

Using Eqs. (13) and (14) we obtain

(17) | |||||

(18) |

where we have expanded the product on the r.h.s. of Eq. (17) as . Using the Lebesgue’s dominated convergence theorem, the average in Eq. (18) tends to zero in the limit and thus , which proves . Let us consider now

(19) |

Since and , again for the Lebesgue dominated convergence theorem, Eq. (19) leads to

(20) |

and therefore we have that Eq. (12) becomes

(21) |

Using Eqs. (15) and (21) we finally prove Eq. (16). In particular, we note that in the large time limit, the point connected time correlation function

(22) |

is independent of .

## 4 Survival times pdf

The survival time is defined as the time to local extinction of a species randomly sampled among the observed assemblage of species at a certain time (see Figure 2).

We can express as a function of the random variables (emergence time of that species) and for which the pdf is known:

(23) |

We then express the survival time (ST) pdf conditional to a persistence as:

where the constant ensures normalization. Solving the ensemble average operators yields:

and, by marginalizing over , we obtain the ST pdf:

(24) |

Without loss of generality, it can be assumed that is not affected by the boundary condition in , i.e. and . Particularizing now to the case of persistence distributions for linear birth-death processes (Eq. (3)), the survival pdf asymptotic behavior is:

(25) |

where .

## 5 Applications and Comparison with Empirical Data

### 5.1 Gamma Species Persistence Time Distribution.

Recent results [20, 21] have shown that SPT distribution for several different type of ecosystems exhibit power-law behavior with an exponentiallike cut-off:

(26) |

where is the normalization constant, and is the incomplete Gamma function. The exponent of the power law is suggested to depend on the spatial structure of the embedding ecosystem. SPT distributions exhibit progressively smaller scaling exponents for increasing constraints in the connectivity structure of the environmental matrix [20]. Specifically, numerical simulations [20] show that the exponents range between , that corresponds to the case of global dispersal (mean field), for a 3D lattice, in a savannah (2D lattice), for river network topology, up to for 1D systems.

For the functional shape of the SPT distribution given by Eq. (26) with , the following asymptotic limits are obtained:

(27) | |||||

while for . The average persistence time follows from Eq.(8)

(28) |

where is the exponential integral function. Setting for simplicity in Eq. (26), the corresponding two point correlation function derived from (22) is:

(29) |

Therefore, the covariance is a decreasing function of , that for large times goes to zero, i.e.,

### 5.2 Scale-Free SPT distributions

An interesting case is obtained when , i.e. the system is in the scaling regime where it does not exhibit a characteristic time scale. This is typically reported when SPT of families or genera – as opposed to species – are considered, possibly measuring them from the fossil record [31]. Thus, longer time scales and long timeseries must be considered.

Under such assumption, the normalized SPT pdf reads as:

(30) |

whereas the cumulative SPT distribution is and from Eq.(8) we obtain:

(31) |

For this case, and in the limit , the ST pdf given by Eq. (24) becomes

(32) |

We note that two different cases can be pointed out. 1) exists and depends on a microscopic time-scale . This imply , leading to . 2) For the stationary pdf does not exists. In fact in this case for and ( is the Neper number). The two point correlation function is thus obtained as:

(33) |

leading to

### 5.3 Hinkley Fish estuarine Database

To test the results of the coarse grained neutral null model, we employ a long-term monthly database of a estuarine fishes ecosystem. Fish samples were collected from the cooling-water filter screens at Hinkley Point Power Station, located on the southern bank of the Bristol Channel in Somerset (England). The data span the period from January 1981 to January 2010. A full description of the intake configuration and sampling methodology is given in [32]. A matrix is built using presence-absence records for each months during the 29 years. Each element of the matrix is equal to 1 if species is observed during month , otherwise = 0. The empirical persistence time are defined as the number of consecutive months in which the measurements reveal the presence of the species. The presence-absence time series thus form a vector of length , where is the total number of moths of observation.

Persistence time is defined as the length of a contiguous sequence of s in the time series. From such time series we can thus reliably estimate the empirical average persistence time and the species emergence rate . For the analyzed fish estuarine ecosystem we find [month] and [month]. Moreover, by summing over rows of the matrix , we obtain the total number of observed species on month , i.e., . We can thus calculate (assuming stationarity) the distribution of the number of persistent species in the system (Figure 3a), its first moments, (Figure 3b), and the empirical two-point correlation function (Figure 4):

(34) |

where is the time lag and , so that .

The average number of species predicted by the model can be obtained by Eq. (11), , whereas the standard deviation is that is in good agreement with the observed ecosystem diversity calculated from the presence-absence matrix (see Figure 3a). The null hypothesis of a Poisson species distribution given by Eq. (10) is accepted by both the Kolmogorov-Smirnov test and the test within a 95% confidence interval (). This analysis suggests that species diversity data cannot be used by themselves to discriminate among different mechanisms of demographic growths, i.e type of birth-death processes. Similar conclusion have been achieved using RSA data [24].

The autocorrelation reveals a relevant periodic behavior in the species times series (Figure inset 4a). To investigate such periodicity we study the power spectrum of the timeseries , i.e. , where is the Fourier transform of and () is the frequency. The spectrum exhibits a peak for , indicating that the period is months (see Figure 4a). Therefore the periodicity is a trivial effect due to seasonality of weather patterns and is not related to the fluctuations or intrinsic noise of the system dynamics [33]. To test Eq. (29) against empirical data, we then smooth out the periodicity due to seasonality. In order to do that, we calculate the empirical autocorrelation function only considering a specific month for every year and then averaging over all twelve months, i.e., , and , where and is the total number of type month in the whole time series. As can be seen from the inset in Figure 4b, once we remove seasonality Eq. (29) describes well the autocorrelation function .

An analysis of the fish SPT distribution have been also carried out. We find that SPT distributions are well described by the Gamma SPT with parameters compatible with and . This result holds with particular accuracy only when large SPT are considered. In fact, at monthly time scales, seasonality affects short SPTs, increasing the slope of the first part of the SPT pdf. Due to the negligible effect of the cut-off at the monthly time scale, we find that also a power-law SPT with fits the empirical SPT pdf. Note that here we are only interested in the estimation of the input parameters of the coarse grained neutral model, and , as they can be calculated directly from the SPT measured timeseries without the need of fitting the entire SPT distribution.

## 6 Robustness of the results

### 6.1 Agreement with mean field voter model results.

As we claimed in section 2, the proposed model is a coarse grained view of the grand-canonical version of neutral theory. Nevertheless we now show that our coarse grained model give a good approximation of results stemmed from the mean field approximation of the voter model with speciation [16, 25, 26] or, equivalently the infinite alleles model with mutation [17] (for details on the mapping between these two models we refer to [27]). The scheme of these models is the following. Consider a local community of individuals. At every time step a randomly selected individual in the ecosystem dies. With probability the individual is replaced by a colonizing offspring of another individual, randomly selected within the ecosystem whereas with probability the offspring belongs to a new species. For fixed and the above dynamics is described a linear birth-death ME with and (and thus ):

(35) |

where and is the probability for the species of having an abundance at time . For the hypothesis of neutrality we have that can be expressed as:

where is the probability for a single species to have abundance at time after its emergence (under the neutral assumption is species invariant). The average population of the -th species, given that it is still present at time , is . The mean population size at time for the mean field voter model can be calculated as:

(36) |

where is the number of diversification events occurred until time and the ensemble average is over the random variables and . By using the fact that for all , is Poisson distributed with the same frequency and is the pdf of the variable (see Eq. (4)), then equation (36) simplifies to Thus, from the definition of , it follows that: or, the mean population of a species after a time from its emergence. Using Eq. (35) it obeys the deterministic equation , which solution is [19] . Thus we have that . We observe that is the probability that the species has more than one individual at time , that is the cumulative distribution of the SPT pdf and therefore

(37) |

Using the above relations and (since by definition ), we obtain

(38) |

from which it follows that . Therefore if we approximate with the average number of individuals in the corresponding grand-canonical ensamble Eq. (2), i.e. , we found that Eq. (38) is the same result we have obtained from the coarse grained neutral model (compare with Eq. (11).

Interestingly, it has also been found that, for linear birth and death processes, the survival probability function has the same asymptotic functional shape given by Eq. (25) [19]. In fact, by assuming an initial population distribution given by the Fisher log-series [8], and defining as the probability that a species starting with individuals is still present at time , it has been shown that [19]:

(39) |

that is, it has the same asymptotic behavior of the ST pdf as obtained in Eq. (25) from the coarse grained model.

### 6.2 Universal relation between persistence and survival distributions

The fact that the survival probability function has the same asymptotic behavior in two different neutral models suggests that, rather than by chance, it possibly happens as a consequence of a deeper, and more general relationship between the SPT distribution and the ST probability function, valid regardless of the specific birth and death processes assumptions or, in other words, independently of the functional shape of and .

To address this issue, we start by calculating the RSA at a stationary time (with absorbing boundary condition in ), assuming that each species has only one individual when it emerges. Regardless of the type of birth/death rate, such relation can be written as:

(40) | |||||

where is the probability to find individuals at time given that there are at time . The normalization has been calculated using Eq. (37).

The ST probability function is thus: