A computational model for proliferation dynamics of division- and label-structured populations

A computational model for proliferation dynamics of division- and label-structured populations

J. Hasenauer, D. Schittler, and F. Allgöwer

Institute of Systems Theory and Automatic Control
University of Stuttgart, Germany


In most biological studies and processes, cell proliferation and population dynamics play an essential role. Due to this ubiquity, a multitude of mathematical models has been developed to describe these processes. While the simplest models only consider the size of the overall populations, others take division numbers and labeling of the cells into account. In this work, we present a modeling and computational framework for proliferating cell population undergoing symmetric cell division. In contrast to existing models, the proposed model incorporates both, the discrete age structure and continuous label dynamics. Thus, it allows for the consideration of division number dependent parameters as well as the direct comparison of the model prediction with labeling experiments, e.g., performed with Carboxyfluorescein succinimidyl ester (CFSE). We prove that under mild assumptions the resulting system of coupled partial differential equations (PDEs) can be decomposed into a system of ordinary differential equations (ODEs) and a set of decoupled PDEs, which reduces the computational effort drastically. Furthermore, the PDEs are solved analytically and the ODE system is truncated, which allows for the prediction of the label distribution of complex systems using a low-dimensional system of ODEs. In addition to modeling of labeling dynamics, we link the label-induced fluorescence to the measure fluorescence which includes autofluorescence. For the resulting numerically challenging convolution integral, we provide an analytical approximation. This is illustrated by modeling and simulating a proliferating population with division number dependent proliferation rate.

Keyword: Proliferating population, label-structured population, flow cytometry, CFSE, division-structured population

1 Introduction

Cell proliferation is a central aspect of most biological processes, among others bacterial growth [1, 2], immune response [3, 4, 5], stem cell induced tissue remodeling [6, 7], and cancer progression [8]. Depending on the biological process, cellular proliferation has different characteristics. Cell division can be symmetric or asymmetric, and the daughter cells may or may not inherit the age of the mother cells (see Figure 1). While many micro-organisms, such as the budding yeast, grow a daughter cell [9] which does not inherit the age of the mother cell and is born young, in most multicellular organism the mother cell divides symmetrically into two daughter cells which inherit the age of the mother cell [10]. The latter proliferation type – which will be the focus of this work – results in an accumulation of DNA damage and telomere shortening, which may be interpreted as aging of the individual cell. This results in a reduced proliferation potential, a reduced proliferation speed and finally in cell cycle arrest [10, 11, 12], known as senescence [13]. This has been discovered by Hayflick [10] in the 1960s and the upper limit for the number of cell divisions a normal cell can undergo has been termed Hayflick limit.

(a) Mother cells undergoing symmetric cell division split up into almost identical daughter cells. The difference of daughter cells is merely caused by the stochasticity of DNA replication, resulting in genetic and epigenetic differences, and the stochasticity of cell partitioning, yielding different protein abundances.
(b) Mother cells undergoing asymmetric cell division yield daughter cells with different cell fates. In the most extreme case, the mother cells grows a daughter cell – this is called budding. The daughter cell is genetically identical to the mother cell but is born young, meaning that it does not inherit the age of the mother cell.
Figure 1: Illustration of symmetric and asymmetric cell division. The color of the cells indicates the cell’s age. In case of symmetric cell division the age strongly correlates with the number of divisions a cell has undergone up to this point.

A variety of approaches are employed to investigate proliferation, ranging from the analysis of the cell cycle [14] to the model-based study of population heterogeneity and subpopulations [6]. Nowadays, for human cell lines especially label-based proliferation assays are used to analyze the proliferation dynamics of cell populations. Common labels are Bromodeoxyuridine (BrdU) [15] and Carboxyfluorescein succinimidyl ester (CFSE) [16], while mainly the latter is used in recent studies.

CFSE is a fluorescent cell staining dye which stays in cells for a long time and is distributed at cell division approximately equally among daughter cells. Thus, the proliferation of labeled cells results in a progressive dilution of the dye [17], as depicted in Figure 2, and quantitative information about the proliferation dynamics can be gathered using flow cytometry [18]. To determine the proliferation properties of cells, e.g., the rates of cell division and of cell death, from these data, analysis tools are required. The first proposed approaches employ peak detection and devolution [5, 18, 19]. Unfortunately, these methods are only applicable if the modes, corresponding to cells with a common division number, are well separated and if the data are not strongly noise corrupted. To overcome these limitations, different model-based approaches have been introduced.

In the literature, mainly three different classes of population models are described: exponential growth models, division-structured population models and label-structured population models. The exponential growth models (EGM) are the simplest ones, and merely describe the number of individuals in a cell population. For this task a one-dimensional ODE, like the Gompertz equation [1], is sufficient. While exponential growth models allow the description of the proliferation of many bacterial populations, they are in general not capable of describing the dynamics of human tissue cells. One reason for this is that the cell division and cell death rates are found for many cell systems [10], e.g., B cells [4], T cells [4], osteoblasts [20], to depend on the division number. To capture these effects, a multitude of division-structured population models (DSP) has been introduced [4, 19, 21, 22, 23, 24, 25, 26, 27]. The state variables of these models describe the sizes of the subpopulations, which are defined by a common division number. Hence, these models allow for the consideration of division number dependent properties. Still, these models do not provide information about the label concentrations and thus cannot be compared to data directly but require complicated and error-prone data processing.

To avoid this, label-structured population models (LSP) are employed [28]. These models describe the evolution of the population density on the basis of a one-dimensional hyperbolic PDE. Hence, they provide predictions for the label distributions at the individual time points and may be fitted to data directly [17, 28, 29, 30]. This renders complex data processing redundant and simplifies the model-data comparison. Still, these models do not allow for a direct consideration of division number dependent parameters. To partly circumvent this problem, complex dependencies of the cell division and cell death rate on time and label concentration are introduced [29]. These are neither intuitive nor easy to interpret. Furthermore, the simulation of label-structured population models is computationally demanding and requires discretization, entailing further problems.

In the following a model is presented and analyzed which combines the division-structured population models and the label-structured population models and thereby overcomes their individual shortcomings. This population model, which we termed division- and label-structured population model (DLSP), is based on our own work [31]. The same model has later also been used in [32, 33] for parameter estimation, including slight modification. Here we provide the first rigorous in-depth assessment its properties.

The DLSP model is introduced in Section 2 and incorporates both aspects: Discrete changes of the cell division number due to cell divisions and continuous dynamics of the label distribution. The overall model is a system of coupled partial differential equations. We discuss how this system of PDEs can be split up into two decoupled parts in Section 3, namely a single PDE and a set of ODEs, which significantly simplifies the solution. The obtained model is reduced further by truncation of the state space. This truncation and the resulting truncation error can be controlled using the a priori error bound which we derive. As the proposed model unifies the existing models, we outline the relations of the models in Section 4. In Section 6, the method is employed to study a population model with division number dependent division rates and an analysis of the computational complexity of the model is performed. The paper is concluded in Section 7.

Figure 2: Illustration of label dilution due to cell division. The division process results in halving of the concentration at each cell division (top), and the label intensity distribution within the cell populations (bottom). The latter one is accessible, e.g., via labeling with CFSE.

2 Modeling division- and label-structured populations

As outlined above, the study of proliferation dynamics in cell populations using labeling methods requires the consideration of two important distinct features:

  • the label concentration and

  • the number of cell divisions a cell has undergone.

The importance of the label concentration (with ) arises from the fact that this is the quantity which can be observed, e.g., using flow cytometry or microscopy [18]. On the other hand, a direct observation of the number of cell divisions a cell has undergone is in general not possible, though the division number often plays a crucial role within the model. A cell which has divided once is expected to have different properties, e.g. a different division rate, than a cell which has already divided several dozen times [10, 20].

In this paper we propose a model which captures both features of cells, distinct division numbers as well as distinct label concentrations among cells. Therefore, instead of a single PDE model describing the label dynamics of the overall population, a PDE model is defined for every subpopulation. Thereby, the th subpopulation contains the cells which have divided times. Cell division generates a flux from subpopulation to subpopulation , thus inducing coupling. The system of coupled PDEs is given by


with initial conditions

In this system, denotes the label density in the th subpopulation at time . The structure of the models for the individual subpopulations is highly similar to a single PDE which is employed in label-structured models [28]. The fluxes influencing the label distribution are:

  • , decay of label in each cell with label loss rate .

  • , disappearance of cells from the th subpopulation due to cell division with rate and due to cell death with rate .

  • , appearance of two cells due to cell division in the th subpopulation with division rate . The factor is the rate of label dilution due to cell division (cf. [29, 17]).

It has to be emphasized that the division rates as well as the death rates may depend on division number and time . To ensure existence and uniqueness of the solutions we require . As it is assumed that the labeling does not affect cell function, we do not allow and to depend on the label concentration . Furthermore, only label loss rates are considered which follow a linear degradation


The time dependence of the degradation rate may be arbitrary, but mainly constant degradation processes [28, 17, 29], , or Gompertz decay processes, [30], are used.

Note that by construction model (1) provides information about cell numbers and label density for the overall as well as for individual subpopulations. Hence, it combines advantages of common ODE models [4, 21, 22] and common PDE models [17, 28, 29] of cell populations and permits for more biologically plausible degrees of freedom than both of them. In detail, the available information are:

Number of cells in the subpopulations: Given , the number of cells contained in the th subpopulation can be computed as


This number of cells may help to understand the relative contribution of subpopulations to the overall population.

Normalized label density in the subpopulations: Given , the label density within the th subpopulation can be computed as


The normalized label density provides the probability of finding a cell within the th subpopulation with label concentration ,


Besides the properties of the subpopulations, the model permits also the analysis of the properties of the overall population. The unnormalized label density in the overall cell population is given by


From the overall population size


and the normalized label density in the overall population


can be derived. These are the two experimentally observable variables of the system. The overall population size can be determined by cell counting, while the population density can be assessed by the labeling with CFSE or BrdU. By combining these two, can be reconstructed. As there is currently no direct cell division marker available, experimental assessment of the subpopulation sizes or of the label distribution within the subpopulations is in general not feasible. All common experimental techniques only provide the marginalization over the division number  [17, 28, 29].

3 Analysis of division- and label-structured population model

Besides the advantages the DLSP model offers, its potential drawback is its complexity. The model is a system of coupled PDEs, which are in general difficult to analyze, and their simulation is often computationally demanding or even intractable. In the following it is shown that these problems can be solved for the DLSP model (1). The approach presented allows to efficiently compute the solution of the DLSP model, without solving a system of coupled PDEs.

3.1 Solution of the DLSP via decomposition

In order to provide an efficient method for computing the solution of (1), we define the initial number of cells


and the initial label density


according to (3) and (4). Given these definitions the following theorem holds:

Theorem 1.

The solution of model (1) is


in which:
(i)    is the solution of the system of the ODE:


with initial conditions: and .
(ii)    is the solution of the PDE:


with initial conditions .

The state variables and of the ODE system and the PDEs correspond to the number of cells (3) and the label density (4) in the th subpopulation, respectively.


To prove that Theorem 1 holds, (11) - (13) are inserted in (1) and it is shown that the resulting equation holds. The proof is only shown for , since the case can be treated analogously. Furthermore, for notational simplicity the dependence of , , and on and is omitted where not required.

Inserting (11) in (1) for yields


The left hand side of this equation can be reformulated:


By inserting this result in (14) and substituting with (12), we obtain


which can be simplified to


It can be proven that this last equality holds, e.g., by using the analytical solution of (13), which can be found below. This yields that (17) holds which concludes the proof of Theorem 1. ∎

Remark 1.

Note that it can be verified that (17) holds if and only if the label loss rate is linear in .

With Theorem 1, the original system of coupled PDEs can be decomposed into a system of ODEs (12) and a set of decoupled PDEs (13). This means that the size of the individual subpopulations can be decoupled from the label dynamics. This already tremendously simplifies the analysis, but a further simplification is possible:

Corollary 1.

The solution of model (1) is


in which is the solution of the ODE (12).


To prove Corollary 1 note that the PDE (13) is linear. Thus, the method of characteristics [34] can be employed to obtain an analytical solution (Appendix A). This yields


which can be inserted into (11), proving Corollary 1. ∎

The general solution simplifies in cases of specific choices for . A constant degradation rate yields


while for a Gompertz decay process one obtains,


Corollary 1 provides a solution for any label degradation rates, including those considered in [32, 33].

By solving the decoupled PDEs analytically, the solution of the DLSP model can be obtained in terms of the solution of a system of ODEs. This reduces the complexity drastically and enables also a compact representation of the overall label density :

Corollary 2.

The overall label density (6) is


in which is the solution of the ODE (12).


By substituting (18) into (6), Corollary 2 is proven. ∎

Given Corollary 1 and 2, it is apparent that merely the ODE system (12) has to be solved in order to compute the solution of the DLSP. This problem is approached in the remainder of this section.

3.2 Calculation of the subpopulation sizes

In order to solve ODE system (12), we note that the change of subpopulation only depends on the size of subpopulation . This chain-like structure enables the solution of via recursion. By doing so, analytical solutions for the ODE system have been found for two cases [5, 21]:

Lemma 1.

Given that , the solution of (12) is:


This result has been derived in [21], where the authors studied this ODE system to model the number of cells that have undergone a certain number of divisions, without modeling label dynamics. The derivation as provided in Appendix B is generalized for later use.

Lemma 2.

Given that and , then the solution of (12) is:


in which

Solution (24) was first stated in [5] and for completeness the proof is provided in Appendix C. It basically employs mathematical induction in the frequency domain, exploiting properties of the partial fraction under the provided assumptions. Despite the prerequisites, this result is quite powerful as for almost all cases of time invariant division number dependent parameters and the ODE system (12) can be solved analytically.

In cases in which neither prerequisites for Lemma 1 nor 2 hold, then the solution of (12) can still be computed using numerical integration. This is possible if only the sizes of the first subpopulations , , , , are of interest, where is finite.

3.3 Truncation of division numbers in the population model

In Section 3.1 a decomposition approach has been described to decouple the size of the subpopulations from the label distribution in the individual subpopulations. While this simplifies the computation of the properties of individual subpopulations drastically, the analysis of the overall label density and of the overall population size still requires the calculation of an infinite sum (22). Even in cases for which the individual subpopulation sizes are available analytically (see (23) and (24)), we could not derive a closed form solution for . Therefore, in this section we present a method to find an approximation of of the form


with truncation index . Instead of considering an infinite number of subpopulations, only the first subpopulations are taken into account. While it might be argued that a bound can be determined from experimental data collected in proliferation assays [32, 33], this is not true for long times. In case of long observation intervals, the autofluorescence – which will be discussed in Section 5 – avoids an estimation of . Thus, reliable selection rules for the truncation index are necessary.

In order to approximate with arbitrary precision by the truncated sum , convergence of (6) and (7) with respect to the subpopulation index is required and can be proven:

Theorem 2.

The sums (6) converge for any finite time , if there exist


The proof of Theorem 2 is provided in Appendix E. It employs a system of ODEs of which its states are an upper bound for the states of (12), and which can be solved analytically. Given these upper bounds the comparison theorem for series [35] can be used to verify convergence. Note that Theorem 2 is powerful as it holds for all biological plausible functions and .

Given convergence the question arises how large the truncation index must be to ensure a predefined error bound at a given time. For the considered system it can be shown that:

Theorem 3.

Given a truncation index and a time , as well as , , and as defined in Theorem 2, the truncation error is upper bounded by :


To prove Theorem 3, we show that . This sum can be upper bounded for all biologically plausible functions and using the ODE system employed to verify Theorem 2. The full proof is provided in Appendix F. Note that, if , the bound (27) is precise and equality holds.

Remark 2.

In this work we considered an error bound which is relative to the initial condition. This is reasonable as for this system the superposition principle holds and the relative truncation error is thus independent of .

Given Theorem 3, an upper bound can be derived which ensure that a relative error is bounded by :

Corollary 3.

Assuming that , , and exist as defined in Theorem 2, the error bound


holds if


Corollary 3 follows directly from Theorem 3, using . ∎

Despite the generality of Theorem 3 and Corollary 3 for the considered system class, it suffers the small disadvantage that no explicit expression for has been found. Rather, the minimum truncation index which is required to ensure a certain error bound has to be found iteratively by increasing or decreasing based on the current error. Fortunately, this search is computationally cheap as it is not necessary to solve a system of ODEs or PDEs, but the error bound is available analytically.

A study of the a priori error bound (29) shows that if the acceptable relative error is kept constant, the truncation index grows monotonically as function of the final simulation time. This is due to the exponential growth of vs. the polynomial growth . Merely for cases in which , does not have to increase arbitrarily over time but stays bounded, as under these conditions the population dies out. Note that the increase of is often not critical. Due to label dilution in general only the first seven or eight cell divisions can be observed [18], which limits the timespan of interest and therefore the required truncation index .

Aside from an approximation of the population density , also approximations for the overall population size and of the normalized overall label density may be necessary to compare model predictions to measurements. Accordingly to (7) and (8), plausible choices for these approximations are


Theorems 2 and 3 can be extended to verify convergence and determine truncation errors for these quantities. For this is straightforward, while for it is slightly more complicated. The proofs are not provided here as this is beyond the scope of this work and would reduce the readability.
To summarize, in this section the DLSP model has been analyzed in-depth. We have shown that for a very general class of division and death rates and , the solution of the DLSP can be computed by solving a system of ODEs. This ODE system has an analytical solution for a rather general class of time independent parameterizations. By determining rigorous error bounds, we furthermore enable the calculation of the required truncation index to achieve a predefined precision. As shown later, this will allow for many systems to predict the population response employing a low-dimensional ODE system.

4 Comparison of different proliferation models

In the last section we have analyzed the DLSP model and outlined a method to solve it. The question which remained open is how the DLSP model and its solution relate to existing population models for cell proliferation. To answer this question we confine ourselves to the in our opinion most common models, the exponential growth model (EGM), the division-structured population model (DSP) and the label-structured population model (LSP):

  • EGM: An ODE describing the dynamics of the overall population size [1].

  • DSP: A system of ODEs describing the dynamics of the number of cells contained in the individual subpopulations, where the subpopulations are defined via a common number of cell divisions [21, 4].

  • LSP: A PDE describing the dynamics of the label density in the overall population [29, 17, 28].

These models are used in many more publications than cited here and various extensions of these models exist.

4.1 Relation between EGM and DLSP

The EGM is the simplest available model which describes population dynamics. It has only one state variable, which corresponds to the size of the overall cell population. In general, the EGM is written as


in which is the effective growth rate. A common choice is which results in a Gompertz equation [1].

As the EGM only describes the overall population size, it is contained in the DLSP. By choosing , and , the overall population size predicted by the DLSP is equivalent to . This can be shown using the time derivative of ,


which has the initial condition .

4.2 Relation between DSP and DLSP

In contrast to the EGM, the DSP resolves the subpopulations, and the state variables correspond to the number of cells which have divided times. To our knowledge this model has first been proposed in [21] and its most common form is equal to (12). Thus, the DSP is contained in the DLSP and is obtained by marginalization over the label concentration . Actually, according to Theorem 1, a DSP model is solved to compute the solution of the DLSP. As for the PDE component of the DLSP an analytical expression can be derived (Corollary 1), solving the DLSP model has basically the same complexity as solving the DSP.

4.3 Relation between LSP and DLSP

For the comparison of model predictions and labeling experiments with CFSE or BrdU, the LSP model has been introduced [29, 17, 28]. The state variable of the LSP denote the label density in the population. In general, the evolution of is modeled by the PDE


with initial condition  [29]. As this model allows for label dependent division and death rates, and , it is in this respect more general than the DLSP.

However, it is not obvious why the cell division or death rates should depend on the label concentration. If the experiments are performed at low label concentrations far from the toxic regime, the population dynamics should be independent of the labeling [16, 36]. In particular, complex dependencies of and on the label concentrations , like those shown in [29], are hard to argue. Additionally, a recent study supports that the introduced nonlinearities are correlated with the division number [29].

Therefore, we just consider division and death rates which solely depend on time , and . As proven in Appendix G, for this case, the solution of the DLSP, with and and , is equivalent to . This shows that under these assumptions, the information provided by the LSP is a subset of the information available from the DLSP. This renders the DLSP more useful, as also subpopulation sizes are accessible.

Furthermore, for time dependent and , the solution of the DLSP can be approximated by a low-dimensional ODE system (Theorem 2 and 3). Hence, instead of computing using a PDE solver as done in all available publications, one may solve only a low-dimensional ODE system. Using the analytical results for the ODE system (12) even analytical solutions are available, e.g.,


for constant rates and . Although this result for the LSP may be helpful to study various systems, we have not found it in the literature yet. The reason might be that a direct derivation of (34) is rather complex, whereas the study of the DLSP renders it straightforward.

Clearly, label dependent cell division and death rates or constant label loss rates were not considered here, in contrast to what was done in [29, 17, 28]. This was avoided as the decomposition of the solution shown in Section 3.1 becomes impossible and solving the DLSP model gets computationally challenging. Nevertheless, the loss of these degrees of freedom is compensated by allowing for biologically more plausible division dependent cell parameters in the DLSP.

4.4 DLSP as a unifying modeling framework

The implications of the findings in Section 4.1-4.3 are that the three most prevalent classes of population models are captured by the DLSP. Furthermore, it is more general, as label distributions and division dependent parameters may be considered, which are both important and well motivated from a biological point of view. Figure 3 illustrates the relations and shows how the EGM, the DSP, and the LSP may be constructed from DLSP via marginalization.

In contrast to the generality, the simulation effort increases only marginally when studying the DLSP instead of the DSP or the LSP. This is due to the decomposition into a system of ODEs (which is equivalent to the DSP), and a single set of PDEs. The set of PDEs can be solved analytically, and in several cases even analytical solutions for the ODE exist, facilitating an analytical solution of the overall system. Such analytical solutions can then be used to determine previously unknown analytical solutions for DSP and LSP, e.g., like (34).

Figure 3: Illustration of the relation between the exponential growth model (EGM), the division-structured population model (DSP), the label-structured population model (LSP), and the division- and label structured population model (DLSP). The models are distinguished using two properties, the availability of division numbers (vertical axis) and of information about the label distribution (horizontal axis). Arrows indicate whether and arrow labels describe how a model can be obtained from another model. It is apparent that the DLSP model is the most general model, as all remaining models can be constructed from it via marginalization.
Remark 3.

Obviously, there exist extensions of the LSP and the DSP which are not captured by the current version of the DLSP. Examples are the aforementioned label concentration dependent division and death rates for the LSP [28, 17, 29] as well as DSP models with recruitment delay [24, 4]. While the DLSP model can easily be extended to take such effects into account, the numerical analysis will get more challenging.

5 Computation of measured label distribution

In the last section, we related the division- and label-structured population model to existing models. In this section, the prediction of the DLSP models will be related to data collected in proliferation assays.

5.1 Autofluorescence and measured overall label distribution

As outlined in the introduction, to obtain quantitative information about the proliferation dynamics, the fluorescent levels of individual cells are assessed using flow cytometry [18]. The fluorescence level of an individual cell, , summarizes the label induced fluorescence, , and the autofluorescence, ,


The background, which might be interpreted as measurement noise, avoids a precise reconstruction of the label concentration. Furthermore, it limits the number of cell divisions which can be observed. While the label induced fluorescence, , halves at cell division, this is not true for the autofluorescence. As the initial label concentration cannot be arbitrary high to avoid interference with the cell’s functionality and toxicity, even for highly optimized labeling strategies only six to eight division can be observed before the observed fluorescence becomes indistinguishable from the background fluorescence [18].

To address these problem a modified label-structured population model is introduced in [30] for the case of constant background fluorescence, . This modified label-structured population model directly describes the evolution of , accounting for the facts that (1) only is divided among daughter cells and (2) only is degraded over time. Unfortunately, this complicates the numerical treatment – for this model no analytical expression for the label evolution is known – and does not allow for the a separate analysis of the contributions. Furthermore, experiments showed that the background fluorescence varies among cells [18]. The autofluorescence, also called background fluorescence, is a stochastic variable , which is independent of the level of label concentration. The distribution of , , with , can be assessed using control experiments [18, 32, 33].

In this section, we propose an approach to predict the measured distribution of fluorescence, while explicitly distinguishing label dynamics and measurement process. The label dynamics are described by the DLSP model and the measured distribution of fluorescence is simply the convolution of the label induced fluorescence, , and the autofluorescence distribution, ,


Hence, the measured fluorescence distribution , which is a number density function, can be obtained by simulating (22) and computing the convolution integral (36). This is comparable to results described in [32, 33], where is partially contained in the model. However, the decomposition of the computation of in dynamics and measurement is far more intuitive than a combined model as in [30, 32, 33] which combines the effects.

5.2 Efficient approximation of measured overall label distribution

It has been shown that the overall label distribution, , can be computed efficiently using the simulation of a low-dimensional ODE model and the analytical solution of a simple PDE. Unfortunately, this efficiency is corrupted by the need for solving the convolution integral (36). A repeated evaluation, as required for parameter estimation (see, e.g., [30]), results in a large computational burden.

To reduce the computational complexity, we propose an approximation for of which can be computed without integration. To allow for this approximation, we assume that the initial condition is a weighted sum of log-normal distributions,


with fraction parameters , with , parameters , and


The faction parameters, , determine which fraction of cells belongs to which log-normal distribution. The number of different log-normal distributions is denoted by . In addition, we restrict the measurement noise to be log-normally distributed, . These two assumptions are not restrictive, as any smooth distribution can be approximated arbitrarily well by a sum of log-normal distributions and as autofluorescence levels are known to be approximately log-normally distributed (see, e.g., [18]).

Given (37), it can be shown that the label distribution in the individual subpopulation is


with (for proof see Appendix H). This follows directly from the analytical solution of . Thus, log-normal distributions are conserved under the considered class of partial differential equations, and log-normal initial conditions result in log-normal label distribution for . This implies that also the label induced fluorescence distribution is a sum of log-normal distributions,


By inserting this in the convolution integral (36), we obtain by linearity of integration


The individual summands of , , are the measured fluorescence distributions in the subpopulations defined by a common division number. Therein, the summands of ,


describe the contribution of the -th log-normal distribution in the initial condition to . This can be traced back as the superposition principle holds. Apparently, the efficient assessment of is possible, using an efficient computational scheme for computing .

The probability density is the probability density of the sum of two log-normally distributed random variables. Although, this density is of interest in many research fields (see [37, 38] and references therein), no analytical formula for computing is known. Still, several approximations are available. One of the most commonly used approximation has been proposed by Fenton [37]. Fenton employs the fact that although the distribution of the sum of two log-normally distributed random variables is not log-normal, it can still be closely approximated by a log-normal distribution. In [37], this approximating log-normal distribution is chosen to have the same first two central moments, mean and variance , as the actual distribution of the sum.

The time-dependent central moments of are the sums


of the time-dependent central moments of the label distribution of the -th subpopulation, and , and the static autofluorescence, and , as it is known from basic statistics [39]. These central moments are


for the label distribution and


for the measurement noise. Following [37], the log-normal distribution exhibiting the same overall mean and variance has parameters


yielding the approximation


of . Own studies revealed (not shown), that this approximation is for narrow distributions almost indistinguishable from the true distribution. In particular, if one of the distribution becomes narrow, the approximation can be made arbitrary good. This is helpful, as the precise parameterization of the initial condition might be a degree of freedom, which can be used to regulate the approximation quality.

Given the approximation of , the approximation


of the measured fluorescence distribution can be computed. This approximation is the sum of log-normal distributions those parameters can be computed analytically. Therefore, it merely requires the evaluation of the log-normal distribution at different points, which can be made fairly efficient using lookup tables. The approximation (53) can be determined orders of magnitude faster than the actual convolution integral (36) used, e.g., in [32, 33]. Apparently, this approximation can also be combined with the truncation introduced in the last section.

Similar to the actual value, the approximation might be employed to perform parameter estimation. There, is compared directly [28] or indirectly [17, 29, 30] with the measured flow cytometry data. This enables the inference of the model parameters, for instance, proliferation and death rate.

Remark 4.

Parameter estimation for the DLSP model is beyond the scope of this work. We focus on the development of modeling and simulation tools for structured cell population, which might in a second step be employed to infer parameters.

6 Example: Population with division number dependent parameters

To demonstrate the properties of the DLSP model, an illustrative simulation study is performed. Therefore, a hypothetical cell population system with division number dependent proliferation rates is considered. The existence of division number dependent proliferation dynamics is known for many cell systems [4, 10, 20], whereas the magnitude of the effect varies between them. This example shall illustrate the power of the DLSP and the proposed numerical procedure and therefore does not focus on a particular biological system.

The hypothetical cell population is assumed to have an initial proliferation rate of [1/hour], corresponding to an initial doubling time of 35 hours. This initial proliferation rate changes upon cell division. It is assumed that the proliferation rate decreases exponentially, [1/hours], with [-]. This rate law is based on the findings in [20] and results in a reduction of the proliferation rate by a factor of 2 when proceeding through 3 generations, thus . The cell death rate is set to a constant value, [1/hours]. Concerning the labeling, a log-normal initial label density is assumed, as observed in many studies, e.g., [17, 28, 29]. The label dilution factor and the degradation rate are set to [-] and [UI/hour], respectively, in which UI denotes the unit of label intensity. The autofluorescence is assumed to be log-normally distributed with and . All parameter values are comparable to those available in the literature [17, 28, 29].

Figure 4: Label density in cell populations at different points in time, computed from the first subpopulations. In order to ensure comparability with common histograms plots whose bins are logarithmically distributed, the label density is multiplied with the label concentration .

The resulting cell population model is simulated for days. The label density and the size of the overall population are depicted in Figure 4. Both quantities are computed using a truncation index of , thus merely the first 20 subpopulations are taken into account. This already ensures a small truncation error.

The actual truncation error and the bound for the truncation error are now studied in more detail. As no analytical solution is available, we compare all results to . For this case, the analytical expression (27) for the error bound yields over the whole time interval days. Therefore, is considered as the exact solution. Given the truncation error is evaluated. From the results depicted in Figure 5(a) it is apparent that over the considered time interval, already provides an error smaller than . This illustrates that a small number of subpopulations is sufficient to obtain a good approximation of the population density. This result is also supported by the derived truncation error bound (Figure 5(b)), while the expected truncation error is overestimated. This is visible for instance for and , where the truncation error bound is several orders of magnitude higher than the actual truncation error.

(a) Truncation error, .
(b) Bound for the truncation error, .
Figure 5: Truncation error \subreffig-ex: truncation error - exact and truncation error bound \subreffig-ex: truncation error - bound as function of the truncation index and the final time .

To assess the truncation error bound more precisely, we compute the minimal truncation index required to ensure a predefined error bound . This analysis is performed using the exact truncation error (blue) and the truncation error bound (orange). The results are depicted in Figure 6, where Figure 6(b) shows the time dependency and Figure 6(a) shows the error level dependency of the minimal truncation index . As verified previously, the computation from the exact truncation error yields lower truncation indices. The maximal observed difference for this system is a factor of 2. The difference increases over time and interestingly, for this system, the truncation index as a function of approximates a line with a slope of 2.5. While the slope is problem dependent, this effect has been observed for all considered systems. It probably originates from the structure of the truncation error bound (29). Besides the time dependency, the index depends also on . When is decreased by a factor of 10, the index has to increase by 2. This is a quite reasonable scaling and allows for very good approximations. For the system at hand, the analysis of the exact truncation error shows that ensures an error of 0.01 % of the original population size. Employing the truncation error bound we compute . Thus, the truncation error is overestimated by a factor of 2 but this number is computed without simulation and available even if the exact solution is not known. Given the upper bound of the truncation error, we can verify a priori that a very good approximation () of the solution of the coupled system of PDEs (1) can be calculated by solving a system of 20 ODEs. This reduces the computational effort drastically.

(a) Truncation index required to ensure that