Model distinguishability and inference robustness in mechanisms of cholera transmission and loss of immunity

Model distinguishability and inference robustness in mechanisms of cholera transmission and loss of immunity

Elizabeth C. Lee, Michael R. Kelly, Jr., Brad M. Ochocki, Segun M. Akinwumi,
Karen E. S. Hamre, Joseph H. Tien, and Marisa C. Eisenberg
Corresponding author. Email addresses:,,,,,,

Department of Biology, Georgetown University, 37th and O Streets, NW, Washington, DC 20057, United States of America

Department of Mathematics, The Ohio State University, 231 West 18th Ave, Columbus, OH 43210, United States of America

Department of BioSciences, Program in Ecology and Evolutionary Biology, Rice University, 6100 Main Street, MS-170, Houston, TX 77005-1892, United States of America

Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, T6G 2G1, Canada

Division of Global Pediatrics and Division of Epidemiology and Community Health, University of Minnesota, 717 Delaware Street SE, 3rd Floor, Minneapolis, MN 55414, United States of America

Departments of Epidemiology and Mathematics, University of Michigan, Ann Arbor, 1415 Washington Heights, Ann Arbor, 48109, USA

Mathematical models of cholera and waterborne disease vary widely in their structures, in terms of transmission pathways, loss of immunity, and a range of other features. These differences can affect model dynamics, with different models potentially yielding different predictions and parameter estimates from the same data. Given the increasing use of mathematical models to inform public health decision-making, it is important to assess model distinguishability (whether models can be distinguished based on fit to data) and inference robustness (whether inferences from the model are robust to realistic variations in model structure).

In this paper, we examined the effects of uncertainty in model structure in the context of epidemic cholera, testing a range of models with differences in transmission and loss of immunity structure, based on known features of cholera epidemiology. We fit these models to simulated epidemic and long-term data, as well as data from the 2006 Angola epidemic. We evaluated model distinguishability based on fit to data, and whether the parameter values, model behavior, and forecasting ability can accurately be inferred from incidence data.

In general, all models were able to successfully fit to all data sets, both real and simulated, regardless of whether the model generating the simulated data matched the fitted model. However, in the long-term data, the best model fits were achieved when the loss of immunity structures matched those of the model that simulated the data. Two transmission and reporting parameters were accurately estimated across all models, while the remaining parameters showed broad variation across the different models and data sets. Forecasting efforts were not successful early in the outbreaks, but once the epidemic peak had been achieved, most models were able to capture the downward incidence trajectory with similar accuracy.

Our results suggest that we are unlikely to be able to infer mechanistic details from epidemic case data alone, underscoring the need for broader data collection, such as immunity/serology status, pathogen dose response curves, and environmental pathogen data. Nonetheless, with sufficient data, conclusions from forecasting and some parameter estimates were robust to variations in the model structure, and comparative modeling can help to determine how realistic variations in model structure may affect the conclusions drawn from models and data.

Keywords: model misspecification, parameter estimation, model structure, comparative modeling

1 Introduction

Cholera is a waterborne disease caused by the bacterium Vibrio cholerae, which manifests as severe diarrhea and vomiting leading to dehydration. Left untreated, cholera can be up to fatal, but rehydration treatment can greatly reduce case fatality rates to as low as 1% [1, 2]. Worldwide, cholera causes three to five million cases and over 100,000 deaths per year [3]. Numerous mathematical models of cholera transmission have been proposed to investigate factors that impact the dynamics and transmission of waterborne diseases [4, 5, 6, 7, 8, 9, 10, 11, 12, 13], and the ongoing cholera epidemic in Haiti has spurred additional interest in the subject [14, 15, 16, 12, 17, 18].

Due to the range of indirect transmission pathways and timescales, which may be represented by environmental water sources, household water containers, foodborne transmission, and more [19, 20], commonly used mathematical models for cholera vary widely in their level of detail, spatial scale, and model structure. Some models use a single term to represent a composite set of transmission mechanisms, while others include multiple timescales of transmission [4, 8]. In addition to standard mass-action transmission [8, 14], many models use nonlinear transmission functions for waterborne transmission [4, 21] to reflect the dose response for cholera in the water. Models may also include an asymptomatic transmission pathway, a hyperinfectious state for the bacteria immediately after shedding, and other ecological and environmental factors in the environmental reservoir, such as effects of vibriophages, plankton, weather, and climate [8, 6, 4, 5, 10, 22, 23, 18, 24, 25, 26].

In addition to the variation in transmission mechanisms, loss of immunity to cholera is poorly understood and therefore, modeled with many different assumptions. Estimates of the length of immunity in the literature range widely from several months to three to ten years [27, 6, 28]. Immunity to cholera is of particular interest given the recent and ongoing oral cholera vaccine campaigns worldwide, including in Haiti, Bangladesh, and Thailand [29, 30, 31, 32], which raise additional questions of how vaccine-derived immunity compares to immunity derived from infection.

As modeling gains prevalence among policy makers in public health [16, 33, 34, 35, 36, 37], comparative or ensemble modeling approaches have been increasingly viewed as a way to ensure that the results of parameter estimation, forecasting efforts, and the evaluation of intervention strategies are conserved across the range of realistic model structures [38, 39]. Two related concepts are useful to consider in these efforts—model distinguishability addresses whether candidate models can be distinguished by their fits to empirical data [40], and inference robustness assessments examine whether conclusions drawn from a particular model are robust to realistic variations in the model structure [38]. Both of these concepts are important to evaluate in the model-building process when model results are used as the basis for decision making. The importance of parameter and model uncertainty for cholera has been highlighted by several recent studies, both in general [41, 26], and in the context of the recent Haiti epidemic specifically [41, 25].

In this paper, we examine the effects of uncertainty in model structure on cholera disease dynamics and inference by considering five models with different transmission and loss of immunity mechanisms. The models under consideration share a common base, the SIWR model of Tien and Earn [8]. The SIWR model is an extension of the classic Susceptible-Infected-Recovered (SIR) [42] with an added compartment for pathogen concentration in an aquatic reservoir (W) [8]. In addition to the person-to-person and water transmission pathways of the base SIWR model, we evaluate two additional transmission-related features: a nonlinear Hill-function dose response for waterborne transmission and an asymptomatic infection and transmission pathway [43, 4, 6, 24]. We also consider three loss of immunity features not included in the SIWR model: exponential loss of immunity, stage-progression gamma-distributed loss of immunity, and a novel model that features progressively increasing susceptibility after recovery from infection.

Using these five deterministic SIWR-based models, we first simulate data from each model with different types of added noise. In the frame of model distinguishability and inference robustness assessment, we estimate model parameters and fit model data for all five models to each simulated dataset and an empirical dataset from the 2006 cholera epidemic in Angola. We determine how well each model recaptures the underlying parameter and values, as well as how each model fits to simulated data generated from different models. Finally, we forecast trajectories using parameters estimated from truncated simulated incidence data and compare model forecasts with that of the true simulated data.

2 Methods

2.1 Model descriptions

Here we introduce the five models of epidemic cholera disease dynamics that were used in our analyses. We base these models on the SIWR model of Tien and Earn [8], which includes two timescales of transmission: a fast, direct route (represented by ) which incorporates direct person-person transmission, foodborne and household transmission, and other non-environmental transmission pathways; and a slow, indirect route () which represents long-term transmission mediated by bacteria in the water. The scaled non-dimensional model equations are given by [8]:


where , , and represent the fractions of the population that are susceptible, infectious, and recovered, respectively. The variable is proportional to the concentration of V. cholerae in the environment, and represents the decay rate of the pathogen in the water (noting that the nondimensionalization in [8] rescales the water so that the parameter for pathogen shedding into the water is canceled and replaced with , yielding the form of seen above—see Supplementary Information for nondimensionalization details). Parameter definitions and values are given in Table 1. The basic reproduction number () for the baseline model includes both transmission pathways and is given by


where is the recovery rate and is the natural death rate in the population. The five models in our study build upon this baseline SIWR model (Figure 1), with the aim of capturing a range of features commonly considered in loss of immunity and cholera dynamics. These features include: loss of immunity following exponential or gamma distributions, nonlinear dose response effects on transmission, and asymptomatic cases.

To connect these models with data, we use the same measurement equation of (i.e. data are assumed to have mean with some measurement error), as has been used with a range of data sets using the SIWR model [14, 20, 18] (including the same epidemic dataset as considered here). The parameter represents a combination of factors including the reporting rate, at-risk population size, potentially the infectious period (if used as an approximation for incidence data), and potentially the symptomatic fraction (depending on the model considered), among others. We define and fit rather than as this constrains the parameter estimation to be within rather than , making easier to estimate [20].

Figure 1: Diagrams of study models. First row (left to right): Exponential model, Dose Response model, Asymptomatic model. Second row (left to right): Gamma model and Waning Immunity model. Red compartments represent the infected population and red arrows represent person-person transmission. Blue compartments represent pathogen concentration in water while blue arrows represent pathogen-person transmission. Black compartments are susceptible or partially susceptible, while white compartments are immune. Grey arrows indicate pathogen shedding.

Exponential model. The Exponential model builds on the baseline SIWR model by enabling recovered individuals to return to the susceptible state. We assume those in the recovered class () become susceptible () again after losing their immunity with rate parameter . The duration of immunity among recovered individuals follows an exponential distribution. for this model has the same structure as that of the baseline model (Eq. 2). The equations are given by:


Dose Response model. The Dose Response model adds to the Exponential model by accounting for nonlinear dose response effects in the waterborne transmission pathway [44, 4, 43, 41]. Challenge studies in which volunteers ingested cholera and other pathogens at various doses have shown that the probability of transmission is a nonlinear function of dose size, with sigmoidal or Hill function-shaped curves that show low probabilities of infection at low concentrations and saturating probability of infection once the bacteria attain sufficiently high concentrations in the aquatic reservoir. Following previous studies, this model incorporated a Hill function dose response curve in the parameter for waterborne transmission [4, 43]. We assume susceptible individuals become infected at a rate , where is the maximum force of infection from the water and is the concentration of V. cholerae that gives a chance of water-transmitted cholera infection. As in the Exponential model, individuals lose immunity at a rate . The model equations are given by:


and for this model is given by:

Asymptomatic model. This model expands upon the Exponential model to include an asymptomatic infection pathway. Estimates of the ratio of asymptomatic to symptomatic cholera infections range from 3 to 100 [6]. We assumed 20% of infections were symptomatic () [41]. We assumed that all pathogens decay at the same rate , regardless of which class they are shed from. Shedding rates differ by symptomatic/asymptomatic status [41], with differential shedding rates and . Scaling by these shedding rates and the other model parameters in the nondimensionalization (given in the Supplementary Information) results in the following model equations:


where for this model is given by:

Gamma model. As opposed to exponential loss of immunity, the Gamma model represents waning immunity through a chain of recovered classes, which results in an immunity duration that has a gamma distribution (See [45] and [46] for examples). The number of classes depends on the assumed distribution on length of immunity; a greater number of compartments yields a duration that begins to approximate a time delay. In this model, individuals in the chain of recovered classes retain complete immunity until they return to the susceptible class, with equations given by:


and for this model is same as Equation 2. We note that the progression through each stage of immunity is at rate , so that this model has the same overall duration of immunity as the previous models.

Waning Immunity model. Given the wide range of estimates for the length of immunity in cholera, we also developed a new model that allows for progressively increasing susceptbility after infection over time. This may more realistically reflect the dynamics of loss of immunity, which is likely to be progressive rather than switch-like. Similar models of progressive waning has been used in a range of other contexts [47], and a distribution of cholera antibody titers are often observed in cholera vaccine studies [48]. The Waning Immunity model again represents loss of immunity through a chain of classes, but individuals are at least partially susceptible at all points in the chain. The transmission rates are given as functions of location in the chain (Figure 1), where more recently recovered individuals are less likely to become infected. As individuals progress through the chain, immunity to water-borne and person-person infection wanes, with susceptibility given by functions and , respectively. The forms of and were chosen so that the probability of infection is close to zero for individuals in the first susceptible class () and equal to one for individuals in the last susceptible class (). The model equations are:


where we take and . For simplicity, we assumed baseline values of , i.e. equal rates of waning immunity for waterborne and direct transmission. This form was chosen to make and equal to 1 for and near zero for . We also make the same adjustment to the waning immunity rates as in the Gamma model. Again, for this model is given by Equation 2.

2.2 Data simulation and baseline parameters

Model parameters were taken from previous modeling and empirical studies of epidemic cholera, primarily from the estimates in [20], which used a SIWR model fitted to the 2006 outbreak of cholera in Angola (Table 1). However, some updates to the parameter values were made, based on literature estimates and known biology, as described in more detail in the Supplementary Information (Section A.2). For the epidemic scenarios, we simulated six sets of data from each model to represent several possible combinations of measurement error assumptions—noise-free, normal noise ( = model output, ), and poisson noise ( = model output), and simulation durations—epidemic (100 days) and long-term (3 years). Throughout the paper, we use the term simulation model to refer to the model that simulated the data for the analysis. These 30 simulated datasets (five models three noise options two simulation durations) were used for both parameter estimation/model fitting and forecasting.

We used the average life expectancy of Angola, 55 years [49], to determine . Length of immunity to cholera () acquired after infection is not well understood, with estimates ranging from weeks to months to years [27, 6, 28]. For our simulated data, we assumed immunity lasted one year. The complete set of values, descriptions, and sources for the baseline parameters used for data simulation is given in Table 1.

While most parameters were common across models, some models required additional, specialized parameters. For the Asymptomatic model, we chose and such that was four times greater than and the weighted average of and (where symptomatic infections represent of the infected population) was equal to for all of the other models. To compare across models, in the Results for the Asymptomatic model we report this weighted average of the two transmission parameters. Reasoning that individuals with symptomatic infections will have longer lasting immunity than those with asymptomatic infections, we assumed that immunity from symptomatic infections () would last two years and immunity from asymptomatic infections () would last 0.5 years. Similarly, the reported estimates of for this model are given as the weighted average of and . Additionally, the value used for the Asymptomatic model was adjusted based on the fact that only symptomatic cases are observed (Table 1), although the unadjusted form was reported to facilitate model comparison in the Results.

The Dose Response model parameter structure also deviated from the other models, with the waterborne transmission term expanded to form a Hill function. The half-saturation constant, , was determined based on previous literature estimates [50]. To determine the value of the maximum transmission level, , there are in general several natural ways that one might match up the Hill function with the single waterborne transmission parameter value used in the original SIWR model. For example, one might match this value to the linear approximation of the Hill function (i.e. let , which approximates the Hill function for low values of ), or to the maximum level of waterborne transmission (i.e. let ), or to some intermediate value of transmission. The Dose Response model behaved quite similarly to the Exponential model when the linear approximation was used; we chose the saturated, maximum value of transmission to match the models, letting for our baseline parameters, in order to accentuate the differences between models.

For the simulated data, we assumed simulation model initial conditions with 1% of the population infected and the remaining population susceptible. For the Asymptomatic model, we assumed that 0.2% of the population was , 0.8% of the population was , and the remaining population was susceptible. All model simulations and parameter estimation were done in MATLAB. We also evaluated the baseline for each model using the baseline parameter values—with these parameters, all models have an of 3, except for the Dose Response model, which has an of 6.

Parameter Description Value Units Source Models
natural birth/death rate in Angola [51] All
1/(mean duration of infectiousness), or recovery rate [52, 2, 14, 20] All
person-person transmission rate 0.25 [20, 8, 14] All except Asymptomatic
person-person transmission rate in asymptomatics See Model Description, Section 2.2 Asymptomatic
person-person transmission rate in symptomatics See Model Description, Section 2.2 Asymptomatic
water-borne transmission rate 0.50 [20, 8, 14] All
loss of immunity rate [27, 6, 28], See Section 2.2 All except Asymptomatic
loss of immunity rate in asymptomatics See Section 2.2 Asymptomatic
loss of immunity rate in symptomatics See Section 2.2 Asymptomatic
decay rate of cholera in water [20, 14] All
1/(population at risk reporting rate) [20], See Section 2.2, SI All except Asymptomatic
proportion symptomatic cases / (population at risk reporting rate) [20], See Section 2.2, SI Asymptomatic
bacteria concentration that gives chance of infection 0.4 [50], See Model Description, Section 2.2 Dose Response
modifies the susceptibility of individuals to direct infection 10 See Model Description Waning Immunity
modifies the susceptibility of individuals to waterborne infection 10 See Model Description Waning Immunity
Table 1: Model parameters. Additional details about the baseline values for each parameter are given in Section 2.2 and the Supplementary Information (SI).

2.3 Identifiability of parameters

Identifiability and uncertainty quantification methods allow us to address whether, and with what degree of certainty, it is possible to uniquely recover the parameters of a model for a given dataset. In general, two broad classes of identifiability are often considered—structural and practical identifiability [53]. Structural identifiability addresses whether it is possible to estimate the parameters based purely on the structure of the model and the measurements, typically assuming a best-case of perfect, noise-free data. Practical identifiability considers how the amount and quality of the data (e.g. level of measurement error, number of replicates) may affect estimation of the parameters. One commonly used method of evaluating identifiability is via the Fisher Information Matrix (FIM), a symmetric matrix that represents the amount of information about the parameters that is contained in the data [54, 55]. The rank of the FIM corresponds to the number of (locally structurally) identifiable parameters, and taking the inverse of the FIM gives an approximation of the covariance matrix for the parameters using the Cramer-Rao bound [54, 56, 57, 53]. We calculated the FIM for each model (assuming the same parameters and starting parameters as for the simulated data, with daily data for one year), and evaluated both the rank and covariance matrix. From the parameter variances, we calculated the percent coefficient of variation () to evaluate parameter uncertainty. The takes into account the size of the parameter value when evaluating uncertainty, with (where is the standard deviation and is the value of the parameter). We used a tolerance value [58], where indicates parameter identifiability. Typically, structurally unidentifiable parameters will have extremely large (e.g., numerically near-infinite, typically [58]) s, while practically unidentifiable parameters have finite but large s.

2.4 Parameter estimation and model fitting

Using maximum likelihood estimation and Nelder-Mead optimization in MATLAB, we fit each of the five model structures to the 30 simulated datasets. When fitting our models to data, we estimated all parameters except the birth/death rate, , and the recovery rate, , which are known [49, 14, 52]. Throughout the paper, we use the term fitting model to refer to the model that was used to fit an epidemic dataset. To perform model fitting and parameter estimation from the simulated data, we used two sets of parameters to initialize the optimization: informed starting parameters, which were the true parameters from the simulation model (i.e. the parameters used in generating the simulated data, given in Table 1), and naive starting parameters, which were randomly chosen at of the true parameters. To compare model fits to simulated data, we calculated and compared Akaike information criterion (AIC) values for the models [59]; for ease of comparison, we report values, defined as the difference in AIC between the model of interest and the best fit model (i.e., model with the lowest AIC) for a given dataset. To compare parameter estimates among the fitting models, we examined the percent deviation from the true value, which was calculated as the difference between the estimate and the true value, divided by the true value. In this descriptive analysis, a deviation less than 20% of the true value was considered ‘accurately recaptured’ because the variation should be smaller than the potential deviation in starting parameters. Additional details on the parameter estimation methods are also given in the Supplementary Information.

For model initial conditions when fitting, we let , where is the initial data value, and then assumed the remaining population to be susceptible. For the Asymptomatic model, we assumed that the observed infected were symptomatic, and generated an additional fraction of asymptomatically infected individuals based on the proportionality parameter (with the remainder of the population again assumed susceptible).

2.5 Forecasts of epidemic data

In addition to parameter estimation, we evaluated the ability for the study models to forecast epidemic trajectories when provided with truncated and noisy simulated epidemic data. Using 10, 30, and 50 days of simulated incidence data (out of 100-day simulated epidemic data) with normal noise from the five study models, we estimated parameters from each fitting model. These parameter estimates were then used to generate a trajectory forecast for the remainder of the 100-day epidemic. Throughout the study, we use forecasting model to refer to the model used to estimate these parameters and generate the forward-looking trajectory. For the parameter estimates in this section, we used the informed starting values.

2.6 Application to 2006 cholera outbreak in Angola

We investigated the ability for the five study models to fit and forecast data from an empirical cholera epidemic. We applied the same methods of parameter estimation, model fitting, and forecasting to data from the recent 2006 cholera outbreak in Angola (data courtesy of the WHO Cholera Task Force), shown in Figure 6. The outbreak began in Luanda province (February 13, 2006), and eventually spread to 16 of 18 provinces, resulting in 82,204 cases and 3,092 deaths [60, 20]. The ‘true’ parameter values are unknown, so we considered both the ‘informed’ and ‘naive’ sets of starting parameter values used in the simulated data for fitting to real data as well. In this case, the ‘informed’ starting parameter values of course do not represent the true parameter values, but are in some sense more ‘informed’, as the baseline values given in Table 1 are based on the best fit parameters for the the original SIWR model to this data set in [20]. For fitting, we used weighted least squares assuming a weight of [20]. The same initial condition setup was used in these fits as for fitting to the simulated data.

3 Results

3.1 Identifiability of parameters

The FIMs for all models except for the Asymptomatic model were full rank. All FIM’s were numerically invertible, with the percent coefficients of variation (s) shown in Table 2. Apart from the aforementioned rank deficiency in the Asymptomatic model, all parameter s were small enough that the parameters would be considered structurally identifiable.

Parameters Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.016 0.019 - 0.011 0.016
- - - -
- - - -
0.56 0.27 0.13 0.34 0.48
0.34 0.094 0.11 0.23
- - 0.60 - -
- - 0.072 - -
0.66 0.33 0.097 0.37 0.55
0.025 0.0081 0.028 0.042 0.044
- - - - 0.55
- - - - 8.33
Table 2: Estimated for each parameter. A means that the parameter is practically unidentifiable. A ‘-’ means that the parameter is not applicable to the model.

The Asymptomatic model was rank deficient by one, although it was still possible to numerically invert the FIM, which showed that the direct transmission parameters for symptomatic vs. asymptomatic transmission form an identifiable combination. To determine the form of the identifiable combination, we simulated noise-free data using the Asymptomatic model as described in the Data Simulation section, and then plotted the sum-of-squares goodness-of-fit surface across a range of values of and , shown in Figure 2. We found a linear canyon which achieved the same minimum goodness-of-fit as the true parameters, corresponding to a structurally identifiable combination [61, 58]. The form of the identifiable combination was given by the sum of the two transmission parameters, weighted by the proportion of symptomatic cases, : . If we fixed either or using the identifiable combination as a constraint, and re-calculated the FIM, all s were . Because the dynamics for the two infected compartments ( and ) are the same up to a constant scaling factor, it is impossible to distinguish the proportion of cases generated from one compartment vs. the other. Given these results, in our subsequent analyses for the Asymptomatic model we report only this identifiable combination , rather than the individual unidentifiable parameters and .

Figure 2: Goodness-of-fit surface for and in the Asymptomatic model, highlighting the identifiable combination. The red line highlights the minimum sum of squares, which corresponds to an identifiable combination given by the line .

3.2 Parameter estimation and model fitting to epidemic data

For the 100-day simulation duration, across the three noise cases, all five models fit the simulated data well (Figure 3 and Supplementary Figure S5). Nearly all model fits matched the simulated noisy data closely throughout the trajectory, with few runs (strings of consecutive data points on one side of the fit) of obviously correlated residuals. Correspondingly, the AIC values were nearly indistinguishable across model fits, but the Exponential or Dose Response model fits consistently had the lowest (i.e., best) numerical values in 25 out of 30 cases, while the Asymptomatic and Waning Immunity model fits frequently matched or followed close behind (Supplementary Tables S1 - S6).

The true parameter values of direct transmission () and measurement scaling () were recovered much more accurately than the other parameters that were common across all models (Figure 4), and neither of these parameters had identifiability issues in our study (Table 2) or a previous cholera modeling study [20]. For , the magnitude in parameter estimate deviations across four of five fitting models was under 20%; the Asymptomatic model under-estimated by 33% in one instance. For , the magnitude in parameter estimate deviations across three of five fitting models was under 20%; the Gamma model underestimated by 21-22% in two instances and the Waning Immunity model both underestimated and overestimated with a deviation magnitude greater than 20%.

In congruence with a previous study [20], we found that and formed a practically unidentifiable combination in all models (Figure 4 and Supplementary Figure S1), which might explain their large deviations from the true parameter value. We also considered that some of the deviations in (and by extension ), may be due to the different choice of structure for in the Dose Response model. We evaluated the parameter variation without the Dose Response parameter estimates or simulated data, and found that while there was some attenuation of the more extreme outliers for the estimates of across all models, still showed a much broader range than or , and the remaining parameter distributions remained largely identical (shown in Supplementary Figure S1).

While most of the parameters were neither consistently underestimated nor overestimated by any of the models, the estimates of tended to be underestimated for both the Gamma and Waning Immunity models when fitted to epidemic data (although other runs of estimation, e.g. using different optimizer settings did not necessarily yield this). Deviations were systematically smaller when models were fitting Exponential model data, and as expected, parameter estimate recapture accuracy improved as simulated data decreased in noise (Figure S2). Parameter estimates for all combinations of simulation model data and fitting models are reported in Supplementary Tables S1 - S6.

Figure 3: Fits to 100-day simulated model data (indicated by row) without noise (left column), with normal noise (middle column), and with poisson noise (right column), using naive starting parameters. Model fits are overlaid, thus obscuring some of the model fits in the figure. All fitting models were able to capture the mean epidemic data despite added noise and model misspecification.

3.3 Parameter estimation and model fitting to long-term data

For the 3-year data, all models were able to fit the qualitative dynamics of the simulated data, and with a few exceptions, visually all models matched all data sets well (Supplementary Figures S6 and S7). However, unlike the 100-day data, the lowest AIC values were most often achieved when the same model which generated the data was used for fitting (though often other models yielded very close AIC values, with differences ). AIC values were consistently lower when models where the waning immunity distributions matched those of the data simulation model (exponentially-distributed waning immunity: Exponential, Dose Response, and Asymptomatic models; gamma-distributed waning immunity: Gamma and Waning Immunity models) (Supplementary Tables S7 - S12). Similarly, estimates from models where the waning immunity distributions matched those of the data simulation model were more accurate (Supplementary Figure S4). We note that our ability to distinguish model fits through large differences in AIC was facilitated by the large volume of simulated data—daily data over three years. In practice, time series data would be much sparser, making it more difficult to distinguish the different models.

The parameter estimate accuracy did not appear to differ strongly as a function of the type of added noise (Supplementary Figure S4). As with the 100-day data analyses, was the most accurate estimated parameter across all models, followed by , although estimates appeared to be biased when the waning immunity distribution of the data simulation model did not match that of the fitting model (Figure 4, Supplementary Tables S7 - S12). In general, was underestimated for gamma-distributed waning models fitted to exponentially-distributed waning models, and overestimated in the reverse case. The effect of the practically identifiable combination between and was also present in the 3-year data, with underestimates of tending to be compensated by overestimates of (Figure 4, Supplementary Tables S7 - S12).

Figure 4: Top: Percent deviation of estimates from true parameter values, grouped by parameter for the simulated 100-day data (left) and the simulated 3-year data (right). The model used to fit the data and estimate the parameter is indicated by color. The median across all estimates (i.e., across added noise type, simulation model, and data duration) is marked with a black point in the distribution and the black dashed line represents deviation. Distribution ranges for the , , and subplots are trimmed for visibility. Bottom: Scatterplot of and estimates by colored fitting model. Four data points in the distribution tails are truncated for ease of viewing. See Supplementary Figure S1 for the full plot.

3.4 Forecasts from simulated epidemic data

Epidemic trajectories were projected forward based on 10, 30, and 50 days of observed, simulated normal noise data for each study model. All forecasting models generated trajectories that were poorly representative of the true simulated data when only 10 or 30 data points were observed (Figure 5). In particular, even when forecasting models matched simulation models, these truncated epidemic datasets did not allow for accurate forecasts. While forecasts generated from 30 observations as compared to 10 observations were improved, performance remained poor overall, with only one out of 25 forecasts able to accurately capture both the timing and magnitude of the simulated epidemic peak (Exponential model forecasting from Dose Response simulated data). In fact, 11 of 25 forecasts continued to project exponential growth upwards past the 100-day simulation period.

Figure 5: Forecasts to 100-day data (indicated by row) up though 10 days, (left column), 30 days (middle column), and 50 days (right column). Model fits are overlaid, thus obscuring some of the model fits in the figure.

Forecasts generated from 50 observed data points were significantly improved, with 16 of 25 forecasts roughly capturing the downward trajectory of the true data. Notably, most models underestimated the peak infection incidence and timing of the simulated Exponential model data, and the Dose Response model forecasts were least able to capture the correct disease dynamics of any model besides its own. As with the previous forecasts, a match between the simulation and forecasting model did not necessarily generate the most accurate forecast.

3.5 Application to 2006 cholera outbreak in Angola

We now turn our attention to data collected from a 2006 cholera outbreak in Angola. The fits of the models to the data are shown in Figure 6. The corresponding parameter estimates and AIC values for these fits, accounting for both informed and naive starting parameters, are given in Table 3. All models fit the data well, with similar trajectories that captured the overall shape, peak, and duration of the epidemic. The lowest s were found in the parameters and , for both the ‘informed’ and ‘naive’ starting parameters (where ‘informed’ here indicates the values chosen from the literature for data simulation, although these are not the true values as they were for the simulated data). The Waning Immunity model gave the lowest AIC for both naive and informed starting parameters (2940 and 2864, compared to AICs greater than 3400 for all other models), suggesting that the Waning Immunity model provides the best fit of the Angola data. Interestingly, this model showed very little direct transmission (), suggesting that waterborne transmission may be enough to explain the outbreak.

(a) Informed starting parameters
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.1460 0.1981 0.1600 0.2629 0.0358
0.6033 0.5055 0.6814 1.2485 1.0063
0.0051 0.0020 0.0028 0.000026 0.000048
0.0544 0.0148 0.0413 0.0074 0.0530
1.37e-5 1.30e-5 1.37e-5 1.13e-5 1.95e-5
3667 3679 3657 4311 2940
3.00 5.85 3.03 6.04 4.17

(b) Naive starting parameters
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.1865 0.0655 0.1827 0.2629 3.5e-8
0.8420 0.3101 0.7384 1.2479 0.9696
0.0029 0.0050 0.0021 0.00017 0.000089
0.0257 0.0651 0.0312 0.0074 0.0668
1.33e-5 1.36e-5 1.33e-5 1.13e-5 1.96e-5
3731 3431 3696 4311 2864
4.11 3.36 3.68 6.04 3.88

Note: The average estimates across starting conditions for and for the Asymptomatic model were 2.61e-9 and 0.005, respectively.

Table 3: Parameter estimates for the 2006 Angola epidemic data.
Figure 6: “Naive” (left) and “informed” (right) starting parameter fits to 2006 Angola epidemic data.

4 Discussion

Mathematical modelers often need to make decisions on how to model transmission and loss immunity mechanisms, so it is critical to understand the robustness and sensitivity of results to realistic variations in model structures. Toward that end, we have examined how model uncertainty and misspecification affect parameter estimates, model fits, and model forecasting ability. We considered five deterministic SIWR-based model structures, each including different hypothesized mechanisms of cholera transmission and loss of immunity, using both simulated data and data from the 2006 cholera epidemic in Angola. We found that goodness-of-fit criteria were unable to distinguish misspecified model fits from those of the true model for simulated epidemic data, and that forecasting from short-term data of one month or less was universally poor, even when the model that generated the data and the model that produced the forecast were the same. However, model fits for long-term data were found to be specific to the type of waning immunity that was implemented (i.e., exponential- vs. gamma-distributed waning immunity). Moreover, some parameters were consistently estimated across all models and datasets, suggesting that it may be possible to estimate some parameters even when the underlying mechanistic model is unknown.

We showed that each model was able to capture the epidemic trajectories for all datasets, real and simulated, regardless of the source of the simulated data, type of noise, or starting parameters. Indeed, often the best fit to a given epidemic dataset did not come from the model which generated it. This suggests that it may not be possible to infer the true underlying mechanisms of loss of immunity and transmission solely from epidemic cholera incidence or prevalence data. This highlights the distinction between the issues of identifiability (parameter uncertainty) and distinguishability (model uncertainty)—all model parameters were structurally identifiable (excepting the parameter combination for the Asymptomatic model), but the model structure itself may not be possible to infer from the data.

When a longer, 3-year data set was considered—as noted above, in this case it was generally possible to distinguish exponentially-distributed waning models (Exponential, Dose Response, and Asymptomatic) from gamma-distributed waning models (Gamma and Waning Immunity). Indeed, the best-fit model to the simulated 3-year data was often the model which generated it, although typically the other models with the same waning distribution had very similar AIC values (differences ), such that the models may not be distinguishable in practice. Additionally, the simulations here assumed frequent data taken daily, while more realistic data over three years would likely be weekly or monthly, making distinguishabilty by goodness of fit even less likely. Nonetheless, these results do suggest that some degree of model selection, at least for loss of immunity mechanisms, may be possible with longer term data sets. Other additional data might add to the distinguishability of realistic models, such as data on pathogen shedding, bacterial concentration in water sources, or immunity/serology data such as antibody titers.

Our study design also enabled the systematic assessment of parameter estimates under conditions of model misspecification. As noted above, all parameters were structurally identifiable (apart from the identifiable combination for and ), but unfortunately, identifiability analyses only inform the validity of parameter estimates when the true underlying model is known. When we consider the variation of the parameter estimates across different model misspecification scenarios, the picture is more complex. As was recovered quite well for all simulated data sets, this suggests that the “fast” transmission parameter and reporting rate/population factor may be accurately estimated from any of the common SIWR model variants, even if the true underlying disease mechanisms are unknown. Interestingly, was more accurately recovered from the misspecified models when the 100-day data sets were used rather than the 3-year data (Figure 4), suggesting that when model misspecification is present, more data may not always be better. It is likely that the bias in was being used to help correct for fitting issues that came from using an incorrect model structure, but this would be difficult to assess in practice with real data.

Previous work noted that in the SIWR model, and became practically unidentifiable when noisy data was considered, and formed a practically identifiable combination [20]. It was conjectured there that this issue of practical identifiability may extend more broadly to many models that include a waterborne or environmental transmission pathway. Indeed, here we confirmed that and are practically unidentifiable and form a practically identifiable combination across multiple SIWR-based models. This underscores the need for improved data collection regarding waterborne cholera transmission in order to better inform modeling and public health intervention efforts. However, we note that while the two parameters were practically unidentifiable in all models, the curve of the identifiable combination traced in Figure 4 was quite similar across all models and data sets. This suggests that while these two parameters may be practically unidentifiable in all models, their combination may be more tightly estimated across models and datasets, similarly to and . In part due to this practical unidentifiability of , estimates of varied widely across all models, both for the epidemic data in Angola and for all types of simulated data.

For the epidemic data, the estimates of were uniformly underestimated for both the Gamma and Waning Immunity models when fitted to epidemic data. This may be due to the fact that both of these models have a gamma-distributed type of waning process, meaning that significant waning would be unlikely to be observed over the 100-day epidemic period. This may have resulted in highly insensitive estimates for these two models, so that the optimization algorithm wandered to very low values for . In general, in the 3-year datasets, was underestimated for gamma-distributed waning models fitted to exponential waning models, and underestimated in the reverse case. This may be because the gamma-distributed waning models used a slower loss of immunity to generate the flat trajectories after the initial peak, while the exponential-distributed waning models may have used a more rapid to generate the subsequent peaks in the disease trajectory.

The new Waning Immunity model presented here had the best absolute fit to the Angola epidemic data, but as previously noted, this does not necessarily indicate that the waning immunity mechanism was the true driver of these epidemic dynamics. Longitudinal data on sequential infections of cholera or on antibody titers over time are needed to validate the idea that recently infected individuals regain susceptibility progressively after recovery. If waning immunity does indeed best represent epidemic cholera dynamics, the parameter estimates in our study suggest the hypothesis that the Angola epidemic was spread primarily by waterborne transmission; the Waning Immunity model estimates for were at least two orders of magnitude smaller than those for (and much smaller than in any other model). We note that this somewhat contradicts a previous study using the base SIWR model, which showed stronger waterborne transmission but still significant direct transmission [20].

Further work is needed to examine the ability for SIWR-based models to project epidemic trajectories based on limited and noisy data early in an outbreak. Only projections that took place after the epidemic peak appeared to capture any semblance of disease prevalence; projections were also poor when the forecasting model matched the simulation model, perhaps suggesting that it may not be possible to forecast accurately with deterministic SIWR models until the latter half of the epidemic. This is to be expected in the very early part of the epidemic, during the exponential growth phase, as during this phase the epidemic is largely linear on a log-scale and so can be explained by only two parameters. This idea is borne out in other real-time forecasting efforts—for example, efforts to forecast the trajectory of the 2014 Ebola epidemic in West Africa met with difficulty [62, 63, 64, 65]. We note that our results illustrate how agreement across a range of models may not guarantee accuracy; however once the epidemic peak was observed, all models tended to converge on similar and more accurate forecasts. This suggests that once sufficient data is obtained, forecasting is possible even if the model structures do not match the underlying mechanisms. Future work may examine whether early outbreak forecasting, discrimination of model structures, and inference from forecasting comparisons may be aided by additional data on weather and climate variables such as rainfall or sea surface temperature [66, 67, 68, 25, 18].

There are several limitations and future directions for this work, to consider different model structures, frameworks, and empirical datasets. Alternative cholera transmission mechanisms (such as hyperinfectivity [69, 5, 22, 10, 24]), spatial structure [70, 14, 17, 71], and climate drivers [66, 67, 68, 18, 25] were also realistic features that could have been added to the suite of model structures we considered. Model comparison with hyperinfectivity and two potential environmental reservoirs were considered in study by Rinaldo et al. [25], using a larger, detailed spatial model of the 2010 Haiti cholera epidemic (including human and pathogen movement, river networks, climate drivers, etc.). We also did not evaluate our conclusions outside of the deterministic ODE framework, although partial differential equation (PDE), agent-based, and stochastic models are also frequently used in similar contexts [72, 73, 68, 6]. In addition, most of the parameter values were motivated from a single outbreak dataset (Angola, 2006), and may not reflect the breadth of parameter space for cholera epidemics. Additional work to further examine the identifiability, uncertainty of these models, and to test alternative parameter estimation methods (e.g. Bayesian and/or global approaches) would be warranted as well. It would also be useful to consider alternate data assumptions (e.g. weekly or monthly data frequency, additional noise and error assumptions), and even to consider the generalizability of these results to other infectious disease models. Further, here we only considered the effects of model structure on parameter estimation and forecasting. Additional work could use model comparison to evaluate the effects of model and parameter uncertainty on the optimal control of interventions, similar to that proposed by Akman and Schaefer [26]. Their focus on the optimal balance of vaccination and sanitation strategies may be particularly interesting, as the lack of distinguishability between models may be less problematic if multiple models yield consistent control strategies.

In conclusion, our study presents a systematic framework for the comparison of model mechanisms and inference that can be applied broadly in infectious disease epidemiology. Modeling requires simplifying assumptions to be an effective tool, but we show here that the choice of simplifications can have significant effects on the parameter estimates, and that it may be difficult or impossible to use fit to data as a criteria to distinguish between plausible simplified models. Of course, in real-world contexts, there is unlikely to be a ‘true’ model in the sense that we considered in our simulated data—instead, each of the models represents a simplification, albeit with realistic elements. We note that our results also show that the most realistic model—a ‘super-model’ containing all of the elements considered individually across the five models here—would almost certainly be unidentifiable from case data alone. Public health decision makers desire tools that demonstrate the sensitivity of results to model assumptions (on the analysis side) and policy decisions (on the implementation side). Attempts to address the divide between decision maker needs and modelers’ ability to present useful, actionable information are on the rise, with the development of adaptive management, cost-benefit, and game theoretic frameworks in public health [74, 75, 76, 77, 78, 79, 80]. In demonstrating that goodness-of-fit is not sufficient to infer transmission and immunity mechanisms in a disease system, and that some parameter estimation may be valid even when mechanisms remain elusive, we similarly seek to inform modelers and decision makers about the relative utility of different analyses for specific types of policy questions. Our research raises questions about the appropriateness and equality of model structures and types for epidemiological inference, and future work should seek to further characterize the sensitivity of public health decision making in the context of model choice.

5 Acknowledgments

This paper began as a group project at the joint NIMBioS-MBI-CAMBAM Graduate Summer School in 2013 (—many thanks to Suzanne Lenhart, Paul Hurtado, Ariel Cintron-Arias, and the other co-organizers and presenters. The summer school was hosted by the National Institute for Mathematical and Biological Synthesis (NIMBioS), an Institute sponsored by the National Science Foundation through NSF Award DBI-1300426, with additional support from The University of Tennessee, Knoxville. NIMBioS hosted the summer school jointly with Mathematical Biosciences Institute (MBI) at The Ohio State University (National Science Foundation grant DMS 68 0931642) and the Centre for Applied Mathematics in Bioscience and Medicine (CAMBAM) at McGill University. We thank the World Health Organization Cholera Task Force for providing us with data from the 2006 cholera outbreak in Angola. The NIMBioS summer school provided support to ECL, MRK, BO, SMA, KM, and MCE, and SMA acknowledges partial travel support from PIMS IGTC, Canada for summer school attendance. This work was supported by the National Science Foundation grant OCE-1115881 (to JHT and MCE) and the National Institute of General Medical Sciences of the National Institutes of Health under Award Number U01GM110712 (to MCE). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.


  • [1] D. Sack, R. Sack, G. Nair, and A. Siddique, “Cholera,” Lancet, vol. 363, pp. 223–233, 2004.
  • [2] WHO. World Health Organization - Cholera Factsheet, URL:, 2011.
  • [3] World Health Organization (WHO), “Cholera fact sheet n-107,”, 2010.
  • [4] C. Codeco, “Endemic and epidemic dynamics of cholera: the role of the aquatic reservoir,” BMC Infectious Diseases, vol. 1, 2001.
  • [5] D. Hartley, J. Morris, and D. Smith, “Hyperinfectivity: a critical element in the ability of V. cholerae to cause epidemics?,” PLoS Medicine, vol. 3, pp. 63–69, 2006.
  • [6] A. A. King, E. L. Ionides, M. Pascual, and M. J. Bouma, “Inapparent infections and cholera dynamics,” Nature, vol. 454, pp. 877–880, 2008.
  • [7] R. I. Joh, H. Wang, H. Weiss, and J. S. Weitz, “Dynamics of Indirectly Transmitted Infectious Diseases with Immunological Threshold,” BULLETIN OF MATHEMATICAL BIOLOGY, vol. 71, pp. 845–862, MAY 2009.
  • [8] J. H. Tien and D. J. D. Earn, “Multiple transmission pathways and disease dynamics in a waterborne pathogen model,” Bulletin of Mathematical Biology, vol. 72, no. 6, pp. 1506–1533, 2010.
  • [9] A. Mwasa and J. M. Tchuenche, “Mathematical analysis of a cholera model with public health interventions,” Biosystems, vol. 105, no. 3, pp. 190–200, 2011.
  • [10] Z. Shuai, J. H. Tien, and P. van den Driessche, “Cholera models with hyperinfectivity and temporary immunity,” Bulletin of Mathematical Biology, vol. Submitted, 2012.
  • [11] J. P. Tian and J. Wang, “Global stability for cholera epidemic models,” Mathematical biosciences, vol. 232, no. 1, pp. 31–41, 2011.
  • [12] J. R. Andrews and S. Basu, “Transmission dynamics and control of cholera in haiti: an epidemic model,” The Lancet, vol. 377, no. 9773, pp. 1248–1255, 2011.
  • [13] R. Sanches, C. Ferreira, and R. Kraenkel, “The role of immunity and seasonality in cholera epidemics,” Bulletin of Mathematical Biology, vol. 73, no. 12, pp. 2916–2931, 2011.
  • [14] A. R. Tuite, J. Tien, M. Eisenberg, D. J. Earn, J. Ma, and D. N. Fisman, “Cholera epidemic in Haiti, 2010: Using a transmission model to explain spatial spread of disease and identify optimal control interventions,” Annals of Internal Medicine, vol. 154, no. 9, pp. 593–601, 2011.
  • [15] D. L. Chao, M. E. Halloran, and I. M. Longini Jr., “Vaccination strategies for epidemic cholera in Haiti with implications for the developing world,” Proceedings of the National Academy of Sciences USA, vol. 108, no. 17, pp. 7081–7085, 2011.
  • [16] J. Y. Abrams, J. R. Copeland, R. V. Tauxe, K. A. Date, E. D. Belay, R. K. Mody, and E. D. Mintz, “Real-time modelling used for outbreak management during a cholera epidemic, haiti, 2010-2011,” Epidemiology and Infection, pp. 1–10, 2012.
  • [17] E. Bertuzzo, L. Mari, L. Righetto, M. Gatto, R. Casagrandi, M. Blokesch, I. Rodriguez-Iturbe, and A. Rinaldo, “Prediction of the spatial evolution and effects of control measures for the unfolding haiti cholera outbreak,” Geophysical Research Letters, vol. 38, p. L06403, 2011.
  • [18] M. C. Eisenberg, G. Kujbida, A. R. Tuite, D. N. Fisman, and J. H. Tien, “Examining rainfall and cholera dynamics in haiti using statistical and dynamic modeling approaches,” Epidemics, vol. 5, pp. 197–207, 2013.
  • [19] D. Swerdlow, E. Mintz, M. Rodriguez, E. Tejada, C. Ocampo, L. Espejo, K. Greene, W. Saldana, L. Seminario, R. Tauxe, J. Wells, N. Bean, A. Ries, M. Pollack, B. Vertiz, and P. Blake, “Waterborne transmission of epidemic cholera in Trujillo, Peru: lessons for a continent at risk,” Lancet, vol. 340, no. 8810, pp. 28–32, 1992.
  • [20] M. Eisenberg, S. Robertson, and J. Tien, “Identifiability and estimation of multiple transmission pathways in waterborne disease.,” Journal of Theoretical Biology, no. 324, pp. 84–102, 2013.
  • [21] Z. Mukandavire, S. Liao, J. Wang, H. Gaff, D. L. Smith, and J. G. Morris, “Estimating the reproductive numbers for the 2008Ð2009 cholera outbreaks in zimbabwe,” Proceedings of the National Academy of Sciences, vol. 108, no. 21, pp. 8767–8772, 2011.
  • [22] M. Pascual, K. Koelle, and A. P. Dobson, “Hyperinfectivity in cholera: a new mechanism for an old epidemiological model?,” PLoS Medicine, vol. 3, no. 6, pp. e278–281, 2006.
  • [23] K. Koelle, X. Rodo, M. Pascual, M. Yunus, and G. Mostafa, “Refractory periods and climate forcing in cholera dynamics,” Nature, vol. 436, pp. 696–700, 2005.
  • [24] R. L. Miller-Neilan, E. Schaefer, H. Gaff, K. R. Fister, and S. Lenhart, “Modeling optimal intervention strategies for cholera,” Bulletin of Mathematical Biology, vol. 72, no. 8, pp. 2004–2018, 2010.
  • [25] A. Rinaldo, E. Bertuzzo, L. Mari, L. Righetto, M. Blokesch, M. Gatto, R. Casagrandi, M. Murray, S. M. Vesenbeckh, and I. Rodriguez-Iturbe, “Reassessment of the 2010-2011 Haiti cholera outbreak and rainfall-driven multiseason projections,” Proceedings of the National Academy of Sciences USA, vol. 109, no. 17, pp. 6602–6607, 2012.
  • [26] O. Akman and E. Schaefer, “An evolutionary computing approach for parameter estimation investigation of a model for cholera,” Journal of biological dynamics, vol. 9, no. 1, pp. 147–158, 2015.
  • [27] M. Levine, R. Black, M. Clements, L. Cisneros, D. Nalin, and C. Young, “Duration of infection-derived immunity to cholera,” Journal of Infectious Diseases, vol. 143, no. 6, pp. 818–820, 1981.
  • [28] K. Koelle and M. Pascual, “Disentangling extrinsic from intrinsic factors in disease dynamics: a nonlinear time series approach with an application to cholera,” The American Naturalist, vol. 163, no. 6, pp. 901–913, 2004.
  • [29] R. Tohme, J. François, K. Wannemuehler, P. Iyengar, A. Dismer, P. Adrien, T. Hyde, B. Marston, E. Mintz, M. Katz, et al., “Oral cholera vaccine coverage, barriers to vaccination, and adverse events following vaccination, haiti, 2013 (1).,” Emerging infectious diseases, vol. 21, no. 6, pp. 984–991, 2015.
  • [30] M. O’Leary and K. Mulholland, “Oral cholera vaccines in endemic countries,” The Lancet, vol. 386, no. 10001, pp. 1321–1322, 2015.
  • [31] J. Deen, L. von Seidlein, F. J. Luquero, C. Troeger, R. Reyburn, A. L. Lopez, A. Debes, and D. A. Sack, “The scenario approach for countries considering the addition of oral cholera vaccination in cholera preparedness and control plans,” The Lancet Infectious Diseases, vol. 16, no. 1, pp. 125–129, 2016.
  • [32] C. R. Phares, K. Date, P. Travers, C. Déglise, N. Wongjindanon, L. Ortega, and P. R. N. Bhuket, “Mass vaccination with a two-dose oral cholera vaccine in a long-standing refugee camp, thailand,” Vaccine, vol. 34, no. 1, pp. 128–133, 2016.
  • [33] V. A. Moyer, “Screening for lung cancer: Us preventive services task force recommendation statement,” Annals of internal medicine, vol. 160, no. 5, pp. 330–338, 2014.
  • [34] A. H. Auchincloss and A. V. D. Roux, “A new tool for epidemiology: the usefulness of dynamic-agent models in understanding place effects on health,” American journal of epidemiology, vol. 168, no. 1, pp. 1–8, 2008.
  • [35] Y. H. Grad, J. C. Miller, and M. Lipsitch, “Cholera modeling: challenges to quantitative analysis and predicting the impact of interventions,” Epidemiology (Cambridge, Mass.), vol. 23, no. 4, p. 523, 2012.
  • [36] E. T. Lofgren, M. E. Halloran, C. M. Rivers, J. M. Drake, T. C. Porco, B. Lewis, W. Yang, A. Vespignani, J. Shaman, J. N. Eisenberg, et al., “Opinion: Mathematical models: A key tool for outbreak response,” Proceedings of the National Academy of Sciences, vol. 111, no. 51, pp. 18095–18096, 2014.
  • [37] M. Lipsitch, L. Finelli, R. T. Heffernan, G. M. Leung, and S. C. Redd; for the 2009 H1N1 Surveillance Group, “Improving the evidence base for decision making during a pandemic: the example of 2009 influenza a/h1n1,” Biosecurity and bioterrorism: biodefense strategy, practice, and science, vol. 9, no. 2, pp. 89–115, 2011.
  • [38] J. Koopman, “Modeling infection transmission,” Annual Review of Public Health, vol. 25, pp. 303–326, 2004.
  • [39] R. Meza, K. Haaf, C. Y. Kong, A. Erdogan, W. C. Black, M. C. Tammemagi, S. E. Choi, J. Jeon, S. S. Han, V. Munshi, et al., “Comparative analysis of 5 lung cancer natural history and screening models that reproduce outcomes of the nlst and plco trials,” Cancer, vol. 120, no. 11, pp. 1713–1724, 2014.
  • [40] E. Walter, Y. Lecourtier, and J. Happel, “On the structural output distinguishability of parametric models, and its relations with structural identifiability,” Automatic Control, IEEE Transactions on, vol. 29, no. 1, pp. 56–57, 1984.
  • [41] I. C.-H. Fung, “Cholera transmission dynamic models for public health practitioners,” Emerging themes in epidemiology, vol. 11, no. 1, p. 1, 2014.
  • [42] W. O. Kermack and A. G. McKendrick, “A contribution to the mathematical theory of epidemics,” Proceedings of the Royal Society of London Series A, vol. 115, pp. 700–721, 1927.
  • [43] J. B. Dunworth, “Nonlinear incidence of waterborne diseases,” Master’s thesis, Ohio State University, 2011.
  • [44] C. N. Haas, J. B. Rose, and C. P. Gerba, Quantitative microbial risk assessment. John Wiley & Sons, 1999.
  • [45] H. J. Wearing, P. Rohani, and M. J. Keeling, “Appropriate models for the management of infectious diseases,” PLoS Medicine, vol. 7, no. 7, 2005.
  • [46] A. L. Lloyd, “Realistic distributions of infectious periods in epidemic models: changing patterns of persistence and dynamics,” Theoretical Population Biology, vol. 60, pp. 59–71, 2001.
  • [47] H. W. Hethcote, “Simulations of pertussis epidemiology in the united states: effects of adult booster vaccinations,” Mathematical biosciences, vol. 158, no. 1, pp. 47–73, 1999.
  • [48] W. H. Mosley, A. S. Benenson, and R. Barui, “A serological survey for cholera antibodies in rural east pakistan: 2. a comparison of antibody titres in the immunized and control populations of a cholera-vaccine field-trial area and the relation of antibody titre to cholera case rate*,” Bulletin of the World Health Organization, vol. 38, no. 3, p. 335, 1968.
  • [49] CIA. Central Intelligence Agency World Factbook - Haiti, URL:, 2011.
  • [50] J. B. Dunworth, Nonlinear Incidence of Waterborne Diseases. PhD thesis, The Ohio State University, 2011.
  • [51] CIA. Central Intelligence Agency World Factbook - Angola, URL:, 2015.
  • [52] A. A. Weil, A. I. Khan, F. Chowdhury, R. C. LaRocque, A. S. G. Faruque, E. T. Ryan, S. B. Calderwood, F. Qadri, and J. B. Harris, “Clinical outcomes of household contacts of patients with cholera in Bangladesh,” Clinical Infectious Diseases, vol. 49, pp. 1473–1479, 2009.
  • [53] C. Cobelli and J. J. DiStefano, “Parameter and structural identifiability concepts and ambiguities: a critical review and analysis,” American Journal of Physiology - Regulatory, Integrative and Comparative Physiology, vol. 239, no. 1, pp. R7–R24, 1980.
  • [54] T. Rothenberg, “Identification in parametric models,” Econometrica, p. 577, 1971.
  • [55] J. A. Jacquez and P. Greif, “Numerical parameter identifiability and estimability: Integrating identifiability, estimability, and optimal sampling design,” Mathematical Biosciences, vol. 77, no. 1-2, pp. 201–227, 1985.
  • [56] M. Komorowski, M. J. Costa, D. A. Rand, and M. P. H. Stumpf, “Sensitivity, robustness, and identifiability in stochastic chemical kinetics models,” Proceedings of the National Academy of Sciences, vol. 108, no. 21, pp. 8645–8650, 2011.
  • [57] A. Cintron-Arias, H. Banks, A. Capaldi, and A. Lloyd, “A sensitivity matrix methodology for inverse problem formulation,” Journal of Inverse and Ill-posed Problems, vol. 17, pp. 1 – 20, 2009.
  • [58] M. C. Eisenberg and M. A. L. Hayashi, “Determining structurally identifiable parameter combinations using subset profiling,” In press, Mathematical Biosciences. arXiv preprint available:1307.2298 [q-bio.QM], 2014.
  • [59] H. Akaike, “Factor analysis and aic,” Psychometrika, vol. 52, no. 3, pp. 317–332, 1987.
  • [60] D. A. Sack, R. Bradley Sack, and Claire-Lise Chaignat, “Getting serious about cholera,” New England Journal of Medicine, vol. 355, no. 7, pp. 649–651, 2006.
  • [61] A. Raue, C. Kreutz, T. Maiwald, J. Bachmann, M. Schilling, U. Klingmuller, and J. Timmer, “Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood,” Bioinformatics, vol. 25, no. 15, pp. 1923–9, 2009.
  • [62] D. Butler, “Models overestimate ebola cases,” Nature, vol. 515, no. 18, 2014.
  • [63] C. Rivers, “Ebola: models do more than forecast,” Nature, vol. 515, no. 492, 2014.
  • [64] J. S. Weitz and J. Dushoff, “Modeling post-death transmission of ebola: Challenges for inference and opportunities for control,” Scientific reports, vol. 5, 2015.
  • [65] M. C. Eisenberg, J. N. Eisenberg, J. P. D’Silva, E. V. Wells, S. Cherng, Y.-H. Kao, and R. Meza, “Forecasting and uncertainty in modeling the 2014-2015 ebola epidemic in west africa,” arXiv preprint, pp. arXiv:1501.05555 [q–bio.PE].
  • [66] M. Pascual, X. Rodo, S. P. Ellner, R. Colwell, and M. J. Bouma, “Cholera dynamics and El Niño-Southern oscillation,” Science, vol. 289, pp. 1766–1769, 2000.
  • [67] M. Pascual, L. F. Chaves, B. Cash, X. Rodo, and M. Yunus, “Predicting endemic cholera: the role of climate variability and disease dynamics,” Climate Research, vol. 36, pp. 131–140, 2008.
  • [68] R. C. Reiner, A. A. King, M. Emch, M. Yunus, A. S. G. Faruque, and M. Pascual, “Highly localized sensitivity to climate forcing drives endemic cholera in a megacity,” Proceedings of the National Academy of Sciences, vol. 109, no. 6, pp. 2033–2036, 2012.
  • [69] A. Alam, R. C. LaRocque, J. B. Harris, C. Vanderspurt, E. T. Ryan, Q. F., and S. B. Calderwood, “Hyperinfectivity of human-passaged Vibrio cholerae by growth in the infant mouse,” Infection and Immunity, vol. 73, no. 10, pp. 6674–6679, 2005.
  • [70] E. Bertuzzo, R. Casagrandi, M. Gatto, I. Rodriguez-Iturbe, and A. Rinaldo, “On spatially explicit models of cholera epidemics,” Journal of the Royal Society Interface, vol. 7, no. 43, pp. 321–333, 2010.
  • [71] M. C. Eisenberg, Z. Shuai, J. H. Tien, and P. van den Driessche, “A cholera model in a patchy environment with water and human movement,” Mathematical Biosciences, vol. 246, no. 1, pp. 105 – 112, 2013.
  • [72] L. Righetto, E. Bertuzzo, R. Casagrandi, M. Gatto, I. Rodriguez-Iturbe, and A. Rinaldo, “Modelling human movement in cholera spreading along fluvial systems,” Ecohydrology, vol. 4, no. 1, pp. 49–55, 2011.
  • [73] A. Alexanderian, M. K. Gobbert, K. R. Fister, H. Gaff, S. Lenhart, and E. Schaefer, “An age-structured model for the spread of epidemic cholera: Analysis and simulation,” Nonlinear Analysis: Real World Applications, vol. 12, no. 6, pp. 3483 – 3498, 2011.
  • [74] A. P. Galvani, T. C. Reluga, and G. B. Chapman, “Long-standing influenza vaccination policy is in accord with individual self-interest but not with the utilitarian optimum,” Proceedings of the National Academy of Sciences, vol. 104, no. 13, pp. 5692–5697, 2007.
  • [75] D. Merl, L. R. Johnson, R. B. Gramacy, and M. Mangel, “A statistical framework for the adaptive management of epidemiological interventions,” PloS One, vol. 4, no. 6, p. e5807, 2009.
  • [76] R. Yaesoubi and T. Cohen, “Dynamic health policies for controlling the spread of emerging infections: influenza as an example,” PloS one, vol. 6, no. 9, p. e24043, 2011.
  • [77] E. P. Fenichel, C. Castillo-Chavez, M. Ceddia, G. Chowell, P. A. G. Parra, G. J. Hickling, G. Holloway, R. Horan, B. Morin, C. Perrings, et al., “Adaptive human behavior in epidemiological models,” Proceedings of the National Academy of Sciences, vol. 108, no. 15, pp. 6306–6311, 2011.
  • [78] K. Shea, M. J. Tildesley, M. C. Runge, C. J. Fonnesbeck, and M. J. Ferrari, “Adaptive management and the value of information: learning via intervention in epidemiology,” PLoS Biol, vol. 12, no. 10, p. e1001970, 2014.
  • [79] S. M. Moore and J. Lessler, “Optimal allocation of the limited oral cholera vaccine supply between endemic and epidemic settings,” Journal of The Royal Society Interface, vol. 12, no. 111, p. 20150703, 2015.
  • [80] M. A. Hayashi and M. C. Eisenberg, “Effects of adaptive protective behavior on the dynamics of sexually transmitted infections,” Journal of theoretical biology, vol. 388, pp. 119–130, 2016.
  • [81] R. Feachem, D. Bradley, H. Garelick, and D. Mara, “Vibrio cholerae and cholera,” in Sanitation and disease – health aspects of excreta and wastewater management (R. G. Feachem, ed.), pp. 297–325, John Wiley & Sons, 1983.
  • [82] M. A. Hood, G. E. Ness, and G. E. Rodrick, “Isolation of Vibrio cholerae serotype O1 from the eastern oyster, Crassostrea virginica.,” Appl. Environ. Microbiol., vol. 41, no. 2, pp. 559–560, 1981.

Appendix A Supplementary Information

a.1 Dimensional and nondimensional forms of each model

As in [8, 20], we use the nondimensional forms for each model throughout in order to reduce issues of model unidentifiability and simplify their presentation. Here we briefly introduce the dimensional and nondimensional forms for each model. We begin with the original SIWR model of Tien and Earn [8]. The dimensional form is given as:


with constant population . Here represents the rate at which susceptibles become infected due to contact with pathogen in the water (), is the transmission parameter for direct transmission, is the rate that infected individuals shed pathogen into the water, the rate of decay of pathogen in the water, the recovery rate, and the population turnover rate. We nondimensionalize the model by letting , , , , , and :


The Exponential model can be nondimensionalized using the same scaling as given for the SIWR model (since it is the same model but with an added loss of immunity term). Similarly, an equivalent scaling can be used for the Gamma and Waning Immunity models, with and for , for each model respectively.

For the Dose Response model, the dimensional form of the model is given by:


To nondimensionalize, we take , , , , and as before, but now we take and simply let , yielding the final form given in (4). (We note that there are other options for nondimensionalization, e.g. letting , but for consistency with the other models we opted for this.)

Lastly, for the Asymptomatic model, the dimensional form of the model is given by:


The nondimensionalization is similar to the previous, except that we note that and are at a fixed ratio to one another, with the fraction of all infections being symptomatic and being asymptomatic. This is because the infected initial conditions and the new cases are both at this ratio, and both equations have the same loss rates (). Thus we take , and write . Then let , and we can take an analogous rescaling as for the SIWR model above, letting , , , , , , , , and . This rescaling yields the final model given in (5).

a.2 Baseline parameter values and estimation details

As noted in the Methods section, we made a few modifications of the original parameters for the SIWR model fitted to the Angola epidemic, based other studies in the literature. We increased the value of the pathogen decay rate to 1/100 days (pathogen lifetime approximately three months), to better match the faster decay rates measured in other studies [14, 81, 82, 4]. As and form a practically identifiable combination (Figure S1) [20], we also reduced from  1 [20] to 0.5, which resulted in approximately the same epidemic shape and gave similar values for both waterborne and direct transmission parameters. The values of and were based on previous Angola data estimates[20], with adjusted after our changes to and to make roughly the same size epidemic as in [20].

Additionally, for parameter estimation, we found that in the normal noise case, maximum likelihood (i.e. weighted least squares) often yielded poor performance when the simulation model did not match the fitting model (e.g. Gamma fitted to Asymptomatic data). As the true underlying error distribution would not be known a priori, we instead fitted using weighted least squares with a weight of , as suggested in [20] and used for fitting the Angola data here. This gives a balance between weighted (where would be proportional to ) and unweighted least squares (where the weights are proportional to , i.e. constant weighting). We also note that for the Angola data, the prevalence measurement equation used is only an approximation for the weekly incidence data (similar to previous studies [20, 18]). However the approximation was quite close, with a total (cumulative) error of over the entire epidemic (simulating weekly incidence vs. using the prevalence approximation using the default parameters).

a.3 Parameter estimation results

a.3.1 100-day epidemic data

In addition to pooling parameter estimate deviations across parameters (Figure 4), we also grouped them by the model used to generate the data and the type of noise added to the simulated data (Figure S2). No data simulation model enables better recapture of parameter estimates, but the Exponential and Dose Response models had notably smaller deviations from the true parameter when fitting their own simulated data. In addition, the Gamma and Waning Immunity models recaptured true parameter values from the Gamma simulated data well (less than 10% deviation). Median deviations from the true parameter remained relatively constant across added noise categories, but deviations tended to be smaller with less noise, as evidenced by the longer lower tails of the distributions in the ‘None’ and ‘Poisson’ panels. As model complexity in the dimension of loss of immunity increased, the absolute deviation for the loss of immunity parameter () decreased (Figure S3).

Figure S1: Scatterplot of and estimates by colored fitting model with all data points (left) and without Dose Response model estimates and fits (right).
Figure S2: Percent deviation of parameter estimates from true values, grouped by model used to simulate the initial data (left) and type of noise added to data (right) for the simulated 100-day data. The model used to fit the data and estimate the parameter is indicated by color. The median across all estimates (i.e., across added noise type and simulation data) is marked with a black point in the distribution and the black dashed line represents deviation. Distribution ranges are truncated for visibility.
Figure S3: Absolute percent deviation of parameter estimates on a log scale, grouped by parameter. The model used to fit the data and estimate the parameter is indicated by color.

a.3.2 3-year long-term data

In examining the pooled parameter estimates for the simulated 3-year data, we found that parameter estimates were more accurate when models fit to data generated from models with the same distribution of waning immunity (exponentially-distributed waning immunity: Exponential, Dose Response, and Asymptomatic models; gamma-distributed waning immunity: Gamma and Waning Immunity models) (Figure S4). There were no discernible patterns in parameter estimate accuracy among added noise types (Figure S4). We also note that for the case of the Gamma Model fitted to data from the Exponential model, the convergence criterion for optimization had to be increased to ensure convergence.

Figure S4: Percent deviation of parameter estimates from true values, grouped by model used to simulate the initial data (left) and type of noise added to data (right) for the simulated 3-year data. The model used to fit the data and estimate the parameter is indicated by color. The median across all estimates (i.e., across added noise type and simulation data) is marked with a black point in the distribution and the black dashed line represents deviation. Distribution ranges are truncated for visibility.

a.4 Model fits to simulated epidemic (100-day) data

Shown below are the model fits for all simulating model datasets with informed starting parameters (Figure S5).


Figure S5: Fits to simulating model data (indicated by row) without noise (left column), with normal noise (middle column), and with poisson noise (right column), using informed starting parameters. Model fits are overlaid, thus obscuring some of the model fits in the figure.

a.5 Model fits to simulated long-term (3-year) data

Shown below are the model fits for all simulating model datasets with informed starting parameters (Figure S6) and naive starting parameters (Figure S7).


Figure S6: Fits to simulating model data (indicated by row) without noise (left column), with normal noise (middle column), and with poisson noise (right column), using informed starting parameters. Model fits are overlaid, thus obscuring some of the model fits in the figure.


Figure S7: Fits to simulating model data (indicated by row) without noise (left column), with normal noise (middle column), and with poisson noise (right column), using naive starting parameters. Model fits are overlaid, thus obscuring some of the model fits in the figure.

a.6 Parameter estimate tables to simulated epidemic data

We report the estimates for common model parameters to all 100-day simulated data fits in Tables S1 through S6. Across all fits with the Asymptomatic model, the mean and standard deviation of and were 0.002 (SD = 0.001) and 0.004 (SD = 0.005), respectively. The reported estimates of for this model are given as the weighted average of and (as described in the methods).

(a) Exponential Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2494 0.2517 0.2494 0.2560 0.2505
0.1809 0.5414 0.4131 1.0600 0.6476
0.0030 0.0017 0.0027 2.47e-06 4.69e-05
0.0113 0.0034 0.0124 0.0038 0.0075
1.96e-05 2.03e-5 1.99e-5 1.83e-5 2.02e-5
0 0 4 15 4
1.72 6.42 2.65 5.26 3.59
(b) Dose Response Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2506 0.2500 0.2480 0.2780 0.2449
1.0709 0.5000 0.6989 1.4200 1.0399
0.0030 0.0027 0.0026 4.94e-06 3.46e-05
0.0117 0.0100 0.0191 0.0055 0.0131
2.02e-5 2.00e-5 2.00e-5 1.74e-5 2.10e-5
0 0 4 220 7
5.28 6.00 3.79 6.79 5.14

(c) Asymptomatic Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2496 0.2503 0.2500 0.2590 0.2594
0.3979 0.2308 0.5000 0.6680 0.6680
0.0050 0.0043 0.0034 2.84e-05 2.96e-05
0.0129 0.0085 0.0100 0.0053 0.0053
1.97e-5 1.97e-5 2.00e-5 1.62e-5 1.62e-5
0 0 3 40 38
2.59 3.31 3.00 3.71 3.71

(d) Gamma Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2480 0.2494 0.2489 0.2500 0.2500
0.3139 0.3510 0.4596 0.5000 0.4998
0.0011 1.54e-9 0.0012 3.09e-05 3.21e-05
0.0175 0.0057 0.0113 0.0100 0.0100
1.96e-5 2.08e-5 2.03e-5 2.00e-5 2.00e-5
0 1 4 5 4
2.25 4.51 2.83 3.00 3.00

(e) Waning Immunity Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2484 0.2495 0.2489 0.2500 0.2500
0.3390 0.2716 0.4617 0.5000 0.5000
0.0008 3.37e-9 0.0013 3.09e-05 3.58e-05
0.0159 0.0074 0.0112 0.0100 0.0100
1.97e-5 2.04e-5 2.03e-5 2.00e-5 2.00e-5
0 0 4 5 4
2.35 3.71 2.84 3.00 3.00

Table S1: Noise-free with Informed Starting Parameters
(a) Exponential Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2494 0.2503 0.2506 0.2560 0.2502
0.4159 0.2186 0.6352 1.0800 0.5607
0.0031 0.0025 0.0024 6.54e-05 2.35e-05
0.0124 0.0090 0.0076 0.0037 0.0088
1.99e-5 1.96e-5 2.01e-5 1.83e-5 2.00e-5
0 0 4 15 4
2.66 3.19 3.54 5.34 3.24

(b) Dose Response Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2488 0.2438 0.2514 0.2780 0.2477
0.7729 0.2353 1.3799 1.4300 1.6883
0.0033 0.0033 0.0024 5.06e-05 4.94e-05
0.0170 0.0247 0.0088 0.0055 0.0076
2.00e-5 1.92e-5 2.04e-5 1.74e-5 2.27e-5
0 0 4 220 5
4.09 3.33 6.52 6.83 7.74

(c) Asymptomatic Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2491 0.2512 0.2507 0.2600 0.2601
0.3934 0.3603 0.6118 0.4370 0.4339
0.0052 0.0037 0.0032 3.7e-05 3.33e-05
0.0133 0.0052 0.0079 0.0080 0.0081
1.99e-5 2.00e-5 2.00e-5 1.56e-5 1.56e-5
0 0 3 51 49
2.57 4.61 3.45 2.79 2.78

(d) Gamma Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2499 0.2494 0.2484 0.2500 0.2497
0.4943 0.4224 0.4910 0.5780 0.4483
2.74e-5 2.30e-8 0.0018 2.22e-05 2.47e-05
0.0102 0.0047 0.0107 0.0086 0.0113
2.00e-5 2.11e-5 2.06e-5 2.03e-5 1.99e-5
9 11 0 16 14
2.98 5.22 2.96 3.31 2.79

(e) Waning Immunity Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2499 0.2494 0.2485 0.2500 0.2501
0.5005 0.8094 0.5234 0.5680 0.4476
9.66e-6 1.78e-8 0.0016 3.58e-05 3.83e-05
0.0100 0.0024 0.0100 0.0088 0.0112
2.00e-5 2.17e-5 2.07e-5 2.03e-5 1.98e-5
0 3 5 7 5
3.00 9.09 3.09 3.27 2.79

Table S2: Noise-free with Naive Starting Parameters
(a) Exponential Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2395 0.2384 0.2364 0.2480 0.2436
0.2816 0.1160 0.2298 0.7500 0.8582
0.0040 0.0040 0.0044 1.23e-05 4.44e-05
0.0215 0.0213 0.0294 0.0060 0.0059
1.96e-5 1.92e-5 1.96e-5 1.86e-5 2.07e-5
0 0 0 23 11
2.08 2.11 1.86 3.99 4.41

(b) Dose Response Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2677 0.2677 0.2679 0.2890 0.2660
0.8064 0.3931 0.8161 1.3700 2.1494
0.0026 0.0022 0.0018 2.47e-06 3.09e-05
0.0136 0.0110 0.0133 0.0053 0.0050
1.93e-5 1.90e-5 1.93e-5 1.71e-5 2.01e-5
0 1 5 109 16
4.30 5.00 4.34 6.63 9.66

(c) Asymptomatic Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2212 0.2274 0.1658 0.2420 0.2420
0.2825 0.2013 0.1927 0.5050 0.5047
0.0091 0.0061 0.0138 3.58e-05 3.46e-05
0.0297 0.0136 0.1446 0.0093 0.0093
2.15e-5 2.15e-5 1.98e-5 1.67e-5 1.67e-5
42 58 0 124 126
2.01 2.92 1.43 2.99 2.99

(d) Gamma Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2792 0.2811 0.2803 0.2810 0.2806
0.1144 0.1017 0.1470 0.4290 0.4279
0.0051 0.0011 0.0027 5.19e-05 2.22e-05
0.0250 0.0092 0.0174 0.0055 0.0055
1.70e-5 1.64e-5 1.66e-5 1.68e-5 1.68e-5
0 9 8 14 12
1.57 2.14 1.71 2.84 2.83

(e) Waning Immunity Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2568 0.2562 0.2623 0.2640 0.2642
0.1184 0.0638 0.5149 0.5060 0.5034
0.0067 0.0054 0.0014 4.32e-05 1.98e-05
0.0446 0.0317 0.0072 0.0068 0.0068
1.72e-5 1.79e-5 1.90e-5 1.80e-5 1.80e-5
0 2 11 17 12
1.50 1.66 3.11 3.08 3.07

Table S3: Normal Noise with Informed Starting Parameters
(a) Exponential Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2373 0.2418 0.2417 0.2480 0.2483
0.2515 0.1836 0.4272 0.4700 0.7331
0.0048 0.0026 0.0023 2.72e-05 2.22e-05
0.0258 0.0119 0.0129 0.0097 0.0060
1.97e-5 1.98e-5 2.02e-5 1.79e-5 1.88e-5
0 7 10 25 22
1.95 2.80 2.68 2.87 3.92

(b) Dose Response Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2670 0.2689 0.2679 0.2890 0.2647
0.6610 0.6375 0.7955 1.3700 0.7846
0.0028 0.0021 0.0018 6.54e-05 1.6e-05
0.0171 0.0065 0.0137 0.0052 0.0148
1.91e-5 1.94e-5 1.92e-5 1.71e-5 1.98e-5
0 2 5 109 23
3.71 7.45 4.25 6.63 4.20

(c) Asymptomatic Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2052 0.2282 0.2248 0.2430 0.2278
0.1888 0.1954 0.3747 6.7300 0.5668
0.0132 0.0058 0.0064 3.7e-05 2.35e-05
0.0697 0.0138 0.0199 0.0007 0.0119
1.99e-5 2.11e-5 2.21e-5 1.84e-5 2.24e-5
0 45 42 101 44
1.58 2.87 2.40 27.89 3.18

(d) Gamma Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2805 0.2811 0.2772 0.2810 0.2801
0.3507 0.0878 0.9213 0.3640 0.5275
0.0004 0.0015 0.0013 4.07e-05 3.09e-05
0.0068 0.0107 0.0031 0.0064 0.0046
1.69e-5 1.63e-5 1.89e-5 1.66e-5 1.72e-5
0 0 9 6 4
2.52 2.00 4.79 2.58 3.23

(e) Waning Immunity Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2532 0.2608 0.2578 0.2640 0.2642
0.1279 0.0798 0.1455 0.5050 0.5038
0.0083 0.0030 0.0054 3.33e-05 2.47e-05
0.0470 0.0208 0.0338 0.0068 0.0068
1.86e-5 1.76e-5 1.81e-5 1.80e-5 1.80e-5
0 7 6 19 14
1.52 1.84 1.61 3.08 3.07

Table S4: Normal Noise with Naive Starting Parameters
(a) Exponential Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2494 0.2494 0.2497 0.2550 0.2554
0.4999 0.1967 0.5043 0.6140 0.6139
0.0025 0.0024 0.0022 3.7e-05 2.47e-05
0.0100 0.0101 0.0098 0.0067 0.0067
1.98e-5 1.94e-5 1.97e-5 1.78e-5 1.78e-5
0 0 4 141 9
3.00 2.96 3.02 3.48 4.48

(b) Dose Response Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2479 0.2496 0.2496 0.2810 0.2508
0.7228 0.4984 0.9517 0.8190 8.9934
0.0035 0.0028 0.0024 2.59e-05 0.000312
0.0185 0.0101 0.0134 0.0093 0.0013
2.01e-5 2.01e-5 2.03e-5 1.69e-5 2.11e-5
0 0 4 2504 4
3.88 5.98 4.80 4.40 36.97

(c) Asymptomatic Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2238 0.2433 0.2412 0.2550 0.2556
0.1752 0.1922 0.4279 0.5990 0.6030
0.0140 0.0052 0.0049 3.58e-05 4.57e-05
0.0608 0.0117 0.0140 0.0063 0.0063
2.06e-5 2.04e-5 2.14e-5 1.63e-5 1.63e-5
0 7 9 411 40
1.60 2.89 2.68 3.42 3.43

(d) Gamma Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2541 0.2531 0.2528 0.2540 0.2541
0.4763 0.7484 0.6401 0.5030 0.5028
3.00e-8 1.00e-8 0.0011 3.09e-05 2.96e-05
0.0099 0.0025 0.0076 0.0094 0.0094
1.97e-5 2.14e-5 2.08e-5 1.98e-5 1.99e-5
0 2 5 108 4
2.92 8.49 3.57 3.03 3.03

(e) Waning Immunity Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2576 0.2573 0.2582 0.2590 0.2592
0.4043 0.1973 0.2819 0.2570 0.2568
3.00e-8 4.74e-9 0.0011 1.48e-05 3.46e-05
0.0112 0.0092 0.0161 0.0171 0.0171
1.92e-5 1.93e-5 1.84e-5 1.77e-5 1.77e-5
22 25 24 0 24
2.65 3.00 2.16 2.06 2.06

Table S5: Poisson Noise with Informed Starting Parameters
(a) Exponential Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2497 0.2509 0.2491 0.2550 0.2552
0.5188 0.4631 0.4233 0.6130 0.6067
0.0023 0.0015 0.0025 2.72e-05 3.21e-05
0.0096 0.0040 0.0120 0.0067 0.0068
1.98e-5 2.00e-5 1.96e-5 1.78e-5 1.78e-5
0 0 4 141 9
3.07 5.63 2.69 3.47 3.45

(b) Dose Response Data
Parameters/AIC Exponential Dose Response Asymptomatic Gamma Waning Immunity
0.2493 0.2501 0.2511 0.2790 0.2409
0.8864 0.5567 1.4445 1.4300 0.7726
0.0033 0.0028 0.0027 0.000159 4.07e-05
0.0146 0.0089 0.0085 0.0054