Information geometry and entropy in a stochastic epidemic rate process
Epidemic models with inhomogeneous populations have been used to study major
outbreaks and recently Britton and Lindenstrand described the case
when latency and infectivity have independent gamma distributions. They found that
variability in these random variables had opposite effects on the epidemic
growth rate. That rate increased with greater variability in latency but decreased
with greater variability in infectivity. Here we extend their result by using the McKay bivariate
gamma distribution for the joint distribution of latency and infectivity, recovering
the above effects of variability but allowing possible correlation. We use methods
of stochastic rate processes to obtain explicit solutions for the growth of
the epidemic and the evolution of the inhomogeneity and information entropy.
We obtain a closed
analytic solution to the evolution of the distribution of the number of uninfected
individuals as the epidemic proceeds, and a concomitant
expression for the decay of entropy.
The family of McKay bivariate gamma distributions has a tractable information
geometry which provides a framework
in which the evolution of distributions can be studied as the outbreak grows,
with a natural distance structure for quantitative tracking of progress.
Keywords: Epidemic model, stochastic rate process, inhomogeneous, bivariate gamma, information geometry, entropy, distribution evolution.
1 Introduction
The spreading of an infectious disease involves a (large) population in which an initially small number of individuals are infected. Each infected individual is, for a period of latency $T_L$, not yet infectious, but at the end of the latent period the individual becomes infectious for a period $T_I$; models provide distributions for the random variables $T_L$ and $T_I$. The rest of the population is susceptible to infection from infectious individuals; this susceptibility is often taken to be a constant but later we shall consider a population with an evolving inhomogeneous distribution of susceptibilities to infection. An infectious individual has random infectious contacts at a rate $\lambda$, and contact with a susceptible individual results in infection, whereupon the latent period of that individual commences. The so-called basic reproduction number $R_0$ is the product of the rate of infectious contacts and the mean period of infectiousness.
Lloyd (2001) discussed the sensitivity of dynamical properties of an epidemic model to the choices of formulation, made use of a gamma distribution for the period of infectiousness and allowed for optional seasonality. Cauchemez and Ferguson (2008) introduced a new approach to the analysis of epidemic time series data to take account of partial observation of latency and the temporal aggregation of observed data. They showed that homogeneous standard models can miss key features of epidemics in large populations. Also, Nishiura et al. (2009) devised an estimate of the reproduction number in terms of coarsely reported epidemic data, showing that an ideal reporting interval is the mean generation time rather than a fixed chronological interval. See also recent work by He, Ionides and King for related results on partially observed data and by Miller et al. (2009) on general distributions of generation intervals.
Chowell et al. (2009) have edited a new collection of articles on mathematical and statistical approaches to epidemic modelling, and Chapter 2 there, by G. Chowell and F. Brauer, gives a detailed study of the basic reproduction number in a variety of epidemic models. Wallinga and Lipsitch (2007) addressed the sensitivity of the reproduction number to the shape of the distribution of generation intervals and obtained upper bounds even in the situation of no information on shape.
Recently Britton and Lindenstrand (2009) described a model where the period of latency $T_L$ and the period of infectiousness $T_I$ have independent gamma distributions. They found that variability in these random variables had opposite effects on the epidemic growth rate. That rate increased with greater variability in $T_L$ but decreased with greater variability in $T_I$. Here we extend their result by using the McKay bivariate gamma distribution for the joint distribution of $T_L$ and $T_I$, recovering the above effects of variability but allowing for the possibility of correlation, in case it is of relevance. One might imagine that, for a disease in which the physical changes during latency lead to longer future infectiousness if the period of their development is longer, the random variables $T_L$ and $T_I$ may have a positive correlation. We use methods of stochastic rate processes to obtain explicit solutions for the growth of the epidemic and the evolution of the inhomogeneity and information entropy. This admits a closed analytic solution to the evolution of the distribution of the number of uninfected individuals as the epidemic proceeds, and a concomitant expression for the decay of entropy. The family of McKay bivariate gamma distributions has a tractable information geometry which provides a framework in which the evolution of distributions can be studied as the outbreak grows, with a natural distance structure for quantitative tracking of progress.
2 Inhomogeneous Malthusian epidemic models
In their discussion of epidemic modelling, Britton and Lindenstrand (2009) highlighted aspects where stochastic features are more important than deterministic ones. In particular, they described the importance of admitting random variables to represent the period of latency $T_L$ and the period of infectiousness $T_I$. Their standard susceptible-exposed-infectious-removed (SEIR) epidemic model was elaborated using independent gamma distributions for $T_L$ and $T_I$ with means $\mu_L, \mu_I$ and standard deviations $\sigma_L, \sigma_I$. The basic (mean) reproduction number is given by
$$R_0 = \lambda \mu_I \tag{1}$$
where $\lambda$ is the rate of infectious contacts and $\mu_I$ is the mean length of infectious period. An epidemic becomes a major outbreak if $R_0 > 1$ and then the number infected $n_I(t)$ increases exponentially,
$$n_I(t) \sim e^{rt} \tag{2}$$
where the Malthusian parameter $r$ satisfies the equation
$$E\left(e^{-rT_L}\right)\,\frac{\lambda}{r}\,\left(1 - E\left(e^{-rT_I}\right)\right) = 1. \tag{3}$$
Their independent bivariate model expresses $R_0$ in terms of $r$ and the parameters of the two gamma distributions. They used means $\mu_L, \mu_I$ and coefficients of variation $\tau_L = \sigma_L/\mu_L$, $\tau_I = \sigma_I/\mu_I$ to deduce
$$R_0 = \frac{r\mu_I\,\left(1 + r\mu_L\tau_L^2\right)^{1/\tau_L^2}}{1 - \left(1 + r\mu_I\tau_I^2\right)^{-1/\tau_I^2}}. \tag{4}$$
Then Britton and Lindenstrand found from numerical analysis of (4) that, at fixed $R_0$, the growth rate $r$ is monotonically decreasing with $\mu_L$ and $\tau_I$ but it is increasing with $\tau_L$. So increased variability in latency period increases the epidemic growth rate whereas increased variability in infectious period decreases the epidemic growth rate.
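The qualitative dependence of the growth rate on variability can be checked numerically. The following is a minimal sketch, not the authors' computation: it assumes that the Malthusian parameter $r$ solves the standard relation $E(e^{-rT_L})\,\lambda\,(1 - E(e^{-rT_I}))/r = 1$ for independent gamma periods with $\lambda = R_0/\mu_I$, and it finds $r$ by bisection. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def gamma_laplace(r, mean, cv):
    """E[exp(-r T)] for a gamma variable T with given mean and coefficient of variation cv > 0."""
    return (1.0 + r * mean * cv**2) ** (-1.0 / cv**2)


def euler_lotka_residual(r, R0, mu_L, mu_I, tau_L, tau_I):
    """Left side minus 1 of the assumed growth-rate equation, with lam = R0 / mu_I."""
    lam = R0 / mu_I
    return (gamma_laplace(r, mu_L, tau_L) * lam
            * (1.0 - gamma_laplace(r, mu_I, tau_I)) / r - 1.0)


def malthusian(R0, mu_L, mu_I, tau_L, tau_I, lo=1e-9, hi=10.0):
    """Bisection for r; the residual decreases in r and is positive near 0 when R0 > 1."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if euler_lotka_residual(mid, R0, mu_L, mu_I, tau_L, tau_I) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical illustration values (not fitted to any outbreak).
R0, mu_L, mu_I = 2.2, 5.0, 5.0
r_base = malthusian(R0, mu_L, mu_I, tau_L=0.5, tau_I=0.5)
r_more_latency_var = malthusian(R0, mu_L, mu_I, tau_L=1.0, tau_I=0.5)
r_more_infectious_var = malthusian(R0, mu_L, mu_I, tau_L=0.5, tau_I=1.0)
r_longer_latency = malthusian(R0, 8.0, mu_I, tau_L=0.5, tau_I=0.5)

print(r_base, r_more_latency_var, r_more_infectious_var, r_longer_latency)
```

With these illustrative inputs the computed rates order as described in the text: more variable latency gives faster growth, while more variable infectiousness or a longer mean latency gives slower growth.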
3 Bivariate gamma distribution of periods of latency and infectiousness
The model described here adds to the work of Britton and Lindenstrand (2009), who used independent univariate gamma distributions for the periods of latency and infectiousness in an epidemic model that they illustrated with data from the SARS outbreak (WHO). They used numerical methods to obtain approximate solutions. Our contribution is to use a bivariate gamma distribution which allows positive correlation between the random variables representing the periods of latency and infectiousness. That could represent a situation where physical changes during the latency period lead to longer future infectiousness if the period of their development is longer. We obtain a closed analytic solution and show that the same qualitative features persist in the presence of such correlation. This makes available the analytic information geometry of the space of probability densities, allowing comparison of possible trajectories for the epidemic against, for example, exponential distributions for periods of infectiousness or of latency, corresponding to underlying Poisson processes.
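Somewhat surprisingly, it is rather difficult to devise bivariate versions of Poisson, exponential or, more generally, gamma distributions that have reasonably simple form, and indeed only the Freund bivariate exponential and McKay bivariate gamma distributions seem to have tractable information geometry (Arwini and Dodson, 2008). The family of McKay bivariate gamma density functions is defined on $0 < x < y < \infty$ with parameters $\alpha_1, \sigma_{12}, \alpha_2 > 0$ and probability density functions, Figure 1,
$$f(x,y;\alpha_1,\sigma_{12},\alpha_2) = \frac{\left(\alpha_1/\sigma_{12}\right)^{(\alpha_1+\alpha_2)/2}\, x^{\alpha_1-1}\,(y-x)^{\alpha_2-1}\, e^{-\sqrt{\alpha_1/\sigma_{12}}\;y}}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}.$$
Here $\sigma_{12}$, which must be positive, is the covariance of $x$ and $y$, and $f(x,y)$ is the probability density for the two random variables $x$ and $y = x + z$ where $x$ and $z$ both have gamma density functions. We take $x = T_L$ for the period of latency and $y = T_I$ for the period of infectiousness.
We obtain the means, standard deviations and coefficients of variation by direct integration:
$$\mu_L = \sqrt{\alpha_1\sigma_{12}}, \quad \sigma_L = \sqrt{\sigma_{12}}, \quad \tau_L = \frac{1}{\sqrt{\alpha_1}},$$
$$\mu_I = (\alpha_1+\alpha_2)\sqrt{\frac{\sigma_{12}}{\alpha_1}}, \quad \sigma_I = \sqrt{\frac{(\alpha_1+\alpha_2)\,\sigma_{12}}{\alpha_1}}, \quad \tau_I = \frac{1}{\sqrt{\alpha_1+\alpha_2}}.$$
The correlation coefficient, and marginal probability density functions of $T_L$ and $T_I$, are given by
$$\rho(T_L,T_I) = \sqrt{\frac{\alpha_1}{\alpha_1+\alpha_2}} > 0, \tag{9}$$
$$f_{T_L}(x) = \frac{\left(\alpha_1/\sigma_{12}\right)^{\alpha_1/2}\, x^{\alpha_1-1}\, e^{-\sqrt{\alpha_1/\sigma_{12}}\;x}}{\Gamma(\alpha_1)}, \quad x > 0,$$
$$f_{T_I}(y) = \frac{\left(\alpha_1/\sigma_{12}\right)^{(\alpha_1+\alpha_2)/2}\, y^{\alpha_1+\alpha_2-1}\, e^{-\sqrt{\alpha_1/\sigma_{12}}\;y}}{\Gamma(\alpha_1+\alpha_2)}, \quad y > 0.$$
Figure 2 shows a plot of the correlation coefficient from equation (9). The marginal probability density functions of latency period and infectiousness period are gamma with shape parameters $\alpha_1$ and $\alpha_1+\alpha_2$, respectively. It is not possible to choose parameters such that both marginal functions are exponential, since that would need $\alpha_1 = 1$ and $\alpha_1+\alpha_2 = 1$ with $\alpha_2 > 0$, so the two random variables cannot both arise from Poisson processes in this model.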
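The structure of the McKay distribution can be illustrated by simulation. The sketch below assumes the standard construction in which the first variable is gamma with shape $\alpha_1$, the second is the first plus an independent gamma variable with shape $\alpha_2$, and both share the rate $\sqrt{\alpha_1/\sigma_{12}}$. The function names, the seed and the parameter values are hypothetical. It compares the sample covariance and correlation with the values $\sigma_{12}$ and $\sqrt{\alpha_1/(\alpha_1+\alpha_2)}$ that follow from this construction.

```python
import math
import random


def mckay_theory(alpha1, sigma12, alpha2):
    """Moments implied by x ~ Gamma(alpha1), y = x + z, z ~ Gamma(alpha2), common rate c."""
    c = math.sqrt(alpha1 / sigma12)
    return {
        "mean_x": alpha1 / c,
        "mean_y": (alpha1 + alpha2) / c,
        "cov_xy": alpha1 / c**2,  # equals sigma12
        "rho": math.sqrt(alpha1 / (alpha1 + alpha2)),
    }


def mckay_sample(alpha1, sigma12, alpha2, n, seed=0):
    """Draw n pairs (x, y) with 0 < x < y; random.gammavariate takes (shape, scale)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(alpha1 / sigma12)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gammavariate(alpha1, scale)
        z = rng.gammavariate(alpha2, scale)
        xs.append(x)
        ys.append(x + z)
    return xs, ys


def sample_cov_and_rho(xs, ys):
    """Plain sample covariance and Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    return sxy, sxy / math.sqrt(sxx * syy)


# Hypothetical parameter values for illustration only.
alpha1, sigma12, alpha2 = 2.0, 3.0, 1.5
theory = mckay_theory(alpha1, sigma12, alpha2)
xs, ys = mckay_sample(alpha1, sigma12, alpha2, n=50000)
cov_hat, rho_hat = sample_cov_and_rho(xs, ys)
print(theory["cov_xy"], cov_hat, theory["rho"], rho_hat)
```

Because the second variable always exceeds the first in this construction, the correlation is necessarily positive and approaches 1 as $\alpha_2$ becomes small relative to $\alpha_1$.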
4 Stochastic rate processes
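For a detailed monograph on stochastic epidemic models see Andersson and Britton (2000). We consider here a class of simple stochastic rate processes where a population $N$ of uninfected individuals is classified by a smooth family of time-dependent probability density functions $\{P_t, t \geq 0\}$ with random variable $a > 0$ having at time $t$ mean $E_t(a)$ and variance $\sigma_t^2(a)$. This situation was formulated by Karev in the following way. Let $l_t(a)$ represent the frequency at the $a$-cohort, then we have
$$N(t) = \int_0^\infty l_t(a)\,da \quad\text{and}\quad P_t(a) = \frac{l_t(a)}{N(t)},$$
$$\frac{dl_t(a)}{dt} = -a\,l_t(a), \quad\text{so}\quad l_t(a) = l_0(a)\,e^{-at}.$$
General solutions for these equations were given by Karev, from which we obtain
$$N(t) = N(0)\,L_0(t) \quad\text{where}\quad L_0(t) = \int_0^\infty P_0(a)\,e^{-at}\,da,$$
$$\frac{dN}{dt} = -E_t(a)\,N \quad\text{where}\quad E_t(a) = \int_0^\infty a\,P_t(a)\,da = -\frac{d\log L_0}{dt},$$
$$\frac{dE_t(a)}{dt} = -\sigma_t^2(a) = \left(E_t(a)\right)^2 - E_t(a^2),$$
$$P_t(a) = \frac{e^{-at}\,P_0(a)}{L_0(t)} \quad\text{and}\quad \frac{dP_t(a)}{dt} = P_t(a)\,\left(E_t(a) - a\right).$$
Here $L_0(t)$ is the Laplace transform of the initial probability density function $P_0(a)$ and so conversely $P_0(a)$ is the inverse Laplace transform of the population (monotonic) decay solution $N(t)/N(0)$. See Feller (1971) for more discussion of the existence and uniqueness properties of the correspondence between probability densities and their Laplace transforms. In this section we shall use $N(t)$ to represent the decreasing population of uninfected individuals as an epidemic grows. In our context of an epidemic model we might view the random variable $a$ as a feature representing susceptibility to infection in the population; in general this distribution will evolve during the epidemic. The model can be reformulated for a vector of populations representing a composite population, with a vector of distributions and a matrix of variables.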
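The correspondence between the population decay and the Laplace transform of the initial density can be checked on a grid. The sketch below assumes the cohort rule that each cohort with susceptibility $a$ decays as $e^{-at}$, and uses a gamma initial density purely as a hypothetical test case, since its Laplace transform $(s/(s+t))^k$ is known in closed form. All names and numerical values are illustrative assumptions.

```python
import math


def gamma_pdf(a, k, s):
    """Gamma density with shape k and rate s, used here as a hypothetical initial P_0(a)."""
    return s**k * a ** (k - 1) * math.exp(-s * a) / math.gamma(k)


def trapezoid(values, h):
    """Composite trapezoid rule on an equally spaced grid."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


# Hypothetical illustration values.
k, s, N0 = 3.0, 2.0, 1000.0
h = 0.001
grid = [i * h for i in range(0, 40001)]          # a in [0, 40]
l0 = [N0 * gamma_pdf(a, k, s) for a in grid]      # initial cohort frequencies


def state(t):
    """Population N(t) and mean susceptibility E_t(a) from cohorts l_t(a) = l_0(a) exp(-a t)."""
    lt = [l * math.exp(-a * t) for l, a in zip(l0, grid)]
    n_t = trapezoid(lt, h)
    mean_t = trapezoid([a * l for a, l in zip(grid, lt)], h) / n_t
    return n_t, mean_t


t, eps = 1.0, 1e-4
N_t, mean_t = state(t)
laplace_t = (s / (s + t)) ** k                   # closed-form Laplace transform of P_0
dN_dt = (state(t + eps)[0] - state(t - eps)[0]) / (2 * eps)

print(N_t / N0, laplace_t)       # population fraction vs Laplace transform
print(dN_dt, -mean_t * N_t)      # decay rate vs minus mean times population
```

The two printed pairs should agree closely, illustrating that the surviving fraction is the Laplace transform of the initial density and that the instantaneous decay rate is set by the current mean susceptibility.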
It is easy to deduce the rate process for entropy from Karev’s model. The Shannon entropy at time $t$ is
which reduces to
By using the decay rate is then
This result shows how the variance controls the entropy change during quite general inhomogeneous population processes. In fact equation (21) and further related results were also given in subsequent papers by Karev. We note that the reverse process of population growth may have applications in constrained disordering type situations (Dodson, 2010).
4.1 Initial growth rate
Here, from Britton and Lindenstrand (2009), $R_0 = \lambda\mu_I$ for the average number of infections per infective, so $\lambda = R_0/\mu_I$ is the contact rate; this gives the Malthusian parameter $r$ analytically in explicit form as
Thus, $r$ is monotonically decreasing with $\mu_L$ and $\tau_I$ but increasing with $\tau_L$. Figure 3 and Figure 4 plot typical values from the SARS epidemic (WHO) as used by Britton and Lindenstrand (2009). Figure 5 and Figure 6 show corresponding contour plots of the infectivity rate $\lambda$. So the bivariate gamma model reveals that the result of Britton and Lindenstrand (2009) for the dependence of growth rate on variability in the periods of latency and infectiousness in the independent case persists also in the presence of correlation between these two random variables. Such a correlation may be relevant in particular applications, when physical changes evolve during the latent period and influence the length of the subsequent infectiousness period.
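We can also estimate the evolution of an inhomogeneous distribution of susceptibility $a$ as the population $N(t)$ of uninfected individuals declines with time $t$. For example, the case when the initial distribution $P_0(a)$ is a gamma distribution with parameters $(s, k)$, so that $P_0(a) = s^k a^{k-1} e^{-sa}/\Gamma(k)$, was solved by Karev, giving the result
$$N(t) = N(0)\left(\frac{s}{s+t}\right)^k.$$
Then the time dependences of mean, standard deviation and coefficient of variation are given by
$$E_t(a) = \frac{k}{s+t}, \quad \sigma_t(a) = \frac{\sqrt{k}}{s+t}, \quad cv_t(a) = \frac{1}{\sqrt{k}}.$$
From (21), we can see that the rate of entropy decrease is greater for more variability in susceptibility.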
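The evolution of a gamma distribution of susceptibilities can be sketched numerically. The example assumes a gamma initial density with shape $k$ and rate $s$ (a hypothetical parametrization chosen for this illustration), so that under cohort decay $e^{-at}$ the density at time $t$ is again gamma with the same shape and rate $s+t$. It computes the mean, standard deviation, coefficient of variation and Shannon entropy by quadrature. As a check it uses the scaling property of differential entropy for a fixed-shape gamma family, $S_t - S_0 = \log(s/(s+t))$, which is a property of the gamma family rather than a formula taken from the text.

```python
import math


def gamma_pdf(a, k, rate):
    """Gamma density with shape k and the given rate, evaluated via logarithms for stability."""
    if a <= 0.0:
        return 0.0
    return math.exp(k * math.log(rate) + (k - 1) * math.log(a)
                    - rate * a - math.lgamma(k))


def evolved_summary(k, s, t, n=40000, a_max=40.0):
    """Mean, sd, cv and entropy of the evolved density (shape k, rate s + t) by a Riemann sum."""
    h = a_max / n
    rate = s + t
    m0 = m1 = m2 = ent = 0.0
    for i in range(1, n + 1):
        a = i * h
        p = gamma_pdf(a, k, rate)
        m0 += p * h
        m1 += a * p * h
        m2 += a * a * p * h
        if p > 0.0:
            ent -= p * math.log(p) * h
    mean = m1 / m0
    sd = math.sqrt(m2 / m0 - mean**2)
    return mean, sd, sd / mean, ent


# Hypothetical illustration values.
k, s = 3.0, 2.0
times = [0.0, 1.0, 5.0]
results = {t: evolved_summary(k, s, t) for t in times}
for t in times:
    mean, sd, cv, ent = results[t]
    print(t, mean, sd, cv, ent)
```

In this sketch the mean and standard deviation both shrink like $1/(s+t)$ while the coefficient of variation stays fixed, and the entropy falls steadily as the surviving population concentrates on low susceptibilities.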
5 Information geometry of the space of McKay bivariate gamma distributions
Information geometry of the smooth family of McKay bivariate gamma probability density functions, which is of exponential type, has been studied in detail in Arwini and Dodson (2008), Chapter 4. This provides a Riemannian metric on the parameter space, yielding a curved 3-manifold, so the affine immersion is a 3-dimensional object in $\mathbb{R}^4$ which we can only represent in $\mathbb{R}^3$ through its 2-dimensional submanifolds. Here we illustrate how the geometry may nevertheless be used to provide a natural distance structure on the space of the McKay distributions used in our epidemic model.
First we measure distances from distributions with exponential marginal distributions: those for which the latency marginal has shape parameter $\alpha_1 = 1$, when the latency periods are controlled by a Poisson event process.
The derivation of a distance from distribution is given in , and yields in terms of and
where $\psi$ is the digamma function and $\gamma$ is the Euler gamma constant, with numerical value about 0.577. Figure 7 shows a plot of the distance from equation (28). This is an approximation to the Riemannian distance but it represents the main features of the information distance of arbitrary latency period distributions from the curve of distributions with latency shape parameter $\alpha_1 = 1$.
Repeating the above procedure for the case when $T_I$ has shape parameter $\alpha_1 + \alpha_2 = 1$, which corresponds to an exponential infectiousness period distribution (and a Poisson process of infections), we obtain
This is plotted in Figure 8. The two graphics, Figures 7 and 8, show how we can depict the parameters in the joint distribution of periods of latency and infectiousness as surfaces of distance, measured from the two reference cases for the evolution of the epidemic starting from Poisson processes, respectively. Data on the progress of epidemics under different intervention schemes, or simulations of such scenarios, could be represented on such surfaces.
Geodesic curves in Riemannian manifolds give minimal arc length; examples have been given for manifolds of Weibull, gamma and McKay bivariate gamma distributions, together with gradient flow curves for entropy. More details of the information geometry of uniform, exponential, gamma, Gaussian, and bivariate versions with applications are provided in Arwini and Dodson (2008).
Britton and Lindenstrand (2009) highlighted aspects where stochastic features are important and used independent gamma random variables to represent inhomogeneity of latency and infectiousness periods. In this paper we have a bivariate inhomogeneous epidemic process, modelled by correlated gamma distributions, and we can use similar methods to depict and quantify departures from exponential periods of latency and infectiousness. This shows that the result of Britton and Lindenstrand (2009) for the dependence of growth rate on variability in the periods of latency and infectiousness in the independent case persists also in the presence of correlation between the two random variables, Figures 3 and 4. Moreover, the information theoretic distances from the two reference scenarios of exponential distributions of periods of latency and infectiousness, Figures 7 and 8, provide natural quantitative representations for comparing different parametric data.
Britton and Lindenstrand (2009) used independent gamma distributions for periods of latency and infectiousness, from which the reproduction rate can be estimated, with applications for example to the SARS outbreak (WHO). Here we have used a bivariate gamma distribution which allows a corresponding reproduction rate to be computed. Also, we considered the case when the susceptibility to infection is not uniform and illustrated with the case when it begins as a gamma distribution then evolves as the epidemic proceeds. Other models could be used for the initial distribution of susceptibilities, including asymmetric distributions. A wide range of such other cases using log-gamma distributions is considered in Dodson's Leipzig conference paper for a similar rate process applied to an evolutionary model when the random variable represents unfitness (like susceptibility to infection) in a population.
- H. Andersson and T. Britton. Stochastic epidemic models and their statistical analysis. Lecture Notes in Statistics, Springer-Verlag, New York, Berlin 2000.
- R.M. Anderson and R.M. May. Infectious Diseases of Humans: Dynamics and Control. Oxford University Press, Oxford 1991.
- Khadiga Arwini and C.T.J. Dodson. Information Geometry Near Randomness and Near Independence. Lecture Notes in Mathematics, Springer-Verlag, New York, Berlin 2008.
- N.T.J. Bailey. The Mathematical Theory of Infectious Diseases and its Applications. Griffin, London 1975.
- T. Britton and D. Lindenstrand. Epidemic modelling: aspects where stochasticity matters. Mathematical Biosciences 222, 2 (2009) 109-116. Cf. also http://arxiv.org/abs/0812.3505 3 January 2009.
- Y. Cai, C.T.J. Dodson, O. Wolkenhauer and A.J. Doig. Gamma Distribution Analysis of Protein Sequences shows that Amino Acids Self Cluster. J. Theoretical Biology 218, 4 (2002) 409-418.
- S. Cauchemez and N.M. Ferguson. Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. J. Royal Soc. Interface 5 (2008) 885-897.
- Gerardo Chowell, James M. Hyman, Luis M. A. Bettencourt and Carlos Castillo-Chavez (Editors). Mathematical and Statistical Estimation Approaches in Epidemiology. Springer, Dordrecht, Heidelberg, London, New York 2009.
- C.T.J. Dodson. On the entropy flows to disorder. In C. H. Skiadas and I. Dimotikalis (Eds.), Chaotic Systems: Theory and Applications. World Scientific, Singapore 2010, pp 75-84. http://arxiv.org/abs/0811.4318
- C.T.J. Dodson. An inhomogeneous stochastic rate process for evolution from states in an information geometric neighbourhood of uniform fitness. Invited paper at 3rd Conference on Information Geometry and its Application, Leipzig 2-6 August 2010. Cf. also http://arxiv.org/abs/1001.4177v1
- O. Diekmann and J.A.P. Heesterbeek. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation. John Wiley, Chichester 2000.
- W. Feller. An Introduction to Probability Theory and its Applications, Volume II, 2nd Edition. Wiley, New York 1971.
- D. He, E.L. Ionides and A.A. King. Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. J. Royal Soc. Interface
- G.P. Karev. Inhomogeneous models of tree stand self-thinning. Ecological Modelling 160 (2003) 23-37.
- G.P. Karev. Replicator equations and the principle of minimal production of information. Bulletin Mathematical Biology 72, 5 (2010) 1124-1142. http://arxiv.org/abs/0901.2378
- G.P. Karev. On mathematical theory of selection: continuous time population dynamics. Journal Mathematical Biology 60 (2010) 107-129.
- A.L. Lloyd. Destabilization of epidemic models with the inclusion of realistic distributions of infectious periods. Proc. Royal Soc. Lond. B 268 (2001) 985-993.
- Joel C. Miller, Bahman Davoudi, Rafael Meza, Anja Slim and Babak Pourbohloul. Epidemics with general generation interval distributions. Preprint, 2009. http://arxiv.org/abs/0905.2174v2.pdf
- H. Nishiura, G. Chowell, H. Heesterbeek and J. Wallinga. J. Royal Soc. Interface (2009) Online
- J. Wallinga and M. Lipsitch. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc. Royal Soc. B 274 (2007) 599-604.
- WHO. Cumulative Number of Reported Probable Cases of Severe Acute Respiratory Syndrome (SARS). http://www.who.int/csr/sars/country/en/