# Information geometry and entropy in a stochastic epidemic rate process

C.T.J. Dodson
School of Mathematics, University of Manchester, Manchester M13 9PL, UK
ctdodson@manchester.ac.uk
###### Abstract

Epidemic models with inhomogeneous populations have been used to study major outbreaks and recently Britton and Lindenstrand [5] described the case when latency and infectivity have independent gamma distributions. They found that variability in these random variables had opposite effects on the epidemic growth rate. That rate increased with greater variability in latency but decreased with greater variability in infectivity. Here we extend their result by using the McKay bivariate gamma distribution for the joint distribution of latency and infectivity, recovering the above effects of variability but allowing possible correlation. We use methods of stochastic rate processes to obtain explicit solutions for the growth of the epidemic and the evolution of the inhomogeneity and information entropy. We obtain a closed analytic solution to the evolution of the distribution of the number of uninfected individuals as the epidemic proceeds, and a concomitant expression for the decay of entropy. The family of McKay bivariate gamma distributions has a tractable information geometry which provides a framework in which the evolution of distributions can be studied as the outbreak grows, with a natural distance structure for quantitative tracking of progress.
Keywords: Epidemic model, stochastic rate process, inhomogeneous, bivariate gamma, information geometry, entropy, distribution evolution.

## 1 Introduction

Epidemiology is a big subject with a long history and a large literature. Standard texts on modeling include [4][2][11]; note also the new volume on numerical methods [1].

The spreading of an infectious disease involves a (large) population in which an initially small number of individuals are infected. Each infected individual is, for a period of latency $L$, not yet infectious, but at the end of the latent period the individual becomes infectious for a period $I$; models provide distributions for the random variables $L$ and $I$. The rest of the population is susceptible to infection from infectious individuals; this susceptibility is often taken to be a constant but later we shall consider a population with an evolving inhomogeneous distribution of susceptibilities to infection. An infectious individual has random infectious contacts at a rate $\lambda$, and contact with a susceptible individual results in infection, after which the latent period of that individual commences. The so-called basic reproduction number $R_0$ is the product of the rate of infectious contacts and the mean period of infectiousness $\mu_I$, so $R_0=\lambda\mu_I$.

Lloyd [17] discussed the sensitivity of dynamical properties of an epidemic model to the choices of formulation, made use of a gamma distribution for the period of infectiousness and allowed for optional seasonality. Cauchemez and Ferguson [7] introduced a new approach to the analysis of epidemic time series data to take account of partial observation of latency and the temporal aggregation of observed data. They showed that homogeneous standard models can miss key features of epidemics in large populations. Also, Nishiura et al. [19] devised an estimate of reproduction number in terms of coarsely reported epidemic data, showing that an ideal reporting interval is the mean generation time rather than a fixed chronological interval. See also recent work by He et al. [13] for related results on partially observed data and by Miller et al. [18] on general distributions of generation intervals.

Chowell et al. [8] have edited a new collection of articles on mathematical and statistical approaches to epidemic modelling, and Chapter 2 there, by G. Chowell and F. Brauer, gives a detailed study of the basic reproduction number in a variety of epidemic models. Wallinga and Lipsitch [20] addressed the sensitivity of the reproduction number to the shape of the distribution of generation intervals and obtained upper bounds even in the situation of no information on shape.

Recently Britton and Lindenstrand [5] described a model where the period of latency $L$ and the period of infectiousness $I$ have independent gamma distributions. They found that variability in these random variables had opposite effects on the epidemic growth rate. That rate increased with greater variability in $L$ but decreased with greater variability in $I$. Here we extend their result by using the McKay bivariate gamma distribution for the joint distribution of $L$ and $I$, recovering the above effects of variability but allowing for possible correlation, in case it is relevant. One might imagine that, for a disease in which the physical changes during latency lead to longer future infectiousness if the period of their development is longer, the random variables $L$ and $I$ may have a positive correlation. We use methods of stochastic rate processes to obtain explicit solutions for the growth of the epidemic and the evolution of the inhomogeneity and information entropy. This admits a closed analytic solution to the evolution of the distribution of the number of uninfected individuals as the epidemic proceeds, and a concomitant expression for the decay of entropy. The family of McKay bivariate gamma distributions has a tractable information geometry which provides a framework in which the evolution of distributions can be studied as the outbreak grows, with a natural distance structure for quantitative tracking of progress.

## 2 Inhomogeneous Malthusian epidemic models

In their discussion of epidemic modelling, Britton and Lindenstrand [5] highlighted aspects where stochastic features are more important than deterministic ones. In particular, they described the importance of admitting random variables to represent the period of latency $L$ and the period of infectiousness $I$. Their standard susceptible-exposed-infectious-removed (SEIR) epidemic model was elaborated using independent gamma distributions for $L$ and $I$, with means $\mu_L, \mu_I$ and standard deviations $\sigma_L, \sigma_I$. The basic (mean) reproduction number is given by

$$R_0=\lambda\mu_I \tag{1}$$

where $\lambda$ is the rate of infectious contacts and $\mu_I$ is the mean length of infectious period. An epidemic becomes a major outbreak if $R_0>1$, and then the number infected $n_I(t)$ increases exponentially,

$$n_I(t)\sim e^{rt} \tag{2}$$

where the Malthusian parameter $r$ satisfies the equation

$$E\left(e^{-rt}\,\lambda\,\mathrm{Prob}\{L<t<L+I\}\right)=1. \tag{3}$$

Their independent bivariate model expresses $r$ in terms of the parameters of the two gamma distributions. They used means $\mu_L, \mu_I$ and coefficients of variation $\tau_L=\sigma_L/\mu_L$, $\tau_I=\sigma_I/\mu_I$ to deduce

$$r=\frac{R_0}{\mu_I}\left(1+r\tau_L^2\mu_L\right)^{-1/\tau_L^2}\left(1-\left(1+r\tau_I^2\mu_I\right)^{-1/\tau_I^2}\right). \tag{4}$$

Then Britton and Lindenstrand [5] found from numerical analysis of (4) that, at fixed $R_0$, the growth rate $r$ is monotonically decreasing with $\mu_L$ and $\tau_I$ but it is increasing with $\tau_L$. So increased variability in latency period increases the epidemic growth rate whereas increased variability in infectious period decreases the epidemic growth rate.
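Equation (4) is implicit in $r$, so a numerical root is needed in practice. The following is a minimal sketch of one way to obtain it, assuming a plain bisection on equation (4) divided by $r$ (which removes the trivial root $r=0$). The function names and the parameter values in the usage lines are hypothetical illustrations, not values taken from [5].

```python
import math


def gamma_laplace(r, mean, cv):
    """E[exp(-r T)] for a gamma variable T with the given mean and
    coefficient of variation: (1 + r cv^2 mean)^(-1/cv^2)."""
    u = cv * cv
    return math.exp(-math.log1p(r * u * mean) / u)


def malthusian_r(R0, mu_L, tau_L, mu_I, tau_I, iters=200):
    """Positive root r of equation (4), by bisection (needs R0 > 1)."""
    if R0 <= 1:
        raise ValueError("a positive growth rate needs R0 > 1")
    lam = R0 / mu_I  # contact rate, from equation (1)

    def h(r):
        # Right-hand side of (4) divided by r, minus 1.
        u = tau_I * tau_I
        one_minus_lt = -math.expm1(-math.log1p(r * u * mu_I) / u)
        return lam * gamma_laplace(r, mu_L, tau_L) * one_minus_lt / r - 1.0

    # h is close to R0 - 1 > 0 near zero and negative at r = lam.
    lo, hi = 1e-12, lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical illustration: vary one coefficient of variation at a time.
    for tau in (0.5, 1.0, 1.5):
        print("tau_L =", tau, "r =", malthusian_r(2.0, 1.0, tau, 1.0, 1.0))
        print("tau_I =", tau, "r =", malthusian_r(2.0, 1.0, 1.0, 1.0, tau))
```

With both periods exponential ($\tau_L=\tau_I=1$), equation (4) reduces to $(1+r\mu_L)(1+r\mu_I)=R_0$, which gives a convenient check on the solver.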

## 3 Bivariate gamma distribution of periods of latency and infectiousness

The model described here adds to the work of [5] in which they used independent univariate gamma distributions for the periods of latency and infectiousness in an epidemic model that they illustrated with data from the SARS outbreak [21]. They used numerical methods to obtain approximate solutions. Our contribution is to use a bivariate gamma distribution which allows positive correlation between the random variables representing the periods of latency and infectiousness. That could represent a situation where physical changes during the latency period lead to longer future infectiousness if the period of their development is longer. We obtain a closed analytic solution and show that the same qualitative features persist in the presence of such correlation. This makes available the analytic information geometry of the space of probability densities, allowing comparison of possible trajectories for the epidemic against, for example, exponential distributions for periods of infectiousness or of latency, corresponding to underlying Poisson processes.

Somewhat surprisingly, it is rather difficult to devise bivariate versions of Poisson, exponential or, more generally, gamma distributions that have reasonably simple form, and indeed only the Freund bivariate exponential and McKay bivariate gamma distributions seem to have tractable information geometry [3]. The family $M$ of McKay bivariate gamma density functions is defined on $0<x<y<\infty$ with parameters $\alpha_1,\sigma_{12},\alpha_2>0$ and probability density functions, Figure 1,

$$f(x,y;\alpha_1,\sigma_{12},\alpha_2)=\frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1+\alpha_2}{2}}x^{\alpha_1-1}(y-x)^{\alpha_2-1}e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\,y}}{\Gamma(\alpha_1)\Gamma(\alpha_2)}. \tag{5}$$

Here $\sigma_{12}$, which must be positive, is the covariance of $x$ and $y$, and $f(x,y)$ is the probability density for the two random variables $x$ and $y=x+z$, where $x$ and $z$ both have gamma density functions.

We obtain the means, standard deviations and coefficients of variation by direct integration:

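$$\text{Means:}\quad \mu_x=\sqrt{\alpha_1\sigma_{12}},\quad \mu_y=\frac{(\alpha_1+\alpha_2)\sqrt{\sigma_{12}}}{\sqrt{\alpha_1}},\quad \mu_z=\frac{\alpha_2\sqrt{\sigma_{12}}}{\sqrt{\alpha_1}} \tag{6}$$

$$\text{SDs:}\quad \sigma_x=\sqrt{\sigma_{12}},\quad \sigma_y=\sqrt{\frac{\sigma_{12}(\alpha_1+\alpha_2)}{\alpha_1}},\quad \sigma_z=\sqrt{\frac{\alpha_2\sigma_{12}}{\alpha_1}} \tag{7}$$

$$\text{CVs:}\quad \tau_x=\frac{1}{\sqrt{\alpha_1}},\quad \tau_y=\frac{1}{\sqrt{\alpha_1+\alpha_2}},\quad \tau_z=\frac{1}{\sqrt{\alpha_2}} \tag{8}$$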

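The correlation coefficient $\rho$ and the marginal probability density functions of $x$ and $y$ are given by

$$\rho=\sqrt{\frac{\alpha_1}{\alpha_1+\alpha_2}}>0 \tag{9}$$

$$f_1(x)=\frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1}{2}}x^{\alpha_1-1}e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\,x}}{\Gamma(\alpha_1)},\quad x>0 \tag{10}$$

$$f_2(y)=\frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1+\alpha_2}{2}}y^{(\alpha_1+\alpha_2)-1}e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\,y}}{\Gamma(\alpha_1+\alpha_2)},\quad y>0 \tag{11}$$

Figure 2 shows a plot of the correlation coefficient from equation (9). The marginal probability density functions of latency period $x$ and infectiousness period $y$ are gamma with shape parameters $\alpha_1$ and $\alpha_1+\alpha_2$, respectively. Both marginals would be exponential only if $\alpha_1=1$ and $\alpha_1+\alpha_2=1$, which would force $\alpha_2=0$, so the two random variables cannot both arise from Poisson processes in this model.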
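The construction $y=x+z$, with $x$ and $z$ independent gamma variables sharing the scale $\sqrt{\sigma_{12}/\alpha_1}$, suggests a direct way to simulate the McKay distribution and to check equations (6) to (9) empirically. The sketch below is an illustration only: the helper names are hypothetical, and the inversion from $(\mu_x,\tau_x,\tau_y)$ to $(\alpha_1,\sigma_{12},\alpha_2)$ is simply equations (6) and (8) solved for the parameters.

```python
import math
import random


def mckay_params(mu_x, tau_x, tau_y):
    """Solve equations (6) and (8) for (alpha1, sigma12, alpha2)."""
    alpha1 = 1.0 / tau_x ** 2
    alpha2 = 1.0 / tau_y ** 2 - alpha1
    if alpha2 <= 0:
        raise ValueError("need tau_y < tau_x so that alpha2 > 0")
    sigma12 = mu_x ** 2 / alpha1  # from mu_x = sqrt(alpha1 * sigma12)
    return alpha1, sigma12, alpha2


def mckay_correlation(alpha1, alpha2):
    """Correlation coefficient of equation (9)."""
    return math.sqrt(alpha1 / (alpha1 + alpha2))


def sample_mckay(alpha1, sigma12, alpha2, n, seed=0):
    """Draw n pairs (x, y) with y = x + z, x and z independent gammas."""
    rng = random.Random(seed)
    scale = math.sqrt(sigma12 / alpha1)  # common scale of x and z
    pairs = []
    for _ in range(n):
        x = rng.gammavariate(alpha1, scale)
        z = rng.gammavariate(alpha2, scale)
        pairs.append((x, x + z))
    return pairs


def sample_correlation(pairs):
    """Plain Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    a1, s12, a2 = mckay_params(mu_x=2.0, tau_x=1.0, tau_y=0.5)
    pairs = sample_mckay(a1, s12, a2, n=100000)
    print("rho from (9):", mckay_correlation(a1, a2))
    print("sample rho  :", sample_correlation(pairs))
```

Combining (8) and (9) gives $\rho=\tau_y/\tau_x$, so in this family the correlation is fixed once the two coefficients of variation are chosen.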

## 4 Stochastic rate processes

For a detailed monograph on stochastic epidemic models see [1]. We consider here a class of simple stochastic rate processes where a population $N$ of uninfected individuals is classified by a smooth family of time-dependent probability density functions $\{P_t,\ t\geq 0\}$ with random variable $a>0$, having at time $t$ mean $E_t(a)$ and variance $\sigma^2(t)$. This situation was formulated by Karev [14][16] in the following way. Let $l_t(a)$ represent the frequency at the $a$-cohort; then we have

$$N(t)=\int_0^\infty l_t(a)\,da \quad\text{and}\quad P_t(a)=\frac{l_t(a)}{N(t)} \tag{12}$$

$$\frac{dl_t(a)}{dt}=-a\,l_t(a) \quad\text{so}\quad l_t(a)=l_0(a)e^{-at} \tag{13}$$

General solutions for these equations were given in [14], from which we obtain

$$N(t)=N(0)L_0(t) \quad\text{where}\quad L_0(t)=\int_0^\infty P_0(a)e^{-at}\,da \tag{14}$$

$$\frac{dN}{dt}=-E_t(a)\,N \quad\text{where}\quad E_t(a)=\int_0^\infty a\,P_t(a)\,da=-\frac{d\log L_0}{dt} \tag{15}$$

$$\frac{dE_t(a)}{dt}=-\sigma_t^2(a)=(E_t(a))^2-E_t(a^2) \tag{16}$$

$$P_t(a)=\frac{e^{-at}P_0(a)}{L_0(t)} \quad\text{and}\quad l_t(a)=e^{-at}\,l_0(a) \tag{17}$$

$$\frac{dP_t(a)}{dt}=P_t(a)\left(E_t(a)-a\right). \tag{18}$$

Here $L_0(t)$ is the Laplace transform of the initial probability density function $P_0(a)$, and so conversely $P_0(a)$ is the inverse Laplace transform of the population (monotonic) decay solution $N(t)/N(0)$. See [12] for more discussion of the existence and uniqueness properties of the correspondence between probability densities and their Laplace transforms. In this section we shall use $N(t)$ to represent the decreasing population of uninfected individuals as an epidemic grows. In our context of an epidemic model we might view the random variable $a$ as a feature representing susceptibility to infection in the population; in general this distribution will evolve during the epidemic. The model can be reformulated for a vector of populations representing a composite population, with a corresponding vector of distributions and a matrix of variables.
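The identities (15) and (16) can be checked numerically on a small example. The sketch below assumes a finite set of cohorts in place of the continuous variable $a$, so the integrals in (12) to (16) become sums; the function name and the cohort values are hypothetical.

```python
import math


def cohort_state(rates, weights, N0, t):
    """Return (N(t), E_t(a), Var_t(a)) for cohorts decaying as in (13).

    rates   : decay rate a of each cohort
    weights : initial probabilities P_0(a), summing to 1
    N0      : initial population N(0)
    """
    # l_t(a) = l_0(a) exp(-a t), with l_0(a) = N(0) P_0(a)
    l = [N0 * w * math.exp(-a * t) for a, w in zip(rates, weights)]
    N = sum(l)
    P = [li / N for li in l]  # P_t(a) = l_t(a) / N(t)
    mean = sum(a * p for a, p in zip(rates, P))
    var = sum(a * a * p for a, p in zip(rates, P)) - mean ** 2
    return N, mean, var


if __name__ == "__main__":
    rates = [0.5, 1.0, 2.0]
    weights = [0.2, 0.5, 0.3]
    h = 1e-5
    N, mean, var = cohort_state(rates, weights, 1000.0, 1.0)
    N_plus, mean_plus, _ = cohort_state(rates, weights, 1000.0, 1.0 + h)
    N_minus, mean_minus, _ = cohort_state(rates, weights, 1000.0, 1.0 - h)
    # Central differences approximate the time derivatives in (15) and (16).
    print("dN/dt:", (N_plus - N_minus) / (2 * h), " -E_t(a) N:", -mean * N)
    print("dE/dt:", (mean_plus - mean_minus) / (2 * h), " -variance:", -var)
```

The more susceptible cohorts are depleted first, so the mean susceptibility of the remaining population falls at a rate equal to the current variance.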

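It is easy to deduce the rate process for entropy from Karev's model. The Shannon entropy at time $t$ is

$$S_t=-E_t\left(\log P_t(a)\right)=-E_t\left(\log\frac{P_0(a)e^{-at}}{L_0(t)}\right) \tag{19}$$

which reduces to

$$S_t=S_0+\log L_0(t)+E_t(a)\,t. \tag{20}$$

By using $dE_t(a)/dt=-\sigma^2(t)$ from (16), together with (15), the decay rate is then

$$\frac{dS_t}{dt}=-t\,\sigma^2(t). \tag{21}$$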

This result shows how the variance controls the entropy change during quite general inhomogeneous population processes. In fact equation (21) and further related results were given also in subsequent papers [15][16]. We note that the reverse process of population growth may have applications in constrained disordering type situations [9].

### 4.1 Initial growth rate

We follow the method of Britton and Lindenstrand [5] in their §3.2 to compute the initial exponential growth rate of the epidemic from equation (3), which we write for the bivariate case in the form

$$\int_0^\infty\int_x^\infty e^{-r(y-x)}\,\lambda\,f(x,y)\,dy\,dx=1. \tag{22}$$

Here, from [5], $R_0=\lambda\mu_y$ for the average number of infections per infective, so $\lambda=R_0/\mu_y$ is the contact rate; this gives the Malthusian parameter analytically in explicit form as

$$r=\frac{1}{\mu_x\tau_x^2}\left(\left(\frac{R_0}{\mu_y}\right)^{\tau_y^2}-1\right). \tag{23}$$

Thus, (23) gives the monotonic dependence of $r$ on the means and coefficients of variation of the two periods in closed form. Figure 3 and Figure 4 plot typical values from the SARS epidemic [21] as used by Britton and Lindenstrand [5]. Figure 5 and Figure 6 show corresponding contour plots of the infectivity rate. So the bivariate gamma model reveals that the result of [5] for the dependence of growth rate on variability in the periods of latency and infectiousness in the independent case persists also in the presence of correlation between these two random variables. Such a correlation may be relevant in particular applications, when physical changes evolve during the latent period and influence the length of the subsequent infectiousness period.

We can also estimate the evolution of an inhomogeneous distribution of susceptibility $a$ as the population $N(t)$ of uninfected individuals declines with time $t$. For example, the case when the initial distribution $P_0(a)$ is a gamma distribution with parameters $(s,k)$ was solved in [14], giving the result

$$P_t(a)=\frac{P_0(a)}{L_0(t)}e^{-at}=\frac{(s+t)^k a^{k-1}}{\Gamma(k)}e^{-a(s+t)},\quad\text{for time } t\geq 0. \tag{24}$$

Then the time dependences of mean, standard deviation and coefficient of variation are given by

$$\mu_a(t)=\frac{k}{s+t} \tag{25}$$

$$\sigma_a(t)=\frac{\sqrt{k}}{s+t} \tag{26}$$

$$\tau_a(t)=\frac{1}{\sqrt{k}}. \tag{27}$$
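As a sanity check on (24) to (27), one can apply (17) numerically to a gamma initial density and compare the resulting moments with the closed forms. The sketch below assumes a simple midpoint quadrature on a truncated range; the function names, the truncation point and the parameter values are hypothetical choices for illustration.

```python
import math


def gamma_pdf(a, k, rate):
    """Gamma density with shape k and rate parameter `rate`."""
    return rate ** k * a ** (k - 1) * math.exp(-rate * a) / math.gamma(k)


def evolve_numerically(k, s, t, a_max=60.0, n=60000):
    """Apply (14) and (17) to P_0 = gamma(k, s) by midpoint quadrature.

    Returns (L_0(t), mean of P_t, standard deviation of P_t).
    """
    da = a_max / n
    L0 = m1 = m2 = 0.0
    for i in range(n):
        a = (i + 0.5) * da
        w = gamma_pdf(a, k, s) * math.exp(-a * t) * da  # P_0(a) e^{-at} da
        L0 += w
        m1 += a * w
        m2 += a * a * w
    mean = m1 / L0
    sd = math.sqrt(m2 / L0 - mean ** 2)
    return L0, mean, sd


if __name__ == "__main__":
    k, s, t = 2.0, 1.0, 3.0
    L0, mean, sd = evolve_numerically(k, s, t)
    print("L_0(t):", L0, " closed form:", (s / (s + t)) ** k)
    print("mean  :", mean, " equation (25):", k / (s + t))
    print("sd    :", sd, " equation (26):", math.sqrt(k) / (s + t))
    print("cv    :", sd / mean, " equation (27):", 1 / math.sqrt(k))
```

The shape parameter $k$ is unchanged by the evolution, which is why the coefficient of variation in (27) stays constant while the mean and standard deviation both shrink like $1/(s+t)$.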

From (21), we can see that the rate of entropy decrease is greater for more variability in susceptibility.

## 5 Information geometry of the space of McKay bivariate gamma distributions

Information geometry of the smooth family $M$ of McKay bivariate gamma probability density functions, which is of exponential type, has been studied in detail in [3], Chapter 4. This provides a Riemannian metric on $M$, yielding a curved 3-manifold, so the affine immersion is a 3-dimensional object in $\mathbb{R}^4$ which we can only represent in $\mathbb{R}^3$ through its 2-dimensional submanifolds. Here we illustrate how the geometry may nevertheless be used to provide a natural distance structure on the space of the McKay distributions used in our epidemic model.

First we measure distances from distributions with exponential marginal distributions, that is, those for which $\alpha_1=1$, when the latency periods are controlled by a Poisson event process.

The derivation of a distance from a reference distribution $T_0$ is given in [3], and yields in terms of $\tau_x$ and $\rho$

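$$\begin{aligned} E_M(\tau_x,\rho)\big|_{[T_0:\,\alpha_1=1]} = {} & \frac{(\rho^2+1)^2}{16\rho^6}\left|\frac{1}{\tau_x^2}-1\right| \\ & + \frac{1}{4}\left|\left(1-\frac{1}{\tau_x^2}\right)\left(1-\frac{1}{\rho^2}\right)+3\log(\tau_x^2)\right| \\ & + \left|\psi\left(\frac{1}{\tau_x^2}\frac{1}{\rho^2}-1\right)-\psi\left(\frac{1}{\rho^2}-1\right)\right| \\ & + \left|\psi\left(\frac{1}{\tau_x^2}\right)+\gamma\right| \end{aligned} \tag{28}$$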

where $\psi$ is the digamma function and $\gamma$ is the Euler gamma constant, with numerical value about 0.577. Figure 7 shows a plot of $E_M(\tau_x,\rho)$ from equation (28). This is an approximation to the Riemannian distance but it represents the main features of the information distance of arbitrary latency period distributions from the curve of distributions with $\alpha_1=1$.

Repeating the above procedure for the case when $T_0$ has $\alpha_1+\alpha_2=1$, which corresponds to an exponential infectiousness period distribution (and a Poisson process of infections), we obtain

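$$\begin{aligned} E_M(\alpha_1,\alpha_2)\big|_{[T_0:\,\alpha_1+\alpha_2=1]} = {} & \left|\psi(\alpha_2)-\psi(1-\alpha_1)\right| \\ & + \frac{1}{4}\left|\frac{(2\alpha_1+\alpha_2)^2}{4\alpha_1}-\frac{1}{2}(\alpha_1+1)\right|. \end{aligned} \tag{29}$$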

This is plotted in Figure 8. The two graphics, Figures 7 and 8, show how we can depict the parameters in the joint distribution of periods of latency and infectiousness as surfaces of distance, measured from the two reference cases for the evolution of the epidemic starting from Poisson processes, respectively. On such surfaces could be represented data on the progress of epidemics under different intervention schemes, or simulations of such scenarios.

Geodesic curves in Riemannian manifolds give minimal arc length and examples are given in [9] for manifolds of Weibull, gamma and McKay bivariate gamma distributions, together with gradient flow curves for entropy. More details of the information geometry of uniform, exponential, gamma, Gaussian, and bivariate versions with applications are provided in [3].

Britton and Lindenstrand [5] highlighted aspects where stochastic features are important and used independent gamma random variables to represent inhomogeneity of latency and infectiousness periods. In this paper we have a bivariate inhomogeneous epidemic process, modeled by correlated gamma distributions, and we can use similar methods to depict and quantify departures from exponential periods of latency and infectiousness. This shows that the result of [5] for the dependence of growth rate on variability in the periods of latency and infectiousness in the independent case persists also in the presence of correlation between the two random variables, Figures 3 and 4. Moreover, the information theoretic distance from the two reference scenarios of exponential distributions of periods of latency and infectiousness, Figures 7 and 8, provides natural quantitative representations for comparing different parametric data.

Britton and Lindenstrand [5] used independent gamma distributions for periods of latency and infectiousness, from which the reproduction rate can be estimated, with applications for example to the SARS outbreak [21]. Here we have used a bivariate gamma distribution which allows a corresponding reproduction rate to be computed. Also, we considered the case when the susceptibility to infection is not uniform and illustrated it with the case when it begins as a gamma distribution and then evolves as the epidemic proceeds. Other models could be used for the initial distribution of susceptibilities, including asymmetric distributions. A wide range of such other cases using log-gamma distributions is considered in [10] for a similar rate process applied to an evolutionary model when the random variable represents unfitness (like susceptibility to infection) in a population.

## References

• [1] H. Andersson and T. Britton. Stochastic epidemic models and their statistical analysis. Lecture Notes in Statistics, Springer-Verlag, New York, Berlin 2000.
• [2] R.M. Anderson and R.M. May. Infectious Diseases of Humans: Dynamics and Control, Oxford University Press, Oxford 1991.
• [3] Khadiga Arwini and C.T.J. Dodson. Information Geometry Near Randomness and Near Independence. Lecture Notes in Mathematics, Springer-Verlag, New York, Berlin 2008.
• [4] N.T.J. Bailey. The Mathematical Theory of Infectious Diseases and its Applications. Griffin, London 1975.
• [5] T. Britton and D. Lindenstrand. Epidemic modelling: aspects where stochasticity matters. Mathematical Biosciences 222, 2 (2009) 109-116. Cf. also http://arxiv.org/abs/0812.3505 3 January 2009.
• [6] Y. Cai, C.T.J. Dodson, O. Wolkenhauer and A.J. Doig. Gamma Distribution Analysis of Protein Sequences shows that Amino Acids Self Cluster. J. Theoretical Biology 218, 4 (2002) 409-418.
• [7] S. Cauchemez and N.M. Ferguson. Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. J. Royal Soc. Interface 5 (2008) 885-897.
• [8] Gerardo Chowell, James M. Hyman, Luis M. A. Bettencourt and Carlos Castillo-Chavez Editors. Mathematical and Statistical Estimation Approaches in Epidemiology. Springer Dordrecht, Heidelberg, London, New York, 2009.
• [9] C.T.J. Dodson. On the entropy flows to disorder. In C. H. Skiadas and I. Dimotikalis, (Eds.), Chaotic Systems: Theory and Applications World Scientific, Singapore, 2010 pp 75-84. http://arxiv.org/abs/0811.4318
• [10] C.T.J. Dodson. An inhomogeneous stochastic rate process for evolution from states in an information geometric neighbourhood of uniform fitness. Invited paper at 3rd Conference on Information Geometry and its Application, Leipzig 2-6 August 2010. Cf also: http://arxiv.org/abs/1001.4177v1
• [11] O. Diekmann and J.A.P. Heesterbeek. Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation, John Wiley, Chichester 2000.
• [12] W. Feller. An Introduction to Probability Theory and its Applications, Volume II, 2nd Edition, Wiley, New York 1971.
• [13] D. He, E. L. Ionides, and A. A. King. Plug-and-play inference for disease dynamics: measles in large and small populations as a case study. J. Royal Soc. Interface (2009) Online doi:10.1098/rsif.2009.0151.
• [14] G.P. Karev. Inhomogeneous models of tree stand self-thinning. Ecological Modelling 160 (2003) 23-37.
• [15] G.P. Karev. Replicator equations and the principle of minimal production of information. Bulletin Mathematical Biology 72, 5 (2010) 1124-1142. http://arxiv.org/abs/0901.2378
• [16] G.P. Karev. On mathematical theory of selection: continuous time population dynamics. Journal Mathematical Biology 60 (2010) 107-129.
• [17] A.L. Lloyd. Destabilization of epidemic models with the inclusion of realistic distributions of infectious periods. Proc. Royal Soc. Lond B 268 (2001) 985-993.
• [18] Joel C. Miller, Bahman Davoudi, Rafael Meza, Anja Slim and Babak Pourbohloul. Epidemics with general generation interval distributions. Preprint, 2009. http://arxiv.org/abs/0905.2174v2.pdf
• [19] H. Nishiura, G. Chowell, H. Heesterbeek and J. Wallinga. J. Royal Soc. Interface (2009) Online doi: 10.1098/rsif.2009.0153.
• [20] J. Wallinga and M. Lipsitch. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc. Royal Soc. B 274 (2007) 599-604.
• [21] WHO. Cumulative Number of Reported Probable Cases of Severe Acute Respiratory Syndrome (SARS). http://www.who.int/csr/sars/country/en/