Bayesian inference for age-structured population model of infectious disease with application to varicella in Poland

# Bayesian inference for age-structured population model of infectious disease with application to varicella in Poland

Piotr Gwiazda Błażej Miasojedow Magdalena Rosińska Institute of Mathematics, Polish Academy of Sciences Faculty of Mathematics, Informatics and Mechanics, University of Warsaw National Institute of Public Health - National Institute of Hygiene, Warsaw
###### Abstract

The dynamics of the infectious disease transmission are often best understood by taking into account the structure of population with respect to specific features, for example age or immunity level. The practical utility of such models depends on the appropriate calibration with the observed data. Here, we discuss the Bayesian approach to data assimilation in the case of a two-state age-structured model. Such models are frequently used to explore the disease dynamics (i.e. force of infection) based on prevalence data collected at several time points. We demonstrate that, in the case when the explicit solution to the model equation is known, accounting for the data collection process in the Bayesian framework allows us to obtain an unbiased posterior distribution for the parameters determining the force of infection. We further show analytically and through numerical tests that the posterior distribution of these parameters is stable with respect to a cohort approximation (Escalator Boxcar Train) of the solution. Finally, we apply the technique to calibrate the model based on observed sero-prevalence of varicella in Poland.

###### keywords:
Age-structured population model, Bayesian inverse problem, Infectious disease dynamics
journal: arXiv111Piotr Gwiazda received funding from the National Science Centre (Poland) grant 2015/18/M/ST1/00075.222Magdalena Rosińska received funding from the National Science Centre (Poland) grant DEC-2012/05/E/ST1/02218

## 1 Introduction

Application of mathematical modelling of natural phenomena has proved to be very useful in many areas including population dynamics and transmission of infectious diseases. The practical value of such models depends heavily on the assumptions made while developing the model are realistic, whether it also depends on the assimilation of real data into the model to inform of model parameters.

The populations, which are heterogeneous with respect to some individual property, are often described with nonlinear first order hyperbolic equations (structured population models). In the models describing the epidemic processes in human population, examples of such parameters may include age, time from infection or the level of immunity induced by past infection or vaccination. For example, evolving demographic structure has an impact on infectious disease transmission. It has been observed that the long term evolution of the dynamics of infectious disease is highly dependent on demographic transitions; a change of age structure changing from a young population model an to aging model is typical for developed counties.

The classical model of infectious diseases was introduced by Kermack and McKendric Kermack1991 and variations og it also were studied by e.g. Reddingius Reddingius1971 , Metz Metz1978 , Iannelli Iannelli1995 , Diekmann Diekmann1977 ; Diekmann2013 , Thieme Thieme1977 . These models consider, for example, variable infectivity and variable susceptibility to infection. In particular, the infectivity often depends on the time from infection and susceptibility to infection - on immunity acquired from past infection waning in time.

A similar model, but with an age structure instead of time-since-infection, was considered by Iannelli1995 ; Diekmann2013 .
If only two states are considered, i.e. the susceptible and those who have ever been infected, the model simplifies to:

 ∂tq(t,a)+∂aq(t,a)=−λ(t,a)q(t,a) for (t,a)∈R×R+. (1)

In this model represents the proportion of susceptible individuals of age at a time point . If we assume that all individuals are susceptible at birth this equation may be supplemented with boundary condition:

 q(0,t)=1forallt∈R. (2)

Note that in this problem no initial condition is needed.

This simple model has received particular attention due to its usefulness in epidemiological applications. It captures the situation when the disease occurs with age and time dependent frequency , but in an individual we are only able to distinguish whether or not the disease has already occured. The model was applied to infectious diseases e.g. toxoplasmosis Aedes1993 ; Marschner1997 ; Marschner1996 , HIV Hallett2008 ; Mahiane2012 , hepatitis A Keiding1991 ; Shkedy06 , rubella, mumps, varicella Shkedy06 , tuberculosis Nagelkerke1999 and non-infectious diseases, e.g. diabetes Keiding1991 ; Bell2013 , myasthenia gravis Keiding1991 or dementia Brinks2015 . In the case of infectious diseases which confer long lasting immune response, a marker of past or ongoing infection can be found, i.e. the measurable serum level of antibodies. Seroprevalence studies are quite often performed and these data can be assimilated into this model for further applications, including simulation studies and predictions. The parameter itself may be of interest describing a quantity which is often difficult to measure directly, but is important from epidemiological point of view; the force of infection. So far, the estimation methods for rely on maximizing an appropriately constructed likelihood function as outlined in NielHens2012 . Construction of the data model required both evaluation of the solutions of equation (1) as well as the ability to account for the data aggregation process. The approach proposed in prior studies involves consideration of equation (1) on characteristics, i.e. birth cohorts , where b represents the time of birth and a is the age. With such parametrisation equation (1) can be rewritten as:

 dqb(a)da=−lb(a)qb(a), (3)

where and .
Equation (3) can be solved for :

 lb(a)=−q′b(a)qb(a)=π′b(a)1−πb(a), (4)

where denotes the prevalence of the antibodies.

Considerable research has been carried out to find a flexible method of modelling , witch work on both parametric and non-parametric approaches. These are usually based on factorization: . Additionally, they are based on a form of , which allows the construction of a general linear model for , which in turn would be estimable from the available data NielHens2012 . The problem of data aggregation is tackled either by assuming a piecewise constant force of infection on age-time boxes relevant for the data resolution Hallett2008 ; Shkedy06 , or using a mid-point value of the solution on the characteristic for the aggregation interval, as in Brunet1999 .

In this paper we propose a Bayesian approach to estimate the equation parameter based on available data. The Bayesian approach offers a flexible way to recover the full posterior distribution over the parameter, avoiding difficulties of estimating confidence intervals, through error propagation techniques. Acknowledging that previously the cohort formulation was commonly used in applications, we show how the approximation of the continuous case by the cohorts is reflected in the distance between the posterior distributions of the parameters. This distance depends in general on how densely the population is divided into the birth cohorts. When only few cohorts are considered, which has been the case in applications so far, this can lead to considerable bias in the posterior distribution in the continuous case.

We further note that equation (1) is a special case of structured population models. These models are often used in theoretical biology for a wide variety of models including evolution of populations, infectious diseases or cellular growth, see e.g. Iannelli1995 ; Roos2013 ; MR2914262 . For the simple model defined by equation (1) with boundary condition (2) it is possible to find an explicit formula for the solution and employ it directly in the Bayesian inverse problem. Thus, the birth cohort approach can be viewed just as an alternative way of modelling the process. However, our aim is not only to tackle this particular problem but also to propose a general method, which could be applied for general structured population model. For example, if we want to extend the model (1) to incorporate vertical transmission, we should include more complicated boundary conditions, e.g.:

 q(t,0)=∞∫0β(a)[1−q(t,a)]da. (5)

In this case we cannot solve this system explicitly and therefore we have to rely on an approximation scheme. For this aim, let us recall a recently developed framework for the analysis of structured population dynamics in the spaces of nonnegative Radon measures with a suitable metric which provides a rigorous tool to study numerical approximations of the system. One example of a such numerical algorithm widely applied in theoretical biology is Escalator Boxcar Train (EBT) Roos1988 . The approach is based on the idea of tracing growth and the transport of measures which approximate the solution of the original partial differential equation. These measures are defined as sums of Dirac measures, each one of which represents the average state and number of individuals within a specific group. In terms of population studies the concept corresponds to following birth cohorts over time. We remark that when applying this technique in a Bayesian inverse problem it is possible to find an approximate posterior measure for the equation parameter.

The distance of this measure from the posterior measure for the original problem will be related to the error of the EBT or similar particle approximation, depending on the model equation. In the recent papers (Carrillo2012 ; MR2746205 ; MR2644146 ), theoretical results on the stability of the solution (stability of the semigroup) to the general form of the structured population equation

 ∂tμ+∂x(b(t,μ)μ)+c(t,μ)μ=∫R+(η(t,μ))(y)dμ(y).

with respect to time, initial data and the model coefficients in bouded Lipschitz distance were proved (see e.g. Theorem 2.11 Carrillo2012 ). These results enabled confirmation both analytically and also in computational experiment of the stability of the particle methods as well as the first order of convergence of these methods (Carrillo2014 , see Theorem 3.2 and Section 4 for the numerical experiments). For stability results for EBT method see e.g. Brannstrom2013 ; jablonskiulikowska .

Firstly, in the section 2 we introduce the probabilistic model for the seroprevalence data - the data describing individuals as having or not been infected in the past. In this model we account for the process of data acquisition, including aggregation into cells or subsamples characterized by the age and time of test. Both these variables are recorded up to some precision, e.g. one year. The algorithm of sampling from the posterior distribution is then introduced, allowing for the aggregation process with an application of pseudo-marginal Monte Carlo Markov Chain (MCMC) Andrieu2009 . The next section is devoted to cohort discretization and relating the posterior distribution obtained with the cohort approach, to the continuous case. We show the rate of convergence of the posterior distribution in the cohort case to the continuous case with the number of cohorts in the approximation and illustrate this on a simulated dataset. Finally, in the last section we apply the method to a real dataset available for varicella in Poland.

## 2 Bayesian inference for the model (1)

### 2.1 Data model

We first describe the seroprevalence data. This type of data characterizes individuals who have been tested to establish if they have ever had contact with a disease or not. The observations are generally of the form , where is a random variable indicating whether the person in sample has had contact with the disease, at exact test time, and exact age at test, . We denote the total number of individuals in the sample by . Let us assume that:

 P(Yij=1|tij,aij)=q(tij,aij).

The function is the solution of the equation (1) supplemented with the boundary condition (2).

The data collection system aggregates data with respect to age and time of testing into subsamples , with some possibility of misclassification. This collection and aggregation process will be represented by the family of functions . The function is the probability density function of distribution of time of test and age at test in subsample . We assume that data collection process, at least in short time intervals, was random with respect to test time and age at test, so if no misclassification was present, the should be uniform distribution on a product of time and age intervals. However, due to uncertainty of age and time ascertainment it is smoothed on the boundary of the box.

Let us define as:

 (6)

then is distributed according to the binomial distribution . To be able to use standard Bayesian parametric inference we assume that is fully described by a finite dimensional parameter . We also assume that it is possible to factorise the force of infection: . According to the observed data, the incidence (rate of new infections) of many childhood infectious diseases such as varicella is periodic in time. Therefore, it seems to be relevant to assume that the function is periodic in time. In subsequent sections for practical application we will use a function such as , and will be a piecewise constant with values , . is than a vector of unknown parameters. In the next part of the paper, we add indices , to denote explicitly the dependence on . Next, let us denote the likelihood of the data by . To complete the description of the Bayesian model we need to set a prior distributions on , denoted by . The posterior distribution is than proportional to:

 π(θ|Y)∝L(θ|Y)f(θ). (7)

### 2.2 Monte Carlo Markov Chain (MCMC) algorithm

Typically, it is not possible to obtain an analytic form of the joint posterior distribution and a sample from this distribution is obtained by sampling the stationary state of a Markov Chain, for which the transition probability distribution depends on the right-hand side of equation (7). The standard MCMC algorithms, however, require computation of the right-hand side of equation (7). Consequently, in our case, a standard MCMC algorithm cannot be used directly due to the fact that is defined by the integral of the solution to a PDE, which typically cannot be computed analytically. Therefore to sample from posterior distribution of we use a pseudo-marginal approach Andrieu2009 . This algorithm still assumes that the solution of the PDE can be computed analytically, but it resolves the integration issue. The pseudo-marginal MCMC approach assumes existence of an unbiased, positive estimator of likelihood function, , which is used to introduce an auxiliary target of form

 π(θ,u)∝^L(θ|Y)f(θ)p(u), (8)

where is a random variable with a distribution which satisfies

 E[^L(θ|Y)]=∫^L(θ|Y)p(u)du=L(θ|Y).

Clearly the marginal distribution of is exactly . Therefore, if we are able to generate an ergodic Markov chain with stationary distribution then the sequence has the correct stationary distribution. In Algorithm 1 we describe the pseudo-marginal random walk Metropolis algorithm. Note that the only difference in comparison with the standard random walk Metropolis is that the true likelihood function is replaced by an unbiased estimator.

We propose the following procedure to obtain an unbiased, positive estimator of the likelihood function. Consider a sequence of independent random variables for and where is the number of subsamples in the model and is an arbitrary integer. We define an unbiased estimator of by

 ^pθ,j,i=1MM∑m=1qθ(Tj,m,Aj,m), (9)

for . Next we define by

 ^L(θ|Y)=∏jNj∏i=1^p1(i≤Yj)θ,j,i(1−^pθ,j,i)1(i>Yj). (10)

Clearly, by construction, is positive and . The choice of is crucial for the efficiency of the pseudo-marginal MCMC. Small values of lead to high variance of and consequently poor mixing of the Markov chain. High values of can lead to exhaustive computation of . Further, this procedure relies on an explicit solution for the PDE given by (1). In the general case when we are not able to compute the solution of the PDE analytically, the solution is obtained by an approximation scheme. This leads to an important theoretical question of stability of the posterior distribution with respect to the approximation. In the next section we show the relevant result for one of the commonly used approximations for the structured population models Roos2013 .

## 3 Discretization of the PDE constrain equation

Before we present our approach, we discuss the existing contributions to the Bayesian inverse problems with PDE constraints. Conjunction of differential equations and data gives rise to a range of inverse problems, attracting attention of both applied and theoretical research. In applications the interest often lies in solving the inverse problems given the observed data. There are considerable contributions to this field concerning the framing of inverse problems in a Bayesian perspective Stuart2010 ; Dashi2016 . The most studied model for data, , is given by:

 y=G(θ)+η, (11)

where is the unknown parameter of interest and is a random variable with mean , representing the observational noise. in turn is the observation’s operator, relating the observed quantities to the model. For more complex models, evaluation of i nvolves solving a system of differential equations. We also assume that additional information is available through the prior distribution .

Well-posedness of infinite dimensional inverse problems depends on both the properties of the PDE problem and the choice of the prior distribution. Much of the research in the Bayesian approach to infinite-dimensional inverse problems has been carried out for Gaussian priors. In general, the prior has to be chosen so that a function drawn from it is sufficiently regular Stuart2010 ; Lasanen2012 . Moreover, the practical interest often lies in sampling from the posterior distribution, for which specific algorithms are designed. To be able to implement such algorithms, finite-dimensional approximations are considered.

The Bayesian approach to inverse problems has been applied to a wide range of problems arising from the models used in physics, geology and atmospheric sciences, see e.g. Lasanen2012a ; Cotter2010 ; Bui-Thanh2013 ; Dwight2010 and in particular the review paper by Stuart Stuart2010 and the references where in. These studies are usually base on Gaussian priors and also assume Gaussian-noise. Noise does not, however, always follow the Gaussian distribution and the wrong noise distribution may lead to poor performance of the estimators. Even though specific cases of non-Gaussian noise are considered by Lasanen Lasanen2012 ; Lasanen2012a , the analysis is restricted only to the model with additive noise, in the general form of (11). Under certain assumptions, the Helinger distance between the posterior distributions obtained in the approximate problem and the original problem is bounded in relation to the error of approximation Stuart2010 ; Lasanen2012a ; Cotter2010 ; Bui-Thanh2013 ; Dwight2010 . For the definition and elementary properties of the Hellinger distance we refer to e.g. Definition 6.35 in Stuart2010 .

Here we present an application of Bayesian data assimilation to population epidemiological models. Due to the limited numbers of individuals in populations, it is unlikely that the observational noise is follows the Gaussian distribution. Rather than using additive Gaussian noise as in (11), which is appropriate for a model of measurement of continuous quantities in physics, we model counts as arising from binomial or Poisson distributions, the parameters of which are related to the mathematical models describing the process of population growth and the spread of infection in the population. In addition, when dealing with a human population, the data are usually naturally aggregated. For example, the desired characteristics are measured for an individual of certain age at a specific time point, but the age and time are recorded up to some precision, most commonly a year. In effect the available data originate from distributions, the parameters of which are defined by integrals of the solution to the model equation. In consequence our data model has a different structure than (11):

 n∑i=1Yi∼Bin(G(θ),n), (12)

where are binary data and is a Binomial distribution with n trials and probability of success .

Similarly to the results presented above, we define with the help of an infinite-dimensional partial differential equation problem, the equation (1) with boundary condition (2). As noted above, this particular problem has explicit solution. However, if we consider the boundary condition (5), we have to introduce an approximation technique, e.g. using the concept of cohorts as in EBT.

### 3.1 Cohort approach

When describing the evolution of a population it is often useful to group individuals into cohorts. Such cohorts consist of persons that share a certain feature. It is assumed that this feature does not change over time, so that the evolution of a cohort can be followed together. The natural grouping in population studies is by time of birth (birth cohorts). If the resolution of the time of birth is high enough then we may expect that the solution for the cohort model approximates the solution for the continuous model. First we construct the cohorts by dividing the birth time into a countable set of disjoint intervals, , of equal length, , such that , and , .

The cohort version of the equation 1 will be a set of ODE’s associated with the choice of , for all :

 aϵi(t) =t−xϵi for t≥xϵi mϵθ,i(0) =∫Iϵiqϵθ(t,0)dt (13) ddtmϵθ,i(t) =−λθ(t,aϵi(t))mϵθ,i(t) for t≥xϵi.

We define .
Remark: Note that is a distributional solution to (1) with boundary data , which approximates the boundary condition (2), i.e. .
Remark: For the boundary condition (5) the system is not explicitly solvable since the boundary condition is dependent on the solution itself. In a version of EBT, or in fact very similar in this context splitting method, we use the following approximation (for a full description see Roos1988 ; GwiazdaEtAl ; Carrillo2012 ; CaGwUl2013 ):

 mϵθ,i(0)=ϵ+∞∑j=1β(ϵ⋅j)mϵθ,i−j(xϵi). (14)

In this case we need information on the initial distribution of the age profile of .

### 3.2 Stability of posterior probability distribution of θ with respect to the cohort approximation of the PDE constrain

Let us consider the model (1). We note that the posterior probability distribution of for the exact solution of equation (1) , and the approximate solution assuming common prior distribution are described by:

 π(θ|Y) ∝ΠjpYjθ,j(1−pθ,j)Nj−Yjf(θ) (15) ^πϵ(θ|Y) ∝Πj(^pϵθ,j)Yj(1−^pϵθ,j)Nj−Yjf(θ), (16)

where:

 pθ,j =∫R+×RΨj(t,a)qθ(t,a)dtda ^pϵθ,j =∫R+∑i∈ZΨj(t,aϵi(t))mϵθ,i(t)dt

and determines the function (the force of infection) and describes the data collection and aggregation process. The definitions of distances between measures and related basic facts are presented in A.

###### Theorem 1.

In the model (1) with boundary condition (2) or (5) let , , for all and , a compact set. Moreover, we assume the following about the set of parameters and the function .

• For every compact set :

 supθ∈Hsup(a,t)∈Kλθ(t,a)<+∞.
• For every compact set :

 infθ∈Hinf(t,a)∈Kλθ(t,a)>0.
• is Lipschitz continuous.

then

 C⋅W1(π(θ|Y),^πϵ(θ|Y))≤∥π(θ|Y)−^πϵ(θ|Y)∥TV≤O(ϵ),

where denotes the Wasserstein distance.

Remark: In the model (1) with boundary condition (2) we are able to find an explicit solution. However, in this simple example we present the method and illustrate the type of structural and regularity conditions needed to guarantee that and are strictly separated from 0 and 1.

###### Proof.

For simplicity we conduct the full proof only for boundary condition (2). The proof for boundary condition (5) is analogous. It requires application of the stability result of the approximation in bounded Lipschitz distance presented in GwiazdaEtAl ; Carrillo2012 .
Let us consider the measures and defined by:

 ν(A) =∫A^πϵ(θ)f(θ)dθ μ(A) =∫Aπ(θ)f(θ)dθ.

The first inequality is obvious for the compact sets (see A). To prove the second inequality, we need to bound the total variation distance between measures and . We have

 ∥μ−ν∥TV =sup∥Φ∥∞≤1∫Φ(θ)d(μ−ν)(θ) ≤∥∥^πϵ(θ)|ν|−π(θ)|μ|∥∥∞ ≤1|μ|∥^πϵ(θ)−π(θ)∥∞+1|μ||ν|||μ|−|ν||∥^πϵ(θ)∥∞ ≤1|μ|CLip|pθ,j−^pϵθ,j|+1|μ||ν|CLip|pθ,j−^pϵθ,j|,

where is the Lipschitz constant of the polynomial defining with respect to . On the other hand,

 |pθ,j−^pϵθ,j|=∣∣ ∣∣∫R+×RΨj(t,a)qθ(t,a)dtda−∫R+∑i∈ZΨj(t,aϵi(t))mϵθ,i(t)dt∣∣ ∣∣.

Note that we can rewrite:

where and . Additionally for a fixed we can write:

 ∫RΨj(t,a)qθ(t,a)dt=∑i∈Z∫lϵi+1(a)lϵi(a)Ψj(t,a)qθ(t,a)dt,

where Therefore:

 |pθ,j−^pϵθ,j|≤∫R+∣∣ ∣∣∑i∈Z∫lϵi+1(a)lϵi(a)Ψj(t,a)qθ(t,a)dt−∑i∈ZΨj(tϵi(a),a))~mϵθ,i(a)∣∣ ∣∣da.

Note that:

Moreover:

 ∣∣ ∣∣∑i∈ZΨj(tϵi(a),a))~mϵθ,i(a)−∑i∈ZΨj(tϵi(a),a))∫lϵi+1(a)lϵi(a)qθ(t,a)dt∣∣ ∣∣≤∥Ψj∥∞∑{i∈Z:[lϵi(a),lϵi+1]∩suppΨj}∣∣ ∣∣~mϵθ,i(a)−∫lϵi+1(a)lϵi(a)qθ(t,a)dt∣∣ ∣∣≤∥Ψj∥∞Cϵ.

This last inequality follows from the Lipschitz continuity of and the exact formulae for solutions.

Remark: Note that the approximated solution is defined for cohorts evolving with time. Therefore it is not possible to evaluate the solution at an arbitrary point to obtain the estimate provided by formula (9). It is however possible to sample the time points from the marginal distribution of and obtain an unbiased estimate of in a similar way to (9) as:

 ˆ^pϵθ,j,i=1MM∑m=1∑i∈ZΨj(Tj,m,aϵi(Tj,m))mϵθ,i(Tj,m). (17)

### 3.3 Numerical tests posterior distribution stability

In order to illustrate the rate of convergence of the cohort approximation in the Wasserstein distance, we selected a simple model with one parameter. In order to challenge the algorithm with a model related to the one which we plan to apply to real data in the following section (periodic force of infection) we choose the parameter describing the period to be the basis of the toy example. Moreover, the period and the width of the box have been chosen so that the period of the wave is twice the width of the box and the sine wave takes the same values for all box borders, this together corresponds to the most challenging case for recognition of the parameter by the cohort approximation. We constructed the toy example as follows.

In equation (1) supplemented with boundary condition (2) we choose the force of infection , where is an unknown parameter. The choice of periodic function is motivated by the fact that the incidence of many childhood infections tend to follow a periodic pattern. In this particular case, the solution is given by

 qγ(a,t)=exp{−20[1.1a−1γ[cos(γt)−cos(γ(t−a))]]}. (18)

We simulate the data assuming , with regularized (piecewise afine) uniform distribution on six rectangles which corresponds to taking six subsamples. Precisely

 Ψ1(a,t) =ψ(20a)ψ(t) (19) Ψj(a,t) =ψ(20a)ψ(t−j+1),

with

 ψ(x)=⎧⎪ ⎪ ⎪⎨⎪ ⎪ ⎪⎩0if x<−0.01 or x>1.0150x+0.5if x≥−0.01 and x<0.011if x>0.01 and x<.9950(1−x)+0.5if x≥.99 and x<1.01. (20)

Each of the two subsamples consists of individuals. The subsamples were generated in two steps. Firstly, the ages and test times, , and were sampled for the individuals in each of the subsamples according to the distributions to reproduce the sampling process. Then the result of the test was sampled with probability . We simulate a dataset with and we set as the prior for .

In this toy model we analyse how the error between the posterior distribution for the exact solution to PDE, defined by equation (15), and the posterior distributions corresponding to the cohort approximations, defined by equation (16), depends on the number of cohorts. In particular we want to confirm in the numerical experiment that the convergence is of first order, which is analytically shown in the Theorem 1.

Next for we run MCMC Algorithm 1 for the true posterior and for cohort approximations with the number of cohorts equal to , where . For each case we run a pseudo-marginal RWM of length per each, with Gaussian proposal with standard deviation and with . The chosen proposal corresponds to an acceptance rate of around in all cases.

In figure 1 we present approximations of the density of the unknown parameter . For clarity of presentation we omit densities for cohorts. We observe that the true posterior concentrates around the true value of the parameter. The approximation by cohorts for a small number of cohorts is highly biased but converges quickly to the true posterior. For cohorts, the denisty of the true posterior and of cohort approximation are almost the same.

We define the error of approximation, as a distance between the reference (exact) solution and the approximated solution at the level of approximation . The order of convergence is then given by:

 q:=limϵ→0log(Err(2ϵ)/Err(ϵ))log2.

To estimate the Wasserstein distance between the true posterior and the posterior approximated by cohorts we use an algorithm from jablonski2013efficient applied to empirical measures given by the MCMC algorithm. Setting , for , we expect the order of convergence to approximate 1. The order of the convergence for several elements in this sequence in the Wasserstein metric, for the toy model described above, are given in Table 1. We note that for the Monte Carlo error (i.e. the error of the numerical approximation of the continuous posterior distribution by the empirical distribution) dominates over the error of approximation by cohorts, which distorts the order of convergence.

Moreover, the posterior mean and standard deviation estimators obtained from the cohort approximation are biased, but the bias tends to 0 as the number of cohorts goes to . In our toy example this error stabilises for more than 16 cohort, roughly when the Monte Carlo error becomes dominant.

## 4 Application to real data: varicella in Poland

Varicella or chickenpox is a viral disease which typically occurs in childhood with peak incidence at the age of 4 - 5, when children enter preschool or school. In the absence of immunisation programs, the majority of the population contracts the disease by adolescence or early adulthood Bonanni2009 . Once the infection takes place it confers life-time immunity and secondary infections do not generally ocur. Vertical infections occur occasionally, when a susceptible mother is infected during pregnancy, but are rare due to the universal immunity among adults. The biological marker of past infection exists for this disease, i.e. the presence of antibodies, although this marker may not be ideal due to transfer of maternal antibodies to the foetus. The level of maternal antibodies gradually decreases in the child and over 90 of children clear them by the end of the th month of life. Varicella occurs naturally in short cycles of about 3 - 4 years on top of longer cycles of approximately years as shown for Polish data on Figure 2. These short fluctuations are generally driven by the accumulation of susceptible individuals and the consequent compensatory epidemic, whereas the long cycle coincides with long-term periodic changes of the birth rate resulting in higher or lower proportion of young children in the population. The natural cycles are supressed when immunisation programs with substantial coverage are introduced. The routine vaccination programs,however, do not currently exist in Poland. The varicella vaccine has beem recommended since 2002, but not performed routinely, resulting in low uptake and coverage. In particular, before 2008 the coverage was biuletyny_szczepien .

We will demonstrate the use of our method on the sero-prevalence data for varicella in Poland in the time period when the vaccination was uncommon, excluding data from children aged months to avoid the potential influence of transferred maternal antibodies. The data were derived from the database of the POLYMOD project. Samples from individuals aged 1 - 19 years (by date of birth) at the time of the sample collection (2000 - 2004) were extracted from an existing bio-bank and tested for anti-VZV with a commercial testing kit. The bio-bank contained samples collected mainly for the purpose of routine check-ups or investigations before surgical procedures. Details of the sample collection and laboratory testing are available elsewhere Siennicka2009 . Altogether 1244 samples were included in the study, the number per year ranged from 108 to 500. The number of individuals in the single cells ranged from 1 to 45 and was generally smaller for the 2000 - 2001 period.

We consider the proportion of susceptible individual given by (1) with boundary condition (2). We model the force of infection by

 λ(t,a) =λ1(a)(sin(γ1t+γ2)+1+γ3)with λ1(a) =k∑i=1αi1(a∈Ai), (21)

where is a step function describing the different possible levels of infection in different age groups of form .

We choose four groups: children before preschool education , children during preschool education , primary school students , and others . The force of infection is fully specified by the following unknown parameters: for , , and . As in section 2, we describe seroprevalence data by a binomial Bayesian model. Let be a number of antibody tests performed during the calendar year for individuals with age at the time of test, measured as years completed by the time of test, and let corresponding number of positive results. As in section 2, we assume that with

where is the solution of PDE (1) with boundary condition (2) and with the force of infection given by (21). Note that our choice of leads to a closed analytic form of . The solution of PDE (1) with boundary condition (2) with constant level of infection , i.e , is equal

 qα(a,t)=exp{−α[(1+γ3)a+1γ1(cos(γ1t+γ2)−cos(γ1(t−a)+γ2))]}.

Hence in our case the function is given by

 q(a,t)=1(a∈Ai)qαi(a,t)∏j

We choose the distribution as uniform distribution smoothed on boundary, on the unit boxes, see (19). We set the following priors:

 αi ∼Exp(10)for i=1,…,k γ1 ∼Exp(0.8) γ2 ∼Unif([0,2π]) γ3 ∼Exp(1).

The choice of hyper-parameters is consistent with prior knowledge on the observed incidence of varicella in Poland as described above. Due to the multimodality of the joint posterior distribution, we use a slightly modified adaptive parallel tempering algorithm (APT) (introduced in miasojedow2013adaptive ) for auxiliary target (8) where only is tempered.

The detailed description of the algorithm is given in B. We used APT with levels of temperatures with iteration and with burn in time and with . For each level of temperature we use adaptive scaling RWM with desired acceptance rate equal .

We demonstrate the convergence of the algorithm in Figure 3, where we display the trace plots, showing satisfactory mixing of the chains and the autocorelation function. This in particular increases the confidence in the shape of the posterior distribution obtained from the model.

The marginal posterior densities of the parameters are given in Figure 4 . The posterior distributions for the parameters denoting the average level of the force of infection over time in the four age groups demonstrates a plausible pattern. The highest value of the posterior mean is observed for the age group 2 ( years), 0.11, markedly higher than for the other groups. This group comprises the pre-school children among whom peak incidence is usually observed. Furthermore, a parameter of interest is the cycle length denoted in our model by . Interestingly, this distribution is bimodal. One mode corresponds to the cycle length of years and the other one to a longer cycle, years. This multimodality can appear due to existence of at least two cycles in the observed varicella occurrence, including a long-term cycle.

Finally, we performed a validation substudy. We repeated the procedure described above, but using only the data from the years . Next, based on the posterior distributions of the parameters we estimated the age-specific prevalence for the year . The diagnostic plots for this MCMC are similar to the ones seen for the full dataset and are not shown. The results of the prediction are presented in Figure 5. The fit of this prediction is very encouraging, with only 2 data points falling outs of the credibility interval. We also observed that estimators of unknown parameters are almost the same as for the full data set.

## 5 Concluding remarks

We note that in previous literature, numerous attempts were made to assimilate seroprevalence data into an equation of the type of equation (1), but the (lack of) precision of the sampling process was not accounted for (see e.g. a review in NielHens2012 ). This corresponds to approximating the integral (6) by a value of at a single point or on an interval (single cohort). In our toy model developed in section 3.3, we observe that posterior mean of for a small number of cohorts, and for only a single cohort in particular, may be severely biased, as shown on the Figure 1.

Our method therefore offers a significant improvement, even for this simple and commonly used model. What is more, the methodology presented may be applied to modelling a wide range of different diseases both infectious and non-infectious. The important feature is the presence of a lasting marker of the disease, which could be measured experimentally. We note that the Bayesian approach applied here is flexible enough to include additional data sources, for which the data distributions could be parametrised by functions of the parameters defined by the model. Examples of such data may include the number of individuals diagnosed by age of diagnosis and diagnosis time, or data originating from screening programs targeting people who have not been diagnosed before. We considered here a general form of the force of infection (rate of occurrence of the disease). However, for infectious diseases a more complex form describing the disease transmission process could be considered. Moreover, additional components can also be added to the model such as mortality or birth rate as long as the general structure of the models admit the approximation with the EBT algorithm or a similar particle approximation. An interesting extension would be to include other relevant structural parameters. As an example we may consider a model relating to HIV. The HIV population is naturally structured with the status of the immune system, as could be approximated by the CD4 count. The existing models usually use the compartmental approach, in which the population members have the same chance of moving into a more advanced stage of disease, defined here by the lower ranges of the CD4 count, regardless how long they have in fact spent in the previous stage. At the individual level the decline of the CD4 count is a continuous process, so continuous models can potentially produce more accurate estimates. In this case the population would be structured in to incorporate both age and the CD4 count.

## Appendix A Definitions

We consider a metric space equipped with the Borel -field . For any measurable function for any we define the by

 ∥ϕ∥p={supθ∈H|ϕ(θ)if p=∞(∫H|ϕ(θ)|pdθ)1/pif 1≤p<∞.

For any Lipschitz function we denote its Lipschitz constant by , i.e.

 Lip(ϕ)=supx,y∈H|ϕ(x)−ϕ(y)||x−y|.

Let us consider a measures on , the total variation distance is defined by

 ∥μ−ν∥TV=sup∥ϕ∥∞≤1∫Hϕ(θ)d(μ−ν)(θ).

Note, in the case when and admits densities with respect to Lebesgue measure and , respectivelly the definition of total variation distance is equivalent to

 ∥μ−ν∥TV=∫H|fμ(θ)−fν(θ)|dθ.

The inequality is obvious, to see the equality is enough to take the test function .

For any we define the Wasserstein distance on a space of probability measures by

 Wp(μ,ν)p=infγ∈Γ(μ,ν)∫H×H|x−y|pdγ(x,y),

where is a set of all joint distributions on with marginals and respectivelly. In particular case by the Kantorovich - Rubenstein duality representation is equivalent to

 W1(μ,ν)=supLip(ϕ)≤1∫Hϕd(μ−ν).

If then there exists constant such that for any probabilistic measure we have

 W1(μ,ν)≤C∥μ−ν∥TV.

We denote by the space of essentially bounded function with essentially bounded gradient . By the Rademacher theorem any bounded Lipshitz function belongs to and further

 ∥Ψ∥W1,∞=∥Ψ∥∞+Lip(Ψ).

We also define the bounded Lipshitz distance between measures , also known in the probability theory as the Fortet-Mourier distance, by

 ρBL(μ,ν)=sup∥Ψ∥W1,∞≤1∫HΨd(μ−ν).

## Appendix B Modified adaptive parallel tempering algorithm

The parallel tempering algorithm (PT) is MCMC algorithm used in the case when the target distribution is hard to explore, usually due to the multimodality. The parallel tempering algorithm defines a Markov chain over the product space , where . Each of the chains targets a ‘tempered’ version of the target distribution , i.e. . Where is a sequnce of inverse temperatures such that the distribution is easy to explore and adjacent targets , are simmilar. Each time-step may be decomposed into two successive moves: the swap move and the propagation move. The swap move allows global moves in particular jumps between different modes, the propagation moves locally explore tempered targets at each level. In our case we wnat to tempere ony likelihood in auxiliary target (8). So we define a sequence of targets as follows

 πℓ(θ,u)∝ ^L(θ|Y)βℓf(θ)p(u).

For such choosen target we perform adaptive parallel tempering algorithm proposed in miasojedow2013adaptive . The detailed description is given in algorithm 2. For adatation of temperature schedule we use optimal acceptance rate as in miasojedow2013adaptive for random walk adaptation we use optimal acceptance rate sugested by sherlock2015 .