Incorporating social contact data in spatio-temporal models for infectious disease spread

Institute of Medical Informatics, Biometry, and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstraße 6, DE-91054 Erlangen, Germany
Epidemiology, Biostatistics and Prevention Institute, University of Zurich, Hirschengraben 84, CH-8001 Zürich, Switzerland

Routine public health surveillance of notifiable infectious diseases gives rise to weekly counts of reported cases, possibly stratified by region and/or age group. We investigate how an age-structured social contact matrix can be incorporated into a spatio-temporal endemic-epidemic model for infectious disease counts. To illustrate the approach, we analyze the spread of norovirus gastroenteritis over 6 age groups within the 12 districts of Berlin, 2011–2015, using contact data from the POLYMOD study. The proposed age-structured model outperforms alternative scenarios with homogeneous or no mixing between age groups. An extended contact model suggests a power transformation of the survey-based contact matrix towards more within-group transmission.

Keywords: Age-structured contact matrix; Areal count time series; Endemic-epidemic modelling; Infectious disease epidemiology; Norovirus gastroenteritis; Norwalk virus; Spatio-temporal surveillance data.



1 Introduction

The social phenomenon of “like seeks like” produces characteristic contact patterns between subgroups of a population. If suitably quantified, such social mixing behaviour can inform models for infectious disease spread (Read and others, 2012). One of the largest social contact surveys to date was conducted as part of the EU-funded POLYMOD project, recording conversational contacts of 7 290 individuals in eight European countries (Mossong and others, 2008). Contact patterns were found to be similar across the different countries and highly assortative with respect to age, especially for school children and young adults.

The basic idea behind the combination of social contact data with epidemic models has been termed the “social contact hypothesis” (Wallinga and others, 2006): The age-specific numbers of potentially infectious contacts are proportional to age-specific numbers of social contacts. For instance, for pathogens transmitted via respiratory droplets, face-to-face conversation and/or physical contact are frequently used as proxy measures for exposure. Many studies have now made use of the POLYMOD contact data (Rohani and others, 2010; Goeyvaerts and others, 2010, 2015; Birrell and others, 2011; Baguelin and others, 2013), but none of them accounts for the spatial characteristics of disease spread. The distance of social contacts from the home location of each participant has only recently been investigated by Read and others (2014). Their finding that “most were within a kilometre of the participant’s home, while some occurred further than 500 km away” reflects the power-law distance decay of social interaction as determined by human travel behaviour (Brockmann and others, 2006). Meyer and Held (2014) found such a power law to translate to the spatial spread of infectious diseases.

The purpose of this paper is to combine the social and spatial determinants of infectious disease spread in a multivariate time-series model for public health surveillance data. For notifiable diseases, such data are routinely available as weekly counts of reported cases by administrative district and further stratified by age group or gender. Social contact matrices reflect the amount of mixing between these strata. Our focus is on age-structured models, but the methods equivalently apply to other or multiple strata. We investigate if a (possibly adjusted) contact matrix captures disease spread better than simple assumptions of homogeneous or no mixing between the subgroups. The approach also allows us to estimate how much disease incidence in each group can be linked to previous cases in their own and in other groups—while adjusting for the spatial pattern of disease spread.

This paper is organized as follows. Section 2 introduces our case study on norovirus gastroenteritis, including contact data from the POLYMOD study. Section 3 outlines the spatio-temporal modelling framework and describes how to incorporate additional stratification with a contact matrix. Section 4 shows results of the case study and Section 5 concludes the paper with a discussion. The supplementary material contains additional figures, an animation of the data, as well as the R source package hhh4contacts with the data and code to reproduce the presented analysis (run demo("hhh4contacts") after installing and loading the package).


2 Case study: Norovirus gastroenteritis in Berlin, 2011–2015

Most of the aforementioned studies relate contact patterns to the spread of influenza, whereas here we investigate the occurrence of norovirus-associated acute gastroenteritis. Both diseases are highly infectious, have a similar temporal pattern, and similar mortality in elderly persons (van Asten and others, 2012). However, in contrast to influenza, vaccines against noroviruses have yet to be developed (Pringle and others, 2015). The absence of vaccination simplifies the analysis of infectious disease occurrence, since vaccination coverage, which may vary across age groups, regions and over time, need not be taken into account.


2.1 Epidemiology of norovirus gastroenteritis

Norovirus-associated acute gastroenteritis is characterized by “sudden onset of vomiting, diarrhea, and abdominal cramps lasting 2–3 days” (Pringle and others, 2015). O’Dea and others (2014) estimate an average symptomatic period of 3.35 days from outbreaks in hospitals and long-term care facilities, where vulnerable individuals live closely together and norovirus outbreaks most commonly occur. Children in daycare centres are another frequently affected subgroup. Norovirus incidence peaks during winter, when outbreaks in childcare facilities were observed to precede those in private households, hospitals, and nursing homes (Bernard and others, 2014b).

Noroviruses are highly contagious since only few viral particles are needed for an infection. Being thermally stable and particularly persistent in the environment (Marshall and Bruggink, 2011), noroviruses can also be transmitted indirectly via contaminated surfaces or food. The serial interval, i.e. the time between onset of symptoms in a primary and a secondary case, ranges from within a day to more than 1 week with a median of about 3 days (Götz and others, 2001).


2.2 Incidence data

In Germany, the national public health institute (the Robert Koch Institute, RKI) provides access to incidence data of notifiable diseases through the SurvStat@RKI 2.0 online service. Since the last revision of the case definition for norovirus gastroenteritis in 2011, only laboratory-confirmed cases are reported to the RKI. The number of cases to be modelled thus excludes all asymptomatic cases as well as all symptomatic cases that did not reach laboratory testing (Gibbons and others, 2014). Under-reporting of norovirus illness is known to be most pronounced in 20- to 29-year-old persons and substantially lower in persons aged <10 years and in those aged 70 years and over (Bernard and others, 2014c). A sensitivity analysis will indicate how under-reporting may affect the interpretation of our model results.

As to the geographic region of interest, we chose the largest city of Germany, Berlin, which is divided into 12 administrative districts. This enables the analysis of disease spread on a smaller spatial scale. Furthermore, a large underlying population is required for our time-series model to be a reasonable approximation of the epidemic process (Farrington and others, 2003).

We downloaded weekly numbers of reported cases of norovirus gastroenteritis in Berlin from SurvStat@RKI (as of the annual report 2015). These counts cover four norovirus seasons, from 2011-W27 to 2015-W26, and are stratified by the 12 city districts and 6 age groups: 0–4, 5–14, 15–24, 25–44, 45–64, and 65+ years of age. The age groups were condensed from 5-year intervals to reflect distinct social mixing of pre-school vs. school children, and intergenerational mixing. Similarly stratified population numbers were obtained from the Statistical Information System Berlin-Brandenburg (StatIS-BBB) at the reference date 31 December 2011, when Berlin had 3 501 872 inhabitants in total.

Figure 1 (left) shows the weekly norovirus incidence stratified by age group and aggregated over all city districts. The reported incidence is higher in pre-school children and the retired population than in the other age groups. The yearly seasonal pattern, with overall counts ranging from 7 to 214 cases per week, is approximately constant during the four years (supplementary Figure S1). The typical bump during the Christmas break could be related to reporting deficiencies and school closure (Hens and others, 2009). The time series of the 5- to 14-year-old children contains an outbreak caused by contaminated frozen strawberries, which were delivered almost exclusively to schools and childcare facilities (Bernard and others, 2014a). Comparing seasonality between the age groups, the peak incidence in pre-school children seems to precede the peak in the highest age group. Our age-structured modelling approach will help to address the question raised by Bernard and others (2014b), “whether this reflects a pattern of disease transmission from young to old in the community”, taking the spatial aspect of disease spread into account.

Figure 1: Age-stratified time series and maps of norovirus gastroenteritis incidence (per 100 000 inhabitants) in Berlin, 2011-W27 to 2015-W26. The weekly incidence plots on the left all use the same y-scale. The Christmas break in calendar weeks 52 and 1 is highlighted. The group-specific maps on the right show the mean yearly incidence by city district.

Figure 1 (right) shows how disease incidence varies across the 12 city districts of Berlin. The south-western district Steglitz-Zehlendorf tends to be affected more and the central districts tend to be affected less than the remaining districts. This pattern is roughly consistent across age groups. The two younger age groups are an exception: they exhibit a relatively high incidence in Marzahn-Hellersdorf. District-specific seasonal shifts are not apparent (supplementary Figure S2).

Animated, age-stratified maps of the weekly counts encompass the full information from all three data dimensions. Such an animation (supplementary material) may provide additional insight into the dynamics of disease spread. However, epidemic models estimated from these data offer a more structured view and take population heterogeneity directly into account.


2.3 Contact data

We use contact data from the German subset of the POLYMOD study (Mossong and others, 2008), where both physical and non-physical (conversational) contacts have been recorded. We will report results based on all contacts and on physical contacts only. The age-structured social contact matrix $C = (c_{gg'})$ contains the mean numbers of contact persons in age group $g'$ during one day reported by a participant in age group $g$. Instead of using sample means, we estimate $C$ by the approach of Wallinga and others (2006), which accounts for the reciprocal nature of contacts. Each entry $c_{gg'}$ is assumed to be the mean of a negative binomial distribution, under the restriction $c_{gg'} n_g = c_{g'g} n_{g'}$, where $n_g$ is Berlin’s population in age group $g$. We estimate a detailed contact matrix with 5-year intervals, which we subsequently aggregate to the above 6 age groups (Figure 2). Direct estimation of the aggregated contact matrix leads to similar numbers.

Figure 2: Age-structured contact matrix estimated from the German POLYMOD sample using 5-year intervals (left), and aggregated to the age groups of the surveillance data (right). The entries refer to the mean number of contact persons per participant per day.

The strong diagonal pattern in the social contact matrix reflects that people tend to mix with people of the same age. The other prominent pattern is produced by the contacts between parents and children. The matrix for physical contacts shows similar patterns (supplementary Figure S3). Aggregation of the contact matrix is done by summing over the contact groups (columns) to be joined and calculating the weighted average across the corresponding participant groups (rows), with weights equal to the group sizes. The aggregated contact matrix is asymmetric because of the different sizes of the involved age groups, but reciprocity at the population level still holds. For the models described in the next section, only the row-wise distributions will be relevant, i.e. the contact pattern of an infectious participant across the different age groups.
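The aggregation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `aggregate_contact_matrix`, the three fine groups, their population sizes and the contact numbers are all hypothetical and are not taken from the POLYMOD data. The example matrix is constructed to satisfy reciprocity at the population level, so one can check that reciprocity survives aggregation although the aggregated matrix is asymmetric.

```python
def aggregate_contact_matrix(C, n, groups):
    """Aggregate a fine-grained contact matrix to coarser groups.

    C[i][j]: mean number of contacts in fine group j per participant of fine group i
    n[i]:    population size of fine group i
    groups:  list of lists with the fine-group indices to be joined
    """
    aggregated = []
    for rows in groups:
        size = sum(n[i] for i in rows)
        new_row = []
        for cols in groups:
            # sum over the joined contact groups (columns), then take the
            # population-weighted average over the joined participant groups (rows)
            value = sum(n[i] * sum(C[i][j] for j in cols) for i in rows) / size
            new_row.append(value)
        aggregated.append(new_row)
    return aggregated


# Hypothetical example with three fine groups; C[i][j] * n[i] == C[j][i] * n[j]
n = [100, 200, 400]
C = [[4.0, 2.0, 4.0],
     [1.0, 6.0, 2.0],
     [1.0, 1.0, 5.0]]

groups = [[0, 1], [2]]          # join the first two groups
C_agg = aggregate_contact_matrix(C, n, groups)
n_agg = [sum(n[i] for i in rows) for rows in groups]
```

In this toy example the aggregated entries differ across the diagonal, but the total number of contacts between the two aggregated groups is the same from either side.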


3 An age-structured spatio-temporal model for infectious disease counts

We review an endemic-epidemic modelling framework for areal time series of infectious disease counts (Meyer and Held, 2014, Section 3), into which we subsequently incorporate an additional stratification variable featuring a contact matrix.


3.1 Spatio-temporal formulation

Conditionally on past observations, the number of reported infections in region $r$ and time period $t$, $Y_{rt}$, is assumed to follow a negative binomial distribution with mean $\mu_{rt}$ and region-specific overdispersion parameters $\psi_r$ such that the conditional variance of $Y_{rt}$ is $\mu_{rt} (1 + \psi_r \mu_{rt})$. The lower bound $\psi_r = 0$ yields the Poisson distribution as a special case, and a common simplifying assumption is that $\psi_r = \psi$ is shared across regions. In its most general formulation, the mean is additively decomposed into endemic and observation-driven epidemic components as

$$\mu_{rt} = \nu_{rt} + \lambda_{rt} Y_{r,t-1} + \phi_{rt} \sum_{r' \ne r} \lfloor w_{r'r} \rfloor Y_{r',t-1} \qquad (3.1)$$

with log-linear predictors

$$\log \nu_{rt} = \alpha^{(\nu)}_r + \beta^{(\nu)\top} z^{(\nu)}_{rt}, \quad \log \lambda_{rt} = \alpha^{(\lambda)}_r + \beta^{(\lambda)\top} z^{(\lambda)}_{rt}, \quad \log \phi_{rt} = \alpha^{(\phi)}_r + \beta^{(\phi)\top} z^{(\phi)}_{rt} \qquad (3.2)$$

and normalized transmission weights $\lfloor w_{r'r} \rfloor := w_{r'r} / \sum_j w_{r'j}$, $w_{rr} = 0$. The regression terms in (3.2) often include sine-cosine effects of time to reflect seasonally varying incidence (Held and Paul, 2012), but may also involve other explanatory variables, such as vaccination coverage (Herzog and others, 2011). The first, endemic component in (3.1) is typically modelled proportional to a (population) offset $e_{rt}$, and partially captures infections not directly linked to observed cases from the previous time period, e.g. due to travelling outside the study region (edge effects). The epidemic component splits up into autoregressive effects, i.e. reproduction of the disease within region $r$, and neighbourhood effects, i.e. transmission from other regions $r'$. It has proven useful to account for population size also in $\phi_{rt}$, such that a power parameter $\tau$ on $e_{rt}$ determines how “attraction” to a region scales with population size (Xia and others, 2004). Furthermore, transmission weights $w_{r'r}$ reflect the flow of infections from region $r'$ to region $r$. These weights may be based on additional movement network data (Paul and others, 2008; Schrödle and others, 2012; Geilhufe and others, 2014), but may also be estimated from the data at hand. A suitable parametric model is a power-law distance decay $w_{r'r} = o_{r'r}^{-\rho}$ in terms of the adjacency order $o_{r'r}$ in the neighbourhood graph of the regions (Meyer and Held, 2014).

Estimating separate dynamics for the reproduction of the disease within a region on the one hand, and transmission from other regions on the other hand, goes back to the original model formulation of Held and others (2005), where only first-order neighbours have been incorporated. The parametric distance weights offer an appealing alternative to reflect predominant local autoregression in a simpler model with a single epidemic component:

$$\mu_{rt} = \nu_{rt} + \phi_{rt} \sum_{r'} \lfloor w_{r'r} \rfloor Y_{r',t-1}, \qquad (3.3)$$

where the choice $w_{r'r} = (o_{r'r} + 1)^{-\rho}$ gives unit weight to local transmission ($o_{rr} = 0$) and then decays as a power law in terms of adjacency order. With such a power law and the suggested population dependence of $\phi_{rt}$, the epidemic component of (3.3) constitutes a so-called gravity model (Xia and others, 2004; Höhle, 2016). Furthermore, this formulation uses fewer parameters and extends more naturally to an additional stratification variable.
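To make the single-component formulation concrete, the following Python sketch computes adjacency orders by breadth-first search, derives power-law weights and evaluates the mean for one time step. It assumes unnormalized weights of the form (o + 1)^(-rho), which is consistent with the order-specific weights reported in the results, normalized over destination regions for each source region. The function names, the three-region neighbourhood graph and all numbers are hypothetical; this is not the implementation in the surveillance package.

```python
from collections import deque


def adjacency_orders(adj):
    """Adjacency order (shortest path length) between all pairs of regions.

    adj[i][j] is truthy if regions i and j share a border.
    Assumes a connected neighbourhood graph.
    """
    R = len(adj)
    orders = []
    for start in range(R):
        dist = [None] * R
        dist[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(R):
                if adj[i][j] and dist[j] is None:
                    dist[j] = dist[i] + 1
                    queue.append(j)
        orders.append(dist)
    return orders


def power_law_weights(orders, rho):
    """Weights (o + 1)^(-rho), normalized within each source region (row)."""
    raw = [[(o + 1) ** (-rho) for o in row] for row in orders]
    return [[w / sum(row) for w in row] for row in raw]


def epidemic_mean(nu, phi, W, y_prev):
    """mu[r] = nu[r] + phi[r] * sum over source regions s of W[s][r] * y_prev[s]."""
    R = len(nu)
    return [nu[r] + phi[r] * sum(W[s][r] * y_prev[s] for s in range(R))
            for r in range(R)]


# Hypothetical example: three regions in a row (0 - 1 - 2)
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
orders = adjacency_orders(adj)
W = power_law_weights(orders, rho=2.0)
mu = epidemic_mean(nu=[0.0, 0.0, 0.0], phi=[1.0, 1.0, 1.0], W=W, y_prev=[10, 0, 0])
```

Because the weights are normalized per source region, the ten cases in region 0 are distributed over all regions without changing their total when `phi` equals one everywhere.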


3.2 Extension for stratified areal count time series

Extending the above spatio-temporal model to fit multivariate time series of counts $Y_{grt}$ stratified by (age) group $g$ in addition to region enables us to relax the simple assumption of homogeneous mixing within each region. More complex strata such as the interaction of age group and gender are equally possible and can be subsumed in the single group index $g$.

We assume that a contact matrix $C = (c_{g'g})$ is given, where each entry $c_{g'g}$ quantifies the average number of contacts of an individual of group $g'$ with individuals of group $g$. The spatio-temporal model with a single epidemic component then extends to a three-dimensional version as

$$\mu_{grt} = \nu_{grt} + \phi_{grt} \sum_{g',r'} \lfloor c_{g'g} w_{r'r} \rfloor Y_{g',r',t-1}, \qquad (3.4)$$

where both the endemic and epidemic predictors may gain group-specific effects. How the counts $Y_{g',r',t-1}$ from the previous period affect the current mean in group $g$ and region $r$ is now determined by a product of contact and spatial weights. The product ensures that cases from group $g'$ in region $r'$ are ignored if there are no contacts to group $g$ or if there is no flux of infections from region $r'$ to region $r$. The weights are row-normalized over all combinations of group and region: $\lfloor c_{g'g} w_{r'r} \rfloor := c_{g'g} w_{r'r} / \sum_{g,r} c_{g'g} w_{r'r}$. Note that this normalization removes any differences in group-specific overall contact rates (the row sums of $C$). Our model therefore does not distinguish between proportionate mixing, where the rows of the contact matrix only differ by a proportionality factor, and a matrix with identical rows. The weighted sum of past cases transmitted to group $g$ in region $r$ is scaled by $\phi_{grt}$. If $\phi_{grt}$ contains group-specific effects, these will adjust the columns of the contact matrix.
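The product weights and their normalization can be illustrated with a small Python sketch. It assumes that the weight for transmission from group g' in region r' to group g in region r is the product of a contact matrix entry and a spatial weight, normalized over all destination combinations (g, r) for each source (g', r'). The function names and the two-group, two-region numbers are hypothetical. The example also shows that rescaling a row of the contact matrix leaves the normalized weights unchanged, as stated above.

```python
def joint_weights(C, W):
    """Normalized product weights.

    C[gs][g]: contacts of an individual of source group gs with group g
    W[rs][r]: spatial weight from source region rs to region r
    Returns a dict mapping each source (gs, rs) to a dict over destinations (g, r)
    whose values sum to 1.
    """
    G, R = len(C), len(W)
    weights = {}
    for gs in range(G):
        for rs in range(R):
            raw = {(g, r): C[gs][g] * W[rs][r] for g in range(G) for r in range(R)}
            total = sum(raw.values())
            weights[(gs, rs)] = {dest: v / total for dest, v in raw.items()}
    return weights


def stratified_mean(nu, phi, weights, y_prev):
    """mu[g][r] = nu[g][r] + phi[g][r] * weighted sum of last period's counts."""
    G, R = len(nu), len(nu[0])
    return [[nu[g][r] + phi[g][r] * sum(weights[(gs, rs)][(g, r)] * y_prev[gs][rs]
                                         for gs in range(G) for rs in range(R))
             for r in range(R)]
            for g in range(G)]


# Hypothetical example with two groups and two regions
C = [[3.0, 1.0], [1.0, 3.0]]
C_scaled = [[6.0, 2.0], [1.0, 3.0]]   # first row doubled: same mixing pattern
W = [[1.0, 0.2], [0.2, 1.0]]

weights = joint_weights(C, W)
weights_scaled = joint_weights(C_scaled, W)

mu = stratified_mean(nu=[[1.0, 1.0], [1.0, 1.0]],
                     phi=[[0.5, 0.5], [0.5, 0.5]],
                     weights=weights,
                     y_prev=[[10, 0], [0, 0]])
```

With a diagonal contact matrix in place of `C`, all weights towards other groups would be zero, which corresponds to the no-mixing special case.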

There are two special cases of the contact structure involved in the epidemic component. First, a contact matrix with identical rows implies that the mixing pattern of the infectious cases does not depend on the group they belong to. An example of such homogeneous mixing is a contact matrix where each row equals the vector of group sizes ($c_{g'g} = n_g$). If $\phi_{grt}$ contains group-specific effects, a simple matrix of ones ($c_{g'g} = 1$) will induce the same contact structure. The other special case is a diagonal contact matrix $C = I$, which reflects complete absence of mixing. This is equivalent to formulating a separate spatio-temporal model with a single epidemic component for each group. However, also in this case of no between-group mixing, the joint model formulation has the advantage of allowing for parsimonious decompositions of $\nu_{grt}$ and $\phi_{grt}$ into group and region effects. Borrowing strength across groups is especially useful in applications with low counts.


3.3 Parameterising the contact matrix

Contact patterns derived from sociological studies might not fully match the characteristics of disease spread. For example, social networks are known to change during illness (van Kerckhove and others, 2013) and brief contacts are frequently not reported (Smieszek and others, 2014). We therefore suggest a parsimonious single-parameter approach to adaptively estimate the transmission weights as a function of the given contact matrix $C$.

Our proposal is borrowed from Küchenhoff and others (2006), who progressively transform a misclassification matrix to establish an association between the amount of misclassification in a covariate and the corresponding parameter estimate. The proposed transformation is based on the eigendecomposition of the matrix $C$ to raise it to the power of $\kappa$,

$$C^\kappa := E \Lambda^\kappa E^{-1}, \qquad (3.5)$$

where $\Lambda$ is the diagonal matrix of eigenvalues and $E$ is the corresponding matrix of eigenvectors. Translated to our setting, the parameter $\kappa$ measures the amount of transmission between the subgroups of the population. Specifically, $\kappa = 0$ corresponds to complete absence of between-group transmission ($C^0 = I$), whereas $\kappa = 1$ leaves the contact matrix unchanged. If $C$ is row-normalized, all rows of $C^\kappa$ converge to the same distribution as $\kappa \to \infty$. The transmission pattern thus becomes independent of the group the infected individual belongs to. Because of this useful interpretation, we assume a pre-normalized contact matrix in the remainder of this paper.

The basic requirement that $C$ can be factorized by an eigendecomposition will hold in most practical cases. However, we also need to make sure that $C^\kappa$ has non-negative entries for all values of $\kappa$ under consideration. With our contact matrix, two entries in $C^\kappa$ become negative (but close to 0) for small $\kappa$. We follow a pragmatic approach and truncate negative entries at 0. Figure 3 exemplifies $C^\kappa$ for the row-normalized version of the contact matrix from Figure 2, and illustrates how diagonal and off-diagonal entries, respectively, are affected by the power transformation.

Figure 3: The power transformation $C^\kappa$ applied to the row-normalized POLYMOD contact matrix. (a) $C^\kappa$ for different values of $\kappa$. (b) A diagonal entry and an off-diagonal entry of $C^\kappa$.
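For only two groups, the power transformation via eigendecomposition can be written out by hand, which makes its limiting behaviour easy to inspect. The Python sketch below is a worked toy case, not the six-group computation of the paper: it assumes a row-normalized matrix C = [[1 - a, a], [b, 1 - b]] with eigenvalues 1 and 1 - a - b and eigenvectors (1, 1) and (a, -b). The function names and the values of a and b are hypothetical.

```python
def power_transform_2x2(a, b, kappa):
    """C^kappa = E diag(1, lam^kappa) E^{-1} for C = [[1 - a, a], [b, 1 - b]].

    E = [[1, a], [1, -b]] holds the eigenvectors and lam = 1 - a - b is the
    second eigenvalue. Requires 0 < a + b < 1 so that lam^kappa is real.
    """
    if not 0 < a + b < 1:
        raise ValueError("need 0 < a + b < 1")
    s = a + b
    lam_k = (1 - s) ** kappa
    return [[(b + a * lam_k) / s, a * (1 - lam_k) / s],
            [b * (1 - lam_k) / s, (a + b * lam_k) / s]]


def matmul_2x2(A, B):
    """Plain 2x2 matrix product, used to check the square-root property."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


a, b = 0.2, 0.1                               # hypothetical mixing proportions
C_0 = power_transform_2x2(a, b, 0.0)          # no between-group transmission
C_1 = power_transform_2x2(a, b, 1.0)          # original matrix
C_half = power_transform_2x2(a, b, 0.5)       # more within-group transmission
C_large = power_transform_2x2(a, b, 200.0)    # rows approach a common distribution
```

In this toy case, values of kappa below 1 increase the diagonal entries relative to the original matrix, and very large values make both rows approach (b, a) / (a + b).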


3.4 Inference

Likelihood inference for the multivariate count time-series model described above has been developed by Paul and Held (2011) and Meyer and Held (2014). The log-likelihood is maximized numerically using the quasi-Newton algorithm provided by the R function nlminb (R Core Team, 2016). Supplied with analytical formulae for the score function and Fisher information, convergence is fast, even for a large number of parameters. The modelling framework is implemented in the R package surveillance (Meyer and others, 2016, Section 5) as function hhh4.

The age-structured model is built on top of the existing inference framework. The power parameter $\kappa$ of the contact matrix transformation is conveniently estimated via a profile likelihood approach (see, e.g. Held and Sabanés Bové, 2014, Section 5.3), which avoids the cumbersome implementation of additional derivatives with respect to all model parameters. We numerically maximize the log-likelihood of a model with fixed contact matrix $C^\kappa$ as a function of $\kappa$. The profile confidence interval for $\kappa$ thus incorporates the uncertainty of all other parameter estimates (but not vice versa).



4 Results

We apply the age-structured spatio-temporal model introduced above to the norovirus data described in Section 2. As the number of cases varies strongly by age group, we use group-specific overdispersion parameters $\psi_g$. For the mean, we assume the endemic-epidemic structure

$$\mu_{grt} = e_{gr} \exp\{\alpha^{(\nu)} + \alpha^{(\nu)}_g + \alpha^{(\nu)}_r + \beta x_t + \gamma_g \sin(\omega t) + \delta_g \cos(\omega t)\} + e_{gr}^{\tau} \exp\{\alpha^{(\phi)} + \alpha^{(\phi)}_g + \alpha^{(\phi)}_r\} \sum_{g',r'} \lfloor (C^\kappa)_{g'g} (o_{r'r} + 1)^{-\rho} \rfloor Y_{g',r',t-1}. \qquad (4.1)$$

The endemic predictor allows for age- and district-specific incidence levels, fewer cases during the Christmas break ($x_t = 1$ in calendar weeks 52 and 1, otherwise $x_t = 0$), as well as age-specific seasonality ($\omega = 2\pi/52$). Transmission between age groups is modelled using the power transformation $C^\kappa$ of the row-normalized contact matrix estimated from the POLYMOD study. Transmission between districts is quantified by a power law with respect to adjacency order. The intercepts are identifiable by fixing one age-group effect and one district effect at zero in each model component and including overall intercepts in both components. The power parameters $\rho$ and $\kappa$ are estimated on the log-scale.
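How a pair of sine-cosine coefficients translates into a seasonal amplitude and peak time can be sketched as follows. The sketch assumes seasonal terms of the form gamma * sin(omega * t) + delta * cos(omega * t) with yearly frequency omega = 2 * pi / 52 for weekly data, and rewrites them as a single shifted sine wave. The function name and the coefficient values are hypothetical, and the time index t is not mapped to calendar weeks here.

```python
import math


def seasonal_peak(gamma, delta, period=52):
    """Amplitude and peak time of gamma * sin(omega * t) + delta * cos(omega * t).

    Uses the identity gamma*sin(x) + delta*cos(x) = A * sin(x + phase) with
    A = sqrt(gamma^2 + delta^2) and phase = atan2(delta, gamma). The maximum is
    attained where x + phase = pi / 2.
    """
    omega = 2 * math.pi / period
    amplitude = math.hypot(gamma, delta)
    phase = math.atan2(delta, gamma)
    peak_time = ((math.pi / 2 - phase) / omega) % period
    return amplitude, peak_time


# Hypothetical coefficients for one age group
amplitude, peak_time = seasonal_peak(gamma=0.3, delta=0.4)

# Sanity checks on pure sine and pure cosine waves
_, peak_sine = seasonal_peak(1.0, 0.0)     # sine wave peaks a quarter period in
_, peak_cosine = seasonal_peak(0.0, 1.0)   # cosine wave peaks at the start
```

Because the endemic predictor is log-linear, the seasonal term acts multiplicatively on the endemic mean, so the peak of the sine wave is also the peak of the seasonal factor.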

| Model | dim | ΔAIC | τ | ρ | κ |
|---|---|---|---|---|---|
| purely endemic model | 36 | 0.0 | | | |
| homogeneous mixing (matrix of ones) | 55 | -415.4 | 1.19 (0.83–1.55) | 2.43 (2.04–2.88) | |
| no mixing ($C = I$) | 55 | -602.8 | 0.61 (0.24–0.98) | 2.18 (1.89–2.53) | |
| original contact matrix | 55 | -631.9 | 0.97 (0.66–1.28) | 2.34 (2.03–2.70) | |
| adjusted contact matrix | 56 | -659.4 | 0.86 (0.53–1.19) | 2.27 (1.98–2.61) | 0.47 (0.34–0.66) |
| based on physical contacts only | 56 | -655.3 | 0.85 (0.52–1.19) | 2.27 (1.98–2.61) | 0.48 (0.35–0.66) |

Table 1: Model summaries for the age-stratified, areal surveillance data of norovirus gastroenteritis in Berlin. For reference, the first row represents the purely endemic model, which assumes independent counts. The remaining rows correspond to endemic-epidemic models with a spatial power law, but varying assumptions on the age-structured contact matrix $C$. The columns refer to the following model characteristics: the number of parameters (dim), the difference in Akaike’s Information Criterion compared to the purely endemic model (ΔAIC), the power $\tau$ of the population scaling factor, the decay parameter $\rho$ of the spatial power law, and the power adjustment $\kappa$ of the contact matrix. The parameter columns contain the estimates and 95% Wald confidence intervals.

Table 1 summarizes competing models with respect to the assumed contact structure between age groups. It turns out that a superposed epidemic component improves upon a purely endemic model, and that incorporating the contact matrix from the POLYMOD study outperforms naive models with homogeneous or no mixing between age groups. Akaike’s Information Criterion (AIC) is minimal for the model with a power-adjusted contact matrix (penultimate row), where the exponent is estimated to be $\hat\kappa = 0.47$ (95% CI: 0.34 to 0.66). This means that the epidemic part subsumes more information from cases in the own age group than suggested by the original contact matrix (cf. Figure 3(a)). The change in AIC associated with this adjustment, however, is minor compared to the improvement achieved by employing the POLYMOD contact matrix in the first place. Results are very similar for physical contacts, but the fit is slightly worse.

The spatial spread of the disease across city districts is estimated to have a strong distance decay with $\hat\rho = 2.27$ (95% CI: 1.98 to 2.61), such that the adjacency orders 0 to 4 have weights 1.00, 0.21, 0.08, 0.04, and 0.03. Supplementary Figure S4 shows age-dependent power laws (replacing $\rho$ by $\rho_g$ in the model equation), as well as unconstrained estimates of the order-specific weights, which are close to the power law. In accordance with the idea of a gravity model, we find that the epidemic part scales with the population size of the “importing” district and age group. Similar to a previous application on influenza (Meyer and Held, 2014), the corresponding estimate $\hat\tau = 0.86$ (95% CI: 0.53 to 1.19) is slightly below unity and provides strong evidence for such an association.
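The order-specific weights quoted above can be checked with a short calculation. The sketch assumes unnormalized weights of the form (o + 1)^(-rho) for adjacency order o, which reproduces the reported values when the decay estimate 2.27 is plugged in. The variable names are illustrative only.

```python
rho = 2.27  # estimated decay parameter of the spatial power law

# unnormalized weight (o + 1)^(-rho) for adjacency orders 0 to 4
weights = [round((order + 1) ** (-rho), 2) for order in range(5)]
```

Under this assumption, a first-order neighbour receives roughly a fifth of the weight given to cases from the same district.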

Figure 4: Fitted mean components from the AIC-optimal model with adjusted contact matrix, aggregated over all districts. The dots correspond to the reported numbers of cases.

Figure 4 shows the endemic-epidemic decomposition of the estimated mean aggregated across districts (see the supplementary Figures S5 to S7 for the district-level and overall fits). When reformulating the model as a multivariate branching process with immigration (Held and Paul, 2012), the largest eigenvalue of the matrix holding the estimated coefficients of $Y_{g',r',t-1}$ is 0.71, which can be interpreted as the overall epidemic proportion of disease incidence. However, this value mostly reflects the situation for the 65+ age group where the within-group spread is dominating. In contrast, for the groups of 5–14 and 15–24 year-old persons, almost no dependence on past counts of the same or the other age groups can be identified. Interestingly, the groups of 25–44 and 45–64 year-old persons seem to inherit a relevant proportion of cases from other age groups. The youngest age group, though, mostly depends on the endemic component and its own cases, which is probably related to their early onset. The age-dependent sine-cosine effects capture these shifts and are shown in supplementary Figure S8. The modal endemic incidence is in calendar weeks 48 (0–4), 45 (5–14), 52 (15–24), 51 (25–44), 52 (45–64), and 3 (65+), respectively. The largest amplitude is estimated for the youngest and oldest groups.
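The largest eigenvalue of a non-negative coefficient matrix can be approximated by power iteration, as in the following Python sketch. The matrix used here is a hypothetical 2 x 2 example, not the fitted coefficient matrix of the model, which has one row and column per combination of age group and district. The function name is likewise illustrative, and the simple stopping rule assumes that the dominant eigenvalue is unique and positive.

```python
def dominant_eigenvalue(A, iterations=1000, tol=1e-12):
    """Approximate the largest eigenvalue of a non-negative square matrix A
    by power iteration with max-norm scaling."""
    n = len(A)
    v = [1.0] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_eigenvalue = max(abs(x) for x in w)
        if new_eigenvalue == 0:
            return 0.0
        v = [x / new_eigenvalue for x in w]
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue
        eigenvalue = new_eigenvalue
    return eigenvalue


# Hypothetical coefficients: entry [i][j] scales last period's counts of
# stratum j in the mean of stratum i
A = [[0.60, 0.20],
     [0.05, 0.50]]
epidemic_proportion = dominant_eigenvalue(A)
```

A value below one indicates that, without the endemic component, the process would die out over time.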

The estimated group-specific overdispersion parameters are 0.24 (0–4), 1.98 (5–14), 0.30 (15–24), 0.03 (25–44), 0.15 (45–64), and 0.40 (65+) in the model with adjusted contact matrix. The large overdispersion for the 5- to 14-year-old children may be partly due to the food-borne outbreak in 2012, for which the model does not explicitly account. The estimates are similar for the other epidemic models of Table 1, but slightly larger in the endemic-only model.



5 Discussion

We have incorporated a social contact matrix in a regression-oriented, endemic-epidemic time-series model for stratified, area-level infectious disease counts. This three-dimensional approach provides a more detailed description of disease spread than unstratified or non-spatial models, which inherently assume homogeneous mixing within each region or subgroup, respectively.

In our application to age-stratified counts of norovirus gastroenteritis in Berlin’s city districts, the contact model was superior to homogeneous or no mixing between age groups. The model further improved when adjusting the POLYMOD contact matrix towards more within-group transmission. This could be related to biases in contact reporting (Smieszek and others, 2014) with more unreported (short) contacts along the diagonal. The two age groups involving parents were affected the most by preceding infections in other age groups. This is in accordance with the leading role of school children in influenza epidemics (Worby and others, 2015).

Furthermore, new infections predominantly depend on past cases from the same district, as suggested by the estimated spatial transmission weights. An age-dependent distance decay could not be identified from the disease counts. One could thus try to replace the parametric formulation by a social contact matrix, stratified by spatial distance in addition to age group. Separate movement data for school children and adults could then be used to quantify the strength of epidemiological coupling between regions (Kucharski and others, 2015). However, integration of movement network data does not necessarily improve predictions (Geilhufe and others, 2014).

A potentially more severe simplification of our model is the assumption of a time-constant contact matrix. Although weekday vs. weekend differences in contact patterns are not relevant for weekly time-series models, there are possibly relevant seasonal effects on larger time scales. For instance, the contact structure of school children changes considerably between regular and school holiday periods (Hens and others, 2009). Our model could be further tuned both by incorporating a time-varying contact matrix and by estimating seasonality also in the epidemic component (Held and Paul, 2012), which the hhh4 implementation already supports.

To check the robustness of our results with respect to under-reporting, we re-estimated the models with age-specific multiplication factors applied to the reported numbers of cases. Roughly following Bernard and others (2014c, Table 1), we used factors of 1.5 (0–4), 2.5 (5–14), 3.0 (15–24), 3.0 (25–44), 2.5 (45–64), and 2.0 (65+), respectively. While the overdispersion increases, the parameters of the mean are close to the original fit and the epidemic proportion is similar (supplementary Figure S9). For small strata with a low number of cases, a drawback of this simple deterministic approach is that zero reported counts remain zero regardless of the amount of under-reporting. More sophisticated adjustments are currently being investigated within a Bayesian modelling framework. In principle, asymptomatic infections could be similarly accounted for as missing cases, but they seem to play a minor role in disease transmission (Sukhrie and others, 2012). One-week-ahead forecasts or long-term simulations of the number of (symptomatic) infections, however, are of particular relevance for public health planning. Whether the improved model with social contact data also leads to better predictions will be described elsewhere.
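The deterministic adjustment used in this sensitivity analysis amounts to multiplying each reported count by the factor of its age group. The Python sketch below uses the factors listed above. Rounding the inflated counts to integers is an assumption of this sketch, and the function name and example counts are hypothetical. The example also shows the drawback mentioned in the text: zero counts stay zero.

```python
import math

# Age-specific multiplication factors as listed in the text
FACTORS = {"0-4": 1.5, "5-14": 2.5, "15-24": 3.0,
           "25-44": 3.0, "45-64": 2.5, "65+": 2.0}


def adjust_counts(counts, factors=FACTORS):
    """Inflate reported counts by age-specific factors.

    counts: dict mapping age group to a list of weekly reported counts.
    Results are rounded half up to integers (an assumption of this sketch).
    """
    return {group: [math.floor(factors[group] * y + 0.5) for y in series]
            for group, series in counts.items()}


# Hypothetical weekly counts for two age groups
reported = {"0-4": [0, 2, 5], "5-14": [1, 0, 4]}
adjusted = adjust_counts(reported)
```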



We thank the associate editor, two anonymous referees, and Michael Höhle for helpful comments on a previous version of this manuscript. Joël Mossong made the POLYMOD data available, and a KML file of Berlin’s districts was obtained from the Statistical Office of Berlin-Brandenburg.



Funding: Swiss National Science Foundation (project #137919).


Supplementary material

Supplementary material is available at



  • Baguelin and others (2013) Baguelin, M., Flasche, S., Camacho, A., Demiris, N., Miller, E. and Edmunds, W. J. (2013). Assessing optimal target populations for influenza vaccination programmes: An evidence synthesis and modelling study. PLOS Medicine 10(10), e1001527.
  • Bernard and others (2014a) Bernard, H., Faber, M., Wilking, H., Haller, S., Höhle, M., Schielke, A., Ducomble, T., Siffczyk, C., Merbecks, S. S., Fricke, G., Hamouda, O., Stark, K., Werber, D. and others. (2014a). Large multistate outbreak of norovirus gastroenteritis associated with frozen strawberries, Germany, 2012. Eurosurveillance 19(8), pii=20719.
  • Bernard and others (2014b) Bernard, H., Höhne, M., Niendorf, S., Altmann, D. and Stark, K. (2014b). Epidemiology of norovirus gastroenteritis in Germany 2001–2009: Eight seasons of routine surveillance. Epidemiology & Infection 142(1), 63–74.
  • Bernard and others (2014c) Bernard, H., Werber, D. and Höhle, M. (2014c). Estimating the under-reporting of norovirus illness in Germany utilizing enhanced awareness of diarrhoea during a large outbreak of Shiga toxin-producing E. coli O104:H4 in 2011 – a time series analysis. BMC Infectious Diseases 14(1), 116.
  • Birrell and others (2011) Birrell, P. J., Ketsetzis, G., Gay, N. J., Cooper, B. S., Presanis, A. M., Harris, R. J., Charlett, A., Zhang, X. S., White, P. J., Pebody, R. G. and others. (2011). Bayesian modeling to unmask and predict influenza A/H1N1pdm dynamics in London. Proceedings of the National Academy of Sciences of the United States of America 108(45), 18238–18243.
  • Brockmann and others (2006) Brockmann, D., Hufnagel, L. and Geisel, T. (2006). The scaling laws of human travel. Nature 439(7075), 462–465.
  • Farrington and others (2003) Farrington, C. P., Kanaan, M. N. and Gay, N. J. (2003). Branching process models for surveillance of infectious diseases controlled by mass vaccination. Biostatistics 4(2), 279–295.
  • Geilhufe and others (2014) Geilhufe, M., Held, L., Skrøvseth, S. O., Simonsen, G. S. and Godtliebsen, F. (2014). Power law approximations of movement network data for modeling infectious disease spread. Biometrical Journal 56(3), 363–382.
  • Gibbons and others (2014) Gibbons, C. L., Mangen, M.-J., Plass, D., Havelaar, A. H., Brooke, R. J., Kramarz, P., Peterson, K. L., Stuurman, A. L., Cassini, A., Fèvre, E. M. and others. (2014). Measuring underreporting and under-ascertainment in infectious disease datasets: a comparison of methods. BMC Public Health 14(1), 1–17.
  • Goeyvaerts and others (2010) Goeyvaerts, N., Hens, N., Ogunjimi, B., Aerts, M., Shkedy, Z., van Damme, P. and Beutels, P. (2010). Estimating infectious disease parameters from data on social contacts and serological status. Journal of the Royal Statistical Society, Series C 59(2), 255–277.
  • Goeyvaerts and others (2015) Goeyvaerts, N., Willem, L., van Kerckhove, K., Vandendijck, Y., Hanquet, G., Beutels, P. and Hens, N. (2015). Estimating dynamic transmission model parameters for seasonal influenza by fitting to age and season-specific influenza-like illness incidence. Epidemics 13, 1–9.
  • Götz and others (2001) Götz, H., Ekdahl, K., Lindbäck, J., de Jong, B., Hedlund, K. O. and Giesecke, J. (2001). Clinical spectrum and transmission characteristics of infection with Norwalk-like virus: Findings from a large community outbreak in Sweden. Clinical Infectious Diseases 33(5), 622–628.
  • Held and others (2005) Held, L., Höhle, M. and Hofmann, M. (2005). A statistical framework for the analysis of multivariate infectious disease surveillance counts. Statistical Modelling 5(3), 187–199.
  • Held and Paul (2012) Held, L. and Paul, M. (2012). Modeling seasonality in space-time infectious disease surveillance data. Biometrical Journal 54(6), 824–843.
  • Held and Sabanés Bové (2014) Held, L. and Sabanés Bové, D. (2014). Applied Statistical Inference: Likelihood and Bayes. Berlin: Springer.
  • Hens and others (2009) Hens, N., Ayele, G., Goeyvaerts, N., Aerts, M., Mossong, J., Edmunds, J. and Beutels, P. (2009). Estimating the impact of school closure on social mixing behaviour and the transmission of close contact infections in eight European countries. BMC Infectious Diseases 9(1), 187.
  • Herzog and others (2011) Herzog, S. A., Paul, M. and Held, L. (2011). Heterogeneity in vaccination coverage explains the size and occurrence of measles epidemics in German surveillance data. Epidemiology & Infection 139(4), 505–515.
  • Höhle (2016) Höhle, M. (2016). Infectious Disease Modelling. In: Lawson, A. B., Banerjee, S., Haining, R. P. and Ugarte, M. D. (editors), Handbook of Spatial Epidemiology, Chapman & Hall/CRC Handbooks of Modern Statistical Methods, Chapter 26. Boca Raton: Chapman and Hall/CRC, pp. 477–500.
  • Kucharski and others (2015) Kucharski, A. J., Conlan, A. J. K. and Eames, K. T. D. (2015). School’s out: seasonal variation in the movement patterns of school children. PLOS ONE 10(6), 1–10.
  • Küchenhoff and others (2006) Küchenhoff, H., Mwalili, S. M. and Lesaffre, E. (2006). A general method for dealing with misclassification in regression: The misclassification SIMEX. Biometrics 62(1), 85–96.
  • Marshall and Bruggink (2011) Marshall, J. A. and Bruggink, L. D. (2011). The dynamics of norovirus outbreak epidemics: Recent insights. International Journal of Environmental Research and Public Health 8(4), 1141–1149.
  • Meyer and Held (2014) Meyer, S. and Held, L. (2014). Power-law models for infectious disease spread. Annals of Applied Statistics 8(3), 1612–1639.
  • Meyer and others (2016) Meyer, S., Held, L. and Höhle, M. (2016). Spatio-temporal analysis of epidemic phenomena using the R package surveillance. Journal of Statistical Software. In press. Preprint available from
  • Mossong and others (2008) Mossong, J., Hens, N., Jit, M., Beutels, P., Auranen, K., Mikolajczyk, R., Massari, M., Salmaso, S., Tomba, G. S., Wallinga, J., Heijne, J., Sadkowska-Todys, M., Rosinska, M. and others. (2008). Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Medicine 5(3), e74.
  • O’Dea and others (2014) O’Dea, E. B., Pepin, K. M., Lopman, B. A. and Wilke, C. O. (2014). Fitting outbreak models to data from many small norovirus outbreaks. Epidemics 6, 18–29.
  • Paul and Held (2011) Paul, M. and Held, L. (2011). Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Statistics in Medicine 30(10), 1118–1136.
  • Paul and others (2008) Paul, M., Held, L. and Toschke, A. (2008). Multivariate modelling of infectious disease surveillance data. Statistics in Medicine 27(29), 6250–6267.
  • Pringle and others (2015) Pringle, K., Lopman, B., Vega, E., Vinje, J., Parashar, U. D. and Hall, A. J. (2015). Noroviruses: Epidemiology, immunity and prospects for prevention. Future Microbiology 10(1), 53–67.
  • R Core Team (2016) R Core Team. (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Read and others (2012) Read, J. M., Edmunds, W. J., Riley, S., Lessler, J. and Cummings, D. A. T. (2012). Close encounters of the infectious kind: Methods to measure social mixing behaviour. Epidemiology & Infection 140(12), 2117–2130.
  • Read and others (2014) Read, J. M., Lessler, J., Riley, S., Wang, S., Tan, L. J., Kwok, K. O., Guan, Y., Jiang, C. Q. and Cummings, D. A. T. (2014). Social mixing patterns in rural and urban areas of southern China. Proceedings of the Royal Society of London, Series B 281(1785), 20140268.
  • Rohani and others (2010) Rohani, P., Zhong, X. and King, A. A. (2010). Contact network structure explains the changing epidemiology of pertussis. Science 330(6006), 982–985.
  • Schrödle and others (2012) Schrödle, B., Held, L. and Rue, H. (2012). Assessing the impact of a movement network on the spatiotemporal spread of infectious diseases. Biometrics 68(3), 736–744.
  • Smieszek and others (2014) Smieszek, T., Barclay, V., Seeni, I., Rainey, J., Gao, H., Uzicanin, A. and Salathé, M. (2014). How should social mixing be measured: comparing web-based survey and sensor-based methods. BMC Infectious Diseases 14(1), 136.
  • Sukhrie and others (2012) Sukhrie, F. H. A., Teunis, P., Vennema, H., Copra, C., Thijs Beersma, M. F. C., Bogerman, J. and Koopmans, M. (2012). Nosocomial transmission of norovirus is mainly caused by symptomatic cases. Clinical Infectious Diseases 54(7), 931–937.
  • van Asten and others (2012) van Asten, L., van den Wijngaard, C., van Pelt, W., van de Kassteele, J., Meijer, A., van der Hoek, W., Kretzschmar, M. and Koopmans, M. (2012). Mortality attributable to 9 common infections: Significant effect of influenza A, respiratory syncytial virus, influenza B, norovirus, and parainfluenza in elderly persons. Journal of Infectious Diseases 206(5), 628–639.
  • van Kerckhove and others (2013) van Kerckhove, K., Hens, N., Edmunds, W. J. and Eames, K. T. D. (2013). The impact of illness on social networks: Implications for transmission and control of influenza. American Journal of Epidemiology 178(11), 1655–1662.
  • Wallinga and others (2006) Wallinga, J., Teunis, P. and Kretzschmar, M. (2006). Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents. American Journal of Epidemiology 164(10), 936–944.
  • Worby and others (2015) Worby, C. J., Chaves, S. S., Wallinga, J., Lipsitch, M., Finelli, L. and Goldstein, E. (2015). On the relative role of different age groups in influenza epidemics. Epidemics 13, 10–16.
  • Xia and others (2004) Xia, Y., Bjørnstad, O. N. and Grenfell, B. T. (2004). Measles metapopulation dynamics: A gravity model for epidemiological coupling and dynamics. The American Naturalist 164(2), 267–281.