Hyak Mortality Monitoring System
Innovative Sampling and Estimation Methods
Proof of Concept by Simulation
green
Abstract
Traditionally health statistics are derived from civil and/or vital registration. Civil registration in low to middleincome countries varies from partial coverage to essentially nothing at all. Consequently the state of the art for public health information in low to middleincome countries is efforts to combine or triangulate data from different sources to produce a more complete picture across both time and space – data amalgamation. Data sources amenable to this approach include sample surveys, sample registration systems, health and demographic surveillance systems, administrative records, census records, health facility records and others.
We propose a new statistical framework for gathering health and population data – Hyak – that leverages the benefits of sampling and longitudinal, prospective surveillance to create a cheap, accurate, sustainable monitoring platform. Hyak has three fundamental components:

Data Amalgamation: a sampling and surveillance component that organizes two or more data collection systems to work together: (1) data from HDSS with frequent, intense, linked, prospective followup and (2) data from sample surveys conducted in large areas surrounding the Health and Demographic Surveillance System (HDSS) sites using informed sampling so as to capture as many events as possible;

Cause of Death: verbal autopsy to characterize the distribution of deaths by cause at the population level; and

SES: measurement of socioeconomic status in order to characterize poverty and wealth.
We conduct a simulation study of the informed sampling component of Hyak based on the Agincourt HDSS site in South Africa. Compared to traditional cluster sampling, Hyak’s informed sampling captures more deaths, and when combined with an estimation model that includes spatial smoothing, produces estimates of both mortality counts and mortality rates that have lower variance and small bias.
ACKNOWLEDGMENTS
Preparation of this manuscript was supported by the Bill and Melinda Gates Foundation and grants K01 HD057246 and K01 HD078452 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD). JW was supported by R01CA095994. The authors are grateful to Peter Byass, Basia Zaba, Stephen Tollman, Adrian Raftery, Philip Setel, Osman Sankoh and two very constructive anonymous reviewers for helpful discussions or other inputs.
Contents
1 New Directions for Health and Population Statistics in Low to MiddleIncome Countries
1.1 Background
In most of the developed world the traditional source of basic public health information is civil registration and vital statistics. Civil registration is a system that records births and deaths within a government jurisdiction. The purpose is twofold: (1) to create a legal record for each person, and (2) to provide vital statistics. Optimally a civil register includes everyone in the jurisdiction, provides the basis to ensure their civil rights and creates a steady stream of vital statistics (United Nations. Statistical Division, 2014).
The vital statistics obtained from many wellfunctioning civil registration systems include birth rates by age of mother, mortality rates by sex, age and other characteristics, and causes of death for each death. These basic indicators are the foundation of public health information systems, and when they are taken from a nearfullcoverage civil registration system, they relate to the whole population.
Although the idea is inherently simple, implementing fullcoverage civil registration is not, and only the world’s richest countries are able to maintain ongoing civil registration systems that cover a majority of the population. Civil registration in the rest of the world varies from partial coverage to essentially nothing at all (Mathers et al., 2005). A fourarticle series titled “Who Counts?” in the Lancet in 2007 reviews the current state of civil registration (AbouZahr et al., 2007; Boerma and Stansfield, 2007; Hill et al., 2007; Horton, 2007; Mahapatra et al., 2007; Setel et al., 2007). This was followed eight years later with another fourarticle series presenting a similar but slightly more hopeful picture (AbouZahr et al., 2015b, a; Mikkelsen et al., 2015; Phillips et al., 2015). The authors lament that there has been a half a century of neglect in civil registration in low to middleincome countries, and critically, that it is not possible to obtain useful vital statistics from those countries (Mahapatra et al., 2007; Setel et al., 2007; Mikkelsen et al., 2015).
The Lancet authors argue that in the long term all countries need complete civil registration to ensure the civil rights of each one of their citizens and to provide useful, timely public health information (AbouZahr et al., 2007, 2015a), and they explore a number of interim options that would allow countries to move from where they are today to full civil registration (Hill et al., 2007). Echoing the Lancet special series are additional urgent pleas for better health statistics in low and middleincome countries (for example: Abouzahr et al., 2010; Bchir et al., 2006; Mathers et al., 2005, 2009; Rudan et al., 2000). The WHO and its partners and supporters have actively supported improvements in civil registration and vital statistics (CRVS) over the recent past (World Health Organization, 2013a, b, 2014). These workers clearly identify a need for representative data describing sex, age, and causespecific mortality through time in small enough areas to be meaningful for local governance and health institutions. These critiques are for the most part discussed in the framework of civil registration as the ‘primary’ source of data.
Recently, various United Nations agencies, including the office of the Secretary General, have articulated strong, specific support for rapid improvement in the evidence base for the Sustainable Development Goals (SDG) (United Nations, 2014b) – the international target framework that follows from the MDGs (e.g., Data Revolution Group: The UN Secretary General’s Independent Expert Advisory Group on a Data Revolution for Sustainable Development, 2014; Commission on Population and Development, 2016; United Nations, 2016). The appropriately named Data Revolution (United Nations, 2014a) is the flagship program organized by the UN to address the systematic lack of data to measure progress toward the SDG targets.
We agree that in order to ensure civil rights and provide each unique citizen with a legal identity, fullcoverage civil registration is the longterm goal. Acknowledging that, we propose decoupling the discussion of civil registration from vital statistics. In particular, we can obtain accurate and representative vital statistics measurements by making inferences from carefully adjusted samples.
The samplebased approach drives the production of population statistics in many other fields including economics, sociology and political science. Borrowing from these fields public health workers have developed sampledriven approaches to health statistics that partially substitute for vital statistics derived from civil registration. India has conducted a sample registration system (SRS) for several decades (Office of the Registrar General & Census Commissioner, India, 2012) that has produced good basic vital statistics, and more recently Jha and colleagues (2006) have added verbal autopsy (Lopez et al., 2011) to this system to create the Indian Million Death Study (MDS). In a similar vein, USAID’s Sample Vital Registration with Verbal Autopsy (SAVVY) is a program that combines sample registration with verbal autopsy and provides generalpurpose tools to collect data (MEASURE Evaluation, 2012). USAID’s Demographic and Health Surveys (DHS) (Measure DHS, 2012) and UNICEF’s Multiple Indicator Cluster Surveys (MICS) (UNICEF  Statistics and Monitoring, 2012) are good examples of traditional household surveys that describe a select subset of indicators for national populations at multiple points in time. There are many more similar sample surveys conducted by smaller organizations and aimed at specific diseases or the evaluation of specific interventions.
These approaches generally utilize sampling designs developed to provide crosssectional snapshots of the current state of the population with respect to an indicator. With the exception of India’s SRS and SAVVY, they lack the ongoing, prospective, longitudinal structure of a traditional vital registration system. They also often lack the spatial resolution to distinguish differences in indicator values across short distances. Finally, they often miss or undercount rare events because they typically take one measurement and rely on recall to fill in recent history.
The current state of the art for public health information in low to middleincome countries is efforts to combine or triangulate data from multiple sources to produce a more complete picture across both time and space. The usual sources of data include: nonrepresentative, lowcoverage, poor quality vital registration data; roughly onceperdecade census data; snapshot or repeated snapshot data from (sometimes nationally representative) household surveys; oneoff sample surveys conducted for a variety of specific reasons by a diverse array of organizations; sample registration systems; and finally, a hodgepodge of miscellaneous data sources that may include health and demographic surveillance systems (HDSS), sentinel surveillance systems, administrative records, clinic/hospital records and others.
Combining data from different sources with multiple sampling schemes presents a myriad of statistical challenges. We use data pooling as a broad term that describes methods that adjust for bias due to differences in representativeness across data from different sources. The global burden of disease study by the Institute for Health Metrics and Evaluation (Naghavi et al., 2015) is a highly visible example of data pooling. As another example, Gething et al. (2011) pool survey data to produce fine geographical scale Plasmodium falciparum malaria endemicity. Data amalgamation, also uses data from multiple sources, but is differentiated by active engagement in the data collection process. Data amalgamation uses proactive (e.g. Hyak) or adaptive mechanisms that actively adjust the data collection process to optimize a set of metrics – minimize bias, minimize variance, minimize cost, etc. A recent study of malaria prevalence by Kabaghe et al. (2017) is an example of data amalgamation in which survey locations are adaptively chosen to minimize the variance of a target, see Chipeta et al. (2016) for statistical details. In the survey sampling literature, adaptive cluster sampling has a relatively long history (Thompson and Seber, 1996), and has been used extensively in surveys of rare animal and plant species; we are not aware of any applications in the context considered here. In short, we use data pooling for situations where researchers combine several datasets not necessarily collected to measure the indicator of interest, whereas data amalgamation is an intentional strategy that incorporates multiple heterogeneous data sources into the design process.
Rowe (2009) and Bryce and Steketee (2010) describe a system of ‘integrated, continuous surveys’ that would produce ongoing, longitudinal monitoring of a variety of outcomes – a proactive engagement with the data collection process in keeping with our definition of amalgamation. Data from such a system could be representative with respect to population, time and space and thereby substitute for and improve on traditional vital statistics data. The idea is to systematize the nationally representative household surveys already implemented in a country, conduct them on a regular schedule with a permanent team and institute rigorous quality controls. The innovation is to turn traditional cross sectional surveys into something quasi longitudinal and to ensure a level of consistency and quality. This concept appears to still be in the idea stage without any real methodological development or realworld testing. More in the spirit of data amalgamation, Bryce and colleagues (2004) use a variety of data sources to conduct a multicountry evaluation of Integrated Management of Childhood Illness (IMCI) interventions. This evaluation does develop some adhoc methods for combining and interpreting data from diverse sources.
Victora et al. (2011) articulate a similar vision for a national platform for evaluating the effectiveness of public health interventions, specifically those targeting the Millennium Development Goals (MDG). The authors argue that national coverage with districtlevel granularity is necessary, and like Rowe and Bryce, that continuous monitoring is required to assess changes and thereby intervention impacts. That article contains significant discussion of general survey methods, sample size considerations and other methodological requirements that would be necessary to evaluate MDG interventions. Again however, there are no methodological details that would allow someone to design and implement a national, prospective survey system of the type described.
Several authors who work at HDSS sites have described an idea for carefully distributing HDSS sites throughout a country in way that could lead to a pseudo representative description of health indicators in the country through time (Ye et al., 2012). Although these authors do not provide details for how this could be done or evidence that it works, the basic idea is supported by work from Byass and his colleagues (2011) who examine the national representativeness of health indicators generated in individual Swedish counties in 1925. Byass and colleagues discover that any of the notobviouslyunusual counties produced indicator values that were broadly representative of the national population – the counties being roughly equivalent to an HDSS site, and Sweden in 1925 being roughly equivalent to low and middleincome countries today.
Prabhat Jha (2012) summarizes all of this in his description of five ideas for improving mortality monitoring with cause of death. His five ideas include SRS systems with verbal autopsy, improving the representativeness of HDSS (similar to Ye and colleagues (2012)), coordinating, representative retrospective surveys (similar to Rowe and Bryce) and finally using whatever decentquality civil registration data might be available.
We find only two fully implemented and demonstrated examples of data amalgamation in the public health sphere. Alkema and colleagues (2007; 2008) working with the UNAIDS Reference Group on Estimates, Modelling and Projections develop a Bayesian statistical method that simultaneously estimates the parameters of an epidemiological model that represents the timeevolving dynamics of HIV epidemics and calibrates the results of that model to match populationwide estimates of HIV prevalence. The epidemiological model is fit to sentinel surveillance data describing HIV prevalence among pregnant women who attend antenatal clinics, and the populationwide measures of prevalence come from DHS surveys. Interestingly the second example relates to a similar problem. Lanjouw and Ivaschenko at the World Bank (2010) describe a method to amalgamate populationlevel data from DHS surveys and HIV prevalence data from a sentinel surveillance system. The DHS contains representative information on a variety of items but not HIV prevalence, and the sentinel surveillance system describes the HIV prevalence of a select (nonrepresentative) subgroup, again pregnant women who attend antenatal clinics. Building on ideas in smallarea estimation, they develop and demonstrate a method to adjust the sentinel surveillance data and then predict the HIV prevalence of the whole population.
Although these are two specific applications of data amalgamation, it is this level of conceptual and methodological detail that are necessary in order to amalgamate data from different sources to produce representative, probabilistically meaningful results. The population, public health and evaluation literatures are full of urgent requests for better data and more useful methods to amalgamate data from different sources to answer questions about cause and effect and change at national and subnational levels, but there is very little in any of those literatures that actually develops the new concepts and methods that are necessary to deliver the required new capabilities. Chipeta et al. (2016) describe an adaptive design whose aim is to estimate disease prevalence.
1.2 A New Statistical Platform
Taking account of the situation described in the literature and firmly in the spirit of ‘data amalgamation’, we aim to develop a system that provides high quality, continuously generated, representative vital statistics and other population and health indicators using a system that is cheap and logistically tractable. We are confident that such a system can provide highly useful health information at all important geographical (and other) scales: nation, province, district, and perphaps even subdistrict.
As we argue above, we strongly believe that a samplebased approach is both appropriate and sufficient to produce meaningful, useful public health information, and we do not believe it is fiscally responsible to attempt to cover the entire population with a public health information system. That argument must be made on the basis of guaranteeing human rights alone.
1.3 Design Criteria
What we want is a cheap, sustainable, continuously operated monitoring system that combines the benefits of both sample surveys (representativity, sparse sampling, logistically tractable) and surveillance systems (detailed, linked, longitudinal, prospective with potentially intense monitoring – e.g. of pregnancy outcomes and neonatal deaths) to provide useful indicators for large populations over prolonged periods of time, so that we can monitor change and relate changes to possible determinants, including interventions. More specifically, ‘useful’ in this context means an informative balance of accuracy (bias) and precision (variance) – i.e. minimal but probably not zero bias accompanied by moderate variance. We want indicators that are close to the truth most of the time, and we want an ability to study causality properly. Critically, we want the whole system to be cheaper and more sustainable than existing systems, and perhaps offer additional advantages as well.
1.4 Hyak
We propose an integrated data collection and statistical analysis framework for improved population and public health monitoring in areas without comprehensive civil registration and/or vital statistics systems. We call this platform Hyak – a word meaning ‘fast’ in the Chinook Jargon of the Northwestern United States.
Hyak is conceived as having three fundamental components:

Data Amalgamation: a sampling and surveillance component that organizes two data collection systems to work together to provide the desired functionality: (1) data from HDSS with frequent, intense, linked, prospective followup and (2) data from sample surveys conducted in large areas around the HDSS sites using informed sampling so as to capture as many events as possible.

Verbal Autopsy (Lopez et al., 2011) to estimate the distribution of deaths by cause at the population level, and

SES: measurement of socioeconomic status (SES) at household, and perhaps other levels, in order to characterize poverty and wealth.
Hyak uses relatively small, intensive, longitudinal HDSS sites to understand what types of individuals (or households) are likely to be the most informative if they were to be included in a sample. With this knowledge the areas around the HDSS sites are sampled with preference given to the more informative individuals (households), thus increasing the efficiency of sampling and ensuring that sufficient data are collected to describe rare populations and/or rare events. This fully utilizes the information generated on an ongoing basis by the HDSS and produces indicator values that are representative of a potentially very large area around the HDSS site(s). Further, the information collected from the sample around the HDSS site can be used to calibrate the more detailed data from the HDSS, effectively allowing the detail in the HDSS data to be extrapolated to the larger population. For an example of how this has been done in the context of antenatal clinic HIV prevalence surveillance and DHS surveys, see Alkema et al. (2008). Another way to do this is to build a hierarchical Bayesian model of the indicator of interest, say mortality, with the HDSS being the first (informative) level and the surrounding areas being at the second level. Thus the surrounding area can borrow information from the HDSS but is not required to match or mirror the HDSS.
In the remainder of this work we focus on the informed sampling component of Hyak. Informed sampling seeks to capture as many events as possible. This is critical for the measurement of mortality, and especially for the measurement of causespecific mortality fractions (CSMF) at the population level. In order to adequately characterize the epidemiology of a population, it is necessary to measure the CSMF with some precision, and to do this a large number of death events with verbal autopsy are required, especially for rare causes. Informed sampling aims to make the measurement of mortality rates and CSMFs as efficient as possible.
Below we present a detailed example of the informed sampling idea and a pilot study based on information from the Agincourt HDSS site^{1}^{1}1From Kahn et al. (2012): The Agincourt health and sociodemographic surveillance system (HDSS), located in rural northeast South Africa close to the Mozambique border, was established in 1992 to support district health systems development led by the postapartheid ministry of health. At baseline in 1992, 57,600 people were recorded in 8,900 households in 20 villages; by 2006, the population had increased to 70,000 people in 11,700 households. This increase is partly due to Mozambican inmigrants overlooked in the baseline survey and to a new settlement established as part of the postapartheid governmentÕs Reconstruction and Development Program. In 2007, the study area was extended to include the catchment area of a new privately supported community health centre established to provide HIV treatment before public sector rollout of HAART. By mid2011, the population under surveillance comprised 90,000 people residing in 16,000 households in 27 villages. Households are selfdefined as Ôpeople who eat from the same pot of foodÕ. Given sustained high levels of temporary labour migration in southern Africa, we included temporary migrants residing for less than 6 months per year who retain close ties with their rural homes in the HDSS. There have been 17 census and vital event update rounds conducted strictly annually since 2000. Participation is virtually complete, with only two households refusing to participate in 2011. in South Africa (Kahn et al., 2007, 2012). The Agincourt HDSS is situated in the rural northeast of South Africa and covers an area of 420km^{2} comprising a subdistrict of 27 villages. The site monitors roughly 90,000 people in 16,000 households. The villages and households are dispersed widely across this area, and there is a functional road network linking them all. The epidemiology of the site is typical for South Africa with generally low mortality except for the effect of HIV at very young and middle ages, and in terms of wealth/poverty, the population is typical of a middleincome country (e.g. Kabudula et al., 2016; Clark et al., 2015; Houle et al., 2014; Clark et al., 2013; GómezOlivé et al., 2013; Houle et al., 2013). The Agincourt HDSS is the canonical HDSS, not extreme along any dimension, and generally representative of what an HDSS site is.
We generate virtual populations based on information from the Agincourt site, and then we simulate applications of traditional twostage cluster and Hyak sampling designs. We estimate sexagespecific mortality rates for children ages years (last birthday) and compare and discuss the results. In the Conclusions Section we describe how verbal autopsy methods can be integrated into the Hyak system and the ‘demographic feasibility’ of Hyak.
We are thinking about existing data collection methods and these objectives in a unified framework, and we are starting by experimenting with sampling and analytical frameworks that work together to provide the basis for a measurement system that is representative, accurate and efficient in terms of information gained per dollar spent (not the same as cheap in an absolute sense because estimation of a binary outcome like death is still bound by the fundamental constraints of the binomial model; i.e. relatively large numbers of deaths are needed for useful measurements).
A measurement system like this would be among the cheapest and most informative ways to monitor the mortality of children affected by interventions that cover large areas and exist for prolonged periods of time. With this in mind, the pilot project we present below focuses on childhood ages .
2 Pilot Study of Hyak InformedSampling via Simulation
2.1 Methodological Approach
In this section we describe our approach to sampling and analysis. To be concrete, we suppose that the outcome of interest is alive or dead for children age . There are two novel aspects to our approach:

Informed Sampling: Using existing information from a HDSS site we construct a mortality model based on villagelevel characteristics. On the basis of this model we subsequently predict the number of outcomes of interest in each village of the study region. We then set sample sizes in each village in proportion to these predictions.

Analysis: We model the sampled deaths as a function of known demographic factors and villagelevel characteristics, and then we employ spatial smoothing to tune the model to each village and exploit similarities of risk in neighboring villages.
2.1.1 Notation
Given our interest in the binary status alive or dead, our modeling framework is logistic regression with random effects. Specifically, let represent villages within the study region and index strata which we take as the four levels of sex (F, M) and age (Young: years, old: years). Households within areas will be represented by , for . The quantity of interest is , the unobserved true number of deaths in village and in sex/age stratum . We assume that the populations are known in all villages. Also assumed known are villagespecific covariates (for example, the average SES in village , a measure of water quality, or proximity to health care facilities).
The probability of dying in village and stratum is denoted by , which is the hypothetical proportion of children dying in a hypothetical infinite population in area and strata . We stress that we are carrying out a smallarea estimation problem so the target of interest is and the probability is just an intermediary which allows us to set up a model. If the full data were observed, we would take the probability to be the observed frequency . The survey design problem corresponds to choosing , the number of children in stratum that we sample in village . Of these, are recorded as dying.
In the next section, we describe models that will be used to analyze the data; once we have estimated probabilities from a generic model, , we use the estimator:
(1) 
where is the observed number of deaths and is the number of unsampled individuals in village and stratum .
2.1.2 Models
In this section, we describe models that may be fit to the sampled data.

Naïve Model: This baseline model simply estimates , i.e., a single probability is applied to the unsampled individuals in each village. The predicted number of deaths in each village is then (1) with .

Strata Model: This model estimates , so that estimates of four stratumspecific probabilities are calculated. The predicted number of deaths in each village is then (1) with .

Covariate Model: This approach fits a model to data from all villages where sampling was carried out and estimates stratum effects along with the association between risk and villagelevel covariates . We assume a logistic form,
(2) where . Hence, we have a model with a separate baseline for each stratum and with the covariates having a common effect across stratum and village, so there no interaction between covariates and stratum, and covariates and area. We use the maximum likelihood estimates and to obtain fitted probabilities:
which may be used in (1).

Spatial Covariate Model: This approach requires sufficient villages to have sampled data so that spatial random effects can be estimated. Specifically, we assume a Bayesian implementation of the model:
(3) where . We have three random effects in this model. The unstructured village and householdlevel error terms and , respectively, are independent and allow for excessbinomial variability. The householdlevel random effects also allow for dependence within households. The error terms are villagelevel spatial random effects that allow the smoothing of rates across space. There are many different forms that these random effects could take. A modelbased geostatistical approach (Diggle et al., 1998) would assume the collection arise from a multivariate normal distribution, with covariances a function of the distance between villages. We go a different route and use an intrinsic conditional autoregressive (ICAR) model (Besag et al., 1991) in which:
where is the set of neighbors of village and is the number of such neighbors. This model assumes that the prior distribution for the spatial effect in area , given its neighbors, is centered on the mean of the neighbors, with a variance that depends on the number of neighbors (with more neighbors reducing the prior variance). We describe our ‘shared boundary’ neighborhood scheme in the next section. We use the posterior means and to obtain fitted probabilities:
which may be used in (1); we do not include the household random effects as these are not relevant to predicting an arealevel summary, but rather account for withinhousehold clustering. Until relatively recently, fitting this model was computationally challenging within the context of a simulation study (which requires repeated fitting). However, Rue et al. (2009) have described a clever combination of Laplace approximations and numerical integration that can be used to carry out Bayesian inference for this this model – the integrated nested Laplace approximation (INLA). The INLA R package implements the INLA method. A Bayesian implementation requires specification of priors for all of the unknown parameters, which for model (3) consist of , , , and . We choose flat priors for , , and Gamma priors for , and .
2.1.3 The Simulation Study Region
We describe the study region that we create for the simulation study, in order to provide a context within which the different sampling strategies can be described. The study region is based on the Agincourt HDSS site in South Africa (Kahn et al., 2007, 2012). We assume individuals reside in one of 20 villages and that there are between 1,400 and 14,000 children in each village, . In addition for each village, we assume half the children are boys and half are girls, with 20% in the age range years and 80% in the age range years. Within each village, we assume that households contain between 1 and 5 children and follow the distribution

(household with 1 child)

(household with 2 children)

(household with 3 children)

(household with 4 children)

(household with 5 children)
We sample a single population of children and then take repeated draws from this population under the four sampling schemes described below. Beginning with the denominators , we sample the observed deaths using a binomial with probabilities given by (2).
We sample a second population of children and treat this population as a historical cohort. It is from this population that we treat three of these villages as HDSS sites for which we have extensive and complete information.
We form a Voronoi tessellation of the village boundaries based on the 20 coordinate pairs that describe the centroids of the villages. This operation forms a set of tiles, each associated with a centroid and is the set of points nearest to that point. This is a standard operation in spatial statistics (e.g. Denison and Holmes, 2001). We can then define neighbors (for the spatial model) as those villages whose tiles share an edge. Figure 1 shows the study region along with village centroids and associated village polygons (as defined by the Voronoi tesselations), along with edges showing the neighborhood structure.
2.1.4 Sampling Strategies
In this section we describe the sampling strategies that we compare. In each strategy, we consider four different sample sizes, , for the total number of children sampled: 1,300, 2,600, 3,900 and 5,200.

Twostage Cluster Sampling: Randomly select 5 villages and randomly sample households within each of the villages (since each household contains, on average, 3 children). Additional households will be sampled as needed until at least children are obtained from each village. This is an example of a twostage cluster sampling plan, a common design.

Stratified Sampling: Randomly sample children’s outcomes from each of the 20 villages. This strategy lies between the cluster sampling and informed sampling designs.

Hyak – HDSS with Informative Sampling: The number of children sampled from each village is proportional to the predicted number of deaths based on the HDSS data. In particular, we select all children from the three HDSS villages in the historical cohort and we fit model (2). On the basis of the estimated , we obtain predicted counts of deaths for all villages, using the villagelevel covariates , . Let be the estimated parameters based on the historic HDSS data only and be the associated village and stratumspecific probabilities. We estimate via
Then, the predicted number of deaths are . We then select sample sizes as the (rounded versions of) so that villages with more predicted deaths are sampled more heavily. Specifically, we take where is the total predicted number of deaths. The observed number of deaths from is .

Optimum Allocation: As in the Hyak sampling design, we obtain the villagelevel estimates of the probability of death, , based on the historic HDSS data only. We then select sample sizes as the (rounded versions of)
Details are provided in the Appendix.
2.1.5 Measures of Predictive Accuracy
Given total children, broken into the four stratum, we can set risks (details of which appear in Section 2.2) for each village/stratum and then simulate counts . We take this set of as fixed, and then subsample from these counts, under each of the four designs and repeat times.
The estimated number of deaths in survey villages in simulation is
where the are obtained from one of the models we described in Section 2.1.2.
To estimate the frequentist properties of the simulation procedure, we summarize the results by examining various summary measures. An obvious measure of accuracy is the mean squared error (MSE) associated with the predicted number of deaths. The MSE of an estimator of the number of deaths in area and strata , averaged over villages and strata is
where is the true number of deaths (which recall, is fixed), and the expectation is over all possible samples that can be taken (for whichever design we are considering). This MSE is estimated based on simulations:
(4)  
where is the true number of deaths in village and stratum and
is the average of the predicted counts over simulations in village and stratum . The decomposition in terms of bias and variance is useful since it makes apparent the tradeoff involved in modeling.
2.2 Simulation
We assume there are two villagelevel covariates so that the length of the vector is 2. Both of the villagelevel covariates and are generated independently from uniform distributions on 0 to 1, . Based loosely on the real values from the Agincourt HDSS in South Africa, the parameter values we use in the simulation are:

The risk of death in young girls is .

The risk of death in young boys is .

The risk of death in older girls is .

The risk of death in older boys is .

The first villagelevel covariate has so that a unit increase in leads to the odds of death dropping by one ninth.

The second villagelevel covariate has so that a unit increase in leads to the odds of death quadrupling.

We set to determine the level of unstructured variability at the village level. This leads to a 95% range for the residual unstructured villagelevel odds being .

We set to determine the level of structured variability at the village level. This operation requires some care because the ICAR model does not define a proper probability distribution. The ICAR variance is not interpretable as a marginal variance (and so is not comparable to the other random effects variances, and ) and so instead Figure 2 shows a simulated set of , values, with darker values indicating higher risk. The spatial dependence is apparent, with this realization producing high risk to the West of the region and low risk in the East.

We set to determine the level of unstructured variability at the household level. This leads to a 95% range for the residual unstructured householdlevel odds being .
For the strata and covariates model, the covariate relationship is estimated from the villages that produced data, and then model (2) is used to obtain fitted probabilities that are applied to the unsampled villages, using the population and covariate information that is assumed known for each village.
Combining all of the elements of the model, we generate deaths for village and stratum by randomly drawing from a Binomial distribution with probabilities given by (3). This yields the predicted probabilities for all 20 villages and for each of the four stratum displayed in Figure 3. The historic cohort is generated in the same fashion. Details of the villagelevel characteristics for both cohorts are provided in Appendix A.2.
The HDSS villages are selected by taking the villages with both large and large , small and small , followed by a randomly sampled third village.
A prior is used for the spatial and nonspatial random effects in the spatial models (Model IV).
2.3 Results
Table 2.3 summarizes the results of the simulation study for . Results for the smaller sample sizes are shown in Tables A.3A.3 in Appendix A.3 The number of average sampled deaths and bias, variance and MSE from (4) are displayed for each combination of sampling strategy and analytical model.
Overall, the Hyak sampling strategy captures more deaths and is generally more accurate. Across sampling schemes and sample sizes, Hyak generally has the smallest MSEs. Further examination of the components of the MSE reveals that: (i) Hyak yields smaller bias, and (ii) pays for this by sacrificing some variance. The overall comparison between the sampling strategies clearly favors Hyak. This partly reflects the careful choice of HDSS villages so that they contain substantial variation in terms of villagelevel covariates.
[H] Design Model Deaths Bias Variance () MSE () \bigstrut[b] Cluster I. Naïve 459 1,067 174 1,312 \bigstrut[t] II. Strata 459 874 188 951 III. Strata/Covariates 459 651 386 810 IV. Strata/Covariates/Space 459 — — — Stratified I. Naïve 460 1,058 5 1,124 \bigstrut[t] II. Strata 460 866 15 765 III. Strata/Covariates 460 651 16 439 IV. Strata/Covariates/Space 460 183 80 113 Hyak I. Naïve 538 1,162 7 1,357 \bigstrut[t] II. Strata 538 969 18 956 III. Strata/Covariates 538 635 16 419 IV. Strata/Covariates/Space 538 182 66 100 Optimum I. Naïve 477 1,072 5 1,154 II. Strata 477 880 18 792 III. Strata/Covariates 477 632 17 416 IV. Strata/Covariates/Space 477 167 74 102 \bigstrut[b]
Comparing the analytical models also produces an encouraging result. Within each sampling strategy, the logistic regression random effects covariate model (model IV) performs best overall (smaller MSEs). Within Hyak, this outperforms the others. Similar patterns are observed across all sample sizes. This suggests that accounting for unmeasured factors and taking advantage of the spatial structure of mortality risk is significantly worthwhile.
The tradeoff between bias and variance is clearly revealed by a closer look at the distributions of the estimated probability of dying produced by each model. Figure 4 displays these distributions for models I, III & IV – Naïve, Covariates and Covariates & Space under the Hyak sampling strategy for , while figures A.1A.3 in Appendix A.3 display these same distributions for and , respectively. The Naïve model estimates are very condensed, always miss the truth and have clear bias; estimates from the Covariates model also have very little spread, almost always miss the truth and have some bias; and finally, estimates from the Covariates & Space model have large spread, however the distributions nearly always include the truth, and have much less bias. Clearly the Covariates & Space model displays the balance we are seeking: small bias and manageable spread, and importantly, distributions that include the truth. This combination of sampling strategy and analytical approach provides our key objective: an indicator that is close to (and around) the truth most of the time.
Figure 5 displays the average village and strataspecific estimates for the (unobserved) population counts of death plotted against the true values across each of the four models under the Hyak sampling scheme for , while figures A.4A.6 in Appendix A.3 display the same for the smaller sample sizes. (See figures A.7A.18 in Appendix A.3 for the remaining sampling schemes for all sample sizes.) In general, the average estimates from the spatial model tend to follow the line quite closely, indicating we are estimating the true number of deaths in each village quite well. Estimates tend to be closer under the Hyak sampling strategy and for larger sample sizes, thus confirming (visually) our previous results.
3 Discussion
3.1 Key Conclusions
The key conclusion of this pilot study is that the statistical sampling and analysis ideas supporting the Hyak monitoring system are sound: a combination of highly informative data such as are produced by a HDSS site can be used to judiciously inform sampling of a large surrounding area to yield estimated counts of deaths that are far more useful than those produced by a traditional cluster sample design. Further, Hyak combined with an analytical model that includes unstructured random effects and spatial smoothing produces the most accurate and wellbehaved estimates. The improvements are dramatic and clearly justify additional work on these ideas.
Another crucial idea underlying Hyak is the notion that very detailed information generated by an HDSS site can be extrapolated to the much larger surrounding population by calibrating that information with carefully chosen and much less detailed data from the surrounding population. This idea has already been demonstrated convincingly by Alkema et al. [2008] and is currently being applied by UNAIDS to produce global estimates of HIV prevalence. This relies on the assumption that the population monitored by the HDSS is similar enough to the population surrounding the HDSS that the relationships between covariates and the outcomes of interest are the same or very similar. The degree to which this is true will vary among specific settings. In particular, when HDSS sites also serve as research and intervention testing sites, it is possible that there will be Hawthorne Effect issues – i.e. the intensively studied HDSS population will be different from the surrounding population that has not participated in studies and trials. This may affect the key covariateoutcome relationships that drive Hyak. This is something that must be studied, initially with a realworld pilot study of Hyak, and then in an ongoing way by occasionally verifying these relationships through an oversample of the surrounding population, or through small addon studies conducted whenever a census is done in the surrounding areas to update the sampling frame. Although this is a concern, it is unlikely to make Hyak infeasible or invalidate Hyak results. An explicit goal of a pilot study will be to characterize the uncertainty created by possible Hawthorne Effect issues and build them into Hyak estimates.
A key advantage of Hyak sampling strategy is that it captures significantly more deaths. Verbal autopsy methods (Lopez et al., 2011) can be applied to all or a fraction of these deaths to assign causes (immediate, contributing, etc.). This cause of death information can then be used to construct distributions of deaths by cause – CSMFs – which illuminate the epidemiological regime affecting the population, and if this is monitored through time, how the epidemiology of the population is changing. Critically, this provides a means of measuring the impact of interventions on specific causes of death and the distribution of deaths over time. The increased number of deaths captured with informed sampling increases the accuracy and precision of measurements of CSMFs.
A final benefit of the Hyak system is that it provides two types of infrastructure: the HDSS and the sample survey. In addition to providing information with which to sample, the HDSS provides a platform on which a wide variety of longitudinal studies can be undertaken – linked observational studies; randomized, controlled trials, all kinds of combinations of these, etc. Moreover, the permanent HDSS infrastructure also provides a training platform that can support a wide variety of health and behavioral science training, mentoring and apprenticing/interning and experience for young scientists or health professionals. Having the sample survey infrastructure provides a means of quickly validating/calibrating studies conducted by the HDSS and provides another learning dimension for the educational and training activities that the system can support.
A potential limitation of any mortality monitoring system is ‘demographic feasibility’, that is the ability to capture enough deaths in a given population to measure levels and/or changes in mortality, potentially by cause, through time. Death is a binomial process defined by a probability of dying, and as such, is governed by the characteristics of the binomial model. That model specifies in simple terms the number of deaths necessary to estimate the probability of dying within a given margin of error with a given level of confidence. No amount of sophistication will release us from that basic set of facts. The Hyak system addresses this challenge by providing a means through which to choose the best possible sample given what we know about the population, and this in turn maximizes our ability to capture deaths. The fundamentals of the binomial model require that one must observe relatively large numbers of deaths to measure mortality precisely and especially to measure changes in mortality with both precision and confidence. So in light of those inescapable realities, the Hyak system produces the most information per dollar spent, because it captures more deaths per dollar spent.
Finally and perhaps most importantly, the Hyak monitoring system is cheaper to run over a period of years compared to traditional cluster samplebased survey methods. Combined with the fact that Hyak also produces more useful information, this makes Hyak highly cost effective – more bang for less buck.
Importantly, there are implementation considerations that must be addressed before Hyak can be used at provincial or national scale to provide populationrepresentative estimates. These will need to be resolved through additional theoretical work, simulation, and ultimately through a pilot study that conducts Hyak on a large population dispersed over a large physical space. Among many, these critical questions need to be answered:

How big do HDSS sites need to be to provide enough information for effective informative sampling?

How many HDSS sites are necessary for effective informed sampling with respect to key demographic and epidemiological indicators?

How should HDSS sites be dispersed geographically?

How well does Hyak work to provide disaggregated (finegrained) estimates of key indicators by sex, age, wealth/poverty, space, time, etc?

How much does the sampling frame affect Hyak results, and what cheap, feasible solutions are there to obtaining frequently updated sampling frames?

A detailed costing and cost comparison needs to be done comparing the costs of the the HDSS site; the additional census, sampling, and interviewing needed for Hyak; and a traditional household multistage cluster sample survey (like DHS) conducted in the same area.

How the method can be scaled up to a larger geographical area. We envisage that only a subset of villages will be sampled, and then a geostatistical model (Wakefield et al., 2016) can be used for spatial prediction to unobserved villages (a critical question is the number of villages needed to train the spatial model). Another important issue is to deal with the potential problem of preferential sampling (Diggle et al., 2010) in which sampling locations are selected based on the expected size of the response. In order to inform sampling historical data (for example, DHS surveys) may be used to model to create a predictive surface, upon which sampling may be based. Investigating this idea will be the subject of a future paper.
References
 AbouZahr et al. (2007) AbouZahr, C., Cleland, J., Coullare, F., Macfarlane, S. B., Notzon, F. C., Setel, P., Szreter, S., Anderson, R. N., Bawah, A. a., Betrán, A. P., Binka, F., Bundhamcharoen, K., Castro, R., Evans, T., Figueroa, X. C., George, C. K., Gollogly, L., Gonzalez, R., Grzebien, D. R., Hill, K., Huang, Z., Hull, T. H., Inoue, M., Jakob, R., Jha, P., Jiang, Y., Laurenti, R., Li, X., Lievesley, D., Lopez, A. D., Fat, D. M., Merialdi, M., Mikkelsen, L., Nien, J. K., Rao, C., Rao, K., Sankoh, O., Shibuya, K., Soleman, N., Stout, S., Tangcharoensathien, V., van der Maas, P. J., Wu, F., Yang, G., and Zhang, S. (2007). The way forward. Lancet, 370(9601):1791–9.
 AbouZahr et al. (2015a) AbouZahr, C., De Savigny, D., Mikkelsen, L., Setel, P. W., Lozano, R., and Lopez, A. D. (2015a). Towards universal civil registration and vital statistics systems: the time is now. The Lancet, 386(10001):1407–1418.
 AbouZahr et al. (2015b) AbouZahr, C., De Savigny, D., Mikkelsen, L., Setel, P. W., Lozano, R., Nichols, E., Notzon, F., and Lopez, A. D. (2015b). Civil registration and vital statistics: progress in the data revolution for counting and accountability. The Lancet, 386(10001):1373–1385.
 Abouzahr et al. (2010) Abouzahr, C., Gollogly, L., and Stevens, G. (2010). Better data needed: everyone agrees, but no one wants to pay. Lancet, 375(9715):619–21.
 Alkema et al. (2008) Alkema, L., Raftery, A., and Brown, T. (2008). Bayesian melding for estimating uncertainty in national hiv prevalence estimates. Sexually transmitted infections, 84(Suppl 1):i11–i16.
 Alkema et al. (2007) Alkema, L., Raftery, A., and Clark, S. (2007). Probabilistic projections of hiv prevalence using bayesian melding. The Annals of Applied Statistics, 1(1):229–248.
 Bchir et al. (2006) Bchir, A., Bhutta, Z., Binka, F., Black, R., Bradshaw, D., Garnett, G., Hayashi, K., Jha, P., Peto, R., Sawyer, C., Schwartländer, B., Walker, N., Wolfson, M., Yach, D., and Zaba, B. (2006). Better health statistics are possible. Lancet, 367(9506):190–3.
 Besag et al. (1991) Besag, J., York, J., and Molliè, A. (1991). Bayesian image restoration, with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics, 43:1–59.
 Boerma and Stansfield (2007) Boerma, J. T. and Stansfield, S. K. (2007). Health Statistics 1 Health statistics now : are we making the right investments ? Tuberculosis, 369(9563):779–786.
 Bryce and Steketee (2010) Bryce, J. and Steketee, R. (2010). Continuous surveys and quality management in lowincome countries: a good idea. The American journal of tropical medicine and hygiene, 82(2):360; author reply 361–2.
 Bryce et al. (2004) Bryce, J., Victora, C. G., Habicht, J.P., Vaughan, J. P., and Black, R. E. (2004). The multicountry evaluation of the integrated management of childhood illness strategy: lessons for the evaluation of public health interventions. American journal of public health, 94(3):406–15.
 Byass et al. (2011) Byass, P., Sankoh, O., Tollman, S. M., Högberg, U., and Wall, S. (2011). Lessons from history for designing and validating epidemiological surveillance in uncounted populations. PloS one, 6(8):e22897.
 Chipeta et al. (2016) Chipeta, M. G., Terlouw, D. J., Phiri, K. S., and Diggle, P. J. (2016). Adaptive geostatistical design and analysis for prevalence surveys. Spatial Statistics, 15:70–84.
 Clark et al. (2015) Clark, S. J., GómezOlivé, F. X., Houle, B., Thorogood, M., KlipsteinGrobusch, K., Angotti, N., Kabudula, C., Williams, J., Menken, J., and Tollman, S. (2015). Cardiometabolic disease risk and hiv status in rural south africa: establishing a baseline. BMC public health, 15(1):135.
 Clark et al. (2013) Clark, S. J., Kahn, K., Houle, B., Arteche, A., Collinson, M. A., Tollman, S. M., and Stein, A. (2013). Young children’s probability of dying before and after their mother’s death: a rural south african populationbased surveillance study. PLoS Med, 10(3):e1001409.
 Commission on Population and Development (2016) Commission on Population and Development (2016). Strengthening the demographic evidence base for the post2015 development agenda: Report of the secretarygeneral. Technical report, United Nations.
 Data Revolution Group: The UN Secretary General’s Independent Expert Advisory Group on a Data Revolution for Sustainable Development (2014) Data Revolution Group: The UN Secretary General’s Independent Expert Advisory Group on a Data Revolution for Sustainable Development (2014). A world that counts: Mobilising the data revolution for sustainable development. Technical report.
 Denison and Holmes (2001) Denison, D. and Holmes, C. (2001). Bayesian partitioning for estimating disease risk. Biometrics, 57(1):143–149.
 Diggle et al. (2010) Diggle, P. J., Menezes, R., and Su, T.L. (2010). Geostatistical inference under preferential sampling. Journal of the Royal Statistical Society: Series C (Applied Statistics), 59:191–232.
 Diggle et al. (1998) Diggle, P. J., Tawn, J. A., and Moyeed, R. A. (1998). Modelbased geostatistics (with discussion). Applied Statistics, 47:299–350.
 Gething et al. (2011) Gething, P. W., Patil, A. P., Smith, D. L., Guerra, C. A., Elyazar, I. R., Johnston, G. L., Tatem, A. J., and Hay, S. I. (2011). A new world malaria map: Plasmodium falciparum endemicity in 2010. Malaria journal, 10(1):378.
 GómezOlivé et al. (2013) GómezOlivé, F. X., Angotti, N., Houle, B., KlipsteinGrobusch, K., Kabudula, C., Menken, J., Williams, J., Tollman, S., and Clark, S. J. (2013). Prevalence of hiv among those 15 and older in rural south africa. AIDS care, 25(9):1122–1128.
 Hill et al. (2007) Hill, K., Lopez, A. D., Shibuya, K., Jha, P., AbouZahr, C., Anderson, R. N., Bawah, A. a., Betrán, A. P., Binka, F., Bundhamcharoen, K., Castro, R., Cleland, J., Coullare, F., Evans, T., Carrasco Figueroa, X., George, C. K., Gollogly, L., Gonzalez, R., Grzebien, D. R., Huang, Z., Hull, T. H., Inoue, M., Jakob, R., Jiang, Y., Laurenti, R., Li, X., Lievesley, D., Fat, D. M., Macfarlane, S., Mahapatra, P., Merialdi, M., Mikkelsen, L., Nien, J. K., Notzon, F. C., Rao, C., Rao, K., Sankoh, O., Setel, P. W., Soleman, N., Stout, S., Szreter, S., Tangcharoensathien, V., van der Maas, P. J., Wu, F., Yang, G., Zhang, S., and Zhou, M. (2007). Interim measures for meeting needs for health sector data: births, deaths, and causes of death. Lancet, 370(9600):1726–35.
 Horton (2007) Horton, R. (2007). Counting for health. Lancet, 370(9598):1526.
 Houle et al. (2014) Houle, B., Clark, S. J., GómezOlivé, F. X., Kahn, K., and Tollman, S. M. (2014). The unfolding countertransition in rural south africa: mortality and cause of death, 1994–2009. PLoS One, 9(6):e100420.
 Houle et al. (2013) Houle, B., Stein, A., Kahn, K., Madhavan, S., Collinson, M., Tollman, S. M., and Clark, S. J. (2013). Household context and child mortality in rural south africa: the effects of birth spacing, shared mortality, household composition and socioeconomic status. International journal of epidemiology, 42(5):1444–1454.
 Jha (2012) Jha, P. (2012). Counting the dead is one of the world’s best investments to reduce premature mortality. Hypothesis, 10(1).
 Jha et al. (2006) Jha, P., Gajalakshmi, V., Gupta, P. C., Kumar, R., Mony, P., Dhingra, N., and Peto, R. (2006). Prospective study of one million deaths in India: rationale, design, and validation results. PLoS medicine, 3(2):e18.
 Kabaghe et al. (2017) Kabaghe, A. N., Chipeta, M. G., McCann, R. S., Phiri, K. S., Van Vugt, M., Takken, W., Diggle, P., and Terlouw, A. D. (2017). Adaptive geostatistical sampling enables efficient identification of malaria hotspots in repeated crosssectional surveys in rural malawi. PLoS One, 12(2):e0172266.
 Kabudula et al. (2016) Kabudula, C. W., Houle, B., Collinson, M. A., Kahn, K., Tollman, S., and Clark, S. (2016). Assessing changes in household socioeconomic status in rural south africa, 2001–2013: a distributional analysis using household asset indicators. Social Indicators Research, pages 1–27.
 Kahn et al. (2012) Kahn, K., Collinson, M., GómezOlivé, F., Mokoena, O., Twine, R., Mee, P., Afolabi, S., Clark, B., Kabudula, C., Khosa, A., et al. (2012). Profile: Agincourt health and sociodemographic surveillance system. International Journal of Epidemiology, 41(4):988–1001.
 Kahn et al. (2007) Kahn, K., Tollman, S., Collinson, M., Clark, S., Twine, R., Clark, B., Shabangu, M., GómezOlivé, F., Mokoena, O., and Garenne, M. (2007). Research into health, population and social transitions in rural south africa: Data and methods of the agincourt health and demographic surveillance system1. Scandinavian Journal of Public Health, 35(69 suppl):8–20.
 Lanjouw and Ivaschenko (2010) Lanjouw, P. and Ivaschenko, O. (2010). A new approach to producing geographic profiles of hiv prevalence. Technical Report 5207, World Bank  Policy Research Working Paper Series.
 Lopez et al. (2011) Lopez, A. D., Lozano, R., Murray, C. J., and Shibuya, K. (2011). Verbal autopsy: innovations, applications, opportunities  improving cause of death measurement (article collection). Population Health Metrics, 9.
 Mahapatra et al. (2007) Mahapatra, P., Shibuya, K., Lopez, A. D., Coullare, F., Notzon, F. C., Rao, C., and Szreter, S. (2007). Civil registration systems and vital statistics: successes and missed opportunities. The Lancet, 370(9599):1653–1663.
 Mathers et al. (2009) Mathers, C. D., Boerma, T., and Ma Fat, D. (2009). Global and regional causes of death. British medical bulletin, 92:7–32.
 Mathers et al. (2005) Mathers, C. D., Fat, D. M., Inoue, M., Rao, C., and Lopez, A. D. (2005). Counting the dead and what they died from: an assessment of the global status of cause of death data. Bulletin of the World Health Organization, 83(3):171–7.
 Measure DHS (2012) Measure DHS (2012). Demographic and Health Surveys. http://www.measuredhs.com.
 MEASURE Evaluation (2012) MEASURE Evaluation (2012). SAVVY: Sample Vital Registration with Verbal Autopsy. http://www.cpc.unc.edu/measure/tools/monitoringevaluationsystems/savvy.
 Mikkelsen et al. (2015) Mikkelsen, L., Phillips, D. E., AbouZahr, C., Setel, P. W., De Savigny, D., Lozano, R., and Lopez, A. D. (2015). A global assessment of civil registration and vital statistics systems: monitoring data quality and progress. The Lancet, 386(10001):1395–1406.
 Naghavi et al. (2015) Naghavi, M., Wang, H., Lozano, R., Davis, A., Liang, X., Zhou, M., Vollset, S. E., Ozgoren, A. A., Abdalla, S., AbdAllah, F., et al. (2015). Global, regional, and national agesex specific allcause and causespecific mortality for 240 causes of death, 19902013: a systematic analysis for the global burden of disease study 2013. Lancet, 385(9963):117–171.
 Office of the Registrar General & Census Commissioner, India (2012) Office of the Registrar General & Census Commissioner, India (2012). India’s Sample Registration System. http://censusindia.gov.in/Vital_Statistics/SRS/Sample_Registration_System.aspx.
 Phillips et al. (2015) Phillips, D. E., AbouZahr, C., Lopez, A. D., Mikkelsen, L., De Savigny, D., Lozano, R., Wilmoth, J., and Setel, P. W. (2015). Are well functioning civil registration and vital statistics systems associated with better health outcomes? The Lancet, 386(10001):1386–1394.
 Rowe (2009) Rowe, A. K. (2009). Potential of integrated continuous surveys and quality management to support monitoring, evaluation, and the scaleup of health interventions in developing countries. The American journal of tropical medicine and hygiene, 80(6):971–9.
 Rudan et al. (2000) Rudan, I., Lawn, J., Cousens, S., Rowe, A. K., BoschiPinto, C., Tomasković, L., Mendoza, W., Lanata, C. F., RocaFeltrer, A., Carneiro, I., Schellenberg, J. a., Polasek, O., Weber, M., Bryce, J., Morris, S. S., Black, R. E., and Campbell, H. (2000). Gaps in policyrelevant information on burden of disease in children: a systematic review. Lancet, 365(9476):2031–40.
 Rue et al. (2009) Rue, H., Martino, S., and Chopin, N. (2009). Approximate bayesian inference for latent gaussian models by using integrated nested laplace approximations. Journal of the royal statistical society: Series b (statistical methodology), 71(2):319–392.
 Setel et al. (2007) Setel, P. W., Macfarlane, S. B., Szreter, S., Mikkelsen, L., Jha, P., Stout, S., and AbouZahr, C. (2007). A scandal of invisibility: making everyone count by counting everyone. Lancet, 370(9598):1569–77.
 Thompson and Seber (1996) Thompson, S. K. and Seber, G. A. (1996). Adaptive sampling. Wiley.
 UNICEF  Statistics and Monitoring (2012) UNICEF  Statistics and Monitoring (2012). Multiple Indicator Cluster Surveys (MICS). http://www.unicef.org/statistics/index_24302.html.
 United Nations (2014a) United Nations (2014a). Data Revolution for Sustainable Development. http://www.un.org/apps/news/story.asp?NewsID=48594#.VEVQpoctuvJ.
 United Nations (2014b) United Nations (2014b). Sustainable Development Goals. http://sustainabledevelopment.un.org/owg.html.
 United Nations (2016) United Nations (2016). Resolution 2016/1: Strengthening the demographic evidence base for the 2030 agenda for sustainable development. http://undocs.org/E/CN.9/2016/1.
 United Nations. Statistical Division (2014) United Nations. Statistical Division (2014). Principles and Recommendations for a Vital Statistics System. United Nations Department of Economic and Social Affairs, New York, 3 edition.
 Victora et al. (2011) Victora, C. G., Black, R. E., Boerma, J. T., and Bryce, J. (2011). Measuring impact in the Millennium Development Goal era and beyond: a new approach to largescale effectiveness evaluations. Lancet, 377(9759):85–95.
 Wakefield et al. (2016) Wakefield, J., Simpson, D., and Godwin, J. (2016). Comment: Getting into space with a weight problem. discussion of, “modelbased geostatistics for prevalence mapping in lowresource settings”, by P. J. Diggle and E. Giorgi. Journal of the American Statistical Association, 111:1111–1119.
 World Health Organization (2013a) World Health Organization (2013a). Strengthening civil registration and vital statistics for births, deaths and causes of death: resource kit. Technical report, World Health Organization.
 World Health Organization (2013b) World Health Organization (2013b). Strengthening civil registration and vital statistics systems through innovative approaches in the health sector: Guiding principles and good practices. Technical report, World Health Organization.
 World Health Organization (2014) World Health Organization (2014). Improving mortality statistics through civil registration and vital statistics systems: Strategies for country and partner support. Technical report, World Health Organization.
 Ye et al. (2012) Ye, Y., Wamukoya, M., Ezeh, A., Emina, J., and Sankoh, O. (2012). Health and demographic surveillance systems: a step towards full civil registration and vital statistics system in subSahara Africa? BMC public health, 12(1):741.
Appendix A Appendix
a.1 Optimum allocation sampling strategy details
Suppose we have stratum, indexed by . In our case the strata are areas. Let be the population of area and the total population in the study region.
Let be the indicator of whether child in area died, , . Then we are interested in , the total number of deaths. The fraction of deaths is .
Let and be the standard deviation of the response in stratum where
which is estimated by
If we use the usual estimator of then the variance is
where , which leads to
Substituting in gives the estimated variances.
We wish to choose , the number of samples to take in area .
Then the optimum allocation, in the sense of minimizing (which is the same as minimizing the variance of ) is Neyman allocation in which
(5) 
Note: we really should be minimizing MSE as our estimators are biased (since they are random effects models with shrinkage).
a.2 Villagelevel characteristics for the current and historic cohorts
Tables A.1 and A.2 display the village characteristics for both the currentday and historical cohorts. The currentday cohort is the fixed population from which we draw repeated samples, while the historical cohort is used by the Hyak and optimum sampling schemes to obtain estimated villagelevel probabilities of death. In our simulation, we used villages 4, 7 and 8 as the HDSS sites.
Village  Number of Households  Number of Children  # Deaths  P(Death)  

1  4221  12523  1654  0.13  0.56  0.70 
2  1376  4150  119  0.03  0.92  0.32 
3  3050  9172  169  0.02  0.89  0.55 
4  3804  11331  483  0.04  0.92  0.56 
5  1275  3802  492  0.13  0.39  0.68 
6  1515  4550  156  0.03  0.58  0.17 
7  3036  9011  929  0.10  0.77  0.98 
8  2648  7870  554  0.07  0.32  0.07 
9  1957  5841  658  0.11  0.55  0.83 
10  3532  10630  500  0.05  0.57  0.47 
11  2679  7981  1286  0.16  0.10  0.60 
12  2034  6043  413  0.07  0.05  0.83 
13  2082  6291  218  0.03  0.73  0.17 
14  3320  9901  939  0.09  0.76  0.96 
15  2466  7361  196  0.03  0.53  0.51 
16  2467  7301  531  0.07  0.66  0.44 
17  709  2092  230  0.11  0.04  0.51 
18  1192  3610  725  0.20  0.02  0.76 
19  3083  9300  600  0.06  0.62  0.27 
20  836  2482  447  0.18  0.09  0.97 
Village  Number of Households  Number of Children  # Deaths  P(Death)  

1  1460  4331  587  0.14  0.56  0.70 
2  4064  12001  331  0.03  0.92  0.32 
3  524  1552  33  0.02  0.89  0.55 
4  2927  8720  377  0.04  0.92  0.56 
5  4022  11891  1499  0.13  0.39  0.68 
6  4157  12450  393  0.03  0.58  0.17 
7  2873  8532  919  0.11  0.77  0.98 
8  1529  4540  322  0.07  0.32  0.07 
9  4108  12152  1292  0.11  0.55  0.83 
10  1570  4640  231  0.05  0.57  0.47 
11  2789  8342  1444  0.17  0.10  0.60 
12  3685  10931  693  0.06  0.05  0.83 
13  1786  5242  165  0.03  0.73  0.17 
14  674  2070  187  0.09  0.76  0.96 
15  473  1402  31  0.02  0.53  0.51 
16  3187  9550  735  0.08  0.66  0.44 
17  4344  13080  1329  0.10  0.04  0.51 
18  3449  10302  2058  0.20  0.02  0.76 
19  3080  9191  666  0.07  0.62  0.27 
20  468  1422  286  0.20  0.09  0.97 
a.3 Additional simulation results
Tables A.3, A.3, and A.3 summarize the results of the simulation study for and , respectively. The number of average sampled deaths and bias, variance and MSE from (4) are displayed for each combination of sampling strategy and analytical model.
[h] Design Model Deaths Bias Variance () MSE () \bigstrut[b] Cluster I. Naïve 342 1,072 192 1,342 \bigstrut[t] II. Strata 342 878 207 977 III. Strata/Covariates 342 644 775 1,190 IV. Strata/Covariates/Space 342 — — — Stratified I. Naïve 344 1,066 9 1,145 \bigstrut[t] II. Strata 344 871 26 785 III. Strata/Covariates 344 660 25 460 IV. Strata/Covariates/Space 344 225 99 150 Hyak I. Naïve 409 1,181 8 1,402 \bigstrut[t] II. Strata 409 982 25 988 III. Strata/Covariates 409 640 22 431 IV. Strata/Covariates/Space 409 188 92 128 Optimum I. Naïve 356 1,079 7 1,171 II. Strata 356 885 23 806 III. Strata/Covariates 356 642 23 436 IV. Strata/Covariates/Space 356 194 85 123 \bigstrut[b]
[h] Design Model Deaths Bias Variance () MSE () \bigstrut[b] Cluster I. Naïve 250 1,075 170 1,326 \bigstrut[t] II. Strata 250 881 190 966 III. Strata/Covariates 250 659 382 816 IV. Strata/Covariates/Space 250 — — — Stratified I. Naïve 256 1,075 11 1,166 \bigstrut[t] II. Strata 256 879 30 802 III. Strata/Covariates 256 664 27 468 IV. Strata/Covariates/Space 256 248 123 185 Hyak I. Naïve 302 1,193 15 1,439 \bigstrut[t] II. Strata 302 992 41 1,025 III. Strata/Covariates 302 646 30 448 IV. Strata/Covariates/Space 302 209 109 152 Optimum I. Naïve 264 1,090 10 1,198 II. Strata 264 893 31 829 III. Strata/Covariates 264 646 29 446 IV. Strata/Covariates/Space 264 223 109 159 \bigstrut[b]
[h] Design Model Deaths Bias Variance () MSE () \bigstrut[b] Cluster I. Naïve 113 1,079 193 1,358 \bigstrut[t] II. Strata 113 886 241 1,025 III. Strata/Covariates 113 662 1,252 1,690 IV. Strata/Covariates/Space 113 — — — Stratified I. Naïve 119 1,088 23 1,205 \bigstrut[t] II. Strata 119 895 62 863 III. Strata/Covariates 119 662 60 499 IV. Strata/Covariates/Space 119 325 196 301 Hyak I. Naïve 138 1,193 24 1,447 \bigstrut[t] II. Strata 138 1,001 70 1,071 III. Strata/Covariates 138 655 61 491 IV. Strata/Covariates/Space 138 309 175 271 Optimum I. Naïve 122 1,100 27 1,238 II. Strata 122 902 78 891 III. Strata/Covariates 122 658 68 500 IV. Strata/Covariates/Space 122 306 203 297 \bigstrut[b]
Figures A.1A.3 display the distributions of the estimated probability of dying produced by each model (models I, III & IV – Naïve, Covariates and Covariates & Space) under the Hyak sampling strategy for and , respectively.
Figures A.4A.6 display the average village and strataspecific estimates for the (unobserved) population counts of death plotted against the true values across each of the four models under the Hyak sampling scheme for and , respectively.
Figures A.7A.10 display the average village and strataspecific estimates for the (unobserved) population counts of death plotted against the true values across each of the four models under the twostage cluster sampling scheme for and , respectively.