A Millennium Bug Still Bites Public Health
– An Illustration Using Cancer Mortality
Corresponding authors: Wenjiang J. Fu, Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan 48824. Tel: (517) 353-8623 ext 113. Email: fuw@msu.edu; and
Shuangge Ma, Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut 06520. Tel: (203) 785-3119. Email: shuangge.ma@yale.edu.
ABSTRACT
Accurate estimation of cancer mortality rates and the comparison across cancer sites, populations
or time periods is crucial to public health, as identification of vulnerable groups who suffer the most from these diseases may lead to efficient cancer care and control with timely treatment. Because cancer mortality rate varies with age, comparisons require age–standardization using a reference population. The current method of using the Year 2000 Population Standard is standard practice, but serious concerns have been raised about its lack of justification. We have found that using the US Year 2000 Population Standard as reference overestimates prostate cancer mortality rates by 12–91% during the period 1970–2009 across all six sampled U.S. states, and also underestimates case fatality rates by 9–78% across six cancer sites, including female breast, cervix, prostate, lung, leukemia and colon-rectum. We develop a mean reference population method to minimize the bias using mathematical optimization theory and statistical modeling. The method corrects the bias to the largest extent in terms of squared loss and can be applied broadly to studies of many diseases.
Keywords: Age-standardization; Bias; Crude rate; Optimization
Introduction
Cancer is one of the leading causes of death in the United States and a major public health concern [1-5]. Cancer mortality rates are often reported by age group, making it difficult to assess overall mortality or to compare populations [6]. Researchers often calculate a summary rate, such as the crude rate, which is an average of the age-specific mortality rates weighted by the proportions of the age groups in the population. Such summary rates depend on both the age-specific mortality rates and the population’s age structure. The latter may vary widely and cause unfair comparisons among populations, creating the numerical illusion of large differences in the summary rate even when the age-specific mortality rates are the same [7]. This discrepancy motivated the direct age-standardization procedure for comparing mortality rates across populations [7-9] using a standard population as reference, such as the US Year 2000 Population Standard in current practice.
The age-standardization method calculates an age-adjusted rate using the age structure of a standard population to compare disease rates across populations or time periods. This method has been extensively adopted by the United States and world health agencies [4-5,10-12] following a memorandum from the Secretary of the U.S. Department of Health and Human Services in 1998 [13] mandating the use of the US Year 2000 Population Standard to calculate age-adjusted mortality rate starting 1999 for more consistent reporting of mortality rate [14]. Accordingly, the US Year 2000 Population Standard or the Year 2000 World Standard Population has been used as reference in public health reports by many U.S. states [15-20] and worldwide agencies [12].
Although age-standardization provides a way to compare disease rates among populations and has acknowledged merits [6-7,10-12], the current approach of using a standard reference population also has caveats. It has been noted that choosing different reference populations may change the age-adjusted mortality rates dramatically and may also alter the ranking [7, 21]. As a result, the selection of the standard population is still under debate [7, 12]. On one hand, selecting the Year 2000 Population Standard keeps the mortality rate adjustment consistent with a contemporary reference population [13, 14], making the comparison procedure uniform and easy to follow [21]. On the other hand, a study by the World Health Organization (WHO) pointed out “There is clearly no conceptual justification for choosing one standard over another, hence the choice is arbitrary” [12]. Further, a health disparities study attributed declining racial/ethnic and socioeconomic inequalities in health to the change of the reference population from Year 1940 and 1970 to Year 2000, a “statistical illusion” due to the use of the Year 2000 Population Standard [23]. This illusory effect of the Y2K or millennium bug is not the result of a technical problem as in the computer programming case, but rather of a more difficult methodological one that requires theoretical research in quantitative science. The change of reference from the US Year 1940 Population to the US Year 2000 Population Standard may cause the age-adjusted mortality rate to increase substantially [14], sometimes more than doubling [24]. The fact that the conclusion of a mortality rate comparison depends on the choice of reference population creates confusion and misinterpretation [23]. Consequently, concerns about inadequate public health policymaking on healthcare and on racial/ethnic and socioeconomic inequalities that result from improper comparisons need to be addressed urgently.
In this study, we investigated the issue of reference population selection using the Surveillance, Epidemiology and End Results (SEER) database [25]. We analyzed prostate cancer mortality rates in six U.S. states (California, Massachusetts, Michigan, Missouri, New Jersey and New York) from 1970 to 2009 and also examined the U.S. case fatality rates in 2008 of six sites: female breast, cervix, prostate, lung, leukemia, and colon-rectum. We found that the age-standardization procedure using the US Year 2000 Population Standard as reference overestimated prostate cancer mortality rates in all six states and underestimated case fatality rates of all six cancer sites. This finding clearly indicates that bias has been introduced by the age-standardization procedure. To minimize the bias, we developed a mean reference population method to compare different populations. This method possesses a number of advantageous properties. First, the mean reference population minimizes the overall squared bias among all convex linear combinations. Second, by construction the mean reference population resembles the age profiles of all populations in comparison and thus represents them accurately while a standard population may present a largely different profile. Third, the existence and uniqueness of such a mean reference population is guaranteed by the mathematical optimization theory and can be computed using a mathematical quadratic programming method or a statistical sampling method. Fourth, the mean reference population method does not depend on the specifics of cancer mortality and can thus be applied to studies of incidence and mortality rates of other diseases. We show that the mean reference population method reduced to a large extent the overall bias associated with the use of the US Year 2000 Population Standard in the age-standardization procedure and yielded cancer mortality rate close to the crude rate.
Results
By definition, the crude rate calculated with the total probability rule provides an unbiased estimate of the mortality rate [26]. To achieve fair comparison, however, a reference population is needed to remove artifacts due to varying age structure, and bias is inevitably introduced by a reference population. See Materials and Methods for more details and the definitions of the various rates.
Compared to the crude rate, the age-adjusted rate using the US Year 2000 Population Standard as reference overestimated prostate cancer mortality rate by 12% to 91% in all eight periods studied during 1970–2009, consistently in all six states (Table 1). The cumulative rate, which takes the sum of the age-specific mortality rates from age 40 to 79 of each population [6], yielded relative deviation of 1227% to 3145%, or 13 to 32 times that of the crude rate. In contrast, the mean reference population yielded age-adjusted rates much closer to the crude rates, with relative deviation between -25% and 27% and no systematic deviation towards either overestimation or underestimation. Fig. 1 illustrates the comparison in the State of California. Similar patterns were observed in the other five states (Fig. S1 in Supplementary Material). Due to its large scale, the cumulative rate is not shown.
Table 2 compares the observed number of deaths in each population with the number expected under the crude rate, under the age-adjusted rate using the US Year 2000 Population Standard, and under the age-adjusted rate using the mean reference population. The crude rate yielded exactly the observed number of deaths, indicating unbiased estimation, whereas the age-adjusted rate using the US Year 2000 Population Standard yielded a much larger number of deaths than observed, indicating biased estimation by the age-adjusted rate. Overall, the mean reference population yielded a more accurate estimate of the number of deaths than the US Year 2000 Population Standard. The expected number of deaths by the cumulative rate was not calculated because the cumulative rate is not a weighted average of the age-specific rates and thus is not a rate in a strict sense.
The age-adjusted rate using the US Year 2000 Population Standard as reference underestimated the case fatality rate of all six cancer sites, as shown in Table 3. The relative deviations were -78%, -52%, -42%, -39%, -32% and -9% for cancer of the prostate, lung, cervix, leukemia, female breast, and colon–rectum, respectively, indicating consistent underestimation across cancer sites. This discrepancy altered the ranking between leukemia and colon-rectum cancer. The cumulative rate overestimated the case fatality rate by 284–755%. In contrast, the mean reference population yielded estimates much closer to the crude rate, with a mean deviation equal to 403.4, 90% less than that by the US Year 2000 Population Standard (4305.1) and 99% less than that by the cumulative rate (53351.3). Fig. 2 illustrates the comparison, again with the cumulative rate not shown because of its large scale.
We then explored possible explanations for the large contrast of the age-adjusted rates between the two references, the US Year 2000 Population Standard and the mean reference population. We first examined the age profile of the reference populations using the case fatality study data and compared them to those of the six cancer patient populations. As shown in Fig. 3, five out of six cancers (all except leukemia) had virtually no patients before age 20, followed by a sharp increase after age 20, a peak between age 50 and 70 and a decline thereafter (except for colon-rectum cancer, which increased through age 85+). Mortality rates for leukemia were positive in early age, decreased slowly between age 15 and 40, and then increased thereafter. Overall, the mean reference population had an increasing trend similar to those of the six cancers, suggesting that it represented the cancer patient populations accurately. In contrast, the US Year 2000 Population Standard had an overall decreasing trend, staying large before age 20, peaking around 40 and then decreasing sharply thereafter. This sharp contrast suggests that the US Year 2000 Population Standard did not represent cancer patient populations, which explains why it yielded large deviation from the crude rate in comparing case fatality rates.
We then examined the weights of the reference populations. The mean reference population had a positive weight for each cancer site by construction and accurately represented the six cancer patient populations on the average.
Here $\mathbf{p}_1, \dots, \mathbf{p}_6$ denote the population proportions of female breast, cervix, prostate, lung, leukemia, and colon-rectum, respectively. We further decomposed the US Year 2000 Population Standard by the six cancer patient populations using a linear regression model with no intercept, for comparison with the mean reference population above.
The regression yielded three negative weights on breast, lung and colon-rectum cancers. In addition, the sum of the absolute values of all weights was 4.24, much greater than 1 as in the mean reference population. This result suggests that the US Year 2000 Population Standard was not “close” to a weighted average of the six cancer patient populations and thus was not a representative of them.
We also examined the population profile of the six states during the period 1970-2009 (Fig. S2-S7). Although the population in each state remained relatively stable, the effect of aging was observed as a shift of the peak from 1970 to 2009, indicating the change of population structure, the need for age-standardization and the subsequent minimization of the overall bias as shown in Table 1 and Fig. 1 and S1.
Discussion
Accurate estimation of cancer mortality rate is a challenging task and has a major impact on cancer care and public health policymaking for cancer prevention, treatment and control [27-28]. The method of direct age-standardization has been studied for over a century [8]. Although concerns about the arbitrary selection of reference population are not new, they have become more urgent in recent years with the U.S. and world health agencies changing the reference population to reflect the contemporary, aging population structure. Such a change, though appealing in keeping mortality rate estimation consistent with a contemporary reference population, still lacks theoretical justification. As mentioned previously, the observed illusive effect on the declining racial/ethnic and socioeconomic inequalities raises more questions and demands renewed comparison in various aspects of disease incidence and mortality rates. The confusion caused by the selection of reference population is far from being clarified, and though a mandate of using a standard population as reference may help to streamline the age-standardization task, it may not help to adequately address the above concerns. Further theoretical research is urgently needed, more so than ever before.
We have demonstrated that age-standardization using the US Year 2000 Population Standard overestimated prostate cancer mortality rate by up to 91% and underestimated case fatality rate by up to 78%. Such large bias may result in confusion and misinterpretation of cancer mortality. For example, prostate cancer mortality may be misinterpreted as much higher than it actually was in all six states, and lung cancer case fatality rate may be misinterpreted as less than 10,000 per 100,000 person-years, while the actual rate was more than double that. Our observation is consistent with the concerns raised in the literature [7, 21, 22].
Since the age-adjusted rate using the US Year 2000 Population Standard has been widely used in epidemiological studies and public health reports, it has been regarded as the standard approach to comparing disease rates across populations. Many public health reports use it to generate disease rate estimation while acknowledging that the crude rate yields poor comparison with potential bias. For the first time, our study points out that the crude rate is unbiased and age-adjusted rates are biased. The use of a standard population in the direct age-standardization introduces bias and results in confusion and misinterpretation, as shown in Tables 1 and 2. The merit of age-standardization is that it provides an equal footing for comparing disease rates among populations with different age structures, eliminating artifacts introduced by different population structures. Furthermore, we show in this paper that as long as one uses a common age structure to calculate age-adjusted rates, such equal footing is guaranteed. However, equal footing does not necessarily yield fair comparison because an age structure may favor one population over others. Hence, an equal-footing age structure should not be used as the only criterion for the fairness of comparison. This issue motivated us to search for a population that minimized the overall bias among all possible reference populations and led us to construct the mean reference population based on the populations in comparison. Our mean reference population method not only provided a common population structure for comparison but also minimized the overall bias.
Although the effect of age-standardization differed in the two cancer mortality studies, overestimation of mortality rate and underestimation of case fatality rate, both showed a consistent lack of calibration by age-adjusted rate, indicating the need for improvement. Our mean reference population method minimizes the overall bias, and may also help to address the issue of arbitrary selection of reference population raised in the WHO report [12].
Although cancer case fatality rates may be inaccurate due to lead-time bias in cancer diagnosis, the principle of our analysis remains the same, and the underestimation of case fatality rates by the US Year 2000 Population Standard remains a valid conclusion. For example, take prostate cancer, a disease with a late onset at age 35 or older (Table S1). A major proportion (45%) of the US Year 2000 Population Standard is younger than age 35, a population in which prostate cancer rarely develops (assuming equal distribution by age between males and females, so that the gender effect need not be considered). Hence, the age-standardization by the US Year 2000 Population Standard places only 55% of its weight on ages at which death from prostate cancer occurs, largely underestimating the case fatality rate. A similar explanation holds for other cancers.
We appreciate that although the mean reference population provides a unique reference population and minimizes the overall bias of age-adjusted rate among given populations, it does not remain the same in different studies and has to be constructed for each comparison study. This leads to technical inconvenience in comparing disease rates. To resolve this issue, we plan to provide a computer software package to implement the procedure, for which computation of optimal weights and storage of population proportions of racial and sex groups in geographic locations is inexpensive.
We conclude that direct age-standardization using a standard population may lead to inaccurate estimation and incorrect interpretation, resulting in confusion and inappropriate decisions and policymaking. It is hoped that the mean reference population method may lead to improved cancer patient care and efficient healthcare management. Furthermore, since the method relies on no specification of cancer mortality rates, it applies broadly to studies comparing incidence or mortality rates of a wide range of diseases in varied countries and geographic regions.
Materials and Methods
Data
Prostate cancer mortality rates and population proportions of five year age groups and five year periods during the years 1970 - 2009 were generated for six US states (California, Massachusetts, Michigan, Missouri, New Jersey, and New York), from the Surveillance, Epidemiology and End Results (SEER) database [25] using the SEER*Stat software version 8.0.4 [29]. These six states were selected because they used the age–standardization method to generate state public health reports of cancer mortality [15-20]. The SEER database consists of cancer incidence and mortality data of U.S. cancer registries in a growing number (nine and more) of metropolitan areas since the 1970s. Hence research results based on the SEER database are often interpreted as the results for the United States.
The U.S. case fatality rates in 2008 were generated using the SEER database and the Cancer Prevalence database of the NCI/NIH [30]. We first generated the U.S. cancer mortality rates for each cancer site with the SEER*Stat software, and then calculated the case fatality rates using the prevalence of each cancer site estimated by the software CanQues Version 4.2 [30]. See Supplementary Material for details.
Methods
Age-standardization for comparing mortality rate across populations. Cancer mortality rate varies with age (Tables S3, S5, S7, S9, S11 and S13), and a summary rate (e.g. the crude rate) is often preferred to a sequence of age-specific rates in comparing the mortality [6-7,10-12]. The age-standardization yields an age-adjusted rate, a weighted average of the age-specific rates $r_i$ with selected weights $w_i$,
$$R = \sum_{i} w_i r_i. \tag{1}$$
Age-adjusted rates. Four rates were calculated to summarize the age-specific mortality rates for comparison: the crude rate, the cumulative rate, and the age-adjusted rates using the US Year 2000 Population Standard and using the mean reference population. They were calculated as follows for given age-specific mortality rate $r_{ik}$, population proportion $p_{ik}$ of population $k$, and population proportion $p_{i0}$ of a standard population. The crude rate of prostate cancer mortality in population $k$ in each state and each period during 1970–2009 was calculated using the total probability rule [26]
$$R_k^{\text{crude}} = \sum_{i} r_{ik}\, p_{ik},$$
where the weights were its own population proportions $p_{ik}$. The US crude case fatality rate of each cancer site in 2008 was calculated similarly. The cumulative rate of each population was calculated as the sum of the age–specific mortality rates from age 40-44 to age 75-79 for each state and each period [6], $R_k^{\text{cum}} = \sum_{i} w_i r_{ik}$, where the weights $w_i = 1$ for the age groups starting at $i = 40, 45, \dots, 75$, and 0 otherwise. Each age-adjusted rate was calculated following the direct age-standardization procedure in equation (1); see [6]. The US 2000 age-adjusted rate was calculated using the US Year 2000 Population Standard proportions as weights, and the mean reference rate was calculated using a mean reference population proportion as weights, where the mean reference population was constructed using a convex linear combination of the proportions of the populations in comparison. See Statistical Modeling below for details.
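The four summary rates differ only in the weights applied to the same age-specific rates. The following is a minimal Python sketch of that idea, not the authors' software: the three age groups, the rates and both sets of proportions are made-up numbers, and the function names `weighted_rate` and `normalize` are illustrative.

```python
def weighted_rate(rates, weights):
    """Weighted sum of age-specific rates, as in equation (1)."""
    if len(rates) != len(weights):
        raise ValueError("rates and weights must have the same length")
    return sum(r * w for r, w in zip(rates, weights))


def normalize(counts):
    """Turn age-group counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical data: three age groups, rates per 100,000 person-years.
rates = [1.0, 20.0, 200.0]             # age-specific mortality rates
own_props = normalize([50, 30, 20])    # the population's own age structure
std_props = normalize([40, 40, 20])    # a made-up standard population

crude = weighted_rate(rates, own_props)        # weights = own proportions
adjusted = weighted_rate(rates, std_props)     # weights = standard proportions
cumulative = weighted_rate(rates, [0, 1, 1])   # weight 1 on selected age groups

print(crude, adjusted, cumulative)
```

With these toy inputs the crude rate is 46.5, the age-adjusted rate is 48.4 and the cumulative rate is 220 per 100,000, which shows how the cumulative rate sits on a different scale because its weights do not sum to 1.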
Unbiased estimation of overall mortality by crude rate. Assume we have $m$ age groups. The age-specific mortality rates of each population form an $m$-vector $\mathbf{r} = (r_1, \dots, r_m)^T$. Also assume the proportion of a given population with the disease is $p_i$ for age groups $i = 1, \dots, m$, $\sum_{i=1}^{m} p_i = 1$. Assume that a study has $K$ populations, and each population has a mortality rate vector $\mathbf{r}_k$ and a corresponding population proportion vector $\mathbf{p}_k$, $k = 1, \dots, K$. The mortality rate of each population in a given period of time is defined in [6] as
$$R_k = \frac{d_k}{n_k},$$
where $d_k$ is the number of deaths in the period and $n_k$ is the total population at risk.
By the total probability rule [26], the crude mortality rate of each population is estimated with $\mathbf{r}_k^T \mathbf{p}_k$, an inner product of the two vectors $\mathbf{r}_k$ and $\mathbf{p}_k$, i.e. the crude mortality rate is a weighted average of the age-specific rates. It is shown below that the crude mortality rate provides an unbiased estimate of the overall mortality rate over one year period for each population. This explains why the crude rate yielded the same number of deaths as the observed (Table 2).
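$$\frac{d_k}{n_k} = \sum_{i=1}^{m} \frac{d_{ik}}{n_k} = \sum_{i=1}^{m} \frac{d_{ik}}{n_{ik}} \cdot \frac{n_{ik}}{n_k} = \sum_{i=1}^{m} r_{ik}\, p_{ik} = \mathbf{r}_k^T \mathbf{p}_k, \tag{2}$$
where $d_{ik}$ and $n_{ik}$ are the number of deaths and the population at risk in age group $i$ of population $k$, with $d_k = \sum_i d_{ik}$ and $n_k = \sum_i n_{ik}$.
Thus the unbiased estimates of the mortality rates of the $K$ populations are
$$\hat{R}_k = \mathbf{r}_k^T \mathbf{p}_k, \quad k = 1, \dots, K. \tag{3}$$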
Comparing multiple populations with age-adjusted mortality rates. Equation (3) provides an unbiased estimate of the overall mortality rate for each population. However, the rates so generated cannot provide a fair comparison across populations as different populations may have different age structures. With a late-onset disease, it is very likely that an older population yields a higher mortality rate than a younger population, even if the two populations have the same age-specific mortality rates [7, 10]. For fairer comparison, a direct age-standardization procedure was studied [7], in which an age-adjusted rate was calculated based on age-specific rates of a population in comparison and age structure of a standard population, such as the US Year 2000 Population Standard [4,5,10] or the WHO World Standard Population [12]. The age-adjusted mortality rate was calculated as $\mathbf{r}_k^T \mathbf{p}_0$ for the $k$-th population, where $\mathbf{p}_0$ is the proportion vector of the standard population. The expected number of deaths was thus calculated by multiplying the rate by the total population $n_k$ of population $k$, $n_k\, \mathbf{r}_k^T \mathbf{p}_0$. The calculation of expected number of deaths allowed comparison of the rate among populations, and more importantly allowed comparison of the expected number of deaths of each population to the observed, assessing the bias of each rate.
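The effect of age structure on the crude rate, and the effect of a common reference on the expected number of deaths, can be illustrated with a small Python sketch. All numbers are hypothetical, and `summary_rate` and `expected_deaths` are illustrative helper names rather than functions from any existing package.

```python
def summary_rate(rates, props):
    """Inner product of age-specific rates and age-group proportions."""
    return sum(r * p for r, p in zip(rates, props))


def expected_deaths(rate_per_100k, population_size):
    """Expected deaths implied by a summary rate per 100,000 persons."""
    return rate_per_100k * population_size / 100_000


# Hypothetical inputs: both populations share the same age-specific rates.
rates = [10.0, 100.0, 1000.0]        # per 100,000: young, middle, old
younger_props = [0.6, 0.3, 0.1]
older_props = [0.3, 0.3, 0.4]
standard_props = [0.5, 0.3, 0.2]     # made-up common reference population

crude_younger = summary_rate(rates, younger_props)
crude_older = summary_rate(rates, older_props)
adjusted = summary_rate(rates, standard_props)   # same value for both populations

older_size = 2_000_000
deaths_from_crude = expected_deaths(crude_older, older_size)
deaths_from_adjusted = expected_deaths(adjusted, older_size)

print(crude_younger, crude_older, adjusted)
print(deaths_from_crude, deaths_from_adjusted)
```

In this toy case the crude rates are 136 and 433 per 100,000 despite identical age-specific rates, while the common reference gives 235 for both. Applied to the older population of 2,000,000, the crude rate implies 8,660 deaths and the age-adjusted rate implies 4,700, which is the kind of gap between expected and observed deaths that the comparison is meant to expose.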
Bias introduced by age-standardization
We made four observations below.
1) The crude rate in equation (3) provides unbiased estimation for each population.
2) An age-adjusted rate using a reference population $\mathbf{p}_0$ may deviate from the crude rate and introduce bias. The deviation is calculated with $\mathbf{r}_k^T \mathbf{p}_0 - \mathbf{r}_k^T \mathbf{p}_k = \mathbf{r}_k^T (\mathbf{p}_0 - \mathbf{p}_k)$.
3) The bias is often inevitable in comparison among multiple populations: $\mathbf{r}_k^T (\mathbf{p}_0 - \mathbf{p}_k) \neq 0$ unless the difference vector $\mathbf{p}_0 - \mathbf{p}_k$ between the reference population and the $k$-th population is perpendicular to the rate vector $\mathbf{r}_k$. Since multiple populations are often compared in a given study, the chance that one single reference population $\mathbf{p}_0$ makes the deviation of all $K$ populations equal to 0 is extremely small because $\mathbf{p}_0$ needs to satisfy the $K$ conditions $\mathbf{r}_k^T (\mathbf{p}_0 - \mathbf{p}_k) = 0$, $k = 1, \dots, K$, and $\sum_{i} p_{i0} = 1$.
4) It is thus desirable to find a reference population to minimize the overall bias for all populations [7].
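The difference between age-adjusted and crude rates represents an estimate of bias caused by using a reference population. We define a relative deviation to be the deviation of an age-adjusted rate ($R^{\text{adj}}$) as a percentage of the crude rate ($R^{\text{crude}}$),
$$\text{relative deviation} = \frac{R^{\text{adj}} - R^{\text{crude}}}{R^{\text{crude}}} \times 100\%.$$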
A positive value indicates overestimation and a negative one indicates underestimation.
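As a hedged illustration of the deviation and the relative deviation, the sketch below computes both for made-up inputs and also shows a case where the deviation vanishes. The function names and all numbers are assumptions introduced only for this example.

```python
def deviation(rates, ref_props, own_props):
    """Age-adjusted minus crude rate: inner product of rates with (ref - own)."""
    return sum(r * (q - p) for r, q, p in zip(rates, ref_props, own_props))


def relative_deviation(adjusted, crude):
    """Deviation of the age-adjusted rate as a percentage of the crude rate."""
    return 100.0 * (adjusted - crude) / crude


# Hypothetical population and reference age structures (three age groups).
rates = [10.0, 100.0, 1000.0]   # per 100,000
own = [0.3, 0.3, 0.4]
ref = [0.5, 0.3, 0.2]

crude = sum(r * p for r, p in zip(rates, own))
dev = deviation(rates, ref, own)
rel = relative_deviation(crude + dev, crude)   # negative: underestimation

# With equal rates in every age group, the rate vector is perpendicular to
# (ref - own) because both proportion vectors sum to 1, so no bias arises.
flat_dev = deviation([5.0, 5.0, 5.0], ref, own)

print(dev, rel, flat_dev)
```

Here the reference population is younger than the population under study, so the age-adjusted rate falls below the crude rate and the relative deviation is negative.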
To calibrate the age-adjusted rate with a mean reference population, we took a weighted average of the proportions of all populations in comparison, i.e. the proportions of six US cancer patient populations (Table S1) or the population proportions in eight periods of each state (Tables S4, S6, S8, S10, S12 and S14). We selected the weights by minimizing the total deviation, which is defined as the sum of squares of the deviations across all populations in comparison. By definition, the mean reference population is optimal in calibrating the age-adjusted mortality rate.
We compared the age-adjusted rate with the crude rate, using the US Year 2000 Population Standard or the mean reference population as reference. We used the mean deviation (averaged over all populations in comparison) to assess each age-adjusted rate. See equation (8) below for the definition of mean deviation. We also compared age profile of the US Year 2000 Population Standard and the mean reference population with the populations in comparison, as shown in Fig. 3.
Criteria in searching for a reference population. We searched for a reference population $\mathbf{p}_0$ by minimizing the total squared deviation of the $K$ populations
$$\sum_{k=1}^{K} \left[ \mathbf{r}_k^T (\mathbf{p}_0 - \mathbf{p}_k) \right]^2. \tag{4}$$
Technically, a reference population $\mathbf{p}_0$ needs to satisfy $p_{i0} \geq 0$ and $\sum_{i} p_{i0} = 1$ for population proportion. However, these conditions may not be enough to ensure that the reference population is “close” to or representative of the given populations $\mathbf{p}_1, \dots, \mathbf{p}_K$. In order to make the reference population represent the given populations, we further required that the search for the reference population be conducted among weighted averages of the given populations. Mathematically they are convex linear combinations $\mathbf{p}_0 = \sum_{k=1}^{K} w_k \mathbf{p}_k$ of these populations with $w_k \geq 0$ for $k = 1, \dots, K$ and $\sum_{k=1}^{K} w_k = 1$. Such a convex linear combination ensured that the target reference population was representative and retained the characteristics of the populations. Therefore, our objective was to search for a set of weights $w_k \geq 0$ for $k = 1, \dots, K$ satisfying $\sum_{k=1}^{K} w_k = 1$ such that the linear combination $\sum_{k=1}^{K} w_k \mathbf{p}_k$ minimized the total deviation in equation (4). This approach formulated the objective into a mathematical optimization problem [32, 33].
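Optimization by quadratic programming. Let $D(\mathbf{w})$ be the total deviation to be minimized. Then
$$D(\mathbf{w}) = \sum_{k=1}^{K} \left( \mathbf{r}_k^T P \mathbf{w} - \mathbf{r}_k^T \mathbf{p}_k \right)^2 = \| R^T P \mathbf{w} - \mathbf{c} \|^2, \tag{5}$$
where $P = (\mathbf{p}_1, \dots, \mathbf{p}_K)$ is an $m \times K$ matrix and $R = (\mathbf{r}_1, \dots, \mathbf{r}_K)$ is defined likewise. The column vectors $\mathbf{w} = (w_1, \dots, w_K)^T$ and $\mathbf{c} = (\mathbf{r}_1^T \mathbf{p}_1, \dots, \mathbf{r}_K^T \mathbf{p}_K)^T$. $\| \cdot \|$ is the Euclidean norm. Since $D(\mathbf{w}) = \mathbf{w}^T P^T R R^T P \mathbf{w} - 2 \mathbf{c}^T R^T P \mathbf{w} + \mathbf{c}^T \mathbf{c}$, $D$ is a quadratic function of $\mathbf{w}$, and can be minimized in a compact domain. Since the constraints $w_k \geq 0$ for $k = 1, \dots, K$ and $\sum_{k=1}^{K} w_k = 1$ form a simplex $\Omega$ in a $K$-dimensional space, which is compact, the function $D$ can be minimized in $\Omega$ as stated in the following theorem.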
Theorem 1. The quadratic function $D(\mathbf{w})$ in equation (5) has a minimum in the domain $\Omega = \{ \mathbf{w} : w_k \geq 0, \sum_{k} w_k = 1 \}$, which can be attained at some finite point $\mathbf{w}^* \in \Omega$.
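Theorem 1 can be proved based on the continuity of the function $D$ in the compact domain $\Omega$. This minimization problem is equivalent to a quadratic programming problem using the following Lagrange multiplier for convex programming ([32], page 13), for which the existence of the solution is guaranteed by the Kuhn-Tucker Theorem.
$$L(\mathbf{w}, \boldsymbol{\lambda}, \mu) = D(\mathbf{w}) - \boldsymbol{\lambda}^T \mathbf{w} + \mu \left( \mathbf{1}^T \mathbf{w} - 1 \right), \tag{6}$$
where $\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_K)^T$ with $\lambda_k \geq 0$ and $\mu$ is a real number. We also establish the uniqueness of the solution.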
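Theorem 2. The quadratic function $D(\mathbf{w})$ in equation (5) has a unique minimum in the domain $\Omega = \{ \mathbf{w} : w_k \geq 0, \sum_{k} w_k = 1 \}$.
We prove Theorem 2 by contradiction. Assume that there exist two distinct minima (including local minima) $\mathbf{w}_1 \neq \mathbf{w}_2$ and further, without loss of generality, $D(\mathbf{w}_1) \leq D(\mathbf{w}_2)$. By Jensen’s inequality [33] one has $D(t \mathbf{w}_1 + (1 - t) \mathbf{w}_2) \leq t D(\mathbf{w}_1) + (1 - t) D(\mathbf{w}_2) \leq D(\mathbf{w}_2)$ for any real number $t \in (0, 1)$ with convex function $D$, which implies that $\mathbf{w}_2$ is not a local or global minimum unless $D$ is constant along the segment between $\mathbf{w}_1$ and $\mathbf{w}_2$, which is impossible for a strictly convex quadratic function (that is, when the matrix of the quadratic form in equation (5) has full column rank). This contradiction completes the proof.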
Algorithm for computing the optimal weights and reference population
To solve the optimization problem, the following two approaches can be employed.
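I. Quadratic programming approach. The optimization problem in equation (6) leads to the equation system (7) below using the Lagrange multiplier approach by minimizing the objective function; see [32] for details.
$$2 P^T R \left( R^T P \mathbf{w} - \mathbf{c} \right) - \boldsymbol{\lambda} + \mu \mathbf{1} = \mathbf{0}, \quad \mathbf{1}^T \mathbf{w} = 1, \quad \lambda_k w_k = 0, \quad w_k \geq 0, \quad \lambda_k \geq 0 \quad (k = 1, \dots, K), \tag{7}$$
with parameters $\boldsymbol{\lambda}$ and $\mu$, where $R$ is a matrix with column vectors $\mathbf{r}_k$, $P$ is a matrix with column vectors $\mathbf{p}_k$, $\mathbf{c}$ is the vector of crude rates $\mathbf{r}_k^T \mathbf{p}_k$, and $\mathbf{1}$ is a $K$-vector of components 1.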
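One way to obtain the optimal weights numerically is sketched below. It is not the authors' implementation: instead of solving the Lagrange system directly, it uses projected gradient descent on the simplex, a different and simpler iterative technique that targets the same constrained minimum of the total squared deviation. The data are hypothetical, the function names are illustrative, and the matrix `A` holds the inner products of rate vectors with proportion vectors.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def total_deviation(weights, rate_vecs, prop_vecs):
    """Sum over populations of squared (age-adjusted minus crude) rate."""
    n_ages = len(prop_vecs[0])
    ref = [sum(w * p[i] for w, p in zip(weights, prop_vecs)) for i in range(n_ages)]
    return sum((dot(r, ref) - dot(r, p)) ** 2 for r, p in zip(rate_vecs, prop_vecs))


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    running, theta = 0.0, 0.0
    for j, u_j in enumerate(sorted(v, reverse=True), start=1):
        running += u_j
        t = (running - 1.0) / j
        if u_j - t > 0:          # holds for j = 1, ..., rho; the last hit gives theta
            theta = t
    return [max(x - theta, 0.0) for x in v]


def mean_reference_weights(rate_vecs, prop_vecs, n_iter=5000):
    """Projected gradient descent for the convex weights (illustrative only)."""
    K = len(prop_vecs)
    A = [[dot(r, p) for p in prop_vecs] for r in rate_vecs]   # A[k][j] = r_k . p_j
    b = [A[k][k] for k in range(K)]                           # crude rates
    frob2 = sum(a * a for row in A for a in row)
    w = [1.0 / K] * K
    if frob2 == 0.0:
        return w
    step = 1.0 / (2.0 * frob2)   # no larger than 1 / (Lipschitz constant of the gradient)
    for _ in range(n_iter):
        resid = [dot(A[k], w) - b[k] for k in range(K)]
        grad = [2.0 * sum(A[k][j] * resid[k] for k in range(K)) for j in range(K)]
        w = project_to_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w


# Hypothetical data: three populations, three age groups.
rate_vecs = [[1.0, 20.0, 200.0], [2.0, 25.0, 180.0], [1.5, 22.0, 210.0]]
prop_vecs = [[0.5, 0.3, 0.2], [0.4, 0.35, 0.25], [0.3, 0.4, 0.3]]

w_opt = mean_reference_weights(rate_vecs, prop_vecs)
print(w_opt, total_deviation(w_opt, rate_vecs, prop_vecs))
```

The step size is chosen conservatively from the Frobenius norm so that each iteration cannot increase the total deviation; a dedicated quadratic programming solver would typically converge faster.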
II. Statistical sampling approach. A statistical sampling method can also find the optimal vector $\mathbf{w}$ that minimizes the total deviation $D(\mathbf{w})$.
Step (a). Set an initial threshold $D_0$, i.e. the total deviation $D(\mathbf{w}_0)$ with initial value $\mathbf{w}_0$.
Step (b). Take a random sample $w_1, \dots, w_{K-1}$ from uniform distribution Unif [0,1] and take the sum $s = \sum_{k=1}^{K-1} w_k$.
Step (c). If $s \leq 1$, set $w_K = 1 - s$ and go to Step (d). Otherwise, discard the sample and repeat Steps (b) and (c) until the condition is satisfied.
Step (d). Check the total deviation $D(\mathbf{w})$ in equation (5). If $D(\mathbf{w}) < D_0$, update the threshold by setting $D_0 = D(\mathbf{w})$. If not, repeat the above steps with a new sample.
Step (e). Repeat the above steps (b-d) to achieve a reasonably small total deviation $D_0$.
Step (f). Repeat Step (e) to fine-tune the search by shrinking the sampling domain from [0,1] to a small one $[a_k, b_k]$ for $w_k$ ($k = 1, \dots, K-1$) with $a_k \geq 0$ and $b_k \leq 1$. This fine-tuning leads to the convergence of $\mathbf{w}$ by the existence and uniqueness in Theorems 1 and 2.
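Steps (a) through (f) can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' code. It assumes that the first $K-1$ weights are drawn uniformly and accepted when their sum leaves room for a nonnegative last weight, and that the fine-tuning halves a sampling box centered on the best weights found so far; the data, function names, number of draws and number of rounds are all hypothetical choices.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def total_deviation(weights, rate_vecs, prop_vecs):
    """Sum over populations of squared (age-adjusted minus crude) rate."""
    n_ages = len(prop_vecs[0])
    ref = [sum(w * p[i] for w, p in zip(weights, prop_vecs)) for i in range(n_ages)]
    return sum((dot(r, ref) - dot(r, p)) ** 2 for r, p in zip(rate_vecs, prop_vecs))


def sample_weights(rng, lows, highs):
    """Steps (b)-(c): draw K-1 weights; accept if the last weight is >= 0."""
    while True:
        head = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        s = sum(head)
        if s <= 1.0:
            return head + [1.0 - s]


def sampling_search(rate_vecs, prop_vecs, n_draws=2000, n_rounds=6, seed=0):
    rng = random.Random(seed)
    K = len(prop_vecs)
    best_w = [1.0 / K] * K                                    # Step (a)
    best_d = total_deviation(best_w, rate_vecs, prop_vecs)
    lows, highs = [0.0] * (K - 1), [1.0] * (K - 1)
    half_width = 0.5
    for _ in range(n_rounds):
        for _ in range(n_draws):                              # Steps (b)-(e)
            w = sample_weights(rng, lows, highs)
            d = total_deviation(w, rate_vecs, prop_vecs)
            if d < best_d:                                    # Step (d)
                best_w, best_d = w, d
        half_width /= 2.0                                     # Step (f): shrink the box
        lows = [max(0.0, x - half_width) for x in best_w[:-1]]
        highs = [min(1.0, x + half_width) for x in best_w[:-1]]
    return best_w, best_d


# Hypothetical data: three populations, three age groups.
rate_vecs = [[1.0, 20.0, 200.0], [2.0, 25.0, 180.0], [1.5, 22.0, 210.0]]
prop_vecs = [[0.5, 0.3, 0.2], [0.4, 0.35, 0.25], [0.3, 0.4, 0.3]]

w_best, d_best = sampling_search(rate_vecs, prop_vecs)
print(w_best, d_best)
```

A fixed seed keeps the run reproducible. Because the threshold is only ever lowered, the returned total deviation is never worse than that of the initial equal weights.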
Comparison of cancer mortality rates by age-standardization methods. The crude rate was set as the reference for comparison; the cumulative rate and the age-adjusted rates using the US Year 2000 Population Standard and the mean reference population were compared to the crude rate. The relative deviation was calculated and reported as a percentage for each population $k$, and the mean deviation of an age-adjusted rate was calculated with equation (8) below as an average over all populations and used to assess an age adjustment method.
(8) |
A large mean deviation indicates inaccurate estimation while a small one indicates accurate estimation.
Acknowledgments
The authors are grateful to Norman Breslow, Robert Rosenberg and Bent Nielsen for their valuable comments and suggestions. We also thank Marla Broadfoot for her editorial assistance. This work is partly supported by NIH grants (5K01CA131259, CA142774 and CA165923). All authors declare no conflict of interest.
References
- [1] American Cancer Society (2012) Cancer Facts and Figures 2012. (American Cancer Society Publication 2012). http://www.cancer.org/acs/groups/content/@epidemiologysurveilance/documents/document/acspc-031941.pdf
- [2] McGinnis JM, Foege WH (1993) Actual causes of death in the United States. J Amer Med Assoc 270:2207-2212. doi:10.1001/jama.1993.03510180077038.
- [3] Mokdad AH, Marks JS, Stroup DF, Gerberding JL (2004) Actual causes of death in the United States, 2000. J Amer Med Assoc 291:1238-1245. doi:10.1001/jama.291.10.1238.
- [4] Howlader N et al. (2012) SEER Cancer Statistics Review, 1975-2009 (Vintage 2009 Populations) National Cancer Institute. Bethesda, MD http://seer.cancer.gov/csr/ 19752009pops
- [5] Eheman C et al. (2012) Annual Report to the Nation on the status of cancer, 1975-2008, featuring cancers associated with excess weight and lack of sufficient physical activity. Cancer 118:2338-2366. doi: 10.1002/cncr.27514.
- [6] Breslow NE, Day NE (1987) Statistical Methods in Cancer Research Volume II - The Design and Analysis of Cohort Studies (International Agency for Research on Cancer, Lyon).
- [7] Doll R, Cook P (1967) Summarizing indices for comparison of cancer incidence data. Int J Canc 2:269-279.
- [8] Ogle MW (1892) Proposal for the establishment and international use of a standard population. Bulletin de l’Institut International de Statistique, Rome, ed. 6, Vol 1:83-85.
- [9] Wolfenden HH (1923) On the methods of comparing the mortalities of two or more communities and the standardization of death rates. J Roy Statist Soc 86:399-411.
- [10] National Center for Health Statistics (1998) Report of the workshop on age adjustment. Vit Heal Statist 4:30.
- [11] US Cancer Statistics Working Group (2012) United States Cancer Statistics: 1999 - 2008 Incidence and Mortality Web-based Report. (U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute, Atlanta. Available at: www.cdc.gov/uscs).
- [12] Ahmad OB et al (2001) Age Standardization of Rates: a new WHO standard. (GPE Discussion Paper Series: No. 31 EIP/GPE/EBD World Health Organization).
- [13] Shalala DE (1998) HHS policy for changing the population standard for age adjusting death rates. Memorandum from the Secretary. August 26, 1998. Available at http://aspe.hhs.gov/datacncl/ageadj.htm. Accessed October 1, 2013.
- [14] Anderson R, Rosenberg HM (1998) Age standardization of death rates: Implementation of the Year 2000 Standard, Nat Vit Statist Rep 47:3.
- [15] California Department of Public Health. http://www.cdph.ca.gov/data/statistics/Documents/VSC-2005-0501.pdf
- [16] State of Massachusetts, Executive Office of Health and Human Services. http://www.mass.gov/eohhs/docs/dph/research-epi/death-report-09.pdf
- [17] Michigan Department of Community Health. http://www.mdch.state.mi.us/pha/osr/CHI/Cancer/frame.asp
- [18] Missouri Department of Health and Senior Services. http://health.mo.gov/data/mica/CDPMICA/AARate.html
- [19] State of New Jersey, Department of Health. http://www4.state.nj.us/dhss-shad/home/AARate.html
- [20] State of New York, Department of Health. http://www.health.ny.gov/diseases/chronic/ageadj.htm
- [21] Robson B, Purdie G, Cram F, Simmonds S. (2007) Age standardization – an indigenous standard? BMC Emerg Theme Epi 4:3.
- [22] Pamuk ER (2001) Cautiously adjusting to the new millennium: changing to the 2000 population standard. Am J Publ Heal 91(8):1174-1176.
- [23] Krieger N, Williams DR (2001) Changing to the 2000 standard million: are declining racial/ethnic and socioeconomic inequalities in health real progress or statistical illusion? Am J Publ Heal 91(8):1209-1213.
- [24] Sorlie PD et al (1999) Age-adjusted death rates: consequences of the year 2000 standard. Ann Epi 9:93-100.
- [25] National Cancer Institute, SEER program http://seer.cancer.gov/statistics/
- [26] Rosner B (2011) Fundamentals of Biostatistics. (Brooks/Cole, New York, ed.7), pp.50.
- [27] Hébert JR et al. (2009) Mapping cancer mortality-to-incidence ratios to illustrate racial and sex disparities in a high-risk population. Cancer 115:2539-2552.
- [28] Jemal A, Ward E, Thun M (2010) Declining death rates reflect progress against cancer. PLoS One 5:e9584. doi:10.1371/journal.pone.0009584
- [29] SEER*Stat Software. http://seer.cancer.gov/seerstat/
- [30] CanQues Cancer Query Systems. http://seer.cancer.gov/canques/
- [31] National Cancer Institute. Incidence-based mortality sample analysis. http://surveillance.cancer.gov/documents/statistics/ibm/prostate.ibm.demo.pdf
- [32] Lee GM, Tam NN, Yen NG (2005) Quadratic Programming and Affine Variational Inequalities. (Springer, New York) pp. 13-4.
- [33] Needham T (1993) A Visual Explanation of Jensen’s Inequality. Amer Math Month 100:768-771.
- [34] The R Project for Statistical Computing. R version 2.13.0. http://www.r-project.org/
State | Method | 1970-74 | 1975-79 | 1980-84 | 1985-89 | 1990-94 | 1995-99 | 2000-04 | 2005-09
---|---|---|---|---|---|---|---|---|---|
CA | Crude | 15.54 | 17.50 | 18.98 | 20.05 | 21.51 | 19.44 | 17.56 | 17.03
 | US2000 | 29.66 | 31.84 | 33.47 | 34.72 | 36.76 | 30.84 | 26.33 | 23.33
 | % Dev¹ | 91 | 82 | 76 | 73 | 71 | 59 | 50 | 37
 | MeanRef | 17.91 | 19.11 | 20.11 | 20.84 | 21.82 | 18.12 | 15.25 | 13.56
 | % Dev | 15 | 9 | 6 | 4 | 1 | -7 | -13 | -20
 | Cumul | 482.92 | 499.10 | 511.47 | 539.32 | 551.92 | 440.27 | 352.83 | 308.69
 | % Dev | 3007 | 2752 | 2594 | 2590 | 2466 | 2165 | 1909 | 1712
MA | Crude | 19.21 | 21.25 | 23.64 | 25.13 | 28.80 | 26.69 | 23.59 | 20.78
 | US2000 | 31.31 | 32.68 | 34.57 | 36.01 | 38.87 | 34.01 | 28.66 | 23.17
 | % Dev | 63 | 54 | 46 | 43 | 35 | 28 | 22 | 12
 | MeanRef | 23.11 | 24.18 | 25.38 | 26.27 | 28.32 | 24.43 | 20.33 | 16.42
 | % Dev | 20 | 14 | 7 | 5 | -2 | -8 | -14 | -21
 | Cumul | 499.05 | 527.05 | 529.31 | 528.14 | 567.52 | 473.18 | 359.01 | 275.71
 | % Dev | 2498 | 2381 | 2139 | 2002 | 1871 | 1673 | 1422 | 1227
MI | Crude | 16.63 | 18.51 | 20.82 | 23.84 | 27.26 | 25.14 | 21.44 | 18.85
 | US2000 | 31.68 | 33.91 | 34.99 | 37.33 | 40.91 | 36.09 | 28.71 | 22.45
 | % Dev | 90 | 83 | 68 | 57 | 50 | 44 | 34 | 19
 | MeanRef | 21.06 | 22.27 | 22.90 | 24.38 | 26.15 | 22.64 | 17.88 | 14.04
 | % Dev | 27 | 20 | 10 | 2 | -4 | -10 | -17 | -25
 | Cumul | 539.70 | 549.20 | 569.68 | 585.61 | 616.35 | 501.73 | 384.85 | 289.05
 | % Dev | 3145 | 2868 | 2637 | 2357 | 2161 | 1896 | 1695 | 1434
MO | Crude | 20.93 | 22.87 | 23.38 | 26.21 | 28.96 | 26.11 | 21.01 | 19.99
 | US2000 | 29.37 | 31.78 | 31.18 | 33.84 | 36.61 | 32.94 | 26.10 | 23.09
 | % Dev | 40 | 39 | 33 | 29 | 26 | 26 | 24 | 15
 | MeanRef | 23.30 | 24.85 | 24.30 | 26.39 | 28.38 | 25.05 | 19.51 | 17.49
 | % Dev | 11 | 9 | 4 | 1 | -2 | -4 | -7 | -13
 | Cumul | 495.14 | 502.12 | 481.57 | 524.52 | 540.68 | 459.26 | 325.60 | 308.95
 | % Dev | 2265 | 2095 | 1959 | 1901 | 1767 | 1659 | 1449 | 1445
NJ | Crude | 17.53 | 20.38 | 23.09 | 25.70 | 29.68 | 26.98 | 22.43 | 19.08
 | US2000 | 31.36 | 33.42 | 35.48 | 36.95 | 40.80 | 35.19 | 28.13 | 22.30
 | % Dev | 79 | 64 | 54 | 44 | 37 | 30 | 25 | 17
 | MeanRef | 22.28 | 23.75 | 24.91 | 26.11 | 28.46 | 24.37 | 19.19 | 15.28
 | % Dev | 27 | 17 | 8 | 2 | -4 | -10 | -14 | -20
 | Cumul | 524.23 | 542.64 | 546.12 | 574.13 | 604.50 | 510.12 | 373.19 | 302.53
 | % Dev | 2891 | 2563 | 2265 | 2134 | 1937 | 1791 | 1563 | 1485
NY | Crude | 18.15 | 20.57 | 22.94 | 24.58 | 27.23 | 24.83 | 22.12 | 19.30
 | US2000 | 28.61 | 31.12 | 33.13 | 34.42 | 37.80 | 33.27 | 28.02 | 22.49
 | % Dev | 58 | 51 | 44 | 40 | 39 | 34 | 27 | 16
 | MeanRef | 21.22 | 22.74 | 24.02 | 25.16 | 27.13 | 23.66 | 19.71 | 15.83
 | % Dev | 17 | 11 | 5 | 2 | 0 | -5 | -11 | -18
 | Cumul | 486.59 | 504.17 | 529.21 | 548.42 | 572.84 | 489.36 | 386.83 | 310.59
 | % Dev | 2581 | 2351 | 2207 | 2131 | 2003 | 1871 | 1649 | 1509
Unit of mortality rate is per person-year.
¹ Percentage of deviation from the crude rate.
State | Method | 1970-74 | 1975-79 | 1980-84 | 1985-89 | 1990-94 | 1995-99 | 2000-04 | 2005-09
---|---|---|---|---|---|---|---|---|---|
CA | Obs | 7760 | 9503 | 11457 | 13657 | 16276 | 15530 | 15000 | 15169
 | Crude | 7760 | 9503 | 11457 | 13657 | 16276 | 15530 | 15000 | 15169
 | US2000 | 14807 | 17290 | 20203 | 23649 | 27809 | 24639 | 22492 | 20772
 | % Dev² | 91 | 82 | 76 | 73 | 71 | 59 | 50 | 37
 | MeanRef | 8940 | 10380 | 12136 | 14192 | 16509 | 14474 | 13030 | 12075
 | % Dev | 15 | 9 | 6 | 4 | 1 | -7 | -13 | -20
MA | Obs | 2600 | 2872 | 3215 | 3521 | 4119 | 3945 | 3597 | 3199
 | Crude | 2600 | 2872 | 3215 | 3521 | 4119 | 3945 | 3597 | 3199
 | US2000 | 4238 | 4417 | 4701 | 5046 | 5561 | 5028 | 4370 | 3567
 | % Dev | 63 | 54 | 46 | 43 | 35 | 28 | 22 | 12
 | MeanRef | 3129 | 3268 | 3452 | 3682 | 4050 | 3611 | 3100 | 2528
 | % Dev | 20 | 14 | 7 | 5 | -2 | -9 | -14 | -21
MI | Obs | 3610 | 4078 | 4553 | 5223 | 6180 | 5938 | 5194 | 4561
 | Crude | 3610 | 4078 | 4553 | 5223 | 6180 | 5938 | 5194 | 4561
 | US2000 | 6876 | 7473 | 7654 | 8181 | 9274 | 8525 | 6956 | 5433
 | % Dev | 90 | 83 | 68 | 57 | 50 | 44 | 34 | 19
 | MeanRef | 4570 | 4908 | 5009 | 5341 | 5927 | 5347 | 4332 | 3398
 | % Dev | 27 | 20 | 10 | 2 | -4 | -10 | -17 | -25
MO | Obs | 2357 | 2627 | 2732 | 3135 | 3595 | 3418 | 2865 | 2835
 | Crude | 2357 | 2627 | 2732 | 3135 | 3595 | 3418 | 2865 | 2835
 | US2000 | 3307 | 3650 | 3643 | 4047 | 4544 | 4311 | 3558 | 3273
 | % Dev | 40 | 39 | 33 | 29 | 26 | 26 | 24 | 15
 | MeanRef | 2623 | 2854 | 2839 | 3156 | 3523 | 3280 | 2661 | 2481
 | % Dev | 11 | 9 | 4 | 1 | -2 | -4 | -7 | -13
NJ | Obs | 3042 | 3553 | 4068 | 4674 | 5570 | 5298 | 4589 | 3983
 | Crude | 3042 | 3553 | 4068 | 4674 | 5570 | 5298 | 4589 | 3983
 | US2000 | 5442 | 5826 | 6251 | 6721 | 7657 | 6912 | 5754 | 4655
 | % Dev | 79 | 64 | 54 | 44 | 37 | 30 | 25 | 17
 | MeanRef | 3867 | 4141 | 4388 | 4750 | 5342 | 4786 | 3926 | 3190
 | % Dev | 27 | 17 | 8 | 2 | -4 | -10 | -14 | -20
NY | Obs | 7791 | 8616 | 9476 | 10343 | 11739 | 11016 | 10061 | 8832
 | Crude | 7791 | 8616 | 9476 | 10343 | 11739 | 11016 | 10061 | 8832
 | US2000 | 12282 | 13036 | 13688 | 14482 | 16296 | 14761 | 12746 | 10288
 | % Dev | 58 | 51 | 44 | 40 | 39 | 34 | 27 | 16
 | MeanRef | 9111 | 9526 | 9922 | 10584 | 11694 | 10497 | 8963 | 7243
 | % Dev | 17 | 11 | 5 | 2 | 0 | -5 | -11 | -18
² Percentage of deviation from the observed number.
Rate | Mean Deviation | Breast | Cervix | Prostate | Lung | Leukemia | Colon-Rect
---|---|---|---|---|---|---|---|
Crude | 0 | 1380.6 | 1170.5 | 1088.3 | 19771.7 | 3705.3 | 3201.5
US2000 | 4305.1 | 941.7 | 683.9 | 243.4 | 9384.5 | 2265.3 | 2900.3
% Dev | | -31.79 | -41.57 | -77.63 | -52.54 | -38.86 | -9.41
MeanRef | 403.4 | 1394.1 | 1295.7 | 845.2 | 19344.1 | 4538.4 | 3358.1
% Dev | | 0.98 | 10.70 | -22.34 | -2.16 | 22.48 | 4.89
Cumul | 53351.3 | 10867.8 | 9789.3 | 4175.9 | 145146.2 | 28241.8 | 27358.2
% Dev | | 687.19 | 736.36 | 283.71 | 634.11 | 662.19 | 754.55
Unit of case fatality rate is per person-year.
Negative deviation indicates underestimation.
Mean deviation is calculated as the square root of the average of the squared deviation over all six cancer sites; see equation (8) in Methods.
Supplementary Materials
Data
Generating Cancer Mortality and Case Fatality Rates
We generate the prostate cancer mortality rate of each state from the US SEER 9 registry data using the database “Mortality All COD - Aggregated with State, Total US (1969-2009) Katrina/Rita Population Adjustment”, with the state specified as one of the six states, gender specified as “Male”, and Site and Morphology Cause of Death Record specified as “Prostate”. Eight 5-year periods (1970–74, …, 2005–09) were specified to generate the age-specific cancer mortality rates. By default, case counts below 10 were set to 0, with a corresponding mortality rate of 0. The newborn group (age 0) was excluded, and the remaining 18 age groups (1–4, 5–9, …, 80–84, and 85+) were used for data analysis.
We generate the cancer mortality rate with the US SEER 9 registry data using the incidence-based mortality (IBM) database [31], the Incidence-Based Mortality - SEER 9 Regs Research Data, Nov 2011 Sub Vintage 2009 Pops (1973-2009) Katrina/Rita Population Adjustment. Following the IBM instructions [31], we specify the maximum number of months of the “Survival time recode” in the database to ensure that all cancer records in the database are included in calculating the age-specific mortality rate for the year 2008. SEER*Stat also generates the numbers of deaths reported in 2008 by age group and the numbers of people in the general population (including healthy people). As above, we exclude the age 0 group and thus have the mortality rates, the numbers of deaths, and the population exposures in 18 age groups (ages 1–4, 5–9, …, and 85+). Note that although SEER*Stat estimates the mortality rate based on the total population in the database, it does not provide the patient exposure-based case fatality rate [25]. We further generate the prevalence of each cancer as of January 1, 2009 by the same 18 age groups using CanQues, and calculate the case fatality rate as follows.
Let $m$ denote the mortality rate based on the general population (including healthy people), $f$ the case fatality rate based on the cancer patient exposure to death from the disease, $d$ the number of deaths from the disease, $E$ the cancer patient exposure, $N$ the total population, and $p$ the prevalence of the disease in the general population. It can be seen that
$$ f = \frac{d}{E} = \frac{d/N}{E/N} = \frac{m}{p}, $$
i.e., each age-specific case fatality rate $f$ can be calculated as the age-specific mortality rate $m$ divided by the age-specific prevalence of the disease $p$. Note that the following two rules apply in calculating the mortality rates:
1) If the age-specific prevalence is zero, the mortality rate is set to be 0.
2) To ensure the stability of rates based on a small number of cases, the age-specific case fatality rate is set to be 0 if the age-specific number of deaths is small.
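The calculation and its two rules can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, the zero-prevalence rule is read as returning a case fatality rate of 0, and the small-count cutoff `min_deaths` is a hypothetical parameter that the caller must supply, since the threshold is not stated here.

```python
def case_fatality_rate(mortality_rate, prevalence, deaths, min_deaths):
    """Age-specific case fatality rate as mortality rate divided by prevalence.

    mortality_rate: deaths per person in the general population (d / N)
    prevalence:     patient exposure per person in the general population (E / N)
    deaths:         age-specific number of deaths d
    min_deaths:     hypothetical cutoff below which the rate is suppressed
    """
    # Rule 1 (as interpreted here): no patients at risk, so report 0.
    if prevalence == 0:
        return 0.0
    # Rule 2: suppress unstable rates based on a small number of deaths.
    if deaths < min_deaths:
        return 0.0
    return mortality_rate / prevalence


# Illustrative numbers only: 30 deaths, 2,000 patient person-years,
# 1,000,000 people in the general population.
d, E, N = 30, 2_000, 1_000_000
print(case_fatality_rate(d / N, E / N, d, min_deaths=10))
```

Because the total population cancels in the ratio, the result equals the number of deaths divided by the patient exposure, which is the definition of the case fatality rate.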
Table S1 displays the person-year exposure in 2008 by cancer site and age group and the US Year 2000 Population Standard by age group. Table S2 displays the case fatality rates in 2008 by age group of six cancer sites: female breast, cervix, prostate, lung, leukemia, and colon-rectum. Tables S3–S14 in the Supplementary Materials display the mortality rate and population exposure during 1970–2009 of the six states. The newborn group of age 0 is excluded from all tables as they are not considered for cancer mortality in this study.
Table S1. Distribution of US Cancer Patients of Six Sites and US Year 2000 Population Standard
Age (year) | Female Breast | Cervix | Prostate | Lung | Leukemia | Colon-Rectum | US 2000 (thousands)
---|---|---|---|---|---|---|---|
1-4 | 0.0 | 0.0 | 0.0 | 0.0 | 288.4 | 0.0 | 15191.6 |
5-9 | 0.0 | 0.0 | 0.0 | 0.0 | 798.1 | 0.0 | 19919.8 |
10-14 | 2.0 | 2.9 | 0.0 | 2.1 | 963.7 | 0.0 | 20056.8 |
15-19 | 6.0 | 3.0 | 0.0 | 9.3 | 1169.0 | 0.0 | 19819.5 |
20-24 | 30.3 | 37.9 | 0.0 | 23.5 | 1016.5 | 51.0 | 18257.2 |
25-29 | 209.5 | 147.2 | 0.0 | 54.5 | 1015.0 | 167.8 | 17722.0 |
30-34 | 825.1 | 457.9 | 0.0 | 78.0 | 898.8 | 335.3 | 19511.4 |
35-39 | 2665.1 | 1110.3 | 18.6 | 152.1 | 893.8 | 696.6 | 22180.0 |
40-44 | 6535.6 | 1681.6 | 279.9 | 345.9 | 802.9 | 1574.9 | 22479.2 |
45-49 | 13663.2 | 2315.5 | 1534.3 | 916.9 | 1071.2 | 2900.7 | 19805.8 |
50-54 | 21312.3 | 2683.2 | 5756.7 | 1926.7 | 1332.3 | 5505.3 | 17224.4 |
55-59 | 27213.8 | 2763.8 | 13937.6 | 2840.2 | 1699.5 | 7808.3 | 13307.2 |
60-64 | 31922.1 | 2274.7 | 25540.1 | 4160.4 | 2044.7 | 9616.8 | 10654.2 |
65-69 | 30378.7 | 1690.6 | 33205.4 | 5175.3 | 2079.9 | 10896.1 | 9409.9 |
70-74 | 26981.9 | 1256.4 | 36867.2 | 5293.5 | 2028.9 | 12527.7 | 8725.6 |
75-79 | 25525.1 | 961.1 | 38201.6 | 5244.7 | 2009.4 | 14127.7 | 7414.6 |
80-84 | 23685.1 | 617.2 | 33037.0 | 4189.4 | 1815.1 | 14773.0 | 4900.2 |
85+ | 25249.0 | 536.8 | 26454.0 | 2766.4 | 1687.5 | 18098.4 | 4259.2 |
Fractional person-year exposures arise from converting the total population by disease prevalence.
Table S2. US Age-specific Case Fatality Rate ( person-year) of six Cancer Sites in 2008
Age (year) | Female Breast | Cervix | Prostate | Lung | Leukemia | Colon-Rectum
---|---|---|---|---|---|---|
1-4 | 0.0 | 0.0 | 0.0 | 0.0 | 2424.6 | 0.0 |
5-9 | 0.0 | 0.0 | 0.0 | 0.0 | 876.5 | 0.0 |
10-14 | 0.0 | 0.0 | 0.0 | 0.0 | 623.0 | 0.0 |
15-19 | 0.0 | 0.0 | 0.0 | 0.0 | 599.3 | 0.0 |
20-24 | 0.0 | 0.0 | 0.0 | 0.0 | 1474.9 | 0.0 |
25-29 | 0.0 | 0.0 | 0.0 | 0.0 | 1380.2 | 7750.0 |
30-34 | 1575.9 | 1092.6 | 0.0 | 10250.0 | 1780.9 | 4470.9 |
35-39 | 2363.8 | 1260.8 | 0.0 | 10527.0 | 1119.5 | 6174.0 |
40-44 | 1713.7 | 832.8 | 0.0 | 11857.1 | 2615.4 | 4508.5 |
45-49 | 1690.6 | 906.8 | 456.5 | 18213.3 | 2148.1 | 4171.5 |
50-54 | 1398.2 | 1304.4 | 330.1 | 18322.3 | 2552.3 | 3306.0 |
55-59 | 1355.9 | 1085.5 | 466.4 | 19470.5 | 3294.6 | 3393.8 |
60-64 | 1181.0 | 967.3 | 536.4 | 19132.5 | 3618.8 | 3098.7 |
65-69 | 1168.6 | 1656.3 | 605.3 | 17796.1 | 3269.5 | 3184.6 |
70-74 | 1145.2 | 955.0 | 783.9 | 19552.3 | 5766.6 | 2785.8 |
75-79 | 1214.5 | 2081.2 | 997.3 | 20802.1 | 4976.5 | 2909.2 |
80-84 | 1397.5 | 1458.2 | 1440.8 | 21292.0 | 7823.0 | 2931.0 |
85+ | 1952.6 | 1303.7 | 2884.2 | 24833.7 | 9362.8 | 3447.8 |
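As a worked check on Tables S1 and S2, the sketch below computes two summary case fatality rates for female breast cancer: the crude rate, which weights the age-specific rates by the patient exposure, and the directly age-standardized rate, which weights them by the US Year 2000 Population Standard. The values are copied from the two tables; the helper name `weighted_rate` is illustrative, and the code is a sketch of direct standardization rather than the authors' implementation.

```python
# Female breast column of Table S1 (patient exposure), ages 1-4 through 85+.
exposure = [0.0, 0.0, 2.0, 6.0, 30.3, 209.5, 825.1, 2665.1, 6535.6,
            13663.2, 21312.3, 27213.8, 31922.1, 30378.7, 26981.9,
            25525.1, 23685.1, 25249.0]

# US 2000 column of Table S1 (standard population by age group).
us2000 = [15191.6, 19919.8, 20056.8, 19819.5, 18257.2, 17722.0, 19511.4,
          22180.0, 22479.2, 19805.8, 17224.4, 13307.2, 10654.2, 9409.9,
          8725.6, 7414.6, 4900.2, 4259.2]

# Female breast column of Table S2 (age-specific case fatality rates).
rate = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1575.9, 2363.8, 1713.7, 1690.6,
        1398.2, 1355.9, 1181.0, 1168.6, 1145.2, 1214.5, 1397.5, 1952.6]


def weighted_rate(rates, weights):
    """Average of age-specific rates weighted by age-group proportions."""
    total = sum(weights)
    return sum(r * w for r, w in zip(rates, weights)) / total


crude = weighted_rate(rate, exposure)   # weights: the patients' own age structure
adjusted = weighted_rate(rate, us2000)  # weights: the reference population
print(round(crude, 1), round(adjusted, 1))
print(round(100.0 * (adjusted - crude) / crude, 2))  # percentage deviation
```

The two summary rates should land close to the Crude and US2000 breast entries of the case fatality table. The gap between them comes entirely from the weights: the standard population places most of its weight on ages under 50, where few breast cancer patients are found.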