Cohort effects in mortality modelling: a Bayesian statespace approach
Abstract
Cohort effects are important factors in determining the evolution of human mortality for certain countries. Extensions of dynamic mortality models with cohort features have been proposed in the literature to account for these factors under the generalised linear modelling framework. In this paper we approach the problem of mortality modelling with cohort factors incorporated through a novel formulation under a statespace methodology. In the process we demonstrate that cohort factors can be formulated naturally under the statespace framework, despite the fact that cohort factors are indexed according to yearofbirth rather than year. Bayesian inference for cohort models in a statespace formulation is then developed based on an efficient Markov chain Monte Carlo sampler, allowing for the quantification of parameter uncertainty in cohort models and resulting mortality forecasts that are used for life expectancy and life table constructions. The effectiveness of our approach is examined through comprehensive empirical studies involving male and female populations from various countries. Our results show that cohort patterns are present for certain countries that we studied and the inclusion of cohort factors are crucial in capturing these phenomena, thus highlighting the benefits of introducing cohort models in the statespace framework. Forecasting of cohort models is also discussed in light of the projection of cohort factors.
Keywords: mortality modelling, cohort features, statespace model, Bayesian inference, Markov chain Monte Carlo
1 Introduction
The declining trend of mortality rates is generally observed in many developed countries. It is widely acknowledged among actuaries and demographers that dynamic mortality models are required to account for the uncertainty associated with the projection of mortality for different generations. From governments who are responsible for pension policy design to insures who offer retirement income products, it is important to consider and incorporate different factors that would impact the projections of mortality rates. Arguably one of the most discussed and important factors is the socalled cohort effect, that is, the effect of yearofbirth on mortality rates.
It is perhaps not surprising that people born in different years or generations would undergo different mortality experiences. Willets (2004) finds evidence for the existence of cohort trends by examining the average annual mortality improvement rate for males and females in the population of England and Wales. Murphy (2009) discusses the “golden generations” of the British population who were born in early 1930s and have experienced exceptionally rapid improvements in mortality rates. Possible explanations for this “golden generations” phenomenon includes changing smoking patterns between generations; better diet and environmental conditions during and after the Second World War; those born in periods of low fertility facing less competition for resources as they age; and benefits from medical advances (Murphy (2009)). Willets (2004), Murphy (2009) and Murphy (2010) provide detailed discussions on cohort effects including their identification from mortality data and competing explanations. The aforementioned studies are not modelbased however, but are relying on empirical data analysis and qualitative analysis such as descriptive and graphical representations to conclude the significance of cohort effects on population mortality.
For actuarial applications such as mortality forecasting and longevity risk management (Cairns et al. (2008) and Barrieu et al. (2012)), one is often interested in building dynamic mortality models with stochastic cohort features. Renshaw and Haberman (2003) extend the wellknown LeeCarter mortality model (Lee and Carter (1992)) by introducing an agemodulated cohort effect. In a similar way, the CairnsBlakeDowd mortality model introduced in Cairns et al. (2006) is extended to incorporate cohort factors in Cairns et al. (2009). In contrast to studying cohort effects via qualitative arguments, mortality models offer a quantitative and statistical approach to identify and analyse cohort patterns exhibited in mortality data.
There are two main approaches for estimating mortality models. The first approach relies on least squares estimation based on singular value decomposition (SVD), pioneered by Lee and Carter (1992) in the mortality context. Other studies using this approach to analyse mortality include Renshaw and Haberman (2003), Yang et al. (2010) and Shang et al. (2011). The second approach employs regressionbased methods to calibrate mortality models (Brouhns et al. (2002)). For some recent studies based on this approach, see O’Hare and Li (2012), van Berkum et al. (2016) and Enchev et al. (2016). The recent paper Currie (2016) provides a comprehensive summary on mortality modelling based on the generalised linear modelling framework. In particular, the paper points out that there are convergence and robustness problems in the regressionbased estimation setting that need to be resolved for models with cohort features.
In this paper, we approach the problems of estimating and forecasting mortality with cohort features via statespace methodology, which is a natural extension of the framework recently described in Fung et al. (2017) to the cohort model formulation. In particular we work under the Bayesian paradigm and as a result the important aspect of parameter uncertainty can be accounted for naturally. The main contribution of this paper is to demonstrate how mortality models with cohort effects can be formulated, estimated and forecasted under a Bayesian statespace framework. Other works using the statespace approach for mortality modelling include Pedroza (2006), De Jong and Tickle (2006), Kogure et al. (2009) and Liu and Li (2016b). In our view, the statespace approach has three major advantages.
First, the ability of modelling, estimating and forecasting mortality under a unified framework can avoid the potential pitfalls of the 2step estimation procedure typically found in the literature. As discussed in details in Fung et al. (2017), a common practice of estimating mortality models consists of two steps:
 Step 1

Obtain estimates of parameters including period (cohort) effects; the period (cohort) effects are treated as parameters without the assumptions of their dynamics.
 Step 2

Assume a time series model, for example ARIMA models, for the period (cohort) effects; parameters of the time series model are then estimated by fitting the model to period (cohort) effects obtained from Step 1.
In a recent study in Leng and Peng (2016), the authors point out that the leastsquares method for the 2step estimation approach utilised in Lee and Carter (1992) will in general lead to inconsistent estimators unless restrictions are imposed on the possible range of time series models for the underlying dynamics. From a statistical point of view, the twosteps approach is a somewhat adhoc procedure. One may argue that it is more satisfactory to perform estimation and forecasting under a single universal, consistent and rigorous framework, and one such example is the statespace methodology which is wellestablished in the statistics community.
Second, the ability to provide confidence or credible intervals for the estimated parameters, and hence the quantification of parameter risk in mortality projections, is especially important in longevity analysis. The impact of parameter uncertainty on forecasting mortality rates is documented in Czado et al. (2005), Koissi et al. (2006) and Kleinow and Richards (2016). A Bayesian approach to mortality modelling via credit risk plus methodology is considered in Shevchenko et al. (2015) and Hirz et al. (2017). It is also a significant issue in valuing liabilities in life insurance portfolio and pension schemes. Moreover, the rather short time series data used for calibration purpose typically assumed in the literature^{1}^{1}1It is often the case that mortality rates data obtained before 1960 or even 1970 are not used when calibrating mortality models. further enforce the necessity to account for parameter uncertainty where forecasting intervals are required. The ability to quantify parameter risk will enhance the reliability of the projected mortality rates. As shown in Fung et al. (2017), the statespace approach offers a particularly rich and flexible framework for Bayesian and frequentist estimation, where a range of techniques such as filtering, sequential Monte Carlo and Markov chain Monte Carlo (MCMC) methods can be employed.
Third, the statespace approach allows for a wide range of mortality models to be considered while estimation and forecasting can still be performed efficiently. Fung et al. (2017) show that a majority of the popular mortality models can be cast in statespace formulation; in addition, stochastic volatility features can be introduced to dynamic mortality models where numerical filtering techniques are employed for model estimation. Statespace formulation of mortality models have been applied to financial problems as well. Annuity pricing via a Bayesian statespace formulation of the LeeCarter model is considered in Fung et al. (2015). Pricing of longevity instruments based on the maximum entropy principle using a statespace mortality model is studied in Kogure and Kurachi (2010). The flexibility of the statespace approach is a key element in dealing with diverse issues concerning mortality modelling as well as pricing and risk analysis involving longevity risk, see Liu and Li (2016a) and Liu and Li (2016b) for an application of statespace mortality model for longevity hedging .
Despite the advantages of the statespace method, the approach is still under explored in our view. A key element that is as yet missing from this literature, that we aim to address in this paper, is the consideration of cohort effects in a statespace modelling setting. Given the fact that cohort effects are known to be present in certain countries, the possibility of exploiting cohort features under a statespace framework will undoubtedly enhance an actuary’s ability to analyse mortality data. The importance of incorporating cohort effects in statespace setting is also emphasized in Liu and Li (2016b), where the authors “acknowledge that cohort effects are significant in certain populations, and that it is not trivial to incorporate cohort effects in a statespace representation in which the vector of hidden states evolve over time rather than year of birth” (p.66). Therefore, in this paper we focus on addressing this missing piece of model formulation.
The paper is organised as follows. Section 2 provides an overview of different approaches to mortality modelling, leading to the introduction of the statespace methodology. Statespace formulation of cohort models is discussed in details in Section 3. In Section 4, we develop Bayesian inference for cohort models under the statespace framework based on efficient MCMC sampling. Empirical studies for male and female population data from various countries using the statespace cohort models are conducted in Section 5. Finally, Section 6 concludes.
2 Statespace approach to mortality modelling
In this section we provide an overview of different approaches to modelling mortality as well as their estimation methodologies. Our focus will be on single population mortality modelling, however the essential elements of the approaches and methods discussed can be carried over to multipopulation settings, see for example Enchev et al. (2016).
2.1 Stochastic mortality models
The definitions here follow Dowd et al. (2010). We use to denote the true mortality rate, i.e., the probability of death between time and for individuals aged at time . The true death rate, denoted by , is related to the true mortality rate via
(1) 
where the force of mortality is assumed to be constant within integers and . One can use the observed number of deaths and initial exposures data^{2}^{2}2The initial exposure is the population size aged at the beginning of year . to obtain the crude mortality rate as , which is a crude estimate of . Similarly, the crude death rate is defined as the ratio of the observed number of deaths to the average population size , known as central exposures, ages last birthday during year .^{3}^{3}3The average population size is often determined approximately as the population size at the middle of the year.
The work of Lee and Carter (1992) introduced the socalled LeeCarter (LC) model
(2) 
where the terms and aim to capture the age and period effects respectively. Since they proposed to use singular value decomposition (SVD) to calibrate the model to mortality data, the noise term is included in addition to the age and period effects; note that the crude death rate , which is obtained directly using mortality data, is used to obtain estimates of the parameters.
In contrast to the SVD approach where the models are fitted to crude death rates, Brouhns et al. (2002) considers an alternative approach where mortality models are fitted to the number of deaths instead. Using the number of observed deaths and central exposures , the model proposed by Lee and Carter can be reconsidered as
(3) 
In this approach, the number of deaths plays an important role in calibrating the model and consequently the additive error structure in (2) is replaced by the Poisson error structure. Also notice that confusion may arise if one does not distinguish the true and crude death rate, for example it does not make sense to use in (3).
Besides having a different statistical assumption on the error structure, the Poisson regression setting, which belongs to the class of models known as generalised linear/nonlinear models, has the advantage that the cohort factor can be incorporated while estimation can still be performed without extra difficulty in contrast to the SVD approach. The LeeCarter model in the Poisson setup can be enriched by adding a cohort factor , where refers to the yearofbirth, as follows:
(4) 
which was proposed in Renshaw and Haberman (2006). As and take values in and respectively, the cohort index takes values in . The dimension of the cohort index is thus which is different to the dimension of the period index .
Under the LeeCarter original approach, one might consider modelling the crude death rate with cohort effects as follows:
(5) 
However the dimension of the cohort index would cause difficulty for the SVD estimation approach. One of the goals of the present paper is to show that model (5) can be successfully estimated by considering it as a statespace model instead.
Model  Dynamics 

Lee and Carter (1992)  
Renshaw and Haberman (2003)  
Renshaw and Haberman (2006)  
Currie (2009)  
Cairns et al. (2006)  
Cairns et al. (2009)  
Plat (2009) 
Examples of popular mortality models are provided in Table 1. Renshaw and Haberman (2003) consider a multiperiod () extension of the LC model. Renshaw and Haberman (2006) introduce a cohort factor () to LC model. A simplified version of the model in Renshaw and Haberman (2006) is studied in Currie (2009). Cairns et al. (2006) propose the socalled CBD model to model , where is the average age of the sample range, i.e. , designed to capture the matureage mortality dynamics. Cairns et al. (2009) extend the CBD model by incorporating a cohort factor. Plat (2009) introduces a model which combines the desirable features of the previous models and include a term which aims to capture youngage mortality dynamics.
Remark 2.1
An alternative approach to modelling cohort mortality is to consider the continuoustime modelling of the mortality intensity (force of mortality) , with and taking continuous values. It is of particular relevance for financial and actuarial applications as pricing formulas can be derived and expressed via the mortality intensity. Continuoustime approaches are discussed in Cairns et al. (2008), and they have been studied extensively in the literature, see for example Biffis (2005), Dahl and Moller (2006), Luciano and Vigna (2008) and Fung et al. (2014).
2.2 Generalised linear modelling framework
Estimation of stochastic mortality models such as those presented in Table 1 can be performed based on a flexible approach known as generalised linear modelling (GLM) framework (Villegas et al. (2015), Currie (2016)).
Given central exposures , it is typical to approximate initial exposures as . Under the GLM framework, one is interested in modelling the number of deaths as random variables. Common examples include Poisson error structure
(6) 
and Binomial error structure
(7) 
Note that we have the expected values and for the Poisson and Binomial models respectively. Through the socalled link function , one can associate the mean or with a predictor as
(8) 
where can be initial or central exposures. Typical link functions for the Poisson and Binomial models are the log function and the logit function respectively.
The models in Table 1, from the viewpoint of GLM framework, provide a specification for the predictor. In particular, the last four models in Table 1 assume linear predictors while the first three models describe nonlinear predictors since multiplicative terms of parameters such as are involved in the predictor. The latter models can be referred to as generalised nonlinear models (Currie (2016)).
We note here that a clear advantage of the GLM framework is that it allows sophisticated error structures such as Poisson and Binomial distributions compared to the SVD approach. However, despite its flexibility, the framework involves a 2step procedure for estimation which is in contrast to the statespace approach presented in the following sections where a joint estimation of model parameters and latent factors is performed.
2.3 Statespace modelling framework
In this paper we extend previous work, see for example Pedroza (2006), Kogure and Kurachi (2010) and Fung et al. (2017), on using statespace techniques to model mortality dynamics with cohort features taken into consideration. We first present a brief review of mortality modelling via statespace representation.
A statespace model consists of two equations: the observation equation and the state equation which are given, respectively, by
(9a)  
(9b) 
where represents an observed multidimensional time series and the state represents a multidimensional hidden Markov process. Here, and are independent random noises and the functions and can be nonlinear in general. It is clear that each of the mortality models shown in Table 1 specify the observation equation of a statespace model, where the period and cohort factors represent the hidden states. A time series model for the period effects will form the state equation, thus completing the description of a statespace system. As an example, the LeeCarter model can be reformulated as
(10a)  
(10b) 
where , is a dimensional identity matrix and denotes the normal distribution with mean and covariance .
Fung et al. (2017) study two generalisations of the statespace LeeCarter system (10) to analyse long term mortality time series for the Danish population. The first generalisation is to incorporate heteroscedasticity into the model, i.e. the homogeneous covariance structure in the observation equation in (10) which is replaced by a heterogeneous covariance matrix . This feature turns out to be a major improvement to model fit for Danish mortality data.
The second generalisation is to consider stochastic volatility for the latent process, that is the period effect in the state equation in (10), to capture the observed characteristics of the long term time series data. Specifically, the following extended LC model with stochastic volatility feature is proposed:
(11a)  
(11b)  
(11c) 
where captures the stochastic volatility for the period effect. Bayesian inference for the model is developed based on particle MCMC method (Andrieu et al. (2010)).
Despite the fact that cohort models are essential in mortality modelling as shown in Table 1, a formulation of mortality models with cohort factors incorporated is yet to be studied and analysed in the statespace setting in the literature. In the following we will present and demonstrate our approach for dealing with cohort models via statespace methodology.
3 Cohort effects: statespace formulation
This section presents a formulation of cohort models in statespace framework. We first describe how the cohort factor impacts the evolution of the agespecific death rates. The insight will give us a way to derive a statespace representation of the cohort effects. Bayesian inference for cohort models will be developed in Section 4.
3.1 Background
Renshaw and Haberman (2006) introduces a cohort factor to the LeeCarter model together with an agemodulating coefficient as follows:
(12) 
where and represent age and calendaryear respectively. Here represents yearofbirth and hence is a factor created to capture the cohort effect.
Numerical estimation of the cohort model (12), however, is reported in Hunt and Villegas (2015) to produce mixed convergence results based on the regression setup. Robustness of the resulting regressionbased cohort models is also questioned as the goodness of model fit is reported to be sensitive to the data being used and the fitting algorithm. These robustness and convergence problems are also noticed in Currie (2016) where the paper presents a comprehensive approach for mortality modelling based on generalised linear and nonlinear models, see also Section 2.2 for a brief discussion.
These issues thus provide another motivation to consider cohort models in the statespace framework as the statespace method would potentially eliminate the inefficiency caused by the 2step estimation procedure required in other approaches, and in addition doesn’t seem to suffer from the same sensitivity and poor convergence results identified in the aforementioned literature. We report our empirical findings for the estimation and forecasting of statespace cohort models in Section 5.
3.2 Statespace formulation
In this paper we focus on the full cohort model
(13) 
where a dynamics for the crude death rate is being modelled and a noise term is included.
To aid in explaining how we derive a statespace representation of cohort models, we consider a matrix of cells where the row and column corresponds to age () and year () respectively, see Table 2. Here we assume and for illustration. The cohort factor is indexed by the yearofbirth and its value on each cell is displayed in Table 2. We first notice that the value is constant on the “cohort direction”, that is on the cells , and so on.
age/year  

Now consider the cohort model (13) and let . The model can be expressed in matrix form as
(14) 
As time flows from to , the cohort vector , which represents the cohort factor in matrix form, proceeds as
(15) 
The key observation here from (15) is that the first two elements of the cohort vector at time will appear as the bottom two elements of the cohort vector at time . The pattern can also be observed from Table 2. Therefore, the evolution of the cohort vector must satisfy
(16) 
which is in fact a result of the defining property of “cohort”: . Furthermore, it is obvious from (16) that one only needs to model the dynamics of but not and . We will use this observation to derive a statespace formulation of cohort models which is presented next.
Let , in matrix notation we have (recall that )
(17) 
It is clear that, from (17), we have which corresponds to (13) for . Here is the dimensional state vector.
From (15)(16), we can write the state equation in matrix notation as follows:
(18) 
Here we assume is a random walk with drift process ()
(19) 
and the dynamics of is described by a stationary AR(1) process (ARIMA(1,0,0))
(20) 
where . One may consider other dynamics for by specifying the second row of the by matrix in (18). For example, one can consider generally the state equation as
(21) 
where which is an ARIMA(p,0,0) process since , .
We can express the matrix form of (17)(18) succinctly as
(22a)  
(22b) 
where
(23) 
and , the dimensional identity matrix and is a by diagonal matrix with diagonal . For simplicity we assume homoscedasticity in the observation equation; heteroscedasticity can be considered as developed in Fung et al. (2017).
The full cohort model (13) is suffering from an identification problem since the model is invariant to the following transformation
(24) 
where and . The identification problem can be resolved by imposing the following parameter constraints
(25) 
to ensure a unique model structure is identified. Hunt and Villegas (2015) and Currie (2016) provide further discussions on the identifiability issues for cohort models.
Remark 3.1
The observation and state equations (17)(18) imply that the cohort model that we have formulated here belongs to the linearGaussian class of statespace models. As a result one can perform efficient maximumlikelihood or Bayesian estimation on fitting the model to data, see Fung et al. (2017). In this paper we focus on Bayesian inference so that mortality forecasts can take into account parameter uncertainty.
3.3 A simplified cohort model
The cohort model (12) assumes that the impact of the cohort factor on agespecific death rates is modulated by the coefficient . Iterationbased estimation of the cohort model is reported in Cairns et al. (2009) to be suffering from convergence problems. As a result, Haberman and Renshaw (2011) consider to simplify the model structure to
(26) 
That is, the modulating coefficient for the cohort factor is set to be one for all age . It is suggested that the simplified model (26) exhibits better estimation convergence behaviour when fitting the model to mortality data.
Remark 3.2
When fitting cohort models to data based on the Poisson or binomial regression setup via iterationbased estimation, it is typical to assume the starting values of the iteration scheme are coming from the estimates of other similar models, for example the LC model or APC model (Hunt and Villegas (2015), Currie (2016), Villegas et al. (2015)). Even doing so the convergence is not guaranteed. We will report in Section 5 that our approach based on Bayesian statespace framework do not require such an assumption and is able to successfully perform Bayesian estimation for various countries when cohort patterns are present in the data.
In the following, we will also consider the model
(27a)  
(27b) 
where
(28) 
It will be referred to as the simplified cohort model. In addition, the following set of parameter constraints
(29) 
is imposed to ensure the identifiability of the simplified cohort model.
4 Bayesian inference for cohort models
In this section we begin by detailing the Bayesian estimation of the full cohort model in statespace formulation (22). Nested models including the simplified cohort model (27) and the LC model (10) will be discussed in Section 4.2. We first note that the cohort models and the LC model belong to the class of linear and Gaussian statespace models. As a result one can apply an efficient MCMC estimation algorithm based on Gibbs sampling with conjugate priors combined with forwardbackward filtering as described in Fung et al. (2017), this forms a special case of the socalled collapsed Gibbs sampler framework of van Dyk and Park (2008).
4.1 Bayesian inference for the full cohort model
For the full cohort model (22), the target density is given by
(30)  
(31) 
where is the dimensional (for each ) latent state vector and is the dimensional static parameter vector. In order to simplify the notation, we write instead of . We perform block sampling for the latent state via the socalled forwardfilteringbackwardsampling (FFBS) algorithm (Carter and Kohn (1994)) and the posterior samples of the static parameters are obtained via conjugate priors. The sampling procedure is described in Algorithm 1, where is the number of MCMC iterations performed. Note also that the notation is used in Algorithm 1.
In Algorithm 1, after obtaining from line 3, one can impose the constraint by setting where . Similarly, once is obtained from in line 3, setting , where , will ensure the constraint is satisfied. To impose the constraint , we set , where , once is obtained from line 57. Constraint for can be imposed similarly.
4.1.1 Forwardbackward filtering for latent state dynamics
The FFBS procedure requires to carry out multivariate Kalman filtering forward in time and then sample backwardly using the obtained filtering distributions. For the full cohort model (17)(18), the conditional distributions involved in the multivariate Kalman filtering recursions are given by
(32a)  
(32b)  
(32c)  
(32d) 
where
(33a)  
(33b)  
(33c) 
for . Since
(34) 
we see that for a block sampling of the latent state, one can first draw from and then, for (that is backward in time), draws a sample of recursively given a sample of . It turns out that where
(35a)  
(35b) 
based on Kalman smoothing (Carter and Kohn (1994)).
4.1.2 Posteriors for static parameters
To sample the posterior distribution of the static parameters in Algorithm 1, we assume the following independent conjugate priors:
(36a)  
(36b)  