Risk Margin Quantile Function Via Parametric and NonParametric Bayesian Quantile Regression
Abstract
We develop quantile regression models in order to derive risk margin and to evaluate capital in nonlife insurance applications. By utilizing the entire range of conditional quantile functions, especially higher quantile levels, we detail how quantile regression is capable of providing an accurate estimation of risk margin and an overview of implied capital based on the historical volatility of a general insurers loss portfolio. Two modelling frameworks are considered based around parametric and nonparametric quantile regression models which we develop specifically in this insurance setting.
In the parametric quantile regression framework, several models including the flexible generalized beta distribution family, asymmetric Laplace (AL) distribution and power Pareto distribution are considered under a Bayesian regression framework. The Bayesian posterior quantile regression models in each case are studied via Markov chain Monte Carlo (MCMC) sampling strategies.
In the nonparametric quantile regression framework, that we contrast to the parametric Bayesian models, we adopted an AL distribution as a proxy and together with the parametric AL model, we expressed the solution as a scale mixture of uniform distributions to facilitate implementation. The models are extended to adopt dynamic mean, variance and skewness and applied to analyze two real loss reserve data sets to perform inference and discuss interesting features of quantile regression for risk margin calculations.
School of Mathematics and Statistics
The University of Sydney, NSW 2006, Australia
[0pt] email: xdon0433@uni.sydney.edu.au, (Corresponding Author)
[0pt] Department of Statistical Science,University College London UCL, London, UK;
[0pt]
Asymmetric Laplace distribution, Bayesian inference, Markov chain Monte Carlo methods, Quantile regression, loss reserve, risk margin, central estimate.
1 Background on Risk Margin Calculation
A core component of the work performed by general insurance actuaries involves the assessment, analysis and evaluation of the uncertainty involved in the claim process with a view to assessing appropriate risk margins for inclusion in insurance liabilities. An appropriate valuation of insurance liabilities including risk margin is one of the most important issues for a general insurer. Risk margin is the component of the value of claims liability that relates to the inherent uncertainty.
The significance of this task is well understood by the actuarial profession and has been debated by both practitioners and academic actuaries alike. Much of the attention involves the non prescriptive nature of risk margin requirements discussed in regulatory guidelines such as Article 77 and Article 101 of the Solvency II Directives. In Australia a general task force was established, developing a report on risk margin evaluation methodologies presented to the Australian actuarial profession at the Institute of Actuaries of Australia during the 16th General Insurance Seminar in 2008. This report aimed to highlight approaches to risk margin calculations that are often considered. Before briefly discussing these aspects we first note the following Solvency II items which relate to the Solvency Capital Requirement (SCR) and the risk margin.
Article 101 of the Solvency II Directive states,
“ The Solvency Capital Requirement (SCR) shall correspond to the ValueatRisk (VaR) of the basic own funds of an insurance or reinsurance undertaking subject to a confidence level of 99.5% over a oneyear period. ”
Essentially, the basic own funds are defined as the excess of assets over liabilities, under specific valuation rules. In this regard, a core challenge is the capital marketconsistent value of insurance liabilities, which requires a best estimate typically defined as the expected present value of future cash flows under Solvency II plus a risk margin calculated using a costofcapital approach.
Furthermore, under Article 77 of the 2009 Solvency II Directive it states that the risk margin calculation is described as
“ The risk margin shall be such as to ensure that the value of the technical provisions is equivalent to the amount insurance undertakings would be expected to require in order to take over and meet the insurance obligations… â¦ it shall be calculated by determining the cost of providing an amount of eligible own funds equal to the Solvency Capital Requirement necessary to support the insurance obligations over the lifetime thereof… ”
As can be seen from such specifications, the recommendations to be adopted are not prescriptive in the required model approaches. Therefore, as discussed in the white paper produced by the Risk Margins Taskforce 1998, there have been several approaches considered which range from those that involve little analysis of the underlying claim portfolio to those that involve significant analysis of the uncertainty using a wide range of information and techniques, including stochastic modelling. They highlighted approaches adopted in practice in the assessment of risk margins and pointed to percentile or quantile methods as being most prevalent in practice, this provides a good foundation for the methods we consider.
Traditionally, actuaries that adopt a stochastic framework would evaluate claims liability using a central estimate which is typically defined as the expected value over the entire range of outcomes. However with the inherent uncertainty that may arise from such an estimator which is not statistically robust and therefore sensitive to outlier claims, claims liability measures often differ from their central estimates. In practice, the approach adopted is typically to then set an insurance provision so that, to a specified probability, the provision will eventually be sufficient to cover the runoff claims. For instance, in order to satisfy the requirement of the Australian Prudential Regulation Authority (APRA) to provide sufficient provision at a 75% probability level, the risk margin should be modelled statistically so that it can capture the inherent uncertainty of the mean estimate. When this margin is then added to the central estimate, it should provide a reasonable valuation of claims liability and therefore increases the likelihood of providing sufficient provision to meet the level required in GPS 320. In this regard, it is worth noting that the more volatile a portfolios runoffs or those that display heavy tailed features may require a higher risk margin, since the potential for large swings in reserves is greater than that of a more stable portfolio.
To accommodate these ideas, two common methods for risk margin estimation have been proposed in practice. These are the cost of capital and the percentile methods. Under the cost of capital method the actuary determines the risk margin by measuring the return on the capital required to protect against adverse development of those unpaid claim liabilities. It is evident that application of the cost of capital method requires an estimate of the initial capital to support the unpaid claim liabilities and also the estimate of return on that capital. Alternatively, under the percentile or quantile method that we consider in this paper, which is currently used in Australia the actuary takes the perspective that the insurer must be able to meet its liability with some probability under some assumptions on the distribution of liabilities. Risk margin is then calculated by subtracting the central estimate from a predefined critical percentile value.
What we bring to the percentile and quantile based framework in our proposed methods is the ability to incorporate in a rigorous statistical manner, regression factors that may be related to both exogenous features directly related to the insurance claims runoff stochastic process as well as endogenous factors that are related to for instance the current micro or macro economic conditions and the regulatory environment. These will be incorporated into a statistical model that allows one to explain the proportion of variation in the risk margin that is attributed to such features in a principled manner, as we shall demonstrate allowing for accurate estimation and prediction. We argue that since the percentilebased method involves the estimation of quantiles, it is therefore somewhat natural to consider quantile regression, which is a statistical technique to estimate conditional quantile functions, which can be used to estimate risk margin.
Just as classical linear regression methods based on minimizing sums of squared residuals enable one to estimate models for conditional mean functions, quantile regression methods offer a mechanism for estimating models for the conditional median function, and the full range of other conditional quantile functions. This model allows studying the effect of explanatory variables on the entire conditional distribution of the response variable and not only on its center. Hence we may develop factors and covariates which are explanatory of the risk margin variation directly through the proposed quantile regression framework. By supplementing the estimation of conditional mean functions with techniques for estimating an entire family of conditional quantile functions, quantile regression is capable of providing a more complete statistical analysis of the stochastic relationships among random variables.
Quantile regression has been applied to a wide range of applications in economics and finance, but has not yet been developed in a claim reserving context for risk margin estimation. We will demonstrate the features of quantile regression that have been popularized in finance and explain how they can be adopted in important applications in insurance, such as risk margin calculations. In quantitative investment, least square regressionbased analysis is extensively used in analyzing factor performance, assessing the relative attractiveness of different firms, and monitoring the risks in their portfolios. Engle and Manganelli (2004) consider the quantile regression for the Value at Risk (VaR) model. They construct a conditional autoregressive value at risk model (CAVaR), and employ quantile regression for the estimation. The risk measure, VaR is defined as a quantile of the loss distribution of a portfolio within a given time period and a confidence level. Accurate VaR estimation can help financial institutions maintain appropriate capital levels to cover the risk from the corresponding portfolio.
Taylor (2006) estimate percentilebased risk margins via a parametric model based on the assumption of a log normal distribution of liability. Other sophisticated distributions to capture flexible shapes and tail behaviors are also proposed to model severity distribution on aggregated claim data. These distributions include the generalized (McDonald and Newey, 1988), Pareto (Embrechts et al., 1997), the Stable family (Paulson and Faris, 1985; Peters, Byrnes and Shevchenko 2011; Peters, Shevchenko, Young and Yip, 2011), the Pearson family (Aiuppa, 1988), the loggamma and lognormal (Ramlau and Hansen, 1988) and the lognormal and Burr 12 (Cummins et al., 1999), and type II generalized beta (GB2) distribution (Cummins et al., 1990, 1999, 2007). While these distributions on real support are flexible to model both leptokurtic and platykurtic data, they require logtransformation for claims data and the resulting loglinear model may be more sensitive to low values than large values (Chan et al., 2008).
In Peters, Wuethrich and Shevchenko (2009) they adopt a PoissonTweedie family of models which incorporates families such as normal, compound poisson Gamma, positive stable and extreme stable distributions into a family of models. It was shown how such a generalized regression structure could be used in a claims reserving setting to model the claims process whilst incorporating covariate structures from the loss reserving structure. In this instance a multiplicative structure for the mean and variance functions was considered and quantiles were derived from modelling the entire distribution, rather than specifically targeting a model at the conditional quantile function.
Recently, in Dong and Chan (2013) an alternative class of flexible skew and heavy tail models was considered involving the GB2 distribution with positive support adopting dynamic mean functions and mixture model representation to model long tail loss reserving data and showed that GB2 outperforms some conventional distributions such as Gamma and generalised Gamma. The GB2 distribution family is very flexible as it includes both heavytailed and lighttailed severity distributions, such as gamma, Weibull, Pareto, Burr12, lognormal and the Pearson family, hence providing convenient functional forms to model claims liability. From the perspective of quantile specific regression models, recently Cai (2010) proposed a powerPareto model which allows for flexible quantile functions which can provide a combination of quantile functions for both power and Pareto distributions. These combinations enable the modelling of both the main body and tails of a distribution.
The difference with our current methodology is that instead of developing a statistical model to capture all features of the claims runoff stochastic structure, with the incorporation of regression components, we propose, in this work, to target explicitly the conditional quantile functions in a regression structure. From a statistical perspective, this is a fundamentally different approach to these previously mentioned reserving model approaches. However we will illustrate that we can borrow from such models in developing our risk margin quantile regression framework. In fact the associate parameter estimation loss functions, parameter estimator properties and the resulting quantile in sample and out of sample forecasts will significantly differ to those achieved when trying to develop a model for the entire process rather than targeting the quantity of interest in this case, the particular quantile level. This is clear from the perspective that only under a Gaussian distributional assumption for such reserve models (on log scale) would a standard least squares approach be optimal from the perspective of GaussMarkov theory. In situations where returns are heavy tailed and skewed alternative models will prove more appropriate as we will discuss.
Traditional approaches, both frequentist and Bayesian, to quantile regression have involved parametric models based on the asymmetric Laplace (AL) distribution. Using asymmetric Laplace distribution provides a mechanism for Bayesian inference of quantile regression models. Hu et al. (2012) develop a fully Bayesian approach for fitting singleindex models in conditional quantile regression. The benefit of using a Bayesian procedure, lies in the adoption of available prior information and the provision of a complete predictive distribution for the required reserves (de Alba, 2002). Different Bayesian loss reserve models have been proposed for different types of claims data. Zhang et al. (2012) propose a Bayesian non linear hierarchical model with growth curves to model the loss development process, using data from individual companies forming various cohorts of claims. Ntzoufras and Dellaportas (2002) investigate various models for outstanding claims problems using a Bayesian approach via Markov chain Monte Carlo (MCMC) sampling strategy and show that the computational flexibility of a Bayesian approach facilitated the implementation of complex models.
1.1 Contributions
The contribution of this paper is threefold. First, we propose using quantile regression for loss reserving. The proposed method, relating the provision to quantile regression allows a direct modelling of risk margin, and hence provision, instead of estimating the mean then applying a risk margin. It provides a richer characterization of the data, especially when the data is heavy tailed, allowing us to consider the impact of a covariate on the entire distribution, not merely its conditional mean. Secondly, we develop a range of parametric quantile regression models in Bayesian framework, each with their own distribution features. Especially, in particular we generalize the AL distribution model to incorporate a dynamic mean, variance and the shape parameters to model risk margin via a user friendly Bayeisan software WinBugs, which is easy for users without much Bayeisan background or specialized knowledge of Markov chain Monte Carlo (MCMC) methodology. Furthermore, the estimation of shape parameter by accident year gives us an analytical framework to estimate risk margin. This allows us to capture the feature that the cohort of claims in different accident year may be heterogeneous, and hence applying different different risk margin to different accident year gives us an explicit provision in reserving. Finally, we compare the performance of parametric and nonparametric quantile regressions in the context of loss reserving.
The rest of the paper is organized as follows. Section 2 explains the parametric and nonparametric models proposed. Section 3 presents the posterior quantile regression models in a Bayesian framework. Section 4 details the way to calculate risk measures and risk margin using our models. Then we apply the methodology to two real loss reserve data sets in Section 5 and 6. Section 7 concludes.
2 Quantile Regression for Claims Reserving
In this section, we present quantile regression models and explain their relevance to loss reserving, this will be undertaken in both a nonparametric and a parametric modelling framework under the Bayesian paradigm. In the process we propose a novel analytical approach to perform estimation of the risk margin under various quantile regression model structures. Of particular focus in this paper is the class of models based on the Asymmetric Laplace (AL) distributional family. In the special case of the AL distribution we demonstrate that risk margin estimation is achieved naturally through the modelling the shape parameters of the AL distribution and hence the inference on the model parameters directly informs the inference of the risk margin.
In developing a quantile regression framework for general insurance claims development triangles we will assume that there is a runoff triangle containing claims development data in which will denote the cumulative claims with indices and , where denotes the accident year and denotes the development year (cumulative claims can refer to payments, claims incurred, etc). Furthermore, without loss of generality, we make the simplifying assumption that the number of accident years is equal to the number of observed development years, that is, with observations. At time the index set in the upper triangular is
(1) 
and for claims reserving at time the index set to predict the future claims in the lower triangle is:
(2) 
Therefore the vector of observed in the upper triangle is given by and the corresponding vector of covariates is denoted by . Similarly and are the vectors of claims and covariates in the lower triangle.
In the quantile regression structures we will aim to make inference on the quantile function of the data within sample, in each cell of as well as predictive outoff sample quantile function estimation based on the claim cells in in lower triangle. The estimation of the quantile function regression has three main components:

The conditional distribution and in this case conditional quantile function of the dependent variables given by the claims data, given the explanatory variables.;

The structural component of the regression structure based on the link functions and imposed model structures linking the regression structures with the covariates to the location and scale of the conditional distribution and conditional quantile functions of the response.;

The actual choice of independent variables i.e. the covariates in the regression model, in this case we will also consider some basis function regression structures in some of the models proposed.
In the following subsections we discuss each of these components in term, starting with the distributional aspects of the quantile regression models we consider.
2.1 Nonparametric Quantile Regression Models
In a nonparametric quantile regression approach, we perform estimation of regression coefficients without the need to make any assumptions on the distribution of the response, or equivalently the residuals. If is a set of observed losses and is a vector of covariates that describe . The quantile function for the log transformed data is
(3) 
where is the quantile level, are the linear model coefficients for quantile level which are estimated by solving
(4) 
and . Then the quantile function for the original data is . Koenker and Hallock (2001) illustrate the loss function for quantile regression as we represent in Figure 1.
Koenker and Machado (1999) and Yu and Moyeed (2001) show that the solution to minimization of the loss function in equation (4) for estimating the parameter vector is equivalent to maximum likelihood estimation of the parameters of the AL distribution. Hence, the parameter vector can be estimated via an AL distribution with pdf
(5) 
where the skew parameter gives the quantile level , is the scale parameter and is the location parameter. Since the pdf (5) contains the loss function (4), it is clear that parameter estimates which maximize (5) will minimize (4).
In this formulation the AL distribution represents the conditional distribution of the observed dependent variables (responses) given the covariates. More precisely, the location parameter of the AL distribution links the coefficient vector and associated independent variable covariates in the linear regression model to the location of the AL distribution. It is also worth noting that under this representation it is straightforward to extend the quantile regression model to allow for heteroscedasticity in the response which may vary as a function of the quantile level under study. To achieve this one can simply add a regression structure linked to the scale parameter in the same manner as was done for the location parameter.
Equivalently, we assume conditionally follows an AL distribution denoted by . Then
(6) 
where , and . Discussion on the choice of link function and structure of regression terms will be undertaken in later sections. In presenting the model in this fashion we already start to move towards the representation of a parametric quantile regression structure.
2.2 Parametric Quantile Regression Models
Alternatively, we may adopt a parametric approach to study the quantile regression structure. Two types of distributions, on real support or positive support can be considered and we begin with distributions on . In this case, we assume that where is the conditional cumulative distribution function (cdf) and is a vector of model parameters including all unknown coefficient parameters and distributional parameters. The quantile function for the conditional distribution of given at a quantile level is given by:
(7) 
Under this formulation, the conditional quantile function in (7) can be written as
(8) 
where is the inverse cdf for the standardized variable and again one may incorporate regression structures given as follows for location and scale functions:
(9)  
(10) 
To transform the quantile function back to the original scale of the data , we suggest . We note that there is no unique way to transform the quantile function for back to and the proposed transformation does not equal in general to the quantile function for the logAL distribution.
Remark: We observe that the difference between the nonparametric and the parametric quantile regression models is that in the parametric structure we make explicit the quantile function of the “residual” denoted by .
For distributions on , we assume that with mean where is given in (9). Next we make explicit several possible parametric models one may consider in quantile regressions for risk margin. Each model has different associated properties with regard to the relationship of the skewness, kurtosis and heaviness of the tail that it imposes on the quantile function of the response given the covariates.
2.2.1 Asymmetric Laplace Distribution
As discussed above, the AL distributional family is a useful model structure which naturally fits into a quantile regression framework. As made explicit above, the AL distribution is a three parameter distribution which has been shown to be directly linked to the estimation of quantiles in a quantile regression framework, see further details in Yu and Zhang (2005).
Since this realization, the AL family has been utilized in several financial risk and econometric settings such as Guermat and Harris (2001) who use the symmetric laplace distribution with GARCH volatility to model shorthorizon asset returns. Lu et al. (2010) extend this to allow skewness via AL distribution. Yu and Moyeed (2001) apply AL distribution for quantile regression purposes, though as yet, no such developments have been made in the insurance and particularly the risk margin context. Here we propose such a model for risk margin estimation.
If we model the residuals by an AL distribution, the quantile function for observed data is given by (8) where is the inverse cdf (quantile function)
(11) 
To understand how the three location, shape and scale parameters of the AL distribution affect the shape and tails of the distribution it is also useful to note the following relationship between the parameters and the mean, variance, skewness and kurtosis of AL distribution:
(12)  
(13) 
Note that the shape parameter of the AL distribution gives the magnitude and direction of skewness. AL distribution is skewed to left when and skewed to right when and hence it can model the left skewness of most log transformed loss data directly through this shape parameter . Moreover as the risk margin adopted in insurance industry is mostly greater than 50 percent, AL distribution allows the calculation of quantiles rather than mean estimates fairly easily. Figures 2(a) and 2(b) show a variety of pdf for AL distribution and its skewness and kurtosis respectively.
2.2.2 Power Pareto Model
As the second choice of parametric quantile regression model we consider the framework of Cai (2010). In this approach a polynomial powerPareto (PP) quantile function model is developed. This model combines a power distribution with a Pareto distribution, which enables us to model both the main body and the tails of a distribution. In considering the PP model the conditional quantile function of the response (reserve in each cell) are comprised of two components:

component 1: a power distribution where and with a corresponding quantile function then given by for ; and

component 2: a Pareto distribution function where and with a corresponding quantile function then given by .
One may use the fact that the product of the two quantile functions will remain a strictly valid quantile function producing the new quantile function family known as the PolynomialPower Pareto model. The resulting structural form given by the inverse cdf of the Pareto distribution with an additional polynomial power term:
(14) 
Hence the quantile function is again given by (8) where and .
From the specification of this quantile function, one may then derive the resulting pdf of the PP model for which is given by
where is given by solving the system of equations defined for each observation by
(15) 
where again we treat the location in (9) and scale in (10) as functions of the regression coefficients and associated covariates. We note that in this case the is really an implicit function of the regression structure as each is found as the solution to the system of equations in (15).
To complete the specification of the polynomial power Pareto model we plot the shape of the density that can be obtained for a range of different power parameters for the power and pareto components with a unit scale factor . These plots in Figure 3 demonstrate the flexible skewness, kurtosis and tail features that can be obtained from such a model by varying the parameters and .
2.2.3 Generalised Beta Distribution of the Second Type Family
We note that the AL and PP families of quantile regression models require a log transformation of the data before the modelling to ensure the data has real support that these distributions are defined upon. In performing this transformation, one must analyze carefully the effect of the transformation on the ability to fit such models and the resulting model interpretability must be interpreted with regard to the transformation. This is particularly the case if zero counts are present in the data for some accident and development years. Moreover, in the context of claims reserving, loss data often exhibits heavytailed behavior, particularly for long tail business classes. To account for such features and to remove the need to consider pretransformation of the data one may consider the family of generalized beta (GB2) distributions of the second kind.
The type two generalized beta distribution (GB2) has attractive features for modelling loss reserve data, as it has a positive support and nests a number of important distributions as its special cases. The GB2 distribution has four parameters, which allows it to be expressed in various flexible densities. See Dong and Chan (2013) for a more detailed description of GB2 distribution including its pdf and distribution family.
If conditionally follows a GB2 distribution, then it can be characterized by the density given by
(16) 
where and are shape parameters and is the scale parameter.
In particular, can be linked to the mean of the distribution as follows:
(17) 
where is loglinked to a linear function of covariates in (9) according to the relationship:
(18) 
Then the variance is given by:
(19) 
The GB2 distribution is a generalization from the beta distribution with pdf:
(20) 
via the transformation . Hence the cdf of GB2 distribution is given by:
(21) 
where is the incomplete beta function.
The GB2 is directly relevant for quantile regression models since one may also find its quantile function in closed form according to the following expression:
(22) 
There are many widely known and utilized subfamilies of the GB2 family, we present two examples of relevance to the context of risk margin estimation that we will explore, corresponding to the generalized gamma and the gamma distribution subfamilies.
2.2.4 Two Special Cases of GB2
To understand the flexibility of the GB2 family of models, we consider the case when , then the resulting GB2 distribution subfamily becomes the generalized gamma (GG) distribution, see discussion in McDonald et al. (1984). The GG family of models was independently introduced by Stacy (1962), as a three parameter distribution with pdf given by:
(23) 
where and are shape parameters and is scale parameter linked to the mean of the distribution as:
(24) 
and the mean is again loglinked to a linear function of covariates in (18). The cdf is
where is the lower incomplete gamma function and . Hence, the quantile function is given by:
(25) 
The second case is nested within the GG family and corresponds to the two parameter Gamma distribution which is obtained by further restricting . Its pdf and quantile function are wellknown and can be expressed using equations (23) and (25) by replacing with 1.
Having defined clearly the three different quantile regression distributional families that will be considered in the parametric quantile regression framework, we now introduce the different regression structures we consider in the quantile regression under each distributional assumption.
2.3 Structural Components of the Quantile Regression Framework
In the model structures we will adopt, as is standard practice in regression modelling, once we believe we have suitable explanatory variables for the dependent variable quantity of interest, in this case the conditional quantile function, we will assume the observations are independent.
In the following subsections we explain how under each different distributional assumption for the conditional quantile regression structure, one may introduce a link function to relate regression models using independent covariates to the response quantiles in order to model trend behaviors in the location and scale of the quantile function. To simplify all the possible different model considerations we consider only log link functions in all regressions.
The possible regression structures we consider will be classified as: location based explanatory factors i.e. trends in accident and development years; and scale (heteroskedascity / variance) based explanatory factors for accident and development years. We note that when it comes to different distributional choices since we may transform the observations, we are actually considering both additive and multiplicative (mixed interaction) terms in our regressions and as such we explore aspects of ANOVA as well as ANCOVA regression structures in the quantile regression setting. A summary of the model structures we consider for the location and scale components of each model is provided in Table 8 in Appendix 1. We note that in general one may consider that a version of the ANCOVA model was applied to the PP and AL models and a version of the ANOVA model was effectively applied to the AL and GB2 families. In addition we will allow the influence of covariates to affect different quantile levels to different extents, making for an interesting analysis on the effect of model structure on quantile level.
We note that since the focus of this manuscript differs to that undertaken in the PoissonTweedie regression context of Peters, Shevchenko and Wuethrich (2009), in that the focus of the regression model comparison will be primarily concerned with the model choice for the distributional form of the conditional quantile function, not so much on the model structure uncertainty related to all possible covariate model subspace structures and nested models, therefore we limit the analysis to the ANOVA and ANCOVA structures given below. If one is interested in specialized techniques to explore and compare all possible models subspaces within each distributional model, we suggest the approach adopted in Peters, Shevchenko and Wuethrich (2009) or recently in Verrall, R.J. and Wuthrich, M. (2013)
2.3.1 Location: Development and Accident Year Trend Model Structures
The primary sets of covariates we consider correspond to the accident year and the development year in the claims reserving structure, as well as transformations of these through basis functions. From Table 8 one may observe that we label models using two subscripts according to their mean and variance functions respectively. Models 0 (denoted by ) and 1 (denoted by ) are parsimonious location structure specifications for the general model in (9) with , that is, the additive structure is given by:
(26)  
(27) 
Under one assumes a linear trend across accident and development years. If a nonlinear trend across development years is considered with an assumption of common behavior down the accident years, on may consider which is a basis regression model popular in term structure models and known as the NelsonSeigel model (Nelson and Siegel, 1987). Examples of typical basis functions we considered under this choice for the location are given in Figure 4 below, where we show the ‘level’, ‘slope’ and ‘curvature’ structure of the location trend from such a model.
In the context of an ANOVA model specification for the location one can assume a form given by:
(28) 
This location (trend) function corresponds to the general model in (9) with ,
The parameters and denote the accident year and development year effects respectively and they satisfy the following constraints:
(29) 
This parametrization is set up in the context of loss reserving so that all parameters are relative to the first accident year which has the most information. These location functions (26) to (28) apply to both AL and PP distributions in general. For Gamma, GG and GB2 distributions with positive support , a log link function is considered and the location functions become . When the AL distribution, with the shape parameter is applied, Model 3 () corresponds a nonparametric quantile function
(30) 
where are parameters at quantile level .
2.3.2 Scale: Development and Accident Year Variance Model Structures
There are different choices for the structure of the variance function for the AL and PP distributions but Gamma, GG and GB2 distributions do not have a component to model directly. Model 0 () assumes homoscedastic variance . Models 0 () to 3 () are specified below:
(31)  
(32)  
(33)  
(34) 
where the parameters and which denote the accident year and development year effects respectively satisfy the following constraints:
(35) 
Again Models 1 to 2 corresponds to (10) with and . Furthermore, for Model 23’, the shape parameter in the AL distribution is further modelled by the accident year effect, which is specified as follows:
(36) 
where the parameters denote the accident year effect and satisfy the following constraints:
(37) 
3 Bayesian Framework: Posterior Quantile Regression
The estimation of quantile regression models is straightforward to adopt under a Bayesian formulation. One of the key advantage of using Bayesian procedures for practical models such as those we develop above lies in the adoption of available prior information and the provision of a complete predictive distribution for the required reserves (de Alba, 2002).
To complete the posterior distribution specification in each model it suffices to consider the representation of two components: the likelihood of the data for the regression structure (that is, the density not the quantile function); and the prior specifications for the model parameters. In the above sections, the quantile function of the likelihood is presented, along with the associated density for the observations conditional upon the parameters and covariates, that is, the likelihood for each model. Therefore, to formulate the Bayesian structure we simply need to present the prior structures we consider for the parameters in each model. This will be relatively straightforward for models formed from the AL distribution structure and the GB2 structures, but not so trivial for the case of the PP model.
In the real data examples we consider below, we adopt an objective Bayesian perspective in which we consider relatively uninformative priors. This reflects our lack of prior knowledge for the model parameters likely ranges or magnitudes. For instance, the priors for parameters (coefficients) in mean, variance and skewness quantile regression functions are all selected as Gaussian:
(38) 
and for the shape parameters of the GB2 distribution are:
(39) 
Normal and gamma distributions are standard choices of priors for parameters with a real and positive support respectively, see discussions on possible choices in Denison et. al. (2002). In the case of the AL and GB2 models, these priors combined with the resulting likelihoods produce in each case standard and well defined posterior distributions.
In the case of the PP model one has to be careful to define the posterior support to ensure the resulting distribution is normalized and therefore a proper posterior density. To ensure this is the case one must impose constraints on the posterior support which can be uniquely characterized by the three sets of parameter space constraints , and , for coefficient vectors , and respectively, given by:
(40) 
Under these parameter space restrictions the resulting posterior for the PP model can be shown to be well defined as a proper density, see a derivation and proof in Theorem 1 of Cai (2010).
In Cai (2010) they consider an MCMC scheme for the resulting posteriors based on standard MetropolisHastings steps with rejection when the proposed parameter values fail to satisfy the posterior support constraints. In general this results in a very slowly mixing MCMC chain which will have very poor properties. We replace this idea with simple block Metropolis within Gibbs updates which allow for smaller moves in each component of the constrained posterior support making it more likely to satisfy the constraints and also simpler to design and tune the proposal for the MCMC scheme. This was a significant improvement compared to the approach proposed in Cai (2010). We implement this sampler in R. For the other Bayesian models from the AL and GB2 models, sampling from the intractable posterior distributions involved the Gibbs sampling algorithm (Smith and Roberts, 1993; Gilks et al., 1996) and MetropolisHastings algorithm (Hastings, 1970; Metropolis et al., 1953) are the most popular MCMC techniques. For readers who are less familiar with Bayesian computation techniques, we suggest the WinBUGS (Bayesian analysis Using Gibbs Sampling) package, see Spiegelhalter et al. (2004). The MCMC algorithms that are implemented for each model in WinBugs and R are available upon request.
In the Gibbs sampling scheme, a single Markov chain is run for 60,000 to 110,000 iterations, discarding the initial 10,000 iterations as the burnin period to ensure convergence of parameter estimates. Convergence is also carefully checked by the history and autocorrelation function (ACF) plots. The every 10th simulated values from the Gibbs sampler after the burnin period are sampled to mimic a random sample of size 5000 to 10,000 from the joint posterior distribution for posterior inference. Parameter estimates are given by the posterior means.
4 Quantile Prediction for Risk Measures, Risk Margin
As discussed in the introduction, the predicted reserves are typically performed in a claims reserving setting by predicting the mean reserve in each cell in the lower triangle . Other methods for reserving may involve the quantification of a risk measure based on the distribution of the predicted reserves, in place of the mean predicted reserve, such as VaR, Expected Shortfall (ES) or Spectral Risk Measures (SRM), see discussions in the tutorial review of Peters et.al. (2013). In addition, in order to quantify the uncertainty in a central measure for the predicted reserve, one may alternatively take the central measure of reserve and make a risk margin adjustment based on the distribution of the predicted reserves in the form of a quantile function.
When calculating any of these required measures for the resulting total outstanding reserves one requires to first obtain the predictive density, which under the Bayesian setting can be obtained for instance in one of the following two ways for each :

Full Predictive Posterior Distribution:
Here, all posterior parameter uncertainty is integrated out of the predictive distribution.

Conditional Predictive Posterior Distribution:
where the point estimator contains the information from the upper triangle. Examples of common estimators include the posterior mean or mode .
Using these predictive distributions, one may also be interested in quantities such as the distribution of the total outstanding claim given by the sum of the losses in the lower triangle according to the random variable which has distribution given under the full predictive posterior distribution according to convolution given by
(41) 
where, one convolves the distributions for the loss elements in the lower triangle with the convolution operator. One can then state several features about the tail behavior of the total loss distribution and also therefore of the high quantiles as , depending on the properties of the individual loss random variables in the sum. For instance, if one has loss distributions on then one can obtain the lower bound given by
(42) 
for some . Note, if at least one of the lower triangle losses is distributed according to a heavy tailed loss distribution, such as subexponential, regularly varying or long tailed loss distributions then one can find the precise value for . For instance if the total loss is maxsum equivalent, then , see definitions for regular variation, subexponential, long tailed and maxsum equivalence in Bingham et al. (1989) and in the context of insurance and quantile function approximations as discussed here, see the recent tutorial and references therein from Peters et al. (2013).
These conditional predictive distributions can be obtained for any model approximately by solving the integrals using the Markov chain Monte Carlo samples obtained from the posterior . Then, given a predictive distribution, one can then find quantile functions according to the following approaches:

Full Predictive Posterior Quantile Function: is given by which is the solution to the second order ordinary differential equation:
which is obtained by twice differentiating the following identity:
(43) The solution to this second order ordinary differential equation can often be found in the form of a power series, see discussions in Gyorgy and Shaw (2008).

Conditional Predictive Posterior Quantile Function:
(44) which is the most convenient choice that we recommend since the inverse of the predictive distribution in this case takes the closed form expressions for the particular model considered as detailed in Section 2.2.

Conditional Total Reserve Posterior Quantile Function: In many cases one is also interested in finding the quantile function of the distribution corresponding to the total reserve, which under conditional independence is given by where this is given by the quantile function of the distribution in Equation 41. In general finding the convolution and inverse of this convolved distribution must be done numerically. There are many basic results known about these quantities such as asymptotic results and bounds for different properties of light and heavy tailed random variables, independent or dependent, see a discussion in Kaas et al. (2000).
Light Tailed Runoff for Claims Process: In the case in which no loss cells in the claims triangle are heavy tailed, then in general one would need to approximate the tail quantile for the partial sum of all losses. In Kaas et al. (2000) they study partial sums of random variables with no assumption of independence or of identical marginal distributions. The only assumption is that the tails are not so heavy for each marginal, such that each marginal has finite mean. It will be useful to recall that for two random variables and , proceeds under convex ordering iff for all convex real functions with finite expectations one has
(45) Thus, two random variables X and Y with equal mean are convex ordered if their cdfs cross once.
Then one can show that in such cases for any sequence of loss distributions the following convex order relationship holds
(46) for , see derivations in Goovaerts et al. (2000). This result means that the total loss in the convex order sense, comprised of the most risky joint vector of losses with given marginals, has the comonotonous joint distribution. The components of which are maximally dependent since all components are nondecreasing functions of a common random variable .
Hence, we consider the following quantile function approximation for the total loss based on the most conservative estimate using the above bound, given by
(47) Note, in the case of heavy tailed losses this can be refined for large quantiles as follows.
Heavy Tailed Runoff For Claims Process: Alternatively, if additional features of the loss distributions in the lower triangle are known, such as these loss models contain at least one heavy tailed loss distribution, then one can bound the total quantile function result. This can be done conservatively by instead considering the fold convolution of the distribution, say which correspond to the loss distribution amongst all the lower trianglular loss elements with the dominant index of regular variation (that is, with the heaviest tails). In such cases it would be popular to utilize an asymptotic result for the quantile function of the sum, as the quantile level becomes large . For instance, one could use the first order or second order asymptotic results, see discussions in Peters et al. (2013) and Cruz et al. (2014). As an example, if the quantile regression was structured such that the distribution of the partial sum is regularly varying with index with conditionally i.i.d. with each taking positive support, then one can write the first order tail approximation which is asymptotically equivalent to the following
(48) see detailed tutorial in Peters et al. (2013). This would lead to the approximation of the required quantile asymptotically by the expression
(49)
5 Model Structure Analysis for Israel data
In this section we perform two core studies: The first involves isolating the structural components for the quantile regressions, in order to perform a study on the mean function and variance functions that are most suitable for an example of a representative claims reserving data set. This is therefore performed using the nonparametric and Bayesian formulations of the AL model with different assumptions on the mean and variance functions. The second involves isolating the distributional choices of the quantile regression, where we take the best fitting parametric model mean and variance function structures and use these to study distributional properties under the different quantile function choices.
The data set used throughout this section is interesting for such a benchmark exercise as it has been previously studied and its features are reasonably well known, see Chan et al. (2008) for more details on the Israel Data set. The data is available in Figure 18 in Appendix I and represents the paid out claim amounts for an Israel insurance company, covering periods from 1978 to 1995, containing 171 observations. For mathematical convenience, two zero claim amounts have been replaced with 0.01. Some general trends are observed in this data. Given an accident year, the claim development amounts generally increase between the first 4 to 6 development years then this increase is followed by a generally decreasing trend thereafter. The mean, median, variance and kurtosis of this data are 4459.7, 3,871, 12,059,232.6 and 0.4 respectively. The overall skewness is 0.58 and on a log scale is 2.67.
This data has been studied in Chan et al. (2008) using the generalized (GT) distribution expressed as scale mixtures of uniform which facilitates the Bayesian implementation. They adopt the ANOVA and ANCOVA mean structures to study the accident year and development year effects on the conditional mean functions but not on any quantile level. Moreover they also remark that the log transformed data become negatively skewed which the symmetric GT distribution fails to accommodate. Hence, they suggest to adopt some skewed error distributions to improve the model performance.
Our primary point of departure for these previous studies on this data is the conjecture that using a measure of average effects may not be appropriate for understanding loss reserves at higher quantiles. Higher quantile projection is critical in loss reserving, for reinsurance premium calculations and also in deriving the risk margin. In this section, we use all the models in Section 2 for quantile projection with an aim to provide a more comprehensive study on model performance with a wide range of distributions having different tails behavior and model structures for the quantile trends and heteroscedasticity in the accident and development years.
5.1 Analysis of Quantile Regression Models: Location and Scale
To investigate the model structures for location (mean) and scale (variance) functions, we consider two settings: the first class of models involves the parametric models using the AL distribution with either fixed (denoted by fix) or left to be estimated (denoted by est), the mean functions given by (26) to (28) and variance being constant (Models 0020) or given by (34) (Models 0323); the second class of models involves a set of nonparametric models which are also studied with mean function (28) and variance being constant or given by (34) (Models 30 and 33) using AL as a proxy distribution with fixed at different quantile levels.
For model comparison, deviance information criterion (DIC) is adopted, see Appendix III for details. Since, models with smaller are preferred to those with larger , then the results of the model comparisons provided in Table 1 show that among the parametric models, which incorporates an ANOVA model for both accident and development years in modelling both the mean and variance functions is the best fitting model according to . This show that the accident year and development year effects are both important in describing the dynamics of the mean and variance. Hence, these ANOVAtype mean and variance functions are applied to most of the subsequent analyses whenever possible. For the nonparametric models, with ANOVA variance provide better fit than with constant variance.
Models  Models  
Variance Constant  Variance Function  
195.41  255.21  315.02  0.85 (est)  272.82  334.74  396.66  0.93 (est)  
223.30  284.10  344.91  0.88 (est)  199.14  247.49  295.85  0.95 (est)  
50.94  120.17  189.40  0.81 (est)  20.81  24.91  70.63  0.75 (est)  
55.94  125.61  195.28  0.30 (fix)  37.06  38.34  113.74  0.30 (fix)  
73.10  152.26  231.43  0.50 (fix)  38.80  35.51  109.82  0.50 (fix)  
55.26  132.56  209.87  0.75 (fix)  17.33  53.40  124.12  0.75 (fix)  
44.86  116.38  187.91  0.95 (fix)  64.26  3.68  71.62  0.95 (fix)  
is the posterior mean deviance ; where is the posterior mean of 
Between parametric model and nonparametric models , the nonparametric models provide better model performance according to . These models correspond to the AL models with mean and variance functions and we study their performances for a range of fixed quantile levels as shown in Figure 5. This plot demonstrates the quantilequantile plot for the fitted models at different quantile levels, indicating appropriate fits from the specified model structures for a range of different quantile levels.
In addition, we investigate the trends of development year effects as depicted in Figure 6 which reports the fitted loss where is given by (28) and calculated using the conditional predictive posterior quantile function in (44) for the first accident year (). The quantile levels correspond to the shape parameter set to 0.3, 0.5, 0.75 and 0.95 respectively in AL distribution. The figure demonstrates that there is a clear requirement for a nonlinear trend in the development year covariate at all quantile levels which uniformly increases up until and subsequently decreases thereafter at all quantile levels. Furthermore, the trends of fitted loss at all quantile levels agree with this observed trend.
To conclude the benchmark analysis on model structure we also present for the best model with mean and variance functions the estimated model trends for all accident years, depicted in Figure 7 as five triangular heat maps. The heat maps each depict the fitted loss by accident and development years in the upper triangle at all five quantile levels, where the first row corresponds to that which was studied in Figure 6. All heat maps show a consistent trend across development years for all accident years and quantile levels with high levels of loss as indicated by light colours being around the fourth development year, particularly for lower accident years. With increasing quantile levels, the width of light colours for each accident year increases showing higher levels of fitted losses around the peak.
Although nonparametric models have lower values, Table 1 shows that parametric model actually provides comparable model fit according to s before model complexity penalty was applied. This is because parametric models with additional shape parameters are subject to heavier model complexity penalty. However it should be noted that parametric models provide better model fit in general over a range of models and quantile levels. In addition, the parametric models have a significant advantage that they will be more readily interpretable as well as directly usable when calculating risk margins and quantile based risk measures as long as the quantile functions are in closed form, as was discussed in Section 4. For the mean structure corresponding to model choice under parametric model we also studied different variance structures, in order to explore the different choices of variance functions under the AL distribution.
Models  

50.94  120.17  189.40  1015.71  0.80  0.02  
4.32  56.66  117.64  849.91  0.74  0.04  
6.63  54.29  101.95  755.66  0.68  0.19  
20.81  24.91  70.63  850.10  0.75  0.17 
Again, we confirm that amongst all models with AL distribution, which incorporates both accident and development year effects for the mean and variance demonstrates the best model fit according to . On the other hand, favors which adopts only development year effect for the variance. One possible reason might be that the payments made in different accident years are relatively stable compared to those across development years, and hence the development year effect dominates in the variance estimation.
5.2 Analysis of Quantile Regression Models: Quantile Distribution
In this section we analyze the different model choices from the distributional perspective. This is not directly trivial to achieve, since each model has different features that must be taken into consideration in the comparison. It is clear from previous studies that one should always utilize an ANOVAtype mean function with accident and development years effect (), or at a minimum incorporate a quadratic or basis function form for the development year effects such as . In the case of the GB2 and AL models we will therefore consider mean structures in . However, in the case of the PP model we will consider , since purely from a computational perspective it will be easier to implement an efficient MCMC sampler for compared to . The reason for this is due to the rejection stage in the MetropolisHastings acceptance probability where under the PP model the posterior constraint regions will be easier to satisfy with less model complexity. In terms of the variance functions, when working with the GB2 models, we will consider in which we do not specify variance functions as there is no variance parameter in the distribution to model the variance directly. The variance of the models are given by (19). Then in the case of the AL model we consider as well as and for the PP model we consider and .
Table 3 reports the results split according to models with constant, unspecified and dynamic variance functions. In the case of constant or unspecified variance, the best performing model is again the AL model, followed by the GG model. Among distributions in the GB2 family with positive support, GG provides the best model fit according to with model complexity penalty while GB2 model provides the best model prediction according to . Comparing s without model complexity penalty, GG and GB2 provide very similar model fit. Besides, it is clear that the PP model with only the basis function regression structure for the mean, given by a quadratic polynomial for the trend in the development year covariate, and a constant variance was not sufficient to capture all the features required. We believe that this is largely due to the fact that such a model is more suitable for heavy tailed runoff in the claims development and the Israel data clearly does not display such a feature. It is therefore expected that such a heavy tailed quantile regression model will not perform as well for this data. When the variance is also modeled, the AL model is clearly significantly better than all the other models considered, again making with AL model optimal compared to all choices. Since, the PP model is shown to be not suitable for this data, we will consider analyses going forward with only the GB2 and AL models.
Models  
Quantile Regression: Unspecified Variance Function  
Gamma  3064.50  3028.93  2993.36  537.82  1  1.87    
GG  2707.42  2932.97  3158.52  582.78  33.22  0.08    
GB2  3002.82  2964.60  2926.37  526.65  7.94  1.78  0.17   
Quantile Regression: Constant Variance Function  
PP  3272.14  1021.71  1230.01  1132.12        14.15 
AL  50.94  120.17  189.40  1015.71    0.80    0.02 
Quantile Regression: NonConstant Variance Function  
PP  1502.19  1906.49  2310.98  923.00        9.10 
AL  20.81  24.91  70.63  850.10    0.75    0.17 
Next, we compare the standardized residuals for the GB2, Gamma and GG models under structure in