Quantifying the Model Risk Inherent in the Calibration and Recalibration of Option Pricing Models
Abstract.
We focus on two particular aspects of model risk: the inability of a chosen model to fit observed market prices at a given point in time (calibration error) and the model risk due to recalibration of model parameters (in contradiction to the model assumptions). In this context, we follow the approach of \citeasnoun{glasserman2014robust} and use relative entropy as a premetric in order to quantify these two sources of model risk in a common framework, and consider the trade–offs between them when choosing a model and the frequency with which to recalibrate to the market. We illustrate this approach by applying it to the models of \citeasnoun{OZ:Bla&Sch:73} and \citeasnoun{OZ:Heston:93}, using option data for Apple (AAPL) and Google (GOOG). We find that recalibrating a model more frequently simply shifts model risk from one type to another, without any substantial reduction of aggregate model risk. Furthermore, moving to a more complicated stochastic model is seen to be counterproductive if one requires a high degree of robustness, for example as quantified by a 99% quantile of aggregate model risk.
Yu Feng (Yu.Feng5@student.uts.edu.au), Ralph Rudd, Christopher Baker, Qaphela Mashalaba and Melusi Mavuso; corresponding author: Erik.Schlogl@uts.edu.au
1. Introduction
The renowned statistician George E. P. Box wrote that “essentially, all models are wrong, but some are useful.”\footnote{See \citeasnoun{OZ:Box:87}.} This is certainly true in finance, where many models and techniques that have been extensively empirically invalidated remain in widespread use, not just in academia, but also (perhaps especially) among practitioners. At times, the way models are used directly contradicts the model assumptions: as observed market prices change, parameters in option pricing models, which are assumed to be time–invariant, are recalibrated, often on a daily basis. Incorrect models, and model misuse, represent an increasingly recognised source of risk, called “model risk.” As a paper by the Board of Governors of the Federal Reserve System put it in 2011,\footnote{See \citeasnoun{fedgov2011}.} “The use of models invariably presents model risk, which is the potential for adverse consequences from decisions based on incorrect or misused model outputs and reports.”
In broad terms, one could identify four general classes of model risk inherent to the way mathematical models are used in finance, for example in (but not limited to) option pricing applications:

Parameter uncertainty (and sensitivity to parameters): let’s call this “Type 0” model risk for short. If model parameters need to be statistically estimated, they will only be known up to some level of statistical confidence, and this parameter uncertainty induces uncertainty about the correctness of the model outputs.\footnote{Examples of where this type of risk is considered explicitly in the literature include \citeasnoun{CDO:Loeffler:03}, \citeasnoun{Ban&Sch:13} and \citeasnoun{Ker&Ber&Sch:10}.}

Inability to fit a model to a full set of simultaneous market observations: this is “calibration error,” and we call it “Type 1” model risk for short. To the extent that a model cannot match observed prices on a given day, single–day (a.k.a. “cross–sectional”) market data already contradict the model assumptions. The classical example of this is the Black/Scholes implied volatility smile.

Change in parameters due to recalibration — let’s call this “Type 2” model risk for short. Once one moves from one day to the next, this aspect of model risk becomes apparent: In order to again fit the market as closely as possible, it is common practice in the industry to recalibrate models. This recalibration results in model parameters (which the models assume to be fixed) changing from day to day, contradicting the model assumptions.

The “true” dynamics of state variables don’t match model dynamics.\footnote{This type of model risk is considered for example in \citeasnoun{Ker&Ber&Sch:10}, who also relate this to identification risk, which they define as risk which “arises when observationally indistinguishable models have different consequences for capital reserves.”} Let’s call this violation of model assumptions “Type 3” model risk.\footnote{\citeasnoun{Bou&Dan&Kou&Mai:14} present a method for making value–at–risk more robust with respect to this source of model risk by “learning” from the results of model backtesting.} The classical example of this is the econometric rejection of the hypothesis that asset prices follow geometric Brownian motion, thus invalidating the key assumption in the seminal model of \citeasnoun{OZ:Bla&Sch:73}. This type of model risk would impact in particular the effectiveness of hedging strategies based on a model.\footnote{\citeasnoun{Det&Pac:16} take the approach of measuring model risk based on the residual profit/loss from hedging in a misspecified model.}
Note that there is a gradual transition between the different types of model risk, and depending on one’s modelling choices, to a certain extent one can trade off one type of model risk against another. For example,

Less stringent requirements of an exact fit to market observations (Type 1) allow less frequent recalibration (Type 2).

Instead of different model dynamics (Type 3), one could consider a parameterised family of models (Type 2).

Regime–switching models “legalise” changes in parameters, so Type 2 becomes more like Type 3.

Adding parameters shifts model risk from Type 1 to Type 2 (or, to a certain extent, to Type 0).

Adding state variables shifts model risk from Type 2 to Type 3.
\citeasnoun{glasserman2014robust} propose relative entropy as a consistent premetric by which to measure model risk from different sources.\footnote{Instead of using a relative entropy premetric, one could approach quantifying model risk in terms of optimal–transport distance, using for example Wasserstein distance, which most recently has become popular for this purpose (see \citeasnoun{Bar&Dra&Tan:18}, \citeasnoun{Bla&Che&Zho:18} and \citeasnoun{Fen&Sch:18}). In the present paper, we follow the more established approach using relative entropy, which has its roots in the seminal work of Hansen and Sargent (see e.g. \citeasnoun{Han&Sar:06}).} What matters in the application of mathematical models in finance are the probability distributions which the models imply.\footnote{\citeasnoun{Bre&Csi:16} call this distribution model risk.} These distributions may be taken either under a “risk–neutral” probability measure (for applications to relative pricing of financial instruments) or under the “physical” (a.k.a. “real–world”) probability measure (for risk management applications such as the calculation of expected shortfall). Each type of model risk manifests itself as some form of ambiguity about the “true” probability measures which should be used for these purposes. Quantifying the different types of model risk in a unified setting, using a premetric for the divergence between distributions (such as relative entropy), therefore allows one to make an informed choice about the trade–offs between different sources of model risk. \citeasnoun{glasserman2014robust} postulate a “relative entropy budget” defining a set of models sufficiently close (in the sense of relative entropy) to a nominal reference model to be considered in an evaluation of model risk expressed as a “worst–case” expectation, i.e., a worst–case price or a worst–case risk measure. However, they say little as to how one would typically obtain a specific number for this “relative entropy budget.”
In a sense, we invert this problem by noting that higher relative entropy between model distributions indicates higher model risk, and propose a method to jointly evaluate model risk of two types, based on how this model risk manifests itself when option pricing models are calibrated and recalibrated to liquid market instruments.
We focus on the model risk inherent in the calibration and recalibration (i.e., in the above terminology, Types 1 and 2) of option pricing models. To illustrate our approach we consider the models of \citeasnoun{OZ:Bla&Sch:73} and \citeasnoun{OZ:Heston:93}, thus comparing the most classical option pricing model with its popular extension incorporating stochastic volatility. Clearly, if (as is often the case in practice) one focuses solely on calibration error, \citeasnoun{OZ:Heston:93} will always be preferred to \citeasnoun{OZ:Bla&Sch:73}, and more frequent recalibration will be preferred to less. We quantify calibration and recalibration risk in both models applied to equity option data, and also explore the trade–off between these two types of model risk. We find that there is no longer a trivial answer to the question of which model and which recalibration frequency should be preferred when these two sources of model risk are considered in a unified framework.
The rest of the paper is organised as follows. Section 2 introduces a framework for the joint evaluation of model risk due to calibration error and due to model recalibration. The numerical implementation of the method is discussed in Section 3. Section 4 presents the results obtained by applying this method to option price data, and Section 5 concludes.
2. Calibration error, model risk due to recalibration, and treatment of latent state variables
As noted above, model risk is reflected in the ambiguity with regard to the “correct” probability distribution to use for relative pricing or risk assessment. Following \citeasnoun{glasserman2014robust}, we quantify this ambiguity using the divergence between probability measures. In the present context, a divergence measure is a function $D(\cdot\,\|\,\cdot)$ on $\mathcal{P}\times\mathcal{P}$ satisfying
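\[ D(Q\,\|\,P) \geq 0 \quad \text{for all } P, Q \in \mathcal{P}, \tag{1} \]
\[ D(Q\,\|\,P) = 0 \iff Q = P, \tag{2} \]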
where $\mathcal{P}$ is the space of all probability measures with a common support. More specifically, most divergence measures belong to the class of $f$–divergences, which give the divergence between two equivalent measures $Q$ and $P$ as:\footnote{See e.g. \citeasnoun{ali1966general}, \citeasnoun{csisz1967information} or \citeasnoun{ahmadi2012entropic}.}
\[ D_f(Q\,\|\,P) = \mathbb{E}^{P}\!\left[f\!\left(\frac{dQ}{dP}\right)\right], \tag{3} \]
where $f$ is a convex function of the Radon–Nikodym derivative $dQ/dP$ satisfying $f(1)=0$. Kullback–Leibler divergence (a.k.a. relative entropy) is the most common $f$–divergence. It corresponds to $f(x) = x\ln x$, so that $D_{KL}(Q\,\|\,P) = \mathbb{E}^{Q}[\ln(dQ/dP)]$. In principle, the methodology of this paper applies to all types of statistical distance, though in the empirical study we adopt the Kullback–Leibler divergence due to its simplicity and widespread use.
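The following Python code is a minimal sketch of definition (3). It assumes, for illustration only, two distributions on a common finite support. The code evaluates an $f$–divergence through the Radon–Nikodym derivative $dQ/dP$ and recovers relative entropy with $f(x) = x\ln x$. The three-state probabilities are hypothetical.

```python
import numpy as np

def f_divergence(q, p, f):
    """D_f(Q || P) = E^P[f(dQ/dP)] for discrete distributions on a common finite support."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    ratio = q / p                      # Radon-Nikodym derivative dQ/dP, atom by atom
    return float(np.sum(p * f(ratio)))

def kl_generator(x):
    """f(x) = x ln x (with 0 ln 0 = 0), which turns D_f into relative entropy."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0, x, 1.0)     # avoid log(0); such atoms contribute 0
    return np.where(x > 0, x * np.log(safe), 0.0)

# Hypothetical three-state example: P plays the role of the model, Q of the reference measure.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])
kl_qp = f_divergence(q, p, kl_generator)   # equals sum(q * ln(q / p))
kl_pp = f_divergence(p, p, kl_generator)   # property (2): zero when Q = P
```

Other members of the class follow by swapping the generator. For example, $f(x) = (x-1)^2$ gives the $\chi^2$ divergence.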
If we wish to quantify calibration error (Type 1 model risk) in this fashion, then in equations (1)–(3) the probability measure $P$ corresponds to the calibrated model and thus is parametric in some form. The probability measure $Q$, on the other hand, serves as a reference measure exactly matching observed market prices at a given point in time, unrestricted by the assumptions of the model under consideration. On calibrating an option pricing model, we may regard $Q$ as some nonparametric risk–neutral measure that explains the market in full, assuming absence of arbitrage. In practice, however, $Q$ is not unique, as the market is usually incomplete. We therefore denote the space of all probability measures that explain the market in full by $\mathcal{Q}$.
We further denote by $\mathcal{P}_\Theta = \{P_\theta : \theta\in\Theta\}$ the space of probability measures given by all possible choices of parameter values $\theta$ for the target model. The new calibration methodology proposed here aims to minimise the calibration error, as quantified by the divergence between the two measures $Q$ and $P_\theta$ taken from their respective spaces, i.e.
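\[ (\theta^*, Q^*) = \operatorname*{arg\,min}_{\theta\in\Theta,\; Q\in\mathcal{Q}} D(Q\,\|\,P_\theta). \tag{4} \]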
That is, the new approach jointly calibrates a model measure $P_{\theta^*}$ (i.e., a set of model parameters $\theta^*$) and a nonparametric perfect fit $Q^*$ to the market (at a given point in time). It does so in a fashion which minimises the calibration error expressed by
\[ D(Q^*\,\|\,P_{\theta^*}). \tag{5} \]
This is not an end in itself. It is required in order to compare model risk due to calibration error and model risk due to recalibration (as specified below) in a unified framework.
Classical approaches to model calibration, such as minimising the mean–squared error between model and market prices for options, would be inappropriate in this context, as they would lead to unnecessarily high model risk quantities. Here, it is the choice of divergence measure which informs the calibration procedure. The result is a pair of probability measures $(P_{\theta^*}, Q^*)$: the first corresponds to the calibrated model, while the second provides a consistent reference measure fitting the market exactly.
To quantify the model risk due to recalibration, consider the more specific case where:

- the model is Markovian in a vector of observable state variables $S_t$;
- the model is characterised by a vector of model parameters $\theta$;
- market prices are given for European options of a single maturity $T$.\footnote{This last assumption of a single maturity avoids the need to constrain the choice of $Q$ to ensure the absence of calendar spread arbitrage between nonparametric risk–neutral measures for different time horizons; parametric models typically ensure this by construction. If we appropriately constrain $\mathcal{Q}$, this assumption can be lifted.}

Suppose we solved (4) yesterday (at time $t-1$) to obtain $\theta_{t-1}$. To be as explicit as possible, denote the resulting model measure by
\[ P_{\theta_{t-1}}\big(\,\cdot\,\big|\,S_{t-1}\big). \tag{6} \]
That is, this is a (conditional) probability measure defined on all $\mathcal{F}_T$–measurable events. The conditioning on the time $t-1$ state variables $S_{t-1}$ expresses that their realisations are known at the time these probabilities are evaluated. The subscript $\theta_{t-1}$ expresses that these probabilities are evaluated in a model with parameters calibrated by solving (4) at time $t-1$. Furthermore, denote the nonparametric measure resulting from solving (4) at time $t-1$ by $Q_{t-1}$.
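Now suppose we recalibrate today (at time $t$) by solving (4), with $\mathcal{Q}$ replaced by the set $\mathcal{Q}_t$ of measures fitting today's market prices exactly. We then obtain $\theta_t$ and $Q_t$, and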
\[ P_{\theta_t}\big(\,\cdot\,\big|\,S_t\big). \tag{7} \]
We can then define the model risk quantity due to recalibration as
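\[ \mathcal{R}_t := D\Big(P_{\theta_t}\big(\,\cdot\,|\,S_t\big)\,\Big\|\,P_{\theta_{t-1}}\big(\,\cdot\,|\,S_t\big)\Big), \tag{8} \]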
which is the divergence between the (conditional) probability measures evaluated at time $t$, where one measure is based on the recalibrated parameters and the other on the previously calibrated parameters. It thus expresses, in terms of divergence, the inconsistency with the model assumptions that arises because we are going “outside of the model” to change parameters in recalibration. The aggregate of calibration error and model risk due to recalibration is then
\[ \mathcal{A}_t := D\Big(Q_t\,\Big\|\,P_{\theta_{t-1}}\big(\,\cdot\,|\,S_t\big)\Big), \tag{9} \]
i.e., the divergence between two measures:

- the nonparametric probability measure $Q_t$ obtained by solving (4) at time $t$;
- the nonrecalibrated parametric probability measure, consisting of probabilities conditional on the state at time $t$, but based on the model parameters $\theta_{t-1}$ obtained by solving (4) at time $t-1$.

However, $Q_t$ is chosen to minimise the divergence to the recalibrated distribution $P_{\theta_t}(\cdot\,|\,S_t)$. This arguably overstates the divergence to the nonrecalibrated (i.e., model–consistent) distribution, and therefore overstates the aggregate model risk $\mathcal{A}_t$.
Alternatively, we may choose as the nonparametric reference distribution at time $t$ the measure in $\mathcal{Q}_t$ closest to the nonrecalibrated model:
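\[ \tilde Q_t := \operatorname*{arg\,min}_{Q\in\mathcal{Q}_t} D\Big(Q\,\Big\|\,P_{\theta_{t-1}}\big(\,\cdot\,|\,S_t\big)\Big), \tag{10} \]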
resulting in an aggregate model risk no higher than $\mathcal{A}_t$ (since $Q_t \in \mathcal{Q}_t$ as well):
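\[ \tilde{\mathcal{A}}_t := D\Big(\tilde Q_t\,\Big\|\,P_{\theta_{t-1}}\big(\,\cdot\,|\,S_t\big)\Big) \leq \mathcal{A}_t. \tag{11} \]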
Note that $\theta_t$ is still obtained by solving (4). Both $Q_t$ and $\tilde Q_t$ represent nonparametric probability measures fitting observed market prices exactly, so $P_{\theta_t}(\cdot\,|\,S_t)$ remains the best available parametric fit to the market at time $t$. The measure $\tilde Q_t$ is only used to determine the minimum divergence of the nonrecalibrated model to a measure giving a perfect fit.
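The inequality $\tilde{\mathcal{A}}_t \leq \mathcal{A}_t$ can be illustrated numerically. The sketch below is a simplification under the following assumptions, none of which come from the empirical study:

- a single at-the-money call with a hypothetical market price;
- zero interest rates;
- Black/Scholes terminal distributions discretised on a grid;
- hypothetical volatilities of 0.30 (previous calibration) and 0.35 (recalibration).

With one exact price constraint, the relative–entropy–minimising measure is an exponential tilt of the model measure. The single Lagrange multiplier can therefore be found by a one-dimensional root search.

```python
import numpy as np
from scipy.optimize import brentq

# Toy single-option setting (assumptions): r = 0, S_t = 1, one at-the-money call.
tau = 0.25
grid = np.linspace(0.3, 2.5, 881)          # grid for S_T / S_t
h = np.maximum(grid - 1.0, 0.0)            # discounted call payoff
market_price = 0.07                        # hypothetical observed price at time t

def lognormal_pmf(sigma):
    """Black/Scholes terminal law of S_T / S_t, discretised on the grid."""
    z = (np.log(grid) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
    dens = np.exp(-0.5 * z**2) / grid
    return dens / dens.sum()

def kl(q, p):
    """Relative entropy D(Q || P) = E^Q[ln(dQ/dP)]."""
    return float(np.sum(q * np.log(q / p)))

def closest_market_measure(p):
    """argmin of D(Q || P) over Q with E^Q[h] = market_price, by exponential tilting of P."""
    def tilted(lam):
        w = p * np.exp(lam * h)
        return w / w.sum()
    lam = brentq(lambda l: tilted(l) @ h - market_price, -50.0, 50.0)
    return tilted(lam)

p_prev = lognormal_pmf(0.30)               # P_{theta_{t-1}}(. | S_t): not recalibrated
p_t = lognormal_pmf(0.35)                  # P_{theta_t}(. | S_t): hypothetical recalibration
q_t = closest_market_measure(p_t)          # reference measure attached to the recalibrated model
q_tilde = closest_market_measure(p_prev)   # reference measure as in (10)
A = kl(q_t, p_prev)                        # aggregate model risk as in (9)
A_tilde = kl(q_tilde, p_prev)              # aggregate model risk as in (11)
```

Both reference measures reproduce the market price, but $\tilde Q_t$ is by construction the one closest to the nonrecalibrated model. This is why `A_tilde` cannot exceed `A`.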
Figure 1(a) gives a heuristic schematic of these measures.\footnote{Note that these graphs are for the purpose of heuristic illustration only; in particular, we are not requiring that the two sets of probability measures are convex.} Point A represents $P_{\theta_t}(\cdot\,|\,S_t)$, the parametric probability measure “closest” to the set $\mathcal{Q}_t$ of nonparametric probability measures fitting the market exactly, while point C represents $Q_t$. If we do not recalibrate at time $t$, we end up with the parametric probability measure $P_{\theta_{t-1}}(\cdot\,|\,S_t)$ (point B). The “closest” nonparametric probability measure to B fitting the market exactly is $\tilde Q_t$ (point D).
In the case of Kullback–Leibler divergence, suppose that Type 1 (calibration error) and Type 2 (recalibration) model risk involve independent Radon–Nikodym derivatives. Then, in the first case considered above, aggregate model risk equals the sum of the two components. In fact, the Radon–Nikodym derivatives, as random variables, play the key role in evaluating the two types of model risk. At the time $t$ when the model is recalibrated, we again consider the optimisation (4), with $\mathcal{Q}_t$ in place of $\mathcal{Q}$ to reflect the change in observed market prices. We then have the following Radon–Nikodym derivatives:
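\begin{align}
&\frac{dQ_t}{dP_{\theta_t}(\,\cdot\,|\,S_t)} && \text{(calibration error)}, \tag{12}\\
&\frac{dP_{\theta_t}(\,\cdot\,|\,S_t)}{dP_{\theta_{t-1}}(\,\cdot\,|\,S_t)} && \text{(recalibration)}, \tag{13}\\
&\frac{dQ_t}{dP_{\theta_{t-1}}(\,\cdot\,|\,S_t)} = \frac{dQ_t}{dP_{\theta_t}(\,\cdot\,|\,S_t)}\,\frac{dP_{\theta_t}(\,\cdot\,|\,S_t)}{dP_{\theta_{t-1}}(\,\cdot\,|\,S_t)} && \text{(aggregate)}. \tag{14}
\end{align}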
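We introduce the following notation:

- $X := dQ_t/dP_{\theta_t}(\cdot\,|\,S_t)$;
- $Y := dP_{\theta_t}(\cdot\,|\,S_t)/dP_{\theta_{t-1}}(\cdot\,|\,S_t)$;
- $\mathbb{E}_{\theta_t}$ and $\operatorname{Cov}_{\theta_t}$ for expectation and covariance under $P_{\theta_t}(\cdot\,|\,S_t)$, so that $\mathbb{E}_{\theta_t}[X]=1$.

The aggregate risk $\mathcal{A}_t$ in (9) can then be expressed in terms of $X$ and $Y$ as: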
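\begin{align}
\mathcal{A}_t &= \mathbb{E}^{Q_t}[\ln(XY)] = \mathbb{E}_{\theta_t}[X\ln X] + \mathbb{E}_{\theta_t}[X\ln Y] \tag{15}\\
&= D\big(Q_t\,\|\,P_{\theta_t}(\,\cdot\,|\,S_t)\big) + \mathbb{E}_{\theta_t}[\ln Y] + \mathbb{E}_{\theta_t}[(X-1)\ln Y] \tag{16}\\
&= D\big(Q_t\,\|\,P_{\theta_t}(\,\cdot\,|\,S_t)\big) + \mathcal{R}_t + \operatorname{Cov}_{\theta_t}(X,\ln Y). \tag{17}
\end{align}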
If $X$ and $Y$ are independent under $P_{\theta_t}(\cdot\,|\,S_t)$, then $\operatorname{Cov}_{\theta_t}(X,\ln Y)=0$, and the total model risk is equal to the sum of the calibration risk and the recalibration risk. Surprisingly, in our empirical exploration below we found that this equality holds quite well for the Black/Scholes model. However, it typically does not hold in the Heston model, suggesting substantial dependence (of Radon–Nikodym derivatives) between the calibration error and model risk due to recalibration.
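The decomposition (15)–(17) holds exactly on any finite support, which makes it easy to check numerically. The Python sketch below uses hypothetical discrete distributions standing in for $Q_t$, $P_{\theta_t}(\cdot\,|\,S_t)$ and $P_{\theta_{t-1}}(\cdot\,|\,S_t)$, and computes the three components. In the first example the support consists of pairs $(i, j)$, with $X$ depending only on $i$ and $Y$ only on $j$. The two are therefore independent under $P_{\theta_t}$, and the residual vanishes.

```python
import numpy as np

def kl(q, p):
    """Relative entropy D(Q || P) = E^Q[ln(dQ/dP)] on a finite support (all masses > 0)."""
    return float(np.sum(q * np.log(q / p)))

def decompose(q_t, p_t, p_prev):
    """Aggregate risk (9) split into calibration error, recalibration risk (8) and residual."""
    X = q_t / p_t          # dQ_t / dP_{theta_t}
    Y = p_t / p_prev       # dP_{theta_t} / dP_{theta_{t-1}}
    aggregate = kl(q_t, p_prev)
    calibration = kl(q_t, p_t)
    recalibration = kl(p_t, p_prev)
    residual = float(np.sum(p_t * (X - 1.0) * np.log(Y)))   # Cov of X and ln Y under P_{theta_t}
    return aggregate, calibration, recalibration, residual

# Independent case: support of pairs (i, j); X depends on i only, Y on j only.
a, b = np.array([0.3, 0.7]), np.array([0.4, 0.6])
c, q1 = np.array([0.5, 0.5]), np.array([0.2, 0.8])
p_t = np.outer(a, b).ravel()
p_prev = np.outer(a, c).ravel()
q_t = np.outer(q1, b).ravel()
A, C, R, res = decompose(q_t, p_t, p_prev)

# Dependent case: the identity still holds, but the residual is non-zero.
p_prev2 = np.full(4, 0.25)
p_t2 = np.array([0.1, 0.2, 0.3, 0.4])
q_t2 = np.array([0.05, 0.15, 0.3, 0.5])
A2, C2, R2, res2 = decompose(q_t2, p_t2, p_prev2)
```

In the dependent example, $X$ and $\ln Y$ move together, which gives a positive residual. In that case the sum of calibration error and recalibration risk understates the aggregate.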
We also consider models which involve one or more latent state variables. An example is the class of stochastic volatility models, where the volatility is taken as a latent state variable rather than a model parameter. In the empirical examples below, we specifically consider the model of \citeasnoun{OZ:Heston:93}, which falls into this category. With a single stochastic volatility state variable, a model specified by a given set of parameters forms a one–dimensional manifold of measures (Fig. 1(b)), indexed by the possible realisations of the state variable. In the Black/Scholes world, by contrast, it is a single point (Fig. 1(a)).
Thus, the model which we are now considering is Markovian in a vector of state variables $(S_t, V_t)$, where the state variables $S_t$ are observable and the state variables $V_t$ are latent (unobservable). The initial calibration problem (4) then becomes
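\[ (\theta_t, v_t, Q_t) = \operatorname*{arg\,min}_{\theta\in\Theta,\; v\in\mathcal{V},\; Q\in\mathcal{Q}_t} D\Big(Q\,\Big\|\,P_\theta\big(\,\cdot\,|\,S_t, V_t = v\big)\Big), \tag{18} \]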
where $\mathcal{V}$ and $\Theta$ are the sets of legitimate values of the latent state variables and of the parameters, respectively. Here $\theta_t$ is the set of model parameters calibrated to the market, and $v_t$ is the best estimate of the latent state variables under the calibrated model.\footnote{This effectively treats the latent (unobserved) state variable as an additional parameter to be calibrated. Its recalibration, however, does not contribute to (Type 2) model risk due to recalibration, because it is consistent with the model assumptions for this latent state variable to evolve stochastically. This does shift Type 2 model risk to Type 3, the risk that the state variable dynamics are not (econometrically) consistent with the dynamics assumed in the model. However, in the present paper we deliberately set aside Type 3 model risk for the purposes of our analysis, leaving the integration of all four types of model risk for future research.} The notation in (6) is amended to
\[ P_{\theta_{t-1}}\big(\,\cdot\,\big|\,S_{t-1}, V_{t-1} = v_{t-1}\big). \tag{19} \]
At time $t$, we have for the calibration error
\[ D\Big(Q_t\,\Big\|\,P_{\theta_t}\big(\,\cdot\,|\,S_t, V_t = v_t\big)\Big). \tag{20} \]
The model risk due to recalibration is
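\[ \mathcal{R}_t := D\Big(P_{\theta_t}\big(\,\cdot\,|\,S_t, V_t = v_t\big)\,\Big\|\,P_{\theta_{t-1}}\big(\,\cdot\,|\,S_t, V_t = v_t\big)\Big). \tag{21} \]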
The aggregate model risk, using $Q_t$ and $v_t$ obtained by solving (18) at time $t$, is
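\[ \mathcal{A}_t := D\Big(Q_t\,\Big\|\,P_{\theta_{t-1}}\big(\,\cdot\,|\,S_t, V_t = v_t\big)\Big), \tag{22} \]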
or alternatively, using $\tilde v_t$ and $\tilde Q_t$ determined analogously to (10), i.e.,
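\[ (\tilde v_t, \tilde Q_t) = \operatorname*{arg\,min}_{v\in\mathcal{V},\; Q\in\mathcal{Q}_t} D\Big(Q\,\Big\|\,P_{\theta_{t-1}}\big(\,\cdot\,|\,S_t, V_t = v\big)\Big), \tag{23} \]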
which results in
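\[ \tilde{\mathcal{A}}_t := D\Big(\tilde Q_t\,\Big\|\,P_{\theta_{t-1}}\big(\,\cdot\,|\,S_t, V_t = \tilde v_t\big)\Big). \tag{24} \]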
We then have the following Radon–Nikodym derivatives:
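\begin{align}
&\frac{dQ_t}{dP_{\theta_t}(\,\cdot\,|\,S_t, V_t = v_t)} && \text{(calibration error)}, \tag{25}\\
&\frac{dP_{\theta_t}(\,\cdot\,|\,S_t, V_t = v_t)}{dP_{\theta_{t-1}}(\,\cdot\,|\,S_t, V_t = v_t)} && \text{(recalibration)}, \tag{26}\\
&\frac{dQ_t}{dP_{\theta_{t-1}}(\,\cdot\,|\,S_t, V_t = v_t)} && \text{(aggregate)}. \tag{27}
\end{align}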
Note that the key difference between (12)–(14) and (25)–(27) is that the change in the latent state variables from $v_{t-1}$ to $v_t$ does not contribute to the model risk quantities, because it is permitted by the model assumptions. In (4) and (18), we deliberately prioritise the minimisation of calibration error. This is congruent with the (often exclusive) focus of practitioners on calibration error, with little or no regard to model risk due to recalibration. If desired, one could reformulate this approach to prioritise the minimisation of aggregate model risk, or of model risk due to recalibration.
3. Numerical implementation
In this section, we outline the numerical scheme for solving the minimisation problems that arise when taking into account calibration error and model risk due to recalibration in the manner described in the previous section. These include problems of the type (4), which involve the optimal choice of two probability measures. In this case, an iterative process is required, optimising the two probability measures $P_\theta$ and $Q$ in turn until convergence, in the following manner:
1) Produce $P_\theta$ from a parametric model based on an initial guess of the model parameters $\theta$ (and latent state variables, where required).

2) Solve for $Q$ via Lagrange multipliers for the constrained problem that minimises $D(Q\,\|\,P_\theta)$ over $Q\in\mathcal{Q}$.

3) Holding $Q$ fixed, solve for the model parameters $\theta$ for which $P_\theta$ minimises $D(Q\,\|\,P_\theta)$.

4) Iterate Steps 2 and 3 until convergence.
In Step 1, the initial guess may be obtained in several different ways. A common way is to minimise the mean–squared error between model and market option prices at all available strikes. We opted for the Broyden/Fletcher/Goldfarb/Shanno (BFGS) algorithm for conducting this initial calibration of the model parameters and (where required) latent state variables.
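A minimal sketch of such an initial least–squares calibration for the Black/Scholes model is given below, using SciPy's BFGS implementation. The market prices are synthetic, generated with $\sigma = 0.3$. Optimising over $\ln\sigma$ rather than $\sigma$ is our own choice, made to keep the volatility positive. Neither choice is taken from the empirical study.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def bs_call(S, K, T, r, sigma):
    """Black/Scholes price of a European call."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def initial_guess(S, K, T, r, market_prices, sigma0=0.2):
    """Step 1: mean-squared-error fit of sigma with BFGS, optimising over log(sigma)."""
    def mse(x):
        return float(np.mean((bs_call(S, K, T, r, np.exp(x[0])) - market_prices) ** 2))
    res = minimize(mse, x0=[np.log(sigma0)], method="BFGS")
    return float(np.exp(res.x[0]))

S, T, r = 100.0, 0.5, 0.02
K = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
market = bs_call(S, K, T, r, 0.3)      # synthetic prices generated with sigma = 0.3
sigma_hat = initial_guess(S, K, T, r, market)
```

For the Heston model, the same pattern applies with a Heston pricer and a parameter vector in place of the single volatility.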
In Step 2, we solve the following constrained minimisation problem using Lagrange multipliers:
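\begin{align}
&\min_{Q}\; D(Q\,\|\,P_\theta) \tag{28}\\
&\text{subject to}\quad \mathbf c^{b} \leq \mathbb{E}^{Q}[\mathbf h] \leq \mathbf c^{a}. \tag{29}
\end{align}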
Note that here we specify the constraints in the form of expectations under the measure $Q$; these expectations are the model prices of our calibration instruments under the nonparametric reference distribution $Q$. The quantities in (29) are defined as follows:

- $\mathbf c^b$ is the vector of bid prices;
- $\mathbf c^a$ is the vector of ask prices;
- $\mathbf h$ is the vector of discounted option payoffs.

Thus (29) is a “stack” of inequality constraints representing observed market prices. For generality, we “relax” each equality constraint into two inequality constraints in order to account for the bid–ask spread of each option traded in the market. In a simplified scenario where exact option prices are given, we may set $\mathbf c^b = \mathbf c^a$.

We introduce vectors of Lagrange multipliers $\boldsymbol\lambda^b \geq \mathbf 0$ for the bid constraints and $\boldsymbol\lambda^a \leq \mathbf 0$ for the ask constraints. This converts the constrained problem into the dual problem
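\[ \max_{\boldsymbol\lambda^b \geq \mathbf 0,\; \boldsymbol\lambda^a \leq \mathbf 0}\; \min_{Q}\; \Big\{ D(Q\,\|\,P_\theta) - \boldsymbol\lambda^{b\top}\big(\mathbb{E}^{Q}[\mathbf h]-\mathbf c^{b}\big) - \boldsymbol\lambda^{a\top}\big(\mathbb{E}^{Q}[\mathbf h]-\mathbf c^{a}\big) \Big\}. \tag{30} \]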
In the case of Kullback–Leibler divergence, solving the inner problem gives the probability density function $q$ of $Q$ in terms of the density $p_\theta$ of $P_\theta$,
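\[ q(y) = \frac{p_\theta(y)\,\exp\big((\boldsymbol\lambda^b+\boldsymbol\lambda^a)^\top\mathbf h(y)\big)}{\mathbb{E}^{P_\theta}\big[\exp\big((\boldsymbol\lambda^b+\boldsymbol\lambda^a)^\top\mathbf h\big)\big]}. \tag{31} \]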
Substituting (31) into (30), we get a maximisation problem,
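\[ \max_{\boldsymbol\lambda^b\geq\mathbf 0,\;\boldsymbol\lambda^a\leq\mathbf 0}\; \Big\{ -\ln\mathbb{E}^{P_\theta}\big[e^{(\boldsymbol\lambda^b+\boldsymbol\lambda^a)^\top\mathbf h}\big] + (\boldsymbol\lambda^b+\boldsymbol\lambda^a)^\top\frac{\mathbf c^b+\mathbf c^a}{2} - (\boldsymbol\lambda^b-\boldsymbol\lambda^a)^\top\frac{\mathbf c^a-\mathbf c^b}{2} \Big\}. \tag{32} \]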
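If $\mathbf c^b = \mathbf c^a$, then the last term vanishes, which represents the problem with exact market prices. If (componentwise) $\mathbf c^b < \mathbf c^a$, then the last term is a penalty on the objective function that is proportional to the difference $\boldsymbol\lambda^b-\boldsymbol\lambda^a \geq \mathbf 0$ of the two Lagrange multipliers. In this case, the bid and ask constraints for a given option cannot bind simultaneously. By complementary slackness, at most one of $\lambda^b_i$ and $\lambda^a_i$ is therefore nonzero. When $c^b_i = c^a_i$ the penalty vanishes, so the multipliers may be chosen this way without loss of generality. We may therefore transform the Lagrange multipliers by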
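\begin{align}
\boldsymbol\lambda &= \boldsymbol\lambda^b + \boldsymbol\lambda^a, \tag{33}\\
|\boldsymbol\lambda| &= \boldsymbol\lambda^b - \boldsymbol\lambda^a \quad\text{(componentwise)}, \tag{34}
\end{align}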
and the objective function becomes
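\begin{align}
\max_{\boldsymbol\lambda}\; \Big\{ &-\ln\mathbb{E}^{P_\theta}\big[e^{\boldsymbol\lambda^\top\mathbf h}\big] + \boldsymbol\lambda^\top\frac{\mathbf c^b+\mathbf c^a}{2} \tag{35}\\
&- |\boldsymbol\lambda|^\top\frac{\mathbf c^a-\mathbf c^b}{2} \Big\}. \tag{36}
\end{align}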
We may solve the maximisation problem numerically by setting its gradient with respect to $\boldsymbol\lambda$ to zero,
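\[ -\mathbb{E}^{Q_{\boldsymbol\lambda}}[\mathbf h] + \frac{\mathbf c^b+\mathbf c^a}{2} - \operatorname{sign}(\boldsymbol\lambda)\circ\frac{\mathbf c^a-\mathbf c^b}{2} = \mathbf 0, \tag{37} \]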
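Here $Q_{\boldsymbol\lambda}$ denotes the measure with density (31) for $\boldsymbol\lambda^b+\boldsymbol\lambda^a=\boldsymbol\lambda$, and $\circ$ denotes the elementwise product. The elementwise sign function assigns $1$, $-1$ or $0$ to each element of $\boldsymbol\lambda$ according to its sign. Because the sign function is discontinuous, (37) cannot be solved directly in a stable way. To bypass this problem, we approximate the sign function with a continuous step function: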
\[ \operatorname{sign}(x) \approx \tanh(kx). \tag{38} \]
We use Powell’s hybrid method to solve the system of equations (37) with this approximation. The parameter $k$ controls the steepness of the function, and choosing its value carefully is critical for fast and stable convergence of the method.
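The following Python sketch solves (37) using SciPy's `root` with `method="hybr"`, which wraps MINPACK's implementation of Powell's hybrid method. It uses $\tanh(k\lambda)$ as the smooth approximation of $\operatorname{sign}(\lambda)$. Everything about the setting is assumed for illustration:

- zero interest rates;
- a discretised lognormal $P_\theta$;
- mid quotes generated from a lognormal mixture;
- a constant half–spread;
- steepness $k = 100$.

Because $|\tanh| < 1$, the resulting model prices lie inside the bid–ask interval up to the solver tolerance.

```python
import numpy as np
from scipy.optimize import root
from scipy.stats import norm

# Toy setting (assumptions): r = 0, one maturity, prices and payoffs in units of S_t.
T = 0.5
grid = np.linspace(0.2, 3.0, 1401)                       # grid for S_T / S_t
strikes = np.array([0.8, 0.9, 1.0, 1.1, 1.2])
H = np.maximum(grid[None, :] - strikes[:, None], 0.0)    # discounted payoffs h, one row per option

def lognormal_pmf(sigma):
    """Black/Scholes terminal law of S_T / S_t, discretised on the grid."""
    dens = norm.pdf(np.log(grid), loc=-0.5 * sigma**2 * T, scale=sigma * np.sqrt(T)) / grid
    return dens / dens.sum()

p = lognormal_pmf(0.25)                                  # parametric measure P_theta
mid = H @ (0.5 * lognormal_pmf(0.15) + 0.5 * lognormal_pmf(0.35))   # hypothetical mid quotes
half_spread = np.full_like(mid, 0.002)                   # hypothetical (c^a - c^b) / 2
bid, ask = mid - half_spread, mid + half_spread
k = 100.0                                                # steepness of the smoothed sign (assumed)

def tilted(lam):
    """Density (31) with lambda = lambda^b + lambda^a."""
    z = H.T @ lam
    w = p * np.exp(z - z.max())
    return w / w.sum()

def equations(lam):
    """Left-hand side of (37), with sign(lambda) replaced by tanh(k * lambda)."""
    return mid - np.tanh(k * lam) * half_spread - H @ tilted(lam)

def jacobian(lam):
    """Exact Jacobian of `equations`: minus Cov^Q(h) minus the smoothed-sign derivative."""
    q = tilted(lam)
    m = H @ q
    cov = (H * q) @ H.T - np.outer(m, m)
    return -cov - np.diag(k * half_spread * (1.0 - np.tanh(k * lam) ** 2))

sol = root(equations, np.zeros(len(mid)), jac=jacobian, method="hybr")
model_prices = H @ tilted(sol.x)
```

Supplying the analytic Jacobian is a design choice that helps when $k$ is large. Near $\lambda_i = 0$ the steep $\tanh$ term dominates, whereas once it saturates only the covariance of the payoffs remains.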
In Step 3, we use the L-BFGS-B algorithm to minimise the divergence with respect to the model parameters (or the latent variables, or both). Steps 2 and 3 are repeated until convergence. The convergence criterion adopted here is that the percentage change of every parameter after one iteration does not exceed a certain threshold, say 0.1%.
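Putting the steps together, the following sketch alternates Steps 2 and 3 for a Black/Scholes model on a discretised grid. For brevity, it makes two simplifications:

- it assumes exact prices ($\mathbf c^b = \mathbf c^a$);
- it solves the dual in Step 2 with SciPy's trust–region Newton method (`trust-exact`) instead of the smoothed Powell–hybrid scheme.

Step 3 uses L-BFGS-B with the 0.1% stopping rule described above. The market prices are hypothetical, generated from a two–component lognormal mixture so that no single volatility fits them exactly.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Toy setting (assumptions): r = 0, one maturity, prices and payoffs in units of S_0.
T = 0.5
grid = np.linspace(0.2, 3.0, 1401)                       # grid for S_T / S_0
strikes = np.array([0.8, 0.9, 1.0, 1.1, 1.2])
H = np.maximum(grid[None, :] - strikes[:, None], 0.0)    # payoffs h, one row per option

def model_pmf(sigma):
    """Black/Scholes terminal law of S_T / S_0, discretised on the grid."""
    dens = norm.pdf(np.log(grid), loc=-0.5 * sigma**2 * T, scale=sigma * np.sqrt(T)) / grid
    return dens / dens.sum()

def kl(q, p):
    """Relative entropy D(Q || P) on the grid."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def tilt(p, lam):
    """Density (31) for multipliers lam, together with ln E^P[exp(lam . h)]."""
    z = H.T @ lam
    w = p * np.exp(z - z.max())
    return w / w.sum(), z.max() + np.log(w.sum())

def step2(p, prices):
    """Step 2 with exact prices: minimise the convex dual ln E^P[exp(lam . h)] - lam . c."""
    def dual(lam):
        q, log_z = tilt(p, lam)
        return log_z - lam @ prices, H @ q - prices
    def hess(lam):
        q, _ = tilt(p, lam)
        m = H @ q
        return (H * q) @ H.T - np.outer(m, m)            # covariance of h under Q
    res = minimize(dual, np.zeros(len(prices)), jac=True, hess=hess, method="trust-exact")
    return tilt(p, res.x)[0]

def step3(q, sigma):
    """Step 3: minimise D(Q || P_sigma) over sigma with L-BFGS-B."""
    res = minimize(lambda s: kl(q, model_pmf(s[0])), [sigma],
                   method="L-BFGS-B", bounds=[(0.1, 1.0)])
    return float(res.x[0])

# Hypothetical market: call prices from a 50/50 lognormal mixture (a volatility smile).
prices = H @ (0.5 * model_pmf(0.15) + 0.5 * model_pmf(0.35))

sigma = 0.3                                              # Step 1: initial guess
for iteration in range(100):                             # Step 4: alternate Steps 2 and 3
    q = step2(model_pmf(sigma), prices)
    new_sigma = step3(q, sigma)
    converged = abs(new_sigma - sigma) <= 1e-3 * sigma   # 0.1% relative change
    sigma = new_sigma
    if converged:
        break

q_final = step2(model_pmf(sigma), prices)
calibration_error = kl(q_final, model_pmf(sigma))
```

Each half–step can only lower $D(Q\,\|\,P_\sigma)$, so the final calibration error is no larger than the divergence obtained at the initial guess.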
4. Examining the trade–off between calibration error and model risk due to recalibration
As an application example of the method described in the previous two sections, we consider historical data consisting of daily market prices for call options on AAPL and GOOG stock. The sample runs from 6 January 2004 to 19 December 2008 for AAPL and from 4 January 2005 to 19 December 2008 for GOOG. This gives us a reasonably straightforward application example, free of extraneous complications,\footnote{Although these options are of the American type, i.e. permitting early exercise, AAPL and GOOG did not pay any dividends during this period. Thus the possibility of early exercise may be ignored (see \citeasnoun{OZ:Merton:73}).} while still covering reasonably liquid options and including a period of “interesting” market volatility (2007/8). From this data, we remove options very far away from the money, restricting the sample to options with deltas between 2.5% and 97.5%. Furthermore, we remove prices of options which had zero trading volume on a given day, in order to avoid using prices which are likely to be stale.
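A sketch of the two data filters is given below. The text above does not specify which volatility is used to compute the deltas. The sketch assumes each option's own Black/Scholes implied volatility, and the quotes are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def call_delta(S, K, T, r, sigma):
    """Black/Scholes delta of a European call on a non-dividend-paying stock."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return norm.cdf(d1)

def keep_quote(S, K, T, r, implied_vol, volume, lo=0.025, hi=0.975):
    """True for quotes with delta in [lo, hi] and positive trading volume."""
    delta = call_delta(S, K, T, r, implied_vol)
    return (delta >= lo) & (delta <= hi) & (volume > 0)

# Hypothetical quotes for one maturity (illustrative numbers only).
K = np.array([50.0, 90.0, 100.0, 110.0, 200.0])
iv = np.full(5, 0.3)
volume = np.array([10, 0, 5, 7, 3])
mask = keep_quote(100.0, K, 0.25, 0.01, iv, volume)
```

Here the deep in-the-money and far out-of-the-money strikes fail the delta filter, and the strike at 90 is dropped for zero volume.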
On this data we consider two parametric models, \citeasnoun{OZ:Bla&Sch:73} and \citeasnoun{OZ:Heston:93}. These are arguably the two most popular option pricing models available, and the latter introduces a latent variable for stochastic volatility. The unified methodology quantifies calibration error, model risk due to recalibration, and the aggregate of the two. It allows us to explore the trade–off between calibration error (which is, unsurprisingly, reduced by moving from \citeasnoun{OZ:Bla&Sch:73} to \citeasnoun{OZ:Heston:93}) and model risk due to recalibration (which has hitherto been largely ignored). We examine this trade–off both when moving from one parametric model to another and when changing the frequency with which the model is recalibrated.
We start by evaluating the calibration, recalibration and aggregated model risks under a Black/Scholes model, i.e. where the underlying asset price is assumed to follow geometric Brownian motion, with dynamics under the risk–neutral measure given by
\[ dS_t = r S_t\,dt + \sigma S_t\,dW_t, \tag{39} \]
where $r$ is the continuously compounded risk–free rate of interest and $\sigma$ is a constant volatility parameter. We note that in the Black/Scholes model we obtain a simple closed–form expression for the recalibration risk defined in (8):
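\[ \mathcal{R}_t = \ln\frac{\sigma_{t-1}}{\sigma_t} + \frac{\sigma_t^2}{2\sigma_{t-1}^2} - \frac12 + \frac{(\sigma_t^2 - \sigma_{t-1}^2)^2\,(T-t)}{8\,\sigma_{t-1}^2}, \tag{40} \]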
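where $\sigma_t$ is the recalibrated Black/Scholes volatility parameter, $\sigma_{t-1}$ is the parameter value obtained in the previous calibration, and $T-t$ is the time to maturity. This formula is a consequence of the lognormal distribution of returns assumed in the Black/Scholes model. Relative entropy is invariant under the logarithmic transformation, and $\ln S_T$ is normally distributed under both measures. The two normal laws have variances $\sigma_t^2(T-t)$ and $\sigma_{t-1}^2(T-t)$, and means differing by $(\sigma_{t-1}^2-\sigma_t^2)(T-t)/2$.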
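The closed form (40) can be cross–checked against direct numerical integration of the relative entropy between the two normal laws of $\ln(S_T/S_t)$. In the sketch below, the volatilities, time to maturity and interest rate are illustrative values. Both laws share the drift $r$, so the result does not depend on $r$, consistent with (40).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bs_recalibration_risk(sigma_new, sigma_old, tau):
    """Closed form (40): D(P_new || P_old) for Black/Scholes laws with sigma_new, sigma_old."""
    return (np.log(sigma_old / sigma_new) + sigma_new**2 / (2.0 * sigma_old**2) - 0.5
            + (sigma_new**2 - sigma_old**2) ** 2 * tau / (8.0 * sigma_old**2))

def bs_recalibration_risk_numeric(sigma_new, sigma_old, tau, r=0.03):
    """Same quantity by quadrature over y = ln(S_T / S_t), which is normal under both laws."""
    m_new, s_new = (r - 0.5 * sigma_new**2) * tau, sigma_new * np.sqrt(tau)
    m_old, s_old = (r - 0.5 * sigma_old**2) * tau, sigma_old * np.sqrt(tau)
    def integrand(y):
        return norm.pdf(y, m_new, s_new) * (norm.logpdf(y, m_new, s_new)
                                            - norm.logpdf(y, m_old, s_old))
    return quad(integrand, m_new - 12.0 * s_new, m_new + 12.0 * s_new)[0]

closed = bs_recalibration_risk(0.25, 0.20, 0.5)
numeric = bs_recalibration_risk_numeric(0.25, 0.20, 0.5)
```

The last term of (40) grows with the time to maturity, so the same volatility change is penalised more for longer–dated options.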
We can express the aggregate model risk as the sum of the calibration error, the recalibration risk and a residual. As shown in (15)–(17), the residual is zero if the likelihood ratios involved in the calibration and recalibration risks are independent random variables. In practice, the residual usually takes a small nonzero value. Figure 2 shows the decomposition of the total model risk into these three components.\footnote{The vertical axis denotes the numerical value of the relative entropy.} It is well documented that the Black/Scholes model cannot fit the implied volatility “smile” observed in most options markets, so it is unsurprising that calibration error typically predominates.
In the Heston model, the dynamics (39) are extended to allow for stochastic volatility, i.e.
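\begin{align}
dS_t &= r S_t\,dt + \sqrt{v_t}\,S_t\,dW^{(1)}_t, \tag{41}\\
dv_t &= \kappa(\bar v - v_t)\,dt + \eta\sqrt{v_t}\,dW^{(2)}_t, \qquad d\langle W^{(1)},W^{(2)}\rangle_t = \rho\,dt. \tag{42}
\end{align}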
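This model involves two state variables, the underlying asset price $S_t$ and the volatility state variable $v_t$ (the instantaneous variance). It also involves five model parameters: $r$, $\kappa$, $\bar v$, $\eta$ and $\rho$, where $\rho$ is the correlation coefficient between the two Wiener processes.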
Here, $r$ is the risk–free rate,\footnote{In our empirical application examples, we take the risk–free rate as one of the financial variables observed in the market, but we do not explicitly take into account interest rate risk in our empirical analysis. For the short–dated options considered here, interest rate risk is known to be of relatively little importance; for a discussion of this issue, see e.g. \citeasnoun{CHENG2017} and the literature cited therein.} and $\kappa$, $\bar v$ and $\eta$ relate to the volatility process. They are, respectively, the rate of mean reversion, the long–run mean and the volatility of this process.
Following \citeasnoun{OZ:Gatheral:06}, the risk–neutral probability of exercise of a European call option with strike $K$ in the Heston model is given by
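\[ P_0(x, v, \tau) = \frac12 + \frac1\pi\int_0^\infty \operatorname{Re}\left[\frac{\exp\{C(u,\tau)\,\bar v + D(u,\tau)\,v + iux\}}{iu}\right]du, \tag{43} \]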
where $v$ is the current value of the volatility state variable $v_t$, $\tau = T-t$ is the time to maturity, and $x$ is the logarithmic forward moneyness of the option, i.e.
\[ x = \ln\frac{F_{t,T}}{K}, \qquad F_{t,T} = \frac{S_t}{B(t,T)}, \]
with $B(t,T)$ the time $t$ price of a zero–coupon bond maturing at $T$. Furthermore,
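\begin{align}
C(u,\tau) &= \kappa\left\{ r_-\tau - \frac{2}{\eta^2}\ln\!\left(\frac{1-g\,e^{-d\tau}}{1-g}\right)\right\}, \tag{44}\\
D(u,\tau) &= r_-\,\frac{1-e^{-d\tau}}{1-g\,e^{-d\tau}}, \tag{45}\\
r_\pm &= \frac{\beta\pm d}{\eta^2}, \qquad d = \sqrt{\beta^2 - 2\alpha\eta^2}, \qquad g = \frac{r_-}{r_+}. \tag{46}
\end{align}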
The quantities $\alpha$ and $\beta$ are functions of $u$, the Fourier transform variable conjugate to $x$:
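\begin{align}
\alpha(u) &= -\frac{u^2}{2} - \frac{iu}{2}, \tag{47}\\
\beta(u) &= \kappa - \rho\eta\, iu. \tag{48}
\end{align}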
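\begin{table}
\caption{Aggregate model risk, calibration error and model risk due to recalibration (relative entropy) under the Black/Scholes and Heston models, by recalibration frequency.}
\begin{tabular}{l|rrrrr|rrrrr}
\hline
 & \multicolumn{5}{c|}{Black/Scholes} & \multicolumn{5}{c}{Heston} \\
Risk measure & 1 day & 3 days & 1 week & 2 weeks & 1 quarter & 1 day & 3 days & 1 week & 2 weeks & 1 quarter \\
\hline
\multicolumn{11}{l}{\textit{Aggregate model risk}} \\
mean & 0.070 & 0.071 & 0.073 & 0.073 & 0.085 & 0.037 & 0.035 & 0.038 & 0.039 & 0.046 \\
median & 0.045 & 0.047 & 0.047 & 0.051 & 0.057 & 0.005 & 0.005 & 0.005 & 0.006 & 0.010 \\
99\% quantile & 0.474 & 0.427 & 0.471 & 0.455 & 0.462 & 0.508 & 0.462 & 0.512 & 0.503 & 0.649 \\
95\% quantile & 0.212 & 0.221 & 0.221 & 0.212 & 0.251 & 0.173 & 0.169 & 0.177 & 0.177 & 0.185 \\
90\% quantile & 0.158 & 0.160 & 0.165 & 0.160 & 0.192 & 0.096 & 0.097 & 0.098 & 0.105 & 0.121 \\
75\% quantile & 0.092 & 0.094 & 0.095 & 0.099 & 0.112 & 0.027 & 0.027 & 0.029 & 0.031 & 0.034 \\
\hline
\multicolumn{11}{l}{\textit{Calibration error}} \\
mean & 0.055 & 0.066 & 0.069 & 0.072 & 0.084 & 0.008 & 0.026 & 0.032 & 0.036 & 0.045 \\
median & 0.038 & 0.043 & 0.045 & 0.049 & 0.057 & 0.001 & 0.003 & 0.004 & 0.005 & 0.009 \\
99\% quantile & 0.239 & 0.397 & 0.433 & 0.443 & 0.455 & 0.163 & 0.416 & 0.496 & 0.495 & 0.648 \\
95\% quantile & 0.171 & 0.207 & 0.212 & 0.210 & 0.249 & 0.015 & 0.130 & 0.158 & 0.168 & 0.182 \\
90\% quantile & 0.135 & 0.151 & 0.158 & 0.158 & 0.191 & 0.008 & 0.065 & 0.083 & 0.097 & 0.119 \\
75\% quantile & 0.075 & 0.087 & 0.091 & 0.096 & 0.111 & 0.003 & 0.012 & 0.019 & 0.026 & 0.034 \\
\hline
\multicolumn{11}{l}{\textit{Model risk due to recalibration}} \\
mean & 0.024 & 0.009 & 0.005 & 0.002 & 0.001 & 0.057 & 0.019 & 0.011 & 0.006 & 0.001 \\
median & 0.002 & 0.001 & 0.001 & 0.000 & 0.000 & 0.004 & 0.001 & 0.001 & 0.001 & 0.000 \\
99\% quantile & 0.458 & 0.196 & 0.057 & 0.030 & 0.012 & 0.630 & 0.226 & 0.117 & 0.067 & 0.011 \\
95\% quantile & 0.103 & 0.034 & 0.021 & 0.010 & 0.005 & 0.315 & 0.108 & 0.062 & 0.032 & 0.006 \\
90\% quantile & 0.056 & 0.020 & 0.011 & 0.006 & 0.002 & 0.164 & 0.052 & 0.031 & 0.016 & 0.004 \\
75\% quantile & 0.013 & 0.006 & 0.004 & 0.002 & 0.001 & 0.055 & 0.016 & 0.011 & 0.005 & 0.001 \\
\hline
\end{tabular}
\end{table}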
Since by definition $P_0$ is the probability of exercise, $P_0 = \Pr(S_T > K)$ under the model's risk–neutral measure. The risk–neutral probability density function of $S_T$ is therefore
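\begin{align}
q_{S_T}(K) &= -\frac{\partial P_0}{\partial K} = \frac{1}{K}\,\frac{\partial P_0}{\partial x} \tag{49}\\
&= \frac{1}{\pi K}\int_0^\infty \operatorname{Re}\Big[\exp\{C(u,\tau)\,\bar v + D(u,\tau)\,v + iux\}\Big]\,du, \qquad x = \ln\frac{F_{t,T}}{K}. \tag{50}
\end{align}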
For simplicity, let $X_T := \ln(F_{t,T}/S_T)$ denote the logarithm of the ratio of the forward price at $t$ to the spot price at maturity $T$. The risk–neutral density of $X_T$ is then
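\begin{align}
q_{X_T}(x) &= K\,q_{S_T}(K)\big|_{K = F_{t,T}e^{-x}} \tag{51}\\
&= \frac{1}{\pi}\int_0^\infty \operatorname{Re}\Big[\exp\{C(u,\tau)\,\bar v + D(u,\tau)\,v + iux\}\Big]\,du. \tag{52}
\end{align}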
The density $q_{X_T}$ in (52) can be calculated by fast Fourier transform (FFT).
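The code below is a minimal sketch of evaluating (52), using illustrative Heston parameters. It integrates the Fourier integral directly with adaptive quadrature rather than FFT, a substitution made for readability. Two checks follow from the definitions:

- the density of $X_T$ should integrate to one;
- $\mathbb{E}[e^{-X_T}] = \mathbb{E}[S_T]/F_{t,T}$ should equal one under the risk–neutral measure.

Truncating the $u$–integral at 200 is an assumption suited to these parameters.

```python
import numpy as np
from scipy.integrate import quad

def heston_C_D(u, tau, kappa, eta, rho):
    """C(u, tau) and D(u, tau) from (44)-(48), for the exercise probability P_0."""
    alpha = -0.5 * u**2 - 0.5j * u
    beta = kappa - rho * eta * 1j * u
    d = np.sqrt(beta**2 - 2.0 * alpha * eta**2)
    r_minus = (beta - d) / eta**2
    r_plus = (beta + d) / eta**2
    g = r_minus / r_plus
    e = np.exp(-d * tau)
    D = r_minus * (1.0 - e) / (1.0 - g * e)
    C = kappa * (r_minus * tau - 2.0 / eta**2 * np.log((1.0 - g * e) / (1.0 - g)))
    return C, D

def density_log_ratio(x, v, tau, kappa, vbar, eta, rho, u_max=200.0):
    """Risk-neutral density (52) of X_T = ln(F_{t,T} / S_T) at x, by direct quadrature."""
    def integrand(u):
        C, D = heston_C_D(u, tau, kappa, eta, rho)
        return np.real(np.exp(C * vbar + D * v + 1j * u * x))
    return quad(integrand, 0.0, u_max, limit=500)[0] / np.pi

def trapezoid(y, dx):
    """Trapezoidal rule on an equally spaced grid."""
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

params = dict(v=0.04, tau=0.5, kappa=1.5, vbar=0.04, eta=0.5, rho=-0.7)   # illustrative
xs = np.linspace(-1.5, 1.5, 61)
dens = np.array([density_log_ratio(x, **params) for x in xs])
dx = xs[1] - xs[0]
mass = trapezoid(dens, dx)                         # should be close to 1
forward_check = trapezoid(np.exp(-xs) * dens, dx)  # E[S_T / F_{t,T}], should be close to 1
```

The form of $C$ and $D$ with $g = r_-/r_+$ and $e^{-d\tau}$ avoids the branch–cut discontinuities of the complex logarithm that other algebraically equivalent forms can suffer from.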
Figure 3 gives the decomposition of the total model risk into three components when using the Heston model as the baseline. The components are those due to calibration and recalibration, and the positive or negative residual measuring the departure from independence between the first two. Again, it is well documented that the Heston model can fit observed option prices better than Black/Scholes, so it is unsurprising that in this case the relative entropy measuring calibration error is much lower. However, already in this set of example days it is evident that this comes at the price of increased model risk due to recalibration.
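\begin{table}
\caption{Aggregate model risk, calibration error and model risk due to recalibration (relative entropy) under the Black/Scholes and Heston models, by recalibration frequency.}
\begin{tabular}{l|rrrrr|rrrrr}
\hline
 & \multicolumn{5}{c|}{Black/Scholes} & \multicolumn{5}{c}{Heston} \\
Risk measure & 1 day & 3 days & 1 week & 2 weeks & 1 quarter & 1 day & 3 days & 1 week & 2 weeks & 1 quarter \\
\hline
\multicolumn{11}{l}{\textit{Aggregate model risk}} \\
mean & 0.165 & 0.168 & 0.165 & 0.165 & 0.188 & 0.115 & 0.105 & 0.113 & 0.114 & 0.127 \\
median & 0.109 & 0.111 & 0.112 & 0.115 & 0.138 & 0.053 & 0.050 & 0.055 & 0.053 & 0.059 \\
99\% quantile & 0.705 & 0.722 & 0.722 & 0.699 & 0.728 & 0.726 & 0.668 & 0.711 & 0.699 & 0.744 \\
95\% quantile & 0.519 & 0.530 & 0.521 & 0.496 & 0.549 & 0.475 & 0.438 & 0.455 & 0.467 & 0.560 \\
90\% quantile & 0.393 & 0.391 & 0.387 & 0.385 & 0.408 & 0.330 & 0.287 & 0.320 & 0.324 & 0.380 \\
75\% quantile & 0.219 & 0.223 & 0.214 & 0.215 & 0.256 & 0.145 & 0.137 & 0.142 & 0.153 & 0.157 \\
\hline
\multicolumn{11}{l}{\textit{Calibration error}} \\
mean & 0.125 & 0.155 & 0.158 & 0.162 & 0.187 & 0.055 & 0.083 & 0.103 & 0.107 & 0.126 \\
median & 0.082 & 0.102 & 0.106 & 0.111 & 0.137 & 0.006 & 0.022 & 0.043 & 0.046 & 0.058 \\
99\% quantile & 0.655 & 0.701 & 0.709 & 0.696 & 0.728 & 0.626 & 0.651 & 0.706 & 0.688 & 0.744 \\
95\% quantile & 0.400 & 0.482 & 0.507 & 0.494 & 0.548 & 0.329 & 0.381 & 0.438 & 0.445 & 0.557 \\
90\% quantile & 0.295 & 0.359 & 0.362 & 0.379 & 0.408 & 0.172 & 0.250 & 0.298 & 0.306 & 0.375 \\
75\% quantile & 0.171 & 0.208 & 0.207 & 0.212 & 0.255 & 0.034 & 0.103 & 0.127 & 0.141 & 0.155 \\
\hline
\multicolumn{11}{l}{\textit{Model risk due to recalibration}} \\
mean & 0.036 & 0.014 & 0.008 & 0.004 & 0.001 & 0.119 & 0.044 & 0.012 & 0.012 & 0.003 \\
median & 0.004 & 0.002 & 0.001 & 0.001 & 0.000 & 0.043 & 0.016 & 0.003 & 0.003 & 0.001 \\
99\% quantile & 0.405 & 0.172 & 0.091 & 0.050 & 0.010 & 0.757 & 0.255 & 0.087 & 0.077 & 0.012 \\
95\% quantile & 0.173 & 0.066 & 0.035 & 0.018 & 0.004 & 0.543 & 0.187 & 0.056 & 0.064 & 0.011 \\
90\% quantile & 0.109 & 0.038 & 0.022 & 0.012 & 0.003 & 0.411 & 0.151 & 0.034 & 0.045 & 0.009 \\
75\% quantile & 0.036 & 0.013 & 0.008 & 0.004 & 0.001 & 0.142 & 0.052 & 0.016 & 0.013 & 0.003 \\
\hline
\end{tabular}
\end{table}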
These observations are reinforced when we consider aggregate model risk, calibration error and model risk due to recalibration over the entire sample period, as presented in Tables 1 and 2. Note that the absolute numbers are relative entropies and thus lack a direct financial interpretation. What matters are the relative values when comparing across models and recalibration frequencies, in particular for aggregate model risk. Here, we consider recalibrating the Black/Scholes and Heston models daily, every three days, every week, every two weeks, or every quarter. Recalibrating more frequently has little effect on aggregate model risk, for both the Black/Scholes model and the Heston model. Essentially, recalibrating more frequently simply shifts calibration error into model risk due to recalibration.\footnote{Note that on days on which we do not recalibrate, the model risk due to recalibration is zero because, consistent with the model assumptions, we keep the previously calibrated parameters unchanged. On those days, aggregate model risk is therefore entirely due to calibration error, which increases because the fit to market prices deteriorates when we do not recalibrate.} This highlights the dangers in the common practice of focusing solely on the calibration of derivative pricing models at the expense of all other sources of model risk.
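The shift described above can be checked against the mean rows of the recalibration-frequency table. The sketch below copies those rounded means and measures how much each quantity moves across the five frequencies (maximum minus minimum). It also checks that calibration error rises, and model risk due to recalibration falls, as recalibration becomes less frequent. The names `MEANS`, `spread` and `is_monotone` are hypothetical, and the check uses only the published means, not the underlying daily series.

```python
FREQUENCIES = ["1 day", "3 days", "1 week", "2 weeks", "1 quarter"]

# Mean relative entropy by recalibration frequency, copied from the table above.
MEANS = {
    "Black/Scholes": {
        "aggregate": [0.165, 0.168, 0.165, 0.165, 0.188],
        "calibration": [0.125, 0.155, 0.158, 0.162, 0.187],
        "recalibration": [0.036, 0.014, 0.008, 0.004, 0.001],
    },
    "Heston": {
        "aggregate": [0.115, 0.105, 0.113, 0.114, 0.127],
        "calibration": [0.055, 0.083, 0.103, 0.107, 0.126],
        "recalibration": [0.119, 0.044, 0.012, 0.012, 0.003],
    },
}


def spread(values):
    """Range (max minus min) of one statistic across recalibration frequencies."""
    return max(values) - min(values)


def is_monotone(values, increasing):
    """True if the sequence is non-decreasing (or non-increasing)."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


for model, rows in MEANS.items():
    spreads = {name: round(spread(values), 3) for name, values in rows.items()}
    print(model, spreads)
```

For both models, the aggregate mean moves less across frequencies than either of its two components, while the components move in opposite directions.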
In addition, suppose we are interested in “robustness” at a high level of confidence, looking at, say, the 99% quantile of aggregate model risk. At that level, moving from Black/Scholes to Heston does not appear to deliver any advantage either, although it does yield some improvement in the mean, the median and the lower quantiles of aggregate model risk. This means that when high levels of confidence are required, any gain in calibration accuracy delivered by the Heston model is offset by higher model risk due to recalibration. Note that this last point holds even before considering Type 3 model risk, which may well be worse when additional state variables are introduced (as in the Heston model). The means, medians and quantiles in Tables 1 and 2 are calculated across all available option maturities. If we consider only particular maturity “buckets”, the same qualitative conclusions are evident; Tables 3 and 4 illustrate this in the case of daily recalibration.
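The tables report means, medians and upper quantiles of relative entropies pooled across maturities, and the robustness argument compares the two models one statistic at a time. The sketch below shows one way to do both. The names `summarise`, `lower_risk_model` and the bucket keys are hypothetical. Using numpy's default linear interpolation between order statistics is an assumption, since the quantile convention behind the tables is not stated. `PUBLISHED` copies the daily-recalibration "all" columns of aggregate model risk from the two maturity-bucket tables below, in order of appearance.

```python
import numpy as np

QUANTILE_LEVELS = (0.99, 0.95, 0.90, 0.75)


def summarise(by_maturity):
    """Pool relative-entropy values across maturity buckets and return the mean,
    median and upper quantiles (numpy's default linear interpolation)."""
    pooled = np.concatenate([np.asarray(v, dtype=float) for v in by_maturity.values()])
    stats = {"mean": float(pooled.mean()), "median": float(np.median(pooled))}
    for level in QUANTILE_LEVELS:
        stats[f"{level:.0%} quantile"] = float(np.quantile(pooled, level))
    return stats


def lower_risk_model(row):
    """Name of the model with the smaller value of one statistic (lower is better)."""
    return min(row, key=row.get)


# Aggregate model risk, daily recalibration, "all" maturities (published values).
PUBLISHED = {
    "first table": {
        "mean": {"Black/Scholes": 0.070, "Heston": 0.037},
        "median": {"Black/Scholes": 0.045, "Heston": 0.005},
        "99% quantile": {"Black/Scholes": 0.474, "Heston": 0.508},
        "95% quantile": {"Black/Scholes": 0.212, "Heston": 0.173},
        "90% quantile": {"Black/Scholes": 0.158, "Heston": 0.096},
        "75% quantile": {"Black/Scholes": 0.092, "Heston": 0.027},
    },
    "second table": {
        "mean": {"Black/Scholes": 0.165, "Heston": 0.115},
        "median": {"Black/Scholes": 0.109, "Heston": 0.053},
        "99% quantile": {"Black/Scholes": 0.705, "Heston": 0.726},
        "95% quantile": {"Black/Scholes": 0.519, "Heston": 0.475},
        "90% quantile": {"Black/Scholes": 0.393, "Heston": 0.330},
        "75% quantile": {"Black/Scholes": 0.219, "Heston": 0.145},
    },
}

for table, rows in PUBLISHED.items():
    for stat, row in rows.items():
        print(f"{table}, {stat}: lower for {lower_risk_model(row)}")
```

On these published values, Heston has the lower mean, median and 75%, 90% and 95% quantiles in both tables, while Black/Scholes has the lower 99% quantile in both.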
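\begin{tabular}{l*{8}{r}}
\hline
Risk measure & \multicolumn{4}{c}{Black/Scholes} & \multicolumn{4}{c}{Heston} \\
 & all & 0–0.2 year & 0.2–0.7 year & $>0.7$ year & all & 0–0.2 year & 0.2–0.7 year & $>0.7$ year \\
\hline
\multicolumn{9}{l}{\textit{Aggregate Model Risk}} \\
mean & 0.070 & 0.041 & 0.066 & 0.109 & 0.037 & 0.039 & 0.024 & 0.046 \\
median & 0.045 & 0.021 & 0.052 & 0.081 & 0.005 & 0.004 & 0.004 & 0.008 \\
99\% quantile & 0.474 & 0.471 & 0.266 & 0.567 & 0.508 & 0.590 & 0.302 & 0.472 \\
95\% quantile & 0.212 & 0.120 & 0.185 & 0.282 & 0.173 & 0.243 & 0.091 & 0.191 \\
90\% quantile & 0.158 & 0.081 & 0.143 & 0.213 & 0.096 & 0.074 & 0.066 & 0.142 \\
75\% quantile & 0.092 & 0.047 & 0.090 & 0.141 & 0.027 & 0.021 & 0.015 & 0.054 \\
\hline
\multicolumn{9}{l}{\textit{Calibration Error}} \\
mean & 0.055 & 0.026 & 0.057 & 0.087 & 0.008 & 0.012 & 0.004 & 0.007 \\
median & 0.038 & 0.014 & 0.048 & 0.068 & 0.001 & 0.001 & 0.001 & 0.001 \\
99\% quantile & 0.239 & 0.157 & 0.230 & 0.289 & 0.163 & 0.072 & 0.052 & 0.116 \\
95\% quantile & 0.171 & 0.091 & 0.157 & 0.212 & 0.015 & 0.021 & 0.011 & 0.017 \\
90\% quantile & 0.135 & 0.062 & 0.126 & 0.181 & 0.008 & 0.006 & 0.007 & 0.010 \\
75\% quantile & 0.075 & 0.036 & 0.077 & 0.128 & 0.003 & 0.002 & 0.003 & 0.003 \\
\hline
\multicolumn{9}{l}{\textit{Model Risk due to Recalibration}} \\
mean & 0.024 & 0.020 & 0.013 & 0.038 & 0.057 & 0.062 & 0.048 & 0.060 \\
median & 0.002 & 0.002 & 0.001 & 0.003 & 0.004 & 0.004 & 0.003 & 0.009 \\
99\% quantile & 0.458 & 0.506 & 0.116 & 0.662 & 0.630 & 0.702 & 0.536 & 0.512 \\
95\% quantile & 0.103 & 0.066 & 0.066 & 0.169 & 0.315 & 0.443 & 0.274 & 0.247 \\
90\% quantile & 0.056 & 0.030 & 0.042 & 0.102 & 0.164 & 0.223 & 0.127 & 0.160 \\
75\% quantile & 0.013 & 0.009 & 0.010 & 0.031 & 0.055 & 0.030 & 0.048 & 0.091 \\
\hline
\end{tabular}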
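\begin{tabular}{l*{8}{r}}
\hline
Risk measure & \multicolumn{4}{c}{Black/Scholes} & \multicolumn{4}{c}{Heston} \\
 & all & 0–0.2 year & 0.2–0.7 year & $>0.7$ year & all & 0–0.2 year & 0.2–0.7 year & $>0.7$ year \\
\hline
\multicolumn{9}{l}{\textit{Aggregate Model Risk}} \\
mean & 0.165 & 0.167 & 0.162 & 0.164 & 0.115 & 0.115 & 0.120 & 0.109 \\
median & 0.109 & 0.104 & 0.099 & 0.120 & 0.053 & 0.052 & 0.057 & 0.048 \\
99\% quantile & 0.705 & 0.736 & 0.709 & 0.661 & 0.726 & 0.714 & 0.707 & 0.736 \\
95\% quantile & 0.519 & 0.527 & 0.551 & 0.459 & 0.475 & 0.468 & 0.500 & 0.476 \\
90\% quantile & 0.393 & 0.417 & 0.382 & 0.371 & 0.330 & 0.351 & 0.322 & 0.304 \\
75\% quantile & 0.219 & 0.221 & 0.216 & 0.218 & 0.145 & 0.149 & 0.155 & 0.136 \\
\hline
\multicolumn{9}{l}{\textit{Calibration Error}} \\
mean & 0.125 & 0.121 & 0.123 & 0.132 & 0.055 & 0.055 & 0.052 & 0.057 \\
median & 0.082 & 0.074 & 0.075 & 0.099 & 0.006 & 0.005 & 0.006 & 0.006 \\
99\% quantile & 0.655 & 0.662 & 0.650 & 0.639 & 0.626 & 0.658 & 0.531 & 0.665 \\
95\% quantile & 0.400 & 0.403 & 0.423 & 0.391 & 0.329 & 0.349 & 0.311 & 0.330 \\
90\% quantile & 0.295 & 0.298 & 0.301 & 0.285 & 0.172 & 0.155 & 0.185 & 0.171 \\
75\% quantile & 0.171 & 0.162 & 0.160 & 0.176 & 0.034 & 0.031 & 0.034 & 0.034 \\
\hline
\multicolumn{9}{l}{\textit{Model Risk due to Recalibration}} \\
mean & 0.036 & 0.038 & 0.039 & 0.033 & 0.119 & 0.117 & 0.117 & 0.124 \\
median & 0.004 & 0.005 & 0.003 & 0.005 & 0.043 & 0.043 & 0.037 & 0.049 \\
99\% quantile & 0.405 & 0.419 & 0.479 & 0.284 & 0.757 & 0.746 & 0.767 & 0.754 \\
95\% quantile & 0.173 & 0.176 & 0.188 & 0.155 & 0.543 & 0.510 & 0.558 & 0.554 \\
90\% quantile & 0.109 & 0.111 & 0.115 & 0.098 & 0.411 & 0.390 & 0.418 & 0.418 \\
75\% quantile & 0.036 & 0.038 & 0.032 & 0.036 & 0.142 & 0.144 & 0.137 & 0.146 \\
\hline
\end{tabular}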
5. Conclusion
Under our approach, less relative entropy implies less model risk, and we are able to evaluate two hitherto disparate sources of model risk (calibration error and model risk due to recalibration) in a unified fashion, and examine the potential trade–off between the two. We have considered a simple choice between two models, and between different recalibration frequencies. “Putting a number on model risk” by calculating quantiles for the maximum model risk (quantified by relative entropy) over a time series of market data allows one to assess the added value (if any) of more complicated stochastic models of financial markets.
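Every quantity in the tables is a relative entropy, so a small numerical illustration may help fix ideas. For densities q and p of measures Q and P, the relative entropy of Q with respect to P is D(Q‖P) = ∫ q(x) log(q(x)/p(x)) dx. This quantity is non-negative and is zero only when the two distributions coincide. The sketch below approximates it on a uniform grid for two normal densities, with hypothetical parameters chosen only because the closed form is available as a check. The function names are also hypothetical. How the measures being compared are chosen when quantifying model risk is defined in the earlier sections of the paper, not by this example.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Gaussian density, used here only as a convenient test case."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def relative_entropy(q, p, dx):
    """D(Q || P) = integral of q log(q / p), approximated by a Riemann sum of
    density values on a uniform grid with spacing dx. Grid points with q == 0
    contribute nothing (0 log 0 = 0); p must be positive wherever q is."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx)


def relative_entropy_normal(mu_q, s_q, mu_p, s_p):
    """Closed form of D(N(mu_q, s_q^2) || N(mu_p, s_p^2)), used as a check."""
    return float(np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p) ** 2) / (2.0 * s_p**2) - 0.5)


# Hypothetical parameters, chosen only so the closed form can be compared.
x = np.linspace(-3.0, 3.0, 20001)
dx = x[1] - x[0]
q = normal_pdf(x, 0.0, 0.20)
p = normal_pdf(x, 0.05, 0.25)
numeric = relative_entropy(q, p, dx)
exact = relative_entropy_normal(0.0, 0.20, 0.05, 0.25)
```

Swapping the arguments generally gives a different value, because relative entropy is not symmetric. This is one reason it serves as a premetric rather than a metric.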
In our application, we deliberately prioritise the minimisation of calibration error. This is congruent with the often exclusive focus of practitioners on calibration error, with little or no regard to model risk due to recalibration.\footnote{If desired, one could reformulate this approach to prioritise the minimisation of aggregate model risk, or of model risk due to recalibration.} Even so, once recalibration is included as one of the sources of aggregate model risk, we see that frequently recalibrating a model to a changing market simply trades one source of model risk for another. More complicated stochastic models may also underperform when aggregate model risk is taken into account.
References
 [1] Ahmadi-Javid, A.: 2012, Entropic value-at-risk: A new coherent risk measure, Journal of Optimization Theory and Applications 155(3), 1105–1123.
 [2] Ali, S. M. and Silvey, S. D.: 1966, A general class of coefficients of divergence of one distribution from another, Journal of the Royal Statistical Society. Series B (Methodological), pp. 131–142.
 [3] Bannör, K. F. and Scherer, M.: 2013, Capturing parameter risk with convex risk measures, European Actuarial Journal 3, 97–132.
 [4] Bartl, D., Drapeau, S. and Tangpi, L.: 2018, Computational aspects of robust optimized certainty equivalents and option pricing, Technical Report 1706.10186, arXiv preprint.
 [5] Black, F. and Scholes, M.: 1973, The pricing of options and corporate liabilities, Journal of Political Economy 81(3), 637–654.
 [6] Blanchet, J., Chen, L. and Zhou, X. Y.: 2018, Distributionally robust mean–variance portfolio selection with Wasserstein distances, Technical Report 1802.04885, arXiv preprint.
 [7] Board of Governors of the Federal Reserve System, Office of the Comptroller of the Currency: 2011, Supervisory guidance on model risk management, Technical Report OCC 2011-12 Attachment, Federal Reserve.
 [8] Boucher, C. M., Danielsson, J., Kouontchou, P. S. and Maillet, B. B.: 2014, Risk models–at–risk, Journal of Banking & Finance 44, 72–92.
 [9] Box, G. E. P. and Draper, N. R.: 1987, Empirical Model–Building and Response Surfaces, Wiley.
 [10] Breuer, T. and Csiszár, I.: 2016, Measuring distribution model risk, Mathematical Finance 26(2), 395–411.
 [11] Cheng, B., Nikitopoulos, C. S. and Schlögl, E.: 2017, Pricing of long-dated commodity derivatives: Do stochastic interest rates matter?, Journal of Banking & Finance.
 [12] Csiszár, I.: 1967, Information-type measures of difference of probability distributions and indirect observations, Studia Scientiarum Mathematicarum Hungarica 2, 299–318.
 [13] Detering, N. and Packham, N.: 2016, Model risk of contingent claims, Quantitative Finance 16(9), 1357–1374.
 [14] Feng, Y. and Schlögl, E.: 2018, Model risk measurement under Wasserstein distance, Technical report, SSRN Working Paper.
 [15] Gatheral, J.: 2006, The Volatility Surface: A Practitioner’s Guide, John Wiley & Sons.
 [16] Glasserman, P. and Xu, X.: 2014, Robust risk measurement and model risk, Quantitative Finance 14(1), 29–58.
 [17] Hansen, L. P. and Sargent, T. J.: 2006, Robustness, Princeton University Press.
 [18] Heston, S. L.: 1993, A closed–form solution for options with stochastic volatility with applications to bond and currency options, The Review of Financial Studies 6, 327–343.
 [19] Kerkhof, J., Melenberg, B. and Schumacher, H.: 2010, Model risk and capital reserves, Journal of Banking & Finance 34, 267–279.
 [20] Löffler, G.: 2003, The effects of estimation error on measures of portfolio credit risk, Journal of Banking & Finance 27, 1427–1453.
 [21] Merton, R. C.: 1973, Theory of rational option pricing, The Bell Journal of Economics and Management Science 4(1), 141–183.
 [22]