# Option Pricing, Historical Volatility and Tail Risks

###### Abstract

We revisit the problem of pricing options with historical volatility estimators. We do this in the context of a generalized GARCH model with multiple time scales and asymmetry. It is argued that the reason for the observed volatility risk premium is tail risk aversion. We parametrize such risk aversion in terms of three coefficients: convexity, skew and kurtosis risk premium. We propose that option prices under the real-world measure are not martingales, but that their drift is governed by such tail risk premia. We then derive a fair-pricing equation for options and show that the solutions can be written in terms of a stochastic volatility model in continuous time and under a martingale probability measure. This gives a precise connection between the pricing and real-world probability measures, which cannot be obtained using Girsanov's theorem. We find that the convexity risk premium not only shifts the overall implied volatility level, but also changes its term structure. Moreover, the skew risk premium makes the skewness of the volatility smile steeper than a pure historical estimate. We derive analytical formulas for certain implied moments using the Bergomi-Guyon expansion. This allows for very fast calibrations of the models. We show examples of a particular model which can reproduce the observed SPX volatility surface using very few parameters.

## 1 Introduction

Most option pricing models are written directly in the martingale or pricing probability measure [1]. Such models usually have a significant number of parameters which need to be fitted to the volatility surface. In the end, such parameters will show strong time dependency, invalidating the initial assumptions of the model. Moreover, the final values of the parameters have little physical significance, and so there is no notion of a “fair” option price. We think of this as a fit-only approach, which in our opinion is best done in the context of parametric smile models such as SSVI [2].

On the other hand, there is another stream of literature which studied the volatility surface produced by GARCH volatility forecasts [3, 4, 5, 6]. However, here one runs into another problem: what is the relation between the real-world and the pricing or martingale probability measure? One solution is to leave free some of the GARCH parameters so they can be fitted to the volatility surface. However, this puts us back into the fit-only approach without any understanding of the physical meaning of these parameters. Worse, the GARCH models are written in discrete time, and hence require time-consuming Monte Carlo simulations in order to find the optimal parameters.

An early attempt to find a direct relation between the pricing and real-world measure in the context of GARCH models is found in [3]. This approach assumes that the one-period expected variance is the same in both probability measures. However, this assumption is wrong, as we will show in this paper. In fact, due to tail risks, there should be a significant premium paid to the short gamma trader even for one-day returns. In other approaches, such as [4], the authors start by modeling log returns in discrete time and define the pricing measure by requiring simple returns to be a martingale. However, as we will show in this paper, the volatility risk premium has nothing to do with the drift of the underlying. In fact, option prices are very insensitive to drifts and the underlying can very well be a martingale in both the real-world and pricing measure. A notable exception to this literature stream is [7, 8], which uses a price kernel approach to connect both probability measures. We believe there is an interesting connection between their approach and ours, but we will leave this for future work.

In this article we introduce a new approach to option pricing. Our goal is to use historical volatility estimators, while introducing risk premia which will allow us to fit the volatility surface. In fact, only the risk premia need to be fitted to the option prices. The rest of the parameters will be determined using the time series of the underlying. We will show that most of the features of the volatility surface can be explained using a good volatility forecast and only three risk premia. We will assume that the risk premia are constant. This is not true in practice, but a generalization is possible and will be left for future work. We will argue that the risk premium parameters we introduce are related to tail risk aversion, and come from the fact that option traders mark to market their books in discrete time and have limited capital.

We will begin by working in discrete time, but assume the time step to be small enough so that we can expand option prices to second order in variations of the stochastic variables. This is what most option traders do in practice. Moreover, as most practitioners know, only the first few Greeks can be traded in the market due to liquidity constraints. This approximation will allow us to make a connection with the more familiar continuous time stochastic volatility models.

We define tail risk as a typical large move of the underlying, not necessarily a catastrophic “Black-Swan” event [9]. However, we do not assign probabilities to such events. In practice, all market participants have limited capital, and must limit their leverage so that they can withstand such tail events. In fact, most brokers determine margin requirements precisely this way, using stress testing. What does this mean in practice? Suppose you are a trader who is short gamma. Under a large move of the underlying price , you face a potentially large loss of size: , where is the net gamma exposure. This is an unhedgeable tail risk! In other words, there is a large tail risk asymmetry between a long-volatility and a short-volatility position. The short volatility trader must then put aside more capital than the long volatility counterparty. This is a cost of carry and so it is only fair that the short-volatility trader gets compensated by having a non-zero drift in his/her portfolio: . This very simple argument is the basis of our option pricing approach. In a nutshell, we propose that in the real-world measure, the drift of option returns is governed by the prices of tail risk.

We will restrict ourselves to a class of GARCH models with asymmetry and multiple time scales. However, our methodology can be applied to more general models, even those that include high-frequency volatility estimators [10]. We derive a generalized Black-Scholes equation under the real-world measure. Using the Feynman-Kac theorem, we map the solutions to this equation to a stochastic volatility model in continuous time and under a martingale probability measure. This gives a precise mapping from the real-world to the pricing measure. However, this connection cannot be obtained using the standard Girsanov transformation.

Using the results of Bergomi and Guyon [11], we derive approximate formulas for certain implied moments up to second order in the volatility of volatility (vol-of-vol). These moments can be compared to the corresponding strip of options for fast calibration. Each risk premium is calibrated independently. In particular, we show that we can get the convexity/gamma risk premium by fitting the variance swap term structure. Moreover, the skew and kurtosis risk premia are obtained by fitting similar strips of options. Once the risk premia are calibrated, one can generate full volatility surfaces using Monte Carlo simulations. We show that the volatility surfaces obtained this way are close to what we observe in the market.

We should stress that the goal of this paper is not to provide a comparative study of GARCH models or best estimation techniques. Our purpose is simply to introduce a new pricing methodology and give some examples. Therefore, we will not attempt to compare the fit quality of different models.

In section 2 we will make the tail risk argument more precise and define the risk premia. In section 3 we study in detail the GARCH(1,1) model which serves to illustrate the main ideas. In section 4 we generalize the GARCH model to include asymmetry and multiple time scales. In section 5 we derive approximate formulas for certain implied moments of the underlying returns. In section 6 we explain how the calibration is done using SPX option data. Moreover, we give examples of the volatility surfaces obtained from a particular GARCH model. We conclude in section 7.

### 1.1 Notation

We denote the price of the underlying asset by , where is time measured in years. As usual, we assume that is the forward price, so that we can ignore dividends and interest rates. When working in discrete time we take a one day time step: (in years). Simple returns will be denoted by

In general, time subscripts denote stochastic time dependence while parentheses denote smooth time dependence. For example, is a smooth function of for fixed . Moreover, all stochastic processes of the form are -measurable in the sense that they depend on information up to time .

The underlying return will be decomposed as follows:

where is an i.i.d. noise with zero mean and unit standard deviation, and is the realized annualized variance. Note that we take the underlying to be a martingale under the real-world measure. However, adding a drift or taking log-returns instead has a negligible effect on the parameters of the model. We also find little evidence for skewness in the distribution of . Therefore, we will assume that the distribution of is symmetric.

We will make ample use of exponential moving averages or EMAs. Our definition is the following:

(1) |

where is the time scale of the EMA in days, and is some random process.
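As an illustration only, the following sketch implements one common recursive EMA convention with a time scale given in days. The weight `1/L`, the seeding with the first observation, and the function name `ema` are assumptions made for this example; they may differ from the exact convention of Eq. (1).

```python
from typing import Iterable, List


def ema(values: Iterable[float], time_scale_days: float) -> List[float]:
    """Recursive exponential moving average with a time scale in days.

    Assumed convention (may differ from Eq. (1)):
        out[t] = (1 - 1/L) * out[t-1] + (1/L) * x[t],   out[0] = x[0]
    """
    if time_scale_days < 1.0:
        raise ValueError("time_scale_days must be at least 1")
    weight = 1.0 / time_scale_days
    out: List[float] = []
    for x in values:
        if not out:
            out.append(x)  # seed the filter with the first observation
        else:
            out.append((1.0 - weight) * out[-1] + weight * x)
    return out


# Illustrative squared daily returns (made-up numbers, not market data).
squared_returns = [1.0e-4, 4.0e-4, 0.5e-4, 2.5e-4]
print(ema(squared_returns, time_scale_days=20.0))
```

A larger time scale gives each new observation a smaller weight, so the filter reacts more slowly to recent returns.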

The real-world probability measure is denoted by . The notation for means conditional expectation with information up to time . The pricing measure will be denoted by with similar notation for the conditional expectation: .

## 2 Tail Risks

The tail risk of an option trader follows from the non-linear dependency of options on the movements of the underlying asset. We consider tail scenarios parametrized by the normalized return . For example, is a “3-sigma” scenario. Moreover, we use the notation to denote a large underlying move (not literally infinite). Basically, we think about typical scenarios of 3-5 sigma. These are not Black Swan events, as they happen quite often. However, they are large enough to cause substantial losses to option traders and trigger margin calls.

Suppose we have a portfolio with some gamma exposure such that, under a tail event we have:

(2) |

where the superscript in indicates the asymptotic quadratic dependency on . As we discussed in the introduction, a trader with a short position in will be asked by the broker to put more margin than the one with a long position. This is a cost of carry, because he/she could be investing this money somewhere else. In order to compensate this trader, the profits and losses (P&L) of must have a drift in the real-world measure:

(3) |

where we expect on average. We call the convexity or gamma risk premium. Note that we do not have to know any details about this portfolio, but only its asymptotic exposure to . In fact, the key assumption of this paper is that the form of such portfolio does not matter, and that any other portfolio, say , with the same tail risk will have the same drift. In other words, derivative markets only price tail risks and not “daily” variance.
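The quadratic growth of the tail loss can be illustrated with the standard second-order Taylor term of a delta-hedged position, one half times gamma times the squared spot move. This is a minimal sketch under stated assumptions: the function name `gamma_pnl`, the book size and the volatility level are hypothetical, and theta and higher-order Greeks are ignored.

```python
def gamma_pnl(gamma: float, spot: float, daily_vol: float, n_sigma: float) -> float:
    """Second-order P&L of a delta-hedged position for an n-sigma spot move.

    gamma     : net gamma (second derivative of portfolio value w.r.t. spot)
    spot      : current underlying price
    daily_vol : one-day return volatility (e.g. 0.01 for 1%)
    n_sigma   : size of the move in units of daily_vol
    """
    move = spot * daily_vol * n_sigma
    return 0.5 * gamma * move ** 2


# Hypothetical short-gamma book; the numbers are illustrative only.
short_gamma = -0.02
for n in (1.0, 3.0, 5.0):
    print(n, gamma_pnl(short_gamma, spot=100.0, daily_vol=0.01, n_sigma=n))
```

Because the loss scales with the square of the move, a 3-sigma scenario costs nine times as much as a 1-sigma scenario. This is why stress-test margin weighs far more heavily on the short-gamma side.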

A simple example of a portfolio with gamma exposure is the front VIX future contract. In figure 1 we compare the cumulative P&L of a short position in the front VIX contract with that of a long position in the front SPMINI contract. Both P&Ls have been risk managed so that they have the same daily risk on a scale of 20 days.^{1} We can clearly see that the VIX future has a greater risk premium than the SPMINI for the same daily risk. However, it also has larger drawdowns. In figure 2 we show the residual VIX future P&L conditioned on the SPMINI future P&L.^{2} It is clear that the short VIX future has a gamma component that causes quadratically large losses for large movements of the SPMINI. This is the reason for the extra premium!

^{1} More precisely, let be the daily P&L of the future contract. The risk managed P&L is given by .

^{2} The residual P&L is defined by , where , where are the risk-managed P&Ls of the VIX and SPMINI contracts respectively.

Now consider a portfolio such that,

(4) |

In equity markets, most traders are afraid of the left tail. This means that the trader with a long position in is exposed to cubic losses under a large drawdown. In such markets one expects to see a skew risk premium such that

(5) |

where on average. In FX or certain commodity markets, we do not expect to see such a risk premium, as market participants are equally afraid of both the left and right tails.

Note that this is a statement about risk aversion and not about the probability distribution of the market. In fact, one can argue that nobody knows the true real-world probability measure. However, all of us have capital requirements that become more stringent in falling equity markets (e.g. most investors are long equities by definition).

Finally, we introduce a kurtosis risk premium:

(6) | |||||

(7) |

where we expect on average.

One can imagine higher moments, but as most option traders know, it is increasingly difficult to get such exposures due to liquidity constraints. The higher the moment, the more we need to leverage the option book and the less capacity there is for such a strategy. Moreover, in the GARCH models studied below, we do not get higher order exposures if we restrict ourselves to second-order Greeks. The risk premia will turn out to be the only parameters that need to be fitted to option prices.

## 3 The GARCH(1,1) Model

In this section we study in detail the GARCH(1,1) model. This is the simplest model of the GARCH family and will serve to illustrate the main ideas. The goal of this section is to derive the pricing or martingale probability measure for this model using a tail risk argument. We begin by pricing a variance swap, and later move to price a general European contingent claim.

The GARCH(1,1) model is basically an EMA filter:

(8) | |||||

(9) | |||||

(10) |

where is the unconditional variance and is a parameter that controls the strength of the volatility autocorrelation.
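For readers who want to experiment, here is a small simulation of the textbook GARCH(1,1) recursion. It is a sketch, not a transcription of Eqs. (8) - (10): the parametrization in terms of `omega`, `alpha` and `beta`, the Gaussian noise, and the use of daily (non-annualized) variance are all assumptions of this example. In this parametrization the unconditional variance is `omega / (1 - alpha - beta)`.

```python
import random
from typing import List, Tuple


def simulate_garch11(n_days: int, omega: float, alpha: float, beta: float,
                     seed: int = 0) -> Tuple[List[float], List[float]]:
    """Simulate daily returns from a textbook GARCH(1,1) recursion.

        r[t]     = sqrt(var[t]) * eps[t],   eps[t] ~ N(0, 1) i.i.d. (assumed)
        var[t+1] = omega + alpha * r[t]**2 + beta * var[t]
    """
    if omega <= 0.0 or alpha < 0.0 or beta < 0.0 or alpha + beta >= 1.0:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns: List[float] = []
    variances: List[float] = []
    for _ in range(n_days):
        r = var ** 0.5 * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(var)
        var = omega + alpha * r * r + beta * var  # update with today's return
    return returns, variances


# Illustrative parameters only: unconditional daily variance of about 5e-5.
rets, variances = simulate_garch11(1000, omega=1e-6, alpha=0.08, beta=0.90, seed=42)
print(sum(r * r for r in rets) / len(rets), variances[-1])
```

The sum `alpha + beta` plays the role of the autocorrelation strength: the closer it is to one, the longer a volatility shock persists.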

### 3.1 Pricing a Variance Swap

Let’s now begin by pricing a variance swap contract with maturity date . We denote the price of such contract at time by . At expiry our variance swap pays , where is the expiry date and the number of days between and . Since it takes zero capital to enter such contract, the P&L of the variance swap between time and is given by

(11) |

We now assume that the price is a smooth function of time and the filter , . Moreover, note that the boundary condition is . Up to second order in variations of and assuming a small enough time step we have,

(12) |

This expansion will turn out to be exact in this case.

We now look at the tail risks of the variance swap. Using Eqs. (2), (6) and (10) in Eq. (12), we can decompose the asymptotic limit of the variance swap P&L as follows:

(13) |

We should emphasize that since we want to consider a general solution , we cannot compare the different terms in Eq. (13) as we do not know the magnitude of the derivatives. In fact, for the variance swap it turns out that is a linear function of and so the second derivative vanishes.

According to our argument in the previous section, any two portfolios with the same tail risks should have the same drift. Therefore, using Eqs. (3) and (7) and the asymptotics given in Eq. (13), we conclude that the drift of the variance swap must be given by

(14) | |||||

This is the fair-value equation for the variance swap under the real-world probability measure. More explicitly, we can write Eq. (14) as a PDE for :

(15) |

where

(16) | |||||

(17) | |||||

(18) | |||||

(19) |

and the boundary condition is . In writing Eq. (15) we have discarded a term quadratic in the drift of : . We find that, empirically, this is a very good approximation.

Using the Feynman-Kac formula, one can write the solution to Eq. (15) in terms of a continuous time stochastic volatility process:

Note that the pricing probability measure is just a mathematical trick to solve Eq. (15). However, it is very useful in order to get analytical solutions. In fact, in this case the solution can be calculated explicitly:

(20) |

where the value of the filter is given by Eq. (9) and

Since Eq. (20) is linear in , we can see that this is an exact solution to the variance swap price to all orders in vol-of-vol. In fact, the solution only depends on the convexity or gamma risk premium. Therefore, by calibrating the variance swap term structure we can obtain the value of . Moreover, notice how the gamma risk premium not only shifts the level of the varswap, but also changes the effective mean-reversion time scale, which in turn changes the slope of the term structure.
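The interplay between level and slope can be seen in a generic example. The sketch below uses the standard result for a variance that mean-reverts exponentially at rate `kappa` towards `long_run_var`; it is not a transcription of Eq. (20), and how the gamma risk premium enters these two quantities is not reproduced here. All names and numbers are assumptions for illustration.

```python
import math


def varswap_level(tenor_years: float, initial_var: float,
                  long_run_var: float, kappa: float) -> float:
    """Annualized variance-swap level for exponentially mean-reverting variance.

    Uses E[v_u] = long_run_var + (initial_var - long_run_var) * exp(-kappa * u),
    averaged over u in [0, tenor_years].
    """
    if tenor_years <= 0.0 or kappa <= 0.0:
        raise ValueError("tenor_years and kappa must be positive")
    decay = (1.0 - math.exp(-kappa * tenor_years)) / (kappa * tenor_years)
    return long_run_var + (initial_var - long_run_var) * decay


# Same starting variance, two hypothetical mean-reversion speeds and levels.
for tenor in (1.0 / 12.0, 0.25, 1.0):
    base = varswap_level(tenor, initial_var=0.02, long_run_var=0.04, kappa=4.0)
    shifted = varswap_level(tenor, initial_var=0.02, long_run_var=0.05, kappa=3.0)
    print(round(tenor, 3), base, shifted)
```

Raising the long-run level lifts the whole curve, while changing the mean-reversion speed changes how quickly the curve moves from the short end to the long end. A single parameter that affects both therefore alters the level and the slope of the term structure together.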

Finally, note that the level of the implied expected variance is shifted from the historical one, even at the smallest time step:

where is given by the historical estimate of Eq. (8). This invalidates the assumption of [3], who proposed that the one-period expected variance is the same in both the real-world and martingale probability measures.

### 3.2 Pricing Options

We now generalize the previous problem to price a European-style option with final payoff . For a delta-hedged option, the second order expansion reads:

(21) |

where is the P&L of the self-financed and delta-hedged option, and is the risk-free rate which we take to be constant. The tail risks now include a skew contribution due to the cross term . More precisely, we have:

(22) | |||||

Hence, the drift of the delta-hedged option is given by:

(23) |

which leads to the following PDE for the option price:

(24) |

where , and are defined in Eqs. (16) - (18) and

(25) |

Using the Feynman-Kac formula, we can write the solutions to Eq. (24) in terms of the following stochastic process

(26) | |||||

(27) | |||||

(28) | |||||

(29) | |||||

(30) |

As in the special case of the variance swap, the probability measure is the so-called martingale or pricing measure. It is interesting to note that we have derived a direct connection between the real-world and pricing measures parametrized by the three risk premia . These are the only parameters that must be inferred from the option prices. The rest is completely determined by historical data, including the initial value of the EMA filter .

Looking at Eq. (27) we notice how the convexity risk premium makes the implied volatility higher than the historical one (on average). Moreover, we can see that this risk premium cannot be absorbed into the probability measure using a Girsanov transformation on . There is a fundamental reason for this: the presence of comes from the fact that the option P&L is marked to market in discrete time. Another way to see this is that the underlying price is a martingale both in the real-world and pricing measures. Therefore, the volatility risk premium has nothing to do with the drift of the underlying. Many authors seem to confuse the volatility risk premium with the equity risk premium. Those are two completely different quantities. In fact, there are many assets which do not have any obvious risk premium (e.g. FX rates or some commodities). However, their options still show a volatility risk premium. Therefore, any attempt to derive the pricing measure by putting a martingale condition on is doomed to fail (see e.g. [3, 6]).

The skew risk premium makes the correlation between the spot and the volatility more negative. In fact, even if the underlying distribution is symmetric, we can still have a non-trivial implied leverage effect due to the skew risk premium. Finally, the kurtosis risk premium makes the implied vol-of-vol higher than the historical estimate.

## 4 Including Asymmetry and Multiple Time Scales

There is a considerable number of studies that give evidence of multiple time scales in volatility auto-correlations (see for example [12, 13, 14, 15, 16, 17, 18]). In fact, it has been argued that volatility auto-correlations decay as a power law [18]. One problem with a power-law filter is that it is non-Markovian. However, as shown in [19], one can always approximate a power-law filter with multiple exponentials. Hence, in this section we study a generalized GARCH model which is a linear combination of EMA filters with different time scales.
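To make the approximation of a power law by a few exponentials concrete, the sketch below fits least-squares weights for a fixed set of time scales. It is not the construction of [19]: the exponent, the lag range, the chosen time scales and all function names are assumptions for this illustration, and the unconstrained fit may return negative weights.

```python
import math
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_exponentials(lags: Sequence[float], target: Sequence[float],
                     time_scales: Sequence[float]) -> List[float]:
    """Least-squares weights w_i with sum_i w_i * exp(-lag / L_i) ~ target."""
    basis = [[math.exp(-lag / scale) for lag in lags] for scale in time_scales]
    gram = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    rhs = [sum(p * y for p, y in zip(bi, target)) for bi in basis]
    return solve_linear(gram, rhs)  # normal equations


def sum_squared_error(lags: Sequence[float], target: Sequence[float],
                      time_scales: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of squared differences between the exponential mixture and the target."""
    err = 0.0
    for lag, y in zip(lags, target):
        approx = sum(w * math.exp(-lag / s) for w, s in zip(weights, time_scales))
        err += (approx - y) ** 2
    return err


# Hypothetical power-law kernel lag**(-0.5) over lags of 1 to 250 days.
lags = list(range(1, 251))
target = [lag ** -0.5 for lag in lags]
scales = [2.0, 20.0, 200.0]
weights = fit_exponentials(lags, target, scales)
print(weights, sum_squared_error(lags, target, scales, weights))
```

Each exponential corresponds to one EMA filter, so the fitted mixture keeps a finite set of state variables, whereas a true power-law filter depends on the whole return history.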

Another stylized fact of volatility is the so-called leverage effect [20]. In other words, for equity indices, negative returns tend to increase future volatility more than positive ones. In the context of GARCH models, this is captured by adding a filter that depends only on past negative returns [21]. Hence, we will study the following general class of models:

(33) | |||||

(34) | |||||

(35) |

where , , , and the i.i.d. noise term has zero mean and unit standard deviation. Note that we do not have constant unconditional variance in Eq. (34) as we did in the simple GARCH(1,1) model. However, we can always take one of the time scales to infinity, say . This way we can recover the usual GARCH(1,1) model for example. In practice we will take days. This way we avoid too much in-sample bias as we only use past observations and we avoid having to fit the long term unconditional variance.
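The structure of such a model can be sketched in a few lines. The code below is only an illustration of the idea of mixing symmetric filters, which track squared returns, with asymmetric filters, which track squared negative returns only. The `1/L` weighting, the initialization, the daily (non-annualized) units and all names are assumptions of this example rather than a transcription of Eqs. (33) - (35).

```python
from typing import List, Sequence


def update_filter(previous: float, observation: float, time_scale_days: float) -> float:
    """One EMA step; the 1/L weighting is an assumed convention."""
    weight = 1.0 / time_scale_days
    return (1.0 - weight) * previous + weight * observation


def variance_forecast(returns: Sequence[float],
                      sym_scales: Sequence[float], sym_weights: Sequence[float],
                      asym_scales: Sequence[float], asym_weights: Sequence[float],
                      initial_var: float) -> List[float]:
    """Daily variance forecasts from symmetric and downside-only EMA filters.

    forecast[t] uses returns up to and including day t.
    """
    sym = [initial_var] * len(sym_scales)
    # For symmetric noise roughly half of the variance comes from negative returns.
    asym = [0.5 * initial_var] * len(asym_scales)
    forecasts: List[float] = []
    for r in returns:
        squared = r * r
        downside = squared if r < 0.0 else 0.0
        sym = [update_filter(f, squared, s) for f, s in zip(sym, sym_scales)]
        asym = [update_filter(f, downside, s) for f, s in zip(asym, asym_scales)]
        total = sum(w * f for w, f in zip(sym_weights, sym))
        total += sum(w * f for w, f in zip(asym_weights, asym))
        forecasts.append(total)
    return forecasts


# Hypothetical weights and time scales; a -3% and a +3% day are compared.
args = ([5.0, 100.0], [0.3, 0.3], [5.0], [0.8], 1.0e-4)
print(variance_forecast([-0.03], *args)[0], variance_forecast([0.03], *args)[0])
```

With a positive weight on the asymmetric filter, a negative return raises the forecast more than a positive return of the same size, which is the leverage effect described above.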

In order to find the pricing measure for this model, we can go over the same argument as in section 3. However, when expanding the option P&L we will now have the following new tail risks:

(36) | |||

(37) | |||

(38) |

where is one of the asymmetric filters. We can now imagine ideal portfolios, so that

for . In order to avoid introducing more risk premia for our model, we will argue that in equity markets, investors are only afraid of large negative returns. In other words, they only value downside tail risk. Therefore, these new tail risks must have the same drift as the symmetric ones:

(39) | |||||

(40) | |||||

(41) |

In order to value an option, we assume as before that its price is a smooth function of time, the spot and the filters: . Expanding to second order in variations and taking into account the tail risks as in the previous section, we get the following PDE:

(42) |

where we have dropped terms quadratic in the drift of and we have defined the following variables:

(43) | |||||

(44) | |||||

(45) | |||||

(46) | |||||

(47) | |||||

(48) | |||||

(49) |

Moreover, the correlation between the filters is one if both are symmetric or asymmetric (), but the correlation between a symmetric and asymmetric filter is:

(50) |

for .

Using the Feynman-Kac formula, we can relate the solutions of Eq. (42) to the following stochastic volatility model:

(51) | |||||

(52) | |||||

(53) | |||||

(54) | |||||

(55) | |||||

(56) |

The Brownian motions can be decomposed into a few PCA factors as follows:

where all Brownian motions on the RHS are uncorrelated, and

(58) | |||||

(59) | |||||

(60) | |||||

(61) |

Consistency of the model requires the following constraints:

(62) | |||||

(63) | |||||

(64) | |||||

(65) |

Note that condition (65) is the most stringent bound. This condition translates into a minimum bound for the kurtosis risk premium:

(66) |

Note that Eq. (66) only applies for the general case where we have both symmetric and asymmetric filters. If all filters are symmetric () we only need to enforce Eq. (62). On the other hand, if all filters are asymmetric (), we only need to impose Eq. (63).

Empirically, we find that Eq. (66) is saturated most of the time. The saturation of this condition can be interpreted as saying that, in this class of models, volatility has only one risk factor (apart from the spot moves). This makes sense, as in the discrete model, all filters are driven by the spot returns (squared). It would be interesting to generalize the model so that we generate more volatility risk factors. This can be done, for example, by adding a high-frequency filter to our volatility estimate.

To conclude this section, we will price forward variance and a variance swap for the model of Eqs. (52) - (56). We will make ample use of these results in the next section. We begin by introducing the matrix:

and its eigenvalue decomposition,

Forward variance is defined as

(67) |

For our multi-scale model, we get

Moreover, note that forward variance is a martingale:

(68) |

The variance swap is simply the integrated forward variance:

(69) | |||||
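Numerically, the relation between forward variance and the variance swap amounts to averaging a curve over the life of the swap. The following sketch does this with the trapezoid rule; the normalization by the tenor, the example curve and the function name are assumptions for illustration and are not taken from Eq. (69).

```python
import math
from typing import Callable


def variance_swap_from_forward_variance(forward_var: Callable[[float], float],
                                        tenor_years: float,
                                        n_steps: int = 1000) -> float:
    """Average a forward variance curve over [0, tenor_years] (trapezoid rule)."""
    if tenor_years <= 0.0 or n_steps < 1:
        raise ValueError("tenor_years must be positive and n_steps at least 1")
    step = tenor_years / n_steps
    total = 0.5 * (forward_var(0.0) + forward_var(tenor_years))
    for i in range(1, n_steps):
        total += forward_var(i * step)
    return total * step / tenor_years


def example_curve(u: float) -> float:
    """Hypothetical curve that relaxes from 0.02 towards 0.04 at rate 4 per year."""
    return 0.04 + (0.02 - 0.04) * math.exp(-4.0 * u)


for tenor in (0.25, 0.5, 1.0):
    print(tenor, variance_swap_from_forward_variance(example_curve, tenor))
```

Working with a generic callable keeps the integration step independent of the model, so any forward variance curve, analytic or tabulated, can be plugged in.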

## 5 The Bergomi-Guyon Expansion

In order to find the value of the risk premia , we need to fit our stochastic volatility model to option prices. However, our model does not have an analytical solution, so we will employ the perturbative expansion developed by Bergomi and Guyon in [11]. This is basically a vol-of-vol expansion (i.e. an expansion in ). In this section we will derive closed-form formulas for various implied moments which will be used in the next section to calibrate our models. This will avoid the use of Monte Carlo simulations in the calibration. We will not address the accuracy of the expansion.

The main result of [11] is that, to second order in vol-of-vol, option prices can be approximated by certain functionals of the initial forward variance curve. More precisely, let be the log-price of the underlying at the initial time , and be the option price evaluated at zero vol-of-vol (). Moreover, we take to be the time to expiry. Then, to second order in vol-of-vol, we can approximate the option price as

(70) | |||||

where

(71) | |||||

(72) | |||||

(73) | |||||

(74) | |||||

(75) |

and is the option payoff at expiry. Note that all integrals are functionals of the initial forward variance curve , and their functional derivative is defined such that:

where is the Dirac delta function.

One useful special case of Eq. (70) is the moment generating function, which can be derived by using the following payoff: . We have,

(76) | |||||

(77) |

where

(78) |

Using Eqs. (52) and (68) it is straightforward to evaluate the integrals (72) - (74):

(79) | |||||

(80) | |||||

(81) |

where^{3}

^{3} In deriving Eqs. (79) - (81) we have used the following approximation: . Any corrections to this approximation will lead to at least cubic order corrections in to the equations above.

(82) | |||||

(83) | |||||

(84) | |||||