Arbitrage-Free Regularization

Anastasis Kratsios    Cody B. Hyndman
Department of Mathematics and Statistics, Concordia University, 1455 Boulevard de Maisonneuve Ouest, Montréal, Québec, Canada H3G 1M8.
July 16, 2019

We introduce a path-dependent geometric framework which generalizes the HJM modeling approach to a wide variety of other asset classes. A machine learning regularization framework is developed with the objective of removing arbitrage opportunities from models within this general framework. The regularization method relies on minimal deformations of a model subject to a path-dependent penalty that detects arbitrage opportunities. We prove that the solution of this regularization problem is independent of the arbitrage-penalty chosen, subject to a fixed information loss functional. In addition to the general properties of the minimal deformation, we also consider several explicit examples. This paper is focused on placing machine learning methods in finance on a sound theoretical basis and the techniques developed to achieve this objective may be of interest in other areas of application.

Keywords: HJM framework, Functional Itô-Calculus, Forward-Rate Curves, Stochastic Local Volatility Surfaces, Stochastic Differential Geometry, Flows, Arbitrage, Deformations, Γ-Convergence, Information Theory, Machine Learning.

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors thank Alina Stancu (Concordia University) for helpful discussions.

Mathematics Subject Classification (2010): 91G30, 91G20, 68T05, 65J20, 60D05, 94A15.

1 Introduction

The Heath, Jarrow, and Morton [28] framework, henceforth referred to as HJM, was originally developed to model the term-structure of interest rates. In a similar manner we may consider models of an observable asset price, such as a European call option as in [16], [8] and [34], as a function of an unobservable infinite-dimensional stochastic process. In the case of a zero-coupon bond, the unobservable process is the curve of forward interest rates. In the case of European call options the unobservable process is the surface describing the volatilities of all European call options on the same underlying stock. This approach allows for a simultaneous description of the entire market and implies a drift restriction on the unobservable process characterizing the absence of arbitrage opportunities. In [8], [3], and [44] the HJM framework was extended to other asset classes.

A central computational difficulty of using the HJM framework is that it relies on an infinite-dimensional process. In practice this is typically overcome by assuming a finite-dimensional factor model for the unobservable process. Common examples include the Nelson and Siegel [42] model for the forward-rate curve or the SVI-JW model of [23] for the stochastic volatility surface. The existence of finite-dimensional factor models consistent with an HJM model was characterized in [4] using geometric methods.

Factor models may present their own shortcomings since, for example, it is not necessarily clear if they allow for arbitrage opportunities. Under a few modeling assumptions the question of when a finite-dimensional factor model for the forward-rate curve allows arbitrage opportunities was characterized by [19] using geometric methods. The particular case of the Nelson-Siegel model was further addressed in [18]. In Section 2 of this paper we prove a characterization of the absence of arbitrage for a geometric and path-dependent generalization of the HJM framework that is capable of modeling a wide variety of asset classes. Our techniques rely on the functional Itô calculus of [15] and [21]. The characterization of the absence of arbitrage, in terms of a stochastic partial differential equation (SPDE), generalizes the results of [19].

Models similar to the Nelson and Siegel [42] model, but which do not admit arbitrage, were proposed in [14] and [13]. In Section 3 we consider the general problem of how a model that may admit arbitrage opportunities can be best corrected. To solve this problem we employ the machine learning technique of regularizing a loss function against a penalty term. Our regularization framework considers deformations of a given factor model and selects the particular deformation of that model which satisfies the no-arbitrage condition while simultaneously retaining as much information embedded in the original factor model as possible. The retention of information is measured according to the information-theoretic functional Bregman divergence of [22]. To this end, our approach explicitly constructs an arbitrage penalty using the SPDEs of Section 2 and we prove the existence of an optimizer using the Γ-convergence techniques of [6]. Further, we show that the best arbitrage-free deformation of a model is unique and independent of the particular arbitrage penalty chosen once the information loss is fixed. We then show the existence of an optimal deformation of the factor model, prove several properties of our regularization operator, and close with a few examples. An appendix contains technical results and proofs. To motivate our approach we first recall two HJM-type models for the forward rate and stochastic local volatility and close this section with associated factor models.

1.1 The HJM-Type Framework and Geometric Functional Factor Models

The HJM-type framework has been considered by many authors, such as [28, 8, 16, 34], both within and outside the forward-rate modeling setting. We begin by reviewing two versions of this modeling approach.

Example 1.1 (Instantaneous Forward-Rate Process).

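Consider a zero-coupon bond with maturity $T$. We denote its price at time $t$ by $P(t,T)$. Let the instantaneous forward-rate curve process be defined by

$$f(t,T) \triangleq -\frac{\partial \log P(t,T)}{\partial T} \tag{1.1}$$

for $0 \le t \le T$. In [28] the forward rate $f(t,T)$ is interpreted as the instantaneous interest rate at time $T$ as viewed from the present time $t$. In particular $f(t,t)$ is the instantaneous interest rate in effect at time $t$, called the short rate $r_t$. The process $f(t,T)$ is assumed to follow an $\mathbb{R}$-valued diffusion process

$$f(t,T) = f(0,T) + \int_0^t \alpha(s,T)\,ds + \int_0^t \sigma(s,T)\cdot dW_s \tag{1.2}$$

where $W$ is an $\mathbb{R}^d$-valued Wiener process.

In [28] an explicit drift restriction on the dynamics of $f(t,T)$ is shown to characterize when the HJM model is arbitrage-free. In order for the dynamics of $f(t,T)$ given by equation (1.2) to be arbitrage-free the drift $\alpha$ must be expressed as a function of the volatility $\sigma$ as

$$\alpha(t,T) = \sigma(t,T)\cdot\int_t^T \sigma(t,s)\,ds. \tag{1.3}$$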

The HJM no-arbitrage condition of equation (1.3) was generalized in [31] to allow for forward-rate models driven by infinite-dimensional Lévy processes.
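The drift restriction can be checked numerically. The following is a minimal sketch, assuming the one-factor (scalar Wiener process) form $\alpha(t,T)=\sigma(t,T)\int_t^T\sigma(t,s)\,ds$ of the HJM condition and a hypothetical exponentially damped volatility $\sigma(t,T)=\sigma_0 e^{-\kappa(T-t)}$; the names `hjm_drift`, `SIGMA0`, and `KAPPA` are illustrative and do not come from [28].

```python
import math

def hjm_drift(sigma, t, T, n_steps=2000):
    """One-factor HJM drift: sigma(t, T) * integral_t^T sigma(t, s) ds."""
    if T < t:
        raise ValueError("maturity T must not precede time t")
    h = (T - t) / n_steps
    # Composite trapezoidal rule for s -> sigma(t, s) on [t, T].
    integral = 0.5 * (sigma(t, t) + sigma(t, T))
    for i in range(1, n_steps):
        integral += sigma(t, t + i * h)
    integral *= h
    return sigma(t, T) * integral

# Hypothetical volatility parameters (assumptions for illustration only).
SIGMA0, KAPPA = 0.01, 0.5

def sigma_exp(t, T):
    """Exponentially damped volatility sigma0 * exp(-kappa * (T - t))."""
    return SIGMA0 * math.exp(-KAPPA * (T - t))

def closed_form_drift(t, T):
    """Closed form of the drift for sigma_exp, obtained by direct integration."""
    tau = T - t
    return SIGMA0 ** 2 * math.exp(-KAPPA * tau) * (1.0 - math.exp(-KAPPA * tau)) / KAPPA

alpha_num = hjm_drift(sigma_exp, 0.0, 5.0)
alpha_exact = closed_form_drift(0.0, 5.0)
```

For this volatility the numerical drift and the closed form agree up to quadrature error, which illustrates that the volatility alone determines the arbitrage-free drift.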

Example 1.2 (Stochastic Local Volatility).

The HJM approach was used in [16] to jointly consider the dynamics of an underlying stock price process and a European call option with strike price and maturity time . The stock price and the European call option prices are connected through the stochastic local volatility surface defined by the boundary value problem


where and .

Analogous to equation (1.2) the process is assumed to follow an -valued diffusion process whose dynamics are described by

where is an -valued Wiener process. Likewise, it was shown in [8] that, under further assumptions on the dynamics of , the absence of arbitrage of the model for may be characterized as a drift restriction. The dynamics of are then modelled using the at-the-money point on the stochastic local volatility surface at the current time as

where is the instantaneous spot volatility .

1.2 Factor Models

Factor models are commonly employed in practice due to the interpretability of the driving risk-factors. Common factor models for the forward-rate curve include the Nelson and Siegel [42] model and its generalizations. Similarly, factor models for volatility surfaces such as the stochastic volatility inspired (SVI) model of [23] are reviewed here to motivate the geometric structure which forms the basis of this paper.

Example 1.3 (Nelson-Siegel Model).

The Nelson and Siegel [42] model for the forward-rate or for the yield curve is given by


The parameters , , , and are interpreted as level, slope, curvature, and shape parameters respectively [14]. We require that the parameters belong to the set


for some parameter chosen to set a lower bound for rates.
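To make the roles of the level, slope, curvature, and shape parameters concrete, the following is a minimal sketch assuming the common parametrization $f(\tau)=\beta_0+\beta_1 e^{-\lambda\tau}+\beta_2\lambda\tau e^{-\lambda\tau}$ of the Nelson-Siegel forward curve in time-to-maturity $\tau$. The symbols `beta0`, `beta1`, `beta2`, `lam` and the numerical values are assumptions for illustration; other references use an equivalent form with $\tau/\lambda$ in place of $\lambda\tau$. The bond price is obtained by integrating the forward curve, as in the definition of the forward rate in Example 1.1.

```python
import math

def ns_forward(tau, beta0, beta1, beta2, lam):
    """Nelson-Siegel forward rate at time-to-maturity tau >= 0 (decay rate lam > 0)."""
    decay = math.exp(-lam * tau)
    return beta0 + beta1 * decay + beta2 * lam * tau * decay

def ns_bond_price(tau, beta0, beta1, beta2, lam):
    """Zero-coupon bond price exp(-integral_0^tau f(u) du), integral in closed form."""
    decay = math.exp(-lam * tau)
    slope_load = (1.0 - decay) / lam        # integral of exp(-lam * u)
    curv_load = slope_load - tau * decay    # integral of lam * u * exp(-lam * u)
    integral = beta0 * tau + beta1 * slope_load + beta2 * curv_load
    return math.exp(-integral)

# Hypothetical parameter values.
params = dict(beta0=0.04, beta1=-0.02, beta2=0.01, lam=0.6)
short_rate = ns_forward(0.0, **params)    # equals beta0 + beta1
long_rate = ns_forward(200.0, **params)   # approaches the level beta0
price_10y = ns_bond_price(10.0, **params)
```

In this parametrization the short end of the curve is `beta0 + beta1` and the long end converges to `beta0`, so a lower bound on rates translates into inequality constraints on the parameters.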

Example 1.4 (SVI-JW).

Let denote the forward price process of the stock price process and denote the Black-Scholes price of a European call option on with strike price , maturity , and instantaneous volatility . Here is a moneyness parameter. The Black-Scholes implied volatility is denoted by and the total implied variance is defined by

The map is referred to as the volatility surface and for a fixed maturity the map is a slice of the volatility surface. Following [24] we write for a given maturity slice, dropping the -dependence, where represents a set of parameters.

The original, or raw, SVI parameterization of the total implied variance is given by


for all , where the parameters belong to the set


The parametric restrictions ensure a non-negative volatility surface and preclude calendar spread and butterfly arbitrage opportunities as shown in [24]. The parameters , , , , and control the general level of variance, the slopes of both the put and call wings, the slope of the left wing with respect to the right wing, the centre of the smile, and the ATM curvature respectively.
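As an illustration of the raw parameterization, the following sketch assumes the standard form $w(k)=a+b\{\rho(k-m)+\sqrt{(k-m)^2+\sigma^2}\}$ of [24], with $k$ the log-moneyness and $b>0$, $|\rho|<1$, $\sigma>0$. The function names and parameter values are hypothetical. The slice attains its minimum $a+b\sigma\sqrt{1-\rho^2}$ at $k^\ast=m-\rho\sigma/\sqrt{1-\rho^2}$, so non-negativity of the total variance reduces to a single inequality on the parameters.

```python
import math

def raw_svi(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(k) at log-moneyness k."""
    return a + b * (rho * (k - m) + math.sqrt((k - m) ** 2 + sigma ** 2))

def raw_svi_minimum(a, b, rho, m, sigma):
    """Location and value of the minimum of k -> w(k); needs b > 0 and |rho| < 1."""
    root = math.sqrt(1.0 - rho ** 2)
    k_star = m - rho * sigma / root
    return k_star, a + b * sigma * root

# Hypothetical parameter values.
chi = dict(a=0.02, b=0.4, rho=-0.4, m=0.1, sigma=0.2)
k_star, w_min = raw_svi_minimum(**chi)
```

Evaluating `raw_svi` on a grid of log-moneyness values and comparing against `w_min` gives a quick sanity check that a calibrated slice stays non-negative.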

The SVI-jump-wings (SVI-JW) parameterization of the original SVI model was introduced in [24] to provide a method of calibrating the model in a manner that excludes static arbitrage and which is easier to interpret. In [24, Section 3.3] the SVI-JW surface is obtained by substituting the functions of the maturity time of the option, as well as the parameters , as follows:

into the raw-SVI surface


where , , and the dependence of the functions on the raw parameters and is omitted for legibility. The parameters are interpreted as ATM variance, skew, left wing slope, right wing slope, and minimum implied variance respectively.
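The relationship between the two parameterizations can be sketched in code. The following assumes the raw-to-JW conversion formulas as commonly stated following [24, Section 3.3], namely $v_t = w_t/t$ with $w_t = a+b\{-\rho m+\sqrt{m^2+\sigma^2}\}$, $\psi_t = \frac{b}{2\sqrt{w_t}}\big(\rho - \frac{m}{\sqrt{m^2+\sigma^2}}\big)$, $p_t = b(1-\rho)/\sqrt{w_t}$, $c_t = b(1+\rho)/\sqrt{w_t}$, and $\tilde v_t = (a+b\sigma\sqrt{1-\rho^2})/t$. The dictionary keys and parameter values are hypothetical.

```python
import math

def raw_svi(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(k) at log-moneyness k."""
    return a + b * (rho * (k - m) + math.sqrt((k - m) ** 2 + sigma ** 2))

def raw_to_jw(t, a, b, rho, m, sigma):
    """Map raw SVI parameters of the maturity-t slice to SVI-JW parameters."""
    root = math.sqrt(m ** 2 + sigma ** 2)
    w_t = a + b * (-rho * m + root)  # ATM total variance, equal to w(0)
    sqrt_w = math.sqrt(w_t)
    return {
        "v": w_t / t,                                    # ATM variance
        "psi": b / (2.0 * sqrt_w) * (rho - m / root),    # ATM skew
        "p": b * (1.0 - rho) / sqrt_w,                   # left (put) wing slope
        "c": b * (1.0 + rho) / sqrt_w,                   # right (call) wing slope
        "v_min": (a + b * sigma * math.sqrt(1.0 - rho ** 2)) / t,  # minimum implied variance
    }

# Hypothetical raw parameters for a six-month slice.
raw = dict(a=0.02, b=0.4, rho=-0.4, m=0.1, sigma=0.2)
jw = raw_to_jw(t=0.5, **raw)
```

Because each JW quantity is normalized by the maturity or by the ATM total variance, the resulting parameters are easier to compare across maturity slices than the raw ones.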

Examples 1.3 and 1.4 indicate that geometric constraints on model parameters are important features which allow for realistic and parsimonious modelling of financial variables and may also characterize the absence of arbitrage. However, these examples are static and may only be used to calibrate the model to financial data at a particular point in time. In practice if one of these factor models, or any model from computational finance, is recalibrated after the arrival of new data then the model parameters will likely change and may not adhere to the modeling constraints. In the next section we consider the geometric characterization of no-arbitrage in a large class of dynamic models.

2 Characterization of Arbitrage

In this section we characterize the non-existence of arbitrage opportunities in path-dependent geometric models generalizing the HJM framework. By no-arbitrage we mean no free lunch with vanishing risk (NFLVR) in the terminology of [12]. That is, there does not exist a sequence of self-financing portfolios that converge to a risk-free strategy with a strictly positive payoff. In [12] it was shown that if the modelled asset follows a continuous semi-martingale, then the non-existence of a free lunch with vanishing risk is equivalent to the existence of an equivalent local martingale measure.

2.1 Arbitrage-Free Flows

We develop a general modeling approach which is parsimonious, generalizing the previously presented examples of factor models, and which incorporates the important properties of the HJM framework such as non-Markovian dynamics and a drift-restriction characterizing the absence of arbitrage. To motivate our framework we consider two examples.

Example 2.1 (Dynamic Nelson-Siegel).

Define the functional


where is the set of constraints given in equation (1.6), , , and is an -valued diffusion process.

The process , which is independent of , describes a forward-rate curve process driven by a -dimensional diffusion process. This provides a dimension reduction as compared to the infinite-dimensional diffusion of Example 1.1. Inverting equation (1.1) we obtain a family of non-anticipative functionals defined by


where is the quadratic variation of , is the space of -valued paths, and is the space of paths taking values in the set of symmetric positive definite matrices as defined in [21, Section 2.2].

The price at time of a zero-coupon bond with maturity , may be represented in terms of the Dynamic Nelson-Siegel model as

Example 2.2 (Dynamic SVI-JW).

Define the functional


where is the interior of the constraint set described in equation (1.7), , , and let be an -valued diffusion process.

The process is a stochastic local volatility surface model in the sense of Example 1.2, driven by a -dimensional diffusion, as opposed to the infinite-dimensional stochastic local volatility of Example 1.2.

As noted in [8], equation (1.4) may be transformed into the uniformly parabolic initial value problem


by performing the change of variables , and . Since is strictly positive, working with stochastic local variance is equivalent to working with . The uniform parabolicity of equation (2.5) implies that for every there exists a unique solution to the initial-value problem (2.5). Therefore equation (1.4) can be inverted through the solution operator


by mapping any to the solution of the initial value problem (2.5). Analogous to equation (2.2) we define a family of non-anticipative functionals by


which represents the price of a European call option with maturity and strike in terms of the SVI-JW model by


In general, a non-linear functional for the unobservable process plays the role of the functional of equations (2.1) and (2.4). It depends on the current time , the inputs of the unobservable process, and the -dimensional parameter vector . The deterministic factor model is made more flexible by replacing the static parameters by a -dimensional diffusion process parameterizing .

Similar to Examples 2.1 and 2.2, the process driving the parameters may be required to take values in a submanifold of in order to ensure certain modeling constraints are met. To ensure that does not leave in finite time we construct a new stochastic process by identifying the tangent spaces of with and rolling along the trajectory of on without slipping (see [30, Section 2.3] for details). Assuming regularity conditions listed in Appendix A, is endowed with a particular affine connection making into a stochastically and geodesically complete Riemannian manifold. Consequently, is always defined (see Lemma A.1 for details). As a conceptual example, when the processes and are indistinguishable.

In [4, Section 3.3] the authors consider an analogous setup where the functional takes values in a particular weighted Sobolev space , which is a Hilbert space, termed the space of forward-rate curves. Assuming this exact setup will cause some technical issues when deriving SPDEs characterizing the absence of arbitrage. We therefore assume that the process takes values in a Reproducing Kernel Hilbert Space (RKHS) of functions. It was shown in [1] that every RKHS is uniquely determined by a symmetric positive definite kernel. Since the quadratic variation of a semi-martingale defines a symmetric positive-definite matrix-valued process, there is a natural RKHS associated to . We will denote this RKHS by and describe its construction in further detail in Appendix B (see [38] for details on RKHSs).

Definition 2.3 (Geometric Representation).

Let denote the Lebesgue measure on , be an -measurable function from to , be a connected Borel subset of , and denote the -weighted Lebesgue space from to by . Furthermore let be an -valued stochastic process with -a.s. càdlàg paths.

The process is said to have a geometric representation with geometry if and only if there exists a -dimensional Riemannian submanifold of with corresponding affine connection , for some , a functional , an -valued semi-martingale , a RKHS of functions from to denoted by , and a family of maps defined by


satisfying the following regularity conditions:

  1. For every in the processes and are indistinguishable -a.e,

  2. The maps are -diffeomorphisms onto their images,

  3. For every the map

    is continuously differentiable.

  4. For every the map


    is an element of the RKHS .

We say that is the realization of the geometry .

In general our framework allows for the maps to depend on the time parameter , which is not the case in Examples 2.1 and 2.2. If we were to include as a dimension of the manifold, this would result in the model being infinite-dimensional. If instead we expand our horizon from finite-dimensional manifolds to a continuously deforming finite-dimensional manifold, known as a flow in [41], then we may still preserve finite-dimensionality. To see this note that the process is consistent with the sub-manifold

of precisely at every time . Geometrically the family of manifolds are interpreted as smoothly deforming in time via the diffeomorphisms


where is defined by equation (2.9).

We may endow each of the manifolds with a stochastically and geodesically complete Riemannian metric defined by pulling-back the affine connection across the diffeomorphism (see [32] for details). This construction is an analogue of the idea of a time-dependent family of manifolds, or flow, studied in [41]. We interpret the flow as deforming continuously in time to best incorporate current modeling requirements at time . We shall be primarily concerned with flows that are arbitrage-free.

Example 2.4 (Band-limited HJM models).

The space of band-limited functions from to , denoted by , is the RKHS with underlying set of continuous functions from to satisfying


Assume that is a forward-rate curve satisfying


and is band-limited in with for every . Then takes values in the set of functions satisfying equation (2.12) which can be given the structure of a RKHS with reproducing kernel

(see [38, Example 4.2] for details). Let be a basis of , then equation (2.13) has weak solution (see [36, Chapter 10.3] for details on weak solutions to SDEs)

For every define the linear map by


Consider the projection operator mapping onto the span of . Then, similar to [13] where the projection of an HJM model onto the span of the Nelson-Siegel family was considered, we consider the projection of the weak solution of equation (2.4) onto given by


Then is a geometric representation of the weak -dimensional approximation of defined by equation (2.15), where is the Euclidean connection on , and are the maps defined in equation (2.9).
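The RKHS structure of band-limited functions can be illustrated numerically. The following is a minimal sketch, assuming the band limit $\pi$, for which the reproducing kernel of the space of square-integrable functions with Fourier transform supported in $[-\pi,\pi]$ is the normalized sinc kernel $K(x,y)=\sin(\pi(x-y))/(\pi(x-y))$. The band limit, the sample points, and the function name `sinc_kernel` are assumptions for illustration and need not match the kernel of Example 2.4.

```python
import numpy as np

def sinc_kernel(x, y):
    """Sinc kernel K(x, y) = sin(pi (x - y)) / (pi (x - y)) evaluated on all pairs."""
    # np.sinc is the normalized sinc, sin(pi z) / (pi z), with value 1 at z = 0.
    return np.sinc(np.subtract.outer(x, y))

# Gram matrix on a hypothetical set of distinct sample points.
points = np.array([0.0, 0.3, 1.1, 2.5, 4.0])
gram = sinc_kernel(points, points)

# A reproducing kernel must yield symmetric positive semi-definite Gram matrices.
min_eig = np.linalg.eigvalsh(gram).min()

# On the integers the kernel sections are orthonormal, so the Gram matrix is the identity.
integers = np.arange(4.0)
gram_integers = sinc_kernel(integers, integers)
```

The identity Gram matrix on the integers reflects the sampling theorem: the translates $K(\cdot,n)$ form an orthonormal basis of this space, which is one way to obtain finite-dimensional approximations by truncating the basis.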

Assumption 2.5.

From now on we assume that solves the diffusion equation

where and are essentially bounded and predictable with respect to the canonical filtration generated by a -dimensional Brownian motion .

This construction may be extended to the HJM framework inspired by the codebook methodology of [16]. The ideas in this paper may be translated to this generalization of the HJM-framework which was further explored in [34] and adapted to a variety of asset classes as in [8]. Here we assume that the parameterization of the observable asset in terms of the unobservable driving process is achieved by a time-dependent functional of the path of the geometry which does not look into the future (see [21, Definition 2.1]).

Following [11], recall that the horizontal (respectively, the vertical ) derivative of a non-anticipative path-dependent functional of a càdlàg path is defined by extending the path constantly in time (respectively, in space) infinitesimally and measuring the infinitesimal rate of change. Conceptually, it can be interpreted as an adaptive version of the Malliavin derivative of [37] through the intertwining formula

(see [2, Theorem 7.4.1]). In contrast to the Malliavin derivative, which is the adjoint of the non-adaptive Skorokhod integral, the vertical derivative is the adjoint of the Itô-integral and so satisfies a stochastic integration by parts formula (see [2, Table 7.1] for a summary of these facts). Moreover, the interpretation of the horizontal derivative is that it is a rough time-derivative.
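The two derivatives can be approximated by finite differences on a sampled path. The following is a minimal sketch, assuming a path stored as lists of sample times and values and a functional given as a Python callable on such a pair; all names are hypothetical. The horizontal derivative freezes the terminal value and extends the path in time, while the vertical derivative bumps only the terminal value (a central difference is used here as a numerical convenience).

```python
def running_integral(times, values):
    """F(x_t) = integral_0^t x(s) ds, as a left Riemann sum over the sampled path."""
    return sum(values[i] * (times[i + 1] - times[i]) for i in range(len(times) - 1))

def terminal_square(times, values):
    """F(x_t) = x(t)^2, depending on the path only through its terminal value."""
    return values[-1] ** 2

def horizontal_derivative(functional, times, values, h=1e-6):
    """Extend the path on [t, t + h] with its terminal value held constant."""
    ext_times = list(times) + [times[-1] + h]
    ext_values = list(values) + [values[-1]]
    return (functional(ext_times, ext_values) - functional(times, values)) / h

def vertical_derivative(functional, times, values, e=1e-6):
    """Bump only the terminal value of the path by +/- e (central difference)."""
    up = list(values[:-1]) + [values[-1] + e]
    down = list(values[:-1]) + [values[-1] - e]
    return (functional(times, up) - functional(times, down)) / (2.0 * e)

# Hypothetical sampled path on [0, 1] with terminal value 3.
times = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [1.0, 1.5, 0.5, 2.0, 3.0]

dt_integral = horizontal_derivative(running_integral, times, values)  # close to x(t) = 3
dx_integral = vertical_derivative(running_integral, times, values)    # 0: a terminal bump is not seen
dt_square = horizontal_derivative(terminal_square, times, values)     # 0: no explicit time dependence
dx_square = vertical_derivative(terminal_square, times, values)       # close to 2 x(t) = 6
```

The running integral has a non-trivial horizontal derivative but vanishing vertical derivative, whereas the terminal square behaves in the opposite way, which matches the interpretation of the horizontal derivative as a time derivative and the vertical derivative as a space derivative.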

Following [21], denote the class of boundedness-preserving functionals of a path by . Denote the horizontal and vertical derivative operators of the path by and respectively (see Definitions 2.6 and 2.8 of [21]). Write for the class of once continuously horizontally and twice continuously vertically differentiable functionals of a path. Finally, denotes the class of all left-continuous functionals of the path.

Definition 2.6 (Flow Model).

Let have the geometric representation and let be an asset price such that there exists a non-anticipative functional in satisfying


where (resp. ) is the quadratic-variation of (resp. ). If satisfies the following regularity conditions

  1. is predictable in its second argument,

  2. , , are all in ,

  3. are all in , and

  4. is horizontally Lipschitz ([21])

then the triple is said to be a flow model. The process is referred to as the observable asset, the geometric representation of is said to be the flow model’s geometry, and the non-anticipative functional is said to encode the geometry into .

Remark 2.7.

The terminology "encoding" and Definition 2.6 are inspired by the codebook terminology of [16], [8], and [34].

Definition 2.8 (Arbitrage-Free Flow).

A flow model satisfying the property that follows arbitrage-free dynamics is called an arbitrage-free flow.

2.2 Characterization of Arbitrage-Free Dynamics

We make use of the functional Itô calculus developed in [15] and made rigorous by [21] in order to generalize the HJM drift restriction characterizing when the dynamics of the observable asset are arbitrage-free. Our results are consistent with the classical results of [28] and those of [19] in the case of the forward-rate codebook for bonds. We begin by motivating the central result in the case of the HJM forward rate and the stochastic local-volatility surface.

Proposition 2.9 (Arbitrage-Free Characterization for the Forward-Rate Curve Encoding).

Let be a realization of the Flow Model representing the price of a zero-coupon bond encoded by the forward-rate curve as in Example 1.1. Then the Flow Model defines an arbitrage-free flow if and only if the SPDE


holds, -a.e.

If and we assume that and are both smooth and deterministic then Proposition 2.9 can be simplified to a PDE and we recover the consistency result of [19]. To see this, first consider the time reversal where . Then since the integral equation (2.17) must hold for all and all initial conditions of it follows that we may let . Doing so and replacing by we obtain the PDE found in [20, Proposition 9.1].

We may use the same techniques to derive a general characterization of the arbitrage-free dynamics of a call option under the stochastic local volatility surface framework of [8] and [16]. The uniform parabolicity of equation (2.5) implies that for every there exists a unique solution to the initial-value problem (2.5). Therefore the solution operator, defined by equation (2.6), taking any to the solution of the initial value problem (2.5) is well-defined. Hence, the price of every call option with time-to-maturity and log-strike can be obtained by evaluating the function at . Define the evaluation map


Similar to the bond setting, where equation (1.1) was inverted to obtain an expression for the bond price in terms of the forward-rate curve process , the composition


defines a non-anticipative functional expressing the price of a call option with time to maturity and log-strike in terms of the volatility surface .

Further, assuming a geometry with representation for such that is a subset of ensures that the evaluation map is a bounded linear functional. Therefore under this assumption becomes a non-anticipative bounded linear functional from the subset of to such that

Proposition 2.10 (Arbitrage-Free Characterization of the Stochastic Local Volatility Surface).

Let be a pair of time-to-maturity and log-strike, let be a representation for the volatility-surface process , and let be the non-anticipative functional defined in equation (2.19). Then defines an arbitrage-free flow model for the call option if and only if for every pair the SPDE


holds, -a.e.

We note that the assumptions of Proposition 2.10 are analogous to, but different from, those of the central result of [8]. Namely, the differentiability requirements for and are weakened and the dynamics for are allowed to be more general.

We close this section with a general result extending Propositions 2.9 and 2.10, which will be used to derive the arbitrage penalty function in our regularization method, and with two general examples which can be used to model forward-rate curves in our framework.

Theorem 2.11.

Let be a flow model with realization . Then is arbitrage-free if and only if


is satisfied -a.e. Here is the horizontal lift of to the frame-bundle on with initial frame .


See appendix. ∎

Example 2.12 (Orthonormal Affine Models).

Let be a set of twice-continuously differentiable functions from to such that their equivalence classes in form an orthonormal basis of . Define the order -orthonormal geometric representation where is the map,


are the coordinates of , is the usual Euclidean connection which identifies the tangent spaces at points in by translation (see [9]), and is a function that is differentiable in . Since is the Euclidean connection on it follows that .

If we assume the flow model , the SPDE of Proposition 2.9 particularizes to


If we further assume that and are not functions of then equation (2.23) reduces to


Moreover equation (2.24) holds -a.e. if and only if equals



Example 2.13 (Orthonormal Wavelet-Basis Factor Model).

Wavelet analysis provides a locally parsimonious alternative to Fourier analysis by representing a function as a series of rapidly decaying wavelets. The series of orthonormal wavelets can form an orthonormal basis of