Arbitrage-Free Regularization
Abstract
We introduce a path-dependent geometric framework which generalizes the HJM modeling approach to a wide variety of other asset classes. A machine learning regularization framework is developed with the objective of removing arbitrage opportunities from models within this general framework. The regularization method relies on minimal deformations of a model subject to a path-dependent penalty that detects arbitrage opportunities. We prove that the solution of this regularization problem is independent of the arbitrage penalty chosen, subject to a fixed information loss functional. In addition to the general properties of the minimal deformation, we also consider several explicit examples. This paper is focused on placing machine learning methods in finance on a sound theoretical basis, and the techniques developed to achieve this objective may be of interest in other areas of application.
Keywords: HJM framework, Functional Itô Calculus, Forward-Rate Curves, Stochastic Local Volatility Surfaces, Stochastic Differential Geometry, Flows, Arbitrage, Deformations, Convergence, Information Theory, Machine Learning.
Mathematics Subject Classification (2010): 91G30, 91G20, 68T05, 65J20, 60D05, 94A15.
1 Introduction
The Heath, Jarrow, and Morton [28] framework, henceforth referred to as HJM, was originally developed to model the term structure of interest rates. In a similar manner we may consider models of an observable asset price, such as a European call option as in [16], [8] and [34], as a function of an unobservable infinite-dimensional stochastic process. In the case of a zero-coupon bond, the unobservable process is the curve of forward interest rates. In the case of European call options the unobservable process is the surface describing the volatilities of all European call options on the same underlying stock. This approach allows for a simultaneous description of the entire market and implies a drift restriction on the unobservable process characterizing the absence of arbitrage opportunities. In [8], [3], and [44] the HJM framework was extended to other asset classes.
A central computational difficulty of using the HJM framework is that it relies on an infinite-dimensional process. In practice this is typically overcome by assuming a finite-dimensional factor model for the unobservable process. Common examples include the Nelson and Siegel [42] model for the forward-rate curve or the SVI-JW model of [23] for the stochastic volatility surface. The existence of finite-dimensional factor models consistent with an HJM model was characterized in [4] using geometric methods.
Factor models may present their own shortcomings since, for example, it is not necessarily clear if they allow for arbitrage opportunities. Under a few modeling assumptions, the question of when a finite-dimensional factor model for the forward-rate curve allows arbitrage opportunities was answered by [19] using geometric methods. The particular case of the Nelson-Siegel model was further addressed in [18]. In Section 2 of this paper we prove a characterization of the absence of arbitrage for a geometric and path-dependent generalization of the HJM framework that is capable of modeling a wide variety of asset classes. Our techniques rely on the functional Itô calculus of [15] and [21]. The characterization of the absence of arbitrage, in terms of a stochastic partial differential equation (SPDE), generalizes the results of [19].
Models similar to the Nelson and Siegel [42] model, but which do not admit arbitrage, were proposed in [14] and [13]. In Section 3 we consider the general problem of how a model that may admit arbitrage opportunities can be best corrected. To solve this problem we employ the machine learning technique of regularizing a loss function against a penalty term. Our regularization framework considers deformations of a given factor model and selects the particular deformation of that model which satisfies the no-arbitrage condition while simultaneously retaining as much information embedded in the original factor model as possible. The retention of information is measured according to the information-theoretic functional Bregman divergence of [22]. To this end, our approach explicitly constructs an arbitrage penalty using the SPDEs of Section 2 and we prove the existence of an optimizer using convergence techniques of [6]. Further, we show that the best arbitrage-free deformation of a model is unique and independent of the particular arbitrage penalty chosen once the information loss is fixed. We then show the existence of an optimal deformation of the factor model, prove several properties of our regularization operator, and close with a few examples. An appendix contains technical results and proofs. To motivate our approach we first recall two HJM-type models for the forward rate and stochastic local volatility and close this section with associated factor models.
1.1 The HJM-Type Framework and Geometric Functional Factor Models
The HJM-type framework has been considered by many authors, such as [28, 8, 16, 34], both within and outside the forward-rate modeling setting. We begin by reviewing two versions of this modeling approach.
Example 1.1 (Instantaneous Forward-Rate Process).
Consider a zero-coupon bond with maturity $T$. We denote its price at time $t$ by $P(t,T)$. Let the instantaneous forward-rate curve process be defined by
(1.1) $f(t,T) = -\frac{\partial}{\partial T} \log P(t,T)$
for $0 \le t \le T$. In [28] the forward-rate curve is interpreted as the instantaneous interest rate at time $T$ as viewed from the present time $t$. In particular $f(t,t)$ is the instantaneous interest rate in effect at time $t$, called the short rate $r_t$. The process $f(t,T)$ is assumed to follow an $\mathbb{R}$-valued diffusion process
(1.2) $df(t,T) = \alpha(t,T)\,dt + \sigma(t,T)\,dW_t$
where $W_t$ is an $\mathbb{R}^d$-valued Wiener process.
In [28] an explicit drift restriction on the dynamics of $f(t,T)$ is shown to characterize when the HJM model is arbitrage-free. In order for the dynamics of $f(t,T)$ given by equation (1.2) to be arbitrage-free, the drift $\alpha$ must be expressed as a function of the volatility $\sigma$ as
(1.3) $\alpha(t,T) = \sigma(t,T) \int_t^T \sigma(t,s)^{\top}\,ds$
The HJM no-arbitrage condition of equation (1.3) was generalized in [31] to allow for forward-rate models driven by infinite-dimensional Lévy processes.
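As a rough numerical illustration of the drift restriction (1.3), the following Python sketch assumes the one-factor, risk-neutral form alpha(t,T) = sigma(t,T) * int_t^T sigma(t,s) ds and evaluates the integral by the trapezoid rule. The function name `hjm_drift` and the two volatility specifications are hypothetical choices made for illustration; they are not taken from [28].

```python
import numpy as np

def hjm_drift(sigma, t, T, n=2001):
    """Drift alpha(t, T) implied by a one-factor volatility sigma(t, s).

    Assumes the risk-neutral restriction
        alpha(t, T) = sigma(t, T) * integral_t^T sigma(t, s) ds,
    with the integral approximated by the trapezoid rule on n grid points.
    """
    s = np.linspace(t, T, n)
    vals = sigma(t, s)
    integral = np.sum((vals[1:] + vals[:-1]) * np.diff(s)) / 2.0
    return float(sigma(t, T) * integral)

# Hypothetical volatility structures (illustrative parameter values only).
def constant_vol(t, s, vol=0.01):
    # Constant volatility: the restriction gives alpha(t, T) = vol^2 * (T - t).
    return vol * np.ones_like(np.asarray(s, dtype=float))

def exponential_vol(t, s, vol=0.01, a=0.5):
    # Exponentially damped volatility: sigma(t, s) = vol * exp(-a * (s - t)).
    return vol * np.exp(-a * (np.asarray(s, dtype=float) - t))

if __name__ == "__main__":
    print(hjm_drift(constant_vol, 0.0, 5.0))
    print(hjm_drift(exponential_vol, 0.0, 5.0))
```

For the constant specification the computed drift can be compared with the closed form vol^2 (T - t), and for the damped specification with (vol^2 / a) e^{-a(T-t)} (1 - e^{-a(T-t)}).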
Example 1.2 (Stochastic Local Volatility).
The HJM approach was used in [16] to jointly consider the dynamics of an underlying stock price process and a European call option with strike price and maturity time . The stock price and the European call option prices are connected through the stochastic local volatility surface defined by the boundary value problem
(1.4)  
where and .
Analogous to equation (1.2) the process is assumed to follow an valued diffusion process whose dynamics are described by
where is an valued Wiener process. Likewise, it was shown in [8] that, by making further assumptions on the dynamics of , the absence of arbitrage of the model for may be characterized as a drift restriction. The dynamics of are then modelled using the at-the-money point on the stochastic local volatility surface at the current time as
here is the instantaneous spot volatility .
1.2 Factor Models
Factor models are commonly employed in practice due to the interpretability of the driving risk factors. Common factor models for the forward-rate curve include the Nelson and Siegel [42] model and its generalizations. Similarly, factor models for volatility surfaces such as the stochastic volatility inspired (SVI) model of [23] are reviewed here to motivate the geometric structure which forms the basis of this paper.
Example 1.3 (Nelson-Siegel Model).
The Nelson and Siegel [42] model for the forward-rate or for the yield curve is given by
(1.5) 
The parameters , , , and are interpreted as level, slope, curvature, and shape parameters respectively [14]. We require that the parameters belong to the set
(1.6) 
for some parameter chosen to set a lower bound for rates.
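To make the roles of the four parameters concrete, the sketch below evaluates a Nelson-Siegel forward curve in Python. It assumes one common parameterization, level + slope * exp(-shape * tau) + curvature * shape * tau * exp(-shape * tau), in which the shape parameter acts as a decay rate; the scaling used in equation (1.5) may differ, and the argument names are illustrative.

```python
import numpy as np

def nelson_siegel_forward(tau, level, slope, curvature, shape):
    """Nelson-Siegel forward rate at time to maturity tau (assumed parameterization).

    level     : long-maturity limit of the curve
    slope     : short-end deviation from the level (the curve starts at level + slope)
    curvature : size of the hump at intermediate maturities
    shape     : decay rate (> 0) controlling where the hump sits
    """
    tau = np.asarray(tau, dtype=float)
    decay = np.exp(-shape * tau)
    return level + slope * decay + curvature * shape * tau * decay

if __name__ == "__main__":
    maturities = np.array([0.0, 1.0, 5.0, 30.0])
    print(nelson_siegel_forward(maturities, 0.03, -0.01, 0.02, 0.5))
```

Under this parameterization the curve starts at level + slope for tau = 0 and converges to the level as tau grows, which matches the interpretation of the parameters given above.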
Example 1.4 (SVI-JW).
Let denote the forward price process of the stock price process and denote the Black-Scholes price of a European call option on with strike price , maturity , and instantaneous volatility . Here is a moneyness parameter. The Black-Scholes implied volatility is denoted by and the total implied variance is defined by
The map is referred to as the volatility surface and for a fixed maturity the map is a slice of the volatility surface. Following [24] we write for a given maturity slice, dropping the dependence, where represents a set of parameters.
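The original, or raw, SVI parameterization of the total implied variance is given by
(1.7) $w(k) = a + b\left(\rho\,(k - m) + \sqrt{(k - m)^2 + \sigma^2}\right)$
for all $k \in \mathbb{R}$, where the parameters $(a, b, \rho, m, \sigma)$ belong to the set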
(1.8) 
The parametric restrictions ensure a nonnegative volatility surface and preclude calendar spread and butterfly arbitrage opportunities as shown in [24]. The parameters $a$, $b$, $\rho$, $m$, and $\sigma$ control the general level of variance, the slopes of both the put and call wings, the slope of the left wing with respect to the right wing, the centre of the smile, and the ATM curvature, respectively.
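The raw SVI slice can be sketched in a few lines of Python. The code assumes the standard raw form w(k) = a + b (rho (k - m) + sqrt((k - m)^2 + sigma^2)) and uses the fact that, for b >= 0 and |rho| < 1, its minimum over k equals a + b sigma sqrt(1 - rho^2), so nonnegativity of the slice reduces to one inequality on the parameters. The function names and the parameter values are illustrative assumptions, not calibrated quantities.

```python
import numpy as np

def raw_svi(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance at log-moneyness k (assumed standard form)."""
    k = np.asarray(k, dtype=float)
    return a + b * (rho * (k - m) + np.sqrt((k - m) ** 2 + sigma ** 2))

def raw_svi_minimum(a, b, rho, sigma):
    """Closed-form minimum of the raw SVI slice over k, valid for b >= 0, |rho| < 1."""
    return a + b * sigma * np.sqrt(1.0 - rho ** 2)

def is_nonnegative_slice(a, b, rho, sigma):
    """True when the slice is nonnegative for every k (checks the minimum only)."""
    return bool(b >= 0 and abs(rho) < 1 and sigma > 0
                and raw_svi_minimum(a, b, rho, sigma) >= 0)

if __name__ == "__main__":
    params = dict(a=0.04, b=0.4, rho=-0.4, m=0.1, sigma=0.2)  # illustrative values
    grid = np.linspace(-2.0, 2.0, 40001)
    print(raw_svi(grid, **params).min())
    print(raw_svi_minimum(0.04, 0.4, -0.4, 0.2))
```

This check covers nonnegativity only; it does not test for calendar spread or butterfly arbitrage, which require the further conditions discussed in [24].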
The SVI-jump-wings (SVI-JW) parameterization of the original SVI model was introduced in [24] to provide a method of calibrating the model in a manner that excludes static arbitrage and which is easier to interpret. In [24, Section 3.3] the SVI-JW surface is obtained by substituting the functions of the maturity time of the option as well as the parameters as follows:
into the raw SVI surface
(1.9) 
here , , and the dependence of the functions on the raw parameters and is omitted for legibility. The parameters are interpreted as ATM variance, skew, left wing slope, right wing slope and minimum implied variance respectively.
Examples 1.3 and 1.4 indicate that geometric constraints on model parameters are important features which allow for realistic and parsimonious modelling of financial variables and may also characterize the absence of arbitrage. However, these examples are static and may only be used to calibrate the model to financial data at a particular point in time. In practice if one of these factor models, or any model from computational finance, is recalibrated after the arrival of new data then the model parameters will likely change and may not adhere to the modeling constraints. In the next section we consider the geometric characterization of no-arbitrage in a large class of dynamic models.
2 Characterization of Arbitrage
In this section we characterize the non-existence of arbitrage opportunities in path-dependent geometric models generalizing the HJM framework. By no-arbitrage we mean that there does not exist a free lunch with vanishing risk (NFLVR) in the terminology of [12]. That is, there does not exist a sequence of self-financing portfolios that converges to a risk-free strategy with a strictly positive payoff. In [12] it was shown that if the modelled asset follows a continuous semimartingale, then the non-existence of a free lunch is equivalent to the existence of an equivalent local martingale measure.
2.1 Arbitrage-Free Flows
We develop a general modeling approach which is parsimonious, generalizing the previously presented examples of factor models, and which incorporates the important properties of the HJM framework such as non-Markovian dynamics and a drift restriction characterizing the absence of arbitrage. To motivate our framework we consider two examples.
Example 2.1 (Dynamic Nelson-Siegel).
Define the functional
(2.1)  
where is the set of constraints given in equation (1.6), , , and is an valued diffusion process.
The process , which is independent of , describes a forward-rate curve process driven by a dimensional diffusion process. This provides a dimension reduction as compared to the infinite-dimensional diffusion of Example 1.1. Inverting equation (1.1) we obtain a family of non-anticipative functionals defined by
(2.2)  
where is the quadratic variation of , is the space of valued paths, and is the space of paths taking values in the set of symmetric positive definite matrices as defined in [21, Section 2.2].
The price at time of a zero-coupon bond with maturity may be represented in terms of the Dynamic Nelson-Siegel model as
(2.3) 
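The representation (2.3) rests on inverting the definition of the forward rate, which gives the standard relation P(t,T) = exp(-int_t^T f(t,s) ds). A minimal Python sketch of this inversion follows; it assumes the forward curve is supplied as a callable of the maturity, and both `bond_price` and `example_curve` are hypothetical names with illustrative parameter values.

```python
import numpy as np

def bond_price(forward_curve, t, T, n=2001):
    """Zero-coupon bond price P(t, T) = exp(-integral_t^T f(t, s) ds).

    forward_curve maps an array of maturities s to forward rates f(t, s);
    the integral is approximated by the trapezoid rule on n grid points.
    """
    s = np.linspace(t, T, n)
    f = np.asarray(forward_curve(s), dtype=float)
    integral = np.sum((f[1:] + f[:-1]) * np.diff(s)) / 2.0
    return float(np.exp(-integral))

def example_curve(s, level=0.03, slope=-0.01, decay=0.5):
    # Hypothetical upward-sloping forward curve with illustrative parameters.
    return level + slope * np.exp(-decay * np.asarray(s, dtype=float))

if __name__ == "__main__":
    print(bond_price(example_curve, 0.0, 5.0))
    # Differentiating -log P in the maturity should recover the forward rate.
    h = 1e-3
    recovered = -(np.log(bond_price(example_curve, 0.0, 5.0 + h))
                  - np.log(bond_price(example_curve, 0.0, 5.0 - h))) / (2 * h)
    print(recovered, float(example_curve(5.0)))
```

The final lines check the round trip: a central difference of -log P in the maturity approximately reproduces the forward rate that was integrated.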
Example 2.2 (Dynamic SVI-JW).
Define the functional
(2.4)  
where is the interior of the constraint set described in equation (1.8), , , and let be an valued diffusion process.
The process is a stochastic local volatility surface model in the sense of Example 1.2, driven by a dimensional diffusion, as opposed to the infinite-dimensional stochastic local volatility of Example 1.2.
As noted in [8], equation (1.4) may be transformed into the uniformly parabolic initial value problem
(2.5)  
by performing the change of variables , and . Since is strictly positive, working with stochastic local variance is equivalent to working with . The uniform parabolicity of equation (2.5) implies that for every there exists a unique solution to the initial value problem (2.5). Therefore equation (1.4) can be inverted through the solution operator
(2.6)  
by mapping any to the solution of the initial value problem (2.5). Analogous to equation (2.2) we define a family of non-anticipative functionals by
(2.7)  
which represents the price of a European call option with maturity and strike in terms of the SVI-JW model by
(2.8) 
In general, a nonlinear functional for the unobservable process plays the role of the functional of equations (2.1) and (2.4). It depends on the current time , the inputs of the unobservable process, and the dimensional parameter vector . The deterministic factor model is made more flexible by replacing the static parameters by a dimensional diffusion process parameterizing .
Similar to Examples 2.1 and 2.2, the process driving the parameters may be required to take values in a submanifold of in order to ensure certain modeling constraints are met. To ensure that does not leave in finite time we construct a new stochastic process by identifying the tangent spaces of with and rolling along the trajectory of on without slipping (see [30, Section 2.3] for details). Assuming the regularity conditions listed in Appendix A, is endowed with a particular affine connection making into a stochastically and geodesically complete Riemannian manifold. Consequently, is always defined (the details of which are included in Lemma A.1). As a conceptual example, when the processes and are indistinguishable.
In [4, Section 3.3] the authors consider an analogous setup where the functional takes values in a particular weighted Sobolev space , which is a Hilbert space, termed the space of forward-rate curves. Assuming this exact setup will cause some technical issues when deriving SPDEs characterizing the absence of arbitrage. We therefore assume that the process takes values in a Reproducing Kernel Hilbert Space (RKHS) of functions. It was shown in [1] that every RKHS is uniquely determined by a symmetric positive definite kernel. Since the quadratic variation of a semimartingale defines a symmetric positive-definite matrix-valued process, there is a natural RKHS associated to . We will denote this RKHS by and describe its construction in further detail in Appendix B (see [38] for details on RKHS).
Definition 2.3 (Geometric Representation).
Let denote the Lebesgue measure on , be an measurable function from to , be a connected Borel subset of , and denote the weighted Lebesgue space from to by . Furthermore let be an valued stochastic process with a.s. càdlàg paths.
The process is said to have a geometric representation with geometry if and only if there exists a dimensional Riemannian submanifold of with corresponding affine connection , for some , a functional , an valued semimartingale , an RKHS of functions from to denoted by , and a family of maps defined by
(2.9)  
satisfying the following regularity conditions:

(i) For every in the processes and are indistinguishable a.e.,
(ii) The maps are class of diffeomorphisms onto their images,
(iii) For every the map
is continuously differentiable,
(iv) For every the map
(2.10) is an element of the RKHS .
We say that is the realization of the geometry .
In general our framework allows for the maps to depend on the time parameter , which is not the case in Examples 2.1 and 2.2. Including as a dimension of the manifold would result in the model being infinite dimensional. However, if we instead expand our horizon from finite-dimensional manifolds to a continuously deforming finite-dimensional manifold, known as a flow in [41], we may still preserve finite-dimensionality. To see this note that the process is consistent with the submanifold
of precisely at every time . Geometrically the family of manifolds are interpreted as smoothly deforming in time via the diffeomorphisms
(2.11)  
where is defined by equation (2.9).
We may endow each of the manifolds with a stochastically and geodesically complete Riemannian metric defined by pulling back the affine connection across the diffeomorphism (see [32] for details). This construction is an analogue of the idea of a time-dependent family of manifolds, or flow, studied in [41]. We interpret the flow as deforming continuously in time to best incorporate current modeling requirements at time . We shall be primarily concerned with flows that are arbitrage-free.
Example 2.4 (Bandlimited HJM models).
The space of bandlimited functions from to , denoted by , is the RKHS with underlying set of continuous functions from to satisfying
(2.12) 
Assume that is a forwardrate curve satisfying
(2.13) 
and is bandlimited in with for every . Then takes values in the set of functions satisfying equation (2.12), which can be given the structure of an RKHS with reproducing kernel
(see [38, Example 4.2] for details). Let be a basis of , then equation (2.13) has the weak solution (see [36, Chapter 10.3] for details on weak solutions to SDEs)
For every define the linear map by
(2.14) 
Consider the projection operator mapping onto the span of . Then, similar to [13] where the projection of an HJM model onto the span of the Nelson-Siegel family was considered, we consider the projection of the weak solution of equation (2.13) onto given by
(2.15) 
Then is a geometric representation of the weak dimensional approximation of defined by equation (2.15), where is the Euclidean connection on , and are the maps defined in equation (2.9).
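Since an RKHS is determined by its symmetric positive definite kernel, a quick numerical sanity check of a bandlimited kernel is to build its Gram matrix and inspect the eigenvalues. The Python sketch below assumes the sinc-type kernel K(x, y) = sin(a (x - y)) / (pi (x - y)) with K(x, x) = a / pi, which is a common normalization for functions bandlimited to [-a, a]; the normalization used in [38, Example 4.2] may differ, and the function names are illustrative.

```python
import numpy as np

def sinc_kernel(x, y, bandwidth):
    """Gram matrix of K(x, y) = sin(bandwidth * (x - y)) / (pi * (x - y))."""
    d = np.subtract.outer(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
    # np.sinc(z) = sin(pi z) / (pi z), so the argument is rescaled by bandwidth / pi.
    return (bandwidth / np.pi) * np.sinc(bandwidth * d / np.pi)

def is_symmetric_psd(gram, tol=1e-10):
    """Check symmetry and nonnegativity of the eigenvalues up to a tolerance."""
    symmetric = np.allclose(gram, gram.T)
    eigenvalues = np.linalg.eigvalsh((gram + gram.T) / 2.0)
    return bool(symmetric and eigenvalues.min() >= -tol)

if __name__ == "__main__":
    maturities = np.linspace(0.0, 10.0, 25)  # illustrative grid of maturities
    gram = sinc_kernel(maturities, maturities, bandwidth=2.0)
    print(is_symmetric_psd(gram))
```

A Gram matrix that passes this check on a grid is consistent with, but does not prove, positive definiteness of the kernel; the tolerance absorbs floating-point error in the near-zero eigenvalues.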
Assumption 2.5.
From now on we assume that solves the diffusion equation
where and are essentially bounded and predictable with respect to the canonical filtration generated by a dimensional Brownian motion .
This construction may be extended to the HJM framework inspired by the codebook methodology of [16]. The ideas in this paper may be translated to this generalization of the HJM framework which was further explored in [34] and adapted to a variety of asset classes as in [8]. Here we assume that the parameterization of the observable asset in terms of the unobservable driving process is achieved by a time-dependent functional of the path of the geometry which does not look into the future (see [21, Definition 2.1]).
Following [11], recall that the horizontal (respectively, the vertical ) derivative of a non-anticipative path-dependent functional of a càdlàg path is defined by extending the path constantly in time (respectively, in space) infinitesimally and measuring the infinitesimal rate of change. Conceptually, it can be interpreted as an adaptive version of the Malliavin derivative of [37] through the intertwining formula
(see [2, Theorem 7.4.1]). In contrast to the Malliavin derivative, which is the adjoint of the non-adaptive Skorokhod integral, the vertical derivative is the adjoint of the Itô integral and so satisfies a stochastic integration by parts formula (see [2, Table 7.1] for a summary of these facts). Moreover, the interpretation of the horizontal derivative is that it is a rough time derivative.
Following [21], denote the class of boundedness-preserving functionals of a path by . Denote the horizontal and vertical derivative operators of the path by and respectively (see Definitions 2.6 and 2.8 of [21]). Write for the class of once continuously horizontally and twice continuously vertically differentiable functionals of a path. Finally, denote the class of all left-continuous functionals of the path by .
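The two derivatives can be illustrated with finite differences on a discretized path. In the Python sketch below a path is an array sampled on a uniform grid with step dt and a functional is any callable of (path, dt); the horizontal derivative freezes the last value for one extra step, and the vertical derivative bumps the last value. All function names are hypothetical, and the discretization is an illustrative assumption rather than the construction of [21].

```python
import numpy as np

def horizontal_derivative(F, path, dt):
    """Extend the path constantly for one more time step and difference in time."""
    extended = np.append(path, path[-1])
    return (F(extended, dt) - F(path, dt)) / dt

def vertical_derivative(F, path, dt, eps=1e-6):
    """Bump the terminal value of the path up and down and difference in space."""
    up, down = path.copy(), path.copy()
    up[-1] += eps
    down[-1] -= eps
    return (F(up, dt) - F(down, dt)) / (2.0 * eps)

def running_integral(path, dt):
    # Left Riemann sum of the path: depends on the past, not on the terminal value.
    return dt * np.sum(path[:-1])

def terminal_square(path, dt):
    # Depends on the path only through its current value.
    return path[-1] ** 2

if __name__ == "__main__":
    dt = 0.01
    path = np.sin(np.linspace(0.0, 1.0, 101))
    print(horizontal_derivative(running_integral, path, dt),
          vertical_derivative(running_integral, path, dt))
    print(horizontal_derivative(terminal_square, path, dt),
          vertical_derivative(terminal_square, path, dt))
```

For the running integral the horizontal derivative equals the current value of the path and the vertical derivative vanishes, while for the terminal square the horizontal derivative vanishes and the vertical derivative is twice the current value, in line with the reading of the horizontal derivative as a rough time derivative.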
Definition 2.6 (Flow Model).
Let have the geometric representation and let be an asset price such that there exists a non-anticipative functional in satisfying
(2.16) 
where (resp. ) is the quadratic variation of (resp. ). If satisfies the following regularity conditions

(i) is predictable in its second argument,
(ii) , , are all in ,
(iii) are all in , and
(iv) is horizontally Lipschitz ([21]),
then the triple is said to be a flow model. The process is referred to as the observable asset, the geometric representation of is said to be the flow model’s geometry, and the non-anticipative functional is said to encode the geometry into .
Remark 2.7.
Definition 2.8 (Arbitrage-Free Flow).
A flow model satisfying the property that follows arbitrage-free dynamics is known as an arbitrage-free flow.
2.2 Characterization of Arbitrage-Free Dynamics
We make use of the functional Itô calculus developed in [15] and made rigorous by [21] in order to generalize the HJM drift restriction on characterizing when the dynamics of the observable asset are arbitrage-free. Our results are consistent with the classical results of [28] and those of [19] in the case of the forward-rate codebook for bonds. We begin by motivating the central result in the case of the HJM forward rate and the stochastic local volatility surface.
Proposition 2.9 (Arbitrage-Free Characterization for the Forward-Rate Curve Encoding).
Let be a realization of the flow model representing the price of a zero-coupon bond encoded by the forward-rate curve as in Example 1.1. Then the flow model defines an arbitrage-free flow if and only if the SPDE
(2.17)  
holds, a.e.
If and we assume that and are both smooth and deterministic then Proposition 2.9 can be simplified to a PDE and we recover the consistency result of [19]. To see this, first consider the time reversal where . Then since the integral equation (2.17) must hold for all and all initial conditions of it follows that we may let . Doing so and replacing by we obtain the PDE found in [20, Proposition 9.1].
We may use the same techniques to derive a general characterization of the arbitrage-free dynamics of a call option under the stochastic local volatility surface framework of [8] and [16]. The uniform parabolicity of equation (2.5) implies that for every there exists a unique solution to the initial value problem (2.5). Therefore the solution operator, defined by equation (2.6), taking any to the solution of the initial value problem (2.5) is well-defined. Hence, the price of every call option with time to maturity and log-strike can be obtained by evaluating the function at . Define the evaluation map
(2.18)  
Similar to the bond setting, where equation (1.1) was inverted to obtain an expression for the bond price in terms of the forward-rate curve process , the composition
(2.19)  
defines a non-anticipative functional expressing the price of a call option with time to maturity and log-strike in terms of the volatility surface .
Further, assuming a geometry with representation for such that is a subset of assures that the evaluation map is a bounded linear functional. Therefore under this assumption becomes a non-anticipative bounded linear functional from the subset of to such that
Proposition 2.10 (Arbitrage-Free Characterization of the Stochastic Local Volatility Surface).
Let be a pair of time to maturity and log-strike, let be a representation for the volatility-surface process , and let be the non-anticipative functional defined in equation (2.19). Then defines an arbitrage-free flow model for the call option if and only if for every pair the SPDE
(2.20)  
holds, a.e.
We note that the assumptions of Proposition 2.10 are analogous to, but different from, those of the central result of [8]. Namely, the differentiability requirements for and are weakened and the dynamics for are allowed to be more general.
We close this section by giving a general result extending Propositions 2.9 and 2.10, and which will be used to derive the arbitrage penalty function in our regularization method, and two general examples which can be used to model the forward rate curves in our framework.
Theorem 2.11.
Let be a flow model with realization . Then is arbitrage-free if and only if
(2.21)  
is satisfied a.e. Here is the horizontal lift of to the frame bundle on with initial frame .
Proof.
See appendix. ∎
Example 2.12 (Orthonormal Affine Models).
Let be a set of twice continuously differentiable functions from to such that their equivalence classes in form an orthonormal basis of . Define the order orthonormal geometric representation where is the map,
(2.22) 
are the coordinates of , is the usual Euclidean connection which identifies the tangent space at a point in by translation (see [9]), and is a function that is differentiable in . Since is the Euclidean connection on it follows that .
Example 2.13 (Orthonormal Wavelet-Basis Factor Model).
Wavelet analysis provides a locally parsimonious alternative to Fourier analysis by representing a function as a series of rapidly decaying wavelets. The series of orthonormal wavelets can form an orthonormal basis of