Optimal stopping via deeply boosted backward regression

# Optimal stopping via deeply boosted backward regression

## Abstract

In this note we propose a new approach towards solving numerically optimal stopping problems via boosted regression based Monte Carlo algorithms. The main idea of the method is to boost standard linear regression algorithms in each backward induction step by adding new basis functions based on previously estimated continuation values. The proposed methodology is illustrated by several numerical examples from finance.

## 1 Introduction

An optimal stopping problem, in finance virtually synonym with the pricing problem of an American style derivative, can be efficiently solved in low dimensions, for instance by tree methods or using deterministic numerical methods for the corresponding partial differential equation. However, many American options in practice (see e.g. [7]) involve high dimensional underlying processes and this made it necessary to develop Monte Carlo methods for pricing such options. Pricing American derivatives, hence solving optimal stopping problems via Monte Carlo is a challenging task, because this typically requires backward dynamic programming that for long time was thought to be incompatible with forward structure of the Monte Carlo methods. In recent years much research was focused on the development of efficient methods to compute approximations to the value functions or optimal exercise policy. Eminent examples include the functional optimization approach of [1], the mesh method of [4], the regression-based approaches of [5], [8], [9], [6] and [3]. The most popular type of algorithms are with no doubt the regression ones. In fact, in many practical pricing problems, the low-degree polynomials are typically used for regression (see [7]). The resulting least squares problem has a relatively small number of unknown parameters. However, this approach has an important disadvantage - it may exhibit too little flexibility for modeling highly non-linear behaviour of the exercise boundary. Higher-degree polynomials can be used, but they may contain too many parameters and, therefore, either over-fit the Monte Carlo sample or prohibit parameter estimation because the number of parameters is too large. One possible approach for controlling the complexity of a regression model is subset selection. The goal of subset selection is to find a subset, from a fixed full predetermined dictionary of basis functions, that corresponds to a model of the best predictive performance. Before performing the actual subset selection, one must first predefine the dictionary that will provide the basis functions for model generation. This is usually done by setting the maximum degree of a full polynomial and taking the set of its basis functions. By using subset selection, one implicitly assumes that the predefined fixed finite dictionary of basis functions contains a subset that is sufficient for a model to describe the target relation sufficiently well. The problem is that generally the required maximum degree is unknown beforehand and, since it may differ from one backward step to another, it needs to be either guessed or found by additional meta-search over the whole subset selection process.

In this paper a regression based Monte Carlo approach is developed for building sparse regression models at each backward step of the dynamic programming algorithm. This enables estimating the value function with virtually the same cost as the standard regression algorithms based on low degree polinomials but with higher precision. The additional basis functions are constructed specifically for the optimal stopping problem at hand without using a fixed predefined finite dictionary. Specifically, the new basis functions are learned during the backward induction via incorporating information from the preceding backward induction step. Our algorithm may be viewed as a method of constructing sparse nonlinear approximations of the underlying value function and in this sense it extends the literature on deep learning type algorithms for optimal stopping problems, see, for example, the recent paper [2] and references therein.

The structure of the paper is as follows. After recalling basic facts on American options and settling the main setup in Section 2, the boosting procedure is presented in Section 3. The numerical performance is studied in Section 4.

## 2 Main setup

An American option grants its holder the right to select the time at which she exercises the option, i.e calls a pre-specified reward or cash-flow. This is in contrast to a European option that may be exercised only at a fixed date. A general class of American option pricing problems, i.e. optimal stopping problems respectively, can be formulated with respect to an underlying -valued Markov process defined on a filtered probability space . The process is assumed to be adapted to a filtration in the sense that each is measurable. Recall that each is a -algebra of subsets of such that for Henceforth we restrict our selves to the case where only a finite number of exercise opportunities are allowed (the Bermudan case in financial terms, where for notational convenience exercise at is excluded). (In this respect it should be noted that a continuous exercise (American) option can be approximated by such a Bermudan option with arbitrary accuracy, and so this is not a huge restriction). We now consider the pre-specified reward in terms of the Markov chain

 Zj:=Xtj,j=1,…,J,

for some given functions mapping into In a financial context we may and will assume that the reward is expressed in units of some (tradable) pricing numéraire that has initial value Euro, say. That is, if exercised at time , the option pays cash equivalent with units of the numéraire. Let denote the set of stopping times taking values in . A standard result in the theory of contingent claims states that a fair price of the Bermudan option at time in state given that the option was not exercised prior to , is its value under the optimal exercise policy,

 Vj(x)=supτ∈TjE[gτ(Zτ)|Zj=x],x∈Rd, (1)

due to a corresponding martingale measure, hence the solution to an optimal stopping problem. In (1) we have to read for Note that any tradable expressed in units of the numéraire is a martingale under this measure. A common feature of many approximation algorithms is that they deliver estimates for the so-called continuation values:

 Cj(x):=E[Vj+1(Zj+1)|Zj=x],j=1,…,J−1. (2)

Here the index indicates that the above estimates are based on a set of independent “training” trajectories

 (Z(i)1,…,Z(i)J), \ \ i=1,…,N, (3)

all starting from one point. In the case of the so-called regression methods, the estimates for (1) and (2) are obtained via the Dynamic Programming Principle:

 VJ(x) =gJ(x), \ \ CJ(x)=0, (4) Cj(x) =E[Vj+1(Zj+1)|Zj=x], \ \ 1≤j

combined with Monte Carlo. These regression algorithms can be described as follows. Suppose that for some an estimate for is already constructed. Then in the th step one needs to estimate the conditional expectation

 E[VN,j+1(Zj+1))|Zj=x], (5)

where This can be done by performing regression (linear or nonlinear) on the set of paths

 (Z(i)j,VN,j+1(Z(i)j+1)),i=1,…,N.

The whole backward procedure is trivially initialized by setting Given the estimates , we next may construct a lower bound (low biased estimate) for using the (generally suboptimal) stopping rule:

 τN=min{1≤j≤J:gj(Zj)≥CN,j(Zj)},

with by definition. Indeed, fix a natural number and simulate new independent trajectories of the process A low-biased estimate for can be then defined as

 VNtest,N0=1NtestNtest∑r=1gτ(r)N(Z(r)τ(r)k) (6)

with

 τ(r)N=inf{1≤j≤J:gj(Z(r)j)≥CN,j(Z(r)j)}. (7)

In this section we outline our methodology for estimating the solution to (1) at time based on a set of training trajectories (3). In this respect, as a novel ingredient, we will boost the standard regression procedures by learning and incorporating new basis functions on the backward fly. As a canonical example one may consider the incorporation of as a basis function in the regression step of estimating Other possibilities are, for example, certain (spatial) derivatives of or functions directly related to the underlying exercise boundary at time for example In general one may choose a (typically small) number of suitable boosting basis functions at each step.

### 3.1 Enhancing basis on the fly

Let us suppose that we have at hand some fixed and a computationally cheep system of basis functions We now extend this basis at each backward regression step with an additional and sparse set of new functions that are constructed in the preceding backward step on the given training paths. The main idea is that the so boosted basis delivers more accurate regression estimate of the continuation function compared to the original basis, and at the same time remains cheap.

### 3.2 Backward boosted regression algorithm

Based on the training sample (3), we propose a boosted backward algorithm that in pseudo-algorithmic terms works as follows.

At time we initialize as Suppose that for is already constructed in the form

 CN,j(x)=K∑k=1γN,jkψk(x)+b∑k=1γN,jk+KνN,jk(x) \ \ for some \ γN,j∈RK+b.

For going from down to define the new boosted regression basis via

 ΨN,j−1(x):=(ψ1(x),…,ψK(x),νN,j−11(x),…,νN,j−1b(x)) (8)

(as a row vector) due to a choice of the set of functions based on the previously estimated continuation value . For example, we might take and consider functions of the form

 νN,j−11(x)=max(gj(x),CN,j(x)), \ \ νN,j−12(x)=1{gj(x)−CN,j(x)>0}. (9)

Then consider the design matrix with entries.

 Mj−1mk:=ΨN,j−1k(Z(m)j−1), \ \ m=1,…,N, k=1,…,K+b, (10)

and the (column) vector

 Vj =(VN,j(Z(1)j),…,VN,j(Z(N)j))⊤ (11) =(max(gj(Z(1)j),CN,j(Z(1)j)),…,max(gj(Z(N)j),CN,j(Z(N)j)))⊤.

Next compute and store

 γN,j−1:=((Mj−1)⊤Mj−1)−1(Mj−1)⊤Vj, (12)

and then set

 CN,j−1(x) =ΨN,j−1(x)γN,j−1 (13) =K∑k=1γN,j−1kψk(x)+b∑k=1γN,j−1k+KνN,j−1k(x).

### 3.3 Spelling out the algorithm

Let us spell out the above pseudo-algorithm under the choice (9) of boosting functions in more details. In a pre-computation step we first generate and save for the values

 ψk(Z(m)j), \ \ gi(Z(m)j), \ \ 1≤j≤i≤J, \ \ 1≤k≤K. (14)

#### Backward procedure

For a generic backward step we assume that the quantities

 CN,j(Z(m)l), \ \ 0≤l≤j, \ \ m=1,...,N, (15)

are already constructed and stored by using the functional approximations

 CN,j(x) =K∑k=1γN,jkψk(x)+γN,jK+1νN,j1(x)+γN,jK+2νN,j2(x) (16)

with

where for are constructed and stored.

At the initial time we set Let us now assume that and proceed to time We first compute (10) and (11). The latter one, is directly obtained by (15) for and the pre-computed values (14). To compute (10), we need Hence, we set

 νN,j−11(Z(m)j−1) =max(gj(Z(m)j−1),CN,j(Z(m)j−1)), νN,j−12(Z(m)j−1) =1{gj(Z(m)j−1)−CN,j(Z(m)j−1)≥0}

for using (15) for Next we may compute (and store) the coefficients vector (12), i.e., using (10) and (11), and formally establish (16). In order to complete the generic backward step, we now need to evaluate

 CN,j−1(Z(m)l)=K∑k=1γN,j−1kψk(Z(m)l) (17) +γN,j−1K+1νN,j−11(Z(m)l)+γN,j−1K+2νN,j−12(Z(m)l), (18)

for The first part (17) is directly obtained from the pre-computation (14) and the coefficients (12) computed in this step. For the second part (18) we have that

 νN,j−11(Z(m)l) =max(gj(Z(m)l),CN,j(Z(m)l)), νN,j−12(Z(m)l) =1{gj(Z(m)l)−CN,j(Z(m)l)≥0},

for and Thus the terms (18) are directly obtained from (14), the coefficients (12), and (15).

###### Remark 1

As can be seen, each approximation nonlinearly depends on all previously estimated continuation functions and hence on all “features” In this sense our procedure tries to find a sparse deep network type approximation (with indicator or maximum as activation functions) for the continuation functions based on simulated “features”. Compared to other deep learning type algorithms (see, e.g., [2]), our procedure doesn’t require any type of time-consuming nonlinear optimisation over high-dimensional parameter spaces.

#### Cost estimation

The total cost needed to perform the pre-computation (14) is about where denotes the maximal cost of evaluating each function and at a given point. The cost of one backward step from to can be then estimated from above by

 NK2c∗ \ \ due to computation of (???) NKjc∗ \ \ due to the construction of (???)+(% ???),

where denotes the sum of costs due to the addition and multiplication of two reals. Hence the total cost of the above algorithm can be upper bounded by

 12NJ2cf+NJKcf+NJK2c∗+12NJ2Kc∗ (19)

including the pre-computation.

###### Remark 2

In the above cost estimation the cost of determining the maximum of two numbers is neglected.

### 3.4 Lower estimate based on a new realization

Suppose that the backward algorithm of Section 3.2 has been carried out, and that we now have an independent set of realizations with In view of (6) and (7), let us introduce the stopping rule

 τN=inf{j:1≤j≤J, \ \ gj(Zj)≥CN,j(Zj)}. (20)

A lower estimate of is then obtained via

 V0–––:=1NtestNtest∑m=1gτ(m)N(˜Z(m)τ(m)N). (21)

Here the index in the indicates that these objects are constructed using the simulation sample used in (3.2). As a result, (20) is a suboptimal stopping time and (21) is a lower biased estimate. Let us consider the computation of (20). The coefficient vectors were already computed in the backward algorithm above. We now have to consider the computation of for an arbitrary point at a particular time for For this we propose the following backward procedure.

#### Procedure for computing CN,j(Z) for arbitrary state z.

1. We first (pre-)compute for and for leading to the cost of order

2. Next compute recursively as follows:

1. Initialize Once with is computed and saved, evaluate and using (9).

2. Compute

 CN,l−1(Z)=K∑k=1γN,l−1kψk(Z)+γN,l−1K+1νN,l−11(Z)+γN,l−1K+2νN,l−12(Z)

at a cost of order In this way we proceed all the way down to at a total cost of including the pre-computation step.

Due to the procedure described above, the costs of evaluating (21), based on the worst case costs of computing (20), will be of order

 NtestJKcf+12J2Ntestcf+12NtestKJ2c∗.

Obviously, (for ) this is the same order as for the regression base backward induction procedure described in Section 3.2.

###### Remark 3

From the cost analysis of the boosted regression algorithm it is obviously inferable that the standard regression procedure, i.e. the regression procedure due to a fixed basis without boosting, would require a computational cost of order

 NJKcf+NJK2c∗

for computing the regression coefficients. Hence the cost ratio due to the boosting procedure is approximately,

 Cost for coefficients of the boosted regressionCost for % coefficients of the standard regression=1+J2K

A subsequent lower estimate based on a new realization in the standard case would require about yielding a cost ratio

 1+J21+Kc∗/cfK≈1+J2c∗/cf

accordingly (assuming is large). From this we conclude that the boosted regression is not much more expensive than the standard one as long as  is small (i.e. large), while the lower bound construction due to the boosted basis is not substantially more expensive as long as

## 4 Numerical examples

In this section we illustrate the performance of boosted regression based Monte Carlo algorithms by considering two option pricing problems in finance.

### 4.1 Bermudan cancelable swap

We first test our algorithm in the case of the so-called complex structured asset based cancelable swap. In particular, we demonstrate how to achieve a trade-off between accuracy and computational complexity by choosing the number of basis functions.

We consider a multi-dimensional Black-Scholes model, that is, we define the dynamic of assets under the risk-neutral measure via a system of SDEs

 dXl(t)=(ρ−δ)Xl(t)dt+σlXl(t)dWl(t),0≤t≤T,l=1,…,d.

Here are correlated -dimensional Brownian motions with time independent correlations The continuously compounded interest rate and a dividend rate are assumed to be constant.

Define the asset based cancelable coupon swap. Let be a sequence of exercise dates. Fix a quantile , numbers (we assume ), and three rates . Let

 N(i)=#{l:1≤l≤d, Xl(ti)≤(1−α)Xl(0)},

i.e., is the number of assets which at time are below percents of the initial value. We then introduce the random rate

 a(i)=s11{N(i)≤n1}+s21{n1

and specify the -coupon to be

 C(i)=a(i)(ti−ti−1).

For pricing this structured product, we need to compare the coupons with risk free coupons over the period and thus to consider the discounted net coupon process

 C(i)=e−rti(er(ti−ti−1)−1−C(i)),i=1,…,J.

The product value at time zero may then be represented as the solution of an optimal stopping problem with respect to the adapted discounted cash-flow, obtained as the aggregated net coupon process,

 V0=supτ∈{1,…,J}E[Zτ],Zj:=j∑i=1C(i).

For our experiments, we choose a five-year option with semiannual exercise possibility, that is, we have

 J=10, \ \ ti−ti−1=0.5, \ \ 1≤i≤10,

on a basket of assets. In detail, we take the following values for the parameters,

 d=20,r=0.05,δ=0,σl=0.2,Xl(0)=100,1≤l,m≤20,d1=5,d2=10,α=0.05,s1=0.09,s2=0.03,s3=0,

and

 ρlm={ρ,l≠m,1,l=m.

As to the basis functions, we used a constant, the discounted net coupon process and the order statistics . Table 1 shows the results of the numerical experiment comparing the lower and the corresponding dual upper bounds by the standard linear regression method with fixed basis (the second column of Table 1) and by the boosted approach described in Section 3.3 with one additional basis function The main conclusion is that the boosted regression algorithm delivers estimates of the same quality as the standard least squares approach by using much less basis functions (sparse basis). As a result the new algorithm turns out to be computationally cheaper.

### 4.2 Bermudan MaxCal option

To illustrate the impact of including the additional basis functions, such as the indicator , we consider Bermudan option on the maximum of underlying assets, each modeled by the geometric Brownian motion,

 Zkj=zk0exp{(r−δk−σ2k/2)tj+σkWktj},k=1,…,d,

with equal initial values , interest rate , dividend yields , volatilities and -dimensional Brownian motion with independent components. The discounted payoff upon exercise at time is the defined as

 gj(x)=[max1≤j≤dxj−K]+exp(−rtj),

where we take .

Table 2 shows a significant overall increase of the lower bound (and corresponding decrease of upper bound), for when the indicator functions are added to the set of basis functions. On the other hand, it turned out that the addition of this single basis function did not lead to an appreciable increase of computation times (see also cost analysis in Remark 3).

### References

1. Leif BG Andersen. A simple approach to the pricing of bermudan swaptions in the multi-factor libor market model. Journal of Computational Finance, 3:5–32, 1999.
2. Sebastian Becker, Patrick Cheridito, and Arnulf Jentzen. Deep optimal stopping. arXiv preprint arXiv:1804.05394, 2018.
3. Denis Belomestny. Pricing bermudan options by nonparametric regression: optimal rates of convergence for lower estimates. Finance and Stochastics, 15(4):655–683, 2011.
4. Mark Broadie and Paul Glasserman. Pricing american-style securities using simulation. Journal of Economic Dynamics and Control, 21(8):1323–1352, 1997.
5. Jacques F Carriere. Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance: mathematics and Economics, 19(1):19–30, 1996.
6. Daniel Egloff et al. Monte carlo algorithms for optimal stopping and statistical learning. The Annals of Applied Probability, 15(2):1396–1432, 2005.
7. Paul Glasserman. Monte Carlo methods in financial engineering, volume 53. Springer Science & Business Media, 2003.
8. F.A. Longstaff and E.S. Schwartz. Valuing american options by simulation: a simple least-squares approach. Review of Financial Studies, 14(1):113–147, 2001.
9. J. Tsitsiklis and B. Van Roy. Regression methods for pricing complex american style options. IEEE Trans. Neural. Net., 12(14):694–703, 2001.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minumum 40 characters