Optimal stopping via deeply boosted backward regression
Abstract
In this note we propose a new approach to numerically solving optimal stopping problems via boosted regression-based Monte Carlo algorithms. The main idea of the method is to boost standard linear regression algorithms in each backward induction step by adding new basis functions based on previously estimated continuation values. The proposed methodology is illustrated by several numerical examples from finance.
1 Introduction
An optimal stopping problem, in finance virtually synonymous with the pricing problem of an American-style derivative, can be efficiently solved in low dimensions, for instance by tree methods or by deterministic numerical methods for the corresponding partial differential equation. However, many American options in practice (see e.g. [7]) involve high-dimensional underlying processes, and this has made it necessary to develop Monte Carlo methods for pricing such options. Pricing American derivatives, hence solving optimal stopping problems, via Monte Carlo is a challenging task, because it typically requires backward dynamic programming, which for a long time was thought to be incompatible with the forward structure of Monte Carlo methods. In recent years much research has focused on the development of efficient methods to compute approximations to the value functions or the optimal exercise policy. Eminent examples include the functional optimization approach of [1], the mesh method of [4], and the regression-based approaches of [5], [8], [9], [6] and [3].

The most popular type of algorithms are without doubt the regression-based ones. In many practical pricing problems, low-degree polynomials are typically used for regression (see [7]). The resulting least squares problem then has a relatively small number of unknown parameters. However, this approach has an important disadvantage: it may exhibit too little flexibility for modeling the highly nonlinear behaviour of the exercise boundary. Higher-degree polynomials can be used, but they may contain too many parameters and, therefore, either overfit the Monte Carlo sample or prohibit parameter estimation altogether because the number of parameters is too large. One possible approach for controlling the complexity of a regression model is subset selection. The goal of subset selection is to find a subset of a fixed, predetermined dictionary of basis functions that corresponds to the model with the best predictive performance.
Before performing the actual subset selection, one must first predefine the dictionary that will provide the basis functions for model generation. This is usually done by setting the maximum degree of a full polynomial and taking the set of its basis functions. By using subset selection, one implicitly assumes that the predefined fixed finite dictionary of basis functions contains a subset that is sufficient to describe the target relation well. The problem is that the required maximum degree is generally unknown beforehand and, since it may differ from one backward step to another, it needs to be either guessed or found by an additional meta-search over the whole subset selection process.
In this paper a regression-based Monte Carlo approach is developed for building sparse regression models at each backward step of the dynamic programming algorithm. This enables estimating the value function at virtually the same cost as the standard regression algorithms based on low-degree polynomials, but with higher precision. The additional basis functions are constructed specifically for the optimal stopping problem at hand, without using a fixed predefined finite dictionary. Specifically, the new basis functions are learned during the backward induction by incorporating information from the preceding backward induction step. Our algorithm may be viewed as a method of constructing sparse nonlinear approximations of the underlying value function, and in this sense it extends the literature on deep-learning-type algorithms for optimal stopping problems; see, for example, the recent paper [2] and the references therein.
2 Main setup
An American option grants its holder the right to select the time at which she exercises the option, i.e. calls a prespecified reward or cashflow. This is in contrast to a European option, which may be exercised only at a fixed date. A general class of American option pricing problems, i.e. of optimal stopping problems, can be formulated with respect to an underlying $\mathbb{R}^d$-valued Markov process $(X_j)_{j \ge 0}$ defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_j)_{j \ge 0}, \mathsf{P})$. The process $(X_j)$ is assumed to be adapted to the filtration $(\mathcal{F}_j)_{j \ge 0}$ in the sense that each $X_j$ is $\mathcal{F}_j$-measurable. Recall that each $\mathcal{F}_j$ is a $\sigma$-algebra of subsets of $\Omega$ such that $\mathcal{F}_i \subseteq \mathcal{F}_j$ for $i \le j$. Henceforth we restrict ourselves to the case where only a finite number of exercise opportunities $j = 1, \dots, J$ are allowed (the Bermudan case in financial terms, where for notational convenience exercise at $j = 0$ is excluded). (In this respect it should be noted that a continuously exercisable (American) option can be approximated by such a Bermudan option with arbitrary accuracy, so this is not a severe restriction.) We now consider the prespecified reward $g_j(X_j)$ in terms of the Markov chain,
for some given functions $g_j$, $j = 1, \dots, J$, mapping $\mathbb{R}^d$ into $[0, \infty)$. In a financial context we may, and will, assume that the reward is expressed in units of some (tradable) pricing numéraire that has initial value one Euro, say. That is, if exercised at time $j$, the option pays a cash amount equivalent to $g_j(X_j)$ units of the numéraire. Let $\mathcal{T}_j$ denote the set of stopping times taking values in $\{j, \dots, J\}$. A standard result in the theory of contingent claims states that a fair price $Y^*_j(x)$ of the Bermudan option at time $j$ in state $x$, given that the option was not exercised prior to $j$, is its value under the optimal exercise policy,
$$ Y^*_j(x) = \sup_{\tau \in \mathcal{T}_j} \mathbb{E}\left[ g_\tau(X_\tau) \mid X_j = x \right], \qquad (1) $$
where the expectation is taken with respect to a corresponding martingale measure, hence the solution to an optimal stopping problem. In (1) we have to read $\mathcal{T}_0 := \mathcal{T}_1$ for $j = 0$, since exercise at $j = 0$ is excluded. Note that any tradable expressed in units of the numéraire is a martingale under this measure. A common feature of many approximation algorithms is that they deliver estimates $\hat C_{j,M}$ for the so-called continuation values:
$$ C_j(x) = \mathbb{E}\left[ Y^*_{j+1}(X_{j+1}) \mid X_j = x \right], \quad j = 0, \dots, J-1. \qquad (2) $$
Here the index $M$ indicates that the above estimates are based on a set of $M$ independent “training” trajectories
$$ \left( X^{(m)}_j \right)_{j = 0, \dots, J}, \quad m = 1, \dots, M, \qquad (3) $$
all starting from one point $x_0$. In the case of the so-called regression methods, the estimates for (1) and (2) are obtained via the Dynamic Programming Principle:
$$ Y^*_J(x) = g_J(x), \qquad Y^*_j(x) = \max\left( g_j(x),\, C_j(x) \right), \quad 1 \le j < J, \qquad Y^*_0(x) = C_0(x), \qquad (4) $$
combined with Monte Carlo. These regression algorithms can be described as follows. Suppose that for some $0 \le j < J$ an estimate $\hat Y_{j+1}$ for $Y^*_{j+1}$ is already constructed. Then in the $j$-th step one needs to estimate the conditional expectation
$$ \mathbb{E}\left[ \hat Y_{j+1}(X_{j+1}) \,\middle|\, X_j = x \right], \qquad (5) $$
where $\hat Y_{j+1}(x) := \max\left( g_{j+1}(x),\, \hat C_{j+1}(x) \right)$. This can be done by performing regression (linear or nonlinear) of the values $\hat Y_{j+1}\!\left(X^{(m)}_{j+1}\right)$ on the set of paths $X^{(m)}_j$, $m = 1, \dots, M$.
The whole backward procedure is trivially initialized by setting $\hat C_J :\equiv 0$, i.e. $\hat Y_J = g_J$. Given the estimates $\hat C_1, \dots, \hat C_J$, we next may construct a lower bound (low-biased estimate) for $Y^*_0$ using the (generally suboptimal) stopping rule
$$ \hat\tau = \min\left\{ j : 1 \le j \le J, \; g_j(X_j) \ge \hat C_j(X_j) \right\}, $$
with $\min \emptyset := J$ by definition. Indeed, fix a natural number $N$ and simulate $N$ new independent trajectories $\left(X^{(n)}_j\right)_{j = 0, \dots, J}$, $n = 1, \dots, N$, of the process $X$. A low-biased estimate for $Y^*_0(x_0)$ can then be defined as
$$ \hat Y_0 = \frac{1}{N} \sum_{n=1}^{N} g_{\tau^{(n)}}\!\left( X^{(n)}_{\tau^{(n)}} \right), \qquad (6) $$
with
$$ \tau^{(n)} = \min\left\{ j : 1 \le j \le J, \; g_j\!\left(X^{(n)}_j\right) \ge \hat C_j\!\left(X^{(n)}_j\right) \right\}. \qquad (7) $$
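To fix ideas, the backward induction (4)–(5) together with the lower-bound construction (6)–(7) can be sketched in a few lines. The single-asset Bermudan put model, its parameters, and the quadratic polynomial basis below are purely illustrative assumptions, not taken from this paper.

```python
import numpy as np

def simulate_paths(rng, n_paths, n_steps, s0=36.0, r=0.06, sigma=0.2, T=1.0):
    """Simulate GBM paths X_1, ..., X_J; returns an array of shape (n_paths, J)."""
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    incr = (r - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(incr, axis=1))

def g(x, j, strike=40.0, r=0.06, dt=0.1):
    """Discounted put payoff g_j(x) (numeraire = bank account)."""
    return np.exp(-r * dt * j) * np.maximum(strike - x, 0.0)

def backward_regression(paths, degree=2):
    """Backward induction (4)-(5): regress the targets max(g_{j+1}, C_{j+1})
    on a polynomial basis in X_j; returns the coefficient vectors."""
    M, J = paths.shape
    y = g(paths[:, -1], J)                       # \hat Y_J = g_J
    coeffs = {}
    for j in range(J - 1, 0, -1):                # times j = J-1, ..., 1
        basis = np.vander(paths[:, j - 1], degree + 1)
        coeffs[j], *_ = np.linalg.lstsq(basis, y, rcond=None)
        c_j = basis @ coeffs[j]                  # \hat C_j on the training paths
        y = np.maximum(g(paths[:, j - 1], j), c_j)
    return coeffs

def lower_bound(coeffs, paths, degree=2):
    """Low-biased estimate (6)-(7): stop the first time g_j >= \\hat C_j."""
    N, J = paths.shape
    value = g(paths[:, -1], J)                   # stop at J if never exercised
    stopped = np.zeros(N, dtype=bool)
    for j in range(1, J):                        # times 1, ..., J-1
        c_j = np.vander(paths[:, j - 1], degree + 1) @ coeffs[j]
        ex = (~stopped) & (g(paths[:, j - 1], j) >= c_j)
        value[ex] = g(paths[ex, j - 1], j)
        stopped |= ex
    return value.mean()

rng = np.random.default_rng(0)
coeffs = backward_regression(simulate_paths(rng, 50_000, 10))
price = lower_bound(coeffs, simulate_paths(rng, 50_000, 10))
```

The estimate `price` is low-biased by construction, since the stopping rule (7) is generally suboptimal.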
3 Adaptive regression algorithms
In this section we outline our methodology for estimating the solution to (1) at time $j = 0$, based on a set of training trajectories (3). In this respect, as a novel ingredient, we boost the standard regression procedures by learning and incorporating new basis functions on the fly during the backward induction. As a canonical example one may consider the incorporation of $\hat C_{j+1}$ as a basis function in the regression step for estimating $C_j$. Other possibilities are, for example, certain (spatial) derivatives of $\hat C_{j+1}$, or functions directly related to the underlying exercise boundary at time $j+1$, for example the indicator $\mathbf{1}_{\{g_{j+1} \ge \hat C_{j+1}\}}$. In general one may choose a (typically small) number of suitable boosting basis functions at each step.
3.1 Enhancing basis on the fly
Let us suppose that we have at hand some fixed, computationally cheap system of basis functions $\psi_1, \dots, \psi_K$. We now extend this basis at each backward regression step with an additional sparse set of new functions that are constructed in the preceding backward step on the given training paths. The main idea is that the so boosted basis delivers a more accurate regression estimate of the continuation function than the original basis, while remaining cheap.
3.2 Backward boosted regression algorithm
Based on the training sample (3), we propose a boosted backward algorithm that in pseudo-algorithmic terms works as follows.
At time $J$ we initialize $\hat C_J :\equiv 0$. Suppose that, for some $j < J$, the estimate $\hat C_{j+1}$ is already constructed in the form
$$ \hat C_{j+1}(x) = \sum_{k=1}^{K+L} \hat\gamma_{j+1,k}\, \psi^{(j+1)}_k(x). $$
For $j$ going from $J-1$ down to $0$, define the new boosted regression basis via
$$ \psi^{(j)} := \left( \psi_1, \dots, \psi_K,\, \varphi^{(j)}_1, \dots, \varphi^{(j)}_L \right) \qquad (8) $$
(as a row vector) due to a choice of the set of functions $\varphi^{(j)}_1, \dots, \varphi^{(j)}_L$ based on the previously estimated continuation value $\hat C_{j+1}$. For example, we might take $L = 2$ and consider functions of the form
$$ \varphi^{(j)}_1(x) = \max\left( g_{j+1}(x),\, \hat C_{j+1}(x) \right), \qquad \varphi^{(j)}_2(x) = \mathbf{1}_{\left\{ g_{j+1}(x) \ge \hat C_{j+1}(x) \right\}}. \qquad (9) $$
Then consider the $M \times (K+L)$ design matrix $\mathcal{M}_j$ with entries
$$ \left( \mathcal{M}_j \right)_{m,k} = \psi^{(j)}_k\!\left( X^{(m)}_j \right), \quad m = 1, \dots, M, \; k = 1, \dots, K+L, \qquad (10) $$
and the (column) vector $\mathcal{V}_j$ with entries
$$ \left( \mathcal{V}_j \right)_m = \max\left( g_{j+1}\!\left(X^{(m)}_{j+1}\right),\, \hat C_{j+1}\!\left(X^{(m)}_{j+1}\right) \right), \quad m = 1, \dots, M. \qquad (11) $$
Next compute and store
$$ \hat\gamma_j = \left( \mathcal{M}_j^\top \mathcal{M}_j \right)^{-1} \mathcal{M}_j^\top\, \mathcal{V}_j, \qquad (12) $$
and then set
$$ \hat C_j(x) := \sum_{k=1}^{K+L} \hat\gamma_{j,k}\, \psi^{(j)}_k(x). \qquad (13) $$
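A minimal sketch of one full pass of the boosted backward induction (8)–(13), under the choice (9) of boosting functions, might look as follows; the single-asset put model, its parameters, and the small fixed basis ($K = 3$) are our own illustrative assumptions. Note how evaluating $\hat C_{j+1}$ at the time-$j$ states unwinds the recursion through all later coefficient vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
M, J = 50_000, 10
s0, strike, r, sigma, T = 36.0, 40.0, 0.06, 0.2, 1.0
dt = T / J
z = rng.standard_normal((M, J))
paths = s0 * np.exp(np.cumsum((r - 0.5 * sigma ** 2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1))

def g(x, j):
    """Discounted payoff g_j at states x."""
    return np.exp(-r * dt * j) * np.maximum(strike - x, 0.0)

def psi(x):
    """Fixed cheap basis psi_1, ..., psi_K with K = 3: 1, x, x^2."""
    return np.column_stack([np.ones_like(x), x, x * x])

def chat(gammas, x, j):
    """Evaluate \\hat C_j(x) via the recursion (13): every level appends the
    boosting functions (9), phi_1 = max(g_{i+1}, \\hat C_{i+1}) and
    phi_2 = 1{g_{i+1} >= \\hat C_{i+1}}, to the fixed basis (8)."""
    c = np.zeros_like(x)                       # \hat C_J := 0
    for i in range(J - 1, j - 1, -1):          # i = J-1 down to j
        phi1 = np.maximum(g(x, i + 1), c)
        phi2 = (g(x, i + 1) >= c).astype(float)
        c = np.column_stack([psi(x), phi1, phi2]) @ gammas[i]
    return c

# boosted backward induction on the training paths
gammas = {}
y = g(paths[:, -1], J)                         # \hat Y_J = g_J
for j in range(J - 1, 0, -1):
    x = paths[:, j - 1]                        # states X_j
    c_next = chat(gammas, x, j + 1)            # \hat C_{j+1} at time-j states
    phi1 = np.maximum(g(x, j + 1), c_next)     # boosting functions (9)
    phi2 = (g(x, j + 1) >= c_next).astype(float)
    design = np.column_stack([psi(x), phi1, phi2])          # design matrix (10)
    gammas[j], *_ = np.linalg.lstsq(design, y, rcond=None)  # coefficients (12)
    y = np.maximum(g(x, j), design @ gammas[j])             # next target (11)

price0 = y.mean()                              # estimate of Y_0^* = C_0(x_0)
```

The call to `chat` inside the loop is what makes the algorithm "deep": each step costs an extra factor of order $J$ compared to a fixed-basis regression.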
3.3 Spelling out the algorithm
Let us spell out the above pseudo-algorithm under the choice (9) of boosting functions in more detail. In a precomputation step we first generate and save, for $m = 1, \dots, M$ and $j = 1, \dots, J$, the values
$$ \psi_k\!\left( X^{(m)}_j \right), \quad k = 1, \dots, K, \qquad \text{and} \qquad g_j\!\left( X^{(m)}_j \right). \qquad (14) $$
Backward procedure
For a generic backward step $j + 1 \to j$ we assume that the quantities
$$ \hat C_i\!\left( X^{(m)}_{j+1} \right), \quad i = j+1, \dots, J, \quad m = 1, \dots, M, \qquad (15) $$
are already constructed and stored by using the functional approximations
$$ \hat C_i(x) = \sum_{k=1}^{K+L} \hat\gamma_{i,k}\, \psi^{(i)}_k(x), \quad i = j+1, \dots, J, \qquad (16) $$
with
$$ \psi^{(i)} = \left( \psi_1, \dots, \psi_K,\, \varphi^{(i)}_1, \dots, \varphi^{(i)}_L \right), $$
where the coefficient vectors $\hat\gamma_i$ for $i = j+1, \dots, J-1$ are constructed and stored (with the convention $\hat C_J :\equiv 0$).
At the initial time $j = J-1$ we set $\hat C_J :\equiv 0$. Let us now assume that $j < J-1$ and proceed to time $j$. We first compute (10) and (11). The latter one, $\mathcal{V}_j$, is directly obtained from (15), for $i = j+1$, and the precomputed values (14). To compute (10), we need the values $\hat C_{j+1}\!\left(X^{(m)}_j\right)$ entering the boosting functions (9). Hence, we set
$$ \hat C_i\!\left( X^{(m)}_j \right) = \sum_{k=1}^{K+L} \hat\gamma_{i,k}\, \psi^{(i)}_k\!\left( X^{(m)}_j \right) $$
for $m = 1, \dots, M$, recursively for $i = J$ down to $i = j+1$, where each recursion level uses the values $\hat C_{i+1}\!\left(X^{(m)}_j\right)$ obtained at the previous one; these values also provide (15) for the next backward step. Next we may compute (and store) the coefficient vector (12), i.e. $\hat\gamma_j$, using (10) and (11), and formally establish (16) for $i = j$. In order to complete the generic backward step, we now need to evaluate
$$ \sum_{k=1}^{K} \hat\gamma_{j,k}\, \psi_k\!\left( X^{(m)}_j \right) \qquad (17) $$
$$ \sum_{l=1}^{L} \hat\gamma_{j,K+l}\, \varphi^{(j)}_l\!\left( X^{(m)}_j \right) \qquad (18) $$
for $m = 1, \dots, M$, whose sum yields $\hat C_j\!\left(X^{(m)}_j\right)$. The first part (17) is directly obtained from the precomputation (14) and the coefficients (12) computed in this step. For the second part (18) we have that
$$ \varphi^{(j)}_1\!\left( X^{(m)}_j \right) = \max\left( g_{j+1}\!\left(X^{(m)}_j\right),\, \hat C_{j+1}\!\left(X^{(m)}_j\right) \right), \quad \varphi^{(j)}_2\!\left( X^{(m)}_j \right) = \mathbf{1}_{\left\{ g_{j+1}\left(X^{(m)}_j\right) \ge \hat C_{j+1}\left(X^{(m)}_j\right) \right\}} $$
for $m = 1, \dots, M$. Thus the terms (18) are directly obtained from (14), the coefficients (12), and (15).
Remark 1
As can be seen, each approximation $\hat C_j$ depends nonlinearly on all previously estimated continuation functions $\hat C_{j+1}, \dots, \hat C_J$, and hence on all “features” entering these estimates. In this sense our procedure tries to find a sparse deep-network-type approximation (with indicator or maximum as activation functions) for the continuation functions based on simulated “features”. Compared to other deep-learning-type algorithms (see, e.g., [2]), our procedure does not require any kind of time-consuming nonlinear optimisation over high-dimensional parameter spaces.
Cost estimation
The total cost needed to perform the precomputation (14) is of order $J M (K+1)\, c_{\mathrm{ev}}$, where $c_{\mathrm{ev}}$ denotes the maximal cost of evaluating one of the functions $\psi_k$ and $g_j$ at a given point. The cost of one backward step from $j+1$ to $j$ can then be estimated from above by
$$ O\!\left( (J - j)\, M\, (K + L) + M (K+L)^2 \right) c_{+\times}, $$
where $c_{+\times}$ denotes the sum of the costs of the addition and the multiplication of two reals, and where the second term accounts for solving the least squares problem (12). Hence the total cost of the above algorithm can be upper bounded by
$$ O\!\left( J^2 M (K+L) + J M (K+L)^2 \right) c_{+\times} + J M (K+1)\, c_{\mathrm{ev}}, \qquad (19) $$
including the precomputation.
Remark 2
In the above cost estimation the cost of determining the maximum of two numbers is neglected.
3.4 Lower estimate based on a new realization
Suppose that the backward algorithm of Section 3.2 has been carried out, and that we now have an independent set of realizations $\left(X^{(n)}_j\right)_{j = 0, \dots, J}$, $n = 1, \dots, N$, with $X^{(n)}_0 = x_0$. In view of (6) and (7), let us introduce the stopping rule
$$ \tau^{(n)} = \min\left\{ j : 1 \le j \le J, \; g_j\!\left(X^{(n)}_j\right) \ge \hat C_{j,M}\!\left(X^{(n)}_j\right) \right\}. \qquad (20) $$
A lower estimate of $Y^*_0(x_0)$ is then obtained via
$$ \hat Y^{M,N}_0 = \frac{1}{N} \sum_{n=1}^{N} g_{\tau^{(n)}}\!\left( X^{(n)}_{\tau^{(n)}} \right). \qquad (21) $$
Here the index $M$ in the $\hat C_{j,M}$ indicates that these objects are constructed using the simulation sample (3) used in Section 3.2. As a result, (20) is a suboptimal stopping time and (21) is a low-biased estimate. Let us consider the computation of (20). The coefficient vectors $\hat\gamma_j$ were already computed in the backward algorithm above. We now have to consider the computation of $\hat C_{j,M}(x)$ for an arbitrary point $x$ at a particular time $j$, $1 \le j \le J$. For this we propose the following backward procedure.
Procedure for computing $\hat C_{j,M}(x)$ for an arbitrary state $x$

We first (pre)compute $\psi_k(x)$ for $k = 1, \dots, K$ and $g_i(x)$ for $i = j+1, \dots, J$, leading to a cost of order $(K + J - j)\, c_{\mathrm{ev}}$.

Next compute $\hat C_{i,M}(x)$, $i = J, \dots, j$, recursively as follows:

Initialize $\hat C_{J,M}(x) := 0$. Once $\hat C_{i+1,M}(x)$ with $i + 1 \le J$ is computed and saved, evaluate $\varphi^{(i)}_1(x)$ and $\varphi^{(i)}_2(x)$ using (9).

Compute
$$ \hat C_{i,M}(x) = \sum_{k=1}^{K+L} \hat\gamma_{i,k}\, \psi^{(i)}_k(x) $$
at a cost of order $(K + L)\, c_{+\times}$. In this way we proceed all the way down to $i = j$, at a total cost of order $(J - j)(K + L)\, c_{+\times} + (K + J - j)\, c_{\mathrm{ev}}$, including the precomputation step.

Due to the procedure described above, the cost of evaluating (21), based on the worst-case cost of computing (20), will be of order $N J^2 (K + L)\, c_{+\times}$.
Obviously (for $N \sim M$), this is of the same order as for the regression-based backward induction procedure described in Section 3.2.
Remark 3
From the cost analysis of the boosted regression algorithm it is obvious that the standard regression procedure, i.e. the regression procedure due to a fixed basis of $K$ functions without boosting, would require a computational cost of order
$$ O\!\left( J M K^2 \right) c_{+\times} + J M (K+1)\, c_{\mathrm{ev}} $$
for computing the regression coefficients. Hence the cost ratio due to the boosting procedure is approximately
$$ \frac{J (K + L) + (K + L)^2}{K^2}. $$
A subsequent lower estimate based on a new realization in the standard case would require a cost of about $N J K\, c_{+\times}$, yielding a cost ratio of about
$$ \frac{J (K + L)}{K} $$
accordingly (assuming $N$ is large). From this we conclude that the boosted regression is not much more expensive than the standard one as long as $J(K+L)/K^2$ is small (i.e. $K$ is relatively large), while the lower bound construction due to the boosted basis is not substantially more expensive as long as $J(K+L)/K$ remains moderate.
4 Numerical examples
In this section we illustrate the performance of boosted regression-based Monte Carlo algorithms by considering two option pricing problems from finance.
4.1 Bermudan cancelable swap
We first test our algorithm in the case of a so-called asset-based cancelable swap, a complex structured product. In particular, we demonstrate how to achieve a trade-off between accuracy and computational complexity by choosing the number of basis functions.
We consider a multidimensional Black–Scholes model; that is, we define the dynamics of $d$ assets under the risk-neutral measure via the system of SDEs
$$ dX^i_t = X^i_t \left( (r - \delta)\, dt + \sigma_i\, dW^i_t \right), \quad i = 1, \dots, d. $$
Here $W^1, \dots, W^d$ are correlated one-dimensional Brownian motions with time-independent correlations $\rho_{ij}$. The continuously compounded interest rate $r$ and the dividend rate $\delta$ are assumed to be constant.
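A path generator for such a correlated Black–Scholes model can be sketched as follows; the dimension, the equicorrelation structure, and all numerical values are hypothetical placeholders rather than the parameters used in the experiments.

```python
import numpy as np

def simulate_basket(rng, n_paths, n_steps, d, s0, r, delta, sigma, corr, T):
    """Simulate correlated geometric Brownian motions under the risk-neutral
    measure on the grid t_j = j T / n_steps; returns (n_paths, n_steps, d)."""
    dt = T / n_steps
    chol = np.linalg.cholesky(corr)              # correlate the d Brownian drivers
    z = rng.standard_normal((n_paths, n_steps, d)) @ chol.T
    drift = (r - delta - 0.5 * sigma ** 2) * dt  # sigma may be scalar or length-d
    return s0 * np.exp(np.cumsum(drift + sigma * np.sqrt(dt) * z, axis=1))

rng = np.random.default_rng(2)
d = 5
corr = 0.3 * np.ones((d, d)) + 0.7 * np.eye(d)   # hypothetical equicorrelation
paths = simulate_basket(rng, 2_000, 10, d, s0=100.0, r=0.05,
                        delta=0.0, sigma=0.2, corr=corr, T=5.0)
```

Passing a length-$d$ array for `sigma` yields asset-specific volatilities $\sigma_i$ via broadcasting over the last axis.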
Let us now define the asset-based cancelable coupon swap. Let $0 < T_1 < \dots < T_J$ be a sequence of exercise dates. Fix a quantile $\alpha \in (0,1)$, numbers $n_1, n_2 \in \{1, \dots, d\}$ (we assume $n_1 \le n_2$), and three rates $s_1, s_2, s_3$. Let
$$ \mathcal{N}_j := \#\left\{ i : X^i_{T_j} \le \alpha\, X^i_0 \right\}, $$
i.e., $\mathcal{N}_j$ is the number of assets which at time $T_j$ are below $\alpha \cdot 100$ percent of their initial value. We then introduce the random rate
$$ a_j := s_1\, \mathbf{1}_{\{\mathcal{N}_j < n_1\}} + s_2\, \mathbf{1}_{\{n_1 \le \mathcal{N}_j < n_2\}} + s_3\, \mathbf{1}_{\{\mathcal{N}_j \ge n_2\}} $$
and specify the coupon over the period $(T_{j-1}, T_j]$ to be $a_j (T_j - T_{j-1})$. For pricing this structured product, we need to compare these coupons with risk-free coupons over the period $(T_{j-1}, T_j]$, and thus to consider the discounted net coupon process
$$ Z_j := e^{-r T_j} \left( a_j - r \right) \left( T_j - T_{j-1} \right), \quad j = 1, \dots, J. $$
The product value at time zero may then be represented as the solution of an optimal stopping problem with respect to the adapted discounted cashflow obtained as the aggregated net coupon process
$$ U_j := \sum_{i=1}^{j} Z_i, \quad j = 1, \dots, J. $$
For our experiments, we choose a five-year product with semiannual exercise possibility, that is, we have $J = 10$ exercise dates $T_j = j/2$, $j = 1, \dots, 10$, on a basket of $d$ assets.
As to the basis functions, we used a constant, the discounted net coupon process, and the order statistics of the asset values. Table 1 shows the results of the numerical experiment, comparing the lower bounds and the corresponding dual upper bounds obtained by the standard linear regression method with a fixed basis (first part of Table 1) and by the boosted approach described in Section 3.3 with one additional boosting basis function (second part of Table 1). The main conclusion is that the boosted regression algorithm delivers estimates of the same quality as the standard least squares approach while using far fewer basis functions (a sparse basis). As a result, the new algorithm turns out to be computationally cheaper.
Basis functions     Linear regression
                    Low Estimation      High Estimation
0                   171.59 (0.037)      177.24 (0.061)
                    173.62 (0.044)      177.33 (0.062)
0.2                 180.00 (0.060)      199.62 (0.125)
                    188.01 (0.055)      197.02 (0.143)
0.5                 176.43 (0.073)      201.21 (0.189)
                    183.41 (0.033)      196.58 (0.147)
0.8                 133.29 (0.065)      158.12 (0.197)
                    140.17 (0.061)      153.49 (0.106)

Basis functions     Linear regression & boosting function
                    Low Estimation      High Estimation
0                   173.28 (0.031)      177.32 (0.091)
                    174.33 (0.036)      176.58 (0.057)
0.2                 187.57 (0.057)      195.09 (0.121)
                    188.07 (0.046)      195.95 (0.108)
0.5                 181.98 (0.047)      194.04 (0.088)
                    182.93 (0.057)      194.97 (0.127)
0.8                 138.41 (0.087)      153.08 (0.106)
                    139.62 (0.035)      152.57 (0.096)
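For concreteness, a regression basis of the kind used above (a constant, the discounted net coupon, and the order statistics of the asset values) might be assembled as in the following sketch; the function name and array conventions are our own.

```python
import numpy as np

def swap_basis(assets, net_coupon):
    """Regression basis at one exercise date: a constant, the discounted
    net coupon value, and the order statistics of the d asset values.
    assets: (M, d) states; net_coupon: (M,) discounted net coupon process."""
    const = np.ones((assets.shape[0], 1))
    order_stats = np.sort(assets, axis=1)        # d order statistics per path
    return np.column_stack([const, net_coupon, order_stats])
```

For $d$ assets this yields $K = d + 2$ basis functions per exercise date.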
4.2 Bermudan max-call option
To illustrate the impact of including additional basis functions, such as the indicator in (9), we consider a Bermudan option on the maximum of $d$ underlying assets, each modeled by a geometric Brownian motion
$$ dX^i_t = X^i_t \left( (r - \delta)\, dt + \sigma\, dW^i_t \right), \quad i = 1, \dots, d, $$
with equal initial values $X^i_0 = x_0$, interest rate $r$, dividend yields $\delta$, volatilities $\sigma$, and a $d$-dimensional Brownian motion $W$ with independent components. The discounted payoff upon exercise at time $t$ is then defined as
$$ Z_t = e^{-r t} \left( \max_{i = 1, \dots, d} X^i_t - \kappa \right)^+ $$
for a fixed strike $\kappa$.
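The discounted max-call payoff can be sketched as follows; the function name and the concrete strike and rate appearing in any test call are hypothetical, since the parameter values of the experiment are fixed separately.

```python
import numpy as np

def maxcall_payoff(states, t, strike, r):
    """Discounted payoff e^{-r t} (max_i X^i_t - strike)^+ for
    states of shape (n_paths, d) observed at time t."""
    return np.exp(-r * t) * np.maximum(states.max(axis=1) - strike, 0.0)
```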
Table 2 shows a significant overall increase of the lower bound (and a corresponding decrease of the upper bound) when the indicator functions are added to the set of basis functions. On the other hand, it turned out that the addition of this single extra basis function did not lead to an appreciable increase in computation time (see also the cost analysis in Remark 3).
            With indicator          Without indicator
            Lower       Upper       Lower       Upper
10  90  85.219  86.636  85.001  86.895 
10  100  103.881  105.790  103.617  106.022 
10  110  122.793  125.141  122.612  125.176 
20  90  125.433  127.280  125.250  127.542 
20  100  149.047  151.062  148.877  151.099 
20  110  172.652  174.889  172.455  175.121 
30  90  154.070  156.109  153.968  156.007 
30  100  180.954  182.957  180.686  183.102 
30  110  207.656  210.214  207.339  210.266 
50  90  195.838  197.683  195.623  197.962 
50  100  227.052  229.356  227.042  229.535 
50  110  258.449  261.392  258.446  261.152 
100  90  263.192  265.571  263.157  265.542 
100  100  301.979  304.500  302.014  304.148 
100  110  340.745  343.917  340.831  343.470 
References
 Leif BG Andersen. A simple approach to the pricing of bermudan swaptions in the multifactor libor market model. Journal of Computational Finance, 3:5–32, 1999.
 Sebastian Becker, Patrick Cheridito, and Arnulf Jentzen. Deep optimal stopping. arXiv preprint arXiv:1804.05394, 2018.
 Denis Belomestny. Pricing bermudan options by nonparametric regression: optimal rates of convergence for lower estimates. Finance and Stochastics, 15(4):655–683, 2011.
 Mark Broadie and Paul Glasserman. Pricing americanstyle securities using simulation. Journal of Economic Dynamics and Control, 21(8):1323–1352, 1997.
 Jacques F Carriere. Valuation of the earlyexercise price for options using simulations and nonparametric regression. Insurance: mathematics and Economics, 19(1):19–30, 1996.
 Daniel Egloff et al. Monte carlo algorithms for optimal stopping and statistical learning. The Annals of Applied Probability, 15(2):1396–1432, 2005.
 Paul Glasserman. Monte Carlo methods in financial engineering, volume 53. Springer Science & Business Media, 2003.
 F.A. Longstaff and E.S. Schwartz. Valuing american options by simulation: a simple leastsquares approach. Review of Financial Studies, 14(1):113–147, 2001.
 J. Tsitsiklis and B. Van Roy. Regression methods for pricing complex american style options. IEEE Trans. Neural. Net., 12(14):694–703, 2001.