Optimal stopping of McKean-Vlasov diffusions via regression on particle systems

# Optimal stopping of McKean-Vlasov diffusions via regression on particle systems

Denis Belomestny  and John Schoenmakers
Duisburg-Essen University. Email: denis.belomestny@uni-due.deWeierstrass Institute of Applied Mathematics. Email: schoenma@wias-berlin.de
###### Abstract

In this paper we study optimal stopping problems for nonlinear Markov processes driven by a McKean-Vlasov SDE and aim at solving them numerically by Monte Carlo. To this end we propose a novel regression algorithm based on the corresponding particle system and prove its convergence. The proof of convergence is based on perturbation analysis of a related linear regression problem. The performance of the proposed algorithms is illustrated by a numerical example.

## 1 Introduction

Numerical solution of multidimensional optimal stopping problems remains an important and active area of research with applications in finance, operations research and control. As the underlying models are getting more and more complex, the computational issues are becoming more relevant than ever. As a matter of fact, analytic and usual finite difference methods for solving optimal stopping problems deteriorate in high-dimensional problems. Therefore attention has turned to probabilistic approaches, based on Monte Carlo based approximative dynamic programming. Historically, one of the first motivating examples was the pricing of an American (or Bermudan) put option in a Black-Scholes model. Not much later, many American options that showed up involved high dimensional underlying processes, which has led to the development of several Monte Carlo based methods in the last decades (see e.g. [11]). Pricing American derivatives, hence solving optimal stopping problems, via Monte Carlo has always been viewed as a challenging task, because it requires backward dynamic programming that seems to be incompatible with the forward structure of Monte Carlo methods. In particular much research was focused on the development of fast methods to compute approximations to the optimal exercise policy. The seminal paper of Longstaff and Schwartz [13] proposed to use a cross-sectional regression over a Monte Carlo sample to compute the conditional expectations involved in the dynamic programming algorithm. The key innovation in [13] was the use of the estimated conditional expectations for decision-making, rather than quantification of expected gains. This corresponds to replacing a regression problem with a classification one, see [9]. Originally designed for the setup of American option pricing, this algorithm, which we term RMC (Regression Monte Carlo), has become widely accepted in financial mathematics, insurance and dynamic programming settings. It has also been implemented in many proprietary valuation systems employed by the financial industry. The great success of RMC is due to its flexible and simple implementation as well as its strong empirical performance. Other eminent examples of fast approximation methods include the functional optimization approach of [1], the mesh method of [7], the regression-based approaches of [8], [16], [9] and [3].

In this paper we propose and study a simulation based method for solving an optimal stopping problem for nonlinear Markov processes of the McKean-Vlasov type. In spite of extensive literature on stochastic particle systems corresponding to MV-SDEs, generic numerical methods for solving optimal stopping problems in this context are hard to find (to the best of our knowledge). Our study is motivated by the recent theoretical developments in control problems for MV-SDEs and recent applications of MV-SDEs in financial mathematics. In this respect we mention the recent work of [14] (see also the references therein), where a general form of the Bellman principle is derived for the optimal control problems under McKean-Vlasov dynamics.

Because of dependence of the transition kernels on the marginal distributions, nonlinear Markov processes can not be sampled like standard diffusion processes. Instead the so-called interacting particle method combined with time discretisation is used to approximate them. However, unlike the standard Monte Carlo, the simulated particles are not independent. The key result in the theory of interacting particle systems is the so-called propagation of chaos result showing that the particles fulfil a kind of law of large numbers. In particular, one can prove strong convergence of the interacting particle system to the solution of the original McKean-Vlasov equation. Here we propose a fast approximation method for optimal stopping problems related to (generally multidimensional) MV- SDEs in spirit of the celebrated RMC method by Longstaff and Schwartz, which is based on the underlying particle system of essentially dependent particles, rather than a Monte Carlo sample of independent trajectories as in [13]. In this respect one can speak about the Particle-Regression-Monte-Carlo (PRMC) method. The convergence of this method is proved via a perturbation analysis of a related linear regression problem due to an i.i.d. sample of the original MV-SDE, and a recursive error analysis of the backward induction algorithm (in the spirit of [4] and [18] for the case of independent samples). From a mathematical point of view, this analysis may be considered as the main contribution of the present paper. Summing up, we provide a generic simulation based numerical approach for solving optimal stopping problems with respect to a reward that depends on a process following multidimensional MV-SDE. Although this problem may be relevant in a financial context, we note that financial terminology used in this paper is merely chosen for illustrative purposes, and that the application scope of the developed methods is not restricted to finance.

The structure of the paper is as follows. The general setup for optimal stopping in a Markovian environment is presented in Section 2. In this section we also give a concise recap of the Longstaff-Schwartz and Tsitsiklis van Roy method developed in [13] and [16], respectively. Section 3 introduces Mckean-Vlasov equations and their connection with particle systems, while Section 4 introduces a particle version of regression based backward dynamic programming in the spirit of [13]. In Section 5 we present one of our main results, Theorem 4, that deals with the convergence of the regression approach applied to (generally dependent) particles. The convergence of the PRMC algorithm is studied in Section 6. Before proceeding to a rather general perturbation analysis in Section 8 and proving Theorem 4 and Theorem 7 in Section 9.1, we present some numerical experiments in Section 7.

## 2 Optimal stopping, backward dynamic program

Let us explain the issue of optimal stopping in the context of American options as a popular illustration. An American option grants its holder the right to select time at which she exercises the option, i.e., calls a pre-specified reward or cash-flow. This is in contrast to a European option that may be exercised only at a fixed date. A general class of American option pricing problems, i.e., optimal stopping problems respectively, can be formulated with respect to an underlying -valued Markov process defined on a filtered probability space . The process is assumed to be adapted to a filtration in the sense that each is measurable. Recall that each is a -algebra of subsets of such that for In general, may describe any underlying physical or economical quantity of interest such as, for example, the outside temperature, some (not necessarily tradable) market index or price. Henceforth we restrict our attention to the case where only a finite number of stopping (exercise) opportunities are allowed (Bermudan options in financial terms). In this respect it should be noted that a continuous exercise (American) option can be approximated by such a Bermudan option with arbitrary accuracy, and so this is not a huge restriction. We now consider a pre-specified reward in terms of the discrete time Markov chain

 Zi:=Xti,i=0,…,J,

with for some given functions mapping into Note that for technical convenience and without loss of generality we exclude from the set of exercise dates. In a financial context we may assume that the reward is expressed in units of some (tradable) pricing numéraire that has initial value Euro, say. That is, if exercised at time , the option pays cash equivalent to units of the numéraire. Let denote for the set of stopping times taking values in In the spirit of contingent claim pricing in finance we then assume that, for a fair price of the corresponding Bermudan option at time in state given that the option was not exercised prior to , is its value under the optimal exercise policy,

 V∗j(z)=supτ∈TjE[gτ(Zτ)|Zj=z]=E[gτ∗j(Zτ∗j)|Zj=z],z∈Rd, (1)

hence the solution to an optimal stopping problem. In (1) we introduced an optimal stopping family (policy), which can be also expressed in the form

 τ∗j:=min{j≤l≤J:gl(Zl)≥C∗l(Zl)}, (2)

where

 C∗j(z):=E[V∗j+1(Zj+1)|Zj=z],j=1,…,J−1, \ \ and \ \ C∗J(z)≡0, (3)

are so-called continuation functions. The process is called the Snell envelope of the reward process Both the stopping family (2) and the set of continuation functions (3) satisfy the well-known Dynamic Programming Principle. In particular, for (2) we have the backward recursion

 τ∗J =J, τ∗j =j1{gj(Zj)≥C∗j(Zj)}+τ∗j+11{gj(Zj)

and for (3) it holds,

 C∗J(z) ≡0, (4) C∗j(z) =E[gτ∗j+1(Zτ∗j+1)|Zj=z] =E[max(gj+1(Zj+1),C∗j+1(Zj+1))|Zj=z],j=1,…,J−1.

A common feature of almost all existing fast approximation algorithms is that they deliver estimates for the continuation functions Here the index indicates that the above estimates are based on the set of independent “training” trajectories all starting from one point i.e., In the case of regression methods the estimates for the continuation values are obtained via regression based Monte Carlo. For example, at step one may estimate the expectation

 E[max(gj+1(Zj+1),CN,j+1(Zj+1)))Zj=z] (5)

via linear regression based on the set of paths

 (Z(i)j,max{gj+1(Z(i)j+1),CN,j+1(Z(i)j+1)}),i=1,…,N, (6)

where is the estimate for obtained in the previous step, and This approach is basically the method by Tsitsiklis and van Roy [16]. Alternatively, in the so-called Longstaff-Schwarz algorithm, one mixes the estimates of the continuation values with the corresponding estimates of the stopping times. More precisely, on each trajectory approximate stopping times are recursively constructed by first initializing for all Then, once is constructed for one computes from the sample

 (7)

an estimate of the continuation function by projection on the linear span of a set of basis functions. Subsequently, the approximate stopping times corresponding to exercise date on trajectories are defined via,

 τ(i)N,j=j1{gj(Z(i)j)≥CN,j(Zj(i))}+τ(i)N,j+11{gj(Z(i)j)

Working all the way back, we thus obtain again a set of approximate continuation functions (note that ).

###### Remark 1

It should be noted that the algorithm based on (7) is more popular than the one based on (6), because it behaves more stable in practice, particularly when the number of exercise dates is getting very large. Intuitively, this can be explained by the fact that the regression (7) is always carried out on actually realized cash-flows, rather than on approximative value functions as in (6) which may become quite unprecise because of error propagation due to a huge number of exercise dates, see for example [3] for more rigorous arguments.

Given the estimates , constructed by one of the fast approximation methods above, we can construct a lower bound (low biased estimate) for using the (generally suboptimal) stopping rule:

 τN=min{1≤j≤J:gj(Zj)≥CN,j(Zj)}

with by definition. Indeed, fix a natural number and simulate new independent trajectories of the process A low-biased estimate for can then be defined as

 VNtest,N0=1NtestNtest∑r=1gτ(r)N(Z(r)τ(r)k)

with

 τ(r)N=inf{1≤j≤J:gj(Z(r)j)≥CN,j(Z(r)j)},r=1,…,Ntest.

## 3 McKean-Vlasov equations and particle systems

Let be a finite time interval and be a complete probability space, where a standard -dimensional Brownian motion is defined. We consider a class of McKean-Vlasov stochastic differential equations (MVSDE), i.e., stochastic differential equations whose drift and diffusion coefficients may depend on the current distribution of the process, of the form:

 ⎧⎨⎩Xt=ξ+∫t0∫Rda(Xs,y)μs(dy)ds+∫t0∫Rdb(Xs,y)μs(dy)dWsμt=Law(Xt),t≥0,X0∼μ0 (9)

where is a distribution in and A popular way of simulating the MVSDE (9) is to sample the so-called particle system from -dimensional SDE

 Xi,Nt=ξi+1NN∑j=1∫t0a(Xi,Ns,Xj,Ns)ds+1NN∑j=1∫t0b(Xi,Ns,Xj,Ns)dWis (10)

for where are i.i.d copies of a r.v. distributed according the law and are independent copies of Under suitable assumptions (see, for example  [2]) one has that

 ∥∥∥sup0≤r≤T∣∣X⋅,Nr−X⋅r∣∣∥∥∥p≤CpN−1/2. (11)

In practice, -dimensional SDE system (10) cannot be solved analytically either and one has to approximate its solution by a suitable numerical integration scheme such as the Euler method, leading to a next approximation with being the size of the each Euler time step. Following [2], one then has

 ∥∥∥sup0≤r≤T∣∣X⋅,N,δr−X⋅,Nr∣∣∥∥∥p≲√δ, (12)

where involves a constant that does not depend on and

###### Remark 2

In order to fix the main ideas and to avoid a notational blow up, we assume in this paper that the system (cf. (10)) is constructed exactly, hence we neglect the numerical integration error (12). On the other hand, due to (12) it will be clear how several results in this paper have to be adapted in the case where (10) is approximated using the Euler scheme.

###### Remark 3

In fact, the solution to the MVSDE (9) may be considered as a usual non-autonomous, Markovian diffusion, since is some deterministic flow of distributions, although not explicitly known beforehand. Therefore, we may consider the stopping problem (1) with respect to the solution (9), while the standard notions of the Snell envelope and the Dynamic Programming Principle still apply. However, in contrast to the standard diffusion processes where independent trajectories of may be simulated straightforwardly by Monte Carlo, simulating of independent copies of (9) is not directly possible. As a way out, we will work with the particle system (10) of dependent particles, instead of an ensemble of independent trajectories of (9).

## 4 Dynamic programming on particle systems

In this section we describe a particle version of the Longstaff-Schwarz regression algorithm due to (7) and (8). First we run the particle system as described above and set

 Zi,Nj=Xi,NjΔ,j=0,…,J,i=1,…,N, (13)

with It should be noted that, unlike Monte Carlo, the trajectories (13) are generally dependent. We now consider an approximative dynamic programming algorithm based on the (generally dependent) paths In the spirit of the Longstaff-Schwarz algorithm we compute sequentially for approximate continuation functions and approximate stopping times That is, we initialize and and once and are constructed, is obtained from solving the minimization,

 CN,j:=argminh∈HK⎧⎨⎩1NN∑i=1(gτ(i)N,j+1(Zi,Nτ(i)N,j+1)−h(Zi,Nj))2⎫⎬⎭ (14)

(). Next is updated analogue to the scheme (8), i.e.,

 τ(i)N,j=j1{gj(Zi,Nj)≥CN,j(Zi,Nj)}+τN,j+11{gj(Zi,Nj)

Note that, for fixed the pairs

 (Zi,Nj,gτ(i)N,j+1(Zi,Nτ(i)N,j+1)), \ \ i=1,...,N, (15)

are generally dependent, but, have the same distribution for each As such (14) is indeed can be viewed as an estimator of (cf. (4)). Usually the space is taken to be the linear span of some given set of basis functions, so that the minimization (14) boils down to a linear least squares problem that can be solved via straightforward linear algebra. Let us refer to the above algorithm as the PRMC algorithm.

## 5 Regression on interacting particle systems

In this section we consider for some fixed and time a generic problem of computing the functionals of the form

 w(x)=E[f(XT)∣Xt=x], (16)

where is the solution of (9). In (16), may in general be any random time. Let be a particle system (10). Furthermore, let for each be a -dimensional linear space of functions and consider the estimate

 ˜wN:=argminh∈HK{1NN∑i=1(f(Xi,NT)−h(Xi,Nt))2}, (17)

where the dimension may depend on Let be a sequence of linearly independent basis functions and set In this section we are going to analyze the properties of the estimate Note that the random variables are generally dependent, so that the known results from regression analysis (see, e.g. [12]) can not be applied directly. Consider the truncated version of the estimate defined as where is a truncation operator defined for a generic function and a threshold as

 TMh=⎧⎨⎩M,h>M,h,−M≤f≤M,−M,h<−M. (18)

We have the following theorem.

###### Theorem 4

Assume that all functions and are globally bounded and Lipschitz continuous. That is, there exist constants such that for all

 |f(x)| ≤Mf,1KK∑k=1|ψk(x)|2≤M2K, (19) |ψk(x)−ψk(y)| ≤Lk|x−y|,|f(x)−f(y)|≤Lf|x−y|. (20)

Further suppose that

 0<ϰ∘≤λmin(ΣK)<λmax(ΣK)≤ϰ∘<∞ (21)

for all where

 ΣK=(∫ψk(x)ψl(x)μt(dx),k,l=1,…,K).

Then, it holds

 ∥TMf˜wN(x)−w(x)∥L2(μt)≲MK√K√N[d1MfℓK+d2Lf]+Mf√N[d3ℓK+√1+logN√K]+Mf√Kexp[−d4NKM2K]+infh∈HK∥h−w∥L2(μt) (22)

with

 ℓ2K:=K∑k=1L2k. (23)

In (22) the constants depend on only, denotes up to a universal constant for each term.

Theorem 4 will be proved in Section 9.1.

## 6 Convergence analysis of the PRMC algorithm

In this section we investigate the convergence properties of the PRMC regression algorithm. To this end, we modify the PRMC algorithm in such a way that our fundamental result, Theorem 4, may be applied. In fact, we follow an approach in the spirit of [18] (cf. [4]), and assume that instead of one particle sample we have at hand for independent particle samples all starting at We next assume that and are constructed in the following backward recursive way: As initialization we set and Once and are determined based on the samples then is constructed based on the samples via

 CN,j=argminh∈HKN∑i=1(gτ(i)N,j+1(Zj;i,Nτ(i)N,j+1)−h(Zj;i,Nj))2,

and, subsequently, is defined by

 τN,j:=j1{gj(Zj)≥CN,j(Zj)}+τN,j+11{gj(Zj)

for a generic dummy trajectory corresponding to the (exact) solution of (9) independent of

 Gj:=σ{Zj;N,…,ZJ−1;N}.

Let us further define

 ¯¯¯¯CN,j(z):=EGj+1[gτN,j+1(ZτN,j+1)∣∣Zj=z]. (24)

Note that the random function is -measurable while its estimate is -measurable. By running this procedure all the way down to we so end up with a sequence of approximative continuation functions and the corresponding conditional expectations The following lemma holds.

###### Lemma 5

For the conditional expectations (24) we have that,

 ∥∥¯¯¯¯CN,j−C∗j∥∥Lp(μj)≤J−1∑l=j+1∥∥CN,l−C∗l∥∥Lp(μl) (25)

with by slightly abusing notation and using Note that the inequality (25) involves -measurable objects.

###### Remark 6

It is interesting to compare the estimate (25) with similar ones in Lemma 2.3 of [18].

The following theorem states the convergence of the approximate continuation functions in the PRMC algorithm to the exact ones, respectively.

###### Theorem 7

Assume that the conditions (19), (20), (21) are fulfilled with replaced by uniformly in . By denoting the norm

 ∥⋅∥2L2(μj,P):=E[∥⋅∥2L2(μj)],

due to the unconditional expectation with respect to the “all in” probability measure one has for

 ∥∥CN,J−j−C∗J−j∥∥L2(μJ−j,P)≤ΔN,K(2+c)j, (26)

where for some with

 ϵN,K=√KN(MKℓK+η1√1+logN)+η2√Kexp[−η3NKM2K]

for some constants not depending on and

###### Example 8

The Hermite polynomial of order is given, for , by:

 Hj(x)=(−1)jex2djdxj(e−x2).

Hermite polynomials are orthogonal with respect to the weight function and satisfy: The Hermite function of order is given by:

 ψj(x)=cjHj(x)e−x2/2,cj=(2jj!√π)−1/2. (27)

The sequence is an orthonormal basis of . The infinite norm of satisfies (see Szegö [15] p.242):

 ∥ψj∥∞≤M0,M0≃1,086435/π1/4≃0.8160. (28)

Furthermore, since we derive that the condition (20) is fulfilled with

## 7 Numerical experiment

As a simple illustration of the proposed methodology, let us consider optimal stopping problem in the so-called Shimizu-Yamada model

 dXt=(aE[Xt]+bXt)dt+σdWt,X0=x0,t∈[0,T] (29)

(see [10], Section 3.10), which has the explicit solution

 Xt=x0e(a+b)t+σ∫t0eb(t−s)dWs (30)

that in turn solves the ordinary SDE

 dXt=(x0ae (a+b)t+bXt)dt+σdWt (31)

(cf. [5]). It is straightforward to show that the conditional mean and variance of (30) are given by

 E[Xt|Xs] =eb(t−s)Xs+x0ebt(eat−eas) \ \ and Var[Xt|Xs] =σ2e2b(t−s)−12b, \ \ 0≤s≤t,

respectively. That is, for we have that

 E[Xt]=x0e(a+b)t \ \ and \ \ Var[Xt]=σ2e2bt−12b. (32)

In the particular case one so has

 E[Xt|Xs] =e−a(t−s)Xs+x0(1−e−a(t−s)), (33) Var[Xt|Xs] =σ21−e−2a(t−s)2a, \ \ 0≤s≤t, \ \ a>0,

and (32) yields for all For this case the particle system (10) reads

 Xi,Nt=x0+aNN∑j=1∫t0Xj,Nsds−a∫t0Xi,Nsds+σWit,t∈[0,T]. (34)

We now consider the optimal stopping problem

 V∗0=supτ∈TjE[gτ(Xtτ)], (35)

for some reward functions where solves (29) with Since follows an ordinary SDE (31), we may compute an approximation to (35) numerically by the Longstaff-Schwarz method [13] based on independent trajectories of (31), and then compare it to the particle based Longstaff-Schwarz algorithm proposed in Section 4. Let us further consider, for illustration a Bermudan put option (in financial terms),

 gj(x)=e−rtj(x−K)+,j=0,…,J,

for some where can be interpreted as interest rate. We take the following parameters and implement the following two phase algorithm. In the first stage we run trajectories either of the particle system (34) or of the process (31). Using these trajectories, we estimate the corresponding continuation functions using linear regression with quadratic polynomials and reward functions as basis functions. In the second stage we use the estimated continuation values on a new set of testing trajectories to construct a suboptimal stopping rule and consequently a lower bound for by averaging over the testing paths. We also compute dual upper bounds using the estimated continuation functions and inner paths to approximate one step conditional expectations, see, e.g. Chapter 3 in [6]. The results for different values of are shown in Table 1.

As can be seen from Table 1, the PRMC (Particle Regression MC) performs a bit worse than RMC (Regression MC), but the difference becomes smaller as increases.

## 8 Perturbation analysis for linear regression

Consider a least squares problem of the form

 β∘=argminβ∈RdN∑i=1(Yi−β⊤Ui)2, (36)

where for are i.i.d. pairs of a random variable and a random (column) vector With and the solution of the problem (36) can be written in terms of pseudo inverses (denoted with ),

 β∘=(UU⊤)−1UY=(Z⊤Z)−1Z⊤V=Z†V. (37)

Consider now the least squares problem (36) due to a perturbation of the pairs and define and accordingly. We so consider (cf. (37))

 ˜β∘=(˜Z⊤˜Z)−1˜Z⊤˜V=˜Z†˜V (38)

and set

 ˜Z=Z+E,˜V=V+F. (39)

While the rows of and the components of are independent, the rows of the perturbation matrix and the components of the perturbation vector are generally dependent.  Also we note that we don’t assume any kind of independence between the perturbations and and the matrix and vector respectively.

###### Theorem 9

Consider the least squares problem (36) with solution (37), and its perturbation due to (39) with solution (38), respectively. Assume that in (36) are i.i.d. random vectors in such that for some a.s. Set

 E[U1U⊤1]=Σ,

so that

 Z⊤Z=1NUU⊤=1NN∑i=1UiU⊤i.

Let be the smallest eigenvalue, and be the largest eigenvalue of respectively. Then for any and we have on the set with

 C1 : \ \ λmax(Z⊤Z)<λmax(Σ)+ε, C2 : \ \ λmin(Z⊤Z)>λmin(Σ)−ε, C3 : \ \ λmin(Σ)−(2√λmax(Σ)+ε+1)∥E∥>ρ+ε, C4 : \ \ ∥E∥<1.

that

 ∥˜β∘−β∘∥≤c1(Σ,ε,ρ)∥E∥∥V∥+c2(Σ,ε,ρ)∥F∥,

where

 c1(Σ,ε,ρ) :=1ρ+2(λmax(Σ)+ε)+√λmax(Σ)+ερ2 \ \ and c2(Σ,ε,ρ) :=c1(Σ,ε,ρ)+√λmax(Σ)+ελmin(Σ)−ε.

Furthermore, for any and such that

 ε=εδ,N=M√log(2d/δ)NCλ3/2max(Σ)λmin(Σ)≤λmin(Σ)−ρ (40)

(cf. (51)), one has for the probability of

 P[C]≥1−δ−Cp⎛⎝(2√λmax(Σ)+ε+1λmin(Σ)−ε−ρ)p+1⎞⎠,

provided and are small enough (such that the above bound is positive).

Proof. Note that implies (46) in Lemma 11 and so by this Lemma,

 ∥(Z+E)†−Z†∥ ≤∥E∥ρ[1+(2∥Z∥+1)∥Z∥ρ] ≤∥E∥ρ[1+2(λmax(Σ)+ε)+√λmax(Σ)+ερ] =c1(Σ,ε,ρ)∥E∥

Thus, on one has also,

 ∥(Z+E)†∥ ≤∥(Z+E)†−Z†∥+∥Z†∥ ≤