A Dynamic Programming Principle for Distribution-Constrained Optimal Stopping
Abstract.
We consider an optimal stopping problem where a constraint is placed on the distribution of the stopping time. Reformulating the problem in terms of so-called measure-valued martingales allows us to transform the marginal constraint into an initial condition and view the problem as a stochastic control problem; we establish the corresponding dynamic programming principle.
email: sigrid.kaellblad@tuwien.ac.at
1. Introduction
We consider the following optimal stopping problem with a constraint placed on the distribution of the stopping time: given a probability measure on and a filtered probability space supporting a Brownian motion , we aim at finding
(1.1) 
where is the set of stopping times with distribution and is a given measurable cost function satisfying . The problem is related to the so-called inverse first-passage-time problem which has a long history; it has also attracted recent attention: see e.g. [3, 5, 11], to which we refer for further motivation, references and an exposition of its role within financial and actuarial mathematics.
When the underlying filtration is general enough to allow for an independent uniformly distributed random variable, problem (1.1) is equivalent to its weak formulation where the supremum is also taken over potential probability spaces. This observation underlies the approach in Beiglböck et al. [5]; specifically, identifying stopping times with measures on a certain canonical product space, the existence of an optimiser along with a monotonicity principle characterising the support set of any optimiser is obtained. In turn, for certain classes of cost functions, the latter is used to deduce the existence of so-called barrier-type solutions. Herein, assuming that the filtration is general enough in the above sense, and imposing certain regularity but notably no specific structure on the cost function, we reformulate the problem in terms of so-called measure-valued martingales which enables addressing the problem as a stochastic control problem; we establish the corresponding dynamic programming principle.
The notion of measure-valued martingales was used by Cox and Källblad [9] to study a robust pricing problem where optimisation takes place over a class of martingales (potential market models) satisfying a given terminal marginal constraint (guaranteeing fit to given market data). Each martingale was then identified with the process specifying the conditional distribution of its terminal value. The latter, naturally, belongs to a class of processes taking values in the space of probability measures on and satisfying a certain martingale property; they are referred to as measure-valued martingales (MVMs). Crucially, the terminal marginal constraint was then transformed into an initial condition for the corresponding MVMs, which allowed for the problem to be viewed as a stochastic control problem where the conditional law of the underlying appeared as an additional state process. The problem can thus be addressed via the dynamic programming approach, and in [9] the DPP was established and a HJB equation deduced for the value function.
The same approach may naturally be applied also to the optimal stopping problem (1.1) and the aim herein is to specify this and prove the associated DPP. Specifically, each stopping time in is identified with the MVM defined as its conditional distribution given the current information:
any such process will satisfy the initial condition along with a martingale property and a certain adaptedness condition corresponding to the stopping-time property of . When reformulating the optimal stopping problem as an optimisation problem over such MVMs, the distribution constraint is then incorporated as an initial condition which allows the problem to be addressed as a stochastic control problem; our main result establishes that the dynamic programming principle holds for this problem.
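To make the identification concrete, the display below spells out the three properties in one possible notation. The symbols $\xi$, $\tau$, $\mu$ and $(\mathcal{F}_t)$ are assumptions chosen for illustration, and $\mathcal{F}_0$ is assumed trivial so that conditioning at time zero returns the unconditional law.

```latex
\begin{align*}
\xi_t(A) &:= \mathbb{P}(\tau \in A \mid \mathcal{F}_t),
  \qquad A \in \mathcal{B}(\mathbb{R}_+),\ t \ge 0, \\
\xi_0 &= \mu
  && \text{(initial condition)}, \\
\mathbb{E}[\xi_t(A) \mid \mathcal{F}_s] &= \xi_s(A), \quad s \le t
  && \text{(martingale property, by the tower property)}, \\
\xi_t([0,s]) &= \mathbf{1}_{\{\tau \le s\}}, \quad s \le t
  && \text{(since $\{\tau \le s\} \in \mathcal{F}_s \subseteq \mathcal{F}_t$)}.
\end{align*}
```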
We note that in Ankirchner et al. [2] and Miller [20], the optimal stopping problem was studied under a constraint on the expected value of the stopping time. The conditional expected value of the terminal constraint was then incorporated as an additional state process and the problem addressed via the use of BSDEs (cf. also [6]); at an abstract level our approach thus bears a certain resemblance to theirs. The distribution-constrained problem that we consider has also been studied from a similar perspective by Bayraktar and Miller [3]. Therein, however, the authors consider the strong problem formulation and restrict to the case of atomic constraints whereas we let be a general probability measure on ; we also allow for more general cost functions. We note that our method of proof is distinct from theirs. Indeed, the approach in [3] uses that any point in the unit simplex can be written as a linear combination of a finite number of points whose convex hull includes that point; a method which, to us, seems difficult to generalise to arbitrary constraints. Moreover, since they consider the strong formulation, their filtration is generated by the Brownian motion alone and additional randomness is obtained by conditioning on the past of the Brownian motion itself, an approach which seems difficult to generalise beyond non-path-dependent cost functions. Notably, however, as a consequence of our results herein combined with theirs, we obtain that for the class of cost functions considered in [3], the problem when restricting to the Brownian filtration coincides with the weak formulation. Hence, a posteriori, we recover their main result as a special case of ours.
To prove the DPP for (the MVM version of) problem (1.1) we here consider its weak formulation on the associated canonical path space; the canonical framework has previously successfully been used for the study of stochastic control problems by e.g. El Karoui and Tan [17, 18], see also [21, 22, 25]. To account for the fact that we are dealing with measure-valued martingales, (a component of) our canonical space consists of (continuous) functions from into the space of probability measures on , where we equip the latter with the topology induced by the first Wasserstein distance rendering it a Polish space. We may then establish the DPP by proving analyticity, and stability under concatenation and disintegration, of a certain set of measures, and, in turn, apply a version of Jankov-von Neumann’s measurable selection theorem. Underlying this approach lies the fact that problem (1.1) is equivalent to its weak formulation; in particular, although there are alternatives to this convention, we herein let the MVMs correspond to the conditional distribution of the (disintegrated) randomised stopping times.
A priori assuming continuity of the value function, and restricting to a class of cost functions admitting a certain (Markovian) structure, we also provide an alternative proof of the DPP. Although this result is a special case of our main result, we choose to report this independent argument since we find it of interest. Specifically, following Bouchard and Touzi [7] (see also [6]), the idea is to exploit the continuity properties of the value function in order to explicitly construct an approximately optimal kernel and thus circumvent the need for measurable selection arguments. To deal with the measure-valued argument we here borrow ideas from optimal transport theory – the argument crucially relies on the structure of the first Wasserstein distance. Notably, the class of cost functions for which this argument applies includes the ones considered in [3].
The DPP aside, we also study the stability of the distribution-constrained optimal stopping problem in that we establish continuity properties of the value of the problem as a function of the marginal constraint. First, following the arguments used to prove existence of an optimiser in [5], we obtain upper semicontinuity of this mapping under rather general assumptions on the cost function. Imposing additional regularity on the cost function, we also establish a certain ’right-continuity’ of the value function; the result is of interest since it ensures that the value of the problem with a general constraint can be obtained as the limit of a sequence of approximating atomic problems which are easier to address by use of numerical methods.
The remainder of the article is organised as follows: in Section 2, we introduce our problem of study and establish the continuity properties. In Section 3.1 we reformulate the problem in terms of so-called adapted measure-valued martingales; we provide the DPP in Section 3.2 and its proof in Section 3.3. The alternative proof of the DPP is deferred to the Appendix.
2. The distribution-constrained optimal stopping problem
2.1. The problem of study: definition and first remarks
Given a fixed law , where denotes the set of probability measures on with finite first moment, and a filtered probability space supporting a Brownian motion and an independent measurable random variable , we consider the following distribution-constrained optimal stopping problem:
(2.1) 
where is the set of stopping times with given distribution , and is a given measurable cost function from to satisfying . We assume throughout that is well defined and for all ; and that for all , there exists some such that . In particular, defined in (2.1) is thus a function from to .
2.1.1. Preliminaries on the weak problem formulation
For our approach it is crucial that the distribution-constrained problem (2.1) is in fact equivalent to the corresponding weak formulation. To formalise this, following [4] and [5], we will make use of the notion of randomised stopping times; see however also [13, 16, 17, 18].
Let denote the space of continuous functions from to starting in zero, equipped with the topology of uniform convergence on compact sets; we denote the Wiener measure on by , and let the filtration be the usual augmentation of the canonical filtration.
Following [5], see also [4] to which we refer for further details, we then define the set of probability measures on the product space , referred to as randomised stopping times with prespecified law
where denotes the disintegration of in the first coordinate, and is the cumulative distribution function associated with .
Denoting by the canonical process on , according to Lemma 3.11 in [4], the distribution-constrained problem may then be viewed as an optimisation problem over randomised stopping times on the canonical space; specifically, given that supports a Brownian motion and an independent uniformly distributed random variable, problem (2.1) is equivalent to the problem of maximising
In particular, the specific choice of the probability space has thus no bearing on the problem. Hence, denoting by the set of all terms of the form such that is a probability space in which is a BM and a stopping time with , the value remains the same if optimising
(2.2) 
this further illustrates that we are indeed dealing with a weak problem formulation.
Since every randomised stopping time admits a characterisation in terms of its disintegration in the first variable, , working on the fixed probability space , still denoting the canonical process by , our problem may also equivalently be formulated as optimising
(2.3) 
over disintegrated kernels of . This last formulation emphasises the following intuitive viewpoint on randomised stopping times: while a standard stopping time assigns a single time to stop to each path, a stopping time depending on some external randomisation may be viewed as a distribution specifying the probability to stop at various points given the observation of a specific path.
Since the above formulations are all equivalent, we are free to switch between them and will consider the formulation most convenient in each individual situation.
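The intuitive viewpoint on randomised stopping times can be illustrated numerically. The following is a minimal sketch on a time grid, not the construction used in the cited references: the stopping intensity, the grid and all function names are hypothetical choices. Each path carries a kernel whose cumulative distribution function at a grid time depends only on the path up to that time, and an independent uniform variable turns the kernel into an actual stopping time by stopping the first time the kernel's cumulative distribution function reaches the uniform level.

```python
import numpy as np

def simulate_randomised_stopping(n_paths=20000, n_steps=50, horizon=1.0, seed=0):
    """Sample a randomised stopping time on a time grid.

    Assumption (illustrative only): in each step the kernel stops with a
    probability that depends on the current value of the path.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    # Discretised Brownian paths; paths[:, k] approximates B at time (k + 1) * dt.
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)
    # Hypothetical adapted stopping intensity: stop more eagerly when the path is high.
    stop_prob = 1.0 - np.exp(-dt * (1.0 + np.maximum(paths, 0.0)))
    # Kernel CDF H_k(omega) = P(stopped by step k | path); uses the path up to step k only.
    kernel_cdf = 1.0 - np.cumprod(1.0 - stop_prob, axis=1)
    kernel_cdf[:, -1] = 1.0  # force stopping at the horizon so each kernel has mass one
    # External randomisation: one uniform per path, independent of the path.
    u = rng.uniform(size=n_paths)
    # tau = first grid index at which the kernel CDF reaches the uniform level.
    tau_index = np.argmax(kernel_cdf >= u[:, None], axis=1)
    return paths, kernel_cdf, tau_index

paths, kernel_cdf, tau_index = simulate_randomised_stopping()
n_steps = kernel_cdf.shape[1]
# The law of tau should match the path-average of the kernels up to Monte Carlo error.
empirical_cdf = np.array([(tau_index <= k).mean() for k in range(n_steps)])
averaged_kernel_cdf = kernel_cdf.mean(axis=0)
max_gap = np.abs(empirical_cdf - averaged_kernel_cdf).max()
```

In this sketch the marginal law of the sampled stopping time is the average of the kernels over paths, which is the discrete counterpart of the requirement that the kernels average to the prescribed distribution.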
2.2. The dependence on the constraint: stability of the value function
In this section we study the dependence of the problem on the marginal constraint in that we establish continuity properties of the mapping . Recall that the existence of an optimiser to Problem (2.1) was established in [5]; a slight variation of their argument yields upper semicontinuity of the value function:
Proposition 2.1.
Suppose that the cost function is bounded from above and that is upper semicontinuous for a.e. . Then, is upper semicontinuous on in the topology induced by , the first Wasserstein metric.
Proof of Proposition 2.1.
Let be a sequence converging to some ; w.l.o.g. let nonincreasing in . In turn, let a sequence such that and
(2.4) 
We first show that the sequence is tight; for this it suffices to show that its respective projections onto and are tight. The projections onto all coincide with the Wiener measure and are thus trivially tight. On the other hand, . For any , using that converges to , we may now choose an , such that , for all ; it follows that also the projections onto are tight.
Let be an accumulation point; by passing if necessary to a subsequence, we may assume that . Since is trivially continuous, it follows that . Further, using the continuity of combined with the fact that , we obtain for any ,
Hence, . Finally, using Theorem 3.8 (3) in [4] we obtain .
Next, according to Proposition 2.4 in [15] (see also Lemma 4.2 in [13]), on the space , the weak convergence topology coincides with the stable convergence topology, which is the coarsest topology under which is continuous for all bounded measurable functions such that is continuous for all . Hence, by use of the assumption (and Portmanteau’s lemma), we have
(2.5) 
which combined with (2.4) yields the required u.s.c. ∎
Since the value function is concave
Assumption 2.2.
Given a probability space supporting a BM , and a stopping time , there exists a modulus of continuity such that for any stopping time ,
(2.6) 
Example 2.3.
Assumption 2.2 holds for example in either of the following cases:

for some Hölder continuous function , for then ;

for some Hölder continuous function , where , since by Doob’s inequality, ;

the above two cases when is replaced by a local martingale with (i.e. evolving ’slower’ than a Brownian motion);

when for a concave modulus of continuity .
We denote by the set of measures in which may be obtained from by moving mass to the right; more precisely, if and is a disintegration in the first variable of an optimal coupling with respect to between and , then has support only on . We then have the following result; we note that its proof bears resemblance to the proof of Lemma 3.1 in [9].
Proposition 2.4.
Suppose that Assumption 2.2 holds. Let , , be a sequence such that . Then, .
Proof of Proposition 2.4.
Fix and take such that . In turn, let such that , and let the family of disintegrated kernels for which . Let independent of . Denoting by the right-continuous inverse of , we then define
Since is measurable and , on the enlarged space , we then have that is a stopping time with . Moreover,
(2.7)  
We may thus choose such that , for all , where is the modulus of continuity given in Assumption 2.2; applying the latter assumption we then obtain
Since was arbitrarily chosen, we obtain , which combined with Proposition 2.1 yields the result. ∎
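For atomic measures on the real line, both the first Wasserstein distance and the notion of moving mass to the right can be checked directly from cumulative distribution functions. The sketch below rests on two standard one-dimensional facts: the first Wasserstein distance equals the integral of the absolute difference of the two distribution functions, and the quantile coupling is optimal for it. It is an assumption of this sketch that the quantile coupling is the optimal coupling under consideration, in which case mass only moves to the right exactly when the target distribution function lies below the source one. All function names are hypothetical.

```python
import numpy as np

def cdf_on_grid(atoms, weights, grid):
    """Right-continuous CDF of an atomic measure evaluated on a sorted grid."""
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.array([weights[atoms <= t].sum() for t in grid])

def wasserstein1(atoms_mu, w_mu, atoms_nu, w_nu):
    """First Wasserstein distance on the line as the integral of |F_mu - F_nu|."""
    grid = np.unique(np.concatenate([atoms_mu, atoms_nu]).astype(float))
    f_mu = cdf_on_grid(atoms_mu, w_mu, grid)
    f_nu = cdf_on_grid(atoms_nu, w_nu, grid)
    # Both CDFs are constant on [grid[i], grid[i + 1]), so the integral is a finite sum.
    return float(np.sum(np.abs(f_mu - f_nu)[:-1] * np.diff(grid)))

def is_shifted_right(atoms_mu, w_mu, atoms_nu, w_nu, tol=1e-12):
    """True if F_nu <= F_mu everywhere, i.e. the quantile coupling moves mass rightwards only."""
    grid = np.unique(np.concatenate([atoms_mu, atoms_nu]).astype(float))
    f_mu = cdf_on_grid(atoms_mu, w_mu, grid)
    f_nu = cdf_on_grid(atoms_nu, w_nu, grid)
    return bool(np.all(f_nu <= f_mu + tol))

# Hypothetical example: shift every atom of mu to the right by 1/n.
atoms_mu, w_mu = [0.5, 1.0, 2.0], [0.2, 0.5, 0.3]
distances = []
for n in (4, 8, 16):
    atoms_nu = [a + 1.0 / n for a in atoms_mu]
    assert is_shifted_right(atoms_mu, w_mu, atoms_nu, w_mu)
    distances.append(wasserstein1(atoms_mu, w_mu, atoms_nu, w_mu))
```

Here the distances equal the shift sizes, so the shifted measures approach the original one from the right in the first Wasserstein metric, which is the type of approximating sequence considered in Proposition 2.4.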
3. Measure-valued martingales and the DPP
3.1. Measure-valued martingales and alternative problem formulation
The notion of measure-valued martingales was used in [9] to address a robust pricing problem where optimisation took place over a set of martingales satisfying a given marginal constraint. By reformulating the problem in terms of measure-valued martingales, the constraint was turned into an initial condition for an additional measure-valued state process, which enabled addressing the problem as a stochastic control problem. The aim herein is to apply the same approach to the distribution-constrained optimal stopping problem. To see that this is indeed natural, note that for a given stopping time , defining
we obtain a process taking its values in the set of probability measures on , with and ; it is also a martingale in a sense to be made precise (cf. Definition 3.1 below). In addition, the fact that is a stopping time implies that has the following property: denoting by , it holds that
(3.1) 
that is, either or a.s. Notably, there is a one-to-one correspondence between the family of stopping times and such measure-valued processes for the former may be recovered from the latter via . In principle, the distribution-constrained optimal stopping problem may thus, equivalently, be formulated as an optimisation problem over measure-valued martingales satisfying condition (3.1) and the initial condition .
For our distribution-constrained stopping problem (2.1), we need however to consider a larger class of stopping times than the ones discussed above. We formalise this by taking the conditional distribution of the disintegrated randomised stopping time as our controlled variable. From the perspective of the problem formulation in (2.1), in the simplest case where is generated by a Brownian motion initially enlarged by an independent uniformly distributed random variable, this corresponds to identifying a stopping time with the process yielding its conditional law given the Brownian filtration alone; see however Remark 3.5 below for an alternative approach. More precisely, recall from (2.3) that working on the space , we may choose to view our problem as an optimisation problem over kernels corresponding to disintegrations of (adapted) couplings between and . Viewing each such kernel as a valued measurable random variable satisfying the constraint that under it averages to , and aiming at including the latter constraint as an initial condition, we will identify each such kernel with the process yielding its conditional distribution. This allows us to view the problem as an optimisation problem over measure-valued martingales satisfying the constraint and a suitable adaptedness condition. Specifically, we define the following class of optimisation objects:
Definition 3.1.
Given a filtered probability space supporting an adapted process with , we say that

the process is a measure-valued martingale (MVM) if is a martingale, for any ;

a MVM is continuous if is continuous in the topology induced by , the first Wasserstein metric, for almost all ;

a MVM is adapted if a.s., for all .
For any , is trivially a uniformly integrable martingale with a well-defined limit ; more pertinently, defines a probability measure, see Proposition 2.1 in [14]. Further, note that the condition that is a martingale for any , is equivalent to being a martingale for any ; see Remark 2 in [9]. In particular, any MVM thus converges a.s. in the sense of weak convergence of measures to a limiting (random) measure .
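A discrete-time analogue may help fix ideas. The sketch below is an illustration under stated assumptions, not part of the construction above: it replaces the Brownian motion by a symmetric random walk with a small, hypothetical number of steps, takes a capped first hitting time as the stopping time, and computes its conditional law along every path prefix by brute-force enumeration. It then checks that the conditional laws average correctly over the two possible next steps (the martingale property in the sense of Definition 3.1) and that the mass assigned to times already observed is either zero or one (the adaptedness property for a genuine stopping time).

```python
import itertools
import numpy as np

N_STEPS = 6   # number of +-1 steps of the walk (hypothetical choice)
LEVEL = 2     # tau = first time the walk reaches LEVEL, capped at N_STEPS

def stopping_time(path):
    """First hitting time of LEVEL by the partial sums of `path`, capped at N_STEPS."""
    position = 0
    for n, step in enumerate(path, start=1):
        position += step
        if position == LEVEL:
            return n
    return N_STEPS

def conditional_law(prefix):
    """Law of tau given the first len(prefix) steps, as a vector over {0, ..., N_STEPS}."""
    law = np.zeros(N_STEPS + 1)
    remaining = N_STEPS - len(prefix)
    for tail in itertools.product((-1, 1), repeat=remaining):
        law[stopping_time(prefix + tail)] += 0.5 ** remaining
    return law

def stopped_by(prefix, s):
    """Indicator of {tau <= s}, computable from the first s steps when s <= len(prefix)."""
    if s == N_STEPS:
        return True
    return LEVEL in np.cumsum(prefix[:s])

mu = conditional_law(())  # the initial value of the process is the law of tau
martingale_ok = True
adapted_ok = True
for n in range(N_STEPS + 1):
    for prefix in itertools.product((-1, 1), repeat=n):
        xi = conditional_law(prefix)
        if n < N_STEPS:
            # Average of the conditional laws after an up-step and a down-step.
            children = 0.5 * (conditional_law(prefix + (1,)) + conditional_law(prefix + (-1,)))
            martingale_ok &= np.allclose(xi, children)
        for s in range(n + 1):
            # Mass on {0, ..., s} is the indicator of {tau <= s} once s has been observed.
            adapted_ok &= np.isclose(xi[: s + 1].sum(), float(stopped_by(prefix, s)))
```

The vector `mu` plays the role of the marginal constraint: it is the value of the process at time zero, so fixing the law of the stopping time amounts to fixing the starting point of the measure-valued process.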
We denote by the set of continuous adapted MVMs with . Our first claim then is that our original problem, indeed, admits the following equivalent formulation:
Problem 3.2.
On the given space , consider the problem of maximising
(3.2) 
Proof.
Lemma 3.11 in [4] and Theorem VI 65 in [10] (cf. Theorem 3.6 in [4]) combined imply that Problem (2.1) is equivalent to optimising , with , over kernels corresponding to disintegrations in the first variable of measures ; cf. (2.3). Viewing each such kernel as a valued measurable random variable, we define the associated process by , for ; since the thus defined process is in , and it is a measure-valued martingale, see e.g. Lemma 2.12 in [9] or Theorem 1.3 in [14]. Notably, the limit exists and , for a.a. . In consequence, the objective functions in (2.3) and (3.2) evaluated, respectively, with respect to and , coincide.
Next, note that since , or, equivalently, the kernels average to under , and since is trivial, we obtain . Further, according to Remark 4 in [9], since the filtration satisfies the usual conditions, any MVM in admits a version which is right-continuous in the sense that is right-continuous for any Lipschitz function ; w.l.o.g., we choose this version. Moreover, since the filtration is generated by a BM, it follows from the martingale representation theorem that each process is in fact continuous, and thus is continuous in the sense of Definition 3.1. Since for any , is measurable, and since a.s., we also obtain that is adapted in the sense of Definition 3.1. In consequence, the thus defined process is indeed a continuous adapted MVM with .
Conversely, for any , there exists a measure such that its disintegrated kernel satisfies a.s.; we easily conclude. ∎
Remark 3.4.
We choose in Problem 3.2 to work on the fixed space but note that this choice is in fact arbitrary and that we could have chosen any filtered probability space (satisfying the usual conditions) with trivial and supporting a BM . Indeed, denote by the probability space obtained by initially enlarging such a space by an independent uniformly distributed random variable (cf. the proof of Proposition 2.4). For any given adapted MVM we may then construct a stopping time in the latter space such that the values of the respective objective functions coincide in that . Conversely, given a stopping time in , by use of the same arguments as used in the proof of Lemma 3.3, we see that defining yields an adapted MVM in for which the corresponding objective functions coincide; notably admits a càdlàg version and satisfies since is trivial.
The independence of the specific choice of the probability space for problem (2.1) (cf. Section 2.1.1) is thus inherited also by the MVM formulation of the problem. In particular, denoting by the set of all terms such that is a filtered probability space with trivial and in which is a BM and an adapted MVM with , we have
(3.3) 
Remark 3.5.
Viewed from the perspective of problem (2.1), for the simplest case in which is the filtration generated by a Brownian motion initially enlarged by an independent uniformly distributed random variable, the MVMs considered in Problem 3.2 correspond to where denotes the filtration generated by the Brownian motion alone; alternatively, one may consider objects of the form . The latter also defines MVMs, which in addition terminate in a singular measure in that a.s. Typically, however, we then do not have . Nevertheless, without destroying the martingale property, one may extend onto by defining for ; will then generally possess a jump at time following the realisation of the measurable random variable. Naturally, the results herein may also be formulated using such MVMs; we consider however the present formulation to be most natural for our purposes and do not pursue those details any further; see, however, Corollary 3.9 below.
3.2. The Dynamic Programming Principle
The aim in this section is to establish the dynamic programming principle for the distribution-constrained optimal stopping problem in its equivalent form (3.2). To this end we introduce the associated value function. We continue to work on the space , although in light of Remark 3.4, we note that this is a somewhat arbitrary choice. Specifically, we define by
(3.4) 
where denotes the set of continuous adapted MVMs with a.s., and denotes the solution to the SDE , for , with initial condition a.s., for .
Our main result is the following; we use the convention with , for any random variable :
Theorem 3.6 (Dynamic Programming Principle).
For all , and any stopping time with values in , it holds that
(3.5) 
Remark 3.7.
Integrals with respect to are naturally understood in the sense of (pathwise) Lebesgue–Stieltjes integration. Hence, if has an atom at time , it will not contribute to the value function . Similarly, in the formulation of the DPP (3.5), if a given MVM has an atom at time , that atom will contribute to the objective function via the integral from to rather than via the value function evaluated at time . More specifically, the value function only depends on up to time and on in the sense that , where , . For this reason we introduce the notation for the (reweighted) restriction of to : , . It follows from the definition of that ; hence we also obtain the following version of the DPP:
Before giving the proof of Theorem 3.6 (in Section 3.3 below) we comment on two particular cases.
First, although we consider the (weak) problem formulation studied herein to be the natural one (in particular since it always admits a solution), we note that whenever problem (2.1) admits a ’strong’ solution
Definition 3.8.
A MVM is terminating if a.s.; it is finitely terminating if is almost surely finite.
Notably, a terminating MVM satisfies the adaptedness property if and only if condition (3.1) holds; in consequence, any adapted terminating MVM is, in fact, finitely terminating and we have . We denote the set of continuous adapted and finitely terminating MVMs by , and define analogously to above.
Corollary 3.9 (DPP: strong formulation).
Suppose that restricting in problem (2.1) to stopping times measurable with respect to the filtration generated by the Brownian motion alone, does not affect the value of the problem. Then, for any stopping time with values in ,
(3.6) 
Second, we consider the case when has support on a finite number of (fixed) atoms . For and , let be given by
(3.7) 
where we use the notation for the vector and similarly for vector-valued stochastic processes. We then have the following corollary:
Corollary 3.10 (DPP: atomic constraint).
Suppose that has support on the finite number of atoms . Then, for , with denoting the set of martingales taking values in , we have that
(3.8)  
Remark 3.11 (Relation to Bayraktar and Miller [3]).
In [3], the authors consider cost functions of the type with Lipschitz, and restrict to atomic marginal constraints. Moreover, they consider the strong formulation where the filtration is the one generated by the Brownian motion alone. Their main result (Theorem 1) then establishes (3.8) for their value function ; notably they first derive the strong version with the additional condition a.s. (cf. (3.6)) and then relax this condition. Combining our Corollary 3.10 with their result, we see that for the cost functions and constraints considered in [3], the value function when restricting to Brownian stopping times coincides with the value function for the weak problem formulation. Hence, a posteriori, we recover their result as a special case of ours.
3.3. Proof of the DPP
Since our objective function is given in Lagrange form (with a reward function integrated over time), we first introduce an additional state variable (governing its accumulated value) in order to transform it into Mayer form. Specifically, given and , we define the process
(3.9) 
we note that it admits a welldefined limit which we denote by . For , we introduce the value function
(3.10) 
We then have that Theorem 3.6 follows if, for any stopping time with values in , it holds that
(3.11) 
Our aim is to prove this result by considering its weak formulation on the associated canonical path space; the canonical framework has previously successfully been used for the study of stochastic control problems in [17, 18], see also [25] and [21, 22], and was used to deduce the DPP also in [9].
We denote by the set of càdlàg paths on taking values in , where we equip with the topology induced by the metric and with the product topology; this renders a Polish space, and using the Skorokhod topology on it is a Polish space too

(i) , , a.s.;
(ii) is a Brownian motion;
(iii) is an adapted measure-valued martingale;
(iv) , for , a.s., where .
Notably, only depends on via ; for ease of notation we keep the former notation. We note that there exists a measurable functional such that whenever the limit exists (recall that is measurable and see Lemma 3.12 in [25]); in particular, for any , with , we have , a.s. We then define
(3.12) 
Proof.
W.l.o.g., let , and . First, let denote the set of all terms such that is a filtered probability space in which is a BM and an adapted MVM with ; we denote the associated process defined via (3.9) by . We then have . Indeed, any multiple induces a term in . Conversely, any probability measure together with the space and the canonical process produces such a multiple, this since the properties (i)-(iv) hold also with respect to the augmented filtration . Next, according to Lemma 3.3 and Remark 3.4 (cf. (3.3)), we have that . Since , the result follows. ∎
In order to establish the DPP for problem (3.12) we first establish two lemmas.
Lemma 3.13.
The set is analytic.
Proof.
For , , and , we consider the following subsets of :

;

;

;

;

.
The above sets are all Borel measurable. Furthermore, is the intersection of the above sets when are allowed to vary among all the rational numbers in ; and among countable dense subsets of, respectively, and ; and among a countable collection of sets generating . Indeed, for the adaptedness property of , note that for any it holds that for any rational number , and thus