A stochastic algorithm for deterministic multistage optimization problems

# A stochastic algorithm for deterministic multistage optimization problems

## Abstract

Several attempt to dampen the curse of dimensionnality problem of the Dynamic Programming approach for solving multistage optimization problems have been investigated. One popular way to address this issue is the Stochastic Dual Dynamic Programming method (SDDP) introduced by Perreira and Pinto in 1991 for Markov Decision Processes. Assuming that the value function is convex (for a minimization problem), one builds a non-decreasing sequence of lower (or outer) convex approximations of the value function. Those convex approximations are constructed as a supremum of affine cuts.

On continuous time deterministic optimal control problems, assuming that the value function is semiconvex, Zheng Qu, inspired by the work of McEneaney, introduced in 2013 a stochastic max-plus scheme that builds upper (or inner) non-increasing approximations of the value function.

In this note, we build a common framework for both the SDDP and a discrete time version of Zheng Qu’s algorithm to solve deterministic multistage optimization problems. Our algorithm generates monotone approximations of the value functions as a pointwise supremum, or infimum, of basic (affine or quadratic for example) functions which are randomly selected. We give sufficient conditions on the way basic functions are selected in order to ensure almost sure convergence of the approximations to the value function on a set of interest.

D
\headers

Marianne Akian, Jean-Philippe Chancelier and Benoît Tran \newsiamremarkassumptionsAssumptions \newsiamremarknotationsNotations \newsiamremarkremarkRemark \newsiamremarkexampleExample

eterministic multistage optimization problem, min-plus algebra, tropical algebra, Stochastic Dual Dynamic Programming, Dynamic Programming.

## Introduction

Throughout this paper we aim to solve an optimization problem involving a dynamic system in discrete time. Informally, given a time and a state , one can apply a control and the next state is given by the dynamic , that is . Then one wants to minimize the sum of costs induced by our controls starting from a given state and a given time horizon . Furthermore, one can add some final restrictions on the states at time which will be modeled by a function only depending on the final state . As in [1] we will call such optimization problems, multistage (optimization) problems:

 minx=(x0,…,xT)u=(u0,…uT−1) T−1∑t=0ct(xt,ut)+ψ(xT) (1) s.t.

One can solve the multistage problem Eq. 1 by Dynamic Programming as introduced by Richard Bellman around 1950 [1, 5]. This method breaks the multistage problem Eq. 1 into sub-problems that one can solve by backward recursion over the time . More precisely, denoting by the set of states and given some operators from the set of functionals that may take infinite values to itself, one can show (see for example [3]) that solving problem Eq. 1 is equivalent to solving the following system of sub-problems:

 {VT=ψ∀t∈[[0,T−1]], Vt:x∈X↦Bt(Vt+1)(x)∈¯¯¯¯R. (2)

We will call each operator the Bellman operator at time and each equation in Eq. 2 will be called the Bellman equation at time . Lastly, the functions defined in Eq. 2 will be called the (Bellman) value function at time . Note that by solving the system Eq. 2 we mean that we want to compute the value function at point , that is . We will state several assumptions on these operators in Section 1 under which we will devise an algorithm to solve the system of Bellman equations 2 (also called the Dynamic Programming formulation of the multistage problem). Let us stress on the fact that although we want to solve the multistage problem 1, we will mostly focus on its (equivalent) Dynamic Programming formulation given by Eq. 2.

One issue of the Dynamic Programming approach to solve multistage problems is the so-called curse of dimensionality [2], that is, grid-based methods to compute the value functions have a complexity exponential in the dimension of the state space. One popular algorithm (see [8, 9, 10, 14, 19, 20]) that aims to dampen the curse of dimensionality is the Stochastic Dual Dynamic Programming algorithm (or SDDP for short) introduced by Pereira and Pinto in 1991. Assuming that the cost functions are convex and the dynamics are linear, the value functions defined in the Dynamic Programming formulation Eq. 2 are convex [8]. The SDDP algorithm aims to build lower (or outer) approximations of the value functions as a supremum of affine functions and thus, doesn’t rely on a discretisation of the state space in order to compute (approximations of) the value functions. In the aforementioned references, this approach is used to solve stochastic multistage problems, however in this article we will restrict our study to deterministic multistage problems, that is, the above formulation Eq. 1. Still, the SDDP algorithm can be applied to our framework. One of the main drawback of the SDDP algorithm is the lack of an efficient stopping criterion: it builds lower approximations of the value functions but upper (or inner) approximations are built through a Monte-Carlo scheme that is costly.

During her thesis [15], Zheng Qu devised an algorithm [16] which builds upper approximations of the value functions in an infinite horizon and continuous time framework where the set of controls is both discrete and continuous. This work was inspired by the work of McEneaney [12, 13] using techniques coming from tropical algebra or also called min-plus techniques. Assume that for each fixed discrete control the cost functions are convex quadratic and the dynamics are linear. If the set of discrete controls is finite, then exploiting the min-plus linearity of the Bellman operators, one can show that the value functions can be computed as a finite pointwise infimum of convex quadratic functions:

 Vt=infϕt∈Ftϕt,

where is a finite set of convex quadratic forms. Moreover, in this framework, the elements of can be explicitly computed through the Discrete Algebraic Riccati Equation (DARE, [11]). Thus an approximation scheme that computes non-decreasing subsets of yields an algorithm that converges after a finite number of improvements

 Vkt:=infϕt∈Fktϕt≈infϕt∈Ftϕt=Vt.

However the size of the set of functions that need to be computed is growing exponentially with . Informally, in order to address this issue, Qu introduced a probabilistic scheme that adds to the best (given the current approximations) element of at some point drawn on the unit sphere (assuming the space of states to be Euclidean).

Our work aims to build a general algorithm that encompasses both a deterministic version of the SDDP algorithm and an adaptation of Qu’s work to a discrete time and finite horizon framework.

This paper is divided in sections. In the first section we make several assumptions on the Bellman operators and define an algorithm which builds approximations of the value functions as a pointwise optimum (i.e. either a pointwise infimum or a pointwise supremum) of basic functions in order to solve the associated Dynamic Programming formulation Eq. 2 of the multistage problem Eq. 1. At each iteration, the so-called basic function that is added to the current approximation will have to satisfy two key properties at a point randomly drawn, namely, tightness and validity. A key feature of our algorithm is that it can yield either upper or lower approximations, for example:

• If the basic functions are affine, then approximating the value functions by a pointwise supremum of affine functions will yield the SDDP algorithm.

• If the basic functions are quadratic convex, then approximating the value functions by a pointwise infimum of convex quadratic functions will yield an adaptation of Qu’s algorithm.

In the following section we study the convergence of the approximations of the value functions generated by our algorithm at a given time . Under the previous assumptions our approximating sequence converges almost surely (over the draws) to the value function on a set of interest (that will be specified).

Finally on the last section we will specify our algorithm to the two special cases mentioned in the first section. The convergence result of Section 2 specified to these two cases will be new for (an adaptation of) Qu’s algorithm and will be the same as in the literature for the SDDP algorithm. It will be a step toward addressing the issue of computing efficient upper approximations for the SDDP algorithm and opens another way to devise algorithms for a broader class of multistage problems.

## 1 Notations and definitions

{notations}
• Denote by the set of states endowed with its euclidean structure and its Borel -algebra.

• Denote by a finite integer that we’ll call the horizon.

• Denote by an operation that is either the pointwise infimum or the pointwise supremum of functions which we will call the pointwise optimum. Note that once a choice of which operation is associated with , it remains the same for the remainder of this article.

• Denote by the extended reals endowed with the operations .

• For every , fix and two subsets of the set of functionals on such that .

• We will say that a functional is a basic function if it’s an element of for some .

• For every set , denote by the function equal to on and elsewhere.

• For every and every set of basic function we denote by its pointwise optimum, , i.e.

 (3)
• Denote by a sequence of operators from to , that we will call the Bellman operators.

• Fix a functional . We define a sequence of functions , called the value functions, by the system of Bellman equations:

 {VT=ψ∀t∈[[0,T−1]], Vt:x∈X↦Bt(Vt+1)(x)∈¯¯¯¯R. (4)

We first make several assumptions on the structure of problem Eq. 4. Those assumptions will be satisfied in the examples of Section 3. Informally, we want some regularities on the Bellman operators so as to propagate, backward in time, good behaviours of the value function at the final time to the value function at the initial time . Moreover, at each time , we ask that the basic functions that build our approximations are such that their pointwise optimum share a common regularity.

{assumptions}

[Structural assumptions]

• Common regularity: for every , there exist a common (local) modulus of continuity for every , i.e. for every , there exist which is increasing, equal to and continuous at such that for every and for every we have that

 |f(x)−f(x′)|≤ωx(|x−x′|).
• Final condition: the value function at time is a pointwise optimum for some given subset of  as in Eq. 3, that is .

• Stability by the Bellman operators: for every , if then belongs to .

• Stability by pointwise optimum: for every , if then .

• Order preserving operators: the operators , are order preserving, i.e. if are such that , then .

• Existence of the value functions: there exists a solution to the Bellman equations Eq. 2 that never takes the value and which is not equal to .

• Existence of optimal sets: for every and every compact set , there exist a compact set such that for every functional we have

 Bt(ϕ+δXt+1)≤Bt(ϕ)+δXt.
• Additively -subhomogeneous operators: the operators , , are additively -subhomogeneous, i.e. there exist a constant such that for every positive constant functional and every functional we have that .

From a set of basic functions , one can build its pointwise optimum . We build monotone approximations of the value functions as an optimum of basic functions which will be computed through a compatible selection function as defined below. We illustrate this definition in Fig. 2.

###### Definition 1 (Compatible selection function)

Let be fixed. A compatible selection function is a function from to satisfying the following properties

• Validity: for all set of basic functions and every we have (resp. ) when (resp. ).

• Tightness: for all set of basic functions and every the functions and coincide at point , that is

 ϕ♯t(Ft+1,x)(x)=Bt(VFt+1)(x).

For , we say that is a compatible selection function if function is valid in the sense that for every and , the function remains above (resp. below) the value function at time when (resp. ). Moreover, we say that function is tight if it coincides with the value function at point , that is for every and we have

 ϕ♯T(FT,x)(x)=VT(x).

Note that the tightness assumption only asks for equality at the point between the functions and and not necessarily everywhere. The only global property between the functions and is an inequality given by the validity assumption.

In LABEL:\NomAlgo we will generate for every time a sequence of random points of crucial importance as they will be the ones where the selection functions will be evaluated, given the set which characterizes the current approximation. In order to generate those points, we will assume that we have at our disposition an Oracle that given set of functions (characterizing the current approximations) computes compact sets and a probability law of support equal to those compact sets. This Oracle will have to follow the following conditions on its output.

{assumptions}

[Oracle assumptions]

The Oracle takes as input sets of functions included in . Its output is compact sets each included in and a probability measure on (where is endowed with its borelian -algebra) such that:

• For every , .

• For every , there exist a function such that for every and every ,

 μ(B(x,η)∩Kt)≥gt(η).

An example of such Oracle would be one that outputs union of singletons in (for some positive integer ) and the product measure of , where is the uniform measure over the singletons. Then for every the constant function equal to satisfies Section 1.

For every time , we construct a sequence of functionals of as follows. For every time and for every , we build a subset of the set and define the sequence of functionals by pointwise optimum . As described here, the functionals are just byproducts of LABEL:\NomAlgo which only describes the way the sets are defined.

As LABEL:\NomAlgo was inspired by Qu’s work which uses tropical algebra techniques, we will call this algorithm Tropical Dynamic Programming.

## 2 Almost sure convergence on the set of accumulation points

First, we state several simple but crucial properties of the approximation functions generated by LABEL:\NomAlgo. They are direct consequences of the facts that the Bellman operators are order preserving and that the basic functions building our approximations are computed through compatible selection functions.

###### Lemma 1
1. Let be a non-decreasing (for the inclusion) sequence of set of functionals on . Then the sequence is monotone. More precisely, when then is non-increasing and when then is non-decreasing.

2. Monotone approximations: for every indices we have that

 Vkt≥Vk′t≥Vtwhen opt=inf (5)

and when .

3. For every and every we have that

 Bt(Vkt+1)≤Vktwhen opt=inf (6)

and when .

4. For every , we have

 (7)

###### Proof

We prove each point succesively when , as the case is similar.

1. Let be two set of functionals. When for every we have that

 VFk′(x):=infϕ∈Fk′ϕ(x)≤infϕ∈Fkϕ(x)=:VFk(x).
2. By construction of LABEL:\NomAlgo, the sequence of sets is non-decreasing, thus for every indices we have that when (and when ).

Now we show that when , the case is analogous. Fix , we show this by backward recursion on . For , by validity of the selection functions Definition 1, for every we have that . Thus . Now, suppose that for some we have . Applying the Bellman operator, using the definition of the value function Eq. 2 and as the Bellman operators are order preserving, we get the desired result.

3. We prove the assertion by induction on in the case . For , as we have . Thus the assertion is true for . Assume that for some we have

 Bt(Vkt+1)≤Vkt. (8)

By Eq. 5 we have that . Thus, as the Bellman operators are order preserving we have that . Thus by induction hypothesis Eq. 8 we get

 Bt(Vk+1t+1)≤Vkt. (9)

Moreover as the selection function is valid, we have that :

 Bt(Vk+1t+1)≤ϕk+1t. (10)

Finally, by construction of LABEL:\NomAlgo we have that , so using Eq. 9 and Eq. 10 we deduce the desired result

 Bt(Vk+1t+1)≤Vk+1t.
4. As the selection function is tight in the sense of Definition 1 we have by construction of LABEL:\NomAlgo that

 Bt(Vkt+1)(xk−1t)=ϕkt(xk−1t).

Combining it with Eq. 6 (or its variant when ) and the definition of , one gets the desired equality.

In the following two propositions, we state that the sequences of functionals and converge uniformly on any compact included in the domain of . The limit functional of , noted , will be our natural candidate to be equal to the value function . Moreover, the convergence will be -almost sure where (see [6, pages 257-259]) is the unique probability measure over the countable cartesian product endowed with the product -algebra such that for every borelian , ,

 μ(X1×…×Xk×∏i≥k+1XT+1)=μ1(X1)×…×μk(Xk),

where is the sequence of probability measures generated by LABEL:\NomAlgo through the Oracle.

###### Proposition 1 (Existence of an approximating limit)

Let be fixed. The sequence of functionals defined as (where the sets are generated by LABEL:\NomAlgo) -a.s. converges uniformly on every compact set included in the domain of to a functional .

###### Proof

Let be fixed and let be a given compact set included in the domain of . We denote by a sequence of approximations generated by LABEL:\NomAlgo. The proof relies on the Arzelà-Ascoli Theorem [18, Theorem 2.13.30 p.347]. More precisely,

First, by Section 1 each functional in have a common modulus of continuity. Thus as , the family of functionals is equicontinuous.

Now, by Lemma 1, the sequence of functionals is monotone. Now for every , the set is bounded by , which is finite since we assumed , and . Hence the set is a bounded subset of and thus relatively compact.

By Arzelà-Ascoli Theorem, the sequence of functions is a relatively compact set of for the topology of the uniform convergence, i.e. there exists a subsequence of converging uniformly to a function

Finally, as is a monotone sequence of functions, we conclude that the sequence converges uniformly on the compact to .

###### Proposition 2

Let be fixed and be the function defined in Proposition 1. The sequence -a.s. converges uniformly to the continuous function on every compact sets included in the domain of .

###### Proof

We will stick to the case when and leave the other case to the reader. Let be a given compact set included in . First, as the sequence is non-increasing and using the fact that the operator is order preserving, the sequence is also non-increasing. By stability of the Bellman operator (see Section 1), we have that the function is in for every and thus the family is equicontinuous using the common regularity assumption in Section 1. Moreover, given , the set is bounded by and which take finite values on . Thus, using again Arzelà-Ascoli Theorem, there exists a continuous functional such that converges uniformly to on any compact included in .

We now show that the functional is equal to on the given compact or equivalently we show that . As already shown in Proposition 1, the sequence is lower bounded by . We thus have that , which combined with the fact that the operator is order preserving, gives, for every , that

 Bt(Vkt+1)≥Bt(V∗t+1).

Adding, on both side of the previous inequality, the mapping and taking the limit as goes to infinity, we have that

 ϕ+δKt≥Bt(V∗t+1)+δKt.

For the converse inequality, by the existence of optimal sets (see Section 1), there exists a compact set such that

 Bt(V∗t+1+δKt+1)≤Bt(V∗t+1)+δKt. (11)

By Proposition 1, the non-increasing sequence converges uniformly to on the compact set . Thus, for any fixed , there exists an integer such that we have

 Vkt+1≤Vkt+1+δKt+1≤V∗t+1+ϵ+δKt+1,

for all . By Section 1, the operator is order preserving and additively M-subhomogeneous, thus we get

 Bt(Vkt+1)≤Bt(Vkt+1+δKt+1)≤Bt(V∗t+1+δKt+1)+Mϵ,

which, combined with Eq. 11 gives that

 Bt(Vkt+1)+δKt≤Bt(V∗t+1)+Mϵ+δKt,

for every . Taking the limit when goes to infinity we obtain that

 ϕ+δKt≤Bt(V∗t+1)+δKt+Mϵ.

The result is proved for all and we have thus shown that on the compact set . We conclude that converges uniformly to the functional on the compact set .

We want to exploit that our approximations of the final cost function are exact in the sense that we have equality between and at the points drawn in LABEL:\NomAlgo, that is, the tightness assumption of the selection function is much stronger at time than for times . Thus we want to propagate the information backward in time: starting from time we want to deduce information on the approximations for times .

In order to show that on some set , a dissymmetry between upper and lower approximations is emphasized. We introduce the notion of optimal sets with respect to a sequence of functionals as a condition on the sets such that if one wants to compute the restriction of to , ones only need to know on the set . The Fig. 3 illustrates this notion.

###### Definition 2 (Optimal sets)

Let be functionals on . A sequence of sets is said to be -optimal or optimal with respect to , if for every we have

 Bt(ϕt+1+δXt+1)+δXt=Bt(ϕt+1)+δXt. (12)

When approximating from below, the optimality of sets is only needed for the functions whereas when approximating from above, one needs the optimality of sets with respect to . It seems easier to ensure the optimality of sets for than for as the functional is known through the sequence whereas the function is, a priori, unknown. This fact is discussed in Section 3.

###### Lemma 2 (Unicity in Bellman Equation)

Let be a sequence of sets which is

• optimal with respect to when ,

• optimal with respect to when .

If the sequence of functionals satisfies the following modified Bellman Equations:

 (13)

Then for every and every we have that .

###### Proof

We prove the lemma by backward recursion on the time , first for the case . For time , since by definition of (see Eq. 2) we have . Now, let time be fixed and assume that we have for every i.e.

 V∗t+1+δXt+1=Vt+1+δXt+1. (14)

Using Lemma 1, the sequence of functions is lower bounded by . Taking the limit in , we obtain that , we have thus only to prove that on , that is . We successively have:

 V∗t+δXt =Bt(V∗t+1)+δXt (by (13)) ≤Bt(V∗t+1+δXt+1)+δXt (Bt is order preserving) =Bt(Vt+1+δXt+1)+δXt (by induction assumption (14)) =Bt(Vt+1)+δXt ((Xt)t∈[[0,T]] is (Vt)-optimal) =Vt+δXt, (by (2))

which concludes the proof in the case of . Now we briefly prove the case by backward recursion on the time . As for the case , at time , one has . Now assume that for some one has . By Lemma 1, the sequence of functions is upper bounded by . Thus, taking the limit in we obtain that and we only need to prove that . We successively have:

 Vt+δXt =Bt(Vt+1)+δXt (by (2)) ≤Bt(Vt+1+δXt+1)+δXt (Bt is order preserving) (by induction assumption (14)) =Bt(V∗t+1)+δXt ((Xt)t∈[[0,T]] is (V∗t)-optimal) =V∗t+δXt, (by (13))

In the general case, one cannot hope for the limit object to be equal to the value function everywhere. However, one can expect an (almost sure over the draws) equality between the two functions and on all possible cluster points of sequences with for all , that is, on the set (see [17, Definition 4.1 p. 109]).

###### Theorem 1 (Convergence of Tropical Dynamic Programming)

Define , for every time . Assume that, -a.s the sets are -optimal when (resp. -optimal when ).

Then, -a.s. for every the functional defined in Proposition 1 is equal to the value function on .

###### Proof

We will only study the case as the case is analogous. We will show that Eq. 13 holds with . By tightness of the selection function at time , we get that on