Stochastic decomposition applied to large-scale hydro valleys management

# Stochastic decomposition applied to large-scale hydro valleys management

P. Carpentier J.-P. Chancelier V. Leclère F. Pacaud UMA, ENSTA ParisTech, Université Paris-Saclay, 828 bd des Maréchaux,91762 Palaiseau cedex France UniversitÃ© Paris-Est, CERMICS (ENPC), F-77455 Marne-la-VallÃ©e, FRANCE Efficacity, 14-20 boulevard Newton, F-77455 Marne-la-VallÃ©e, FRANCE
June 30, 2019
###### Abstract

We are interested in optimally controlling a discrete time dynamical system that can be influenced by exogenous uncertainties. This is generally called a Stochastic Optimal Control (SOC) problem and the Dynamic Programming (DP) principle is one of the standard way of solving it. Unfortunately, DP faces the so-called curse of dimensionality: the complexity of solving DP equations grows exponentially with the dimension of the variable that is sufficient to take optimal decisions (the so-called state variable). For a large class of SOC problems, which includes important practical applications in energy management, we propose an original way of obtaining near optimal controls. The algorithm we introduce is based on Lagrangian relaxation, of which the application to decomposition is well-known in the deterministic framework. However, its application to such closed-loop problems is not straightforward and an additional statistical approximation concerning the dual process is needed. The resulting methodology is called Dual Approximate Dynamic Programming (DADP). We briefly present DADP, give interpretations and enlighten the error induced by the approximation. The paper is mainly devoted to applying DADP to the management of large hydro valleys. The modeling of such systems is presented, as well as the practical implementation of the methodology. Numerical results are provided on several valleys, and we compare our approach with the state of the art SDDP method.

###### keywords:
Discrete time stochastic optimal control, Decomposition methods, Dynamic programming, Energy management
###### Msc:
[2010] 90-08, 90C06, 90C15, 90C39, 49M27
\biboptions

authoryear

ï»¿

## 1 Introduction

### 1.1 Large-scale systems and energy applications

Consider a controlled dynamical system over a discrete and finite time horizon. This system may be influenced by exogenous noises that affect its behavior. We suppose that, at every instant, the decision maker is able to observe the noises and to keep these observations in memory. Since it is generally profitable to take available observations into account when designing future decisions, we are looking for strategies rather than simple decisions. Such strategies (or policies) are feedback functions that map every instant and every possible history of the system to a decision to be made.

The typical applications we have in mind are in the field of energy management. Consider a power producer that owns a certain number of power units. Each unit has its own local characteristics such as physical constraints that restrain the set of feasible decisions, and induces a production cost or a revenue. The power producer has to control the power units so that an overall goal is met. A classical example is the so-called unit commitment problem Takriti et al. (1996) where the producer has to satisfy a global power demand at every instant. The power demand, as well as other parameters such as unit breakdowns, are random. The producer is looking for strategies that make the overall production cost minimal, over a given time horizon. Another application, which is considered in this paper, is the management of a large-scale hydro valley: here the power producer manages a cascade of dams, and maximizes the revenue obtained by selling the energy produced by turbinating the water inside the dams. Both natural inflows in water reservoirs and prices are random. In all these problems, both the number of power units and the number of time steps are usually large de Matos et al. (2015).

### 1.2 Standard resolution methods

One classical approach when dealing with stochastic dynamic optimization problems is to discretize the random inputs of the problem using a scenario tree. Such an approach has been widely studied within the stochastic programming community (Heitsch & Römisch (2009), Shapiro et al. (2009)), and also used to model and solve energy problems Pflug & Pichler (2014). One of the advantages of such a technique is that, as soon as the scenario tree is drawn, the derived problem can be treated by classical mathematical programming techniques. Thus, a number of decomposition methodologies have been proposed (see for instance Rockafellar & Wets (1991), Carpentier et al. (1996), Ruszczyński (1997), (Ruszczyński & Shapiro, 2003, Chap. 3)) and applied to energy planning problems Bacaud et al. (2001). The way to combine the discretization of expectation together with the discretization of information in a general setting has been presented in Heitsch et al. (2006), Carpentier et al. (2015) and Pflug & Pichler (2014)). However, in a multi-stage setting, this methodology suffers from the drawback that arises with scenario trees: as it was pointed out by Shapiro (2006), the number of scenarios needed to achieve a given accuracy grows exponentially with the number of time steps of the problem.

The other natural approach to solve SOC problems is to rely on the dynamic programming (DP) principle (see Bellman (1957), Puterman (1994), Bertsekas (2000)). The core of the DP approach is the definition of a state variable that is, roughly speaking, the variable that, in conjunction with the time variable, is sufficient to take an optimal decision at every instant. It does not have the drawback of the scenario trees concerning the number of time steps since strategies are, in this context, depending on a state variable whose space dimension does not grow with time111In the case of power management, the state dimension is usually linked to the number of power units.. However, DP suffers from another drawback which is the so-called curse of dimensionality: the complexity of solving the DP equation grows exponentially with the state space dimension. Hence, solving the DP equation by brute force is generally intractable when the state space dimension goes beyond several units. In Vezolle et al. (2009), the authors were able to solve DP on a  state variables energy management problem, using parallel computation coupled with adequate data distribution, but the DP limits are around state variables in a straightforward use of the method.

Another popular idea is to represent the value functions (solutions of the DP equation) as a linear combination of a priori chosen basis functions (see Bellman & Dreyfus (1959) Bertsekas & Tsitsiklis (1996)). This approach, called Approximate Dynamic Programming (ADP) has become very popular and the reader is referred to Powell (2011) and Bertsekas (2012) for a precise description of ADP. This approximation drastically reduces the complexity of solving the DP equation. However, in order to be practically efficient, such an approach requires some a priori information about the problem, in order to define a well suited functional subspace. Indeed, there is no systematic means to choose the basis functions and several choices have been proposed in the literature Tsitsiklis & Van Roy (1996).

Last but not least is the popular DP-based method called Stochastic Dual Dynamic Programming (SDDP). Starting with the seminal work Van Slyke & Wets (1969), the SDDP method has been designed in Pereira & Pinto (1991). It has been widely used in the energy management context and lately regained interest in the Stochastic Programming community Philpott & Guan (2008), Shapiro (2011). The idea is somehow to extend Kelley’s cutting plane method to the case of multi-stage problems. Alternatively it can be seen as a multistage Benders (or L-shaped) decomposition method with sampling. It consists in a succession of forward (simulation) and backward (Bellman function refining) passes that ultimately aim at approaching the Bellman function by the supremum of affine hyperplanes (cuts) generated during the backward passes. It provides an efficient alternative to simply discretizing the state space to solve the DP equation. In the convex case with finite support random variables, it has been proven by Girardeau et al. (2015) that the method converges to optimality.

### 1.3 Decomposition approach

When dealing with large-scale optimization problems, the decomposition-coordination approach aims at finding a solution to the original problem by iteratively solving subproblems of smaller dimension. In the deterministic case, several types of decomposition have been proposed (e.g. by prices, by quantities or by interaction prediction) and unified in Cohen (1980) using a general framework called Auxiliary Problem Principle. In the open-loop stochastic case, i.e. when controls do not rely on any observation, it is proposed in Cohen & Culioli (1990) to take advantage of both decomposition techniques and stochastic gradient algorithms.

The natural extension of these techniques in the closed-loop stochastic case (see Barty et al. (2009)) fails to provide decomposed state dependent strategies. Indeed, the optimal strategy of a subproblem depends on the state of the whole system, and not only on the local state. In other words, decomposition approaches are meant to decompose the control space, namely the range of the strategy, but the numerical complexity of the problems also arises because of the dimensionality of the state space, that is to say the domain of the strategy.

We recently proposed a way to use price decomposition within the closed-loop stochastic case. The coupling constraints, namely the constraints preventing the problem from being naturally decomposed, are dualized using a Lagrange multiplier (price). At each iteration, the price decomposition algorithm solves each subproblem using the current price, and then uses the solutions to update it. In the stochastic context, the price is a random process whose dynamics is not available, so the subproblems do not in general fall into the Markovian setting. However, in a specific instance of this problem Strugarek (2006), the author exhibited a dynamics for the optimal multiplier and showed that these dynamics were independent with respect to the decision variables. Hence it was possible to come down to the Markovian framework and to use DP to solve the subproblems in this case. Following this idea, it is proposed in Barty et al. (2010b) to choose a parameterized dynamics for these multipliers in such a way that solving subproblems using DP becomes possible. While the approach, called Dual Approximate Dynamic Programming (DADP), showed promising results on numerical examples, it suffers from the fact that the induced restrained dual space is non-convex, leading to some numerical instabilities. Moreover, it was not possible to give convergence results for the algorithm.

The method has then been improved through a series of PhD theses (Girardeau (2010), Alais (2013) and Leclère (2014)) both from the theoretical and from the practical point of view. The methodology still relies on Lagrangian decomposition. In order to make the resolution of subproblems tractable, the core idea is to replace the current Lagrange multiplier by its conditional expectation with respect to some information process, at every iteration. This information process has to be a priori chosen and adapted to the natural filtration. Assume that the information process is driven by a dynamics: the state in each subproblem then consists of the original state augmented by the new state induced by the information process, making the resolution of the subproblem tractable by DP. The quality of the results produced by the algorithm highly depends on the choice of this information variable. An interesting point is that approximating the multipliers by their conditional expectations has an interpretation in terms of a relaxed optimization problem: this revisited DADP algorithm in fact aims at solving an approximate primal problem where the almost-sure coupling constraint has been replaced by its conditional expectation with respect to the information variable. In other words, this methodology consists in solving a relaxed primal problem, hence giving a lower bound of the true optimal cost. Another consequence of this approximation is that the solutions obtained by the DADP algorithm do not satisfy the initial almost-sure coupling constraint, so we must rely on a heuristic procedure to produce a feasible solution to the original problem.

### 1.4 Contents of the paper

The main contribution of the paper is to give a practical algorithm aiming at solving large scale stochastic optimal control problems and providing closed-loop strategies. The numerous approximations used in the algorithm, and especially the one allowing for feasible strategies, make difficult to theoretically assess the quality of the solution finally adopted. Nevertheless, numerical implementation shows that the method is promising to solve large scale optimization problems such as those encountered in the field of energy management.

The paper is organized as follows. In §2, we present the hydro valley management problem, the corresponding general SOC formulation and the DP principle. We then concentrate on the spatial decomposition of such a problem and the difficulties of using DP at the subproblem level. In §3, we present the DADP method and give different interpretations. We then propose a way to recover an admissible solution from the DADP results and we briefly discuss the theoretical and practical questions associated to the convergence and implementation of the method. Finally, in §4, we apply the DADP method to the management of hydro valleys. Different examples, corresponding to either academic or realistic valleys, are described. A comparison of the method with SDDP is outlined.

### 1.5 Notations

We will use the following notations, considering a probability space :

• bold letters for random variables, normal font for their realizations;

• (resp. ) means that the random variable is measurable with respect to the -algebra (resp. with respect to the -algebra generated by , denoted by );

• generally stands for the state, for the control, for an exogeneous noise;

• for a cost function at time , for a final cost function;

• represent a Bellman’s value function at time ;

• is the set of integer between and ;

• the notation  (resp.  and ) stands for the discrete time state process  (resp. control processes , ).

ï»¿

## 2 Mathematical formulation

In this section, we present the modeling of a hydro valley and the associated optimization framework.

### 2.1 Dams management problem

We consider a hydro valley constituted of  cascaded dams as represented in Figure 2. The water turbinated at a dam produces energy which is sold on electricity markets, and then enters the nearest downstream dam.222Note that the valley geometry may be more complicated than a pure cascade: see for example the valleys represented at Figure 6.

We formulate the problem of maximizing the cascade revenue over a discrete time horizon . The representative variables of dam  at time  are: for the water turbinated, for the current water volume, for the natural water inflow entering dam , for the market value of the water of dam . The randomness is given by .

The modeling of a dam takes into account a possible overflow: the spilled water does not produce electricity, but enters the next dam (see Figure 3).

 xit+1 =xit−uit+ait+zit−sit=fit(xit,uit,wit,zit),

where  is the volume of water spilled by overflow of the dam:

 sit=max{0,xit−uit+ait+zit−¯¯¯xi}.

The constant value  stands for the maximal capacity of dam . The outflow of dam , that is, the sum of the turbinated water and of the spilled water, is denoted by :

 zi+1t =uit+sit=git(xit,uit,wit,zit).

Note that these dynamic equations are nonlinear because of the  operator in the definition of the spilled water volume. We assume the Hazard-Decision information structure: the control  applied at time  is chosen once the noise  at time  has been observed. It is thus possible to ensure that the dam always remains above its minimal admissible volume  by limiting the control range:

 u––i≤uit≤min{¯¯¯ui,xit+ait+zit−x––i}.
###### Remark 1.

As will be seen in §4, the typical time step length we use is the month (with a time horizon of one year). It is thus reasonable to assume the hazard-decision framework, the control applied for a given month being in fact implemented each day taking into account the observed information on a daily basis.

• The gain of the dam is the sum of different terms.

• Gain at each time :

 pituit−ϵ(uit)2;

we denote by  the opposite of this gain (cost). The quadratic terms  ensure the strong convexity of the cost function. These terms model the operating cost of the turbine and are usually small.

• Final gain at time :

 −aimin{0,xiT−ˆxi}2;

we again denote by  the opposite of this gain. It corresponds to a quadratic penalization around a target value  representing the desired water volume in the dam at the end of the time horizon.

Taking into account the opposite of the gains, we thus have to deal with a minimization problem.

Thus the global optimization problem reads

 min(X,U,Z) (1a) s.t. Xi0given, Xit+1=fit(Xit,Uit,Wit,Zit),i∈\llbracket1,N\rrbracket,t∈\llbracket0,T−1\rrbracket, (1b) Uit⪯σ(W0,…,Wt),i∈\llbracket1,N\rrbracket,t∈\llbracket0,T−1\rrbracket, (1c) Z1t=0, Zi+1t=git(Xit,Uit,Wit,Zit),i∈\llbracket1,N\rrbracket,t∈\llbracket0,T−1\rrbracket. (1d) Equations (1c) represent the so-called non-anticipativity constraints, that is, the fact that each control Uit, considered as a random variable, has to be measurable with respect to the sigma-field σ(W0,…,Wt) generated by the noise sequence (W0,…,Wt) up to time t.

### 2.2 A more generic formulation

With a slight abuse of notation,333which consists in denoting by  the pair  the stochastic optimization problem formulated at §2.1 reads

 min(Xi,Ui)i∈\llbracket1,N\rrbracket E(N∑i=1(T−1∑t=0Lit(Xit,Uit,Wt)+Ki(XiT))), (2a) s.t. Xit+1=fit(Xit,Uit,Wt),Xi0given, (2b) Uit⪯σ(W0,…,Wt), (2c) N∑i=1Θit(Xit,Uit,Wt)=0. (2d) Constraints (2b) represent the dynamics and constraints (2c) are the non-anticipativity constraints. The last constraints (2d) express the interactions between the dams in a more general way than Equations (1d). They represent an additive coupling with respect to the different production units, which is termed the “spatial coupling of the problem”. Such a general modeling covers other cases than the cascade problem, such that the unit commitment problem, or the problem of exchanging energy on a smart grid.

### 2.3 Dynamic Programming like approaches

In the remainder of the paper, we assume that we are in the so-called white noise setting.

###### Assumption 1.

Noises are independent over time.

This assumption is of paramount importance in order to use Dynamic Programming or related approaches such that Stochastic Dual Dynamic Programming since in that case the controls given by DP or SDDP are the optimal ones for Problem 2 (they are given as feedback functions depending on the state variable). This assumption can be alleviated, in the case where it is possible to identify a dynamics in the noise process (such as an ARMA model), and by incorporating this new dynamics in the state variables (see e.g. Maceira & Damazio (2006) on this topic).

Under Assumption 1, Dynamic Programming (DP) applies to Problem (2) (see e.g  Bellman (1957); Puterman (1994); Bertsekas (2000) for the general theory): there is no optimality loss to seek each control at time as a function of both the state and the noise at time .444Remember that we have assumed the Hazard-Decision framework. Then, thanks to the measurability properties of the control the Bellman functions  are obtained by solving the Dynamic Programming equation backwards in time:

 VT(xT) =N∑i=1Ki(xiT), (3a) Vt(xt) =E(minu1,…,uNN∑i=1Lit(xi,Ui,Wt)+Vt+1(ft(xt,ut,Wt)))). (3b)

where , and is the collection of new states .

The DP equation is agnostic to whether the state and control variables are continuous or discrete, whether the constraints and the cost functions are convex or not, etc. However, in order to exhaustively solve it we need to have discrete state, and to be able to solve each equation to optimality. In practice, the method is subject to the curse of dimensionality and cannot be used for large-scale optimization problems. For example, applying DP to dams management problems is practically untractable for more than five dams (see the results given at §4.3).

Another way to compute the Bellman functions asociated to Problem (2) is to use the Stochastic Dual Dynamic Programming (SDDP) method. The method has been first described in Pereira & Pinto (1991), and its convergence has been analysed in Philpott & Guan (2008) for the linear case and in Girardeau et al. (2015) for the general convex case. SDDP recursively constructs an approximation of each Bellman function as the supremum of a number of affine functions, thus exploiting the convexity of the Bellman functions (arising from the convexity of the cost and constraint functions). SDDP has been used for a long time for solving large-scale hydrothermal problems (see de Matos et al. (2015) and the references therein) and allows to push the limits of DP in terms of state dimension (see the results in §4.4).

### 2.4 Spatial coupling and approach by duality

A standard way to tackle large-scale optimization problems is to use Lagrange relaxation in order to split the original problem into a collection of smaller subproblems by dualizing coupling constraints. As far as Problem (2) is concerned, we have in mind to use DP for solving the subproblems and thus want to dualize the spatial coupling constraints (2d) in order to formulate subproblems, each incorporating a single dam. The associated Lagrangian  is accordingly

 L(X,U,Λ)=E(N∑i=1(T−1∑t=0Lit(Xit,Uit,Wt)+Ki(XiT)+T−1∑t=0Λt⋅Θit(Xit,Uit,Wt))),

where the multiplier  associated to Constraint (2d) is a random variable. From the measurability of the variables , and , we can assume without loss of optimality that the multipliers  are -measurable random variables.

In order to be able to apply duality theory to the problem (which is mandatory for algorithmic resolution), we make the two following assumptions.

###### Assumption 2.

A saddle point of the Lagrangian  exists.

###### Assumption 3.

Uzawa algorithm applies to compute a saddle-point of .

Assumption 2 corresponds to a Constraint Qualification condition and ensures the existence of an optimal multiplier. Assumption 3 allows to use a gradient ascent algorithm to compute the optimal multiplier. An important question in order to be able to satisfy these two assumptions is the choice of the spaces where the various random variables of the problem are living in. Duality theory and associated algorithms have been extensively studied in the framework of Hilbert spaces Ekeland & Temam (1999), but the transition to the framework of stochastic optimal control poses difficult challenges Rockafellar (1968, 1971), which will be briefly presented in §3.4.555Remember that the aim of the present paper is mainly to present numerical results. The reader is referred to Leclère (2014) for these theoretical questions. One way to get rid of these difficulties is to assume that the space  is finite.

When using the Uzawa algorithm to compute a saddle-point of the Lagrangian, the minimization step with respect to  splits in  independent subproblems each depending on a single pair , and therefore allows for a dam by dam decomposition. More precisely, the -th iteration of Uzawa algorithm consists of the two following steps.

1. Solve Subproblem , , with fixed :

 minXi,Ui E(T−1∑t=0Lit(Xit,Uit,Wt)+Λ(k)t⋅Θit(Xit,Uit,Wt)+Ki(XiT)), s.t. Xit+1=fit(Xit,Uit,Wt),Xi0given (4a) Uit⪯σ(W0,…,Wt), (4b) whose solution is denoted (Ui,(k),Xi,(k)).
2. Use a gradient step to update the multipliers :

 Λ(k+1)t=Λ(k)t+ρt(N∑i=1Θit(Xi,(k)t,Ui,(k)t,Wt)). (5)

Consider the resolution of Subproblem (4). This subproblem only involves the “physical” state variable  and the control variable , a situation which seems favorable to DP. It also involves two exogeneous random processes, namely and . The white noise Assumption 1 applies for the first process , but not for the second one , so that the state of the system cannot be summarized by the physical state  ! Moreover if we just use the fact that  is measurable with respect to the past noises, the state of the system must incorporate all noises prior to time , that is, . The state size of the subproblem increases with time. Without some additional knowledge on the process , DP cannot be applied in a straightforward manner: something has to be compressed in order to use Dynamic Programming. ï»¿

## 3 Dual Approximate Dynamic Programming

For a very specific instance of Problem (2), Strugarek (2006) exhibited the dynamics of the optimal multiplier. Hence it was possible to come down to the Markovian framework and to use DP to solve the subproblems (4) with an augmented space, namely the “physical” state  and the state associated to the mutiplier’s dynamics. Following this idea for a general Problem (2), Barty et al. (2010b) proposed to choose a parameterized dynamics for the multiplier: then solving the subproblems using DP becomes possible, the parameters defining the multiplier dynamics being updated at each iteration of the Uzawa algorithm. This new approach, called Dual Approximate Dynamic Programming (DADP), has then improved through a series of PhD theses (Girardeau (2010), Alais (2013) and Leclère (2014)) both from the theoretical and from the practical point of view. We give here a brief overview of the current DADP method.

### 3.1 DADP core idea and associated algorithm

In order to overcome the obstacle explained at §2.4 concerning the random variables , we choose a random variable  at each time ,666Note that the information variables  may depend on the subproblem index : see Girardeau (2010) for further details. each  being measurable with respect to the noises up to time  . We call the information process associated to Problem (2).

#### 3.1.1 Method foundation

The core idea of DADP is to replace the multiplier by its conditional expectation with respect to . From an intuitive point of view, the resulting optimization problem will be a good approximation of the original one if  is close to the random variable . Note that we require that the information process is not influenced by controls because introducing a dependency of the conditioning term with respect to the control would lead to very serious difficulties for optimization.

Using this core idea, we replace Subproblem (4) by:

 minXi,Ui E(T−1∑t=0(Lit(Xit,Uit,Wt)+Ki(XiT)) +E(Λ(k)t∣Yt)⋅Θit(Xit,Uit,Wt)), (6a) s.t. Xit+1=fit(Xit,Uit,Wt),Xi0given, (6b) Uit⪯σ(W0,…,Wt). (6c)

According to the Doob property (Dellacherie & Meyer, 1975, Chapter 1, p. 18), the -measurable random variable  can be represented by a measurable mapping , that is,

 μ(k)t(Yt)=E(Λ(k)t ∣∣ Yt), (7)

so that Subproblem (6) in fact involves the two fixed random processes and . If the process  follows a non-controlled Markovian dynamics driven by the noise process , i.e. if there exists functions such that

 Yt+1=ht(Yt,Wt), (8)

then  is a valid state for the subproblem and DP applies.

Assume that the information process  follows the dynamics (8).

• The first step of the DADP algorithm at iteration  consists in solving the subproblems (6) with  fixed, that is, with  given. It is done by solving the Bellman functions associated to each subproblem , that is,

 ViT(xi,y)= Ki(x), Vit(xi,y)= E(Qkt(xi,y,Wt))

where is the value of

 minui Lit(xi,ui,wt)+μ(k)t(y)⋅Θit(xi,ui,wt)+Vit+1(xit+1,yit+1) s.t. xit+1=fit(xi,ui,wt), yt+1=ht(y,wt).

Storing the obtained during the Bellman resolution, we obtain the optimal feedback laws as functions of both the state and the noise at time . These functions allow to compute the optimal state and control processes of subproblem  at iteration .777Remember that the process  follows the non-controlled Markovian dynamics (8) and thus can be obtained once for all. Starting from the optimal control and state variables are obtained by applying the optimal feedback laws from  up to :

 Ui,(k)t= γi,(k)t(Xi,(k)t,Yt,Wt), Xi,(k)t+1= fit(Xi,(k)t,Ui,(k)t,Wt).
• The second step of the DADP algorithm consists in updating the multiplier process . Instead of updating the multipliers themselves by the standard gradient formula888More sophisticated formulas can be used in practice: see §4.3.1.

 Λ(k+1)t=Λ(k)t+ρt(N∑i=1Θit(Xi,(k)t,Ui,(k)t,Wt)),

it is sufficient to deal with their conditional expectations w.r.t. . Using the optimal processes  and  obtained at the previous step of the algorithm for all subproblems, the conditional deviation from the coupling constraint is obtained:

 E(N∑i=1Θit(Xi,(k)t,Ui,(k)t,Wt) ∣∣∣ Yt).

By the Doob property, this conditional expectation can be represented by a measurable mapping :

 (9)

Gathering the functional representations (7) and (9) of the conditional multiplier and of the conditional deviation, the gradient update reduces to the following functional expression:

 μ(k+1)t(⋅)=μ(k)t(⋅)+ρtΔ(k)t(⋅). (10)

This last equation is equivalent to the multipliers conditional expectation update:

 E(Λ(k+1)t ∣∣ Yt)=E(Λ(k)t ∣∣ Yt)+ρtE(N∑i=1Θit(Xi,(k)t,Ui,(k)t,Wt) ∣∣ Yt). (11)

DADP algorithm is depicted in Figure 4.

The DADP method, as it has been presented up to now, consists in an approximation of the optimal multiplier, that is, the multiplier is replaced by its conditional expectation . Such an approximation is equivalent to a decision-rule approach for the dual problem, obtained by imposing measurability conditions to the dual variables :

DADP may also be viewed as a relaxation of the constraints in the primal problem. More precisely, we replace the almost sure coupling constraint (2a) by a conditional expectation constraint, that is we consider the following relaxed version of the initial problem (2):

 min(X,U)i∈\llbracket1,N\rrbracketE(N∑i=1(T−1∑t=0Lit(Xit,Uit,Wt)+Ki(XiT))), (12a) subject to the dynamics constraints (2b), to the measurability constraints (2c) and to the conditional coupling constraints: E(N∑i=1Θit(Xit,Uit,Wt) ∣∣ Yt)=0. (12b)
###### Proposition 1.

Suppose the Lagrangian associated with Problem (12) has a saddle point. Then the DADP algorithm can be interpreted as the Uzawa algorithm applied to Problem (12).

###### Proof.

Consider the duality term which appears in the cost function of subproblem  in DADP. This term can be written equivalently . which corresponds to the dualization of the coupling constraint handled by Problem (12). ∎

DADP thus consists in replacing an almost-sure constraint by its conditional expectation w.r.t. the information variable . From this interpretation, we deduce that the optimal value provided by DADP is a guaranteed lower bound of the optimal value of Problem (2).

Instead of solving the original problem (2), DADP deals with the relaxed problem (12) in which the almost-sure coupling constraint (2d) is replaced by the less binding constraint (12b). As a consequence, a solution of Problem (12) does not satisfy the set of constraints of Problem (2). An additional procedure has to be devised in order to produce an (at last) admissible solution of (2).

Nevertheless, solving Problem (12) produces at each time  a set of  local Bellman functions , each depending on the extended state . We use these functions to produce a single Bellman function  depending on the global state  which is used as an approximation of the “true” Bellman function  of Problem (2). The heuristic rule leading to this approximated Bellman function simply consists in summing the local Bellman functions:

 ˆVt(x1t,…,xNt,yt)=N∑i=1Vit(xit,yt).

The approximated Bellman functions  allow us to devise an admissible feedback policy for Problem (2): for any value of the state , any value of the information  and any value of the noise  at time , the control value is obtained by solving the following one-step DP problem

 min(u1t,…,uNt) s.t. xit+1=fit(xit,uit,wit),i∈\llbracket1,N\rrbracket, yt+1=ht(yit,wit), N∑i=1Θit(xit,uit,wt)=0.

In this framework, DADP can be viewed as a tool allowing to compute approximated Bellman functions for Problem (2). These functions are then used to compute control values satisfying all the constraints of the problem, that is, to produce an online admissible feedback policy for Problem (2).

Applying this online feedback policy along a bunch of noises scenarios allows to compute a Monte Carlo approximation of the cost, which is accordingly a stochastic upper bound of the optimal value of Problem (2).

### 3.4 Theoretical and practical questions

The theoretical questions linked to DADP are adressed in Leclère (2014), and the practical ones in Girardeau (2010)and Alais (2013).

#### 3.4.1 Theoretical questions

In the DADP approach, we treat the coupling constraints of a stochastic optimization problem by duality methods and solve it using the Uzawa algorithm. Uzawa algorithm is a dual method which is usually described in an Hilbert space such as , but we cannot guarantee the existence of an optimal multiplier in such a space. To overcome the difficulty, the approach consists in extending the setting to the non-reflexive Banach space , to give conditions for the existence of an optimal multiplier in (rather than in the dual space of ) and to study the Uzawa algorithm convergence in this space. The interested reader is referred to Leclère (2014) for more information.

#### 3.4.2 Practical questions

An important practical question is the choice of the information variables . We present here some possibilities.

1. Perfect memory: .
From the measurability properties of , we have , that is, there is no approximation! A valid state for each subproblem is then : the state is growing with time.

2. Minimal information: .999or equivalently being any constant random variable
Here is approximated by its expectation . The information variable does not deliver any online information, and a valid state for subproblem   is .

3. Dynamic information: .
This choice corresponds to a number of possibilities, as mimicking the state of another unit, or adding a hidden dynamics. A valid state for subproblem  is .

Finally, the question of accelerating the DADP algorithm by replacing the standard Lagrangian by an augmented one, or by using more sophisticated methods than the simple gradient ascent method in the multiplier update step, have a great interest in order to improve the method. This point is developed in §4.3.1. ï»¿

## 4 Numerical experiments

In this section, we present numerical results obtained on a large selection of hydro valleys. Some of these valleys (see Figure 6) correspond to academic examples, in the sense that their characteristics (size of dams, range of controls, inflows values) do not rely on existing valleys. These examples allow us to quantify the performance of different optimization methods (DP, DADP and two different flavors of SDDP) on problems of increasing size, from a valley incorporating 4 dams, and thus solvable by DP, up to a valley with 30 dams, and thus facing the curse of dimensionality (§4.3 and §4.4). We also present two instances corresponding to more realistic hydro valleys, where the models respect the orders of magnitude of the dam sizes of existing valleys (§4.5).

All the results presented here have been obtained using a 3.4GHz, 4 cores – 8 threads Intel Xeon E3 based computer.

### 4.1 Application of DADP to a hydro valley

We go back to the problem formulation (1) presented at §2.1. In order to implement the DADP algorithm, we dualize the coupling constraints

 Zi+1t−git(Xit,Uit,Wit,Zit)=0, (13)

and we denote by the associated multiplier (random variable).

When minimizing the dual problem at iteration  of the algorithm, the product with a given multiplier by

 Λi+1,(k)t⋅(Zi+1t−git(Xit,Uit,Wit,Zit)),

is additive with respect to the dams, that is,

• the first term pertains to dam  subproblem,

• whereas the second term pertains to dam  subproblem,

hence leading to a dam by dam decomposition for the dual problem maximization in  at  fixed.

The DADP method consists in choosing a multiplier process and then replacing the coupling constraints by their conditional expectations with respect to . Here we adopt the choice (minimal information), so that Constraints (13) are replaced in the approximated problem by their expectations:

 E(Zi+1t−git(Xit,Uit,Wit,Zit))=0. (14)

The expression of Subproblem (6) attached to dam  reads

 maxUi,Zi,Xi E(T−1∑t=0(Lit(Xit,Uit,Wit,Zit)+E(Λi,(k)t)⋅Zit (15a) −E(Λi+1,(k)t)⋅git(Xit,Uit,Wit,Zit))+Ki(XiT)), (15b) s.t. Xit+1=fit(Xit,Uit,Wt),Xi0given (15c) Uit⪯σ(W0,…,Wt). (15d)

Because of the crude relaxation due to a constant , the multipliers appear only in the subproblems by means of their expectations , so that all subproblems involve a -dimensional state variable, that is, the dam stock , and hence are easily solvable by Dynamic Programming.

We denote by the optimal solution of each subproblem , and by the Bellman function obtained for each dam  at time .

With the choice of constant information variables , the coordination update step (11) reduces to

 E(Λi,(k+1)t)=E(Λi,(k)t)+ρtE(Zi+1,(k)t−git(Xi,(k)t,Ui,(k)t,Wit,Zi,(k)t)), (16)

that is, a collection of deterministic equations involving the expectation of (13) which is easily computable by a Monte Carlo approach.

Assume that DADP converges, leading to optimal Bellman functions  and to optimal solutions . We know that the initial almost-sure coupling constraints are not satisfied. To recover admissibility, the heuristic rule proposed at §3.3 consists in forming the global approximated Bellman functions as

 V∞t(x1,…,xN)=N∑i=1Vi,∞t(xi),

and then computing, at any time and for any pair , a control satisfying all the constraints of Problem (1) by solving the following one-step DP problem:

 max(u1,…,uN) N∑i=1Lit(xi,ui,wit,zi)+V∞t+1(x1t+1,…,xNt+1), s.t. xit+1=fit(xi,ui,wit,zi)∀i, zi+1=git(xi,ui,wit,zi)∀i.

#### 4.1.2 Complete process

To summarize, the whole process for implementing DADP is as follows.

• Optimization stage

• Apply the DADP algorithm, and obtain at convergence the local Bellman functions .

• Form the approximated global Bellman functions .

• Simulation stage

• Draw a large number of noise scenarios (Monte Carlo sampling).

• Compute the control values along each scenario by solving the one-step DP problems involving the Bellman functions ’s, thus satisfying all the constraints of Problem (1) as explained in §3.3; these computations produce for each scenario a state and control trajectories, as well as a payoff.

• Evaluate the quality of the solution: trajectories variability, payoff distribution and associated mean…

### 4.2 SDDP implementations

As will be explained in §4.3, the controls of the original problem are discrete, which is a priori a difficulty for SDDP implemetation.

We describe here the two implementations of SDDP that we use for numerical comparisons, that is, a classical version of SDDP in which the integrity constraints on the control variables are relaxed (continuous controls) in order to use standard quadratic programming, and a homemade discrete version of SDDP described later on.

#### 4.2.1 Continuous version of SDDP

The continuous implementation  of SDDP relaxes the integrity constraints upon the control , and assume that for all time and all dam

 xit∈[x––i,¯¯¯xi]uit∈[u––i,¯¯¯ui]. (17)

Furthermore,  considers that the spillage is a control variable, so as to render the dynamic linear. As the dynamic is linear and costs convex, we are in the standard framework of Stochastic Dual Dynamic Programming and the resolution converges asymptotically to the optimal solution of the relaxed problem Girardeau et al. (2015).

The whole process of  is as follows.

• Optimization stage. The implementation of  corresponds to the one described in Shapiro (2011), that is, the so-called DOASA implementation. Lower approximations of the Bellman functions are built iteratively. At iteration , the procedure consists of two passes.

• During the forward pass, we first sample a scenario of noise. We then simulate a stock trajectory by using the current approximation of the Bellman functions. This is done by successively solving one-step DP problem to determine the next stock value. Each of these one-step DP problems is in fact a continuous quadratic programming (QP) problem.

• In the backward pass, duality theory allows to find subgradient of lower approximations of the Bellman functions. This subgradients are used to construct valid cut, that is hyperplanes that are lower that the Bellman functions. Those cuts are then added to the current approximations of Bellman functions.

• Simulation stage. The simulation stage is identical to the one described at §4.1.2: the controls are computed with a one-step DP problem using the approximation of the Bellman functions obtained by .

The continuous version  is implemented in Julia, with the package StochDynamicProgramming101010See the github link https://github.com/JuliaOpt/StochDynamicProgramming.jl. built on top of the JuMP package used as a modeller Dunning et al. (2015). The QP problems are solved using CPLEX 12.5. Every 5 iterations, redundant cuts are removed thanks to the limited memory level-1 heuristic described in Guigues (2017). Indeed, without cuts removal, the resolution of each QP becomes too slow as the number of cuts increase along iterations. The algorithm is stopped after a fixed number of iterations and the gap is estimated with Monte-Carlo as described in Shapiro (2011).

#### 4.2.2 Discrete version of SDDP

In the discrete version of , the controls are discrete variables which have the same constraints as in the original problem formulation. Some other works mixing SDDP and binary variables are described in Zou et al. (2016) The whole process of  is as follows.

• Optimization stage. The implementation of  corresponds to the one described for  that is, approximations of the Bellman functions by a set of cuts which are built iteratively. At iteration , the procedure consists of two passes.

• During the forward pass, we first sample a scenario of noise. We then simulate a stock trajectory by using the current approximation of the Bellman functions. This is done by successively solving one-step DP problem to determine the next stock. Each of these one-step DP problems is solved by enumerating all possible values of the discrete control. In order to take advantage of the threading facilities, we may treat multiple noise scenarios in parallel (8 scenarios in practice).

• In the backward pass, the approximations of the Bellman functions are refined along the trajectories computed during the forward pass. For each new added state point we first solve a one-step DP problem to obtain a new Bellman value. Then we solve a family of one step DP problem using as state points the neighbors on the grid of the current state point. We are thus able to approximate a new cut at the current state point by approximating derivatives of the Bellman function by finite differences (we use centered finite differences for regular states and forward, backward differences for states at domain boundary).

• Simulation stage. The simulation stage is identical to the one described at §4.1.2: the controls are computed with a one-step DP problem using the approximation of the Bellman functions obtained by .

The algorithm is stopped after a fixed number of iterations (25 in our experiments). The number of added points at forward step as already explained was set to 8. During the iterations some cuts were dropped by only keeping the last 100 cuts obtained for each Bellman function.

### 4.3 Results obtained for academic valleys

We model a first collection of hydro valleys including from 4 to 12 dams, with arborescent geometries (see Figure 6).

The optimization problem is stated on a time horizon of one year, with a monthly time step (). All the dams have more or less the same maximal volume. The maximal amount of turbinated water for each dam varies with the location of the dam in the valley (more capacity for a downstream dam than for an upstream dam), as well as the random inflows in a dam (more inflow for an upstream dam than for a downstream dam). We assume discrete probability laws with finite supports for the inflows,111111Market prices are assumed to be deterministic. and we also assume that the available turbine controls are discrete, so that each dam is in fact modeled using a discrete Markov chain. These valleys do not correspond to realistic valleys, in the sense that a true valley incorporates dams with very heterogeneous sizes.

Let us first detail the gradient method used for the update of the multipliers involved by DADP. Thanks to the choice of constant information variables, the gradient expression involved in the update formula (16) is an expectation:

 E(Zi+1,(k)t−git(Xi,(k)t,Ui,(k)t,Wit,Zi,(k)t)).

This expectation can be approximated by a Monte Carlo approach. We draw a collection of scenarios121212Note that this collection of scenarios has nothing to do with the one used during the simulation stage of the complete process described at §4.1.2. of  and then compute at iteration of DADP the optimal solutions of Subproblem (15) along each scenario. We thus obtain realizations of , whose arithmetic mean gives the (approximated) gradient component at time for the coupling at dam . This gradient can be used either in the standard steepest descent method such as in (16), or in a more sophisticated algorithm such as the conjugate gradient or the quasi-Newton method. We use in our numerical experiments a solver (limited memory BFGS) of the MODULOPT library from INRIA Gilbert & Jonsson (2007). For all the valleys we studied, the convergence was fast (around 100 iterations regardless of the problem size). Figure 7 represents the evolution of the multipliers of dam connections for the 4-Dams valley along the iterations of the algorithm.

#### 4.3.2 Methods comparison

We solve Problem (1) for the first collection of academic valleys by 4 different methods:

1. the standard Dynamic Programming method (DP), if possible,

2. the continuous version of SDDP () presented at §4.2.1,

3. the discrete version of SDDP () presented at §4.2.2,