Joint dynamic probabilistic constraints with projected linear decision rules

Joint dynamic probabilistic constraints with
projected linear decision rules

Vincent Guigues
22250-900 Rio de Janeiro, Brazil
   René Henrion
Weierstrass Institute Berlin
10117 Berlin, Germany

We consider multistage stochastic linear optimization problems combining joint dynamic probabilistic constraints with hard constraints. We develop a method for projecting decision rules onto hard constraints of wait-and-see type. We establish the relation between the original (infinite-dimensional) problem and approximating problems working with projections from different subclasses of decision policies. Considering the subclass of linear decision rules and a generalized linear model for the underlying stochastic process with noises that are Gaussian or truncated Gaussian, we show that the value and gradient of the objective and constraint functions of the approximating problems can be computed analytically.

Keywords  dynamic probabilistic constraints, multistage stochastic linear programs, linear decision rules.

AMS subject classifications: 90C15, 90C90, 90C30.

1 Introduction

Probabilistic constraints were introduced some fifty years ago under the name ’chance constraints’ by Charnes and Cooper [9]. A probabilistic constraint is an inequality


where is a mapping defining a random inequality system, is a decision vector, and is a random vector living on a probability space . The meaning of (1) is the following: a decision is feasible if and only if the random inequality system is satisfied at least with probability . Choosing close to one reflects the wish for robust decisions which can be interpreted in a probabilistic way.

In the beginning, efforts focussed on finding explicit deterministic equivalents for (1), i.e., on finding analytical functions such that (1) is equivalent with the inequality , see [29] for instance. Even if such instances are rare and usually related with special assumptions, e.g., one-dimensional random variables, individual probabilistic constraints, or assuming independent components of the random vector, it has been successfully applied more recently using Boolean Programming to attack joint probabilistic constraints with dependent random right-hand sides [25, 26] and later extended to stochastic programming problems with joint probabilistic constraints and multi-row random technology matrix [24].

A new era in the theoretical and algorithmical treatment of probabilistic constraints began with the pioneering work by Prékopa in the early seventies, when the theory of log-concave probabilty measures allowed to derive the convexity of feasible decisions induced by a large class of probabilistic constraints (1). Along with bounding and simulation techniques outperforming crude Monte Carlo approaches, this paved the way for applying efficient methods from convex optimization for the numerical solution of probabilistically constrained optimization problems. The monograph [34] is still a standard reference in this area.

Another breakthrough in this direction happened in the early nineties and was related with efficient codes for numerical integration of multivariate normal and t-probabilities due to Genz [16]. These codes are to the best of our knowledge the best performing ones in this area, up until now. For a recent survey on this topic, we refer to the monograph [17]. Along with a reduction technique which allows us to lead back analytically the computation of gradients to the computation of values in (1), these codes may be used for solving probabilistically constrained optimization problems in meaningful dimension of up to a few hundred (as far as the random vector is concerned). For some recent applications in energy management, we refer to [2] and [3].

Alternative solution methods rely on convex approximations of the chance constraints, see for instance [31] (where Bernstein approximations are used) and [11], and on the scenario approach to build computationally tractable approximations as in [6], [7], [12].

Applications of probabilistic constraints are abundant in engineering and finance (for an overview on the theory, numerics and applications of probabilistic constraints, we refer to, e.g., [40], [34], and [35]). Within engineering, power management problems are dominating as far as probabilistic constraints are concerned. In particular, hydro reservoir management is a fruitful instance for this class of optimization problems. We may refer to the basic monograph [28] and to some exemplary work in this field ([10], [14], [15], [19], [27], [30], [36], [37]).

In many applications, the decision has to be taken before the realization of the random parameter is observed (’here-and-now decisions’). However, decisions often depend on time, i.e., the vector represents a discrete decision process. In such case, the ’here-and-now’ setting of (1) means that decisions for the whole time period are taken prior to observing the random parameter, which is now a discrete stochastic process. Then inequality (1) represents a static probabilistic constraint because the decision process does not take into account the gain of information over time while observing the random process. To overcome this deficiency, one may pass from a decision vector to a closed-loop decision policy


each component of which represents a function of previously observed values of the random process for a given time. A simple way to compute a closed-loop strategy is the application of a rolling horizon policy which at any time of the horizon hedges against future uncertainty conditional to past realizations of the random process (see, e.g., [36], [37], [19], [20], [21]). Only the obtained optimal decision for the next time step is applied in reality. Another possibility consists in computing the policy at the beginning of the optimization period plugging (2) into (1) and (1) becomes a dynamic probabilistic constraint now acting on a variable from an infinite-dimensional space.

In this setting, in order to return to a numerically tractable problem in finite dimensions, the decision policies are often parameterized, the most common approach being the introduction of linear decision rules, i.e., for appropriate which now become the finite-dimensional substitutes for the originally infinite-dimensional variables. This strategy has been introduced to probabilistically constrained hydro reservoir problems as early as 1969 [38]. It was used there (and in subsequent publications) in the context of so-called individual probabilistic constraints where each component of the given random inequality system is individually turned into a probabilistic constraint:

The big advantage of such individual constraints is that - in case the component is separable with respect to - they are easily converted into explicit constraints via quantiles. In particular, if happens to be a linear mapping and the objective is linear too, then all one has to do to solve such a probabilistic optimization problem is to apply linear programming. It is well known, however, that the probability level chosen in an individual model may by far not correspond to the level in a joint model, given by (1), where the probability is taken over the entire inequality system. In [2] a hydro reservoir problem is presented where at an optimal release policy the level constraints are satisfied in each time interval with probability 90% individually, whereas the probability of keeping the level constraints through the whole time period is as low as 32%. This observation strongly suggests to deal with the joint model (1) albeit much more difficult to treat algorithmically.

Joint probabilistic constraints in the closed-loop sense discussed above have been investigated in [5] again in the context of a reservoir problem. Here a highly flexible piecewise constant approximation of decision policies was considered and it turned out that the optimal policies of the given problem were definitely not linear. However, a sufficiently fine piecewise approximation requires a big computational effort and limits the applicability of the model to a few time stages like three or four. Therefore, picking up again the idea of parameterized (in particular, linear) decision rules but now in the context of joint constraints appears to be reasonable.

Other authors embed optimization problems with dynamic probabilistic constraints into a dynamic programming scheme of optimal control, however, typically imposing simplifications with regard to the joint system of constraints like the assumption of independent components, or of a discrete distribution (scenarios) or of an individualized (via Boole-Bonferroni inequality) surrogate model (e.g., [8, 32]).

The aim of the current paper is to discuss several modeling issues in the context of dynamic probabilistic constraints putting the emphasis on

  • joint probabilistic constraints as in (1);

  • continuous multivariate distributions of the random vector (in particular, Gaussian) with typically correlated components;

  • parameterized decision rules (in particular, linear and projected linear ones); and

  • mixed probabilistic and hard (almost sure) constraints.

We do not intend to investigate the so-called time consistent models for dynamic probabilistic constraints as it was done, for instance, in [8]. This issue has been considered so far in the framework of Dynamic Programming, where the assumption of the random vector having independent components is paramount, e.g., [4]. Moreover, typically, a discrete distribution is assumed for numerical analysis. As pointed out above, we are interested here in continuously distributed distributions with potentially correlated variables. Though it seems possible to establish time consistent models for dynamic chance constraints under multivariate Gaussian distribution, this issue would complicate the analysis we have in mind here and is yet to be explored in future research.

Moreover, the focus of this paper is not to develop a new algorithm neither the study of a concrete application, although a simple hydro reservoir problem will guide us as an illustration. Our idea is rather to provide a modeling framework taking into account the items listed above and yielding a link to algorithmic approaches for static probabilistic constraints. The latter have been successfully dealt with numerically in the context of linear probabilistic constraints under multivariate Gaussian (and Gaussian-like) distribution (see, e.g., [34, 36, 2, 3]).

The paper is organized as follows: Section 2 presents a general linear multistage problem with probabilistic and hard constraints. It describes a method for projecting decision rules onto hard constraints of wait-and-see type. It finally establishes the relation between the original (infinite-dimensional) problem and approximating problems working with projections from different subclasses of decision policies. These subclasses are kept very general in this section while they are specialized to linear decision rules in Section 3. In that same section the probabilistic time series model we intend to use for the discrete stochastic process is made precise. It is clarified, how the objective, the probabilistic constraint and the hard constraints look like under this probabilistic model and the assumed linear decision rules. Finally, Section 4 explicitly develops the shape of general optimization problems introduced in Section 2 when assuming multivariate Gaussian and truncated Gaussian models for the discrete process. Advantages and difficulties for the different problems are discussed.

2 A linear multistage problem with probabilistic constraints

2.1 The general model

For given with , we consider a -stage stochastic linear minimization problem with the following random constraints:


Here, for , are -dimensional decision vectors, are -dimensional random vectors, and are given matrices of orders and , respectively, and are given vectors. In what follows, the index ’’ will be interpreted as time and and represent discrete decision and stochastic processes, respectively, having finite horizon. In this time-dependent setting, we shall assume that all components of the random process have the same dimension . The joint random vector is supposed to live in a probability space . Similarly to traditional multistage stochastic programming, we shall assume that the decision is taken in the beginning of time interval but the random vector is observed only at the end of that same interval. Therefore, the realization of is unknown at the time one has to decide on . On the other hand, in order to take into account the gain of information due to past observations of randomness, the decision is allowed to depend on such that is Borel measurable. In the following, we will refer to the , (including the deterministic first stage decision ) as decision policies rather than decision vectors in order to emphasize their functional character. Summarizing, we are dealing with the following problem:


where is the expectation operator and is a deterministic cost vector for stage .

Example 2.1

As an illustration, we consider a two-stage problem for the optimal release of a hydro-reservoir under stochastic inflow . The released water is used to produce and sell hydro-energy at a price which is assumed to be known in advance. Given the two stages, these quantities have components , , . The reservoir level is required to stay at both stages between given lower and upper limits , , respectively. Finally, the release is supposed to be bounded by fixed operational limits , , respectively, for turbining water at both time stages. Denoting by the initial water level in the reservoir, the random cost is given by while the random constraints can be written


It is easy to see that this is a special instance of problem (2.1) with data

As far as the constraints are concerned, satisfying them in expectation only, would result in decisions leading to frequent violation of constraints which is not desirable for a stable operation say of technological equipment, etc. At the other extreme, constraints could be required to hold almost surely, thus yielding very robust decisions avoiding violation of constraints with probability one. In that case, we obtain the well-defined optimization problem


If in the constraints of (6) one had that , then the last component of the random process would not enter the constraints and (6) would represent a conventional multistage stochastic linear program. Note, however, that in the two-stage problem (5) and so the random inflow observed only after taking the last decision plays a role in some of the (level) constraints. In such cases, insisting on almost sure satisfaction of constraints may be impossible in particular for unbounded random distributions. In (5), for instance, no matter what has been observed () or decided on (,) until the beginning of the second time interval, the last unknown inflow could always be large enough to eventually violate the upper-level constraint

Therefore, one has to look for alternative models for such constraints leaving the possibility of a ’controlled’ violation. These observations lead us to distinguish in (6) between hard constraints which have to be satisfied almost surely for physical or logical reasons and soft constraints which can be dealt with in a more flexible way. A typical example for hard constraints are the lower and upper limits for the amounts of turbined water () in (5): there is no turbining beyond the given operational limits just for physical reasons.

On the other hand, the reservoir level constraints could be considered to be soft ones. Suppose, for instance, that in (5) represents the physical lower limit of the reservoir below which no water is released and turbined. Then, a violation of the lower-level constraint can never happen and so the corresponding two inequalities can be removed from (5). Doing so, one has to take into account, however, that not the total amount of the release policies and , respectively, can be turbined and sold at the given prices but only the part not violating the lower-level constraint, i.e., in the first stage and in the second stage. This means that the original profits and at the two stages have to be reduced by the amounts and , respectively, where the lower index ’+’ as usual represents the component-wise maximum of the given expression and zero. In this way, the original lower-level constraints in (5) have been removed and compensated for by appropriate penalty terms in the objective.

Next, suppose that in (5) represents some upper limit of the reservoir which is considerably lower than the physical one and serves the purpose of keeping a flood reserve. Then we may neither be able to satisfy this upper limit almost surely (see above) nor to remove it in exchange for an appropriate penalty. In such cases it is reasonable to impose a probabilistic constraint instead:

where is a specified probability level. Hence, the release policies are defined to be feasible if the indicated set of random inequalities is satisfied at least with probability . Observe that would yield the almost sure constraints again, hence choosing close to but smaller than one, offers us the possibility of finding a feasible release policy while keeping the soft upper-level constraint in a very robust sense.

Example 2.2

Taking into account all three kinds of hard and soft constraints in the (random) hydro reservoir model (5), one ends up with the following well-defined optimization problem:

subject to

Here, the group of soft lower-level constraints has disappeared and entered the objective as a second penalization term, the group of soft upper-level constraints (for which no penalization costs are available) has turned into a probabilistic constraint and the group of hard box constraints is formulated in the almost sure sense.

Applying this strategy to the general random constraints (6), we are led to partition the data matrices and vectors for  ,   and  ,  as

according to penalized soft constraints (upper index (1)), probabilistic soft constraints (upper index (2)) and almost sure hard constraints (upper index (3)). Accordingly, (6) turns into the well-defined optimization problem

minimize (8)
subject to

Here, the refer to cost vectors penalizing the violation of soft constraints with upper index (1).

2.2 Projection onto hard constraints of wait-and-see type

We will refer in (6) to wait-and-see constraints if for all , and to here-and-now constraints otherwise. The distinction is made according to whether in the constraint of any stage there is unobserved randomness left or not. For example, in (5), the first two inequalities (level constraints) are here-and-now whereas the last two (operational limits) are wait-and-see. As mentioned earlier, the almost sure constraints in (8) do not have a good chance to be ever satisfied if and the support of the random distribution is unbounded. We will get back to such here-and-now constraints for bounded support of the random distribution in Section 4.5. First, let us deal with the case where all hard constraints are of wait-and-see type as in (7). In this case, owing to for all , the constraint set of (8) can be written as


In the context of numerical solution approaches, one will usually not work in the infinite-dimensional setting of all Borel measurable policies but rather with a finite-dimensional approximation which may be defined by some proper subset of policies. Later in this paper we will deal with the class of linear decision rules (see Section 3.2). The feasible set of (8) will then become the intersection rather than just . This intersection may turn out to be very small or even empty thus leading to a poor approximation of the infinite-dimensional problem (8). If, for instance, one of the hard constraints is given as (-almost surely) and if, moreover, the class of policies is

then, clearly, . One possibility to avoid this kind of problem is to operate with projections of policies onto the feasible domain of hard constraints.

Given a closed convex subset of a finite-dimensional space, we denote the uniquely defined projection onto this set by . For , we introduce the multifunctions


Here, we adopt the previous notation from . By we denote the operator which maps a policy to a new policy defined iteratively by


starting from . For example, for one gets successively that

so that is correctly defined and by (10) satisfies the hard (almost sure) constraints of (8). (11) amounts to a scenario-wise projection onto the polyhedra (10) which can be carried out numerically by solving a convex quadratic program subject to linear constraints. In the special case of rectangular sets , which can be modeled as a hard constraint in (9) by putting for and :


an explicit formula can be exploited: projection of a policy then just means cutting it off at the given lower and upper limits. For instance, in the context of the hard constraints in (7), one has that


As mentioned above, projection via is a way to enforce the hard constraints. This offers several alternatives to the above-mentioned direct intersection of feasible policies from with a given (typically finite-dimensional) subclass . One option would consist in working from the very beginning with projected policies so that the feasible set would become rather than . Indeed, we shall see in Lemma 2.3 that the intersection with the original infinite-dimensional feasible set may be substantially larger by doing so (in particular it would be no more empty in the example discussed before). A second option would consist in relaxing the hard constraints to probabilistic constraints similar to the ones given from the beginning and projecting them afterwards onto the set defined by hard constraints. We formalize this idea by introducing the alternative (infinite-dimensional) constraint set


We shall see in Lemma 2.3 that the projection of onto the hard constraints yields the set , so there is no difference in the solution of (8) in the original infinite-dimensional setting. When considering intersections with a subclass , however, a significant advantage over working with may be observed.

2.3 Approximating the original problem by means of subclasses of decision rules

The following result clarifies the relations between the feasible sets , introduced above and their intersection with (projections of) subclasses of decision rules:

Lemma 2.3

If is an arbitrary subset of Borel measurable policies , then the following chain of inclusions holds true:

In particular, by setting equal to the space of all Borel measurable policies, we derive that .

Proof. Let . Then, the probabilistic constraint for the first and the almost sure constraints for the other inequality system in (9), respectively, guarantee that the joint probabilistic constraint in (14) is satisfied, hence . With fulfilling the almost sure constraints in (9), we have that , whence . This proves the first inclusion in the above chain. Next, as for the second inequality, let , hence for some . In particular, and it remains to show that . As an image of the mapping , satisfies the almost sure constraints of (9). By and (14), there exists a measurable set such that and

are satisfied for all and all . By (11), the second inequality system implies (successively for from to ) that

Hence, again by (11), for all . Therefore, the first inequality system above can be written as

Since it follows that satisfies the probabilistic constraint in (9). Summarizing we have shown that also , whence the desired inclusion follows. The last inclusion is trivial.

The previous lemma suggests to consider the following four optimization problems each of them being some relaxation of our original optimization problem (8):


Here refers to the objective function of (8) and is a given subclass of decision policies. The meaning of (15), (17) and (18) is clear and relates to the feasible sets considered in Lemma 2.3. In (16) we determine first the solution(s) of the inner optimization problem and then project them via . If this inner optimization problem has multiple solutions, then we choose those of their projections under yielding the smallest value of the objective. We observe that (17) has the same optimal value as the problem


where the projection is shifted from the constraints to the objective, and that is a solution of (19) if and only if is  a solution of (17). Hence, (17) and (19) are equivalent and it may be a matter of convenience which of the two forms is preferred. The potential advantage of (16) say over (17) and (18) is that projections do not have to be dealt with in the constraints or in the objective directly but can be carried out after solving the problem.

Lemma 2.4

Denote by , , , , respectively, the optimal values of problems (15)-(18) and by the optimal value of the originally given problem (8). Then, any solution of problems (15)-(18) is feasible for problem (8) and it holds that

Proof. From Lemma 2.3 we see that any feasible point and, hence, any solution of (15), (17) and (18) is feasible for (8). From the inclusions of Lemma 2.3 it follows that . Now, let be a solution of (16). Then, there exists some such that and solves the problem . In particular, is feasible for (17). This implies first by Lemma 2.3 and, hence, the asserted feasibility of for (8). Second, it implies the desired remaining relation .

Lemma 2.4 can be interpreted as follows: Problem (15) reflects the pure transition to a subclass of policies in the originally given problem (8). The resulting loss in optimal value equals . In contrast, using projections onto hard constraints in the one or other way as in (17) and (18) may lead to smaller losses in the optimal values. Of course, this advantage of working with projections requires that the computational gain by passing to an interesting subclass is not destroyed by the projection procedure. This is why in Section 3.2 we shall introduce the class of linear decision rules as a suitable one harmonizing well to a certain degree with projections onto polyhedral sets. The following example illustrates Lemma 2.4:

Example 2.5

Consider the following problem with policies as variables:

We assume that the random vector follows a uniform distribution over the set and that . As a subclass of policies, we consider (purely) linear second stage decisions:

  • Solution of the original problem (8):

    We claim that the optimal value of the original problem equals . Indeed, it cannot be smaller than due to the constraint . On the other hand, and for all represents a feasible policy because it clearly satisfies the almost sure constraints and the set of satisfying and    covers one-third of the support of . Hence the probabilistic constraint is satisfied too. The objective value associated with this feasible policy equals , so as asserted.

  • Solution of problem (15):

    The feasible set here is and a feasible second stage policy has to be trivial in order to satisfy the almost sure constraint . Then, the only choice for such that satisfies the probabilistic constraint is (only then, the set of satisfying and covers one-third of the support of ). Hence the feasible set in this problem reduces to a singleton and its optimal value equals to the objective value of this singleton: .

  • Solution of problems (16) and (17):

    As stated above, (17) is equivalent with (19). In our example, is the projection onto the first component, hence we seek to minimize over the constraint set


    where with

    (see Figure 1).

    Figure 1: Representations of and : top figures for , bottom left for , and bottom right for .

    Note that in (20) we were allowed to extract the deterministic constraint from the probabilistic constraint.

    As is the projection of onto the first stage almost sure constraint set (see (11) and (10)), we get that . Consequently, according to (19), we want to minimize for all policies belonging to (20). We consider three cases:

    • For (see top left in Figure 1), we have .

    • For (see bottom left in Figure 1), we have . The smallest value of satisfying is obtained taking and .

    • For (see bottom right in Figure 1), we get

      In particular, . We distinguish the two subcases:

      • : if then and if then .

      • : If then and, hence,

    Summarizing, the best value of the objective at an admissible solution of (19) equals and is realized uniquely by the optimal policy . The latter is therefore the unique optimal solution of (19). According to our observation above, its projection


    onto the almost sure constraints in our example is an optimal solution of (17). The associated function value equals which therefore is the optimal value of (17). It follows that .

    On the other hand, as we have already observed that    due to  , it follows that the unique optimal solution of (19) yields the unique optimal solution to the problem

    at the same time. Hence, its projection onto the almost sure constraints is the already identified solution (21) of problem (17) implying that the optimal value of problem (16) is the same as that of (17): .

  • Solution of problem (18): By virtue of (13), the policies belonging to the set have the form for some and (see Figure 1). Since these policies already satisfy the almost sure constraints, all one has to add in order to get a policy feasible for (18) is the satisfaction of the probabilistic constraint. Observe that

    where with

    (see Figure 1). For (see Figure 1), we have

    For (see bottom left in Figure 1), we have . The smallest value of satisfying is obtained taking and . Finally, for (see bottom right in Figure 1), we assume that . Then,

    This means that there is no feasible policy with and . Consequently, the optimal value of (18) equals and is realized by the policy which is the projection of the decision rule onto the hard box constraints.

3 Probabilistic model and linear decision rules

Example 2.5 has illustrated the different approximating optimization problems with respect to the given one (8). In order to formulate these ideas in a practically meaningful framework, one has to specify the probabilistic model for the random vector and a suitable subclass of decision rules in Lemma 2.3.

3.1 Probabilistic model

We introduce in this section the class of stochastic processes we consider. Each component of follows a linear model of the form


where is the tendency for period and lags are nonnegative and depend on time. We assume that for every , the coefficients , and are nonzero.

Finally, the noises are supposed to obey centered Gaussian laws