The Vanishing Approach for the Average Continuous Control of Piecewise Deterministic Markov Processes

O.L.V. Costa
Departamento de Engenharia de Telecomunicações e Controle
Escola Politécnica da Universidade de São Paulo
CEP: 05508 900-São Paulo, Brazil.
e-mail: oswaldo@lac.usp.br
This author received financial support from CNPq (Brazilian National Research Council), grant 304866/03-2 and FAPESP (Research Council of the State of São Paulo), grant 03/06736-7.

F. Dufour
Université Bordeaux I
IMB, Institut Mathématiques de Bordeaux
INRIA Bordeaux Sud Ouest, Team: CQFD
351 cours de la Libération
33405 Talence Cedex, France
e-mail : dufour@math.u-bordeaux1.fr
Author to whom correspondence should be sent.
Abstract

The main goal of this paper is to derive sufficient conditions for the existence of an optimal control strategy for the long run average continuous control problem of piecewise deterministic Markov processes (PDMP’s) taking values in a general Borel space and with compact action space depending on the state variable. In order to do that we apply the so-called vanishing discount approach (see [16], page 83) to obtain a solution to an average cost optimality inequality associated to the long run average cost problem. Our main assumptions are written in terms of some integro-differential inequalities related to the so-called expected growth condition, and geometric convergence of the post-jump location kernel associated to the PDMP.

 Keywords: piecewise-deterministic Markov processes, continuous-time, long-run average cost, optimal control, integro-differential optimality inequality, vanishing approach. AMS 2000 subject classification: 60J25, 90C40, 93E20.

1 Introduction

A general family of non-diffusion stochastic models suitable for formulating optimization problems in several areas of operations research, namely piecewise-deterministic Markov processes (PDMP’s), was introduced in [6] and [8]. These processes are determined by three local characteristics: the flow ϕ, the jump rate λ, and the transition measure Q. Starting from x, the motion of the process follows the flow ϕ until the first jump time T1, which occurs either spontaneously, in a Poisson-like fashion with rate λ, or when the flow hits the boundary of the state space. In either case the location of the process at the jump time is selected by the transition measure Q, and the motion restarts from this new point as before. A suitable choice of the state space and of the local characteristics ϕ, λ, and Q provides stochastic models covering a great number of problems of operations research [8].

There exist two types of control for PDMP’s: continuous control and impulse control. This terminology was introduced by M.H.A. Davis in [8, page 134], where continuous control describes situations in which the control variable acts at all times on the process through the characteristics, by influencing the deterministic motion and the probability of the jumps. On the other hand, the terminology impulse control refers to a control that intervenes in the process by moving it to a new point of the state space at times specified by the controller.

The long run average continuous control problem of PDMP’s taking values in a general Borel space was studied in [4]. At each point x of the state space a control variable is chosen from a compact action set U(x) and acts on the jump rate λ and the transition measure Q. The goal was to minimize the long run average cost, which is composed of a running cost and a boundary cost (the latter added each time the PDMP touches the boundary). Both costs are assumed to be positive but not necessarily bounded. As far as the authors are aware, this was the first time that this kind of problem was considered in the literature. Indeed, results are available for the long run average cost problem but for impulse control; see Costa [3], Gatarek [13] and the book by M.H.A. Davis [8] (and the references therein). On the other hand, the continuous control problem has been studied only for discounted costs, by A. Almudevar [1], M.H.A. Davis [7, 8], M.A.H. Dempster and J.J. Ye [9, 10], Forwick, Schäl, and Schmitz [12], M. Schäl [18], and A.A. Yushkevich [20, 21].

This paper deals with the vanishing discount approach for the long run average continuous control problem of a PDMP and can be seen as a continuation of the results derived in [4]. By exploiting the special features of PDMP’s we trace a parallel with the general theory for discrete-time Markov decision processes (see, for instance, [15, 16]) rather than with the continuous-time case (see, for instance, [14, 22]). The two main reasons for doing so are to use the powerful tools developed in the discrete-time framework (see for example the references [2, 11, 16, 17]) and to avoid working with the infinitesimal generator associated to a PDMP, whose domain of definition is in most cases difficult to characterize. We develop further the approach presented by the authors in [4], which consists of using a connection between the continuous-time control problem of a PDMP and a discrete-time optimality equation (see the introduction of section 4 for a detailed explanation of this method). In particular, we derive sufficient conditions under which a boundedness condition (with the lower bound being a function rather than a constant, as supposed in [4]) on the value functions for the discounted problems is satisfied. The main assumptions for this are based on some integro-differential inequalities related to the so-called expected growth condition (see Assumption 3.1), and on geometric convergence of the post-jump location kernel associated to the PDMP (see Assumption 3.6). As a consequence, we obtain an existence result for an optimal ordinary control strategy for the long run average control problem of a PDMP, with the important property of being in feedback form.

The paper is organized in the following way. In section 2 we introduce some notation, basic assumptions, and the problem formulation. In section 3 we introduce several assumptions related to the continuity of the parameters, the expected growth condition and the geometric convergence of the post-jump location of the PDMP. We then provide several key auxiliary results for obtaining a bound for the discounted problems, and some extensions of the results presented in [4] to the case in which the functions under consideration are not necessarily positive but just bounded by a test function g. The main results are presented in section 4, which provides sufficient conditions for the existence of an optimal control strategy for the long run average continuous control problem of a PDMP and obtains a solution to an average cost optimality inequality associated to the long run average cost problem.

2 Notation, basic assumptions, and problem formulation

2.1 Presentation of the control problem

In this section we present some standard notation and some basic definitions related to the motion of a PDMP, and the control problems we will consider throughout the paper. For further details and properties the reader is referred to [8]. The following notation will be used in this paper: N denotes the set of natural numbers, R the set of real numbers, R₊ the set of positive real numbers and Rⁿ the n-dimensional Euclidean space. We write η for the Lebesgue measure on R₊. For a metric space X, B(X) represents the σ-algebra generated by the open sets of X. M(X) (respectively, P(X)) denotes the set of all finite (respectively, probability) measures on (X, B(X)). Let X and Y be metric spaces. The set of all Borel measurable (respectively, bounded) functions from X into Y is denoted by M(X; Y) (respectively, B(X; Y)). Moreover, for notational simplicity, M(X) (respectively, B(X)) denotes M(X; R) (respectively, B(X; R)). For g ∈ M(X) with g(x) ≥ 1 for all x ∈ X, Bg(X) is the set of functions v ∈ M(X) such that ‖v‖g ≐ sup_{x∈X} |v(x)|/g(x) < ∞. C(X; Y) denotes the set of continuous functions from X to Y. For h ∈ M(X), h⁺ (respectively, h⁻) denotes the positive (respectively, negative) part of h.

Let E be an open subset of Rⁿ, ∂E its boundary, and Ē its closure. A controlled PDMP is determined by its local characteristics (ϕ, λ, Q), as presented in the sequel. The flow ϕ(x, t) is a function ϕ : Rⁿ × R → Rⁿ, continuous in (x, t) and such that ϕ(x, t + s) = ϕ(ϕ(x, t), s). For each x ∈ E, the time the flow takes to reach the boundary starting from x is defined as t∗(x) ≐ inf{t > 0 : ϕ(x, t) ∈ ∂E}. For x such that t∗(x) = ∞ (that is, the flow starting from x never touches the boundary), we set ϕ(x, t∗(x)) equal to some fixed point of ∂E. We define the following space of functions absolutely continuous along the flow with limit towards the boundary:

 Mac(E) = {g ∈ M(E) : t ↦ g(ϕ(x,t)) is absolutely continuous on [0, t∗(x)) for each x ∈ E and, whenever t∗(x) < ∞, the limit lim_{t→t∗(x)} g(ϕ(x,t)) exists}.

For g ∈ Mac(E) and x ∈ E for which t∗(x) < ∞, we define g(ϕ(x, t∗(x))) ≐ lim_{t→t∗(x)} g(ϕ(x, t)) (note that the limit exists by assumption). As shown in Lemma 2 in [5], for g ∈ Mac(E) there exists a function Xg ∈ M(E) such that g(ϕ(x, t)) = g(x) + ∫₀ᵗ Xg(ϕ(x, s)) ds for all x ∈ E and t ∈ [0, t∗(x)).
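As a toy illustration of the function Xg (our example, not the paper's): take E = (0, 1), flow ϕ(x, t) = x + t, and g(x) = x². Then g is absolutely continuous along the flow and

\[
g(\phi(x,t)) = (x+t)^2 = g(x) + \int_0^t 2(x+s)\,ds,
\qquad\text{so}\qquad
Xg(x) = 2x,
\]

i.e. Xg is simply the derivative of g along the flow, which is what Lemma 2 in [5] provides in general for g ∈ Mac(E).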

The local characteristics λ and Q depend on a control action a ∈ U, where U is a compact metric space (there is no loss of generality in assuming this property for U; see Remark 2.8 in [4]), in the following way: λ ∈ M(Ē × U; R₊) and Q is a stochastic kernel on E given Ē × U. For each x ∈ Ē we define the subset U(x) of U as the set of feasible control actions that can be taken when the state process is in x; that is, the control action that will be applied to λ and Q must belong to U(x). The following assumptions, based on the standard theory of Markov decision processes (see for example [16]), will be made throughout the paper:

Assumption 2.1

For all x ∈ Ē, U(x) is a compact subspace of U.

Assumption 2.2

The set of feasible state/action pairs {(x, a) : x ∈ Ē, a ∈ U(x)} is a Borel subset of Ē × U.

We present next the definition of an admissible control strategy and the associated motion of the controlled process. A control policy is a pair of functions U = (u, u∂), with u ∈ M(N × E × R₊; U) and u∂ ∈ M(N × E; U), satisfying u(n, x, t) ∈ U(ϕ(x, t)) and u∂(n, x) ∈ U(ϕ(x, t∗(x))) for all (n, x, t) ∈ N × E × R₊. The class of admissible control strategies will be denoted by 𝒰. Consider the state space Ê ≐ E × E × R₊ × N. For a control policy U ∈ 𝒰 let us introduce the following parameters for x̂ = (x, z, s, n) ∈ Ê: the flow ϕ̂(x̂, t) ≐ (ϕ(x, t), z, s + t, n), the jump rate λ̂^U(x̂) ≐ λ(x, u(n, z, s)), and the transition measure Q̂^U(x̂; ·), built from Q and the functions u, u∂,

for x̂ ∈ Ê and Borel sets in B(Ê). From [8, section 25], it can be shown that for any control strategy U ∈ 𝒰 there exists a filtered probability space such that the piecewise deterministic Markov process X̂^U with local characteristics (ϕ̂, λ̂^U, Q̂^U) may be constructed as follows. For notational simplicity the probability P^U_{(x,x,0,k)} will be denoted by PU(x,k) for (x, k) ∈ E × N. Take a random variable T1 such that

 PU(x,k)(T1 > t) ≐ e^{−ΛU(x,k,t)} for t < t∗(x), and PU(x,k)(T1 > t) ≐ 0 for t ≥ t∗(x),

where, for (x, k) ∈ E × N and t ∈ [0, t∗(x)), ΛU(x, k, t) ≐ ∫₀ᵗ λ(ϕ(x, s), u(k, x, s)) ds. If T1 is equal to infinity, then X̂U(t) ≐ (ϕ(x, t), x, t, k) for t ∈ R₊. Otherwise select independently an Ê-valued random variable (labelled X̂U1) having distribution

 PU(x,k)(X̂U1 ∈ A × B × {0} × {k+1} | σ{T1}) ≐ Q(ϕ(x, T1), u(k, x, T1); A ∩ B) if ϕ(x, T1) ∈ E,
 PU(x,k)(X̂U1 ∈ A × B × {0} × {k+1} | σ{T1}) ≐ Q(ϕ(x, T1), u∂(k, x); A ∩ B) if ϕ(x, T1) ∈ ∂E.

The trajectory of X̂U starting from (x, k), for t ≤ T1, is given by

 X̂U(t) ≐ (ϕ(x, t), x, t, k) for t < T1, and X̂U(T1) ≐ X̂U1.

Starting from X̂U(T1), we now select the next inter-jump time T2 − T1 and post-jump location X̂U2 in a similar way. Let us write the components of the PDMP as X̂U(t) = (X(t), Z(t), τ(t), N(t)). From the previous construction it is easy to see that X(t) corresponds to the trajectory of the system, Z(t) is the value of X at the last jump time before t, τ(t) is the time elapsed between the last jump and time t, and N(t) is the number of jumps of the process up to time t. As in Davis [8], we consider the following assumption to avoid any accumulation point of the jump times:
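The iterative construction above can be sketched as a simulation loop. The following is a minimal sketch with toy local characteristics invented for illustration (they are not from the paper): state space E = (0, 1), flow ϕ(x, t) = x + t, a constant jump rate, and a transition measure Q that resets the state uniformly on (0, 1/2).

```python
import random

def t_star(x):
    """Time for the flow phi(x, t) = x + t to reach the boundary of E = (0, 1)."""
    return 1.0 - x

def simulate_pdmp(x0, horizon, lam=2.0, seed=0):
    """Return the jump times and post-jump locations of the toy PDMP up to `horizon`."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    jumps = []
    while t < horizon:
        # Spontaneous jump clock: exponential with (constant) rate lam.
        s = rng.expovariate(lam)
        # The jump occurs when the clock rings or when the flow hits the
        # boundary, whichever comes first.
        dt = min(s, t_star(x))
        t += dt
        if t >= horizon:
            break
        # Post-jump location selected by Q: uniform on (0, 1/2).
        x = rng.uniform(0.0, 0.5)
        jumps.append((t, x))
    return jumps
```

Between jumps the state simply follows the flow, so the full trajectory at time t is recovered as ϕ(x, t − Tn) for the last jump time Tn ≤ t with post-jump location x.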

Assumption 2.3

For any (x, k) ∈ E × N, U ∈ 𝒰, and t ∈ R₊, we have EU(x,k)[∑ᵢ I{Tᵢ ≤ t}] < ∞.

The costs of our control problem will contain two terms, a running cost f and a boundary cost r, satisfying the following properties:

Assumption 2.4

f ∈ M(Ē × U; R₊), and r ∈ M(∂E × U; R₊).

Define, for U ∈ 𝒰, (x, k) ∈ E × N, t ∈ R₊, and α ≥ 0,

 Jα(U, t) ≐ ∫₀ᵗ e^{−αs} f(X(s), u(N(s), Z(s), τ(s))) ds + ∫₀ᵗ e^{−αs} r(X(s−), u∂(N(s−), Z(s−))) dp∗(s),

where p∗(t) counts the number of times the process hits the boundary up to time t and, for notational simplicity, we set J(U, t) ≐ J₀(U, t). The long-run average cost we want to minimize over 𝒰 is given by A(U, x) ≐ limsup_{t→∞} (1/t) EU(x,0)[J(U, t)], and we set A(x) ≐ inf_{U∈𝒰} A(U, x). For the discounted case, with α > 0, the cost we want to minimize is given by JDα(U, x) ≐ lim_{t→∞} EU(x,0)[Jα(U, t)], and we set JDα(x) ≐ inf_{U∈𝒰} JDα(U, x). We need the following assumption to avoid infinite costs in the discounted case.

Assumption 2.5

For all x ∈ E and all α > 0, JDα(x) < ∞.

2.2 Discrete-time relaxed and ordinary controls

We present in this sub-section the set of discrete-time relaxed and ordinary controls.
Consider C(U; R) equipped with the topology of uniform convergence and P(U) equipped with the weak topology σ(P(U), C(U; R)). For x ∈ Ē, define P(U(x)) as the set of measures μ ∈ P(U) satisfying μ(U(x)) = 1. P(U) and P(U(x)) for x ∈ Ē are subsets of P(U) and are equipped with the relative topology.

Let Vr (respectively, Vr(x) for x ∈ E) be the set of all B(R₊)-measurable functions μ defined on R₊ with values in P(U) such that μ(t) ∈ P(U) Lebesgue-a.e. (respectively, μ(t) ∈ P(U(ϕ(x, t))) Lebesgue-a.e.). It can be shown (see sub-section 3.1 in [4]) that Vr(x) is a compact set of a metric space: a sequence (μn)n∈N in Vr(x) converges to μ if and only if for all g ∈ C(R₊ × U; R)

 lim_{n→∞} ∫_{R₊} ∫_{U(ϕ(x,t))} g(t, u) μn(t, du) dt = ∫_{R₊} ∫_{U(ϕ(x,t))} g(t, u) μ(t, du) dt.

The sets of relaxed controls are defined from these objects. The set of ordinary controls, denoted by V (respectively, V(x) for x ∈ E), is defined as above except that it is composed of deterministic functions instead of probability measures. Consequently, the set of ordinary controls can be seen as a subset of the set of relaxed controls Vr (respectively, Vr(x) for x ∈ E) by identifying any control action u with the Dirac measure δᵤ concentrated on u. Thus we can write V ⊂ Vr (respectively, V(x) ⊂ Vr(x) for x ∈ E), and from now on we will consider that V (respectively, V(x)) is endowed with the topology generated by Vr. The necessity of introducing the class of relaxed controls is justified by the fact that in general there does not exist a topology for which V and V(x) are compact sets.
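A standard textbook illustration (not from the paper) of why ordinary controls fail to be compact: take the action set U = {−1, +1} and the chattering sequence of ordinary controls uₙ that alternates between −1 and +1 on consecutive intervals of length 1/n. No subsequence of (uₙ) converges to an ordinary control, but in the topology above

\[
\mu_n(t, du) = \delta_{u_n(t)}(du) \;\longrightarrow\; \mu(t, du) = \tfrac{1}{2}\,\delta_{-1}(du) + \tfrac{1}{2}\,\delta_{+1}(du),
\]

a genuinely relaxed limit. Compactness of the relaxed class is what guarantees that the infima defining the one-stage operators below are attained.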

As in [16], page 14, we need the set of feasible state/relaxed-control pairs to be a measurable subset of Ē × P(U); that is, we need the following assumption.

Assumption 2.6

A sufficient condition is presented in [4, Proposition 3.3] to ensure that Assumption 2.6 holds.

2.3 Discrete-time operators and measurability properties

In this sub-section we present some important operators associated to the optimality equation of the discrete-time problem. We consider the following notation and , and for , , and .

The following operators will be associated to the optimality equations of the discrete-time problems that will be presented in the next sections. For x ∈ E, Θ = (μ, μ∂), and A ∈ B(E), according to Lemma 2 in [11, Appendix 5] define

 Λμ(x, t) ≐ ∫₀ᵗ λ(ϕ(x, s), μ(s)) ds,

 Gα(x, Θ; A) ≐ ∫₀^{t∗(x)} e^{−αs−Λμ(x,s)} λQI_A(ϕ(x, s), μ(s)) ds + e^{−αt∗(x)−Λμ(x,t∗(x))} Q(ϕ(x, t∗(x)), μ∂; A). (1)

For , we define . For , , , , , introduce

 Lαv(x, Θ) ≐ ∫₀^{t∗(x)} e^{−αs−Λμ(x,s)} v(ϕ(x, s), μ(s)) ds, (2)

 Hαw(x, Θ) ≐ e^{−αt∗(x)−Λμ(x,t∗(x))} w(ϕ(x, t∗(x)), μ∂). (3)

For (respectively, ), (respectively, ) provided the difference has a meaning. It will be useful in the sequel to define the function as follows: . In particular for we write for simplicity , , , . Measurability properties of the operators , , and are shown in [4, Proposition 3.4].

We present now the definitions of the one-stage optimization operators.

Definition 2.7

Let α ≥ 0, ρ ∈ R, and h ∈ M(E). Assume that for any x ∈ E and Υ ∈ V(x), Gαh(x, Υ) is well defined. The (ordinary) one-stage optimization operator Tα is defined by

 Tα(ρ, h)(x) = inf_{Υ∈V(x)} { −ρLα(x, Υ) + Lαf(x, Υ) + Hαr(x, Υ) + Gαh(x, Υ) }.

Assume that for any x ∈ E and Θ ∈ Vr(x), Gαh(x, Θ) is well defined. The relaxed one-stage optimization operator Rα is defined by

 Rα(ρ, h)(x) = inf_{Θ∈Vr(x)} { −ρLα(x, Θ) + Lαf(x, Θ) + Hαr(x, Θ) + Gαh(x, Θ) }.

In particular, for α = 0 we write for simplicity T ≐ T₀ and R ≐ R₀.

The sets of measurable selectors associated to , , are defined by , , .

For α ≥ 0, ρ ∈ R, and h ∈ M(E), the one-stage optimization problem associated to the operator Tα, respectively Rα, consists of finding a measurable selector that attains, for all x ∈ E, the infimum in the definition of Tα(ρ, h)(x), respectively of Rα(ρ, h)(x).

Finally we conclude this section by recalling (see Propositions 3.8 and 3.10 in [4]) that there exist two natural mappings from to and from to .

Definition 2.8

For , define the measurable mapping of the space into by
.

Definition 2.9

For , define the measurable mapping of the space into by of the space into .

Remark 2.10

The measurable selectors of the kind presented in Definition 2.8 are called ordinary feedback measurable selectors in the class, and the control strategies of the kind presented in Definition 2.9 are called ordinary feedback control strategies in the class.

3 Assumptions and auxiliary results

The purpose of this section is to introduce several assumptions (see sub-section 3.1) and to derive preliminary results that will ensure the existence of an optimal control for the long run average cost. More specifically, the two main results of sub-section 3.2 consist, roughly speaking, of providing a bound for the α-discount value function in terms of the test function g (see Corollary 3.13) and of proving that the difference of the α-discount value functions relative to a fixed reference state belongs to Bg(E) (see Theorem 3.17). The results of sub-section 3.3 are extensions of those presented in [4] to the case in which the functions under consideration are not necessarily positive (as was supposed in [4]) but instead belong to Bg(E). It must be pointed out that these generalizations are not straightforward and are crucial for obtaining the existence of an optimal ordinary feedback control strategy for the long run average-cost problem of a PDMP. In particular, Theorem 3.22 states that for any function in Bg(E), the one-stage optimization operators Tα and Rα are equal and that there exists an ordinary feedback measurable selector for the one-stage optimization problems associated to these operators.

3.1 Assumptions and definitions

The next assumption is somehow related to the so-called expected growth condition (see, for instance, Assumption 3.1 in [15] for the discrete-time case, or Assumption A in [14] for the continuous-time case), used, among other things, to guarantee uniform boundedness of the α-discount value functions with respect to the discount factor α.

Assumption 3.1

Suppose that there exist g ∈ Mac(E), b ≥ 0, c > 0, M ≥ 0, and Mc ≥ 0, δ ≥ 0, r̄ ∈ M(∂E), satisfying for all x ∈ E

 sup_{a∈U(x)} { Xg(x) + cg(x) − λ(x, a)[g(x) − Qg(x, a)] } ≤ b, (4)

 sup_{a∈U(x)} f(x, a) ≤ Mg(x), (5)

and for all x ∈ E with t∗(x) < ∞

 sup_{a∈U(ϕ(x,t∗(x)))} { r̄(ϕ(x, t∗(x))) + Qg(ϕ(x, t∗(x)), a) } ≤ g(ϕ(x, t∗(x))), (6)

 sup_{a∈U(ϕ(x,t∗(x)))} r(ϕ(x, t∗(x)), a) ≤ Mc + δ r̄(ϕ(x, t∗(x))). (7)
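To see what (4) buys, ignore for a moment the jump term λ(x, a)[g(x) − Qg(x, a)]; then (4) reduces to the drift bound Xg + cg ≤ b along the flow, and a Gronwall-type integration gives (our sketch, under this simplification):

\[
\frac{d}{dt}\Bigl(e^{ct}\, g(\phi(x,t))\Bigr) = e^{ct}\bigl(Xg + c\,g\bigr)(\phi(x,t)) \le b\,e^{ct}
\quad\Longrightarrow\quad
g(\phi(x,t)) \le e^{-ct} g(x) + \frac{b}{c}\bigl(1 - e^{-ct}\bigr).
\]

This is the sense in which the test function g controls the expected growth of the process between jumps, uniformly in the control action.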

Assumptions 3.2, 3.3 and 3.4, presented in the sequel, are needed to guarantee some convergence and semi-continuity properties of the one-stage optimization operators (see sub-section 3.3), and the existence of a measurable selector.

Assumption 3.2

For each , the restriction of to is continuous, for , and if then .

Assumption 3.3

There exists a sequence of measurable functions in such that for all , as and the restriction of to is continuous. There exists a sequence of measurable functions in such that for all , as and the restriction of to is continuous.

Assumption 3.4

For all and , the restriction of to is continuous.

We make the following definition:

Definition 3.5

Consider w ∈ M(E) and h ∈ M(E). We define:

1. û(w, h) as the measurable selector satisfying, for x ∈ E and z ∈ ∂E,

 inf_{a∈U(x)} { f(x, a) − λ(x, a)[w(x) − Qh(x, a)] } = f(x, û(w, h)(x)) − λ(x, û(w, h)(x))[w(x) − Qh(x, û(w, h)(x))],

 inf_{a∈U(z)} { r(z, a) + Qh(z, a) } = r(z, û(w, h)(z)) + Qh(z, û(w, h)(z)).
2. the measurable selector derived from û(w, h) through Definition 2.8.

3. the control strategy derived from this selector through Definition 2.9.

Notice that the existence of û(w, h) follows from Assumptions 3.1-3.4 and Theorem 3.3.5 in [16], while the objects in items 2 and 3 follow from Proposition 3.10 in [4].

In the next assumption notice that, for any measurable selector u, G(·, uϕ; ·) can be seen as the stochastic kernel associated to the post-jump location of a PDMP. This assumption is related to some geometric ergodic properties of this operator (see for example the comments on page 122 in [17] or Lemma 3.3 in [15] for more details on this kind of assumption).

Assumption 3.6

Suppose that there exist a ≥ 0 and κ ∈ (0, 1), and that for any measurable selector u there exists a probability measure νu such that νu(g) < ∞ and

 |Gᵏh(x, uϕ) − νu(h)| ≤ a ‖h‖g κᵏ g(x), (8)

for all h ∈ Bg(E) and k ∈ N.
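Inequality (8) can be read as g-geometric ergodicity of the post-jump kernel; a hedged restatement (our paraphrase, not the paper's wording):

\[
\sup_{\|h\|_g \le 1} \bigl| G^{k} h(x, u^{\phi}) - \nu_u(h) \bigr| \;\le\; a\,\kappa^{k}\, g(x), \qquad \kappa \in (0,1),
\]

so for every h with finite g-norm, Gᵏh(x, uϕ) converges to νu(h) at the geometric rate κᵏ, with a state-dependent constant proportional to g(x).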

The final assumption is:

Assumption 3.7

There exist , , such that

1. and for all and ,

2. , for all ,

3. , for all with ,

4. , for all with ,

5. .

Remark 3.8

Notice the following consequences of Assumption 3.7:

1. Assumption 3.7 c) implies that , and , for any with , , , , .

2. Assumptions 3.7 a) and b) imply that for any , , .

3.2 Properties of the α-discount value function JαD(⋅)

The next two propositions establish a connection between a general integro-differential inequality (respectively, equality) related to the local characteristics of the PDMP and an inequality (respectively, equality) related to the operators Lα, Hα and Gα. They will be crucial for the boundedness results on the α-discount value functions to be developed in the sequel.

Proposition 3.9

Suppose that there exist α ≥ 0, d ∈ R, v ∈ Mac(E), ℓ ∈ M(E), k ∈ M(E), p ∈ M(∂E), and a measurable selector μ with associated relaxed control Θ(x), satisfying

 Xv(ϕ(x, t)) − [α + λ(ϕ(x, t), μ(x, t))] v(ϕ(x, t)) + ℓ(ϕ(x, t)) + λ(ϕ(x, t), μ(x, t)) Qk(ϕ(x, t), μ(x, t)) ≤ d, (9)

for all x ∈ E and t ∈ [0, t∗(x)), and

 v(ϕ(x, t∗(x))) ≥ p(ϕ(x, t∗(x))) + Qk(ϕ(x, t∗(x)), μ∂(ϕ(x, t∗(x)))), (10)

for all x ∈ E with t∗(x) < ∞.
Then

 v(x) ≥ −dLα(x, Θ(x)) + Lαℓ(x, Θ(x)) + Hαp(x, Θ(x)) + Gαk(x, Θ(x)). (11)

Proof: Multiplying both sides of equation (9) by e^{−αt−Λμ(x)(x,t)} and integrating over [0, s] for s ∈ [0, t∗(x)), we get that

 d ∫₀ˢ e^{−αt−Λμ(x)(x,t)} dt ≥ e^{−αs−Λμ(x)(x,s)} v(ϕ(x, s)) − v(x) + ∫₀ˢ e^{−αt−Λμ(x)(x,t)} [ℓ(ϕ(x, t)) + λ(ϕ(x, t), μ(x, t)) Qk(ϕ(x, t), μ(x, t))] dt. (12)

Consider the case in which t∗(x) < ∞. Letting s → t∗(x), we obtain from Remark 3.8 and equation (12) that

 v(x)≥ −dLα(x,Θ(x))+Lαℓ(x,Θ(x))+e−αt∗(x)−Λμ(x)(x,t∗(x))v(ϕ(x,t∗(x))) +∫t∗(x)0e−αt−Λμ(x)(x,t)λ(ϕ(x,t),μ(x,t))Qk(ϕ(x,t