The Vanishing Approach for the Average Continuous Control of Piecewise Deterministic Markov Processes
Abstract
The main goal of this paper is to derive sufficient conditions for the existence of an optimal control strategy for the long-run average continuous control problem of piecewise deterministic Markov processes (PDMPs) taking values in a general Borel space and with a compact action space depending on the state variable. In order to do that we apply the so-called vanishing discount approach (see [16], page 83) to obtain a solution to an average cost optimality inequality associated to the long-run average cost problem. Our main assumptions are written in terms of some integro-differential inequalities related to the so-called expected growth condition, and geometric convergence of the post-jump location kernel associated to the PDMP.
Keywords: piecewise-deterministic Markov processes, continuous-time, long-run average cost, optimal control, integro-differential optimality inequality, vanishing approach
AMS 2000 subject classification: 60J25, 90C40, 93E20 
1 Introduction
A general family of non-diffusion stochastic models suitable for formulating optimization problems in several areas of operations research, namely piecewise-deterministic Markov processes (PDMPs), was introduced in [6] and [8]. These processes are determined by three local characteristics: the flow φ, the jump rate λ, and the transition measure Q. Starting from x, the motion of the process follows the flow φ until the first jump time, which occurs either spontaneously, in a Poisson-like fashion with rate λ, or when the flow hits the boundary of the state space. In either case the location of the process at the jump time is selected by the transition measure Q, and the motion restarts from this new point as before. A suitable choice of the state space and of the local characteristics φ, λ, and Q provides stochastic models covering a great number of problems of operations research [8].
There exist two types of control for PDMPs: continuous control and impulse control. This terminology was introduced by M.H.A. Davis in [8, page 134], where continuous control describes situations in which the control variable acts at all times on the process through the characteristics, by influencing the deterministic motion and the probability of the jumps. On the other hand, the terminology impulse control refers to a control that intervenes on the process by moving it to a new point of the state space at some times specified by the controller.
The long-run average continuous control problem of PDMPs taking values in a general Borel space was studied in [4]. At each point of the state space a control variable is chosen from a compact action set and is applied to the jump rate λ and the transition measure Q. The goal was to minimize the long-run average cost, which is composed of a running cost and a boundary cost (added each time the PDMP touches the boundary). Both costs are assumed to be positive but not necessarily bounded. As far as the authors are aware, this was the first time that this kind of problem was considered in the literature. Indeed, results are available for the long-run average cost problem, but for impulse control; see Costa [3], Gatarek [13], the book by M.H.A. Davis [8], and the references therein. On the other hand, the continuous control problem has been studied only for discounted costs, by A. Almudevar [1], M.H.A. Davis [7, 8], M.A.H. Dempster and J.J. Ye [9, 10], Forwick, Schäl, and Schmitz [12], M. Schäl [18], and A.A. Yushkevich [20, 21].
This paper deals with the vanishing approach for the long-run average continuous control problem of a PDMP and can be seen as a continuation of the results derived in [4]. By exploiting the special features of PDMPs we trace a parallel with the general theory for discrete-time Markov decision processes (see, for instance, [15, 16]) rather than the continuous-time case (see, for instance, [14, 22]). The two main reasons for doing that are to use the powerful tools developed in the discrete-time framework (see for example the references [2, 11, 16, 17]) and to avoid working with the infinitesimal generator associated to a PDMP, whose domain of definition is in most cases difficult to characterize. We develop further the approach presented by the authors in [4], which consists of using a connection between the continuous-time control problem of a PDMP and a discrete-time optimality equation (see the introduction of section 4 for a detailed explanation of this method). In particular, we derive sufficient conditions under which a boundedness condition (with the lower bound being a function rather than a constant, as supposed in [4]) on the value functions for the discounted problems is satisfied. The main assumptions for this are based on some integro-differential inequalities related to the so-called expected growth condition (see Assumption 3.1), and on geometric convergence of the post-jump location kernel associated to the PDMP (see Assumption 3.6). As a consequence, we obtain the existence of an optimal ordinary control strategy for the long-run average control problem of a PDMP, with the important property of being in feedback form.
The paper is organized in the following way. In section 2 we introduce some notation, basic assumptions, and the problem formulation. In section 3 we introduce several assumptions related to the continuity of the parameters, the expected growth condition, and the geometric convergence of the post-jump location of the PDMP. In the sequel we provide several key auxiliary results for obtaining a bound for the discounted problems, and some extensions of the results presented in [4] to the case in which the functions under consideration are not necessarily positive but just bounded by a test function. The main results are presented in section 4, which provides sufficient conditions for the existence of an optimal control strategy for the long-run average continuous control problem of a PDMP and obtains a solution to an average cost optimality inequality associated to the long-run average cost problem.
2 Notation, basic assumptions, and problem formulation
2.1 Presentation of the control problem
In this section we present some standard notation and some basic definitions related to the motion of a PDMP, and the control problems we will consider throughout the paper. For further details and properties the reader is referred to [8]. The following notation will be used in this paper: N denotes the set of natural numbers, R the set of real numbers, R+ the set of positive real numbers, and R^n the n-dimensional Euclidean space. The Lebesgue measure on R will also be used. For X a metric space, B(X) represents the σ-algebra generated by the open sets of X. M(X) (respectively, P(X)) denotes the set of all finite (respectively, probability) measures on (X, B(X)). Let X and Y be metric spaces. The set of all Borel measurable (respectively, bounded) functions from X into Y is denoted by M(X; Y) (respectively, B(X; Y)). Moreover, for notational simplicity, M(X) (respectively, B(X)) denotes M(X; R) (respectively, B(X; R)). For g ∈ M(X) with g(x) ≥ 1 for all x ∈ X, B_g(X) is the set of functions v ∈ M(X) such that sup_{x ∈ X} |v(x)|/g(x) < ∞. C(X; Y) denotes the set of continuous functions from X to Y. For a ∈ R, a^+ (respectively, a^-) denotes the positive (respectively, negative) part of a.
Let E be an open subset of R^n, ∂E its boundary, and Ē its closure. A controlled PDMP is determined by its local characteristics (φ, λ, Q), as presented in the sequel. The flow φ(x, t) is a function continuous in (x, t) satisfying the semigroup property φ(x, t + s) = φ(φ(x, t), s). For each x ∈ E, the time the flow takes to reach the boundary starting from x is defined as t*(x) := inf{t > 0 : φ(x, t) ∈ ∂E}. For x such that t*(x) = ∞ (that is, the flow starting from x never touches the boundary), we set φ(x, t*(x)) := Δ, where Δ is a fixed point of the state space. We define the following space of functions absolutely continuous along the flow with limit towards the boundary:
For v in this space and x ∈ E for which t*(x) < ∞, we define v(φ(x, t*(x))) as the limit of v(φ(x, t)) as t increases to t*(x) (note that the limit exists by assumption). As shown in Lemma 2 in [5], for v absolutely continuous along the flow there exists a function Xv ∈ M(E) such that, for all x ∈ E and t ∈ [0, t*(x)), v(φ(x, t)) − v(x) = ∫_0^t Xv(φ(x, s)) ds.
The local characteristics λ and Q depend on a control action a ∈ U, where U is a compact metric space (there is no loss of generality in assuming this property for U; see Remark 2.8 in [4]), in the following way: λ ∈ M(Ē × U; R+) and Q is a stochastic kernel on E given Ē × U. For each x ∈ Ē we define the subset U(x) of U as the set of feasible control actions that can be taken when the state process is in x; that is, the control action applied to λ and Q must belong to U(x). The following assumptions, based on the standard theory of Markov decision processes (see for example [16]), will be made throughout the paper:
Assumption 2.1
For all x ∈ Ē, U(x) is a compact subspace of U.
Assumption 2.2
The set K := {(x, a) : x ∈ Ē, a ∈ U(x)} is a Borel subset of Ē × U.
We present next the definition of an admissible control strategy and the associated motion of the controlled process. A control policy is a pair of functions satisfying , and for all . The class of admissible control strategies will be denoted by . Consider the state space . For a control policy let us introduce the following parameters for : the flow , the jump rate , and the transition measure
for and in . From [8, section 25], it can be shown that for any control strategy there exists a filtered probability space such that the piecewise deterministic Markov process with local characteristics may be constructed as follows. For notational simplicity the probability will be denoted by for . Take a random variable such that
where for and , If is equal to infinity, then for , . Otherwise select independently an valued random variable (labelled ) having distribution
The trajectory of starting from , for , is given by
Starting from the new point, we now select the next inter-jump time and post-jump location in a similar way. Let us define the components of the PDMP by . From the previous construction, it is easy to see that corresponds to the trajectory of the system, is the value of at the last jump time before , is the time elapsed between the last jump and time , and is the number of jumps of the process at time . As in Davis [8], we consider the following assumption to avoid any accumulation point of the jump times:
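The iterative construction above (follow the flow until the minimum of a spontaneous, exponential-type jump time and the boundary-hitting time t*, then relocate according to the transition measure) can be sketched numerically. The example below is purely illustrative and not taken from the paper: it assumes a unit-drift flow on E = (0, 1), a constant jump rate, and a uniform relocation measure.

```python
import random

def simulate_pdmp(x0, lam, n_jumps, rng):
    """Toy PDMP on E = (0, 1): flow phi(x, t) = x + t (unit drift),
    constant jump rate lam, boundary at 1.  After each jump (spontaneous
    or boundary-forced) the process restarts at a point drawn uniformly
    from (0, 1/2) -- a stand-in for the transition measure Q."""
    x = x0
    jumps = []
    for _ in range(n_jumps):
        t_star = 1.0 - x            # time for the flow to reach the boundary
        tau = rng.expovariate(lam)  # candidate spontaneous (Poisson-like) jump time
        t1 = min(tau, t_star)       # actual inter-jump time: whichever comes first
        x = rng.uniform(0.0, 0.5)   # post-jump location selected by Q
        jumps.append((t1, x))
    return jumps

rng = random.Random(0)
traj = simulate_pdmp(0.2, lam=1.0, n_jumps=5, rng=rng)
```

Between jumps the trajectory is deterministic, so storing only the inter-jump times and post-jump locations, as above, fully determines the sample path.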
Assumption 2.3
For any , , and , we have .
The costs of our control problem will contain two terms, a running cost f and a boundary cost r, satisfying the following properties:
Assumption 2.4
, and .
Define for , , and ,
where p*(t) counts the number of times the process hits the boundary up to time t and, for notational simplicity, set . The long-run average cost we want to minimize over the class of admissible strategies is given by: and we set . For the discounted case, with discount rate α > 0, the cost we want to minimize is given by: and we set . We need the following assumption to avoid infinite costs in the discounted case.
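For orientation, one standard way of writing these two criteria, consistent with the formulation in [4] and using the running cost f, the boundary cost r, the boundary-hit counting process p*, and the discount rate α introduced above, is the following schematic form (the notation u(s) for the control applied at time s is shorthand):

```latex
\mathcal{A}(U,x) \;=\; \limsup_{t \to \infty} \frac{1}{t}\,
  E^{U}_{x}\!\Big[ \int_{0}^{t} f\big(X_s,\,u(s)\big)\,ds
  \;+\; \int_{0}^{t} r\big(X_{s-},\,u(s)\big)\,dp^{*}(s) \Big],
\qquad
\mathcal{D}^{\alpha}(U,x) \;=\;
  E^{U}_{x}\!\Big[ \int_{0}^{\infty} e^{-\alpha s} f\big(X_s,\,u(s)\big)\,ds
  \;+\; \int_{0}^{\infty} e^{-\alpha s} r\big(X_{s-},\,u(s)\big)\,dp^{*}(s) \Big].
```

The boundary integral against dp* charges the cost r at each boundary hit, which is why Assumption 2.3 (no accumulation of jump times) is needed for these expressions to be well behaved.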
Assumption 2.5
For all and all , .
2.2 Discretetime relaxed and ordinary controls
We present in this subsection the sets of discrete-time relaxed and ordinary controls.
Consider equipped with the topology of uniform convergence and equipped with the
weak topology .
For , define as the set of measures satisfying .
and for are subsets of and are equipped with the relative topology.
Let (respectively, for ) be the set of all measurable functions defined on with values in such that a.e. (respectively, a.e.). It can be shown (see subsection 3.1 in [4]) that is a compact set of the metric space : a sequence in converges to if and only if for all
The sets of relaxed controls can be defined as follows: , for and . The set of ordinary controls, denoted by (respectively, for ), is defined as above except that it is composed of deterministic functions instead of probability measures. More specifically, we have , . Consequently, the set of ordinary controls is a subset of the set of relaxed controls (respectively, for ) by identifying any control action with the Dirac measure concentrated on it. Thus we can write that (respectively, for ), and from now on we will consider that (respectively, for ) is endowed with the topology generated by . The need to introduce the class of relaxed controls is justified by the fact that, in general, there does not exist a topology for which the sets of ordinary controls are compact.
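The convergence alluded to above is the usual Young-topology criterion; writing μ_t for the time-disintegration of a relaxed control μ, it takes the schematic form below (the exact class of admissible test functions g is the one specified in subsection 3.1 of [4]):

```latex
\mu^{n} \;\longrightarrow\; \mu
\quad\Longleftrightarrow\quad
\int_{0}^{\infty}\!\!\int_{\mathbf U} g(t,a)\,\mu^{n}_{t}(da)\,dt
\;\longrightarrow\;
\int_{0}^{\infty}\!\!\int_{\mathbf U} g(t,a)\,\mu_{t}(da)\,dt
\qquad \text{for every test function } g .
```

Under this topology an ordinary control, identified with the family of Dirac measures t ↦ δ_{u(t)}, converges whenever the time-averaged action of the test functions converges, which is exactly what makes the relaxed class compact while the ordinary class is not.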
As in [16], page 14, we need the set of feasible state/relaxed-control pairs to be a measurable subset of , that is, we need the following assumption.
Assumption 2.6
2.3 Discretetime operators and measurability properties
In this subsection we present some important operators associated to the optimality equation of the discrete-time problem. We consider the following notation: and , and for , , and .
The following operators will be associated to the optimality equations of the discrete-time problems that will be presented in the next sections. For , , , according to Lemma 2 in [11, Appendix 5], define
(1)  
For , we define . For , , , , , introduce
(2)  
(3) 
For (respectively, ), (respectively, ) provided the difference has a meaning. It will be useful in the sequel to define the function as follows: . In particular for we write for simplicity , , , . Measurability properties of the operators , , and are shown in [4, Proposition 3.4].
We present now the definitions of the one-stage optimization operators.
Definition 2.7
Let , , and . Assume that for any and , is well defined. The (ordinary) one-stage optimization operator is defined by
Assume that for any and , is well defined. The relaxed one-stage optimization operator is defined by
In particular for we write for simplicity , and .
The sets of measurable selectors associated to , , are defined by , , .
For , , and , the one-stage optimization problem associated to the operator , respectively , consists of finding a measurable selector , respectively , such that for all , and respectively .
Finally we conclude this section by recalling (see Propositions 3.8 and 3.10 in [4]) that there exist two natural mappings from to and from to .
Definition 2.8
For , define the measurable mapping of the space into by
.
Definition 2.9
For , define the measurable mapping of the space into by .
3 Assumptions and auxiliary results
The purpose of this section is to introduce several assumptions (see subsection 3.1) and to derive preliminary results that will ensure the existence of an optimal control for the long-run average cost. More specifically, the two main results of subsection 3.2 consist, roughly speaking, of providing a bound for in terms of (see Corollary 3.13) and of proving that the mapping defined by for fixed in belongs to (see Theorem 3.17). The results of subsection 3.3 are extensions of those presented in [4] to the case in which the functions under consideration are not necessarily positive (as was supposed in [4]) but instead belong to . It must be pointed out that these generalizations are not straightforward and are crucial for obtaining the existence of an optimal ordinary feedback control strategy for the long-run average-cost problem of a PDMP. In particular, Theorem 3.22 states that for any function , the one-stage optimization operators and are equal, and that there exists an ordinary feedback measurable selector for the one-stage optimization problems associated to these operators.
3.1 Assumptions and definitions
The next assumption is somehow related to the so-called expected growth condition (see, for instance, Assumption 3.1 in [15] for the discrete-time case, or Assumption A in [14] for the continuous-time case), used, among other things, to guarantee uniform boundedness of with respect to .
Assumption 3.1
Suppose that there exist , , , and , , , satisfying for all
(4)  
(5) 
and for all with
(6)  
(7) 
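For orientation, expected growth conditions of this type typically take a Lyapunov shape along the motion of the process. The display below is only a schematic illustration of that shape, not a reproduction of the exact inequalities (4)–(7): the weight g, the constants b ≥ 0 and c > 0, and the use of Xg (the derivative of g along the flow, see subsection 2.1) are illustrative.

```latex
\mathcal{X} g(x) \;+\; \lambda(x,a)\Big[ \int_{E} g(y)\, Q(dy \mid x, a) - g(x) \Big]
\;\le\; -\,c\, g(x) + b
\qquad \text{for all feasible pairs } (x,a).
```

The first term controls the growth of g along the deterministic flow, and the bracketed term controls the mean change of g at a jump; together they bound the expected growth of g(X_t), which is what yields uniform bounds on the discounted value functions.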
Assumptions 3.2, 3.3 and 3.4, presented in the sequel, are needed to guarantee some convergence and semicontinuity properties of the one-stage optimization operators (see subsection 3.3), and the existence of a measurable selector.
Assumption 3.2
For each , the restriction of to is continuous, for , and if then .
Assumption 3.3
There exists a sequence of measurable functions in such that for all , as and the restriction of to is continuous. There exists a sequence of measurable functions in such that for all , as and the restriction of to is continuous.
Assumption 3.4
For all and , the restriction of to is continuous.
We make the following definition:
Definition 3.5
Notice that the existence of follows from Assumptions 3.1–3.4 and Theorem 3.3.5 in [16], and the fact that , and follow from Proposition 3.10 in [4].
In the next assumption, notice that for any , can be seen as the stochastic kernel associated to the post-jump location of a PDMP. This assumption is related to some geometric ergodic properties of the operator (see for example the comments on page 122 in [17] or Lemma 3.3 in [15] for more details on this kind of assumption).
Assumption 3.6
Suppose that there exist , and for any there exists a probability measure , such that and
(8) 
for all and .
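Conditions of the type of Assumption 3.6 are usually expressed as g-geometric ergodicity of the post-jump kernel. A schematic version of inequality (8) is given below; the kernel name G, the constants a ∈ R+ and b ∈ (0, 1), and the weight g are illustrative and not taken from the statement above.

```latex
\Big| \int_{E} v(y)\, G^{k}(x, dy) \;-\; \int_{E} v(y)\, \nu(dy) \Big|
\;\le\; a\, b^{\,k}\, g(x),
\qquad k \in \mathbb{N},
```

for all x and all measurable v with |v| ≤ g. The geometric factor b^k is what allows the vanishing discount argument: it yields bounds on the differences of discounted value functions that are uniform in the discount rate.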
The final assumption is:
Assumption 3.7
There exist , , such that

and for all and ,

, for all ,

, for all with ,

, for all with ,

.
3.2 Properties of the discount value function
The next two propositions establish a connection between a general integro-differential inequality (respectively, equality) related to the local characteristics of the PDMP and an inequality (respectively, equality) related to the operators , and . They will be crucial for the boundedness results on to be developed in the sequel.
Proposition 3.9
Suppose that there exist , , , , , , and satisfying
(9) 
for all , and
(10) 
for all with .
Then
(11) 