Controlled stochastic networks in heavy traffic: Convergence of value functions
Abstract
Scheduling control problems for a family of unitary networks under heavy traffic with general interarrival and service times, probabilistic routing and an infinite horizon discounted linear holding cost are studied. Diffusion control problems, which have been proposed as approximate models for the study of these critically loaded controlled stochastic networks, can be regarded as formal scaling limits of such stochastic systems. However, to date, a rigorous limit theory that justifies the use of such approximations for a general family of controlled networks has been lacking. It is shown that, under broad conditions, the value function of the suitably scaled network control problem converges to that of the associated diffusion control problem. This scaling limit result, in addition to giving a precise mathematical basis for the above approximation approach, suggests a general strategy for constructing near-optimal controls for the physical stochastic networks by solving the associated diffusion control problem.
DOI: 10.1214/11-AAP784. Volume 22, Issue 2 (2012), pages 734–791.
Amarjit Budhiraja (budhiraj@email.unc.edu) and Arka P. Ghosh (apghosh@iastate.edu)
A. Budhiraja was supported in part by the NSF (DMS1004418), Army Research Office (W911NF010080, W911NF1010158) and the US-Israel Binational Science Foundation (2008466).
A. P. Ghosh was supported in part by NSF Grant DMS0608634.
AMS subject classifications: Primary 60K25, 68M20, 90B22, 90B36; secondary 60J70.
Keywords: Heavy traffic, stochastic control, scaling limits, diffusion approximations, unitary networks, controlled stochastic processing networks, asymptotic optimality, singular control with state constraints, Brownian control problem (BCP).
1 Introduction
As an approximation to control problems for critically loaded stochastic networks, Harrison (in harri2 (), see also harri1 (), harricanon ()) has formulated a stochastic control problem in which the state process is driven by a multidimensional Brownian motion along with an additive control that satisfies certain feasibility and nonnegativity constraints. This control problem, which is usually referred to as the Brownian Control Problem (BCP), has been one of the key developments in the heavy traffic theory of controlled stochastic processing networks (SPN). BCPs can be regarded as formal scaling limits for a broad range of scheduling and sequencing control problems for multiclass queuing networks. Finding optimal (or even near-optimal) control policies for such networks, which may have quite general non-Markovian primitives, multiple server capabilities and rather complex routing geometry, is in general prohibitive. In that regard, BCPs, which provide significantly more tractable approximate models, are very useful. In this diffusion approximation approach to policy synthesis, one first finds an optimal (or near-optimal) control for the BCP which is then suitably interpreted to construct a scheduling policy for the underlying physical network. In recent years there have been many works atakumar (), bellwill (), bellwill2 (), BudGho (), meyn (), wardkumar (), chenyao (), dailin () that consider specific network models for which the associated BCP is explicitly solvable (i.e., an optimal control process can be written as a known function of the driving Brownian motions) and, by suitably adapting the solution to the underlying network, construct control policies that are asymptotically (in the heavy traffic limit) optimal. The paper KuMa () also carries out a similar program for the criss-cross network where the state space is three dimensional, although an explicit solution for the BCP here is not available.
Although now there are several papers which establish a rigorous connection between a network control problem and its associated BCP by exploiting the explicit form of the solution of the latter, a systematic theory which justifies the use of BCPs as approximate models has been missing. In a recent work BudGho2 () it was shown that for a large family of Unitary Networks (following terminology of WillBram2work (), these are networks with a structure as described in Section 2), with general interarrival and service times, probabilistic routing and an infinite horizon discounted linear holding cost, the cost associated with any admissible control policy for the network is asymptotically, in the heavy traffic limit, bounded below by the value function of the BCP. This inequality, which provides a useful bound on the best achievable asymptotic performance for an admissible control policy, was a key step in developing a rigorous general theory relating BCPs with SPN in heavy traffic.
The current paper is devoted to the proof of the reverse inequality. The network model is required to satisfy assumptions made in BudGho2 () (these are summarized above Theorem 2.1). In addition, we impose a nondegeneracy condition (Assumption 2), a condition on the underlying renewal processes regarding probabilities of deviations from the mean (Assumption 2) and regularity of a certain Skorohod map (Assumption 2) (see next paragraph for a discussion of these conditions). Under these assumptions we prove that the value function of the BCP is bounded below by the heavy traffic limit (limsup) of the value functions of the network control problem (Theorem 2.2). Combining this with the result obtained in BudGho2 () (see Theorem 2.1), we obtain the main result of the paper (Theorem 2.4). This theorem says that, under broad conditions, the value function of the network control problem converges to that of the BCP. This result provides, under general conditions, a rigorous basis for regarding BCPs as approximate models for critically loaded stochastic networks.
Conditions imposed in this paper allow for a wide range of SPN models. Some such models, whose description is taken from WillBram2work (), are discussed in detail in Examples 2(a)–(c). We note that our approach does not require the BCP to be explicitly solvable and the result covers many settings where explicit solutions are unavailable. Most of the conditions that we impose are quite standard and we only comment here on three of them: Assumptions 2, 2 and 2. Assumption 2 says that each buffer is processed by at least one basic activity (see Remark 2). This condition, which was introduced in WillBram2work (), is fundamental for our analysis. In fact, WillBram2work () has shown that without this assumption even the existence of a nonnegative workload matrix may fail. Assumption 2 is a natural condition on the geometry of the underlying network. Roughly speaking, it says that a nonzero control action leads to a nonzero state displacement. Assumption 2 is the third key requirement in this work. It says that the Skorohod problem associated with a certain reflection matrix [see equation (43) for the definition of ] is well posed and the associated Skorohod map is Lipschitz continuous. As Example 2 discusses, this condition holds for a broad family of networks (including all multiclass open queuing networks, as well as a large family of parallel server networks and jobshop networks).
The papers atakumar (), bellwill (), bellwill2 (), BudGho (), wardkumar (), dailin () noted earlier, which treat the setting of explicitly solvable BCP, do much more than establish convergence of value functions. In particular, these works give an explicit implementable control policy for the underlying network that is asymptotically optimal in the heavy traffic limit. In the generality treated in the current work, giving explicit recipes (e.g., threshold type policies) is infeasible; however, the policy sequence constructed in Section 4.1 suggests a general approach for building near asymptotically optimal policies for the network given a near-optimal control for the BCP. Obtaining near-optimal controls for the BCP in general requires numerical approaches (see, e.g., DuKu (), Kushbook (), meynbook ()), discussion of which is beyond the scope of the current work.
We now briefly describe some of the ideas in the proof of the main result, Theorem 2.2. We begin by choosing, for an arbitrary , a suitable optimal control for the BCP and then, using , construct a sequence of control policies for the network model such that the (suitably scaled) cost associated with converges to that associated with , as . This yields the desired reverse inequality. One of the key difficulties is in the translation of a given control for the BCP to that for the physical network. Indeed, a (near) optimal control for the BCP can be a very general adapted process with RCLL paths. Without additional information on such a stochastic process, it is not at all clear how one adapts and applies it to a given network model. A control policy for the network needs to specify how each server distributes its effort among various job classes at any given time instant. By a series of approximations we show that one can find a rather simple optimal control for the BCP that is easy to interpret and implement on a network control problem. As a first step, using PDE characterization results for general singular control problems with state constraints from AtBu () (these, in particular, make use of the nondegeneracy assumption, Assumption 2), one can argue that a near-optimal control can be taken to be adapted to the driving Brownian motion and be further assumed to have moments that are subexponential in the time variable (see Lemma 3.7). Using results from BuRo (), one can perturb this control so that it has continuous sample paths without significantly affecting the cost. Next, using ideas developed by Kushner and Martins KuMa () in the context of a two-dimensional BCP, one can further approximate such a control by a process with a fixed (nonrandom) finite number of jumps that take values in a finite set.
Two main requirements (in addition to the usual adaptedness condition) for such a process to be an admissible control of a BCP (see Definition 2) are the nonnegativity constraints (39) and state constraints (38). It is relatively easy to construct a pure jump process that satisfies the first requirement of admissibility, namely, the nonnegativity constraints; however, the nondegenerate Brownian motion in the dynamics rules out the satisfaction of the second requirement, that is, state constraints, without additional modifications. This is where the regularity assumption on a certain Skorohod map (Assumption 2) is used. The pure jump control is modified in a manner such that in between successive jumps one uses the Skorohod map to employ the minimal control needed in order to respect state constraints. Regularity of the Skorohod problem ensures that this modification does not change the associated cost much. The Skorohod map also plays a key role in the weak convergence arguments used to prove convergence of costs. The above construction is the essential content of Theorem 3.3. The optimal control that we use for the construction of the policy sequence requires two additional modifications [see part (iii) of Theorem 3.3 and below (59)] which facilitate adapting such a control for the underlying physical network and in some weak convergence proofs, but we leave that discussion for later in Section 3 (see Remark 3 and above Theorem 3.5).
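To give some intuition for the "minimal control needed to respect state constraints," the following is a minimal sketch of the classical one-dimensional Skorohod map on the nonnegative half-line, applied to a discretely sampled path. It is only an illustration: the Skorohod map used in this paper is multidimensional and is tied to a reflection matrix, and the function name `skorohod_map_1d` and the sample path are assumptions introduced here.

```python
def skorohod_map_1d(x):
    """One-dimensional Skorohod map on [0, inf) for a sampled path.

    Illustrative only; the paper's map is multidimensional.
    Given samples x[0], x[1], ... with x[0] >= 0, return (z, y) where
    y[k] = max(0, max_{j<=k} -x[j]) is the minimal nondecreasing pushing
    term and z[k] = x[k] + y[k] >= 0 is the constrained path.
    """
    z, y = [], []
    running = 0.0
    for value in x:
        running = max(running, -value)  # push only when the path would go negative
        y.append(running)
        z.append(value + running)
    return z, y


# Hypothetical unconstrained path that dips below zero.
path = [1.0, 0.5, -0.5, -1.0, 0.0, 2.0, 1.0]
constrained, pushing = skorohod_map_1d(path)
```

In this sketch the pushing term increases only at indices where the constrained path sits at zero, which is the sense in which the control is minimal.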
Using a near-optimal control of the form given in Section 3 (cf. Theorem 3.5), we then proceed to construct a sequence of policies for the underlying network. The key relation that enables translation of into is (16), using which one can loosely interpret as the asymptotic deviation, with suitable scaling, of from the nominal allocation (see Definition 2 for the definition of nominal allocation vector). Recall that is constructed by modifying, through a Skorohod constraining mechanism, a pure jump process (say, ). In particular, has sample paths that are, in general, discontinuous. On the other hand, note that an admissible policy is required to be a Lipschitz function (see Remark 2). This suggests the following construction for . Over time periods (say, ) of constancy of one should use the nominal allocation (i.e., ), while jump instants should be stretched into periods of length of order (note that in the scaled network, time is accelerated by a factor of and so such periods translate to intervals of length in the scaled evolution and thus are negligible) over which a nontrivial control action is employed as dictated by the jump vector (see Figure 4 for a more complete description). This is analogous to the idea of a discrete review policy proposed by Harrison bigstep () (see also atakumar () and references therein). There are some obvious difficulties with the above prescription; for example, a nominal allocation corresponds to the average behavior of the system and for a given realization is feasible only when the buffers are nonempty. Thus, one needs to modify the above construction to incorporate idleness that is caused by empty buffers. The effect of such a modification is, of course, very similar to that of a Skorohod constraining mechanism and it is tempting to hope that the deviation process corresponding to this modified policy converges to (in an appropriate sense), as .
However, without further modifications, it is not obvious that the reflection terms that are produced from the idling periods under this policy are asymptotically consistent with those obtained from the Skorohod constraining mechanism applied to (the state process corresponding to) . The additional modification [see (100)] that we make roughly says that jobs are processed from a given buffer over a small interval , only if at the beginning of this interval there are a “sufficient” number of jobs in the buffer. This idea of safety stocks is not new and has been used in previous works (see, e.g., bellwill (), bellwill2 (), atakumar (), BudGho (), meynbook ()). The modification, of course, introduces a somewhat nonintuitive idleness even when there are jobs that require processing. However, the analysis of Section 4 shows that this idleness does not significantly affect the asymptotic cost. The above very rough sketch of construction of is made precise in Section 4.1.
The rest of the paper is devoted to showing that the cost associated with converges to that associated with . It is unreasonable to expect convergence of controls (e.g., with the usual Skorohod topology); in particular, note that has Lipschitz paths for every while is a (modification of a) pure jump process. However, one finds that the convergence of costs holds. This convergence proof, and the related weak convergence analysis, is carried out in Sections 4.2 and 4.3.
The paper is organized as follows. Section 2 describes the network structure, all the associated stochastic processes and the heavy traffic assumptions as well as the other assumptions of the paper. The section also presents the SPN control problem considered here, along with the main result of the paper (Theorem 2.4). Section 3 constructs (see Theorem 3.5) a near-optimal control policy for the BCP which can be suitably adapted to the network control problem. In Section 4 the near-optimal control policy from Section 3 is used to obtain a sequence of admissible control policies for the scaled SPN. The main result of the section is Theorem 4.5, which establishes weak convergence of various scaled processes. Convergence of costs (i.e., Theorem 2.3) is an immediate consequence of this weak convergence result. Theorem 2.4 then follows on combining Theorem 2.3 with results of BudGho2 () (stated as Theorem 2.1 in the current work). Finally, the Appendix collects proofs of some auxiliary results.
The following notation will be used. The spaces of reals (nonnegative reals) and of positive (nonnegative) integers will be denoted by (), (), respectively. For and will denote the space of continuous functions from (resp. ) to with the topology of uniform convergence on compacts (resp. uniform convergence). Also, will denote the space of right continuous functions with left limits, from (resp. ) to with the usual Skorohod topology. For and , we write and , where for , All vector inequalities are to be interpreted componentwise. We will call a function nonnegative if for all . A function is called nondecreasing if it is nondecreasing in each component. All (stochastic) processes in this work will have sample paths that are right continuous and have left limits, and thus can be regarded as valued random variables with a suitable . For a Polish space , will denote the corresponding Borel sigma-field. Weak convergence of valued random variables to will be denoted as . A sequence of processes is tight if and only if the measures induced by ’s on form a tight sequence. A sequence of processes with paths in () is called tight if it is tight in and any weak limit point of the sequence has paths in almost surely (a.s.). For processes , defined on a common probability space, we say that converges to , uniformly on compact time intervals (u.o.c.), in probability (a.s.) if for all , converges to zero in probability (resp. a.s.). To ease the notational burden, standard notation (following bramwill1 (), WillBram2work ()) is used for different processes (e.g., for queue length, for idle time, for workload process, etc.). We also use standard notation, for example, , to denote fluid scaled, respectively, diffusion scaled, versions of various processes of interest [see (2) and (2)]. All vectors will be column vectors. An dimensional vector with all entries will be denoted by . For a vector , will denote the diagonal matrix such that the vector of its diagonal entries is .
will denote the transpose of a matrix . Also, will denote generic constants whose values may change from one proof to the next.
2 Multiclass queueing networks and the control problem
Let be a probability space. All the random variables associated with the network model described below are assumed to be defined on this probability space. The expectation operation under will be denoted by .
Network structure
We begin by introducing the family of stochastic processing network models that will be considered in this work. We closely follow the terminology and notation used in harri2 (), harri1 (), harricanon (), bellwill2 (), WillBram2work (), bramwill1 (). The network has infinite capacity buffers (to store many different classes of jobs) and nonidentical servers for processing jobs. Arrivals of jobs, given in terms of suitable renewal processes, can be from outside the system and/or from the internal rerouting of jobs that have already been processed by some server. Several different servers may process jobs from a particular buffer. Service from a given buffer by a given server is called an activity. Once a job starts being processed by an activity, it must complete its service with that activity, even if its service is interrupted for some time (e.g., for preemption by a job from another buffer). When service of a partially completed job is resumed, it is resumed from the point of preemption, that is, the job needs only the remaining service time from the server to get completed (preemptive-resume policy). Also, an activity must complete service of any job that it started before starting another job from the same buffer. An activity always selects the oldest job in the buffer that has not yet been served, when starting a new service [i.e., First In First Out (FIFO) within class]. There are activities [at most one activity for a server-buffer pair , so that ]. Here the integers are strictly positive. Figure 1 gives a schematic for such a model.
Let , and . The correspondences between the activities and buffers, and between the activities and servers, are described by two matrices and , respectively. is an matrix with if the th activity processes jobs from buffer , and otherwise. The matrix is with if the th server is associated with the th activity, and otherwise. Each activity associates one buffer and one server, and so each column of has exactly one 1 (and similarly, every column of has exactly one 1). We will further assume that each row of (and ) has at least one 1, that is, each buffer is processed by (server is processing, resp.) at least one activity. For , let , if activity corresponds to the th server processing class jobs. Let, for , and . Thus, for the th server, denotes the set of activities that the server can perform, and represents the corresponding buffers from which the jobs can be processed.
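The structural conditions on the two incidence matrices described above (exactly one 1 in each column, at least one 1 in each row) can be checked mechanically. The following is a small sketch under stated assumptions: activities are given as a list of (server, buffer) pairs with 0-based indices, and the names `incidence_matrices` and `is_valid` are hypothetical helpers introduced only for this illustration.

```python
def incidence_matrices(activities, num_buffers, num_servers):
    """Build buffer-activity and server-activity 0/1 matrices.

    activities: list of (server, buffer) pairs, 0-indexed; position j in
    the list is activity j. Names and indexing are illustrative assumptions.
    """
    if len(set(activities)) != len(activities):
        raise ValueError("at most one activity per server-buffer pair")
    num_activities = len(activities)
    buf = [[0] * num_activities for _ in range(num_buffers)]
    srv = [[0] * num_activities for _ in range(num_servers)]
    for j, (k, i) in enumerate(activities):
        buf[i][j] = 1  # activity j processes jobs from buffer i
        srv[k][j] = 1  # activity j is performed by server k
    return buf, srv


def is_valid(matrix):
    """Each column has exactly one 1 and each row has at least one 1."""
    cols_ok = all(sum(col) == 1 for col in zip(*matrix))
    rows_ok = all(sum(row) >= 1 for row in matrix)
    return cols_ok and rows_ok


# Two servers, two buffers, three activities: server 0 can serve both
# buffers, server 1 serves buffer 1 only.
buffer_matrix, server_matrix = incidence_matrices([(0, 0), (0, 1), (1, 1)], 2, 2)
```

Here buffer 1 is served by two different activities, which matches the statement that several servers may process jobs from a particular buffer.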
Stochastic primitives
We are interested in the study of networks that are nearly critically loaded. Mathematically, this is modeled by considering a sequence of networks that “approach heavy traffic,” as , in the sense of Definition 2 below. Each network in the sequence has identical structure, except for the rate parameters that may depend on . Here , where is a countable set: with and , as . One thinks of the physical network of interest as the th network embedded in this sequence, for a fixed large value of . For notational simplicity, throughout the paper, we will write the limit along the sequence simply as “.” Also, will always be taken to be an element of and, thus, hereafter the qualifier will not be stated explicitly.
The th network is described as follows. If the th class () has exogenous job arrivals, the interarrival times of such jobs are given by a sequence of nonnegative random variables that are i.i.d. with mean and standard deviation , respectively. Let, by relabeling if needed, the buffers with exogenous arrivals correspond to , where . We set and , for . Service times for the th type of activity (for ) are given by a sequence of nonnegative random variables that are i.i.d. with mean and standard deviation , respectively. We will assume that the above random variables are in fact strictly positive, that is,
(1) 
We will further impose the following uniform integrability condition:

(2) 
Rerouting of jobs completed by the th activity is specified by a sequence of dimensional vectors , where . For each and , if the th completed job by activity gets rerouted to buffer , and takes the value zero otherwise, where represents jobs leaving the system. It is assumed that for each fixed , , , are (mutually) independent sequences of i.i.d. , where . This, in particular, means that, for , . Furthermore, for fixed ,
(3) 
where is if and otherwise. We also assume that, for each , the random variables

(4) 
Next we introduce the primitive renewal processes, , that describe the state dynamics. The process is the dimensional exogenous arrival process, that is, for each , is a renewal process which denotes the number of jobs that have arrived to buffer from outside the system over the interval . For class to which there are no exogenous arrivals (i.e., ), we set for all . We will denote the process by . For each activity , denotes the number of complete jobs that could be processed by activity in if the associated server worked continuously and exclusively on jobs from the associated buffer in and the buffer had an infinite reservoir of jobs. The vector is denoted by . More precisely, for , let
(5) 
We set . Then , are renewal processes given as follows. For ,
(6) 
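As an illustration of how a renewal counting process is built from a sequence of strictly positive i.i.d. times, the following sketch forms the partial sums (event epochs) and counts how many epochs fall in a given interval. The function names, the exponential distribution and the seed are assumptions made only for this example; the paper allows general interarrival and service time distributions.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def renewal_epochs(inter_event_times):
    """Partial sums of the inter-event times: epochs of the 1st, 2nd, ... event."""
    return list(accumulate(inter_event_times))


def renewal_count(epochs, t):
    """Number of events in [0, t]: the largest n whose n-th epoch is <= t."""
    return bisect_right(epochs, t)


# Hypothetical primitive: exponential inter-event times with mean 1/2.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
times = [rng.expovariate(2.0) for _ in range(1000)]
epochs = renewal_epochs(times)
```

For instance, `renewal_count(epochs, 5.0)` gives the number of events up to time 5 along this sample path; over long horizons the count grows roughly linearly at the reciprocal of the mean inter-event time.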
Finally, we introduce the routing sequences. Let denote the number of jobs that are routed to the th buffer, among the first jobs completed by activity . Thus, for ,
(7) 
We will denote the dimensional sequence corresponding to routing of jobs completed by the th activity by . Also, will denote the matrix .
Control
A scheduling policy or control for the th SPN is specified by a nonnegative, nondecreasing dimensional process . For any , represents the cumulative amount of time spent on the th activity up to time . For a control to be admissible, it must satisfy additional properties which are specified below in Definition 2.
State processes
For a given scheduling policy , the state processes of the network are the associated dimensional queue-length process and the dimensional idle-time process . For each , , represents the queue length at the th buffer at time (including the jobs that are in service at that time), and for , is the total amount of time the th server has idled up to time . Let be the dimensional vector of queue lengths at time . Note that, for , is the total number of services completed by the th activity up to time . The total number of completed jobs (by activity ) up to time that get rerouted to buffer equals . Recalling the definition of matrices and , the state of the system at time can be described by the following equations:
(8)  
(9) 
Heavy traffic
We now describe the main heavy traffic assumption (see harri1 (), harricanon ()). We begin with a condition on the convergence of various parameters in the sequence of networks .
There are , , such that , if and only if , and, as ,
(10)  
The definition of heavy traffic, for the sequence , as introduced in harri1 () (also see WillBram2work (), bramwill1 (), harricanon ()), is as follows.
Definition (Heavy traffic). Define matrices , such that , for , and
(11) 
We say that the sequence approaches heavy traffic as if, in addition to Assumption 2, the following two conditions hold:
(i) There is a unique optimal solution to the following linear program (LP):
(12) 
(ii) The pair satisfies
(13) 
The sequence of networks approaches heavy traffic as .
From Assumption 2, given in (i) of Definition 2 is the unique dimensional nonnegative vector satisfying
(14) 
Following harri1 (), assume without loss of generality (by relabeling activities, if necessary), that the first components of are strictly positive (corresponding activities are referred to as basic) and the rest are zero (nonbasic activities). For later use, we partition the following matrices and vectors in terms of basic and nonbasic components:
(15) 
where is some control policy, is a dimensional vector of zeros, are , , and matrices, respectively. The following assumption (see WillBram2work ()) says that for each buffer there is an associated basic activity.
For every , there is a such that and .
Other processes
Components of the vector defined above can be interpreted as the nominal allocation rates for the activities. Given a control policy , define the deviation process as the difference between and the nominal allocation:
(16) 
It follows from (9) and (14) that the idletime process has the following representation:
Let . Next we define a matrix and dimensional process as follows:
(17) 
where denotes a identity matrix. Note that, with as in (15),
(18) 
Finally, we introduce the workload process which is defined as a certain linear transformation of the queue-length process and is of dimension no greater than that of the latter. More precisely, is an dimensional process (, see WillBram2work ()) defined as
(19) 
where is a dimensional matrix with rank and nonnegative entries, called the workload matrix. We will not give a complete description of since that requires additional notation; and we refer the reader to WillBram2work (), harricanon () for details. The key fact that will be used in our analysis is that there is a matrix with nonnegative entries (see (3.11) and (3.12) in harricanon ()) such that
(20) 
We will impose the following additional assumption on which says that each of its columns has at least one strictly positive entry. The assumption is needed in the proof of Lemma 3.7 [see (3.1)]. There exists a such that for every , .
Rescaled processes
We now introduce two types of scalings. The first is the socalled fluid scaling, corresponding to a law of large numbers, and the second is the standard diffusion scaling, corresponding to a central limit theorem.
Fluid Scaled Process: This is obtained from the original process by accelerating time by a factor of and scaling down space by the same factor. The following fluid scaled processes will play a role in our analysis. For ,
(21)  
Here for , denotes its integer part, that is, the greatest integer not exceeding .
Diffusion Scaled Process: This is obtained from the original process by accelerating time by a factor of and, after appropriate centering, scaling down space by . Some diffusion scaled processes that will be used are as follows. For ,
(22)  
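The two scalings can be illustrated numerically on a single counting process. The sketch below assumes, as is common in heavy traffic analysis, that time is accelerated by a factor of r**2, so that the fluid scaling divides space by r**2 and the diffusion scaling centers by the mean and divides space by r. The scaling index `r`, the parameter `rate` and all function names are assumptions made for this illustration and are not taken from the displayed formulas.

```python
from bisect import bisect_right


def count(epochs, t):
    """Events up to time t for a counting process with sorted event epochs."""
    return bisect_right(epochs, t)


def fluid_scaled(epochs, r, t):
    """Assumed fluid scaling: accelerate time by r**2, shrink space by r**2."""
    return count(epochs, r * r * t) / (r * r)


def diffusion_scaled(epochs, rate, r, t):
    """Assumed diffusion scaling: accelerate time by r**2, center by the
    mean rate * r**2 * t, then shrink space by r."""
    return (count(epochs, r * r * t) - rate * r * r * t) / r


# Deterministic unit-rate stream with events at times 1, 2, 3, ...
epochs = [float(n) for n in range(1, 10001)]
```

For this deterministic stream the fluid-scaled value at time t is close to t, while the centered, diffusion-scaled value stays of order 1/r, reflecting the absence of randomness in the example.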
The processes are not centered, as one finds (see Lemma 3.3 of BudGho2 ()) that, with any reasonable control policy, their fluid scaled versions converge to zero as . Define for ,
Recall and from Assumption 2. Using (8), (9), (14) and (17), one has the following relationships between the various scaled quantities defined above. For all ,
where
(24) 
Also, using (19), (20) and (24), for all ,
(25) 
Admissibility of control policies
The definition of admissible policies (Definition 2), given below, incorporates appropriate nonanticipativity requirements and ensures feasibility by requiring that the associated queue-length and idle-time processes () are nonnegative.
For we define the multiparameter filtration generated by interarrival and service times and routing variables as
(26)  
Then is a multiparameter filtration with the following (partial) ordering:
We refer the reader to Section 2.8 of Kurtzredbook () for basic definitions and properties of multiparameter filtrations, stopping times and martingales. Let
(27) 
For all , we define where denotes the vector of 1’s. It will be convenient to allow for randomness beyond that captured by in formulating the class of admissible policies. Let be a sigma-field independent of . For , let .
For a fixed and , a scheduling policy is called admissible for with initial condition if for some independent of , the following conditions hold:
(i) is nondecreasing, nonnegative and satisfies for .
(ii) defined by (9) is nondecreasing, nonnegative and satisfies for .
(iii) defined in (8) is nonnegative for .
(iv) Define for each ,
Then, for each ,
(29) 
Define the filtration as
Then
(31) 
Denote by the collection of all admissible policies for with initial condition .
(i) and (ii) in Definition 2 imply, in view of (9) and properties of the matrix , that
(32) 
In particular, is a process with Lipschitz continuous paths. Condition (iv) in Definition 2 can be interpreted as a nonanticipativity condition. Proposition 2.8 and Theorem 5.4 of BudGho2 () give general sufficient conditions under which this property holds (see also Proposition 4.1 of the current work).
Cost function
For the network , we consider an expected infinite horizon discounted (linear) holding cost associated with a scheduling policy and initial queue length vector :
(33) 
Here, is the “discount factor” and , an dimensional vector with each component , is the vector of “holding costs” for the buffers. In the second term, is an dimensional vector. The first block of corresponds to the idleness process , and, thus, the second term in the cost, in particular, captures the idleness cost. The last components of correspond to the time spent on nonbasic activities. Thus, this formulation of the cost allows, in addition to the idleness cost, the user to put a penalty for using nonbasic activities.
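As a rough numerical illustration of a cost of this type, the sketch below approximates, along one sampled path and over a finite time grid, a discounted holding-cost integral plus a discounted penalty on the increments of a nondecreasing process. This is a sketch under stated assumptions: the expectation and the infinite horizon are not handled, the left-endpoint discretization is a choice made here, and all names (`discounted_cost`, `queue_path`, `pushing_path`, `h`, `p`, `gamma`) are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def discounted_cost(times, queue_path, pushing_path, h, p, gamma):
    """Left-endpoint approximation of a discounted cost on a time grid.

    times        : increasing grid t_0 < t_1 < ... < t_n
    queue_path   : queue_path[k] is the queue-length vector at t_k
    pushing_path : pushing_path[k] is a nondecreasing vector process at t_k
    h, p         : cost-rate vectors; gamma > 0 is the discount factor
    Single path, finite horizon: an illustrative assumption only.
    """
    holding = 0.0
    penalty = 0.0
    for k in range(len(times) - 1):
        discount = math.exp(-gamma * times[k])
        holding += discount * dot(h, queue_path[k]) * (times[k + 1] - times[k])
        increment = [b - a for a, b in zip(pushing_path[k], pushing_path[k + 1])]
        penalty += discount * dot(p, increment)
    return holding + penalty


# One step of length 1: holding term 1*3*1 = 3, penalty term 1*2*1 = 2.
example = discounted_cost([0.0, 1.0], [[3.0], [3.0]], [[0.0], [1.0]], [1.0], [2.0], 0.5)
```

Averaging such path costs over many independent simulated paths would give a Monte Carlo estimate of the expected cost, with the finite horizon chosen large enough that the discounted tail is negligible.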
The formulation of the cost function considered in our work goes back to the original work of Harrison et al. harri1 (), harri2 ().
The scheduling control problem for is to find an admissible control policy that minimizes the cost . The value function for this control problem is defined as
(34) 
Brownian control problem
The goal of this work is to characterize the limit of value functions as , as the value function of a suitable diffusion control problem. In order to see the form of the diffusion control problem, we would like to send in (24). Using the functional central limit theorem for renewal processes, it is easily seen that, for all reasonable control policies (see again Lemma 3.3 of BudGho2 ()), when converges to some , defined in (24) converges weakly to
(35) 
where
(36) 
Here is the identity map and is a Brownian motion with drift 0 and covariance matrix
(37) 
where is a diagonal matrix with diagonal entries , is a diagonal matrix with diagonal entries and s are matrices with entries [see (3)]. Although the process in (24), for a general policy sequence , need not converge, upon formally taking limit as , one is led to the following diffusion control problem.
Definition (Brownian Control Problem (BCP)). A dimensional adapted process , defined on some filtered probability space which supports an dimensional Brownian motion with drift 0 and covariance matrix given by (37), is called an admissible control for the Brownian control problem with the initial condition iff the following two properties hold a.s.:
(38)  
(39) 
where and are as in (35) and (36) respectively. We refer to as a system. We denote the class of all such admissible controls by . The Brownian control problem is to
(40) 
over all admissible controls . Define the value function
(41) 
Recall our standing assumptions (1), (2), (4), Assumptions 2, 2, 2 and 2. The following is the main result of BudGho2 ().
Theorem 2.1 (Budhiraja and Ghosh BudGho2 (), Theorem 3.1, Corollary 3.2)
Fix and for , such that as . Then
The proof in BudGho2 () is presented for the case where in the definition of [see (34)], is replaced by the smaller family which consists of all that satisfy (iv) of Definition 2 with replaced by . The proof for the slightly more general setting considered in the current paper requires only minor modifications and, thus, we omit the details.
For the main result of this work, we will need additional assumptions.
The matrix is positive definite. We will make the following assumption on the probabilities of deviations from the mean for the underlying renewal processes. Similar conditions have been used in previous works on construction of asymptotically optimal control policies bellwill (), bellwill2 (), BudGho (), atakumar (), dailin ().
There exists and, for each , some such that, for