An ergodic control problem for many-server multiclass queueing systems with cross-trained servers
Abstract.
A queueing network with independent customer classes and server pools in the Halfin–Whitt regime is considered. Customers of each class have priority for service in a designated pool, and may access another pool only if that pool has an idle server and all the servers in their own pool are busy. We formulate an ergodic control problem where the running cost is given by a nonnegative convex function with polynomial growth. We show that the limiting controlled diffusion involves an action space which depends on the state variable. We provide a complete analysis of the limiting ergodic control problem and establish asymptotic convergence of the value functions for the queueing model.
Key words and phrases:
Halfin–Whitt, multiclass Markovian queues, heavy traffic, cross-training, scheduling control, Hamilton–Jacobi–Bellman equation, asymptotic optimality.

2000 Mathematics Subject Classification:
93E20, 60H30, 35J60

1. Introduction
In this article we consider a queueing system consisting of independent customer classes and server pools (or stations). Each server pool contains identical servers. Customers of each class have priority for service in their designated pool, and this priority is of preemptive type. Therefore a newly arrived job of the priority class would preempt the service of a job of any other class currently receiving service at that pool. A customer may access service from another pool if and only if there is an idle server in that pool and all the servers in its own pool are busy. The service stations are therefore cross-trained to serve non-priority customers when their own priority customer class is underloaded. Customers are also allowed to renege from the queue. Customer arrivals are given by independent Poisson processes, and the service rates depend on both the customer class and the station. The network is assumed to operate in the Halfin–Whitt regime, in the sense that
(1.1) 
Therefore under (1.1) each customer class is critically loaded with respect to its own pool, i.e., the mean offered load of each class to its pool is near the pool's capacity, up to a constant of the order prescribed by (1.1). Note that this criticality condition is different from those generally used in the multiclass multipool setting [7]. It can be seen as a generalization of [29, Assumption 4.12(3)] to the many-server setting. To elaborate, we recall the generalized processor sharing (GPS) network from [29]. In a multiclass GPS network with several customer classes and a single server, the server uses a fixed fraction of the total processing capacity, specified by a given probability vector, to serve each class of jobs when all the job classes are present in the system; otherwise (that is, when a positive number of classes are empty) any excess processing capacity, normally reserved for the empty job classes, is redistributed among the nonempty classes in proportion to the weight vector. In this case the conventional heavy traffic condition implies that a suitable limit exists for each class; see [29, Assumption 4.12(3)]. Therefore (1.1) can be seen as a many-server analogue of the conventional heavy-traffic condition of the GPS network.
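The GPS redistribution rule recalled above can be sketched in code. This is an illustrative sketch only; the function name `gps_allocation`, the weight vector and the occupancy flags are hypothetical and not taken from [29].

```python
def gps_allocation(weights, nonempty):
    """Redistribute the capacity reserved for empty classes among the
    nonempty classes, in proportion to the weight vector."""
    total = sum(w for w, busy in zip(weights, nonempty) if busy)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total if busy else 0.0
            for w, busy in zip(weights, nonempty)]

# All classes nonempty: each class receives its nominal fraction.
print(gps_allocation([0.5, 0.3, 0.2], [True, True, True]))
# Class 3 empty: its share is split between classes 1 and 2 pro rata.
print(gps_allocation([0.5, 0.3, 0.2], [True, True, False]))
```

The second call illustrates the redistribution: the weight 0.2 of the empty class is shared by the first two classes in the ratio 0.5 : 0.3.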
Control is given by a matrix-valued process whose entries denote the number of customers of each class getting served at each station. We note that for our model a control need not be work-conserving. The running cost is given by a nonnegative, convex function of polynomial growth, evaluated at the diffusion-scaled queue length vector. We are interested in the cost function
The value function is defined to be the infimum of the above cost, where the infimum is taken over all admissible controls. One of the main results of this paper analyzes the asymptotic behavior of the value function as the system size grows. In Theorem 2.1 we show that it tends to the optimal value of an ergodic control problem governed by a certain class of controlled diffusions. We also study the limiting ergodic control problem and establish existence and uniqueness results for the value function in Theorem 3.1. It is worth mentioning that results like Theorems 2.1 and 3.1 continue to hold if one considers other types of convex running cost functions (see Remark 5.1). In this article we have concentrated on the situation where every class can access every station, but this is not a necessary condition for our method to work. As noted in Remark 5.2, if some stations are unable to serve some classes in the above queueing model, our results continue to hold without any major change in the arguments.
Literature review: Scheduling control problems have a rich and profound history in the queueing literature. The main goal of these problems is to schedule the customers/jobs in a network in an optimal way, but it is not always possible to find a simple policy that is optimal. It is well known that for various queueing networks with finitely many servers an index-type priority policy is optimal [8, 13, 27, 32]. This scheduling rule prioritizes the service of the job classes according to an index formed from the holding cost and the mean service time of each class (a higher index gets priority for service). In the case of many servers, it is shown in [9, 10] that a similar priority policy, which prioritizes the jobs according to an index involving the abandonment rate of each class, is optimal when the queueing system asymptotics are considered under fluid scaling and the cost is given by the long-run average of a linear running cost. But the existence of such simple optimal priority policies is rare in the Halfin–Whitt setting. In general, by the Halfin–Whitt setting we mean that the number of servers and the total offered load grow together so that their difference, suitably scaled, converges to a constant; see (2.1) below for the exact formulation for our model. There are several papers devoted to the study of control problems in the Halfin–Whitt regime. [11, 21] studied a control problem with discounted payoff for a multiclass single-pool queueing network, and asymptotics of the value functions are obtained in [11]. Later these works were generalized to the multipool case in [7]. [12] considered a simplified multiclass multipool control problem with a discounted cost criterion where the service rates depend either on the class or on the pool but not on both, and established asymptotic optimality of the scheduling policies. Under some assumptions on the holding cost, a static priority policy is shown to be optimal in [17, 18] in a multiclass multipool queueing network where the cost function is of finite horizon type.
[20] studied queue-and-idleness-ratio controls, and their associated properties and staffing implications, for multiclass multipool networks. In [2] the authors considered an ergodic control problem for a multiclass many-server queueing network and established convergence of the value functions. Some other works that have considered ergodic control problems for queueing networks are as follows: [16] considers an ergodic control problem in the conventional heavy-traffic regime and establishes asymptotic optimality, and [25] studies an admission control problem with an ergodic cost criterion for a single-class queueing network. For an inverted 'V' model it is shown in [6] that the fastest-server-first policy is asymptotically optimal for minimizing the steady-state expected queue length and waiting time. [33] considered a blind fair routing policy for multiclass multipool networks and used simulations to validate the performance of the blind fair routing policies, comparing them with non-blind policies derived from the limiting diffusion control problem. Recently, ergodic control of multiclass multipool queueing networks was considered in [4], where the authors addressed existence and uniqueness of solutions of the HJB (Hamilton–Jacobi–Bellman) equation associated to the limiting diffusion control problem. Asymptotic optimality results for the N-network queueing model are obtained in [5]. Most of the above works [11, 21, 7, 12, 2] on many-server networks consider work-conserving policies as their admissible controls. It is necessary to point out some key differences between the present queueing network and the one considered in [7, 4]. First of all, the Halfin–Whitt condition above (see (1.1)) is different from [7, p. 2614], and therefore the diffusion scalings of the customer count processes are also different.
Moreover, the collection of admissible controls in [7] includes a wider class of scheduling policies which are jointly work-conserving and need not follow any class-to-pool priority, whereas for the queueing model under consideration every admissible control must obey the class-to-pool priority constraint. The particular nature of our network allows us to consider an optimal ergodic control problem with a general running cost function and to obtain asymptotic optimality (Theorem 2.1) under standard assumptions on the service rates.
Motivation and comparison: The above model is realistic in many scenarios. For instance, in a call center different service stations are designed to serve certain types of customers, and they may choose to help other types of customers when one or more servers in the station are idle. It is known that cross-training of customer sales representatives in inbound call centers reduces the average number of customers in queue. We refer to [23] and the references therein for a discussion of labor cross-training and its effect on the performance of queueing networks. Since it is expensive to train every sales representative in all skills, it becomes important to understand the optimal cross-training structure of the agents which reduces the average number of customers in queue. [23] uses the average shortest path length as a metric to predict the more effective cross-training structures in terms of customer waiting times. Our model is a variant of these networks. As mentioned before, some service rates may be zero in our queueing network, which should be interpreted as the inability of the corresponding station to serve that class of jobs. It is also reasonable to have class-to-pool priority when the agents in a pool are primarily trained to serve jobs of one class and might not be as efficient in serving another class. It should be noted that for our queueing model we have fixed a cross-training structure, and we are trying to investigate the optimal dynamic scheduling policy that optimizes the payoff function.
Our model also bears resemblance to the Generalized Processor Sharing (GPS) models from [29, 30]. In fact, our model can be rescaled to a single-pool case where each class of customers has priority in accessing a fixed fraction of the total number of servers, and may get access to the servers reserved for other customer classes if the queues of those classes are empty. It should be observed that the multipool version is more general than the single-pool version. For instance, it is not natural for the service rates of a given class to differ across servers in the setting of a single pool with homogeneous servers, but this is not the case for a multipool model. Therefore we stick to the multipool network model. A legitimate question for these processor-sharing-type models is whether a GPS-type policy is optimal for the payoff function considered above. Motivated by this question, a similar control problem is considered in [14] for a queueing model with finitely many servers, and it is shown that the value function associated to the limiting controlled diffusion model solves a nonlinear Neumann boundary value problem. The solution is obtained in the viscosity sense, and therefore it is hard to extract any information about the optimal control, even for the diffusion control problem. The present work is motivated by a similar question, but for the many-server queueing network, and one motivation is to gain some insight into the optimal control. The motivating question here is whether a scheduling policy that lets the non-priority classes access the servers of another pool in some fixed proportion, whenever that pool has free servers, is optimal. In the present work we characterize the value functions of the queueing model by their limit and construct a sequence of admissible policies that are asymptotically optimal. Though theoretically we can find a minimizer for the limiting HJB, it is not easy to compute it explicitly or numerically.
One of the future directions is to compute the minimizer and compare its performance with the GPS type scheduling.
The methodology for this problem does not follow immediately from any existing work. In general, the main idea is to convert such problems to a controlled diffusion problem and analyze the corresponding Hamilton–Jacobi–Bellman (HJB) equation to extract information about the minimizing policies. All the existing works [11, 21, 7, 2] use work-conserving properties of the controls to come up with an action space that does not depend on the state variable. But, as we mentioned above, our policies need not be work-conserving. There is an obvious action space that one could associate to our model (see (2.9)); unfortunately, this action space depends on the state variable. In general, such action spaces are not very favorable for mathematical analysis: existence of measurable selectors and regularity of the Hamiltonian do not become obvious, due to the dependence of the action space on the state variable. Interestingly, for our model we can show that the structure of the drift and the convexity of the running cost work in favor of our analysis, and we can work with such state-dependent action spaces. In particular, we obtain uniform stability (Lemma 4.2) and also show that the Hamiltonian is locally Hölder continuous (Lemma 4.3, Lemma 4.6). Since our action space depends on the state variable, we need to verify that Filippov's criterion holds [1, Chapter 18], and then by using Filippov's implicit function theorem we establish existence of a measurable minimizer for the Hamiltonian. This is done in Theorem 3.1. But such a minimizer need not be continuous in general, and one often requires a continuous minimizing selector to construct optimal policies for the value functions (see [11, 2]). With this in mind, we consider a perturbed problem where we perturb the cost by a strictly convex function and show that the perturbed Hamiltonian has a unique continuous minimizing selector (Lemma 4.1).
In Theorem 3.1 below we show that this continuous minimizing selector is optimal for the perturbed ergodic control problem and can be used to construct optimal policies (Theorem 5.2). To summarize our contributions in this paper, we have

considered an ergodic control problem for the queueing network with labor cross-training and identified the limit of the value functions,

solved the limiting HJB and established asymptotic optimality.
Notations: By we denote the dimensional Euclidean space equipped with the Euclidean norm . We denote by the set of all real matrices and we endow this space with the usual metric. For , we denote the maximum (minimum) of and as (, respectively). We define and . denotes the largest integer smaller than or equal to . Given a topological space and , the interior, closure, complement and boundary of in are denoted by and , respectively. is used to denote the characteristic function of the set . By we denote the Borel σ-field of . Let be the set of all continuous functions from to . Given a path , we denote by the jump of at time , i.e., . We define as the set of all real-valued times continuously differentiable functions on . For , denotes the set of all real-valued times continuously differentiable functions on whose th derivative is locally Hölder continuous on . For any domain , denotes the set of all times weakly differentiable functions that are in together with all their weak derivatives up to order . By we denote the collection of functions that are times weakly differentiable with all derivatives up to order in . denotes the set of all real-valued continuous functions that have at most polynomial growth, i.e.,
For a measurable function and a measure we denote . Let denote the space of functions such that
By we denote the subspace of containing functions satisfying
The infimum over an empty set is regarded as . Generic deterministic positive constants, whose values might change from line to line, are used throughout.
The organization of the paper is as follows. The next section introduces the setting of our model and states our main result on the convergence of the value functions. In Section 3 we formulate the limiting controlled diffusion and state our results on the ergodic control problem with state-dependent action space. Section 4 obtains various results for the controlled diffusion and its HJB equation, which are used to prove Theorem 3.1 from Section 3. Finally, in Section 5 we obtain asymptotic lower and upper bounds for the value functions.
2. Setting and main result
Let be a given complete probability space; all the stochastic variables introduced below are defined on it. The expectation w.r.t. is denoted by . We consider a multiclass Markovian many-server queueing system which consists of customer classes and server pools. Each server pool is assumed to contain identical servers (see Figure 1).
The system buffers are assumed to have infinite capacity. Customers of class arrive according to a Poisson process with rate . Upon arrival, customers enter the queue of their respective class if they are not being processed. Customers of each class are served under the first-come-first-served (FCFS) service discipline. Customers can abandon the system while waiting in the queue. Patience times of the customers are assumed to be exponentially distributed and class-dependent. Customers of class renege at rate . We also assume that no customer reneges while in service. Customers of class have the highest priority in accessing service from station . A customer of class is allowed to access service from station if and only if the th queue is empty and all the servers in the th pool are occupied by class customers. By we denote the service rate of class at station . We denote by for . We assume that customer arrivals, services and abandonments of all classes are mutually independent.
The Halfin–Whitt Regime. We study this queueing model in the Halfin–Whitt regime (also called the Quality-and-Efficiency-Driven (QED) regime). We consider a sequence of systems indexed by , where the arrival rates and the numbers of servers grow to infinity at certain rates. Let be the mean offered load of class customers. In the Halfin–Whitt regime, the parameters are assumed to satisfy the following: as ,
(2.1) 
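Since the display (2.1) is not legible in this copy, the following sketch shows only the standard square-root (QED) parametrization of this regime, stated as an assumption; the function name `qed_parameters` and the values of `lam_hat` and `mu` are illustrative.

```python
import math

def qed_parameters(n, lam_hat=1.0, mu=1.0):
    """Standard square-root staffing: the arrival rate to an n-server pool
    is chosen so that the traffic intensity is 1 + O(1/sqrt(n))."""
    lam_n = n * mu + lam_hat * math.sqrt(n)   # arrival rate for system n
    rho_n = lam_n / (n * mu)                  # traffic intensity
    return lam_n, rho_n

for n in (100, 10000):
    _, rho_n = qed_parameters(n)
    # sqrt(n) * (rho_n - 1) stays bounded as n grows (here it equals 1)
    print(n, round(math.sqrt(n) * (rho_n - 1.0), 6))
```

The printed quantity is the kind of scaled slack that conditions such as (1.1) and (2.1) require to converge to a finite constant.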
We note that the service rate could also be zero for some class–station pairs; this can be understood as a situation where the servers at that station are very inefficient in serving that class of customers.
State Descriptors. Let be the total number of class customers in the system and be the number of class customers in the queue. By we denote the number of class customers at the station . As earlier we denote by for . The following basic relationships hold for these processes: for each , and ,
(2.2) 
Let be a collection of independent rate Poisson processes. Define
Then the dynamics takes the form
(2.3) 
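To make the Poisson-process representation of the dynamics concrete, here is a toy continuous-time Markov chain simulation of a single class served by a single pool with abandonment (an M/M/n+M queue). All names and rates are illustrative; the scheduling control appearing in (2.3) is not modeled.

```python
import random

def simulate(n, lam, mu, gamma, horizon, seed=0):
    """Simulate the headcount of one customer class: Poisson arrivals at
    rate lam, n exponential servers at rate mu, and abandonment at rate
    gamma per waiting customer."""
    rng = random.Random(seed)
    t, x = 0.0, 0
    while t < horizon:
        busy = min(x, n)            # servers in use
        queue = x - busy            # customers waiting
        rate = lam + busy * mu + queue * gamma
        t += rng.expovariate(rate)  # time of next event
        u = rng.random() * rate     # pick which event occurred
        if u < lam:
            x += 1                  # arrival
        elif u < lam + busy * mu:
            x -= 1                  # service completion
        else:
            x -= 1                  # abandonment from the queue
    return x

print(simulate(n=50, lam=55.0, mu=1.0, gamma=0.5, horizon=100.0))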
Scheduling Control. We will consider policies that are non-anticipative, and we allow preemption. Under these policies every customer class and its associated station must satisfy a work-conserving constraint in the following sense: for all ,
(2.4) 
Combining (2.2) and (2.4) we see that
(2.5) 
Therefore, when a server at a station becomes free and there are no customers of that station's priority class waiting in the queue, the server may process a customer of another class. Also, a customer does not receive service at a non-priority station if there is an empty server at its own station. Service preemption is allowed, i.e., service of a customer class can be interrupted at any time to serve some other class of customers, and the original service is resumed at a later time, possibly by a server at some other station. It should be noted that a policy need not be work-conserving: for instance, it could happen that under some policy there are empty servers at one station while another class has a nonempty queue.
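The balance identities (2.2) and the priority constraint (2.4) can be checked mechanically on a small instance. The names `X`, `Q`, `Z`, `pool_sizes` and `feasible` below are illustrative placeholders for the processes and parameters in the text, not the paper's notation.

```python
def feasible(X, Q, Z, pool_sizes):
    """Check a square system of I classes and I pools: X[i] customers of
    class i in system, Q[i] in queue, Z[i][j] of class i in service at
    pool j."""
    I = len(X)
    for i in range(I):
        # (2.2): customers in system = in queue + in service somewhere
        if X[i] != Q[i] + sum(Z[i][j] for j in range(I)):
            return False
        # (2.4): class i may only wait if its own pool i is full
        if Q[i] > 0 and sum(Z[k][i] for k in range(I)) < pool_sizes[i]:
            return False
    for j in range(I):
        # pool capacity is never exceeded
        if sum(Z[i][j] for i in range(I)) > pool_sizes[j]:
            return False
    return True

# Two classes, two pools of 2 servers each; class 1 overflows into pool 2.
print(feasible(X=[3, 1], Q=[0, 0], Z=[[2, 1], [0, 1]], pool_sizes=[2, 2]))
```

Note that `feasible` does not require the full system to be work-conserving: a state with idle servers in one pool and a queue for another class can still pass, consistent with the discussion above.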
Define the σ-fields as follows
and  
where
and is the collection of all null sets. The filtration represents the information available up to time while contains the information about future increments of the processes.
We say that a control policy is admissible if it satisfies (2.4) (or, equivalently (2.5)) and,

is adapted to ,

is independent of at each time ,

for each , and , the process agrees in law with , and the process agrees in law with .
By criterion (iii) above, the increments of the processes have the same distribution as the original processes, in addition to being independent of the past (by (ii) above). We denote the set of all admissible control policies by .
2.1. Control problem formulation
Define the diffusionscaled processes
by
(2.6) 
for . By (2.3) and (2.5), we can express as
(2.7)  
where is defined as
and
(2.8) 
are square-integrable martingales w.r.t. the filtration .
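The displayed definition (2.6) is not legible in this copy, so the following only illustrates the generic centered-and-scaled form of a diffusion scaling (center at the nominal offered load, divide by the square root of the scale); the centering term and all numerical values are assumptions.

```python
import math

def diffusion_scale(X_n, n, rho):
    """Center the headcount vector at its nominal offered load n*rho[i]
    and rescale by sqrt(n)."""
    return [(x - n * r) / math.sqrt(n) for x, r in zip(X_n, rho)]

n = 10000
rho = [0.6, 0.4]        # hypothetical per-class offered loads
X_n = [6050, 3980]      # hypothetical headcounts
print(diffusion_scale(X_n, n, rho))   # O(1) fluctuations around the fluid
```

Under a scaling of this form, headcounts that deviate from the fluid value by order sqrt(n) map to order-one quantities, which is what makes the diffusion limit in (2.17) nondegenerate.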
Note that by (2.1)
By we denote the set of real matrices with nonnegative entries. Define
For any , we define
(2.9) 
It is easy to see that is a nonempty convex and compact subset of for all . Also , for all . From (2.2), (2.4) and (2.5) we have for ,
(2.10) 
Define
where we fix if . We also set , for all , and . Therefore using (2.10) we obtain,
(2.11) 
Thus for all and is adapted. Also represents the fraction of the number of servers at station that are serving class customers. As we show later, it is convenient to view as the control.
2.1.1. The cost minimization problem
We next introduce the running cost function for the control problem. Let be a given function satisfying
(2.12) 
and some positive constant . We also assume that is convex and therefore locally Lipschitz. For example, if we let denote the holding cost rate for class customers, then some typical running cost functions are the following:
These running cost functions evidently satisfy the condition in (2.12).
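The typical convex running costs mentioned above (a linear holding cost and a power of the queue length) can be written out as follows; the holding rates `h` and the exponent `m` are illustrative.

```python
def linear_cost(q, h):
    """Linear holding cost: sum of h_i * q_i over the classes."""
    return sum(hi * max(qi, 0.0) for hi, qi in zip(h, q))

def power_cost(q, h, m=2):
    """Power-law holding cost: sum of h_i * q_i^m; convex for m >= 1,
    and of polynomial growth as required by (2.12)."""
    return sum(hi * max(qi, 0.0) ** m for hi, qi in zip(h, q))

h = [3.0, 1.0]          # hypothetical holding cost rates
q = [2.0, 4.0]          # hypothetical scaled queue lengths
print(linear_cost(q, h))   # 3*2 + 1*4 = 10.0
print(power_cost(q, h))    # 3*4 + 1*16 = 28.0
```

Both examples are nonnegative, convex, and of polynomial growth, so they satisfy the standing condition (2.12).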
Given the initial state and an admissible scheduling policy , we define the diffusionscaled cost function as
(2.13) 
where the running cost function satisfies (2.12). Then, the associated cost minimization problem is defined by
(2.14) 
We refer to as the diffusionscaled value function given the initial state for the system.
From (2.4) we see that for , and ,
Therefore redefining as
(2.15) 
we can rewrite the control problem as
where
(2.16) 
and the infimum is taken over all admissible pairs satisfying (2.11). Hence , almost surely, for all .
For simplicity we assume that the initial condition is deterministic and , as , for some .
2.1.2. The limiting controlled diffusion process
As in [11, 21, 2], the analysis will be done by studying the limiting controlled diffusions. One formally deduces that, provided , there exists a limit for on every finite time interval, and the limit process is a dimensional diffusion process, that is,
(2.17) 
with initial condition . In (2.17) the drift takes the form
(2.18) 
with
The control lives in and is nonanticipative, is a dimensional standard Wiener process independent of the initial condition , and the covariance matrix is given by
A formal derivation of the drift in (2.18) can be obtained from (2.4) and (2.7) . We also need the control to satisfy for all . We define as follows,
(2.19) 
Thus from (2.18) we get that
(2.20) 
A detailed description of equation (2.17) and related results are given in Section 3.
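The limiting equation (2.17) is an Itô SDE, and a generic Euler–Maruyama discretization of such an equation looks as follows. The mean-reverting drift and the scalar unit diffusion below are stand-ins, since the actual drift (2.18) and covariance matrix are not reproduced in this copy.

```python
import math
import random

def euler_maruyama(x0, drift, sigma, T, dt, seed=0):
    """Simulate dX = drift(X) dt + sigma dW on [0, T] with step dt."""
    rng = random.Random(seed)
    x, t = list(x0), 0.0
    while t < T:
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in x]  # Brownian increments
        b = drift(x)
        x = [xi + b[i] * dt + sigma * dw[i] for i, xi in enumerate(x)]
        t += dt
    return x

# Mean-reverting drift as a stand-in for the abandonment-driven drift.
drift = lambda x: [-0.5 * xi for xi in x]
print(euler_maruyama([1.0, -1.0], drift, sigma=1.0, T=1.0, dt=0.01))
```

In the actual control problem the drift would also depend on the control process, so each step would evaluate `drift(x, u)` for the current action; the sketch omits this.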
2.1.3. The ergodic control problem for controlled diffusion
Define , by
We note that for the cost agrees with given by (2.15). In analogy with (2.16) we define the ergodic cost associated with the controlled diffusion process and the running cost function as
Here denotes the set of all admissible controls, which are defined in Section 3. We consider the ergodic control problem
(2.21) 
We call the optimal value at the initial state for the controlled diffusion process . It is shown later that is independent of . A detailed treatment and related results corresponding to the ergodic control problem are given in Section 3.
Next we state the main result of this section, the proof of which can be found in Section 5.
Theorem 2.1.
Theorem 2.1 is similar to Theorems 2.1 and 2.2 in [2], and the central idea of its proof is the same as that of [2]. One of the main advantages of the present setting is the stability of the system: we can directly show that, for all large , the mean empirical measures corresponding to the th system have all polynomial moments finite under every admissible policy (see Lemma 5.1). This is not the case in [2], where a spatial truncation method is used to treat this difficulty. We must also note that the action space in our setting depends on the state, whereas in [2] the action space is a fixed compact set; therefore we need to adopt suitable modifications for this problem. As shown below, the convexity of the cost and the structure of the drift play a key role in our analysis.
3. An Ergodic Control Problem for Controlled Diffusions
3.1. The controlled diffusion model
The dynamics are modeled by controlled diffusion processes taking values in , and governed by the Itô stochastic differential equation
(3.1) 
where is given by (2.20) and
All random processes in (3.1) live in a complete probability space . The process is a dimensional standard Brownian motion independent of the initial condition .
Definition 3.1.
A process taking values in , jointly measurable in , is said to be an admissible control if there exists a strong solution of (3.1) and

is nonanticipative: for , is independent of

, almost surely, for .
We let denote the set of all admissible controls. Note that the drift is Lipschitz continuous and the diffusion matrix is nondegenerate. Since for all , we see that is in ; thus is nonempty. Let . We define the family of operators , with parameter , by
(3.2) 
We refer to as the controlled extended generator of the diffusion (3.1). In (3.2) and elsewhere in this paper we have adopted the notation and .
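The display (3.2) is not legible in this copy; under the partial-derivative convention just stated, a controlled extended generator of this type takes the standard second-order form below, written here only as an assumption about the intended display:

```latex
\mathcal{L}_u f(x) \;=\; \sum_{i,j} a_{ij}\,\partial_{ij} f(x)
  \;+\; \sum_{i} b_i(x,u)\,\partial_i f(x),
\qquad a := \tfrac{1}{2}\,\Sigma\Sigma^{\mathsf{T}},
```

where b is the controlled drift and Sigma the constant diffusion matrix of (3.1).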
A control is said to be stationary Markov if for some measurable we have . Therefore for a stationary Markov control we have for all . By we denote the set of all stationary Markov controls.
Now we introduce relaxed controls, which will be useful for our analysis. Associating relaxed controls to such control problems is useful since it extends the action space to a compact, convex set [3]. In our setup we show that one cannot do better even if we extend the controls to include relaxed controls (see Theorems 3.1 and 4.1). Moreover, relaxed controls will be useful to prove asymptotic lower bounds (Theorem 5.1). By we denote the set of all probability measures on . We can extend the drift and the running cost on as follows: for ,
Controls taking values in are known as relaxed controls. Similarly, we can extend the definition of stationary Markov controls to measure-valued processes. A stationary Markov control is said to be admissible if there is a unique strong solution to (3.1) and