An ergodic control problem for many-server multiclass queueing systems with cross-trained servers
A queueing network is considered with independent customer classes and server pools in the Halfin-Whitt regime. Class customers have priority for service in pool for , and may access some other pool if the pool has an idle server and all the servers in pool are busy. We formulate an ergodic control problem where the running cost is given by a non-negative convex function with polynomial growth. We show that the limiting controlled diffusion is modeled by an action space which depends on the state variable. We provide a complete analysis for the limiting ergodic control problem and establish asymptotic convergence of the value functions for the queueing model.
Key words and phrases: Halfin–Whitt, multiclass Markovian queues, heavy-traffic, cross-training, scheduling control, Hamilton-Jacobi-Bellman equation, asymptotic optimality.
2000 Mathematics Subject Classification: 93E20, 60H30, 35J60
In this article we consider a queueing system consisting of independent customer classes and server pools (or stations). Each server pool contains identical servers. Customers of class have priority for service in pool and this priority is of preemptive type. Therefore a newly arrived job of class at time would preempt the service of a class job, , if there is a class job receiving service from pool at time . A customer from class may access service from pool if and only if there is an empty server in the pool and all the servers in the pool are busy. Therefore service stations are cross-trained to serve nonpriority customers when their own priority customer class is underloaded. Customers are also allowed to renege from the queue. Customer arrivals are given by -independent Poisson processes with parameter . By we denote the service rate of class customers at station . We shall use instead of for . The network is assumed to work under the Halfin-Whitt setting in the sense that
Therefore under (1.1) each customer class is in criticality with respect to pool , , i.e., for some constant where is the mean offered load of class to the pool . Note that the above criticality condition is different from those generally used in the multiclass multi-pool setting . This condition could be seen as a generalization of [29, Assumption 4.12(3)] to the many-server setting. To elaborate we recall the generalized processor sharing (GPS) network from . In a multiclass GPS network with customer classes and a single server, the server would use a fraction , is a given probability vector, of the total processing capacity to serve class- jobs when all the job classes are available in the system; otherwise (that is, when a positive number of classes are empty) any excess processing capacity, normally reserved for the job classes that are empty, is redistributed among the nonempty classes in proportion to the weight vector . In this case the conventional heavy traffic condition implies that exists for all , see [29, Assumption 4.12(3)]. Therefore (1.1) can be seen as a many-server analogue of the conventional heavy-traffic condition of the GPS network.
The control is given by a matrix-valued process where denotes the number of class- customers getting served at station . We note that for our model a control need not be work-conserving. The running cost is given by where is a non-negative, convex function of polynomial growth and denotes the diffusion-scaled queue length vector, i.e., where is the queue length vector. We are interested in the cost function
The value function is defined to be the infimum of the above cost where the infimum is taken over all admissible controls. One of the main results of this paper analyzes the asymptotic behavior of as . In Theorem 2.1 we show that tends to the optimal value of an ergodic control problem governed by a certain class of controlled diffusions. We also study the limiting ergodic control problem and establish the existence and uniqueness results for the value function in Theorem 3.1. It is worth mentioning that results like Theorems 2.1 and 3.1 continue to hold if one considers other types of convex running cost functions which might depend on (see Remark 5.1). Let us denote by () when class- customers can (resp., cannot) access service from station . In this article we have concentrated on the situation where , for all , but it is not a necessary condition for our method to work. As noted in Remark 5.2, if we impose for some in the above queueing model, our results continue to hold without any major change in the arguments.
Literature review: Scheduling control problems have a rich and profound history in queueing literature. The main goal of these problems is to schedule the customers/jobs in a network in an optimal way. But it is not always possible to find a simple policy that is optimal. It is well known that for various queueing networks with finitely many servers policy is optimal [8, 13, 27, 32]. scheduling rule prioritizes the service of the job classes according to the index (higher index gets priority for receiving service) where denotes the holding cost for class- and denotes the mean service time of class- jobs. In the case of many servers, it is shown in [9, 10] that a similar priority policy, known as -policy, which prioritizes the jobs according to the index , being the abandonment rate of class-, is optimal when the queueing system asymptotic is considered under fluid scaling and the cost is given by the long run average of a linear running cost. But such simple optimal priority policies are rare in the Halfin-Whitt setting. In general, by the Halfin-Whitt setting we mean that the number of servers and the total offered load scale like for some constant . See (2.1) below for the exact formulation for our model. There are several papers devoted to the study of control problems in the Halfin-Whitt regime. [11, 21] studied a control problem with discounted pay-off for a multiclass single pool queueing network and asymptotics of the value functions are obtained in . Later these works were generalized to the multi-pool case in .  considered a simplified multiclass multi-pool control problem with a discounted cost criterion where the service rates depend either on the class or on the pool but not on both, and established asymptotic optimality for the scheduling policies. Under some assumptions on the holding cost, a static priority policy is shown to be optimal in [17, 18] in a multiclass multi-pool queueing network where the cost function is of finite horizon type.  
studied queue-and-idleness-ratio controls, and their associated properties and staffing implications for multiclass multi-pool networks. In  the authors considered an ergodic control problem for a multiclass many-server queueing network and established convergence of the value functions. Some other works that have considered ergodic control problems for queueing networks are as follows:  considers an ergodic control problem in the conventional heavy-traffic regime and establishes asymptotic optimality;  studies an admission control problem with an ergodic cost criterion for a single class queueing network. For an inverted ’V’ model it is shown in  that the fastest-server-first policy is asymptotically optimal for minimizing the steady-state expected queue length and waiting time.  considered a blind fair routing policy for multiclass multi-pool networks and used simulations to validate the performance of the blind fair routing policies, comparing them with non-blind policies derived from the limiting diffusion control problem. Recently, ergodic control of multiclass multi-pool queueing networks is considered in  where the authors have addressed existence and uniqueness of solutions of the HJB (Hamilton-Jacobi-Bellman) equation associated to the limiting diffusion control problem. Asymptotic optimality results for the N-network queueing model are obtained in . Most of the above works [11, 21, 7, 12, 2] on many server networks consider work-conserving policies as their admissible controls. It is necessary to point out some key differences between the present queueing network and the one considered in [7, 4]. First of all, the Halfin-Whitt condition above (see (1.1)) is different from [7, p. 2614] and therefore, the diffusion scalings of the customer count processes are also different. Moreover, the collection of admissible controls in  includes a wider class of scheduling policies which are jointly work-conserving and need not follow any class-to-pool priority, whereas for the queueing model under consideration every admissible control must obey the class-to-pool priority constraint. The particular nature of our network allows us to consider an optimal ergodic control problem with a general running cost function and to obtain asymptotic optimality (Theorem 2.1) under standard assumptions on the service rates.
Motivation and comparison: The above model is realistic in many scenarios. For instance, in a call center different service stations are designed to serve certain types of customers and they may choose to help other types of customers when one or many servers are idle in the station. It is known that cross-training of customer sales representatives in an inbound call center reduces the average number of customers in queue. We refer to  and the references therein for a discussion on labor cross-training and its effect on the performance of queueing networks. Since it is expensive to train every sales representative in all skills, it becomes important to understand the optimal cross-training structure of the agents which reduces the average number of customers in queue.  uses average shortest path length as a metric to predict the more effective cross-training structures in terms of customer waiting times. Our model is a variant of these networks. As mentioned before, we may have for some in our queueing network which should be interpreted as the inability of station to serve class jobs. It is also reasonable to have class-to-pool priority when the agents in pool are primarily trained to serve jobs of class and might not be efficient in serving class . It should be noted that for our queueing model we have fixed a cross-training structure and we are trying to investigate the optimal dynamic scheduling policy that will optimize the pay-off function.
Our model also bears resemblance to Generalized Processor Sharing (GPS) models from [29, 30]. In fact, our model can be scaled to a single pool case where each class of customers has priority in accessing a fixed fraction of the total number of servers and may get access to other servers, fixed for other customer classes, if the queues of other customer classes are empty. It should be observed that the multi-pool version is more general than the single pool version. For instance, it is not natural to have , for , in the setting of a single pool with homogeneous servers, but this is not the case for a multi-pool model. Therefore we stick to the multi-pool network model. A legitimate question for these processor sharing type models is whether the GPS type policy is optimal for the pay-off function considered above. Motivated by this question a similar control problem is considered in  for a queueing model with finitely many servers and it is shown that the value function associated to the limiting controlled diffusion model solves a non-linear Neumann boundary value problem. The solution is obtained in the viscosity sense and therefore, it is hard to extract any information about the optimal control, even for the diffusion control problem. The present work is also motivated by a similar question but for the many-server queueing network. One motivation of the present work is to get some insight about the optimal control. The motivating question here is whether a scheduling policy that allows the non-priority classes of pool to access the servers of pool in some fixed proportion, when pool has some free servers, would be optimal. In the present work we characterize the value functions of the queueing model by its limit and construct a sequence of admissible policies that are asymptotically optimal. Though theoretically we can find a minimizer for the limiting HJB, it is not easy to compute it explicitly or numerically. One of the future directions is to compute the minimizer and compare its performance with the GPS type scheduling.
The methodology of this problem is not immediate from any existing work. In general, the main idea is to convert such problems to a controlled diffusion problem and analyze the corresponding Hamilton-Jacobi-Bellman (HJB) equation to extract information about the minimizing policies. All the existing works [11, 21, 7, 2] use the work-conserving property of the controls to come up with an action space that does not depend on the state variable. But, as mentioned above, our policies need not be work-conserving. Also there is an obvious action space that one could associate to our model (see (2.9)). Unfortunately, this action space depends on the state variable. In general, such action spaces are not very favorable for mathematical analysis. Existence of measurable selectors and regularity of the Hamiltonian are not obvious due to the dependency of the action space on the state variable. Interestingly, for our model we could show that the structure of the drift and the convexity of the running cost work in favour of our analysis and we can work with such state dependent action spaces. In particular, we obtain uniform stability (Lemma 4.2) and also show that the Hamiltonian is locally Hölder continuous (Lemma 4.3, Lemma 4.6). Since our action space depends on the state variable we need to verify that Filippov’s criterion holds [1, Chapter 18] and then by using Filippov’s implicit function theorem we establish existence of a measurable minimizer for the Hamiltonian. This is done in Theorem 3.1. But such a minimizer need not be continuous, in general, and one often requires a continuous minimizing selector to construct -optimal policies for the value functions (see [11, 2]). With this in mind, we consider a perturbed problem where we perturb the cost by a strictly convex function and show that the perturbed Hamiltonian has a unique continuous selector (Lemma 4.1). In Theorem 3.1 below we show that this continuous minimizing selector is optimal for the perturbed ergodic control problem and can be used to construct -optimal policies (Theorem 5.2). To summarize our contribution in this paper, we have
considered an ergodic control problem for the queueing network with labor cross-training and identified the limit of the value functions,
solved the limiting HJB and established asymptotic optimality.
Notations: By we denote the -dimensional Euclidean space equipped with the Euclidean norm . We denote by the set of all real matrices and we endow this space with the usual metric. For , we denote the maximum (minimum) of and as (, respectively). We define and . denotes the largest integer that is less than or equal to . Given a topological space and , the interior, closure, complement and boundary of in are denoted by and , respectively. is used to denote the characteristic function of the set . By we denote the Borel -field of . Let be the set of all continuous functions from to . Given a path , we denote by , the jump of at time , i.e., . We define as the set of all real valued times continuously differentiable functions on . For , denotes the set of all real valued -times continuously differentiable functions on with -th derivative locally -Hölder continuous on . For any domain , denotes the set of all -times weakly differentiable functions that are in and whose weak derivatives up to order are also in . By we denote the collection of functions that are -times weakly differentiable and whose derivatives up to order are in . denotes the set of all real valued continuous functions that have at most polynomial growth, i.e.,
For a measurable and measure we denote . Let denote the space of functions such that
By we denote the subspace of containing the functions satisfying
Infimum over empty set is regarded as . are deterministic positive constants whose value might change from line to line.
The organization of the paper is as follows. The next section introduces the setting of our model and states our main result on the convergence of the value functions. In Section 3 we formulate the limiting controlled diffusion and state our results on the ergodic control problem with state dependent action space. Section 4 obtains various results for the controlled diffusion and its HJB which are used to prove Theorem 3.1 from Section 3. Finally, in Section 5 we obtain asymptotic lower and upper bounds for the value functions.
2. Setting and main result
Let be a given complete probability space and all the stochastic variables introduced below are defined on it. The expectation w.r.t. is denoted by . We consider a multiclass Markovian many-server queueing system which consists of customer classes and server pools. Each server pool is assumed to contain identical servers (see Figure 1).
The system buffers are assumed to have infinite capacity. Customers of class arrive according to a Poisson process with rate . Upon arrival, customers enter the queue of their respective classes if not being processed. Customers of each class are served in the first-come-first-serve (FCFS) service discipline. Customers can abandon the system while waiting in the queue. Patience times of the customers are assumed to be exponentially distributed and class dependent. Customers of class renege at rate . We also assume that no customer reneges while in service. Customers of class have highest priority in accessing service from station . A customer of class is allowed to access service from station if and only if the -th queue is empty and all the servers in the -th pool are occupied by class- customers. By we denote the service rate of class at station . We denote by for . We assume that customer arrivals, service and abandonment of all classes are mutually independent.
The Halfin-Whitt Regime. We study this queueing model in the Halfin-Whitt regime (or the Quality-and-Efficiency-Driven (QED) regime). We consider a sequence of systems indexed by where the arrival rates and grow to infinity at certain rates. Let be the mean offered load of class customers. In the Halfin-Whitt regime, the parameters are assumed to satisfy the following: as ,
We note that could also be for some . could be understood as a situation where servers at station are very inefficient in serving class- customers.
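The square-root scaling that underlies the Halfin-Whitt regime can be illustrated numerically. The following is a minimal sketch, assuming the classical single-pool staffing form N = R + beta * sqrt(R), where R is the offered load. It illustrates the scaling only and is not the multi-pool condition (2.1) of this paper; the function names and the parameter `beta` are hypothetical.

```python
import math


def square_root_staffing(offered_load: float, beta: float) -> int:
    """Number of servers under the classical rule N = R + beta * sqrt(R).

    Assumption: single pool, single class. The result is rounded up to
    an integer number of servers.
    """
    if offered_load <= 0:
        raise ValueError("offered_load must be positive")
    return math.ceil(offered_load + beta * math.sqrt(offered_load))


def safety_margin(n_servers: int, offered_load: float) -> float:
    """Spare capacity (N - R) measured in units of sqrt(R)."""
    return (n_servers - offered_load) / math.sqrt(offered_load)


if __name__ == "__main__":
    # The spare capacity grows like sqrt(R), so the normalized margin
    # stays close to beta as the system is scaled up.
    for load in (100.0, 400.0, 10000.0):
        n = square_root_staffing(load, beta=0.5)
        print(load, n, safety_margin(n, load))
```

In this sketch the normalized margin remains of order one while both the load and the number of servers grow, which is the sense in which the system stays critically loaded.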
State Descriptors. Let be the total number of class customers in the system and be the number of class customers in the queue. By we denote the number of class customers at the station . As earlier we denote by for . The following basic relationships hold for these processes: for each , and ,
Let be a collection of independent rate- Poisson processes. Define
Then the dynamics takes the form
Scheduling Control. We will consider policies that are non-anticipative. We also allow preemption. Under these policies every customer class and its associated station must follow a work-conserving constraint in the following sense: for all ,
Therefore, when a server from station becomes free and there are no customers of class- waiting in the queue, the server may process a customer of class . Also a customer of class does not receive service from a server at the station if there is an empty server at station . Service preemption is allowed, i.e., service of a customer class can be interrupted at any time to serve some other class of customers and the original service is resumed at a later time, possibly by a server at some other station. It should be noted that a policy need not be work-conserving. For instance, it could happen that under some policy there are empty servers at station while there is a queue of class .
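The scheduling constraints described above can be checked mechanically for a given allocation. The following is a minimal sketch under stated assumptions: it reads the class-to-pool priority constraint as Z_ii = min(X_i, N_i), and it uses the usual balance relations (queue Q_i = X_i minus the number of class-i customers in service, and pool occupancy at most N_j). These formulas are an interpretation of the verbal description, and the function names are hypothetical.

```python
from typing import List, Sequence


def queue_lengths(x: Sequence[int], z: Sequence[Sequence[int]]) -> List[int]:
    """Queue of class i: total in system minus those in service anywhere."""
    return [x[i] - sum(z[i]) for i in range(len(x))]


def is_feasible_allocation(
    x: Sequence[int], n: Sequence[int], z: Sequence[Sequence[int]]
) -> bool:
    """Check an allocation z[i][j] (class i served at station j).

    Assumed constraints (labeled assumptions, not quoted from the paper):
      * priority: z[i][i] == min(x[i], n[i]);
      * non-negativity of z and of the implied queue lengths;
      * capacity: at most n[j] customers in service at station j.
    """
    d = len(x)
    for i in range(d):
        if z[i][i] != min(x[i], n[i]):
            return False
        if any(z[i][j] < 0 for j in range(d)):
            return False
    if any(q < 0 for q in queue_lengths(x, z)):
        return False
    for j in range(d):
        if sum(z[i][j] for i in range(d)) > n[j]:
            return False
    return True


if __name__ == "__main__":
    x, n = [12, 3], [10, 5]
    z = [[10, 1], [0, 3]]
    # Feasible but not work-conserving: station 2 has one idle server
    # while one class-1 customer is still waiting.
    print(is_feasible_allocation(x, n, z), queue_lengths(x, z))
```

The example allocation leaves one server idle at the second station while a class-1 customer waits, which is consistent with the remark that a policy need not be work-conserving.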
Define the -fields as follows
and is the collection of all -null sets. The filtration represents the information available up to time while contains the information about future increments of the processes.
(i) is adapted to ,
(ii) is independent of at each time ,
(iii) for each , and , the process agrees in law with , and the process agrees in law with .
By criterion (iii) above the increments of the processes have the same distribution as the original processes in addition to being independent of (by (ii) above). We denote the set of all admissible control policies by .
2.1. Control problem formulation
Define the diffusion-scaled processes
where is defined as
are square integrable martingales w.r.t. the filtration .
Note that by (2.1)
By we denote the set of real matrices with non-negative entries. Define
For any , we define
where we fix if . We also set , for all , and . Therefore using (2.10) we obtain,
Thus for all and is adapted. Also represents the fraction of the number of servers at station that are serving class- customers. As we show later, it is convenient to view as the control.
2.1.1. The cost minimization problem
We next introduce the running cost function for the control problem. Let be a given function satisfying
and some positive constant . We also assume that is convex and therefore, locally Lipschitz. For example, if we let be the holding cost rate for class customers, then some of the typical running cost functions are the following:
These running cost functions evidently satisfy the condition in (2.12).
Given the initial state and an admissible scheduling policy , we define the diffusion-scaled cost function as
where the running cost function satisfies (2.12). Then, the associated cost minimization problem is defined by
We refer to as the diffusion-scaled value function given the initial state for the system.
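The long-run average cost can be approximated along a single sample path by a finite-horizon time average. The following is a minimal sketch, assuming a piecewise-constant, right-continuous queue-length path given by its jump times and states, and a linear holding cost as one example of a convex running cost with polynomial growth. It computes a single-path, finite-horizon proxy and takes neither the expectation nor the limit in the horizon; all names are hypothetical.

```python
from typing import Callable, Sequence


def linear_holding_cost(c: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Running cost r(q) = sum_i c_i * q_i for a queue vector q >= 0."""
    return lambda q: sum(ci * qi for ci, qi in zip(c, q))


def time_average_cost(
    times: Sequence[float],
    states: Sequence[Sequence[float]],
    horizon: float,
    running_cost: Callable[[Sequence[float]], float],
) -> float:
    """(1/T) * integral over [0, T] of r(q(s)) ds.

    The path equals states[k] on [times[k], times[k+1]), and the last
    state is held until the horizon T. times must be increasing and
    start at 0.
    """
    total = 0.0
    for k, t in enumerate(times):
        if t >= horizon:
            break
        t_next = times[k + 1] if k + 1 < len(times) else horizon
        total += running_cost(states[k]) * (min(t_next, horizon) - t)
    return total / horizon


if __name__ == "__main__":
    r = linear_holding_cost([1.0, 2.0])
    path_times = [0.0, 1.0, 3.0]
    path_states = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
    print(time_average_cost(path_times, path_states, 4.0, r))
```

Averaging this quantity over independent paths and increasing the horizon would give a simulation estimate of the ergodic cost for a fixed policy.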
From (2.4) we see that for , and ,
Therefore redefining as
we can rewrite the control problem as
and the infimum is taken over all admissible pairs satisfying (2.11). Hence , almost surely, for all .
For simplicity we assume that the initial condition is deterministic and , as , for some .
2.1.2. The limiting controlled diffusion process
As in [11, 21, 2], the analysis will be done by studying the limiting controlled diffusions. One formally deduces that, provided , there exists a limit for on every finite time interval, and the limit process is a -dimensional diffusion process, that is,
with initial condition . In (2.17) the drift takes the form
The control lives in and is non-anticipative, is a -dimensional standard Wiener process independent of the initial condition , and the covariance matrix is given by
Thus from (2.18) we get that
2.1.3. The ergodic control problem for controlled diffusion
Define , by
Here denotes set of all admissible controls which are defined in Section 3. We consider the ergodic control problem
We call the optimal value at the initial state for the controlled diffusion process . It is shown later that is independent of . A detailed treatment and related results corresponding to the ergodic control problem are given in Section 3.
Next we state the main result of this section, the proof of which can be found in Section 5.
Theorem 2.1 is similar to Theorems 2.1 and 2.2 in . The central idea of the proof of Theorem 2.1 is the same as that of . One of the main advantages of the present setting is the stability of the system. We could directly show that for all large the mean-empirical measures corresponding to the -th system have all polynomial moments finite under every admissible policy (see Lemma 5.1). This is not the case in  where a spatial truncation method is used to treat such difficulty. We must also note that the action space in our setting depends on the location whereas in  the action space is a fixed compact set. Therefore we need to adopt suitable modifications for this problem. As shown below, the convexity property of the cost and the structure of the drift play a key role in our analysis.
3. An Ergodic Control Problem for Controlled Diffusions
3.1. The controlled diffusion model
The dynamics are modeled by controlled diffusion processes taking values in , and governed by the Itô stochastic differential equation
where is given by (2.20) and
All random processes in (3.1) live in a complete probability space . The process is a -dimensional standard Brownian motion independent of the initial condition .
A process , taking values in and is jointly measurable in , is said to be an admissible control if, there exists a strong solution satisfying (3.1), and,
is non-anticipative: for , is independent of
, almost surely, for .
We let denote the set of all admissible controls. Note that the drift is Lipschitz continuous and the diffusion matrix is non-degenerate. Since for all we see that is in . Thus is non-empty. Let . We define the family of operators , with parameter , by
A control is said to be stationary Markov if for some measurable we have . Therefore for a stationary Markov control we have for all . By we denote the set of all stationary Markov controls.
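A controlled diffusion under a stationary Markov control can be simulated approximately with the Euler-Maruyama scheme. The following is a minimal sketch, assuming a user-supplied drift `drift(x, u)`, a feedback map `control(x)`, and a constant diagonal diffusion coefficient `sigma`. These are placeholders and not the drift or covariance of this paper; with a state dependent action space, `control(x)` is expected to return an action that is admissible at `x`. All names are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def euler_maruyama(
    x0: Sequence[float],
    drift: Callable[[Sequence[float], object], Sequence[float]],
    control: Callable[[Sequence[float]], object],
    sigma: Sequence[float],
    dt: float,
    n_steps: int,
    rng: Optional[random.Random] = None,
) -> List[List[float]]:
    """Simulate dX = b(X, v(X)) dt + diag(sigma) dW with step size dt."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    x = list(x0)
    path = [list(x)]
    for _ in range(n_steps):
        u = control(x)  # stationary Markov control: depends on x only
        b = drift(x, u)
        x = [
            x[i] + b[i] * dt + sigma[i] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            for i in range(len(x))
        ]
        path.append(list(x))
    return path


if __name__ == "__main__":
    # Toy example (hypothetical): mean-reverting drift scaled by the control.
    toy_drift = lambda x, u: [-u * xi for xi in x]
    toy_control = lambda x: 1.0
    print(euler_maruyama([1.0, -1.0], toy_drift, toy_control, [1.0, 1.0], 0.01, 5))
```

The feedback form `control(x)` mirrors the definition of a stationary Markov control as a measurable function of the current state.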
Now we introduce relaxed controls which will be useful for our analysis. Association of relaxed controls to such control problems is useful since it extends the action space to a compact, convex set . In our setup we show that we cannot do better even if we extend the controls to include relaxed controls (see Theorems 3.1 and 4.1). Moreover, relaxed controls would be useful to prove asymptotic lower bounds (Theorem 5.1). By we denote the set of all probability measures on . We can extend the drift and the running cost on as follows: for ,
Controls taking values in are known as relaxed controls. Similarly, we can extend the definition of stationary Markov controls to measure valued processes. A stationary Markov control is said to be admissible if there is a unique strong solution to (3.1) and