Infinite Horizon Average Optimality of the N-network
Queueing Model in the Halfin–Whitt Regime
Abstract.
We study the infinite horizon optimal control problem for N-network queueing systems, which consist of two customer classes and two server pools, under average (ergodic) criteria in the Halfin–Whitt regime. We consider three control objectives: 1) minimizing the queueing (and idleness) cost, 2) minimizing the queueing cost while imposing a constraint on idleness at each server pool, and 3) minimizing the queueing cost while requiring fairness on idleness. The running costs can be any nonnegative convex functions having at most polynomial growth.
For all three problems we establish asymptotic optimality, namely, the convergence of the value functions of the diffusion-scaled state process to the corresponding values of the controlled diffusion limit. We also present a simple state-dependent priority scheduling policy under which the diffusion-scaled state process is geometrically ergodic in the Halfin–Whitt regime, and some results on convergence of mean empirical measures which facilitate the proofs.
Key words and phrases:
parallel-server network, N-network, reneging/abandonment, Halfin–Whitt (QED) regime, diffusion scaling, long time average control, ergodic control, ergodic control with constraints, geometric ergodicity, stable Markov optimal control, asymptotic optimality
2000 Mathematics Subject Classification:
60K25, 68M20, 90B22, 90B36
1. Introduction
Parallel server networks in the Halfin–Whitt regime have been very actively studied in recent years. Many important insights have been gained into their performance, design and control. One important question that has mostly remained open is optimal control under the long-run average expected cost (ergodic) criterion. Since it is prohibitive to exactly solve the discrete state Markov decision problem, the plausible approach is to solve the control problem for the limiting diffusion in the Halfin–Whitt regime and use this as an approximation. However, the results in the existing literature for ergodic control of diffusions (see a good review in Arapostathis et al. [2]) cannot be directly applied to the class of diffusion models arising from the parallel server networks in the Halfin–Whitt regime. Recently, Arapostathis et al. [3] and Arapostathis and Pang [1] have developed the basic tools needed to tackle this class of ergodic control problems.
Given an optimal solution to the control problem for the diffusion limit, the important task that remains is to show it gives rise to a scheduling policy for the network and establish that any sequence of such scheduling policies is asymptotically optimal in the Halfin–Whitt regime. Under the discounted cost criterion, this task has been accomplished in Atar et al. [8] for the multiclass V-model (or V-network), which consists of multiple customer classes that are served by servers in a single pool, and in Atar [7] for multiclass multipool networks with certain tree topologies. Under the ergodic criterion, the problem becomes much more difficult because it is intertwined with questions concerning the ergodicity of the diffusion-scaled state process under the scheduling policies. This relates to various open questions on the stochastic stability of parallel server networks in the Halfin–Whitt regime.
Stability of the multiclass V-model in the Halfin–Whitt regime is well treated in Gamarnik and Stolyar [14]. Stolyar [23] has recently proved the tightness of the stationary distributions of the diffusion-scaled state process for the so-called N-network (or N-model), depicted in Figure 1, with no abandonment under a static priority policy. For the V-network, Arapostathis et al. [3] have shown that a sequence of scheduling policies constructed from the optimal solution to the diffusion control problem under the ergodic criterion is asymptotically optimal. In this construction, the state space is divided into a compact subset with radius in the order of the square root of the number of servers around the steady state, and its complement. An approximation to the optimal control for the diffusion is used inside this set, and a static priority policy is employed in its complement. It follows from the results of [3] that under this sequence of scheduling policies the state process is geometrically ergodic. The proof of asymptotic optimality takes advantage of the fact that, under the static priority scheduling policy, the state process of the V-model in the Halfin–Whitt regime is geometrically ergodic. In fact, such a static priority policy for the V-model also corresponds to a constant Markov control, under which the limiting diffusion is geometrically ergodic.
However, for multiclass multipool networks, although the optimal control problem for the limiting diffusion has been thoroughly solved in Arapostathis and Pang [1], the lack of sufficient understanding of the stochastic stability properties of the diffusion-scaled state process has been the critical obstacle to establishing asymptotic optimality. It is worth noting that this difficulty is related to the so-called “joint work conservation” (JWC) condition which plays a key role in the study of multiclass multipool networks as shown in Atar [6, 7]. Although the JWC condition holds for the limiting diffusions over the entire state space, it generally holds only in a bounded subset of the state space for the diffusion-scaled process, whose radius is in the order of the number of servers around the steady state. Thus, an optimal control derived from the limiting diffusion does not translate well to a scheduling policy which is compatible with the controlled dynamics of the network on the entire state space. At the same time, although as shown in [1] there exists a constant Markov control under which the limiting diffusion of multiclass multipool networks is geometrically ergodic, it is unclear if this is also the case for the diffusion-scaled state processes under the corresponding static priority scheduling policy. Therefore, the limiting diffusion does not offer much help in the synthesis of a suitable scheduling policy on the part of the state space where the JWC condition does not hold, and as a result constructing stable policies for multiclass multipool networks is quite a challenge.
In this paper, we address these challenging problems for the N-network. We study three ergodic control problems: (P1) minimizing the queueing (and idleness) cost, (P2) minimizing the queueing cost while imposing a constraint on the idleness of each server pool (e.g., the long-run average idleness cannot exceed a specified threshold), and (P3) minimizing the queueing cost while requiring fairness on idleness (e.g., the average idleness of the two server pools satisfies a fixed ratio condition). The running cost can be any nontrivial nonnegative convex function having at most polynomial growth. Under its usual parameterization, the control specifies the number of customers from each class that are scheduled to each server pool, and we refer to it as a “scheduling” policy. However, the control can also be parameterized so as to specify which class of customers should be scheduled to server pool if it has any available servers (“scheduling” control), and which of the server pools class customers should be routed to, if both pools have available servers (“routing” control). The optimal control problems for the limiting diffusion corresponding to (P1)–(P3) are well-posed and in the case of (P1)–(P2) the solutions can be fully characterized via HJB equations, following the methods in [1, 3]. The dynamic programming characterization for (P3) is more difficult. This is one of those rare examples in ergodic control where the running cost is not bounded below or above, and there is no blanket stability property. In this paper, we establish the existence of a solution to the HJB equation, and the usual characterization of optimality for this problem.
We first present a Markov scheduling policy for the N-network under which the diffusion-scaled state processes are geometrically ergodic in the Halfin–Whitt regime (see Section 3.2). Unlike the V-model, this scheduling policy is a state-dependent priority (SDP) policy, i.e., priorities change as the system state varies, yet it is simple to describe. This result is significant since it indicates that the ergodic control problems for the diffusion-scaled processes in the Halfin–Whitt regime have finite values. Moreover, it can be used as a scheduling policy outside a bounded subset of the state space where the JWC property might fail to hold. On the other hand, it follows from the theory in Arapostathis and Pang [1] that the controlled diffusion limit is geometrically ergodic under some constant Markov control (see Theorem 4.2 in [1]). In this paper we show that a much stronger result applies for the N-network (Lemma 4.1): as long as the scheduling control is a constant Markov control with pool prioritizing class over , the controlled diffusion limit is geometrically ergodic, uniformly over all routing controls (e.g., class customers prioritizing server pool over , or a state-dependent priority policy, or even a nonstationary one).
The main results of the paper center around the proof of convergence of the value functions, which is accomplished by establishing matching lower and upper bounds (see Theorems 5.1–5.2). To prove the lower bound, the key is to show that as long as the long-run average first-order moment of the diffusion-scaled state process is finite, the associated mean empirical measures are tight and converge to an ergodic occupation measure corresponding to a stationary stable Markov control for the limiting diffusion (Lemma 7.1). In fact, we can show that for the N-network, under any admissible (work conserving) scheduling policy, the long-run average () moment of the diffusion-scaled state process is bounded by the long-run average moment of the diffusion-scaled queue under that policy (Lemma 8.1). The lower bounds can then be deduced from these observations. It is worth noting that in order to establish asymptotic optimality for the fairness problem (P3), we must relax the equality in the constraint and show instead that the constraint is asymptotically feasible.
In order to establish the upper bound, a Markov scheduling policy is synthesized as the concatenation of two policies: a Markov policy induced by the solution of the ergodic control problem for the diffusion limit, which is applied on a bounded subset of the state space where the JWC condition holds, and the SDP policy, which is applied on the complement of this set.
The proof involves the following key components. First, we apply the spatial truncation approximation technique developed in Arapostathis et al. [3] and Arapostathis and Pang [1] for the ergodic control problem for the diffusion limit. This provides us with an optimal continuous precise control. Second, we show that under the concatenation of the Markov scheduling policy induced by this optimal control and the SDP policy, the diffusion-scaled state processes are geometrically ergodic (Lemma 9.1). Then we prove that the mean empirical measures of the diffusion-scaled process and control converge to the ergodic occupation measure of the diffusion limit associated with the optimal precise control originally selected (Lemma 7.2). Uniform integrability implied by the geometric ergodicity takes care of the rest.
1.1. Literature review
In a certain way, the N-network has been viewed as the benchmark of multiclass multipool networks, mainly because it is simple to describe, yet it has complicated enough dynamics. There are several important studies on stochastic control of parallel server networks, focusing on N-networks. Xu et al. [30] studied the Markovian single-server N-network and showed that a threshold scheduling policy is optimal under the expected discounted and long-run average linear holding cost, utilizing a Markov decision process approach. In the conventional (single-server) heavy-traffic regime, the N-network with two single servers was first studied in Harrison [19], under the assumption of Poisson arrivals and deterministic services, and a “discrete-review” policy is shown to be asymptotically optimal under an infinite horizon discounted linear queueing cost. The N-model with renewal arrival processes and general service time distributions was then studied in Bell and Williams [10], as a Brownian control problem under an infinite horizon discounted linear queueing cost, and a threshold policy is shown to be asymptotically optimal. Ghamami and Ward [15] studied the N-network with renewal arrival processes, general service time distributions and exponential patience times, and showed a two-threshold scheduling policy is asymptotically optimal via a Brownian control problem under an infinite horizon discounted linear queueing cost. Brownian control models for multiclass networks were pioneered in Harrison [18, 20] and have been extended to many interesting networks; see Williams [29] for an extensive review of that literature.
In the many-server Halfin–Whitt regime, Atar [6, 7] pioneered the study of multiclass multipool networks with abandonment (of a certain tree topology) via the corresponding control problems for the diffusion limit under an infinite-horizon discounted cost. Gurvich and Whitt [16, 17] have studied queue-and-idleness-ratio controls for multiclass multipool networks (including the N-network) in the Halfin–Whitt regime by establishing a State-Space-Collapse property, under certain assumptions on the network structure and the system parameters. The N-network with many-server pools and abandonment has been recently studied in Tezcan and Dai [26], where a static priority policy is shown to be asymptotically optimal in the Halfin–Whitt regime under a finite-time horizon cost criterion. In Ward and Armony [27], some blind fair routing policies are proposed for some multiclass multipool networks (including the N-network), where the control problems are formulated to minimize the average queueing cost under a fairness constraint on the idleness.
On the other hand, most of the existing results on the stochastic control of multiclass multipool networks in the Halfin–Whitt regime have only considered either discounted cost criteria (Atar [6, 7], Atar et al. [9]) or finite-time horizon cost criteria (Dai and Tezcan [12, 13]). There is only limited work on multiclass networks under ergodic cost criteria. Arapostathis et al. [3] have recently studied the multiclass V-model under ergodic cost in the Halfin–Whitt regime. The inverted V-model is studied in Armony [4], and it is shown that the fastest-server-first policy is asymptotically optimal for minimizing the steady-state expected queue length and waiting time. For the same model, Armony and Ward [5] showed that a threshold policy is asymptotically optimal for minimizing the steady-state expected queue length and waiting time subject to a “fairness” constraint on the workload division. Biswas [11] has recently studied a multiclass multipool network with “help” under an ergodic cost criterion, where each server pool has a dedicated stream of a customer class, and can help with other customer classes only when it has idle servers. The N-network does not belong to the class of models considered in Biswas [11]. For general multiclass multipool networks, Arapostathis and Pang [1] have thoroughly studied ergodic control problems for the limiting diffusion. However, as mentioned earlier, asymptotic optimality has remained open. This work makes a significant contribution in that direction, by studying the N-network. The fairness problem we study fills, in some sense (our formulation is more general), the asymptotic optimality gap in Ward and Armony [27], where the associated approximate diffusion control problems are studied via simulations.
We also feel that this work contributes to the understanding of the stability of multiclass multipool networks in the Halfin–Whitt regime. In this topic, in addition to the stability studies of the V- and N-networks in Gamarnik and Stolyar [14] and Stolyar [23], it is worthwhile mentioning the following relevant work. Stolyar and Yudovina [25] studied the stability of multiclass multipool networks under a load balancing scheduling and routing policy, “longest-queue freest-server” (LQFS-LB). They showed that the fluid limit may be unstable in the vicinity of the equilibrium point for certain network structures and system parameters, and that the sequence of stationary distributions of the diffusion-scaled processes may not be tight in both the underloaded regime and the Halfin–Whitt regime. They also provided positive answers to the stability and exchange-of-limit results in the diffusion scale for one special class of networks. Stolyar and Yudovina [24] proved the tightness of the sequence of stationary distributions of multiclass multipool networks under a leaf activity priority policy (assigning static priorities to the activities in the order of sequential “elimination” of the tree leaves) in the scale ( is the scaling parameter) for all , which was extended to the diffusion scale in Stolyar [23]. The stability/recurrence properties for general multiclass multipool networks under other scheduling policies remain open.
As alluded to above, the main challenge in establishing asymptotic optimality for general multiclass multipool networks is to understand the stochastic stability/recurrence properties of the diffusion-scaled state processes in the Halfin–Whitt regime. Despite the recent developments in [24, 25, 23], these are far from being adequate for proving asymptotic optimality for general multiclass multipool networks. The stochastic stability/recurrence properties may depend critically upon the network topology and/or parameter assumptions. We believe that the methodology developed here for the N-network will provide some important insights on what stochastic stability properties are required and the roles they may play in proving asymptotic optimality.
1.2. Organization of the paper
The notation used in this paper is summarized in Section 1.3. A detailed description of the N-network model is given in Section 2. We define the control objectives in Section 3.1 and present a state-dependent priority policy that is geometrically stable in Section 3.2. We state the corresponding ergodic control problems for the limiting diffusion, as well as the results on the characterization of optimality in Section 4. The asymptotic optimality results are stated in Section 5. We describe the system dynamics and an equivalent control parameterization in Section 6. In Section 7, we establish convergence results for the mean empirical measures for the diffusion-scaled state processes. We then prove the lower and upper bounds in Sections 8 and 9, respectively. The proof of geometric stability of the SDP policy is given in Appendix A, and Appendix B is concerned with the proof of Theorem 4.3.
1.3. Notation
The following notation is used in this paper. The symbol , denotes the field of real numbers, and , , and denote the sets of nonnegative real numbers, natural numbers, and integers, respectively. Given two real numbers and , the minimum (maximum) is denoted by (), respectively. Define and . The integer part of a real number is denoted by . We also let .
For a set , we use , , and to denote the closure, the complement, and the indicator function of , respectively. A ball of radius in around a point is denoted by , or simply as if . The Euclidean norm on is denoted by , denotes the inner product of , and .
For a nonnegative function we let denote the space of functions satisfying . We also let denote the subspace of consisting of those functions satisfying Abusing the notation, and occasionally denote generic members of these sets.
We let denote the set of smooth real-valued functions on with compact support. Given any Polish space , we denote by the set of probability measures on and we endow with the Prokhorov metric. For and a Borel measurable map , we often use the abbreviated notation The quadratic variation of a square integrable martingale is denoted by . For any path of a càdlàg process, we use the notation to denote the jump at time .
2. Model Description
All stochastic variables introduced below are defined on a complete probability space . The expectation w.r.t. is denoted by .
2.1. The N-network model
Consider an N-network with two classes of jobs (or customers) and two server pools, as depicted in Figure 1. Jobs of each class arrive according to a Poisson process with rates , . There are two server pools, each of which has multiple statistically identical servers, and servers in pool can only serve class jobs, while servers in pool can serve both classes of jobs. Let be the number of servers in pool , . The service times of all jobs are exponentially distributed, where jobs of class are served at rates and by servers in pools and , respectively, while jobs of class are served at a rate by servers in pool . Throughout the paper we set , and . Jobs may abandon while waiting in queue, with an exponential patience time with rate for . We study a sequence of such networks indexed by an integer which is the order of the number of servers and let .
Throughout the paper we assume that the parameters satisfy the following conditions.
Assumption 2.1 (Halfin–Whitt Regime).
As , the following hold:
We also have
(2.1) 
Note that (2.1) implies that class jobs are overloaded for server pool , class jobs are underloaded for server pool , and the overload of class jobs can be served by server pool so that both server pools are critically loaded. This assumption is referred to as the complete resource pooling condition (Williams [28], Atar [7]).
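The load conditions described in words above can be checked numerically. The following is a minimal sketch under illustrative assumptions: class 1 is taken to be the class served by both pools, pool 1 the dedicated pool, and the names `lam`, `mu11`, `mu12`, `mu22` and `nu` (fluid-scale arrival rates, service rates and pool sizes) are hypothetical labels introduced here. It is one natural way to express the three statements, not a transcription of (2.1).

```python
def check_resource_pooling(lam, mu11, mu12, mu22, nu, tol=1e-9):
    """Check the three load statements for an N-network in the fluid scale.

    lam = (lam1, lam2): arrival rates; nu = (nu1, nu2): pool sizes.
    All names and the class/pool labeling are illustrative assumptions.
    """
    lam1, lam2 = lam
    nu1, nu2 = nu
    # Class-1 flow that pool 1 cannot absorb at full utilization.
    overflow = lam1 - mu11 * nu1
    # Pool-2 capacity needed to serve class 2 plus the class-1 overflow.
    pool2_needed = overflow / mu12 + lam2 / mu22
    return {
        "class1_overloads_pool1": overflow > 0,
        "class2_underloads_pool2": lam2 < mu22 * nu2,
        "pool2_critically_loaded": abs(pool2_needed - nu2) <= tol,
    }
```

For instance, with unit service rates, `lam = (1.0, 0.5)` and `nu = (0.5, 1.0)`, all three checks pass: pool 1 is saturated by class 1, and pool 2 is exactly filled by class 2 together with the class-1 overflow.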
Let be a constant matrix
(2.2) 
The quantity can be interpreted as the steady-state fraction of service allocation of pool to class jobs in the fluid scale. Define and by
(2.3) 
(2.4) 
Then can be interpreted as the steady-state total number of class jobs, and can be interpreted as the steady-state number of class jobs receiving service in pool , in the fluid scale. It is easy to check that , where .
For each let and be the total number of class jobs in the system and in the queue, respectively. For each , let be the number of idle servers in server pool . For , let be the number of class jobs being served in server pool , and note that . The following fundamental balance equations hold:
(2.5) 
for each . We let , , and analogously define and .
2.2. Scheduling control
We only consider work conserving policies that are nonanticipative and preemptive. Work conservation requires that the processes and satisfy
In other words, no server will idle if there is any job in a queue that the server can serve. Service preemption is allowed, that is, jobs in service at pool can be interrupted and resumed at a later time in order to serve jobs from the other class.
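The balance equations and the work conservation requirement can be illustrated with a short sketch. It assumes, for illustration only, that class 1 may be served by both pools and class 2 only by pool 2, and it uses the hypothetical names `x` (jobs in system), `z` (jobs in service per class-pool pair), `q` (queues) and `y` (idle servers).

```python
def queues_and_idleness(x, z, n_servers):
    """Recover queue lengths and idle servers from the balance equations.

    x = (x1, x2): jobs of each class in the system.
    z = (z11, z12, z22): class-i jobs in service at pool j.
    n_servers = (n1, n2): servers in each pool.
    The class/pool labeling is an assumption made for illustration.
    """
    x1, x2 = x
    z11, z12, z22 = z
    n1, n2 = n_servers
    q = (x1 - z11 - z12, x2 - z22)   # jobs waiting = in system - in service
    y = (n1 - z11, n2 - z12 - z22)   # idle servers = pool size - busy servers
    if min(q + y + tuple(z)) < 0:
        raise ValueError("infeasible allocation")
    return q, y


def is_work_conserving(x, z, n_servers):
    """No server idles while a job it can serve is waiting."""
    q, y = queues_and_idleness(x, z, n_servers)
    # Class 1 can use either pool; class 2 can use pool 2 only.
    return min(q[0], y[0] + y[1]) == 0 and min(q[1], y[1]) == 0
```

Note that a state with class-2 jobs waiting and idle servers in pool 1 is still work conserving in this sense, since those servers cannot serve class 2.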
Let
We define the action set as
Define the σ-fields
where is the collection of all null sets, and
The processes , and are all rate-1 Poisson processes, representing the arrival, service and abandonment quantities, respectively. We assume that they are mutually independent, and also independent of the initial condition . Note that quantities with subscript , are all equal to zero. The filtration represents the information available up to time , and the filtration contains the information about future increments of the processes. We say that a scheduling policy is admissible if

for all ;

is adapted to ;

is independent of at each time ;

for each , and for each , the process agrees in law with , and the process agrees in law with .
We denote the set of all admissible scheduling policies by . Abusing the notation we sometimes denote this as .
Following Atar [7], we also consider a stronger condition, joint work conservation (JWC), for preemptive scheduling policies. Namely, for each , there exists a rearrangement of jobs in service such that there is either no job in queue or no idling server in the system, satisfying
(2.6) 
We let denote the set of all possible values of for which the JWC condition (2.6) holds, i.e.,
Note that the set may not include all possible scenarios of the system state for finite at each time .
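To see concretely why some states fall outside the JWC set, one can enumerate all feasible allocations for a small system and test whether any of them leaves either no job in queue or no idle server. The sketch below is a brute-force check under the same illustrative labeling as before (class 1 served by both pools, class 2 only by pool 2); the function names are hypothetical and the enumeration is only practical for small pool sizes.

```python
from itertools import product


def feasible_allocations(x, n_servers):
    """Yield all (z11, z12, z22) compatible with state x and the pool sizes."""
    x1, x2 = x
    n1, n2 = n_servers
    for z11, z12, z22 in product(range(n1 + 1), range(n2 + 1), range(n2 + 1)):
        if z11 + z12 <= x1 and z22 <= x2 and z12 + z22 <= n2:
            yield z11, z12, z22


def in_jwc_set(x, n_servers):
    """True if some rearrangement gives zero total queue or zero total idleness."""
    total_servers = sum(n_servers)
    for z in feasible_allocations(x, n_servers):
        in_service = sum(z)
        total_queue = sum(x) - in_service
        total_idle = total_servers - in_service
        if min(total_queue, total_idle) == 0:
            return True
    return False
```

With pool sizes `(3, 2)`, the state `x = (1, 4)` is not in the JWC set: at most one class-1 job and two class-2 jobs can be in service, so two class-2 jobs wait while two pool-1 servers idle under every allocation.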
We quote a result from Atar [7], which is used later.
Lemma 2.1 (Lemma 3 in Atar [7]).
There exists a constant such that, the collection of sets defined by
satisfies for all . Moreover, for any satisfying and , we have
(2.7) 
We need the following definition.
Definition 2.1.
We fix some open ball centered at the origin, such that for all . The jointly work conserving action set at is defined as the subset of , which satisfies
We also define the associated admissible policies by
We refer to the policies in as eventually jointly work conserving (EJWC).
Remark 2.1.
The ball is fixed in Definition 2.1 only for convenience. We could instead adopt a more general definition of , without affecting the results of the paper. Let be a collection of domains which covers and satisfies , and for all . Then we redefine using Definition 2.1 and replacing with and define analogously. If , then, in the diffusion scale, JWC holds on an expanding sequence of domains which cover . This is the reason behind the terminology EJWC. The EJWC condition plays a crucial role in the derivation of the controlled diffusion limit. Therefore, convergence of mean empirical measures of the diffusion-scaled state process and control, and thus, also the lower and upper bounds for asymptotic optimality are established for sequences .
3. Ergodic Control Problems
We define the diffusion-scaled processes , , and analogously for and , by
(3.1) 
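As a rough illustration of diffusion scaling, the sketch below applies the centering and normalization that is standard in the Halfin–Whitt regime: subtract n times a fluid-scale steady-state value and divide by the square root of n. The function name and the argument `fluid_center` are assumptions introduced here; queue lengths and idle-server counts would use a zero center under this convention.

```python
import math


def diffusion_scale(value, fluid_center, n):
    """Center a raw count at n * fluid_center and normalize by sqrt(n).

    This is the usual Halfin-Whitt scaling, stated here as an assumption
    about the form of the scaled processes.
    """
    return (value - n * fluid_center) / math.sqrt(n)


# Example: n = 100 servers, fluid steady state of 1.0 jobs per unit of n.
x_hat = diffusion_scale(110, 1.0, 100)   # state: 10 jobs above the center
q_hat = diffusion_scale(30, 0.0, 100)    # queue: centered at zero
```

In this example `x_hat` equals 1.0 and `q_hat` equals 3.0, so deviations of order √n in the raw system appear as order-one quantities after scaling.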
3.1. Control objectives
We consider three control objectives, which address the queueing (delay) and/or idleness costs in the system: (i) unconstrained problem, minimizing the queueing (and idleness) cost, (ii) constrained problem, minimizing the queueing cost while imposing a constraint on idleness, and (iii) fairness problem, minimizing the queueing cost while imposing a constraint on the idleness ratio between the two server pools. The running cost is a function of the diffusion-scaled processes, which are related to the unscaled ones by (3.1). For simplicity, in all three cost minimization problems, we assume that the initial condition is deterministic and as . Let be defined by
(3.2) 
where is a strictly positive vector and is a nonnegative vector. In the case , only the queueing cost is minimized. In (P1) below, idleness may be added as a penalty in the objective. We denote by the expectation operator under an admissible policy .

(P1) (unconstrained problem) The running cost penalizes the queueing (and idleness). Let be the running cost function as defined in (3.2). Given an initial state , and an admissible scheduling policy , we define the diffusion-scaled cost criterion by
(3.3)
The associated cost minimization problem becomes

(P2) (constrained problem) The objective here is to minimize the queueing cost while imposing idleness constraints on the two server pools. Let be the running cost function corresponding to in (3.2) with . The diffusion-scaled cost criterion is defined analogously to (3.3) with running cost , that is,
Also define with . The associated cost minimization problem becomes
(3.4)
where is a positive vector.

(P3) (fairness) Here we minimize the queueing cost while keeping the average idleness of the two server pools balanced. Let be a positive constant and let . The associated cost minimization problem becomes
We refer to , and as the diffusion-scaled optimal values for the system given the initial state , for (P1), (P2) and (P3), respectively.
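The long-run average criteria above can be approximated along a single simulated path by a finite-horizon time average. The sketch below is illustrative only: it assumes a running cost of weighted power type, with hypothetical weight vectors `xi` (queues) and `zeta` (idleness) and exponent `m`, and a piecewise-constant path recorded as jump times and states. It estimates the time average over one path, not the expectation or the limit in the horizon.

```python
def running_cost(q, y, xi, zeta, m=1.0):
    """Weighted power cost on queues q and idleness y (assumed form)."""
    queue_part = sum(w * v ** m for w, v in zip(xi, q))
    idle_part = sum(w * v ** m for w, v in zip(zeta, y))
    return queue_part + idle_part


def time_average_cost(times, states, horizon, cost):
    """Average cost(state) over [0, horizon] for a piecewise-constant path.

    times[k] is the time at which the path enters states[k];
    times must be increasing with times[0] == 0.
    """
    total = 0.0
    for k, (t, s) in enumerate(zip(times, states)):
        if t >= horizon:
            break
        t_next = times[k + 1] if k + 1 < len(times) else horizon
        total += cost(s) * (min(t_next, horizon) - t)
    return total / horizon
```

Setting `zeta` to the zero vector penalizes queueing only, which corresponds to the objective used in the constrained and fairness problems; the idleness constraints would then be evaluated with a separate call using a cost that depends on idleness alone.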
Remark 3.1.
We choose running costs of the form (3.3) mainly to simplify the exposition. However, all the results of this paper still hold for more general classes of functions. Let be a convex function satisfying for some and constants and , and , , , be convex functions that have at most polynomial growth. Then we can choose for the unconstrained problem, and as the functions in the constraints in (3.4) (with ). For the problem (P3) we require in addition that and they are in . The analogous running costs can of course be used in the corresponding control problems for the limiting diffusion, which are presented later in Section 4.2.
3.2. A geometrically stable scheduling policy
We introduce a Markov scheduling policy for the N-network that results in geometric ergodicity for the diffusion-scaled state process, and also implies that the diffusion-scaled cost in the ergodic control problem (P1) is bounded, uniformly in . Let and . Note that .
Definition 3.1.
For each , we define the scheduling policy , , by
Note that the scheduling policy is state-dependent, and can be interpreted as follows. Class jobs prioritize server pool over . Server pool prioritizes the two classes of jobs depending on the system state. Whenever , server pool allocates no more than servers to class jobs, while whenever , it allocates no more than servers to class jobs. It is easy to check that this policy is work conserving. The resulting queue length and idleness and can be obtained by the balance equations: for ,
The generator of the state process under a scheduling policy takes the form
(3.7) 
for . We can write the generator of the diffusion-scaled state process using (3.7) and the function in Definition 3.1 as
(3.8) 
We have the following.
Proposition 3.1.
Let denote the diffusion-scaled state process under the scheduling policy in Definition 3.1, and be its generator. For any , there exists , such that
(3.9) 
for some positive constants , , and , which depend on and . Namely, under the scheduling policy is geometrically ergodic. As a consequence, for any , there exists such that
(3.10) 
and the same holds if we replace with or in (3.10). In other words, the diffusion-scaled cost criterion is finite for .
Proof.
See Appendix A. ∎
Remark 3.2.
We remark that given (3.10) for , the same property may not hold for or . It always holds if a scheduling policy satisfies the JWC condition (by the balance equation (6.5)). Otherwise, that property needs to be verified under the given scheduling policy. It can easily be checked that if the property holds for any two processes of , and , then it also holds for the third.
4. Ergodic Control of the Limiting Diffusion
4.1. The controlled diffusion limit
If the action space is , or equivalently , the convergence in distribution of the diffusion-scaled processes to the limiting diffusion in (4.1) is shown in Proposition 3 in Atar [7]. For the class of multiclass multipool networks, the drift of the limiting diffusion is given implicitly via a linear map in Proposition 3 of Atar [7]. For the N-network, the drift can be explicitly expressed as we show below in (4.4). In Arapostathis and Pang [1], a leaf elimination algorithm has been developed to provide an explicit expression for the drift of the limiting diffusion of general multiclass multipool networks. In the case of the N-network, the limit process is an dimensional diffusion satisfying the Itô equation
(4.1) 
with initial condition and the control , where
(4.2) 
In (4.1), the process is a dimensional standard Wiener process independent of the initial condition .
Following the leaf elimination algorithm for the N-network, the drift of the diffusion can be computed as follows. Let
(4.3) 
Then the drift takes the form
which can also be written as (see Lemma 4.3 and Section 4.2 in [1])