# Infinite Horizon Average Optimality of the N-network Queueing Model in the Halfin–Whitt Regime

Ari Arapostathis, Department of Electrical and Computer Engineering, The University of Texas at Austin, 2501 Speedway St., EER 7.824, Austin, TX 78712

Guodong Pang, The Harold and Inge Marcus Dept. of Industrial and Manufacturing Eng., College of Engineering, Pennsylvania State University, University Park, PA 16802

###### Abstract.

We study the infinite horizon optimal control problem for N-network queueing systems, which consist of two customer classes and two server pools, under average (ergodic) criteria in the Halfin–Whitt regime. We consider three control objectives: 1) minimizing the queueing (and idleness) cost, 2) minimizing the queueing cost while imposing a constraint on idleness at each server pool, and 3) minimizing the queueing cost while requiring fairness on idleness. The running costs can be any nonnegative convex functions having at most polynomial growth.

For all three problems we establish asymptotic optimality, namely, the convergence of the value functions of the diffusion-scaled state process to the corresponding values of the controlled diffusion limit. We also present a simple state-dependent priority scheduling policy under which the diffusion-scaled state process is geometrically ergodic in the Halfin–Whitt regime, and some results on convergence of mean empirical measures which facilitate the proofs.

###### Key words and phrases:
parallel-server network, N-network, reneging/abandonment, Halfin–Whitt (QED) regime, diffusion scaling, long time average control, ergodic control, ergodic control with constraints, geometric ergodicity, stable Markov optimal control, asymptotic optimality
###### 2000 Mathematics Subject Classification:
60K25, 68M20, 90B22, 90B36

## 1. Introduction

Parallel server networks in the Halfin–Whitt regime have been very actively studied in recent years. Many important insights have been gained in their performance, design and control. One important question that has mostly remained open is optimal control under the long-run average expected cost (ergodic) criterion. Since it is prohibitive to exactly solve the discrete state Markov decision problem, the plausible approach is to solve the control problem for the limiting diffusion in the Halfin–Whitt regime and use this as an approximation. However, the results in the existing literature for ergodic control of diffusions (see a good review in Arapostathis et al. [2]) cannot be directly applied to the class of diffusion models arising from the parallel server networks in the Halfin–Whitt regime. Recently, Arapostathis et al. [3] and Arapostathis and Pang [1] have developed the basic tools needed to tackle this class of ergodic control problems.

Given an optimal solution to the control problem for the diffusion limit, the important task that remains is to show it gives rise to a scheduling policy for the network and establish that any sequence of such scheduling policies is asymptotically optimal in the Halfin–Whitt regime. Under the discounted cost criterion, this task has been accomplished in Atar et al. [8] for the multiclass V-model (or V-network), which consists of multiple customer classes that are catered by servers in a single pool, and in Atar [7] for multiclass multi-pool networks with certain tree topologies. Under the ergodic criterion, the problem becomes much more difficult because it is intertwined with questions concerning the ergodicity of the diffusion-scaled state process under the scheduling policies. This relates to various open questions on the stochastic stability of parallel server networks in the Halfin–Whitt regime.

Stability of the multiclass V-model in the Halfin–Whitt regime is well treated in Gamarnik and Stolyar [14]. Stolyar [23] has recently proved the tightness of the stationary distributions of the diffusion-scaled state process for the so-called N-network (or N-model), depicted in Figure 1, with no abandonment under a static priority policy. For the V-network, Arapostathis et al. [3] have shown that a sequence of scheduling policies constructed from the optimal solution to the diffusion control problem under the ergodic criterion is asymptotically optimal. In this construction, the state space is divided into a compact subset with radius in the order of the square root of the number of servers around the steady state, and its complement. An approximation to the optimal control for the diffusion is used inside this set, and a static priority policy is employed in its complement. It follows from the results of [3] that under this sequence of scheduling policies the state process is geometrically ergodic. The proof of asymptotic optimality takes advantage of the fact that, under the static priority scheduling policy, the state process of the V-model in the Halfin–Whitt regime is geometrically ergodic. In fact, such a static priority policy for the V-model also corresponds to a constant Markov control, under which the limiting diffusion is geometrically ergodic.

However, for multiclass multi-pool networks, although the optimal control problem for the limiting diffusion has been thoroughly solved in Arapostathis and Pang [1], the lack of sufficient understanding of the stochastic stability properties of the diffusion-scaled state process has been the critical obstacle to establishing asymptotic optimality. It is worth noting that this difficulty is related to the so-called “joint work conservation” (JWC) condition which plays a key role in the study of multiclass multi-pool networks as shown in Atar [6, 7]. Although the JWC condition holds for the limiting diffusions over the entire state space, it generally holds only in a bounded subset of the state space for the diffusion-scaled process, whose radius is in the order of the number of servers around the steady state. Thus, an optimal control derived from the limiting diffusion does not translate well to a scheduling policy which is compatible with the controlled dynamics of the network on the entire state space. At the same time, although as shown in [1] there exists a constant Markov control under which the limiting diffusion of multiclass multi-pool networks is geometrically ergodic, it is unclear if this is also the case for the diffusion-scaled state processes under the corresponding static priority scheduling policy. Therefore, the limiting diffusion does not offer much help in the synthesis of a suitable scheduling policy on the part of the state space where the JWC condition does not hold, and as a result constructing stable policies for multiclass multi-pool networks is quite a challenge.

In this paper, we address these challenging problems for the N-network. We study three ergodic control problems: (P1) minimizing the queueing (and idleness) cost, (P2) minimizing the queueing cost while imposing a constraint on the idleness of each server pool (e.g., the long-run average idleness cannot exceed a specified threshold), and (P3) minimizing the queueing cost while requiring fairness on idleness (e.g., the average idleness of the two server pools satisfies a fixed ratio condition). The running cost can be any nontrivial nonnegative convex function having at most polynomial growth. Under its usual parameterization, the control specifies the number of customers from each class that are scheduled to each server pool, and we refer to it as a “scheduling” policy. However, the control can also be parameterized so as to specify which class of customers should be scheduled to server pool 2 if it has any available servers (“scheduling” control), and to which of the server pools class-1 customers should be routed if both pools have available servers (“routing” control). The optimal control problems for the limiting diffusion corresponding to (P1)–(P3) are well-posed, and in the case of (P1)–(P2) the solutions can be fully characterized via HJB equations, following the methods in [1, 3]. The dynamic programming characterization for (P3) is more difficult. This is one of those rare examples in ergodic control where the running cost is not bounded below or above, and there is no blanket stability property. In this paper, we establish the existence of a solution to the HJB equation, and the usual characterization of optimality for this problem.

We first present a Markov scheduling policy for the N-network under which the diffusion-scaled state processes are geometrically ergodic in the Halfin–Whitt regime (see Section 3.2). Unlike the V-model, this scheduling policy is a state-dependent priority (SDP) policy, i.e., priorities change as the system state varies, yet it is simple to describe. This result is significant since it indicates that the ergodic control problems for the diffusion-scaled processes in the Halfin–Whitt regime have finite values. Moreover, it can be used as a scheduling policy outside a bounded subset of the state space where the JWC property might fail to hold. On the other hand, it follows from the theory in Arapostathis and Pang [1] that the controlled diffusion limit is geometrically ergodic under some constant Markov control (see Theorem 4.2 in [1]). In this paper we show that a much stronger result applies for the N-network (Lemma 4.1): as long as the scheduling control is a constant Markov control with pool  prioritizing class over , the controlled diffusion limit is geometrically ergodic, uniformly over all routing controls (e.g., class- customers prioritizing server pool over , or a state-dependent priority policy, or even a non-stationary one).

The main results of the paper center around the proof of convergence of the value functions, which is accomplished by establishing matching lower and upper bounds (see Theorems 5.1 and 5.2). To prove the lower bound, the key is to show that as long as the long-run average first-order moment of the diffusion-scaled state process is finite, the associated mean empirical measures are tight and converge to an ergodic occupation measure corresponding to a stationary stable Markov control for the limiting diffusion (Lemma 7.1). In fact, we can show that for the N-network, under any admissible (work conserving) scheduling policy, the long-run average $m$th ($m \ge 1$) moment of the diffusion-scaled state process is bounded by the long-run average $m$th moment of the diffusion-scaled queue under that policy (Lemma 8.1). The lower bounds can then be deduced from these observations. It is worth noting that in order to establish asymptotic optimality for the fairness problem (P3), we must relax the equality in the constraint and show instead that the constraint is asymptotically feasible.

In order to establish the upper bound, a Markov scheduling policy is synthesized which is the concatenation of a Markov policy induced by the solution of the ergodic control problem for the diffusion limit, and which is applied on a bounded subset of the state space where the JWC condition holds, and the SDP policy, which is applied on the complement of this set.

The proof involves the following key components. First, we apply the spatial truncation approximation technique developed in Arapostathis et al. [3] and Arapostathis and Pang [1] for the ergodic control problem for the diffusion limit. This provides us with an $\epsilon$-optimal continuous precise control. Second, we show that under the concatenation of the Markov scheduling policy induced by this $\epsilon$-optimal control and the SDP policy, the diffusion-scaled state processes are geometrically ergodic (Lemma 9.1). Then we prove that the mean empirical measures of the diffusion-scaled process and control converge to the ergodic occupation measure of the diffusion limit associated with the $\epsilon$-optimal precise control originally selected (Lemma 7.2). Uniform integrability implied by the geometric ergodicity takes care of the rest.

### 1.1. Literature review

In a certain way, the N-network has been viewed as the benchmark of multiclass multi-pool networks, mainly because it is simple to describe, yet it has complicated enough dynamics. There are several important studies on stochastic control of parallel server networks, focusing on N-networks. Xu et al. [30] studied the Markovian single-server N-network and showed that a threshold scheduling policy is optimal under the expected discounted and long-run average linear holding cost, utilizing a Markov decision process approach. In the conventional (single-server) heavy-traffic regime, the N-network with two single servers was first studied in Harrison [19], under the assumption of Poisson arrivals and deterministic services, and a “discrete-review” policy is shown to be asymptotically optimal under an infinite horizon discounted linear queueing cost. The N-model with renewal arrival processes and general service time distributions was then studied in Bell and Williams [10], as a Brownian control problem under an infinite horizon discounted linear queueing cost, and a threshold policy is shown to be asymptotically optimal. Ghamami and Ward [15] studied the N-network with renewal arrival processes, general service time distributions and exponential patience times, and showed that a two-threshold scheduling policy is asymptotically optimal via a Brownian control problem under an infinite horizon discounted linear queueing cost. Brownian control models for multiclass networks were pioneered in Harrison [18, 20] and have been extended to many interesting networks; see Williams [29] for an extensive review of that literature.

In the many-server Halfin–Whitt regime, Atar [6, 7] pioneered the study of multiclass multi-pool networks with abandonment (of a certain tree topology) via the corresponding control problems for the diffusion limit under an infinite-horizon discounted cost. Gurvich and Whitt [16, 17] have studied queue-and-idleness-ratio controls for multiclass multi-pool networks (including the N-network) in the Halfin–Whitt regime by establishing a State-Space-Collapse property, under certain assumptions on the network structure and the system parameters. The N-network with many-server pools and abandonment has been recently studied in Tezcan and Dai [26], where a static priority policy is shown to be asymptotically optimal in the Halfin–Whitt regime under a finite-time horizon cost criterion. In Ward and Armony [27], some blind fair routing policies are proposed for some multiclass multi-pool networks (including the N-network), where the control problems are formulated to minimize the average queueing cost under a fairness constraint on the idleness.

On the other hand, most of the existing results on the stochastic control of multiclass multi-pool networks in the Halfin–Whitt regime have only considered either discounted cost criteria (Atar [6, 7], Atar et al. [9]) or finite-time horizon cost criteria (Dai and Tezcan [12, 13]). There is only limited work on multiclass networks under ergodic cost criteria. Arapostathis et al. [3] have recently studied the multiclass V-model under ergodic cost in the Halfin–Whitt regime. The inverted V-model is studied in Armony [4], and it is shown that the fastest-server-first policy is asymptotically optimal for minimizing the steady-state expected queue length and waiting time. For the same model, Armony and Ward [5] showed that a threshold policy is asymptotically optimal for minimizing the steady-state expected queue length and waiting time subject to a “fairness” constraint on the workload division. Biswas [11] has recently studied a multiclass multi-pool network with “help” under an ergodic cost criterion, where each server pool has a dedicated stream of a customer class, and can help with other customer classes only when it has idle servers. The N-network does not belong to the class of models considered in Biswas [11]. For general multiclass multi-pool networks, Arapostathis and Pang [1] have thoroughly studied ergodic control problems for the limiting diffusion. However, as mentioned earlier, asymptotic optimality has remained open. This work makes a significant contribution in that direction, by studying the N-network. The fairness problem we study fills, in some sense (our formulation is more general), the asymptotic optimality gap in Ward and Armony [27], where the associated approximate diffusion control problems are studied via simulations.

We also feel that this work contributes to the understanding of the stability of multiclass multi-pool networks in the Halfin–Whitt regime. In this topic, in addition to the stability studies of the V and N-networks in Gamarnik and Stolyar [14] and Stolyar [23], it is worthwhile mentioning the following relevant work. Stolyar and Yudovina [25] studied the stability of multiclass multi-pool networks under a load balancing scheduling and routing policy, “longest-queue freest-server” (LQFS-LB). They showed that the fluid limit may be unstable in the vicinity of the equilibrium point for certain network structures and system parameters, and that the sequence of stationary distributions of the diffusion-scaled processes may not be tight in both the underloaded regime and the Halfin–Whitt regime. They also provided positive answers to the stability and exchange-of-limit results in the diffusion scale for one special class of networks. Stolyar and Yudovina [24] proved the tightness of the sequence of stationary distributions of multiclass multi-pool networks under a leaf activity priority policy (assigning static priorities to the activities in the order of sequential “elimination” of the tree leaves) in the scale ( is the scaling parameter) for all , which was extended to the diffusion scale in Stolyar [23]. The stability/recurrence properties for general multiclass multi-pool networks under other scheduling policies remain open.

As alluded to above, the main challenge in establishing asymptotic optimality for general multiclass multi-pool networks is to understand the stochastic stability/recurrence properties of the diffusion-scaled state processes in the Halfin–Whitt regime. Despite the recent developments in [24, 25, 23], these are far from being adequate for proving asymptotic optimality for general multiclass multi-pool networks. The stochastic stability/recurrence properties may depend critically upon the network topology and/or parameter assumptions. We believe that the methodology developed here for the N-network will provide some important insights on what stochastic stability properties are required and the roles they may play in proving asymptotic optimality.

### 1.2. Organization of the paper

The notation used in this paper is summarized in Section 1.3. A detailed description of the N-network model is given in Section 2. We define the control objectives in Section 3.1 and present a state-dependent priority policy that is geometrically stable in Section 3.2. We state the corresponding ergodic control problems for the limiting diffusion, as well as the results on the characterization of optimality in Section 4. The asymptotic optimality results are stated in Section 5. We describe the system dynamics and an equivalent control parameterization in Section 6. In Section 7, we establish convergence results for the mean empirical measures for the diffusion-scaled state processes. We then prove the lower and upper bounds in Sections 8 and 9, respectively. The proof of geometric stability of the SDP policy is given in Appendix A, and Appendix B is concerned with the proof of Theorem 4.3.

### 1.3. Notation

The following notation is used in this paper. The symbol $\mathbb{R}$ denotes the field of real numbers, and $\mathbb{R}_+$, $\mathbb{N}$, and $\mathbb{Z}$ denote the sets of nonnegative real numbers, natural numbers, and integers, respectively. Given two real numbers $a$ and $b$, the minimum (maximum) is denoted by $a \wedge b$ ($a \vee b$), respectively. Define $a^+ := a \vee 0$ and $a^- := -(a \wedge 0)$. The integer part of a real number $a$ is denoted by $\lfloor a \rfloor$. We also let $e := (1, 1)$.

For a set , we use , , and to denote the closure, the complement, and the indicator function of , respectively. A ball of radius in around a point is denoted by , or simply as if . The Euclidean norm on is denoted by , denotes the inner product of , and .

For a nonnegative function we let denote the space of functions satisfying . We also let denote the subspace of consisting of those functions satisfying Abusing the notation, and occasionally denote generic members of these sets.

We let denote the set of smooth real-valued functions on with compact support. Given any Polish space , we denote by the set of probability measures on and we endow with the Prokhorov metric. For and a Borel measurable map , we often use the abbreviated notation The quadratic variation of a square integrable martingale is denoted by . For any path of a càdlàg process, we use the notation to denote the jump at time .

## 2. Model Description

All stochastic variables introduced below are defined on a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The expectation w.r.t. $\mathbb{P}$ is denoted by $\mathbb{E}$.

### 2.1. The N-network model

Consider an N-network with two classes of jobs (or customers) and two server pools, as depicted in Figure 1. Jobs of each class arrive according to a Poisson process with rates $\lambda^n_i$, $i = 1, 2$. There are two server pools, each of which has multiple statistically identical servers; servers in pool 1 can only serve class-1 jobs, while servers in pool 2 can serve both classes of jobs. Let $N^n_j$ be the number of servers in pool $j$, $j = 1, 2$. The service times of all jobs are exponentially distributed, where jobs of class 1 are served at rates $\mu^n_{11}$ and $\mu^n_{12}$ by servers in pools 1 and 2, respectively, while jobs of class 2 are served at a rate $\mu^n_{22}$ by servers in pool 2. Throughout the paper we set , and . Jobs may abandon while waiting in queue, with an exponential patience time with rate $\gamma^n_i$ for $i = 1, 2$. We study a sequence of such networks indexed by an integer $n$ which is the order of the number of servers and let $n \to \infty$.

Throughout the paper we assume that the parameters satisfy the following conditions.

###### Assumption 2.1.

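(Halfin–Whitt Regime) As $n \to \infty$, the following hold:

$$
\begin{aligned}
&\frac{\lambda^n_i}{n} \to \lambda_i > 0, \qquad \frac{\lambda^n_i - n\lambda_i}{\sqrt{n}} \to \hat{\lambda}_i, \qquad \gamma^n_i \to \gamma_i \ge 0, \qquad i = 1, 2, \\
&\frac{N^n_j}{n} \to \nu_j > 0, \qquad \sqrt{n}\,\bigl(n^{-1} N^n_j - \nu_j\bigr) \to 0, \qquad j = 1, 2, \\
&\mu^n_{ij} \to \mu_{ij} > 0, \qquad \sqrt{n}\,\bigl(\mu^n_{ij} - \mu_{ij}\bigr) \to \hat{\mu}_{ij}, \qquad i, j = 1, 2.
\end{aligned}
$$

We also have

$$
\lambda_1 > \mu_{11}\nu_1, \qquad \frac{\lambda_1 - \mu_{11}\nu_1}{\mu_{12}\nu_2} + \frac{\lambda_2}{\mu_{22}\nu_2} = 1. \tag{2.1}
$$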

Note that (2.1) implies that class-1 jobs are overloaded for server pool 1, class-2 jobs are underloaded for server pool 2, and the overload of class-1 jobs can be served by server pool 2 so that both server pools are critically loaded. This assumption is referred to as the complete resource pooling condition (Williams [28], Atar [7]).
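As an illustration only, the following Python sketch builds one parameter sequence that is consistent with the limits in Assumption 2.1. The function name `nth_system` and the particular choices (exact square-root perturbations, rounding of the pool sizes, a constant abandonment rate) are our assumptions; the assumption itself only requires the stated limits.

```python
import math

def nth_system(n, lam, lam_hat, mu, mu_hat, nu, gamma):
    """Return parameters of the n-th network (hypothetical helper).

    lam, lam_hat, nu, gamma are pairs; mu and mu_hat are dicts keyed by
    "11", "12", "22". The construction is one convenient choice, not the
    only one compatible with Assumption 2.1.
    """
    root = math.sqrt(n)
    return {
        # (lam_n - n * lam) / sqrt(n) equals lam_hat by construction
        "lam": tuple(n * l + root * lh for l, lh in zip(lam, lam_hat)),
        # sqrt(n) * (mu_n - mu) equals mu_hat by construction
        "mu": {ij: mu[ij] + mu_hat[ij] / root for ij in mu},
        # rounding error is at most 1/2, so sqrt(n) * (N_n / n - nu) -> 0
        "N": tuple(round(n * v) for v in nu),
        # a constant sequence trivially converges to gamma
        "gamma": tuple(gamma),
    }

params = nth_system(
    10_000,
    lam=(1.5, 0.5), lam_hat=(-0.3, 0.1),
    mu={"11": 1.0, "12": 1.0, "22": 1.0},
    mu_hat={"11": 0.0, "12": 0.2, "22": 0.0},
    nu=(1.0, 1.0), gamma=(0.5, 0.5),
)
```

The limiting rates in this example satisfy (2.1), since $(1.5 - 1)/1 + 0.5/1 = 1$.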

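Let $\xi^* = (\xi^*_{ij})$ be the constant matrix

$$
\xi^* := \begin{bmatrix} 1 & \dfrac{\lambda_1 - \mu_{11}\nu_1}{\mu_{12}\nu_2} \\[2ex] 0 & \dfrac{\lambda_2}{\mu_{22}\nu_2} \end{bmatrix}. \tag{2.2}
$$

The quantity $\xi^*_{ij}$ can be interpreted as the steady-state fraction of service allocation of pool $j$ to class-$i$ jobs in the fluid scale. Define $x^* = (x^*_1, x^*_2)$ and $z^* = (z^*_{ij})$ by

$$
x^*_1 := \xi^*_{11}\nu_1 + \xi^*_{12}\nu_2, \qquad x^*_2 := \xi^*_{22}\nu_2, \tag{2.3}
$$

$$
z^* = (z^*_{ij}) := (\xi^*_{ij}\nu_j) = \begin{bmatrix} \nu_1 & \dfrac{\lambda_1 - \mu_{11}\nu_1}{\mu_{12}} \\[2ex] 0 & \dfrac{\lambda_2}{\mu_{22}} \end{bmatrix}. \tag{2.4}
$$

Then $x^*_i$ can be interpreted as the steady-state total number of class-$i$ jobs, and $z^*_{ij}$ can be interpreted as the steady-state number of class-$i$ jobs receiving service in pool $j$, in the fluid scale. It is easy to check that $e \cdot x^* = e \cdot \nu$, where $\nu := (\nu_1, \nu_2)$.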
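The fluid quantities in (2.2)–(2.4) are simple to compute. The sketch below is a hypothetical helper (the name `fluid_equilibrium` is ours) that uses exact rational arithmetic so that the equality in (2.1) can be tested without rounding error.

```python
from fractions import Fraction

def fluid_equilibrium(lam, mu11, mu12, mu22, nu):
    """Return (xi_star, x_star, z_star) from (2.2)-(2.4).

    Raises ValueError if the limiting parameters violate (2.1).
    """
    lam1, lam2 = (Fraction(v) for v in lam)
    nu1, nu2 = (Fraction(v) for v in nu)
    mu11, mu12, mu22 = Fraction(mu11), Fraction(mu12), Fraction(mu22)

    if not lam1 > mu11 * nu1:
        raise ValueError("(2.1) requires lambda_1 > mu_11 * nu_1")
    xi12 = (lam1 - mu11 * nu1) / (mu12 * nu2)
    xi22 = lam2 / (mu22 * nu2)
    if xi12 + xi22 != 1:
        raise ValueError("(2.1) requires xi*_12 + xi*_22 = 1")

    xi_star = ((Fraction(1), xi12), (Fraction(0), xi22))
    # z*_ij = xi*_ij * nu_j
    z_star = ((nu1, xi12 * nu2), (Fraction(0), xi22 * nu2))
    # x*_1 = z*_11 + z*_12 and x*_2 = z*_22
    x_star = (z_star[0][0] + z_star[0][1], z_star[1][1])
    return xi_star, x_star, z_star

xi_star, x_star, z_star = fluid_equilibrium(
    (Fraction(3, 2), Fraction(1, 2)), 1, 1, 1, (1, 1)
)
```

For these illustrative values the total fluid content $x^*_1 + x^*_2$ equals $\nu_1 + \nu_2 = 2$, which follows from (2.1) because $\xi^*_{12} + \xi^*_{22} = 1$.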

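For each $i = 1, 2$, let $X^n_i(t)$ and $Q^n_i(t)$ be the total number of class-$i$ jobs in the system and in the queue, respectively, at time $t$. For each $j = 1, 2$, let $Y^n_j(t)$ be the number of idle servers in server pool $j$. For $i, j = 1, 2$, let $Z^n_{ij}(t)$ be the number of class-$i$ jobs being served in server pool $j$, and note that $Z^n_{21} \equiv 0$. The following fundamental balance equations hold:

$$
\begin{aligned}
X^n_1(t) &= Q^n_1(t) + Z^n_{11}(t) + Z^n_{12}(t), &\qquad N^n_1 &= Y^n_1(t) + Z^n_{11}(t), \\
X^n_2(t) &= Q^n_2(t) + Z^n_{22}(t), &\qquad N^n_2 &= Y^n_2(t) + Z^n_{12}(t) + Z^n_{22}(t),
\end{aligned} \tag{2.5}
$$

for each $t \ge 0$. We let $X^n = (X^n_1, X^n_2)$, $Q^n = (Q^n_1, Q^n_2)$, and analogously define $Y^n$ and $Z^n$.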

### 2.2. Scheduling control

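We only consider work conserving policies that are non-anticipative and preemptive. Work conservation requires that the processes $Q^n$ and $Y^n$ satisfy

$$
Q^n_1(t) \wedge Y^n_j(t) = 0 \quad \forall\, j = 1, 2, \quad \text{and} \quad Q^n_2(t) \wedge Y^n_2(t) = 0, \qquad \forall\, t \ge 0.
$$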

In other words, no server will idle if there is any job in a queue that the server can serve. Service preemption is allowed, that is, jobs in service at pool  can be interrupted and resumed at a later time in order to serve jobs from the other class.

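Let

$$
\begin{aligned}
q_1(x, z) &:= x_1 - z_{11} - z_{12}, &\qquad y^n_1(x, z) &:= N^n_1 - z_{11}, \\
q_2(x, z) &:= x_2 - z_{22}, &\qquad y^n_2(x, z) &:= N^n_2 - z_{12} - z_{22}.
\end{aligned}
$$

We define the action set $\mathcal{Z}^n(x)$ as

$$
\begin{aligned}
\mathcal{Z}^n(x) := \bigl\{ z \in \mathbb{Z}^{2 \times 2}_+ :\ & z_{21} = 0, \quad q_1(x, z) \wedge q_2(x, z) \wedge y^n_1(x, z) \wedge y^n_2(x, z) \ge 0, \\
& q_1(x, z) \wedge \bigl(y^n_1(x, z) + y^n_2(x, z)\bigr) = 0, \quad q_2(x, z) \wedge y^n_2(x, z) = 0 \bigr\}.
\end{aligned}
$$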

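Define the $\sigma$-fields

$$
\begin{aligned}
\mathcal{F}^n_t &:= \sigma\bigl\{X^n(0),\ \tilde{A}^n_i(s),\ \tilde{S}^n_{ij}(s),\ \tilde{R}^n_i(s) : i, j = 1, 2,\ 0 \le s \le t\bigr\} \vee \mathcal{N}, \\
\mathcal{G}^n_t &:= \sigma\bigl\{\delta\tilde{A}^n_i(t, r),\ \delta\tilde{S}^n_{ij}(t, r),\ \delta\tilde{R}^n_i(t, r) : i, j = 1, 2,\ r \ge 0\bigr\},
\end{aligned}
$$

where $\mathcal{N}$ is the collection of all $\mathbb{P}$-null sets, and

$$
\begin{aligned}
\tilde{A}^n_i(t) &:= A^n_i(\lambda^n_i t), &\quad \delta\tilde{A}^n_i(t, r) &:= \tilde{A}^n_i(t + r) - \tilde{A}^n_i(t), \\
\tilde{S}^n_{ij}(t) &:= S^n_{ij}\Bigl(\mu^n_{ij} \int_0^t Z^n_{ij}(s)\,ds\Bigr), &\quad \delta\tilde{S}^n_{ij}(t, r) &:= S^n_{ij}\Bigl(\mu^n_{ij} \int_0^t Z^n_{ij}(s)\,ds + \mu^n_{ij} r\Bigr) - \tilde{S}^n_{ij}(t), \\
\tilde{R}^n_i(t) &:= R^n_i\Bigl(\gamma^n_i \int_0^t Q^n_i(s)\,ds\Bigr), &\quad \delta\tilde{R}^n_i(t, r) &:= R^n_i\Bigl(\gamma^n_i \int_0^t Q^n_i(s)\,ds + \gamma^n_i r\Bigr) - \tilde{R}^n_i(t).
\end{aligned}
$$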

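The processes $A^n_i$, $S^n_{ij}$ and $R^n_i$ are all rate-1 Poisson processes, representing the arrival, service and abandonment quantities, respectively. We assume that they are mutually independent, and also independent of the initial condition $X^n(0)$. Note that quantities with subscript $i = 2$, $j = 1$ are all equal to zero. The filtration $\mathcal{F}^n_t$ represents the information available up to time $t$, and the filtration $\mathcal{G}^n_t$ contains the information about future increments of the processes. We say that a scheduling policy $Z^n$ is admissible if

1. $Z^n(t) \in \mathcal{Z}^n(X^n(t))$ for all $t \ge 0$;

2. $Z^n(t)$ is adapted to $\mathcal{F}^n_t$;

3. $\mathcal{F}^n_t$ is independent of $\mathcal{G}^n_t$ at each time $t \ge 0$;

4. for each $i, j = 1, 2$, and for each $t \ge 0$, the process $\delta\tilde{S}^n_{ij}(t, \cdot)$ agrees in law with $S^n_{ij}(\mu^n_{ij}\,\cdot)$, and the process $\delta\tilde{R}^n_i(t, \cdot)$ agrees in law with $R^n_i(\gamma^n_i\,\cdot)$.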

We denote the set of all admissible scheduling policies by . Abusing the notation we sometimes denote this as .

Following Atar [7], we also consider a stronger condition, joint work conservation (JWC), for preemptive scheduling policies. Namely, for each , there exists a rearrangement of jobs in service such that there is either no job in queue or no idling server in the system, satisfying

$$
e \cdot q(x, z) \wedge e \cdot y^n(x, z) = 0. \tag{2.6}
$$

We let $\mathcal{X}^n$ denote the set of all possible values of $x$ for which the JWC condition (2.6) holds, i.e.,

$$
\mathcal{X}^n := \bigl\{x \in \mathbb{Z}^2_+ : \text{(2.6) holds for some } z \in \mathcal{Z}^n(x)\bigr\}.
$$

Note that the set $\mathcal{X}^n$ may not include all possible scenarios of the system state $X^n(t)$ for finite $n$ at each time $t \ge 0$.
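For small pool sizes, the action set and the JWC condition (2.6) can be checked by brute-force enumeration. The following Python sketch is only illustrative: the function names are ours, actions are stored as triples $(z_{11}, z_{12}, z_{22})$ with $z_{21} = 0$ left implicit, and exhaustive search is used purely for transparency.

```python
from itertools import product

def queues_and_idleness(x, z, N):
    """Balance equations: queues q and idle servers y for state x and action z."""
    z11, z12, z22 = z
    q = (x[0] - z11 - z12, x[1] - z22)
    y = (N[0] - z11, N[1] - z12 - z22)
    return q, y

def action_set(x, N):
    """Enumerate the work conserving actions (z11, z12, z22) at state x."""
    actions = []
    for z in product(range(N[0] + 1), range(N[1] + 1), range(N[1] + 1)):
        q, y = queues_and_idleness(x, z, N)
        if min(q + y) < 0:
            continue  # queues and idleness must be nonnegative
        # class 1 may use both pools, class 2 only pool 2
        if min(q[0], y[0] + y[1]) == 0 and min(q[1], y[1]) == 0:
            actions.append(z)
    return actions

def jwc_holds(x, N):
    """True if some admissible action satisfies (2.6) at state x."""
    for z in action_set(x, N):
        q, y = queues_and_idleness(x, z, N)
        if min(q[0] + q[1], y[0] + y[1]) == 0:
            return True
    return False

N = (2, 3)
print(action_set((0, 5), N), jwc_holds((0, 5), N))
```

With pool sizes $(2, 3)$ and state $(0, 5)$, the only admissible action serves three class-2 jobs in pool 2, leaving two class-2 jobs queued while both servers of pool 1 idle, so (2.6) fails at this state even though the action is work conserving.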

We quote a result from Atar [7], which is used later.

###### Lemma 2.1 (Lemma 3 in Atar [7]).

There exists a constant $c_0 > 0$ such that the collection of sets $\breve{\mathcal{X}}^n$ defined by

$$
\breve{\mathcal{X}}^n := \bigl\{x \in \mathbb{Z}^2_+ : \|x - n x^*\| \le c_0\, n\bigr\},
$$

satisfies for all . Moreover, for any satisfying and , we have

$$
\begin{bmatrix} N^n_1 - y_1 & x_1 - q_1 - (N^n_1 - y_1) \\ 0 & x_2 - q_2 \end{bmatrix} \in \mathcal{Z}^n(x). \tag{2.7}
$$

We need the following definition.

###### Definition 2.1.

We fix some open ball centered at the origin, such that for all . The jointly work conserving action set at is defined as the subset of , which satisfies

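$$
\breve{\mathcal{Z}}^n(x) := \begin{cases} \bigl\{z \in \mathcal{Z}^n(x) : e \cdot q(x, z) \wedge e \cdot y^n(x, z) = 0\bigr\} & \text{if } x \in n(\breve{B} + x^*), \\ \mathcal{Z}^n(x) & \text{otherwise.} \end{cases}
$$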

We also define the associated admissible policies by

We refer to the policies in as eventually jointly work conserving (EJWC).

###### Remark 2.1.

The ball is fixed in Definition 2.1 only for convenience. We could instead adopt a more general definition of , without affecting the results of the paper. Let be a collection of domains which covers and satisfies , and for all . Then we redefine using Definition 2.1 and replacing with and define analogously. If , then, in the diffusion scale, JWC holds on an expanding sequence of domains which cover . This is the reason behind the terminology EJWC. The EJWC condition plays a crucial role in the derivation of the controlled diffusion limit. Therefore, convergence of mean empirical measures of the diffusion-scaled state process and control, and thus, also the lower and upper bounds for asymptotic optimality are established for sequences .

## 3. Ergodic Control Problems

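We define the diffusion-scaled processes $\hat{X}^n = (\hat{X}^n_1, \hat{X}^n_2)$, $\hat{Q}^n = (\hat{Q}^n_1, \hat{Q}^n_2)$, and analogously for $\hat{Z}^n$ and $\hat{Y}^n$, by

$$
\hat{X}^n_i(t) := \frac{1}{\sqrt{n}}\bigl(X^n_i(t) - n x^*_i\bigr), \quad \hat{Q}^n_i(t) := \frac{1}{\sqrt{n}}\, Q^n_i(t), \quad \hat{Z}^n_{ij}(t) := \frac{1}{\sqrt{n}}\bigl(Z^n_{ij}(t) - n z^*_{ij}\bigr), \quad \hat{Y}^n_j(t) := \frac{1}{\sqrt{n}}\, Y^n_j(t), \tag{3.1}
$$

where $x^*$ and $z^*$ are defined in (2.3)–(2.4).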

### 3.1. Control objectives

We consider three control objectives, which address the queueing (delay) and/or idleness costs in the system: (i) the unconstrained problem, minimizing the queueing (and idleness) cost; (ii) the constrained problem, minimizing the queueing cost while imposing a constraint on idleness; and (iii) the fairness problem, minimizing the queueing cost while imposing a constraint on the idleness ratio between the two server pools. The running cost is a function of the diffusion-scaled processes, which are related to the unscaled ones by (3.1). For simplicity, in all three cost minimization problems, we assume that the initial condition $X^n(0)$ is deterministic and $\hat{X}^n(0) \to x \in \mathbb{R}^2$ as $n \to \infty$. Let $\hat{r} : \mathbb{R}^2_+ \times \mathbb{R}^2_+ \to \mathbb{R}_+$ be defined by

$$
\hat{r}(q, y) := \sum_{i=1}^{2} \xi_i\, q_i^m + \sum_{j=1}^{2} \zeta_j\, y_j^m, \qquad q \in \mathbb{R}^2_+,\ y \in \mathbb{R}^2_+, \quad \text{for some } m \ge 1, \tag{3.2}
$$

where $\xi = (\xi_1, \xi_2)$ is a strictly positive vector and $\zeta = (\zeta_1, \zeta_2)$ is a nonnegative vector. In the case $\zeta \equiv 0$, only the queueing cost is minimized. In (P1) below, idleness may be added as a penalty in the objective. We denote by $\mathbb{E}^{Z^n}$ the expectation operator under an admissible policy $Z^n$.

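• (P1) (unconstrained problem) The running cost penalizes the queueing (and idleness). Let $\hat{r}(q, y)$ be the running cost function as defined in (3.2). Given an initial state $X^n(0)$, and an admissible scheduling policy $Z^n \in \mathfrak{Z}^n$, we define the diffusion-scaled cost criterion by

$$
J\bigl(\hat{X}^n(0), Z^n\bigr) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{Z^n}\Bigl[\int_0^T \hat{r}\bigl(\hat{Q}^n(s), \hat{Y}^n(s)\bigr)\,ds\Bigr]. \tag{3.3}
$$

The associated cost minimization problem becomes

$$
\hat{V}^n\bigl(\hat{X}^n(0)\bigr) := \inf_{Z^n \in \mathfrak{Z}^n} J\bigl(\hat{X}^n(0), Z^n\bigr).
$$

• (P2) (constrained problem) The objective here is to minimize the queueing cost while imposing idleness constraints on the two server pools. Let $\hat{r}_o(q)$ be the running cost function corresponding to $\hat{r}$ in (3.2) with $\zeta \equiv 0$. The diffusion-scaled cost criterion $J_o$ is defined analogously to (3.3) with running cost $\hat{r}_o$, that is,

$$
J_o\bigl(\hat{X}^n(0), Z^n\bigr) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{Z^n}\Bigl[\int_0^T \hat{r}_o\bigl(\hat{Q}^n(s)\bigr)\,ds\Bigr].
$$

Also define

$$
J_{c,j}\bigl(\hat{X}^n(0), Z^n\bigr) := \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{Z^n}\Bigl[\int_0^T \bigl(\hat{Y}^n_j(s)\bigr)^{\tilde{m}}\,ds\Bigr], \qquad j = 1, 2,
$$

with $\tilde{m} \ge 1$. The associated cost minimization problem becomes

$$
\begin{aligned}
\hat{V}^n_c\bigl(\hat{X}^n(0)\bigr) &:= \inf_{Z^n \in \mathfrak{Z}^n} J_o\bigl(\hat{X}^n(0), Z^n\bigr), \\
\text{subject to} \quad J_{c,j}\bigl(\hat{X}^n(0), Z^n\bigr) &\le \delta_j, \qquad j = 1, 2,
\end{aligned} \tag{3.4}
$$

where $\delta = (\delta_1, \delta_2)$ is a positive vector.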

• (fairness) Here we minimize the queueing cost while keeping the average idleness of the two server pools balanced. Let be a positive constant and let . The associated cost minimization problem becomes

 ^Vnf(^Xn(0)) :=infZn∈ZnJo(^Xn(0),Zn), subject toJc,1(^Xn(0),Zn)

We refer to $\hat{V}^n(\hat{X}^n(0))$, $\hat{V}^n_c(\hat{X}^n(0))$ and $\hat{V}^n_f(\hat{X}^n(0))$ as the diffusion-scaled optimal values for the system given the initial state $X^n(0)$, for (P1), (P2) and (P3), respectively.

###### Remark 3.1.

We choose running costs of the form (3.3) mainly to simplify the exposition. However, all the results of this paper still hold for more general classes of functions. Let be a convex function satisfying for some and constants and , and , , , be convex functions that have at most polynomial growth. Then we can choose for the unconstrained problem, and as the functions in the constraints in (3.4) (with ). For the problem (P3) we require in addition that and they are in . The analogous running costs can of course be used in the corresponding control problems for the limiting diffusion, which are presented later in Section 4.2.

### 3.2. A geometrically stable scheduling policy

We introduce a Markov scheduling policy for the N-network that results in geometric ergodicity for the diffusion-scaled state process, and also implies that the diffusion-scaled cost in the ergodic control problem (P1) is bounded, uniformly in . Let and . Note that .

###### Definition 3.1.

For each , we define the scheduling policy , , by

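$$
\begin{aligned}
\check{z}^n_{11}(x) &= x_1 \wedge N^n_1, \\
\check{z}^n_{12}(x) &= \begin{cases} (x_1 - N^n_1)^+ \wedge N^n_{12} & \text{if } x_2 \ge N^n_{22}, \\ (x_1 - N^n_1)^+ \wedge (N^n_2 - x_2) & \text{otherwise,} \end{cases} \\
\check{z}^n_{22}(x) &= \begin{cases} x_2 \wedge N^n_{22} & \text{if } x_1 \ge N^n_1 + N^n_{12}, \\ x_2 \wedge \bigl(N^n_2 - (x_1 - N^n_1)^+\bigr) & \text{otherwise.} \end{cases}
\end{aligned}
$$

Note that the scheduling policy $\check{z}^n$ is state-dependent, and can be interpreted as follows. Class-1 jobs prioritize server pool 1 over 2. Server pool 2 prioritizes the two classes of jobs depending on the system state. Whenever $x_2 \ge N^n_{22}$, server pool 2 allocates no more than $N^n_{12}$ servers to class-1 jobs, while whenever $x_1 \ge N^n_1 + N^n_{12}$, it allocates no more than $N^n_{22}$ servers to class-2 jobs. It is easy to check that this policy is work conserving. The resulting queue length $\check{q}^n$ and idleness $\check{y}^n$ can be obtained by the balance equations: for $x \in \mathbb{Z}^2_+$,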

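$$
\begin{aligned}
\check{q}^n_1(x) &= x_1 - \check{z}^n_{11}(x) - \check{z}^n_{12}(x), &\qquad \check{q}^n_2(x) &= x_2 - \check{z}^n_{22}(x), \\
\check{y}^n_1(x) &= N^n_1 - \check{z}^n_{11}(x), &\qquad \check{y}^n_2(x) &= N^n_2 - \check{z}^n_{12}(x) - \check{z}^n_{22}(x).
\end{aligned}
$$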
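The policy of Definition 3.1 translates directly into code. In the minimal Python sketch below, the function names are ours, and the thresholds `N12` and `N22` are passed in as given; we assume that they split pool 2, that is, `N12 + N22 == N[1]`, which is what makes the policy work conserving when both branches of the definition are saturated.

```python
def sdp_policy(x, N, N12, N22):
    """Return (z, q, y) under the state-dependent priority policy.

    x = (x1, x2) is the state, N = (N1, N2) the pool sizes, and
    N12 + N22 == N2 is assumed. z is the triple (z11, z12, z22).
    """
    x1, x2 = x
    N1, N2 = N
    overflow = max(x1 - N1, 0)  # (x1 - N1)^+, class-1 jobs not fitting in pool 1
    z11 = min(x1, N1)
    z12 = min(overflow, N12) if x2 >= N22 else min(overflow, N2 - x2)
    z22 = min(x2, N22) if x1 >= N1 + N12 else min(x2, N2 - overflow)
    # queues and idleness from the balance equations
    q = (x1 - z11 - z12, x2 - z22)
    y = (N1 - z11, N2 - z12 - z22)
    return (z11, z12, z22), q, y

def is_work_conserving(q, y):
    """Check nonnegativity and the two work conservation constraints."""
    return (min(q + y) >= 0
            and min(q[0], y[0] + y[1]) == 0
            and min(q[1], y[1]) == 0)

# Exhaustive check on a small grid, with illustrative sizes
N, N12, N22 = (3, 5), 2, 3
assert all(
    is_work_conserving(*sdp_policy((x1, x2), N, N12, N22)[1:])
    for x1 in range(15) for x2 in range(15)
)
```

When both classes are heavily loaded, for instance at the state $(10, 10)$ with the sizes above, pool 2 is split as 2 servers for class 1 and 3 servers for class 2, and no server idles.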
###### Definition 3.2.

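For each $x \in \mathbb{Z}^2_+$, define

$$
\tilde{x}^n(x) := (x_1 - n x^*_1,\ x_2 - n x^*_2), \qquad \hat{x}^n(x) := \frac{\tilde{x}^n(x)}{\sqrt{n}}, \tag{3.5}
$$

where $x^*$ is given in (2.3). Also define

$$
\mathcal{S}^n := \bigl\{\hat{x}^n(x) : x \in \mathbb{Z}^2_+\bigr\}, \qquad \breve{\mathcal{S}}^n := \bigl\{\hat{x}^n(x) : x \in \breve{\mathcal{X}}^n\bigr\}.
$$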

For and , we let

$$
\mathcal{V}_{k,\beta}(x) := |x_1|^k + \beta\, |x_2|^k, \qquad x \in \mathbb{R}^2. \tag{3.6}
$$

The generator of the state process under a scheduling policy takes the form

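$$
\begin{aligned}
\mathcal{L}^{z^n}_n f(x) := {} & \sum_{i=1}^{2} \lambda^n_i \bigl(f(x + e_i) - f(x)\bigr) + \bigl(\mu^n_{11} z^n_{11} + \mu^n_{12} z^n_{12}\bigr)\bigl(f(x - e_1) - f(x)\bigr) \\
& + \mu^n_{22} z^n_{22} \bigl(f(x - e_2) - f(x)\bigr) + \sum_{i=1}^{2} \gamma^n_i q^n_i \bigl(f(x - e_i) - f(x)\bigr), \qquad x \in \mathbb{Z}^2_+,
\end{aligned} \tag{3.7}
$$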

for . We can write the generator of the diffusion-scaled state process using (3.7) and the function in Definition 3.2 as

$$
\hat{\mathcal{L}}^{z^n}_n f(\hat{x}) = \mathcal{L}^{z^n}_n f\bigl(\hat{x}^n(x)\bigr). \tag{3.8}
$$
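To make (3.7) and (3.8) concrete, here is a small Python sketch. The names `generator` and `diffusion_scaled`, the tuple layout of the rates, and the numerical values are our assumptions for illustration; the action and the queues are supplied by the caller, so any scheduling policy can be plugged in.

```python
import math

def generator(f, x, z, q, lam, mu, gamma):
    """Evaluate (3.7) at state x.

    f maps integer pairs to floats, z = (z11, z12, z22) is the action,
    q = (q1, q2) the queues, lam = (lam1, lam2), mu = (mu11, mu12, mu22),
    and gamma = (gamma1, gamma2).
    """
    x1, x2 = x
    z11, z12, z22 = z
    fx = f(x)
    arrivals = lam[0] * (f((x1 + 1, x2)) - fx) + lam[1] * (f((x1, x2 + 1)) - fx)
    # class-1 departures: service in either pool, plus abandonment
    out1 = (mu[0] * z11 + mu[1] * z12 + gamma[0] * q[0]) * (f((x1 - 1, x2)) - fx)
    # class-2 departures: service in pool 2, plus abandonment
    out2 = (mu[2] * z22 + gamma[1] * q[1]) * (f((x1, x2 - 1)) - fx)
    return arrivals + out1 + out2

def diffusion_scaled(f, n, x_star):
    """Compose f with the map x -> (x - n x*) / sqrt(n) from (3.5)."""
    root = math.sqrt(n)
    return lambda x: f(((x[0] - n * x_star[0]) / root,
                        (x[1] - n * x_star[1]) / root))

# With f(x) = x1 + x2 the generator returns the net drift of the job count.
total = lambda x: x[0] + x[1]
drift = generator(total, (4, 2), (2, 1, 2), (1, 0),
                  lam=(3.0, 1.0), mu=(1.0, 1.0, 1.0), gamma=(0.5, 0.5))
scaled_drift = generator(diffusion_scaled(total, 4, (1.5, 0.5)), (4, 2),
                         (2, 1, 2), (1, 0),
                         lam=(3.0, 1.0), mu=(1.0, 1.0, 1.0), gamma=(0.5, 0.5))
```

Passing `diffusion_scaled(f, n, x_star)` to `generator` is how we read (3.8): the unscaled generator is applied to a function of the centered and scaled state. For the linear test function above, the scaled drift is the unscaled one divided by $\sqrt{n}$.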

We have the following.

###### Proposition 3.1.

Let denote the diffusion-scaled state process under the scheduling policy in Definition 3.1, and be its generator. For any , there exists , such that

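$$
\hat{\mathcal{L}}^{\check{z}^n}_n \mathcal{V}_{k,\beta}(\hat{x}) \le C_1 - C_2\, \mathcal{V}_{k,\beta}(\hat{x}) \qquad \forall\, \hat{x} \in \mathcal{S}^n,\ \forall\, n \ge n_0, \tag{3.9}
$$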

for some positive constants , , and , which depend on and . Namely, under the scheduling policy is geometrically ergodic. As a consequence, for any , there exists such that

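$$
\sup_{n \ge n_0}\ \limsup_{T \to \infty}\ \frac{1}{T}\, \mathbb{E}^{\check{z}^n}\Bigl[\int_0^T \bigl|\hat{X}^n(s)\bigr|^k\,ds\Bigr] < \infty, \tag{3.10}
$$

and the same holds if we replace $\hat{X}^n$ with $\hat{Q}^n$ or $\hat{Y}^n$ in (3.10). In other words, the diffusion-scaled cost criterion is finite for .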

###### Proof.

See Appendix A. ∎
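The bound (3.10) can be explored numerically, although a simulation is of course no substitute for the proof. The Python sketch below simulates the Markov chain of one network under the SDP policy and returns the time average of $|\hat{X}^n|^k$ along a single path. All parameter values, the threshold split `N12 = n // 2`, the starting state, and the function names are our illustrative assumptions.

```python
import math
import random

def sdp_action(x, N, N12, N22):
    """State-dependent priority action (z11, z12, z22) of Definition 3.1."""
    overflow = max(x[0] - N[0], 0)
    z11 = min(x[0], N[0])
    z12 = min(overflow, N12) if x[1] >= N22 else min(overflow, N[1] - x[1])
    z22 = min(x[1], N22) if x[0] >= N[0] + N12 else min(x[1], N[1] - overflow)
    return z11, z12, z22

def time_average_moment(n, horizon, k=2, seed=0):
    """Time average of |X_hat|^k over [0, horizon] on one simulated path."""
    rng = random.Random(seed)
    # Illustrative parameters; the limiting rates satisfy (2.1).
    lam = (1.5 * n, 0.5 * n)
    mu11 = mu12 = mu22 = 1.0
    gamma = (0.5, 0.5)
    N = (n, n)
    N12 = n // 2
    N22 = n - N12
    x_star = (1.5, 0.5)
    x = [round(n * x_star[0]), round(n * x_star[1])]  # start near equilibrium
    t, acc = 0.0, 0.0
    while True:
        z11, z12, z22 = sdp_action(x, N, N12, N22)
        q1, q2 = x[0] - z11 - z12, x[1] - z22
        rates = (lam[0], lam[1],
                 mu11 * z11 + mu12 * z12 + gamma[0] * q1,
                 mu22 * z22 + gamma[1] * q2)
        total = sum(rates)
        norm = math.hypot(x[0] - n * x_star[0], x[1] - n * x_star[1]) / math.sqrt(n)
        hold = rng.expovariate(total)  # exponential holding time
        if t + hold >= horizon:
            acc += (horizon - t) * norm ** k
            return acc / horizon
        acc += hold * norm ** k
        t += hold
        u = rng.random() * total  # pick the next transition
        if u < rates[0]:
            x[0] += 1
        elif u < rates[0] + rates[1]:
            x[1] += 1
        elif u < rates[0] + rates[1] + rates[2]:
            x[0] -= 1
        else:
            x[1] -= 1

estimate = time_average_moment(n=25, horizon=20.0, k=2, seed=1)
```

Repeating the experiment for increasing `n` and longer horizons gives a rough empirical picture of whether the averages stay bounded, which is what (3.10) asserts for $n \ge n_0$.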

###### Remark 3.2.

We remark that given (3.10) for $\hat{X}^n$, the same property may not hold for $\hat{Q}^n$ or $\hat{Y}^n$. It always holds if a scheduling policy satisfies the JWC condition (by the balance equation (6.5)). Otherwise, that property needs to be verified under the given scheduling policy. It can easily be checked that if the property holds for any two of the processes $\hat{X}^n$, $\hat{Q}^n$ and $\hat{Y}^n$, then it also holds for the third.

## 4. Ergodic Control of the Limiting Diffusion

### 4.1. The controlled diffusion limit

If the action space is , or equivalently , the convergence in distribution of the diffusion-scaled processes to the limiting diffusion in (4.1) is shown in Proposition 3 in Atar [7]. For the class of multiclass multi-pool networks, the drift of the limiting diffusion is given implicitly via a linear map in Proposition 3 of Atar [7]. For the N-network, the drift can be explicitly expressed as we show below in (4.4). In Arapostathis and Pang [1], a leaf elimination algorithm has been developed to provide an explicit expression for the drift of the limiting diffusion of general multiclass multi-pool networks. In the case of the N-network, the limit process is an -dimensional diffusion satisfying the Itô equation

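$$
dX_t = b(X_t, U_t)\,dt + \Sigma\, dW_t, \tag{4.1}
$$

with initial condition $X_0 = x$ and the control $U_t \in \mathbb{U}$, where

$$
\mathbb{U} := \bigl\{u = (u^c, u^s) \in \mathbb{R}^2_+ \times \mathbb{R}^2_+ : e \cdot u^c = e \cdot u^s = 1\bigr\}. \tag{4.2}
$$

In (4.1), the process $W$ is a 2-dimensional standard Wiener process independent of the initial condition $X_0 = x$.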

Following the leaf elimination algorithm for the N-network, the drift of the diffusion can be computed as follows. Let

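$$
\hat{G}[u](x) := \begin{pmatrix} -(e \cdot x)^- u^s_1 & x_1 - (e \cdot x)^+ u^c_1 + (e \cdot x)^- u^s_1 \\ 0 & x_2 - (e \cdot x)^+ u^c_2 \end{pmatrix}, \qquad u \in \mathbb{U}. \tag{4.3}
$$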
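The map in (4.3) is easy to evaluate numerically. In the Python sketch below (the function name `G_hat` is ours), `uc` and `us` are the two components of a control $u \in \mathbb{U}$. Comparing (4.3) with (2.7) suggests reading $(e \cdot x)^+ u^c$ as the diffusion-scaled queues and $(e \cdot x)^- u^s$ as the diffusion-scaled idleness; this reading is our interpretation and is not stated in the text above.

```python
def G_hat(x, uc, us):
    """Evaluate the 2x2 matrix in (4.3) at x for the control u = (uc, us)."""
    s = x[0] + x[1]                 # e . x
    pos, neg = max(s, 0.0), max(-s, 0.0)
    return (
        (-neg * us[0], x[0] - pos * uc[0] + neg * us[0]),
        (0.0, x[1] - pos * uc[1]),
    )

# A state with e . x < 0: the entries depend on us only.
G = G_hat((1.0, -3.0), uc=(1.0, 0.0), us=(0.25, 0.75))
entry_sum = G[0][0] + G[0][1] + G[1][1]
```

Since $e \cdot u^c = 1$ and $e \cdot x = (e \cdot x)^+ - (e \cdot x)^-$, the entries of $\hat{G}[u](x)$ always sum to $-(e \cdot x)^-$, which gives a quick sanity check on any implementation.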

Then the drift takes the form

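$$
b(x, u) = \begin{pmatrix} -\mu_{11}\, \hat{G}_{11}[u](x) - \mu_{12}\, \hat{G}_{12}[u](x) - \gamma_1\, (e \cdot x)^+ u^c_1 + \ell_1 \\[1ex] -\mu_{22}\, \hat{G}_{22}[u](x) - \gamma_2\, (e \cdot x)^+ u^c_2 + \ell_2 \end{pmatrix},
$$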

which can also be written as (see Lemma 4.3 and Section 4.2 in [1])

 b(x,u)=−B1(x−(e⋅x)+uc)+(e⋅x)−B2us−(e⋅x)+