MaxWeight Scheduling: Asymptotic Behavior of Unscaled Queue-Differentials in Heavy Traffic

Rahul Singh
ECE Department
Texas A&M University
College Station, TX 77843
Alexander L. Stolyar
ISE Department
Lehigh University
Bethlehem, PA 18015

The model is a "generalized switch", serving multiple traffic flows in discrete time. The switch uses the MaxWeight algorithm to make a service decision (scheduling choice) at each time step, which determines the probability distribution of the amount of service that will be provided. We are primarily motivated by the following question: in the heavy traffic regime, when the switch load approaches the critical level, will the service processes provided to each flow remain "smooth" (i.e., without large gaps in service)? Addressing this question reduces to the analysis of the asymptotic behavior of the unscaled queue-differential process in heavy traffic. We prove that the stationary regime of this process converges to that of a positive recurrent Markov chain, whose structure we explicitly describe. This in turn implies asymptotic "smoothness" of the service processes.

Dynamic scheduling, MaxWeight algorithm, Heavy traffic
asymptotic regime, Markov chain, Queue length differentials, Smooth service process


1 Introduction

Suppose we have a system in which several data traffic flows share a common transmission medium (or channel). Sharing means that in each time slot a scheduler chooses a transmission mode, that is, the subset of the flows to serve and the corresponding transmission rates; the outcome of each transmission (the number of successfully delivered packets) is random. The scheduler has two key objectives: (a) the time-average (successful) transmission rate of each flow has to be at least some target value; (b) the successful transmissions for each flow need to be spread out "smoothly" in time, without large time-gaps between successful transmissions. Such models arise, for example, when the goal is timely delivery of information over a shared wireless channel [5].

A very natural way to approach this problem is to treat the model as a queueing system, where services (transmissions) are controlled by a so-called MaxWeight scheduler (see [10, 9, 3]), which serves a set of virtual queues (one for each traffic flow), each receiving new work at the corresponding target rate. (See e.g. [1].) This automatically achieves objective (a), if it is achievable at all: MaxWeight is known to be throughput optimal, i.e., it stabilizes the queues whenever stabilization is feasible. The MaxWeight stability results, however, do not tell whether or not objective (b) is achieved. Specifically, when the system is heavily loaded, i.e., the vector of mean arrival rates is within the system rate region but close to its boundary, the steady-state queue lengths under MaxWeight are necessarily large, and it is conceivable that this may result in large time-gaps in service for individual flows. (Note that, if (a) and (b) are the objectives and the queues are virtual, the large queue lengths are not in themselves an issue. As long as (a) and (b) are achieved, minimizing the queue lengths is not important.) Our main results show that this is not the case. Namely, in the heavy traffic regime, when the arrival rate vector approaches a point on the outer boundary of the rate region, the service process remains "smooth", in the sense that its stationary regime converges to that of a positive recurrent Markov chain, whose structure is given explicitly.

To obtain "clean" convergence results, we assume that the amount of new work arriving in the queues in each time slot is random and has a continuous distribution. (The amounts of service are random, but discrete.) Under this assumption, the state spaces of the processes that we consider are continuous. On the one hand, this makes the analysis more involved (because the notion of positive recurrence is more involved for a continuous state space, as opposed to a countable one). On the other hand, this makes all stationary distributions absolutely continuous w.r.t. the corresponding Lebesgue measure, making it easier to prove convergence. We emphasize that the assumption of a continuous distribution of the arriving work is non-restrictive: if we create virtual queues artificially, for the purpose of applying the MaxWeight algorithm, the structure of the virtual arrival process is within our control.

The problem essentially reduces to the analysis of stationary versions of the queue-differential process, which is the projection of the (weighted) queue length process on the subspace orthogonal to the outer normal cone of the rate region at the limiting arrival rate point. As we show, in the heavy-traffic limit, in steady state, the values of the queue-differential process uniquely determine the decisions chosen by the MaxWeight scheduler. Note that this process is obtained by projection only, without any scaling depending on the system load.

The model that we consider is essentially a "generalized switch" of [9]. Some features of our model, namely random service outcome and continuous amounts of arriving work, as well as the objective (b), are motivated by applications such as timely delivery of packets of multiple flows over a shared wireless channel [5]. The model of [5] is a special case of ours; paper [5] introduces a debt scheme and proves that it achieves the throughput objective (a); the objective (b) is not considered in [5].

The analysis of MaxWeight stability has a long history, starting from the seminal paper [10], which introduced
MaxWeight; heavy traffic analysis of the algorithm originated in [9]. (See, e.g., [3] for an extensive recent review of
MaxWeight literature.)

The line of work most closely related to this paper is that in [3, 6, 7]. Paper [3] studies MaxWeight in the heavy traffic regime, under the additional assumption that the normal cone is one-dimensional, i.e., a ray. (The latter assumption is usually referred to in the literature as complete resource pooling (CRP).) Paper [3] shows, in particular, the tightness of the stationary distributions of what we call the queue-differential process in heavy traffic. Part of our analysis also establishes this tightness; the argument is analogous to that in [3] (and we also borrow much notation from [3]). Besides the difference in models, our proof of tightness is more general in that it applies to the non-CRP case; this more general argument is close to that used in [2]. From the tightness of stationary distributions, using the structure of the corresponding continuous state space, we obtain the convergence of the stationary version of the (non-Markov) queue-differential process to that of a positive recurrent Markov chain, whose structure we explicitly describe.

Papers [6, 7] consider objective (b) in the heavy traffic regime. They introduce a modification of MaxWeight, called the regular service guarantee (RSG) scheme, which explicitly tracks the service time-gaps for each flow in order to dynamically increase the scheduling priority of flows with large current time-gaps. These papers prove that RSG, under certain parameter settings, preserves the heavy-traffic queue-length minimization properties of MaxWeight, under the CRP condition; at the same time, they demonstrate via simulations that RSG improves the smoothness (regularity) of the service process. Recall that in this paper we focus on "pure" MaxWeight, without CRP, and formally establish the service process smoothness in the heavy traffic limit.

The rest of the paper is organized as follows. The formal model is presented in Section 2. Section 3 describes the MaxWeight algorithm and the heavy traffic asymptotic regime. Our main results, Theorems 2 and 21, are described in Section 4. (The formal statement of Theorem 21 is in Section 9.) The CRP condition is defined in Section 5. In Section 6 we provide some necessary background and results for general state-space Markov chains. In Sections 7-9 we prove our results for the special case when CRP holds. Finally, in Section 10 we show how the proofs generalize to the case when CRP does not necessarily hold.

1.1 Basic notation

Elements of a Euclidean space are viewed as row-vectors and written in bold font; $\|\mathbf{x}\|$ is the usual Euclidean norm of a vector $\mathbf{x}$. For two vectors $\mathbf{x}$ and $\mathbf{y}$, $\mathbf{x} \cdot \mathbf{y}$ denotes their scalar (dot) product; vector inequalities are understood componentwise; the zero vector and the vector of all ones are denoted $\mathbf{0}$ and $\mathbf{1}$, respectively; the componentwise product of two vectors, and (when all components of the divisor are non-zero) the componentwise quotient, will also be used; the statement "$\mathbf{x}$ is a positive vector" means $\mathbf{x} > \mathbf{0}$. The closed ball of radius $r$ centered at $\mathbf{x}$ is denoted $B(\mathbf{x}, r)$. The positive orthant of $\mathbb{R}^n$ is denoted $\mathbb{R}^n_+$.

For numbers $x$ and $y$, we denote $x \vee y = \max(x, y)$, $x \wedge y = \min(x, y)$, and $x^+ = \max(x, 0)$. For vectors $\mathbf{a} \le \mathbf{b}$, we denote by $[\mathbf{a}, \mathbf{b}]$ the rectangle $\{\mathbf{x} : \mathbf{a} \le \mathbf{x} \le \mathbf{b}\}$ in $\mathbb{R}^n$.

We always consider the Borel $\sigma$-algebra on $\mathbb{R}^n$ (resp. $\mathbb{R}^n_+$), when the latter is viewed as a measurable space, along with the Lebesgue measure on it. When we consider a linear subspace of $\mathbb{R}^n$, we endow it with the Euclidean metric and the corresponding Borel $\sigma$-algebra and Lebesgue measure.

For a random process , we often use notation or simply .

2 System Model

We consider a system of flows served by a "switch", which evolves in discrete time. At the beginning of each time slot, the scheduler has to choose from a finite number of "service decisions". If a service decision is chosen, then, independently of the past history, the flows receive an amount of service given by a random non-negative vector. Furthermore, we assume that, if a given decision is chosen, there is a finite number of possible service-vector outcomes, i.e., with the corresponding probability the service vector equals one of finitely many non-negative vectors; the expected service vector of each decision is then well defined. We assume that the expected service vectors are non-zero and different from each other, and that each flow receives positive expected service under at least one decision. We will use the following notation.

We denote by the (random) realization of the service vector at time , and call the service process.

After the service at time is completed, a random amount of work arrives into the queues, and it is given by a non-negative vector . The values of are i.i.d. across times , and is called the arrival process. The mean arrival rates of this process are given by vector

We now state our assumptions on the distribution of the arriving work. The distribution is absolutely continuous w.r.t. Lebesgue measure; it is concentrated on a rectangle determined by a constant vector; moreover, on this rectangle the distribution density is both upper and lower bounded by positive constants.

If $Q(t) = (Q_1(t), \ldots, Q_n(t))$ is the vector of queue lengths at time $t$, then for each flow $i$

$$Q_i(t+1) = Q_i(t) - S_i(t) + A_i(t) + U_i(t),$$

where $S_i(t)$ and $A_i(t)$ are the service and arrival amounts, and $U_i(t) = [S_i(t) - Q_i(t)]^+$ is the amount of service "wasted" by flow $i$ at time $t$.
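A minimal sketch of one time slot of these dynamics (the variable names are ours, not the paper's notation; the arrival vector is passed in explicitly rather than sampled):

```python
import numpy as np

def queue_step(q, s, a):
    """One slot of the queue dynamics.

    q: current queue lengths; s: realized service vector;
    a: new work arriving after service completes.
    Returns the next queue vector and the wasted service."""
    wasted = np.maximum(s - q, 0.0)       # service exceeding queue content
    q_next = np.maximum(q - s, 0.0) + a   # serve first, then add arrivals
    return q_next, wasted
```

Equivalently, the queue is first depleted by the service (truncated at zero, with the excess recorded as wasted), and only then receives the new work.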

3 MaxWeight scheduling scheme. Heavy traffic regime

3.1 MaxWeight definition

Let a vector of positive weights $\mathbf{c} = (c_1, \ldots, c_n)$ be fixed. The MaxWeight scheduling algorithm chooses, at each time $t$, a service decision

$$d(t) \in \arg\max_{d} \sum_i c_i Q_i(t) \bar{S}_i(d),$$

where $\bar{S}(d)$ denotes the expected service vector of decision $d$, with ties broken according to any well-defined rule.
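As an illustrative sketch (the names `q`, `c`, and `mu` are ours, not the paper's notation), the MaxWeight rule of maximizing the weighted scalar product of queue lengths and expected service vectors can be written as:

```python
import numpy as np

def maxweight_decision(q, c, mu):
    """Choose the decision maximizing sum_i c_i * q_i * mu_i(d).

    q: queue length vector; c: fixed positive weight vector;
    mu: list of expected service vectors, one per decision.
    Ties are broken by taking the lowest decision index."""
    scores = [float(np.dot(c * q, m)) for m in mu]
    return int(np.argmax(scores))
```

For example, with equal weights and two single-flow decisions, the flow with the longer queue is served.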

Under MaxWeight, the queue length process is a discrete time Markov chain with (continuous) state space . System stability is understood as positive Harris recurrence of this Markov chain.

Denote the system rate region by

$$V = \left\{ \mathbf{v} \ge \mathbf{0} : \ \mathbf{v} \le \sum_{d} \phi_d \bar{S}(d), \ \ \phi_d \ge 0, \ \sum_{d} \phi_d = 1 \right\},$$

the set of all rate vectors dominated by a convex combination of the expected service vectors $\bar{S}(d)$.
It is well known (see [10, 9, 3]) that, in general, under MaxWeight the system is stable as long as the vector of mean arrival rates lies in the interior of the rate region. (Scheduling rules having this property are sometimes called "throughput-optimal".) This is true for our model as well, as will be shown in Section 7. (Establishing this fact is not difficult, but it does not directly follow from previous work, because we have a continuous state space.)

3.2 Heavy traffic regime

We will consider a sequence of systems, indexed by a scaling parameter, operating under MaxWeight scheduling. (Variables pertaining to a given system in the sequence will be supplied with the corresponding superscript.) The switch parameters remain unchanged, but the distribution of the arriving work changes along the sequence: namely, in each pre-limit system it has a density which satisfies all conditions specified in Section 2, and these densities converge uniformly to some limiting density. Note that, automatically, the limiting density (as well as each pre-limit density) satisfies the stated bounds on the rectangle, and is zero elsewhere. The corresponding arrival rate vectors then converge to the arrival rate vector of the limiting distribution.

We assume that the limiting arrival rate vector is a maximal element of the rate region, i.e., no other point of the region dominates it componentwise. Thus, it lies on the outer boundary of the rate region. We further assume that each pre-limit arrival rate vector lies in the interior of the rate region; therefore, each pre-limit system is stable (under the MaxWeight algorithm).

The (limiting) system, with the limiting arrival process, is called critically loaded.

4 Main Results

Consider the sequence of systems described in Section 3, in the heavy traffic regime. Under any throughput-optimal scheduling algorithm, for each flow, the steady-state average amount of service provided is greater than or equal to its arrival rate. (It may, and typically will, be greater when the wasted service is taken into account.)

We now define the notion of asymptotic smoothness of the steady-state service process. Informally, it means that the steady-state service processes are such that, for each flow, the probability of a service gap of any given length (without any service at all) vanishes uniformly as the system load approaches the critical level.
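The gap statistic behind this notion can be computed from a (simulated) service trace; a small sketch, with `service_trace` a hypothetical per-slot record of the service received by one flow:

```python
def longest_zero_gap(service_trace):
    """Length of the longest run of consecutive slots with zero service."""
    longest = current = 0
    for s in service_trace:
        current = current + 1 if s == 0 else 0
        longest = max(longest, current)
    return longest
```

Asymptotic smoothness then says, roughly, that for any fixed gap length the steady-state probability of observing such a gap vanishes in the heavy traffic limit.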

For each , consider the cumulative service process in steady state. Namely,

Definition 1

We call the service process asymptotically
smooth, if


Our key result (Theorem 21 in Section 9) shows that a "queue-differential" process, which determines scheduling decisions in the system under MaxWeight in heavy traffic, is such that its stationary version converges to that of a stationary positive Harris recurrent Markov chain, whose structure we describe explicitly. This result, in particular, implies the following.

Theorem 2

Consider the sequence of systems described in Section 3, in the heavy traffic regime. Under MaxWeight scheduling, the service process is asymptotically smooth.

The proof is given in Section 9.

5 Complete Resource Pooling

To improve exposition, we first give detailed proofs of our main results for the special case when the following complete resource pooling (CRP) condition holds. (In Section 10 we will show how the proofs generalize to the case without the CRP condition.) Assume that the limiting arrival rate vector is such that there is a unique (up to scaling) outer normal vector to the rate region at that point; we choose the normal vector to have unit length. Denote by


the outer face of where lies. Given our assumptions on , it lies in the relative interior of .

By we denote the subspace of orthogonal to . For any vector , we denote by its orthogonal projection on the (one-dimensional) subspace spanned by , and by its orthogonal projection on the -dimensional subspace .
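This orthogonal decomposition can be sketched numerically as follows (the names `x` and `nu` are ours; `nu` stands for the unit outer normal vector):

```python
import numpy as np

def decompose(x, nu):
    """Split x into its component along the unit normal direction nu
    and the orthogonal remainder (the 'differential' component)."""
    nu = nu / np.linalg.norm(nu)   # normalize, in case nu is not unit length
    x_par = np.dot(x, nu) * nu     # projection on the span of nu
    x_perp = x - x_par             # projection on the orthogonal subspace
    return x_par, x_perp
```

By construction, the two components sum to `x` and `x_perp` is orthogonal to `nu`.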

The following observations and notations will be useful. There is a such that the entire set


also lies in the relative interior of .

6 Background on general-state-space discrete-time Markov chains

We will briefly discuss some notions and results from [8] and [4] on the stability of discrete-time Markov chains (MC), which will be used in later sections. Throughout this section we assume that the Markov chain evolves on a locally compact separable metric space, endowed with its Borel $\sigma$-algebra. We use the standard notation for probabilities and expectations conditional on the initial distribution, and for those conditional on a fixed initial state. The iterates of the transition function are defined inductively by

where is the identity transition function.

Definition 3

(i) -irreducibility: A Markov Chain is called irreducible if there exists a finite measure such that for all whenever . Measure is called an irreducibility measure.

(ii) Harris Recurrence: If is -irreducible and whenever , then is called Harris recurrent. [Abbreviation ’i.o.’ means ’infinitely often’.]

(iii) Invariant Measure: A -finite measure on with the property

is called an invariant measure.

(iv) Positive Harris Recurrence: If is Harris Recurrent with a finite invariant measure , then it is called positive Harris Recurrent.

(v) Boundedness in Probability: If for any and any , there exists a compact set such that


then the Markov process is called bounded in probability.

(vi) Small Sets: A set is called small if for all and some integer , we have


where is a sub-probability measure, i.e. .

(vii) For a probability distribution on , the Markov transition function is defined as

(viii) Petite Sets: A set and a sub-probability measure on are called petite if for some probability distribution on we have

(ix) Non-evanescence: A Markov chain is called non-evanescent if for each . [Event consists of the outcomes such that the sequence visits any compact set at most a finite number of times.]

The following proposition states some results from [8].

Proposition 4

(i) If a set is small and for some probability distribution on and a set , we have


then is petite.

(ii) Suppose that every compact subset of is petite. Then is positive Harris recurrent if and only if it is bounded in probability.

(iii) Suppose that every compact subset of is petite. Then is Harris recurrent if and only if it is non-evanescent.

The following result is from [4]. It is stated in a form convenient for its application in this paper.

Proposition 5

Let be a non-negative (Lyapunov) function such that the Markov process satisfies the following two conditions, for some positive constants :
(a) , for any state such that .
(b) .
Then there exist constants and such that


7 Queue length process

Recall that we consider the queue length process of each pre-limit system under MaxWeight. In this section we prove that each such process is positive Harris recurrent. The proof uses a Lyapunov drift argument, which is fairly standard (in fact, there is more than one way to prove stability); however, since our state space is continuous, as a first step we show that all compact sets are petite.

Some simple preliminary observations are given in the following lemma.

Lemma 6

(i) The points , such that
is non-unique, form a set of zero Lebesgue measure. Moreover, if is such that is unique, then for a sufficiently small the decision is also the unique element of for all .
(ii) The one-step transition function of the process is such that, uniformly in and , the distribution is absolutely continuous with the density upper bounded by and, in the rectangle , lower bounded by .


Statement (i) easily follows from the finiteness of the set of decisions . Statement (ii) easily follows from the assumptions on the arrival process distribution and the fact that .

Lemma 7

For any , there exists such that the set is small for the process .


Consider rectangle
. Choose small enough, so that and . Then, lies in the interior of and every point in is strictly smaller than any point in . Lemma 6(ii) implies that for any , the distribution has a density lower bounded by in .

Lemma 8

For the Markov process , any compact set is petite.


Consider a compact set ; of course, is bounded. Fix arbitrary and pick small enough, so that is small and lies in the interior of . Pick small such that any point in is strictly less than any point in .

It is easy to verify that there exists an integer such that the following holds uniformly in :


Indeed, suppose first that for all , . (This is a probability zero event, of course, but let’s consider it anyway.) Then, for any there exist , such that the following holds: with probability at least some , the norm decreases at least by some , at each time when . This implies that for some and , implies . Now, using this and the fact that with a positive probability can be “very close to ,” we can easily establish property (11). (We omit rather trivial details.)

Next, it is easy to show that there exists an integer such that the following holds uniformly in :


Here we use Lemma 6(ii), which shows that at each time step the distribution of the increments of has a density lower bounded by in .

From (11) and (12) we see that the required minorization bound holds uniformly over the compact set. Application of Proposition 4(i) shows that the set is petite (and, moreover, that it is small).

To prove stability, we will apply Proposition 4, which requires the following

Lemma 9

Consider the scalar projection of the Markov process starting with a fixed initial state , such that . Then, uniformly on all large we have,


for some constants and which depend on . Consequently, the process is bounded in probability.


We will use notation . Then
. Clearly, is uniformly bounded by a constant, given our assumptions on the arrival and service processes. We will show that the drift (average increment) of is upper bounded by some when for some .

Consider a fixed and denote . Clearly,


where the inequality follows from the concavity of the function . Substitute the value of from equation (2), concentrate on the numerator of the above expression to obtain,


where is a uniform bound on , and is a uniform bound on which follows from the property that only when is sufficiently small.

To simplify exposition and avoid introducing additional notation, let us assume that for some . (If not, then in this proof we can instead use the corresponding orthogonal projection.) Combining the two bounds above, we obtain


where the last inequality follows from the definition of MaxWeight (see (2)) and of the set (see (6)). If , then at least one of or is greater than or equal to . After some algebraic manipulations we obtain,

Substituting the above in inequality (7) we see that the drift is upper bounded by

This quantity is uniformly bounded by a negative constant for sufficiently large . Application of Proposition 5 completes the proof.

Now the positive recurrence of follows from Proposition 4. In fact, we will prove the following stronger statement.

Theorem 10

For each , the Markov process is positive Harris recurrent and hence has a unique invariant probability distribution, which will be denoted . Moreover, if is the (random) process state in stationary regime (i.e. it has distribution ),


By Lemma 8 any compact set is petite. Since is also bounded in probability (Lemma 9), by Proposition 4 is positive Harris recurrent.

For a function and fixed , denote . Consider the process starting from an arbitrary fixed initial state . Since the process is positive Harris recurrent, we can apply the ergodic theorem to obtain (note that is a bounded continuous function):


On the other hand,


for some constant , where the second inequality follows from  (9). Combining (17) and (7), we have


and therefore, by the monotone convergence theorem,

Lemma 11

Uniformly on all (large) and the distributions of , the distribution of is absolutely continuous w.r.t. Lebesgue measure, with the density upper bounded by .

We omit the proof, which is straightforward, given our assumptions on the distribution of .

Lemma 12

As , in probability.


The proof is by contradiction. Suppose, for some fixed the compact set is such that


Suppose . Then, using the same argument as in the proof of Lemma 8, it is easy to see that for any there exists time , such that . This in turn implies that, with probability at least some , for at least one flow the amount of wasted service . This implies that, for at least one ,

This, however, contradicts the fact that the process is stable for all large .

8 Steady-state queue length deviations

Let us consider the process , defined as

Lemma 13

The steady-state expected norm is uniformly bounded in .


As we did in the proof of Lemma 9, to simplify exposition, assume that . (If not, in this proof we would consider the corresponding orthogonal projection instead.) Consider the Lyapunov function . By Theorem 10, . The conditional drift of in one time step is given by (let , , and so on, to simplify notation)


where depends only on , and the last inequality follows from the definition of MaxWeight and . Now consider the process in stationary regime, and take the expectation of both parts of (8). We obtain,


Recalling that , we see that
is uniformly bounded.

9 Limit of the queue-differential process

We now define a Markov chain , which, in the sense that will be made precise later, is a limit of the (non-Markov) process as .

Define as the orthogonal projection of on the subspace . We call a queue-differential process. (Obviously, under the CRP condition, the queue-differential process is equal to the “queue deviation” process in Section 8. When CRP does not hold, the “deviation” and “differential” processes are defined differently. This will be discussed in Section 10.) Denote by the corresponding projection of the steady-state , and by its distribution.

Markov chain is defined formally as follows. (We will show below that, in fact, the distribution converges to the stationary distribution of .) The state space of is . Assume that at time the "scheduler" chooses decision


which determines the corresponding random amount of service , provided to the "queues" given by vector . After that the (random) amount of new "work" arrives and is added to the "queues." Finally, the new queue lengths vector is transformed into via componentwise multiplication by and orthogonal projection on . (Note that both and may have components of any sign. Also, there is no "wasted service" here.) In summary, the one step evolution is described by


Informally, one can interpret the process as the queue-differential process , when is very large and the queue length vector is both large and has a small angle with . Under these conditions, the only service decisions that can be chosen are such that , and the choice is uniquely determined by .
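Under the evolution just described, one step of this limiting chain can be sketched as follows (all names are ours: `y` is the current state, `c` the weight vector, `nu` the unit outer normal, `mu` the expected service vectors, `outcomes`/`probs` the per-decision service outcomes and their probabilities; arrivals are drawn from a stand-in uniform distribution):

```python
import numpy as np

def y_step(y, c, nu, mu, outcomes, probs, rng):
    """One transition of the limiting queue-differential chain (sketch)."""
    # Decision chosen by the MaxWeight rule applied to the current state
    d = int(np.argmax([float(np.dot(y, c * m)) for m in mu]))
    # Sample one of the finitely many service outcomes for that decision
    k = rng.choice(len(outcomes[d]), p=probs[d])
    # Stand-in continuous arrivals (the paper assumes a bounded density)
    a = rng.uniform(0.0, 1.0, size=y.shape)
    z = y + c * (a - outcomes[d][k])   # weighted net change; no wasted service
    return z - np.dot(z, nu) * nu      # project orthogonally to nu
```

By construction the new state is again orthogonal to the normal direction, so the chain stays in the subspace.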

Let denote the one-step transition function for the Markov process . If , then let