From Local to Global Stability in Stochastic Processing Networks through Quadratic Lyapunov Functions

A. B. Dieker and J. Shin

Antonius B. (Ton) Dieker is with the H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. Email: ton.dieker@isye.gatech.edu. Jinwoo Shin is with the Mathematical Sciences Department, IBM T. J. Watson Research, Yorktown Heights, NY 10598, USA. Email: jshin@us.ibm.com.

Abstract

We construct a generic, simple, and efficient scheduling policy for stochastic processing networks, and provide a general framework to establish its stability. Our policy is randomized and prioritized: with high probability it prioritizes jobs which have been least routed through the network. We show that the network is globally stable under this policy if there exists an appropriate quadratic ‘local’ Lyapunov function that provides a negative drift with respect to nominal loads at servers. Applying this generic framework, we obtain stability results for our policy in many important examples of stochastic processing networks: open multiclass queueing networks, parallel server networks, networks of input-queued switches, and a variety of wireless network models with interference constraints. Our main novelty is the construction of an appropriate ‘global’ Lyapunov function from quadratic ‘local’ Lyapunov functions, which we believe to be of broader interest.

1 Introduction

The past few decades have witnessed a surge of interest in the design and analysis of scheduling policies for stochastic networks, e.g., [13, 24, 18, 19, 12, 5]. One of the key insights from this body of work is that natural scheduling policies can lead to instability even when each server is nominally underloaded [15, 17, 20]. (There are several notions of stability for stochastic networks, but intuitively they all entail that the number of customers in the system does not grow without bound.) This insight stimulated a search for tools that can characterize the stability regions of scheduling policies, i.e., the exact conditions on the arrival and service rates under which a network is stabilized by a policy.

It is the objective of this paper to study a question of a different kind: is it possible to construct a generic, simple, and efficient scheduling policy for stochastic processing networks, which leads to a (globally) stable network if all servers are (locally) nominally underloaded in some sense? To our knowledge, we are the first to answer this question within the setting of stochastic processing networks, which constitute a large class of stochastic networks capable of modeling a variety of networked systems for communication, manufacturing, and service systems (e.g., [12]). To investigate this question, the key is to determine if and how jobs from different parts of the network should be treated differently when they share the same buffer.

Various existing scheduling policies are ‘throughput optimal’ in the sense that they achieve the largest possible stability region, but these policies suffer from significant drawbacks. They typically obtain the desired stability by framing the contention resolution between buffers as an appropriate global optimization problem. This optimization problem requires central coordination between network entities, and it is computationally hard to solve if the network is large. The resulting policies, such as the max-weight policy [18, 23] and the back-pressure policy [24, 5], are not scalable and cannot cope efficiently with large networks. As a result, these throughput optimal policies do not provide a satisfactory answer to the aforementioned question.

The computational challenges surrounding existing throughput optimal policies motivate the search for easily implementable scheduling policies with provable performance guarantees but not necessarily with the throughput optimality property. This has led to the analysis of simple greedy scheduling policies for a variety of special classes of stochastic processing networks: open multiclass queueing networks [3], input-queued switches [6], and wireless network models [16]. These policies play an important role in this paper, since our most critical assumption roughly requires any local network component to be nominally underloaded under any ‘maximal’ greedy scheduling policy.

Our main contribution is a randomized scheduling policy for stochastic processing networks which only requires coordination within local components (e.g., service stations), and which is computationally attractive since it is a kind of priority policy. With high probability, our policy prioritizes jobs which have been least routed, and we therefore call our policy the $\varepsilon$-Least Routed First Served ($\varepsilon$-LRFS) policy. Here $\varepsilon > 0$ is a small number which helps to make the meaning of “high probability” precise.

Our main technical tool is a novel framework to construct a ‘global’ Lyapunov function for a stochastic processing network through appropriate ‘local’ Lyapunov functions. If the local Lyapunov functions yield stability of the corresponding ‘local’ network components, then the global Lyapunov function allows us to conclude that the whole network is stable. A critical feature of our framework is that the Lyapunov functions we work with are quadratic. Through examples, we show that quadratic local Lyapunov functions can readily be found for wide classes of networks. We refer to [8, 14] for other uses of quadratic Lyapunov functions.

Our approach to construct an appropriate ‘global’ Lyapunov function using ‘local’ quadratic Lyapunov functions contrasts with the popular fluid model methodology for establishing stability of stochastic networks [3, 9, 20]. The fluid model framework essentially reduces the question of stochastic stability to a question of a related deterministic (fluid) system. In the case of reentrant lines, our policy reduces to the First Buffer First Served (FBFS) policy, which has been proven to be stable via ‘inductive’ fluid arguments [7]. A similar fluid induction argument can be expected to work for general multiclass networks (modulo some technical arguments), but a fluid induction argument cannot be expected to work in general. A disadvantage compared to fluid models is that we have to keep track of detailed system behavior such as remaining service times, but therein also lies the power of our approach. The generality of our framework presents challenges to the use of fluid methods, and instead we work directly with a global Lyapunov function. The connection with fluid techniques is discussed in more detail in Section 4.3.

Although we believe that our methodology could be of much wider use, we have chosen to work out two special classes of stochastic processing networks in order to describe the implications of our techniques in relatively simple yet powerful settings: parallel server networks (including multiclass queueing networks) and communication network models (including networks of input-queued switches and wireless networks).

Our work is related to a paper by Bramson [2], who shows that open multiclass queueing networks can be stabilized by the Earliest Due Date First Served (EDDFS) policy under the ‘local’ condition that every processing unit is nominally underloaded. However, it is not known whether the EDDFS policy achieves similar results beyond the open multiclass queueing network. In fact, it is a priori unclear how to formulate a ‘local’ condition for general stochastic processing networks; our notion of ‘local’ Lyapunov function plays this role in the present paper.

This paper is organized as follows. Section 2 introduces the class of stochastic processing networks we study in this paper. Section 3 formally introduces the $\varepsilon$-LRFS policy and presents our main results. Section 4 describes our main idea using a simple network, the Rybko-Stolyar network [20]. We specialize our result to parallel server and communication network models in Section 5. All proofs are given in Section 6.

2 A Class of Stochastic Processing Networks

A stochastic processing network (SPN) consists of a set of buffers, a set of activities and a set of processors. Each buffer has infinite capacity and holds jobs that await service. The SPNs we study in this paper have the feature that activity can only process jobs from a single buffer , and that requires simultaneous possession of a set of processors. Let be the set of activities capable of processing buffer , i.e.,

We say that two buffers and are activity-interchangeable if . We also say that buffers and are processor-independent if and are disjoint.

Network state.

We let be the queue length of buffer at time , i.e., the number of jobs waiting in buffer excluding those being processed. We write for the sum of the remaining service requirements over all jobs in buffer which are currently being processed at time . We use to denote the activity level of activity at time , where we assume that , meaning that each activity is either fully employed or not employed at all. Thus, we say that the network is non-processor-splitting.

Routing.

After departing from a buffer , a job joins buffer with probability and departs from the network with probability (independently of everything else). We write for the matrix of routing probabilities.

Resource allocation.

Each activity decreases the remaining service requirement of the job it is processing at rate if . It is not allowed for an activity to be interrupted before it finishes the service requirement of the job it is working on, i.e., the network is non-preemptive. We do not allow for multiple activities to work on the same job simultaneously. Furthermore, we assume that each processor has unit capacity, i.e.,

We note that this unit capacity assumption is not restrictive since in our non-processor-splitting network of , a single processor with capacity can be replaced by copies with unit capacity, and identical activity structure inherited from the original processor. We further define the service rate vector for a (scheduling) vector through

External arrivals.

There are external arrivals to at least one buffer. For , let be the number of external jobs arriving at buffer during the time interval . We assume that for all (i.e., bounded second moment) and

for constant . We note that we allow for dependencies in the random processes and for two buffers and . Even though it is possible for the to exceed the external arrival rates, it is convenient to interpret as the external arrival rate at buffer . Similarly abusing terminology, we let be the effective arrival rate vector, i.e.,

where . We also say that all routes are bounded (in length) if

Service requirements.

Once a job in buffer is selected for processing by activity , it requires service for a random amount of time. We assume that all service times are independent, and that they are independent of the routing and external arrival processes. We also suppose that the service time distribution only depends on the buffer from which the job is processed. Writing for a generic processing time at buffer , we assume that

for constants . Let denote the nominal load, i.e., . We write for the (expected) immediate workload in buffer , which we define to be

(1)

This is the expected amount of work in the -th buffer given and . We note that our notion of immediate workload is the conditional expectation of a more common definition, and that immediate workload is defined for each buffer (as opposed to resource). We say that the network is synchronized if for all , and ,

That is, in a synchronized network, arrivals and service completions only occur at integer time epochs.

Maximal scheduling policies.

We say that activity is maximal in a (scheduling) vector if there exists a processor such that

i.e., either activity uses each of the processors in under schedule or it cannot be employed without violating the unit capacity constraint for some processor in . Given a non-negative vector , activity is called maximal in with respect to if or is maximal in . An activity is non-maximal if it is not maximal. If all activities are maximal in with respect to , we simply say that is maximal with respect to . We write for the set of maximal scheduling vectors with respect to , i.e.,

Finally, we say that a scheduling policy is maximal if, under the policy, for all .
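The maximality definition above can be illustrated with a minimal sketch. The data layout is an assumption made for illustration only: `sigma` maps each activity to its 0/1 activity level, `procs` maps each activity to the set of processors it requires, `buffer_of` maps each activity to the single buffer it serves, and `queue` holds the queue lengths. The example network (two processors, each shared by two single-buffer activities) is hypothetical.

```python
from typing import Dict, Hashable, Set


def processor_usage(sigma: Dict[Hashable, int],
                    procs: Dict[Hashable, Set[str]]) -> Dict[str, int]:
    """Count how many employed activities hold each processor."""
    usage: Dict[str, int] = {}
    for activity, level in sigma.items():
        if level == 1:
            for k in procs[activity]:
                usage[k] = usage.get(k, 0) + 1
    return usage


def is_maximal(sigma, procs, buffer_of, queue) -> bool:
    """Return True if every activity is maximal in sigma with respect to queue."""
    usage = processor_usage(sigma, procs)
    if any(count > 1 for count in usage.values()):
        raise ValueError("unit capacity constraint violated")
    for activity in sigma:
        # Maximal in sigma: some required processor is already fully used
        # (this covers the case where the activity itself is employed).
        blocked = any(usage.get(k, 0) == 1 for k in procs[activity])
        # Maximal with respect to queue: empty buffer, or maximal in sigma.
        if queue[buffer_of[activity]] > 0 and not blocked:
            return False
    return True


# Hypothetical example data (not taken from the paper).
PROCS = {1: {"P1"}, 2: {"P2"}, 3: {"P2"}, 4: {"P1"}}
BUFFER_OF = {1: 1, 2: 2, 3: 3, 4: 4}
QUEUE = {1: 2, 2: 0, 3: 1, 4: 0}

print(is_maximal({1: 1, 2: 0, 3: 1, 4: 0}, PROCS, BUFFER_OF, QUEUE))  # True
print(is_maximal({1: 1, 2: 0, 3: 0, 4: 0}, PROCS, BUFFER_OF, QUEUE))  # False
```

In the second call, activity 3 has a waiting job and its processor is idle, so the schedule is not maximal.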

3 Main Result

This section describes the scheduling policies which play a central role in this paper, and presents our main stability result.

Scheduling policies.

Our policies require that each job maintains a ‘counter’ for the number of times it has been routed so far, where we follow the convention that counters start from . The counter of a job is increased even when a job is routed to a buffer it has previously visited, so the counter of a job could differ from the number of different stations it has visited. We also consider a partition of buffers into components, such that

where any two buffers from different components are processor-independent. One possible choice for the partition is , but, as becomes apparent from the description of our policies below, a finer partition makes our policies more ‘distributed’. It is important to note that we allow for routing between components. We assume that each component for maintains a ‘timer’ at time which decreases at unit rate if .

To describe our maximal scheduling policies, we need the following notation. We write for the queue length information in component , where is the indicator function of the set , i.e.,

The following policy plays a key role throughout this paper.

Definition 3.1 (Least Routed First Served (LRFS) policy).

For each , whenever a new arrival and/or service completion occurs at time in component , execute the following algorithm immediately after all arrivals and service completions have occurred:

  1. Find the set of non-maximal activities in with respect to .

  2. Find a job with the smallest counter among those in buffers , where ties are broken arbitrarily.

  3. Choose an arbitrary activity to process the job identified in step 2, i.e., set .

  4. Repeat steps 1–3 until is maximal with respect to .
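The four steps of Definition 3.1 can be sketched for a single component as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: waiting jobs are represented only by their counters, `procs` and `buffer_of` are hypothetical lookup tables, ties are broken by lowest index (one admissible choice of "arbitrary"), and the example counters (1 for fresh arrivals, 2 after one routing step) are assumed.

```python
from typing import Dict, List, Set, Tuple


def lrfs_schedule(jobs: Dict[int, List[int]],
                  procs: Dict[int, Set[str]],
                  buffer_of: Dict[int, int],
                  active: Set[int]) -> List[Tuple[int, int, int]]:
    """Start activities until the schedule is maximal.

    jobs maps each buffer to the counters of its waiting jobs (mutated).
    active is the set of activities already in service (kept, since the
    network is non-preemptive). Returns (activity, buffer, counter) triples.
    """
    busy: Set[str] = set()
    for j in active:
        busy |= procs[j]
    started: List[Tuple[int, int, int]] = []
    while True:
        # Step 1: non-maximal activities, i.e. idle activities with a
        # non-empty buffer whose processors are all free.
        candidates = [j for j in sorted(procs)
                      if j not in active
                      and jobs[buffer_of[j]]
                      and not (procs[j] & busy)]
        if not candidates:
            break  # Step 4: the schedule is now maximal.
        # Step 2: job with the smallest counter among the candidate buffers.
        buffers = sorted({buffer_of[j] for j in candidates})
        b = min(buffers, key=lambda i: min(jobs[i]))
        counter = min(jobs[b])
        jobs[b].remove(counter)
        # Step 3: pick any candidate activity that serves this buffer.
        j = next(a for a in candidates if buffer_of[a] == b)
        active.add(j)
        busy |= procs[j]
        started.append((j, b, counter))
    return started


# Hypothetical two-processor example: processor P1 serves buffers 1 and 4,
# processor P2 serves buffers 2 and 3.
PROCS = {1: {"P1"}, 2: {"P2"}, 3: {"P2"}, 4: {"P1"}}
BUFFER_OF = {1: 1, 2: 2, 3: 3, 4: 4}
JOBS = {1: [1, 1], 2: [2], 3: [], 4: [2]}

print(lrfs_schedule(JOBS, PROCS, BUFFER_OF, set()))
# [(1, 1, 1), (2, 2, 2)]
```

On processor P1, the job with counter 1 in buffer 1 is started ahead of the job with counter 2 in buffer 4, which is the least-routed-first priority.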

We next introduce the maximal scheduling policy which is of primary interest in this paper. It uses a small parameter $\varepsilon$ in order to deal with unbounded route lengths.

Definition 3.2 ($\varepsilon$-Least Routed First Served ($\varepsilon$-LRFS) policy).

For each , whenever a new arrival and/or service completion occurs at time in component , execute the following algorithm immediately after all arrivals and service completions have occurred:

  1. Find a buffer containing a job with the largest counter among those in buffers .

  2. Find the set of non-maximal activities in .

  3. If and ,

    3-1. With probability $\varepsilon$, choose an arbitrary activity to process the job identified in step 1, i.e., set . With probability $1-\varepsilon$, do nothing.

    3-2. Set .

  4. Execute the LRFS policy for component .

We remark that the $\varepsilon$-LRFS policy is identical to the LRFS policy when either or , i.e., execution of the first three steps is not necessary in these cases. Since we assume that each timer decreases at unit rate, step 3-1 can be executed at most once per component in any time interval of unit length.
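The randomized step that precedes LRFS in Definition 3.2 can be sketched as below. Several details are assumptions, because the precise trigger and reset are not legible in the text above: the sketch fires step 3 only when the component timer has reached zero and some free activity can serve the chosen buffer, and it resets the timer to one afterwards (consistent with the remark that step 3-1 runs at most once per unit time). All names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set, Tuple


def epsilon_step(jobs: Dict[int, List[int]],
                 procs: Dict[int, Set[str]],
                 buffer_of: Dict[int, int],
                 busy: Set[str],
                 timer: float,
                 epsilon: float,
                 rng: random.Random
                 ) -> Tuple[Optional[Tuple[int, int, int]], float]:
    """Return (started, new_timer); started is (activity, buffer, counter) or None."""
    nonempty = [i for i in sorted(jobs) if jobs[i]]
    if not nonempty:
        return None, timer
    # Step 1: buffer holding a job with the largest counter.
    b = max(nonempty, key=lambda i: max(jobs[i]))
    # Step 2: activities for that buffer whose processors are all free.
    free = [j for j in sorted(procs)
            if buffer_of[j] == b and not (procs[j] & busy)]
    # Step 3 (assumed trigger): timer expired and a free activity exists.
    if timer > 0 or not free:
        return None, timer
    started = None
    if rng.random() < epsilon:  # step 3-1: act with probability epsilon
        counter = max(jobs[b])
        jobs[b].remove(counter)
        busy |= procs[free[0]]
        started = (free[0], b, counter)
    return started, 1.0  # step 3-2 (assumed): reset the timer to one


# Hypothetical single-processor component serving buffers 1 and 4.
PROCS = {1: {"P1"}, 4: {"P1"}}
BUFFER_OF = {1: 1, 4: 4}
print(epsilon_step({1: [1], 4: [5]}, PROCS, BUFFER_OF, set(), 0.0, 1.0,
                   random.Random(0)))
# ((4, 4, 5), 1.0)
```

After this step, the plain LRFS routine would be run on whatever capacity remains, as in step 4 of the definition.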

The network process.

Write for the queue length of jobs with counter in buffer at time . Let be the remaining service requirement of the job with counter in buffer at time if it is processed by activity , and set if is not processing a job with counter . The network state is described by

Note that does not encode information on the external arrival processes. In particular, is non-Markovian in general. We impose the convention that has right-continuous sample paths. We define a norm on through

We assume that where . In synchronized networks, one has for all under the $\varepsilon$-LRFS policy. We refer to the process operating under the $\varepsilon$-LRFS policy as the $\varepsilon$-LRFS process. If , then we simply refer to this process as the LRFS process.

Network stability.

This paper uses the following notion of stability.

Definition 3.3.

The $\varepsilon$-LRFS process is called queue-length-stable if

(2)

for any given initial state .

We establish queue-length stability by constructing appropriate Lyapunov functions. Under some additional assumptions on the arrival processes and service time distributions, these Lyapunov functions can also be used to establish positive recurrence of the $\varepsilon$-LRFS process, cf. Condition (A3) in [4]. It is outside the main scope of the current paper to work out the details. We also remark that our proof yields an explicit finite constant for the right-hand side of (2), but this requires tedious bookkeeping and we therefore do not carry out this analysis.

To show the desired stability, we need the following notion of a ‘local’ Lyapunov function.

Definition 3.4.

We say that is a local Lyapunov function with slack parameter if there exist constants , such that for every pair satisfying ,

(3)

where .

The reason for this nomenclature is that in a ‘local’ network, i.e., a network without routing (), the above inequality for provides the desired negative drift condition in the Foster-Lyapunov criteria [10], which implies network stability. Here , , and can be interpreted as the immediate workload, the external workload arrival rate and the workload processing rate at buffer , respectively. We refer to Sections 4 and 5 for examples of local Lyapunov functions.

Now we are ready to state the main theorem of this paper, which establishes ‘global’ stability using a ‘local’ quadratic Lyapunov function.

Theorem 3.5.

Suppose that there exists a symmetric matrix such that is a local Lyapunov function with slack . Then the $\varepsilon$-LRFS process is queue-length-stable if one of the following conditions C1 or C2 is satisfied:

  • (C1) The network is synchronized and only if buffers and are in the same component.

  • (C2) only if buffers and are activity-interchangeable and every buffer has an associated activity with .

Furthermore, if all routes are bounded, then the LRFS process is queue-length-stable if one of the conditions C1 or C2′ is satisfied, where

  • (C2′) only if buffers and are activity-interchangeable.

Theorem 3.5 implies that if any maximal policy is ‘quadratic’ stable in a stochastic processing network without routing under the external load , then $\varepsilon$-LRFS is stable in a stochastic processing network with routing under nominal load for some small $\varepsilon > 0$. Section 4 describes the main idea of the proof, and a full proof is presented in Section 6.

We remark that the requirement of activity-interchangeable buffers in Condition C2 can be relaxed slightly. Our proof of Theorem 3.5 also works when the following relaxed condition C3 replaces C2.

  • (C3) only if for every , there exists such that , and vice versa (i.e., for every , there exists such that ). Also, every buffer has an associated activity with .

A corresponding condition C3′ can also replace C2′, where C3′ does not require the second part of C3.

4 Proof Ideas for Theorem 3.5

In this section, we describe the main idea in the proof of Theorem 3.5. We first present it in a very special network, the Rybko-Stolyar network [20], which allows us to summarize the main idea of the proof at a high level. We subsequently describe the challenges that have to be overcome to establish our result in the general case, and discuss the feasibility of an approach based on fluid models.

4.1 Rybko-Stolyar Network

This network consists of four activities associated to four different buffers and two processors. The first processor is required for activities 1 and 4, and the second for activities 2 and 3. Each activity decreases the remaining service requirement of the job it is currently processing at unit rate (i.e., ). Customers (or jobs) arrive at the first and third buffers, and traverse the buffers deterministically in the order or . The service time is deterministic and equal to for buffer , and the external arrival processes (at the first and third buffers) are independent Poisson processes with rate 1. The network is given in Figure 1, and a necessary condition for stability is

(4)
Figure 1: Illustration of the Rybko-Stolyar network. In this diagram, there are four jobs in the system. Jobs currently being processed are colored black, and jobs waiting for service are colored white.
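As a small numerical illustration of the nominal-load requirement, the sketch below computes the load on each processor from the description above: every buffer sees effective arrival rate 1 (each external stream passes through two buffers in sequence), processor 1 serves buffers 1 and 4, and processor 2 serves buffers 2 and 3. The function names and the sample service times are hypothetical, and the check "load below one on each processor" is the usual reading of a nominal underload condition, stated here as an assumption.

```python
from typing import Dict


def processor_loads(m: Dict[int, float], rate: float = 1.0) -> Dict[str, float]:
    """Nominal load per processor for the four-buffer network described above."""
    # Every buffer is visited once by exactly one external stream of the
    # given rate, so each buffer's effective arrival rate equals that rate.
    effective_rate = {i: rate for i in (1, 2, 3, 4)}
    stations = {"processor 1": (1, 4), "processor 2": (2, 3)}
    return {name: sum(effective_rate[i] * m[i] for i in buffers)
            for name, buffers in stations.items()}


def nominally_underloaded(m: Dict[int, float], rate: float = 1.0) -> bool:
    """True if every processor has nominal load strictly below one."""
    return all(load < 1.0 for load in processor_loads(m, rate).values())


# Hypothetical deterministic service times for buffers 1..4.
service_times = {1: 0.1, 2: 0.6, 3: 0.1, 4: 0.6}
print(processor_loads(service_times))
print(nominally_underloaded(service_times))
```

With these sample values both processors carry load of about 0.7, so each is nominally underloaded.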

The network state can be described by , where is the queue length (i.e., the number of jobs waiting) of buffer at time and is the remaining service time of the job currently being processed in buffer at time . Hence if no job is being processed in buffer . We define a norm through . In this network, it is known [20] that if processor 1 prioritizes buffer 4 and processor 2 prioritizes buffer 2, the network can be unstable even when the necessary condition (4) holds. On the other hand, LRFS prioritizes buffers 1 and 3, so it reduces to the First Buffer First Served policy. This is known to be stable under the necessary stability condition (4). We next derive this stability result using our main idea.

Assuming LRFS and (4), we construct an appropriate Lyapunov function satisfying

(5)

where are some constants. By first taking expectations with respect to the distribution of and then integrating over on both sides of the above inequality, we conclude that

(6)

We now proceed toward constructing the ‘global’ Lyapunov function satisfying (5) based on a ‘local’ quadratic Lyapunov function . To this end, we first discuss how to construct the ‘local’ quadratic Lyapunov function. Consider a single-processor system with two buffers and , deterministic service times given by and , and independent Poisson arrival processes with rate at each buffer (i.e., the total rate is ). Hence, a necessary condition for stability is

Under a maximal (i.e., work-conserving) scheduling policy, the workload at time satisfies

where is defined as the immediate workload at buffer as in (1), is the number of jobs arriving at buffer during the time interval so that and we define

Hence, we have

(7)

where we use that for since we assume deterministic service times. Using this, it follows that for some finite constant ,

(8)

where we use for the first inequality. This shows that is a suitable Lyapunov function for our ‘local’ single-processor system under the necessary stability condition .
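The negative drift of the squared workload in the single-processor system can be checked with a short calculation. The sketch below is an illustration under stated assumptions rather than the paper's derivation: arrivals at the two buffers are independent Poisson processes of a common rate (taken to be 1 by default), and the current total workload w is at least one, so a work-conserving server is busy throughout the next unit of time and W(t+1) = w - 1 + X, where X is the work arriving in that interval. Then E[X] = rate (m1 + m2) and Var[X] = rate (m1^2 + m2^2), which gives the closed form used in the code. The function name is hypothetical.

```python
def quadratic_drift(w: float, m1: float, m2: float, rate: float = 1.0) -> float:
    """Exact E[W(t+1)^2 - W(t)^2 | W(t) = w] for total workload w >= 1."""
    if w < 1.0:
        raise ValueError("formula assumes the server stays busy, i.e. w >= 1")
    rho = rate * (m1 + m2)                # mean work arriving per unit time
    variance = rate * (m1 ** 2 + m2 ** 2)  # variance of that arriving work
    # E[(w - 1 + X)^2] - w^2 = -2 w (1 - rho) + Var[X] + (1 - rho)^2
    return -2.0 * w * (1.0 - rho) + variance + (1.0 - rho) ** 2


# Hypothetical service times with total load 0.7 < 1.
for w in (1.0, 10.0, 100.0):
    print(w, quadratic_drift(w, 0.3, 0.4))
```

The drift is linear in w with slope -2(1 - rho), so it is negative for all sufficiently large workloads whenever the load rho is below one, which is the shape of bound that a quadratic Lyapunov function is meant to deliver.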

This observation on the single-processor system motivates the following quadratic local Lyapunov function for the Rybko-Stolyar network:

(9)

One can easily check that it satisfies (3) with (small) slack under the necessary stability requirements and . We propose the following global Lyapunov function :

where the new parameters and shall be defined explicitly. We remind the reader that our goal is to prove (5).

First, a similar calculation as for the single-processor case in (7) yields that under the LRFS policy,

Hence, as for (8), one can conclude that for some constant ,

(10)

where the precise value of can be different from line to line.

We note that the sum is not a suitable choice for since it does not include and (or and ). To address this issue, we further use

We refer to and as the total workload in buffer and , respectively. Using this notation, one finds that under the LRFS policy,

The above equalities can be used to obtain ‘negative drift terms’ for and , which are missing in (10) and (11). Namely, for some constant , we obtain

(12)

where and . Similarly,

(13)

Observe that there are positive terms and in (12) and (13), respectively. The key idea behind our proof is that the positive terms can be canceled out by appropriately summing (10), (11), (12) and (13). Indeed, we define the desired Lyapunov function as

where we choose . Combining (10), (11), (12) and (13), we conclude that

This completes the proof of (5), and hence the desired stability (6).

4.2 Beyond the Rybko-Stolyar Network

The preceding subsection presents the main idea behind our construction of a ‘global’ Lyapunov function using a ‘local’ Lyapunov function (i.e., from the single-processor system) in the specific example of the Rybko-Stolyar network. The construction of relies on summing terms inductively by exploiting certain maximality properties of the LRFS policy at each iteration. In general networks there are several difficulties which do not arise in the Rybko-Stolyar network, and this section discusses the ideas and arguments needed to overcome them.

A first challenge we have overcome arises in networks with unbounded route lengths (i.e., ). In that case, the above inductive procedure does not terminate. For this reason, we propose a variant of the LRFS policy, the $\varepsilon$-LRFS policy, which occasionally processes a job with the largest counter. Intuitively speaking, this additional mechanism in $\varepsilon$-LRFS can control the jobs with large counters, whereas LRFS cannot.

A second challenge we have surmounted is that the construction of in the Rybko-Stolyar network starts from a simple local Lyapunov function in a single-server system, but it is not clear whether similar arguments go through for general local Lyapunov functions and stochastic processing networks. We require Condition C2 to resolve this issue. It is readily seen that the local Lyapunov function (9) used in the Rybko-Stolyar network satisfies this condition. The condition can be relaxed under some additional conditions on the arrival processes and service time distributions. For example, in synchronized networks, Condition C1 can be used instead of C2.

A further challenge in the general case relates to the definition of . In the Rybko-Stolyar network, it is the sum of workloads along a path of buffers, with as the last buffer. This definition only applies to networks with deterministic routing. In the general case we use several notions of total workload. To allow for stochastic routing, we construct a new process from with deterministic routing. This process is essentially identical to , but we enlarge the state space to incorporate routing information. We construct a Lyapunov function for , which we use to establish the stability of and hence the stability of .

In summary, we construct the Lyapunov function for general networks as the sum of three parts:

The specific notation used here is not important; we refer readers to Section 6 for the definitions used. To prove stability, we need to argue that this function satisfies a so-called negative-drift condition. The first term, i.e., the finite sum, comes from the inductive construction under LRFS, appropriately truncated. For the Rybko-Stolyar network, this is the only part we need. The first part produces the desired negative drift for jobs with low counters, but it gives a positive drift in terms of remaining service requirements as a by-product (albeit not in the Rybko-Stolyar network under the assumptions of the preceding subsection). The second term in our Lyapunov function () has a negative drift and compensates for the positive drift incurred by the first term. The third term in our Lyapunov function () controls the high-counter jobs under the mechanism which is present in the $\varepsilon$-LRFS policy but not in the LRFS policy (step 3 in Definition 3.2). This additional mechanism allows us to establish a negative drift for the last term. By appropriately weighting each of the three terms, we derive the desired negative drift condition for the Lyapunov function .

4.3 Connection with Fluid Models

As mentioned in the previous subsection, our approach relies on an inductive argument based on job counters. For the Rybko-Stolyar network and more generally for multiclass networks, fluid models can be used to give relatively simple proofs of our results. Thus, a more detailed discussion on the connection with fluid models together with its pros and cons is warranted.

The fluid approach consists of two main steps. In the first step, by scaling time and space, one proves convergence of the queueing process to the solution of a system of deterministic equations known as the fluid model. In the second step, one proves that this fluid model is stable, i.e., that it eventually reaches the origin. Stability of the fluid model can be established through the construction of a Lyapunov function for the fluid model, or in some cases one can obtain fluid stability through direct methods such as induction. Once fluid stability has been established, one can apply general theorems to deduce that the stochastic model is also stable (in a certain sense), see for instance [3].

It might be possible to establish existence of a fluid model and to prove that the stochastic model converges to the fluid model in the setting of the present paper, and it can be expected that our ‘global’ Lyapunov function should work to prove fluid stability. Comparing our approach with this proof strategy, a disadvantage of the fluid model is that one needs to establish convergence to the fluid model, while a disadvantage of our approach is that we have to keep track of detailed state information such as residual service times.

Another possible approach to establish fluid stability is to use an inductive argument, which may seem particularly attractive given our construction of job counters and the suitability of an induction argument in existing work on fluid models [7]. However, this approach has inherent challenges. The base step in an inductive approach could use the ‘local’ Lyapunov function to argue that the fluid level of jobs with counter 1 vanishes after some finite time . It would then use to argue that the fluid level of jobs with counter 2 vanishes after some finite time , and so forth. To carry out this argument, one has to show that satisfies a certain negative-drift condition under the assumption that high-priority counter 1 jobs vanish on a fluid scale. The latter only yields a guarantee on the ‘average’ or ‘long-run’ behavior of the jobs with counter 1, whereas one needs ‘short-term’ network state information to establish the negative-drift condition for jobs with counter 2. Indeed, under our scheduling policy, jobs with counter 1 (even when vanishing on a fluid scale) can significantly influence the dynamics of jobs with counter 2 depending on the complexity of the network. Therefore, the base of the induction approach is too weak to be used in the induction step for general networks since one needs more detailed information than the time-average given by the fluid approach.

In special cases such as multiclass networks, one may not need quadratic Lyapunov functions and it may be possible to establish the stability of our counter-based policy using fluid induction without quadratic Lyapunov functions. However, in general (e.g., for networks of switches), we need quadratic Lyapunov functions since they are the only available tool to establish stability for single-hop networks.

5 Examples

In this section, we provide applications of Theorem 3.5 to various special stochastic processing networks. We consider parallel server networks (including multiclass queueing networks) in Section 5.1 and communication networks (including wireless networks and networks of input-queued switches) in Section 5.2. They are examples of non-synchronized and synchronized networks, respectively. In all of these important examples, suitable local Lyapunov functions are easy to find.

5.1 Open Multiclass Queueing Networks and Parallel Server Networks

In this section, we consider special stochastic processing networks known as parallel server networks. These networks are characterized by the following assumption.

  • (A1) Each activity is processed by exactly one processor and processes exactly one buffer, i.e.,

Figure 2 illustrates the relations between buffers, activities and processors in parallel server networks. Our notion of ‘parallel server network’ generalizes the well-studied parallel server systems [11] by adding stochastic routing dynamics between buffers. It also includes open multiclass queueing networks [3] as a special case, which additionally require

(14)

In open multiclass queueing networks, buffers and activities are in one-to-one correspondence and they are referred to as classes. The Rybko-Stolyar network in Section 4 is an instance of an open multiclass queueing network.

Figure 2: Two examples of parallel server networks. They do not satisfy Assumption A2. The leftmost diagram illustrates a parallel server network with five buffers , three processors and eight activities . Once a job in buffer completes its service requirement, it joins buffer (i.e., ), respectively. Once a job in buffer completes its service requirement, it leaves the network. The rightmost diagram illustrates a parallel server network consisting of three ‘local’ parallel server systems where jobs are routed between local systems.

A parallel server network naturally defines a bipartite graph such that each activity in defines an edge between buffers and processors . Requirement (14) of open multiclass queueing networks imposes the additional restriction that each vertex in has degree one. We further consider the following strengthening of Assumption A1.

  • is a union of disjoint complete bipartite graphs , i.e.,

This assumption implies that two buffers in the same component are activity-interchangeable, even though they may differ with respect to routing, external arrivals or service requirements. One can easily check that open multiclass queueing networks always satisfy this assumption, while the parallel server networks in Figure 2 do not. Assumption A2 is useful because it enables us to establish a necessary condition for stability and it allows us to find a suitable local Lyapunov function satisfying Condition C2 of Theorem 3.5. However, Theorem 3.5 is applicable to general networks as long as one can find a ‘good’ local Lyapunov function satisfying Condition C2. Figure 3 gives examples of parallel server networks satisfying Assumption A2.
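The structural requirement in Assumption A2 can be checked mechanically. The following is a minimal sketch, not part of the original development: it assumes the network is given as a list of (buffer, processor) pairs, one per activity, and the function name `satisfies_a2` and the labels in the usage note are hypothetical. It relies on the observation that a bipartite graph is a disjoint union of complete bipartite graphs exactly when any two buffers sharing a processor have identical sets of adjacent processors.

```python
from collections import defaultdict


def satisfies_a2(edges):
    """Check whether a buffer-processor graph is a disjoint union of
    complete bipartite graphs.

    `edges` is an iterable of (buffer, processor) pairs, one per activity.
    This input format is an assumption made for illustration only.
    """
    procs_of = defaultdict(set)  # buffer -> processors that can serve it
    bufs_of = defaultdict(set)   # processor -> buffers it can serve
    for buf, proc in edges:
        procs_of[buf].add(proc)
        bufs_of[proc].add(buf)

    # Two buffers that share a processor lie in the same component, so in a
    # union of complete bipartite graphs they must see the same processors.
    for buf, procs in procs_of.items():
        for proc in procs:
            for other in bufs_of[proc]:
                if procs_of[other] != procs:
                    return False
    return True
```

For instance, a graph in which every buffer has degree one, as in open multiclass queueing networks, passes the check: `satisfies_a2([("b1", "p1"), ("b2", "p1"), ("b3", "p2")])` returns `True`. A graph with edges `b1-p1`, `b1-p2`, `b2-p2` fails it, because `b2` cannot be served by `p1`.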

Figure 3: Two examples of parallel server networks satisfying Assumption A2. The rightmost diagram illustrates the Rybko-Stolyar network described in Section 4.

Necessary condition for stability.

We now aim to obtain a necessary condition for stability of a parallel server network. Under Assumption A2, a necessary condition to stabilize the network is that for every ,

(15)

It is clear that the above condition is required for stability since and describe the total nominal load and the maximum processing rate, respectively, at the local component .

Local Lyapunov function.

As in Section 4, the single-processor example is the main building block. We define the local Lyapunov function as

(16)

We now show that this function satisfies (3) with some slack as long as the necessary condition (15) for stability is satisfied. For given vectors and , maximality implies that, on writing ,

where we use Assumption A2 and we recall that if , and otherwise. Thus, we have

where is some constant and we define Therefore, is a local Lyapunov function with slack for

where the right-hand side is positive if (15) holds.

Stability of LRFS policies.

We now formulate the main results of this paper for open multiclass networks and parallel server networks. Under Assumption A2, the local Lyapunov function (16) satisfies Condition C2 of Theorem 3.5. Therefore, we obtain the following proposition as a corollary. We remind the reader that open multiclass queueing networks are special instances of parallel server networks, and that Assumption A2 automatically holds for these networks.

Proposition 5.1.

If a stochastic processing network satisfies Assumption A2 with for all , then

  • The -LRFS process is queue-length-stable for any

  • The LRFS process is queue-length-stable if all routes are bounded in length.

We note that the -LRFS policy admits a simpler description in a stochastic processing network satisfying Assumption A2, since a job can be processed by any processor in the partition, i.e., in Definition 3.2 is non-empty whenever a processor is idle and capable of processing a job. Indeed, the -LRFS policy reduces to the following work-conserving randomized priority policy: whenever a processor is idle at time and there are jobs capable of being processed by ,

  • Process a job with the smallest counter with probability , otherwise process a job with the largest counter.

  • Set if ,

where is the (local) component of buffers associated with processor . Proposition 5.1 implies that the -LRFS policy can achieve ‘almost’ the full capacity region (15) by choosing a small .
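The job-selection step of this randomized priority rule can be sketched as follows. This is an illustrative sketch only: the function name `pick_job`, the list-of-counters input, and the parameter `p_smallest` (the probability of serving a smallest-counter job, which we assume to be close to one) are hypothetical, and the counter update in the second bullet is deliberately left out.

```python
import random


def pick_job(counters, p_smallest, rng=random):
    """Return the index of the job an idle processor serves next.

    `counters` lists the counters of the jobs the processor can serve.
    `p_smallest` is an assumed parameter: the probability of choosing a job
    with the smallest counter; otherwise a largest-counter job is chosen.
    """
    if not counters:
        raise ValueError("no job available for this processor")
    if rng.random() < p_smallest:
        target = min(counters)  # prioritize the least-routed job
    else:
        target = max(counters)  # occasionally serve the most-routed job
    return counters.index(target)  # ties are broken by list position
```

With `p_smallest = 1.0` the rule always serves a smallest-counter job, so `pick_job([3, 1, 2], 1.0)` returns index `1`.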

Our proof of Proposition 5.1 gives an alternative proof of some results that have been established using fluid model techniques. For example, in reentrant lines, the -LRFS policy for is identical to the well-known First Buffer First Served (FBFS) policy. Our proposition implies that the FBFS policy is throughput optimal in all reentrant lines, which was originally proved in [7].

5.2 Communication Networks

We now consider examples of synchronized stochastic processing networks described in Section 2, i.e., , for all . In particular, we consider the following additional assumption on synchronized stochastic processing networks.

  • Each buffer has exactly one associated activity, i.e.,

    Hence, we write .

We again remark that Assumption B1 facilitates a suitable local Lyapunov function for Theorem 3.5. However, even if Assumption B1 does not hold, Theorem 3.5 is applicable to synchronized stochastic processing networks as long as one can find a ‘good’ local Lyapunov function. Synchronized stochastic processing networks satisfying Assumption B1 include various communication network models of unit-sized packets: networks of input-queued switches [18, 6], wireless network models with primary interference constraints [21] and independent-set interference constraints [22]. We refer to the corresponding references for detailed descriptions of the network models. As a concrete example, we write out the details of the wireless network model with primary interference constraints.

Wireless networks with primary interference constraints.

Consider a network of nodes represented by and a set of directed paths . Unit-size packets arrive at the ingress node of each path as per an exogenous arrival process. Assume that the network is synchronized, i.e., each packet departs from a node at time and arrives at the next node on its route at time . The primary interference constraint means that each node can either send or receive (it cannot do both) one packet at a time. A scheduling policy (or algorithm) decides which packets transmit at each (discrete) time instant. Figure 4 illustrates a wireless network of four nodes with primary interference constraints.
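To make the primary interference constraint concrete, the sketch below checks whether a set of simultaneously active links is feasible, i.e., whether every node takes part in at most one transmission. The representation of a link as a (sender, receiver) pair and the name `is_feasible_schedule` are assumptions made for illustration.

```python
def is_feasible_schedule(links):
    """Check the primary interference constraint for one time slot.

    `links` is an iterable of (sender, receiver) node pairs that transmit
    simultaneously. The schedule is feasible if no node appears in more
    than one active link, so the active links form a matching.
    """
    busy = set()  # nodes already sending or receiving in this slot
    for sender, receiver in links:
        if sender == receiver or sender in busy or receiver in busy:
            return False
        busy.update((sender, receiver))
    return True
```

For example, links `(1, 2)` and `(3, 4)` can be active in the same slot, whereas `(1, 2)` and `(2, 3)` cannot, since node 2 would have to receive and send at once.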

Figure 4: Example of a wireless network with primary interference constraints with four nodes and five paths . There is a buffer on each (directed) edge on each path, i.e., six buffers in total. Unit-size packets arrive at the ingress buffer (i.e., the first node) of each path. In the leftmost diagram, two links and are transmitting (unit-sized) packets. Once a packet is transmitted, it leaves the network if it arrives at the destination node (i.e., the last node on its path). The rightmost diagram illustrates the corresponding stochastic processing network, where the relation between buffers, activities and processors induces a hypergraph (each buffer requires two processors).

Necessary condition for stability.

As in the parallel server networks, one can obtain the following necessary condition to stabilize a stochastic processing network satisfying Assumption B1: for all ,

(17)

This is because Assumption B1 implies that each buffer has at most one associated activity, and the processing rate is .

Local Lyapunov function.

We consider the following local Lyapunov function:

We now proceed toward proving that condition (3) holds. For given vectors and , maximality implies that