General notions of indexability for queueing control and asset management
Abstract
We develop appropriately generalized notions of indexability for problems of dynamic resource allocation where the resource concerned may be assigned more flexibility than is allowed, for example, in classical multiarmed bandits. Most especially we have in mind the allocation of a divisible resource (manpower, money, equipment) to a collection of objects (projects) requiring it in cases where its overconcentration would usually be far from optimal. The resulting project indices are functions of both a resource level and a state. They have a simple interpretation as a fair charge for increasing the resource available to the project from the specified resource level when in the specified state. We illustrate ideas by reference to two model classes which are of independent interest. In the first, a pool of servers is assigned dynamically to a collection of service teams, each of which mans a service station. We demonstrate indexability under a natural assumption that the service rate delivered is increasing and concave in the team size. The second model class is a generalization of the spinning plates model for the optimal deployment of a divisible investment resource to a collection of reward generating assets. Asset indexability is established under appropriately drawn laws of diminishing returns for resource deployment. For both model classes numerical studies provide evidence that the proposed greedy index heuristic performs strongly.
DOI: 10.1214/10-AAP705. Volume 21, Issue 3 (2011), pages 876-907.
Kevin D. Glazebrook (corresponding author; k.glazebrook@lancaster.ac.uk, http://www.lums.lancs.ac.uk/profiles/kevinglazebrook/), David J. Hodge (david.hodge@nottingham.ac.uk) and Chris Kirkbride (c.kirkbride@lancaster.ac.uk)
K. D. Glazebrook and D. J. Hodge were supported by EPSRC Grant EP/E049265/01. C. Kirkbride was supported by an RCUK Fellowship.
AMS subject classifications: Primary 68M20; secondary 90B22, 90B36.
Keywords: Asset management, dynamic programming, dynamic resource allocation, full indexability, index policy, Lagrangian relaxation, monotone policy, queueing control.
1 Introduction
A notable, now classical, contribution to the theory of dynamic resource allocation was the elucidation by Gittins [git79, git89] of index-based solutions to a large family of multiarmed bandit problems (MABs). This is a class of models concerned with the sequential allocation of effort, to be thought of as a single indivisible resource, to a collection of stochastic reward generating projects (or bandits as they are sometimes called). Gittins demonstrated that optimal project choices are those of highest index. The idea that strongly performing policies are determined by simple, interpretable calibrations (i.e., indices) of decision options is an attractive and powerful one and offers crucial computational benefits. There is now a substantial literature describing extensions to and reformulations of Gittins’ result. Some key contributions are cited in the recent survey of Mahajan and Teneketzis [mah07].
Whittle [whi88] introduced a class of restless bandit problems (RBPs) as a means of addressing a critical limitation of Gittins’ MABs, namely, that projects should remain frozen while not in receipt of effort. In RBPs, projects may change state while active or passive, though according to different dynamics. However, this generalization is bought at great cost. In contrast to MABs, RBPs are almost certainly intractable, having been shown to be PSPACE-hard by Papadimitriou and Tsitsiklis [pap99]. Whittle [whi88] proposed an index heuristic for those RBPs which pass an indexability test. This heuristic reduces to Gittins’ index policy in the MAB case. Whittle’s index emerges from a Lagrangian relaxation of the original problem and has an interpretation as a fair charge for the allocation of effort to a particular project in a particular state. Weber and Weiss [web90] established a form of asymptotic optimality for Whittle’s heuristic under given conditions. More recently, several studies have demonstrated the power of Whittle’s approach in a range of application areas. These include the dynamic routing of customers for service [arg08, gla07], machine maintenance [gla05], asset management [gla06] and inventory routing [arc08].
The above classical models and associated theory are undeniably powerful when applicable. However, the scope of their applicability is heavily constrained by the very simple view the models take of the resource to be allocated. As indicated above, in Gittins’ MAB model a single indivisible resource is allocated wholly and exclusively to a single project at each decision epoch. In Whittle’s RBP formulation, parallel server versions of this are allowed. Many applications, however, call for the allocation of a divisible resource (e.g., money, manpower or equipment) in situations where its overconcentration would usually be far from optimal. This is the case, for example, in the problem concerning the planning of new product pharmaceutical research which was discussed by Gittins [git89] and which provided practical motivation for his pioneering contribution. This paper records the first outcomes of a major research program whose goal is to develop a usable and effective index theory for such problems.
In Section 2 we present a general model for dynamic resource allocation. Both Gittins’ MABs and Whittle’s RBPs may be recovered as special cases, as may the recent model of [gla08], which extends Gittins’ MABs such that bandit activation consumes amounts of the available resource which may vary by bandit and state. Our general model allows for resource to be applied at a range of levels to each constituent project, subject to some overall constraint on the total rate at which resource is available. A notion of (full) indexability which generalizes that of Whittle for RBPs is developed. Any project which is fully indexable has an index which is a function both of a given resource level and of a given state. The index may be understood as a fair charge for raising the project’s resource level above the given level when in the given state. We discuss how to use such indices to develop heuristics for dynamic resource allocation when all projects are fully indexable.
In Sections 3 and 4 we use the ideas and methods of Section 2 to construct index heuristics for the dynamic allocation of a divisible resource in the context of two model classes which are of considerable interest in their own right. In Section 3 we deploy the framework of Section 2 to develop heuristics for the dynamic allocation of a pool of servers to service stations (or customer classes) at which queues may form. This model is able to capture situations where, for example, each of several customer classes is served by a dedicated team of specialists. Additionally, higher level generalist servers are available for deployment across the customer classes to supplement the specialist teams as demand dictates. Deployment of generalists to a customer class enhances the local specialist team, which then delivers service collectively at a rate depending on the number of generalists deployed. An assumption that the service rate functions are increasing and concave reflects a law of diminishing returns as service teams grow. The problem of determining how the pool of generalists should be deployed across the customer classes in response to queue length information is formulated as a dynamic resource allocation problem of the kind discussed in Section 2. The analysis which establishes full indexability in Section 3 markedly adds to the queueing control literature in establishing monotonicity with respect to service costs of optimal policies for a derived problem involving a single queue. An algorithm is given for the computation of indices. A numerical study featuring nearly 10,000 two-station problems provides evidence that a greedy index heuristic for allocating the common service pool is close to optimal throughout.
The model class studied in Section 4 generalizes the so-called spinning plates model discussed by Glazebrook, Kirkbride and Ruiz-Hernandez [gla06]. It is a flexible finite state model class in which a divisible investment resource is available to drive improvements to the (reward) performance of reward generating assets, which in the absence of any such resource deployment will tend to deteriorate. Positive investment both arrests an asset’s tendency to deteriorate and enhances asset performance by enabling movement of the asset state toward those in which its reward generating performance will be stronger. Full indexability for assets is established under laws of diminishing returns as asset investment levels grow. This considerably extends the work of Glazebrook, Kirkbride and Ruiz-Hernandez [gla06]. A numerical study which features 14,000 two-asset problems testifies to the strong performance of the greedy index heuristic in comparison to optimum and to competitor policies. Conclusions and proposals for further work are discussed in Section 5.
2 A model for dynamic resource allocation
We propose a semi-Markov decision process (SMDP) formulation of the problem of dynamically allocating a resource to a collection of stochastic projects. This formulation includes Gittins’ MABs and Whittle’s RBPs as special cases. In our SMDP each project is characterized by its (finite or countable) state space, its highest activation level, its cost rate function, its resource consumption function and its Markov transition law. The model is in continuous time. The state of the process collects the states of the individual projects. In the SMDP an action must be taken at time zero and after each (state) transition of the process. The action specifies the resource level to be applied to each project. The lowest level indicates that resource at a minimal level (usually none) is to be applied to the project (the project is passive), while the highest activation level indicates a maximal resource allocation. A resource level applied to a project in a given state leads to a consumption of resource at a rate which depends on both the level and the state and which is increasing in the level. In the major examples discussed in the upcoming sections the resource level is identified with the resource consumed. A resource level applied to a project in a given state also incurs costs at a rate which depends on both. Both cost and resource consumption rates are additive over projects, and it will be convenient to work with the resulting system-wide cost and consumption rates. An action is admissible in a given process state if its total rate of resource consumption does not exceed the rate at which resource is available to the system, which is assumed constant over time. We suppose that . An admissible policy is a rule for taking admissible actions.
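Should a given action be taken when the system is in a given state, the system will remain in that state for an amount of time which is exponentially distributed with rate equal to the sum, over all projects and all possible next project states, of the project transition rates under the resource levels chosen.
The transition which follows will be a change of state within a single project, and occurs with probability equal to the corresponding project transition rate divided by this total rate.
Hence the projects evolve independently, given the choice of action, with each project’s Markov transition law yielding the transition rates for that project. The goal of analysis is the determination of a policy for resource allocation (a rule for taking admissible actions at all decision epochs) which minimizes the average cost per unit time incurred over an infinite horizon.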
To develop ideas and notation we consider the set of deterministic, stationary, Markov (DSM) and admissible policies, each determined by a function which maps every system state to an action admissible in that state. Fix such a policy. We follow the system state evolving over time together with the corresponding stochastic process of admissible actions taken by the policy. We write
(1) 
for the average cost per unit time incurred under the policy over an infinite horizon from a given initial state. In (1) the expectation is taken over realizations of the system evolving under the policy from that initial state. We shall assume the existence of a policy with finite average cost and consider the minimized cost rate, namely,
(2) 
We shall use the term optimal to denote a policy (assumed to exist) which achieves the infimum in (2) uniformly over initial states. This applies both to the problem in (2) and also to the derived optimization problems we shall discuss later in the account. In the model classes featured in Sections 3 and 4 it will be the case that the average costs in (1) and (2) are independent of the initial state. Henceforth, for simplicity, we shall suppress dependence on the initial state in the notation.
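As a hedged illustration of the quantities described around (1) and (2), a long-run average cost and its minimum are commonly written as below. Every symbol here is an assumption introduced for this sketch and may differ from the authors' notation: $u$ is a policy, $\mathcal{U}$ the class of DSM admissible policies, $\mathbf{x}$ the initial state, $\mathbf{X}(s)$ and $\mathbf{a}(s)$ the state and action processes, and $c$ the system-wide cost rate. The use of a limit superior is also an assumption.

```latex
C^{u}(\mathbf{x})
  = \limsup_{t\to\infty} \frac{1}{t}\,
    E_{u}\!\left[\int_{0}^{t} c\bigl(\mathbf{a}(s),\mathbf{X}(s)\bigr)\,ds
    \,\Big|\, \mathbf{X}(0)=\mathbf{x}\right],
\qquad
C^{\mathrm{opt}}(\mathbf{x}) = \inf_{u\in\mathcal{U}} C^{u}(\mathbf{x}).
```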
We shall use
(3) 
for the average rate at which resource is consumed under the policy. We also write
(4) 
to give a disaggregation of the cost and resource consumption rates into the contributions from individual projects.
In principle, the tools of dynamic programming (DP) are available to determine optimal policies. See, for example, [put94]. However, direct application of DP is computationally infeasible other than for small problems (crucially, those with a small number of projects). Hence, our primary interest lies in the development of heuristic policies which are close to cost minimizing. To this end we relax the optimization problem in (2) by extending the class of policies from the DSM admissible class to those DSM policies which consume resource at an average rate which is no greater than the rate at which resource is available. Hence, we write
(5) 
where in (5), the infimum is taken over the collection of DSM policies satisfying
(6) 
We now relax the problem again by further extending the class of policies and by incorporating the constraint (6) into the objective (5) in a Lagrangian fashion. We write
(7) 
In (7) the infimum is taken over the class of DSM policies which allow, for each project, a free choice of resource level from its full range of activation levels at each decision epoch. It is clear that, for any nonnegative value of the Lagrange multiplier, the minimized value in (7) is a lower bound on that in (5), and hence on the minimized cost rate in (2).
However, the Lagrangian relaxation of our optimization problem expressed by (7) admits, on account both of the policy class involved and the nature of the objective, an additive project-based decomposition. Expressed differently, an optimal policy for (7) operates optimal policies for the individual projects in parallel. In an obvious notation we write
(8) 
where
(9) 
The optimization problem in (9) concerns a single project alone. We refer to it as the single-project problem at the given resource charge. In its objective the Lagrange multiplier plays the role of a charge per unit of time and per unit of resource consumed. An optimal policy for the single-project problem minimizes an aggregate rate of project costs incurred and charges levied for resource consumed. Further, the policy which applies such an optimal single-project policy to each project achieves the infimum in (7) and hence provides a solution to the above Lagrangian relaxation. Note that in what follows the action of a DSM policy in a given state means the resource consumption level which the policy chooses in that state.
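The decomposition described around (7) to (9) can be sketched as follows. This is an illustration under assumed notation, not a reproduction of the authors' displays: $W \ge 0$ is the Lagrange multiplier, $R$ the available resource rate, $C^{u}$ and $R^{u}$ the average cost and average resource consumption rates under policy $u$, and a subscript $k$ marks the contribution of project $k$. Whether the constant $-WR$ appears inside the authors' version of (7) is an assumption.

```latex
% Lagrangian relaxation and its project-by-project decomposition (sketch)
C(W) = \inf_{u}\bigl\{ C^{u} + W\,(R^{u} - R) \bigr\}
     = \sum_{k} C_{k}(W) \;-\; W R,
\qquad
C_{k}(W) = \inf_{u_{k}} \bigl\{ C_{k}^{u_{k}} + W\,R_{k}^{u_{k}} \bigr\}.

% Lower bound: if R^{u} \le R and W \ge 0, then
C^{u} + W\,(R^{u} - R) \;\le\; C^{u}.
```

The second display is the one-line reason a Lagrangian relaxation of this kind bounds the constrained problem from below: the penalty term is nonpositive for every policy that respects the average resource constraint.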
In order to develop natural project calibrations (or indices) which can facilitate the construction of effective heuristics for our original problem (2), we seek optimal policies for the single-project problems in (9) which are structured as in the definition of full indexability below. We first require additional notation. Write
(10) 
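for the set of project states in which a given policy chooses to consume resource at a given level or below.

Definition (Full indexability). A project is fully indexable if there exists a family of DSM policies, one for each value of the resource charge, such that each policy is optimal for the single-project problem at its charge and, for each resource level, the set in (10) is nondecreasing in the charge.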
To summarize the requirements of this definition, a project will be fully indexable if its single-project problem has an optimal policy which, for any given state, consumes an amount of resource which is decreasing in the resource charge. Full indexability enables a calibration of the individual projects as follows.

Definition (Project indices). If a project is fully indexable in the above sense, a corresponding index function is given by
(11) 
The index can be thought of as a fair charge at the project for raising the resource level by one unit above a given level in a given state. Were a resource charge less than the index to be levied, the consumption of the additional resource would be preferable, while if the resource charge were to be in excess of the index, that would not be the case. We shall adopt the convention that the index function is extended to where .
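One way to write an index with the fair-charge interpretation just described is sketched below. The notation is assumed for illustration only: $\Pi_{k}(a, W)$ stands for the set in (10) under the optimal policy at charge $W$, that is, the states of project $k$ in which that policy uses resource level $a$ or below, and $W_{k}(a, x)$ stands for the index at level $a$ in state $x$.

```latex
W_{k}(a, x) \;=\; \inf\bigl\{\, W : x \in \Pi_{k}(a, W) \,\bigr\}.
```

Read this way, the index is the smallest charge at which the optimal single-project policy is content to use no more than level $a$ in state $x$. Because the sets $\Pi_{k}(a, W)$ grow with $W$ under full indexability, charges above this value leave the level at $a$ or below, and charges below it push the level above $a$.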
The following is a simple consequence of the above definitions. Its proof is omitted.
Lemma 1
If a project is fully indexable, its index is decreasing in the resource level, for any fixed state.
Hence, under full indexability, the fair charge for raising the resource level for a project in any state by one unit is decreasing in the resource level from which it is raised.
We now return to consideration of the Lagrangian relaxation in (7) and (8) and suppose that all projects are fully indexable with families of optimal policies
structured as in the definition of full indexability. Under full indexability, all of these policies have a structure describable in terms of the project index functions. Theorem 2 now follows.
Theorem 2
Suppose that all projects are fully indexable with extended index functions . The policy such that
(12)  
achieves the infimum in (7).
According to Theorem 2, this policy constructs actions (allocations of resource) in each system state by accumulating resource at each project until the fair charge for adding further resource drops below the prevailing charge. This is strongly suggestive of how effective, interpretable heuristics for our original dynamic resource allocation problem based on the above indices (fair charges) may be constructed when all projects are fully indexable. A natural greedy index heuristic constructs actions in every system state by increasing resource consumption levels in decreasing order of the above project indices until the point is reached when the resource constraint is violated by additional allocation of resource.
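The construction described for the policy of Theorem 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lagrangian_action`, the toy index `toy_index` and the strict comparison used to break ties with the charge are all hypothetical choices, and resource levels are taken to be integers.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (resource level a, project state x) -> fair charge.
IndexFn = Callable[[int, int], float]


def lagrangian_action(index_fns: Sequence[IndexFn],
                      max_levels: Sequence[int],
                      state: Sequence[int],
                      charge: float) -> List[int]:
    """Per project, keep adding resource while the fair charge for the
    next unit exceeds the prevailing charge (strict comparison assumed)."""
    action = []
    for index_fn, max_level, x in zip(index_fns, max_levels, state):
        a = 0
        while a < max_level and index_fn(a, x) > charge:
            a += 1
        action.append(a)
    return action


def toy_index(a: int, x: int) -> float:
    """Illustrative index only: decreasing in the level a, increasing in x."""
    return x / (a + 1)


# Two projects in states 7 and 4, each with at most 5 units, charge 1.5.
example = lagrangian_action([toy_index, toy_index], [5, 5], (7, 4), 1.5)
```

Each project is treated separately, which mirrors the project-by-project decomposition of the relaxation. Nothing in this construction enforces the overall resource constraint, so the resulting action may use more resource than is available.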
Formally the greedy index heuristic is structured as follows:
Greedy index heuristic
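In a given system state the greedy index heuristic constructs an action (allocation of resource) as follows:
Step 1. The initial allocation applies the lowest resource level to every project. The current allocation always consumes resource at a total rate no greater than that available.
Step 2. Choose any project whose index, evaluated at its current resource level and in its current state, is largest among all projects.
Step 3. The new deployment adds one unit to the resource level of the chosen project (that is, it adds to the current allocation a vector whose component for the chosen project is 1 with zeroes elsewhere), provided that the new deployment satisfies the resource constraint
(13) 
If there is strict inequality in (13), return to Step 1 and repeat. Otherwise, stop and declare the new deployment to be the chosen action. If instead the new deployment would violate the resource constraint, stop and declare the current allocation to be the chosen action.

Remark. We shall use Figure 1 to illustrate the construction of actions by both the policy of Theorem 2 and the greedy index heuristic in a simple problem with two projects, both of which are fully indexable. Section 3 discusses a class of models in which the resource level is identified with the resource consumed and where all projects have the nonnegative integers as state space and a common maximum resource level which is equal to the total rate at which resource is available. Suppose now that the available resource rate is 5 in such a model and fix a system state. Figure 1 indicates values of the appropriate indices for the two projects over the range of resource levels, together with the value of the Lagrange multiplier.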
The policy of Theorem 2 will make allocations of resource supported by those index values which are above the value of the Lagrange multiplier. Hence from Figure 1, the action chosen in this state allocates resource at a total rate of 6. This is an inadmissible action for the original problem since the total resource rate allocated, 6, exceeds the 5 available. The greedy heuristic makes allocations of resource supported by the five largest index values (marked in Figure 1). Plainly, the action taken by the index heuristic allocates all 5 units available. As the system state evolves under the operation of either policy, the index values change as do the implied actions.
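The greedy index heuristic can be sketched as follows. This is a minimal illustration rather than the authors' code, and it rests on labeled assumptions: each unit of resource level consumes exactly one unit of resource (as in the major examples), the budget is an integer, ties are broken in favor of the lowest-numbered project, and the names `greedy_index_action` and `toy_index` are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: (resource level a, project state x) -> fair charge.
IndexFn = Callable[[int, int], float]


def greedy_index_action(index_fns: Sequence[IndexFn],
                        max_levels: Sequence[int],
                        state: Sequence[int],
                        budget: int) -> List[int]:
    """Add resource one unit at a time to the project with the largest
    current index, stopping when the budget is used up.
    Assumes one unit of resource level consumes one unit of resource."""
    action = [0] * len(state)
    used = 0
    while used < budget:
        # Projects already at their highest activation level are skipped.
        candidates = [k for k in range(len(state)) if action[k] < max_levels[k]]
        if not candidates:
            break
        best = max(candidates, key=lambda k: index_fns[k](action[k], state[k]))
        action[best] += 1
        used += 1
    return action


def toy_index(a: int, x: int) -> float:
    """Illustrative index only: decreasing in the level a, increasing in x."""
    return x / (a + 1)


# Two projects in states 7 and 4, at most 5 units each, 5 units available.
example = greedy_index_action([toy_index, toy_index], [5, 5], (7, 4), 5)
```

With these toy indices the five largest index values are 7, 4, 3.5, 7/3 and 2, so three units go to the first project and two to the second. Unlike a rule that compares each index with a fixed charge, the greedy construction never exceeds the budget.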
The major challenge to implementation of the above program for heuristic construction is the identification of optimal policies for the single-project problems which meet the requirements of the definition of full indexability. In Sections 3 and 4 we are able to achieve this in the context of two model classes for which we are able to establish an appropriate form of full indexability. For the Section 3 problem, we also give an algorithm for index computation. For both model classes we proceed to assess the performance of the greedy index heuristic in extensive numerical studies.
We recover Whittle’s RBPs [whi88] by taking every project’s highest activation level to be one, identifying the resource level with the resource consumed and taking the available resource rate to be an integer in the above. Hence there are just two modes of activation (active, passive) of each project, with a fixed number of projects to be made active at each epoch. For this special case the above greedy index heuristic is precisely the index heuristic proposed by Whittle. If we further take that number to be one and impose the requirement that projects can only change state under the active action, we then recover Gittins’ MAB [git79] and its associated (optimal) index policy.
3 The optimal allocation of a pool of servers
We illustrate the above ideas by considering a setup in which service is provided at a number of service stations. These stations could represent distinct geographical locations or facilities dedicated to the service of a particular class of customer. Customers arrive at the stations in independent Poisson streams, each station having its own arrival rate. A pool of servers is available to support service at the stations. Should a given number of servers from the pool be allocated to a station at any point, service there is exponential at a rate which depends on the station and on the number of servers allocated. Note that there may be a local team of servers permanently stationed at a station (i.e., in addition to any allocated from the pool), in which case its service rate is positive even when no pool servers are allocated to it. Note also that we shall suppose that all servers (whether permanently based at a location or allocated there from the common pool) offer service as a team, namely, that they act in concert as a single server. The goal of analysis is the determination of a policy for deploying the common service pool in response to queue length information to minimize some linear measure of holding cost rate for the system incurred over an infinite horizon.
More formally, the system state at any time is the vector whose components are the numbers of customers at the service stations (including any in service) at that time. We shall on occasion refer to such a component as the head count at the station concerned. This system state is observed continuously. The decision epochs for the system are time zero and the times at which the system state changes. At each decision epoch some action is taken, which specifies the number of servers from the central pool to be deployed to each service station, with the total deployed not exceeding the size of the pool. Should a given action be taken in a given state then an exponentially distributed amount of time with rate
(14) 
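will elapse before a change of state. In (14) the indicator function records whether a station is nonempty, since only nonempty stations can complete a service. The next state of the system will show an arrival at a given station with probability equal to that station’s arrival rate divided by the rate in (14), and a service completion at a given nonempty station with probability equal to that station’s current service rate divided by the rate in (14).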
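The competing-exponentials structure just described can be made concrete with a short sketch. The function name `transition_law` and its argument names are hypothetical, and the service rate functions used in the example are illustrative only.

```python
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[int, ...]


def transition_law(state: Sequence[int],
                   action: Sequence[int],
                   arrival_rates: Sequence[float],
                   service_rates: Sequence[Callable[[int], float]]
                   ) -> Tuple[float, Dict[State, float]]:
    """Return the total event rate out of `state` under `action`, and the
    probability of each possible next state."""
    rates: Dict[State, float] = {}
    for k, (n, a) in enumerate(zip(state, action)):
        # An arrival at station k is always possible.
        up = list(state)
        up[k] += 1
        rates[tuple(up)] = arrival_rates[k]
        # A service completion needs a customer to be present.
        if n > 0:
            down = list(state)
            down[k] -= 1
            rates[tuple(down)] = service_rates[k](a)
    total = sum(rates.values())
    return total, {s: r / total for s, r in rates.items()}


# Two stations with head counts (2, 0); one and two pool servers deployed.
total_rate, probs = transition_law(
    state=(2, 0),
    action=(1, 2),
    arrival_rates=[1.0, 0.5],
    service_rates=[lambda a: 1.0 + a, lambda a: 1.0 + a],
)
```

In this example the second station is empty, so its servers contribute nothing to the total rate, which is why deploying servers to an empty station is wasteful in this model.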
A DSM admissible policy is given by a map , where
(15) 
and is a rule for choosing admissible actions as a function of the current system state. The cost associated with policy is given by
(16) 
where the weights are positive holding cost rates, one for each station, and the averaged quantities are the time average numbers of customers at the stations under the policy. The optimization problem of interest is given by
(17) 
where in (17) the infimum is over the set of DSM admissible policies.
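We pause to note that this problem does indeed belong to the class of dynamic resource allocation problems described in the preceding section. Projects correspond to stations, the state of a project is the head count at the station, the resource level is the number of pool servers deployed there and coincides with the resource consumed, and the cost rate is the weighted head count. The only nonzero transition rates are the arrival rate, for an increase of one in the head count, and the service rate at the current deployment, for a decrease of one in the head count of a nonempty station. One thing which is special about this problem is that it is possible to utilize all of the resource which is on offer all of the time. It is plainly optimal to do so. Hence, in (15), we can restrict admissible actions to those which deploy all servers from the pool.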
Before proceeding to develop appropriate notions of full indexability/indices, we describe assumptions we shall make about our service rate functions. In the assumption below we use the notation

Assumption. There exist functions, one for each station, which are strictly increasing, twice differentiable and strictly concave, satisfying
(18) 
and
(19) 
From (18) the functions concerned are smooth extrapolations of the service rates on the integers in the range of possible team sizes. The properties of these functions reflect the fact that, while an increase in the size of the team at a station results in a higher service rate, the marginal benefit of adding an additional member diminishes as the team size grows. Requirement (19) guarantees the existence of stable policies under which all queue lengths remain finite.

Remark. It is the assumption of strict concavity of the service rate functions at each station which stimulates an active approach to the distribution of the pool of servers around the stations and which makes this an interesting problem. Had we assumed, for example, that the service rates were all convex in the team size, then [sob82] shows that in an optimal policy the service pool would always be allocated en bloc and we are driven back to the “single server” world of the simple bandit models. This result is intuitively obvious, as observed by Richard Serfozo to Sobel: “the fastest rate is also the cheapest.” Indeed, the resulting service control problem has a well-known solution in the form of the so-called cμ rule. (See [coxsmith61].)
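A quick numerical check of the diminishing-returns property on the integer team sizes can be sketched as follows. This checks only a necessary consequence of the assumption (strictly positive, strictly decreasing increments on the integers), not the existence of a smooth strictly concave extension. The function name and the example service rate functions are hypothetical.

```python
import math
from typing import Callable


def has_diminishing_returns(mu: Callable[[int], float], pool_size: int) -> bool:
    """True if mu is strictly increasing with strictly decreasing
    increments on the team sizes 0, 1, ..., pool_size."""
    values = [mu(a) for a in range(pool_size + 1)]
    increments = [later - earlier for earlier, later in zip(values, values[1:])]
    increasing = all(step > 0 for step in increments)
    concave = all(nxt < cur for cur, nxt in zip(increments, increments[1:]))
    return increasing and concave


# Illustrative service rate functions (not taken from the paper).
concave_rate = lambda a: 2.0 * math.sqrt(1 + a)   # diminishing returns
convex_rate = lambda a: 1.0 + a ** 2              # increasing returns
linear_rate = lambda a: 1.0 + a                   # constant returns

checks = [has_diminishing_returns(f, 5)
          for f in (concave_rate, convex_rate, linear_rate)]
```

The linear case fails the strict test, which matches the role strict concavity plays in the remark above: without it there is no incentive to spread the pool across stations.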
We are able to develop a Lagrangian relaxation of the problem in (16) and (17) as in the preceding section. As in the analysis of Section 2 up to (8), such a relaxation yields single-station optimization problems, one for each station, which here take the form
(20) 
where in (20), the infimum is over the class of DSM policies which can deploy any number of servers (up to the size of the pool) at the station at each epoch, and the objective combines the time average head count and the time average number of servers deployed at the station under the policy. The optimization problem in (20) concerns a single station alone and seeks to choose, at each decision epoch of that station and in response to its queue length information, the number of servers (from those available) to be deployed there. The goal is to make such choices to minimize costs which are an aggregate of those incurred through customers waiting and charges imposed for the provision of service. Note that the Lagrange multiplier here has an economic interpretation as the charge imposed per server per unit of time.
We now wish to develop index heuristics for our service allocation problem by developing station indices of the form described in the preceding section. These flow from the property of full indexability, defined with respect to solutions to the single-station problems as in Section 2. However, full indexability is a property of individual stations and hence we now focus on a single station and drop the station identifier until further notice. For clarity, the single station problem is formulated as an SMDP as follows:

The state of the system at any time is the number of customers (head count) at the station. New customers arrive at the station according to a Poisson process.

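Decision epochs occur at time 0 and whenever there is a change of state. At each such epoch an action is chosen, namely a number of servers between zero and the size of the pool. Should a given number of servers be chosen when the head count takes a given value, then costs will be incurred at a rate equal to the holding cost rate times the head count plus the service charge times the number of servers, until the first event following the epoch. That event occurs after an exponentially distributed time whose rate is the arrival rate plus the service rate for the chosen number of servers. The event is an arrival or a service completion with probabilities proportional to the arrival rate and the service rate, respectively.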

The goal of analysis will be the determination of a stationary policy to minimize the average cost rate incurred over an infinite horizon. Trivially, optimal policies offer no service when the system is empty.
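The single-station problem can be solved numerically by relative value iteration after uniformization. The sketch below is one standard way to do this and is not the algorithm given by the authors. Its assumptions are labeled: the head count is truncated at `max_queue` (arrivals are blocked there), the service rate is taken to be largest at the full pool, ties between actions are broken toward fewer servers, and all names are hypothetical.

```python
from typing import Callable, List, Tuple


def solve_station(arrival: float,
                  mu: Callable[[int], float],
                  pool: int,
                  holding: float,
                  charge: float,
                  max_queue: int = 60,
                  tol: float = 1e-9,
                  max_iter: int = 100_000) -> Tuple[float, List[int]]:
    """Relative value iteration for one station.
    Returns (average cost per unit time, servers to deploy per head count)."""
    unif = arrival + mu(pool)            # uniformization rate (assumes mu increasing)
    h = [0.0] * (max_queue + 1)          # relative values, normalized so h[0] = 0
    policy = [0] * (max_queue + 1)
    gain = 0.0
    for _ in range(max_iter):
        new_h = []
        for n in range(max_queue + 1):
            up = h[min(n + 1, max_queue)]        # arrivals blocked at the cap
            best = None
            for a in range(pool + 1):
                rate_down = mu(a) if n > 0 else 0.0
                down = h[n - 1] if n > 0 else h[0]
                stay = unif - arrival - rate_down  # fictitious self-transition
                val = (holding * n + charge * a
                       + arrival * up + rate_down * down + stay * h[n]) / unif
                if best is None or val < best - 1e-12:
                    best = val
                    policy[n] = a
            new_h.append(best)
        diffs = [x - y for x, y in zip(new_h, h)]
        gain = unif * new_h[0]           # per-step gain rescaled to real time
        h = [v - new_h[0] for v in new_h]
        if max(diffs) - min(diffs) < tol:
            break
    return gain, policy


# Sanity check against an M/M/1 queue: no pool, service rate 2, arrival rate 1.
mm1_cost, _ = solve_station(1.0, lambda a: 2.0, 0, holding=1.0, charge=0.0)
```

For the M/M/1 check the mean head count is ρ/(1−ρ) with ρ = 1/2, so the average holding cost should be close to 1. With a positive service charge the computed policy deploys no servers at head count zero, in line with the observation that optimal policies offer no service when the system is empty.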
The quest for full indexability is greatly simplified in this case by the existence of optimal policies for the single-station problem for which the choice of number of servers is increasing in the current head count. We call such policies monotone. This conclusion follows from Theorem 4 in Stidham and Weber [stidweb89], which applies to a queueing system whose state space is the nonnegative integers, with Poisson arrivals and with an objective which combines a holding cost which is both increasing in the state and unbounded, with action costs which are nonnegative and increasing in the resource level. All of these requirements hold in the single-station problem. Stidham and Weber’s analysis first considers the problem of choosing a policy to minimize the expected cost incurred in moving the system from a general initial state to the empty state (their Theorem 2) and then deploys arguments from renewal theory to demonstrate that such a policy will also minimize long run average costs (their Section 1.3). We state our conclusion as Proposition 3.
Proposition 3 ((Stidham and Weber))
There exists a monotone policy which is optimal for the single-station problem.
The problem of establishing monotonicity with respect to queue size of optimal policies for service control problems for queues with Poisson input is not new. In addition to Stidham and Weber [stidweb89], see [crab72, doshi78, gall79, geohar01, mit73]. While such monotonicity is helpful in establishing full indexability and in the subsequent computation of index functions, it is not the key to proving the former. This is rather the demonstration (to which we now proceed in Section 3.1) that optimal policies for the single-station problem are monotone in the service charge. Proving this significantly extends the literature on service control problems for queues.
3.1 Stations are fully indexable
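In light of Proposition 3 we can recast and simplify the requirements of full indexability expressed in Section 2. Consider a monotone optimal policy for the single-station problem at a given service charge. It follows that, for every number of servers, the set of head counts at which the policy deploys at most that number of servers consists of all head counts up to some threshold. We now have the following:

Definition (Full indexability). The station will be fully indexable if there exists a family of DSM policies, one for each value of the service charge, for which (i) each policy is monotone and optimal for the single-station problem at its service charge and (ii) for each number of servers, the corresponding threshold is increasing in the service charge.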
To summarize the requirements of this definition, a station will be fully indexable if the service charge problem has a monotone optimal policy for which the number of servers deployed is decreasing in the service charge for any given head count. Full indexability enables a calibration of the individual stations as follows.

Definition (Station indices). If the station is fully indexable, the corresponding index function is given by
(21) 
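When a station is fully indexable, an index of this kind can be located by bisection on the service charge. The sketch below assumes access to a hypothetical oracle `servers_at(charge, head_count)` that returns the number of servers an optimal monotone policy deploys, and that this number is nonincreasing in the charge. The toy oracle is illustrative only; in practice the oracle would be some solver for the single-station problem. Restricting the search to nonnegative charges below `upper` is also an assumption.

```python
from typing import Callable


def station_index(servers_at: Callable[[float, int], int],
                  servers: int,
                  head_count: int,
                  upper: float = 1e3,
                  tol: float = 1e-8) -> float:
    """Smallest charge (to within tol) at which the oracle deploys at most
    `servers` servers at `head_count`. Relies on monotonicity in the charge."""
    if servers_at(0.0, head_count) <= servers:
        return 0.0
    if servers_at(upper, head_count) > servers:
        raise ValueError("index exceeds the search bracket; increase `upper`")
    low, high = 0.0, upper
    while high - low > tol:
        mid = 0.5 * (low + high)
        if servers_at(mid, head_count) <= servers:
            high = mid     # charge is high enough: index is at or below mid
        else:
            low = mid      # still deploying more servers: index is above mid
    return high


def toy_oracle(charge: float, head_count: int, pool: int = 5) -> int:
    """Illustrative oracle built from the made-up index n / (a + 1)."""
    return sum(1 for a in range(pool) if head_count / (a + 1) > charge)


example_index = station_index(toy_oracle, servers=2, head_count=6)
```

For the toy oracle the value recovered at two servers and head count six is the made-up index 6/3 = 2, up to the bisection tolerance.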
Lemma 4
If the station is fully indexable, the index is (i) decreasing in the number of servers for fixed head count and (ii) increasing in the head count for fixed number of servers.
Note that optimal policies for the single-station problem will be unchanged if all cost rates (both holding costs and service charges) are divided by the service charge throughout. When we do that, we see that increasing the service charge is equivalent to decreasing the holding cost rate in problems for which the service charge rate is fixed. This being so, we develop the following convenient reformulation of the definition of full indexability above: we refer to the problem obtained by setting the service charge equal to one in the above as the holding cost problem, to emphasize dependence on the holding cost parameter. Hence, the holding cost problem is the problem given by
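From Proposition 3 we are able to assert the existence of optimal policies for the holding cost problem which are monotone. The following is trivially equivalent to the definition of full indexability for stations given above.

Definition (Full indexability, alternative definition). The station will be fully indexable if there exists a family of DSM policies, one for each value of the holding cost rate, such that (i) each policy is optimal for the holding cost problem at its holding cost rate; (ii) each policy is monotone, so that for each number of servers it deploys at most that number exactly at the head counts up to some threshold,
where this threshold is decreasing in the holding cost rate.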
To summarize, to achieve full indexability, instead of requiring (according to the original definition) that the optimal service level decreases with the service charge (for a fixed value of the holding cost rate), we now equivalently require it to increase with the holding cost rate (for a fixed service charge). This reformulation of full indexability, which focuses attention on the holding cost element of the objective, yields a more accessible account.
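The rescaling behind this equivalence is a one-line identity, sketched here with assumed symbols: $c$ for the holding cost rate, $W > 0$ for the service charge, and $\bar{N}^{u}$, $\bar{a}^{u}$ for the time average head count and time average number of servers deployed under a policy $u$.

```latex
c\,\bar{N}^{u} + W\,\bar{a}^{u}
  \;=\; W\left[\frac{c}{W}\,\bar{N}^{u} + \bar{a}^{u}\right],
\qquad W > 0 .
```

Since the factor $W$ is positive and does not depend on $u$, a policy minimizes the left-hand side if and only if it minimizes the bracket, which is the objective with holding cost rate $c/W$ and unit service charge. Raising $W$ with $c$ fixed therefore has the same effect on optimal policies as lowering the holding cost rate with the service charge fixed at one.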
We begin this part of our analysis by noting that it is easy to establish that any optimal policy for must be such that . It follows that the head count process is ergodic under its operation. We uniformize station evolution by rescaling time such that
Under this uniformization, the DP optimality equations for the problem are as follows:
where the minimum in (22) is over the range . Note that in (22) the quantity is the minimized cost rate for with the corresponding bias function, where . If we write for the minimum total cost incurred in during when , then we have .
Action is optimal for in state if and only if it achieves the minimum in (22). To proceed further, we write , and . Hence (22) now becomes
(23) 
We note in passing that it is trivial to deduce from the inductive specification of given by the optimality equations, that the quantities are well defined, including in the event that there are several optimal policies for . The following is an immediate consequence of (23).
Lemma 5
Please note that if a policy is such that the inequalities in (5) are all strict then it is uniquely optimal and so must be monotone by Proposition 3. Should the left-hand inequality hold with equality for some with , then both and are optimal choices of action in state . To develop the analysis further we need information regarding the quantities when viewed as functions of .
Lemma 6
The function is continuous .
It is trivial to establish that the average cost rate is continuous in . Observe from (23) that
and hence is continuous. From (23) we also note that it is straightforward to establish that, if is continuous, then so must be . The result follows by an induction argument.
Now use to denote any DSM policy which is optimal for . We use for the expected time until the system is first emptied under given that . We also use for the expected cost incurred under from time 0 when until the system first empties.
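The two quantities just introduced, the expected time until the system first empties and the expected cost incurred up to that time, can be computed for any fixed policy by a backward recursion over first-passage times between adjacent states. The sketch below is illustrative only and rests on assumptions not stated in the text: Poisson arrivals at rate `lam`, service rate `mu[a]` with `a` servers, linear holding cost `c` per customer and linear service charge `W` per server, and a queue truncated at the length of the supplied policy. The function name `time_and_cost_to_empty` is hypothetical.

```python
def time_and_cost_to_empty(lam, mu, policy, c, W, n0):
    """Expected time and expected cost until the queue first empties.

    policy[n] is the number of servers used in state n (n = 0..N); arrivals
    are blocked in state N = len(policy) - 1 (a truncation assumption).
    Works in continuous time with the original, non-uniformized rates.
    """
    N = len(policy) - 1
    if not 0 <= n0 <= N:
        raise ValueError("n0 must lie in 0..N")
    h = [0.0] * (N + 2)   # h[n]: expected time to go from n to n - 1
    g = [0.0] * (N + 2)   # g[n]: expected cost incurred on that passage
    for n in range(N, 0, -1):
        rate = mu[policy[n]]
        if rate <= 0.0:
            raise ValueError("policy must serve at a positive rate when n > 0")
        up = lam if n < N else 0.0
        # First-step analysis: rate * h[n] = 1 + up * h[n + 1], and similarly for cost.
        h[n] = (1.0 + up * h[n + 1]) / rate
        g[n] = (c * n + W * policy[n] + up * g[n + 1]) / rate
    # Passage from n0 to 0 is the sum of the one-level passages n0 -> n0 - 1 -> ... -> 0.
    return sum(h[1:n0 + 1]), sum(g[1:n0 + 1])


if __name__ == "__main__":
    pol = [0] + [1] * 400
    print(time_and_cost_to_empty(1.0, [0.0, 2.0], pol, c=1.0, W=0.5, n0=3))
```

For a single-server queue with constant service rate the time component reduces to the familiar value `n0 / (mu - lam)`, up to a truncation error that is negligible when `N` is large.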
Lemma 7
,
A standard argument, based on the fact that the system evolving under regenerates upon every entry into the empty state, yields the conclusion that
(25) 
from which we immediately infer that
Consider now the system evolving under from time when its state is until it enters state for the first time. The expected time taken is plainly and the holding cost rate incurred through this period is bounded below by . If we write the mean integrated head count divided by as and the mean total service cost divided by as we infer that
(27)  
The inequality in the lemma follows immediately from (3.1) and (3.1). To justify the divergence claim, we simply observe that an assumed permanent utilization of the maximum service rate implies that is a uniform lower bound on . The proof is complete.
Before proceeding, we observe from (3.1) and (3.1) and the definitions of the quantities concerned that we may write
where
Note that it is straightforward to establish that
(29) 
The following is an immediate consequence of (5) and Lemma 7.
Lemma 8
such that , for all choices of .
We are now in a position to prove full indexability. The key fact to establish is that is increasing in for each . Full indexability will then follow trivially from (5).
Theorem 9 ((Full indexability))
(i) The function is increasing ; (ii) the station is fully indexable.
Fix . There are two possibilities. Either there exists a monotone policy which is uniquely optimal for (case 1) or not (case 2). Under case 1, invoking the preceding lemma we may assert the existence of such that (5) is satisfied in the form
(30)  
(31)  
Since is continuous for , it must follow that with the property that the inequalities in (3.1) are satisfied with replacing for all in the range . We infer from (5) that monotone policy is uniquely optimal for . If we now consider the expression in (3.1) with computed with respect to policy , it follows easily that is increasing and linear in over the range .
Now consider case 2. Use to denote the collection of DSM policies which are optimal for . From the preceding lemma and invoking the strict concavity of , we infer that must be finite. Further, the continuity of , together with (5) implies the existence of such that must be optimized by a member of for in the range . Suppose that optimizes for some . It then follows from (3.1) that
where in (3.1), and denote quantities computed with respect to policy . Hence from (3.1), it follows that for each , lies on one of a finite collection of straight lines with positive gradient [one for each ] throughout the range . However, the continuity of implies that it must in fact lie on just one of those lines throughout that range. It follows that is increasing and linear in over the range . We conclude from the above consideration of cases 1 and 2 that, for each is continuous with a positive right gradient at each and is thus increasing. This concludes the proof of part (i).
For part (ii), we first take the analysis of part (i), case 2, a little further. Since for the chosen is strictly increasing through for all , the only policy which can remain optimal throughout this range must satisfy conditions of the form (3.1). This policy must be maximal (i.e., must assign maximal service levels) among those policies in and will be uniquely optimal for and hence monotone.
From the above discussion, we can infer the following: fix any and choose the maximal optimal policy for . This policy is monotone. Call it . Define by
By the above argument and is strictly optimal for . Further, if is optimal for , but not uniquely so. We use for the maximal DSM policy which is optimal for . Policy is monotone such that
(33) 
where (33) means
with strict inequality for at least one . In this way we can develop a sequence and corresponding monotone policies , such that:
1. is optimal for ;
2. ;
3. is optimal for and is such that .
Since the choice of was arbitrary, indexability follows trivially from 1–3. This completes the proof of part (ii) and of the theorem.
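The construction in part (ii) turns on holding cost values at which two monotone policies are simultaneously optimal. Under a linear cost structure the average cost rate of any fixed policy is linear in the holding cost rate, so the value at which two given policies have equal average cost can be calculated directly from their stationary distributions. The following minimal sketch does this for a birth-death station. The model (Poisson arrivals at rate `lam`, service rate `mu[a]` with `a` servers, linear holding cost and linear service charge `W`, truncation at the policy length) and the names `stationary`, `mean_queue_and_servers` and `tie_point` are assumptions made for the example, not the paper's notation.

```python
def stationary(lam, mu, policy):
    """Stationary distribution of the birth-death chain induced by a fixed policy."""
    w = [1.0]
    for n in range(1, len(policy)):
        w.append(w[-1] * lam / mu[policy[n]])   # detailed balance between n - 1 and n
    total = sum(w)
    return [x / total for x in w]


def mean_queue_and_servers(lam, mu, policy):
    """Long-run mean head count and mean number of servers in use."""
    pi = stationary(lam, mu, policy)
    mean_n = sum(n * p for n, p in enumerate(pi))
    mean_a = sum(policy[n] * p for n, p in enumerate(pi))
    return mean_n, mean_a


def average_cost(lam, mu, policy, c, W):
    """Average cost rate c * E[n] + W * E[a]; linear in c for a fixed policy."""
    mean_n, mean_a = mean_queue_and_servers(lam, mu, policy)
    return c * mean_n + W * mean_a


def tie_point(lam, mu, W, pol_a, pol_b):
    """Holding cost rate at which pol_a and pol_b have equal average cost."""
    n_a, s_a = mean_queue_and_servers(lam, mu, pol_a)
    n_b, s_b = mean_queue_and_servers(lam, mu, pol_b)
    return W * (s_b - s_a) / (n_a - n_b)


if __name__ == "__main__":
    rates = [0.0, 2.0, 3.0]                 # hypothetical service rates
    N = 200
    pol_hi = [0] + [2] * N                  # two servers whenever the queue is nonempty
    pol_lo = [0, 1] + [2] * (N - 1)         # one server at head count 1, two otherwise
    print(tie_point(1.0, rates, 1.0, pol_lo, pol_hi))
```

Above the tie point the policy with the higher service levels has the lower average cost in this example, and below it the ordering reverses, which matches the direction of monotonicity asserted in the theorem. The sketch compares two given policies only and does not establish that either is optimal.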
3.2 Computation of station indices
In the proof of Theorem 9 we constructed an ascending set of values, each of which signaled a change of optimal policy for . In this construction the initial was arbitrary. In our discussion of index computation, we shall continue initially to operate in space [i.e., to consider solutions to the optimization problems ], but will construct a descending set of values, labeled each of which will also signal a change of optimal policy. We do this because such a set is straightforward to initialize, with the supremum of those for which the policy [hereafter labeled ] which applies the maximal number of servers whenever the queue is nonempty is not optimal for . Because of our ability to restrict to monotone policies, it is clear that both and the policy (which applies servers when the queue length is , but which otherwise applies servers) are optimal for . By direct calculation of the average cost rates for these policies it is straightforward to verify that
We now give an algorithm for producing the sequence and the monotone policies such that is strictly optimal for in the range . Note that we take . In the algorithm we utilize the characterization of optimal policies for given in Lemma 5 together with the formula for given following the proof of Lemma 7.
Algorithm for index computation
Step 0. Let . The positive real and the policy are as above. The positive integer is given by
Step 1. The positive real , the policy and the positive integer given by
are specified. Determine given by
and
Step 2. Let be the maximal satisfying
for some in the range . Let be an value achieving the equality.
Step 3. Define the policy by