Sequential Resource Allocation with Positional Costs

Abstract

We consider the problem of minimizing the total cost of running a sequence of tasks in the given order by agents under the positional cost model. The cost of running a task depends not only on the intrinsic cost of the task itself, but is also monotonically related to the position of the task in the working list of the agent it is assigned to. Such a positional effect arises naturally from the classic sum-of-completion-time minimization problems, and is also well motivated by the varying efficiency of an agent working in reality (e.g., due to learning effects or deteriorating effects). It can also be seen as a deterministic variant of the classic Bayesian sequential decision making problems. This paper presents a simple and practical algorithm that runs in time and minimizes the total cost of any problem instance consisting of two task types. The algorithm makes a greedy decision for each task sequentially based on stopping thresholds in a “greedy-like” allocation simulation – a working style coinciding with Gittins’ optimal-stopping based algorithm for the classic Bayesian multi-armed bandit problem.

1 Introduction

Consider the problem of minimizing the sum of completion times when serving customers with identical service providers, where the customers are ordered by a first-come-first-serve queue so that any service provider must serve the customers assigned to it in the same order as the queue (but two customers may be served simultaneously if they are assigned to different providers). Suppose a customer is the first customer of a provider that serves customers in total. Since all the other customers of the same provider must wait for this customer to complete in time , the completion time “caused by” this customer is thus . Summing up over all customers, we have

(1)

where is the position of customer in the working list of its service provider, and equals the reversed position of customer , i.e., if a provider serves customers in total, then , which equals for the first customer and for the last customer, in particular.
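To make the identity behind Eq. 1 concrete, the following sketch (function names and the tiny hand-made instance are illustrative) computes the same objective two ways: directly as the sum of completion times, and via the reversed-position weights:

```python
def total_completion_time(assignments):
    """Sum of completion times when each provider serves its list in order.

    `assignments` maps provider id -> list of service times, in queue order.
    """
    total = 0
    for times in assignments.values():
        finished = 0
        for t in times:
            finished += t       # this customer completes at the running sum
            total += finished
    return total

def positional_cost(assignments):
    """Same quantity via Eq. 1: each service time, weighted by its
    reversed position (number of customers at or after it on that provider)."""
    total = 0
    for times in assignments.values():
        m = len(times)
        for j, t in enumerate(times):   # j = 0-based position in working list
            total += (m - j) * t        # reversed position is m - j
    return total
```

Both functions agree on any assignment, which is exactly the rewriting performed in Eq. 1.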

The sum-of-completion-time problem presented above is a special case of the general Positional Allocation problem studied in this paper. In general, we want to minimize the total cost of running a sequence of tasks by multiple agents under the positional cost model of Eq. 1, where the cost of running a task depends not only on the task’s independent cost but is also monotonically related to the number of tasks the agent has already been assigned. The sum-of-completion-time problem already shows how such a positional effect can be naturally derived from the objective of minimizing total completion time. Moreover, the positional cost may alternatively come from an abstraction of the varying efficiency of an agent working in reality. For example, the so-called learning effect [2] [5] usually helps a human agent work more and more efficiently, while various deteriorating effects [4] may do just the opposite (e.g., humans and animals get tired while working, devices wear out with use, or the working situation simply worsens over time, as in medical treatment and disaster rescue). See Appendix A for a motivating application for the authors, where the goal is to optimize the overall battery efficiency of electrical systems powered by multiple batteries, in light of the phenomenon that a battery’s power efficiency worsens as the battery discharges.

In cases where the tasks can be run in arbitrary order, we know from the rearrangement inequalities of Hardy et al. [13] that the problem can be solved by a simple Shortest-Processing-Time (SPT) rule that always matches the task with the shortest processing time to the position with the largest weight. However, when the non-reorderable constraint is imposed, which could be caused either by a priority over the tasks (such as in a queue) or by the online nature of the problem, the greedy SPT rule becomes suboptimal (see Figure 3 in Appendix B for a counter-example), and there seems to be no obvious way to solve the problem in polynomial time.
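When reordering is allowed, the SPT rule admits a one-line sketch (names are illustrative): sorting the costs ascending and dealing them round-robin places the smallest costs at the earliest positions, which carry the largest weights under a decreasing positional function.

```python
def spt_allocation(costs, k):
    """SPT rule for the reorderable case: sort tasks by ascending cost and
    deal them round-robin over k agents, so the i-th batch of k smallest
    tasks fills the i-th (and i-th largest-weight) position of every agent.
    Returns one ordered cost list per agent."""
    agents = [[] for _ in range(k)]
    for i, c in enumerate(sorted(costs)):
        agents[i % k].append(c)     # smallest costs land in position 1
    return agents
```

By the rearrangement inequality (Lemma 2), this pairing of small costs with large weights minimizes the total cost when no ordering constraint is present.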

In this paper we propose a simple but nontrivial algorithm that runs in time, and we show that this algorithm is optimal on any problem instance with two task types. The algorithm makes a greedy decision for each task sequentially based on stopping thresholds in a “greedy-like” allocation simulation. We expect that the combinatorial structure of the problem exhibited by our algorithm can inspire the design of practical and optimal algorithms in more general settings of this important problem.

1.1 Connections with Related Work

Both sum-of-completion-time (or, the mean flow time) and max-of-completion-time (i.e., the makespan) are extensively studied optimization objectives. It is widely known that the makespan optimization problem is strongly NP-hard even assuming a constant positional function , both in its uncapacitated version (i.e., multiprocessor scheduling/bin packing) and its capacitated version (i.e., the 3-partition problem). Many studies have thus focused on designing asymptotic PTASes or constant-ratio approximation algorithms for makespan optimization, especially under generalized cost functions [1] [8] [15] [7]. The objective functions of most of these generalized cost models are symmetric with respect to the tasks/items, so the order of items has no impact on the aggregate value. In contrast, in the positional allocation problem the “real cost” of an item further depends on where it is put in the bin.

Meanwhile, another line of research tries to find polynomial-time exact algorithms for the makespan optimization problem assuming a constant number of task/item types. Specifically, Leung [14] presented a dynamic programming algorithm, where is the number of item types. However, since problems in such a setting admit compact inputs (in fact, encoded by only numbers), the running time of a polynomial algorithm for such high-multiplicity problems needs to be polynomial in (rather than in ). A polynomial algorithm for the case was first given by McCormick, Smallwood and Spieksma [16]. Later, Eisenbrand and Shmonin [6] showed that actually only different “packing ways” are needed for the case of . Very recently, Goemans and Rothvoß [11] extended the techniques of [6] and gave the first polynomial-time bin packing algorithm for an arbitrary (but constant) number of item types. Our work pursues goals similar to these works, trying to find polynomial-time algorithms for problem instances with a constant number of task/item types, albeit with a different optimization objective encompassing flow-time. Essentially, the metrics of makespan and flow-time correspond to the norm and norm, respectively (see Eq. 2 in Section 2). Also note that the order of tasks in the input sequence plays a crucial role in the positional allocation problem considered in this paper, which means binary instances of this problem cannot be compressed into a sequence of multiplicities, but have the same input format as the general form of the problem, thus having input length.

On the other hand, scheduling under positional costs is also an active area in operations research. Biskup and others [2] [5] first considered learning effects in single-machine scheduling, in which the positional weights decrease with the positions, typically modeled by explicit polynomial functions in most later works [17]. See [3] for a survey. Browne and Yechiali [4] first introduced deteriorating effects into scheduling problems, in which the positional weights increase with the positions, typically modeled by explicit polynomial functions [18] or exponential functions [12]. See [20] for a survey of works in this line. Besides, some works also considered parallel machine scheduling problems with positional costs [19] [20]. In most of the positional scheduling works above, the key is to find good permutations of the task sequence so as to minimize the objectives considered, and the classic Shortest-Processing-Time-first (SPT) rule turns out to be optimal in various settings (e.g., see the summary table in [3]). In contrast, the positional allocation problem considered here imposes a strict non-reorderable constraint on the order of tasks. Note that this constraint differs from classic precedence constraints in that the former only constrains the order of tasks within the same machine (tasks may still run in parallel on different machines), while the latter further rules out any parallelism between tasks with a precedence relationship. It turns out that the non-reorderable constraint invalidates the widely-used SPT rule. Actually, as demonstrated later, the positional allocation problem studied in this paper exhibits a quite different combinatorial structure, which leads to practical optimal algorithms quite different from the SPT rule.

Finally, there is an interesting connection between the positional allocation problem and the Bayesian Multi-Armed Bandit (MAB) problem. In the Bayesian MAB problem, we are given “bandit-arms”, and each arm is in an observable “state” at time-slot . In any round , we are asked to choose one arm , leading to a stochastic payoff , and also causing the chosen arm to stochastically change its state to . Actually, the function corresponds to a parameterized probability distribution of payoff, the state corresponds to the parameter setting of that the player “believes” the arm “should be in” at time , and the state transition function follows the Bayesian inference principle for probability distribution . The goal is to maximize the cumulative reward at infinite horizon. It is not hard to see that the Bayesian MAB problem is essentially a stochastic version of the positional allocation problem, with a special task sequence on the one hand, and with general stochastic transition functions on the other (in the positional allocation problem, , and when ). In 1979, Gittins found an elegant simulation-based algorithm [10], which is proven to be optimal for the Bayesian MAB problem [9]. Described in our language, given an infinite task sequence , where , the algorithm assigns a score to each agent based solely on the current capacity of that agent (i.e., is independent of any other ), then simply allocates the first task to the agent with the highest score. The score, later called the Gittins Index, happens to correspond to the (expected) normalized cumulative cost of the optimal stopping strategy of a simulation that keeps allocating tasks to the single machine .
Note that the task sequence in Bayesian MAB is by default sorted in a Shortest-Processing-Time-first manner, and due to the monotonicity of the positional function, the Gittins-Index-based algorithm degenerates to the naive SPT algorithm in the positional allocation problem, which is known to be suboptimal in general (but indeed optimal for that specific single task sequence!). Interestingly, as shown in this paper, it turns out that the truly optimal algorithm for the positional allocation problem may still exhibit a very similar working pattern to Gittins’ algorithm (at least for binary inputs), namely that the optimal decision for each task can be made sequentially based on the “stopping threshold” of a “greedy-like” allocation simulation.

2 Preliminaries

Definition 1

A -allocation scheme of tasks is a partitioning of the sequence into subsequences, denoted by , where (1) if (disjointness); (2) (completeness); (3) if , for any (monotonicity).

The size of an allocation scheme is if for every . Note that only an allocation scheme with uniform cardinality has a well-defined size. The position of task under allocation scheme is if there exists such that . Given an allocation scheme , any integer sequence can be accordingly partitioned into subsequences, denoted by . For convenience we will write for when the context is clear.

Positional Allocation Problem. Given a problem instance where , , and is an arbitrary monotonically decreasing function, we want to find an allocation scheme of size so as to

(2)

In the above problem formulation, a positional allocation algorithm outputs an allocation scheme for a given problem instance. Equivalently, a positional allocation algorithm may output a decision sequence where each denotes the index of the agent assigned to the task . Clearly, any decision sequence corresponds to a unique allocation scheme. The following lemma shows that the reverse is also true: any given allocation scheme also corresponds to a unique decision sequence. In other words, an allocation scheme is equivalent to a decision sequence. See the proof in Appendix C.1.

Lemma 1

For any allocation scheme , there exists a unique decision sequence such that for each .

In the rest of this paper we will mainly discuss algorithms assuming they output decision sequences, and as a general technique, we often prove the sub-optimality of a given decision sequence by re-arranging some tasks in the allocation scheme corresponding to and showing that the re-arranged allocation scheme has lower cost than the original one. In such a proof, the monotonicity property of allocation scheme is the key to guarantee that the re-arranged allocation scheme is still valid (i.e. “achievable” by some decision sequence).

We remark that our problem formulation encompasses some other related models. For example, although in our formulation each agent must be assigned exactly tasks, both the problem variants with and without cardinality constraints (in which an agent can run at most tasks or an arbitrary number of tasks, respectively) can be reduced to the problem formulated here by appending a sufficient number of “null tasks” with . Furthermore, although our formulation assumes the positional cost function is decreasing, a problem instance with an increasing function can be reduced to an instance in our model by reversing the task sequence. Specifically, for any instance with increasing function , we can construct an instance with the decreasing function , and from Lemma 1 we know that if is the solution of in our model, then is the solution of in the model with increasing positional function (and vice versa). Similarly, the same reduction also works for the problem variant with a decreasing positional function but a reversely-growing positional index (from back to ). Also note that the tricks presented above can be combined to reduce further combinations of variants to our problem. For example, let be an instance of the sum-of-completion-time problem presented at the beginning of the paper. The “equivalent instance” of in our model is where there are “null tasks” in and .

Finally, the current problem formulation is presented in this form for the sake of simplicity, and the algorithms presented in this paper may apply to some natural generalizations of the problem. For example, in our formulation every agent is assigned the same number of tasks, while our algorithmic results also apply to problems with an arbitrary capacity plan in which agent may be assigned (exactly or at most) a different number of tasks. Besides, in this paper we couple the positional weights and task-specific costs with the multiplication operator (see Eq. 1 and Eq. 2), while the algorithmic discussions also apply to the more general case where the cost function is monotone and has positive mixed partial derivatives .

The Shortest-Processing-Time Rule. The monotonicity of the positional cost function implies a simple “principle” for allocating tasks: in general, we tend to run tasks with relatively smaller costs first (thus coupled with larger positional weights) and to allocate tasks with relatively larger costs later (thus coupled with smaller positional weights). Lemma 2 formally justifies this intuition.

Lemma 2

(Rearrangement Inequality [13]) If a monotonically increasing function has positive mixed derivatives , then for any , and , we have

(3)

The above principle suggests that, if we could arbitrarily change the order of the tasks, the optimal algorithm would simply sort the tasks in ascending order and assign them among the agents in a round-robin way. However, the non-reorderable constraint of the problem introduces the additional difficulty that we are forced to allocate tasks sequentially. In this case, a naive greedy allocation algorithm would couple a large (small) task with the smallest (largest) positional weight available at that time. This naive greedy algorithm turns out to be sub-optimal. For example, Figure 3 in the appendix shows an instance with binary costs ( and ), where the allocation scheme of the naive greedy algorithm (the left side) has a larger cost than another allocation scheme (the right side). Actually, the better allocation scheme on the right side is optimal for this instance, and it comes from a simple and efficient algorithm proposed in the next section. In Section 4 we prove that this algorithm minimizes the total cost for any binary-valued instance of the positional allocation problem.
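The naive greedy rule just described can be sketched as follows (a sketch for the binary case; since weights decrease with position, the largest available weight belongs to the least-filled agent and the smallest to the most-filled non-full agent):

```python
def naive_greedy(tasks, k, capacity, small):
    """Naive greedy rule: each arriving task with the small cost goes to the
    agent whose next free position has the largest weight (least filled);
    a large task goes to the one with the smallest weight (most filled).
    Suboptimal in general, per the counter-example of Figure 3."""
    filled = [0] * k
    decisions = []
    for c in tasks:
        candidates = [a for a in range(k) if filled[a] < capacity]
        if c == small:
            a = min(candidates, key=lambda a: filled[a])  # largest weight
        else:
            a = max(candidates, key=lambda a: filled[a])  # smallest weight
        decisions.append(a)
        filled[a] += 1
    return decisions
```

Algorithm 1 in the next section replaces this myopic choice with a look-ahead simulation.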

3 The Algorithm

We assume , in this section. A pragmatic motivation for this assumption is that real-world task-specific costs often follow a bimodal distribution, in which case a two-value separation may approximate the real values well. For example, in the multi-battery application presented in Appendix A, the power consumption of a device greatly depends on whether the device is active or on standby, with a huge gap of more than 100x (see Figure 2). Moreover, detecting the active-standby mode of a device is usually much more efficient and robust than measuring the exact power consumption of the device.

A straightforward -dimensional dynamic programming procedure can solve the positional allocation problem. Specifically, define as the state vector for the situation where there are “slots” left in each agent . Given a task sequence , define as the minimum total cost over all possible decision sequences matching to . By the definition of the positional allocation problem we have

Since the size of the state space is no more than , the time complexity of the dynamic programming procedure is polynomial in for constant . For general , however, the size of the state space is at least the integer partition function (e.g., when ), which is super-polynomial in the input size. Meanwhile, another drawback of the dynamic programming solution is that it may not easily adapt to cases where the information about the tasks is limited, such as online allocation scenarios where we (at best) only know a stochastic generating process of the workload, rather than a deterministic task sequence.
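The dynamic program above can be sketched directly over state vectors of remaining slots (a sketch with illustrative names; a task placed in an agent with r slots left sits at position capacity − r + 1, and the recursion tries every non-full agent):

```python
from functools import lru_cache

def dp_min_cost(tasks, k, capacity, f):
    """Exhaustive DP over state vectors of remaining slots. `f` is the
    decreasing positional weight function. Exponential in k in general;
    intended only for small instances."""
    tasks = tuple(tasks)

    @lru_cache(maxsize=None)
    def best(i, slots):
        if i == len(tasks):
            return 0
        result = float("inf")
        for a, r in enumerate(slots):
            if r == 0:
                continue                       # agent a is full
            pos = capacity - r + 1             # position of the new task
            nxt = slots[:a] + (r - 1,) + slots[a + 1:]
            result = min(result, f(pos) * tasks[i] + best(i + 1, nxt))
        return result

    return best(0, (capacity,) * k)
```

This brute-force baseline is useful for checking the output of faster algorithms on toy instances.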

Input: , where
Output:
1 set  for each
2 for  to  do
3       ThresholdAllocation ()   for each
4 end for
5 return
Function ThresholdAllocation
Input:
Output: the agent to which  is assigned
4 for  to  do
5       if  then
6             continue
7       end if
8      for  to  do
9             if  then
10                   return
11             end if
12            
13       end for
14      
15 end for
Algorithm 1 The basic version of the simulation-based algorithm
Figure 1: Illustration of a simulation process performed by the ThresholdAllocation routine of Algorithm 1 for under the task sequence (, ). The three diagrams correspond to the initial setup and two later expansions of the H-zone (positions in red) and L-zone (positions in blue) in the simulation, respectively. In particular, when (the right side), the H-zone is of size and the L-zone is of size . Since there are at least “L” tasks in the first tasks, the algorithm will assign the first task to agent .

In this section, we present a simple algorithm that turns out to generate the optimal decision sequence for any problem instance with a binary-valued task sequence. The basic version of the algorithm is shown in Algorithm 1. Again we use to denote the cardinality capacities of the agents, and without loss of generality assume . To allocate the first task , the algorithm iterates over each agent in descending order (priority) to decide whether to put in . The decision for each is made by running a “simulated allocation” as follows (Line in Algorithm 1): the simulation first sets up an L-zone and an H-zone among all the available positions, then sequentially allocates the task sequence, starting from , sending all “large” tasks (i.e., the ones with ) to the L-zone and all “small” tasks (the ones with ) to the H-zone. Whenever the “small” tasks overflow the H-zone, both the H-zone and the L-zone expand, and the simulated allocation continues. If at any time in the simulation the L-zone is filled up by “large” tasks, the algorithm immediately stops the simulation and allocates to agent ; otherwise it will choose some agent with a smaller id in the later simulations. Note that the algorithm is guaranteed to choose agent (which has the smallest capacity) if it gets the chance to run a simulation on it (i.e., with ). Also note that the task with the largest cost (i.e., L in this context) will always be assigned to the smallest non-empty agent.

The initialization and expansions of the H-zone and L-zone are based on the current capacities of the agents and the index of the (agent) candidate on which the simulation focuses. Let be the “current position” of agent (so ). At any time in the simulation the H-zone only contains positions smaller than while the L-zone only contains positions equal to or larger than . Initially, the H-zone contains all such legal positions for the agent (i.e., the single position of in agent ), and the L-zone contains all legal positions for agents smaller than . In every expansion, both the H-zone and the L-zone include legal positions of one more agent, from to . Figure 1 illustrates an example simulation process with five agents and .

Given an H-zone and L-zone setup, to check whether the L-zone can be filled up before the H-zone overflows, the algorithm only needs to count whether the number of “large” tasks within a look-ahead “window” exceeds a “threshold”. Specifically, let and be the sizes of the H-zone and L-zone, respectively; the L-zone can be “successfully” filled up if and only if there are at least “large” tasks in the first tasks. Supposing task is the head of the current task sequence, the last task to check in the look-ahead will be , which will be called the look-ahead horizon from task under setup in the rest of the paper, or just the horizon when the context is clear.
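The counting test above reduces to a single window scan. The sketch below assumes (as the elided thresholds suggest) that the threshold is the L-zone size and the window length is the combined zone size; both are labeled assumptions rather than the paper's exact constants:

```python
def lzone_fillable(tasks, t, h_size, l_size, large):
    """Check whether the L-zone can be filled before the H-zone overflows,
    under a fixed zone setup. Assumption: the look-ahead window starting
    at task index t has length h_size + l_size, and the threshold is
    l_size 'large' tasks (equivalently, at most h_size 'small' ones)."""
    window = tasks[t : t + h_size + l_size]
    return sum(1 for c in window if c == large) >= l_size
```

Since the test only counts task types in a sliding window, it can be maintained incrementally as the head of the sequence advances, which is the basis of the faster implementation (Algorithm 3).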

Note that the output of Algorithm 1 does not depend on the specific form of the positional function at all, nor on the specific values of the tasks , nor even on the detailed pattern of how the tasks are arranged within each look-ahead window. Moreover, the following lemmas establish some monotonicity properties of Algorithm 1. Specifically, Lemma 3 asserts that the algorithm never changes the order of the capacities of the agents, i.e., the monotonicity of agent capacities is conserved; Lemma 4 asserts that the look-ahead horizons under any given setup always move forward in the direction from to , i.e., the monotonicity of the horizon for a given look-ahead setup is conserved; and Lemma 5 asserts that the allocation decisions for the same task type always move forward in the direction from agent to , i.e., the monotonicity of agent ids assigned to a given task type is conserved.

Lemma 3

Under any problem instance with two types of task, , for the agent-capacity variables computed by Algorithm 1 under , we have if , for any .

Lemma 4

Under any problem instance with two types of task, , for any given look-ahead setup , , let and be the sizes of H-zone and L-zone (respectively) when Algorithm 1 is allocating task under setup , and let and be the sizes of H-zone and L-zone (respectively) when Algorithm 1 is allocating task under the same setup , we have

(4)
Lemma 5

Under any problem instance with two types of task, , let and be two tasks of the same type (i.e. ), and let be the decision sequence output by Algorithm 1 under , we have if .

In particular, thanks to Lemma 4, all data variables (the horizons, thresholds, and counters) used by Algorithm 1 can be updated incrementally, yielding a more efficient implementation shown as Algorithm 3 in Appendix D. It is easy to see that time and space are sufficient for Algorithm 3 to compute the same decision sequence as Algorithm 1 for any problem instance with two types of task.

Theorem 3.1

Under any problem instance , for , , Algorithm 3 returns the same decision sequence as Algorithm 1 in time and with space.

4 Optimality of Algorithm 1

In this section we prove that Algorithm 1 minimizes the total cost of any positional allocation instance with two types of task. The basic idea is to show that if at any time we do not assign a task to the agent chosen by Algorithm 1, then from the resulting decision sequence we can always construct another sequence such that follows Algorithm 1 on and has lower total cost than . The construction re-arranges some tasks in the allocation scheme corresponding to without violating the monotonicity property of Definition 1.

Since Algorithm 1 runs in an iterative manner, we only prove this for the first task . In order to be consistent with the input of Function ThresholdAllocation of Algorithm 1, we slightly generalize the problem formulation to allow variable-sized agents in this section as follows:

Variable-Sized Position Allocation (VSPA) Problem. Given a problem instance , where is the intrinsic cost of task and is the cardinality capacity of agent , without loss of generality assume all agents are “non-empty” at the beginning and is sorted increasingly in capacity, that is, . A valid allocation scheme is required to be consistent with the cardinality constraints, that is, for each . The allocation scheme is indexed by the capacity of the agent at the moment the task is assigned (i.e., the reversed position). Instead of directly applying the positional function , we define

(5)

Assigning task to agent of capacity causes a cost of . Again, every decision sequence corresponds to a unique partitioning of the task sequence , and the goal is to minimize the total cost defined by

(6)

One can verify that the VSPA problem is exactly the original positional allocation problem when .

Now we prove the optimality of Algorithm 1 for VSPA instances with two types of task. The main part of the proof is split into the cases of (Lemma 6) and (Lemma 7). In the former case, a straightforward re-arrangement can be done by observing that Algorithm 1 always assigns an L task to the “smallest” agent (i.e., agent ). See the proof in Appendix C.4.

Lemma 6

If is an instance of the VSPA problem with , , , and , then for any decision sequence with , there exists another decision sequence such that .

The proof for cases with (Lemma 7) is by induction. Specifically, assume by induction that Algorithm 1 is optimal for the subsequent tasks , which yields a specific sequence that is guaranteed to be optimal among the set of decision sequences not following Algorithm 1 at . We will show that the specific pattern of always enables a re-arrangement to beat itself, and thus to beat any sequence not following Algorithm 1 at . More specifically, suppose Algorithm 1 assigns to agent ; there are two ways not to follow this decision: i) allocate “lower”, to some agent ; or ii) allocate “higher”, to some agent . The constructions of better decision sequences are separated into these two cases accordingly. See the proof in Appendix C.5.

Lemma 7

Suppose is a problem instance with , , , and , and suppose Algorithm 1 assigns task to agent under . Then for any decision sequence with , there exists another decision sequence such that .

Finally, Theorem 4.1 combines the results of Lemma 6 and Lemma 7 to complete the proof of the instance-optimality of Algorithm 1 on binary instances consisting of and .

Theorem 4.1

Algorithm 1 minimizes the total cost defined by Eq. 2 for any instance of the positional allocation problem with two types of task.

Appendix A The Multi-Battery Problem

Energy efficiency is a key concern for mobile devices that depend on batteries to maintain operation. Whenever a device draws power, not all the energy drawn from the battery is actually useful: only a fraction of it ends up powering the device, while the remainder is wasted heating up the device. We call these two components of the energy drained from a battery with each load the useful energy and the wasted energy, respectively. There are two main factors that determine the amount of wasted energy in a given time duration: the power consumption of the current load and the internal resistance of the battery. In turn, one of the key factors determining internal resistance is the State-of-Charge (SoC) of the battery, i.e., how much charge remains in the battery. The quantitative relationship between the wasted energy , the power of the load , and the SoC of the battery can be approximated by the following formula:

(7)

where is the open-circuit voltage, is the initial resistance of the battery, is the DCIR-SoC coefficient, and is the time length of the load – all can be considered constants. One can verify that is monotonically increasing in and monotonically decreasing in . Also, the function always has positive mixed partial derivatives. By Lemma 2 we know that, in general, we should try to power low-power loads with batteries at relatively low state-of-charge and high-power loads with batteries at relatively high state-of-charge, so as to minimize the total wasted energy during a battery discharging cycle.
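Since the exact constants of Eq. 7 are instance-specific, the following is only a hedged sketch of the stated qualitative behavior: Joule loss I²R over the load duration, with internal resistance growing linearly as SoC drops. The linear DCIR-SoC form, the parameter names, and all default values are illustrative assumptions, not the paper's constants.

```python
def wasted_energy(P, soc, V_oc=3.7, R0=0.05, alpha=0.1, T=1.0):
    """Illustrative model of the wasted energy of a load.

    Assumptions (not from the paper): current I = P / V_oc, a linear
    DCIR-SoC model R(soc) = R0 + alpha * (1 - soc), and Joule loss
    E_waste = I^2 * R * T. soc is in [0, 1]."""
    current = P / V_oc                      # load current drawn from the battery
    resistance = R0 + alpha * (1.0 - soc)   # internal resistance rises as SoC falls
    return current ** 2 * resistance * T
```

Under this model the wasted energy is increasing in the load power and decreasing in the state-of-charge, matching the monotonicity used by Lemma 2.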

Figure 2: Power consumption curve of smart phones. The average power under active mode is about mW, while the number under standby mode is less than mW – a gap of more than 100x.

Appendix B Example Instance Showing the Naive Greedy Strategy is Suboptimal

The naive greedy strategy for the positional allocation problem is to allocate a task with a relatively larger (smaller) cost to a position with a relatively smaller (larger) weight. Figure 3 shows a simple problem setting asking to assign a sequence of tasks to agents. There are only two task types and in the task sequence, and . The left side shows the allocation scheme of the naive greedy strategy for this instance. The right side shows the allocation scheme of the algorithm proposed in this paper, in which some L tasks are exchanged with some H tasks in lower positions, resulting in a lower total cost due to the rearrangement inequality.

Figure 3: Diagram illustration of the allocation scheme of the naive greedy algorithm and Algorithm 1 under the instance HHLHH LHLHH LLLHH HHLLH HHLLH HLLHH HHLLL LHHLH HHLHL LHHLL LHHHL HHHHH LHHLH HLLHL LHLHL HLLLH

Appendix C Proofs

c.1 Proof of Lemma 1

Proof

It is obvious that we can obtain a unique allocation scheme from a decision sequence. We prove the opposite direction by giving an algorithm (Algorithm 2) that “decodes” how the tasks are assigned by the input allocation scheme.

Input:
Output:
1 set for ;
2 ;
3 while ,  do
4       if ,  then
5             ; ; ;
6            
7       else
8            return “This is not a valid allocation scheme.”;
9            
10       end if
11      
12 end while
13return ;
Algorithm 2 Algorithm that decodes the decision sequence from the input allocation scheme

For contradiction, suppose Algorithm 2 fails to output a decision sequence (i.e., reaches Line 2) at time , while the input satisfies properties (1)-(3) of Definition 1. Since every task before has been decoded, we know for all . Due to the monotonicity of allocation schemes we know for any , which violates the completeness of allocation schemes.
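The decoding procedure of Algorithm 2 can be sketched as follows (a sketch with illustrative names; the scheme is given as one list of task indices per agent, each in increasing order):

```python
def decode_decisions(scheme, n_tasks):
    """Sketch of Algorithm 2: recover the unique decision sequence from an
    allocation scheme. At each step the next undecoded task index must sit
    at the front of exactly one agent's remaining subsequence; otherwise
    the scheme is invalid."""
    fronts = [0] * len(scheme)       # next unread slot in each subsequence
    decisions = []
    for t in range(n_tasks):
        for a, sub in enumerate(scheme):
            if fronts[a] < len(sub) and sub[fronts[a]] == t:
                decisions.append(a)  # task t was assigned to agent a
                fronts[a] += 1
                break
        else:
            raise ValueError("This is not a valid allocation scheme.")
    return decisions
```

The uniqueness claimed by Lemma 1 corresponds to the fact that at most one agent can have task index t at its front.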

C.2 Proof of Lemma 3

Proof

The capacity of agent decreases with each task assignment. Due to line , Algorithm 1 never chooses an agent that has the same capacity as another agent but a larger agent index (i.e., and ), so the capacity order of any pair of agents cannot be reversed.

C.3 Proof of Lemma 4

Proof

For a given look-ahead setup , we will show that decreases by at most for any single task assignment, which directly yields

To prove for any task , let be the current position of agent at time (i.e., ), and let be any assignment decision of task made by Algorithm 1 (i.e., ). It is obvious that will not change if or , since the task is assigned to an agent not covered by either the H-zone or the L-zone (i.e., and ). In the case of , the size of the L-zone doesn’t change because the “threshold position” doesn’t change, and the size of the H-zone decreases due to the assignment of task to agent .

In the case of , the “threshold position” decreases , and the H-zone and L-zone will “gain” and “lose”, respectively, exactly one position for each agent . In addition, the L-zone will lose one position for agent (i.e., the threshold position ). Due to Lemma 3, we have if Algorithm 1 chooses , which means none of the agents has a position of at this time, so the L-zone will not further lose positions for them despite the decrease of the threshold position . In total, the sum of and decreases in this case.

C.4 Proof of Lemma 6

Proof

For the decision sequence , let . By definition . Let be the allocation scheme corresponding to .

Now consider and in the allocation scheme. If , we can simply construct by swapping the tasks of the two agents and . Formally, this means we construct , and by Lemma 1 we can in turn construct from . So, in the following we assume . Because task is assigned to agent , which has capacity , we have

Since by definition , there can be two cases:

Case 1: When . Intuitively this means that agent receives its first task () before the capacity of agent drops below . In that case, we exchange the allocation targets of tasks and (i.e., assign task to agent and task to agent ). Due to Lemma 2, this is equivalent to a chain of exchanges, each of which either reduces the total cost or keeps it the same. Formally, assume for some (such an always exists); we construct

and .

Recall that , then we have

Case 2: When . This means the capacity of agent drops below before agent is ever assigned any task (thus still having a capacity of by then). In this case we simply exchange all tasks of agent with the tasks assigned to agent when the capacity of is no more than , yielding

Clearly the exchange does not change the total cost, thus reducing the problem to Case 1. ∎

C.5 Proof Sketch of Lemma 7

Proof Sketch. The complete proof is rather long, so in this paper we omit some repeated details where it is safe to do so, especially the rigorous proofs of the superiority of a rearranged sequence, as in Lemma 6.

It is easy to check that Lemma 7 holds if for each agent (any algorithm gives the same total cost in these cases). For general , assume by induction that Lemma 7 holds for all the “smaller” instances, that is, that Algorithm 1 minimizes the total cost of any instance . We will prove that Algorithm 1 also minimizes the cost of the instance . Specifically, for any decision sequence , let ; we only need to prove in two cases. For convenience we denote by the -th H task in the sequence starting from task . Similarly we denote by the -th L task in the sequence.

When : Without loss of generality we can assume that follows Algorithm 1, for otherwise we can simply turn to consider such a , which is guaranteed to have a lower cost than by the induction hypothesis. For any such instance and any such sequence , we can make the following observations, which collectively characterize an “overflowing” situation.

First, we know because Algorithm 1 always breaks ties by returning the agent with the smaller id. Second, since Algorithm 1 returns for , by definition there must exist such that there are at least “L”s in , where and are the sizes of the H-zone and L-zone at this time, respectively. Third, we know is the only H task assigned by to an agent with smaller id with , as shown by the following claim.

Claim

For any , we have if .

Proof

We know because Algorithm 1 has chosen for , and allocating to agent only decreases the stopping threshold of Algorithm 1. Furthermore, because we have assumed by induction that is optimal for , by Lemma 5 we know that all the H tasks after will also be allocated to agents with id no smaller than .

In addition, the following claim shows that the sequence (where and follows Algorithm 1) will allocate at least one “L” in the H-zone.

Claim

If allocating according to the sequence , there always exists an such that , , , and .

Combining all the above observations, we can derive the following claim, which asserts that there must be some L overflowing from the L-zone to the H-zone if we follow the sequence when . Let denote the first such “L”. Observe that the decision sequence has put (which is also ) in the L-zone (i.e., “below” position ) while putting in the H-zone (i.e., above position ). Because is the first “L” above , we know all the tasks on top of are “H”.

Claim

If allocating according to the sequence , there always exist such that , and that for any , we have if , and if .

Based on this claim we can construct a better sequence by allocating to agent and allocating to agent .

Compared with that of , everything is the same except that a pair of H and L is exchanged in position, which always lowers the total cost due to Lemma 2. The rigorous proof of the benefit of this exchange is similar to the proof of Lemma 6.
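The cost saving of such an H/L exchange can be summarized by a one-line rearrangement argument (stated here in generic notation, with $c_H \ge c_L$ the two task costs and $w_1 \ge w_2$ the positional weights of the two exchanged slots; these symbols are ours, not the paper's):

```latex
(c_H w_1 + c_L w_2) - (c_H w_2 + c_L w_1) = (c_H - c_L)(w_1 - w_2) \ge 0,
```

so moving the costlier H task to the smaller-weight slot never increases the total cost.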

When : Informally, these are the cases when puts the first task in somewhere “higher” (i.e., in a smaller position) than where it “should have been”. In the following we prove that, for any instance and any sequence with , there always exists a sequence with such that . In other words, no sequence with can be the optimal sequence.

Again, by induction we can assume that follows Algorithm 1. Similar to the first case, we assume for contradiction that is optimal, which leads to a series of observations that collectively characterize a snapshot of the allocation. Then we perform some task exchanges in the allocation scheme to reduce the total cost without violating the monotonicity of the allocation scheme, thus forming a contradiction.

First, without loss of generality we know , for otherwise if , we simply find the smallest agent id with , exchange the tasks allocated to agents and , and turn to consider the new allocation scheme.

Second, let denote the second H task in (i.e., for any ); we know must not be put to the left of if is to be optimal. That is, we have . This is because would lead to (note that we just showed ), in which case we can turn to consider the sequence , which is guaranteed to cost no more than . The complete proof is similar to the proof of Lemma 6. Note that by Lemma 5 all the subsequent H tasks will also be assigned to agent or to its right.

Third, we know that allocates at least one H task below (including) the position . Formally, we have

Claim

There must exist an such that and .

Let be the first such H task (which essentially “overflows” from the H-zone). We know that there is no interleaving of H and L in the H-zone at least until . Formally, we have

Claim

For any , if and , then we must have .

Finally, let be the first L task after (such a