# Timely-Throughput Optimal Scheduling with Prediction

## Abstract

Motivated by the increasing importance of providing delay-guaranteed services in general computing and communication systems, and the recent wide adoption of learning and prediction in network control, in this work, we consider a general stochastic single-server multi-user system and investigate the fundamental benefit of predictive scheduling in improving timely-throughput, being the rate of packets that are delivered to destinations before their deadlines. By adopting an error rate-based prediction model, we first derive a Markov decision process (MDP) solution to optimize the timely-throughput objective subject to an average resource consumption constraint. Based on a packet-level decomposition of the MDP, we explicitly characterize the optimal scheduling policy and rigorously quantify the timely-throughput improvement due to predictive-service, which scales as , where are constants, is the true-positive rate in prediction, is the false-negative rate, is the packet deadline and is the prediction window size. We also conduct extensive simulations to validate our theoretical findings. Our results provide novel insights into how prediction and system parameters impact performance and provide useful guidelines for designing predictive low-latency control algorithms.

## 1 Introduction

How to provide low-latency packet delivery has long been an important problem in network optimization research, particularly due to the increasingly stringent user delay requirements in a wide range of applications. For instance, low delay is critical for video traffic in mobile networks, which has already accounted for of total mobile data in and will account for more than by according to a recent Cisco report [1]. Other areas such as online gaming, online health care and supply chain also have rigid delay requirements. Indeed, user requirements are so strong that, for companies like Amazon and Google, it has been reported that if their service latency increases by ms, they will lose of their customers and millions of dollars in revenue [2]. As a result, the problem of guaranteeing low latency has received much attention in the last decade, and many scheduling algorithms have been designed based on various mathematical techniques, e.g., [3], [4], [5], [6], [7], [8], [9].

On the other hand, driven by the availability of large amounts of user behavior data and the rapid development of data mining and machine learning tools, it has become common in practical systems to *predict* user demand and to *proactively* serve customer requests.
For example, Amazon tries to predict what customers may purchase and pre-ships products to distribution centers close to them, in order to reduce shipping time [10].
Netflix, on the other hand, tries to predict what customers may want and preloads videos onto user devices to improve quality-of-experience [11].
Another example is branch prediction in computer architecture, where prediction is used to decide how to pre-execute certain parts of the workload, so as to reduce computing time [12].
Despite the continuing success of this prediction-based approach in practice, it has not received much attention in theoretical study. Therefore, it remains largely unknown how prediction can fundamentally improve delay-guaranteed services.

In this paper, we aim to fill this gap and investigate *the impact of prediction on timely-throughput*. Specifically, we consider a single-server multi-user system where the server delivers packets to users. Each packet has a user-dependent deadline before which it needs to reach the user. The service channel for each user is time-varying and the transmission success probability depends on the resource spent sending a packet.
The server gets access to an *imperfect* prediction window, in which forecasts about future arrivals are available, and can *pre-serve* packets before they actually enter the system.
The overall objective of the system is to maximize a weighted sum of timely-throughputs of users, being the rates of packets delivered before their deadlines.
This formulation is general and models various important practical applications, e.g., video streaming, sending time-critical control information, and grocery delivery.

There has been an increasing set of recent results investigating the impact of prediction in networked system control. [13] and [14] consider utility optimal scheduling in downlink systems based on perfect user prediction. [15] shows that proactive scheduling can effectively reduce queueing delay in stochastic single-queue systems. [16] considers how network state prediction can be incorporated into algorithm design. [17] and [18] focus on understanding the cost saving aspect of proactive scheduling based on demand prediction. [19], [20] and [21] also investigate the benefit of prediction from an online algorithm design perspective. However, we note that the aforementioned works all focus on understanding the utility improvement aspect of prediction and proactive service, and delay saving often comes as a by-product of the resulting predictive control algorithms. Thus, the results are not applicable to delay-constrained problems, where meeting the latency guarantee is an explicit requirement.

Our formulation is closest to recent works [8], [22], [13], and [9], which focus on delay-constrained traffic scheduling. Our work is different as follows. [8] focuses on the setting where traffic is generated and delivered within synchronized frames for all users and [22] focuses on periodic traffic, while our work allows heterogeneous deadlines for user packets and random arrivals. [13] focuses on optimizing system utility subject to stability constraint, while we work explicitly with delay constraints. Lastly, while our work builds upon the novel results in [9], we focus on quantifying the impact of prediction and proactive service in a Markov system, whereas [9] considers a causal system with an i.i.d. setting. The extension to incorporate prediction significantly complicates both the solution and analysis. Our results also offer novel insights into the fundamental benefits of prediction in delay-constrained network control.

The main contributions of our paper are summarized as follows.

(i) We propose a novel framework for studying timely-throughput optimization with imperfect prediction and proactive scheduling. Our model captures key features of practical delay-constrained problems and facilitates analysis.

(ii) We derive the exact optimal solution to the prediction-based timely-throughput optimization problem using Markov decision process (MDP). We rigorously quantify that prediction improves timely-throughput by , where are constants, is the true-positive rate in prediction, is the false-negative rate, is the packet deadline and is the prediction window size. This concise and explicit characterization is novel and provides insights into how different parameters impact system performance.

(iii) We conduct extensive simulations to validate our theoretical findings. Our results show that prediction-based system control can significantly boost timely-throughput.

The rest of the paper is organized as follows. In Section 2, we present the system model. The MDP-based solution is presented in Section 3. Structural properties of the optimal solution and exact timely-throughput improvement for a static setting are derived in Section 4. The general scenario is considered in Section 5. Simulation results are presented in Section 6 and conclusion comes in Section 7.

## 2 System Model

Consider a general single-server system with users as shown in Fig. 1. The server can simultaneously transmit multiple packets to different users with cost due to resource expenditure, e.g., energy consumption. The channels are unreliable and transmissions may fail. Each packet has a hard deadline within which it must be delivered successfully. Otherwise, it becomes outdated and will be useless for the user (and discarded). We assume that time is discrete, i.e., , and a packet transmission to any user takes one time-slot.

### 2.1 The Delay-Constrained Traffic Model

The number of packet arrivals destined for user at time is denoted by . We assume that is i.i.d. across time and independent for different users, with an average rate . We also assume the number of packet arrivals is bounded for all time and for all users, i.e., . For each user , there is a hard deadline or sustainable delay for their packets, denoted by . This means that for any packet in , it should be successfully delivered by time . Otherwise, it becomes useless and will be discarded from the system at time . We further assume , for some finite constant .

### 2.2 The Service Model

The system serves user packets by transmitting them over service channels, at the expense of resource consumption, e.g., energy. To model system dynamics, we assume the success of a packet transmission for user is a random event and its probability is determined by the instantaneous condition of the service channel, denoted by the channel state , which is modeled by an ergodic finite-state Markov chain with state space . The transition matrix and the stationary distribution are denoted as and , respectively.

At every time , the server needs to decide the resource consumption level for transmitting each present packet, which is chosen from a bounded set of consumption levels . If at time , the channel state is and the resource level is , then the probability of a successful packet transmission for user is . Here means that a packet will not be transmitted in the current time-slot and . Also, for all . We further assume that is a concave and strictly increasing function of .

We assume without loss of generality that there is a total order on set based on , i.e., for each pair of , either or . We also assume that there is no hard capacity constraint for the server, i.e., it can transmit an arbitrary number of packets every time, although it has to maintain an average resource consumption guarantee (the setting with hard capacity constraint will be considered in Section 5). This assumption is made to facilitate analysis and was also adopted in [9].

### 2.3 The Predictive Service Model

Different from prior results in the literature that often only consider *causal* systems, we are interested in understanding how prediction and predictive-service fundamentally impact system performance.
Thus, we assume that the system gets access to a *prediction* window for each user.
Moreover, the system implements *predictive service*, i.e., it tries to pre-serve future arrivals in in the current time slot.
Such a scenario is common in practice. For instance, Amazon predicts user behavior and pre-ships goods to distribution centers closest to users [10].

In this work, we focus on two prediction models.

#### Perfect prediction

#### Imperfect prediction

In the second model, prediction made by the system can contain error. Specifically, we adopt the imperfect prediction model parameterized by the true-positive and false-negative rates as follows.
Each predicted arrival in the prediction window is correct with probability , and every actual packet arrival will be missed with probability , i.e., a packet will arrive unexpectedly with probability , as illustrated in Fig. 2. Thus, the true-positive rate is and the false-negative rate is . These two rates are decided by the learning methods used to forecast future arrivals and our analysis holds for general and .
Without loss of generality, we assume .^{1}

### 2.4 System Objective

We define the *timely-throughput* as the average number of packets delivered successfully before their deadlines, i.e.,^{2}

(1) |

where denotes the number of packets that timely reach their destinations for user at time . We also define the average resource expenditure as:

(2) |

where is the resource consumed by transmissions of packets for user at time .

Denote . Given a weight vector with , the *weighted timely-throughput* is defined as .
In this paper, we focus on the problem of maximizing the weighted timely-throughput, subject to an average resource constraint , i.e.,

(3) | |||||

s.t. | (4) |

This formulation is general and models many delay-constrained applications, e.g., video streaming and supply chain optimization.

### 2.5 Model Discussion

Our work builds upon the novel results in recent work [9]. However, our model and results are different as follows. (i) We consider a Markov model for the system while [9] focuses on an i.i.d. setting. (ii) We focus on prediction and predictive scheduling and quantify their fundamental impact on timely-throughput, while previous delay-constrained results, e.g., [9] and [8], mostly consider causal systems. Analysis for predictive systems is significantly complicated by potential errors in prediction and requires different arguments compared to those for causal systems.

Understanding how prediction impacts delay-constrained services is critical for future intelligent communication and computing systems, as providing delay-guaranteed services has long been an important problem in various applications and predictive scheduling has been successfully utilized in different delay-sensitive scenarios, such as video streaming [23] and supply chain optimization [24].

## 3 Scheduling by Packet-Level Decomposition

Problem (3) can be formulated as a constrained Markov Decision Process (MDP). However, it is known that the number of system states can grow exponentially large, making it difficult to obtain efficient algorithms. To tackle this issue, we adopt the packet-level decomposition approach in [9] and extend it to handle prediction in our setting. Specifically, for every individual packet still in the system at time , its state is described by the user it belongs to and a triple . Here denotes the reception status of the packet, i.e., means that the packet is at the source and means that the packet has reached the destination. is the time duration before reaching its deadline, and is the channel state index. Then, the state of the system at time can be described by the state of all packets. Note that the number of arrivals at any time-slot from any user is bounded by , each packet can stay in the system for no more than slots, and the number of channel states is finite, so the number of system states is finite (though it can be exponentially large).

A (possibly randomized) scheduling policy decides at each system state, which packets to transmit and at what resource levels. Since the distribution of system state at time is decided by the state and the scheduling decision at time , problem (3) is a constrained MDP with finite states, which can be solved with algorithms such as value iteration or policy iteration [25].

### 3.1 Packet-Level Decomposition for the Constrained MDP

Let be the Lagrange multiplier for constraint (4). The Lagrangian of (3) can be written as

(5) | |||

Here counts the timely deliveries of packets, and comprises the resource consumed. Further notice that there is no capacity constraint for the server. Thus, by denoting the set of packet arrivals for user up to time , the Lagrangian (5) can be decomposed into the following packet-level form:

(6) |

where is the indicator that packet reaches the destination before its deadline and is the total resource consumed by packet .

From (6), the term related to packet of user is

(7) |

As a result, maximizing the Lagrangian (5) can be accomplished by maximizing (7) for each packet. In the following, we refer to problem (7) as the *Single Packet Scheduling Problem* (SPS) and describe how this problem can be solved in the presence of prediction and predictive-service.

### 3.2 The Single Packet Scheduling Problem

In this subsection, we consider the optimal solution to the SPS problem under a fixed multiplier . Recall that the state of a packet is described by a triple . At each time-slot, the time-to-deadline is decremented by one. If a packet is still at the source when becomes , it will be discarded from the system. On the other hand, if a packet is delivered successfully before the deadline, we collect a reward . The cost charged for resource expenditure in each transmission is per unit.

#### Perfect prediction

We start with the perfect prediction case (note that zero prediction corresponds to having ). In this case, the arrival of each packet from user is known time-slots in advance. Thus, we need to solve the SPS problem with an extended deadline of . We define as the optimal value function for a packet of user at state . The value function and the optimal scheduling decision at each state can be obtained with the following Bellman equations.

(8) | |||

(9) | |||

(10) |
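To make the recursion concrete, the following is a minimal sketch of backward induction for a single undelivered packet under perfect prediction. It is not a transcription of (8)-(10): it assumes the recursion takes the standard finite-horizon form in which the packet is worth nothing once it expires, a transmission at a given resource level costs the multiplier times that level, and a failed attempt moves the packet one slot closer to its deadline while the channel makes one Markov transition. The names `solve_sps`, `beta`, `lam`, `levels`, `success_prob`, `transition` and `horizon` are ours and purely illustrative, as are the numbers.

```python
def solve_sps(beta, lam, levels, success_prob, transition, horizon):
    """Backward induction for one undelivered packet (illustrative sketch).

    beta         -- reward collected on a timely delivery (assumed name)
    lam          -- Lagrange multiplier, i.e. price per unit of resource
    levels       -- allowed resource levels, including 0 (do not transmit)
    success_prob -- success_prob[c][b]: delivery probability in channel
                    state c when spending levels[b]
    transition   -- transition[c][c2]: channel transition probabilities
    horizon      -- slots left before the (extended) deadline

    Returns (value, policy): value[tau][c] is the optimal expected net
    reward with tau slots left in channel state c, and policy[tau][c]
    is the resource level attaining it.
    """
    n_states = len(transition)
    value = [[0.0] * n_states]      # tau = 0: the packet has expired
    policy = [[None] * n_states]
    for tau in range(1, horizon + 1):
        prev = value[tau - 1]
        row_value, row_policy = [], []
        for c in range(n_states):
            # Expected continuation value if this slot's attempt fails.
            cont = sum(transition[c][c2] * prev[c2] for c2 in range(n_states))
            best_value, best_level = None, None
            for b, level in enumerate(levels):
                p = success_prob[c][b]
                v = -lam * level + p * beta + (1.0 - p) * cont
                if best_value is None or v > best_value:
                    best_value, best_level = v, level
            row_value.append(best_value)
            row_policy.append(best_level)
        value.append(row_value)
        policy.append(row_policy)
    return value, policy


# Hypothetical two-state channel (bad, good) with binary resource levels.
levels = [0, 1]
success_prob = [[0.0, 0.3], [0.0, 0.8]]
transition = [[0.7, 0.3], [0.4, 0.6]]
value, policy = solve_sps(10.0, 1.0, levels, success_prob, transition, horizon=4)
```

Under perfect prediction, `horizon` would be set to the deadline plus the prediction window size, which is how the sketch reflects the extended deadline discussed above.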

#### Imperfect prediction

The imperfect prediction case is more complicated. We tackle this case by dividing the arrivals into two categories, namely, the true-positive part and the false-negative part. The latter part contains the unpredicted true arrivals. Since these arrivals are not expected, they can only be served after they enter the system. Thus, scheduling decisions for these packets are the same as those in the zero prediction case.

For predicted arrivals, the server can pre-serve them while they are still in the prediction window. However, there is a complication in this case. If a predicted arrival is actually a false-alarm, we cannot collect any reward. As a result, the resource consumed to pre-serve the packet is wasted. Moreover, the correctness of a prediction can only be verified at the time when the predicted packet is supposed to arrive. Before that, the server will have to take chances and treat all predictions equally.

Based on the above reasoning, we will treat predicted packets and the mis-detections differently in the DP formulation. The optimal predictive-service can be done by the following augmented Bellman equations.

(11) | |||

(12) | |||

(13) | |||

(14) | |||

(15) |

Here (11) and (14) are for unpredicted arrivals, and (12) and (15) are for predicted packets. Compared to (8), the main difference is that for predicted packets, one needs to take into account the fact that the system will collect a reward from pre-serving a packet only with probability . Also note that the time is special, as in the next time slot we will be able to verify the correctness of a predicted arrival. Hence, it is separately treated in (12).

### 3.3 The Optimal Weighted Timely-Throughput

After solving the SPS problem for a fixed , the policy that maximizes the Lagrangian (5) can be derived by letting each packet take its own optimal scheduling decision. Next we describe how to optimize the overall problem.

Denote the Lagrange dual function, i.e.,

(16) |

Using Lemma 3 in [9], the optimal weighted timely-throughput equals the optimal value of the dual problem, i.e.,

(17) |

This can be established by showing that the constrained MDP, with or without prediction, is equivalent to a linear program [26],[27]. Hence, the duality gap is zero.

In the following, we look at the Lagrange dual function in the perfect and imperfect prediction cases.

#### Dual under zero or perfect prediction

Let in the case with perfect prediction (zero prediction corresponds to ). Then, using (6), the dual function can be expressed as (recall that is the steady-state distribution of the channel state for user ):

(18) |

#### Dual under imperfect prediction

In this case, we note that the rate of predicted arrivals may not equal the actual arrival rate. Denote the predicted arrival rate from user as . We have:

Here the second term is due to the fact that each packet is missed, i.e., not predicted, with probability . Thus,

(19) |

Since and , we get

(20) |
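The relation between the actual and the predicted arrival rates can be checked numerically. The sketch below assumes the balance argument described in the text: correctly predicted packets can be counted either as a fraction (the true-positive rate) of all predictions, or as the actual arrivals that were not missed. The function names and the example numbers are hypothetical.

```python
def predicted_arrival_rate(actual_rate, p, q):
    """Predicted arrival rate implied by true-positive rate p and
    false-negative rate q (illustrative sketch).

    Correct predictions counted from the prediction side: p * predicted_rate.
    Correct predictions counted from the arrival side: (1 - q) * actual_rate.
    Equating the two gives the expression returned below.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    return (1.0 - q) * actual_rate / p


def false_alarm_rate(actual_rate, p, q):
    """Rate of predicted arrivals that never materialize."""
    return (1.0 - p) * predicted_arrival_rate(actual_rate, p, q)


# Perfect prediction (p = 1, q = 0) recovers the actual arrival rate.
perfect = predicted_arrival_rate(2.0, 1.0, 0.0)
# A noisier predictor issues more predictions than there are detected arrivals.
noisy = predicted_arrival_rate(2.0, 0.5, 0.25)
```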

Let and . Similar to the perfect prediction case, the dual function can be expressed as:

(21) |

where is determined in (19).

After obtaining the dual function for a fixed , we still need to find the optimal Lagrange multiplier . One possible approach is to use the subgradient descent method, where we take an iterative procedure to converge to as follows. In the -th iteration, we solve the SPS problem to get the optimal policy and the average resource expenditure based on the current multiplier . Then, the multiplier for the next iteration is given by ( is a step size):

It is known that, with an appropriately chosen step-size sequence, this procedure converges to the optimal multiplier [28].
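As an illustration of the iterative procedure, the following sketch implements a projected subgradient update on the multiplier. It assumes the usual form of the update for a dual with a single average-resource constraint: the multiplier is raised when the average expenditure exceeds the budget, lowered otherwise, and kept non-negative. The callable `expenditure`, which stands in for solving the SPS problem and evaluating the resulting average resource consumption, and the toy function used in the example are hypothetical.

```python
def tune_multiplier(expenditure, budget, steps=2000, lam0=0.0):
    """Projected subgradient iteration on the Lagrange multiplier.

    expenditure -- callable returning the average resource consumption of
                   the policy that is optimal for a given multiplier
                   (a stand-in for solving the SPS problem)
    budget      -- average resource constraint
    """
    lam = lam0
    for k in range(1, steps + 1):
        step = 1.0 / k ** 0.5   # diminishing step size, one common choice
        # Raise the price if the budget is exceeded, lower it otherwise,
        # and project back onto the non-negative reals.
        lam = max(0.0, lam + step * (expenditure(lam) - budget))
    return lam


def toy_expenditure(lam):
    """Smooth toy model: consumption decreases as the price increases."""
    return 10.0 / (1.0 + lam)


# With budget 2 the toy model is balanced at lam = 4.
lam_star = tune_multiplier(toy_expenditure, budget=2.0)
```

In the static scenario the expenditure is piecewise constant in the multiplier, so the iterates would hover around a threshold instead of settling smoothly; the smooth toy function is used here only to keep the example easy to check.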

Despite the generality of (18) and (21), directly solving them is complicated. Thus, in the following section, we first consider a slightly less general setting where user channels are static (can be different across users) and the resource expenditure option is binary.^{3}

## 4 The Static Scenario

In this scenario, we assume that the channel states are static, i.e., the success probability for transmitting a user packet is a constant . Moreover, we assume the resource level set is , i.e., at each time-slot, the scheduling decision for each packet is to transmit it or not. In this case, the state of each packet can be described by .

### 4.1 The Optimal Scheduling Policy

#### Perfect prediction

First we consider the perfect prediction case. We have the following theorem. Recall that the zero prediction case is a special case ( for all ).

###### Theorem 1.

For each user packet, if , then the optimal policy is to transmit the packet at every time-slot, until it is either successfully delivered to the destination or becomes outdated. Moreover, the value function is given by:

(22) |

Otherwise, if , the value function is . In particular, if , the optimal policy is to not transmit the packet at all.

###### Proof.

See Appendix A. ∎
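The structure in Theorem 1 can be checked numerically. The sketch below assumes the static recursion in which, with `tau` slots left, the server either idles (keeping the value for `tau - 1` slots) or transmits at unit cost `lam` with success probability `s` and reward `beta`. Under this assumption, always transmitting yields a geometric series whose closed form is compared against backward induction. The symbols `beta`, `lam`, `s` and `tau` are our own labels, and the closed form is derived from the assumed recursion, not copied from (22).

```python
def static_value(beta, lam, s, tau):
    """Backward induction: in each slot choose between idling and transmitting."""
    v = 0.0  # value of an expired packet
    for _ in range(tau):
        v = max(v, -lam + s * beta + (1.0 - s) * v)
    return v


def static_value_closed_form(beta, lam, s, tau):
    """Closed form implied by the assumed recursion."""
    if s * beta <= lam:
        return 0.0  # one transmission costs at least its expected reward
    return (beta - lam / s) * (1.0 - (1.0 - s) ** tau)


# Compare both computations on a small grid of hypothetical parameters.
checks = [
    (beta, lam, s, tau)
    for beta in (1.0, 5.0, 10.0)
    for lam in (0.5, 1.0, 4.0)
    for s in (0.2, 0.5, 0.9)
    for tau in (1, 3, 8)
]
max_gap = max(
    abs(static_value(*c) - static_value_closed_form(*c)) for c in checks
)
```

The threshold behavior is visible in the recursion: transmitting is worthwhile in every slot exactly when the expected one-shot reward `s * beta` exceeds the cost `lam`, and is never worthwhile when it falls below it.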

###### Remark 1.

When , based on the KKT conditions [28], it can be shown that for packets with , the optimal policy is to transmit the packet at every time-slot with probability , where , and not to transmit the packet otherwise. Notice that this has no influence on the value of the dual function , as well as the optimal weighted timely-throughput obtained by .

Theorem 1 shows that if the expected reward is larger than the cost in one transmission, then the server should try its best to deliver packets for user . Otherwise, if the expected reward is less than the cost, packets from user should never be served (from the viewpoint of optimizing the weighted timely-throughput under an average resource constraint). From Theorem 1 and (17), we have the following corollary.

###### Corollary 1.

Let denote the dual function of (3) and denote the optimal weighted timely-throughput in the perfect prediction case. We have:

(23) | ||||

and

(24) |

Corollary 1 enables us to characterize the fundamental improvement in weighted timely-throughput due to prediction, which is shown in the next theorem. In the theorem, and denote the dual function and optimal weighted timely-throughput without prediction, respectively.

###### Theorem 2.

Suppose and , then the weighted timely-throughput improvement satisfies:

(25) |

and

(26) |

Theorem 2 suggests two things. (i) for each user , prediction improves the throughput by an amount with .
This implies that the impact of prediction is decreasing with the deadline , which is expected. (ii) the gap to the optimal improvement decreases *exponentially* as the prediction power increases. This demonstrates the power of prediction and highlights the potential of investing in improving system prediction.
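Both observations can be seen in a short worked calculation for a single packet at a fixed multiplier. This is only a sketch under assumptions, not a restatement of (25) and (26): we write $\beta$ for the reward, $\lambda$ for the multiplier, $s$ for the static success probability, $d$ for the deadline and $D$ for the prediction window size (our own labels), and we assume that the always-transmit policy is optimal, i.e., $s\beta > \lambda$, so that the value with $\tau$ slots left satisfies $V(\tau) = -\lambda + s\beta + (1-s)V(\tau-1)$ with $V(0) = 0$.

```latex
\begin{align*}
V(\tau) &= \Bigl(\beta - \frac{\lambda}{s}\Bigr)\bigl[1 - (1-s)^{\tau}\bigr], \\
V(d + D) - V(d) &= \Bigl(\beta - \frac{\lambda}{s}\Bigr)(1-s)^{d}\bigl[1 - (1-s)^{D}\bigr], \\
\lim_{D' \to \infty} V(d + D') - V(d + D) &= \Bigl(\beta - \frac{\lambda}{s}\Bigr)(1-s)^{d + D}.
\end{align*}
```

The second line shows a per-packet gain that shrinks geometrically in the deadline $d$, and the third line shows that the gap to the largest achievable gain shrinks geometrically in the window size $D$.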

#### Imperfect prediction

In this case, we first note that there is not much the system can do with the unpredicted arrivals. Thus, the optimal scheduling policy and value function for these packets are the same as those in Theorem 1. For the predicted arrivals, on the other hand, the system will be able to start their services the moment they appear in the prediction window, following (11) to (15).

The following theorem characterizes the optimal predictive-service policy and the corresponding value functions. Recall that is the true-positive probability and is the false-negative probability.

###### Theorem 3.

Consider a predicted arrival for user .

(A) Suppose . Define

(27) |

(i) If , then the optimal pre-service policy is to transmit the packet at every time-slot once it enters the prediction window, until it is either successfully delivered to the destination or revealed to be a false-alarm. The value function is given by:

(28) | |||

where . (ii) If , the value function is given by:

(29) |

where . In particular, if , the optimal policy is to not pre-serve the packet and to wait until it enters the system (if it is a true-positive).

(B) If , . The optimal policy is to not transmit the packet at all if .

###### Proof.

See Appendix B. ∎

###### Remark 2.

Similar to Theorem 1, when , if there exist such that , then the optimal policy is to pre-serve packets with certain probability and to transmit packets with certain probability , such that .

Theorem 3 shows that for a predicted user arrival with , if the server waits until it enters the system and then does its best to deliver it, the expected reward is . This is consistent with our intuition compared with (22), since the probability that a predicted arrival is real is . Also note that although does not appear in , we will see in the following corollary that it indirectly impacts the final timely-throughput by affecting the effective arrival rate as in (19).

Note that can intuitively be viewed as the weight put on resource consumption compared to reward. Hence, when , the true-positive rate is large enough such that pre-transmitting the packet in one time slot, i.e., at time-slot , will increase the value function. Under this circumstance, Theorem 3 shows that the optimal scheduling is to transmit the packet as early as possible.
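The threshold behavior can be illustrated with a small dynamic program for one predicted arrival in the static scenario. The sketch rests on modeling assumptions of our own that are meant to mirror the discussion above: a pre-delivered packet earns the reward only if the prediction turns out to be correct (probability `p`), a failed pre-transmission carries no information about correctness, and once the prediction is verified a true arrival continues with the no-prediction value while a false alarm is worth nothing. All names and numbers are hypothetical.

```python
def no_prediction_value(beta, lam, s, d):
    """Value of a real packet with d slots to its deadline (static channel)."""
    v = 0.0
    for _ in range(d):
        v = max(v, -lam + s * beta + (1.0 - s) * v)
    return v


def predicted_packet_plan(beta, lam, s, d, window, p):
    """Pre-service plan for one predicted arrival.

    values[k]  -- expected net reward with k pre-service slots left before
                  the prediction is verified (packet not yet delivered)
    actions[k] -- "pre-serve" or "wait" in that slot (None for k = 0)
    """
    values = [p * no_prediction_value(beta, lam, s, d)]
    actions = [None]
    for _ in range(window):
        wait = values[-1]
        # A successful pre-delivery pays off only if the prediction is right.
        send = -lam + s * p * beta + (1.0 - s) * values[-1]
        values.append(max(wait, send))
        actions.append("pre-serve" if send > wait else "wait")
    return values, actions


# Accurate predictor: pre-serving pays off in every slot of the window.
good_values, good_actions = predicted_packet_plan(10.0, 1.0, 0.5, 2, 3, p=0.9)
# Inaccurate predictor: better to wait until the packet actually arrives.
poor_values, poor_actions = predicted_packet_plan(10.0, 1.0, 0.5, 2, 3, p=0.3)
```

In this toy instance, pre-serving in the last slot before verification is worthwhile exactly when `s * p * (beta - V) > lam`, where `V` is the no-prediction value at the deadline; with the numbers above this corresponds to `p > 0.5`.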

###### Corollary 2.

Combining Corollaries 1 and 2, we conclude the following theorem regarding the improvement in timely-throughput with imperfect prediction.

###### Theorem 4.

Suppose . The weighted timely-throughput improvement satisfies

(33) | |||

and

(34) | |||

where is given in (19).

###### Proof.

Similar to the proof of Theorem 2. ∎

###### Remark 3.

Similar to Theorem 2, we still have in the imperfect prediction case that , the improvement scales in the order of , which recovers the perfect prediction result when and . This general result shows how different parameters affect the optimal timely-throughput, and will be useful for guiding prediction and control algorithm design for general computing systems.

### 4.2 Influence of Prediction Accuracy

We now investigate how prediction accuracy impacts performance improvement. The following theorem summarizes our results.

###### Theorem 5.

Let denote the false-negative rate vector, and let be the true-positive rate vector. Then,

(i) The optimal weighted timely-throughput is a non-increasing function of .

(ii) If , then the optimal weighted timely-throughput is a non-decreasing function of .

###### Proof.

See Appendix C. ∎

The results, though intuitive, are non-trivial to establish and require detailed investigation of the structure of the value functions. The impact of general pairs, on the other hand, is much more complicated to characterize, as we will see in Fig. 6 in the simulation section.

## 5 The General Scenario

We now return to the general case. We first show that the optimal policy in the perfect prediction case (including zero prediction) is monotone with respect to the time-to-deadline.

###### Theorem 6.

Suppose is a concave strictly increasing function of . In the perfect prediction case, define as the optimal scheduling decision for a user packet at state . We have:

(35) |

where (where denotes the zero prediction case).

###### Proof.

See Appendix D. ∎

Theorem 6 shows that the optimal scheduling policy is a “lazy” policy, i.e., the server will not try hard to transmit the packet unless it is getting close to the deadline. From the proof of Theorem 6, we see that this is because the value function under the same channel state is monotonically non-decreasing with the time-to-deadline. Thus, the server will try to spend less resource at the beginning to see if the packet can luckily get through, and spend more resource when the deadline is getting close, so as to pursue the reward of successful delivery. This result nicely complements existing efficient scheduling results in the literature [31], [32], [33], and can be viewed as an extension to the predictive online scheduling setting.
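The "lazy" behavior is easy to observe in a toy instance. The sketch below assumes a single channel state, four resource levels and a concave, strictly increasing success probability; the function name, the parameters and the numbers are illustrative choices of ours and do not come from the simulations in Section 6.

```python
def lazy_policy(beta, lam, levels, success, horizon):
    """Optimal resource level for each time-to-deadline 1..horizon.

    success[i] is the delivery probability when spending levels[i].
    """
    cont = 0.0          # value of an expired packet
    plan = []
    for _ in range(horizon):
        best_level, best_value = None, None
        for level, p in zip(levels, success):
            v = -lam * level + p * beta + (1.0 - p) * cont
            if best_value is None or v > best_value:
                best_level, best_value = level, v
        plan.append(best_level)
        cont = best_value   # becomes the continuation value one slot earlier
    return plan


levels = [0, 1, 2, 3]
success = [1.0 - 0.5 ** b for b in levels]   # 0, 0.5, 0.75, 0.875
plan = lazy_policy(10.0, 1.0, levels, success, horizon=5)
# plan[k] is the level used with k + 1 slots left: [3, 2, 1, 1, 1]
```

With one slot left the server spends the highest level, and it spends progressively less when more time remains, because the growing continuation value reduces the marginal benefit of succeeding right away.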

It is tempting to conclude that a similar property also holds in the general imperfect prediction case. However, *this monotonicity actually does not hold under imperfect prediction.*
This is because the value function is no longer monotone due to prediction error.
This fact will be illustrated by simulation in Section 6, where we see that the resource level can actually decrease with a smaller time-to-deadline (see User ’s strategy in Table 3).

### 5.1 Throughput Improvement

In this subsection, we investigate the throughput improvement due to prediction in the general scenario.

#### Perfect prediction

First we consider the perfect prediction case. Define and . Both and are well defined since there is a complete order on based on (see Section 2.2). We then have the following theorem (the zero prediction case is a special case, i.e., for all ).

###### Theorem 7.

For , define

(36) |

where , and

(37) | |||

where . For each user packet, given a fixed , the value function satisfies:

(38) |

where .

###### Proof.

See Appendix E. ∎

We then immediately have the following corollary, which shows that the functions and enable us to bound the optimal timely-throughput with prediction.

###### Corollary 3.

We similarly define and for the zero prediction case. Then, the following theorem characterizes the improvement in weighted timely-throughput from prediction.

###### Theorem 8.

Let and . The weighted timely-throughput improvement satisfies:

(43) |

###### Proof.

Similar to the proof of Theorem 2. ∎

#### Imperfect prediction

We now turn to the imperfect prediction case. Recall that is the true-positive probability and is the false-negative probability.

###### Theorem 9.

###### Proof.

See Appendix F. ∎

Interestingly, we will see in the proof that the three terms in in (46) correspond to not transmitting, transmitting after packet arrival, and proactive transmission.

From Theorem 9, we have the following corollary that bounds the optimal timely-throughput.

###### Corollary 4.

Define