Delay-Optimal Probabilistic Scheduling with
Arbitrary Arrival and Adaptive Transmission
Abstract
In this paper, we aim to obtain the optimal delay-power tradeoff and the corresponding optimal scheduling policy for an arbitrary i.i.d. arrival process and adaptive transmissions. The number of backlogged packets at the transmitter is known to a scheduler, which has to determine how many backlogged packets to transmit during each time slot. The power consumption is assumed to be convex in the transmission rate. Hence, if the scheduler transmits faster, the delay is reduced at the cost of higher power consumption. To obtain the optimal delay-power tradeoff and the corresponding optimal policy, we model the problem as a Constrained Markov Decision Process (CMDP), where we minimize the average delay given an average power constraint. By steady-state analysis and Lagrangian relaxation, we show that the optimal tradeoff curve is decreasing, convex, and piecewise linear, and the optimal policy is threshold-based. Based on the revealed properties of the optimal policy, we develop an algorithm to efficiently obtain the optimal tradeoff curve and the optimal policy. The complexity of our proposed algorithm is much lower than that of a general algorithm based on Linear Programming. We validate the derived results and the proposed algorithm through Linear Programming and simulations.
I Introduction
In this paper, we study an important problem of how to schedule the number of packets to transmit over a link taking into account both the delay and the power cost. This is an important problem because delay is a vital metric for many emerging applications (e.g., instant messenger, social network service, streaming media, and so on), and power consumption is critical to battery life of various mobile devices. In other words, we are studying the tradeoff between the timeliness and greenness of the communication service.
Such a delay-power scheduling problem can be formulated using a Markov Decision Process (MDP). The authors in [1] were among the earliest who studied this type of scheduling problem. Specifically, they considered a two-state channel and finite time horizon. The dual problem was solved based on results derived by Dynamic Programming and induction. Follow-up papers [2, 3, 4, 5] extended this study in various directions. The optimal delay-power tradeoff curve is proven to be nonincreasing and convex in [2]. The existence of a stationary optimal policy and the structure of the optimal policy are further investigated in [3]. Different types of power/rate control policies are studied in [4]. In [5], the asymptotic small-delay regime is investigated. In [6], a piecewise linear delay-power tradeoff curve was obtained along with an approximate closed-form expression.
If one can show monotonicity or a threshold type of structure of the optimal policy for MDPs, it helps to substantially reduce the computational complexity of finding the optimal policy. Indeed, the optimal scheduling policies are shown to be threshold-based or monotone in [1, 3, 5, 7, 8, 9, 10], proven by studying the convexity, superadditivity / subadditivity, or supermodularity / submodularity of expected cost functions by induction using dynamic programming. However, most of these results are limited to the unconstrained Lagrangian relaxation problem. In [3, 10], some properties of the optimal policy for the constrained problem are described based on the results for the unconstrained problem. Detailed analysis of the optimal policy for the constrained problem is conducted in [8, 9]. In [8], properties such as unichain policies and multimodularity of costs are assumed to be true so that monotone optimal policies can be proven. In [9], the transmission action is either 1 or 0, i.e., to transmit or not. In order to obtain the detailed structure of the solution to the constrained problem, we believe that the analysis of the Lagrangian relaxation problem and the analysis of the structure of the delay-power tradeoff curve should be combined.
In [11], we study the optimal delay-power tradeoff problem. In particular, we minimize the average delay given an average power constraint, considering Bernoulli arrivals and adaptive transmissions. Some technical details are given in [12], where we proved that the optimal tradeoff curve is convex and piecewise linear, and the optimal policies are threshold-based, by Constrained Markov Decision Process formulation and steady-state analysis. In this paper, we substantially generalize the Bernoulli arrival process to an arbitrary i.i.d. distribution. We show that the optimal policies for this generalized model are still threshold-based. Furthermore, we develop an efficient algorithm to find the optimal policy and the optimal delay-power tradeoff curve.
The remainder of this paper is organized as follows. The system model and the constrained problem are introduced in Section II. We show that the optimal policy is threshold-based in Section III by using steady-state analysis and Lagrangian relaxation. Based on theoretical results, we propose an efficient algorithm in Section IV to obtain the optimal tradeoff curve and the corresponding policies. In Section V, theoretical results and the proposed algorithm are verified by simulations. Section VI concludes the paper.
II System Model
The system model is shown in Fig. 1. We assume there are data packet(s) arriving at the end of the th timeslot. The number is i.i.d. for different values of and its distribution is given by , where , , and . Therefore the expected number of packets arrived in each timeslot is given by .
Let denote the number of data packets transmitted in timeslot . Assume that at most packets can be transmitted in each timeslot because of the constraints of the transmitter, and . Let denote the transmission power consumed in timeslot . Assume transmitting packet(s) will cost power , where , therefore . Transmitting packet will cost no power, hence . In typical communications, the power efficiency decreases as the transmission rate increases, hence we assume that is convex in . Detailed explanations can be found in the Introduction section in [12]. The convexity of the power consumption function will be utilized in Theorem 2 to prove that the optimal policy for the unconstrained problem is thresholdbased.
Backlog packets are stored in a buffer with size . Let denote the queue length at the beginning of timeslot . Since data arrive at the end of the timeslot, in order to avoid buffer overflow (i.e. ) and underflow (i.e. ), we should have . Therefore the dynamics of the buffer is given as
(1) 
In timeslot , we can decide how many packets to transmit based on the buffer state . It can be seen that this is a Markov Decision Process (MDP), where the queue length is the state of the MDP, and the number of packets transmitted in each timeslot is the action we take in each timeslot . The probability distribution of the next state is given by
(2) 
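As a rough illustration of the dynamics in (1) and the transition law in (2), the sketch below computes the next-state distribution for one state-action pair. The names `q`, `s`, `alpha`, and `Q` are assumptions made here (queue length, packets sent, arrival probabilities indexed by the number of arrivals, and buffer size); they are illustrative and not necessarily the paper's notation.

```python
def next_state_distribution(q, s, alpha, Q):
    """Distribution of the next queue length given state q and action s.

    Assumed model: the next queue length equals q - s plus the number of
    arrivals, and alpha[k] is the probability of k arrivals in a timeslot.
    """
    A = len(alpha) - 1      # largest possible number of arrivals
    remaining = q - s       # backlog left after transmission
    # Reject actions that would cause underflow or could cause overflow.
    if not 0 <= remaining <= Q - A:
        raise ValueError("action would cause buffer underflow or overflow")
    dist = [0.0] * (Q + 1)
    for k, p in enumerate(alpha):
        dist[remaining + k] = p
    return dist
```

For instance, with a buffer of size 5, three packets queued, and two transmitted, the one remaining packet is joined by 0, 1, or 2 arrivals, so the probability mass sits on queue lengths 1 to 3.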
We minimize the average queueing delay given an average power constraint, which makes it a Constrained Markov Decision Process (CMDP). For an infinite-horizon CMDP with stationary parameters, according to [13, Theorem 11.3], stationary policies are complete, which means stationary policies can achieve the optimal performance. Therefore we only need to consider stationary policies in this problem. Let denote the probability to transmit packet(s) when , i.e.,
(3) 
Then we have for . Since we guarantee that the transmission strategy will avoid overflow or underflow, we set
(4) 
Let denote a matrix whose element in the th row and the th column is . Therefore matrix can represent a stationary transmission policy. Let and denote the average power consumption and the average queueing delay under policy . Let denote the set of all feasible stationary policies that guarantee no queue overflow or underflow. Let denote the set of all stationary and deterministic policies which can guarantee no overflow or underflow. Thus to obtain the optimal tradeoff curve, we can minimize the average delay given an average power constraint shown as
(5a)  
s.t.  (5b) 
From another perspective, policy will determine a point in the delay-power plane. Define as the set of all feasible points in the delay-power plane. Intuitively, since the power consumption for each data packet increases if we want to transmit faster, there is a tradeoff between the average queueing delay and the average power consumption. Thus the optimal delay-power tradeoff curve can be presented as .
If we fix a stationary policy for a Markov Decision Process, the Markov Decision Process will degenerate to a Markov Reward Process (MRP). Let denote the transition probability from state to state . According to the system model, because of the constraints of transmission and arrival processes, the state transition probability can be derived as
(6) 
An example of the transition diagram is shown in Fig. 2, where for are omitted to keep the diagram legible.
The Markov chain could have more than one closed communication class under certain transmission policies. Under this circumstance, the limiting probability distribution and the average cost depend on the initial state and the sample paths. In Appendix A, it is proven that we only need to consider the cases where the Markov chain has only one closed communication class, which is called a unichain. Because of this key result, we focus only on the unichain cases in the following.
III Optimal Threshold-Based Policy for the Constrained Markov Decision Process
In this section, we will demonstrate that the optimal policy for the Constrained MDP problem is threshold-based. In other words, for an optimal policy, more data will be transmitted if the queue is longer. We define a stationary threshold-based policy rigorously as follows: there exist thresholds , such that only when (set for simplicity of notation). According to this definition, under policy , when the queue state is larger than threshold and smaller than , it transmits packet(s). When the queue state is equal to threshold , it transmits or packet(s). Note that under this definition, probabilistic policies can also be threshold-based.
In the following, we first conduct the steady-state analysis of the Markov process, based on which we show the properties of the feasible delay-power region and the optimal delay-power tradeoff. Then, by proving that the Lagrangian relaxation problem has a deterministic and threshold-based optimal policy, we finally show that the optimal policy for the constrained problem is threshold-based.
III-A Steady-State Analysis
Since we can focus on unichain cases, which contain a single recurrent class plus possibly some transient states, the steady-state probability distribution exists for the Markov process. Let denote the steady-state probability for state when applying policy . Set . Define as a matrix whose element in the th column and the th row is , which is determined by policy . Set as the identity matrix. Define , and . Set . Set and .
According to the definition of the steady-state distribution, we have and . For a unichain, the rank of is . Therefore, we have is invertible and
(7) 
For state , transmitting packet(s) will cost with probability . Define , which is a function of . The average power consumption can be expressed as
(8) 
Similarly, define . According to Little’s Law, the average delay under policy is
(9) 
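The steady-state computation behind (7) to (9) can be sketched numerically as follows. This is a minimal illustration under assumed notation, not the paper's implementation: `F[q][s]` is the probability of sending `s` packets in state `q`, `alpha[k]` is the probability of `k` arrivals, and `power[s]` is the power cost of sending `s` packets. Exact rational arithmetic is used so the results can be checked by hand, and the policy is assumed to induce a unichain (otherwise the linear system is singular).

```python
from fractions import Fraction

def transition_matrix(F, alpha):
    """Lam[i][j]: probability of moving from queue length i to j under F."""
    n = len(F)
    Lam = [[Fraction(0)] * n for _ in range(n)]
    for q, row in enumerate(F):
        for s, f in enumerate(row):
            for k, a in enumerate(alpha):
                if f and a:
                    j = q - s + k
                    if not 0 <= j < n:
                        raise ValueError("policy causes underflow or overflow")
                    Lam[q][j] += Fraction(f) * Fraction(a)
    return Lam

def steady_state(Lam):
    """Solve pi * Lam = pi with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(Lam)
    # Balance equations, with the last one replaced by the normalization.
    M = [[Lam[j][i] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    M[-1] = [Fraction(1)] * n
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    for col in range(n):
        # A nonzero pivot always exists when the chain is a unichain.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        b[col] /= scale
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
                b[r] -= factor * b[col]
    return b

def power_and_delay(F, alpha, power):
    """Average power and average delay (via Little's Law) under policy F."""
    pi = steady_state(transition_matrix(F, alpha))
    mean_arrivals = sum(k * Fraction(a) for k, a in enumerate(alpha))
    avg_power = sum(pi[q] * Fraction(f) * power[s]
                    for q, row in enumerate(F) for s, f in enumerate(row))
    avg_queue = sum(q * p for q, p in enumerate(pi))
    return avg_power, avg_queue / mean_arrivals
```

As a usage example, take a buffer of size 3, at most one arrival per slot with probability 2/5, and the policy that always sends as much as possible (up to 2 packets). The queue then only ever holds the latest arrival, so the average delay is one timeslot.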
The following theorem describes the structure of the feasible delay-power region and the optimal delay-power tradeoff curve.
Theorem 1.
The set of all feasible points in the delay-power plane, , and the optimal delay-power tradeoff curve , satisfy the following:
1) The set is a convex polygon.
2) The curve is piecewise linear, decreasing, and convex.
3) Vertices of and are all obtained by deterministic scheduling policies.
4) The policies corresponding to adjacent vertices of and take different actions in only one state.
Proof:
See Appendix B. ∎
III-B Optimal Deterministic Threshold-Based Policy for the Lagrangian Relaxation Problem
In (5), we formulate the optimization problem as a Constrained MDP, which is difficult to solve in general. Let denote the Lagrange multiplier. Consider the Lagrangian relaxation of (5)
(10) 
In (10), the term is constant. Therefore, the Lagrangian relaxation problem is minimizing the weighted average cost , which becomes an unconstrained infinite-horizon Markov Decision Process with an average cost criterion. It is proven in [14, Theorem 9.1.8] that there exists an optimal stationary deterministic policy. Moreover, the optimal policy for the relaxation problem has the following property.
Theorem 2.
An optimal policy for the unconstrained Markov Decision Process is threshold-based. That is to say, there exist thresholds , such that
(11) 
where .
Proof:
See Appendix C. ∎
III-C Optimal Threshold-Based Policy for the Constrained Problem
From another perspective, can be seen as the inner product of vector and . Since is piecewise linear, decreasing and convex, the corresponding minimizing the inner product will be obtained by the vertices of , as can be observed in Fig. 3. Since the conclusion in Theorem 2 holds for any , the vertices of the optimal tradeoff curve can all be obtained by optimal policies for the Lagrangian relaxation problem, which are deterministic and threshold-based. Moreover, from Theorem 1, the adjacent vertices of are obtained by policies which take different actions in only one state. Therefore, we have the following theorem.
Theorem 3.
Given an average power constraint, the scheduling policy to minimize the average delay takes the following form: there exist thresholds , one of which we name , such that
(12) 
where .
Proof:
Since the optimal tradeoff curve is piecewise linear, assume is on the line segment between vertices and . According to Theorem 2, the form of optimal policies and , which correspond to vertices of the optimal tradeoff curve, satisfies (11). Moreover, according to Theorem 1, the policies corresponding to adjacent vertices of take different actions in only one state. Define the thresholds for as , then the thresholds for can be expressed as , where the two policies take different actions only in state . Since , the policy that attains a point on the line segment between and is a convex combination of and , and therefore has the form shown in (12). ∎
We can see that the optimal policy for the Constrained Markov Decision Process may not be deterministic. At most two elements in the policy matrix , i.e. and , can be fractional, while the other elements are either 0 or 1. Policies in this form also satisfy our definition of a stationary threshold-based policy at the beginning of Section III.
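The policy structure described in Theorems 2 and 3 can be illustrated with a short sketch. The representation is an assumption made for this example: `thresholds[s]` is taken to be the largest queue length at which `s` packets are sent, the list is non-decreasing, and its last entry equals the buffer size `Q`. The helper names are hypothetical, and feasibility (no underflow or overflow) is not checked here.

```python
def threshold_policy(thresholds, Q):
    """Deterministic policy matrix with F[q][s] = 1 for the chosen action.

    State q sends the smallest s such that q <= thresholds[s].
    """
    S = len(thresholds) - 1
    F = [[0.0] * (S + 1) for _ in range(Q + 1)]
    for q in range(Q + 1):
        s = next(a for a, t in enumerate(thresholds) if q <= t)
        F[q][s] = 1.0
    return F

def randomize_at_threshold(F, q, s, p):
    """Copy of F in which state q sends s packets w.p. p and s + 1 otherwise."""
    G = [row[:] for row in F]
    G[q] = [0.0] * len(F[q])
    G[q][s] = p
    G[q][s + 1] = 1.0 - p
    return G
```

Randomizing in a single threshold state leaves every other row deterministic, so at most two entries of the policy matrix are fractional, which matches the form stated above.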
IV Algorithm to Efficiently Obtain the Optimal Tradeoff Curve
We design Algorithm 1 to efficiently obtain the optimal delay-power tradeoff curve and the corresponding optimal policies. Similar to [12], this algorithm takes advantage of the properties we have shown, i.e., the optimal delay-power tradeoff curve is piecewise linear, the vertices are obtained by deterministic threshold-based policies, and policies corresponding to two adjacent vertices take different actions in only one state. Therefore given the optimal policy for a certain vertex, we can narrow down the alternatives of optimal policies for its adjacent vertex. The policies corresponding to points between two adjacent vertices can also be easily generated.
Construct where 
and for 
Draw the line segment connecting and 
Our proposed iterative algorithm starts from the bottom-right vertex of the optimal tradeoff curve, whose corresponding policy is known to transmit as much as possible. Then for each vertex we have determined, we enumerate the candidates for the next vertex. According to the properties we have obtained, we only need to search for deterministic threshold-based policies which take different actions in only one threshold. By comparing all the candidates, the next vertex will be determined by the policy candidate whose connecting line with the current vertex has the minimum absolute slope and the minimum length. Note that a vertex can be obtained by more than one policy, therefore we use lists and to store all policies corresponding to the previous and the current vertices.
The complexity of this algorithm is much lower than that of general methods. Since one of the thresholds of the optimal policy is decreased by 1 during each iteration, the maximum number of iterations is . Within each iteration, we have thresholds to try. For each candidate, the most time-consuming operation, i.e., the matrix inversion, costs . Therefore the complexity of the algorithm is .
In comparison, we also formulate a Linear Program (LP) to obtain the optimal tradeoff curve. As demonstrated in [13, Chapter 11.5], all CMDP problems with infinite horizon and average cost can be formulated as Linear Programs. In our case, by taking as variables, we can formulate an LP with variables to minimize the average delay given a certain power constraint. Due to space limitations, we provide the LP without explanations.
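The vertex selection rule described above can be sketched in isolation. This is only an illustration of the comparison step, not a reproduction of Algorithm 1: points are assumed to be `(power, delay)` pairs, candidates are assumed to have already been evaluated, and only candidates with strictly lower power than the current vertex are considered (an assumption made here so that the slope is well defined). The function name is hypothetical.

```python
def next_vertex(current, candidates):
    """Pick the candidate with minimum absolute slope, then minimum length.

    current and candidates are (power, delay) pairs. Returns None when no
    candidate lies strictly to the left of the current vertex.
    """
    cur_p, cur_d = current
    best_key, best = None, None
    for p, d in candidates:
        dp = cur_p - p
        if dp <= 0:
            continue  # not a move toward lower power
        slope = abs(d - cur_d) / dp
        length_sq = dp ** 2 + (d - cur_d) ** 2   # squared segment length
        key = (slope, length_sq)
        if best_key is None or key < best_key:
            best_key, best = key, (p, d)
    return best
```

With exact inputs (integers or `fractions.Fraction`), ties in slope are resolved reliably by the length term; with floating-point inputs, nearly collinear candidates may need a tolerance.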
(13a)  
s.t.  (13b)  
(13c)  
(13d)  
(13e)  
(13f) 
By solving the LP, we can obtain a point on the optimal tradeoff curve. If we apply the ellipsoid algorithm to solve the LP problem, the computational complexity is . This means that the computation required to obtain a single point on the optimal tradeoff curve by LP is larger than that required to obtain the entire curve with our proposed algorithm. This demonstrates the inherent advantage of using the revealed properties of the optimal tradeoff curve and the optimal policies.
V Numerical Results
In this section, we validate our theoretical results and the proposed algorithm by conducting LP numerical computation and simulations. We consider a practical scenario with adaptive M-PSK transmissions. The optional modulations are BPSK, QPSK, and 8PSK. Assume the bandwidth = 1 MHz, the length of a timeslot = 10 ms, and the target bit error rate ber=. Assume a data packet contains 10,000 bits, and in each timeslot the number of arriving packets could be 0, 1, 2 or 3. Then by adaptively applying BPSK, QPSK, or 8PSK, we can respectively transmit 1, 2, or 3 packets in a timeslot, which means . Assume the one-sided noise power spectral density = -150 dBm/Hz. The transmission power for different transmission rates can be calculated as J, J, J, and J. Set the buffer size as .
The optimal delay-power tradeoff curves are shown in Fig. 4 and Fig. 5. In each figure, we vary the arrival process to get different tradeoff curves. As can be observed, the tradeoff curves generated by Algorithm 1 perfectly match the Linear Programming and simulation results. As proven in Theorem 1, the optimal tradeoff curves are piecewise linear, decreasing, and convex. The vertices of the curves obtained by Algorithm 1 are marked by squares. The corresponding optimal policies can be verified to be threshold-based. The minimum average delay is 1 for all curves, because when we transmit as much as we can, all data packets will stay in the queue for exactly one timeslot. In Fig. 4, with the average arrival rate increasing, the curve gets higher because of the heavier workload. In Fig. 5, the three arrival processes have the same average arrival rate and different variance. When the variance gets larger, it is more likely that the queue grows long within a short time, which leads to higher delay. It is interesting to characterize the effect of the variance in the arrival process, which we leave as future work.
VI Conclusion
In this paper, we extend our previous work to obtain the optimal delay-power tradeoff and the corresponding optimal scheduling policy considering arbitrary i.i.d. arrival and adaptive transmissions. The scheduler optimizes the transmission in each timeslot according to the buffer state. We formulate this problem as a CMDP, and minimize the average delay to obtain the optimal tradeoff curve. By studying the steady-state properties and the Lagrangian relaxation of the CMDP problem, we prove that the optimal delay-power tradeoff curve is convex and piecewise linear, on which the adjacent vertices are obtained by policies taking different actions in only one state. Based on this, the optimal policies are proven to be threshold-based. We also design an efficient algorithm to obtain the optimal tradeoff curve and the optimal policies. Linear Programming and simulations are conducted to confirm the theoretical results and the proposed algorithm.
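The packet counts per timeslot quoted above follow from simple arithmetic, sketched below. The sketch assumes a symbol rate equal to the bandwidth (one symbol per second per hertz), which is an idealization introduced here rather than something stated in the text; the function and parameter names are illustrative.

```python
def packets_per_slot(bits_per_symbol, bandwidth_hz=1_000_000,
                     slot_ms=10, packet_bits=10_000):
    """Whole packets that fit in one timeslot for a given modulation order.

    Assumes the symbol rate equals the bandwidth in hertz.
    """
    symbols_per_slot = bandwidth_hz * slot_ms // 1000
    bits_per_slot = symbols_per_slot * bits_per_symbol
    return bits_per_slot // packet_bits

# BPSK, QPSK, and 8PSK carry 1, 2, and 3 bits per symbol respectively.
rates = {name: packets_per_slot(bits)
         for name, bits in [("BPSK", 1), ("QPSK", 2), ("8PSK", 3)]}
```

Under these assumptions each timeslot carries 10,000 symbols, so the three modulations deliver 1, 2, and 3 packets per timeslot.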
Appendix A Proof of the Equivalency to Reduce to Unichain cases
We claim that we can focus only on the unichain cases, because for any Markov process with multiple recurrent classes determined by a certain policy, we can design a policy which leads to a unichain Markov process having the same performance as any one of the recurrent classes. We state this formally as a proposition below and give the detailed proof.
Proposition 1.
In the Markov Decision Process with arbitrary arrival and adaptive transmission, if there is more than one closed communication class in the Markov chain generated by policy , which we define as , , where , then for any , there exists a policy , under which the Markov chain has as its only closed communication class. Furthermore, the steady-state distribution and the average cost of the Markov chain under starting from state are the same as the steady-state distribution and the average cost of the Markov chain under .
Proof:
Define the set of those transient states that have access to as . Define the set of transient states which do not have access to as . Therefore is a partition of the states of the MDP. There must exist at least one state which is next to a state . We can always change the action in state such that state can access the set . After the modification, state will be a transient state which has access to . The states which communicate with will also be transient states which have access to .
We update the partition of states since the policy is changed. According to the above description, the set does not change, while the cardinality of strictly increases. Hence, by repeating the above operation a finite number of times, every state of the MDP will be placed in either or . The Markov chain generated by the modified policy has as its only closed communication class, and the modified policy is the we seek.
Since the actions of states in are the same for policy and , the steady-state distribution and the average cost corresponding to policy starting from state are the same as those under policy . ∎
Appendix B Proof of Theorem 1
In order to prove Theorem 1, we will first prove a lemma showing that the mapping from to has a partially linear property in the first subsection. In the second subsection, we will prove that the set is a convex polygon, whose vertices are all obtained by deterministic scheduling policies, and the policies corresponding to adjacent vertices of take different actions in only one state. In the third subsection, we will prove that the set is piecewise linear, decreasing, and convex, whose vertices are obtained by deterministic scheduling policies, and the policies corresponding to adjacent vertices of take different actions in only one state.
In correspondence with Theorem 1, conclusion 1) in the theorem is proven in Subsection B, conclusion 2) is proven in Subsection C, and conclusions 3) and 4) are proven by combining the results in Subsections B and C.
B-A Partially Linear Property of Scheduling Policies
Lemma 1.
and are two policies different only when , i.e., these two matrices are different only in the th row. Denote where . Then
1) There exists a certain so that and . Furthermore, parameter is a continuous nondecreasing function of .
2) When changes from 0 to 1, point moves on the line segment from to .
Proof:
In the following, the two conclusions of the lemma will be proven one by one.
1) According to the definition of and , we have that if , then and . Set and . Since and are different only in the th row, it can be derived that the th column of is the only column that can contain nonzero elements, and the th element of is its only nonzero element. Therefore can be expressed as , where is its th column, and can be expressed as , where is its th element. Based on this, we set
(14) 
Hence
(15) 
By mathematical induction, we can prove that for ,
(16)  
(17) 
and
(18) 
Therefore,
(19)  
(20) 
We have and . Hence
(21)  
(22)  
(23)  
(24)  
(25)  
(26) 
and
(27)  
(28)  
(29)  
(30) 
Hence , so that and . Furthermore, it can be seen that is a continuous nondecreasing function.
2) From the first part, we proved and is a continuous nondecreasing function of . When , we have . When , we have . Therefore when changes from 0 to 1, the point moves on the line segment from to . The slope of the line can be expressed as
(31)  
(32) 
∎
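Lemma 1 can be checked numerically on a small instance. The sketch below is an illustration under assumed notation (`F[q][s]` for the policy matrix, `alpha[k]` for the arrival probabilities, `power[s]` for the power cost); it mixes two policies that differ in a single state and evaluates the resulting point in the power-delay plane using exact rational arithmetic. The specific instance (buffer size 3, at most one arrival, power costs 0, 1, 3) is chosen for this example only.

```python
from fractions import Fraction

def evaluate(F, alpha, power):
    """Return (average power, average delay) of a unichain policy F."""
    n = len(F)
    Lam = [[Fraction(0)] * n for _ in range(n)]
    for q, row in enumerate(F):
        for s, f in enumerate(row):
            for k, a in enumerate(alpha):
                if f and a:
                    Lam[q][q - s + k] += Fraction(f) * Fraction(a)
    # Balance equations with the last one replaced by sum(pi) = 1.
    M = [[Lam[j][i] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    M[-1] = [Fraction(1)] * n
    pi = [Fraction(0)] * (n - 1) + [Fraction(1)]
    for col in range(n):                      # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pi[col], pi[piv] = pi[piv], pi[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        pi[col] /= scale
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
                pi[r] -= factor * pi[col]
    mean_arrivals = sum(k * Fraction(a) for k, a in enumerate(alpha))
    avg_power = sum(pi[q] * Fraction(f) * power[s]
                    for q, row in enumerate(F) for s, f in enumerate(row))
    avg_queue = sum(q * p for q, p in enumerate(pi))
    return avg_power, avg_queue / mean_arrivals

def mix(F0, F1, eps):
    """Convex combination (1 - eps) * F0 + eps * F1 of two policy matrices."""
    return [[(1 - eps) * x + eps * y for x, y in zip(r0, r1)]
            for r0, r1 in zip(F0, F1)]

alpha = [Fraction(1, 2), Fraction(1, 2)]
power = [0, 1, 3]
# The two policies differ only in state 2 (send 1 packet versus 2 packets).
F0 = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
F1 = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 1]]
a = evaluate(F0, alpha, power)
b = evaluate(F1, alpha, power)
m = evaluate(mix(F0, F1, Fraction(1, 2)), alpha, power)
# Zero cross product means the mixed point lies on the line through a and b.
cross = (b[0] - a[0]) * (m[1] - a[1]) - (b[1] - a[1]) * (m[0] - a[0])
```

In this instance the mixing weight 1/2 lands two thirds of the way along the segment, which illustrates that the position on the segment is a monotone but generally nonlinear function of the mixing weight.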
B-B Properties of set
In this subsection, we will prove that , the set of all feasible points in the delay-power plane, is a convex polygon whose vertices are all obtained by deterministic scheduling policies. Moreover, the policies corresponding to adjacent vertices of take different actions in only one state.
Define as the convex hull of points corresponding to deterministic scheduling policies in the delay-power plane. Hence we will show that is a convex polygon whose vertices are all obtained by deterministic scheduling policies by proving .
The proof is made up of three parts. In Part I, we will prove by the construction method. Part II is the most difficult part. We will first define the concepts of basic polygons and compound polygons, then prove their convexity, based on which can be proven. By combining the results from Part I and II, we will have . Finally, in Part III, it will be shown that policies corresponding to adjacent vertices of are different in only one state.
Part I. Prove
For any probabilistic policy where , we construct
(33) 
and
(34) 
Since , and the fact that whenever , it must hold that