# Joint Scheduling and Resource Allocation in OFDMA Downlink Systems via ACK/NAK Feedback

###### Abstract

In this paper, we consider the problem of joint scheduling and resource allocation in the OFDMA downlink, with the goal of maximizing an expected long-term goodput-based utility subject to an instantaneous sum-power constraint, and where the feedback to the base station consists only of ACK/NAKs from recently scheduled users. We first establish that the optimal solution is a partially observable Markov decision process (POMDP), which is impractical to implement. In response, we propose a greedy approach to joint scheduling and resource allocation that maintains a posterior channel distribution for every user, and has only polynomial complexity. For frequency-selective channels with Markov time-variation, we then outline a recursive method to update the channel posteriors, based on the ACK/NAK feedback, that is made computationally efficient through the use of particle filtering. To gauge the performance of our greedy approach relative to that of the optimal POMDP, we derive a POMDP performance upper-bound. Numerical experiments show that, for slowly fading channels, the performance of our greedy scheme is relatively close to the upper bound, and much better than fixed-power random user scheduling (FP-RUS), despite its relatively low complexity.

Keywords: OFDMA downlink, scheduling and resource allocation, ACK/NAK feedback, particle filters.

## I Introduction

In the downlink of a wireless orthogonal frequency division multiple access (OFDMA) system, the base station (BS) must deliver data to a set of users whose channels may vary in both time and frequency. Since bandwidth and power resources are limited, data delivery must be carried out efficiently, e.g., by pairing users with strong subchannels and by distributing power across users in the most effective manner. Often, the BS must also adhere to per-user quality-of-service (QoS) constraints. Overall, the BS faces the challenging problem of jointly scheduling users across subchannels, optimizing their modulation-and-coding schemes, and allocating a limited power resource to maximize some function of per-user throughputs.

The OFDMA scheduling-and-resource-allocation problem has been addressed in a number of studies that assume the availability of perfect channel state information (CSI) at the BS (e.g., [1, 2, 3, 4, 5, 6, 7]). In practice, however, it is difficult for the BS to maintain perfect CSI (for all users and all subchannels), since CSI is most easily obtained at the user terminals, and the bandwidth available for feedback of CSI to the BS is scarce. Hence, practical resource allocation schemes use some form of limited feedback [8], such as quantized channel gains.

In this work, we consider the exclusive use of ACK/NAK feedback, as provided by the automatic repeat request (ARQ) [9] mechanism present in most wireless downlinks. We assume standard ARQ, where every scheduled user provides the BS with either an acknowledgment (ACK), if the most recent data packet has been correctly decoded, or a negative acknowledgment (NAK), if not. (The approach we develop in this paper could be easily extended to other forms of link-layer feedback, e.g., Type-I and Type-II Hybrid ARQ; for simplicity and ease of exposition, however, we consider only standard ARQ.) Although ACK/NAKs do not provide direct information about the state of the channel, they do provide relative information about channel quality that can be used for the purpose of transmitter adaptation (e.g., [10, 11]). For example, if a NAK was received for a particular packet, then it is likely that the subchannel’s signal-to-noise ratio (SNR) was below that required to support the transmission rate used for that packet. We consider the exclusive use of ACK/NAK feedback provided by the link layer because this allows us to completely avoid any additional feedback, such as feedback about quantized channel gains.

There are interesting implications to the use of (quantized) error-rate feedback (like ACK/NAK) for transmitter adaptation, as opposed to quantized channel-state feedback. With error-rate feedback, the transmission parameters applied at a given time-slot affect not only the throughput for that slot, but also the corresponding feedback, which will impact the quality of future transmitter-CSI, and thus future throughput. For example, if the transmission parameters are chosen to maximize only the instantaneous throughput, e.g., by scheduling those users that the BS believes are currently best, then little will be learned about the changing states of other user channels, implying that future scheduling decisions will be compromised. On the other hand, if the BS schedules not-recently-scheduled users solely for the purpose of probing their channels, then instantaneous throughput will be compromised. Thus, when using error-rate feedback, the BS must navigate the classic tradeoff between exploitation and exploration [12].

In this work, we propose a scheme whereby the BS uses ACK/NAK feedback to maintain a posterior channel distribution for every user and, from these distributions, performs simultaneous user subchannel-scheduling, power-allocation, and rate-selection. In doing so, the BS aims to maximize an expected, long-term, generic utility criterion that is a function of the per-user/channel/rate goodputs. Our use of a generic utility-based criterion allows us to handle, e.g., sum-capacity maximization, throughput maximization under practical modulation-and-coding schemes, and throughput-based pricing (e.g., [13, 14, 15]), as discussed in the sequel. To this end, we exploit our recent work [16], which offers an efficient near-optimal scheme for utility-based OFDMA resource allocation under distributional CSI. Our use of ACK/NAK-feedback, however, makes our problem considerably more complicated than the one considered in [16]. For example, as we show in the sequel, the optimal solution to our expected long-term utility-maximization problem is a partially observable Markov decision process (POMDP) that would involve the solution of many mixed-integer optimization problems during each time-slot. Due to the impracticality of the POMDP solution, we instead consider (suboptimal) greedy utility-maximization schemes. As justification for this approach, we first establish that the optimal utility maximization strategy would itself be greedy if the BS had perfect CSI for all user-subchannel combinations. Moreover, we establish that the performance of this perfect-CSI (greedy) scheme upper-bounds the optimal ACK/NAK-feedback-based (POMDP) scheme. We then propose a novel, greedy utility-maximization scheme whose performance is shown (via the upper bound) to be close to optimal. Finally, due to the computational demands of tracking the posterior channel distribution for every user, we propose a low-complexity implementation based on particle filtering.

We now describe the relation of our work to the existing literature [17, 18, 19]. In [17], a learning-automata-based user/rate scheduling algorithm was proposed to maximize system throughput based on ACK/NAK feedback while satisfying per-user throughput constraints. While [17] considered a single channel, we consider joint user/rate scheduling and power allocation in a multi-channel OFDMA setting. In [18], a state-space-based approach was taken to jointly schedule users/rates and allocate powers in downlink OFDMA systems under slow-fading channels in the presence of ACK/NAK feedback and imperfect subchannel-gain estimates at the BS. In particular, assuming a discrete channel model, goodput maximization was considered under a target maximum packet-error probability constraint and a sum-power constraint across all time-slots. Its solution led to a POMDP which was solved using a dynamic program. While the approach in [18] is applicable only to goodput maximization under discrete-state channels, ours is applicable to generic utility maximization problems under continuous-state channels. Furthermore, our approach is based on particle filtering and lends itself to practical implementation. In [19], the user/rate scheduling and power allocation problem in OFDMA systems with quasi-static channels and ACK/NAK feedback was formulated as a Markov decision process and an efficient algorithm was proposed to maximize achievable sum-rate while maintaining a target packet-error-rate and a sum-power constraint over a finite time-horizon. Apart from assuming a discrete-state quasi-static channel model, the scope of this work was limited by two other assumptions: i) in each time-slot, the BS scheduled only one user across all subchannels for data transmission, and ii) all users decoded the broadcast data packet and sent ACK/NAK feedback to the BS.
In contrast, we consider the scenario where multi-user diversity is efficiently exploited by scheduling different users across different subchannels, and only the scheduled users report ACK/NAK feedback. Furthermore, we consider general utility maximization under continuous-state time-varying channels, and propose a polynomial-complexity joint scheduling and resource allocation scheme with provable performance guarantees.

The rest of the paper is organized as follows. In Section II, we outline the system model and, in Section III, we investigate the optimal scheduling and resource allocation scheme. Due to the implementation complexity of the optimal scheme, we propose a suboptimal greedy scheme in Section IV that maintains posterior channel distributions inferred from the received ACK/NAK feedback. In Section V, we show how these posteriors can be recursively updated via particle filtering. Numerical results are presented in Section VI, and conclusions are stated in Section VII.

## II System Model

We consider a packetized downlink OFDMA system with a pool of users. During each time slot, the BS (i.e., “controller”) transmits packets of data, composed of codewords from a generic signaling scheme, through OFDMA subchannels (with ). Each packet propagates through a fading channel on the way to its intended mobile user, where the fading channel is assumed to be time-invariant over the packet duration, but is allowed to vary across packets in a Markovian manner. Henceforth, we will use “time” when referring to the packet index. At each time-instant, the BS must decide—for each subchannel—which user to schedule, which modulation-and-coding scheme (MCS) to use, and how much power to allocate.

We assume choices of MCS, where the MCS index corresponds to a transmission rate of bits per packet and a packet error rate of the form under transmit power and squared subchannel gain (SSG) , where and are constants [20]. Let represent the combination of user and MCS over subchannel . In the sequel, we use , , and to denote—respectively—the power allocated to, the SSG experienced by, and the error rate of the combination at time . Additionally, we denote the scheduling decision by , where indicates that user/rate was scheduled on subchannel at time , whereas indicates otherwise. Since we assume that only one user/rate can be scheduled on a given subchannel at a given time , we have the “subchannel resource” constraint for all . We also assume a “sum-power constraint” of the form for all .
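As a rough illustration of the quantities just introduced, the following sketch computes a packet error rate and the resulting goodput for one user/MCS/subchannel combination, and checks the two resource constraints. The exponential error-rate form `a * exp(-b * power * ssg)`, the function names, and the nested-list layout of the schedule are all assumptions made for this example; the exact expressions are not recoverable from the text above.

```python
import math


def packet_error_rate(a, b, power, ssg):
    """Assumed exponential model a * exp(-b * power * ssg), clipped to at most 1."""
    return min(1.0, a * math.exp(-b * power * ssg))


def goodput(rate_bits, a, b, power, ssg):
    """Expected number of correctly delivered bits per packet."""
    return (1.0 - packet_error_rate(a, b, power, ssg)) * rate_bits


def is_feasible(schedule, powers, power_budget):
    """Check the subchannel-resource and sum-power constraints.

    schedule[n] is a hypothetical list of 0/1 flags, one per (user, MCS)
    pair on subchannel n; powers[n] holds the matching power values.
    """
    # At most one (user, MCS) pair may be active on each subchannel.
    one_per_subchannel = all(sum(flags) <= 1 for flags in schedule)
    # Only scheduled pairs consume power.
    used = sum(
        flag * p
        for flags, row in zip(schedule, powers)
        for flag, p in zip(flags, row)
    )
    return one_per_subchannel and used <= power_budget + 1e-12
```

For instance, `goodput(100, 0.5, 2.0, 50.0, 1.0)` is essentially 100 bits, since the error rate is negligible at that power.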

Our goal in scheduling and resource allocation is to maximize an expected long-term utility criterion that is a function of the per-user/rate/subchannel goodputs, i.e., . Here, denotes the goodput contributed by user with MCS on subchannel at time , which can be expanded as . Meanwhile, is a generic utility function that we assume (for technical reasons) is twice differentiable, strictly-increasing, and concave, with . We use to transform goodput into other metrics that are more meaningful from the perspective of quality-of-service (QoS), fairness [21], or pricing (e.g., [13, 14, 15]). For example, to maximize sum-goodput, one would simply use . To enforce fairness across users, one could instead maximize weighted sum-goodput via , where are appropriately chosen user-dependent weights. To maximize sum capacity, i.e., , one would choose and for . To incorporate user-fairness into capacity maximization, one could instead choose , where again are appropriately chosen user-dependent weights [20].

For each time , the BS performs scheduling and resource allocation based on posterior distributions on the SSGs inferred from previously received ACK/NAK feedback. In the sequel, we write the ACK/NAK feedback about the packet transmitted to user across subchannel at time by , where indicates an ACK, indicates a NAK, and covers the case that user was not scheduled on subchannel at time . Thus, in the case of an infinite past horizon and a feedback delay of packets, the BS would have access to the feedbacks for time- scheduling.

## III Optimal Scheduling and Resource Allocation

In this section, we describe the optimal solution to the problem of scheduling and resource allocation over the finite time-horizon . For this purpose, some additional notation will be useful. To denote the collection of all time- scheduling variables , we use . To denote the collection of all time- powers , we use . To denote the collection of all time- ACK/NAK feedbacks we use , and to denote the collection of all time- user- feedbacks we use .

For time- scheduling and resource allocation, the controller has access to the previous feedback , scheduling decisions , and power allocations . It then uses this knowledge to determine the schedule and power allocation maximizing the expected utility of the current and remaining packets:

(1) | |||||

where the domain of is , the domain of is , and . The expectation in (1) is jointly over the squared subchannel gains (SSGs) . Using the abbreviations and , the optimal expected utility over the remaining packets can be written (for ) as

(2) |

For a unit-delay system, the following Bellman equation [22] specifies the corresponding finite-horizon dynamic program (for larger feedback delays, the Bellman equation is more complicated, and so we omit it for brevity):

(3) | |||||

where the second expectation is over the feedbacks . The solution obtained by solving (1) is typically referred to as a partially observable Markov decision process (POMDP) [12].

The definition of implies that the controller has an uncountably infinite number of possible actions. Although this could be circumvented (at the expense of performance) by restricting the powers to come from a finite set, the problem would remain very complex due to the continuous-state nature of the SSGs . While these SSGs could then be quantized (causing additional performance loss), the problem would still remain computationally intensive, since POMDPs (even with finite states and actions) are PSPACE-complete, i.e., they require both complexity and memory that grow exponentially with the horizon [23]. To see why, notice from (3) that the solution of the problem at every time depends on the optimal solution at times up to . Because both terms on the right side of (3) are dependent on , however, the solution of the problem at time also depends on the solution of the problem at time , which in turn depends on the solution of the problem at time , and so on. In conclusion, the optimal controller is not practical to implement, even under power/SSG quantization.

Consequently, we will turn our attention to (sub-optimal) greedy strategies, i.e., those that do not consider the effect of current actions on future utilities. To better understand their performance relative to that of the optimal POMDP, we derive an upper bound on POMDP performance.

### III-A The “Causal Global Genie” Upper Bound

Our POMDP-performance upper-bound, which we will refer to as the “causal global genie” (CGG), is based on the presumption of perfect error-rate feedback of all previous user/subchannel combinations, i.e., . For comparison, the ACK/NAK feedback available to the POMDP is a form of degraded error-rate feedback on previously scheduled user/subchannel combinations. Since, given knowledge of and for any rate index , the SSG can be obtained by simply inverting the error-rate expression , our genie-aided bound is based, equivalently, on perfect feedback of all previous SSGs . In the sequel, we use to denote the collection of all time- SSGs , and we define .
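The inversion step mentioned above can be sketched in a few lines. The sketch assumes an exponential error-rate model, `eps = a * exp(-b * power * ssg)`, which is an assumption of this example since the exact expression is not shown in the text; the function name is likewise hypothetical.

```python
import math


def ssg_from_error_rate(error_rate, a, b, power):
    """Recover the SSG from a perfectly known error rate.

    Assumes eps = a * exp(-b * power * ssg), so that
    ssg = -log(eps / a) / (b * power).
    """
    if power <= 0.0:
        raise ValueError("inversion needs a strictly positive transmit power")
    if not 0.0 < error_rate <= a:
        raise ValueError("error rate must lie in (0, a] under the assumed model")
    return -math.log(error_rate / a) / (b * power)
```

Under this model the inversion is only defined for a strictly positive power, which is why the sketch rejects `power <= 0`.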

We characterize the CGG as “global” since it uses feedback from all user/subchannel combinations, not just the previously scheduled ones. Although a tighter bound might result if the (perfect) error-rate feedback were restricted to only previously scheduled user/subchannel pairs, the bounding solution would remain a POMDP with an uncountable number of state-action pairs, making it impractical to evaluate. Evaluating the performance of the CGG, however, is straightforward since, under CGG feedback, optimal scheduling and resource allocation can be performed greedily. To see why, notice that, for any scheduling time , the CGG scheme allocates resources according to the following mixed-integer optimization problem:

(4) | |||||

Since the choice of does not depend on the choice of , the previous optimization problem simplifies to

(5) |

In the following lemma, we formally establish that the utility achieved by the CGG upper-bounds that achieved by the optimal POMDP controller with ACK/NAK feedback.

###### Lemma 1

Given arbitrary past allocations , and the corresponding ACK/NAKs , the expected total utility for optimal resource allocation under the latter feedback is no higher than the expected total utility under CGG feedback, i.e.,

(6) |

The proof of the above lemma follows the same steps as the proof of [11, Lemma ], which is omitted here to save space. In the next section, we detail the greedy scheduling and resource allocation problem and propose a near-optimal solution.

## IV Greedy Scheduling and Resource Allocation

The greedy scheduling and resource allocation (GSRA) problem is defined as follows.

GSRA | (7) | ||||

Note that, in contrast to the -horizon objective (1), the greedy objective (7) does not consider the effect of on future utility. As stated earlier, we allow to be any real-valued function that is twice differentiable, strictly-increasing, and concave, with . Therefore, and , using to denote the derivative.

Since it involves both discrete and continuous optimization variables, the GSRA problem (7) is a mixed-integer optimization problem. Such problems are generally NP-hard, meaning that no polynomial-complexity solutions are known (and none exist unless P = NP). Thus, in Section IV-B, we propose a near-optimal algorithm for (7) with polynomial complexity. To better explain that scheme, we first describe, in Section IV-A, a “brute force” optimal solution whose complexity grows exponentially in , the number of subchannels.

### IV-A Brute-Force Algorithm

The brute-force approach considers all possibilities of , each with the corresponding optimal power allocation. Supposing that , the optimal power allocation can be found by solving the convex optimization problem

(8) |

To proceed, we identify the Lagrangian associated with (8) as

(9) | |||||

which yields the corresponding dual problem

(10) |

where and denote the optimal Lagrange multiplier and power allocation, respectively.

A detailed solution to (10) is given in [16], and so we describe only the main points here. First, for a given value of the Lagrange multiplier , it has been shown that the optimal powers equal

(11) |

where is defined as the (unique) solution to

(12) |

Then, for a given , the optimal value of (i.e., ) obeys , where

(13) | |||||

(14) |

and satisfies .

Based on (11)-(14), Table I details the brute-force steps for a given . In the end, for a specified tolerance , these steps find and such that and . Using an approximation of that lies in , the corresponding utility is guaranteed to be no less than from the optimal (for the given ). Therefore, by adjusting , one can achieve a performance arbitrarily close to the optimum. Since values of must be considered, the total complexity of the brute-force approach—in terms of the number of times (12) must be solved—can be shown to be

(15) |

which grows exponentially with .
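The inner step of the brute-force search, namely allocating power for one fixed schedule by searching over the Lagrange multiplier, can be sketched as follows. This is not a transcription of Table I. It is a minimal illustration under several assumptions: the error rate follows an exponential model `a * exp(-b * P * ssg)`, the utility is the identity (sum-goodput), and each posterior expectation is replaced by an average over SSG samples. All function names and the `links` layout are hypothetical.

```python
import math


def marginal_gain(power, a, b, rate, ssg_samples):
    """Derivative w.r.t. power of the sample-averaged goodput."""
    total = sum(g * math.exp(-b * power * g) for g in ssg_samples)
    return a * b * rate * total / len(ssg_samples)


def power_for_multiplier(mu, a, b, rate, ssg_samples, p_max, tol=1e-9):
    """Power at which the marginal gain equals mu (0 if never worthwhile)."""
    if marginal_gain(0.0, a, b, rate, ssg_samples) <= mu:
        return 0.0
    if marginal_gain(p_max, a, b, rate, ssg_samples) >= mu:
        return p_max
    lo, hi = 0.0, p_max
    while hi - lo > tol:  # the marginal gain is decreasing in power
        mid = 0.5 * (lo + hi)
        if marginal_gain(mid, a, b, rate, ssg_samples) > mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def allocate_power(links, power_budget, tol=1e-6):
    """links: one (a, b, rate, ssg_samples) tuple per scheduled subchannel."""
    def total_power(mu):
        return sum(
            power_for_multiplier(mu, *link, p_max=power_budget) for link in links
        )

    # At mu_hi every power is zero; as mu shrinks, the total power grows.
    mu_lo, mu_hi = 0.0, max(marginal_gain(0.0, *link) for link in links)
    while mu_hi - mu_lo > tol:
        mu = 0.5 * (mu_lo + mu_hi)
        if total_power(mu) > power_budget:
            mu_lo = mu
        else:
            mu_hi = mu
    # mu_hi is always feasible, so its allocation is returned.
    powers = [power_for_multiplier(mu_hi, *link, p_max=power_budget) for link in links]
    return powers, mu_hi
```

The brute-force approach would repeat this allocation for every admissible schedule and keep the best, which is where the exponential growth comes from.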

### IV-B Proposed Algorithm

We propose to attack the mixed-integer GSRA problem (7) using the well-known Lagrangian relaxation approach [22]. In doing so, we relax the domain of the scheduling variables from the set to the interval , allowing the application of low-complexity dual optimization techniques. Although the solution to the relaxed problem does not necessarily coincide with that of the original greedy problem (7), we establish in the sequel that the corresponding performance loss is very small, and in some cases zero.

The relaxed version of the greedy problem (7) is

rGSRA | (16) | ||||

where . Although (16) is a non-convex optimization problem due to non-convex constraints, it can be converted into a convex optimization problem by using the new set of variables , where . In this case, we have

(17) |

where denotes the collection of all time- variables , denotes element-wise non-negativity, and is defined as

(18) |

The modified problem (17) is a convex optimization problem and can be solved using a dual optimization approach with zero duality gap. In particular, the dual problem can be written as

(19) | |||||

where

(20) |

where is the optimal for a given , where denotes the optimal for a given , and where denotes the optimal .

A detailed solution to this problem was given in [16], and so we describe only the main points here. For given values of and , we have , where

(21) |

and where is defined as the (unique) solution to

(22) |

To give equations that govern for a given , we first define

(23) | |||

(24) |

If is a null or a singleton set, then the optimal schedule on subchannel is given by

(25) |

However, if has cardinality greater than one, then multiple combinations can be scheduled simultaneously while achieving the optimal value of the Lagrangian. In particular, if , then

(26) |

where the vector lies anywhere in the unit- simplex, i.e., it lives within the region and satisfies . Finally, the optimal Lagrange multiplier (i.e., ) is such that and

(27) |

For several fixed values of , the proposed algorithm minimizes the relaxed Lagrangian (20) over (or, equivalently, over ) to obtain candidate solutions for the original greedy problem (7). If, for a given , for all (i.e., the candidate employs at most one user/MCS per subchannel), then the candidate solution is admissible for the non-relaxed problem, and thus retained by the proposed algorithm. If, on the other hand, for some (i.e., the candidate employs more than one user/MCS on some subchannels), then the proposed algorithm transforms the candidate into an admissible solution as follows:

(28) |

The following lemma then states an important property of these fixed- admissible solutions.

###### Lemma 2

Lemma 2 (see [16] for a proof) implies that the optimal value of the Lagrange multiplier (i.e., ) is the one that achieves the power constraint . To find this , the proposed algorithm performs a bisection search over that refines the search interval until , where is a user-defined tolerance. Then, between the two schedules , it chooses the one that maximizes utility, reminiscent of the brute-force algorithm. Table II summarizes the proposed algorithm.
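The overall structure of this search (a per-subchannel choice of the best user/MCS for a fixed multiplier, wrapped in a bisection over the multiplier) can be sketched as below. This is a simplified illustration, not Table II. It assumes an exponential error-rate model, an identity utility, and a single point value for each SSG in place of a posterior expectation. It also omits the tie handling of (28) and the final comparison between the two bracketing schedules, and simply returns the feasible one. All names are hypothetical.

```python
import math


def best_power(mu, a, b, rate, ssg):
    """Maximizer over P >= 0 of (1 - a*exp(-b*P*ssg))*rate - mu*P."""
    slope_at_zero = a * b * rate * ssg
    if slope_at_zero <= mu:
        return 0.0
    return math.log(slope_at_zero / mu) / (b * ssg)


def schedule_for_multiplier(mu, candidates):
    """candidates[n] lists (user, a, b, rate, ssg) options for subchannel n.

    Returns one (user, rate, power) pick per subchannel, chosen to
    minimize the Lagrangian term -goodput + mu * power.
    """
    def cost(option):
        _, a, b, rate, ssg = option
        p = best_power(mu, a, b, rate, ssg)
        return -(1.0 - a * math.exp(-b * p * ssg)) * rate + mu * p

    picks = []
    for options in candidates:
        user, a, b, rate, ssg = min(options, key=cost)
        picks.append((user, rate, best_power(mu, a, b, rate, ssg)))
    return picks


def greedy_allocation(candidates, power_budget, tol=1e-6):
    """Bisection over mu until the total power meets the budget."""
    def total_power(mu):
        return sum(p for _, _, p in schedule_for_multiplier(mu, candidates))

    # At mu_hi no option is worth any power; mu = 0 itself is never evaluated.
    mu_lo = 0.0
    mu_hi = max(a * b * r * g for opts in candidates for _, a, b, r, g in opts)
    while mu_hi - mu_lo > tol:
        mu = 0.5 * (mu_lo + mu_hi)
        if total_power(mu) > power_budget:
            mu_lo = mu
        else:
            mu_hi = mu
    return schedule_for_multiplier(mu_hi, candidates)
```

Each multiplier evaluation costs one pass over all user/MCS options on every subchannel, so the work per bisection step stays linear in the number of candidates.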

The complexity of the proposed algorithm—in terms of number of times (22) is solved—is

(29) |

which is significantly less than the brute-force complexity in (15). Although the proposed algorithm is sub-optimal, the difference between the optimal GSRA utility and that attained by the proposed algorithm , as , can be bounded as follows [16]:

(30) | |||||

(31) |

In Section VI, we evaluate (30) by simulation, and show that the performance loss is negligible.

## V Updating the Posterior Distributions from ACK/NAK Feedback

In this section, we propose a recursive procedure to compute the posterior pdfs required by the proposed greedy algorithm in Table II when the channel is first-order Markov. (The extension to higher-order Markov channels is straightforward.)
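To give a concrete feel for such a recursion, the following sketch performs one step of a generic bootstrap particle filter for a single user. It is an illustration under stated assumptions rather than the procedure derived in this section: the channel taps are assumed to follow a Gauss-Markov model with a hypothetical correlation parameter `alpha`, the error rate is assumed to be `a * exp(-b * power * ssg)`, the subchannel gains are taken as an unnormalized zero-padded DFT of the taps, and the `feedback` dictionary layout is invented for the example.

```python
import numpy as np


def subchannel_ssgs(taps, num_subchannels):
    """Squared subchannel gains of each particle via a zero-padded DFT."""
    return np.abs(np.fft.fft(taps, n=num_subchannels, axis=-1)) ** 2


def particle_filter_step(taps, weights, feedback, num_subchannels, alpha, a, b, rng):
    """Weight by ACK/NAK likelihood, resample, then propagate one slot.

    taps: (num_particles, num_taps) complex impulse-response particles.
    feedback: {subchannel index: (ack, power)} for the subchannels on
    which this user was scheduled; unscheduled subchannels are absent.
    """
    num_particles, num_taps = taps.shape
    ssg = subchannel_ssgs(taps, num_subchannels)
    for n, (ack, power) in feedback.items():
        per = np.clip(a * np.exp(-b * power * ssg[:, n]), 0.0, 1.0)
        # An ACK has likelihood 1 - PER, a NAK has likelihood PER.
        weights = weights * ((1.0 - per) if ack else per)
    if weights.sum() > 0.0:
        weights = weights / weights.sum()
    else:
        # Degenerate case: fall back to uniform weights.
        weights = np.full(num_particles, 1.0 / num_particles)
    # Multinomial resampling concentrates particles on likely channels.
    taps = taps[rng.choice(num_particles, size=num_particles, p=weights)]
    # Assumed Gauss-Markov evolution with total tap variance of one.
    noise = rng.standard_normal(taps.shape) + 1j * rng.standard_normal(taps.shape)
    noise = noise / np.sqrt(2.0 * num_taps)
    taps = alpha * taps + np.sqrt(1.0 - alpha**2) * noise
    return taps, np.full(num_particles, 1.0 / num_particles)
```

Averaging `subchannel_ssgs` over the returned particles gives sample-based approximations of the posterior expectations that a greedy allocation would need.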

Let the time- user- channel be described by the discrete-time channel impulse response , where denotes transpose. The corresponding frequency-domain subchannel gains are then given by

(32) |

where the OFDMA modulation matrix contains the first columns of the -DFT matrix. Assuming additive white Gaussian noise with unit variance, the SSG of subchannel for user is given by , and so we can write

(33) |

with , where is the Dirac delta and is the column of the identity matrix. Using the channel’s Markov property and Bayes rule, we find that

(34) | |||||

(35) |

where denotes the set-difference operator. Using the fact that , along with the fact that is a deterministic function of (and therefore of ), we then have from (35) that