Delay Performance of the Multiuser MISO Downlink
We analyze a MISO downlink channel where a multi-antenna transmitter communicates with a large number of single-antenna receivers. Using linear beamforming or nonlinear precoding techniques, the transmitter can serve multiple users simultaneously during each transmission slot. However, increasing the number of users, i.e., the multiplexing gain, reduces the beamforming gain, which means that the average of the individual data rates decreases and their variance increases. We use stochastic network calculus to analyze the queueing delay that occurs due to the time-varying data rates. Our results show that the optimal number of users, i.e., the optimal trade-off between multiplexing gain and beamforming gain, depends on incoming data traffic and its delay requirements.
The capacity of wireless communication systems can be significantly increased when both the transmitter and the receiver are equipped with multiple antennas. Interestingly, similar capacity gains can also be achieved when a multi-antenna transmitter communicates simultaneously with multiple receivers that have only a single antenna each. In order to achieve the capacity of such a multi-user multiple-input single-output (MU-MISO) downlink channel, the transmitter must employ nonlinear precoding techniques like dirty paper coding (DPC) . However, linear precoding techniques are sufficient to achieve a large fraction of the capacity. A commonly used linear precoding scheme is zero-forcing beamforming (ZFBF), which projects the signal intended for a user into a subspace that is orthogonal to the channels of the other users. A transmitter with antennas can use ZFBF to send different data streams to users at a time. Increasing increases the multiplexing gain, but it decreases the beamforming gain due to a reduced dimensionality of the subspaces that are orthogonal to the other users. Thus, when becomes equal to , the linear growth in capacity is lost . Previous works, e.g. , have studied the optimal number of scheduled users , i.e., the optimal trade-off between the multiplexing gain and beamforming gain such that the ergodic capacity of the system is maximized.
However, such an analysis of the ergodic sum capacity does not accurately reflect the performance when the system is subject to constraints on maximum delay, such as in live video or audio transmissions. This is due to two reasons. First, when the total number of users is larger than the number of antennas , then the transmitter can only schedule a subset of users in each transmission slot. In order to meet strict requirements on the delay, the scheduling scheme must ensure that each user is scheduled regularly. Second, large variations in the instantaneous data rates mean that the transmitter cannot always send all the available data. When the channel conditions are bad, the data must be stored in a buffer for transmission in subsequent time slots, causing a buffering or queueing delay.
I-A Related Work
Several works have studied the use of linear precoding in the multiuser MISO downlink, as nonlinear dirty paper coding techniques are difficult to implement in practice. Specifically, Yoo and Goldsmith  showed that when the total number of users in the system greatly exceeds the number of antennas, then ZFBF achieves asymptotically the same performance as DPC. However, their scheme assumes that the transmitter has channel state information (CSI) of all users. The cost of collecting this CSI would be overwhelming when the number of users is large. Sharif and Hassibi  reduce the overhead from collecting CSI by randomly creating a set of beamforming vectors and then transmitting only to the users which report the highest signal-to-interference-and-noise ratio (SINR) along those random beams. Although the scheduling probabilities of all users are equal, the scheduling of the users is random, which can result in unacceptably long delays for some users. Zhang et al.  studied the optimal number of scheduled users when the transmitter has only knowledge of the channel of the scheduled users, and also considered imperfect CSI. Ravindran and Jindal  also studied imperfect CSI due to quantized feedback. They found that collecting many bits of feedback (accurate CSI) from very few users is more beneficial than collecting few bits of feedback from many users, which supports the assumption in  that CSI should be obtained only for the scheduled users.
However, all of these works studied only the ergodic capacity of the MU-MISO downlink, and did not address the system performance under delay constraints. When the transmission rate varies due to channel fading, the transmitter cannot always transmit all data and must keep data in a buffer, causing a random queueing delay. This queueing delay can be analyzed through the frameworks of stochastic network calculus [8, 9] or effective capacity . Several authors [11, 12, 13] studied the effective capacity of MIMO systems considering only the single user case. Li et al.  investigated the effective capacity of multiuser MIMO systems. However, the authors make many assumptions that we do not consider practical, e.g., that the channel coefficients are non-fading and that there is always a backlog of data in each user’s queue.
In this paper, we analyze the queueing performance of the MU-MISO downlink using stochastic network calculus (SNC). We consider both linear ZFBF precoding and nonlinear dirty paper coding. We demonstrate that SNC can still be applied when the users are not scheduled in each transmission slot, but scheduled regularly in a round robin fashion. Based on previous results, we present closed-form expressions to analytically determine the distribution of the queueing delay. Our numerical results show that the optimal number of scheduled users changes when considering the delay performance instead of the ergodic capacity.
II System Model
We consider downlink transmissions in a time-slotted system from a single base station with antennas to users. We consider the case , where the transmitter cannot serve all users at once. Instead, in each time slot , only a subset of users are scheduled for transmission, with . Contrary to , we assume that the scheduling scheme does not depend on the channel states, as acquiring channel state information (CSI) for all users would result in an infeasible amount of overhead. Instead, we follow , where the channel is estimated only for the scheduled users. We assume that the base station has perfect CSI for all scheduled users.
We describe in Sec. II-A the physical layer transmission for the scheduled users . Round robin scheduling is presented in Sec. II-B. Then, we describe in Sec. II-C the queueing delay of the system on the link layer, followed by the problem statement in Sec. II-D.
II-A Physical Layer Model
The received signal at the scheduled users in time slot can be described as
For the channel matrix , we assume Rayleigh fading, i.e., all elements are independent and identically distributed (i.i.d.) with Gaussian distribution . Furthermore, we consider the quasi-static fading model where the channel remains constant for the duration of time slot , consisting of channel uses, and changes to an independent realization in the next time slot (note that the set of scheduled users also changes). The input signal is denoted as and must satisfy a short-term power constraint for each realization of . The noise has i.i.d. components .
Given the channel matrix , the transmitter must encode the data for the scheduled users into coded symbols . We now present two different encoding strategies.
II-A1 Zero-Forcing Beamforming (ZFBF)
When the transmitter applies ZFBF, the input signal vector is given by 
where is the precoding matrix, is the power allocation matrix, and is the vector of (independently) coded Gaussian symbols for the scheduled users. The ZFBF precoder is given as 
where is the Moore-Penrose pseudo-inverse of and is the normalization matrix such that the columns of have unit-2 norm. The variables are central chi-square distributed (scaled by a factor ) with degrees of freedom, where . Their PDF is given by [1, Lemma 4]
We assume that the blocklength of the channel code is sufficiently long, so that the system can achieve error-free transmission to user at a rate 
II-A2 Zero-Forcing with Dirty-Paper Coding (ZF-DPC)
For comparison, we also present a scheme known originally as ranked known interference (RKI) . Assume that the scheduled users are ordered from to . When a scheduled user is the -th ordered user, it experiences interference from the ordered users . The interference from those users is non-causally known at the transmitter. Therefore, the transmitter can employ dirty paper coding (DPC) when encoding the data for the -th ordered user, which allows sending data at the same rate as if no interference was present. Furthermore, if the ordered users apply zero-forcing (ZF) towards the users , then they will not interfere with the -th user. Thus, when user is the -th ordered user, it can achieve a rate , where the variables have central chi-square distribution (scaled by ) with degrees of freedom with . The PDF of is given by (4). Note that and the rates depend on the user ordering.
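The gain distributions of the two schemes can be compared with a short Monte Carlo sketch. It is an illustration under stated assumptions, not the evaluation code behind the results of this paper. The names `M`, `K`, `P` and `n` are our own labels for the number of antennas, the number of scheduled users, the total transmit power and the number of channel uses per slot. We assume that a chi-square variable with 2k degrees of freedom scaled by 1/2 is sampled as a Gamma variable with shape k, that k = M-K+1 for every ZFBF user and k = M-i+1 for the i-th ordered ZF-DPC user, that the power is split equally, and that the rate is n*log2(1 + (P/K)*gain).

```python
import math
import random


def mean_rates(scheme, M, K, P, n=1, trials=20000, seed=0):
    """Monte Carlo mean rate (bits per slot) of each of the K scheduled users.

    Assumptions (labels are ours, not the paper's notation):
      * M antennas, K <= M scheduled users, total power P split equally.
      * The effective gain is Gamma(k, 1), i.e. chi-square with 2k degrees
        of freedom scaled by 1/2.
      * ZFBF: k = M - K + 1 for every user.
      * ZF-DPC: k = M - i + 1 for the user encoded in position i.
    """
    if scheme not in ("zfbf", "zfdpc"):
        raise ValueError("scheme must be 'zfbf' or 'zfdpc'")
    if not 1 <= K <= M:
        raise ValueError("need 1 <= K <= M")
    rng = random.Random(seed)
    means = []
    for i in range(1, K + 1):
        shape = M - K + 1 if scheme == "zfbf" else M - i + 1
        acc = 0.0
        for _ in range(trials):
            gain = rng.gammavariate(shape, 1.0)
            acc += n * math.log2(1.0 + (P / K) * gain)
        means.append(acc / trials)
    return means


if __name__ == "__main__":
    M, P = 4, 10.0  # hypothetical example values
    for K in range(1, M + 1):
        zf = sum(mean_rates("zfbf", M, K, P))
        dpc = sum(mean_rates("zfdpc", M, K, P))
        print(f"K={K}: ZFBF sum rate {zf:.2f}, ZF-DPC sum rate {dpc:.2f}")
```

With a random encoding order, each ZF-DPC position is taken with equal probability, so the average rate of a scheduled user is the mean of the returned list.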
II-B Round Robin (RR) Scheduling
In the considered scenario, the number of users exceeds the number of transmit antennas . Therefore, the transmitter must schedule a subset of users in each time slot . We consider round robin (RR) scheduling as in , where multiple users can be scheduled in each time slot. Each user is scheduled exactly once within a superframe of slots. The average number of scheduled users per slot is then given as , with . As the total number of users is fixed, may not always be an integer, and thus the scheduler must sometimes select more than users, sometimes fewer. We assume that in of the subslots, users are served, in of the subslots, users are served, such that the total number of users served in the superframe is .
In order to maintain fairness between the users, the transmitter randomly assigns the users to the slots in each superframe. Furthermore, in case of ZF-DPC, where the performance depends on the encoding order of the users, we require that the users are ordered randomly.
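The random round robin assignment can be sketched as follows. This is a minimal illustration, assuming that the slot sizes within a superframe differ by at most one user (the floor and the ceiling of the average number of users per slot). The function name and the parameters `num_users`, `T` and `M` are hypothetical labels for the total number of users, the superframe length in slots and the number of antennas.

```python
import random


def round_robin_superframe(num_users, T, M, rng=None):
    """Randomly assign every user to exactly one of the T slots of a superframe.

    Assumption: `extra` slots carry ceil(num_users / T) users and the
    remaining slots carry floor(num_users / T) users. No slot may hold
    more than M users (the number of antennas).
    """
    rng = rng or random.Random()
    base, extra = divmod(num_users, T)
    if base + (1 if extra else 0) > M:
        raise ValueError("superframe too short: a slot would exceed M users")
    users = list(range(num_users))
    rng.shuffle(users)   # random user-to-slot assignment for fairness
    sizes = [base + 1] * extra + [base] * (T - extra)
    rng.shuffle(sizes)   # randomise which slots carry the extra user
    slots, start = [], 0
    for size in sizes:
        slots.append(users[start:start + size])
        start += size
    return slots


if __name__ == "__main__":
    for slot, members in enumerate(round_robin_superframe(20, 6, 4, random.Random(1))):
        print(slot, members)
```

Because the user list is shuffled before it is cut into slots, the order of the users inside each slot is also random and can serve as the ZF-DPC encoding order.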
II-C Link Layer Model
In time slot , data bits intended for downlink transmission to user arrive at the transmitter. The data is stored in a transmit buffer, with individual buffers (or queues) for each user. We assume that the arrival process is constant over time and equal for all users, with denoting the constant number of bits that arrive in the queue of each user in each time slot. The service rate offered by the wireless system in each time slot to a scheduled user is given by , and when . The departure process describes the amount of data that is transmitted to the receiver. Thus, is limited both by the amount of data waiting in the buffer, as well as by the service rate . The cumulative arrival, service, and departure processes are defined as
The delay is random. We want to find the probability that the delay of the data for user exceeds a specified target delay at any time :
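The queueing behaviour described in this subsection can be reproduced with a few lines of code. The sketch below is an illustration under our own conventions: data arriving in a slot may already be served in the same slot, and the delay at time t is taken as the smallest u >= 0 for which the cumulative departures up to t+u cover the cumulative arrivals up to t. The function names are hypothetical.

```python
def simulate_queue(arrivals, services):
    """Return cumulative arrivals A and departures D (index 0 is time 0).

    In each slot the departures are limited both by the backlog
    (including the new arrivals) and by the offered service.
    """
    A, D = [0.0], [0.0]
    backlog = 0.0
    for a, s in zip(arrivals, services):
        backlog += a
        d = min(backlog, s)
        backlog -= d
        A.append(A[-1] + a)
        D.append(D[-1] + d)
    return A, D


def delay_at(A, D, t):
    """Smallest u >= 0 with A[t] <= D[t + u], or None if not observed yet."""
    for u in range(len(D) - t):
        if D[t + u] >= A[t] - 1e-12:
            return u
    return None


if __name__ == "__main__":
    # Constant arrivals of 2 bits per slot; the user is not served in the
    # first two slots (service 0), as under round robin scheduling.
    A, D = simulate_queue([2, 2, 2, 2], [0, 0, 5, 10])
    print([delay_at(A, D, t) for t in range(1, 5)])
```

Counting how often the returned delay exceeds a target value over a long simulated run gives an empirical estimate of the delay violation probability.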
II-D Problem Statement
In this work, we want to find the value that minimizes the delay violation probability . On the one hand, choosing a small value of means that only few users are scheduled in each time slot, so that their signals are transmitted with high beamforming gain and transmit power. However, this also results in a small multiplexing gain and a long time to schedule all users. On the other hand, a large value results in poor beamforming gain.
We note that the delay violation probability cannot be determined directly in an analytically tractable form. However, the delay violation probability can be analytically approximated/bounded using the frameworks of effective capacity  or stochastic network calculus [8, 9]. Effective capacity provides an approximation for that is tight for large . In this work, we perform the optimization of based on stochastic network calculus, as it provides a strict upper bound on that holds also for small .
In Sec. III-A, we present a summary of the delay analysis through stochastic network calculus in a transform domain . We demonstrate in Sec. III-B how stochastic network calculus can be used when round robin scheduling is used. In Sec. III-C, we analytically obtain the stochastic network calculus bounds for the considered scenario. Note that the transmission and scheduling strategies in Sec. II are fair, as the distribution of the service process is the same for all users. We assume that all users are subject to the same delay requirements and thus drop the subscript to shorten the notation.
III-A Stochastic Network Calculus (SNC)
The delay in (8) is defined in terms of the arrival and departure processes. However, the distribution of the delay can be found directly from the statistics of the arrival and service processes. We follow  and describe these processes in the exponential domain, also referred to as SNR domain. The arrival and service processes in the bit domain, and , are converted to the SNR domain (denoted by calligraphic letters) as
In this work, we assume constant arrivals with . Consider for now a service process that is independent and identically distributed (i.i.d.) between time slots. An upper bound on the delay violation probability can then be obtained in terms of the Mellin transforms of and . The Mellin transform of a nonnegative random variable is defined as 
For any parameter , the kernel provides an upper bound on the delay violation probability [9, 16]. This holds for any time slot , including the limit (steady-state). In order to find the tightest upper bound, one must find the parameter that minimizes :
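For orientation, the steady-state form of these quantities that is commonly used in the transform-domain network calculus literature [9, 16] for i.i.d. service and constant arrivals can be written as below. The symbols are our own choice and may differ from the notation of this paper: $\mathcal{A}$ and $\mathcal{S}$ denote the per-slot SNR-domain arrival and service, $a$ the constant number of arriving bits per slot with $\mathcal{A} = e^{a}$, $w$ the target delay in slots, $W$ the delay, and $s > 0$ the free parameter.

```latex
\begin{align}
  \mathcal{M}_X(s) &= \mathbb{E}\left[X^{s-1}\right], \\
  \mathcal{M}_{\mathcal{A}}(1+s) &= e^{a s}, \\
  \mathcal{K}(s, w) &= \frac{\mathcal{M}_{\mathcal{S}}(1-s)^{w}}
       {1 - \mathcal{M}_{\mathcal{A}}(1+s)\,\mathcal{M}_{\mathcal{S}}(1-s)}
       \quad \text{if } \mathcal{M}_{\mathcal{A}}(1+s)\,\mathcal{M}_{\mathcal{S}}(1-s) < 1, \\
  \Pr\{W > w\} &\le \inf_{s > 0} \mathcal{K}(s, w).
\end{align}
```

The condition in the third line is a stability condition: if it fails for a given $s$, that value of $s$ yields no bound.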
III-B SNC and Round Robin Scheduling
For round robin scheduling, the delay analysis through stochastic network calculus as shown in Sec. III-A cannot be applied directly, as is zero in the time slots where the user is not scheduled, i.e., is not i.i.d. between time slots. However, stochastic network calculus can be applied on the superframe level. The service that a user receives in superframe is denoted as , and is i.i.d. between superframes, because each user is scheduled exactly once per superframe of length . The arrival process on the superframe level is given as bits, and the Mellin transform of the process in the SNR domain is .
Assume first that , where is the maximum delay in time slots, is an integer. Then, the queueing analysis can easily be done on the superframe level:
In case is not an integer, some users (denoted as group 1) will be served times before the deadline, while others (group 2) will only be served times. For the sake of fairness, we assume that the users are assigned randomly to the slots. Then, the probability of being in the second group is , and . Thus, the overall bound on the delay violation probability is given by
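A numerical sketch of the superframe-level bound is given below. It rests on several assumptions that are ours: the kernel is taken in its usual steady-state form M_S(1-s)^k / (1 - M_A(1+s) M_S(1-s)), the Mellin transform of the service is estimated from samples of the per-superframe service in bits, the probability of being served only floor(w/T) times is taken as floor(w/T) + 1 - w/T, and the mixture of the two kernels is minimised over a common grid of s. The names `w`, `T`, `a` and `rates` are hypothetical labels for the target delay in slots, the superframe length, the arrivals in bits per slot and the service samples.

```python
import math
import random


def kernel(s, k, a_frame, rates):
    """Steady-state kernel for a deadline of k superframes (assumed form)."""
    m_s = sum(math.exp(-s * r) for r in rates) / len(rates)  # M_S(1 - s)
    m_a = math.exp(s * a_frame)                              # M_A(1 + s)
    if m_a * m_s >= 1.0:
        return math.inf  # stability condition violated for this s
    return m_s ** k / (1.0 - m_a * m_s)


def rr_delay_bound(w, T, a, rates, s_grid):
    """Estimated upper bound on P(delay > w slots) under round robin."""
    lo = w // T             # services that every user gets before the deadline
    p_lo = lo + 1 - w / T   # assumed probability of getting only `lo` services
    best = 1.0              # a probability bound above 1 is useless
    for s in s_grid:
        k_lo = kernel(s, lo, a * T, rates)
        if math.isinf(k_lo):
            continue
        k_hi = kernel(s, lo + 1, a * T, rates)
        best = min(best, p_lo * k_lo + (1.0 - p_lo) * k_hi)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    M, K, P, T = 4, 2, 10.0, 2  # hypothetical example values
    # ZFBF gain sampled as Gamma(M-K+1, 1); one channel use per slot.
    rates = [math.log2(1.0 + (P / K) * rng.gammavariate(M - K + 1, 1.0))
             for _ in range(5000)]
    grid = [0.1 * j for j in range(1, 51)]
    print(rr_delay_bound(7, T, 0.5, rates, grid))
```

Since the Mellin transform is estimated from samples, the returned value is an estimate of the bound rather than a strict bound; a closed-form transform removes this limitation.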
III-C Delay Analysis for MU-MISO Downlink
The kernel (15) depends on the Mellin transform of the service offered to each user in each superframe. Users are scheduled exactly once in a superframe, so that has the same distribution as the service experienced by a scheduled user. The SNR-domain service process of a scheduled user is given as , with .
For ZFBF and ZF-DPC, is a scaled central variable with varying degrees of freedom as outlined in Sec. II-A. For ZFBF, we have , depending on the slot type . For ZF-DPC, , each with probability .
The transmitter is subject to a short-term power constraint . A simple power allocation strategy shares equally among the or scheduled users:
The Mellin transform of the service process can be obtained by averaging over the Mellin transforms of the service process with specific values of and :
where denotes the joint probability of a user’s channel having degrees of freedom () and power . For ZFBF, is equal to or as given in (17). For ZF-DPC, the different can simply be obtained as .
For a specific constant power and a specific , the Mellin transform of the service process can be obtained as
We define and follow the derivations in  to obtain
and in (25), we applied the upper incomplete Gamma function
Thus, given the arrival rate in bits per time slot and a specific choice of superframe length (which determines the average number of scheduled users ), the upper bound (16) on can be obtained analytically through (18) and (25).
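As a cross-check of such closed-form expressions, the expectation behind the Mellin transform can also be evaluated by direct numerical integration. The sketch below is an assumption-laden illustration: it computes E[(1 + P*g)^(-theta)] for a gain g that is Gamma distributed with integer shape k (a chi-square variable with 2k degrees of freedom scaled by 1/2), where `theta` is our own shorthand for the exponent that results from the parameter s and the number of channel uses per slot. The truncation point and the step count are arbitrary choices.

```python
import math


def mellin_service(theta, P, k, upper=60.0, steps=60000):
    """E[(1 + P*g)^(-theta)] for g ~ Gamma(k, 1), composite trapezoidal rule.

    The integral over [0, infinity) is truncated at `upper`; the neglected
    tail is of the order exp(-upper) for moderate k and theta.
    """
    h = upper / steps
    norm = math.gamma(k)

    def integrand(x):
        return (1.0 + P * x) ** (-theta) * x ** (k - 1) * math.exp(-x) / norm

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for j in range(1, steps):
        total += integrand(j * h)
    return total * h


if __name__ == "__main__":
    print(mellin_service(0.0, 2.0, 3))   # normalisation check, close to 1
    print(mellin_service(-1.0, 2.0, 3))  # E[1 + P*g] = 1 + P*k = 7
    print(mellin_service(1.0, 1.0, 1))   # k = 1 case, see the note below
```

For k = 1 the integral has the closed form P^(-theta) * exp(1/P) * Gamma(1 - theta, 1/P), with Gamma(., .) the upper incomplete Gamma function, which follows from the substitution u = x + 1/P. For theta = 1 and P = 1 this evaluates to about 0.5963, which the last line of the example reproduces.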
IV Numerical Results
In Fig. 1, we show various aspects of the performance of a system with users and antennas. First, in Fig. 1(a), we show the expected service rate per slot vs. the average number of scheduled users for different values of the SNR . Note that the superframe length must always be an integer, but is not always an integer. In each superframe of time slots, the transmitter sends bits to each user. Thus, the expected service rate per user and per time slot is given as
For ZFBF, we observe for every SNR that the expected service rate first increases and then decreases in . At very small , an increase in means that more users are scheduled simultaneously, and the multiplexing gain from transmitting to multiple users outweighs the performance loss due to slightly decreased service rates of each user. However, at very large , the expected service rate decreases, because the relative increase in the number of scheduled users is small, whereas the beamforming gain is massively reduced. Furthermore, we observe that the value of that maximizes the expected service rate grows with the SNR. This is in line with previous results . For ZF-DPC, the behavior is different. In fact, the expected service rate is strictly increasing in for . When adding more users to a ZF-DPC transmission, the additional users do not create any interference towards the previous users. The only downside from adding more users to the ZF-DPC system is that a small fraction of the transmitted signal power is shared with the new users. For very low SNR , this effect may decrease the expected service at large , but even at , this effect remains almost unnoticeable.
In Fig. 1(b) and Fig. 1(c), we consider the delay performance of the system for ZFBF and ZF-DPC, respectively, with different arrival rates , a maximum delay of time slots, and with . For ZFBF, Fig. 1(b) shows that the delay violation probability, obtained from the analytical bound (16), remains high at bits/slot. However, when the arrival rate is decreased, the delay violation probability decreases significantly. Interestingly, the minimum in the delay violation probability is attained at for , at for , and at for bits/slot. Thus, the optimal value of changes depending on the arrival rate and delay constraints imposed on the system. Many of our additional experiments also show that the optimal under delay constraints is slightly below the value of that maximizes the expected service rate. An explanation for this phenomenon is that even though decreasing the number of users means that users are scheduled less often (lower multiplexing gain), the system has a higher beamforming gain, i.e., the channel gains of all users have more degrees of freedom. This decreases the variance of the service experienced by each user and thus improves the delay performance of the system.
Fig. 1(c) shows the delay violation probability for ZF-DPC. Here, we observe that the minimum in the delay violation probability is attained at for and at for , whereas Fig. 1(a) showed that the expected service rate is maximized at . The explanation is similar to the explanation in case of ZFBF: When scheduling users, the effective channel gains for some of the users (the users which are encoded last in the ZF-DPC order) have only 2 degrees of freedom. These users may experience very low data rates, so that the delay violation probability increases.
In Fig. 2, we further investigate the optimal value of and how the optimal choice of influences the delay performance. Fig. 2(a) shows the optimal values for for ZFBF (blue, solid lines) and ZF-DPC (red, dotted lines). We observe that the optimal value of decreases when the arrival rate is reduced. In Fig. 2(b), we investigate how the optimal choice of affects the delay performance of the considered systems. In case of ZFBF, we find that choosing the suboptimal value deteriorates the performance only slightly. For ZF-DPC, the selected value of seems to have a larger impact. We observe that choosing the value , i.e., the value that maximizes the expected service rate of the system, would lead to a massive increase in the delay violation probability.
In this work, we have presented an analytical framework to study the delay performance of the multiuser MISO downlink. We found that the optimal number of scheduled users depends on the delay requirements of the system. There are many interesting possible extensions of this work. First of all, we considered equal power allocation, whereas the transmitter could also optimize the transmission power. Another line of research would be to investigate the system performance for a huge number of transmit antennas (massive MIMO). Finally, when the maximum tolerable delay becomes very short, the length of each time slot should also be chosen very small. For very short time slots, the channel estimates may become inaccurate, and the impact of channel coding at finite blocklength must be considered in order to gain more realistic insights into the system performance.
- [1] G. Caire and S. Shamai, “On the achievable throughput of a multiantenna Gaussian broadcast channel,” IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1691–1706, 2003.
- [2] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, “A vector-perturbation technique for near-capacity multiantenna multiuser communication-part I: channel inversion and regularization,” IEEE Trans. Commun., vol. 53, no. 1, pp. 195–202, 2005.
- [3] B. Hochwald and S. Vishwanath, “Space-time multiple access: Linear growth in the sum rate,” in Proc. 40th Annual Allerton Conf. Communications, Control and Computing, 2002.
- [4] T. Yoo and A. Goldsmith, “On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 528–541, 2006.
- [5] M. Sharif and B. Hassibi, “On the capacity of MIMO broadcast channels with partial side information,” IEEE Trans. Inf. Theory, vol. 51, no. 2, pp. 506–522, 2005.
- [6] J. Zhang, M. Kountouris, J. G. Andrews, and R. W. Heath, “Multi-mode transmission for the MIMO broadcast channel with imperfect channel state information,” IEEE Trans. Commun., vol. 59, no. 3, pp. 803–814, 2011.
- [7] N. Ravindran and N. Jindal, “Multi-user diversity vs. accurate channel state information in MIMO downlink channels,” IEEE Trans. Wireless Commun., vol. 11, no. 9, pp. 3037–3046, Sept. 2012.
- [8] M. Fidler, “A network calculus approach to probabilistic quality of service analysis of fading channels,” in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Nov. 2006, pp. 1–6.
- [9] H. Al-Zubaidy, J. Liebeherr, and A. Burchard, “Network-layer performance analysis of multihop fading channels,” IEEE/ACM Trans. Netw., vol. 24, no. 1, pp. 204–217, Feb. 2016.
- [10] D. Wu and R. Negi, “Effective capacity: a wireless link model for support of quality of service,” IEEE Trans. Wireless Commun., vol. 2, no. 4, pp. 630–643, Jul. 2003.
- [11] L. Liu and J. F. Chamberland, “On the effective capacities of multiple-antenna Gaussian channels,” in 2008 IEEE Int. Symp. Inf. Theory, July 2008, pp. 2583–2587.
- [12] E. A. Jorswieck, R. Mochaourab, and M. Mittelbach, “Effective capacity maximization in multi-antenna channels with covariance feedback,” IEEE Trans. Wireless Commun., vol. 9, no. 10, pp. 2988–2993, 2010.
- [13] M. C. Gursoy, “MIMO wireless communications under statistical queueing constraints,” IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 5897–5917, Sept. 2011.
- [14] J. Li, N. Bao, W. Xia, and L. Shen, “Adaptive user scheduling and resource management for multiuser MIMO downlink systems with heterogeneous delay requirements,” in IEEE Wireless Commun. and Netw. Conf. (WCNC), Apr. 2013, pp. 1351–1356.
- [15] S. Schiessl, H. Al-Zubaidy, M. Skoglund, and J. Gross, “Delay performance of wireless communications with imperfect CSI and finite length coding,” arXiv preprint arXiv:1608.08445, 2016.
- [16] S. Schiessl, J. Gross, and H. Al-Zubaidy, “Delay analysis for wireless fading channels with finite blocklength channel coding,” in Proc. 18th ACM Int. Conf. Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM). ACM, 2015, pp. 13–22.
- [17] S. Schiessl, H. Al-Zubaidy, M. Skoglund, and J. Gross, “Finite length coding in edge computing scenarios,” in Proc. 21th Int. ITG Workshop on Smart Antennas (WSA), Mar. 2017, pp. 1–6.