# The Capacity of the Multiple Access Channel Under Distributed Scheduling - Order Optimality of Linear Receivers

Joseph Kampeas, Asaf Cohen and Omer Gurewitz
Department of Communication Systems Engineering
Ben-Gurion University of the Negev
Beer-Sheva, 84105, Israel
Email: {kampeas,coasaf,gurewitz}@bgu.ac.il
###### Abstract

Consider the problem of a Multiple-Input Multiple-Output (MIMO) Multiple-Access Channel (MAC) in the limit of a large number of users. Clearly, in practical scenarios, only a small subset of the users can be scheduled to utilize the channel simultaneously. Thus, a problem of user selection arises. However, solutions which collect Channel State Information (CSI) from all users and decide on the best subset to transmit in each slot do not scale when the number of users is large, so distributed algorithms for user selection are advantageous.

In this paper, we suggest a distributed user selection algorithm, which selects a group of users to transmit without coordinating between all users and without all users sending CSI to the base station. This threshold-based algorithm is analyzed for both Zero-Forcing (ZF) and Minimum Mean Square Error (MMSE) receivers, and its expected capacity in the limit of a large number of users is investigated. It is shown that for a large number of users it achieves the same scaling laws as the optimal centralized scheme. Multi-stage distributed schemes are also considered and shown to be advantageous in practical scenarios.

## 1 Introduction

Wireless access networks are the typical last-mile networks connecting multiple users to a high speed backbone. In these networks, a Base Station (BS) serves a large group of users. Traditionally, either time or frequency is divided to ensure that users do not interfere with each other. Modern coding techniques, however, allow multiple users to either transmit or receive simultaneously, and be decoded successfully using the appropriate Multiple Access Channel (MAC) codes or Broadcast Channel (BC) codes, respectively.

Nevertheless, consider a BS serving a very large number of users. In practical scenarios, not all users can be served simultaneously, and the problem of user selection or scheduling arises. In the downlink setting, where a BS transmits to a group of users, it is common to assume Channel State Information (CSI) is available at the BS, hence intelligent user selection can be employed. E.g., the BS can select a subset of the users with both strong channel norms (high SNR) and relatively orthogonal directions (to avoid interference). See Section 2 for a literature survey. Such a selection can exploit the multi-user diversity inherent in these models.

In the uplink setting, however, since the users transmit to the BS, it is highly desirable to avoid the process of collecting all information at the BS beforehand, and notifying the users which should transmit. This process is prohibitively complex when the number of users is very large. In fact, this is one of the main reasons why an emerging standard, IEEE 802.11ac, includes Multi-User Multiple Input Multiple Output (MU-MIMO) only at the downlink [10]. That is, the standard avoids the allegedly complex process of selecting a subset of users in the uplink. Hence, efficient and distributed algorithms for selecting the appropriate group of users are desirable. This way, there is hope to harness the benefits of multi-user diversity without the need to collect CSI from all users. In this paper, we show that this is indeed possible, by showing the order-optimality of distributed algorithms.

#### Main Contribution

We consider a MIMO MAC with $r$ receiving antennas and $K$ users. We suggest distributed algorithms for selecting a group of users to transmit in each slot. In the first, a threshold value for the norm of the channel vector is set, and only users above the threshold transmit. Hence, there is no need to collect CSI from all users, nor is any cooperation required. In the second, an iterative process is suggested, where multiple thresholds are set on the norms of the projections of the channel vectors on the spaces of previously selected users. In this case, CSI is shared, but only among the selected group.

An analysis of the resulting sum capacity in the limit of large $K$ and the respective scaling laws are given for both Zero-Forcing (ZF) and Minimum Mean Square Error (MMSE) receivers. This analysis employs recent tools from both Point Process approximation and asymptotic random matrix theory which, to the best of our knowledge, were not used in this setting before. Via this analysis, the simple, distributed, threshold-based algorithm is shown to achieve the optimal scaling laws. Consequently, the benefit compared to traditional (centralized) techniques is demonstrated both analytically and via simulation results.

The rest of this paper is organized as follows: Section 2 includes the most relevant related work. Section 3 includes the required preliminary material. Section 4 introduces the distributed algorithm, gives its analysis and scaling law under ZF decoding. Section 5 gives an improved, multi-stage algorithm, which, while only semi-distributed (as it requires messaging between the selected users), gives better performance for a small number of users. Section 6 gives the analysis under an MMSE receiver. While conceptually similar, this analysis is the more technically challenging. Section 7 gives the previously omitted proofs. Section 8 concludes the paper.

## 2 Related Work

The essence of multi-user diversity was introduced in [11], where selecting the strongest user in each time slot was first suggested. The work was followed by numerous scheduling algorithms for various scenarios. We list here only the most relevant.

In [12], the authors considered the impact of multi-user diversity on the MIMO downlink channel (BC). Assuming channel state information at the BS, the authors used order statistics to evaluate the effective SNR when scheduling the strongest user in each slot. However, only one user was scheduled in each slot, and the results were given in terms of the -fold statistics, without an extreme value analysis for large .

In [13], a similar downlink model was considered, however, when users are scheduled simultaneously. The authors considered Zero-Forcing Beamforming (ZFBF), and suggested a greedy algorithm to schedule the strongest and most orthogonal users. Additional scheduling algorithms for downlink communication were given in [14, 15, 16, 17]. In fact, in the downlink scenario, it was shown later that ZFBF and optimal user selection can indeed achieve the Dirty Paper Coding (DPC) region [18], and is hence optimal in the Gaussian case [19]. Additional surveys and scaling laws can be found in [20, 21, 22].

A closely related scheme, yet still for the downlink model, was suggested in [23]. Using Block Diagonalization (BD), a capacity-based greedy algorithm was suggested, in which first the strongest user is scheduled, and then additional users are added, one by one, based on their marginal contribution to the total capacity. In the same context, [24] considered the special case of two transmit antennas and one receive antenna per user, and showed that a greedy, two-stage algorithm, which first selects the strongest user and then the second to form the best pair is asymptotically optimal. In the context of heterogeneous users, [25] proposed a scheduling scheme which selects a small subset of the users with favorable channel characteristics.

The above works focus on the downlink setting. In this scenario, it is reasonable to assume that at least some information is available at the BS, and a centralized decision can be made. In the uplink (MAC) model, however, if one wishes to select a group of users without centralized processing at the BS, distributed algorithms are required. In this paper, we suggest both a single-stage distributed algorithm, and a multi-stage semi-distributed one for the uplink scenario, and, in addition, analyze their sum capacity in the limit of large number of users and give the resulting scaling laws.

A pioneering study of the uplink model was done in [26, 27], where a decentralized MAC protocol for Orthogonal Frequency Division Multiple Access (OFDMA) channels was suggested. In this scheme, each user estimates the channel gain and compares it to a threshold. Only above-the-threshold users can transmit. [28] extended the scheme to a multi-channel setup, where each user competes on channels. In [9], the authors used a similar approach for power allocation in the multi-channel setup, and suggested an algorithm that asymptotically achieves the optimal water filling solution. However, the works above do not consider a MIMO setting, nor do they consider the interaction within a group of users, when all are scheduled to use the same resources. Space-time coding for fading multi-antenna MAC was considered in [8]. The focus therein, however, was on joint code design for a given point in the rate region and the resulting error probability, rather than user scheduling and its resulting capacity.

Recently, we proposed a Point Process approximation which facilitates the analysis of various distributed threshold-based scheduling algorithms in the non-homogeneous scenario [29, 30]. This work, however, assumed only a single user can be successfully decoded in each time slot. A key contribution of the current work is the non-trivial extension of the work in [29] to truly multiple-access protocols, where several users transmit simultaneously and should be decoded successfully. The questions that arise are therefore how to distributively select a good subset of users to transmit, and what the mutual influence between the users in the selected group is. For example, a closely related work is [31]. Therein, various decoding procedures were discussed, and the corresponding best user selection for the uplink setting was given. However, while reinforcing the necessity of proper user selection, the work in [31] considered only the scenario where one user can access the radio channel at a given time.

As for more complex topologies, spatial diversity in the context of multiple relays was considered in [32]. Therein, communication between a source and a destination is done through a group of relays. However, unlike conventional relay schemes, only the relays with the strongest received signal decode the message and cooperate via space time coding to successfully relay it to the destination. An asymptotically optimal scheme for multiple base stations (with joint optimization) was given in [33].

Extreme Value Theory (EVT) is a key tool in proving capacity results under scheduling and multi-user diversity. In [34], the authors suggested a sub-carrier assignment algorithm, and used order statistics to derive an expression for the resulting link outage probability. In [35], the authors used EVT to derive the scaling laws for scheduling systems using beamforming and linear combining. [36] analyzed the scaling laws of base station scheduling, and showed that by scheduling the strongest among stations one can gain a factor of in the expected capacity (compared to random or Round-Robin scheduling).

## 3 Preliminaries

In this section, we describe the system model and relevant results which will be used throughout this paper.

### 3.1 System Model

Throughout this paper, random matrices and random vectors are denoted in bold upper-case and bold lower-case letters, respectively. We consider a multiple-access model with $K$ users, each with a single transmit antenna. The BS is equipped with $r$ receiving antennas. When $k$ users utilize the channel simultaneously, the received signal at the base station can be described as:

$$
\mathbf{y} = \sum_{i=1}^{k} \mathbf{h}_i x_i + \mathbf{w}, \tag{1}
$$

where $x_i$ is the transmitted signal (scalar) of the $i$th user. $x_i$ is constrained in its total power to $P$, i.e., $\mathrm{E}[|x_i|^2] \le P$. However, in most cases, we will assume a constant power constraint $P$. $\mathbf{w}$ denotes the uncorrelated Gaussian noise. $\mathbf{h}_i$ is a complex random Gaussian channel vector. When all users are identically and independently distributed, it is common to assume that all entries of $\mathbf{h}_i$ are independent and have zero mean and variance imaginary and real parts, for all users. We assume that the channel is memoryless, that is, for each channel use (slot), independent realizations of $\mathbf{h}_i$ are drawn. Furthermore, we assume full CSI is available at the transmitter. That is, $\mathbf{h}_i$ is known to the $i$th user. This can be accomplished by sending a pilot signal from each of the antennas at the base station.

### 3.2 Capacity and Multi-User Diversity Via EVT

EVT is a key tool in evaluating the capacity under scheduling in multi-user systems. We review here the most relevant result. Moreover, we develop new normalizing constants for the problem at hand (EVT for the $\chi^2$ distribution), which will later help to speed up convergence.

The capacity obtained by letting an arbitrary user utilize the channel is given by $C = \log(1 + P\|\mathbf{h}\|^2)$. However, as mentioned, it is beneficial to schedule the strongest user in each slot. Denote by $\mathbf{h}_{(1)}$ the received channel vector with the largest norm. Scheduling the strongest user clearly results in $C = \log(1 + P\|\mathbf{h}_{(1)}\|^2)$, while letting a group of $r$ orthogonal users with the largest channel gains utilize an uplink channel results in a sum-rate that has the following upper bound:

$$
C \le r \log\left(1 + P\|\mathbf{h}_{(1)}\|^2\right).
$$

###### Remark 1.

For the downlink channel, the sum-rate upper bound has the form [25]

$$
C \le r \log\left(1 + \frac{P}{r}\|\mathbf{h}_{(1)}\|^2\right).
$$

The difference between the uplink and the downlink originates in the power constraint applied to the transmitter. That is, in the downlink, when transmitting to a group of receivers, each receiver gets a share of the available power, while in the uplink, there is a group of transmitters that transmit to a single receiver. It should be noted that the power constraints for the downlink and the uplink are usually not equal, since the base station has a strong and steady power supply, whereas the user has a limited battery power supply.

As the sum rate is mainly influenced by the channel vectors’ gains and directions, our goal is to explore this behavior for a large number of users. Specifically, we first wish to explore the behavior of the maximal gain. Since the entries of $\mathbf{h}$ are complex Gaussian, the channel’s gain $\|\mathbf{h}\|^2$ follows a $\chi^2$ distribution with $2r$ degrees of freedom, denoted $\chi^2_{(2r)}$. We utilize the following EVT theorem.

###### Theorem 1 ([37, 38, 39]).

Let $X_1, \ldots, X_n$ be a sequence of i.i.d. random variables with distribution $F$, and let $M_n = \max(X_1, \ldots, X_n)$. If there exists a sequence of normalizing constants $a_n > 0$ and $b_n$ such that as $n \to \infty$,

$$
\Pr(M_n \le a_n x + b_n) \xrightarrow{\text{i.d.}} G(x)
$$

for some non-degenerate distribution $G$, then $G$ is of the generalized extreme value (GEV) distribution type

$$
G(x) = \exp\left\{-(1 + \xi x)^{-1/\xi}\right\}
$$

and we say that $F$ is in the domain of attraction of $G$, where $\xi$ is the shape parameter, determined by the ancestor distribution $F$.

The normalizing constants and the shape parameter of the GEV can be obtained as follows. Let $h(x)$ be the reciprocal hazard function

$$
h(x) = \frac{1 - F(x)}{f(x)} \quad \text{for } x_F \le x \le x^F,
$$

where $x_F$ and $x^F$ are the lower and upper endpoints of the ancestor distribution, respectively. The shape parameter is obtained as the following limit [37, 38]:

$$
\xi = \lim_{x \to x^F} \frac{d}{dx} h(x).
$$

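When $X_1, \ldots, X_n$ is a sequence of i.i.d. $\chi^2_{(2r)}$ variables, the asymptotic distribution of $M_n$ is a Gumbel distribution [39, p. 156]. Specifically,

$$
\Pr(M_n \le a_n x + b_n) \longrightarrow e^{-e^{-x}},
$$

where

$$
a_n = 2, \tag{2}
$$

$$
b_n = 2\left(\log n + (r-1)\log\log n - \log\Gamma(r)\right) + o(1), \tag{3}
$$

and $\Gamma(\cdot)$ is the Gamma function. In the sequel, we will also use the upper incomplete Gamma function, $\Gamma(s, x) = \int_x^{\infty} t^{s-1} e^{-t}\,dt$.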

However, for i.i.d. $\chi^2_{(2r)}$ random variables, the convergence of the maxima to the Gumbel distribution using the above normalizing constants is quite slow. That is, the approximation of the maximal value will not be tight for moderate values of $n$. Hence, a more appropriate set of normalizing constants for the $\chi^2_{(2r)}$ distribution, which takes into account both $n$ and $r$, should be derived. Letting $b_n$ be the $(1 - 1/n)$ quantile, i.e., $F(b_n) = 1 - 1/n$, and choosing $a_n = h(b_n)$ [37, 38], we have the following.

###### Claim 1.

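For the $\chi^2_{(2r)}$-distribution, the following normalizing constants apply.

$$
a_{n,r} = \frac{2}{n}\,\Gamma(r)\exp\left\{Q^{-1}\left(r, \tfrac{1}{n}\right)\right\} Q^{-1}\left(r, \tfrac{1}{n}\right)^{1-r}, \tag{4}
$$

$$
b_{n,r} = 2\,Q^{-1}\left(r, \tfrac{1}{n}\right) + o(a_{n,r}), \tag{5}
$$

where $Q^{-1}(r, \cdot)$ is the inverse of the regularized upper incomplete gamma function, that is, $Q(r, x) = \Gamma(r, x)/\Gamma(r)$, and the inverse is defined with respect to $x$.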

###### Proof.

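The $\chi^2$ distribution is a special case of the gamma distribution. I.e., if $X \sim \chi^2_{(2r)}$ then $X \sim \Gamma(r, \beta)$ with $\beta = 2$, where $\Gamma(r, \beta)$ is the Gamma distribution with shape parameter $r$ and scale parameter $\beta$ (the scale convention is the one matching the factor $e^{x/\beta}$ in the hazard function below). Accordingly, for the constant $b_{n,r}$ we consider the $(1 - 1/n)$ quantile of the Gamma distribution, which can be obtained by using the inverse of the regularized upper incomplete gamma function. In particular, $\beta\,Q^{-1}(r, \tfrac{1}{n})$ yields the $(1 - 1/n)$ quantile of the Gamma distribution. To attain the constant $a_{n,r}$, let us examine the hazard function of the Gamma distribution.

$$
h(x/\beta) = \frac{1 - F_{\Gamma}(x/\beta)}{f_{\Gamma}(x/\beta)} = \beta e^{x/\beta} (x/\beta)^{1-r}\,\Gamma(r)\left(1 - F_{\Gamma}(x/\beta)\right).
$$

Accordingly, for $x = b_{n,r}$ we obtain,

$$
a_{n,r} = h(b_{n,r}/\beta) = \frac{\beta}{n}\,\Gamma(r)\exp\left\{Q^{-1}\left(r, \tfrac{1}{n}\right)\right\} Q^{-1}\left(r, \tfrac{1}{n}\right)^{1-r}.
$$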

Figures 1 and 2 depict the simulation results versus the analytical results of the EVT with the new normalizing constants derived herein ( and given in (4) and (5), respectively). The maximum of -distributed random variables is compared to the Gumbel distribution predicted by EVT. Figure 1 depicts the case where while in Figure 2 for . In both cases, figures are plotted for several values of receiving antennas (). Tight convergence is clearly visible, with reasonable approximation even for users.

To see how the new normalizing constants relate to the previous ones reported in the literature, Figure 3 depicts the value of for several values of , as increases. While the new constant converges to slowly, it is constant for moderate values of , hence the tight convergence of the distribution of the maximum to the Gumbel distribution.
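The comparison between the two sets of normalizing constants can also be sketched numerically. The snippet below is a minimal illustration, not the code behind the figures: it assumes an integer number of antennas $r$ (so that $Q(r,x)$ has a closed form), drops the $o(\cdot)$ terms in (3) and (5), and uses hypothetical helper names such as `new_constants` and `max_cdf`. It evaluates the exact distribution of the maximum of $n$ i.i.d. $\chi^2$ variables with $2r$ degrees of freedom and measures its gap from the Gumbel limit under each normalization.

```python
import math


def reg_upper_gamma(r, x):
    """Regularized upper incomplete gamma Q(r, x) for integer r >= 1."""
    term, total = 1.0, 1.0
    for m in range(1, r):
        term *= x / m
        total += term
    return math.exp(-x) * total


def inv_reg_upper_gamma(r, p, tol=1e-12):
    """Solve Q(r, x) = p for x by bisection (Q is decreasing in x)."""
    lo, hi = 0.0, 1.0
    while reg_upper_gamma(r, hi) > p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_upper_gamma(r, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def new_constants(n, r):
    """(a_{n,r}, b_{n,r}) as in (4)-(5), with the o(.) term dropped."""
    q = inv_reg_upper_gamma(r, 1.0 / n)
    a = (2.0 / n) * math.gamma(r) * math.exp(q) * q ** (1 - r)
    return a, 2.0 * q


def classic_constants(n, r):
    """(a_n, b_n) as in (2)-(3), with the o(1) term dropped."""
    b = 2.0 * (math.log(n) + (r - 1) * math.log(math.log(n)) - math.lgamma(r))
    return 2.0, b


def max_cdf(n, r, x):
    """Exact Pr(M_n <= x) for n i.i.d. chi-square variables with 2r d.o.f."""
    return (1.0 - reg_upper_gamma(r, x / 2.0)) ** n


def gumbel_cdf(x):
    return math.exp(-math.exp(-x))


if __name__ == "__main__":
    n, r = 100, 4  # illustrative values only
    for name, (a, b) in (("new", new_constants(n, r)),
                         ("classic", classic_constants(n, r))):
        gap = max(abs(max_cdf(n, r, a * x + b) - gumbel_cdf(x))
                  for x in (-1.0, 0.0, 1.0, 2.0))
        print(name, "a =", a, "b =", b, "max CDF gap =", gap)
```

Working with the exact CDF of the maximum, rather than Monte Carlo samples, keeps the comparison deterministic; a simulation would be needed only to reproduce the plotted histograms.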

As we aim at analysing the capacity under practical constraints such as scheduling only a subset of the users in each time slot, we focus our attention on linear decoding at the BS. Such decoders, such as the ZF decoder or the MMSE decoder, are indeed widely used in practice. Hence, the analysis in this paper will be based on either linear decorrelation (Section 4) or the MMSE receiver (Section 6), assuming optimal coding of the resulting single user Gaussian channels, given the effective Signal to Noise Ratio (SNR).

Specifically, for the ZF receiver, focusing on the signal received from the $j$th user, rewrite (1) as:

$$
\mathbf{y} = \mathbf{h}_j x_j + \sum_{i \ne j}^{k} \mathbf{h}_i x_i + \mathbf{w}.
$$

Let be a unitary matrix representing the null space of the subspace spanned by . Since the entries of the channel vectors are i.i.d., when users transmit the subspace spanned by the vectors has rank with probability one [40, Chapter 8]. Thus, to decode, the receiver projects the received vector on the subspace spanned by , and nulls the inter-stream interference. Finally, the signal of user can be demodulated using a matched filter (i.e., maximal ratio combiner). The algorithm is given in Figure 4.

Note that a full degrees-of-freedom gain is attained when users transmit. In this case, , and , e.g., [40]. Accordingly, when using a ZF receiver, we aim at algorithms which select at most users (of the available ) in each time slot. As mentioned, since we focus on the scenario in which , the set of selected users has a crucial effect on the system capacity. Optimally, a BS would receive CSI from all users, and schedule the best users for transmission. Under the linear decorrelation above, the resulting expected capacity is

However, we wish to avoid the overhead and complexity of such a centralized process, and select a group of users, approximating the optimal selection, distributively.

While simple and intuitive, the ZF receiver is limited in its performance. The MMSE receiver, however, although still linear, maximizes the mutual information and hence achieves better performance (e.g. [40, 7, 6]). In this receiver, to decode the data stream, the receiver treats the rest of the streams as noise. It then whitens the resulting colored noise and uses a matched filter to obtain maximum SINR.

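Let $\mathbf{H}$ be a matrix whose columns are the channel vectors of the transmitting users. Similarly, let $\mathbf{H}_{(-i)}$ be the matrix $\mathbf{H}$ with its $i$th column removed and define

$$
\mathbf{R} = \left(\mathbf{H}_{(-i)} \mathbf{H}_{(-i)}^{\dagger} + \mathbf{I}\right)^{-1}. \tag{6}
$$

Then, the corresponding output SINR on the $i$th stream can be expressed by [6]:

$$
\mathrm{SINR}_i = \mathbf{h}_i^{\dagger} \mathbf{R}\, \mathbf{h}_i. \tag{7}
$$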

This SINR value will be at the basis of our analysis in Section 6.

## 4 A Distributed Algorithm

A common approach to select a single user distributively, is a threshold-based procedure, in which a capacity threshold is set, and only a user who exceeds it transmits ([29, 26]). Of course, the events in which none of the users or several users exceed the threshold should be taken into account. In this paper, however, we wish to select a group of users, and analyze the resulting capacity.

At the heart of the algorithms we suggest herein stands a similar threshold-based procedure. However, the challenge is twofold: first, selecting a threshold such that a favorable group of users exceeds it; second, analyzing the results under the various decoding procedures and in the limit of large $K$. When doing this, a few important questions arise: On which variable should a threshold be set and how many users will pass it? How can one assess the mutual interference between the users which passed? What will be the loss in this distributed procedure compared to the optimal, centralized one?

In the next three sections, we answer the above questions. We set a threshold on the channel norms, and analyze the resulting exceedance rate. We further analyze the mutual interference, in terms of the angles between the exceeding users, and conclude by analysing the resulting sum capacity, showing that a distributed algorithm can achieve the same scaling laws as a centralized one.

In particular, we suggest two distributed algorithms. In the first, described in this section, a single threshold is utilized when decoding is done using linear decorrelation. In the second, described in Section 5, we offer a set of thresholds to match a Successive Interference Cancellation (SIC) procedure.

Given the number of users $K$, we set a threshold $u_k$ on the norm $\|\mathbf{h}_i\|^2$, such that the $k$ strongest users exceed it on average. In each slot, each user estimates its channel’s norm. A user with a norm greater than the threshold transmits. We assume the transmission includes the channel vector as a low-rate preamble, so the BS has the CSI of the transmitting users. The algorithm for user $i$ is given in Figure 5.

Note that the receiver cannot recover more than $r$ data streams. That is, since for more than $r$ users the performance (both under ZF decoding and MMSE decoding in Section 6) deteriorates significantly, if more than $r$ users begin transmission simultaneously, we assume a collision occurs and the whole slot is lost (zero capacity). Similarly, since users act independently, a slot might be idle, if no user exceeded the threshold. Thus, we say that a slot is utilized if at least one user, but no more than $r$ users, are transmitting.
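The per-slot behavior described above (idle, utilized, or collision) can be illustrated with a short Monte Carlo sketch. This is not the algorithm of Figure 5 verbatim: it assumes an integer $r$ and squared channel norms drawn from a Gamma distribution with shape $r$ and scale 2 (the $\chi^2$ model with $2r$ degrees of freedom used in Section 3.2), and the function names `threshold` and `simulate_slots` are hypothetical.

```python
import math
import random


def reg_upper_gamma(r, x):
    """Regularized upper incomplete gamma Q(r, x) for integer r >= 1."""
    term, total = 1.0, 1.0
    for m in range(1, r):
        term *= x / m
        total += term
    return math.exp(-x) * total


def inv_reg_upper_gamma(r, p, tol=1e-12):
    """Solve Q(r, x) = p for x by bisection (Q is decreasing in x)."""
    lo, hi = 0.0, 1.0
    while reg_upper_gamma(r, hi) > p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_upper_gamma(r, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold(K, r, k):
    """Threshold u_k such that K * Pr(norm^2 > u_k) = k (assumed chi-square model)."""
    return 2.0 * inv_reg_upper_gamma(r, k / K)


def simulate_slots(K, r, k, slots, seed=0):
    """Fractions of idle, utilized and collided slots under the threshold rule."""
    rng = random.Random(seed)
    u = threshold(K, r, k)
    counts = {"idle": 0, "utilized": 0, "collision": 0}
    for _ in range(slots):
        # Every user compares its own squared norm to u; no coordination is used.
        exceed = sum(rng.gammavariate(r, 2.0) > u for _ in range(K))
        if exceed == 0:
            counts["idle"] += 1
        elif exceed <= r:
            counts["utilized"] += 1
        else:
            counts["collision"] += 1  # more than r streams: the slot is lost
    return {name: c / slots for name, c in counts.items()}


if __name__ == "__main__":
    print(simulate_slots(K=200, r=4, k=2, slots=500))
```

Under these assumptions the idle fraction should be close to $e^{-k}$, which is the Poisson behavior the analysis below relies on.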

The first result, Proposition 1 below, gives the sum capacity under the above distributed user selection algorithm and ZF decoding. Note that this simple proposition still includes an expectation on the channel vectors seen by the users, hence cannot give the understanding we wish regarding the sum capacity under the suggested algorithm. Still, it will be the starting point, from which we will derive the bounds which give the right insight and scaling laws.

###### Proposition 1.

For , the expected sum capacity of Algorithm Channel-Access with ZF decoding is given by

 EC(uk)

where $k$, to be optimized, is the expected number of users to exceed the threshold $u_k$, the $\mathbf{h}_i$ are the channel vectors of the users who exceeded the threshold and the $\mathbf{V}_i$ are the corresponding null spaces.

###### Proof.

According to the law of total probability, we express the expected capacity in a slot by summing over the number of users who exceed the threshold, and the sum capacity these users see, given that they exceeded the threshold. As mentioned, if more than $r$ users are transmitting in a slot, the receiver cannot successfully null the inter-stream interference, and the capacity in that slot is zero.

Hence, the expected capacity has the form:

$$
\sum_{j=1}^{r} \Pr\{j \text{ users exceed}\} \sum_{i=1}^{j} \mathrm{E}\left[C_i \,\middle|\, \|\mathbf{h}_i\|^2 > u_k\right].
$$

When the users are i.i.d., the probability of threshold exceedances follows the binomial distribution with probability to exceed the threshold. Since we consider large and small values of , the number of users to exceed threshold can be approximated by the Poisson distribution with an approximation error in the order of . This is based on the Poisson Point Process Approximation developed in [29] for the single-user scenario. In short, this method gives a Poisson approximation for the number of users exceeding a high threshold . However, as this approximation error is within the sum, it is multiplied by the individual capacities, which scales like the optimal scaling law of the multi-user diversity when a single, strongest user, is scheduled, i.e., (see e.g., [21, 18] and the references therein). Hence, the approximation error. Finally, note that the number of exceeding users affects the effective SNR seen by the attending users. In particular, when users exceed threshold, the dimension of is . Thus, as decreases, the signal of the attending users is projected on a less restrictive null-space. Accordingly, each stream may spread on more receiving antennas in the ZF process, which leads to a higher power gain (for details on ZF decoding, see [40]). Nonetheless, the reader should not be confused. The highest capacity is attained when users utilize the channel simultaneously to achieve a full degrees-of-freedom gain. ∎
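The quality of the Poisson approximation used in this proof can be checked directly. The sketch below compares the binomial probability that exactly $j$ out of $K$ users exceed the threshold, each with probability $k/K$, to the Poisson probability with mean $k$. The function names and the chosen values of $K$ and $k$ are illustrative assumptions, not taken from the paper.

```python
import math


def binom_pmf(K, p, j):
    """Pr{exactly j of K independent users exceed}, each with probability p."""
    return math.comb(K, j) * p ** j * (1.0 - p) ** (K - j)


def poisson_pmf(k, j):
    """Poisson probability of j exceedances when k users exceed on average."""
    return math.exp(-k) * k ** j / math.factorial(j)


def max_gap(K, k, jmax):
    """Largest pointwise gap between the two laws for j = 0..jmax."""
    return max(abs(binom_pmf(K, k / K, j) - poisson_pmf(k, j))
               for j in range(jmax + 1))


if __name__ == "__main__":
    for K in (100, 1000, 10000):
        print(K, max_gap(K, 4.0, 20))
```

The gap shrinks as $K$ grows with $k$ fixed, which is the regime considered here.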

To ease notation, the approximation error is omitted from now on.

To evaluate the result in Proposition 1, the behavior of should be understood, especially considering the fact that the number of users exceeding the threshold is random. To this end, the following upper and lower bounds are useful. These bounds will be the basis of the scaling laws we derive.

###### Lemma 1.

The expected sum capacity of Algorithm Channel-Access with ZF decoding satisfies the following upper bound

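$$
\mathrm{E}\,C(u_k) \le \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j \log\left(1 + \frac{P}{r}(r-j+1)\left(u_k + a_{K,r}\right)\right),
$$

where $a_{K,r}$ is given by (4) and $u_k$ is the threshold set such that $k$ users exceed it on average.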

The bound in Lemma 1, while not giving the exact capacity, still depicts the essence of the system behavior. To understand its implications, we note the following: We set a threshold such that out of the users exceed it on average. I.e., the average exceedance rate is . Indeed, the expression in the sum over gives the probability for exactly users exceeding. Each of the users, under zero forcing, experiences a single user channel, with its power scaled according to two factors: (i) a multiplication by , as this is the average norm of its channel vector, where is the threshold exceeded, and is the average distance above the threshold. (ii) a multiplication by , as in case only users exceeded the threshold, the zero forcing algorithm does not have to cancel users, only , hence the null space has a larger dimension, yet the number of receive antennas is . As the threshold will be shown to be , the optimal scaling law will follow. A complete discussion will be given after the lower bound is introduced. Indeed, as it turns out in the simulation results, the bound in Lemma 1 is tight even for relatively small number of antennas and users. Note, however, that if less than users exceed, as a higher SNR can be attained at the receiver, in order to achieve the capacity in this case, a user must know how many users exceeded, so it can exploit the high SNR for, e.g., higher transmission rate. Hence, we require that the number of users that actually exceeded the threshold will be announced.

###### Remark 2.

Note that in practice it is beneficial to choose $k$ slightly smaller than the number of antennas $r$. This is because if fewer than $r$ users exceed the threshold, the SNR seen by each user is only larger, yet if more than $r$ users exceed it, the slot is lost.
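To get a feel for the upper bound of Lemma 1 and for the remark on choosing $k$, the bound can be evaluated over a grid of $k$ values. The following is a rough sketch under stated assumptions: integer $r$, natural logarithms (capacity in nats), $u_k$ taken as the exact $\chi^2$ quantile with $K \Pr(\|\mathbf{h}\|^2 > u_k) = k$, $a_{K,r}$ computed from (4) without its $o(\cdot)$ correction, and illustrative values of $K$, $r$ and $P$. All function names are hypothetical.

```python
import math


def reg_upper_gamma(r, x):
    """Regularized upper incomplete gamma Q(r, x) for integer r >= 1."""
    term, total = 1.0, 1.0
    for m in range(1, r):
        term *= x / m
        total += term
    return math.exp(-x) * total


def inv_reg_upper_gamma(r, p, tol=1e-12):
    """Solve Q(r, x) = p for x by bisection (Q is decreasing in x)."""
    lo, hi = 0.0, 1.0
    while reg_upper_gamma(r, hi) > p:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if reg_upper_gamma(r, mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold(K, r, k):
    """u_k such that k of the K users exceed it on average."""
    return 2.0 * inv_reg_upper_gamma(r, k / K)


def excess_constant(K, r):
    """a_{K,r} as in (4), evaluated at n = K."""
    q = inv_reg_upper_gamma(r, 1.0 / K)
    return (2.0 / K) * math.gamma(r) * math.exp(q) * q ** (1 - r)


def zf_upper_bound(K, r, k, P):
    """Right-hand side of the Lemma 1 bound, in nats per slot."""
    gain = threshold(K, r, k) + excess_constant(K, r)
    total = 0.0
    for j in range(1, r + 1):
        poisson = math.exp(-k) * k ** j / math.factorial(j)
        total += poisson * j * math.log(1.0 + (P / r) * (r - j + 1) * gain)
    return total


if __name__ == "__main__":
    K, r, P = 1000, 4, 1.0  # illustrative values only
    grid = [0.5 * s for s in range(1, 2 * r + 1)]
    best = max(grid, key=lambda k: zf_upper_bound(K, r, k, P))
    for k in grid:
        print(k, zf_upper_bound(K, r, k, P))
    print("k maximizing the bound on this grid:", best)
```

Scanning the printed values shows how the Poisson weights trade off idle slots against collisions; the location of the maximizer depends on the assumed $K$, $r$ and $P$.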

###### Proof (Lemma 1).

 EC(uk) = (8) ≤

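Consider the norm $\|\mathbf{V}_i \mathbf{h}_i\|^2$, where $\mathbf{V}_i$ has $r-j+1$ rows. Denoting by $\mathbf{V}_i^{(m)}$ the $m$th row of $\mathbf{V}_i$, we have

$$
\begin{aligned}
\mathrm{E}\left[\|\mathbf{V}_i \mathbf{h}_i\|^2 \,\middle|\, \|\mathbf{h}_i\|^2 > u_k\right]
&= \mathrm{E}\left[\sum_{m=1}^{r-j+1} \left|\langle \mathbf{V}_i^{(m)}, \mathbf{h}_i \rangle\right|^2 \,\middle|\, \|\mathbf{h}_i\|^2 > u_k\right] \\
&\stackrel{(a)}{=} \sum_{m=1}^{r-j+1} \mathrm{E}\left[\|\mathbf{h}_i\|^2 \frac{\left|\langle \mathbf{V}_i^{(m)}, \mathbf{h}_i \rangle\right|^2}{\|\mathbf{h}_i\|^2 \|\mathbf{V}_i^{(m)}\|^2} \,\middle|\, \|\mathbf{h}_i\|^2 > u_k\right] \\
&\stackrel{(b)}{=} \mathrm{E}\left[\|\mathbf{h}_i\|^2 \,\middle|\, \|\mathbf{h}_i\|^2 > u_k\right] \sum_{m=1}^{r-j+1} \mathrm{E}\left[\frac{\left|\langle \mathbf{V}_i^{(m)}, \mathbf{h}_i \rangle\right|^2}{\|\mathbf{h}_i\|^2 \|\mathbf{V}_i^{(m)}\|^2}\right] \\
&\stackrel{(c)}{=} \mathrm{E}\left[\|\mathbf{h}_i\|^2 \,\middle|\, \|\mathbf{h}_i\|^2 > u_k\right] (r-j+1) \int_0^1 (1-\alpha)^{r-1}\, d\alpha \\
&\stackrel{(d)}{=} \left(u_k + a_{K,r}\right)(r-j+1)\frac{1}{r}.
\end{aligned}
$$

In the above chain of equalities, (a) is since the rows of the unitary matrix $\mathbf{V}_i$ have unit norm, $\|\mathbf{V}_i^{(m)}\|^2 = 1$; (b) is since $\mathbf{h}_i$ is a random i.i.d. complex normal vector, and the squared-normalized inner product is its angle from $\mathbf{V}_i^{(m)}$, a vector in the null space of the other users' channel vectors. Since these vectors are independent of $\mathbf{h}_i$, this angle is independent of the norm of $\mathbf{h}_i$; (c) is since the distributions of the norms and angles are independent of $m$, and since, by [24, Lemma 3.2], the angle has the same distribution as the minimum of $r-1$ independent uniform random variables (i.e., with CDF $1 - (1-\alpha)^{r-1}$, $0 \le \alpha \le 1$); (d) is the result of computing the expected norm of an i.i.d. complex normal random vector, given that it is above a threshold which is exceeded by only $k$ out of $K$ norms on average. The details are in Corollary 1, Section 7.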

Substituting in (8), we have

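$$
\mathrm{E}\,C(u_k) \le \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!} \sum_{i=1}^{j} \log\left(1 + \frac{P}{r}\left(u_k + a_{K,r}\right)(r-j+1)\right) \le \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j \log\left(1 + \frac{P}{r}\left(u_k + a_{K,r}\right)(r-j+1)\right),
$$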

which completes the proof. ∎

We now present a corresponding lower bound.

###### Lemma 2.

The expected sum capacity of Algorithm Channel-Access with ZF decoding satisfies the following lower bound.

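$$
\mathrm{E}\,C(u_k) \ge \left(\sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j\right)(r-1)\int_0^1 (1-\alpha)^{r-2} \log\left(1 + P u_k \alpha\right) d\alpha,
$$

where $u_k$ is a threshold set such that $k$ users exceed it on average.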

It is important to note that the integral in Lemma 2 above has a finite series expansion with summands. This finite series has at the leading term, resulting in the expected scaling law. We describe it in Claim 2 below, within the proof of the main result in this section - Theorem 2.

###### Proof.

Following the derivations of the upper bound, we have

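$$
\begin{aligned}
\mathrm{E}\,C(u_k)
&\stackrel{(a)}{\ge} \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!} \sum_{i=1}^{j} \mathrm{E}\log\left(1 + P u_k \sum_{m=1}^{r-j+1} \frac{\left|\langle \mathbf{V}_i^{(m)}, \mathbf{h}_i \rangle\right|^2}{\|\mathbf{h}_i\|^2 \|\mathbf{V}_i^{(m)}\|^2}\right) \\
&\stackrel{(b)}{=} \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j\, \mathrm{E}\log\left(1 + P u_k \sum_{m=1}^{r-j+1} \frac{\left|\langle \mathbf{V}_{i'}^{(m)}, \mathbf{h}_{i'} \rangle\right|^2}{\|\mathbf{h}_{i'}\|^2 \|\mathbf{V}_{i'}^{(m)}\|^2}\right) \\
&\ge \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j\, \mathrm{E}\log\left(1 + P u_k \frac{\left|\langle \mathbf{V}_{i'}^{(1)}, \mathbf{h}_{i'} \rangle\right|^2}{\|\mathbf{h}_{i'}\|^2 \|\mathbf{V}_{i'}^{(1)}\|^2}\right) \\
&\stackrel{(c)}{=} \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j \int_0^1 (r-1)(1-\alpha)^{r-2} \log\left(1 + P u_k \alpha\right) d\alpha,
\end{aligned}
$$

where (a) is since the norms of all users participating are above the threshold $u_k$; (b) is since the angles in the inner sum are identically distributed and independent of $i$, hence an arbitrary $i'$ can be used; (c) is by explicitly computing the expectation over the angle between $\mathbf{V}_{i'}^{(1)}$ and $\mathbf{h}_{i'}$, remembering that it has a density $(r-1)(1-\alpha)^{r-2}$ for $0 \le \alpha \le 1$. This completes the proof. ∎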

The results above lead to the following scaling law, which is the main result in this section. It asserts that the scaling law of for the sum rate in a multi-user system can in fact be achieved distributively, without collecting all channel states from all users and scheduling them in a centralized manner. In other words, the threshold based algorithm suggested selects an optimal set of users (asymptotically in the number of users) distributively and without any cooperation. This is summarized in the next theorem.

###### Theorem 2.

The expected sum capacity of Algorithm Channel-Access with ZF decoding scales as for large enough number of users .

###### Proof.

By Lemma 1,

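$$
\begin{aligned}
\mathrm{E}\,C(u_k) &\le \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j \log\left(1 + \frac{P(r-j+1)\left(u_k + a_{K,r}\right)}{r}\right) \\
&\le \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j \log\left(1 + P\left(u_k + a_{K,r}\right)\right) \\
&\le r \log\left(1 + P\left(u_k + a_{K,r}\right)\right).
\end{aligned}
$$

On the other hand, consider the lower bound given in Lemma 2. Since $k$ is a parameter to be optimized, the optimum is at least as large as when choosing $k = r$. We have

$$
\sum_{j=1}^{r} \frac{r^j e^{-r}}{j!}\, j = r \sum_{j=0}^{r-1} \frac{r^j e^{-r}}{j!} \ge 0.4\, r,
$$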

where the last inequality is by evaluating the sum at $r = 2$. Note that larger values of $r$ give only slightly larger values, with the coefficient of $r$ approaching $1/2$ as $r \to \infty$. (The sum on the right-hand side is the CDF of a Poisson random variable with parameter $r$, calculated at $r-1$. The limiting behavior can be found in [41].)
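The constant in this inequality is easy to check numerically. The short sketch below evaluates the left-hand side for $k = r$ and compares it with $r$ times the Poisson CDF at $r-1$; the function names are illustrative and not part of the paper.

```python
import math


def utilized_weight(r):
    """Sum over j = 1..r of j * r^j * e^{-r} / j! (the case k = r)."""
    return sum(j * r ** j * math.exp(-r) / math.factorial(j)
               for j in range(1, r + 1))


def poisson_cdf(lam, m):
    """Pr{X <= m} for a Poisson random variable X with mean lam."""
    return sum(lam ** j * math.exp(-lam) / math.factorial(j)
               for j in range(m + 1))


if __name__ == "__main__":
    for r in (2, 4, 8, 16, 32):
        print(r, utilized_weight(r) / r, poisson_cdf(r, r - 1))
```

For each $r$ the two printed ratios coincide, and for $r \ge 2$ they stay between 0.4 and 0.5.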

Now, consider the integral over $\alpha$ in Lemma 2. In Section 7, we prove the following claim.

###### Claim 2.

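The integral over $\alpha$ in Lemma 2 has the following finite series expansion:

$$
(r-1) \int_0^1 (1-\alpha)^{r-2} \log(1 + P u \alpha)\, d\alpha = \left(\frac{1 + Pu}{Pu}\right)^{r-1} \log(1 + Pu) - \sum_{i=0}^{r-2} \left(\frac{1 + Pu}{Pu}\right)^{i} \frac{1}{r-1-i}.
$$

This gives a finite series expansion for the integral, in terms of the power $P$ and the threshold $u$. For example, for $r = 4$ we have

$$
(r-1) \int_0^1 (1-\alpha)^{r-2} \log(1 + P u \alpha)\, d\alpha = \frac{6(1+Pu)^3 \log(1+Pu) - 2u^3P^3 - 3u^2P^2(1+Pu) - 6uP(1+Pu)^2}{6u^3P^3}. \tag{9}
$$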

Thus, the integral can be easily approximated by .
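The expansion in Claim 2 can be checked numerically. The sketch below compares a composite Simpson approximation of the integral with the finite series; here `c` stands for the product $Pu$, and both function names are illustrative rather than taken from the paper.

```python
import math

def integral_numeric(r, c, n=20_000):
    """Composite Simpson approximation (n even) of
    (r-1) * int_0^1 (1-a)^(r-2) * log(1 + c*a) da, with c = P*u."""
    def f(a):
        return (r - 1) * (1 - a) ** (r - 2) * math.log(1 + c * a)
    h = 1.0 / n
    total = f(0.0) + f(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def integral_closed_form(r, c):
    """Finite series of Claim 2, with q = (1 + c) / c and c = P*u."""
    q = (1 + c) / c
    series = sum(q ** i / (r - 1 - i) for i in range(r - 1))
    return q ** (r - 1) * math.log(1 + c) - series

for r in (2, 3, 4, 6):
    for c in (0.5, 10.0, 100.0):
        print(r, c, integral_numeric(r, c), integral_closed_form(r, c))
```

For each pair the two columns should agree to many decimal places.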

Since we consider the regime of large enough number of users , yet a finite number of antennas , we have and as a result . In fact, since the distribution of the projected channel gain seen by a user in this decoding scheme is the exponential distribution with rate , it can be shown that (we discuss the threshold value in detail in Section 7.2). This gives rise to the scaling law. ∎

###### Remark 3.

It is well known that linear decorrelation is asymptotically optimal at high SNR. The suggested algorithms pick the users with the highest SNR. Accordingly, they operate at high SNR and lose only a constant fraction of the capacity of the optimal centralized scheduling scheme.

Extensive simulations were conducted to compare the analytical bounds derived above to real world situations with a finite number of users. In Figure 6, we compare the bounds on the expected capacity of the threshold-based scheduling scheme under ZF-receiver (i.e., Lemma 1 and Lemma 2), to the simulation results. The bounds and simulation are for users, with and receiving antennas at the BS. The tightness of the upper bound is clearly visible. While the lower bound is looser, it still gives the correct behaviour as a function of the number of users passing the threshold on average, . Indeed, it is clear from the figure that a key factor affecting the system performance is the number of users passing the threshold. This distribution gives the graphs their Poisson-like shape.

To see that the trend holds even for a relatively small number of users, Figure 7 includes the same plots for . Note that even for users the algorithm manages to achieve a significant rate (compared, e.g., to the one achieved with users). This means the essence of the multi-user diversity is exploited by the algorithm even for a relatively small number of users. Note, however, that as approaches then the accuracy of the bounds decreases.

The results in Figure 6 and Figure 7 give the expected sum capacity as a function of . The capacity distribution, compared to different algorithms, will be given in the subsequent sections.

## 5 A Multi-Threshold, SIC-Based Algorithm

In the previous section, only a single threshold was used, and a user’s data was decoded by projecting the received signal on the null space of the sub-space spanned by the channels of the interfering users. While this algorithm managed to capture the key performance-enhancing aspects of the multi-user system, and achieve the optimal scaling laws distributively, better results can be expected for low to moderate number of users. In that regime, harnessing additional multiple access techniques and a more sophisticated thresholding algorithm can achieve better performance. In this section, we see that this is indeed so.

When using SIC [40, Ch.6], the decoder uses the decoded signal of a previous user to decode the next one (by subtracting it from the received stream). After iterations of SIC, all data streams are decoded. With this in mind, we derive a second, iterative algorithm, to further utilize the benefits of SIC in a threshold based algorithm. At first glance, for a large number of users, a set of thresholds should be chosen, such that in each iteration only a single user exceeds the threshold on average. This way, one can adapt the threshold set according to the users “admitted” thus far (that is, users who already passed). For example, one can set the next threshold based on the angles of the previously admitted users.

However, it is very likely that in some of the iterations, more than one user, or no user, will exceed the threshold. Specifically, the probability that exactly one user will exceed in an iteration is approximately $1/e$. Yet, if a carrier sense or collision detection mechanism is available, collisions can be resolved by allocating a few mini-slots devoted to finding the strongest user in each iteration [26, 29].
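To quantify how often a single-threshold iteration succeeds, the following sketch assumes $K$ users that each exceed the threshold independently with probability $1/K$ (so that one user exceeds on average). This independence model and the function names are assumptions made for illustration.

```python
import math

def prob_exactly_one(K):
    """Probability that exactly one of K users exceeds the threshold."""
    p = 1.0 / K
    return K * p * (1 - p) ** (K - 1)

def prob_none(K):
    """Probability that no user exceeds the threshold (idle iteration)."""
    return (1 - 1.0 / K) ** K

def prob_collision(K):
    """Probability that two or more users exceed the threshold."""
    return 1 - prob_none(K) - prob_exactly_one(K)

for K in (10, 100, 10_000):
    print(K, prob_exactly_one(K), prob_none(K), prob_collision(K))
print("1/e =", math.exp(-1))
```

Under this model both the success and the idle probabilities approach $1/e$, leaving a collision probability of about $1 - 2/e$, which is why a collision resolution mechanism is useful.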

This gives rise to the following algorithm: After the strongest user is found, it begins its transmission by announcing its channel vector. In the $l$th iteration, the rest of the users project their channel vectors on the orthonormal basis $V^{(l-1)}$, which spans the null space of the channel vectors announced thus far. Now, the scheduled user for iteration $l$ is the one with the strongest projection. The process ends when $r$ users are selected. Note that this involves announcements of the channel vectors only from the selected users. The algorithm is described in Figure 8.
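The selection rule described above can be sketched as a centralized simulation. This is only an illustration of the outcome of the procedure, not the distributed mini-slot protocol itself: it assumes i.i.d. CN(0,1) channel entries, picks the maximum directly instead of using thresholds and collision resolution, and uses hypothetical helper names and parameter values (`K`, `r`, `P`).

```python
import math
import random

def inner(a, b):
    """Hermitian inner product <a, b> = sum conj(a_n) * b_n."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def project_out(h, basis):
    """Component of h orthogonal to every vector of an orthonormal basis."""
    residual = list(h)
    for q in basis:
        coeff = inner(q, residual)
        residual = [x - coeff * y for x, y in zip(residual, q)]
    return residual

def select_users(channels, r):
    """Greedily pick r users; in each iteration the user with the largest
    gain after projecting out the already announced channels is chosen."""
    basis, chosen, gains = [], [], []
    for _ in range(r):
        best, best_gain, best_res = None, -1.0, None
        for i, h in enumerate(channels):
            if i in chosen:
                continue
            res = project_out(h, basis)
            gain = inner(res, res).real
            if gain > best_gain:
                best, best_gain, best_res = i, gain, res
        norm = math.sqrt(best_gain)
        basis.append([x / norm for x in best_res])  # extend orthonormal set
        chosen.append(best)
        gains.append(best_gain)
    return chosen, gains

def random_channels(K, r, seed=0):
    """K channel vectors in C^r with i.i.d. CN(0,1) entries (assumed model)."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(r)]
            for _ in range(K)]

K, r, P = 200, 4, 10.0  # illustrative values only
channels = random_channels(K, r)
chosen, gains = select_users(channels, r)
sum_rate = sum(math.log(1 + P * g) for g in gains)  # in nats
print(chosen, [round(g, 3) for g in gains], round(sum_rate, 3))
```

The first gain is simply the largest channel norm, and later gains cannot exceed earlier ones, since each projection is onto a smaller subspace.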

When a user projects its channel on , the resulting gain distribution follows the -distribution with degrees of freedom [40]. That is, . Let be the channel with the strongest norm in iteration , i.e.,

$$
h^{(l)} = \max_{1 \le i \le K} \|V^{(l-1)} h_i\|^2.
$$

Note that in practice, the BS does not have to search for the maximum above, but, instead, a threshold is set, only the strongest users pass, and one is selected using a collision resolution mechanism. Utilizing the EVT for the -distribution, the norm can then be characterized. We have the following.

###### Proposition 2.

For sufficiently large , the expected sum capacity of Algorithm SIC-Channel-Access with ZF-SIC decoding is given by:

$$
C_{av}\left(\{u^{(l)}\}_{l=1}^{r}\right) = \sum_{l=1}^{r} E\left[\log\left(1 + P \|h^{(l)}\|^2\right)\right],
$$

where is the channel vector of with the largest norm after the projection.

Note that the above result assumes each transmitting user can adapt its transmission rate to be decoded successfully at the base station. This is possible as each user knows the channel vectors of the interfering users and its own channel vector, and thus can calculate the SNR the BS will experience when decoding. Furthermore, a splitting algorithm is used [26], hence the Poisson coefficients can be omitted.

Again, to evaluate the performance of Proposition 2, the distribution of should be examined. We now derive upper and lower bounds to better evaluate the performance under the suggested algorithm.

###### Lemma 3.

The expected sum capacity of Algorithm SIC-Channel-Access with ZF-SIC decoding satisfies the following upper bound.

$$
C_{av} \leq \sum_{l=1}^{r} \log\left(1 + P\left(b_{K,r-l+1} + \gamma\, a_{K,r-l+1}\right)\right),
$$

where $a_{K,r-l+1}$ and $b_{K,r-l+1}$ are the normalizing constants given in (4) and (5), respectively, and $\gamma$ is the Euler-Mascheroni constant.

Note that (as it converges to in equation (3)), hence the same scaling law is achieved.

###### Proof.

By Jensen's inequality,

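$$
\begin{aligned}
C_{av}\left(\{u^{(l)}\}_{l=1}^{r}\right) &= \sum_{l=1}^{r} E\left[\log\left(1 + P \|h^{(l)}\|^2\right)\right] \\
&\leq \sum_{l=1}^{r} \log\left(1 + P\, E\left[\|h^{(l)}\|^2\right]\right) \\
&= \sum_{l=1}^{r} \log\left(1 + P\, E\left[\max_{1 \le i \le K} \|V^{(l-1)} h_i\|^2\right]\right).
\end{aligned}
$$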

To analyse the expectation, some care is needed. Without conditioning in the $l$-th iteration on the event that the remaining users did not come out ahead in the previous iterations, that is, “allowing” all users to compete again, the norm $\|V^{(l-1)} h_i\|^2$ follows the $\chi^2$-distribution. Hence, by Theorem 1, $\|h^{(l)}\|^2$ follows the Gumbel distribution, with normalizing constants $a_{K,r-l+1}$ and $b_{K,r-l+1}$ given in (4) and (5), respectively. However, when such a conditioning is applied, the expectation can only decrease, as the previous users are not allowed to participate. Thus, the expected maximal norm is upper bounded by the expected value of the Gumbel distribution with the above parameters. Lemma 3 then follows. ∎
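The "location plus $\gamma$ times scale" form of the bound is the mean of a Gumbel distribution. For intuition, the sketch below looks at the simplest special case, assumed here purely for illustration, of $K$ i.i.d. unit-rate exponential gains, for which the standard extreme-value normalization uses scale $1$ and location $\ln K$. The constants in (4) and (5) for the $\chi^2$ case are not reproduced here, and the function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_exponential(K):
    """Exact E[max of K i.i.d. Exp(1)] = H_K, the K-th harmonic number."""
    return sum(1.0 / j for j in range(1, K + 1))

def gumbel_mean(location, scale):
    """Mean of a Gumbel distribution with the given location and scale."""
    return location + EULER_GAMMA * scale

for K in (10, 100, 1000, 10_000):
    exact = expected_max_exponential(K)
    approx = gumbel_mean(math.log(K), 1.0)
    print(K, round(exact, 5), round(approx, 5))
```

The gap between the exact expectation and the Gumbel mean shrinks roughly like $1/(2K)$ in this special case.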

To bound the performance from below, we only derive a probabilistic lower bound. That is, a set of fixed, relatively low threshold values is computed, such that in each iteration, we can guarantee that the norm of the strongest channel is greater than that value with high probability. However, in order to attain a non-trivial bound, these values need to be as high as possible, while still maintaining the lower bound with high probability. Specifically, in each iteration, we choose a threshold value that is greater than the strongest norm with probability (hence the probability that no user exceeds it is ). This corresponds to a threshold that users exceed on average. Nonetheless, depending on the strictness of the bound, a different threshold set can be chosen. Accordingly, we have the following.

###### Lemma 4.

The expected sum capacity of Algorithm SIC-Channel-Access with ZF-SIC decoding satisfies the following lower bound with high probability.

$$
C_{av} \geq \sum_{l=1}^{r} \log\left(1 + u^{(l-1)}_{\log K}\right)
$$

where is a threshold such that projected channel norms exceeds it on average, in iteration .

###### Proof.

Let us consider the probability that no user exceeds the threshold on iteration . This can be expressed as

 Pr(∥h(l)∥2

Since $\log$ is monotone,

 Pr(log(1+P∥h(l)∥2)

Let , namely, the event in which the strongest channel in iteration is greater than threshold . We have

$$
\Pr\left(\bigcap_{l=1}^{r} A^{(l)}\right) = 1 - \Pr\left(\bigcup_{l=1}^{r} \overline{A^{(l)}}\right) \geq 1 - \sum_{l=1}^{r} \Pr\left(\overline{A^{(l)}}\right) = 1 - r\,O(1/K).
$$

That is, the probability that at least one user exceeds the threshold in each iteration is greater than $1 - r\,O(1/K)$. As a result, since each l.h.s. element is greater than the threshold with high probability, we conclude that

$$
\sum_{l=1}^{r} \log\left(1 + P \|h^{(l)}\|^2\right) \geq \sum_{l=1}^{r} \log\left(1 + u^{(l)}_{\log K}\right),
$$

with high probability. Consequently,

$$
\sum_{l=1}^{r} E\left[\log\left(1 + P \|h^{(l)}\|^2\right)\right] \geq \sum_{l=1}^{r} \log\left(1 + u^{(l)}_{\log K}\right).
$$

Note that the SIC scheme above achieves the same scaling law as the former linear decorrelation scheme (clearly, one cannot do better asymptotically), yet it also achieves a higher power gain, i.e., there is a boost in performance since the effective SNR is increasing in each iteration. This is seen in the simulations below, where the results are better for the finite population we tested. Specifically, Figure 9 depicts the gain of multi-user scheduling with and without SIC, and compares it to scheduling schemes which schedule a single user at a time.

###### Remark 4.

Note that the capacity distribution in each iteration under SIC can be approximated by the Gumbel distribution. Further, the sum-capacity distribution under SIC is also approximated by the Gumbel distribution, since the Gumbel distribution is infinitely divisible [5]. This is also consistent with Figure 10, which zooms in on the comparison between the capacity distributions: when the receiver uses ZF versus ZF-SIC to decode the data streams.

###### Remark 5.

Note that in the suggested SIC algorithm, the users are naturally ordered from the strongest to the -strongest user. Hence, to improve the sum capacity, one might devise a distributed water-filling algorithm (e.g. [9]), such that each user will invest a power amount proportional to its order.

The ZF receiver discussed thus far is, on the one hand, simple enough to facilitate rigorous analysis, yet, as shown in the previous sections, powerful enough in the sense that with intelligent user selection (in this paper, distributed) it can achieve the optimal scaling laws. Still, this is not the optimal linear receiver. In this section, we explore the scaling laws of the expected capacity of the MMSE receiver.

As mentioned, in this case we let denote a matrix whose columns are the channel vectors of the transmitting users. That is, when using the Channel-Access algorithm, vectors with norm greater than the threshold. is the matrix with its column removed. Under these definitions, the SINR seen at the th stream was given in (7). Let denote the set of channels with norm greater than a threshold. Then, the expected capacity under the threshold based scheduling algorithm in Section 4 is as follows.

###### Proposition 3.

For , the expected sum capacity with MMSE decoding is

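$$
E[C(u_k)] = \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!} \sum_{i=1}^{j} E\left[\log\left(1 + P\, h_i^{\dagger} R\, h_i\right) \,\middle|\, \|h_s\|^2 > u_k,\ \forall s \in S\right] + O\left(\frac{\log\log K}{K}\right)
$$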

where $u_k$ is the threshold set such that $k$ users exceed it on average.

This expected capacity should be optimized over $k$. The proof is similar to the ZF setting. The only change is in the SNR seen by the users, as reflected by the term within the $\log$.

###### Remark 6.

When ZF decoding was used, the effective SNR seen by a user was . As the matrix is unitary, it is clear why scaling up resulted in scaling up the SNR. In the MMSE case, however, the analysis is intricate. The SNR is . While scales up, may appear to scale down as the eigenvalues of scale up. However, note that is not full rank, hence has at least one zero eigenvalue. As a result, has at least one eigenvalue which does not scale down.

To evaluate the expected capacity in Proposition 3, the characteristics of the random variables should be understood, especially when the number of transmitting users is random. Further, the influence of the norms on should be evaluated. The key technical challenge, however, is that the norm condition induces dependence among the matrix elements, hence the random matrix theory usually used in the MIMO literature does not apply. Part of the contribution of this section is in bringing new tools to tackle this problem.

Note that in this section, regular type letters represent random variables as well as scalar variables. The difference will be clear from the context. As a first tool to handle the dependence within the matrix entries, we start with the following claim:

###### Claim 3.

Assume the norms are above a given threshold . Then the following properties hold:

• The entries of the channel vector remain zero mean.

• The variance of each entry in the vector scales. In particular, it is equal to

$$
E\left[|h_{i,n}|^2 \,\middle|\, \|h_i\|^2 > u_k\right] = \frac{u_k + a_{K,r}}{r}.
$$

• The vector elements remain uncorrelated in pairs.

Note that, to begin with, the entries of are i.i.d. The claim states that conditioned on exceeding a threshold, while not i.i.d., they sustain the zero correlation. This property will be useful throughout the remainder of this paper. The proof of Claim 3 is deferred to Section 7.
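The qualitative parts of Claim 3 can be checked by simulation. The sketch below assumes i.i.d. CN(0,1) entries and uses rejection sampling with an illustrative threshold `u`. It estimates the conditional mean of an entry, the cross-correlation between two entries, and the per-entry second moment, which by symmetry should equal the conditional mean of $\|h_i\|^2$ divided by $r$. The closed form $(u_k + a_{K,r})/r$ itself is not checked, since $a_{K,r}$ is defined in (4), outside this section; all names and parameter values are illustrative.

```python
import random

def conditional_samples(r, u, n_draws, seed=0):
    """Draw h in C^r with i.i.d. CN(0,1) entries; keep those with ||h||^2 > u."""
    rng = random.Random(seed)
    s = 0.5 ** 0.5
    kept = []
    for _ in range(n_draws):
        h = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(r)]
        if sum(abs(x) ** 2 for x in h) > u:
            kept.append(h)
    return kept

def summarize(kept):
    """Empirical conditional moments of the retained vectors."""
    n = len(kept)
    mean0 = sum(h[0] for h in kept) / n                      # E[h_1 | norm > u]
    second0 = sum(abs(h[0]) ** 2 for h in kept) / n          # E[|h_1|^2 | norm > u]
    cross = sum(h[0] * h[1].conjugate() for h in kept) / n   # E[h_1 h_2^* | norm > u]
    mean_norm = sum(sum(abs(x) ** 2 for x in h) for h in kept) / n
    return mean0, second0, cross, mean_norm

r, u = 2, 4.0  # illustrative values only
kept = conditional_samples(r, u, 100_000)
mean0, second0, cross, mean_norm = summarize(kept)
print("accepted:", len(kept))
print("entry mean:", mean0)
print("cross-correlation:", cross)
print("second moment:", second0, "vs mean norm / r:", mean_norm / r)
```

For $r = 2$ and $u = 4$, the squared norm is Gamma distributed with shape 2 and its conditional mean is $(u^2 + 2u + 2)/(1 + u) = 5.2$, so the per-entry second moment should be close to $2.6$, well above the unconditional value of $1$.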

### 6.1 Threshold-based MMSE Upper Bound

Now we are ready to derive the scaling-law of the MMSE receiver, using the following upper and lower bounds on the threshold based expected capacity.

###### Lemma 5.

The expected sum capacity of Algorithm Channel-Access with MMSE decoding satisfies the following upper bound.

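$$
E[C(u_k)] \leq \sum_{j=1}^{r} \frac{k^j e^{-k}}{j!}\, j \log\left(1 + P\left(1 - \frac{(j-1)\, u_k^2}{r\left(\left(1 + \frac{j}{r}\right)(u_k + a_{K,r})^2 + a_{K,r}(a_{K,r} + 1) + u_k\right)}\right)(u_k + a_{K,r})\right),
$$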

where $a_{K,r}$ is given by (4) and $u_k$ is the threshold set such that $k$ users exceed it on average.

Before we prove the lemma, it is interesting to compare the scaling law under MMSE decoding to that achieved with ZF decoding. In both Lemma 1 and Lemma 5, the capacity seen by each user is approximately , for some constant . Asymptotically, it follows that , and in both cases, the capacity gain comes from the threshold value , which is, as mentioned, . This gives the growth rate of per user. However, note that while in Lemma 1, for large enough the gain in Lemma 5 is asymptotically , that is, a larger gain for any . This is not surprising, as an MMSE decoder does give a better power gain, but does not improve the already optimal scaling law.

###### Proof (Lemma 5).

The capacity seen by user is bounded by:

 E[Ci(uk)] \lx@stackrel(b)=log(1+PE[r∑n=1r∑m=1h∗i,mhi,n[R]mn∣∣ ∣∣∥hs∥2>uk,∀s∈S]) \lx@stackrel(c)=log(1+Pr∑n=1r∑m=1E[h∗i,mhi,n∣∣∥hi∥2>uk]E[[R]mn∣∣∥hs∥2>uk,∀s∈S]) \lx@stackrel(d)=log(1+Pr∑n=1E[∥hi,n∥2∣∣∥hi∥2>uk]E% [[R]nn∣∣∥hs∥2>uk,∀s∈S]) \lx@stackrel(e)=log(1+P(uk