On Stochastic Orders and Fast Fading Multiuser Channels with Statistical CSIT
Abstract
In this paper, we investigate the ergodic capacity of fast fading Gaussian multiuser channels when only the statistics of the channel state are known at the transmitter. In general, the characterization of capacity regions of multiuser channels with only statistical channel state information at the transmitter (CSIT) is open. Instead of directly matching achievable rate regions and the corresponding outer bounds, in this work we resort to classifying the random channels through their probability distributions. More precisely, to attain capacity results, we first derive sufficient conditions for information-theoretic channel orders, such as degradedness and strong/very strong interference, by applying the usual stochastic order and exploiting the same marginal property, such that the capacity regions of the memoryless Gaussian multiuser channels can be characterized. These include Gaussian interference channels, Gaussian broadcast channels, and Gaussian wiretap channels/secret key generation. We also extend the framework to channels with a specific memory structure, namely, finite-state channels, wherein the Markov fading channel is discussed as a special case. Several practical examples, such as Rayleigh fading and Nakagami-m fading, illustrate the application of the derived results.
I Introduction
For Gaussian multiuser (GMU) channels with perfect channel state information at the transmitter (CSIT), the channels of different users can be ordered. As a result, the capacity regions/secrecy capacity are known for the degraded broadcast channel (BC) [1], [2] and the wiretap channel (WTC) [3], [4], [5], as are the sum capacity in the low-interference regime [6] and the capacity region for some cases of the interference channel (IC), such as the strong IC [7], [8], [9] and the very strong IC [8]. When fading effects of wireless channels are taken into account and there is perfect CSIT, some of the above capacity results still hold with an additional operation of taking the average with respect to the fading channels. For example, in [10], the ergodic secrecy capacity of the Gaussian WTC is derived; in [11], the ergodic capacity regions are derived for the ergodic very strong and the uniformly strong (each realization of the fading process is a strong interference channel) Gaussian IC. Because of limited feedback bandwidth and the delay caused by channel estimation, the transmitter may not be able to track channel realizations instantaneously if they vary rapidly. Thus, for fast fading channels, it is more practical to consider the case with only partial CSIT of the legitimate channel. However, when there is only statistical CSIT, only a few capacity results are known, such as those for the layered BC [12], the binary fading interference channel [13], the one-sided layered IC [14], the Gaussian WTC [15], [16], and the layered WTC [16]. The main difficulty is that, without knowledge of the instantaneous CSIT, we may not be able to directly compare the channels in a simple form and manner. In particular, channel orders including degraded, less noisy, and more capable [17], [18] in the Gaussian BC and Gaussian WTC, or strong and very strong interference in the IC, depend on the knowledge of CSIT.
Note that these channel orders can help to simplify the functional optimization with respect to the channel input distribution and/or channel prefixing.
When considering a GMU channel in which the transmitters only know the distributions of the channels but not the realizations, some immediate questions are: How can channel qualities be compared only by their distributions? How can the capacity region be derived by exploiting such a comparison of channel qualities? From an information-theoretic point of view, dealing with this problem would require a better achievable scheme and a tighter outer bound. In contrast, in this work we resort to identifying whether the random channels are stochastically orderable in order to find the capacity region. In particular, a GMU channel with orderable random channels means that there exists an equivalent GMU channel in which we can reorder channel realizations among different transmitter-receiver pairs in the desired manner (the desired manner is related to the intrinsic property of the channel to be analyzed, which will be shown later). Taking the Gaussian BC (GBC) as an example, an orderable two-user GBC means that, under the same noise distributions at the two receivers, in the equivalent GBC one channel strength is always stronger or weaker than the other for all realizations within a codeword length. We attain this goal mainly by the following elements: stochastic orders [19], coupling [20], and the same marginal property [21]. Stochastic orders have been widely used in the last several decades in diverse areas of probability and statistics such as reliability theory, queueing theory, and operations research; see [19] and references therein. Different stochastic orders such as the usual stochastic order, the convex order, and the increasing convex order can help us to identify the location, dispersion, or both location and dispersion of random variables, respectively [19]. Choosing a proper stochastic order to compare the unknown channels allows us to form an equivalent channel in which realizations of channel gains are ordered in the desired manner.
Then, we are able to derive the capacity regions of the equivalent MU channel, which is simpler than directly considering the original channel.
We also investigate the applicability of the developed framework to MU channels with memory, e.g., the finite-state BC (FSBC) [22]. In the FSBC model, the channels from the transmitter to the receivers are governed by a state sequence that depends on the channel input, outputs, and previous states, which are described by the transition function. Some examples for FSC include multiple access channels [23], degraded BC [22], etc. More detailed discussion on FSC without and with feedback can be found in [22], [24], respectively, and references therein.
The main contributions of this paper, which are all novel in the literature to the best of our knowledge, are as follows:

We use maximal coupling to illustrate the insight of the proposed framework and we use coupling to derive sufficient conditions to attain the capacity results. In addition to the coupling scheme, we also identify that an alternative explicit construction to derive the sufficient conditions is indeed a copula.

By coupling, we classify memoryless fast fading GMU channels such that we can obtain their capacity results under statistical CSIT. To attain this goal, we integrate the concepts of the usual stochastic order and the same marginal property, i.e., an intrinsic property of MU channels without receiver cooperation, such that we can align the realizations of the fading channel gains between different users in an identical trichotomy order over time to obtain an equivalent channel.

We then connect the trichotomy order of channel strengths in the equivalent channels and different information-theoretic orders to characterize the capacity regions of the Gaussian IC (GIC), GBC, and GWTC, and also the secret key capacity of secret key generation with a Gaussian Maurer’s satellite model [25].

We further extend the framework to time-varying channels with memory. In particular, we consider the finite-state channel (FSC) model introduced by [26]. We use the finite-state BC (FSBC) [22] as an example. The Markov fading channel, which is commonly used to model the memory effect in wireless channels, is also considered in this paper.

Several examples with practical channel distributions are illustrated to show the usage scenarios of the developed framework.
Notation: Upper case normal/bold letters denote random variables/random vectors (or matrices), which will be defined when they are first mentioned; lower case bold letters denote vectors. The expectation is denoted by . We denote the probability mass function (PMF) and probability density function (PDF) of a random variable by and , respectively. The cumulative distribution function (CDF) is denoted by , where is the complementary CDF (CCDF) of . denotes that the random variable follows the distribution with CDF . The mutual information between two random variables and is denoted by while the conditional mutual information given is denoted by . The differential and conditional differential entropies are denoted by and , respectively. A Markov chain relation between , , and is described by . denotes the uniform distribution between and and Bernoulli distribution with probability is defined by Bern. The indicator function is denoted by , e.g., means the expression in (1) is valid. The support of a random variable is denoted by either supp or supp. The null set is denoted by . The logarithms used in the paper are all with respect to base 2. We define . We denote the equality in distribution by . The convolution of functions and is denoted by . Calligraphic alphabets in capital denote sets. The convex hull of a set is denoted by .
The remainder of the paper is organized as follows. In Section II, we introduce our motivation and important preliminaries. In Section III, we develop our framework from maximal coupling, coupling, and copulas. In Section IV we apply the developed framework to fast fading Gaussian interference channels, broadcast channels, wiretap channels, and source model secret key generation with statistical CSIT. In Section V we discuss the finite state broadcast channel as an example of applying the developed framework for channels with memory. Finally, Section VI concludes the paper.
II Motivation and Preliminaries
In this section, we first explain our motivation and then review several key properties and definitions including the same marginal property, degradedness, and the usual stochastic orders, which are crucial for deriving the main results of this work. We assume that each node in the considered MU channel is equipped with a single antenna.
II-A Motivation
With perfect CSIT, the transmitter can easily compare the strengths of instantaneous channels among time and/or users and find the optimal strategy for the transmission [18], including the designs of the codebook, channel input distribution, resource allocation, etc., simply according to the trichotomy law. In contrast, when the transmitter has imperfect CSIT, e.g., only the statistics of the channels, the comparison is nontrivial due to the random strength of the fading channels. In the following, we use two simple examples of fast fading channels with additive white Gaussian noise (AWGN) to illustrate the difficulty of comparing channels in such a scenario. For simplicity, we consider a two-receiver case where we assume, without loss of generality, that the noise variances at different receivers are identical. Denote the square of the magnitudes of the two real random channels by and with PDF’s and , respectively. In Fig. 1 (a), the supports of and are non-overlapping. Therefore, even when there is only statistical CSIT, we still know that the second channel is always stronger than the first channel. However, due to the lack of knowledge of the CSI, the capacity results degrade. In contrast, in Fig. 1 (b), the supports of the two channels overlap. Intuitively, the transmitter is not able to distinguish the stronger channel based on the channel statistics alone. This is because, due to the overlapping part of the PDF’s, the trichotomy order of the channel realizations and may change over time. As an example, from Fig. 1 (b) we may have two vectors of absolute squares of channel gains as and , where each entry is independently and identically generated by the corresponding distribution at a specific sample time. It can be easily seen that there is no fixed trichotomy order between and within the whole codeword length. Therefore, we may not be able to claim which channel is better in this scenario.
Based on the above observation, we would like to ask the following question: can the transmitter compare the strengths of channels to different receivers, only based on the knowledge of the distributions of the fading channels in general, just like the comparison of and in Fig. 1(a)? In addition, can we extend the scenarios of channel comparison from the easy case as Fig. 1(a) to Fig. 1(b)? In the following, we partly answer this question.
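The difficulty described for Fig. 1 (b) can be seen in a small simulation. The sketch below is only illustrative: it assumes Rayleigh fading, so that the squared channel magnitudes are exponentially distributed, and the means 1.0 and 2.0, the helper name `fraction_first_stronger`, and the sample size are hypothetical choices, not values from the paper.

```python
import numpy as np

def fraction_first_stronger(mean1, mean2, n, seed=0):
    """Fraction of sample times at which the first squared gain exceeds the second.

    Assumption (not from the paper): independent Rayleigh fading, i.e. the
    squared magnitudes are exponential with the given means.
    """
    rng = np.random.default_rng(seed)
    g1 = rng.exponential(mean1, size=n)  # squared gains of the first channel
    g2 = rng.exponential(mean2, size=n)  # squared gains of the second channel
    return float(np.mean(g1 > g2))

# The second channel is stronger on average, yet the realization order flips:
# for independent exponentials, P(G1 > G2) = mean1 / (mean1 + mean2) = 1/3.
frac = fraction_first_stronger(1.0, 2.0, 100_000)
```

With these assumed parameters the first channel is the stronger one at roughly one third of the sample times, so no fixed trichotomy order holds over a codeword.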
II-B Preliminaries
In the following, we review the background needed for the remainder of the paper.
II-B1 The Same Marginal Property
The same marginal property is crucial in the sense that it provides us the degree of freedom to construct an equivalent channel in which the realizations of all random channel tuples are aligned in the desired manner. For channels with multiple transmitters and multiple receivers, e.g., the interference channel, we can extend from the following building block:
Theorem 1 (The Same Marginal Property for Two Transmitters [27, Theorem 16.6]).
The capacity region of a discrete memoryless multiuser channel including two transmitters and two noncooperative receivers with input and output alphabets and , respectively, depends only on the conditional marginal distributions and and not on the joint conditional distribution , where and are the transmit and receive signal pairs, respectively.
For channels with a single transmitter and multiple receivers, e.g., BC or WTC, we can specialize from Theorem 1 as follows by removing or .
Corollary 1 (The Same Marginal Property for One Transmitter [27, Theorem 13.9]).
The capacity region of a discrete memoryless multiuser channel including one transmitter and two noncooperative receivers with input and output alphabets and , respectively, depends only on the conditional marginal distributions and and not on the joint conditional distribution , where and are the transmit signal and receive signal pair, respectively.
II-B2 Information-Theoretic Orders for Memoryless Channels and Stochastic Orders
Based on the same marginal property, we introduce the following definitions describing the relation of reception qualities among different receivers.
Definition 1.
A discrete memoryless channel with two noncooperative receivers and one transmitter is physically degraded if the transition probability satisfies , for all , i.e., , , and form a Markov chain . The channel is stochastically degraded if its conditional marginal distribution is the same as that of a physically degraded channel, i.e.,
(1) 
By applying the quantization scheme used in [18, Proof of Theorem 3.3 and Remark 3.8], we can extend the discussion from discrete alphabets to continuous ones. In the following we consider fast fading GMU channels. Denote the square of the fading channels from the transmitter to the first and second receivers considered in Definition 1 by the random variables and , respectively. We then define two sets of pairs of the random channels as and , i.e., in must satisfy (1). In the following, we call a stochastically degraded channel simply a degraded channel due to the same marginal property. Discussions on the relation between degradedness and other information-theoretic channel orders can be found in [17], [27], [28].
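As an illustration of how an ordering of channel gains leads to stochastic degradedness in the AWGN case, the following standard argument may help. It is a sketch in assumed notation that does not come from the original text: real channels with fixed squared gains $0 \le h_1 \le h_2$, $h_2 > 0$, unit-variance noises $N_1, N_2$, and an auxiliary noise $N'$ independent of $(X, N_2)$.

```latex
\begin{align}
Y_i  &= \sqrt{h_i}\,X + N_i, \qquad N_i \sim \mathcal{N}(0,1), \quad i \in \{1,2\}, \\
Y_1' &= \sqrt{h_1/h_2}\;Y_2 + N', \qquad N' \sim \mathcal{N}\!\left(0,\; 1 - h_1/h_2\right), \\
     &= \sqrt{h_1}\,X + \underbrace{\sqrt{h_1/h_2}\,N_2 + N'}_{\sim\,\mathcal{N}(0,1)} .
\end{align}
```

Under these assumptions $Y_1'$ has the same conditional distribution given $X$ as $Y_1$, while $X$, $Y_2$, and $Y_1'$ form a Markov chain by construction, so the pair behaves as a stochastically degraded channel in the sense of Definition 1.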
Definition 2.
Denote the square of channels from the th transmitter to the first and second receivers by and , respectively, . A discrete memoryless IC is said to have strong interference if
(2)  
(3) 
for all . Define the sets and .
Definition 3.
A discrete memoryless IC is said to have very strong interference if
(4)  
(5) 
for all . Define the set .
In the following, we introduce the definition of the usual stochastic order, which is the underlying tool to derive the main results in this paper.
Definition 4.
[19, (1.A.3)] For random variables and , is smaller than in the usual stochastic order, namely, , if and only if for all .
Note that the definition is applicable to both discrete and continuous random variables.
III Main Results
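Definition 4 compares complementary CDFs pointwise: one variable is smaller in the usual stochastic order when its CCDF never exceeds that of the other. A minimal numerical sketch follows, assuming exponentially distributed squared gains (Rayleigh fading); the function names, the means, and the finite grid are hypothetical, and a grid check is only a necessary test, not a proof.

```python
import numpy as np

def exp_ccdf(x, mean):
    """CCDF of an exponential random variable with the given mean (assumed model)."""
    return np.exp(-np.asarray(x, dtype=float) / mean)

def is_st_smaller(ccdf_x, ccdf_y, grid, tol=1e-12):
    """Check CCDF_X(t) <= CCDF_Y(t) on a finite grid (numerical check only)."""
    return bool(np.all(ccdf_x(grid) <= ccdf_y(grid) + tol))

grid = np.linspace(0.0, 50.0, 5001)
weak = lambda x: exp_ccdf(x, 1.0)    # hypothetical weaker channel, mean 1
strong = lambda x: exp_ccdf(x, 2.0)  # hypothetical stronger channel, mean 2

weak_below_strong = is_st_smaller(weak, strong, grid)
strong_below_weak = is_st_smaller(strong, weak, grid)
```

For two exponential distributions the order reduces to a comparison of the means, which is why such fading models are convenient examples.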
In this section, we develop a general framework to classify fading channels such that we are able to characterize the corresponding capacity results under statistical CSIT.
III-A Problem Formulation and the Proposed Framework
According to the motivation described in Section II-A, we aim to formulate a problem in a tractable way. We first define a set , which is a subset of all tuples of random channels of the aforementioned MU channels. We also define a set , which includes all of the above MU channel orders from Definitions 1, 2, and 3 as
(6) 
Intuitively, we want to find a subset of all fading channel tuples, namely, , which should possess the following properties:

encompasses a constructive way to find the transformation ;

allows the existence of a corresponding set in which the channel qualities follow a certain order;

by this channel order, capacity results are attainable.
The considered problem in this work is formulated as follows, also illustrated in Fig. 6.
P1: Find a set of tuples , namely, , such that
(7)  
(8) 
Note that (7) is due to the fact that in the desired equivalent channel, the tractability of the derivation of the capacity region can be guaranteed, while (8) ensures that after the transformation , the capacity result is not changed by Theorem 1.
Remark 1.
An optimal classification would find the three elements of a tuple , simultaneously, instead of fixing and then finding . However, as mentioned previously, matching new capacity inner and outer bounds for the open scenarios is beyond the scope of this work.
Note that under different assumptions on different MU channel models, ways to identify and may be different, not to mention the corresponding capacity results. Therefore, we will introduce those results case by case.
In the following we summarize three feasible schemes for P1 to compare channel strengths for the case in Fig. 1 (b), when the transmitter has only statistical CSIT. The first two schemes are related to coupling and the third one is related to copula [29]. At the end we will show that the schemes by coupling and copula are equivalent. In brief, the idea common to all schemes is to find, under fixed marginal distributions, a special dependence structure among the fading channels of different users that fits our purpose, e.g., attaining the capacity results. Note that the three schemes can be easily extended to an arbitrary number of receivers. We first give the following definition.
Definition 5.
[30, Definition 2.1] The pair is a coupling of the random variables if and .
It will be clarified that coupling plays an important role of the function in P1 and the proposed framework.
III-A1 A Construction from Maximal Coupling
For the random variables whose PDF’s partially overlap as in Fig. 1 (b), a straightforward idea to align and for each realization is as follows: we try to form an equivalent channel pair with PDF’s and , respectively, which is a coupling of , to reorder and in the sense that if . Otherwise, and follow PDF’s and , respectively. If this alignment is possible, we can transform the remaining parts of the PDF’s into two non-overlapping ones. Then one of the fading channels is always equivalently no weaker than the other one, even if only the statistics of the channels are known. One may ask whether such an alignment exists. The answer is yes; it relies on the concept of maximal coupling [30], which reorders part of the channel realizations and is defined as follows.
Definition 6.
[30, Section 2.2] For the random variables , the coupling is called a maximal coupling if gets its maximal value among all the couplings of .
To proceed, we introduce a result of maximal coupling from [30].
Proposition 1.
[30, Proposition 2.5] Suppose and are random variables with respective piecewise continuous density functions and . The maximal coupling for results in
(9) 
Based on Proposition 1, we derive the following result, which reorders the realizations of a class of in the same trichotomy order.
Theorem 2.
For a singletransmitter tworeceiver AWGN channel, assume the PDF’s of the two channels, namely, and , are continuous or with finite discontinuities. Then the selection
(10)  
(11)  
(12) 
where , and , is feasible to P1.
Proof.
The explicit construction of in (12) is from the proof of maximal coupling [30, Proposition 2.5], while (10) is a sufficient condition to guarantee that the realizations of the two equivalent channels follow the desired order after maximal coupling. To be self-contained, we first restate the important steps of the proof of [30, Proposition 2.5] in the following. Define . Any coupling of should satisfy
(13) 
where (a) is by Definition 5, (b) is by the definition of and (c) is by the definition of .
On the other hand, we can use (12) as an explicit construction of the maximal coupling. Define three independent random variables , , and , following the PDF’s , , and , respectively, as shown in (12). We can show that (12) is a coupling as follows:
(14) 
where (a) is by construction, i.e., by assigning and to with probability and , respectively. Hence, it is clear that (14) fulfills the definition of coupling in Definition 5. On the other hand, it is clear that . Therefore, from (13) and (14), we know that (12) can achieve maximal coupling.
In the following, we will prove that (10) and (12) form a feasible solution for P1. By (9) it is clear that is the area of the intersection of and . Then from (12), we know that and do not overlap with probability . In addition, from (10) we know that if supp and supp. That is, with probability . Furthermore, from (12) (or from (9)), we know that is with probability . In other words, by the maximal coupling (12), we can construct an equivalent channel where there are only two relations between the fading channel realizations and : 1) ; 2) , which completes the proof. ∎
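The construction used in the proof can be sketched for discrete distributions. The code below is an illustration under stated assumptions, not the paper's construction verbatim: the PMFs `p` and `q`, the function name `maximal_coupling_sample`, and the four-point support are hypothetical. With probability equal to the overlap mass a common value is drawn from the normalized overlap; otherwise the two values are drawn independently from the normalized residual PMFs.

```python
import numpy as np

def maximal_coupling_sample(p, q, n, seed=0):
    """Draw n pairs (x, y) with marginals p and q that maximize P(x == y).

    Assumption: p and q are PMFs on the common support {0, ..., len(p) - 1}.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    common = np.minimum(p, q)      # pointwise overlap of the two PMFs
    overlap = common.sum()         # probability of the event x == y
    rng = np.random.default_rng(seed)
    support = np.arange(p.size)
    x = np.empty(n, dtype=int)
    y = np.empty(n, dtype=int)
    same = rng.random(n) < overlap
    k = int(same.sum())
    if k > 0:                      # shared draw from the normalized overlap
        u = rng.choice(support, size=k, p=common / overlap)
        x[same] = u
        y[same] = u
    if k < n:                      # independent draws from the residual PMFs
        x[~same] = rng.choice(support, size=n - k, p=(p - common) / (1.0 - overlap))
        y[~same] = rng.choice(support, size=n - k, p=(q - common) / (1.0 - overlap))
    return x, y, overlap

# Hypothetical PMFs whose residual parts have ordered, disjoint supports.
p = [0.5, 0.3, 0.2, 0.0]
q = [0.1, 0.2, 0.3, 0.4]
x, y, overlap = maximal_coupling_sample(p, q, 200_000)
```

In this example the residual of `p` lives on {0, 1} and the residual of `q` on {2, 3}, so every coupled pair satisfies `x <= y`, mirroring the two relations (equal, or strictly ordered) obtained at the end of the proof.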
Remark 2.
Note that by applying the maximal coupling, we can transform the two PDF’s like the ones in Fig. 1(b) into equivalent ones where one of the PDF’s can generate channel realizations always no worse than those of the other. In the following we use two examples to illustrate the feasibility of the selection of and in Theorem 2. Assume suppsupp and sup(suppsup(supp, there may exist supp and supp, such that , where and , and for the case . Then can be larger or smaller than . An example is shown in Fig. 3(a). On the other hand, if suppsupp and the maximum value of the alphabets of is the same as that of , it is possible that , for the case . An example is shown in Fig. 3(b). This example shows that (10) is sufficient but not necessary. Note that the latter example fulfills the definition of the usual stochastic order and further discussion can be seen in the next method.
Remark 3.
Note that from [30, Proposition 2.7] we know that the probability of from the coupling of can be described by the total variation distance between and as
(15) 
where . In this way, we can observe the relationship between how close two random variables are in distribution and how often they can be made equal by coupling.
In fact, the condition described in Theorem 2 corresponds to . The reason that also results in with probability 1, will be explained by the next method, namely, coupling.
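The link between the overlap area of two PDF’s and their total variation distance follows from an elementary identity. The derivation below uses assumed notation ($f_1$, $f_2$ for the two PDF’s, $p$ for the overlap area, and the total variation distance normalized with the factor $1/2$); if the normalization in (15) differs, the constants change accordingly.

```latex
\begin{align}
\min\{a,b\} &= \tfrac{1}{2}\bigl(a + b - |a-b|\bigr), \\
p \;=\; \int \min\{f_1(x), f_2(x)\}\,dx
  &= \tfrac{1}{2}\int \bigl(f_1(x) + f_2(x)\bigr)\,dx \;-\; \tfrac{1}{2}\int \bigl|f_1(x) - f_2(x)\bigr|\,dx \\
  &= 1 - d_{\mathrm{TV}}(f_1, f_2), \qquad
     d_{\mathrm{TV}}(f_1, f_2) \triangleq \tfrac{1}{2}\int \bigl|f_1(x) - f_2(x)\bigr|\,dx .
\end{align}
```

Hence the closer the two distributions are in total variation, the larger the probability that a maximal coupling makes the two realizations equal.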
III-A2 A Construction from Coupling
In this method we resort to constructing an explicit coupling such that each realization of channel pair has the same trichotomy order.
Theorem 3.
For a singletransmitter tworeceiver AWGN channel, the selection
(16)  
(17)  
(18) 
where , , is feasible to P1.
Proof.
From the coupling theorem [20] we know that, if and only if there exist random variables and such that with probability 1. Therefore, by construction we know that distributions of and fulfill the same marginal condition (8) in P1. In addition, the trichotomy order fulfills , i.e., the first channel is degraded with respect to the second one by (1) under this trichotomy order for the AWGN channel. The proof of the coupling theorem [30, Ch. 2] provides us a constructive way to find , which is restated as follows for a self-contained proof. If , and if the generalized inverses and exist, where the generalized inverse of is defined by , then the equivalent channels and can be constructed by and , respectively, where . This is because
(19) 
i.e., . Similarly, . Since , from Definition 4 we know that , for all . Then it is clear that , such that . Therefore, we attain (18), which completes the proof. ∎
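The inverse-CDF construction in the proof can be tried numerically. The sketch below assumes exponentially distributed squared gains (Rayleigh fading), for which the inverse CDF is available in closed form; the function names and the means 1.0 and 2.0 are hypothetical. Both equivalent gains are driven by the same uniform random variable.

```python
import numpy as np

def exp_quantile(u, mean):
    """Inverse CDF of an exponential distribution with the given mean (assumed model)."""
    return -mean * np.log1p(-u)

def quantile_coupling(mean1, mean2, n, seed=0):
    """Couple two exponential gains through one shared uniform variable."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)  # common uniform samples in [0, 1)
    return exp_quantile(u, mean1), exp_quantile(u, mean2)

# Marginals stay exponential with means 1 and 2, yet every coupled pair is ordered.
g1, g2 = quantile_coupling(1.0, 2.0, 100_000)
always_ordered = bool(np.all(g1 <= g2))
```

In contrast to independently drawn gains, the coupled realizations never change their order, which is the alignment that the equivalent channel relies on.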
Remark 4.
Originally, even though we know , the order of the channel realizations and may vary for each realization. Then we are not able to claim which channel is stronger for the duration of a codeword. However, from Theorem 3 we know that we can virtually align all the channel realizations within a codeword length such that each channel gain realization of is no worse than that of , if . Then we can claim that there exists an equivalent degraded channel. Note that the usual stochastic order aligns random fading gains in the equivalent channel, which makes it simple to adopt the MU channel orders, e.g., in Sec. II-B2. However, the alignment operation done in the equivalent channel may be stricter than necessary. This is because the considered channel orders among MU channels, e.g., degradedness, are in fact determined by the statistics and not by the realizations of the individual channels.
III-A3 A Construction from Copula
In this method, we first explicitly construct a joint distribution between the fading channels, such that each realization of the channel pair again has the same trichotomy order. We then identify that this construction is indeed a copula [29]. The concepts of coupling and copula are similar in the sense that, in both cases, the marginal distributions are given and we want to form joint distributions to achieve certain goals. In the former case, we can introduce any relation between the random channels such that, for example, we can order the channel pairs for all realizations in the desired trichotomy order. However, we do not construct an explicit joint distribution. In the latter case, we try to find a constructive way to attain an explicit joint distribution from the marginal ones, and then identify that this joint distribution fulfills our requirements. Note that the existence of the copula is proved by Sklar’s theorem [29].
Theorem 4.
For a singletransmitter tworeceiver AWGN channel, the selection of
(20)  
(21)  
(22) 
where , , is feasible to P1.
Proof.
By the definition of it is clear that the marginal distributions are unchanged, i.e., and . Note that we consider the square of channel gains, so we substitute 0 into to marginalize it. With the selection , we prove that in the following.
Assume that , , we can prove that
(23) 
where (a) follows from the definition of the joint CCDF and (b) follows from (22) with the given property , which implies that . To ensure that for all random samples, we let . Thus, as long as , we can form an equivalent joint distribution as (22) such that it has the same marginal distribution as the original channel, then the capacity is unchanged. ∎
We can reexpress Theorem 4 in terms of the joint CDF instead of the joint CCDF (or the joint survival function [29]) as follows.
Corollary 2.
For a singletransmitter tworeceiver AWGN channel, the selection of
(24)  
(25)  
(26) 
where , solves P1.
Proof.
Here we construct the function by showing that (25) is identical to the Fréchet-Hoeffding-like upper bound for the survival copulas. From the Fréchet-Hoeffding bounds [29, Sec. 2.5] we know that a joint CDF can be upper and lower bounded by the marginals and as follows:
(27) 
On the other hand, by definition of the joint CCDF and the joint CDF , we can easily see:
(28) 
After substituting the upper bound in (27) into (28), we can get:
where (a) is by the definition of CCDF, which completes the proof. ∎
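For reference, the chain of steps described in this proof can be written out as follows. The notation is assumed ($F_1$, $F_2$ for the marginal CDFs, $F_{1,2}$ for the joint CDF, and bars for the corresponding CCDFs) and may differ from the symbols used in (27) and (28).

```latex
\begin{align}
\max\{F_1(x) + F_2(y) - 1,\; 0\} \;\le\; F_{1,2}(x,y) &\;\le\; \min\{F_1(x),\, F_2(y)\}, \\
\bar{F}_{1,2}(x,y) &= 1 - F_1(x) - F_2(y) + F_{1,2}(x,y) \\
  &\le 1 - F_1(x) - F_2(y) + \min\{F_1(x),\, F_2(y)\} \\
  &= 1 - \max\{F_1(x),\, F_2(y)\} \\
  &= \min\{\bar{F}_1(x),\, \bar{F}_2(y)\}.
\end{align}
```

The last equality uses only the definition of the CCDF, $\bar{F}_i = 1 - F_i$, so the upper Fréchet-Hoeffding bound on the joint CDF translates into a bound of the same form on the joint CCDF.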
In fact, we can prove that (18) and (25) are equivalent by showing that:
(29) 
where (a) is due to the assumption that .
Note that copula and coupling introduce different expressions in the joint distributions in the sense that, in the latter case, the joint distribution is implicitly indicated by the random variable and also the marginal distributions in (18) while the joint distribution is explicitly constructed as (22) in copula.
Remark 5.
We can directly identify that (22) is a two-dimensional copula [29, Definition 2.2.2] by definition without the aid of the Fréchet-Hoeffding-like upper bound and the derivation in the proof of Corollary 2. An equivalent definition of a two-dimensional copula [29, (2.2.2a), (2.2.2b), (2.2.3)] is a function with the following properties:

For every in ,
(30) (31) 
For every in such that and ,
(32)
The identification that in (22) is a copula is derived in Appendix I.
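The copula properties listed above (groundedness, uniform margins, and the 2-increasing property) can be checked numerically on a grid. The sketch below assumes that the construction reduces to the function M(u, v) = min(u, v) on the unit square; this assumption, the function names, and the grid size are hypothetical, and a grid check illustrates rather than replaces the derivation in Appendix I.

```python
import numpy as np

def min_copula(u, v):
    """Assumed form of the construction: M(u, v) = min(u, v)."""
    return np.minimum(u, v)

def check_copula_properties(c, m=51, tol=1e-12):
    """Check the three defining properties of a 2-D copula on an m-by-m grid."""
    t = np.linspace(0.0, 1.0, m)
    grounded = bool(np.allclose(c(t, 0.0), 0.0) and np.allclose(c(0.0, t), 0.0))
    uniform_margins = bool(np.allclose(c(t, 1.0), t) and np.allclose(c(1.0, t), t))
    grid = c(t[:, None], t[None, :])
    # Volume of every elementary grid cell; larger rectangles are sums of these.
    vol = grid[1:, 1:] - grid[1:, :-1] - grid[:-1, 1:] + grid[:-1, :-1]
    two_increasing = bool(np.all(vol >= -tol))
    return grounded, uniform_margins, two_increasing

result = check_copula_properties(min_copula)
```

A function such as max(u, v) fails the first property, which gives a quick sanity check that the test is not vacuous.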
IV Application on Fast Fading Memoryless MU Channels
In the following, we will use the coupling scheme introduced in Theorem 3 for memoryless MU channels due to its intuitive characteristics. We first consider the binary symmetric memoryless MU channels as examples for discrete memoryless MU channels and then extend to memoryless GMU channels.
IV-A Binary and Binary Fading MU Channels with Statistical CSIT
In the following we take the binary symmetric channel with and without fading as examples.
IV-A1 Binary Symmetric MU Channels with Statistical CSIT
For binary symmetric channels (BSC), we can model the random bit flip by a switch , i.e., if it switches to a bit flip with probability . Consider a binary symmetric MU channel with one transmitter and two receivers where and for the first and second BSC’s, respectively. Under statistical CSIT, the transmitter knows only the probabilities and ; under perfect CSIT, the transmitter knows the instantaneous positions of and . For statistical CSIT, we can use Theorem 3 to compare the two sub-BSC’s, i.e., the second channel is stronger than the first one if . Note that the inequality in the usual stochastic order is reversed compared to the three schemes above, since here a higher probability of means that the channel is worse. Note also that the usual stochastic order in Definition 4 is valid for discrete random variables as well.
IV-A2 Binary Erasure MU Channels with Statistical CSIT
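For the BSC comparison, the coupling of Theorem 3 reduces to thresholding one shared uniform variable. The sketch below is illustrative only: the flip probabilities 0.2 and 0.1, the function name `coupled_flips`, and the encoding of a flip as the value 1 are assumptions.

```python
import numpy as np

def coupled_flips(p1, p2, n, seed=0):
    """Bernoulli flip indicators for two BSCs, driven by one shared uniform variable.

    Assumption: a value of 1 means that the bit is flipped on that channel use.
    """
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    s1 = (u < p1).astype(int)  # flips on the first BSC, P(flip) = p1
    s2 = (u < p2).astype(int)  # flips on the second BSC, P(flip) = p2
    return s1, s2

# Hypothetical flip probabilities with p2 <= p1, so the second BSC is the better one.
s1, s2 = coupled_flips(0.2, 0.1, 100_000)
second_never_worse = bool(np.all(s2 <= s1))
```

Each marginal flip rate is preserved, but in the coupled model the second channel flips a bit only when the first one does, which is the aligned order used to compare the two sub-channels.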
Similar to the BSC case, we can use a switch to change the input bit to an erasure. Then for two binary erasure channels and , , similar to the BSC case, we can claim that the second channel is stronger than the first one if .
IV-A3 Binary Fading MU Channels with Statistical CSIT
For BSC’s, similar to the AWGN cases, we can consider binary fading [13] as follows, where the received signal at the th receiver is:
(33) 
where and denote the fading channel and the equivalent additive noise, respectively. Then for a two-receiver case, from Theorem 3 we can easily see that if , i.e., , the first channel is stronger than the second one.
IV-B Fading Gaussian Interference Channels with Statistical CSIT
From this section on, we consider the Gaussian MU channels. When there is only statistical CSIT and full CSI at the receiver, the ergodic capacity region of the GIC is unknown in general. In this section, we identify a sufficient condition based on the results in Section III-A to attain the capacity regions of the Gaussian interference channel with strong and very strong interference. Examples illustrate the results.
IV-B1 Preliminaries and Results
We assume that each receiver perfectly knows the two channels to it. Therefore, the considered received signals of a two-user fast fading Gaussian interference channel can be stated as
(34)  