Signal Shaping for BICM at Low SNR
Abstract
The generalized mutual information (GMI) of bit-interleaved coded modulation (BICM) systems, sometimes called the BICM capacity, is investigated at low signal-to-noise ratio (SNR). The combinations of input alphabet, input distribution, and binary labeling that achieve the Shannon limit –1.59 dB are completely characterized. The main conclusion is that a BICM system with probabilistic shaping achieves the Shannon limit at low SNR if and only if it can be represented as a zero-mean linear projection of a hypercube. Hence, probabilistic shaping offers no extra degrees of freedom to optimize the low-SNR BICM-GMI, in addition to what is provided by geometrical shaping. The analytical conclusions are confirmed by numerical results, which also show that, for a fixed input alphabet, probabilistic shaping can improve the BICM-GMI in the low and medium SNR range.
I Introduction
The most important breakthrough for coded modulation (CM) in fading channels came in 1992, when Zehavi introduced the so-called bit-interleaved coded modulation (BICM) [1], usually referred to as a pragmatic approach to CM [2, 3]. Despite not being fully understood theoretically, BICM has been rapidly adopted in commercial systems such as wireless and wired broadband access networks, 3G/4G telephony, and digital video broadcasting, making it the de facto standard for current telecommunications systems [3, Ch. 1].
Signal shaping refers to the use of non-equally spaced and/or non-equally likely symbols, i.e., geometrical shaping and probabilistic shaping, respectively. Signal shaping has been studied for many years, cf. [4, 5] and references therein. In the context of BICM, geometrical shaping was studied in [6, 7, 8], while probabilistic shaping, i.e., varying the probabilities of the bit streams, was first proposed in [9, 10] and developed further in [11, 12, 13, 14]. Probabilistic shaping offers another degree of freedom in the BICM design, which can be used to make the discrete input distribution more similar to the optimal distribution (which is in general unknown). This is particularly advantageous at low and medium SNR.
For the additive white Gaussian noise (AWGN) channel, the so-called Shannon limit (SL) represents the average bit energy-to-noise ratio needed to transmit information reliably when the signal-to-noise ratio (SNR) tends to zero [15, 16], i.e., in the wideband regime. When discrete input alphabets are considered at the transmitter and a BICM decoder is used at the receiver, the SL is not always achieved, as first noticed in [17]. This was later shown to be caused by the selection of the binary labeling [18]. The behavior of BICM in the wideband regime was studied in [17, 19, 20, 18, 21] as a function of the alphabet () and the binary labeling (), assuming a uniform input distribution. First-order optimal (FOO) constellations were defined in [21] as the triplets that make a BICM system achieve the SL, where represents the input distribution.
In this paper, the results of [21] are generalized to nonuniform input distributions, giving a complete characterization of FOO constellations for BICM in terms of . More specifically, we find the geometrical and/or probabilistic shaping rules that should be applied to a constellation to make it FOO. The main conclusion is that, for BICM in the wideband regime, probabilistic shaping offers no extra degrees of freedom in addition to what is provided by geometrical shaping.
II Preliminaries
II-A Notation
Bold italic letters denote row vectors. Block letters denote matrices or, sometimes, column vectors. The identity matrix is . The inner product between two row vectors and is denoted by , and their element-wise product by . The Euclidean norm of the vector is denoted by . Random variables are denoted by capital letters and random vectors by boldface capital letters . The probability density function (pdf) of the random vector is denoted by and the conditional pdf by . A similar notation applies to the probability mass functions of a random variable, which are denoted by and . Expectations are denoted by .
The empty set is denoted by and the binary set by . The negation of a bit is denoted by . Binary addition (exclusive-OR) of two bits and is denoted by . The same notation denotes the integer that results from taking the bitwise exclusive-OR of two integers and .
II-B System Model
We consider transmissions over a discrete-time memoryless vectorial fast-fading channel. The received vector at any discrete time instant is
(1) 
where is the channel input and is Gaussian noise with zero mean and variance in each dimension [1], [3, App. 2.A]. The channel is represented by the dimensional vector . It contains the real fading coefficients , which are random, possibly dependent, with the same pdf . We assume that and are perfectly known at the receiver or can be perfectly estimated, and that the technical requirements on and in [21, Sec. ID] are satisfied.
The conditional transition pdf of the channel in (1) is
(2) 
The SNR is defined as
(3) 
where is the average transmitted symbol energy, is the transmission rate in information bits per symbol, and is the average received energy per information bit.
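As a quick numerical illustration of these quantities, the following sketch assumes the standard relations Eb = Es/R and SNR = Es/N0 (the exact form of (3) is not reproduced above, so this is an assumption); the names `es`, `n0`, and `rate` are illustrative.

```python
import math

def ebn0_db(es: float, n0: float, rate: float) -> float:
    """Eb/N0 in dB, assuming Eb = Es / R and SNR = Es / N0
    (standard definitions; the exact form of (3) is an assumption here)."""
    eb = es / rate  # average received energy per information bit
    return 10 * math.log10(eb / n0)

# Es/N0 = 4 at rate R = 2 bits/symbol gives Eb/N0 = 2, i.e., about 3.01 dB.
print(ebn0_db(es=4.0, n0=1.0, rate=2.0))
```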
The generic BICM scheme in Fig. 1 is considered. The transmitter is, in the simplest case, a single binary encoder concatenated with an interleaver and a memoryless mapper . Multiple encoders and/or interleavers may be needed to achieve probabilistic shaping [10, 11, 12, 13]. At the receiver, using the channel output , the demapper computes metrics for the individual coded bits with , usually in the form of logarithmic likelihood ratios. These metrics are then passed to the deinterleaver(s) and decoder(s) to obtain an estimate of the information bits.
The mapper is defined via the input alphabet , where bits are used to index the symbol vectors for . We associate with each symbol the codeword (binary labeling) and the probability , where . The binary labeling is denoted by and the input distribution by .
In the following, the labeling used throughout this paper is defined. This can be done without loss of generality, as will be explained in Sec. IIC.
Definition 1 (Natural binary code)
The natural binary code (NBC) is the binary labeling , where denotes the base-2 representation of the integer , with being the most significant bit.
This definition of the NBC is different from the one in [21]. The difference lies only in the bit ordering, i.e., in this paper we consider the last column of to contain the most significant bits of the base-2 representation of the integers . It follows from Definition 1 that
(4) 
for and , and
(5) 
for , and .
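Definition 1 can be illustrated with a short sketch. The helper `nbc` below is hypothetical (not from the paper); it builds the labeling row by row as the base-2 representation of each integer, with the most significant bit placed in the last column, as described above.

```python
def nbc(m: int) -> list[list[int]]:
    """Natural binary code for m bits: row i is the base-2 representation
    of i, ordered so that the LAST column holds the most significant bit
    (the bit ordering used in this paper)."""
    return [[(i >> k) & 1 for k in range(m)] for i in range(2 ** m)]

# For m = 2: rows 0..3 are [0,0], [1,0], [0,1], [1,1];
# reading each row right to left gives 0, 1, 2, 3 in binary.
print(nbc(2))
```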
II-C Probabilistic Shaping in BICM
Assuming independent, but possibly nonuniformly distributed, bits at the input of the modulator (cf. Fig. 1), the symbol probabilities are given by [21, eq. (30)] [13, eq. (8)] [22, eq. (9)]
for , where for is the probability of . Since , the distribution is fully specified by the vector of bit probabilities .
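The cited formula can be illustrated under the assumption that it is the usual product form for independent bits: the probability of a symbol is the product, over bit positions, of the probability of the corresponding labeling bit. The helper `symbol_probs` and the bit probabilities 0.5 and 0.25 below are illustrative, not taken from the paper.

```python
def symbol_probs(p: list[float]) -> list[float]:
    """Symbol probabilities induced by independent modulator bits.

    p[k] is the probability that bit k equals 1; the probability of the
    symbol labeled (b_0, ..., b_{m-1}) is the product over k of
    p[k] if b_k = 1, else (1 - p[k])."""
    m = len(p)
    probs = []
    for i in range(2 ** m):
        bits = [(i >> k) & 1 for k in range(m)]  # NBC row i
        pr = 1.0
        for k, b in enumerate(bits):
            pr *= p[k] if b else 1 - p[k]
        probs.append(pr)
    return probs

P = symbol_probs([0.5, 0.25])
print(P)        # [0.375, 0.375, 0.125, 0.125]
print(sum(P))   # 1.0
```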
Throughout this paper, we assume that for all ; i.e., all constellation points are used with nonzero probability. This can be done without loss of generality, because if or for some , then half of the constellation points would never be transmitted. In that case, the corresponding branches in Fig. 1 are removed, is reduced by one, and the mapper is redefined accordingly.¹ The result is another BICM scheme with identical performance, which satisfies for all .

¹Constellations with for some can yield counterintuitive results, such as Gray-labeled constellations being FOO (see [13, 14] and Example 7).
For any constellation , a set of equivalent constellations can be constructed by permuting the rows of , , and , provided that the same permutation is applied to all three matrices. Specifically, denote the permutation that maps the NBC into the desired labeling by , i.e., . The BICM system defined by the alphabet , the distribution , and the labeling is entirely equivalent to the system with alphabet , distribution , and labeling . Without loss of generality, the analysis in this paper is therefore restricted to the latter case.
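The row-permutation equivalence can be sketched as follows; the helper `permute_rows` and all numerical values are illustrative, not taken from the paper. Applying one and the same permutation to the alphabet, the distribution, and the labeling preserves the association between symbols, probabilities, and codewords, so the resulting system is equivalent.

```python
def permute_rows(perm: list[int], *matrices):
    """Apply the same row permutation to each matrix (e.g., the alphabet X,
    the distribution P, and the labeling C)."""
    return tuple([mat[i] for i in perm] for mat in matrices)

# Illustrative 4-point constellation with NBC labeling.
X = [[-3], [-1], [1], [3]]
P = [0.375, 0.375, 0.125, 0.125]
C = [[0, 0], [1, 0], [0, 1], [1, 1]]
perm = [2, 0, 3, 1]
Xp, Pp, Cp = permute_rows(perm, X, P, C)

# The set of triples (symbol, probability, codeword) is unchanged.
before = set(zip(map(tuple, X), P, map(tuple, C)))
after = set(zip(map(tuple, Xp), Pp, map(tuple, Cp)))
assert before == after
```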
Based on the previous discussion, from now on we use the name constellation to denote the pair , where the NBC labeling is implicit. Thus, and for all and , which simplifies the analysis. Note that cannot be chosen arbitrarily in BICM; only distributions that satisfy
(6) 
for some vector of bit probabilities will be considered in the paper. An important special case is the uniform distribution, for which and .
II-D The Hadamard Transform
The Hadamard transform (HT), or Walsh–Hadamard transform, is a discrete, linear, orthogonal transform, whose coefficients take values in . It is popular in image processing [23] and can be used to analyze various aspects of binary labelings in digital communications and source coding [24, 25, 26, 21].
Definition 2
The HT of a matrix (or vector) with rows is
(7) 
where for all and
(8) 
Because for , setting in (7)–(8) shows that the first HT vector
(9) 
can be interpreted as the uniformly weighted mean of the alphabet. This is a property that the HT shares with, e.g., the discrete Fourier transform.
It can be shown from (8) that
(10) 
for all and . Therefore, the inverse transform is identical to the forward transform, apart from a scale factor:
(11) 
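A minimal sketch of the HT, assuming the Sylvester-type sign pattern h_{ij} = (−1)^popcount(i AND j) and a 1/2^m forward scaling, so that the first coefficient is the mean of the alphabet, as noted above; the exact scale factors in (7) and (11) are not reproduced in the text, so this normalization is an assumption.

```python
def hadamard(m: int) -> list[list[int]]:
    """Sylvester-type Hadamard matrix of size 2^m with entries +-1:
    h_{ij} = (-1)^{popcount(i AND j)}."""
    n = 2 ** m
    return [[(-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]

def ht(x: list[float]) -> list[float]:
    """Hadamard transform with a 1/2^m scaling, so that the first
    coefficient is the mean of x (assumed normalization)."""
    n = len(x)
    H = hadamard(n.bit_length() - 1)
    return [sum(H[i][j] * x[j] for j in range(n)) / n for i in range(n)]

x = [-3.0, -1.0, 1.0, 3.0]   # a zero-mean 4-PAM alphabet
tx = ht(x)
print(tx[0])                 # 0.0: first HT coefficient = mean of the alphabet
# The inverse equals the forward transform up to a scale factor of 2^m:
print([4 * v for v in ht(tx)])   # recovers [-3.0, -1.0, 1.0, 3.0]
```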
II-E A New Transform
In this section, we define a linear transform between vectors or matrices, which depends on the input distribution via the bit probabilities . Its usage will become clear in Section III-C.
Definition 3
Remark 1
For equally likely symbols, i.e., , the transform becomes the identity operation , because then for and for .
The transform coefficients are non-symmetric in the sense that, in general, . They have some appealing properties, given by the following lemma, which will be used in the proofs of Theorems 3, 4, and 8.
Proof:
See the Appendix. \qed
We pay particular attention to two important special cases of (15). First, if , then and for . Second, if for any integer , then by (8), for any and by (6)
Using first (5) and then (4), we obtain
Substituting these two cases ( and ) into (15) proves the following corollary.
Corollary 2
For any ,
(16)  
(17) 
The fact that the sums in (14) are zero whenever , independently of the input distribution, implies that the coefficients form an orthogonal basis. As a consequence, the transform is invertible, as shown in the next theorem.
Theorem 3
The inverse transform of a matrix (or vector) is, given the bit probabilities ,
(18) 
Proof:
Example 1
If the bit probabilities are , then the symbol probabilities (6) are . The transform coefficients in (13) are the elements at row , column of
(19) 
It is readily verified that , which is (14) in matrix notation. The mean values in each column of (19) are , which in agreement with (16) are the square roots of the elements in . Similarly, it can be shown that in (19) satisfies (15) and (17).
If the Gray-labeled 4-ary pulse amplitude modulation (PAM) constellation is considered, . Rewriting (12) in matrix notation, the transform can be calculated as , where . This non-equally spaced 4-PAM alphabet will be illustrated and analyzed in Example 3. The inverse transform (18) can be written as . For a uniform distribution, , which agrees with Remark 1.
In Sec. IV-B, we will need to apply the HT and the new transform, one after the other, to the same alphabet. However, the two transforms do not commute, and the result therefore depends on the order in which they are applied. Of particular interest for our analysis is the setup in Fig. 2, where and are related via the transform defined above. Their HTs and are, however, not related via the same transform. Instead, a relation between and can be established via the following theorem.
Theorem 4
If , then their HTs and satisfy
(20)  
(21) 
where
(22) 
and a product over is defined as 1.
Proof:
See the Appendix. \qed
Remark 2
Example 2
Expression (4) can be written as , or . The elements at row , column of and are given by (20)–(21) as, resp., and . With , and from Example 1, we obtain and
(23) 
which, as predicted by Remark 2, are upper triangular.
Another relation between and can be deduced from Fig. 2. Defining the Hadamard matrix as the matrix with elements for , the HT relations (7) and (11) yield and . Since from Example 1 , we conclude that , which implies that . Because (see (10)) and , the inverse relation is . It is straightforward to verify that and calculated in this manner, using the numerical values of and in Example 1, indeed yield (23).
III BICM at Low SNR
III-A Mutual Information
The mutual information (MI) in bits per channel use between the random vectors and for an arbitrary channel parameter perfectly known at the receiver is defined as
where the expectation is taken over the joint pdf , and is given by (2).
The MI between and conditioned on the value of the th bit at the input of the modulator is defined as
where the expectation is taken over the joint pdf .
Definition 4 (BICM Generalized Mutual Information)
where the second line follows from the chain rule. We will analyze the right-hand side of (24) as a function of , for a given pdf . According to (3), can be varied in two ways: either by varying for a fixed constellation or, equivalently, by rescaling the alphabet linearly for a fixed and input distribution .
Martinez et al. [27] recognized the BICM decoder in Fig. 1 as a mismatched decoder and showed that the BICM-GMI in (24) corresponds to an achievable rate of such a decoder. This means that reliable transmission using a BICM system at rate is possible if . Since from (3) , the inequality gives²
(25) 
for any . Focusing on the wideband regime, i.e., asymptotically low SNR, we make the following definition.

²The definition of the related function in [21, eq. (37)] is erroneous and should read “ is bounded from below by , where .”
Definition 5 (Low-GMI Parameters)
The low-GMI parameters of a constellation are defined as , where
In the wideband regime, the average bit energy-to-noise ratio needed for reliable transmission is, using (25) and the definition of , lower-bounded by
(26) 
Furthermore, since in the wideband regime dB [15], dB.
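The quoted wideband limit of –1.59 dB can be checked numerically, using the standard value SL = ln 2 [15]:

```python
import math

# Wideband Shannon limit: Eb/N0 >= ln 2, i.e., about -1.59 dB [15].
sl_db = 10 * math.log10(math.log(2))
print(round(sl_db, 2))   # -1.59
```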
The first-order behavior of the BICM-GMI in (24) is fully determined by , which, as we shall see later (e.g., in Sec. III-C), in turn depends on and . This is why we designate this triplet as the low-GMI parameters. The same definitions can be applied to other MI functions, such as the coded modulation MI (CM-MI) [21]. In this paper, however, we are only interested in the BICM-GMI.
III-B Low-GMI Parameters for Uniform Distributions
The low-GMI parameters have been analyzed in detail for arbitrary input alphabets under the assumption of uniform probabilities [21]. Under this assumption, they can be expressed as in the following theorem.
Theorem 5
For a constellation , the low-GMI parameters are
(27)  
(28)  
(29) 
Proof:
The low-GMI parameters can be conveniently expressed as functions of the HT of the alphabet , as shown in the following theorem.
Theorem 6
The low-GMI parameters can be expressed as
(30)  
(31)  
(32) 
III-C Low-GMI Parameters for Nonuniform Distributions
[Table I: low-GMI parameters and the corresponding FOO conditions for uniform and nonuniform input distributions.]
The next theorem is analogous to Theorem 5 but applies to an arbitrary input distribution.
Theorem 7
For a constellation , the low-GMI parameters are
(33)  
(34)  
(35) 
Proof:
Again, (33) and (34) follow from Definition 5, while (35) requires some analysis. It was shown in [21, Th. 10] that
(36) 
Substituting (33) and writing the squared norms as the inner products of two identical vectors yields
The expression in brackets can be simplified as
which completes the proof of (35). \qed
Theorem 7 shows that the low-GMI parameters depend on the input alphabet , the binary labeling (via in the expression for ), and the input distribution (via and ). While the low-GMI parameters of an alphabet with uniform probabilities are conveniently expressed in terms of its HT (cf. Theorem 6), no similar expressions are known for the low-GMI parameters of a general constellation in (33)–(35). This has so far prevented the analytic optimization of such constellations. The new transform introduced in Section II-E, however, solves this problem by establishing an equivalence between an arbitrary constellation, possibly with nonuniform probabilities, and another constellation with uniform probabilities.
Theorem 8
The low-GMI parameters of any constellation are equal to the low-GMI parameters of .