Signal Shaping for BICM at Low SNR
The generalized mutual information (GMI) of bit-interleaved coded modulation (BICM) systems, sometimes called the BICM capacity, is investigated at low signal-to-noise ratio (SNR). The combinations of input alphabet, input distribution, and binary labeling that achieve the Shannon limit of -1.59 dB are completely characterized. The main conclusion is that a BICM system with probabilistic shaping achieves the Shannon limit at low SNR if and only if it can be represented as a zero-mean linear projection of a hypercube. Hence, probabilistic shaping offers no extra degrees of freedom for optimizing the low-SNR BICM-GMI beyond those provided by geometrical shaping. These analytical conclusions are confirmed by numerical results, which also show that, for a fixed input alphabet, probabilistic shaping can improve the BICM-GMI in the low and medium SNR range.
The most important breakthrough for coded modulation (CM) in fading channels came in 1992, when Zehavi introduced the so-called bit-interleaved coded modulation (BICM), usually referred to as a pragmatic approach to CM [2, 3]. Despite not being fully understood theoretically, BICM has been rapidly adopted in commercial systems such as wireless and wired broadband access networks, 3G/4G telephony, and digital video broadcasting, making it the de facto standard for current telecommunications systems [3, Ch. 1].
Signal shaping refers to the use of non-equally spaced and/or non-equally likely symbols, i.e., geometrical and probabilistic shaping, respectively. Signal shaping has been studied for many years; cf. [4, 5] and references therein. In the context of BICM, geometrical shaping was studied in [6, 7, 8], and probabilistic shaping, i.e., varying the probabilities of the bit streams, was first proposed in [9, 10] and developed further in [11, 12, 13, 14]. Probabilistic shaping offers another degree of freedom in the BICM design, which can be used to make the discrete input distribution more similar to the optimal distribution (which is in general unknown). This is particularly advantageous at low and medium SNR.
For the additive white Gaussian noise (AWGN) channel, the so-called Shannon limit (SL) represents the average bit energy-to-noise ratio needed to transmit information reliably when the signal-to-noise ratio (SNR) tends to zero [15, 16], i.e., in the wideband regime. When discrete input alphabets are considered at the transmitter and a BICM decoder is used at the receiver, the SL is not always achieved, as first noticed in . This was later shown to be caused by the selection of the binary labeling . The behavior of BICM in the wideband regime was studied in [17, 19, 20, 18, 21] as a function of the alphabet and the binary labeling, assuming a uniform input distribution. First-order optimal (FOO) constellations were defined in  as the triplets of alphabet, labeling, and input distribution that make a BICM system achieve the SL.
In this paper, these results are generalized to nonuniform input distributions, and a complete characterization of FOO constellations for BICM is given. More specifically, the geometrical and/or probabilistic shaping rules that should be applied to a constellation to make it FOO are found. The main conclusion is that, for BICM in the wideband regime, probabilistic shaping offers no extra degrees of freedom beyond what is provided by geometrical shaping.
Bold italic letters denote row vectors. Block letters denote matrices or, in some cases, column vectors. The identity matrix is . The inner product between two row vectors and is denoted by and their element-wise product by . The Euclidean norm of the vector is denoted by . Random variables are denoted by capital letters and random vectors by boldface capital letters . The probability density function (pdf) of the random vector is denoted by and the conditional pdf by . A similar notation applies to the probability mass functions of a random variable, which are denoted by and . Expectations are denoted by .
The empty set is denoted by and the binary set by . The negation of a bit is denoted by . Binary addition (exclusive-OR) of two bits and is denoted by . The same notation denotes the integer that results from taking the bitwise exclusive-or of two integers and .
II-B System Model
We consider transmissions over a discrete-time memoryless vectorial fast fading channel. The received vector at any discrete time instant is
where is the channel input and is Gaussian noise with zero mean and variance in each dimension [3, App. 2.A]. The channel is represented by the -dimensional vector . It contains the real fading coefficients , which are random, possibly dependent, with the same pdf . We assume that and are perfectly known at the receiver or can be perfectly estimated, and that the technical requirements on and in [21, Sec. I-D] are satisfied.
The conditional transition pdf of the channel in (1) is
The SNR is defined as
where is the average transmitted symbol energy, is the transmission rate in information bits per symbol, and is the average received energy per information bit.
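The displayed definition is not reproduced above, but the standard relations it describes, SNR = Es/N0 and Eb = Es/R, can be sketched as follows (a minimal sketch under those assumed conventions, with R in information bits per symbol):

```python
import math

def ebn0_db(es: float, n0: float, rate: float) -> float:
    """Eb/N0 in dB, using the assumed relation Eb = Es / R,
    where R is the transmission rate in information bits per symbol."""
    eb = es / rate          # average received energy per information bit
    return 10.0 * math.log10(eb / n0)

# Example: unit symbol energy, rate 2 bits/symbol, N0 = 0.5
# gives Eb/N0 = (1/2)/0.5 = 1, i.e. 0 dB.
```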
The generic BICM scheme in Fig. 1 is considered. The transmitter is, in the simplest case, a single binary encoder concatenated with an interleaver and a memoryless mapper . Multiple encoders and/or interleavers may be needed to achieve probabilistic shaping [10, 11, 12, 13]. At the receiver, using the channel output , the demapper computes metrics for the individual coded bits with , usually in the form of logarithmic likelihood ratios. These metrics are then passed to the deinterleaver(s) and decoder(s) to obtain an estimate of the information bits.
The mapper is defined via the input alphabet , where bits are used to index the symbol vectors for . We associate with each symbol the codeword (binary labeling) and the probability , where . The binary labeling is denoted by and the input distribution by .
In the following, the labeling used throughout this paper is defined. This can be done without loss of generality, as will be explained in Sec. II-C.
Definition 1 (Natural binary code)
The natural binary code (NBC) is the binary labeling , where denotes the base-2 representation of the integer , with being the most significant bit.
This definition of the NBC is different from the one in . The difference lies only in the bit ordering, i.e., in this paper we consider the last column of to contain the most significant bits of the base-2 representation of the integers . It follows from Definition 1 that
for and , and
for , and .
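Definition 1 and its bit-ordering convention can be sketched in code. This is an illustrative sketch, not taken from the paper; the function name `nbc` is hypothetical, and the only assumption is the convention stated above, that the last column holds the most significant bit:

```python
def nbc(m: int) -> list[list[int]]:
    """Natural binary code for m bit positions: row i holds the base-2
    representation of the integer i, with the most significant bit in the
    LAST column (the bit-ordering convention adopted in this paper)."""
    return [[(i >> k) & 1 for k in range(m)] for i in range(1 << m)]

# For m = 2 the rows are [0,0], [1,0], [0,1], [1,1],
# labeling the integers 0, 1, 2, 3 in order.
```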
II-C Probabilistic Shaping in BICM
for , where for is the probability of . Since , the distribution is fully specified by the vector of bit probabilities .
Throughout this paper, we assume that for all ; i.e., all constellation points are used with nonzero probability. This can be done without loss of generality, because if or for some , then half of the constellation points will never be transmitted. If this is the case, the corresponding branches in Fig. 1 are removed, is reduced by one, and the mapper is redefined accordingly. (Constellations with for some can yield counter-intuitive results, such as Gray-labeled constellations being FOO; see [13, 14] and Example 7.) The result is another BICM scheme with identical performance, which satisfies for all .
For any constellation , a set of equivalent constellations can be constructed by permuting the rows of , , and , provided that the same permutation is applied to all three matrices. Specifically, denote the permutation that maps the NBC into the desired labeling by , i.e., . The BICM system defined by the alphabet , the distribution , and the labeling is entirely equivalent to the system with alphabet , distribution , and labeling . Without loss of generality, the analysis in this paper is therefore restricted to the latter case.
Based on the previous discussion, from now on we use the name constellation to denote the pair , where the NBC labeling is implicit. Thus, and for all and , which simplifies the analysis. Note that cannot be chosen arbitrarily in BICM; only distributions that satisfy
for some vector of bit probabilities will be considered in the paper. An important special case is the uniform distribution, for which and .
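The product-form constraint above, that the symbol distribution factors into independent per-bit probabilities, can be sketched as follows. This is an illustrative sketch (the function name is hypothetical), assuming the NBC ordering with the least significant bit in the first position:

```python
def symbol_distribution(p1: list[float]) -> list[float]:
    """Symbol probabilities induced by independent bit probabilities,
    p1[k] = P(bit k = 1). Entry i corresponds to the NBC label of the
    integer i, least significant bit first."""
    m = len(p1)
    probs = []
    for i in range(1 << m):
        p = 1.0
        for k, pk in enumerate(p1):
            bit = (i >> k) & 1
            p *= pk if bit else 1.0 - pk
        probs.append(p)
    return probs

# Uniform bit probabilities (all 0.5) recover the uniform distribution:
# symbol_distribution([0.5, 0.5]) -> [0.25, 0.25, 0.25, 0.25]
```

Note that only distributions of this product form are reachable by independently shaped bit streams, which is exactly the restriction stated above.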
II-D The Hadamard Transform
The Hadamard transform (HT), or Walsh–Hadamard transform, is a discrete, linear, orthogonal transform, whose coefficients take values in . It is popular in image processing  and can be used to analyze various aspects of binary labelings in digital communications and source coding [24, 25, 26, 21].
The HT of a matrix (or vector) with rows is
where for all and
The first transform coefficient can be interpreted as the uniformly weighted mean of the alphabet, a property that the HT shares with, e.g., the discrete Fourier transform.
It can be shown from (8) that
for all and . Therefore, the inverse transform is identical to the forward transform, apart from a scale factor:
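The self-inverse property can be checked numerically. The sketch below assumes the natural (Walsh) ordering, with matrix entries (-1) raised to the number of shared one-bits between the row and column indices, and places the 1/M normalization on the forward transform so that the inverse is the same matrix without it, matching the scale-factor remark above:

```python
import numpy as np

def hadamard(M: int) -> np.ndarray:
    """Hadamard matrix in natural order: entry (i, j) = (-1)^popcount(i & j).
    M must be a power of two."""
    return np.array([[(-1) ** bin(i & j).count("1") for j in range(M)]
                     for i in range(M)], dtype=float)

def ht(X: np.ndarray) -> np.ndarray:
    """Forward HT of the rows of X, with the 1/M normalization assumed
    on the forward transform."""
    M = X.shape[0]
    return hadamard(M) @ X / M

# Since H @ H = M * I, applying the unnormalized transform to ht(X)
# recovers X exactly: the inverse equals the forward up to scale.
```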
II-E A New Transform
In this section, we define a linear transform between vectors or matrices, which depends on the input distribution via the bit probabilities . Its usage will become clear in Section III-C.
For equally likely symbols, i.e., , the transform becomes the identity operation , because then for and for .
See the Appendix. \qed
Substituting these two cases ( and ) into (15) proves the following corollary.
For any ,
The fact that the sums in (14) are zero whenever , independently of the input distribution, implies that the coefficients form an orthogonal basis. As a consequence, the transform is invertible, as shown in the next theorem.
The inverse transform of a matrix (or vector) is, given the bit probabilities ,
It is readily verified that , which is (14) in matrix notation. The mean values in each column of (19) are , which in agreement with (16) are the square roots of the elements in . Similarly, it can be shown that in (19) satisfies (15) and (17).
If the Gray-labeled 4-ary pulse amplitude modulation (PAM) constellation is considered, . Rewriting (12) in matrix notation, the transform can be calculated as , where . This non-equally spaced 4-PAM alphabet will be illustrated and analyzed in Example 3. The inverse transform (18) can be written as . For a uniform distribution, , which agrees with Remark 1.
In Sec. IV-B, we will need to apply the HT and the new transform, one after the other, to the same alphabet. However, the two transforms do not commute, and the result will therefore depend on the order in which they are applied. Of particular interest for our analysis is the setup in Fig. 2, where and are related via the transform defined above. Their HTs and are, however, not related via the same transform. Instead, a relation between and can be established via the following theorem.
If , then their HTs and satisfy
and a product over the empty set is defined as 1.
See the Appendix. \qed
which, as predicted by Remark 2, are upper triangular.
Another relation between and can be deduced from Fig. 2. Defining the Hadamard matrix as the matrix with elements for , the HT relations (7) and (11) yield and . Since from Example 1 , we conclude that , which implies that . Because (see (10)) and , the inverse relation is . It is straightforward to verify that and calculated in this manner, using the numerical values of and in Example 1, indeed yield (23).
III BICM at Low SNR
III-A Mutual Information
The mutual information (MI) in bits per channel use between the random vectors and for an arbitrary channel parameter perfectly known at the receiver is defined as
where the expectation is taken over the joint pdf , and is given by (2).
The MI between and conditioned on the value of the th bit at the input of the modulator is defined as
where the expectation is taken over the joint pdf .
Definition 4 (BICM Generalized Mutual Information)
where the second line follows by the chain rule. We will analyze the right-hand side of (4) as a function of , for a given pdf . According to (3), can be varied in two ways, either by varying for a fixed constellation or, equivalently, by rescaling the alphabet linearly for fixed and input distribution .
Martinez et al.  recognized the BICM decoder in Fig. 1 as a mismatched decoder and showed that the BICM-GMI in (4) corresponds to an achievable rate of such a decoder. This means that reliable transmission using a BICM system at rate is possible if . Since from (3) , the inequality gives (note that the definition of the related function in [21, eq. (37)] is erroneous and should read “ is bounded from below by , where .”)
for any . Focusing on the wideband regime, i.e., asymptotically low SNR, we make the following definition.
Definition 5 (Low-GMI Parameters)
The low-GMI parameters of a constellation are defined as , where
In the wideband regime, the average bit energy-to-noise ratio needed for reliable transmission is, using (25) and the definition of , lower-bounded by
Furthermore, since in the wideband regime dB , dB.
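The SL of -1.59 dB quoted in the abstract follows directly from the standard wideband result that reliable transmission requires Eb/N0 of at least ln 2; a one-line numerical check:

```python
import math

# Shannon limit: the minimum Eb/N0 for reliable communication as the
# SNR tends to zero is ln(2) in linear scale, i.e. about -1.59 dB.
sl_linear = math.log(2.0)
sl_db = 10.0 * math.log10(sl_linear)
print(f"{sl_db:.2f} dB")  # prints -1.59 dB
```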
The first-order behavior of the BICM-GMI in (4) is fully determined by , which, as we shall see later (e.g., in (III-C)), in turn depends on and . This is why we designate this triplet as low-GMI parameters. The same definitions can be applied to other MI functions such as the coded modulation MI (CM-MI) . In this paper, however, we are only interested in the BICM-GMI.
III-B Low-GMI Parameters for Uniform Distributions
The low-GMI parameters have been analyzed in detail for arbitrary input alphabets under the assumption of uniform probabilities . Under this assumption, they can be expressed as in the following theorem.
For a constellation , the low-GMI parameters are
The low-GMI parameters can be conveniently expressed as functions of the HT of the alphabet , as shown in the following theorem.
The low-GMI parameters can be expressed as
III-C Low-GMI Parameters for Nonuniform Distributions
The next theorem is analogous to Theorem 5 but applies to an arbitrary input distribution.
For a constellation , the low-GMI parameters are
Substituting (33) and writing the squared norms as the inner products of two identical vectors yields
The expression in brackets can be simplified as
which completes the proof of (35). \qed
Theorem 7 shows that the low-GMI parameters depend on the input alphabet , the binary labeling (via in the expression for ), and the input distribution (via and ). While the low-GMI parameters of an alphabet with uniform probabilities are conveniently expressed in terms of its HT (cf. Theorem 6), no similar expressions are known for the low-GMI parameters of a general constellation in (33)–(35). This has so far prevented the analytic optimization of such constellations. The new transform introduced in Section II-E, however, solves this problem by establishing an equivalence between an arbitrary constellation, possibly with nonuniform probabilities, and another constellation with uniform probabilities.
The low-GMI parameters of any constellation are equal to the low-GMI parameters of .