Signal Shaping for BICM at Low SNR

Erik Agrell and Alex Alvarado

Research supported by The British Academy and The Royal Society (via the Newton International Fellowship scheme), U.K., and by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement No. 271986. This work was presented in part at the Information Theory and Applications (ITA) Workshop, San Diego, CA, February 2012, and at the IEEE International Symposium on Information Theory, Cambridge, MA, July 2012.

E. Agrell is with the Dept. of Signals and Systems, Chalmers Univ. of Technology, SE-41296 Göteborg, Sweden. A. Alvarado is with the Dept. of Engineering, University of Cambridge, Cambridge CB2 1PZ, United Kingdom.

The generalized mutual information (GMI) of bit-interleaved coded modulation (BICM) systems, sometimes called the BICM capacity, is investigated at low signal-to-noise ratio (SNR). The combinations of input alphabet, input distribution, and binary labeling that achieve the Shannon limit –1.59 dB are completely characterized. The main conclusion is that a BICM system with probabilistic shaping achieves the Shannon limit at low SNR if and only if it can be represented as a zero-mean linear projection of a hypercube. Hence, probabilistic shaping offers no extra degrees of freedom to optimize the low-SNR BICM-GMI, in addition to what is provided by geometrical shaping. The analytical conclusions are confirmed by numerical results, which also show that for a fixed input alphabet, probabilistic shaping can improve the BICM-GMI in the low and medium SNR range.

Binary labeling, bit-interleaved coded modulation, generalized mutual information, Hadamard transform, probabilistic shaping, Shannon limit, wideband regime.

I Introduction

The most important breakthrough for coded modulation (CM) in fading channels came in 1992, when Zehavi introduced the so-called bit-interleaved coded modulation (BICM) [1], usually referred to as a pragmatic approach for CM [2, 3]. Despite not being fully understood theoretically, BICM has been rapidly adopted in commercial systems such as wireless and wired broadband access networks, 3G/4G telephony, and digital video broadcasting, making it the de facto standard for current telecommunications systems [3, Ch. 1].

Signal shaping refers to the use of non-equally spaced and/or non-equally likely symbols, i.e., geometrical shaping and probabilistic shaping, respectively. Signal shaping has been studied for many years; cf. [4, 5] and references therein. In the context of BICM, geometrical shaping was studied in [6, 7, 8], and probabilistic shaping, i.e., varying the probabilities of the bit streams, was first proposed in [9, 10] and developed further in [11, 12, 13, 14]. Probabilistic shaping offers another degree of freedom in the BICM design, which can be used to make the discrete input distribution more similar to the optimal distribution (which is in general unknown). This is particularly advantageous at low and medium SNR.

For the additive white Gaussian noise (AWGN) channel, the so-called Shannon limit (SL) represents the average bit energy-to-noise ratio needed to transmit information reliably when the signal-to-noise ratio (SNR) tends to zero [15, 16], i.e., in the wideband regime. When discrete input alphabets are considered at the transmitter and a BICM decoder is used at the receiver, the SL is not always achieved, as first noticed in [17]. This was later shown to be caused by the selection of the binary labeling [18]. The behavior of BICM in the wideband regime was studied in [17, 19, 20, 18, 21] as a function of the alphabet and the binary labeling, assuming a uniform input distribution. First-order optimal (FOO) constellations were defined in [21] as the triplets of alphabet, binary labeling, and input distribution that make a BICM system achieve the SL.

In this paper, we generalize the results of [21] to nonuniform input distributions and give a complete characterization of FOO constellations for BICM in terms of the alphabet, the binary labeling, and the input distribution. More particularly, we find the geometrical and/or probabilistic shaping rules that should be applied to a constellation to make it FOO. The main conclusion is that probabilistic shaping offers no extra degrees of freedom in addition to what is provided by geometrical shaping for BICM in the wideband regime.

Fig. 1: A generic BICM system, consisting of a BICM transmitter, the channel, and a BICM receiver.

II Preliminaries

II-A Notation

Bold italic letters denote row vectors. Block letters denote matrices or sometimes column vectors. The identity matrix is . The inner product between two row vectors and is denoted by and their element-wise product by . The Euclidean norm of the vector is denoted by . Random variables are denoted by capital letters and random vectors by boldface capital vectors . The probability density function (pdf) of the random vector is denoted by and the conditional pdf by . A similar notation applies to probability mass functions of a random variable, which are denoted by and . Expectations are denoted by .

The empty set is denoted by and the binary set by . The negation of a bit is denoted by . Binary addition (exclusive-OR) of two bits and is denoted by . The same notation denotes the integer that results from taking the bitwise exclusive-or of two integers and .

II-B System Model

We consider transmissions over a discrete-time memoryless vectorial fast fading channel. The received vector at any discrete time instant is


where is the channel input and is Gaussian noise with zero mean and variance in each dimension [1], [3, App. 2.A]. The channel is represented by the -dimensional vector . It contains the real fading coefficients , which are random, possibly dependent, with the same pdf . We assume that and are perfectly known at the receiver or can be perfectly estimated, and that the technical requirements on and in [21, Sec. I-D] are satisfied.

The conditional transition pdf of the channel in (1) is


The SNR is defined as


where is the average transmitted symbol energy, is the transmission rate in information bits per symbol, and is the average received energy per information bit.

The generic BICM scheme in Fig. 1 is considered. The transmitter is, in the simplest case, a single binary encoder concatenated with an interleaver and a memoryless mapper . Multiple encoders and/or interleavers may be needed to achieve probabilistic shaping [10, 11, 12, 13]. At the receiver, using the channel output , the demapper computes metrics for the individual coded bits with , usually in the form of logarithmic likelihood ratios. These metrics are then passed to the deinterleaver(s) and decoder(s) to obtain an estimate of the information bits.

The mapper is defined via the input alphabet , where bits are used to index the symbol vectors for . We associate with each symbol the codeword (binary labeling) and the probability , where . The binary labeling is denoted by and the input distribution by .

In the following, the labeling used throughout this paper is defined. This can be done without loss of generality, as will be explained in Sec. II-C.

Definition 1 (Natural binary code)

The natural binary code (NBC) is the binary labeling , where denotes the base-2 representation of the integer , with being the most significant bit.

This definition of the NBC is different from the one in [21]. The difference lies only in the bit ordering, i.e., in this paper we consider the last column of to contain the most significant bits of the base-2 representation of the integers . It follows from Definition 1 that


for and , and


for , and .
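As an illustration, the following Python sketch builds the NBC for a given number of bits and checks two elementary properties of the kind used in this section: each bit of the bitwise exclusive-or of two integers equals the exclusive-or of the corresponding bits, and the integer 2^l has a single nonzero bit, in position l. The function names `nbc_bit` and `nbc_labeling` and the value of `m` are hypothetical and chosen for this sketch only. Bit position 0 is assumed to be the least significant bit, so that the last column of the labeling holds the most significant bits.

```python
def nbc_bit(i: int, k: int) -> int:
    """Return bit k of the integer i, where k = 0 is the least significant bit."""
    return (i >> k) & 1


def nbc_labeling(m: int) -> list[list[int]]:
    """Return the 2^m-by-m NBC; row i labels symbol i, MSB in the last column."""
    return [[nbc_bit(i, k) for k in range(m)] for i in range(2 ** m)]


m = 3  # hypothetical number of bits per symbol
labeling = nbc_labeling(m)

# Bitwise exclusive-or of two integers acts bit by bit on their labels.
xor_is_bitwise = all(
    nbc_bit(i ^ j, k) == nbc_bit(i, k) ^ nbc_bit(j, k)
    for i in range(2 ** m)
    for j in range(2 ** m)
    for k in range(m)
)

# The integer 2^l has exactly one nonzero bit, located at position l.
powers_are_unit_labels = all(
    nbc_bit(2 ** l, k) == (1 if k == l else 0)
    for l in range(m)
    for k in range(m)
)
```

With this bit ordering, the label of the integer 1 is `[1, 0, 0]` and the label of the integer 4 is `[0, 0, 1]`.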

II-C Probabilistic Shaping in BICM

Assuming independent, but possibly nonuniformly distributed, bits at the input of the modulator (cf. Fig. 1), the symbol probabilities are given by [21, eq. (30)] [13, eq. (8)] [22, eq. (9)]

for , where for is the probability of . Since , the distribution is fully specified by the vector of bit probabilities .

Throughout this paper, we assume that for all ; i.e., all constellation points are used with a nonzero probability. This can be done without loss of generality, because if or for some , then half of the constellation points will never be transmitted. If this is the case, the corresponding branches in Fig. 1 are removed, is reduced by one, and the mapper is redefined accordingly (see footnote 1). The result is another BICM scheme with identical performance, which satisfies for all .

Footnote 1: Constellations with for some can yield counter-intuitive results, such as Gray-labeled constellations being FOO (see [13, 14] and Example 7).

For any constellation , a set of equivalent constellations can be constructed by permuting the rows of , , and , provided that the same permutation is applied to all three matrices. Specifically, denote the permutation that maps the NBC into the desired labeling by , i.e., . The BICM system defined by the alphabet , the distribution , and the labeling is entirely equivalent to the system with alphabet , distribution , and labeling . Without loss of generality, the analysis in this paper is therefore restricted to the latter case.

Based on the previous discussion, from now on we use the name constellation to denote the pair , where the NBC labeling is implicit. Thus, and for all and , which simplifies the analysis. Note that cannot be chosen arbitrarily in BICM; only distributions that satisfy


for some vector of bit probabilities will be considered in the paper. An important special case is the uniform distribution, for which and .
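A minimal sketch of this product rule is given below, assuming the NBC labeling with bit position 0 as the least significant bit. The function name `symbol_probabilities` and the numerical bit probabilities are hypothetical and serve only as an illustration; they are not taken from the examples in this paper.

```python
import math


def symbol_probabilities(p_zero):
    """Symbol probabilities for independent bits under the NBC labeling.

    p_zero[k] is the probability that bit k equals 0 (an assumed input format).
    Entry i of the result is the product of the bit probabilities selected by
    the bits of the integer i.
    """
    m = len(p_zero)
    probs = []
    for i in range(2 ** m):
        p = 1.0
        for k in range(m):
            bit = (i >> k) & 1
            p *= p_zero[k] if bit == 0 else 1.0 - p_zero[k]
        probs.append(p)
    return probs


uniform = symbol_probabilities([0.5, 0.5])  # equally likely bits
shaped = symbol_probabilities([0.7, 0.4])   # hypothetical nonuniform bits
total = math.fsum(shaped)                   # a valid distribution sums to one
```

Equally likely bits give equally likely symbols, whereas the hypothetical nonuniform bits above give four distinct symbol probabilities that still sum to one. Not every distribution over four symbols can be produced in this way, since only two free parameters are available.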

II-D The Hadamard Transform

The Hadamard transform (HT), or Walsh–Hadamard transform, is a discrete, linear, orthogonal transform, whose coefficients take values in {−1, +1}. It is popular in image processing [23] and can be used to analyze various aspects of binary labelings in digital communications and source coding [24, 25, 26, 21].

Definition 2

The HT of a matrix (or vector) with rows is


where for all and


Because for , setting in (7)–(8) shows that the first HT vector


can be interpreted as the uniformly weighted mean of the alphabet. This is a property that the HT shares with, e.g., the discrete Fourier transform.
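The sketch below illustrates the HT for a one-dimensional alphabet. It assumes the standard Walsh–Hadamard coefficients in natural order, i.e., the coefficient for indices i and j is +1 or −1 according to the parity of the number of bit positions in which both i and j have a one, and it assumes that the forward transform carries the scale factor 1/M so that the first HT element is the mean of the alphabet. The function names and the 4-PAM alphabet are hypothetical. The sketch also checks that applying the same coefficients once more, without the scale factor, recovers the alphabet.

```python
def hadamard_coeff(i: int, j: int) -> int:
    """Return +1 or -1 from the parity of the common ones in the bits of i and j."""
    return -1 if bin(i & j).count("1") % 2 else 1


def hadamard_transform(x):
    """Forward HT of a list of M = 2^m scalars, scaled by 1/M (assumed convention)."""
    M = len(x)
    return [sum(x[j] * hadamard_coeff(i, j) for j in range(M)) / M for i in range(M)]


def inverse_hadamard_transform(xt):
    """Inverse HT: the same coefficients as the forward transform, without scaling."""
    M = len(xt)
    return [sum(xt[i] * hadamard_coeff(j, i) for i in range(M)) for j in range(M)]


alphabet = [-3.0, -1.0, 1.0, 3.0]  # hypothetical equally spaced 4-PAM alphabet
transformed = hadamard_transform(alphabet)
recovered = inverse_hadamard_transform(transformed)

# Orthogonality of the coefficients: distinct columns have zero inner product.
M = len(alphabet)
gram = [
    [sum(hadamard_coeff(i, j) * hadamard_coeff(i, l) for i in range(M)) for l in range(M)]
    for j in range(M)
]
```

For this alphabet the first HT element is zero, reflecting a zero-mean alphabet, and only two of the four HT elements are nonzero.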

It can be shown from (8) that


for all and . Therefore, the inverse transform is identical to the forward transform, apart from a scale factor:


II-E A New Transform

In this section, we define a linear transform between vectors or matrices, which depends on the input distribution via the bit probabilities . Its usage will become clear in Section III-C.

Definition 3

Given the bit probabilities , the transform of a matrix (or vector) is


where is given by (6). The coefficients are defined as


for all and , where the bars represent negation (, see Sec. II-A).

Remark 1

For equally likely symbols, i.e., , the transform becomes the identity operation , because then for and for .

The transform coefficients are nonsymmetric in the sense that in general . They have some appealing properties given by the following lemma, which will be used in the proofs of Theorems 3, 4, and 8.

Lemma 1

For any and ,


where is given by (6) and is defined in (8).


Proof: See the Appendix. ∎

We pay particular attention to two important special cases of (15). First, if , then and for . Second, if for any integer , then by (8), for any and by (6)

Using first (5) and then (4), we obtain

Substituting these two cases ( and ) into (15) proves the following corollary.

Corollary 2

For any ,


The fact that the sums in (14) are zero whenever , independently of the input distribution, implies that the coefficients form an orthogonal basis. As a consequence, the transform is invertible, as shown in the next theorem.

Theorem 3

The inverse transform of a matrix (or vector) is, given the bit probabilities ,


Proof: For ,

Applying (14) and dividing both sides by , which by Sec. II-C is nonzero, completes the proof. ∎

Example 1

If the bit probabilities are , then the symbol probabilities (6) are . The transform coefficients in (13) are the elements at row , column of


It is readily verified that , which is (14) in matrix notation. The mean values in each column of (19) are , which in agreement with (16) are the square roots of the elements in . Similarly, it can be shown that in (19) satisfies (15) and (17).

If the Gray-labeled -ary pulse amplitude modulation (PAM) constellation is considered, . Rewriting (12) in matrix notation, the transform can be calculated as , where . This nonequally spaced 4-PAM alphabet will be illustrated and analyzed in Example 3. The inverse transform (18) can be written as . For a uniform distribution, , which agrees with Remark 1.

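[Diagram: the alphabet and its transform are linked by Def. 3 and Theorem 3, each is linked to its Hadamard transform by the HT, and the two Hadamard transforms are linked by Theorem 4.]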

Fig. 2: The relations between the alphabet , its transform , and their respective Hadamard transforms and . The transform matrices , , , and are defined in Examples 1 and 2.

In Sec. IV-B, we will need to apply the HT and the new transform in succession to the same alphabet. However, the two transforms do not commute, and the result therefore depends on the order in which they are applied. Of particular interest for our analysis is the setup in Fig. 2, where and are related via the transform defined above. Their HTs and are, however, not related via the same transform. Instead, a relation between and can be established via the following theorem.

Theorem 4

If , then their HTs and satisfy




and a product over is defined as 1.


Proof: See the Appendix. ∎

Remark 2

The summation in (20) can be confined to , because whenever , there exists at least one bit position for which . Analogously, the summation in (21) can be confined to .

Example 2

Expression (20) can be written as , or . The elements at row , column of and are given by (20)–(21) as, respectively, and . With , and from Example 1, we obtain and


which, as predicted by Remark 2, are upper triangular.

Another relation between and can be deduced from Fig. 2. Defining the Hadamard matrix as the matrix with elements for , the HT relations (7) and (11) yield and . Since from Example 1 , we conclude that , which implies that . Because (see (10)) and , the inverse relation is . It is straightforward to verify that and calculated in this manner, using the numerical values of and in Example 1, indeed yield (23).

III BICM at Low SNR

III-A Mutual Information

The mutual information (MI) in bits per channel use between the random vectors and for an arbitrary channel parameter perfectly known at the receiver is defined as

where the expectation is taken over the joint pdf , and is given by (2).

The MI between and conditioned on the value of the th bit at the input of the modulator is defined as

where the expectation is taken over the joint pdf .

Definition 4 (BICM Generalized Mutual Information)

The BICM generalized mutual information (BICM-GMI) is defined as [2, 17, 27, 18]


where the second line follows by the chain rule. We will analyze the right-hand side of (24) as a function of , for a given pdf . According to (3), can be varied in two ways, either by varying for a fixed constellation or, equivalently, by rescaling the alphabet linearly for fixed and input distribution .

Martinez et al. [27] recognized the BICM decoder in Fig. 1 as a mismatched decoder and showed that the BICM-GMI in (24) corresponds to an achievable rate of such a decoder. This means that reliable transmission using a BICM system at rate is possible if . Since from (3) , the inequality gives (see footnote 2)


for any . Focusing on the wideband regime, i.e., asymptotically low SNR, we make the following definition.

Footnote 2: The definition of the related function in [21, eq. (37)] is erroneous and should read “ is bounded from below by , where .”

Definition 5 (Low-GMI Parameters)

The low-GMI parameters of a constellation are defined as , where

In the wideband regime, the average bit energy-to-noise ratio needed for reliable transmission is, using (25) and the definition of , lower-bounded by


Furthermore, since in the wideband regime dB [15], dB.
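The numerical value of the SL can be checked with a short sketch. It relies on two assumptions that are not part of the BICM analysis above: the rate function is taken to be the AWGN capacity log2(1 + SNR) instead of the BICM-GMI, and the channel has unit gain, so that the bit energy-to-noise ratio is the SNR divided by the rate. The function names are hypothetical.

```python
import math


def shannon_limit_db() -> float:
    """Return 10*log10(ln 2), the wideband limit of Eb/N0 in dB."""
    return 10.0 * math.log10(math.log(2.0))


def ebn0_db_awgn(snr: float) -> float:
    """Eb/N0 in dB at the AWGN capacity for a given SNR (unit-gain channel assumed)."""
    rate = math.log1p(snr) / math.log(2.0)  # log2(1 + snr), accurate for small snr
    return 10.0 * math.log10(snr / rate)


limit_db = shannon_limit_db()
low_snr_db = ebn0_db_awgn(1e-6)  # approaches the limit as the SNR tends to zero
unit_snr_db = ebn0_db_awgn(1.0)  # one bit per symbol requires Eb/N0 = 0 dB
```

The required bit energy-to-noise ratio decreases toward approximately −1.59 dB as the SNR tends to zero, and is strictly larger at any positive SNR.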

The first-order behavior of the BICM-GMI in (24) is fully determined by , which, as we shall see later (e.g., in Sec. III-C), in turn depends on and . This is why we designate this triplet as low-GMI parameters. The same definitions can be applied to other MI functions such as the coded modulation MI (CM-MI) [21]. In this paper, however, we are only interested in the BICM-GMI.

The main contributions of this paper are to characterize the low-GMI parameters for arbitrary constellations, including those with nonuniform distributions (Sec. III-C), and to identify the set of constellations for BICM that maximize , i.e., minimize in the wideband regime (Sec. IV-B).

III-B Low-GMI Parameters for Uniform Distributions

The low-GMI parameters have been analyzed in detail for arbitrary input alphabets under the assumption of uniform probabilities [21]. Under this assumption, they can be expressed as given by the following theorem.

Theorem 5

For a constellation , the low-GMI parameters are


Proof: Expressions (27) and (28) follow directly from Definition 5, while (29) was proved in [21, eq. (50)]. ∎

The low-GMI parameters can be conveniently expressed as functions of the HT of the alphabet , as shown in the following theorem.

Theorem 6

The low-GMI parameters can be expressed as


Proof: The expression (30) is obtained from (9), (31) from [21, eq. (16)], and (32) from [21, Th. 11]. ∎

III-C Low-GMI Parameters for Nonuniform Distributions


[Table I body: the low-GMI parameters and the FOO condition, listed for uniform and nonuniform input distributions.]
TABLE I: Low-GMI parameters and FOO conditions for BICM using uniform and nonuniform input distributions. The results for are from [21] (cf. Theorems 6 and 9) and the ones for or are from Theorems 7, 8, and 11.

The next theorem is analogous to Theorem 5 but applies to an arbitrary input distribution.

Theorem 7

For a constellation , the low-GMI parameters are


Proof: Again, (33) and (34) follow from Definition 5, while (35) requires some analysis. It was shown in [21, Th. 10] that


Substituting (33) and writing the squared norms as the inner products of two identical vectors yields

The expression in brackets can be simplified as

which completes the proof of (35). ∎

Theorem 7 shows that the low-GMI parameters depend on the input alphabet , the binary labeling (via in the expression for ), and the input distribution (via and ). While the low-GMI parameters of an alphabet with uniform probabilities are conveniently expressed in terms of its HT (cf. Theorem 6), no similar expressions are known for the low-GMI parameters of a general constellation in (33)–(35). This has so far prevented the analytic optimization of such constellations. The new transform introduced in Section II-E, however, solves this problem by establishing an equivalence between an arbitrary constellation, possibly with nonuniform probabilities, and another constellation with uniform probabilities.

Theorem 8

The low-GMI parameters of any constellation are equal to the low-GMI parameters of .


Proof: Let the low-GMI parameters of be denoted by . First, (27) and (12) yield


Applying (16) to the inner sum above reveals that

Second, (28) and (12) yield

Evaluating the inner sum using (14) gives

For the third and last part of the theorem, (29) yields