Polar Codes for Broadcast Channels

Naveen Goela, Emmanuel Abbe, and Michael Gastpar

This work was presented in part at the International Zurich Seminar on Communications, Zurich, Switzerland, on March 1, 2012, and submitted in part to the IEEE International Symposium on Information Theory in January 2013.

N. Goela and M. C. Gastpar are with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94720-1770 USA (e-mail: {ngoela, gastpar}@eecs.berkeley.edu) and also with the School of Computer and Communication Sciences, Ecole Polytechnique Fédérale (EPFL), Lausanne, Switzerland (e-mail: {naveen.goela, michael.gastpar}@epfl.ch).

E. Abbe was with the School of Computer and Communication Sciences, Ecole Polytechnique Fédérale (EPFL), Lausanne, Switzerland, and is currently with the School of Engineering and Applied Sciences, Princeton University, Princeton, NJ, 08544 USA (e-mail: eabbe@princeton.edu).

Polar codes are introduced for discrete memoryless broadcast channels. For $m$-user deterministic broadcast channels, polarization is applied to map uniformly random message bits from $m$ independent messages to one codeword while satisfying broadcast constraints. The polarization-based codes achieve rates on the boundary of the private-message capacity region. For two-user noisy broadcast channels, polar implementations are presented for two information-theoretic schemes: i) Cover’s superposition codes; ii) Marton’s codes. Due to the structure of polarization, constraints on the auxiliary and channel-input distributions are identified to ensure proper alignment of polarization indices in the multi-user setting. The codes achieve rates on the capacity boundary of a few classes of broadcast channels (e.g., binary-input stochastically degraded). The complexity of encoding and decoding is $O(n \log n)$ where $n$ is the block length. In addition, polar code sequences obtain a stretched-exponential decay of $O(2^{-n^{\beta}})$ of the average block error probability where $0 < \beta < \frac{1}{2}$.

Polar Codes, Deterministic Broadcast Channel, Cover’s Superposition Codes, Marton’s Codes.

I Introduction

Introduced by T. M. Cover in 1972, the broadcast problem consists of a single source transmitting $m$ independent private messages to $m$ receivers through a single discrete, memoryless, broadcast channel (DM-BC) [1]. The private-message capacity region is known if the channel structure is deterministic, degraded, less-noisy, or more-capable [2]. For general classes of DM-BCs, there exist inner bounds such as Marton’s inner bound [3] and outer bounds such as the Nair-El-Gamal outer bound [4]. One difficult aspect of the broadcast problem is to design an encoder which maps $m$ independent messages to a single codeword of $n$ symbols which are transmitted simultaneously to all receivers. Several codes relying on random binning, superposition, and Marton’s strategy have been analyzed in the literature (see e.g., the overview in [5]).

I-A Overview of Contributions

The present paper focuses on low-complexity codes for broadcast channels based on polarization methods. Polar codes were invented originally by Arıkan and were shown to achieve the capacity of binary-input, symmetric, point-to-point channels with encoding and decoding complexity $O(n \log n)$ where $n$ is the code length [6]. In this paper, we obtain the following results.

  • Polar codes for deterministic, linear and non-linear, binary-output, $m$-user DM-BCs (cf. [7]). The capacity-achieving broadcast codes implement low-complexity random binning, and are related to polar codes for other multi-user scenarios such as Slepian-Wolf distributed source coding [8, 9], and multiple-access channel (MAC) coding [10]. For deterministic DM-BCs, the polar transform is applied to channel output variables. Polarization is useful for shaping uniformly random message bits from $m$ independent messages into non-equiprobable codeword symbols in the presence of hard broadcast constraints. As discussed in Section I-B1 and referenced in [11, 12, 13], it is difficult to design low-density parity-check (LDPC) codes or belief propagation algorithms for the deterministic DM-BC due to multi-user broadcast constraints.

  • Polar codes for general two-user DM-BCs based on Cover’s superposition coding strategy. In the multi-user setting, constraints on the auxiliary and channel-input distributions are placed to ensure alignment of polarization indices. The achievable rates lie on the boundary of the capacity region for certain classes of DM-BCs such as binary-input stochastically degraded channels.

  • Polar codes for general two-user DM-BCs based on Marton’s coding strategy. In the multi-user setting, due to the structure of polarization, constraints on the auxiliary and channel-input distributions are identified to ensure alignment of polarization indices. The achievable rates lie on the boundary of the capacity region for certain classes of DM-BCs such as binary-input semi-deterministic channels.

  • For the above broadcast polar codes, the asymptotic decay of the average error probability under successive cancellation decoding at the broadcast receivers is established to be $O(2^{-n^{\beta}})$ where $0 < \beta < \frac{1}{2}$. The error probability is analyzed by averaging over polar code ensembles. In addition, properties such as the chain rule of the Kullback-Leibler divergence between discrete probability measures are exploited.

Throughout the paper, for different broadcast coding strategies, a systems-level block diagram of the communication channel and polar transforms is provided.

I-B Relation to Prior Work

I-B1 Deterministic Broadcast Channels

The deterministic broadcast channel has received considerable attention in the literature (e.g. due to related extensions such as secure broadcast, broadcasting with side information, and index coding [14, 15]). Several practical codes have been designed. For example, the authors of [11] propose sparse linear coset codes to emulate random binning and survey propagation to enforce broadcast channel constraints. In [12], the authors propose enumerative source coding and Luby-Transform codes for deterministic DM-BCs specialized to interference-management scenarios. Additional research includes reinforced belief propagation with non-linear coding [13]. To our knowledge, polarization-based codes provide provable guarantees for achieving rates on the capacity-boundary in the general case.

I-B2 Polar Codes for Multi-User Settings

Subsequent to the derivation of channel polarization in [6] and the refined rate of polarization in [16], polarization methods have been extended to analyze multi-user information theory problems. In [10], a joint polarization method is proposed for $m$-user MACs with connections to matroid theory. Polar codes were extended for several other multi-user settings: arbitrarily-permuted parallel channels [17], degraded relay channels [18], cooperative relaying [19], and wiretap channels [20, 21, 22]. In addition, several binary multi-user communication scenarios including the Gelfand-Pinsker problem and the Wyner-Ziv problem were analyzed in [23, Chapter 4]. Polar codes for lossless and lossy source compression were investigated respectively in [8] and [24]. In [8], source polarization was extended to the Slepian-Wolf problem involving distributed sources. The approach is based on an “onion-peeling” encoding of sources, whereas a joint encoding is proposed in [25]. In [9], a unified approach is provided for the Slepian-Wolf problem based on generalized monotone chain rules of entropy. To our knowledge, the design of polarization-based broadcast codes is relatively new.

I-B3 Binary vs. $q$-ary Polarization

The broadcast codes constructed in the present paper for DM-BCs are based on polarization for binary random variables. However, in extending to arbitrary alphabet sizes, a large body of prior work exists and has focused on generalized constructions and kernels [26], and generalized polarization for $q$-ary random variables and $q$-ary channels [27, 28, 29, 30]. The reader is also referred to the monograph in [31] containing a clear overview of polarization methods.

I-C Notation

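An index set $\{1, 2, \ldots, m\}$ is abbreviated as $[m]$. An $m \times n$ matrix array of random variables is comprised of variables $Y_i(j)$ where $i \in [m]$ represents the row and $j \in [n]$ the column. The notation $Y_i^{k:\ell} \triangleq \{Y_i(k), Y_i(k+1), \ldots, Y_i(\ell)\}$ is used for $k \leq \ell$. When clear by context, the term $Y_i^{n}$ represents $Y_i^{1:n}$. In addition, the notation for the random variable $Y_i(j)$ is used interchangeably with $Y_i^{j}$. The notation $f(n) = O(g(n))$ means that there exists a constant $\kappa$ such that $f(n) \leq \kappa g(n)$ for sufficiently large $n$. For a set $\mathcal{S}$, $\mathrm{clo}(\mathcal{S})$ represents set closure, and $\mathrm{co}(\mathcal{S})$ the convex hull operation over set $\mathcal{S}$. Let $h_b(x) = -x \log_2 x - (1-x) \log_2 (1-x)$ denote the binary entropy function. Let $a * b \triangleq (1-a)b + a(1-b)$.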

Fig. 1: Blackwell Channel: An example of a deterministic broadcast channel with $m = 2$ broadcast users. The channel is defined as $y_1 = f_1(x)$ and $y_2 = f_2(x)$ where the non-linear functions $f_1(x) = \max(x - 1, 0)$ and $f_2(x) = \min(x, 1)$. The private-message capacity region of the Blackwell channel is drawn. For different input distributions $P_X(x)$, the achievable rate points are contained within corresponding polyhedrons in $\mathbb{R}_{+}^{2}$.

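II Model

Definition 1 (Discrete, Memoryless Broadcast Channel)

The discrete memoryless broadcast channel (DM-BC) with $m$ broadcast receivers consists of a discrete input alphabet $\mathcal{X}$, discrete output alphabets $\mathcal{Y}_i$ for $i \in [m]$, and a conditional distribution $P_{Y_1 Y_2 \cdots Y_m | X}(y_1, y_2, \ldots, y_m | x)$ where $x \in \mathcal{X}$ and $y_i \in \mathcal{Y}_i$.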

Definition 2 (Private Messages)

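For a DM-BC with $m$ broadcast receivers, there exist $m$ private messages $\{W_i\}_{i \in [m]}$ such that each message $W_i$ is composed of $nR_i$ bits and $(W_1, W_2, \ldots, W_m)$ is uniformly distributed over $[2^{nR_1}] \times [2^{nR_2}] \times \cdots \times [2^{nR_m}]$.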

Definition 3 (Channel Encoding and Decoding)

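For the DM-BC with $m$ independent messages, let the vector of rates $\vec{R} \triangleq (R_1, R_2, \ldots, R_m)$. An $(\vec{R}, n)$ code for the DM-BC consists of one encoder

$$x^{n} : [2^{nR_1}] \times [2^{nR_2}] \times \cdots \times [2^{nR_m}] \to \mathcal{X}^{n},$$

and $m$ decoders specified by $\hat{W}_i : \mathcal{Y}_i^{n} \to [2^{nR_i}]$ for $i \in [m]$. Based on received observations $\{Y_i(j)\}_{j \in [n]}$, each decoder outputs a decoded message $\hat{W}_i$.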

Definition 4 (Average Probability of Error)

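The average probability of error $P_e^{(n)}$ for a DM-BC code is defined to be the probability that the decoded message at one or more receivers is not equal to the transmitted message,

$$P_e^{(n)} = \mathbb{P}\left\{ \bigcup_{i=1}^{m} \left\{ \hat{W}_i \neq W_i \right\} \right\}.$$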
Definition 5 (Private-Message Capacity Region)

If there exists a sequence of $(\vec{R}, n)$ codes with $P_e^{(n)} \to 0$ as $n \to \infty$, then the rates $\vec{R}$ are achievable. The private-message capacity region is the closure of the set of achievable rates.

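III Deterministic Broadcast Channels

Definition 6 (Deterministic DM-BC)

Define deterministic functions $f_i(x) : \mathcal{X} \to \mathcal{Y}_i$ for $i \in [m]$. The deterministic DM-BC with $m$ receivers is defined by the following conditional distribution

$$P_{Y_1 Y_2 \cdots Y_m | X}(y_1, y_2, \ldots, y_m | x) = \prod_{i=1}^{m} \mathbb{1}\left[ y_i = f_i(x) \right].$$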
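III-A Capacity Region

Proposition 1 (Marton [32], Pinsker [33])

The capacity region of the deterministic DM-BC includes those rate-tuples $\vec{R}$ in the region

$$\mathcal{C} \triangleq \mathrm{co}\left( \mathrm{clo}\left( \bigcup_{X, \{Y_i\}_{i \in [m]}} \mathcal{R}\left(X, \{Y_i\}_{i \in [m]}\right) \right) \right), \tag{2}$$

where the polyhedral region $\mathcal{R}(X, \{Y_i\}_{i \in [m]})$ is given by

$$\mathcal{R} \triangleq \left\{ \vec{R} \;:\; \sum_{i \in \mathcal{S}} R_i \leq H\left( \{Y_i\}_{i \in \mathcal{S}} \right), \; \forall \mathcal{S} \subseteq [m] \right\}. \tag{3}$$

The union in Eqn. (2) is over all random variables $X, Y_1, \ldots, Y_m$ with joint distribution induced by $P_X(x)$ and $Y_i = f_i(X)$.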

Example 1 (Blackwell Channel)

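In Figure 1, the Blackwell channel is depicted with $\mathcal{X} = \{0, 1, 2\}$ and $\mathcal{Y}_i = \{0, 1\}$. For any fixed distribution $P_X(x)$, it is seen that $P_{Y_1 Y_2}(y_1, y_2)$ has zero mass for the pair $(1, 0)$. Let $\alpha \in [\frac{1}{2}, \frac{2}{3}]$. Due to the symmetry of this channel, the capacity region is the union of two regions,

$$\left\{ (R_1, R_2) : R_1 \leq h_b(\alpha), \; R_2 \leq h_b\left(\tfrac{\alpha}{2}\right), \; R_1 + R_2 \leq h_b(\alpha) + \alpha \right\} \cup \left\{ (R_1, R_2) : R_1 \leq h_b\left(\tfrac{\alpha}{2}\right), \; R_2 \leq h_b(\alpha), \; R_1 + R_2 \leq h_b(\alpha) + \alpha \right\},$$

where the first region is achieved with input distribution $P_X(0) = P_X(1) = \frac{\alpha}{2}$, and the second region is achieved with $P_X(1) = P_X(2) = \frac{\alpha}{2}$ [2, Lec. 9]. The sum rate is maximized for a uniform input distribution which yields a pentagonal achievable rate region: $R_1 \leq h_b(\frac{1}{3})$, $R_2 \leq h_b(\frac{1}{3})$, $R_1 + R_2 \leq \log_2 3$. Figure 1 illustrates the capacity region.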
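The entropies that bound the polyhedral region for a fixed input distribution can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the Blackwell channel maps are f1(x) = max(x - 1, 0) and f2(x) = min(x, 1) on the input alphabet {0, 1, 2}; the helper names `entropy`, `output_pmfs`, and `vertex_rates` are hypothetical and introduced only for illustration.

```python
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def output_pmfs(p_x, f1, f2):
    """Marginal and joint output pmfs induced by an input pmf and two deterministic maps."""
    p_y1, p_y2, p_joint = {}, {}, {}
    for x, p in p_x.items():
        y1, y2 = f1(x), f2(x)
        p_y1[y1] = p_y1.get(y1, 0.0) + p
        p_y2[y2] = p_y2.get(y2, 0.0) + p
        p_joint[(y1, y2)] = p_joint.get((y1, y2), 0.0) + p
    return p_y1, p_y2, p_joint

def vertex_rates(p_x, f1, f2):
    """Two vertex points (R1, R2) of the two-user polyhedral region for a fixed input pmf."""
    p_y1, p_y2, p_joint = output_pmfs(p_x, f1, f2)
    h1, h2, h12 = entropy(p_y1), entropy(p_y2), entropy(p_joint)
    return {
        "user1_first": (h1, h12 - h1),  # (H(Y1), H(Y2 | Y1))
        "user2_first": (h12 - h2, h2),  # (H(Y1 | Y2), H(Y2))
    }

# ASSUMPTION: Blackwell channel maps on the input alphabet {0, 1, 2}.
def f1(x):
    return max(x - 1, 0)

def f2(x):
    return min(x, 1)

uniform = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
rates = vertex_rates(uniform, f1, f2)
for name, (r1, r2) in rates.items():
    print(name, round(r1, 4), round(r2, 4), "sum:", round(r1 + r2, 4))
```

Under these assumptions both vertex points have sum rate equal to the joint output entropy, which for the uniform input is the entropy of a uniform ternary variable.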

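III-B Main Result

Theorem 1 (Polar Code for Deterministic DM-BC)

Consider an $m$-user deterministic DM-BC with arbitrary discrete input alphabet $\mathcal{X}$, and binary output alphabets $\mathcal{Y}_i = \{0, 1\}$. Fix input distribution $P_X(x)$ where $x \in \mathcal{X}$ and constant $0 < \beta < \frac{1}{2}$. Let $\pi : [m] \to [m]$ be a permutation on the index set of receivers. Let the vector

$$\vec{R} \triangleq \left( R_{\pi(1)}, R_{\pi(2)}, \ldots, R_{\pi(m)} \right).$$

There exists a sequence of polar broadcast codes over $n$ channel uses which achieves rates $\vec{R}$ where the rate for receiver $\pi(i)$ is bounded as

$$0 \leq R_{\pi(i)} < H\left( Y_{\pi(i)} \,\middle|\, \{Y_{\pi(k)}\}_{k=1:i-1} \right).$$

The average error probability of this code sequence decays as $P_e^{(n)} = O(2^{-n^{\beta}})$. The complexity of encoding and decoding is $O(n \log n)$.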

Remark 1

To prove the existence of low-complexity broadcast codes, a successive randomized protocol is introduced in Section V-A which utilizes bits of randomness at the encoder. A deterministic encoding protocol is also presented.

Remark 2

The achievable rates for a fixed input distribution $P_X(x)$ are the vertex points of the polyhedral rate region defined in (3). To achieve non-vertex points, the following coding strategies could be applied: time-sharing; rate-splitting for the deterministic DM-BC [34]; polarization by Arıkan utilizing generalized chain rules of entropy [9]. For certain input distributions $P_X(x)$, as illustrated in Figure 1 for the Blackwell channel, a subset of the achievable vertex points lie on the capacity boundary.

Remark 3

Polarization of channels and sources extends to $q$-ary alphabets (see e.g. [27]). Similarly, it is entirely possible to extend Theorem 1 to include DM-BCs with $q$-ary output alphabets.

IV Overview of Polarization Method for Deterministic DM-BCs

For the proof of Theorem 1, we utilize binary polarization theorems. By contrast to polarization for point-to-point channels, in the case of deterministic DM-BCs, the polar transform is applied to the output random variables of the channel.

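IV-A Polar Transform

Consider an input distribution $P_X(x)$ to the deterministic DM-BC. Over $n$ channel uses, the input random variables to the channel are given by

$$X^{1:n} = X(1), X(2), \ldots, X(n),$$

where $X(j) \sim P_X$ are independent and identically distributed (i.i.d.) random variables. The channel output variables are given by $Y_i(j) = f_i(X(j))$ where $f_i(\cdot)$ are the deterministic functions to each broadcast receiver. Denote the random matrix of channel output variables by

$$\mathbf{Y} \triangleq \begin{bmatrix} Y_1(1) & Y_1(2) & \cdots & Y_1(n) \\ \vdots & \vdots & & \vdots \\ Y_m(1) & Y_m(2) & \cdots & Y_m(n) \end{bmatrix},$$

where $Y_i(j) \in \{0, 1\}$. For $n = 2^{\ell}$ and $\ell \geq 1$, the polar transform is defined as the following invertible linear transformation,

$$\mathbf{U} = \mathbf{Y} G_n, \qquad G_n \triangleq \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{\otimes \log_2 n} B_n. \tag{5}$$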

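The matrix $G_n \in \{0, 1\}^{n \times n}$ is formed by multiplying a matrix of successive Kronecker matrix-products (denoted by $\otimes$) with a bit-reversal matrix $B_n$ introduced by Arıkan [8]. The polarized random matrix $\mathbf{U}$ is indexed as

$$\mathbf{U} = \begin{bmatrix} U_1(1) & U_1(2) & \cdots & U_1(n) \\ \vdots & \vdots & & \vdots \\ U_m(1) & U_m(2) & \cdots & U_m(n) \end{bmatrix}.$$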
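The construction of the transform matrix can be sketched in a few lines. The code below is an illustrative sketch under the assumption that the transform is Arıkan's matrix, i.e. the Kronecker power of the 2x2 kernel [[1, 0], [1, 1]] with rows permuted by bit reversal, acting on a row vector over GF(2). The function names are hypothetical. The recursive form and the explicit matrix form are both given so they can be compared against each other.

```python
def polar_transform(bits):
    """Multiply a binary row vector by the polar matrix over GF(2) (recursive form)."""
    n = len(bits)
    if n == 1:
        return list(bits)
    odd = bits[0::2]    # entries 1, 3, 5, ... in 1-indexed notation
    even = bits[1::2]   # entries 2, 4, 6, ... in 1-indexed notation
    mixed = [a ^ b for a, b in zip(odd, even)]
    return polar_transform(mixed) + polar_transform(even)

def bit_reverse(index, width):
    """Reverse the `width`-bit binary expansion of `index`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def polar_matrix(n):
    """Explicit matrix: Kronecker power of the kernel with bit-reversed row order."""
    width = n.bit_length() - 1
    kernel = [[1, 0], [1, 1]]
    power = [[1]]
    for _ in range(width):
        # Kronecker product power (x) kernel.
        power = [[a * b for a in row_p for b in row_k]
                 for row_p in power for row_k in kernel]
    return [power[bit_reverse(r, width)] for r in range(n)]

def vector_times_matrix(bits, matrix):
    """Row vector times matrix over GF(2)."""
    n = len(bits)
    return [sum(bits[r] * matrix[r][c] for r in range(n)) % 2 for c in range(n)]

y = [1, 0, 1, 1, 0, 0, 1, 0]
u = polar_transform(y)
print("u =", u)
print("matrix form agrees:", u == vector_times_matrix(y, polar_matrix(8)))
print("transform is its own inverse:", polar_transform(u) == y)
```

Over GF(2) this matrix is an involution, so the same routine serves as both the transform and its inverse. The recursive form costs on the order of n log n bit operations, which is the source of the complexity figure quoted for polar encoders.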


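IV-B Joint Distribution of Polarized Variables

Consider the channel output distribution $P_{Y_1 Y_2 \cdots Y_m}$ of the deterministic DM-BC induced by input distribution $P_X(x)$. The $j$-th column of the random matrix $\mathbf{Y}$ is distributed as $(Y_1(j), Y_2(j), \ldots, Y_m(j)) \sim P_{Y_1 Y_2 \cdots Y_m}$. Due to the memoryless property of the channel, the joint distribution of all output variables is

$$P\left( y_1^{1:n}, y_2^{1:n}, \ldots, y_m^{1:n} \right) = \prod_{j=1}^{n} P_{Y_1 Y_2 \cdots Y_m}\left( y_1(j), y_2(j), \ldots, y_m(j) \right).$$

The joint distribution of the matrix variables in $\mathbf{Y}$ is characterized easily due to the i.i.d. structure. The polarized random matrix $\mathbf{U}$ does not have an i.i.d. structure. However, one way to define the joint distribution of the variables in $\mathbf{U}$ is via the polar transform equation (5). An alternate representation is via a decomposition into conditional distributions as follows.

[Footnote 1: The abbreviated notation of the form $P(a | b)$ which appears in (8) indicates $P_{A|B}(a | b)$, i.e. the conditional probability $\mathbb{P}\{A = a \,|\, B = b\}$ where $A$ and $B$ are random variables.]

$$P\left( u_1^{1:n}, u_2^{1:n}, \ldots, u_m^{1:n} \right) = \prod_{i=1}^{m} \prod_{j=1}^{n} P\left( u_i(j) \,\middle|\, u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} \right). \tag{8}$$

As derived by Arıkan in [8] and summarized in Section IV-E, the conditional probabilities in (8) and associated likelihoods may be computed using a dynamic programming method which “divides-and-conquers” the computations efficiently.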

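IV-C Polarization of Conditional Entropies

Fig. 2: The polar transform applied to random matrix $\mathbf{Y}$ with i.i.d. structure results in a polarized random matrix $\mathbf{U}$.

Proposition 2 (Polarization [8])

Consider the pair of random matrices $(\mathbf{Y}, \mathbf{U})$ related through the polar transformation in (5). For $i \in [m]$ and any $\epsilon \in (0, 1)$, define the set of indices

$$\mathcal{H}_i^{(n)} \triangleq \left\{ j \in [n] : H\left( U_i(j) \,\middle|\, U_i^{1:j-1}, \{Y_k^{1:n}\}_{k=1:i-1} \right) \geq 1 - \epsilon \right\}.$$

Then in the limit as $n \to \infty$,

$$\frac{1}{n} \left| \mathcal{H}_i^{(n)} \right| \to H\left( Y_i \,\middle|\, Y_1, Y_2, \ldots, Y_{i-1} \right).$$

For sufficiently large $n$, Proposition 2 establishes that there exist approximately $n H(Y_i | Y_1, Y_2, \ldots, Y_{i-1})$ indices per row $i \in [m]$ of random matrix $\mathbf{U}$ for which the conditional entropy is close to $1$. The total number of indices in $\mathbf{U}$ for which the conditional entropy terms polarize to $1$ is approximately $n H(Y_1, Y_2, \ldots, Y_m)$. The polarization phenomenon is illustrated in Figure 2.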
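The behavior described by the polarization result can be observed by brute force for short block lengths. The following sketch covers only the simplest case of a single row with no side information, where the output bits are i.i.d. Bernoulli(p). The value p = 0.11, the block length, and all function names are illustrative assumptions. The code enumerates all binary sequences, so it is only practical for small n.

```python
from itertools import product
from math import log2

def polar_transform(bits):
    """Binary row vector times the polar matrix over GF(2) (recursive form)."""
    if len(bits) == 1:
        return list(bits)
    odd, even = bits[0::2], bits[1::2]
    mixed = [a ^ b for a, b in zip(odd, even)]
    return polar_transform(mixed) + polar_transform(even)

def binary_entropy(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def conditional_entropies(p, n):
    """Return [H(U(j) | U(1), ..., U(j-1))] for j = 1..n by exhaustive enumeration."""
    pmf_u = {}
    for y in product((0, 1), repeat=n):
        weight = sum(y)
        u = tuple(polar_transform(list(y)))
        pmf_u[u] = pmf_u.get(u, 0.0) + (p ** weight) * ((1 - p) ** (n - weight))

    def prefix_entropy(length):
        marginal = {}
        for u, prob in pmf_u.items():
            marginal[u[:length]] = marginal.get(u[:length], 0.0) + prob
        return -sum(q * log2(q) for q in marginal.values() if q > 0)

    joint = [prefix_entropy(length) for length in range(n + 1)]
    # Chain rule: H(U(j) | prefix) = H(prefix plus U(j)) - H(prefix).
    return [joint[j + 1] - joint[j] for j in range(n)]

p, n = 0.11, 8
values = conditional_entropies(p, n)
for j, h in enumerate(values, start=1):
    print(f"j={j}: H = {h:.4f}")
print("sum:", round(sum(values), 6), " n*h_b(p):", round(n * binary_entropy(p), 6))
```

Because the transform is invertible, the conditional entropies always sum to n times the single-letter entropy. Polarization only redistributes that total so that individual terms drift toward 0 or 1 as n grows.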

Remark 4

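Since the polar transform is invertible, $\{U_k^{1:n}\}_{k=1:i-1}$ are in one-to-one correspondence with $\{Y_k^{1:n}\}_{k=1:i-1}$. Therefore the conditional entropies $H( U_i(j) \,|\, U_i^{1:j-1}, \{U_k^{1:n}\}_{k=1:i-1} )$ also polarize to $0$ or $1$.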

IV-D Rate of Polarization

The Bhattacharyya parameter of random variables is closely related to the conditional entropy. The parameter is useful for characterizing the rate of polarization.

Definition 7 (Bhattacharyya Parameter)

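Let $(T, V) \sim P_{T,V}$ where $T \in \{0, 1\}$ and $V \in \mathcal{V}$ where $\mathcal{V}$ is an arbitrary discrete alphabet. The Bhattacharyya parameter $Z(T | V) \in [0, 1]$ is defined

$$Z(T | V) \triangleq 2 \sum_{v \in \mathcal{V}} P_V(v) \sqrt{ P_{T|V}(0 | v) \, P_{T|V}(1 | v) }.$$

As shown in Lemma 16 of Appendix A, $Z(T | V) \to 1$ implies $H(T | V) \to 1$, and similarly $Z(T | V) \to 0$ implies $H(T | V) \to 0$ for a binary random variable $T$. Based on the Bhattacharyya parameter, the following proposition specifies sets that will be called message sets.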
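To make the relation between the Bhattacharyya parameter and the conditional entropy concrete, the sketch below evaluates both for a small joint distribution. It assumes the usual source-coding form of the parameter, twice the sum over side-information values of the square root of the product of the two joint probabilities. The example distribution (a uniform bit observed through a binary symmetric channel with crossover 0.1) and the function names are illustrative assumptions. The sandwich bounds checked at the end are of the kind commonly stated in the source polarization literature and are included only as a numerical sanity check.

```python
from math import log2, sqrt

def bhattacharyya(joint):
    """Z(T|V) for a joint pmf given as {(t, v): probability} with t in {0, 1}."""
    side_values = {v for (_, v) in joint}
    return 2.0 * sum(
        sqrt(joint.get((0, v), 0.0) * joint.get((1, v), 0.0)) for v in side_values
    )

def conditional_entropy(joint):
    """H(T|V) in bits for the same joint pmf representation."""
    p_v = {}
    for (_, v), prob in joint.items():
        p_v[v] = p_v.get(v, 0.0) + prob
    return -sum(
        prob * log2(prob / p_v[v]) for (_, v), prob in joint.items() if prob > 0
    )

# Illustrative joint pmf: T uniform, V equals T flipped with probability 0.1.
joint = {(0, 0): 0.45, (1, 0): 0.05, (0, 1): 0.05, (1, 1): 0.45}
z = bhattacharyya(joint)
h = conditional_entropy(joint)
print("Z(T|V) =", round(z, 4), " H(T|V) =", round(h, 4))
print("Z^2 <= H <= log2(1 + Z):", z * z <= h <= log2(1.0 + z))
```

The two quantities approach 0 together and approach 1 together, which is why the message sets can be specified through either one. The Bhattacharyya parameter is the more convenient of the two for quantifying how fast polarization occurs.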

Proposition 3 (Rate of Polarization)

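Consider the pair of random matrices $(\mathbf{Y}, \mathbf{U})$ related through the polar transformation in (5). Fix constants $\tau > 0$, $0 < \beta < \frac{1}{2}$, $i \in [m]$. Let $\delta_n = 2^{-n^{\beta}}$ be the rate of polarization. Define the set

$$\mathcal{M}_i^{(n)} \triangleq \left\{ j \in [n] : Z\left( U_i(j) \,\middle|\, U_i^{1:j-1}, \{Y_k^{1:n}\}_{k=1:i-1} \right) \geq 1 - \delta_n \right\}. \tag{12}$$

Then there exists an $N_o = N_o(\beta, \tau)$ such that

$$\frac{1}{n} \left| \mathcal{M}_i^{(n)} \right| \geq H\left( Y_i \,\middle|\, Y_1, Y_2, \ldots, Y_{i-1} \right) - \tau,$$

for all $n > N_o$.

The proposition is established via the Martingale Convergence Theorem by defining a super-martingale with respect to the Bhattacharyya parameters [6, 8]. The rate of polarization is characterized by Arıkan and Telatar in [16].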

Remark 5

The message sets are computed “offline” only once during a code construction phase. The sets do not depend on the realization of random variables. In the following Section IV-E, a Monte Carlo sampling approach for estimating Bhattacharyya parameters is reviewed. Other highly efficient algorithms are known in the literature for finding the message indices (see e.g. Tal and Vardy [35]).

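IV-E Estimating Bhattacharyya Parameters

As shown in Lemma 11 in Appendix A, one way to estimate the Bhattacharyya parameter $Z(T | V)$ is to sample from the distribution $P_{T,V}(t, v)$ and evaluate $\mathbb{E}[\varphi(T, V)]$. The function $\varphi(t, v)$ is defined based on likelihood ratios

$$L(v) \triangleq \frac{P_{T|V}(0 | v)}{P_{T|V}(1 | v)}, \qquad \varphi(t, v) \triangleq \begin{cases} \sqrt{1 / L(v)} & \text{if } t = 0, \\ \sqrt{L(v)} & \text{if } t = 1. \end{cases}$$

Similarly, to determine the indices in the message sets $\mathcal{M}_i^{(n)}$ defined in Proposition 3, the Bhattacharyya parameters $Z( U_i(j) \,|\, U_i^{1:j-1}, \{Y_k^{1:n}\}_{k=1:i-1} )$ must be estimated efficiently. For $j \in [n]$, define the likelihood ratio

$$L_n^{(j)}\left( u_i^{1:j-1}, \{y_k^{1:n}\}_{k=1:i-1} \right) \triangleq \frac{ \mathbb{P}\left( U_i(j) = 0 \,\middle|\, u_i^{1:j-1}, \{y_k^{1:n}\}_{k=1:i-1} \right) }{ \mathbb{P}\left( U_i(j) = 1 \,\middle|\, u_i^{1:j-1}, \{y_k^{1:n}\}_{k=1:i-1} \right) }.$$

The dynamic programming method given in [8] allows for a recursive computation of the likelihood ratio. Define the following sub-problems

$$\Xi_1 = L_{n/2}^{(j)}\left( u_{i,o}^{1:2j-2} \oplus u_{i,e}^{1:2j-2}, \{y_k^{1:n/2}\}_{k=1:i-1} \right), \qquad \Xi_2 = L_{n/2}^{(j)}\left( u_{i,e}^{1:2j-2}, \{y_k^{n/2+1:n}\}_{k=1:i-1} \right),$$

where the notation $u_{i,o}^{1:2j-2}$ and $u_{i,e}^{1:2j-2}$ represents the odd and even indices respectively of the sequence $u_i^{1:2j-2}$. The recursive computation of the likelihoods is characterized by

$$L_n^{(2j-1)} = \frac{\Xi_1 \Xi_2 + 1}{\Xi_1 + \Xi_2}, \qquad L_n^{(2j)} = \left( \Xi_1 \right)^{\gamma} \Xi_2,$$

where $\gamma = 1$ if $u_i(2j-1) = 0$ and $\gamma = -1$ if $u_i(2j-1) = 1$. In the above recursive computations, the base case is for sequences of length $n = 1$.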
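The recursion can be sketched and checked against exhaustive enumeration in the simplest setting. The code below assumes a single row with no side information, i.i.d. Bernoulli(p) output bits, and Arıkan's transform in its recursive form. The odd-index update is (ab + 1)/(a + b) and the even-index update is b times a raised to the power +1 or -1 depending on the preceding bit. Function names and parameter values are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def polar_transform(bits):
    """Binary row vector times the polar matrix over GF(2) (recursive form)."""
    if len(bits) == 1:
        return list(bits)
    odd, even = bits[0::2], bits[1::2]
    return polar_transform(xor(odd, even)) + polar_transform(even)

def likelihood_ratio(n, prefix, p):
    """P(U(j)=0 | prefix) / P(U(j)=1 | prefix) with j = len(prefix) + 1."""
    if n == 1:
        return (1.0 - p) / p  # base case: a single Bernoulli(p) bit
    j = len(prefix) + 1
    # For even j the last prefix bit is U(j-1); it selects the exponent gamma.
    body = prefix if j % 2 == 1 else prefix[:-1]
    odd, even = body[0::2], body[1::2]
    a = likelihood_ratio(n // 2, xor(odd, even), p)
    b = likelihood_ratio(n // 2, even, p)
    if j % 2 == 1:
        return (a * b + 1.0) / (a + b)
    gamma = 1 if prefix[-1] == 0 else -1
    return (a ** gamma) * b

def brute_force_ratio(n, prefix, p):
    """Same ratio computed by summing over all 2^n source sequences."""
    j = len(prefix)
    mass = [0.0, 0.0]
    for y in product((0, 1), repeat=n):
        u = polar_transform(list(y))
        if u[:j] == list(prefix):
            weight = sum(y)
            mass[u[j]] += (p ** weight) * ((1 - p) ** (n - weight))
    return mass[0] / mass[1]

p, n = 0.2, 8
prefix = [1, 0, 1]
print("recursive  :", likelihood_ratio(n, prefix, p))
print("brute force:", brute_force_ratio(n, prefix, p))
```

For clarity this version recomputes sub-problems on every call. Sharing the intermediate ratios across indices, as in the standard successive cancellation schedule, is what brings the total cost down to the order of n log n.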

V Proof of Theorem 1

The proof of Theorem 1 is based on binary polarization theorems as discussed in Section IV. The random coding arguments of C. E. Shannon prove the existence of capacity-achieving codes for point-to-point channels. Furthermore, random binning and joint-typicality arguments suffice to prove the existence of capacity-achieving codes for the deterministic DM-BC. However, it is shown in this section that there exist capacity-achieving polar codes for the binary-output deterministic DM-BC.

V-A Broadcast Code Based on Polarization

The ordering of the receivers’ rates in $\vec{R}$ is arbitrary due to symmetry. Therefore, let $\pi(i) = i$ be the identity permutation which denotes the successive order in which the message bits are allocated for each receiver. The encoder must map $m$ independent messages $(W_1, W_2, \ldots, W_m)$ uniformly distributed over $[2^{nR_1}] \times [2^{nR_2}] \times \cdots \times [2^{nR_m}]$ to a codeword $x^{n} \in \mathcal{X}^{n}$. To construct a codeword for broadcasting $m$ independent messages, the following binary sequences are formed at the encoder: $u_1^{1:n}, u_2^{1:n}, \ldots, u_m^{1:n}$. To determine a particular bit $u_i(j)$ in the binary sequence $u_i^{1:n}$, if $j \in \mathcal{M}_i^{(n)}$, the bit is selected as a uniformly distributed message bit intended for receiver $i$. As defined in (12) of Proposition 3, the message set $\mathcal{M}_i^{(n)}$ represents those indices for bits transmitted to receiver $i$. The remaining non-message indices in the binary sequence $u_i^{1:n}$ for each user are computed either according to a deterministic or random mapping.

V-A1 Deterministic Mapping

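Consider a class of deterministic boolean functions indexed by $i \in [m]$ and $j \in [n]$:

$$\psi_i^{(j)} : \{0, 1\}^{n(i-1) + j - 1} \to \{0, 1\}.$$

As an example, consider the deterministic boolean function based on the maximum a posteriori polar coding rule.

$$\psi_{\mathrm{MAP}}^{(i,j)}\left( u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} \right) \triangleq \arg\max_{u \in \{0, 1\}} \mathbb{P}\left( U_i(j) = u \,\middle|\, U_i^{1:j-1} = u_i^{1:j-1}, \{U_k^{1:n} = u_k^{1:n}\}_{k=1:i-1} \right).$$

V-A2 Random Mapping

Consider a class of random boolean functions indexed by $i \in [m]$ and $j \in [n]$:

$$\Psi_i^{(j)} : \{0, 1\}^{n(i-1) + j - 1} \to \{0, 1\}.$$

As an example, consider the random boolean function

$$\Psi_{\mathrm{RAND}}^{(i,j)}\left( u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} \right) \triangleq \begin{cases} 0 & \text{with probability } \lambda_0, \\ 1 & \text{with probability } 1 - \lambda_0, \end{cases}$$

where

$$\lambda_0 \triangleq \mathbb{P}\left( U_i(j) = 0 \,\middle|\, U_i^{1:j-1} = u_i^{1:j-1}, \{U_k^{1:n} = u_k^{1:n}\}_{k=1:i-1} \right).$$

The random boolean function may be thought of as a vector of Bernoulli random variables indexed by the input to the function. Each Bernoulli random variable of the vector has a fixed, well-defined probability of being one or zero.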

V-A3 Mapping From Messages To Codeword

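The binary sequences $u_i^{1:n}$ for $i \in [m]$ are formed successively bit by bit. If $j \in \mathcal{M}_i^{(n)}$, then the bit $u_i(j)$ is one message bit from the uniformly distributed message $W_i$ intended for user $i$. If $j \notin \mathcal{M}_i^{(n)}$, $u_i(j) = \psi_i^{(j)}( u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} )$ in the case of a deterministic mapping, or $u_i(j) = \Psi_i^{(j)}( u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} )$ in the case of a random mapping. The encoder then applies the inverse polar transform for each sequence: $y_i^{1:n} = u_i^{1:n} G_n^{-1}$. The codeword $x^{n}$ is formed symbol-by-symbol as follows:

$$x(j) \in \bigcap_{i=1}^{m} f_i^{-1}\left( y_i(j) \right).$$

If the intersection set is empty, the encoder declares a block error. A block error only occurs at the encoder.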
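The final symbol-by-symbol step of the encoder can be sketched as follows. The sketch takes the per-receiver output sequences as given and picks, for each position, an input symbol consistent with every receiver's intended output, reporting a block error when no such symbol exists. The Blackwell maps and the tie-breaking rule (smallest consistent symbol) are assumptions made only for this illustration, and the function names are hypothetical.

```python
def preimage(func, output, alphabet):
    """All input symbols that the deterministic map sends to `output`."""
    return {x for x in alphabet if func(x) == output}

def form_codeword(output_rows, funcs, alphabet):
    """Build x(1..n) from rows y_i(1..n); return None if some position is inconsistent."""
    n = len(output_rows[0])
    codeword = []
    for j in range(n):
        candidates = set(alphabet)
        for func, row in zip(funcs, output_rows):
            candidates &= preimage(func, row[j], alphabet)
        if not candidates:
            return None  # block error declared at the encoder
        codeword.append(min(candidates))  # ASSUMPTION: arbitrary deterministic tie-break
    return codeword

# ASSUMPTION: Blackwell channel maps on the input alphabet {0, 1, 2}.
def f1(x):
    return max(x - 1, 0)

def f2(x):
    return min(x, 1)

alphabet = (0, 1, 2)
y1 = [0, 0, 1, 0]
y2 = [0, 1, 1, 1]
x = form_codeword([y1, y2], [f1, f2], alphabet)
print("codeword:", x)
print("receiver 1 sees:", [f1(s) for s in x])
print("receiver 2 sees:", [f2(s) for s in x])
print("inconsistent pair:", form_codeword([[1], [0]], [f1, f2], alphabet))
```

This step only enforces the broadcast constraints. Keeping the probability of an empty intersection small is the job of the successive bit-by-bit construction of the sequences that precede it.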

V-A4 Decoding at Receivers

If the encoder succeeds in transmitting a codeword $x^{n}$, each receiver obtains the sequence $y_i^{1:n}$ noiselessly and applies the polar transform $G_n$ to recover $u_i^{1:n}$ exactly. Since the message indices $\mathcal{M}_i^{(n)}$ are known to each receiver, the message bits in $u_i^{1:n}$ are decoded correctly by receiver $i$.

V-B Total Variation Bound

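While the deterministic mapping $\psi_{\mathrm{MAP}}^{(i,j)}$ performs well in practice, the average probability of error of the coding scheme is more difficult to analyze in theory. The random mapping $\Psi_{\mathrm{RAND}}^{(i,j)}$ at the encoder is more amenable to analysis via the probabilistic method. Towards that goal, consider the following probability measure $Q$ defined on the space of tuples of binary sequences.

[Footnote 2: A related proof technique was provided for lossy source coding based on polarization in a different context [24]. In the present paper, a different proof is supplied that utilizes the chain rule for KL-divergence.]

$$Q\left( u_1^{1:n}, u_2^{1:n}, \ldots, u_m^{1:n} \right) \triangleq \prod_{i=1}^{m} \prod_{j=1}^{n} Q\left( u_i(j) \,\middle|\, u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} \right), \tag{19}$$

where the conditional probability measure

$$Q\left( u_i(j) \,\middle|\, u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} \right) \triangleq \begin{cases} \frac{1}{2} & \text{if } j \in \mathcal{M}_i^{(n)}, \\ P\left( u_i(j) \,\middle|\, u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} \right) & \text{otherwise}. \end{cases}$$

The probability measure $Q$ defined in (19) is a perturbation of the joint probability measure $P$ defined in (8) for the random variables $U_i(j)$. The only difference in definition between $P$ and $Q$ is due to those indices in message set $\mathcal{M}_i^{(n)}$. The following lemma provides a bound on the total variation distance between $P$ and $Q$.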

Lemma 1

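(Total Variation Bound) Let probability measures $P$ and $Q$ be defined as in (8) and (19) respectively. Let $0 < \beta < \frac{1}{2}$. For sufficiently large $n$, the total variation distance between $P$ and $Q$ is bounded as

$$\sum_{\{u_k^{1:n}\}_{k \in [m]}} \left| P\left( \{u_k^{1:n}\}_{k \in [m]} \right) - Q\left( \{u_k^{1:n}\}_{k \in [m]} \right) \right| = O\left( 2^{-n^{\beta}} \right).$$

Proof: See Section B of the Appendices.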

V-C Analysis of the Average Probability of Error

For the $m$-user deterministic DM-BC, an error event occurs at the encoder if a codeword is unable to be constructed symbol by symbol according to the broadcast protocol described in Section V-A. Define the following set consisting of $m$-tuples of binary sequences,

$$\mathcal{T} \triangleq \left\{ \{y_k^{1:n}\}_{k \in [m]} : \exists \, j \in [n], \; \bigcap_{i=1}^{m} f_i^{-1}\left( y_i(j) \right) = \emptyset \right\}.$$

The set $\mathcal{T}$ consists of those $m$-tuples of binary output sequences which are inconsistent due to the properties of the deterministic channel. In addition, due to the one-to-one correspondence between sequences $y_i^{1:n}$ and $u_i^{1:n}$, denote by $\tilde{\mathcal{T}}$ the set of $m$-tuples $\{u_k^{1:n}\}_{k \in [m]}$ that are inconsistent.

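For the broadcast protocol, the rate $R_i = \frac{1}{n} | \mathcal{M}_i^{(n)} |$ for each receiver. Let the total sum rate for all broadcast receivers be $R_{\Sigma} = \sum_{i \in [m]} R_i$. If the encoder uses a fixed deterministic map $\psi_i^{(j)}$ in the broadcast protocol, the average probability of error is

$$P_e^{(n)}\left[ \{\psi_i^{(j)}\} \right] = \sum_{\{u_k^{1:n}\}_{k \in [m]} \in \tilde{\mathcal{T}}} \frac{1}{2^{n R_{\Sigma}}} \prod_{i \in [m]} \prod_{j \in [n] : j \notin \mathcal{M}_i^{(n)}} \mathbb{1}\left[ \psi_i^{(j)}\left( u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} \right) = u_i(j) \right].$$

In addition, if the random maps $\Psi_i^{(j)}$ are used at the encoder, the average probability of error is a random quantity given by

$$P_e^{(n)}\left[ \{\Psi_i^{(j)}\} \right] = \sum_{\{u_k^{1:n}\}_{k \in [m]} \in \tilde{\mathcal{T}}} \frac{1}{2^{n R_{\Sigma}}} \prod_{i \in [m]} \prod_{j \in [n] : j \notin \mathcal{M}_i^{(n)}} \mathbb{1}\left[ \Psi_i^{(j)}\left( u_i^{1:j-1}, \{u_k^{1:n}\}_{k=1:i-1} \right) = u_i(j) \right].$$

Instead of characterizing $P_e^{(n)}$ directly for deterministic maps, the analysis of its expectation over the random maps leads to the following lemma.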

Lemma 2

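Consider the broadcast protocol of Section V-A. Let $R_i = \frac{1}{n} | \mathcal{M}_i^{(n)} |$ for $i \in [m]$ be the broadcast rates selected according to the criterion given in (12) in Proposition 3. Then for $0 < \beta < \frac{1}{2}$ and sufficiently large $n$,

$$\mathbb{E}_{\{\Psi_i^{(j)}\}}\left[ P_e^{(n)}\left[ \{\Psi_i^{(j)}\} \right] \right] = O\left( 2^{-n^{\beta}} \right).$$

Proof: Writing $u$ as shorthand for the tuple $\{u_k^{1:n}\}_{k \in [m]}$,

$$\begin{aligned} \mathbb{E}_{\{\Psi_i^{(j)}\}}\left[ P_e^{(n)}\left[ \{\Psi_i^{(j)}\} \right] \right] &= \sum_{u \in \tilde{\mathcal{T}}} Q(u) && (23) \\ &= \sum_{u \in \tilde{\mathcal{T}}} \left| P(u) - Q(u) \right| && (24) \\ &\leq \sum_{u} \left| P(u) - Q(u) \right| = O\left( 2^{-n^{\beta}} \right). && (25) \end{aligned}$$

Step (23) follows since the probability measure $Q$ matches the desired calculation exactly. Step (24) is due to the fact that the probability measure $P$ has zero mass over $m$-tuples of binary sequences that are inconsistent. Step (25) follows directly from Lemma 1. Lastly, since the expectation over random maps of the average probability of error decays stretched-exponentially, there must exist a set of deterministic maps which exhibit the same behavior.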

VI Noisy Broadcast Channels: Superposition Coding

Coding for noisy broadcast channels is now considered using polarization methods. By contrast to the deterministic case, a decoding error event occurs at the receivers on account of the randomness due to noise. For the remaining sections, it is assumed that there exist $m = 2$ users in the DM-BC. The private-message capacity region for the DM-BC is unknown even for binary-input, binary-output two-user channels such as the skew-symmetric DM-BC. However, the private-message capacity region is known for specific classes.

VI-A Special Classes of Noisy DM-BCs

Definition 8

The two-user physically degraded DM-BC is a channel $P_{Y_1 Y_2 | X}(y_1, y_2 | x)$ for which $X - Y_1 - Y_2$ form a Markov chain, i.e. one of the receivers is statistically stronger than the other: