Polar Codes for Broadcast Channels
Abstract
Polar codes are introduced for discrete memoryless broadcast channels. For $m$-user deterministic broadcast channels, polarization is applied to map uniformly random message bits from $m$ independent messages to one codeword while satisfying broadcast constraints. The polarization-based codes achieve rates on the boundary of the private-message capacity region. For two-user noisy broadcast channels, polar implementations are presented for two information-theoretic schemes: i) Cover's superposition codes; ii) Marton's codes. Due to the structure of polarization, constraints on the auxiliary and channel-input distributions are identified to ensure proper alignment of polarization indices in the multi-user setting. The codes achieve rates on the capacity boundary of a few classes of broadcast channels (e.g., binary-input stochastically degraded). The complexity of encoding and decoding is $O(N \log N)$ where $N$ is the block length. In addition, polar code sequences obtain a stretched-exponential decay of $O(2^{-N^\beta})$ of the average block error probability where $0 < \beta < 1/2$.
I Introduction
Introduced by T. M. Cover in 1972, the broadcast problem consists of a single source transmitting $m$ independent private messages to $m$ receivers through a single discrete, memoryless broadcast channel (DMBC) [1]. The private-message capacity region is known if the channel structure is deterministic, degraded, less-noisy, or more-capable [2]. For general classes of DMBCs, there exist inner bounds such as Marton's inner bound [3] and outer bounds such as the Nair-El Gamal outer bound [4]. One difficult aspect of the broadcast problem is to design an encoder which maps $m$ independent messages to a single codeword of $N$ symbols which are transmitted simultaneously to all receivers. Several codes relying on random binning, superposition, and Marton's strategy have been analyzed in the literature (see e.g., the overview in [5]).
I-A Overview of Contributions
The present paper focuses on low-complexity codes for broadcast channels based on polarization methods. Polar codes were invented originally by Arıkan and were shown to achieve the capacity of binary-input, symmetric, point-to-point channels with encoding and decoding complexity $O(N \log N)$ where $N$ is the code length [6]. In this paper, we obtain the following results.

Polar codes for deterministic, linear and nonlinear, binary-output, $m$-user DMBCs (cf. [7]). The capacity-achieving broadcast codes implement low-complexity random binning, and are related to polar codes for other multi-user scenarios such as Slepian-Wolf distributed source coding [8, 9], and multiple-access channel (MAC) coding [10]. For deterministic DMBCs, the polar transform is applied to channel output variables. Polarization is useful for shaping uniformly random message bits from independent messages into non-equiprobable codeword symbols in the presence of hard broadcast constraints. As discussed in Section I-B1 and referenced in [11, 12, 13], it is difficult to design low-density parity-check (LDPC) codes or belief propagation algorithms for the deterministic DMBC due to multi-user broadcast constraints.

Polar codes for general two-user DMBCs based on Cover's superposition coding strategy. In the multi-user setting, constraints on the auxiliary and channel-input distributions are placed to ensure alignment of polarization indices. The achievable rates lie on the boundary of the capacity region for certain classes of DMBCs such as binary-input stochastically degraded channels.

Polar codes for general two-user DMBCs based on Marton's coding strategy. In the multi-user setting, due to the structure of polarization, constraints on the auxiliary and channel-input distributions are identified to ensure alignment of polarization indices. The achievable rates lie on the boundary of the capacity region for certain classes of DMBCs such as binary-input semi-deterministic channels.

For the above broadcast polar codes, the asymptotic decay of the average error probability under successive cancellation decoding at the broadcast receivers is established to be $O(2^{-N^\beta})$ where $0 < \beta < 1/2$. The error probability is analyzed by averaging over polar code ensembles. In addition, properties such as the chain rule of the Kullback-Leibler divergence between discrete probability measures are exploited.
Throughout the paper, for different broadcast coding strategies, a systems-level block diagram of the communication channel and polar transforms is provided.
I-B Relation to Prior Work
I-B1 Deterministic Broadcast Channels
The deterministic broadcast channel has received considerable attention in the literature (e.g. due to related extensions such as secure broadcast, broadcasting with side information, and index coding [14, 15]). Several practical codes have been designed. For example, the authors of [11] propose sparse linear coset codes to emulate random binning and survey propagation to enforce broadcast channel constraints. In [12], the authors propose enumerative source coding and Luby-Transform codes for deterministic DMBCs specialized to interference-management scenarios. Additional research includes reinforced belief propagation with nonlinear coding [13]. To our knowledge, polarization-based codes provide provable guarantees for achieving rates on the capacity boundary in the general case.
I-B2 Polar Codes for Multi-User Settings
Subsequent to the derivation of channel polarization in [6] and the refined rate of polarization in [16], polarization methods have been extended to analyze multi-user information theory problems. In [10], a joint polarization method is proposed for $m$-user MACs with connections to matroid theory. Polar codes were extended for several other multi-user settings: arbitrarily-permuted parallel channels [17], degraded relay channels [18], cooperative relaying [19], and wiretap channels [20, 21, 22]. In addition, several binary multi-user communication scenarios including the Gelfand-Pinsker problem and Wyner-Ziv problem were analyzed in [23, Chapter 4]. Polar codes for lossless and lossy source compression were investigated respectively in [8] and [24]. In [8], source polarization was extended to the Slepian-Wolf problem involving distributed sources. The approach is based on an "onion-peeling" encoding of sources, whereas a joint encoding is proposed in [25]. In [9], a unified approach is provided for the Slepian-Wolf problem based on generalized monotone chain rules of entropy. To our knowledge, the design of polarization-based broadcast codes is relatively new.
I-B3 Binary vs. $q$-ary Polarization
The broadcast codes constructed in the present paper for DMBCs are based on polarization for binary random variables. However, in extending to arbitrary alphabet sizes, a large body of prior work exists and has focused on generalized constructions and kernels [26], and generalized polarization for $q$-ary random variables and $q$-ary channels [27, 28, 29, 30]. The reader is also referred to the monograph in [31] containing a clear overview of polarization methods.
I-C Notation
An index set $\{1, 2, \ldots, N\}$ is abbreviated as $[N]$. An $m \times N$ matrix array of random variables is comprised of variables $Y_k^{(j)}$ where $k \in [m]$ represents the row and $j \in [N]$ the column. The notation $Y_k^{i:j}$ stands for $(Y_k^{(i)}, Y_k^{(i+1)}, \ldots, Y_k^{(j)})$ for $i \le j$. When clear by context, the term $Y_k$ represents $Y_k^{1:N}$. The notation $f(N) = O(g(N))$ means that there exists a constant $c > 0$ such that $f(N) \le c\, g(N)$ for sufficiently large $N$. For a set $\mathcal{S}$, $\mathrm{cl}(\mathcal{S})$ represents set closure, and $\mathrm{co}(\mathcal{S})$ the convex hull operation over set $\mathcal{S}$. Let $h_2(\cdot)$ denote the binary entropy function. Let $\bar{a} \triangleq 1 - a$.
II Model
Definition 1 (Discrete, Memoryless Broadcast Channel)
The discrete memoryless broadcast channel (DMBC) with $m$ broadcast receivers consists of a discrete input alphabet $\mathcal{X}$, discrete output alphabets $\mathcal{Y}_k$ for $k \in [m]$, and a conditional distribution $P_{Y_1 \cdots Y_m | X}(y_1, \ldots, y_m \,|\, x)$ where $x \in \mathcal{X}$ and $y_k \in \mathcal{Y}_k$.
Definition 2 (Private Messages)
For a DMBC with $m$ broadcast receivers, there exist $m$ private messages $W_1, \ldots, W_m$ such that each message $W_k$ is composed of $N R_k$ bits and is uniformly distributed over $[2^{N R_k}]$.
Definition 3 (Channel Encoding and Decoding)
For the DMBC with $m$ independent messages, let the vector of rates $\vec{R} \triangleq (R_1, \ldots, R_m)$. An $(N, \vec{R})$ code for the DMBC consists of one encoder
$$e : [2^{N R_1}] \times \cdots \times [2^{N R_m}] \to \mathcal{X}^N,$$
and $m$ decoders specified by $d_k : \mathcal{Y}_k^N \to [2^{N R_k}]$ for $k \in [m]$. Based on received observations $Y_k^{1:N}$, each decoder outputs a decoded message $\hat{W}_k \triangleq d_k(Y_k^{1:N})$.
Definition 4 (Average Probability of Error)
The average probability of error for a DMBC code is defined to be the probability that at least one receiver's decoded message is not equal to the transmitted message,
$$P_e^{(N)} \triangleq \mathbb{P}\Big[ \bigcup_{k=1}^{m} \big\{ \hat{W}_k \neq W_k \big\} \Big].$$
Definition 5 (PrivateMessage Capacity Region)
If there exists a sequence of $(N, \vec{R})$ codes with $P_e^{(N)} \to 0$ as $N \to \infty$, then the rates $\vec{R}$ are achievable. The private-message capacity region is the closure of the set of achievable rates.
III Deterministic Broadcast Channels
Definition 6 (Deterministic DMBC)
Define deterministic functions $f_k : \mathcal{X} \to \mathcal{Y}_k$ for $k \in [m]$. The deterministic DMBC with $m$ receivers is defined by the following conditional distribution
(1) $$P_{Y_1 \cdots Y_m | X}(y_1, \ldots, y_m \,|\, x) = \prod_{k=1}^{m} \mathbb{1}\big\{ y_k = f_k(x) \big\}.$$
III-A Capacity Region
Proposition 1 (Marton [32], Pinsker [33])
The capacity region of the deterministic DMBC includes those rate-tuples $(R_1, \ldots, R_m)$ in the region
(2) $$\mathcal{C} = \mathrm{cl}\Big( \bigcup_{P_X} \mathcal{R}(P_X) \Big),$$
where the polyhedral region $\mathcal{R}(P_X)$ is given by
(3) $$\mathcal{R}(P_X) \triangleq \Big\{ (R_1, \ldots, R_m) : \sum_{k \in S} R_k \le H(Y_S) \ \ \forall\, S \subseteq [m] \Big\},$$
where $Y_S \triangleq (Y_k : k \in S)$. The union in Eqn. (2) is over all random variables $(Y_1, \ldots, Y_m)$ with joint distribution induced by $P_X$ and $Y_k = f_k(X)$.
Example 1 (Blackwell Channel)
In Figure 1, the Blackwell channel is depicted with $\mathcal{X} = \{0, 1, 2\}$ and $\mathcal{Y}_1 = \mathcal{Y}_2 = \{0, 1\}$. For any fixed distribution $P_X$, it is seen that the induced pair $(Y_1, Y_2)$ has zero mass for one of the four binary output pairs. Due to the symmetry of this channel, the capacity region is the union of two regions, achieved by two symmetric choices of the input distribution [2, Lec. 9]. The sum rate is maximized for a uniform input distribution which yields a pentagonal achievable rate region: $R_1 \le h_2(1/3)$, $R_2 \le h_2(1/3)$, $R_1 + R_2 \le \log_2 3$. Figure 1 illustrates the capacity region.
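The corner points of this pentagonal region can be checked numerically. The following Python sketch assumes one standard labeling of the Blackwell maps (the paper's labeling is fixed by Figure 1, so the labeling below is only illustrative) and computes $H(Y_1)$, $H(Y_2)$, and $H(Y_1, Y_2)$ under a uniform input:

```python
from math import log2
from collections import defaultdict

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Hypothetical labeling of the Blackwell deterministic maps:
# x = 0 -> (y1, y2) = (0, 0); x = 1 -> (0, 1); x = 2 -> (1, 1).
f1 = {0: 0, 1: 0, 2: 1}
f2 = {0: 0, 1: 1, 2: 1}

def output_entropies(p_x):
    """H(Y1), H(Y2), H(Y1, Y2) induced by input distribution p_x on {0,1,2}."""
    p1, p2, p12 = defaultdict(float), defaultdict(float), defaultdict(float)
    for x, p in p_x.items():
        p1[f1[x]] += p
        p2[f2[x]] += p
        p12[(f1[x], f2[x])] += p
    ent = lambda d: -sum(p * log2(p) for p in d.values() if p > 0)
    return ent(p1), ent(p2), ent(p12)

# Uniform input maximizes the sum rate: H(Y1, Y2) = log2(3).
H1, H2, H12 = output_entropies({0: 1/3, 1: 1/3, 2: 1/3})
# H1 = H2 = h2(1/3) ~ 0.918 bits, and H12 = log2(3) ~ 1.585 bits
```

The computed values match the pentagonal corner points $h_2(1/3) \approx 0.918$ and $\log_2 3 \approx 1.585$ quoted above.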
III-B Main Result
Theorem 1 (Polar Code for Deterministic DMBC)
Consider an $m$-user deterministic DMBC with arbitrary discrete input alphabet $\mathcal{X}$, and binary output alphabets $\mathcal{Y}_k = \{0, 1\}$ for $k \in [m]$. Fix an input distribution $P_X$ and a constant $\beta \in (0, 1/2)$. Let $\pi$ be a permutation on the index set $[m]$ of receivers. Let the vector of rates be $\vec{R} = (R_1, \ldots, R_m)$.
There exists a sequence of polar broadcast codes over $N$ channel uses which achieves rates $\vec{R}$ where the rate for receiver $\pi(k)$ is bounded as
$$R_{\pi(k)} \ge H\big(Y_{\pi(k)} \,\big|\, Y_{\pi(1)}, Y_{\pi(2)}, \ldots, Y_{\pi(k-1)}\big) - \epsilon$$
for any fixed $\epsilon > 0$ and sufficiently large $N$. The average error probability of this code sequence decays as $O(2^{-N^\beta})$. The complexity of encoding and decoding is $O(N \log N)$.
Remark 1
To prove the existence of low-complexity broadcast codes, a successive randomized protocol is introduced in Section V-A which utilizes bits of randomness at the encoder. A deterministic encoding protocol is also presented.
Remark 2
The achievable rates for a fixed input distribution $P_X$ are the vertex points of the polyhedral rate region defined in (3). To achieve non-vertex points, the following coding strategies could be applied: time-sharing; rate-splitting for the deterministic DMBC [34]; polarization by Arıkan utilizing generalized chain rules of entropy [9]. For certain input distributions $P_X$, as illustrated in Figure 1 for the Blackwell channel, a subset of the achievable vertex points lie on the capacity boundary.
IV Overview of Polarization Method for Deterministic DMBCs
For the proof of Theorem 1, we utilize binary polarization theorems. In contrast to polarization for point-to-point channels, in the case of deterministic DMBCs, the polar transform is applied to the output random variables of the channel.
IV-A Polar Transform
Consider an input distribution $P_X$ to the deterministic DMBC. Over $N$ channel uses, the input random variables to the channel are given by $X^{1:N} = (X^{(1)}, \ldots, X^{(N)})$, where the $X^{(j)}$ are independent and identically distributed (i.i.d.) random variables drawn according to $P_X$. The channel output variables are given by $Y_k^{(j)} = f_k(X^{(j)})$ where the $f_k$ are the deterministic functions to each broadcast receiver. Denote the random matrix of channel output variables by
(4) $$\mathbf{Y} \triangleq \big[ Y_k^{(j)} \big]_{k \in [m],\, j \in [N]},$$
where $N = 2^n$ for some $n \in \mathbb{Z}^+$. For $k \in [m]$, the polar transform is defined as the following invertible linear transformation,
(5) $$U_k^{1:N} \triangleq Y_k^{1:N} G_N, \qquad G_N \triangleq B_N F^{\otimes n}.$$
The matrix $G_N$ is formed by multiplying a matrix of successive Kronecker matrix-products (denoted by $F^{\otimes n}$, with kernel $F \triangleq \left[\begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix}\right]$) with a bit-reversal matrix $B_N$ introduced by Arıkan [8]. The polarized random matrix is indexed as
(6) $$\mathbf{U} \triangleq \big[ U_k^{(j)} \big]_{k \in [m],\, j \in [N]}.$$
IV-B Joint Distribution of Polarized Variables
Consider the channel output distribution $P_{Y_1 \cdots Y_m}$ of the deterministic DMBC induced by input distribution $P_X$. The $j$th column of the random matrix $\mathbf{Y}$ is distributed as $P_{Y_1 \cdots Y_m}$. Due to the memoryless property of the channel, the joint distribution of all output variables is
(7) $$P_{\mathbf{Y}}\big(y_{1:m}^{1:N}\big) = \prod_{j=1}^{N} P_{Y_1 \cdots Y_m}\big(y_1^{(j)}, \ldots, y_m^{(j)}\big).$$
The joint distribution of the matrix variables in $\mathbf{Y}$ is characterized easily due to the i.i.d. structure across columns. The polarized random matrix $\mathbf{U}$ does not have an i.i.d. structure. However, one way to define the joint distribution of the variables in $\mathbf{U}$ is via the polar transform equation (5). An alternate representation is via a decomposition into conditional distributions as follows.¹
(8) $$P_{\mathbf{U}}\big(u_{1:m}^{1:N}\big) = \prod_{k=1}^{m} \prod_{j=1}^{N} P\big(u_k^{(j)} \,\big|\, u_k^{1:j-1},\, u_{1:k-1}^{1:N}\big).$$
¹The abbreviated notation of the form $P\big(u_k^{(j)} \,|\, u_k^{1:j-1}\big)$ which appears in (8) indicates the conditional probability $\mathbb{P}\big[U_k^{(j)} = u_k^{(j)} \,\big|\, U_k^{1:j-1} = u_k^{1:j-1}\big]$, where $U_k^{(j)}$ and $U_k^{1:j-1}$ are random variables.
As derived by Arıkan in [8] and summarized in Section IV-E, the conditional probabilities in (8) and associated likelihoods may be computed using a dynamic programming method which "divides and conquers" the computations efficiently.
IV-C Polarization of Conditional Entropies
Proposition 2 (Polarization [8])
Consider the pair of random matrices $(\mathbf{Y}, \mathbf{U})$ related through the polar transformation in (5). For $k \in [m]$ and any $\epsilon \in (0, 1)$, define the set of indices
(9) $$\mathcal{H}_k^{(\epsilon)} \triangleq \Big\{ j \in [N] : H\big(U_k^{(j)} \,\big|\, U_k^{1:j-1},\, U_{1:k-1}^{1:N}\big) \ge 1 - \epsilon \Big\}.$$
Then in the limit as $N \to \infty$,
(10) $$\lim_{N \to \infty} \frac{\big|\mathcal{H}_k^{(\epsilon)}\big|}{N} = H\big(Y_k \,\big|\, Y_{1:k-1}\big).$$
For sufficiently large $N$, Proposition 2 establishes that there exist approximately $N H(Y_k \,|\, Y_{1:k-1})$ indices per row $k$ of random matrix $\mathbf{U}$ for which the conditional entropy is close to $1$. The total number of indices in $\mathbf{U}$ for which the conditional entropy terms polarize to $1$ is approximately $N \sum_{k=1}^{m} H(Y_k \,|\, Y_{1:k-1}) = N H(Y_1, \ldots, Y_m)$. The polarization phenomenon is illustrated in Figure 2.
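The polarization of conditional entropies can be observed even at very small block lengths by exact enumeration. The sketch below uses a single i.i.d. Bernoulli sequence as a stand-in for one row of the output matrix (the block length and bias are hypothetical illustration parameters) and computes $H(U^{(j)} \mid U^{1:j-1})$ for $N = 8$:

```python
from math import log2
from itertools import product
from collections import defaultdict

def polar_transform(u):
    # Butterfly recursion for the polar transform over GF(2).
    if len(u) == 1:
        return list(u)
    s = [u[2 * i] ^ u[2 * i + 1] for i in range(len(u) // 2)]
    t = [u[2 * i + 1] for i in range(len(u) // 2)]
    return polar_transform(s) + polar_transform(t)

N, p = 8, 0.11   # hypothetical block length and source bias

# Exact distribution of U^{1:N} = polar transform of Y^{1:N},
# Y^{(j)} i.i.d. Bernoulli(p), by enumerating all 2^N sequences.
p_u = defaultdict(float)
for y in product([0, 1], repeat=N):
    prob = 1.0
    for b in y:
        prob *= p if b else 1 - p
    p_u[tuple(polar_transform(list(y)))] += prob

def cond_entropy(j):
    """H(U^{(j)} | U^{1:j-1}) from the exact joint distribution."""
    joint, past = defaultdict(float), defaultdict(float)
    for u, prob in p_u.items():
        joint[u[:j]] += prob
    for u, prob in joint.items():
        past[u[:-1]] += prob
    return -sum(prob * log2(prob / past[u[:-1]])
                for u, prob in joint.items() if prob > 0)

H = [cond_entropy(j) for j in range(1, N + 1)]
# By the chain rule and invertibility, sum(H) = N * h2(p),
# but the individual terms spread toward 0 and 1.
```

Already at $N = 8$, the largest conditional entropy is well above $h_2(p)$ and the smallest well below, while the total is preserved by the chain rule.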
Remark 4
Since the polar transform is invertible, the sequences $U_k^{1:N}$ are in one-to-one correspondence with $Y_k^{1:N}$. Therefore the conditional entropies also polarize to $0$ or $1$.
IV-D Rate of Polarization
The Bhattacharyya parameter of a pair of random variables is closely related to the conditional entropy. The parameter is useful for characterizing the rate of polarization.
Definition 7 (Bhattacharyya Parameter)
Let $(X, Y) \sim P_{X,Y}$ where $X \in \{0, 1\}$ and $Y \in \mathcal{Y}$ where $\mathcal{Y}$ is an arbitrary discrete alphabet. The Bhattacharyya parameter is defined as
(11) $$Z(X \,|\, Y) \triangleq 2 \sum_{y \in \mathcal{Y}} \sqrt{ P_{X,Y}(0, y)\, P_{X,Y}(1, y) }.$$
As shown in Lemma 16 of Appendix A, $Z(X|Y) \to 1$ implies $H(X|Y) \to 1$, and similarly $Z(X|Y) \to 0$ implies $H(X|Y) \to 0$ for a binary random variable $X$. Based on the Bhattacharyya parameter, the following proposition specifies sets that will be called message sets.
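A direct computation of (11) from a joint probability table is straightforward. The following Python sketch checks the two extreme cases: $Z = 0$ when $X$ is a deterministic function of $Y$, and $Z = 1$ when $X$ is uniform and independent of $Y$:

```python
from math import sqrt

def bhattacharyya(p_xy):
    """Z(X|Y) = 2 * sum_y sqrt(P_XY(0, y) * P_XY(1, y)) for binary X.
    p_xy maps pairs (x, y) to probabilities."""
    ys = {y for (_, y) in p_xy}
    return 2 * sum(sqrt(p_xy.get((0, y), 0.0) * p_xy.get((1, y), 0.0))
                   for y in ys)

# X a deterministic function of Y: perfectly predictable, so Z = 0.
z_det = bhattacharyya({(0, 0): 0.5, (1, 1): 0.5})
# X uniform and independent of Y: maximally unpredictable, so Z = 1.
z_unif = bhattacharyya({(0, 0): 0.25, (1, 0): 0.25,
                        (0, 1): 0.25, (1, 1): 0.25})
assert z_det == 0.0 and abs(z_unif - 1.0) < 1e-12
```

These extremes mirror the entropy relation above: $Z$ near $0$ corresponds to $H(X|Y)$ near $0$, and $Z$ near $1$ to $H(X|Y)$ near $1$.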
Proposition 3 (Rate of Polarization)
Consider the pair of random matrices $(\mathbf{Y}, \mathbf{U})$ related through the polar transformation in (5). Fix constants $\beta \in (0, 1/2)$ and $\epsilon > 0$. Let $\delta_N \triangleq 2^{-N^\beta}$ be the rate of polarization. Define the set
(12) $$\mathcal{M}_k \triangleq \Big\{ j \in [N] : Z\big(U_k^{(j)} \,\big|\, U_k^{1:j-1},\, U_{1:k-1}^{1:N}\big) \ge 1 - \delta_N \Big\}.$$
Then there exists an $N_0$ such that
(13) $$\frac{|\mathcal{M}_k|}{N} \ge H\big(Y_k \,\big|\, Y_{1:k-1}\big) - \epsilon$$
for all $N \ge N_0$.
The proposition is established via the Martingale Convergence Theorem by defining a supermartingale with respect to the Bhattacharyya parameters [6, 8]. The rate of polarization is characterized by Arıkan and Telatar in [16].
Remark 5
The message sets are computed "offline" only once during a code construction phase. The sets do not depend on the realization of random variables. In the following Section IV-E, a Monte Carlo sampling approach for estimating Bhattacharyya parameters is reviewed. Other highly efficient algorithms are known in the literature for finding the message indices (see e.g. Tal and Vardy [35]).
IV-E Estimating Bhattacharyya Parameters
As shown in Lemma 11 in Appendix A, one way to estimate the Bhattacharyya parameter $Z(X|Y)$ is to sample pairs $(x, y)$ from the joint distribution $P_{X,Y}$ and evaluate the empirical mean of the function $\sqrt{P_{X|Y}(\bar{x} \,|\, y) / P_{X|Y}(x \,|\, y)}$. The function is defined based on likelihood ratios, and its expectation equals $Z(X|Y)$.
Similarly, to determine the indices in the message sets defined in Proposition 3, the Bhattacharyya parameters must be estimated efficiently. For $j \in [N]$, define the likelihood ratio
(14) $$L_N^{(j)}\big(u^{1:j-1}\big) \triangleq \frac{\mathbb{P}\big[U^{(j)} = 0 \,\big|\, U^{1:j-1} = u^{1:j-1}\big]}{\mathbb{P}\big[U^{(j)} = 1 \,\big|\, U^{1:j-1} = u^{1:j-1}\big]}.$$
The dynamic programming method given in [8] allows for a recursive computation of the likelihood ratio. Define the following subproblems
$$L_{N/2}^{(i)}\big[u_o^{1:2i-2} \oplus u_e^{1:2i-2}\big], \qquad L_{N/2}^{(i)}\big[u_e^{1:2i-2}\big],$$
where the notation $u_o$ and $u_e$ represents the odd and even indices respectively of the sequence $u^{1:2i-2}$. The recursive computation of the likelihoods is characterized by
$$L_N^{(2i-1)} = \frac{L_{N/2}^{(i)}\big[u_o \oplus u_e\big]\, L_{N/2}^{(i)}\big[u_e\big] + 1}{L_{N/2}^{(i)}\big[u_o \oplus u_e\big] + L_{N/2}^{(i)}\big[u_e\big]}, \qquad L_N^{(2i)} = \Big( L_{N/2}^{(i)}\big[u_o \oplus u_e\big] \Big)^{s}\, L_{N/2}^{(i)}\big[u_e\big],$$
where $s = +1$ if $u^{(2i-1)} = 0$ and $s = -1$ if $u^{(2i-1)} = 1$. In the above recursive computations, the base case is for sequences of length $1$.
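The Monte Carlo approach mentioned in Remark 5 can be sketched as follows. The binary-symmetric test source and its parameter below are hypothetical; the estimator relies on the identity $Z(X|Y) = \mathbb{E}\big[\sqrt{P_{X|Y}(\bar{X}|Y)/P_{X|Y}(X|Y)}\big]$, which follows by expanding (11):

```python
import random
from math import sqrt

random.seed(7)

# Hypothetical test source: Y ~ Bernoulli(1/2); X equals Y except that it
# is flipped with probability eps, so P(X != Y | Y) = eps.
eps = 0.2
p_x_given_y = {(0, 0): 1 - eps, (1, 0): eps,
               (0, 1): eps, (1, 1): 1 - eps}

def sample():
    y = random.randint(0, 1)
    x = y if random.random() < 1 - eps else 1 - y
    return x, y

# Monte Carlo estimator of Z(X|Y) = E[ sqrt( P(1-X | Y) / P(X | Y) ) ].
n = 200_000
est = sum(sqrt(p_x_given_y[(1 - x, y)] / p_x_given_y[(x, y)])
          for x, y in (sample() for _ in range(n))) / n

exact = 2 * sqrt(eps * (1 - eps))   # closed form for this symmetric source
assert abs(est - exact) < 0.02
```

For this source the closed-form value is $Z(X|Y) = 2\sqrt{\epsilon(1-\epsilon)} = 0.8$, and the empirical mean concentrates around it at the usual $O(1/\sqrt{n})$ rate.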
V Proof of Theorem 1
The proof of Theorem 1 is based on the binary polarization theorems discussed in Section IV. The random coding arguments of C. E. Shannon prove the existence of capacity-achieving codes for point-to-point channels. Furthermore, random binning and joint-typicality arguments suffice to prove the existence of capacity-achieving codes for the deterministic DMBC. However, it is shown in this section that there exist capacity-achieving polar codes for the binary-output deterministic DMBC.
V-A Broadcast Code Based on Polarization
The ordering of the receivers' rates in Theorem 1 is arbitrary due to symmetry. Therefore, let $\pi$ be the identity permutation which denotes the successive order in which the message bits are allocated for each receiver. The encoder must map $m$ independent messages uniformly distributed over $[2^{NR_1}] \times \cdots \times [2^{NR_m}]$ to a codeword $x^{1:N}$. To construct a codeword for broadcasting $m$ independent messages, the following binary sequences are formed at the encoder: $u_1^{1:N}, u_2^{1:N}, \ldots, u_m^{1:N}$. To determine a particular bit $u_k^{(j)}$ in the binary sequence $u_k^{1:N}$: if $j \in \mathcal{M}_k$, the bit is selected as a uniformly distributed message bit intended for receiver $k$. As defined in (12) of Proposition 3, the message set $\mathcal{M}_k$ represents those indices for bits transmitted to receiver $k$. The remaining non-message indices in the binary sequence for each user are computed either according to a deterministic or random mapping.
V-A1 Deterministic Mapping
Consider a class of deterministic boolean functions indexed by $k \in [m]$ and $j \in [N]$:
(15) $$\lambda_k^{(j)} : \{0,1\}^{j-1} \times \{0,1\}^{(k-1)N} \to \{0,1\}.$$
As an example, consider the deterministic boolean function based on the maximum a posteriori (MAP) polar coding rule,
(16) $$\lambda_k^{(j)}\big(u_k^{1:j-1},\, u_{1:k-1}^{1:N}\big) \triangleq \arg\max_{u \in \{0,1\}} \mathbb{P}\big[U_k^{(j)} = u \,\big|\, U_k^{1:j-1} = u_k^{1:j-1},\, U_{1:k-1}^{1:N} = u_{1:k-1}^{1:N}\big].$$
V-A2 Random Mapping
Consider a class of random boolean functions indexed by $k \in [m]$ and $j \in [N]$:
(17) $$\Lambda_k^{(j)} : \{0,1\}^{j-1} \times \{0,1\}^{(k-1)N} \to \{0,1\}.$$
As an example, consider the random boolean function
(18) $$\Lambda_k^{(j)}\big(u_k^{1:j-1},\, u_{1:k-1}^{1:N}\big) = \begin{cases} 0, & \text{with probability } \mathbb{P}\big[U_k^{(j)} = 0 \,\big|\, U_k^{1:j-1} = u_k^{1:j-1},\, U_{1:k-1}^{1:N} = u_{1:k-1}^{1:N}\big], \\ 1, & \text{with probability } \mathbb{P}\big[U_k^{(j)} = 1 \,\big|\, U_k^{1:j-1} = u_k^{1:j-1},\, U_{1:k-1}^{1:N} = u_{1:k-1}^{1:N}\big]. \end{cases}$$
The random boolean function may be thought of as a vector of Bernoulli random variables indexed by the input to the function. Each Bernoulli random variable of the vector has a fixed, well-defined probability of being one or zero.
V-A3 Mapping From Messages To Codeword
The binary sequences $u_k^{1:N}$ for $k \in [m]$ are formed successively bit by bit. If $j \in \mathcal{M}_k$, then the bit $u_k^{(j)}$ is one message bit from the uniformly distributed message intended for user $k$. If $j \notin \mathcal{M}_k$, then $u_k^{(j)} = \lambda_k^{(j)}\big(u_k^{1:j-1}, u_{1:k-1}^{1:N}\big)$ in the case of a deterministic mapping, or $u_k^{(j)} = \Lambda_k^{(j)}\big(u_k^{1:j-1}, u_{1:k-1}^{1:N}\big)$ in the case of a random mapping. The encoder then applies the inverse polar transform for each sequence: $y_k^{1:N} = u_k^{1:N} G_N^{-1}$. The codeword is formed symbol by symbol as follows: for each $j \in [N]$, the symbol $x^{(j)}$ is selected from the intersection set $\bigcap_{k=1}^{m} f_k^{-1}\big(y_k^{(j)}\big)$.
If the intersection set is empty for some $j$, the encoder declares a block error. A block error only occurs at the encoder.
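The symbol-by-symbol construction above can be sketched for the Blackwell channel. The labeling of the deterministic maps below is a hypothetical standard choice, used only for illustration:

```python
# Blackwell channel with a hypothetical labeling of the deterministic maps:
# x = 0 -> (y1, y2) = (0, 0); x = 1 -> (0, 1); x = 2 -> (1, 1).
f1 = {0: 0, 1: 0, 2: 1}
f2 = {0: 0, 1: 1, 2: 1}

def build_codeword(y1, y2):
    """Form x^{1:N} symbol by symbol from the target output sequences.
    Returns None (a block error at the encoder) if some intersection of
    preimages f1^{-1}(y1_j) and f2^{-1}(y2_j) is empty."""
    codeword = []
    for a, b in zip(y1, y2):
        candidates = [x for x in (0, 1, 2) if f1[x] == a and f2[x] == b]
        if not candidates:
            return None   # e.g. the pair (1, 0) is unreachable here
        codeword.append(candidates[0])
    return codeword

ok = build_codeword([0, 0, 1], [0, 1, 1])    # consistent -> [0, 1, 2]
bad = build_codeword([1, 0], [0, 0])         # contains the impossible pair
assert ok == [0, 1, 2] and bad is None
```

The encoding succeeds precisely when every per-symbol intersection of preimages is nonempty, which is what the total variation argument of Section V-B guarantees with high probability.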
V-A4 Decoding at Receivers
If the encoder succeeds in transmitting a codeword $x^{1:N}$, each receiver $k$ obtains the sequence $y_k^{1:N}$ noiselessly and applies the polar transform to recover $u_k^{1:N}$ exactly. Since the message indices $\mathcal{M}_k$ are known to each receiver, the message bits in $u_k^{1:N}$ are decoded correctly by receiver $k$.
V-B Total Variation Bound
While the deterministic mapping performs well in practice, the average probability of error of the coding scheme is more difficult to analyze in theory. The random mapping at the encoder is more amenable to analysis via the probabilistic method. Towards that goal, consider the following probability measure $Q$ defined on the space of tuples of binary sequences.²
²A related proof technique was provided for lossy source coding based on polarization in a different context [24]. In the present paper, a different proof is supplied that utilizes the chain rule for KL divergence.
(19) $$Q\big(u_{1:m}^{1:N}\big) \triangleq \prod_{k=1}^{m} \prod_{j=1}^{N} Q\big(u_k^{(j)} \,\big|\, u_k^{1:j-1},\, u_{1:k-1}^{1:N}\big),$$
where the conditional probability measure
$$Q\big(u_k^{(j)} \,\big|\, u_k^{1:j-1},\, u_{1:k-1}^{1:N}\big) \triangleq \begin{cases} \tfrac{1}{2}, & j \in \mathcal{M}_k, \\ P\big(u_k^{(j)} \,\big|\, u_k^{1:j-1},\, u_{1:k-1}^{1:N}\big), & j \notin \mathcal{M}_k. \end{cases}$$
The probability measure $Q$ defined in (19) is a perturbation of the joint probability measure $P$ defined in (8) for the random variables $U_{1:m}^{1:N}$. The only difference in definition between $P$ and $Q$ is due to those indices in the message sets $\mathcal{M}_k$. The following lemma provides a bound on the total variation distance between $P$ and $Q$.
Lemma 1
The probability measures $P$ and $Q$ defined in (8) and (19) satisfy
$$\sum_{u_{1:m}^{1:N}} \Big| P\big(u_{1:m}^{1:N}\big) - Q\big(u_{1:m}^{1:N}\big) \Big| = O\big(2^{-N^{\beta}}\big),$$
where $\beta \in (0, 1/2)$ is the constant fixed in Proposition 3.
Proof: See Section B of the Appendices.
V-C Analysis of the Average Probability of Error
For the $m$-user deterministic DMBC, an error event occurs at the encoder if a codeword is unable to be constructed symbol by symbol according to the broadcast protocol described in Section V-A. Define the following set consisting of tuples of binary sequences,
(20) $$\mathcal{A} \triangleq \Big\{ \big(y_1^{1:N}, \ldots, y_m^{1:N}\big) : \exists\, j \in [N] \ \text{such that} \ \textstyle\bigcap_{k=1}^{m} f_k^{-1}\big(y_k^{(j)}\big) = \emptyset \Big\}.$$
The set $\mathcal{A}$ consists of those tuples of binary output sequences which are inconsistent due to the properties of the deterministic channel. In addition, due to the one-to-one correspondence between sequences $y_k^{1:N}$ and $u_k^{1:N}$, denote by $\tilde{\mathcal{A}}$ the corresponding set of tuples $\big(u_1^{1:N}, \ldots, u_m^{1:N}\big)$ that are inconsistent.
For the broadcast protocol, the rate for each receiver is $R_k = |\mathcal{M}_k| / N$. Let the total sum rate for all broadcast receivers be $R \triangleq \sum_{k=1}^{m} R_k$. If the encoder uses a fixed deterministic map $\lambda \triangleq \{\lambda_k^{(j)}\}$ in the broadcast protocol, the average probability of error is
(21) $$P_e^{(N)}(\lambda) = \mathbb{P}\Big[ \big(U_1^{1:N}, \ldots, U_m^{1:N}\big) \in \tilde{\mathcal{A}} \Big],$$
where the sequences are generated by the encoding protocol with maps $\lambda$. In addition, if the random maps $\Lambda \triangleq \{\Lambda_k^{(j)}\}$ are used at the encoder, the average probability of error is a random quantity given by
(22) $$P_e^{(N)}(\Lambda) = \mathbb{P}\Big[ \big(U_1^{1:N}, \ldots, U_m^{1:N}\big) \in \tilde{\mathcal{A}} \,\Big|\, \Lambda \Big].$$
Instead of characterizing $P_e^{(N)}(\lambda)$ directly for deterministic maps, the analysis of $\mathbb{E}_{\Lambda}\big[P_e^{(N)}(\Lambda)\big]$ leads to the following lemma.
Lemma 2
(23) $$\mathbb{E}_{\Lambda}\big[ P_e^{(N)}(\Lambda) \big] = Q\big(\tilde{\mathcal{A}}\big)$$
(24) $$= Q\big(\tilde{\mathcal{A}}\big) - P\big(\tilde{\mathcal{A}}\big)$$
(25) $$\le \sum_{u_{1:m}^{1:N}} \Big| P\big(u_{1:m}^{1:N}\big) - Q\big(u_{1:m}^{1:N}\big) \Big| = O\big(2^{-N^{\beta}}\big).$$
Step (23) follows since the probability measure $Q$ matches the desired calculation exactly. Step (24) is due to the fact that the probability measure $P$ has zero mass over tuples of binary sequences that are inconsistent. Step (25) follows directly from Lemma 1. Lastly, since the expectation over random maps of the average probability of error decays stretched-exponentially, there must exist a set of deterministic maps which exhibits the same behavior.
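Steps (24)-(25) rest on the elementary fact that the probability of any event changes by at most the total variation distance when the measure is perturbed. A small numeric sketch (with two hypothetical product measures) verifies this, including that the bound is tight for the worst-case event:

```python
import math
from itertools import product

def bernoulli_product(p, n):
    """Exact joint distribution of n i.i.d. Bernoulli(p) bits."""
    return {u: math.prod(p if b else 1 - p for b in u)
            for u in product([0, 1], repeat=n)}

P = bernoulli_product(0.5, 3)   # hypothetical "target" measure
Q = bernoulli_product(0.4, 3)   # hypothetical perturbed measure

outcomes = list(P)
tv = 0.5 * sum(abs(P[u] - Q[u]) for u in outcomes)

# For every event A: |Q(A) - P(A)| <= TV(P, Q), with equality attained by
# the worst-case event (the set of outcomes where Q exceeds P).
gaps = [abs(sum(Q[u] - P[u] for u, keep in zip(outcomes, mask) if keep))
        for mask in product([0, 1], repeat=len(outcomes))]
assert all(g <= tv + 1e-12 for g in gaps)
assert abs(max(gaps) - tv) < 1e-12
```

In the proof, the event is the inconsistency set: since it has zero mass under $P$, its mass under $Q$ is at most the (stretched-exponentially small) total variation distance.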
VI Noisy Broadcast Channels: Superposition Coding
Coding for noisy broadcast channels is now considered using polarization methods. In contrast to the deterministic case, a decoding error event occurs at the receivers on account of the randomness due to noise. For the remaining sections, it is assumed that there exist $m = 2$ users in the DMBC. The private-message capacity region for the DMBC is unknown even for binary-input, binary-output two-user channels such as the skew-symmetric DMBC. However, the private-message capacity region is known for specific classes.
VI-A Special Classes of Noisy DMBCs
Definition 8
The two-user physically degraded DMBC is a channel for which $X \to Y_1 \to Y_2$ form a Markov chain, i.e., one of the receivers is statistically stronger than the other:
$$P_{Y_1 Y_2 | X}(y_1, y_2 \,|\, x) = P_{Y_1 | X}(y_1 \,|\, x)\, P_{Y_2 | Y_1}(y_2 \,|\, y_1).$$