Channel Polarization on $q$-ary Discrete Memoryless Channels by Arbitrary Kernels
Abstract
A method of channel polarization, proposed by Arıkan, allows us to construct efficient capacity-achieving channel codes. In the original work, binary-input discrete memoryless channels are considered. A special case of $q$-ary channel polarization is considered by Şaşoğlu, Telatar, and Arıkan. In this paper, we consider more general channel polarization on $q$-ary channels. We further show explicit constructions using Reed-Solomon codes, on which asymptotically fast channel polarization is induced.
I Introduction
Channel polarization, proposed by Arıkan, is a method of constructing capacity-achieving codes with low encoding and decoding complexities [1]. Channel polarization can also be used to construct lossy source codes which achieve the rate-distortion tradeoff with low encoding and decoding complexities [2]. Arıkan and Telatar derived the rate of channel polarization [3]. In [4], a more detailed rate of channel polarization, which includes the coding rate, is derived. In [1], channel polarization is based on a $2\times 2$ matrix. Korada, Şaşoğlu, and Urbanke considered a generalized polarization phenomenon which is based on an $\ell\times\ell$ matrix and derived the rate of the generalized channel polarization [5]. In [6], a special case of channel polarization on $q$-ary channels is considered. In this paper, we consider channel polarization on $q$-ary channels which is based on arbitrary mappings.
II Preliminaries
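Let $u_0^{\ell-1}$ and $u_i^j$ denote a row vector $(u_0,\dots,u_{\ell-1})$ and its subvector $(u_i,\dots,u_j)$, respectively. Let $\mathcal{A}^c$ denote the complement of a set $\mathcal{A}$, and let $|\mathcal{A}|$ denote the cardinality of $\mathcal{A}$. Let $\mathcal{X}$ and $\mathcal{Y}$ be an input alphabet and an output alphabet, respectively. In this paper, we assume that $\mathcal{X}$ is finite and that $\mathcal{Y}$ is at most countable. A discrete memoryless channel (DMC) $W$ is defined as a conditional probability distribution $W(y\mid x)$ over $\mathcal{Y}$, where $x\in\mathcal{X}$ and $y\in\mathcal{Y}$. We write $W:\mathcal{X}\to\mathcal{Y}$ to mean a DMC with an input alphabet $\mathcal{X}$ and an output alphabet $\mathcal{Y}$. Let $q$ be the cardinality of $\mathcal{X}$. In this paper, the base of the logarithm is $q$ unless otherwise stated.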
Definition 1
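The symmetric capacity of a $q$-ary input channel $W:\mathcal{X}\to\mathcal{Y}$ is defined as
$$I(W) := \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} \frac{1}{q}\,W(y\mid x)\log\frac{W(y\mid x)}{\frac{1}{q}\sum_{x'\in\mathcal{X}} W(y\mid x')}.$$
Note that $I(W)\in[0,1]$.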
Definition 2
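Let $\mathcal{D}_x := \{y\in\mathcal{Y} \mid W(y\mid x) > W(y\mid x'),\ \forall x'\in\mathcal{X},\ x'\ne x\}$. The error probability of the maximum-likelihood estimation of the input $x$ on the basis of the output $y$ of the channel $W$ is defined as
$$P_e(W) := \frac{1}{q}\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{D}_x^c} W(y\mid x).$$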
Definition 3
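The Bhattacharyya parameter of $W$ is defined as
$$Z(W) := \frac{1}{q(q-1)}\sum_{\substack{x,x'\in\mathcal{X}\\ x\ne x'}} Z_{x,x'}(W),$$
where the Bhattacharyya parameter of $W$ between $x$ and $x'$ is defined as
$$Z_{x,x'}(W) := \sum_{y\in\mathcal{Y}}\sqrt{W(y\mid x)\,W(y\mid x')}.$$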
The symmetric capacity $I(W)$, the error probability $P_e(W)$, and the Bhattacharyya parameter $Z(W)$ are interrelated as in the following lemmas.
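Lemma 4
$$P_e(W)\le (q-1)\,Z(W).$$
Lemma 5
[6]
$$I(W)\ge \log\frac{q}{1+(q-1)Z(W)},$$
$$I(W)\le \log\frac{q}{2}+(\log 2)\sqrt{1-Z(W)^2},$$
$$I(W)\le 2(q-1)(\log e)\sqrt{1-Z(W)^2}.$$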
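As a numerical illustration of Definitions 1 to 3, the following Python sketch evaluates the symmetric capacity, the maximum-likelihood error probability, and the Bhattacharyya parameter of a channel given as a $q\times|\mathcal{Y}|$ transition matrix. It assumes the standard forms of these quantities for a uniformly distributed input (logarithm to base $q$, ties in the maximum-likelihood rule counted as errors). The function names and the $q$-ary symmetric test channel are hypothetical choices made for illustration only.

```python
import math

def symmetric_capacity(W, q):
    """I(W) for a uniform input, with the logarithm taken to base q."""
    total = 0.0
    for y in range(len(W[0])):
        py = sum(W[x][y] for x in range(q)) / q  # output probability under uniform input
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * math.log(W[x][y] / py, q)
    return total

def bhattacharyya(W, q):
    """Z(W): average of Z_{x,x'}(W) over all ordered pairs x != x'."""
    s = 0.0
    for x in range(q):
        for xp in range(q):
            if x != xp:
                s += sum(math.sqrt(a * b) for a, b in zip(W[x], W[xp]))
    return s / (q * (q - 1))

def ml_error(W, q):
    """P_e(W) for a uniform input; an output that ties between inputs counts as an error."""
    err = 0.0
    for x in range(q):
        for y in range(len(W[0])):
            if any(W[xp][y] >= W[x][y] for xp in range(q) if xp != x):
                err += W[x][y] / q
    return err

def q_ary_symmetric(q, p):
    """Hypothetical test channel: correct symbol w.p. 1-p, each wrong symbol w.p. p/(q-1)."""
    return [[1 - p if x == y else p / (q - 1) for y in range(q)] for x in range(q)]

if __name__ == "__main__":
    q = 3
    W = q_ary_symmetric(q, 0.1)
    print("I(W)   =", symmetric_capacity(W, q))
    print("Z(W)   =", bhattacharyya(W, q))
    print("P_e(W) =", ml_error(W, q))
```

For any channel, the values returned satisfy $0\le I(W)\le 1$ and the union bound $P_e(W)\le(q-1)Z(W)$, which gives a quick sanity check on an implementation.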
Definition 6
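The maximum and the minimum of the Bhattacharyya parameters between two symbols are defined as
$$Z_{\max}(W) := \max_{\substack{x,x'\in\mathcal{X}\\ x\ne x'}} Z_{x,x'}(W),\qquad Z_{\min}(W) := \min_{\substack{x,x'\in\mathcal{X}\\ x\ne x'}} Z_{x,x'}(W).$$
Let $\sigma:\mathcal{X}\to\mathcal{X}$ be a permutation. Let $\sigma^i$ denote the $i$th power of $\sigma$. The average Bhattacharyya parameter of $W$ between $x$ and $x'$ with respect to $\sigma$ is defined as the average of $Z_{z,z'}(W)$ over the subset $\{(z,z')=(\sigma^i(x),\sigma^i(x')) \mid i=0,1,\dots,q!-1\}$ as
$$Z^{\sigma}_{x,x'}(W) := \frac{1}{q!}\sum_{i=0}^{q!-1} Z_{\sigma^i(x),\sigma^i(x')}(W).$$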
III Channel polarization on $q$-ary DMC induced by nonlinear kernel
We consider a channel transform using a one-to-one onto mapping $g:\mathcal{X}^\ell\to\mathcal{X}^\ell$, which is called a kernel. In the previous works [1], [5], it is assumed that $q=2$ and that $g$ is linear. In [6], $q$ is arbitrary but $g$ is restricted. In this paper, $q$ and $g$ are arbitrary.
Definition 7
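Let $W:\mathcal{X}\to\mathcal{Y}$ be a DMC. Let $W^\ell:\mathcal{X}^\ell\to\mathcal{Y}^\ell$, $W^{(i)}:\mathcal{X}\to\mathcal{Y}^\ell\times\mathcal{X}^{i}$, and $W^{(i)}_{u_0^{i-1}}:\mathcal{X}\to\mathcal{Y}^\ell$ be defined as DMCs with transition probabilities
$$W^\ell(y_0^{\ell-1}\mid x_0^{\ell-1}) := \prod_{i=0}^{\ell-1} W(y_i\mid x_i),$$
$$W^{(i)}(y_0^{\ell-1},u_0^{i-1}\mid u_i) := \frac{1}{q^{\ell-1}}\sum_{u_{i+1}^{\ell-1}} W^\ell(y_0^{\ell-1}\mid g(u_0^{\ell-1})),$$
$$W^{(i)}_{u_0^{i-1}}(y_0^{\ell-1}\mid u_i) := \frac{1}{q^{\ell-i-1}}\sum_{u_{i+1}^{\ell-1}} W^\ell(y_0^{\ell-1}\mid g(u_0^{\ell-1})).$$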
Definition 8
Let $\{B_i\}_{i\ge 1}$ be independent random variables such that $B_i=k$ with probability $1/\ell$, for each $k=0,\dots,\ell-1$.
In the probabilistic channel transform $W\to W^{(B_i)}$, the expectation of the symmetric capacity is invariant due to the chain rule for mutual information. The following lemma is a consequence of the martingale convergence theorem.
Lemma 9
There exists a random variable $I_\infty$ such that $I(W^{(B_1)\cdots(B_n)})$ converges to $I_\infty$ almost surely as $n\to\infty$.
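The invariance of the expected symmetric capacity, on which the martingale argument rests, can be checked by brute force on a small example. The sketch below assumes the usual form of the subchannels (input $u_i$, output $(y_0^{\ell-1},u_0^{i-1})$, remaining inputs uniform) and verifies $\sum_{i} I(W^{(i)})=\ell\, I(W)$ for one linear and one nonlinear kernel on $\{0,1,2\}$. The helper names, the ternary symmetric channel, and both kernels are hypothetical and serve only as an illustration.

```python
import itertools
import math
from collections import defaultdict

def capacity(W, q):
    """Symmetric capacity (log base q) of a channel stored as W[x] = {y: prob}."""
    outputs = set().union(*(W[x].keys() for x in range(q)))
    total = 0.0
    for y in outputs:
        py = sum(W[x].get(y, 0.0) for x in range(q)) / q  # output law under uniform input
        for x in range(q):
            p = W[x].get(y, 0.0)
            if p > 0:
                total += p / q * math.log(p / py, q)
    return total

def subchannel(W, q, ell, g, i):
    """Assumed form of W^(i): input u_i, output (y_0..y_{ell-1}, u_0..u_{i-1})."""
    Wi = [defaultdict(float) for _ in range(q)]
    for u in itertools.product(range(q), repeat=ell):
        x = g(u)
        # enumerate every output vector of the ell independent uses of W
        for ys in itertools.product(*(W[xj].items() for xj in x)):
            prob = math.prod(p for _, p in ys)
            y = tuple(sym for sym, _ in ys)
            Wi[u[i]][(y, u[:i])] += prob / q ** (ell - 1)
    return Wi

def symmetric_channel(q, p):
    """Hypothetical test channel: q-ary symmetric with error probability p."""
    return [{y: (1 - p if y == x else p / (q - 1)) for y in range(q)} for x in range(q)]

def linear_kernel(u):
    """(u0 + u1, u1) over the integers modulo 3."""
    return ((u[0] + u[1]) % 3, u[1])

def nonlinear_kernel(u):
    """(u0 + u1^2, u1) modulo 3: a bijection that is not linear."""
    return ((u[0] + u[1] * u[1]) % 3, u[1])

if __name__ == "__main__":
    q, ell = 3, 2
    W = symmetric_channel(q, 0.2)
    for g in (linear_kernel, nonlinear_kernel):
        caps = [capacity(subchannel(W, q, ell, g, i), q) for i in range(ell)]
        print(g.__name__, caps, "sum:", sum(caps), "ell*I(W):", ell * capacity(W, q))
```

Because the check only uses the fact that the kernel is a bijection, the two sums agree for the nonlinear kernel as well as for the linear one.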
When $q=2$ and $g(u_0^1)=(u_0+u_1,u_1)$, Arıkan showed that $P(I_\infty\in\{0,1\})=1$ [1]. This result is called the channel polarization phenomenon since subchannels polarize to noiseless channels and pure noise channels. Korada, Şaşoğlu, and Urbanke considered the channel polarization phenomenon when $q=2$ and $g$ is linear [5].
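The binary case can be visualized on the binary erasure channel, which is an illustrative choice made here and not an example taken from this paper. For an erasure probability $\epsilon$, one step of Arıkan's $2\times 2$ transform is known to yield two erasure channels with erasure probabilities $2\epsilon-\epsilon^2$ and $\epsilon^2$ [1], so the whole process can be tracked exactly. The function names and the threshold `delta` below are hypothetical.

```python
def polarize_bec(eps, n):
    """Erasure probabilities of the 2**n subchannels after n steps of the 2x2 transform."""
    probs = [eps]
    for _ in range(n):
        # each subchannel splits into a degraded one and an upgraded one
        probs = [z for e in probs for z in (2 * e - e * e, e * e)]
    return probs

def unpolarized_fraction(probs, delta):
    """Fraction of subchannels whose erasure probability lies strictly inside (delta, 1 - delta)."""
    return sum(delta < e < 1 - delta for e in probs) / len(probs)

if __name__ == "__main__":
    eps, delta = 0.5, 0.01
    for n in (4, 8, 12, 16):
        probs = polarize_bec(eps, n)
        mean = sum(probs) / len(probs)
        print(n, "mean erasure:", round(mean, 6),
              "unpolarized fraction:", round(unpolarized_fraction(probs, delta), 4))
```

The mean erasure probability stays at $\epsilon$ at every level, which mirrors the invariance of the expected symmetric capacity, while the fraction of subchannels that are neither almost noiseless nor almost useless shrinks as $n$ grows.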
From Lemma 5, $I(W)$ is close to 0 and 1 when $Z(W)$ is close to 1 and 0, respectively. Hence, it would be sufficient to prove channel polarization if one can show that $Z(W^{(B_1)\cdots(B_n)})$ converges to $Z_\infty\in\{0,1\}$ almost surely. Here we instead show a weaker version of the above property in the following lemma and its corollary.
Lemma 10
Let be a sequence of discrete sets. Let be a sequence of $q$-ary DMCs. Let and be permutations on . Let
where , . Assume . Then, for any , there exists such that for any , and .
Proof:
Let , and be random variables which take values on , and , respectively, and jointly obey the distribution
Since and ,
tends to 0 by the assumption. Since the mutual information is lower bounded by the cutoff rate, one obtains
where
Since
it holds
The convergence of to 0 implies that
converges to 0 for any . It consequently implies that for any , there exists such that for any , and . \qed
Using Lemma 10, one can obtain a partial result of the channel polarization as follows.
Corollary 11
Assume that there exists , and permutations and on such that th element of and th element of are and , respectively, and such that for any there exists and a permutation on such that th element of is . Then, for almost every sequence of , and for any , there exists such that for any , and .
Since converges to almost surely, has to converge to 0 almost surely. Let and denote random variables ranging over and , and obeying the distribution
Then, it holds
From the assumption, for all . Hence, has to converge to 0 almost surely. By applying Lemma 10, one obtains the result. When $q=2$, since , this corollary immediately implies the channel polarization phenomenon, although it is not sufficient for general $q$. Note that this derivation does not use extra conditions such as symmetry of the DMC or linearity of the kernel.
If a kernel is linear, a more detailed condition is obtained.
Definition 12
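Assume that $\mathcal{X}$ is a commutative ring. A kernel $g$ is said to be linear if $g(a\,u_0^{\ell-1}+b\,v_0^{\ell-1}) = a\,g(u_0^{\ell-1})+b\,g(v_0^{\ell-1})$ for all $a\in\mathcal{X}$, $b\in\mathcal{X}$, $u_0^{\ell-1}\in\mathcal{X}^\ell$, and $v_0^{\ell-1}\in\mathcal{X}^\ell$.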
If $g$ is linear, $g$ can be represented by a square matrix $G$ such that $g(u_0^{\ell-1})=u_0^{\ell-1}G$. Let , and denote random variables taking values on , and , respectively, and obeying the distribution
where denotes an $\ell\times\ell$ full-rank upper triangular matrix. There exists a one-to-one correspondence between and for all . Hence, statistical properties of are invariant under an operation . Further, a permutation of columns of $G$ does not change statistical properties of either. Since any full-rank matrix can be decomposed to the form where , , and are upper triangular, lower triangular, and permutation matrices, without loss of generality we assume that $G$ is a lower triangular matrix and that where is the largest number such that the number of nonzero elements in th row of $G$ is greater than 1, and where denotes element of $G$.
Theorem 13
Assume that $\mathcal{X}$ is a field of prime cardinality, and that the linear kernel $G$ is not diagonal. Then, $P(I_\infty\in\{0,1\})=1$.
Proof:
It holds
where , , and is th element of where is the all-zero vector of length . Let be such that . Since each occurs with positive probability , we can apply Lemma 10 with and for arbitrary . Hence, for sufficiently large , is close to 0 or 1 almost surely where for all and . Since $q$ is a prime, when for , is close to 0 or 1 if and only if is close to 0 or 1, respectively. \qed
This result is a simple generalization of the special case considered by Şaşoğlu, Telatar, and Arıkan [6]. For a prime power $q$ and a finite field $\mathcal{X}$, we show a sufficient condition for channel polarization in the following corollary.
Corollary 14
Assume that $\mathcal{X}$ is a field and that a linear kernel $G$ is not diagonal. If there exists such that is a primitive element, then $P(I_\infty\in\{0,1\})=1$.
Proof:
By applying Lemma 10, one sees that for almost every sequence of , and for any , there exists such that for any , and where for arbitrary . It suffices to show that for any and , is close to 1 if and only if is close to 1. When is close to 1, is close to 1. Hence, is close to 1 for any . Since is a primitive element, is close to 1 for any . This completes the proof. \qed
IV Speed of polarization
Arıkan and Telatar showed the speed of polarization [3]. Korada, Şaşoğlu, and Urbanke generalized it to arbitrary binary linear kernels [5].
Proposition 15
Let be a random process satisfying the following properties.

converges to almost surely.

where are independent and identically distributed random variables, and is a constant.
Then,
for where denotes an expectation. Similarly, let be a random process satisfying the following properties.

converges to almost surely.

where are independent and identically distributed random variables, and is a constant.
Then,
for .
Note that the above proposition can straightforwardly be extended to include the rate dependence [4].
In order to apply Proposition 15 to and as and , respectively, the second conditions have to be proven. In the argument of [5], the partial distance of a kernel corresponds to the random variables and in Proposition 15.
Definition 16
The partial distance of a kernel $g$ is defined as
where denotes the Hamming distance between and .
We also use the following quantities.
When is linear, does not depend on , or , in which case we will use the notation instead of .
From Lemma 21 in the appendix, the following lemma is obtained.
Lemma 17
For ,
Corollary 18
For ,
Theorem 19
Assume . It holds
for .
When ,
for .
When $g$ is a linear kernel represented by a square matrix $G$, $\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell D_i$ is called the exponent of $G$ [5].
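For a linear kernel over a prime field, the partial distances and the exponent can be computed by exhaustive search. The sketch below assumes the convention of [5], in which the partial distance $D_i$ is the Hamming distance from the $i$th row of $G$ to the linear span of the rows below it, and the exponent is the average of $\log_\ell D_i$. The function names are hypothetical, and the arithmetic is only valid for a prime modulus $p$.

```python
import itertools
import math

def partial_distances(G, p):
    """D_i: Hamming distance from row i of G to the span of rows i+1.. over GF(p), p prime."""
    ell = len(G)
    dists = []
    for i in range(ell):
        tail = G[i + 1:]
        best = ell
        # the span is closed under negation, so adding every combination covers all differences
        for coeffs in itertools.product(range(p), repeat=len(tail)):
            word = [(G[i][j] + sum(c * row[j] for c, row in zip(coeffs, tail))) % p
                    for j in range(ell)]
            best = min(best, sum(v != 0 for v in word))
        dists.append(best)
    return dists

def exponent(G, p):
    """Average of log_ell(D_i) over the ell rows of G."""
    ell = len(G)
    return sum(math.log(d, ell) for d in partial_distances(G, p)) / ell

if __name__ == "__main__":
    arikan = [[1, 0],
              [1, 1]]
    print("partial distances:", partial_distances(arikan, 2))
    print("exponent:", exponent(arikan, 2))
```

The search enumerates $p^{\ell-i-1}$ combinations for row $i$, so it is practical only for small kernels.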
Example 20
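Assume that $\mathcal{X}$ is a field and that $\alpha\in\mathcal{X}$ is a primitive element. For a nonzero element $\gamma\in\mathcal{X}$, let
Since this matrix reduces to Arıkan’s $2\times 2$ matrix when $q=2$, it can be regarded as a generalization of Arıkan’s original matrix. The relation between binary polar codes and binary Reed-Muller codes [1] also holds for $q$-ary polar codes using this kernel and $q$-ary Reed-Muller codes. From Theorem 13, the channel polarization phenomenon occurs on this kernel for any nonzero $\gamma$ when $q$ is a prime. When $\gamma$ is a primitive element, from Corollary 14, the channel polarization phenomenon occurs on this kernel for any prime power $q$. We call this matrix the Reed-Solomon kernel since the submatrix which consists of the $i$th row to the $(q-1)$th row is a generator matrix of a generalized Reed-Solomon code, which is a maximum distance separable code, i.e., $D_i=i+1$. Hence, the exponent of the Reed-Solomon kernel is $\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell(i+1)$ where $\ell=q$. Since
$$\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell(i+1)=\frac{\log(\ell!)}{\ell\log\ell}\ge 1-\frac{1}{\log_e\ell},$$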
the exponent of the Reed-Solomon kernel tends to 1 as $q$ tends to infinity. When $q=4$, the exponent of the Reed-Solomon kernel is $\log 24/(4\log 4)\approx 0.57312$. In Arıkan’s original work, the exponent of the $2\times 2$ matrix is $1/2$ [3]. In [5], Korada, Şaşoğlu, and Urbanke showed that by using large kernels, the exponent can be improved, and found a matrix of size 16 whose exponent is about 0.51828. The above-mentioned Reed-Solomon kernel with $q=4$ is reasonably small and simple but has a larger exponent than binary linear kernels of small size. This demonstrates the usefulness of considering $q$-ary rather than binary channels. A $q$-ary DMC where $q$ is not a prime can be decomposed into subchannels whose input sizes are prime numbers [7] by using the method of multilevel coding [8]. The above example shows that when $q$ is a power of a prime, an asymptotically better coding scheme can be constructed, without the decomposition of the $q$-ary DMC, by using $q$-ary polar codes with the Reed-Solomon kernel.
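The growth of the exponent with the alphabet size can be tabulated directly. The sketch below assumes, as in the example above, that the kernel size equals $q$ and that the partial distances of a maximum distance separable construction are $D_i=i+1$, so that the exponent equals $\log(q!)/(q\log q)$. The function name `rs_exponent` is hypothetical.

```python
import math

def rs_exponent(q):
    """(1/q) * sum_{i=0}^{q-1} log_q(i+1), evaluated as log(q!) / (q log q)."""
    # math.lgamma(q + 1) is the natural logarithm of q!, which avoids huge integers
    return math.lgamma(q + 1) / (q * math.log(q))

if __name__ == "__main__":
    for q in (2, 4, 16, 256, 65536):
        print(q, round(rs_exponent(q), 5))
```

The value for $q=2$ coincides with the exponent $1/2$ of the $2\times 2$ matrix, and the value for $q=4$ already exceeds 0.51828, the exponent quoted above for the binary kernel of size 16.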
V Conclusion
The channel polarization phenomenon on $q$-ary channels has been considered. We give several sufficient conditions on kernels under which the channel polarization phenomenon occurs. We also show an explicit construction with a $q$-ary linear kernel for $q$ being a power of a prime. The exponent of this kernel is $\log(q!)/(q\log q)$, which is larger than the exponent of binary matrices of small size even if $q=4$. Our discussion includes channel polarization on nonlinear kernels as well. It is known that nonlinear binary codes may have a larger minimum distance than linear binary codes, e.g., the Nordstrom-Robinson codes [9]. This suggests the possibility that there exists a nonlinear kernel with a larger exponent than any linear kernel of the same size.
Lemma 21
Proof:
For the second inequality, one has