Channel Polarization on q-ary Discrete Memoryless Channels by Arbitrary Kernels


Ryuhei Mori Graduate School of Informatics
Kyoto University
Kyoto, 606–8501, Japan
Email: rmori@sys.i.kyoto-u.ac.jp
   Toshiyuki Tanaka Graduate School of Informatics
Kyoto University
Kyoto, 606–8501, Japan
Email: tt@i.kyoto-u.ac.jp
Abstract

A method of channel polarization, proposed by Arıkan, allows us to construct efficient capacity-achieving channel codes. In the original work, binary-input discrete memoryless channels are considered. A special case of $q$-ary channel polarization is considered by Şaşoğlu, Telatar, and Arıkan. In this paper, we consider more general channel polarization on $q$-ary channels. We further show explicit constructions using Reed-Solomon codes, on which asymptotically fast channel polarization is induced.

I Introduction

Channel polarization, proposed by Arıkan, is a method of constructing capacity-achieving codes with low encoding and decoding complexities [1]. Channel polarization can also be used to construct lossy source codes which achieve the rate-distortion trade-off with low encoding and decoding complexities [2]. Arıkan and Telatar derived the rate of channel polarization [3]. In [4], a more detailed rate of channel polarization which includes the coding rate is derived. In [1], channel polarization is based on a $2\times 2$ matrix. Korada, Şaşoğlu, and Urbanke considered a generalized polarization phenomenon which is based on an $\ell\times\ell$ matrix and derived the rate of the generalized channel polarization [5]. In [6], a special case of channel polarization on $q$-ary channels is considered. In this paper, we consider channel polarization on $q$-ary channels which is based on arbitrary mappings.

II Preliminaries

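Let $u_0^{\ell-1}$ and $u_i^j$ denote a row vector $(u_0,\dots,u_{\ell-1})$ and its subvector $(u_i,\dots,u_j)$. Let $\mathcal{F}^c$ denote the complement of a set $\mathcal{F}$, and let $|\mathcal{F}|$ denote the cardinality of $\mathcal{F}$. Let $\mathcal{X}$ and $\mathcal{Y}$ be an input alphabet and an output alphabet, respectively. In this paper, we assume that $\mathcal{X}$ is finite and that $\mathcal{Y}$ is at most countable. A discrete memoryless channel (DMC) is defined as a conditional probability distribution $W(y\mid x)$ over $\mathcal{Y}$ where $x\in\mathcal{X}$ and $y\in\mathcal{Y}$. We write $W:\mathcal{X}\to\mathcal{Y}$ to mean a DMC with an input alphabet $\mathcal{X}$ and an output alphabet $\mathcal{Y}$. Let $q$ be the cardinality of $\mathcal{X}$. In this paper, the base of the logarithm is $q$ unless otherwise stated.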

Definition 1

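The symmetric capacity of a $q$-ary input channel $W:\mathcal{X}\to\mathcal{Y}$ is defined as

$$I(W) := \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} \frac{1}{q}\,W(y\mid x)\,\log\frac{W(y\mid x)}{\frac{1}{q}\sum_{x'\in\mathcal{X}} W(y\mid x')}.$$

Note that $I(W)\in[0,1]$.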

Definition 2

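Let $\mathcal{D}_x := \{y\in\mathcal{Y} \mid W(y\mid x) > W(y\mid x'),\ \forall x'\in\mathcal{X},\ x'\neq x\}$. The error probability of the maximum-likelihood estimation of the input $x$ on the basis of the output $y$ of the channel $W$ is defined as

$$P_e(W) := \frac{1}{q}\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{D}_x^c} W(y\mid x).$$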

Definition 3

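The Bhattacharyya parameter of $W$ is defined as

$$Z(W) := \frac{1}{q(q-1)}\sum_{x,x'\in\mathcal{X},\,x\neq x'} Z_{x,x'}(W),$$

where the Bhattacharyya parameter of $W$ between $x$ and $x'$ is defined as

$$Z_{x,x'}(W) := \sum_{y\in\mathcal{Y}} \sqrt{W(y\mid x)\,W(y\mid x')}.$$

The symmetric capacity $I(W)$, the error probability $P_e(W)$, and the Bhattacharyya parameter $Z(W)$ are interrelated as in the following lemmas.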

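Lemma 4

$$P_e(W) \le (q-1)\,Z(W).$$

Lemma 5

[6]

$$I(W) \ge \log\frac{q}{1+(q-1)Z(W)},$$

$$I(W) \le \log\frac{q}{2} + (\log 2)\sqrt{1-Z(W)^2},$$

$$I(W) \le 2(q-1)(\log e)\sqrt{1-Z(W)^2}.$$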
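As a numerical illustration of Definitions 1 to 3, the following Python sketch evaluates $I(W)$, $P_e(W)$ and $Z(W)$ for a $q$-ary symmetric channel. The channel family `qsc`, the list-of-rows representation `W[x][y]`, and all function names are assumptions made for this example and do not appear in the paper; the formulas follow the standard definitions with logarithms taken to base $q$.

```python
import math

def qsc(q, eps):
    """q-ary symmetric channel (illustrative): W[x][y] = W(y|x)."""
    return [[1 - eps if y == x else eps / (q - 1) for y in range(q)]
            for x in range(q)]

def symmetric_capacity(W, q):
    """I(W) with logarithm base q, so that the value lies in [0, 1]."""
    total = 0.0
    for y in range(len(W[0])):
        avg = sum(W[x][y] for x in range(q)) / q
        for x in range(q):
            p = W[x][y]
            if p > 0:
                total += (p / q) * math.log(p / avg, q)
    return total

def bhattacharyya_pair(W, x, xp):
    """Z_{x,x'}(W): sum over outputs of sqrt(W(y|x) W(y|x'))."""
    return sum(math.sqrt(a * b) for a, b in zip(W[x], W[xp]))

def bhattacharyya(W, q):
    """Z(W): average of Z_{x,x'}(W) over ordered pairs with x != x'."""
    s = sum(bhattacharyya_pair(W, x, xp)
            for x in range(q) for xp in range(q) if x != xp)
    return s / (q * (q - 1))

def ml_error(W, q):
    """P_e(W): an output counts as correct for x only if W(y|x) is strictly largest."""
    err = 0.0
    for x in range(q):
        for y in range(len(W[0])):
            if not all(W[x][y] > W[xp][y] for xp in range(q) if xp != x):
                err += W[x][y]
    return err / q

W = qsc(4, 0.1)
print(symmetric_capacity(W, 4), ml_error(W, 4), bhattacharyya(W, 4))
```

For this channel one can also check numerically that $P_e(W)\le(q-1)Z(W)$ and that $I(W)$ is at least the cut-off rate $\log\frac{q}{1+(q-1)Z(W)}$.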

Definition 6

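The maximum and the minimum of the Bhattacharyya parameters between two symbols are defined as

$$Z_{\max}(W) := \max_{x,x'\in\mathcal{X},\,x\neq x'} Z_{x,x'}(W), \qquad Z_{\min}(W) := \min_{x,x'\in\mathcal{X},\,x\neq x'} Z_{x,x'}(W).$$

Let $\sigma:\mathcal{X}\to\mathcal{X}$ be a permutation. Let $\sigma^i$ denote the $i$th power of $\sigma$. The average Bhattacharyya parameter of $W$ between $x$ and $x'$ with respect to $\sigma$ is defined as the average of $Z_{z,z'}(W)$ over the subset $\{(z,z')=(\sigma^i(x),\sigma^i(x')) \mid i=0,1,\dots,q!-1\}$ as

$$Z^{\sigma}_{x,x'}(W) := \frac{1}{q!}\sum_{i=0}^{q!-1} Z_{\sigma^i(x),\sigma^i(x')}(W).$$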

III Channel polarization on $q$-ary DMC induced by non-linear kernel

We consider a channel transform using a one-to-one onto mapping $g:\mathcal{X}^\ell\to\mathcal{X}^\ell$, which is called a kernel. In the previous works [1], [5], it is assumed that $q=2$ and that $g$ is linear. In [6], $q$ is arbitrary but $g$ is restricted. In this paper, $q$ and $g$ are arbitrary.

Definition 7

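Let $W:\mathcal{X}\to\mathcal{Y}$ be a DMC. Let $W^\ell:\mathcal{X}^\ell\to\mathcal{Y}^\ell$, $W^{(i)}:\mathcal{X}\to\mathcal{Y}^\ell\times\mathcal{X}^{i}$, and $W^{(i)}_{u_0^{i-1}}:\mathcal{X}\to\mathcal{Y}^\ell$ be defined as DMCs with transition probabilities

$$W^\ell(y_0^{\ell-1}\mid x_0^{\ell-1}) := \prod_{i=0}^{\ell-1} W(y_i\mid x_i),$$

$$W^{(i)}(y_0^{\ell-1},u_0^{i-1}\mid u_i) := \frac{1}{q^{\ell-1}}\sum_{u_{i+1}^{\ell-1}} W^\ell\bigl(y_0^{\ell-1}\mid g(u_0^{\ell-1})\bigr),$$

$$W^{(i)}_{u_0^{i-1}}(y_0^{\ell-1}\mid u_i) := \frac{1}{q^{\ell-i-1}}\sum_{u_{i+1}^{\ell-1}} W^\ell\bigl(y_0^{\ell-1}\mid g(u_0^{\ell-1})\bigr).$$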

Definition 8

Let $\{B_i\}_{i=1,2,\dots}$ be independent random variables such that $B_i=k$ with probability $1/\ell$, for each $k=0,\dots,\ell-1$.

In the probabilistic channel transform $W\to W^{(B_i)}$, the expectation of the symmetric capacity $I(W^{(B_i)})$ is invariant due to the chain rule for mutual information. The following lemma is a consequence of the martingale convergence theorem.

Lemma 9

There exists a random variable $I_\infty$ such that $I(W^{(B_1)\cdots(B_n)})$ converges to $I_\infty$ almost surely as $n\to\infty$.
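The invariance of the expected symmetric capacity can be checked numerically on a small case. The Python sketch below assumes that the subchannel $W^{(i)}$ has input $u_i$ and output $(y_0^{\ell-1},u_0^{i-1})$, and that the remaining inputs $u_{i+1}^{\ell-1}$ are averaged uniformly. The ternary symmetric channel, the kernel $(u_0,u_1)\mapsto(u_0+u_1,u_1)$ over $\mathbb{Z}_3$, and all function names are illustrative choices rather than objects defined in the paper.

```python
import itertools
import math

def qsc(q, eps):
    """q-ary symmetric channel (illustrative): W[x][y] = W(y|x)."""
    return [[1 - eps if y == x else eps / (q - 1) for y in range(q)]
            for x in range(q)]

def as_dict(W):
    """Convert a list-of-rows channel into {input: {output: probability}}."""
    return {x: {y: p for y, p in enumerate(row)} for x, row in enumerate(W)}

def sym_capacity(W, q):
    """Symmetric capacity (log base q) of a channel given as nested dicts."""
    outputs = set()
    for x in range(q):
        outputs.update(W[x])
    total = 0.0
    for y in outputs:
        avg = sum(W[x].get(y, 0.0) for x in range(q)) / q
        for x in range(q):
            p = W[x].get(y, 0.0)
            if p > 0:
                total += p / q * math.log(p / avg, q)
    return total

def subchannel(W, q, ell, g, i):
    """Assumed form of W^{(i)}: input u_i, output (y_0^{ell-1}, u_0^{i-1})."""
    n_out = len(W[0])
    Wi = {a: {} for a in range(q)}
    for u in itertools.product(range(q), repeat=ell):
        x = g(u)
        for y in itertools.product(range(n_out), repeat=ell):
            p = math.prod(W[x[k]][y[k]] for k in range(ell))
            key = (y, u[:i])
            Wi[u[i]][key] = Wi[u[i]].get(key, 0.0) + p / q ** (ell - 1)
    return Wi

def arikan_mod3(u):
    """Illustrative kernel over Z_3: (u0, u1) -> (u0 + u1, u1)."""
    return ((u[0] + u[1]) % 3, u[1])

W = qsc(3, 0.2)
base = sym_capacity(as_dict(W), 3)
caps = [sym_capacity(subchannel(W, 3, 2, arikan_mod3, i), 3) for i in range(2)]
print(base, caps, sum(caps) / 2)
```

The average of the two subchannel capacities coincides with $I(W)$ up to rounding, while the individual values move apart, which is the first step of the polarization process.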

When $q=2$ and $g(u_0,u_1)=(u_0+u_1,\,u_1)$, Arıkan showed that $P(I_\infty\in\{0,1\})=1$ [1]. This result is called the channel polarization phenomenon since subchannels polarize to noiseless channels and pure noise channels. Korada, Şaşoğlu, and Urbanke consider the channel polarization phenomenon when $q=2$ and $g$ is linear [5].

From Lemma 5, $I(W)$ is close to 0 and 1 when $Z(W)$ is close to 1 and 0, respectively. Hence, it would be sufficient to prove channel polarization if one can show that $Z(W^{(B_1)\cdots(B_n)})$ converges to $Z_\infty\in\{0,1\}$ almost surely. Here we instead show a weaker version of the above property in the following lemma and its corollary.

Lemma 10

Let be a sequence of discrete sets. Let be a sequence of -ary DMCs. Let and be permutations on . Let

where , . Assume . Then, for any , there exists such that for any , and .

Proof:

Let , and be random variables which take values on , and , respectively, and jointly obey the distribution

Since and ,

tends to 0 by the assumption. Since the mutual information is lower bounded by the cut-off rate, one obtains

where

Since

it holds

The convergence of to 0 implies that

converges to 0 for any . It consequently implies that for any , there exists such that for any , and . \qed

Using Lemma 10, one can obtain a partial result of the channel polarization as follows.

Corollary 11

Assume that there exists , and permutations and on such that -th element of and -th element of are and , respectively, and such that for any there exists and a permutation on such that -th element of is . Then, for almost every sequence of , and for any , there exists such that for any , and .

Proof:

Since converges to almost surely, has to converge to 0 almost surely. Let and denote random variables ranging over and , and obeying the distribution

Then, it holds

From the assumption, for all . Hence, has to converge to 0 almost surely. By applying Lemma 10, one obtains the result. When , since , this corollary immediately implies the channel polarization phenomenon, although it is not sufficient for general . Note that in this derivation one does not use extra conditions e.g., symmetricity of DMC, linearity of a kernel.

If a kernel is linear, a more detailed condition is obtained.

Definition 12

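Assume that $(\mathcal{X},+,\cdot)$ is a commutative ring. A kernel $g$ is said to be linear if $g(ax+bx')=a\,g(x)+b\,g(x')$ for all $a\in\mathcal{X}$, $b\in\mathcal{X}$, $x\in\mathcal{X}^\ell$, and $x'\in\mathcal{X}^\ell$.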

If is linear, can be represented by a square matrix such that . Let , and denote random variables taking values on , and , respectively, and obeying distribution

where denotes an full-rank upper triangle matrix. There exists a one-to-one correspondence between and for all . Hence, statistical properties of are invariant under an operation . Further, a permutation of columns of does not change statistical properties of either. Since any full-rank matrix can be decomposed to the form where , , and are upper triangle, lower triangle, and permutation matrices, without loss of generality we assume that is a lower triangle matrix and that where is the largest number such that the number of non-zero elements in -th row of is greater than 1, and where denotes element of .

Theorem 13

Assume that $(\mathcal{X},+,\cdot)$ is a field of prime cardinality, and that the linear kernel $G$ is not diagonal. Then, $P(I_\infty\in\{0,1\})=1$.

Proof:

It holds

where , , and is -th element of where is all-zero vector of length . Let be such that . Since each occurs with positive probability , we can apply Lemma 10 with and for arbitrary . Hence, for sufficiently large , is close to 0 or 1 almost surely where for all and . Since is a prime, when for , is close to 0 or 1 if and only if is close to 0 or 1, respectively. \qed

This result is a simple generalization of the special case considered by Şaşoğlu, Telatar, and Arıkan [6]. For a prime power $q$ and a finite field $\mathcal{X}$, we show a sufficient condition for channel polarization in the following corollary.

Corollary 14

Assume that $(\mathcal{X},+,\cdot)$ is a field and that a linear kernel $G$ is not diagonal. If there exists $j\in\{0,\dots,k-1\}$ such that $G_{kj}$ is a primitive element, then $P(I_\infty\in\{0,1\})=1$.

Proof:

By applying Lemma 10, one sees that for almost every sequence of , and for any , there exists such that for any , and where for arbitrary . It suffices to show that for any and , is close to 1 if and only if is close to 1. When is close to 1, is close to 1. Hence, is close to 1 for any . Since is a primitive element, is close to 1 for any . It completes the proof. \qed

In [7], it is shown that the channel polarization phenomenon occurs by using a random kernel in which is chosen uniformly from nonzero elements. Corollary 14 says that a deterministic primitive element is sufficient for the channel polarization phenomenon.
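Corollary 14 relies on a primitive element, i.e., a generator of the multiplicative group of the field. A minimal Python sketch for listing the primitive elements of $\mathrm{GF}(2^m)$ follows. The bit-mask representation of field elements, the function names, and the choice of the irreducible polynomials $x^2+x+1$ and $x^4+x+1$ are assumptions made for illustration only.

```python
def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m) = GF(2)[x]/(poly).

    Elements are integers whose bits are polynomial coefficients;
    poly includes the x^m term (e.g. 0b111 for x^2 + x + 1).
    """
    r = 0
    while b:
        if b & 1:
            r ^= a          # add the current shifted copy of a
        b >>= 1
        a <<= 1             # multiply a by x
        if (a >> m) & 1:
            a ^= poly       # reduce modulo the field polynomial
    return r

def order(a, m, poly):
    """Multiplicative order of a non-zero element a."""
    k, p = 1, a
    while p != 1:
        p = gf_mul(p, a, m, poly)
        k += 1
    return k

def primitive_elements(m, poly):
    """Elements whose order equals q - 1 with q = 2^m."""
    q = 1 << m
    return [a for a in range(1, q) if order(a, m, poly) == q - 1]

print(primitive_elements(2, 0b111))     # GF(4)
print(primitive_elements(4, 0b10011))   # GF(16)
```

In $\mathrm{GF}(4)$ every element other than 0 and 1 is primitive, so for $q=4$ the condition of the corollary only excludes the entry 1.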

IV Speed of polarization

Arıkan and Telatar derived the speed of polarization [3]. Korada, Şaşoğlu, and Urbanke generalized it to arbitrary binary linear kernels [5].

Proposition 15

Let be a random process satisfying the following properties.

  1. converges to almost surely.

  2. where are independent and identically distributed random variables, and is a constant.

Then,

for where denotes an expectation. Similarly, let be a random process satisfying the following properties.

  1. converges to almost surely.

  2. where are independent and identically distributed random variables, and is a constant.

Then,

for .

Note that the above proposition can straightforwardly be extended to include the rate dependence [4].

In order to apply Proposition 15 to and as and , respectively, the second conditions have to be proven. In the argument of [5], partial distance of a kernel corresponds to the random variables and in Proposition 15.

Definition 16

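The partial distance of a kernel $g:\mathcal{X}^\ell\to\mathcal{X}^\ell$ is defined as

$$D^{(i)}_{x,x'}(u_0^{i-1}) := \min_{v_{i+1}^{\ell-1},\,w_{i+1}^{\ell-1}} d\bigl(g(u_0^{i-1},x,v_{i+1}^{\ell-1}),\ g(u_0^{i-1},x',w_{i+1}^{\ell-1})\bigr),$$

where $d(a,b)$ denotes the Hamming distance between $a\in\mathcal{X}^\ell$ and $b\in\mathcal{X}^\ell$.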
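For small kernels the partial distance can be evaluated by exhaustive search. The Python sketch below assumes the definition in which the first $i$ inputs are fixed to a common prefix, the $i$th input takes the two values $x\neq x'$, and the minimum Hamming distance is taken over all choices of the two remaining suffixes. The function names and the binary example kernel are illustrative and are not part of the paper.

```python
import itertools

def hamming(a, b):
    """Hamming distance between two equal-length tuples."""
    return sum(s != t for s, t in zip(a, b))

def partial_distance(g, q, ell, i, prefix, x, xp):
    """Brute-force partial distance at position i.

    g      : kernel, maps a length-ell tuple over {0..q-1} to a length-ell tuple
    prefix : tuple of length i holding the shared inputs u_0^{i-1}
    x, xp  : the two distinct values of the i-th input
    """
    suffixes = list(itertools.product(range(q), repeat=ell - i - 1))
    return min(
        hamming(g(prefix + (x,) + v), g(prefix + (xp,) + w))
        for v in suffixes
        for w in suffixes
    )

def arikan(u):
    """Binary 2x2 kernel (u0, u1) -> (u0 + u1, u1), used only as an example."""
    return ((u[0] + u[1]) % 2, u[1])

print(partial_distance(arikan, 2, 2, 0, (), 0, 1))
print(partial_distance(arikan, 2, 2, 1, (0,), 0, 1))
```

The search costs $q^{2(\ell-i-1)}$ kernel evaluations per call, so it is only practical for small $q$ and $\ell$; since `g` is an arbitrary function, the same routine also applies to non-linear kernels.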

We also use the following quantities.

When $g$ is linear, $D^{(i)}_{x,x'}(u_0^{i-1})$ does not depend on $x$, $x'$, or $u_0^{i-1}$, in which case we will use the notation $D^{(i)}$ instead of $D^{(i)}_{x,x'}(u_0^{i-1})$.

From Lemma 21 in the appendix, the following lemma is obtained.

Lemma 17

For ,

Corollary 18

For ,

From Proposition 15 and Corollary 18, the following theorem is obtained.

Theorem 19

Assume . It holds

for .

When ,

for .

When $g$ is a linear kernel represented by a square matrix $G$, $\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell D^{(i)}$ is called the exponent of $G$ [5].

Example 20

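Assume that $\mathcal{X}$ is a field and that $\alpha\in\mathcal{X}$ is a primitive element. For a non-zero element $\gamma\in\mathcal{X}$, let

$$G_{\mathrm{RS}}(q) := \begin{bmatrix} 1 & 1 & \cdots & 1 & 1 & 0\\ \alpha^{(q-2)(q-2)} & \alpha^{(q-3)(q-2)} & \cdots & \alpha^{q-2} & 1 & 0\\ \alpha^{(q-2)(q-3)} & \alpha^{(q-3)(q-3)} & \cdots & \alpha^{q-3} & 1 & 0\\ \vdots & \vdots & & \vdots & \vdots & \vdots\\ \alpha^{q-2} & \alpha^{q-3} & \cdots & \alpha & 1 & 0\\ 1 & 1 & \cdots & 1 & 1 & \gamma \end{bmatrix}.$$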

Since $G_{\mathrm{RS}}(2)=\begin{bmatrix}1&0\\1&1\end{bmatrix}$, $G_{\mathrm{RS}}(q)$ can be regarded as a generalization of Arıkan’s original matrix. The relation between binary polar codes and binary Reed-Muller codes [1] also holds for $q$-ary polar codes using $G_{\mathrm{RS}}(q)$ and $q$-ary Reed-Muller codes. From Theorem 13, the channel polarization phenomenon occurs on $G_{\mathrm{RS}}(q)$ for any $\gamma\neq 0$ when $q$ is a prime. When $\gamma$ is a primitive element, from Corollary 14, the channel polarization phenomenon occurs on $G_{\mathrm{RS}}(q)$ for any prime power $q$. We call $G_{\mathrm{RS}}(q)$ the Reed-Solomon kernel since the submatrix which consists of the $i$-th row to the $(q-1)$-th row of $G_{\mathrm{RS}}(q)$ is a generator matrix of a generalized Reed-Solomon code, which is a maximum distance separable code, i.e., $D^{(i)}=i+1$. Hence, the exponent of $G_{\mathrm{RS}}(q)$ is $\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell(i+1)=\frac{\log(\ell!)}{\ell\log\ell}$ where $\ell=q$. Since

$$\lim_{\ell\to\infty}\frac{\log(\ell!)}{\ell\log\ell}=1,$$

the exponent of the Reed-Solomon kernel tends to 1 as $\ell=q$ tends to infinity. When $q=4$, the exponent of the Reed-Solomon kernel is $\log 24/(4\log 4)\approx 0.57312$. In Arıkan’s original work, the exponent of the $2\times 2$ matrix is $0.5$ [3]. In [5], Korada, Şaşoğlu, and Urbanke showed that by using large kernels, the exponent can be improved, and found a matrix of size 16 whose exponent is about 0.51828. The above-mentioned Reed-Solomon kernel with $q=4$ is reasonably small and simple but has a larger exponent than binary linear kernels of small size. This demonstrates the usefulness of considering $q$-ary rather than binary channels. A $q$-ary DMC where $q$ is not a prime can be decomposed into subchannels whose input sizes are prime numbers [7] by using the method of multilevel coding [8]. The above example shows that when $q$ is a power of a prime, an asymptotically better coding scheme can be constructed, without the decomposition of the $q$-ary DMC, by using $q$-ary polar codes with the Reed-Solomon kernel.
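The exponent values quoted in this example can be reproduced with a few lines of Python. The sketch assumes that the exponent is $\frac{1}{\ell}\sum_i\log_\ell D^{(i)}$ and that the Reed-Solomon kernel has partial distances $D^{(i)}=i+1$. The matrix layout in `rs_kernel` (row $i$ evaluates $x^{q-1-i}$ at the non-zero field elements, with a last column that is zero except for $\gamma$ in the bottom row) is likewise an assumption, and it is implemented only for prime $q$, where arithmetic modulo $q$ is field arithmetic. All function names are hypothetical.

```python
import itertools
import math

def exponent(partial_distances):
    """(1/ell) * sum_i log_ell D^(i) for a kernel of size ell."""
    ell = len(partial_distances)
    return sum(math.log(d, ell) for d in partial_distances) / ell

def rs_exponent(q):
    """Exponent under the assumption D^(i) = i + 1 for i = 0..q-1."""
    return exponent(list(range(1, q + 1)))

def rs_kernel(q, alpha, gamma=1):
    """Assumed Reed-Solomon kernel over the prime field Z_q."""
    points = [pow(alpha, q - 2 - j, q) for j in range(q - 1)]
    rows = [[pow(a, q - 1 - i, q) for a in points] + [0] for i in range(q)]
    rows[q - 1][q - 1] = gamma % q
    return rows

def linear_partial_distances(G, q):
    """Brute force: minimum weight of row i plus any combination of later rows."""
    ell = len(G)
    dists = []
    for i in range(ell):
        best = ell
        for coeffs in itertools.product(range(q), repeat=ell - i - 1):
            word = [
                (G[i][c] + sum(v * G[i + 1 + t][c] for t, v in enumerate(coeffs))) % q
                for c in range(ell)
            ]
            best = min(best, sum(s != 0 for s in word))
        dists.append(best)
    return dists

print(exponent([1, 2]))                               # binary 2x2 kernel
print(rs_exponent(4))                                 # q = 4, from D^(i) = i + 1
print(linear_partial_distances(rs_kernel(5, 2), 5))   # q = 5, alpha = 2
```

The brute-force check is exponential in $\ell$ and is meant only as a sanity check for small prime $q$; the case $q=4$ would require arithmetic in $\mathrm{GF}(4)$, which this sketch does not implement.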

V Conclusion

The channel polarization phenomenon on $q$-ary channels has been considered. We give several sufficient conditions on kernels under which the channel polarization phenomenon occurs. We also show an explicit construction with a $q$-ary linear kernel $G_{\mathrm{RS}}(q)$ for $q$ being a power of a prime. The exponent of $G_{\mathrm{RS}}(q)$ is $\log(q!)/(q\log q)$, which is larger than the exponent of binary matrices of small size even if $q=4$. Our discussion includes channel polarization on non-linear kernels as well. It is known that non-linear binary codes may have a larger minimum distance than linear binary codes, e.g., the Nordstrom-Robinson codes [9]. This suggests the possibility that there exists a non-linear kernel with a larger exponent than any linear kernel of the same size.

Lemma 21
Proof:

For the second inequality, one has