# Channel Polarization on q-ary Discrete Memoryless Channels by Arbitrary Kernels

Ryuhei Mori
Graduate School of Informatics
Kyoto University
Kyoto, 606–8501, Japan
Email: rmori@sys.i.kyoto-u.ac.jp

Toshiyuki Tanaka
Graduate School of Informatics
Kyoto University
Kyoto, 606–8501, Japan
Email: tt@i.kyoto-u.ac.jp

###### Abstract

A method of channel polarization, proposed by Arıkan, allows us to construct efficient capacity-achieving channel codes. In the original work, binary-input discrete memoryless channels are considered. A special case of $q$-ary channel polarization is considered by Şaşoğlu, Telatar, and Arıkan. In this paper, we consider more general channel polarization on $q$-ary channels. We further show explicit constructions using Reed-Solomon codes, on which asymptotically fast channel polarization is induced.

## I Introduction

Channel polarization, proposed by Arıkan, is a method of constructing capacity-achieving codes with low encoding and decoding complexities [1]. Channel polarization can also be used to construct lossy source codes which achieve the rate-distortion trade-off with low encoding and decoding complexities [2]. Arıkan and Telatar derived the rate of channel polarization [3]. In [4], a more detailed rate of channel polarization, which includes the coding rate, is derived. In [1], channel polarization is based on a $2\times 2$ matrix. Korada, Şaşoğlu, and Urbanke considered a generalized polarization phenomenon which is based on an $\ell\times\ell$ matrix and derived the rate of the generalized channel polarization [5]. In [6], a special case of channel polarization on $q$-ary channels is considered. In this paper, we consider channel polarization on $q$-ary channels which is based on arbitrary mappings.

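## II Preliminaries

Let $u_0^{\ell-1}$ and $u_i^j$ denote a row vector $(u_0,\dots,u_{\ell-1})$ and its subvector $(u_i,\dots,u_j)$. Let $\mathcal{A}^c$ denote the complement of a set $\mathcal{A}$, and let $|\mathcal{A}|$ denote the cardinality of $\mathcal{A}$. Let $\mathcal{X}$ and $\mathcal{Y}$ be an input alphabet and an output alphabet, respectively. In this paper, we assume that $\mathcal{X}$ is finite and that $\mathcal{Y}$ is at most countable. A discrete memoryless channel (DMC) is defined as a conditional probability distribution $W(y\mid x)$ over $\mathcal{Y}$ where $x\in\mathcal{X}$ and $y\in\mathcal{Y}$. We write $W:\mathcal{X}\to\mathcal{Y}$ to mean a DMC with an input alphabet $\mathcal{X}$ and an output alphabet $\mathcal{Y}$. Let $q$ be the cardinality of $\mathcal{X}$. In this paper, the base of the logarithm is $q$ unless otherwise stated.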

###### Definition 1

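The symmetric capacity of a $q$-ary input channel $W:\mathcal{X}\to\mathcal{Y}$ is defined as

$$
I(W) := \sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{Y}} \frac{1}{q}\,W(y\mid x)\log\frac{W(y\mid x)}{\frac{1}{q}\sum_{x'\in\mathcal{X}}W(y\mid x')}.
$$

Note that $I(W)\in[0,1]$.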

###### Definition 2

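Let $\mathcal{D}_x := \{y\in\mathcal{Y} \mid W(y\mid x) > W(y\mid x'),\ \forall x'\in\mathcal{X},\ x'\ne x\}$. The error probability of the maximum-likelihood estimation of the input $x$ on the basis of the output $y$ of the channel $W$ is defined as

$$
P_e(W) := \frac{1}{q}\sum_{x\in\mathcal{X}}\sum_{y\in\mathcal{D}_x^c} W(y\mid x).
$$

###### Definition 3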

The Bhattacharyya parameter of $W$ is defined as

$$
Z(W) := \frac{1}{q(q-1)}\sum_{x\in\mathcal{X},\,x'\in\mathcal{X},\,x\ne x'} Z_{x,x'}(W)
$$

where the Bhattacharyya parameter of $W$ between $x$ and $x'$ is defined as

$$
Z_{x,x'}(W) := \sum_{y\in\mathcal{Y}}\sqrt{W(y\mid x)\,W(y\mid x')}.
$$

The symmetric capacity $I(W)$, the error probability $P_e(W)$, and the Bhattacharyya parameter $Z(W)$ are interrelated as in the following lemmas.

###### Lemma 4

$$
P_e(W) \le (q-1)\,Z(W).
$$

###### Lemma 5

[6]

$$
\begin{aligned}
I(W) &\ge \log\frac{q}{1+(q-1)Z(W)}\\
I(W) &\le \log(q/2) + (\log 2)\sqrt{1-Z(W)^2}\\
I(W) &\le 2(q-1)(\log e)\sqrt{1-Z(W)^2}.
\end{aligned}
$$
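The three quantities and the bounds of Lemmas 4 and 5 can be checked numerically. The following is a minimal sketch, not part of the original paper: the $q$-ary symmetric channel, the function names, and the choice $q=4$, $p=0.1$ are illustrative assumptions, logarithms are taken to base $q$, and the error probability treats an output as correctly decoded only when the transmitted symbol is the strict maximizer of the likelihood.

```python
import math

def symmetric_capacity(W, q):
    """I(W) with logarithms to base q; W[x][y] holds W(y | x)."""
    total = 0.0
    for y in range(len(W[0])):
        p_y = sum(W[x][y] for x in range(q)) / q  # output probability under uniform input
        for x in range(q):
            if W[x][y] > 0:
                total += W[x][y] / q * math.log(W[x][y] / p_y, q)
    return total

def bhattacharyya(W, q):
    """Z(W): average of Z_{x,x'}(W) over ordered pairs x != x'."""
    n_out = len(W[0])
    total = sum(math.sqrt(W[x][y] * W[xp][y])
                for x in range(q) for xp in range(q) if x != xp
                for y in range(n_out))
    return total / (q * (q - 1))

def ml_error(W, q):
    """P_e(W), counting y as correct for x only when W(y | x) is the strict maximum."""
    err = 0.0
    for x in range(q):
        for y in range(len(W[0])):
            if not all(W[x][y] > W[xp][y] for xp in range(q) if xp != x):
                err += W[x][y]
    return err / q

def q_ary_symmetric(q, p):
    """Hypothetical test channel: correct symbol w.p. 1 - p, each other symbol w.p. p / (q - 1)."""
    return [[1 - p if x == y else p / (q - 1) for y in range(q)] for x in range(q)]

q = 4
W = q_ary_symmetric(q, 0.1)
I, Z, Pe = symmetric_capacity(W, q), bhattacharyya(W, q), ml_error(W, q)
root = math.sqrt(1 - Z ** 2)
lemma4_holds = Pe <= (q - 1) * Z
lemma5_holds = (math.log(q / (1 + (q - 1) * Z), q) <= I
                and I <= math.log(q / 2, q) + math.log(2, q) * root
                and I <= 2 * (q - 1) * math.log(math.e, q) * root)
print(f"I={I:.4f} Z={Z:.4f} Pe={Pe:.4f} Lemma4={lemma4_holds} Lemma5={lemma5_holds}")
```

Representing the channel as a $q\times|\mathcal{Y}|$ table keeps each function a direct transcription of the corresponding definition; for this symmetric channel every $Z_{x,x'}(W)$ with $x\ne x'$ takes the same value, so $Z(W)=Z_{\max}(W)=Z_{\min}(W)$.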
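
###### Definition 6

The maximum and the minimum of the Bhattacharyya parameters between two symbols are defined as

$$
\begin{aligned}
Z_{\max}(W) &:= \max_{x\in\mathcal{X},\,x'\in\mathcal{X},\,x\ne x'} Z_{x,x'}(W)\\
Z_{\min}(W) &:= \min_{x\in\mathcal{X},\,x'\in\mathcal{X}} Z_{x,x'}(W).
\end{aligned}
$$

Let $\sigma:\mathcal{X}\to\mathcal{X}$ be a permutation. Let $\sigma^i$ denote the $i$th power of $\sigma$. The average Bhattacharyya parameter of $W$ between $x$ and $x'$ with respect to $\sigma$ is defined as the average of $Z_{z,z'}(W)$ over the subset $\{(z,z')=(\sigma^i(x),\sigma^i(x')) \mid i=0,1,\dots,q!-1\}$ as

$$
Z^{\sigma}_{x,x'}(W) := \frac{1}{q!}\sum_{i=0}^{q!-1} Z_{\sigma^i(x),\sigma^i(x')}(W).
$$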

## III Channel polarization on $q$-ary DMC induced by non-linear kernel

We consider a channel transform using a one-to-one onto mapping $g:\mathcal{X}^\ell\to\mathcal{X}^\ell$, which is called a kernel. In the previous works [1], [5], it is assumed that $q=2$ and that $g$ is linear. In [6], $q$ is arbitrary but $g$ is restricted. In this paper, $q$ and $g$ are arbitrary.

###### Definition 7

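Let $W:\mathcal{X}\to\mathcal{Y}$ be a DMC. Let $W^\ell:\mathcal{X}^\ell\to\mathcal{Y}^\ell$, $W^{(i)}:\mathcal{X}\to\mathcal{Y}^\ell\times\mathcal{X}^{i}$, and $W^{(i)}_{u_0^{i-1}}:\mathcal{X}\to\mathcal{Y}^\ell$ be defined as DMCs with transition probabilities

$$
\begin{aligned}
W^\ell(y_0^{\ell-1}\mid x_0^{\ell-1}) &:= \prod_{i=0}^{\ell-1} W(y_i\mid x_i)\\
W^{(i)}(y_0^{\ell-1},u_0^{i-1}\mid u_i) &:= \frac{1}{q^{\ell-1}}\sum_{u_{i+1}^{\ell-1}} W^\ell(y_0^{\ell-1}\mid g(u_0^{\ell-1}))\\
W^{(i)}_{u_0^{i-1}}(y_0^{\ell-1}\mid u_i) &:= \frac{1}{q^{\ell-i-1}}\sum_{u_{i+1}^{\ell-1}} W^\ell(y_0^{\ell-1}\mid g(u_0^{\ell-1})).
\end{aligned}
$$

###### Definition 8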

Let $B_1, B_2, \dots$ be independent random variables such that $B_i=k$ with probability $1/\ell$, for each $k=0,\dots,\ell-1$.

Under the probabilistic channel transform $W\to W^{(B_i)}$, the expectation of the symmetric capacity is invariant due to the chain rule for mutual information. The following lemma is a consequence of the martingale convergence theorem.

###### Lemma 9

There exists a random variable $I_\infty$ such that $I(W^{(B_1)\dots(B_n)})$ converges to $I_\infty$ almost surely as $n\to\infty$.
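The invariance of the expected symmetric capacity, which underlies the martingale argument, can be observed numerically on a small example. The sketch below is an illustration under stated assumptions rather than the authors' code: the ternary symmetric channel, the non-linear kernel `g`, and all function names are hypothetical choices made here. It builds the subchannels of Definition 7 by direct enumeration and compares the average of their symmetric capacities with that of the original channel.

```python
import math
from itertools import product

def symmetric_capacity(W, q):
    """I(W) in base q; W[x] is a dict mapping each output to W(output | x)."""
    outputs = set().union(*(W[x].keys() for x in range(q)))
    total = 0.0
    for y in outputs:
        p_y = sum(W[x].get(y, 0.0) for x in range(q)) / q
        for x in range(q):
            w = W[x].get(y, 0.0)
            if w > 0:
                total += w / q * math.log(w / p_y, q)
    return total

def subchannel(W, g, q, ell, i):
    """W^{(i)} of Definition 7: input u_i, output (y_0^{ell-1}, u_0^{i-1})."""
    ys = sorted(set().union(*(W[x].keys() for x in range(q))))
    Wi = [dict() for _ in range(q)]
    for u in product(range(q), repeat=ell):
        x = g(u)
        for y in product(ys, repeat=ell):
            prob = math.prod(W[x[j]].get(y[j], 0.0) for j in range(ell))
            key = (y, u[:i])  # u_{i+1}, ..., u_{ell-1} are summed out
            Wi[u[i]][key] = Wi[u[i]].get(key, 0.0) + prob / q ** (ell - 1)
    return Wi

def g(u):
    """Hypothetical non-linear one-to-one kernel on {0,1,2}^2."""
    return ((u[0] + u[1] ** 2) % 3, u[1])

q, ell, p = 3, 2, 0.2
W = [{y: (1 - p if x == y else p / (q - 1)) for y in range(q)} for x in range(q)]
I_W = symmetric_capacity(W, q)
I_sub = [symmetric_capacity(subchannel(W, g, q, ell, i), q) for i in range(ell)]
print(f"I(W)={I_W:.6f}  I(W^(i))={[round(v, 6) for v in I_sub]}  mean={sum(I_sub) / ell:.6f}")
```

The mean of the subchannel capacities agrees with $I(W)$ up to rounding because $g$ is one-to-one, while the individual values move apart; this is one step of the process $I(W^{(B_1)\dots(B_n)})$.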

When $q=2$ and $g(u_0^1)=(u_0+u_1,u_1)$, Arıkan showed that $P(I_\infty\in\{0,1\})=1$ [1]. This result is called the channel polarization phenomenon since subchannels polarize to noiseless channels and pure noise channels. Korada, Şaşoğlu, and Urbanke consider the channel polarization phenomenon when $q=2$ and $g$ is linear [5].

From Lemma 5, $I(W)$ is close to 0 and 1 when $Z(W)$ is close to 1 and 0, respectively. Hence, it would be sufficient to prove channel polarization if one can show that $Z(W^{(B_1)\dots(B_n)})$ converges to a $\{0,1\}$-valued random variable almost surely. Here we instead show a weaker version of the above property in the following lemma and its corollary.

###### Lemma 10

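Let $\{\mathcal{Y}_n\}_{n\in\mathbb{N}}$ be a sequence of discrete sets. Let $\{W_n:\mathcal{X}\to\mathcal{Y}_n\}_{n\in\mathbb{N}}$ be a sequence of $q$-ary DMCs. Let $\sigma$ and $\tau$ be permutations on $\mathcal{X}$. Let

$$
W'_n(y_1,y_2\mid x) = W_n(y_1\mid\sigma(x))\,W_n(y_2\mid\tau(x))
$$

where $y_1\in\mathcal{Y}_n$, $y_2\in\mathcal{Y}_n$. Assume $\lim_{n\to\infty}\bigl(I(W'_n)-I(W_n)\bigr)=0$. Then, for any $\delta\in(0,1/2)$, there exists $m$ such that $Z^{\tau\sigma^{-1}}_{x,x'}(W_n)\notin(\delta,1-\delta)$ for any $x\in\mathcal{X}$, $x'\in\mathcal{X}$ and $n\ge m$.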

###### Proof:

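Let $Z$, $Y_1$ and $Y_2$ be random variables which take values on $\mathcal{X}$, $\mathcal{Y}_n$ and $\mathcal{Y}_n$, respectively, and jointly obey the distribution

$$
P_n(Z=z,Y_1=y_1,Y_2=y_2) = \frac{1}{q}\,W_n(y_1\mid\sigma(z))\,W_n(y_2\mid\tau(z)).
$$

Since $I(W'_n)=I(Z;Y_1,Y_2)$ and $I(W_n)=I(Z;Y_1)$,

$$
I(Z;Y_1,Y_2)-I(Z;Y_1) = I(Z;Y_2\mid Y_1)
$$

tends to 0 by the assumption. Since the mutual information is lower bounded by the cut-off rate, one obtains

$$
\begin{aligned}
I(Z;Y_2\mid Y_1) &\ge -\log\sum_{y_1\in\mathcal{Y}_n,\,y_2\in\mathcal{Y}_n} P_n(Y_1=y_1)\Bigl[\sum_{z\in\mathcal{X}} P_n(Z=z\mid Y_1=y_1)\sqrt{P_n(Y_2=y_2\mid Z=z,Y_1=y_1)}\Bigr]^2\\
&= -\log\sum_{y_1\in\mathcal{Y}_n,\,z\in\mathcal{X},\,x\in\mathcal{X}} P_n(Y_1=y_1)\,P_n(Z=z\mid Y_1=y_1)\,P_n(Z=x\mid Y_1=y_1)\,Z_{\tau(z),\tau(x)}(W_n)\\
&= -\log\sum_{y_1\in\mathcal{Y}_n,\,z\in\mathcal{X},\,x\in\mathcal{X}} q_n(y_1,z,x)\,Z_{\tau(\sigma^{-1}(z)),\tau(\sigma^{-1}(x))}(W_n)
\end{aligned}
$$

where

$$
q_n(y_1,z,x) := P_n(Y_1=y_1)\,P_n(Z=\sigma^{-1}(z)\mid Y_1=y_1)\,P_n(Z=\sigma^{-1}(x)\mid Y_1=y_1).
$$

Since

$$
\begin{aligned}
\sum_{y_1\in\mathcal{Y}_n} q_n(y_1,z,x) &= \sum_{y_1\in\mathcal{Y}_n} P_n(Y_1=y_1)\Bigl(\sqrt{P_n(Z=\sigma^{-1}(z)\mid Y_1=y_1)\,P_n(Z=\sigma^{-1}(x)\mid Y_1=y_1)}\Bigr)^2\\
&\ge \Bigl(\sum_{y_1\in\mathcal{Y}_n} P_n(Y_1=y_1)\sqrt{P_n(Z=\sigma^{-1}(z)\mid Y_1=y_1)\,P_n(Z=\sigma^{-1}(x)\mid Y_1=y_1)}\Bigr)^2\\
&= \frac{1}{q^2}\,Z_{z,x}(W_n)^2
\end{aligned}
$$

and $\sum_{y_1,z,x}q_n(y_1,z,x)=1$, it holds that

$$
I(Z;Y_2\mid Y_1) \ge -\log\Bigl[1-\frac{1}{q^2}\sum_{z\in\mathcal{X},\,x\in\mathcal{X}} Z_{z,x}(W_n)^2\bigl(1-Z_{\tau(\sigma^{-1}(z)),\tau(\sigma^{-1}(x))}(W_n)\bigr)\Bigr].
$$

The convergence of $I(Z;Y_2\mid Y_1)$ to 0 implies that

$$
Z_{z,x}(W_n)^2\bigl(1-Z_{\tau(\sigma^{-1}(z)),\tau(\sigma^{-1}(x))}(W_n)\bigr)
$$

converges to 0 for any $(z,x)\in\mathcal{X}^2$. It consequently implies that for any $\delta\in(0,1/2)$, there exists $m$ such that $Z^{\tau\sigma^{-1}}_{x,x'}(W_n)\notin(\delta,1-\delta)$ for any $x\in\mathcal{X}$, $x'\in\mathcal{X}$ and $n\ge m$. ∎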
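The last implication of the proof can be spelled out as follows. This is a sketch of one way to fill in the step, not a quotation of the authors' argument: the symbols $\mu$, $a^{(n)}_i$ and $\varepsilon$ are shorthand introduced here, and the only ingredients are the convergence just stated, the fact that $\mu^{q!}$ is the identity permutation, and the averaged Bhattacharyya parameter of Definition 6.

```latex
\begin{align*}
&\mu := \tau\circ\sigma^{-1}, \qquad
  a^{(n)}_i := Z_{\mu^i(x),\mu^i(x')}(W_n)\in[0,1], \quad i=0,1,\dots,q!-1.\\
&\text{Applying the convergence to the pair } (\mu^i(x),\mu^i(x')):\quad
  \bigl(a^{(n)}_i\bigr)^2\bigl(1-a^{(n)}_{i+1}\bigr)\to 0 \ \text{ for every } i.\\
&\text{Fix } \varepsilon\in(0,1/2) \text{ and take } n \text{ so large that }
  \bigl(a^{(n)}_i\bigr)^2\bigl(1-a^{(n)}_{i+1}\bigr)<\varepsilon^3 \text{ for all } i.\\
&\text{If } a^{(n)}_i>\varepsilon, \text{ then } 1-a^{(n)}_{i+1}<\varepsilon^3/\varepsilon^2=\varepsilon,
  \text{ so } a^{(n)}_{i+1}>1-\varepsilon>\varepsilon.\\
&\text{Since } \mu^{q!} \text{ is the identity, the index } i \text{ is cyclic: either every }
  a^{(n)}_i\le\varepsilon \text{ or every } a^{(n)}_i>1-\varepsilon.\\
&\text{Hence } Z^{\mu}_{x,x'}(W_n)=\frac{1}{q!}\sum_{i=0}^{q!-1}a^{(n)}_i \notin (\varepsilon,1-\varepsilon).
\end{align*}
```

Because $\mathcal{X}$ is finite, a single threshold on $n$ works for all pairs $(x,x')$ simultaneously.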

Using Lemma 10, one can obtain a partial result of the channel polarization as follows.

###### Corollary 11

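Assume that there exist $u_0^{\ell-2}\in\mathcal{X}^{\ell-1}$, $i\ne j$, and permutations $\sigma$ and $\tau$ on $\mathcal{X}$ such that the $i$-th element of $g(u_0^{\ell-2},x)$ and the $j$-th element of $g(u_0^{\ell-2},x)$ are $\sigma(x)$ and $\tau(x)$, respectively, and such that for any $v_0^{\ell-2}\in\mathcal{X}^{\ell-1}$ there exist $k\in\{0,\dots,\ell-1\}$ and a permutation $\mu$ on $\mathcal{X}$ such that the $k$-th element of $g(v_0^{\ell-2},x)$ is $\mu(x)$. Then, for almost every sequence $b_1,\dots,b_n,\dots$ of $B_1,\dots,B_n,\dots$, and for any $\delta\in(0,1/2)$, there exists $m$ such that $Z^{\tau\sigma^{-1}}_{x,x'}(W^{(b_1)\dots(b_n)})\notin(\delta,1-\delta)$ for any $x\in\mathcal{X}$, $x'\in\mathcal{X}$ and $n\ge m$.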

###### Proof:

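Since $I(W^{(B_1)\dots(B_n)})$ converges to $I_\infty$ almost surely, $\bigl|I(W^{(B_1)\dots(B_n)(\ell-1)})-I(W^{(B_1)\dots(B_n)})\bigr|$ has to converge to 0 almost surely. Let $U_0^{\ell-1}$ and $Y_0^{\ell-1}$ denote random variables ranging over $\mathcal{X}^\ell$ and $\mathcal{Y}^\ell$, and obeying the distribution

$$
P(U_0^{\ell-1}=u_0^{\ell-1},\,Y_0^{\ell-1}=y_0^{\ell-1}) = \frac{1}{q}\,W^{(\ell-1)}(y_0^{\ell-1},u_0^{\ell-2}\mid u_{\ell-1}).
$$

Then, it holds that

$$
\begin{aligned}
I(W^{(\ell-1)}) &= I(Y_0^{\ell-1},U_0^{\ell-2};U_{\ell-1})\\
&= I(Y_0^{\ell-1};U_{\ell-1}\mid U_0^{\ell-2})\\
&= \sum_{u_0^{\ell-2}}\frac{1}{q^{\ell-1}}\,I(Y_0^{\ell-1};U_{\ell-1}\mid U_0^{\ell-2}=u_0^{\ell-2}).
\end{aligned}
$$

From the assumption, $I(Y_0^{\ell-1};U_{\ell-1}\mid U_0^{\ell-2}=u_0^{\ell-2})\ge I(W)$ for all $u_0^{\ell-2}\in\mathcal{X}^{\ell-1}$. Hence, when $W$ is replaced by $W^{(B_1)\dots(B_n)}$, the difference $I(Y_0^{\ell-1};U_{\ell-1}\mid U_0^{\ell-2}=u_0^{\ell-2})-I(W)$ has to converge to 0 almost surely. By applying Lemma 10, one obtains the result. ∎

When $q=2$, since $Z^{\mu}_{0,1}(W)=Z(W)$ for every permutation $\mu$, this corollary immediately implies the channel polarization phenomenon, although it is not sufficient for general $q$. Note that this derivation does not use extra conditions such as symmetry of the DMC or linearity of the kernel.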

If a kernel is linear, a more detailed condition is obtained.

###### Definition 12

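Assume that $(\mathcal{X},+,\cdot)$ is a commutative ring. A kernel $g:\mathcal{X}^\ell\to\mathcal{X}^\ell$ is said to be linear if $g(a\,u_0^{\ell-1}+b\,v_0^{\ell-1})=a\,g(u_0^{\ell-1})+b\,g(v_0^{\ell-1})$ for all $a\in\mathcal{X}$, $b\in\mathcal{X}$, $u_0^{\ell-1}\in\mathcal{X}^\ell$, and $v_0^{\ell-1}\in\mathcal{X}^\ell$.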

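If $g$ is linear, $g$ can be represented by a square matrix $G$ such that $g(u_0^{\ell-1})=u_0^{\ell-1}G$. Let $U_0^{\ell-1}$, $X_0^{\ell-1}$ and $Y_0^{\ell-1}$ denote random variables taking values on $\mathcal{X}^\ell$, $\mathcal{X}^\ell$ and $\mathcal{Y}^\ell$, respectively, and obeying the distribution

$$
P(U_0^{\ell-1}=u_0^{\ell-1},\,X_0^{\ell-1}=x_0^{\ell-1},\,Y_0^{\ell-1}=y_0^{\ell-1}) = \frac{1}{q^\ell}\,W^\ell(y_0^{\ell-1}\mid u_0^{\ell-1}G)\,\mathbb{I}\{x_0^{\ell-1}V=u_0^{\ell-1}\}
$$

where $V$ denotes an $\ell\times\ell$ full-rank upper triangular matrix. There exists a one-to-one correspondence between $X_0^{i}$ and $U_0^{i}$ for all $i\in\{0,\dots,\ell-1\}$. Hence, statistical properties of $W^{(i)}$ are invariant under the operation $G\to VG$. Further, a permutation of the columns of $G$ does not change statistical properties of $W^{(i)}$ either. Since any full-rank matrix $A$ can be decomposed to the form $A=VLP$ where $V$, $L$, and $P$ are upper triangular, lower triangular, and permutation matrices, without loss of generality we assume that $G$ is a lower triangular matrix and that $G_{kk}=1$, where $k$ is the largest number such that the number of non-zero elements in the $k$-th row of $G$ is greater than 1, and where $G_{ij}$ denotes the $(i,j)$ element of $G$.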

###### Theorem 13

Assume that $\mathcal{X}$ is a field of prime cardinality, and that the linear kernel $G$ is not diagonal. Then, $P(I_\infty\in\{0,1\})=1$.

###### Proof:

It holds

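$$
W^{(k)}(y_0^{\ell-1},u_0^{k-1}\mid u_k) = \frac{1}{q^{\ell-1}}\prod_{j=k+1}^{\ell-1}\Bigl(\sum_{x\in\mathcal{X}}W(y_j\mid x)\Bigr)\times\prod_{j\in S_0}W(y_j\mid x_j)\prod_{j\in S_1}W(y_j\mid G_{kj}u_k+x_j)
$$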

where , , and is -th element of where is all-zero vector of length . Let be such that . Since each occurs with positive probability , we can apply Lemma 10 with and for arbitrary . Hence, for sufficiently large , is close to 0 or 1 almost surely where for all and . Since is a prime, when for , is close to 0 or 1 if and only if is close to 0 or 1, respectively. \qed

This result is a simple generalization of the special case considered by Şaşoğlu, Telatar, and Arıkan [6]. For a prime power $q$ and a finite field $\mathcal{X}$, we show a sufficient condition for channel polarization in the following corollary.

###### Corollary 14

Assume that $\mathcal{X}$ is a field and that a linear kernel $G$ is not diagonal. If there exists $j\in\{0,\dots,k-1\}$ such that $G_{kj}$ is a primitive element, then $P(I_\infty\in\{0,1\})=1$.

###### Proof:

By applying Lemma 10, one sees that for almost every sequence of , and for any , there exists such that for any , and where for arbitrary . It suffices to show that for any and , is close to 1 if and only if is close to 1. When is close to 1, is close to 1. Hence, is close to 1 for any . Since is a primitive element, is close to 1 for any . It completes the proof. \qed

In [7], it is shown that the channel polarization phenomenon occurs by using a random kernel in which is chosen uniformly from nonzero elements. Corollary 14 says that a deterministic primitive element is sufficient for the channel polarization phenomenon.

## IV Speed of polarization

Arıkan and Telatar showed the speed of polarization [3]. Korada, Şaşoğlu, and Urbanke generalized it to any binary linear kernels [5].

###### Proposition 15

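Let $\{\hat{X}_n\}_{n\in\mathbb{N}}$ be a random process satisfying the following properties.

1. $\hat{X}_n$ converges to $\hat{X}_\infty$ almost surely.

2. $\hat{X}_{n+1}\le c\,\hat{X}_n^{\hat{D}_n}$ where $\{\hat{D}_n\}_{n\in\mathbb{N}}$ are independent and identically distributed random variables, and $c$ is a constant.

Then,

$$
\lim_{n\to\infty}P\bigl(\hat{X}_n<2^{-2^{\beta n}}\bigr) = P(\hat{X}_\infty=0)
$$

for $\beta<\mathbb{E}[\log_2\hat{D}_1]$ where $\mathbb{E}[\cdot]$ denotes an expectation. Similarly, let $\{\check{X}_n\}_{n\in\mathbb{N}}$ be a random process satisfying the following properties.

1. $\check{X}_n$ converges to $\check{X}_\infty$ almost surely.

2. $\check{X}_{n+1}\ge c\,\check{X}_n^{\check{D}_n}$ where $\{\check{D}_n\}_{n\in\mathbb{N}}$ are independent and identically distributed random variables, and $c$ is a constant.

Then,

$$
\lim_{n\to\infty}P\bigl(\check{X}_n<2^{-2^{\beta n}}\bigr) = 0
$$

for $\beta>\mathbb{E}[\log_2\check{D}_1]$.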

Note that the above proposition can straightforwardly be extended to include the rate dependence [4].

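In order to apply Proposition 15 to $Z_{\max}(W^{(B_1)\dots(B_n)})$ and $Z_{\min}(W^{(B_1)\dots(B_n)})$ as $\hat{X}_n$ and $\check{X}_n$, respectively, the second conditions have to be proven. In the argument of [5], the partial distance of a kernel corresponds to the random variables $\hat{D}_n$ and $\check{D}_n$ in Proposition 15.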

###### Definition 16

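The partial distance of a kernel $g:\mathcal{X}^\ell\to\mathcal{X}^\ell$ is defined as

$$
D^{(i)}_{x,x'}(u_0^{i-1}) := \min_{v_{i+1}^{\ell-1},\,w_{i+1}^{\ell-1}} d\bigl(g(u_0^{i-1},x,v_{i+1}^{\ell-1}),\,g(u_0^{i-1},x',w_{i+1}^{\ell-1})\bigr)
$$

where $d(a,b)$ denotes the Hamming distance between $a\in\mathcal{X}^\ell$ and $b\in\mathcal{X}^\ell$.

We also use the following quantities.

$$
\begin{aligned}
D^{(i)}_{x,x'} &:= \min_{u_0^{i-1}} D^{(i)}_{x,x'}(u_0^{i-1})\\
D^{(i)}_{\max} &:= \max_{x\in\mathcal{X},\,x'\in\mathcal{X}} D^{(i)}_{x,x'}\\
D^{(i)}_{\min} &:= \min_{x\in\mathcal{X},\,x'\in\mathcal{X},\,x\ne x'} D^{(i)}_{x,x'}.
\end{aligned}
$$

When $g$ is linear, $D^{(i)}_{x,x'}(u_0^{i-1})$ with $x\ne x'$ does not depend on $x$, $x'$, or $u_0^{i-1}$, in which case we will use the notation $D^{(i)}$ instead of $D^{(i)}_{x,x'}(u_0^{i-1})$.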
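For small kernels the partial distances can be obtained by exhaustive search directly from Definition 16. The sketch below is illustrative only: the function names and the ternary non-linear kernel are hypothetical choices made here, and Arıkan's binary kernel $(u_0+u_1,u_1)$ is included as a linear reference case.

```python
from itertools import product

def hamming(a, b):
    """Number of positions in which the tuples a and b differ."""
    return sum(s != t for s, t in zip(a, b))

def partial_distance(g, q, ell, i, x, xp, u_prefix):
    """D^{(i)}_{x,x'}(u_0^{i-1}) by exhaustive search over the tails v and w."""
    tails = list(product(range(q), repeat=ell - 1 - i))
    return min(hamming(g(u_prefix + (x,) + v), g(u_prefix + (xp,) + w))
               for v in tails for w in tails)

def partial_distance_profile(g, q, ell):
    """Return ([D_min^{(i)}], [D_max^{(i)}]) for i = 0, ..., ell - 1."""
    d_min, d_max = [], []
    for i in range(ell):
        prefixes = list(product(range(q), repeat=i))
        D = {(x, xp): min(partial_distance(g, q, ell, i, x, xp, u) for u in prefixes)
             for x in range(q) for xp in range(q)}
        d_min.append(min(v for (x, xp), v in D.items() if x != xp))
        d_max.append(max(D.values()))
    return d_min, d_max

def arikan(u):
    """Arikan's binary kernel (u0 + u1, u1)."""
    return ((u[0] + u[1]) % 2, u[1])

def ternary_nonlinear(u):
    """Hypothetical non-linear one-to-one kernel on {0,1,2}^2."""
    return ((u[0] + u[1] ** 2) % 3, u[1])

print(partial_distance_profile(arikan, 2, 2))
print(partial_distance_profile(ternary_nonlinear, 3, 2))
```

For the linear kernel the minimum and maximum profiles coincide, as stated above. For the non-linear example they differ at $i=1$, because the symbols 1 and 2 have the same square modulo 3 and are therefore separated in only one output coordinate.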

From Lemma 21 in the appendix, the following lemma is obtained.

###### Lemma 17

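For $i\in\{0,\dots,\ell-1\}$,

$$
\frac{1}{q^{2\ell-2-i}}\,Z_{\min}(W)^{D^{(i)}_{x,x'}} \le Z_{x,x'}(W^{(i)}) \le q^{\ell-1-i}\,Z_{\max}(W)^{D^{(i)}_{x,x'}}.
$$

###### Corollary 18

For $i\in\{0,\dots,\ell-1\}$,

$$
\begin{aligned}
Z_{\max}(W^{(i)}) &\le q^{\ell-1-i}\,Z_{\max}(W)^{D^{(i)}_{\min}}\\
\frac{1}{q^{2\ell-2-i}}\,Z_{\min}(W)^{D^{(i)}_{\max}} &\le Z_{\min}(W^{(i)}).
\end{aligned}
$$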

From Proposition 15 and Corollary 18, the following theorem is obtained.

###### Theorem 19

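Assume $P(I_\infty\in\{0,1\})=1$. It holds that

$$
\lim_{n\to\infty}P\bigl(Z(W^{(B_1)\dots(B_n)})<2^{-\ell^{\beta n}}\bigr) = I(W)
$$

for $\beta<\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell D^{(i)}_{\min}$.

When $Z_{\min}(W)>0$,

$$
\lim_{n\to\infty}P\bigl(Z(W^{(B_1)\dots(B_n)})<2^{-\ell^{\beta n}}\bigr) = 0
$$

for $\beta>\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell D^{(i)}_{\max}$.

When $g$ is a linear kernel represented by a square matrix $G$, $\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell D^{(i)}$ is called the exponent of $G$ [5].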

###### Example 20

Assume that $\mathcal{X}$ is a field and that $\alpha\in\mathcal{X}$ is a primitive element. For a non-zero element $\gamma\in\mathcal{X}$, let

$$
G_{\mathrm{RS}}(q) = \begin{bmatrix}
1 & 1 & \cdots & 1 & 1 & 0\\
\alpha^{(q-2)(q-2)} & \alpha^{(q-3)(q-2)} & \cdots & \alpha^{q-2} & 1 & 0\\
\alpha^{(q-2)(q-3)} & \alpha^{(q-3)(q-3)} & \cdots & \alpha^{q-3} & 1 & 0\\
\vdots & \vdots & \cdots & \vdots & \vdots & \vdots\\
\alpha^{q-2} & \alpha^{q-3} & \cdots & \alpha & 1 & 0\\
1 & 1 & \cdots & 1 & 1 & \gamma
\end{bmatrix}.
$$

Since $G_{\mathrm{RS}}(2)=\begin{bmatrix}1&0\\1&1\end{bmatrix}$, $G_{\mathrm{RS}}(q)$ can be regarded as a generalization of Arıkan's original matrix. The relation between binary polar codes and binary Reed-Muller codes [1] also holds for $q$-ary polar codes using $G_{\mathrm{RS}}(q)$ and $q$-ary Reed-Muller codes. From Theorem 13, the channel polarization phenomenon occurs on $G_{\mathrm{RS}}(q)$ for any non-zero $\gamma$ when $q$ is a prime. When $\gamma$ is a primitive element, from Corollary 14, the channel polarization phenomenon occurs on $G_{\mathrm{RS}}(q)$ for any prime power $q$. We call $G_{\mathrm{RS}}(q)$ the Reed-Solomon kernel since the submatrix which consists of the $i$-th row to the $(q-1)$-th row of $G_{\mathrm{RS}}(q)$ is a generator matrix of a generalized Reed-Solomon code, which is a maximum distance separable code, i.e., $D^{(i)}=i+1$. Hence, the exponent of $G_{\mathrm{RS}}(q)$ is $\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell(i+1)$ where $\ell=q$. Since

$$
\frac{1}{\ell}\sum_{i=0}^{\ell-1}\log_\ell(i+1) \ge \frac{1}{\ell\log_e\ell}\int_1^\ell\log_e x\,dx = 1-\frac{\ell-1}{\ell\log_e\ell}
$$

the exponent of the Reed-Solomon kernel tends to 1 as $\ell=q$ tends to infinity. When $q=4$, the exponent of the Reed-Solomon kernel is $\frac{\log_e 24}{4\log_e 4}\approx 0.57312$. In Arıkan's original work, the exponent of the $2\times 2$ matrix is $0.5$ [3]. In [5], Korada, Şaşoğlu, and Urbanke showed that by using large kernels, the exponent can be improved, and found a matrix of size 16 whose exponent is about 0.51828. The above-mentioned Reed-Solomon kernel with $q=4$ is reasonably small and simple but has a larger exponent than binary linear kernels of small size. This demonstrates the usefulness of considering $q$-ary rather than binary channels. A $q$-ary DMC where $q$ is not a prime can be decomposed into subchannels with input sizes of prime numbers [7] by using the method of multilevel coding [8]. The above example shows that when $q$ is a power of a prime, an asymptotically better coding scheme can be constructed without the decomposition of the $q$-ary DMC, by using $q$-ary polar codes with $G_{\mathrm{RS}}(q)$.
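The partial distances and the exponent of the Reed-Solomon kernel can be checked by brute force for small prime $q$. The following is a sketch under explicit assumptions: it only handles prime $q$, so that field arithmetic is integer arithmetic modulo $q$ (prime powers such as $q=4$ would need a proper finite-field implementation), and the function names and the parameters $\alpha=2$, $\gamma=1$ for $q=5$ are choices made here for illustration.

```python
import math
from itertools import product

def rs_kernel(q, alpha, gamma):
    """G_RS(q) over the prime field Z_q (q prime, alpha primitive, gamma non-zero)."""
    points = [pow(alpha, q - 2 - c, q) for c in range(q - 1)]  # alpha^{q-2}, ..., alpha, 1
    G = [[pow(b, q - 1 - r, q) for b in points] + [0] for r in range(q)]
    G[q - 1][q - 1] = gamma  # bottom-right entry
    return G

def partial_distances(G, q):
    """D^{(i)} of the linear kernel u -> uG: least Hamming weight of
    x * G[i] + sum_j w_j * G[j] over x != 0 and all coefficients w_j, j > i."""
    ell = len(G)
    dists = []
    for i in range(ell):
        best = ell
        for x in range(1, q):
            for w in product(range(q), repeat=ell - 1 - i):
                coeffs = (x,) + w
                word = [sum(c * G[i + k][col] for k, c in enumerate(coeffs)) % q
                        for col in range(ell)]
                best = min(best, sum(s != 0 for s in word))
        dists.append(best)
    return dists

def exponent(dists):
    """(1 / ell) * sum_i log_ell D^{(i)}."""
    ell = len(dists)
    return sum(math.log(d, ell) for d in dists) / ell

def rs_exponent(q):
    """Closed form (1 / q) * sum_{i=0}^{q-1} log_q (i + 1)."""
    return sum(math.log(i + 1, q) for i in range(q)) / q

def lower_bound(ell):
    """1 - (ell - 1) / (ell * ln ell)."""
    return 1 - (ell - 1) / (ell * math.log(ell))

G5 = rs_kernel(5, alpha=2, gamma=1)  # 2 is a primitive element modulo 5
D5 = partial_distances(G5, 5)
print(D5, round(exponent(D5), 5), round(rs_exponent(5), 5))
for q in (2, 4, 16, 256):
    print(q, round(rs_exponent(q), 5), round(lower_bound(q), 5))
```

The brute-force search reproduces $D^{(i)}=i+1$ for $q=5$, and the closed form evaluated at $q=4$ reproduces the value of about 0.57312 quoted above; the last loop shows how slowly the exponent approaches 1 as $q$ grows.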

## V Conclusion

The channel polarization phenomenon on $q$-ary channels has been considered. We give several sufficient conditions on kernels under which the channel polarization phenomenon occurs. We also show an explicit construction with a $q$-ary linear kernel $G_{\mathrm{RS}}(q)$ for $q$ being a power of a prime. The exponent of $G_{\mathrm{RS}}(q)$ is $\frac{1}{q}\sum_{i=0}^{q-1}\log_q(i+1)$, which is larger than the exponent of binary matrices of small size even if $q=4$. Our discussion includes channel polarization on non-linear kernels as well. It is known that non-linear binary codes may have a larger minimum distance than linear binary codes, e.g., the Nordstrom-Robinson codes [9]. This suggests the possibility that there exists a non-linear kernel with a larger exponent than any linear kernel of the same size.

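## Appendix

###### Lemma 21

$$
\frac{1}{q^{2(\ell-1-i)}}\,Z_{\min}(W)^{D^{(i)}_{x,x'}(u_0^{i-1})} \le Z_{x,x'}\bigl(W^{(i)}_{u_0^{i-1}}\bigr) \le q^{\ell-1-i}\,Z_{\max}(W)^{D^{(i)}_{x,x'}(u_0^{i-1})}
$$

###### Proof: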

For the second inequality, one has

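$$
\begin{aligned}
Z_{x,x'}\bigl(W^{(i)}_{u_0^{i-1}}\bigr) &= \sum_{y_0^{\ell-1}}\sqrt{W^{(i)}_{u_0^{i-1}}(y_0^{\ell-1}\mid x)\,W^{(i)}_{u_0^{i-1}}(y_0^{\ell-1}\mid x')}\\
&= q^i\sum_{y_0^{\ell-1}}\sqrt{W^{(i)}(y_0^{\ell-1},u_0^{i-1}\mid x)\,W^{(i)}(y_0^{\ell-1},u_0^{i-1}\mid x')}\\
&= \frac{1}{q^{\ell-1-i}}\sum_{y_0^{\ell-1}}\Bigl(\sum_{v_{i+1}^{\ell-1},\,w_{i+1}^{\ell-1}} W^\ell\bigl(y_0^{\ell-1}\mid g(u_0^{i-1},x,v_{i+1}^{\ell-1})\bigr)\,W^\ell\bigl(y_0^{\ell-1}\mid g(u_0^{i-1},x',w_{i+1}^{\ell-1})\bigr)\Bigr)^{\frac{1}{2}}\\
&\le \frac{1}{q^{\ell-1-i}}\sum_{y_0^{\ell-1}}\sum_{v_{i+1}^{\ell-1},\,w_{i+1}^{\ell-1}}\sqrt{W^\ell\bigl(y_0^{\ell-1}\mid g(u_0^{i-1},x,v_{i+1}^{\ell-1})\bigr)\,W^\ell\bigl(y_0^{\ell-1}\mid g(u_0^{i-1},x',w_{i+1}^{\ell-1})\bigr)}\\
&\le \frac{1}{q^{\ell-1-i}}\sum_{v_{i+1}^{\ell-1},\,w_{i+1}^{\ell-1}} Z_{\max}(W)^{D^{(i)}_{x,x'}(u_0^{i-1})}
\end{aligned}
$$

 =qℓ−1−iZmax(W)D(i)x,x′(ui−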