# Exact Asymptotics for the Random Coding Error Probability

Junya Honda
Graduate School of Frontier Sciences, The University of Tokyo
Kashiwa-shi, Chiba 277-8561, Japan
Email: honda@it.k.u-tokyo.ac.jp
###### Abstract

This paper considers the error probabilities of random codes for memoryless channels.¹ In communication systems the admissible error probability is very small. In that regime it is sometimes more important to discuss the relative gap between the achievable error probability and its bound than the absolute gap. Scarlett et al. used the technique of saddlepoint approximation to derive an accurate upper bound on the random coding union bound, but it has not been proved that the relative gap of their bound converges to zero. From this viewpoint, this paper derives a new bound on the achievable error probability for a class of memoryless channels. The derived bound is strictly smaller than that of Scarlett et al. For a fixed coding rate, its relative gap to the random coding error probability (not a union bound) vanishes as the block length increases.

¹ This paper is the full version of the author's ISIT2015 paper, with some corrections and refinements.

channel coding, random coding, error exponent, finite-length analysis, asymptotic expansion.

## I Introduction

One of the most important tasks of information theory is to clarify the achievable performance of channel codes at finite block length. For this purpose, Polyanskiy and Hayashi considered the achievable coding rate for a fixed error probability and block length. They showed that the term following the channel capacity is of order $1/\sqrt{n}$ for block length $n$, and that it is expressed by a percentile of the normal distribution.

The essential point in deriving such a bound is to evaluate the error probabilities of channel codes accurately. In these works, this evaluation uses an asymptotic expansion of sums of random variables. On the other hand, the admissible error probability in communication systems is very small. In such cases it is sometimes more important to consider the relative gap between the achievable error probability and its bound than the absolute gap. However, approximating a tail probability by the asymptotic expansion sometimes gives a large relative gap. It is known that saddlepoint approximation and the (higher-order) large deviation principle are more powerful tools than the asymptotic expansion in this regime.

Bounds on the error probability of random codes with a small relative gap have been studied extensively. Most of them treat a fixed rate, whereas the works above consider a varying rate for a fixed error probability. Gallager derived an upper bound, called the random coding union bound, which gives the rate of exponential decay of the random coding error probability for a fixed rate $R$. It has been proved that this exponent is tight for the random code at rates both below and above the critical rate.

There has also been much research on tight bounds on the random coding error probability with vanishing or constant relative error for a fixed rate. Dobrushin derived a bound on the random coding error probability for channels that are symmetric in a strong sense: each row and each column of the transition probability matrix is a permutation of the others. The relative error of this bound is asymptotically bounded by a constant, and it vanishes when the channel satisfies a nonlattice condition.

For the general class of discrete memoryless channels, Gallager derived a bound with a vanishing relative error for rates below the critical rate, based on exact asymptotics for i.i.d. random variables. Altuğ and Wagner later corrected his result for singular channels. For a general (possibly varying) rate, Scarlett et al. derived a simple upper bound on the random coding union bound using saddlepoint approximation, and evaluated its asymptotic order for nonsingular finite-alphabet discrete memoryless channels. However, their bound is not assured to have a vanishing relative gap to the random coding error probability.

In this paper we consider the error probability of random coding for a fixed but arbitrary rate below the capacity. We derive a new bound whose relative gap to the random coding error probability vanishes. This holds for (possibly infinite-alphabet or nondiscrete) nonsingular memoryless channels whose associated random variables satisfy a condition called the strongly nonlattice condition. For rates below the critical rate, the derived bound matches that of Gallager.²

² The ISIT proceedings version stated that the result contradicts the earlier bound. This was an error by the author, caused by a difference in notation between the two papers. See Remark 4 for details.

The essential point in deriving the new bound is that, when bounding the error probability, we optimize the parameter $\lambda$ depending on the sent and received sequences $(X,Y)$. This contrasts with previous analyses and with the classical random coding error exponent, where the parameter is first fixed and then optimized after the expectation over $(X,Y)$ is taken. We show that this difference actually affects the derived bound. It is also what allows us to assure that the bound is a lower bound on the error probability with a vanishing relative error.

## II Preliminaries

We consider a memoryless channel with input alphabet $\mathcal X$ and output alphabet $\mathcal Y$. The output distribution for input $x\in\mathcal X$ is denoted by $W(\cdot|x)$. Let $X$ be a random variable with distribution $P_X$, and let $Y$ follow $W(\cdot|X)$ given $X$. We define $P_Y$ as the marginal distribution of $Y$. We assume that, for any $x\in\mathcal X$, $W(\cdot|x)$ is absolutely continuous with respect to $P_Y$ with density

$$\nu(x,y)=\frac{dW(\cdot|x)}{dP_Y}(y).$$

We also assume that the mutual information is finite, that is, $I(X;Y)=E[\log\nu(X,Y)]<\infty$.

Let $X'$ be a random variable with the same distribution as $X$, independent of $(X,Y)$. Define $r(X,Y,X')=\log\nu(X',Y)-\log\nu(X,Y)$. Since $\nu(X,Y)>0$ holds almost surely, $r(X,Y,X')$ is well-defined almost surely. $(X_i,Y_i)$, $i=1,\dots,n$, denote independent copies of $(X,Y)$.

We consider the error probability of a random code in which each element of each codeword is generated independently from the distribution $P_X$. For a code with $M$ codewords of length $n$, the coding rate is $R=(\log M)/n$. We use maximum likelihood decoding, with ties broken uniformly at random.

### II-A Error Exponent

Define a random variable $Z$, taking values in a space of functions of $\lambda$, by

$$Z(\lambda)=\log E_{X'}\left[e^{\lambda r(X,Y,X')}\right]$$

and its derivatives by

$$Z^{(m)}(\lambda)=\frac{d^m}{d\lambda^m}\log E_{X'}\left[e^{\lambda r(X,Y,X')}\right],$$

which we sometimes write as $Z'(\lambda)$, $Z''(\lambda)$, and so on. Here $E_{X'}$ denotes the expectation over $X'$ for given $(X,Y)$. We omit the discussion of the multivaluedness of the logarithm of a complex number. The discussion in this paper follows [12, Sect. XVI.2], and we refer the reader there to see that no problem occurs. We define

$$Z(\lambda+i\xi)=\log E_{X'}\left[e^{(\lambda+i\xi)r(X,Y,X')}\right],\qquad Z_a(\lambda+i\xi)=\log\left|E_{X'}\left[e^{(\lambda+i\xi)r(X,Y,X')}\right]\right|,$$

where $\xi\in\mathbb R$ and $i$ is the imaginary unit. Here we always consider the case and define .

$$Z_i(\lambda)=\log E_{X'}\left[e^{\lambda r(X_i,Y_i,X')}\right],\qquad\bar Z(\lambda)=\frac1n\sum_{i=1}^nZ_i(\lambda).$$

$\bar Z^{(m)}(\lambda)$ and $\bar Z_a(\lambda+i\xi)$ are defined in the same way.
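For a finite-alphabet channel these quantities can be computed directly. The sketch below assumes a hypothetical 2-input, 3-output channel with strictly positive transition probabilities (not taken from this paper) and uses natural logarithms. It checks numerically that $E[e^{\alpha Z(1/(1+\alpha))}]$ coincides with Gallager's form $\sum_y(\sum_xP_X(x)W(y|x)^{1/(1+\alpha)})^{1+\alpha}$, which holds because $P_Y$ cancels in $r(x,y,x')$.

```python
import math

# Hypothetical example channel (not from the paper): 2 inputs, 3 outputs,
# strictly positive entries so that every log-likelihood ratio is finite.
P_X = [0.5, 0.5]
W = [[0.7, 0.2, 0.1],   # W(y | x = 0)
     [0.1, 0.3, 0.6]]   # W(y | x = 1)
XS, YS = range(len(P_X)), range(len(W[0]))
P_Y = [sum(P_X[x] * W[x][y] for x in XS) for y in YS]


def nu(x, y):
    """Density dW(.|x)/dP_Y evaluated at y."""
    return W[x][y] / P_Y[y]


def r(x, y, xp):
    """Log-likelihood ratio r(x, y, x') = log nu(x', y) - log nu(x, y)."""
    return math.log(nu(xp, y)) - math.log(nu(x, y))


def Z(lam, x, y):
    """Z(lambda) = log E_{X'}[exp(lambda * r(x, y, X'))] for a fixed pair (x, y)."""
    return math.log(sum(P_X[xp] * math.exp(lam * r(x, y, xp)) for xp in XS))


def Z_bar(lam, xs, ys):
    """(1/n) * sum_i Z_i(lambda) for given sequences x^n and y^n."""
    return sum(Z(lam, x, y) for x, y in zip(xs, ys)) / len(xs)


def moment(alpha):
    """E[exp(alpha * Z(1/(1+alpha)))] with (X, Y) distributed as P_X(x) W(y|x)."""
    lam = 1.0 / (1.0 + alpha)
    return sum(P_X[x] * W[x][y] * math.exp(alpha * Z(lam, x, y)) for x in XS for y in YS)


def gallager_form(alpha):
    """sum_y (sum_x P_X(x) W(y|x)^{1/(1+alpha)})^{1+alpha}."""
    lam = 1.0 / (1.0 + alpha)
    return sum(sum(P_X[x] * W[x][y] ** lam for x in XS) ** (1.0 + alpha) for y in YS)


for alpha in (0.25, 0.5, 1.0):
    print(alpha, moment(alpha), gallager_form(alpha))
```

Only differences of $\log\nu$ enter $Z$. This is why the check works for any $P_X$ with full support.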

The random coding error exponent for rate $R$ is denoted by

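$$E_r(R)=-\inf_{(\alpha,\lambda)\in[0,1]\times[0,\infty)}\left\{\alpha R+\log E\left[e^{\alpha Z(\lambda)}\right]\right\}=-\min_{\alpha\in(0,1]}\left\{\alpha R+\log E\left[e^{\alpha Z(1/(1+\alpha))}\right]\right\},\tag{1}$$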

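and we write the optimal solution $(\alpha,\lambda)$ of (1) as $(\rho,\eta)$, so that $\eta=1/(1+\rho)$. We also write $\Lambda(\rho)=\log E[e^{\rho Z(\eta)}]$.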

In the strict sense, the random coding error exponent is the supremum of (1) over $P_X$. For notational simplicity we fix $P_X$ and omit the dependence on it. See [9, Theorem 2] for a condition under which some $P_X$ attains this supremum.
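As a numerical illustration of (1) and of the critical rate used below, the sketch assumes a binary symmetric channel with crossover probability $p$ and uniform input. For this channel, $-\log E[e^{\alpha Z(1/(1+\alpha))}]$ has the closed form $E_0(\alpha)=\alpha\log2-(1+\alpha)\log(p^{1/(1+\alpha)}+(1-p)^{1/(1+\alpha)})$. The grid search over $\alpha\in[0,1]$ and the finite-difference estimate $R_{\rm crit}\approx E_0'(1)$ are illustrative choices, not the paper's method. Rates are in nats.

```python
import math


def E0(alpha, p):
    """-log E[exp(alpha Z(1/(1+alpha)))] for a BSC(p) with uniform input (nats)."""
    lam = 1.0 / (1.0 + alpha)
    s = p ** lam + (1.0 - p) ** lam
    return alpha * math.log(2.0) - (1.0 + alpha) * math.log(s)


def random_coding_exponent(R, p, grid=1000):
    """E_r(R) = -min_{alpha in [0,1]} {alpha R - E0(alpha)}, by a grid search over alpha."""
    return max(E0(k / grid, p) - (k / grid) * R for k in range(grid + 1))


def critical_rate(p, d=1e-5):
    """R_crit = dE0/dalpha at alpha = 1, via a central finite difference."""
    return (E0(1.0 + d, p) - E0(1.0 - d, p)) / (2.0 * d)


def h2(t):
    """Binary entropy in nats."""
    return -t * math.log(t) - (1.0 - t) * math.log(1.0 - t)


p = 0.1
capacity = math.log(2.0) - h2(p)
R_crit = critical_rate(p)
for R in (0.25 * R_crit, R_crit, 0.5 * (R_crit + capacity)):
    print(f"R = {R:.4f} nats: E_r(R) = {random_coding_exponent(R, p):.6f}")
```

Below $R_{\rm crit}$ the maximizing $\alpha$ is $1$, so $E_r(R)=E_0(1)-R$ there. At $R=C$ the exponent is zero.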

Let $P_\rho$ be the probability measure such that $dP_\rho/dP=e^{\rho Z(\eta)-\Lambda(\rho)}$. We denote the expectation under $P_\rho$ by $E_\rho$ and define

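$$\begin{aligned}\mu_i&=E_\rho[Z^{(i)}(\eta)]=e^{-\Lambda(\rho)}E\left[Z^{(i)}(\eta)e^{\rho Z(\eta)}\right],\\ \sigma_{ij}&=E_\rho\left[(Z^{(i)}(\eta)-\mu_i)(Z^{(j)}(\eta)-\mu_j)\right]=e^{-\Lambda(\rho)}E\left[(Z^{(i)}(\eta)-\mu_i)(Z^{(j)}(\eta)-\mu_j)e^{\rho Z(\eta)}\right],\\ \Sigma_{ij}&=\begin{pmatrix}\sigma_{ii}&\sigma_{ij}\\ \sigma_{ji}&\sigma_{jj}\end{pmatrix}.\end{aligned}$$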

From the derivatives of $\log E[e^{\alpha Z(\lambda)}]$ in $\alpha$ and $\lambda$ we have

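$$\left.\frac{\partial\log E[e^{\alpha Z(\eta)}]}{\partial\alpha}\right|_{\alpha=\rho}=\mu_0\begin{cases}=-R,&\text{if }R\ge R_{\rm crit},\\<-R,&\text{otherwise},\end{cases}\tag{2}$$
$$\left.\frac{\partial\log E[e^{\rho Z(\lambda)}]}{\partial\lambda}\right|_{\lambda=\eta}=\rho\mu_1=0,\tag{3}$$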

where $R_{\rm crit}$ is the critical rate, that is, the largest $R$ such that the optimal solution of (1) is $\rho=1$. We assume that , or equivalently, where is the support of . This corresponds to the nonsingularity assumption used in the literature for finite alphabets.
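The tilted quantities $\mu_i$ and $\sigma_{ij}$ and the identities (2) and (3) can be checked numerically on a small channel. The sketch assumes a hypothetical 2-input, 3-output channel with a non-uniform input, used only for illustration. It computes $Z$, $Z'$ and $Z''$ in closed form from $Z(\lambda)=\log\sum_{x'}P_X(x')W(y|x')^\lambda-\lambda\log W(y|x)$, which holds because $P_Y$ cancels in $r$. It then confirms that $\mu_1=0$ at $\eta=1/(1+\rho)$, and that $\mu_0$ equals the $\alpha$-derivative on the left of (2).

```python
import math

# Hypothetical example channel (illustration only), strictly positive entries.
P_X = [0.4, 0.6]
W = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
PAIRS = [(x, y) for x in range(2) for y in range(3)]


def A(lam, y, k=0):
    """k-th derivative in lambda of sum_{x'} P_X(x') W(y|x')^lambda."""
    return sum(P_X[xp] * math.log(W[xp][y]) ** k * W[xp][y] ** lam for xp in range(2))


def Z_derivs(lam, x, y):
    """(Z, Z', Z'') at lambda, using Z(lambda) = log A(lambda) - lambda * log W(y|x)."""
    a0, a1, a2 = A(lam, y), A(lam, y, 1), A(lam, y, 2)
    return (math.log(a0) - lam * math.log(W[x][y]),
            a1 / a0 - math.log(W[x][y]),
            a2 / a0 - (a1 / a0) ** 2)


def log_mgf(alpha, eta):
    """log E[exp(alpha * Z(eta))] under the original measure P_X(x) W(y|x)."""
    return math.log(sum(P_X[x] * W[x][y] * math.exp(alpha * Z_derivs(eta, x, y)[0])
                        for x, y in PAIRS))


def tilted_moments(rho):
    """mu_0, mu_1, mu_2 and sigma_11 under dP_rho/dP = exp(rho Z(eta) - Lambda(rho))."""
    eta = 1.0 / (1.0 + rho)
    norm = math.exp(log_mgf(rho, eta))
    mu = [0.0, 0.0, 0.0]
    second_moment_z1 = 0.0
    for x, y in PAIRS:
        z = Z_derivs(eta, x, y)
        weight = P_X[x] * W[x][y] * math.exp(rho * z[0]) / norm  # tilted probability of (x, y)
        for m in range(3):
            mu[m] += weight * z[m]
        second_moment_z1 += weight * z[1] ** 2
    return mu[0], mu[1], mu[2], second_moment_z1 - mu[1] ** 2


for rho in (0.3, 0.7, 1.0):
    print(rho, tilted_moments(rho))
```

In this example $\mu_1$ vanishes for every tested $\rho$. This is consistent with $\lambda=1/(1+\rho)$ minimizing $E[e^{\rho Z(\lambda)}]$ in $\lambda$.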

To avoid a somewhat technical argument on continuity and integrability, we also assume that there exist $\alpha>0$ and a neighborhood $S$ of $\eta$ such that for any

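$$\begin{aligned}&\sup_{\lambda\in S}E_\rho\left[e^{\alpha|Z^{(m)}(\lambda)|}\right]<\infty,\quad m=1,2,3,\\&\sup_{\lambda\in S,\,\xi\in[-b_0,b_0]}E_\rho\left[e^{\alpha|(\partial^4/\partial\xi^4)Z(\lambda+i\xi)|}\right]<\infty,\\&\sup_{\lambda\in S,\,\xi\in[b_1,b_2]}E_\rho\left[e^{\alpha|Z_a(\lambda+i\xi)-Z_a(\lambda)|}\right]<\infty,\end{aligned}\tag{4}$$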

where is given later. Note that these conditions trivially hold if the input and output alphabets are finite.

### II-B Lattice and Nonlattice Distributions

In asymptotic expansions of order higher than the central limit theorem, lattice and nonlattice distributions must be treated separately. Here we say that a random variable $V\in\mathbb R^d$ has a lattice distribution if $V\in a+\{\sum_{j=1}^dk_jb_j:k_1,\dots,k_d\in\mathbb Z\}$ almost surely for some $a\in\mathbb R^d$ and linearly independent vectors $b_1,\dots,b_d\in\mathbb R^d$. For $d=1$, we call the largest $b_1>0$ satisfying this condition the span of the lattice.

On the other hand, we say that $V$ has a strongly nonlattice distribution if $|E[e^{i\langle t,V\rangle}]|<1$ for all $t\in\mathbb R^d\setminus\{0\}$, where $\langle\cdot,\cdot\rangle$ denotes the inner product. A one-dimensional random variable is either lattice or strongly nonlattice. In general, however, there exist random variables that are neither lattice nor strongly nonlattice.

In standard references, a lattice distribution is defined for a random variable as above. In this paper we say that the distribution of is lattice if the conditional distribution of given is lattice, and nonlattice otherwise. It is easy to see that no contradiction occurs under this definition.

We consider the following condition regarding lattice and nonlattice distributions.

###### Definition 1.

We say that the log-likelihood ratio $r(X,Y,X')$ satisfies the lattice condition with span $h>0$ if the conditional distribution of $r(X,Y,X')$ given $(X,Y)$ is lattice with span $h$ almost surely. Here the offset of the lattice may depend on $(X,Y)$, and $h$ is the largest value satisfying this condition.

For notational simplicity, we define the span of the lattice for $r(X,Y,X')$ to be $h=0$ if $r(X,Y,X')$ does not satisfy the lattice condition. Besides this classification of $r(X,Y,X')$, we also treat separately the cases where $(Z(\eta),Z'(\eta))$ is strongly nonlattice and where it is not.

A one-dimensional random variable with support $\mathcal V$ is always lattice if $|\mathcal V|\le2$, and is strongly nonlattice except in some special cases if $|\mathcal V|\ge3$. Similarly, a two-dimensional random variable is never strongly nonlattice if $|\mathcal V|\le3$, and is strongly nonlattice except in some special cases if $|\mathcal V|\ge4$. From this observation, most channels with input and output alphabet sizes larger than 3 are strongly nonlattice. Table I gives examples of each class of channels, excluding those with specially chosen parameters.
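The one-dimensional part of this observation can be illustrated with exact rational arithmetic. The sketch below is an illustration, not part of the paper. It computes the span of a distribution with finitely many rational support points as the greatest common divisor of their differences from the smallest point. It also evaluates $|E[e^{itV}]|$, which equals 1 at $t=2\pi/h$ for a lattice variable with span $h$. Rational (and floating-point) inputs are always commensurable, so the code cannot exhibit a nonlattice support such as $\{0,1,\sqrt2\}$. That case has to be argued analytically.

```python
import cmath
import math
from fractions import Fraction
from functools import reduce


def frac_gcd(a, b):
    """Largest h > 0 such that a/h and b/h are integers (a, b rational, not both zero)."""
    a, b = Fraction(a), Fraction(b)
    return Fraction(math.gcd(a.numerator * b.denominator, b.numerator * a.denominator),
                    a.denominator * b.denominator)


def lattice_span(support):
    """Span of a 1-D distribution with rational support points (0 for a one-point support)."""
    pts = sorted({Fraction(s) for s in support})
    diffs = [q - pts[0] for q in pts[1:]]
    return reduce(frac_gcd, diffs) if diffs else Fraction(0)


def char_abs(support, probs, t):
    """|E[exp(i t V)]| for a finitely supported V."""
    return abs(sum(p * cmath.exp(1j * t * float(s)) for s, p in zip(support, probs)))


support, probs = [0, Fraction(1, 2), Fraction(3, 4)], [0.2, 0.3, 0.5]
h = lattice_span(support)  # Fraction(1, 4)
print(h, char_abs(support, probs, 2 * math.pi / float(h)))
```

Any two-point support gives a span equal to the distance between the points, which matches the $|\mathcal V|\le2$ statement above.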

###### Remark 1.

The above conditions are different from the condition considered in  as a classification of lattice and nonlattice cases. This difference arises from two reasons. First, we consider in addition to to derive an accurate bound. Second, the proof of [10, Lemma 1] does not use the correct span when applying the result [15, Sect. VII.1, Thm. 2].

## III Main Result

Define

$$g_h(u)=1-e^{-\frac{h\eta}{e^{h\eta}-1}u}\,\frac{1-e^{-h\eta u}}{h\eta u}$$

for $u>0$. For $h=0$ we interpret $h\eta/(e^{h\eta}-1)$ and $(1-e^{-h\eta u})/(h\eta u)$ as their limit $1$, so that $g_0(u)=1-e^{-u}$. Appendix A gives some properties of $g_h$. We can now represent the random coding error probability as follows.

###### Theorem 1.

Fix any $\epsilon>0$ and the coding rate $R$, and let $\delta_2>0$ be sufficiently small. Then, for the span $h$ of the lattice for $r(X,Y,X')$, there exists $n_0$ such that for all $n\ge n_0$

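$$(1-\epsilon)E\left[g_h\left(\frac{(1-\epsilon)e^{n\left(\bar Z(\eta)+R-\frac{(\bar Z'(\eta))^2}{2(\mu_2-\delta_2)}\right)}}{\eta\sqrt{2\pi n\mu_2}}\right)\right]\le P_{\rm RC}(n)\le(1+\epsilon)E\left[g_h\left(\frac{(1+\epsilon)e^{n\left(\bar Z(\eta)+R-\frac{(\bar Z'(\eta))^2}{2(\mu_2+\delta_2)}\right)}}{\eta\sqrt{2\pi n\mu_2}}\right)\right].$$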
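As we read the construction, $g_h$ is what the approximation $1-e^{-Mp_+}(1-e^{-Mp_0})/(Mp_0)$ of Lemma 1 (Section IV) becomes when $Mp_0=h\eta u$ and $Mp_+=h\eta u/(e^{h\eta}-1)$. The sketch below implements $g_h$ with the $h=0$ convention, using `math.expm1` to limit cancellation. It checks this identity and the $h\to0$ limit numerically. The parameter values are arbitrary illustrations.

```python
import math


def g(h, eta, u):
    """g_h(u) = 1 - exp(-a u) (1 - exp(-h eta u)) / (h eta u), a = h eta / (e^{h eta} - 1).

    For h = 0 both factors are replaced by their limits, giving g_0(u) = 1 - exp(-u).
    """
    if h == 0:
        return -math.expm1(-u)
    a = h * eta / math.expm1(h * eta)
    b = h * eta * u
    return 1.0 + math.exp(-a * u) * math.expm1(-b) / b


def lemma1_approx(m_p_plus, m_p0):
    """1 - exp(-M p_+) (1 - exp(-M p_0)) / (M p_0), the approximation in Lemma 1."""
    return 1.0 + math.exp(-m_p_plus) * math.expm1(-m_p0) / m_p0


h, eta = 0.7, 0.6
for u in (0.1, 1.0, 10.0):
    m_p0 = h * eta * u
    m_p_plus = m_p0 / math.expm1(h * eta)
    print(u, g(h, eta, u), lemma1_approx(m_p_plus, m_p0))
```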

This theorem reduces the evaluation of the error probability to that of an expectation over the two-dimensional random variable $(\bar Z(\eta),\bar Z'(\eta))$, although this expectation is still difficult to compute. If $(Z(\eta),Z'(\eta))$ is strongly nonlattice, we can derive the following bound, which gives an explicit representation of the asymptotic behavior of $P_{\rm RC}(n)$.

###### Theorem 2.

Fix the coding rate $R$ and assume that $(Z(\eta),Z'(\eta))$ has a strongly nonlattice distribution. Then

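$$P_{\rm RC}(n)=\begin{cases}\dfrac{\psi_{\rho,h}\,\mu_2^{(1-\rho)/2}(1+o(1))}{\eta^\rho(2\pi n)^{(1+\rho)/2}\sqrt{\mu_2\sigma_{00}+\rho|\Sigma_{01}|}}e^{-nE_r(R)},&R>R_{\rm crit},\\[2ex]\dfrac{h(1+o(1))}{2(e^{\eta h}-1)\sqrt{2\pi n(\mu_2+\sigma_{11})}}e^{-nE_r(R)},&R=R_{\rm crit},\\[2ex]\dfrac{h(1+o(1))}{(e^{\eta h}-1)\sqrt{2\pi n(\mu_2+\sigma_{11})}}e^{-nE_r(R)},&R<R_{\rm crit},\end{cases}\tag{5}$$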

where

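$$\psi_{\rho,h}=\int_{-\infty}^{\infty}e^{-\rho w}g_h(e^w)\,dw=\frac{\Gamma(1-\rho)}{\rho}\left(\frac{h\eta}{e^{h\eta}-1}\right)^{\rho+1}\frac{e^h-1}{h}$$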

where $\Gamma$ denotes the gamma function.
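The closed form of $\psi_{\rho,h}$ can be sanity-checked numerically. This illustrative check is not from the paper. It evaluates the integral with the trapezoidal rule on a truncated interval and compares it with the closed form, taking $\eta=1/(1+\rho)$ as in (1). That choice matters, since it makes $e^{h\eta(1+\rho)}=e^h$. The trapezoidal rule converges very quickly here because the integrand is smooth and decays exponentially in both directions for $0<\rho<1$. A short series keeps $g_h(e^w)$ accurate for very negative $w$, where the factor $e^{-\rho w}$ would otherwise amplify rounding errors.

```python
import math


def one_minus_phi(x):
    """1 - (1 - e^{-x}) / x, with a Taylor series for small x to avoid cancellation."""
    if x < 1e-3:
        return x / 2 - x ** 2 / 6 + x ** 3 / 24 - x ** 4 / 120
    return 1.0 + math.expm1(-x) / x


def g(h, eta, u):
    """g_h(u), written as (1 - e^{-a u}) + e^{-a u} (1 - phi(h eta u)) for accuracy at small u."""
    if h == 0:
        return -math.expm1(-u)
    a = h * eta / math.expm1(h * eta)
    return -math.expm1(-a * u) + math.exp(-a * u) * one_minus_phi(h * eta * u)


def psi_numeric(rho, h, step=0.01, half_width=80.0):
    """Trapezoidal rule for int e^{-rho w} g_h(e^w) dw over [-half_width, half_width]."""
    eta = 1.0 / (1.0 + rho)
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        w = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-rho * w) * g(h, eta, math.exp(w))
    return total * step


def psi_closed(rho, h):
    """Gamma(1-rho)/rho * (h eta/(e^{h eta}-1))^{rho+1} * (e^h - 1)/h, with eta = 1/(1+rho)."""
    base = math.gamma(1.0 - rho) / rho
    if h == 0:
        return base
    eta = 1.0 / (1.0 + rho)
    return base * (h * eta / math.expm1(h * eta)) ** (rho + 1) * math.expm1(h) / h


for rho, h in ((0.4, 0.0), (0.4, 0.8), (0.6, 1.5)):
    print(rho, h, psi_numeric(rho, h), psi_closed(rho, h))
```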

We prove Theorems 1 and 2 in Sections IV and V, respectively. Theorem 2 shows that, at least in the strongly nonlattice case, the random coding error probability satisfies

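$$P_{\rm RC}(n)=\begin{cases}\Theta\left(n^{-(1+\rho)/2}e^{-nE_r(R)}\right),&R>R_{\rm crit},\\ \Theta\left(n^{-1/2}e^{-nE_r(R)}\right),&R\le R_{\rm crit}.\end{cases}\tag{6}$$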

The RHS of (6) for is the same expression as the upper bounds in  but our bound is tighter in its coefficient and is also assured to be the lower bound.

A bound similar to Theorem 2 might be derived when $(Z(\eta),Z'(\eta))$ is not strongly nonlattice, by replacing integrals with summations. For this case, however, the author could not find a form of the asymptotic expansion directly applicable to our problem, and this remains future work.

###### Remark 2.

We can show in the same way as Theorem 2 that the random coding union bound is obtained by replacing $\psi_{\rho,h}$ in (5) with

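$$\int_{-\infty}^{\infty}e^{-\rho w}\min\left\{\frac{h\eta e^w}{e^{h\eta}-1},1\right\}dw=\left(\frac{1}{1-\rho}+\frac1\rho\right)\left(\frac{h\eta}{e^{h\eta}-1}\right)^\rho.$$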

On the other hand, the terms $\rho|\Sigma_{01}|$ and $\sigma_{11}$ in the square roots of (5) are characteristic of the analysis in this paper. They arise from optimizing the parameter $\lambda$ depending on $(X,Y)$. Thus, optimizing $\lambda$ is necessary to derive a tight coefficient, whether we evaluate the error probability itself or the union bound.
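The integral in Remark 2 is elementary. With $a=h\eta/(e^{h\eta}-1)$, the minimum switches from $ae^w$ to $1$ at $w_0=-\log a$, and the two pieces contribute $a^\rho/(1-\rho)$ and $a^\rho/\rho$. The sketch below is illustrative only. It confirms this numerically with a trapezoidal rule whose grid contains the kink $w_0$, so that each smooth piece is integrated separately.

```python
import math


def union_integrand(w, rho, a):
    """e^{-rho w} * min(a e^w, 1)."""
    return math.exp(-rho * w) * min(a * math.exp(w), 1.0)


def union_integral_numeric(rho, h, eta, step=1e-3, half_width=80.0):
    """Trapezoidal rule on [w0 - half_width, w0 + half_width], with the kink w0 on the grid."""
    a = h * eta / math.expm1(h * eta)
    w0 = -math.log(a)
    n = int(round(half_width / step))
    total = 0.0
    for k in range(-n, n + 1):
        weight = 0.5 if abs(k) == n else 1.0
        total += weight * union_integrand(w0 + k * step, rho, a)
    return total * step


def union_integral_closed(rho, h, eta):
    """(1/(1-rho) + 1/rho) * (h eta/(e^{h eta}-1))^rho, as stated in Remark 2."""
    a = h * eta / math.expm1(h * eta)
    return (1.0 / (1.0 - rho) + 1.0 / rho) * a ** rho


rho = 0.4
eta = 1.0 / (1.0 + rho)
print(union_integral_numeric(rho, 0.8, eta), union_integral_closed(rho, 0.8, eta))
```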

###### Remark 3.

The results in this paper assume a fixed coding rate. In this sense they are weaker than the result of Scarlett et al., who assure an upper bound for a varying rate by leaving the bound in the form of an integral (or a summation). Theorem 1 might be extended to varying rates, since most of its proof treats the sent and received sequences and the error probability of each codeword separately. However, the proof of Theorem 2 relies heavily on the rate $R$ being fixed. Deriving an easily computable bound for varying rates is also an important problem.

###### Remark 4.

Prior work shows the following for discrete nonlattice channels with $R<R_{\rm crit}$. (For the lattice case, that work contains a calculation error with a redundant factor.)

$$P_{\rm RC}(n)=\frac{1+o(1)}{\eta\sqrt{2\pi n\mu_2'}}e^{-nE_r(R)},\tag{7}$$

where

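$$\mu_2'=\left.\frac{\partial^2\log E[e^{Z(\lambda)}]}{\partial\lambda^2}\right|_{\lambda=\eta}=\frac{2\sum_y\left(\omega_0(y)\omega_2(y)-\omega_1(y)^2\right)}{\sum_y\omega_0^2(y)}\tag{8}$$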

for

$$\omega_m(y)=\sum_xP_X(x)(\log W(y|x))^m\sqrt{W(y|x)}.$$

In the ISIT version the author misunderstood $\mu_2'$ and stated that Theorem 2 contradicts (7). The correct calculation shows that $\mu_2'=\mu_2+\sigma_{11}$ and

$$\mu_2=\sigma_{11}=\frac{\sum_y\left(\omega_0(y)\omega_2(y)-\omega_1(y)^2\right)}{\sum_y\omega_0^2(y)}$$

for $R<R_{\rm crit}$. Therefore this paper does not contradict that result.
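The identities in Remark 4 can be checked on a concrete channel. The sketch assumes a hypothetical 2-input, 3-output channel with a non-uniform input distribution, for illustration only. It evaluates (8) from the $\omega_m$ and compares it with a finite-difference second derivative of $\log E[e^{Z(\lambda)}]$ at $\lambda=1/2$. It then computes $\mu_2$ and $\sigma_{11}$ under the tilt with $\rho=1$, $\eta=1/2$. Only $\rho=1$ matters for these identities; the code does not check whether $R<R_{\rm crit}$ holds for some rate.

```python
import math

# Hypothetical example (not from the paper).
P_X = [0.3, 0.7]
W = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
XS, YS = range(2), range(3)


def omega(m, y):
    """omega_m(y) = sum_x P_X(x) (log W(y|x))^m sqrt(W(y|x))."""
    return sum(P_X[x] * math.log(W[x][y]) ** m * math.sqrt(W[x][y]) for x in XS)


def mu2_prime_formula():
    """Right-hand side of (8)."""
    num = sum(omega(0, y) * omega(2, y) - omega(1, y) ** 2 for y in YS)
    return 2.0 * num / sum(omega(0, y) ** 2 for y in YS)


def log_mgf(lam):
    """log E[exp(Z(lambda))] = log sum_y A_y(1 - lambda) A_y(lambda), A_y(l) = sum_x P_X(x) W(y|x)^l."""
    def A(l, y):
        return sum(P_X[x] * W[x][y] ** l for x in XS)
    return math.log(sum(A(1.0 - lam, y) * A(lam, y) for y in YS))


def mu2_prime_numeric(d=1e-4):
    """Central second difference of log E[exp(Z(lambda))] at lambda = 1/2."""
    return (log_mgf(0.5 + d) - 2.0 * log_mgf(0.5) + log_mgf(0.5 - d)) / d ** 2


def mu2_sigma11_rho1():
    """mu_2 = E_1[Z''(1/2)] and sigma_11 = Var_1[Z'(1/2)] with dP_1/dP proportional to e^{Z(1/2)}."""
    rows = []
    for x in XS:
        for y in YS:
            a = [sum(P_X[xp] * math.log(W[xp][y]) ** k * math.sqrt(W[xp][y]) for xp in XS)
                 for k in range(3)]
            z0 = math.log(a[0]) - 0.5 * math.log(W[x][y])
            z1 = a[1] / a[0] - math.log(W[x][y])
            z2 = a[2] / a[0] - (a[1] / a[0]) ** 2
            rows.append((P_X[x] * W[x][y] * math.exp(z0), z1, z2))
    norm = sum(wt for wt, _, _ in rows)
    mean_z1 = sum(wt * z1 for wt, z1, _ in rows) / norm
    mu2 = sum(wt * z2 for wt, _, z2 in rows) / norm
    sigma11 = sum(wt * (z1 - mean_z1) ** 2 for wt, z1, _ in rows) / norm
    return mu2, sigma11


mu2, sigma11 = mu2_sigma11_rho1()
print(mu2_prime_formula(), mu2_prime_numeric(), mu2 + sigma11)
```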

## IV First Asymptotic Expansion

In this section we sketch the proof of Theorem 1. The proof depends on whether $r(X,Y,X')$ satisfies the lattice condition, and the two cases differ in some places for two reasons. First, under the lattice condition we cannot ignore the case where a codeword has the same likelihood as the sent codeword, whereas this case is almost negligible in the nonlattice case. Second, especially for infinite alphabets, the asymptotic expansion must be applied with careful attention to components implicitly assumed to be fixed. In this respect its derivation differs in some places between the lattice and nonlattice cases.

Here we prove Theorem 1 for the case where $r(X,Y,X')$ satisfies the lattice condition with span $h>0$. The nonlattice case is easier in most places because, as described above, ties of likelihoods can be almost ignored. See Appendix D for how the proof differs in the nonlattice case.

Now define

$$p_0(x,y)=P_{X'}[r(x,y,X')=0],\qquad p_+(x,y)=P_{X'}[r(x,y,X')>0]=P_{X'}[r(x,y,X')\ge h].\tag{9}$$

The last equality in (9) holds since $r(x,y,x)=0$ and, given $(x,y)$, the lattice of $r(x,y,X')$ has the same offset as that of $r(x,y,x)$. Under maximum likelihood decoding, the average error probability is expressed as $P_{\rm RC}(n)=E_{XY}[q_M(p_+(X,Y),p_0(X,Y))]$ for

$$q_M(p_+,p_0)=1-(1-p_+)^{M-1}+\sum_{i=1}^{M-1}\binom{M-1}{i}p_0^i(1-p_+-p_0)^{M-i-1}\left(1-\frac{1}{i+1}\right).\tag{10}$$

Here the first term is the probability that the likelihood of some codeword exceeds that of the sent codeword. The $i$-th summand of the second term is the probability that exactly $i$ codewords have the same likelihood as the sent codeword while the others do not exceed it. In that event the decoder, breaking ties uniformly at random, errs with probability $1-1/(i+1)$.
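Formula (10) can be cross-checked by brute force for a small number of codewords. The sketch below is for illustration only. It treats the $M-1$ competing codewords as independent, each exceeding, tying with, or falling below the sent codeword's likelihood with probabilities $p_+$, $p_0$ and $1-p_+-p_0$; this is the conditional model behind (10) for fixed $(X,Y)$. It enumerates all $3^{M-1}$ outcomes under uniform tie-breaking and also checks the union bound given below.

```python
import itertools
import math


def q_formula(M, p_plus, p0):
    """Eq. (10)."""
    p_minus = 1.0 - p_plus - p0
    total = 1.0 - (1.0 - p_plus) ** (M - 1)
    for i in range(1, M):
        total += math.comb(M - 1, i) * p0 ** i * p_minus ** (M - 1 - i) * (1.0 - 1.0 / (i + 1))
    return total


def q_enumerate(M, p_plus, p0):
    """Enumerate exceed / tie / below for each of the M - 1 competing codewords."""
    probs = {"exceed": p_plus, "tie": p0, "below": 1.0 - p_plus - p0}
    error = 0.0
    for outcome in itertools.product(probs, repeat=M - 1):
        prob = math.prod(probs[o] for o in outcome)
        if "exceed" in outcome:
            error += prob                       # a strictly more likely codeword exists
        else:
            ties = outcome.count("tie")
            error += prob * ties / (ties + 1)   # uniform choice among ties + 1 candidates
    return error


for M, p_plus, p0 in ((5, 0.1, 0.2), (7, 0.05, 0.3)):
    print(M, q_formula(M, p_plus, p0), q_enumerate(M, p_plus, p0),
          min(1.0, (M - 1) * (p_plus + p0)))
```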

One of the most basic bounds on this quantity is the union bound

$$q_M(p_+,p_0)\le\min\{1,(M-1)(p_++p_0)\}.$$

A lower bound can be found in, e.g., [16, Chap. 23]. To evaluate the error probability with a vanishing relative error, the following lemma is useful.

###### Lemma 1.

It holds for any $c>0$ that

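$$\limsup_{M\to\infty}\sup_{\substack{(p_+,p_0)\in(0,1/3]^2:\\p_+\le M^cp_0}}\frac{q_M(p_+,p_0)}{1-e^{-Mp_+}\frac{1-e^{-Mp_0}}{Mp_0}}=\liminf_{M\to\infty}\inf_{\substack{(p_+,p_0)\in(0,1/3]^2:\\p_+\le M^cp_0}}\frac{q_M(p_+,p_0)}{1-e^{-Mp_+}\frac{1-e^{-Mp_0}}{Mp_0}}=1.$$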

We prove this lemma in Appendix E. It shows that the error probability can be approximated by

$$1-e^{-Mp_+(X,Y)}\,\frac{1-e^{-Mp_0(X,Y)}}{Mp_0(X,Y)}$$

for $(X,Y)$ satisfying some regularity conditions.
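Lemma 1 can be illustrated numerically in the regime where $Mp_+$ and $Mp_0$ stay bounded, which is where ties matter. In the sketch below the choice $p_+=1/M$, $p_0=2/M$ is an arbitrary illustration. The sketch evaluates (10) in log space so that large $M$ does not overflow, and prints its ratio to the approximation above. Lemma 1 says this ratio should approach 1 as $M$ grows.

```python
import math


def q_exact(M, p_plus, p0):
    """Eq. (10) evaluated in log space so that large M does not overflow."""
    log_p_minus = math.log1p(-(p_plus + p0))
    total = -math.expm1((M - 1) * math.log1p(-p_plus))     # 1 - (1 - p_+)^{M-1}
    for i in range(1, M):
        log_term = (math.lgamma(M) - math.lgamma(i + 1) - math.lgamma(M - i)
                    + i * math.log(p0) + (M - 1 - i) * log_p_minus)
        total += math.exp(log_term) * i / (i + 1)           # i ties: error prob 1 - 1/(i+1)
    return total


def lemma1_approx(M, p_plus, p0):
    """1 - exp(-M p_+) (1 - exp(-M p_0)) / (M p_0)."""
    return 1.0 + math.exp(-M * p_plus) * math.expm1(-M * p0) / (M * p0)


for M in (10, 100, 1000, 10000):
    p_plus, p0 = 1.0 / M, 2.0 / M   # hypothetical choice: M p_+ = 1, M p_0 = 2
    print(M, q_exact(M, p_plus, p0) / lemma1_approx(M, p_plus, p0))
```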

Next we evaluate $p_0(X,Y)$ and $p_+(X,Y)$, using Lemma 2 below as the fundamental tool of the proof. Let $V_1,\dots,V_n$ be (possibly non-identically distributed) independent lattice random variables such that the greatest common divisor of their spans is $h$. (The greatest common divisor of a set $\{a_i\}$ is $h$ if $h$ is the largest number such that $a_i/h\in\mathbb Z$ for all $i$, and $0$ if no such $h$ exists.) Define $\Lambda_{V_i}(\lambda)=\log E[e^{\lambda V_i}]$ and $\Lambda_V(\lambda)=\frac1n\sum_{i=1}^n\Lambda_{V_i}(\lambda)$.

The large deviation probability of $\sum_{i=1}^nV_i$ is then evaluated as follows.

###### Lemma 2.

Fix such that and define as the solution of . Let and be arbitrary. Then there exists such that

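$$\left|\frac{\Pr\left[\sum_{i=1}^nV_i=nx\right]}{he^{-n(\lambda^*x-\Lambda_V(\lambda^*))}\big/\sqrt{2\pi n\Lambda_V''(\lambda^*)}}-1\right|\le\epsilon,\qquad\left|\frac{\Pr\left[\sum_{i=1}^nV_i\ge nx+h\right]}{he^{-n(\lambda^*x-\Lambda_V(\lambda^*))}\big/\left((e^{h\lambda^*}-1)\sqrt{2\pi n\Lambda_V''(\lambda^*)}\right)}-1\right|\le\epsilon$$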

hold for all satisfying

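$$\begin{aligned}&n\underline{s}_m\le\sum_{i=1}^n\left.\frac{d^m\Lambda_{V_i}(\lambda)}{d\lambda^m}\right|_{\lambda=\lambda^*}\le n\bar s_m,\quad m=2,3,\\&\sum_{i=1}^n\left|\frac{\partial^4\Lambda_{V_i}(\lambda^*+i\xi)}{\partial\xi^4}\right|\le n\bar s_4,\quad\forall|\xi|\le b_0,\\&\sum_{i=1}^n\left(\log\left|E\left[e^{(\lambda^*+i\xi)V_i}\right]\right|-\log E\left[e^{\lambda^*V_i}\right]\right)\le-n\gamma_2,\quad\forall\xi\in[-\pi/h,\pi/h]\setminus[-b_1,b_1].\end{aligned}$$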

The proof of this lemma is largely the same as that of [17, Thm. 3.7.4] for the i.i.d. case and is given in Appendix B.
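For intuition, the simplest special case of Lemma 2 is i.i.d. Bernoulli($p$) variables, which are lattice with span $h=1$. The sketch compares exact binomial probabilities with the approximations of Lemma 2 as we read them. These are $\Pr[\sum_iV_i=nx]\approx he^{-n(\lambda^*x-\Lambda_V(\lambda^*))}/\sqrt{2\pi n\Lambda_V''(\lambda^*)}$, and the same quantity divided by $e^{h\lambda^*}-1$ for $\Pr[\sum_iV_i\ge nx+h]$, with $\Lambda_V(\lambda)=\log(1-p+pe^\lambda)$ and $\Lambda_V'(\lambda^*)=x$. This normalization is an assumption of the sketch. The example covers only the i.i.d. case, not the non-identically distributed setting of the lemma.

```python
import math


def log_binom_pmf(n, k, p):
    """log Pr[Binomial(n, p) = k]."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def lattice_ld_ratios(n, p, x):
    """Exact / approximate ratios for Pr[S = n x] and Pr[S >= n x + 1], S ~ Binomial(n, p).

    Assumes n * x is an integer and p < x < 1, so that lambda* > 0. Span h = 1.
    """
    k = round(n * x)
    lam = math.log(x * (1.0 - p) / (p * (1.0 - x)))   # solves Lambda'(lambda) = x
    Lam = math.log1p(p * math.expm1(lam))              # Lambda(lambda*) = log(1 - p + p e^lambda*)
    var = x * (1.0 - x)                                # Lambda''(lambda*)
    point_approx = math.exp(-n * (lam * x - Lam)) / math.sqrt(2.0 * math.pi * n * var)
    tail_approx = point_approx / math.expm1(lam)       # divide by e^{h lambda*} - 1 with h = 1
    point_exact = math.exp(log_binom_pmf(n, k, p))
    tail_exact = sum(math.exp(log_binom_pmf(n, j, p)) for j in range(k + 1, n + 1))
    return point_exact / point_approx, tail_exact / tail_approx


for n in (500, 5000):
    print(n, lattice_ld_ratios(n, 0.2, 0.3))
```

Both ratios approach 1 as $n$ grows; the tail ratio converges more slowly than the point ratio.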

Let satisfy . To apply Lemma 2 we consider the following sets to formulate regularity conditions.

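$$\begin{aligned}A_m&=\{f_1\in C_1:\forall\lambda,\ |f_1(\lambda)-\mu_m|\le\delta_2\},\\B&=\{f_2\in C_2:\forall\lambda,\ \forall\xi\notin[-b_1,b_1],\ f_2(\lambda,\xi)\le-\gamma_2\},\\C&=\{f_2\in C_2:\forall\lambda,\ \forall\xi\in[-b_0,b_0],\ f_2(\lambda,\xi)\le\bar s_4\},\end{aligned}$$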

where $C_1$ and $C_2$ are the spaces of continuous functions of $\lambda$ and of $(\lambda,\xi)$, respectively, and is a constant determined from with Lemma 2.

We define the event as

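$$\mathcal S=\left\{|\bar Z^{(1)}(\eta)|\le\delta_1\right\}\cap\left\{\bar Z^{(2)}(\lambda)\in A_2\right\}\cap\left\{\bar Z^{(3)}(\lambda)\in A_3\right\}\cap\left\{\bar Z_a(\lambda+i\xi)-\bar Z_a(\lambda)\in B\right\}\cap\left\{\left|\frac{\partial^4}{\partial\xi^4}\bar Z(\lambda+i\xi)\right|\in C\right\},$$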

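where each quantity in braces is regarded as a function of $\lambda$ or of $(\lambda,\xi)$. Under this event, given the sent codeword and the received sequence, we can bound the probability that the likelihood of each other codeword ties with or exceeds that of the sent codeword, as follows.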

###### Lemma 3.

Let be arbitrary and in the definition of be sufficiently small with respect to . Then, there exists such that under the event it holds for all that,

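$$\begin{aligned}\frac{he^{n\left(\bar Z(\eta)-\frac{(\bar Z'(\eta))^2}{2(\mu_2-\delta_2)}\right)}}{\sqrt{2\pi n(\mu_2+\delta_2)}}(1-\epsilon)&\le p_0(X,Y)\le\frac{he^{n\left(\bar Z(\eta)-\frac{(\bar Z'(\eta))^2}{2(\mu_2+\delta_2)}\right)}}{\sqrt{2\pi n(\mu_2-\delta_2)}}(1+\epsilon),\\\frac{he^{n\left(\bar Z(\eta)-\frac{(\bar Z'(\eta))^2}{2(\mu_2-\delta_2)}\right)}}{(e^{h(\eta+\gamma_1)}-1)\sqrt{2\pi n(\mu_2+\delta_2)}}(1-\epsilon)&\le p_+(X,Y)\le\frac{he^{n\left(\bar Z(\eta)-\frac{(\bar Z'(\eta))^2}{2(\mu_2+\delta_2)}\right)}}{(e^{h(\eta-\gamma_1)}-1)\sqrt{2\pi n(\mu_2-\delta_2)}}(1+\epsilon).\end{aligned}$$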
###### Proof.

Note that and for all from and (3). From the convexity of in , if we set then is minimized at a point in with

$$\bar Z(\eta)-\frac{(\bar Z'(\eta))^2}{2(\mu_2-\delta_2)}\le\min_\lambda\bar Z(\lambda)\le\bar Z(\eta)-\frac{(\bar Z'(\eta))^2}{2(\mu_2+\delta_2)}.$$

Thus the lemma follows from Lemma 2. ∎
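The two-sided bound on $\min_\lambda\bar Z(\lambda)$ used in this proof follows from a standard second-order Taylor argument. The derivation below is a sketch under an assumption that reflects our reading of the regularity event: $\mu_2-\delta_2\le\bar Z''(\lambda)\le\mu_2+\delta_2$ on an interval $I$ around $\eta$. This interval is assumed to contain both the minimizer of $\bar Z$ and the point $\lambda^\dagger=\eta-\bar Z'(\eta)/(\mu_2+\delta_2)$. The symbol $\lambda^\dagger$ is introduced here for illustration, and the assumption is where a small $\delta_1$ in $|\bar Z'(\eta)|\le\delta_1$ would enter.

$$\begin{aligned}\bar Z(\lambda)&\ge\bar Z(\eta)+\bar Z'(\eta)(\lambda-\eta)+\frac{\mu_2-\delta_2}{2}(\lambda-\eta)^2\ge\bar Z(\eta)-\frac{(\bar Z'(\eta))^2}{2(\mu_2-\delta_2)},\qquad\lambda\in I,\\\min_\lambda\bar Z(\lambda)&\le\bar Z(\lambda^\dagger)\le\bar Z(\eta)+\bar Z'(\eta)(\lambda^\dagger-\eta)+\frac{\mu_2+\delta_2}{2}(\lambda^\dagger-\eta)^2=\bar Z(\eta)-\frac{(\bar Z'(\eta))^2}{2(\mu_2+\delta_2)}.\end{aligned}$$

The first line minimizes the lower quadratic over $\lambda$. Since $\bar Z$ is convex and its minimizer lies in $I$, the bound on $I$ is also a bound on $\min_\lambda\bar Z(\lambda)$.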

Next we define

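$$\begin{aligned}g_h^{(-)}(X,Y)&=(1-\epsilon/2)\,g_h\left(\frac{e^{n\left(\bar Z(\eta)+R-\frac{(\bar Z'(\eta))^2}{2(\mu_2-\delta_2)}\right)}}{c^{(-)}\sqrt n}\right),\\g_h^{(+)}(X,Y)&=(1+\epsilon/2)\,g_h\left(\frac{e^{n\left(\bar Z(\eta)+R-\frac{(\bar Z'(\eta))^2}{2(\mu_2+\delta_2)}\right)}}{c^{(+)}\sqrt n}\right),\\G_h^{(s)}&=E\left[g_h^{(s)}(X,Y)\right],\quad s\in\{-,+\},\end{aligned}$$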

where

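$$c^{(-)}=\frac{\eta(e^{h(\eta+\gamma_1)}-1)\sqrt{2\pi(\mu_2+\delta_2)}}{(e^{h\eta}-1)(1-\epsilon/2)},\qquad c^{(+)}=\frac{\eta(e^{h(\eta-\gamma_1)}-1)\sqrt{2\pi(\mu_2-\delta_2)}}{(e^{h\eta}-1)(1+\epsilon/2)}.$$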

Then the error probability can be evaluated as follows.

###### Lemma 4.

Fix the coding rate $R$ and assume that the same conditions as in Lemma 3 hold. Then, for all sufficiently large $n$,

$$g_h^{(-)}(X,Y)\le q_M(p_+(X,Y),p_0(X,Y))\le g_h^{(+)}(X,Y).$$

This lemma follows directly from Lemmas 1 and 3. We use the following lemma to evaluate the contribution of the event $\mathcal S^c$.

###### Lemma 5.

Let . Then

$$q_M(p_+(X,Y),p_0(X,Y))\le\tilde g(X,Y),\tag{11}$$
$$g_h^{(-)}(X,Y)\le\frac{1+h\eta/2}{(c^{(-)})^\rho}\,\tilde g(X,Y).\tag{12}$$

Furthermore, for sufficiently large and sufficiently small and we have

$$\limsup_{n\to\infty}\frac1n\log E_{XY}\left[\mathbb 1[\mathcal S^c]\,\tilde g(X,Y)\right]<-E_r(R).$$

We prove this lemma in Appendix C. The proof uses Cramér's theorem for general topological vector spaces [17, Theorem 6.1.3], together with the fact that $C_1$ and $C_2$ are separable Banach spaces under the max norm.

###### Proof of Theorem 1.

From Lemma 4, it holds for and sufficiently large that

$$P_{\rm RC}=E_{XY}\left[\mathbb 1[\mathcal S]\,q_M(p_+(X,Y),p_0(X,Y))\right]+E_{XY}\left[\mathbb 1[\mathcal S^c]\,q_M(p_+(X,Y),p_0(X,Y))\right]$$

Thus we obtain from Lemma 5 that