# On Two Strong Converse Theorems for Discrete Memoryless Channels

Yasutada OOHAMA

The author is with The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-Shi, Tokyo 182-8585, Japan.

This work was presented in part at the 28th Symposium on Information Theory and Its Applications, Onna, Okinawa, Japan, Nov. 20-23, 2005.

Summary

In 1973, Arimoto proved the strong converse theorem for discrete memoryless channels, stating that when the transmission rate $R$ is above the channel capacity $C$, the error probability of decoding goes to one as the block length of the code words tends to infinity. He proved the theorem by deriving a lower bound on the exponent of the probability of correct decoding that is positive if and only if $R > C$. Subsequently, in 1979, Dueck and Körner determined the optimal exponent of correct decoding. Arimoto's bound has been said to be equal to the bound of Dueck and Körner. However, a rigorous proof of this equality has not been presented so far. In this paper we give a rigorous proof of the equivalence of Arimoto's bound to that of Dueck and Körner.

Key words: strong converse theorem, discrete memoryless channels, exponent of correct decoding

## 1 Introduction

For some classes of noisy channels, the error probability of decoding goes to one as the block length of the transmitted codes tends to infinity at rates above the channel capacity. This is well known as the strong converse theorem for noisy channels. In 1957, Wolfowitz [1] proved the strong converse theorem for discrete memoryless channels (DMCs). His result is the first result on the strong converse theorem.

In 1973, Arimoto [2] obtained a stronger result on the strong converse theorem for DMCs. He proved that the error probability of decoding goes to one exponentially and derived a lower bound of the exponent function. To prove this strong converse theorem he introduced an interesting bounding technique based on a symmetrical structure of the set of transmission codes. Using this bounding method and an analytical argument on convex functions developed by Gallager [3], he derived the lower bound.

Subsequently, Dueck and Körner [4] determined the optimal exponent function for the error probability of decoding to go to one. They derived the result by using a combinatorial method based on the types of sequences. Their method is quite different from the method of Arimoto [2]. In their paper, Dueck and Körner [4] stated that their optimal bound can be proved to be equal to the lower bound of Arimoto [2] by analytical computation. However, since their statement we have found no rigorous proof of this equality in the literature.

In this paper we give a rigorous proof that the lower bound of Arimoto [2] is equal to the optimal bound of Dueck and Körner [4]. To prove this equality, we need to prove the convexity of the optimal exponent function. We prove this by using an operational meaning of the optimal exponent function. Contrary to their statement, our arguments of the proof are not completely analytical. A dual equivalence of two exponent functions was established by Csiszár and Körner [5] for the exponent functions of the error probability of decoding going to zero at rates below capacity. Their arguments of the proof of equivalence are completely analytical. We compare our arguments with theirs to clarify an essential difference between them.

## 2 Coding Theorems for Discrete Memoryless Channels

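We consider a discrete memoryless channel with input set $\mathcal{X}$ and output set $\mathcal{Y}$. We assume that $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. Let $X^n$ be a random variable taking values in $\mathcal{X}^n$. Suppose that $X^n$ has a probability distribution on $\mathcal{X}^n$ denoted by $P_{X^n}$. Let $Y^n \in \mathcal{Y}^n$ be the random variable obtained as the channel output when $X^n$ is connected to the input of the channel. We write the conditional distribution of $Y^n$ given $X^n$ as $W^n$. A noisy channel is defined by a sequence of stochastic matrices $\{W^n\}_{n=1}^{\infty}$. In particular, a stationary discrete memoryless channel is defined by a single stochastic matrix with input set $\mathcal{X}$ and output set $\mathcal{Y}$. We write this stochastic matrix as $W$.

Information transmission using the above noisy channel is formulated as follows. Let $\mathcal{M}_n$ be a message set to be transmitted through the channel. Set $M_n = |\mathcal{M}_n|$. For given $W$, an $(n, M_n, \varepsilon_n)$-code is a set $\{(\mathbf{x}(m), \mathcal{D}(m)),\ m \in \mathcal{M}_n\}$ that satisfies the following:

$$\begin{array}{l} 1)\ \mathbf{x}(m) \in \mathcal{X}^n, \\ 2)\ \mathcal{D}(m),\ m \in \mathcal{M}_n, \text{ are disjoint subsets of } \mathcal{Y}^n, \\ 3)\ \varepsilon_n = \dfrac{1}{M_n} \displaystyle\sum_{m \in \mathcal{M}_n} W^n\left((\mathcal{D}(m))^c \,\middle|\, \mathbf{x}(m)\right), \end{array}$$

where $\mathcal{D}(m)$, $m \in \mathcal{M}_n$, are the decoding regions of the code and $\varepsilon_n$ is the error probability of decoding.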

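A transmission rate $R$ is achievable if there exists a sequence of $(n, M_n, \varepsilon_n)$-codes, $n = 1, 2, \ldots$, such that

$$\limsup_{n\to\infty} \varepsilon_n = 0, \qquad \liminf_{n\to\infty} \frac{1}{n}\log M_n \geq R. \tag{1}$$

Let the supremum of the achievable transmission rates be denoted by $C(W)$, which we call the channel capacity. It is well known that $C(W)$ is given by the following formula:

$$C(W) = \max_{P \in \mathcal{P}(\mathcal{X})} I(P, W), \tag{2}$$

where $\mathcal{P}(\mathcal{X})$ is the set of probability distributions on $\mathcal{X}$ and $I(P, W)$ stands for the mutual information between $X$ and $Y$ when the input distribution of $X$ is $P$.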
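As a small numerical illustration of formula (2), the following sketch approximates $C(W)$ by a grid search over input distributions. The binary symmetric channel with crossover probability 0.1, the helper names, and the grid resolution are assumptions made here for illustration and do not come from the paper; natural logarithms are used, so values are in nats.

```python
import math

def mutual_information(P, W):
    """I(P, W) in nats for input distribution P and channel matrix W[x][y]."""
    ny = len(W[0])
    # Output distribution q(y) = sum_x P(x) W(y|x).
    q = [sum(P[x] * W[x][y] for x in range(len(P))) for y in range(ny)]
    total = 0.0
    for x, px in enumerate(P):
        for y in range(ny):
            if px > 0 and W[x][y] > 0:
                total += px * W[x][y] * math.log(W[x][y] / q[y])
    return total

def capacity_binary_input(W, grid=2000):
    """Grid-search approximation of C(W) = max_P I(P, W) when |X| = 2."""
    return max(
        mutual_information([k / grid, 1 - k / grid], W)
        for k in range(grid + 1)
    )

p = 0.1  # assumed crossover probability of the example channel
bsc = [[1 - p, p], [p, 1 - p]]
cap = capacity_binary_input(bsc)
# Known closed form for the binary symmetric channel: log 2 - h(p).
closed_form = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)
print(cap, closed_form)
```

For the binary symmetric channel the maximum is attained at the uniform input, which lies on the grid, so the two printed values should agree up to rounding. For channels without such symmetry the grid search only gives an approximation from below.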

To examine the asymptotic behavior of $\varepsilon_n$ for large $n$ at $R < C(W)$, we define the following quantities. For given $R$, the quantity $E$ is an achievable error exponent if there exists a sequence of $(n, M_n, \varepsilon_n)$-codes such that

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \geq R, \qquad \liminf_{n\to\infty} \left(-\frac{1}{n}\right)\log \varepsilon_n \geq E.$$

The supremum of the achievable error exponents is denoted by $E^*(R|W)$. Several lower and upper bounds of $E^*(R|W)$ have been derived so far. An explicit form of $E^*(R|W)$ is known for large $R$ below $C(W)$. An explicit formula of $E^*(R|W)$ for all $R$ below $C(W)$ is still unknown.

## 3 Strong Converse Theorems for Discrete Memoryless Channels

Wolfowitz [1] first established the strong converse theorem for DMCs by proving that when $R > C(W)$, we have $\lim_{n\to\infty} \varepsilon_n = 1$. When the strong converse theorem holds, we are interested in the rate at which the error probability of decoding tends to one as $n \to \infty$ for $R > C(W)$. To examine this rate of convergence, we define the following quantity. For given $R$, the quantity $G$ is an achievable exponent if there exists a sequence of $(n, M_n, \varepsilon_n)$-codes such that

$$\liminf_{n\to\infty} \frac{1}{n}\log M_n \geq R, \qquad \limsup_{n\to\infty} \left(-\frac{1}{n}\right)\log (1 - \varepsilon_n) \leq G.$$

The infimum of the achievable exponents is denoted by $G^*(R|W)$. This quantity has the following property.

###### Property 1

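The function $G^*(R|W)$ is a monotone increasing and convex function of $R$.

Proof: By definition it is obvious that $G^*(R|W)$ is a monotone increasing function of $R$. To prove the convexity, fix two positive rates $R_1, R_2$ arbitrarily. For each $R_i$, $i = 1, 2$, we consider the infimum of the achievable exponents $G^*(R_i|W)$. By the definition of $G^*(R_i|W)$, for each $i = 1, 2$, there exists a sequence of $(n, M^{(i)}_n, \varepsilon^{(i)}_n)$-codes, $n = 1, 2, \ldots$, such that

$$\liminf_{n\to\infty} \frac{1}{n}\log M^{(i)}_n \geq R_i, \qquad \limsup_{n\to\infty} \left(-\frac{1}{n}\right)\log\left(1 - \varepsilon^{(i)}_n\right) \leq G^*(R_i|W).$$

Fix any $\lambda_1, \lambda_2 \geq 0$ with $\lambda_1 + \lambda_2 = 1$ and set $n_i = \lfloor \lambda_i n \rfloor$, $i = 1, 2$, where $\lfloor a \rfloor$ stands for the integer part of $a$. Set $n_3 = n - (n_1 + n_2)$. It is obvious that $n_3 = 0, 1$, or $2$.

Next, we consider the code obtained by concatenating the $(n_i, M^{(i)}_{n_i}, \varepsilon^{(i)}_{n_i})$-codes for $i = 1, 2$. If $n_3 = 1$ or $2$, we further append an $(n_3, 1, 0)$-code. For the $(n, M_n, \varepsilon_n)$-code constructed in this way we have

$$M_n = \prod_{i=1,2} M^{(i)}_{n_i}, \qquad 1 - \varepsilon_n = \prod_{i=1,2} \left(1 - \varepsilon^{(i)}_{n_i}\right).$$

Then, we have

$$\begin{aligned} \liminf_{n\to\infty} \frac{1}{n}\log M_n &\geq \sum_{i=1,2} \liminf_{n\to\infty} \frac{n_i}{n}\cdot\frac{1}{n_i}\log M^{(i)}_{n_i} \geq \sum_{i=1,2} \lambda_i R_i, \\ \limsup_{n\to\infty} \left(-\frac{1}{n}\right)\log(1 - \varepsilon_n) &\leq \sum_{i=1,2} \limsup_{n\to\infty} \frac{n_i}{n}\cdot\left(-\frac{1}{n_i}\right)\log\left(1 - \varepsilon^{(i)}_{n_i}\right) \leq \sum_{i=1,2} \lambda_i G^*(R_i|W). \end{aligned}$$

Hence, we have

$$\sum_{i=1,2} \lambda_i G^*(R_i|W) \geq G^*\left(\sum_{i=1,2} \lambda_i R_i \,\middle|\, W\right),$$

which implies the convexity of $G^*(R|W)$. \QED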
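The block-length bookkeeping behind the concatenation argument in the proof of Property 1 can be sketched numerically. The function names, the weight 1/3, and the rates 0.4 and 0.7 are hypothetical values chosen here for illustration. The sketch also assumes that each component code has exactly $e^{n_i R_i}$ messages, which the proof does not require.

```python
from fractions import Fraction
import math

def split_block_length(n, lam1):
    """Return (n1, n2, n3): n_i = floor(lam_i * n) and the leftover n3 = n - n1 - n2."""
    lam1 = Fraction(lam1)
    lam2 = 1 - lam1
    n1 = math.floor(lam1 * n)
    n2 = math.floor(lam2 * n)
    return n1, n2, n - n1 - n2

def time_shared_rate(n, lam1, r1, r2):
    """Rate (1/n) log M_n of the concatenated code under the assumption above."""
    n1, n2, _ = split_block_length(n, lam1)
    return (n1 * r1 + n2 * r2) / n

lam1, r1, r2 = Fraction(1, 3), 0.4, 0.7   # assumed example values
target = float(lam1) * r1 + float(1 - lam1) * r2
lengths = (10, 100, 1000, 10000)
gaps = [abs(time_shared_rate(n, lam1, r1, r2) - target) for n in lengths]
print(gaps)
```

Exact rational arithmetic is used for the weights so that the floor operation is not affected by floating-point rounding. The gap to the convex combination of the two rates shrinks at least as fast as $(R_1 + R_2)/n$, which is why the leftover symbols do not matter in the limit.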

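Arimoto [2] derived a lower bound of $G^*(R|W)$. To state his result we define some functions. For a real parameter $\delta$, define

$$\begin{aligned} J_\delta(P|W) &\triangleq -\log \sum_{y \in \mathcal{Y}} \left[\sum_{x \in \mathcal{X}} P(x) W(y|x)^{\frac{1}{1+\delta}}\right]^{1+\delta}, \\ F_\delta(R, P|W) &\triangleq -\delta R + J_\delta(P|W), \\ G_\delta(R|W) &\triangleq \min_{P \in \mathcal{P}(\mathcal{X})} F_\delta(R, P|W). \end{aligned}$$

Furthermore, set

$$\begin{aligned} G(R|W) &\triangleq \max_{-1 \leq \delta \leq 0} G_\delta(R|W) = \max_{-1 \leq \delta \leq 0} \min_{P \in \mathcal{P}(\mathcal{X})} F_\delta(R, P|W) \\ &= \max_{-1 \leq \delta \leq 0} \left[-\delta R + \min_{P \in \mathcal{P}(\mathcal{X})} J_\delta(P|W)\right]. \end{aligned}$$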

According to Arimoto [2], the following property holds.

###### Property 2

The function $G(R|W)$ is a monotone increasing and convex function of $R$ and is positive if and only if $R > C(W)$.
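Property 2 can be illustrated numerically with a minimal sketch that evaluates Arimoto's bound in the form $\max_{-1 \leq \delta \leq 0}[-\delta R + \min_P J_\delta(P|W)]$ by grid search. The binary symmetric channel with crossover probability 0.1 (capacity about 0.368 nats), the function names, and the grid sizes are assumptions made for this example. The endpoint $\delta = -1$ is excluded from the grid because the exponent $1/(1+\delta)$ is undefined there.

```python
import math

def J(delta, P, W):
    """J_delta(P|W) = -log sum_y [sum_x P(x) W(y|x)^(1/(1+delta))]^(1+delta), delta > -1."""
    s = 1.0 / (1.0 + delta)
    total = 0.0
    for y in range(len(W[0])):
        inner = sum(P[x] * W[x][y] ** s for x in range(len(P)))
        total += inner ** (1.0 + delta)
    return -math.log(total)

def arimoto_exponent(R, W, n_delta=200, n_p=200):
    """Grid approximation of G(R|W) for a binary-input channel W[x][y]."""
    best = -math.inf
    for k in range(1, n_delta + 1):
        delta = -1.0 + k / n_delta  # delta ranges over (-1, 0]
        min_j = min(J(delta, [i / n_p, 1 - i / n_p], W) for i in range(n_p + 1))
        best = max(best, -delta * R + min_j)
    return best

p = 0.1  # assumed crossover probability
bsc = [[1 - p, p], [p, 1 - p]]
g_below = arimoto_exponent(0.2, bsc)  # rate below capacity
g_above = arimoto_exponent(0.5, bsc)  # rate above capacity
print(g_below, g_above)
```

In this example the bound is numerically zero at the rate below capacity and strictly positive at the rate above it, in line with Property 2. Rates are measured in nats because natural logarithms are used.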

Arimoto [2] proved the following theorem.

###### Theorem 1

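For any $R \geq 0$,

$$G^*(R|W) \geq G(R|W).$$

Arimoto [2] derived the lower bound $G(R|W)$ of $G^*(R|W)$ by an analytical method. Subsequently, Dueck and Körner [4] determined $G^*(R|W)$ by a combinatorial method quite different from that of Arimoto. To state their result, for a parameter $\delta$ and $P \in \mathcal{P}(\mathcal{X})$, we define the following function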

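$$\tilde{F}^{+}_{\delta}(R, P|W) \triangleq \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \left\{ \left[\delta\left(-R + I(P;V)\right)\right]^{+} + D(V\|W|P) \right\},$$

where $\mathcal{P}(\mathcal{Y}|\mathcal{X})$ is the set of all noisy channels with input $\mathcal{X}$ and output $\mathcal{Y}$, and $[a]^{+} = \max\{a, 0\}$. Furthermore, for $R \geq 0$, define

$$\tilde{G}^{+}_{-1}(R|W) \triangleq \min_{P \in \mathcal{P}(\mathcal{X})} \tilde{F}^{+}_{-1}(R, P|W),$$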

and for , define

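$$\tilde{G}_{\rm sp}(R|W) \triangleq \min_{P \in \mathcal{P}(\mathcal{X})} \ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}):\, I(P;V) \geq R} D(V\|W|P).$$

The suffix “sp” of the function $\tilde{G}_{\rm sp}(R|W)$ derives from the fact that it has the form of the sphere packing exponent function. These functions satisfy the following.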

###### Property 3

• The function $\tilde{G}^{+}_{-1}(R|W)$ is monotone increasing for $R \geq 0$ and takes a positive value if and only if $R > C(W)$.

• For , we have

$$\tilde{G}^{+}_{-1}(R|W) = \tilde{G}_{\rm sp}(R|W).$$

Furthermore, for , we have

$$\tilde{G}^{+}_{-1}(R|W) = \tilde{G}_{-1}(R|W).$$
• For

$$\left|\tilde{G}^{+}_{-1}(R|W) - \tilde{G}^{+}_{-1}(R'|W)\right| \leq |R - R'|.$$

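Proof: Property 3 part a) is obvious. The proof of part c) is found in Dueck and Körner [4]. In this paper we prove part b). To prove the first equality, for fixed $P \in \mathcal{P}(\mathcal{X})$, we set

$$\begin{aligned} \tilde{G}_{\rm sp}(R, P|W) &\triangleq \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}):\, I(P;V) \geq R} D(V\|W|P), \\ \hat{F}_{-1}(R, P|W) &\triangleq \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}):\, I(P;V) \leq R} \left\{ R - I(P;V) + D(V\|W|P) \right\}. \end{aligned}$$

It is obvious that

$$\tilde{F}^{+}_{-1}(R, P|W) = \min\left\{ \tilde{G}_{\rm sp}(R, P|W),\ \hat{F}_{-1}(R, P|W) \right\}, \tag{3}$$

$$\tilde{G}_{\rm sp}(R|W) = \min_{P \in \mathcal{P}(\mathcal{X})} \tilde{G}_{\rm sp}(R, P|W). \tag{4}$$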

Since is a linear function of , the minimum is attained by some satisfying . Then, by (3), we have

$$\tilde{F}^{+}_{-1}(R, P|W) = \tilde{G}_{\rm sp}(R, P|W).$$

From the above equality and (4), we obtain the first equality. The second equality is obvious since when . \QED

Dueck and Körner [4] proved the following.

###### Theorem 2

For any $R \geq 0$,

$$\tilde{G}^{+}_{-1}(R|W) = G^*(R|W).$$

Although the lower bound derived by Arimoto [2] has a form quite different from the optimal exponent determined by Dueck and Körner [4], the former coincides with the latter, i.e., the following theorem holds.

###### Theorem 3

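For any $R \geq 0$,

$$\tilde{G}^{+}_{-1}(R|W) = G(R|W),$$

or equivalently,

$$\max_{-1 \leq \delta \leq 0} \min_{P \in \mathcal{P}(\mathcal{X})} \left\{ -\delta R - \log \sum_{y \in \mathcal{Y}} \left[\sum_{x \in \mathcal{X}} P(x) W(y|x)^{\frac{1}{1+\delta}}\right]^{1+\delta} \right\} = \min_{P \in \mathcal{P}(\mathcal{X})} \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \left\{ [R - I(P;V)]^{+} + D(V\|W|P) \right\}.$$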

The result of Theorem 3 is stated in Csiszár and Körner [5] without proof. Dueck and Körner [4] stated that the equivalence between their bound and that of Arimoto [2] can be proved by an analytical computation. In the next section we give a rigorous proof of the above theorem. Contrary to their statement, our proof is not completely analytical.
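The identity in Theorem 3 can be sanity-checked numerically on a small example. The sketch below evaluates both sides by brute-force grid search for a binary symmetric channel with crossover probability 0.1 at the rate 0.5 nats. The channel, the rate, the function names, and the grid sizes are all assumptions made for this illustration. The grids are coarse, so the two sides are only expected to agree up to the discretization error, and the check is not a substitute for the proof.

```python
import math

def J(delta, P, W):
    """J_delta(P|W) for delta > -1 and a channel matrix W[x][y]."""
    s = 1.0 / (1.0 + delta)
    total = sum(
        sum(P[x] * W[x][y] ** s for x in range(len(P))) ** (1.0 + delta)
        for y in range(len(W[0]))
    )
    return -math.log(total)

def arimoto_bound(R, W, n_delta=200, n_p=50):
    """Left-hand side: grid search over delta in (-1, 0] and binary input P."""
    best = -math.inf
    for k in range(1, n_delta + 1):
        delta = -1.0 + k / n_delta
        min_j = min(J(delta, [i / n_p, 1 - i / n_p], W) for i in range(n_p + 1))
        best = max(best, -delta * R + min_j)
    return best

def kl_term(a, b):
    """a * log(a / b) with the convention 0 * log 0 = 0."""
    return a * math.log(a / b) if a > 0 else 0.0

def dk_objective(R, p1, V, W):
    """[R - I(P;V)]^+ + D(V||W|P) for P = (p1, 1 - p1) and 2x2 channels V, W."""
    P = (p1, 1.0 - p1)
    q = [P[0] * V[0][y] + P[1] * V[1][y] for y in range(2)]  # output distribution
    pairs = [(x, y) for x in range(2) for y in range(2) if P[x] > 0]
    info = sum(P[x] * kl_term(V[x][y], q[y]) for x, y in pairs)
    div = sum(P[x] * kl_term(V[x][y], W[x][y]) for x, y in pairs)
    return max(R - info, 0.0) + div

def dueck_korner_bound(R, W, n_p=20, n_v=80):
    """Right-hand side: grid search over P and all 2x2 channels V."""
    best = math.inf
    for i in range(n_p + 1):
        for j in range(n_v + 1):
            for k in range(n_v + 1):
                a, b = j / n_v, k / n_v
                V = [[1 - a, a], [b, 1 - b]]
                best = min(best, dk_objective(R, i / n_p, V, W))
    return best

p = 0.1  # assumed crossover probability; W must have strictly positive entries
bsc = [[1 - p, p], [p, 1 - p]]
lhs = arimoto_bound(0.5, bsc)
rhs = dueck_korner_bound(0.5, bsc)
print(lhs, rhs)
```

In this symmetric example the grid value of the right-hand side should lie slightly above the left-hand side, because a minimum taken over a grid can only overestimate the true minimum. Refining `n_v` should narrow the gap.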

## 4 Proof of Theorem 3

In this section we prove Theorem 3. The following is a key lemma for the proof.

###### Lemma 1

The function $\tilde{G}^{+}_{-1}(R|W)$ is a monotone increasing and convex function of $R$.

Proof: The result follows from the convexity of $G^*(R|W)$ stated in Property 1 and Theorem 2. \QED

###### Remark 1

We first tried to prove Lemma 1 by an analytical computation but did not succeed in proving the lemma via this approach. According to [6], for each fixed $P$, $\tilde{F}^{+}_{-1}(R, P|W)$ is a convex function of $R$. However, this does not imply the convexity of $\tilde{G}^{+}_{-1}(R|W)$ with respect to $R$.

Next, for , we set

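$$\begin{aligned} \tilde{F}_{\delta}(R, P|W) &\triangleq \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \left\{ \delta\left[I(P;V) - R\right] + D(V\|W|P) \right\}, \\ \tilde{G}_{\delta}(R|W) &\triangleq \min_{P \in \mathcal{P}(\mathcal{X})} \tilde{F}_{\delta}(R, P|W). \end{aligned}$$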

Then, we have the following two lemmas.

###### Lemma 2

For any $R \geq 0$,

$$\tilde{G}^{+}_{-1}(R|W) = \max_{-1 \leq \delta \leq 0} \tilde{G}_{\delta}(R|W).$$

###### Lemma 3

For any , and any , we have

$$\tilde{F}_{\delta}(R, P|W) \geq F_{\delta}(R, P|W).$$

Furthermore, for any and ,

$$\tilde{G}_{\delta}(R|W) = G_{\delta}(R|W).$$

It is obvious that Theorem 3 immediately follows from Lemmas 2 and 3. These two lemmas can be proved by analytical computations. In the following we prove Lemma 2. The proof of Lemma 3 is omitted here. For the details, see Oohama [7].

Proof of Lemma 2: From its formula, it is obvious that

$$\tilde{G}^{+}_{-1}(R|W) \geq \max_{-1 \leq \delta \leq 0} \tilde{G}_{\delta}(R|W).$$

In particular, from Property 3 part b), the equality holds for . Then, again by Property 3 part b), it suffices to prove that for there exists such that

$$\tilde{G}_{\rm sp}(R|W) = \tilde{G}_{\delta}(R|W).$$

For , we set

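$$K_{\delta}(W) \triangleq \max_{P \in \mathcal{P}(\mathcal{X})} \max_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} \left\{ -\delta I(P;V) - D(V\|W|P) \right\}.$$

Then, by the definition of $\tilde{G}_{\delta}(R|W)$, we have the following.

$$\tilde{G}_{\delta}(R|W) = -\delta R - K_{\delta}(W).$$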

Next, observe that by Property 3 part b) and Lemma 1, is a monotone increasing and convex function of . By this property and Property 3 part c), for any , there exists such that for any , we have

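$$\tilde{G}_{\rm sp}(R'|W) \geq \tilde{G}_{\rm sp}(R|W) - \delta(R' - R).$$

Let $(P, V)$ be a joint distribution that attains $\tilde{G}_{\rm sp}(R|W)$. For any $(P', V')$, set $R' = I(P';V')$. Then, we have the following chain of inequalities:

$$\begin{aligned} -\delta I(P';V') - D(V'\|W|P') &\leq -\delta R' - \tilde{G}_{\rm sp}(R'|W) \leq -\delta R - \tilde{G}_{\rm sp}(R|W) \\ &\leq -\delta I(P;V) - D(V\|W|P). \end{aligned}$$

Since $(P', V')$ is arbitrary, the above chain implies that

$$K_{\delta}(W) = -\delta I(P;V) - D(V\|W|P) = -\delta R - \tilde{G}_{\rm sp}(R|W).$$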

This completes the proof. \QED

## 5 Comparison with the Proof of the Dual Result

Theorem 3 has some duality with a result stated in Csiszár and Körner [5]. To describe their result we define

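$$\begin{aligned} E_{\delta}(R|W) &\triangleq \max_{P \in \mathcal{P}(\mathcal{X})} F_{\delta}(R, P|W), \\ E(R|W) &\triangleq \max_{\delta \geq 0} E_{\delta}(R|W) = \max_{\delta \geq 0} \max_{P \in \mathcal{P}(\mathcal{X})} F_{\delta}(R, P|W) \\ &= \max_{\delta \geq 0} \left[-\delta R + \max_{P \in \mathcal{P}(\mathcal{X})} J_{\delta}(P|W)\right]. \end{aligned}$$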

An explicit lower bound of is first derived by Gallager [8]. He showed that the function serves as an lower bound of . Next, we set

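$$C_0(W) \triangleq \max_{P \in \mathcal{P}(\mathcal{X})} \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X})} I(P;V).$$

According to Shannon, Gallager and Berlekamp [9], $C_0(W)$ has the following formula:

$$C_0(W) = -\min_{P \in \mathcal{P}(\mathcal{X})} \max_{y \in \mathcal{Y}} \log \sum_{x \in \mathcal{X}:\, W(y|x) > 0} P(x).$$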
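The closed-form expression for $C_0(W)$ depends only on which entries of $W$ are positive, so it is easy to evaluate on small examples. The following sketch does this by grid search for binary-input channels. The function name, the grid resolution, and the two example channels are assumptions introduced here for illustration.

```python
import math

def c0_binary_input(W, grid=1000):
    """Grid approximation of -min_P max_y log sum_{x: W(y|x)>0} P(x) for |X| = 2."""
    best = math.inf
    for k in range(grid + 1):
        P = (k / grid, 1 - k / grid)
        # Largest total input mass that can reach a single output symbol.
        worst = max(
            sum(P[x] for x in range(2) if W[x][y] > 0)
            for y in range(len(W[0]))
        )
        best = min(best, worst)
    return -math.log(best)

noiseless = [[1.0, 0.0], [0.0, 1.0]]  # assumed example: every output identifies its input
bsc = [[0.9, 0.1], [0.1, 0.9]]        # assumed example: all transitions are possible
c0_noiseless = c0_binary_input(noiseless)
c0_bsc = c0_binary_input(bsc)
print(c0_noiseless, c0_bsc)
```

When every entry of $W$ is positive, each inner sum equals one and the formula gives zero, as for the binary symmetric channel above. For the noiseless binary channel the minimizing input is uniform and the formula gives $\log 2$.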

For , define

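$$\tilde{E}_{\rm sp}(R|W) \triangleq \max_{P \in \mathcal{P}(\mathcal{X})} \ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}):\, I(P;V) \leq R} D(V\|W|P).$$

According to Csiszár and Körner [5], $\tilde{E}_{\rm sp}(R|W)$ serves as an upper bound of $E^*(R|W)$ and matches it for large $R$ below $C(W)$. Csiszár and Körner [5] obtained the following result.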

###### Theorem 4 (Csiszár and Körner [5])

For any ,

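$$E(R|W) = \tilde{E}_{\rm sp}(R|W),$$

or equivalently,

$$\max_{\delta \geq 0} \max_{P \in \mathcal{P}(\mathcal{X})} \left\{ -\delta R - \log \sum_{y \in \mathcal{Y}} \left[\sum_{x \in \mathcal{X}} P(x) W(y|x)^{\frac{1}{1+\delta}}\right]^{1+\delta} \right\} = \max_{P \in \mathcal{P}(\mathcal{X})} \ \min_{V \in \mathcal{P}(\mathcal{Y}|\mathcal{X}):\, I(P;V) \leq R} D(V\|W|P).$$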

In the following we outline the arguments of the proof of the above theorem and compare them with those of the proof of Theorem 3.

By an analytical computation we have the following lemma.

###### Lemma 4

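The function $\tilde{E}_{\rm sp}(R|W)$ is a monotone decreasing and convex function of $R$ and is positive if and only if $R < C(W)$.

Next, for $\delta \geq 0$, we define

$$\tilde{E}_{\delta}(R|W) = \max_{P \in \mathcal{P}(\mathcal{X})} \tilde{F}_{\delta}(R, P|W).$$

Then, we have the following two lemmas.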

###### Lemma 5

For any ,

$$\tilde{E}_{\rm sp}(R|W) = \max_{\delta \geq 0} \tilde{E}_{\delta}(R|W).$$

###### Lemma 6

For any , and any , we have

$$\tilde{F}_{\delta}(R, P|W) \geq F_{\delta}(R, P|W).$$

Furthermore, for any and ,

$$\tilde{E}_{\delta}(R|W) = E_{\delta}(R|W).$$

It is obvious that Theorem 4 immediately follows from Lemmas 5 and 6. We prove Lemmas 5 and 6 in manners quite similar to those of the proofs of Lemmas 2 and 3, respectively. We omit the details of the proofs.

We compare the arguments of the proof of Theorem 3 with those of the proof of Theorem 4. An essential difference between them is in the proof of the convexity of the exponent functions. We can prove the convexity of $\tilde{E}_{\rm sp}(R|W)$ with an analytical method. On the other hand, the convexity of $\tilde{G}^{+}_{-1}(R|W)$ follows from $\tilde{G}^{+}_{-1}(R|W) = G^*(R|W)$ and the convexity of $G^*(R|W)$. The proof of the convexity of $G^*(R|W)$ is based on the operational meaning of the optimal exponent function. We first tried an analytical proof of this convexity but did not succeed. The difference of arguments is summarized in Table 1.

### References

1. J. Wolfowitz, “The coding of messages subject to chance errors,” Illinois J. Math., vol. 1, pp. 591-606, 1957.
2. S. Arimoto, “On the converse to the coding theorem for discrete memoryless channels,” IEEE Trans. Inform. Theory, vol. IT-19, pp. 357-359, May 1973.
3. R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
4. G. Dueck and J. Körner, “Reliability function of a discrete memoryless channel at rates above capacity,” IEEE Trans. Inform. Theory, vol. IT-25, pp. 82-85, Jan. 1979.
5. I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic Press, 1981.
6. I. Csiszár, “Generalized cutoff rates and Rényi’s information measures,” IEEE Trans. Inform. Theory, vol. 41, pp. 26-34, Jan. 1995.
7. Y. Oohama, “Converse coding theorems for identification via channels,” submitted for publication in IEEE Trans. Inform. Theory.
8. R. G. Gallager, “A simple derivation of the coding theorem and some applications,” IEEE Trans. Inform. Theory, vol. IT-11, pp. 3-18, Jan. 1965.
9. C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels I-II,” Information and Control, vol. 10, pp. 65-103, 522-552, 1967.