On Two Strong Converse Theorems for Discrete Memoryless Channels
\titlenote{This work was presented in part at the 28th Symposium on Information Theory and Its Applications, Onna, Okinawa, Japan, Nov. 20-23, 2005.}
\authorlist{\authorentry{Yasutada OOHAMA}{labelA}}
\breakauthorline{3}
\affiliate[labelA]{The author is with The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan.}
In 1973, Arimoto proved the strong converse theorem for discrete memoryless channels, stating that when the transmission rate $R$ is above the channel capacity $C$, the error probability of decoding goes to one as the block length $n$ of the code word tends to infinity. He proved the theorem by deriving an exponent function of the probability of correct decoding that is positive if and only if $R > C$. Subsequently, in 1979, Dueck and Körner determined the optimal exponent of correct decoding. Arimoto's bound has been said to be equal to the bound of Dueck and Körner. However, a rigorous proof of this equality has not been presented so far. In this paper we give a rigorous proof of the equivalence of Arimoto's bound to that of Dueck and Körner.
Key words: strong converse theorem, discrete memoryless channels, exponent of correct decoding
1 Introduction
In some classes of noisy channels, the error probability of decoding goes to one as the block length of the transmitted codes tends to infinity at rates above the channel capacity. This is well known as the strong converse theorem for noisy channels. In 1957, Wolfowitz [1] proved the strong converse theorem for discrete memoryless channels (DMCs). His result was the first on the strong converse theorem.
In 1973, Arimoto [2] obtained a stronger result on the strong converse theorem for DMCs. He proved that the error probability of decoding goes to one exponentially and derived a lower bound on the exponent function. To prove this strong converse theorem he introduced an interesting bounding technique based on a symmetrical structure of the set of transmission codes. Using this bounding method and an analytical argument on convex functions developed by Gallager [3], he derived the lower bound.
Subsequently, Dueck and Körner [4] determined the optimal exponent function for the error probability of decoding to go to one. They derived the result by using a combinatorial method based on the types of sequences. Their method is quite different from the method of Arimoto [2]. In their paper, Dueck and Körner [4] stated that their optimal bound can be proved to be equal to the lower bound of Arimoto [2] by analytical computation. However, since their statement we have found no rigorous proof of the above equality in the literature.
In this paper we give a rigorous proof of the equality of the lower bound of Arimoto [2] to the optimal bound of Dueck and Körner [4]. To prove this equality, we need to establish the convexity of the optimal exponent function. We prove this property by using an operational meaning of the optimal exponent function. Contrary to their statement, the arguments of our proof are therefore not completely analytical. A dual equivalence of two exponent functions was established by Csiszár and Körner [5] for the exponent functions for the error probability of decoding to go to zero at rates below capacity. Their proof of that equivalence is completely analytical. We compare our arguments with theirs to clarify an essential difference between them.
2 Coding Theorems for Discrete Memoryless Channels
We consider the discrete memoryless channel with input set $\mathcal{X}$ and output set $\mathcal{Y}$. We assume that $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. Let $X$ be a random variable taking values in $\mathcal{X}$. Suppose that $X$ has a probability distribution on $\mathcal{X}$ denoted by $p = \{p(x)\}_{x\in\mathcal{X}}$. Let $Y$ be a random variable obtained as the channel output by connecting $X$ to the input of the channel. We write the conditional distribution of $Y$ given $X$ as $W = \{W(y|x)\}_{(x,y)\in\mathcal{X}\times\mathcal{Y}}$. A noisy channel is defined by a sequence of stochastic matrices. In particular, a stationary discrete memoryless channel is defined by a single stochastic matrix $W$ with input set $\mathcal{X}$ and output set $\mathcal{Y}$; its $n$-th extension is given by $W^n(\boldsymbol{y}|\boldsymbol{x}) = \prod_{t=1}^{n} W(y_t|x_t)$ for $\boldsymbol{x} \in \mathcal{X}^n$ and $\boldsymbol{y} \in \mathcal{Y}^n$.
Information transmission using the above noisy channel is formulated as follows. Let $\mathcal{M}_n = \{1, 2, \ldots, M_n\}$ be a message set to be transmitted through the channel. For given $n$, a code is a set $\{(\boldsymbol{x}_i, \mathcal{D}_i)\}_{i=1}^{M_n}$ of pairs of a codeword $\boldsymbol{x}_i \in \mathcal{X}^n$ and a decoding region $\mathcal{D}_i \subseteq \mathcal{Y}^n$ that satisfies the following:
$$ \mathcal{D}_i \cap \mathcal{D}_j = \emptyset \quad \text{for } i \neq j, \qquad \mathrm{P}_{\mathrm{e}}^{(n)} = \frac{1}{M_n} \sum_{i=1}^{M_n} W^n(\mathcal{D}_i^{\mathrm{c}} \,|\, \boldsymbol{x}_i), $$
where $\{\mathcal{D}_i\}_{i=1}^{M_n}$ are the decoding regions of the code, $\mathcal{D}_i^{\mathrm{c}}$ denotes the complement of $\mathcal{D}_i$ in $\mathcal{Y}^n$, and $\mathrm{P}_{\mathrm{e}}^{(n)}$ is the error probability of decoding.
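To make the definitions above concrete, here is a small numerical sketch (a hypothetical example, not taken from the paper): the average error probability of decoding for a length-3 repetition code over a binary symmetric channel with crossover probability 0.1, computed directly from the definition with majority-vote decoding regions.

```python
from itertools import product

eps = 0.1  # crossover probability of the binary symmetric channel (assumed example)
W = {(y, x): (1 - eps) if y == x else eps for x in (0, 1) for y in (0, 1)}

n = 3
codewords = {0: (0, 0, 0), 1: (1, 1, 1)}   # M_n = 2 messages

def decode(y):
    return int(sum(y) >= 2)                # majority vote defines the decoding regions

def Wn(y, x):
    """Memoryless extension: W^n(y|x) = prod_t W(y_t|x_t)."""
    prob = 1.0
    for yt, xt in zip(y, x):
        prob *= W[(yt, xt)]
    return prob

# Average error probability: (1/M_n) * sum_i (mass of Y^n falling outside region D_i)
P_e = sum(
    Wn(y, x)
    for i, x in codewords.items()
    for y in product((0, 1), repeat=n)
    if decode(y) != i
) / len(codewords)

print(P_e)  # 3*eps**2*(1-eps) + eps**3 = 0.028
```

The decoding regions here are disjoint by construction, since the majority vote assigns every output sequence to exactly one message.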
A transmission rate $R$ is achievable if there exists a sequence of codes $\{(\boldsymbol{x}_i, \mathcal{D}_i)\}_{i=1}^{M_n}$, $n = 1, 2, \ldots$, such that
$$ \liminf_{n\to\infty} \frac{1}{n} \log M_n \geq R \quad \text{and} \quad \lim_{n\to\infty} \mathrm{P}_{\mathrm{e}}^{(n)} = 0. \qquad (1) $$
Let the supremum of achievable transmission rates be denoted by $C$, which we call the channel capacity. It is well known that $C$ is given by the following formula:
$$ C = \max_{p\in\mathcal{P}(\mathcal{X})} I(p, W), \qquad (2) $$
where $\mathcal{P}(\mathcal{X})$ is the set of probability distributions on $\mathcal{X}$ and $I(p, W)$ stands for the mutual information between $X$ and $Y$ when the input distribution of $X$ is $p$.
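Formula (2) can be evaluated numerically for a simple example. The sketch below (the binary symmetric channel with crossover probability 0.1 is an assumption made only for illustration) maximizes I(p, W) over input distributions by a grid search.

```python
import math

eps = 0.1
W = [[1 - eps, eps], [eps, 1 - eps]]  # BSC(0.1), rows indexed by input x (assumed example)

def mutual_information(p, W):
    """I(p, W) in nats for a binary-input, binary-output channel; p = Pr{X = 0}."""
    px = [p, 1 - p]
    py = [sum(px[x] * W[x][y] for x in range(2)) for y in range(2)]
    return sum(
        px[x] * W[x][y] * math.log(W[x][y] / py[y])
        for x in range(2) for y in range(2)
        if px[x] > 0 and W[x][y] > 0
    )

# C = max_p I(p, W), approximated by a grid search over input distributions
C = max(mutual_information(k / 1000, W) for k in range(1001))
print(C)  # about 0.368 nats (= 1 - h(0.1) in bits, i.e. ~0.531), at the uniform input
```

For this symmetric channel the maximizing input distribution is the uniform one, which lies on the grid, so the grid search returns the capacity exactly.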
To examine the asymptotic behavior of $\mathrm{P}_{\mathrm{e}}^{(n)}$ for large $n$ at $R < C$, we define the following quantities. For given $R < C$, the quantity $E \geq 0$ is an achievable error exponent if there exists a sequence of codes $\{(\boldsymbol{x}_i, \mathcal{D}_i)\}_{i=1}^{M_n}$, $n = 1, 2, \ldots$, such that
$$ \liminf_{n\to\infty} \frac{1}{n} \log M_n \geq R \quad \text{and} \quad \liminf_{n\to\infty} \frac{1}{n} \log \frac{1}{\mathrm{P}_{\mathrm{e}}^{(n)}} \geq E. $$
The supremum of the achievable error exponents is denoted by $E^*(R)$. Several lower and upper bounds of $E^*(R)$ have been derived so far. An explicit form of $E^*(R)$ is known for large $R$ below $C$. An explicit formula of $E^*(R)$ for all $R$ below $C$ is still unknown.
3 Strong Converse Theorems for Discrete Memoryless Channels
Wolfowitz [1] first established the strong converse theorem for DMCs by proving that when $R > C$, we have $\lim_{n\to\infty} \mathrm{P}_{\mathrm{e}}^{(n)} = 1$. When the strong converse theorem holds, we are interested in the rate of convergence of the error probability of decoding to one as $n \to \infty$ for $R > C$. To examine this rate of convergence, we define the following quantity. For given $R > C$, the quantity $G \geq 0$ is an achievable exponent if there exists a sequence of codes $\{(\boldsymbol{x}_i, \mathcal{D}_i)\}_{i=1}^{M_n}$, $n = 1, 2, \ldots$, such that
$$ \liminf_{n\to\infty} \frac{1}{n} \log M_n \geq R \quad \text{and} \quad \limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \mathrm{P}_{\mathrm{e}}^{(n)}} \leq G. $$
The infimum of the achievable exponents is denoted by $G^*(R)$. This quantity has the following property.
Property 1
The function $G^*(R)$ is a monotone increasing and convex function of $R$.
Proof: By definition it is obvious that $G^*(R)$ is a monotone increasing function of $R$. To prove the convexity, fix two positive rates $R_1$ and $R_2$ arbitrarily. For each $i = 1, 2$, we consider the infimum $G^*(R_i)$ of the achievable exponents at rate $R_i$. By the definition of $G^*(R_i)$, for each $i = 1, 2$, there exists a sequence of codes $\{(\boldsymbol{x}_j^{(i)}, \mathcal{D}_j^{(i)})\}_{j=1}^{M_{m,i}}$, $m = 1, 2, \ldots$, such that
$$ \liminf_{m\to\infty} \frac{1}{m} \log M_{m,i} \geq R_i \quad \text{and} \quad \limsup_{m\to\infty} \frac{1}{m} \log \frac{1}{1 - \mathrm{P}_{\mathrm{e},i}^{(m)}} \leq G^*(R_i). $$
Fix any $\alpha$ with $0 \leq \alpha \leq 1$ and set $n_1 = \lfloor \alpha n \rfloor$, where $\lfloor a \rfloor$ stands for the integer part of $a$. Set $n_2 = n - n_1$. It is obvious that $n_1 + n_2 = n$.
Next, we consider the code of block length $n$ obtained by concatenating the code of block length $n_1$ from the first sequence with the code of block length $n_2$ from the second sequence. (When $\alpha = 0$ or $1$, we simply use one of the two codes.) For the above constructed code we have
$$ M_n = M_{n_1,1} M_{n_2,2} \quad \text{and} \quad 1 - \mathrm{P}_{\mathrm{e}}^{(n)} = \bigl(1 - \mathrm{P}_{\mathrm{e},1}^{(n_1)}\bigr)\bigl(1 - \mathrm{P}_{\mathrm{e},2}^{(n_2)}\bigr). $$
Then, we have
$$ \liminf_{n\to\infty} \frac{1}{n} \log M_n \geq \alpha R_1 + (1 - \alpha) R_2 $$
and
$$ \limsup_{n\to\infty} \frac{1}{n} \log \frac{1}{1 - \mathrm{P}_{\mathrm{e}}^{(n)}} \leq \alpha G^*(R_1) + (1 - \alpha) G^*(R_2). $$
Hence, we have
$$ G^*\bigl(\alpha R_1 + (1 - \alpha) R_2\bigr) \leq \alpha G^*(R_1) + (1 - \alpha) G^*(R_2), $$
which implies the convexity of $G^*(R)$. \QED
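The arithmetic at the heart of the concatenation argument can be checked directly: correct-decoding probabilities multiply under blockwise concatenation, so the resulting exponent is the weighted average of the component exponents. All numbers below are illustrative assumptions.

```python
import math

n1, n2 = 600, 400    # block lengths of the two component codes (assumed numbers)
G1, G2 = 0.05, 0.20  # their exponents of correct decoding (assumed numbers)
n = n1 + n2
alpha = n1 / n

# Correct-decoding probabilities multiply when the two codes are decoded blockwise
Pc1, Pc2 = math.exp(-n1 * G1), math.exp(-n2 * G2)
Pc = Pc1 * Pc2

G_concat = (1 / n) * math.log(1 / Pc)  # exponent of the concatenated code
print(G_concat, alpha * G1 + (1 - alpha) * G2)  # both ~0.11
```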
Arimoto [2] derived a lower bound of $G^*(R)$. To state his result we define some functions. For $\rho \in (-1, 0]$ and $p \in \mathcal{P}(\mathcal{X})$, define
$$ E_0(\rho|p) = -\log \sum_{y\in\mathcal{Y}} \Bigl[ \sum_{x\in\mathcal{X}} p(x) W(y|x)^{\frac{1}{1+\rho}} \Bigr]^{1+\rho}. $$
Furthermore, set
$$ G^{(\mathrm{A})}(R) = \sup_{-1 < \rho \leq 0} \Bigl\{ \min_{p\in\mathcal{P}(\mathcal{X})} E_0(\rho|p) - \rho R \Bigr\}. $$
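As a numerical sketch of Arimoto's bound (assuming the standard Gallager-type form of E_0 and a binary symmetric channel with crossover probability 0.1 as a running example; neither is specified by the text above), a grid search over rho and over input distributions shows the bound vanishing below capacity (about 0.368 nats) and becoming positive above it.

```python
import math

eps = 0.1
W = [[1 - eps, eps], [eps, 1 - eps]]  # BSC(0.1) running example (an assumption)

def E0(rho, p):
    """E_0(rho|p) = -log sum_y [ sum_x p(x) W(y|x)^{1/(1+rho)} ]^{1+rho}, -1 < rho <= 0."""
    s = 1.0 / (1.0 + rho)
    px = [p, 1 - p]
    return -math.log(sum(
        sum(px[x] * W[x][y] ** s for x in range(2)) ** (1.0 + rho)
        for y in range(2)
    ))

def G_arimoto(R, K=200):
    """Grid-search approximation of sup_{-1<rho<=0} { min_p E_0(rho|p) - rho * R }."""
    best = 0.0  # rho = 0 always contributes 0
    for i in range(1, K):
        rho = -0.995 * i / K
        inner = min(E0(rho, k / 200) for k in range(201))  # min over input distributions
        best = max(best, inner - rho * R)
    return best

print(G_arimoto(0.20))  # 0.0: the bound vanishes below capacity (~0.368 nats)
print(G_arimoto(0.60))  # positive (about 0.053) above capacity
```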
According to Arimoto [2], the following property holds.
Property 2
The function $G^{(\mathrm{A})}(R)$ is a monotone increasing and convex function of $R$ and is positive if and only if $R > C$.
Arimoto [2] proved the following theorem.
Theorem 1
For any $R$,
$$ G^*(R) \geq G^{(\mathrm{A})}(R). $$
Arimoto [2] derived the lower bound $G^{(\mathrm{A})}(R)$ of $G^*(R)$ by an analytical method. Subsequently, Dueck and Körner [4] determined $G^*(R)$ by a combinatorial method quite different from that of Arimoto. To state their result, for $p \in \mathcal{P}(\mathcal{X})$, $R \geq 0$, and $V \in \mathcal{W}$, we define the following function:
$$ G(R|p, V) = D(V\|W|p) + [R - I(p, V)]^{+}, $$
where $\mathcal{W}$ is the set of all noisy channels $V = \{V(y|x)\}$ with input set $\mathcal{X}$ and output set $\mathcal{Y}$, $D(V\|W|p) = \sum_{x\in\mathcal{X}} p(x) \sum_{y\in\mathcal{Y}} V(y|x) \log \frac{V(y|x)}{W(y|x)}$ is the conditional divergence, and $[a]^{+} = \max\{0, a\}$. Furthermore, for $p \in \mathcal{P}(\mathcal{X})$, define
$$ G_{\mathrm{sp}}(R|p) = \min_{V\in\mathcal{W}} G(R|p, V), $$
and for $R \geq 0$, define
$$ G_{\mathrm{sp}}(R) = \min_{p\in\mathcal{P}(\mathcal{X})} G_{\mathrm{sp}}(R|p). $$
The suffix ``sp'' of the function derives from the fact that it has a form similar to the sphere packing exponent function. Those functions satisfy the following.
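For the same illustrative binary symmetric channel with crossover probability 0.1 (an assumption made for the example, not part of the paper), the sphere-packing-type exponent can be approximated by brute force over input distributions p and auxiliary channels V.

```python
import math

eps = 0.1
W = [[1 - eps, eps], [eps, 1 - eps]]  # BSC(0.1) example (an assumption)

def xlog(a, b):
    """a * log(a / b), with the convention 0 * log(0/b) = 0."""
    return 0.0 if a == 0.0 else a * math.log(a / b)

def G_sp(R, K=50):
    """Grid-search approximation of min_{p, V} { D(V||W|p) + [R - I(p, V)]^+ }."""
    best = float("inf")
    grid = [k / K for k in range(K + 1)]
    for p in grid:
        px = [p, 1 - p]
        for d0 in grid:          # V(1|0) = d0
            for d1 in grid:      # V(0|1) = d1
                V = [[1 - d0, d0], [d1, 1 - d1]]
                py = [sum(px[x] * V[x][y] for x in range(2)) for y in range(2)]
                div = sum(px[x] * sum(xlog(V[x][y], W[x][y]) for y in range(2))
                          for x in range(2))
                mi = sum(px[x] * xlog(V[x][y], py[y])
                         for x in range(2) for y in range(2)
                         if px[x] > 0 and py[y] > 0)
                best = min(best, div + max(0.0, R - mi))
    return best

print(G_sp(0.20))  # 0.0: below capacity, take V = W
print(G_sp(0.60))  # positive (roughly 0.05-0.06 on this grid) above capacity
```

Because the minimization runs over a finite grid, the value at rates above capacity is only an upper approximation of the true minimum.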
Property 3

The function $G_{\mathrm{sp}}(R)$ is monotone increasing in $R$ and takes a positive value if and only if $R > C$.

For , we have
Furthermore, for , we have

For
Proof: Property 3 part a) is obvious. A proof of part c) is found in Dueck and Körner [4]. In this paper we prove part b). To prove the first equality, for fixed $p$ and $R$, we set
It is obvious that
(3)  
(4) 
Since is a linear function of , the minimum is attained by some satisfying . Then, by (3), we have
From the above equality and (4), we obtain the first equality. The second equality is obvious since when . \QED
Dueck and Körner [4] proved the following.
Theorem 2
For any $R$,
$$ G^*(R) = G_{\mathrm{sp}}(R). $$
Although the lower bound derived by Arimoto [2] has a form quite different from that of the optimal exponent determined by Dueck and Körner [4], the former coincides with the latter; i.e., the following theorem holds.
Theorem 3
For any $R$,
$$ G^{(\mathrm{A})}(R) = G_{\mathrm{sp}}(R), $$
or equivalently,
$$ G^{(\mathrm{A})}(R) = G^*(R). $$
The result of Theorem 3 is stated in Csiszár and Körner [5] without proof. Dueck and Körner [4] stated that the equivalence between their bound and that of Arimoto [2] can be proved by an analytical computation. In the next section we give a rigorous proof of the above theorem. Contrary to their statement, our proof is not completely analytical.
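Theorem 3 can also be checked numerically on a small example. The sketch below (a binary symmetric channel with crossover probability 0.1, with both exponent functions taken in the standard forms used in the literature; grid searches, so the agreement is only approximate) evaluates Arimoto's bound and the Dueck-Körner exponent at a rate above capacity.

```python
import math

eps = 0.1
W = [[1 - eps, eps], [eps, 1 - eps]]  # BSC(0.1) example (an assumption)

def E0(rho, p):
    """Gallager/Arimoto function E_0(rho|p) for -1 < rho <= 0."""
    s = 1.0 / (1.0 + rho)
    px = [p, 1 - p]
    return -math.log(sum(
        sum(px[x] * W[x][y] ** s for x in range(2)) ** (1.0 + rho)
        for y in range(2)
    ))

def G_arimoto(R, K=200):
    """sup over rho of { min_p E_0(rho|p) - rho * R }, by grid search."""
    best = 0.0
    for i in range(1, K):
        rho = -0.995 * i / K
        best = max(best, min(E0(rho, k / 200) for k in range(201)) - rho * R)
    return best

def xlog(a, b):
    return 0.0 if a == 0.0 else a * math.log(a / b)

def G_dueck_korner(R, K=50):
    """min over p and channels V of { D(V||W|p) + [R - I(p, V)]^+ }, by grid search."""
    best = float("inf")
    grid = [k / K for k in range(K + 1)]
    for p in grid:
        px = [p, 1 - p]
        for d0 in grid:
            for d1 in grid:
                V = [[1 - d0, d0], [d1, 1 - d1]]
                py = [sum(px[x] * V[x][y] for x in range(2)) for y in range(2)]
                div = sum(px[x] * sum(xlog(V[x][y], W[x][y]) for y in range(2))
                          for x in range(2))
                mi = sum(px[x] * xlog(V[x][y], py[y])
                         for x in range(2) for y in range(2)
                         if px[x] > 0 and py[y] > 0)
                best = min(best, div + max(0.0, R - mi))
    return best

R = 0.60  # a rate above capacity (capacity is about 0.368 nats here)
ga, gd = G_arimoto(R), G_dueck_korner(R)
print(ga, gd)  # close to each other (about 0.05), as Theorem 3 predicts
```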
4 Proof of Theorem 3
In this section we prove Theorem 3. The following is a key lemma for the proof.
Lemma 1
The function $G_{\mathrm{sp}}(R)$ is a monotone increasing and convex function of $R$.
Proof: The result follows from the convexity of $G^*(R)$ established in Property 1 and from Theorem 2. \QED
Remark 1
Next, for , we set
Then, we have the following two lemmas.
Lemma 2
For any ,
Lemma 3
For any , and any , we have
Furthermore, for any and ,
It is obvious that Theorem 3 immediately follows from Lemmas 2 and 3. Those two lemmas can be proved by analytical computations. In the following we prove Lemma 2. The proof of Lemma 3 is omitted here; for the details, see Oohama [7].
Proof of Lemma 2: From its formula, it is obvious that
In particular, from Property 3 part b), the equality holds for . Then, again by Property 3 part b), it suffices to prove that for there exists such that
For , we set
Then, by the definition of , we have the following.
Next, observe that by Property 3 part b) and Lemma 1, is a monotone increasing and convex function of . By this property and Property 3 part c), for any , there exists such that for any , we have
Let be a joint distribution that attains . For any set . Then, we have the following chain of inequalities:
The above inequality implies that
This completes the proof. \QED
5 Comparison with the Proof of the Dual Result
Theorem 3 has some duality with a result stated in Csiszár and Körner [5]. To describe their result we define
$$ E_{\mathrm{r}}(R) = \max_{0 \leq \rho \leq 1} \max_{p\in\mathcal{P}(\mathcal{X})} \bigl\{ E_0(\rho|p) - \rho R \bigr\}, $$
where $E_0(\rho|p)$ is given by the same formula as in Section 3, now with $\rho \in [0, 1]$.
An explicit lower bound of $E^*(R)$ was first derived by Gallager [8]. He showed that the function $E_{\mathrm{r}}(R)$ serves as a lower bound of $E^*(R)$. Next, we set
$$ E_{\mathrm{sp}}(R) = \max_{p\in\mathcal{P}(\mathcal{X})} \min_{V\in\mathcal{W}:\, I(p, V) \leq R} D(V\|W|p). $$
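The behavior of Gallager's lower bound can be illustrated numerically. The sketch below assumes the standard form E_r(R) = max over rho in [0, 1] and over p of E_0(rho|p) - rho*R, evaluated for a binary symmetric channel with crossover probability 0.1 (both the form and the channel are assumptions for the example): the bound is positive below capacity and vanishes above it.

```python
import math

eps = 0.1
W = [[1 - eps, eps], [eps, 1 - eps]]  # BSC(0.1) example (an assumption)

def E0(rho, p):
    """Gallager's function E_0(rho|p), here used with rho >= 0."""
    s = 1.0 / (1.0 + rho)
    px = [p, 1 - p]
    return -math.log(sum(
        sum(px[x] * W[x][y] ** s for x in range(2)) ** (1.0 + rho)
        for y in range(2)
    ))

def E_r(R, K=200):
    """Random coding bound max_{0<=rho<=1} max_p { E_0(rho|p) - rho * R }, by grid search."""
    return max(E0(i / K, k / 100) - (i / K) * R
               for i in range(K + 1) for k in range(101))

print(E_r(0.20))  # positive below capacity (~0.368 nats)
print(E_r(0.45))  # no positive exponent above capacity: the maximum sits at rho = 0
```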
According to Shannon, Gallager and Berlekamp [9], $E_{\mathrm{sp}}(R)$ has the following formula:
$$ E_{\mathrm{sp}}(R) = \sup_{\rho \geq 0} \max_{p\in\mathcal{P}(\mathcal{X})} \bigl\{ E_0(\rho|p) - \rho R \bigr\}. $$
For $R \geq 0$, define
$$ \tilde{E}_{\mathrm{r}}(R) = \max_{p\in\mathcal{P}(\mathcal{X})} \min_{V\in\mathcal{W}} \bigl\{ D(V\|W|p) + [I(p, V) - R]^{+} \bigr\}. $$
According to Csiszár and Körner [5], the function $E_{\mathrm{sp}}(R)$ serves as an upper bound of $E^*(R)$ and matches it for large $R$ below $C$. Csiszár and Körner [5] obtained the following result.
Theorem 4 (Csiszár and Körner [5])
For any $R \geq 0$,
$$ \tilde{E}_{\mathrm{r}}(R) = E_{\mathrm{r}}(R), $$
or equivalently, for any $p \in \mathcal{P}(\mathcal{X})$ and any $R \geq 0$,
$$ \min_{V\in\mathcal{W}} \bigl\{ D(V\|W|p) + [I(p, V) - R]^{+} \bigr\} = \max_{0 \leq \rho \leq 1} \bigl\{ E_0(\rho|p) - \rho R \bigr\}. $$
In the following we outline the arguments of the proof of the above theorem and compare them with those of the proof of Theorem 3.
By an analytical computation we have the following lemma.
Lemma 4
The function $E_{\mathrm{r}}(R)$ is a monotone decreasing and convex function of $R$ and is positive if and only if $R < C$.
Next, for , we define
Then, we have the following two lemmas.
Lemma 5
For any ,
Lemma 6
For any , and any , we have
Furthermore, for any and ,
It is obvious that Theorem 4 immediately follows from Lemmas 5 and 6. We prove Lemmas 5 and 6 in manners quite similar to those of the proofs of Lemmas 2 and 3, respectively. We omit the details of the proofs.
We compare the arguments of the proof of Theorem 3 with those of the proof of Theorem 4. An essential difference between them lies in the proof of the convexity of the exponent functions. We can prove the convexity of $E_{\mathrm{r}}(R)$ by an analytical method. On the other hand, the convexity of $G_{\mathrm{sp}}(R)$ follows from Theorem 2 and the convexity of $G^*(R)$. The proof of the convexity of $G^*(R)$ is based on an operational meaning of the optimal exponent. We first tried an analytical proof of the convexity of $G_{\mathrm{sp}}(R)$ but did not succeed. The difference of the arguments is summarized in TABLE 1.




TABLE 1  Comparison of the arguments of the proofs of Theorems 3 and 4.

                                      Proof of Theorem 3          Proof of Theorem 4
Convexity of the exponent function    Theorem 2 and Property 1    Analytical computation
                                      (operational meaning)
Key lemma                             Lemma 2                     Lemma 5
Proof of the theorem                  Lemmas 2 and 3              Lemmas 5 and 6


References
[1] J. Wolfowitz, "The coding of messages subject to chance errors," Illinois J. Math., vol. 1, pp. 591-606, 1957.
[2] S. Arimoto, "On the converse to the coding theorem for discrete memoryless channels," IEEE Trans. Inform. Theory, vol. IT-19, pp. 357-359, May 1973.
[3] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[4] G. Dueck and J. Körner, "Reliability function of a discrete memoryless channel at rates above capacity," IEEE Trans. Inform. Theory, vol. IT-25, pp. 82-85, Jan. 1979.
[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. New York: Academic Press, 1981.
[6] I. Csiszár, "Generalized cutoff rates and Rényi's information measures," IEEE Trans. Inform. Theory, vol. 41, pp. 26-34, Jan. 1995.
[7] Y. Oohama, "Converse coding theorems for identification via channels," submitted for publication to IEEE Trans. Inform. Theory.
[8] R. G. Gallager, "A simple derivation of the coding theorem and some applications," IEEE Trans. Inform. Theory, vol. IT-11, pp. 3-18, Jan. 1965.
[9] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, "Lower bounds to error probability for coding on discrete memoryless channels. I-II," Information and Control, vol. 10, pp. 65-103 and 522-552, 1967.