Bounds on the Reliability Function of Typewriter Channels
New lower and upper bounds on the reliability function of typewriter channels are given. Our lower bounds improve upon the (multiletter) expurgated bound of Gallager, furnishing a new and simple counterexample to a conjecture made in 1967 by Shannon, Gallager and Berlekamp on its tightness. The only other known counterexample is due to Katsman, Tsfasman and Vlăduţ, who used algebraic-geometric codes on q-ary symmetric channels with large alphabet size. Here we prove, by introducing dependence between codewords of a random ensemble, that the conjecture is false even for typewriter channels with a small number of inputs. In the process, we also demonstrate that Lovász's proof of the capacity of the pentagon was implicitly contained (but unnoticed!) in the works of Jelinek and Gallager on the expurgated bound, done at least ten years before Lovász. In the opposite direction, new upper bounds on the reliability function are derived for channels with an odd number of inputs by adapting Delsarte's linear programming bound. First we derive a bound based on the minimum distance, which combines Lovász's construction for bounding the graph capacity with the McEliece-Rodemich-Rumsey-Welch construction for bounding the minimum distance of codes in the Hamming space. Then, for the particular case of crossover probability 1/2, we derive an improved bound by also using the method of Kalai and Linial to study the spectrum distribution of codes.
Consider the typewriter channel whose input and output alphabets are {0, 1, …, q−1}, and whose transition probabilities are W(y|x) = 1−ε if y = x, W(y|x) = ε if y = x+1 (mod q), and W(y|x) = 0 otherwise,
where, without loss of generality, we assume throughout the paper that 0 < ε ≤ 1/2. We also assume q ≥ 4, for reasons which will be clear in what follows.
where P_e(M, n) is the smallest possible probability of error over codes with M codewords of blocklength n. In particular, since the definition of E(R) does not depend on whether one considers maximal or average probability of error over codewords (see ), we will use one quantity or the other according to convenience. In this paper all logarithms are to base 2 and rates are thus measured in bits per channel use.
Bounding E(R) for the considered channels first requires a discussion of their capacity and zero-error capacity. For any q, the capacity of the channel has the simple expression C = log q − h(ε), where h is the binary entropy function. Furthermore, for q ≥ 4, these channels have a positive zero-error capacity C_0, which is defined as the highest rate at which communication is possible with probability of error precisely equal to zero. For even q, it is easily proved that C_0 = log(q/2), while for odd q determining C_0 is a much harder problem. For q = 5 Shannon  gave the lower bound C_0 ≥ (1/2) log 5, which Lovász proved to be tight more than twenty years later . For larger odd values of q, Shannon observed that standard information theoretic arguments imply C_0 ≤ log(q/2), while Lovász  gave a better upper bound of the form C_0 ≤ log θ(C_q), where θ(G) is the Lovász theta function of a graph G and C_q is the cycle of length q, for which θ(C_q) = q cos(π/q) / (1 + cos(π/q)).
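For concreteness, the closed-form value θ(C_q) = q cos(π/q)/(1 + cos(π/q)) is easy to evaluate numerically; the following sketch (function name ours) checks that θ(C_5) = √5, i.e. that the pentagon bound equals (1/2) log 5:

```python
import math

def lovasz_theta_cycle(q: int) -> float:
    """Closed-form Lovász theta function of the cycle C_q (odd q)."""
    c = math.cos(math.pi / q)
    return q * c / (1 + c)

theta5 = lovasz_theta_cycle(5)
print(theta5, math.log2(theta5))  # sqrt(5) ≈ 2.2361 and (1/2) log2 5 ≈ 1.1610
```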
Good lower bounds on C_0 for odd values of q are also difficult to derive. Specific results have been obtained for example in , , , but there does not seem to be a sufficiently general construction which can be singled out as the best for all odd q.
The focus of this paper is on the discussion of known bounds on E(R) and on the derivation of new lower and upper bounds. Specifically, the paper is structured as follows. In Section II we discuss the classical upper and lower bounds on the reliability function E(R). Evaluating the expurgated bound is non-trivial and requires some observations which seemingly have not appeared in the literature. In particular, it is observed that the zero-error capacity of the pentagon can be determined by a careful study of the expurgated bound, something which could have been done at least ten years before Lovász's paper settled the question. Then, in Section III we present an improved lower bound for the case of even q, showing that it also is a precisely shifted version of the expurgated bound for the BSC. The technique also applies in principle to odd values of q, and we show in particular the result obtained for q = 5. This result also provides an elementary disproof of the conjecture suggested in  that the expurgated bound might be asymptotically tight when computed on arbitrarily large blocks, a conjecture which had already been disproved in  by means of algebraic-geometric codes.
In Section IV we discuss upper bounds. Section IV-A shows an error-exponent bound obtained by extracting a binary subcode. Then in Section IV-B we present a new upper bound for the case of odd q based on the minimum distance of codes. We use Delsarte's linear programming method , combining the construction used by Lovász  for bounding the graph capacity with the construction used by McEliece-Rodemich-Rumsey-Welch  for bounding the minimum distance of codes in Hamming spaces. Finally, in Section IV-C we give an improved upper bound for the case of odd q and ε = 1/2, following ideas of Litsyn , see also Barg-McGregor , which in turn are based on estimates for the spectra of codes originating in Kalai-Linial .
II Classical bounds and the Shannon-Gallager-Berlekamp conjecture
II-A Background on random coding bounds
In  Gallager showed that for an arbitrary DMC there exists a blocklength-n code of rate R with average probability of error bounded by
For low rates Gallager also proved an improved (expurgated) bound given by:
for any n which is a multiple of ℓ, where ℓ is an arbitrary positive integer and
and where W^ℓ is the ℓ-fold memoryless extension of W. This results in the following lower bound on the reliability function:
where the equality in (11) follows from super-additivity in ℓ and Fekete's lemma. (Footnote 1: Super-additivity follows from taking P_{ℓ1+ℓ2} = P_{ℓ1} × P_{ℓ2}, where P_{ℓ1} and P_{ℓ2} are optimal inputs for lengths ℓ1 and ℓ2, respectively.) For a general channel, computing the multiletter exponent is infeasible due to the maximization over all ℓ-letter distributions, and hence the bound is most commonly used in the weakened form obtained by replacing it with its single-letter (ℓ = 1) version. (Footnote 2: Note that the exponent in (3) does not change if we propose a similar ℓ-letter extension: the optimal distribution may always be chosen to be a product of single-letter ones.)
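Fekete's lemma states that if a_{ℓ1+ℓ2} ≥ a_{ℓ1} + a_{ℓ2} for all ℓ1, ℓ2, then a_ℓ/ℓ converges to sup_ℓ a_ℓ/ℓ. A toy numerical illustration (the sequence is our own choice, not from the paper):

```python
import math

# Toy super-additive sequence: a(n) = c*n - sqrt(n) satisfies
# a(m) + a(n) <= a(m + n), because sqrt(m) + sqrt(n) >= sqrt(m + n).
c = 2.0
def a(n: int) -> float:
    return c * n - math.sqrt(n)

# Super-additivity on a grid, and convergence of a(n)/n to sup a(n)/n = c.
assert all(a(m) + a(n) <= a(m + n) for m in range(1, 40) for n in range(1, 40))
print(a(10**6) / 10**6)  # 1.999, approaching c = 2
```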
For understanding our results it is important to elaborate on Gallager's proof of (5). Consider an arbitrary blocklength-n code with codewords x_1, …, x_M and a maximum-likelihood decoder. Define
to be the probability of decoding codeword x_j when x_i was sent. A standard upper bound on this probability [3, (5.3.4)] is given by
where we agree that −log 0 = ∞, and d is a semidistance defined between input symbols as d(x, x') = −log Σ_y √(W(y|x) W(y|x'))
and extended additively to sequences in
The average probability of error of the code can then be bounded using the union bound as
where S_d is the distance spectrum of the code
From expression (16) one may derive existence results for good codes by, for example, selecting the codewords randomly according to some i.i.d. distribution and averaging (16). Gallager  observed that for low rates the dominant term in the summation may correspond to distances d which are atypically small. By expurgating from the code all pairs of codewords at distances below an appropriate threshold he obtained the exponential improvement (5).
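The expurgation step can be illustrated by a toy experiment (parameters and helper names are ours, not from the paper): draw i.i.d. random binary codewords, then greedily delete codewords involved in pairs at atypically small Hamming distance:

```python
import random

def expurgate(codewords, dmin):
    """Greedily keep codewords so that all pairwise Hamming distances are >= dmin."""
    kept = []
    for c in codewords:
        if all(sum(a != b for a, b in zip(c, k)) >= dmin for k in kept):
            kept.append(c)
    return kept

random.seed(0)
n, M = 16, 64
code = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(M)]
good = expurgate(code, dmin=4)
# A sizable fraction of codewords survives while the minimum distance improves.
print(len(good))
```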
Remark 1 (Shannon-Gallager-Berlekamp conjecture)
In  it was conjectured that the hard-to-evaluate quantity (11) equals the true reliability function for rates below the critical rate. (Footnote 3: Quoting from : “The authors would all tend to conjecture […] As yet there is little concrete evidence for this conjecture.”) For symmetric channels it would be implied by the (conjectured) tightness of the Gilbert-Varshamov bound. The conjecture was disproved by Katsman, Tsfasman and Vlăduţ  using algebraic-geometric codes which also beat the Gilbert-Varshamov bound (for alphabets with sufficiently many symbols). To the best of our knowledge, no other disproof is known in the literature. The bound we provide in the next Section proves in particular that the conjectured equality fails in some rate range for all typewriter channels for which we could compute the multiletter expurgated exponent exactly (namely, q even or q = 5), and hence it offers a second disproof of the conjecture. The main innovation of our approach is that our ensemble of codewords has carefully designed dependence between codewords. Otherwise, we still rely on (16).
II-B Evaluating classical bounds for the typewriter channel
To calculate the random-coding exponent one needs to notice that, due to the symmetry of the typewriter channel (1), the optimal input distribution is the uniform one. In this way we get the exponent in parametric form over ρ ∈ [0, 1], cf. [15, (46)-(50)]:
One should notice that this coincides with the random coding exponent for the BSC with crossover probability ε, just shifted by log(q/2) on the rate axis (extending, of course, the straight-line portion at low rates down to R = 0).
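This shift property is easy to check numerically. The sketch below (our own code) computes Gallager's random coding exponent for the BSC, E_r(R) = max_{0≤ρ≤1} [E_0(ρ) − ρR] with E_0(ρ) = ρ − (1+ρ) log2(ε^{1/(1+ρ)} + (1−ε)^{1/(1+ρ)}), and obtains the typewriter exponent by evaluating it at R − log(q/2):

```python
import math

def E0_bsc(rho: float, eps: float) -> float:
    """Gallager's E_0 for the BSC with uniform input, in bits."""
    s = 1 / (1 + rho)
    return rho - (1 + rho) * math.log2(eps ** s + (1 - eps) ** s)

def Er_bsc(R: float, eps: float, grid: int = 2000) -> float:
    """Random coding exponent E_r(R) = max over rho in [0, 1] of E_0(rho) - rho*R."""
    return max(E0_bsc(k / grid, eps) - (k / grid) * R for k in range(grid + 1))

def Er_typewriter(R: float, eps: float, q: int) -> float:
    """Shifted-BSC form of the random coding exponent for the q-ary typewriter channel."""
    return Er_bsc(R - math.log2(q / 2), eps)
```

The shift is built into the last function by construction; the nontrivial content of the observation above is that this shifted expression coincides with the parametric form just derived.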
A general upper bound on E(R) is the so-called sphere-packing bound , which for the typewriter channel can be computed in a similar parametric form over ρ ≥ 0:
We now proceed to evaluating the expurgated bound. As mentioned above, evaluating the multiletter expurgated exponent is generally non-trivial due to the necessity of optimizing over multi-letter distributions. We need, therefore, to use the special structure of the channel. For the BSC with parameter ε, for example, Jelinek  proved that the exponent does not depend on ℓ and takes the form
and δ_GV(R) is the Gilbert-Varshamov distance for q-ary codes, defined by the condition R = log q − h(δ_GV) − δ_GV log(q − 1).
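In the binary case the defining condition reduces to h(δ_GV) = 1 − R, which can be inverted by bisection (a sketch with our own helper names):

```python
import math

def h2(x: float) -> float:
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def delta_gv(R: float, tol: float = 1e-12) -> float:
    """Binary Gilbert-Varshamov distance: the delta in [0, 1/2] with h2(delta) = 1 - R."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h2(mid) < 1 - R:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(delta_gv(0.5))  # ≈ 0.1100, since h2(0.11) ≈ 0.4999
```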
We next present our result on finding the multiletter expurgated exponent for typewriter channels.
Let ℓ0 = 1 if q is even and ℓ0 = 2 if q is odd. Define θ = q/2 for even q and θ = √5 for q = 5; then
In short, for even q the expurgated bound “single-letterizes”, while for q = 5 the asymptotics is already achieved at ℓ = 2. Note that for q = 5 we do not compute the exponent for odd values of ℓ, but due to super-additivity we may compute the limit along the subsequence of even ℓ. The θ we defined is precisely the Lovász θ-function of the q-cycle C_q (a graph with q vertices and q edges connected to form a polygon with q sides). How can the θ-function appear in the study of the expurgated bound, when the latter predates the former by a decade? See Remark 4 below.
If , for even and any
Furthermore, for the second equality holds and the first one holds for even . If the expression above holds for rates outside of the interval , where
Inside the interval we have (again with the same specifications on and ) the parametric representation
where runs in the interval , being such that . Note that for even , since , does not depend on (in particular, ); the functions all have the same shape and simply shift on the axis by 1 (bit) as moves from one even value to the next. For all odd , the above expressions provide an upper bound on for all (which, again, is tight for and even ).
The idea is to use Jelinek's criterion  for single-letter optimality. Consider first the minimization of the quadratic form appearing in the exponent, and note that the ℓ-letter matrix with elements (Σ_y √(W^ℓ(y|i) W^ℓ(y|j)))^{1/ρ}, call it B_ℓ, is the ℓ-fold Kronecker power of the single-letter matrix
Note that if a matrix is positive semidefinite, so is its ℓ-fold Kronecker power. In that case, the quadratic form defining the exponent for any ℓ is a convex function of the input distribution. Jelinek  showed that it is then minimized by a product distribution P^ℓ, where P is optimal for ℓ = 1, and the achieved minimum is just the ℓ-th power of the minimum achieved for ℓ = 1. Thus, if the single-letter matrix is positive semidefinite, then the exponent single-letterizes. Furthermore, in this case the convexity of the quadratic form and the fact that the matrix is circulant imply that the uniform distribution is optimal. Hence, by direct computation,
whenever the matrix is positive semidefinite. The eigenvalues of the matrix are 1 + 2γ^{1/ρ} cos(2πk/q), k = 0, …, q − 1, where γ = √(ε(1−ε)), and the matrix is positive semidefinite whenever 1 + 2γ^{1/ρ} min_k cos(2πk/q) ≥ 0, which proves (27).
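The spectrum of such circulant matrices is elementary: a q × q circulant with unit diagonal and value g on the two off-diagonals adjacent mod q has eigenvalues 1 + 2g cos(2πk/q), k = 0, …, q − 1. A quick numerical check (our own code; γ stands for the adjacent-symbol Bhattacharyya coefficient √(ε(1−ε)) and g = γ^{1/ρ}):

```python
import math

def circulant_eigs(q: int, g: float):
    """Eigenvalues of the q x q circulant matrix with 1 on the diagonal and g
    on the two off-diagonals adjacent mod q: 1 + 2*g*cos(2*pi*k/q)."""
    return [1 + 2 * g * math.cos(2 * math.pi * k / q) for k in range(q)]

eps, rho, q = 0.1, 2.0, 6
gamma = math.sqrt(eps * (1 - eps))   # Bhattacharyya coefficient of adjacent inputs
g = gamma ** (1 / rho)
print(min(circulant_eigs(q, g)))     # negative here: rho = 2 already breaks PSD for q = 6
```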
We now proceed to studying the remaining case. When ρ exceeds the threshold identified above, the matrix has negative eigenvalues and the previous method of evaluating the exponent does not apply. Instead, we observe that the minimum of the quadratic form is non-decreasing in ρ, and hence for
Remark 4 (How Gallager missed discovering Lovász's θ-function)
Let R_0(n) denote the largest rate of a zero-error code of blocklength n, and C_0 the zero-error capacity of the channel (which is also the Shannon capacity of its confusability graph). By taking the input distribution to be uniform on a zero-error code, Gallager  observed already in 1965 that the expurgated exponent is infinite at rates below R_0(ℓ), and thus that the multiletter expurgated bound recovers all zero-error rates. Since Theorem 1 shows that the limiting exponent is finite for all rates above log θ, it also shows that C_0 ≤ log θ; for q = 5, combined with Shannon's lower bound, this determines the zero-error capacity of the pentagon as (1/2) log 5. In particular, we point out that this result is obtained by only using tools which were already available in the '60s, at least ten years before Lovász's paper  appeared (!). Similarly, Theorem 1 implies the upper bound C_0 ≤ log θ(C_q) for q-cycles, where θ is precisely the Lovász theta function. The reader might compare the statement of Theorem 1 with the results in [17, Sec. V.C]; for example, it implies the bound in [17, page 8038, last equation] and it shows that [17, eq. (27)] holds for a typewriter channel with an even number of inputs even though the matrix is not positive semidefinite for all ρ.
Slightly generalizing the reasoning of Gallager and Jelinek leads to the following bound on the Shannon capacity of an arbitrary graph G:
where the supremum is over all probability distributions on the vertices of G and the infimum is over all positive-semidefinite matrices M with unit diagonal such that M_{ij} = 0 whenever i ≠ j and i, j are not adjacent in G. This bound, in turn, is known to be equivalent to Lovász's bound, see [20, Theorem 3].
III New lower (achievability) bound on E(R)
In this section we provide new lower bounds on E(R) for some typewriter channels. Our new bounds are based on the idea of building codes which are unions of cosets of good zero-error codes. In particular, we improve Gallager's expurgated bound in all those cases in which we can evaluate it exactly, namely when q is even or q = 5.
Let q be even. Then, for R ≥ log(q/2) we have the bound
where E_ex^BSC is the expurgated bound of the binary symmetric channel with crossover probability ε given in (24).
We upper bound the error probability for a code by using a standard union bound on the probability of confusion between single pairs of codewords (16). The code is built using a Gilbert-Varshamov-like procedure, though we carefully exploit the properties of the channel to add some structure to the random code (i.e., we introduce dependence among codewords) and obtain better results than just picking independent random codewords.
A code C is composed of cosets of the zero-error code {0, 2, …, q−2}^n. In particular, let
where B is a binary code and where the sum is the ordinary sum of vectors in Z_q^n. It is easy to see that if B is linear over Z_2 then C is linear over Z_q. This is because the zero-error code is linear and the q-ary sum of two codewords in C can be decomposed as the sum of a codeword in the zero-error code and a codeword in B. In this case, then, for the spectrum components in (16) we have the simpler expression
where w(c) is the d-weight of codeword c. We can now relate the spectrum of C under the semidistance d with the spectrum of B under the usual Hamming metric. We observe that any codeword of B of Hamming weight w leads to 2^w codewords of C of weight w·d(0,1) and to codewords of infinite weight. So, we can write
where A_w is the number of codewords in B of Hamming weight w. Let now R_B be the rate of B. It is known from the Gilbert-Varshamov procedure (see for example  and [22, Sec. II.C]) that, as n → ∞, binary linear codes of rate R_B exist whose spectra satisfy
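As a concrete illustration of computing the Hamming spectrum of a linear code from its generator matrix (a toy example of ours, using the [7,4] Hamming code rather than a GV-type random code):

```python
from itertools import product

# Generator matrix of the binary [7,4] Hamming code (one common choice).
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def spectrum(G, n):
    """Hamming weight distribution A_w of the linear code generated by G."""
    k = len(G)
    A = [0] * (n + 1)
    for u in product((0, 1), repeat=k):
        c = [sum(u[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
        A[sum(c)] += 1
    return A

print(spectrum(G, 7))  # [1, 0, 0, 7, 7, 0, 0, 1]: weight enumerator 1 + 7z^3 + 7z^4 + z^7
```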
Such binary codes of rate R_B, used in the role of B in (42), lead to codes C with rate R = log(q/2) + R_B whose error exponent can be bounded to first order by the leading term in the summation (16). Using (43) and (44) we find
The argument of the maximum is increasing for where it achieves the maximum value
which, using , simplifies to
This is thus the maximum in (45) if or, equivalently, if
Otherwise, the maximum in (45) is achieved at and has value .
So, we have the bound
The expression on the right hand side is simply E_ex^BSC(R − log(q/2)) as defined in equation (24). \qed
A graphical comparison of our bound and the standard expurgated bound is shown in Figure 3. Note that the straight-line portion of our bound coincides with a portion of the straight line in the standard expurgated bound as in (32). However, the rate value at which the standard expurgated bound departs from the straight line is strictly smaller than the value at which our bound does (q even). A comparison of these two quantities for different q is given in Figure 1. Finally, Figure 2 shows a comparison of the lower bounds on E(R) at rates near the zero-error capacity, for even q and varying ε, which shows that our bound is always strictly better than the standard expurgated bound.
It is a remarkable fact that our bound corresponds exactly to the expurgated bound of a binary symmetric channel with crossover probability ε shifted by log(q/2) on the rate axis. On one hand, it is not very surprising that a bound for binary codes shows up, given the construction we used in (42). On the other hand, it is curious to observe that we obtain specifically the expression of the BSC: the coefficient 2^w which relates the spectrum of C to that of B in (43) leads to the factor 2 inside the logarithm in (45), thus replacing the quantity √(ε(1−ε)) which has to be used for the typewriter channel with the quantity 2√(ε(1−ε)) which appears in the expurgated bound of the BSC.
We think it is reasonable to consider the bound given in Theorem 2 as the correct modification of the expurgated bound for these particular channels. It is interesting to observe that the derived bound does not really use constructions which are totally out of reach for the standard expurgated bound; the zero-error code used in (42) is in fact also “found” by the standard expurgated bound, as shown in Section II. However, this zero-error code shows up in the standard expurgated bound only at very low rates, while our procedure shows that it is useful even at higher rates. It is rather natural to ask, then, how the expurgated bound should be modified in general to exploit, at a given rate R, zero-error codes which would usually appear in that bound only at lower rates.
In the same way as the bound in Theorem 2 is a log(q/2)-shifted version of the expurgated bound for the BSC, it was already observed after equations (17) and (23) that the random coding bound and the sphere packing bound are also log(q/2)-shifted versions of the corresponding BSC bounds. In particular, we find that at rate R = log(q/2) our lower bound has value precisely half the value of the sphere packing bound, as happens at R = 0 for the BSC. However, while closing the gap at R = 0 for the BSC is essentially trivial, for the typewriter channel it seems to be a harder problem. See Section II and Remark 9 in Section IV-C.
For odd values of q, deriving a corresponding lower bound on E(R) is difficult in general, since good general zero-error codes are either not known or have a rather complicated structure. One particular exception is the case q = 5, for which an asymptotically optimal zero-error code is known.
For q = 5, in the range
we have the lower bound
A comparison of this bound with the expurgated bound is shown in Figure 6. Note that at the upper extreme of the interval considered in Theorem 3 the given bound touches the expurgated bound of Theorem 1. A comparison of this quantity with the zero-error capacity is shown in Figure 4. Figure 5 shows a comparison of the new bound with the expurgated bound as the rate approaches the zero-error capacity C_0.
We start from equation (16), but restated for codes of even length 2m. In particular, consider linear codes over Z_5 with a generator matrix of the form
where I is the identity matrix and A is a matrix of full rank over Z_5. Note that this corresponds to taking cosets of the m-fold cartesian power of Shannon's zero-error code of length 2. Since we focus again on linear codes, the S_d's in (16) are still determined by the weight distribution of the code.
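Shannon's zero-error code of length 2 for q = 5 can be taken as the five pairs (i, 2i mod 5); the sketch below (our own code) verifies that no two of its codewords are confusable, i.e. every pair of codewords differs by a non-adjacent symbol in at least one coordinate:

```python
def confusable_symbols(a: int, b: int, q: int = 5) -> bool:
    """Two input symbols can produce a common output iff they are equal or adjacent mod q."""
    return (a - b) % q in (0, 1, q - 1)

def confusable_words(x, y, q: int = 5) -> bool:
    """Two input words are confusable iff they are confusable in every coordinate."""
    return all(confusable_symbols(a, b, q) for a, b in zip(x, y))

# Shannon's zero-error code of length 2 for the pentagon: (i, 2i mod 5).
code = [(i, (2 * i) % 5) for i in range(5)]
ok = all(not confusable_words(x, y) for x in code for y in code if x != y)
print(ok)  # True: 5 messages in 2 channel uses, i.e. rate (1/2) log 5
```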
We now proceed to the study of . We can decompose any information sequence in two parts, , with and . The associated codeword can be correspondingly decomposed in two parts with and . Call . We now relate the weight to the Hamming weight and to the form of . Note in particular that we can write
Note first that if for some . So, for the study of we need only consider the cases . Consider first the case when . If then while if then is infinite. So, if one choice of gives no contribution to while all other choices lead to , and hence give no contribution to for any finite . Consider then the case of a component . It is not too difficult to check that one choice of in gives , one gives and or vice-versa, and the remaining one gives . So, if one choice of contributes to , one choice of contributes , while all other choices lead to , and hence give no contribution to for any finite .
So, for a fixed of Hamming weight , and for a fixed , there are vectors which give codewords of weight . If is the number of sequences which lead to a of Hamming weight , then we have
But this is now simply the Hamming spectrum component of the linear code with generator matrix A, and it is known (see [9, Prop. 1]) that as we let the dimensions grow to infinity with fixed ratio, matrices A exist for which
Defining , and , the probability of error is bounded to the first order in the exponent by the largest term in the sum (53) as
The maximum over is obtained by maximizing , which is solved by with maximum value , independently of . So, we are left with the maximization
The argument is increasing for , where it achieves the maximum value , and decreasing for larger values of . So, the maximizing is if and otherwise. Combining these facts, noticing that , we find
Considering that the block length is 2m and computing the rate of the global code accordingly, after some simple algebraic manipulations we obtain
The first part of the bound coincides with the standard straight line portion of the expurgated bound, while the second part is the claimed new bound. \qed
IV New upper (converse) bounds on E(R)
We present three different new upper bounds below, each of which is the tightest known bound for certain ranges of the parameters.
IV-A Bound via a reduction to binary codes
We have already evaluated the sphere-packing bound above (23). For general channels, the sphere packing bound is known to be weak at low rates. In particular, it was proved by Berlekamp  that the expurgated bound is tight at R = 0 for all channels with C_0 = 0. For general channels with positive C_0 no similar result is known. In the case of typewriter channels, standard methods can be adapted to give the following result.
For and any we have