Attaining Capacity with Algebraic Geometry Codes through the $(U \mid U+V)$ Construction and Koetter-Vardy Soft Decoding
In this paper we show how to attain the capacity of discrete symmetric channels with polynomial time decoding complexity by considering iterated $(U \mid U+V)$ constructions with Reed-Solomon code or algebraic geometry code components. These codes are decoded with a recursive computation of the a posteriori probabilities of the code symbols together with the Koetter-Vardy soft decoder used for decoding the code components in polynomial time. We show that when the number of levels of the iterated $(U \mid U+V)$ construction tends to infinity, we attain the capacity of any discrete symmetric channel in this way. This result follows from the polarization theorem together with a simple lemma explaining how the Koetter-Vardy decoder behaves for Reed-Solomon codes of rate close to $1$. However, even if this way of attaining the capacity of a symmetric channel is essentially the Arıkan polarization theorem, there are some differences with standard polar codes. Indeed, with this strategy we can operate successfully close to channel capacity even with a small number of levels of the iterated $(U \mid U+V)$ construction, and the probability of error decays quasi-exponentially with the codelength in such a case (i.e. exponentially if we forget about the logarithmic terms in the exponent). We can even improve on this result by considering the algebraic geometry codes constructed in [TVZ82]. In such a case, the probability of error decays exponentially in the codelength for any rate below the capacity of the channel. Moreover, when comparing this strategy to Reed-Solomon codes (or more generally algebraic geometry codes) decoded with the Koetter-Vardy decoding algorithm, it not only improves the noise level that the code can tolerate, it also results in a significant complexity gain.
Improving upon the error correction performance of Reed-Solomon codes.
Reed-Solomon codes are among the most extensively used error correcting codes. It has long been known how to decode them up to half the minimum distance. This gives a decoding algorithm that is able to correct a fraction $\frac{1-R}{2}$ of errors in a Reed-Solomon code of rate $R$. However, it was only in the late nineties that a breakthrough was obtained in this setting, with Sudan’s algorithm [Sud97] and its improvement in [GS99], which showed how to go beyond this barrier with an algorithm which in its [GS99] version decodes any fraction of errors smaller than $1-\sqrt{R}$. This exceeds the minimum distance bound in the whole region of rates $R \in [0,1)$. Later on, it was shown that this decoding algorithm could also be modified slightly in order to cope with soft information on the errors [KV03a]. A few years later, it was also realized by Parvaresh and Vardy in [PV05] that by a slight modification of Reed-Solomon codes and by an increase of the alphabet size it was possible to beat the $1-\sqrt{R}$ decoding radius. Their new family of codes is list decodable beyond this radius for low rate. Then, Guruswami and Rudra [GR06] improved on these codes by presenting a new family of codes, namely folded Reed-Solomon codes, with a polynomial time decoding algorithm achieving the list decoding capacity $1-R-\epsilon$ for every rate $R$ and $\epsilon>0$.
The initial motivation of this paper is to present another modification of Reed-Solomon codes that improves the fraction of errors that can be corrected. It consists in using them in a $(U \mid U+V)$ construction. In other words, we choose $U$ and $V$ in this construction to be Reed-Solomon codes. We will show that, in the low rate regime, this class of codes slightly outperforms a Reed-Solomon code decoded with the Guruswami-Sudan decoder. The point is that this $(U \mid U+V)$ code can be decoded in two steps:
First, by subtracting the left part from the right part of the received vector and decoding the result with respect to $V$. In such a case, we are left with decoding a Reed-Solomon code with about twice as many errors.
Secondly, once we have recovered the right part of the codeword, we can get a word which should match two copies of a same word $\mathbf{u}$ of $U$. We can model this decoding problem by having some soft information on the received word when we have sent $\mathbf{u}$.
It turns out that this channel error model is much less noisy than the original $q$-ary symmetric channel we started with. This soft information can be used in Koetter and Vardy’s decoding algorithm. By this means we can choose $U$ to be a Reed-Solomon code of much bigger rate than $V$. All in all, it turns out that by choosing $U$ and $V$ with appropriate rates we can beat the $1-\sqrt{R}$ bound of Reed-Solomon codes in the low-rate regime.
It should be noted however that beating this bound comes at a cost: the algorithm no longer works, as in the aforementioned papers [Sud97, GS99, PV05, GR06], for every error of a given weight (the so-called adversarial error model), but only with probability $1-o(1)$ for errors of a given weight. However, contrary to [PV05, GR06], which result in a significant increase of the alphabet size of the code, our alphabet size actually decreases when compared to a Reed-Solomon code: it can be half of the code length and can be even smaller when we apply this construction recursively. Indeed, we will show that we can even improve the error correction performance by applying this construction again to the $U$ and $V$ components, i.e. we can choose $U$ to be a $(U_1 \mid U_1+V_1)$ code and we replace in the same way the Reed-Solomon code $V$ by a $(U_2 \mid U_2+V_2)$ code where $U_1$, $U_2$, $V_1$ and $V_2$ are Reed-Solomon codes (we will say that these $U_i$’s and $V_i$’s are the constituent codes of the iterated $(U \mid U+V)$-construction). This again slightly improves the decoding performance in the low rate regime.
Attaining the capacity by letting the depth of the construction go to infinity with an exponential decay of the probability of error after decoding.
The first question raised by these results is to understand what happens when we apply this iterative construction a number of times which goes to infinity with the codelength. In this case, the channels faced by the constituent Reed-Solomon codes polarize: they become either very noisy channels or very clean channels of capacity close to $1$. This is precisely the polarization phenomenon discovered by Arıkan in [Arı09]. Indeed this iterated $(U \mid U+V)$-construction is nothing but a standard polar code when the constituent codes are Reed-Solomon codes of length $1$ (i.e. just a single symbol). The polarization phenomenon, together with a result proving that the Koetter-Vardy decoder is able to operate successfully at rates close to $1$ for channels of capacity close to $1$, can be used to show that it is possible to choose the rates of the constituent Reed-Solomon codes in such a way that the code construction together with the Koetter-Vardy decoder is able to attain the capacity of symmetric channels. On a theoretical level, however, proceeding in this way would not change the asymptotics of the decay of the probability of error after decoding: the codes obtained in this way would still behave as polar codes and would in particular have a probability of error which decays exponentially with respect to (essentially) the square root of the codelength.
The situation changes completely however when we allow ourselves to change the input alphabet of the channel and/or to use Algebraic Geometry (AG) codes. The first point can be achieved by grouping symbols together and viewing each group as a symbol of a larger alphabet. The second point is also relevant here since the Koetter and Vardy decoder also applies to AG codes (see [KV03b]) with only a rather mild penalty in the error-correction capacity related to the genus of the curve used for constructing the code. Both approaches can be used to overcome the limitation of having constituent codes in the iterated $(U \mid U+V)$-construction whose length is upper-bounded by the alphabet size. When we are allowed to choose long enough constituent codes the asymptotic behavior changes radically. We will indeed show that if we insist on using Reed-Solomon codes in the code construction we obtain a quasi-exponential decay of the probability of error in terms of the codelength (i.e. exponential if we forget about the logarithmic terms in the exponent) and an exponential decay if we use the right AG codes. This improves very significantly upon polar codes. Not only are we able to attain the channel capacity with a polynomial time decoding algorithm with this approach, but we are also able to do so with an exponential decay of the probability of error after decoding. In essence, this sharp decay of the probability of error after decoding is due to a result of this paper (see Theorems 7 and 11) showing that, even if the Koetter-Vardy decoder is not able to attain the capacity with a probability of error going to zero as the codelength goes to infinity, its probability of error decays exponentially in $\epsilon^2 n$, where $n$ is the codelength and $\epsilon$ is the difference between a quantity which is strictly smaller than the capacity of the channel and the code-rate.
Notation. Throughout the paper we will use the following notation.
A linear code of length $n$, dimension $k$ and distance $d$ over a finite field $\mathbb{F}_q$ is referred to as an $[n,k,d]_q$-code.
The concatenation of two vectors $\mathbf{x}$ and $\mathbf{y}$ is denoted by $(\mathbf{x} \mid \mathbf{y})$.
For a vector $\mathbf{x}$ we either denote by $x(i)$ or by $x_i$ the $i$-th coordinate of $\mathbf{x}$. We use the first notation when the subscript is already used for other purposes or when there is already a superscript for $\mathbf{x}$.
For a vector we denote by the vector .
For a matrix we denote by the -th column of .
By some abuse of terminology, we also view a discrete memoryless channel $W$ with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ as an $|\mathcal{X}| \times |\mathcal{Y}|$ matrix whose $(x,y)$ entry is denoted by $W(y \mid x)$, which is defined as the probability of receiving $y$ given that $x$ was sent. We will identify the channel with this matrix later on.
2 The code construction and the link with polar codes
Iterated $(U \mid U+V)$ codes. This section details the code construction we deal with. It can be seen as a variation of polar codes and is nothing but an iterated $(U \mid U+V)$ code construction. We first recall the definition of a $(U \mid U+V)$ code. We refer to [MS86, Th.33] for the statements on the dimension and minimum distance that are given below.
Definition 1 ($(U \mid U+V)$ code).
Let $U$ and $V$ be two codes of the same length and defined over the same finite field $\mathbb{F}_q$. We define the $(U \mid U+V)$-construction of $U$ and $V$ as the linear code:
$$(U \mid U+V) = \{(\mathbf{u} \mid \mathbf{u}+\mathbf{v}) : \mathbf{u} \in U,\ \mathbf{v} \in V\}.$$
The dimension of the $(U \mid U+V)$ code is $k_U + k_V$ and its minimum distance is $\min(2 d_U, d_V)$ when the dimensions of $U$ and $V$ are $k_U$ and $k_V$ respectively, the minimum distance of $U$ is $d_U$ and the minimum distance of $V$ is $d_V$.
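The dimension and minimum distance statements of Definition 1 can be checked by brute force on a toy example. The sketch below is only an illustration over $\mathbb{F}_2$: the helper names and the choice of $U$ (a $[4,3,2]$ parity-check code) and $V$ (a $[4,1,4]$ repetition code) are assumptions made here, not objects taken from the paper.

```python
from itertools import product

def span_gf2(generator_rows):
    """Return the set of all GF(2) linear combinations of the given rows."""
    n = len(generator_rows[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(generator_rows)):
        word = [0] * n
        for c, row in zip(coeffs, generator_rows):
            if c:
                word = [(a + b) % 2 for a, b in zip(word, row)]
        words.add(tuple(word))
    return words

def u_u_plus_v(gen_u, gen_v):
    """Generator rows of (U | U+V): (g | g) for g in U and (0 | h) for h in V."""
    n = len(gen_u[0])
    return [row + row for row in gen_u] + [[0] * n + row for row in gen_v]

def min_distance(words):
    """Minimum Hamming weight over the nonzero codewords of a linear code."""
    return min(sum(w) for w in words if any(w))

gen_u = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]  # U: [4, 3, 2] parity-check code
gen_v = [[1, 1, 1, 1]]                              # V: [4, 1, 4] repetition code
code = span_gf2(u_u_plus_v(gen_u, gen_v))
dimension = len(code).bit_length() - 1              # log2 of the number of codewords
distance = min_distance(code)
print(dimension, distance)
```

Here $k_U + k_V = 3 + 1$ and $\min(2 d_U, d_V) = \min(4, 4)$, so the resulting code should have length 8, dimension 4 and minimum distance 4.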
The codes we are going to consider here are iterated constructions defined by
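Definition 2 (iterated $(U \mid U+V)$-construction of depth $\ell$).
An iterated $(U \mid U+V)$-code $U_{\varepsilon}$ of depth $\ell$ (where $\varepsilon$ denotes the empty binary word) is defined from a set of $2^\ell$ codes $\{U_{\mathbf{x}} : \mathbf{x} \in \{0,1\}^\ell\}$ which have all the same length and are defined over the same finite field $\mathbb{F}_q$ by using the recursive definition
$$U_{\mathbf{x}} = (U_{\mathbf{x}0} \mid U_{\mathbf{x}0} + U_{\mathbf{x}1}) \quad \text{for all } \mathbf{x} \in \{0,1\}^i,\ 0 \le i < \ell.$$
The codes $U_{\mathbf{x}}$ for $\mathbf{x} \in \{0,1\}^\ell$ are called the constituent codes of the construction.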
In other words, an iterated $(U \mid U+V)$-code of depth $1$ is nothing but a standard $(U \mid U+V)$-code and an iterated $(U \mid U+V)$-code of depth $2$ is a $(U \mid U+V)$-code where $U$ and $V$ are themselves $(U \mid U+V)$-codes.
Graphical representation of an iterated code. Iterated $(U \mid U+V)$-codes can be represented by complete binary trees in which each node has exactly two children except the leaves. A $(U \mid U+V)$-code is represented by a node with two children, the left child representing the $U$ code and the right child representing the $V$ code. The simplest case is given in Figure 1. Another example is given in Figure 2; it represents an iterated $(U \mid U+V)$-code by a binary tree whose leaves are the constituent codes of this construction.
Standard polar codes (i.e. the ones that were constructed by Arıkan in [Arı09]) are clearly a special case of the iterated $(U \mid U+V)$ construction. Indeed such a polar code of length $2^\ell$ can be viewed as an iterated $(U \mid U+V)$-code of depth $\ell$ where the constituent codes are just codes of length $1$. In other words, standard polar codes correspond to binary trees where all leaves are just single bits.
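The tree description translates directly into a recursive encoder: a node's codeword is obtained from the codewords of its left ($U$) and right ($V$) subtrees. The following is a minimal sketch under assumptions made here for illustration: symbols are integers modulo a prime `q`, the leaves are given in left-to-right order, and the function names are hypothetical.

```python
def combine(u, v, q):
    """Form the word (u | u+v) with symbol-wise addition modulo q."""
    return u + [(a + b) % q for a, b in zip(u, v)]

def iterated_encode(leaves, q):
    """Encode from 2**depth constituent codewords listed left to right.

    The left half of the leaves builds the U part and the right half
    builds the V part, mirroring the binary tree representation.
    """
    if len(leaves) == 1:
        return list(leaves[0])
    half = len(leaves) // 2
    u = iterated_encode(leaves[:half], q)
    v = iterated_encode(leaves[half:], q)
    return combine(u, v, q)

# Depth 2 with leaves of length 1 over F_2: the single-bit (polar-like) case.
single_bits = iterated_encode([[1], [0], [1], [1]], 2)
# Depth 1 with leaves of length 2 over F_3.
ternary = iterated_encode([[1, 2], [2, 2]], 3)
print(single_bits, ternary)
```

With leaves of length 1 the recursion reduces to a transform on single symbols, which is the situation the text identifies with standard polar codes; longer leaves simply make every level operate on blocks.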
Recursive soft decoding of an iterated $(U \mid U+V)$-code. As explained in the introduction, our approach is to use the same decoding strategy as for Arıkan polar codes (that is, his successive cancellation decoder) but with leaves that are codes which are much longer than single symbols. This will have the effect of lowering rather significantly the probability of error after decoding when compared to standard polar codes. It will be helpful to change slightly the way the successive cancellation decoder is generally explained. Indeed this decoder can be viewed as an iterated decoder for a $(U \mid U+V)$-code, where decoding the $(U \mid U+V)$-code consists in first decoding the $V$ code and then the $U$ code, with a decoder using soft information in both cases. This decoder was actually considered before the invention of polar codes and has been used, for instance, for decoding Reed-Muller codes based on the fact that they are $(U \mid U+V)$ codes [Dum06, DS06].
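Let us recall how such a $(U \mid U+V)$-decoder works. Suppose we transmit the codeword $(\mathbf{u} \mid \mathbf{u}+\mathbf{v})$ over a noisy channel and we receive the vector $\mathbf{y} = (\mathbf{y}_1 \mid \mathbf{y}_2)$. We denote by $p(b \mid a)$ the probability of receiving $b$ when $a$ was sent and assume a memoryless channel here. We also assume that all the codeword symbols $u_i$ and $v_i$ are uniformly distributed.
We first decode $V$. We compute the probabilities $\mathrm{Prob}(v_i = \alpha \mid y_1(i), y_2(i))$ for all positions $i$ and all $\alpha$ in $\mathbb{F}_q$. Under the assumption that we use a memoryless channel and that the $u_i$’s and the $v_i$’s are uniformly distributed for all $i$, it is straightforward to check that this probability is given by
$$\mathrm{Prob}(v_i = \alpha \mid y_1(i), y_2(i)) = \frac{\sum_{\beta \in \mathbb{F}_q} p(y_1(i) \mid \beta)\, p(y_2(i) \mid \alpha+\beta)}{\left(\sum_{\beta \in \mathbb{F}_q} p(y_1(i) \mid \beta)\right)\left(\sum_{\gamma \in \mathbb{F}_q} p(y_2(i) \mid \gamma)\right)} \qquad (1)$$
We now use Arıkan’s successive decoding approach and assume that the $V$ decoder was correct and thus that we have recovered $\mathbf{v}$. We now compute for all $\alpha \in \mathbb{F}_q$ and all coordinates $i$ the probabilities $\mathrm{Prob}(u_i = \alpha \mid y_1(i), y_2(i), v_i)$ by using the formula
$$\mathrm{Prob}(u_i = \alpha \mid y_1(i), y_2(i), v_i) = \frac{p(y_1(i) \mid \alpha)\, p(y_2(i) \mid \alpha+v_i)}{\sum_{\beta \in \mathbb{F}_q} p(y_1(i) \mid \beta)\, p(y_2(i) \mid \beta+v_i)} \qquad (2)$$
This can be considered as soft information on $\mathbf{u}$ which can be used by a soft information decoder for $U$.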
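The two probability computations of this decoder can be sketched for a single coordinate as follows. This is an illustration under assumptions made here: the alphabet is the integers modulo a prime `q` (so that symbol addition is addition mod `q`), inputs are uniform, and `lik1[b]`, `lik2[b]` stand for the channel likelihoods of the two received symbols given that `b` was sent. The function names are hypothetical.

```python
def normalize(weights):
    """Scale nonnegative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def v_posteriors(lik1, lik2):
    """P(v = a | y1, y2): sum over u = b of p(y1 | b) * p(y2 | a + b)."""
    q = len(lik1)
    raw = [sum(lik1[b] * lik2[(a + b) % q] for b in range(q)) for a in range(q)]
    return normalize(raw)

def u_posteriors(lik1, lik2, v):
    """P(u = a | y1, y2, v): proportional to p(y1 | a) * p(y2 | a + v)."""
    q = len(lik1)
    raw = [lik1[a] * lik2[(a + v) % q] for a in range(q)]
    return normalize(raw)

# Ternary symmetric channel with error probability 0.1; received y1 = 0, y2 = 1.
lik1 = [0.9, 0.05, 0.05]   # p(y1 = 0 | b) for b = 0, 1, 2
lik2 = [0.05, 0.9, 0.05]   # p(y2 = 1 | b) for b = 0, 1, 2
post_v = v_posteriors(lik1, lik2)
post_u = u_posteriors(lik1, lik2, v=1)
print(post_v, post_u)
```

In this toy case the most likely value of $v$ is 1 with probability $0.815$, which is less reliable than a single channel use ($0.9$), while once $v = 1$ is known the posterior of $u = 0$ rises above $0.99$. This matches the qualitative picture of the text: the $V$ decoder faces a noisier channel and the $U$ decoder a cleaner one.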
This decoder can then be used recursively for decoding an iterated -code. For instance if we denote by an iterated -code of depth derived from the set of codes , the decoding works as follows (we used here the same notation as in Definition 2).
Decoder for . We first compute the probabilities for decoding , this code is decoded with a soft information decoder. Once we have recovered the part (we denote the corresponding codeword by ), we can compute the relevant probabilities for decoding the code. This code is also decoded with a soft information decoder and we output a codeword . All this work allows to recover the codeword denoted by by combining the and part as .
Decoder for . Once the codeword is recovered we can compute the probabilities for decoding the code and we decode this code in the same way as we decoded the code .
Figure 3 gives the order in which we recover each codeword during the decoding process.
When the constituent codes of this recursive $(U \mid U+V)$ construction are just codes of length $1$, it is readily seen that this decoding simply amounts to the successive cancellation decoder of Arıkan. We will be interested in the case where these constituent codes are longer than this. In such a case, the constituent codes have to be codes for which we have an efficient, but possibly suboptimal, decoder which can make use of soft information. Reed-Solomon codes or algebraic geometry codes with the Koetter-Vardy decoder are precisely codes with this kind of property.
Polarization. The probability computations made during the decoding (1) and (2) correspond in a natural way to changing the channel model for the code and for the code. These two channels really correspond to the two channel combining models considered for polar codes. More precisely, if we consider a memoryless channel of input alphabet and output alphabet defined by a transition matrix , then the channel viewed by the decoder, respectively the decoder is a memoryless channel with transition matrix and respectively, which are given by
Here the ’s belong to and the ’s belong to . If we define the channel for recursively by
then the channel viewed by the decoder for one of the constituent codes of an iterated code of depth (with the notation of Definition 2) is nothing but the channel .
The key result used for showing that polar codes attain the capacity is that these channels polarize in the following sense
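Let $q$ be an arbitrary prime. Then for a discrete $q$-ary input channel $W$ of symmetric capacity $C$ (recall that the symmetric capacity of such a channel is defined as the mutual information between a uniform input and the corresponding output of the channel, that is $C = \frac{1}{q} \sum_{x \in \mathbb{F}_q} \sum_{y \in \mathcal{Y}} W(y \mid x) \log_q \frac{W(y \mid x)}{\frac{1}{q}\sum_{x' \in \mathbb{F}_q} W(y \mid x')}$, where $\mathcal{Y}$ denotes the output alphabet of the channel) we have for all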
Here $Z(W)$ denotes the Bhattacharyya parameter of $W$, which is assumed to be a memoryless channel with $q$-ary inputs and outputs in an alphabet $\mathcal{Y}$. It is given by
$$Z(W) = \frac{1}{q(q-1)} \sum_{x \neq x'} \sum_{y \in \mathcal{Y}} \sqrt{W(y \mid x)\, W(y \mid x')}.$$
Recall that this Bhattacharyya parameter quantifies the amount of noise in the channel. It is close to $0$ for channels with very low noise (i.e. channels of capacity close to $1$) whereas it is close to $1$ for very noisy channels (i.e. channels of capacity close to $0$).
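The two extreme regimes of the Bhattacharyya parameter can be checked numerically. The sketch below assumes the usual $q$-ary generalization, namely the average over ordered pairs of distinct inputs of $\sum_y \sqrt{W(y \mid x) W(y \mid x')}$; this normalization and the helper names are assumptions made for illustration.

```python
from math import sqrt

def bhattacharyya(W):
    """Average of sum_y sqrt(W[x][y] * W[x2][y]) over ordered pairs x != x2.

    W[x][y] is the probability of receiving y when x was sent.
    """
    q = len(W)
    total = 0.0
    for x in range(q):
        for x2 in range(q):
            if x != x2:
                total += sum(sqrt(a * b) for a, b in zip(W[x], W[x2]))
    return total / (q * (q - 1))

def qsc(q, p):
    """Transition matrix of a q-ary symmetric channel with error probability p."""
    return [[1 - p if x == y else p / (q - 1) for y in range(q)] for x in range(q)]

z_clean = bhattacharyya(qsc(3, 0.0))      # noiseless channel
z_useless = bhattacharyya(qsc(3, 2 / 3))  # output independent of the input
z_mild = bhattacharyya(qsc(3, 0.1))
print(z_clean, z_mild, z_useless)
```

The noiseless channel gives the value 0 and the channel whose output carries no information gives the value 1, with intermediate noise levels falling strictly in between.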
3 Soft decoding of Reed-Solomon codes with the Koetter-Vardy decoding algorithm
It has been a long standing open problem to obtain an efficient soft-decision decoding algorithm for Reed-Solomon codes until Koetter and Vardy showed in [KV03a] how to modify appropriately the Guruswami-Sudan decoding algorithm in order to achieve this purpose. The complexity of this algorithm is polynomial and we will show here that the probability of error decreases exponentially in the codelength when the noise level is below a certain threshold. Let us first review a few basic facts about this decoding algorithm.
The reliability matrix.
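The Koetter-Vardy decoder [KV03a] is based on a $q \times n$ reliability matrix $\Pi$ of the codeword symbols, computed from the knowledge of the received word $\mathbf{y}$, which is defined by
$$\Pi(\alpha, j) = \mathrm{Prob}(c_j = \alpha \mid y_j), \qquad \alpha \in \mathbb{F}_q,\ 1 \le j \le n.$$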
Recall that the -th column of this matrix is denoted by . It gives the a posteriori probabilities (APP) that the -th codeword symbol is equal to where ranges over .
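We will be particularly interested in the $q$-ary symmetric channel model. The $q$-ary symmetric channel with error probability $p$, denoted by $q\text{-SC}_p$, takes a $q$-ary symbol at its input and outputs either the unchanged symbol, with probability $1-p$, or any of the other $q-1$ symbols, each with probability $\frac{p}{q-1}$. Therefore, if the channel input symbols are uniformly distributed, the reliability matrix $\Pi$ for $q\text{-SC}_p$ is given by
$$\Pi(\alpha, j) = \begin{cases} 1-p & \text{if } \alpha = y_j, \\ \frac{p}{q-1} & \text{otherwise.} \end{cases}$$
Thus, all columns of $\Pi$ are identical up to permutation: each column has one entry equal to $1-p$ and $q-1$ entries equal to $\frac{p}{q-1}$.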
This matrix is used by the Koetter-Vardy decoder to compute a multiplicity matrix that serves as the input to its soft interpolation step. When Reed-Solomon codes are used in a $(U \mid U+V)$ construction and decoded as mentioned before, we will need to understand how the reliability matrix behaves through the decoding process. This is what we will do now.
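As a concrete illustration of the reliability matrix of a $q$-ary symmetric channel, the following sketch builds it from a received word. The representation (a list of `q` rows indexed by symbol, each of length `n`, with symbols encoded as integers `0..q-1`) and the function name are choices made here for illustration only.

```python
def reliability_matrix_qsc(received, q, p):
    """q x n matrix of a posteriori probabilities for a q-ary symmetric channel.

    Entry [alpha][j] is 1 - p when alpha equals the j-th received symbol and
    p / (q - 1) otherwise (uniform channel inputs are assumed).
    """
    return [[1 - p if alpha == y else p / (q - 1) for y in received]
            for alpha in range(q)]

received = [2, 0, 3, 3]
Pi = reliability_matrix_qsc(received, q=4, p=0.1)
columns = [[Pi[alpha][j] for alpha in range(4)] for j in range(len(received))]
print(columns)
```

Each column is a probability vector with a single large entry at the received symbol, so sorting any two columns gives the same list, which is the "identical up to permutation" property.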
Reliability matrix for the $V$-decoder.
We denote the reliability matrix of the decoder by when and are the initial reliability matrices corresponding to the two halves of the received word . From the definition of the reliability matrix and (1) we readily obtain that
Reliability matrix for the $U$-decoder.
Similarly, by using (2) we see that the reliability matrix of the decoder, that we denote by is given by
To simplify notation we will generally avoid the dependency on and and simply write and .
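When does the Koetter-Vardy decoding algorithm succeed?
Let $\mathbf{c}$ be the codeword of a Reed-Solomon code of length $n$ and dimension $k$ that was sent. It is shown in [KV03a] that the Koetter-Vardy decoder outputs a list that contains $\mathbf{c}$ if
$$\frac{\langle \Pi, [\mathbf{c}] \rangle}{\sqrt{\langle \Pi, \Pi \rangle}} \geq \sqrt{k-1} + o(1) \qquad (6)$$
as the codelength $n$ tends to infinity, where $[\mathbf{c}]$ represents a $q \times n$ matrix with entries $[\mathbf{c}]_{\alpha,j} = 1$ if $c_j = \alpha$, and $0$ otherwise; and $\langle A, B \rangle$ denotes the inner product of the two matrices $A$ and $B$, i.e. $\langle A, B \rangle = \sum_{i,j} a_{i,j} b_{i,j}$.
The algorithm uses a parameter $s$ (the total number of interpolation points counted with multiplicity). The little-O depends on the choice of this parameter and the parameters $n$ and $q$.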
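The quantity compared with $\sqrt{k-1}$ in the success condition is easy to evaluate on a toy example. The sketch below ignores the $o(1)$ term, so it only illustrates the asymptotic condition and is not a decoder; the function names and the small $5$-ary example are assumptions made for illustration.

```python
from math import sqrt

def kv_score(Pi, codeword):
    """Return <Pi, [c]> / sqrt(<Pi, Pi>) for a q x n reliability matrix Pi."""
    numerator = sum(Pi[symbol][j] for j, symbol in enumerate(codeword))
    denominator = sqrt(sum(x * x for row in Pi for x in row))
    return numerator / denominator

def kv_condition_holds(Pi, codeword, k):
    """Check score >= sqrt(k - 1), dropping the o(1) term of the condition."""
    return kv_score(Pi, codeword) >= sqrt(k - 1)

q, p = 5, 0.25
sent = [0, 0, 0, 0]
received = [0, 0, 0, 1]   # one symbol error out of four positions
# Reliability matrix of a q-ary symmetric channel for this received word.
Pi = [[1 - p if a == y else p / (q - 1) for y in received] for a in range(q)]
score = kv_score(Pi, sent)
print(score, kv_condition_holds(Pi, sent, 2), kv_condition_holds(Pi, sent, 4))
```

In this example the score is about $1.52$, so the simplified condition holds for $k = 2$ but not for $k = 4$: lowering the rate makes the condition easier to satisfy at a fixed noise level.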
We need a more precise formulation of the little-O of (6) to understand that we can get arbitrarily close to the lower bound with polynomial complexity. In order to do so, let us provide more details about the Koetter Vardy decoding algorithm. Basically this algorithm starts by computing with Algorithm A of [KV03a, p.2814] from the knowledge of the reliability matrix and for the aforementioned integer parameter a nonnegative integer matrix whose entries sum up to . When goes to infinity becomes proportional to . The cost of this matrix (we will drop the dependency in ) is defined as
where denotes the entry of at row and column and is the all-one matrix. The complexity of the Koetter-Vardy decoding algorithm is dominated by solving a system of linear equations. Then, the number of codewords on the list produced by the Koetter-Vardy decoder for a given multiplicity matrix does not exceed
It is straightforward to obtain from these considerations a soft-decision list decoder with a list which does not exceed some prescribed quantity . Indeed it suffices to increase the value of in [KV03a, Algorithm A] until getting a matrix which is such that
and to use this multiplicity matrix in the Koetter-Vardy decoding algorithm. By following the terminology of [KV03a] we refer to this decoding procedure as algebraic soft-decoding with list size limited to . [KV03a, Theorem 17] explains that convergence to the lower-bound is at least as fast as
Theorem 2 (Theorem 17, [KV03a]).
Algebraic soft-decoding with list size limited to produces a codeword if
where and the constant in depends only on and .
This theorem shows that the size of the list required to approach the asymptotic performance does not depend (directly) on the length of the code, it may depend on the rate of the code and the cardinality of the alphabet though.
As observed in [KV03a], this theorem is a very loose bound. The actual performance of algebraic soft-decoding with list size limited to is usually orders of magnitude better than that predicted by (8). A somewhat better bound is given by [KV03a, (44) p. 2819] where the condition for successful decoding is
where the approximation assumes that which holds for noise levels of practical interest. Note that this strengthens a little bit the constant in that appears in Theorem 2, since it would not depend on anymore.
Decoding capability of the Koetter-Vardy decoder when the channel is symmetric.
The previous formula does not explain directly under which condition on the rate of the Reed-Solomon code decoding typically succeeds (in some sense this would be a “capacity” result for the Koetter-Vardy decoder). We will now derive such a result, which appears to be new (but see the discussion at the end of this section). It will be convenient to restrict slightly the class of memoryless channels we consider; this will simplify the formulas a great deal. The idea underlying this restriction is to make the behavior of the quantity which appears in the condition of successful decoding (6) independent of the codeword which is sent. This is readily obtained by restricting the channel to be weakly symmetric.
Definition 3 (weakly symmetric channel).
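A discrete memoryless channel $W$ with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ is said to be weakly symmetric if and only if there is a partition $\mathcal{Y} = \mathcal{Y}_1 \cup \dots \cup \mathcal{Y}_r$ of the output alphabet such that all the submatrices $W_i = (W(y \mid x))_{x \in \mathcal{X},\, y \in \mathcal{Y}_i}$ are symmetric. A matrix is said to be symmetric if all of its rows are permutations of each other, and all of its columns are permutations of each other.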
Such a channel is called symmetric in [Gal68, p.94]. We avoid using the same terminology as Gallager since “symmetric channel” is generally used now to denote a channel for which any row is a permutation of each other row and the same property also holds for the columns.
It is shown that for such channels [Gal68, Th. 4.5.2] a uniform distribution on the inputs maximizes the mutual information between the output and the input of the channel and gives therefore its capacity. In such a case, linear codes attain the capacity of such a channel.
This notion captures the notion of symmetry of a channel in a very broad sense. In particular the erasure channel is weakly symmetric (for many definitions of “symmetric channels” an erasure channel is not symmetric).
We denote for such a channel and for a given output by the associated APP vector, that is where we denote by the input symbol to the channel.
To compute this APP vector we will make throughout the paper the following assumption
The input of the communication channel is assumed to be uniformly distributed over $\mathbb{F}_q$.
We give now the asymptotic behavior of the Koetter-Vardy decoder for a weakly symmetric channel, but before doing this we will need a few lemmas.
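Assume that $x_0$ is the input symbol that was sent and that the communication channel is weakly symmetric. Then, by viewing the APP vector $\pi = (\pi(\alpha))_{\alpha \in \mathbb{F}_q}$ as a function of the random variable $y$, we have for any $x_0 \in \mathbb{F}_q$:
$$\mathbb{E}\left(\pi(x_0)\right) = \mathbb{E}\left(\|\pi\|^2\right).$$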
To prove this result, let us introduce some notation. Let us denote by
the output alphabet and is a partition of such that all the submatrices are symmetric for .
and where is arbitrary in (these quantities do not depend on the element chosen in );
and where is arbitrary in .
We observe now that from the assumption that was uniformly distributed
We observe now that
where the second equality is due to (10).
On the other hand
where the second equality is due to (10).
By summing all the elements (or the square of the elements) of the symmetric matrix either by columns or by rows and since all these row sums or all these column sums are equal, we obtain that
As we will now show, this quantity turns out to be the limit of the rate for which the Koetter-Vardy decoder succeeds in decoding when the alphabet gets large. For this reason, we will denote this quantity by the Koetter-Vardy capacity of the channel.
Definition 4 (Koetter-Vardy capacity).
Consider a weakly symmetric channel $W$ and denote by $\pi$ the associated probability vector. The Koetter-Vardy capacity of this channel, which we denote by $C_{KV}(W)$, is defined by
$$C_{KV}(W) = \mathbb{E}\left(\|\pi\|^2\right).$$
To prove that this quantity captures the rate at which the Koetter-Vardy decoder is successful (at least for large lengths and therefore large field size), let us first prove concentration results around the expectation for the numerator and denominator appearing in the left-hand term of (6).
Let and . We have
Let us first prove (13). We can write the left-hand term as a sum of i.i.d. random variables
where . Note that (i) , (ii) . By using Hoeffding’s inequality we obtain that for any we have
This result can be used to derive a rather tight upper-bound on the probability of error of the Koetter-Vardy decoder.
Consider a weakly symmetric -ary input channel of Koetter-Vardy capacity . Consider a Reed-Solomon code over of length , dimension such that its rate satisfies . Let
The probability that the Koetter-Vardy decoder with list size bounded by does not output in its list the right codeword is upper-bounded by for some constant .
Without loss of generality we can assume that the all-zero codeword was sent. From Theorem 2, we know that the Koetter-Vardy decoder succeeds if the following condition is met
Notice that the right-hand side satisfies
Let be a positive constant that we are going to choose afterward. Define the events and by
Note that by Lemma 6 the events and have both probability where .
Thus, the probability that event and event both occur is
In the case and both hold, we have
A straightforward computation shows that for any we have
Therefore for we have in the aforementioned case
Let us choose now such that
Note that . This choice implies that
where we used in the last inequality the bound given in (16).
In other words, the Koetter Vardy decoder outputs the codeword in its list. The probability that this does not happen is at most . ∎
An immediate corollary of this theorem is the following result that gives a (tight) lower bound on the error-correction capacity of the Koetter-Vardy decoding algorithm over a discrete memoryless channel.
Let be an infinite family of Reed-Solomon codes of rate . Denote by the alphabet size of that is assumed to be a non decreasing sequence that goes to infinity with . Consider an infinite family of -ary weakly symmetric channels with associated probability error vectors such that has a limit as tends to infinity. Denote by the asymptotic Koetter-Vardy capacity of these channels, i.e.
This infinite family of codes can be decoded correctly by the Koetter-Vardy decoding algorithm with probability as tends to infinity as soon as there exists such that
Let us observe that for the $q\text{-SC}_p$ we have
$$C_{KV} = (1-p)^2 + \frac{p^2}{q-1}.$$
By letting $q$ go to infinity, we recover in this way the performance of the Guruswami-Sudan algorithm, which works as soon as $R < (1-p)^2$.
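The behavior for growing alphabet size can be checked numerically. The sketch below assumes that the Koetter-Vardy capacity is the expected squared Euclidean norm of the APP vector under a uniform input, which is the reading suggested by the derivation in this section; the function names are hypothetical and the channel is given as a matrix `W[x][y]`.

```python
def qsc(q, p):
    """Transition matrix of a q-ary symmetric channel with error probability p."""
    return [[1 - p if x == y else p / (q - 1) for y in range(q)] for x in range(q)]

def kv_capacity(W):
    """Expected squared norm of the APP vector, assuming a uniform input.

    For each output y, the APP vector is the y-th column of W normalized to
    sum to 1, and y occurs with probability (column sum) / q.
    """
    q = len(W)
    total = 0.0
    for y in range(len(W[0])):
        column = [W[x][y] for x in range(q)]
        s = sum(column)
        if s == 0:
            continue  # output that never occurs
        total += (s / q) * sum((w / s) ** 2 for w in column)
    return total

p = 0.2
small = kv_capacity(qsc(4, p))     # should equal (1 - p)**2 + p**2 / 3
large = kv_capacity(qsc(256, p))   # should be close to (1 - p)**2
print(small, large, (1 - p) ** 2)
```

For $p = 0.2$ the value with $q = 256$ is already within $10^{-3}$ of $(1-p)^2 = 0.64$, in line with the limit obtained when the alphabet size grows.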
Link with the results presented in [KV03a] and [KV03b]. In [KV03a, Sec. V.B eq. (32)] an arbitrarily small upper bound on the error probability is given, it is namely explained that as soon as the rate and the length of the Reed-Solomon code satisfy (where the expectation is taken with respect to the a posteriori probability distribution of the codeword). Here is some function of the multiplicity matrix which itself depends on the received word. This is not a bound of the same form as the one given in Theorem 7 whose upper-bound on the error probability only depends on some well defined quantities which govern the complexity of the algorithm (such as the size of the field over which the Reed-Solomon code is defined and a bound on the list-size) and the Koetter-Vardy capacity of the channel.
However, many more details are given in Section 9 of the preprint version [KV03b] of [KV03a]. There is for instance, implicitly in the proof of Theorem 27 in [KV03b, Sec. 9], an upper-bound on the error probability of decoding a Reed-Solomon code with the Koetter-Vardy decoder which goes to zero polynomially fast in the length as long as the rate is less than a quantity defined from the transition probability matrix of the channel and from the matrix which is zero except on the diagonal, where the diagonal elements give the probability distribution of the output of the channel when the input is uniformly distributed. It is readily verified that in the case of a weakly symmetric channel this quantity is nothing but the Koetter-Vardy capacity of the channel defined here. It can be viewed as a more general definition of the “capacity” of a channel adapted to the Koetter-Vardy decoding algorithm. However, “error probability” in [KV03a, KV03b] should be understood as “average error probability”, where the average is taken over the set of codewords of the code. The error probability may vary widely among the codewords in the case of a non-symmetric channel. In order to avoid this, we have chosen a different route here and have assumed some weak form of symmetry for the channel which ensures that the probability of error does not depend on the codeword which is sent. The authors of [KV03b] use a second moment method to bound the error probability; this can only give polynomial upper-bounds on the error probability. This is why we have also used a slightly different route in Theorem 7 to obtain stronger (i.e. exponentially small) upper-bounds on the error probability.
4 Algebraic-soft decision decoding of AG codes.
The problem with Reed-Solomon codes is that their length is limited by the alphabet size. To overcome this limitation it is possible to proceed as in [KV03b] and use instead Algebraic-Geometric codes (AG codes in short) which can also be decoded by an extension of the Koetter-Vardy algorithm and which have more or less a similar error correction capacity as Reed-Solomon codes under this decoding strategy. The extension of this decoding algorithm to AG codes is sketched in Section D. Let us first recall how these codes are defined.
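An AG code is constructed from a triple $(\mathcal{X}, \mathcal{P}, mQ)$ where:
$\mathcal{X}$ denotes an algebraic curve over a finite field $\mathbb{F}_q$ (we refer to [Sti93] for more information about algebraic geometry codes);
$\mathcal{P} = \{P_1, \dots, P_n\}$ denotes a set of $n$ distinct points of $\mathcal{X}$ with coordinates in $\mathbb{F}_q$;
$mQ$ is a divisor of the curve, here $Q$ denotes another point in $\mathcal{X}$ with coordinates in $\mathbb{F}_q$ which is not in $\mathcal{P}$ and $m$ is a nonnegative integer.
We define $\mathcal{L}(mQ)$ as the vector space of rational functions on $\mathcal{X}$ that may contain only a pole at $Q$ and the multiplicity of this pole is at most $m$. Then, the algebraic geometry code associated to the above triple, denoted by $C_{\mathcal{L}}(\mathcal{X}, \mathcal{P}, mQ)$, is the image of $\mathcal{L}(mQ)$ under the evaluation map
$$\mathrm{ev} : \mathcal{L}(mQ) \to \mathbb{F}_q^n, \qquad f \mapsto (f(P_1), \dots, f(P_n)).$$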