Approaching Miscorrection-free Performance of Product and Generalized Product Codes
Abstract
Product codes (PCs) protect a two-dimensional array of bits using short component codes. Assuming transmission over the binary symmetric channel, the decoding is commonly performed by iteratively applying bounded-distance decoding to the component codes. For this coding scheme, undetected errors in the component decoding (also known as miscorrections) significantly degrade the performance. In this paper, we propose a novel iterative decoding algorithm for PCs which can detect and avoid most miscorrections. The algorithm can also be used to decode many recently proposed classes of generalized PCs such as staircase, braided, and half-product codes. Depending on the component code parameters, our algorithm significantly outperforms the conventional iterative decoding method. As an example, for double-error-correcting Bose–Chaudhuri–Hocquenghem component codes, the net coding gain can be increased by up to dB. Moreover, the error floor can be lowered by orders of magnitude, up to the point where the decoder performs virtually identically to a genie-aided decoder that avoids all miscorrections. We also discuss post-processing techniques that can be used to reduce the error floor even further.
I Introduction
A product code (PC) is the set of all $n \times n$ arrays where each row and column in the array is a codeword in some linear component code of length $n$ [1]. Recently, a wide variety of related code constructions have been proposed, e.g., braided codes [2], half-product codes [3, 4], continuously-interleaved codes [5], half-braided codes [4, 6], and staircase codes [7]. All of these code classes have Tanner graph representations that consist exclusively of degree-2 variable nodes, i.e., each bit is protected by two component codes. We use the term generalized product codes (GPCs) to refer to such codes.
The component codes of a GPC typically correspond to Reed–Solomon or Bose–Chaudhuri–Hocquenghem (BCH) codes, which can be efficiently decoded via algebraic bounded-distance decoding (BDD). The overall GPC is then decoded by iteratively applying BDD to the component codes. This iterative coding scheme dates back to 1968 [8] and has been shown to offer excellent performance in practice. In particular, for the binary symmetric channel (BSC) at high code rates, iterative decoding of GPCs with binary BCH component codes can achieve performance close to the channel capacity [7, 9, 10]. Moreover, the decoder data flow can be orders of magnitude lower than for comparable low-density parity-check (LDPC) codes under message-passing decoding [7]. This facilitates decoder throughputs of tens or even hundreds of gigabits per second. Indeed, GPCs are popular choices for high-bit-rate applications with limited soft information, such as regional/metro optical transport networks [7, 11, 5, 9, 12, 13, 3, 4, 6, 14, 15, 16]. Besides data transmission, GPCs are also used in storage applications [17, 18, 19, 20, 21].
For GPCs over the BSC, undetected errors in the component decoding, also known as miscorrections, significantly degrade the performance of iterative decoding. In particular, let $\boldsymbol{r} = \boldsymbol{c} + \boldsymbol{\eta}$, where $\boldsymbol{c} \in \mathcal{C}$ and $\boldsymbol{\eta}$ denote a component codeword and random error vector, respectively. For a $t$-error-correcting component code $\mathcal{C}$, BDD yields the correct codeword $\boldsymbol{c}$ if and only if $d_\mathrm{H}(\boldsymbol{r}, \boldsymbol{c}) = w_\mathrm{H}(\boldsymbol{\eta}) \leq t$, where $d_\mathrm{H}$ and $w_\mathrm{H}$ denote Hamming distance and weight, respectively. On the other hand, if $w_\mathrm{H}(\boldsymbol{\eta}) > t$, the decoding either fails or there exists another codeword $\boldsymbol{c}' \neq \boldsymbol{c}$ such that $d_\mathrm{H}(\boldsymbol{r}, \boldsymbol{c}') \leq t$. In the latter case, we say that a miscorrection occurs, in the sense that BDD is technically successful but the decoded codeword is not the correct one. Miscorrections are highly undesirable because they introduce additional errors (on top of channel errors) into the iterative decoding process. Moreover, from a theoretical perspective, miscorrections are notoriously difficult to analyze in an iterative decoding scheme [22, 23, 4, 7, 14, 24, 25, 26]. In fact, despite their widespread use in practice and to the best of our knowledge, no rigorous analytical results exist that characterize the finite-length performance of GPCs under iterative BDD over the BSC.
For specific code proposals in practical systems, the miscorrection problem is typically addressed by appropriately modifying the component code that is used to construct the GPC; see, e.g., [4, 7, 9, 16]. In particular, for binary $t$-error-correcting BCH codes, it is known that miscorrections occur with probability approximately $1/t!$ (given that more than $t$ errors occur) [27, 4]. In order to reduce this probability, one may employ a subcode of the original code [4, 7], extend the code [16], and/or apply code shortening [7, 16]. However, such modifications invariably lead to a code rate reduction. Moreover, even with a modified component code, miscorrections can still have a significant effect on the performance.
The main contribution of this paper is a novel iterative decoding algorithm for GPCs which can detect and avoid most miscorrections. The algorithm relies on so-called anchor codewords to resolve inconsistencies across component codewords. This leads to significant performance improvements, in particular when $t$ is small (which is typically the case in practice). As an example, for $t = 2$, the algorithm can improve the net coding gain by roughly dB. Moreover, the error floor can be lowered by almost two orders of magnitude, up to the point where the performance is virtually identical to that of a miscorrection-free genie-aided decoder.
Error-floor improvements are particularly important for applications with stringent reliability constraints such as optical transport networks. We therefore also discuss the application of post-processing (PP) techniques [28, 9, 21, 29, 30, 26], which can be combined with the proposed algorithm to reduce the error floor even further.
Decoder modifications that target miscorrections have been proposed before in the literature. Usually these modifications are minor and they are tailored to a specific GPC. As an example, staircase codes [7] can be seen as a convolutional-like (or spatially-coupled) version of PCs. The associated code array consists of an infinite number of square blocks that are arranged to look like a staircase [7, Fig. 4]. Decoding is facilitated by using a sliding window which comprises only a finite number of blocks. In order to reduce miscorrections, one may reject certain bit flips from component codewords that are associated with the newest (most unreliable) block from the channel [31, p. 59]. However, simulations suggest that the performance gains using this approach are limited. On the other hand, the proposed anchor-based decoding can closely approach miscorrection-free performance when applied to staircase codes [32].
In [4], a decoder modification for PCs is suggested based on the observation that the miscorrection probability is reduced by a factor of roughly $n/t$ if only $t-1$ errors are corrected for a $t$-error-correcting component code. For large $n$, this is significant, and the author thus proposes to correct only $t-1$ errors in the first iteration of iterative BDD. We will see later that this indeed gives some notable performance improvements. This trick can also be easily combined with the proposed algorithm.
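The size of this reduction can be checked with a short calculation under the random-syndrome model used in Sec. II-C: the miscorrection probability is the fraction of all syndromes that the decoder maps to an error pattern. The sketch below is ours, not taken from [4], and the parameters $\nu = 8$ and $t = 2$ are hypothetical.

```python
from math import comb

def p_mis(n: int, radius: int, redundancy: int) -> float:
    """Random-syndrome model: fraction of all 2**redundancy syndromes that a
    bounded-distance decoder with the given radius maps to an error pattern."""
    return sum(comb(n, i) for i in range(radius + 1)) / 2**redundancy

# Hypothetical BCH parameters: nu = 8, t = 2, so n = 255 and n - k = nu * t = 16.
nu, t = 8, 2
n = 2**nu - 1
full = p_mis(n, t, nu * t)          # correct up to t errors
reduced = p_mis(n, t - 1, nu * t)   # correct only up to t - 1 errors
print(f"reduction factor {full / reduced:.1f}, n/t = {n / t:.1f}")
```

For these parameters, both the ratio and $n/t$ come out at about 127.5, since the ratio is dominated by $\binom{n}{t} / \binom{n}{t-1} = (n-t+1)/t$.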
The work in this paper is inspired by another comment made in [4] where it is mentioned that for PCs it may be desirable to “take special actions in case of conflicts” caused by miscorrections. In particular, it is suggested to use the number of conflicts for a particular component code as an indicator for the reliability of the component code. Besides this suggestion, no further details or results are provided. Our algorithm builds upon this idea and we develop a systematic approach that exploits component code conflicts and can be applied to an arbitrary GPC.
The remainder of the paper is structured as follows. In Sec. II, we review PCs and the conventional iterative decoding scheme. We also give a brief overview of theoretical methods that have been proposed to analytically predict the performance. The proposed anchor-based decoding algorithm is described in Sec. III. In Sec. IV, simulation results are presented and discussed for various component code parameters. PP is discussed in Sec. V. We discuss the complexity and some implementation issues for the proposed algorithm in Sec. VI. Finally, the paper is concluded in Sec. VII.
II Product Codes and Iterative Decoding
This paper focuses on PCs, which we believe many readers are familiar with. Other classes of GPCs are discussed separately in Sec. III-F.
II-A Product Codes
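Let $\boldsymbol{H} \in \mathbb{F}_2^{(n-k) \times n}$ be the parity-check matrix of a binary linear code $\mathcal{C}(n, k, d)$, where $n$, $k$, and $d$ are the code length, dimension, and minimum distance, respectively. A PC based on $\mathcal{C}$ is defined as
$$\mathcal{P}(\mathcal{C}) \triangleq \left\{ \boldsymbol{X} \in \mathbb{F}_2^{n \times n} \;\middle|\; \boldsymbol{X}\boldsymbol{H}^\top = \boldsymbol{0},\ \boldsymbol{H}\boldsymbol{X} = \boldsymbol{0} \right\}. \tag{1}$$
It can be shown that $\mathcal{P}(\mathcal{C})$ is a linear $(n^2, k^2, d^2)$ code. The codewords can be represented as two-dimensional $n \times n$ arrays. The two conditions in (1) enforce that the rows and columns in the array are valid codewords in $\mathcal{C}$.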
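The two conditions in (1) can be checked directly with GF(2) matrix products. The sketch below uses the $(7, 4, 3)$ Hamming code as a hypothetical toy component code (so the PC is a $(49, 16, 9)$ code) and encodes an information array row-wise and then column-wise, i.e., $\boldsymbol{X} = \boldsymbol{G}^\top \boldsymbol{U} \boldsymbol{G}$ for a generator matrix $\boldsymbol{G}$. All function names are ours.

```python
# Hypothetical toy component code: the (7, 4, 3) Hamming code in systematic form.
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def matmul2(A, B):
    """Matrix product over GF(2)."""
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_pc_codeword(X, H):
    """Both conditions of (1): X H^T = 0 (rows) and H X = 0 (columns)."""
    rows_ok = not any(map(any, matmul2(X, transpose(H))))
    cols_ok = not any(map(any, matmul2(H, X)))
    return rows_ok and cols_ok

def pc_encode(U, G):
    """Encode a k x k information array: rows first, then columns (X = G^T U G)."""
    return transpose(matmul2(transpose(matmul2(U, G)), G))

U = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
X = pc_encode(U, G)
print(is_pc_codeword(X, H))   # True
X[2][5] ^= 1                  # a single channel error
print(is_pc_codeword(X, H))   # False
```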
We use a pair $(i, j)$ to identify a particular component codeword. The first parameter $i$ refers to the codeword type, which can be either a row ($i = 1$) or a column ($i = 2$). The second parameter $j \in \{1, \dots, n\}$ enumerates the codewords of a given type.
Example 1.
The code array for a PC where the component code has length is shown in Fig. 1. The component codewords and are highlighted in blue.
For PCs, the coded bits can be identified by their two coordinates within the PC array. However, this way of specifying bits does not generalize well to other classes of GPCs because the associated code array may have a different shape (or there may not exist an array representation at all). In order to keep the notation general, we therefore use the convention that a coded bit is specified by the two component codewords that protect it (each identified by a pair), i.e., by four parameters in total.
Example 2.
The two highlighted component codewords in Fig. 1 intersect at the bit corresponding to the tuple or .
II-B BCH Component Codes
We use binary $t$-error-correcting BCH codes as component codes, as well as their singly- and doubly-extended versions. Recall that a singly-extended BCH code is obtained through an additional parity bit, formed by adding (modulo 2) all $2^\nu - 1$ coded bits $c_1, \dots, c_{2^\nu - 1}$ of the BCH code, where $\nu$ is the Galois field extension degree. On the other hand, a doubly-extended BCH code has two additional parity bits, denoted by $c_{2^\nu}$ and $c_{2^\nu + 1}$, such that
$$c_{2^\nu} = \sum_{i=1}^{2^{\nu-1} - 1} c_{2i} \pmod{2}, \tag{2}$$
$$c_{2^\nu + 1} = \sum_{i=1}^{2^{\nu-1}} c_{2i-1} \pmod{2}, \tag{3}$$
i.e., the parity bits perform checks separately on odd and even bit positions. The overall component code has length $n = 2^\nu - 1 + e$, where $e \in \{0, 1, 2\}$ indicates either no ($e = 0$), single ($e = 1$), or double ($e = 2$) extension. In all three cases, the guaranteed code dimension is $k = 2^\nu - 1 - \nu t$. For $e = 0$, the guaranteed minimum distance is $d = 2t + 1$. This is increased to $d = 2t + 2$ for $e \in \{1, 2\}$. We use a triple $(\nu, t, e)$ to denote all BCH code parameters.
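A minimal sketch of the extension step follows, assuming the bit indexing $c_1, \dots, c_{2^\nu - 1}$ of the BCH code maps to Python indices $0, \dots, 2^\nu - 2$. As an illustration, it uses the generator polynomial $g(x) = 1 + x + x^3$ of the $(7, 4)$ BCH code ($\nu = 3$, $t = 1$) as a weight-3 codeword; the function name `extend_bch` is hypothetical.

```python
def extend_bch(c, e):
    """Append e in {0, 1, 2} extension bits to a BCH codeword c.

    c[0], ..., c[2**nu - 2] hold the bits c_1, ..., c_{2^nu - 1}.
    e = 1: one overall parity bit; e = 2: the two bits of (2) and (3),
    which check the even and the odd positions separately.
    """
    c = list(c)
    if e == 0:
        return c
    if e == 1:
        return c + [sum(c) % 2]
    even = sum(c[1::2]) % 2   # positions 2, 4, ..., 2^nu - 2 -> c_{2^nu}
    odd = sum(c[0::2]) % 2    # positions 1, 3, ..., 2^nu - 1 -> c_{2^nu + 1}
    return c + [even, odd]

# g(x) = 1 + x + x^3 generates the (7, 4) BCH code with nu = 3, t = 1.
g = [1, 1, 0, 1, 0, 0, 0]
for e in (0, 1, 2):
    x = extend_bch(g, e)
    print(e, x, "weight", sum(x))
```

The weight-3 codeword gains one extra nonzero bit for $e \in \{1, 2\}$, in line with the minimum distance growing from $2t + 1 = 3$ to $2t + 2 = 4$.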
II-C Bounded-Distance Decoding and Miscorrections
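Consider the transmission of a component codeword $\boldsymbol{c} \in \mathcal{C}$ over the BSC with crossover probability $p$. The error vector introduced by the channel is denoted by $\boldsymbol{\eta}$, i.e., the components of $\boldsymbol{\eta}$ are i.i.d. Bernoulli($p$) random variables. Applying BDD to the received vector $\boldsymbol{r} = \boldsymbol{c} + \boldsymbol{\eta}$ results in
$$\mathcal{D}(\boldsymbol{r}) = \begin{cases} \boldsymbol{c}, & \text{if } d_\mathrm{H}(\boldsymbol{r}, \boldsymbol{c}) = w_\mathrm{H}(\boldsymbol{\eta}) \leq t, \\ \boldsymbol{c}' \in \mathcal{C}, & \text{if } w_\mathrm{H}(\boldsymbol{\eta}) > t \text{ and } \exists\, \boldsymbol{c}' \in \mathcal{C} : d_\mathrm{H}(\boldsymbol{r}, \boldsymbol{c}') \leq t, \\ \text{FAIL}, & \text{otherwise}. \end{cases} \tag{4}$$
In practice, BDD is implemented by first computing the syndrome $\boldsymbol{s} = \boldsymbol{H}\boldsymbol{r}^\top$. Each of the $2^{n-k}$ possible syndromes is then associated with either an estimated error vector $\hat{\boldsymbol{\eta}}$, where $w_\mathrm{H}(\hat{\boldsymbol{\eta}}) \leq t$, or a decoding failure. In the first case, the decoded output is computed as $\boldsymbol{r} + \hat{\boldsymbol{\eta}}$.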
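The syndrome-table implementation just described can be sketched in a few lines. For brevity, the sketch uses the $(7, 4)$ Hamming code with $t = 1$ (our choice, not a code from this paper) and enumerates all error patterns of weight at most $t$; practical BCH decoders compute error locations algebraically instead of storing a table. With two channel errors, the decoder returns a weight-3 codeword, i.e., the second case of (4).

```python
from itertools import combinations

def syndrome(H, r):
    """s = H r^T over GF(2), returned as a tuple so it can be a dict key."""
    return tuple(sum(h * x for h, x in zip(row, r)) % 2 for row in H)

def build_bdd_table(H, t):
    """Map the syndrome of every error pattern of weight <= t to that pattern.
    Syndromes missing from the table correspond to decoding failures."""
    n = len(H[0])
    table = {}
    for w in range(t + 1):
        for support in combinations(range(n), w):
            eta = [1 if i in support else 0 for i in range(n)]
            table.setdefault(syndrome(H, eta), eta)
    return table

def bdd(H, table, r):
    """Bounded-distance decoding: return r + eta_hat, or None on failure."""
    eta_hat = table.get(syndrome(H, r))
    if eta_hat is None:
        return None
    return [(x + e) % 2 for x, e in zip(r, eta_hat)]

# Toy example: (7, 4) Hamming code (t = 1), all-zero codeword transmitted.
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
table = build_bdd_table(H, t=1)
print(bdd(H, table, [0, 0, 0, 0, 1, 0, 0]))  # one error: all-zero codeword recovered
print(bdd(H, table, [1, 1, 0, 0, 0, 0, 0]))  # two errors: miscorrected to [1, 1, 1, 0, 0, 0, 0]
```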
The second case in (4) corresponds to an undetected error or miscorrection.
Example 3.
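Consider the component codeword in Fig. 1 and assume that the all-zero codeword is transmitted. The black crosses in this codeword represent the three bit positions that are received in error, i.e., $\eta_i = 1$ at these positions and $\eta_i = 0$ elsewhere. For a component code with $t = 2$ and $e = 0$, we have $d = 5$, i.e., there exists at least one codeword with Hamming weight 5. Assume that such a codeword $\boldsymbol{c}'$ has ones at the three erroneous positions as well as at positions 6 and 14, and zeros elsewhere. Applying BDD to $\boldsymbol{r} = \boldsymbol{\eta}$ then introduces two additional errors at bit positions 6 and 14. This is shown by the red crosses in Fig. 1.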
Code extension reduces the probability of miscorrecting, at the expense of a slightly increased code length (and hence a small rate loss). To see this, let $e = 0$ and assume that $w_\mathrm{H}(\boldsymbol{\eta}) > t$. Consider now the decoding of a random syndrome $\boldsymbol{s}$ whose components are i.i.d. Bernoulli($1/2$). In that case, the probability of miscorrecting is simply the ratio of the number of decodable syndromes and the total number of syndromes, i.e.,
$$P_\mathrm{mis} = \frac{\sum_{i=0}^{t} \binom{n}{i}}{2^{\nu t}} \approx \frac{n^t}{t!\, 2^{\nu t}} \approx \frac{1}{t!}. \tag{5}$$
For $e = 1$ and $e = 2$, the total number of syndromes increases to $2^{\nu t + 1}$ and $2^{\nu t + 2}$, respectively, while the number of decodable syndromes remains essentially unchanged. The miscorrection probability is thus reduced by a factor of 2 and 4, respectively. We note that this reasoning can be made precise; see, e.g., [27, 4].
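The estimate in (5) and the effect of extension can be evaluated numerically. The sketch below assumes $n = 2^\nu - 1 + e$ and $n - k = \nu t + e$ as stated in Sec. II-B; the field extension degree $\nu = 10$ is a hypothetical choice for illustration.

```python
from math import comb, factorial

def p_miscorrect(nu, t, e):
    """Estimate (5) for a (nu, t, e) BCH code: decodable syndromes divided by
    all 2**(nu*t + e) syndromes, with n = 2**nu - 1 + e."""
    n = 2**nu - 1 + e
    return sum(comb(n, i) for i in range(t + 1)) / 2**(nu * t + e)

nu = 10   # hypothetical field extension degree
for t in (2, 3, 4):
    p = [p_miscorrect(nu, t, e) for e in (0, 1, 2)]
    print(f"t={t}: 1/t!={1 / factorial(t):.4f}, e=0: {p[0]:.4f}, "
          f"e=1: {p[1]:.4f}, e=2: {p[2]:.4f}")
```

For each $t$, the $e = 0$ value stays close to $1/t!$, and the $e = 1$ and $e = 2$ values are close to one half and one quarter of it.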
Example 4.
Consider the same scenario as in Example 3. If we use the singly-extended component code, the minimum distance is increased to $d = 6$. Assuming that there is no additional error in the last bit position, i.e., the additional parity bit is received correctly, the miscorrection illustrated in Example 3 can be detected because the parity-check equation involving the additional parity bit is not satisfied.
Remark 1.
As an alternative to extending the code, one may employ a subcode of the original BCH code. For example, the singly-extended BCH code behaves similarly to the even-weight subcode of the BCH code, which is obtained by multiplying its generator polynomial by $(x + 1)$. The subcode where odd and even coded bits separately sum to zero is obtained by multiplying the generator polynomial by $(x^2 + 1) = (x + 1)^2$. Subcodes have a reduced code dimension and hence lead to a similar rate loss as the code extension.
II-D Iterative Bounded-Distance Decoding
We now consider the transmission of a codeword $\boldsymbol{X} \in \mathcal{P}(\mathcal{C})$ over the BSC with crossover probability $p$. The conventional iterative decoding procedure consists of applying BDD first to all row component codewords and then to all column component codewords. This is repeated $\ell$ times or until a valid codeword in $\mathcal{P}(\mathcal{C})$ is found. Pseudocode for iterative BDD is given in Algorithm 1. A stopping criterion is omitted for readability.
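The row/column schedule of Algorithm 1 can be sketched on a toy PC. As an assumption for illustration, the component code is the extended $(8, 4, 4)$ Hamming code with $t = 1$, which detects (rather than miscorrects) double errors; the sketch includes the early stop once all rows and columns have zero syndrome.

```python
from itertools import combinations

# Toy component code (our assumption): extended (8, 4, 4) Hamming code, t = 1.
H = [[1, 1, 0, 1, 1, 0, 0, 0],
     [1, 0, 1, 1, 0, 1, 0, 0],
     [0, 1, 1, 1, 0, 0, 1, 0],
     [1, 1, 1, 1, 1, 1, 1, 1]]
T = 1
N = len(H[0])

def syndrome(r):
    return tuple(sum(h * x for h, x in zip(row, r)) % 2 for row in H)

# BDD lookup table: syndrome -> error positions of weight <= T; else failure.
TABLE = {}
for w in range(T + 1):
    for pos in combinations(range(N), w):
        TABLE.setdefault(syndrome([int(i in pos) for i in range(N)]), pos)

def column(Y, j):
    return [Y[i][j] for i in range(N)]

def iterative_bdd(Y, iterations):
    """Sketch of Algorithm 1: BDD on all rows, then on all columns, repeated."""
    for _ in range(iterations):
        for j in range(N):                          # row codewords
            for l in TABLE.get(syndrome(Y[j]), ()):
                Y[j][l] ^= 1
        for j in range(N):                          # column codewords
            for i in TABLE.get(syndrome(column(Y, j)), ()):
                Y[i][j] ^= 1
        if not any(any(syndrome(Y[j])) or any(syndrome(column(Y, j))) for j in range(N)):
            break                                   # valid PC codeword found
    return Y

Y = [[0] * N for _ in range(N)]              # all-zero PC codeword transmitted
for i, j in [(0, 0), (0, 3), (5, 6)]:        # channel errors
    Y[i][j] ^= 1
print(sum(map(sum, iterative_bdd(Y, 4))))    # 0: row 0 fails, the columns fix it
```

By contrast, a $2 \times 2$ error pattern, i.e., a minimal stopping set for $t = 1$, causes every affected row and column decoder to fail and therefore remains uncorrected.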
In order to analyze the bit error rate (BER) of PCs under iterative BDD, the prevailing approach in the literature is to assume that no miscorrections occur in the BDD of the component codes, see, e.g., [22, 23, 4, 14, 24]. To that end, we define
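$$\mathcal{D}_\mathrm{id}(\boldsymbol{r}) = \begin{cases} \boldsymbol{c}, & \text{if } w_\mathrm{H}(\boldsymbol{\eta}) \leq t, \\ \text{FAIL}, & \text{otherwise}, \end{cases} \tag{6}$$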
which can be seen as an idealized version of BDD where a genie prevents miscorrections. Conceptually, this is similar to assuming transmission over the binary erasure channel (BEC) instead of the BSC [22].
Using (6) instead of (4), a decoding failure for a PC is related to the existence of a so-called $(t+1)$-core in an Erdős–Rényi random graph [4, 23]. This connection can be used to rigorously analyze the asymptotic performance as $n \to \infty$ using density evolution (DE) [14, 4, 23]. Moreover, the error floor can be estimated by enumerating stopping sets, also known as stall patterns. A stopping set is a subset of bit positions such that every component codeword with at least one bit in the set must contain at least $t + 1$ bits in the set. For PCs, a minimal-size stopping set involves $t + 1$ row codewords and $t + 1$ column codewords and has size $s_\mathrm{min} = (t + 1)^2$. For example, the black crosses shown in Fig. 1 form such a stopping set when $t = 2$. If we consider only stopping sets of minimal size, the BER can be approximated as
$$\mathrm{BER} \approx \frac{s_\mathrm{min}}{n^2} M p^{s_\mathrm{min}} \tag{7}$$
for sufficiently small $p$, where $M = \binom{n}{t+1}^2$ is the total number of possible minimal-size stopping sets, also referred to as the stopping set's multiplicity. Unfortunately, if miscorrections are taken into account, DE and the error floor analysis are nonrigorous and become inaccurate.
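The error-floor approximation (7) is straightforward to evaluate. In the sketch below, the multiplicity counts the ways of choosing $t + 1$ rows and $t + 1$ columns of the $n \times n$ array; the component length $n = 256$ and the crossover probabilities are hypothetical values, not those of Example 5.

```python
from math import comb

def error_floor_ber(n, t, p):
    """Approximation (7) for a PC with length-n component codes: only minimal
    stopping sets ((t+1) rows x (t+1) columns, s_min = (t+1)^2 bits) count."""
    s_min = (t + 1) ** 2
    multiplicity = comb(n, t + 1) ** 2     # choose t+1 rows and t+1 columns
    return s_min / n**2 * multiplicity * p**s_min

# Hypothetical example values, not taken from the figures in this paper.
for p in (1e-3, 2e-3, 4e-3):
    print(f"p = {p:.0e}: BER ~ {error_floor_ber(256, 2, p):.2e}")
```

On a log-log plot, the resulting error floor is a straight line with slope $s_\mathrm{min} = (t+1)^2$, so doubling $p$ multiplies the estimate by $2^9 = 512$ for $t = 2$.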
Example 5.
Consider a BCH code with parameters . The resulting PC has length and code rate . For decoding iterations, the outcome of DE (see Appendix A for details) and the error floor analysis via (7) are shown in Fig. 2 by the dashed black lines. The analysis can be verified by performing idealized iterative BDD using (6). The results are shown by the blue line (triangles). However, the actual BER with true BDD (4) deviates significantly from the idealized decoding, as shown by the red line (squares). The BER can be moderately improved by treating the component codes as single-error-correcting in the first iteration, as suggested in [4]. This is shown by the red dotted line.
There exist several approaches to quantify the performance loss due to miscorrections. In terms of the error floor, the authors in [7] derive an expression similar to (7) for staircase codes. To account for miscorrections, this expression is modified by introducing a heuristic parameter, whose value has to be estimated using Monte Carlo simulations. In terms of asymptotic performance, the authors in [25] recently proposed a novel approach to analyze a GPC ensemble that is structurally related to staircase codes. The presented method is shown to give more accurate asymptotic predictions compared to the case where miscorrections are ignored.
Rather than analyzing the effect of miscorrections, the approach taken in this paper is to try to avoid them by modifying the decoding. In the next section, we give a detailed description of the proposed decoding algorithm. Its BER performance for the code parameters considered in Example 5 is shown in Fig. 2 by the green line (circles). The results are discussed in more detail in Sec. IV below.
Remark 2.
Iterative BDD can be interpreted as a message-passing algorithm with binary “hard-decision” messages. The corresponding message-passing rule is intrinsic, in the sense that the outgoing message along some edge depends on the incoming message along the same edge. In [10], the authors propose an extrinsic message-passing algorithm based on BDD. The BER for this algorithm when applied to the PC in Example 5 is shown in Fig. 2 by the brown line (diamonds). Similar to the proposed algorithm, extrinsic message passing provides significant performance improvements over iterative BDD. However, it is known that the decoder data flow and storage requirements can be dramatically increased for message-passing decoding compared to iterative BDD [7]. One reason for this is that iterative BDD can leverage a syndrome compression effect by operating entirely in the syndrome domain. We show in Sec. VI that this effect also applies to the proposed algorithm. For extrinsic message passing, it is an open question whether an efficient syndrome-domain implementation is possible. Due to this, we do not consider extrinsic message passing further in this paper.
III Anchor-based Decoding
In the previous section, we have seen that there exists a significant performance gap between iterative BDD and idealized iterative BDD where a genie prevents miscorrections. Our goal is to close this gap. In order to do so, the key observation we exploit is that miscorrections lead to inconsistencies (or conflicts) across component codewords. In particular, two component codes that protect the same bit may disagree on its value. In this section, we show how these inconsistencies can be used to (a) reliably prevent miscorrections and (b) identify miscorrected codewords in order to revert their decoding decisions.
III-A Preliminaries
The proposed decoding algorithm relies on so-called anchor codewords, which have presumably been decoded without miscorrections. Roughly speaking, we want to ensure that bit flips do not lead to inconsistencies with anchor codewords. Consequently, decoding decisions from codewords that are in conflict with anchors are not applied. On the other hand, some anchor codewords may actually be miscorrected. We therefore allow for the decoding decisions of anchors to be overturned if too many other component codewords are in conflict with a particular anchor. In order to make this more precise, we start by introducing some additional concepts and notation in this subsection.
First, consider the BDD of a single component codeword. We explicitly regard this component decoding as a two-step process. In the first step, the actual decoding is performed and the outcome is either an estimated error vector or a decoding failure. In the second step, error correction is performed by flipping the bits corresponding to the error locations. These two steps are separated in order to perform consistency checks (described below). These checks are used to determine whether the error-correction step should be applied.
It is more convenient to specify the estimated error vector in terms of a set of error locations, which contains at most $t$ elements. Rather than bit positions, this set comprises those component codewords that are affected by the bit flips implied by the estimated error vector.
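For a PC, translating the bit positions found by BDD into affected component codewords is a one-liner, because bit $l$ of a row codeword lies on column $l$ and vice versa. The sketch below is a hypothetical illustration: it labels rows with type 1 and columns with type 2, uses 0-based indices, and picks arbitrary codeword indices.

```python
ROW, COL = 1, 2   # codeword types (assumed labels for this sketch)

def error_locations(i, j, bit_positions):
    """Express the BDD output of component codeword (i, j) of a PC as the set of
    affected component codewords: bit l of row j lies on column l and vice versa."""
    other = COL if i == ROW else ROW
    return {(other, l) for l in bit_positions}

def shared_bit(a, b):
    """Array coordinates (row, column) of the bit protected by codewords a and b."""
    (ia, ja), (ib, jb) = a, b
    if {ia, ib} != {ROW, COL}:
        raise ValueError("in a PC, a bit is shared by exactly one row and one column")
    return (ja, jb) if ia == ROW else (jb, ja)

print(sorted(error_locations(ROW, 4, [6, 14])))   # [(2, 6), (2, 14)]
print(shared_bit((COL, 14), (ROW, 4)))            # (4, 14)
```

Only these two functions depend on the PC structure; for other GPCs, they would be replaced by the corresponding code-specific mapping.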
Example 6.
Remark 3.
It may seem more natural to define the error locations in terms of bit positions within the BCH codeword, e.g., as bit indices in the previous example. However, defining them in terms of the affected component codewords leads to a more succinct description of the proposed algorithm. Moreover, this definition also generalizes more easily to other classes of GPCs; see Sec. III-F below.
Furthermore, for each component codeword, we use a conflict set that comprises the other component codewords that are in conflict with it due to miscorrections. Lastly, each component codeword has an associated status that signifies its current state. The status values range from 0 to 3 with the following meaning:

0: anchor codeword

1: eligible for BDD

2: BDD failed in last iteration

3: frozen codeword
The precise use of the status and the transition rules between different status values are described in the following.
III-B Main Algorithm Routine
The algorithm is initialized by setting the status of all component codewords to 1. We then iterate over the component codewords in the same fashion (and for the same number of iterations) as in Algorithm 1, but replace line 4 with lines 1–19 in Algorithm 2. Algorithm 2 represents the main routine of the proposed anchor-based decoding. It can be divided into four steps, which are described in the following.
Step 1 (Lines 1–3)
If the component codeword is eligible for BDD, i.e., its status is 1, we proceed to decode it. If the decoding is successful, we proceed to the next step; otherwise, the codeword status is set to 2 and we skip to the next codeword.
Step 2 (Lines 4–11)
For each found error location, a consistency check is performed. That is, one checks whether the implied component codeword corresponds to an anchor. If so, the number of conflicts that this anchor is already involved in (i.e., the size of its conflict set) is compared against a conflict threshold. If, according to this threshold, the anchor is already involved in too many conflicts, it is deemed unreliable and is marked for backtracking by adding it to the backtracking set. Otherwise, the codeword currently being decoded is frozen by changing its status to 3. Moreover, the conflict between the (now frozen) codeword and the anchor is stored by adding each of the two codewords to the conflict set of the other. Frozen codewords are always skipped (in the loop of Algorithm 1) for the rest of the decoding unless either the conflicting anchor is backtracked or any bits in the frozen codeword change.
Step 3 (Lines 12–15)
If the component codeword still has status 1, the bit flips implied by its error locations are consistent with all reliable anchors, i.e., with all anchors that have not been marked for backtracking in Step 2. In that case, the algorithm proceeds by applying the error-correction step for the codeword, i.e., the bits corresponding to all error locations are flipped. The error-correction step is implemented in Algorithm 4 and described in detail in Sec. III-E. Afterwards, the codeword becomes an anchor by changing its status to 0.
Step 4 (Lines 16–17)
The last step consists of backtracking all anchor codewords in the backtracking set (if there are any). Roughly speaking, backtracking involves the reversal of all previously applied bit flips of the corresponding anchor. Moreover, the backtracked codeword loses its anchor status. The backtracking routine is implemented in Algorithm 3 and described in more detail in Sec. III-D below.
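Steps 1–4 can be sketched as a single Python function operating on dictionaries for the status, conflict sets, and stored error locations. This is our reading of the description above, not the authors' Algorithm 2: the BDD result is supplied by a hypothetical oracle `decode`, Algorithms 3 and 4 are passed in as callbacks, and the threshold rule (an anchor with at least `delta` conflicts is unreliable) is an assumption chosen so that `delta = 1` reproduces the behavior in Examples 7 and 8.

```python
ANCHOR, ELIGIBLE, FAILED, FROZEN = 0, 1, 2, 3  # status values of Sec. III-A

def anchor_step(cw, status, conflicts, locations, decode, correct, backtrack, delta=1):
    """One visit of component codeword cw in the main routine (Algorithm 2).

    decode(cw): hypothetical BDD oracle returning the set of affected
        codewords (error locations) or None on failure.
    correct(cw, other), backtrack(anchor): stand-ins for Algorithms 4 and 3.
    Assumed threshold rule: an anchor with len(conflicts) >= delta is unreliable.
    """
    # Step 1: only eligible codewords are decoded.
    if status[cw] != ELIGIBLE:
        return
    locs = decode(cw)
    if locs is None:
        status[cw] = FAILED
        return
    # Step 2: consistency checks against anchors.
    to_backtrack = set()
    for other in locs:
        if status[other] != ANCHOR:
            continue
        if len(conflicts[other]) >= delta:
            to_backtrack.add(other)        # unreliable anchor
        else:
            status[cw] = FROZEN            # reliable anchor: freeze cw, store conflict
            conflicts[cw].add(other)
            conflicts[other].add(cw)
    # Step 3: no conflict with a reliable anchor -> flip bits, become an anchor.
    if status[cw] == ELIGIBLE:
        for other in locs:
            correct(cw, other)
        locations[cw] = set(locs)          # needed later for backtracking
        status[cw] = ANCHOR
    # Step 4: backtrack all unreliable anchors found in Step 2.
    for anchor in to_backtrack:
        backtrack(anchor)

# Abstract replay of Example 8 with hypothetical codeword labels.
cws = [(1, 0), (2, 0), (2, 1), (2, 2)]
status = {c: ELIGIBLE for c in cws}
conflicts = {c: set() for c in cws}
locations, log = {}, []
outcomes = {(1, 0): {(2, 0)}, (2, 1): {(1, 0)}, (2, 2): {(1, 0)}}
for cw in [(1, 0), (2, 1), (2, 2)]:
    anchor_step(cw, status, conflicts, locations, outcomes.get,
                correct=lambda a, b: log.append(("flip", a, b)),
                backtrack=lambda a: log.append(("backtrack", a)))
print(status, log)
```

In this replay, $(1, 0)$ becomes a (miscorrected) anchor, $(2, 1)$ is frozen because of the conflict with it, and $(2, 2)$ then triggers the backtracking of $(1, 0)$ while applying its own bit flip.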
III-C Examples
We now illustrate the above steps with the help of two examples. For both examples, a component code with error-correcting capability $t = 2$ is assumed. Moreover, the conflict threshold is set to .
Example 7.
Consider the scenario depicted in Fig. 3(a). Assume that we are at , corresponding to a row component codeword with status 1 and four attached errors shown by the black crosses. The codeword is assumed to be miscorrected with shown by the red crosses. Codeword is assumed to have status 2 (i.e., BDD failed in the previous iteration with three attached errors) and therefore the first consistency check is passed. However, assuming that the codeword is an anchor without any other conflicts (i.e., its conflict set is empty), the codeword is frozen during Step 2. Hence, no bit flips are applied and the miscorrection is prevented. The conflict is stored by adding each of the two codewords to the conflict set of the other.
Example 8.
Consider the scenario depicted in Fig. 3(b), where we assume that the codeword is a miscorrected anchor without conflicts (i.e., its conflict set is empty) and error locations . Assume that we are at . The codeword has status 1 and two attached errors. Thus, BDD is successful with . During Step 2, the codeword is, however, frozen because there is a conflict with anchor . After freezing the codeword, the conflict is recorded in the conflict sets of both codewords. We skip to the next codeword , which has status 1. Again, BDD is successful with . The implied bit flip is inconsistent with the anchor . However, since this anchor is already in conflict with codeword (and is, hence, considered unreliable), the anchor is marked for backtracking and the error-correction step for bit will be applied.
III-D Backtracking
In Example 8, we have encountered a scenario that leads to the backtracking of a miscorrected anchor. The actual backtracking routine is implemented in Algorithm 3. First, all conflicts caused by the anchor are removed by modifying the respective conflict sets. Note that all codewords in the conflict set of the anchor necessarily have status 3, i.e., they are frozen. After removing conflicts, such codewords may be conflict-free, in which case their status is changed to 1. After this, all previously applied bit flips are reversed. In order to perform this operation, it is necessary to store the set of error locations for each anchor. Finally, the codeword loses its anchor status. In principle, the new codeword status can be chosen to be either 1 or 3. However, backtracked anchors are likely to have been miscorrected. We therefore prefer to freeze the codeword by setting its status to 3 after the backtracking.
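The backtracking steps above can be sketched as follows. This is a hypothetical illustration in the spirit of Algorithm 3, not the authors' code: the error-correction step is passed in as a callback `correct(initiator, affected, via_backtracking)`, and the state is stored in plain dictionaries.

```python
ANCHOR, ELIGIBLE, FAILED, FROZEN = 0, 1, 2, 3  # status values of Sec. III-A

def backtrack(anchor, status, conflicts, locations, correct):
    """Backtracking of an anchor in the spirit of Algorithm 3 (a sketch).

    correct(initiator, affected, via_backtracking) stands in for Algorithm 4.
    """
    # Remove all conflicts of the anchor; its partners are frozen codewords.
    for other in conflicts[anchor]:
        conflicts[other].discard(anchor)
        if status[other] == FROZEN and not conflicts[other]:
            status[other] = ELIGIBLE      # conflict-free again
    conflicts[anchor].clear()
    # Reverse every bit flip the anchor applied (hence locations must be stored).
    for other in locations.pop(anchor):
        correct(anchor, other, True)
    # A backtracked anchor is likely miscorrected, so it is frozen.
    status[anchor] = FROZEN

# Hypothetical state: anchor (1, 0) is in conflict with two frozen codewords.
status = {(1, 0): ANCHOR, (2, 1): FROZEN, (2, 3): FROZEN, (1, 5): ANCHOR}
conflicts = {(1, 0): {(2, 1), (2, 3)}, (2, 1): {(1, 0)},
             (2, 3): {(1, 0), (1, 5)}, (1, 5): {(2, 3)}}
locations = {(1, 0): {(2, 0), (2, 4)}, (1, 5): {(2, 7)}}
log = []
backtrack((1, 0), status, conflicts, locations, lambda a, b, via: log.append((a, b, via)))
print(status)   # (2, 1) is eligible again; (2, 3) stays frozen because of (1, 5)
```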
Remark 4.
Since we do not know if an anchor is miscorrected or not, it is also possible that we mistakenly backtrack “good” anchors. Fortunately, this is unlikely to happen for long component codes because the additional errors due to miscorrections are approximately randomly distributed within the codeword [4]. This implies that additional errors of two (or more) miscorrected codewords rarely overlap.
III-E Error-correction Step
The error-correction step is implemented in Algorithm 4. The input is a parameter tuple consisting of the codeword that initiated the bit flip and the corresponding codeword affected by it. Note that Algorithm 4 can be reached both from the main routine (Algorithm 2, lines 13–14) and as part of the backtracking process (Algorithm 3, lines 6–7). If the algorithm is reached via backtracking, it is possible that the affected codeword is now an anchor. In this case, we use the convention to trust the anchor's decision about the bit and not apply any changes. In all other cases, apart from actually flipping the bit (line 2), error correction triggers a status change (lines 3–9). If the bit flip affects a frozen codeword, the codeword is unfrozen and we remove the conflicts that the codeword is involved in.
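A sketch of the error-correction step follows, with bits represented by the pair of component codewords that protect them (as in Sec. II-A). The handling of anchors reached via backtracking and of frozen codewords follows the text above; resetting a codeword with status 2 to status 1 after one of its bits changes is our assumption about the remaining status transitions, not something stated in this section.

```python
ANCHOR, ELIGIBLE, FAILED, FROZEN = 0, 1, 2, 3  # status values of Sec. III-A

def correct(initiator, affected, status, conflicts, flipped, via_backtracking=False):
    """Error-correction step in the spirit of Algorithm 4 (a sketch).

    flipped: set of currently flipped bits, each identified by the frozenset of
    the two component codewords that protect it.
    """
    if via_backtracking and status[affected] == ANCHOR:
        return                             # trust the anchor's decision about the bit
    flipped ^= {frozenset((initiator, affected))}   # flip (or un-flip) the shared bit
    if status[affected] == FROZEN:         # its bits changed: unfreeze, drop conflicts
        for other in conflicts[affected]:
            conflicts[other].discard(affected)
        conflicts[affected].clear()
        status[affected] = ELIGIBLE
    elif status[affected] == FAILED:       # assumption: retry BDD after a bit change
        status[affected] = ELIGIBLE

status = {(1, 0): ANCHOR, (2, 1): FROZEN, (2, 2): FAILED, (1, 7): ANCHOR}
conflicts = {(1, 0): set(), (2, 1): {(1, 7)}, (2, 2): set(), (1, 7): {(2, 1)}}
flipped = set()
correct((1, 0), (2, 1), status, conflicts, flipped)
correct((1, 0), (2, 2), status, conflicts, flipped)
correct((2, 2), (1, 7), status, conflicts, flipped, via_backtracking=True)  # no effect
print(len(flipped), status)
```

Using a symmetric difference for `flipped` means that applying the same flip twice (once in Step 3 and once during backtracking) restores the original bit value.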
III-F Generalized Product Codes
In principle, anchor-based decoding can be applied to an arbitrary GPC. Indeed, Algorithms 2–4 are independent of the underlying GPC. The global code structure manifests itself only through the set of error locations. This set is defined in terms of the affected component codewords, which implicitly uses the code structure.
Compared to PCs, the main difference for other GPCs is that Algorithm 1 has to be replaced by a version that is appropriate for the specific GPC. For anchor-based decoding, Algorithm 1 simply specifies the order in which the component codewords are traversed during the iterative decoding.
Example 9.
Note that staircase codes have more than two types of component codewords. In particular, the codeword types indicate the position of the component codewords in the staircase code array. For GPCs that do not admit a description in terms of a finite number of codeword types (e.g., tightly-braided block codes), one may simply use the convention that each component codeword forms its own type.
Anchor-based decoding can also be applied to GPCs that are based on component codes with different lengths and/or error-correcting capabilities. For example, PCs can be defined such that different component codes are used to protect the rows and columns of the code array. More generally, the component codes may even vary across rows and columns, leading to irregular PCs [33, 34]. While Algorithms 2–4 are agnostic to such changes, it may be beneficial to adopt different conflict thresholds for different component codes in these cases.
IV Simulation Results
In this section, we present and discuss simulation results assuming different BCH component codes. For the conflict threshold, we tested different values for a wide variety of component codes and BSC crossover probabilities. In all cases, was found to give the best performance. Hence, is assumed in the following.
IV-A BCH Codes with $t = 2$
Double-error-correcting BCH codes are of particular interest because they can be decoded very efficiently in hardware [35, 16]. On the other hand, the performance of the resulting PCs is significantly affected by miscorrections. This was shown in Example 5 in Sec. II-D.
Recall that Example 5 uses a BCH component code with parameters and decoding iterations. For these parameters, the BER of the anchor-based decoding is shown in Fig. 2 by the green line (circles). The algorithm closely approaches the performance of the idealized iterative BDD in the waterfall regime. Moreover, virtually miscorrection-free performance is achieved in the error-floor regime.
In order to quantify the performance gain with respect to iterative BDD, we use the net coding gain (NCG). To that end, assume that a coding scheme with code rate $R$ achieves a BER of $p_\mathrm{out}$ on a BSC with crossover probability $p$. The NCG (in dB) is then defined as
$$\mathrm{NCG} = 20 \log_{10}\!\left(\operatorname{erfc}^{-1}(2 p_\mathrm{out})\right) - 20 \log_{10}\!\left(\operatorname{erfc}^{-1}(2 p)\right) + 10 \log_{10} R. \tag{8}$$
The NCG assumes binary modulation over an additive white Gaussian noise channel and measures the difference in required $E_\mathrm{b}/N_0$ between uncoded transmission and coded transmission using the coding scheme under consideration. As an example, it can be seen from Fig. 2 that the iterative BDD and the anchor-based decoding achieve a BER of at approximately and , respectively. The code rate in both cases is . Hence, the respective NCGs are given by dB and dB. The proposed algorithm thus achieves an NCG improvement of approximately dB, as indicated by the arrow in Fig. 2.
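Definition (8) can be evaluated with the Python standard library. Since `math` provides no inverse of `erfc`, the sketch below uses the identity $\sqrt{2}\,\operatorname{erfc}^{-1}(2p) = Q^{-1}(p)$ and `statistics.NormalDist.inv_cdf`; the $\sqrt{2}$ factors cancel in the difference of the two logarithms. The operating point in the usage line is hypothetical and not read off Fig. 2.

```python
from math import log10
from statistics import NormalDist

def q_inv(p):
    """Inverse Gaussian Q-function; note Q^{-1}(p) = sqrt(2) * erfc^{-1}(2p)."""
    return -NormalDist().inv_cdf(p)

def ncg_db(rate, p_out, p):
    """Net coding gain (8) in dB. The sqrt(2) factors relating erfc^{-1} and
    Q^{-1} appear in both logarithms and cancel."""
    return 20 * log10(q_inv(p_out)) - 20 * log10(q_inv(p)) + 10 * log10(rate)

# Hypothetical operating point, not read off Fig. 2.
print(f"{ncg_db(rate=0.8, p_out=1e-15, p=1e-2):.2f} dB")
```

The NCG improvement between two decoders with the same rate and target BER then reduces to the difference of the $-20 \log_{10} Q^{-1}(p)$ terms at their respective crossover probabilities.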
As suggested in [4], correcting only one error in the first iteration of iterative BDD gives some moderate performance improvements. This trick can also be used in combination with the anchor-based decoding. In that case, a decoding failure (in line 2 of Algorithm 2) also occurs when BDD is successful but more than one error location is found. Since the status of such component codewords is set to 2, it is important to reset the status to 1 after the first iteration in order to continue the decoding with the full error-correction capability. For the PC in Example 5 with iterations, we found that correcting only one error in the first iteration does not lead to noticeable performance improvements for the anchor-based decoding. Some small improvements can be obtained, however, by increasing the total number of iterations to and correcting only one error in the first 5 iterations. This is shown by the green dotted line in Fig. 2. It is important to stress that without the gradual increase of $t$, the BER for and is virtually the same. Thus, the improvement shown in Fig. 2 is indeed due to the artificial restriction of the error-correcting capability.
Next, we provide a direct comparison with the results presented in [16]. In particular, the authors propose a hardware architecture for a PC that is based on a BCH component code with parameters . The BCH code is further shortened by 61 bits, leading to an effective length of and dimension . The shortening gives a desired code rate of . The number of decoding iterations is set to . For these parameters, BER results are shown in Fig. 4 for iterative BDD, anchor-based decoding, and idealized iterative BDD (labeled “w/o PP”). As before, the outcome of DE and the error floor prediction via (7) are shown by the dashed black lines as a reference. Compared to the results shown in Fig. 2, the anchor-based decoding approaches the performance of the idealized iterative BDD even more closely, and virtually miscorrection-free performance is achieved for BERs below . This can be attributed to the quite extensive code shortening, which reduces the probability of miscorrecting compared to an unshortened component code.
IV-B BCH Codes with t > 2
For BCH component codes with an error-correcting capability larger than 2, the error floor is generally beyond the reach of our software simulations. Hence, we focus on the performance improvements that can be obtained in the waterfall regime.
In Fig. 5, we show the BER achieved by two different PCs. The first PC is based on a BCH component code with parameters . For these parameters, miscorrections significantly degrade the performance. Consequently, there is a large performance gap between iterative BDD and idealized iterative BDD. The anchor-based decoding partially closes this gap and achieves an NCG improvement of around dB over iterative BDD at . The second PC is based on a BCH component code with parameters . Here, the miscorrection probability is reduced by approximately a factor of 16 compared to the first PC. Hence, there is only a small gap between iterative BDD and idealized iterative BDD, which the anchor-based decoding closes almost completely. The NCG improvement at in this case, however, is limited to around dB.
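NCG values of this kind follow the usual definition for hard-decision decoding over the BSC, NCG = 20 log10(erfc⁻¹(2·BER_out)) − 20 log10(erfc⁻¹(2p)) + 10 log10(R). Because erfc⁻¹(2x) = Q⁻¹(x)/√2, the factor √2 cancels and the NCG can be computed with the inverse Gaussian tail function alone, as in the sketch below. The function names and the numbers in the usage line are illustrative assumptions, not results from this paper.

```python
from math import log10
from statistics import NormalDist


def q_inv(p: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = P[N(0, 1) > x]."""
    return -NormalDist().inv_cdf(p)


def net_coding_gain_db(p_in: float, ber_out: float, rate: float) -> float:
    """NCG in dB of a hard-decision code of rate `rate` that achieves the
    output BER `ber_out` at BSC crossover probability `p_in`."""
    return 20 * log10(q_inv(ber_out) / q_inv(p_in)) + 10 * log10(rate)


# Illustrative numbers only: a rate-239/255 code reaching BER 1e-15
# at a crossover probability of 4.67e-3.
print(f"{net_coding_gain_db(4.67e-3, 1e-15, 239 / 255):.2f} dB")
```

Shifting the BER curve of a decoder to a larger crossover probability at the same target BER therefore translates directly into a larger NCG.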
V Post-Processing
If the anchor-based decoding terminates unsuccessfully, some form of PP can be used to continue the decoding. In this section, we discuss two PP techniques, which we refer to as bit-flip-and-iterate PP and algebraic-erasure PP. Both techniques have been studied in the literature as a means to lower the error floor of various GPCs under conventional iterative BDD [28, 9, 21, 29, 30, 26].
V-A Methods
Let denote the set of failed component codewords of type after an unsuccessful decoding attempt. That is, and are, respectively, the row and column codewords that still have a nonzero syndrome after decoding iterations. The intersection of these codewords defines a set of bit positions according to
(9) 
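In code, the set (9) is simply the Cartesian product of the indices of the failed row and column codewords. The sketch below assumes that a bit is addressed by its (row, column) index and that the failed codewords are given as index sets; the function name is hypothetical.

```python
def intersection_positions(failed_rows: set[int], failed_cols: set[int]) -> set[tuple[int, int]]:
    """All bit positions (i, j) where a failed row codeword i crosses a
    failed column codeword j."""
    return {(i, j) for i in failed_rows for j in failed_cols}


# Hypothetical example: 2 failed rows and 3 failed columns give 6 positions.
positions = intersection_positions({1, 4}, {2, 5, 7})
print(sorted(positions))
```

If either index set is empty, the intersection is empty as well.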
For bit-flip-and-iterate PP, the bits in the intersection (9) are first flipped, after which the iterative decoding is resumed for one or more iterations. The intuition behind this approach is that, after the iterative decoding, the equivalent channel for the bits in the intersection (9) is essentially a BSC with a very high crossover probability. Flipping the bits therefore converts this channel into a BSC whose crossover probability is one minus the original one, i.e., small, and iterative decoding may then resolve the remaining errors. Bit-flip-and-iterate PP has been applied to PCs [28, 29], half-product codes [30], and staircase codes [26].
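The effect of the bit flipping can be illustrated on a toy error pattern. The sketch assumes that the all-zero codeword was transmitted, so that the array directly holds the error pattern, and it omits the subsequent iterative decoding; all names and sizes are hypothetical.

```python
def flip_intersection(errors, failed_rows, failed_cols):
    """Return a copy of the array with every bit in the intersection of the
    failed rows and failed columns flipped."""
    flipped = [row[:] for row in errors]
    for i in failed_rows:
        for j in failed_cols:
            flipped[i][j] ^= 1
    return flipped


def count_errors(errors):
    return sum(map(sum, errors))


n = 8
failed_rows, failed_cols = [0, 3, 5], [1, 2, 6]
errors = [[0] * n for _ in range(n)]
for i in failed_rows:          # mark all 9 intersection bits as errors ...
    for j in failed_cols:
        errors[i][j] = 1
errors[0][1] = errors[5][6] = 0  # ... except two, so 7 of 9 are wrong

after = flip_intersection(errors, failed_rows, failed_cols)
print(count_errors(errors), count_errors(after))  # 7 2
```

If most bits in the intersection are in error, flipping all of them leaves only the few that were correct, which the resumed iterative decoding can then attempt to correct.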
For algebraic-erasure PP, the bits in the intersection (9) are instead treated as erasures. An algebraic erasure decoder is then used to recover the bits. Assuming that there are no miscorrected codewords, algebraic-erasure PP provably succeeds as long as either or holds. This type of PP has been applied to braided codes in [9] and to half-product codes in [21].
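Our reading of why algebraic-erasure PP works without miscorrections: every bit outside the intersection (9) belongs to a row or column codeword with zero syndrome and is therefore correct, so each failed column codeword sees only erasures, one per failed row (and vice versa). A component code with minimum distance d can fill in up to d − 1 erasures by solving its parity-check equations. The sketch below does this by Gaussian elimination over GF(2); it uses the (7,4) Hamming code (d = 3) as a toy stand-in for a BCH component code, and the function name is hypothetical. A practical decoder would use an algebraic BCH erasure decoder instead.

```python
from itertools import combinations

# Parity-check matrix of the (7,4) Hamming code (minimum distance 3),
# used here as a toy stand-in for a BCH component code.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def erasure_decode(H, received):
    """Fill in erased positions (None) so that H x = 0 over GF(2).

    Returns the completed word, or None if the erased bits are not
    uniquely determined or the known bits are inconsistent.
    """
    erased = [j for j, b in enumerate(received) if b is None]
    # Augmented system H_E x_E = H_K x_K (mod 2), K = known positions.
    rows = []
    for h in H:
        rhs = sum(h[j] * b for j, b in enumerate(received) if b is not None) % 2
        rows.append([h[j] for j in erased] + [rhs])
    # Gauss-Jordan elimination over GF(2); the pivot of column col ends up in row col.
    for col in range(len(erased)):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None  # erased bits not uniquely determined
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
    if any(row[-1] for row in rows[len(erased):]):
        return None  # inconsistent: the known bits contain errors
    decoded = list(received)
    for col, j in enumerate(erased):
        decoded[j] = rows[col][-1]
    return decoded


codeword = [1, 1, 1, 0, 0, 0, 0]
recovered = all(
    erasure_decode(H, [None if j in pair else b for j, b in enumerate(codeword)]) == codeword
    for pair in combinations(range(7), 2)
)
print(recovered)  # True: any d - 1 = 2 erasures are filled in
print(erasure_decode(H, [None, None, None, 0, 0, 0, 0]))  # None
```

The last line shows that three erasures are not always recoverable for d = 3, which mirrors why the number of failed row or column codewords must stay small.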
V-B Post-Processing for Anchor-Based Decoding
In principle, the above PP techniques can be applied after the anchor-based decoding without any changes. However, the effectiveness of algebraic-erasure PP can be improved by exploiting additional information that is available in the anchor-based decoding. In particular, recall that the error locations of anchors must be stored in case they are backtracked. If the anchor-based decoding fails, these error locations can be used for PP as follows. Assume that we have determined the sets and . Then, one can check for anchors that satisfy the condition
(10) 
where , i.e., anchors that have corrected errors whose error locations all lie within the set of failed component codewords. For algebraic-erasure PP, we found that it is beneficial to include such anchors in the respective sets and (even though they have zero syndrome), because such component codewords are likely to be miscorrected.
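A simplified check in the spirit of condition (10) is sketched below for row anchors: an anchor is flagged if it corrected at least one bit and every corrected bit lies in a failed column codeword. The data layout (a dict from anchor row index to the column indices of its corrected bits) and the function names are hypothetical, and any additional requirement in (10) on the number of corrected errors is not modeled.

```python
def is_suspect_anchor(error_cols: list[int], failed_cols: set[int]) -> bool:
    """True if the anchor corrected at least one bit and every corrected bit
    lies in a failed column codeword."""
    return bool(error_cols) and all(j in failed_cols for j in error_cols)


def find_suspect_anchors(anchor_errors: dict[int, list[int]], failed_cols: set[int]) -> list[int]:
    """Row indices of anchors that pass the overlap check."""
    return sorted(i for i, cols in anchor_errors.items() if is_suspect_anchor(cols, failed_cols))


# Hypothetical data: row anchor -> column indices of its corrected bits.
anchor_errors = {3: [2, 7], 5: [2, 4], 9: []}
print(find_suspect_anchors(anchor_errors, failed_cols={2, 7}))  # [3]
```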
Note that additional component codewords should only be included in the sets and as long as or remains satisfied. Therefore, if more component codewords satisfy (10) than these constraints allow, we perform the inclusion at random. A better, but also more complex, approach would be to perform the algebraic erasure decoding multiple times with all possible combinations of component codewords in the sets , and those satisfying (10).
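The random inclusion can be sketched as follows. Here `max_size` is a hypothetical parameter standing in for the size constraint stated above, and the helper name is likewise illustrative.

```python
import random


def include_suspects(failed: set[int], suspects: list[int], max_size: int,
                     rng: random.Random) -> set[int]:
    """Add suspected miscorrected codewords to a failed set at random,
    without letting the set grow beyond max_size."""
    extended = set(failed)
    candidates = sorted(set(suspects) - extended)
    room = max(0, max_size - len(extended))
    extended.update(rng.sample(candidates, min(room, len(candidates))))
    return extended


rng = random.Random(0)
# Hypothetical: 2 failed codewords, 4 suspects, room for 2 more.
print(sorted(include_suspects({1, 2}, [5, 6, 7, 8], max_size=4, rng=rng)))
```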
Remark 5.
For bit-flip-and-iterate PP, we found that the same strategy may, in fact, degrade the performance, in particular if the number of decoding iterations is relatively low. For small , the decoding often terminates unsuccessfully with only a few errors remaining. These errors are then easily corrected using the conventional bit-flip-and-iterate PP. In that case, it is counterproductive to include additional component codewords in the sets and .
V-C Example for BCH Codes with t = 2
Consider again the PC based on a shortened BCH code with parameters studied in [16] (see Fig. 4). The authors of [16] also apply bit-flip-and-iterate PP in order to reduce the error floor. The simulation data was provided to us by the authors, and the results are reproduced for convenience in Fig. 4 by the red line (squares) labeled “w/ PP” (the same results are shown in [16, Fig. 3]). For the anchor-based decoder, we instead propose to use algebraic-erasure PP as described in the previous subsection. In order to estimate its performance, we first consider algebraic-erasure PP for the idealized iterative BDD without miscorrections. In this case, the dominant stopping set is of size 18 and involves 6 row and 6 column codewords. An example of this stopping set is shown by the black crosses in Fig. 3(c), where the involved rows and columns are indicated by the dotted lines. It is pointed out in [26] that the multiplicity of such a stopping set can be obtained from existing counting formulas for the number of binary matrices with given row and column weights [36]. In particular, there exist 297,200 binary matrices of size 6 × 6 with uniform row and column weight 3 [36, Table 1]. This gives a multiplicity of for this stopping set. The error floor can then be estimated using (7) with . This is shown in Fig. 4 by the dashed black line labeled “error floor w/ PP” and can be verified using the idealized iterative BDD including PP. The performance of anchor-based decoding including algebraic-erasure PP is also shown in Fig. 4 and virtually overlaps with the performance of idealized iterative BDD for BERs below . Overall, the improvements translate into an additional NCG of around dB at a BER of over iterative BDD with bit-flip-and-iterate PP.
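The count of 297,200 can be reproduced with a short dynamic program over rows that tracks the current column weights. The sketch also forms a multiplicity by choosing the 6 rows and 6 columns out of n; this binomial factor is our assumption of how the count enters the multiplicity, and the function names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations
from math import comb


def count_regular_binary_matrices(m: int, w: int) -> int:
    """Number of m x m binary matrices with all row and column weights equal to w."""
    row_patterns = list(combinations(range(m), w))

    @lru_cache(maxsize=None)
    def count(row: int, col_weights: tuple) -> int:
        # Ways to fill rows row..m-1 given the current column weights.
        if row == m:
            return int(all(c == w for c in col_weights))
        total = 0
        for pattern in row_patterns:
            if all(col_weights[j] < w for j in pattern):
                updated = list(col_weights)
                for j in pattern:
                    updated[j] += 1
                total += count(row + 1, tuple(updated))
        return total

    return count(0, (0,) * m)


def stopping_set_multiplicity(n: int, m: int = 6, w: int = 3) -> int:
    """Assumed multiplicity: choose m of n rows and m of n columns, then a pattern."""
    return comb(n, m) ** 2 * count_regular_binary_matrices(m, w)


print(count_regular_binary_matrices(6, 3))  # 297200
```

The same routine returns 90 for 4 × 4 matrices with row and column weight 2, which is a quick check on the recursion.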
Without the modification described in Sec. V-B, the performance of algebraic-erasure PP after the anchor-based decoding would be slightly worse. Indeed, we found that a dominant error event of the anchor-based decoder for and is such that 3 row (or column) decoders miscorrect toward the same estimated error pattern of weight 6. This scenario is illustrated in Fig. 3(d). We did not encounter similar error events for the conventional iterative BDD. This indicates that the anchor-based decoder introduces a slight bias toward an estimated error pattern once it is “anchored”. For the error event shown in Fig. 3(d), 6 column codewords are not decodable, whereas all row codewords are decoded successfully with zero syndrome. This implies that the set is empty and therefore the bit intersection (9) is empty as well. Hence, conventional PP (both bit-flip-and-iterate and algebraic-erasure) would fail. On the other hand, with high probability, condition (10) holds for all 3 miscorrected row codewords. Condition (10) may also hold for correctly decoded anchors. However, there is no harm in including up to two additional correctly decoded row codewords in the set .
VI Implementation Complexity
One of the main advantages of iterative BDD compared to message-passing decoding is the significantly reduced decoder data flow [7]. In this section, we briefly review the product decoder architecture in [7] and discuss the potential increase in implementation complexity for the proposed anchor-based decoding. The design of a full hardware implementation, however, is beyond the scope of this paper.
The product decoder architecture discussed in [7] consists of three main parts or units (cf. [7, Fig. 3]): a data storage unit for the product code array, a syndrome storage unit, and a BCH component decoder unit. Based on this architecture, the authors argue that the internal data flow (in bits/s) between these units can be used as a surrogate for the implementation complexity of iterative BDD of PCs. The main contributors to the overall decoder data flow are as follows:
• Initially, the syndromes of all component codewords are computed from the received data bits and stored in the syndrome storage.
• During iterative decoding, syndromes are loaded from the syndrome storage and used by the BCH component decoder unit for the component decoding.
• After a successful component decoding, the syndromes in the syndrome storage are updated based on the found error locations. Moreover, the corresponding bits in the data storage unit are flipped.
A key aspect of this architecture is that information between component codewords is exchanged entirely through their syndromes. This is very efficient at high code rates, where syndromes can be seen as a compressed representation of the component codewords. Moreover, each successful component decoding affects at most syndromes.
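The syndrome-domain exchange can be illustrated with a toy PC whose rows and columns use the (7,4) Hamming code. Flipping bit (i, j) adds column j of the parity-check matrix H to the syndrome of row i and column i of H to the syndrome of column j, so only two stored syndromes change. This is a minimal sketch of the principle under our own assumptions (component code, data layout, names), not the architecture of [7].

```python
import random

# Parity-check matrix of the (7,4) Hamming code; a toy component code
# used for both the rows and the columns of a 7 x 7 PC.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
n = len(H[0])


def syndrome(word):
    """Syndrome H * word over GF(2), as a tuple."""
    return tuple(sum(h[k] * word[k] for k in range(n)) % 2 for h in H)


def column(array, j):
    return [array[i][j] for i in range(n)]


def flip_bit(array, row_syn, col_syn, i, j):
    """Flip bit (i, j) and update only the two syndromes it belongs to."""
    array[i][j] ^= 1
    h_j = [h[j] for h in H]  # bit sits at position j of row codeword i
    h_i = [h[i] for h in H]  # and at position i of column codeword j
    row_syn[i] = tuple(s ^ b for s, b in zip(row_syn[i], h_j))
    col_syn[j] = tuple(s ^ b for s, b in zip(col_syn[j], h_i))


rng = random.Random(1)
array = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
row_syn = [syndrome(array[i]) for i in range(n)]
col_syn = [syndrome(column(array, j)) for j in range(n)]
for i, j in [(0, 3), (4, 4), (6, 1)]:
    flip_bit(array, row_syn, col_syn, i, j)
# The incrementally updated syndromes match a full recomputation.
print(row_syn == [syndrome(array[i]) for i in range(n)],
      col_syn == [syndrome(column(array, j)) for j in range(n)])  # True True
```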
In principle, anchor-based decoding can also operate entirely in the syndrome domain, thereby leveraging the same syndrome compression as iterative BDD. In particular, the initial syndrome computation can be performed in the same fashion as for iterative BDD. The syndromes are loaded before line 2 of Algorithm 2 is executed, and syndrome updates are triggered by line 2 of Algorithm 4.
While the syndrome-domain implementation can be kept intact, we comment in the following on some of the implementation differences between iterative BDD and anchor-based decoding.
VI-1 Component code status
Anchor-based decoding uses a status value for each component code. Status changes occur after BDD (lines 15 and 19 in Algorithm 2), after backtracking (line 8 in Algorithm 3), and after applying bit flips (lines 3–6 in Algorithm 4). This leads to an apparent complexity increase compared to a straightforward implementation of iterative BDD. On the other hand, even for iterative BDD, it is common to introduce some form of status information for each component codeword. For example, a status flag is often used to indicate whether the syndrome of a particular component code has changed since the last decoding attempt [7], which avoids decoding the same syndrome multiple times. The product decoder architecture in [16] also features a status flag to indicate that a particular component code failed in the last decoding iteration. Therefore, the slightly more involved status handling of the anchor-based decoding should not lead to a drastic complexity increase compared to practical implementations of iterative BDD.
VI-2 Error locations
Additional storage is needed to keep track of the error locations of each anchor codeword in case of backtracking (lines 6–7 in Algorithm 3). Since each individual error location can be specified using +1 bits, the total extra storage required for all error locations is +1) bits.
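A back-of-the-envelope estimate of this storage is sketched below. It assumes that all 2n row and column codewords of an n × n PC may become anchors, that each stores up to t error locations, and that each location takes ⌈log2 n⌉ + 1 bits; this last value is our assumption for the elided expression above and can be overridden via the hypothetical `bits_per_location` parameter.

```python
from math import ceil, log2


def error_location_storage_bits(n, t, bits_per_location=None):
    """Extra bits needed to store up to t error locations for each of the
    2n row and column codewords of an n x n PC."""
    if bits_per_location is None:
        # Assumption: ceil(log2 n) bits for the position plus one extra bit.
        bits_per_location = ceil(log2(n)) + 1
    return 2 * n * t * bits_per_location


print(error_location_storage_bits(255, 2))  # 9180
```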
VI-3 Conflict sets
Additional storage is also needed to store the conflicts between component codewords. For a error-correcting component code, there can be at most conflicts for each frozen component codeword. Moreover, for a conflict threshold of , it is sufficient to keep track of a single conflict per anchor codeword. Taking the larger of these two values, the conflict set size therefore has to be . The extra storage required is thus the same as for the error locations, i.e., bits. One possibility to reduce the required storage is to keep track of only a single conflict for each frozen component codeword and ignore all other conflicts. This would also lead to a very simple implementation of the loops in Algorithm 3 (line 1) and Algorithm 4 (line 7). On the other hand, this may also lead to a small performance loss.
VII Conclusion
We have shown that the performance of product codes can be significantly improved by adopting a novel iterative decoding algorithm. The proposed algorithm uses anchor codewords to reliably detect and prevent miscorrections. Depending on the component code parameters and the BSC crossover probability, anchor-based decoding can closely approach the performance of idealized iterative BDD, in which a genie avoids all miscorrections. Moreover, the algorithm can be applied to a large variety of related code classes that are used in practice, e.g., staircase or braided codes.
References
[1] P. Elias, “Error-free coding,” IRE Trans. Inf. Theory, vol. 4, no. 4, pp. 29–37, Apr. 1954.
 [2] A. J. Feltström, D. Truhachev, M. Lentmaier, and K. S. Zigangirov, “Braided block codes,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2640–2658, Jul. 2009.
[3] J. Justesen, K. J. Larsen, and L. A. Pedersen, “Error correcting coding for OTN,” IEEE Commun. Mag., vol. 48, no. 9, pp. 70–75, Sep. 2010.
 [4] J. Justesen, “Performance of product codes and related structures with iterated decoding,” IEEE Trans. Commun., vol. 59, no. 2, pp. 407–415, Feb. 2011.
[5] M. Scholten, T. Coe, and J. Dillard, “Continuously-interleaved BCH (CI-BCH) FEC delivers best in class NECG for 40G and 100G metro applications,” in Proc. Optical Fiber Communication Conf. (OFC), San Diego, CA, 2010.
 [6] H. D. Pfister, S. K. Emmadi, and K. Narayanan, “Symmetric product codes,” in Proc. Information Theory and Applications Workshop (ITA), San Diego, CA, 2015.
 [7] B. P. Smith, A. Farhood, A. Hunt, F. R. Kschischang, and J. Lodge, “Staircase codes: FEC for 100 Gb/s OTN,” J. Lightw. Technol., vol. 30, no. 1, pp. 110–117, Jan. 2012.
 [8] N. Abramson, “Cascade decoding of cyclic product codes,” IEEE Trans. Commun. Tech., vol. 16, no. 3, pp. 398–402, Jun. 1968.
[9] Y.-Y. Jian, H. D. Pfister, K. R. Narayanan, R. Rao, and R. Mazahreh, “Iterative hard-decision decoding of braided BCH codes for high-speed optical communication,” in Proc. IEEE Glob. Communication Conf. (GLOBECOM), Atlanta, GA, 2014.
[10] Y.-Y. Jian, H. D. Pfister, and K. R. Narayanan, “Approaching capacity at high rates with iterative hard-decision decoding,” IEEE Trans. Inf. Theory, vol. 63, no. 9, pp. 5752–5773, Sep. 2017.
 [11] A. Farhoodfar, F. R. Kschischang, A. Hunt, B. P. Smith, and J. Lodge, “Staircase forward error correction coding,” US Patent 8,751,910 B2, 2011.
 [12] L. M. Zhang and F. R. Kschischang, “Staircase codes with 6% to 33% overhead,” J. Lightw. Technol., vol. 32, no. 10, pp. 1999–2002, May 2014.
 [13] C. Häger, A. Graell i Amat, H. D. Pfister, A. Alvarado, F. Brännström, and E. Agrell, “On parameter optimization for staircase codes,” in Proc. Optical Fiber Communication Conf. (OFC), Los Angeles, CA, 2015.
 [14] C. Häger, H. D. Pfister, A. Graell i Amat, and F. Brännström, “Density evolution for deterministic generalized product codes on the binary erasure channel at high rates,” IEEE Trans. Inf. Theory, vol. 63, no. 7, pp. 4357–4378, Jul. 2017.
 [15] ——, “Density evolution and error floor analysis of staircase and braided codes,” in Proc. Optical Fiber Communication Conf. (OFC), Anaheim, CA, 2016.
[16] C. Condo, P. Giard, F. Leduc-Primeau, G. Sarkis, and W. J. Gross, “A 9.96 dB NCG FEC scheme and 164 bits/cycle low-complexity product decoder architecture,” IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications (accepted for publication), 2017. [Online]. Available: https://arxiv.org/pdf/1610.06050v2.pdf
[17] H. C. Chang, C. B. Shung, and C. Y. Lee, “A Reed–Solomon product-code (RS-PC) decoder chip for DVD applications,” IEEE J. Solid-State Circuits, vol. 36, no. 2, pp. 229–238, Feb. 2001.
[18] J. Kim, N. Hardavellas, K. Mai, B. Falsafi, and J. C. Hoe, “Multi-bit error tolerant caches using two-dimensional error coding,” in Proc. ACM/IEEE Int. Symp. Microarchitecture (MICRO), 2007.
[19] V. Tam Van and S. Mita, “A novel error correcting system based on product codes for future magnetic recording channels,” IEEE Trans. Magnetics, vol. 56, no. 10, p. 2010, Oct. 2011.
 [20] C. Yang, Y. Emre, and C. Chakrabarti, “Product code schemes for error correction in MLC NAND flash memories,” IEEE Trans. VLSI Systems, vol. 20, no. 12, pp. 2302–2314, Dec. 2012.
[21] S. Emmadi, K. R. Narayanan, and H. D. Pfister, “Half-product codes for flash memory,” in Proc. Non-Volatile Memories Workshop, San Diego, CA, 2015.
 [22] M. Schwartz, P. Siegel, and A. Vardy, “On the asymptotic performance of iterative decoders for product codes,” in Proc. IEEE Int. Symp. Information Theory (ISIT), Adelaide, SA, 2005.
[23] J. Justesen and T. Høholdt, “Analysis of iterated hard decision decoding of product codes with Reed–Solomon component codes,” in Proc. IEEE Information Theory Workshop (ITW), Tahoe City, CA, 2007.
[24] L. M. Zhang, D. Truhachev, and F. R. Kschischang, “Spatially-coupled split-component codes with bounded-distance component decoding,” in Proc. IEEE Int. Symp. Information Theory (ISIT), Hong Kong, 2015.
[25] D. Truhachev, A. Karami, L. Zhang, and F. Kschischang, “Decoding analysis accounting for miscorrections for spatially-coupled split-component codes,” in Proc. IEEE Int. Symp. Information Theory (ISIT), Barcelona, Spain, 2016.
[26] L. Holzbaur, H. Bartz, and A. Wachter-Zeh, “Improved decoding and error floor analysis of staircase codes,” in Proc. Int. Workshop on Coding and Cryptography (WCC), Saint Petersburg, Russia, 2017.
[27] R. J. McEliece and L. Swanson, “On the decoder error probability for Reed–Solomon codes,” IEEE Trans. Inf. Theory, vol. 32, no. 5, pp. 701–703, Sep. 1986.
 [28] S. Sridharan, M. Jarchi, and T. Coe, “Product code based forward error correction system,” US Patent 6,810,499 B2, 2003.
[29] C. Condo, G. Sarkis, P. Giard, and W. J. Gross, “Stall pattern avoidance in polynomial product codes,” in Proc. IEEE Global Conf. Signal and Information Processing (GlobalSIP), Washington, DC, 2016.
[30] T. Mittelholzer, T. Parnell, N. Papandreou, and H. Pozidis, “Improving the error-floor performance of binary half-product codes,” in Proc. Int. Symp. Information Theory and its Applications (ISITA), Monterey, CA, 2016.
[31] B. P. Smith, “Error-correcting codes for fibre-optic communication systems,” Ph.D. dissertation, University of Toronto, 2011.
[32] C. Häger and H. D. Pfister, “Miscorrection-free decoding of staircase codes,” in Proc. European Conf. Optical Communication (ECOC), Gothenburg, Sweden, 2017.
 [33] S. Hirasawa, M. Kasahara, Y. Sugiyama, and T. Namekawa, “Modified product codes,” IEEE Trans. Inf. Theory, vol. 30, no. 2, pp. 299–306, Mar. 1984.
 [34] M. Alipour, O. Etesami, G. Maatouk, and A. Shokrollahi, “Irregular product codes,” in Proc. IEEE Information Theory Workshop (ITW), Lausanne, Switzerland, 2012.
[35] D. Gorenstein, W. W. Peterson, and N. Zierler, “Two-error correcting Bose–Chaudhuri codes are quasi-perfect,” Inf. Control, vol. 3, no. 3, pp. 291–294, 1960.
[36] B.-Y. Wang and F. Zhang, “On the precise number of (0,1)-matrices in U(R,S),” Discrete Mathematics, vol. 187, pp. 211–220, 1998.
Appendix A Density Evolution for Product Codes
In this appendix, we briefly describe how to reproduce the DE results that are shown in Figs. 2, 4, and 5.
Consider a BCH component code with parameters . The goal is to predict the waterfall BER of the PC under iterative BDD as a function of the BSC crossover probability . Recall that the component code length is and the number of iterations is , where each iteration consists of two half-iterations in which the row and column codewords are decoded separately. We define , , and . Further, let for odd and for even, where . Finally, let be the tail probability of a Poisson random variable with mean . With these definitions, we recursively compute
(11) 
for and , using , , as initial values. Collecting all final values in a vector as , the BER is then approximated as
(12) 
where is the number of 1s in .
The above procedure can be applied to compute the asymptotic performance for a wide variety of GPCs by simply adjusting the code parameters and , and the decoding schedule . For example, for staircase codes, the matrix is an matrix with entries for and zeros elsewhere. For more details on this topic, we refer the interested reader to [14].
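The Poisson tail probability used in this appendix, together with a simplified scalar recursion, can be sketched as follows. The recursion x ← c·P[Poisson(x) ≥ t], started from x = c with c = n·p the mean number of channel errors per component codeword, is our assumption of the simplest single-position form of this kind of DE; it is not a reconstruction of (11) or (12), and the function names are hypothetical.

```python
from math import exp, factorial


def poisson_tail(mean: float, t: int) -> float:
    """P[X >= t] for X ~ Poisson(mean)."""
    if t <= 0:
        return 1.0
    if mean < 1.0:
        # Sum the upper tail directly; 1 - CDF would cancel catastrophically.
        return sum(exp(-mean) * mean**i / factorial(i) for i in range(t, t + 40))
    return 1.0 - sum(exp(-mean) * mean**i / factorial(i) for i in range(t))


def scalar_de(c: float, t: int, half_iterations: int) -> float:
    """Simplified DE for a PC: x <- c * P[Poisson(x) >= t], starting at x = c.

    c is the mean number of channel errors per component codeword (n * p).
    """
    x = c
    for _ in range(half_iterations):
        x = c * poisson_tail(x, t)
    return x


for c in (1.0, 5.0):
    print(c, scalar_de(c, t=2, half_iterations=30))
```

For t = 2, c = 1 drives x to zero (decoding succeeds), whereas c = 5 settles at a nonzero fixed point (decoding stalls).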