# Memory Efficient Quasi-Cyclic Spatially Coupled LDPC Codes

## Abstract

In this paper we propose the construction of Spatially Coupled Low-Density Parity-Check (SC-LDPC) codes using a periodic time-variant Quasi-Cyclic (QC) algorithm. The QC-based approach is optimized for memory-efficient storage of the parity-check matrix in the decoder. A hardware model of the parity-check storage units has been designed for a Xilinx FPGA to compare the logic and memory requirements of the various approaches. It is shown that the proposed QC SC-LDPC code (with optimization) can be stored with reasonable logic resources and without the need for block memory in the FPGA. In addition, a significant improvement in processing speed is achieved.

## I Introduction

Spatially coupled Low-Density Parity-Check (SC-LDPC) codes, which can be thought of as a class of terminated LDPC convolutional codes [1], have recently drawn significant interest in channel coding due to their excellent sum-product decoding thresholds [2, 3, 4, 5]. This threshold performance is achieved at large code lengths (generally over 100K), which presents a significant implementation challenge. The practical implementation of such large LDPC codes is a well-known problem, particularly the storage of the parity-check matrix in hardware [6], which can be achieved efficiently only for very structured parity-check matrices. The structure of the matrix also significantly affects the implementation complexity of the encoder and decoder. Quasi-Cyclic (QC) LDPC matrices [7, 8, 9] have proven advantages over unstructured (random) matrices in design complexity and in the encoding process [10, 11]. They also enable collision-free parallel processing in the decoder [12].

Convolutional LDPC codes can be of two types: time-variant and time-invariant [1]. Recently, there have been studies on deriving time-variant and time-invariant LDPC convolutional codes by unwrapping QC LDPC block codes [13]. It has also been noted that the time-invariant codes are less complex for implementation but the decoding performance is poor compared to time-variant codes [14].

In this paper, we present a special case of periodic time-variant SC-LDPC codes using a QC construction technique. The inherent advantages of QC based codes and the diagonal structure of the SC-LDPC matrix are exploited to reduce the complexity of the decoder by reusing the circulants in the matrix. When using sum-product decoding, a critical factor in the decoder performance is the girth of the code in the Tanner graph. Consequently, in this paper we will study the impact of the period of time-varying SC-LDPC codes on the girth of their Tanner graphs. A comprehensive analysis is carried out to evaluate the performance in terms of bit error rate (BER) and hardware implementation complexity of these memory-optimized QC SC-LDPC codes. The FPGA resource requirements and speed of operation are compared by implementing a hardware model of SC-LDPC codes using QC [15] and progressive edge growth (PEG) [16] techniques. Memory efficiency and speed improvements achievable by using the proposed optimized QC SC-LDPC codes are also presented.

Here we briefly introduce SC-LDPC codes; more details can be found in [17, 18, 5, 19]. A sample Tanner graph structure of SC-LDPC codes (as defined in [5]) is shown in Fig. 1. The SC-LDPC code starts with the protograph of a standard -regular LDPC code, where and denote the average bit degree and check degree respectively. Fig. 1(a) shows a regular LDPC protograph for and . There are bit nodes, shown as circles, and check nodes, shown as squares, in the base protograph, where gcd stands for greatest common divisor. (When a spatially coupled code cannot be constructed using this method.)

For an SC-LDPC ensemble, a coupled chain of of these protographs (see Fig. 1(b)) is formed by repeating the standard protograph times and connecting it once to each of the protographs to its right. There are extra check nodes added when forming the coupled chain of protographs and this reduces the rate of the resulting spatially coupled chain when compared to the original LDPC protograph code. The rate loss of course diminishes as is increased.

A particular code of length is formed from the () protograph chain by creating copies of every node and edge in the coupled chain.

(1) |

(2) |

where, and are the number of bit and check nodes in the original protograph respectively.

## II Memory-Efficient Quasi-Cyclic SC-LDPC codes

The quasi-cyclic (QC) construction of LDPC codes has various advantages, including simple encoding [10], parallel decoding and memory-efficient storage of the matrix elements in the decoder [12]. To gain the same advantages, it is desirable to have SC-LDPC codes in QC form as well. A dual-core programmable QC LDPC convolutional decoder has been designed in [20]. However, the design uses a time-invariant code whose performance is poor compared to time-variant convolutional LDPC codes [13]. The construction of non-periodic time-varying QC SC-LDPC codes is presented in [15, 21]: [15] discusses the performance improvements achieved by spatial coupling, particularly for quantum LDPC codes, whereas [21] presents techniques to improve the upper bound on the minimum Hamming distance of members of the QC sub-ensembles. Recently, a time-varying periodic convolutional LDPC decoder that achieves a throughput of 2.0 Gb/s has been designed from a QC LDPC block code [14].

In contrast to the above variations of convolutional LDPC codes, this paper proposes an innovative technique to construct time-varying QC SC-LDPC code with periodicity, , and also without the need of a QC LDPC block code. It is shown that by using periodic codes (), the BER performance is as good as that of non-periodic time-varying codes. As an added advantage to the reduced complexity, we also show that there is a significant reduction in the hardware requirements of the decoder, especially the Block RAMs (BRAM) in an FPGA.

When designing an LDPC decoder, the structure of the LDPC matrix must be stored in hardware to carry out the decoding process. The structure is normally stored in BRAMs because of the large amount of data required, particularly for large LDPC codes [22]. Compared to codes constructed using PEG algorithms [16], the QC-based technique requires significantly less memory to store the matrix structure, because only the circulants need to be stored instead of the entire matrix. However, a special circulant-processing block is needed to cyclically shift the identity matrix for a given circulant and so realize the complete LDPC matrix in the decoder hardware [23]. Therefore, decoders using QC-based LDPC matrices require less memory but use additional logic elements for these special operations.
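To make the storage trade-off concrete, the sketch below contrasts the two representations for a single circulant permutation block. It assumes the convention that row `r` of a circulant with shift `s` has its one at column `(r + s) mod N`; the function names, the size `N = 8` and the shift value are illustrative choices, not taken from the hardware model.

```python
def explicit_positions(shift, size):
    """Store the column index of every row, as an unstructured matrix would."""
    return [(row + shift) % size for row in range(size)]


def cyclic_shift_address(row, shift, size):
    """Regenerate one column index on the fly from the single stored shift."""
    return (row + shift) % size


N = 8      # circulant size (illustrative)
SHIFT = 3  # the only value a QC decoder has to store for this block

# Unstructured storage keeps N entries; QC storage keeps one shift value
# and pays for it with a modular addition per access.
table = explicit_positions(SHIFT, N)
regenerated = [cyclic_shift_address(row, SHIFT, N) for row in range(N)]
print(table)
print(regenerated == table)
```

The modular addition plays the role of the circulant-processing logic: memory is traded for a small amount of arithmetic.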

The QC SC-LDPC matrix consists of a chain of circulants making up the protographs. For a -regular code, (i.e., a rate code) each protograph in the chain corresponds to a set of individual circulants.

The staircase structure of SC-LDPC codes offers a unique opportunity to optimize the QC-based technique and further reduce the hardware resources required for the decoder. We investigate whether a fixed set of circulants in the matrix (i.e., fewer than independent circulant sets) can be reused without affecting the decoding performance. To this end, we consider SC-LDPC codes constructed with one, two and three repeating sets of circulants, that is, time-varying QC SC-LDPC codes with periodicity 1, 2 and 3. Fig. 2(a), Fig. 2(b) and Fig. 2(c) show one, two and three repeating sets, respectively, for a rate QC SC-LDPC code. Each unique circulant is represented by a unique letter of the alphabet, and repeated circulants are represented by the same letter. In the given example, the QC SC-LDPC codes with periodicity 1, 2 and 3 require only 6, 12 and 18 circulants, respectively. Thus the QC SC-LDPC code is completely specified by a very small number of shift values (6, 12 or 18), significantly reducing the memory required to store the SC-LDPC matrix in the decoder.
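The reuse idea can be sketched as a small table of shift values. The example below is a sketch under stated assumptions: each chain position contributes two bit-node columns, each connected to three consecutive check rows. This layout was chosen only because it yields six circulants per set, matching the counts quoted above. The function name, parameters and circulant size are hypothetical.

```python
import random


def build_shift_matrix(num_positions, period, bits_per_proto=2, coupling=3,
                       circulant_size=64, seed=0):
    """Return (matrix, stored): matrix holds shift values, None marks a zero block."""
    rng = random.Random(seed)
    # One set of shifts per period index: bits_per_proto * coupling values each.
    sets = [[[rng.randrange(circulant_size) for _ in range(coupling)]
             for _ in range(bits_per_proto)]
            for _ in range(period)]
    rows = num_positions + coupling - 1
    cols = num_positions * bits_per_proto
    matrix = [[None] * cols for _ in range(rows)]
    for t in range(num_positions):
        shifts = sets[t % period]  # reuse: position t takes set number t mod period
        for b in range(bits_per_proto):
            for k in range(coupling):
                matrix[t + k][t * bits_per_proto + b] = shifts[b][k]
    stored = sum(len(column) for shift_set in sets for column in shift_set)
    return matrix, stored


for period in (1, 2, 3):
    _, stored = build_shift_matrix(num_positions=12, period=period)
    print(period, stored)  # prints "1 6", then "2 12", then "3 18"
```

Setting `period` equal to `num_positions` gives the non-periodic case, where the number of stored shifts grows with the chain length instead of staying fixed.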

### II-A The effect of circulant reuse on code girth

Tanner [7] and Fossorier [9] have shown that QC LDPC codes defined by an array of circulant matrices have a maximum possible girth of 12. Protograph LDPC codes, such as SC-LDPC codes, allow zero matrices in place of some circulant matrices and so are not limited to this bound. However, for these more general LDPC codes Kim et al. in [24] have shown that the girth of the QC code can be bounded by the girth of its base protograph. Kim et al. defined inevitable cycles in a QC-LDPC code as those cycles that always exist in the code regardless of the choice of circulant permutations. They then found all the subgraph patterns of protographs which lead to inevitable cycles of size up to girth 20.

It is easy to see that the submatrix

exists in the base matrix of any SC-LDPC code with and any code with that has . Thus, applying the results of [24], these SC-LDPC codes have inevitable cycles of length 12 and hence an upper bound of 12 on their girth.

Kim et al. also showed that a protograph with girth cannot contain inevitable cycles of length smaller than . Thus if the circulants are large enough and chosen appropriately, a QC SC-LDPC code with girth can be found. However, choosing the shift value of the circulants in QC-LDPC codes to avoid all non-inevitable cycles, and thus obtain a girth equal to the minimum inevitable cycle length, is not trivial and few algebraic constructions for QC girth 12 codes have been found. One notable exception is an algebraic construction for a class of (3,5)-regular QC-LDPC codes of Tanner [7] which obtain girth 12 for circulant matrices for certain choices of prime [25].

For a QC SC-LDPC code with the reuse of columns of circulants as in Fig. 2, the choice of circulants is restricted by the reuse factor and so we define reuse- inevitable cycles as those cycles that always exist in the code when circulant permutations are reused, regardless of the choice of those circulant permutations.

Adapting the notation of [9], we define the parity-check matrix for a general SC-LDPC code as shown in Fig. 3,

where is a circulant matrix with a shift value . I.e. represents the circulant permutation matrix with a one at column for row , , and zero elsewhere. For notational clarity, we assume that the SC-LDPC code has , however the girth results hold for all .

A cycle of length in is described by a sequence of positions such that: 1) each consecutive position is obtained by changing, alternatively, the row or column index of the previous position; and 2) all positions are distinct, except the first and last ones. Thus two consecutive positions in any cycle belong to distinct circulant permutation matrices which are either in the same row, or in the same column. A length cycle exists if and only if [9]:

(3) |

where , , and

Given this notation we can now define the inevitable cycles in the QC SC-LDPC codes with circulant reuse.

###### Lemma 1

A QC SC-LDPC code with reuse , and has a reuse-1 inevitable cycle of length and thus a maximum girth of .

###### Lemma 2

A QC SC-LDPC code with reuse , and has a reuse-2 inevitable cycle of length and thus a maximum girth of .

###### Lemma 3

A QC SC-LDPC code with reuse , and has a reuse-3 inevitable cycle of length and thus a maximum girth of .

Although the proof only requires that we find one 6-cycle (respectively 8-cycle or 10-cycle) which exists for any choices of circulants, in fact every column of (and hence every codeword bit) is involved in 6-cycles (respectively 8-cycles or 10-cycles) in QC SC-LDPC codes with reuse-1 (respectively reuse-2 or 3) regardless of which circulants are chosen.

Consequently, the reuse-1 codes not only have a poor girth but also have a very large number of cycles of the minimum length. Similarly, for the reuse-2 codes, a girth of 8 is not necessarily problematic for LDPC codes if there are only a few such cycles, but the very large number of 8-cycles in the QC SC-LDPC codes is certainly detrimental to the performance of the sum-product decoder when longer codes are considered.

## III Performance of SC-LDPC codes

The QC SC-LDPC codes were simulated to evaluate the BER performance on a binary input additive white Gaussian noise (BI-AWGN) channel. We used multi-edge density evolution to compute the threshold of SC-LDPC codes, shown in Fig. 4, and noted that is necessary to achieve thresholds better than that of standard LDPC codes and that codes with have an improved threshold over codes with as is increased. Also, as SC-LDPC codes are known to have good performance for very long codes, we compared decoding performance with code length 25K, 100K and 250K as shown in Fig. 5. Given these results, a fairly long SC-LDPC code of 103,200-bits () with , , and is considered for our following simulation results. However, we note that even longer codes would perform noticeably better. Simulations were carried out using software models on a BI-AWGN channel. The sum-product algorithm was used for decoding with a maximum of 1000 iterations, and the simulation was run until at least 50 word errors were accumulated.
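The simulation loop described above can be sketched as follows. This is a minimal sketch, not the software model used for the reported results. It assumes unit-energy BPSK with noise variance 1/(2·R·Eb/N0), transmits the all-zero codeword (sufficient for a symmetric channel and decoder), and plugs in a hard-decision placeholder where a sum-product decoder would go. All names and default values are hypothetical except the target of 50 word errors.

```python
import math
import random


def biawgn_llrs(codeword, ebno_db, rate, rng):
    """Map bits to BPSK (0 -> +1, 1 -> -1), add noise, return channel LLRs."""
    sigma = math.sqrt(1.0 / (2.0 * rate * 10 ** (ebno_db / 10.0)))
    return [2.0 * ((1.0 - 2.0 * bit) + rng.gauss(0.0, sigma)) / sigma ** 2
            for bit in codeword]


def run_until_word_errors(decode, length, ebno_db, rate,
                          target_errors=50, max_words=10_000, seed=1):
    """Simulate words until target_errors word errors (or max_words) are seen."""
    rng = random.Random(seed)
    word_errors = bit_errors = words = 0
    zero = [0] * length  # the all-zero word belongs to every linear code
    while word_errors < target_errors and words < max_words:
        decoded = decode(biawgn_llrs(zero, ebno_db, rate, rng))
        errors = sum(decoded)  # any 1 is a bit error against the all-zero word
        bit_errors += errors
        word_errors += 1 if errors > 0 else 0
        words += 1
    return bit_errors / (words * length), word_errors, words


def hard_decision(llrs):
    """Placeholder decoder: sign of the LLR only (stands in for sum-product)."""
    return [0 if llr >= 0 else 1 for llr in llrs]


ber, word_errors, words = run_until_word_errors(hard_decision, 100, 0.0, 0.5)
print(ber, word_errors, words)
```

Stopping on a fixed number of word errors rather than a fixed number of words keeps the statistical confidence of the estimate roughly constant across signal-to-noise ratios.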

Fig. 6 shows a comparison of QC SC-LDPC codes with different levels of circulant reuse. In the construction process for all codes, the circulants are chosen to avoid girth 4 in the resulting matrix. As expected from the girth limitations, the quasi-cyclic matrices with reuse of one circulant column (i.e., time-invariant) and two circulant columns (periodic time-varying with period 2) perform poorly. However, by reusing three circulant columns (periodic time-varying with period 3), the BER performance obtained is as good as that of standard (non-periodic time-varying) QC SC-LDPC codes down to a bit error rate of .

Lastly, in Fig. 7 we compare the BER performance of the QC SC-LDPC codes with non-quasi-cyclic SC-LDPC codes and with standard LDPC block codes. The non-quasi-cyclic SC-LDPC codes are constructed using a PEG algorithm modified for spatial coupling. The standard LDPC codes are constructed according to PEG [26] and QC [9] techniques. From Fig. 7, it is clear that the waterfall regions of SC-LDPC codes are better than standard LDPC codes (as predicted by density evolution, see Fig. 4). Also, the performance of the QC based SC-LDPC codes is very similar to that of PEG based codes.

## IV Estimation of hardware requirements for using SC-LDPC codes

The new QC SC-LDPC codes are compared with random SC-LDPC codes by designing a decoder hardware model. The model consists of the units that are essential for storing the contents of the LDPC matrix and generating the appropriate address locations in a decoder. The block diagram of the hardware model is shown in Fig. 8. The hardware model takes a clock and a synchronous reset as inputs. It also consists of an address counter and an output controller for sequencing the input and output data, respectively. The bit and check node address generators are responsible for storing the LDPC matrix information and generating appropriate addresses for the decoder. Note that the QC SC-LDPC codes require an additional Cyclic Shift unit to generate appropriate addresses based on the circulants for decoding, as shown in Fig. 8, whereas PEG-based SC-LDPC codes do not require any such unit, since the complete set of matrix elements is stored in hardware memory.

The hardware models for both PEG and QC SC-LDPC codes have been designed in Verilog HDL and synthesized for FPGA implementation. The designs are placed and routed for a Xilinx Kintex-7 (XC7K355T) FPGA with LDPC codes of length 100K and 25K. The estimates of FPGA hardware requirements and the maximum clock frequency achievable for the designs are shown in Table I. As expected, the PEG-based codes use a large number of BRAMs to store the matrix elements compared to the QC SC-LDPC codes. With slightly increased logic units, namely registers and look-up tables (LUTs), the standard QC codes offer a significant saving of BRAMs (up to 43 times in the case of 100K code lengths) compared to PEG SC-LDPC codes. As stated earlier, the increased logic requirements are due to the Cyclic Shift unit in the QC-based SC-LDPC hardware models. Further, the QC SC-LDPC codes with reuse-3 do not require BRAMs at all, because the small set of reusable circulants in the code can easily be stored in the LUTs. Eliminating the BRAMs (which normally have large access delays) also results in a significant improvement in the speed of operation (up to a 40% increase in the maximum operating clock frequency) compared to PEG or standard QC SC-LDPC codes (for 100K codes).

| LDPC codes | Std-PEG | Std-PEG | Std-QC | Std-QC | QC (Reuse-3) |
|---|---|---|---|---|---|
| Code length | 100K | 25K | 100K | 25K | 100K / 25K |
| Registers | 286 | 268 | 320 | 302 | 322 |
| LUTs | 185 | 170 | 890 | 819 | 609 |
| Slices | 143 | 139 | 357 | 356 | 214 |
| RAM | 528 | 116 | 12 | 12 | - |
| Clock (MHz) | 188 | 215 | 180 | 182 | 264 |

## V Conclusion

This paper has presented the construction of periodic time-varying SC-LDPC codes using the quasi-cyclic technique. It also demonstrates the threshold advantages achievable by SC-LDPC codes over standard LDPC codes. Memory optimized QC SC-LDPC codes are introduced to significantly reduce the complexity of the decoder for storing the matrix elements. It is shown that by reusing the circulant columns in the SC-LDPC matrix (with reuse-3), it is possible to obtain a memory efficient decoder without noticeably affecting the decoding performance. The advantages of using the optimized QC SC-LDPC codes have been demonstrated by designing a hardware model, which shows substantial reduction in memory requirements and a significant improvement in the operating clock frequency of the decoder.

### References

1. Jimenez Felstrom A, Zigangirov KS. Time-varying periodic convolutional codes with low-density parity-check matrix. IEEE Trans. Inform. Theory 1999; 45(6):2181–2191.
2. Tanner RM, Sridhara D, Sridharan A, Fuja TE, Costello DJ. LDPC block and convolutional codes based on circulant matrices. IEEE Trans. Inform. Theory 2004; 50(12):2966–2984.
3. Lentmaier M, Sridharan A, Zigangirov KS, Costello DJ. Terminated LDPC convolutional codes with thresholds close to capacity. Proc. IEEE Int. Symp. Inf. Theory, 2005; 1372–1376, doi:10.1109/ISIT.2005.1523567.
4. Lentmaier M, Sridharan A, Costello DJ, Zigangirov KS. Iterative decoding threshold analysis for LDPC convolutional codes. IEEE Trans. Inform. Theory Oct 2010; 56(10):5274–5289, doi:10.1109/TIT.2010.2059490.
5. Kudekar S, Richardson TJ, Urbanke RL. Threshold saturation via spatial coupling: Why convolutional LDPC ensembles perform so well over the BEC. IEEE Trans. Inform. Theory 2011; 57(2):803–834.
6. Chandrasetty VA, Aziz SM. A highly flexible LDPC decoder using hierarchical quasi-cyclic matrix with layered permutation. J. Networks, Academy Publisher 2012; 7(3):441–449.
7. Tanner RM, Sridhara D, Fuja TE. A class of group-structured LDPC codes. Proc. Int. Conf. Inf. Syst. Technol. Its Appli., 2001.
8. Johnson SJ, Weller SR. A family of irregular LDPC codes with low encoding complexity. IEEE Commun. Lett. February 2003; 7(2):79–81.
9. Fossorier MPC. Quasi-cyclic low-density parity-check codes from circulant permutation matrices. IEEE Trans. Inform. Theory 2004; 50(8):1788–1793.
10. Mahdi A, Kanistras N, Paliouras V. An encoding scheme and encoder architecture for rate-compatible QC-LDPC codes. IEEE Workshop on Signal Processing Systems, 2011; 328–333.
11. Xinmiao Z, Fang C. Efficient partial-parallel decoder architecture for quasi-cyclic nonbinary LDPC codes. IEEE Trans. Circuits and Systems 2011; 58(2):402–414.
12. Yongmei D, Ning C, Zhiyuan Y. Memory efficient decoder architectures for quasi-cyclic LDPC codes. IEEE Trans. Circuits and Systems 2008; 55(9):2898–2911.
13. Pusane AE, Smarandache R, Vontobel PO, Costello DJ. Deriving good LDPC convolutional codes from LDPC block codes. IEEE Trans. Inform. Theory 2011; 57(2):835–857.
14. Chiu-Wing S, Xu C, Lau FCM, Yue Z, Tam WM. A 2.0 Gb/s throughput decoder for QC-LDPC convolutional codes. IEEE Trans. Circuits and Systems 2013; 60(7):1857–1869.
15. Hagiwara M, Kasai K, Imai H, Sakaniwa K. Spatially coupled quasi-cyclic quantum LDPC codes. IEEE International Symposium on Information Theory, 2011; 638–642.
16. Zhengang C, Bates S. Construction of low-density parity-check convolutional codes through progressive edge-growth. IEEE Commun. Lett. 2005; 9(12):1058–1060.
17. Sridharan A, Truhachev D, Lentmaier M, Costello DJ, Zigangirov K. Distance bounds for an ensemble of LDPC convolutional codes. IEEE Trans. Inform. Theory December 2007; 53(12):4537–4555.
18. Lentmaier M, Fettweis GP, Zigangirov KS, Costello DJ. Approaching capacity with asymptotically regular LDPC codes. Proc. Inf. Theory and App. Workshop, 2009; 173–177, doi:10.1109/ITA.2009.5044941.
19. Kudekar S, Kasai K. Threshold saturation on channels with memory via spatial coupling. IEEE International Symposium on Information Theory Proceedings, 2011; 2562–2566.
20. Tavares MBS, Matus E, Kunze S, Fettweis GP. A dual-core programmable decoder for LDPC convolutional codes. IEEE International Symposium on Circuits and Systems, Seattle, WA, 2008; 532–535.
21. Mitchell DGM, Smarandache R, Lentmaier M, Costello DJ. Quasi-cyclic asymptotically regular LDPC codes. IEEE Information Theory Workshop, Dublin, 2010; 1–5.
22. Xiaoheng C, Shu L, Akella V. QSN: A simple circular-shift network for reconfigurable quasi-cyclic LDPC decoders. IEEE Trans. Circuits and Systems–II: Express Briefs 2010; 57(10):782–786.
23. Chandrasetty VA, Aziz SM. FPGA implementation of a LDPC decoder using a reduced complexity message passing algorithm. J. Networks, Academy Publisher 2011; 6(1):36–45.
24. Kim S, No JS, Chung H, Shin DJ. Quasi-cyclic low-density parity-check codes with girth larger than 12. IEEE Trans. Inform. Theory 2007; 53(8):2885–2891.
25. Kim S, No JS, Chung H, Shin DJ. On the girth of Tanner (3, 5) quasi-cyclic LDPC codes. IEEE Trans. Inform. Theory 2006; 52(4):1739–1744, doi:10.1109/TIT.2006.871060.
26. Xiao-Yu H, Eleftheriou E, Arnold DM. Progressive edge-growth Tanner graphs. IEEE Global Telecommunications Conference, vol. 2, 2001; 995–1001.