# Performance Analysis of Block Markov Superposition Transmission of Short Codes

Kechao Huang and Xiao Ma. This work was partially supported by the Program (No. CB) and the China NSF (No. 91438101 and No. 61172082). The authors are with the Department of Electronics and Communication Engineering, Sun Yat-sen University, Guangzhou, GD 510006, China (e-mail: hkech@mail2.sysu.edu.cn; maxiao@mail.sysu.edu.cn).
###### Abstract

In this paper, we consider the asymptotic and finite-length performance of block Markov superposition transmission (BMST) of short codes, which can be viewed as a new class of spatially coupled (SC) codes with the generator matrices of short codes (referred to as basic codes) coupled. A modified extrinsic information transfer (EXIT) chart analysis that takes into account the relation between mutual information (MI) and bit-error-rate (BER) is presented to study the convergence behavior of BMST codes. Using the modified EXIT chart analysis, we investigate the impact of various parameters on BMST code performance, thereby providing theoretical guidance for designing and implementing practical BMST codes suitable for sliding window decoding. Then, we present a performance comparison of BMST codes and SC low-density parity-check (SC-LDPC) codes on the basis of equal decoding latency. Also presented is a comparison of computational complexity. Simulation results show that, under the equal decoding latency constraint, BMST codes using the repetition code as the basic code can outperform -regular SC-LDPC codes in the waterfall region but have a higher computational complexity.

Index Terms: Block Markov superposition transmission (BMST), capacity-approaching codes, extrinsic information transfer (EXIT) chart analysis, sliding window decoding, spatial coupling.

## I Introduction

Low-density parity-check (LDPC) block codes (LDPC-BCs) [1], combined with iterative belief propagation (BP) decoding, are a class of capacity-approaching codes with decoding complexity that increases only linearly with block length [2]. A practical approach to improving the performance of LDPC-BCs is coupling together a series of disjoint graphs that specify the parity-check matrix of an LDPC-BC into a single coupled chain, thereby producing a spatially coupled LDPC (SC-LDPC) code. It has been shown in [3, 4, 5, 6] that SC-LDPC code ensembles exhibit a phenomenon called “threshold saturation”, which allows them to achieve the maximum a posteriori (MAP) thresholds of their underlying LDPC-BC ensembles on memoryless binary-input symmetric-output channels under BP decoding, and thus to achieve capacity by increasing the density of the parity-check matrix. Due to their excellent performance, SC-LDPC codes have recently received a great deal of attention in the literature  (see, e.g., [7, 8, 9, 10, 11, 12, 13, 14, 15] and the references therein).

The concept of spatial coupling is not limited to LDPC codes. Block Markov superposition transmission (BMST) of short codes [16, 17], for example, is equivalent to spatial coupling of the subgraphs that specify the generator matrices of the short codes. From this perspective, BMST codes are similar to braided block/convolutional codes [18, 19, 20], staircase codes [21], and SC turbo codes [22]. An encoder of a BMST code with encoding memory $m$ is shown in Fig. 1, where a BMST code can also be viewed as a serially concatenated code with a structure similar to repeat-accumulate-like codes [23, 24, 25]. The outer code is a short code, referred to as the basic code (not limited to repetition codes), that introduces redundancy, while the inner code is a rate-one block-oriented feedforward convolutional code (instead of a bit-oriented accumulator) that introduces memory between transmissions. Hence, BMST codes typically have very simple encoding algorithms. To decode BMST codes, a sliding window decoding algorithm with a tunable decoding delay can be used, as with SC-LDPC codes [26]. The construction of BMST codes is flexible [27, 28], in the sense that it applies to all code rates of interest in the interval (0,1). Further, BMST codes have near-capacity performance (observed by simulation) in the waterfall region of the bit-error-rate (BER) curve and an error floor (predicted by analysis) that can be controlled by the encoding memory.

On an additive white Gaussian noise channel (AWGNC), the well-known extrinsic information transfer (EXIT) chart analysis [29] can be used to obtain the iterative BP decoding threshold of LDPC-BC ensembles. In [30], a novel EXIT chart analysis was used to evaluate the performance of protograph-based LDPC-BC ensembles, and a similar analysis was used to find the thresholds of SC-LDPC codes with sliding window decoding in [31]. Unlike LDPC codes, the asymptotic BER of BMST codes with window decoding cannot be better than a corresponding genie-aided lower bound [16]. Thus, conventional EXIT chart analysis cannot be applied directly to BMST codes. In this paper, we propose a modified EXIT chart analysis that takes into account the relation between mutual information (MI) and BER, to study the convergence behavior of BMST codes and to predict the performance in the waterfall region of the BER curve. Simulation results confirm that the finite-length performance of BMST codes is consistent with the modified EXIT chart analysis. We also investigate the relationship between the basic code structure, the decoding delay, and the decoding performance of BMST codes when the decoding latency is fixed. Finally, we present a computational complexity comparison of BMST codes and SC-LDPC codes on the basis of equal decoding latency.

The rest of the paper is structured as follows. In Section II, we give a brief review of BMST codes. In Section III, we discuss the relation between BMST codes and protograph-based SC-LDPC codes. In Section IV, we propose a modified EXIT chart analysis of BMST codes. In Section V, we investigate the impact of various parameters on BMST code performance. Then, in Section VI, we present a performance comparison of BMST codes and SC-LDPC codes on the basis of equal decoding latency. A computational complexity comparison of BMST codes and SC-LDPC codes is also given in Section VI. Finally, some concluding remarks are given in Section VII.

## II Review of BMST Codes

### II-A Encoding of BMST Codes

Consider a BMST code using a rate $R = k/n$ binary basic code of length $n$ and dimension $k$. Let $\boldsymbol{u}^{(t)}$, $t = 0, 1, \ldots, L-1$, be $L$ blocks of data to be transmitted, where $\boldsymbol{u}^{(t)} \in \mathbb{F}_2^{k}$. Here, $L$ is called the coupling length. The encoding algorithm of a BMST code with encoding memory (coupling width) $m$ is described as follows (see Fig. 1), where $\boldsymbol{\Pi}_1, \ldots, \boldsymbol{\Pi}_m$ are interleavers of size $n$.

###### Algorithm 1

Encoding of BMST Codes

• Initialization: For $t < 0$, set $\boldsymbol{v}^{(t)} = \boldsymbol{0}$.

• Loop: For $t = 0, 1, \ldots, L-1$,

1. Encode $\boldsymbol{u}^{(t)}$ into $\boldsymbol{v}^{(t)}$ using the encoding algorithm of the basic code;

2. For $i = 1, 2, \ldots, m$, interleave $\boldsymbol{v}^{(t-i)}$ using the $i$-th interleaver $\boldsymbol{\Pi}_i$ into $\boldsymbol{w}^{(t,i)}$;

3. Compute $\boldsymbol{c}^{(t)} = \boldsymbol{v}^{(t)} + \sum_{i=1}^{m} \boldsymbol{w}^{(t,i)}$, which is taken as the $t$-th block of transmission.

• Termination: For $t = L, L+1, \ldots, L+m-1$, set $\boldsymbol{u}^{(t)} = \boldsymbol{0}$ and compute $\boldsymbol{c}^{(t)}$ following Loop.

Remark: To force the encoder of BMST codes to the zero state at the end of the encoding process, a tail consisting of $m$ blocks of the $k$-dimensional all-zero vector is added. This is different from SC-LDPC code encoders, where the tail is usually non-zero and depends on the encoded information bits (see Section IV of [32]). As a result, the termination procedure for BMST codes is much simpler than that for SC-LDPC codes.
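As a concrete illustration of the encoding loop, the following sketch encodes with the R[2,1] code (repetition) as the basic code over GF(2). The function names, list-of-bits representation, and random interleavers are illustrative choices, not the paper's implementation:

```python
import random

def bmst_encode(u_blocks, m, seed=0):
    """Illustrative BMST encoder (Algorithm 1) with the R[2,1] basic code:
    c(t) = v(t) + sum_{i=1..m} Pi_i(v(t-i)) over GF(2), terminated by a
    zero tail of m blocks."""
    rng = random.Random(seed)
    k = len(u_blocks[0])
    n = 2 * k                                            # R[2,1] doubles the length
    perms = {i: rng.sample(range(n), n) for i in range(1, m + 1)}

    def basic_encode(u):                                 # B-fold product of R[2,1]
        return u + u

    v_blocks, out = [], []
    for u in u_blocks + [[0] * k for _ in range(m)]:     # data blocks + zero tail
        v_blocks.append(basic_encode(u))
        w = list(v_blocks[-1])                           # current codeword v(t)
        for i in range(1, m + 1):                        # superimpose interleaved past
            if i < len(v_blocks):
                c = v_blocks[-1 - i]
                w = [w[j] ^ c[perms[i][j]] for j in range(n)]
        out.append(w)
    return out
```

For $L$ data blocks the encoder emits $L+m$ transmitted blocks, consistent with the rate expression in (1) below.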

The rate of the BMST code is

$$R_{\rm BMST} = \frac{Lk}{(L+m)n} = \frac{L}{L+m}R, \qquad (1)$$

which is slightly less than the rate $R$ of the basic code. However, similar to SC-LDPC codes, this rate loss becomes vanishingly small as $L \to \infty$.
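The rate expression (1) is easy to evaluate; a small sketch (names illustrative):

```python
def bmst_rate(L, m, k, n):
    """Rate of a BMST code from eq. (1): R_BMST = Lk / ((L+m)n) = L/(L+m) * R."""
    return L * k / ((L + m) * n)

# With the R[2,1] basic code (R = 1/2) and m = 2, the rate loss vanishes as L grows:
for L in (10, 100, 1000):
    print(L, bmst_rate(L, m=2, k=1, n=2))   # 0.4166..., 0.4901..., 0.4990...
```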

Though any code (linear or nonlinear) with a fast encoding algorithm and an efficient soft-in soft-out (SISO) decoding algorithm can be taken as the basic code, we focus in this paper on the use of the $B$-fold Cartesian product of a repetition (R) code or a single parity-check (SPC) code as the basic code, resulting in a BMST-R code or a BMST-SPC code, respectively. (Using codes constructed by time-sharing between the R code and the SPC code as the basic code, one can construct BMST-RSPC codes for a wide range of code rates; for more details, see [28].) Note that the overall code length of the basic code in this case is $nB$ and the overall dimension is $kB$.

### II-B Sliding Window Decoding of BMST Codes

BMST codes can be represented by a Forney-style factor graph, also known as a normal graph [33], where edges represent variables and vertices (nodes) represent constraints. All edges connected to a node must satisfy the specific constraint of the node. A full-edge connects to two nodes, while a half-edge connects to only one node. A half-edge is also connected to a special symbol, called a “dongle”, that denotes coupling to other parts of the transmission system (say, the channel or the information source) [33]. There are four types of nodes in the normal graph of BMST codes.

• Node +: All edges (variables) connected to node + must sum to the all-zero vector. The message updating rule at node + is similar to that of a check node in the factor graph of a binary LDPC code. The only difference is that the messages on the half-edges are obtained from the channel observations.

• Node $\boldsymbol{\Pi}$: The node $\boldsymbol{\Pi}_i$ represents the $i$-th interleaver, which interleaves or de-interleaves the input messages.

• Node =: All edges (variables) connected to node = must take the same (binary) values. The message updating rule at node = is the same as that of a variable node in the factor graph of a binary LDPC code.

• Node G: All edges (variables) connected to node G must satisfy the constraint specified by the basic code . The message updating rule at node G can be derived accordingly, where the messages on the half-edges are associated with the information source.

The normal graph of a BMST code can be divided into $L+m$ layers, where each layer typically consists of a node of type G, a node of type =, several nodes of type $\boldsymbol{\Pi}$, and a node of type + (see Fig. 2). The result is a high-level normal graph, where each edge represents a sequence of random variables. Looking into the details, we can see that, at each layer, there are nodes whose degrees grow with the encoding memory $m$ (including half-edges) and nodes corresponding to the short code (R or SPC in this paper).

Similar to SC-LDPC codes, an iterative sliding window decoding algorithm with decoding delay $d$, working over a subgraph consisting of $d+1$ consecutive layers, can be implemented for BMST codes. An example of a window decoder with decoding delay $d$ operating on the normal graph of a BMST code is shown in Fig. 2. For each window position, the forward-backward decoding algorithm is implemented for updating the messages layer-by-layer within the decoding window. (For more details on the decoding algorithm of BMST codes, we refer the reader to Section III of [16].) Decoding proceeds until a fixed number of iterations has been performed or some given stopping criterion is satisfied, at which point the window shifts to the right by one layer and the symbols corresponding to the layer shifted out of the window are decoded. The first layer in any window is called the target layer.

### II-C Genie-Aided Lower Bound on BER

Let $f_{\rm BMST}(\gamma_b)$ represent the performance of a BMST code with encoding memory (coupling width) $m$ and coupling length $L$, where $f$ is the BER and $\gamma_b$ represents the received bit signal-to-noise ratio (SNR) on an AWGNC in dB, and let $f_{\rm Basic}(\gamma_b)$ represent the performance of the basic code. By assuming a genie-aided decoder, we can obtain a lower bound on the performance of BMST codes given by (see [16])

$$f_{\rm BMST}(\gamma_b) \geq f_{\rm Basic}\!\left(\gamma_b + 10\log_{10}(m+1) - 10\log_{10}(1+m/L)\right), \qquad (2)$$

where the term $10\log_{10}(m+1)$ depends on the encoding memory and the term $10\log_{10}(1+m/L)$ is due to the rate loss. In other words, a maximum coding gain over the basic code of $10\log_{10}(m+1)$ dB in the low BER (high SNR) region is achieved for large $L$. Intuitively, this bound can be understood by assuming that a codeword in the basic code is transmitted $m+1$ times without interference from other layers.
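The SNR shift applied to the basic-code performance curve in (2) can be sketched as follows (function name illustrative):

```python
import math

def genie_aided_shift_db(m, L):
    """SNR shift (dB) applied to the basic-code curve in the bound of eq. (2)."""
    return 10 * math.log10(m + 1) - 10 * math.log10(1 + m / L)

# The first term is the (m+1)-fold superposition gain; the second is the
# rate-loss penalty, which vanishes for large coupling length L:
print(genie_aided_shift_db(3, 1000))   # close to 10*log10(4), about 6.02 dB
```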

### II-D Design of Capacity-Approaching BMST Codes

Aided by the genie-aided lower bound, we can construct good codes at a target BER with any given code rate of interest by determining the required encoding memory $m$ as follows.

1. Take a code with the given rate as the basic code; to approach channel capacity, the code length should be sufficiently large;

2. From the performance curve of the basic code, find the SNR $\gamma_{\rm target}$ required to achieve the target BER;

3. Find the Shannon limit for the code rate, denoted by $\gamma_{\rm lim}$;

4. Determine the encoding memory $m$ by

$$m = \left\lceil 10^{(\gamma_{\rm target}-\gamma_{\rm lim})/10} - 1 \right\rceil, \qquad (3)$$

where $\lceil x \rceil$ represents the smallest integer greater than or equal to $x$.

The above procedure requires no optimization and hence can be easily implemented given that the performance curve is available, as is usually the case for short codes. (The basic code considered in this paper is a Cartesian product of a short code, where each codeword is indeed a cascade of separate and independent codewords from the short code. Thus, the performance of the basic code, which is the same as that of the involved short code, can easily be obtained.) Its effectiveness has been confirmed by construction examples in [16, 17, 27, 28]. The encoding memories required for some BMST codes to approach the corresponding Shannon limits at given target BERs are shown in Table I. As expected, the lower the target BER, the larger the required encoding memory.
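Step 4 can be sketched directly from (3) (names illustrative; the 6 dB example below is hypothetical, not a case from Table I):

```python
import math

def required_memory(gamma_target_db, gamma_lim_db):
    """Eq. (3): encoding memory needed so the genie-aided gain 10*log10(m+1)
    covers the gap between the basic code's required SNR and the Shannon limit."""
    return math.ceil(10 ** ((gamma_target_db - gamma_lim_db) / 10) - 1)

# A basic code needing 6 dB more than the Shannon limit requires m = 3,
# since 10*log10(3+1) ~ 6.02 dB just covers the gap:
print(required_memory(6.0, 0.0))   # 3
```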

## III BMST Codes as a Class of SC Codes

In this section, we show that BMST codes can be viewed as a class of SC codes, using an algebraic description as well as a graphical representation, and we compare the structure of BMST codes to SC-LDPC codes.

### III-A Matrix Representation

To describe an SC-LDPC code ensemble with coupling width (syndrome former memory) $m$ and coupling length $L$, we start with an $(L+m) \times L$ block matrix

$$\boldsymbol{B} = \begin{bmatrix}
\boldsymbol{B}_0 & & & \\
\boldsymbol{B}_1 & \boldsymbol{B}_0 & & \\
\vdots & \boldsymbol{B}_1 & \ddots & \\
\boldsymbol{B}_m & \vdots & \ddots & \boldsymbol{B}_0 \\
 & \boldsymbol{B}_m & \ddots & \boldsymbol{B}_1 \\
 & & \ddots & \vdots \\
 & & & \boldsymbol{B}_m
\end{bmatrix}, \qquad (4)$$

where all of the component submatrices $\boldsymbol{B}_i$ have non-negative integer entries. To construct an SC-LDPC code with good performance, we can replace each non-zero entry in $\boldsymbol{B}$ with a sum of that many nonoverlapping, randomly selected $M \times M$ permutation matrices and each zero entry with the $M \times M$ all-zero matrix, where the entries of $\boldsymbol{B}$ are typically small integers and the lifting factor $M$ is typically a large integer. The resulting SC-LDPC parity-check matrix is given by

$$\boldsymbol{H}_{\rm SC} = \begin{bmatrix}
\boldsymbol{H}_0(0) & & & \\
\boldsymbol{H}_1(1) & \boldsymbol{H}_0(1) & & \\
\vdots & \boldsymbol{H}_1(2) & \ddots & \\
\boldsymbol{H}_m(m) & \vdots & \ddots & \boldsymbol{H}_0(L-1) \\
 & \boldsymbol{H}_m(m+1) & \ddots & \boldsymbol{H}_1(L) \\
 & & \ddots & \vdots \\
 & & & \boldsymbol{H}_m(L+m-1)
\end{bmatrix}, \qquad (5)$$

where the blank spaces in $\boldsymbol{H}_{\rm SC}$ correspond to zeros and $\boldsymbol{H}_i(t)$ denotes the lifted submatrix that replaces $\boldsymbol{B}_i$ at time $t$, for $0 \leq i \leq m$.

In contrast to SC-LDPC codes, it is convenient to describe BMST codes using generator matrices. Let $\boldsymbol{G}_0$ be the generator matrix of a short code. To describe a BMST code ensemble with coupling width (encoding memory) $m$ and coupling length $L$, we start with the $L \times (L+m)$ matrix

$$\boldsymbol{A} = \begin{bmatrix}
1 & 1 & \cdots & 1 & & & \\
 & 1 & 1 & \cdots & 1 & & \\
 & & \ddots & \ddots & \ddots & \ddots & \\
 & & & 1 & 1 & \cdots & 1
\end{bmatrix}, \qquad (6)$$

which has constant weight $m+1$ in each row. This matrix $\boldsymbol{A}$ plays a similar role for constructing BMST codes as the matrix $\boldsymbol{B}$ does for constructing SC-LDPC codes. To construct a BMST code with good performance, each nonzero entry $A_{t,t+i}$ ($0 \leq t \leq L-1$ and $0 \leq i \leq m$) in $\boldsymbol{A}$ is replaced with a matrix $\boldsymbol{G}\boldsymbol{\Pi}_i$, where

$$\boldsymbol{G} = {\rm diag}\{\underbrace{\boldsymbol{G}_0, \cdots, \boldsymbol{G}_0}_{B\ {\rm copies}}\} \qquad (7)$$

is the generator matrix of the $B$-fold Cartesian product of the short code, the $\boldsymbol{\Pi}_i$ ($0 \leq i \leq m$) are randomly selected $n \times n$ permutation matrices, and the Cartesian product order $B$ is typically large. The resulting BMST code has length $(L+m)n$ and dimension $Lk$, and the generator matrix is given by

$$\boldsymbol{G}_{\rm BMST} = \begin{bmatrix}
\boldsymbol{G}\boldsymbol{\Pi}_0 & \boldsymbol{G}\boldsymbol{\Pi}_1 & \cdots & \boldsymbol{G}\boldsymbol{\Pi}_m & & \\
 & \boldsymbol{G}\boldsymbol{\Pi}_0 & \boldsymbol{G}\boldsymbol{\Pi}_1 & \cdots & \boldsymbol{G}\boldsymbol{\Pi}_m & \\
 & & \ddots & \ddots & \ddots & \ddots \\
 & & & \boldsymbol{G}\boldsymbol{\Pi}_0 & \boldsymbol{G}\boldsymbol{\Pi}_1 & \cdots & \boldsymbol{G}\boldsymbol{\Pi}_m
\end{bmatrix}. \qquad (8)$$
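A small sketch of this construction, building $\boldsymbol{G}_{\rm BMST}$ for a toy short code (all parameter values and function names are illustrative; a practical $B$ would be much larger):

```python
import random

def cartesian_product_generator(G0, B):
    """Generator of the B-fold Cartesian product: diag(G0, ..., G0), eq. (7)."""
    k0, n0 = len(G0), len(G0[0])
    G = [[0] * (B * n0) for _ in range(B * k0)]
    for b in range(B):
        for r in range(k0):
            for c in range(n0):
                G[b * k0 + r][b * n0 + c] = G0[r][c]
    return G

def bmst_generator(G0, B, m, L, seed=0):
    """Illustrative construction of G_BMST in eq. (8) over GF(2): row block t
    carries G*Pi_i in column block t+i, with random column permutations Pi_i."""
    rng = random.Random(seed)
    G = cartesian_product_generator(G0, B)
    k, n = len(G), len(G[0])
    perms = [rng.sample(range(n), n) for _ in range(m + 1)]   # Pi_0 .. Pi_m
    GB = [[0] * ((L + m) * n) for _ in range(L * k)]
    for t in range(L):
        for i in range(m + 1):
            for r in range(k):
                for c in range(n):
                    GB[t * k + r][(t + i) * n + perms[i][c]] = G[r][c]
    return GB
```

With G0 = [[1, 1]] (the R[2,1] code), B = 2, m = 1, and L = 3, the result is a 6 x 16 matrix, matching the stated size $Lk \times (L+m)n$.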

### III-B Graphical Representation

SC-LDPC code ensembles are often described in terms of a protograph, where an edge-spreading operation is applied to couple a sequence of disjoint block code protographs into a single chain [6]. Usually, no extra edges are introduced during the coupling process. In this paper, we describe the coupling process from a new perspective, where extra edges are allowed to be added. We believe that this new treatment is more general. For example, SC turbo codes [22] are obtained by adding edges to connect each turbo code graph to one or more nearby graphs in the chain. Based on this perspective, we can redescribe SC-LDPC codes as follows.

We start with a protograph for the submatrix $\boldsymbol{B}_0$, which has variable nodes and check nodes, where the $i$-th check node is connected to the $j$-th variable node by $[\boldsymbol{B}_0]_{i,j}$ edges. A short-hand protograph corresponding to $\boldsymbol{B}_0$ is shown in Fig. 3(a), where the node =⃝ represents the variable nodes, the node +⃝ represents the check nodes, and the edge labeled ${\rm B}_0$ represents a collection of edges. To distinguish, the edge labeled ${\rm B}_0$ is referred to as a super-edge of type ${\rm B}_0$, while a conventional edge in the full protograph is referred to as a simple edge. The short-hand protograph is then replicated $L$ times, as shown in Fig. 3(b), meaning that each codeword in the sequence of transmitted codewords independently satisfies the constraint given by $\boldsymbol{B}_0$. The disjoint graphs are then coupled by adding a super-edge of type ${\rm B}_i$ to bridge the variable node =⃝ at time $t$ and the check node +⃝ at time $t+i$, for $1 \leq i \leq m$, resulting in a single coupled chain corresponding to an SC-LDPC code ensemble with coupling length $L$ and coupling memory $m$. An example of such an SC-LDPC code ensemble is shown in Fig. 3(c). When lifting, each simple edge (not super-edge) is replaced by a bundle of $M$ edges (a permutation within the bundle is assumed), resulting in an SC-LDPC code whose length scales with the lifting factor $M$.

Similarly, BMST codes start with a protograph for the generator matrix $\boldsymbol{G}$, which has = nodes and + nodes, where the $i$-th = node is connected to the $j$-th + node if and only if $[\boldsymbol{G}]_{i,j} = 1$. A short-hand protograph corresponding to $\boldsymbol{G}$ is shown in Fig. 4(a), where the edge labeled ${\rm G}$ represents a super-edge of type ${\rm G}$. The protograph is then replicated $L$ times, as shown in Fig. 4(b), which can be considered as transmitting a sequence of codewords from the basic code corresponding to the generator matrix $\boldsymbol{G}$ independently at time instants $t = 0, 1, \ldots, L-1$. The disjoint graphs are coupled by adding a super-edge of type ${\rm G}$ to bridge the = node at time $t$ and the + node at time $t+i$, for $1 \leq i \leq m$, resulting in a single coupled chain corresponding to a BMST code ensemble with coupling length $L$ and coupling memory $m$. An example of a BMST code ensemble is shown in Fig. 4(c), whose equivalent form is shown in Fig. 4(d). When lifting, the super-edge of type ${\rm G}$ bridging the = node at time $t$ and the + node at time $t+i$ is replaced by a super-edge of type $\boldsymbol{G}\boldsymbol{\Pi}_i$, resulting in a BMST code with length $(L+m)n$.

### III-C Similarities and Differences

From the previous two subsections, we see that both SC-LDPC codes and BMST codes can be derived from a small matrix by replacing the entries with properly-defined submatrices. We also see that the generator matrix of BMST codes is similar in form to the parity-check matrix of SC-LDPC codes. SC-LDPC codes introduce memory by spatially coupling the basic parity-check matrices , while BMST codes introduce memory by spatially coupling the basic generator matrices . Further, we see from Fig. 3 and Fig. 4 that during the construction of both SC-LDPC codes and BMST codes, the memory is introduced by coupling the disjoint graphs together in a single chain, which is the fundamental idea of spatial coupling. Thus, BMST codes can be viewed as a class of SC codes.

## IV EXIT Chart Analysis of BMST Codes

Given the basic code with generator matrix $\boldsymbol{G}$, we can construct a sequence of BMST codes by choosing the Cartesian product order $B$. Now assume that the interleavers are chosen uniformly at random for each transmission. Then we have a sequence of code ensembles. The aim of EXIT chart analysis is to predict the performance behavior of the BMST codes as $B \to \infty$. In this section, we first discuss the issue that prevents the use of conventional EXIT chart analysis for BMST codes, and then we provide a modified EXIT chart analysis to study the convergence behavior of BMST codes with window decoding.

We consider binary phase-shift keying (BPSK) modulation over the binary-input AWGNC. To describe density evolution, it is convenient to assume that the all-zero codeword is transmitted and to represent the messages as log-likelihood ratios (LLRs). The threshold of protograph-based LDPC codes can be obtained based on a protograph-based EXIT chart analysis [30, 31] by determining the minimum value of the SNR such that the MI between the a posteriori message at a variable node and an associated codeword bit (referred to as the a posteriori MI for short) goes to 1 as the number of iterations increases, i.e., the BER at the variable nodes tends to zero as the number of iterations tends to infinity. At first glance, a similar iterative sliding window decoding EXIT chart analysis algorithm can be implemented over the normal graph (see Fig. 4(d)) of the BMST code ensemble to study the convergence behavior of BMST codes. However, as shown in (2), the high SNR performance of BMST codes with window decoding cannot be better than the corresponding genie-aided lower bound, which means that the a posteriori MI of BMST codes cannot reach 1 as the number of iterations tends to infinity. Thus, the conventional EXIT chart analysis cannot be applied directly to BMST codes. Fortunately, this can be amended by taking into account the relation between MI and BER [29]. Specifically, we need the convergence check described below in Algorithm 2. For convenience, the MI between the a priori input and the corresponding codeword bit is referred to as the a priori MI, the MI between the extrinsic output and the corresponding codeword bit is referred to as the extrinsic MI, and the MI between the channel observation and the corresponding codeword bit is referred to as the channel MI.

###### Algorithm 2

Convergence Check

• Let $I_A$ denote the a priori MI and $I_E$ the extrinsic MI. Then the a posteriori MI $I_{AP}$ is given by

$$I_{AP} = J\!\left(\sqrt{[J^{-1}(I_A)]^2 + [J^{-1}(I_E)]^2}\,\right), \qquad (9)$$

where the $J(\cdot)$ and $J^{-1}(\cdot)$ functions are given in [34]. As shown in Section III-C of [29], supposing that the a posteriori message is Gaussian, an estimate of the BER is then given by

$$p_{\rm est} = Q\!\left(J^{-1}(I_{AP})/2\right), \qquad (10)$$

where

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} \exp\!\left(-\frac{t^2}{2}\right){\rm d}t. \qquad (11)$$
• If the estimated BER $p_{\rm est}$ is less than some preselected target BER, a local decoding success is declared; otherwise, a local decoding failure is declared.
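Algorithm 2 can be sketched numerically. The $J(\cdot)$ function below is computed by direct integration of its defining integral rather than the approximations of [34], and all function names are illustrative:

```python
import math

def J(sigma, steps=2000):
    """MI between a BPSK bit and its Gaussian LLR (mean sigma^2/2, variance
    sigma^2), computed by trapezoidal integration over +/- 8 sigma."""
    if sigma <= 0:
        return 0.0
    mu = sigma * sigma / 2.0
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    h = (hi - lo) / steps
    acc = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        pdf = math.exp(-((x - mu) ** 2) / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))
        term = pdf * math.log2(1.0 + math.exp(-x))
        acc += term if 0 < i < steps else term / 2.0
    return 1.0 - acc * h

def J_inv(I):
    """Inverse of J by bisection (J is monotonically increasing in sigma)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if J(mid) < I:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def ber_estimate(I_A, I_E):
    """Eqs. (9)-(10): combine a priori and extrinsic MI into the a posteriori
    MI and estimate the BER under a Gaussian assumption, p_est = Q(sigma_AP/2)."""
    sigma_ap = math.sqrt(J_inv(I_A) ** 2 + J_inv(I_E) ** 2)
    return 0.5 * math.erfc(sigma_ap / (2.0 * math.sqrt(2.0)))   # Q(sigma_ap / 2)
```

With no a priori or extrinsic information the estimate is 0.5, and it falls toward zero as both MI values approach 1, matching the convergence behavior described above.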

For a fixed SNR $E_b/N_0$, the channel bit LLR corresponding to the binary-input AWGNC is Gaussian with variance [29]

$$\sigma_{\rm ch}^2 = 8R_{\rm BMST}\frac{E_b}{N_0}, \qquad (12)$$

where $R_{\rm BMST}$ is the rate of the BMST code. The channel MI is then given by

$$I_{\rm ch} = J(\sigma_{\rm ch}) = J\!\left(\sqrt{8R_{\rm BMST}E_b/N_0}\right). \qquad (13)$$
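Equations (12) and (13) amount to a one-line conversion (function name illustrative; $J(\cdot)$ itself is not re-implemented here):

```python
import math

def sigma_ch(R_bmst, ebn0_db):
    """Eq. (12): standard deviation of the channel bit LLR on the BI-AWGNC,
    with Eb/N0 given in dB."""
    return math.sqrt(8.0 * R_bmst * 10 ** (ebn0_db / 10.0))

# For a rate-1/2 BMST code at Eb/N0 = 0 dB: sigma_ch = sqrt(8 * 0.5 * 1) = 2.0,
# and the channel MI would then be I_ch = J(sigma_ch) per eq. (13).
print(sigma_ch(0.5, 0.0))   # 2.0
```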

The modified EXIT chart analysis algorithm of BMST codes, similar to the protograph-based EXIT chart analysis algorithm of SC-LDPC codes [31], can now be described as follows.

###### Algorithm 3

EXIT Chart Analysis of BMST Codes with Window Decoding

• Initialization: All messages over the half-edges (connected to the channel) at nodes + are initialized as $I_{\rm ch}$ according to (13), all messages over the half-edges (connected to the information source) at nodes G are initialized as 0, and all messages over the remaining (inter-connected) full-edges are initialized as 0. Set a maximum number of iterations.

• Sliding window decoding: For each window position, the decoding layers perform MI message processing/passing layer-by-layer according to the schedule

$$\fbox{+} \rightarrow \fbox{=} \rightarrow \fbox{G} \rightarrow \fbox{=} \rightarrow \fbox{+}.$$

After a fixed number of iterations, perform a convergence check using Algorithm 2. If a local decoding failure is declared, then window decoding terminates; otherwise, a local decoding success is declared, the window position is shifted, and decoding continues. A complete decoding success for a specific channel parameter and target BER is declared if and only if all target layers declare decoding successes.

Now we can define the iterative decoding threshold of a BMST code ensemble for a preselected target BER as the minimum value of the channel parameter that allows the decoder of Algorithm 3 to declare a complete decoding success, in the limit of large code lengths (i.e., $B \to \infty$).

## V Impact of Parameters on BMST Codes

In this section, we study the impact of various parameters (coupling width $m$, Cartesian product order $B$, and decoding delay $d$) on BMST codes. Three regimes are considered: (1) fixed $m$ and $B$, increasing $d$; (2) fixed $m$ and $d$, increasing $B$; and (3) fixed $B$, increasing $m$ (and hence $d$).

All simulations are performed assuming BPSK modulation and an AWGNC. In the computation of the asymptotic window decoding thresholds of BMST codes, we set a maximum number of iterations. We refer to the iterative decoding threshold simply as the threshold when this does not lead to ambiguity. In the simulations of finite-length performance, random interleavers (randomly generated but fixed) are used for encoding. The iterative sliding window decoding algorithm [16, Algorithm 3] for BMST codes is performed using a layer-by-layer updating schedule with a maximum iteration number of 18, and the entropy stopping criterion [35, 16] with a preselected threshold is employed.

### V-A Fixed $m$ and $B$, Increasing $d$

###### Example 1 (Asymptotic Performance)

Consider a BMST-R code ensemble with fixed $m$ and $B$. We calculate its window decoding thresholds for different preselected target BERs and different decoding delays. The calculated thresholds in terms of the SNR versus the preselected target BERs, together with the lower bound, are shown in Fig. 5(a), where we observe that

1. In the waterfall region (above a critical BER), the thresholds remain almost constant. However, once the critical BER is reached, the thresholds increase as the target BER decreases.

2. For a small decoding delay, the thresholds do not achieve the lower bound even in the high SNR region.

3. For a larger decoding delay, the thresholds match the lower bound in the high SNR region, suggesting that window decoding with a sufficiently large decoding delay is near optimal for BMST codes.

4. The error floor region threshold improves as the decoding delay increases, but it does not improve much further beyond a certain decoding delay.

Similar behavior has also been observed for BMST-SPC code ensembles, as shown in Fig. 5(b), where the thresholds of a BMST-SPC code ensemble decoded with different decoding delays are depicted.

The window decoding thresholds, corresponding to a preselected target BER (chosen because it represents a target BER commonly used in many practical applications), for the regular SC-LDPC code ensemble and the BMST-R code ensemble as a function of decoding delay are shown in Fig. 6. We see that, similar to the SC-LDPC code ensemble, the threshold of the BMST code ensemble improves as the decoding delay increases, and beyond a certain decoding delay it becomes better than that of the SC-LDPC code ensemble.

###### Example 2 (Finite-Length Performance)

Consider BMST-R codes with fixed $m$ and $B$. The BER performance of BMST-R codes decoded with different decoding delays is shown in Fig. 7(a), where we observe that

1. The BER performance of BMST-R codes decoded with different delays matches well with the corresponding window decoding thresholds in the high SNR region.

2. The BER performance in the waterfall region improves as the decoding delay increases, but it does not improve much further beyond a certain decoding delay.

3. The error floor improves as the decoding delay increases, and it matches well with the lower bound once the decoding delay increases beyond a certain point.

These results are consistent with the asymptotic threshold performance analysis shown in Fig. 5(a).

Similar behavior has also been observed for BMST-SPC codes, as shown in Fig. 7(b), where the simulated decoding performance of a BMST-SPC code decoded with different decoding delays is depicted.

### V-B Fixed $m$ and $d$, Increasing $B$

###### Example 3 (Finite-Length Performance)

Consider BMST-R codes with fixed $m$ and $d$. The BER performance of BMST-R codes constructed with different Cartesian product orders $B$ is shown in Fig. 8, where we observe that

1. Similar to SC-LDPC codes, where increasing the lifting factor $M$ improves waterfall region performance, increasing the Cartesian product order $B$ of BMST codes also improves waterfall region performance. As expected, this improvement saturates for sufficiently large $B$. For example, the improvement at a fixed BER between the two smaller simulated Cartesian product orders, both decoded with the same delay, is about 0.17 dB, while the improvement between the two larger orders decreases to about 0.06 dB.

2. The BER performance of BMST-R codes matches well with the corresponding window decoding thresholds in the error floor region.

3. The error floors, which are solely determined by the encoding memory $m$ (see Section II-C), cannot be lowered by increasing $B$.

Remark: We found from simulations that, in the error floor region, the gap between the finite-length performance and the window decoding threshold is less than 0.02 dB. For example, the SNR needed to achieve the target BER for a BMST-R code with an extremely large Cartesian product order and a sufficiently large decoding delay is 1.087 dB, while the calculated window decoding threshold of the corresponding BMST-R code ensemble at the same preselected target BER is within 0.02 dB of this value. This result again demonstrates that the finite-length performance is consistent with the asymptotic performance analysis.

### V-C Fixed $B$, Increasing $m$ (and hence $d$)

###### Example 4 (Asymptotic Performance)

Consider a family of BMST-R code ensembles with different encoding memories $m$. The calculated window decoding thresholds in terms of the SNR versus the preselected target BERs, together with the lower bounds, are shown in Fig. 9(a), where we observe that

1. For a high target BER, the threshold with a sufficiently large decoding delay degrades slightly as the encoding memory $m$ increases, due to errors propagating to successive decoding windows.

2. The error floor can be lowered by increasing the encoding memory  (and hence the decoding delay ).

Similar behavior has also been observed for BMST-SPC code ensembles, as shown in Fig. 9(b), where the thresholds of a family of BMST-SPC code ensembles are depicted.

###### Example 5 (Finite-Length Performance)

Consider BMST-R codes constructed with several encoding memories $m$ and Cartesian product orders $B$. The simulated BER performance with sufficiently large decoding delay is shown in Fig. 10, where we observe that

1. The BER performance in the waterfall region degrades slightly as the encoding memory increases, due to errors propagating to successive decoding windows.

2. The error floor of the BER curves is lowered by increasing the encoding memory  (and hence the decoding delay ).

These results are consistent with the asymptotic performance analysis shown in Fig. 9(a).

## VI Performance and Complexity Comparison of SC-LDPC Codes and BMST Codes

In addition to decoding performance, the latency introduced by employing channel coding is a crucial factor in the design of a practical communication system. For example, minimizing latency is very important in applications such as personal wireless communication and real-time audio and video. In this section, we first compare the performance of BMST codes and SC-LDPC codes when the two decoding latencies are equal. Then a computational complexity comparison is presented.

We restrict consideration to -regular SC-LDPC codes with coupling width , where two component submatrices and are used, due to their superior thresholds and finite-length performance with window decoding when the decoding delay is relatively small (see, e.g., [31, 15]). For the BMST codes, we consider BMST-R [2,1] codes with encoding memory , due to their near-capacity performance in the waterfall region and relatively low error floor (see Section V). In the simulations, the iterative sliding window decoding algorithm for SC-LDPC codes uses the uniform parallel (flooding) updating schedule with a maximum iteration number of 100, while for the BMST codes, window decoding is performed using the layer-by-layer updating schedule with a maximum iteration number of 18. The entropy stopping criterion [35, 16] is employed for both window decoding algorithms with a preselected threshold of .

The decoding latency of the sliding window decoder, in terms of bits, is given by [15]

$$T_{\mathrm{SC}} = 2M(d_{\mathrm{SC}}+1) \qquad (14)$$

for the -regular SC-LDPC codes, and

$$T_{\mathrm{BMST}} = 2B(d_{\mathrm{BMST}}+1) \qquad (15)$$

for the BMST-R codes, where $d_{\mathrm{SC}}$ and $d_{\mathrm{BMST}}$ are the decoding delays of the SC-LDPC codes and BMST codes, respectively. When the parameters $M$, $B$, $d_{\mathrm{SC}}$, and $d_{\mathrm{BMST}}$ satisfy $M(d_{\mathrm{SC}}+1) = B(d_{\mathrm{BMST}}+1)$, the decoding latency of the BMST-R codes is the same as that of the -regular SC-LDPC codes. In our simulations, we consider a decoding delay (i.e., window size) that is a good choice for the SC-LDPC codes to achieve optimum performance when the decoding latency is fixed [15].

### VI-A Performance Comparison

In Fig. 11, BMST-R codes are compared to -regular SC-LDPC codes, where the Cartesian product order and decoding delay of the BMST-R codes are chosen such that the two decoding latencies $T_{\mathrm{BMST}}$ and $T_{\mathrm{SC}}$ are equal. We see that the BMST-R codes outperform the SC-LDPC code in the waterfall region but have a higher error floor. From Fig. 11, we also see that, in the waterfall region, a BMST-R code constructed with a larger Cartesian product order and decoded with a smaller decoding delay outperforms a BMST-R code constructed with a smaller Cartesian product order and decoded with a larger decoding delay, but it has a higher error floor (both have the same decoding latency). In other words, selecting a smaller decoding delay, which is typically detrimental to decoder performance, is compensated for by allowing a larger Cartesian product order, which improves code performance. For example, at a BER of , the BMST-R code with and decoded with decoding delay gains dB compared to the equal-latency SC-LDPC code with , while the gain increases to dB by using the BMST-R code with and .

The SNR required to achieve a BER of for equal-latency -regular LDPC-BCs, -regular SC-LDPC codes, and BMST-R codes as a function of decoding latency is shown in Fig. 12, where we observe that both the BMST-R codes and the SC-LDPC codes perform significantly better than the LDPC-BCs. Also, the performance of the BMST-R codes (with fixed Cartesian product order) improves as the decoding delay  (and hence the latency) increases, but it does not improve much further beyond a certain decoding delay. (Note again that increasing the decoding delay improves decoder performance, while increasing the Cartesian product order improves code performance.) However, under an equal decoding latency assumption, increasing the decoding delay or the Cartesian product order does not always lower the SNR required to achieve a BER of . For example, when the decoding latency is bits, the performance of the BMST-R code with and decoded with is better than that of the BMST-R code with and decoded with . However, if we increase the latency to 19800 bits, the code with the smaller Cartesian product order and decoded with a larger decoding delay outperforms the code with the larger Cartesian product order and decoded with a smaller decoding delay. This raises the interesting question of how to choose the Cartesian product order and the decoding delay in order to achieve the best performance when the decoding latency of the sliding window decoder for BMST-R codes is fixed.
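One way to explore this trade-off is to enumerate, for a fixed latency, all Cartesian product order and decoding delay pairs allowed by Eq. (15). A minimal sketch (the function name and search bound are our own, not from the paper):

```python
def equal_latency_configs(T, d_max=30):
    """All (B, d_bmst) pairs with 2 * B * (d_bmst + 1) == T and d_bmst <= d_max."""
    half = T // 2
    return [(half // (d + 1), d)
            for d in range(1, d_max + 1)
            if half % (d + 1) == 0]

# For the 19800-bit latency discussed above, the candidate (B, d) trade-offs
# range from large-B/small-d pairs such as (4950, 1) down to (900, 10):
configs = equal_latency_configs(19800, d_max=10)
```

Each pair trades code performance (larger order) against decoder performance (larger delay) at identical latency, which is exactly the design question posed above.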

We also see from Fig. 12 that, for a fixed decoding latency roughly less than 15000 bits, choosing a large Cartesian product order (and hence a small decoding delay) is a good choice for optimum performance at a BER of . This is due to the fact that the interleavers, which break short cycles in the normal graph of BMST codes, especially when they are generated randomly, play a crucial role in iterative decoding [16]. That is, the larger the Cartesian product order, the better the performance of BMST codes. However, the SNR required to achieve a BER of for BMST-R codes decoded with a fixed decoding delay is bounded below by the corresponding window decoding threshold (see Section V-B).

Fig. 13 shows the SNR values required for BMST-R [2,1] codes to achieve a BER of with different decoding delays and larger decoding latencies of 19800, 23760, and 27720 bits. Here we see that the required SNR values for the BMST-R codes with are the same and approach the corresponding window decoding threshold (as remarked in Section V-B). In this case, however, we also observe that the required SNR values continue to decrease until roughly , and then increase gradually as the decoding delay increases further. This increase results from the fact that the improved decoder performance obtained by increasing the decoding delay no longer compensates for the loss in code performance caused by the smaller Cartesian product order. Thus, for larger decoding latencies (up to 35000 bits), a moderate decoding delay is a good choice for optimum performance.

### VI-B Complexity Comparison

As shown in [16], we can measure the computational complexity of BMST codes by the total number of operations. Consider a BMST-R code or a BMST-SPC code with Cartesian product order $B$ and decoding delay $d_{\mathrm{BMST}}$. Let $\mathrm{Opt}(\cdot)$ denote the number of operations at a generic node. Each decoding layer has $NB$ parallel nodes of type $\fbox{=}$, $NB$ parallel nodes of type $\fbox{+}$, and a node of type $\fbox{G}$. The computational complexity for each node $\fbox{=}$, each node $\fbox{+}$, and the node $\fbox{G}$ is $m+2$, $m+1$, and $NB$, respectively. Thus, the total number of operations for each decoding layer update is given by

$$NB\cdot \mathrm{Opt}(\fbox{=}) + NB\cdot \mathrm{Opt}(\fbox{+}) + \mathrm{Opt}(\fbox{G}) = NB(m+2) + NB(m+1) + NB = NB(2m+4). \qquad (16)$$

Let $I_{\mathrm{BMST}}$ denote the average number of iterations required to decode a target layer for BMST codes. Since each iteration requires both a forward recursion ($d_{\mathrm{BMST}}$ layer-updates) and a backward recursion ($d_{\mathrm{BMST}}$ layer-updates), the total (average) computational complexity per window is given by

$$O\big(NB(2m+4)\times 2d_{\mathrm{BMST}}\big)\,I_{\mathrm{BMST}} = O\big(NB(4m+8)d_{\mathrm{BMST}}\big)\,I_{\mathrm{BMST}}. \qquad (17)$$

Note that the number of decoded (target) bits for the window decoder at each time instant is $NB$, and thus the computational complexity per decoded bit for a BMST code is

$$O\big(NB(4m+8)d_{\mathrm{BMST}}\big)\,I_{\mathrm{BMST}}\,/\,(NB) = O\big((4m+8)\,d_{\mathrm{BMST}}\,I_{\mathrm{BMST}}\big). \qquad (18)$$
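The per-bit cost of Eq. (18) is a simple closed form in the encoding memory, decoding delay, and iteration count. A one-line helper makes the scaling explicit; the numeric values in the comment are assumed for illustration only.

```python
def bmst_ops_per_bit(m, d_bmst, i_bmst):
    """Operations per decoded bit for a BMST code, per Eq. (18):
    (4m + 8) * d_bmst * i_bmst."""
    return (4 * m + 8) * d_bmst * i_bmst

# E.g. encoding memory m = 3, decoding delay d_bmst = 7, and an average of
# 18 iterations (assumed values): (4*3 + 8) * 7 * 18
# bmst_ops_per_bit(3, 7, 18) -> 2520
```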

Now consider a -regular SC-LDPC code with lifting factor $M$ and decoding delay $d_{\mathrm{SC}}$ (with the corresponding decoding window size). Let $I_{\mathrm{SC}}$ denote the average number of iterations required to decode a target layer for SC-LDPC codes. Note that the numbers of operations at a variable node and a check node of -regular SC-LDPC codes are 3 and 6, respectively. The average computational complexity (also measured by the total number of operations) per window is then given by

$$O\big(3T_{\mathrm{SC}} + 6\,T_{\mathrm{SC}}/2\big)\,I_{\mathrm{SC}} = O\big(6T_{\mathrm{SC}}I_{\mathrm{SC}}\big), \qquad (19)$$

where $T_{\mathrm{SC}}$ is the decoding latency. Note that the number of decoded (target) bits for the window decoder at each time instant is $T_{\mathrm{SC}}/(d_{\mathrm{SC}}+1)$, and thus the computational complexity per decoded bit for a -regular SC-LDPC code is

$$O\big(6T_{\mathrm{SC}}\big)\,I_{\mathrm{SC}}\,/\,\big(T_{\mathrm{SC}}/(d_{\mathrm{SC}}+1)\big) = O\big(6(d_{\mathrm{SC}}+1)I_{\mathrm{SC}}\big). \qquad (20)$$
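The SC-LDPC per-bit cost of Eq. (20) depends only on the decoding delay and the iteration count, which is what makes a direct comparison with the BMST expression possible. The helper below is a sketch with assumed illustrative values; the regular degree pair giving the 3 and 6 operation counts is as stated in the text.

```python
def sc_ldpc_ops_per_bit(d_sc, i_sc):
    """Operations per decoded bit for the regular SC-LDPC code, per Eq. (20):
    6 * (d_sc + 1) * i_sc."""
    return 6 * (d_sc + 1) * i_sc

# E.g. decoding delay d_sc = 5 with an average of 100 iterations
# (assumed values): 6 * 6 * 100
# sc_ldpc_ops_per_bit(5, 100) -> 3600
```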

Table II shows the average computational complexity per decoded bit of the -regular SC-LDPC code and the BMST-R codes used in Fig. 11 that achieve a BER of with a decoding latency of 30000 bits. The simulation parameters are also included. We observe that, although the average number of iterations for the BMST codes is significantly smaller than for the SC-LDPC code, the computational complexity per decoded bit for the BMST codes is higher than for the SC-LDPC code. However, the BMST codes outperform the SC-LDPC code in the waterfall region (see Fig. 11 in Section VI-A). This means that BMST-R codes, compared to -regular SC-LDPC codes, obtain performance gains at the cost of higher computational complexity.

## VII Conclusions

In this paper, we described BMST codes using both an algebraic description and a graphical representation to show that BMST codes can be viewed as a class of SC codes. Then, based on a modified EXIT chart analysis and finite-length computer simulations, we investigated the impact of several parameters (coupling width, Cartesian product order, and decoding delay) on the performance of BMST codes. We then examined the relationship between the Cartesian product order, the decoding delay, and the decoding performance of BMST codes for fixed decoding latency in comparison to SC-LDPC codes, along with a comparison of computational complexity. It was observed that, under the equal decoding latency constraint, BMST codes using the repetition code as the basic code (BMST-R codes) can outperform -regular SC-LDPC codes in the waterfall region but have a higher error floor and a larger decoding complexity. An interesting future research topic to complement the work reported here is to embed a partial superposition strategy into the code design to further improve the performance of the original BMST codes for a given decoding latency.

## Acknowledgment

The authors would like to thank Prof. Daniel J. Costello, Jr. for his helpful comments, for polishing this paper, and for his invaluable contributions as a co-author of the conference version of this paper [36]. They would also like to thank Mr. Chulong Liang from Sun Yat-sen University for helpful discussions.

## References

• [1] R. G. Gallager, Low-Density Parity-Check Codes.   Cambridge, MA: MIT Press, 1963.
• [2] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.
• [3] M. Lentmaier, A. Sridharan, D. J. Costello, Jr., and K. S. Zigangirov, “Iterative decoding threshold analysis for LDPC convolutional codes,” IEEE Trans. Inf. Theory, vol. 56, no. 10, pp. 5274–5289, Oct. 2010.
• [4] S. Kudekar, T. J. Richardson, and R. L. Urbanke, “Threshold saturation via spatial coupling: Why convolutional LDPC ensembles perform so well over the BEC,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 803–834, Feb. 2011.
• [5] ——, “Spatially coupled ensembles universally achieve capacity under belief propagation,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7761–7813, Dec. 2013.
• [6] D. G. M. Mitchell, M. Lentmaier, and D. J. Costello, Jr., “Spatially coupled LDPC codes constructed from protographs,” 2014, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1407.5366
• [7] A. E. Pusane, R. Smarandache, P. O. Vontobel, and D. J. Costello, Jr., “Deriving good LDPC convolutional codes from LDPC block codes,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 835–857, Feb. 2011.
• [8] M. Lentmaier, M. M. Prenda, and G. P. Fettweis, “Efficient message passing scheduling for terminated LDPC convolutional codes,” in Proc. IEEE Int. Symp. on Inf. Theory, St. Petersburg, Russia, Aug. 2011, pp. 1826–1830.
• [9] N. ul Hassan, A. E. Pusane, M. Lentmaier, G. P. Fettweis, and D. J. Costello, Jr., “Reduced complexity window decoding schedules for coupled LDPC codes,” in Proc. IEEE Inf. Theory Workshop, Lausanne, Switzerland, Sept. 2012, pp. 20–24.
• [10] I. Andriyanova and A. Graell i Amat, “Threshold saturation for nonbinary SC-LDPC codes on the binary erasure channel,” 2013, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1311.2003
• [11] D. G. M. Mitchell, A. E. Pusane, and D. J. Costello, Jr., “Minimum distance and trapping set analysis of protograph-based LDPC convolutional codes,” IEEE Trans. Inf. Theory, vol. 59, no. 1, pp. 254–281, Jan. 2013.
• [12] D. G. M. Mitchell, M. Lentmaier, A. E. Pusane, and D. J. Costello, Jr., “Randomly punctured spatially coupled LDPC codes,” in Proc. Int. Symp. Turbo Codes Iterative Inf. Process., Aug. 2014, pp. 1–6.
• [13] D. J. Costello, Jr., L. Dolecek, T. E. Fuja, J. Kliewer, D. G. M. Mitchell, and R. Smarandache, “Spatially coupled sparse codes on graphs: Theory and practice,” IEEE Communications Magazine, vol. 52, no. 7, pp. 168–176, July 2014.
• [14] P. M. Olmos and R. L. Urbanke, “A scaling law to predict the finite-length performance of spatially coupled LDPC codes,” 2014, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1404.5719
• [15] K. Huang, D. G. M. Mitchell, L. Wei, X. Ma, and D. J. Costello, Jr., “Performance comparison of LDPC block and spatially coupled codes over GF,” IEEE Trans. Commun., vol. 63, no. 3, pp. 592–604, Mar. 2015.
• [16] X. Ma, C. Liang, K. Huang, and Q. Zhuang, “Block Markov superposition transmission: Construction of big convolutional codes from short codes,” IEEE Trans. Inf. Theory, 2015, to appear.
• [17] C. Liang, X. Ma, Q. Zhuang, and B. Bai, “Spatial coupling of generator matrices: A general approach to design good codes at a target BER,” IEEE Trans. Commun., vol. 62, no. 12, pp. 4211–4219, Dec. 2014.
• [18] A. J. Feltström, D. Truhachev, M. Lentmaier, and K. S. Zigangirov, “Braided block codes,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2640–2658, June 2009.
• [19] W. Zhang, M. Lentmaier, K. S. Zigangirov, and D. J. Costello, Jr., “Braided convolutional codes: A new class of turbo-like codes,” IEEE Trans. Inf. Theory, vol. 56, pp. 316–331, Jan. 2010.
• [20] S. Moloudi and M. Lentmaier, “Density evolution analysis of braided convolutional codes on the erasure channel,” in Proc. IEEE Int. Symp. on Inf. Theory, Honolulu, HI, June 2014, pp. 2609–2613.
• [21] B. P. Smith, A. Farhood, A. Hunt, F. R. Kschischang, and J. Lodge, “Staircase codes: FEC for 100 Gb/s OTN,” J. Lightwave Technol., vol. 30, no. 1, pp. 110–117, Jan. 2012.
• [22] S. Moloudi, M. Lentmaier, and A. Graell i Amat, “Spatially coupled turbo codes,” in Proc. Int. Symp. Turbo Codes Iterative Inf. Process., Bremen, Germany, Aug. 2014, pp. 82–86.
• [23] D. Divsalar, H. Jin, and R. J. McEliece, “Coding theorems for ‘turbo-like’ codes,” in Proc. Allerton Conf., Urbana, IL, Sept. 1998, pp. 201–210.
• [24] H. D. Pfister and P. H. Siegel, “The serial concatenation of rate-1 codes through uniform random interleavers,” IEEE Trans. Inf. Theory, vol. 49, no. 6, pp. 1425–1438, June 2003.
• [25] A. Abbasfar, D. Divsalar, and K. Yao, “Accumulate-repeat-accumulate codes,” IEEE Trans. Commun., vol. 55, no. 4, pp. 692–702, Apr. 2007.
• [26] A. R. Iyengar, M. Papaleo, P. H. Siegel, J. K. Wolf, A. Vanelli-Coralli, and G. E. Corazza, “Windowed decoding of protograph-based LDPC convolutional codes over erasure channels,” IEEE Trans. Inf. Theory, vol. 58, no. 4, pp. 2303–2320, Apr. 2012.
• [27] C. Liang, J. Hu, X. Ma, and B. Bai, “A new class of multiple-rate codes based on block Markov superposition transmission,” 2014, submitted to IEEE Trans. Signal Process. [Online]. Available: http://arxiv.org/abs/1406.2785
• [28] J. Hu, X. Ma, and C. Liang, “Block Markov superposition transmission of repetition and single-parity-check codes,” IEEE Commun. Lett., vol. 19, no. 2, pp. 131–134, Feb. 2015.
• [29] S. ten Brink, “Convergence behavior of iteratively decoded parallel concatenated codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, Oct. 2001.
• [30] G. Liva and M. Chiani, “Protograph LDPC codes design based on EXIT analysis,” in Proc. IEEE Global Commun. Conf., Washington, DC, Nov. 2007, pp. 3250–3254.
• [31] L. Wei, D. G. M. Mitchell, T. E. Fuja, and D. J. Costello, Jr., “Design of spatially coupled LDPC codes over GF() for windowed decoding,” 2014, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1411.4373
• [32] A. E. Pusane, A. J. Felström, A. Sridharan, M. Lentmaier, K. S. Zigangirov, and D. J. Costello, Jr., “Implementation aspects of LDPC convolutional codes,” IEEE Trans. Commun., vol. 56, no. 7, pp. 1060–1069, July 2008.
• [33] G. D. Forney, Jr., “Codes on graphs: Normal realizations,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 520–548, Feb. 2001.
• [34] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp. 670–678, Apr. 2004.
• [35] X. Ma and L. Ping, “Coded modulation using superimposed binary codes,” IEEE Trans. Inf. Theory, vol. 50, pp. 3331–3343, Dec. 2004.
• [36] K. Huang, X. Ma, and D. J. Costello, Jr., “EXIT chart analysis of block Markov superposition transmission of short codes,” 2015, accepted by Proc. IEEE Int. Symp. on Inf. Theory. [Online]. Available: http://arxiv.org/abs/1502.00079