Performance Analysis of Block Markov Superposition Transmission of Short Codes
Abstract
In this paper, we consider the asymptotic and finite-length performance of block Markov superposition transmission (BMST) of short codes, which can be viewed as a new class of spatially coupled (SC) codes in which the generator matrices of short codes (referred to as basic codes) are coupled. A modified extrinsic information transfer (EXIT) chart analysis that takes into account the relation between mutual information (MI) and bit-error-rate (BER) is presented to study the convergence behavior of BMST codes. Using the modified EXIT chart analysis, we investigate the impact of various parameters on BMST code performance, thereby providing theoretical guidance for designing and implementing practical BMST codes suitable for sliding window decoding. Then, we present a performance comparison of BMST codes and SC low-density parity-check (SC-LDPC) codes on the basis of equal decoding latency. Also presented is a comparison of computational complexity. Simulation results show that, under the equal decoding latency constraint, BMST codes using the repetition code as the basic code can outperform regular SC-LDPC codes in the waterfall region but have a higher computational complexity.
I Introduction
Low-density parity-check (LDPC) block codes (LDPC-BCs) [1], combined with iterative belief propagation (BP) decoding, are a class of capacity-approaching codes with decoding complexity that increases only linearly with block length [2]. A practical approach to improving the performance of LDPC-BCs is coupling together a series of disjoint graphs that specify the parity-check matrix of an LDPC-BC into a single coupled chain, thereby producing a spatially coupled LDPC (SC-LDPC) code. It has been shown in [3, 4, 5, 6] that SC-LDPC code ensembles exhibit a phenomenon called "threshold saturation", which allows them to achieve the maximum a posteriori (MAP) thresholds of their underlying LDPC-BC ensembles on memoryless binary-input symmetric-output channels under BP decoding, and thus to achieve capacity by increasing the density of the parity-check matrix. Due to their excellent performance, SC-LDPC codes have recently received a great deal of attention in the literature (see, e.g., [7, 8, 9, 10, 11, 12, 13, 14, 15] and the references therein).
The concept of spatial coupling is not limited to LDPC codes. Block Markov superposition transmission (BMST) of short codes [16, 17], for example, is equivalent to spatial coupling of the subgraphs that specify the generator matrices of the short codes. From this perspective, BMST codes are similar to braided block/convolutional codes [18, 19, 20], staircase codes [21], and SC turbo codes [22]. An encoder of a BMST code with encoding memory m is shown in Fig. 1, where a BMST code can also be viewed as a serially concatenated code with a structure similar to repeat-accumulate-like codes [23, 24, 25]. The outer code is a short code, referred to as the basic code (not limited to repetition codes), that introduces redundancy, while the inner code is a rate-one block-oriented feedforward convolutional code (instead of a bit-oriented accumulator) that introduces memory between transmissions. Hence, BMST codes typically have very simple encoding algorithms. To decode BMST codes, a sliding window decoding algorithm with a tunable decoding delay can be used, as with SC-LDPC codes [26]. The construction of BMST codes is flexible [27, 28], in the sense that it applies to all code rates of interest in the interval (0, 1). Further, BMST codes have near-capacity performance (observed by simulation) in the waterfall region of the bit-error-rate (BER) curve and an error floor (predicted by analysis) that can be controlled by the encoding memory.
On an additive white Gaussian noise channel (AWGNC), the well-known extrinsic information transfer (EXIT) chart analysis [29] can be used to obtain the iterative BP decoding threshold of LDPC-BC ensembles. In [30], a novel EXIT chart analysis was used to evaluate the performance of protograph-based LDPC-BC ensembles, and a similar analysis was used to find the thresholds of q-ary SC-LDPC codes with sliding window decoding in [31]. Unlike LDPC codes, the asymptotic BER of BMST codes with window decoding cannot be better than a corresponding genie-aided lower bound [16]. Thus, conventional EXIT chart analysis cannot be applied directly to BMST codes. In this paper, we propose a modified EXIT chart analysis that takes into account the relation between mutual information (MI) and BER to study the convergence behavior of BMST codes and to predict their performance in the waterfall region of the BER curve. Simulation results confirm that the modified EXIT chart analysis of BMST codes is supported by their finite-length performance behavior. We also investigate the relationship between the basic code structure, the decoding delay, and the decoding performance of BMST codes when the decoding latency is fixed. Finally, we present a computational complexity comparison of BMST codes and SC-LDPC codes on the basis of equal decoding latency.
The rest of the paper is structured as follows. In Section II, we give a brief review of BMST codes. In Section III, we discuss the relation between BMST codes and protograph-based SC-LDPC codes. In Section IV, we propose a modified EXIT chart analysis of BMST codes. In Section V, we investigate the impact of various parameters on BMST code performance. Then, in Section VI, we present a performance comparison of BMST codes and SC-LDPC codes on the basis of equal decoding latency. A computational complexity comparison of BMST codes and SC-LDPC codes is also given in Section VI. Finally, some concluding remarks are given in Section VII.
II. Review of BMST Codes
II-A. Encoding of BMST Codes
Consider a BMST code using a rate k/n binary basic code C of length n and dimension k. Let u(0), u(1), ..., u(L-1) be L blocks of data to be transmitted, where u(t) is a binary vector of length k. Here, L is called the coupling length. The encoding algorithm of a BMST code with encoding memory (coupling width) m is described as follows (see Fig. 1), where Pi_1, Pi_2, ..., Pi_m are interleavers of size n.
Algorithm 1
Encoding of BMST Codes

Initialization: For t = -1, -2, ..., -m, set v(t) = 0.

Loop: For t = 0, 1, ..., L-1,

1) Encode u(t) into v(t) using the encoding algorithm of the basic code C;

2) For 1 <= i <= m, interleave v(t-i) using the i-th interleaver Pi_i into w(t, i);

3) Compute c(t) = v(t) + w(t, 1) + ... + w(t, m), which is taken as the t-th block of transmission.

Termination: For t = L, L+1, ..., L+m-1, set u(t) = 0 and compute c(t) following Loop.
Remark: To force the encoder of BMST codes to the zero state at the end of the encoding process, a tail consisting of m blocks of the k-dimensional all-zero vector is added. This is different from SC-LDPC code encoders, where the tail is usually nonzero and depends on the encoded information bits (see Section IV of [32]). As a result, the termination procedure for BMST codes is much simpler than for SC-LDPC codes.
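The encoding procedure of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the repetition code R[2,1] (Cartesian-product order N, so each block is simply repeated) as the basic code, random interleavers, and hypothetical function and variable names.

```python
import numpy as np

rng = np.random.default_rng(0)

def bmst_encode(u_blocks, m, N, interleavers):
    """Sketch of BMST encoding (Algorithm 1) with R[2,1]^N as the basic code:
    each length-N data block u(t) is repeated to a length-2N basic codeword
    v(t), then superimposed (mod 2) with interleaved copies of the previous m
    basic codewords.  A tail of m all-zero blocks terminates the encoder."""
    L = len(u_blocks)
    # encoder memory: the last m basic codewords v(t-1), ..., v(t-m)
    past = [np.zeros(2 * N, dtype=np.uint8) for _ in range(m)]
    transmitted = []
    for t in range(L + m):                       # L data blocks + m tail blocks
        u = u_blocks[t] if t < L else np.zeros(N, dtype=np.uint8)
        v = np.tile(u, 2)                        # R[2,1]^N: repeat the block
        c = v.copy()
        for i in range(m):                       # superimpose interleaved history
            c ^= past[i][interleavers[i]]
        transmitted.append(c)
        past = [v] + past[:-1]                   # shift the encoder memory
    return transmitted

# usage sketch: m = 2, N = 4, L = 3
m, N, L = 2, 4, 3
perms = [rng.permutation(2 * N) for _ in range(m)]
u = [rng.integers(0, 2, N, dtype=np.uint8) for _ in range(L)]
c = bmst_encode(u, m, N, perms)
```

Note that the transmission consists of L + m blocks, matching the termination step of Algorithm 1.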
The rate of the BMST code is

    R_BMST = (L / (L + m)) * (k / n),    (1)

which is slightly less than the rate k/n of the basic code. However, similar to SC-LDPC codes, this rate loss becomes vanishingly small as L grows.
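As a quick numerical check of the rate expression in (1) (reconstructed here as R = (k/n) * L/(L+m), consistent with the stated rate loss), a short helper shows how the loss vanishes as the coupling length grows; the function name is illustrative only.

```python
def bmst_rate(k, n, m, L):
    """Rate of a BMST code, Eq. (1): the basic-code rate k/n scaled by the
    factor L/(L+m) accounting for the m all-zero termination blocks."""
    return (k / n) * (L / (L + m))

# rate loss shrinks as the coupling length L grows
print(bmst_rate(1, 2, 3, 10))     # noticeably below 1/2
print(bmst_rate(1, 2, 3, 10000))  # essentially 1/2
```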
Though any code (linear or nonlinear) with a fast encoding algorithm and an efficient soft-in soft-out (SISO) decoding algorithm can be taken as the basic code, we focus in this paper on the use of the N-fold Cartesian product of a repetition (R) code (denoted R^N) or a single parity-check (SPC) code (denoted SPC^N) as the basic code, resulting in a BMST-R code or a BMST-SPC code, respectively. (Using codes constructed by time-sharing between the R code and the SPC code as the basic code, one can construct BMST-RSPC codes for a wide range of code rates; for more details, see [28].) Note that the overall code length of the basic code in this case is nN and the overall dimension is N for the R code or (n-1)N for the SPC code.
II-B. Sliding Window Decoding of BMST Codes
BMST codes can be represented by a Forney-style factor graph, also known as a normal graph [33], where edges represent variables and vertices (nodes) represent constraints. All edges connected to a node must satisfy the specific constraint of the node. A full edge connects two nodes, while a half edge connects to only one node. A half edge is also connected to a special symbol, called a "dongle", that denotes coupling to other parts of the transmission system (say, the channel or the information source) [33]. There are four types of nodes in the normal graph of BMST codes.

Node +: All edges (variables) connected to a node + must sum to the all-zero vector. The message updating rule at a node + is similar to that of a check node in the factor graph of a binary LDPC code. The only difference is that the messages on the half edges are obtained from the channel observations.

Node Pi: A node Pi_i represents the i-th interleaver, which interleaves or de-interleaves the input messages.

Node =: All edges (variables) connected to a node = must take the same (binary) value. The message updating rule at a node = is the same as that of a variable node in the factor graph of a binary LDPC code.

Node G: All edges (variables) connected to a node G must satisfy the constraint specified by the basic code C. The message updating rule at a node G can be derived accordingly, where the messages on the half edges are associated with the information source.
The normal graph of a BMST code can be divided into L + m layers, where each layer typically consists of a node of type G, a node of type =, m nodes of type Pi, and a node of type + (see Fig. 2). The result is a high-level normal graph, where each edge represents a sequence of random variables. Looking into the details, we can see that each layer contains nodes of type = and + (including half edges) together with the constraint nodes corresponding to the short code (R or SPC in this paper).
Similar to SC-LDPC codes, an iterative sliding window decoding algorithm with decoding delay d, working over a subgraph consisting of d + 1 consecutive layers, can be implemented for BMST codes. An example of a window decoder with decoding delay d operating on the normal graph of a BMST code is shown in Fig. 2. For each window position, the forward-backward decoding algorithm is implemented to update the messages layer-by-layer within the decoding window (for more details on the decoding algorithm of BMST codes, we refer the reader to Section III of [16]). Decoding proceeds until a fixed number of iterations has been performed or some given stopping criterion is satisfied, at which point the window shifts to the right by one layer and the symbols corresponding to the layer shifted out of the window are decoded. The first layer in any window is called the target layer.
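The sliding window schedule described above can be sketched as follows. The layer-update and target-decision callbacks are placeholders standing in for the forward-backward message processing of [16]; all names are hypothetical.

```python
def sliding_window_decode(num_layers, d, max_iters, update_layer, check_target):
    """Skeleton of sliding window decoding with decoding delay d (window size
    d + 1).  update_layer(t) performs one round of message passing in layer t;
    check_target(t) decides the symbols of target layer t once the window is
    about to shift past it."""
    decided = []
    for target in range(num_layers):             # slide the window one layer at a time
        window = range(target, min(target + d + 1, num_layers))
        for _ in range(max_iters):               # iterate within the current window
            for t in window:                     # layer-by-layer schedule
                update_layer(t)
        decided.append(check_target(target))     # decode the target layer, then shift
    return decided
```

A stopping criterion (e.g., the entropy rule used later in the paper) would simply break out of the iteration loop early.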
II-C. Genie-Aided Lower Bound on BER
Let f_BMST(gamma_b) represent the performance of a BMST code with encoding memory (coupling width) m and coupling length L, where f_BMST is the BER and gamma_b represents the received bit signal-to-noise ratio (SNR) on an AWGNC in dB, and let f_basic(gamma_b) represent the performance of the basic code. By assuming a genie-aided decoder, we can obtain a lower bound on the performance of BMST codes given by (see [16])

    f_BMST(gamma_b) >= f_basic(gamma_b + 10 log10(m + 1) - 10 log10(1 + m/L)),    (2)

where the term 10 log10(m + 1) depends on the encoding memory and the term 10 log10(1 + m/L) is due to the rate loss. In other words, a maximum coding gain over the basic code of 10 log10(m + 1) dB in the low BER (high SNR) region is achieved for large L. Intuitively, this bound can be understood by assuming that a codeword of the basic code is transmitted m + 1 times without interference from other layers.
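A small helper makes the two SNR terms of the bound concrete; the expressions 10*log10(m+1) and 10*log10(1+m/L) are the reconstruction used in (2) above, consistent with the stated maximum coding gain and rate loss.

```python
import math

def genie_aided_gain_db(m):
    """Maximum coding gain of a BMST code over its basic code in the low-BER
    (high-SNR) region for large L: 10*log10(m+1) dB, per the bound (2)."""
    return 10 * math.log10(m + 1)

def rate_loss_db(m, L):
    """SNR penalty due to the termination rate loss: 10*log10(1 + m/L) dB,
    which vanishes as the coupling length L grows."""
    return 10 * math.log10(1 + m / L)

# e.g. memory m = 3 buys at most ~6.02 dB over the basic code
print(genie_aided_gain_db(3))
```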
II-D. Design of Capacity-Approaching BMST Codes
Aided by the genie-aided lower bound, we can construct good codes at a target BER with any given code rate of interest by determining the required encoding memory m as follows.

1) Take a code with the given rate as the basic code. To approach channel capacity, the code length of the basic code is taken to be large;

2) From the performance curve of the basic code, find the SNR gamma_target required to achieve the target BER;

3) Find the Shannon limit for the code rate, denoted by gamma_lim;

4) Determine the encoding memory by

    m = ceil( 10^((gamma_target - gamma_lim)/10) - 1 ),    (3)

where ceil(x) represents the smallest integer greater than or equal to x.
The above procedure requires no optimization and hence can easily be carried out, given that the performance curve of the basic code is available, as is usually the case for short codes. (The basic code considered in this paper is a Cartesian product of a short code, where each codeword is simply a cascade of separate and independent codewords from the short code. Thus, the performance of the basic code, which is the same as that of the involved short code, can easily be obtained.) Its effectiveness has been confirmed by construction examples in [16, 17, 27, 28]. The encoding memories required for some BMST codes to approach the corresponding Shannon limits at given target BERs are shown in Table I. As expected, the lower the target BER, the larger the required encoding memory.
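Step 4 reduces to a one-line computation. The closed form below is the reconstruction used in (3): it inverts the maximum coding gain 10*log10(m+1) of the genie-aided bound against the gap between the basic code's required SNR and the Shannon limit.

```python
import math

def required_memory(gamma_target_db, gamma_lim_db):
    """Encoding memory from Eq. (3): the genie-aided gain 10*log10(m+1) must
    cover the gap (in dB) between the SNR gamma_target needed by the basic
    code at the target BER and the Shannon limit gamma_lim, giving
    m = ceil(10**((gamma_target - gamma_lim)/10) - 1)."""
    return math.ceil(10 ** ((gamma_target_db - gamma_lim_db) / 10) - 1)

# a 7 dB gap to capacity requires memory m = 5, since 10**0.7 - 1 ~ 4.01
print(required_memory(7.0, 0.0))
```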
Table I: Encoding memories required for some BMST codes to approach the corresponding Shannon limits at given target BERs (target BER decreasing from left to right).

Code        Encoding memory at each target BER
BMST-R      4    6    8    10
BMST-R      5    8    10   13
BMST-R      6    9    11   14
BMST-SPC    2    3    4    5
III. BMST Codes as a Class of SC Codes
In this section, we show that BMST codes can be viewed as a class of SC codes, using an algebraic description as well as a graphical representation, and we compare the structure of BMST codes to that of SC-LDPC codes.
III-A. Matrix Representation
To describe an SC-LDPC code ensemble with coupling width (syndrome former memory) m_s and coupling length L_s, we start with the matrix

    B = [ B_0^T  B_1^T  ...  B_{m_s}^T ]^T,    (4)

where all of the component submatrices B_i have non-negative integer entries and the same size. To construct an SC-LDPC code with good performance, we can replace each nonzero entry b in B with a sum of b non-overlapping randomly selected M x M permutation matrices and each zero entry in B with the M x M all-zero matrix, where b is typically a small integer and M is typically a large integer. The resulting SC-LDPC parity-check matrix has the banded block form

          | H_0(0)                                        |
          | H_1(1)        H_0(1)                          |
          |   ...           ...      .                    |
    H  =  | H_{m_s}(m_s)   ...      H_0(m_s)              | ,    (5)
          |        .          .            .              |
          |         H_{m_s}(L_s-1)  ...   H_0(L_s-1)      |
          |                  .              ...           |
          |                        H_{m_s}(L_s-1+m_s)     |

where the blank spaces in H correspond to zeros and the submatrices H_i(t) are obtained by the lifting described above, for 0 <= i <= m_s and 0 <= t <= L_s - 1 + m_s.
In contrast to SC-LDPC codes, it is convenient to describe BMST codes using generator matrices. Let G be the generator matrix of a short code C with dimension k and length n. To describe a BMST code ensemble with coupling width (encoding memory) m and coupling length L, we start with the L x (L + m) matrix

          | 1 1 ... 1                  |
    B  =  |   1 1 ... 1                | ,    (6)
          |       .                    |
          |          1 1 ... 1         |

which has constant weight m + 1 in each row. This matrix plays a similar role in constructing BMST codes as the matrix B in (4) does in constructing SC-LDPC codes. To construct a BMST code with good performance, the nonzero entry at position (t, t + i) in B (0 <= t <= L - 1 and 0 <= i <= m) is replaced with the matrix G_N Pi_{t,i}, where

    G_N = diag(G, G, ..., G)  (N copies)    (7)

is the generator matrix of the N-fold Cartesian product of the short code, the Pi_{t,i} are randomly selected permutation matrices of size nN x nN, and the Cartesian product order N is typically large. The resulting BMST code has length nN(L + m) and dimension kNL, and the generator matrix is given by

              | G_N Pi_{0,0}  G_N Pi_{0,1}  ...  G_N Pi_{0,m}                      |
    G_BMST =  |               G_N Pi_{1,0}  ...        G_N Pi_{1,m}                | .    (8)
              |                     .                     .                        |
              |                       G_N Pi_{L-1,0}  ...  G_N Pi_{L-1,m}          |
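The banded structure of the generator matrix in (8) can be illustrated directly. This sketch is a simplification under stated assumptions: it takes the Cartesian product order to be one and reuses one permutation per diagonal (the general construction uses an independent permutation for each entry); all names are hypothetical.

```python
import numpy as np

def bmst_generator_matrix(G, m, L, perms):
    """Sketch of the BMST generator matrix (8): an L x (L+m) block band
    matrix whose (t, t+i) block is G @ P_i (mod 2), where P_0 = I and
    P_1..P_m are the permutation matrices of the interleavers (given here
    as index lists in `perms`)."""
    k, n = G.shape
    full = np.zeros((L * k, (L + m) * n), dtype=np.uint8)
    # permutation matrices: identity for i = 0, then one per interleaver
    P = [np.eye(n, dtype=np.uint8)] + [np.eye(n, dtype=np.uint8)[p] for p in perms]
    for t in range(L):
        for i in range(m + 1):
            block = (G @ P[i]) % 2
            full[t * k:(t + 1) * k, (t + i) * n:(t + i + 1) * n] = block
    return full
```

With the repetition code G = [1 1], every row of the result has weight 2(m+1), and all blocks outside the band are zero, mirroring the diagonal structure of (8).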
III-B. Graphical Representation
SC-LDPC code ensembles are often described in terms of a protograph, where an edge-spreading operation is applied to couple a sequence of disjoint block code protographs into a single chain [6]. Usually, no extra edges are introduced during the coupling process. In this paper, we describe the coupling process from a new perspective, in which extra edges are allowed to be added. We believe that this treatment is more general. For example, SC turbo codes [22] are obtained by adding edges to connect each turbo code graph to one or more nearby graphs in the chain. Based on this perspective, we can re-describe SC-LDPC codes as follows.
We start with a protograph for the submatrix B_0, which has variable nodes and check nodes, where the j-th check node is connected to the i-th variable node by a number of edges equal to the corresponding entry of B_0. A shorthand protograph corresponding to B_0 is shown in Fig. 3(a), where the node = represents the variable nodes, the node + represents the check nodes, and the edge labeled B_0 represents a collection of edges. To distinguish, the edge labeled B_0 is referred to as a superedge of type B_0, while a conventional edge in the full protograph is referred to as a simple edge. The shorthand protograph is then replicated L_s times, as shown in Fig. 3(b), meaning that the sequence of transmitted codewords independently satisfies the constraints specified by B_0. The disjoint graphs are then coupled by adding a superedge of type B_i to bridge the variable node = at time t and the check node + at time t + i, for 1 <= i <= m_s and 0 <= t <= L_s - 1, resulting in a single coupled chain corresponding to an SC-LDPC code ensemble with coupling length L_s and coupling memory m_s. An example of such an SC-LDPC code ensemble is shown in Fig. 3(c). When lifting, each simple edge (not superedge) is replaced by a bundle of M edges (permutation within the bundle is assumed), resulting in a finite-length SC-LDPC code.
Similarly, for BMST codes we start with a protograph for the generator matrix G, which has = nodes and + nodes, where the i-th = node is connected to the j-th + node if and only if the corresponding entry of G is 1. A shorthand protograph corresponding to G is shown in Fig. 4(a), where the edge labeled G represents a superedge of type G. The protograph is then replicated L times, as shown in Fig. 4(b), which can be considered as transmitting a sequence of codewords from the basic code corresponding to the generator matrix G independently at time instants t = 0, 1, ..., L - 1. The disjoint graphs are coupled by adding a superedge of type G to bridge the = node at time t and the + node at time t + i, for 1 <= i <= m and 0 <= t <= L - 1, resulting in a single coupled chain corresponding to a BMST code ensemble with coupling length L and coupling memory m. An example of such a BMST code ensemble is shown in Fig. 4(c), whose equivalent form is shown in Fig. 4(d). When lifting, the superedge of type G bridging the = node at time t and the + node at time t + i is replaced by a superedge of type G_N Pi_{t,i}, resulting in a BMST code of length nN(L + m).
III-C. Similarities and Differences
From the previous two subsections, we see that both SC-LDPC codes and BMST codes can be derived from a small matrix by replacing its entries with properly defined submatrices. We also see that the generator matrix of BMST codes is similar in form to the parity-check matrix of SC-LDPC codes. SC-LDPC codes introduce memory by spatially coupling the basic parity-check matrices, while BMST codes introduce memory by spatially coupling the basic generator matrices. Further, we see from Fig. 3 and Fig. 4 that, during the construction of both SC-LDPC codes and BMST codes, memory is introduced by coupling disjoint graphs together into a single chain, which is the fundamental idea of spatial coupling. Thus, BMST codes can be viewed as a class of SC codes.
IV. EXIT Chart Analysis of BMST Codes
Given the basic code with generator matrix G, we can construct a sequence of BMST codes by choosing the Cartesian product order N. Now assume that the interleavers are chosen uniformly at random for each transmission. Then we have a sequence of code ensembles. The aim of EXIT chart analysis is to predict the performance behavior of the BMST codes as N grows large. In this section, we first discuss the issue that prevents the use of conventional EXIT chart analysis for BMST codes, and then we provide a modified EXIT chart analysis to study the convergence behavior of BMST codes with window decoding.
We consider binary phase-shift keying (BPSK) modulation over the binary-input AWGNC. To describe density evolution, it is convenient to assume that the all-zero codeword is transmitted and to represent the messages as log-likelihood ratios (LLRs). The threshold of protograph-based LDPC codes can be obtained from a protograph-based EXIT chart analysis [30, 31] by determining the minimum value of the SNR such that the MI between the a posteriori message at a variable node and an associated codeword bit (referred to as the a posteriori MI for short) goes to 1 as the number of iterations increases, i.e., the BER at the variable nodes tends to zero as the number of iterations tends to infinity. At first glance, a similar iterative sliding window decoding EXIT chart analysis can be implemented over the normal graph (see Fig. 4(d)) of the BMST code ensemble to study the convergence behavior of BMST codes. However, as shown in (2), the high SNR performance of BMST codes with window decoding cannot be better than the corresponding genie-aided lower bound, which means that the a posteriori MI of BMST codes cannot reach 1 as the number of iterations tends to infinity. Thus, the conventional EXIT chart analysis cannot be applied directly to BMST codes. Fortunately, this can be amended by taking into account the relation between MI and BER [29]. Specifically, we need the convergence check described below in Algorithm 2. For convenience, the MI between the a priori input and the corresponding codeword bit is referred to as the a priori MI, the MI between the extrinsic output and the corresponding codeword bit is referred to as the extrinsic MI, and the MI between the channel observation and the corresponding codeword bit is referred to as the channel MI.
Algorithm 2
Convergence Check

Let I_A denote the a priori MI and I_E denote the extrinsic MI. Then the a posteriori MI is given by

    I_APP = J( sqrt( [J^{-1}(I_A)]^2 + [J^{-1}(I_E)]^2 ) ),    (9)

where the J(.) and J^{-1}(.) functions are given in [34]. As shown in Section III-C of [29], supposing that the a posteriori LLR is Gaussian, an estimate of the BER is then given by

    P_b ~ Q( J^{-1}(I_APP) / 2 ),    (10)

where

    Q(x) = (1 / sqrt(2 pi)) Integral_x^inf exp(-t^2 / 2) dt.    (11)

If the estimated BER is less than some preselected target BER, a local decoding success is declared; otherwise, a local decoding failure is declared.
For a fixed SNR gamma_b (in dB), the channel bit LLR corresponding to the binary-input AWGNC is Gaussian with variance [29]

    sigma_ch^2 = 8 R 10^(gamma_b / 10),    (12)

where R is the rate of the BMST code. The channel MI is then given by

    I_ch = J(sigma_ch).    (13)
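The quantities used in Algorithm 2 and in (12)-(13) can be sketched with a standard closed-form approximation of the J function (Brannstrom et al.'s polynomial-exponent fit, assumed here in place of the tabulated functions of [34]); all function names are illustrative.

```python
import math

# Brannstrom's approximation of the J function and its exact algebraic inverse
H1, H2, H3 = 0.3073, 0.8935, 1.1064

def J(sigma):
    """MI between a consistent Gaussian LLR N(sigma^2/2, sigma^2) and its bit."""
    if sigma <= 0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2 * H2))) ** H3

def J_inv(I):
    """Inverse of the J approximation on (0, 1)."""
    return (-(1.0 / H1) * math.log2(1.0 - I ** (1.0 / H3))) ** (1.0 / (2 * H2))

def ber_estimate(I_a, I_e):
    """Algorithm 2: combine a priori and extrinsic MI into the a posteriori MI
    (9) by adding LLR variances, then map MI to BER via (10)-(11) under the
    Gaussian assumption, Pb = Q(sigma_app / 2)."""
    sigma_app = math.sqrt(J_inv(I_a) ** 2 + J_inv(I_e) ** 2)
    return 0.5 * math.erfc(sigma_app / (2 * math.sqrt(2)))  # Q(x) = erfc(x/sqrt(2))/2

def sigma_ch(R, gamma_b_db):
    """Std. dev. of the channel LLR, Eq. (12): sigma_ch^2 = 8*R*10^(gamma_b/10)."""
    return math.sqrt(8 * R * 10 ** (gamma_b_db / 10))

def channel_mi(R, gamma_b_db):
    """Channel MI, Eq. (13): I_ch = J(sigma_ch)."""
    return J(sigma_ch(R, gamma_b_db))
```

As expected, the BER estimate decreases monotonically as the a priori and extrinsic MI approach 1, which is exactly the behavior the convergence check relies on.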
The modified EXIT chart analysis algorithm of BMST codes, similar to the protograph-based EXIT chart analysis algorithm of SC-LDPC codes [31], can now be described as follows.
Algorithm 3
EXIT Chart Analysis of BMST Codes with Window Decoding

Initialization: All messages on the half edges (connected to the channel) at nodes + are initialized as I_ch according to (13), all messages on the half edges (connected to the information source) at nodes G are initialized as 0, and all messages on the remaining (interconnected) full edges are initialized as 0. Set a maximum number of iterations.

Sliding window decoding: For each window position, the d + 1 decoding layers perform MI message processing/passing layer-by-layer within the window. After the fixed number of iterations, perform the convergence check of Algorithm 2. If a local decoding failure is declared, then window decoding terminates; otherwise, a local decoding success is declared, the window position is shifted, and decoding continues. A complete decoding success for a specific channel parameter and target BER is declared if and only if all target layers declare decoding successes.
We can now define the iterative decoding threshold of a BMST code ensemble for a preselected target BER as the minimum value of the channel parameter gamma_b that allows the decoder of Algorithm 3 to declare a complete decoding success, in the limit of large code length (i.e., large N).
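Given a routine that runs Algorithm 3 at one SNR and reports success or failure, the threshold itself can be located by bisection, since decodability is monotone in the SNR. The success predicate below is a placeholder standing in for one full run of Algorithm 3.

```python
def decoding_threshold(success, lo_db, hi_db, tol=0.01):
    """Sketch of the threshold search: the smallest channel SNR (in dB) for
    which the window-decoding EXIT analysis (Algorithm 3) declares a complete
    decoding success, found by bisection between a failing SNR lo_db and a
    succeeding SNR hi_db."""
    assert not success(lo_db) and success(hi_db)
    while hi_db - lo_db > tol:
        mid = 0.5 * (lo_db + hi_db)
        if success(mid):
            hi_db = mid       # success: the threshold is at or below mid
        else:
            lo_db = mid       # failure: the threshold is above mid
    return hi_db
```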
V Impact of Parameters on BMST Codes
In this section we study the impact of various parameters (encoding memory m, Cartesian product order N, and decoding delay d) on BMST codes. Three regimes are considered: (1) fixed m and N, increasing d; (2) fixed m and d, increasing N; and (3) fixed N, increasing m (and hence d).

All simulations are performed assuming BPSK modulation and an AWGNC. In the computation of the asymptotic window decoding thresholds of BMST codes, we set a large maximum number of iterations. We refer to the iterative decoding threshold simply as the threshold when this does not lead to ambiguity. In the simulations of finite-length performance, random interleavers (randomly generated but fixed) are used for encoding. The iterative sliding window decoding algorithm [16, Algorithm 3] for BMST codes is performed using a layer-by-layer updating schedule with a maximum iteration number of 18, and the entropy stopping criterion [35, 16] with a preselected threshold is employed.
V-A. Fixed m and N, Increasing d
Example 1 (Asymptotic Performance)
Consider a BMST-R code ensemble. We calculate its window decoding thresholds for different preselected target BERs and different decoding delays. The calculated thresholds, in terms of the SNR versus the preselected target BER, are shown together with the genie-aided lower bound in Fig. 5(a), where we observe that:

In the waterfall region (above a critical BER), the thresholds remain almost constant. However, once the critical BER is reached, the thresholds increase as the target BER decreases.

For a small decoding delay, the thresholds do not achieve the lower bound, even in the high SNR region.

For a larger decoding delay, the thresholds coincide with the lower bound in the high SNR region, suggesting that the window decoding algorithm with such a decoding delay is near optimal for BMST codes.

The error floor region threshold improves as the decoding delay increases, but it does not improve much further beyond a certain decoding delay.
Similar behavior has also been observed for BMST-SPC code ensembles, as shown in Fig. 5(b), where the thresholds of a BMST-SPC code ensemble decoded with different decoding delays are depicted.
The window decoding thresholds, corresponding to a preselected target BER (chosen because it represents a target commonly used in many practical applications), for the regular SC-LDPC code ensemble and the BMST-R code ensemble are shown in Fig. 6 as a function of decoding delay. We see that, similar to the SC-LDPC code ensemble, the threshold of the BMST code ensemble improves as the decoding delay increases, and it becomes better than that of the SC-LDPC code ensemble beyond a certain decoding delay.
Example 2 (FiniteLength Performance)
Consider BMST-R codes. The BER performance of the BMST-R codes decoded with different decoding delays is shown in Fig. 7(a), where we observe that:

The BER performance of BMST-R codes decoded with different delays matches well with the corresponding window decoding thresholds in the high SNR region.

The BER performance in the waterfall region improves as the decoding delay increases, but it does not improve much further beyond a certain decoding delay.

The error floor improves as the decoding delay increases, and it matches well with the genie-aided lower bound once the decoding delay increases beyond a certain point.

These results are consistent with the asymptotic threshold analysis shown in Fig. 5(a).

Similar behavior has also been observed for BMST-SPC codes, as shown in Fig. 7(b), where the simulated decoding performance of a BMST-SPC code decoded with different decoding delays is depicted.
V-B. Fixed m and d, Increasing N
Example 3 (FiniteLength Performance)
Consider BMST-R codes with fixed encoding memory and decoding delay. The BER performance of the BMST-R codes constructed with different Cartesian product orders N is shown in Fig. 8, where we observe that:

Similar to SC-LDPC codes, where increasing the lifting factor improves waterfall region performance, increasing the Cartesian product order N of BMST codes also improves waterfall region performance. As expected, this improvement saturates for sufficiently large N. For example, with the same decoding delay, an initial increase of N improves the performance at a given BER by about 0.17 dB, while a comparable further increase yields only about 0.06 dB.

The BER performance of BMST-R codes matches well with the corresponding window decoding thresholds in the error floor region.

The error floors, which are solely determined by the encoding memory (see Section II-C), cannot be lowered by increasing N.
Remark: We found from simulations that, in the error floor region, the gap between the finite-length performance and the window decoding threshold is less than 0.02 dB. For example, the SNR needed to achieve the target BER for a BMST-R code with an extremely large Cartesian product order is 1.087 dB, within 0.02 dB of the calculated window decoding threshold of the corresponding BMST-R code ensemble. This result again demonstrates that the finite-length performance is consistent with the asymptotic performance analysis.
V-C. Fixed N, Increasing m (and hence d)
Example 4 (Asymptotic Performance)
Consider a family of BMST-R code ensembles with different encoding memories m. The calculated window decoding thresholds, in terms of the SNR versus the preselected target BER, are shown together with the lower bounds in Fig. 9(a), where we observe that:

For a high target BER, the threshold with a sufficiently large decoding delay degrades slightly as the encoding memory m increases, due to errors propagating to successive decoding windows.

The error floor can be lowered by increasing the encoding memory m (and hence the decoding delay d).
Similar behavior has also been observed for BMST-SPC code ensembles, as shown in Fig. 9(b), where the thresholds of a family of BMST-SPC code ensembles are depicted.
Example 5 (FiniteLength Performance)
Consider BMST-R codes constructed with several different encoding memories and Cartesian product orders. The simulated BER performance with sufficiently large decoding delay is shown in Fig. 10, where we observe that:

The BER performance in the waterfall region degrades slightly as the encoding memory m increases, due to errors propagating to successive decoding windows.

The error floor of the BER curves is lowered by increasing the encoding memory m (and hence the decoding delay d).

These results are consistent with the asymptotic performance analysis shown in Fig. 9(a).
VI. Performance and Complexity Comparison of SC-LDPC Codes and BMST Codes
In addition to decoding performance, the latency introduced by channel coding is a crucial factor in the design of a practical communication system. For example, minimizing latency is very important in applications such as personal wireless communication and real-time audio and video. In this section, we first compare the performance of BMST codes and SC-LDPC codes under the constraint of equal decoding latency. Then a computational complexity comparison is presented.
We restrict consideration to regular SC-LDPC codes with coupling width m_s = 1, where two component submatrices B_0 and B_1 are used, due to their superior thresholds and finite-length performance with window decoding when the decoding delay is relatively small (see, e.g., [31, 15]). For the BMST codes, we consider BMST-R[2,1] codes, due to their near-capacity performance in the waterfall region and relatively low error floor (see Section V). In the simulations, the iterative sliding window decoding algorithm for SC-LDPC codes uses the uniform parallel (flooding) updating schedule with a maximum iteration number of 100, while for the BMST codes, window decoding is performed using the layer-by-layer updating schedule with a maximum iteration number of 18. The entropy stopping criterion [35, 16] with a preselected threshold is employed for both window decoding algorithms.
The decoding latency of the sliding window decoder, in terms of bits, is given by [15]

    T_SC = 2 M (d_SC + 1)    (14)

for the regular SC-LDPC codes, where M is the lifting factor, and

    T_BMST = 2 N (d_BMST + 1)    (15)

for the BMST-R[2,1] codes with Cartesian product order N, where d_SC and d_BMST are the decoding delays of the SC-LDPC codes and the BMST codes, respectively. When the parameters M, d_SC, N, and d_BMST satisfy M (d_SC + 1) = N (d_BMST + 1), the decoding latency of the BMST-R codes is the same as that of the regular SC-LDPC codes. In our simulations, we consider a decoding delay for the SC-LDPC codes that is a good choice to achieve optimum performance when the decoding latency is fixed [15].
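The equal-latency condition can be checked numerically. The block sizes below are assumptions consistent with this section: 2N coded bits per BMST-R[2,1] layer and 2M coded bits per coupled position of a (3,6)-regular SC-LDPC protograph with two variable nodes.

```python
def latency_scldpc(M, d_sc):
    """Window-decoding latency in bits for a regular SC-LDPC code whose
    protograph has 2 variable nodes per position, lifted by M (Eq. (14),
    reconstructed under that assumption)."""
    return 2 * M * (d_sc + 1)

def latency_bmst_r(N, d_bmst):
    """Window-decoding latency in bits for a BMST-R[2,1] code with Cartesian
    product order N: each layer carries 2N coded bits (Eq. (15))."""
    return 2 * N * (d_bmst + 1)

# equal latency: N (d_bmst + 1) = M (d_sc + 1), e.g. N=1000, d=9 vs. M=2000, d=4
print(latency_bmst_r(1000, 9), latency_scldpc(2000, 4))
```

This trade-off, a larger N with a smaller d against a smaller N with a larger d at the same latency, is exactly the comparison carried out in the next subsection.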
VI-A. Performance Comparison
In Fig. 11, BMST-R codes are compared to regular SC-LDPC codes, where the values of the Cartesian product order N and the decoding delay d for the BMST-R codes are chosen such that the two decoding latencies are the same. We see that the BMST-R codes outperform the SC-LDPC codes in the waterfall region but have a higher error floor. From Fig. 11, we also see that, in the waterfall region, a BMST-R code constructed with a larger Cartesian product order N and decoded with a smaller decoding delay d outperforms a BMST-R code constructed with a smaller N and decoded with a larger d, but has a higher error floor (both having the same decoding latency). In other words, selecting a smaller d, which is typically detrimental to decoder performance, is compensated for by allowing a larger N, which improves code performance. For example, at a given BER, the BMST-R code shows a gain over the equal-latency SC-LDPC code, and the gain increases further by using a BMST-R code with a larger N and a smaller d.
The required to achieve a BER of for equal-latency regular LDPC-BCs, regular SC-LDPC codes, and BMST-R codes as a function of decoding latency is shown in Fig. 12, where we observe that both the BMST-R codes and the SC-LDPC codes perform significantly better than the LDPC-BCs. Also, the performance of the BMST-R codes (with fixed Cartesian product order ) improves as the decoding delay (and hence the latency) increases, but it does not improve much further beyond a certain decoding delay (roughly ). (Note again that increasing the decoding delay improves decoder performance, while increasing the Cartesian product order improves code performance.) However, under an equal decoding latency assumption, increasing the decoding delay or the Cartesian product order does not always lower the required to achieve a BER of . For example, when the decoding latency is bits, the performance of the BMST-R code with and decoded with is better than that of the BMST-R code with and decoded with . However, if we increase the latency to 19800 bits, the code with the Cartesian product order and decoded with a larger still outperforms the code with and decoded with a smaller . This raises the interesting question of how to choose and in order to achieve the best performance when the decoding latency of the sliding window decoder for BMST-R codes is fixed.
We also see from Fig. 12 that, for a fixed decoding latency roughly less than 15000 bits, to achieve a BER of , is a good choice for optimum performance. This is due to the fact that the interleavers, which break short cycles in the normal graph of BMST codes, especially when the interleavers of size are generated randomly, play a crucial role in iterative decoding [16]. That is, the larger the Cartesian product order is, the better the performance of BMST codes becomes. However, the value of required to achieve a BER of for BMST-R codes decoded with a fixed decoding delay is bounded below by the corresponding window decoding threshold (see Section V-B).
Fig. 13 shows the values required for BMST-R [2,1] codes to achieve a BER of with different decoding delays and larger decoding latencies of 19800, 23760, and 27720 bits. Here we see that the required values of for the BMST-R codes with are the same and approach the corresponding window decoding threshold (as remarked in Section V-B). In this case, however, we also observe that the required values of continue to decrease until roughly , and then increase gradually as the decoding delay increases further. This increase results from the fact that the improved decoder performance obtained by increasing no longer compensates for the loss in code performance caused by the smaller Cartesian product order . Thus, for larger decoding latencies (up to 35000 bits), is a good choice for optimum performance.
VI-B Complexity Comparison
As shown in [16], we can measure the computational complexity of BMST codes by the total number of operations. Consider a BMST-R code or a BMST-SPC code with Cartesian product order and decoding delay . Let denote the number of operations at a generic node . Each decoding layer has parallel nodes , parallel nodes , and a node of type G. The computational complexity for each node , each node , and each node G is , , and , respectively. Thus, the total number of operations for each decoding layer update is given by
(16) 
Let denote the average number of iterations required to decode a target layer for BMST codes. Since each iteration requires both a forward recursion ( layer updates) and a backward recursion ( layer updates), the total (average) computational complexity per window is given by
(17) 
Note that the number of decoded (target) bits for the window decoder at each time instant is , and thus the computational complexity per decoded bit for a BMST code is
(18) 
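The per-decoded-bit cost of (18) can be sketched numerically: each of the average iterations performs a forward and a backward recursion of d layer updates, each layer update costing some constant number of operations per decoded bit. The constant c_layer = 20 below is an assumption chosen so that the formula reproduces the Table II entries for the BMST-R codes considered; it is not derived from the node operation counts above.

```python
def bmst_complexity_per_bit(d, avg_iters, c_layer=20.0):
    """Operations per decoded bit for BMST window decoding, modeled as:
    each iteration runs a forward and a backward recursion of d layer
    updates, each layer update costing c_layer operations per decoded bit.
    c_layer = 20 is an assumed constant fitted to the Table II entries."""
    return 2 * d * avg_iters * c_layer

# Parameters of the two BMST-R codes in Table II:
print(round(bmst_complexity_per_bit(d=14, avg_iters=2.03), 1))  # 1136.8
print(round(bmst_complexity_per_bit(d=9, avg_iters=3.20), 1))   # 1152.0
```

Note how the smaller decoding delay of the second code is offset by its larger average iteration count, leaving the per-bit complexity of the two equal-latency BMST-R codes nearly identical.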
Now consider a regular SC-LDPC code with lifting factor and decoding delay (the corresponding decoding window size ). Let denote the average number of iterations required to decode a target layer for SC-LDPC codes. Note that the numbers of operations at a variable node and a check node of regular SC-LDPC codes are 3 and 6, respectively. The average computational complexity (also measured by the total number of operations) per window is then given by
(19) 
where is the decoding latency. Note that the number of decoded (target) bits for the window decoder at each time instant is , and thus the computational complexity per decoded bit for a regular SC-LDPC code is
(20) 
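Using the stated counts of 3 operations per variable node and 6 per check node, the per-bit cost of (20) can be sketched as follows. The assumptions here are that every iteration updates all W = d + 1 sections of the window, and that the rate-1/2 code has one check node per two variable nodes, so each bit carries half the cost of a check node update.

```python
def scldpc_complexity_per_bit(d, avg_iters, var_ops=3, chk_ops=6):
    """Operations per decoded bit for (3,6)-regular SC-LDPC window decoding:
    each iteration sweeps all W = d + 1 window sections; per bit this costs
    var_ops at its variable node plus half of chk_ops (two variable nodes
    share each check node at rate 1/2)."""
    W = d + 1
    return avg_iters * W * (var_ops + chk_ops / 2)

# Parameters of the SC-LDPC code in Table II:
print(round(scldpc_complexity_per_bit(d=5, avg_iters=9.65), 1))  # 347.4
```

Under these assumptions the sketch reproduces the Table II entry for the SC-LDPC code, and comparing it with the BMST sketch above shows the roughly threefold per-bit complexity gap discussed next.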
Codes     | Parameters      | Avg. iterations | Complexity
SC-LDPC   | 2500    1    5  | 9.65            | 347.4
BMST      | 1000    8   14  | 2.03            | 1136.8
BMST      | 1500    8    9  | 3.20            | 1152.0
Table II shows the average computational complexity per decoded bit of the regular SC-LDPC code and the BMST-R codes used in Fig. 11 that achieve a BER of with a decoding latency of 30000 bits. The simulation parameters , , , , , and are also included. We observe that, although the average number of iterations for the BMST codes is significantly smaller than for the SC-LDPC code, the computational complexity per decoded bit for the BMST codes is higher than for the SC-LDPC code. However, the BMST codes outperform the SC-LDPC code in the waterfall region (see Fig. 11 in Section VI-A). This means that BMST-R codes, compared to regular SC-LDPC codes, obtain performance gains at the cost of higher computational complexity.
VII Conclusions
In this paper, we described BMST codes using both an algebraic description and a graphical representation in order to show that BMST codes can be viewed as a class of SC codes. Then, based on a modified EXIT chart analysis and finite-length computer simulations, we investigated the impact of several parameters (coupling width, Cartesian product order, and decoding delay) on the performance of BMST codes. We then examined the relationship between the Cartesian product order, the decoding delay, and the decoding performance of BMST codes for fixed decoding latency in comparison to SC-LDPC codes, and we also presented a comparison of computational complexity. It was observed that, under the equal decoding latency constraint, BMST codes using the repetition code as the basic code (BMST-R codes) can outperform regular SC-LDPC codes in the waterfall region but have a higher error floor and a larger decoding complexity. An interesting future research topic to complement the work reported here is to embed a partial superposition strategy into the code design to further improve the performance of the original BMST codes for a given decoding latency.
Acknowledgment
The authors would like to thank Prof. Daniel J. Costello, Jr. for his helpful comments, polishing this paper, and invaluable contributions as a coauthor of the conference version of this paper [36]. They would also like to thank Mr. Chulong Liang from Sun Yat-sen University for helpful discussions.
References
 [1] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
 [2] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.
 [3] M. Lentmaier, A. Sridharan, D. J. Costello, Jr., and K. S. Zigangirov, “Iterative decoding threshold analysis for LDPC convolutional codes,” IEEE Trans. Inf. Theory, vol. 56, no. 10, pp. 5274–5289, Oct. 2010.
 [4] S. Kudekar, T. J. Richardson, and R. L. Urbanke, “Threshold saturation via spatial coupling: Why convolutional LDPC ensembles perform so well over the BEC,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 803–834, Feb. 2011.
 [5] ——, “Spatially coupled ensembles universally achieve capacity under belief propagation,” IEEE Trans. Inf. Theory, vol. 59, no. 12, pp. 7761–7813, Dec. 2013.
 [6] D. G. M. Mitchell, M. Lentmaier, and D. J. Costello, Jr., “Spatially coupled LDPC codes constructed from protographs,” 2014, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1407.5366
 [7] A. E. Pusane, R. Smarandache, P. O. Vontobel, and D. J. Costello, Jr., “Deriving good LDPC convolutional codes from LDPC block codes,” IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 835–857, Feb. 2011.
 [8] M. Lentmaier, M. M. Prenda, and G. P. Fettweis, “Efficient message passing scheduling for terminated LDPC convolutional codes,” in Proc. IEEE Int. Symp. on Inf. Theory, St. Petersburg, Russia, Aug. 2011, pp. 1826–1830.
 [9] N. ul Hassan, A. E. Pusane, M. Lentmaier, G. P. Fettweis, and D. J. Costello, Jr., “Reduced complexity window decoding schedules for coupled LDPC codes,” in Proc. IEEE Inf. Theory Workshop, Lausanne, Switzerland, Sept. 2012, pp. 20–24.
 [10] I. Andriyanova and A. Graell i Amat, “Threshold saturation for nonbinary SC-LDPC codes on the binary erasure channel,” 2013, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1311.2003
 [11] D. G. M. Mitchell, A. E. Pusane, and D. J. Costello, Jr., “Minimum distance and trapping set analysis of protograph-based LDPC convolutional codes,” IEEE Trans. Inf. Theory, vol. 59, no. 1, pp. 254–281, Jan. 2013.
 [12] D. G. M. Mitchell, M. Lentmaier, A. E. Pusane, and D. J. Costello, Jr., “Randomly punctured spatially coupled LDPC codes,” in Proc. Int. Symp. Turbo Codes Iterative Inf. Process., Aug. 2014, pp. 1–6.
 [13] D. J. Costello, Jr., L. Dolecek, T. E. Fuja, J. Kliewer, D. G. M. Mitchell, and R. Smarandache, “Spatially coupled sparse codes on graphs: Theory and practice,” IEEE Commun. Mag., vol. 52, no. 7, pp. 168–176, July 2014.
 [14] P. M. Olmos and R. L. Urbanke, “A scaling law to predict the finite-length performance of spatially coupled LDPC codes,” 2014, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1404.5719
 [15] K. Huang, D. G. M. Mitchell, L. Wei, X. Ma, and D. J. Costello, Jr., “Performance comparison of LDPC block and spatially coupled codes over GF(q),” IEEE Trans. Commun., vol. 63, no. 3, pp. 592–604, Mar. 2015.
 [16] X. Ma, C. Liang, K. Huang, and Q. Zhuang, “Block Markov superposition transmission: Construction of big convolutional codes from short codes,” IEEE Trans. Inf. Theory, 2015, to appear.
 [17] C. Liang, X. Ma, Q. Zhuang, and B. Bai, “Spatial coupling of generator matrices: A general approach to design good codes at a target BER,” IEEE Trans. Commun., vol. 62, no. 12, pp. 4211–4219, Dec. 2014.
 [18] A. J. Feltström, D. Truhachev, M. Lentmaier, and K. S. Zigangirov, “Braided block codes,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2640–2658, June 2009.
 [19] W. Zhang, M. Lentmaier, K. S. Zigangirov, and D. J. Costello, Jr., “Braided convolutional codes: A new class of turbolike codes,” IEEE Trans. Inf. Theory, vol. 56, pp. 316–331, Jan. 2010.
 [20] S. Moloudi and M. Lentmaier, “Density evolution analysis of braided convolutional codes on the erasure channel,” in Proc. IEEE Int. Symp. on Inf. Theory, Honolulu, HI, June 2014, pp. 2609–2613.
 [21] B. P. Smith, A. Farhood, A. Hunt, F. R. Kschischang, and J. Lodge, “Staircase codes: FEC for 100 Gb/s OTN,” J. Lightwave Technol., vol. 30, no. 1, pp. 110–117, Jan. 2012.
 [22] S. Moloudi, M. Lentmaier, and A. Graell i Amat, “Spatially coupled turbo codes,” in Proc. Int. Symp. Turbo Codes Iterative Inf. Process., Bremen, Germany, Aug. 2014, pp. 82–86.
 [23] D. Divsalar, H. Jin, and R. J. McEliece, “Coding theorems for ‘turbo-like’ codes,” in Proc. Allerton Conf., Urbana, IL, Sept. 1998, pp. 201–210.
 [24] H. D. Pfister and P. H. Siegel, “The serial concatenation of rate-1 codes through uniform random interleavers,” IEEE Trans. Inf. Theory, vol. 49, no. 6, pp. 1425–1438, June 2003.
 [25] A. Abbasfar, D. Divsalar, and K. Yao, “Accumulaterepeataccumulate codes,” IEEE Trans. Commun., vol. 55, no. 4, pp. 692–702, Apr. 2007.
 [26] A. R. Iyengar, M. Papaleo, P. H. Siegel, J. K. Wolf, A. Vanelli-Coralli, and G. E. Corazza, “Windowed decoding of protograph-based LDPC convolutional codes over erasure channels,” IEEE Trans. Inf. Theory, vol. 58, no. 4, pp. 2303–2320, Apr. 2012.
 [27] C. Liang, J. Hu, X. Ma, and B. Bai, “A new class of multiple-rate codes based on block Markov superposition transmission,” 2014, submitted to IEEE Trans. Signal Process. [Online]. Available: http://arxiv.org/abs/1406.2785
 [28] J. Hu, X. Ma, and C. Liang, “Block Markov superposition transmission of repetition and single-parity-check codes,” IEEE Commun. Lett., vol. 19, no. 2, pp. 131–134, Feb. 2015.
 [29] S. ten Brink, “Convergence behavior of iteratively decoded parallel concatenated codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, Oct. 2001.
 [30] G. Liva and M. Chiani, “Protograph LDPC codes design based on EXIT analysis,” in Proc. IEEE Global Commun. Conf., Washington, DC, Nov. 2007, pp. 3250–3254.
 [31] L. Wei, D. G. M. Mitchell, T. E. Fuja, and D. J. Costello, Jr., “Design of spatially coupled LDPC codes over GF(q) for windowed decoding,” 2014, submitted to IEEE Trans. Inf. Theory. [Online]. Available: http://arxiv.org/abs/1411.4373
 [32] A. E. Pusane, A. J. Felström, A. Sridharan, M. Lentmaier, K. S. Zigangirov, and D. J. Costello, Jr., “Implementation aspects of LDPC convolutional codes,” IEEE Trans. Commun., vol. 56, no. 7, pp. 1060–1069, July 2008.
 [33] G. D. Forney, Jr., “Codes on graphs: Normal realizations,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 520–548, Feb. 2001.
 [34] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of lowdensity paritycheck codes for modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp. 670–678, Apr. 2004.
 [35] X. Ma and L. Ping, “Coded modulation using superimposed binary codes,” IEEE Trans. Inf. Theory, vol. 50, pp. 3331–3343, Dec. 2004.
 [36] K. Huang, X. Ma, and D. J. Costello, Jr., “EXIT chart analysis of block Markov superposition transmission of short codes,” 2015, accepted by Proc. IEEE Int. Symp. on Inf. Theory. [Online]. Available: http://arxiv.org/abs/1502.00079