Efficient Error-Correcting Codes in the Short Blocklength Regime
Abstract
The design of block codes for short information blocks (e.g., a thousand or fewer information bits) is an open research problem that is gaining relevance thanks to emerging applications in wireless communication networks. In this paper, we review some of the most promising code constructions targeting the short-block regime, and we compare them with both finite-length performance bounds and classical error-correction coding schemes. The work addresses the use of both binary and high-order modulations over the additive white Gaussian noise channel. We will illustrate how to effectively approach the theoretical bounds with various performance versus decoding complexity trade-offs.
keywords:
Short packets, error-correcting codes, finite-length performance bounds, coded modulation. ARA
 accumulate-repeat-accumulate
 AR3A
 accumulate-repeat-3-accumulate
 ARJA
 accumulate-repeat-jagged-accumulate
 AWGN
 additive white Gaussian noise
 BDMC
 binary-input discrete memoryless channels
 BMC
 binary-input memoryless channels
 BCJR
 Bahl-Cocke-Jelinek-Raviv
 BEAST
 a bidirectional efficient algorithm for searching trees
 BP
 belief propagation
 BPSK
 binary phase shift keying
 BCH
 Bose-Chaudhuri-Hocquenghem
 biAWGN
 binary-input additive white Gaussian noise
 CC
 convolutional code
 CRC
 cyclic redundancy check
 CCSDS
 Consultative Committee for Space Data Systems
 CER
 codeword error rate
 DE
 density evolution
 EXIT
 extrinsic information transfer
 eMBB
 enhanced Mobile Broadband
 HARQ
 hybrid automatic repeat request
 FFT
 fast Fourier transform
 GA
 Gaussian approximation
 LDPC
 low-density parity-check
 LLR
 log-likelihood ratio
 LTE
 long term evolution
 LVA
 list Viterbi algorithm
 ML
 maximum likelihood
 NR
 New Radio
 NA
 normal approximation
 OSD
 ordered statistics decoding
 PEG
 progressive edge growth
 RA
 random access
 RCB
 random coding bound
 SC
 successive cancellation
 SCL
 successive cancellation (SC) list
 SNR
 signal-to-noise ratio
 SISO
 soft-input soft-output
 SPB
 sphere-packing bound
 TB
 tail-biting
 WAVA
 wrap-around Viterbi algorithm
 SE
 spectral efficiency
 BRGC
 binary reflected Gray code
 PAS
 probabilistic amplitude shaping
 CCDM
 constant composition distribution matching
 SMDM
 shell mapping distribution matching
 DM
 distribution matcher
 RCU
 random coding union
 MC
 meta-converse
1 Introduction
During the past sixty years, a formidable effort has been focused on the research of capacity-approaching error-correcting codes (1). Initially, the attention was directed to short and medium-length linear block codes (2) (with some notable exceptions, see, e.g., (3; 4)), mainly for complexity reasons. As the idea of code concatenation (5) became established in the coding community (6), the design of long channel codes became a viable solution to approach the channel capacity. The effort resulted in a number of practical code constructions allowing reliable transmission at fractions of a decibel from the Shannon limit (7; 8; 9; 10; 11; 12; 13; 14; 15; 16) with low-complexity (suboptimum) decoding.
The interest in short and medium-blocklength codes (i.e., codes with dimension in the range of to bits) has been rising again recently, mainly due to emerging applications that require the transmission of short data units. Examples of such applications are machine-type communications, smart-metering networks, the Internet of Things, remote command links, and messaging services (see, e.g., (17; 18; 19; 20)). Due to these new emerging applications, renewed interest has been placed not only in the design of efficient short codes, but also in the development of tight bounds on the performance attainable by the best error-correcting schemes, for a given blocklength and rate (21; 22; 23; 24). Tight bounds are now available as benchmarks not only for the unfaded additive white Gaussian noise (AWGN) case but also for fading channels (25; 26).
When the design of short iteratively-decodable codes is attempted, it turns out that some classical code construction tools that have been developed for turbo-like codes tend to fail to provide codes with acceptable performance. This is the case, for instance, for density evolution (27) and extrinsic information transfer (EXIT) charts (28), which are well-established techniques to design powerful long low-density parity-check (LDPC) and turbo codes. The reason is the asymptotic (in blocklength) nature of density evolution and EXIT analysis, which fail to model accurately the iterative decoder in the short blocklength regime. However, competitive LDPC and turbo code designs for moderate-length and small blocks have been proposed, mostly based on heuristic construction techniques (29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50). While iterative codes retain a large appeal due to their low decoding complexity, more sophisticated decoding algorithms (51; 52; 53; 54; 55; 56; 57) are feasible for short blocks, leading to solutions that are competitive with (if not superior to) iterative decoding of short turbo and LDPC codes.
In this paper, we review some fundamental results on the performance achievable by codes in the short blocklength regime. This will allow us to lay the ground for a proper performance comparison among various codes and decoding algorithms. The comparison will be provided for the (unfaded) AWGN channel case with both binary modulation and high-order modulations. In the former case, the goal is to compare pure code performance, whereas in the latter case we shall see how different coding schemes can be efficiently coupled with high-order modulations, with and without shaping. The performance comparison will be provided in terms of block error rate, also referred to as codeword error rate (CER), versus signal-to-noise ratio (SNR), with the SNR given either by the ratio $E_b/N_0$ (here, $E_b$ is the energy per information bit and $N_0$ the single-sided noise power spectral density) or by the ratio $E_s/N_0$ (with $E_s$ being the energy per modulation symbol).
The remaining part of the paper is organized as follows. Section 2 reviews the fundamental limits for channel coding in the short blocklength regime. Some powerful classical short codes as well as efficient decoding algorithms are discussed in Section 3. Modern code constructions tailored to the transmission of short blocks are presented in Section 4. A comparison of various schemes is provided in Section 5. Conclusions follow in Section 6.
2 Finite-Length Performance Limits
In the following sections of the paper, with the exception of Section 5.3, we will focus on the problem of how to optimally transmit $k$ bits of information using the discrete-time memoryless binary-input additive white Gaussian noise (biAWGN) channel $n$ times
(1) $y_i = \sqrt{\rho}\, x_i + z_i, \qquad i = 1, \dots, n$
Here, $z_i$, $i = 1, \dots, n$, denotes a sequence of independent and identically distributed samples of the AWGN process. We shall assume that these samples are real-valued Gaussian random variables with zero mean and unit variance. Each of the input symbols $x_i$ belongs to the binary set $\{-1, +1\}$. The constant $\rho$ models the transmit power and, hence, the SNR, since the noise has unit variance. Finally, $\boldsymbol{y} = (y_1, \dots, y_n)$ is the sequence of received symbols.
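As an aside, the channel model just described is straightforward to simulate. The following sketch (with hypothetical parameter values) scales the $\pm 1$ inputs by $\sqrt{\rho}$ and adds unit-variance Gaussian noise:

```python
import math
import random

def biawgn_channel(x, rho, rng=None):
    """Transmit a +/-1 sequence over the bi-AWGN channel:
    y_i = sqrt(rho) * x_i + z_i, with z_i i.i.d. zero-mean, unit-variance Gaussian."""
    rng = rng or random.Random(0)
    s = math.sqrt(rho)
    return [s * xi + rng.gauss(0.0, 1.0) for xi in x]

# Example: transmit 8 symbols at SNR rho = 4 (about 6 dB)
x = [1.0] * 8
y = biawgn_channel(x, rho=4.0)
# Hard decisions recover the inputs whenever the noise does not flip the sign
x_hat = [1.0 if yi >= 0 else -1.0 for yi in y]
```

The seeded generator is only there to make the sketch reproducible; any source of standard Gaussian samples would do.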
To convey the $k$ information bits, we use a coding scheme, which consists of: i) An encoder that maps the $k$-bit message $\boldsymbol{w}$ into one out of $2^k$ $n$-dimensional codewords with elements in $\{-1, +1\}$. We shall refer to the set of codewords together with the encoder as an $(n, k)$ code and to $n$ as the blocklength of the code. ii) A decoder that maps the received symbols $\boldsymbol{y}$ corresponding to the transmitted message $\boldsymbol{w}$ into an estimated $k$-bit message $\hat{\boldsymbol{w}}$.
The message (codeword) error probability of a given coding scheme, which we denote by $\epsilon$, is
(2) $\epsilon = \Pr\{\hat{\boldsymbol{w}} \neq \boldsymbol{w}\}$
We stress that different decoders may be applied to a given code, yielding different error probabilities.
The rate of an $(n, k)$ code is $R = k/n$. We also let $\epsilon^*(n, R)$ be the minimum error probability for which one can find a coding scheme with blocklength $n$ and rate $R$. This quantity describes the fundamental trade-off between blocklength $n$, rate $R$, and error probability $\epsilon$ in the transmission of information. Unfortunately, determining $\epsilon^*(n, R)$ exactly is a daunting task. Indeed, computing $\epsilon^*(n, R)$ for the biAWGN channel (1) involves an exhaustive search over codes, which is infeasible for values of $n$ and $R$ of practical interest.
However, the asymptotic behavior of $\epsilon^*(n, R)$ in the limit $n \to \infty$ for fixed $R$ is well understood—a result known as Shannon’s coding theorem (1). Specifically, $\epsilon^*(n, R)$ vanishes in the limit $n \to \infty$ for all rates below the so-called channel capacity $C$ (1), whereas $\epsilon^*(n, R) \to 1$ as $n \to \infty$ for all rates above $C$. In other words, the sequence of functions $\epsilon^*(n, \cdot)$ converges to a step function centered at $C$ in the limit $n \to \infty$.
For the biAWGN channel (1), the capacity (measured throughout the paper in bits per channel use) is given by
(3) $C = 1 - \mathbb{E}\!\left[\log_2\!\left(1 + e^{-2\rho - 2\sqrt{\rho}\, Z}\right)\right], \qquad Z \sim \mathcal{N}(0, 1)$
The achievability part of Shannon’s coding theorem relies on a random coding argument and does not suggest practical capacity-approaching coding schemes. However, advances in the coding community over the last sixty years have resulted in several low-complexity coding schemes that approach capacity (see, e.g., (27; 58)).
In this paper, we are concerned with the less studied problem of how to approach $\epsilon^*(n, R)$ when the blocklength $n$ is short. For the problem to be well posed, we need ways to estimate $\epsilon^*(n, R)$ accurately. Characterizing $\epsilon^*(n, R)$ for a fixed blocklength is a classic problem in information theory, and many nonasymptotic upper (achievability) bounds and lower (converse) bounds are available for the biAWGN channel (1), such as Gallager’s random coding bound (RCB) (59), and Shannon’s sphere-packing bounds ‘59 (SPB59) (60) and ‘67 (SPB67) (61; 62). Also, many nonasymptotic results are available on the error probability achievable using linear block codes and maximum likelihood (ML) decoding (see (23) and references therein).
Over the last ten years, a renewed interest in the performance of communication systems operating in the short-blocklength regime has resulted in a significant improvement in the tightness of the best available achievability and converse bounds for many communication channels of practical interest, including the well-studied biAWGN channel (1).
To showcase such improvements, we will focus in this paper on two specific classes of bounds, namely converse bounds based on the so-called meta-converse (MC) theorem (24, Thm. 26), and achievability bounds based on the random coding union (RCU) bound (24, Thm. 16).
The MC theorem provides a general framework that allows one to recover all previously known converse bounds on $\epsilon^*(n, R)$ (hence, its name). The theorem exploits the existence of a fundamental relation between the problem of determining the error probability of a given code under ML decoding and binary-hypothesis testing (63). The resulting converse bound is parametric in an auxiliary output distribution (i.e., a marginal distribution on the output vector $\boldsymbol{y}$), which, if chosen suitably, results in a remarkably tight bound that admits an efficient numerical implementation by using the saddle-point approximation (64).
Similar to Gallager’s RCB, the RCU bound relies on the analysis of the performance of a random coding ensemble under ML decoding. As indicated by its name, a crucial step in obtaining the RCU bound is a judicious use of the union bound. An attractive feature of this bound is that it generalizes naturally to mismatched decoding metrics (65; 66), which enables its use in practically relevant scenarios, such as pilot-assisted transmission over fading channels (67). Similarly to the MC bound, the use of a saddle-point approximation allows one to evaluate numerically the RCU bound for the biAWGN channel (1) in an efficient way (68).
By characterizing both the MC and the RCU bounds in the asymptotic limit $n \to \infty$ for a fixed rate $R$, one obtains a more precise characterization of the behavior of $\epsilon^*(n, R)$ for large $n$ than the step-function approximation obtained via Shannon’s capacity. Specifically, one can show that for the biAWGN channel (1),
(4) $\epsilon^*(n, R) = Q\!\left(\sqrt{\frac{n}{V}}\,\bigl(C - R\bigr)\right) + O\!\left(\frac{\log n}{\sqrt{n}}\right)$
where $C$ is the channel capacity given in (3), $V$ is the so-called channel dispersion
(5) $V = \operatorname{Var}\!\left[\log_2 \frac{p(Y \mid X)}{p(Y)}\right]$
$Q(\cdot)$ is the Gaussian $Q$-function, and the last term in (4) comprises terms that can be upper-bounded by a constant times $(\log n)/\sqrt{n}$ for all sufficiently large $n$. The approximation on $\epsilon^*(n, R)$ obtained by neglecting this term in (4) is usually referred to as the normal approximation (NA).
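The quantities entering the NA are easy to evaluate numerically. The sketch below estimates the biAWGN capacity and dispersion by Monte Carlo sampling of the information density and then evaluates the leading $Q$-function term of the NA; the sample size and parameter values are illustrative assumptions, and the higher-order correction term is ignored:

```python
import math
import random

def q_func(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def biawgn_na(n, k, rho, samples=200000, seed=0):
    """Leading term of the normal approximation for an (n, k) code on the
    biAWGN channel at SNR rho. Capacity C and dispersion V are estimated by
    Monte Carlo over the information density
        i(Z) = 1 - log2(1 + exp(-2*rho - 2*sqrt(rho)*Z)),  Z ~ N(0, 1)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(samples):
        z = rng.gauss(0.0, 1.0)
        vals.append(1.0 - math.log2(1.0 + math.exp(-2.0 * rho - 2.0 * math.sqrt(rho) * z)))
    c = sum(vals) / samples                        # capacity estimate (bits per use)
    v = sum((x - c) ** 2 for x in vals) / samples  # dispersion estimate
    return q_func(math.sqrt(n / v) * (c - k / n))

# Example (hypothetical parameters): a rate-1/2 code of blocklength 128 at 0 dB SNR
eps_na = biawgn_na(128, 64, 1.0, samples=50000)
```

A Gauss-Hermite quadrature would be more accurate than plain Monte Carlo here; the sampling version is kept only for brevity.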
We now illustrate the tightness of the MC bound, of the RCU bound, and of the NA through some numerical examples. In Fig. 1, we plot the MC bound computed using the exponent-achieving auxiliary output distribution given in (64, Eq. (28)), the RCU bound, and the NA for a given choice of the code dimension $k$ and blocklength $n$ (and, hence, of the rate $R$). Here, and throughout the paper, the bounds on $\epsilon^*(n, R)$ given by the MC and the RCU are plotted as a function of the energy per information bit
(6) $\frac{E_b}{N_0} = \frac{n\rho}{2k}$
For comparison, we also illustrate the SPB59 (which, for the parameters chosen in the figure, is tighter than the adaptation of the SPB67 bound given in (62)) and Gallager’s RCB. As one sees from the figure, the MC and the RCU bounds tightly delimit the CER achievable for the chosen blocklength and information payload over a large range of $E_b/N_0$ values. For example, the two bounds predict that the minimum energy per bit to operate at a CER of is between dB and dB. The SPB59 and the RCB are looser and give the wider range dB and dB. Note also that the NA provides an accurate estimate of the minimum codeword error probability, which lies between the MC and the RCU bounds. As we shall see, this is not a general phenomenon, and one can find practically relevant scenarios for which the NA ceases to be as accurate.
In Figure 2, we plot the MC, the RCU, and the NA for and . As the blocklength increases, the gap between the MC and the RCU diminishes. Figure 2 also allows one to estimate the speed at which $\epsilon^*(n, R)$ converges to a step function centered at dB, which is the minimum $E_b/N_0$ required to communicate at rate $R$ in the asymptotic limit $n \to \infty$. The gap to the asymptotic limit for is about dB at a CER of . The NA is accurate in all the scenarios considered in the figure.
Finally, in Figure 3 we plot the bounds as a function of the rate $R$ for a fixed SNR of dB, i.e., the SNR value for which the capacity is , and for . As in the previous figure, the bounds become increasingly tight as $n$ grows. One can also see that the NA loses accuracy when one operates at small error probability and small $n$—a relevant scenario for ultra-reliable low-latency communications.
3 Classical Short Codes
In this section, we will review a few approaches for efficient transmission at short blocklengths which rely on (or can be applied to) classical error-correcting code families. The first approach is based on a general decoding algorithm called ordered statistics decoding (OSD) (51) that can be applied to any (binary) linear block code. As we shall see, OSD delivers near-ML performance for short codes with manageable complexity. However, OSD becomes infeasible when the blocklength increases. The second approach relies on tail-biting (TB) convolutional codes and an efficient near-ML decoding algorithm based on the recursive application of Viterbi decoding to the TB trellis of the code.
3.1 Short Algebraic Codes under Ordered Statistics Decoding
Consider an $(n, k)$ binary linear block code $\mathcal{C}$. Under ML decoding, the decision is given by
(7) $\hat{\boldsymbol{c}} = \arg\max_{\boldsymbol{c} \in \mathcal{C}} p(\boldsymbol{y} \mid \boldsymbol{c})$
with $p(\boldsymbol{y} \mid \boldsymbol{c})$ being the channel transition probability (we assume the channel to be an arbitrary binary-input memoryless channel). The evaluation of (7) involves a number of computations that grows exponentially in $k$, unless the code exhibits some structure that enables an efficient implementation of the ML search. OSD reduces the decoding complexity by limiting the search to a subset of the codewords, i.e., to a list $\mathcal{L} \subseteq \mathcal{C}$. Hence, decoding reduces to
(8) $\hat{\boldsymbol{c}} = \arg\max_{\boldsymbol{c} \in \mathcal{L}} p(\boldsymbol{y} \mid \boldsymbol{c})$
The decoding complexity is directly related to the list size. OSD uses a particularly effective approach for the list construction, which is based on ranking the symbol-wise channel observations in decreasing order of reliability (51; 69). The received vector $\boldsymbol{y}$ is permuted accordingly, yielding a vector $\boldsymbol{y}'$ whose first $k$ components are the most reliable channel observations. The columns of the code generator matrix $\boldsymbol{G}$ are permuted accordingly. The permuted generator matrix $\boldsymbol{G}'$ is then put in systematic form. (This may require additional column permutations, which shall be applied to $\boldsymbol{y}'$ too; this step is required if the leftmost $k$ columns of $\boldsymbol{G}'$ are not linearly independent. The additional permutations aim at having in the first $k$ positions of $\boldsymbol{y}'$ the most reliable information set.) The first $k$ observations in $\boldsymbol{y}'$ are used to obtain (via bit-by-bit hard detection) a bit vector $\boldsymbol{v}$. All error patterns of Hamming weight up to $t$ (where $t$ is a parameter of the OSD algorithm) are then added to $\boldsymbol{v}$, generating a set of vectors of cardinality $\sum_{i=0}^{t} \binom{k}{i}$. Each vector is then encoded via the systematic form of $\boldsymbol{G}'$, yielding the list $\mathcal{L}$. Typically, the OSD parameter $t$ is kept small because the list size grows quickly with $t$. OSD relies on the idea that, if one takes a hard decision on the $k$ most reliable channel observations, only few errors are typically observed, whereas the majority of the errors introduced by the channel are typically associated with the least reliable channel outputs.
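As an illustration, the steps above can be condensed into a short order-$t$ OSD sketch: reliability sorting, Gauss-Jordan reduction (skipping dependent columns), hard decisions on the most reliable basis, and re-encoding of all test error patterns. The sign convention (positive LLR favours bit 0, BPSK mapping $0 \to +1$) and the example Hamming code are assumptions of the sketch, not prescriptions from the original algorithm description:

```python
import itertools

def osd_decode(G, llr, order):
    """Order-`order` OSD for a binary linear code with k x n generator matrix G
    (lists of 0/1). LLR convention: positive favours bit 0 (BPSK 0 -> +1).
    Returns the selected codeword in the original bit order."""
    k, n = len(G), len(G[0])
    # 1) Sort positions from most to least reliable.
    perm = sorted(range(n), key=lambda j: -abs(llr[j]))
    Gp = [[row[j] for j in perm] for row in G]
    # 2) Gauss-Jordan reduction; dependent columns are skipped so that the k
    #    pivot columns form the most reliable independent basis.
    pivots, r = [], 0
    for c in range(n):
        if r == k:
            break
        piv = next((i for i in range(r, k) if Gp[i][c]), None)
        if piv is None:
            continue  # dependent column: fall back to the next reliable one
        Gp[r], Gp[piv] = Gp[piv], Gp[r]
        for i in range(k):
            if i != r and Gp[i][c]:
                Gp[i] = [a ^ b for a, b in zip(Gp[i], Gp[r])]
        pivots.append(c)
        r += 1
    # 3) Hard decisions on the k most reliable independent positions.
    hard = [0 if llr[perm[c]] >= 0 else 1 for c in pivots]
    best, best_metric = None, float("inf")
    # 4) Flip up to `order` of the hard decisions, re-encode, keep the best.
    for w in range(order + 1):
        for flips in itertools.combinations(range(k), w):
            msg = hard[:]
            for f in flips:
                msg[f] ^= 1
            cw = [0] * n
            for i, bit in enumerate(msg):
                if bit:
                    cw = [a ^ b for a, b in zip(cw, Gp[i])]
            # Correlation discrepancy: sum of |LLR| over positions where the
            # candidate disagrees with the sign of the channel LLR.
            metric = sum(abs(llr[perm[j]]) for j in range(n)
                         if (llr[perm[j]] >= 0) != (cw[j] == 0))
            if metric < best_metric:
                best_metric, best = metric, cw
    out = [0] * n
    for j in range(n):
        out[perm[j]] = best[j]
    return out

# Example with the (7, 4) Hamming code: one unreliable, erroneous bit is fixed
G_hamming = [[1, 0, 0, 0, 1, 1, 0],
             [0, 1, 0, 0, 1, 0, 1],
             [0, 0, 1, 0, 0, 1, 1],
             [0, 0, 0, 1, 1, 1, 1]]
decoded = osd_decode(G_hamming, [4, 4, 4, -1, 4, 4, 4], order=1)  # -> all-zero codeword
```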
OSD works remarkably well with short codes, enabling near-ML decoding for small values of the parameter $t$. However, as the blocklength grows, $t$ must be increased to keep the decoder performance close to optimal. For example, while for the Golay code choosing is enough to approach the ML decoding limit, for an extended Bose-Chaudhuri-Hocquenghem (BCH) code one needs to set as large as . Figure 4 shows the performance in terms of CER vs. $E_b/N_0$ for an extended BCH code with and on the biAWGN channel. For the case of , the performance is within dB from the NA at . With the gap increases to dB.
OSD does not require any specific code property (besides linearity). However, some knowledge of the code distance spectrum can be used to simplify the decoder by introducing an early stopping criterion. Consider the example of transmission over the biAWGN channel. Assuming that the coded bits are mapped to symbols in the set $\{-1, +1\}$, the minimum Euclidean distance between modulated codewords is $2\sqrt{d_{\min}}$, where $d_{\min}$ is the code minimum Hamming distance. It follows that the list construction can be halted if a codeword at a Euclidean distance of less than $\sqrt{d_{\min}}$ from the channel observation is generated. This simple trick yields remarkable savings in the average list size at moderate-to-large SNRs (51). Another simple approach to limit the complexity of OSD consists of applying OSD only if decoding with a lower-complexity algorithm has failed. The idea was explored, for instance, in (70; 71) in the context of iterative decoding of LDPC codes. Here, the OSD can either intervene if the belief propagation (BP) decoder fails to converge to a valid codeword, or it can even be integrated within the iterative decoding algorithm by exploiting updated reliability estimates computed by the BP decoder. A number of additional improvements on the efficiency of OSD algorithms were proposed during the past two decades (see, e.g., (52; 55) and the references therein).
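The early-stopping criterion admits a one-line test: with $\pm 1$ signalling, any candidate codeword closer to the channel output than half the minimum Euclidean distance between modulated codewords must be the closest codeword overall, so the list search can halt. A small sketch:

```python
import math

def can_stop(y, codeword_symbols, d_min):
    """Early-stopping test for the OSD list search: with +/-1 signalling the
    minimum Euclidean distance between modulated codewords is 2*sqrt(d_min),
    so a candidate within distance sqrt(d_min) of the channel output is the
    unique closest codeword and the search can be halted."""
    d = math.sqrt(sum((yi - ci) ** 2 for yi, ci in zip(y, codeword_symbols)))
    return d < math.sqrt(d_min)

# Example: a near-noiseless observation of the all-(+1) codeword, d_min = 3
stop = can_stop([1.1, 0.9, 1.0, 1.0], [1, 1, 1, 1], d_min=3)  # -> True
```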
3.2 Tail-biting Convolutional Codes
Short codes based on (compact) TB trellises have been the subject of thorough studies from both a theoretical and a practical viewpoint (72; 73; 74; 75; 76). In particular, in (74; 75) TB CCs with excellent distance properties were proposed for various encoder memories, code rates, and blocklengths. The TB structure of the trellis hinders the adoption of the standard Viterbi decoder. In fact, ML decoding of a TB CC may be naively achieved by starting $N_s$ Viterbi decoders in parallel, where $N_s$ is the number of states in a trellis section. Each Viterbi decoder will have a different assumption on the starting state (which shall coincide with the final state due to the TB constraint). The paths selected by the Viterbi decoders can then be used to form a list, within which lies the codeword that maximizes the likelihood $p(\boldsymbol{y} \mid \boldsymbol{c})$. This solution may become expensive from a computational viewpoint already for moderate-size encoder memories. A simple alternative to this approach is given by the wrap-around Viterbi algorithm (WAVA) (77). The WAVA is based on the recursive application of the Viterbi algorithm. In particular, one round of the Viterbi algorithm is applied to the TB trellis at each iteration, using the final state probabilities computed in the previous iteration as initial state probabilities, with the first round assuming the initial states to be equally likely. At the end of each iteration, the decoder checks if the selected path starts and ends in the same state (hence fulfilling the TB constraint). If the check is satisfied, then the decoder is stopped and the selected path is declared as the final decision. Otherwise, another iteration of the Viterbi algorithm is performed. The process can be iterated for some preset maximum number of times. It turns out that, for many TB CCs, four iterations are sufficient to attain near-ML performance.
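To make the procedure concrete, the following sketch implements the WAVA for a feed-forward rate-1/2 TB CC. The generator pair (the classical memory-2 (7, 5) code) and the LLR sign convention (positive favours bit 0) are assumptions of the sketch; each Viterbi pass carries its final state metrics into the next pass and stops once the surviving path fulfils the TB constraint:

```python
import math

def conv_step(state, bit, gens, m):
    """One step of a feed-forward rate-1/len(gens) convolutional encoder.
    `state` holds the m most recent input bits; returns (outputs, next_state)."""
    reg = (state << 1) | bit
    out = [bin(reg & g).count("1") & 1 for g in gens]
    return out, reg & ((1 << m) - 1)

def tb_encode(bits, gens, m):
    """Tail-biting encoding: the register is initialized with the last m input
    bits, so the encoder starts and ends in the same state."""
    state = 0
    for b in bits[-m:]:
        state = ((state << 1) | b) & ((1 << m) - 1)
    coded = []
    for b in bits:
        out, state = conv_step(state, b, gens, m)
        coded.extend(out)
    return coded

def wava_decode(llr, gens, m, max_iters=4):
    """Wrap-around Viterbi algorithm on the TB trellis. Each pass reuses the
    final state metrics as initial metrics; the first pass treats all initial
    states as equally likely. Returns the decoded bits, or None on failure."""
    n_out, n_states = len(gens), 1 << m
    length = len(llr) // n_out
    metric = [0.0] * n_states
    for _ in range(max_iters):
        survivors = []
        for t in range(length):
            new_metric = [-math.inf] * n_states
            surv = [(0, 0)] * n_states
            for s in range(n_states):
                for bit in (0, 1):
                    out, ns = conv_step(s, bit, gens, m)
                    # branch metric: correlate +/-1 branch symbols with the LLRs
                    bm = sum((1 - 2 * o) * llr[t * n_out + i]
                             for i, o in enumerate(out))
                    if metric[s] + bm > new_metric[ns]:
                        new_metric[ns] = metric[s] + bm
                        surv[ns] = (s, bit)
            survivors.append(surv)
            metric = new_metric
        end = max(range(n_states), key=lambda s: metric[s])
        bits, s = [], end
        for t in range(length - 1, -1, -1):
            s, b = survivors[t][s]
            bits.append(b)
        if s == end:  # tail-biting constraint fulfilled
            return bits[::-1]
    return None

# Example with the memory-2 (7, 5) code: noiseless LLRs recover the message
gens, m = (0b111, 0b101), 2
msg = [1, 0, 1, 1, 0, 1, 0, 0]
coded = tb_encode(msg, gens, m)
llr = [2.0 if c == 0 else -2.0 for c in coded]
decoded = wava_decode(llr, gens, m)
```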
Figure 5 shows the performance in terms of CER vs. $E_b/N_0$ of binary TB CCs with different memories and polynomials as specified in Table 1. For the case of , the performance is within dB from the NA at . These results show that TB CCs work very well for short blocks. Unfortunately, as we will see in Section 5, the memory must be increased as the blocklength grows in order to approach the finite-length bounds, rendering the scheme less practical. (For large memory, sequential decoding algorithms may be considered to reduce the decoding complexity. We refer the reader to (76) for a thorough presentation of sequential decoders, including the bidirectional efficient algorithm for searching trees (BEAST) of (78).)
3.3 CRC/TB CC Concatenation
An alternative is the concatenation of a CRC error-detection code with a punctured tail-biting convolutional code. Because the addition of the CRC code substantially increases the coding overhead at short blocklengths, puncturing of the TB CC can be used to bring the overhead back to the original level. Further, it is possible to jointly decode the CRC and TB CC codes so that both the error-correction and the error-detection performance of the concatenated code are superior to those of the TB CC operating on its own.
One algorithm for decoding the cyclic redundancy check (CRC)/TB CC concatenation is the list Viterbi algorithm (LVA) (79). The LVA would keep a list of the best paths to the termination node and choose the one that passes the CRC check. Of course, since the CC is a TB code, the algorithm would have to be a list version of the WAVA.
One may consider the concatenation of a CRC code with generator polynomial $g(x)$ and a CC with generator polynomials $c_i(x)$ to be a catastrophic CC (or encoder) with generator polynomials $g(x)c_i(x)$. In principle, then, because the encoding will be terminated, one may decode the CRC/TB CC combination with a WAVA. Of course, there will be a large number of states, making the decoder very complex. Even if the application allowed such a large complexity, this approach gives up the ability to reliably detect errors at the decoder output.
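The combined-generator view is just polynomial multiplication over GF(2). A minimal sketch (the CRC and CC polynomials below are hypothetical placeholders, not the ones considered later in this section):

```python
def gf2_mul(a, b):
    """Multiply two GF(2) polynomials given as integers (bit i = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# Hypothetical example: a 3-bit CRC polynomial x^3 + x + 1 combined with a
# rate-1/2 CC with generators (1 + x + x^2, 1 + x^2) yields the generators of
# the equivalent (catastrophic) convolutional encoder.
g_crc = 0b1011
cc_gens = (0b111, 0b101)
combined = [gf2_mul(g_crc, c) for c in cc_gens]
```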
An algorithm that nicely trades off the error-correction and error-detection capabilities of the CRC/TB CC will now be presented. (To our knowledge, this algorithm has not been previously presented in the literature.) To simplify the presentation, we start with the assumption that there is a soft-input soft-output (SISO) trellis decoder for the TB CC that has full knowledge of the starting and ending state of the TB CC encoder. We provide Algorithm 1 with the necessary definitions given below:

weak position: an unreliable position, i.e., a bit position whose log-likelihood ratio (LLR) has small magnitude

current number of weak positions under test; hypothesized extrinsic information will be placed in these positions

current bit pattern being tested in the weak positions

MaxWeak: maximum value of (Typically, MaxWeak .)

int mask[MaxWeak+1]

strong 1: large positive value (e.g., 100.0) used for extrinsic information

strong 0: large negative value (e.g., -100.0) used for extrinsic information

weakposn: weakest position found after decoding with the candidate (or hypothesized) bit pattern as strong 0’s and 1’s placed in the weak positions via extrinsic information

value of bit in the binary representation of integer (least significant bit is bit )

strong: strong 1 if , strong 0 if
Note that the decoder uses no extrinsic information the first time through the outer for loop, after which the weakest LLR position is found (if the CRC fails). The second time through the outer loop, strong 0 and then strong 1 extrinsic values are tested in the weakest position. The strong 1 is attempted only if the CRC fails when strong 0 is tested. New weakest positions are found after each strong value is attempted. The third time through the loop, two-bit patterns of strong 0’s and strong 1’s are attempted, each time checking the CRC and finding the newest weakest position if the CRC fails. The algorithm continues until there is a passed CRC event or the outer loop completes.
It should be clear from the algorithm that the larger the value of MaxWeak, the better the error-correction performance and the worse the error-detection performance. Note also that, at low SNR values, there can be up to SISO decodings—quite a large number. However, most applications with short blocklengths do not require decoding at high speeds. Also, as we shall see, the error-correction performance of this algorithm can be very good, and it can easily be traded off against error-detection performance by decreasing MaxWeak.
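A minimal rendering of the search loop described above is given below. The SISO decoder and the CRC check are abstracted as caller-supplied functions (assumptions of the sketch), and the LLR sign convention follows the definitions above (positive favours bit 1, so strong 1 is a large positive extrinsic value):

```python
STRONG_1 = 100.0   # strong extrinsic value hypothesizing bit 1
STRONG_0 = -100.0  # strong extrinsic value hypothesizing bit 0

def weak_position_decode(channel_llr, siso_decode, crc_check, max_weak=4):
    """CRC-guided weak-position search. `siso_decode(channel_llr, extrinsic)`
    must return output LLRs (e.g., from a BCJR decoder); `crc_check(bits)`
    returns True on a CRC pass. Both are assumptions supplied by the caller.
    Returns the first CRC-passing hard decision, or None."""
    n = len(channel_llr)
    weak = []  # weakest positions accumulated so far
    for q in range(max_weak + 1):
        newest = None
        for pattern in range(1 << q):  # all strong-0/1 patterns on q positions
            extrinsic = [0.0] * n
            for i, pos in enumerate(weak):
                extrinsic[pos] = STRONG_1 if (pattern >> i) & 1 else STRONG_0
            out_llr = siso_decode(channel_llr, extrinsic)
            bits = [1 if l > 0 else 0 for l in out_llr]
            if crc_check(bits):
                return bits
            # remember the weakest position not yet in use for the next round
            newest = min((i for i in range(n) if i not in weak),
                         key=lambda i: abs(out_llr[i]))
        weak.append(newest)
    return None

# Toy example: a stand-in "SISO decoder" that just adds the extrinsic values,
# and a stand-in "CRC" that recognizes the intended message.
def toy_siso(chan, ext):
    return [c + e for c, e in zip(chan, ext)]

def toy_crc(bits):
    return bits == [1, 0, 1, 0]

decoded = weak_position_decode([5.0, -5.0, -2.0, -5.0], toy_siso, toy_crc, max_weak=2)
```

In the toy run, the erroneous third bit is the weakest position; hypothesizing a strong 1 there flips the hard decision and the CRC passes.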
The CRC code we consider has generator polynomial . The TB CC we consider has generator polynomials . With 64 input bits, the natural parameters for this CRC/TB CC are . Consequently, to attain a code, we puncture every fifth bit of the encoder output, starting with the third bit.
The SISO decoder employs the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm. We consider two situations: (1) the TB CC encoder’s starting and ending state is unknown to the SISO decoder, and (2) the starting and ending state is known to the SISO decoder. For the first case, we use a WAVA-like approach in the BCJR decoder. We justify the second case by arguing that, in many applications, a packet number or an identification number is expected. Such a number can be moved to the end of the TB CC encoder input word so that the encoder starting and ending state is known.
Figure 6 plots the performance of the CRC/TB CC under consideration on the biAWGN channel for the unknown-state and known-state cases. The decoder parameter MaxWeak was set to 10. As seen in the figure, the unknown-state case is superior to the turbo and LDPC codes in Figure 5. Although it is unfair to compare the known-state case to the bounds, we see that the known-state CER curve is about 0.1 dB to the right of the SPB59 curve.
As for error-detection performance, we first define the undetected-error probability to be the probability that an error at the decoder output is undetected by the decoder. For the simulation curves in the figure, for which MaxWeak = 10, we measured this probability to be just under 0.1 for both cases. With MaxWeak = 4, we measured it to be less than 0.001 for both cases. For MaxWeak = 4, the known-state CER curve moves rightward by about 0.4 dB and the unknown-state CER curve moves rightward by about 0.7 dB.
4 Modern Short Codes
In the following subsections, we briefly review some of the best constructions of modern channel codes for short blocklengths. The review includes both binary and nonbinary turbo and LDPC codes, as well as polar codes.
4.1 Binary Turbo and LDPC Codes
For short blocklengths, turbo and LDPC codes are typically outperformed by the code classes discussed in the previous section. Their performance becomes competitive in the moderate blocklength regime thanks to their linear (in blocklength) decoding complexity for a fixed number of iterations. While this holds true for both binary and nonbinary turbo/LDPC codes, binary turbo and LDPC codes retain a larger appeal from the perspective of decoder complexity.
Binary turbo codes (7) have been successfully included as a channel coding scheme in the 3G/4G cellular standards. Turbo codes are known to provide excellent coding gains in the moderate-blocklength regime and (if carefully designed) at short blocklengths as well. If low error rates are required (), a convenient design choice is to adopt state component codes, i.e., to use memory convolutional codes in the parallel concatenation, together with TB termination for the component codes (81). The small size of the information word permits an efficient interleaver optimization. Code-matched (82) and protograph-based (83) interleavers in particular turn out to be very effective in lowering error floors. The performance of two turbo codes with memory and memory component codes is provided in Figure 7. The first code is from the long term evolution (LTE) standard, whereas the second code has been designed with the interleaver construction of (37) and exploits TB component codes. The second code performs fairly close to the RCB, and nearly dB away from the NA at . The LTE turbo code loses almost dB at the same target CER. Remarkably, the simple state construction provides a performance that is among the best achievable by binary iteratively-decodable codes, at least down to moderate error rates.
LDPC codes (4) are particularly attractive thanks to their excellent performance and to the possibility of developing high-throughput iterative decoders based on the codes’ Tanner graph (84) with a large degree of parallelism. LDPC codes can be subdivided into two broad categories: unstructured and structured LDPC codes (27; 58). For an unstructured LDPC code, the code parity-check matrix is designed according to a computer-based pseudorandom algorithm that places the nonzero entries (aiming, for instance, at maximizing the girth of the corresponding Tanner graph (85)). Unstructured LDPC codes are rarely implemented in practice (86). Among structured LDPC codes, protograph-based codes (86; 87) are particularly interesting from a decoder implementation viewpoint. A protograph is a relatively small graph from which a larger Tanner graph can be obtained by a copy-and-permute procedure: the protograph is copied times, and then the edges of the individual replicas are permuted among the replicas (under some restrictions described in (86)) to obtain a single, large graph. The parameter is often referred to as the lifting factor.
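The copy-and-permute operation is easy to express in code for the common special case in which each protograph edge is expanded into a cyclically shifted identity block. The base matrix below is a hypothetical toy example, not a standardized design:

```python
def lift_protograph(base, Z):
    """Expand a protograph base matrix into the parity-check matrix of a
    quasi-cyclic LDPC code: every entry s becomes a Z x Z identity matrix
    cyclically shifted by s columns, and None becomes a Z x Z all-zero block."""
    m, n = len(base), len(base[0])
    H = [[0] * (n * Z) for _ in range(m * Z)]
    for i in range(m):
        for j in range(n):
            s = base[i][j]
            if s is None:
                continue
            for r in range(Z):
                H[i * Z + r][j * Z + (r + s) % Z] = 1
    return H

# Hypothetical 2 x 3 base matrix (entries are circulant shifts) lifted by Z = 4
base = [[0, 1, None],
        [2, None, 3]]
H = lift_protograph(base, 4)  # 8 x 12 binary parity-check matrix
```

Protographs with parallel edges or punctured variable nodes need a slightly richer base-matrix description than this sketch supports.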
When cyclic edge permutations are used, the code associated with the Tanner graph is quasi-cyclic, facilitating the implementation of efficient encoders and decoders (58; 88). Powerful protograph LDPC codes have been designed during the past decade (89). A class of protograph LDPC codes that performs remarkably well down to short blocklengths is that of the accumulate-repeat-accumulate (ARA) codes (90). The performance of an ARA code is provided in Figure 7. The code performs close to the LTE turbo code. An error floor appears at a CER below . The performance of an LDPC code based on a slightly modified protograph, dubbed accumulate-repeat-jagged-accumulate (ARJA) (89), is provided too. The ARJA code trades a negligible loss in the waterfall region for a superior performance at large SNRs, i.e., it has a lower error floor.
Another class of protograph LDPC codes with excellent performance is the one proposed in (91), which relies on the concatenation of an outer high-rate LDPC code with an inner LDPC code. The inner LDPC code construction resembles an LT code (92), resulting in an overall LDPC code structure that closely mimics that of a Raptor code (93) (the main difference is that here the bits at the input of the inner LT encoder are, with the exception of the punctured ones, sent over the channel). This design paradigm has been adopted in the 5G standard (94).
In particular, the upcoming 5G New Radio (NR) standard foresees the use of two protograph-based codes for its enhanced Mobile Broadband (eMBB) use case. Their design reflects the requirements for 5G NR, which include the support of a wide range of blocklengths and code rates and a native integration of hybrid automatic repeat request (HARQ). Additionally, the nested structure of the codes and the quasi-cyclic lifting allow a hardware-friendly implementation with minimal description complexity as well as various possibilities for parallelization. Base graph 1 (BG 1) (in the 5G NR jargon, base graph is synonymous with protograph) targets larger blocklengths and higher rates (, ), whereas base graph 2 (BG 2) is optimized for smaller blocklengths and lower rates (, ). Both base graphs make use of punctured variable nodes. This construction is known to significantly improve the decoding threshold (89). We observe in Figure 7 that the 5G NR LDPC code based on BG 2 even slightly outperforms the ARA code with the same code parameters. In contrast, the performance of an LDPC code constructed from BG 1 (which is optimized for larger blocklengths and higher code rates) is severely degraded due to its poor minimum distance.
As described in Section 3.1, a conceptually simple improvement to the BP decoder performance can be obtained by applying OSD whenever BP decoding fails to converge to a valid codeword. In Figure 7, we provide the performance of a regular LDPC code under BP decoding, followed by an additional OSD step (with order ) applied whenever the BP decoder fails after a maximum number of iterations. The performance gain over iterative decoding is around dB at . However, the gap reduces to dB at . Besides OSD, another list decoding algorithm that can remarkably improve the performance of short (binary) LDPC codes is the bit flipping (BF) algorithm proposed in (95, Algorithm 8.6). In Figure 7, we depict the performance of this algorithm when applied to the regular LDPC code, for the case of a maximum number of bit flips. The gain over iterative decoding exceeds dB at .
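To make the OSD fallback step concrete, the following is a minimal order-1 OSD sketch over GF(2). The (7,4) Hamming generator matrix and the LLR values in the test are toy examples; a production implementation would also handle linearly dependent columns after the reliability sort (here only noted in a comment).

```python
# Illustrative order-1 OSD, of the kind that could run after BP fails.
from itertools import combinations

# Toy (7,4) Hamming code generator matrix (systematic form).
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def osd(llr, G, order=1):
    n, k = len(llr), len(G)
    # 1) Sort positions by reliability |LLR|, most reliable first.
    perm = sorted(range(n), key=lambda i: -abs(llr[i]))
    Gp = [[row[i] for i in perm] for row in G]
    # 2) Gaussian elimination to make the k most reliable positions
    #    systematic (a full implementation would swap in the next
    #    independent column if a pivot is missing).
    for c in range(k):
        piv = next(r for r in range(c, k) if Gp[r][c])
        Gp[c], Gp[piv] = Gp[piv], Gp[c]
        for r in range(k):
            if r != c and Gp[r][c]:
                Gp[r] = [a ^ b for a, b in zip(Gp[r], Gp[c])]
    # 3) Hard decisions on the k most reliable positions (LLR<0 -> 1).
    hard = [1 if llr[perm[i]] < 0 else 0 for i in range(k)]
    best, best_metric = None, float('inf')
    # 4) Re-encode the hard decision and all order-1 flip patterns.
    for flips in [()] + list(combinations(range(k), order)):
        msg = list(hard)
        for f in flips:
            msg[f] ^= 1
        cw = [0] * n
        for i, bit in enumerate(msg):
            if bit:
                cw = [a ^ b for a, b in zip(cw, Gp[i])]
        # Correlation discrepancy: sum |LLR| over mismatched positions.
        metric = sum(abs(llr[perm[j]]) for j in range(n)
                     if (llr[perm[j]] < 0) != bool(cw[j]))
        if metric < best_metric:
            best, best_metric = cw, metric
    # 5) Undo the permutation.
    out = [0] * n
    for j, p in enumerate(perm):
        out[p] = best[j]
    return out
```

The same routine can serve as a post-processing step for any binary code given its generator matrix, which is why it combines naturally with BP decoding.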
4.2 Non-Binary Turbo and LDPC Codes
Turbo codes constructed over non-binary finite fields were originally investigated in (39). In (48), a design based on memory (in terms of field elements) time-varying recursive TB encoders was proposed, which yields among the best known performance for iteratively-decodable short codes down to very low error rates. The construction is particularly effective for relatively large finite fields (e.g., and ). The component-code BCJR decoder can be efficiently implemented by means of the fast Fourier transform (FFT) (40; 48), yielding remarkable savings in complexity (although the decoding complexity remains considerably larger than that of a binary turbo code). Further efficient decoder implementations have recently been investigated in (96), showing how most of the coding gains can be preserved even when dramatically reducing the decoding complexity.
Non-binary LDPC codes (38) based on ultra-sparse parity-check matrices (41) tightly match the performance of non-binary turbo codes, down to very low error rates, when constructed on finite fields of order larger than or equal to (97; 47). In fact, it was shown in (48) that non-binary turbo codes based on memory time-varying recursive TB encoders admit a simple protograph LDPC representation, and correspond to a special class of non-binary ultra-sparse LDPC codes. While the decoding of non-binary LDPC codes can be largely simplified by employing the FFT at the check nodes (with probability-domain decoding), efficient implementations in the log-domain are still an area of active research (98).
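The FFT-based check-node update mentioned above reduces, over GF(2^m), to a Walsh-Hadamard transform, since the additive group of the field is (Z_2)^m: the convolution of incoming probability vectors becomes a pointwise product in the transform domain. A minimal sketch for GF(4) follows; the message values in the test are made up for illustration, and field elements are indexed by their 2-bit additive labels (so GF(4) addition is XOR of indices).

```python
# Transform-domain check-node update for a non-binary code over GF(4).

def wht4(p):
    """Length-4 Walsh-Hadamard transform (additive group of GF(4))."""
    a = [p[0] + p[1], p[0] - p[1], p[2] + p[3], p[2] - p[3]]
    return [a[0] + a[2], a[1] + a[3], a[0] - a[2], a[1] - a[3]]

def check_node(messages):
    """Probability vector of the GF(4)-sum of independent inputs:
    convolve all messages by multiplying their transforms pointwise."""
    acc = [1.0, 0.0, 0.0, 0.0]  # point mass at 0, identity for the sum
    for m in messages:
        A, B = wht4(acc), wht4(m)
        prod = [x * y for x, y in zip(A, B)]
        # The WHT is its own inverse up to a factor 1/4.
        acc = [x / 4 for x in wht4(prod)]
    return acc
```

For a check node of degree d this costs O(d q log q) instead of the O(d q^2) of direct convolution, which is the source of the complexity savings cited in the text.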
Figure 8 shows the performance of non-binary turbo/LDPC codes with blocklength and dimension on the bi-AWGN channel. Both codes are constructed on . The LDPC code has been considered for standardization within the Consultative Committee for Space Data Systems (CCSDS) (as error-correcting code for satellite telecommand) (97; 99; 100) and it has been designed according to the method proposed in (41). The turbo code has been designed according to the method proposed in (48). Both codes perform almost identically down to very low error rates, almost matching the RCB.
Also for the non-binary case, a further decoding step based on OSD can be applied to any iteratively-decodable code whenever the BP decoder fails. As an example, Figure 8 reports the performance of a non-binary LDPC code constructed on on the bi-AWGN channel. After iterative decoding, when the decoder output does not fulfill the code parity-check equations, an additional OSD step is applied with set to . Specifically, OSD is applied to the binary image of the non-binary LDPC code. The performance is very close to the one attained by the extended BCH code, first presented in Figure 4, gaining dB over the code performance under BP decoding. Considering as reference an SNR of dB, the BP decoder for the non-binary LDPC code fails with a probability close to . Hence, OSD is effectively activated only for a very small fraction of the transmissions.
4.3 Polar Codes
Polar codes (101; 102) are the first class of provably capacity-achieving codes with low encoding/decoding complexity over any symmetric binary-input memoryless channel (BMC) under SC decoding (102). The underlying idea behind polar codes, called channel polarization, is to take independent copies of a symmetric BMC and convert them into noiseless and useless synthetic channels by applying a transform to the input bits and by imposing a decoding order, so that coding becomes trivial: transmit information bits over the noiseless synthetic channels, while the inputs to the useless ones are set (frozen) to a predetermined value known to the decoder before transmission. These input bits are called frozen bits. As the number of polarization steps grows, the fraction of noiseless synthetic channels tends to the channel capacity, while the fraction of useless channels tends to its complement to one.
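The transform just described can be sketched in a few lines: it is the n-fold Kronecker power of the 2 x 2 polarization kernel, applied here recursively and without the bit-reversal permutation. The frozen set in the usage example is illustrative, not a designed one.

```python
# Minimal polar transform and encoder sketch (GF(2) arithmetic).

def polar_transform(u):
    """Multiply u by the Kronecker power of [[1,0],[1,1]];
    len(u) must be a power of 2."""
    n = len(u)
    if n == 1:
        return u[:]
    # First half carries the XOR (minus) branch, second half passes through.
    top = polar_transform([a ^ b for a, b in zip(u[:n // 2], u[n // 2:])])
    bot = polar_transform(u[n // 2:])
    return top + bot

def polar_encode(info_bits, frozen, n):
    """Frozen positions are set to 0; info bits fill the remaining ones."""
    u, it = [0] * n, iter(info_bits)
    for i in range(n):
        if i not in frozen:
            u[i] = next(it)
    return polar_transform(u)

# Toy usage: n = 4, positions {0, 1, 2} frozen, one information bit.
x = polar_encode([1], {0, 1, 2}, 4)
```

The recursion has complexity O(n log n), which is the encoding-complexity claim behind the low-complexity statement above.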
For a given SNR, constructing a polar code requires one to find the least reliable synthetic channels or, equivalently, bit positions. The design is not universal, i.e., the polar code design differs depending on the channel quality. Monte Carlo-based designs were proposed in (102; 101), while a density evolution (DE)-based construction was introduced in (103). An efficient implementation of DE is provided in (104), together with an analysis providing lower and upper bounds on the reliabilities of the bit positions. A simple approach to implement DE using the Gaussian approximation (GA) was proposed in (105). Other methods based on a partial order among the positions were proposed in (106; 107). These methods allow one to design frozen bit sequences that behave well over a wide range of channel parameters and rates. This has been of particular importance during 5G standardization (108), with its strong emphasis on lowering the description complexity.
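As a low-complexity illustration of such reliability-based designs, the sketch below uses the Bhattacharyya-parameter recursion, which is exact for the binary erasure channel and is a common proxy for other channels (the DE/GA constructions cited above are the more accurate choice for the AWGN channel). The initial parameter z0 is an assumed channel quality.

```python
# Polar design sketch via the Bhattacharyya recursion (exact for the BEC).

def bhattacharyya(n, z0):
    """Bhattacharyya parameters of the n synthetic channels, n = 2^m.
    Each channel with parameter z splits into a worse one (2z - z^2)
    and a better one (z^2)."""
    z = [z0]
    while len(z) < n:
        nxt = []
        for a in z:
            nxt += [2 * a - a * a, a * a]
        z = nxt
    return z

def design(n, k, z0):
    """Indices of the k most reliable (smallest-z) synthetic channels;
    the remaining n - k positions would be frozen."""
    z = bhattacharyya(n, z0)
    return sorted(sorted(range(n), key=lambda i: z[i])[:k])
```

For example, with n = 8, k = 4 and an erasure probability of 0.5, the information set comes out as {3, 5, 6, 7}, matching the textbook BEC example; note also that the recursion conserves the sum of the parameters, a handy sanity check.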
Although capacity-achieving under SC decoding, the effectiveness of polar codes at short blocklengths emerges only after modifying both the decoder and the code, i.e., by employing the successive cancellation list (SCL) decoder of (57), aided by the addition of an outer high-rate code (typically, a CRC code). In fact, in the short and moderate-blocklength regimes, the performance of polar codes under SC decoding falls short of their performance under ML decoding. In the SCL decoding algorithm, a set of SC decoders works in parallel, producing a list of different codeword candidates for the same observed channel output. The complexity of the algorithm is linear in the list size. The outer high-rate code is used to improve the distance properties of the resulting concatenated code. Specifically, the outer code (e.g., a CRC code) is used to test the list of codewords produced by the SCL decoder. Among the survivors, the one with the largest likelihood is picked as the decoder output.
The performance of a polar code designed by using the GA of DE for the bi-AWGN channel with dB under SC and SCL decoding is shown in Figure 9. By increasing the list size, close-to-ML performance is achieved. In fact, a lower bound on the ML error probability can be obtained by artificially introducing the correct codeword in the final list, prior to the final selection. One can see from the figure that the lower bound on the ML error probability is approached quickly as the list size grows. Already for , the gap to the ML lower bound is nearly invisible for the setup considered in the figure. The performance of the concatenation of a polar code with an outer CRC code is shown as well. The inner polar code was designed for dB. The outer CRC code has generator polynomial , leading to a code with dimension . A list size of has been used in the simulation. The code performs remarkably close to the RCU bound down to low error rates.
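The CRC-aided selection step at the end of SCL decoding can be sketched as follows. The 4-bit polynomial x^4 + x + 1 and the candidate list in the test are toy values, not the generator polynomial used in the figure.

```python
# CRC-aided list selection: discard CRC-invalid candidates, keep the
# most likely survivor (metrics and polynomial are toy values).

def gf2_rem(bits, poly):
    """Remainder of the GF(2) polynomial division bits / poly."""
    reg = list(bits)
    for i in range(len(reg) - len(poly) + 1):
        if reg[i]:
            for j, p in enumerate(poly):
                reg[i + j] ^= p
    return reg[len(reg) - len(poly) + 1:]

CRC_POLY = [1, 0, 0, 1, 1]  # x^4 + x + 1, MSB first (toy example)

def crc_encode(msg, poly=CRC_POLY):
    """Append the CRC so that the whole word is divisible by poly."""
    return list(msg) + gf2_rem(list(msg) + [0] * (len(poly) - 1), poly)

def select(candidates, poly=CRC_POLY):
    """candidates: (metric, bits) pairs from an SCL decoder, where a
    larger metric means more likely. Return the best CRC-valid entry,
    or None when the whole list fails the check."""
    valid = [c for c in candidates if not any(gf2_rem(c[1], poly))]
    return max(valid, key=lambda c: c[0])[1] if valid else None
```

Returning None when no candidate passes the check corresponds to a detected decoding failure, which is how the outer code improves the distance properties of the concatenation without changing the inner decoder.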
5 Code Comparison: Examples
5.1 Very Short Codes
In this section, we summarize the results reported in the previous sections about the performance of very short codes over the bi-AWGN channel. We focus on codes with blocklength and code dimension bits. The performance of the codes is compared in Figure 10. As a reference, the performance of the binary protograph-based (33) LDPC code from the CCSDS telecommand standard (99) is provided too. The CCSDS LDPC code performs somewhat poorly in terms of coding gain and is outperformed by the ARA LDPC code. (All LDPC codes considered in this section have been designed by means of a girth optimization based on the progressive edge growth (PEG) algorithm (85). A maximum of belief propagation iterations has been used in the simulations, although the average iteration count is much lower, especially at high SNRs, thanks to early stopping rules.) At low error rates, the CCSDS LDPC code is likely to attain lower error rates than the ARA code thanks to its remarkable distance properties (33). Among the LDPC codes adopted for the 5G NR standard, the codes based on BG 2 are seen to be competitive, outperforming the ARA code.
The performance of a turbo code introduced in (109), based on state component recursive convolutional codes, is also provided. The turbo code shows superior performance with respect to binary LDPC codes, down to low error rates. The code attains a at almost dB from the RCB. The code performance diverges remarkably from the RCB at lower error rates, due to the relatively low code minimum distance. Results for a non-binary LDPC code are included in Figure 10. The code has been constructed over , and it attains visible gains with respect to its binary counterparts, performing on top of the RCB (and dB away from the NA) down to low error rates (no floors down to were observed in (97)). The error probability of the polar-code concatenation using a CRC as an outer code is shown. The polar code has parameters . A list size of has been used in the simulation. The code outperforms all the competitors that rely on iterative decoding algorithms. Finally, the CER of three TB CCs is included (110; 111). The three codes have memory , and , respectively. Their generators (in octal notation) and their distance properties are summarized in Table 1. The WAVA algorithm has been used for decoding (77). The memory convolutional code reaches the performance of the BCH and LDPC codes under OSD. The memory code loses dB at , but still outperforms binary LDPC and turbo codes over the whole simulation range. The third code (with the largest memory) outperforms all other codes in Figure 10 (at the expense of a high decoding complexity due to the large number of states in the code trellis).
5.2 Moderate-Length Codes
In this section, we address a second case study, where an intermediate blocklength of bits is considered. The code dimension is fixed to bits, yielding a rate . The performance of the codes is compared in Figure 11 for transmission over the bi-AWGN channel. Also here, the performance of the binary protograph-based (33) LDPC code from the CCSDS telecommand standard (99) is provided as a reference. Most of the considerations that hold in the very short blocklength regime remain valid here, with a few notable exceptions. First, we observe that the performance of the polar code (concatenated with an outer CRC code of bits) is still competitive, but it performs only marginally better than binary LDPC and turbo codes when the list size is limited to . To close the gap to the finite-length bounds, a larger list size (e.g., ) has to be used. A second major discrepancy with respect to the very short block regime concerns the performance of TB CCs. For the code parameters considered in Figure 11, TB CCs are far from the finite-length bounds even for the memory case. This is an instance of a well-known limitation of (TB) CCs, i.e., the saturation, for large enough , of the TB CC minimum distance at the free distance of the underlying (unterminated) convolutional code (in addition, the minimum-weight multiplicity grows with ). This phenomenon is illustrated in Figure 12, where the SNR required to achieve a target is provided as a function of the code dimension , for various code families.
5.3 Short Codes in Coded Modulation Schemes
Higher-order modulation increases the spectral efficiency (SE) of a communication system by using constellations with more than two signal points (e.g., amplitude shift keying (ASK) or quadrature amplitude modulation (QAM)) and transmitting more than one bit per channel use (112). As this requires an interplay of modulation and coding techniques, the term “coded modulation” (CM) has been established. The most straightforward CM approach combines an -ary constellation with a non-binary channel code over a field of order (it is also possible to map sequences of constellation points to one Galois field symbol of appropriate field order). In this case, symbol-metric decoding (SMD) can be employed at the receiver. This is the common approach for non-binary LDPC and turbo codes.
Practical receivers resort to “pragmatic” CM schemes with binary channel codes. In such pragmatic schemes, an -bit binary labeling is assigned to each of the constellation points (e.g., a binary reflected Gray code (BRGC) (113)) and a bit-metric decoding (BMD) rule is used at the decoder. The BMD metric is obtained by marginalizing over all bit levels except the one of interest, which causes a performance loss compared to SMD. This loss is particularly pronounced at low code rates. The best-known example of a pragmatic CM scheme is bit-interleaved coded modulation (BICM) (114; 115). Binary LDPC and turbo codes are commonly combined with higher-order modulations using BICM. Polar codes achieve superior performance with multilevel coding and multistage decoding (116; 117; 118) due to the improved polarization process.
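The marginalization behind the BMD metric can be made concrete with a small sketch: for each bit level, the symbol likelihoods are summed over the symbols whose label carries that bit value, and the two sums form the bit LLR. The 4-ASK constellation, Gray labeling, and noise variance below are illustrative choices.

```python
# Bitwise LLR computation (BMD marginalization) for 4-ASK over AWGN.
import math

POINTS = [-3.0, -1.0, 1.0, 3.0]
LABELS = [(0, 0), (0, 1), (1, 1), (1, 0)]  # binary reflected Gray labeling

def bit_llrs(y, sigma2):
    """LLRs of the two bit levels for channel output y."""
    lik = [math.exp(-(y - x) ** 2 / (2 * sigma2)) for x in POINTS]
    llrs = []
    for level in range(2):
        p0 = sum(l for l, lab in zip(lik, LABELS) if lab[level] == 0)
        p1 = sum(l for l, lab in zip(lik, LABELS) if lab[level] == 1)
        llrs.append(math.log(p0 / p1))
    return llrs
```

Treating the resulting bit levels as independent is exactly the simplification that makes BMD lossy compared to SMD, since the marginalization discards the dependence between bit levels of the same symbol.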
The use of ASK/QAM constellations with uniformly distributed constellation points incurs a performance degradation on the AWGN channel, which is known as shaping loss. Recently, many research efforts have focused on geometric and probabilistic shaping (GS/PS) approaches to overcome this deficit (119; 120) and close the gap to the Shannon limit. Simulation results (121) show that PS signaling generally performs better than GS for the same constellation size. Additionally, PS allows a fine granularity in SE, as the SE can be tuned by means of a distribution matcher (DM) (122) and does not require different modulation and code rate combinations. To implement PS with coding, probabilistic amplitude shaping (PAS) was proposed (120), which circumvents the drawbacks of previous approaches (e.g., error propagation and the need for iterative demapping as a result of a one-to-many mapping). PAS uses a shaping encoder before the channel encoder (reverse concatenation) and a systematic generator matrix for encoding to maintain the desired distribution. Furthermore, it exploits the symmetry of the capacity-achieving distribution of the Gaussian channel.
All coding schemes discussed in the previous sections can be used in a CM scenario with higher-order modulation formats. In Figs. 13 and 14, we compare CM approaches for a target SE of bits per channel use for the case of QAM and a blocklength of , i.e., = 32 channel uses. In Fig. 13, we illustrate a performance comparison for the case of uniform signaling.

- The NB-LDPC code is an ultra-sparse code of rate and it is constructed over . It exhibits a gap of about to the RCB at .

- The polar code was designed according to (118) for . The list size is and an 8-bit CRC is used in the concatenation.

- OSD uses a BCH code that is punctured in 60 parity positions and shortened in information bit positions to obtain a code. The OSD parameter is .
In Fig. 14, we use PAS to reduce the shaping loss incurred by uniform signaling and improve the power efficiency. We target an SE of bits per channel use, which is achieved by adjusting the DM rate. We show results for two DM approaches, namely constant composition distribution matching (CCDM) and shell mapping distribution matching (SMDM). CCDM was proposed first in (122) and shown to be the optimal fixed-to-fixed blocklength DM for the normalized informational divergence metric and long output blocklengths. Instead, SMDM has favorable performance at short blocklengths and is the informational-divergence-optimal DM for finite blocklengths. It uses the shell mapping algorithm (123) internally to perform the mapping to power-efficient channel input sequences.
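The fixed-to-fixed mapping idea behind constant-composition matching can be illustrated by ranking/unranking: data bits index one sequence out of all sequences with a prescribed symbol composition. The binary toy composition below is for illustration only; practical CCDM realizes this mapping with arithmetic coding rather than explicit enumeration.

```python
# Lexicographic ranking/unranking of constant-composition binary
# sequences, the combinatorial core of a fixed-to-fixed matcher.
from math import comb

def unrank(index, n, ones):
    """index-th (lexicographic) length-n binary sequence with the
    given number of ones."""
    seq = []
    for pos in range(n):
        # Number of completions that place a 0 at this position.
        zeros_first = comb(n - pos - 1, ones)
        if index < zeros_first:
            seq.append(0)
        else:
            index -= zeros_first
            seq.append(1)
            ones -= 1
    return seq

def rank(seq):
    """Inverse mapping: sequence back to its index."""
    n, ones, index = len(seq), sum(seq), 0
    for pos, bit in enumerate(seq):
        if bit:
            index += comb(n - pos - 1, ones)
            ones -= 1
    return index
```

Since every output sequence has the same composition, the induced symbol distribution is exactly the prescribed one, which is the property the shaping gain relies on.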

- The binary LDPC code is from the 5G standard (94); it has rate 3/4 and is derived from BG 1 (cf. Sec. 4.1). We use a random bit-mapper for the amplitude bit levels, while the uniform sign bits are assigned to the last variable nodes in the graph. At a CER of , we see that SMDM is more power efficient than CCDM.
6 Conclusions
In this paper, we reviewed several code constructions tailored to the transmission of short information blocks. The performance of the codes has been compared with tight information-theoretic bounds on the error probability achievable by the best codes. Our review illustrates that there is a wide spectrum of solutions for efficient transmission in the short-blocklength regime. To conclude, we provide a brief list of interesting open directions, which were not addressed in this manuscript:

- Some of the decoding algorithms described in the previous sections are complete, i.e., the output of the decoder is always a codeword. Incomplete algorithms, such as belief propagation for LDPC codes, may output an erasure, i.e., the iterative decoder may converge to a decision that is not a (valid) codeword but the error is detected. Hence, while for complete decoders all error events are undetected, incomplete decoders provide the additional capability of notifying the receiver when decoding does not succeed. In some applications, it is of paramount importance to deliver very low undetected error rates. This is the case, for instance, for telecommand systems, where wrong command sequences may be harmful. The CCSDS LDPC code of Figure 10 has been designed with this objective in mind, trading part of the coding gain for a strong error detection capability (126). Complete decoders, such as those based on OSD and Viterbi decoding, may be used in such critical applications by adding an error detection mechanism. One possibility is to include an outer error detection code. In the short blocklength regime, however, the overhead incurred by such a solution may be unacceptable. In this context, a more appealing solution is provided by a post-decoding threshold test, as proposed in (127). Examples of the application of this approach are given in, e.g., (128; 129; 130).

- The development of codes and decoding algorithms for channels with unknown state, such as fading channels with no a priori channel-state information available at the encoder and decoder (see, e.g., (67)), is still an open problem. Here, the decoding task is complicated by the need to account for the uncertainty about the channel coefficients. A naive approach is to introduce sufficiently large pilot fields to allow for an accurate channel estimation step. However, when short blocks are transmitted, the use of large pilot fields leads to considerable overheads, i.e., rate losses. This suggests that, in this setting, channel decoding and channel estimation should be performed jointly (see, e.g., (131)).

- Throughout the paper, we focused exclusively on the analysis of fixed-length coding schemes. In some applications where communication is bidirectional and a feedback link is hence present, it is more natural to consider variable-length coding schemes with decision (ACK/NACK) feedback. Finite-blocklength bounds for this scenario are available (132; 133), but they are not as tight as the corresponding bounds for the fixed-blocklength case. Also, a more accurate modeling of the ACK/NACK message compared to what is available in the literature may unveil interesting trade-offs between coding rate and reliability of the feedback information. Indeed, for a fixed frame size, the more channel uses are devoted to the ACK/NACK message, the fewer channel uses are available for the coded bits. Code designs for this setup have recently been proposed (134; 135; 136). However, the overall code design space is largely unexplored.
References
 (1) C. Shannon, A mathematical theory of communication, Bell System Tech. J. 27 (1948) 379–423, 623–656.
 (2) E. R. Berlekamp, Key papers in the development of coding theory, IEEE Press, 1974.
 (3) P. Elias, Error-free coding, IRE Trans. Inform. Theory 4 (4) (1954) 29–37.
 (4) R. G. Gallager, Low-density parity-check codes, M.I.T. Press, Cambridge, MA, USA, 1963.
 (5) G. D. Forney, Jr., Concatenated codes, M.I.T. Press, Cambridge, MA, USA, 1966.
 (6) D. J. Costello, Jr., G. D. Forney, Jr., Channel coding: The road to channel capacity, Proc. IEEE 95 (6) (2007) 1150–1177.
 (7) C. Berrou, A. Glavieux, P. Thitimajshima, Near Shannon limit error-correcting coding and decoding: Turbo-codes, in: Proc. IEEE Int. Conf. Commun. (ICC), Geneva, Switzerland, 1993, pp. 1064–1070.
 (8) D. J. C. MacKay, Good error-correcting codes based on very sparse matrices, IEEE Trans. Inf. Theory 45 (2) (1999) 399–431.
 (9) T. Richardson, M. Shokrollahi, R. Urbanke, Design of capacity-approaching irregular low-density parity-check codes, IEEE Trans. Inf. Theory 47 (2) (2001) 619–637.
 (10) T. Richardson, R. Urbanke, The capacity of low-density parity-check codes under message-passing decoding, IEEE Trans. Inf. Theory 47 (2) (2001) 599–618.
 (11) M. Luby, M. Mitzenmacher, M. A. Shokrollahi, D. A. Spielman, Improved low-density parity-check codes using irregular graphs, IEEE Trans. Inf. Theory 47 (2) (2001) 585–598.
 (12) H. D. Pfister, I. Sason, R. Urbanke, Capacity-achieving ensembles for the binary erasure channel with bounded complexity, IEEE Trans. Inf. Theory 51 (7) (2005) 2352–2379.
 (13) H. D. Pfister, I. Sason, Accumulate-repeat-accumulate codes: Capacity-achieving ensembles of systematic codes for the erasure channel with bounded complexity, IEEE Trans. Inf. Theory 53 (6) (2007) 2088–2115.
 (14) E. Arikan, Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels, IEEE Trans. Inf. Theory 55 (7) (2009) 3051–3073.
 (15) M. Lentmaier, A. Sridharan, D. Costello, Jr., K. Zigangirov, Iterative decoding threshold analysis for LDPC convolutional codes, IEEE Trans. Inf. Theory 56 (10) (2010) 5274–5289.
 (16) S. Kudekar, T. Richardson, R. Urbanke, Threshold saturation via spatial coupling: Why convolutional LDPC ensembles perform so well over the BEC, IEEE Trans. Inf. Theory 57 (2) (2011) 803–834.
 (17) T. de Cola, E. Paolini, G. Liva, G. Calzolari, Reliability options for data communications in the future deep-space missions, Proc. IEEE 99 (11) (2011) 2056–2074.
 (18) F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, P. Popovski, Five disruptive technology directions for 5G, IEEE Commun. Mag. 52 (2) (2014) 74–80.
 (19) E. Paolini, C. Stefanovic, G. Liva, P. Popovski, Coded random access: applying codes on graphs to design random access protocols, IEEE Commun. Mag. 53 (6) (2015) 144–150.
 (20) G. Durisi, T. Koch, P. Popovski, Toward massive, ultra-reliable, and low-latency wireless communication with short packets, Proc. IEEE 104 (9) (2016) 1711–1726.
 (21) S. Dolinar, D. Divsalar, F. Pollara, Code performance as a function of block size, TMO progress report 42-133, JPL, Pasadena, CA, USA (May 1998).
 (22) T. M. Duman, M. Salehi, New performance bounds for turbo codes, IEEE Trans. Commun. 46 (6) (1998) 717–723.
 (23) I. Sason, S. Shamai, Performance analysis of linear codes under maximumlikelihood decoding: A tutorial, Found. and Trends in Commun. and Inf. Theory 3 (1–2) (2006) 1–222.
 (24) Y. Polyanskiy, H. V. Poor, S. Verdú, Channel coding rate in the finite blocklength regime, IEEE Trans. Inf. Theory 56 (5) (2010) 2307–2359.
 (25) W. Yang, G. Durisi, T. Koch, Y. Polyanskiy, Quasi-static multiple-antenna fading channels at finite blocklength, IEEE Trans. Inf. Theory 60 (7) (2014) 4232–4265.
 (26) G. Durisi, T. Koch, J. Östman, Y. Polyanskiy, W. Yang, Short-packet communications over multiple-antenna Rayleigh-fading channels, IEEE Trans. Commun. 64 (2) (2016) 618–629.
 (27) T. Richardson, R. Urbanke, Modern coding theory, Cambridge University Press, 2008.
 (28) S. Ten Brink, Convergence behavior of iteratively decoded parallel concatenated codes, IEEE Trans. Commun. 49 (10) (2001) 1727–1737.
 (29) H. R. Sadjadpour, M. Salehi, N. J. A. Sloane, G. Nebe, Interleaver design for short block length turbo codes, in: Proc. IEEE Int. Conf. on Commun. (ICC), Vol. 2, 2000, pp. 628–632.
 (30) C. Radebaugh, C. Powell, R. Koetter, Wheel codes: Turbolike codes on graphs of small order, in: Proc. IEEE Inf. Theory Workshop (ITW), Paris, France, 2003, pp. 78–81.
 (31) M. Yang, W. E. Ryan, Y. Li, Design of efficiently encodable moderate-length high-rate irregular LDPC codes, IEEE Trans. Commun. 52 (4) (2004) 564–571.
 (32) G. Liva, W. Ryan, Short low-error-floor Tanner codes with Hamming nodes, in: Proc. IEEE Milcom, Atlantic City, US, 2005, pp. 208–213.
 (33) D. Divsalar, S. Dolinar, C. Jones, Short protograph-based LDPC codes, in: Proc. IEEE Milcom, Orlando, FL, USA, 2007, pp. 1–6.
 (34) G. Liva, W. E. Ryan, M. Chiani, Quasi-cyclic generalized LDPC codes with low error floors, IEEE Trans. Commun. 56 (1) (2008) 49–57.
 (35) I. E. Bocharova, B. D. Kudryashov, R. V. Satyukov, S. Stiglmayry, Short quasi-cyclic LDPC codes from convolutional codes, in: IEEE Int. Symp. Inf. Theory (ISIT), 2009, pp. 551–555.
 (36) T.-Y. Chen, K. Vakilinia, D. Divsalar, R. D. Wesel, Protograph-based Raptor-like LDPC codes, IEEE Trans. Commun. 63 (5) (2015) 1522–1532.
 (37) T. Jerkovits, B. Matuz, Turbo code design for short blocks, in: Proc. 7th Adv. Sat. Mobile Sys. Conf. (ASMS), Majorca (Spain), 2016, pp. 1–6.
 (38) M. C. Davey, D. MacKay, Low-density parity-check codes over GF(q), IEEE Commun. Lett. 2 (6) (1998) 165–167.
 (39) J. Berkmann, Iterative Decoding of Nonbinary Codes, Ph.D. dissertation, Tech. Univ. München, Munich, Germany, 2000.
 (40) J. Berkmann, C. Weiss, On dualizing trellis-based APP decoding algorithms, IEEE Trans. Commun. 50 (11) (2002) 1743–1757.
 (41) C. Poulliat, M. Fossorier, D. Declercq, Design of regular LDPC codes over GF(q) using their binary images, IEEE Trans. Commun. 56 (10) (2008) 1626–1635.
 (42) A. Venkiah, D. Declercq, C. Poulliat, Design of cages with a randomized progressive edge-growth algorithm, IEEE Commun. Lett. 12 (4) (2008) 301–303.
 (43) W. Chen, C. Poulliat, D. Declercq, Structured high-girth non-binary cycle codes, in: Asia-Pacific Conf. Commun. (APCC), Shanghai, China, 2009, pp. 462–466.
 (44) L. Costantini, B. Matuz, G. Liva, E. Paolini, M. Chiani, On the performance of moderate-length non-binary LDPC codes for space communications, in: Proc. 5th Adv. Sat. Mobile Sys. Conf. (ASMS), Cagliari, Italy, 2010.
 (45) K. Kasai, D. Declercq, C. Poulliat, K. Sakaniwa, Multiplicatively repeated non-binary LDPC codes, IEEE Trans. Inf. Theory 57 (10) (2011) 6788–6795.
 (46) G. Liva, B. Matuz, E. Paolini, M. Chiani, Short non-binary IRA codes on large-girth Hamiltonian graphs, in: Proc. IEEE Int. Conf. on Commun. (ICC), Ottawa, Canada, 2012.
 (47) B. Y. Chang, D. Divsalar, L. Dolecek, Non-binary protograph-based LDPC codes for short blocklengths, in: Proc. IEEE Inf. Theory Workshop (ITW), Lausanne, Switzerland, 2012.
 (48) G. Liva, E. Paolini, B. Matuz, S. Scalise, M. Chiani, Short turbo codes over high order fields, IEEE Trans. Commun. 61 (6) (2013) 2201–2211.
 (49) B. Matuz, G. Liva, E. Paolini, M. Chiani, G. Bauch, Low-rate non-binary LDPC codes for coherent and blockwise noncoherent AWGN channels, IEEE Trans. Commun. 61 (10) (2013) 4096–4107.
 (50) L. Dolecek, D. Divsalar, Y. Sun, B. Amiri, Non-binary protograph-based LDPC codes: Enumerators, analysis, and designs, IEEE Trans. Inf. Theory 60 (7) (2014) 3913–3941.
 (51) M. Fossorier, S. Lin, Soft-decision decoding of linear block codes based on ordered statistics, IEEE Trans. Inf. Theory 41 (5) (1995) 1379–1396.
 (52) A. Valembois, M. Fossorier, Box and match techniques applied to soft-decision decoding, IEEE Trans. Inf. Theory 50 (5) (2004) 796–810.
 (53) T. Hehn, J. B. Huber, S. Laendner, O. Milenkovic, Multiple-bases belief-propagation for decoding of short block codes, in: Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2007, pp. 311–315.
 (54) T. Hehn, J. B. Huber, O. Milenkovic, S. Laendner, Multiple-bases belief-propagation decoding of high-density cyclic codes, IEEE Trans. Commun. 58 (1) (2010) 1–8.
 (55) Y. Wu, C. Hadjicostis, Soft-decision decoding using ordered recodings on the most reliable basis, IEEE Trans. Inf. Theory 53 (2) (2007) 829–836.
 (56) G. Liva, E. Paolini, M. Chiani, On optimum decoding of certain product codes, IEEE Commun. Lett. 18 (6) (2014) 905–908.
 (57) I. Tal, A. Vardy, List decoding of polar codes, IEEE Trans. Inf. Theory 61 (5) (2015) 2213–2226.
 (58) W. E. Ryan, S. Lin, Channel Codes – Classical and Modern, Cambridge University Press, 2009.
 (59) R. Gallager, A simple derivation of the coding theorem and some applications, IEEE Trans. Inf. Theory 11 (1) (1965) 3–18.
 (60) C. Shannon, Probability of error for optimal codes in a Gaussian channel, Bell System Tech. J. 38 (1959) 611–656.
 (61) C. E. Shannon, R. G. Gallager, E. R. Berlekamp, Lower bounds to error probability for coding on discrete memoryless channels—Part I, Inf. Contr. 10 (1967) 65–103.
 (62) A. Valembois, M. Fossorier, Sphere-packing bounds revisited for moderate block lengths, IEEE Trans. Inf. Theory 50 (12) (2004) 2998–3014.
 (63) G. Vazquez-Vilar, A. T. Campo, A. Guillén i Fàbregas, A. Martinez, Bayesian M-ary hypothesis testing: The meta-converse and Verdú-Han bounds are tight, IEEE Trans. Inf. Theory 62 (5) (2016) 2324–2333.
 (64) G. Vazquez-Vilar, A. Guillén i Fàbregas, T. Koch, A. Lancho, Saddlepoint approximation of the error probability of binary hypothesis testing, in: Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, 2018, pp. 2306–2310.
 (65) A. Martinez, A. Guillén i Fàbregas, Saddlepoint approximation of random-coding bounds, in: Proc. Inf. Theory Applicat. Workshop (ITA), San Diego, CA, USA, 2011, pp. 1–6.
 (66) J. Scarlett, A. Martinez, A. Guillén i Fàbregas, Mismatched decoding: Error exponents, second-order rates and saddlepoint approximations, IEEE Trans. Inf. Theory 60 (5) (2014) 2647–2666.
 (67) J. Östman, G. Durisi, E. Ström, M. C. Coşkun, G. Liva, Short packets over block-memoryless fading channels: Pilot-assisted or noncoherent transmission?, IEEE Trans. Commun., to appear.
 (68) J. Font-Segura, G. Vazquez-Vilar, A. Martinez, A. Guillén i Fàbregas, A. Lancho, Saddlepoint approximations of lower and upper bounds to the error probability in channel coding, in: Proc. Conf. Inf. Sci. Sys. (CISS), Princeton, NJ, 2018.
 (69) S. Lin, D. Costello, Jr., Error control coding, Prentice Hall, Englewood Cliffs, NJ, USA, 2004, second edition.
 (70) M. P. C. Fossorier, Iterative reliability-based decoding of low-density parity-check codes, IEEE J. Sel. Areas Commun. 19 (5) (2001) 908–917.
 (71) M. Baldi, F. Chiaraluce, N. Maturo, G. Liva, E. Paolini, A hybrid decoding scheme for short non-binary LDPC codes, IEEE Commun. Lett. 18 (12) (2014) 2093–2096.
 (72) S. Lin, T. Kasami, T. Fujiwara, M. Fossorier, Trellises and trellis-based decoding algorithms for linear block codes, Springer Science & Business Media, 1998.
 (73) A. R. Calderbank, G. D. Forney, A. Vardy, Minimal tailbiting trellises: The Golay code and more, IEEE Trans. Inf. Theory 45 (5) (1999) 1435–1455.
 (74) P. Stahl, J. B. Anderson, R. Johannesson, Optimal and near-optimal encoders for short and moderate-length tail-biting trellises, IEEE Trans. Inf. Theory 45 (7) (1999) 2562–2571.
 (75) I. E. Bocharova, R. Johannesson, B. D. Kudryashov, P. Stahl, Tail-biting codes: Bounds and search results, IEEE Trans. Inf. Theory 48 (1) (2002) 137–148.
 (76) R. Johannesson, K. S. Zigangirov, Fundamentals of Convolutional Coding (Second Edition), John Wiley & Sons, 2015.
 (77) R. Y. Shao, S. Lin, M. P. C. Fossorier, Two decoding algorithms for tail-biting codes, IEEE Trans. Commun. 51 (10) (2003) 1658–1665.
 (78) I. E. Bocharova, M. Handlery, R. Johannesson, B. D. Kudryashov, A BEAST for prowling in trees, IEEE Trans. Inf. Theory 50 (6) (2004) 1295–1302.
 (79) N. Seshadri, C. E. W. Sundberg, List Viterbi decoding algorithms with applications, IEEE Trans. Commun. 42 (234) (1994) 313–323.
 (80) C. Lou, B. Daneshrad, R. D. Wesel, Convolutional-code-specific CRC code design, IEEE Trans. Commun. 63 (10) (2015) 3459–3470.
 (81) S. Crozier, P. Guinand, High-performance low-memory interleaver banks for turbo-codes, in: Proc. IEEE Veh. Technol. Conf. (VTC), vol. 4, 2001, pp. 2394–2398.
 (82) W. Feng, J. Yuan, B. S. Vucetic, A code-matched interleaver design for turbo codes, IEEE Trans. Commun. 50 (6) (2002) 926–937.
 (83) R. Garzón-Bohórquez, C. A. Nour, C. Douillard, Protograph-based interleavers for punctured turbo codes, IEEE Trans. Commun. 66 (5) (2018) 1833–1844.
 (84) R. Tanner, A recursive approach to low complexity codes, IEEE Trans. Inf. Theory 27 (1981) 533–547.
 (85) X. Hu, E. Eleftheriou, D. Arnold, Regular and irregular progressive edge-growth Tanner graphs, IEEE Trans. Inf. Theory 51 (1) (2005) 386–398.
 (86) J. Thorpe, Low-density parity-check (LDPC) codes constructed from protographs, IPN Progress Report 42-154, JPL (Aug. 2003).
 (87) T. J. Richardson, R. L. Urbanke, Multi-edge type LDPC codes, Workshop Honoring Prof. Bob McEliece on His 60th Birthday (2002).
 (88) M. M. Mansour, N. R. Shanbhag, High-throughput LDPC decoders, IEEE Trans. VLSI Syst. 11 (6) (2003) 976–996.
 (89) D. Divsalar, S. Dolinar, C. R. Jones, K. Andrews, Capacity-approaching protograph codes, IEEE J. Sel. Areas Commun. 27 (6) (2009) 876–888.
 (90) A. Abbasfar, D. Divsalar, K. Yao, Accumulate-repeat-accumulate codes, IEEE Trans. Commun. 55 (4) (2007) 692–702.
 (91) T. Y. Chen, K. Vakilinia, D. Divsalar, R. D. Wesel, Protograph-based raptor-like LDPC codes, IEEE Trans. Commun. 63 (5) (2015) 1522–1532.
 (92) M. Luby, LT codes, in: Proc. of the 43rd IEEE Symp. Foundations of Computer Science, Vancouver, Canada, 2002, pp. 271–282.
 (93) M. Shokrollahi, Raptor codes, IEEE Trans. Inf. Theory 52 (6) (2006) 2551–2567.
 (94) T. Richardson, S. Kudekar, Design of lowdensity parity check codes for 5G new radio, IEEE Commun. Mag. 56 (3) (2018) 28–34.
 (95) M. Ostojic, Multitree search decoding of linear codes, Ph.D. dissertation, ETH, Zurich, Switzerland, 2010.
 (96) R. Klaimi, C. A. Nour, C. Douillard, J. Farah, Low-complexity decoders for non-binary turbo codes, in: 10th International Symposium on Turbo Codes & Iterative Information Processing (ISTC 2018), Hong Kong, 2018.
 (97) G. Liva, E. Paolini, T. De Cola, M. Chiani, Codes on high-order fields for the CCSDS next generation uplink, in: Proc. 6th Adv. Sat. Mobile Sys. Conf. (ASMS) and 12th Signal Process. for Space Commun. Workshop (SPSC), 2012, pp. 44–48.
 (98) D. Declercq, M. Fossorier, Decoding algorithms for non-binary LDPC codes over GF(q), IEEE Trans. Commun. 55 (4) (2007) 633–643.
 (99) Next Generation Uplink, Tech. Rep. 230.2-G-1, Consultative Committee for Space Data Systems (CCSDS), Washington, DC, USA (Jul. 2014).
 (100) Short block length LDPC codes for TC synchronization and channel coding, Tech. Rep. 231.1-O-1, Consultative Committee for Space Data Systems (CCSDS), Washington, DC, USA (Apr. 2015).
 (101) N. Stolte, Rekursive Codes mit der Plotkin-Konstruktion und ihre Decodierung, Ph.D. dissertation, TU Darmstadt, Darmstadt, Germany, 2002.
 (102) E. Arikan, Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels, IEEE Trans. Inf. Theory 55 (7) (2009) 3051–3073.
 (103) R. Mori, T. Tanaka, Performance and construction of polar codes on symmetric binary-input memoryless channels, in: Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2009, pp. 1496–1500.
 (104) I. Tal, A. Vardy, How to construct polar codes, IEEE Trans. Inf. Theory 59 (10) (2013) 6562–6582.
 (105) P. Trifonov, Efficient design and decoding of polar codes, IEEE Trans. Commun. 60 (11) (2012) 3221–3227.
 (106) M. Mondelli, S. H. Hassani, R. Urbanke, Construction of polar codes with sublinear complexity, in: Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2017, pp. 1853–1857.
 (107) G. He, J. Belfiore, I. Land, G. Yang, X. Liu, Y. Chen, R. Li, J. Wang, Y. Ge, R. Zhang, W. Tong, Beta-expansion: A theoretical framework for fast and recursive construction of polar codes, in: Proc. IEEE Global Telecommun. Conf., 2017, pp. 1–6.
 (108) V. Bioglio, C. Condo, I. Land, Design of polar codes in 5G new radio, arXiv:1804.04389.
 (109) M. Baldi, M. Bianchi, F. Chiaraluce, R. Garello, I. Sanchez, S. Cioni, Advanced channel coding for space mission telecommand links, in: IEEE VTC Fall, Las Vegas, NV, USA, 2013, pp. 1–5.
 (110) P. Stahl, J. B. Anderson, R. Johannesson, Optimal and near-optimal encoders for short and moderate-length tail-biting trellises, IEEE Trans. Inf. Theory 45 (7) (1999) 2562–2571.
 (111) L. Gaudio, T. Ninacs, T. Jerkovits, G. Liva, On the performance of short tail-biting convolutional codes for ultra-reliable communications, in: Proc. 11th Int. ITG Conf. Systems, Commun. and Coding, Hamburg, Germany, 2017.
 (112) G. Ungerboeck, Channel coding with multilevel/phase signals, IEEE Trans. Inf. Theory 28 (1) (1982) 55–67.
 (113) F. Gray, Pulse code communication, U.S. Patent 2,632,058 (1953).
 (114) E. Zehavi, 8-PSK trellis codes for a Rayleigh channel, IEEE Trans. Commun. 40 (5) (1992) 873–884.
 (115) A. Guillén i Fàbregas, A. Martinez, G. Caire, Bit-Interleaved Coded Modulation, Foundations and Trends® in Communications and Information Theory 5 (1–2) (2008) 1–153.
 (116) H. Imai, S. Hirakawa, A new multilevel coding method using error-correcting codes, IEEE Trans. Inf. Theory 23 (3) (1977) 371–377.
 (117) M. Seidl, A. Schenk, C. Stierstorfer, J. B. Huber, Polar-coded modulation, IEEE Trans. Commun. 61 (10) (2013) 4108–4119.
 (118) G. Böcherer, T. Prinz, P. Yuan, F. Steiner, Efficient polar code construction for higher-order modulation, in: Proc. IEEE Wireless Commun. and Netw. Conf. (WCNC), San Francisco, USA, 2017.
 (119) F.-W. Sun, H. C. A. van Tilborg, Approaching capacity by equiprobable signaling on the Gaussian channel, IEEE Trans. Inf. Theory 39 (5) (1993) 1714–1716.
 (120) G. Böcherer, F. Steiner, P. Schulte, Bandwidth efficient and rate-matched low-density parity-check coded modulation, IEEE Trans. Commun. 63 (12) (2015) 4651–4665.
 (121) F. Steiner, G. Böcherer, Comparison of geometric and probabilistic shaping with application to ATSC 3.0, in: Proc. ITG Int. Conf. Syst., Commun. and Coding (SCC), Hamburg, Germany, 2017.
 (122) P. Schulte, G. Böcherer, Constant Composition Distribution Matching, IEEE Trans. Inf. Theory 62 (1) (2016) 430–434.
 (123) R. Laroia, N. Farvardin, S. A. Tretter, On optimal shaping of multidimensional constellations, IEEE Trans. Inf. Theory 40 (4) (1994) 1044–1056.
 (124) F. Steiner, G. Liva, G. Böcherer, Ultra-sparse non-binary LDPC codes for probabilistic amplitude shaping, in: Proc. IEEE Global Telecommun. Conf., 2017, pp. 1–5.
 (125) T. Prinz, P. Yuan, G. Böcherer, F. Steiner, O. Iscan, R. Böhnke, W. Xu, Polar coded probabilistic amplitude shaping for short packets, in: Proc. IEEE Int. Workshop on Signal Process. Adv. in Wireless Commun. (SPAWC), 2017.
 (126) S. Dolinar, K. Andrews, F. Pollara, D. Divsalar, Bounded angle iterative decoding of LDPC codes, in: Proc. IEEE MILCOM, 2008, pp. 1–6.
 (127) G. D. Forney, Jr., Exponential error bounds for erasure, list, and decision feedback schemes, IEEE Trans. Inf. Theory 14 (2) (1968) 206–220.
 (128) E. Hof, I. Sason, S. Shamai, On optimal erasure and list decoding schemes of convolutional codes, in: Proc. Tenth Int. Symp. Commun. Theory and Applications (ISCTA), 2009, pp. 6–10.
 (129) E. Hof, I. Sason, S. Shamai, Performance bounds for erasure, list, and decision feedback schemes with linear block codes, IEEE Trans. Inf. Theory 56 (8) (2010) 3754–3778.
 (130) A. R. Williamson, M. J. Marshall, R. D. Wesel, Reliability-output decoding of tail-biting convolutional codes, IEEE Trans. Commun. 62 (6) (2014) 1768–1778.
 (131) M. C. Coşkun, G. Liva, J. Östman, G. Durisi, Low-complexity joint channel estimation and list decoding of short codes, in: Proc. 12th ITG Int. Conf. Syst., Commun. and Coding (SCC), Rostock, Germany, to appear.
 (132) Y. Polyanskiy, H. V. Poor, S. Verdú, Feedback in the nonasymptotic regime, IEEE Trans. Inf. Theory 57 (8) (2011) 4903–4925.
 (133) J. Östman, G. C. Ferrante, R. Devassy, G. Durisi, Low-latency short-packet transmissions: Fixed length or HARQ?, in: Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Abu Dhabi, UAE, 2018.
 (134) A. R. Williamson, T.-Y. Chen, R. D. Wesel, Variable-length convolutional coding for short blocklengths with decision feedback, IEEE Trans. Commun. 63 (7) (2015) 2389–2403.
 (135) H. Wang, N. Wong, A. M. Baldauf, C. K. Bachelor, S. V. S. Ranganathan, D. Divsalar, R. D. Wesel, An information density approach to analyzing and optimizing incremental redundancy with feedback, in: Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aachen, Germany, 2017.
 (136) H. Yang, E. Liang, R. D. Wesel, Joint design of convolutional code and CRC under serial list Viterbi decoding, arXiv:1811.11932.