Parity-Check Polar Coding for 5G and Beyond
In this paper, we propose a comprehensive Polar coding solution that integrates reliability calculation, rate matching and parity-check coding. Judging a channel coding design from the industry's viewpoint, there are two primary concerns: (i) low-complexity implementation in application-specific integrated circuits (ASIC), and (ii) superior & stable performance under a wide range of code lengths and rates. The former provides the cost- & power-efficiency vital to any commercial system; the latter ensures flexible and robust services. Our design respects both criteria. It demonstrates better performance than existing schemes in the literature, yet requires only a fraction of their implementation cost. With an easily-reproducible code construction for arbitrary code rates and lengths, we are able to report "1-bit" fine-granularity simulation results for thousands of cases. The released results can serve as a baseline for future optimization of Polar codes. The work was first disclosed in 2016 as a technical contribution and accepted by IEEE ICC 2018. Part of the proposed design has been adopted by 3GPP as the Polar coding standard for 5G.
I-A Background and related works
Answering the question of "what will 5G be?", the result is clear at least for channel coding. For the enhanced Mobile Broadband (eMBB) service category in 5G, LDPC codes and Polar codes [2, 3] have been adopted for the data channel and control channel, respectively. With state-of-the-art code construction techniques [5, 4] and list decoding algorithms, Polar codes demonstrate competitive performance under short information block lengths (K ≤ 1000), where the block error rate (BLER) gain over LDPC and Turbo codes is up to 1 dB. Such advantages make Polar codes the most suitable candidate for the control channel, where the payload size is relatively small.
Polar code construction refers to determining the sets of information/frozen bits given a certain information block length K and code length N. According to [2, 3], the most reliable synthesized sub-channels should be selected as the information set to obtain the best performance under successive cancellation (SC) decoding. Gaussian approximation (GA) is an efficient way to compute the "reliability" under the AWGN channel.
While the performance of an SC decoder is worse than that of LDPC and Turbo codes, CRC-aided Polar (CA-Polar) codes demonstrate significantly better performance under successive cancellation list (SCL) decoding. The reason is that the native code distance of Polar codes is relatively poor compared to Reed-Muller codes and many other modern codes. Without CRC bits, an SCL decoder relies solely on path metrics to select from the surviving paths; thus, codes with a poor distance spectrum cannot perform well. In contrast, CA-Polar relies on both the path metric and CRC bits to pick the final path, and therefore does not suffer from the performance bottleneck incurred by poor code distance.
Although SCL significantly improves the performance of Polar codes, the optimal code construction under list decoding remains an open problem. Beyond CA-Polar, several attempts [7, 8] have been made to design better Polar codes for the SCL decoder. A more general form of outer codes, coined parity-check coding, was introduced to provide additional performance gain as well as flexibility. Polar subcodes allow some "dynamic" frozen bits to be information-bit-dependent. Extended BCH codewords were leveraged to establish parity-check functions such that the constructed codes have a guaranteed minimum distance, which is always better than that of the original Polar codes with the same code length and rate. Later, a heuristic parity-check construction was introduced, which also shows evident performance gain over CA-Polar codes. These methods opened the door for better Polar construction with parity-check bits.
I-B Motivation and our contributions
Despite the rich literature on Polar code construction, we found that none of the existing schemes can be directly applied to a commercial network such as 5G. The reasons are as follows:
Implementation complexity: existing code construction schemes, including rate matching [13, 14] and parity-check coding [7, 8], rely heavily on density evolution (DE) (or its simplification GA) to acquire sub-channel reliability. These operations (e.g., floating-point computations and sorting) are suitable for software simulations but are not hardware-friendly. They either incur large encoding/decoding latency if calculated online, or occupy considerable memory if calculated offline and pre-stored in ASIC.
Incomplete solution: existing parity-check coding schemes are not co-designed with a practical rate-matching scheme. The eBCH-based construction is defined only for the mother code lengths of eBCH codewords, and the corresponding generalization to arbitrary code lengths is unknown. The heuristic method recursively establishes parity-check functions based on GA-acquired reliability; similarly, a rate-compatible design is not available.
Lack of fine-granularity evaluation: existing works [7, 8, 12, 13, 14] often draw conclusions from a few special cases. We find it quite common that a scheme that excels in certain cases may perform poorly in others, so their conclusions may not hold in general. To fully evaluate a scheme before large-scale implementation, fine-granularity simulations covering various code lengths and rates are necessary.
To address the above issues, we propose a PC-Polar design that integrates deterministic reliability ordering and rate matching schemes. Based on distance spectrum analysis and error propagation patterns, we propose to select PC bits from sub-channels of low row weights, and to establish PC functions through a fixed-length cyclic shift register. The entire solution is hardware-friendly to facilitate ASIC implementation. To the best of our knowledge, such a comprehensive yet low-complexity solution for Polar construction has not been elaborated in the literature. Moreover, we provide fine-granularity simulation results to demonstrate stable & better performance than existing schemes under thousands of cases. Given the construction details, our design should be reproducible for arbitrary code lengths and rates. Therefore, we hope it serves as a baseline for further optimizations of Polar codes.
II Polar codes
A binary Polar code of mother code length N = 2^n can be defined by x = u·G_N, where u and x are the message and codeword vectors, respectively, and G_N is the generator matrix. To construct a Polar code, the generator matrix is obtained by taking the rows with indices in the information set I from the matrix F^{⊗n}, where I is the set of information sub-channel indices, F = [1 0; 1 1] is the kernel and ⊗n denotes the n-th Kronecker power.
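For concreteness, the generator matrix and encoder can be sketched in a few lines of Python (a minimal illustration, using the standard closed form in which row i of F^{⊗n} has a 1 in column j iff the binary pattern of j is covered by that of i):

```python
def polar_generator(n):
    """Build G_N = F^{(x)n} over GF(2) for the kernel F = [[1,0],[1,1]]; N = 2^n.
    Entry (i, j) is 1 iff every 1-bit of j is also a 1-bit of i."""
    N = 1 << n
    return [[1 if (i & j) == j else 0 for j in range(N)] for i in range(N)]

def encode(u, G):
    """Encode a length-N message vector u as x = u * G over GF(2)."""
    N = len(u)
    return [sum(u[i] & G[i][j] for i in range(N)) % 2 for j in range(N)]
```

As a sanity check, row i of G_N has Hamming weight 2^popcount(i), e.g. row 7 of G_8 is the all-ones row.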
II-A Reliability ordering
One key step of Polar code construction is determining the information set I. According to Arikan, the reliability metric is channel dependent. Applying this principle, density evolution (DE) (or its simplification, Gaussian approximation (GA)) calculates the reliability of each synthesized sub-channel based on channel state information (CSI), which can be the signal-to-noise ratio (SNR) or erasure probability. The K most reliable sub-channels are selected as the information set I. In the absence of assistant bits such as CRC or PC bits, the remaining sub-channels are selected as the frozen set.
Regarding ASIC implementation, the channel-dependent GA/DE method is infeasible due to (i) floating-point computations of complicated functions and sorting, and (ii) imperfect CSI estimation.
Alternatively, we propose a channel-independent Polarization Weight (PW) method as follows. Given a sub-channel index i and its binary expansion (b_{n-1}, ..., b_1, b_0), its PW value is defined as

PW_i = sum_{j=0}^{n-1} b_j · β^j,   (1)

where β is empirically chosen to be 2^{1/4}. A higher PW value indicates a higher reliability.
A reliability ordered sequence is obtained offline through Algorithm 1, and pre-stored in ASIC such that no on-the-fly calculation is required.
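A minimal sketch of the PW computation and the resulting offline ordering, taking β = 2^{1/4} (the value commonly reported for this construction; the exact sorting details of Algorithm 1 are not shown here):

```python
def pw(i, n, beta=2 ** 0.25):
    """Polarization Weight of sub-channel i (0 <= i < 2^n):
    PW_i = sum_j b_j * beta^j, where b_{n-1}...b_0 is the binary expansion of i."""
    return sum(((i >> j) & 1) * beta ** j for j in range(n))

def reliability_sequence(n):
    """Sub-channel indices sorted from least to most reliable (ascending PW)."""
    N = 1 << n
    return sorted(range(N), key=lambda i: pw(i, n))
```

The sequence is computed once offline and stored; at run time only table lookups are needed.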
Remark: Although sub-channel reliability is channel-dependent, the relative ordering is almost channel-independent at practical working points in terms of BLER. The simple and closed-form PW formula in (1) well approximates this ordering by capturing the recursive polarization process of Polar codes. It generates an information set very similar to that generated by GA/DE methods, but requires only a fraction of the implementation cost.
II-B Rate matching
Rate matching bears much practical importance because, in a commercial system, the allocated channel resource may not accommodate exactly N bits. To support an arbitrary code length M < N, puncturing [12, 13] or shortening [14] is performed. A well-designed rate matching scheme should bring minimum performance loss with respect to its mother code of length N.
For puncturing, N − M coded bits are not transmitted and deemed unknown at the decoder, and the log-likelihood ratio (LLR) inputs of the punctured positions are set to zero. For shortening, N − M coded bits are not transmitted and deemed known at the decoder, and the LLR inputs of the shortened positions are set to infinitely large values (see [12, 13, 14] for details).
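The decoder-side LLR handling above can be sketched as follows (a minimal illustration; the constant standing in for an "infinitely large" LLR is an arbitrary choice):

```python
def init_llrs(received_llrs, N, hidden, mode, large=1000.0):
    """Fill the length-N decoder LLR vector for rate matching.
    `hidden` is the set of non-transmitted coded-bit positions.
    Punctured bits are unknown -> LLR 0; shortened bits are known zeros -> large LLR."""
    llrs = []
    it = iter(received_llrs)
    for j in range(N):
        if j in hidden:
            llrs.append(0.0 if mode == "puncture" else large)
        else:
            llrs.append(next(it))
    return llrs
```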
Quasi-uniform-puncturing (QUP) sequentially punctures the first N − M coded bits of the mother codeword, and re-calculates the reliability of all sub-channels using GA. Since the selection of the information set fully adapts to the punctured pattern via GA, the method yields good and stable performance under a wide range of code lengths and rates. The Wang-Liu shortening method defines a set of valid shortening patterns based on the Polar kernel, and yields superior performance at higher coding rates. However, both schemes [13, 14] inherit the same implementation issues from GA, that is, online reliability re-calculation and imperfect CSI estimation.
Similar to [13, 14], other existing rate matching schemes rely heavily on re-calculation of sub-channel reliability via GA/DE, since the reliability ordering changes greatly over different punctured/shortened patterns. To implement such schemes, one has to either perform online GA/DE, or pre-store a reliability ordered sequence for each code length M. Unfortunately, neither is feasible for ASIC implementation due to complexity/latency and memory constraints.
Our scheme takes the opposite approach: we define a rate matching sequence such that, no matter how many bits are punctured or shortened, the pre-defined reliability order (e.g., the PW order) is maximally preserved. In this way, only one reliability ordered sequence and one rate matching sequence are required, both of length N, and no online calculation is needed. Since the reliability ordered sequence becomes rate-matching independent, some performance loss is inevitable. However, the tradeoff is worthwhile given the significant complexity reduction.
The proposed rate matching scheme is described below.
Select the K most reliable sub-channels as the information set according to PW, while skipping the indices disabled by the puncturing/shortening pattern.
As mentioned, the rate matching scheme only requires pre-storing the rate matching sequence (in addition to the reliability ordered sequence), and is thus hardware friendly. In fact, the rate matching sequence can even be generated online with a simple procedure: switch between big endian and little endian while reading the stored sequence, which incurs almost no computation overhead.
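The two ingredients just described can be sketched as follows. This is an illustration of the idea, not the exact standardized procedure: `bit_reverse` is the big/little-endian index switch mentioned in the text, and `select_info_set` implements the "pick the K most reliable, skipping disabled indices" rule:

```python
def bit_reverse(i, n):
    """Reverse the n-bit binary representation of index i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def select_info_set(reliability_seq, K, excluded):
    """Pick the K most reliable sub-channels, skipping indices disabled by rate
    matching. `reliability_seq` lists indices from least to most reliable."""
    info = []
    for idx in reversed(reliability_seq):  # most reliable first
        if idx not in excluded:
            info.append(idx)
        if len(info) == K:
            break
    return sorted(info)
```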
III Parity-check coding
As mentioned in Section I, CA-Polar improves the performance under list decoding through a better distance spectrum. But it has two major limitations. First, CRC bits are essentially independent of the Polar kernel, leaving no room for joint optimization. Second, CRC bits are appended at the end, and thus cannot assist path selection during intermediate decoding stages.
Parity-check bits have the advantage of improving path selection during intermediate decoding stages. Existing parity-check designs are Polar-specific, considering either the Polar kernel or the SC decoding process. However, they require high complexity to construct and store the PC functions. Specifically, the former requires performing Gaussian elimination on the parity-check matrix, which has cubic complexity in the code length, and the latter requires a recursive algorithm to establish the PC functions. These operations cannot be pipelined for hardware acceleration. Moreover, the PC functions are irregular and do not support a compact representation with a few parameters. To implement them, a set of bit positions has to be pre-stored to specify each PC function, i.e., the index of every sub-channel involved in a PC function must be stored, which incurs excessive memory cost, especially when the number of PC bits and functions is large.
We address the above problems with a complete solution that integrates our reliability metric in Section II-A and rate matching scheme in Section II-B. Our solution is guided by Polar-specific distance spectrum analysis and observations from bit error propagation patterns. The constructed PC functions require only one parameter to represent, and very simple hardware to implement.
III-A PC bit positions
III-A1 Distance spectrum analysis
A distance spectrum analysis of Polar codes can help select PC bit positions. In an SCL decoder, a path is defined by a binary vector (u_0, u_1, ..., u_i). At the i-th decoding stage, what an SC decoder actually does is decide whether the received vector is more likely to come from the subset of codewords with u_i = 0, or the subset of codewords with u_i = 1.
The former subset is called a "zero" coset and the latter a "one" coset, respectively defined as

C_0 = { sum_{j<i} u_j g_j + sum_{j>i} u_j g_j },   C_1 = { sum_{j<i} u_j g_j + g_i + sum_{j>i} u_j g_j },

where g_j is the j-th row of G_N, the prefix bits u_0, ..., u_{i-1} are fixed by the path, and the bits u_{i+1}, ..., u_{N-1} take arbitrary values; that is, C_0 and C_1 contain all codewords corresponding to the path extended by u_i = 0 and u_i = 1, respectively.
For example, a "zero" coset and a "one" coset with the same path prefix differ only at u_i. The distance spectrum between these two cosets is denoted by {A_d}, where A_d is the number of difference vectors of Hamming weight d between the two cosets. Every such difference vector has the form g_i + sum_{j>i} u_j g_j, i.e., the codeword corresponding to a path with all-"0" decoded bits except a "1" at position i, and w(·) denotes the weight (number of non-zero elements) of a vector. By this definition, it is straightforward to see that the minimum distance between the two cosets is

d_min(C_0, C_1) = min over u_{i+1}, ..., u_{N-1} of w( g_i + sum_{j>i} u_j g_j ).
The concept of cosets naturally extends to an SCL decoder. It is observed that the path metric is closely related to the minimum distance and distance spectrum. To avoid discarding the true path at the i-th stage, the path metrics of incorrect paths should receive more penalty than the true path. This can be achieved by letting the cosets induced by different paths be "as far as possible" so that the true path is "as distinguishable as possible", especially for paths that differ over only a few bits. In an SCL decoder, "a larger distance" between cosets means "a larger penalty" on the path metric.
If the i-th bit is not involved in any PC function, then the minimum distance between cosets is determined by the bit positions with minimum row weight w(g_i) among the unfrozen bits. By selecting these bit positions as PC bits and setting their values using linear combinations of preceding information bits, the path metrics of different paths can be made "more distinguishable" and the SCL decoding performance can be improved.
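The row weights involved here have a well-known closed form for the kernel F^{⊗n}: w(g_i) = 2^popcount(i). A small sketch that uses this to locate the minimum-row-weight positions among a candidate set (the candidate set itself is a placeholder here):

```python
def row_weight(i):
    """Weight of row i of G_N = F^{(x)n}: w(g_i) = 2^popcount(i)."""
    return 1 << bin(i).count("1")

def min_weight_positions(candidates):
    """Among candidate (non-frozen) sub-channels, return the minimum row weight
    and the positions attaining it -- the positions proposed as PC bits."""
    w_min = min(row_weight(i) for i in candidates)
    return w_min, [i for i in candidates if row_weight(i) == w_min]
```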
III-A2 Tradeoff between reliability and code distance
As explained, the PC positions should be selected from the unfrozen sub-channel indices with minimum or low row weights. However, the number of low-weight positions may be quite large, depending on the code length and rate, and it is obviously unwise to select all of them as PC bits. Consider the extreme case where all the low-weight positions are selected as frozen bits (a frozen bit can be viewed as a PC bit with the static PC function u_i = 0): the remaining information set would consist of the positions with the highest row weights, and the resulting code construction becomes similar to Reed-Muller codes. Although the distance spectrum of Reed-Muller codes is far better than that of Polar codes, their BLER performance under SC decoding is poor.
An SCL decoder with practical list sizes lies somewhere between an SC decoder and a Maximum Likelihood (ML) decoder. As a result, a good PC-Polar construction should respect both reliability and code distance. In the context of PC-Polar, the corresponding design principle is to pre-select just enough PC bits from the most reliable bit positions (those that would otherwise be selected into the information set), such that the reliability of the remaining information sub-channels is not sacrificed too much. Note that the unreliable bit positions (those that would otherwise be selected into the frozen set) can be subsequently selected as additional PC bits, which does not sacrifice the reliability of the information set.
To summarize, the design principles are:
Select the bit positions with minimum row weights among the non-frozen bit set as PC bits.
Pre-select a proper number of PC bits from the reliable bit positions.
In practice, easy-to-implement rules must be defined to determine the order for pre-selecting the PC bits. Since the PC functions must be forward-only to be consistent with any SC-based decoder, the last sub-channel index in a PC function always becomes a PC bit. To let the PC functions cover as many information bits as possible, an intuitive way is to select PC bits in descending reliability order (since the information set is also selected in descending reliability order, the same hardware module can be reused for pre-selecting PC bits), such that if an incorrect path passes the parity check, a larger penalty is imposed on its path metric. Specifically, we adopt the following steps:
Select PC bits from the unfrozen bit positions with the least row weight w_min in descending reliability order.
If there are insufficient unfrozen bit positions with row weight w_min, continue to select those with row weight 2·w_min in descending reliability order.
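The two selection steps above can be sketched as follows (a minimal illustration, assuming the reliability sequence is ordered from least to most reliable and that only the w_min and 2·w_min weight classes are considered, as in the text):

```python
def preselect_pc_bits(candidates, reliability_seq, n_pc):
    """Pre-select n_pc PC bits from `candidates` (non-frozen positions):
    first positions of minimum row weight w_min, then of weight 2*w_min,
    each group taken in descending reliability order."""
    def row_weight(i):
        return 1 << bin(i).count("1")  # w(g_i) = 2^popcount(i)
    # Higher rank = more reliable (sequence is least-to-most reliable).
    rank = {idx: r for r, idx in enumerate(reliability_seq)}
    w_min = min(row_weight(i) for i in candidates)
    pc = []
    for w in (w_min, 2 * w_min):
        group = [i for i in candidates if row_weight(i) == w]
        group.sort(key=lambda i: rank[i], reverse=True)
        pc.extend(group[: n_pc - len(pc)])
        if len(pc) == n_pc:
            break
    return pc
```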
III-B PC functions
As discussed, the PC bit values should be set to a linear combination of some preceding information bits, such that the code distance spectrum is improved.
Take N = 16 for example: if u_12 is selected as a PC bit, a good PC function would be u_12 = u_3. The corresponding row vectors are

g_3 = [1111000000000000],   g_12 = [1000100010001000].

Observe that w(g_3) = w(g_12) = 4 and the two rows overlap only in the first column. If u_3 was an information bit and u_12 was a frozen bit, the minimum code weight would be at most 4, corresponding to g_3 as a lowest-weight non-zero codeword. Now that we change u_12 into a PC bit and impose u_12 = u_3 as a PC function, whenever u_3 = 1 the combined codeword becomes

g_3 + g_12 = [0111100010001000],

which has a higher weight of 6.
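The arithmetic can be checked mechanically; the pair (g_3, g_12) in G_16 is one concrete choice consistent with the weights 4 and 6 quoted above (assumed here for illustration):

```python
def row(i, n):
    """Row i of G_N = F^{(x)n} as a 0/1 list: 1 in column j iff j is covered by i."""
    return [1 if (i & j) == j else 0 for j in range(1 << n)]

# g_3 and g_12 both have weight 4 and share only column 0, so imposing
# u_12 = u_3 replaces a weight-4 codeword by g_3 + g_12 of weight 6.
g3, g12 = row(3, 4), row(12, 4)
combined = [a ^ b for a, b in zip(g3, g12)]
```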
For longer codes, it becomes non-trivial to find all the PC functions that improve the minimum code distance. Even if such a method exists, the construction complexity may not be affordable in ASIC. Therefore, we resort to a hardware-friendly way to establish effective PC functions.
From the decoding perspective, an effective PC function should include sub-channels with relatively independent bit errors. For example, if the i-th and j-th sub-channels belong to the same PC function, and a bit error in the i-th sub-channel tends to induce another bit error in the j-th sub-channel, this error pattern would not be detected by a PC bit. Although bit error propagation is inevitable with SC-based Polar decoding, we should exploit its bit error patterns to mitigate its adverse effect on PC functions.
By Monte-Carlo simulation of a length-16 Polar block, we found that among all possible error patterns, only 16 are dominant and account for the vast majority of total error events. Besides the single-bit error pattern, the dominant patterns involve error propagation between pairs of bit positions at specific spacings.
Observe that the most frequent error propagation patterns occur between bit positions spaced 1, 2, 4 and 8 apart. This is due to the power-of-2 recursive structure of the Polar kernel. Intuitively, we should avoid setting up PC functions over bit positions with power-of-2 spacings. In contrast, we found that bit errors propagate less frequently between positions spaced 5 apart.
An effective yet implementable way is to set up PC functions over bit positions with a fixed p-sized spacing, where p can be set to 5 for all cases. It can be easily implemented by a p-length cyclic shift register (CSR). The PC pre-coding function is described by Algorithm 3.
A PC decoder reuses the same algorithm, in which the input is the decoded value of each information bit, and the expected PC bit value is the first register state at the corresponding PC position. All paths with an unexpected PC bit value are pruned.
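A sketch of the CSR pre-coding just described, with p = 5. The exact register discipline of Algorithm 3 is not given in the text, so the shift-then-update order below is an assumption; it does reproduce the stated behavior that a PC bit checks the information bit p positions earlier:

```python
def pc_precode(bit_types, info_bits, p=5):
    """Cyclic-shift-register PC pre-coding (sketch).
    bit_types[i] in {"info", "pc", "frozen"}; p is the register length.
    Per position: cyclically shift the register, then read/update register[0]."""
    reg = [0] * p
    u, it = [], iter(info_bits)
    for t in bit_types:
        reg = reg[1:] + reg[:1]   # cyclic left shift by one
        if t == "info":
            b = next(it)
            reg[0] ^= b           # fold the information bit into the register
            u.append(b)
        elif t == "pc":
            u.append(reg[0])      # PC bit takes the current register state
        else:
            u.append(0)           # frozen bit
    return u
```

With this discipline, a bit folded in at position i returns to register position 0 exactly p shifts later, so PC bits check information bits at the fixed spacing p.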
The equivalent CSR operation is shown in Figure 1. It has the following advantages:
The PC function has only one parameter p; there is no need to feed the constructor with every individual PC function.
The complexity does not grow with the number of PC bits or PC functions; all of them can be implemented by a single CSR.
The encoder and decoder can share the same CSR to further save chip area.
Note that a more sophisticated multiple-feedback CSR, defined by a polynomial, can also be adopted. However, the single-feedback implementation in Figure 1 with p = 5 is the simplest while preserving the best performance.
III-C Code construction algorithm
A full code construction flow is depicted in Figure 2, in which the PC pre-coding module is described in Algorithm 3 and the information/frozen/PC set generation module is detailed in Algorithm 4. The rate matching pattern is obtained according to Algorithm 2.
Some clarifications to Algorithm 4 are as follows. Step 13 can be performed once offline for faster construction, and the resulting parameter tuple can be pre-stored. There are two types of PC bits, i.e., the "reliable" PC bits pre-selected in Step 3 and the "unreliable" PC bits additionally selected in Step 5 (the PC bits before the first information bit are equivalent to frozen bits). The rough number of pre-selected PC bits is determined based on our observation that codes with rate near 1/2 require more PC bits than higher and lower rates; this number is also upper bounded. A coefficient is used to control the number of pre-selected PC bits: the larger it is, the more PC bits are pre-selected (other ways to control this number are allowed as long as they produce good performance). Typically, a smaller coefficient can be used for an SCL decoder with smaller list sizes, and a larger one for an SCL decoder with larger list sizes and better performance in the high-SNR region. To facilitate reproducible research, we use one fixed coefficient in all our simulations for balanced performance under an SCL decoder with a practical list size.
IV Simulation results
To validate the proposed PC-Polar design, we not only compare with existing Polar coding schemes, but also provide "1-bit" fine-granularity simulation results covering a wide range of code lengths and rates. A parity-check (PC) SCL decoder is used for PC-Polar codes. It is similar to an SCL decoder except that, during intermediate decoding stages, it only keeps paths that satisfy the PC functions. Standard 8-bit (CRC8) and 16-bit (CRC16) CRC polynomials are used for CA-Polar.
IV-A PC bit gain
IV-A1 Comparison with CA-Polar
In Figure 3, we compare PC-Polar with CA-Polar under various mother code lengths. The reliability ordering for PC-Polar is obtained by the hardware-friendly PW method, and that of CA-Polar by the computation-intensive GA method. The comparison is actually unfair to PC-Polar in terms of performance, since GA is more precise while PW is only an approximation. Nevertheless, we observe that PC-Polar still outperforms CA-Polar in all cases. This is due to both the sufficient gain from PC bits and the negligible loss from the PW method.
It is also observed that, as the number of CRC bits increases, the performance of CA-Polar fails to improve after the CRC length reaches 8. The best performance achieved by CA-Polar is still worse than that of PC-Polar.
In Figure 4, we simulate the cases with non-mother code lengths, where the rate matching methods are BRS for PC-Polar and QUP for CA-Polar, respectively. Again, the construction complexity of the former is much lower than that of the latter, since the BRS method does not perform reliability re-ordering for different rate-matching patterns. Similarly, a stable PC bit gain is observed. The overall gain is up to 0.8dB compared with CA-Polar (CRC16) and 0.3dB compared with CA-Polar (CRC8).
IV-A2 Comparison of different parity-check schemes
We further compare the proposed PC-Polar scheme with existing parity-check schemes such as eBCH-Polar  and PCC . Since both [7, 8] only provided construction procedures and simulation results under mother code lengths, and the associated construction parameters (e.g., the design distance for eBCH-Polar and the number of check bits for PCC) are available only for a few cases, our comparison focuses on these reproducible cases. CA-Polar with 8-bit and 16-bit CRC is also simulated for reference. For CA-Polar, eBCH-Polar and PCC, the GA method is applied to obtain a more precise reliability ordering; for PC-Polar the low-complexity PW method is applied.
As shown in Figures 5 and 6, all the parity-check based schemes (except for PCC in two cases) have better performance than CA-Polar (CRC16) at the working point of interest, which confirms the results reported in [7, 8]. In particular, we found that eBCH-Polar exhibits more stable performance than PCC due to its minimum-distance-guaranteed construction algorithm. PCC also has good performance in most cases, especially in the low-SNR region.
Among these schemes, PC-Polar demonstrates the best performance in all cases. The gain over CA-Polar with 16-bit and 8-bit CRC is 0.8dB and 0.3dB, respectively. The gain over eBCH-Polar and PCC varies over different cases. In certain cases, PC-Polar has slightly better performance than eBCH-Polar and PCC; while in a few cases, the gain of PC-Polar can reach 0.5dB.
IV-B Fine-granularity simulations
As observed in Figures 5 and 6, a scheme with excellent performance in one case may have worse performance in other cases. In order to draw more solid conclusions based on more simulation cases, fine-granularity simulation is necessary in the evaluation of channel coding schemes.
Therefore, we conduct simulations at "1-bit" granularity in information block length, covering a wide range of mother and non-mother code lengths, and typical code rates used in control and data channels.
In Figure 7, we report the required SNR to achieve a target BLER for PC-Polar and CA-Polar in over 4700 cases. The gain ranges from 0.2dB to 1dB. As in the previous experiments, GA/QUP are applied to CA-Polar and PW/BRS to PC-Polar. For CA-Polar, 8-bit CRC is adopted instead of 16-bit CRC for better performance. Even so, these extensive results clearly show that PC-Polar outperforms CA-Polar in almost all cases, demonstrating that PC-Polar has stable & better performance than CA-Polar in terms of error correction capability.
In this work, we propose a novel Polar construction with superior & stable error correction performance under a wide range of code rates and lengths. As a full solution that integrates hardware-friendly reliability ordering, rate matching and parity-check methods, our design moves a further step beyond CA-Polar and is implementable for 5G and future networks. Our solution, as detailed in this paper, applies to arbitrary code lengths and rates. Its performance can be reproduced to serve as a baseline for further optimizations.
-  J. Andrews, S. Buzzi, W. Choi, S. Hanly, A. Lozano, A. Soong, J. Zhang, “What will 5G be?”, IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
-  E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels”, IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
-  N. Stolte, “Recursive codes with the Plotkin-Construction and their Decoding”, Ph.D. dissertation, University of Technology Darmstadt, Germany.
-  K. Niu and K. Chen, "CRC-aided decoding of polar codes", IEEE Communications Letters, vol. 16, no. 10, pp. 1668–1671, Oct. 2012.
-  P. Trifonov, "Efficient design and decoding of polar codes", IEEE Transactions on Communications, vol. 60, no. 11, pp. 3221–3227, Nov. 2012.
-  I. Tal and A. Vardy, "List decoding of polar codes", IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
-  P. Trifonov and V. Miloslavskaya, “Polar Subcodes”, IEEE Journal on Selected Areas in Communications, vol. 34, no. 2, pp. 254–266, Feb. 2016.
-  T. Wang, D. Qu and T. Jiang, "Parity-check-concatenated polar codes", IEEE Communications Letters, vol. 20, no. 12, pp. 2342–2345, Dec. 2016.
-  R1-1611254 “Details of the polar code design”, Huawei, HiSilicon, 3GPP TSG RAN WG1 #87 Meeting, Reno, USA, Nov. 10th–14th, 2016.
-  “Chairman’s notes: RAN1”, 3GPP TSG RAN WG1 NR Ad-Hoc Meeting #2, Qingdao, China, 27th–30th Jun. 2017.
-  X. Liu et al., "β-expansion: A Theoretical Framework for Fast and Recursive Construction of Polar Codes", in Proc. IEEE Globecom, Dec. 2017.
-  L. Zhang, Z. Zhang, X. Wang, Q. Yu and Y. Chen, "On the puncturing patterns for punctured polar codes", in Proc. IEEE ISIT, pp. 121–125, Jun. 2014.
-  K. Niu, K. Chen and J. R. Lin, "Beyond Turbo codes: Rate-compatible punctured Polar codes", in Proc. IEEE ICC, pp. 3423–3427, Jun. 2013.
-  R. Wang and R. Liu, “A novel puncturing scheme for polar codes,” IEEE Communications Letters, vol. 18, no. 12, pp. 2081–2084, 2014.
-  R1-167533, “Examination of NR coding candidates for low rate applications”, MediaTek Inc., 3GPP TSG RAN WG1 #86 Meeting, Gothenburg, Sweden, Aug. 22nd–26th, 2016.