Parity-Check Polar Coding for 5G and Beyond
Abstract
In this paper, we propose a comprehensive Polar coding solution that integrates reliability calculation, rate matching and parity-check coding. Judging a channel coding design from the industry’s viewpoint, there are two primary concerns: (i) low-complexity implementation in application-specific integrated circuits (ASIC), and (ii) superior and stable performance over a wide range of code lengths and rates. The former provides the cost and power efficiency that are vital to any commercial system; the latter ensures flexible and robust services. Our design respects both criteria. It demonstrates better performance than existing schemes in the literature, yet requires only a fraction of their implementation cost. With an easily reproducible code construction for arbitrary code rates and lengths, we are able to report “1-bit” fine-granularity simulation results for thousands of cases. The released results can serve as a baseline for future optimization of Polar codes. (This work was first disclosed in 2016 as a technical contribution [9] and accepted by IEEE ICC 2018. Part of the proposed design has been adopted by 3GPP as the Polar coding standard for 5G [10].)
I Introduction
I-A Background and related works
Answering the question of “what will 5G be?” [1], the result is clear at least for channel coding. For the enhanced Mobile Broadband (eMBB) service category in 5G, LDPC codes and Polar codes [2, 3] have been adopted for the data channel and control channel, respectively. With state-of-the-art code construction techniques [4, 5] and the list decoding algorithm [6], Polar codes demonstrate competitive performance at short information block lengths (K < 1000), where the block error rate (BLER) gain over LDPC and Turbo codes can reach up to 1 dB. Such advantages make Polar codes the most suitable candidate for the control channel, where the payload size is relatively small.
Polar code construction refers to determining the sets of information/frozen bits given an information block length K and a code length N. According to [2, 3], the most reliable synthesized subchannels should be selected as the information set to obtain the best performance under successive cancellation (SC) decoding. Gaussian approximation (GA) [5] is an efficient way to compute this “reliability” over the AWGN channel.
While the performance of an SC decoder is worse than that of LDPC and Turbo codes, CRC-aided Polar (CA-Polar) codes [4] demonstrate significantly better performance under successive cancellation list (SCL) decoding [6]. The reason is that the native code distance of Polar codes is relatively poor compared to Reed-Muller codes and many other modern codes. Without CRC bits, an SCL decoder relies solely on path metrics to select from the surviving paths, so codes with a poor distance spectrum cannot perform well. In contrast, CA-Polar relies on both the path metric and the CRC bits to pick the final path, and therefore does not suffer from the performance bottleneck incurred by poor code distance.
Although SCL significantly improves the performance of Polar codes, the optimal code construction under list decoding remains an open problem. Beyond CA-Polar, several attempts [7, 8] have been made to design better Polar codes for SCL decoders. A more general form of outer coding, coined parity-check coding, was introduced to provide additional performance gain as well as flexibility. Polar subcodes [7] allow some “dynamic” frozen bits to depend on preceding information bits. Extended BCH codewords are leveraged to establish parity-check functions such that the constructed codes have a guaranteed minimum distance, always better than that of the original Polar codes with the same code length and rate. Later, a heuristic parity-check construction was introduced in [8], which also shows evident performance gain over CA-Polar codes. These methods opened a door toward better Polar construction with parity-check bits.
I-B Motivation and our contributions
Despite the rich literature on Polar code construction, we found that none of these schemes can be directly applied to a commercial network such as 5G, for the following reasons:

Implementation complexity: existing code construction schemes, including rate matching [13, 14] and parity-check coding [7, 8], rely heavily on density evolution (DE) (or its simplification, GA [5]) to acquire subchannel reliability. These operations (e.g., floating-point computations and sorting) are suitable for software simulations but are not hardware-friendly. They either incur large encoding/decoding latency if calculated online, or occupy much memory if calculated offline and pre-stored in ASIC.

Incomplete solution: existing parity-check coding schemes are not co-designed with a practical rate-matching scheme. The construction in [7] is based on extended BCH (eBCH) codewords of specific lengths, and its generalization to arbitrary code lengths is unknown. The heuristic method in [8] recursively establishes parity-check functions based on GA-acquired reliability; similarly, a rate-compatible design is not available.

Lack of fine-granularity evaluation: existing works [7, 8, 12, 13, 14] often draw conclusions from a few special cases. We find it quite common that a scheme that excels in certain cases performs poorly in others, so such conclusions may not hold in general. To fully evaluate a scheme before large-scale implementation, fine-granularity simulations covering various code lengths and rates are necessary.
To address the above issues, we propose a PC-Polar design that integrates deterministic reliability ordering and rate matching schemes. Based on distance spectrum analysis and error propagation patterns, we propose to select PC bits from subchannels with low row weights, and to establish PC functions through a fixed-length cyclic shift register. The entire solution is hardware-friendly to facilitate ASIC implementation. To the best of our knowledge, such a comprehensive yet low-complexity solution for Polar construction has not been elaborated in the literature. Moreover, we provide fine-granularity simulation results to demonstrate stable and better performance than existing schemes over thousands of cases. Given the construction details, our design is reproducible for arbitrary code lengths and rates; we therefore hope it serves as a baseline for further optimizations of Polar codes.
II Polar codes
A binary Polar code of mother code length N = 2^n can be defined by x = u G_N, where u and x are the length-N message and codeword vectors, respectively, and G_N is the generator matrix. To construct a Polar code, the generator matrix is obtained by taking the rows of G_N = F^{⊗n} indexed by the information set I, where F = [[1, 0], [1, 1]] is the Polar kernel and ⊗n denotes the n-th Kronecker power.
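As a concrete restatement of this definition, the following sketch (function names are ours; the kernel F and Kronecker power are as defined above) builds G_N and encodes a message vector:

```python
import numpy as np

def polar_generator(n):
    """Build G_N = F^{(x)n} for mother code length N = 2**n."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = F
    for _ in range(n - 1):
        G = np.kron(G, F)  # n-th Kronecker power of the Polar kernel
    return G

def polar_encode(u, n):
    """Encode a length-N message vector u (frozen bits already set to 0)."""
    return u.dot(polar_generator(n)) % 2
```

For n = 2, placing a single information bit on the last (most reliable) subchannel yields the all-ones repetition codeword, since the last row of G_N is all ones.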
II-A Reliability ordering
One key step of Polar code construction is determining the information set I. According to Arikan [2], the reliability metric is channel dependent. Applying this principle, density evolution (DE) (or its simplification, Gaussian approximation (GA) [5]) calculates the reliability of each synthesized subchannel based on channel state information (CSI), such as the signal-to-noise ratio (SNR) or erasure probability. The K most reliable subchannels are selected as I. In the absence of assistant bits such as CRC or PC bits, the remaining subchannels are selected as the frozen set.
For ASIC implementation, the channel-dependent GA/DE method is infeasible due to (i) floating-point computations of complicated functions and sorting, and (ii) imperfect CSI estimation.
Alternatively, we propose a channel-independent Polarization Weight (PW) method as follows. Given a subchannel index i with binary expansion i = (b_{n-1}, b_{n-2}, ..., b_0)_2, its PW value is defined as
PW_i = Σ_{j=0}^{n-1} b_j · β^j,  (1)
where β is empirically chosen to be 2^{1/4} [11]. A higher PW value indicates a higher reliability.
A reliability-ordered sequence is obtained offline through Algorithm 1, and pre-stored in ASIC so that no on-the-fly calculation is required.
Remark: although subchannel reliabilities are channel-dependent, their relative ordering is almost channel-independent within the practical working region of BLER. The simple, closed-form PW formula in (1) approximates this ordering well by capturing the recursive polarization process of Polar codes. It generates an information set very similar to that generated by GA/DE methods, but at a fraction of the implementation cost.
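A minimal sketch of the PW computation, assuming the β-expansion form of (1) with β = 2^{1/4} (function names are ours):

```python
BETA = 2 ** 0.25  # empirical beta from [11]

def pw(i, n):
    """Polarization Weight of subchannel i, for mother code length N = 2**n."""
    return sum(BETA ** j for j in range(n) if (i >> j) & 1)

def reliability_order(n):
    """Subchannel indices sorted from least to most reliable (offline step)."""
    return sorted(range(1 << n), key=lambda i: pw(i, n))
```

For N = 8 this yields the ordering 0, 1, 2, 4, 3, 5, 6, 7; the information set is then read from the tail of the sequence.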
II-B Rate matching
Rate matching bears much practical importance because, in a commercial system, the allocated channel resource may not contain exactly N = 2^n bits. To support an arbitrary code length M, puncturing [12, 13] and shortening [14] are performed. A well-designed rate matching scheme should bring minimal performance loss with respect to its mother code of length N.
For puncturing, N − M coded bits are not transmitted and are deemed unknown at the decoder, so the log-likelihood ratio (LLR) inputs at the punctured positions are set to zero. For shortening, N − M coded bits are not transmitted but are known at the decoder, so the LLR inputs at the shortened positions are set to an infinitely large magnitude (see [12, 13, 14] for details).
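The two LLR conventions can be sketched as follows (a hypothetical helper; the set of skipped positions and the large constant standing in for infinity are ours):

```python
BIG = 1e9  # stands in for an "infinitely reliable" LLR

def init_llr(received, skipped, N, mode):
    """Expand M = N - len(skipped) received LLRs to N decoder inputs.

    Punctured bits are unknown -> LLR 0; shortened bits are known zeros
    -> a very large positive LLR.
    """
    out, rx = [], iter(received)
    for j in range(N):
        if j in skipped:
            out.append(0.0 if mode == "puncture" else BIG)
        else:
            out.append(next(rx))
    return out
```

For example, puncturing positions {0, 1} of a length-4 code maps two received LLRs onto inputs [0, 0, llr_0, llr_1], while shortening positions {2, 3} pins the last two inputs to the large constant.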
Quasi-uniform puncturing (QUP) [13] sequentially punctures the first N − M coded bits of the mother codeword, and recalculates the reliability of all subchannels using GA. Since the selection of the information set fully adapts to the puncturing pattern via GA, the method yields good and stable performance over a wide range of code lengths and rates. The Wang-Liu shortening method [14] defines a set of valid shortening patterns based on the Polar kernel, and yields superior performance at higher coding rates. However, both schemes [13, 14] inherit the implementation issues of GA, namely online reliability recalculation and imperfect CSI estimation.
Like [13, 14], other existing rate matching schemes rely heavily on recalculating subchannel reliability via GA/DE, since the reliability ordering changes greatly across different puncturing/shortening patterns. To implement such schemes, one must either perform GA/DE online, or pre-store a length-M reliability-ordered sequence for every supported code length M. Neither is feasible for ASIC implementation due to complexity/latency and memory constraints.
Our scheme takes the opposite approach: we define a rate matching sequence such that, no matter how many bits are punctured or shortened, the predefined reliability order (e.g., the PW order) is maximally preserved. In this way, only one reliability-ordered sequence and one rate matching sequence are required, both of length N, and no online calculation is needed. Since the reliability-ordered sequence becomes rate-matching independent, some performance loss is inevitable; however, the trade-off is worthwhile given the significant complexity reduction.
The proposed rate matching scheme is described below.

Select the K most reliable subchannels as the information set according to PW, while skipping the indices excluded by the puncturing/shortening pattern.
As mentioned, the rate matching scheme only requires pre-storing the rate matching sequence (in addition to the reliability-ordered sequence), and is thus hardware-friendly. In fact, the rate matching sequence can even be generated online with a simple procedure: read the natural index sequence while switching between big-endian and little-endian bit order, which requires almost no computation overhead.
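The big-endian/little-endian switch amounts to a bit-reversal permutation of the natural index order; the sketch below is our reading of how such a sequence can be generated, not the exact standardized procedure:

```python
def bit_reverse(i, n):
    """Reverse the n-bit binary representation of i (endianness switch)."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def rate_matching_sequence(n):
    """Candidate rate matching sequence: indices read in bit-reversed order."""
    return [bit_reverse(i, n) for i in range(1 << n)]
```

For N = 8 the sequence is 0, 4, 2, 6, 1, 5, 3, 7; a prefix or suffix of N − M entries would then mark the punctured or shortened positions.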
III Parity-check coding
As mentioned in Section I, CA-Polar improves performance under list decoding through a better distance spectrum, but it has two major limitations. First, CRC bits are essentially independent of the Polar kernel, leaving no room for joint optimization. Second, they are appended at the end of the information bits, and thus cannot assist path selection during intermediate decoding stages.
Parity-check bits have the advantage of improving path selection during intermediate decoding stages. Existing parity-check designs are Polar-specific in that they consider either the Polar kernel [7] or the SC decoding process [8]. However, they require high complexity to construct and store the PC functions. Specifically, [7] requires Gaussian elimination on the parity-check matrix, which is computationally costly, and [8] requires a recursive algorithm to establish the PC functions. These operations cannot be pipelined for hardware acceleration. Moreover, the resulting PC functions are irregular and do not admit a compact representation with a few parameters: to implement them, the set of bit positions in each PC function must be pre-stored. For example, if a PC function involves bit positions i_1, ..., i_m, then all m indices must be stored, which incurs excessive memory cost, especially when the number of PC bits and functions is large.
We address the above problems with a complete solution that integrates the reliability metric of Section II-A and the rate matching scheme of Section II-B. Our solution is guided by Polar-specific distance spectrum analysis and observations of bit error propagation patterns. The constructed PC functions require only one parameter to represent, and very simple hardware to implement.
III-A PC bit positions
III-A.1 Distance spectrum analysis
A distance spectrum analysis of Polar codes can help select PC bit positions. In an SCL decoder, a path is defined by a binary vector (u_0, ..., u_i). At the i-th decoding stage, what an SC decoder actually does is decide whether the received vector is more likely to come from the subset of codewords with u_i = 0, or from the subset with u_i = 1.
The former subset is called a “zero” coset and the latter a “one” coset, respectively defined as
C_0 = { x = (u_0, ..., u_{i-1}, 0, u_{i+1}, ..., u_{N-1}) G_N },
C_1 = { x = (u_0, ..., u_{i-1}, 1, u_{i+1}, ..., u_{N-1}) G_N },
where g_i denotes the i-th row of G_N, the path prefix (u_0, ..., u_{i-1}) is fixed, and the bits u_{i+1}, ..., u_{N-1} range over all values; each coset thus collects all codewords consistent with the path extended by u_i = 0 or u_i = 1, respectively.
For example, the “zero” coset and the “one” coset with the same path prefix differ only in u_i. The distance spectrum between these two cosets is denoted by {A_d}, where
A_d = |{ x ∈ C_1(i) : w(x) = d }|,  (2)
C_1(i) denotes all codewords corresponding to the path with all-“0” decoded bits except a “1” for u_i, and w(x) is the weight (number of nonzero elements) of x. By the definition of G_N, it is straightforward to see that the minimum distance between the two cosets is
d_min(i) = w(g_i).  (3)
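For small mother code lengths, the minimum-distance property in (3) can be checked by brute force; the sketch below (function name ours) enumerates every codeword in the coset with all-zero prefix and u_i = 1, and its minimum weight indeed equals 2^{wt(i)}, the weight of g_i:

```python
import numpy as np

def min_coset_weight(i, n):
    """Minimum weight over all codewords with all-zero prefix and u_i = 1."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = F
    for _ in range(n - 1):
        G = np.kron(G, F)
    N = 1 << n
    best = N + 1
    for mask in range(1 << (N - 1 - i)):      # free bits u_{i+1}..u_{N-1}
        c = G[i].copy()
        for j in range(N - 1 - i):
            if (mask >> j) & 1:
                c = (c + G[i + 1 + j]) % 2
        best = min(best, int(c.sum()))
    return best
```

For n = 3, every index i satisfies min_coset_weight(i, 3) == 2**wt(i), where wt(i) is the number of ones in the binary expansion of i.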
The concept of cosets naturally extends to an SCL decoder, where the path metric is closely related to the minimum distance and distance spectrum. To avoid discarding the true path at the i-th stage, the path metrics of incorrect paths should receive a larger penalty than the true path. This can be achieved by making the cosets induced by different paths “as far apart as possible” so that the true path is “as distinguishable as possible”, especially for paths differing in only a few bits. In an SCL decoder, a larger distance between cosets means a larger penalty on the path metric.
If the i-th bit is not involved in any PC function, the minimum distance between cosets is determined by the bit positions with the minimum row weight among the unfrozen bits. By selecting these bit positions as PC bits and setting their values to linear combinations of preceding information bits, the path metrics of different paths can be made more distinguishable and the SCL decoding performance improved.
III-A.2 Trade-off between reliability and code distance
As explained, the PC positions should be selected from the unfrozen subchannel indices with the minimum (or low) row weights. However, the number of low-weight positions can be quite large depending on N and K, and it is obviously unwise to select all of them as PC bits. Consider the extreme case where all the low-weight positions are frozen (a frozen bit can be viewed as a PC bit with the all-zero PC function): the remaining information set would consist of the positions with the highest row weights, and the resulting construction becomes similar to Reed-Muller codes. Although the distance spectrum of Reed-Muller codes is far better than that of Polar codes, their BLER performance under SC decoding is poor.
An SCL decoder with a practical list size lies somewhere between an SC decoder and a maximum likelihood (ML) decoder. As a result, a good PC-Polar construction should respect both reliability and code distance. In the context of PC-Polar, the corresponding design principle is to pre-select just enough PC bits from the most reliable bit positions (those that would otherwise be selected into the information set), so that the reliability of the remaining information subchannels is not sacrificed too much. Note that unreliable bit positions (those that would otherwise be frozen) can subsequently be selected as additional PC bits without sacrificing the reliability of the information set.
To summarize, the design principles are:

Select the bit positions with the minimum row weights among the non-frozen bit set as PC bits.

Pre-select a proper number of PC bits from the reliable bit positions.
In practice, easy-to-implement rules must be defined to determine the order of pre-selecting the PC bits. Since the PC functions must be forward-only to remain consistent with any SC-based decoder, the last subchannel index in a PC function always becomes the PC bit. To let the PC functions cover as many information bits as possible, an intuitive way is to select PC bits in descending reliability order (since the information set is also selected in descending reliability order, the same hardware module can be reused for pre-selecting PC bits), such that if an incorrect path passes the parity check, a larger penalty is imposed on its path metric. Specifically, we adopt the following steps:

Select PC bits from the unfrozen bit positions with the least row weight, in descending reliability order.

If there are insufficient unfrozen bit positions with the least row weight, continue to select from those with the next smallest row weight, again in descending reliability order.
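The selection steps above can be sketched as follows (a simplified model, assuming the PW order of Section II-A and the row weight identity w(g_i) = 2^{wt(i)}; the split between pre-selected and additional PC bits is omitted):

```python
def select_pc_bits(n, K, n_pc):
    """Pick n_pc PC bits with the smallest row weights, in descending
    reliability order, from the K + n_pc most reliable subchannels."""
    beta = 2 ** 0.25
    pw = lambda i: sum(beta ** j for j in range(n) if (i >> j) & 1)
    rw = lambda i: 1 << bin(i).count("1")          # row weight of g_i
    order = sorted(range(1 << n), key=pw, reverse=True)
    cand = order[:K + n_pc]                        # would-be information set
    # least row weight first; ties broken by higher reliability
    pc = sorted(cand, key=lambda i: (rw(i), -pw(i)))[:n_pc]
    info = [i for i in cand if i not in pc]
    return info, pc
```

For n = 3, K = 4 and one PC bit, the candidate set {7, 6, 5, 3, 4} yields PC bit 4 (row weight 2) and information set {3, 5, 6, 7}.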
III-B PC functions
As discussed, the PC bit values should be set to a linear combination of preceding information bits, such that the code distance spectrum is improved.
As an illustration (the indices here are chosen for exposition), take N = 16. If u_12 is selected as a PC bit, a good PC function would be u_12 = u_3. The corresponding row vectors are
g_3 = 1111000000000000,  g_12 = 1000100010001000.
Observe that w(g_3) = w(g_12) = 4. If u_12 were an information bit and u_3 a frozen bit, the minimum code weight would be at most 4, corresponding to g_12 as a lowest-weight nonzero codeword. Once we change u_12 into a PC bit and impose u_12 = u_3, the combined codeword becomes
g_3 + g_12 = 0111100010001000,
which has a higher weight of 6.
For longer codes, it becomes nontrivial to find all the PC functions that improve the minimum code distance. Even if such a method existed, the construction complexity might not be affordable in ASIC. Therefore, we resort to a hardware-friendly way of establishing effective PC functions.
From the decoding perspective, an effective PC function should include subchannels with relatively independent bit errors. For example, if the i-th and j-th subchannels belong to the same PC function, and a bit error in the i-th subchannel always leads to another bit error in the j-th subchannel, this error pattern cannot be detected by a PC bit. Although bit error propagation is inevitable in SC-based Polar decoding, we can exploit its patterns to mitigate the adverse effect on PC functions.
By Monte-Carlo simulation of a length-16 Polar block, we found that among all possible error patterns, only 16 are dominant and account for the overwhelming majority of the total error events. Besides the single-error pattern, the most frequent error propagation patterns occur between bit positions spaced 1, 2, 4 or 8 apart. This is due to the power-of-two recursive structure of the Polar kernel. Intuitively, we should avoid setting up PC functions over bit positions with power-of-two spacings; in contrast, we found that bit errors propagate much less frequently between positions spaced 5 apart.
An effective yet implementable way is to set up PC functions over bit positions with a fixed spacing p, where p = 5 works well for all cases. This can be easily implemented by a length-p cyclic shift register (CSR). The PC precoding function is described in Algorithm 3.
A PC decoder reuses the same algorithm, in which the register is updated with the decoded values of the information bits, and the expected PC bit value is read from the first register state; all paths with an unexpected PC bit value are pruned.
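Our reading of the CSR operation can be sketched as follows (an illustrative model of the pre-coding step, not the exact standardized procedure; the shift-then-update order is an assumption):

```python
def pc_precode(info_bits, info_set, pc_set, N, p=5):
    """Cyclic-shift-register PC pre-coding (illustrative sketch).

    The length-p register is rotated once per subchannel; information bits
    are XOR-accumulated into the head, and each PC bit takes the current
    head value, i.e., the parity of prior information bits p positions apart.
    """
    reg = [0] * p
    u = [0] * N
    src = iter(info_bits)
    for i in range(N):
        reg = reg[1:] + reg[:1]      # one cyclic shift per decoding stage
        if i in info_set:
            u[i] = next(src)
            reg[0] ^= u[i]
        elif i in pc_set:
            u[i] = reg[0]            # PC bit value; other positions stay frozen
    return u
```

With p = 1 the register degenerates to a running parity of all preceding information bits, which makes the behavior easy to verify by hand.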
The equivalent CSR operation is shown in Figure 1. It has the following advantages:

The PC function has only one parameter, the spacing p. There is no need to feed the constructor with every individual PC function.

The complexity does not grow with the number of PC bits or PC functions; all of them can be implemented by a single CSR.

The encoder and decoder can share the same CSR to further save chip area.
Note that a more sophisticated CSR with multiple feedback taps, defined by a polynomial, can also be adopted. However, the single-tap implementation in Figure 1 is the simplest while preserving the best performance.
III-C Code construction algorithm
A full code construction flow is depicted in Figure 2, in which the PC precoding module is described in Algorithm 3 and the information/frozen/PC set generation module is detailed in Algorithm 4. The rate matching pattern is obtained according to Algorithm 2.
Some clarifications on Algorithm 4 follow. Steps 1-3 can be performed once offline for faster construction, and the resulting parameter tuple can be pre-stored. There are two types of PC bits: the “reliable” PC bits pre-selected in Step 3 and the “unreliable” PC bits (those placed before the first information bit are equivalent to frozen bits) additionally selected in Step 5. The rough number of pre-selected PC bits is determined based on our observation that codes with rate near 1/2 require more PC bits than higher- and lower-rate codes; in addition, this number is upper bounded. A scaling coefficient controls the number of pre-selected PC bits: the larger the coefficient, the more PC bits are pre-selected (other ways to control this number are allowed as long as they produce good performance). Typically, a smaller coefficient suits an SCL decoder with a small list size, while a larger one suits an SCL decoder with a larger list size and yields better performance in the high-SNR region. To facilitate reproducible research, we fix the coefficient in all our simulations for balanced performance under an SCL decoder with a practical list size.
IV Simulation results
To validate the proposed PC-Polar design, we not only compare with existing Polar coding schemes, but also provide “1-bit” fine-granularity simulation results covering a wide range of code lengths and rates. A parity-check (PC) SCL decoder is used for PC-Polar codes; it is similar to an SCL decoder except that it only keeps paths satisfying the PC functions during intermediate decoding stages. Standard 8-bit (CRC8) and 16-bit (CRC16) CRC polynomials are used for CA-Polar.
IV-A PC bit gain
IV-A.1 Comparison with CA-Polar
In Figure 3, we compare PC-Polar with CA-Polar under various mother code lengths. The reliability ordering for PC-Polar is obtained by the hardware-friendly PW method, while that of CA-Polar is obtained by the computation-intensive GA method. This comparison is actually unfavorable to PC-Polar in terms of performance, since GA is more precise while PW is only an approximation. Nevertheless, we observe that PC-Polar still outperforms CA-Polar in all cases, thanks to both the gain from PC bits and the negligible loss of the PW method.
It is also observed that the performance of CA-Polar stops improving once the CRC length reaches 8, and the best performance achieved by CA-Polar is still worse than that of PC-Polar.
In Figure 4, we simulate cases with non-mother code lengths, where the rate matching methods are BRS for PC-Polar and QUP for CA-Polar. Again, the construction complexity of the former is much lower than that of the latter, since no reliability reordering with respect to different rate-matching patterns is required by the BRS method. A similar, stable PC bit gain is observed: the overall gain is up to 0.8 dB over CA-Polar (CRC16) and 0.3 dB over CA-Polar (CRC8).
IV-A.2 Comparison of different parity-check schemes
We further compare the proposed PC-Polar scheme with existing parity-check schemes, namely eBCH-Polar [7] and PCC [8]. Since both [7, 8] only provide construction procedures and simulation results for mother code lengths, and the associated construction parameters (e.g., the design distance for eBCH-Polar and the number of check bits for PCC) are available only for a few cases, our comparison focuses on these reproducible cases. CA-Polar with 8-bit and 16-bit CRC is also simulated for reference. For CA-Polar, eBCH-Polar and PCC, the GA method is applied to obtain a more precise reliability ordering; for PC-Polar, the low-complexity PW method is applied.
As shown in Figures 5 and 6, all the parity-check based schemes (except PCC in two cases) outperform CA-Polar (CRC16) at the working point of interest, which confirms the results reported in [7, 8]. In particular, we found that eBCH-Polar exhibits more stable performance than PCC thanks to its minimum-distance-guaranteed construction algorithm; PCC also performs well in most cases, especially in the low-SNR region.
Among these schemes, PC-Polar demonstrates the best performance in all cases. The gain over CA-Polar with 16-bit and 8-bit CRC is up to 0.8 dB and 0.3 dB, respectively. The gain over eBCH-Polar and PCC varies from case to case: in some cases PC-Polar is only slightly better, while in a few cases the gain reaches 0.5 dB.
IV-B Fine-granularity simulations
As observed in Figures 5 and 6, a scheme with excellent performance in one case may perform worse in others. To draw more solid conclusions from more simulation cases, fine-granularity simulation is necessary in the evaluation of channel coding schemes.
Therefore, we conduct “1-bit” granularity simulations (varying the information block length one bit at a time) to cover a wide range of mother and non-mother code lengths, and the typical code rates used in control and data channels.
In Figure 7, we report the required SNR to achieve the target BLER for PC-Polar and CA-Polar in over 4700 cases. The gain ranges from 0.2 dB to 1 dB. As in the previous experiments, GA/QUP are applied for CA-Polar and PW/BRS for PC-Polar. For CA-Polar, 8-bit rather than 16-bit CRC is adopted for better performance. Even so, these extensive results clearly show that PC-Polar outperforms CA-Polar in almost all cases, demonstrating stable and better error correction capability.
V Conclusion
In this work, we propose a novel Polar construction with superior and stable error correction performance over a wide range of code rates and lengths. As a full solution integrating hardware-friendly reliability ordering, rate matching and parity-check methods, our design moves one step beyond CA-Polar and is implementable for 5G and future networks. Our solution, as detailed in this paper, applies to arbitrary code lengths and rates, and its performance can be reproduced to serve as a baseline for further optimizations.
References
 [1] J. Andrews, S. Buzzi, W. Choi, S. Hanly, A. Lozano, A. Soong, J. Zhang, “What will 5G be?”, IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
 [2] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels”, IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
 [3] N. Stolte, “Recursive codes with the Plotkin construction and their decoding”, Ph.D. dissertation, University of Technology Darmstadt, Germany.
 [4] K. Niu and K. Chen, “CRC-aided decoding of polar codes”, IEEE Communications Letters, vol. 16, no. 10, pp. 1668–1671, Oct. 2012.
 [5] P. Trifonov, “Efficient design and decoding of polar codes”, IEEE Transactions on Communications, vol. 60, no. 11, pp. 3221–3227, Nov. 2012.
 [6] I. Tal and A. Vardy, “List decoding of polar codes”, IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
 [7] P. Trifonov and V. Miloslavskaya, “Polar Subcodes”, IEEE Journal on Selected Areas in Communications, vol. 34, no. 2, pp. 254–266, Feb. 2016.
 [8] T. Wang, D. Qu and T. Jiang, “Parity-check-concatenated polar codes”, IEEE Communications Letters, vol. 20, no. 12, pp. 2342–2345, Dec. 2016.
 [9] R1-1611254, “Details of the polar code design”, Huawei, HiSilicon, 3GPP TSG RAN WG1 #87 Meeting, Reno, USA, Nov. 10th–14th, 2016.
 [10] “Chairman’s notes: RAN1”, 3GPP TSG RAN WG1 NR AdHoc Meeting #2, Qingdao, China, 27th–30th Jun. 2017.
 [11] X. Liu et al., “β-expansion: A theoretical framework for fast and recursive construction of polar codes”, in Proc. IEEE Globecom, Dec. 2017.
 [12] L. Zhang, Z. Zhang, X. Wang, Q. Yu and Y. Chen, “On the puncturing patterns for punctured polar codes”, in Proc. IEEE ISIT, pp. 121–125, Jun. 2014.
 [13] K. Niu, K. Chen and J. R. Lin, “Beyond Turbo codes: Rate-compatible punctured Polar codes”, in Proc. IEEE ICC, pp. 3423–3427, Jun. 2013.
 [14] R. Wang and R. Liu, “A novel puncturing scheme for polar codes,” IEEE Communications Letters, vol. 18, no. 12, pp. 2081–2084, 2014.
 [15] R1-167533, “Examination of NR coding candidates for low rate applications”, MediaTek Inc., 3GPP TSG RAN WG1 #86 Meeting, Gothenburg, Sweden, Aug. 22nd–26th, 2016.