Mutual Information-Maximizing Quantized Belief Propagation Decoding of LDPC Codes
Abstract
A severe problem for mutual information-maximizing lookup table (MIM-LUT) decoding of low-density parity-check (LDPC) codes is the high memory cost of using large tables, while decomposing large tables into small tables degrades decoding error performance. In this paper, we propose a method, called mutual information-maximizing quantized belief propagation (MIM-QBP) decoding, to remove the lookup tables used for MIM-LUT decoding. Our method leads to a very practical decoder, namely the MIM-QBP decoder, which can be implemented based only on simple mappings and fixed-point additions. We further present how to practically and systematically design the MIM-QBP decoder for both regular and irregular LDPC codes. Simulation results show that the MIM-QBP decoder can always considerably outperform the state-of-the-art MIM-LUT decoder. Furthermore, the MIM-QBP decoder with only 3 bits per message can outperform the floating-point belief propagation (BP) decoder in the high signal-to-noise ratio (SNR) region when tested on high-rate codes with a maximum of 10–30 iterations.
I Introduction
Low-density parity-check (LDPC) codes [1] have been widely applied to communication and data storage systems due to their capacity-approaching performance. For the sake of simple hardware implementation, many efforts have been devoted to efficiently representing messages for LDPC decoding [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. Among them, Chen et al. [2] approximated the belief propagation (BP) algorithm by representing log-likelihood ratios (LLRs) with a low resolution, generally 5 to 7 bits. The works in [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] focused on finite alphabet iterative decoding (FAID), which uses messages represented by symbols from finite alphabets instead of messages represented by LLRs. FAID algorithms with messages represented by 3 to 4 bits can approach the floating-point BP algorithm within 0.2 dB [3, 4, 5, 6, 7, 8, 9, 12, 13, 10, 11, 14].
Non-uniform quantized BP (QBP) algorithms were investigated in [4, 5, 6], where a decoder was implemented based on simple mappings and additions (including subtractions). However, since only the decoding of the (3, 6) LDPC code (a code with variable node (VN) degree 3 and check node (CN) degree 6) was considered and a significant amount of hand optimization was needed for the decoder design [4, 5, 6], we can hardly generalize the design to a different scenario.
Recently, mutual information-maximizing lookup table (MIM-LUT) decoding was considered in [7, 8, 9, 12, 10, 11, 13, 14], among which [13] and [14] focused on the decoding of irregular LDPC codes. An MIM-LUT decoder can reduce the hardware complexity and increase the decoding throughput. However, a serious memory-requirement problem may arise when the sizes of the lookup tables (LUTs) are large. To avoid this problem, these tables were decomposed into small tables at the cost of degrading the decoder's error performance [7, 8, 9, 10, 11, 12, 13, 14].
In this paper, we propose a method, called mutual information-maximizing quantized belief propagation (MIM-QBP) decoding, to remove the tables used for MIM-LUT decoding [7, 8, 9, 10, 11, 12, 13, 14] so as to greatly reduce memory costs. Our method leads to a hardware-friendly decoder, namely the MIM-QBP decoder, which can be implemented based only on simple mappings and fixed-point additions (including subtractions). From this point of view, our decoder works similarly to those presented in [4, 5, 6], but instead of relying on hand optimization, we show how to practically and systematically design the MIM-QBP decoder for both regular and irregular LDPC codes. Simulation results show that the MIM-QBP decoder can always considerably outperform the state-of-the-art MIM-LUT decoder [7, 8, 9, 10, 11, 12, 13, 14]. Moreover, the MIM-QBP decoder with only 3 bits per message can outperform the floating-point BP decoder in the high signal-to-noise ratio (SNR) region when tested on high-rate codes with a maximum of 10–30 iterations.
The remainder of this paper is organized as follows. Section II first introduces the optimal quantization method for the binary-input discrete memoryless channel (DMC), then reviews MIM-LUT decoding and highlights the connection between the two topics. Section III shows the necessity of removing the tables used for MIM-LUT decoding, and then proposes MIM-QBP decoding for regular LDPC codes. Section IV presents how to practically design the MIM-QBP decoder. Section V illustrates the design of the MIM-QBP decoder for irregular LDPC codes. Section VI presents the simulation results. Finally, Section VII concludes this paper.
II Preliminaries
II-A Mutual Information-Maximizing Quantization of Binary-Input DMC
Consider the quantization of a binary-input DMC as shown in Fig. 1. The channel input takes values from with probability and , respectively. The channel output takes values from with channel transition probability given by , where and . The channel output is quantized to , which takes values from . A well-known criterion for channel quantization [15, 16] is to design a quantizer that maximizes the mutual information (MI) between and , i.e.
(1) 
where and .
A deterministic quantizer (DQ) means that for each , there exists a unique such that and for . Let denote the preimage of . We call a sequential deterministic quantizer (SDQ) [16] if it can be equivalently described by an integer set with as follows
We thus also call an SDQ.
According to [15], in (II-A) must be deterministic; meanwhile, is an optimal SDQ when the elements in are relabelled to satisfy
(2) 
Note that after merging any two elements with , the resulting optimal quantizer is as optimal as the original one [15]. A method based on dynamic programming (DP) [17, Section 15.3] was proposed in [15] to find with complexity . Moreover, a general framework has been developed in [16] for applying DP to find an optimal SDQ that maximizes for cases where the labelling of the elements in is fixed and is an SDQ.
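As a concrete illustration of the DP idea, the following Python sketch finds an optimal K-level SDQ for a binary-input DMC by maximizing mutual information over contiguous groupings of the LLR-sorted channel outputs. It assumes equiprobable inputs; the function names and the 2 x M array layout are our own choices rather than notation from [15, 16].

```python
import numpy as np

def partial_mi(p0, p1):
    """Partial mutual information contributed by merging a group of
    channel outputs into one quantizer output, assuming equiprobable
    inputs.  p0/p1 are the summed transition probabilities P(group|x)."""
    t = 0.0
    pz = 0.5 * (p0 + p1)                       # P(Z in group)
    for px in (p0, p1):
        joint = 0.5 * px                       # P(x, Z in group)
        if joint > 0:
            t += joint * np.log2(joint / (0.5 * pz))
    return t

def optimal_sdq(P, K):
    """DP search for an optimal K-level SDQ.  P is a 2 x M array of
    P(y|x) with columns already sorted in decreasing LLR order.
    Returns (group boundaries, maximum mutual information)."""
    M = P.shape[1]
    c0 = np.concatenate(([0.0], np.cumsum(P[0])))
    c1 = np.concatenate(([0.0], np.cumsum(P[1])))

    def cost(a, b):                            # merge outputs a..b-1
        return partial_mi(c0[b] - c0[a], c1[b] - c1[a])

    S = np.full((K + 1, M + 1), -np.inf)       # best MI with k groups over first b outputs
    back = np.zeros((K + 1, M + 1), dtype=int)
    S[0][0] = 0.0
    for k in range(1, K + 1):
        for b in range(k, M + 1):
            for a in range(k - 1, b):
                v = S[k - 1][a] + cost(a, b)
                if v > S[k][b]:
                    S[k][b], back[k][b] = v, a
    bounds, b = [], M                          # trace back the group boundaries
    for k in range(K, 0, -1):
        bounds.append(b)
        b = back[k][b]
    return bounds[::-1], S[K][M]
```

For a symmetric 4-output channel quantized to 2 levels, the DP recovers the symmetric split, and the resulting MI equals that of the equivalent binary symmetric channel.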
The quantization model in Fig. 1 can be used to quantize a binary-input continuous memoryless channel, such as the binary-input additive white Gaussian noise (AWGN) channel. This task can be done by first uniformly quantizing the AWGN channel to a DMC with outputs, where . Then, the quantization model in Fig. 1 is applicable. If we use an SDQ to implement the quantization, the SDQ can be equivalently described by thresholds with , such that for any continuous channel output , its quantization output is given by
(3) 
More details can be found in [16]. Given , implementing the quantization in (3) has complexity , as illustrated in Fig. 2 for .
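A minimal sketch of this threshold-based quantization: with the SDQ stored as a sorted threshold list, a binary search locates the quantization region, matching the logarithmic complexity noted above. The function name is illustrative.

```python
import bisect

def sdq_quantize(y, thresholds):
    """Map a continuous channel output y to a quantizer index k such
    that thresholds[k-1] <= y < thresholds[k], with thresholds sorted
    in ascending order.  The binary search gives the logarithmic
    complexity in the number of quantization levels noted in the text."""
    return bisect.bisect_right(thresholds, y)
```

For example, with thresholds [-1.0, 0.0, 1.0] defining four regions, an output of 0.3 falls in region 2 and an output of -2.0 in region 0.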
II-B MIM-LUT Decoder Design for Regular LDPC Codes
Consider a binary-input DMC. Denote the channel input by , which takes values from with equal probability, i.e., . Denote as the DMC output, which takes values from with channel transition probability . By using the quantization method introduced in Section II-A, we can set according to our needs for different decoding iterations.
Consider designing a quantized message passing (MP) decoder for a regular LDPC code. Denote and as the alphabets of messages passed from VNs to CNs and vice versa. Note that and their related notations may or may not vary with iterations. We use these notations without specifying their associated iterations, because after the decoder design is specified for one iteration, the design is clear for all iterations.
For the message (resp. ) passed from a VN to a CN (resp. from a CN to a VN), we use (resp. ) to denote the probability mass function (pmf) of (resp. ) conditioned on the channel input bit . If the code graph is cycle-free, (resp. ) conditioned on is independent and identically distributed (i.i.d.) with respect to different edges at the same iteration. The design of the MIM-LUT decoder [7, 8, 9, 12, 10, 11, 13, 14] is carried out by using density evolution [18, 4] (by tracing and ) under the assumption of a cycle-free code graph. However, the MIM-LUT decoder can work well on code graphs containing cycles.
For each iteration, the design of the update function (UF)
(4) 
for the CN update comes first, where the CN update is shown in Fig. 3(a). The MIM-LUT decoding methods design to maximize . To better understand this design problem, we can equivalently convert it to a DMC quantization problem, as shown in Fig. 4.
We assume is known, because for the first iteration, can be derived solely from the channel transition probability , and for the other iterations, is known after the design at the VNs is completed. The joint distribution of the incoming messages conditioned on the channel input bit at a CN (i.e., the channel transition probability with respect to the DMC shown in Fig. 4) is given by [9]
(5) 
where is a realization of , is the dimension of , is a realization of , consists of the channel input bits corresponding to the VNs associated with the incoming edges, with denoting the addition in . Based on (5), we have
(6) 
Given , the design of is equivalent to the design of in (II-A) by setting and . We can solve this design problem by using the DP method proposed in [15], after listing in descending order based on (see (2)). After the design of , a LUT is typically used for storing , and the output message is passed to the CN's neighbouring VNs with given by
(7) 
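The CN-side channel transition probability in (5) can be evaluated by brute force for small CN degrees, as in the following sketch. It assumes i.i.d. equiprobable edge bits whose XOR equals the target input bit, with each incoming message depending only on its own edge bit; the function name, array layout, and normalization convention are our own assumptions.

```python
import itertools
import numpy as np

def cn_joint_pmf(Pr, d):
    """Joint pmf of the d = dc - 1 incoming CN messages conditioned on
    the input bit x of the outgoing edge, in the spirit of (5).  The
    edge bits are i.i.d. uniform with XOR equal to x, and each message
    depends only on its own edge bit through the single-edge pmf Pr
    (a 2 x |R| array).  Brute force, feasible for small d only."""
    out = {0: np.zeros((Pr.shape[1],) * d),
           1: np.zeros((Pr.shape[1],) * d)}
    for bits in itertools.product((0, 1), repeat=d):
        x = 0
        t = np.ones(())
        for b in bits:
            x ^= b
            t = np.multiply.outer(t, Pr[b])    # product of per-edge pmfs
        # each of the 2^(d-1) bit vectors with a given parity is equally likely
        out[x] += t / 2.0 ** (d - 1)
    return out
```

Each conditional array sums to one, so the result is a valid channel transition probability for the DMC of Fig. 4.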
Then, the design of the UF
(8) 
for the VN update starts, where the VN update is shown in Fig. 3(b). The MIM-LUT decoding methods also design to maximize . To better understand this design problem, we can equivalently convert it to a DMC quantization problem, as shown in Fig. 5.
The joint distribution of the incoming messages conditioned on the channel input bit at a VN (i.e., the channel transition probability with respect to the DMC shown in Fig. 5) is given by [9]
(9) 
where is a realization of , is a realization of , is the dimension of , and is a realization of .
Given , the design of is equivalent to the design of in (II-A) by setting and . We can solve this design problem by using the DP method proposed in [15], after listing in descending order based on (see (2)). After the design of , a LUT is typically used for storing , and the output message is passed to the VN's neighbouring CNs with given by
(10) 
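Since the incoming messages at a VN are conditionally independent given the channel input bit (on a cycle-free graph), the joint distribution in (9) is simply a product of per-edge pmfs. The following is a hedged sketch; the names and array layout are our own.

```python
import numpy as np

def vn_joint_pmf(Pch, Pc, d):
    """Conditional joint pmf of a VN's incoming messages, in the spirit
    of (9): given the input bit x, the channel message and the
    d = dv - 1 messages from CNs are independent, so the joint pmf is
    an outer product.  Pch is 2 x |L| (channel message pmf) and Pc is
    2 x |S| (single CN-edge message pmf)."""
    out = {}
    for x in (0, 1):
        t = Pch[x]
        for _ in range(d):
            t = np.multiply.outer(t, Pc[x])    # independence given x
        out[x] = t
    return out
```

The resulting arrays are the channel transition probabilities of the DMC in Fig. 5, ready to be quantized by the DP method.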
For each iteration, we can design the estimation function
(11) 
to estimate the channel input bit corresponding to each VN. The design can be carried out similarly to the design of . The two main differences are that i) the incoming message alphabet is changed to and ii) the outgoing message alphabet is changed to . We therefore omit the details.
After finishing the design of , , and for all iterations, the design of the MIM-LUT decoder is completed. In general, (resp. ) is used for all iterations, leading to a 3-bit (resp. 4-bit) decoder. Given , and the maximum allowed number of decoding iterations, the quality of the MIM-LUT decoder heavily depends on the choice of , which is essentially determined by the design noise standard deviation . The maximum noise standard deviation , which can make approach 1 after reaching the maximum decoding iteration, is called the decoding threshold. Empirically, a good should be around , as investigated in [7, 8, 9, 10, 11, 12, 13, 14].
III MIM-QBP Decoding of Regular LDPC Codes
III-A Motivation
When implementing MIM-LUT decoding, , , and are implemented using LUTs, and the decoding then works efficiently via table lookup. The sizes of the tables for implementing , , and are , , and , respectively. Thus, a very large memory requirement may arise in practice due to these tables' sizes. To solve this problem, current MIM-LUT decoding methods [7, 8, 9, 10, 11, 12, 13, 14] decompose , , and into a series of subfunctions, each working on two incoming messages. After this decomposition, the sizes of the tables for implementing , , and are reduced to , , and , respectively. This decomposition technique can significantly reduce the storage cost, but at the same time it degrades the performance of , , and in terms of maximizing MI, as shown in the example below.
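To make the storage gap concrete, the following toy calculation counts LUT entries per iteration for the full update functions versus a chain of two-input subfunctions. The alphabet sizes and the exact decomposition layout here are illustrative assumptions, not figures from the cited works.

```python
def table_sizes(M_sz, L_sz, dc, dv):
    """Illustrative LUT entry counts per iteration.  M_sz is the
    CN/VN message alphabet size, L_sz the channel message alphabet
    size, and dc/dv the CN/VN degrees.  The full CN (VN) table takes
    one entry per incoming message vector; the decomposed version
    chains two-input tables whose outputs stay in the same alphabet."""
    full_cn = M_sz ** (dc - 1)                 # all dc - 1 inputs jointly
    full_vn = L_sz * M_sz ** (dv - 1)          # channel input plus dv - 1 CN inputs
    dec_cn = (dc - 2) * M_sz ** 2              # chain of dc - 2 two-input tables
    dec_vn = L_sz * M_sz + (dv - 2) * M_sz ** 2
    return full_cn, full_vn, dec_cn, dec_vn
```

For instance, with 3-bit messages (alphabet size 8) and a (3, 6) code, the full CN table needs 32768 entries while the decomposed chain needs only 256, which illustrates why decomposition is attractive despite its MI loss.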
Example 1
Consider the UF at a VN (i.e., ). Assume that and that the conditional probabilities and are given by
(12) 
and
(13) 
respectively. Based on (9), the joint distribution , with from the channel and from the CNs, is given by Table I. is an MIM quantizer (in the sense that is the input message) maximizing with .
TABLE I: The joint distribution , listed for channel input bits and .
We now consider decomposing into two subfunctions and , as shown in Fig. 6, where deals with the two incoming messages from the CNs (i.e., ), and deals with the incoming messages, with from the channel and from the output of . The joint distribution is given by Table II(a). is a quantizer maximizing , and we have
(14) 
Using and as input messages, the joint distribution is given by Table II(b). is an MIM quantizer (in the sense that is the input message) maximizing with . The corresponding to and of Table II can be written as
(15) 
which leads to a smaller (i.e., ) than that associated with the of Table I, due to the decomposition.
To overcome this drawback of the MIM-LUT decoding methods [7, 8, 9, 10, 11, 12, 13, 14] due to the use of LUTs, in this work we propose a systematic method, called MIM-QBP decoding, which is implemented based only on simple mappings and additions. Instead of using the decomposition technique, our method can deal with all incoming messages at a node (CN/VN) at the same time without causing any storage problem. MIM-QBP decoding is presented in the next two subsections, for the updates at CNs and VNs, respectively.
III-B CN Update for MIM-QBP Decoding
The framework of the CN update for MIM-QBP decoding is shown in Fig. 7. We implement the CN update in three steps: first, we use a function to map each incoming message symbol to a number; second, we use a function to sum up all incoming messages' numbers (slightly different from summation in the usual sense); third, we use an SDQ to map the summation to the outgoing message symbol. In this way, the CN UF is fully determined by , and . In the rest of this subsection, we show the principles for designing , and so as to obtain a that tends to maximize .
First, we use a reconstruction function (RF)
(16) 
to map each incoming message realization to a specific number in the computational domain , where in general or is considered. Let be the sign of given by
For , let
We suggest to satisfy
(17) 
This suggestion associates with the channel input bit in the following way: we predict to be 0 if and to be 1 if , while indicates the unreliability of the prediction.
Second, we represent each incoming message realization by
(18) 
We predict to be 0 if and to be 1 if , while indicates the unreliability of the prediction. Predicting in this way is consistent with the true situation shown in Fig. 4: is the binary sum of the channel input bits associated with (determined by ), and more incoming messages lead to more unreliability (i.e., larger leads to larger , which is why we regard as the unreliability). Denote
(19) 
Elements in are labelled to satisfy
(20) 
where is a binary relation on defined by
for . Assuming , from (20) we know that we are more likely to predict to be 0 for smaller and to be 1 for larger . Let be a random variable taking values from . We have
(21) 
where and is given by (5).
Third, starting from and , we can use the general DP method proposed in [16] to find an SDQ
(22) 
to maximize (in the sense that the labelling of elements in is fixed and given by (20) and is an SDQ). We also use to generate the threshold set (TS) given by
(23) 
Note that is equivalent to in quantizing to .
Finally, the UF is fully determined by , and in the way given by
(24) 
where is a binary relation on defined by
for . In addition, instead of using (7), we can compute for the outgoing message in a simpler way based on given by
(25) 
Note that is essentially determined by , since and can be computed accordingly after is given. We will illustrate the practical design of in Section IV-A. After finishing the design of given by (24), the storage complexity for storing is ( for storing and for storing ), and is thus negligible. On the other hand, implementing the CN update shown in Fig. 7 to compute one outgoing message has complexity . In detail, computing has complexity (binary operations, mainly additions), which allows a binary tree-like parallel implementation; meanwhile, mapping to based on has complexity (binary comparison operations), which can be explained analogously to Fig. 2. The fast and simple implementation for mapping to indeed benefits from the use of SDQs in (22) and (23). This is the essential reason why we choose SDQs. Instead, if an optimal DQ were used to map to in (22), we would in general need a table of size to store this optimal DQ; at the same time, we might achieve a better and could reduce the computational complexity for mapping to from to .
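The run-time CN update described above can be sketched as follows. We encode the (sign, unreliability) pair as the single key sign * (B - magnitude), with B an upper bound on the magnitude sum, so that the ordering from "strongly bit 0" to "strongly bit 1" becomes monotone and the final SDQ step reduces to a binary search. This linearization, together with B, phi, and the thresholds, is our illustrative stand-in for the quantities designed in the text.

```python
import bisect

def cn_update(incoming, phi, thresholds, B):
    """Sketch of the three-step CN update of Fig. 7.  phi plays the
    role of the reconstruction function, mapping each incoming symbol
    to a signed number; the parity prediction is the product of the
    signs and the unreliability is the sum of magnitudes.  The key
    sign * (B - mag) is large for a reliable bit-0 prediction and
    small (very negative) for a reliable bit-1 prediction, so the SDQ
    thresholds can be applied with an O(log) binary search."""
    sign, mag = 1, 0
    for r in incoming:
        v = phi[r]
        sign *= 1 if v >= 0 else -1            # parity of the sign bits
        mag += abs(v)                          # accumulated unreliability
    key = sign * (B - mag)
    return bisect.bisect_right(thresholds, key)
```

With phi = {0: 2, 1: -2, 2: 1, 3: -1}, B = 10, and thresholds [-5, 0, 5], the pair of "reliable 0" symbols [0, 2] maps to the top output symbol, while the mixed-sign pair [0, 1] maps to the bottom one.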
III-C VN Update for MIM-QBP Decoding
The framework of the VN update for MIM-QBP decoding is shown in Fig. 8. We implement the VN update in three steps: first, we use two functions and to map each incoming message symbol from the CNs and the channel, respectively, to a number; second, we use a function to sum up all incoming messages' numbers; third, we use an SDQ to map the summation to the outgoing message symbol. In this way, the VN UF is fully determined by and . In the rest of this subsection, we show the principles for designing and so as to obtain a that tends to maximize .
First, we use a RF
(26) 
to map each incoming message (from CN) realization to , and use another RF
(27) 
to map the incoming message (from channel) realization to . For , let
For , let
We suggest and to satisfy
(28) 
This suggestion associates and with the channel input bit in the following way: is more likely to be 0 (resp. 1) for larger (resp. smaller) and .
Second, we represent each incoming message realization by
(29) 
The channel input bit is more likely to be 0 (resp. 1) for larger (resp. smaller) . Denote
(30) 
Elements in are labelled to satisfy
(31) 
Assuming , from (31) we know that is more likely to be 0 (resp. 1) for larger (resp. smaller) . Let be a random variable taking values from . We have
(32) 
where and is given by (9).
Third, starting from and , we can use the general DP method proposed in [16] to find an SDQ
(33) 
to maximize (in the sense that the labelling of elements in is fixed and given by (31) and is an SDQ). We also use to generate the TS given by
(34) 
Note that is equivalent to in quantizing to .
Finally, the UF is fully determined by , and in the way given by
(35) 
In addition, instead of using (10), we can compute for the outgoing message in a simpler way based on given by
(36) 
Note that is essentially determined by and , since and can be computed accordingly after and are given. We will illustrate the practical design of and in Section IV-B. After finishing the design of given by (35), the storage complexity for storing is ( for storing , for storing , and for storing ), and is thus negligible. On the other hand, implementing the VN update shown in Fig. 8 to compute one outgoing message has complexity . In detail, computing has complexity , which allows a binary tree-like parallel implementation; meanwhile, mapping to based on has complexity , which can be explained analogously to Fig. 2. The fast and simple implementation for mapping to also benefits from the use of SDQs in (33) and (34). If we used the optimal DQ instead, we would in general need a table of size to store it; at the same time, we might achieve a better and could reduce the computational complexity for mapping to from to .
Example 2
We show a practical case to which the framework in Fig. 8 applies. Consider Example 1 again. If we use with
and use with
the TS defined by (34) will be given by
Then, defined by (35) will be exactly the same as the defined by Table I, which maximizes with . Therefore, instead of using Table I to store , we can use , ,