Abstract
Low-density parity-check codes, a class of capacity-approaching linear codes, are particularly recognized for their efficient decoding scheme. The decoding scheme, known as the sum-product algorithm, is an iterative algorithm consisting of passing messages between variable and check nodes of the factor graph. The sum-product algorithm is fully parallelizable, owing to the fact that all messages can be updated concurrently. However, since it requires an extensive number of highly interconnected wires, the fully-parallel implementation of the sum-product algorithm on chips is exceedingly challenging. Stochastic decoding algorithms, which exchange binary messages, are of great interest for mitigating this challenge and have been the focus of extensive research over the past decade. They significantly reduce the required wiring and computational complexity of the message-passing algorithm. Even though stochastic decoders have been shown to be extremely effective in practice, the theoretical understanding of such algorithms remains largely limited. Our main objective in this paper is to address this issue. We first propose a novel algorithm referred to as Markov-based stochastic decoding. Then, we provide concrete quantitative guarantees on its performance for tree-structured as well as general factor graphs. More specifically, we provide upper bounds on the first and second moments of the error, illustrating that the proposed algorithm is an asymptotically consistent estimate of the sum-product algorithm. We also validate our theoretical predictions with experimental results, showing that the proposed algorithm achieves performance comparable to other practical stochastic decoders.
A Novel Stochastic Decoding of LDPC Codes with Quantitative Guarantees
Nima Noorshams  Aravind Iyengar 
nnoorshams@cal.berkeley.edu  ariyenga@qti.qualcomm.com 
Qualcomm Research Silicon Valley
Santa Clara, CA, USA
July 3, 2019
1 Introduction
Sparse graph codes, most notably low-density parity-check (LDPC) codes, have been adopted by the latest wireless communication standards [16, 1, 11, 27]. They are known to approach the channel capacity [30, 21, 29, 28]. What makes them even more appealing for practical purposes is their simple decoding scheme [2, 18]. More specifically, LDPC codes are decoded via a message-passing algorithm called sum-product (SP). It is an iterative algorithm consisting of passing messages between variable and check nodes in the factor graph [2, 18]. Since all messages in the SP algorithm can be updated concurrently, the fully-parallel implementation, in which the factor graph is directly mapped onto the chip, is the most efficient. However, due to the complex and seemingly random connections between check and variable nodes in the factor graph, a fully-parallel implementation of SP is challenging. The wiring complexity has a big impact on circuit area and power consumption. Also, longer and more interconnected wires can create more parasitic capacitance and limit the clock rate.
Researchers have suggested various solutions to reduce the implementation complexity of the fully-parallel SP algorithm. Analog circuits have been designed for short LDPC codes [14, 36]. Bit-serial algorithms, where messages are transmitted serially over single wires, have been proposed [6, 7, 3, 5]. Splitting row-modules by partitioning check node operations has been shown to provide substantial gains in area and power efficiency [22, 23]. In another prominent line of work, researchers have proposed various stochastic decoding algorithms [10, 35, 33, 34, 24, 19, 20, 31]. They are all based on a stochastic representation of the SP messages. More precisely, messages are encoded via Bernoulli sequences with the correct marginal probabilities. As a result, the structure of the check and variable nodes is substantially simplified and the wiring complexity is significantly reduced. (The benefits of such decoders are discussed in more detail in Section 3.2.) Stochastic message-passing has also been used in other contexts, among which are distributed convex optimization and learning [17, 13], efficient belief propagation algorithms [25, 26], and efficient learning of distributions [32].
Although experimental results have shown stochastic decoding to be extremely beneficial, to date the mathematical understanding of such decoders is very limited and largely missing from the literature. Since the output of a stochastic decoder is random by construction, it is natural to ask the following questions. How does the stochastic decoder behave on average? Can it be tuned to approach the performance of SP, and if so, how fast? Is the average performance typical, that is, do we have a measure of concentration around the average? The main contribution of this paper is to answer these questions by providing a theoretical analysis of stochastic decoders. To that end, we propose a novel algorithm, referred to as Markov-based stochastic decoding (MbSD), which is amenable to theoretical analysis. We provide quantitative bounds on the first and second moments of the error in terms of the underlying parameters for tree-structured (cycle-free) as well as general factor graphs, showing that the performance of MbSD converges to that of SP.
The remainder of this paper is organized as follows. We begin in Section 2 with some background on the factor graph representation of LDPC codes, the sum-product algorithm, and stochastic decoding. In Section 3, we turn to our main results by introducing the MbSD algorithm, followed by a discussion of its hardware implementation and statements of our main theoretical results (Theorems 1 and 2). Section 4 is devoted to proofs, with some technical aspects deferred to appendices. Finally, in Section 5, we provide experimental results confirming our theoretical predictions.
2 Background and Problem Setup
In this section, we set up the problem and provide the necessary background.
2.1 Factor Graph Representation of LDPC Codes
A low-density parity-check code is a linear error-correcting code satisfying a number of parity-check constraints. These constraints are encoded by a sparse parity-check matrix $H \in \{0,1\}^{m \times n}$. More specifically, a binary sequence $x \in \{0,1\}^n$ is a valid codeword if and only if $Hx = 0$, where all operations are modulo two [30]. A popular approach for modeling LDPC codes is via the notion of factor graphs [18]. A factor graph representing an LDPC code with the parity-check matrix $H$ is a bipartite graph $G = (V, C, E)$, consisting of a set of variable nodes $V = \{1, \dots, n\}$, a set of check nodes $C = \{1, \dots, m\}$, and a set of edges $E$ connecting variable and check nodes, where variable node $i$ and check node $a$ are connected if and only if $H_{ai} = 1$. (In this paper, we use the letters $i, j$ and $a, b$ to denote variable and check nodes, respectively.) A typical factor graph representing an LDPC code (the Hamming code) is illustrated in Figure 1.
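As a concrete illustration, the following Python sketch (assuming NumPy; the helper names are hypothetical) builds the edge list and neighbor sets of a factor graph from a parity-check matrix and tests codeword membership modulo two. The matrix is one standard parity-check matrix of the (7,4) Hamming code, whose columns are the binary representations of 1 through 7; the matrix drawn in Figure 1 may order its columns differently.

```python
import numpy as np

# One common parity-check matrix of the (7,4) Hamming code:
# column j is the binary representation of j + 1.
H = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])


def factor_graph(H):
    """Return the edges and neighbor sets of the factor graph of H.

    Variable node i and check node a are connected iff H[a, i] == 1.
    """
    m, n = H.shape
    edges = [(i, a) for a in range(m) for i in range(n) if H[a, i] == 1]
    var_nbrs = {i: [a for (j, a) in edges if j == i] for i in range(n)}  # N(i)
    chk_nbrs = {a: [i for (i, b) in edges if b == a] for a in range(m)}  # N(a)
    return edges, var_nbrs, chk_nbrs


def is_codeword(H, x):
    """x is a codeword iff every parity check holds, i.e. H x = 0 (mod 2)."""
    return not np.any(H.dot(x) % 2)


edges, var_nbrs, chk_nbrs = factor_graph(H)
print(len(edges))                                      # 12
print(var_nbrs[6], chk_nbrs[2])                        # [0, 1, 2] [3, 4, 5, 6]
print(is_codeword(H, np.array([1, 1, 1, 0, 0, 0, 0])))  # True
print(is_codeword(H, np.array([1, 0, 0, 0, 0, 0, 0])))  # False
```

The number of edges equals the number of ones in the parity-check matrix, which is why sparsity of the matrix directly controls the wiring of a fully-parallel decoder.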
2.2 The Sum-Product Algorithm
Suppose a transmitter sends the codeword $x$ to a receiver over a memoryless, noisy communication channel. Channel models commonly used in practice include the additive white Gaussian noise (AWGN) channel, the binary symmetric channel, and the binary erasure channel. Having received the impaired signal $y$, the receiver attempts to recover the original signal by finding either the global maximum a posteriori (MAP) estimate $\hat{x} = \arg\max_{x} p(x \mid y)$, where the maximization is over all codewords, or the bitwise MAP estimates $\hat{x}_i = \arg\max_{x_i \in \{0,1\}} p(x_i \mid y)$, for every variable node $i$.
Without exploiting the underlying structure of the code, or equivalently its factor graph, MAP estimation is intractable and requires a number of operations exponential in the code length. However, this problem can be circumvented using an algorithm called sum-product (SP), also known as belief propagation. SP is an iterative algorithm consisting of passing messages, in the form of probability distributions, between nodes of the factor graph [2, 18]. It is known to converge to the correct bitwise MAP estimates on cycle-free factor graphs; however, on loopy graphs, which include almost all practical LDPC codes, no such guarantee exists. Nonetheless, the SP algorithm has been shown to be extremely accurate and effective in practice [21, 30].
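We now turn to the description of the SP algorithm. For every variable node $i$, let $N(i)$ denote the set of its neighboring check nodes. Similarly, define $N(a)$, the set of neighboring variable nodes of every check node $a$. The SP algorithm allocates two messages to every edge $(i, a)$, one for each direction. At each iteration $t = 0, 1, 2, \dots$, every variable node $i$ (check node $a$) calculates a message $m_{ia}^t$ (message $m_{ai}^t$) and transmits it to its neighboring check node $a$ (variable node $i$); each message is a probability distribution on $\{0, 1\}$, so that $m(0) = 1 - m(1)$. In updating its messages, every variable node takes into account the incoming messages from its neighboring check nodes as well as the information from the channel, namely the likelihood $\gamma_i(x_i) \propto p(y_i \mid x_i)$, normalized so that $\gamma_i(0) + \gamma_i(1) = 1$. With this notation at hand, the SP algorithm is as follows: initialize the messages from variable to check nodes as $m_{ia}^0 = \gamma_i$, and update the messages for each edge $(i, a)$ and iteration $t$ according to

$$m_{ai}^{t}(1) = \frac{1}{2} - \frac{1}{2} \prod_{j \in N(a) \setminus \{i\}} \left(1 - 2\, m_{ja}^{t}(1)\right) \tag{1}$$

and

$$m_{ia}^{t+1}(1) = \frac{\gamma_i(1) \prod_{b \in N(i) \setminus \{a\}} m_{bi}^{t}(1)}{\gamma_i(1) \prod_{b \in N(i) \setminus \{a\}} m_{bi}^{t}(1) + \gamma_i(0) \prod_{b \in N(i) \setminus \{a\}} m_{bi}^{t}(0)}. \tag{2}$$

Information flow on a factor graph is shown in Figure 2. Upon receiving all the incoming messages, variable node $i$ updates its marginal probability

$$\mu_i^{t}(1) = \frac{\gamma_i(1) \prod_{a \in N(i)} m_{ai}^{t}(1)}{\gamma_i(1) \prod_{a \in N(i)} m_{ai}^{t}(1) + \gamma_i(0) \prod_{a \in N(i)} m_{ai}^{t}(0)}.$$

Accordingly, the receiver estimates the $i$-th bit by $\hat{x}_i^t = \mathbb{1}\{\mu_i^t(1) > 1/2\}$, where $\mathbb{1}\{\cdot\}$ is the indicator function. It should also be mentioned that in practice, in order to reduce quantization error, log-likelihood ratios are mostly used as messages. Moreover, to further simplify the SP algorithm, the check node operation (1) is approximated; the resulting algorithm is known as the min-sum algorithm [8].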
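The following sketch implements the probability-domain updates (1) and (2) on a small tree-structured code and compares the resulting marginals with brute-force bitwise posteriors, illustrating that SP is exact on cycle-free factor graphs. It assumes NumPy, stores every message as the probability that the bit is one, and takes `gamma1[i]` to be the normalized channel likelihood of bit $i$ being one; the function names and the example code are illustrative, not taken from the paper.

```python
import itertools
import numpy as np


def sum_product(H, gamma1, iters):
    """Probability-domain sum-product; every message stores P(bit = 1)."""
    if iters < 1:
        raise ValueError("iters must be >= 1")
    m, n = H.shape
    edges = [(i, a) for a in range(m) for i in range(n) if H[a, i]]
    var_nbrs = {i: {a for (j, a) in edges if j == i} for i in range(n)}
    chk_nbrs = {a: {i for (i, b) in edges if b == a} for a in range(m)}
    m_va = {(i, a): gamma1[i] for (i, a) in edges}  # initialization at t = 0
    for _ in range(iters):
        # Check-to-variable update (1): probability of an odd number of ones.
        m_av = {}
        for (i, a) in edges:
            prod = 1.0
            for j in chk_nbrs[a] - {i}:
                prod *= 1.0 - 2.0 * m_va[(j, a)]
            m_av[(a, i)] = 0.5 - 0.5 * prod
        # Variable-to-check update (2): channel times the other check messages.
        new_va = {}
        for (i, a) in edges:
            p1, p0 = gamma1[i], 1.0 - gamma1[i]
            for b in var_nbrs[i] - {a}:
                p1 *= m_av[(b, i)]
                p0 *= 1.0 - m_av[(b, i)]
            new_va[(i, a)] = p1 / (p1 + p0)
        m_va = new_va
    # Marginals combine the channel with all incoming check messages.
    mu = np.empty(n)
    for i in range(n):
        p1, p0 = gamma1[i], 1.0 - gamma1[i]
        for a in var_nbrs[i]:
            p1 *= m_av[(a, i)]
            p0 *= 1.0 - m_av[(a, i)]
        mu[i] = p1 / (p1 + p0)
    return mu


def bitwise_map_marginals(H, gamma1):
    """Exact P(x_i = 1 | y) by enumerating all codewords (tiny codes only)."""
    n = H.shape[1]
    num, total = np.zeros(n), 0.0
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        if np.any(H.dot(x) % 2):
            continue  # not a codeword
        w = np.prod(np.where(x == 1, gamma1, 1.0 - gamma1))
        num += w * x
        total += w
    return num / total


# Two checks sharing variable node 2: a tree-structured factor graph.
H_tree = np.array([[1, 1, 1, 0, 0],
                   [0, 0, 1, 1, 1]])
gamma1 = np.array([0.8, 0.3, 0.6, 0.1, 0.7])
mu_sp = sum_product(H_tree, gamma1, iters=5)
mu_exact = bitwise_map_marginals(H_tree, gamma1)
print(np.allclose(mu_sp, mu_exact))  # True: SP is exact on this tree
hat_x = (mu_sp > 0.5).astype(int)    # bitwise decisions
```

Running more iterations than the tree requires is harmless here: once the messages on a tree become exact, further updates leave them unchanged.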
[Figure 2: Information flow on a factor graph, panels (a) and (b).]
2.3 Stochastic Decoding of LDPC Codes
Stochastic computation in the context of LDPC decoding was first introduced in 2003 by Gaudet and Rapley [10]. Since then, much research has been conducted in this field and numerous stochastic decoders have been proposed [35, 33, 34, 24, 20, 31]. For instance, Tehrani et al. [33] introduced and exploited the notions of edge memory and noise-dependent scaling in order to make stochastic decoding a viable method for long, practical LDPC codes. Estimating the probability distributions via a successive relaxation method, Leduc-Primeau et al. [20] proposed a scheme with improved decoding gain. More recently, Sarkis et al. [31] extended stochastic decoding to non-binary LDPC codes.
The underlying structure shared by all these methods, and most relevant to our work, is the following. First, they all encode messages by Bernoulli sequences. Second, they all consist of ‘decoding cycles’, which should not be confused with SP iterations (roughly speaking, multiple decoding cycles correspond to one SP iteration). Third, the check node operation is the modulo-two sum (i.e., the message transmitted from a check node to a variable node is equal to the modulo-two sum of the incoming bits). Finally, the variable node operation is the equality (i.e., the message transmitted from a variable node to a check node is equal to one if all incoming bits are one, is equal to zero if all incoming bits are zero, and is equal to the previous decoding cycle’s bit if the incoming bits do not agree). The intuition behind the stochastic variable and check node operations can be obtained by inspecting the SP message updates (1) and (2). More specifically, suppose the bits arriving at a check node from its neighboring variable nodes, other than the destination, are independent Bernoulli random variables whose distributions are the corresponding variable-to-check SP messages. Then the check-to-variable message derived from equation (1) is exactly the probability of having an odd number of ones among these bits (see Lemma 1 in [9]). Therefore, the statistically consistent estimate of the check-to-variable message is the modulo-two sum of the incoming bits. Similarly, to understand the stochastic variable node operation, let a channel bit and the bits arriving from the neighboring check nodes, other than the destination, be independent Bernoulli random variables distributed according to the normalized channel likelihood and the corresponding check-to-variable SP messages, respectively. Then the variable-to-check message derived from equation (2) is the probability that the channel bit equals one, conditioned on the event that all these bits are equal. This supports the intuition that a variable node should transmit the common value whenever all incoming bits are equal.
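Both probabilistic facts behind the stochastic node operations can be checked numerically. The sketch below (plain Python, hypothetical helper names) enumerates all bit patterns of independent Bernoulli inputs with probabilities $p_j$ and compares (i) the probability that their modulo-two sum is one with the parity formula $\tfrac12 - \tfrac12\prod_j (1 - 2p_j)$, and (ii) the probability that the common value is one, given that all bits agree, with the normalized product $\prod_j p_j / (\prod_j p_j + \prod_j (1 - p_j))$ that appears in the SP variable node update.

```python
import itertools
from math import prod


def pattern_prob(bits, ps):
    """Probability of one bit pattern under independent Bernoulli(p_j) bits."""
    return prod(p if b else 1.0 - p for b, p in zip(bits, ps))


def prob_odd_parity(ps):
    """P(x_1 xor ... xor x_d = 1), by brute-force enumeration."""
    return sum(pattern_prob(bits, ps)
               for bits in itertools.product((0, 1), repeat=len(ps))
               if sum(bits) % 2 == 1)


def prob_one_given_agree(ps):
    """P(common value = 1 | all bits are equal), by brute-force enumeration."""
    agree = [bits for bits in itertools.product((0, 1), repeat=len(ps))
             if len(set(bits)) == 1]
    total = sum(pattern_prob(bits, ps) for bits in agree)
    ones = sum(pattern_prob(bits, ps) for bits in agree if bits[0] == 1)
    return ones / total


ps = [0.9, 0.2, 0.65, 0.4]
parity_formula = 0.5 - 0.5 * prod(1.0 - 2.0 * p for p in ps)
equality_formula = prod(ps) / (prod(ps) + prod(1.0 - p for p in ps))
print(abs(prob_odd_parity(ps) - parity_formula) < 1e-12)         # True
print(abs(prob_one_given_agree(ps) - equality_formula) < 1e-12)  # True
```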
3 Algorithm and Main Results
In this section, we introduce the MbSD algorithm, discuss aspects of its hardware design, and state some theoretical guarantees regarding its performance.
3.1 The Proposed Stochastic Algorithm
The MbSD algorithm consists of passing messages between variable and check nodes of the factor graph. These messages are $k$-dimensional binary vectors, where the message dimension $k$ is a fixed design parameter. Nevertheless, the variable and check node updates are elementwise bit operations. Before stating the algorithm, we need to define some notation.

Suppose $y$ is the received signal with likelihoods $\gamma_i(x_i) \propto p(y_i \mid x_i)$, normalized so that $\gamma_i(0) + \gamma_i(1) = 1$, for every variable node $i$. Our algorithm involves messages from the channel to the variable nodes at every iteration $t$. More specifically, let $\Phi_i^t \in \{0,1\}^k$ be the $k$-dimensional binary message from the channel to variable node $i$ at time $t$, with independent and identically distributed (i.i.d.) entries

$$\mathbb{P}\big(\Phi_i^t(\ell) = 1\big) = \gamma_i(1), \qquad \ell = 1, \dots, k.$$

Moreover, let $X_{ia}^t \in \{0,1\}^k$ denote the $k$-dimensional binary message from variable node $i$ to check node $a$ at time $t$. Similarly, let $Y_{ai}^t \in \{0,1\}^k$ be the message from check node $a$ to variable node $i$ at time $t$.
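We also need to define the elementwise modulo-two summation operator $\oplus$, as well as the “equality” operator $\odot$. Suppose $x_1, \dots, x_d$ are arbitrary binary vectors of the message dimension $k$. Then the vector $z = \oplus(x_1, \dots, x_d)$ denotes the modulo-two summation of the vectors $x_1, \dots, x_d$ if and only if

$$z(\ell) = x_1(\ell) + x_2(\ell) + \cdots + x_d(\ell) \pmod 2$$

for all $\ell = 1, \dots, k$. Furthermore, by $z = \odot(x_1, \dots, x_d)$ we mean

$$z(\ell) = \begin{cases} x_1(\ell) & \text{if } x_1(\ell) = x_2(\ell) = \cdots = x_d(\ell), \\ z(\ell - 1) & \text{otherwise,} \end{cases} \tag{3}$$

for all $\ell = 1, \dots, k$. Here, we assume that $z(0)$ is either zero or one, equally likely.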
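A direct transcription of the two elementwise operators, as a sketch in Python with NumPy. The function names `xor_sum` and `equality_op` are our own; following (3), the equality operator repeats its previous output whenever the inputs disagree, and the initial value z(0) is drawn uniformly unless it is supplied explicitly (the `z0` argument exists only to make the example reproducible).

```python
import numpy as np


def xor_sum(*vectors):
    """Elementwise modulo-two sum of equal-length binary vectors."""
    return np.bitwise_xor.reduce(np.stack(vectors), axis=0)


def equality_op(*vectors, z0=None, rng=None):
    """Elementwise 'equality' operator of (3).

    z(l) is the common input bit when all inputs agree at position l,
    and repeats z(l - 1) otherwise; z(0) is 0 or 1 with equal probability.
    """
    x = np.stack(vectors)
    if z0 is None:
        rng = np.random.default_rng() if rng is None else rng
        z0 = int(rng.integers(2))
    z = np.empty(x.shape[1], dtype=x.dtype)
    prev = z0
    for pos in range(x.shape[1]):
        column = x[:, pos]
        if np.all(column == column[0]):  # all inputs agree at this position
            prev = column[0]
        z[pos] = prev
    return z


x1 = np.array([1, 1, 0, 1, 0])
x2 = np.array([1, 0, 0, 1, 1])
print(xor_sum(x1, x2))            # [0 1 0 0 1]
print(equality_op(x1, x2, z0=0))  # [1 1 0 1 1]
```

At the last position the inputs disagree, so the output holds the previous bit (one) rather than following either input.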
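Now, the precise description of the MbSD algorithm is as follows:

1. Initialize the messages from variable nodes to check nodes at time $t = 0$ with the corresponding channel messages.

2. For iterations $t = 0, 1, 2, \dots$, and every edge $(i, a)$:

(a) Update the messages from check nodes to variable nodes: the message from check node $a$ to variable node $i$ is the elementwise modulo-two sum of the incoming messages from all other neighboring variable nodes of $a$.

(b) Update the messages from variable nodes to check nodes by following these steps:

(i) compute the auxiliary variable by applying the equality operator (3) to the channel message of variable node $i$ and the incoming messages from all other neighboring check nodes of $i$;

(ii) update the message entries, for $\ell = 1, \dots, k$, by drawing i.i.d. samples from the set of trailing entries of the auxiliary variable.

3. For every variable node $i$, compute the binary vector obtained by applying the equality operator to its channel message and all of its incoming check-to-variable messages, and update its marginal estimate from the entries of this vector according to (4).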

A few comments regarding the interpretation of the algorithm are worth mentioning at this point. The check-to-variable message update (step (a)) is a statistically consistent estimate of the actual check-to-variable BP message update. However, the same cannot be said about the variable-to-check update (step (b)). As will be shown in Section 4, the equality operator generates Markov chains with desirable properties, thereby justifying the “Markov-based stochastic decoding” terminology. More specifically, the sequence of entries of the auxiliary variable computed in step (b) is a Markov chain whose stationary distribution is the actual variable-to-check BP message. Our objective in step (b) is to estimate this stationary distribution. From basic Markov chain theory, we know that the marginal distribution of a chain converges to its stationary distribution. Therefore, for a large enough message dimension, the empirical distribution of the trailing entries of the auxiliary variable becomes an accurate estimate of the stationary distribution of this Markov chain.
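To illustrate the reasoning behind step (b), the sketch below simulates a single variable-to-check update with Bernoulli inputs: the channel bits and the bits of the incoming check-to-variable messages from the other neighboring check nodes. It applies the equality operator, discards an initial burn-in of the resulting chain, resamples $k$ output bits i.i.d. from the remaining entries, and compares the fraction of ones with the SP value $\prod p / (\prod p + \prod (1 - p))$. The burn-in of $k/2$ entries, the input probabilities, and the helper name are assumptions made for illustration; they are not the exact index set or parameters used in the paper.

```python
import numpy as np


def mbsd_variable_update(probs, k, burn_in, rng):
    """Sketch of one variable-to-check update (hypothetical helper).

    probs: P(bit = 1) for the channel message and each incoming check message.
    burn_in: number of leading chain entries discarded (an assumed choice).
    """
    probs = np.asarray(probs, dtype=float)
    # Independent Bernoulli input vectors, one row per incoming message.
    inputs = (rng.random((probs.size, k)) < probs[:, None]).astype(int)
    agree = (inputs.min(axis=0) == inputs.max(axis=0)).tolist()
    common = inputs[0].tolist()
    # Equality operator: follow the inputs when they agree, else hold the last bit.
    z, prev = [], int(rng.integers(2))  # z(0) is 0 or 1, equally likely
    for same, bit in zip(agree, common):
        if same:
            prev = bit
        z.append(prev)
    # Resample k entries i.i.d. (uniformly, with replacement) after the burn-in.
    return rng.choice(np.array(z[burn_in:]), size=k, replace=True)


rng = np.random.default_rng(0)
probs = [0.7, 0.6, 0.8]  # channel bit and two incoming check-to-variable bits
k = 100_000
msg = mbsd_variable_update(probs, k, burn_in=k // 2, rng=rng)
target = np.prod(probs) / (np.prod(probs) + np.prod(1 - np.asarray(probs)))
print(round(msg.mean(), 3), round(target, 3))  # close for large k
```

The fraction of ones in the outgoing message is only an estimate of the target; its error shrinks as $k$ grows, which is the behavior the theorems below quantify.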
3.2 Discussion on Hardware Implementation
The proposed decoding scheme enjoys all the benefits of traditional stochastic decoders [10, 33]. Since messages between variable and check nodes are binary, stochastic decoding requires substantially lower wiring complexity than fully-parallel sum-product or min-sum implementations. Shorter wires yield a smaller circuit area and a smaller parasitic capacitance, which in turn lead to higher clock frequencies and lower power consumption. Another advantage of stochastic decoding algorithms is the very simple structure of the check and variable nodes. As a matter of fact, check nodes can be implemented with simple XOR gates, and variable nodes can be implemented using a combination of a random number generator, a JK flip-flop, and AND gates [10]. Finally, a very beneficial property of stochastic decoding is that the check node operation (XOR) is associative, so it can be partitioned arbitrarily without introducing any additional error. Mohsenin et al. [23] illustrated that partitioning check nodes can provide significant improvements (by a factor of four) in area, throughput, and energy efficiency.
It should be noted that in this paper, for mathematical convenience, the messages between check and variable nodes are represented by binary vectors. However, to implement the MbSD algorithm, there is no need to buffer all these vectors. We only need to count the number of ones within the relevant range of bits of each block, which can be accomplished by a simple counter. In that respect, MbSD has a great advantage over the algorithm proposed by Tehrani et al. [33], which requires buffering a substantial number of bits on each edge (edge memories). As will be discussed in Section 5, our algorithm has superior bit error rate performance compared to [33], while maintaining the same order of maximum number of clock cycles, thereby achieving comparable, if not better, throughput. Moreover, MbSD is equipped with concrete theoretical guarantees, the subject to which we now turn.
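The counter-based shortcut rests on a simple fact: drawing a bit uniformly at random from a block of $N$ stored bits, $c$ of which are ones, is the same as drawing a Bernoulli($c/N$) bit. The sketch below (plain Python; the class name is hypothetical) keeps only a running count of ones and the block length, then generates outgoing bits from that count. It illustrates the equivalence and does not model any particular circuit.

```python
import random


class OnesCounter:
    """Streams the bits of a block and keeps only the count of ones."""

    def __init__(self):
        self.ones = 0
        self.length = 0

    def push(self, bit):
        self.ones += bit
        self.length += 1

    def sample(self, rng):
        """Equivalent to picking one of the streamed bits uniformly at random."""
        if self.length == 0:
            raise ValueError("no bits have been streamed")
        # floor(u * length) < ones  <=>  u * length < ones, for integer ones
        return 1 if rng.random() * self.length < self.ones else 0


rng = random.Random(1)
bits = [1, 0, 1, 1, 0, 0, 1, 1]  # e.g. the relevant bits of one block
counter = OnesCounter()
for b in bits:
    counter.push(b)
out = [counter.sample(rng) for _ in range(20_000)]
print(counter.ones, counter.length)  # 5 8
print(sum(out) / len(out))           # close to 5 / 8
```

Only two small integers are stored per block, instead of the block itself.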
3.3 Main Theoretical Results
Our results concern both tree-structured (cycle-free) and general factor graphs. Since the factor graphs of randomly generated LDPC codes are locally tree-like [30], understanding the behavior of every decoding algorithm (stochastic as well as deterministic) on trees is of paramount importance. To that end, we first state some quantitative guarantees regarding the performance of the proposed stochastic decoder on tree-structured factor graphs.
Recalling that there exists a unique path between every two variable nodes in a tree, we denote the length of the longest such path (also known as the graph diameter) by $D$. Moreover, we know that the estimates generated by the SP algorithm on a tree converge to the true marginals after $D$ iterations [2]; that is, from then on, the SP marginal of every variable node coincides with its true marginal.
Theorem 1 (Trees).
Consider the sequence of marginals generated by the MbSD algorithm on a tree-structured factor graph. Then, for an arbitrarily small but fixed parameter and a sufficiently large message dimension, we have:

(a) The expected stochastic marginals become arbitrarily close to the true marginals, for all variable nodes.

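(b) Furthermore, the variance of each stochastic marginal is upper-bounded by a quantity that vanishes as the message dimension grows.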
Remarks:
Theorem 1 provides quantitative bounds on the first and second moments of the MbSD marginal estimates. Combining parts (a) and (b), it can easily be observed that
(5) 
for all variable nodes. Therefore, as the message dimension grows to infinity, the sequence of MbSD estimates converges to the true marginal in the mean-square ($L^2$) sense. The rate of convergence, and its dependence on the underlying parameters, is fully characterized in expression (17). It is directly a function of the accuracy parameter and the factor graph structure (diameter, node degrees, etc.), and indirectly (through Lipschitz constants, etc.) a function of the signal-to-noise ratio (SNR).
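The step that combines parts (a) and (b) is the standard bias-variance decomposition of the mean-squared error. Writing $\eta$ for an MbSD marginal estimate and $\mu$ for the corresponding true marginal (symbols chosen here for illustration), it reads as follows; the first term is controlled by part (a) and the second by part (b).

```latex
\begin{aligned}
\mathbb{E}\big[(\eta - \mu)^2\big]
  &= \mathbb{E}\Big[\big((\eta - \mathbb{E}[\eta]) + (\mathbb{E}[\eta] - \mu)\big)^2\Big] \\
  &= \operatorname{Var}(\eta)
     + 2\,(\mathbb{E}[\eta] - \mu)\,\underbrace{\mathbb{E}\big[\eta - \mathbb{E}[\eta]\big]}_{=\,0}
     + \big(\mathbb{E}[\eta] - \mu\big)^2 \\
  &= \big(\mathbb{E}[\eta] - \mu\big)^2 + \operatorname{Var}(\eta).
\end{aligned}
```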
We now turn to the statement of results for LDPC codes with general (loopy) factor graphs. Unlike on tree-structured graphs, neither the existence and uniqueness of SP fixed points on general graphs nor the convergence of the SP algorithm to such fixed points is guaranteed. Therefore, we have to assume that the LDPC code of interest is well-behaved. More precisely, we make the following assumption:
Assumption 1.
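Suppose the SP message updates are consistent, that is, every variable-to-check and check-to-variable SP message converges to a limit as $t \to \infty$, for all directed edges. Equivalently, there exists a sequence converging to zero that bounds, uniformly over all directed edges, the distance between the SP messages at iteration $t$ and their limits, for all $t$.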
For an arbitrarily small accuracy parameter, we define the stopping time
(6) 
By Assumption 1, the stopping time is always finite.
Theorem 2 (General factor graphs).
Consider the marginals generated by the MbSD algorithm on an LDPC code that satisfies Assumption 1. Then, for an arbitrarily small but fixed parameter and a sufficiently large message dimension, we have:

(a) The expected stochastic marginals become arbitrarily close to the SP marginals, for all variable nodes and all iterations up to the stopping time (6).

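(b) Furthermore, over the same horizon, the variance of each stochastic marginal is upper-bounded by a quantity that vanishes as the message dimension grows.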
Remarks:
Theorem 2, in contrast to Theorem 1, provides quantitative bounds on the error over a finite horizon specified by the stopping time (6). After a number of iterations equal to the stopping time, the MbSD marginal estimates become, on average, arbitrarily close to the limiting SP marginals.
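Moreover, since the variance of the stochastic marginals vanishes as the message dimension grows, these random variables become more and more concentrated around their means. Specifically, even a very crude bound obtained from Chebyshev’s inequality [12] shows that the probability of a stochastic marginal deviating from its mean by more than any fixed amount vanishes as the message dimension grows. (This bound could be tightened further by exploiting the Chernoff inequality and concentration of measure [4].)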
Therefore, the performance of the proposed stochastic decoder is expected to converge to that of SP as the message dimension grows.
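For completeness, the Chebyshev step referred to above is the following elementary bound. With $\eta$ again denoting a stochastic marginal (notation chosen here), any variance bound that vanishes as the message dimension grows makes the right-hand side vanish for every fixed deviation $\delta > 0$.

```latex
\mathbb{P}\big(\,|\eta - \mathbb{E}[\eta]| \ge \delta\,\big)
  \;\le\; \frac{\operatorname{Var}(\eta)}{\delta^{2}},
  \qquad \delta > 0.
```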
4 Proof of the Main Results
Conceptually, the proofs of Theorems 1 and 2 are very similar. Therefore, in this section, we only prove Theorem 1, and we highlight the important differences in the proof of Theorem 2 in Appendix D.
The proofs make use of basic probability and Markov chain theory. At a high level, the argument consists of two parts: characterizing the expected messages and controlling the error propagation in the factor graph. As it turns out, the check node operations (modulo-two summation) are consistent on average; that is, the expected messages from check to variable nodes are the same as the SP messages. In contrast, the variable node operations (the equality operator) are only asymptotically consistent, as the message dimension grows. Therefore, for a finite message dimension, the variable node operations introduce error terms that propagate throughout the factor graph. The main challenge is to characterize and control these errors.
4.1 Proof of Part (a) of Theorem 1
We start by stating a lemma which plays a key role in the sequel. Recall the definition of the equality operator from (3).
Lemma 1.
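Suppose that, for $r = 1, \dots, d$, the binary sequences $x_r(1), x_r(2), \dots$ are independent of one another, and that each consists of i.i.d. entries with $\mathbb{P}(x_r(\ell) = 1) = p_r$. Let $z(1), z(2), \dots$ be the output of the equality operator (3) applied to these sequences, with $z(0)$ equal to zero or one with equal probability. Then, assuming $0 < p_r < 1$ for all $r$, the binary sequence $z(\ell)$ forms a time-reversible Markov chain with the following properties:

(a) The transition probabilities are

$$\mathbb{P}\big(z(\ell+1) = 1 \mid z(\ell) = 0\big) = \prod_{r=1}^{d} p_r, \tag{7}$$

$$\mathbb{P}\big(z(\ell+1) = 0 \mid z(\ell) = 1\big) = \prod_{r=1}^{d} (1 - p_r). \tag{8}$$

(b) The stationary distribution is equal to

$$\pi(1) = \frac{\prod_{r=1}^{d} p_r}{\prod_{r=1}^{d} p_r + \prod_{r=1}^{d} (1 - p_r)}, \qquad \pi(0) = 1 - \pi(1).$$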
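Lemma 1 can be sanity-checked by simulation. The sketch below (NumPy; the probabilities and seed are illustrative choices) runs the equality operator on independent Bernoulli input streams and compares the empirical transition frequencies and the fraction of ones with the products in (7) and (8) and with the stationary probability $\prod_r p_r / (\prod_r p_r + \prod_r (1 - p_r))$.

```python
import numpy as np

rng = np.random.default_rng(7)
p = np.array([0.7, 0.6, 0.8])  # P(x_r(l) = 1) for d = 3 input streams
n = 200_000

x = (rng.random((p.size, n)) < p[:, None]).astype(int)
agree = (x.min(axis=0) == x.max(axis=0)).tolist()
first = x[0].tolist()
z, prev = [], int(rng.integers(2))  # z(0) uniform, as in (3)
for a, b in zip(agree, first):
    prev = b if a else prev          # equality operator, one position at a time
    z.append(prev)
z = np.array(z)

from_zero = z[1:][z[:-1] == 0]       # successors of state 0
from_one = z[1:][z[:-1] == 1]        # successors of state 1
alpha_hat, beta_hat = from_zero.mean(), 1 - from_one.mean()
alpha, beta = p.prod(), (1 - p).prod()
print(round(alpha_hat, 3), round(alpha, 3))  # P(0 -> 1), cf. (7)
print(round(beta_hat, 3), round(beta, 3))    # P(1 -> 0), cf. (8)
print(round(z.mean(), 3), round(alpha / (alpha + beta), 3))  # stationary P(1)
```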
The proof of this lemma is straightforward and is deferred to Appendix A. Now, consider the expected value of an entry of the message from variable node $i$ to check node $a$. By construction, and since the underlying random variables are i.i.d., this expected value does not depend on the entry index. Similarly, define the expected value of an entry of the message from check node $a$ to variable node $i$. Taking expectations on both sides of equation (4), we obtain
(9) 
Therefore, in order to upper-bound the error of the expected marginal, we need to calculate the probability that each entry of the binary vector computed in step 3 equals one. From Lemma 1, we know that the entries of this vector form a Markov chain with the following transition probabilities:
(10) 
where the two transition probabilities are multivariate functions of the expected incoming messages, taking values in $[0, 1]$. Recalling basic Markov chain theory, we can express this probability in terms of the stationary distribution, the step index, and the second eigenvalue of the transition matrix [12] (it is not hard to see that the second eigenvalue of this two-state chain equals one minus the sum of its two transition probabilities). After some algebra, we obtain
(11) 
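The two-state identity used in this step can be checked numerically. For a chain with $\mathbb{P}(0 \to 1) = \alpha$ and $\mathbb{P}(1 \to 0) = \beta$ (symbols chosen here; in Lemma 1 they are the products (7) and (8)), the stationary probability of state one is $\pi = \alpha/(\alpha+\beta)$, the second eigenvalue is $\lambda = 1 - \alpha - \beta$, and $\mathbb{P}(z(\ell) = 1) = \pi + (\mathbb{P}(z(0) = 1) - \pi)\lambda^{\ell}$. The sketch below, assuming NumPy, compares this closed form with matrix powers.

```python
import numpy as np


def two_state_chain(alpha, beta):
    """Transition matrix with P(0 -> 1) = alpha and P(1 -> 0) = beta."""
    return np.array([[1.0 - alpha, alpha],
                     [beta, 1.0 - beta]])


def prob_one_closed_form(alpha, beta, p0_one, steps):
    """P(z(steps) = 1) via the stationary distribution and second eigenvalue."""
    pi_one = alpha / (alpha + beta)
    lam = 1.0 - alpha - beta
    return pi_one + (p0_one - pi_one) * lam ** steps


alpha, beta = 0.3 * 0.6 * 0.8, 0.7 * 0.4 * 0.2  # products as in (7) and (8)
P = two_state_chain(alpha, beta)
init = np.array([0.5, 0.5])                      # z(0) uniform, as in (3)
for steps in (1, 5, 20):
    by_matrix = (init @ np.linalg.matrix_power(P, steps))[1]
    print(steps, np.isclose(by_matrix, prob_one_closed_form(alpha, beta, 0.5, steps)))

eigenvalues = np.sort(np.linalg.eigvals(P).real)
print(np.allclose(eigenvalues, [1.0 - alpha - beta, 1.0]))
```

The geometric factor $\lambda^{\ell}$ is what makes the early entries of each vector biased toward the initial state, and why that bias fades for later entries.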
Substituting equation (11) into (9), simplifying the resulting expression, and exploiting the facts
and
(12) 
yields
On the other hand, denoting
(13) 
by definition we have
Since the multivariate function is Lipschitz, assuming
for some positive constant , there exists a constant such that
(14) 
Consequently, in order to upper-bound the error, we need to bound the difference between the expected stochastic messages and the SP messages. The following lemma, proved in Appendix B, addresses this problem.
Lemma 2.
On a tree-structured factor graph and for a sufficiently large message dimension, there exists a fixed positive constant such that
(15) 
for all edges and all iterations. Furthermore, denoting the maximum check and variable node degrees by $d_c$ and $d_v$, respectively, we have
(16) 
for all .
4.2 Proof of Part (b) of Theorem 1
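To streamline the exposition, fix a variable node and an iteration, and consider the entries of the binary vector computed in step 3. As previously stated, this sequence is a Markov chain whose initial state is zero or one with equal probability, and whose transition probabilities are given in (10); more specifically, we have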
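Since the marginal estimate is an empirical average of these binary entries, in order to upper-bound its variance we only need to upper-bound the cross-product terms. Doing so, for any two distinct entry indices, we have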
Now, exploiting the Markov property and equation (11), we can further simplify the aforementioned inequality:
where inequality (i) follows from (12). Invoking Lemma 2 for a sufficiently large message dimension, putting the pieces together, and doing some algebra, we obtain
for all , and .
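The variance bound rests on the fact that correlations in a two-state Markov chain decay geometrically. For a chain started in its stationary distribution $\pi$ (a simplifying assumption made here; the proof above starts from a uniform initial state), $\mathrm{Cov}(z(\ell), z(\ell')) = \pi(0)\pi(1)\lambda^{|\ell-\ell'|}$ with $\lambda$ the second eigenvalue, so the variance of the empirical average of $k$ entries is at most $\pi(0)\pi(1)(1+|\lambda|)/(k(1-|\lambda|))$. The sketch below, assuming NumPy and illustrative transition probabilities, computes the exact variance from matrix powers and checks it against the closed form and this $O(1/k)$ bound.

```python
import numpy as np


def variance_of_mean_exact(alpha, beta, k):
    """Var((1/k) * sum of z(1..k)) for a stationary two-state chain, via matrix powers."""
    P = np.array([[1.0 - alpha, alpha], [beta, 1.0 - beta]])
    pi1 = alpha / (alpha + beta)
    # Cov(z(l), z(l + d)) = P(z(l) = 1) * P(z(l + d) = 1 | z(l) = 1) - pi1^2
    cov = [pi1 * np.linalg.matrix_power(P, d)[1, 1] - pi1 ** 2 for d in range(k)]
    return sum(cov[abs(i - j)] for i in range(k) for j in range(k)) / k ** 2


def variance_of_mean_closed_form(alpha, beta, k):
    """Same quantity using Cov = pi(0) * pi(1) * lambda^|i - j|."""
    pi1 = alpha / (alpha + beta)
    lam = 1.0 - alpha - beta
    total = sum(lam ** abs(i - j) for i in range(k) for j in range(k))
    return pi1 * (1.0 - pi1) * total / k ** 2


def variance_bound(alpha, beta, k):
    """Geometric-series bound pi(0) pi(1) (1 + |lambda|) / (k (1 - |lambda|))."""
    pi1 = alpha / (alpha + beta)
    lam = abs(1.0 - alpha - beta)
    return pi1 * (1.0 - pi1) * (1.0 + lam) / (k * (1.0 - lam))


alpha, beta = 0.144, 0.056  # illustrative transition probabilities
for k in (10, 40, 160):
    exact = variance_of_mean_exact(alpha, beta, k)
    print(k, np.isclose(exact, variance_of_mean_closed_form(alpha, beta, k)),
          exact <= variance_bound(alpha, beta, k))
```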
5 Experimental Results
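[Figure 3: (a) Bit error rate of MbSD for several message dimensions, compared with SP and SD [33]. (b) Bit error rate gap between SP and MbSD versus message dimension, on a log-log scale.]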
To confirm our theoretical predictions, we test the MbSD algorithm on a simple LDPC code. In our experiments, we fix the blocklength (the number of variable nodes) and the number of parity checks (the number of check nodes). Using Gallager’s construction [30], we first generate a (3, 6)-regular LDPC code, in which all variable nodes have degree three and all check nodes have degree six. Then, considering a binary pulse-amplitude modulation (BPAM) system over an AWGN channel, we run MbSD for a fixed number of iterations on several simulated signals in order to compute the bit error rate for different values of the normalized signal-to-noise ratio. (A BPAM system with transmit power one over an AWGN channel with noise variance $\sigma^2$ has an energy per bit to noise power spectral density ratio of $E_b/N_0 = 1/(2R\sigma^2)$, where $R$ is the code rate.) The test is carried out for a number of message dimensions, and the results are compared with the SP algorithm and the stochastic decoding (SD) algorithm proposed by Tehrani et al. [33] (see Figure 3(a)). (In simulating the SD algorithm, we used 30000 ‘decoding cycles’ and an edge memory of length 25, without noise-dependent scaling.) As predicted by our theorems, the performance of MbSD converges to that of SP as the message dimension grows. Therefore, in contrast to the SD algorithm, MbSD is an asymptotically consistent estimate of SP and does not suffer from an error floor. The rate of convergence can be further explored in Figure 3(b), which plots the bit error rate gap (i.e., the difference between the SP and MbSD bit error rates) versus the message dimension. As can be observed, the gap curves are roughly linear on the log-log scale, with slope of magnitude roughly one. This observation is consistent with the bounds of Section 3.3, suggesting an upper bound on the gap that decays inversely with the message dimension.
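The conversion between $E_b/N_0$ and the noise variance, together with the channel likelihoods fed to the decoder, can be sketched as follows. We assume the BPAM mapping $0 \to +1$, $1 \to -1$ and a rate-1/2 code (the design rate of a (3, 6)-regular ensemble); the function names are hypothetical, and this is not the simulation code used for the figures.

```python
import math
import numpy as np


def noise_sigma(ebno_db, rate):
    """Noise std. deviation for unit-power BPAM, from Eb/N0 = 1 / (2 * rate * sigma^2)."""
    ebno = 10.0 ** (ebno_db / 10.0)
    return math.sqrt(1.0 / (2.0 * rate * ebno))


def channel_prob_one(y, sigma):
    """Normalized likelihood P(x = 1 | y) for the mapping 0 -> +1, 1 -> -1."""
    # p(y | x=1) / (p(y | x=0) + p(y | x=1)) = 1 / (1 + exp(2 y / sigma^2))
    return 1.0 / (1.0 + np.exp(2.0 * np.asarray(y) / sigma ** 2))


rng = np.random.default_rng(0)
rate = 0.5
sigma = noise_sigma(2.0, rate)
bits = rng.integers(0, 2, size=8)
y = (1 - 2 * bits) + sigma * rng.standard_normal(8)  # BPAM over AWGN
gamma1 = channel_prob_one(y, sigma)  # per-bit channel inputs to the decoder
print(noise_sigma(0.0, 0.5))         # 1.0
```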
To improve upon the seemingly slow rate of convergence, we make use of noise-dependent scaling (NDS). Sensitivity to the level of random switching activity, referred to as ‘latching’, has been observed to be a major challenge in stochastic decoders [35, 33]. To circumvent this issue, NDS, in which the received log-likelihoods are downscaled by a factor proportional to the SNR, was proposed and shown to be extremely effective [33]. The MbSD algorithm suffers from the latching problem too, especially at high SNR values. Intuitively, sequences generated by the Markov chains of Lemma 1 are likely to be all-one or all-zero sequences when the SNR is sufficiently high. As a consequence, in such cases, the positive constant defined in (15) is likely to be close to zero. The rate of convergence of the expectation, specified in equation (17), is inversely proportional to this constant; therefore, the smaller the constant, the slower the convergence. Resolving this issue requires increasing the switching activity of the Markov chains, which is accomplished by NDS. Figure 4 illustrates the bit error rate versus Eb/N0 for SP, SD with NDS, and MbSD with NDS. (In our simulations, we set the NDS scaling parameter to the optimal choice suggested in [33].) As is evident, the rate of convergence of the MbSD algorithm, and thus its performance, is significantly improved. Moreover, with the same number of decoding cycles, MbSD outperforms SD at high SNRs.
6 Conclusion
In this paper, we studied the theoretical aspects of stochastic decoders, a widely studied solution for the fully-parallel implementation of LDPC decoding on chips. Generally speaking, by encoding messages as binary sequences, stochastic decoders simplify the check and variable node message updates to modulo-two summation and the equality operator, respectively. As it turns out, the check node operation is statistically consistent on average, whereas the variable node equality operation generates a Markov chain with the desired quantity as its stationary distribution. Therefore, for a finite message dimension, the stochastic message updates introduce error terms that propagate through the factor graph. Controlling these errors is the main challenge in the theoretical analysis of stochastic decoders. To formalize these notions, we introduced a novel stochastic algorithm, referred to as Markov-based stochastic decoding, and provided concrete theoretical guarantees on its performance. More precisely, we showed that the expected marginals produced by MbSD become arbitrarily close to the marginals generated by the SP algorithm on tree-structured as well as general factor graphs. The rate of convergence is governed by the message dimension, the graph structure, and the Lipschitz constant, as formally specified in equation (17). Moreover, we proved that the variance of the MbSD marginals is upper-bounded by a quantity that vanishes as the message dimension grows. These theoretical predictions were also supported by experimental results. We showed that, with the same order of decoding cycles, our algorithm does not suffer from an error floor and therefore achieves better bit error rate performance than competing methods.
Acknowledgements
The authors would like to thank Aman Bhatia for providing the C++ code simulating the sum-product and stochastic decoding algorithms.
Appendix A Proof of Lemma 1
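Let $x_r(\ell)$, $p_r$, and $z(\ell)$ be as in Lemma 1. By the definition of the equality operator (3), we have

$$z(\ell+1) = \begin{cases} x_1(\ell+1) & \text{if } x_1(\ell+1) = \cdots = x_d(\ell+1), \\ z(\ell) & \text{otherwise.} \end{cases}$$

Therefore, given $z(\ell) = 0$, and regardless of the earlier values $z(0), \dots, z(\ell-1)$, the event $\{z(\ell+1) = 1\}$ is equivalent to $\{x_1(\ell+1) = \cdots = x_d(\ell+1) = 1\}$. Hence,

$$\mathbb{P}\big(z(\ell+1) = 1 \mid z(\ell) = 0, z(\ell-1), \dots, z(0)\big) \overset{(i)}{=} \prod_{r=1}^{d} \mathbb{P}\big(x_r(\ell+1) = 1\big) = \prod_{r=1}^{d} p_r,$$

where equality (i) follows from the i.i.d. nature of the sequences, since $z(0), \dots, z(\ell)$ depend only on the initial state and on the inputs up to index $\ell$. The exact same argument yields equation (8). Finally, the stationary distribution can be obtained from the detailed balance condition [12]

$$\pi(0) \prod_{r=1}^{d} p_r = \pi(1) \prod_{r=1}^{d} (1 - p_r),$$

which, together with $\pi(0) + \pi(1) = 1$, gives the stationary distribution stated in Lemma 1.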
Appendix B Proof of Lemma 2
Recall the binary messages from steps 2(a) and 2(b) of the main algorithm, as well as the definition of the expected messages. From Lemma 1 of [9], we know that
(18) 
On the other hand, by construction we have
Since, according to Lemma 1, the entries of the auxiliary variable form a Markov chain with transition probabilities
basic Markov chain theory yields
Therefore, after some algebra, and noting the facts that
and
we have