Design and Analysis of LDGM-Based Codes for MSE Quantization

# Design and Analysis of LDGM-Based Codes for MSE Quantization

Qingchuan Wang,  Chen He,  The authors are with Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China. E-mail: {r6144,chenhe}@sjtu.edu.cn. This paper was supported in part by National Natural Science Foundation of China Grant No. 60772100 and in part by Science & Technology Committee of Shanghai Municipality Grant No. 06DZ15013. Part of the material in this paper has been presented in [1] at IEEE Global Communications Conference, Washington, DC, November 2007.
###### Abstract

Approaching the 1.5329-dB shaping (granular) gain limit in mean-squared error (MSE) quantization of is important in a number of problems, notably dirty-paper coding. For this purpose, we start with a binary low-density generator-matrix (LDGM) code, and construct the quantization codebook by periodically repeating its set of binary codewords, or them mapped to -ary ones with Gray mapping. The quantization algorithm is based on belief propagation, and it uses a decimation procedure to do the guessing necessary for convergence. Using the results of a true typical decimator (TTD) as reference, it is shown that the asymptotic performance of the proposed quantizer can be characterized by certain monotonicity conditions on the code’s fixed point properties, which can be analyzed with density evolution, and degree distribution optimization can be carried out accordingly. When the number of iterations is finite, the resulting loss is made amenable to analysis through the introduction of a recovery algorithm from “bad” guesses, and the results of such analysis enable further optimization of the pace of decimation and the degree distribution. Simulation results show that the proposed LDGM-based quantizer can achieve a shaping gain of 1.4906 , or 0.0423  from the limit, and significantly outperforms trellis-coded quantization (TCQ) at a similar computational complexity.

granular gain, shaping, LDGM, source coding, decimation, belief propagation, density evolution, performance-complexity tradeoff

## I Introduction

The mean-squared error (MSE) quantization problem of [2, Sec. II-C] can be formulated as follows:111Notational conventions: and are respectively the set of integers and real numbers. is the Euclidean norm. is the cardinality of set . denotes asymptotic equality, usually with respect to block length . , entropy and mutual information are computed in base-2, while and are base-. Bold letters denote sequences or vectors whose elements are indicated by subscripts, e.g. , and sub-sequences are denoted by or . Addition and multiplication on sets apply element-by-element, e.g. . (or simply ) is defined as the unique element of , and similarly or is the unique element of . The unit “” means “bits per symbol”. let be a discrete subset of (the quantization codebook, or simply code)222Ref. [2] assumes that is a lattice, but in practice neither the trellis in TCQ nor the non-binary codebooks proposed here are lattices. Therefore, we allow to be any discrete set, and definitions are modified accordingly. and be a quantizer that maps each to a nearby codeword . The mean-square quantization error, averaged over , is given by

 σ2=limsupM→∞1(2M)n⋅1n∫[−M,M]n∥y−QΛ(y)∥2dy. (1)

The objective is to design and a practical quantizer such that the scale-normalized MSE is minimized,333This agrees with the definition of for lattices in [2]. where is the codeword density

 ρ=limsupM→∞1(2M)n|Λ∩[−M,M]n|. (2)

In this paper we consider asymptotically large dimensionality . By a volume argument, it is easy to find a lower bound for . This bound can be approached by the nearest-neighbor quantizer with a suitable random codebook e.g. in [2], whose codewords’ Voronoi regions are asymptotically spherical, but such a quantizer has exponential complexity in and is thus impractical. The simplest scalar quantizer , on the other hand, has the 1.5329-dB larger , which corresponds to the well-known 1.53-dB loss of scalar quantization. In general, we call the shaping loss of a quantizer, and it is also the gap of the granular gain and shaping gain defined in [3], for source and channel coding respectively, toward the 1.53-dB limit.

MSE quantizers with near-zero shaping losses are important in both source and channel coding. In lossy source coding, the shaping loss naturally dictates rate-distortion performance at high rates [3]. In channel coding on Gaussian channels, MSE quantizers can be used for shaping to make the channel input closer to the optimal Gaussian distribution [4]. Basically, instead of transmitting the channel-coded and QAM-modulated signal (each element of corresponding to one symbol in the code block), we transmit with , which should be closer to Gaussian. and are separated at the receiver side, and the shaping loss determines the achievable gap from channel capacity at high SNRs. Shaping is particularly important in dirty-paper coding (DPC) [5] on the channel

 y=x+s+z, (3)

where is the transmitted signal, is the interference known only at the transmitter, and is the “MMSE-adjusted” noise. Using an MSE quantizer, arbitrarily large can be pre-cancelled without significantly increasing signal power by transmitting

 x=u−s−a, with a=QΛ(u−s), (4)

 y=u−a+z. (5)

Again, the receiver must separate and , and the shaping loss determines the achievable gap from channel capacity. In this case, however, due to the lack of receiver-side knowledge of , the rate loss caused by non-ideal shaping is most significant at low SNRs and can be a significant fraction of channel capacity [6, 7, 8, 9]. For example, the shaping quantizer in [9] has 0.15  shaping loss, corresponding to a rate loss of 0.025 , yet in the 0.25-b/s DPC system this is already 10% of the rate and is responsible for 0.49  of its 0.83-dB gap from capacity. Apart from its obvious application in steganography [10], DPC and its extension to vector channels (similar in principle to vector precoding [11] but done in both time and spatial domains) are also essential in approaching the capacity of vector Gaussian broadcast channels such as MIMO downlink, therefore the design of near-ideal MSE quantizers is of great interest in these applications.

Currently, near-optimal MSE quantizers usually employ trellis-coded quantization (TCQ) [12], in which or with being respectively the codeword set of a binary convolution code or a 4-ary trellis code. The number of required trellis states increases very rapidly as the shaping gain approaches the 1.53-dB limit, and the computational complexity and memory requirement are thus very high. This is particularly bad at the receiver side of DPC systems, where the BCJR (Bahl-Cocke-Jelinek-Raviv) algorithm must be run many times on the trellis in an iterative fashion to separate and [9], resulting in a time complexity proportional to both the number of trellis states and the outer iteration count.

Inspired by the effectiveness of Turbo and low-density parity-check (LDPC) codes in channel coding, it is natural to consider the use of sparse-graph codes in quantization. In [13] Turbo codes are used in quantization of uniform sources, but convergence issues make the scheme usable only for very small block sizes , and the shaping loss is thus unsatisfactory. In [14, 15, 16], it is shown that low-density generator matrix (LDGM) codes, being the duals of LDPC codes, are good for lossy compression of binary sources, and practical quantization algorithms based on belief propagation (BP) and survey propagation (SP) have also been proposed in [17] and [18], but these works consider binary sources only. Practical algorithms for the MSE quantization of with LDGM codes have not received much attention before. Even in the binary case, little has been done in the analysis of the BP quantizer’s behavior and the optimization of the LDGM code for it.

In [1], we have addressed the problem of MSE quantization using LDGM-based codes of structure , known as -ary codes, where each is a codeword of a binary LDGM code when , and is the combination of two codewords, each from a binary LDGM code, by Gray mapping when . The degree distributions of the codes are optimized under the erasure approximation, and shaping losses as low as 0.056  have been demonstrated.

In this paper, we will improve upon the results in [1] by using better analytical techniques and more accurate methods for code optimization. We start in Section II by analyzing the minimum shaping loss achievable by this -ary structure using random-coding arguments. Although binary quantization codes have significant random-coding loss, they are analyzed first due to their simplicity. In Section III, we present the quantization algorithm for binary codes, which consists, like [18], of BP and a guessing (“decimation”) procedure to aid convergence.

Like LDPC, degree distribution plays an important role in the performance of LDGM quantization codes, but the use of decimation makes direct analysis difficult. To solve this problem, we propose the typical decimator (TD) as a suboptimal but analytically more tractable version of the decimation algorithm, and analyze first its use in the simpler binary erasure quantization (BEQ) problem in Section IV, which also forms the basis for the erasure approximation in [1]. We find that the TD can obtain asymptotically correct extrinsic information for decimation, and a solution to BEQ can be found with such information, as long as the code’s extended BP (EBP) extrinsic information transfer (EXIT) curve [19] characterizing the fixed points of the BP process satisfies certain monotonicity conditions. For a given LDGM code, the most difficult BEQ problem it can solve is then parametrized by a monotonicity threshold , and the degree distribution can be optimized by maximizing this .

In Section V, these arguments are extended to our MSE quantization problem, and similar monotonicity conditions are obtained, which can be checked by quantized density evolution (DE). These DE results can be visualized with modified EXIT curves, and a similar method to the BEQ case can then be used for degree distribution optimization.

We have assumed iteration counts in the above analysis. In Section VI, we proceed to analyze the impact of finite . We will show that a finite causes “bad” guesses in decimation, and a recovery algorithm is sometimes required for BP to continue normally afterwards. With recovery, the loss due to finite can be characterized by the delta-area between the EBP curve and the actual trajectory, which will be used in the subsequent optimization of the pace of decimation as well as the degree distribution.

All these results are extended to -ary codes (where ) in a straightforward manner in Section VII. Numerical results on MSE performance in Section VIII shows that LDGM quantization codes optimized with the aforementioned methods have the expected good performance and can achieve shaping losses of 0.2676  at 99 iterations, 0.0741  at 1022 and 0.0423  at 8356 iterations, the latter two of which are far better than what TCQ can reasonably offer and are also significantly better than the results in [1]. Indeed, a heuristical analysis on the asymptotic loss-complexity tradeoff carried out in Section IX indicates that LDGM quantization codes can achieve the same shaping loss with far lower complexity than TCQ. We conclude the paper in Section X.

## Ii Performance Bounds of m-ary Quantizers

In this paper, we consider with a periodic structure , where is a set of codewords from with each labeled by a binary sequence . We call an -ary rate- quantization code. In this section, we will analyze the achievable shaping loss by this periodic structure.

Given the source sequence , for each the nearest sequence to in is , where is the quantization error and . The quantizer has then to minimize over all ’s, or equivalently, to maximize

 qy(b)=e−t∥z(b)∥2=n∏j=1e−t(yj−uj(b))2I (6)

for some constant . The chosen is denoted , the corresponding quantization error is , and the resulting MSE (1) then becomes444For large , (7) is mostly just an average over strongly typical with respect to the uniform distribution on , i.e. those whose elements are approximately uniformly distributed over , and the rest of this paper considers such only. In shaping and DPC applications, can be a modulated signal that does not follow the uniform distribution, and in such cases it may be necessary to “dither” before quantization by adding to it a random sequence uniformly distributed in and known by the dequantizer, in order to obtain the expected MSE performance.

 (7)

where denotes averaging over .

### Ii-a Lower Bound of Quantization Error

Given , for each source sequence , let

 Qy=∑b∈{0,1}nRqy(b). (8)

Since , we can lower-bound the mean-square quantization error as

 1n∥∥zy∥∥2=−1ntlnqy(by)≥−1ntlnQy. (9)

Now let with and , and average over , then from Jensen’s inequality

 1n⟨∥∥z~y+a∥∥2⟩≥−1nt⟨lnQ~y+a⟩≥−1ntln⟨Q~y+a⟩, (10)

where can easily be found to be

 ⟨Q~y+a⟩=2nRmnn∏j=1Q~yj,with Q~y=m−1∑a=0e−t(~y+a)2I. (11)

in (7) can be lower-bounded by integrating (10) over . For asymptotically large , we only need to consider (strongly) typical with respect to the uniform distribution on , i.e. whose elements are nearly uniformly distributed over . We thus have

 σ2 ≥−1nt∫[0,1)nln⟨Q~y+a⟩d~y (12) ≐1t(lnm−Rln2−∫10lnQ~yd~y). (13)

This bound holds for any and is found to be tightest for satisfying (this is hence denoted ), when it becomes . and are defined as

 Ht =−∫Ipz(z)logpz(z)dz, (14) Pt =∫Iz2pz(z)dz, (15) pz(z) =e−tz2Q~y,z∈I,~y=zmod[0,1). (16)

### Ii-B Achievable Quantization Error with Random Coding

For asymptotically large , we will see that the aforementioned lower bound is actually achievable by random coding, that is, with the codewords in independently and uniformly sampled from (allowing for duplicates) and using the nearest-neighbor quantizer.

Again we assume , and since the MSE is bounded for any , we can consider only typical ’s with respect to the uniform distribution on . Define

 Uy={u∈{0,…,m−1}n∣∣∥(y−u)In∥2≤nPt} (17)

as the set of possible codewords that are “sufficiently close” to , and we can compute with large deviation theory. If it is larger than , with asymptotically high probability , thus some can be found whose MSE toward is no more than . Since this is true for most typical , the average MSE cannot exceed by more than a vanishingly small value.

To compute for a typical , we define the type of a sequence as the fraction of each at the positions in whose corresponding elements in are approximately . Denoting the number of sequences with this type as , we have

 1nlogN[py(u)]≐1m∫m0Hy(u)dy, (18)

where is the entropy

 Hy(u)=−m−1∑u=0py(u)logpy(u), (19)

and becomes the constraint

 1m∫m0(m−1∑u=0(y−u)2Ipy(u))dy≤Pt. (20)

According to large deviation theory, is asymptotically the maximum of (18) under the constraints (20) and

 py(u)≥0,m−1∑u=0py(u)=1,y∈[0,m). (21)

This is a convex functional optimization problem over (a function of both and ), which can be easily solved with Lagrange multipliers. The maximizing is found to be

 py(u)=e−t(y−u)2IQ~y,~y=ymod[0,1), (22)

and the resulting

 1nlog∣∣Uy∣∣≐Ht. (23)

By the argument above, as long as , i.e. , random coding can achieve for asymptotically large .

### Ii-C Marginal Distribution of Quantization Error

From the result in (22), the marginal distribution of an individual under random coding can also be obtained as

 pz(z) =1m∫m0(m−1∑u=0py(u)δ(z−(y−u)I))dy (24) =1m∫m0(m−1∑u=0e−tz2Q~yδ(z−(y−u)I))dy (25) =e−tz2Q~y,z∈I, (26)

where is the Dirac delta function and . This is simply the in (16), and in (14) and in (15) are respectively the entropy and average power of this distribution.

### Ii-D The Random-Coding Loss

We have shown that a random quantization codebook with the nearest-neighbor quantizer is asymptotically optimal among rate- quantization codes of the form . Therefore, its shaping loss represents the performance limit of such codes, and can be viewed as the cost incurred by the period- structure.

For asymptotically large , the random -ary quantizer has average MSE with and density , so the achieved . The shaping loss can then be expressed as , where is the power of a Gaussian with entropy . We called it the random-coding loss, and it is plotted in Fig. 1 for and . For large and moderate , in (11) approaches a constant, is close to a Gaussian distribution, thus and the random-coding loss is close to zero.

## Iii The Binary LDGM Quantizer

As random quantization codes with the nearest-neighbor quantizer are obviously impractical to implement, it is natural to look into sparse-graph codes as practical candidates for achieving near-zero shaping losses. In [14], it has been shown that LDPC codes are unsuitable for BEQ but LDGM codes work well, therefore we will also use LDGM codes in MSE quantization. We consider the simplest case first, and in Section VII we will look into codes with larger that are not as limited by the random-coding loss.

We thus consider with being the codeword set of an LDGM code, i.e. each is of the form , where , and the low-density generator matrix is randomly generated from some degree distribution that will be optimized below. Given such a code, in (6) can be represented by the factor graph [20] in Fig. 2(a).555In the factor graph, symbols such as and denote variable and factor nodes, while and are the variables themselves. denote the set of indices for which there is an edge connecting and . In belief propagation, is the priors on variable , is the computed extrinsic probabilities for , denotes a message from node to , and so on. The priors, posteriors and messages are all probability distributions [20], in this case over , and here we represent them by probability tuples (rather than -values, which are equivalent). For example, is viewed as a tuple satisfying (the normalization is done implicitly), which corresponds to -value . “” and “” refer to the variable-node and check-node operations in LDPC literature, i.e.  (implicitly normalized) and . , and are respectively the “sure-0”, “sure-1” and “unknown” messages. is the entropy function for . The -nodes (shorthand for the factor nodes , ) represent the relationship , whereas each factor in (6) is included in the prior on variable as

 λcj(c)=1Q~yje−t(yj−c)2I=pz((yj−c)I), (27)

where (with ) serves as the normalization factor.

The belief propagation algorithm (also known as the sum-product algorithm) can then be run on this factor graph. Unlike the case of LDPC decoding, here BP does not usually converge by itself.666Intuitively speaking, when doing LDPC decoding with SNR higher than threshold, the transmitted codeword is usually much closer to the received sequence (and thus much more likely) than any other codeword, allowing BP to converge it. In the case of quantization with LDGM codes, there are usually a large number of similarly close codewords to the source sequence, and BP cannot by itself make a decision among them. Instead, we rely on BP to generate “extrinsic probabilities” for each after a number of iterations, with which hard decisions are made on some ’s (called decimation following [17]). Subsequent BP iterations use these hard decisions as priors , and the resulting updated ’s are used for more decimations. This iterative process continues until a definite is obtained that hopefully has a large and thus a small quantization error. This quantization algorithm is shown in Fig. 3, with a BP part and a decimation part in each iteration. As is intuitively reasonable, each time we decimate the “most certain” bit , with

 (28)

and it is decimated to its most likely value

 b∗=argmaxb∈{0,1}νbi∗(b). (29)

This is called the greedy decimator (GD). Alternatively, for the convenience of analysis we will also look at the typical decimator, implementable but with worse performance in practice, in which the bit index to decimate is chosen randomly in (the set of yet undecimated bits) with equal probabilities, and its decimated value is with probability .

The number of bits to decimate is controlled through the estimated mutual information in -to- messages (i.e. the ’s), which is made to increase by about in each iteration. This amount of increase , possibly a function of the current and hence called the pace of decimation, makes the algorithm terminate within iterations if followed exactly, though the actual iteration count can be somewhat different. Uniform pacing is used in [1], i.e.  is a constant . In this paper, the pacing is optimized in Section VI-D to obtain somewhat better MSE performance. Increasing also improves MSE performance, but more iterations would be necessary.

The decimation algorithm can either be unthrottled or throttled. The unthrottled version used in most of our simulations simply decimates until the increase of in the iteration reaches . In the throttled version introduced in [1], the amount of decimation per iteration is instead controlled by , which is smoothly adapted, as shown in Fig. 3, to make increase eventually at the desired pace.

More will be said on the decimation algorithm in Section VI, but we will first discuss the optimization of LDGM’s degree distribution and the choice of in Sections IV and V.

## Iv Degree Distribution Optimization for Binary Erasure Quantization

Like LDPC codes, LDGM quantization codes require optimized degree distributions for good MSE performance. The performance of LDGM quantizers has been analyzed previously in [15] for binary sources, but this analysis, based on codeword-counting arguments, is applicable only to nearest-neighbor quantization and not very useful for the above BP quantizer. In [17]’s treatment of LDGM quantization of binary sources, degree distributions of good LDPC codes in [21] are used directly, inspired by the duality between source and channel coding in the erasure case [14]. In our previous work [1], LDGM degree distributions are instead designed by directly fitting the EXIT curves under the erasure approximation (EA), also known as the BEC (binary erasure channel) approximation [22]. Both methods perform well, but they are heuristic in their analysis of decimation, and may thus be suboptimal.

In this and the next section, we will give a detailed analysis on degree distribution optimization of BP-based LDGM quantizers that properly takes decimation into account, which should allow better MSE performance to be attained. Under the erasure approximation, we are in effect designing an LDGM quantization code for the simpler binary erasure quantization problem and using it in MSE quantization.777In this paper we only consider codes chosen randomly, through random edge assignment, from the LDGM code ensemble with a given degree distribution, therefore only the degree distribution is subjected to optimization, and we will not distinguish between codes and degree distributions. Therefore, we will first focus on BEQ in this section, and in Section V the methods given here will be extended to MSE quantization, with or without the erasure approximation.

### Iv-a Binary Erasure Quantization

The binary erasure quantization problem can be formulated as follows [14]. The source sequence has the form , where “” denotes erased positions and occurs with probability . A binary code consisting of codewords , each labeled by , is then designed according to and the rate . For each , the quantizer should find a codeword such that or for all , i.e.  agrees with on all non-erased positions. The number of non-erased positions in a given is denoted by , which is approximately for large . Ideally can be as small as this , i.e. , but in practice higher rates are necessary.

Similar to (6), can be defined as

 qy(b)=n∏j=1qyj(uj(b)),qyj(uj)={1yj=uj or ∗,0otherwise, (30)

and the quantizer can equivalently find, for a given , some such that (which then equals 1).

When is the codeword set of an LDGM code and as in Section III, can be described by the factor graph in Fig. 2(a) as well, where each is a normalized version of , i.e.  is , or if is respectively , , . Apart from this difference in , the algorithm in Fig. 3 with the typical decimator can be used here for the purpose of analysis, though the recovery algorithm in Section VI-B will be necessary for good performance in practice.

The BEQ problem may alternatively be viewed as a set of linear equations

 bGne=yne (31)

over the binary field , where and are the columns of and that correspond to non-erased positions of . Denoting by the rank of , (31) then has solutions for of the possible ’s, and for other ’s there is no solution at all.

We first assume that (31) has a set of solutions, then is a probability distribution for that is uniform over . Using this , similar to the BP-derived extrinsics , the true extrinsic probabilities of can now be defined as

 νb∗i(b)=py(bi=b|bF∖{i}=b∗F∖{i}),i=1,…,nb, (32)

which depends on the set of decimated bits and their decimated values . Note that can only be , , or : it is if all solutions with have , and otherwise there must be the same number of solutions with and with , making .

Without loss of generality, the typical decimator can be assumed to decimate in the order of . Decomposing into

 py(b)=py(b1)py(b2|b1)⋯py(bnb|bnb−11), (33)

each factor is then the after the decimation of into . We therefore construct the fictitious true typical decimator (TTD), which is just like the TD except that decimation of is done according to rather than . Moreover, the TTD shares the source of randomness with the TD, so decimation is still done in the order of , and each is decimated to the same value except to account for the difference between and .888For example, the TD and the TTD can use the same i.i.d. random sequence in decimation with each uniformly distributed in , and each is decimated to 0 in the TD if and in the TTD if , and to 1 otherwise. In this way, the decimation results are always the same if , and are rarely different if and are close. The TTD in effect samples a according to the probability distribution , so it must yield a random solution . If, for every , the TD at the time of ’s decimation has , then it will run synchronously with the TTD and yield the same solution in . Otherwise, e.g. if and for some , then the TD might decimate to 1, which will eventually result in a contradiction. Therefore, our first requirement for TD to find a solution to (31) is that BP must compute the correct extrinsic probabilities after enough iterations, which is hence called the extrinsic probability condition.

How, then, to ensure the existence of solutions to (31) for any ? We may define with (8) which, for each , gives the number of solutions to (31) and is for ’s and zero for the rest. , if normalized by , is again a uniform distribution over these ’s. We then require , making a uniform distribution over all possible ’s, so that the BEQ problem have solutions for any . This is the other condition for BEQ to be always solvable by the TD, hence called the equi-partition condition.

For , the two conditions above are now suitable for analysis with density evolution methods, which in the BEQ case can be accurately done with EXIT charts, as will be discussed in the following subsections.

### Iv-B Fixed Points and EXIT Curves

We use -regular, -irregular LDGM codes for quantization as suggested by the LDGM-LDPC duality in [14]. Let be the right-degree of all -nodes, and denote as the fraction of -nodes with left-degree and as the corresponding fraction of edges.

Assuming that the BEQ problem does have solutions for the given , with the one found by TTD denoted and . Assuming additionally that our quantizer based on TD has decimated a fraction of the -nodes and has so far maintained synchronization with the TTD in decimation decisions, is then consistent with the current priors and can serve as the reference codeword: all and ’s, with decimated or not, must be either or and never contradict the reference codeword. Denoting by e.g.  the average mutual information (MI) in the ’s from the previous iteration about their respective reference values , which in this case is simply the fraction of that equals ,999In this paper, all such MIs and EXIT curves are also averaged over the LDGM code ensemble with the given degree distribution. Assuming that relevant concentration results hold, for we can also talk about the convergence behavior of a specific code using these ensemble-averaged MIs. and using the usual fact that the factor graph becomes locally tree-like with high probability as , we can find the EXIT curve relating the input for the -nodes and their output , hence called the -curve, to be

 <