# Design and Analysis of LDGM-Based Codes for MSE Quantization

###### Abstract

Approaching the 1.5329-dB shaping (granular) gain limit in mean-squared error (MSE) quantization of ℝ^n is important in a number of problems, notably dirty-paper coding. For this purpose, we start with a binary low-density generator-matrix (LDGM) code, and construct the quantization codebook by periodically repeating its set of binary codewords, or that set mapped to 4-ary codewords with Gray mapping. The quantization algorithm is based on belief propagation, and it uses a decimation procedure to do the guessing necessary for convergence. Using the results of a true typical decimator (TTD) as reference, it is shown that the asymptotic performance of the proposed quantizer can be characterized by certain monotonicity conditions on the code’s fixed-point properties, which can be analyzed with density evolution, and degree distribution optimization can be carried out accordingly. When the number of iterations is finite, the resulting loss is made amenable to analysis through the introduction of a recovery algorithm from “bad” guesses, and the results of such analysis enable further optimization of the pace of decimation and the degree distribution. Simulation results show that the proposed LDGM-based quantizer can achieve a shaping gain of 1.4906 dB, or 0.0423 dB from the limit, and significantly outperforms trellis-coded quantization (TCQ) at a similar computational complexity.

## I Introduction

The mean-squared error (MSE) quantization problem of ℝ^n
[2, Sec. II-C] can be formulated
as follows:^{1} ^{1}Notational conventions: ℤ and ℝ are
respectively the sets of integers and real numbers.
‖·‖ is the Euclidean norm. |·| is the cardinality
of a set. “≐” denotes asymptotic equality,
usually with respect to the block length.
Entropy and mutual information are computed in base 2, while
log and exp are base e. Bold letters denote
sequences or vectors whose elements are indicated by subscripts,
and sub-sequences are denoted by subscript ranges. Addition and
multiplication on sets apply element-by-element. The modulo
operations used below yield the unique element of
the corresponding coset lying in the chosen fundamental interval.
The unit “b/s”
means “bits per symbol”. Let the quantization codebook (or simply
code) be a discrete subset of
ℝ^n,^{2} ^{2}Ref. [2] assumes
that the codebook is a lattice, but in practice neither the trellis in
TCQ nor the non-binary codebooks proposed here generate lattices.
Therefore, we allow the codebook to be any discrete set, and
definitions are modified accordingly. and let the quantizer map
each source sequence in ℝ^n to a nearby codeword. The
mean-square quantization error, averaged over the source, is given by

(1) |

The objective is to design and a practical quantizer
such that the scale-normalized MSE
is minimized,^{3}^{3}3This agrees with the
definition of for lattices in
[2]. where is the codeword density

(2) |

In this paper we consider asymptotically large dimensionality. By a volume argument, it is easy to find a lower bound on the scale-normalized MSE. This bound can be approached by the nearest-neighbor quantizer with a suitable random codebook, e.g. that in [2], whose codewords’ Voronoi regions are asymptotically spherical, but such a quantizer has complexity exponential in the dimension and is thus impractical. The simplest scalar quantizer, on the other hand, has a 1.5329-dB larger scale-normalized MSE, which corresponds to the well-known 1.53-dB loss of scalar quantization. In general, we call this gap the shaping loss of a quantizer; it is also the gap, from the 1.53-dB limit, of the granular gain and the shaping gain defined in [3] for source and channel coding, respectively.
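The 1.5329-dB figure can be reproduced from the normalized second moments of a cube (the cell of a scalar quantizer, 1/12) and of a sphere in the limit of large dimension (1/(2πe)); a quick sanity check in Python:

```python
import math

# Normalized second moment of a uniform scalar quantizer cell (a cube): 1/12.
G_cube = 1.0 / 12.0
# Asymptotic normalized second moment of an n-sphere as n -> infinity: 1/(2*pi*e).
G_sphere = 1.0 / (2.0 * math.pi * math.e)

# Shaping (granular) gain limit in dB.
gain_db = 10.0 * math.log10(G_cube / G_sphere)
print(round(gain_db, 4))  # -> 1.5329
```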

MSE quantizers with near-zero shaping losses are important in both source and channel coding. In lossy source coding, the shaping loss directly dictates the rate-distortion performance at high rates [3]. In channel coding over Gaussian channels, MSE quantizers can be used for shaping, making the channel input closer to the optimal Gaussian distribution [4]. Basically, instead of transmitting the channel-coded and QAM-modulated signal directly (each element corresponding to one symbol in the code block), we transmit its quantization error with respect to the shaping codebook, which should be closer to Gaussian. The data and the shaping offset are separated at the receiver side, and the shaping loss determines the achievable gap from channel capacity at high SNRs. Shaping is particularly important in dirty-paper coding (DPC) [5] on the channel

(3)  y = x + s + z

where x is the transmitted signal, s is the interference known only at the transmitter, and z is the “MMSE-adjusted” noise. Using an MSE quantizer, an arbitrarily large s can be pre-cancelled without significantly increasing the signal power by transmitting

(4) |

so that the received signal

(5) |

Again, the receiver must separate the data from the quantization offset, and the shaping loss determines the achievable gap from channel capacity. In this case, however, due to the lack of receiver-side knowledge of the interference, the rate loss caused by non-ideal shaping is most significant at low SNRs and can be a significant fraction of channel capacity [6, 7, 8, 9]. For example, the shaping quantizer in [9] has 0.15 dB shaping loss, corresponding to a rate loss of 0.025 b/s, yet in the 0.25-b/s DPC system this is already 10% of the rate and is responsible for 0.49 dB of its 0.83-dB gap from capacity. Apart from its obvious application in steganography [10], DPC and its extension to vector channels (similar in principle to vector precoding [11] but done in both the time and spatial domains) are also essential in approaching the capacity of vector Gaussian broadcast channels such as the MIMO downlink; therefore the design of near-ideal MSE quantizers is of great interest in these applications.
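To illustrate why the known interference cancels without a power penalty, the following minimal sketch substitutes a plain scalar modulo operation for the LDGM-based quantizer of this paper (the interval width `DELTA` and all variable names are hypothetical); the principle is the same as in Tomlinson-Harashima precoding:

```python
import random

DELTA = 8.0  # hypothetical modulo interval width

def mod_delta(v):
    """Symmetric modulo into [-DELTA/2, DELTA/2)."""
    return (v + DELTA / 2.0) % DELTA - DELTA / 2.0

random.seed(0)
u = 1.3                          # data symbol, inside the modulo interval
s = random.uniform(-100, 100)    # large interference, known at the transmitter
x = mod_delta(u - s)             # transmitted: power bounded regardless of s
y = x + s                        # channel output (noise z omitted for clarity)
u_hat = mod_delta(y)             # receiver's modulo removes s entirely
print(round(u_hat, 6))           # -> 1.3
```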

Currently, near-optimal MSE quantizers usually employ trellis-coded quantization (TCQ) [12], in which the codebook is built from the codeword set of a binary convolutional code or of a 4-ary trellis code. The number of required trellis states increases very rapidly as the shaping gain approaches the 1.53-dB limit, so the computational complexity and memory requirement are very high. This is particularly burdensome at the receiver side of DPC systems, where the BCJR (Bahl-Cocke-Jelinek-Raviv) algorithm must be run on the trellis many times in an iterative fashion to separate the data from the shaping offset [9], resulting in a time complexity proportional to both the number of trellis states and the outer iteration count.

Inspired by the effectiveness of Turbo and low-density parity-check (LDPC) codes in channel coding, it is natural to consider the use of sparse-graph codes in quantization. In [13], Turbo codes are used in the quantization of uniform sources, but convergence issues make the scheme usable only for very small block sizes, and the shaping loss is thus unsatisfactory. In [14, 15, 16], it is shown that low-density generator-matrix (LDGM) codes, being the duals of LDPC codes, are good for lossy compression of binary sources, and practical quantization algorithms based on belief propagation (BP) and survey propagation (SP) have also been proposed in [17] and [18], but these works consider binary sources only. Practical algorithms for the MSE quantization of ℝ^n with LDGM codes have received little attention. Even in the binary case, little has been done on the analysis of the BP quantizer’s behavior and the optimization of the LDGM code for it.

In [1], we have addressed the problem of MSE quantization using LDGM-based codes with a periodic structure, where each codeword is a codeword of a binary LDGM code in the binary case, or the Gray-mapped combination of two binary LDGM codewords in the 4-ary case. The degree distributions of the codes are optimized under the erasure approximation, and shaping losses as low as 0.056 dB have been demonstrated.

In this paper, we will improve upon the results in [1] by using better analytical techniques and more accurate methods for code optimization. We start in Section II by analyzing the minimum shaping loss achievable by this -ary structure using random-coding arguments. Although binary quantization codes have significant random-coding loss, they are analyzed first due to their simplicity. In Section III, we present the quantization algorithm for binary codes, which consists, like [18], of BP and a guessing (“decimation”) procedure to aid convergence.

Like LDPC, degree distribution plays an important role in the performance of LDGM quantization codes, but the use of decimation makes direct analysis difficult. To solve this problem, we propose the typical decimator (TD) as a suboptimal but analytically more tractable version of the decimation algorithm, and analyze first its use in the simpler binary erasure quantization (BEQ) problem in Section IV, which also forms the basis for the erasure approximation in [1]. We find that the TD can obtain asymptotically correct extrinsic information for decimation, and a solution to BEQ can be found with such information, as long as the code’s extended BP (EBP) extrinsic information transfer (EXIT) curve [19] characterizing the fixed points of the BP process satisfies certain monotonicity conditions. For a given LDGM code, the most difficult BEQ problem it can solve is then parametrized by a monotonicity threshold , and the degree distribution can be optimized by maximizing this .

In Section V, these arguments are extended to our MSE quantization problem, and similar monotonicity conditions are obtained, which can be checked by quantized density evolution (DE). These DE results can be visualized with modified EXIT curves, and a similar method to the BEQ case can then be used for degree distribution optimization.

We have assumed iteration counts in the above analysis. In Section VI, we proceed to analyze the impact of finite . We will show that a finite causes “bad” guesses in decimation, and a recovery algorithm is sometimes required for BP to continue normally afterwards. With recovery, the loss due to finite can be characterized by the delta-area between the EBP curve and the actual trajectory, which will be used in the subsequent optimization of the pace of decimation as well as the degree distribution.

All these results are extended to 4-ary codes in a straightforward manner in Section VII. Numerical results on MSE performance in Section VIII show that LDGM quantization codes optimized with the aforementioned methods have the expected good performance and can achieve shaping losses of 0.2676 dB at 99 iterations, 0.0741 dB at 1022 iterations, and 0.0423 dB at 8356 iterations, the latter two of which are far better than what TCQ can reasonably offer and are also significantly better than the results in [1]. Indeed, a heuristic analysis of the asymptotic loss-complexity tradeoff carried out in Section IX indicates that LDGM quantization codes can achieve the same shaping loss with far lower complexity than TCQ. We conclude the paper in Section X.

## II Performance Bounds of -ary Quantizers

In this paper, we consider codebooks with a periodic structure, constructed by periodically repeating a finite set of codewords, each labeled by a binary sequence. We call the result an -ary rate- quantization code. In this section, we will analyze the shaping loss achievable with this periodic structure.

Given the source sequence , for each the nearest sequence to in is , where is the quantization error and . The quantizer then has to minimize over all ’s, or equivalently, to maximize

(6) |

for some constant . The chosen is denoted
, the corresponding quantization error is
, and the resulting
MSE (1) then becomes^{4}^{4}4For large ,
(7) is mostly just an average over strongly
typical with respect to the uniform distribution on
, i.e. those whose elements are approximately uniformly
distributed over , and the rest of this paper considers only such
sequences. In shaping and DPC applications, the source can be a
modulated signal that does not follow the uniform distribution, and
in such cases it may be necessary to “dither” before
quantization by adding to it a random sequence uniformly distributed
in and known by the dequantizer, in order to obtain the
expected MSE performance.

(7) |

where denotes averaging over .

### II-A Lower Bound of Quantization Error

Given , for each source sequence , let

(8) |

Since , we can lower-bound the mean-square quantization error as

(9) |

Now let with and , and average over ; then by Jensen’s inequality

(10) |

where can easily be found to be

(11) |

in (7) can be lower-bounded by integrating (10) over . For asymptotically large , we only need to consider (strongly) typical with respect to the uniform distribution on , i.e. whose elements are nearly uniformly distributed over . We thus have

(12)

(13)

This bound holds for any choice of the parameter and is tightest for the value satisfying (hence denoted ), at which point it becomes . and are defined as

(14)

(15)

(16)

### II-B Achievable Quantization Error with Random Coding

For asymptotically large , we will see that the aforementioned lower bound is actually achievable by random coding, that is, with the codewords in independently and uniformly sampled from (allowing for duplicates) and using the nearest-neighbor quantizer.

Again we assume , and since the MSE is bounded for any , we can consider only typical ’s with respect to the uniform distribution on . Define

(17) |

as the set of possible codewords that are “sufficiently close” to the source; its expected size can be computed with large deviation theory. If this expected size grows exponentially, the set is non-empty with asymptotically high probability, and thus some codeword can be found whose MSE toward the source is no more than the target value. Since this is true for most typical source sequences, the average MSE cannot exceed the target by more than a vanishingly small value.

To compute for a typical , we define the type of a sequence as the fraction of each at the positions in whose corresponding elements in are approximately . Denoting the number of sequences with this type as , we have

(18) |

where is the entropy

(19) |

and becomes the constraint

(20) |

According to large deviation theory, is asymptotically the maximum of (18) under the constraints (20) and

(21) |

This is a convex functional optimization problem over (a function of both and ), which can be easily solved with Lagrange multipliers. The maximizing is found to be

(22) |

and the resulting

(23) |

By the argument above, as long as , i.e. , random coding can achieve for asymptotically large .

### II-C Marginal Distribution of Quantization Error

### II-D The Random-Coding Loss

We have shown that a random quantization codebook with the nearest-neighbor quantizer is asymptotically optimal among rate- quantization codes of this periodic form. Therefore, its shaping loss represents the performance limit of such codes and can be viewed as the cost incurred by the periodic structure.

For asymptotically large , the random -ary quantizer has average MSE with and density , so the achieved . The shaping loss can then be expressed as , where is the power of a Gaussian with entropy . We call it the random-coding loss, and it is plotted in Fig. 1 for and . For large and moderate , in (11) approaches a constant, is close to a Gaussian distribution, thus and the random-coding loss is close to zero.

## III The Binary LDGM Quantizer

As random quantization codes with the nearest-neighbor quantizer are obviously impractical to implement, it is natural to look into sparse-graph codes as practical candidates for achieving near-zero shaping losses. In [14], it has been shown that LDPC codes are unsuitable for BEQ but LDGM codes work well; therefore we will also use LDGM codes in MSE quantization. We consider the simplest binary case first, and in Section VII we will look into codes with larger alphabets that are not as limited by the random-coding loss.

We thus consider with
being the codeword set of an LDGM code, i.e. each is of the form ,
where , and the low-density
generator matrix is randomly
generated from some degree distribution that will be optimized below.
Given such a code, in (6) can be
represented by the factor graph [20] in
Fig. 2(a).^{5}^{5}5In the factor
graph, symbols such as and denote variable
and factor nodes, while and are the variables
themselves. denote
the set of indices for which there is an edge connecting
and . In belief propagation,
is the priors on variable ,
is the computed extrinsic probabilities for ,
denotes a message from node to , and so on.
The priors, posteriors and messages are all probability
distributions [20], in this case over
, and here we represent them by probability tuples (rather
than -values, which are equivalent). For example,
is viewed as a tuple
satisfying
(the normalization is done
implicitly), which corresponds to -value . “” and “” refer to the
variable-node and check-node operations in LDPC literature, i.e. (implicitly normalized) and
.
, and
are respectively the
“sure-0”, “sure-1” and “unknown” messages. is the entropy function for . The -nodes (shorthand for the factor nodes
, ) represent the relationship , whereas each factor
in (6) is included in the prior on variable
as

(27) |

where (with ) serves as the normalization factor.
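For concreteness, the variable-node and check-node operations on binary probability tuples described in the footnote above can be sketched as follows (the function names are ours, not the paper’s):

```python
def vnode(p, q):
    """Variable-node combination of two binary probability tuples:
    componentwise product, normalized."""
    r0, r1 = p[0] * q[0], p[1] * q[1]
    z = r0 + r1
    return (r0 / z, r1 / z)

def cnode(p, q):
    """Check-node combination: convolution over GF(2) (XOR of the bits)."""
    return (p[0] * q[0] + p[1] * q[1], p[0] * q[1] + p[1] * q[0])

SURE0, SURE1, UNKNOWN = (1.0, 0.0), (0.0, 1.0), (0.5, 0.5)

# The "sure" and "unknown" messages behave as expected:
assert cnode(SURE0, SURE1) == SURE1      # 0 xor 1 = 1
assert cnode(UNKNOWN, SURE1) == UNKNOWN  # anything xor unknown = unknown
assert vnode(SURE0, UNKNOWN) == SURE0    # a sure message dominates the product
```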

The belief propagation algorithm (also known as the sum-product
algorithm) can then be run on this factor graph. Unlike the case of
LDPC decoding, here BP does not usually converge by itself.^{6}^{6}6Intuitively speaking, when doing LDPC decoding with SNR
higher than threshold, the transmitted codeword is usually much
closer to the received sequence (and thus much more likely) than any
other codeword, allowing BP to converge it. In the case of
quantization with LDGM codes, there are usually a large number of
similarly close codewords to the source sequence, and BP cannot by
itself make a decision among them. Instead, we rely on BP to
generate “extrinsic probabilities” for each
after a number of iterations, with which hard decisions are made on
some ’s (called decimation following
[17]). Subsequent BP iterations use
these hard decisions as priors , and the resulting
updated ’s are used for more decimations. This
iterative process continues until a definite is obtained
that hopefully has a large and thus a small
quantization error. This quantization algorithm is shown in
Fig. 3, with a BP part and a decimation part in each
iteration. As is intuitively reasonable, each time we decimate the
“most certain” bit , with

(28) |

and it is decimated to its most likely value

(29) |

This is called the greedy decimator (GD). Alternatively, for the convenience of analysis, we will also look at the typical decimator (TD), implementable but with worse performance in practice, in which the bit index to decimate is chosen uniformly at random from the set of yet-undecimated bits, and its decimated value is sampled according to its extrinsic probability.
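The two decimators can be sketched as follows, with extrinsic probabilities stored as (P(0), P(1)) tuples; the data structures and function names are hypothetical:

```python
import random

def greedy_pick(extrinsics, undecimated):
    """Greedy decimator: pick the most certain undecimated bit and fix it
    to its likelier value, as in (28)-(29)."""
    j = max(undecimated, key=lambda i: abs(extrinsics[i][0] - 0.5))
    return j, 0 if extrinsics[j][0] >= 0.5 else 1

def typical_pick(extrinsics, undecimated, rng):
    """Typical decimator: uniformly random bit, value sampled from its
    extrinsic probability."""
    j = rng.choice(sorted(undecimated))
    return j, 0 if rng.random() < extrinsics[j][0] else 1

ext = {0: (0.9, 0.1), 1: (0.45, 0.55), 2: (0.2, 0.8)}
und = {0, 1, 2}
assert greedy_pick(ext, und) == (0, 0)   # bit 0 is most certain; set it to 0
j, b = typical_pick(ext, und, random.Random(1))
assert j in und and b in (0, 1)
```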

The number of bits to decimate is controlled through the estimated mutual information in the messages to the -nodes, which is made to increase by a prescribed amount in each iteration. This amount of increase, possibly a function of the current mutual information and hence called the pace of decimation, makes the algorithm terminate within a predictable number of iterations if followed exactly, though the actual iteration count can be somewhat different. Uniform pacing is used in [1], i.e. the increase per iteration is constant. In this paper, the pacing is optimized in Section VI-D to obtain somewhat better MSE performance. Slowing the pace also improves MSE performance, but more iterations are then necessary.

The decimation algorithm can either be unthrottled or throttled. The unthrottled version used in most of our simulations simply decimates until the increase of in the iteration reaches . In the throttled version introduced in [1], the amount of decimation per iteration is instead controlled by , which is smoothly adapted, as shown in Fig. 3, to make increase eventually at the desired pace.

## IV Degree Distribution Optimization for Binary Erasure Quantization

Like LDPC codes, LDGM quantization codes require optimized degree distributions for good MSE performance. The performance of LDGM quantizers has been analyzed previously in [15] for binary sources, but this analysis, based on codeword-counting arguments, is applicable only to nearest-neighbor quantization and not very useful for the above BP quantizer. In [17]’s treatment of LDGM quantization of binary sources, the degree distributions of good LDPC codes from [21] are used directly, inspired by the duality between source and channel coding in the erasure case [14]. In our previous work [1], LDGM degree distributions are instead designed by directly fitting the EXIT curves under the erasure approximation (EA), also known as the BEC (binary erasure channel) approximation [22]. Both methods perform well, but their treatment of decimation is heuristic, and they may thus be suboptimal.

In this and the next section, we will give a detailed analysis of
degree distribution optimization of BP-based LDGM quantizers that
properly takes decimation into account, which should allow better MSE
performance to be attained. Under the erasure approximation, we are
in effect designing an LDGM quantization code for the simpler
binary erasure quantization problem and using it in MSE
quantization.^{7} ^{7}In this paper we consider only codes chosen
randomly, through random edge assignment, from the LDGM code
ensemble with a given degree distribution; therefore only the degree
distribution is subject to optimization, and we will not
distinguish between codes and degree distributions. Therefore, we
will first focus on BEQ in this section, and in
Section V the methods given here will be extended
to MSE quantization, with or without the erasure approximation.

### IV-A Binary Erasure Quantization

The binary erasure quantization problem can be formulated as follows [14]. The source sequence consists of binary and erased positions, where “∗” denotes an erasure, which occurs with probability . A binary code consisting of codewords, each labeled by an information sequence, is then designed according to the erasure probability and the rate. For each source sequence, the quantizer should find a codeword that agrees with it on all non-erased positions. For large block lengths, the fraction of non-erased positions is approximately the non-erasure probability. Ideally the rate can be as small as this non-erasure fraction, but in practice higher rates are necessary.
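As a minimal illustration of the BEQ success criterion, assuming “∗” (here the ASCII `'*'`) marks an erasure:

```python
def beq_agrees(x, c):
    """BEQ success criterion: codeword c must match source x on every
    non-erased position ('*' marks an erasure, which matches anything)."""
    return all(xi == '*' or xi == ci for xi, ci in zip(x, c))

x = "1*0*1"                        # 3 non-erased positions out of 5
assert beq_agrees(x, "11001")      # agrees wherever x is not erased
assert not beq_agrees(x, "01001")  # disagrees at the first position
assert sum(xi != '*' for xi in x) == 3
```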

Similar to (6), can be defined as

(30) |

and the quantizer can equivalently find, for a given , some such that (which then equals 1).

When is the codeword set of an LDGM code and as in Section III, can be described by the factor graph in Fig. 2(a) as well, where each is a normalized version of , i.e. is , or if is respectively , , . Apart from this difference in , the algorithm in Fig. 3 with the typical decimator can be used here for the purpose of analysis, though the recovery algorithm in Section VI-B will be necessary for good performance in practice.

The BEQ problem may alternatively be viewed as a set of linear equations

(31) |

over the binary field , where and are the columns of and that correspond to non-erased positions of . Denoting by the rank of , (31) then has solutions for of the possible ’s, and for other ’s there is no solution at all.
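The rank view of BEQ can be illustrated with a small GF(2) elimination; rows are stored as integer bitmasks, and the toy system below restates the text’s solution-count observation (all names are ours):

```python
def gf2_rank(rows, width):
    """Row rank over GF(2); each row is an integer bitmask of coefficients."""
    basis = [0] * width            # basis[b] holds a row whose leading bit is b
    rank = 0
    for row in rows:
        for b in range(width - 1, -1, -1):
            if not (row >> b) & 1:
                continue
            if basis[b] == 0:      # new pivot found
                basis[b] = row
                rank += 1
                break
            row ^= basis[b]        # eliminate bit b and keep reducing

    return rank

# A system in 3 unknowns whose third equation is the XOR of the first two.
rows = [0b110, 0b011, 0b101]
r = gf2_rank(rows, 3)
assert r == 2
k = 3
assert 2 ** (k - r) == 2  # each consistent right-hand side has 2^(k-r) solutions
assert 2 ** r == 4        # exactly 2^r of the 2^k right-hand sides are consistent
```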

We first assume that (31) has a non-empty set of solutions; the posterior is then a probability distribution that is uniform over this solution set. Using it, similar to the BP-derived extrinsics, the true extrinsic probabilities can now be defined as

(32) |

which depend on the set of decimated bits and their decimated values. Note that each true extrinsic probability can only be 0, 1/2, or 1: it is 0 or 1 if all consistent solutions agree on the value of that bit, and otherwise there must be the same number of solutions taking each value, making it 1/2.
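The fact that true extrinsic probabilities over GF(2) are always 0, 1/2 or 1 can be checked by brute force on a toy solution set (solutions disagreeing in a bit pair up under translation by an element of the solution space):

```python
from fractions import Fraction

def true_extrinsic(solutions, j):
    """P(bit j = 1) under the uniform distribution on the solution set.
    Over GF(2) this is always 0, 1/2, or 1."""
    ones = sum((u >> j) & 1 for u in solutions)
    return Fraction(ones, len(solutions))

# Solution set of a small GF(2) system: a coset of a linear space.
sols = [0b000, 0b011]   # bit 2 is pinned to 0; bits 0 and 1 are free but tied
assert true_extrinsic(sols, 2) == 0
assert true_extrinsic(sols, 0) == Fraction(1, 2)
assert true_extrinsic(sols, 1) == Fraction(1, 2)
```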

Without loss of generality, the typical decimator can be assumed to decimate in the order of . Decomposing into

(33) |

each factor is then the
after the decimation of into
. We therefore construct the fictitious true
typical decimator (TTD), which is just like the TD except that
decimation of is done according to rather
than . Moreover, the TTD shares the source of
randomness with the TD, so decimation is still done in the order of
, and each is decimated to the same value
except to account for the difference between and
.^{8}^{8}8For example, the TD and the TTD can use
the same i.i.d. random sequence in decimation with each uniformly distributed
in , and each is decimated to 0 in the TD if and in the TTD if , and to 1 otherwise. In this way, the
decimation results are always the same if , and are rarely different if
and are close. The TTD in effect samples a
according to the probability distribution
, so it must yield a random solution . If, for every , the TD at the time of
’s decimation has , then it
will run synchronously with the TTD and yield the same solution in
. Otherwise, e.g. if
and for some , then the TD
might decimate to 1, which will eventually result in a
contradiction. Therefore, our first requirement for TD to find a
solution to (31) is that BP must compute the correct
extrinsic probabilities after enough iterations, which is hence called
the extrinsic probability condition.

How, then, to ensure the existence of solutions to (31) for any source sequence? With (8) we may define a count which, for each source sequence, gives the number of solutions to (31); it is positive for solvable sequences and zero for the rest. This count, suitably normalized, is again a uniform distribution over the solvable sequences. We then require that it be uniform over all possible sequences, so that the BEQ problem has solutions for any source sequence. This is the other condition for BEQ to be always solvable by the TD, hence called the equi-partition condition.

For , the two conditions above are now suitable for analysis with density evolution methods, which in the BEQ case can be carried out accurately with EXIT charts, as discussed in the following subsections.

### IV-B Fixed Points and EXIT Curves

We use right-regular, left-irregular LDGM codes for quantization, as suggested by the LDGM-LDPC duality in [14]: all check nodes share the same right-degree, and we denote respectively the fraction of variable nodes with each left-degree and the corresponding fraction of edges.
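The distinction between node-perspective and edge-perspective degree fractions recurs throughout the EXIT analysis; the standard conversion from LDPC literature (edge fraction proportional to degree times node fraction) is sketched here with a hypothetical degree profile:

```python
from fractions import Fraction

def node_to_edge(v):
    """Convert a node-perspective degree distribution {d: fraction of nodes}
    to the edge perspective {d: fraction of edges incident to degree-d nodes}."""
    total = sum(d * vd for d, vd in v.items())
    return {d: d * vd / total for d, vd in v.items()}

# Hypothetical left-degree profile: half the variable nodes have degree 2,
# half have degree 4.
v = {2: Fraction(1, 2), 4: Fraction(1, 2)}
lam = node_to_edge(v)
assert lam == {2: Fraction(1, 3), 4: Fraction(2, 3)}
assert sum(lam.values()) == 1
```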

Assume that the BEQ problem does have solutions for the given
source sequence, with the one found by the TTD denoted accordingly.
Assume additionally
that our TD-based quantizer has decimated a fraction of
the -nodes and has so far maintained synchronization with the
TTD in its decimation decisions; this solution is then consistent with the
current priors and can serve as the reference codeword: all
and ’s, with decimated or not, must
be either or and never contradict
the reference codeword. Denoting by e.g. the average
mutual information (MI) in the ’s from the previous
iteration about their respective reference values , which in
this case is simply the fraction of that equals
,^{9}^{9}9In this paper, all such MIs and EXIT
curves are also averaged over the LDGM code ensemble with the given
degree distribution. Assuming that relevant concentration results
hold, for we can also talk about the convergence
behavior of a specific code using these ensemble-averaged MIs. and
using the usual fact that the factor graph becomes locally tree-like
with high probability as , we can find the EXIT curve
relating the input for the -nodes and their output
, hence called the -curve, to be