# Replica analysis and approximate message passing decoder for superposition codes

###### Abstract

Superposition codes are efficient for the additive white Gaussian noise channel. We provide here a replica analysis of the performance of these codes for large signals. We also consider a Bayesian approximate message passing decoder based on a belief-propagation approach, and discuss its performance using the density evolution technique. Our main findings are: 1) for the sizes we can access, the message-passing decoder outperforms other decoders studied in the literature; 2) its performance is limited by a sharp phase transition; and 3) while these codes reach capacity as the section size $B$ (a crucial parameter of the code) increases, the performance of the message-passing decoder worsens as the phase transition goes to lower rates.

Superposition coding is a scheme for error correction over the Additive White Gaussian Noise (AWGN) channel where a codeword is a sparse linear superposition of the columns of a random independent and identically distributed (i.i.d.) matrix. Even though it has been shown in [1, 2] that these codes (in a proper limit) are reliable, up to an exponentially small error, at all rates up to capacity, the performance of the overall scheme depends heavily on the efficiency of the decoder. In particular, the authors of [1] proposed an iterative algorithm called the adaptive successive decoder, later improved in [16]. In the present work, we expose another kind of iterative procedure, based on a Bayesian approach combined with a Belief-Propagation (BP) type algorithm, using techniques originally developed for compressed sensing: the so-called Approximate Message Passing (AMP) algorithm [3, 4, 5, 6]. Much in the same way BP is used in the context of Low-Density Parity-Check (LDPC) codes [7], the AMP approach combines the knowledge of the noise and signal statistics with the powerful inference capabilities of BP. A second contribution we provide is the computation of the performance of these codes under optimal decoding, using the (non-rigorous) replica method [8, 9]. The approach is deeply related to the one previously applied to LDPC codes.

This contribution is organized as follows: in Sec. I, we briefly present superposition codes and introduce the notations. The replica analysis is performed in Sec. II. In Sec. III we describe the AMP algorithm, and we study its performance by the Density Evolution (DE) technique [4] in Sec. IV. Finally, Sec. V presents a numerical study of the performance for finite-size signals.

## I Superposition codes

We refer the reader to the original papers [1, 10, 2] for a full description of superposition codes. The message to be transmitted is a string $\mathbf{s}=(s_1,\dots,s_L)$ where each $s_l\in\{1,\dots,B\}$. It is converted into a binary string $\mathbf{x}$ of dimension $N=LB$ where, in each of the $L$ sections of size $B$, there is a unique nonzero value at the position corresponding to the state of the associated variable in $\mathbf{s}$ (using a power of $2$ for $B$ ensures that this step is trivial). One then introduces the coding matrix $\mathbf{F}$ of dimensions $M\times N$ ($M<N$) whose elements are i.i.d. Gaussian distributed with mean $0$ and variance $1/L$. The codeword (there are $B^L$ of them) reads $\mathbf{F}\mathbf{x}$.

We choose the scaling of $\mathbf{F}$ such that $\mathbf{F}\mathbf{x}$ has constant unit power; the only relevant parameter is thus the signal-to-noise ratio $\mathrm{snr}=1/\sigma^2$. The dimension of $\mathbf{y}$ is $M=L\log_2(B)/R$, where $R$ is the coding rate in bits per channel use. The noisy output of the AWGN channel with capacity $C=\frac{1}{2}\log_2(1+\mathrm{snr})$ reads

$$\mathbf{y}=\mathbf{F}\mathbf{x}+\boldsymbol{\xi} \qquad (1)$$

where $\boldsymbol{\xi}$ has i.i.d. components $\xi_\mu\sim\mathcal{N}(0,\sigma^2)$, a Gaussian distribution with mean $0$ and variance $\sigma^2$. Now comes the crucial point that underlies our approach: consider that each section of $B$ variables in $\mathbf{x}$ is one single $B$-dimensional ($B$-d) variable, on which we have the prior information that it is zero in all dimensions but one. In other words, instead of dealing with an $N$-d vector with elements $x_i$, we deal with an $L$-d vector whose elements are $B$-d vectors $\mathbf{x}_l$ ($\mathbf{x}_l$ thus contains the information on the $l$-th section of $B$ variables in $\mathbf{x}$). In this setting, decoding $\mathbf{x}$ from the knowledge of $\mathbf{y}$ and the dictionary $\mathbf{F}$ is a (multidimensional) linear estimation problem with element-wise prior information on the signal. This is exactly the kind of problem considered in the Bayesian approach to compressed sensing [5, 11, 6, 12].
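As a concrete illustration of this construction, the following sketch (the helper name `encode` and the parameter values are ours, not from the paper) builds the one-hot sectioned signal, a Gaussian dictionary with variance $1/L$, and a noisy codeword of length $M=L\log_2(B)/R$.

```python
import math
import random

def encode(s, B):
    """Map a message s (one symbol in {0,...,B-1} per section) to the
    sparse one-hot signal x of dimension N = L*B."""
    x = []
    for symbol in s:
        section = [0.0] * B
        section[symbol] = 1.0  # unique nonzero value per section
        x.append(section)
    return [v for sec in x for v in sec]

random.seed(0)
Lsec, B, R, snr = 8, 4, 1.0, 15.0
N = Lsec * B
M = int(round(Lsec * math.log2(B) / R))  # codeword length, R in bits/channel use
sigma2 = 1.0 / snr

s = [random.randrange(B) for _ in range(Lsec)]
x = encode(s, B)

# i.i.d. Gaussian dictionary with variance 1/L, so each codeword component
# is a sum of L terms of variance 1/L: unit power
F = [[random.gauss(0.0, 1.0 / math.sqrt(Lsec)) for _ in range(N)] for _ in range(M)]
y = [sum(F[mu][i] * x[i] for i in range(N)) + random.gauss(0.0, math.sqrt(sigma2))
     for mu in range(M)]

print(len(x), sum(1 for v in x if v != 0.0), len(y))  # N, L nonzeros, M
```

With the variance-$1/L$ scaling, each component of $\mathbf{F}\mathbf{x}$ sums $L$ i.i.d. contributions of variance $1/L$, which gives the unit codeword power mentioned above.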

## II Replica analysis

Let us first access the performance of these codes when $B$ is finite, in the large-$L$ limit. We proceed as in the LDPC case [7] and define the "potential" function (or free-entropy [9]) at fixed biased MSE $E$ (defined below) as $\phi(E)=\lim_{L\to\infty}\frac{1}{L}\,\mathbb{E}_{\mathbf{F},\boldsymbol{\xi}}\!\left[\log Z\right]$, where $\mathbb{E}_{\mathbf{F},\boldsymbol{\xi}}$ denotes the average over the random variables $\mathbf{F}$ and $\boldsymbol{\xi}$, and $Z$ is the normalization of the probability measure of the estimator $\hat{\mathbf{x}}$:

$$\hat{P}(\hat{\mathbf{x}}|\mathbf{y})=\frac{1}{Z}\prod_{l=1}^{L}P_0(\hat{\mathbf{x}}_l)\prod_{\mu=1}^{M}\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{1}{2\sigma^2}\left(y_\mu-\sum_{i=1}^{N}F_{\mu i}\hat{x}_i\right)^2} \qquad (2)$$

where $\hat{\mathbf{x}}_l$ is the $l$-th $B$-d variable of the estimator $\hat{\mathbf{x}}$ (i.e. one of its sections, as $\mathbf{x}_l$ for $\mathbf{x}$). Our computation is based on the replica method, a heuristic coming from the statistical physics of disordered systems that relies on the identity $\mathbb{E}[\log Z]=\lim_{n\to0}\left(\mathbb{E}[Z^n]-1\right)/n$ to compute averages of logarithms. It has been applied very often to coding theory [9], sparse estimation [13, 14, 6, 12] and optimization problems [8], where it has later been shown to rigorously give the correct answer. We shall not give a detailed description of our computation, which in fact follows the one given in [12] for compressed sensing almost step by step. The only difference is again that we are dealing here with $B$-d variables $\mathbf{x}_l$ whose components take the binary values $\{0,c\}$, and only one of the $x_i$'s in each section is nonzero. Any value of $c$ is a possible choice (such as the exponentially distributed values of [1, 10]) but we shall here restrict ourselves to the simplest case where $c=1$. We have observed empirically that this seems to be an efficient distribution for the algorithm we describe in Sec. III.
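As a quick numerical sanity check of the replica identity $\mathbb{E}[\log Z]=\lim_{n\to0}(\mathbb{E}[Z^n]-1)/n$, one can compare the two sides on a toy positive random variable (a log-normal $Z$, chosen here purely for illustration and not from the paper):

```python
import math
import random

rng = random.Random(2)
# toy positive random variable Z = exp(g), g ~ N(0,1), so E[log Z] = 0
samples = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(200000)]

e_log = sum(math.log(z) for z in samples) / len(samples)
for n in (0.1, 0.01, 0.001):
    e_zn = sum(z ** n for z in samples) / len(samples)
    print(f"n={n}: (E[Z^n]-1)/n ~ {(e_zn - 1.0) / n:.4f}")
print(f"E[log Z] ~ {e_log:.4f}")
```

The estimate approaches $\mathbb{E}[\log Z]$ as $n\to0$; the replica method exploits the same limit, computing $\mathbb{E}[Z^n]$ for integer $n$ and continuing the result analytically to $n\to0$.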

The estimator we are interested in is the section-error rate ($\mathrm{SER}$), the fraction of incorrectly decoded sections:

$$\langle\hat{\mathbf{x}}_l\rangle=\sum_{\hat{\mathbf{x}}_l}\hat{\mathbf{x}}_l\,\nu_l(\hat{\mathbf{x}}_l) \qquad (3)$$

$$\mathrm{SER}=\frac{1}{L}\sum_{l=1}^{L}\mathbb{I}\left(\operatorname*{argmax}_{i\in l}\,\langle\hat{x}_i\rangle\neq\operatorname*{argmax}_{i\in l}\,x_i\right) \qquad (4)$$

$$\nu_l(\hat{\mathbf{x}}_l)=\sum_{\{\hat{\mathbf{x}}_{l'}\}_{l'\neq l}}\hat{P}(\hat{\mathbf{x}}|\mathbf{y}) \qquad (5)$$

where $\langle\hat{\mathbf{x}}_l\rangle$ is the posterior average with respect to (2) of the $B$-d variable $\hat{\mathbf{x}}_l$, $\nu_l$ is the marginal probability of section $l$ (i.e. the joint probability of the $B$ variables of the section), and $\mathbb{I}$ is the indicator function. Most of our results, however, can be expressed more conveniently as a function of a different estimator that we shall denote as the biased MSE $E$:

$$E=\frac{1}{L}\sum_{l=1}^{L}\left\|\langle\hat{\mathbf{x}}_l\rangle-\mathbf{x}_l\right\|^2 \qquad (6)$$

Both quantities behave similarly in our computations. In fact, the replica and DE analyses show that the $\mathrm{SER}$ can be computed directly from the value of $E$ (see Sec. IV, and in particular eq. (18)).

The main interest of the potential is that the actual typical value $E^*$ (and thus the typical $\mathrm{SER}$) for a given ensemble of codes in the large signal-size limit is obtained by maximizing it, so that $E^*=\operatorname*{argmax}_E\,\phi(E)$. After computation, one can show that, for a given rate $R$ and up to constants, the potential reads:

$$\phi(E)=-\frac{\log_2(B)}{2R}\left[\log\left(\sigma^2+E\right)+\frac{\sigma^2+1}{\sigma^2+E}\right]-\frac{1}{2\Sigma(E)^2}+\mathbb{E}_{\mathbf{z}}\left[\log\left(e^{\frac{1}{\Sigma(E)^2}+\frac{z_1}{\Sigma(E)}}+\sum_{i=2}^{B}e^{\frac{z_i}{\Sigma(E)}}\right)\right] \qquad (7)$$

where $\Sigma(E)^2=R\left(\sigma^2+E\right)/\log_2(B)$ and $\mathbb{E}_{\mathbf{z}}$ denotes the average over $\mathbf{z}=(z_1,\dots,z_B)$, whose components are drawn from a centered Gaussian measure of unit variance. Using the mapping from $E$ to the $\mathrm{SER}$ given by eq. (18), we can compute $\phi$ numerically as a function of the $\mathrm{SER}$. Examples are shown in Fig. 1 for different rates.

Interestingly, the potential behaves very similarly to that of LDPC codes, so that we observe the very same phenomenology. For large enough rates, it develops a two-maxima shape with a high-error and a low-error maximum. The low-error maximum dominates as long as it is the higher one, i.e. statistically dominant. When the two maxima of $\phi$ have the same height (red curve in Fig. 1), the reconstruction capacities become extremely poor: this marks what we shall refer to as the "optimal threshold", as it represents the point up to which the code has good performance under optimal decoding.

It is well known for LDPC codes that the rate at which the second maximum appears (blue curve in Fig. 1) corresponds to the moment where the BP decoder ceases to match the optimal one. We shall see that this remains the case here, and that only in the one-maximum region (green curve in Fig. 1) will our AMP decoder asymptotically match the optimal performance. Fig. 2 represents these two thresholds in the large-$L$ limit as a function of the section size $B$ for fixed $\mathrm{snr}$. We shall come back to the BP performance, but let us first notice here that (i) as $B$ increases, the optimal threshold approaches the Shannon capacity (and in fact the corresponding value of the $\mathrm{SER}$ goes to zero, see Fig. 4). However, (ii) the rate at which we expect message-passing algorithms to converge to the optimal value unfortunately decays as $B$ gets larger. These two observations are in agreement with the $B\to\infty$ limit, performed by replica analysis: the optimal threshold matches the capacity, as predicted in [1], and the rate at the BP threshold is asymptotically given by:

$$R_{\mathrm{BP}}=\frac{\mathrm{snr}}{2\log(2)\left(1+\mathrm{snr}\right)} \qquad (8)$$
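For orientation, the snippet below evaluates the Shannon capacity $C=\frac{1}{2}\log_2(1+\mathrm{snr})$ against the large-$B$ BP threshold; the closed form used for the threshold, $R_{\mathrm{BP}}=\mathrm{snr}/\left(2\log(2)(1+\mathrm{snr})\right)$, is our reading of eq. (8) and should be treated as an assumption.

```python
import math

def capacity(snr):
    # Shannon capacity of the AWGN channel, in bits per channel use
    return 0.5 * math.log2(1.0 + snr)

def r_bp_large_B(snr):
    # assumed large-B limit of the BP threshold: snr / (2 ln(2) (1 + snr))
    return snr / (2.0 * math.log(2.0) * (1.0 + snr))

for snr in (1.0, 7.0, 15.0):
    C, Rbp = capacity(snr), r_bp_large_B(snr)
    print(f"snr={snr:5.1f}  C={C:.3f}  R_BP={Rbp:.3f}  gap={C - Rbp:.3f}")
```

Note that under this assumption $R_{\mathrm{BP}}$ saturates at $1/(2\log 2)$ bits per channel use, while the capacity keeps growing with the $\mathrm{snr}$, so the gap widens at high $\mathrm{snr}$.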

## III Approximate Message Passing algorithm

We shall now consider a BP-type iterative algorithm to estimate the joint probability of each section, still in the case where $c=1$. Following the derivation of the AMP in [12], it follows from the parametrization:

$$\nu_l^t(\hat{\mathbf{x}}_l)=\frac{1}{Z_l^t}\,P_0(\hat{\mathbf{x}}_l)\prod_{i\in l}e^{-\frac{\left(\hat{x}_i-R_i^t\right)^2}{2\left(\Sigma_i^t\right)^2}} \qquad (9)$$

$$P_0(\hat{\mathbf{x}}_l)=\frac{1}{B}\sum_{i\in l}\delta_{\hat{x}_i,1}\prod_{j\in l,\,j\neq i}\delta_{\hat{x}_j,0} \qquad (10)$$

$$Z_l^t=\sum_{\hat{\mathbf{x}}_l}P_0(\hat{\mathbf{x}}_l)\prod_{i\in l}e^{-\frac{\left(\hat{x}_i-R_i^t\right)^2}{2\left(\Sigma_i^t\right)^2}} \qquad (11)$$

where $P_0$ is the prior that matches the signal distribution entering eq. (2). This matching condition is called the Nishimori condition in statistical physics [15, 12]. The quantities $\{R_i^t,(\Sigma_i^t)^2\}$ are the means and variances of the effective Gaussian channels seen by the components, iteratively computed by the AMP. Marginalizing (9) over the section, we get after simplification the posterior average and variance of $\hat{x}_i$ at step $t$ of the algorithm:

$$a_i^t=f_{a_i}\!\left(\{R_j^t,\Sigma_j^t\}\right)=\frac{e^{\frac{2R_i^t-1}{2\left(\Sigma_i^t\right)^2}}}{\sum_{j\in l_i}e^{\frac{2R_j^t-1}{2\left(\Sigma_j^t\right)^2}}} \qquad (12)$$

$$v_i^t=f_{v_i}\!\left(\{R_j^t,\Sigma_j^t\}\right)=a_i^t\left(1-a_i^t\right) \qquad (13)$$

where $l_i$ denotes the section to which the component $i$ belongs.

The AMP recursion provides the means and variances $\{R_i^t,(\Sigma_i^t)^2\}$ at time $t$, given the parametrization (9); we refer the reader to [12] for the explicit update equations. Here, only the functions $f_{a_i}$ and $f_{v_i}$ depend in an explicit way on the signal model $P_0$. A suitable initialization for the quantities is given by the prior: $a_i^{t=0}=1/B$, $v_i^{t=0}=(1/B)(1-1/B)$. Once the convergence of the iterative equations is reached, one gets the posterior averages of the signal components, given by the $a_i$, and the $\mathrm{SER}$ as well, using eq. (4).
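Since we omit the explicit update equations, the following sketch shows one possible structure of the decoder. It uses the standard AMP recursion for the AWGN channel from the compressed-sensing literature (means $R_i$ and variances $\Sigma_i^2$ of an effective Gaussian channel, with an Onsager-corrected residual), combined with the sectionwise posterior-mean denoiser of eqs. (12)-(13). All names and parameter values are ours, and the recursion shown is our reconstruction rather than the authors' released code.

```python
import math
import random

random.seed(1)
Lsec, B, R, snr = 32, 4, 0.4, 100.0      # easy regime, rate well below threshold
N, sigma2 = Lsec * B, 1.0 / snr
M = int(round(Lsec * math.log2(B) / R))

# instance: one-hot signal, Gaussian dictionary (variance 1/L), noisy codeword
x = [0.0] * N
for l in range(Lsec):
    x[l * B + random.randrange(B)] = 1.0
F = [[random.gauss(0.0, 1.0 / math.sqrt(Lsec)) for _ in range(N)] for _ in range(M)]
F2 = [[f * f for f in row] for row in F]
y = [sum(F[mu][i] * x[i] for i in range(N)) + random.gauss(0.0, math.sqrt(sigma2))
     for mu in range(M)]

a = [1.0 / B] * N                         # posterior means, initialized at the prior
v = [(1.0 / B) * (1.0 - 1.0 / B)] * N     # posterior variances
w = [0.0] * M                             # Onsager-corrected residual (previous step)
V = [1.0] * M
for t in range(25):
    V_new = [sum(F2[mu][i] * v[i] for i in range(N)) for mu in range(M)]
    w_new = [sum(F[mu][i] * a[i] for i in range(N))
             - V_new[mu] * (y[mu] - w[mu]) / (sigma2 + V[mu]) for mu in range(M)]
    V, w = V_new, w_new
    Sigma2 = [1.0 / sum(F2[mu][i] / (sigma2 + V[mu]) for mu in range(M))
              for i in range(N)]
    Rloc = [a[i] + Sigma2[i] * sum(F[mu][i] * (y[mu] - w[mu]) / (sigma2 + V[mu])
                                   for mu in range(M)) for i in range(N)]
    for l in range(Lsec):                 # sectionwise one-hot denoiser
        idx = list(range(l * B, (l + 1) * B))
        lw = [(2.0 * Rloc[i] - 1.0) / (2.0 * Sigma2[i]) for i in idx]
        mx = max(lw)
        wts = [math.exp(u - mx) for u in lw]
        Z = sum(wts)
        for k, i in enumerate(idx):
            a[i] = wts[k] / Z
            v[i] = a[i] * (1.0 - a[i])

# hard decision: argmax of the posterior mean within each section
errors = sum(max(range(l * B, (l + 1) * B), key=lambda i: a[i])
             != max(range(l * B, (l + 1) * B), key=lambda i: x[i])
             for l in range(Lsec))
ser = errors / Lsec
print("SER =", ser)
```

In this easy regime (rate well below the threshold, high $\mathrm{snr}$) the decoder typically reaches $\mathrm{SER}=0$ within a handful of iterations; near the BP threshold the same recursion stalls at the high-error fixed point, as predicted by the DE analysis of the next section.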

## IV Asymptotic analysis of AMP

The dynamics of the $\mathrm{SER}$ under the AMP recursion can be described in the large signal-size limit ($L\to\infty$ with $B$ and $R$ fixed) using the DE technique [4]. It consists in a statistical analysis of the quantities $\{R_i^t,\Sigma_i^t\}$, based on the fact that the sums over the matrix elements entering (9) become Gaussian distributed (by the central-limit theorem) as $L\to\infty$.

As for the replica analysis, our derivation follows [12], with the only difference that we are dealing with $B$-d variables. We define the effective variance $\Sigma(E^t)^2=R\left(\sigma^2+E^t\right)/\log_2(B)$ as in Sec. II, and the biased MSE as in (6), which now depends on the iteration step:

$$E^t=\frac{1}{L}\sum_{l=1}^{L}\left\|\mathbf{a}_l^t-\mathbf{x}_l\right\|^2 \qquad (14)$$

then the biased MSE follows, in the $L\to\infty$ limit, the iteration:

$$E^{t+1}=\mathbb{E}_{\mathbf{z}}\left[\left(1-f_1(\mathbf{z},E^t)\right)^2+\sum_{i=2}^{B}f_i(\mathbf{z},E^t)^2\right] \qquad (15)$$

where, for a section whose nonzero component is the first one:

$$f_1(\mathbf{z},E)=\left[1+\sum_{i=2}^{B}e^{\frac{z_i-z_1}{\Sigma(E)}-\frac{1}{\Sigma(E)^2}}\right]^{-1} \qquad (16)$$

$$f_{i\geq2}(\mathbf{z},E)=\frac{e^{\frac{z_i}{\Sigma(E)}}}{e^{\frac{1}{\Sigma(E)^2}+\frac{z_1}{\Sigma(E)}}+\sum_{j=2}^{B}e^{\frac{z_j}{\Sigma(E)}}} \qquad (17)$$

and $\mathbf{z}=(z_1,\dots,z_B)$ has i.i.d. unit centered Gaussian components.

In this approach, there is a one-to-one correspondence between the value of the biased MSE and the $\mathrm{SER}$, thanks to the mapping:

$$\mathrm{SER}(E)=1-\int\mathcal{D}z\left[\frac{1}{2}\,\mathrm{erfc}\left(-\frac{z+1/\Sigma(E)}{\sqrt{2}}\right)\right]^{B-1} \qquad (18)$$

where $\mathcal{D}z$ denotes the unit centered Gaussian measure.

The fixed point of this set of equations for a given rate $R$, starting from the uninformed initialization $E^{t=0}=1-1/B$, allows one to compute the asymptotic $\mathrm{SER}$ reached after convergence. Fig. 3 compares the DE prediction of the $\mathrm{SER}$ with the algorithm dynamics for different rates. We see how close the DE asymptotic theory is to the true behavior, even for small signals.
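The DE recursion lends itself to a simple Monte Carlo evaluation: sample the $B$-d Gaussian vector $\mathbf{z}$, apply the posterior-mean denoiser of a one-hot section observed through the effective channel, and average the squared error. The sketch below assumes the effective variance $\Sigma(E)^2=R(\sigma^2+E)/\log_2(B)$ and the nonzero value $1$; both follow our reconstruction of the equations above, so the exact expressions should be read as assumptions.

```python
import math
import random

rng = random.Random(0)

def de_step(E, B, R, sigma2, n_samples=4000):
    """One Monte Carlo density-evolution update E -> E'."""
    Sigma2 = R * (sigma2 + E) / math.log2(B)
    Sigma = math.sqrt(Sigma2)
    acc = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(B)]
        # log-weights of the posterior for a section whose true nonzero is i=1:
        # i=1 -> 1/Sigma^2 + z_1/Sigma ;  i>=2 -> z_i/Sigma
        lw = [1.0 / Sigma2 + z[0] / Sigma] + [zi / Sigma for zi in z[1:]]
        mx = max(lw)
        wts = [math.exp(u - mx) for u in lw]
        Z = sum(wts)
        f = [wv / Z for wv in wts]        # posterior mean of the section
        acc += (1.0 - f[0]) ** 2 + sum(fi * fi for fi in f[1:])
    return acc / n_samples

B, R, sigma2 = 4, 0.5, 1.0 / 100.0
E = 1.0 - 1.0 / B                         # uninformed initialization
for t in range(15):
    E = de_step(E, B, R, sigma2)
print("fixed-point E ~", E)
```

Below the BP threshold the iteration flows to $E\approx0$ (perfect decoding); above it, the iteration is trapped at the high-error fixed point, which is how the threshold of Fig. 2 is located.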

Finally, let us comment on the classical duality between the DE results and the replica potential: the DE equations are nothing but the fixed-point iterations of the replica potential [9]. We can thus identify the BP threshold studied in Sec. II as the rate beyond which the DE iterations remain confined to high $\mathrm{SER}$ values.

What have we learned from this analysis? The main point is stressed in Fig. 2: the BP threshold moves further away from capacity as $B$ increases. This is unfortunate since, as shown in Fig. 4, the optimal $\mathrm{SER}$ decays drastically when $B$ is increased. Hence, AMP alone is not able to reach the optimal performance at high rates.

## V Numerical experiments

The previous sections were asymptotic studies of the AMP performance. In this section we study numerically the influence of finite-size effects on the performance of superposition codes combined with the AMP decoder. Fig. 5 summarizes our findings. We have limited ourselves to small sizes and performed several numerical protocols to test the iterative AMP decoding. We checked that the Bayesian decoder not only follows the density evolution, but also works very well at limited sizes, at least far enough from the BP threshold. In fact, when the rate is below the threshold, the decoding is usually perfect and is found with high probability to reach $\mathrm{SER}=0$. This is due to the fact that in order to observe a nonzero $\mathrm{SER}$, there must be at least one wrongly decoded section out of $L$, which is not the case for small signals when the asymptotic $\mathrm{SER}$ is very small. Even for reasonably large sizes allowing to get close to the asymptotic performance (such as the green curve in Fig. 5), AMP takes just seconds to decode with a Matlab code.

To confirm that the relevant parameter with respect to the finite-size effects is $L$ and not $N$, we made the following experiment (protocol 1 in Fig. 5): fixing $N$ and varying $B$ (thus $L$ decreases as $B$ increases), we observed that the performance worsens rapidly as $B$ increases.

Finally, we compared the efficiency of AMP to the iterative successive decoder introduced by Barron and Joseph [1]. This decoder has been shown to be capacity-achieving in the large-$B$ limit for a slightly different version of the superposition codes, where an exponential distribution of the signal entries (see Sec. II) is used instead of the constant-value distribution we have been studying. However, the finite-$B$ corrections to the performance appear to be quite severe. We have compared the performance of our approach (with the constant-value distribution) with the results obtained in [1] (see again Fig. 5) and found that, at the values of $B$ that one can consider in a computer, AMP appears to be superior in performance (though it will eventually be outperformed in the limit of very large values of $B$). This remains the case even when compared to the most recent developments [16]. It would be interesting to find the best possible signal distribution in order to further optimize the performance of the AMP reconstruction.

## VI Perspectives

We have provided here a replica analysis of superposition codes and a Bayesian approximate message passing decoder, based on a belief-propagation-like approach, that we have analyzed using the density evolution technique. We have found a behavior very similar to the one found in LDPC codes. In particular, our AMP decoder seems to have good performance, but it is limited to a region of rates below that of optimal decoding. We have released a Matlab implementation of our decoder at https://github.com/jeanbarbier/BPCS_common.

There are a number of possible continuations of the present work, in the spirit of recent developments in compressed sensing. Superposition codes, which lie on the fuzzy separation between coding and compressed-sensing theories, are in fact a natural venue for these ideas. The most natural one is to use the spatial-coupling approach [17, 18, 19] in order to design codes with the same optimal performance, but for which AMP will be able to achieve it for all rates. This strategy is known to work in compressed sensing [12, 6, 20], so that its application in the present context should be straightforward. Another interesting direction would be to replace the random matrices used in this work by structured ones allowing a fast multiplication, as has been done, again, in compressed sensing [21, 22], in order to obtain a decoder as fast as those used in sparse codes. We plan to study these points in a future work.

## Acknowledgment

The research leading to these results has received funding from the European Research Council under the European Union's Framework Programme (FP/2007-2013)/ERC Grant Agreement 307087-SPARCS and from the French Ministry of Defense/DGA.

## References

- [1] A. R. Barron and A. Joseph, "Sparse superposition codes: Fast and reliable at rates approaching capacity with Gaussian noise," Manuscript, available at http://www.stat.yale.edu/~arb4, 2010.
- [2] A. Joseph and A. R. Barron, "Least squares superposition codes of moderate dictionary size are reliable at rates up to capacity," IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 2541–2557, 2012.
- [3] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Natl. Acad. Sci., vol. 106, no. 45, pp. 18 914–18 919, 2009.
- [4] M. Bayati and A. Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Transactions on Information Theory, vol. 57, no. 2, pp. 764 –785, 2011.
- [5] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in IEEE International Symposium on Information Theory Proceedings (ISIT), 2011, pp. 2168 –2172.
- [6] F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, “Statistical physics-based reconstruction in compressed sensing,” Phys. Rev. X, vol. 2, p. 021005, 2012.
- [7] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge University Press, 2008.
- [8] M. Mézard, G. Parisi, and M. A. Virasoro, Spin-Glass Theory and Beyond. Singapore: World Scientific, 1987, vol. 9.
- [9] M. Mézard and A. Montanari, Information, Physics, and Computation. Oxford: Oxford Press, 2009.
- [10] A. R. Barron and A. Joseph, “Analysis of fast sparse superposition codes,” in Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on. IEEE, 2011, pp. 1772–1776.
- [11] A. Montanari, “Graphical models concepts in compressed sensing,” Compressed Sensing: Theory and Applications, pp. 394–438, 2012.
- [12] F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, “Probabilistic reconstruction in compressed sensing: Algorithms, phase diagrams, and threshold achieving matrices,” J. Stat. Mech., 2012.
- [13] T. Tanaka, “A statistical-mechanics approach to large-system analysis of cdma multiuser detectors,” IEEE Trans. Infor. Theory, vol. 48, pp. 2888–2910, 2002.
- [14] Y. Kabashima, T. Wadayama, and T. Tanaka, “A typical reconstruction limit of compressed sensing based on lp-norm minimization,” J. Stat. Mech., p. L09003, 2009.
- [15] H. Nishimori, Statistical Physics of Spin Glasses and Information Processing. Oxford: Oxford University Press, 2001.
- [16] A. R. Barron and S. Cho, “High-rate sparse superposition codes with iteratively optimal estimates,” in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on. IEEE, 2012, pp. 120–124.
- [17] S. Kudekar, T. Richardson, and R. Urbanke, “Spatially coupled ensembles universally achieve capacity under belief propagation,” 2012, arXiv:1201.2999v1 [cs.IT].
- [18] ——, "Threshold saturation via spatial coupling: Why convolutional LDPC ensembles perform so well over the BEC," in Information Theory Proceedings (ISIT), 2010, pp. 684–688.
- [19] A. Yedla, Y.-Y. Jian, P. S. Nguyen, and H. D. Pfister, “A simple proof of threshold saturation for coupled scalar recursions,” 2012, arXiv:1204.5703v1 [cs.IT].
- [20] D. L. Donoho, A. Javanmard, and A. Montanari, “Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing,” in Proc. of the IEEE Int. Symposium on Information Theory (ISIT), 2012.
- [21] A. Javanmard and A. Montanari, “Subsampling at information theoretically optimal rates,” in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, July 2012, pp. 2431–2435.
- [22] J. Barbier, F. Krzakala, and C. Schülke, “Compressed sensing and approximate message passing with spatially-coupled fourier and hadamard matrices,” arXiv preprint arXiv:1312.1740, 2013.