# Techniques for Improving the Finite Length Performance of Sparse Superposition Codes

Adam Greig and Ramji Venkataramanan

A. Greig and R. Venkataramanan are with the Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK (e-mails: ag611@cam.ac.uk, rv285@cam.ac.uk). This work was supported in part by EPSRC Grant EP/N013999/1, and by an EPSRC Doctoral Training Award.

###### Abstract

Sparse superposition codes are a recent class of codes introduced by Barron and Joseph for efficient communication over the AWGN channel. With an appropriate power allocation, these codes have been shown to be asymptotically capacity-achieving with computationally feasible decoding. However, a direct implementation of the capacity-achieving construction does not give good finite length error performance. In this paper, we consider sparse superposition codes with approximate message passing (AMP) decoding, and describe a variety of techniques to improve their finite length performance. These include an iterative algorithm for SPARC power allocation, guidelines for choosing codebook parameters, and estimating a critical decoding parameter online instead of pre-computation. We also show how partial outer codes can be used in conjunction with AMP decoding to obtain a steep waterfall in the error performance curves. We compare the error performance of AMP-decoded sparse superposition codes with coded modulation using LDPC codes from the WiMAX standard.

Index Terms: Sparse regression codes, Approximate Message Passing, Low-complexity decoding, Finite length performance, Coded modulation

## I Introduction

We consider communication over the memoryless additive white Gaussian noise (AWGN) channel given by

$$y = x + w,$$

where the channel output $y$ is the sum of the channel input $x$ and independent zero-mean Gaussian noise $w$ of variance $\sigma^2$. There is an average power constraint $P$ on the input, so a length-$n$ codeword $(x_1, \ldots, x_n)$ has to satisfy $\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P$. The goal is to build computationally efficient codes that have low probability of decoding error at rates close to the AWGN channel capacity $\mathcal{C} = \frac{1}{2}\log(1 + \mathsf{snr})$. Here $\mathsf{snr}$ denotes the signal-to-noise ratio $P/\sigma^2$.

Though it is well known that Shannon-style i.i.d. Gaussian codebooks can achieve very low probability of error at rates approaching the AWGN capacity [1], this approach has been largely avoided in practice due to the high decoding complexity of unstructured Gaussian codes. Current state of the art approaches for the AWGN channel such as coded modulation [2, 3] typically involve separate coding and modulation steps. In this approach, a binary error-correcting code such as an LDPC or turbo code is first used to generate a binary codeword from the information bits; the code bits are then modulated with a standard scheme such as quadrature amplitude modulation. Though these schemes have good empirical performance, they have not been proven to be capacity-achieving for the AWGN channel.

Sparse Superposition Codes or Sparse Regression Codes (SPARCs) were recently proposed by Barron and Joseph [4, 5] for efficient communication over the AWGN channel. In [5], they introduced an efficient decoding algorithm called “adaptive successive decoding” and showed that it achieved near-exponential decay of error probability (with growing block length), for any fixed rate . Subsequently, an adaptive soft-decision successive decoder was proposed in [6, 7], and Approximate Message Passing (AMP) decoders were proposed in [8, 9, 10, 11]. The adaptive soft-decision decoder in [7] as well as the AMP decoder in [11] were proven to be asymptotically capacity-achieving, and have superior finite length performance compared to the original adaptive successive decoder of [5].

The above results mainly focused on characterizing the error performance of SPARCs in the limit of large block length. In this work, we describe a number of code design techniques for improved finite length error performance. Throughout the paper, we focus on AMP decoding due to its ease of implementation. However, many of the code design ideas can also be applied to the adaptive soft-decision successive decoder in [6, 7]. A hardware implementation of the AMP decoder was recently reported in [12, 13]. We expect that the techniques proposed in this paper can be used to reduce the complexity and optimize the decoding performance in such implementations.

In the remainder of this section, we briefly review the SPARC construction and the AMP decoder from [11], and then list the main contributions of this paper. A word about notation before we proceed. Throughout the paper, we use $\log$ to denote logarithms with base $2$, and $\ln$ to denote natural logarithms. For a positive integer $N$, we use $[N]$ to denote the set $\{1, \ldots, N\}$. The transpose of a matrix $A$ is denoted by $A^*$, and the indicator function of an event $\mathcal{A}$ by $\mathbf{1}\{\mathcal{A}\}$.

### I-A The sparse superposition code

A SPARC is defined in terms of a design matrix $A$ of dimension $n \times ML$. Here $n$ is the block length, and $M$ and $L$ are integers which are specified below in terms of $n$ and the rate $R$. As shown in Fig. 1, the design matrix $A$ has $L$ sections with $M$ columns each. In the original construction of [4, 5] and in the theoretical analysis in [6, 7, 11, 14], the entries of $A$ are assumed to be i.i.d. Gaussian $\mathcal{N}(0, 1/n)$. For our empirical results, we use a random Hadamard-based construction for $A$ that leads to significantly lower encoding and decoding complexity [9, 10, 11].

Codewords are constructed as sparse linear combinations of the columns of $A$. In particular, a codeword is of the form $x = A\beta$, where $\beta = (\beta_1, \ldots, \beta_{ML})^*$ is a length-$ML$ column vector with the property that there is exactly one non-zero $\beta_j$ for the section $1 \le j \le M$, one non-zero $\beta_j$ for the section $M+1 \le j \le 2M$, and so forth. The non-zero value of $\beta$ in each section $\ell$ is set to $\sqrt{nP_\ell}$, where $P_1, \ldots, P_L$ are pre-specified positive constants that satisfy $\sum_{\ell=1}^{L} P_\ell = P$, the average symbol power allowed.

The matrix $A$ and the power allocation $\{P_1, \ldots, P_L\}$ are known to both the encoder and the decoder in advance. The choice of power allocation plays a crucial role in determining the error performance of the decoder. Without loss of generality, we will assume that the power allocation is non-increasing across sections. Two examples of power allocation are:

• Flat power allocation, where $P_\ell = \frac{P}{L}$ for all $\ell \in [L]$. This choice was used in [4] to analyze the error performance with optimal (least-squares) decoding.

• Exponentially decaying power allocation, where $P_\ell \propto 2^{-2\mathcal{C}\ell/L}$. This choice was used for the asymptotically capacity-achieving decoders proposed in [5, 7, 11].

At finite block lengths both these power allocations could be far from optimal and lead to poor decoding performance. One of the main contributions of this paper is an algorithm to determine a good power allocation for the finite-length AMP decoder based only on , , .

Rate: As each of the $L$ sections contains $M$ columns, the total number of codewords is $M^L$. With the block length being $n$, the rate of the code is given by

$$R = \frac{\log(M^L)}{n} = \frac{L \log M}{n}. \tag{1}$$

In other words, a SPARC codeword corresponding to $L \log M$ input bits is transmitted in $n$ channel uses.

Encoding: The input bitstream is split into chunks of $\log M$ bits. A chunk of $\log M$ input bits can be used to index the location of the non-zero entry in one section of $\beta$. Hence $L$ successive chunks determine the message vector $\beta$, with the $\ell$th chunk of $\log M$ input bits determining the non-zero location in section $\ell$, for $1 \le \ell \le L$.
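The encoding map can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `sparc_message_vector`, the most-significant-bit-first reading of each chunk, and the toy parameters are assumptions made for the example. The codeword itself would then be $x = A\beta$ for a design matrix $A$, which is not constructed here.

```python
import math

def sparc_message_vector(bits, L, M, powers, n):
    """Map L*log2(M) input bits to the length-ML message vector beta (hypothetical helper)."""
    k = int(math.log2(M))
    assert 2 ** k == M, "this bit mapping assumes M is a power of two"
    assert len(bits) == L * k and len(powers) == L
    beta = [0.0] * (L * M)
    for sec in range(L):
        chunk = bits[sec * k:(sec + 1) * k]
        # Assumption: the chunk is read as a binary number, most significant bit first.
        idx = int("".join(str(b) for b in chunk), 2)
        # The single non-zero entry of section `sec` has value sqrt(n * P_l).
        beta[sec * M + idx] = math.sqrt(n * powers[sec])
    return beta

# Toy parameters (illustrative only): L = 4 sections of M = 8 columns at rate R = 1.5.
L, M, R, P = 4, 8, 1.5, 2.0
n = int(L * math.log2(M) / R)          # block length from (1): n = L log M / R
bits = [1, 0, 1,  0, 0, 0,  1, 1, 1,  0, 1, 0]
beta = sparc_message_vector(bits, L, M, [P / L] * L, n)
nonzero = [i for i, v in enumerate(beta) if v != 0.0]
print(n, nonzero)
```

Each section contributes exactly one non-zero entry, so `beta` always has $L$ non-zeros regardless of the input bits.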

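Approximate Message Passing (AMP) decoder: The AMP decoder produces iteratively refined estimates of the message vector, denoted by $\{\beta^t\}$ for $1 \le t \le T$, where $T$ is the (pre-specified) number of iterations. Starting with $\beta^0 = 0$, for $t = 0, 1, \ldots, T-1$ the AMP decoder generates

$$z^t = y - A\beta^t + \frac{z^{t-1}}{\tau_{t-1}^2}\left(P - \frac{\|\beta^t\|^2}{n}\right), \tag{2}$$

$$\beta_i^{t+1} = \eta_i^t(\beta^t + A^* z^t), \tag{3}$$

where

$$\eta_i^t(s) = \sqrt{nP_\ell}\, \frac{\exp\left(s_i \sqrt{nP_\ell}/\tau_t^2\right)}{\sum_{j \in \mathrm{sec}(i)} \exp\left(s_j \sqrt{nP_\ell}/\tau_t^2\right)}, \quad 1 \le i \le ML. \tag{4}$$

Here the notation $j \in \mathrm{sec}(i)$ refers to all indices $j$ in the same section as $i$, and $\ell$ is the index of that section. (Note that there are $M$ indices in each section.) At the end of each step $t$, $\beta_i^t / \sqrt{nP_\ell}$ may be interpreted as the updated posterior probability of the $i$th entry being the non-zero one in its section.

The constants $\tau_t^2$ are specified by the following scalar recursion called “state evolution” (SE):

$$\tau_0^2 = \sigma^2 + P, \qquad \tau_t^2 = \sigma^2 + P\left(1 - x(\tau_{t-1})\right), \quad t \ge 1, \tag{5}$$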

where

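$$x(\tau) := \sum_{\ell=1}^{L} \frac{P_\ell}{P}\, \mathbb{E}\left[\frac{e^{\frac{\sqrt{nP_\ell}}{\tau}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau}\right)}}{e^{\frac{\sqrt{nP_\ell}}{\tau}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau}\right)} + \sum_{j=2}^{M} e^{\frac{\sqrt{nP_\ell}}{\tau} U_j^\ell}}\right]. \tag{6}$$

In (6), $\{U_j^\ell\}$ are i.i.d. $\mathcal{N}(0,1)$ random variables for $j \in [M]$, $\ell \in [L]$. The significance of the SE parameters $\tau_t^2$ is discussed in Section II. In Section IV, we use an online approach to accurately compute the $\tau_t^2$ values rather than pre-computing them via (6).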
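As a rough illustration of how the expectation in (6) and the recursion (5) could be evaluated numerically, the following sketch uses plain Monte Carlo sampling with NumPy. The function names, the sample count, and the seeding scheme are assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np

def x_of_tau(tau, powers, M, n, num_samples=2000, seed=0):
    """Monte Carlo estimate of x(tau) in (6) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    powers = np.asarray(powers, dtype=float)
    P = powers.sum()
    total = 0.0
    for P_l in powers:
        c = np.sqrt(n * P_l) / tau
        U = rng.standard_normal((num_samples, M))
        expo = c * U
        expo[:, 0] += c * c                        # exponent of the term with U_1 in (6)
        expo -= expo.max(axis=1, keepdims=True)    # shift for numerical stability
        w = np.exp(expo)
        total += (P_l / P) * np.mean(w[:, 0] / w.sum(axis=1))
    return float(total)

def state_evolution(powers, M, n, sigma2, T):
    """Iterate (5) for T steps, returning [tau_0^2, ..., tau_T^2]."""
    P = float(np.sum(powers))
    tau2 = [sigma2 + P]
    for t in range(1, T + 1):
        x = x_of_tau(np.sqrt(tau2[-1]), powers, M, n, seed=t)
        tau2.append(sigma2 + P * (1.0 - x))
    return tau2

# Toy check with a flat allocation: x(tau) should grow as tau shrinks.
powers = [0.125] * 8
x_hi = x_of_tau(0.5, powers, M=16, n=32)
x_lo = x_of_tau(5.0, powers, M=16, n=32)
tau2 = state_evolution(powers, M=16, n=32, sigma2=0.25, T=3)
print(x_lo, x_hi, tau2)
```

Subtracting the row maximum before exponentiating leaves each ratio unchanged but avoids overflow when $\sqrt{nP_\ell}/\tau$ is large.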

At the end of $T$ iterations, the decoded message vector $\hat{\beta}$ is produced by setting the maximum value in section $\ell$ of $\beta^T$ to $\sqrt{nP_\ell}$ and the remaining entries to zero, for $1 \le \ell \le L$.

Error rate of the AMP decoder: We measure the section error rate as

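$$\mathcal{E}_{\mathrm{sec}} = \frac{1}{L}\sum_{\ell=1}^{L} \mathbf{1}\{\hat{\beta}_\ell \ne \beta_\ell\}. \tag{7}$$

Assuming a uniform mapping between the input bitstream and the non-zero locations in each section, each section error will cause approximately half of the bits it represents to be incorrect, leading to a bit error rate $\mathcal{E}_{\mathrm{ber}} \approx \frac{1}{2}\mathcal{E}_{\mathrm{sec}}$.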
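The hard-decision step and the section error rate (7) can be sketched as follows. The helper names `hard_decision` and `section_error_rate` and the toy vectors are hypothetical; the factor of one half for the bit error rate is the approximation stated above, not an exact identity.

```python
import numpy as np

def hard_decision(beta_T, L, M, powers, n):
    """Keep the largest entry of each section of beta^T and set it to sqrt(n * P_l)."""
    sections = np.asarray(beta_T, dtype=float).reshape(L, M)
    winners = sections.argmax(axis=1)
    beta_hat = np.zeros((L, M))
    beta_hat[np.arange(L), winners] = np.sqrt(n * np.asarray(powers, dtype=float))
    return beta_hat.reshape(-1)

def section_error_rate(beta_hat, beta, L, M):
    """Fraction of sections whose decoded non-zero location differs from the true one, as in (7)."""
    est = np.asarray(beta_hat).reshape(L, M).argmax(axis=1)
    true = np.asarray(beta).reshape(L, M).argmax(axis=1)
    return float(np.mean(est != true))

# Toy example: 3 sections of 4 entries, with the middle section decoded wrongly.
L, M, n = 3, 4, 6
powers = [2.0, 1.0, 1.0]
beta = np.zeros(L * M)
beta[[1, 4, 11]] = np.sqrt(n * np.array(powers))
soft = np.full(L * M, 0.1)
soft[[1, 6, 11]] = 1.0                      # soft estimate peaks at the wrong index in section 2
beta_hat = hard_decision(soft, L, M, powers, n)
e_sec = section_error_rate(beta_hat, beta, L, M)
e_ber_approx = 0.5 * e_sec                  # approximation from the text
print(e_sec, e_ber_approx)
```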

Another figure of merit is the codeword error rate $\mathcal{E}_{\mathrm{cw}}$, which estimates the probability $P(\hat{\beta} \ne \beta)$. If the SPARC is used to transmit a large number of messages (each via a length-$n$ codeword), $\mathcal{E}_{\mathrm{cw}}$ measures the fraction of codewords that are decoded with one or more section errors. The codeword error rate is insensitive to where and how many section errors occur within a codeword when it is decoded incorrectly.

At finite code lengths, the choice of a good power allocation crucially depends on whether we want to minimize $\mathcal{E}_{\mathrm{sec}}$ or $\mathcal{E}_{\mathrm{cw}}$. As we will see in the next section, a power allocation that yields reliably low section error rates may result in a high codeword error rate, and vice versa. In this paper, we will mostly focus on obtaining the best possible section error rate, since in practical applications a high-rate outer code could readily correct a small fraction of section errors to give excellent codeword error rates as well. Further, the bit error rate (which is approximately half the section error rate) is useful to compare with other channel coding approaches, where it is a common figure of merit.

### I-B Organization of the paper and main contributions

In the rest of the paper, we describe several techniques to improve the finite length error performance and reduce the complexity of AMP decoding. The sections are organized as follows.

• In Section II, we introduce an iterative power allocation algorithm that gives improved error performance with fewer tuning parameters than other power allocation schemes.

• In Section III, we analyze the effects of the code parameters and the power allocation on error performance and its concentration around the value predicted by state evolution.

• In Section IV, we describe how an online estimate of the key SE parameter improves error performance and allows a new early-stopping criterion. Furthermore, the online estimate enables us to accurately estimate the actual section error rate at the end of the decoding process.

• In Section V, we derive simple expressions to estimate $\mathcal{E}_{\mathrm{sec}}$ and $\mathcal{E}_{\mathrm{cw}}$ given the rate and power allocation.

• In Section VI we compare the error performance of AMP-decoded SPARCs to LDPC-based coded modulation schemes used in the WiMAX standard.

• In Section VII, we describe how partial outer codes can be used in conjunction with AMP decoding. We propose a three-stage decoder consisting of AMP decoding, followed by outer code decoding, and finally, AMP decoding once again. We show that by covering only a fraction of sections of the message with an outer code, the three-stage decoder can correct errors even in the sections not covered by the outer code. This results in bit-error curves with a steep waterfall behavior.

The main technical contributions of the paper are the iterative power allocation algorithm (Section II) and the three-stage decoder with an outer code (Section VII). The other sections describe how various choices of code parameters influence the finite length error performance, depending on whether the objective is to minimize the section error rate or the codeword error rate. We remark that the focus in this paper is on improving the finite length performance using the standard SPARC construction with power allocation. Optimizing the finite length performance of spatially-coupled SPARCs considered in [10, 9] is an interesting research direction, but one that is beyond the scope of this paper.

## II Power Allocation

Before introducing the power allocation scheme, we briefly give some intuition about the AMP update rules (2)–(4), and the SE recursion in (5)–(6). The update step (3) to generate each estimate of $\beta$ is underpinned by the following key property: after step $t$, the “effective observation” $\beta^t + A^* z^t$ is approximately distributed as $\beta + \tau_t Z$, where $Z$ is a standard normal random vector independent of $\beta$. Thus $\tau_t^2$ is the effective noise variance at the end of step $t$. Assuming that the above distributional property holds, $\beta^{t+1}$ is just the Bayes-optimal estimate of $\beta$ based on the effective observation. The entry $\beta_i^{t+1}$ is proportional to the posterior probability of the $i$th entry being the non-zero entry in its section.

We see from (5) that the effective noise variance $\tau_t^2$ is the sum of two terms. The first is the channel noise variance $\sigma^2$. The other term $P(1 - x(\tau_{t-1}))$ can be interpreted as the interference due to the undecoded sections in $\beta^t$. Equivalently, $x(\tau_{t-1})$ is the expected power-weighted fraction of sections which are correctly decodable at the end of step $t$.

The starting point for our power allocation design is the following result from [11], which gives analytic upper and lower bounds for $x(\tau)$ in (6).

###### Lemma 1.

[14, Lemma 1(b)] Let $\nu_\ell := \frac{L P_\ell}{R \tau^2 \ln 2}$. For sufficiently large $M$, and for any $\delta \in (0, 1)$,

$$x(\tau) \le \sum_{\ell=1}^{L} \frac{P_\ell}{P}\left[\mathbf{1}\{\nu_\ell > 2 - \delta\} + M^{-\kappa_1 \delta^2}\, \mathbf{1}\{\nu_\ell \le 2 - \delta\}\right], \tag{8}$$

$$x(\tau) \ge \left(1 - \frac{M^{-\kappa_2 \delta^2}}{\delta\sqrt{\ln M}}\right) \sum_{\ell=1}^{L} \frac{P_\ell}{P}\, \mathbf{1}\{\nu_\ell > 2 + \delta\}, \tag{9}$$

where $\kappa_1, \kappa_2$ are universal positive constants.

As the constants in (8)–(9) are not precisely specified, for designing power allocation schemes, we use the following approximation for $x(\tau)$:

$$x(\tau) \approx \sum_{\ell=1}^{L} \frac{P_\ell}{P}\, \mathbf{1}\{L P_\ell > 2R\tau^2 \ln 2\}. \tag{10}$$

This approximate version, which is increasingly accurate as $L, M$ grow large, is useful for gaining intuition about suitable power allocations. Indeed, if the effective noise variance after step $t$ is $\tau_t^2$, then (10) says that any section $\ell$ whose normalized power $L P_\ell$ is larger than the threshold $2R\tau_t^2 \ln 2$ is likely to be decodable correctly in step $(t+1)$, i.e., in $\beta^{t+1}$, the probability mass within the section will be concentrated on the correct non-zero entry. For a given power allocation, we can iteratively estimate the SE parameters $\tau_t^2$ for each $t$ using the approximation in (10). This provides a way to quickly check whether or not a given power allocation will lead to reliable decoding in the large system limit. For reliable decoding at a given rate $R$, the effective noise variance given by (5) should decrease with $t$ until it reaches a value close to $\sigma^2$ in a finite number of iterations. Equivalently, $x(\tau_t)$ in (6) should increase to a value very close to $1$.
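The quick check described above can be sketched by iterating (5) with $x(\tau)$ replaced by the approximation (10). The function name `approx_state_evolution` and the stopping rule (stop as soon as the noise variance no longer decreases) are assumptions made for this illustration.

```python
import math

def approx_state_evolution(powers, R, sigma2, max_iter=200):
    """Iterate (5) using approximation (10); returns the sequence of tau_t^2 values."""
    L = len(powers)
    P = sum(powers)
    tau2 = sigma2 + P
    history = [tau2]
    for _ in range(max_iter):
        threshold = 2.0 * R * tau2 * math.log(2.0)
        # Power-weighted fraction of sections above the decoding threshold in (10).
        x = sum(p for p in powers if L * p > threshold) / P
        new_tau2 = sigma2 + P * (1.0 - x)
        if new_tau2 >= tau2:      # no further progress: fixed point reached
            break
        tau2 = new_tau2
        history.append(tau2)
    return history

# Flat allocation with P = 15 and sigma2 = 1 (snr = 15, capacity 2 bits).
flat = [1.5] * 10
low_rate = approx_state_evolution(flat, R=0.5, sigma2=1.0)   # every section clears the threshold
high_rate = approx_state_evolution(flat, R=1.0, sigma2=1.0)  # no section clears it; recursion stalls
print(low_rate, high_rate)
```

A power allocation is predicted to decode reliably in the large system limit when the last entry of the returned list is close to $\sigma^2$.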

For a rate $R < \mathcal{C}$, there are infinitely many power allocations for which (10) predicts successful decoding in the large system limit. However, as illustrated below, their finite length error performance may differ significantly. Thus the key question addressed in this section is: how do we choose a power allocation that gives the lowest section error rate?

The exponentially-decaying power allocation given by

$$P_\ell = P \cdot \frac{2^{2\mathcal{C}/L} - 1}{1 - 2^{-2\mathcal{C}}} \cdot 2^{-2\mathcal{C}\ell/L}, \quad \ell \in [L], \tag{11}$$

was proven in [11] to be capacity-achieving in the large system limit, i.e., it was shown that the section error rate of the AMP decoder converges almost surely to $0$ as $n \to \infty$, for any $R < \mathcal{C}$. However, it does not perform well at practical block lengths, which motivated the search for alternatives. We now evaluate it in the context of (10) to better explain the development of a new power allocation scheme.
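To make the comparison with (10) concrete, the sketch below computes the allocation (11) and, for each section, the minimum power that (10) requires when all earlier sections are assumed decoded. The function names and the parameter values are illustrative assumptions; the minimum-power rule is simply the threshold in (10) solved for $P_\ell$ with $\tau^2$ equal to $\sigma^2$ plus the power of the sections not yet decoded.

```python
import math

def exponential_allocation(L, P, sigma2):
    """Exponentially decaying allocation of (11)."""
    C = 0.5 * math.log2(1.0 + P / sigma2)
    scale = P * (2.0 ** (2.0 * C / L) - 1.0) / (1.0 - 2.0 ** (-2.0 * C))
    return [scale * 2.0 ** (-2.0 * C * l / L) for l in range(1, L + 1)]

def minimum_required_powers(powers, R, sigma2):
    """Minimum power for each section to clear the threshold in (10),
    assuming all earlier (higher-power) sections have been decoded."""
    L, P = len(powers), sum(powers)
    required, decoded_power = [], 0.0
    for p in powers:
        tau2 = sigma2 + P - decoded_power      # noise plus interference from undecoded sections
        required.append(2.0 * R * tau2 * math.log(2.0) / L)
        decoded_power += p
    return required

L, P, sigma2 = 1024, 15.0, 1.0                 # snr = 15, capacity C = 2 bits
C = 0.5 * math.log2(1.0 + P / sigma2)
alloc = exponential_allocation(L, P, sigma2)
req_at_capacity = minimum_required_powers(alloc, C, sigma2)
worst_ratio_gap = max(abs(a / r - 1.0) for a, r in zip(alloc, req_at_capacity))
print(sum(alloc), worst_ratio_gap)
```

With these parameters the allocated and required powers at $R = \mathcal{C}$ agree to within a fraction of a percent, while for smaller $R$ the required powers scale down proportionally and the same allocation leaves a surplus in the early sections.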

Given a power allocation, using (10) one can compute the minimum required power for any section to decode, assuming that the sections with higher power have decoded correctly. The dashed lines in Figure 2 show the minimum power required for each section to decode (assuming the exponential allocation of (11) for the previous sections), for and . The figure shows that the power allocation in (11) matches (up to order $1/L$ terms) with the minimum required power when $R = \mathcal{C}$. However, for , we see that the exponentially-decaying allocation allocates significantly more power to the earlier sections than the minimum required, compared to later sections. This leads to relatively high section error rates, as shown in Figure 6.

Figure 2 shows that the total power allocated by the minimal power allocation at is significantly less than the available power . Therefore, the key question is: how do we balance the allocation of available power between the various sections to minimize the section error rate? Allocating excessive power to the earlier sections ensures they decode reliably early on, but then there will not be sufficient power left to ensure reliable decoding in the final sections. This is the reason for the poor finite length performance of the exponentially-decaying allocation. Conversely, if the power is spread too evenly then no section particularly stands out against the noise, so it is hard for the decoding to get started, and early errors can cause cascading failures as subsequent sections are also decoded in error.

This trade-off motivated the following modified exponential power allocation proposed in [11]:

$$P_\ell = \begin{cases} \kappa \cdot 2^{-2a\mathcal{C}\ell/L}, & 1 \le \ell \le fL, \\ \kappa \cdot 2^{-2a\mathcal{C}f}, & fL + 1 \le \ell \le L, \end{cases} \tag{12}$$

where the normalizing constant $\kappa$ is chosen to ensure that $\sum_{\ell=1}^{L} P_\ell = P$. In (12), the parameter $a$ controls the steepness of the exponential allocation, while the parameter $f$ flattens the allocation after the first fraction $f$ of the sections. Smaller choices of $a$ lead to less power allocated to the initial sections, making a larger amount available to the later sections. Similarly, smaller values of $f$ lead to more power allocated to the final sections. See Figure 3 for an illustration.

While this allocation improves the section error rate by a few orders of magnitude (see [11, Fig. 4]), it requires costly numerical optimization of $a$ and $f$. A good starting point is to use , but further optimization is generally necessary. This motivates the need for a fast power allocation algorithm with fewer tuning parameters.
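A minimal sketch of the modified exponential allocation (12) is given below. The function name and the choice to round $fL$ down to an integer are assumptions for the illustration; the values of $a$ and $f$ in the example are arbitrary and not tuned.

```python
import math

def modified_exponential_allocation(L, P, sigma2, a, f):
    """Allocation of (12): exponential decay with steepness a, flat after the first fraction f."""
    C = 0.5 * math.log2(1.0 + P / sigma2)
    cutoff = int(f * L)                      # assumption: fL is rounded down to an integer
    flat_value = 2.0 ** (-2.0 * a * C * f)
    raw = [2.0 ** (-2.0 * a * C * l / L) if l <= cutoff else flat_value
           for l in range(1, L + 1)]
    kappa = P / sum(raw)                     # normalizing constant so the powers sum to P
    return [kappa * r for r in raw]

# Arbitrary illustrative parameters.
alloc = modified_exponential_allocation(L=100, P=15.0, sigma2=1.0, a=0.8, f=0.5)
print(sum(alloc), alloc[0], alloc[49], alloc[50], alloc[-1])
```

Setting `a = 1` and `f = 1` removes the flat part and reduces (12) to the exponential allocation (11).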

### II-A Iterative power allocation

We now describe a simple parameter-free iterative algorithm to design a power allocation. The $L$ sections of the SPARC are divided into $B$ blocks of $L/B$ sections each. Each section within a block is allocated the same power. For example, with and , there are blocks with sections per block. The algorithm sequentially allocates power to each of the $B$ blocks as follows. Allocate the minimum power to the first block of sections so that they can be decoded in the first iteration when $\tau_0^2 = \sigma^2 + P$. Using (10), we set the power in each section of the first block to

$$P_\ell = \frac{2R\tau_0^2 \ln 2}{L}, \quad 1 \le \ell \le \frac{L}{B}.$$

Using (10) and (5), we then estimate $\tau_1^2 = \sigma^2 + \left(P - \frac{L}{B}P_1\right)$. Using this value, allocate the minimum required power for the second block of sections to decode, i.e., $P_\ell = \frac{2R\tau_1^2 \ln 2}{L}$ for $\frac{L}{B} + 1 \le \ell \le \frac{2L}{B}$. If we sequentially allocate power in this manner to each of the $B$ blocks, then the total power allocated by this scheme will be strictly less than $P$ whenever $R < \mathcal{C}$. We therefore modify the scheme as follows.

For $1 \le b \le B$, to allocate power to the $b$th block of sections assuming that the first $(b-1)$ blocks have been allocated, we compare the two options and choose the one that allocates higher power to the block: i) allocating the minimum required power (computed as above) for the $b$th block of sections to decode; ii) allocating the remaining available power equally to sections in blocks $b, \ldots, B$, and terminating the algorithm. This gives a flattening in the final blocks similar to the allocation in (12), but without requiring a specific parameter that determines where the flattening begins. The iterative power allocation routine is described in Algorithm 1. Figure 4 shows a toy example building up the power allocation for , where flattening is seen to occur in step 4. Figure 5 shows a more realistic example with and .
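The procedure described above can be sketched as follows. This is an illustrative reading of the text, not a transcription of Algorithm 1: the function name, the requirement that $B$ divides $L$, and the example parameters are assumptions.

```python
import math

def iterative_power_allocation(L, B, P, sigma2, R):
    """Block-by-block allocation: give each block the minimum power needed by (10),
    unless spreading the remaining power evenly would give the block more."""
    assert L % B == 0, "this sketch assumes B divides L"
    k = L // B                                   # sections per block
    powers = [0.0] * L
    for b in range(B):
        start = b * k
        p_remain = P - sum(powers[:start])       # power not yet allocated
        tau2 = sigma2 + p_remain                 # noise variance if earlier blocks have decoded
        p_min = 2.0 * math.log(2.0) * R * tau2 / L
        p_flat = p_remain / (L - start)
        if p_flat > p_min:
            # Option ii): spread the remaining power evenly and stop.
            powers[start:] = [p_flat] * (L - start)
            break
        # Option i): minimum required power for this block.
        powers[start:start + k] = [p_min] * k
    return powers

# Illustrative parameters: snr = 15 (capacity 2 bits), one section per block.
alloc = iterative_power_allocation(L=512, B=512, P=15.0, sigma2=1.0, R=1.4)
low_rate_alloc = iterative_power_allocation(L=512, B=512, P=15.0, sigma2=1.0, R=0.6)
print(alloc[0], alloc[-1], sum(alloc))
```

At the lower rate in this example the even split already exceeds the minimum required power for the first block, so the routine returns a completely flat allocation.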

Choosing $B$: By construction, the iterative power allocation scheme specifies the number of iterations of the AMP decoder in the large system limit. This is given by the number of blocks with distinct powers; in particular the number of iterations (in the large system limit) is of the order of $B$. For finite code lengths, we find that it is better to use a termination criterion for the decoder based on the estimates generated by the algorithm. This criterion is described in Sec. IV. This data-driven termination criterion allows us to choose the number of blocks $B$ to be as large as $L$. We found that choosing $B = L$, together with the termination criterion in Sec. IV, consistently gives a small improvement in error performance (compared to other choices of $B$), with no additional time or memory cost.

Additionally, with $B = L$, it is possible to quickly determine a pair $(a, f)$ for the modified exponential allocation in (12) which gives a nearly identical allocation to the iterative algorithm. This is done by first setting $f$ to obtain the same flattening point found in the iterative allocation, and then searching for an $a$ which matches the first allocation coefficient between the iterative and the modified exponential allocations. Consequently, any simulation results obtained for the iterative power allocation could also be obtained using a suitable $(a, f)$ with the modified exponential allocation, without having to first perform a costly numerical optimization over $(a, f)$.

Figure 6 compares the error performance of the exponential and iterative power allocation schemes discussed above for different values of at . The iterative power allocation yields significantly improved for rates away from capacity when compared to the original exponential allocation, and additionally outperforms the modified exponential allocation results reported in [11].

For the experiments in Figure 6, the value for $R$ used in constructing the iterative allocation (denoted by $R_{PA}$) was optimized numerically. Constructing an iterative allocation with $R_{PA} = R$ yields good results, but due to finite length concentration effects, the $R_{PA}$ yielding the smallest average error rate may be slightly different from the communication rate $R$. The effect of $R_{PA}$ on the concentration of error rates is discussed in Section III-B. We emphasize that this optimization over $R_{PA}$ is simpler than numerically optimizing the pair $(a, f)$ for the modified exponential allocation. Furthermore, guidelines for choosing $R_{PA}$ as a function of $R$ are given in Section III-B.

## III Error Concentration Trade-offs

In this section, we discuss how the choice of SPARC design parameters can influence the trade-off between the ‘typical’ value of section error rate and concentration of actual error rates around the typical values. The typical section error rate refers to that predicted by state evolution (SE). Indeed, running the SE equations (5)–(6) until convergence gives the following prediction for the section error rate:

$$\mathcal{E}_{\mathrm{sec}}^{SE} := 1 - \frac{1}{L}\sum_{\ell=1}^{L} \mathbb{E}\left[\frac{e^{\frac{\sqrt{nP_\ell}}{\tau_T}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau_T}\right)}}{e^{\frac{\sqrt{nP_\ell}}{\tau_T}\left(U_1^\ell + \frac{\sqrt{nP_\ell}}{\tau_T}\right)} + \sum_{j=2}^{M} e^{\frac{\sqrt{nP_\ell}}{\tau_T} U_j^\ell}}\right], \tag{13}$$

where $\tau_T$ denotes the value in the final iteration. The concentration refers to how close the SE prediction $\mathcal{E}_{\mathrm{sec}}^{SE}$ is to the observed section error rate.

As we describe below, the choice of SPARC parameters and the power allocation both determine a trade-off between obtaining a low value for $\mathcal{E}_{\mathrm{sec}}^{SE}$, and concentration of the actual section error rate around $\mathcal{E}_{\mathrm{sec}}^{SE}$. This trade-off is of particular interest when applying an outer code to the SPARC, as considered in Section VII, which may be able to reliably handle only a small number of section errors.

### III-A Effect of L and M on concentration

Recall from (1) that the code length $n$ at a given rate $R$ is determined by the choice of $L$ and $M$ according to the relationship $n = L \log M / R$. In general, $L$ and $M$ may be chosen freely to meet a desired rate and code length.

To understand the effect of increasing , consider Figure 7 which shows the error performance of a SPARC with , as we increase the value of . From (1), the code length increases logarithmically with . We observe that the section error rate (averaged over trials) decreases with up to , and then starts increasing. This is in sharp contrast to the SE prediction (13) (plotted using a dashed line in Figure 7) which keeps decreasing as is increased.

This divergence between the actual section error rate and the SE prediction for large $M$ is due to large fluctuations in the number of section errors across trials. Recent work on the error exponent of SPARCs with AMP decoding shows that the concentration of error rates near the SE prediction is strongly dependent on both $L$ and $M$. For $R < \mathcal{C}$, [14, Theorem 1] shows that for any $\epsilon > 0$, the section error rate $\mathcal{E}_{\mathrm{sec}}$ satisfies

$$P\left(\mathcal{E}_{\mathrm{sec}} > \mathcal{E}_{\mathrm{sec}}^{SE} + \epsilon\right) \le K_T \exp\left\{-\frac{\kappa_T L}{(\log M)^{2T-1}}\left(\frac{\epsilon \ln(1 + \mathsf{snr})}{4(1 + \mathsf{snr})} - f(M)\right)^2\right\}, \tag{14}$$

where $T$ is the number of iterations until state evolution convergence, $\kappa_T, K_T$ are constants depending on $T$, and $f(M)$ is a quantity that tends to zero with growing $M$. For any power allocation, $T$ increases as $R$ approaches $\mathcal{C}$. For example, for the exponential power allocation. We observe that the deviation probability bound on the RHS of (14) depends on the ratio $L/(\log M)^{2T-1}$.

In our experiments, is generally on the order of a few tens. Therefore, keeping constant, the probability of large deviations from the SE prediction increases with . This leads to the situation shown in Figure 7, which shows that the SE prediction continues to decrease with , but beyond a certain value of , the observed average section error rate becomes progressively worse due to loss of concentration. This is caused by a small number of trials with a very large number of section errors, even as the majority of trials experience lower and lower error rates as is increased. This effect can be clearly seen in Figure 8, which compares the histogram of section error rates over trials for and . The distribution of errors is clearly different, but both cases have the same average section error rate due to the poorer concentration for .

To summarize, given , and , there is an optimal that minimizes the empirical section error rate. Beyond this value of , the benefit from any further increase is outweighed by the loss of concentration. For a given , values of close to are a good starting point for optimizing the empirical section error rate, but obtaining closed-form estimates of the optimal for a given is still an open question.

For fixed , the optimal value of increases with snr. This effect can be seen in the results of Figure 12, where there is an inversion in the order of best-performing values as increases. This is because as snr increases, the number of iterations for SE to converge decreases. A smaller mitigates the effect of larger in the large deviations bound of (14). In other words, a larger snr leads to better error rate concentration around the SE prediction, so larger values of are permissible before the performance starts degrading.

### III-B Effect of power allocation on concentration

The non-asymptotic bounds on $x(\tau)$ in Lemma 1 indicate that at finite lengths, the minimum power required for a section to decode in an iteration may be slightly different than that indicated by the approximation in (10). Recall that the iterative power allocation algorithm in Section II-A was designed based on (10). We can compensate for the difference between the approximation and the actual value of $x(\tau)$ by running the iterative power allocation in Algorithm 1 using a modified rate $R_{PA}$ which may be slightly different from the communication rate $R$. The choice of $R_{PA}$ directly affects the error concentration. We now discuss the mechanism for this effect and give guidelines for choosing $R_{PA}$ as a function of $R$.

If we run the power allocation algorithm with a modified rate $R_{PA} > R$, from (10) we see that additional power is allocated to the initial blocks, at the cost of less power for the final blocks (where the allocation is flat). Consequently, it is less likely that one of the initial sections will decode in error, but more likely that some number of the later sections will instead. Figure 9 (bottom) shows the effect of choosing a large $R_{PA}$: out of a total of trials, there were no trials with more than 7 sections decoded in error (the number of sections ); however, relatively few trials () have zero section errors.

Conversely, choosing $R_{PA} < R$ allocates less power to the initial blocks, and increases the power in the final sections which have a flat allocation. This increases the likelihood of the initial section being decoded in error; in a trial when this happens, there will be a large number of section errors. However, if the initial sections are decoded correctly, the additional power in the final sections increases the probability of the trial being completely error-free. Thus choosing $R_{PA} < R$ makes completely error-free trials more likely, but also increases the likelihood of having trials with a large number of sections in error. In Figure 9 (top), the smaller $R_{PA}$ gives zero or one section errors in the majority () of cases, but the remaining trials typically have a large number of sections in error.

To summarize, the larger the rate $R_{PA}$ used to construct the power allocation, the better the concentration of section error rates of individual trials around the overall average. However, increasing $R_{PA}$ beyond a point just increases the average section error rate because of too little power being allocated to the final sections.

For different values of the communication rate , we empirically determined an that gives the lowest average section error rate, by starting at and searching the neighborhood in steps of . Exceptionally, at low rates (for ), the optimal is found to be , leading to a completely flat power allocation with for all . We note from (10) that for , the large system limit theory does not predict that we can decode any of the sections — this is because no section is above the threshold in the first iteration of decoding. However, in practice, we observe that some sections will decode initially (due to the correct column being aligned favorably with the noise vector), and this reduces the threshold enough to allow subsequent decoding to continue in most cases. For , when closer to is used, the lower power in later sections hinders the finite length decoding performance.

We found that the value of that minimizes the average section error rate increases with . In particular, the optimal was for ; the optimal for was close to 1, and for , the optimal was between and . Though this provides a useful design guideline, a deeper theoretical analysis of the role of in optimizing the finite length performance is an open question.

Finally, a word of caution when empirically optimizing to minimize the average section error rate. Due to the loss of concentration as is decreased below , care must be taken to run sufficient trials to ensure that a rare unseen trial with many section errors will not catastrophically impact the overall average section error rate. For example, in one scenario with , we observed 192 trials with errors out of 407756 trials, but only 4 of these trials had more than one error, with between 400 to 600 section errors in those 4 cases. The average section error rate was . With fewer trials, it is possible that no trials with a large number of section errors would be observed, leading to an estimated error rate an order of magnitude better, at around .

## Iv Online Computation of τ2t and Early Termination

Recall that the update step (4) of the AMP decoder requires the SE coefficients , for . In the standard implementation [11], these coefficients are computed in advance using the SE equations (5)–(6). The total number of iterations is also determined in advance by computing the number of iterations required the SE to converge to its fixed point (to within a specified tolerance). This technique produced effective results, but advance computation is slow as each of the expectations in (6) needs to be computed numerically via Monte-Carlo simulation, for each . A faster approach is to compute the coefficients using the asymptotic expression for given in (10). This gives error performance nearly identical to the earlier approach with significant time savings, but still requires advance computation. Both these methods are referred to as “offline” as the values are computed a priori.

A simple way to estimate online during the decoding process is as follows. In each step , after producing as in (2), we estimate

$$\hat{\tau}_t^2 = \frac{\|z^t\|^2}{n} = \frac{1}{n}\sum_{i=1}^{n} z_i^2. \tag{15}$$

The justification for this estimate comes from the analysis of the AMP decoder in [11, 14], which shows that for large , is close to in (5) with high probability. In particular, [14] provides a concentration inequality for similar to (14). We note that a similar online estimate has been used previously in various AMP and GAMP algorithms [8, 9, 10, 15]. Here, we show that in addition to being fast, the online estimator permits an interpretation as a measure of SPARC decoding progress and provides a flexible termination criterion for the decoder. Furthermore, the error performance with the online estimator was observed to be the same or slightly better than the offline methods.
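The estimate in (15) amounts to a single pass over the residual vector. The following Python sketch illustrates it on synthetic data; the function name `online_tau_sq` and the Gaussian test vector are assumptions made for illustration and are not taken from the authors' implementation.

```python
import random


def online_tau_sq(z):
    """Empirical estimate of tau_t^2: squared norm of the residual z divided by its length n."""
    n = len(z)
    return sum(v * v for v in z) / n


# Illustration on synthetic data (an assumption, not a SPARC residual):
# a vector with i.i.d. N(0, tau^2) entries, for which the estimate
# concentrates around tau^2 as n grows.
random.seed(0)
tau_sq_true = 4.0
z = [random.gauss(0.0, tau_sq_true ** 0.5) for _ in range(20000)]
tau_sq_hat = online_tau_sq(z)
```

The cost is linear in the code length, so the estimate adds little to the per-iteration work of the decoder.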

Recall from the discussion at the beginning of Section II that in each step, we have

$$s^t := \beta^t + A^* z^t \approx \beta + \tau_t Z, \tag{16}$$

where is a standard normal random vector independent of . Starting from , a judicious choice of power allocation ensures that the SE parameter decreases with , until it converges at in a finite number of iterations .

However, at finite lengths there are deviations from this trajectory of predicted by SE, i.e., the variance of the effective noise vector may deviate from . The online estimator is found to track very accurately, even when this variance deviates significantly from . This effect can be seen in Figure 10, where 16 independent decoder runs are plotted and compared with the SE trajectory for (dashed line). For the 15 successful runs, the empirical variance approaches along different trajectories depending on how the decoding is progressing. In the unsuccessful run, converges to a value much larger than .

In all the runs, is indistinguishable from . This indicates that we can use the final value to accurately estimate the power of the undecoded sections — and thus the number of sections decoded correctly — at runtime. Indeed, is an accurate estimate of the total power in the incorrectly decoded sections. This, combined with the fact that the power allocation is non-increasing, allows the decoder to estimate the number of incorrectly decoded sections.

Furthermore, we can use the change in between iterations to terminate the decoder early. If the value has not changed between successive iterations, or the change is within some small threshold, then the decoder has stalled and no further iterations are worthwhile. Empirically we find that a stopping criterion with a small threshold (e.g., stop when ) leads to no additional errors compared to running the decoder for the full iteration count, while giving a significant speedup in most trials. Allowing a larger threshold for the stopping criterion gives even better running time improvements. This early termination criterion based on gives us flexibility in choosing the number of blocks in the iterative power allocation algorithm of Section II-A. This is because the number of AMP iterations is no longer tied to , hence can be chosen as large as desired.
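A stopping rule of this kind can be sketched as follows. This is a minimal illustration rather than the authors' decoder: `amp_step` is a hypothetical callable standing in for one AMP iteration, the absolute-difference test and the default tolerance are assumptions, and `toy_step` is a synthetic stand-in whose residual shrinks and then stalls.

```python
def decode_with_early_stop(amp_step, state, max_iters, tol=1e-6):
    """Iterate amp_step until the online estimate of tau^2 stops changing.

    amp_step (hypothetical) maps a decoder state to (new_state, residual z).
    Returns the final state, the last tau^2 estimate, and the iterations used.
    """
    tau_prev = None
    tau_sq = None
    iters = 0
    for _ in range(max_iters):
        state, z = amp_step(state)
        iters += 1
        tau_sq = sum(v * v for v in z) / len(z)  # online estimate, as in (15)
        if tau_prev is not None and abs(tau_prev - tau_sq) <= tol:
            break  # the estimate has stalled; further iterations are not worthwhile
        tau_prev = tau_sq
    return state, tau_sq, iters


def toy_step(scale):
    """Synthetic stand-in for an AMP iteration: the residual halves until it hits a floor of 1."""
    new_scale = max(0.5 * scale, 1.0)
    return new_scale, [new_scale] * 4


final_state, final_tau_sq, iters_used = decode_with_early_stop(toy_step, 8.0, max_iters=50)
```

In this toy run the loop exits after 4 of the 50 allowed iterations, once two successive estimates coincide. A relative threshold could be substituted for the absolute one without changing the structure.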

To summarize, the online estimator provides an estimate of the noise variance in each AMP iteration that accurately reflects how the decoding is progressing in that trial. It thereby enables the decoder to effectively adapt to deviations from the values predicted by SE. This explains the improved performance compared to the offline methods of computing . More importantly, it provides an early termination criterion for the AMP decoder as well as a way to track decoding progress and predict the number of section errors at runtime.

## V Predicting $E_{\mathrm{sec}}$, $E_{\mathrm{ber}}$ and $E_{\mathrm{cw}}$

For a given power allocation and reasonably large SPARC parameters , it is desirable to have a quick way to estimate the section error rate and codeword error rate, without resorting to simulations. Without loss of generality, we assume that the power allocation is asymptotically good: the large system limit SE parameters (computed using (10)) predict reliable decoding, that is, the SE converges to and in the large system limit. The goal is to estimate the finite length section error rate .

One way to estimate is via the state evolution prediction (13), using . However, computing (13) requires computing expectations, each involving a function of independent standard normal random variables. The following result provides estimates of and that are as accurate as the SE-based estimates, but much simpler to compute.

###### Proposition 1.

Let the power allocation be such that the state evolution iteration using the asymptotic approximation (10) converges to . Then, under the idealized assumption that (where is a standard normal random vector independent of ), we have the following. The probability of a section (chosen uniformly at random) being incorrectly decoded is

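$$\bar{E}_{\mathrm{sec}} = 1 - \frac{1}{L}\sum_{\ell=1}^{L} \mathbb{E}_U\!\left[\Phi\!\left(\frac{\sqrt{nP_\ell}}{\sigma} + U\right)^{M-1}\right]. \tag{17}$$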

The probability of the codeword being incorrectly decoded is

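$$\bar{E}_{\mathrm{cw}} = 1 - \prod_{\ell=1}^{L} \mathbb{E}_U\!\left[\Phi\!\left(\frac{\sqrt{nP_\ell}}{\sigma} + U\right)^{M-1}\right]. \tag{18}$$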

In both expressions above, is a standard normal random variable, and is the standard normal cumulative distribution function.

###### Proof:

As , the effective observation in the final iteration has the representation . The denoising function generates a final estimate based on this effective observation, and the index of the largest entry in each section is chosen to form the decoded message vector . Consider the decoding of section of . Without loss of generality, we can assume that the first entry of the section is the non-zero one. Using the notation to denote the th entry of the section , we therefore have , and for . As the effective observation for section has the representation , the section will be incorrectly decoded if and only if the following event occurs:

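$$\left\{\sqrt{nP_\ell} + \sigma Z_{\ell,1} \le \sigma Z_{\ell,2}\right\} \cup \ldots \cup \left\{\sqrt{nP_\ell} + \sigma Z_{\ell,1} \le \sigma Z_{\ell,M}\right\}.$$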

Therefore, the probability that the th section is decoded in error can be computed as

 (19)

where and denote the density and the cumulative distribution function of the standard normal distribution, respectively. In the second line of (19), we condition on and then use the fact that are i.i.d. .
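Written out, the conditioning argument described here takes the following form. This is a sketch reconstructed from the error event and the i.i.d. property used in the proof; the symbol $P_{\mathrm{err},\ell}$ for the error probability of section $\ell$ is notation assumed for this illustration.

```latex
\begin{align*}
P_{\mathrm{err},\ell}
&= 1 - \mathbb{P}\left(\sqrt{nP_\ell} + \sigma Z_{\ell,1} > \sigma Z_{\ell,j},\ 2 \le j \le M\right) \\
&= 1 - \int_{\mathbb{R}} \phi(u) \prod_{j=2}^{M}
   \mathbb{P}\left(Z_{\ell,j} < \frac{\sqrt{nP_\ell}}{\sigma} + u \,\middle|\, Z_{\ell,1} = u\right) du \\
&= 1 - \mathbb{E}_U\left[\Phi\left(\frac{\sqrt{nP_\ell}}{\sigma} + U\right)^{M-1}\right].
\end{align*}
```

The exponent $M-1$ sits inside the expectation because the $M-1$ comparisons become independent only after conditioning on $Z_{\ell,1}$.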

The probability of a section chosen uniformly at random being incorrectly decoded is . The probability of codeword error is one minus the probability that no section is in error, which is given by . Substituting for from (19) yields the expressions in (17) and (18).

The section error rate and codeword error rate can be estimated using the idealized expressions in (17) and (18). This still requires computing expectations, but each expectation is now a function of a single Gaussian random variable, rather than the independent ones in the SE estimate. Thus we reduce the complexity by a factor of over the SE approach; evaluations of and typically complete within a second.
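The one-dimensional expectations in (17) and (18) can be evaluated with elementary numerical integration. The sketch below uses a trapezoidal rule over a truncated range; the function names, the truncation at ±10, the grid size, and the example parameters are assumptions chosen for illustration, not the authors' code.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def section_success_prob(n, P_l, sigma, M, half_width=10.0, steps=4000):
    """Approximate E_U[ Phi(sqrt(n P_l)/sigma + U)^(M-1) ] by the trapezoidal rule."""
    a = math.sqrt(n * P_l) / sigma
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * std_normal_pdf(u) * std_normal_cdf(a + u) ** (M - 1)
    return total * h


def predict_error_rates(n, powers, sigma, M):
    """Return the idealized (section error rate, codeword error rate) for a power allocation."""
    probs = [section_success_prob(n, P_l, sigma, M) for P_l in powers]
    e_sec = 1.0 - sum(probs) / len(probs)
    e_cw = 1.0 - math.prod(probs)
    return e_sec, e_cw


# Illustrative parameters only (not taken from the paper): flat allocation over 8 sections.
e_sec, e_cw = predict_error_rates(n=64, powers=[0.5] * 8, sigma=1.0, M=16)
```

For a flat allocation the per-section probability could be computed once and reused; the loop over all sections is kept so that non-flat allocations work unchanged.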

Figure 7 shows alongside the SE estimate for , and various values of . We see that both these estimates match the simulation results closely up to a certain value of . Beyond this point, the simulation results diverge from theoretical estimates due to lack of concentration in section error rates across trials, as described in Sec. III-A. Figure 11 compares the idealized codeword error probability in (18) with that obtained from simulations. Here, there is a good match between the estimate and the simulation results as the concentration of section error rates across trials plays no role — any trial with one or more section errors corresponds to one codeword error.

## VI Comparison with Coded Modulation

In this section, we compare the performance of AMP-decoded SPARCs against coded modulation with LDPC codes. Specifically, we compare with two instances of coded modulation with LDPC codes from the WiMAX standard IEEE 802.16e: 1) A -QAM constellation with a rate LDPC code for an overall rate bit/channel use/real dimension, and 2) A -QAM constellation with a rate LDPC code for an overall rate bits/channel use/real dimension. (The spectral efficiency is bits/s/Hz.) The coded modulation results, shown in dashed lines in Figures 12 and 13, are obtained using the CML toolkit [16] with LDPC code lengths and .

Each figure compares the bit error rates (BER) of the coded modulation schemes with various SPARCs of the same rate, including a SPARC with a matching code length of . Using and , the signal-to-noise ratio of the SPARC can be expressed as . The SPARCs are implemented using Hadamard-based design matrices, power allocation designed using the iterative algorithm in Sec. II-A with , and online parameters with the early termination criterion (Sec. IV). An IPython notebook detailing the SPARC implementation is available at [17].

Figure 12 shows that for , the best value of among those considered increases from at lower snr values to at higher snr values. This is due to the effect discussed in Section III-A, where larger snr values can support larger values of , before performance starts degrading due to loss of concentration.

At both and , the SPARCs outperform the LDPC coded modulation at values close to the Shannon limit, but the error rate does not drop off as quickly at higher values of . One way to enhance SPARC performance at higher snr is by treating it as a high-dimensional modulation scheme and adding an outer code. This is the focus of the next section.

## VII AMP with Partial Outer Codes

Figures 12 and 13 show that for block lengths of the order of a few thousand, AMP-decoded SPARCs do not exhibit a steep waterfall in section error rate. Even at high values, it is still common to observe a small number of section errors. If these could be corrected, we could hope to obtain a sharp waterfall behavior similar to the LDPC codes.

In the simulations of the AMP decoder described above, when and are chosen such that the average error rates are well-concentrated around the state evolution prediction, the number of section errors observed is similar across trials. Furthermore, we observe that the majority of sections decoded incorrectly are those in the flat region of the power allocation, i.e., those with the lowest allocated power. This suggests we could use a high-rate outer code to protect just these sections, sacrificing some rate, but less than if we naïvely protected all sections. We call the sections covered by the outer code protected sections, and conversely the earlier sections which are not covered by the outer code are unprotected. In [4], it was shown that a Reed-Solomon outer code (that covered all the sections) could be used to obtain a bound on the probability of codeword error from a bound on the probability of excess section error rate.

Encoding with an outer code (e.g., LDPC or Reed-Solomon code) is straightforward: just replace the message bits corresponding to the protected sections with coded bits generated using the usual encoder for the chosen outer code. To decode, we would like to obtain bit-wise posterior probabilities for each codeword bit of the outer code, and use them as inputs to a soft-information decoder, such as a sum-product or min-sum decoder for LDPC codes. The output of the AMP decoding algorithm permits this: it yields , which contains weighted section-wise posterior probabilities; we can directly transform these into bit-wise posterior probabilities. See Algorithm 2 for details.

Moreover, in addition to correcting AMP decoding errors in the protected sections, successfully decoding the outer code also provides a way to correct remaining errors in the unprotected sections of the SPARC codeword. Indeed, after decoding the outer code we can subtract the contribution of the protected sections from the channel output sequence , and re-run the AMP decoder on just the unprotected sections. The key point is that subtracting the contribution of the later (protected) sections eliminates the interference due to these sections; then running the AMP decoder on the unprotected sections is akin to operating at a much lower rate.

Thus the decoding procedure has three stages: i) first round of AMP decoding, ii) decoding the outer code using soft outputs from the AMP, and iii) subtracting the contribution of the sections protected by the outer code, and running the AMP decoder again for the unprotected sections. We find that the final stage, i.e., running the AMP decoder again after the outer code recovers errors in the protected sections of the SPARC, provides a significant advantage over a standard application of an outer code, i.e., decoding the final codeword after the second stage.

We describe this combination of SPARCs with outer codes below, using an LDPC outer code. The resulting error rate curves exhibit sharp waterfalls in final error rates, even when the LDPC code only covers a minority of the SPARC sections.

We use a binary LDPC outer code with rate , block length and code dimension , so that . For clarity of exposition we assume that both and are multiples of (and consequently that is a power of two). As each section of the SPARC corresponds to bits, if is an integer, then and bits represent an integer number of SPARC sections, denoted by

$$L_{\mathrm{LDPC}} = \frac{n_{\mathrm{LDPC}}}{\log M} \quad \text{and} \quad L_{\mathrm{protected}} = \frac{k_{\mathrm{LDPC}}}{\log M},$$

respectively. The assumption that and are multiples of is not necessary in practice; the general case is discussed at the end of the next subsection.

We partition the sections of the SPARC codeword as shown in Fig. 14. There are sections corresponding to the user (information) bits; these sections are divided into unprotected and protected sections, with only the latter being covered by the outer LDPC code. The parity bits of the LDPC codeword index the last sections of the SPARC. For convenience, the protected sections and the parity sections together are referred to as the LDPC sections.

For a numerical example, consider the case where , . There are bits per SPARC section. For a LDPC code () we obtain the following relationships between the number of the sections of each kind:

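$$\begin{aligned}
L_{\mathrm{parity}} &= \frac{n_{\mathrm{LDPC}} - k_{\mathrm{LDPC}}}{\log M} = \frac{5120 - 4096}{8} = 128, \\
L_{\mathrm{user}} &= L - L_{\mathrm{parity}} = 1024 - 128 = 896, \\
L_{\mathrm{protected}} &= \frac{k_{\mathrm{LDPC}}}{\log M} = \frac{4096}{8} = 512, \\
L_{\mathrm{LDPC}} &= L_{\mathrm{protected}} + L_{\mathrm{parity}} = 512 + 128 = 640, \\
L_{\mathrm{unprotected}} &= L_{\mathrm{user}} - L_{\mathrm{protected}} = L - L_{\mathrm{LDPC}} = 384.
\end{aligned}$$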

There are user bits, of which the final are encoded to a systematic -bit LDPC codeword. The resulting bits (including both the user bits and the LDPC parity bits) are encoded to a SPARC codeword using the SPARC encoder and power allocation described in previous sections.

We continue to use to denote the overall user rate, and to denote the SPARC code length so that . The underlying SPARC rate (including the overhead due to the outer code) is denoted by . We note that , hence . For example, with and and the outer code parameters as chosen above, , so .
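The bookkeeping in this numerical example is easy to check mechanically. The sketch below recomputes the section counts and the rate overhead; the function names are hypothetical, $M = 256$ is inferred from the 8 bits per section used in the example, the user rate of 1.0 is purely illustrative, and the relation $R_{\mathrm{SPARC}} = R \cdot L / L_{\mathrm{user}}$ assumes that the user rate counts $L_{\mathrm{user}} \log M$ bits over the same code length that carries all $L \log M$ bits.

```python
def partition_sections(L, M, n_ldpc, k_ldpc):
    """Split L SPARC sections into unprotected, protected and parity sections."""
    bits_per_section = M.bit_length() - 1  # log2(M) when M is a power of two
    if 2 ** bits_per_section != M:
        raise ValueError("M is assumed to be a power of two")
    if n_ldpc % bits_per_section or k_ldpc % bits_per_section:
        raise ValueError("n_ldpc and k_ldpc are assumed to be multiples of log2(M)")
    parity = (n_ldpc - k_ldpc) // bits_per_section
    protected = k_ldpc // bits_per_section
    user = L - parity
    ldpc = protected + parity
    unprotected = user - protected
    return {
        "parity": parity,
        "user": user,
        "protected": protected,
        "ldpc": ldpc,
        "unprotected": unprotected,
    }


def sparc_rate(user_rate, L, L_user):
    """Underlying SPARC rate implied by the user rate (assumed relation, see lead-in)."""
    return user_rate * L / L_user


counts = partition_sections(L=1024, M=256, n_ldpc=5120, k_ldpc=4096)
r_sparc = sparc_rate(1.0, 1024, counts["user"])  # illustrative user rate of 1.0
```

With these inputs the counts agree with the values listed in the example, and the overhead factor is $1024/896 = 8/7$.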

### VII-A Decoding SPARCs with LDPC outer codes

At the receiver, we decode as follows:

1. Run the AMP decoder to obtain . Recall that entry within section of is proportional to the posterior probability of the column being the transmitted one for section . Thus the AMP decoder gives section-wise posterior probabilities for each section .

2. Convert the section-wise posterior probabilities to bit-wise posterior probabilities using Algorithm 2, for each of the sections. This requires time complexity, of the same order as one iteration of AMP.

3. Run the LDPC decoder using the bit-wise posterior probabilities obtained in Step 2 as inputs.

4. If the LDPC decoder fails to produce a valid LDPC codeword, terminate decoding here, using to produce by selecting the maximum value in each section (as per usual AMP decoding).

5. If the LDPC decoder succeeds in finding a valid codeword, we use it to re-run AMP decoding on the unprotected sections. For this, first convert the LDPC codeword bits to a partial as follows, using a method similar to the original SPARC encoding:

   a) Set the first entries of to zero.

   b) The remaining sections (with entries per section) of will have exactly one non-zero entry per section, with the LDPC codeword determining the location of the non-zero entry in each section. Indeed, noting that , we consider the LDPC codeword as a concatenation of blocks of bits each, so that each block of bits indexes the location of the non-zero entry in one section of . The value of the non-zero entry in section is set to , as per the power allocation.

Now subtract the codeword corresponding to from the original channel output , to obtain .

6. Run the AMP decoder again, with input , and operating only over the first sections. As this operation is effectively at a much lower rate than the first decoder (since the interference contribution from all the protected sections is removed), it is more likely that the unprotected bits are decoded correctly than in the first AMP decoder.

We note that instead of generating , one could run the AMP decoder directly on , but enforcing that in each AMP iteration, each of the sections has all its non-zero mass on the entry determined by , i.e., consistent with Step 5.b).

7. Finish decoding, using the output of the final AMP decoder to find the first elements of , and using for the remaining elements.
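The conversion in Step 2 can be illustrated with a short sketch. This is not Algorithm 2 itself: the function name is hypothetical, the entries of one section are assumed to be non-negative weights proportional to the column posteriors, and the bit ordering (most significant bit first, column index read as a binary number) is an assumption.

```python
def section_to_bit_posteriors(section_weights):
    """Map the M weights of one section to log2(M) probabilities that each bit equals 1."""
    M = len(section_weights)
    bits_per_section = M.bit_length() - 1  # log2(M) when M is a power of two
    if 2 ** bits_per_section != M:
        raise ValueError("section length is assumed to be a power of two")
    total = sum(section_weights)
    prob_one = [0.0] * bits_per_section
    for index, weight in enumerate(section_weights):
        p = weight / total  # normalize so the section sums to one
        for b in range(bits_per_section):
            # Bit b (MSB first) of the column index decides whether p counts towards "bit = 1".
            if (index >> (bits_per_section - 1 - b)) & 1:
                prob_one[b] += p
    return prob_one


# Example with M = 4: columns 0..3 correspond to bit pairs 00, 01, 10, 11.
bit_probs = section_to_bit_posteriors([0.1, 0.2, 0.3, 0.4])
```

Each section costs on the order of $M \log M$ operations in this form. The resulting probabilities can be turned into log-likelihood ratios if the soft-input decoder expects them.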

In the case where and are not multiples of , the values and will not be integers. Therefore one section at the boundary of and will consist of some unprotected bits and some protected bits. Encoding is not affected in this situation, as the LDPC encoding happens prior to SPARC codeword encoding. When decoding, conversion to bit-wise posterior probabilities is performed for all sections containing LDPC bits (including the intermediate section at the boundary) and only the bit posteriors corresponding to the LDPC codeword are given to the LDPC decoder. When forming , the simplest option is to treat the intermediate section as though it were unprotected and set it to zero. It is also possible to compute column posterior probabilities which correspond to the fixed LDPC bits and probabilities arising from , though doing so is not covered in this paper.
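For the simpler case in which the LDPC codeword occupies a whole number of sections, Step 5 of the procedure can be sketched as follows. The function names are hypothetical, the non-zero value $\sqrt{nP_\ell}$ is assumed from the form of the error event used in Section V, the bit ordering is most significant bit first, and the design matrix is represented as a plain list of rows purely for illustration.

```python
import math


def build_partial_beta(codeword_bits, L, M, n, powers):
    """Build a length L*M vector that is zero on the unprotected sections and has one
    non-zero entry per LDPC section, located by the corresponding codeword bits."""
    bits_per_section = M.bit_length() - 1  # log2(M) when M is a power of two
    L_ldpc = len(codeword_bits) // bits_per_section
    L_unprotected = L - L_ldpc
    beta = [0.0] * (L * M)
    for s in range(L_ldpc):
        chunk = codeword_bits[s * bits_per_section:(s + 1) * bits_per_section]
        index = 0
        for bit in chunk:  # read the chunk as a binary number, MSB first
            index = (index << 1) | bit
        section = L_unprotected + s
        beta[section * M + index] = math.sqrt(n * powers[section])
    return beta


def subtract_contribution(y, A, beta):
    """Return y - A beta, with A given as a list of rows."""
    return [y_i - sum(a * b for a, b in zip(row, beta)) for y_i, row in zip(y, A)]


# Tiny illustration: L = 3 sections of M = 4 entries, two of them indexed by the codeword.
beta_ldpc = build_partial_beta([1, 0, 1, 1], L=3, M=4, n=8, powers=[0.5, 0.5, 2.0])
```

A practical implementation would apply the design matrix through a fast transform rather than an explicit matrix product; the explicit form is used here only to keep the sketch self-contained.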

### VII-B Simulation results

The combined AMP and outer LDPC setup described above was simulated using the (5120, 4096) LDPC code () specified in [18] with a min-sum decoder. Bit error rates were measured only over the user bits, ignoring any bit errors in the LDPC parity bits.

Figure 15 plots results at overall rate , where the underlying LDPC code (modulated with BPSK) can be compared to the SPARC with LDPC outer code, and to a plain SPARC with rate . In this case , giving a flat power allocation. Figure 16 plots results at overall rate , where we can compare to the QAM-64 WiMAX LDPC code, and to the plain SPARC with rate 1.5 of Figure 13.

The plots show that protecting a fraction of sections with an outer code does provide a steep waterfall above a threshold value of . Below this threshold, the combined SPARC + outer code has worse performance than the plain rate SPARC without the outer code. This can be explained as follows. The combined code has a higher SPARC rate , which leads to a larger section error rate for the first AMP decoder, and consequently, to worse bit-wise posteriors at the input of the LDPC decoder. For below the threshold, the noise level at the input of the LDPC decoder is beyond the error-correcting capability of the LDPC code, so the LDPC code effectively does not correct any section errors. Therefore the overall performance is worse than the performance without the outer code.

Above the threshold, we observe that the second AMP decoder (after subtracting the contribution of the LDPC-protected sections) is successful at decoding the unprotected sections that were initially decoded incorrectly. This is especially apparent in the case (Figure 15), where the section errors are uniformly distributed over all sections due to the flat power allocation; errors are just as likely in the unprotected sections as in the protected sections.

### VII-C Outer code design choices

In addition to the various SPARC parameters discussed in previous sections, performance with an outer code is sensitive to what fraction of sections are protected by the outer code. When more sections are protected by the outer code, the overhead of using the outer code is also higher, driving higher for the same overall user rate . This leads to worse performance in the initial AMP decoder, which has to operate at the higher rate . As discussed above, if is increased too much, the bit-wise posteriors input to the LDPC decoder are degraded beyond its ability to successfully decode, giving poor overall performance.

Since the number of sections covered by the outer code depends on both and , various trade-offs are possible. For example, given , choosing a larger value of corresponds to fewer sections being covered by the outer code. This results in smaller rate overhead, but increasing may also affect concentration of the error rates around the SE predictions, as discussed in Section III-A. We conclude with two remarks about the choice of parameters for the SPARC and the outer code.

1. When using an outer code, it is highly beneficial to have good concentration of the section error rates for the initial AMP decoder. This is because a small number of errors in a single trial can usually be fully corrected by the outer code, while occasional trials with a very large number of errors cannot.

2. Due to the second AMP decoder operation, it is not necessary for all sections with low power to be protected by the outer code. For example, in Figure 15, all sections have equal power, and around 30% are not protected by the outer code. Consequently, these sections are often not decoded correctly by the first decoder. Only once the protected sections are removed is the second decoder able to correctly decode these unprotected sections. In general the aim should be to cover all or most of the sections in the flat region of the power allocation, but experimentation is necessary to determine the best trade-off.

An interesting direction for future work would be to develop an EXIT chart analysis to jointly optimize the design of the SPARC and the outer LDPC code.

## Acknowledgement

The authors thank the Editor and the anonymous referees for several helpful comments which improved the paper.

## References

• [1] R. G. Gallager, Information theory and reliable communication.   Springer, 1968.