# A FFAST framework for computing a k-sparse DFT in O(k log k) time using sparse-graph alias codes

Sameer Pawar and Kannan Ramchandran
Dept. of Electrical Engineering and Computer Sciences
University of California, Berkeley
{spawar, kannanr}@eecs.berkeley.edu
The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Istanbul, Turkey, 2013.
###### Abstract

Given an n-length input signal x, it is well known that its Discrete Fourier Transform (DFT), X, can be computed from n samples in O(n log n) operations using a Fast Fourier Transform (FFT) algorithm. If the spectrum X is k-sparse (where k ≪ n), can we do better? We show that asymptotically in k and n, when k is sub-linear in n (precisely, k = O(n^δ), where 0 < δ < 1), and the support of the non-zero DFT coefficients is uniformly random, our proposed FFAST (Fast Fourier Aliasing-based Sparse Transform) algorithm computes the DFT X from O(k) samples in O(k log k) arithmetic operations, with high probability. Further, the constants in the big-Oh notation for both the sample and the computational cost are small, e.g., when δ < 0.99, which essentially covers almost all practical cases of interest, the sample cost is less than 4k.

Our approach is based on filterless subsampling of the input signal x using a set of carefully chosen uniform subsampling patterns guided by the Chinese Remainder Theorem (CRT). The idea is to cleverly exploit, rather than avoid, the aliasing artifacts induced by subsampling. Specifically, the subsampling operation on x is designed to create aliasing patterns on the spectrum X that “look like” parity-check constraints of a good erasure-correcting sparse-graph code. Next, we show that computing the sparse DFT X is equivalent to decoding such sparse-graph codes. These codes further allow for fast peeling-style decoding. The resulting DFT computation is low in both sample complexity and decoding complexity. We analytically connect our proposed CRT-based aliasing framework to random sparse-graph codes, and analyze the performance of our algorithm using density evolution techniques from coding theory. We also provide simulation results that are in tight agreement with our theoretical findings.

## I Introduction

Spectral analysis using the Discrete Fourier Transform (DFT) has been of universal importance in engineering and scientific applications for a long time. The Fast Fourier Transform (FFT) is the fastest known way to compute the DFT of an arbitrary n-length signal, and has a computational complexity of O(n log n)111Recall that a single-variable function f(n) is said to be O(g(n)) if, for sufficiently large n, f(n) is bounded above by a constant multiple of g(n), i.e., f(n) ≤ c·g(n) for some constant c > 0. Similarly, f(n) = Ω(g(n)) if g(n) = O(f(n)), and f(n) = o(g(n)) if the growth rate of f(n), as n → ∞, is negligible compared to that of g(n), i.e., lim_{n→∞} f(n)/g(n) = 0.. Many applications of interest involve signals, e.g., audio, image, video data, biomedical signals etc., which have a sparse Fourier spectrum. In such cases, a small subset of the spectral components typically contains most or all of the signal energy, with most spectral components being either zero or negligibly small. If the n-length DFT, X, is k-sparse, where k ≪ n, can we do better in terms of both the sample and the computational complexity of computing the sparse DFT? We answer this question affirmatively. In particular, we show that asymptotically in k and n, when k is sub-linear in n (precisely, k = O(n^δ), where 0 < δ < 1), and the support of the non-zero DFT coefficients is uniformly random, our proposed FFAST (Fast Fourier Aliasing-based Sparse Transform) algorithm computes the DFT X from O(k) judiciously chosen samples in O(k log k) arithmetic operations, with high probability. Further, the constants in the big-Oh notation for both the sample and the computational cost are small, e.g., when δ < 0.99, which essentially covers almost all practical cases of interest, the sample cost is less than 4k.

At a high level, our idea is to cleverly exploit rather than avoid the aliasing resulting from subsampling, and to shape the induced spectral artifacts to look like parity constraints of “good” erasure-correcting codes, e.g., Low-Density-Parity-Check (LDPC) codes [1] and fountain codes [2], that have both low computational complexity and are near-capacity achieving. Towards our goal of shaping the spectral artifacts to look like parity constraints of good erasure-correcting codes, we are challenged by not being able to design any arbitrary spectral code at will, as we can control only the subsampling operation on the time-domain signal. The key idea is to design subsampling patterns, guided by the Chinese-Remainder-Theorem (CRT) [3], that create the desired code-like aliasing patterns. Based on the qualitative nature of the subsampling patterns needed, our analysis is decomposed into two regimes (see Section VI and Section VII for more details):

• The “very-sparse” regime, where k = O(n^δ) for 0 < δ ≤ 1/3. For the very-sparse regime the subsampling patterns are based on relatively co-prime numbers.

• The “less-sparse” regime, where k = O(n^δ) for 1/3 < δ < 1. The subsampling patterns for the less-sparse regime comprise “cyclically-shifted” overlapping co-prime integers.

Our approach is summarized in Fig. 1. The n-length input signal x is uniformly subsampled, through a small number222We show that the number of stages d depends on the sparsity index δ, and is in the range of 3 to 9 for 0 < δ < 0.99, as shown in Table II. of stages, say d. Each stage subsamples the input signal and its circularly shifted version by a CRT-guided set (see Sections VI and VII) of sampling periods, to generate n_i samples per sub-sampling path, for i = 0, 1, …, d−1. Next, the large n-length DFT is synthesized from the much smaller (n_i-length) DFTs, using the FFAST back-end peeling decoder. For the entire range of practical interest of sub-linear sparsity, i.e., 0 < δ < 0.99, the FFAST algorithm computes the n-length k-sparse DFT in O(k log k) computations from less than 4k samples. It is gratifying to note that both the sample complexity and the computational complexity depend only on the sparsity parameter k, which is sub-linear in n.

We emphasize the following caveats. First, our analytical results are probabilistic, and hold for asymptotic values of k and n, with a success probability that approaches 1 asymptotically. This contrasts with the FFT algorithm, which works deterministically for all values of k and n. Secondly, we assume a uniformly random model for the support of the non-zero DFT coefficients. Lastly, we require the signal length n to be a product of a few (typically 3 to 9) distinct primes333This is not a major restriction for two reasons. Firstly, in many problems of interest, the choice of n is available to the system designer, and choosing n to be a power of 2 is often invoked only to take advantage of the readily-available radix-2 FFT algorithms. Secondly, by truncating or zero-padding the given signal by a constant number of samples, one can obtain a modified signal of a desired length. The DFT of the modified signal has dominant non-zero DFT coefficients at the same frequencies as the original signal but with additional noise-like side-lobes, due to the windowing effect, and can be decoded using a noise-robust version of the FFAST algorithm [4].. In effect, our algorithm trades off sample and computational complexity for asymptotically-zero probability-of-failure guarantees, under a uniformly random model of the support. The FFAST algorithm is applicable whenever k is sub-linear in n (i.e., k = o(n)), but is obviously most attractive when k is much smaller than n. As a concrete example, our algorithm can achieve substantial computational savings, as well as savings in the number of input samples, over the standard FFT (see [5] for the computational complexity of a prime-factor FFT algorithm). This can be a significant advantage in many existing applications, as well as enable new classes of applications not thought practical so far.

The focus of this paper is on signals having an exactly-sparse DFT. Our motivation for this focused study is three-fold: (i) to provide conceptual clarity of our proposed approach in a noiseless setting; (ii) to present our deterministic subsampling front-end measurement subsystem as a viable alternative to the class of randomized measurement matrices popular in the compressive sensing literature [6, 7]; and (iii) to explore the fundamental limits on both sample complexity and computational complexity for the exactly-sparse DFT problem, which is of intellectual interest. In addition, the key insights derived from analyzing the exactly-sparse signal model apply more broadly to the noisy setting (see discussion in Section X), i.e., where the observations are further corrupted by noise [4].

The rest of the paper is organized as follows. Section II states the problem and introduces important notations. Section III presents our main results, and the related literature is reviewed in Section IV. Section V exemplifies the mapping of the problem of computing a sparse DFT to decoding over an appropriate sparse-graph code. In Section VI and Section VII we analyze the performance of the FFAST algorithm for the very-sparse and the less-sparse regimes respectively, and Section VIII analyzes its sample and computational complexity. Section IX provides simulation results that empirically corroborate our theoretical findings. The paper is concluded in Section X, with a few remarks about extending the FFAST framework for noise robustness and other applications.

## II Problem formulation, notations and preliminaries

### II-A Problem formulation

Consider an n-length discrete-time signal x that is a sum of k complex exponentials, i.e., its n-length discrete Fourier transform X has k non-zero coefficients:

x[p] = ∑_{q=0}^{k−1} X[ℓ_q] e^{2πı ℓ_q p / n},    p = 0, 1, …, n−1, (1)

where the discrete frequencies ℓ_q ∈ {0, 1, …, n−1} and the amplitudes X[ℓ_q] are complex, for q = 0, 1, …, k−1. We further assume that the discrete frequencies ℓ_q are uniformly random in 0 to n−1 and that the amplitudes are drawn from some continuous distribution over the complex numbers. Under these assumptions, we consider the problem of identifying the k unknown frequencies ℓ_q and the corresponding complex amplitudes X[ℓ_q] from the time-domain samples of x. A straightforward solution is to compute the n-length DFT, X, using a standard FFT algorithm [3], and locate the non-zero coefficients. Such an algorithm requires n samples and O(n log n) computations. When the DFT X is known to be exactly k-sparse and k ≪ n, computing all n DFT coefficients seems redundant.
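To make the signal model in (1) concrete, here is a minimal Python sketch (ours, not from the paper) that synthesizes a signal with a k-sparse spectrum and recovers its DFT by the naive O(n²) sum; the 1/n normalization in `full_dft` is chosen to match the synthesis convention of equation (1):

```python
import cmath

def sparse_signal(n, freqs, amps):
    """Synthesize x per equation (1): x[p] = sum_q X[l_q] * e^{2*pi*i*l_q*p/n}."""
    return [sum(a * cmath.exp(2j * cmath.pi * l * p / n)
                for l, a in zip(freqs, amps))
            for p in range(n)]

def full_dft(x):
    """Naive reference DFT; the 1/n factor matches the convention of (1)."""
    n = len(x)
    return [sum(xp * cmath.exp(-2j * cmath.pi * q * p / n)
                for p, xp in enumerate(x)) / n
            for q in range(n)]
```

For example, with n = 20 and non-zero coefficients X[1] = 1 and X[10] = 3, `full_dft(sparse_signal(20, [1, 10], [1, 3]))` is zero everywhere except at those two locations.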

In this paper, we address the problem of designing an algorithm to compute the k-sparse n-length DFT X of x in the asymptotic regime of k and n. We would like the algorithm to have the following features:

• it takes as few input samples m of x as possible.

• it has a low computational cost, that is possibly a function of only the number of input samples m.

• it is applicable for the entire sub-linear regime, i.e., for all k = O(n^δ), where 0 < δ < 1.

• it computes the DFT X with a probability of failure vanishing to 0 as n becomes large.

In Table I, we provide notations and definitions of the important parameters used in the rest of the paper.

### II-B The Chinese-Remainder-Theorem (CRT)

The CRT plays an important role in our proposed FFAST architecture as well as in the FFAST decoder. For integers a and n, we use (a)_n to denote the modulo operation, i.e., (a)_n ≡ a mod n.

###### Theorem II.1 (Chinese-Remainder-Theorem [3]).

Suppose n_0, n_1, …, n_{d−1} are pairwise co-prime positive integers and n = ∏_{i=0}^{d−1} n_i. Then, every integer ‘a’ between 0 and n−1 is uniquely represented by the sequence of its remainders modulo n_0, n_1, …, n_{d−1} respectively, and vice-versa.

Further, given a sequence of remainders r_0, r_1, …, r_{d−1}, where 0 ≤ r_i < n_i, one can find an integer ‘a’ such that,

(a)_{n_i} ≡ r_i  for i = 0, 1, …, d−1. (2)

For example, consider the pairwise co-prime integers n_0 = 3, n_1 = 4 and n_2 = 5. Then, given the sequence of remainders (r_0, r_1, r_2) = (2, 2, 3), there exists a unique integer ‘a’ such that,

2 ≡ a mod 3
2 ≡ a mod 4 (3)
3 ≡ a mod 5

It is easy to verify that a = 38 is the unique integer, between 0 and 59, that satisfies the congruences in (3).
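The reconstruction step of Theorem II.1 can be sketched in a few lines of Python (a standard CRT solver, assuming Python 3.8+ for the modular inverse `pow(m, -1, n)`):

```python
from math import gcd, prod

def crt(remainders, moduli):
    """Return the unique a in [0, prod(moduli)) with a ≡ r_i (mod n_i)."""
    assert all(gcd(p, q) == 1
               for i, p in enumerate(moduli) for q in moduli[i + 1:])
    n = prod(moduli)
    a = 0
    for r, ni in zip(remainders, moduli):
        mi = n // ni                    # product of the other moduli
        a += r * mi * pow(mi, -1, ni)   # modular inverse of mi modulo ni
    return a % n
```

`crt([2, 2, 3], [3, 4, 5])` returns 38, matching the example in (3).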

## III Main Results

We propose a novel FFAST architecture and algorithm to compute the k-sparse n-length DFT, X, of an n-length signal x. In this paper, an n-length input signal x is said to have a k-sparse DFT X if X has at most k non-zero coefficients, whose locations are uniformly randomly distributed in {0, 1, …, n−1}. The FFAST algorithm computes the k-sparse n-length DFT, with high probability, using as few as O(k) samples of x and O(k log k) computations. The following theorem states the main result of the paper.

###### Theorem III.1.

For any 0 < δ < 1, and large enough n, the FFAST algorithm computes a k-sparse DFT X of an n-length input x, where k = O(n^δ), with the following properties:

1. Sample complexity: The algorithm needs m = rk samples of x, where r is a small constant that depends on the sparsity index δ;

2. Computational complexity: The FFAST algorithm computes the DFT X using O(k log k) arithmetic operations.

3. Probability of success: The probability that the algorithm correctly computes the k-sparse DFT X approaches 1 asymptotically in n.

###### Proof.

We prove the theorem in three parts. In Section VI, we analyze the performance of the FFAST algorithm for the very-sparse regime of 0 < δ ≤ 1/3, and in Section VII we analyze the less-sparse regime of 1/3 < δ < 1. Lastly, in Section VIII we analyze the sample and computational complexity of the FFAST algorithm.∎

###### Remark III.2.

[Oversampling ratio r] The achievable oversampling ratio r in the sample complexity m = rk depends on the number of stages d used in the FFAST architecture. The number of stages d, in turn, is a function of the sparsity index δ (recall k = O(n^δ)), and increases as δ → 1 (i.e., as the number of non-zero coefficients, k, approaches the linear regime in n). In Sections VI and VII, we provide constructions of the FFAST front-end architecture that require an increasing number of stages as δ increases from 0 to 1. In our proposed construction, the increase in d occurs over intervals of δ, resulting in a staircase plot as shown in Fig. 2. Note that the plot in Fig. 2 is an achievable performance of the FFAST algorithm using the constructions described in Section VI-D and Section VII-D. There are many other constructions of the FFAST architecture that may achieve similar or better performance for a specific sparsity index δ. Table II provides some example values of d and r for different values of the sparsity index δ.

## IV Related work

The problem of computing a sparse discrete Fourier transform of a signal is related to the rich literature on frequency estimation [8, 9, 10, 11] in statistical signal processing, as well as to compressive sensing [6, 7]. In frequency estimation, it is assumed that a signal consists of k complex exponentials embedded in additive noise. The frequency estimation techniques are based on well-studied statistical methods like MUSIC and ESPRIT [8, 9, 10, 11], where the focus is on ‘super-resolution’ spectral estimation from an initial set of few consecutive samples, i.e., extrapolation. In contrast, the algorithm proposed in this paper combines tools from coding theory, number theory, graph theory and signal processing to ‘interpolate’ the signal from interspersed but significantly fewer samples. In compressive sensing, the objective is to reliably reconstruct the sparse signal from as few measurements as possible, using a fast recovery technique. The bulk of this literature concentrates on random linear measurements, followed by either convex programming or greedy pursuit reconstruction algorithms [7, 12, 13]. An alternative approach, in the context of sampling a continuous-time signal with a finite rate of innovation, is explored in [14, 15, 16, 17]. Unlike the compressive sensing problem, where the resources to be optimized are the number of measurements444Consider a compressive sensing problem with a measurement matrix A, i.e., y = Ax, where y is the measurement vector and x is the input signal. Then, the sample complexity is equal to the number of non-zero columns of A and the measurement complexity is equal to the number of non-zero rows of A. and the recovery cost, in our problem we want to minimize the number of input samples needed by the algorithm in addition to the recovery cost.

At a higher level, despite some key differences in our approach to the problem of computing a sparse DFT, our problem is closely related to the spectral estimation and compressive sensing literature; our approach is naturally inspired by, and draws from, the rich set of tools offered by this literature.

A number of previous works [18, 19, 20, 21, 22] have addressed the problem of computing a 1-D DFT of a discrete-time signal that has a sparse Fourier spectrum, in sub-linear sample and time complexity. Most of these algorithms achieve a sub-linear time performance by first isolating the non-zero DFT coefficients into different bins, using specific filters or windows that have ‘good’ (concentrated) support in both time and frequency. The non-zero DFT coefficients are then recovered iteratively, one at a time. The filters or windows used for the binning operation are typically of length O(k log n). As a result, the sample and computational complexity is typically O(k log n) or more. Moreover, the constants involved in the big-Oh notation can be large, e.g., the empirical evaluation of [19] presented in [23] shows that the number of samples required is substantially more than the sample complexity of the FFAST algorithm555As mentioned earlier, the FFAST algorithm requires the length of the signal n to be a product of a few distinct primes. Hence, the comparison is for equivalent values of n and k.. The work of [20] provides an excellent tutorial on some of the key ideas used by most sub-linear time sparse FFT algorithms in the literature. While we were writing this paper, we became aware of a recent work [24], in which the authors consider the problem of computing a noisy as well as an exactly-sparse 2-D DFT of a √n × √n sized signal. For an exactly sparse signal, and when k = O(√n), the algorithm in [24] uses O(k) samples to compute the 2-D DFT of the signal in O(k log k) time, with a constant probability of failure (that is controllable, but that does not appear to go to zero asymptotically). In [25], the author proposes sub-linear time algorithms to compute a sparse DFT with high probability or with zero error, respectively. The algorithm in [25] exploits the Chinese-Remainder-Theorem, along with a large number of subsampling patterns, to identify the locations of the non-zero DFT coefficients. In contrast, the FFAST algorithm exploits the CRT to induce ‘good’ sparse-graph codes using a small constant number of subsampling patterns, and computes the sparse DFT with a vanishing probability of failure.

## V Computing DFT using decoding of alias-codes

In this section, we use a simple example to illustrate the working mechanics of the FFAST sub-sampling “front-end” and the associated “back-end” peeling-decoder. We then demonstrate how the output of the FFAST front-end sub-sampling can be viewed as a sparse-graph code in the frequency domain, which we refer to as an “Alias-code”, and how computing the DFT is equivalent to decoding this resulting alias-code. Later, we also point out a connection between the FFAST framework and coding techniques for a packet erasure channel.

Consider a 20-length discrete-time signal x such that its 20-length DFT X is 5-sparse. Let the non-zero DFT coefficients of the signal be X[1] = 1, X[3] = 4, X[5] = 2, X[10] = 3 and X[13] = 7.

### V-A FFAST sub-sampling front-end

In general, the FFAST sub-sampling front-end architecture, as shown in Fig. 1, consists of multiple sub-sampling stages, and each stage further has 2 sub-sampling paths, referred to as “delay-chains”. The sampling periods used in both the delay-chains of a stage are identical. For example, consider a 2-stage FFAST sub-sampling front-end, shown in Fig. 3, that samples the 20-length input signal x and its circularly shifted version666Conventionally, in the signal processing literature, z^{−1} is used to denote a time-delay. In this paper, for ease of exposition, we use z to denote a time-advancement (see Fig. 3 for an example). by factors of 5 (stage 0) and 4 (stage 1) respectively. In the sequel we use n_i to denote the number of output samples per delay-chain of stage i, e.g., n_0 = 4 and n_1 = 5 in Fig. 3. The FFAST front-end then computes short DFTs of appropriate lengths of the individual output data streams of the delay-chains. Next, we group the outputs of the short DFTs into “bin-observations”.

#### V-A1 Bin observation

A bin-observation is a 2-dimensional vector formed by collecting one scalar output value from the DFT output of the signal from each of the 2 delay-chains in a stage. For example, y_{0,1} is the observation vector of bin 1 in stage 0 and is given by,

y_{0,1} = (X_s[1], X̃_s[1]). (4)

The first index of the observation vector corresponds to the stage number, while the second index is the bin number within a stage. Note that in the FFAST architecture of Fig. 3, there are a total of 4 bins in stage 0 and 5 bins in stage 1.

Using the basic Fourier transform properties of sampling, aliasing and circular shift, reviewed below, one can compute the relation between the original 20-length DFT X of the signal and the output bin-observations of the FFAST front-end.

• Aliasing: If a signal is subsampled in the time-domain, its frequency components mix together, i.e., alias, in a pattern that depends on the sampling procedure. For example, consider the output of the first delay-chain of stage 0 in Fig. 3. The input signal x is uniformly sampled by a factor of 5 to get x_s[p] = x[5p], p = 0, 1, 2, 3. Then, the 4-length DFT of x_s is related to the original 20-length DFT X as:

X_s[0] = X[0]+X[4]+X[8]+X[12]+X[16] = 0
X_s[1] = X[1]+X[5]+X[9]+X[13]+X[17] = 10
X_s[2] = X[2]+X[6]+X[10]+X[14]+X[18] = 3
X_s[3] = X[3]+X[7]+X[11]+X[15]+X[19] = 4

More generally, if the sampling period is L (we assume that L divides n), then,

X_s[i] = ∑_{j ≡ (i)_{n/L}} X[j], (5)

where the notation j ≡ (i)_{n/L} denotes j mod (n/L) = i.

• Circular shift in time: A circular shift in the time-domain results in a phase shift in the frequency-domain. Consider a circularly shifted signal x̃ obtained from x as x̃[p] = x[(p+1)_n]. The DFT coefficients of the shifted signal x̃ are given as X̃[q] = ω^q X[q], where ω = e^{2πı/n} is an n-th root of unity. In general, a circular shift of p_0 results in X̃[q] = ω^{q p_0} X[q].
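The two properties above can be checked numerically. The Python sketch below uses the running 20-length example; the 1/4 normalization of the short DFT is our assumption, chosen so that the bin values match the aliasing sums in (5):

```python
import cmath

def dft(x):
    """Naive (unnormalized) DFT."""
    n = len(x)
    return [sum(xp * cmath.exp(-2j * cmath.pi * q * p / n)
                for p, xp in enumerate(x))
            for q in range(n)]

n = 20
X = [0j] * n
for l, v in [(1, 1), (3, 4), (5, 2), (10, 3), (13, 7)]:  # running example
    X[l] = v
x = [sum(X[l] * cmath.exp(2j * cmath.pi * l * p / n) for l in range(n))
     for p in range(n)]

# Aliasing: subsampling by 5 leaves 4 samples, whose normalized 4-point
# DFT equals the sums of X[j] over j ≡ i (mod 4), as in (5).
Xs = [v / 4 for v in dft(x[::5])]          # ≈ [0, 10, 3, 4]

# Circular shift: advancing x by one sample multiplies X[q] by e^{2πıq/n},
# so the single-ton bin 2 now holds X[10]·e^{2πı·10/20} = -3.
x_adv = x[1:] + x[:1]
Xs_adv = [v / 4 for v in dft(x_adv[::5])]  # bin 2 ≈ -3
```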

### V-B Alias-codes and its connection to computing sparse DFT from sub-sampled signal

In this section, we represent the relation between the original n-length k-sparse DFT X and the FFAST front-end output, i.e., the bin-observations, obtained using the Fourier properties, in a graphical form, and interpret it as a “channel code”. In particular, since this code is the result of sub-sampling and aliasing operations, we refer to it as an “Alias-code”. We also establish that computing the sparse DFT of the signal from its sub-sampled data is equivalent to decoding the alias-code resulting from processing the input signal through the FFAST front-end.

Suppose that the 20-length example input signal x is processed through the 2-stage FFAST front-end architecture shown in Fig. 3, to obtain the bin-observation vectors. Then, the relation between the bin-observation vectors and the non-zero DFT coefficients of the signal x can be computed using the signal processing properties reviewed above. A graphical representation of this relation is shown in Fig. 4. Left nodes of the bi-partite graph represent the non-zero DFT coefficients of the input signal x, while the right nodes represent the “bins” (also sometimes referred to in the sequel as “check nodes”) with vector observations. An edge connects a left node to a right node iff the corresponding non-zero DFT coefficient contributes to the observation vector of that particular bin, e.g., after aliasing due to sub-sampling, the DFT coefficient X[1] contributes to the observation vectors of bin 1 of stage 0 and bin 1 of stage 1. Thus, the problem of computing a k-sparse n-length DFT has been transformed into that of decoding the values and the support of the left nodes of the bi-partite graph in Fig. 4, i.e., decoding of the alias-code. Next, we classify the observation bins based on their edge degree in the bi-partite graph, i.e., the number of non-zero DFT coefficients contributing to a bin, which is then used to decode the alias-code.

• zero-ton: A bin that has no contribution from any of the non-zero DFT coefficients of the signal, e.g., bin 0 of stage 0 in Fig. 4. A zero-ton bin can be trivially identified from its observations.

• single-ton: A bin that has a contribution from exactly one non-zero DFT coefficient of the signal, e.g., bin 2 of stage 0. Using the signal processing properties, it is easy to verify that the observation vector of bin 2 of stage 0 is given as,

y_{0,2} = (X[10], X[10] e^{2πı10/20}).

The observation vector of a single-ton bin can be used to determine the support and the value of the only non-zero DFT coefficient contributing to that bin, as follows:

• support: The support of the non-zero DFT coefficient contributing to a single-ton bin can be computed as,

10 = (20/2π) ∠( y_{0,2}[1] y†_{0,2}[0] ) (6)
• Value: The value of the non-zero DFT coefficient is given by the observation y_{0,2}[0] = X[10] = 3.

We refer to this procedure as a “ratio-test”, in the sequel. Thus, a simple ratio-test on the observations of a single-ton bin correctly identifies the support and the value of the only non-zero DFT coefficient connected to that bin. It is easy to verify that this property holds for all the single-ton bins.

• multi-ton: A bin that has a contribution from more than one non-zero DFT coefficient of the signal, e.g., bin 1 of stage 0. The observation vector of bin 1 of stage 0 is,

y_{0,1} = X[1](1, e^{ı2π/20}) + X[5](1, e^{ı2π5/20}) + X[13](1, e^{ı2π13/20}) = (10, −3.1634 − ı3.3541).

Now, if we perform the “ratio-test” on these observations, we get the support to be 12.59. Since we know that the support has to be an integer value between 0 and 19, we conclude that the observations do not correspond to a single-ton bin. In Appendix -B, we rigorously show that the ratio-test identifies a multi-ton bin almost surely.

Hence, using the “ratio-test” on the bin-observations, the FFAST decoder can determine whether a bin is a zero-ton, a single-ton or a multi-ton, almost surely. Also, when a bin is a single-ton, the ratio-test provides the support as well as the value of the non-zero DFT coefficient connected to that bin. In the following Section V-C, we describe the FFAST peeling-decoder that computes the support and the values of all the non-zero DFT coefficients of the signal x.
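The ratio-test can be sketched as a small classifier over a bin's 2-dimensional observation. The helper below is our hypothetical illustration (the function name and tolerance are ours, not the paper's); it assumes a circular shift of one sample, so a single-ton bin at location ℓ satisfies y[1] = y[0]·e^{2πıℓ/n}:

```python
import cmath

def ratio_test(y0, y1, n, tol=1e-6):
    """Classify a bin from its observation (y0 from x, y1 from the shifted x).

    Returns ('zero',), ('single', location, value) or ('multi',).
    """
    if abs(y0) < tol and abs(y1) < tol:
        return ('zero',)
    # candidate location from the phase of y1 * conj(y0), as in (6)
    loc = (n / (2 * cmath.pi)) * cmath.phase(y1 * y0.conjugate()) % n
    l = round(loc) % n
    # single-ton iff the observation is exactly consistent with one coefficient
    if abs(y1 - y0 * cmath.exp(2j * cmath.pi * l / n)) < tol:
        return ('single', l, y0)
    return ('multi',)
```

On the single-ton bin above, `ratio_test(3, -3, 20)` returns `('single', 10, 3)`; on the multi-ton observation `(10, -3.1634 - 3.3541j)` it returns `('multi',)`.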

### V-C FFAST back-end peeling-decoder

In the previous section we explained how the ratio-test can be used to determine, almost surely, whether a bin is a zero-ton, single-ton or multi-ton bin. In the case of a single-ton bin, the ratio-test also identifies the support and the value of the non-zero DFT coefficient connected to that bin.

##### FFAST peeling-decoder

The FFAST decoder repeats the following steps (the pseudocode is provided in Algorithm 1 and Algorithm 2 in Appendix -A):

1. Select all the edges in the graph with right degree 1 (edges connected to single-ton bins).

2. Remove these edges from the graph as well as the associated left and right nodes.

3. Remove all the other edges that were connected to the left nodes removed in step-2. When a neighboring edge of any right node is removed, its contribution is subtracted from that check node.

Decoding is successful if, at the end, all the edges have been removed from the graph. It is easy to verify that the FFAST peeling-decoder operated on the example graph of Fig. 4 results in successful decoding, with the coefficients being uncovered in the following possible order: X[10], X[3], X[1], X[5], X[13].

Thus, the FFAST architecture has transformed the problem of computing the DFT of x into that of decoding the alias-code of Fig. 4. Clearly, the success of the FFAST decoder depends on the properties of the sparse bi-partite graph resulting from the sub-sampling operation of the FFAST front-end. In particular, if the sub-sampling-induced aliasing bi-partite graph is peeling-friendly, i.e., it has a few single-ton bins to initiate the peeling procedure and creates new single-tons at each iteration until all the DFT coefficients are uncovered, then the FFAST peeling-decoder succeeds in computing the DFT X.
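The peeling steps above can be sketched end-to-end. The toy Python decoder below is our illustration, not the paper's implementation: it works directly on 2-dimensional bin observations of the form (Σ X[ℓ], Σ X[ℓ]e^{2πıℓ/n}), detects single-tons with the ratio-test consistency check, and subtracts each decoded coefficient from every stage:

```python
import cmath

def make_bins(X, n, moduli):
    """bins[s][b] = [sum X[l], sum X[l]*e^{2πıl/n}] over l ≡ b (mod moduli[s])."""
    bins = [[[0j, 0j] for _ in range(m)] for m in moduli]
    for l, v in X.items():
        for s, m in enumerate(moduli):
            bins[s][l % m][0] += v
            bins[s][l % m][1] += v * cmath.exp(2j * cmath.pi * l / n)
    return bins

def peeling_decode(bins, n, moduli, tol=1e-6):
    """Iteratively find single-tons via the ratio-test and peel them off."""
    decoded = {}
    progress = True
    while progress:
        progress = False
        for s, m in enumerate(moduli):
            for b in range(m):
                y0, y1 = bins[s][b]
                if abs(y0) < tol:
                    continue                           # zero-ton
                loc = (n / (2 * cmath.pi)) * cmath.phase(y1 * y0.conjugate()) % n
                l = round(loc) % n
                if abs(y1 - y0 * cmath.exp(2j * cmath.pi * l / n)) > tol:
                    continue                           # multi-ton: revisit later
                decoded[l] = y0                        # single-ton: decode ...
                for s2, m2 in enumerate(moduli):       # ... and peel everywhere
                    bins[s2][l % m2][0] -= y0
                    bins[s2][l % m2][1] -= y0 * cmath.exp(2j * cmath.pi * l / n)
                progress = True
    return decoded
```

For the running example (n = 20, stages with 4 and 5 bins), `peeling_decode(make_bins({1: 1, 3: 4, 5: 2, 10: 3, 13: 7}, 20, [4, 5]), 20, [4, 5])` recovers all five coefficients.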

### V-D Connection to coding for packet erasure channels

The problem of decoding sparse-graph codes over an erasure channel has been well studied in the coding theory literature. In this section, we draw an analogy between decoding sparse-graph codes for a packet erasure channel and decoding over the sparse bi-partite graphs induced by the FFAST architecture.

##### Sparse-graph code for a packet-erasure channel

Consider an (n, k) packet erasure code. Each n-length codeword consists of k information packets and n−k parity packets. The erasure code is defined by a bi-partite graph as shown in Fig. 5(a). An n-length sequence of packets that satisfies the constraints defined by the graph in Fig. 5(a), i.e., the sum of the packets connected to a parity check node equals zero, is a valid codeword. Suppose a codeword from the code defined by the graph of Fig. 5(a) is transmitted over an erasure channel that uniformly at random drops some number of packets. In Fig. 5(a), we use dotted circles to represent the correctly received packets (a cyclic redundancy check can be used to verify the correctly received packets); the remaining packets are dropped/erased by the channel. The dotted edges in the bi-partite graph of Fig. 5(a) denote the operation of subtracting the contribution of all the correctly received packets from the corresponding check nodes. A peeling-decoder can now iteratively unravel the erased packets that are connected to a check node with exactly one erased packet. If the bi-partite graph consisting only of the solid variable nodes and the solid edges is such that the peeling-decoder successfully unravels all the erased packets, decoding is successful.

##### Decoding over a bipartite graph induced by the FFAST

Consider an n-length signal x that has a k-sparse DFT X. Let the signal x be sub-sampled by some appropriately designed FFAST front-end. The induced spectral-domain aliasing due to the FFAST front-end sub-sampling operation is graphically represented by the bi-partite graph shown in Fig. 5(b). The variable (left) nodes correspond to the n-length DFT X, and the check (right) nodes are the bins consisting of the aliased DFT coefficients. A variable node is connected to a check node iff, after aliasing, that particular DFT coefficient contributes to the observation of the considered check node. The zero DFT coefficients are represented by dotted circles. The dashed edges in Fig. 5(b) denote that the contribution to the bin-observation due to that particular edge is zero. Using the ratio-test on the vector observation at each check node, one can determine whether the check node is a “single-ton”, i.e., has exactly one solid edge. A peeling-decoder can now iteratively unravel the non-zero DFT coefficients connected to single-ton check nodes. If the bi-partite graph consisting only of the solid variable nodes and the solid edges is such that the peeling-decoder successfully unravels all the variable nodes, the algorithm succeeds in computing the DFT X. In Table III we provide a comparison between decoding over the bi-partite graphs of Fig. 5(a) and Fig. 5(b).

Thus, the problem of decoding bi-partite graphs corresponding to sparse-graph codes designed for a packet-erasure channel is closely related to decoding the sparse bi-partite graphs induced by the FFAST architecture. We use this analogy: a) to design a sub-sampling front-end that induces ‘good’ left-regular-degree sparse-graph codes; and b) to formally connect our proposed Chinese-Remainder-Theorem based aliasing framework to a random sparse-graph code constructed using a balls-and-bins model (explained in Section VI-A), and to analyze the convergence behavior of our algorithm using well-studied density evolution techniques from coding theory.

Next, we address the question of how to carefully design the sub-sampling parameters of the FFAST front-end architecture so as to induce “good graphs” or “alias-codes”. In Section VI and Section VII we provide constructions of the FFAST front-end architecture and analyze the performance of the FFAST peeling-decoder for the very-sparse, i.e., 0 < δ ≤ 1/3, and the less-sparse, i.e., 1/3 < δ < 1, regimes of sparsity respectively.

## VI FFAST Construction and Performance analysis for the very-sparse (k = O(n^δ), 0 < δ ≤ 1/3) regime

In Section V, using an example, we illustrated that the problem of computing a $k$-sparse $n$-length DFT $\vec{X}$ of a signal $\vec{x}$ can be transformed into a problem of decoding over sparse bipartite graphs using the FFAST architecture. In this section, we provide a choice of parameters of the FFAST front-end architecture and analyze the probability of success of the FFAST peeling-decoder for the very-sparse regime $k = O(n^{\delta})$, $0 < \delta \le 1/3$. As shown in Section V-D, the FFAST decoding process is closely related to the decoding procedure on sparse-graph codes designed for erasure channels. From the coding theory literature, we know that there exist several sparse-graph code constructions that are low-complexity and capacity-achieving for erasure channels. The catch for us is that we are not at liberty to use any arbitrary bi-partite graph, but can choose only those graphs that correspond to the alias-codes, i.e., are induced via aliasing through our proposed FFAST subsampling front-end. How do we go about choosing the right parameters and inducing the good graphs?

We describe two ensembles of bi-partite graphs. The first ensemble is based on a “balls-and-bins” model, while the second ensemble is based on the CRT. The balls-and-bins based ensemble of graphs is closer in spirit to the sparse-graph codes in the coding-theory literature, and hence is amenable to a rigorous analysis using coding-theoretic tools like density-evolution [26]. The bi-partite graphs induced by the FFAST front-end sub-sampling operation belong to the CRT ensemble. Later, in Lemma VI.1, we show that the two ensembles are equivalent; hence, the analysis of the balls-and-bins construction carries over to the FFAST setting. We start by setting up some notation and common parameters.

Consider a set $\{f_0, f_1, \ldots, f_{d-1}\}$ of $d$ pairwise co-prime integers. Let the signal length $n = \mathcal{P}\prod_{i=0}^{d-1} f_i$, for some positive integer $\mathcal{P}$. The integers $f_i$ are chosen such that they are approximately equal, and we use $\mathcal{F}$ to denote this common value. The small perturbation term in each $f_i$ is used to obtain a set of pairwise co-prime integers approximately equal to $\mathcal{F}$ (for example, for $d = 3$ the consecutive integers $\mathcal{F}, \mathcal{F}+1, \mathcal{F}+2$ are pairwise co-prime whenever $\mathcal{F}$ is odd). We construct a FFAST sub-sampling front-end architecture with $d$ stages, where each stage has $2$ delay-chains (see Fig. 1 for reference). The sub-sampling period used in both delay-chains of stage $i$ is $n/f_i$, and hence the number of output samples per delay-chain is $f_i$. The total number of input samples used by the FFAST algorithm is therefore $m = 2\sum_{i=0}^{d-1} f_i$ (see Fig. 1). In this section, we use $\mathcal{F} = \eta k$, for some constant $\eta > 0$. This results in a sparsity index $\delta \le 1/d$, depending on the value of the integer $\mathcal{P}$.

### VI-A Ensemble C^1_k(F, n_b) of bi-partite graphs constructed using a “Balls-and-Bins” model

Bi-partite graphs in the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$ have $k$ variable nodes on the left and $n_b$ check nodes on the right. Further, each variable (left) node is connected to $d$ right nodes, i.e., the graphs are $d$-left-regular. An example graph from such an ensemble is shown in Fig. 4. More generally, the ensemble of $d$-left-regular edge degree bipartite graphs constructed using a “balls-and-bins” model is defined as follows. Set $n_b = \sum_{i=0}^{d-1} f_i$, where $f_i \approx \mathcal{F}$. Partition the set of $n_b$ check nodes into $d$ subsets, with the $i$-th subset having $f_i$ check nodes. For each variable node, choose one neighboring check node in each of the $d$ subsets, uniformly at random. The corresponding $d$-left-regular degree bipartite graph is then defined by connecting the variable nodes with their neighboring check nodes by undirected edges.
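A graph from this balls-and-bins ensemble can be sampled in a few lines. This is a sketch: the function name and the representation of a graph as one bin choice per stage for each variable node are our own conventions, not the paper's:

```python
import random

def sample_balls_and_bins_graph(k, f, seed=0):
    """Sample a graph from the balls-and-bins ensemble: stage i holds f[i]
    check nodes, and every variable node picks one check node per stage
    uniformly at random (independently across variable nodes)."""
    rng = random.Random(seed)
    d = len(f)
    return [tuple(rng.randrange(f[i]) for i in range(d)) for _ in range(k)]

g = sample_balls_and_bins_graph(k=100, f=[101, 103, 107])
assert all(len(e) == 3 for e in g)       # 3-left-regular by construction
```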

An edge in the graph is represented as a pair of nodes $(v, c)$, where $v$ and $c$ are the variable and check nodes incident on the edge. By a directed edge $\vec{e}$ we mean an ordered pair $(v, c)$ or $(c, v)$ corresponding to the edge $(v, c)$. A path in the graph is a directed sequence of directed edges $(\vec{e}_1, \ldots, \vec{e}_t)$ such that, if $\vec{e}_i = (u_i, u_i')$, then $u_i' = u_{i+1}$ for $i = 1, \ldots, t-1$. The length of the path is the number of directed edges in it, and we say that the path connecting $u_1$ to $u_t'$ starts from $u_1$ and ends at $u_t'$.

#### VI-A1 Directed Neighborhood

The directed neighborhood of depth $\ell$ of a directed edge $\vec{e} = (v, c)$, denoted by $\mathcal{N}_{\vec{e}}^{\ell}$, is defined as the induced subgraph containing all the edges and nodes on paths of length at most $\ell$ that start at the node $v$ and whose first edge is not $\vec{e}$. An example of a directed neighborhood is given in Fig. 6. If the induced sub-graph corresponding to the directed neighborhood is a tree, then we say that the depth-$\ell$ neighborhood of the edge $\vec{e}$ is tree-like.

### VI-B Ensemble C^2_k(F, n, n_b) of bipartite graphs constructed using the Chinese-Remainder-Theorem (CRT)

The ensemble $\mathcal{C}_k^2(\mathcal{F}, n, n_b)$ of $d$-left-regular degree bipartite graphs, with $k$ variable nodes and $n_b$ check nodes, is defined as follows. Partition the set of $n_b$ check nodes into $d$ subsets, with the $i$-th subset having $f_i$ check nodes (see Fig. 4 for an example). Consider a set $\mathcal{I}$ of $k$ integers, where each element of the set is between $0$ and $n-1$. Assign the integers from the set $\mathcal{I}$ to the $k$ variable nodes in an arbitrary order. Label the check nodes in the $i$-th subset from $0$ to $f_i - 1$, for all $i = 0, \ldots, d-1$. A $d$-left-regular degree bi-partite graph with $k$ variable nodes and $n_b$ check nodes is then obtained by connecting a variable node with an associated integer $v$ to the check node $v \bmod f_i$ in the $i$-th subset, for $i = 0, \ldots, d-1$. The ensemble $\mathcal{C}_k^2(\mathcal{F}, n, n_b)$ is the collection of all the $d$-left-regular degree bipartite graphs induced by all possible sets $\mathcal{I}$.
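The CRT construction is simply the modulo rule applied to each variable node's associated integer. A minimal sketch (the function name is ours; the set of integers plays the role of the support of the non-zero DFT coefficients):

```python
def crt_graph(support, f):
    """CRT-ensemble graph: the variable node whose associated integer is v
    connects, in stage i, to the check node numbered v mod f[i]."""
    return [tuple(v % fi for fi in f) for v in support]

f = [4, 5, 9]                        # pairwise co-prime, n = 180
g = crt_graph([0, 37, 112, 179], f)  # a hypothetical 4-element support set
assert g[1] == (37 % 4, 37 % 5, 37 % 9) == (1, 2, 1)
```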

###### Lemma VI.1.

The ensemble $\mathcal{C}_k^2(\mathcal{F}, n, n_b)$ of bipartite graphs is identical to the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$.

###### Proof.

It is trivial to see that $\mathcal{C}_k^2(\mathcal{F}, n, n_b) \subseteq \mathcal{C}_k^1(\mathcal{F}, n_b)$. Next we show the reverse inclusion. Consider a graph $G \in \mathcal{C}_k^1(\mathcal{F}, n_b)$. Suppose a variable node of $G$ is connected to the check nodes numbered $(r_0, r_1, \ldots, r_{d-1})$ in the $d$ subsets. Then, since $n = \mathcal{P}\prod_{i=0}^{d-1} f_i$ and the $f_i$ are pairwise co-prime, using the CRT one can find $\mathcal{P}$ integers $v$ between $0$ and $n-1$ such that $v \equiv r_i \bmod f_i$ for all $i$. Thus, for every graph $G \in \mathcal{C}_k^1(\mathcal{F}, n_b)$, there exists a set $\mathcal{I}$ of $k$ integers that will result in an identical graph using the CRT based construction. Hence, $\mathcal{C}_k^1(\mathcal{F}, n_b) \subseteq \mathcal{C}_k^2(\mathcal{F}, n, n_b)$. ∎
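The counting step in this proof can be checked directly for small parameters: when the signal length is $\mathcal{P}$ times the product of pairwise co-prime moduli, each residue tuple has exactly $\mathcal{P}$ preimages. A brute-force sketch (the helper name is ours):

```python
from math import prod

def crt_preimages(residues, f, n):
    """All integers v in [0, n) with v = residues[i] (mod f[i]). For
    pairwise co-prime f[i] and n = P * prod(f), the CRT gives exactly P
    solutions (brute force here, for illustration only)."""
    return [v for v in range(n)
            if all(v % fi == ri for fi, ri in zip(f, residues))]

f, P = [3, 4, 5], 2
n = P * prod(f)                      # n = 120
sols = crt_preimages((1, 2, 3), f, n)
assert sols == [58, 118] and len(sols) == P
```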

Note that the modulo rule used to generate a graph in the ensemble $\mathcal{C}_k^2(\mathcal{F}, n, n_b)$ is the same as the one used in equation (5) of Section V. Thus, the FFAST architecture of Fig. 1 generates graphs from the CRT ensemble $\mathcal{C}_k^2(\mathcal{F}, n, n_b)$, where the integers associated with the variable nodes correspond to the locations (or support) of the non-zero DFT coefficients of the signal $\vec{x}$ (a set $\mathcal{I}$ with repeated elements corresponds to a signal with fewer than $k$ non-zero DFT coefficients). Also, under the assumption that the support of the non-zero DFT coefficients of the signal $\vec{x}$ is uniformly random, the resulting aliasing graph is a uniformly random choice from the ensemble $\mathcal{C}_k^2(\mathcal{F}, n, n_b)$.

Next, we analyze the performance of the FFAST peeling-decoder over a uniformly random choice of a graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$, which, along with Lemma VI.1, provides a lower bound on the success performance of the FFAST decoder over graphs from the ensemble $\mathcal{C}_k^2(\mathcal{F}, n, n_b)$. Although the construction and the results described in this section are applicable to any number of stages $d \ge 3$, we are particularly interested in the case when $d = 3$. For $d = 3$ and $\mathcal{P} = 1$, we achieve the sub-linear sparsity index $\delta = 1/3$, while smaller values of $\delta$ are achieved using larger values of $\mathcal{P}$.

### VI-C Performance analysis of the FFAST peeling-decoder on graphs from the ensemble C^1_k(F, n_b)

In this section, we analyze the probability of success of the FFAST peeling-decoder over a randomly chosen graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$, after a fixed number $\ell$ of peeling iterations. Our analysis follows exactly the arguments in [26] and [27]. Thus, one may be tempted to take the results from [26] “off-the-shelf”. However, we choose to provide a detailed analysis here for two reasons. First, our graph construction in the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$ is different from that used in [26], which results in some fairly important differences in the analysis, such as the expansion properties of the graphs, thus warranting an independent analysis. Second, we want to make the paper more self-contained and complete.

We now provide a brief outline of the proof elements highlighting the main technical components needed to show that the FFAST peeling-decoder decodes all the non-zero DFT coefficients with high probability.

• Density evolution: We analyze the performance of the message-passing algorithm over a typical graph from the ensemble for $\ell$ iterations. First, we assume that a local neighborhood of depth $2\ell$ of every edge in a typical graph in the ensemble is tree-like, i.e., cycle-free. Under this assumption, all the messages between the variable nodes and the check nodes in the first $\ell$ rounds of the algorithm are independent. Using this independence, we derive a recursive equation that represents the expected evolution of the number of single-tons uncovered at each round for this typical graph.

• Convergence to the cycle-free case: Using a Doob martingale as in [26], we show that a random graph from the ensemble, chosen as per nature’s choice of the non-zero DFT coefficients, behaves like a “typical” graph, i.e., the depth-$2\ell$ neighborhood of most of the edges in the graph is cycle-free, with high probability. This proves that for a random graph in $\mathcal{C}_k^1(\mathcal{F}, n_b)$, the FFAST peeling-decoder decodes all but an arbitrarily small fraction of the variable nodes with high probability in a constant number $\ell$ of iterations.

• Completing the decoding using the graph expansion property: We first show that if a graph is an “expander” (as defined later in Section VI-C3), and the FFAST peeling-decoder successfully decodes all but a small fraction of the non-zero DFT coefficients, then it decodes all the non-zero DFT coefficients successfully. Next, we show that a random graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$ is an expander with high probability.
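The peeling procedure that this outline analyzes can be prototyped directly on a balls-and-bins graph. The sketch below uses our own data layout (bins keyed by stage and index); for a redundancy $\eta = \mathcal{F}/k$ comfortably above the $d = 3$ threshold, it decodes essentially all variable nodes:

```python
import random
from collections import defaultdict

def peel(edges):
    """Peeling decoder on a d-left-regular balls-and-bins graph, where
    edges[v] is the tuple of bin choices of variable v (one per stage).
    A bin holding exactly one undecoded variable is a single-ton; its
    variable is decoded and removed from all of its bins."""
    bins = defaultdict(set)                       # (stage, bin) -> variables
    for v, choice in enumerate(edges):
        for i, b in enumerate(choice):
            bins[(i, b)].add(v)
    decoded = set()
    stack = [key for key, s in bins.items() if len(s) == 1]
    while stack:
        key = stack.pop()
        if len(bins[key]) != 1:                   # stale single-ton, skip
            continue
        (v,) = bins[key]
        decoded.add(v)
        for i, b in enumerate(edges[v]):          # peel v off its bins
            bins[(i, b)].discard(v)
            if len(bins[(i, b)]) == 1:
                stack.append((i, b))
    return decoded

rng = random.Random(1)
k, F, d = 1000, 600, 3                            # eta = F/k = 0.6
edges = [tuple(rng.randrange(F) for _ in range(d)) for _ in range(k)]
decoded = peel(edges)                             # nearly all variables decode
```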

#### VI-C1 Density evolution for local tree-like view

In this section we assume that a local neighborhood of depth $2\ell$ of every edge in a graph in the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$ is tree-like. Next, we define the edge-degree distribution polynomials of the bipartite graphs in the ensemble as $\lambda(x) = \sum_i \lambda_i x^{i-1}$ and $\rho(x) = \sum_i \rho_i x^{i-1}$, where $\lambda_i$ (resp. $\rho_i$) denotes the probability that an edge of the graph is connected to a left (resp. right) node of degree $i$. For the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$, constructed using the balls-and-bins procedure, $\lambda(x) = x^{d-1}$ by construction. Further, as shown in Appendix -C, the right edge-degree distribution is asymptotically $\rho(x) = e^{-(1-x)/\eta}$, where $\eta = \mathcal{F}/k$.

Let $p_j$ denote the probability that an edge is present (or undecoded) after round $j$ of the FFAST peeling-decoder; then $p_0 = 1$. Under the tree-like assumption, the probability $p_{j+1}$ is given as,

$$p_{j+1} = \lambda\big(1 - \rho(1 - p_j)\big), \qquad j = 0, 1, \ldots, \ell-1. \tag{7}$$

Equation (7) can be understood as follows (also see Fig. 6): the tree-like assumption implies that, up to iteration $\ell$, messages on different edges are independent. Thus, the probability that at iteration $j$ a variable node is not decoded by a particular neighboring check node is $1 - \rho(1 - p_j)$, and hence the probability that none of the other neighboring check nodes decode the variable node is $\lambda(1 - \rho(1 - p_j))$. Specializing equation (7) for the edge degree distributions of $\mathcal{C}_k^1(\mathcal{F}, n_b)$ we get,

$$p_{j+1} = \big(1 - e^{-p_j/\eta}\big)^{d-1}, \qquad j = 0, 1, \ldots, \ell-1, \tag{8}$$

where $\eta = \mathcal{F}/k$. The evolution process of (8) asymptotically (in the number of iterations $\ell$) converges to $p_\ell \to 0$ for an appropriate choice of the parameter $\eta$, e.g., see Table IV.
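Recursion (8) is easy to iterate numerically. The sketch below assumes $\eta = \mathcal{F}/k$ as above; with $d = 3$ the recursion is driven to zero for $\eta$ above roughly $0.4073$ (the kind of threshold tabulated in Table IV) and stalls at a non-trivial fixed point below it:

```python
import math

def density_evolution(eta, d=3, iters=200):
    """Iterate recursion (8): p_{j+1} = (1 - exp(-p_j/eta))**(d-1),
    starting from p_0 = 1, with eta = F/k. Returns the (approximate)
    limiting fraction of undecoded edges."""
    p = 1.0
    for _ in range(iters):
        p = (1.0 - math.exp(-p / eta)) ** (d - 1)
    return p

# Above the d = 3 threshold, peeling drives p to 0; below it, the
# recursion stalls at a non-trivial fixed point.
assert density_evolution(0.45) < 1e-9
assert density_evolution(0.35) > 0.5
```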

#### VI-C2 Convergence to cycle-free case

In the following Lemma VI.2 we show: a) the expected behavior over all the graphs in the ensemble converges to that of the cycle-free case; and b) with exponentially high probability, the proportion of the edges that are not decoded after $\ell$ iterations of the FFAST peeling-decoder is tightly concentrated around $p_\ell$, as defined in (8).

###### Lemma VI.2 (Convergence to Cycle-free case).

Over the probability space of all graphs in $\mathcal{C}_k^1(\mathcal{F}, n_b)$, let $Z$ be the total number of edges that are not decoded after $\ell$ (an arbitrarily large but fixed number of) iterations of the FFAST peeling-decoder over a randomly chosen graph. Further, let $p_\ell$ be as given in the recursion (8). Then there exist constants $\beta$ and $\gamma$ such that for any $\epsilon_1 > 0$ and sufficiently large $k$ we have

$$\text{(a)}\qquad \mathbb{E}[Z] < 2kd\,p_{\ell}. \tag{9}$$

$$\text{(b)}\qquad \Pr\big(|Z - \mathbb{E}[Z]| > kd\epsilon_1\big) < 2e^{-\beta \epsilon_1^2 k^{\gamma}}. \tag{10}$$

#### VI-C3 Successful Decoding using Expansion

In the previous section we showed that, with high probability, the FFAST peeling-decoder decodes all but an arbitrarily small fraction of the variable nodes. In this section, we show how to complete the decoding when the graph is a “good expander”. Our problem requires the following definition of an “expander graph”, which is somewhat different from conventional notions of expander graphs in the literature, e.g., edge expanders, vertex expanders or spectral expander graphs.

###### Definition VI.3 (Expander graph).

A $d$-left-regular degree bipartite graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$ is called an expander if, for all subsets $\mathcal{S}$ of variable nodes of size at most $\alpha k$, there exists a right neighborhood of $\mathcal{S}$ in some stage $i$, i.e., $\mathcal{N}_i(\mathcal{S})$, that satisfies $|\mathcal{N}_i(\mathcal{S})| > |\mathcal{S}|/2$, for some $\alpha > 0$.

In the following lemma, we show that if a graph is an expander, and if the FFAST peeling-decoder successfully decodes all but a small fraction of the non-zero DFT coefficients, then it decodes all the non-zero DFT coefficients successfully.

###### Lemma VI.4.

Consider a graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$, with $d \ge 3$, that is an expander for some $\alpha > 0$. If the FFAST peeling-decoder over this graph succeeds in decoding all but at most $\alpha k$ variable nodes, then it decodes all the variable nodes.

###### Proof.

See Appendix -D
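The combinatorial mechanism behind Lemma VI.4 is a pigeonhole argument: if the $|\mathcal{S}|$ stage-$i$ edges of an undecoded set $\mathcal{S}$ land in more than $|\mathcal{S}|/2$ distinct check nodes, then some check node receives exactly one of them (otherwise every occupied check node would hold at least two edges, forcing more than $|\mathcal{S}|$ edges in total), i.e., a single-ton remains and peeling can continue. A quick randomized check of this always-true fact (our own helper name):

```python
import random
from collections import Counter

def has_singleton_when_expanding(assignment):
    """Check the pigeonhole claim on one stage: if the balls in
    `assignment` occupy more than len(assignment)/2 distinct bins, then
    some bin holds exactly one ball (vacuously true otherwise)."""
    counts = Counter(assignment)
    expands = len(counts) > len(assignment) / 2
    singleton = any(c == 1 for c in counts.values())
    return (not expands) or singleton

rng = random.Random(2)
trials = [[rng.randrange(10) for _ in range(6)] for _ in range(1000)]
assert all(has_singleton_when_expanding(t) for t in trials)
```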

In Lemma VI.5, we show that most of the graphs in the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$ are expanders.

###### Lemma VI.5.

Consider a random graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$, where $d \ge 3$. Then, all the subsets $\mathcal{S}$ of the variable nodes of the graph satisfy $|\mathcal{N}_i(\mathcal{S})| > |\mathcal{S}|/2$ for some stage $i$,

1. with probability at least $1 - O(1/k)$, for sets of size at most $\epsilon_2 k$, for small enough $\epsilon_2 > 0$;

2. with exponentially high probability in $k$, for sets of larger size.

###### Proof.

See Appendix -E

The condition $d \ge 3$ is a necessary condition for part 1 of Lemma VI.5. This can be seen as follows. Consider a random graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$, where $d = 2$. If any two variable nodes in the graph have the same set of neighboring check nodes, then the expander condition, for the set consisting of these two variable nodes, is not satisfied. In a bi-partite graph from this ensemble with $d = 2$, there are a total of $f_0 f_1 \approx \mathcal{F}^2$ distinct sets of check nodes. Each of the $k$ variable nodes chooses a set of check nodes, uniformly at random and with replacement, from the total of $\approx \mathcal{F}^2$ sets. If we draw $k$ integers uniformly at random between $1$ and $n$, the probability that at least two of the numbers are the same is given by,

$$\Pr(k; n) \approx 1 - e^{-k^2/2n}. \tag{11}$$

This is also known as the birthday paradox or the birthday problem in the literature [28]. For a graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$ with $d = 2$, we have $k$ draws from $\approx \mathcal{F}^2 \approx k^2$ distinct sets, so $k^2/2n$ in (11) is a constant. Hence, if the number of stages is $d = 2$, there is a constant probability that there exists a pair of variable nodes that share the same neighboring check nodes in both the stages, thus violating the expander condition.
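The approximation in (11) is easy to sanity-check against the exact collision probability; with $k$ on the order of $\sqrt{n}$, the repeat probability is a non-vanishing constant, which is exactly the $d = 2$ failure mode described above. A small sketch using the classic $k = 23$, $n = 365$ instance:

```python
import math

def birthday_exact(k, n):
    """Exact probability that k uniform draws (with replacement) from n
    values contain at least one repeat."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (n - i) / n
    return 1.0 - p_distinct

k, n = 23, 365
approx = 1.0 - math.exp(-k * k / (2 * n))    # equation (11)
assert birthday_exact(k, n) > 0.5            # the classic birthday fact
assert abs(birthday_exact(k, n) - approx) < 0.03
```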

###### Theorem VI.6.

The FFAST peeling-decoder over a random graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$, where $d \ge 3$ and $\mathcal{F} = \eta k$:

• successfully uncovers all the variable nodes with probability at least $1 - O(1/k)$;

• successfully uncovers all but a vanishingly small fraction, i.e., $O(\epsilon k)$, of the variable nodes with probability at least $1 - 2e^{-\beta\epsilon^2 k^{\gamma}}$, for some constants $\beta$, $\gamma$ and $\epsilon$,

for an appropriate choice of the constant $\eta$ as per Table IV.

###### Proof.

Consider a random graph from the ensemble $\mathcal{C}_k^1(\mathcal{F}, n_b)$. Let $Z$ be the number of edges not decoded by the FFAST peeling-decoder in $\ell$ (a large but fixed constant number of) iterations after processing this graph. Then, from recursion (8) and Lemma VI.2, for an appropriate choice of the constant $\eta$ (as per Table IV), $Z < kd\epsilon$ for an arbitrarily small constant $\epsilon > 0$, with exponentially high probability. The result then follows from Lemmas VI.5 and VI.4. ∎

###### Corollary VI.7.

The FFAST peeling-decoder over a random graph from the ensemble $\mathcal{C}_k^2(\mathcal{F}, n, n_b)$, where $d \ge 3$ and $\mathcal{F} = \eta k$:

• successfully uncovers all the variable nodes with probability at least $1 - O(1/k)$;

• successfully uncovers all but a vanishingly small fraction, i.e., $O(\epsilon k)$, of the variable nodes with probability at least $1 - 2e^{-\beta\epsilon^2 k^{\gamma}}$, for some constants $\beta$, $\gamma$ and $\epsilon$,

for an appropriate choice of the constant $\eta$ as per Table IV.

###### Proof.

Follows from the equivalence of the ensembles (Lemma VI.1) and Theorem VI.6. ∎

### VI-D The FFAST front-end architecture parameters for achieving the sparsity index 0 < δ ≤ 1/3

Consider a set $\{f_0, \ldots, f_{d-1}\}$ of $d$ pairwise co-prime integers. The integers $f_i$ are such that they are approximately equal, i.e., $f_i \approx \mathcal{F}$ for $i = 0, \ldots, d-1$, where $\mathcal{F} = \eta k$ (see Table IV) is an asymptotically large number. Set the signal length $n = \mathcal{P}\prod_{i=0}^{d-1} f_i$, where $\mathcal{P}$ is a positive integer, thus achieving a sparsity index $0 < \delta \le 1/3$. We construct a FFAST sub-sampling front-end with $d$ stages, where each stage has $2$ delay-chains (see Fig. 1). The sub-sampling period used in both the delay-chains of the $i$-th stage is $n/f_i$. As a result, the number of samples at the output of each delay-chain of the $i$-th stage is $f_i$, for $i = 0, \ldots, d-1$, i.e., the total number of samples used by the FFAST algorithm is $m = 2\sum_{i=0}^{d-1} f_i$.
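Putting the pieces together, a concrete toy-scale choice of the $d = 3$ very-sparse front-end parameters can be sketched as follows; the consecutive-integer co-prime construction and the specific values of $\mathcal{F}$ and $\mathcal{P}$ here are illustrative, not the paper's:

```python
from math import gcd, prod

# Toy instance of the d = 3 very-sparse front-end parameters. Consecutive
# integers anchored at an odd F are pairwise co-prime, giving the
# approximately equal co-prime set {f_0, f_1, f_2}. F = 501 and P = 7 are
# illustrative values only.
F = 501
f = [F, F + 1, F + 2]
assert all(gcd(a, b) == 1 for a in f for b in f if a < b)

P = 7                                    # P > 1 lowers delta below 1/3
n = P * prod(f)                          # signal length n = P * f0 * f1 * f2
periods = [n // fi for fi in f]          # stage-i subsampling period n / f_i
samples = 2 * sum(f)                     # 2 delay-chains, f_i samples each
assert all(n % fi == 0 for fi in f)
```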

## VII FFAST Construction and Performance analysis for the less-sparse (k = O(n^δ), 1/3 < δ < 1) regime

In the FFAST front-end architecture for the less-sparse regime, the per-stage sampling parameters, unlike those of the very-sparse regime of Section VI, are not pairwise co-prime. Instead, for the less-sparse regime ($1/3 < \delta < 1$) the relation between these integers is a bit more involved. Hence, for ease of exposition, we adhere to the following approach:

• First, we describe the FFAST front-end construction and analyze the performance of the FFAST decoding algorithm for the simple case of the less-sparse regime of $\delta = 2/3$.

• Then, in Section VII-B, we provide a brief sketch of how to generalize the FFAST architecture for $\delta = 2/3$ to $\delta = (d-1)/d$, for integer values of $d \ge 3$. This covers the range of values $\delta = 2/3, 3/4, 4/5$, etc.

• In Section VII-C, we show how to achieve the intermediate values of $\delta$.

• Finally, in Section VII-D, we use all the techniques developed in the previous sections to provide an explicit choice of parameters for the FFAST front-end architecture that achieves all the sparsity indices in the range $1/3 < \delta < 1$.

### VII-A Less-sparse regime of δ = 2/3

#### VII-A1 FFAST front-end construction

Consider $n = \prod_{i=0}^{2} f_i$, where the set $\{f_0, f_1, f_2\}$ consists of approximately equal sized pairwise co-prime integers, with each $f_i \approx \mathcal{F}$ and hence $n \approx \mathcal{F}^3$. Choose the integers $f_i$ such that $k \approx \mathcal{F}^2$, i.e., $\delta = 2/3$. Then, we construct a $3$-stage FFAST front-end architecture, where stage $i$ has two delay-chains, each with a sub-sampling period of $f_i$ and $n/f_i$ output samples.

#### VII-A2 Performance analysis of the FFAST decoding algorithm

In order to analyze the performance of the FFAST decoding algorithm, we follow an approach similar to that of Section VI for the very-sparse regime. We first describe an ensemble of bi-partite graphs constructed using a balls-and-bins model. Then, we describe the CRT based ensemble of bi-partite graphs that is generated by the FFAST front-end of Section VII-A1. We show by construction that these two ensembles are equivalent, and analyze the performance of the FFAST peeling-decoder on a uniformly random graph from the balls-and-bins ensemble.

##### Balls-and-Bins construction

We construct a bi-partite graph with $k$ variable nodes on the left and $n_b$ check nodes on the right (see Fig. 7) using a balls-and-bins model as follows. Partition the check nodes into $3$ sets/stages containing $f_0 f_1$, $f_1 f_2$ and $f_2 f_0$ check nodes respectively. The check nodes in each of the sets are arranged in a matrix format as shown in Fig. 7, e.g., the check nodes in the first set are arranged as $f_0$ rows and $f_1$ columns. A check node in row $r$ and column $c$ of the first set is indexed by the pair $(r, c)$, and so on and so forth. Each variable node uniformly randomly chooses a triplet $(t_0, t_1, t_2)$, where $t_i$ is between $0$ and $f_i - 1$. The triplets are chosen with replacement and independently across all the variable nodes. A $3$-regular degree bi-partite graph with $k$ variable nodes and $n_b$ check nodes is then constructed by connecting a variable node with a triplet $(t_0, t_1, t_2)$ to the check nodes $(t_0, t_1)$, $(t_1, t_2)$ and $(t_2, t_0)$ in the three sets on the right respectively, as shown in Fig. 7.
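The consistency between this triplet-based balls-and-bins description and the CRT view can be checked numerically: since $f_0$ and $f_1$ are co-prime, the residue pair $(v \bmod f_0, v \bmod f_1)$ is in bijection with $v \bmod (f_0 f_1)$, so the pair-indexed check nodes of a stage coincide with the aliasing bins of a stage producing $f_0 f_1$ output samples. A sketch with toy sizes (the stage labeling here is illustrative):

```python
from math import prod

# The pair (v mod f0, v mod f1) is in bijection with v mod (f0*f1) when
# f0, f1 are co-prime (CRT), so pair-indexed check nodes coincide with
# the aliasing bins of a stage with f0*f1 outputs. Toy sizes.
f = [7, 8, 9]                            # pairwise co-prime
n = prod(f)                              # n = 504
pairs = {(v % f[0], v % f[1]) for v in range(n)}
assert len(pairs) == f[0] * f[1]         # all 56 pairs occur: a bijection

for v in range(n):
    # v mod (f0*f1) carries exactly the residue pair, and nothing more
    assert (v % (f[0] * f[1])) % f[0] == v % f[0]
    assert (v % (f[0] * f[1])) % f[1] == v % f[1]
```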