
# Deterministic Construction of Compressed Sensing Matrices using BCH Codes

Arash Amini and Farokh Marvasti are with the Department of Electrical Engineering, Advanced Communication Research Institute (ACRI), Sharif University of Technology, Tehran, Iran (e-mail: arashsil@ee.sharif.edu, marvasti@sharif.edu). Manuscript received July ?, 2009.
###### Abstract

In this paper we introduce deterministic RIP-fulfilling $m \times n$ matrices of order $k$. The columns of these matrices are binary BCH code vectors whose zeros are replaced with $-1$ (excluding the normalization factor). The samples obtained by these matrices can be easily converted back to the original sparse signal; more precisely, for noiseless samples, the simple Matching Pursuit technique, even with less than the common computational complexity, exactly reconstructs the sparse signal. In addition, using DeVore's binary matrices, we expand the $\pm 1$ scheme to matrices with $\{0, 1, -1\}$ elements.

Compressed Sensing, Deterministic Matrices, Restricted Isometry Property, BCH codes.

## I Introduction

Decreasing the number of samples required for the unique representation of a class of signals known as sparse has been the subject of extensive research in the past five years. The field of compressed sensing, which was first introduced in [1] and further developed in [2, 3], deals with the reconstruction of an $n$-dimensional but $k$-sparse vector $s$ from its linear projections $y = A s$ onto an $m$-dimensional ($m < n$) space. The two main concerns in compressed sensing are 1) selecting the sampling matrix $A$ and 2) reconstructing $s$ from the measurements $y$ by exploiting the sparsity constraint.

In general, the exact solution to the second problem is shown to be NP-complete [4]; however, if the number of samples ($m$) exceeds a lower bound of order $k\log(n/k)$, $\ell_1$ minimization (Basis Pursuit) can be performed instead of the exact $\ell_0$ minimization (sparsity constraint) with the same solution for almost all possible inputs [4, 2]. There are also other techniques such as greedy methods [5, 6] that can be used.

The first problem (sampling matrix) is usually treated by random selection of the matrix; among the well-known random matrices are i.i.d. Gaussian [1] and Rademacher [7] matrices. Before addressing some of the deterministic matrix constructions, we first describe the well-known Restricted Isometry Property (RIP) [2]:

We say that the matrix $A$ obeys RIP of order $k$ with constant $\delta_k$ (the Restricted Isometry Constant, RIC) if for all $k$-sparse vectors $x$ we have:

$$1-\delta_k \;\le\; \frac{\|Ax\|_{\ell_2}^2}{\|x\|_{\ell_2}^2} \;\le\; 1+\delta_k \tag{1}$$

In other words, RIP of order $k$ implies that every $k$ columns of the matrix resemble a quasi-orthonormal set: if $A_k$ is formed by $k$ different columns of $A$, all the eigenvalues of the Gram matrix $A_k^T A_k$ should lie inside the interval $[1-\delta_k,\, 1+\delta_k]$.
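This eigenvalue characterization can be checked numerically for small matrices. The following sketch (our illustration, not part of the paper; the function name is ours) brute-forces $\delta_k$ over all $k$-column submatrices:

```python
import numpy as np
from itertools import combinations

def restricted_isometry_constant(A, k):
    """Brute-force delta_k: the largest deviation from 1 of the eigenvalues
    of the Gram matrices of all k-column submatrices of A.
    Exponential in the number of columns; for illustration only."""
    delta = 0.0
    for S in combinations(range(A.shape[1]), k):
        G = A[:, S].T @ A[:, S]              # k x k Gram matrix
        eig = np.linalg.eigvalsh(G)
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta

# An orthonormal matrix has delta_k = 0 for every k (up to round-off)
Q, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((8, 8)))
print(restricted_isometry_constant(Q, 2))
```

For matrices with many columns this enumeration is infeasible, which is exactly why coherence-based bounds such as the one in Theorem 1 below are useful.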

RIP is a sufficient condition for stable recovery: the Basis Pursuit and greedy methods can be applied for the recovery of $k$-sparse vectors from noisy samples with good results if the matrix obeys RIP of order $2k$ with a good enough constant [8, 6].

In this paper we are interested in deterministic as opposed to random sampling matrices. Deterministic sampling matrices are useful because, in practice, the sampler should finally choose a deterministic matrix, and realizations of random matrices are not guaranteed to work. Moreover, by a proper choice of the matrix, complexity or compression rate may be improved. In deterministic sampling, we look for matrices which obey RIP of a given order $k$. It is well known that any $k$ columns of a Vandermonde matrix are linearly independent; thus, if we normalize the columns, the new matrix satisfies an RIP condition for all orders $k$. In other words, arbitrary RIP-constrained matrices could be constructed in this way; however, when $k$ increases, the constant $\delta_k$ rapidly approaches $1$ and some of the submatrices become ill-conditioned [9], which makes the matrix impractical. In [10], $p^2 \times p^{r+1}$ binary matrices ($p$ is a power of a prime integer) with $\{0, 1\}$ elements (prior to normalization) are proposed which obey RIP of order $k < p/r + 1$, where $r$ is the maximum degree of the polynomials used in the construction. Another binary matrix construction with a near-optimal number of measurements is investigated in [11], which employs hash functions and extractor graphs. The connection between coding theory and compressed sensing matrices is established in [12], where second-order Reed-Muller codes are used to construct matrices with $\pm 1$ elements; unfortunately, the matrix does not satisfy RIP for all $k$-sparse vectors. Complex matrices with chirp-type columns are also conjectured to obey RIP of some order [13]. Recently, almost bound-achieving matrices have been proposed in [14] which, rather than the exact RIP, satisfy a statistical RIP (RIP holds with high probability). In this paper, we explicitly introduce matrices with $\pm 1$ elements (prior to normalization) which obey the exact RIP. The new construction is achieved by replacing the zeros of linear binary block codes (especially BCH codes) by $-1$.
In this approach, we require binary codes with minimum distances almost as large as half their code length; the existence of these codes will be shown by providing suitable BCH codes.

The rest of the paper is organized as follows: in the next section we show the connection between linear block codes and the construction of RIP-fulfilling matrices. In Section III we introduce BCH codes that meet the requirements for producing compressed sensing matrices. Matrix construction and recovery of the sparse signal from the samples using the matching pursuit method are discussed in Section IV. The introduced matrices are combined with a previous scheme to form $\{0, 1, -1\}$ matrices in Section V, and finally, Section VI concludes the paper.

## II Matrix Construction via Linear Codes

In this section we describe the connection between the sampling matrix and coding theory. Since similar parameters are used in both compressed sensing and coding theory, we distinguish the two by marking the coding parameters with a tilde; i.e., $\tilde n$ refers to the code length while $n$ denotes the number of columns of the sampling matrix.

Let $C$ be a linear binary block code and $\mathbf{1}$ be the all-one vector. We say $C$ is 'symmetric' if $\mathbf{1} \in C$. For symmetric codes, if $c$ is a code vector then, due to the linearity of the code, the complement of $c$, defined as $\bar c = c \oplus \mathbf{1}$, is also a valid code vector; therefore, the code vectors consist of complement couples.

###### Theorem 1

Let $C$ be a symmetric $(\tilde n, \tilde k)$ code with minimum distance $\tilde d_{\min}$ and let $\tilde A_{\tilde n \times 2^{\tilde k - 1}}$ be the binary matrix composed of code vectors as its columns such that, from each complement couple, exactly one is selected. Define:

$$A_{\tilde n \times 2^{\tilde k-1}} \triangleq \frac{1}{\sqrt{\tilde n}}\left(2\,\tilde A_{\tilde n \times 2^{\tilde k-1}} - \mathbf{1}_{\tilde n \times 2^{\tilde k-1}}\right) \tag{2}$$

Then $A$ satisfies RIP with the constant $\delta_k = (k-1)\,\frac{\tilde n - 2\tilde d_{\min}}{\tilde n}$ for $k < 1 + \frac{\tilde n}{\tilde n - 2\tilde d_{\min}}$ ($k$ is the RIP order).

Proof. First note that the columns of $A$ are normal. In fact, $A$ is the same matrix as $\tilde A$ where the zeros are replaced by $-1$ (up to the normalization factor); hence, the absolute value of each element of $A$ equals $1/\sqrt{\tilde n}$, which reveals that the columns have unit norm.

To prove the RIP, we use an approach similar to that of [10]; we show that for each two columns of $A$, the absolute value of their inner product is at most $\frac{\tilde n - 2\tilde d_{\min}}{\tilde n}$. Let $a, b$ be two different columns of $A$ and $\tilde a, \tilde b$ be their corresponding columns in $\tilde A$. If $\tilde a$ and $\tilde b$ differ at $l$ bits, we have:

$$\langle a, b\rangle = \frac{1}{\tilde n}\big(1\times(\tilde n - l) + (-1)\times l\big) = \frac{\tilde n - 2l}{\tilde n} \tag{3}$$

Moreover, $\tilde a$ and $\bar{\tilde b}$ (the complement of $\tilde b$) differ at $\tilde n - l$ bits, and since all three vectors are different code words (from each complement couple, exactly one is chosen and thus $\tilde a \neq \bar{\tilde b}$), both $l$ and $\tilde n - l$ should be greater than or equal to $\tilde d_{\min}$, i.e.:

$$\begin{cases} l \ge \tilde d_{\min} \\ \tilde n - l \ge \tilde d_{\min} \end{cases} \;\Rightarrow\; \tilde d_{\min} \le l \le \tilde n - \tilde d_{\min} \;\Rightarrow\; |\tilde n - 2l| \le \tilde n - 2\tilde d_{\min} \tag{4}$$

Note that $\tilde d_{\min} \le \tilde n/2$: for each code vector $c$, the weight of either $c$ or its complement cannot exceed $\tilde n/2$; therefore, $\tilde n - 2\tilde d_{\min} \ge 0$. Combining (3) and (4) we have:

$$\big|\langle a, b\rangle\big| \le \frac{\tilde n - 2\tilde d_{\min}}{\tilde n} \tag{5}$$

which proves the claim on the inner products of the columns of $A$. Now let $A_k$ be the matrix formed by $k$ different columns of $A$. According to the previous arguments, $A_k^T A_k$ is a $k \times k$ matrix that has $1$'s on its main diagonal while its off-diagonal elements have absolute values less than or equal to $\frac{\tilde n - 2\tilde d_{\min}}{\tilde n}$. It is now rather easy to complete the proof with the use of the Gershgorin circle theorem. ∎

The above theorem is useful only when $\tilde d_{\min}$ is close to $\tilde n/2$ (so that the denominator in the upper bound on $k$ stays small), which is not the case for common binary codes. In fact, in communication systems, parity bits are inserted to protect the main data payload, i.e., $\tilde k$ bits of data are followed by $\tilde n - \tilde k$ parity bits, and we have $\tilde d_{\min} \le \tilde n - \tilde k + 1$ (the Singleton bound); thus, to have $\tilde d_{\min} \approx \tilde n/2$, the number of parity bits should have the same order as the data payload, which is impractical for communication purposes. In the next section we show how these types of codes can be designed using the well-known BCH codes.

## III BCH Codes with Large $\tilde d_{\min}$

Since the focus in this section is on the design of BCH codes with large minimum distances, we first briefly review the BCH structure.

BCH codes are a class of cyclic binary codes of length $\tilde n = 2^{\tilde m} - 1$ which are produced by a generating polynomial $g(x)$ such that $g(x) \mid x^{2^{\tilde m}-1} + 1$ [15]. According to a result in Galois theory, we know:

$$x^{2^{\tilde m}-1} + 1 = \prod_{\substack{r \in GF(2^{\tilde m}) \\ r \neq 0}} (x - r) \tag{6}$$

Hence, the BCH generating polynomial can be decomposed into a product of linear factors in $GF(2^{\tilde m})$. Let $\alpha$ be a primitive root of the field and let $\alpha^i$ be one of the roots of $g(x)$. Since $g(x)$ has binary coefficients, all conjugate elements of $\alpha^i$ (with respect to $GF(2)$) are also roots of $g(x)$. Again using results in Galois theory, we know that these conjugates are the different elements of the set $\{\alpha^i, \alpha^{2i}, \alpha^{4i}, \dots\}$. Moreover, since $\alpha^{2^{\tilde m}-1} = 1$, the exponents can be taken modulo $2^{\tilde m}-1$, which reveals the circular behavior of the exponents.

The main advantage of BCH codes compared to other cyclic codes is their guaranteed lower bound on the minimum distance (the BCH bound) [15]: if $\alpha^{i+1}, \alpha^{i+2}, \dots, \alpha^{i+D}$ are different roots of $g(x)$ (not necessarily all the roots) such that the exponents form an arithmetic progression of consecutive integers, then $\tilde d_{\min} \ge D + 1$.

Now we return to our code design approach. We construct the desired code generating polynomials by investigating their parity check polynomial, which is defined as:

$$h(x) \triangleq \frac{x^{2^{\tilde m}-1} + 1}{g(x)} \tag{7}$$

In other words, each nonzero field element is the root of exactly one of $g(x)$ and $h(x)$. We construct $h(x)$ by introducing its roots. Let $l < \tilde m$ be a nonnegative integer and define

$$G^{(l)}_{\tilde m} = \left\{\alpha^0, \alpha^1, \dots, \alpha^{2^{\tilde m - 1} + 2^l - 1}\right\} \tag{8}$$

Note that the definition of $G^{(l)}_{\tilde m}$ depends on the choice of the primitive element ($\alpha$). We further define $H^{(l)}_{\tilde m}$ as the subset of $G^{(l)}_{\tilde m}$ which is closed with respect to the conjugate operation:

$$H^{(l)}_{\tilde m} \triangleq \left\{ r \in G^{(l)}_{\tilde m} \;\middle|\; \forall\, j \in \mathbb{N}:\ r^{2^j} \in G^{(l)}_{\tilde m} \right\} \tag{9}$$

The above definition shows that if $r \in H^{(l)}_{\tilde m}$, then its conjugate $r^2$ also belongs to $H^{(l)}_{\tilde m}$. Now let us define $h(x)$:

$$h(x) = \prod_{r \in H^{(l)}_{\tilde m}} (x - r) \tag{10}$$

As discussed before, if $r$ is a root of $h(x)$, all its conjugates are also roots of $h(x)$; therefore, $h(x)$ has binary coefficients, which is a required condition. Moreover,

$$1 = \alpha^0 \in G^{(l)}_{\tilde m} \;\Rightarrow\; 1 \in H^{(l)}_{\tilde m} \;\Rightarrow\; (1+x) \,\big|\, h(x) \tag{11}$$

which means that the all-one vector is a valid code word:

 c=[1,…,12~m−1]T (12) ⇒ c(x)=1+x+⋯+x2~m−2=x2~m−1+1x+1 ⇒ x2~m−1+1∣∣(x2~m−1+1)h(x)1+x=c(x)h(x)

Hence, the code generated by $g(x)$ is a symmetric code and fulfills the requirement of Theorem 1. For the minimum distance of the code, note that the roots of $h(x)$ form a subset of $G^{(l)}_{\tilde m}$; thus, all the field elements with exponents outside $G^{(l)}_{\tilde m}$ are roots of $g(x)$:

$$\forall\; 2^{\tilde m - 1} + 2^l \le j \le 2^{\tilde m} - 2:\quad g(\alpha^j) = 0 \tag{13}$$

Consequently, there exists an arithmetic progression of length $2^{\tilde m - 1} - 2^l - 1$ among the exponents of $\alpha$ in the roots of $g(x)$. As a result of the BCH bound:

$$\tilde d_{\min} \ge \big(2^{\tilde m - 1} - 2^l - 1\big) + 1 = 2^{\tilde m - 1} - 2^l \tag{14}$$

In coding, it is usual to look for a code with maximum $\tilde d_{\min}$ given $\tilde n$ and $\tilde k$. Here, we have designed a code with a good $\tilde d_{\min}$ for a given $\tilde n$ but with an unknown $\tilde k$:

$$\tilde n = \tilde k + \deg\big(g(x)\big) \;\Rightarrow\; \tilde k = \tilde n - \deg\big(g(x)\big) = \deg\big(h(x)\big) = \big|H^{(l)}_{\tilde m}\big| \tag{15}$$

The following theorem reveals how $|H^{(l)}_{\tilde m}|$ should be calculated.

###### Theorem 2

With the previous terminology, $|H^{(l)}_{\tilde m}|$ is equal to the number of binary sequences of length $\tilde m$ such that, if the sequence is written around a circle, between each two $1$'s there exist at least $\tilde m - l - 1$ zeros.

Proof. We show that there exists a one-to-one mapping between the elements of $H^{(l)}_{\tilde m}$ and the described binary sequences. Let $b_{\tilde m-1}\dots b_0$ be one of the binary sequences and let $\beta$ be the decimal number whose binary representation coincides with the sequence:

$$\beta = \big(\overline{b_{\tilde m-1}\dots b_0}\big)_2 = \sum_{i=0}^{\tilde m - 1} b_i 2^i \tag{16}$$

We will show that $\alpha^\beta \in H^{(l)}_{\tilde m}$. For the sake of simplicity, let us define $\beta_j$ as the decimal number whose binary representation is the same sequence subjected to $j$ units of left circular shift:

$$\begin{aligned} \beta_0 &= \big(\overline{b_{\tilde m-1}\dots b_0}\big)_2 \\ \beta_1 &= \big(\overline{b_{\tilde m-2}\dots b_0\, b_{\tilde m-1}}\big)_2 \\ \beta_2 &= \big(\overline{b_{\tilde m-3}\dots b_0\, b_{\tilde m-1} b_{\tilde m-2}}\big)_2 \\ &\;\;\vdots \\ \beta_{\tilde m - 1} &= \big(\overline{b_0\, b_{\tilde m-1}\dots b_1}\big)_2 \end{aligned} \tag{17}$$

Now we have:

$$\begin{aligned} 2\beta_j &= 2\times\big(\overline{b_{\tilde m-1-j}\dots b_0\, b_{\tilde m-1}\dots b_{\tilde m-j}}\big)_2 \\ &= 2^{\tilde m}\, b_{\tilde m-1-j} + \big(\overline{b_{\tilde m-2-j}\dots b_0\, b_{\tilde m-1}\dots b_{\tilde m-j}\, 0}\big)_2 \equiv \beta_{j+1} \pmod{2^{\tilde m}-1} \\[2pt] &\Rightarrow\; \beta_j \equiv 2^j \beta \pmod{2^{\tilde m}-1} \;\Rightarrow\; \alpha^{\beta_j} = \alpha^{2^j \beta} \end{aligned} \tag{18}$$

which shows that the $\alpha^{\beta_j}$ are the conjugates of $\alpha^\beta$. To show $\alpha^\beta \in H^{(l)}_{\tilde m}$, we should prove that all the conjugates belong to $G^{(l)}_{\tilde m}$, or equivalently, we should show $0 \le \beta_j \le 2^{\tilde m - 1} + 2^l - 1$. It is clear that $\beta_j \ge 0$; to prove the right inequality we consider two cases:

1. The MSB of $\beta_j$ is zero:

$$b_{\tilde m - 1 - j} = 0 \;\Rightarrow\; \beta_j < 2^{\tilde m - 1} \le 2^{\tilde m - 1} + 2^l - 1 \tag{19}$$
2. The MSB of $\beta_j$ is one; therefore, according to the spacing property of the binary sequences, the following $\tilde m - l - 1$ bits (indices of $b$ taken modulo $\tilde m$) are zero:

$$b_{\tilde m-1-j} = 1 \;\Rightarrow\; b_{\tilde m-2-j} = \dots = b_{l-j} = 0 \;\Rightarrow\; \beta_j \le 2^{\tilde m-1} + \sum_{i=0}^{l-1} 2^i \;\Rightarrow\; \beta_j \le 2^{\tilde m-1} + 2^l - 1 \tag{20}$$

Up to now, we have proved that each binary sequence with the above zero-spacing property can be assigned to a separate root of $h(x)$. To complete the proof, we show that if the binary representation of $\beta$ does not satisfy the property, then $\alpha^\beta \notin H^{(l)}_{\tilde m}$. In fact, by the circular shifts introduced in (17), each bit can be placed in the MSB position; thus, if the binary representation of $\beta$ does not obey the property, at least one of the $\beta_j$'s must be greater than $2^{\tilde m - 1} + 2^l - 1$. This means that at least one of the conjugates of $\alpha^\beta$ does not belong to $G^{(l)}_{\tilde m}$. ∎
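The shift-doubling correspondence at the heart of this proof is easy to verify numerically. The snippet below (our illustration; the function names are ours) lists the exponents of the conjugates of $\alpha^\beta$ by repeated doubling modulo $2^{\tilde m}-1$ and compares them against the circular shifts of $\beta$'s bit pattern:

```python
def conjugate_exponents(beta, m):
    # Exponents of the conjugates of alpha**beta in GF(2^m):
    # doubling modulo 2^m - 1 circularly shifts beta's m-bit representation.
    n = 2**m - 1
    return sorted({(beta * 2**j) % n for j in range(m)})

def circular_shifts(beta, m):
    # Decimal values of all left circular shifts of beta's m-bit pattern.
    bits = format(beta, f'0{m}b')
    return sorted({int(bits[j:] + bits[:j], 2) for j in range(m)})

print(conjugate_exponents(3, 4))  # [3, 6, 9, 12]
print(circular_shifts(3, 4))      # [3, 6, 9, 12]
```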

Theorem 2 relates the code parameter $\tilde k = |H^{(l)}_{\tilde m}|$ to a combinatorics problem; using this relation, $\tilde k$ is evaluated in Appendix A.
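For small $\tilde m$, the count in Theorem 2 can be brute-forced directly; the sketch below (ours, for illustration) enumerates all binary sequences of length `m` with at least `a` zeros circularly between $1$'s:

```python
from itertools import product

def circular_count(m, a):
    """Number of length-m binary sequences in which, cyclically, every
    two 1's are separated by at least a zeros (the count in Theorem 2)."""
    def ok(bits):
        ones = [i for i, b in enumerate(bits) if b]
        if len(ones) <= 1:
            return True
        # cyclic gap between consecutive 1's must be at least a+1 positions
        return all((ones[(j + 1) % len(ones)] - ones[j]) % m >= a + 1
                   for j in range(len(ones)))
    return sum(ok(bits) for bits in product((0, 1), repeat=m))

print(circular_count(4, 1))  # 7: 0000, the four single-1 patterns, 0101, 1010
```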

## IV Sampling and Reconstruction

In the previous sections, we presented the principles of the matrix construction. In this section, in addition to a stepwise instruction set, we focus on the procedure of selecting columns from the complement pairs. In the second part of this section, we show that the original sparse vector can be reconstructed from the samples by simple methods such as Matching Pursuit.

### IV-A Matrix Construction

Recalling the arguments in the previous section, the choice of the polynomial $h(x)$ depends on the choice of the primitive root. In addition to this degree of freedom, by Theorem 1, no matter which code vectors are selected from the complement pairs, the generated matrix satisfies RIP. Hence, for a given primitive element, there are $2^{2^{\tilde k - 1}}$ possible matrix constructions (there are $2^{\tilde k - 1}$ complement pairs). Among this huge number of possibilities, some have better characteristics for signal recovery from the samples. More specifically, we look for matrices whose columns are closed with respect to the circular shift operation: if $a$ is a column of $A$, then every circular shift of $a$ is also a column of $A$.

The key point is that BCH codes are a subset of cyclic codes, i.e., if $c$ is a code vector, all its circular shifts are also valid code vectors. Thus, if we are careful in selecting from the complement pairs, the generated sampling matrix will also have the cyclic property. For this selection, it should be noted that if $(c, \bar c)$ is a complement pair and $c'$ is a circularly shifted version of $c$, the overall parities (sums of the elements mod $2$) of $c$ and $\bar c$ are different (each code vector has $2^{\tilde m}-1$ elements, which is an odd number), while $c$ and $c'$ have the same parity. Therefore, if we discard the code vectors with even (odd) parity from the set of all code vectors, we are left with a set half the size of the original such that from each complement pair exactly one vector is selected, while the set is still closed with respect to the circular shift operation. The selection algorithm is as follows:

1. For a given $k$ (compressed sensing parameter), choose $\tilde m$ and $l$ such that $\frac{2^{l+1}-1}{2^{\tilde m}-1} < \frac{1}{2k-1}$ (the number of compressed samples will be $2^{\tilde m}-1$).

2. Let $B$ be the set of all binary sequences of length $\tilde m$ such that the $1$'s are circularly spaced by at least $\tilde m - l - 1$ zeros. In addition, let $H_{dec}$ be the set of decimal numbers whose binary representations are the sequences in $B$.

3. Choose $\alpha$ as one of the primitive roots of $GF(2^{\tilde m})$ and define:

$$H = \left\{\alpha^r \;\middle|\; r \in H_{dec}\right\} \tag{21}$$
4. Define the parity check and code generating polynomials as:

$$h(x) = \prod_{r \in H} (x - r) \tag{22}$$

and

$$g(x) = \frac{x^{2^{\tilde m}-1} + 1}{h(x)} \tag{23}$$
5. Let $\tilde A$ be the binary matrix composed of the even-parity code vectors as its columns, i.e., if the columns are considered as polynomial coefficients (in $GF(2)[x]$), each polynomial should be divisible by $(1+x)\,g(x)$ (the additional factor of $1+x$ implies the even parity).

6. Replace all the zeros in $\tilde A$ by $-1$ and normalize each column (divide by $\sqrt{2^{\tilde m}-1}$) to obtain the final compressed sensing matrix ($A$).

For a simple example, we consider the case $l = 0$. It is easy to check that the number of $1$'s in each of the binary sequences in step 2 cannot exceed one. Therefore, we have $H_{dec} = \{0, 1, 2, 4, \dots, 2^{\tilde m - 1}\}$. This means that $h(x)$, except for the factor $1+x$, is the same as the minimal polynomial of $\alpha$ (the primitive root). Since for code generation we use $(1+x)\,g(x)$ instead of $g(x)$, the effective parity check polynomial will be the minimal polynomial of $\alpha$, which is a primitive polynomial. In this case, the matrix $\tilde A$ is the square matrix whose columns are the circularly shifted versions of the Pseudo Noise Sequence (PNS) generated by the primitive polynomial (the absolute value of the inner product of each two columns of $A$ is exactly $\frac{1}{2^{\tilde m}-1}$).
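This PN-sequence special case can be sketched in a few lines. The following is our illustration, assuming $\tilde m = 3$ with the primitive polynomial $x^3 + x + 1$ (our choices), so $\tilde n = 7$; the constant coefficient of $\alpha^i$ traces out the PN sequence:

```python
import numpy as np

def pn_sequence(m=3, poly=0b1011):
    """Maximal-length sequence of period 2^m - 1: the constant coefficient
    of alpha^i, with alpha a root of the given primitive polynomial."""
    n = 2**m - 1
    seq, state = [], 1                # state encodes alpha^0
    for _ in range(n):
        seq.append(state & 1)         # constant coefficient of alpha^i
        state <<= 1                   # multiply by alpha
        if state & (1 << m):
            state ^= poly             # reduce modulo the primitive polynomial
    return np.array(seq)

s = pn_sequence()
n = len(s)
# Columns: circular shifts of the PN sequence, 0 -> -1, normalized by sqrt(n)
A = np.stack([np.roll(2 * s - 1, j) for j in range(n)], axis=1) / np.sqrt(n)
G = A.T @ A   # unit diagonal; every off-diagonal entry equals exactly -1/7
```

The two-valued Gram matrix reflects the well-known two-valued autocorrelation of PN sequences.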

Table I summarizes some of the parity check polynomials for different values of $\tilde m$ and $l$. Also, Fig. 1 shows the degree of $h(x)$ for some of the choices of $\tilde m$ and $l$.

### IV-B Reconstruction from the Samples

Matching Pursuit is one of the simplest methods for the recovery of sparse signals from linear projections. Here we show that this method can exactly recover the sparse signal from the noiseless samples.

Let $A_{m\times n}$ and $s_{n\times 1}$ be the sampling matrix and the $k$-sparse signal vector, respectively. The sampling process is defined by:

$$y_{m\times 1} = A_{m\times n}\cdot s_{n\times 1} \tag{24}$$

For the unique reconstruction of $s$ from the samples $y$, it is sufficient that the sampling matrix satisfies RIP of order $2k$ [8]. In this section, we show that if $A$ is constructed as described in the previous section such that the coherence of its columns is less than $\frac{1}{2k-1}$, the matching pursuit method can be used for perfect reconstruction. In addition, due to the circular structure of the columns of $A$, the computational complexity can be decreased below that of ordinary matching pursuit.

Let $i_1, \dots, i_k$ be the nonzero locations in $s$; thus, we have:

$$y_{m\times 1} = A\cdot s = \sum_{j=1}^{k} s_{i_j}\, a_{i_j} \tag{25}$$

where $a_i$ denotes the $i$-th column of $A$. In the matching pursuit method, in order to find the nonzero locations in $s$, the inner products of the sample vector ($y$) with all the columns of $A$ are evaluated, and then the index of the maximum absolute value is chosen as the most probable nonzero location. Here, we show that the index associated with the maximum value is always a nonzero location. Without loss of generality, assume $|s_{i_1}| = \max_j |s_{i_j}|$. We then have:

$$\begin{aligned} \big|\langle y, a_{i_1}\rangle\big| &= \Big|\sum_{j=1}^{k} s_{i_j}\langle a_{i_j}, a_{i_1}\rangle\Big| \\ &\ge |s_{i_1}|\,\langle a_{i_1}, a_{i_1}\rangle - \sum_{j=2}^{k} |s_{i_j}|\,\big|\langle a_{i_j}, a_{i_1}\rangle\big| \\ &> |s_{i_1}| - \frac{1}{2k-1}\sum_{j=2}^{k}|s_{i_j}| \\ &\ge |s_{i_1}| - \frac{k-1}{2k-1}|s_{i_1}| = \frac{k}{2k-1}|s_{i_1}| \end{aligned} \tag{26}$$

Now assume $l \notin \{i_1, \dots, i_k\}$:

$$\begin{aligned} \big|\langle y, a_l\rangle\big| &= \Big|\sum_{j=1}^{k} s_{i_j}\langle a_{i_j}, a_l\rangle\Big| \le \sum_{j=1}^{k}|s_{i_j}|\,\big|\langle a_{i_j}, a_l\rangle\big| \\ &< \frac{1}{2k-1}\sum_{j=1}^{k}|s_{i_j}| \le \frac{k}{2k-1}|s_{i_1}| \end{aligned} \tag{27}$$

Combining (26) and (27), we get:

$$\big|\langle y, a_l\rangle\big| \;<\; \frac{k}{2k-1}|s_{i_1}| \;<\; \big|\langle y, a_{i_1}\rangle\big| \tag{28}$$

Hence, the largest inner product is obtained either with $a_{i_1}$ or with one of the other $a_{i_j}$'s. Therefore, in the noiseless case, the matching pursuit algorithm never selects a zero location, and finally we reconstruct the original sparse signal perfectly.
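As an illustration of the recovery step (our sketch, not the paper's code), here is an orthogonal matching pursuit variant, which refits the coefficients by least squares at each iteration; under the coherence condition above, the selection rule analyzed in (26)-(28) always picks correct locations:

```python
import numpy as np

def omp_recover(A, y, k):
    """Recover an exactly k-sparse s from noiseless y = A s, assuming the
    unit-norm columns of A have pairwise coherence below 1/(2k-1)."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        i = int(np.argmax(np.abs(A.T @ residual)))   # best-matching column
        if i not in support:
            support.append(i)
        # least-squares refit on the current support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    s_hat = np.zeros(A.shape[1])
    s_hat[support] = coef
    return s_hat
```

For example, with an orthonormal $A$ (coherence $0$), the recovery is exact in $k$ iterations.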

In each iteration of the matching pursuit algorithm, the inner product of $y$ with all the columns of $A$ needs to be calculated. Each inner product requires $\tilde n$ multiplications and $\tilde n - 1$ additions. Here the circular property of the columns of $A$ becomes useful. Let $a$ be one of the columns of $A$ and let $a^{(j)}$ denote its circular shift by $j$ positions; by construction, the $a^{(j)}$ are also columns of $A$, so $\langle y, a^{(j)}\rangle$ has to be calculated for all $j$. Let $a^{(0)}, \dots, a^{(\tilde n' - 1)}$ be the distinct elements among the circular shifts (obviously $\tilde n' \le \tilde n$, and more precisely $\tilde n' \mid \tilde n$). These inner products require $\tilde n' \tilde n$ multiplications and $\tilde n'(\tilde n - 1)$ additions if calculated directly.

An alternative approach for evaluating these values is to employ the Discrete Fourier Transform (DFT) or its fast implementation, the FFT. The key point in this approach is that the inner products can be found through the circular cross-correlation of $y$ and $a$, i.e.,

$$\langle y, a^{(j)}\rangle = \big(y \star a\big)\big|_j \tag{29}$$

where $\star$ represents the circular cross-correlation with period $\tilde n$. It is well known that the circular cross-correlation can be easily calculated using the DFT: if $Y[\cdot]$ and $\mathcal{A}[\cdot]$ denote the DFTs of $y$ and $a$, respectively, we have:

$$\langle y, a^{(j)}\rangle = \mathrm{IDFT}\Big\{\,Y[\cdot]\;\overline{\mathcal{A}[\cdot]}\,\Big\}\Big|_j \tag{30}$$

where the overline denotes complex conjugation. For the evaluation of the inner products in this way, the DFT of $y$ has to be calculated only once per iteration. Thus, excluding this calculation, the inner products of $y$ with all the circular shifts of $a$ require one DFT, one IDFT and $\tilde n$ multiplications. In addition, when the number of distinct circular shifts of $a$ divides $\tilde n$, only DFT coefficients at equidistant positions can be nonzero, and a shorter DFT (and consequently IDFT) suffices; the shorter DFT of $y$ can be obtained by down-sampling the already evaluated $\tilde n$-point DFT, with no need for an extra transform. Employing the FFT, each $\tilde n$-point DFT or IDFT requires on the order of $\tilde n\log\tilde n$ multiplications and additions. Comparing the number of multiplications required by the direct calculation of the above inner products with that of the FFT-based approach reveals the efficiency of the DFT approach; i.e., the computational complexity required for reconstructing the signal from the samples obtained by the proposed sampling matrix is less than the common amount for general matrices. It should be emphasized that this reduction in computational complexity is the result of the circular format of the columns.
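The DFT identity in (30) can be checked directly with NumPy (our sketch; real-valued signals assumed):

```python
import numpy as np

def shift_inner_products(y, a):
    """Inner products of y with every circular shift of a, computed two
    ways: directly, and via the DFT cross-correlation identity (30)."""
    n = len(y)
    direct = np.array([np.dot(y, np.roll(a, j)) for j in range(n)])
    via_dft = np.real(np.fft.ifft(np.fft.fft(y) * np.conj(np.fft.fft(a))))
    return direct, via_dft

rng = np.random.default_rng(1)
d, f = shift_inner_products(rng.standard_normal(7), rng.standard_normal(7))
assert np.allclose(d, f)
```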

## V Matrices with {0,1,−1} Elements

We have presented a method to generate RIP-fulfilling matrices with $\pm 1$ elements. In this section, we show that the matrices introduced in [10] can be improved using the technique of this paper.

In [10], in contrast to this paper, binary $\{0, 1\}$ compressed sensing matrices are considered. The main difficulty in designing such matrices is that the columns should (almost) be normal, which means that, prior to normalization, the number of $1$'s in each column is fixed (all matrix elements are scaled with the same coefficient for normalization). In [10], $p^2 \times p^{r+1}$ binary matrices are introduced such that in each column exactly $p$ elements are equal to $1$ (equal to $\frac{1}{\sqrt p}$ after normalization) and the inner product of each two normalized columns is less than or equal to $\frac{r}{p}$. Here $p$ is a power of a prime integer; the matrix construction is based on polynomials of degree at most $r$ in $GF(p)$.

It is evident that by changing some of the $1$'s in the aforementioned matrix into $-1$, the norms of the columns do not change; however, the inner products do. To show how we can benefit from this feature, let us assume that $p$ is a power of $2$; thus, there are $p$ nonzero elements in each column. We construct a new matrix from the original binary matrix as follows: we repeat each column $p$ times and then change the signs of the nonzero elements in the replicas in such a way that these nonzero elements form a $p \times p$ Walsh-Hadamard matrix. In other words, for each column there are $p$ columns (including itself) that have the same pattern of nonzero elements. Moreover, the nonzero elements of these semi-replica vectors are different columns of the Walsh-Hadamard matrix. Thus, the semi-replica vectors are orthogonal, while the absolute value of the inner product of two vectors with different nonzero patterns is still upper-bounded by $\frac{r}{p}$ (the maximum possible value in the original matrix). Hence, the new matrix still satisfies the RIP condition with the same $k$ and $\delta_k$.

Although we have expanded the matrix with this trick, the change is negligible when the orders of the matrix sizes are considered ($p^2 \times p^{r+1}$ is expanded to $p^2 \times p^{r+2}$). In fact, the orthogonality of the semi-replicas is not a necessary condition; we only need their inner products not to exceed $\frac{r}{p}$ in absolute value. This shows that instead of the Walsh-Hadamard matrix, we can use other matrices with more columns (with the same number of rows) such that their columns are almost orthogonal (inner products less than $\frac{r}{p}$). This is the case for the matrices introduced in the previous sections.

In order to describe the procedure mathematically, we need to define an operation. Let $s$ be a binary vector with exactly $\alpha$ elements equal to $1$, located at positions $r_1 < r_2 < \dots < r_\alpha$. Also, let $x = [x_1, \dots, x_\alpha]^T$ be an arbitrary vector. We define $y = \mu(s, x)$ as:

$$\begin{cases} \forall\; 1 \le j \le \alpha: & y_{r_j} = x_j \\ \forall\; j \notin \{r_1, \dots, r_\alpha\}: & y_j = 0 \end{cases} \tag{31}$$

From the above definition, we can see:

$$\big\langle \mu(s, x_1),\, \mu(s, x_2)\big\rangle = \langle x_1, x_2\rangle \tag{32}$$

Furthermore, if the elements of both $x_1$ and $x_2$ lie in the closed interval $\big[-\frac{1}{\sqrt\alpha}, \frac{1}{\sqrt\alpha}\big]$ and $s_1, s_2$ are normalized (nonzero elements equal to $\frac{1}{\sqrt\alpha}$), we have:

$$\Big|\big\langle \mu(s_1, x_1),\, \mu(s_2, x_2)\big\rangle\Big| \le \langle s_1, s_2\rangle \tag{33}$$
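The operation $\mu$ and property (32) are easy to sanity-check (our sketch; the function name is ours):

```python
import numpy as np

def mu(s, x):
    # Embed the vector x into the support (nonzero positions) of the
    # binary vector s, as in (31).
    y = np.zeros(len(s), dtype=float)
    y[np.flatnonzero(s)] = x
    return y

s = np.array([1, 0, 1, 1, 0])
x1, x2 = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
# Property (32): with a common support, the inner product is preserved
assert np.dot(mu(s, x1), mu(s, x2)) == np.dot(x1, x2)
```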

For the matrix construction, let $\tilde m$ be an integer such that $p = 2^{\tilde m} - 1$ is a prime (primes of this form are called Mersenne primes). Let $k$ be the required order of the RIP condition and let:

$$r = \Big\lfloor \frac{p}{k} \Big\rfloor\;, \qquad l = \lceil \log_2 k\rceil \tag{34}$$

Also, let $\{s_i\}$ be the columns of the binary RIP-fulfilling matrix constructed as in [10] and $\{x_j\}$ be the columns of the $\pm 1$ matrix introduced in the previous sections (we further normalize the columns of both matrices). We construct a new matrix $A$ with elements in $\{0, 1, -1\}$ (up to the normalization factor) by combining these two matrices:

$$A = \big[\mu(s_i, x_j)\big]_{i,j} \tag{35}$$

Employing the same approach as before, we show that $A$ satisfies the RIP condition of order $k$, i.e., we show that the inner product of two different columns of $A$ cannot exceed $\frac{1}{k-1}$ in absolute value while each column is normal:

$$\big\langle \mu(s_i, x_j)\,,\, \mu(s_i, x_j)\big\rangle = \langle x_j, x_j\rangle = 1 \tag{36}$$

To study the inner product of $\mu(s_{i_1}, x_{j_1})$ and $\mu(s_{i_2}, x_{j_2})$, we consider two cases:

1. $i_1 = i_2$ and $j_1 \ne j_2$. In this case, since $x_{j_1} \ne x_{j_2}$, we have:

$$\Big|\big\langle \mu(s_{i_1}, x_{j_1})\,,\, \mu(s_{i_2}, x_{j_2})\big\rangle\Big| = \big|\langle x_{j_1}, x_{j_2}\rangle\big| < \frac{1}{k-1} \tag{37}$$
2. $i_1 \ne i_2$ and therefore $s_{i_1} \ne s_{i_2}$; since the elements of both $x_{j_1}$ and $x_{j_2}$ lie in $\big[-\frac{1}{\sqrt p}, \frac{1}{\sqrt p}\big]$, we have:

$$\Big|\big\langle \mu(s_{i_1}, x_{j_1})\,,\, \mu(s_{i_2}, x_{j_2})\big\rangle\Big| \le \big|\langle s_{i_1}, s_{i_2}\rangle\big| < \frac{1}{k-1} \tag{38}$$

Inequalities (37) and (38) hold due to the RIP-fulfilling structure of the matrices $[s_i]_i$ and $[x_j]_j$. Hence, the claimed property of the inner products of the columns of $A$ is proved. Consequently, $A$ obeys the RIP condition of order $k$.

## VI Conclusion

Despite the enormous amount of literature on random sampling matrices for compressed sensing, deterministic designs are not well researched. In this paper, we introduced a new connection between coding theory and RIP-fulfilling matrices. In the new design, we replace the zeros in binary linear code vectors by $-1$ and use the resulting vectors as the columns of the sampling matrix. The advantage of these matrices, in addition to their deterministic and known structure, is the simplicity of the sampling process; real/complex entries in the sampling matrix increase the computational complexity of the sampler as well as the required bit precision for storing the samples. The linear codes for this purpose should have certain characteristics; the existence of such codes is proved by explicitly introducing suitable binary BCH codes. One of the features of these matrices is that the samples they produce can be easily decoded (using the matching pursuit method) into the original sparse vector, and due to the circular structure of the columns, the computational complexity of the recovery can be reduced. These matrices are further expanded to $\{0, 1, -1\}$ elements; this expansion is achieved by combining the matrices introduced in this paper with DeVore's binary matrices. Although the generated matrices show an improvement in the realizable size of RIP-constrained matrices, the bound predicted by random matrices is not yet achieved.

## Appendix A Evaluation of $\tilde k$

In Theorem 2, we showed that $\tilde k$ is equal to the number of binary sequences of length $\tilde m$ such that no two $1$'s are spaced by less than $\tilde m - l - 1$ zeros (circular definition). To evaluate this number, let us define $\kappa^{(a)}_b$ as the number of binary sequences of length $b$ such that, if the sequence is put around a circle, between each two $1$'s there are at least $a$ zeros. In addition, let $\lambda^{(a)}_b$ be the number of binary sequences of length $b$ whose $1$'s are spaced at least $a$ zeros apart (the circular property is no longer required for $\lambda^{(a)}_b$). We first calculate $\lambda^{(a)}_b$ and then show the connection between $\lambda^{(a)}_b$ and $\kappa^{(a)}_b$.

There are two kinds of binary sequences counted in $\lambda^{(a)}_b$:

1. The last bit of the sequence is $0$; by omitting this bit, we obtain a sequence of length $b-1$ with the same property. Also, each binary sequence of length $b-1$ with the above property can be padded with a $0$ while still satisfying the property required for inclusion in the count $\lambda^{(a)}_b$. Therefore, there are $\lambda^{(a)}_{b-1}$ binary sequences of this type.

2. The last bit of the sequence is $1$; this means that the preceding $a$ bits of the sequence are $0$. Similar to the above case, each binary sequence of length $b-a-1$ counted in $\lambda^{(a)}_{b-a-1}$ can be padded with the block $\underbrace{0\cdots 0}_{a}\,1$ to produce a sequence counted in $\lambda^{(a)}_b$. Thus, there are $\lambda^{(a)}_{b-a-1}$ binary sequences of this type.

In summary, we have the following recursive equation:

$$\lambda^{(a)}_b = \lambda^{(a)}_{b-1} + \lambda^{(a)}_{b-a-1} \tag{39}$$

Since for $b \le a+1$ there can be at most one $1$ in the binary sequence, we thus have:

$$1 \le b \le a+1:\quad \lambda^{(a)}_b = b + 1 \tag{40}$$
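The recursion (39) and initial condition (40) can be verified against brute-force enumeration (our sketch; function names are ours):

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def lam(a, b):
    """Number of length-b binary sequences whose 1's are separated by at
    least a zeros (non-circular), via recursion (39) and condition (40)."""
    if b <= a + 1:
        return b + 1          # at most one 1 fits
    return lam(a, b - 1) + lam(a, b - a - 1)

def brute(a, b):
    # Direct enumeration of the same count, for cross-checking.
    def ok(bits):
        ones = [i for i, v in enumerate(bits) if v]
        return all(q - p > a for p, q in zip(ones, ones[1:]))
    return sum(ok(bits) for bits in product((0, 1), repeat=b))

assert all(lam(a, b) == brute(a, b) for a in (1, 2) for b in range(1, 10))
```

For $a = 1$ the recursion reduces to the Fibonacci recurrence, as expected for counting sequences with no two adjacent $1$'s.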

From (39), the last initial condition ($\lambda^{(a)}_{a+1} = a+2$) is equivalent to $\lambda^{(a)}_0 = 1$. If we define the one-sided $z$-transform of $\lambda^{(a)}_b$ as follows,

$$\lambda^{(a)}(z) = \sum_{b=0}^{\infty} \lambda^{(a)}_b\, z^{-b}, \tag{41}$$

it is not hard to check that:

 κ(a)(z)=zz−1⋅za+1−