
# Quickest Sequence Phase Detection

Lele Wang, Sihuang Hu, and Ofer Shayevitz

This work was supported by an ERC grant no. 639573 and an ISF grant no. 1367/14. The material in this paper was presented in part at the IEEE International Symposium on Information Theory 2016, Barcelona, Spain. L. Wang is jointly with the Department of Electrical Engineering, Stanford University, Stanford, CA, USA, and the Department of Electrical Engineering - Systems, Tel Aviv University, Tel Aviv, Israel (email: wanglele@stanford.edu). S. Hu and O. Shayevitz are with the Department of Electrical Engineering - Systems, Tel Aviv University, Tel Aviv, Israel (emails: sihuanghu@post.tau.ac.il, ofersha@eng.tau.ac.il).
###### Abstract

A phase detection sequence is a length-$n$ cyclic sequence such that the location of any length-$k$ contiguous subsequence can be determined from a noisy observation of that subsequence. In this paper, we derive bounds on the minimal possible $k$ in the limit of $n \to \infty$, and describe some sequence constructions. We further consider multiple phase detection sequences, where the location of any length-$k$ contiguous subsequence of each sequence can be determined simultaneously from a noisy mixture of those subsequences. We study the optimal trade-offs between the lengths of the sequences, and describe some sequence constructions. We compare these phase detection problems to their natural channel coding counterparts, and show a strict separation between the fundamental limits in the multiple sequence case. Both adversarial and probabilistic noise models are addressed.

## I Introduction

A magician enters the room with a 32-card deck. He invites five volunteers to the stage and claims he will read their minds. Another volunteer is asked to cut the deck a few times and pass the top five cards to the volunteers, one for each. “Now I need you to think about your card and I will tell you what it is,” the magician says. Silence. “Please concentrate! Think harder.” A long pause. “Okay, the weather is not good today. It is interfering with the brainwaves between us. I need you to work with me a bit,” the magician begs. “Could the people with red cards move one step closer to me?” Another long pause. “Hmm, you have the six of clubs. You have the five of spades…” Sure enough, he gets them all!

This is Diaconis’ mind-reading trick [1, 2]. The magic makes use of a binary de Bruijn sequence of order 5, which is a length-32 circulant binary sequence such that every length-5 binary string occurs as a contiguous subsequence exactly once. The magician enters the room with the 32 cards prearranged such that their color (black/red) corresponds to the de Bruijn sequence. Cutting the deck only shifts the sequence cyclically. By the property of the de Bruijn sequence, knowing the colors reveals the location (or phase) of the 5 contiguous cards inside the deck, hence uniquely determines their identities. More generally, this trick can be performed with $k$ volunteers and a deck of size $2^k$, by using a de Bruijn sequence of order $k$, which is a length-$2^k$ binary sequence such that every length-$k$ binary string occurs as a contiguous subsequence exactly once.
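The defining property is easy to check by computer. Below is a small Python sketch (ours, for illustration) that builds a binary de Bruijn sequence of order 5 via the standard Lyndon-word concatenation construction and verifies that all 32 cyclic length-5 windows are distinct:

```python
def de_bruijn(alphabet_size, order):
    """Lyndon-word concatenation construction of a de Bruijn sequence."""
    a = [0] * alphabet_size * order
    sequence = []

    def db(t, p):
        if t > order:
            if order % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, alphabet_size):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence

seq = de_bruijn(2, 5)          # binary, order 5 -> length 32
# every length-5 binary string occurs exactly once as a cyclic window
windows = {tuple((seq + seq[:4])[i:i + 5]) for i in range(32)}
```

Running the same check with order $k$ and a deck of size $2^k$ verifies the general version of the trick.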

Suppose now that some of the volunteers are not collaborative and may lie when asked about their card color. Can the magician still guess the cards correctly? In other words, can one design a length-$n$ sequence such that the set of all length-$k$ contiguous subsequences forms a good error-correcting code? Besides its appeal as a card trick, such a sequence can also be useful, e.g., for phase detection in positioning systems. Imagine that a satellite sends the length-$n$ sequence periodically. A user hearing a noisy chunk of the sequence would like to figure out the location of his chunk within the original sequence, so as to measure the transmission delay and compute his distance to the satellite. Fixing the sequence length $n$ (which results in a given ambiguity of the distance estimation), it is clearly desirable to minimize $k$, as this results in the fastest positioning. Clearly, $k$ cannot be smaller than $\log_2 n$, and this lower bound can be achieved in case there is no noise, by using a de Bruijn sequence of order $\log_2 n$. As we shall see, in the noisy case $k = \Theta(\log n)$ is also sufficient, and we will in fact be interested in characterizing the exact constant $\log_2 n / k$, which will be referred to as rate.

In reality, positioning systems typically employ multiple satellites, each transmitting its own sequence, of lengths $n_1$ and $n_2$ in the two-satellite case. Sequences get combined through a multiple access channel (MAC) when reaching the user. Upon hearing a length-$k$ chunk of the combined sequence, the user wishes to measure his distance to all of the satellites by locating the chunk within each one of the sequences. We note that existing techniques (such as GPS) typically employ sequences (e.g., Gold codes) that possess good autocorrelation and cross-correlation properties, and use $k = rn$ for some repetition factor $r \ge 1$. From our perspective, these systems hence operate at zero rates. In fact, when the repetition factor $r > 1$, this does not precisely fall under our setup; we further remark on this in Example 1. In what follows, we focus on fast positioning at non-zero rates. We are interested in characterizing the optimal trade-offs among $(n_1, n_2, k)$ that ensure successful detection, as well as in constructing sequences that achieve the optimal trade-offs.

In what follows, we refer to the first problem, which only involves a single-sequence design, as point-to-point phase detection. We refer to the second problem as multiple access phase detection. Different noise models are considered: the adversarial noise and the probabilistic noise. For the probabilistic noise, different error criteria are discussed: the vanishing error criterion and the zero error criterion. These models are defined formally in the sequel. We also compare the phase detection problems to their natural channel coding counterparts.

### I-A Point-to-Point Phase Detection

In Sections II, III, and IV, we consider point-to-point phase detection.

An $(n, k)$ point-to-point phase detection scheme consists of

• a sequence $x^n = (x_1, \dots, x_n)$, and

• a detector $\hat{m} \colon \mathcal{Y}^k \to [n] \cup \{\mathrm{e}\}$, where $[n] \triangleq \{1, 2, \dots, n\}$ and $\mathrm{e}$ is an error symbol.

We assume that the detector observes a noisy version $Y^k$ of the length-$k$ contiguous subsequence of $x^n$ starting at phase $m$, and attempts to correctly identify the phase $m$. Clearly, any reliable scheme would require $k \ge \log_2 n$. Thus, it is natural to define the efficiency of a scheme as the excess multiplicative factor it uses over the minimal possible, i.e., $k / \log_2 n$. However, for comparison to channel coding, it would be more convenient to work with the inverse of this quantity and take logarithms in base 2, namely work with the rate

$$R \triangleq \frac{\log_2 n}{k}.$$

We note that any phase detection scheme induces a codebook $\mathcal{C} = \{(x_m, \dots, x_{m+k-1}) : m \in [n]\}$ of rate $R$ (the codebook is treated as a multiset, namely there might be repetitions in its elements). Here and throughout, indices are taken cyclically, modulo the set $[n]$. Also, we assume throughout that $k \le n$.

We discuss three distinct models: the adversarial noise model in Section II, the probabilistic noise with vanishing error in Section III, and the probabilistic noise with zero error in Section IV. For convenience, let the function $\phi(m, x^n)$ return the length-$k$ contiguous subsequence of $x^n$ starting at phase $m$, i.e., $\phi(m, x^n) = (x_m, x_{m+1}, \dots, x_{m+k-1})$. We will typically omit the dependence on the sequence $x^n$, and simply write $\phi(m)$.

For the adversarial noise model, we assume that $\mathcal{X} = \mathcal{Y} = \{0, 1\}$ and the observation sequence $y^k$ is obtained from $\phi(m)$ by flipping at most $pk$ bits, where $m$ is the correct phase, and $p \in [0, 1/2)$ is fixed and given. We define the minimum distance $d$ of a scheme as the minimum Hamming distance of its induced codebook. A rate $R$ is said to be achievable in this setting if, for a divergent sequence of $k$'s, there exist schemes with rate at least $R$, such that $m$ can be recovered from $y^k$ without error. Namely, we require the scheme to have a minimum distance $d > 2pk$. The capacity of adversarial phase detection $C_{\mathrm{adv}}(p)$ is defined as the supremum over all achievable rates. (Here we define capacity asymptotically. Note that, similarly to adversarial channel coding, it is not guaranteed that short sequences with rate above the capacity do not exist.)
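To make the induced codebook and its minimum distance concrete, here is a small Python sketch (ours; the length-7 m-sequence and the choice $k = 6$ are illustrative) that builds the multiset of cyclic windows and computes its minimum Hamming distance:

```python
def induced_codebook(x, k):
    """The n cyclic length-k windows of the sequence x (a multiset)."""
    n = len(x)
    return [tuple(x[(m + i) % n] for i in range(k)) for m in range(n)]

def min_distance(code):
    """Minimum pairwise Hamming distance over the codebook."""
    return min(sum(a != b for a, b in zip(c1, c2))
               for i, c1 in enumerate(code) for c2 in code[i + 1:])

x = [1, 1, 1, 0, 0, 1, 0]   # length-7 m-sequence (recurrence from x^3 + x + 1)
code = induced_codebook(x, k=6)
d = min_distance(code)      # any two windows differ in at least d positions
```

For this toy sequence the induced code has minimum distance 3, so a single adversarial bit flip within any 6-bit window can be corrected.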

Several works have addressed this noise model in the literature. The trade-off between the rate and the minimum distance of the code was studied in [6, 7]. Kumar and Wei provided a lower bound on the minimum distance of $m$-sequences, which are generated by linear feedback shift registers. Some explicit sequence constructions were also provided in [8, 9, 10, 11, 12]. By a concatenation of an optimal binary channel code with the Reed–Solomon code, Berkowitz and Kopparty have recently constructed a phase detection scheme with nonzero rate and nonzero relative distance. For generalizations to two-dimensional phase detection, see [13, 14, 15, 16].

In Section II, we focus on the tradeoff between the rate $R$ and the minimum distance $d$ in the asymptotic limit. We note that a codebook induced by any phase detection scheme can be used as a channel code in the standard binary adversarial channel model. The capacity of the latter setup is unknown. Clearly however, any upper bound for that capacity, such as the MRRW upper bound, also serves as an upper bound for $C_{\mathrm{adv}}(p)$. The best known binary adversarial channel coding lower bound is given by Gilbert and Varshamov [19, 20]. Applying the Lovász local lemma, we show in Section II-A that this rate is also achievable for adversarial phase detection. In Section II-B, we characterize the family of linear phase detection schemes and study their performance.

For the probabilistic noise model with vanishing error criterion, we assume that the phase $M$ is uniformly distributed, i.e., $M \sim \mathrm{Unif}[n]$. We further assume that the noisy observation $Y^k$ is obtained from $\phi(M)$ via a discrete memoryless channel $p(y|x)$. The probability of error is defined as

$$P_e^{(k)} = \mathrm{P}\{M \neq \hat{m}(Y^k)\}.$$

A rate $R$ is said to be achievable if, for a divergent sequence of $k$'s, there exist schemes with rate at least $R$ and $P_e^{(k)} \to 0$. The vanishing error capacity $C_{\mathrm{ve}}$ of probabilistic phase detection is defined as the supremum over all achievable rates.

As before, the codebook induced by any phase detection scheme is also a channel code. Thus, the Shannon capacity of the channel is an upper bound for $C_{\mathrm{ve}}$. In Section III-A, we show that $C_{\mathrm{ve}}$ in fact equals the Shannon capacity. Moreover, we present in Section III-B a low-complexity concatenated construction that achieves the capacity of probabilistic phase detection. As a consequence, this construction also establishes the equivalence between channel coding and phase detection for this noise model.

For the probabilistic noise model with zero error criterion, we again assume that the noisy observation $Y^k$ is obtained from $\phi(m)$ via a discrete memoryless channel $p(y|x)$. A rate $R$ is said to be achievable if, for a divergent sequence of $k$'s, there exist schemes with rate at least $R$ such that the phase $m$ can be recovered with zero error for any $m \in [n]$. Similar to Shannon's zero error channel coding, achievable rates can be equivalently defined on the confusion graph $G$ associated with the channel. Here the vertex set is $\mathcal{X}$ and two distinct vertices $x, x'$ are connected if they may result in the same output, i.e., there exists a $y$ such that $p(y|x) > 0$ and $p(y|x') > 0$. Let $G^k$ be the $k$-fold strong product of $G$, where two distinct vertices $x^k, \tilde{x}^k$ are connected if for all $i \in [k]$, either $x_i = \tilde{x}_i$ or $(x_i, \tilde{x}_i)$ is an edge of $G$. Then, a rate $R$ is achievable if and only if, for a divergent sequence of $k$'s, there exist schemes with rate at least $R$ such that for any two distinct phases $m \neq m'$, the codewords $\phi(m)$ and $\phi(m')$ are not connected in $G^k$; or in other words, the induced codebook forms an independent set of $G^k$. The zero error capacity $C_{\mathrm{ze}}$ is defined as the supremum over all achievable rates.
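The strong-product condition can be checked directly on a toy example. The following Python sketch (ours) uses the 5-cycle confusion graph of the pentagon channel and verifies that Shannon's classical codebook $\{(i, 2i \bmod 5)\}$ is an independent set in the 2-fold strong product:

```python
from itertools import product

# Confusion graph of the pentagon ("typewriter") channel: input i can be
# confused with i +/- 1 (mod 5), i.e., the 5-cycle C5.
def adjacent(a, b):
    return a != b and (a - b) % 5 in (1, 4)

def confusable(u, v):
    """Strong-product adjacency: every coordinate equal or adjacent."""
    return u != v and all(a == b or adjacent(a, b) for a, b in zip(u, v))

# Shannon's size-5 independent set in the 2-fold strong product of C5
code = [(i, (2 * i) % 5) for i in range(5)]
independent = all(not confusable(u, v) for u, v in product(code, code) if u != v)
```

A zero error phase detection scheme additionally requires these codewords to chain up into a single cyclic sequence, which is exactly the extra difficulty discussed below.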

We note the distinction between phase detection and channel coding under the zero error criterion. For zero error channel coding (in contrast to vanishing error and adversarial channel coding), if a rate is achievable at some length $k$, it is also achievable for all multiples of $k$ (by concatenation), and thus for a divergent sequence of $k$'s. However, this argument cannot be applied to the phase detection setting, since concatenating the codewords of two induced codebooks may not necessarily result in a new codebook that can be chained up into a single sequence. Nevertheless, and despite the fact that the zero error channel capacity is generally unknown, we show in Section IV that the zero error capacity for phase detection coincides with its channel coding counterpart.

### I-B Multiple Access Phase Detection

In Sections V and VI, we consider multiple access phase detection. We only discuss the two-user case for simplicity, but all the results extend to more users.

An $(n_1, n_2, k)$ multiple access phase detection scheme consists of

• two sequences $x_1^{n_1}$ and $x_2^{n_2}$, and

• a detector that declares two phase estimates $\hat{m}_1(Y^k)$ and $\hat{m}_2(Y^k)$.

We assume that the detector observes $Y^k$, which is the output of a discrete memoryless multiple access channel $p(y|x_1, x_2)$ with the two inputs $\phi_1(M_1)$ and $\phi_2(M_2)$, and attempts to correctly identify the phases $(M_1, M_2)$. Similar to the point-to-point case, we define the rates of the two sequences as

$$R_1 \triangleq \frac{\log_2 n_1}{k} \quad \text{and} \quad R_2 \triangleq \frac{\log_2 n_2}{k}.$$

We note that every multiple access phase detection scheme induces two (multiset) codebooks

$$\mathcal{C}_1 = \{\phi_1(m_1) : m_1 \in [n_1]\} \subseteq \mathcal{X}_1^k \tag{1}$$

and

$$\mathcal{C}_2 = \{\phi_2(m_2) : m_2 \in [n_2]\} \subseteq \mathcal{X}_2^k \tag{2}$$

of rates $R_1$ and $R_2$, respectively.

We discuss two different error criteria: the vanishing error criterion in Section V and the zero error criterion in Section VI.

Under the vanishing error criterion, we assume that the phase pair $(M_1, M_2)$ is uniformly distributed over $[n_1] \times [n_2]$. The probability of error is defined as

$$P_e^{(k)} = \mathrm{P}\{(M_1, M_2) \neq (\hat{m}_1(Y^k), \hat{m}_2(Y^k))\}.$$

A rate pair $(R_1, R_2)$ is said to be achievable if, for a divergent sequence of $k$'s, there exist schemes with rates at least $R_1$ and $R_2$, and $P_e^{(k)} \to 0$. The vanishing error capacity region is defined as the closure of the set of achievable rate pairs.

In Section V-A, we establish the vanishing error capacity region of multiple access phase detection. This region turns out to be strictly included, in general, in the capacity region of its channel coding counterpart. This is in contrast to all models in the point-to-point case, in which phase detection either achieves the same best known rate or shares the same capacity as its channel coding counterpart. Due to the lack of synchronization between sequences, a phase detection scheme achieves at best the usual MAC capacity region without the time-sharing random variable. In Section V-B, we provide a low-complexity sequence construction that achieves any rate pair in the capacity region.

Under the zero error criterion, a rate pair $(R_1, R_2)$ is said to be achievable if, for a divergent sequence of $k$'s, there exist schemes with rates at least $R_1$ and $R_2$ such that $(m_1, m_2)$ can be recovered from $Y^k$ with zero error for any pair $(m_1, m_2) \in [n_1] \times [n_2]$. The zero error capacity region is defined as the closure of the set of achievable rate pairs. We note that the problem of zero error phase detection in MACs is generally very difficult, as it is at least as hard as the zero error MAC coding problem, which in turn is open even in the simplest cases, e.g., the binary adder channel [23, 24, 25, 26, 27, 28]. Nevertheless, in Section VI-A, we demonstrate the distinction between the phase detection and the channel coding problems, by showing a separation between their capacity regions.

In Sections VI-B and VI-C, we restrict our attention to a simple channel model, the modulo-2 addition channel with $\mathcal{X}_1 = \mathcal{X}_2 = \mathcal{Y} = \{0, 1\}$ and $Y = X_1 \oplus X_2$. For this channel, a rate pair is achievable if every element in the sumset

$$\mathcal{C}_{\mathrm{sum}} \triangleq \{\phi_1(m_1) \oplus \phi_2(m_2) : m_1 \in [n_1], m_2 \in [n_2]\} \tag{3}$$

can be uniquely expressed as an element in the induced codebook $\mathcal{C}_1$ plus an element in the induced codebook $\mathcal{C}_2$. Note that $\mathcal{C}_{\mathrm{sum}}$ is defined as a regular set with distinct elements (rather than a multiset). Hence, any $\mathcal{C}_1$ and $\mathcal{C}_2$ induced by a valid scheme must also have distinct elements.

Clearly, the zero-error channel coding capacity region is an outer bound for that of phase detection. In Section VI-B, we establish the achievability of this region by a random construction that exploits properties of linear codes, in a way that resembles Wyner's linear Slepian–Wolf codes. We further provide in Section VI-C an explicit sequence construction that achieves this region, by exploiting properties from finite field theory. As a consequence, the induced code from our phase detection sequences can be used for channel coding and achieves any rate pair in the zero-error capacity region, without using time sharing. (For other channel codes that achieve this region without using time sharing, see, for example, [30, 31]. For a channel code that achieves the rate pair with the same codebook, see [23, 32] for a construction utilizing the parity check matrix of a BCH code.)

## II Point-to-Point: Adversarial Noise

In this section we discuss the adversarial noise model. We first examine whether adversarial phase detection schemes can achieve the best known rate for adversarial channel coding, namely the Gilbert–Varshamov (GV) bound [19, 20].

### II-A Fundamental Limit

###### Theorem 1.

An $(n, k)$ point-to-point phase detection scheme with minimum distance $d$ exists if

$$n \le \frac{2^k}{16k \sum_{i=0}^{d} \binom{k}{i}}. \tag{4}$$
###### Corollary 1.

The capacity for adversarial phase detection is lower bounded by

$$C_{\mathrm{adv}}(p) \ge 1 - h(2p),$$

where $h(\cdot)$ is the binary entropy function.

We show the existence of a good sequence using the probabilistic method. We note that while several different proofs of the GV bound exist [19, 20, 33], none of them seem to directly extend to our setting. This is simply due to the fact that there is a dependence between the codewords in the induced codebook. To alleviate this technical difficulty, we need the following well-known lemma.

###### Lemma 1 (Lovász Local Lemma).

Let $A_1, A_2, \dots, A_N$ be a set of “bad” events with $\mathrm{P}(A_j) \le P$ for all $j$, where each event is mutually independent of all but at most $D$ of the other events. If $4PD \le 1$, then

$$\mathrm{P}\left\{\cap_{j=1}^{N} A_j^c\right\} > 0.$$
###### Proof:

We generate the phase detection sequence $X^n$ i.i.d. $\sim \mathrm{Bern}(1/2)$ and apply minimum distance detection. Let $\{A_j\}$ be the collection of events where the Hamming distance between a pair of codewords $\phi(m_1)$ and $\phi(m_2)$ is at most $d$, where $m_1 \neq m_2$. We have

$$\mathrm{P}(A_j) \stackrel{(a)}{=} \mathrm{P}\left\{\mathrm{wt}(Z^k) \le d, \; Z^k \text{ i.i.d.} \sim \mathrm{Bern}(1/2)\right\} = \sum_{i=0}^{d} \binom{k}{i} \frac{1}{2^k},$$

where $(a)$ follows since for any two distinct phases $m_1 \neq m_2$, the sum of the two codewords $Z^k = \phi(m_1) \oplus \phi(m_2)$ is i.i.d. $\sim \mathrm{Bern}(1/2)$ even if they are overlapping subsequences of $X^n$. Now each $A_j$ is mutually independent of all other events, except for a set of at most $4kn$ events. This is because the random variable $\phi(m_1) \oplus \phi(m_2)$ is mutually independent of all $A_{j'}$'s whose phase pairs lie outside a window of length $2k$ around $m_1$ and around $m_2$, which excludes at most $4kn$ events. Applying Lemma 1, the phase detection sequence has minimum distance greater than $d$ with positive probability

if

$$16kn \sum_{i=0}^{d} \binom{k}{i} \frac{1}{2^k} \le 1, \tag{5}$$

or equivalently, the condition in (4). This completes the proof of Theorem 1. ∎
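The bound (4) is easy to evaluate numerically. A small Python sketch (ours; the parameters $k = 64$, $d \in \{2, 4\}$ are arbitrary illustrations):

```python
from math import comb

def max_length(k, d):
    """Largest n guaranteed to exist by (4): n <= 2^k / (16k * sum_{i<=d} C(k,i))."""
    ball = sum(comb(k, i) for i in range(d + 1))       # Hamming ball volume
    return (2 ** k) // (16 * k * ball)

n64 = max_length(64, 4)
rate = n64.bit_length() / 64    # roughly log2(n)/k
```

As expected, demanding a larger minimum distance shrinks the guaranteed sequence length, and the corresponding rate approaches $1 - h(2p)$ as $k$ grows.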

###### Proof:

Set $d = 2pk$ in (4). Applying the Hamming ball volume approximation

$$\sum_{i=0}^{2pk} \binom{k}{i} \le 2^{k h(2p)}$$

and plugging into (5), we have

$$R \le 1 - h(2p) - \frac{\log_2(16k)}{k}.$$

Letting $k \to \infty$, it follows that a rate $R$ is achievable if $R < 1 - h(2p)$. ∎

###### Remark 1.

In the standard channel coding setup, a random codebook attains the GV bound with high probability. In contrast, the probability of randomly drawing a good scheme for our setup is exponentially small. This is most obvious in the noiseless case ($p = 0$), where it is well known that the fraction occupied by de Bruijn sequences among all sequences vanishes exponentially fast.

### II-B Linear Phase Detection Schemes

Theorem 1 and Corollary 1 showed the existence of a good adversarial phase detection scheme. Now, we discuss explicit constructions of such schemes. First, we ask whether phase detection schemes are “equivalent” to error-correcting codes in a certain sense. Clearly, any adversarial phase detection scheme induces a codebook that can be used as an error-correcting code for the corresponding adversarial channel coding problem. The converse direction seems more challenging. Given an error-correcting code, is it possible to “chain up” all or a sizable fraction of its codewords to create a sequence, and use the decoding rule as the detector? If so, what structure should such a code possess? In the following, we answer these questions for the class of linear error-correcting codes.

First, we note that in order to induce any error-correcting code with minimum distance $d \ge 2$, the phase detection sequence should not contain the all-zero string $0^k$ as a contiguous subsequence, for otherwise a shift by one from that position would create a codeword that is at Hamming distance at most 1 from $0^k$. Following that, an $(n, k)$ phase detection scheme is said to be linear if $\mathcal{C} \cup \{0^k\}$, namely its induced codebook together with the zero codeword, forms a linear code. Let $r$ be the dimension of this linear code. Then, the length of the linear phase detection sequence is $n = 2^r - 1$.

###### Theorem 2.

A phase detection scheme with $n = 2^r - 1$ is linear if and only if its sequence is generated by a linear feedback shift register (LFSR) with a primitive characteristic polynomial of degree $r$ over $\mathrm{GF}(2)$, i.e.,

$$x_{r+j} = \sum_{i=0}^{r-1} a_i x_{i+j}, \quad j \in [n]. \tag{6}$$
###### Corollary 2.

The non-zero codewords of a linear code of dimension $r$ can be chained up to a sequence of length $2^r - 1$ if and only if any $r$ contiguous columns of the generator matrix $G = [g_1, \dots, g_k]$ are linearly independent, and

$$g_{r+j} = \sum_{i=0}^{r-1} a_i g_{i+j}, \quad j \in [k-r], \tag{7}$$

where the $a_i$'s are the coefficients of a primitive polynomial over $\mathrm{GF}(2)$.

###### Proof:

To prove sufficiency, suppose that $x^n$ is generated by an LFSR with a primitive characteristic polynomial as in (6) and a nonzero initial state vector $(x_1, \dots, x_r)$. Then, every length-$r$ string except $0^r$ occurs exactly once in $x^n$ cyclically (see [34, Theorem 8.33]). It follows that for any distinct codewords $\phi(m_1)$ and $\phi(m_2)$, there exists an $m_3$ such that $x_{m_1+i} \oplus x_{m_2+i} = x_{m_3+i}$ for $i \in \{0, 1, \dots, r-1\}$. For $i \ge r$, the same equality follows since the sequence is generated by an LFSR of degree $r$. Hence the sum of any two distinct codewords is again a codeword, and the scheme is linear.

For necessity, let $x^n$ be a sequence associated with a linear phase detection scheme. We show that the first $r$ columns of the generator matrix $G$ must be linearly independent. Assuming the contrary, there exist $f_1, \dots, f_r$, not all zero, such that

$$\sum_{i=1}^{r} f_i g_i = 0. \tag{8}$$

Let $c = uG$ be an arbitrary codeword with message vector $u$. Multiplying both sides of (8) by $u$, we have $\sum_{i=1}^{r} f_i c_i = 0$. Applying this to every codeword in $\mathcal{C}$, and recalling that the codewords are all contiguous subsequences of $x^n$, we have

$$\sum_{i=1}^{r} f_i x_{i+j} = 0, \quad j \in [n].$$

Let $i_0 = \max\{i : f_i \neq 0\}$. If $i_0 = 1$, then $x^n$ has to be $0^n$, in contradiction. For $i_0 \ge 2$, we have

$$x_{j+i_0} = \sum_{i=1}^{i_0-1} f_i x_{i+j}, \quad j \in [n],$$

which implies $x^n$ is generated by an LFSR of degree $i_0 - 1 < r$. But this contradicts the fact that $x^n$ is of length $2^r - 1$ and all codewords $\phi(m)$, $m \in [n]$, are distinct, since such an LFSR sequence has period at most $2^{i_0 - 1} - 1 < n$.

Now, since the first $r$ columns of $G$ are linearly independent, there exist $a_0, \dots, a_{r-1}$ such that

$$g_{r+1} = \sum_{i=0}^{r-1} a_i g_{i+1}.$$

From this it follows that (6) holds and $x^n$ is generated by an LFSR. Finally, an LFSR sequence is of maximum length $2^r - 1$ if and only if its characteristic polynomial is primitive. ∎

###### Proof:

The sufficiency follows since, for a linear code, the relation (7) implies (6). The necessity follows in the same way as the necessity in Theorem 2. ∎

###### Remark 2.

As an application of Theorem 2, we can design a card trick for adversarial crowds. Picking a primitive polynomial of degree $r = 5$ and window length $k = 9$, we get a sequence of length $n = 2^5 - 1 = 31$ whose induced code has minimum distance at least 3. Ordering 31 cards according to this sequence, the magician can now correct one lie out of 9 contiguous color reads.
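Theorem 2 suggests a concrete recipe: run an LFSR whose characteristic polynomial is primitive. Below is a Python sketch (ours; the degree-5 primitive polynomial $x^5 + x^2 + 1$ is one standard choice) that generates a length-31 m-sequence and checks that all 31 cyclic length-5 windows are distinct and nonzero:

```python
def lfsr(taps, init, n):
    """Binary LFSR: x_{j+r} = sum_i taps[i] * x_{j+i} (mod 2), r = len(init)."""
    x = list(init)
    r = len(init)
    while len(x) < n:
        x.append(sum(t & b for t, b in zip(taps, x[-r:])) % 2)
    return x

# recurrence x_{j+5} = x_j + x_{j+2}, from the primitive polynomial x^5 + x^2 + 1
seq = lfsr(taps=[1, 0, 1, 0, 0], init=[0, 0, 0, 0, 1], n=31)
windows = {tuple((seq + seq[:4])[i:i + 5]) for i in range(31)}
```

The 31 windows are exactly all nonzero binary 5-tuples, i.e., the sequence is a de Bruijn sequence of order 5 with the all-zero window removed, as the proof of Theorem 2 describes.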

###### Remark 3.

When the characteristic polynomial of the LFSR is irreducible but not primitive, the sequence it generates has length $e$, which equals the order of the characteristic polynomial. Depending on the initial state, the LFSR generates one out of $(2^r - 1)/e$ disjoint sequences. The length-$k$ contiguous subsequences of each sequence together with the zero codeword form a linear code. Conversely, for a linear code, if the first $r$ columns of its generator matrix are linearly independent and (7) holds with the $a_i$'s being the coefficients of an irreducible but not primitive polynomial of order $e$, then its nonzero codewords can be partitioned into $(2^r - 1)/e$ equal size subsets, each of which can be chained up to a phase detection sequence.

We now provide two results on the performance of linear phase detection schemes. In Theorem 3, we cite a known result from [34, Theorem 8.85] on the asymptotic relative distance, which improves upon [6, Theorem 1]. Then, inspired by a linear programming bound for LDPC codes, we provide in Theorem 4 an upper bound on the sequence length of a linear phase detection scheme of a given minimum distance, using the linear programming method originated by Delsarte.

###### Theorem 3 ([34, Theorem 8.85]).

For every linear phase detection scheme, for every $k$, the minimum distance $d$ of the induced code satisfies the estimate of [34, Theorem 8.85].

In particular, for $k$ growing suitably with $r$, the relative distance of the induced code converges to

$$\lim_{k \to \infty} \frac{d}{k} = \frac{1}{2}. \tag{9}$$
###### Remark 4.

We note a similar result in [6, Theorem 1], which claims (9) when $k$ grows linearly with $n$. Theorem 3 improves upon [6] by allowing $k$ to be sublinear in $n$.

For the next result, we need the following definitions. For integers $0 \le t \le l$ and $z$, let

$$K_t(z) = \sum_{j=0}^{t} (-1)^j \binom{z}{j} \binom{l-z}{t-j}$$

be the Krawtchouk polynomial [37, Ch. 5, § 2], where the binomial coefficient $\binom{z}{j}$ for real $z$ is defined as $\binom{z}{j} = \frac{z(z-1)\cdots(z-j+1)}{j!}$. For large $k$, the exponent of $K_{\lfloor pk \rfloor}(\lfloor \lambda k \rfloor)$ can be approximated as [35, Equation (40)]

$$\frac{1}{k} \log K_{\lfloor pk \rfloor}(\lfloor \lambda k \rfloor) = h(p) + \mathrm{Int}(p, \lambda) + o(1),$$

where

$$\mathrm{Int}(p, \lambda) = \int_0^{\lambda} \log\left( \frac{1 - 2p + \sqrt{(1-2p)^2 - 4(1-y)y}}{2(1-y)} \right) dy. \tag{10}$$
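For concreteness, the Krawtchouk polynomial can be evaluated directly from its definition; the Python sketch below (ours) checks two standard identities, $K_t(0) = \binom{l}{t}$ and $K_1(z) = l - 2z$:

```python
from math import comb

def krawtchouk(l, t, z):
    """K_t(z) = sum_{j=0}^t (-1)^j C(z, j) C(l - z, t - j), integer arguments."""
    return sum((-1) ** j * comb(z, j) * comb(l - z, t - j)
               for j in range(t + 1))

k1 = [krawtchouk(10, 1, z) for z in range(11)]   # should equal l - 2z
k3_at_0 = krawtchouk(10, 3, 0)                   # should equal C(10, 3)
```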
###### Theorem 4.

Every linear phase detection scheme with length $k$ and minimum distance $d$ must satisfy

$$\frac{2^r \cdot K_t^2(ic) \binom{(k-r)/c^2}{i} c^{2i}}{\binom{k}{t}} \le 2^k$$

for every $t$ and $i$ such that $2t < d$ and $ic^2 \le k - r$. Here $c$ is the number of nonzero coefficients of the characteristic polynomial.

###### Remark 5.

Compared to Delsarte’s linear programming bound for channel codes, the bound in Theorem 4 can sometimes be better: for certain specific parameters, Theorem 4 imposes a strictly stronger requirement than the linear programming bound. We note, however, that with further optimization for such specific parameters, tighter channel coding upper bounds are known.

###### Remark 6.

For low-complexity LFSR implementation, it may be desirable to choose a characteristic polynomial with low coefficient weight. According to a conjecture in finite field theory [40, 41], there are infinitely many primitive polynomials with coefficient weight $c = 3$. For this class of primitive polynomials, Theorem 4 implies that when the adversarial channel can flip at most a fraction $p$ of the inputs, the rate of the linear phase detection scheme must satisfy

$$\max_{0 \le \mu \le \frac{1-R}{9}} \left\{ 2\mu \log 3 + h\!\left(\frac{9\mu}{1-R}\right)\frac{1-R}{9} + 2\,\mathrm{Int}(p, 3\mu) \right\} \le 1 - h(p) - R, \tag{11}$$

where $\mathrm{Int}(\cdot, \cdot)$ is given in (10). This bound can sometimes be better than the second MRRW bound, which is the best known asymptotic upper bound for binary channel codes: for some values of $p$, rates permitted by the second MRRW bound violate condition (11).

###### Proof:

Following the same line of reasoning as in Section II-C, (29)–(36) and (48)–(49) of [35], we have for every $t$ and $\alpha$,

$$\frac{2^r \cdot K_t^2(\alpha) B_\alpha}{\binom{k}{t}} \le 2^k, \tag{12}$$

where $B_\alpha$ is the number of codewords of weight $\alpha$ in the dual code of the linear code induced by the phase detection scheme. Now we show that when the coefficient weight of the characteristic polynomial is $c$, for every $i$ such that $ic^2 \le k - r$, we can lower bound

$$B_{ic} \ge \binom{(k-r)/c^2}{i} c^{2i}. \tag{13}$$

To that end, note that our parity check matrix $H$, which is also the generator matrix of the dual code, can be written in the following form:

$$H = \begin{bmatrix} 1 & a_1 & \cdots & a_{r-1} & 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & a_1 & \cdots & a_{r-1} & 1 & 0 & \cdots & 0 \\ \vdots & & & & & & & & \vdots \\ 0 & 0 & \cdots & 0 & 1 & a_1 & \cdots & a_{r-1} & 1 \end{bmatrix}.$$

A weight-$ic$ codeword of the dual code could come from the sum of $i$ rows of $H$ whose nonzero elements are in disjoint columns. We lower bound the number of such codewords. First, we select an arbitrary row from the $k - r$ rows. Since each row of $H$ has weight $c$, the locations of the nonzero entries in the chosen row overlap those of at most $c^2$ rows (including itself). Then a second row is chosen from the remaining non-overlapping rows. We continue in this manner until we obtain $i$ rows (we will not exhaust all rows provided that $ic^2 \le k - r$). Hence, the number of choices is lower bounded by

$$\frac{1}{i!}(k-r)(k-r-c^2)\cdots(k-r-(i-1)c^2) = \binom{(k-r)/c^2}{i} c^{2i},$$

which establishes (13). Plugging (13) into (12) with $\alpha = ic$ completes the proof. ∎
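The counting step above can be verified numerically. A Python sketch (ours; the parameters $k = 109$, $r = 10$, $c = 3$ are chosen so that $c^2$ divides $k - r$) checks that the falling product divided by $i!$ matches the closed form used in (13):

```python
from math import comb, factorial

def lhs(k, r, c, i):
    """(1/i!) * (k-r) * (k-r-c^2) * ... * (k-r-(i-1)c^2), the row-selection count."""
    p = 1
    for j in range(i):
        p *= (k - r) - j * c * c
    return p // factorial(i)

def rhs(k, r, c, i):
    """C((k-r)/c^2, i) * c^(2i), the closed form in (13)."""
    return comb((k - r) // (c * c), i) * (c * c) ** i

checks = [(lhs(109, 10, 3, i), rhs(109, 10, 3, i)) for i in range(5)]
```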

## III Point-to-Point: Probabilistic Noise, Vanishing Error

In this section we discuss the probabilistic noise model with a vanishing error criterion. We first show that the capacity in this case coincides with the Shannon capacity of the observation channel. We then proceed to describe a low-complexity coding construction, based on a concatenation of a channel code and a de Bruijn sequence, that approaches this fundamental limit.

### III-A Fundamental Limit

###### Theorem 5.

The vanishing error capacity for probabilistic phase detection over a channel $p(y|x)$ is

$$C_{\mathrm{ve}} = \max_{p(x)} I(X; Y).$$

Before we proceed to the proof, we need a technical lemma. We denote the typical set of length-$k$ vectors corresponding to $(X, Y) \sim p(x, y)$ by

$$\mathcal{T}_\epsilon^{(k)}(X, Y) = \left\{ (x^k, y^k) : \left| \pi(x, y \,|\, x^k, y^k) - p(x, y) \right| \le \epsilon \, p(x, y) \text{ for all } x \in \mathcal{X}, y \in \mathcal{Y} \right\},$$

where $\pi(x, y \,|\, x^k, y^k)$ is the empirical frequency of the pair $(x, y)$ in $(x^k, y^k)$.
###### Lemma 2 ([42, Lemma 24.2]).

Let $X^n$ be generated i.i.d. $\sim p(x)$, and let $Y^k$ be the output of the memoryless channel $p(y|x)$ with input $X^k$. If $\epsilon$ is sufficiently small, then there exists $\gamma(\epsilon) > 0$ that depends only on $p(x, y)$ such that

$$\mathrm{P}\left\{ (X_m^{m+k-1}, Y^k) \in \mathcal{T}_\epsilon^{(k)}(X, Y) \right\} \le 2^{-k\gamma(\epsilon)} \tag{14}$$

for every overlapping shift $1 < m \le k$. Moreover, for non-overlapping sequences, i.e., for $m > k$,

$$\mathrm{P}\left\{ (X_m^{m+k-1}, Y^k) \in \mathcal{T}_\epsilon^{(k)}(X, Y) \right\} \le 2^{-k(I(X;Y) - \delta(\epsilon))}, \tag{15}$$

where $\delta(\epsilon)$ tends to zero as $\epsilon \to 0$.

###### Proof:

Clearly, any phase detection sequence is also a channel code. Thus, the above rate cannot be exceeded. We proceed to prove the achievability. Recall $R = \log_2 n / k$.

Phase detection sequence generation. We generate the sequence $X^n$ i.i.d. $\sim p^*(x)$, where $p^*(x)$ attains the maximum in Theorem 5.

Detection. Upon receiving $Y^k$, the detector declares that $\hat{m}$ is the phase estimate if it is the unique phase such that $(\phi(\hat{m}), Y^k) \in \mathcal{T}_\epsilon^{(k)}(X, Y)$; otherwise (if there is none or more than one) it declares an error.

Analysis of the probability of error. Without loss of generality, we assume the phase $M = 1$. The detector makes an error only if one or more of the following events occurs:

$$\begin{aligned} \mathcal{E}_1 &= \left\{ (\phi(1), Y^k) \notin \mathcal{T}_\epsilon^{(k)}(X, Y) \right\}, \\ \mathcal{E}_2 &= \left\{ (\phi(m), Y^k) \in \mathcal{T}_\epsilon^{(k)}(X, Y) \text{ for some } m \neq 1 \right\}. \end{aligned}$$

By the law of large numbers, $\mathrm{P}(\mathcal{E}_1)$ tends to zero as $k \to \infty$. For the second term, we have

$$\begin{aligned} \mathrm{P}(\mathcal{E}_2) &\le \left( \sum_{m=2}^{k} + \sum_{m=n-k+2}^{n} \right) \mathrm{P}\left\{ (\phi(m), Y^k) \in \mathcal{T}_\epsilon^{(k)}(X, Y) \right\} + \sum_{m=k+1}^{n-k+1} \mathrm{P}\left\{ (\phi(m), Y^k) \in \mathcal{T}_\epsilon^{(k)}(X, Y) \right\} \\ &\stackrel{(a)}{\le} 2(k-1) \, 2^{-k\gamma(\epsilon)} + (2^{kR} - 2k + 1) \, 2^{-k(I(X;Y) - \delta(\epsilon))}, \end{aligned}$$

which tends to zero as $k \to \infty$ if $R < I(X; Y) - \delta(\epsilon)$. Here the first and the second terms in $(a)$ follow from (14) and (15), respectively. Letting $\epsilon \to 0$ completes the proof. ∎
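The achievability argument, specialized to minimum-distance detection over a bit-flipping channel, can be simulated in a few lines. The Python sketch below (ours; all parameters are illustrative) draws an i.i.d. sequence, excises a window at a hidden phase, flips two bits, and detects the phase:

```python
import random

def detect_phase(x, y):
    """Minimum-distance phase detector: return the phase m whose cyclic
    window phi(m) is closest in Hamming distance to the observation y."""
    n, k = len(x), len(y)

    def dist(m):
        return sum(x[(m + i) % n] != y[i] for i in range(k))

    return min(range(n), key=dist)

random.seed(0)
n, k = 256, 40                           # k well above log2(n) = 8
x = [random.randint(0, 1) for _ in range(n)]
m_true = 57                              # hidden phase
y = [x[(m_true + i) % n] for i in range(k)]
y[3] ^= 1
y[17] ^= 1                               # two bit flips of noise
m_hat = detect_phase(x, y)
```

As in the proof, since $k$ is well above $\log_2 n$, the windows of a random sequence are far apart in Hamming distance with overwhelming probability, so the detector recovers the hidden phase despite the noise.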

###### Example 1.

Consider the case of GPS signaling. For GPS, the binary (BPSK) symbol duration is about 1 μsec, and the length of the underlying Gold code sequence is $n = 1023$. Consider a typical observation time of 1 second, which corresponds to a repetition factor of about 1000 and $k \approx 10^6$ binary observations. A correlator receiver can thus substantially increase the SNR by coherently integrating over this sequence (assuming symbol timing has been recovered). Due to the good autocorrelation structure of the Gold code, the resulting SNR is typically sufficient in order to distinguish the correct phase (out of the 1023 possibilities, and typically also over several Doppler hypotheses), with a small enough error probability. Namely, one can operate at a low SNR, and provide positioning with uncertainty of about 1 msec; multiplied by the speed of light, this yields a positioning modulo roughly 300 km, which is sufficient as it is of the same order of the distance to the satellites.

Let us now show that one can significantly improve sensitivity using a more general phase detection sequence. Using the same observation period of 1 second, let us assume a much lower SNR. Using the Gaussian capacity formula and Theorem 5, we have that

$$\frac{\log_2 n}{k} \approx \frac{1}{2} \log_2(1 + \mathrm{SNR})$$

can be asymptotically achieved. Using our $k$ and solving for $n$, we get the largest $n$ that can be supported. Since this large $n$ is also (much) larger than $2^k$ would require, we can in principle design a phase detection sequence with roughly these parameters that attains a low error probability. This will reliably find our distance to the satellite with a huge uncertainty range, an overkill, but saves in SNR relative to the competing GPS solution operating with the same observation time. To make the comparison more precise, one should look more carefully at many important details such as the exact error probability performance, the effect of multiple Doppler hypotheses, complexity of detection, and accounting for multiple satellites. Most of these issues are beyond the scope of this paper. In the next subsection and in Section VI we discuss the issues of complexity and multiple sequences.
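The back-of-the-envelope calculation can be reproduced as follows (a Python sketch of ours; the SNR of $-30$ dB and $k = 10^6$ are illustrative assumptions, not measurements from the text):

```python
from math import log2

def supportable_log2_n(k, snr):
    """Largest log2(n) supported at window length k, combining Theorem 5
    with the Gaussian capacity formula R = (1/2) log2(1 + SNR)."""
    return k * 0.5 * log2(1 + snr)

snr = 10 ** (-30 / 10)                     # -30 dB, an assumed operating point
bits = supportable_log2_n(k=1_000_000, snr=snr)
```

Even at this very low SNR, roughly 700 bits of phase information are supportable, i.e., $n$ can be astronomically larger than the $k$ observed symbols.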

### III-B A Low-Complexity Construction

Now we present a sequence construction with low-complexity detection that achieves the capacity asymptotically. The construction consists of three main ingredients:

1. a de Bruijn sequence with an efficient decoding algorithm ,

2. a capacity-achieving low-complexity code, e.g., a polar code, that protects the de Bruijn sequence against noise, and

3. an i.i.d. synchronization sequence, which is known at the detector a priori, that allows the detector to find the block boundary.

The details are as follows.

Phase detection sequence design. We design a de Bruijn sequence according to the method in the reference above. To encode it into a phase detection sequence, we chop the de Bruijn sequence into chunks, each of which is encoded into a length-$t$ codeword using a channel code, where $t$ and the number of blocks per group $l$ are integers. Then, a synchronization sequence $B^{3\tau}$ is generated i.i.d., where the parameter $\tau$ is a linear function of $t$; below we use $\tau = t$, but the general form will prove useful later in Section V-B. This sequence is inserted every $l$ blocks. The middle chunk $B_{\tau+1}^{2\tau}$ of the synchronization sequence is given to the detector. The chunks $B_1^{\tau}$ and $B_{2\tau+1}^{3\tau}$ play the role of “guarding bits” between codewords and the middle chunk $B_{\tau+1}^{2\tau}$. Their purpose is to simplify the analysis of the error probability event associated with the synchronization detection (later denoted $\mathcal{E}_1$), as will become clear in the sequel. This is illustrated in Figure 1.

Detection. We choose the length of the detection window to be

 k=lt+3τ+max{t,τ}. (16)

The extra max{t, τ} symbols are the margin to ensure there are l complete channel code blocks and a complete synchronization sequence in the received sequence. Upon receiving Y_1^k, the detector first finds an index ŵ1 such that the known middle chunk B_{τ+1}^{2τ} and Y_{ŵ1+1}^{ŵ1+τ} are jointly typical. If there is more than one such index, it chooses the smallest. It declares an error if there is none. This determines the block boundary of the channel code blocks, i.e., a complete block starts right after the detected synchronization sequence. By design, there are at least l complete channel code blocks in Y_1^k (the dashed-line parts in Figure 1). The detector then applies the channel decoder to recover l blocks of messages. This corresponds to ls contiguous bits in the de Bruijn sequence (the dashed-line parts in the sequence of Figure 1), whose location ŵ2 is uniquely determined via the efficient de Bruijn decoder. The phase estimate is then declared as

 m̂ = ŵ2·t + ⌈ŵ2/l⌉·3τ + 1 − ((ŵ1 − τ) mod t).
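The synchronization search above can be sketched with a hard-decision Hamming-distance test as a stand-in for the joint typicality test (the function name and the threshold are ours):

```python
def find_sync(received, sync_mid, max_dist):
    """Slide the known middle sync chunk over the received window and
    return the smallest index whose Hamming distance is at most
    max_dist; return None (declare an error) if there is no such index."""
    t = len(sync_mid)
    for w1 in range(len(received) - t + 1):
        window = received[w1:w1 + t]
        if sum(a != b for a, b in zip(window, sync_mid)) <= max_dist:
            return w1
    return None
```

As in the scheme, ties are broken by taking the smallest index, and failure to find any match is declared as an error.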

Analysis of the probability of error. For clarity of notation, we set τ = t in the following analysis. Similar analysis can be done for other linear functions of t. Let W1 be the actual index of the noisy version of the middle chunk B_{t+1}^{2t} in Y_1^k. The detector makes an error only if at least one of the following events occurs:

 E1 = {W1 ≠ Ŵ1},  E2 = {an error in channel decoding}.

Given Ŵ1 = W1, the de Bruijn decoder can figure out the phase of the decoded bits with zero error. Since we are using a good channel code, we have P(E2) → 0 as t → ∞. To bound P(E1), assume for convenience and without loss of generality that W1 = t − 1. We have

 P(E1) = P{(B_{t+1}^{2t}, Y_{w1+1}^{w1+t}) ∈ T_ε^{(t)} for some w1 ≠ t−1}
       ≤ Σ_{w1=0}^{t−2} P{(B_{t+1}^{2t}, Y_{w1+1}^{w1+t}) ∈ T_ε^{(t)}}
         + Σ_{w1=t}^{2t−2} P{(B_{t+1}^{2t}, Y_{w1+1}^{w1+t}) ∈ T_ε^{(t)}}
         + Σ_{w1=2t−1}^{k−t} P{(B_{t+1}^{2t}, Y_{w1+1}^{w1+t}) ∈ T_ε^{(t)}}
   (a)≤ (t−1)·2^{−tγ(ε)} + (t−1)·2^{−tγ(ε)} + ((l+1)t+2)·2^{−t(I(X;Y)−δ(ε))},

which, for fixed l, tends to zero as t → ∞. Here, the first term in (a) follows from Lemma 2 and the fact that B_{t+1}^{2t} and its preceding guarding block B_1^t are i.i.d.. The second term follows since B_{t+1}^{2t} and its succeeding guarding block B_{2t+1}^{3t} are i.i.d.. The third term follows by virtue of the packing lemma [42, Lemma 3.1], since any length-t chunk from two channel code blocks is independent of B_{t+1}^{2t}. Note the role of the “guarding bits” here is to make sure Y_{w1+1}^{w1+t} never overlaps with both B_{t+1}^{2t} and a codeword, as we cannot generally assume too much about the statistics of a specific codeword. Therefore, the probability of error averaged over all possible realizations of the synchronization sequence tends to zero as t → ∞. It follows that a good deterministic sequence exists (in fact, most choices are good).

Rate. By design, the rate of the sequence is

 R = (log n)/k = log[2^{Rcode·lt}·(lt+3τ)/(ls)] / (lt + 3τ + max{t, τ}) (17)
  (a)= log[2^{Rcode·lt}·(l+3)t/(ls)] / ((l+4)t)
     = Rcode(1 − 4/(l+4)) + (Rcode/(s(l+4)))·log[(1/Rcode)(1 + 3/l)],

which, for fixed l and Rcode, tends to Rcode(1 − 4/(l+4)) as t → ∞. Choosing a large l and a capacity achieving code for the underlying channel ensures that the rate of the phase detection sequence can be as close to the capacity as desired. Note that in step (a), we set τ = t. But one can verify that the rate approaches capacity for other choices of τ = at + b.
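For a numerical illustration of this convergence, the closed-form rate in (17) for τ = t can be evaluated directly (the helper name is ours, and the expression below assumes the τ = t form displayed in (17)):

```python
from math import log2

def rate(R_code, l, s):
    """Rate of the tau = t construction as in (17): the leading term
    R_code*(1 - 4/(l+4)) dominates as the number of blocks l grows."""
    return (R_code * (1 - 4 / (l + 4))
            + (R_code / (s * (l + 4))) * log2((1 / R_code) * (1 + 3 / l)))

# gap to R_code = 1/2 shrinks as l grows (here s = 2, i.e., t = s/R_code = 4)
gaps = [0.5 - rate(0.5, l, s=2) for l in (4, 16, 64, 256)]
```

Evaluating the gaps confirms they are positive and strictly decreasing in l, in line with the double limit in Remark 7 below.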

Complexity. Finding the block boundary requires at most k joint typicality tests, each over a window of length τ. Recalling that k is linear in t and using the efficient decoding method for de Bruijn sequences, decoding the de Bruijn sequence has complexity polynomial in t. There exist capacity achieving channel codes with low decoding complexity, e.g., polar codes with O(t log t) decoding per length-t block. Therefore, the overall detection complexity is polynomial in k.

###### Remark 7.

For future reference, we refer to the above construction as an (Rch, l, t, τ) point-to-point phase detection sequence. Once these four parameters are given, both k and R can be expressed as in (16) and (17). As shown above, an (Rch, l, t, τ) point-to-point phase detection sequence has low detection complexity. Moreover, for τ = at + b with some constants a and b, the achievable rate of the sequence satisfies

 lim_{l→∞} lim_{t→∞} R(Rch, l, t, τ) = Rch.

This construction will also prove useful in Section V-B.

###### Remark 8.

It appears plausible that the synchronization sequence could be discarded, and that the codeword boundary could be determined as part of the detection process. This coding scheme, in a sense, shows the equivalence between error-correcting codes and phase detection schemes for the probabilistic setting.

###### Remark 9.

Our analysis for the point-to-point phase detection problem in the probabilistic noise model assumed a uniformly distributed phase, which in channel coding terms corresponds to an average error probability criterion. In channel coding, the capacity under the more stringent maximal error probability criterion remains the same; this is easily shown by throwing away the worse half of a good average error probability codebook. In the sequence phase detection problem, however, it is not immediately clear whether the capacity remains the same, as throwing away bad codewords can significantly shorten the sequence. Nevertheless, using our specific construction above with a maximal error capacity achieving channel code (which may increase the detection complexity), we can show that the resulting phase detection sequence is capacity achieving under the maximal error probability criterion.

## Iv Point-to-Point: Probabilistic Noise, Zero Error

In this section, we consider zero error phase detection. Let α(G) denote the independence number of a graph G, i.e., the cardinality of a maximum independent set of G. Then, the Shannon capacity of the graph G can be defined as

 C(G) ≜ sup_k (log α(G^k))/k = lim_{k→∞} (log α(G^k))/k,

where G^k is the k-fold strong product of G (see definition in Section I-A). It is well known that C(G) is the zero error capacity of any channel with confusion graph G. An explicit expression for C(G) is unknown. Nevertheless, the following theorem shows that C(G) is also the fundamental limit in the zero error phase detection setting.
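For intuition, the independence number of a strong power can be checked by brute force on small graphs. For the pentagon C5 one finds α(C5) = 2 but α(C5^2) = 5, so C(C5) ≥ (1/2) log 5; Lovász famously showed this bound is tight. A small sketch (the function name is ours):

```python
from itertools import combinations, product

def alpha_strong_power(adj, k):
    """Brute-force the independence number of the k-fold strong product
    of a graph given as a list of neighbor sets over vertices 0..n-1."""
    n = len(adj)
    verts = list(product(range(n), repeat=k))

    def adjacent(u, v):
        # strong-product adjacency: distinct vertices whose coordinates
        # are componentwise equal or adjacent in the base graph
        return u != v and all(a == b or b in adj[a] for a, b in zip(u, v))

    best = 1
    for size in range(2, len(verts) + 1):
        if not any(
            all(not adjacent(u, v) for u, v in combinations(S, 2))
            for S in combinations(verts, size)
        ):
            return best
        best = size
    return best
```

This exhaustive search is only feasible for very small graphs and powers, which reflects the fact that no explicit expression for C(G) is known in general.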

###### Theorem 6.

The zero error capacity for phase detection in a channel with confusion graph G coincides with the Shannon capacity of this graph, i.e.,

 Cze(G)=C(G).
###### Proof:

Again, the induced codebook of every phase detection scheme is also a good channel code for the same confusion graph, and thus Cze(G) ≤ C(G). For the other direction, we show that every channel code of rate R can be used to construct a phase detection scheme with the same rate in the asymptotic limit.

To this end, we first note th