
# Streaming Data Transmission in the Moderate Deviations and Central Limit Regimes

Si-Hyeon Lee, Vincent Y. F. Tan, and Ashish Khisti S.-H. Lee and A. Khisti are with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada (e-mail: sihyeon.lee@utoronto.ca; akhisti@comm.utoronto.ca). V. Y. F. Tan is with the Department of Electrical and Computer Engineering and the Department of Mathematics, National University of Singapore, Singapore (e-mail: vtan@nus.edu.sg). The work of V. Y. F. Tan is supported in part by a Singapore Ministry of Education (MOE) Tier 2 grant (R-263-000-B61-112).
###### Abstract

We consider streaming data transmission over a discrete memoryless channel. A new message is given to the encoder at the beginning of each block and the decoder decodes each message sequentially, after a delay of $T$ blocks. In this streaming setup, we study the fundamental interplay between the rate and error probability in the central limit and moderate deviations regimes and show that i) in the moderate deviations regime, the moderate deviations constant improves over the block coding or non-streaming setup by a factor of $T$, and ii) in the central limit regime, the second-order coding rate improves by a factor of approximately $\sqrt{T}$ for a wide range of channel parameters. For both regimes, we propose coding techniques that incorporate a joint encoding of fresh and previous messages. In particular, for the central limit regime, we propose a coding technique with truncated memory to ensure that a summation of constants, which arises as a result of applications of the central limit theorem, does not diverge in the error analysis.

Furthermore, we explore interesting variants of the basic streaming setup in the moderate deviations regime. We first consider a scenario with an erasure option at the decoder and show that the exponents of both the total error and the undetected error probabilities improve by factors of $T$. Next, by utilizing the erasure option, we show that the exponent of the total error probability can be improved to that of the undetected error probability (in the order sense) at the expense of a variable decoding delay. Finally, we also extend our results to the case where the message rate is not fixed but alternates between two values.

## I Introduction

In many multimedia applications, a stream of data packets is required to be sequentially encoded and decoded under strict latency constraints. For such a streaming setup, both the fundamental limits and the optimal schemes can differ from those of classical communication systems. In recent years, there has been a growing interest in the characterization of fundamental limits for streaming data transmission [1, 2, 3, 4, 5, 6]. In [1, 2, 3], coding techniques based on tree codes were proposed for a streaming setup with applications to control systems. In [4], Khisti and Draper established the optimal diversity-multiplexing tradeoff (DMT) for streaming over a block-fading multiple-input multiple-output channel. In [5], the same authors proposed a coding technique using finite memory for streaming over discrete memoryless channels (DMCs) that attains the same reliability as previously known semi-infinite coding techniques with growing memory. In [6], the error exponent was studied in a streaming setup of distributed source coding. We note that these prior works assumed that the code operates in the large deviations regime, in which the rate is bounded away from capacity (or the rate pair is strictly inside the optimal rate region for compression problems) and the error probability decays exponentially as the blocklength increases.

Other interesting asymptotic regimes include the central limit and moderate deviations regimes. Let $n$ denote the blocklength of a single message henceforth. In the central limit regime, the rate approaches the capacity at a speed proportional to $1/\sqrt{n}$ and the error probability does not vanish as the blocklength increases. In the moderate deviations regime, the rate approaches the capacity strictly slower than $1/\sqrt{n}$ and the error probability decays sub-exponentially fast as the blocklength increases. For block coding problems, both regimes have received a fair amount of attention recently. These works aim to characterize the fundamental interplay between the coding rate and error probability. The most notable early work on channel coding in the central limit regime (also known as second-order asymptotics or the normal approximation regime) is that of Strassen [7], who considered DMCs and showed that the backoff from capacity scales as $1/\sqrt{n}$ when the error probability is fixed. Strassen also deduced the constant of proportionality, which is related to the so-called dispersion [8]. Hayashi [9] considered DMCs with cost constraints as well as discrete channels with Markovian memory. Polyanskiy et al. [8] refined the asymptotic expansions and also compared the normal approximation to the finite blocklength (non-asymptotic) fundamental limits. For a review and extensions to multi-terminal models, the reader is referred to [10]. For the moderate deviations regime, He et al. [11] considered fixed-to-variable length source coding with decoder side information. Altuğ and Wagner [12] initiated the study of moderate deviations for channel coding, specifically DMCs. Polyanskiy and Verdú [13] relaxed some assumptions in the conference version of Altuğ and Wagner’s work [14] and they also considered moderate deviations for additive white Gaussian noise (AWGN) channels. However, this line of research has not been extensively studied for the streaming setup.
To the best of our knowledge, there has been no prior work on the streaming setup in the moderate deviations and central limit regimes, with the exception of [15], where the focus is on source coding.

In this paper, we study streaming data transmission over a DMC in the moderate deviations and central limit regimes. Our streaming setup is illustrated in Fig. 1. In each block of length $n$, a new message is given to the encoder at the beginning, and the encoder generates a codeword as a function of all the past and current messages and transmits it over the channel. The decoder, given all the past received channel output sequences, decodes each message after a delay of $T$ blocks. This streaming setup introduces a new dimension not present in the block coding problems studied previously. In the special case of $T = 1$, the setup reduces to the block channel coding problem. If $T > 1$, however, there exists an inherent tension in whether we utilize a block only for the fresh message or use it also for the previous messages with earlier deadlines. It is not difficult to see that, due to the memoryless nature of the model, a time sharing scheme111In a time sharing scheme, some fraction of a block is used for a fresh message and some other fraction of the block is used for previous messages. will not provide any gain compared to the case of $T = 1$. A natural question is whether a joint encoding of fresh and previous messages would improve the performance when $T > 1$.

Our results indicate that the fundamental interplay between the rate and error probability can be greatly improved when delay is allowed in the streaming setup. In the moderate deviations regime, the moderate deviations constant is shown to improve over the block coding or non-streaming setup by a factor of $T$. In the central limit regime, the second-order coding rate is shown to improve by a factor of approximately $\sqrt{T}$ for a wide range of channel parameters. For both asymptotic regimes, we propose coding techniques that incorporate a joint encoding of fresh and previous messages. For the moderate deviations regime, we propose a coding technique in which, for every block, the encoder jointly encodes all the previous and fresh messages and the decoder re-decodes all the previous messages in addition to the current target message. For the error analysis of this coding technique, we develop a refined and non-asymptotic version of the moderate deviations upper bound in [16, Theorem 3.7.1] that allows us to uniformly bound the error probabilities associated with the previous messages. On the other hand, for the central limit regime, we cannot apply such a coding technique whose memory is linear in the block index. In the error analysis in the central limit regime, we encounter a summation of constants as a result of applications of the central limit theorem. If the memory is linear in the block index, this summation causes the upper bound on the error probability to diverge as the block index tends to infinity. Hence, for the central limit regime, we propose a coding technique with truncated memory where the memory at the encoder varies in a periodic fashion. Our proposed construction judiciously balances the rate penalty imposed due to the truncation and the growth in the error probability due to the contribution from previous messages.
By analyzing the second-order coding rate of our proposed setup, we conclude that the channel dispersion parameter also decreases approximately by a factor of $T$ for a wide range of channel parameters.

Furthermore, we explore interesting variants of the basic streaming setup in the moderate deviations regime. First, we consider a scenario where there is an erasure option at the decoder and analyze the undetected error and the total error probabilities, extending a result by Hayashi and Tan [17]. Next, by utilizing the erasure option, we analyze the rate of decay of the error probability when a variable decoding delay is allowed. We show that such flexibility in the decoding delay can dramatically improve the error probability in the streaming setup. This result extends the classical results on variable-length decoding (see, e.g., [18]) to the streaming setup. Finally, as a simple example for the case where the message rates are not constant, we consider a scenario where the rate of the messages in odd block indices and the rate of the messages in even block indices are different, and analyze the moderate deviations constants separately for the two types of messages. This setting finds applications in video and audio coding, where streams of data packets do not necessarily have a constant rate.

The rest of this paper is organized as follows. In Section II, we formally state our streaming setup. The main theorems are presented in Section III and proved in Section IV. In Section V, the moderate deviations result for the basic streaming setup is extended in various directions. We conclude this paper in Section VI.

### I-A Notation

The following notation is used throughout the paper. We reserve bold font for vectors whose lengths are the same as the blocklength $n$. For two integers $i$ and $j$, $[i:j]$ denotes the set $\{i, i+1, \ldots, j\}$. For constants $x_1, \ldots, x_j$, $x_{[i:j]}$ denotes the vector $(x_i, \ldots, x_j)$ and $x^j$ denotes $x_{[i:j]}$ where the subscript $i$ is omitted when $i = 1$, i.e., $x^j = x_{[1:j]}$. This notation is naturally extended for vectors $\mathbf{x}$, random variables $X$, and random vectors $\mathbf{X}$. $\mathbb{1}[A]$ for an event $A$ denotes the indicator function, i.e., it is 1 if $A$ is true and 0 otherwise. $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ denote the ceiling and floor functions, respectively.

For a DMC $W$ and an input distribution $P$, we use the following standard notation and terminology in information theory:

• Information density:

 $i(x;y) := \log \frac{W(y|x)}{PW(y)}$ (1)

where $PW(y) := \sum_{x \in \mathcal{X}} P(x) W(y|x)$ denotes the output distribution. We note that $i(x;y)$ depends on $P$ and $W$, but this dependence is suppressed. The definition (1) can be generalized for two vectors $x^l$ and $y^l$ of length $l$ as follows:

 $i(x^l; y^l) := \sum_{j=1}^{l} i(x_j; y_j)$ (2)
• Mutual information:

 $I(P,W) := E[i(X;Y)] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P(x) W(y|x) \log \frac{W(y|x)}{PW(y)}$ (3), (4)
• Unconditional information variance:

 $U(P,W) := \mathrm{Var}[i(X;Y)]$ (5)
• Conditional information variance:

 $V(P,W) := E[\mathrm{Var}[i(X;Y) \mid X]]$ (6)
• Capacity:

 $C = C(W) := \max_{P \in \mathcal{P}} I(P,W)$ (7)

where $\mathcal{P}$ denotes the probability simplex on $\mathcal{X}$.

• Set of capacity-achieving input distributions:

 $\Pi = \Pi(W) := \{ P \in \mathcal{P} : I(P,W) = C(W) \}$ (8)
• Channel dispersion:

 $V = V(W) := \min_{P \in \Pi} V(P,W) \overset{(a)}{=} \min_{P \in \Pi} U(P,W)$ (9), (10)

where $(a)$ is from [8, Lemma 62], where it is shown that $U(P,W) = V(P,W)$ for all $P \in \Pi$.
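As a concrete illustration of the quantities defined in (1)-(10), the sketch below (ours, not from the paper; the function name is an assumption) computes the information density, capacity, and dispersion of a binary symmetric channel, for which the uniform input distribution is capacity-achieving:

```python
import math

def bsc_stats(p):
    """Capacity C(W) and dispersion V(W) (in nats) of a binary symmetric
    channel with crossover probability p, evaluated at the uniform input,
    which is capacity-achieving for this channel."""
    P = [0.5, 0.5]                       # capacity-achieving input distribution
    W = [[1 - p, p], [p, 1 - p]]         # channel transition matrix W(y|x)
    PW = [sum(P[x] * W[x][y] for x in range(2)) for y in range(2)]  # output dist.

    def i_dens(x, y):
        # information density i(x;y) = log W(y|x)/PW(y), Eq. (1)
        return math.log(W[x][y] / PW[y])

    # C = E[i(X;Y)], Eq. (3); V = Var[i(X;Y)], which equals U(P,W)
    # at capacity-achieving inputs by [8, Lemma 62]
    C = sum(P[x] * W[x][y] * i_dens(x, y) for x in range(2) for y in range(2))
    V = sum(P[x] * W[x][y] * (i_dens(x, y) - C) ** 2
            for x in range(2) for y in range(2))
    return C, V

C, V = bsc_stats(0.11)
print(C, V)
```

For the BSC the computed $C$ agrees with the analytic value $\log 2 - H(p)$ in nats, and $V$ with $p(1-p)\log^2\frac{1-p}{p}$.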

## II Model

Consider a DMC $W$. A streaming code is defined as follows:

###### Definition 1 (Streaming code).

An $(n, M_n, T, \epsilon)$-streaming code consists of

• a sequence of messages $\{G_k\}_{k \ge 1}$, each distributed uniformly over $[1:M_n]$,

• a sequence of encoding functions $f_k$ that map the message sequence $G^k$ to the channel input codeword $\mathbf{X}_k \in \mathcal{X}^n$, and

• a sequence of decoding functions $\phi_k$ that map the channel output sequences $\mathbf{Y}^{k+T-1}$ to a message estimate $\hat{G}_k$,

that satisfies

 $\limsup_{N \to \infty} \frac{\sum_{k=1}^{N} \Pr(\hat{G}_k \neq G_k)}{N} \le \epsilon$ (11)

i.e., the probability of error averaged over all block messages does not exceed $\epsilon$.

We note that a streaming code with a fixed blocklength $n$ consists of a sequence of encoding and decoding functions since a stream of messages is sequentially encoded and decoded. Fig. 1 illustrates our streaming setup. In the beginning of block $k$, a new message $G_k$ is given to the encoder. The encoder generates a codeword $\mathbf{X}_k$ as a function of all the past and current messages $G^k$ and transmits it over the channel in block $k$. Since the delay is $T$ blocks, the decoder decodes message $G_k$ at the end of block $k + T - 1$, as a function of all the past received channel output sequences $\mathbf{Y}^{k+T-1}$.
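The timing in Definition 1 can be summarized in a few lines; the sketch below (illustrative only, the function name is ours) lists, for a delay of $T$ blocks, the block at whose end each message is decoded and the channel outputs then available:

```python
def schedule(num_messages, T):
    """Decoding schedule of the streaming setup: message G_k arrives at the
    start of block k and is decoded at the end of block k + T - 1, using all
    received output blocks Y_1, ..., Y_{k+T-1}."""
    out = {}
    for k in range(1, num_messages + 1):
        decode_block = k + T - 1          # decode G_k after a delay of T blocks
        out[k] = {
            "decoded_at_end_of_block": decode_block,
            "outputs_used": list(range(1, decode_block + 1)),
        }
    return out

sched = schedule(num_messages=3, T=3)
print(sched[1])   # G_1 is decoded at the end of block 3, from Y_1, Y_2, Y_3
```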

## III Main Results

In this section, we state our main results. The following two theorems present achievability bounds for the moderate deviations and the central limit regimes, respectively, which are proved in Section IV.

###### Theorem 1 (Moderate deviations regime).

Consider a DMC with $V > 0$ and any sequence of integers $M_n$ such that $\log M_n = n(C - \rho_n)$, where $\rho_n \to 0$ and $n \rho_n^2 \to \infty$.222Throughout the paper, we ignore integer constraints on the number of codewords $M_n$. Then, there exists a sequence of $(n, M_n, T, \epsilon_n)$-streaming codes such that333If $\epsilon_n \to 0$, the right-hand side of (12) corresponds to an upper bound on the moderate deviations constant. In the special case of $T = 1$, the moderate deviations constant is shown to be governed by the channel dispersion in [12, 13].

 $\limsup_{n \to \infty} \frac{1}{n \rho_n^2} \log \epsilon_n \le -\frac{T}{2V}$ (12)
###### Theorem 2 (Central limit regime).

Consider a DMC with $V > 0$. For any $L > 0$ and $\delta > 0$, there exists a sequence of $(n, M_n, T, \epsilon_n)$-streaming codes such that444$L$ is termed the second-order coding rate in this paper. This is slightly different from what is common in the literature, where $-L$ instead is known as the second-order coding rate [9].

 $\log M_n = nC - L\sqrt{n} + O(n^{\delta} \log n)$ (13)

and

 $\epsilon_n \le \sum_{j=T}^{\infty} Q\left( \frac{\sqrt{j}}{\sqrt{V}} L \right) + O(n^{-\delta/2})$ (14)

The following corollary, whose proof is in Appendix A, elucidates a closed-form and interpretable expression for the upper bound on the error probability in (14).

###### Corollary 3.

Consider a DMC with $V > 0$. For any $L > 0$, there exists a sequence of $(n, M_n, T, \epsilon_n)$-streaming codes such that

 $\lim_{n \to \infty} \frac{nC - \log M_n}{\sqrt{n}} = L$ (15)

and

 $\limsup_{n \to \infty} \epsilon_n \le c_{L,V,T} \, Q\left( \sqrt{\frac{T}{V}} L \right)$ (16)

where $c_{L,V,T}$, defined in the following, has the property that for every $T$, it tends to 1 as $L/\sqrt{V}$ tends to infinity:

 $c_{L,V,T} := \frac{1 + (L/\sqrt{V})^2 T}{(L/\sqrt{V})^2 T} \cdot \frac{1}{1 - \exp\{ -(L/\sqrt{V})^2 / 2 \}}$ (17)

Fig. 2 illustrates how fast the constant $c_{L,V,T}$ in Corollary 3 converges to 1 as $L/\sqrt{V}$ increases; the constant drops below 1.1 and then below 1.05 once $L/\sqrt{V}$ is moderately large. Hence, the effect of the constant is not significant for a wide range of $L/\sqrt{V}$ and $T$.
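This convergence can be checked numerically. The sketch below (ours, using the expression in (17)) evaluates $c_{L,V,T}$ and the resulting upper bound in (16) for a few parameter choices:

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def c_constant(L, V, T):
    """The constant c_{L,V,T} of Corollary 3, Eq. (17)."""
    a = (L / math.sqrt(V)) ** 2              # (L / sqrt(V))^2
    return (1 + a * T) / (a * T) / (1 - math.exp(-a / 2))

# c_{L,V,T} > 1 always, and it approaches 1 as L/sqrt(V) grows
for L in (1.0, 2.0, 4.0):
    c = c_constant(L, V=1.0, T=10)
    bound = c * Q(math.sqrt(10 / 1.0) * L)   # right-hand side of (16)
    print(L, c, bound)
```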

Theorems 1 and 2 illustrate that the fundamental interplay between the rate and probability of error can be greatly improved when delay is allowed in the streaming setup. In the moderate deviations regime, the moderate deviations constant improves by a factor of $T$. Assuming that the tail sum in (14) can be approximated sufficiently well by its first term $Q(\sqrt{T/V}\,L)$, for the central limit regime, the second-order coding rate is improved (reduced) by a factor of $\sqrt{T}$. Another way to view this is via the lens of the channel dispersion $V$; this parameter is approximately reduced by a factor of $T$.

## IV Proofs of the Main Theorems

### IV-A Proof of Theorem 1 for the moderate deviations regime

Consider a DMC with $V > 0$ and any sequence of integers $M_n$ such that $\log M_n = n(C - \rho_n)$, where $\rho_n \to 0$ and $n \rho_n^2 \to \infty$. We denote by $P^*$ an input distribution that achieves the dispersion (9).

#### IV-A1 Encoding

For each $k \ge 1$ and $g^k \in [1:M_n]^k$, generate $\mathbf{x}_k(g^k)$ in an i.i.d. manner according to $P^*$. The generated codewords constitute the codebook $\mathcal{C}_n$. In block $k$, after observing the true message sequence $G^k$, the encoder sends $\mathbf{x}_k(G^k)$.

#### IV-A2 Decoding

Consider the decoding of $G_k$ at the end of block $T_k := k + T - 1$. In our scheme, the decoder not only decodes $G_k$, but also re-decodes $G^{k-1}$ at the end of block $T_k$.555We note that $G_j$ for $j < k$ has already been decoded at the end of block $T_j$. Nevertheless, the decoder re-decodes $G_j$ at the end of $T_k$, because the decoder needs to decode $G^{k-1}$ to decode $G_k$, and the probability of error associated with $G_j$ becomes lower (in general) by utilizing recent channel output sequences. Let $\hat{G}_{T_k, j}$ denote the estimate of $G_j$ at the end of block $T_k$. The decoder decodes sequentially from $j = 1$ to $j = k$ as follows:

• Given $\hat{G}_{T_k, [1:j-1]}$, the decoder chooses $\hat{G}_{T_k, j}$ according to the following rule.666When $j = 1$, $\hat{G}_{T_k, [1:j-1]}$ is null. If there is a unique index $g_j$ that satisfies777We use the following notation for the set of codewords. For a message sequence $g^{T_k}$, we denote by $\mathbf{x}_{[j:T_k]}(g^{T_k})$ the set of codewords $(\mathbf{x}_j(g^j), \ldots, \mathbf{x}_{T_k}(g^{T_k}))$ obtained according to the encoding procedure.

 $i(\mathbf{x}_{[j:T_k]}(\hat{G}_{T_k,[1:j-1]}, g_{[j:T_k]}), \mathbf{y}_{[j:T_k]}) > (T_k - j + 1) \cdot \log M_n$ (18)

for some $g_{[j+1:T_k]}$, let $\hat{G}_{T_k, j} = g_j$.888We note that $i$ in (18) is defined in terms of $P^*$ and $W$. This dependence is suppressed henceforth. If there is none or more than one such index, $\hat{G}_{T_k, j}$ is set arbitrarily (an error is declared).

• If $j < k$, repeat the above procedure after increasing $j$ to $j + 1$. If $j = k$, the decoding procedure terminates and the decoder declares that the $k$-th message is $\hat{G}_{T_k, k}$.

#### IV-A3 Error analysis

We first consider the probability of error averaged over the random codebook $\mathcal{C}_n$. The error event for $G_k$ happens only if at least one of the following events occurs:

 $\mathcal{E}_{k,j} := \{ i(\mathbf{X}_{[j:T_k]}(G^{T_k}), \mathbf{Y}_{[j:T_k]}) \le (T_k - j + 1) \cdot \log M_n \}, \quad j \in [1:k]$ (19)
 $\tilde{\mathcal{E}}_{k,j} := \{ i(\mathbf{X}_{[j:T_k]}(G^{j-1}, g_{[j:T_k]}), \mathbf{Y}_{[j:T_k]}) > (T_k - j + 1) \cdot \log M_n \text{ for some } g_{[j:T_k]} \text{ such that } g_j \neq G_j \}, \quad j \in [1:k]$ (20), (21)

Now, we have

 $E_{\mathcal{C}_n}[\Pr(\hat{G}_k \neq G_k \mid \mathcal{C}_n)] \le \sum_{j=1}^{k} \left( \Pr(\mathcal{E}_{k,j}) + \Pr(\tilde{\mathcal{E}}_{k,j}) \right)$ (22)

For each $j \in [1:k]$, we have

 $\Pr(\mathcal{E}_{k,j}) + \Pr(\tilde{\mathcal{E}}_{k,j}) \le \Pr\left( \sum_{l=1}^{n(T_k - j + 1)} i(X_l; Y_l) \le (T_k - j + 1) \cdot \log M_n \right)$ (23)
 $\quad + M_n^{T_k - j + 1} \Pr\left( \sum_{l=1}^{n(T_k - j + 1)} i(X_l; \bar{Y}_l) > (T_k - j + 1) \log M_n \right)$ (24)
 $\overset{(a)}{=} E\left[ \exp\left\{ -\left[ \sum_{l=1}^{n(T_k - j + 1)} i(X_l; Y_l) - (T_k - j + 1) \log M_n \right]^+ \right\} \right]$ (25)
 $= E\left[ \exp\left\{ -\left[ \sum_{l=1}^{n(T_k - j + 1)} i(X_l; Y_l) - (T_k - j + 1) n (C - \rho_n) \right]^+ \right\} \right]$ (26)

where the $(X_l, Y_l, \bar{Y}_l)$'s are i.i.d. random variables each generated according to $P^*(x) W(y|x) P^*W(\bar{y})$, and $(a)$ is from the identity [8, Eq. (69)] used to derive the DT bound.

Now, fix an arbitrary $\lambda \in (0, 1)$. By applying the chain of inequalities [13, Eqs. (53)-(56)], we have

 $\exp\left\{ -\left[ \sum_{l=1}^{n(T_k - j + 1)} i(X_l; Y_l) - (T_k - j + 1) n (C - \rho_n) \right]^+ \right\}$ (27)
 $\le \mathbb{1}\left\{ \sum_{l=1}^{n(T_k - j + 1)} i(X_l; Y_l) \le (T_k - j + 1) n (C - \lambda \rho_n) \right\} + \exp\{ -(T_k - j + 1) n (1 - \lambda) \rho_n \}$ (28)

Combining the bounds in (26) and (28), we obtain

 $\Pr(\mathcal{E}_{k,j}) + \Pr(\tilde{\mathcal{E}}_{k,j})$ (29)
 $\le \Pr\left( \sum_{l=1}^{n(T_k - j + 1)} i(X_l; Y_l) \le (T_k - j + 1) n (C - \lambda \rho_n) \right) + \exp\{ -(T_k - j + 1) n (1 - \lambda) \rho_n \}$ (30)
 $\overset{(a)}{\le} \exp\left\{ -(T_k - j + 1) n \left( \frac{\lambda^2 \rho_n^2}{2V} - \lambda^3 \rho_n^3 \tau \right) \right\} + \exp\{ -(T_k - j + 1) n (1 - \lambda) \rho_n \}$ (31)

for sufficiently large $n$, where $\tau$ is some non-negative constant dependent only on the input distribution and channel statistics, and $(a)$ is from the moderate deviations upper bound in Lemma 4, which is relegated to the end of this subsection. Also see Remark 1.

Now, we have

 $E_{\mathcal{C}_n}[\Pr(\hat{G}_k \neq G_k \mid \mathcal{C}_n)]$ (32)
 $\le \sum_{j=1}^{k} \left( \exp\left\{ -(T_k - j + 1) n \rho_n^2 \lambda^2 \left( \frac{1}{2V} - \lambda \rho_n \tau \right) \right\} + \exp\{ -(T_k - j + 1) n (1 - \lambda) \rho_n \} \right)$ (33)
 $\le \sum_{j=T}^{T_k} \left( \exp\left\{ -j n \rho_n^2 \lambda^2 \left( \frac{1}{2V} - \lambda \rho_n \tau \right) \right\} + \exp\{ -j n (1 - \lambda) \rho_n \} \right)$ (34)
 $\le \frac{\exp\left\{ -T n \rho_n^2 \lambda^2 \left( \frac{1}{2V} - \lambda \rho_n \tau \right) \right\}}{1 - \exp\left\{ -n \rho_n^2 \lambda^2 \left( \frac{1}{2V} - \lambda \rho_n \tau \right) \right\}} + \frac{\exp\{ -T n (1 - \lambda) \rho_n \}}{1 - \exp\{ -n (1 - \lambda) \rho_n \}}$ (35)

for sufficiently large , which leads to

 $\limsup_{n \to \infty} \frac{1}{n \rho_n^2} \log E_{\mathcal{C}_n}\left[ \limsup_{N \to \infty} \frac{\sum_{k=1}^{N} \Pr(\hat{G}_k \neq G_k \mid \mathcal{C}_n)}{N} \right] \le -\frac{T \lambda^2}{2V}$ (36)

Finally, by taking $\lambda \uparrow 1$, we have

 $\limsup_{n \to \infty} \frac{1}{n \rho_n^2} \log E_{\mathcal{C}_n}\left[ \limsup_{N \to \infty} \frac{\sum_{k=1}^{N} \Pr(\hat{G}_k \neq G_k \mid \mathcal{C}_n)}{N} \right] \le -\frac{T}{2V}$ (37)

Hence, there must exist a sequence of codes that satisfies (12), which completes the proof.

The following lemma, used in the proof of Theorem 1, corresponds to a non-asymptotic upper bound of the moderate deviations theorem [16, Theorem 3.7.1]; its proof is in Appendix B.

###### Lemma 4.

Let $Z_1, Z_2, \ldots$ be a sequence of i.i.d. random variables such that $E[Z_1] = 0$ and $\mathrm{Var}[Z_1] = \sigma^2 > 0$, and whose cumulant generating function $h(s) := \log E[\exp\{s Z_1\}]$ is analytic around the origin and satisfies that $K := \max_{|s| \le s_0} |h'''(s)|$ is finite for some $s_0 > 0$. For a sequence $\varepsilon_n$ satisfying the moderate deviations constraints, i.e., $\varepsilon_n \to 0$ and $n \varepsilon_n^2 \to \infty$, the following bound holds:

 $\Pr\left( \frac{1}{n} \sum_{l=1}^{n} Z_l \ge \varepsilon_n \right) \le \exp\left\{ -n \left( \frac{\varepsilon_n^2}{2 \sigma^2} - \frac{\varepsilon_n^3}{6 \sigma^6} K \right) \right\}$ (38)

for sufficiently large .
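As a sanity check (ours, not from the paper): for i.i.d. Gaussian $Z_l \sim N(0, \sigma^2)$, the cumulant generating function is $h(s) = \sigma^2 s^2 / 2$, so $h''' \equiv 0$ and we may take $K = 0$; the bound (38) then reduces to $\exp\{-n \varepsilon_n^2 / 2\sigma^2\}$, which indeed dominates the exact tail $Q(\varepsilon_n \sqrt{n} / \sigma)$:

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def lemma4_bound(n, eps, sigma, K):
    """Right-hand side of the bound (38)."""
    return math.exp(-n * (eps ** 2 / (2 * sigma ** 2)
                          - eps ** 3 * K / (6 * sigma ** 6)))

sigma = 1.0
for n in (100, 1000, 10000):
    eps = n ** (-0.25)        # moderate deviations: eps -> 0 while n*eps^2 -> inf
    exact = Q(eps * math.sqrt(n) / sigma)       # exact tail for Gaussian Z_l
    assert exact <= lemma4_bound(n, eps, sigma, K=0.0)
    print(n, exact, lemma4_bound(n, eps, sigma, 0.0))
```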

###### Remark 1.

Let us comment on the assumption in Lemma 4 that $K$ is finite. In our application,

 $Z_l \equiv i(X_l; Y_l) - I(X_l; Y_l)$ (39)

Then, we have

 $h(s) = \log E\left[ \exp\left\{ s \left( \log \frac{W(Y_1|X_1)}{P_X W(Y_1)} - I(X_1; Y_1) \right) \right\} \right]$ (40)
 $= -s I(X_1; Y_1) + \log E\left[ \left( \frac{W(Y_1|X_1)}{P_X W(Y_1)} \right)^s \right]$ (41)

By differentiating thrice, we can show that $h'''(s)$ is continuous in $s$.999A detailed calculation follows similarly as in the proof of [12, Lemma 1]. Restricting $s$ to $[-s_0, s_0]$ means that $h'''$ is a continuous function over a compact set. Hence its maximum is attained and is necessarily finite.

### IV-B Proof of Theorem 2 for the central limit regime

Consider a DMC with $V > 0$. We remark that in the moderate deviations regime, for every block, the encoder maps all the previous messages to a codeword. For the central limit regime, we propose a coding strategy where the encoder maps only some recent messages to the codeword in each block. A similar idea of incorporating truncated memory was used in [5], with the focus on reducing the complexity; here, we use a different memory structure from [5]. Let $A$ and $B$ denote the maximum and the minimum numbers of messages that can possibly be mapped to a codeword in each block, respectively. We choose the size of the message alphabet as follows:

 $\log M_n = \frac{A - 2B + T + 2}{A} \left( nC - L\sqrt{n} \right)$ (42)

for appropriate choices of $A$ and $B$. To make the above choice of $M_n$ valid, we assume that $A$, $B$, and $T$ are such that the right-hand side of (42) is positive. Furthermore, we assume that the minimum encoding memory is at least the decoding delay, i.e., $B \ge T$. We denote by $P^*$ an input distribution that achieves the dispersion (9).

#### IV-B1 Encoding

Our encoder has a periodically time-varying memory with a period of $A - B + 1$ blocks, after an initialization step of the first $A$ blocks. Let us first describe our message-codeword mapping rule for the case of $A = 9$ and $B = 4$, which is illustrated in Fig. 3. For the first nine blocks, the encoder maps all the previous messages to a codeword. Since the maximum encoding memory is nine in this example, we truncate the messages that are mapped to a codeword on and after the tenth block, so that the encoding memory is periodically time-varying from four to nine with a period of six blocks. For instance, let us consider the first period from the tenth block to the fifteenth block. In the tenth block, the encoder maps the messages $G_{[7:10]}$ to a codeword, thus ensuring that the encoding memory is four. In block $9 + j$ for $j \in [2:6]$, the encoder maps the messages $G_{[7:9+j]}$ to a codeword, and hence the encoding memory becomes the maximum memory of nine when $j = 6$.

Now, let us formally describe the encoding procedure for the general case. For each $k \in [1:A]$ and $g^k \in [1:M_n]^k$, generate $\mathbf{x}_k(g^k)$ in an i.i.d. manner according to $P^*$. In block $k \in [1:A]$, the encoder sends $\mathbf{x}_k(G^k)$. Let $\mathcal{B}_i$ for $i \ge 1$ denote the set of block indices in the $i$-th period on and after the $(A+1)$-st block, i.e., $\mathcal{B}_i = [A + (i-1)(A-B+1) + 1 : A + i(A-B+1)]$. For each $k \in \mathcal{B}_i$ and message sequence $g_{[m_i:k]}$, where $m_i := A - B + 2 + (i-1)(A-B+1)$ denotes the first message index of the $i$-th period,101010In block $k \in \mathcal{B}_i$, a total of $k - m_i + 1$ messages, i.e., $G_{[m_i:k]}$, are mapped to a codeword. generate $\mathbf{x}_k(g_{[m_i:k]})$ in an i.i.d. manner according to $P^*$. In block $k \in \mathcal{B}_i$, the encoder sends $\mathbf{x}_k(G_{[m_i:k]})$.

On the other hand, we note that our message-codeword mapping rule is also periodic in the (vertical) axis of the message index. We can group the messages according to the maximum block index to which a message is mapped. Let $\mathcal{G}_i$ for $i \ge 1$ denote the $i$-th group of messages that are mapped to codewords up to the last block of the $i$-th period, which is illustrated in Fig. 3 for the example of $A = 9$ and $B = 4$. This grouping rule is useful for describing the decoding rule.
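The periodic mapping rule above can be summarized compactly. The helper below is an illustrative sketch of our reconstruction of the rule (the function name and the formula for the first message of each period are ours), reproducing the $A = 9$, $B = 4$ example of Fig. 3:

```python
def messages_in_block(k, A, B):
    """Message indices mapped to the codeword of block k under the
    truncated-memory rule, with maximum memory A and minimum memory B."""
    if k <= A:                       # initialization: all messages so far
        return list(range(1, k + 1))
    period = A - B + 1               # memory cycles from B up to A
    i = (k - A - 1) // period + 1    # index of the current period
    first = A - B + 2 + (i - 1) * period   # first message of the i-th period
    return list(range(first, k + 1))

# A = 9, B = 4: block 10 carries G_[7:10] (memory 4), block 15 carries
# G_[7:15] (memory 9), and block 16 starts the next period with G_[13:16].
assert messages_in_block(10, 9, 4) == list(range(7, 11))
assert messages_in_block(15, 9, 4) == list(range(7, 16))
assert messages_in_block(16, 9, 4) == list(range(13, 17))
print("ok")
```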

#### IV-B2 Decoding

The decoding rule of $G_k$ at the end of block $T_k = k + T - 1$ is exactly the same as that for the moderate deviations regime when $T_k \le A$. Hence, from now on, let us focus on the decoding of $G_k$ with $T_k > A$ at the end of block $T_k$. At the end of block $T_k$, the decoder decodes not only $G_k$, but also all the messages in the previous group and the previous messages in the current group.111111Similarly as in the moderate deviations regime, $G_j$ for $j < k$ has already been decoded at the end of block $T_j$. Nevertheless, the decoder re-decodes some of the previous messages at the end of $T_k$. Let $\hat{G}_{T_k, j}$ denote the estimate of $G_j$ at the end of block $T_k$.

Let us first describe our decoding procedure for the example of $A = 9$ and $B = 4$ illustrated in Fig. 3, for $k$ in the first period after initialization.121212By using the symmetry of the message-codeword mapping rule, the procedure for decoding $G_k$ when $T_k$ falls in later periods can be stated in a similar manner. The decoder decodes not only $G_k$, but also all the messages in the previous group and the previous messages in the current group. The underlying rules of our decoding procedure can be summarized as follows:

• Since messages in the next group, which we do not want to decode, are involved in blocks beyond the first period, we do not utilize the channel output sequences in those blocks for decoding.

• For the decoding of the $j$-th message, among the remaining channel output sequences up to block $T_k$, we utilize the channel output sequences of the blocks in which the $j$-th message is involved.

According to the above rules, the blocks to be considered for the decoding of messages are as follows:

1. for $G_{[7:9]}$, blocks131313We note that the last block index in which the messages in the previous group are involved and whose output is available to the decoder is $T_k$ if $T_k < 15$, and it is 15 otherwise. In other words, this last block index is $\nu_k := \min(T_k, 15)$. indexed from 10 to $\nu_k$,

2. for $G_j$ with $j \in [10:12]$, blocks indexed from $j$ to $\nu_k$, and

3. for $G_j$ with $j \in [13:k]$, blocks indexed from $j$ to $T_k$.

In particular, since the pairs of the first block index and the last block index to be considered for the decoding of the messages $G_{[7:9]}$ are the same, we decode $G_{[7:9]}$ simultaneously. Keeping this in mind, our decoding procedure for the example of $A = 9$ and $B = 4$ is formally stated as follows:

1. If there is a unique index vector $g_{[7:9]}$ that satisfies141414Similarly as in the proof of Theorem 1, the following notation is used for the set of codewords: for a message sequence $g$, $\mathbf{x}_{[j:l]}(g)$ denotes the set of codewords of blocks $j$ through $l$ obtained according to the encoding procedure.

 $i(\mathbf{x}_{[10:\nu_k]}(g_{[7:\nu_k]}), \mathbf{y}_{[10:\nu_k]}) > (\nu_k - 6) \cdot \log M_n$ (43)

for some $g_{[10:\nu_k]}$, let the corresponding estimates equal this index vector. If there is none or more than one such index vector, the estimates are set arbitrarily (an error is declared).

2. The decoder sequentially decodes the remaining messages of the previous group, from $j = 10$ to $j = 12$, as follows:

• Given $\hat{G}_{T_k, [7:j-1]}$, the decoder chooses $\hat{G}_{T_k, j}$ according to the following rule. If there is a unique index $g_j$ that satisfies

 $i(\mathbf{x}_{[j:\nu_k]}(\hat{G}_{T_k,[7:j-1]}, g_{[j:\nu_k]}), \mathbf{y}_{[j:\nu_k]}) > (\nu_k - j + 1) \cdot \log M_n$ (44)

for some $g_{[j+1:\nu_k]}$, let $\hat{G}_{T_k, j} = g_j$. If there is none or more than one such index, $\hat{G}_{T_k, j}$ is set arbitrarily (an error is declared).

• If $j < 12$, repeat the above procedure after increasing $j$ to $j + 1$. If $j = 12$, proceed to the next decoding procedure.

3. The decoder sequentially decodes the messages of the current group, from $j = 13$ to $j = k$, as follows:

• Given $\hat{G}_{T_k, [7:j-1]}$, the decoder chooses $\hat{G}_{T_k, j}$ according to the following rule. If there is a unique index $g_j$ that satisfies

 $i(\mathbf{x}_{[j:T_k]}(\hat{G}_{T_k,[7:j-1]}, g_{[j:T_k]}), \mathbf{y}_{[j:T_k]}) > (T_k - j + 1) \cdot \log M_n$ (45)

for some $g_{[j+1:T_k]}$, let $\hat{G}_{T_k, j} = g_j$. If there is none or more than one such index, $\hat{G}_{T_k, j}$ is set arbitrarily (an error is declared).

• If $j < k$, repeat the above procedure after increasing $j$ to $j + 1$. If $j = k$, the whole decoding procedure terminates and the decoder declares that the $k$-th message is $\hat{G}_{T_k, k}$.

The above description of the decoding procedure for the example in Fig. 3 is naturally extended to the general case. In general, the procedure for decoding $G_k$ at the end of block $T_k$ consists of the following three steps: (i) simultaneous non-unique decoding of the first messages in the previous group, (ii) sequential decoding of the remaining messages in the previous group, and (iii) sequential decoding of the messages in the current group up to the current block. Let us describe the decoding rule for $k$ in the first period in the following:

1. If there is a unique index vector that satisfies

 $i(\mathbf{x}_{[B:\min(A,T_k)]}(g^{\min(A,T_k)}), \mathbf{y}_{[B:\min(A,T_k)]}) > \min(A,T_k) \cdot \log M_n$ (46)

for some choice of the remaining message indices, let the corresponding estimates equal this index vector. If there is none or more than one such index vector, the estimates are set arbitrarily (an error is declared).

2. The decoder sequentially decodes the remaining messages of the previous group as follows:

• Given $\hat{G}_{T_k, [1:j-1]}$, the decoder chooses $\hat{G}_{T_k, j}$ according to the following rule. If there is a unique index $g_j$ that satisfies

 $i(\mathbf{x}_{[j:\min(A,T_k)]}(\hat{G}_{T_k,[1:j-1]}, g_{[j:\min(A,T_k)]}), \mathbf{y}_{[j:\min(A,T_k)]}) > (\min(A,T_k) - j + 1) \cdot \log M_n$ (47)

for some $g_{[j+1:\min(A,T_k)]}$, let $\hat{G}_{T_k, j} = g_j$. If there is none or more than one such index, $\hat{G}_{T_k, j}$ is set arbitrarily (an error is declared).

• If the previous group has not been exhausted, repeat the above procedure after increasing $j$ to $j + 1$. Otherwise, proceed to the next decoding procedure.

3. The decoder sequentially decodes the messages of the current group up to $G_k$ as follows:

• Given $\hat{G}_{T_k, [1:j-1]}$, the decoder chooses $\hat{G}_{T_k, j}$ according to the following rule. If there is a unique index $g_j$ that satisfies

 $i(\mathbf{x}_{[j:T_k]}(\hat{G}_{T_k,[1:j-1]}, g_{[j:T_k]}), \mathbf{y}_{[j:T_k]}) > (T_k - j + 1) \cdot \log M_n$ (48)

for some $g_{[j+1:T_k]}$, let $\hat{G}_{T_k, j} = g_j$. If there is none or more than one such index, $\hat{G}_{T_k, j}$ is set arbitrarily (an error is declared).

• If $j < k$, repeat the above procedure after increasing $j$ to $j + 1$. If $j = k$, the whole decoding procedure terminates and the decoder declares that the $k$-th message is $\hat{G}_{T_k, k}$.

By exploiting the symmetry of the message-codeword mapping rule, the decoding rule for the subsequent periods proceeds similarly.

#### IV-B3 Error analysis

We first consider the probability of error averaged over the random codebook $\mathcal{C}_n$. Let us consider the decoding of $G_k$. Let $\alpha := \min(A, T_k)$. The error event happens only if at least one of the following events occurs:

 $\mathcal{E}^{(i)}_k := \{ i(\mathbf{X}_{[B:\alpha]}(G^{\alpha}), \mathbf{Y}_{[B:\alpha]}) \le \alpha \cdot \log M_n \}$ (49)
 $\tilde{\mathcal{E}}^{(i)}_k := \{ i(\mathbf{X}_{[B:\alpha]}(g^{\alpha}), \mathbf{Y}_{[B:\alpha]}) >$