Streaming Data Transmission in the Moderate Deviations and Central Limit Regimes
Abstract
We consider streaming data transmission over a discrete memoryless channel. A new message is given to the encoder at the beginning of each block and the decoder decodes each message sequentially, after a delay of $T$ blocks. In this streaming setup, we study the fundamental interplay between the rate and error probability in the central limit and moderate deviations regimes and show that i) in the moderate deviations regime, the moderate deviations constant improves over the block coding or non-streaming setup by a factor of $T$ and ii) in the central limit regime, the second-order coding rate improves by a factor of approximately $\sqrt{T}$ for a wide range of channel parameters. For both regimes, we propose coding techniques that incorporate a joint encoding of fresh and previous messages. In particular, for the central limit regime, we propose a coding technique with truncated memory to ensure that a summation of constants, which arises as a result of applications of the central limit theorem, does not diverge in the error analysis.
Furthermore, we explore interesting variants of the basic streaming setup in the moderate deviations regime. We first consider a scenario with an erasure option at the decoder and show that both the exponents of the total error and the undetected error probabilities improve by factors of $T$. Next, by utilizing the erasure option, we show that the exponent of the total error probability can be improved to that of the undetected error probability (in the order sense) at the expense of a variable decoding delay. Finally, we also extend our results to the case where the message rate is not fixed but alternates between two values.
I Introduction
In many multimedia applications, a stream of data packets is required to be sequentially encoded and decoded under strict latency constraints. For such a streaming setup, both the fundamental limits and optimal schemes can differ from those of classical communication systems. In recent years, there has been a growing interest in the characterization of fundamental limits for streaming data transmission [1, 2, 3, 4, 5, 6]. In [1, 2, 3], coding techniques based on tree codes were proposed for the streaming setup with applications to control systems. In [4], Khisti and Draper established the optimal diversity-multiplexing tradeoff (DMT) for streaming over a block-fading multiple-input multiple-output channel. In [5], the same authors proposed a coding technique using finite memory for streaming over discrete memoryless channels (DMCs) that attains the same reliability as previously known semi-infinite coding techniques with growing memory. In [6], the error exponent was studied in a streaming setup of distributed source coding. We note that these prior works assumed that the code operates in the large deviations regime, in which the rate is bounded away from capacity (or the rate pair is strictly inside the optimal rate region for compression problems) and the error probability decays exponentially as the blocklength increases.
Other interesting asymptotic regimes include the central limit and moderate deviations regimes. Let $n$ denote the blocklength of a single message henceforth. In the central limit regime, the rate approaches the capacity at a speed proportional to $1/\sqrt{n}$ and the error probability does not vanish as the blocklength increases. In the moderate deviations regime, the rate approaches the capacity strictly more slowly than $1/\sqrt{n}$ and the error probability decays subexponentially fast as the blocklength increases. For block coding problems, both regimes have received a fair amount of attention recently. These works aim to characterize the fundamental interplay between the coding rate and error probability. The most notable early work on channel coding in the central limit regime (also known as second-order asymptotics or the normal approximation regime) is that of Strassen [7], who considered DMCs and showed that the backoff from capacity scales as $1/\sqrt{n}$ when the error probability is fixed. Strassen also deduced the constant of proportionality, which is related to the so-called dispersion [8]. Hayashi [9] considered DMCs with cost constraints as well as discrete channels with Markovian memory. Polyanskiy et al. [8] refined the asymptotic expansions and also compared the normal approximation to the finite blocklength (non-asymptotic) fundamental limits. For a review and extensions to multiterminal models, the reader is referred to [10]. For the moderate deviations regime, He et al. [11] considered fixed-to-variable length source coding with decoder side information. Altuğ and Wagner [12] initiated the study of moderate deviations for channel coding, specifically for DMCs. Polyanskiy and Verdú [13] relaxed some assumptions in the conference version of Altuğ and Wagner's work [14] and also considered moderate deviations for additive white Gaussian noise (AWGN) channels. However, this line of research has not been extensively studied for the streaming setup.
To the best of our knowledge, there has been no prior work on the streaming setup in the moderate deviations and central limit regimes, with the exception of [15], where the focus is on source coding.
In this paper, we study streaming data transmission over a DMC in the moderate deviations and central limit regimes. Our streaming setup is illustrated in Fig. 1. In each block of length $n$, a new message is given to the encoder at the beginning, and the encoder generates a codeword as a function of all the past and current messages and transmits it over the channel. The decoder, given all the past received channel output sequences, decodes each message after a delay of $T$ blocks. This streaming setup introduces a new dimension not present in the block coding problems studied previously. In the special case of $T=1$, the setup reduces to the block channel coding problem. If $T>1$, however, there exists an inherent tension in whether we utilize a block only for the fresh message or use it also for the previous messages with earlier deadlines. It is not difficult to see that, due to the memoryless nature of the model, a time-sharing scheme (in which some fraction of a block is used for a fresh message and the remaining fraction of the block is used for previous messages) will not provide any gain compared to the case of $T=1$. A natural question is whether a joint encoding of fresh and previous messages would improve the performance when $T>1$.
Our results indicate that the fundamental interplay between the rate and error probability can be greatly improved when delay is allowed in the streaming setup. In the moderate deviations regime, the moderate deviations constant is shown to improve over the block coding or non-streaming setup by a factor of $T$. In the central limit regime, the second-order coding rate is shown to improve by a factor of approximately $\sqrt{T}$ for a wide range of channel parameters. For both asymptotic regimes, we propose coding techniques that incorporate a joint encoding of fresh and previous messages. For the moderate deviations regime, we propose a coding technique in which, for every block, the encoder jointly encodes all the previous and fresh messages and the decoder re-decodes all the previous messages in addition to the current target message. For the error analysis of this coding technique, we develop a refined and non-asymptotic version of the moderate deviations upper bound in [16, Theorem 3.7.1] that allows us to uniformly bound the error probabilities associated with the previous messages. On the other hand, for the central limit regime, we cannot apply such a coding technique, whose memory is linear in the block index. In the error analysis in the central limit regime, we encounter a summation of constants as a result of applications of the central limit theorem. If the memory is linear in the block index, this summation causes the upper bound on the error probability to diverge as the block index tends to infinity. Hence, for the central limit regime, we propose a coding technique with truncated memory, where the memory at the encoder varies in a periodic fashion. Our proposed construction judiciously balances the rate penalty imposed by the truncation against the growth in the error probability due to the contribution from previous messages.
By analyzing the second-order coding rate of our proposed setup, we conclude that the channel dispersion parameter also decreases by a factor of approximately $T$ for a wide range of channel parameters.
Furthermore, we explore interesting variants of the basic streaming setup in the moderate deviations regime. First, we consider a scenario where there is an erasure option at the decoder and analyze the undetected error and the total error probabilities, extending a result by Hayashi and Tan [17]. Next, by utilizing the erasure option, we analyze the rate of decay of the error probability when a variable decoding delay is allowed. We show that such flexibility in the decoding delay can dramatically improve the error probability in the streaming setup. This result is the analog, for the streaming setup, of classical results on variable-length decoding (see, e.g., [18]). Finally, as a simple example of the case where the message rates are not constant, we consider a scenario where the rate of the messages in odd block indices and the rate of the messages in even block indices are different and analyze the moderate deviations constants separately for the two types of messages. This setting finds applications in video and audio coding, where streams of data packets do not necessarily have a constant rate.
The rest of this paper is organized as follows. In Section II, we formally state our streaming setup. The main theorems are presented in Section III and proved in Section IV. In Section V, the moderate deviations result for the basic streaming setup is extended in various directions. We conclude this paper in Section VI.
I-A Notation
The following notation is used throughout the paper. We reserve boldface font for vectors whose lengths are the same as the blocklength $n$. For two integers $i$ and $j$, $[i:j]$ denotes the set $\{i, i+1, \ldots, j\}$. For two constants $a$ and $b$, $x_a^b$ denotes the vector $(x_a, x_{a+1}, \ldots, x_b)$, where the subscript is omitted when $a = 1$, i.e., $x^b = x_1^b$. This notation is naturally extended for vectors $\mathbf{x}$, random variables $X$, and random vectors $\mathbf{X}$. $\mathbf{1}(A)$ for an event $A$ denotes the indicator function, i.e., it is 1 if $A$ is true and 0 otherwise. $\lceil \cdot \rceil$ and $\lfloor \cdot \rfloor$ denote the ceiling and floor functions, respectively.
For a DMC $W$ and an input distribution $P$, we use the following standard notation and terminology in information theory:

Information density:
$i(x;y) := \log \frac{W(y|x)}{PW(y)}$,   (1)
where $PW(y) := \sum_{x} P(x) W(y|x)$ denotes the output distribution. We note that $i(x;y)$ depends on $P$ and $W$, but this dependence is suppressed. The definition (1) can be generalized for two vectors $\mathbf{x}$ and $\mathbf{y}$ of length $n$ as follows:
$i(\mathbf{x};\mathbf{y}) := \sum_{k=1}^{n} i(x_k; y_k)$.   (2)
Mutual information:
$I(P,W) := E[i(X;Y)]$   (3)
$= \sum_{x,y} P(x) W(y|x) \log \frac{W(y|x)}{PW(y)}$.   (4)
Unconditional information variance:
$U(P,W) := \mathrm{Var}[i(X;Y)]$.   (5)
Conditional information variance:
$V(P,W) := \sum_{x} P(x)\, \mathrm{Var}[i(X;Y) \mid X = x]$.   (6)
Capacity:
$C := \max_{P \in \mathcal{P}(\mathcal{X})} I(P,W)$,   (7)
where $\mathcal{P}(\mathcal{X})$ denotes the probability simplex on $\mathcal{X}$.

Set of capacity-achieving input distributions:
$\Pi := \{P \in \mathcal{P}(\mathcal{X}) : I(P,W) = C\}$.   (8)
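To make these definitions concrete, the following Python sketch (our own illustration, not from the paper; the channel values are hypothetical) evaluates the information density, mutual information, and conditional information variance for a binary symmetric channel under the uniform input distribution, which is capacity-achieving for this channel:

```python
import numpy as np

def information_density(P, W):
    """i(x;y) = log2( W(y|x) / (PW)(y) ) for every (x, y) pair."""
    PW = P @ W                                   # output distribution (PW)(y)
    return np.log2(W / PW[None, :])

def mutual_information(P, W):
    """I(P,W) = E[i(X;Y)] with (X,Y) ~ P(x) W(y|x)."""
    joint = P[:, None] * W
    return float(np.sum(joint * information_density(P, W)))

def conditional_information_variance(P, W):
    """V(P,W) = sum_x P(x) Var[i(X;Y) | X = x]."""
    i_xy = information_density(P, W)
    mean_x = np.sum(W * i_xy, axis=1)            # E[i(x;Y) | X = x]
    var_x = np.sum(W * (i_xy - mean_x[:, None]) ** 2, axis=1)
    return float(P @ var_x)

# Binary symmetric channel with crossover probability 0.11, uniform input.
p = 0.11
W = np.array([[1 - p, p], [p, 1 - p]])
P = np.array([0.5, 0.5])
C = mutual_information(P, W)                     # = 1 - h2(0.11) bits
V = conditional_information_variance(P, W)       # dispersion of the BSC, in bits^2
```

For the BSC the unconditional and conditional information variances coincide, so this single routine suffices for the example.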
II Model
Consider a DMC $W$. A streaming code is defined as follows:
Definition 1 (Streaming code).
An $(n, M, \epsilon, T)$ streaming code consists of

a sequence of messages, each distributed uniformly over $[1:M]$,

a sequence of encoding functions that map the message sequence to the channel input codeword , and

a sequence of decoding functions that map the channel output sequences to a message estimate ,
that satisfies
(11) 
i.e., the probability of error averaged over all block messages does not exceed $\epsilon$.
We note that a streaming code with a fixed blocklength $n$ consists of a sequence of encoding and decoding functions, since a stream of messages is sequentially encoded and decoded. Fig. 1 illustrates our streaming setup for the case with . At the beginning of block , a new message is given to the encoder. The encoder generates a codeword as a function of all the past and current messages and transmits it over the channel in block . Since , the decoder decodes message at the end of block , as a function of all the past received channel output sequences .
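Under one natural reading of this timing convention (our own sketch; the function names are hypothetical, and $T$ denotes the decoding delay in blocks), the encode/decode schedule can be expressed as:

```python
def decode_deadline(i, T):
    """Block at whose end message i (given to the encoder at the start of
    block i) is decoded, i.e., after a delay of T blocks."""
    return i + T - 1

def outputs_available(i, T):
    """Indices of the channel output blocks the decoder may use for message i."""
    return list(range(1, decode_deadline(i, T) + 1))
```

With $T = 1$ this reduces to ordinary block coding: message $i$ is decoded at the end of block $i$ itself, from the outputs received up to and including that block.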
III Main Results
In this section, we state our main results. The following two theorems present achievability bounds for the moderate deviations and the central limit regimes, respectively, which are proved in Section IV.
Theorem 1 (Moderate deviations regime).
Consider a DMC with $V > 0$ and any sequence of integers such that , where and . (Footnote 2: Throughout the paper, we ignore integer constraints on the number of codewords .) Then, there exists a sequence of streaming codes such that (Footnote 3: If for some , corresponds to an upper bound on the moderate deviations constant. In the special case of $T = 1$, the moderate deviations constant is shown to be the channel dispersion in [12, 13].)
(12) 
Theorem 2 (Central limit regime).
Consider a DMC with $V > 0$. For any and , there exists a sequence of streaming codes such that (Footnote 4: is termed the second-order coding rate in this paper. This is slightly different from what is common in the literature, where instead is known as the second-order coding rate [9].)
(13) 
and
(14) 
The following corollary, whose proof is in Appendix A, elucidates a closed-form and interpretable expression for the upper bound on the error probability in (14).
Corollary 3.
Consider a DMC with . For any , there exists a sequence of streaming codes such that
(15) 
and
(16) 
where , defined in the following, has the property that for every , tends to 1 as tends to infinity:
(17) 
Fig. 2 illustrates how fast the constant in Corollary 3 converges to 1 as increases. For , we can see that is less than 1.1 when and is less than 1.05 when . Hence, the effect of the constant is not significant for a wide range of and .
Theorems 1 and 2 illustrate that the fundamental interplay between the rate and probability of error can be greatly improved when delay is allowed in the streaming setup. In the moderate deviations regime, the moderate deviations constant improves by a factor of $T$. Assuming that can be approximated sufficiently well by , for the central limit regime, the second-order coding rate is improved (reduced) by a factor of $\sqrt{T}$. Another way to view this is via the lens of the channel dispersion; this parameter is approximately reduced by a factor of $T$.
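As a rough numerical illustration of this gain (our own back-of-the-envelope computation, not from the paper: it uses the normal approximation $R \approx C - \sqrt{V/(nT)}\,Q^{-1}(\epsilon)$ suggested by the reduced dispersion, ignores higher-order terms, and plugs in hypothetical BSC(0.11)-like values for $C$ and $V$):

```python
import math

def q_inv(eps):
    """Inverse of the Gaussian Q-function Q(x) = P(N(0,1) > x), by bisection."""
    q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if q(mid) > eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def normal_approx_rate(C, V, n, eps, T=1):
    """Normal-approximation rate when the dispersion is reduced by a factor T."""
    return C - math.sqrt(V / (n * T)) * q_inv(eps)

# Hypothetical channel parameters, roughly a BSC(0.11): C ~ 0.5 bit, V ~ 0.89 bit^2.
C, V, n, eps = 0.5, 0.89, 1000, 1e-3
r_block = normal_approx_rate(C, V, n, eps, T=1)     # non-streaming baseline
r_stream = normal_approx_rate(C, V, n, eps, T=10)   # streaming with delay T = 10
ratio = (C - r_block) / (C - r_stream)              # backoff shrinks by sqrt(T)
```

In this approximation the backoff from capacity shrinks by exactly $\sqrt{T}$, which is the second-order improvement discussed above.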
IV Proofs of the Main Theorems
IV-A Proof of Theorem 1 for the Moderate Deviations Regime
Consider a DMC with $V > 0$ and any sequence of integers such that , where and . We denote by an input distribution that achieves the dispersion (9).
IV-A1 Encoding
For each and , generate in an i.i.d. manner according to . The generated codewords constitute the codebook . In block , after observing the true message sequence , the encoder sends .
IV-A2 Decoding
Consider the decoding of at the end of block . In our scheme, the decoder not only decodes , but also re-decodes at the end of block . (Footnote 5: We note that for has already been decoded at the end of block . Nevertheless, the decoder re-decodes at the end of , because the decoder needs to decode to decode , and the probability of error associated with becomes lower (in general) by utilizing recent channel output sequences.) Let denote the estimate of at the end of block . The decoder decodes sequentially from to as follows:

Given , the decoder chooses according to the following rule. (Footnote 6: When , is null.) If there is a unique index that satisfies (Footnote 7: We use the following notation for the set of codewords. Let for denote the set of message indices mapped to the th codeword according to the encoding procedure. For and , we denote by the set of codewords .)
(18) for some , let . (Footnote 8: We note that in (18) is defined in terms of and . This dependence is suppressed henceforth.) If there is none or more than one such , let .

If , repeat the above procedure by increasing to . If , the decoding procedure terminates and the decoder declares that the th message is .
IV-A3 Error analysis
We first consider the probability of error averaged over the random codebook . The error event for arises only if at least one of the following events occurs:
(19)  
(20)  
(21) 
Now, we have
(22) 
For each , we have
(23)  
(24)  
(25)  
(26) 
where the ’s are i.i.d. random variables, each generated according to , and follows from the identity [8, Eq. (69)] used to derive the DT bound.
Now, fix an arbitrary . By applying the chain of inequalities [13, Eqs. (53)-(56)], we have
(27)  
(28) 
Combining the bounds in (26) and (28), we obtain
(29)  
(30)  
(31) 
for sufficiently large , where is a nonnegative constant that depends only on the input distribution and the channel statistics, and follows from the moderate deviations upper bound in Lemma 4, which is stated at the end of this subsection. See also Remark 1.
Now, we have
(32)  
(33)  
(34)  
(35) 
for sufficiently large , which leads to
(36) 
Finally, by taking , we have
(37) 
Hence, there must exist a sequence of codes that satisfies (12), which completes the proof.
The following lemma, used in the proof of Theorem 1, corresponds to a non-asymptotic upper bound of the moderate deviations theorem [16, Theorem 3.7.1]; its proof is in Appendix B.
Lemma 4.
Let be a sequence of i.i.d. random variables such that , , and whose cumulant generating function for is analytic around the origin and satisfies that is finite. For a sequence satisfying the moderate deviations constraints, i.e., and , the following bound holds:
(38) 
for sufficiently large .
Remark 1.
Let us comment on the assumption in Lemma 4 that is finite. In our application,
(39) 
Then, we have
(40)  
(41) 
By differentiating thrice, we can show that is continuous in . (Footnote 9: A detailed calculation follows similarly to the proof of [12, Lemma 1].) Restricting to means that is a continuous function over a compact set. Hence its maximum is attained and is necessarily finite.
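As a quick numerical sanity check of the Gaussian-type decay asserted by the moderate deviations theorem (our own illustration, not part of the paper: the distribution, blocklength, and scaling exponent are arbitrary choices satisfying the moderate deviations constraints), the tail probability $P(\sum_{k=1}^{n} X_k \ge n\varepsilon_n)$ for zero-mean i.i.d. $X_k$ should decay on the exponential order of $\exp(-n\varepsilon_n^2 / (2\sigma^2))$:

```python
import numpy as np

rng = np.random.default_rng(0)

def md_tail(n, eps_n, trials=100_000, chunk=5_000):
    """Empirical P(X_1 + ... + X_n >= n * eps_n) for X_i i.i.d. Unif(-1, 1),
    estimated in chunks to keep memory bounded."""
    hits = 0
    for _ in range(trials // chunk):
        x = rng.uniform(-1.0, 1.0, size=(chunk, n))
        hits += int(np.sum(x.sum(axis=1) >= n * eps_n))
    return hits / trials

sigma2 = 1.0 / 3.0              # variance of Unif(-1, 1)
n = 1000
eps_n = n ** (-0.4)             # eps_n -> 0 while n * eps_n**2 -> infinity
prediction = np.exp(-n * eps_n ** 2 / (2 * sigma2))
empirical = md_tail(n, eps_n)
# The theorem fixes only the exponential order: log(empirical) matches
# log(prediction) up to subexponential (prefactor) corrections.
```

At finite $n$ the polynomial prefactor is still visible, so only the logarithms, normalized by $n\varepsilon_n^2$, should be compared.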
IV-B Proof of Theorem 2 for the Central Limit Regime
Consider a DMC with $V > 0$. We remark that in the moderate deviations regime, for every block, the encoder maps all the previous messages to a codeword. For the central limit regime, we propose a coding strategy in which the encoder maps only some recent messages to the codeword in each block. A similar idea of incorporating truncated memory was used in [5], with the focus there on reducing complexity. Here, we use a different memory structure from [5]. Let and denote the maximum and minimum numbers of messages that can possibly be mapped to a codeword in each block, respectively. We choose the size of the message alphabet as follows:
(42) 
for some . To make the above choice of valid, we assume . Furthermore, we assume that the minimum encoding memory is at least , i.e., . We denote by an input distribution that achieves the dispersion (9).
IV-B1 Encoding
Our encoder has a periodically time-varying memory with a period of blocks, after an initialization step of the first blocks. Let us first describe our message-codeword mapping rule for the case of and , which is illustrated in Fig. 3. For the first nine blocks, the encoder maps all the previous messages to a codeword. Since the maximum encoding memory is nine in this example, we truncate the messages that are mapped to a codeword on and after the tenth block, so that the encoding memory is periodically time-varying from four to nine with a period of six blocks. For instance, let us consider the first period from the tenth block to the fifteenth block. In the tenth block, the encoder maps the messages to a codeword, thus ensuring that the encoding memory is four. In block , the encoder maps the messages to a codeword and hence the encoding memory becomes the maximum memory of nine when .
Now, let us formally describe the encoding procedure for the general case. For each and , generate in an i.i.d. manner according to . In block , the encoder sends . Let for denote the set of block indices in the th period on and after the st block, i.e., . For each and , (Footnote 10: In block , a total of messages, i.e., , are mapped to a codeword.) generate in an i.i.d. manner according to . In block , the encoder sends .
On the other hand, we note that our message-codeword mapping rule is also periodic in the (vertical) axis of message index. We can group the messages according to the maximum block index to which a message is mapped. Let for denote the th group of messages that are mapped to a codeword up to block , which is illustrated in Fig. 3 for the example of and . This grouping rule is useful for describing the decoding rule.
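To make the periodic memory pattern concrete, the following sketch (function name hypothetical; grounded only in the worked example above, with maximum memory nine and minimum memory four) reproduces the per-block encoding memory:

```python
def encoding_memory(t, max_mem=9, min_mem=4):
    """Number of messages mapped to the codeword in block t (1-indexed),
    for the example with maximum memory 9 and minimum memory 4."""
    if t <= max_mem:
        return t                            # initialization: all messages so far
    period = max_mem - min_mem + 1          # 6 blocks in this example
    return min_mem + (t - max_mem - 1) % period

schedule = [encoding_memory(t) for t in range(1, 17)]
# -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 4]
```

The memory ramps up over the first nine blocks, drops to four at block ten, peaks back at nine at block fifteen, and then repeats with period six, matching Fig. 3.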
IV-B2 Decoding
The decoding rule of at the end of block is exactly the same as that for the moderate deviations regime. Hence, from now on, let us focus on the decoding of for at the end of block . At the end of block , the decoder decodes not only , but also all the messages in the previous group and the previous messages in the current group, (Footnote 11: Similarly as in the moderate deviations regime, for has already been decoded at the end of block . Nevertheless, the decoder re-decodes some of the previous messages at the end of .) i.e., . Let denote the estimate of at the end of block .
Let us first describe our decoding procedure for the example of , , and illustrated in Fig. 3. Consider the decoding of at the end of block . (Footnote 12: By using the symmetry of the message-codeword mapping rule, the procedure for decoding for the cases and can be stated in a similar manner.) The decoder decodes not only , but also all the messages in and the previous messages in . The underlying rules of our decoding procedure can be summarized as follows:

Since messages in , which we do not want to decode, are involved in blocks , we do not utilize the channel output sequences in those blocks for decoding.

For the decoding of the th message for , among the channel output sequences from block to block , we utilize the channel output sequences in which the th message is involved.
According to the above rules, the blocks to be considered for the decoding of messages are as follows:

for , blocks indexed from 10 to , (Footnote 13: We note that the last block index in which the messages in are involved is if , and it is otherwise. In other words, the last block index in which the messages in are involved is .)

for for , blocks indexed from to , and

for for , blocks indexed from to .
In particular, since the pairs of the first block index and the last block index to be considered for the decoding of messages are the same, we decode simultaneously. Keeping this in mind, our decoding procedure for for the example of , and is formally stated as follows:

If there is a unique index vector that satisfies (Footnote 14: Similarly as in the proof of Theorem 1, the following notation is used for the set of codewords. Let for denote the set of message indices mapped to the th codeword according to the encoding procedure. For and , we denote by the set of codewords .)
(43) for some , let . If there is none or more than one such , let .

The decoder sequentially decodes from to as follows:

Given , the decoder chooses according to the following rule. If there is a unique index that satisfies
(44) for some , let . If there is none or more than one such , let .

If , repeat the above procedure by increasing to . If , proceed to the next decoding procedure.


The decoder sequentially decodes from to as follows:

Given , the decoder chooses according to the following rule. If there is a unique index that satisfies
(45) for some , let . If there is none or more than one such , let .

If , repeat the above procedure by increasing to . If , the whole decoding procedure terminates and the decoder declares that the th message is .

The above description of the decoding procedure for the example in Fig. 3 naturally extends to the general case. In general, the procedure for decoding for at the end of block consists of the following three steps: (i) simultaneous nonunique decoding of the first messages in the previous group, (ii) sequential decoding of the remaining messages in the previous group, and (iii) sequential decoding of the messages in the current group up to the current block. Let us describe the decoding rule when in the following:

If there is a unique index vector that satisfies
(46) for some , let . If there is none or more than one such , let .

The decoder sequentially decodes from to as follows:

Given , the decoder chooses according to the following rule. If there is a unique index that satisfies
(47) for some , let . If there is none or more than one such , let .

If , repeat the above procedure by increasing to . If , proceed to the next decoding procedure.


The decoder sequentially decodes from to as follows:

Given , the decoder chooses according to the following rule. If there is a unique index that satisfies
(48) for some , let . If there is none or more than one such , let .

If , repeat the above procedure by increasing to . If , the whole decoding procedure terminates and the decoder declares that the th message is .

By exploiting the symmetry of the messagecodeword mapping rule, the decoding rule for proceeds similarly.
IV-B3 Error analysis
We first consider the probability of error averaged over the random codebook . Let us consider the decoding of . Let . The error event arises only if at least one of the following events occurs:
(49)  