Delay Asymptotics with Retransmissions and Incremental Redundancy Codes over Erasure Channels
Abstract
Recent studies have shown that retransmissions can cause heavy-tailed transmission delays even when packet sizes are light-tailed. Moreover, the impact of heavy-tailed delays persists even when packet sizes are upper bounded. The key question we study in this paper is how the use of coding techniques to transmit information, together with different system configurations, would affect the distribution of delay. To investigate this problem, we model the underlying channel as a Markov modulated binary erasure channel, where transmitted bits are either received successfully or erased. Erasure codes are used to encode information prior to transmission, which ensures that a fixed fraction of the bits in the codeword can lead to successful decoding. We use incremental redundancy codes, where the codeword is divided into codeword trunks and these trunks are transmitted one at a time to provide incremental redundancies to the receiver until the information is recovered. We characterize the distribution of delay under two different scenarios: (I) Decoder uses memory to cache all previously successfully received bits. (II) Decoder does not use memory, where received bits are discarded if the corresponding information cannot be decoded. In both cases, we consider codeword length with infinite and finite support. From a theoretical perspective, our results provide a benchmark to quantify the tradeoff between system complexity and the distribution of delay.
I Introduction
Retransmission is a basic component of most medium access control protocols and is used to ensure reliable transfer of data over communication channels with failures [1]. Recent studies [2][3][4] have revealed the surprising result that retransmission-based protocols could cause heavy-tailed transmission delays even if the packet length is light-tailed, resulting in very long delays and possibly zero throughput. Moreover, [5] shows that even when the packet sizes are upper bounded, the distribution of delay, although eventually light-tailed, may still have a heavy-tailed main body, and that the heavy-tailed main body could dominate even for relatively small values of the maximum packet size. In this paper we investigate the use of coding techniques to transmit information in order to alleviate the impact of heavy tails, and substantially reduce the incurred transmission delay.
In our analysis, we focus on the Binary Erasure Channel. Erasures in communication systems can arise in different layers. At the physical layer, if the received signal falls outside acceptable bounds, it is declared as an erasure. At the data link layer, some packets may be dropped because of checksum errors. At the network layer, packets that traverse through the network may be dropped because of buffer overflow at intermediate nodes and therefore never reach the destination. All these errors can result in erasures in the received bit stream.
In order to investigate how different coding techniques would affect the delay distribution, we use a general coding framework called incremental redundancy codes. In this framework, each codeword is split into several pieces of equal size, which are called codeword trunks. The sender sends only one codeword trunk at a time. If the receiver cannot decode the information, it requests another codeword trunk from the sender. Therefore, at every transmission, the receiver gains extra information, which is called incremental redundancy.
In order to combat channel erasures, we use erasure codes as channel coding to encode the information. Erasure codes represent a group of coding schemes which ensure that even when some portions of the codeword are lost, it is still possible for the receiver to recover the corresponding information. Roughly speaking, the encoder transforms a data packet of symbols into a longer codeword of symbols, where the ratio is called the code rate. An erasure code is said to be near optimal if it requires slightly more than symbols, say symbols, to recover the information, where can be made arbitrarily small at the cost of increased encoding and decoding complexity. Many elegant low-complexity erasure codes have been designed for erasure channels, e.g., Tornado codes [7], LT codes [8], and Raptor codes [9]. For the sake of simplicity, throughout the paper, we assume . In other words, any fraction of the codeword can recover the corresponding information and a lower indicates a larger redundancy in the codeword.
We consider two scenarios in this paper. In the first scenario, as shown in Fig. 1, the entire codeword is transmitted as a unit, and received bits are simply discarded if the corresponding information cannot be recovered. Note that in this scenario, the decoder memory is not exploited for caching received bits across different transmissions. This scenario occurs because the receiver may not have the requisite computation/storage power to keep track of all the erasure positions and the bits that have been previously received, especially when the receiver is responsible for handling a large number of flows simultaneously. In the second scenario, we assume that the receiver has enough memory space and computational power to accumulate received bits from different (re)transmissions, which enables the use of incremental redundancy codes, where a codeword of length is split into codeword trunks with equal size, and these codeword trunks are transmitted one at a time. At the receiver, all successfully received bits from every transmission are buffered in the receiver memory according to their positions in the codeword. If the receiver cannot decode the corresponding information, it requests another codeword trunk from the sender. At the sender, these codeword trunks are transmitted in a round-robin manner. We call these two scenarios decoder that does not use memory and decoder that uses memory, respectively.
Given the above two different types of decoder, there are two more factors that can affect the distribution of delay. (I) Channel dynamics: In order to capture the time-correlated nature of wireless channels, we assume that the channel is Markov modulated. More specifically, we assume a time slotted system where one bit can be transmitted per time slot, and the current channel state distribution depends on channel states in the previous time slots. When , it corresponds to the i.i.d. channel model. (II) Codeword length distribution: We assume throughout the paper that the codeword length is light-tailed, which implies that the system works in a benign environment. We consider two different codeword length distributions, namely, codeword length with infinite support and codeword length with finite support, respectively. For the former, the codeword length distribution has an exponentially decaying tail with decay rate ; for the latter, the codeword length has an upper bound .
Contribution
The main contributions of this work are the following:

When decoder memory is not exploited, the tail of the delay distribution depends on the code rate. Specifically, we show that when the code rate is above a certain threshold, the delay distribution is heavy-tailed; otherwise it is light-tailed. This shows that substantial gains in delay can be achieved over the standard retransmission case (repetition coding) by adding a certain amount of redundancy in the codeword. As mentioned earlier, prior work has shown that repetition coding results in heavy-tailed delays even when the packet sizes are light-tailed.

When decoder memory is exploited, the tail of the delay distribution is always light-tailed. This implies that the use of receiver memory results in a further substantial reduction in the transmission delay.

The aforementioned results are for the case when the codeword size can have infinite support. We also characterize the transmission delay for each of the above cases when the codeword size has finite support (zero-tailed), and show similar tradeoffs between the coding rate and use of receiver memory in terms of the main body of the delay distribution (rather than the eventual tail).
The remainder of this paper is structured as follows: In Section II, we describe the system model. In Section III we consider the scenario where the decoder memory is exploited. Then, in Section IV we investigate the situation where the decoder does not use memory. Finally, in Section V, we provide numerical studies to verify our main results.
II System Model
The channel dynamics are modeled as a slotted system where one bit can be transmitted per slot. Furthermore, we assume that the slotted channel is characterized by a binary stochastic process , where corresponds to the situation when the bit transmitted at time slot is successfully received, and when the bit is erased (called an erasure).
Since, in practice, the channel dynamics are often temporally correlated, we investigate the situation in which the current channel state distribution depends on the channel states in the preceding time slots. More precisely, for and fixed , we define for with for , and assume that for all . To put it another way, the augmented state forms a Markov chain. Let denote the transition matrix of the Markov chain , where
with being the one-step transition probability from state to state . Throughout this paper, we assume that is irreducible and aperiodic, which ensures that this Markov chain is ergodic [10]. Therefore, for any initial value , the parameter is well defined and given by
and, from the ergodic theorem (see Theorem 1.10.2 in [10])
which means the long-term fraction of the bits that can be successfully received is equal to . Therefore, we call the channel capacity.
In the degenerate case when , we have a memoryless binary erasure channel (i.i.d. binary erasure channel). Correspondingly, and .
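As a numerical illustration of the ergodic statement above, the following minimal sketch treats the simplest correlated case, in which the channel state depends only on the previous slot. The two-state transition matrix `P`, the function names, and the numerical values are assumptions chosen for illustration and do not come from the paper. The sketch compares the stationary probability of the "received" state with the empirical fraction of successfully received bits along one long sample path.

```python
import random

def stationary_distribution(P, iters=1000):
    """Approximate the stationary row vector pi = pi P by power iteration.

    P is a row-stochastic matrix given as a list of lists; the chain is
    assumed irreducible and aperiodic so that the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def simulate_success_fraction(P, n_slots, seed=0):
    """Simulate a two-state channel (1 = received, 0 = erased) slot by slot
    and return the fraction of slots in which the bit was received."""
    rng = random.Random(seed)
    state = 1
    successes = 0
    for _ in range(n_slots):
        state = 1 if rng.random() < P[state][1] else 0
        successes += state
    return successes / n_slots

# Hypothetical transition matrix: row = current state, column = next state.
P = [[0.7, 0.3],
     [0.1, 0.9]]
capacity = stationary_distribution(P)[1]          # stationary P(received)
empirical = simulate_success_fraction(P, 200_000)  # long-run success fraction
print(capacity, empirical)
```

For a two-state chain the stationary probability of the received state also has the closed form p01 / (p01 + p10), which gives a quick sanity check on the power iteration.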
As mentioned in the introduction, we study two different scenarios in this paper, namely decoder that uses memory and decoder that does not use memory. In the first scenario, the sender splits a codeword into codeword trunks with equal size and transmits them one at a time in a round-robin manner, while the receiver uses memory to cache all previously successfully received bits according to their positions in the codeword. In the second scenario, the receiver discards any successfully received bits if they cannot recover the corresponding information, and the sender transmits the entire codeword as a unit.
We let denote the number of bits in the codeword with infinite support, and assume that there exist and such that
(1) 
We let denote the number of bits in the codeword with finite support, with being the maximum codeword length, and let for any . We focus on erasure codes, where a fixed fraction () of bits in the codeword can lead to a successful decoding. We call this fraction the code rate.
Formal definitions of the number of retransmissions and the delays are given as follows:
Definition 1 (Decoder that uses memory).
The total number of transmissions for a codeword with variable length and number of codeword trunks when the decoder uses memory is defined as
The transmission delay is defined as .
Definition 2 (Decoder that does not use memory).
The total number of transmissions for a codeword with variable length when the decoder does not use memory is defined as
The transmission delay is defined as .
For a codeword with variable length , the corresponding numbers of transmissions and delays are denoted as and , respectively.
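To make the two definitions concrete, here is a minimal simulation sketch for the memoryless (i.i.d.) special case only. The function names, the symbols `beta` (code rate) and `gamma` (per-slot success probability), and the decoding rule "at least ceil(beta * length) distinct bits received" are assumptions made for this illustration; the exact inequality used in the paper's definitions may differ.

```python
import math
import random

def transmissions_without_memory(length, beta, gamma, rng):
    """Resend the whole codeword until a single attempt delivers enough bits.

    Bits received in a failed attempt are discarded (no decoder memory).
    """
    needed = math.ceil(beta * length)
    attempts = 0
    while True:
        attempts += 1
        received = sum(rng.random() < gamma for _ in range(length))
        if received >= needed:
            return attempts

def transmissions_with_memory(length, beta, gamma, m, rng):
    """Send m codeword trunks round-robin; the receiver caches every bit
    position it has received so far and decodes once enough are cached."""
    needed = math.ceil(beta * length)
    bounds = [j * length // m for j in range(m + 1)]  # trunk boundaries
    have = [False] * length
    count = 0
    sent = 0
    while True:
        j = sent % m                                  # round-robin trunk index
        sent += 1
        for pos in range(bounds[j], bounds[j + 1]):
            if not have[pos] and rng.random() < gamma:
                have[pos] = True
                count += 1
        if count >= needed:
            return sent

# Hypothetical parameters, chosen only so that the demo finishes quickly.
rng = random.Random(1)
beta, gamma, length, trials = 0.5, 0.6, 40, 200
avg_no_mem = sum(transmissions_without_memory(length, beta, gamma, rng)
                 for _ in range(trials)) / trials
avg_mem = sum(transmissions_with_memory(length, beta, gamma, 4, rng)
              for _ in range(trials)) / trials
print(avg_no_mem, avg_mem)
```

Note that the two counters use different units: the first counts whole-codeword transmissions, while the second counts trunk transmissions, each of which occupies only a fraction of the codeword length in time slots.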
Notations
In order to present the main results, we introduce some necessary notations here.
Notation 1.
Let denote the Perron-Frobenius eigenvalue (see Theorem 3.11 in [11]) of the matrix , which is the largest eigenvalue of .
Notation 2.
For , let denote the state space of , where and . Then, we define a mapping from to as
Notation 3.
Let denote the large deviation rate function, which is given by
where (Footnote 1: For a matrix , is the fold Kronecker product of with itself, also called the Kronecker power of .)
Notation 4.
Let denote the root of the rate function . More precisely,
Notation 5.
III Decoder that uses Memory
When the decoder uses memory to cache all previously successfully received bits, we can apply incremental redundancy codes, where the sender splits a codeword into codeword trunks and transmits one codeword trunk at a time. If the receiver, after receiving a codeword trunk, is not able to decode the corresponding information, it will use memory to cache the successfully received bits in the codeword trunk and request the sender to send another codeword trunk. In this way, at every transmission, the receiver gains extra information, which we call incremental redundancy. The sender will send these codeword trunks in a round-robin manner, meaning that if all of the codeword trunks have been requested, it will start over again with the first codeword trunk. It should be noted that the incremental redundancy code is a fairly general framework in that if , it degenerates to a fixed rate erasure code, while as approaches infinity, it resembles a rateless erasure code.
III-A Codeword with infinite support
When the distribution of codeword length has an exponentially decaying tail with decay rate , as indicated by Equation (1), we find that the delay will always be light-tailed, and we characterize the decay rate in Theorem 1.
Theorem 1.
In the case when the decoder uses memory, when we apply an incremental redundancy code with parameter to transmit a codeword with variable length , we obtain lower and upper bounds on the decay rate of delay,
In the special case when ,
The definitions of and can be found in Notation 5.
Proof:
See Section VI-B.
Remark 1.1.
From the definitions of and in Notation 5 we observe that firstly, the decay rate of delay when is no greater than the decay rate of delay when (), which means that incremental redundancy code () outperforms fixed rate erasure code (); secondly, the decay rate of delay increases with the increase of , which means we can reduce delay by increasing the number of codeword trunks . These observations are verified through Example 1 in Section V.
III-B Codeword with finite support
In practice, codeword length is bounded by the maximum transmission unit (MTU). Therefore, we investigate the case when the codeword has variable length , with being the maximum codeword length, and characterize the corresponding delay distribution in Theorem 2.
Theorem 2.
In the case when the decoder uses memory, when we apply an incremental redundancy code with parameter to transmit a codeword with variable length , we get
1) for any and any , we can find such that for any , we have
,
2) in the special case when , for any and any , we can find such that for any , we have ,
where
Proof:
See Section VI-C.
Remark 2.1.
This theorem shows that even if the codeword length has an upper bound , the distribution of delay still has a light-tailed main body whose decay rate is similar to that of the infinite support scenario. The waist of this main body is when and when . Since both and are independent of , we know that the waist of this light-tailed main body scales linearly with respect to the maximum codeword length . This theorem is verified through Example 2 in Section V.
IV Decoder that does not use Memory
For receivers that do not have the required computation/storage power, it is difficult to keep track of all the erasure positions and the bits that have been successfully received. Therefore, in this section, we study the case when the decoder does not use memory, as illustrated in Fig. 1. In this situation, since the receiver simply discards any successfully received bits if they cannot recover the corresponding information, it is better for the sender to transmit the whole codeword as a unit instead of dividing the codeword into pieces before transmission.
IV-A Codeword with infinite support
We observe an intriguing threshold phenomenon. We show that when the codeword length distribution is light-tailed and has infinite support, the transmission delay is light-tailed (exponential) only if the code rate is below the channel capacity, and heavy-tailed (power law) if the code rate exceeds the channel capacity.
Theorem 3 (Threshold phenomenon).
In the case when the decoder does not use memory and the codeword has variable length , we get

if , then

if , then
The definition of can be found in Notation 3.
Proof:
See Section VI-D.
Remark 3.1.
The tail distribution of the transmission delay changes from power law to exponential, depending on the relationship between code rate and channel capacity . If , the system even has a zero throughput.
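One way to build intuition for the threshold, restricted to the i.i.d. channel and not taken from the paper's proof, is to look at the probability that a single transmission of a codeword of a given length succeeds when the decoder has no memory. The sketch below computes this probability exactly from the binomial distribution. The symbols `beta` (code rate) and `gamma` (per-slot success probability), the function names, and the "at least ceil(beta * length) bits" decoding rule are assumptions made for illustration.

```python
import math

def attempt_success_prob(length, beta, gamma):
    """Exact probability that at least ceil(beta * length) of `length`
    i.i.d. bits arrive, each received independently with probability gamma."""
    needed = math.ceil(beta * length)
    return sum(
        math.comb(length, k) * gamma**k * (1 - gamma)**(length - k)
        for k in range(needed, length + 1)
    )

def kl_bernoulli(beta, gamma):
    """Relative entropy D(beta || gamma) between Bernoulli laws, in nats."""
    return (beta * math.log(beta / gamma)
            + (1 - beta) * math.log((1 - beta) / (1 - gamma)))

# Code rate above capacity: the success probability shrinks exponentially
# in the codeword length, with exponent approaching D(beta || gamma).
for length in (10, 20, 40, 80):
    p = attempt_success_prob(length, beta=0.8, gamma=0.5)
    print(length, p, -math.log(p) / length)
print("D(0.8 || 0.5) =", kl_bernoulli(0.8, 0.5))

# Code rate below capacity: a single attempt succeeds with high probability.
print(attempt_success_prob(200, beta=0.3, gamma=0.5))
```

Because failed attempts are discarded, the number of attempts for a fixed length is geometric with this success probability. When the code rate exceeds capacity, that probability is exponentially small in the length, and averaging over an exponentially-tailed length distribution is what allows a power law to emerge. When the code rate is below capacity, long codewords almost always succeed on the first attempt.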
IV-B Codeword with finite support
In the heavy-tailed delay case when , we can further show that if the codeword length is upper bounded, the delay distribution still has a heavy-tailed main body, although it eventually becomes light-tailed.
Theorem 4.
In the case when the decoder does not use memory and the codeword has variable length , if , for any , we can find and such that
1) For any , we have ,
2)
3) For any , we have ,
4)
where
(2) 
The definition of can be found in Notation 3.
Proof:
See Section VI-E.
Remark 4.1.
From Equation (2) and by Lemma 3, we can obtain
(3) 
which implies that increases exponentially fast with the increase of maximum codeword length . Since the waist of the heavy-tailed main body of the delay distribution is , we know that the waist also scales exponentially fast as we increase the maximum codeword length .
From Theorem 4 we know that even if the codeword length is bounded, the heavy-tailed main body could still play a dominant role. From Theorem 3 we know that when and , the throughput will vanish to zero as approaches infinity. Now we explore how fast the throughput vanishes to zero as increases.
Let be the i.i.d. sequence of codeword lengths with distribution . Denote as the transmission delay of . The throughput of this system is defined as .
Theorem 5 (Throughput).
In the case when the decoder does not use memory and the codeword has variable length , if and , we have
The definition of can be found in Notation 3.
Proof:
See Section VI-F.
Remark 5.1.
Theorem 5 indicates that when the code rate is greater than the channel capacity and , as the maximum codeword length increases, the throughput vanishes to zero at least exponentially fast with rate .
V Simulations
In this section, we conduct simulations to verify our main results. As is evident from the following figures, the simulations match theoretical results well.
Example 1.
In this example, we study the case when the decoder uses memory and the codeword length has infinite support. We assume that the channel is i.i.d. (). As shown in Theorem 1, under the above assumptions, the delay distribution is always light-tailed. In order to verify this result, we assume that is geometrically distributed with mean (), and choose code rate and channel capacity . By Theorem 1 we know that when , the decay rate of delay is ; when , the decay rate of delay is ; when , the decay rate of delay is . From Fig. 2 we can see that the decay rate of delay increases when increases from to , and the theoretical result is quite accurate.
Example 2.
In this simulation, we study the case when the decoder uses memory and the codeword length has finite support. We assume that the channel is i.i.d. (), code rate , , , and channel capacity . From these system parameters we can calculate and . We choose four sets of maximum codeword length as . Theorem 2 indicates that the delay distribution has a light-tailed main body with decay rate and waist . In Fig. 3 we plot the delay distributions when together with the infinite support case when , and we use a short solid line to indicate the waist of the light-tailed main body. As we can see from Fig. 3, the theoretical waists of the main bodies, which are , are close to the simulation results.
Example 3.
Now we use simulations to verify Theorem 4. Theorem 4 says that when the decoder does not use memory, if the code rate is greater than the channel capacity and the codeword length has finite support, the distribution of delay as well as the distribution of the number of retransmissions have a heavy-tailed main body and an exponential tail. The waist of the main body increases exponentially fast with the increase of maximum codeword length . In this experiment, we set code rate , channel capacity , and . From these parameters we can get . We choose four sets of maximum codeword length as . As Equation (3) indicates, the waist of the heavy-tailed main bodies of the number of retransmissions is . In Fig. 4, we plot the distribution of the number of retransmissions when together with the infinite support case when , and we use a short solid line to indicate the waist of the heavy-tailed main body. As can be seen from Fig. 4, the simulation matches our theoretical result.
VI Proofs
VI-A Lemmas
In order to prove the theorems, first we need the following three lemmas.
Lemma 1.
where
Proof:
First we consider the case when . By Definition 1, we have
(4) 
where , .
Let and
If , then given , forms a Markov chain with state space and probability transition matrix . We further observe that if , we have the following relationship
Using the above observation, we can construct upper and lower bounds as follows.
(5) 
By a direct application of Theorem 3.1.2 in [11], we know that for a given and any values of , we can find such that
(6)  
(7) 
whenever . Since is a large deviation rate function, from [11] we know that
(8) 
The upper and lower bounds (7) and (6), together with Equations (4), (5) and (8), imply that
which, with , completes the proof when . Next, let us consider the case when .
In this memoryless channel case, for a single bit in the codeword, after transmission, the probability that this bit is successfully received is . Therefore equivalently, we can consider a single transmission in a memoryless channel with erasure probability . Then, by a direct application of the Gärtner-Ellis theorem (Theorem 2.3.6 in [11]), we have, for any ,
where
and .
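The memoryless reduction used in the proof above rests on an elementary fact: if a bit is sent several times over independent slots, the chance that at least one copy arrives is one minus the chance that every copy is erased. The sketch below checks this numerically. The symbols `gamma` (per-slot success probability) and `t` (number of times the bit has been sent), as well as the function names, are assumptions introduced for this illustration, since the paper's own symbols are not reproduced here.

```python
import random

def prob_bit_received(gamma, t):
    """Probability that a bit sent t times over an i.i.d. erasure channel
    is received at least once: 1 - (1 - gamma)^t."""
    return 1.0 - (1.0 - gamma) ** t

def simulate_bit_received(gamma, t, trials, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < gamma for _ in range(t))
        for _ in range(trials)
    )
    return hits / trials

closed_form = prob_bit_received(0.5, 3)
estimate = simulate_bit_received(0.5, 3, 100_000)
print(closed_form, estimate)
```

Under this view, repeated transmissions with receiver memory behave like a single transmission over a memoryless channel whose erasure probability is (1 - gamma)^t.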
Lemma 2.
Assume is a function of , which satisfies . Then, for any we have
Proof:
Lemma 3.

if , then
where as .

if , then
where as .
Proof:
VI-B Proof of Theorem 1
Proof:
Observe that
(13) 
Let us first focus on the first part of Equation (13). Denote , then it is easy to check that