Novel Bounds on the Capacity of the Binary Deletion Channel
Abstract
We present novel bounds on the capacity of the independent and identically distributed binary deletion channel. Four upper bounds are obtained by providing the transmitter and the receiver with genie-aided information on suitably defined random processes. Since some of the proposed bounds involve infinite series, we also introduce provable inequalities that lead to more manageable results. For most values of the deletion probability, these bounds improve the existing ones and significantly narrow the gap with the available lower bounds. Exploiting the same auxiliary processes, we also derive, as a byproduct, a couple of very simple lower bounds on the channel capacity, which, for low values of the deletion probability, are almost as good as the best existing lower bounds.
Binary deletion channel, channel capacity, capacity bounds.
1 Introduction
We consider a binary deletion channel where each bit in the input sequence gets deleted, independently of the others, with probability d, while the non-deleted bits are received without errors and in the correct order. The positions at which the deletions occur are unknown to both the transmitter and the receiver. Formally, let X be a sequence of N bits at the input of the channel, let M be the number of received bits, which is a random variable taking values in {0, 1, ..., N} according to the realization of the deletion process, and let Y be the received sequence. The capacity per input bit of this channel, generally referred to as the independent and identically distributed (IID) binary deletion channel, is defined as [1]
C = lim_{N→∞} (1/N) max_{P(X)} I(X; Y)    (1)
where P(X) is the distribution of the input sequence, and I(X; Y) is the average mutual information between the two random sequences [2]. The capacity (1) is unknown, and only some upper and lower bounds are available in the current literature.
The first lower bound on the capacity of the deletion channel was derived by Gallager in [3], where he proved that the capacity of interest is at least equal to that of a binary symmetric channel with bit-flipping probability equal to the deletion probability d. A number of lower bounds have since been proposed (see [4], [5], and references therein), among which the best bounds that we are aware of are the ones presented in [4] and [5]. In particular, the latter bound outperforms the former for small values of d, that is, for all values of d for which the authors of [5] could run the required computations, whose execution time grows quickly as d increases. Throughout the paper, the reference lower bound will thus be the one in [5] for small values of d and the one in [4] for larger values.
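For concreteness, the capacity of a binary symmetric channel with bit-flipping probability d is 1 - H_b(d), where H_b is the binary entropy function. The sketch below (helper names are ours) evaluates this expression under our reading of Gallager's bound, with the flipping probability taken equal to the deletion probability; the resulting values agree with those later reported for the bound from [3] in Table 6 (e.g., about 0.531 at d = 0.10).

```python
from math import log2

def h2(p):
    """Binary entropy function, in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def bsc_capacity(d):
    """Capacity of a binary symmetric channel with bit-flipping probability d,
    i.e., 1 - h2(d); per the text, this equals Gallager's lower bound from [3]
    (our reading: flipping probability taken equal to the deletion
    probability d)."""
    return 1.0 - h2(d)

for d in (0.01, 0.05, 0.10):
    print(d, round(bsc_capacity(d), 3))
```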
Only a few upper bounds have been derived on the capacity of the IID deletion channel. A simple upper bound is given by the capacity of an IID erasure channel with erasure probability d, since the erasure channel is identical to the deletion channel, except that the receiver additionally knows the positions of the deleted bits [2]. A combinatorial bound proposed by Ullman in [6], originally derived for particular channels with synchronization errors, was used for decades as an upper bound for the deletion channel. However, it is not a true upper bound, and it has recently been found to be violated by provable lower bounds on the capacity of the deletion channel [4]. The reason is that Ullman focused on systems with zero error probability, while the definition of capacity relies on the weaker condition of an error probability that can be made arbitrarily low by increasing the length of the codewords [2]. The only non-trivial upper bound that we are aware of is the one presented in [7], which will be adopted here as a reference benchmark.
This paper presents novel upper bounds on the capacity of the IID deletion channel that improve the existing ones for most values of the deletion probability d. All upper bounds are computed by considering the capacity of some auxiliary channels obtained by providing genie-aided information on suitable random processes related to the deletion process. In particular, we show that, when such auxiliary random processes are revealed to the transmitter and/or the receiver, we obtain memoryless channels whose capacity can be evaluated by means of the Blahut-Arimoto algorithm (BAA) [8, 9], leading to provable upper bounds on the capacity of interest. Moreover, we show that, based on the introduced auxiliary processes, lower bounds on the capacity of the deletion channel can be derived as well. The obtained lower bounds, while close to the ones proposed in [4] and [5] for low values of d, do not improve them, and will only be considered as byproduct results.
The paper is organized as follows. Section 2 introduces an auxiliary channel based on which we derive three upper bounds on the capacity of the IID deletion channel, which are presented in Sections 3, 4, and 5, respectively. The fourth upper bound, evaluated by exploiting a different auxiliary channel, is introduced in Section 6. The main contributions in upper bounding the capacity of the deletion channel are summarized and discussed in Section 7. Finally, Section 8 introduces a couple of simple lower bounds, while Section 9 gives some concluding remarks.
2 A Useful Auxiliary Channel
Let n and m be two natural numbers such that m ≤ n. We consider a channel for which, at each use, the input consists of a sequence of n bits and the output consists of a sequence of m bits. The input/output relationship characterizing each channel use is the following: n - m bits are deleted from the n input bits, while the remaining m bits are received without errors and in the correct order. At each channel use, the deletion pattern, that is, the set of positions at which the deletions occur, randomly takes on each of the (n choose m) possible realizations with equal probability, and is unknown to both the transmitter and the receiver. Also, deletion patterns in different channel uses are independent, so that the channel is memoryless. As an example, the transition probabilities characterizing one use of the channel are reported in Table 1 for a specific choice of n and m; X and Y denote the input sequence and the output sequence, respectively, while P(Y | X) denotes the conditional probability.
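The transition probabilities of this auxiliary channel can be enumerated mechanically, which is all the BAA introduced below needs. The following sketch (function name and the example values n = 3, m = 2 are ours, not necessarily those of Table 1) averages uniformly over the equally likely choices of the m surviving positions.

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

def transition_probs(n, m):
    """P(Y | X) for the auxiliary channel: n input bits, m output bits,
    with each of the C(n, m) deletion patterns (equivalently, each choice
    of the m surviving positions) equally likely."""
    survivors = list(combinations(range(n), m))
    table = {}
    for x in range(2 ** n):
        bits = tuple((x >> (n - 1 - i)) & 1 for i in range(n))
        counts = Counter(tuple(bits[i] for i in s) for s in survivors)
        table[bits] = {y: Fraction(c, len(survivors)) for y, c in counts.items()}
    return table

# Example with n = 3, m = 2: input 001 yields output 01 under two of the
# three equally likely surviving-position patterns.
T = transition_probs(3, 2)
print(T[(0, 0, 1)])
```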
The capacity per use of the considered auxiliary channel is defined as
C_{n,m} = max_{P(X)} I(X; Y)    (2)
where P(X) is the distribution of the input sequence. Since each channel output is a sequence of m bits, the following upper bound holds:
C_{n,m} ≤ m.    (3)
In some particular cases, it can be shown that C_{n,m} achieves the upper bound (3). These cases are listed and briefly discussed in the following.
m = 0: All input bits are deleted and no information can be delivered, so that C_{n,0} = 0.
m = 1: A capacity-achieving scheme consists of transmitting, at each channel use, either a sequence of n zeros or a sequence of n ones, with equal probability and independently of the previous/future transmissions. In this case, for each channel use, the only received bit fully determines the input sequence, irrespective of the deletion pattern. Formally, adopting the standard notation H(·) and H(·|·) for the entropy and the conditional entropy [2], we get
I(X; Y) = H(Y) - H(Y | X) = 1 - 0 = 1
which achieves the upper bound (3).
m = n: Since all transmitted bits are correctly received, the capacity is equal to n bits per channel use, which is achieved by independent and uniformly distributed (IUD) input bits.
When 1 < m < n, we could not find a closed-form expression of the capacity C_{n,m}. On the other hand, since the auxiliary channel is memoryless and has finite input/output alphabets, its capacity can be numerically evaluated by means of the BAA [8, 9]. To run the BAA, we only need the transition probabilities characterizing the channel, such as those reported in Table 1. Hence, in principle, we can compute the capacity based on similar tables for all desired values of n and m. Unfortunately, the implementation of the BAA becomes computationally infeasible for large values of n: we could manage only moderate values of n when all possible values of m are considered, and somewhat larger values of n for the single-deletion case m = n - 1, which will be shown later to be of particular interest. Some values of C_{n,m} are reported in Table 2, where the results obtained by means of the BAA have two-digit precision after the decimal point, and are rounded up to the next hundredth since, rigorously, the BAA can underestimate the true capacity when a finite number of iterations is performed [8, 9].
Table 2: values of C_{n,m}, with rows indexed by the number n of input bits and columns by the number m of output bits.
n \ m   0    1    2     3     4     5     6     7
0       0
1       0    1
2       0    1    2
3       0    1    1.48  3
4       0    1    1.35  2.18  4
5       0    1    1.30  1.88  2.87  5
6       0    1    1.28  1.77  2.43  3.62  6
7       0    1    1.26  1.71  2.23  3.04  4.41  7
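A minimal sketch of the BAA for a generic discrete memoryless channel is given below; it is our own illustrative implementation, not the authors' code. It is exercised on the auxiliary channel with n = 2 input bits and m = 1 output bit, whose capacity equals 1 according to Table 2.

```python
import numpy as np

def blahut_arimoto(W, iters=200):
    """Estimate the capacity (bits per use) of a discrete memoryless channel
    with transition matrix W[x, y] = P(y | x) via the Blahut-Arimoto
    algorithm [8, 9]. A finite number of iterations can underestimate the
    true capacity, which is why the Table 2 entries are rounded up."""
    def divergences(p):
        q = p @ W                                  # induced output distribution
        safe = np.where(W > 0, W, 1.0)             # avoid log(0) on zero entries
        return np.sum(W * np.where(W > 0, np.log2(safe / q), 0.0), axis=1)

    p = np.full(W.shape[0], 1.0 / W.shape[0])      # start from uniform inputs
    for _ in range(iters):
        p = p * np.exp2(divergences(p))            # multiplicative update
        p /= p.sum()
    return float(np.sum(p * divergences(p)))       # mutual information I(p)

# Auxiliary channel with n = 2 and m = 1: inputs 00, 01, 10, 11, outputs 0, 1.
W = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.5, 0.5],
              [0.0, 1.0]])
print(round(blahut_arimoto(W), 4))   # -> 1.0, matching C_{2,1} = 1 in Table 2
```

As a further check, a binary erasure channel with erasure probability 0.25 yields capacity 0.75 under the same routine.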
In the following, we introduce several lemmas that will be used in the remaining sections to manipulate the capacity of the auxiliary channel when running the BAA is infeasible. Before providing the lemmas, we define
C'_{n,k} = C_{n,n-k}    (4)
so that we can index the capacity of the auxiliary channel either by the number m of received bits, using C_{n,m}, or by the number k of deleted bits, using C'_{n,k}. The following definitions will also be useful in the remaining sections:
γ_{n,m} = m - C_{n,m}    (5)
δ_{n,k} = (n - k) - C'_{n,k}.    (6)
Note that the coefficients γ_{n,m} and δ_{n,k} cannot be negative due to (3).
Lemma 1: For all values of n and all values of k ≥ 1, the following holds
C'_{n,k} ≤ C'_{n-1,k-1}    (7)
where C'_{n,k} = C_{n,n-k} is the capacity indexed by the number of deleted bits, as in (4).
The proof is based on the fact that, when additional information is provided to the transmitter, the capacity of a system cannot decrease [2]. In particular, the capacity cannot decrease if, at each channel use, the transmitter knows one of the k positions at which the deletions occur. Clearly, the bit transmitted in that position is irrelevant. Moreover, if the revealed position is chosen according to a uniform distribution over the k possible values, the resulting system is characterized by n - 1 effective input bits, n - k output bits, and IUD deletion patterns, that is, by definition, a system with capacity C'_{n-1,k-1}. Hence, the lemma is proved.
Lemma 2: For all values of n and m, and all values of j such that 0 ≤ j ≤ n - m, the following holds
γ_{n,m} ≥ γ_{n-j,m}    (8)
where γ_{n,m} is the coefficient defined in (5). The case j = 1 is simply derived from (5) and (7). The remainder of the lemma can then be proved by induction on j.
Lemma 3: For all values of n and all positive values of k, the following holds
C'_{n+1,k} ≤ (k/(n+1)) C'_{n,k-1} + ((n+1-k)/(n+1)) (1 + C'_{n,k}).    (9)
The proof is based on the fact that, when additional information is provided to both the transmitter and the receiver, the capacity of a system cannot decrease [2]. In particular, we consider the information on the binary event "the last bit of the n + 1 transmitted bits is deleted", which occurs with probability k/(n+1). When the event occurs, the last transmitted bit is irrelevant and the system is characterized by n effective input bits and k - 1 deletions on IUD positions, that is, the system has capacity C'_{n,k-1}. When the event does not occur, the last transmitted bit can be safely sent uncoded, while, for the first n transmitted bits, the system is characterized by n effective input bits and k deletions on IUD positions, that is, the system has capacity C'_{n,k}. Hence, the lemma is proved.
Lemma 4: For all values of n and k, the following holds
δ_{n+1,k} ≥ (k/(n+1)) δ_{n,k-1} + ((n+1-k)/(n+1)) δ_{n,k}    (10)
where δ_{n,k} is the coefficient defined in (6). The lemma is proved after straightforward manipulations of (9) based on (3) and (6).
The lemmas provided hereafter focus on a particular case, that is, the occurrence of exactly one deletion. The reader may skip them without affecting the arguments exploited in Sections 3, 4, 5, and 6. The interest in this case will become evident in Section 7.
Lemma 5: For all values of n and all positive integer values of r, the following holds
C'_{rn,1} ≤ (r - 1) n + C'_{n,1}.    (11)
Let us partition the input sequence of rn bits into r subsequences of n consecutive bits each, and let us assume that both the transmitter and the receiver know in which of the r subsequences the deletion occurs. By definition, this subsequence has capacity C'_{n,1}, while each of the remaining r - 1 subsequences has capacity n. Hence, since the capacity cannot exceed that of the described genie-aided system, the lemma is proved.
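Under our reading of (11), the inequality can be spot-checked against the Table 2 values; in the sketch below, C'_{n,1} denotes C_{n,n-1}, the capacity with n input bits and a single deletion.

```python
# Values of C_{n,m} transcribed from Table 2 (single-deletion column only).
C = {
    (2, 1): 1.0, (3, 2): 1.48, (4, 3): 2.18, (5, 4): 2.87,
    (6, 5): 3.62, (7, 6): 4.41,
}

def C1(n):
    """C'_{n,1} = C_{n,n-1}: capacity with n input bits and one deletion."""
    return C[(n, n - 1)]

# Lemma 5 (our reading): C'_{rn,1} <= (r - 1) * n + C'_{n,1}.
for n, r in [(2, 2), (2, 3), (3, 2)]:
    lhs, rhs = C1(r * n), (r - 1) * n + C1(n)
    print(n, r, lhs, "<=", round(rhs, 2), lhs <= rhs)
```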
Lemma 6: For all values of n and all positive integer values of r, the following holds
δ_{rn,1} ≥ δ_{n,1}    (12)
which directly follows from Lemma 5 and the definitions (4) and (6).
Lemma 7: For all values of n, the following holds
C'_{n+1,1} ≤ C'_{n,1} + n/(n+1) + H_b(1/(n+1))    (13)
where H_b(·) is the binary entropy function [2].
To prove the lemma, we first notice that the equation
I(X; Y, E) = I(X; E) + I(X; Y | E)
holds irrespective of the definition of the random processes X, Y, and E [2]. Moreover, since I(X; Y) cannot exceed I(X; Y, E), and I(X; E) cannot be larger than the entropy of the process E, we can write
I(X; Y) ≤ H(E) + I(X; Y | E).    (14)
In particular, let X and Y be, respectively, the input sequence and the output sequence of the auxiliary channel considered in this section, when the input sequence includes n + 1 bits and exactly one deletion occurs. Also, let E be the binary event "the last bit of the n + 1 transmitted bits is deleted", whose entropy is H_b(1/(n+1)). Under these definitions, the inequality
C'_{n+1,1} ≤ max_{P(X)} I(X; Y | E) + H_b(1/(n+1))    (15)
follows from (14). Note that the first term at the right-hand side of (15) is the capacity of a channel identical to the considered one, when the receiver is provided with side information on the event E, while the transmitter is not. According to the data-processing inequality [2], the capacity of this genie-aided system does not increase if, when the event E occurs, the receiver deletes one of the received bits, selected with equal probability over the n received bits. In this case, the channel consists of two independent subchannels: the former is characterized by n input bits and one deletion on IUD positions, and thus has capacity C'_{n,1}, while the latter is an erasure channel with erasure probability 1/(n+1), and thus has capacity n/(n+1). Hence, we can write
max_{P(X)} I(X; Y | E) ≤ C'_{n,1} + n/(n+1)
which, combined with (15), proves the lemma.
Lemma 8: The following holds
(16) 
To prove the lemma, we first notice that the inequality
(17) 
directly follows from (13) by definition (6). Then, according to (10) and (17), we can write
which proves the lemma, since both sides tend to one as n tends to infinity.
Lemma 9: The following holds
(18) 
To prove the lemma, we first notice that the inequalities
(19) 
follow from (9) and (13) after simple manipulations. The left-hand side in (19) clearly tends to one as n tends to infinity. Then, we notice that the limit
follows from the fact that the binary channel with one deletion tends to the binary identity channel, whose capacity per input bit is one, as the length of the input sequence tends to infinity. Hence, the right-hand side in (19) tends to one as n tends to infinity, and the lemma is proved. Note that (18) implies (16), but is stronger.
3 The First Upper Bound
In this section, we derive an upper bound on the capacity of the deletion channel by providing side information on a random process A, defined in the following. Let q be a positive integer parameter and let us assume that the total number of deleted bits is a multiple of q (this assumption does not affect the capacity evaluation, where the limit of infinitely long input sequences is to be considered). We define the process A such that its first element equals the position, in the transmitted sequence, of the q-th deleted bit and, for each subsequent index i, the i-th element equals the difference between the position in the transmitted sequence of the (iq)-th deleted bit and that of the ((i-1)q)-th deleted bit. An example is depicted in Fig. 1 and discussed in the related caption. Given the assumption of IID deletions, the process A is IID too, and each element of A takes on the value ℓ with probability
p(ℓ) = (ℓ-1 choose q-1) d^q (1-d)^(ℓ-q)    (20)
according to the Pascal distribution [10], for all values of ℓ such that ℓ ≥ q. To point out various similarities between the bounds presented in this paper, it is useful to define the terms
(21)  
(22) 
so that we get
(23) 
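As a numerical sanity check, the Pascal law in (20) can be evaluated directly. The sketch below (symbol names are ours) verifies that the pmf is normalized and that its mean equals q/d, consistent with the properties of the Pascal distribution invoked above.

```python
from math import comb

def pascal_pmf(ell, q, d):
    """P(A_i = ell): the gap between consecutive revealed deletions spans ell
    channel bits when each bit is deleted independently with probability d and
    every q-th deletion is tracked; this is the Pascal (negative-binomial) law
    of (20). Symbol names are ours."""
    if ell < q:
        return 0.0
    return comb(ell - 1, q - 1) * d**q * (1.0 - d)**(ell - q)

d, q = 0.3, 2
support = range(q, 500)   # the tail beyond 500 is numerically negligible here
total = sum(pascal_pmf(l, q, d) for l in support)
mean = sum(l * pascal_pmf(l, q, d) for l in support)
print(round(total, 6), round(mean, 4))   # pmf sums to 1; mean equals q / d
```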
The realizations of the process A are actually unknown to both the transmitter and the receiver. Hence, an upper bound on the capacity of the deletion channel can be obtained by providing them with genie-aided information on A and computing the capacity per input bit of the resulting genie-aided system. With this side information, we have blocks that do not interfere with each other, where each block spans as many input bits as the corresponding element of A, of which exactly q get deleted. The last input bit of each block is irrelevant, since both the transmitter and the receiver know that it gets deleted. A block spanning ℓ input bits is thus characterized by ℓ - 1 effective input bits and q - 1 deletions on IUD positions, so that the related capacity is that of the auxiliary channel defined in Section 2. Hence, by the law of large numbers [10], the capacity per input bit of the genie-aided system equals the ratio between the average block capacity and the average block length. Finally, by exploiting the properties of the Pascal distribution [10], the resulting upper bound can be written as
(24)  
Since the coefficients defined in (5) and (6) cannot be negative, the bound (24) is at least as good as the trivial erasure-channel bound. In particular, by combining Lemma 4 with the available outcomes of the BAA, it can be proved that the bound (24) equals the trivial bound when q = 1, otherwise it is strictly better.
Unless q = 1, it seems infeasible to evaluate the coefficients for all block lengths required in (24).
Let us assume that we know the coefficients for all block lengths up to some maximum value, but not for larger ones.
In this case, we can exploit the inequality in (10) to manipulate the unknown coefficients.
The obtained results are reported in Fig. 2 for various values of q.
The resulting bounds are actually larger than the capacity of the genie-aided system in (24), because of the use of (10) for the unknown coefficients.
Hence, the reported curves can be improved whenever an inequality tighter than (10) is exploited to manipulate those coefficients.
In Fig. 2, the upper bound proposed in [7] and the lower bounds proposed in [4] and [5], which are the best existing bounds that we are aware of, are also reported for comparison.
4 The Second Upper Bound
In this section, we derive an upper bound on the capacity of the deletion channel by providing side information on a random process B, defined in the following. Let q be a positive integer parameter and let us assume that the number of bits at the output of the deletion channel is a multiple of q (this assumption does not affect the capacity evaluation, as in the previous case). We define the process B such that its first element equals the position, in the transmitted sequence, of the q-th received bit and, for each subsequent index i, the i-th element equals the difference between the position in the transmitted sequence of the (iq)-th received bit and that of the ((i-1)q)-th received bit. An example is depicted in Fig. 3 and discussed in the related caption. Given the assumption of IID deletions, the process B is IID too, and each element of B takes on the value ℓ with probability
p(ℓ) = (ℓ-1 choose q-1) (1-d)^q d^(ℓ-q)    (25)
according to the Pascal distribution [10], for all values of ℓ such that ℓ ≥ q.
As in the previous case, an upper bound on the capacity of the deletion channel can be obtained by providing the transmitter and the receiver with genie-aided information on the realizations of B. Similarly to the previous case, we have blocks that do not interfere with each other, each block spanning as many input bits as the corresponding element of B and producing exactly q output bits. The last input bit of each block can be safely sent uncoded, since both the transmitter and the receiver know that it is correctly received. Hence, following the same arguments as in the previous section, and exploiting (25) and the properties of the Pascal distribution, the resulting upper bound can be written as
(26)  
Since the coefficients cannot be negative, the bound (26) is at least as good as the trivial erasure-channel bound. In particular, by combining Lemma 2 with the available outcomes of the BAA, it can be proved that the bound (26) equals the trivial bound when q = 1, otherwise it is strictly better.
When q > 1, it seems infeasible to evaluate the coefficients for all block lengths required in (26). Let us assume that we know the coefficients for all block lengths up to some maximum value, but not for larger ones. In this case, we can exploit (8) to manipulate the unknown coefficients, obtaining
(27)
after a few straightforward manipulations; the resulting bound is actually larger than the capacity of the genie-aided system in (26). The obtained results are reported in Fig. 4 for various values of q. Clearly, such curves can be improved whenever an inequality tighter than (8) is exploited to manipulate the unknown coefficients. In Fig. 4, the upper bound proposed in [7] and the lower bounds proposed in [4] and [5] are also reported for comparison. We point out that the proposed upper bound improves the one presented in [7] for most values of d and, for large values of d, the gap from the best lower bound is now roughly halved.
5 The Third Upper Bound
In this section, we derive an upper bound on the capacity of the deletion channel by providing side information on a random process Q, defined in the following. Let ℓ be a positive integer parameter, based on which we partition the input sequence into consecutive subsequences of ℓ bits each. We assume that the length of the input sequence is a multiple of ℓ, so that the partition is exact (this assumption does not affect the capacity evaluation, as in the previous cases). We then partition the output sequence into subsequences, where the i-th output subsequence includes the received bits related to the i-th input subsequence. Finally, we define the random process Q such that its i-th element denotes the number of bits in the i-th output subsequence. An example is depicted in Fig. 5 and discussed in the related caption. Given the assumption of IID deletions, the process Q is IID too, and each element of Q takes on the value m in {0, 1, ..., ℓ} with probability
(ℓ choose m) (1-d)^m d^(ℓ-m)
according to the binomial distribution.
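The binomial law stated above is immediate to check numerically; the sketch below (symbol names are ours) verifies normalization and the mean number of surviving bits per block, ℓ(1 - d).

```python
from math import comb

def surviving_bits_pmf(m, ell, d):
    """P(Q_i = m): m of the ell bits in a block survive when each bit is
    deleted independently with probability d, i.e., the binomial law stated
    in the text. Symbol names are ours."""
    return comb(ell, m) * (1.0 - d)**m * d**(ell - m)

ell, d = 5, 0.2
pmf = [surviving_bits_pmf(m, ell, d) for m in range(ell + 1)]
mean = sum(m * p for m, p in enumerate(pmf))
print(round(sum(pmf), 6), round(mean, 6))   # total 1; mean equals ell * (1 - d)
```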
As in the previous cases, an upper bound on the capacity of the deletion channel can be obtained by providing the transmitter and the receiver with genie-aided information on the realizations of Q. Similarly to the previous cases, we have blocks that do not interfere with each other, the i-th block having ℓ input bits and a number of output bits given by the i-th element of Q. Hence, using similar arguments as in the previous sections, the resulting upper bound can be written as
Hence, since the coefficients cannot be negative, the bound (28) is at least as good as the trivial erasure-channel bound. In particular, by combining Lemma 2 with the available outcomes of the BAA, it can be proved that the bound (28) equals the trivial bound when ℓ = 1, otherwise it is strictly better. Note that, unlike the previous cases, the bound does not involve an infinite series.
The upper bound (28) is plotted in Fig. 6, together with the upper bound proposed in [7] and the lower bounds proposed in [4] and [5]. For each value of d for which we could run the BAA, the bound improves as ℓ increases; we conjecture that this behavior holds for any value of ℓ (see Section 7). Note that the considered approach significantly improves the bound presented in [7] for most values of the deletion probability d.
6 The Fourth Upper Bound
Given any positive value of the integer parameter ℓ, we can define a system identical to the deletion channel, in which the receiver knows the realizations of the process Q defined in the previous section, while the transmitter does not. In this case, it is useful to think of the system as if there were a "parallel" channel that provides the sequence Q to the receiver. The capacity per input bit of this system is definitely an upper bound on the capacity (1), since, when the parallel output is neglected, the original deletion channel is obtained. Moreover, this upper bound cannot be larger than the bound of the previous section for the same value of ℓ, since the system considered there reduces to the present one when the transmitter neglects the side information on the process Q.
As for the system considered in the previous section, we have blocks that do not interfere with each other, so that a discrete memoryless channel results. For each use of this channel, we still have an input sequence of ℓ bits and, with the binomial probabilities given above, an output sequence of m bits, but now the value of m is unknown to the transmitter. Hence, all transmitted sequences must be taken from the same distribution, and no longer from a distribution matched to the number of deletions in the current channel use. Consequently, the results related to the auxiliary channel introduced in Section 2 cannot be exploited here. Formally, we get
(29)  
When ℓ = 1, this auxiliary channel reduces to the erasure channel, so that the bound coincides with the trivial erasure-channel bound. In any other case, we could not find a closed-form expression of the capacity, and we again resorted to the BAA. To run the BAA, we need the transition probabilities characterizing the channel, such as those reported in Table 3. We point out that, unlike the auxiliary channel considered in Section 2, the transition probabilities now depend on the value of d, so that the BAA must be run for each value of the deletion probability.
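The d-dependent transition probabilities the BAA needs here can be enumerated directly. The sketch below (function name and the example block (0, 1, 1) are ours) computes them for one block, assuming each surviving-position subset of size m occurs with probability (1 - d)^m d^(ℓ-m), as Table 3 presumably tabulates.

```python
from itertools import combinations

def block_transition_probs(bits, d):
    """P(y | x) for one block of the Section 6 channel: each of the len(bits)
    input bits is deleted independently with probability d, the surviving
    bits keep their order, and the output length is random (known to the
    receiver only). Sketch under our assumptions about Table 3."""
    n = len(bits)
    probs = {}
    for m in range(n + 1):
        for keep in combinations(range(n), m):
            y = tuple(bits[i] for i in keep)
            probs[y] = probs.get(y, 0.0) + (1.0 - d)**m * d**(n - m)
    return probs

P = block_transition_probs((0, 1, 1), d=0.1)
print(round(P[()], 6), round(sum(P.values()), 6))   # P(all deleted) = d**3; total 1
```

Unlike the d-free tables of Section 2, this enumeration must be redone for every deletion probability, which is why the BAA runs are repeated per value of d.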
The two upper bounds are compared in Fig. 7 for three different values of ℓ; in both cases, the largest considered value of ℓ is the largest one for which we could run the BAA. We point out that the difference between the two bounds, although the bound derived in this section is rigorously tighter for each value of d, tends to vanish as ℓ increases. This is due to the fact that, for large values of ℓ, the number of deletions for every ℓ transmitted bits is very likely to be close to its mean, so that the advantage of knowing the actual number of such deletions (as happens for the transmitter in the system of Section 5) tends to vanish. As for the bound derived in this section, for each value of d for which we could run the BAA, the bound improves as ℓ increases, and we conjecture that this behavior holds for any value of ℓ (see Section 7).
7 Discussions on the Proposed Upper Bounds
In Table 4, we report a comparison between the best upper bounds found in this paper, that is, the tightest among the bounds derived in the previous sections, evaluated with the largest parameter values for which we could run the BAA, and the existing upper bounds that we are aware of. We remark that the proposed approaches lead to a new state-of-the-art upper bound on the capacity of the deletion channel for most values of d, as evident from the table (where the best values are shown in bold face).
d  Erasure-channel bound  Bound from [7]  Proposed bound
0.01  0.990  not given in [7]  0.963 
0.02  0.980  not given in [7]  0.926 
0.03  0.970  not given in [7]  0.891 
0.04  0.960  not given in [7]  0.858 
0.05  0.950  0.816  0.826 
0.10  0.900  0.704  0.689 
0.15  0.850  0.619  0.579 
0.20  0.800  0.551  0.491 
0.25  0.750  0.494  0.420 
0.30  0.700  0.447  0.362 
0.35  0.650  0.406  0.315 
0.40  0.600  0.371  0.275 
0.45  0.550  0.340  0.241 
0.50  0.500  0.311  0.212 
0.55  0.450  0.284  0.187 
0.60  0.400  0.258  0.165 
0.65  0.350  0.233  0.144 
0.70  0.300  0.208  0.126 
0.75  0.250  0.183  0.108 
0.80  0.200  0.157  0.091 
0.85  0.150  0.130  0.073 
0.90  0.100  0.100  0.049 
0.95  0.050  0.064  0.025 
0.96  0.040  not given in [7]  0.020 
0.97  0.030  not given in [7]  0.015 
0.98  0.020  not given in [7]  0.010 
0.99  0.010  not given in [7]  0.005 
We believe that the values reported in Table 4 could be improved if it were possible to run the BAA for longer sequences. In particular, we conjecture that the proposed bounds become tighter as the corresponding parameters increase, as anticipated in the previous sections.
These conjectures are based on the amount of genie-aided information, that is, the entropy per input bit of the revealed processes. The idea is that the lower the entropy per input bit of the revealed information, the tighter the upper bound. For example, let us consider the first proposed bound: if we reveal the position of one deletion every 100, we expect a tighter bound than if we reveal the position of one deletion every 3. Unfortunately, we could not completely prove the conjectures listed above, but we were able to derive closely related results. For example, we can prove that the bound of Section 6 does not increase when the block length ℓ is replaced by any positive multiple rℓ. It is sufficient to note that the process Q defined for block length ℓ carries the same information as the process defined for block length rℓ, plus some additional information, so that the mutual information achievable with the finer process cannot be smaller. This, according to (29), proves that the bound does not increase when ℓ is replaced by rℓ.
We now discuss the behavior of the proposed upper bounds for limiting values of the deletion probability, that is, d → 0 and d → 1. In particular, after straightforward manipulations, the following results can be obtained
(30)
(31)
(32)
which are valid for any finite value of the involved parameters. The limits reported above are the only ones leading to closed-form expressions that do not reduce to the trivial erasure-channel bound.
The limit for small values of d is determined by the single-deletion coefficients, some values of which are reported in Table 5 (note that the coefficients in (30) and (31) are identical, except for the name of the parameters). The best value that we have found so far is
(33)
obtained in correspondence of the largest coefficient reported in Table 5. Other than the erasure-channel bound, we are not aware of any upper bound that leads to closed-form limiting expressions comparable with the reported one. We believe that (33) could be improved if it were possible to run the BAA for longer sequences, as formalized in the following.
10  11  12  13  14  15  16  17  18  19  20  21  22  

2.08  2.21  2.33  2.44  2.55  2.64  2.73  2.82  2.90  2.98  3.05  3.12  3.19 
Conjecture 2: For all values of n, the following holds
(34)
We wish to prove this conjecture since it would imply that the asymptotic upper bound (31) does not worsen as n increases. Additionally, a strict inequality in (34), which holds for all available outcomes of the BAA, would imply that the asymptotic upper bound (31) improves as n increases. Lemma 6 gives a partial proof of (34). We point out that the limiting value in (31) may not be finite, since the series resulting from (17) does not satisfy any convergence criterion [11].
The limit for large values of d leads to similar considerations. In particular, the best value that we have found so far is
(35)
obtained by (32) with the largest parameters for which we could run the BAA. Note that, according to (8), the reported value could be improved by running the BAA for longer sequences, which unfortunately seems infeasible. We point out that (35) improves the limiting upper bound derived in [7], and narrows the gap from the limiting lower bound derived in [12].
8 Two Simple Lower Bounds
In this section, we derive lower bounds on the capacity of the deletion channel by exploiting the random process Q defined in Section 5. For any input distribution P(X), the following equation holds
I(X; Y, Q) = I(X; Y) + I(X; Q | Y)    (36)
by definition [2]. Moreover, since I(X; Q | Y) cannot be larger than the entropy of the process Q, we can write
I(X; Y) ≥ I(X; Y, Q) - H(Q)    (37)
from which we get the following lower bound on the capacity of the deletion channel
(38)
If we consider the process Q defined before and follow the arguments given for the derivation of (29), we obtain an explicit expression for the involved mutual information, so that (38) can be written as
(39)
In Fig. 8, the lower bound (39) is compared with the best lower bound available in the literature, namely the one from [4] or the one from [5], depending on the value of d (see Section 1). For the computation of (39), two different input distributions have been considered, that is, the distribution that maximizes the capacity of the genie-aided system, which was considered in Section 6 to derive the fourth upper bound, and IUD input bits. Note that the difference between the curve related to the optimized input distribution and that related to IUD input bits is not significant for low values of d, which is compliant with the fact that IUD input bits are optimal when d = 0. Interestingly, for low values of d, both distributions lead to a lower bound roughly as good as the reference benchmarks, as evident from Table 6 (where the best values are shown in bold face).
d  Bound from [3]  Bound from [5]  Bound (39), optimized input  Bound (39), IUD input
0.01  0.919  not given in [5]  0.921  0.921 
0.02  0.858  not given in [5]  0.862  0.862 
0.03  0.805  not given in [5]  0.811  0.811 
0.04  0.757  not given in [5]  0.766  0.765 
0.05  0.713  0.728  0.724  0.722 
0.10  0.531  0.562  0.555  0.546 
9 Conclusions
We have presented novel upper bounds on the capacity of the IID binary deletion channel. All bounds have been obtained by revealing side information on suitable random processes, and by computing the capacity of the resulting genie-aided systems. The proposed approaches lead to a new state-of-the-art upper bound for most values of the deletion probability d, and provide novel insights on the channel capacity in the limiting scenarios d → 0 and d → 1. As a byproduct of our approach, we have also presented simple lower bounds, which turn out not to improve the existing ones.
References
[1] R. L. Dobrushin, "Shannon's theorems for channels with synchronization errors," Problems of Information Transmission, vol. 3, no. 4, pp. 11–26, 1967.
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, Inc., 1991.
[3] R. Gallager, "Sequential decoding for binary channels with noise and synchronization errors," tech. rep., Lincoln Lab. Group Report, Oct. 1961.
[4] E. Drinea and M. Mitzenmacher, "Improved lower bounds for the capacity of i.i.d. deletion and duplication channels," IEEE Trans. Inform. Theory, vol. 53, pp. 2693–2714, Aug. 2007.
[5] E. Drinea and A. Kirsch, "Directly lower bounding the information capacity for channels with I.I.D. deletions and duplications," in Proc. IEEE International Symposium on Information Theory, pp. 1731–1735, 2007.
[6] J. D. Ullman, "On the capabilities of codes to correct synchronization errors," IEEE Trans. Inform. Theory, vol. 13, pp. 95–105, Jan. 1967.
[7] S. Diggavi, M. Mitzenmacher, and H. D. Pfister, "Capacity upper bounds for the deletion channels," in Proc. IEEE International Symposium on Information Theory, pp. 1716–1720, 2007.
[8] R. E. Blahut, "Computation of channel capacity and rate distortion functions," IEEE Trans. Inform. Theory, vol. 18, pp. 460–473, July 1972.
[9] S. Arimoto, "An algorithm for calculating the capacity of an arbitrary discrete memoryless channel," IEEE Trans. Inform. Theory, vol. 18, pp. 14–20, Jan. 1972.
[10] A. Papoulis, Probability, Random Variables and Stochastic Processes. New York, NY: McGraw-Hill, 1991.
[11] W. Rudin, Principles of Mathematical Analysis. New York: McGraw-Hill, 1974.
[12] M. Mitzenmacher and E. Drinea, "A simple lower bound for the capacity of the deletion channel," IEEE Trans. Inform. Theory, vol. 52, pp. 4657–4660, Oct. 2006.