Fundamental Limits of Covert Packet Insertion
Abstract
Covert communication conceals the existence of the transmission from a watchful adversary. We consider the fundamental limits for covert communications via packet insertion over packet channels whose packet timings are governed by a renewal process of rate $\lambda$. Authorized transmitter Jack sends packets to authorized receiver Steve, and covert transmitter Alice wishes to transmit packets to covert receiver Bob without being detected by watchful adversary Willie. Willie cannot authenticate the source of the packets. Hence, he looks for statistical anomalies in the packet stream from Jack to Steve to attempt detection of unauthorized packet insertion. First, we consider a special case where the packet timings are governed by a Poisson process and we show that Alice can covertly insert $\mathcal{O}(\sqrt{\lambda T})$ packets for Bob in a time interval of length $T$; conversely, if Alice inserts $\omega(\sqrt{\lambda T})$ packets, she will be detected by Willie with high probability. Then, we extend our results to general renewal channels and show that in a stream of $N$ packets transmitted by Jack, Alice can covertly insert $\mathcal{O}(\sqrt{N})$ packets; if she inserts $\omega(\sqrt{N})$ packets, she will be detected by Willie with high probability.
Ramin Soltani^{∗}, Dennis Goeckel^{∗}, Don Towsley^{†}, and Amir Houmansadr^{†}
^{∗}Electrical and Computer Engineering Department, University of Massachusetts, Amherst, {soltani, goeckel}@ecs.umass.edu
^{†}College of Information and Computer Sciences, University of Massachusetts, Amherst, {towsley, amir}@cs.umass.edu
This work has been supported by the National Science Foundation under grants ECCS-1309573 and CNS-1525642, and appeared, in part, at the Allerton Conference on Communications, Control, and Computing in 2015 [1] and 2016 [2].
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Keywords: Covert Packet Insertion, Covert Packet Communication, Covert Wired Communication, Covert Channel, Low Probability of Detection, LPD, Network Security, Information Theory.
I Introduction
Privacy and security have become crucial issues in daily life as the use of communication systems has increased (e.g., telephone, email, social media) [3, 4, 5, takbiri2018asymptotichadian2018privacy, 6]. Information theoretic secrecy [7] and encryption [8] protect the secrecy of message contents; however, these techniques do not satisfy the security and privacy requirements of users in many scenarios. Recently, the need for another level of secrecy was highlighted by the Snowden disclosures [9]: users of a communication system often need not only secrecy for the contents of their messages, but also for hiding the existence of their communication. As a solution, covert communication ensures that a watchful adversary is not able to detect whether communication is taking place or not. Two applications of covert communication are preventing the tracking of users' daily activities and hiding the presence of military activities.
Steganography [10] is utilized to covertly embed information into an overt message on digital (and typically noiseless) channels. Alternatively, spread spectrum methods [11] provide covert communication on noisy channels. Information-theoretic limits of covert communications only recently gained attention, first with the study of additive white Gaussian noise (AWGN) channels [12, 13], which was later extended to provide a comprehensive characterization of the limits of covert communication over discrete memoryless channels (DMCs), optical channels, and AWGN channels [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26].
In this paper, we extend the work in [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26] to packet processes typical of wired computer networks. In computer networks, covert channels can be divided into two major categories [27]: covert storage channels and covert timing channels. A covert storage channel involves the writing of a shared storage location by one process and the reading of it by another; e.g., modifying headers of packets [28, 29, 30, 31]. Alternatively, a covert timing channel involves the exchange of information between two users by manipulation of the timings of some shared resources; e.g., embedding information in packet timings, first explored by Girling [32] and later studied by many others [33, 34, 35, 36, 37, 38, 39, 40, 41]. Applications of covert channels in TCP/IP [31, 42, 43], VoIP [44], LTE-A [45], and BitTorrent [46], as well as the establishment of covert communication over IPv4 [33, 47] and IPv6 [48], have also been studied.
Considerable work has focused on detection of covert channels [34, 49, 50, 51, 52, 53, 54] as well as eluding detection by leveraging the statistical properties of the legitimate channel [55]. Moreover, significant research has been performed on quantifying and optimizing the capacity of covert channels [56, 57, 58, 34, 59, 60, 42, 61, 62] by leveraging information-theoretic analysis and the use of various coding techniques [63, 64, 40]. In particular, Anantharam and Verdú [65] derived the Shannon capacity of the timing channel with a single-server queue, and Dunn [66] analyzed the secrecy capacity of such a system.
Per above, here we take a fundamental approach analogous to [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], but turn our attention to covert communication over wired channels in which communication takes place through packet transmissions. Specifically, we consider the scenario shown in Fig. 1. Authorized transmitter Jack sends packets to authorized receiver Steve. Assume that Alice wishes to transmit data covertly to Bob on this channel in the presence of an adversary, Willie, who is monitoring the channel. Willie can be in one of the two locations, either between Alice and Bob (Setting 1), or between Bob and Steve (Setting 2). Alice and Bob know that Willie is located at one or the other of these two places; however, they do not know which place he is located at. We assume Willie cannot authenticate the source of the packets (e.g., whether they are sent by Jack or not). However, he knows the statistical model of the timings of the packets transmitted by Jack. Alice can buffer and release Jack’s packets and insert her own packets. Also, Bob can authenticate packets, remove the ones originally inserted by Alice, and buffer and release Jack’s packets. We assume Alice can only send information to Bob by inserting her own packets into the channel, since she is not allowed to share a secret codebook with Bob and thus she is not able to send covert messages to Bob via packet timings; i.e., altering the timing of the packets according to a shared codebook, to embed information in interpacket delays (IPDs) [65]. In addition, transmission of information through packet timings is sensitive to natural network noise and thus is not applicable in scenarios where timing noise alters the transmitted packet timings (codeword) such that the receiver is not able to decode the message with the required decoding error (e.g. complex channels where the packet streams are first mixed and then separated). 
We answer this question: how many packets can Alice transmit to Bob without being detected by Willie?
We consider two statistical models for the timing process of Jack’s transmitted packets. First, we analyze a Poisson channel (Assumption 1) [1]; i.e., IPDs of Jack’s transmitted stream are modeled by independent and identically distributed (i.i.d.) exponential random variables with mean $1/\lambda$, where $\lambda$ is Jack’s packet rate, and Willie is aware of this. Therefore, Willie seeks to verify whether the packet process has the proper characteristics. We exploit the fact that the superposition of two independent Poisson processes is a Poisson process: Alice generates a Poisson process of low enough rate and uses it to govern the times at which she inserts the covert packets into the Jack-to-Steve channel. We assume Willie is aware of Alice’s transmission strategy (insertion scheme, rate, etc.) as well as what Bob can do, if they choose to communicate with each other.
Covertness, as defined formally in Section II, requires that Willie’s decision on whether Alice transmits or not be arbitrarily close to random guessing. In Theorem 1, we show that Alice can transmit $\mathcal{O}(\sqrt{\lambda T})$ packets covertly to Bob in a time interval of length $T$, where $\lambda$ is Jack’s packet rate. Conversely, we prove that if Alice transmits $\omega(\sqrt{\lambda T})$ packets during a time interval of length $T$, she will be detected by Willie with high probability.
Next, we extend the Poisson channel to a renewal channel [2] (Assumption 2), where the timings of Jack’s transmitted packets are modeled by a renewal process; i.e., IPDs of Jack’s transmitted stream are modeled by i.i.d. random variables with probability density function (pdf) and transmission rate packets per second, and Willie is aware of these characteristics. Therefore, Willie seeks to verify whether the packet process has the proper properties. Since the superposition of two independent renewal processes is not generally a renewal process, we use a technique different from the one employed in the Poisson channel.
The remainder of the paper is organized as follows. In Section II, we present the system model and definitions employed. We provide constructions and their analysis for the Poisson channel in Section III, and we analyze the renewal channel in Section IV. Section V contains the discussion of the results, and Section VII summarizes our results.
II System Model and Definitions
II-A System Model
As shown in Fig. 1, Jack transmits packets to Steve while a watchful warden Willie observes the packet flow from his vantage point. Willie does not have access to the contents of the packets and therefore cannot authenticate whether a packet is originally transmitted by Jack, or generated and inserted by Alice. Instead, based on the timings of the packets, Willie attempts to discern any irregularities that might indicate that someone is inserting packets into the channel. Alice’s goal is to insert her own packets in the stream of the packets sent by Jack so as to communicate covertly with Bob. Willie’s location is fixed; he is either between Alice and Bob (Setting 1, shown in Fig. 1(a)), or between Bob and Steve (Setting 2, shown in Fig. 1(b)), and Alice and Bob are unaware of his location.
Alice communicates with Bob by sending her packets into the channel, but Alice and Bob do not share a secret, thus preventing the distribution of a secret codebook to communicate via packet timings [65, 1, 2]. Alice can also buffer and release Jack’s transmitted packets. Bob can authenticate, receive and remove packets originally inserted by another party. He is also allowed to buffer and release Jack’s transmitted packets. We assume Willie knows the characteristics of Alice’s potential insertion scheme (rate, method of insertion, etc.) and Bob’s capabilities. We denote the IPDs of the packets departing Jack, Alice, and Bob by , , and , respectively.
We consider two sets of assumptions regarding the timing process of Jack’s packets:
Assumption 1 (Poisson channel model): Transmission times for the packets generated by Jack are modeled by a Poisson process with parameter ; i.e., IPDs of Jack’s transmitted stream are i.i.d. random variables with pdf , and Jack’s packet transmission rate is known to both Alice and Willie.
Assumption 2 (Renewal channel model): Transmission times of the packets transmitted by Jack are modeled by a renewal process; i.e., IPDs of Jack’s transmitted stream are positive i.i.d. random variables with pdf and Jack’s transmission rate is . Both Willie and Alice know and .
When IPDs are samples of and modeled by a renewal process, the arrival times are , where
(1) 
and the total number of arrivals within the interval is . Observe:
(2) 
For a Poisson process, (), we omit the subscripts of and .
II-B Definitions
Willie is faced with a binary hypothesis test: the null hypothesis $H_0$ corresponds to the case that Alice does not transmit, and the alternative hypothesis $H_1$ corresponds to the case that Alice transmits. We denote the distributions of IPDs that Willie observes by $\mathbb{P}_0$ and $\mathbb{P}_1$ under $H_0$ and $H_1$, respectively.
We denote by $\mathbb{P}_{FA}$ the probability of rejecting $H_0$ when it is true (type I error or false alarm), and by $\mathbb{P}_{MD}$ the probability of rejecting $H_1$ when it is true (type II error or missed detection). Willie uses classical hypothesis testing and seeks to minimize $\mathbb{P}_{FA} + \mathbb{P}_{MD}$.
Similar to the definition of covertness in [16, 17, 1, 2, 18, 67, 68], and invisibility in [69, 70], we define covertness:
Definition 1.
(Covertness) Alice and Bob’s communication is covert if and only if Willie’s sum of probabilities of error is lower bounded by $1 - \epsilon$ for any $\epsilon > 0$ [13], for Willie in each of his possible locations.
We present results under the assumption that . However, this results in covertness for the general case [71, Appendix A].
We use standard “Big O”, “Little Omega”, and “Big Theta” notations [72].
III Poisson Channels (Assumption 1)
In this section, we consider the fundamental limits of covert packet insertion for the Poisson channel (Assumption 1). As evident from the proof, the possibility that Willie is located after Bob (Setting 2) is trivially addressed under Assumption 1. We will see this is not the case for the renewal channel model considered in Section IV.
Theorem 1.
In a Poisson channel with rate $\lambda$, Alice can covertly insert $\mathcal{O}(\sqrt{\lambda T})$ packets in a time interval of length $T$. Conversely, if Alice attempts to insert $\omega(\sqrt{\lambda T})$ packets in a time interval of length $T$, there exists a detector that Willie can use to detect her with arbitrarily low sum of error probabilities .
Proof.
(Achievability)
Construction: Alice generates a Poisson process with parameter independent of the timings of Jack’s packets and, at each point of the process, inserts her own packet into the channel between Jack and Steve. Bob collects and removes the packets inserted by Alice.
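The construction can be illustrated with a short simulation. This is a minimal sketch rather than the authors' implementation: the function names, the rates, and the horizon are hypothetical choices made for illustration, and packets are represented only by their timestamps and a source label that Bob (but not Willie) is assumed to be able to read.

```python
import random

def poisson_arrivals(rate, horizon, rng):
    """Return sorted arrival times of a Poisson process on [0, horizon]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # i.i.d. exponential IPDs with mean 1/rate
        if t > horizon:
            return times
        times.append(t)

def insert_covert_packets(jack_times, alice_rate, horizon, rng):
    """Alice's side: merge Jack's packets with an independent Poisson stream."""
    alice_times = poisson_arrivals(alice_rate, horizon, rng)
    merged = sorted([(t, "jack") for t in jack_times] +
                    [(t, "alice") for t in alice_times])
    return merged, alice_times

def remove_covert_packets(merged):
    """Bob's side: drop Alice's packets, leaving Jack's timings untouched."""
    return [t for t, source in merged if source == "jack"]

# Illustrative (assumed) parameters: Jack's rate 5, Alice's rate 0.2, horizon 100.
rng = random.Random(0)
jack = poisson_arrivals(rate=5.0, horizon=100.0, rng=rng)
merged, alice = insert_covert_packets(jack, alice_rate=0.2, horizon=100.0, rng=rng)
restored = remove_covert_packets(merged)
```

Because the insertion times are generated independently of Jack's stream, removing Alice's packets returns exactly Jack's original timings, which is the property used for Setting 2 in the analysis.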
Analysis: (Covertness) First, for each of the two possible locations of Willie, we show that communication is covert. Next, we calculate the number of covert packets transmitted by Alice.
(Setting 1: Willie is between Alice and Bob, as shown in Fig. 1(a)): Willie observes the packets on the channel between Alice and Bob and decides whether Alice has inserted packets intended for Bob () or not (). Note that and correspond to Poisson processes with rates and respectively. By the Neyman-Pearson lemma [73, Ch. 3.2 and 13.1], an optimal hypothesis test that minimizes the sum of error probabilities is the likelihood ratio test (LRT) between the null hypothesis and the alternative hypothesis , given by [74, Ch. 3.5.2]:
(3) 
where is the number of packets that Willie observes in , is the probability mass function (pmf) of the number of packets that Willie observes under the null hypothesis corresponding to a Poisson process with rate , and is the pmf for the number of packets that Willie observes under hypothesis corresponding to a Poisson process with rate . Suppose Alice sets
(4) 
By (3), we can see that the number of packets observed during the time interval of length is a sufficient statistic by which Willie can perform the optimal hypothesis test to decide whether Alice transmits or not. For any test on the number of packets during time [13],
(5) 
where is the relative entropy between and . Next, we show how Alice can lower bound the sum of average error probabilities by upper bounding . For the given and the relative entropy is [75]
where the second-to-last step is true because for , and the last step is due to the definition of given in (4). Consequently, , and thus for .
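The covertness calculation for Setting 1 can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's derivation: it writes Jack's rate as `lam`, the interval length as `T`, and takes Alice's rate to scale as `c * sqrt(lam / T)` for a hypothetical constant `c`; it evaluates the relative entropy from the no-insertion count distribution to the insertion count distribution (the direction is an assumption) and applies the standard bound that the sum of error probabilities is at least one minus the square root of half the relative entropy.

```python
import math

def kl_poisson(mean_p, mean_q):
    """Relative entropy D(P||Q) in nats between Poisson(mean_p) and Poisson(mean_q)."""
    return mean_p * math.log(mean_p / mean_q) + mean_q - mean_p

def error_sum_lower_bound(kl):
    """Lower bound 1 - sqrt(kl / 2) on the sum of Willie's error probabilities."""
    return 1.0 - math.sqrt(kl / 2.0)

# Illustrative (assumed) values.
lam, T, c = 10.0, 1000.0, 0.2
lam_a = c * math.sqrt(lam / T)          # Alice's insertion rate
kl = kl_poisson(lam * T, (lam + lam_a) * T)
bound = error_sum_lower_bound(kl)
expected_inserted = lam_a * T           # equals c * sqrt(lam * T)
```

With this scaling the relative entropy stays below `c**2 / 2` regardless of `T` (a consequence of `ln(1 + x) >= x - x**2 / 2` for `x >= 0`), while the expected number of inserted packets grows as the square root of `lam * T`.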
(Setting 2: Willie is between Bob and Steve, as shown in Fig. 1(b)): Willie observes the packets transmitted by Bob. Since Alice inserts her own packets independently of the channel, her insertion does not change the timing of Jack’s packets. Since Bob removes Alice’s inserted packets, Willie observes the original timings of the packets transmitted by Jack, and thus Alice and Bob’s communication is covert.
(Number of Covert Packets) Alice inserts packets according to a Poisson process with rate . Let denote the time that Alice inserts the packet, and denote the number of packets inserted by Alice. We focus on . By (2),
where the s are i.i.d. exponentially distributed IPDs with mean , which goes to infinity as . We introduce ; is a sequence of i.i.d. exponential random variables with finite mean and variance . Consider
(6) 
where the last step follows from the weak law of large numbers (WLLN) which yields . By (6), , as , for all . Consequently, Alice can insert packets covertly.
(Converse) To establish the converse, we provide an explicit detector for Willie that is sufficient to limit Alice’s throughput across all potential transmission schemes (i.e., not necessarily insertion according to a Poisson process). Suppose that Willie observes a time interval of length and wishes to detect whether Alice transmits or not. Since he knows that the packet arrival process for the link between Jack and Steve is a Poisson process with parameter , he knows the expected number of packets in an interval . Therefore, he counts the number of packets in this interval and performs a hypothesis test by setting a threshold and compares to . If , Willie decides ; otherwise, he decides . Consider ,
(7) 
When is true, Willie observes a Poisson process with parameter ; hence,
Therefore, applying Chebyshev’s inequality on (7) yields . Thus, , if Willie sets , he can achieve
Next, we will show that if Alice inserts packets, she will be detected by Willie with high probability. Consider :
(8)  
where is the number of packets inserted by Alice and is the number of packets inserted by Jack. We show in Appendix A that for all ,
(9) 
Since and are arbitrary, is arbitrarily small whenever . ∎
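The counting detector used in the converse can be sketched as follows. This is a hedged illustration: the names `lam`, `T`, and `alpha` (a target false-alarm level) are assumptions, and the threshold is chosen so that Chebyshev's inequality, applied to a Poisson count with mean and variance `lam * T`, bounds the false-alarm probability by `alpha`. The exact constants used in the proof are not reproduced here.

```python
import math

def count_threshold(lam, T, alpha):
    """Threshold tau satisfying lam * T / tau**2 == alpha (Chebyshev bound)."""
    return math.sqrt(lam * T / alpha)

def willie_count_test(num_packets, lam, T, alpha):
    """Return True if Willie declares that packets were inserted.

    Willie compares the observed count with its expected value under the
    null hypothesis, lam * T, plus the threshold.
    """
    return num_packets >= lam * T + count_threshold(lam, T, alpha)

# Illustrative (assumed) values.
tau = count_threshold(lam=10.0, T=1000.0, alpha=0.01)
```

Since the threshold grows only as the square root of `lam * T`, any insertion whose size grows strictly faster than that eventually pushes the count past the threshold, which is the intuition behind the converse.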
IV Renewal Channels (Assumption 2)
The packet arrival processes measured in many networks demonstrate non-Poisson behavior. Hence, in this section, we extend our results from Section III to the general renewal channel. Per Section II, we assume that the IPDs of Jack’s transmitted stream are i.i.d. with pdf ; thus, Jack’s transmission rate is .
For Poisson channels, we took advantage of the fact that the superposition of two independent Poisson processes is a Poisson process. However, the superposition of two independent renewal processes is not necessarily a renewal process. Therefore, if Alice inserts her packets in the channel according to a renewal process, the packet timings that Willie observes under () are not necessarily a renewal process, so the derivation of and the calculation of the relative entropy between and , which is required in the covertness analysis, become challenging. Note that there is no special class of renewal processes (except Poisson processes) that makes the calculation easier; if the superposition of two ordinary renewal processes is an ordinary renewal process, then those processes are either Poisson [76, 77] or binomial-like processes [77], which are not applicable to our scenarios. Therefore, we employ an alternative technique for Alice’s insertion of packets.
In [2], we employed the following technique: Alice and Bob employ a two-phase scheme. In the first phase, Alice (slightly) slows down the packet stream to buffer packets. In the second phase, she generates a renewal process with a rate higher than Jack’s transmission rate. For each packet transmission during the second phase, Alice flips an unfair coin to decide whether to send one of her packets or one of Jack’s packets. Although this technique is reasonable and its covertness analysis is accurate, the reliability analysis in [2] relied on the approximation that a regular random walk can model Alice’s buffer length in the second phase, which is not strictly true. Moreover, it did not allow for the case where Willie is between Bob and Steve (Setting 2) in the covertness analysis. We can employ [78, Theorem 9.1], which is also mentioned in [79, Theorem 4], to relax the approximation in the reliability analysis.
Here, we introduce another strategy that allows for accurate analysis. Alice and Bob employ a two-phase scheme. In the first phase, Bob transmits Jack’s packets at a rate (slightly) smaller than Jack’s packet rate so as to build up a backlog of packets in his buffer. In this phase, Alice remains idle except for calculating by simulating Bob’s buffering process. In the second phase, Alice replaces of Jack’s packets with packets of her own and Bob replaces Alice’s inserted packets with packets in his buffer. The second phase ends when the total number of (Alice’s and Jack’s) packets transmitted by Alice is .
In Lemma 1, we derive the number of packets that Bob can buffer when the total number of packets that Bob transmits is . Consider which is the scaled version of , where . Since , the renewal processes whose interarrival timings are governed by has a smaller rate than that of . Lemma 1 requires that satisfies the following conditions [80, Ch. 2.6] which are mentioned in [81, Theorem 1] as regularity conditions for maximum likelihood estimators with :
(10)  
(11)  
(12) 
Among the probability distributions that satisfy conditions (10)-(12) are the generalized gamma distribution and its special cases: the exponential, chi-squared, Rayleigh, Weibull, gamma, and Erlang distributions.
We require that the support of be because 1) IPDs are positive; and 2) among the distributions with nonnegative support, conditions (10)-(12) are not satisfied by the distributions whose support is not , such as the Pareto, uniform, and Beta distributions. Intuitively, the latter is required since Bob scales up the pdf of IPDs to where is defined later. If the support of is not , then with high probability, the new pdf of the interpacket delays produces an IPD that does not fall in the support of . Hence, Willie will observe an interpacket delay that cannot be generated from , and thus Willie detects Bob’s buffering.
Lemma 1.
Under the conditions given above for [81, Theorem 1] for the renewal process characterizing the packet timings on the link from Jack to Steve, Bob can covertly buffer packets while transmitting of Jack’s packets, as long as satisfies conditions (10)-(12). Conversely, if Bob buffers packets while receiving of Jack’s packets, there exists a detector that Willie can use to detect such a buffering with arbitrarily low sum of error probabilities .
Proof.
(Achievability)
Construction: Since Alice does not insert any packets and she only relays Jack’s packets, Bob receives Jack’s packet stream. In this lemma, the term “packet” will refer to Jack’s packets. For a fixed number of packets , Bob scales up the IPDs by where , i.e., if he receives the packet at , he sends it at time , as shown in Fig. 2.
First, we show that Bob can buffer packets, then we demonstrate covertness, and finally in the converse case we show that Bob cannot buffer packets covertly.
Analysis: (Number of Buffered Packets) Bob sets
where and is a constant defined later. Note that the first phase ends at time when Bob transmits the packet. From to , if Bob receives a packet of Jack at time , he transmits it at time . Let be the total number of packets received from Jack within the interval , and be the time of arrival of the packet from Jack. The total number of packets that Bob receives from Jack and the total number of packets that Bob buffers are and
respectively.
We show in Appendices B and C, respectively, that
(13)  
(14) 
By (13), for all , , as . Therefore, Bob buffers packets when he transmits of Jack’s packets.
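The buffering mechanism can be made concrete with a small helper. This is a sketch under assumptions: `rho` stands for the (small, positive) slowdown factor, `arrival_times` are the times at which Bob receives Jack's packets, and Bob is assumed to send the packet received at time `t` at time `(1 + rho) * t`. The buffer size is counted at the instant Bob transmits the `n`-th packet; all names are hypothetical.

```python
import bisect

def buffered_after_slowdown(arrival_times, rho, n):
    """Number of packets in Bob's buffer when he transmits the n-th packet.

    arrival_times must be sorted in increasing order and contain at least
    n entries; packet i (1-indexed) arrives at arrival_times[i - 1].
    """
    end_of_phase = (1.0 + rho) * arrival_times[n - 1]
    # Packets received from Jack up to the end of the first phase.
    received = bisect.bisect_right(arrival_times, end_of_phase)
    return received - n

# Deterministic illustration: one packet per time unit, 50% slowdown.
arrivals = [float(i) for i in range(1, 21)]
backlog = buffered_after_slowdown(arrivals, rho=0.5, n=10)
```

In this toy example the tenth packet leaves at time 15, by which point fifteen packets have arrived, so five are buffered. In the lemma the slowdown factor shrinks with the number of packets, which keeps the change in the IPD statistics small.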
(Covertness) Now, we show that Bob’s buffering is covert. If Willie is between Alice and Bob (Setting 1), he will not observe any changes in packet timings due to Bob’s buffering, and thus the covertness follows immediately. Therefore, we present the analysis for the case where Willie is between Bob and Steve (Setting 2). We assume Willie knows the number of packets being slowed down and the scaling factor that Bob has possibly used. Upon observing the first packets, Willie decides whether Bob has not modified the packet timings (), or he has slowed down those packets . If Willie applies an optimal hypothesis test that minimizes on the IPDs, then arguments similar to those leading to (5) yield:
(15) 
where:
Therefore,
(16) 
Since the regularity conditions (10)-(12) hold, [80, Ch. 2.6] yields:
(17) 
where is a positive constant derived in Appendix D,
(18) 
Note that depends on . By (16) and (17),
Because , . Thus, by (15), as and Bob covertly buffers packets when he transmits of Jack’s packets.
(Converse) Since Willie knows , he knows the expected sum of the IPDs of packets. Therefore, he calculates the average observed IPD and performs a hypothesis test by setting a threshold and comparing with . If , he decides ; otherwise, he decides . Observe
(19) 
When is true, Willie observes a renewal process with rate , with variance ; hence,
Therefore, applying Chebyshev’s inequality on (19) yields . Therefore, if Willie sets , for any , he achieves .
Next, we will show that if Bob buffers packets, he will be detected by Willie with high probability.
When Bob buffers packets, he will transmit packets during the time that Jack transmits packets. Therefore, . Now, let us consider . When is true, . Thus:
Note that is the sum of i.i.d. random variables with mean and variance . Therefore, the central limit theorem (CLT) yields , where is a Gaussian random variable with mean zero and variance . Therefore, as ,
(20) 
where (20) is true since . Thus, if then . Combined with the results for the probability of false alarm above, if Bob collects packets, Willie can choose a to achieve any (small) and desired. ∎
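Willie's detector in the converse of Lemma 1 can be sketched in the same spirit. The names below are assumptions for illustration: `lam` is Jack's packet rate, `sigma` the standard deviation of a single IPD, and `alpha` a target false-alarm level. The threshold follows from Chebyshev's inequality applied to the average of `n` i.i.d. IPDs, whose variance is `sigma**2 / n`. The direction of the test (declaring buffering when the average IPD is too large) is an assumption, motivated by the fact that slowing packets down lengthens their IPDs.

```python
import math

def ipd_threshold(sigma, n, alpha):
    """Threshold tau with sigma**2 / (n * tau**2) == alpha (Chebyshev bound)."""
    return sigma / math.sqrt(alpha * n)

def willie_ipd_test(ipds, lam, sigma, alpha):
    """Return True if the average observed IPD is suspiciously large."""
    average = sum(ipds) / len(ipds)
    return average >= 1.0 / lam + ipd_threshold(sigma, len(ipds), alpha)
```

For example, with `lam = 1`, `sigma = 1`, `alpha = 0.25`, and 100 observed IPDs, the threshold is 0.2, so an average IPD of 1.3 triggers a detection while an average of 1.0 does not.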
Next, we leverage the results of Lemma 1 to present and prove the results for packet insertion on a renewal channel. Although Alice and Bob do not know the actual location of Willie, their strategy guarantees covertness irrespective of Willie’s location. Then, we conclude that if he analyzes the whole stream of packets transmitted by Alice, the communication is covert.
Theorem 2.
In a renewal channel whose IPDs have , with conditions (10)-(12) true, Alice can covertly insert $\mathcal{O}(\sqrt{N})$ packets in a packet stream of length $N$. Conversely, if Alice attempts to insert $\omega(\sqrt{N})$ packets in a packet stream of length $N$, there exists a detector that Willie can use to detect her with arbitrarily low sum of error probabilities .
Proof.
(Achievability)
Construction: Alice and Bob employ a two-phase scheme. During the buffering phase, Alice is idle but Bob slows down Jack’s packets to build up packets in his buffer, i.e., if he receives a packet at time , he transmits it at time where
(21) 
and is any constant that satisfies . The first phase ends when Bob transmits the packet of Jack. From Lemma 1, Bob can buffer packets covertly. Alice knows Bob’s buffering process because she knows the timings of packets transmitted by Jack and ; thus, she calculates the number of packets buffered by Bob. In the second phase, Alice replaces of Jack’s packets with packets of her own, and Bob replaces Alice’s packets by Jack’s packets in his buffer. Alice and Bob do this without changing the order of Jack’s packets. Furthermore, Bob delays each packet in the second phase for seconds, where is the time elapsed between the moment that Bob receives the last packet in the first phase until the end of the first phase. We will later explain how this delay makes the pdf of Bob’s first IPD in the second phase equal to .
Since Willie cannot verify the source of the packets, Alice can choose any subset of size of packets transmitted by Jack in the second phase to replace them with her own packets. Here, we propose a scheme where the locations of Alice’s packets are random. To decide whether to replace a packet, she uses a Bernoulli decision, i.e., each time she receives a packet from Jack, first she generates a random variable according to a Bernoulli distribution with . If she observes “Success”, she replaces the packet; otherwise, she does not. She stops when she replaces the packet. The second phase ends when the total number of (Alice’s and Jack’s) packets transmitted by Alice is . At the end of the second phase, Alice will have of Jack’s packets in her buffer. After the transmission, Alice and Bob will relay Jack’s packets. Alice transmits Jack’s oldest packet in her buffer and stores the newly received packet to keep the packets transmitted by Jack in order, and Bob, whose buffer is empty, forwards Jack’s packets.
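Alice's Bernoulli replacement rule in the second phase can be sketched as follows. The parameter names are hypothetical: `num_packets` is the number of Jack's packets she sees in the second phase, `m` the number of packets she intends to replace, and `p` the success probability of each Bernoulli decision. The value of `p` used in the paper is not reproduced here.

```python
import random

def choose_replacements(num_packets, m, p, rng):
    """Indices of Jack's packets that Alice replaces with her own.

    Each received packet is replaced with probability p, independently,
    and Alice stops replacing once m packets have been replaced.
    """
    replaced = []
    for index in range(num_packets):
        if len(replaced) == m:
            break  # quota reached; remaining packets are relayed unchanged
        if rng.random() < p:
            replaced.append(index)
    return replaced

# Illustrative (assumed) values.
rng = random.Random(1)
positions = choose_replacements(num_packets=1000, m=20, p=0.05, rng=rng)
```

Because only the identity of a packet changes and not its timing, the choice of positions has no effect on what Willie observes; the randomization simply avoids a fixed, predictable placement.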
Analysis: (Covertness) First, for each of the two possible locations of Willie, we show that communication is covert. Next, we calculate the number of covert packets transmitted by Alice.
(Setting 1: Willie is between Alice and Bob, as shown in Fig. 1(a)): Since Alice does not change packet timings and Willie is between Alice and Bob, Willie observes the original packet timings transmitted by Jack and covertness follows immediately.
(Setting 2: Willie is between Bob and Steve, as shown in Fig. 1(b)): Recall that Willie knows Alice and Bob’s transmission scheme and parameters, the time they start and end each phase, and the scaling factor that Bob has used. We first assume Willie analyzes the packets in the two phases separately and show that the communication is covert. Then, we conclude that if he analyzes the whole stream of packets transmitted by Bob together, the communication is covert.
In the first phase, Bob slows down packets from Jack to buffer packets until he transmits packet of Jack. By Lemma 1, Bob buffers packets, while for all ,
where and are joint pdfs of the IPDs in the first phase, when and are true respectively, and and are the probability of rejecting when it is true and the probability of rejecting when it is true, respectively in the first phase. Thus, Bob’s buffering is covert.
Next, we show that Willie observes Jack’s original IPDs in the second phase. When the first phase ends Bob has buffered packets, transmitted packets, and received packets from Jack. Recall that in the second phase Bob delays each packet seconds, where is the time elapsed between the moment that Bob receives the last packet in the first phase () until the end of the first phase (), i.e., . Denote by the time elapsed between end of the first phase () and the moment that Bob receives the first packet in the second phase (), i.e.,
Since Bob delays the packets in the second phase seconds, Bob’s first IPD in the second phase will be
which is Jack’s original IPD, and thus has the pdf (see Fig. 3). Since all other IPDs in the second phase are also Jack’s original IPDs, Willie observes the original IPDs transmitted by Jack and thus the covertness follows immediately for the second phase. Denote by and the joint pdfs of the IPDs in the second phase, when and are true, respectively, and by and the probability of rejecting when it is true and the probability of rejecting when it is true, respectively, in the second phase. Thus,
(22) 
Combined with the results of covertness for the first phase, if Willie analyzes the two sequences of packets in the first and second phase separately, the communication is covert, i.e., his sum of error probabilities in each phase is lower bounded by for all .
Now assume that Willie analyzes the entire sequence of packets from the first and second phase together. Since and ,
Consequently, Alice can achieve for all .
(Number of Packets) Recall that the first phase ends when Bob transmits the packet of Jack. Thus, replacing with in (13) and (14) yields and , respectively, as . Recall that is given in (18). Since for all , Alice can insert packets in a packet stream of length .
(Converse) The argument follows analogously to that of the converse in Lemma 1. Suppose that Willie observes packets and wishes to detect whether Alice has done nothing over the channel () or she has inserted packets. He calculates average observed IPD and sets a threshold ; if , he decides ; otherwise, he decides . We can show that if Willie sets , for any , he achieves . Willie knows that if Alice chooses to insert packets, she will use the time of transmission of packets from Jack to do so. Therefore, if is true, then . Using this we can show that if Alice inserts packets, then . Thus, if Alice inserts packets, Willie can choose a to achieve any (small) and desired. ∎
V Discussion
V-A Alice’s insertion without the buffering phase
In Theorem 2, Alice and Bob use a two-phase scheme. However, we could consider a simpler (one-phase) scheme. Alice generates a process with a (slightly) higher rate by generating packet transmission events according to the following pdf , where
(23) 
and . Note that for large enough . Then:

She buffers every packet she receives.

Every time she generates a packet transmission event, she transmits one of Jack’s packets from her buffer if one is there; if not, she sends a packet of her own.
Unlike Alice and Bob’s two-phase scheme in Theorem 2, this scheme does not yield an infinite delay for packets (see (24)). However, it does not enable Alice to covertly insert packets; in fact, Alice cannot insert packets.
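The mechanics of the one-phase scheme can be sketched as a short simulation. This is a sketch under stated assumptions: Jack's IPDs are taken to be exponential, Alice's transmission events are drawn from an exponential pdf with rate inflated by a hypothetical factor (1 + epsilon) rather than from the pdf in (23), and all names are illustrative.

```python
import random


def one_phase_insertion(n_jack, rate, epsilon, rng):
    """Simulate Alice relaying n_jack of Jack's packets with the one-phase scheme.

    Returns (inserted, total_sent): the number of Alice's own packets and the
    total number of packets she transmits.
    """
    # Jack's arrival times at Alice (assumed exponential IPDs).
    arrivals = []
    t = 0.0
    for _ in range(n_jack):
        t += rng.expovariate(rate)
        arrivals.append(t)

    buffered = 0      # Jack's packets currently waiting in Alice's buffer
    next_arrival = 0  # index of the next packet from Jack not yet received
    relayed = 0       # Jack's packets already forwarded
    inserted = 0      # Alice's own packets sent so far
    clock = 0.0

    while relayed < n_jack:
        # Next transmission event of Alice's slightly faster process.
        clock += rng.expovariate(rate * (1.0 + epsilon))
        # Buffer every packet from Jack that has arrived by now.
        while next_arrival < n_jack and arrivals[next_arrival] <= clock:
            buffered += 1
            next_arrival += 1
        if buffered > 0:
            buffered -= 1  # forward one of Jack's packets
            relayed += 1
        else:
            inserted += 1  # buffer empty: send one of Alice's own packets

    return inserted, relayed + inserted


inserted, total_sent = one_phase_insertion(
    n_jack=5000, rate=1.0, epsilon=0.1, rng=random.Random(2)
)
print(inserted, total_sent)
```

The simulation only counts how many of Alice's packets the rule emits; it says nothing about covertness, which depends on how the inflation factor scales with the stream length.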
Theorem 3.
Consider the above scheme. There is no function such that , where is the number of packets intended for Bob that Alice can insert in a packet stream of length .
See the proof in Appendix E.
V-B Packet delays due to buffering
Our scheme requires Bob to slow down the packet stream to buffer packets, which results in packet delays. According to Lemma 1, since Bob receives the packet at and he sends it at time , Bob causes a delay of for the packet. The average delay for the packets transmitted by Bob goes to as because:
(24) 
Note that each packet is delayed for an amount of time that is proportional to its time of arrival. According to the proof of Lemma 1, this large delay does not help Willie detect Bob’s actions because Willie does not know the original packet timings; he knows only their statistical properties, which change only slightly.
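The growth of the average delay can be illustrated numerically. The sketch below assumes, hypothetically, that Bob stretches every arrival time by a factor (1 + rho) with rho = 1/sqrt(n), so that a packet arriving at time t is delayed by rho * t; the exact scaling used in Lemma 1 is not reproduced here, and the exponential arrival model and function name are likewise assumptions.

```python
import math
import random


def average_delay(n, rate, rng):
    """Average delay over n packets when each arrival time t is stretched to (1 + rho) * t."""
    rho = 1.0 / math.sqrt(n)  # hypothetical scaling factor, shrinking with n
    t = 0.0
    total_delay = 0.0
    for _ in range(n):
        t += rng.expovariate(rate)  # arrival time of the next packet
        total_delay += rho * t      # departure (1 + rho) * t minus arrival t
    return total_delay / n


rng = random.Random(3)
short_stream = average_delay(100, 1.0, rng)
long_stream = average_delay(10_000, 1.0, rng)
print(short_stream, long_stream)
```

Under these assumptions the average delay grows roughly like sqrt(n) / 2 for unit rate, even though the per-packet stretch factor shrinks, which matches the qualitative statement that the average delay diverges with the stream length.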
V-C Higher throughput via timing channel and bit insertion
In this paper, Alice is allowed to buffer packets transmitted by Jack and release them when necessary; thus, she is able to alter the timings of the packets. This suggests that Alice can also alter the timings of the packets to send information to Bob [65] and thereby achieve a higher throughput for sending covert information. However, this would require Alice and Bob to share a secret key (unknown to adversary Willie) prior to the communication, which is not possible in many scenarios. Also, sending the information through IPDs (timing channel) is sensitive to timing noise and is thus not applicable in channels with a high level of timing noise, such as complex channels in which multiple streams of packets are mixed and separated. In addition, a timing channel approach does not work over channels with zero capacity when packet timing is employed (e.g., deterministic queues). However, packet insertion works over such channels. Fig. 4 depicts an example.
If we assume Alice and Bob can share a codebook and the altering of timings in the channel can be modeled by a queue, sending information via packet timing is studied for Poisson packet channels in [1, Theorem 2] and for renewal channels in [2, Theorem 5].
Another way to communicate covertly on a packet channel is bit insertion, where Alice inserts bits in a subset of the packets [71]. This technique requires that packets have available space in their payload and a minimum of one bit in their header. In addition, Alice and Bob need to share a secret prior to the communication. These conditions can be satisfied only in some scenarios such as video streaming applications with variable bit rate codecs.
V-D Covertness of scaling up/down the IPDs
In Lemma 1, we showed that when Bob scales down the IPDs such that their pdf becomes , if conditions (10)-(12) hold, Bob’s scaling is covert as long as . Similarly, we can show that if he scales up IPDs such that their pdf becomes , Bob’s scaling is covert as long as and conditions (10)-(12) hold when is replaced with .
V-E Packets in Alice’s buffer after the second phase
The construction of Theorem 2 implies that Alice will have packets in her buffer at the end of the second phase. Assuming that Alice will always be on the link, having packets in her buffer does not cause any problems since, after Alice and Bob’s communication is done, they will only relay Jack’s packets. Note that Alice can insert the packets in her buffer into the channel according to the timings of a Poisson process with a small rate. The covertness analysis of this scheme is challenging and requires calculating the relative entropy between a renewal process and its superposition with a Poisson process, which is relegated to future work.
VI Future Work
A key goal is to establish the fundamental limits of packet insertion in channels whose packet timings follow a general point process. We will let Alice insert packets on the channel according to a Poisson process with a small rate, independent of the channel. Then, we plan to employ the results of Girsanov’s theorem to calculate the relative entropy between the point process governing the timings of the packets on the channel and the superposition of that point process with a Poisson process. Another direction for future work is analyzing covert throughput when the packet timings of the channel follow a Poisson process with a variable rate; in this case, we expect to be able to exploit Willie’s difficulty in estimating the current packet rate under to allow for the insertion of packets.
VII Conclusion
We have presented two scenarios for covert communication on a packet channel. In a Poisson channel, where packet timings are governed by a Poisson process, Alice inserts her own packets into the channel but does not modify the timing of other packets. We established that Alice can covertly transmit packets to Bob in a time interval of length ; conversely, if Alice inserts packets, she will be detected by Willie with high probability. In a renewal channel, where the packet timings are governed by a general renewal process, we showed that Alice can covertly insert packets into the channel in a packet stream of length . Conversely, if she inserts packets, she will be detected by Willie with high probability.
A Proof of (9)
Define and . Employing (8) and the law of total probability yields:
(25) 
Consider the first term on the right hand side (RHS) of (25). Substituting the events and yields:
(26) 
where is true since the condition in the probability is . If Alice inserts packets, then (26) yields
Consider the second term on the RHS of (25). From [82, p. 40],
Since is arbitrary, we can choose small enough such that . Thus, if , then for all .