Binary Causal-Adversary Channels
Abstract
In this work we consider the communication of information in the presence of a causal adversarial jammer. In the setting under study, a sender wishes to communicate a message to a receiver by transmitting a codeword $\mathbf{x} = (x_1, \dots, x_n)$ bit-by-bit over a communication channel. The adversarial jammer can view the transmitted bits $x_i$ one at a time, and can change up to a $p$-fraction of them. However, the decisions of the jammer must be made in an online or causal manner. Namely, for each bit $x_i$ the jammer’s decision on whether to corrupt it or not (and on how to change it) must depend only on $x_j$ for $j \le i$. This is in contrast to the “classical” adversarial jammer, which may base its decisions on its complete knowledge of $\mathbf{x}$. We present a non-trivial upper bound on the amount of information that can be communicated. We show that the achievable rate can be asymptotically no greater than $\min\{1 - H(p), (1 - 4p)^+\}$. Here $H(\cdot)$ is the binary entropy function, and $(1 - 4p)^+$ equals $1 - 4p$ for $p \le 0.25$, and $0$ otherwise.
I Introduction
Consider the following adversarial communication scenario. A sender Alice wishes to transmit a message $u$ to a receiver Bob. To do so, Alice encodes $u$ into a codeword $\mathbf{x}$ and transmits it over a binary channel. The codeword $\mathbf{x}$ is a binary vector of length $n$. However, Calvin, a malicious adversary, can observe $\mathbf{x}$ and corrupt up to a $p$-fraction of the transmitted bits, i.e., $pn$ bits.
In the classical adversarial channel model, e.g., [4], it is usually assumed that Calvin has full knowledge of the entire codeword $\mathbf{x}$, and based on this knowledge (together with the knowledge of the code shared by Alice and Bob) Calvin can maliciously plan what error to impose on $\mathbf{x}$. We refer to such an adversary as an omniscient adversary. For binary channels, the optimal rate of communication in the presence of an omniscient adversary has been an open problem in classical coding theory for several decades. The best known lower bound is given by the Gilbert-Varshamov bound [10, 18], which implies that Alice can transmit at rate $1 - H(2p)$ to Bob. Conversely, the tightest upper bound was given by McEliece et al. [12], and has a positive gap from the lower bound for all $0 < p < 1/4$ (see Fig. 1).
In this work we initiate the analysis of coding schemes that allow communication against certain adversaries that are weaker than the omniscient adversary. We consider adversaries that behave in a causal or online manner. Namely, for each bit $x_i$, we assume that Calvin decides whether to change it or not (and if so, how to change it) based on the bits $x_j$, for $j \le i$ alone, i.e., the bits that he has already observed. In this case we refer to Calvin as a causal adversary.
Causal adversaries arise naturally in practical settings, where adversaries typically have no a priori knowledge of Alice’s message $u$. In such cases they must simultaneously learn $u$ based on Alice’s transmissions, and jam the corresponding codeword $\mathbf{x}$ accordingly. This causality assumption is reasonable for many communication channels, both wired and wireless, where Calvin is not co-located with Alice. For example, consider the scenario in which the transmission of $\mathbf{x}$ is done during $n$ channel uses over time, where at time $i$ the bit $x_i$ is transmitted over the channel. Calvin can only corrupt a bit $x_i$ when it is transmitted (and thus his error is based on his view so far). To decode the transmitted message, Bob waits until all the bits have arrived. As in the omniscient model, Calvin is restricted in the number of bits he can corrupt. This might be because of limited processing power or limited transmit energy.
Recently, the problem of codes against causal adversaries was considered and solved by the authors [6] for large channels, i.e., channels where Alice’s codeword $\mathbf{x}$ is considered to be a vector of length $n$ over a field $\mathbb{F}_q$ of “large” size $q$. Each symbol may represent a large packet of bits in practice. Calvin is allowed to arbitrarily corrupt a $p$-fraction of the symbols, rather than bits. A tight characterization of the rate-region for various scenarios is given in [6], and computationally efficient codes that achieve these rate-regions are presented. However, the techniques used in characterizing the rate-region of causal adversaries over large channels do not work over binary channels. This is because each symbol in a large channel can contain within it a “small” hash that can be used to verify the symbol. This is the crux of the technique used to achieve the lower bounds in [6]. We currently do not know how to extend this method to binary channels. Conversely, for upper bounds, the geometry of the space of length-$n$ codewords over large alphabets is significantly different from that corresponding to binary alphabets. For instance, for large channels the volume of a Hamming sphere of radius $pn$ in $\mathbb{F}_q^n$ is approximately $q^{pn}$. This leads to simpler bounds for large channels.
In this work we initiate the study of binary causal-adversary channels, and present two upper bounds on their capacity: $1 - H(p)$, and $(1 - 4p)^+$. The upper bound of $1 - H(p)$ is very “natural”. Namely, it is not hard to verify that if Calvin attacks Alice’s transmission by simulating the well-studied Binary Symmetric Channel [4], he can force a communication rate of no more than $1 - H(p)$. The upper bound of $(1 - 4p)^+$ presented in this work is non-trivial for both its implications and its proof techniques. The bound demonstrates that at least for some values of $p$, the achievable rate is bounded away from $1 - H(p)$. For $p \in (\bar{p}, 1/4)$, $1 - 4p$ is strictly less than $1 - H(p)$ (here $\bar{p}$ is the value of $p \in (0, 1/2)$ satisfying $H(p) = 4p$, and can be computed to be approximately $0.156$). In fact, for $p \ge 1/4$ our bound implies that no communication at positive rate is possible, which is much stronger than the result obtained by the upper bound of $1 - H(p)$ (see Fig. 1). Our proof techniques include a combination of tools from the fields of Extremal Combinatorics (e.g., Turán’s theorem [17]) and classical Coding Theory (e.g., the Plotkin bound [14, 2]).
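The crossover point $\bar{p}$ and the resulting bound are easy to evaluate numerically. The following Python sketch (function names are ours, chosen for illustration) computes $\min\{1 - H(p), (1 - 4p)^+\}$ and locates the nonzero root of $H(p) = 4p$ by bisection on $[0.05, 0.25]$, an interval on which $H(p) - 4p$ changes sign exactly once.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def causal_upper_bound(p: float) -> float:
    """min{1 - H(p), (1 - 4p)^+}, the bound stated in Theorems 1 and 2."""
    return min(1 - binary_entropy(p), max(0.0, 1 - 4 * p))


def crossover_point(lo: float = 0.05, hi: float = 0.25, iters: int = 60) -> float:
    """Bisection for the nonzero root p_bar of H(p) = 4p.

    g(p) = H(p) - 4p is positive at lo and negative at hi, so the
    bracket contains exactly one sign change.
    """
    def g(p: float) -> float:
        return binary_entropy(p) - 4 * p

    assert g(lo) > 0 > g(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"p_bar = {crossover_point():.4f}")
    for p in (0.05, 0.10, 0.20, 0.30):
        print(p, round(causal_upper_bound(p), 4))
```

Bisection is used only because $H(p) - 4p$ is concave with a single sign change on this bracket; any standard root finder would do.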
II Model
For any integer $i$ let $[i]$ denote the set $\{1, \dots, i\}$. Let $R \ge 0$ be Alice’s rate. An $(n, R)$ code (of block length $n$ and rate $R$) is defined by Alice’s encoder and Bob’s corresponding decoder, as below.
Alice: Alice’s message is assumed to be a random variable $U$ with entropy $nR$, over alphabet $\mathcal{U}$. We consider two types of encoding schemes for Alice.
For deterministic codes, Alice’s message is assumed to be uniformly distributed over $\mathcal{U} = [2^{nR}]$. Her deterministic encoder is a deterministic function that maps every $u$ in $\mathcal{U}$ to a vector $\mathbf{x}(u)$ in $\{0,1\}^n$. Alice’s codebook $\mathcal{C} = \{\mathbf{x}(u) : u \in \mathcal{U}\}$ is the collection of all possible transmitted codewords.
More generally, Alice and Bob may use probabilistic codes. For such codes, the random variable $U$ corresponding to Alice’s message may have an arbitrary distribution $p_U$ (with entropy $nR$) over an arbitrary alphabet $\mathcal{U}$. Alice’s codebook $\mathcal{C} = \{\mathcal{C}(u) : u \in \mathcal{U}\}$ is an arbitrary collection of subsets of $\{0,1\}^n$. For each subset $\mathcal{C}(u)$, there is a corresponding codeword random variable $\mathbf{X}_u$ with codeword distribution $p_u$ over $\mathcal{C}(u)$. For any value $u$ of the message, Alice’s encoder chooses a codeword from $\mathcal{C}(u)$ randomly according to the distribution $p_u$. Alice’s message distribution $p_U$, codebook $\mathcal{C}$, and all the codeword distributions $p_u$ are known to both Bob and Calvin, but the values of the random variables $U$ and $\mathbf{X}_u$ are unknown to them. If $U = u$, then the transmitted codeword $\mathbf{X}$ has the probability distribution given by $p_u$. Let $p$ be the overall distribution of codewords of Alice. It holds that $p(\mathbf{x}) = \sum_{u \in \mathcal{U}} p_U(u)\, p_u(\mathbf{x})$.
Calvin/Channel: Calvin possesses $n$ jamming functions $f_1, \dots, f_n$ and arbitrary jamming random variables, collectively denoted $\Lambda$, that satisfy the following constraints.
Causality constraint: For each $i \in [n]$, the jamming function $f_i$ maps $(x_1, \dots, x_i)$ and $\Lambda$ to an element $e_i = f_i(x_1, \dots, x_i, \Lambda)$ of $\{0,1\}$.
Power constraint: The number of indices $i$ for which the value of $e_i$ equals $1$ is at most $pn$. That is, for all $\mathbf{x}$ and all values of $\Lambda$, $\sum_{i=1}^{n} e_i \le pn$.
The output of the channel is the set of bits $y_i = x_i \oplus e_i$ for $i \in [n]$.
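To make the two constraints concrete, here is a minimal Python sketch of a causal channel. It assumes a jammer is modeled as a callable that receives only the prefix $(x_1, \dots, x_i)$ and a private random source standing in for $\Lambda$; the names `causal_channel` and `Jammer` are hypothetical, and the budget $\lfloor pn \rfloor$ is enforced by the harness rather than trusted to the jammer.

```python
import random
from typing import Callable, List, Sequence

# A jammer is modeled as a callable that receives only the prefix
# (x_1, ..., x_i) and a private random source standing in for Lambda,
# and returns the error bit e_i.
Jammer = Callable[[Sequence[int], random.Random], int]


def causal_channel(x: Sequence[int], jammer: Jammer, p: float,
                   rng: random.Random) -> List[int]:
    """Return y with y_i = x_i XOR e_i, where e_i depends only on x_1..x_i.

    The power constraint sum_i e_i <= floor(p * n) is enforced here:
    once the budget is spent, the jammer is no longer consulted.
    """
    budget = int(p * len(x))
    y: List[int] = []
    used = 0
    for i in range(len(x)):
        prefix = tuple(x[: i + 1])          # causality: only bits seen so far
        e = 1 if (used < budget and jammer(prefix, rng)) else 0
        used += e
        y.append(x[i] ^ e)
    return y


if __name__ == "__main__":
    # A jammer that wants to flip every bit only manages floor(pn) of them.
    y = causal_channel([0] * 20, lambda prefix, rng: 1, 0.25, random.Random(0))
    print(y)
```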
Bob: Bob’s decoder is a (potentially) probabilistic function of the received vector $\mathbf{y} = (y_1, \dots, y_n)$. It maps the vectors in $\{0,1\}^n$ to the messages in $\mathcal{U}$.
Code parameters: Bob is said to make a decoding error if the message he decodes differs from the message $u$ encoded by Alice. The probability of error for a given message $u$ is defined as the probability, over Alice, Calvin and Bob’s random variables, that Bob makes a decoding error. The probability of error of the code is defined as the average over all $u$ of the probability of error for message $u$.
We define two types of rates and corresponding capacities.
The rate $R$ is said to be weakly achievable if for every $\varepsilon > 0$, and every sufficiently large $n$, there exists an $(n, R)$ code that allows communication with probability of error at most $\varepsilon$. The supremum of the weakly achievable rates is called the weak capacity and is denoted by $C^w$.
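The rate $R$ is said to be strongly achievable if, for every sufficiently large $n$, there exists an $(n, R)$ code whose probability of error decays exponentially in the block length $n$. The supremum of the strongly achievable rates is called the strong capacity and is denoted by $C^s$.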
Remark: Since a rate that is strongly achievable is always weakly achievable, but the converse is not true in general, $C^s \le C^w$.
III Related work and our results
To the best of our knowledge, communication in the presence of a causal adversary has not been explicitly addressed in the literature (other than our prior work for causal adversaries over large channels). Nevertheless, we note that the model of causal channels, being a natural one, has been “on the table” for several decades, and the analysis of the online/causal channel model appears as an open question in the book of Csiszár and Körner [5] (in the section addressing Arbitrary Varying Channels [1]). Various variants of causal adversaries have been addressed in the past, for instance [1, 11, 15, 16, 13]; however, the models considered therein differ significantly from ours.
At a high level, we show that for causal adversaries, for a large range of $p$ (for all $p \ge 1/4$), the maximum achievable rate equals that of the classical “omniscient” adversarial model (i.e., $0$). This may at first come as a surprise, as the online adversary is weaker than the omniscient one, and hence one may suspect that it allows a higher rate of communication.
We have two main results. Theorem 1 gives an upper bound on the weak capacity $C^w$ if Alice’s encoder is deterministic. Theorem 2 gives an upper bound on the strong capacity $C^s$ in the more general case where Alice’s encoder is probabilistic. Due to certain limitations of our proof techniques, we do not present any bounds on the weak capacity in the latter setting. The upper bound in both cases equals $\min\{1 - H(p), (1 - 4p)^+\}$.
Theorem 1 (Deterministic encoder)
For deterministic codes, $C^w \le \min\{1 - H(p), (1 - 4p)^+\}$.
Theorem 2 (Probabilistic encoder)
For probabilistic codes, $C^s \le \min\{1 - H(p), (1 - 4p)^+\}$.
We note that under a very weak notion of capacity in which one only requires the success probability to be bounded away from zero (instead of approaching $1$), the capacity of the omniscient channel, and thus of the binary causal-adversary channel, approaches $1 - H(p)$. This follows from the fact that for sufficiently large $L$ and $n$ there exist codes of rate $1 - H(p) - 1/L$ which are $(p, L)$-list decodable [7]. Communicating using a $(p, L)$-list decodable code allows Bob to decode a list of at most $L$ messages which includes the message transmitted by Alice. Choosing a message uniformly at random from his list, Bob decodes correctly with probability at least $1/L$.
III-A Outline of proof techniques
The upper bound of $1 - H(p)$ follows directly by describing an attack for Calvin wherein he approximately simulates a BSC($p$) (Binary Symmetric Channel [4] with crossover probability $p$). More precisely, for each $i$ and any sufficiently small $\varepsilon > 0$, Calvin flips $x_i$ with probability $p - \varepsilon$ until he runs out of his budget of $pn$ bit-flips. By the Chernoff bound [3], with very high probability he does not run out of his budget, and is therefore indistinguishable from a BSC($p - \varepsilon$). But it is well-known [4] that in this case the optimal rate of communication from Alice to Bob is $1 - H(p - \varepsilon)$. Taking the limit $\varepsilon \to 0$ implies our bound.
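A small simulation illustrates the Chernoff argument. In this sketch (our own illustration, with a hypothetical function name), Calvin flips each bit with probability $p - \varepsilon$ and never exceeds $\lfloor pn \rfloor$ flips. For $n = 10^4$, $p = 0.1$ and $\varepsilon = 0.02$ the budget of $1000$ flips sits roughly seven standard deviations above the mean of $800$, so in practice it is never reached and the attack behaves exactly like a BSC($p - \varepsilon$).

```python
import random
from typing import List, Sequence, Tuple


def bsc_attack(x: Sequence[int], p: float, eps: float,
               rng: random.Random) -> Tuple[List[int], bool]:
    """Flip each bit independently w.p. p - eps, never exceeding floor(p*n) flips.

    Returns the received word and whether the budget was reached.
    """
    budget = int(p * len(x))
    y, used = list(x), 0
    for i in range(len(x)):
        if used < budget and rng.random() < p - eps:
            y[i] ^= 1
            used += 1
    return y, used >= budget


if __name__ == "__main__":
    rng = random.Random(12345)
    n, p, eps = 10_000, 0.1, 0.02
    runs = [bsc_attack([0] * n, p, eps, rng) for _ in range(50)]
    hit = sum(reached for _, reached in runs)
    mean = sum(sum(y) for y, _ in runs) / len(runs)
    print(f"budget reached in {hit}/50 runs; mean flips {mean:.1f} vs budget {int(p * n)}")
```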
The upper bound of $(1 - 4p)^+$ is more involved. For the case where Alice’s encoder is deterministic, the proof of Theorem 1 has the following overall structure. Assume for the sake of contradiction that Alice attempts to communicate at rate greater than $1 - 4p$. To prove our upper bound we design the following wait-and-push attack for Calvin.
Calvin starts by waiting for Alice to transmit approximately $(1 - 4p)n$ bits. As Alice is assumed to communicate at rate greater than $1 - 4p$, the set of Alice’s codewords consistent with the bits Calvin has seen so far is “large” with “high probability”. Calvin constructs this set and chooses a codeword $\mathbf{x}'$ uniformly at random from it. He then actively “pushes” $\mathbf{x}$ in the direction of $\mathbf{x}'$ by flipping, with probability $1/2$, each future bit $x_i$ that differs from $x'_i$. If Calvin succeeds in pushing $\mathbf{x}$ to a word $\mathbf{y}$ roughly midway between $\mathbf{x}$ and $\mathbf{x}'$, a careful analysis demonstrates that regardless of Bob’s decoding strategy, Bob is unable to determine whether Alice transmitted $\mathbf{x}$ or $\mathbf{x}'$, causing a decoding error with probability $1/2$ in this case. So, to prove our bound, we must show that with constant probability (independent of the block length $n$) Calvin will indeed succeed in pushing $\mathbf{x}$ to such a $\mathbf{y}$. Namely, that Alice’s codeword $\mathbf{x}$ and the codeword $\mathbf{x}'$ chosen at random by Calvin are at distance at most roughly $2pn$. Roughly speaking, we prove the above by a detailed analysis of the distance structure of the set of codewords in any code, using tools from extremal combinatorics and coding theory.
The case where Alice’s encoder may be randomized is more technically challenging, and is considered in Theorem 2. At a high level, the strategy of Calvin for a probabilistic encoder follows that outlined for the deterministic case. However, there are two main difficulties in its extended analysis. Firstly, the symmetry between $\mathbf{x}$ and $\mathbf{x}'$ no longer exists. Namely, the fact that Bob may not be able to distinguish which of the two was transmitted by Alice does not necessarily cause a significant decoding error, since the probability of $\mathbf{x}'$ being transmitted by Alice may well be significantly smaller than the probability that $\mathbf{x}$ was transmitted. Secondly, the fact that both $\mathbf{x}$ and $\mathbf{x}'$ may correspond to the same message places the entire scheme in jeopardy, as it then no longer matters whether Bob decodes to $\mathbf{x}$ or $\mathbf{x}'$: in both cases the decoded message will be that sent by Alice.
To overcome these difficulties, we describe a more intricate analysis of Calvin’s attack. Roughly speaking, we prove that a “large” subset $\mathcal{S}$ of the set of codewords consistent with Calvin’s observation behaves “well”. Any $\mathbf{x}'$ chosen at random by Calvin from this consistent set is, with “significant” probability, in $\mathcal{S}$, and has three properties corresponding to those when Alice uses a deterministic encoder. That is, $\mathbf{x}'$ is sufficiently close to $\mathbf{x}$ as desired, it has approximately the same probability of transmission as $\mathbf{x}$ does (thus preserving the needed symmetry), and it also corresponds to a message that differs from that corresponding to $\mathbf{x}$. All in all, we show that the above three properties hold with a probability that may vanish as $n$ grows but does not decay exponentially in $n$, which suffices to bound the strong capacity of the channel at hand (but not the weak capacity).
In the case of a randomized encoder, we assume that the messages may have a non-uniform distribution, and that any message is encoded into one of a set of possible codewords according to some probability distribution on that set. One may think of various other ways of encoding to confuse Calvin, for example the following. But as we discuss in the next paragraph, such schemes are also covered by our setup.
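Multiple codebooks: In this scheme, Alice maintains a set of codes $\mathcal{C}_1, \dots, \mathcal{C}_k$. For transmitting a message $u$, she randomly selects the code $\mathcal{C}_j$ with probability $\pi_j$. If the set of messages is $\mathcal{U}$ with a probability distribution given by $p_U$, and the code $\mathcal{C}_j$ contains the codewords $\{\mathbf{x}_j(u) : u \in \mathcal{U}\}$, then in our setup, the corresponding codebook for the message $u$ will be $\mathcal{C}(u) = \{\mathbf{x}_j(u) : j \in [k]\}$. This codebook may have fewer than $k$ codewords due to common codewords in the original codes. The induced probability distribution in this codebook of $u$ is given by $p_u(\mathbf{x}) = \sum_{j : \mathbf{x}_j(u) = \mathbf{x}} \pi_j$.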
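The reduction from multiple codebooks to a single probabilistic codebook can be sketched in a few lines. We assume, for illustration only, that each code $\mathcal{C}_j$ is given as a dictionary from messages to codewords and that `pi` lists the selection probabilities $\pi_j$; the function name is hypothetical. Coinciding codewords simply pool their probability mass, which is why $\mathcal{C}(u)$ may have fewer than $k$ elements.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Codeword = Tuple[int, ...]


def induced_codebook(codes: List[Dict[int, Codeword]], pi: List[float],
                     u: int) -> Dict[Codeword, float]:
    """Codebook C(u) and distribution p_u induced by 'pick code j w.p. pi[j],
    then send codes[j][u]'. Coinciding codewords pool their probability."""
    assert abs(sum(pi) - 1.0) < 1e-9, "selection probabilities must sum to 1"
    p_u: Dict[Codeword, float] = defaultdict(float)
    for code_j, pi_j in zip(codes, pi):
        p_u[code_j[u]] += pi_j
    return dict(p_u)


if __name__ == "__main__":
    codes = [{0: (0, 0, 0), 1: (1, 1, 1)},
             {0: (0, 0, 0), 1: (1, 0, 1)},
             {0: (0, 1, 1), 1: (1, 1, 1)}]
    print(induced_codebook(codes, [0.5, 0.25, 0.25], 0))
```

In this toy example the first two codes agree on message $0$, so $\mathcal{C}(0)$ has two codewords, with probabilities $0.75$ and $0.25$.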
Even if Alice picks a code and uses it to encode several messages, she does not gain anything. First, if she uses the same code to encode too many messages (and Calvin knows the encoding scheme, as assumed), then both Bob and Calvin will learn which code is used after receiving or ‘reading’ some codewords. On the other hand, if a randomly chosen code is used only to encode a block of a few messages, this is equivalent to using a longer (‘superblock’) code in our setup. The only difference is that the probability of error analysed in our setup is the probability of error in decoding the ‘superblocks’ rather than the smaller blocks/codewords.
IV Proof of Theorem 1
Let $R = 1 - 4p + \varepsilon$ for some $\varepsilon > 0$. Let $\log$ denote the binary logarithm, here and throughout.
By assumption, for deterministic codes Alice’s message space $\mathcal{U}$ is of size $2^{nR}$.
Here we assume for simplicity that $2^{nR}$ is an integer.
This implies that the set $\mathcal{C}$ of Alice’s transmitted codewords is of size $2^{nR}$.
We now present Calvin’s attack. We show that for any fixed $\varepsilon > 0$, regardless of Bob’s decoding strategy, there is a decoding error with constant probability (namely, the error probability is independent of $n$). Calvin’s attack is in two stages. First Calvin passively waits until Alice transmits $\ell = (R - \varepsilon/2)n$ bits over the channel. Let $\mathbf{x}^\ell = (x_1, \dots, x_\ell)$ be the value of the codeword observed so far. He then considers the set of codewords that are consistent with the observed $\mathbf{x}^\ell$. Namely, Calvin constructs the set $\mathcal{X}_{\mathbf{x}^\ell} = \{\mathbf{x}' \in \mathcal{C} : (x'_1, \dots, x'_\ell) = \mathbf{x}^\ell\}$. He then chooses an element $\mathbf{x}' \in \mathcal{X}_{\mathbf{x}^\ell}$ uniformly at random. In the second stage, Calvin follows a random bit-flip strategy. That is, for each remaining bit $x_i$ ($i > \ell$) of $\mathbf{x}$ that differs from the corresponding bit $x'_i$ of $\mathbf{x}'$, he flips the transmitted bit with probability $1/2$, until he has either flipped $pn$ bits, or until $i = n$.
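The two stages of Calvin’s attack translate directly into code. The Python sketch below is our illustration, not the authors’ implementation: it assumes the codebook is a list of bit tuples containing the transmitted codeword, uses $\lfloor pn \rfloor$ as the budget, and takes the waiting length `ell` as a parameter rather than fixing it; the function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Word = Tuple[int, ...]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(ai != bi for ai, bi in zip(a, b))


def wait_and_push(codebook: Sequence[Word], x: Word, ell: int, p: float,
                  rng: random.Random) -> Tuple[List[int], Word]:
    """Two-stage attack on a deterministic code; returns (y, x_prime).

    Wait: observe x_1..x_ell and form the consistent set (it contains x).
    Push: pick x_prime uniformly from that set; for each later position where
    x and x_prime differ, flip x_i with probability 1/2 while budget remains.
    """
    n = len(x)
    budget = int(p * n)
    prefix = x[:ell]
    consistent = [c for c in codebook if c[:ell] == prefix]
    x_prime = rng.choice(consistent)
    y, used = list(x), 0
    for i in range(ell, n):
        if used == budget:
            break                      # out of bit-flips: stop pushing
        if x[i] != x_prime[i] and rng.random() < 0.5:
            y[i] ^= 1
            used += 1
    return y, x_prime


if __name__ == "__main__":
    rng = random.Random(7)
    n, ell, p = 40, 16, 0.25
    code = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(64)]
    y, xp = wait_and_push(code, code[0], ell, p, rng)
    print(hamming(code[0], xp), hamming(y, code[0]), hamming(y, xp))
```

By construction, $\mathbf{y}$ agrees with $\mathbf{x}$ on the prefix and wherever $\mathbf{x}$ and $\mathbf{x}'$ agree, and differs from $\mathbf{x}$ in at most $\lfloor pn \rfloor$ positions.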
We analyze Calvin’s attack by a series of claims. We first show that with high probability (w.h.p.) the set $\mathcal{X}_{\mathbf{x}^\ell}$ is large.
Claim IV.1
With probability at least $1 - 2^{-\varepsilon n/4}$, the set $\mathcal{X}_{\mathbf{x}^\ell}$ is of size at least $2^{\varepsilon n/4}$.
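Proof: The number of messages $u$ for which $\mathcal{X}_{\mathbf{x}^\ell}$ is of size less than $2^{\varepsilon n/4}$ is at most the number of distinct prefixes $\mathbf{x}^\ell$ times $2^{\varepsilon n/4}$, which in turn is at most $2^{\ell} \cdot 2^{\varepsilon n/4} = 2^{(R - \varepsilon/4)n}$. Since $u$ is uniform over $2^{nR}$ messages, this event has probability at most $2^{-\varepsilon n/4}$.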
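The counting step above holds for every code, not just on average, which a quick experiment makes visible. In this sketch (hypothetical helper names; a random code stands in for an arbitrary deterministic one), the fraction of messages whose consistent set has fewer than $T$ codewords never exceeds $2^\ell T / |\mathcal{C}|$, with $T$ playing the role of the size threshold in Claim IV.1.

```python
import random
from collections import Counter
from typing import Sequence, Tuple


def to_bits(v: int, n: int) -> Tuple[int, ...]:
    """n-bit big-endian binary expansion of v."""
    return tuple((v >> (n - 1 - k)) & 1 for k in range(n))


def small_class_fraction(codebook: Sequence[Tuple[int, ...]], ell: int,
                         threshold: int) -> float:
    """Fraction of messages (uniform over a deterministic codebook) whose
    consistent set X_{x^ell} has fewer than `threshold` codewords."""
    counts = Counter(c[:ell] for c in codebook)
    bad = sum(1 for c in codebook if counts[c[:ell]] < threshold)
    return bad / len(codebook)


if __name__ == "__main__":
    rng = random.Random(3)
    n, ell = 20, 8
    code = [to_bits(v, n) for v in rng.sample(range(2 ** n), 2 ** 12)]
    for T in (2, 8, 16, 32):
        bound = min(1.0, 2 ** ell * T / len(code))   # 2^ell prefixes times T
        print(T, small_class_fraction(code, ell, T), bound)
```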
Now assume that the message $u$ is such that its corresponding set $\mathcal{X}_{\mathbf{x}^\ell}$ is of size at least $2^{\varepsilon n/4}$. We now show that this implies that the transmitted codeword $\mathbf{x}$ and the codeword $\mathbf{x}'$ chosen by Calvin are distinct and a small Hamming distance apart with positive probability (independent of $n$).
Claim IV.2
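Conditioned on Claim IV.1, with probability at least $\varepsilon/(32p)$, $\mathbf{x} \neq \mathbf{x}'$ and $d(\mathbf{x}, \mathbf{x}') < (2p - \varepsilon/8)n$, where $d(\cdot, \cdot)$ denotes Hamming distance.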
Proof: Consider the undirected graph $G$ in which the vertex set consists of the set $\mathcal{X}_{\mathbf{x}^\ell}$ and two nodes are connected by an edge if their Hamming distance is less than $(2p - \varepsilon/8)n$. An independent set in $G$ corresponds to a subset of codewords in $\mathcal{X}_{\mathbf{x}^\ell}$ that are all (pairwise) at distance at least $(2p - \varepsilon/8)n$.
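Since the codewords in $\mathcal{X}_{\mathbf{x}^\ell}$ all have the same prefix $\mathbf{x}^\ell$, one may consider only the suffix (of length $n - \ell = (4p - \varepsilon/2)n$) of the codewords in $\mathcal{X}_{\mathbf{x}^\ell}$. Here we assume $p < 1/4$; minor modifications in the proof are needed for larger $p$. The set of vectors defined by the suffixes in an independent set $I$ of $G$ now corresponds to a binary error-correcting code of length $(4p - \varepsilon/2)n$, with $|I|$ codewords and minimum distance at least $(2p - \varepsilon/8)n$.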
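By Plotkin’s bound [2], a binary error-correcting code of length $m$ and minimum distance $\Delta > m/2$ has at most $2\Delta/(2\Delta - m)$ codewords. Here $m = (4p - \varepsilon/2)n$ and $\Delta = (2p - \varepsilon/8)n$, so $2\Delta - m = \varepsilon n/4$. Thus $\alpha(G)$, the size of a maximum independent set in $G$, must satisfy
$$\alpha(G) \le \frac{2\Delta}{2\Delta - m} = \frac{16p - \varepsilon}{\varepsilon} < \frac{16p}{\varepsilon}. \qquad (1)$$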
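By Turán’s theorem [17], any undirected graph of size $N$ and average degree $D$ has an independent set of size at least $N/(D+1)$. This, along with (1), implies that the average degree $D$ of our graph $G$, with $N = |\mathcal{X}_{\mathbf{x}^\ell}|$, satisfies
$$\frac{N}{D + 1} \le \alpha(G) < \frac{16p}{\varepsilon}.$$
This in turn implies that
$$D > \frac{\varepsilon N}{16p} - 1 \ge \frac{\varepsilon N}{32p}.$$
The second inequality holds for large enough $n$, since $N$ is at least $2^{\varepsilon n/4}$. To summarize the above discussion, we have shown that our graph $G$ has large average degree, of size at least $\varepsilon N/(32p)$. We now use this fact to analyze Calvin’s attack.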
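Turán’s bound in the form used here (an independent set of size at least $N/(D+1)$) is constructive: the greedy rule that repeatedly takes a minimum-degree vertex and deletes its closed neighbourhood achieves the stronger Caro-Wei bound $\sum_v 1/(\deg(v) + 1)$, which is at least $N/(D+1)$ by convexity. The sketch below builds the distance-threshold graph on a set of random words (an assumption for illustration; in the proof the vertices are the codewords sharing a prefix) and runs that greedy rule. Helper names are ours.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(ai != bi for ai, bi in zip(a, b))


def threshold_graph(words: List[Tuple[int, ...]], dist: int) -> Dict[int, Set[int]]:
    """Adjacency sets: i ~ j iff the Hamming distance is less than `dist`."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(words))}
    for i, j in combinations(range(len(words)), 2):
        if hamming(words[i], words[j]) < dist:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_independent_set(adj: Dict[int, Set[int]]) -> List[int]:
    """Pick a minimum-degree vertex, delete it and its neighbours, repeat."""
    live = {v: set(nbrs) for v, nbrs in adj.items()}
    chosen: List[int] = []
    while live:
        v = min(live, key=lambda w: len(live[w]))
        chosen.append(v)
        removed = live[v] | {v}
        for w in removed:
            del live[w]
        for nbrs in live.values():
            nbrs -= removed          # keep degrees relative to the live graph
    return chosen


if __name__ == "__main__":
    rng = random.Random(11)
    words = [tuple(rng.randint(0, 1) for _ in range(30)) for _ in range(200)]
    adj = threshold_graph(words, 12)
    N = len(adj)
    D = sum(len(s) for s in adj.values()) / N
    print(len(greedy_independent_set(adj)), N / (D + 1))
```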
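By the definition of deterministic codes, any codeword in $\mathcal{C}$ is transmitted with equal probability. Also, by definition both $\mathbf{x}$ (the transmitted codeword) and $\mathbf{x}'$ (the codeword chosen by Calvin) are in $\mathcal{X}_{\mathbf{x}^\ell}$. Hence both $\mathbf{x}$ and $\mathbf{x}'$ are uniform in $\mathcal{X}_{\mathbf{x}^\ell}$, and they are independent given $\mathbf{x}^\ell$. Since $G$ has $ND/2$ edges, the nodes corresponding to codewords $\mathbf{x}$ and $\mathbf{x}'$ are distinct and connected by an edge in $G$ with probability $ND/N^2 = D/N \ge \varepsilon/(32p)$. This in turn implies that with probability at least $\varepsilon/(32p)$, $\mathbf{x} \neq \mathbf{x}'$ and $d(\mathbf{x}, \mathbf{x}') < (2p - \varepsilon/8)n$, as required.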
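The constants in Claims IV.1 and IV.2 can be checked with exact rational arithmetic. The sketch below assumes the parameter choices $R = 1 - 4p + \varepsilon$, $\ell = (R - \varepsilon/2)n$ and edge threshold $(2p - \varepsilon/8)n$, all expressed as fractions of $n$, and reproduces the Plotkin cap $(16p - \varepsilon)/\varepsilon$ and the edge-probability bound $\varepsilon/(32p)$. For $p = 1/10$ and $\varepsilon = 1/25$ these are $39$ and $1/80$. The function name is hypothetical.

```python
from fractions import Fraction


def attack_parameters(p: Fraction, eps: Fraction) -> dict:
    """Quantities from the proof of Theorem 1, as fractions of n (assumed
    choices: R = 1 - 4p + eps, ell = (R - eps/2) n, edge threshold
    (2p - eps/8) n). Requires 0 < p < 1/4 and 0 < eps <= 4p."""
    assert 0 < p < Fraction(1, 4) and 0 < eps <= 4 * p
    R = 1 - 4 * p + eps
    ell = R - eps / 2                 # length of the observed prefix
    m = 1 - ell                       # suffix length, equals 4p - eps/2
    d = 2 * p - eps / 8               # edge iff Hamming distance < d * n
    assert 2 * d > m                  # Plotkin regime: distance > length/2
    return {
        "ell": ell,
        "m": m,
        "d": d,
        "plotkin_cap": 2 * d / (2 * d - m),   # basic Plotkin bound on |I|
        "edge_prob_lb": eps / (32 * p),       # lower bound on Pr[x ~ x']
    }


if __name__ == "__main__":
    for key, val in attack_parameters(Fraction(1, 10), Fraction(1, 25)).items():
        print(key, val)
```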
Conditioned on Claim IV.2, Calvin’s codeword $\mathbf{x}'$ is very close to Alice’s transmitted codeword $\mathbf{x}$. Specifically, $d(\mathbf{x}, \mathbf{x}') < (2p - \varepsilon/8)n$. We now show that if Calvin follows the random bit-flip strategy, from Bob’s perspective (w.h.p.), $\mathbf{x}$ and $\mathbf{x}'$ were equally likely to have been transmitted by Alice.
We first show that during Calvin’s random bitflip process, w.h.p., Calvin does not “run out” of his budget of bit flips.
Claim IV.3
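Conditioned on Claim IV.2, with probability at least $1 - 2^{-\Omega(\varepsilon^2 n)}$, the number $w$ of bits flipped by Calvin satisfies
$$\left| w - \frac{d(\mathbf{x}, \mathbf{x}')}{2} \right| \le \frac{\varepsilon n}{16}. \qquad (2)$$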
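Proof: The expected number of locations flipped by Calvin is $d(\mathbf{x}, \mathbf{x}')/2$. Assume that $d(\mathbf{x}, \mathbf{x}') = (2p - \varepsilon/8)n$ (for smaller values of $d(\mathbf{x}, \mathbf{x}')$ the bound is only tighter). By Sanov’s theorem [4, Theorem 12.4.1], the probability that the number of bits flipped by Calvin deviates from the expectation by more than $\varepsilon n/16$ is at most $2^{-\Omega(\varepsilon^2 n)}$ for large enough $n$.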
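It should be noted that $d(\mathbf{x}, \mathbf{x}')/2 + \varepsilon n/16 < (p - \varepsilon/16)n + \varepsilon n/16 = pn$, and so (2) implies that the number of bits flipped by Calvin does not exceed $pn$. Since Calvin possibly flips only the bits of $\mathbf{x}$ which differ from the corresponding bits in $\mathbf{x}'$, the received word $\mathbf{y}$ satisfies $d(\mathbf{y}, \mathbf{x}) = w$ and $d(\mathbf{y}, \mathbf{x}') = d(\mathbf{x}, \mathbf{x}') - w$, so (2) also implies
$$\left| d(\mathbf{y}, \mathbf{x}) - d(\mathbf{y}, \mathbf{x}') \right| \le \frac{\varepsilon n}{8} \quad \text{and} \quad \max\{d(\mathbf{y}, \mathbf{x}), d(\mathbf{y}, \mathbf{x}')\} < pn. \qquad (3)$$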
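Claim IV.3 is a standard concentration statement. As a sanity check, the sketch below computes the exact two-sided tail of $\mathrm{Binomial}(d, 1/2)$ (in log-space via `math.lgamma` to avoid overflow) and compares it with Hoeffding’s bound $2e^{-2t^2/d}$, which we use here in place of the Sanov bound cited in the text. The parameters $d \approx (2p - \varepsilon/8)n$ and $t = \varepsilon n/16$ are assumed for illustration; function names are ours.

```python
import math


def log_pmf_half(d: int, k: int) -> float:
    """Natural log of Pr[K = k] for K ~ Binomial(d, 1/2)."""
    return (math.lgamma(d + 1) - math.lgamma(k + 1)
            - math.lgamma(d - k + 1) - d * math.log(2))


def binomial_two_sided_tail(d: int, t: float) -> float:
    """Exact Pr[|K - d/2| > t]: K counts Calvin's flips among the d
    positions where x and x' differ (budget cap ignored)."""
    return sum(math.exp(log_pmf_half(d, k))
               for k in range(d + 1) if abs(k - d / 2) > t)


def hoeffding_bound(d: int, t: float) -> float:
    """Hoeffding's inequality: Pr[|K - d/2| >= t] <= 2 exp(-2 t^2 / d)."""
    return 2 * math.exp(-2 * t * t / d)


if __name__ == "__main__":
    p, eps = 0.1, 0.04
    for n in (4000, 16000, 64000):
        d = round((2 * p - eps / 8) * n)      # assumed distance d(x, x')
        t = eps * n / 16                      # allowed deviation
        print(n, d, binomial_two_sided_tail(d, t), hoeffding_bound(d, t))
```

For $p = 0.1$, $\varepsilon = 0.04$ and $n \in \{4000, 16000, 64000\}$ the exact tail shrinks rapidly with $n$ and stays below the Hoeffding bound throughout.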
We conclude by proving that if the number of bits flipped by Calvin lies in the range given by (2), then indeed Bob cannot distinguish between the case in which $\mathbf{x}$ or $\mathbf{x}'$ was transmitted.
Claim IV.4
Conditioned on Claim IV.3, Bob makes a decoding error with probability at least $1/2$.
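Proof: By Bayes’ Theorem [8], if Bob receives $\mathbf{y}$, the a posteriori probability that Alice transmitted $\mathbf{x}$, denoted $p(\mathbf{x} \mid \mathbf{y})$, equals $p(\mathbf{x})\, p(\mathbf{y} \mid \mathbf{x}) / p(\mathbf{y})$. Here $p(\mathbf{x})$ is the probability (over her encoding strategy) that Alice transmits $\mathbf{x}$, $p(\mathbf{y} \mid \mathbf{x})$ is the probability (over Calvin’s random bit-flipping strategy) that Bob receives $\mathbf{y}$ given that Alice transmits $\mathbf{x}$, and $p(\mathbf{y})$ is the resulting probability that Bob receives $\mathbf{y}$. Similarly, $p(\mathbf{x}' \mid \mathbf{y}) = p(\mathbf{x}')\, p(\mathbf{y} \mid \mathbf{x}') / p(\mathbf{y})$. Taking the ratio and noting that for deterministic codes $p(\mathbf{x}) = p(\mathbf{x}') = 2^{-nR}$, we have
$$\frac{p(\mathbf{x} \mid \mathbf{y})}{p(\mathbf{x}' \mid \mathbf{y})} = \frac{p(\mathbf{y} \mid \mathbf{x})}{p(\mathbf{y} \mid \mathbf{x}')}. \qquad (4)$$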
Since Calvin’s random bit-flip strategy involves him flipping bits of $\mathbf{x}$ (which are different from the corresponding bits of $\mathbf{x}'$) with probability $1/2$, for all $\mathbf{y}$ satisfying (2), the probabilities $p(\mathbf{y} \mid \mathbf{x})$ and $p(\mathbf{y} \mid \mathbf{x}')$ are equal. This observation and (4) together imply $p(\mathbf{x} \mid \mathbf{y}) = p(\mathbf{x}' \mid \mathbf{y})$. Thus, Bob cannot distinguish whether $\mathbf{x}$ or $\mathbf{x}'$ was transmitted. Namely, on the pair of events in which Alice transmits $\mathbf{x}$ and Calvin chooses $\mathbf{x}'$, and in which Alice transmits $\mathbf{x}'$ and Calvin chooses $\mathbf{x}$, no matter which decoding process Bob uses, he will have an average decoding error of at least $1/2$. This suffices to prove our assertion.
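The symmetry behind Claim IV.4 can be verified by exact enumeration on a toy instance. The sketch below (hypothetical names, exact arithmetic with `fractions.Fraction`) computes Bob’s received-word distribution when Alice sends $\mathbf{x}$ and Calvin pushes towards $\mathbf{x}'$, processing differing positions left to right and stopping at the budget, and vice versa. For every $\mathbf{y}$ at distance less than the budget from both words, the two probabilities coincide and equal $2^{-d(\mathbf{x}, \mathbf{x}')}$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from typing import Dict, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(ai != bi for ai, bi in zip(a, b))


def received_distribution(x: Tuple[int, ...], x_prime: Tuple[int, ...],
                          budget: int) -> Dict[Tuple[int, ...], Fraction]:
    """Exact law of y when Alice sends x and Calvin pushes towards x':
    each differing position is flipped w.p. 1/2, left to right, and no
    further flips are made once `budget` flips have been used."""
    diff = [i for i in range(len(x)) if x[i] != x_prime[i]]
    dist: Dict[Tuple[int, ...], Fraction] = defaultdict(Fraction)
    for coins in product((0, 1), repeat=len(diff)):
        y, used = list(x), 0
        for i, c in zip(diff, coins):
            if c and used < budget:
                y[i] ^= 1
                used += 1
        dist[tuple(y)] += Fraction(1, 2 ** len(diff))
    return dist


if __name__ == "__main__":
    x, xp, B = (0,) * 8, (1, 1, 1, 1, 1, 1, 0, 0), 5
    P_x, P_xp = received_distribution(x, xp, B), received_distribution(xp, x, B)
    for y in sorted(P_x):
        if hamming(y, x) < B and hamming(y, xp) < B:
            print(y, P_x[y], P_xp[y])
```

With $d(\mathbf{x}, \mathbf{x}') = 6$ and a budget of $5$, the words $\mathbf{y}$ with $2$, $3$ or $4$ flips receive probability $1/64$ under both hypotheses; only words that exhaust the budget break the symmetry, which is what condition (2) rules out.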
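Thus a decoding error happens if the conditions of Claims IV.1, IV.2, IV.3 and IV.4 are all satisfied. This happens with probability at least $(1 - 2^{-\varepsilon n/4}) \cdot \frac{\varepsilon}{32p} \cdot (1 - 2^{-\Omega(\varepsilon^2 n)}) \cdot \frac{1}{2} \ge \frac{\varepsilon}{128p}$ for large enough $n$.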
V Proof of Theorem 2
We start by proving the following technical lemma that we use in our proof. Let $\{\alpha_j\}_{j \in J}$ be an arbitrary probability distribution over an index set $J$. Let $\{Z_j\}_{j \in J}$ be arbitrary discrete random variables with probability distributions $\{p_j\}_{j \in J}$ over alphabets $\{\mathcal{Z}_j\}_{j \in J}$, respectively. Let $\mathcal{Z} = \bigcup_{j \in J} \mathcal{Z}_j$. Let $Z$ be a random variable over $\mathcal{Z}$ that equals the random variable $Z_j$ with probability $\alpha_j$. Then the following lemma, describing an elementary property of the entropy function, is useful in the proof of Theorem 2.
Lemma V.1
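The entropies of $Z$ and $\{Z_j\}$ satisfy $H(Z) \le H(\alpha) + \sum_{j \in J} \alpha_j H(Z_j)$, with equality if and only if for each $z \in \mathcal{Z}$ for which both $\alpha_j p_j(z)$ and $\alpha_{j'} p_{j'}(z)$ are positive it holds that $j = j'$.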
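Proof: For any $z \in \mathcal{Z}$, the probability of occurrence of $z$ equals $p(z) = \sum_{j \in J} \alpha_j p_j(z)$. Hence
$$H(Z) - H(\alpha) - \sum_{j \in J} \alpha_j H(Z_j) = \sum_{j \in J} \sum_{z \in \mathcal{Z}} \alpha_j p_j(z) \log \frac{\alpha_j p_j(z)}{p(z)} \le \log \sum_{z \in \mathcal{Z}} \frac{\sum_{j \in J} \left(\alpha_j p_j(z)\right)^2}{p(z)} \le \log \sum_{z \in \mathcal{Z}} p(z) = 0. \qquad (5)$$
Here the first inequality in (5) follows from Jensen’s inequality, e.g. [4], and the second from $\sum_j (\alpha_j p_j(z))^2 \le p(z)^2$. Both hold with equality if and only if for each $z$ with positive $p(z)$, there is a unique $j$ such that $\alpha_j p_j(z) > 0$.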
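Lemma V.1 is easy to test numerically. This sketch (illustrative helper names; distributions are dictionaries from outcomes to probabilities) computes the gap $H(\alpha) + \sum_j \alpha_j H(Z_j) - H(Z)$, which the lemma asserts is nonnegative and zero exactly when the components have disjoint supports.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List

Dist = Dict[Hashable, float]


def entropy(dist: Dist) -> float:
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)


def mixture(alpha: List[float], components: List[Dist]) -> Dist:
    """Distribution of Z, which equals Z_j with probability alpha[j]."""
    out: Dist = defaultdict(float)
    for a, comp in zip(alpha, components):
        for z, q in comp.items():
            out[z] += a * q
    return dict(out)


def lemma_v1_gap(alpha: List[float], components: List[Dist]) -> float:
    """H(alpha) + sum_j alpha_j H(Z_j) - H(Z); nonnegative by Lemma V.1."""
    h_alpha = entropy(dict(enumerate(alpha)))
    h_parts = sum(a * entropy(c) for a, c in zip(alpha, components))
    return h_alpha + h_parts - entropy(mixture(alpha, components))


if __name__ == "__main__":
    print(lemma_v1_gap([0.5, 0.5], [{"a": 0.5, "b": 0.5}, {"b": 0.5, "c": 0.5}]))
    print(lemma_v1_gap([0.25, 0.75], [{"a": 1.0}, {"b": 0.5, "c": 0.5}]))
```

For two fair coins on $\{a, b\}$ and $\{b, c\}$ mixed half-and-half, the gap is $0.5$ bits (the mixture has entropy $1.5$ bits against $2$ on the right-hand side); with disjoint supports the gap vanishes.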
We now turn to prove Theorem 2. Recall our notation: let $U$ be the random variable corresponding to Alice’s message and $p_U$ its distribution (with entropy $nR$). Throughout we assume the message set $\mathcal{U}$ (the support of $U$) is of size at most $2^n$. Let $\mathcal{C} = \{\mathcal{C}(u) : u \in \mathcal{U}\}$ be Alice’s codebook; $\mathcal{C}$ is a collection of subsets of $\{0,1\}^n$. For each subset $\mathcal{C}(u)$, there is a corresponding codeword random variable $\mathbf{X}_u$ with codeword distribution $p_u$ over $\mathcal{C}(u)$. For any value $u$ of the message, Alice’s encoder chooses a codeword from $\mathcal{C}(u)$ randomly according to the distribution $p_u$. Alice’s message distribution $p_U$, codebook $\mathcal{C}$, and all the codeword distributions $p_u$ are known to both Bob and Calvin, but the values of the random variables $U$ and $\mathbf{X}_u$ are unknown to them. If $U = u$, then the transmitted codeword $\mathbf{X}$ has the probability distribution given by $p_u$. Let $p$ be the overall distribution of codewords of Alice. It holds that $p(\mathbf{x}) = \sum_{u \in \mathcal{U}} p_U(u)\, p_u(\mathbf{x})$.
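For any $u \in \mathcal{U}$ and $\mathbf{x} \in \mathcal{C}(u)$, let $p(u, \mathbf{x}) = p_U(u)\, p_u(\mathbf{x})$ denote the probability that Alice’s message is $u$ and she transmits $\mathbf{x}$.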
We start by specifying Calvin’s attack.
Calvin uses a very similar attack to the one described in the proof of Theorem 1.
That is, Calvin first passively waits until Alice transmits $\ell = (R - \varepsilon/2)n$ bits over the channel, where, as in the proof of Theorem 1, $R = 1 - 4p + \varepsilon$.
Let $\mathbf{x}^\ell$ be the value of the codeword observed so far.
He then considers the set of codewords consistent with the observed $\mathbf{x}^\ell$.
Here and throughout this section, we denote codewords by their corresponding message and index in $\mathcal{C}(u)$, i.e., as pairs $(u, \mathbf{x})$ with $\mathbf{x} \in \mathcal{C}(u)$.
As it may be that a codeword $\mathbf{x} \in \mathcal{C}(u)$ is exactly the same as a codeword $\mathbf{x}' \in \mathcal{C}(u')$ with $u \neq u'$, the sets in the definitions to follow and in this section are in a sense multisets.
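Namely, Calvin constructs the set $\mathcal{X}_{\mathbf{x}^\ell} = \{(u', \mathbf{x}') : u' \in \mathcal{U},\ \mathbf{x}' \in \mathcal{C}(u'),\ (x'_1, \dots, x'_\ell) = \mathbf{x}^\ell\}$. Let $p(\mathbf{x}^\ell)$ be the probability, under the probability distribution $p$, corresponding to the event that Calvin observes $\mathbf{x}^\ell$ in the first $\ell$ transmissions. Let $p_{\mathbf{x}^\ell}(u, \mathbf{x})$ and $p_{\mathbf{x}^\ell}(\mathbf{x})$ be the probability distributions $p(u, \mathbf{x})$ and $p(\mathbf{x})$, respectively, conditioned on the same event.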
Calvin then chooses an element $(u', \mathbf{x}') \in \mathcal{X}_{\mathbf{x}^\ell}$ with probability $p_{\mathbf{x}^\ell}(u', \mathbf{x}')$.
Recall that in the proof of Theorem 1, our goal was to prove that with some constant probability, the distance between $\mathbf{x}$ and $\mathbf{x}'$ is at most approximately $2pn$. Loosely speaking, this allows the success of Calvin’s attack (i.e., implies a decoding error). Following the same outline of proof, we now show that with a probability that does not decay exponentially in $n$, the codeword $(u', \mathbf{x}')$ chosen by Calvin has the following three properties:
• Its corresponding message differs from that corresponding to $\mathbf{x}$ (i.e., $u' \neq u$).
• $\mathbf{x}'$ is close to $\mathbf{x}$, and thus Calvin will be able to “push” $\mathbf{x}$ to a word $\mathbf{y}$ at approximately the same distance from $\mathbf{x}$ and $\mathbf{x}'$.
• Given $\mathbf{y}$, Bob is unable to distinguish whether $\mathbf{x}$ or $\mathbf{x}'$ was transmitted.
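To this end, we partition the set $\mathcal{X}_{\mathbf{x}^\ell}$ into disjoint subsets $\mathcal{S}_{i,j}$ for $i, j \in [k+1]$ (the parameter $k$ is specified below). For a subset $\mathcal{S} \subseteq \mathcal{X}_{\mathbf{x}^\ell}$, let $\beta(\mathcal{S})$ be its probability mass under $p_{\mathbf{x}^\ell}$, let $p_{\mathcal{S}}$ be the distribution $p_{\mathbf{x}^\ell}$ conditioned on the event that Alice transmitted a pair in $\mathcal{S}$, and let $H(U \mid \mathcal{S})$ denote the entropy of Alice’s message under $p_{\mathcal{S}}$ (so that $H(U \mid \mathcal{X}_{\mathbf{x}^\ell})$ is the entropy of the message given the prefix $\mathbf{x}^\ell$). The partition is obtained in two steps: first we partition $\mathcal{X}_{\mathbf{x}^\ell}$ into subsets $\mathcal{S}_1, \dots, \mathcal{S}_{k+1}$, then we partition each $\mathcal{S}_i$ into sets $\mathcal{S}_{i,1}, \dots, \mathcal{S}_{i,k+1}$. All in all, we prove the existence of a subset $\mathcal{S} = \mathcal{S}_{i,j}$ with the following properties:
• $\mathcal{S}$ is “large”, i.e., $\beta(\mathcal{S})$ is not too small.
• $H(U \mid \mathcal{S})$ is large with respect to $n$.
• For any $(u, \mathbf{x}) \in \mathcal{S}$ it holds that $p_{\mathbf{x}^\ell}(u, \mathbf{x})$ has approximately the same value.
• The distribution of Alice’s message under $p_{\mathcal{S}}$ is approximately uniform on its support.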
Roughly speaking, proving these properties for $\mathcal{S}$ reduces us to the case of a deterministic encoder (addressed in Theorem 1) and allows us to complete our proof.
We now present our proof for the existence of $\mathcal{S}$ as specified above. We first show that with positive probability the message distribution given $\mathbf{x}^\ell$ has high entropy.
Claim V.1
With probability at least $\varepsilon/4$, $H(U \mid \mathcal{X}_{\mathbf{x}^\ell}) \ge \varepsilon n/4$.
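Proof: Let $q$ be the probability distribution over $\{0,1\}^\ell$ for which $q(\mathbf{x}^\ell) = p(\mathbf{x}^\ell)$ for all possible $\mathbf{x}^\ell$. Mixing, with weights $q(\mathbf{x}^\ell)$, the conditional message distributions given each prefix yields $p_U$. Now using Lemma V.1 we obtain
$$H(U) \le H(q) + \sum_{\mathbf{x}^\ell} q(\mathbf{x}^\ell)\, H(U \mid \mathcal{X}_{\mathbf{x}^\ell}). \qquad (6)$$
By our definitions $H(U) = nR$. Moreover, $H(q) \le \ell$ (since $q$ is defined over an alphabet of size $2^\ell$). Thus (6) becomes
$$\sum_{\mathbf{x}^\ell} q(\mathbf{x}^\ell)\, H(U \mid \mathcal{X}_{\mathbf{x}^\ell}) \ge nR - \ell = \frac{\varepsilon n}{2}.$$
As the average of $H(U \mid \mathcal{X}_{\mathbf{x}^\ell})$ is at least $\varepsilon n/2$, then $H(U \mid \mathcal{X}_{\mathbf{x}^\ell}) \ge \varepsilon n/4$ with probability at least $\varepsilon/4$ (by a Markov-type inequality; here we use the fact that $H(U \mid \mathcal{X}_{\mathbf{x}^\ell}) \le \log|\mathcal{U}| \le n$).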
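We now define the sets $\mathcal{S}_i$. Let $k = 3n$. For $i \in [k]$, let $\mathcal{S}_i$ be the set of codewords $(u, \mathbf{x})$ in $\mathcal{X}_{\mathbf{x}^\ell}$ for which $p_{\mathbf{x}^\ell}(u, \mathbf{x})$ is in the range $(2^{-i}, 2^{-i+1}]$. The set $\mathcal{S}_{k+1}$ is defined to be the set of codewords in $\mathcal{X}_{\mathbf{x}^\ell}$ for which $p_{\mathbf{x}^\ell}(u, \mathbf{x})$ is in the range $[0, 2^{-k}]$; since $\mathcal{X}_{\mathbf{x}^\ell}$ contains at most $|\mathcal{U}| \cdot 2^n \le 2^{2n}$ pairs, its mass is at most $2^{-n}$. Let $\beta_i$ be the probability mass of $\mathcal{S}_i$. Namely $\beta_i = \sum_{(u, \mathbf{x}) \in \mathcal{S}_i} p_{\mathbf{x}^\ell}(u, \mathbf{x})$. Let $\beta$ be the distribution over $[k+1]$ taking $i$ w.p. $\beta_i$. Notice that $H(\beta) \le \log(k+1)$ (as its support is of size $k+1$). Conditioning on Claim V.1 and using Lemma V.1 it can be verified that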
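Claim V.2
$$\sum_{i=1}^{k+1} \beta_i\, H(U \mid \mathcal{S}_i) \ge \frac{\varepsilon n}{4} - \log(k+1). \qquad (7)$$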
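Consider sets $\mathcal{S}_i$ with (relative) mass $\beta_i < \varepsilon/(16(k+1))$. It holds that
$$\sum_{i : \beta_i < \varepsilon/(16(k+1))} \beta_i\, H(U \mid \mathcal{S}_i) \le (k+1) \cdot \frac{\varepsilon}{16(k+1)} \cdot n = \frac{\varepsilon n}{16}.$$
The above follows from the fact that $H(U \mid \mathcal{S}_i) \le \log|\mathcal{U}| \le n$. Combined with (7), the remaining sets satisfy $\sum_{i : \beta_i \ge \varepsilon/(16(k+1))} \beta_i\, H(U \mid \mathcal{S}_i) \ge 3\varepsilon n/16 - \log(k+1) \ge \varepsilon n/8$ for sufficiently large $n$. Here we use the fact that $\log(k+1) = \log(3n+1) = o(n)$.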
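We conclude the existence of a set $\mathcal{S}_i$, with $i \le k$, such that $\beta_i \ge \varepsilon/(16(k+1))$ and $H(U \mid \mathcal{S}_i) \ge \varepsilon n/8$. We now further partition $\mathcal{S}_i$. For each message $u$ appearing in $\mathcal{S}_i$, let $\gamma_i(u)$ be the probability, under $p_{\mathcal{S}_i}$, that Alice’s message is $u$. For $j \in [k]$, let $\mathcal{S}_{i,j}$ be the set of codewords $(u, \mathbf{x})$ in $\mathcal{S}_i$ for which $\gamma_i(u)$ is in the range $(2^{-j}, 2^{-j+1}]$. $\mathcal{S}_{i,k+1}$ is defined to be the set of codewords in $\mathcal{S}_i$ for which $\gamma_i(u)$ is in the range $[0, 2^{-k}]$. Let $\beta_{i,j}$ be the probability mass of $\mathcal{S}_{i,j}$ under $p_{\mathcal{S}_i}$. Namely $\beta_{i,j} = \sum_{u : (u, \cdot) \in \mathcal{S}_{i,j}} \gamma_i(u)$. Let $\beta^{(i)}$ be the distribution over $[k+1]$ taking $j$ w.p. $\beta_{i,j}$. Notice that $H(\beta^{(i)}) \le \log(k+1)$ (as its support is of size $k+1$). As before, conditioning on Claim V.2 and using Lemma V.1 it can be verified that (for the index $i$ specified above),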
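Claim V.3
$$\sum_{j=1}^{k+1} \beta_{i,j}\, H(U \mid \mathcal{S}_{i,j}) \ge \frac{\varepsilon n}{8} - \log(k+1). \qquad (8)$$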
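Again, consider sets $\mathcal{S}_{i,j}$ with mass $\beta_{i,j} < \varepsilon/(32(k+1))$. It holds that
$$\sum_{j : \beta_{i,j} < \varepsilon/(32(k+1))} \beta_{i,j}\, H(U \mid \mathcal{S}_{i,j}) \le \frac{\varepsilon n}{32},$$
so by (8) the remaining sets satisfy $\sum_{j : \beta_{i,j} \ge \varepsilon/(32(k+1))} \beta_{i,j}\, H(U \mid \mathcal{S}_{i,j}) \ge 3\varepsilon n/32 - \log(k+1) \ge \varepsilon n/16$ for sufficiently large $n$. We conclude the existence of a set $\mathcal{S} = \mathcal{S}_{i,j}$, with $j \le k$ (the mass of $\mathcal{S}_{i,k+1}$ is at most $|\mathcal{U}| \cdot 2^{-k} \le 2^{-2n}$), such that
• $\beta(\mathcal{S}) = \beta_i \beta_{i,j} \ge \varepsilon^2 / (512(k+1)^2)$.
• $H(U \mid \mathcal{S}) \ge \varepsilon n/16$.
• For any $(u, \mathbf{x}) \in \mathcal{S}$ it holds that $p_{\mathbf{x}^\ell}(u, \mathbf{x})$ is approximately $2^{-i}$ (it lies in $(2^{-i}, 2^{-i+1}]$).
• For any $(u, \mathbf{x}) \in \mathcal{S}$ it holds that $\gamma_i(u)$ is approximately equal (it lies in $(2^{-j}, 2^{-j+1}]$).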
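The two-step partition can be sketched in a few lines. Assuming the conditional distribution $p_{\mathbf{x}^\ell}$ is given as a dictionary over pairs $(u, \mathbf{x})$, the first step buckets pairs by their own probability into dyadic ranges $(2^{-i}, 2^{-i+1}]$, and the second step buckets the messages inside each class by their conditional mass $\gamma_i(u)$; the helper names are ours. Within every class with $i \le k$, pair probabilities differ by less than a factor of $2$, which is the sense of “approximately equal” above.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Pair = Tuple[Hashable, Hashable]          # (message u, codeword x)


def dyadic_class(q: float, k: int) -> int:
    """i in 1..k with q in (2^-i, 2^-(i-1)], or k + 1 if q <= 2^-k."""
    if q <= 2.0 ** -k:
        return k + 1
    return math.floor(-math.log2(q)) + 1


def two_step_partition(p_pairs: Dict[Pair, float],
                       k: int) -> Dict[Tuple[int, int], Dict[Pair, float]]:
    """Step 1: bucket pairs by their own probability (classes S_i).
    Step 2: inside each S_i, bucket messages u by gamma_i(u), the
    conditional mass of u within S_i (classes S_{i,j})."""
    step1: Dict[int, Dict[Pair, float]] = defaultdict(dict)
    for pair, q in p_pairs.items():
        step1[dyadic_class(q, k)][pair] = q
    parts: Dict[Tuple[int, int], Dict[Pair, float]] = {}
    for i, S_i in step1.items():
        beta_i = sum(S_i.values())
        gamma: Dict[Hashable, float] = defaultdict(float)
        for (u, _), q in S_i.items():
            gamma[u] += q / beta_i
        for (u, x), q in S_i.items():
            parts.setdefault((i, dyadic_class(gamma[u], k)), {})[(u, x)] = q
    return parts


if __name__ == "__main__":
    rng = random.Random(5)
    w = {(u, x): rng.random() ** 3 for u in range(20) for x in range(5)}
    total = sum(w.values())
    parts = two_step_partition({pr: v / total for pr, v in w.items()}, k=30)
    for (i, j), S in sorted(parts.items()):
        print(i, j, len(S), round(sum(S.values()), 4))
```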
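The set $\mathcal{S}$ is exactly what we are looking for. Roughly speaking, by Claim V.1, with probability at least $\varepsilon/4$ Calvin views a prefix $\mathbf{x}^\ell$ for which $H(U \mid \mathcal{X}_{\mathbf{x}^\ell}) \ge \varepsilon n/4$. Conditioning on this event, both Alice and Calvin choose codewords $(u, \mathbf{x})$, $(u', \mathbf{x}')$ in $\mathcal{S}$ with probability at least $(\beta_i \beta_{i,j})^2$.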
We now sketch the remainder of the proof, which closely follows that of Theorem 1. We partition $\mathcal{S}$ into groups, one for each message $u$, consisting of all codewords in $\mathcal{S}$ corresponding to $u$. Recall that each codeword has approximately the same probability $2^{-i}$, and for each $u$ it holds that $\gamma_i(u)$ is approximately the same value. This implies that each group has approximately the same size. Moreover, as $H(U \mid \mathcal{S}) \ge \varepsilon n/16$, it holds that there are at least $2^{\varepsilon n/16}$ nonempty groups in $\mathcal{S}$.
So, all in all, $\mathcal{S}$ has a very symmetric structure: it includes many groups, each consisting of elements with approximately the same transmission probability, and each of approximately the same size and mass (w.r.t. $p_{\mathbf{x}^\ell}$). This reduces us to the case considered in Theorem 1, in which our subset included many messages, each with the same probability; details follow.
Consider the graph $G$ in which the vertex set consists of the set $\mathcal{S}$ and two nodes are connected by an edge if their Hamming distance is less than $(2p - \varepsilon/8)n$.
Now, it can be verified (using analysis almost identical to that given in the proof of Theorem 1) that
• With probability at least $1 - 2^{-\Omega(\varepsilon n)}$ the codewords $(u, \mathbf{x})$ and $(u', \mathbf{x}')$ satisfy $u \neq u'$. Here one needs to take into consideration the slight difference in the group sizes and the probabilities for each codeword.
• With probability $\Omega(\varepsilon/p)$ the vertices in $G$ corresponding to $(u, \mathbf{x})$ and $(u', \mathbf{x}')$ are connected by an edge.
• During Calvin’s random bit-flip process, with probability $1 - 2^{-\Omega(\varepsilon^2 n)}$, Calvin does not “run out” of his budget of bit-flips.
• Conditioning on the above, Bob cannot distinguish between the case in which $\mathbf{x}$ or $\mathbf{x}'$ was transmitted.
• Finally, on the pair of events in which Alice transmits $(u, \mathbf{x})$ and Calvin chooses $(u', \mathbf{x}')$, and in which Alice transmits $(u', \mathbf{x}')$ and Calvin chooses $(u, \mathbf{x})$, no matter which decoding process Bob uses, he has an average decoding error that is bounded away from zero. Here again we take into account the slight differences between $p_{\mathbf{x}^\ell}(u, \mathbf{x})$ and $p_{\mathbf{x}^\ell}(u', \mathbf{x}')$.
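To summarize, for fixed $p$ and $\varepsilon$, Calvin causes a decoding error with probability at least inverse-polynomial in $n$, and in particular not exponentially small, as desired. This concludes our proof.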
VI Conclusions
We analyze the capacity of the causal-adversarial channel and show (for both deterministic and probabilistic encoders) that the capacity is bounded from above by $\min\{1 - H(p), (1 - 4p)^+\}$. For a large range of $p$ (for all $p \ge 1/4$), the maximum achievable rate equals that of the stronger classical “omniscient” adversarial model (i.e., $0$).
Several questions remain open. In this work we do not address achievability results (i.e., the construction of codes). It would be very interesting to obtain codes for the causal-adversary channel which achieve rate greater than that known for the “omniscient” adversarial model (i.e., the Gilbert-Varshamov bound $1 - H(2p)$) for $0 < p < 1/4$. As we do not believe that the upper bound of $\min\{1 - H(p), (1 - 4p)^+\}$ presented in this work is actually tight, such codes, if they exist, may give a hint to the correct capacity.
As done in our work on large alphabets [6], one may also consider the more general channel model in which, for a delay parameter $\delta$, the jammer’s decision on the corruption of $x_i$ must depend solely on $x_j$ for $j \le i - \delta$. This might correspond to the scenario in which the error transmission of the adversarial jammer is delayed due to certain computational tasks that the adversary needs to perform. The capacity of the causal channel with delay is an intriguing problem left open in this work.
Footnotes
 The work of B. K. Dey was supported by Bharti Centre for Communication in IIT Bombay, that of M. Langberg was supported in part by ISF grant 480/08, and that of S. Jaggi was partially supported by MSCUJL grants.
 This definition is motivated by the extensive literature on error exponents in information theory: for large classes of information-theoretic problems, e.g. [9, 5], the probability of error of the coding scheme is required to decay exponentially in block length.
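 In fact, $|\mathcal{C}|$ may be smaller; however, we note that for codes of optimal rate, $\mathcal{C}$ is of size exactly $2^{nR}$. If $|\mathcal{C}| < 2^{nR}$, then for some transmitted codeword $\mathbf{x}$ at least two messages $u$ and $u'$ must both be encoded to $\mathbf{x}$. On receiving $\mathbf{x}$, Bob cannot tell $u$ from $u'$, so his probability of error on these two messages is at least $1/2$ on average. Therefore changing the codebook so as to encode $u'$ as some codeword not in $\mathcal{C}$ cannot increase the probability of decoding error.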
 This is one significant difference from the attack in the proof of Theorem 1: there Calvin chooses each $\mathbf{x}'$ uniformly at random from the corresponding consistent set.
References
 [1] D. Blackwell, L. Breiman, and A. J. Thomasian. The capacities of certain channel classes under random coding. The Annals of Mathematical Statistics, 31(3):558–567, 1960.
 [2] A. E. Brouwer. Bounds on the size of linear codes. In V. S. Pless and W. C. Huffman, editors, Handbook of Coding Theory