
# A Tamper-Free Semi-Universal Communication System for Deletion Channels

Shahab Asoodeh (Computation Institute and Institute of Genomics and Systems Biology, The University of Chicago, Chicago, IL 60637; shahab@uchicago.edu), Yi Huang (The University of Chicago, Chicago, IL; yhuang10@uchicago.edu), and Ishanu Chattopadhyay (Computation Institute, The University of Chicago, Chicago, IL; ishanu@uchicago.edu)
###### Abstract

We investigate the problem of reliable communication between two legitimate parties over deletion channels under an active eavesdropping (aka jamming) adversarial model. To this end, we develop a theoretical framework based on probabilistic finite-state automata to define novel encoding and decoding schemes that ensure small error probability in both message decoding and tamper detection. We then experimentally verify the reliability and tamper-detection properties of our scheme.

## I Introduction

The deletion channel is the simplest point-to-point communication channel that models synchronization errors. In its simplest form, each input bit is either deleted independently with probability $\delta$ or transmitted over the channel noiselessly. As a result, the length of the channel output is a random variable depending on $\delta$. Surprisingly, the capacity of the deletion channel remains one of the outstanding open problems in information theory [1]. A random coding argument for proving a Shannon-like capacity result for the deletion channel (and, in general, for all channels with synchronization errors) was given by Dobrushin [2], which was recently improved by Kirsch and Drinea [3] to derive several lower bounds. Readers interested in the most recent results on deletion channels are referred to the survey by Mitzenmacher [4], which provides a useful history and an overview of known results.

As the problem of computing the capacity of deletion channels is infamously hard, we focus on a different problem: the behavior of the deletion channel under an active eavesdropper attack. Secrecy models in the information theory literature, initiated by Yamamoto [5], assume that there exists a passive eavesdropper who can observe the symbols being transmitted over the channel. The objective is to design a pair of (randomized) encoder and decoder such that the message is decoded with asymptotically vanishing error probability at the legitimate receiver while ensuring that the eavesdropper gains negligible information about the message. In all secrecy models (see, e.g., [6, 7, 8, 9, 10, 11, 12]), the crucial assumption is that the eavesdropper can neither jam the communication channel between the legitimate parties nor modify any messages exchanged between them. However, in many practical scenarios, the eavesdropper can potentially change the channel, for instance, by adding stronger noise to change the crossover probability of a binary symmetric channel or the deletion probability of a deletion channel.

In our adversarial model, we assume that two parties (say Alice and Bob) wish to communicate over a public deletion channel while an eavesdropper (say Eve) can potentially tamper with the statistics of the channel. We focus on the deletion channel and assume that Eve can have additional bits deleted, hence increasing the deletion probability of the channel. The objective is to allow reliable communication between Alice and Bob (with vanishing error probability) regardless of the eavesdropper's action. To this end, we design (i) a randomized encoder using probabilistic finite-state automata which, given a fixed message, generates a random vector as the channel input, and (ii) a decoder which produces an estimate of the message only when the channel is not tampered with. In case the channel is indeed tampered with, the decoder can declare it with asymptotically small Type I and Type II error probabilities. It is worth mentioning that the rate of our scheme is (almost) zero, and hence we do not intend to study the capacity of deletion channels.

Unlike classical channel coding, where the set of all possible channel inputs (aka the codebook) must be available at the decoder, our scheme requires only that the set of PFSAs used in the encoder be available at the decoder. This model, which we call semi-universal, is contrasted with universal channel coding [13], where neither the channel statistics nor the codebook is known and the decoder is required to find the pattern of the message.

The rest of the paper is organized as follows. In Section II, we briefly discuss the notion of PFSA and the properties required for our scheme. Section III specifies the channel model, encoder, decoder, and different error events. In Section IV, we discuss the effects of deletion channels on PFSAs. Section V concerns the theoretical aspects of our coding scheme, and Section VI contains several experimental results.

Notation: We use calligraphic uppercase letters for sets (e.g., $\mathcal{S}$), sans-serif font for functions (e.g., $\mathsf{T}$), uppercase letters for matrices (e.g., $P$), and bold lowercase letters for vectors (e.g., $\mathbf{p}$). Throughout, we use $g$ to denote a PFSA and $s$ and $x$ to denote its state and symbol, respectively. We write $x^n$, or interchangeably $\mathbf{x}$ if its size is clear from context, for a sequence of symbols $x_1x_2\cdots x_n$. Also, $(\mathbf{v})_i$ denotes the $i$th entry of the vector $\mathbf{v}$, and $(A)_{i,\cdot}$ and $(A)_{\cdot,j}$ denote the $i$th row and $j$th column of the matrix $A$, respectively. We use $(\mathbf{v}_x)_{x\in\mathcal{X}}$ to denote a vector with the entry indexed by $x$ and $(A_x)_{x\in\mathcal{X}}$ a matrix with the column indexed by $x$. Finally, $\bar{a}\coloneqq 1-a$.

## II Probabilistic finite state automata

In this section, we introduce a new measure of similarity between sequences. To do this, we first need to define probabilistic finite-state automata (PFSA).

###### Definition 1 (PFSA).

A probabilistic finite-state automaton is a quadruple $g=(\mathcal{S},\mathcal{X},\mathsf{T},\mathsf{P})$, where $\mathcal{S}$ is a finite state space, $\mathcal{X}$ is a finite alphabet with $|\mathcal{X}|\geq 2$, $\mathsf{T}:\mathcal{S}\times\mathcal{X}\to\mathcal{S}$ is the state transition function, and $\mathsf{P}:\mathcal{S}\times\mathcal{X}\to[0,1]$ specifies the conditional distribution of generating a symbol conditioned on the state.

In fact, a PFSA is a directed graph with a finite number of vertices (i.e., states) and directed edges emanating from each vertex. An edge from state $s_i$ to state $s_j$ is specified by two labels: (i) a symbol $x$ that updates the current state from $s_i$ to $s_j$, that is, $\mathsf{T}(s_i,x)=s_j$, and (ii) the probability $\mathsf{P}(s_i,x)$ of generating $x$ when the system resides in state $s_i$. For instance, in the PFSA described in Fig. 1, $\mathsf{T}(s_1,0)=s_1$ and $\mathsf{P}(s_1,0)=0.3$; thus, the system residing in state $s_1$ stays in state $s_1$ with probability $0.3$ and generates symbol $0$. Clearly, $\sum_{x\in\mathcal{X}}\mathsf{P}(s,x)=1$ for all $s\in\mathcal{S}$.

Given two symbols $x$ and $x'$, one can define the transition function for the concatenation $xx'$ as $\mathsf{T}(s,xx')=\mathsf{T}(\mathsf{T}(s,x),x')$. Letting $\mathcal{X}^*$ denote the set of all possible concatenations of finitely many symbols from $\mathcal{X}$, one can easily proceed to define $\mathsf{T}(s,\mathbf{x})$ as above for each $s\in\mathcal{S}$ and $\mathbf{x}\in\mathcal{X}^*$. We say that a PFSA is strongly connected if for any pair of distinct states $s$ and $s'$, there exists a sequence $\mathbf{x}\in\mathcal{X}^*$ such that $\mathsf{T}(s,\mathbf{x})=s'$. Let $\mathcal{A}$ be the set of all strongly connected PFSAs. The significance of strongly connected PFSAs is that their corresponding Markov chains (i.e., the Markov chain with state space $\mathcal{S}$ and transition matrix whose $(i,j)$ entry is the total probability of moving from $s_i$ to $s_j$) have a unique stationary distribution (thus the initial state can be assumed to be irrelevant).
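Strong connectivity can be checked directly on the underlying transition graph. The sketch below (our own illustrative Python, with a dictionary encoding of the transition function read off the Fig. 1 machine whose Γ-expression appears in the next subsection) tests reachability from every state:

```python
from collections import deque

# Transition function T of the Fig. 1 PFSA, encoded as T[state][symbol] = next state.
# (This encoding is read off the Gamma matrices given below; treat it as an
# illustrative assumption about Fig. 1.)
T = {
    1: {0: 1, 1: 2},
    2: {0: 3, 1: 4},
    3: {0: 1, 1: 1},
    4: {0: 3, 1: 4},
}

def reachable(T, s):
    """Return the set of states reachable from s via some finite symbol sequence."""
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in T[u].values():
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def strongly_connected(T):
    """A PFSA is strongly connected iff every state reaches every state."""
    states = set(T)
    return all(reachable(T, s) == states for s in states)
```

With this encoding, the Fig. 1 machine is strongly connected, so it belongs to the class of PFSAs considered above.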

###### Definition 2 (Γ-expression for PFSA).

We notice that a PFSA $g$ is uniquely determined by the matrices $\{\Gamma_{g,x}\}_{x\in\mathcal{X}}$ given by

$$(\Gamma_{g,x})_{i,j}=\begin{cases}\mathsf{P}_g(s_i,x),&\mathsf{T}_g(s_i,x)=s_j,\\0,&\text{otherwise.}\end{cases}$$

The state-to-state transition matrix $P_g$ is defined as

$$P_g=\sum_{x\in\mathcal{X}}\Gamma_{g,x},\qquad(1)$$

and the state-to-symbol transition matrix $\tilde{P}_g$ is given by

$$\tilde{P}_g=\big(\Gamma_{g,x}\mathbf{1}_{|\mathcal{S}|}\big)_{x\in\mathcal{X}},$$

where $\mathbf{1}_{|\mathcal{S}|}$ is the length-$|\mathcal{S}|$ all-one vector.

For the PFSA illustrated in Fig. 1, we have

$$\Gamma_{g,0}=\begin{pmatrix}.3&0&0&0\\0&0&.6&0\\.8&0&0&0\\0&0&.5&0\end{pmatrix},\qquad\Gamma_{g,1}=\begin{pmatrix}0&.7&0&0\\0&0&0&.4\\.2&0&0&0\\0&0&0&.5\end{pmatrix},$$

$$P_g=\begin{pmatrix}.3&.7&0&0\\0&0&.6&.4\\.8&.2&0&0\\0&0&.5&.5\end{pmatrix},\qquad\text{and}\qquad\tilde{P}_g=\begin{pmatrix}.3&.7\\.6&.4\\.8&.2\\.5&.5\end{pmatrix}.$$
###### Definition 3 (Generalized PFSA).

A generalized PFSA is a PFSA whose matrices $\Gamma_{g,x}$ can have more than one non-zero (positive) entry per row. In this case, we still have

$$\big(\Gamma_{g,x}\mathbf{1}_{|\mathcal{S}|}\big)_i=\mathsf{P}_g(s_i,x).$$

However, $\mathsf{T}_g(s_i,x)$ might not be deterministic; instead, it is a probability distribution over the states.

Shannon [14] appears to be the first to have made use of PFSAs to describe stationary and ergodic sources. Given $g\in\mathcal{A}$, first a state $s_{i_0}$ is chosen randomly according to the stationary distribution; then a symbol $x_1$ is generated with probability $\mathsf{P}(s_{i_0},x_1)$, which takes the system from state $s_{i_0}$ to state $s_{i_1}=\mathsf{T}(s_{i_0},x_1)$. A new symbol $x_2$ is then generated with probability $\mathsf{P}(s_{i_1},x_2)$. Letting this process run for $n$ time units, we obtain a sequence $x^n=x_1x_2\cdots x_n$. In this case, we say that $x^n$ is a realization of $g$. According to Shannon, each state captures the "residue of influence" of the preceding symbols on the system.

For $\mathbf{x}\in\mathcal{X}^*$, we write $\mathbf{x}\leftarrow g$ to denote the fact that $g$ generates $\mathbf{x}$.
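Shannon's generative process described above is straightforward to simulate. The following Python sketch (the names and the power-iteration shortcut for the stationary distribution are our own conveniences, not from the paper) samples a realization from a Γ-expression:

```python
import random

def stationary(P, iters=500):
    """Approximate the stationary distribution of a strongly connected machine
    by power iteration p^T <- p^T P (a convenience shortcut)."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p

def realization(Gammas, n, rng):
    """Sample x^n from the PFSA with Gamma-expression `Gammas` (one matrix per
    symbol): draw the initial state from the stationary distribution, then
    repeatedly pick an outgoing edge (symbol, next state) with its probability."""
    size = len(Gammas[0])
    P = [[sum(G[i][j] for G in Gammas) for j in range(size)] for i in range(size)]
    s = rng.choices(range(size), weights=stationary(P))[0]
    out = []
    for _ in range(n):
        edges = [(x, j) for x in range(len(Gammas)) for j in range(size)
                 if Gammas[x][s][j] > 0]
        weights = [Gammas[x][s][j] for x, j in edges]
        x, s = rng.choices(edges, weights=weights)[0]
        out.append(x)
    return out
```

Drawing the initial state from the stationary distribution makes the generated process stationary, matching Shannon's description.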

## III System Model and Setup

Suppose Alice has a message $M$ which takes values in a finite set $\mathcal{M}$ and seeks to transmit it reliably to Bob over a deletion channel $W(\delta)$ with deletion probability $\delta$. The communication channel is assumed to be public, that is, an active eavesdropper, say Eve, can access and possibly tamper with the channel. For simplicity, we assume that Eve may delete extra bits, thus changing the channel from $W(\delta)$ to $W(\delta')$ with $\delta'>\delta$.

The objective is to design a pair of encoder and decoder that enables Alice and Bob to communicate reliably over $W(\delta)$, with Bob accepting a decoded message only when he is assured that the channel has not been tampered with. In classical information theory, the decoder must be tuned to the channel statistics. Hence, reliable communication occurs only when Bob knows the deletion probability $\delta$. However, Eve might have tampered with the channel and increased the deletion probability to $\delta'$, and since Bob's decoding policy was tuned to $\delta$, this might cause a decoding error, regardless of Bob's decoding algorithm. Therefore, the reliability of the decoding must always be conditioned on the fact that the channel has not been tampered with during communication.

Motivated by this observation, we propose the following coding scheme. We first propose a two-step encoder: each message $m$ is first sent to a function $\phi$ which maps it to a PFSA $g_m\in\mathcal{A}$; then another function $\sigma$ generates a realization $x^n$ of the PFSA $g_m$ and sends it over the memoryless channel $W(\delta)$. Therefore, the encoder is the composition $\sigma\circ\phi$ (see Fig. 2). Unlike the classical setting, Bob need not know the set of all channel inputs for each $m$ (aka the codebook). Instead, we assume Bob knows the set $\{g_m\}_{m\in\mathcal{M}}$ (thus the name semi-universal scheme). The output of the channel is an $\mathcal{X}$-valued random vector whose length is a binomial random variable (corresponding to how many elements of $x^n$ are deleted). Upon receiving the channel output, Bob applies the decoder $\psi$ to generate $(\hat{M},T)$, where $\hat{M}$ is an estimate of Alice's message and $T\in\{0,1\}$ specifies whether or not the channel has been tampered with. He then declares $\hat{M}$ as the message only when $T=0$. Therefore, the goal is to design $(\phi,\sigma,\psi)$ such that for sufficiently large $n$

$$\Pr(T=0\,|\,\text{channel is tampered})+\Pr(T=1\,|\,\text{channel is not tampered})<\varepsilon,\qquad(2)$$

and simultaneously

$$\Pr(M\neq\hat{M}\,|\,T=0)\leq\varepsilon,\qquad(3)$$

for any uniformly chosen message $M\in\mathcal{M}$. We say that reliable tamper-free communication is possible if (2) and (3) hold simultaneously for any $\varepsilon>0$.

## IV PFSA through deletion channel

In this section, we study the channel's effect on PFSAs by monitoring the change in the likelihood of a sequence being generated by a PFSA at the channel output. To do this, we first study the likelihood when $\delta=0$ in Section IV-A, and then move on to the case of positive $\delta$ in Section IV-B. One of the main results in this section is to show that the output of $W(\delta)$ (i.e., the deleted sequence) can be equivalently generated by a generalized PFSA $g(\delta)$ whose $\Gamma$-expression and state-to-state transition matrix follow simple closed forms (cf. Theorem 1). In Section IV-C, we discuss some basic properties of $g(\delta)$ that will be useful for later development. We conclude this section by introducing the class M2 of PFSAs, which is closed under deletion. For notational brevity, we drop the subscript $g$ when it is clearly understood from context.

### IV-A PFSA over W(0): no deletion

Let a sequence of symbols $x^n=x_1x_2\cdots x_n$ be given. We define $p_g(x^n)$ (or simply $p(x^n)$) to be the probability that $g$ generates $x^n$. Then we have

$$p(x^n)=p(x_1)\,p(x_2|x^1)\cdots p(x_n|x^{n-1}),$$

where $p(x_i|x^{i-1})$ is the conditional probability of $g$ generating $x_i$ given that $g$ has generated $x^{i-1}$. It is clear from Section II that

$$\begin{aligned}\mathbf{p}_0&=\mathbf{p}_g,\\ p(x_1)&=(\mathbf{p}_0^T\tilde{P})_{x_1},&\mathbf{p}_1^T&=\frac{\mathbf{p}_0^T\Gamma_{x_1}}{\|\mathbf{p}_0^T\Gamma_{x_1}\|_1},\\ p(x_2|x^1)&=(\mathbf{p}_1^T\tilde{P})_{x_2},&\mathbf{p}_2^T&=\frac{\mathbf{p}_1^T\Gamma_{x_2}}{\|\mathbf{p}_1^T\Gamma_{x_2}\|_1},\\ &\;\;\vdots\\ p(x_{n-1}|x^{n-2})&=(\mathbf{p}_{n-2}^T\tilde{P})_{x_{n-1}},&\mathbf{p}_{n-1}^T&=\frac{\mathbf{p}_{n-2}^T\Gamma_{x_{n-1}}}{\|\mathbf{p}_{n-2}^T\Gamma_{x_{n-1}}\|_1},\end{aligned}\qquad(4)$$

and finally, $p(x_n|x^{n-1})=(\mathbf{p}_{n-1}^T\tilde{P})_{x_n}$, where $(\cdot)^T$ denotes matrix transpose.

It is clear from the above update rule that any sequence $\mathbf{x}$ induces two probability distributions: one on the state space $\mathcal{S}$ and the other one on $\mathcal{X}$. Let us denote the former by $\mathbf{p}_g(\mathbf{x})$ and the latter by $p(\cdot|\mathbf{x})$. The update rules in (4) imply that $\mathbf{p}_g^T(\mathbf{x}x)=\mathbf{p}_g^T(\mathbf{x})\Gamma_{g,x}/\|\mathbf{p}_g^T(\mathbf{x})\Gamma_{g,x}\|_1$ and $p(x|\mathbf{x})=(\mathbf{p}_g^T(\mathbf{x})\tilde{P}_g)_x$. More precisely, since

$$\|\mathbf{p}_g^T(\mathbf{x})\Gamma_{g,x}\|_1=\mathbf{p}_g^T(\mathbf{x})\Gamma_{g,x}\mathbf{1}_{|\mathcal{S}|}=\mathbf{p}_g^T(\mathbf{x})(\tilde{P}_g)_{\cdot,x}=(\mathbf{p}_g^T(\mathbf{x})\tilde{P}_g)_x=p(x|\mathbf{x}),$$

we have

$$p(x|\mathbf{x})\,\mathbf{p}_g^T(\mathbf{x}x)=\mathbf{p}_g^T(\mathbf{x})\Gamma_{g,x}.\qquad(5)$$

We also call $p(\cdot|\mathbf{x})$ the symbolic derivative of $g$ induced by $\mathbf{x}$.
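The update rule (4)/(5) yields a simple streaming computation of $p_g(x^n)$: propagate the state distribution through $\Gamma_{x_i}$ and read off each conditional probability as the normalizer. A Python sketch (assuming the sequence has positive probability under $g$):

```python
def sequence_probability(Gammas, p0, xs):
    """p_g(x^n) via the update rule (4)/(5): push the state distribution through
    Gamma_{x_i}; the normalizer at step i is exactly p(x_i | x^{i-1}).
    Assumes xs has positive probability under the machine."""
    p, prob = list(p0), 1.0
    for x in xs:
        q = [sum(p[i] * Gammas[x][i][j] for i in range(len(p)))
             for j in range(len(p))]
        cond = sum(q)          # p(x_i | x^{i-1}), by (5)
        prob *= cond
        p = [v / cond for v in q]
    return prob
```

Since each step is a single vector-matrix product, the whole sequence probability is computed in one pass over $x^n$.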

### IV-B PFSA over W(δ): deletion with probability δ>0

Now we move forward to investigate the effect of the deletion probability $\delta$ on PFSA transmission. The following result is a key step in our analysis.

###### Theorem 1.

Let $\mathbf{x}\leftarrow g$ be a channel input and $\mathbf{y}$ be a channel output under positive deletion probability $\delta$. Then $\mathbf{y}\leftarrow g(\delta)$, where $g(\delta)$ is a generalized PFSA identified by $\Gamma_{g(\delta),x}=Q(P_g,\delta)\Gamma_{g,x}$ for all $x\in\mathcal{X}$, where $P_g$ is the state-to-state transition matrix of $g$ and $Q(P_g,\delta)$ is as defined in (6).

###### Proof.

Assume Bob has observed $x^{i-1}$ at the channel output. Then we have

$$\begin{aligned}p(x_i|x^{i-1})&=(1-\delta)(\mathbf{p}_{i-1}^T\tilde{P})_{x_i}+\delta(1-\delta)(\mathbf{p}_{i-1}^TP\tilde{P})_{x_i}+\delta^2(1-\delta)(\mathbf{p}_{i-1}^TP^2\tilde{P})_{x_i}+\cdots\\&=(1-\delta)\Big(\mathbf{p}_{i-1}^T\Big(\sum_{k=0}^{\infty}\delta^kP^k\Big)\tilde{P}\Big)_{x_i}\\&=\big(\mathbf{p}_{i-1}^TQ(P,\delta)\tilde{P}\big)_{x_i},\end{aligned}$$

where

$$Q(P,\delta)=(1-\delta)\sum_{k=0}^{\infty}\delta^kP^k=(1-\delta)(I-\delta P)^{-1}.\qquad(6)$$

Analogous to (4), we can define the following distribution induced on $\mathcal{S}$:

$$\mathbf{p}_i^T=\frac{\mathbf{p}_{i-1}^TQ(P,\delta)\Gamma_{x_i}}{\|\mathbf{p}_{i-1}^TQ(P,\delta)\Gamma_{x_i}\|_1}.\qquad(7)$$

Comparing (7) with expressions in (4), the result follows. ∎

###### Remark 1.

Notice that while the row-stochastic matrix $P$ may not be invertible, $I-\delta P$ is non-singular for all $\delta<1$, as the eigenvalues of $\delta P$ have modulus at most $\delta<1$. Moreover, it is clear from (6) that $Q(P,\delta)$ is also a row-stochastic matrix with $\mathbf{1}$ being its eigenvector corresponding to eigenvalue one. We will take a closer look at the eigenvalues of $Q(P,\delta)$ in the next section.
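Equation (6) can be evaluated either through the matrix inverse or through the geometric series; the sketch below uses the truncated series (pure Python, illustrative only) and lets one check the row-stochasticity noted in the remark:

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def q_matrix(P, delta, terms=200):
    """Q(P, delta) = (1 - delta) * sum_k delta^k P^k, Eq. (6), truncated after
    `terms` powers; the series converges geometrically for delta < 1."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    Pk = [[float(i == j) for j in range(n)] for i in range(n)]  # P^0 = I
    coef = 1.0                                                  # delta^0
    for _ in range(terms):
        for i in range(n):
            for j in range(n):
                Q[i][j] += coef * Pk[i][j]
        Pk = mat_mul(Pk, P)
        coef *= delta
    return [[(1 - delta) * q for q in row] for row in Q]
```

For a symmetric two-state chain the closed form $(1-\delta)(I-\delta P)^{-1}$ is easy to verify by hand, which gives a quick correctness check on the truncation.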

### Iv-C Properties of the generalized PFSA

We start by analyzing the eigenspace of the state-to-state transition matrix $P_{g(\delta)}$ of $g(\delta)$. Note that it follows from (1) that $P_{g(\delta)}=Q(P_g,\delta)P_g$.

###### Theorem 2.

Let $\mathbf{p}_g$ be the stationary distribution of a strongly connected $g$. Then the generalized PFSA $g(\delta)$ is also strongly connected with stationary distribution $\mathbf{p}_{g(\delta)}=\mathbf{p}_g$.

###### Proof.

Let $\lambda$ be an eigenvalue of $P_g$ and define $f(\lambda)\coloneqq\frac{(1-\delta)\lambda}{1-\delta\lambda}$. Then $f(\lambda)$ is an eigenvalue of $P_{g(\delta)}=Q(P_g,\delta)P_g$. The result follows from the following observations:

1. For $|\lambda|<1$, we have $|f(\lambda)|<1$ for all $\delta\in[0,1)$, and hence these eigenvalues remain strictly inside the unit circle.

2. For $\lambda=1$, we have $f(\lambda)=1$ for all $\delta\in[0,1)$, and furthermore, $\mathbf{p}_g^TP_{g(\delta)}=\mathbf{p}_g^T$. ∎

The following is an immediate corollary.

###### Corollary 1.

We have, for all $x\in\mathcal{X}$,

$$p_g(x)=p_{g(\delta)}(x).$$
###### Proof.

We have

$$\mathbf{p}_{g(\delta)}^T\tilde{P}_{g(\delta)}=\mathbf{p}_g^T\tilde{P}_{g(\delta)}=\mathbf{p}_g^TQ(P_g,\delta)\tilde{P}_g=\mathbf{p}_g^T\tilde{P}_g.\qquad\qed$$

A natural question is what happens when $\delta\to1$. Letting $g(1)$ denote the machine corresponding to $\delta\to1$, we now show that, quite expectedly, $g(1)$ is a single-state machine.

###### Theorem 3.

For any strongly connected PFSA $g$, $g(1)$ is a single-state PFSA.

###### Proof.

First note that the observations given in the proof of Theorem 2 imply that

$$\lim_{\delta\to1}Q(P_g,\delta)=\mathbf{1}_{|\mathcal{S}|}\mathbf{p}_g^T,$$

and consequently $g(1)$ is a (generalized) PFSA specified by $\Gamma_{g(1),x}=\mathbf{1}_{|\mathcal{S}|}\mathbf{p}_g^T\Gamma_{g,x}$ for $x\in\mathcal{X}$.

Suppose $\mathbf{x}x=x_1\cdots x_nx$ is observed. Following the argument given in Section IV-B, we get

$$\begin{aligned}p_{g(1)}(\mathbf{x}x)&=\mathbf{p}^T(\mathbf{1}\mathbf{p}^T\Gamma_{x_1})(\mathbf{1}\mathbf{p}^T\Gamma_{x_2})\cdots(\mathbf{1}\mathbf{p}^T\Gamma_{x_n})(\tilde{P}_{g(1)})_{\cdot,x}\\&=\mathbf{p}^T(\mathbf{1}\mathbf{p}^T\Gamma_{x_1})(\mathbf{1}\mathbf{p}^T\Gamma_{x_2})\cdots(\mathbf{1}\mathbf{p}^T\Gamma_{x_n})(\mathbf{1}\mathbf{p}^T\Gamma_x\mathbf{1})\\&=(\mathbf{p}^T\mathbf{1})(\mathbf{p}^T\Gamma_{x_1}\mathbf{1})\cdots(\mathbf{p}^T\Gamma_{x_n}\mathbf{1})(\mathbf{p}^T\Gamma_x\mathbf{1}),\end{aligned}$$

and hence, by induction, $p_{g(1)}(\mathbf{x}x)=p_{g(1)}(\mathbf{x})\,p_{g(1)}(x)$ for all $\mathbf{x}\in\mathcal{X}^*$ and $x\in\mathcal{X}$. Since an i.i.d. process corresponds to a single-state PFSA, we conclude that $g(1)$ is in fact a single-state PFSA. ∎

### IV-D M2 Class of PFSA

We note that $g(\delta)$ of a PFSA $g$ is not necessarily a PFSA. As an example, the $\Gamma$-expression of the generalized PFSA $g(\delta)$, for $g$ the PFSA described in Fig. 1, has rows with more than one non-zero entry, so its transition function is no longer deterministic.

Nevertheless, we introduce M2, a class of PFSAs which is closed under deletion, i.e., $g\in\mathrm{M2}$ implies $g(\delta)\in\mathrm{M2}$ for all $\delta\in[0,1)$. As this class is instrumental in our experimental results, we shall study it in more detail.

M2 is the collection of $2$-state PFSAs on a binary alphabet: $g(\mu,\nu)\in\mathrm{M2}$ with $\mu,\nu\in(0,1)$ is specified by the quadruple $(\mathcal{S},\mathcal{X},\mathsf{T},\mathsf{P})$, where $\mathcal{S}=\{s_1,s_2\}$, $\mathcal{X}=\{0,1\}$, and

$$\Gamma_{g(\mu,\nu),0}=\begin{pmatrix}\mu&0\\\nu&0\end{pmatrix},\qquad\Gamma_{g(\mu,\nu),1}=\begin{pmatrix}0&1-\mu\\0&1-\nu\end{pmatrix}.$$

Fig. 3 illustrates $g(\mu,\nu)$ and its corresponding $g(\mu,\nu)(\delta)$, which is obtained from Theorem 1. Since $\Gamma_{g(\mu,\nu)(\delta),x}$ has exactly the same form, containing a single non-zero column for each $x\in\{0,1\}$, it is clear that $g(\mu,\nu)(\delta)\in\mathrm{M2}$.

Since each $g(\mu,\nu)$ is specified by two numbers, we can parametrize M2 by a square in $(0,1)^2$. In Fig. 4, we show the effect of the deletion probability on M2 machines. The key observation is that the deletion probability drives machines toward the $\mu=\nu$ line.
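For M2 machines, Theorem 1 can be carried out by hand: applying $Q(P,\delta)$ to $\Gamma_{g(\mu,\nu),0}$ yields closed-form parameters $\mu(\delta)$ and $\nu(\delta)$, which also appear in the proof of Theorem 5 below. The Python sketch below (our own helper names) computes them both ways and illustrates the drift toward the $\mu=\nu$ line:

```python
def m2_deleted(mu, nu, delta):
    """Closed-form parameters of g(mu, nu)(delta); these expressions reappear
    in the proof of Theorem 5."""
    a = mu - nu
    return (mu - delta * a) / (1 - delta * a), nu / (1 - delta * a)

def m2_deleted_via_q(mu, nu, delta):
    """Same parameters obtained directly from Theorem 1: the first column of
    Q(P, delta) * Gamma_0, using the explicit 2x2 inverse of (I - delta P)."""
    P = [[mu, 1 - mu], [nu, 1 - nu]]
    a, b = 1 - delta * P[0][0], -delta * P[0][1]
    c, d = -delta * P[1][0], 1 - delta * P[1][1]
    det = a * d - b * c
    Q = [[(1 - delta) * d / det, -(1 - delta) * b / det],
         [-(1 - delta) * c / det, (1 - delta) * a / det]]
    return (Q[0][0] * mu + Q[0][1] * nu, Q[1][0] * mu + Q[1][1] * nu)
```

As $\delta\to1$ the two parameters coalesce, which is exactly the single-state limit of Theorem 3.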

## V The convergence of likelihood

The goal of this section is to lay the theoretical groundwork for our decoding and tamper-detection algorithms with PFSAs. In Section V-C, we employ a maximum-likelihood framework to decode the generating PFSA given the channel output. We show that the likelihood is closely related to the entropy rate and KL divergence of PFSAs (to be defined and calculated in Sections V-A and V-B).

### V-A Entropy rate of PFSA

Let $g$ be a PFSA. We define $H_n(g)$ as follows:

$$H_n(g)\coloneqq-\sum_{|\mathbf{x}|=n}p_g(\mathbf{x})\log p_g(\mathbf{x}).$$

Then the entropy rate of $g$ is defined as

$$H(g)\coloneqq\lim_{n\to\infty}\frac{1}{n}H_n(g).$$

Note that $H(g)$ is in fact the entropy rate of the stochastic process corresponding to $g$ [15]. In the next theorem, we show that the above limit exists and that the entropy rate has a simple closed form.

###### Theorem 4.

We have

$$H(g)=\sum_{s\in\mathcal{S}}(\mathbf{p}_g)_s\,H\big((\tilde{P}_g)_{s,\cdot}\big).$$
###### Proof.

See Appendix -A. ∎

It readily follows from the theorem above that the entropy rate for $g(\mu,\nu)\in\mathrm{M2}$ is

$$H(g(\mu,\nu))=\frac{\nu\,h_b(\mu)}{\bar{\mu}+\nu}+\frac{\bar{\mu}\,h_b(\nu)}{\bar{\mu}+\nu},$$

where $\bar{a}=1-a$ and $h_b(a)\coloneqq-a\log a-\bar{a}\log\bar{a}$ is the binary entropy function for any $a\in[0,1]$.
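The closed form above is immediate to evaluate; the sketch below (binary entropy in bits, our own helper names) also checks the degenerate case $\mu=\nu$, where the process is i.i.d. and the entropy rate reduces to $h_b(\mu)$:

```python
import math

def hb(a):
    """Binary entropy in bits, with hb(0) = hb(1) = 0."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def entropy_rate(mu, nu):
    """H(g(mu, nu)) from the closed form above: a stationary mixture of the
    binary entropies of the two rows of ~P."""
    w = (1 - mu) + nu
    return (nu * hb(mu) + (1 - mu) * hb(nu)) / w
```

The weights $\nu/(\bar\mu+\nu)$ and $\bar\mu/(\bar\mu+\nu)$ are exactly the stationary probabilities of the two states.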

Next, we show that deletion increases the entropy rate, which will be critical for tamper-detection purposes.

###### Theorem 5.

The map $\delta\mapsto H(g(\mu,\nu)(\delta))$ is monotonically increasing when $\mu\neq\nu$.

###### Proof.

We have

$$\mu(\delta)=\frac{\mu-\delta(\mu-\nu)}{1-\delta(\mu-\nu)},\qquad\nu(\delta)=\frac{\nu}{1-\delta(\mu-\nu)},$$

and

$$H(g(\mu,\nu)(\delta))=\frac{\nu}{1-\mu+\nu}\,h_b\!\left(\frac{\mu-\delta(\mu-\nu)}{1-\delta(\mu-\nu)}\right)+\frac{1-\mu}{1-\mu+\nu}\,h_b\!\left(\frac{\nu}{1-\delta(\mu-\nu)}\right).$$

We can then write

$$\frac{\mathrm{d}}{\mathrm{d}\delta}H(g(\mu,\nu)(\delta))=\frac{\alpha\bar{\mu}\nu}{(1-\alpha\delta)^2\,\bar{\alpha}}\log\frac{(\mu-\delta\alpha)(\bar{\nu}-\delta\alpha)}{\bar{\mu}\nu},$$

where $\alpha\coloneqq\mu-\nu$. It is straightforward to check that the derivative is always positive when $\mu\neq\nu$. ∎
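Theorem 5 can also be spot-checked numerically using the parametrization $(\mu(\delta),\nu(\delta))$ from the proof (the stationary weights stay fixed by Theorem 2); the parameter choice below is our own illustrative example:

```python
import math

def hb(a):
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def h_deleted(mu, nu, delta):
    """H(g(mu, nu)(delta)) via the parametrization (mu(delta), nu(delta)) from
    the proof of Theorem 5; the stationary weights are those of g itself."""
    a = mu - nu
    md = (mu - delta * a) / (1 - delta * a)
    nd = nu / (1 - delta * a)
    return (nu * hb(md) + (1 - mu) * hb(nd)) / ((1 - mu) + nu)

# Numerical spot-check of Theorem 5 for g(0.8, 0.3): H increases with delta.
vals = [h_deleted(0.8, 0.3, d / 10) for d in range(10)]
```

The entropy-rate gap between the nominal and a larger deletion probability is precisely the margin exploited by the tamper detector in Section VI-B.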

### V-B KL divergence of two PFSAs

Let $g_1,g_2\in\mathcal{A}$. The $n$-th order KL divergence between $g_1$ and $g_2$ is the KL divergence on the space of length-$n$ sequences, i.e.,

$$D_n(g_1\|g_2)=\sum_{|\mathbf{x}|=n}p_{g_1}(\mathbf{x})\log\frac{p_{g_1}(\mathbf{x})}{p_{g_2}(\mathbf{x})}.$$

Analogous to the entropy rate, we can define the KL divergence between $g_1$ and $g_2$ as

$$D_{\mathrm{KL}}(g_1\|g_2)\coloneqq\lim_{n\to\infty}\frac{1}{n}D_n(g_1\|g_2).$$

Theorem 6 below shows that the limit exists and also derives a closed form for the KL divergence between two PFSAs. But before we can state the theorem, we need to introduce a very useful construction on two PFSAs, called synchronous composition.

###### Definition 4 (synchronous composition).

Let $g_1=(\mathcal{S}_1,\mathcal{X},\mathsf{T}_1,\mathsf{P}_1)$ and $g_2=(\mathcal{T},\mathcal{X},\mathsf{T}_2,\mathsf{P}_2)$ be two PFSAs with the same alphabet, and let $g_c$ be the probabilistic automaton specified by the quadruple $(\mathcal{S}_c,\mathcal{X},\mathsf{T}_c,\mathsf{P}_c)$, where

$$\mathcal{S}_c=\mathcal{S}_1\times\mathcal{T}=\{(s,t)\}_{s\in\mathcal{S}_1,\,t\in\mathcal{T}}$$

is the Cartesian product of $\mathcal{S}_1$ and $\mathcal{T}$, and

$$\mathsf{T}_c((s,t),x)=(\mathsf{T}_1(s,x),\mathsf{T}_2(t,x)),\qquad\mathsf{P}_c((s,t),x)=\mathsf{P}_1(s,x),$$

for all $s\in\mathcal{S}_1$, $t\in\mathcal{T}$, and $x\in\mathcal{X}$. Then the synchronous composition $g_1\otimes g_2$ is defined to be any absorbing strongly connected component of $g_c$, i.e., a strongly connected component without any out-going edges.

It is not clear that there is only one absorbing strongly connected component in $g_c$. However, as proved in Theorem 8 in Appendix -B, $g_1\otimes g_2$ is equivalent to $g_1$ irrespective of the choice of absorbing strongly connected component, i.e., $p_{g_1\otimes g_2}(\mathbf{x})=p_{g_1}(\mathbf{x})$ for $\mathbf{x}\in\mathcal{X}^*$.

In Figs. 6, 7, 8, and 9, we provide examples of synchronous compositions for several $g_1$ and $g_2$, which shed light on the fact that the synchronous composition of two strongly connected PFSAs might not be strongly connected.

###### Theorem 6.

Let $g_1,g_2\in\mathcal{A}$ and let $\mathbf{p}_{g_1\otimes g_2}$ be the stationary distribution of $g_1\otimes g_2$. Then we have

$$D_{\mathrm{KL}}(g_1\|g_2)=\sum_{(s,t)}(\mathbf{p}_{g_1\otimes g_2})_{(s,t)}\sum_{x\in\mathcal{X}}\mathsf{P}_1(s,x)\log\frac{\mathsf{P}_1(s,x)}{\mathsf{P}_2(t,x)},$$

where the outer sum runs over the states of $g_1\otimes g_2$.

###### Proof.

See Appendix -B. ∎

In light of this theorem, one can easily show that, for $g_1=g(\mu_1,\nu_1)$ and $g_2=g(\mu_2,\nu_2)$ in M2,

$$D_{\mathrm{KL}}(g_1\|g_2)=\frac{\nu_1\,D_{\mathrm{KL}}(\mu_1\|\mu_2)}{\bar{\mu}_1+\nu_1}+\frac{\bar{\mu}_1\,D_{\mathrm{KL}}(\nu_1\|\nu_2)}{\bar{\mu}_1+\nu_1},$$

where $D_{\mathrm{KL}}(a\|b)$ denotes the KL divergence between Bernoulli($a$) and Bernoulli($b$).
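A small Python helper for this closed form (with the binary KL divergence in bits; helper names are our own):

```python
import math

def dkl_binary(a, b):
    """KL divergence (bits) between Bernoulli(a) and Bernoulli(b)."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

def dkl_m2(mu1, nu1, mu2, nu2):
    """Closed-form KL divergence rate between g(mu1, nu1) and g(mu2, nu2),
    per the formula above."""
    w = (1 - mu1) + nu1
    return (nu1 * dkl_binary(mu1, mu2) + (1 - mu1) * dkl_binary(nu1, nu2)) / w
```

As expected of a divergence, it vanishes exactly when the two machines coincide and is asymmetric in its arguments.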

### V-C Convergence of log likelihood

According to the Shannon–McMillan–Breiman theorem [15, Theorem 16.8.1], we have $-\frac{1}{n}\log p_g(x^n)\to H(g)$ with probability one for any sequence $x^n\leftarrow g$. A natural question is what the log-likelihood converges to if $x^n$ is evaluated under a machine different from the generating one. The following theorem states that the log-likelihood converges to the entropy rate of the generating machine plus the KL divergence, which accounts for the mismatch.

###### Theorem 7.

For any $\mathbf{x}\leftarrow g$, we have, with probability one,

$$-\frac{1}{n}\sum_{i=1}^{n}\log p_{g'}(x_i|x^{i-1})\to H(g)+D_{\mathrm{KL}}(g\|g'),$$

for any PFSA $g'$.

###### Proof.

First note that

$$-\frac{1}{n}\sum_{i=1}^{n}\log p_{g'}(x_i|x^{i-1})=-\frac{1}{n}\log p_g(\mathbf{x})+\frac{1}{n}\sum_{i=1}^{n}\log\frac{p_g(x_i|x^{i-1})}{p_{g'}(x_i|x^{i-1})}.\qquad(8)$$

Clearly, the first term in the above sum converges to $H(g)$. To show the convergence of the second term, let $Z_i\coloneqq\log\frac{p_g(x_i|x^{i-1})}{p_{g'}(x_i|x^{i-1})}$. Notice that for any PFSA in M2 and for $i\geq2$, $p_g(x_i|x^{i-1})$ equals $\mathsf{P}_g(s_1,x_i)$ for all $i$ with $x_{i-1}=0$, and $\mathsf{P}_g(s_2,x_i)$ for all $i$ with $x_{i-1}=1$; hence the process $\{Z_i\}$ is a Markov process. Let $\mathcal{Z}_0$ and $\mathcal{Z}_1$ denote the sets of indices $i$ such that $x_{i-1}=0$ and $x_{i-1}=1$, respectively. Then we have

$$\frac{1}{n}\sum_{i=1}^{n}Z_i=\frac{1}{n}\sum_{i\in\mathcal{Z}_0}Z_i+\frac{1}{n}\sum_{i\in\mathcal{Z}_1}Z_i.\qquad(9)$$

It is straightforward to show that for all $i\in\mathcal{Z}_0$,

$$Z_i=\mathbb{1}_{\{x_i=0\}}\log\frac{\mu_g}{\mu_{g'}}+\mathbb{1}_{\{x_i=1\}}\log\frac{\bar{\mu}_g}{\bar{\mu}_{g'}},$$

and for all $i\in\mathcal{Z}_1$,

$$Z_i=\mathbb{1}_{\{x_i=0\}}\log\frac{\nu_g}{\nu_{g'}}+\mathbb{1}_{\{x_i=1\}}\log\frac{\bar{\nu}_g}{\bar{\nu}_{g'}}.$$

It follows from (9) that

$$\begin{aligned}\frac{1}{n}\sum_{i=1}^{n}Z_i&=\frac{1}{n}\Big(\log\frac{\mu_g}{\mu_{g'}}\Big)\sum_{i=1}^{n}\mathbb{1}_{\{x_{i-1}=0,\,x_i=0\}}+\frac{1}{n}\Big(\log\frac{\bar{\mu}_g}{\bar{\mu}_{g'}}\Big)\sum_{i=1}^{n}\mathbb{1}_{\{x_{i-1}=0,\,x_i=1\}}\\&\quad+\frac{1}{n}\Big(\log\frac{\nu_g}{\nu_{g'}}\Big)\sum_{i=1}^{n}\mathbb{1}_{\{x_{i-1}=1,\,x_i=0\}}+\frac{1}{n}\Big(\log\frac{\bar{\nu}_g}{\bar{\nu}_{g'}}\Big)\sum_{i=1}^{n}\mathbb{1}_{\{x_{i-1}=1,\,x_i=1\}}\\&\xrightarrow{\;n\to\infty\;}D_{\mathrm{KL}}(g\|g'),\end{aligned}$$

where the convergence follows from the ergodic theorem: each empirical pair frequency converges to its stationary probability, and the resulting combination matches the closed form of Section V-B. ∎

For ease of presentation, we define

$$\mathsf{L}(g',\mathbf{x}^n\leftarrow g)\coloneqq-\frac{1}{n}\sum_{i=1}^{n}\log p_{g'}(x_i|x^{i-1}).$$

When the generating machine is not known, we write $\mathsf{L}(g',\mathbf{x}^n)$ to denote the likelihood of $g'$ generating $\mathbf{x}^n$.

## VI Algorithm and simulation

### VI-A Decoding

In this and the following sections, we assume that we have a set of PFSAs $\{g_m\}_{m\in\mathcal{M}}$ with $g_m\in\mathrm{M2}$ for all $m\in\mathcal{M}$. We will briefly discuss heuristics on how to generate a set of PFSAs that are good for tamper detection and decoding in Section VI-C.

We saw in Theorem 7 that

$$\mathsf{L}(g_j(\delta),\mathbf{x}^n\leftarrow g_i(\delta))\to H(g_i(\delta))+D_{\mathrm{KL}}(g_i(\delta)\|g_j(\delta)),\qquad(10)$$

which motivates the following definition for the decoding function $\psi$ in Fig. 2:

$$\psi(\mathbf{x})=\operatorname*{arg\,min}_{m\in\mathcal{M}}\;\mathsf{L}(g_m(\delta),\mathbf{x}^n).$$

We apply this decoding strategy in Fig. 5 for two different message sets.
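A minimal end-to-end sketch of this decoder for M2 machines (our own simplified implementation, not the paper's code): in M2 the state is just the previous symbol, which reduces the likelihood $\mathsf{L}$ to a two-parameter computation.

```python
import math
import random

def m2_sample(mu, nu, n, rng):
    """A realization of g(mu, nu). In M2 the current state is determined by the
    previous symbol. (Starting as if the previous symbol were 0 is immaterial
    for large n.)"""
    xs, prev = [], 0
    for _ in range(n):
        p0 = mu if prev == 0 else nu
        prev = 0 if rng.random() < p0 else 1
        xs.append(prev)
    return xs

def m2_neg_loglik(mu, nu, xs):
    """L(g(mu, nu), x^n): average negative log-likelihood; the conditional of
    x_i is Bernoulli(mu) after a 0 and Bernoulli(nu) after a 1."""
    nll = 0.0
    for prev, x in zip(xs, xs[1:]):
        p0 = mu if prev == 0 else nu
        nll -= math.log(p0 if x == 0 else 1 - p0)
    return nll / (len(xs) - 1)

def decode(machines, xs):
    """psi(x): index of the machine minimizing the likelihood statistic L."""
    return min(range(len(machines)),
               key=lambda m: m2_neg_loglik(machines[m][0], machines[m][1], xs))
```

By (10), the statistic for the true machine converges to its entropy rate while every other machine incurs an additional KL penalty, so the argmin identifies the message for long enough sequences.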

### VI-B Tamper detecting

We assume that the active eavesdropper tampers with the channel in such a way that the deletion probability becomes $\delta'>\delta$. Following Theorems 5 and 7, we get

$$\mathsf{L}(g_j(\delta),\mathbf{x}\leftarrow g_i(\delta'))\;\to\;H(g_i(\delta'))+D_{\mathrm{KL}}(g_i(\delta')\|g_j(\delta))\;\geq\;H(g_i(\delta)),\qquad(11)$$

where the inequality is due to Theorem 5 and the non-negativity of the KL divergence. Hence, tampering with the channel results in an increase in the likelihood. This leads to our tamper-detection procedure detailed in Algorithm 1.
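Algorithm 1 is not reproduced here; the sketch below is one plausible reading of (11): declare tampering when the observed likelihood under the nominal model $g(\delta)$ exceeds the nominal entropy rate by a slack $\tau$. The threshold $\tau=0.05$ nats and all parameter values are our illustrative choices, not the paper's.

```python
import math
import random

def m2_deleted(mu, nu, delta):
    a = mu - nu
    return (mu - delta * a) / (1 - delta * a), nu / (1 - delta * a)

def hb_nats(a):
    return -a * math.log(a) - (1 - a) * math.log(1 - a)

def neg_loglik(mu, nu, xs):
    nll = 0.0
    for prev, x in zip(xs, xs[1:]):
        p0 = mu if prev == 0 else nu
        nll -= math.log(p0 if x == 0 else 1 - p0)
    return nll / (len(xs) - 1)

def tampered(mu, nu, delta, ys, tau=0.05):
    """Declare T = 1 when the observed likelihood under the nominal model
    g(mu, nu)(delta) exceeds its entropy rate by more than the slack tau,
    which (11) guarantees happens asymptotically whenever delta' > delta."""
    md, nd = m2_deleted(mu, nu, delta)
    h = (nu * hb_nats(md) + (1 - mu) * hb_nats(nd)) / ((1 - mu) + nu)
    return neg_loglik(md, nd, ys) > h + tau

# Simulation: Alice sends a realization of g(0.8, 0.2); each bit is then
# deleted independently with the true deletion probability delta_true.
mu, nu, delta = 0.8, 0.2, 0.1
rng = random.Random(7)

def send(delta_true, n=40000):
    xs, prev = [], 0
    for _ in range(n):
        p0 = mu if prev == 0 else nu
        prev = 0 if rng.random() < p0 else 1
        xs.append(prev)
    return [x for x in xs if rng.random() >= delta_true]
```

By Theorem 1, the surviving bits form a realization of $g(\delta_{\text{true}})$, so the detector sees the nominal entropy rate when $\delta_{\text{true}}=\delta$ and a strictly larger likelihood otherwise.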

### VI-C Generate machines with good separation

For a fixed number of messages, we need to choose a set of M2 PFSAs with the best decoding and tamper-detection performance. It is important to note that (1) the decoding error is significantly lowered by increasing $D_{\mathrm{KL}}(g_i\|g_j)$ for $i\neq j$, according to (10), and (2) the tamper-detection error is improved by making sure the entropy-rate gap in (11) is large for each machine. However, there is a trade-off here: to increase the pairwise KL divergence, we want the machines to be spread more evenly in the parameter space, while, according to Theorem 5, to increase the entropy-rate gap we need the machines to stay away from being single-state, i.e., away from the $\mu=\nu$ line.

Here, we describe briefly how we design