# Efficient Steganography with Provable Security Guarantees

## Abstract

We present a new steganographic encryption protocol that is proven secure in the complexity-theoretic framework of Hopper et al.

The fundamental building block of our steganographic encryption protocol is a “one-time stegosystem” that allows two parties to transmit messages of length shorter than the shared key with information-theoretic security guarantees. The employment of a pseudorandom generator (PRG) permits secure transmission of longer messages in the same way that such a generator allows the use of one-time pad encryption for messages longer than the key in symmetric encryption. The advantage of our construction, compared to that of Hopper et al., is that it avoids the use of a pseudorandom function family and instead relies (directly) on a pseudorandom generator in a way that provides a linear improvement in the number of applications of the underlying one-way permutation per transmitted bit. This advantageous trade-off is achieved by substituting the pseudorandom function family employed in the previous construction with an appropriate combinatorial construction that has been used extensively in derandomization, namely almost t-wise independent function families.

Keywords: Information hiding, steganography, data hiding, steganalysis, covert communication.

## 1 Introduction

In a canonical steganographic scenario, Alice and Bob wish to communicate securely in the presence of an adversary, called the “Warden,” who monitors whether they exchange “conspicuous” messages. In particular, Alice and Bob may exchange messages that adhere to certain channel distributions that represent “inconspicuous” communication. By controlling the messages that are transmitted over such a channel, Alice and Bob may exchange messages that cannot be detected by the Warden. There have been two approaches in formalizing this problem, one based on information theory [2, 13, 7] and one based on complexity theory [6]. The latter approach is more concrete and has the potential of allowing more efficient constructions. Most steganographic constructions supported by provable security guarantees are instantiations of the following basic procedure (often referred to as “rejection-sampling”).

The problem specifies a family of message distributions (the “channel distributions”) that provide a number of possible options for a so-called “covertext” to be transmitted. Additionally, the sender and the receiver possess some sort of private information (typically a keyed hash function, MAC, or other similar function) that maps channel messages to a single bit. In order to send a message bit, the sender draws a covertext from the channel distribution, applies the function to the covertext, and checks whether it happens to produce the message bit he originally wished to transmit. If this is the case, the covertext is transmitted. In case of failure, this procedure is repeated. While this is a fairly concrete procedure, there are a number of choices to be made with both practical and theoretical significance. From the security viewpoint, one is primarily interested in the choice of the function that is shared between the sender and the receiver. From a practical viewpoint, one is primarily interested in how the channel is implemented and whether it conforms to the various constraints that are imposed on it by the steganographic protocol specifications (e.g., are independent draws from the channel allowed? does the channel remember previous draws? etc.).

As mentioned above, the security of a stegosystem can be naturally phrased in information-theoretic terms (cf. [2]) or in complexity-theoretic terms [6]. Informally, the latter approach considers the following experiment for the warden-adversary: The adversary selects a message to be embedded and receives either covertexts that embed the message or covertexts simply drawn from the channel distribution (without any embedding). The adversary is then asked to distinguish between the two cases. Clearly, if the probability of success is very close to 1/2, it is natural to claim that the stegosystem provides security against such (eavesdropping) adversarial activity. Formulation of stronger attacks (such as active attacks) is also possible. Given the above framework, Hopper et al. [6] provided a provably secure stegosystem that pairs rejection sampling with a pseudorandom function family. Given that rejection sampling, when implemented properly and paired with a truly random function, is indistinguishable from the channel distribution, the security of their construction followed from the pseudorandom function family assumption. From the efficiency viewpoint, this construction required about 2 evaluations of the pseudorandom function per transmitted bit. Constructing efficient pseudorandom functions is possible either generically [5] or, more efficiently, based on specific number-theoretic assumptions [9]. Nevertheless, pseudorandom function families are a conceptually complex and fairly expensive cryptographic primitive. For example, the evaluation of the Naor-Reingold pseudorandom function on an input requires a number of modular exponentiations, and the generic construction [5] requires a number of PRG invocations proportional to the length of the input string.

In this article we take an alternative approach to the design of provably secure stegosystems. Our main contribution is the design of a building block that we call a one-time stegosystem: this is a steganographic protocol that is meant to be used for a single message transmission and is proven secure in an information-theoretic sense, provided that the key that is shared between the sender and the receiver is of sufficient length (this length analysis is part of our result). In particular we show that we can securely transmit an n-bit message with a key of length essentially linear in n, with lower-order terms depending on the size |Σ| of the channel alphabet (see Section 3.4 for more details regarding the exact complexity). Our basic building block is a natural analogue of a one-time pad for steganography. It is based on the rejection sampling technique outlined above in combination with an explicit almost t-wise independent [1] family of functions. We note that such combinatorial constructions have been extremely useful for derandomization methods and here, to the best of our knowledge, are employed for the first time in the design of steganographic protocols. Given a one-time stegosystem, it is fairly straightforward to construct provably secure steganographic encryption for longer messages by using a pseudorandom generator (PRG) to stretch a random seed that is shared by the sender and the receiver to sufficient length.

The resulting stegosystem is provably secure in the computational sense of Hopper et al. [6] and is in fact much more efficient: in particular, while the Hopper et al. stegosystem requires 2 evaluations of a pseudorandom function per bit, amounting to a linear (in the key-size) number of applications of the underlying PRG (in the standard construction of pseudorandom functions of [5]), our stegosystem requires a constant number of PRG applications per bit.

## 2 Definitions and Tools

We say that a function μ : ℕ → ℝ is negligible if for every positive polynomial p there exists an N such that for all n > N, μ(n) < 1/p(n).

We let Σ denote an alphabet and treat the channel, which will be used for data transmission, as a family of random variables C = {C_h}_{h∈Σ*}; each C_h is supported on Σ. These channel distributions model a history-dependent notion of channel data: if h = h_1⋯h_m have been sent along the channel thus far, C_h determines the distribution of the next channel element.

###### Definition 1.

A one-time stegosystem consists of three probabilistic polynomial time algorithms

 S=(SK,SE,SD)

where:

• SK is the key generation algorithm; we write k ← SK(1^n). It takes as input the security parameter and the length n of the message and produces a key k of length κ = κ(n). (We typically assume that κ is a monotonically increasing function of n.)

• SE is the embedding procedure, which can access the channel; we write s ← SE(1^n, k, m, h). It takes as input the length n of the message, the key k, a message m to be embedded, and the history h of previously drawn covertexts. The output is the stegotext s ∈ Σ*.

• SD is the extraction procedure; we write m ← SD(1^n, k, s). It takes as input 1^n, the key k, and some s ∈ Σ*. The output is a message m or the token fail.

Recall that the min entropy of a random variable X, taking values in a set V, is the quantity

 H∞(X) ≜ min_{v∈V} (−log Pr[X = v]).

We say that a channel has min entropy δ if for all h ∈ Σ*, H∞(C_h) ≥ δ.
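Concretely, the min entropy of a channel distribution is determined by its most likely symbol. A small Python sketch (the 4-symbol distribution is a hypothetical example):

```python
import math

def min_entropy(dist):
    # H_inf(X) = min_v (-log2 Pr[X = v]) = -log2 (max_v Pr[X = v]).
    return -math.log2(max(dist.values()))

# A hypothetical channel distribution C_h over a 4-symbol alphabet:
C_h = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
print(min_entropy(C_h))  # -> 1.0, dominated by the symbol "a" of probability 1/2
```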

###### Definition 2 (Soundness).

A stegosystem is said to be ϵ-sound provided that for all channels of min entropy δ,

 ∀m ∈ {0,1}^n:  Pr[SD(1^n, k, SE(1^n, k, m, h)) ≠ m ∣ k ← SK(1^n)] ≤ ϵ.

One-time stegosystem security is based on the indistinguishability between a transmission that contains a steganographically embedded message and a transmission that contains no embedded messages. An adversary against a one-time stegosystem is a pair of algorithms A = (A_1, A_2) that plays the following game, denoted G^A:

1. A key k is generated by SK(1^n).

2. Algorithm A_1 receives as input the length n of the message and outputs a triple (m, h, s_A), where m is the message to be embedded, h is a channel history, and s_A is some additional information that will be passed to A_2. A_1 is provided access to the channel via an oracle O, which takes the history as input.

O, on input h, returns to A_1 an element c selected according to C_h.

3. A bit b is chosen uniformly at random.

• If b = 0, let s* ← SE(1^n, k, m, h), so s* is a stegotext.

• If b = 1, let s* = c_1 ∥ c_2 ∥ ⋯ ∥ c_λ, where ∥ denotes string concatenation and each c_i is drawn from the channel with the appropriate history: c_i ← C_{h∥c_1⋯c_{i−1}}.

4. The input for A_2 is 1^n, s*, and s_A. A_2 outputs a bit b′. If b′ = b then we say that A succeeded and write G^A = success.

The probability Pr[G^A = success] includes the coin tosses of A = (A_1, A_2) and SK, as well as the coin tosses of SE. The (information-theoretic) insecurity of the stegosystem is defined as

 InSec_S(n) ≜ max_A ∣ Pr[G^A = success] − 1/2 ∣,

this maximum taken over all (time unbounded) adversaries A.

###### Definition 3.

(Security) We say that a stegosystem is ϵ-secure if for all channels with min entropy δ we have InSec_S(n) ≤ ϵ.

### 2.1 Error-correcting Codes

Our steganographic construction requires an efficient family of codes that can recover from errors introduced by certain binary symmetric channels. In particular, we require an efficient version of the Shannon coding theorem [11, 10]. For an element m ∈ {0,1}^ℓ, we let m ⊕ e be the random variable obtained by perturbing m with a random error vector e, defined by independently setting each coordinate e_i to 1 with probability p. (Here e_i denotes the ith coordinate of the vector e.)

The classical coding theorem asserts that for every pair of real numbers p ∈ (0, 1/2) and R < C, there is a binary code B ⊂ {0,1}^ℓ of rate at least R, so that for each m ∈ B, maximum-likelihood decoding recovers m from m ⊕ e with probability 1 − 2^{−Ω(ℓ)}, where

 H(p) = p·log(1/p) + (1−p)·log(1/(1−p)) = 1 − C.

The quantity C (determined by p) is the capacity of the binary symmetric channel induced by e; the quantity R is the rate of the code B. In this language, the coding theorem asserts that at rates lower than capacity, codes exist that correct random errors with exponentially decaying failure probability.

We formalize our requirements below:

###### Definition 4.

An error-correcting code is a pair of functions E = (Enc, Dec), where Enc : {0,1}^n → {0,1}^ℓ is the encoding function and Dec : {0,1}^ℓ → {0,1}^n the corresponding decoding function. Specifically, we say that E is an (n, ℓ, p, ϵ)-code if for all m ∈ {0,1}^n,

 Pr[Dec(Enc(m) ⊕ e) = m] ≥ 1 − ϵ,

where e = (e_1, …, e_ℓ) and each e_i is independently distributed in {0,1} so that Pr[e_i = 1] ≤ p. We say that E is efficient if both Enc and Dec are computable in polynomial time.
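As a toy illustration of Definition 4 (not the capacity-approaching codes required below), a repetition code with majority decoding is an (n, Rn, p, ϵ)-code for a suitable repetition count; the parameters in this Python sketch are illustrative assumptions:

```python
import random

R = 15                                       # repetitions per bit (odd)

def enc(m):
    # Repetition encoding: Enc maps n bits to n*R bits.
    return [b for b in m for _ in range(R)]

def dec(c):
    # Majority decoding per block of R received bits.
    return [int(sum(c[i*R:(i+1)*R]) > R // 2) for i in range(len(c) // R)]

def bsc(c, p, rng):
    # Binary symmetric channel: flip each bit independently with probability p.
    return [b ^ (rng.random() < p) for b in c]

rng = random.Random(0)
m = [1, 0, 1, 1, 0, 0, 1, 0]
trials = 1000
ok = sum(dec(bsc(enc(m), 0.25, rng)) == m for _ in range(trials))
print(ok / trials)  # decoding succeeds in the vast majority of trials
```

Repetition codes have vanishing rate, of course; Proposition 1 below supplies codes whose rate approaches capacity.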

###### Proposition 1.

Let p lie in the interval (0, 1/2), let τ = 1/2 − p, and let R′ = 1 − H(p). Let n be a sufficiently large message length. Then there is an efficient family of (n, ℓ(n), p, ϵ(n))-error-correcting codes for which

 ϵ(n) ≤ e^{−4n/log n}  and  ℓ(n) ≤ (1 + 57/∛(τ²·log n))²·n/R′.
###### Proof.

This is a consequence of Forney’s [3] efficient realizations of the Shannon coding theorem [11, 10]; we work out the technical details in the full version of the paper.

We refer to [12, 4] for detailed discussions of error-correcting codes over binary symmetric channels.

### 2.2 Function Families and Almost t-wise Independence

We will employ the notion of (almost) t-wise independent function families (cf. [1], [8]).

###### Definition 5.

A family F of Boolean functions on a domain D is said to be ϵ-away from t-wise independent, or (ϵ, t)-independent, if for any t distinct domain elements q_1, …, q_t we have

 ∑_{α∈{0,1}^t} ∣ Pr_f[f(q_1)f(q_2)⋯f(q_t) = α] − 1/2^t ∣ ≤ ϵ, (1)

where f is chosen uniformly from F.

The above is equivalent to the following formulation, quantified over all computationally unbounded adversaries A:

 ∣ Pr_{f←F}[G_A^f[t](1^κ) = 1] − Pr_{f←R}[G_A^f[t](1^κ) = 1] ∣ ≤ ϵ, (2)

where R is the collection of all Boolean functions on the domain D and A is an unbounded adversary that is allowed to determine up to t queries to the function before it outputs its bit.

###### Lemma 2.

A family F is ϵ-away from t-wise independence according to equation (1) if and only if F is ϵ-away from t-wise independence according to equation (2) above.

We employ the construction of almost t-wise independent sample spaces given by [8], [1].

###### Theorem 3 ([8], [1]).

There exist families of Boolean functions on {0,1}^ℓ that are ϵ-away from t-wise independent, are indexed by keys of length O(t + log(1/ϵ) + log ℓ), and are computable in polynomial time.

### 2.3 Rejection Sampling

A common method used in steganography employing a channel distribution is that of rejection sampling (cf. [2, 6]). Assuming that one wishes to transmit a single bit m and employs a random function f : Σ → {0,1} that is secret from the adversary, one performs the following “rejection sampling” process:

rejsam^f_h(m):
 draw c_1 ← C_h
 if f(c_1) = m then output c_1
 else draw c_2 ← C_h and output c_2

Here, as above, Σ denotes the output alphabet of the channel, h denotes the history of the channel data at the start of the process, and C_h denotes the distribution on Σ given by the channel after history h. The receiver (also privy to the function f) applies f to the received symbol and recovers m with probability greater than 1/2. The sender and the receiver may employ a joint state (e.g., a counter) in the above process, that need not be secret from the adversary. Note that the above process performs only two draws from the channel with the same history (more draws could, in principle, be performed). These draws are assumed to be independent. One basic property of rejection sampling that we use is:
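The two-draw process is easy to simulate. The Python sketch below uses a hypothetical uniform channel over eight symbols (so δ = 3) and empirically matches the 1/2 + τ success bound established in Lemma 4 below, which for a uniform channel is tight:

```python
import random

SIGMA = list(range(8))                       # toy alphabet; uniform channel => delta = 3

def rejsam(f, m, rng):
    # Two-sample rejection sampling: draw c1 from the channel; if f(c1) = m
    # output c1, otherwise output a second, independent draw c2.
    c1 = rng.choice(SIGMA)
    if f[c1] == m:
        return c1
    return rng.choice(SIGMA)

def trial(rng, m=1):
    # Draw a fresh uniformly random Boolean function f on SIGMA each time.
    f = {c: rng.randrange(2) for c in SIGMA}
    return f[rejsam(f, m, rng)] == m

rng = random.Random(1)
N = 20000
rate = sum(trial(rng) for _ in range(N)) / N
tau = (1 - 2 ** -3) / 4                      # tau = (1 - 2^-delta)/4 = 7/32
print(rate, 0.5 + tau)                       # empirical rate is close to 0.71875
```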

###### Lemma 4.

If f is drawn uniformly at random from the collection R of all Boolean functions on Σ and C_h has min entropy δ, then

 Pr_{f←R}[f(rejsam^f_h(m)) = m] ≥ 1/2 + τ,

where τ = (1 − 2^{−δ})/4.

###### Proof.

Define the event E to be

 E = [f(c_1) = m] ∨ [f(c_1) ≠ m ∧ f(c_2) = m];

thus E is the event that rejection sampling is successful for m. Here c_1, c_2 are two independent random variables distributed according to the channel distribution C_h, where h is determined by the history of channel usage. Recalling that V is the support of the channel distribution C_h, let p_i denote the probability that a draw from C_h equals v_i. As f is chosen uniformly at random,

 Pr[f(c_1) = m] = 1/2.

Then Pr[E] = 1/2 + Pr[A], where A is the event that f(c_1) ≠ m ∧ f(c_2) = m. To bound Pr[A], let D denote the event that c_1 ≠ c_2. Observe that conditioned on D, A occurs with probability exactly 1/4; on the other hand, A cannot occur simultaneously with ¬D. Thus

 Pr[E] = 1/2 + Pr[A ∣ D]·Pr[D] + Pr[A ∣ ¬D]·Pr[¬D] = 1/2 + (1/4)·Pr[D].

To bound Pr[D], note that

 Pr[¬D] = ∑_i p_i² ≤ (max_i p_i)·∑_i p_i = max_i p_i

and hence that Pr[D] ≥ 1 − max_i p_i. Considering that H∞(C_h) ≥ δ, we have max_i p_i ≤ 2^{−δ} and the success probability is

 Pr[E] ≥ 1/2 + (1/4)·(1 − max_i p_i) ≥ 1/2 + (1/4)·(1 − 2^{−δ}) = 1/2 + τ,

where τ = (1 − 2^{−δ})/4. ∎

## 3 The construction

In this section we outline our construction of a one-time stegosystem as an interaction between Alice (the sender) and Bob (the receiver). Alice and Bob wish to communicate over a channel with distribution C. We assume that C has min entropy δ, so that for every history h and every c ∈ Σ, Pr[C_h = c] ≤ 2^{−δ}. As above, let τ = (1 − 2^{−δ})/4. For simplicity, we assume that the support of each C_h is of the same size.

### 3.1 A one-time stegosystem

Fix an alphabet Σ for the channel and choose a message length n and security parameters ϵ_enc and ϵ_F. Alice and Bob agree on the following:

An error-correcting code.

Let E = (Enc, Dec) be an efficient (n, ℓ, 1/2 − τ, ϵ_enc)-error-correcting code;

An almost t-wise independent function family.

Let F = {f_k ∣ k ∈ {0,1}^κ} be a function family that is (ϵ_F, 2ℓ)-independent. We treat elements of F as Boolean functions on Σ and, for a key k, we let f_k denote the function indexed by k.

We will analyze the stegosystem below in terms of arbitrary parameters n, ϵ_enc, and ϵ_F, relegating discussion of how these parameters determine the overall efficiency of the system to Section 3.4.

Key generation consists of selecting an element k ∈ {0,1}^κ uniformly at random. Alice and Bob then communicate using the algorithms SE for embedding and SD for extracting as described in Figure 1.

In SE, after applying the error-correcting code Enc, we use rejsam to obtain an element of the channel for each bit of the encoded message. The resulting stegotext is denoted s = s_1⋯s_ℓ. In SD the received stegotext is parsed block by block by evaluating the key function f_k at each s_i; this results in a message bit. After performing this for each received block, a string of ℓ bits is obtained, which is subjected to decoding via Dec. Note that we sample at most twice from the channel for each bit we wish to send. The error-correcting code is needed to recover from the errors introduced by this process. The detailed security and correctness analysis follow in the next two sections.
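Putting the pieces together, a minimal end-to-end sketch in Python. The keyed SHA-256 bit and the repetition code below are illustrative stand-ins for the almost t-wise independent family and the code of Proposition 1, and the memoryless uniform channel is a hypothetical simplification:

```python
import hashlib
import random

SIGMA = [bytes([i]) for i in range(16)]      # toy channel alphabet, |Sigma| = 16
R = 25                                       # toy repetition code in place of Prop. 1

def draw(rng):
    # Memoryless uniform channel (min entropy log2(16) = 4).
    return rng.choice(SIGMA)

def f(key, i, c):
    # Stand-in for the keyed function family: the bit assigned to symbol c
    # at position i under key `key`.
    return hashlib.sha256(key + bytes([i]) + c).digest()[0] & 1

def enc(m):
    return [b for b in m for _ in range(R)]

def dec(bits):
    return [int(sum(bits[j*R:(j+1)*R]) > R // 2) for j in range(len(bits) // R)]

def SE(key, m, rng):
    # Embed: rejection-sample one channel symbol per encoded bit.
    out = []
    for i, bit in enumerate(enc(m)):
        c = draw(rng)
        if f(key, i, c) != bit:
            c = draw(rng)                    # second (and last) draw
        out.append(c)
    return out

def SD(key, stego):
    # Extract: evaluate the keyed function on each symbol, then decode.
    return dec([f(key, i, c) for i, c in enumerate(stego)])

rng = random.Random(7)
key, msg = b"one-time shared key", [1, 0, 1, 1]
print(SD(key, SE(key, msg, rng)) == msg)     # recovery succeeds with high probability
```

A real instantiation would use the explicit almost 2ℓ-wise independent families of Theorem 3 in place of the hash, which is what makes the information-theoretic security analysis go through.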

### 3.2 Correctness

We focus on the mapping between messages and covertexts determined by the SE procedure of the one-time stegosystem. In particular, for an initial history h and a key function f_k, recall that the covertext of the message m is given by the procedure P^{f_k}_h(m), described in Figure 2; here h is the initial history. We remark now that the procedure defining P^{f_k}_h samples f_k at no more than 2ℓ points and that the family F used in SE is ϵ_F-away from 2ℓ-wise independent. For a string c = c_1⋯c_ℓ ∈ Σ^ℓ and a function f, let R^f(c) = Dec(f(c_1)⋯f(c_ℓ)). If f were chosen uniformly among all Boolean functions on Σ then we could conclude from Lemma 4 above that each bit is independently recovered by this process with probability at least 1/2 + τ. As E is an (n, ℓ, 1/2 − τ, ϵ_enc)-error-correcting code, this would imply that

 Pr_{f←R}[R^f(P^f_h(m)) = m] ≥ 1 − ϵ_enc.

This is a restatement of the correctness analysis of Hopper et al. [6]. Recalling that the procedure defining P^f_h involves no more than 2ℓ samples of f, condition (2) following Definition 5 implies that

 Pr_{f←F}[R^f(P^f_h(m)) = m] ≥ 1 − ϵ_enc − ϵ_F (3)

so long as F is (ϵ_F, 2ℓ)-independent. (We remark that as described above, the procedure P^f_h depends on the behavior of the channel; note, however, that if there were a sequence of channel distributions which violated (3) then there would be a fixed sequence of channel responses, and thus a deterministic process P^f_h, which also violated (3).) To summarize:

###### Lemma 5.

With E and F described as above, the probability that a message m ∈ {0,1}^n is recovered by the receiver is at least 1 − ϵ_enc − ϵ_F.

### 3.3 Security

In this section we argue about the security of our one-time stegosystem. First we will observe that the output of the rejection sampling function rejsam^f_h, with a truly random function f, is indistinguishable from the channel distribution C_h. (This is a folklore result implicit in previous work.) We then show that if f is selected from a family F that is ϵ_F-away from t-wise independent, the advantage of an adversary in distinguishing between the output of the protocol and the channel distribution is bounded above by ϵ_F. First we characterize the probability distribution of the rejection sampling function:

###### Proposition 6.

The output of rejsam^f_h(m) is a random variable with probability distribution expressed by the following function: Let c ∈ Σ and m ∈ {0,1}. Let p_c = Pr_{c′←C_h}[c′ = c] and miss^f(m) = Pr_{c′←C_h}[f(c′) ≠ m]. Then

 Pr[rejsam^f_h(m) = c] = p_c·(1 + miss^f(m)) if f(c) = m,
 Pr[rejsam^f_h(m) = c] = p_c·miss^f(m) if f(c) ≠ m.
###### Proof.

Let c_1 and c_2 be the two (independent) samples drawn from C_h during rejection sampling. (For simplicity, we treat the process as having drawn two samples even in the case where it succeeds on the first draw.) Note, now, that in the case where f(c) ≠ m, the value c is the result of the rejection sampling process precisely when f(c_1) ≠ m and c_2 = c; as these samples are independent, this occurs with probability miss^f(m)·p_c.

In the case where f(c) = m, however, we observe c whenever c_1 = c, or f(c_1) ≠ m and c_2 = c. As these events are disjoint, their union occurs with probability p_c + miss^f(m)·p_c = p_c·(1 + miss^f(m)), as desired. ∎
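The case analysis is easy to check numerically. This Python sketch compares the empirical distribution of rejection sampling against the expression of Proposition 6 for a hypothetical four-symbol channel and a fixed function f:

```python
import random
from collections import Counter

P = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}   # hypothetical channel C_h
f = {"a": 1, "b": 0, "c": 1, "d": 0}           # a fixed Boolean function on Sigma
m = 1                                          # the bit being embedded

rng = random.Random(3)
symbols, weights = list(P), [P[c] for c in P]

def rejsam():
    c1 = rng.choices(symbols, weights)[0]
    if f[c1] == m:
        return c1
    return rng.choices(symbols, weights)[0]

N = 200000
freq = Counter(rejsam() for _ in range(N))
miss = sum(P[c] for c in P if f[c] != m)       # Pr[f(c') != m] = 0.4
for c in P:
    # Predicted mass: p_c * (1 + miss) if f(c) = m, else p_c * miss.
    pred = P[c] * (1 + miss) if f[c] == m else P[c] * miss
    print(c, round(freq[c] / N, 3), round(pred, 3))
```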

###### Lemma 7.

For any h ∈ Σ* and m ∈ {0,1}, the random variable rejsam^f_h(m) is perfectly indistinguishable from the channel distribution C_h when f is drawn uniformly at random from the space R of all Boolean functions on Σ.

###### Proof.

Let f be a random function, as described in the statement of the lemma. Fixing the elements h ∈ Σ*, m ∈ {0,1}, and c ∈ Σ, we condition on the event E_≠ that f(c) ≠ m. In light of Proposition 6, for any f drawn under this conditioning we shall have that Pr[rejsam^f_h(m) = c] is equal to

 Pr_{c′←C_h}[c′ = c]·miss^f(m) = p_c·miss^f(m),

where we have written p_c = Pr_{c′←C_h}[c′ = c] and miss^f(m) = Pr_{c′←C_h}[f(c′) ≠ m]. Conditioned on E_≠, then, the probability of observing c is

 E_f[p_c·miss^f(m) ∣ E_≠] = p_c·(p_c + (1/2)(1 − p_c)).

Letting E_= be the event that f(c) = m, we similarly compute

 E_f[p_c·(1 + miss^f(m)) ∣ E_=] = p_c·(1 + (1/2)(1 − p_c)).

As Pr[E_≠] = Pr[E_=] = 1/2, we conclude that the probability of observing c is exactly

 (1/2)·( p_c·(p_c + (1 − p_c)/2) + p_c·(1 + (1 − p_c)/2) ) = p_c,

as desired. ∎

The following corollary follows immediately from the lemma above.

###### Corollary 8.

For any h ∈ Σ* and m ∈ {0,1}^ℓ, the random variable P^f_h(m) is perfectly indistinguishable from ℓ successive draws from the channel (with the corresponding histories) when f is drawn uniformly at random from the space of all Boolean functions on Σ.

Having established the behavior of the rejection sampling function when a truly random function is used, we proceed to examine the behavior of rejection sampling in our setting, where the function is drawn from a function family that is ϵ_F-away from 2ℓ-wise independence. In particular we will show that the insecurity of the defined stegosystem is characterized as follows:

###### Lemma 9.

The insecurity of the stegosystem of Section 3.1 is bounded by ϵ_F, i.e., InSec_S(n) ≤ ϵ_F, where ϵ_F is the bias of the almost 2ℓ-wise independent function family employed; recall that ℓ = ℓ(n) is the stretching of the input incurred due to the error-correcting code.

###### Proof.

Consider the game G^A described in Section 2 played by an unbounded adversary A. When the key function is drawn uniformly at random from the collection R of all Boolean functions on Σ, Corollary 8 implies that the stegotext produced by SE is distributed identically to a sequence of draws from the channel; in this case the two values of the bit b induce identical distributions on the view of A, and A succeeds with probability exactly 1/2. The embedding procedure evaluates the key function at no more than 2ℓ points; hence, by Lemma 2, replacing the truly random function with one drawn from the (ϵ_F, 2ℓ)-independent family F changes the success probability of A by at most ϵ_F, and the lemma follows by the definition of insecurity. ∎

### 3.4 Putting it all together

The objective of this section is to integrate the results of the previous sections of the paper into one unifying theorem. As our system is built over two-sample rejection sampling, a process that faithfully transmits each bit with probability 1/2 + τ, we cannot hope to achieve rate exceeding

 R′ = 1 − H(1/2 + τ) = 1 − H(1/4 + 2^{−δ}/4).

Indeed, as described in the theorem below, the system asymptotically converges to the rate of this underlying rejection sampling channel. (We remark that with sufficiently large channel entropy, one can draw more samples during rejection sampling without interfering with security; this can control the noise introduced by rejection sampling.)
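The rate bound is a simple function of the min entropy δ. A small Python sketch evaluating R′ = 1 − H(1/4 + 2^(−δ)/4) for a few illustrative values of δ:

```python
import math

def H(p):
    # Binary entropy function H(p) = p log2(1/p) + (1 - p) log2(1/(1 - p)).
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_bound(delta):
    # R' = 1 - H(1/4 + 2^-delta/4): capacity of the binary symmetric channel
    # induced by two-sample rejection sampling over a channel of min entropy delta.
    return 1 - H(0.25 + 2 ** -delta / 4)

for delta in (1, 2, 4, 8):
    print(delta, round(rate_bound(delta), 4))
# As delta grows, the bound approaches 1 - H(1/4), roughly 0.1887.
```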

###### Theorem 10.

For sufficiently large message lengths n, the stegosystem S of Section 3.1 uses private keys of length no more than

 (2 + o(1))·[λ(n) + log(1/ϵ_F) + log log log |Σ|]

and is both (ϵ_enc + ϵ_F)-sound and ϵ_F-secure. The length of the stegotext is

 λ(n) ≤ (1 + 1/log log n)²·n/R′,

where τ = (1 − 2^{−δ})/4 and R′ = 1 − H(1/2 + τ).

###### Proof.

Let Σ denote the channel alphabet and recall that the channel is a family of random variables C = {C_h}, each C_h supported on Σ, with min entropy δ, so that for every h and c ∈ Σ, Pr[C_h = c] ≤ 2^{−δ}. Choose a message length n large enough that

 −log(1 − (10⁴·(log log n)³ / log n)^{1/4}) ≤ δ.

Under the assumption that the channel has min entropy δ, the binary symmetric channel induced by the rejection sampling process of Lemma 4 has transition probability no more than 1/2 − τ. By Proposition 1 we have an efficient error-correcting code as discussed in Section 2.1 that encodes messages of length n as codewords of length

 λ(n) = (1 + 57/∛(τ²·log n))²·n/R′ ≤ (1 + 1/log log n)²·n/R′ bits.

By Theorem 3, an almost 2λ(n)-wise independent family of Boolean functions with bias ϵ_F can be indexed by

 (2 + o(1))·[λ(n) + log(1/ϵ_F) + log log log |Σ|]

random bits; these serve as the key for the stegosystem. In light of the conclusions of Lemma 9 and Lemma 5, this system achieves (ϵ_enc + ϵ_F)-soundness and ϵ_F-security. ∎

For concreteness, we record two corollaries:

###### Corollary 11.

There exists a function λ so that the stegosystem S, using private keys of length no more than

 O(n + log |Σ| + log(1/ϵ_F)),

is both (ϵ_enc + ϵ_F)-sound and ϵ_F-secure. Here, the length of the stegotext is

 λ(n) = (1 + o(1))·n/R′,

where R′ = 1 − H(1/4 + 2^{−δ}/4).

###### Corollary 12.

For any constant δ > 0, the stegosystem uses private keys of length O(n) and transmits no more than O(n) alphabet symbols.

## 4 A provably secure stegosystem for longer messages

In this section we show how to apply the “one-time” stegosystem of Section 3.1 together with a pseudorandom number generator so that longer messages can be transmitted.

###### Definition 6.

Let U_k denote the uniform distribution over {0,1}^k. A polynomial time deterministic program G is a pseudorandom generator (PRG) if the following conditions are satisfied:

Variable output

For all seeds s ∈ {0,1}^κ and all ℓ > 0, |G(s, 1^ℓ)| = ℓ and, furthermore, G(s, 1^ℓ) is a prefix of G(s, 1^{ℓ+1}).

Pseudorandomness

For every polynomial p, the set of random variables {G(s, 1^{p(κ)})}_{s←U_κ} is computationally indistinguishable from the uniform distribution U_{p(κ)}.

Note that there is a procedure G′ such that G(s, 1^{ℓ+ℓ′}) = G(s, 1^ℓ) ∥ G′(s, ℓ, 1^{ℓ′}) (i.e., if one maintains the internal state of the generator, one can extract the bits that follow the first ℓ bits without starting from the beginning). For a PRG G, if A is some statistical test, then we define the advantage of A over the PRG as follows:

 Adv^PRG_A(κ) ≜ ∣ Pr_{s←U_κ}[A(G(s, 1^ℓ)) = 1] − Pr_{r←U_ℓ}[A(r) = 1] ∣.

The insecurity of the PRG is then defined as

 InSec^PRG(t, ℓ) ≜ max_A Adv^PRG_A(κ),

where the maximum is taken over all statistical tests A running in time at most t.

Note that typically in PRGs there is an efficient procedure for such incremental output: the process produces some auxiliary data aux of small length so that the bits that follow may be sampled directly given aux. Consider now the following stegosystem S′ that can be used for arbitrarily many and long messages and employs a PRG G and the one-time stegosystem of Section 3.1. The two players, Alice and Bob, share a key of length κ denoted by k. They also maintain a state that holds the number of bits that have been transmitted already, as well as the auxiliary information aux (initially empty). The function SE′ is given input (k, m, h), where m is the message to be transmitted. SE′ in turn employs the PRG to extract a number of bits as k′ = G′(k, aux, 1^{ℓ′}). The length ℓ′ is selected to match the number of key bits that are required to transmit the message m using the one-time stegosystem of Section 3.1. Once the key k′ is produced by the PRG, the procedure invokes the one-time stegosystem on input (k′, m, h). After the transmission is completed, the history h, the count of transmitted bits, as well as the auxiliary PRG information aux are updated accordingly. The function SD′ is defined in a straightforward way based on SD.
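A minimal sketch of the state that Alice and Bob maintain (Python; SHA-256 in counter mode is an illustrative stand-in for a provably secure PRG with the variable-output and random-access properties assumed above):

```python
import hashlib

def prg_bytes(seed, offset, length):
    # Stand-in PRG with random access: byte range [offset, offset + length)
    # of the output stream, computed without regenerating earlier output.
    out, block = b"", offset // 32
    while len(out) < (offset % 32) + length:
        out += hashlib.sha256(seed + block.to_bytes(8, "big")).digest()
        block += 1
    return out[offset % 32 : offset % 32 + length]

class Party:
    # One endpoint of S': a shared seed plus a counter of consumed key bytes.
    def __init__(self, seed):
        self.seed, self.used = seed, 0

    def next_key(self, nbytes):
        # Derive the fresh one-time key for the next message, then advance.
        k = prg_bytes(self.seed, self.used, nbytes)
        self.used += nbytes
        return k

alice, bob = Party(b"shared seed"), Party(b"shared seed")
k1, k2 = alice.next_key(16), alice.next_key(40)
assert (bob.next_key(16), bob.next_key(40)) == (k1, k2)  # parties stay in sync
print(len(k1), len(k2))  # -> 16 40
```

The counter `used` plays the role of the transmitted-bits state, and `prg_bytes` plays the role of the procedure G′ that continues the output stream without starting from the beginning.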

###### Theorem 13.

The stegosystem S′ is provably secure in the model of [6] (universally steganographically secret against chosen hiddentext attacks); in particular,

 InSec^SS_{S′}(t, q, l) ≤ InSec^PRG(t + γ(ℓ(l)), ℓ(l) + polylog(l)),

(where t is the time required by the adversary, q is the number of chosen hiddentext queries it makes, l is the total number of bits across all queries, and γ(ℓ(l)) is the time required to simulate the SE′ oracle for l bits).

### 4.1 Performance Comparison of the Stegosystem S′ and the Hopper, Langford, von Ahn System

The system of Hopper et al. [6] concerns a situation where the min entropy of all C_h is at least 1 bit. In this case, we may select an appropriate (n, ℓ, p, ϵ_enc)-error-correcting code E. Then the system of Hopper et al. correctly decodes a given message with probability at least 1 − ϵ_enc and makes no more than 2ℓ calls to a pseudorandom function family. Were one to use the pseudorandom function family of Goldreich, Goldwasser, and Micali [5], then this involves production of a number of pseudorandom bits on the order of ℓ·κ, where κ is the security parameter of the pseudorandom function family. Of course, the security of the system depends on the security of the underlying pseudorandom generator. On the other hand, with the same error-correcting code, the steganographic system described above correctly decodes a given message with probability 1 − ϵ_enc − ϵ_F, and possesses insecurity no more than ϵ_F plus the insecurity of the underlying PRG. In order to compare the two schemes, note that by selecting ϵ_F = 2^{−κ}, both the decoding error and the security of the two systems differ by at most 2^{−κ}, a negligible function in terms of the security parameter κ. (Note also that pseudorandom functions utilized in the above scheme have security no better than 2^{−κ} with security parameter κ.) In this case, the number of pseudorandom bits used by our system,

 (2 + o(1))·[λ(n) + log(1/ϵ_F) + log log log |Σ|],

is a dramatic improvement over the number of pseudorandom bits required by the scheme above.

### References

1. Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple construction of almost k-wise independent random variables. Random Struct. Algorithms, 3(3):289–304, 1992.
2. Christian Cachin. An information-theoretic model for steganography. In Information Hiding, pages 306–318, 1998.
3. G. D. Forney, Jr. Concatenated Codes. Research Monograph No. 37. MIT Press, 1966.
4. R. G. Gallager. A simple derivation of the coding theorem and some applications. IEEE Transactions on Information Theory, IT-11:3–18, Jan. 1965.
5. Oded Goldreich, Shafi Goldwasser, and Silvio Micali. How to construct random functions. J. ACM, 33(4):792–807, 1986.
6. Nicholas J. Hopper, John Langford, and Luis von Ahn. Provably secure steganography. In CRYPTO, pages 77–92, 2002.
7. Thomas Mittelholzer. An information-theoretic approach to steganography and watermarking. In Information Hiding, pages 1–16, 1999.
8. Joseph Naor and Moni Naor. Small-bias probability spaces: Efficient constructions and applications. SIAM J. Comput., 22(4):838–856, 1993.
9. Moni Naor and Omer Reingold. Number-theoretic constructions of efficient pseudo-random functions. J. ACM, 51(2):231–262, 2004.
10. C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656, July and October, 1948.
11. C. E. Shannon and W. Weaver. The Mathematical Theory of Communication. University of Illinois Press, Urbana, Illinois, 1949.
12. J. H. van Lint. Introduction to Coding Theory. Number 86 in Graduate Texts in Mathematics. Springer-Verlag, 3rd edition, 1998.
13. Jan Zöllner, Hannes Federrath, Herbert Klimant, Andreas Pfitzmann, Rudi Piotraschke, Andreas Westfeld, Guntram Wicke, and Gritta Wolf. Modeling the security of steganographic systems. In Information Hiding, pages 344–354, 1998.