Evaluating Quality of Chaotic Pseudo-Random Generators: Application to Information Hiding

Abstract

Guaranteeing the security of information transmitted through the Internet against passive or active attacks is a major concern. The discovery of new pseudo-random number generators with a strong level of security is an expanding field of research, because numerous cryptosystems and data hiding schemes depend directly on the quality of these generators. At the Internet’09 conference, we described a generator based on chaotic iterations that behaves chaotically as defined by Devaney. In this paper, which extends the work presented at the Internet’10 conference, we improve the speed, the security, and the evaluation of this generator, to make its use more relevant in the Internet security context. To do so, a comparative study of various generators is carried out and statistical results are improved. Finally, an application in the information hiding framework is presented in detail, to give an illustrative example of the use of such a generator in the Internet security field.

Keywords

Internet security; Pseudo-random number generator; Chaotic sequences; Statistical tests; Discrete chaotic iterations; Information hiding.

1 Introduction

Due to the rapid development of the Internet in recent years, the need for new tools to reinforce trust and security through the Internet has become a major concern. Its growing role in everyday life implies the need to protect data and privacy in the digital world, and draws more and more attention to information security techniques in all kinds of applications. For example, new security concerns have recently appeared because of the evolution of the Internet to support activities such as e-Voting, VoD (Video on Demand), and the protection of intellectual property. In all these emerging techniques, pseudo-random number generators (PRNGs) play an important role, because they are fundamental components of almost all cryptosystems and information hiding schemes [1, 2]. PRNGs are typically defined by a deterministic recurrent sequence in a finite state space, usually a finite field or ring, and an output function mapping each state to an output value. Following [3], this value is often either a real number in the interval $(0,1)$ or an integer in some finite range. PRNGs based on linear congruential methods and feedback shift registers are popular for historical reasons [4], but their security level has often been revealed to be inadequate by today’s standards. However, using a PRNG with a high level of security is necessary to protect the information contents sent through the Internet. This level depends both on theoretical properties and on statistical tests.

Many PRNGs have already been proven to be secure following a probabilistic approach [5, 6, 7]. However, their performance must regularly be improved, among other things by using new mathematical tools. This is why the idea of using chaotic dynamical systems for this purpose has recently been explored [8, 9]. The random-like and unpredictable dynamics of chaotic systems, their inherent determinism, and their simplicity of realization suggest their potential for exploitation as PRNGs. Such generators can strongly improve the confidence put in any information hiding scheme and in cryptography in general: due to their unpredictability, the possibilities offered to an attacker to achieve his goal are drastically reduced in that context. For example, in cryptography, keys need to be unpredictable enough to defeat any search optimization based on reducing the key space to its most probable values. However, very few of the generators claimed to be chaotic have actually been proven to be unpredictable (as defined in the mathematical theory of chaos).

2 Outline of our Work

This paper extends the study initiated in [10, 11, 12], and tries to fill this gap. In [11], it is mathematically proven that chaotic iterations (CIs), a suitable tool for fast computing distributed algorithms, satisfy the topological chaotic property, following the definition given by Devaney [13]. In the paper [12] presented at Internet’09, the chaotic behavior of CIs is exploited in order to obtain an unpredictable PRNG that depends on two logistic maps. We have shown that, in addition to being chaotic, this generator can pass the NIST (National Institute of Standards and Technology of the U.S. Government) battery of tests [14], widely considered as a comprehensive and stringent battery of tests for cryptographic applications. In this paper, which is an extension of [10], we have improved the speed, security, and evaluation of the former generator and of its application in information hiding. Chaotic properties, statistical tests, and security analysis [15] allow us to consider that this generator has good characteristics and is capable of withstanding attacks. After having presented the theoretical framework of the study and a security analysis, we will give a comparison based on statistical tests. Finally, a concrete example of how to use these pseudo-random numbers for information hiding through the Internet is detailed.

The remainder of this paper is organized in the following way. In Section 3, some basic definitions concerning chaotic iterations and PRNGs are recalled. Then, the generator based on discrete chaotic iterations is presented in Section 4. Section 5 is devoted to its security analysis. In Section 6, various tests are carried out in order to compare this new PRNG statistically with other existing ones. In Section 7, a potential use of this PRNG in the Internet security field is presented, namely in information hiding. The paper ends with a conclusion and intended future work.

3 Review of Basics

3.1 Notations

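$S^n$ → the $n$-th term of a sequence $S = (S^1, S^2, \ldots)$
$v_i$ → the $i$-th component of a vector $v = (v_1, v_2, \ldots, v_n)$
$f^k$ → the $k$-th composition of a function $f$
strategy → a sequence whose elements belong to $\llbracket 1;N \rrbracket = \{1, 2, \ldots, N\}$
$\mathbb{S}$ → the set of all strategies
$C_n^k$ → the binomial coefficient $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
$\oplus$ → bitwise exclusive or
$+$ → the integer addition
$\ll$ and $\gg$ → the usual shift operators
$(\mathcal{X}, d)$ → a metric space
$\bmod$ → a modulo or remainder operator
$\lfloor x \rfloor$ → returns the highest integer smaller than $x$
$n!$ → the factorial
$\mathbb{N}^*$ → the set of positive integers {1,2,3,…}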

3.2 XORshift

XORshift is a category of very fast PRNGs designed by George Marsaglia [16]. It repeatedly applies the exclusive or (XOR) of a number with a bit-shifted version of itself. The state of a XORshift generator is a vector of bits. At each step, the next state is obtained by applying a given number of XORshift operations to $w$-bit blocks in the current state, where $w = 32$ or $64$. A XORshift operation is defined as follows: replace the $w$-bit block by a bitwise XOR of the original block with a copy of itself shifted by $a$ positions either to the right or to the left, where $0 < a < w$. For a 32-bit state and suitable shift parameters, the round given in Algorithm 3.2 has a period of $2^{32}-1$.

Algorithm 3.2: An arbitrary round of the XORshift algorithm
Input: the internal state (a 32-bit word)
Output: a 32-bit word
The round applies successive XORshift operations to the internal state and returns the resulting word.
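As an illustration of the XORshift operation described above, here is a minimal Python sketch of one 32-bit round. The shift triple (13, 17, 5), in the directions left, right, left, is an assumption: it is one of the full-period triples published by Marsaglia, not necessarily the one used in this paper. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keeps intermediate values inside a 32-bit word


def xorshift32(z):
    """One round of a 32-bit XORshift.

    Assumption: shift triple (13, 17, 5), one of Marsaglia's full-period
    choices; the paper's own parameters are not recoverable from the text.
    """
    z &= MASK32
    z ^= (z << 13) & MASK32  # XOR with a left-shifted copy
    z ^= z >> 17             # XOR with a right-shifted copy
    z ^= (z << 5) & MASK32   # XOR with a left-shifted copy
    return z


def xorshift32_stream(seed, count):
    """Return `count` successive outputs; the all-zero state is a fixed point."""
    z = seed & MASK32
    if z == 0:
        raise ValueError("the seed must be a non-zero 32-bit word")
    outputs = []
    for _ in range(count):
        z = xorshift32(z)
        outputs.append(z)
    return outputs
```

The zero state is excluded because XORing zero with shifted copies of itself always yields zero, which is why the period is $2^{32}-1$ rather than $2^{32}$.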

3.3 Continuous Chaos in Digital Computers

In the past two decades, the use of chaotic systems in the design of cryptosystems, pseudo-random number generators (PRNGs), and hash functions has become more and more frequent. Generally speaking, chaos theory in the continuous field is used to analyze the performance of such systems. However, when chaotic systems are realized in digital computers with finite computing precision, it is doubtful whether they can still preserve the desired dynamics of the continuous chaotic systems. Because most dynamical properties of chaos are meaningful only when dynamical systems evolve in a continuous phase space, these properties may become meaningless or ambiguous when the phase space is highly quantized (i.e., latticed) with a finite computing precision. In that case, the dynamical properties differ deeply from those of continuous-value systems and some dynamical degradation arises, such as short cycle lengths and decayed distributions. This phenomenon has been reported and analyzed in various situations [17, 18, 19, 20, 21].

Therefore, continuous chaos may collapse in the digital world, and the ideal way to generate pseudo-random sequences is to use a discrete-time chaotic system.

3.4 Chaos for Discrete Dynamical Systems

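Consider a metric space $(\mathcal{X}, d)$ and a continuous function $f: \mathcal{X} \longrightarrow \mathcal{X}$. For one-dimensional dynamical systems of the form:

$x^0 \in \mathcal{X}$ and $\forall n \in \mathbb{N}^*, \; x^n = f(x^{n-1})$, (1)

the following definition of chaotic behavior, formulated by Devaney [13], is widely accepted:

Definition 1

A dynamical system of Form (1) is said to be chaotic if the following conditions hold.

  • Topological transitivity:

    for all nonempty open sets $U, V \subset \mathcal{X}$, $\exists k > 0$, $f^k(U) \cap V \neq \emptyset$ (2)
  • Density of periodic points in $\mathcal{X}$:

    Let $P = \{p \in \mathcal{X} \mid \exists n \in \mathbb{N}^*: f^n(p) = p\}$ be the set of periodic points of $f$. Then $P$ is dense in $\mathcal{X}$:

    $\overline{P} = \mathcal{X}$ (3)
  • Sensitive dependence on initial conditions: $\exists \varepsilon > 0$, $\forall x \in \mathcal{X}$, $\forall \delta > 0$, $\exists y \in \mathcal{X}$, $\exists n \in \mathbb{N}$, $d(x,y) < \delta$ and $d(f^n(x), f^n(y)) \geq \varepsilon$.

When $f$ is chaotic, then the system $(\mathcal{X}, f)$ is chaotic and, quoting Devaney: “it is unpredictable because of the sensitive dependence on initial conditions. It cannot be broken down or decomposed into two subsystems which do not interact because of topological transitivity. And, in the midst of this random behavior, we nevertheless have an element of regularity.” Fundamentally different behaviors are consequently possible and occur in an unpredictable way.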

3.5 Discrete Chaotic Iterations

Definition 2

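The set $\mathbb{B}$ denoting $\{0,1\}$, let $f: \mathbb{B}^N \longrightarrow \mathbb{B}^N$ be an “iteration” function and $S \in \mathbb{S}$ be a chaotic strategy. Then, the so-called chaotic iterations [22] are defined by $x^0 \in \mathbb{B}^N$ and

$\forall n \in \mathbb{N}^*, \forall i \in \llbracket 1;N \rrbracket, \quad x_i^n = \begin{cases} x_i^{n-1} & \text{if } S^n \neq i, \\ f(x^{n-1})_{S^n} & \text{if } S^n = i. \end{cases}$ (4)

In other words, at the $n$-th iteration, only the $S^n$-th cell is “iterated”. Note that in a more general formulation, $S^n$ can be a subset of components and $f(x^{n-1})_{S^n}$ can be replaced by $f(x^k)_{S^n}$, where $k < n$, describing for example transmission delays. For the general definition of such chaotic iterations, see, e.g., [22].

Chaotic iterations generate a set of vectors (Boolean vectors in this paper); they are defined by an initial state $x^0$, an iteration function $f$, and a chaotic strategy $S$.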
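The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: cells are indexed from 1 as in the text, states are lists of 0/1 integers, and the function names are hypothetical. The vectorial negation used as the default iteration function is the one adopted later in the paper.

```python
def vectorial_negation(x):
    """Iteration function f: negate every component of a Boolean vector."""
    return [1 - bit for bit in x]


def chaotic_iterations(x0, strategy, f=vectorial_negation):
    """Return the list of states x^0, x^1, ..., one per strategy term.

    At step n only the cell designated by the strategy term S^n
    (1-based index) is replaced by the matching component of f(x^{n-1});
    every other cell is left unchanged.
    """
    x = list(x0)
    states = [tuple(x)]
    for s in strategy:
        x[s - 1] = f(x)[s - 1]  # update the single selected cell
        states.append(tuple(x))
    return states


# Example: four cells, strategy 1, 4, 2, 3 flips each bit exactly once.
trajectory = chaotic_iterations([0, 1, 0, 0], [1, 4, 2, 3])
```

Evaluating the whole of `f(x)` at each step is wasteful for the negation, but it keeps the sketch valid for any iteration function.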

The next section outlines the proof that chaotic iterations satisfy Devaney’s topological chaos property. They can thus be used to define a chaotic pseudo-random bit generator.

4 The Generation of CI Pseudo-Random Sequence

4.1 A Theoretical Proof for Devaney’s Chaotic Dynamical Systems

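This section outlines the proofs of the properties on which our pseudo-random number generator is based.

Denote by $\delta$ the discrete Boolean metric, $\delta(x,y) = 0 \Leftrightarrow x = y$. Given a function $f$, define the function $F_f: \llbracket 1;N \rrbracket \times \mathbb{B}^N \longrightarrow \mathbb{B}^N$ such that

$F_f(k,E) = \left( E_j \cdot \delta(k,j) + f(E)_k \cdot \overline{\delta(k,j)} \right)_{j \in \llbracket 1;N \rrbracket}$,

where + and . are the Boolean addition and product operations.

Consider the phase space $\mathcal{X} = \llbracket 1;N \rrbracket^{\mathbb{N}} \times \mathbb{B}^N$ and the map

$G_f(S,E) = \left( \sigma(S), F_f(i(S), E) \right)$,

where $\sigma$ is the shift function, $\sigma\left((S^n)_{n \in \mathbb{N}}\right) = (S^{n+1})_{n \in \mathbb{N}}$, and $i$ is the initial function, $i\left((S^n)_{n \in \mathbb{N}}\right) = S^0$. Then the chaotic iterations defined in Section 3.5 can be described by the following iterations [11]:

$X^0 \in \mathcal{X}$, $X^{k+1} = G_f(X^k)$.

Let us define a new distance between two points $(S,E), (\check{S},\check{E}) \in \mathcal{X}$ by

$d((S,E);(\check{S},\check{E})) = d_e(E,\check{E}) + d_s(S,\check{S})$,

where

$d_e(E,\check{E}) = \sum_{k=1}^{N} \delta(E_k,\check{E}_k) \in \llbracket 0;N \rrbracket$ and $d_s(S,\check{S}) = \frac{9}{N} \sum_{k=1}^{\infty} \frac{|S^k - \check{S}^k|}{10^k} \in [0;1]$.

It is then proven in [11], by using the sequential continuity, that

Proposition 1

$G_f$ is a continuous function on $(\mathcal{X}, d)$.

Then, the vectorial negation $f_0(x_1, \ldots, x_N) = (\overline{x_1}, \ldots, \overline{x_N})$ satisfies the three conditions for Devaney’s chaos, namely, regularity, transitivity, and sensitivity in the metric space $(\mathcal{X}, d)$. This leads to the following result.

Proposition 2

$G_{f_0}$ is a chaotic map on $(\mathcal{X}, d)$ in the sense of Devaney.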

4.2 Chaotic Iterations as Pseudo-Random Generator

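4.2.1 Presentation

The CI generator (generator based on chaotic iterations) is designed by the following process. First of all, some chaotic iterations have to be done to generate a sequence $(x^n)_{n \in \mathbb{N}} \in (\mathbb{B}^N)^{\mathbb{N}}$ ($N \in \mathbb{N}^*$, $N \geq 2$; $N$ is not necessarily equal to 32) of Boolean vectors, which are the successive states of the iterated system. Some of these vectors will be randomly extracted and our pseudo-random bit flow will be constituted by their components. Such chaotic iterations are realized as follows. The initial state $x^0 \in \mathbb{B}^N$ is a Boolean vector taken as a seed (see Section 4.2.2) and the chaotic strategy $(S^n)_{n \in \mathbb{N}} \in \llbracket 1;N \rrbracket^{\mathbb{N}}$ is an irregular decimation of a XORshift sequence (Section 4.2.4). The iterate function $f$ is the vectorial Boolean negation:

$f_0: (x_1, \ldots, x_N) \in \mathbb{B}^N \longmapsto (\overline{x_1}, \ldots, \overline{x_N}) \in \mathbb{B}^N$.

At each iteration, only the $S^n$-th component of state $x^n$ is updated, as follows: $x_i^n = x_i^{n-1}$ if $i \neq S^n$, else $x_i^n = \overline{x_i^{n-1}}$. Finally, some $x^n$ are selected by a sequence $m^n$ as the pseudo-random bit sequence of our generator. The sequence $(m^n)_{n \in \mathbb{N}}$ is computed from a XORshift sequence $(y^n)_{n \in \mathbb{N}} \in \llbracket 0; 2^{32}-1 \rrbracket^{\mathbb{N}}$ (see Section 4.2.3). So, the generator returns the following values:
Bits: $x_1^{m^0} x_2^{m^0} \ldots x_N^{m^0} \; x_1^{m^0+m^1} x_2^{m^0+m^1} \ldots x_N^{m^0+m^1} \ldots$

or States: $x^{m^0} \; x^{m^0+m^1} \; x^{m^0+m^1+m^2} \ldots$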

4.2.2 The seed

The unpredictability of random sequences is established using a random seed obtained from a physical source, such as timings of keystrokes. Without the seed, the attacker must not be able to make any predictions about the output bits, even when all details of the generator are known [23].

The initial state of the system $x^0$ and the first term $y^0$ of the XORshift are seeded either by the current time in seconds since the Epoch, or by a number that the user inputs. Different ways are possible. For example, let us denote by $t$ the decimal part of the current time. Then $x^0$ can be $t \pmod{2^N}$ written in binary digits and $y^0 = t$.

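4.2.3 Sequence $m$ of returned states

The output of the sequence $(y^n)$ is uniform in $\llbracket 0; 2^{32}-1 \rrbracket$, because it is produced by a XORshift generator. However, we do not want the output of $(m^n)$ to be uniform in $\llbracket 0;N \rrbracket$, because in this case, the returns of our generator will not be uniform in $\llbracket 0; 2^N-1 \rrbracket$, as illustrated in the following example. Let us suppose that $x^0 = (0,0,0)$. Then $m^0 \in \llbracket 0;3 \rrbracket$.

  • If $m^0 = 0$, then no bit will change between the first and the second output of our PRNG. Thus $x^1 = (0,0,0)$.

  • If $m^0 = 1$, then exactly one bit will change, which leads to three possible values for $x^1$, namely $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$.

  • etc.

As each value in $\llbracket 0; 2^3-1 \rrbracket$ must be returned with the same probability, the values $(0,0,0)$, $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$ must occur for $x^1$ with the same probability. Finally we see that, in this example, $m^0 = 1$ must be three times more probable than $m^0 = 0$. More generally, $m^n = j$ must occur with probability $C_N^j / 2^N$. This leads to the following general definition for $m$:

$m^n = g_1(y^n) = \begin{cases} 0 & \text{if } 0 \leq \frac{y^n}{2^{32}} < \frac{C_N^0}{2^N}, \\ 1 & \text{if } \frac{C_N^0}{2^N} \leq \frac{y^n}{2^{32}} < \sum_{i=0}^{1} \frac{C_N^i}{2^N}, \\ 2 & \text{if } \sum_{i=0}^{1} \frac{C_N^i}{2^N} \leq \frac{y^n}{2^{32}} < \sum_{i=0}^{2} \frac{C_N^i}{2^N}, \\ \vdots & \\ N & \text{if } \sum_{i=0}^{N-1} \frac{C_N^i}{2^N} \leq \frac{y^n}{2^{32}} < 1. \end{cases}$ (5)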
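The selection rule can be sketched in Python as follows. It assumes, as the three-bit example above suggests, that the thresholds are the cumulative binomial probabilities $C_N^j / 2^N$ and that the driving value is a 32-bit XORshift output; the function name `select_m` is hypothetical. Integer arithmetic is used so that no rounding error affects the thresholds.

```python
from math import comb


def select_m(y, n_bits, word_bits=32):
    """Map a uniform word y in [0, 2^word_bits) to m in [0, N].

    Assumption: m = j is returned when y / 2^word_bits falls in the j-th
    cumulative binomial interval, so that P(m = j) = C(N, j) / 2^N.
    """
    if not 0 <= y < (1 << word_bits):
        raise ValueError("y must be a non-negative word of word_bits bits")
    scaled = y << n_bits  # y * 2^N, compared with cumulative * 2^word_bits
    cumulative = 0
    for j in range(n_bits + 1):
        cumulative += comb(n_bits, j)
        if scaled < (cumulative << word_bits):
            return j
    raise AssertionError("unreachable: the last threshold equals 1")


# With N = 4 the sixteen equal slices of [0, 1) are shared out as 1, 4, 6, 4, 1.
counts = [0] * 5
for k in range(16):
    counts[select_m(k << 28, 4)] += 1
```

For $N = 4$ this reproduces the probabilities 1/16, 4/16, 6/16, 4/16, and 1/16 for $m = 0, \ldots, 4$.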

In order to evaluate our proposed method and compare its statistical properties with various other methods, the density histogram and intensity map of adjacent outputs have been computed. The length of is bits, and the initial conditions and control parameters are the same. A large number of sampled values are simulated ( samples). Figure 1(a) shows the intensity map for . In order to appear random, the histogram should be uniformly distributed in all areas. It can be observed that a uniform histogram and a flat color intensity map are obtained when using our scheme. Another illustration of this fact is given by Figure 1(b), whereas its uniformity is further justified by the tests presented in Section 6.

Figure 1: Histogram and intensity maps, panels (a) and (b)

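4.2.4 Chaotic strategy

The chaotic strategy $(S^k) \in \llbracket 1;N \rrbracket^{\mathbb{N}}$ is generated from a second XORshift sequence $(b^k) \in \llbracket 1;N \rrbracket^{\mathbb{N}}$. The only difference between the sequences $S$ and $b$ is that some terms of $b$ are discarded, in such a way that $\forall k \in \mathbb{N}$, $(S^{M^k}, S^{M^k+1}, \ldots, S^{M^{k+1}-1})$ does not contain any given integer twice, where $M^k = \sum_{i=0}^{k} m^i$. Therefore, no bit will change more than once between two successive outputs of our PRNG, which also makes the new generator faster than the former one. $S$ is said to be “an irregular decimation” of $b$. This decimation can be obtained by the following process.

Let $(d^1, d^2, \ldots, d^N) \in \{0,1\}^N$ be a mark sequence, such that whenever $\sum_{i=1}^{N} d^i = m^k$, then $\forall i, d^i = 0$ ($\forall k$, the sequence is reset when $d$ contains $m^k$ times the number 1). This mark sequence will control the XORshift sequence $b$ as follows:

  • if $d^{b^j} \neq 1$, then $S^k = b^j$, $d^{b^j} = 1$, and $k = k+1$,

  • if $d^{b^j} = 1$, then $b^j$ is discarded.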

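For example, if $b = 1, 4, 2, 2, 3, 3, 4, 1, 1, 4, \ldots$ and $m = 4, 2, 2, \ldots$ (the values used in Section 4.4), then $S = 1, 4, 2, 3, \; 3, 4, \; 1, 4, \ldots$ However, if we do not use the mark sequence, then one position may change more than once and the balance property will not be checked, due to the fact that $\overline{\overline{x}} = x$. As an example, for $b$ and $m$ as in the previous example, the undecimated block $1, 4, 2, 2$ and the shorter block $1, 4$ lead to the same outputs (because switching the same bit twice leads to the same state).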
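The decimation process described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mark sequence is held in a Python set that is reset at the start of every block, and the function name `decimate` is hypothetical. The input values are those of the illustrative example of Section 4.4.

```python
def decimate(b, m):
    """Build the strategy S from b so that no index repeats inside a block.

    Block k contains m[k] distinct indices. A term of b that is already
    marked in the current block is discarded; marks are reset per block.
    """
    strategy = []
    terms = iter(b)
    for block_length in m:
        marks = set()  # plays the role of the mark sequence d
        while len(marks) < block_length:
            try:
                term = next(terms)
            except StopIteration:
                return strategy  # b is exhausted: return what was built
            if term in marks:
                continue  # already used in this block: discard
            marks.add(term)
            strategy.append(term)
    return strategy


S = decimate([1, 4, 2, 2, 3, 3, 4, 1, 1, 4], [0, 4, 2, 2])
```

Here the second 2 and the second 1 of $b$ are discarded, so that the three non-empty blocks of $S$ are 1, 4, 2, 3, then 3, 4, then 1, 4.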

To check the balance property, a set of 500 sequences are generated with and without decimation, each sequence containing bits. Figure 2 shows the percentages of differences between zeros and ones, and presents a better balance property for the sequences with decimation. This claim will be verified in the tests section (Section 6).

Another example is given in Table 1, in which r means “reset” and the integers which are underlined in sequence $b$ are discarded.

Figure 2: Balance property

4.3 CI(XORshift, XORshift) Algorithm

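The basic design procedure of the novel generator is summed up in Algorithm 4.3. The internal state is $x$, the output state is $r$. $a$ and $b$ are those computed by the two XORshift generators. The value $g_1(a)$ is an integer, defined as in Equation 5. Lastly, $N$ is a constant defined by the user.

Algorithm 4.3: An arbitrary round of the new CI(XORshift,XORshift) generator
Input: the internal state $x$ ($N$ bits)
Output: a state $r$ of $N$ bits
    for $i = 1, \ldots, N$ do
        $d_i \leftarrow 0$
    end
    $a \leftarrow XORshift1()$
    $m \leftarrow g_1(a)$
    $k \leftarrow m$
    for $i = 1, \ldots, k$ do
        $b \leftarrow (XORshift2() \bmod N) + 1$
        $S \leftarrow b$
        if $d_S = 0$ then
            $x_S \leftarrow \overline{x_S}$
            $d_S \leftarrow 1$
        else if $d_S = 1$ then
            $k \leftarrow k + 1$
        end
    end
    $r \leftarrow x$
    return $r$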

As a comparison, the basic design procedure of the old generator is recalled in the algorithm below ($a$ and $b$ are computed by logistic maps, $N$ and $c$ are constants defined by the user). See [12] for further information.

Algorithm: An arbitrary round of the old CI PRNG
Input: the internal state $x$ ($N$ bits)
Output: a state $r$ of $N$ bits
The round consists of an if/else test followed by a loop of chaotic iterations; its individual steps are given in [12].
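To make the structure of one round of the new generator concrete, here is a hedged Python sketch combining the three ingredients described in Section 4.2: a XORshift word selecting the number $m$ of bits to flip, a second XORshift word providing candidate positions, and a mark array discarding repeated positions. Several points are assumptions: the XORshift shift triple (13, 17, 5), the binomial form of the selector, the 0-based position `b % N`, and the class and method names, which are hypothetical.

```python
from math import comb

MASK32 = 0xFFFFFFFF


def xorshift32(z):
    """One 32-bit XORshift round; the triple (13, 17, 5) is an assumption."""
    z ^= (z << 13) & MASK32
    z ^= z >> 17
    z ^= (z << 5) & MASK32
    return z


def selector(y, n_bits):
    """Binomial selector in the spirit of Equation 5: P(m = j) = C(N, j) / 2^N."""
    cumulative = 0
    for j in range(n_bits + 1):
        cumulative += comb(n_bits, j)
        if (y << n_bits) < (cumulative << 32):
            return j
    raise ValueError("y must lie in [0, 2^32)")


class CIXorshift:
    """Hypothetical wrapper holding the state x and the two XORshift words."""

    def __init__(self, x0, seed_a, seed_b):
        self.x = list(x0)
        self.a = seed_a & MASK32
        self.b = seed_b & MASK32
        if self.a == 0 or self.b == 0:
            raise ValueError("XORshift seeds must be non-zero 32-bit words")

    def next_state(self):
        """Run one round and return the new N-bit state as a tuple."""
        n = len(self.x)
        marks = [0] * n                 # mark sequence d, reset every round
        self.a = xorshift32(self.a)
        m = selector(self.a, n)         # number of distinct bits to flip
        flipped = 0
        while flipped < m:
            self.b = xorshift32(self.b)
            s = self.b % n              # candidate position (0-based here)
            if marks[s] == 0:
                self.x[s] ^= 1          # vectorial negation on one cell
                marks[s] = 1
                flipped += 1
            # otherwise the term is discarded and another one is drawn
        return tuple(self.x)


generator = CIXorshift([0] * 16, seed_a=123456789, seed_b=987654321)
first_states = [generator.next_state() for _ in range(5)]
```

The `while` loop replaces the incremented bound $k$ of the pseudocode: both formulations draw new terms until exactly $m$ distinct positions have been flipped.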

4.4 Illustrative Example

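In this example, $N = 4$ is chosen for easy understanding. As stated before, the initial state of the system $x^0$ can be seeded by the decimal part $t$ of the current time. For example, if the current time in seconds since the Epoch is 1237632934.484088, so $t = 484088$, then $x^0$ is derived from $t \pmod{16}$ written in binary digits; here, $x^0 = (0,1,0,0)$.

To compute the sequence $m$, Equation 5 can be adapted to this example as follows:

$m^n = g_1(y^n) = \begin{cases} 0 & \text{if } 0 \leq \frac{y^n}{2^{32}} < \frac{1}{16}, \\ 1 & \text{if } \frac{1}{16} \leq \frac{y^n}{2^{32}} < \frac{5}{16}, \\ 2 & \text{if } \frac{5}{16} \leq \frac{y^n}{2^{32}} < \frac{11}{16}, \\ 3 & \text{if } \frac{11}{16} \leq \frac{y^n}{2^{32}} < \frac{15}{16}, \\ 4 & \text{if } \frac{15}{16} \leq \frac{y^n}{2^{32}} < 1, \end{cases}$ (6)

where $y$ is generated by XORshift seeded with the current time. We can see that the probabilities of occurrences of $m = 0$, $m = 1$, $m = 2$, $m = 3$, $m = 4$ are $\frac{1}{16}$, $\frac{4}{16}$, $\frac{6}{16}$, $\frac{4}{16}$, $\frac{1}{16}$, respectively. This determines what will be the next output $x$. For instance,

  • If $m^0 = 0$, the following $x$ will be $(0,1,0,0)$.

  • If $m^0 = 1$, the following $x$ can be $(1,1,0,0)$, $(0,0,0,0)$, $(0,1,1,0)$, or $(0,1,0,1)$.

  • If $m^0 = 2$, the following $x$ can be $(1,0,0,0)$, $(1,1,1,0)$, $(1,1,0,1)$, $(0,0,1,0)$, $(0,0,0,1)$, or $(0,1,1,1)$.

  • If $m^0 = 3$, the following $x$ can be $(0,0,1,1)$, $(1,1,1,1)$, $(1,0,0,1)$, or $(1,0,1,0)$.

  • If $m^0 = 4$, the following $x$ will be $(1,0,1,1)$.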

In this simulation, $m = 0, 4, 2, 2, \ldots$ Additionally, $b$ is computed with a XORshift generator too, but with another seed. We have found $b = 1, 4, 2, 2, 3, 3, 4, 1, 1, 4, \ldots$

Chaotic iterations are made with initial state $x^0$, vectorial logical negation $f_0$, and strategy $S$. The result is presented in Table 1. Let us recall that sequence $m$ gives the states $x^n$ to return, which are here $x^0, x^{0+4}, x^{0+4+2}, \ldots$ So, in this example, the output of the generator is: 01000100101110000001… or 4,4,11,8,1…

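$m$:  0 | 4 | 2 | 2
$k$:  0 | 4 | 2 | 2
$b$:  1 4 2 $\underline{2}$ 3 | 3 4 | 1 $\underline{1}$ 4
$d$:  r (reset at the end of each block)
$S$:  1 4 2 3 | 3 4 | 1 4

         $x^0$  $x^0$  $x^4$  $x^6$  $x^8$
$x_1$:   0      0      1      1      0
$x_2$:   1      1      0      0      0
$x_3$:   0      0      1      0      0
$x_4$:   0      0      1      0      1

Binary Output: 0100 0100 1011 1000 0001…
Integer Output: 4, 4, 11, 8, 1…

Table 1: Example of New CI(XORshift,XORshift) generation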
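The states of Table 1 can be replayed with a short Python sketch. It takes the sequence $m = 0, 4, 2, 2$ and the decimated strategy $S = 1, 4, 2, 3, 3, 4, 1, 4$ as given, lists the initial state first as the table does, and converts each returned state to an integer by reading $x_1$ as the most significant bit (an assumption that matches the integer outputs 4, 4, 11, 8, 1). The function name `replay` is hypothetical.

```python
def replay(x0, m, strategy):
    """Return the initial state followed by one state per term of m."""
    x = list(x0)
    position = 0
    outputs = [tuple(x)]  # Table 1 lists the initial state first
    for count in m:
        for s in strategy[position:position + count]:
            x[s - 1] ^= 1  # negate the selected cell (1-based index)
        position += count
        outputs.append(tuple(x))
    return outputs


states = replay((0, 1, 0, 0), [0, 4, 2, 2], [1, 4, 2, 3, 3, 4, 1, 4])
integers = [int("".join(str(bit) for bit in state), 2) for state in states]
bit_string = "".join(str(bit) for state in states for bit in state)
```

The list `integers` holds the integer output of the example, and `bit_string` is the concatenation of the same states read component by component.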

5 Security Analysis

A PRNG should be sensitive to its secret key, and the size of the key space should be large. Here, chaotic properties are also closely related to security.

5.1 Key Space

The PRNG proposed in this paper is based on discrete chaotic iterations. It has an initial value $x^0 \in \mathbb{B}^N$. Considering this set of initial values alone, the key space size is equal to $2^N$. In addition, this new generator combines digits of two other PRNGs. We used two different XORshifts here. Let $k$ be the key space of XORshift, so the total key space size is close to $2^N \cdot k^2$. Lastly, the impact of Equation 5, which defines the sequence $(m^n)$ with a selector function $g_1$, must be taken into account. We conclude that the key space size is large enough to withstand attacks.

Let us notice, to conclude this subsection, that our PRNG can use any reasonable function as selector. In this paper, $g_1$ (Equation 5) and $g_2$ are adopted for demonstration purposes, where:

(7)

We will show later that both of them can pass all of the performed tests.

5.2 Key Sensitivity

As a consequence of its chaotic property, this PRNG is highly sensitive to the initial conditions. To illustrate this fact, several initial values are put into the chaotic system. Let $H$ be the number of differences between the sequences obtained in this way. Suppose $n$ is the length of these sequences. Then the variance ratio $P$, defined by $P = H/n$, is computed. The results are shown in Figure 3 (the $x$ axis is the sequence length, the $y$ axis is the variance ratio $P$). For the two PRNGs, variance ratios approach $0.50$, which indicates that the system is extremely sensitive to the initial conditions.

Figure 3: Sensitivity analysis
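The ratio used in this sensitivity analysis is simply the fraction of positions at which two output sequences differ. The following Python sketch computes it; since the generator itself is not reproduced here, Python's `random` module is used as a stand-in source of bits, which is an assumption made for illustration only, and the function names are hypothetical.

```python
import random


def variance_ratio(seq_a, seq_b):
    """Fraction of positions at which two equal-length bit sequences differ."""
    if len(seq_a) != len(seq_b) or len(seq_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(1 for u, v in zip(seq_a, seq_b) if u != v)
    return differences / len(seq_a)


def stand_in_bits(seed, length):
    """Stand-in bit source (NOT the CI generator): Python's Mersenne Twister."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(length)]


# Two sequences obtained from neighbouring seeds.
ratio = variance_ratio(stand_in_bits(1, 20000), stand_in_bits(2, 20000))
```

A ratio close to 0.5 means that the two sequences agree on about half of their bits, as two unrelated random sequences would.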

5.3 Linear Complexity

The linear complexity (LC) of a sequence is the size in bits of the shortest linear feedback shift register (LFSR) which can produce this sequence. This value measures the difficulty of generating (and perhaps analyzing) a particular sequence. Indeed, the randomness of a given sequence can be linked to the size of the smallest program that can produce it. The Berlekamp-Massey algorithm can measure this LC, which can be used to evaluate the security of a pseudo-random sequence. It can be seen in Figure 4 that the LC curve of a sample sequence of 2000 bits is close to the ideal line $C_i = i/2$, which implies that the generator has high linear complexity.

Figure 4: Linear complexity
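For reference, the Berlekamp-Massey algorithm over GF(2) fits in a few lines of Python. The sketch below returns the linear complexity of a finite bit sequence; it is a textbook formulation, not the code used to produce Figure 4, and the function name is hypothetical.

```python
def linear_complexity(bits):
    """Length of the shortest LFSR generating `bits` (Berlekamp-Massey, GF(2))."""
    n = len(bits)
    if n == 0:
        return 0
    c = [0] * n  # current connection polynomial
    b = [0] * n  # connection polynomial before the last length change
    c[0] = b[0] = 1
    length, last_change = 0, -1
    for i in range(n):
        # Discrepancy between bit i and the prediction of the current LFSR.
        discrepancy = bits[i]
        for j in range(1, length + 1):
            discrepancy ^= c[j] & bits[i - j]
        if discrepancy:
            previous = c[:]
            shift = i - last_change
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                last_change = i
                b = previous
    return length


profile = [linear_complexity([1, 0, 1, 0, 1, 0][:i]) for i in range(1, 7)]
```

Evaluating the function on every prefix of a sequence, as done for `profile`, gives the linear complexity profile that is compared with the line $i/2$.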

5.4 Devaney’s Chaos Property

Generally speaking, the quality of a PRNG depends, to a large extent, on the following criteria: randomness, uniformity, independence, storage efficiency, and reproducibility. A chaotic sequence may satisfy these requirements and also other chaotic properties, such as ergodicity, entropy, and expansivity. A chaotic sequence is extremely sensitive to the initial conditions. That is, even a minute difference in the initial state of the system can lead to enormous differences in the final state, even over fairly small timescales. Therefore, chaotic sequences fit the requirements of pseudo-random sequences well. Contrary to XORshift, our generator possesses these chaotic properties [11, 12]. However, despite a large number of papers published in the field of chaos-based pseudo-random generators, the impact of this research is rather marginal. This is due to the following reasons: almost all PRNG algorithms using chaos are based on dynamical systems defined on continuous sets (e.g., the set of real numbers). So these generators are usually slow, require considerably more storage space, and lose their chaotic properties during computations, as mentioned earlier in this paper. These major problems restrict their use as generators [24].

In this paper, we do not simply integrate chaotic maps hoping that the implemented algorithm remains chaotic. Indeed, the PRNG we propose consists solely of discrete chaotic iterations, and we have proven in [11] that these iterations produce a topological chaos as defined by Devaney: they are regular, transitive, and sensitive to initial conditions. This famous definition of a chaotic behavior for a dynamical system implies unpredictability, mixture, sensitivity, and uniform repartition. Moreover, as only integers are manipulated in discrete chaotic iterations, the chaotic behavior of the system is preserved during computations, and these computations are fast.

Let us now explore the topological properties of our generator and their consequences concerning the quality of the generated pseudo-random sequences.

5.5 Topological Consequences

We have proven in [25] that chaotic iterations are expansive and topologically mixing. These topological properties are inherited by the generators presented here. In particular, any error on the seed is magnified until it is equal to the constant of expansivity. We will now investigate the consequences of being chaotic, as defined by Devaney.

First of all, the transitivity property implies the indecomposability of the system:

Definition 3

A dynamical system $(\mathcal{X}, f)$ is indecomposable if it is not the union of two closed sets $A, B \subset \mathcal{X}$ such that $f(A) \subset A$, $f(B) \subset B$.

Thus it is impossible to reduce the set of the outputs generated by our PRNG, in order to reduce its complexity. Moreover, it is possible to show that Old and New CI generators are strongly transitive:

Definition 4

A dynamical system $(\mathcal{X}, f)$ is strongly transitive if $\forall x, y \in \mathcal{X}$, $\forall r > 0$, $\exists z \in B(x,r)$, $\exists n \in \mathbb{N}$, $f^n(z) = y$.

In other words, for all $x, y \in \mathcal{X}$, it is possible to find a point $z$ in the neighborhood of $x$ such that an iterate $f^n(z)$ is $y$. Indeed, this result has been established during the proof of the transitivity presented in [11]. Among other things, the strong transitivity property leads to the fact that without the knowledge of the seed, all of the outputs are possible. Additionally, no point of the output space can be discarded when studying our PRNG: it is intrinsically complicated and it cannot be simplified.

Finally, these generators possess the instability property:

Definition 5

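A dynamical system $(\mathcal{X}, f)$ is unstable if for all $x \in \mathcal{X}$, the orbit $\gamma_x: n \in \mathbb{N} \longmapsto f^n(x)$ is unstable, that is: $\exists \varepsilon > 0$, $\forall \delta > 0$, $\exists y \in \mathcal{X}$, $\exists n \in \mathbb{N}$, $d(x,y) < \delta$ and $d(\gamma_x(n), \gamma_y(n)) \geq \varepsilon$.

This property, which is implied by the sensitive dependence on the initial condition, leads to the fact that in all of the neighborhoods of any $x$, there are points that are separated from $x$ under iterations of $f$. We thus can claim that the behavior of our generators is unstable.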

6 Statistical Analysis

6.1 Basic Common Tests

Comparative test parameters

In this section, five well-known statistical tests [26] are used as comparison tools. They range from the frequency test to the autocorrelation test. In what follows, $s = s_0, s_1, s_2, \ldots, s_{n-1}$ denotes a binary sequence of length $n$. The question is to determine whether this sequence possesses some specific characteristics that a truly random sequence would be likely to exhibit. The tests are introduced in this subsection and results are given in the next one.

Frequency test (monobit test). The purpose of this test is to check if the numbers of 0’s and 1’s are approximately equal in $s$, as would be expected for a random sequence. Let $n_0$, $n_1$ denote these numbers. The statistic used here is:

$X_1 = \frac{(n_0 - n_1)^2}{n}$,

which approximately follows a $\chi^2$ distribution with one degree of freedom when $n \geq 10$.

Serial test (2-bit test). The purpose of this test is to determine if the numbers of occurrences of 00, 01, 10, and 11 as subsequences of $s$ are approximately the same. Let $n_{00}$, $n_{01}$, $n_{10}$, and $n_{11}$ denote the numbers of occurrences of $00$, $01$, $10$, and $11$ respectively. Note that $n_{00} + n_{01} + n_{10} + n_{11} = n - 1$ since the subsequences are allowed to overlap. The statistic used here is:

$X_2 = \frac{4}{n-1}\left(n_{00}^2 + n_{01}^2 + n_{10}^2 + n_{11}^2\right) - \frac{2}{n}\left(n_0^2 + n_1^2\right) + 1$,

which approximately follows a $\chi^2$ distribution with 2 degrees of freedom if $n \geq 21$.

Poker test. The poker test studies if each pattern of length $m$ (without overlapping) appears the same number of times in $s$. Let $\lfloor n/m \rfloor \geq 5 \times 2^m$ and $k = \lfloor n/m \rfloor$. Divide the sequence $s$ into $k$ non-overlapping parts, each of length $m$. Let $n_i$ be the number of occurrences of the $i$-th type of sequence of length $m$, where $1 \leq i \leq 2^m$. The statistic used is

$X_3 = \frac{2^m}{k}\left(\sum_{i=1}^{2^m} n_i^2\right) - k$,

which approximately follows a $\chi^2$ distribution with $2^m - 1$ degrees of freedom. Note that the poker test is a generalization of the frequency test: setting $m = 1$ in the poker test yields the frequency test.

Runs test. The purpose of the runs test is to figure out whether the number of runs of various lengths in the sequence $s$ is as expected for a random sequence. A run is defined as a pattern of all zeros or all ones, a block is a run of ones, and a gap is a run of zeros. The expected number of gaps (or blocks) of length $i$ in a random sequence of length $n$ is $e_i = \frac{n-i+3}{2^{i+2}}$. Let $k$ be equal to the largest integer $i$ such that $e_i \geq 5$. Let $B_i$, $G_i$ be the numbers of blocks and gaps of length $i$ in $s$, for each $i$, $1 \leq i \leq k$. The statistic used here will then be:

$X_4 = \sum_{i=1}^{k} \frac{(B_i - e_i)^2}{e_i} + \sum_{i=1}^{k} \frac{(G_i - e_i)^2}{e_i}$,

which approximately follows a $\chi^2$ distribution with $2k - 2$ degrees of freedom.

Autocorrelation test. The purpose of this test is to check for coincidences between the sequence $s$ and (non-cyclic) shifted versions of it. Let $d$ be a fixed integer, $1 \leq d \leq \lfloor n/2 \rfloor$. The value $A(d) = \sum_{i=0}^{n-d-1} s_i \oplus s_{i+d}$ is the number of bits not equal between the sequence and itself displaced by $d$ bits. The statistic used here is:

$X_5 = \frac{2\left(A(d) - \frac{n-d}{2}\right)}{\sqrt{n-d}}$,

which approximately follows a normal distribution $N(0,1)$ if $n - d \geq 10$. Since small values of $A(d)$ are as unexpected as large values, a two-sided test should be used.
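The five statistics can be computed with the following Python sketch, which uses the classical formulas of these tests (stated in the comments, so the block stands on its own). Function names are hypothetical, sequences are lists of 0/1 integers, and no critical-value lookup is included. The sample sequence is four copies of a 40-bit pattern, a common textbook check for these statistics.

```python
from math import sqrt


def monobit(s):
    # X1 = (n0 - n1)^2 / n
    n1 = sum(s)
    n0 = len(s) - n1
    return (n0 - n1) ** 2 / len(s)


def serial(s):
    # X2 = 4/(n-1) * sum(n_ab^2) - 2/n * (n0^2 + n1^2) + 1, overlapping pairs
    n = len(s)
    n1 = sum(s)
    n0 = n - n1
    pairs = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(s, s[1:]):
        pairs[(a, b)] += 1
    squares = sum(v * v for v in pairs.values())
    return 4 / (n - 1) * squares - 2 / n * (n0 * n0 + n1 * n1) + 1


def poker(s, m):
    # X3 = 2^m / k * sum(n_i^2) - k, with k = floor(n / m) disjoint blocks
    k = len(s) // m
    counts = {}
    for i in range(k):
        block = tuple(s[i * m:(i + 1) * m])
        counts[block] = counts.get(block, 0) + 1
    return (2 ** m / k) * sum(v * v for v in counts.values()) - k


def runs(s):
    # X4 = sum over i <= k of (B_i - e_i)^2 / e_i + (G_i - e_i)^2 / e_i
    n = len(s)

    def expected(i):
        return (n - i + 3) / 2 ** (i + 2)

    k = 0
    while expected(k + 1) >= 5:
        k += 1
    blocks = [0] * (k + 1)  # runs of ones, indexed by length
    gaps = [0] * (k + 1)    # runs of zeros, indexed by length
    start = 0
    while start < n:
        end = start
        while end < n and s[end] == s[start]:
            end += 1
        if end - start <= k:
            (blocks if s[start] == 1 else gaps)[end - start] += 1
        start = end
    return sum((blocks[i] - expected(i)) ** 2 / expected(i)
               + (gaps[i] - expected(i)) ** 2 / expected(i)
               for i in range(1, k + 1))


def autocorrelation(s, d):
    # X5 = 2 * (A(d) - (n - d) / 2) / sqrt(n - d)
    n = len(s)
    a = sum(s[i] ^ s[i + d] for i in range(n - d))
    return 2 * (a - (n - d) / 2) / sqrt(n - d)


pattern = "1110001100010001010011101111001001001001"
sample = [int(ch) for ch in pattern * 4]  # 160 bits
statistics = (monobit(sample), serial(sample), poker(sample, 3),
              runs(sample), autocorrelation(sample, 8))
```

Each value is then compared with the quantile of the corresponding distribution ($\chi^2$ for the first four, standard normal for the last one) at the chosen significance level.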

Comparison

Method                     | Monobit ($X_1$) | Serial ($X_2$) | Poker ($X_3$) | Runs ($X_4$) | Autocorrelation ($X_5$) | Time
Logistic map               | 0.1280          | 0.1302         | 240.2893      | 26.5667      | 0.0373                  | 0.965s
XORshift                   | 1.7053          | 2.1466         | 248.9318      | 18.0087      | 0.5009                  | 0.096s
Old CI(Logistic, Logistic) | 1.0765          | 1.0796         | 258.1069      | 20.9272      | 1.6994                  | 0.389s
New CI(XORshift,XORshift)  | 0.3328          | 0.7441         | 262.8173      | 16.7877      | 0.0805                  | 0.197s
Table 2: Comparison with Old CI(Logistic, Logistic) for a bits sequence

We show in Table 2 a comparison among our new generator CI(XORshift, XORshift), its old version denoted Old CI(Logistic, Logistic), a basic PRNG based on the logistic map, and a simple XORshift. In this table, time (in seconds) is the duration needed by each algorithm to generate the tested sequence. The test has been conducted using the same computer and compiler with the same optimization settings for all algorithms, in order to make the test as fair as possible. The results confirm that the proposed generator is about twice as fast as the old one (0.197s against 0.389s), while its statistical results are better for most of the parameters, which suggests that the new PRNG is more secure than the old one. Although the logistic map also has good results, it is too slow to be implemented in Internet applications, and this map is known to present various biases leading to severe security issues.

Figure 5: Comparison of monobits tests

As a comparison of the overall stability of these PRNGs, similar tests have been computed for different sequence lengths (see Figures 5 - 9). For the monobit test comparison (Figure 5), almost all of the PRNGs present the same issue: the beginning values are a little high. However, for our new generator, the values remain stable at a low level which never exceeds 1.2. Indeed, the new generator distributes the zeros and ones in a well-balanced way, whatever the length of the desired sequence. It can also be remarked that the old generator presents the second best performance, due to its use of chaotic iterations.

Figure 6: Comparison of serial tests

Figure 6 shows the serial test comparison. The new generator performs best in this test, but the score of the old generator is not bad either: their occurrences of 00, 01, 10, and 11 are very close to each other.

Figure 7: Comparison of poker tests

The poker test comparison with is shown in Figure 7. XORshift is the most stable generator in all of these tests, and the logistic map also becomes good when producing sequences of length greater than . Our old and new generators present a similar trend, with a maximum in the neighborhood of . These scores are not so good, even though the new generator has a better behavior than the old one. Indeed, the value of and the length of the sequences should be enlarged to be certain that the chaotic iterations express totally their complex behavior. In that situation, the performances of our generators in the poker test can be improved.

Figure 8: Comparison of runs tests

The graph of the new generator is the most stable one during the runs test comparison (Figure 8). Moreover, this trend is reinforced when the lengths of the tested sequences are increased.

Figure 9: Comparison of autocorrelation tests

The comparison of autocorrelation tests is presented in Figure 9. The new generator clearly dominates these tests, whereas the score of the old generator is surprisingly bad. This difference between two generators based on chaotic iterations can be explained by the fact that the improvements made to define the new generator lead to a more random output.

To sum up, we can claim that the new generator, which is faster than its former version, outperforms the other generators in most of these statistical tests (the poker test being the exception), especially when producing long output sequences.

6.2 NIST Statistical Test Suite

Presentation

Among the numerous standard tests for pseudo-randomness, a convincing way to assess the quality of the produced sequences is to confront them with the NIST (National Institute of Standards and Technology) Statistical Test Suite SP 800-22, released by the Information Technology Laboratory on August 25, 2008.

The NIST test suite, SP 800-22, is a statistical package consisting of 15 tests. They were developed to measure the randomness of (arbitrarily long) binary sequences produced by either hardware or software based cryptographic pseudorandom number generators. These tests focus on a variety of different types of non-randomness that could occur in such sequences. The 15 tests included in the NIST test suite are described in the Appendix.

Interpretation of empirical results

The P-value is the “tail probability” that the chosen test statistic will assume values that are equal to or worse than the observed test statistic value when considering the null hypothesis. For each statistical test, a set of P-values is produced from a set of sequences obtained by our generator (i.e., 100 sequences are generated and tested, hence 100 P-values are produced).

Empirical results can be interpreted in various ways. In this paper, we check whether the P-values are uniformly distributed, via an application of a χ² distribution and the determination of a P-value corresponding to the Goodness-of-Fit distributional test on the P-values obtained for an arbitrary statistical test.

If this P-value is at least 0.0001, then the sequences can be considered to be uniformly distributed. In our experiments, 100 sequences (s = 100) of 1,000,000 bits are generated and tested. If the value of at least one test is smaller than 0.0001, the sequences are considered to be not good enough and the generator is unsuitable.
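The uniformity check described above can be sketched as follows. This is only an illustrative sketch, not the NIST reference code: it assumes the usual choice of 10 equal sub-intervals of [0, 1] (hence 9 degrees of freedom), and the function names `uniformity_p_value` and `chi2_survival` are hypothetical. The χ² tail is computed with the standard recurrence on the incomplete gamma function, which is valid here because the number of degrees of freedom is odd.

```python
import math


def chi2_survival(chi2, dof):
    """Upper tail probability of a chi-square variable (odd dof only).

    Uses Q(1/2, x) = erfc(sqrt(x)) and the recurrence
    Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1), with x = chi2 / 2.
    """
    if dof % 2 == 0:
        raise ValueError("this sketch only handles an odd number of degrees of freedom")
    x = chi2 / 2.0
    q = math.erfc(math.sqrt(x))
    a = 0.5
    while a < dof / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def uniformity_p_value(p_values, bins=10):
    """Goodness-of-fit P-value for the uniformity of a list of P-values."""
    s = len(p_values)  # number of tested sequences (s = 100 in the text)
    counts = [0] * bins
    for p in p_values:
        # A P-value equal to 1.0 is placed in the last sub-interval.
        counts[min(int(p * bins), bins - 1)] += 1
    expected = s / bins
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2_survival(chi2, bins - 1)


# Decision rule from the text: the P-values are considered uniformly
# distributed when the resulting value is at least 0.0001.
def is_uniform(p_values, threshold=0.0001):
    return uniformity_p_value(p_values) >= threshold
```

With 100 P-values spread evenly over the ten sub-intervals the statistic is zero and the check passes, whereas 100 P-values concentrated in a single sub-interval are rejected.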

Table 3 shows the P-values for the sequences based on discrete chaotic iterations using different schemes. If there are at least two statistical values in a test, this test is marked with an asterisk and the average is computed to characterize the statistical values.

We can conclude from Table 3 that the worst situations are obtained with the New CI () and New CI (no mark) generators. Old CI, New CI (), and New CI () have successfully passed the NIST statistical test suite. These results and the conclusion obtained from the aforementioned basic tests reinforce the confidence that can be put in the good behavior of chaotic CI PRNGs, thus making them suitable for security applications such as information hiding and digital watermarking.

7 Application Example in Information Hiding

7.1 Introduction

Information hiding is now an integral part of Internet technologies. In the field of social search engines, for example, contents like pictures or movies are tagged with descriptive labels by contributors, and search results are determined by these descriptions. These collaborative taggings, used for example in Flickr [27] and Delicious [28] websites, contribute to the development of a Semantic Web, in which any Web page contains machine-readable metadata that describe its content. Information hiding technologies can be used for embedding these metadata. The advantage of their use is the possibility to realize social search without websites and databases: descriptions are directly embedded into media, whatever their formats. Robustness is required in this situation, as descriptions should resist modifications like resizing, compression, and format conversion.

The Internet security field is also concerned by watermarking technologies. Steganography and cryptography are supposed to be used by terrorists to communicate through the Internet. Furthermore, in the areas of defense or in industrial espionage, many information leaks using steganographic techniques have been reported. Lastly, watermarking is often cited as a possible solution to digital rights management issues, to counteract piracy of digital work in an Internet-based entertainment world [29].

7.2 Definition of a Chaos-Based Information Hiding Scheme

Let us now introduce our information hiding scheme based on the CI generator.

Most and least significant coefficients

Let us define the notions of most and least significant coefficients of an image.

Definition 1

For a given image, most significant coefficients (in short MSCs), are coefficients that allow the description of the relevant part of the image, i.e., its richest part (in terms of embedding information), through a sequence of bits.

For example, in a spatial description of a grayscale image, a definition of MSCs can be the sequence constituted by the first four bits of each pixel (see Figure 10). In a discrete cosine frequency domain description, each 8 × 8 block of the carrier image is mapped onto a list of 64 coefficients. The energy of the image is mostly contained in a specific subset of these coefficients, which can constitute a possible sequence of MSCs.

Definition 2

By least significant coefficients (LSCs), we mean a translation of some insignificant parts of a medium into a sequence of bits (insignificant can be understood as: “which can be altered without perceptible damage”).

These LSCs can be, for example, the last three bits of the gray level of each pixel (see Figure 10). Discrete cosine, Fourier, and wavelet transforms can also be used to generate LSCs and MSCs. Moreover, these definitions can be extended to other types of media.
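As an illustration of the spatial-domain example given in Definitions 1 and 2, the sketch below splits 8-bit gray levels into MSCs (first four bits) and LSCs (last three bits). It assumes pixels are integers in 0..255, and the helper names `pixel_bits` and `split_coefficients` are hypothetical, introduced only for this example.

```python
def pixel_bits(value):
    """Return the 8 bits of a gray level (0..255), most significant bit first."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit gray level")
    return [(value >> shift) & 1 for shift in range(7, -1, -1)]


def split_coefficients(pixels, n_msc=4, n_lsc=3):
    """Return (MSCs, LSCs) as flat bit lists for a sequence of gray levels.

    The MSCs are the first n_msc bits of each pixel, the LSCs the last
    n_lsc bits. The remaining middle bit belongs to neither sequence, so
    no coefficient is both an MSC and an LSC.
    """
    msc, lsc = [], []
    for value in pixels:
        bits = pixel_bits(value)
        msc.extend(bits[:n_msc])
        lsc.extend(bits[8 - n_lsc:])
    return msc, lsc


# The gray level 0b10110101 has MSCs 1,0,1,1 and LSCs 1,0,1.
msc, lsc = split_coefficients([0b10110101])
```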

(a) Lena.

(b) MSCs of Lena.

(c) LSCs of Lena ().

Figure 10: Example of most and least significant coefficients of Lena.

LSCs are used during the embedding stage. Indeed, some of the least significant coefficients of the carrier image will be chaotically chosen by using our PRNG. These bits will be either switched or replaced by the bits of the watermark. The MSCs are only useful in case of authentication; mixture and embedding stages depend on them. Hence, a coefficient should not be defined at the same time as an MSC and an LSC: the latter can be altered while the former is needed to extract the watermark.

Stages of the scheme

Our CI generator-based information hiding scheme consists of two stages: (1) mixture of the watermark and (2) its embedding.

Watermark mixture Firstly, for security reasons, the watermark can be mixed before its embedding into the image. A first way to achieve this stage is to apply the bitwise exclusive or (XOR) between the watermark and the output of the New CI generator. In this paper, we introduce a new mixture scheme based on chaotic iterations. Its chaotic strategy, which depends on our PRNG, will be highly sensitive to the MSCs, in the case of an authenticated watermarking.
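The XOR-based mixture mentioned above can be sketched as follows. The keystream here is produced by a plain 32-bit xorshift generator with shifts (13, 17, 5), used purely as a stand-in: it is not the New CI generator of this paper, and the names `xorshift32`, `keystream_bits`, and `xor_mix` are assumptions made for the example.

```python
def xorshift32(seed):
    """Stand-in PRNG: 32-bit xorshift with shifts (13, 17, 5)."""
    x = seed & 0xFFFFFFFF
    if x == 0:
        raise ValueError("the seed must be non-zero")
    while True:
        x ^= (x << 13) & 0xFFFFFFFF
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        yield x


def keystream_bits(seed, n):
    """Return the first n bits of the stand-in generator's output."""
    gen = xorshift32(seed)
    bits = []
    while len(bits) < n:
        word = next(gen)
        bits.extend((word >> k) & 1 for k in range(32))
    return bits[:n]


def xor_mix(watermark_bits, seed):
    """Mix (or unmix) a watermark by XOR with the keystream derived from seed."""
    stream = keystream_bits(seed, len(watermark_bits))
    return [w ^ k for w, k in zip(watermark_bits, stream)]


# XOR is an involution: mixing twice with the same seed restores the watermark.
watermark = [1, 0, 0, 1, 1, 1, 0, 1]
mixed = xor_mix(watermark, seed=2463534242)
restored = xor_mix(mixed, seed=2463534242)
```

In this sketch the seed plays the role of the secret key; the same function is used for mixing and for recovering the watermark.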

Watermark embedding Some LSCs will be switched, or substituted by the bits of the possibly mixed watermark. To choose the sequence of LSCs to be altered, a number of integers, less than or equal to the number of LSCs corresponding to a chaotic sequence , is generated from the chaotic strategy used in the mixture stage. Thus, the -th least significant coefficient of the carrier image is either switched, or substituted by the bit of the possibly mixed watermark. In case of authentication, such a procedure leads to a choice of the LSCs that are highly dependent on the MSCs [30].

On the one hand, when the switch is chosen, the watermarked image is obtained from the original image whose LSBs are replaced by the result of some chaotic iterations. Here, the iterate function is the vectorial Boolean negation,

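$$(x_1, \dots, x_N) \in \mathbb{B}^{N} \longmapsto (\overline{x_1}, \dots, \overline{x_N}) \in \mathbb{B}^{N}, \qquad (8)$$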

the initial state is , and the strategy is equal to . In this case, the whole embedding stage satisfies the topological chaos properties [30], but the original medium is required to extract the watermark. On the other hand, when the selected LSCs are substituted by the watermark, its extraction can be done without the original cover (blind watermarking). In this case, the selection of LSBs still remains chaotic because of the use of the New CI generator, but the whole process does not satisfy topological chaos [30]. The use of chaotic iterations is reduced to the mixture of the watermark. See the following sections for more detail.
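For the switch case, chaotic iterations with the vectorial Boolean negation amount to flipping, at each step, the single bit designated by the current term of the strategy. The sketch below illustrates this under the assumption that the strategy is given as a plain list of indices; the function name `chaotic_iterations_negation` is hypothetical, and the strategy would in practice come from the PRNG.

```python
def chaotic_iterations_negation(state, strategy):
    """Apply chaotic iterations with the vectorial negation.

    At step n, only the component of index strategy[n] is updated, and
    updating a component with the negation means flipping that bit.
    """
    x = list(state)
    for index in strategy:
        x[index] ^= 1
    return x


# Switching LSBs with a strategy, then replaying the same strategy on the
# result, gives back the original bits, since each flip is its own inverse.
lsbs = [0, 1, 1, 0, 1]
strategy = [2, 0, 2, 4]
marked = chaotic_iterations_negation(lsbs, strategy)
recovered = chaotic_iterations_negation(marked, strategy)
```

This also shows why the original medium is needed in the switch case: the watermark is only visible as a set of differences with respect to the cover.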

Extraction The chaotic strategy can be regenerated even in the case of an authenticated watermarking, because the MSCs have not changed during the embedding stage. Thus, the few altered LSCs can be found, the mixed watermark can be rebuilt, and the original watermark can be obtained. In case of a switch, the result of the previous chaotic iterations on the watermarked image should be the original cover. The probability that the image has been watermarked decreases when the number of differences increases.
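The substitution variant and its blind extraction can be sketched as follows. The sketch assumes that the strategy is a list of LSC indices and that repeated indices are skipped, so that each watermark bit gets its own position; this handling of repetitions is an assumption of the example, not a detail taken from the scheme, and all function names are hypothetical.

```python
def distinct_positions(strategy, count):
    """Return the first `count` distinct indices met in the strategy."""
    seen, chosen = set(), []
    for index in strategy:
        if index not in seen:
            seen.add(index)
            chosen.append(index)
            if len(chosen) == count:
                return chosen
    raise ValueError("the strategy does not visit enough distinct positions")


def embed_substitution(lsc, watermark, strategy):
    """Replace the selected LSCs by the bits of the (possibly mixed) watermark."""
    positions = distinct_positions(strategy, len(watermark))
    marked = list(lsc)
    for position, bit in zip(positions, watermark):
        marked[position] = bit
    return marked


def extract_substitution(marked_lsc, length, strategy):
    """Blind extraction: regenerate the positions and read the bits back."""
    positions = distinct_positions(strategy, length)
    return [marked_lsc[position] for position in positions]


lsc = [0] * 8
strategy = [5, 5, 2, 7, 2, 0]
marked = embed_substitution(lsc, [1, 0, 1], strategy)
extracted = extract_substitution(marked, 3, strategy)
```

Only the strategy (that is, the key that regenerates it) and the watermark length are needed for extraction, not the original cover.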

If the watermarked image is attacked, then the MSCs will change. Consequently, in case of authentication and due to the high sensitivity of our PRNG, the LSCs designed to receive the watermark will be completely different. Hence, the result of the recovery will have no similarity with the original watermark.

The chaos-based data hiding scheme is summed up in Figure 11.

Figure 11: The chaos-based data hiding decision tree.

7.3 Application Example

Experimental protocol

In this subsection, a concrete example is given: a watermark is encrypted and embedded into a cover image using the scheme presented in the previous section and CI(XORshift, XORshift). The carrier image is the well-known Lena, which is a 256 grayscale image, and the watermark is the pixels binary image depicted in Figure 12.

(a) The original image
(b) The watermark
Figure 12: Original images
(a) Differences with the original
(b) Encrypted watermark
Figure 13: Encrypted watermark and differences

The watermark is encrypted by using chaotic iterations: the initial state is the watermark, considered as a Boolean vector, the iteration function is the vectorial logical negation, and the chaotic strategy is defined with CI(XORshift, XORshift), where initial parameters constitute the secret key and . Thus, the encrypted watermark is the last Boolean vector generated by these chaotic iterations. An example of such an encryption is given in Figure 13.

Let be the Boolean vector constituted by the three last bits of each pixel of Lena and defined by the sequence:

(9)

The watermarked Lena is obtained from the original Lena, whose three last bits are replaced by the result of chaotic iterations with initial state and strategy (see Figure 13).

The extraction of the watermark can be obtained in the same way. Remark that the map of the torus, which is the famous dyadic transformation (a well-known example of topological chaos [13]), has been chosen to make highly sensitive to the strategy. As a consequence, is highly sensitive to the alteration of the image: any significant modification of the watermarked image will lead to a completely different extracted watermark, thus giving a way to authenticate media through the Internet.

| Method | New CI () | New CI (no mark) | Old CI | New CI () | New CI () |
|---|---|---|---|---|---|
| Frequency (Monobit) Test | 0.0004 | 0.0855 | 0.595549 | 0.474986 | 0.419 |
| Frequency Test within a Block | 0 | 0 | 0.554420 | 0.897763 | 0.6786 |
| Runs Test | 0.2896 | 0.5544 | 0.455937 | 0.816537 | 0.3345 |
| Longest Run of Ones in a Block Test | 0.0109 | 0.4372 | 0.016717 | 0.798139 | 0.8831 |
| Binary Matrix Rank Test | 0 | 0.6579 | 0.616305 | 0.262249 | 0.7597 |
| Discrete Fourier Transform (Spectral) Test | 0 | 0 | 0.000190 | 0.007160 | 0.0008 |
| Non-overlapping Template Matching Test* | 0.020071 | 0.37333 | 0.532252 | 0.449916 | 0.51879 |
| Overlapping Template Matching Test | 0 | 0 | 0.334538 | 0.514124 | 0.2492 |
| Maurer’s “Universal Statistical” Test | 0.6993 | 0.9642 | 0.032923 | 0.678686 | 0.1296 |
| Linear Complexity Test | 0.3669 | 0.924 | 0.401199 | 0.657933 | 0.3504 |
| Serial Test* (m=10) | 0 | 0.28185 | 0.013396 | 0.425346 | 0.2549 |
| Approximate Entropy Test (m=10) | 0 | 0.3838 | 0.137282 | 0.637119 | 0.7597 |
| Cumulative Sums (Cusum) Test* | 0 | 0 | 0.046464 | 0.279680 | 0.34245 |
| Random Excursions Test* | 0.46769 | 0.34788 | 0.503622 | 0.287409 | 0.18977 |
| Random Excursions Variant Test* | 0.28779 | 0.46505 | 0.347772 | 0.486686 | 0.26563 |
| Success | 8/15 | 11/15 | 15/15 | 15/15 | 15/15 |

Table 3: SP 800-22 test results ()

Let us now evaluate the robustness of the proposed method.

Robustness evaluation

In what follows, the embedding domain is the spatial domain, CI(XORshift,XORshift) has been used to encrypt the watermark, MSCs are the four first bits of each pixel (useful only in case of authentication), and LSCs are the three next bits.

To evaluate the efficiency and the robustness of the proposed algorithm, some attacks are applied to our chaotic watermarked image. For each attack, a similarity percentage with the watermark is computed: this percentage is the number of equal bits between the original and the extracted watermark, expressed as a percentage. Let us notice that a result less than or equal to implies that the image has probably not been watermarked.
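The similarity percentage defined above is simply the proportion of matching bits. A minimal sketch, assuming both watermarks are given as bit lists of equal length (the function name `similarity_percentage` is hypothetical):

```python
def similarity_percentage(original, extracted):
    """Percentage of positions where the two watermarks carry the same bit."""
    if len(original) != len(extracted) or not original:
        raise ValueError("watermarks must be non-empty and of equal length")
    equal = sum(1 for a, b in zip(original, extracted) if a == b)
    return 100.0 * equal / len(original)


# Two of the four bits agree, hence a similarity of 50%.
score = similarity_percentage([1, 0, 1, 1], [1, 1, 1, 0])
```

Two unrelated bit sequences are expected to agree on about half of their positions, which is why scores close to that level carry no evidence of a watermark.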

Zeroing attack In this kind of attack, a watermarked image is zeroed, such as in Figure 14(a). In this case, the results in Table 1 have been obtained.

(a) Cropping attack

(b) Rotation attack

Figure 14: Watermarked Lena after attacks.

| Size (pixels) | Similarity (unauthentication) | Similarity (authentication) |
|---|---|---|
| 10 | 99.08% | 91.77% |
| 50 | 97.31% | 55.43% |
| 100 | 92.43% | 51.52% |
| 200 | 70.75% | 50.60% |

Table 1: Cropping attacks

In Figure 15, the decrypted watermarks are shown after a crop of 50 pixels and after a crop of 10 pixels, in the authentication case.

(a) Unauthentication ().

(b) Authentication ().

(c) Authentication ().

Figure 15: Extracted watermark after a cropping attack.

By analyzing the similarity percentage between the original and the extracted watermark, we can conclude that in case of unauthentication, the watermark still remains after a zeroing attack: the desired robustness is reached. It can be noticed that zeroing sizes and percentages are rather proportional.

In case of authentication, even a small change of the carrier image (a crop by pixels) leads to a really different extracted watermark. In this case, any attempt to alter the carrier image will be signaled: the image is well authenticated.

Rotation attack Let be the rotation of angle around the center of the carrier image. So, the transformation is applied to the watermarked image, which is altered as in Figure 14. The results in Table 2 have been obtained.

| Angle (degree) | Similarity (unauthentication) | Similarity (authentication) |
|---|---|---|
| 2 | 96.44% | 73.40% |
| 5 | 93.32% | 60.56% |
| 10 | 90.68% | 52.11% |
| 25 | 78.13% | 51.97% |

Table 2: Rotation attacks

The same conclusion as above can be drawn: this watermarking method satisfies the desired properties.

JPEG compression A JPEG compression is applied to the watermarked image, depending on a compression level. Let us notice that this attack leads to a change of the representation domain (from spatial to DCT domain). In this case, the results in Table 3 have been obtained.

| Compression | Similarity (unauthentication) | Similarity (authentication) |
|---|---|---|
| 2 | 85.76% | 56.42% |
| 5 | 67.62% | 52.12% |
| 10 | 62.43% | 48.22% |
| 20 | 54.74% | 49.07% |

Table 3: JPEG compression attacks

A very good authentication through JPEG attack is obtained. As for the unauthentication case, the watermark still remains after a compression level equal to 10. This is a good result if we take into account the fact that we use spatial embedding.

Gaussian noise The watermarked image can also be attacked by the addition of Gaussian noise, depending on a standard deviation. In this case, the results in Table 4 have been obtained.

| Standard dev. | Similarity (unauthentication) | Similarity (authentication) |
|---|---|---|
| 1 | 81.14% | 55.57% |
| 2 | 75.01% | 52.63% |
| 3 | 67.64% | 52.68% |
| 5 | 57.48% | 51.34% |

Table 4: Gaussian noise attacks

Once again we remark that good results are obtained, especially if we keep in mind that a spatial representation domain has been chosen.
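To make the Gaussian noise attack concrete, the sketch below adds zero-mean Gaussian noise of a given standard deviation to each gray level. The rounding and the clipping to the range 0..255 are assumptions of this example (the exact procedure used in the experiments is not specified here), and the function name `gaussian_noise_attack` is hypothetical.

```python
import random


def gaussian_noise_attack(pixels, std_dev, seed=None):
    """Return a noisy copy of a list of 8-bit gray levels.

    Each pixel receives independent zero-mean Gaussian noise of standard
    deviation std_dev, then is rounded and clipped to 0..255 (assumption).
    """
    rng = random.Random(seed)
    attacked = []
    for value in pixels:
        noisy = int(round(value + rng.gauss(0.0, std_dev)))
        attacked.append(max(0, min(255, noisy)))
    return attacked


# A small standard deviation mostly disturbs the lowest bits of each pixel,
# which are precisely the ones carrying the watermark in a spatial embedding.
attacked = gaussian_noise_attack([0, 64, 128, 255], std_dev=2.0, seed=1)
```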

8 Conclusion and Future Work

In this paper, the pseudo-random generator proposed in [12] has been improved. By using XORshift instead of the logistic map and due to a rewrite of the way to generate strategies, the generator based on chaotic iterations works faster and is more secure. The speed and randomness of this new PRNG have been compared to its former version, to XORshift, and to a generator based on the logistic map. This comparison shows that CI(XORshift, XORshift) offers a sufficient speed and level of security for a wide range of Internet usages such as cryptography and information hiding.

In future work, we will continue to try to improve the speed and security of this PRNG, by exploring new strategies and iteration functions. The study of its chaotic behavior will be deepened by using the numerous tools provided by the mathematical theory of chaos. New statistical tests will be used to compare this PRNG to existing ones. Additionally, a probabilistic study of its security will be done. Lastly, new applications in computer science will be proposed, especially in the Internet security field.

Appendix

The NIST Statistical Test Suite

In what follows, the objectives of the fifteen tests contained in the NIST Statistical Test Suite are recalled. A more detailed description of those tests can be found in [14].

Frequency (Monobit) Test is to determine whether the number of ones and zeros in a sequence are approximately the same as would be expected for a truly random sequence.

Frequency Test within a Block is to determine whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption of randomness (M is the length of each block).

Runs Test is to determine whether the number of runs of ones and zeros of various lengths is as expected for a random sequence. In particular, this test determines whether the oscillation between such zeros and ones is too fast or too slow.

Test for the Longest Run of Ones in a Block is to determine whether the length of the longest run of ones within the tested sequence is consistent with the length of the longest run of ones that would be expected in a random sequence.

Binary Matrix Rank Test is to check for linear dependence among fixed length substrings of the original sequence.

Discrete Fourier Transform (Spectral) Test is to detect periodic features (i.e., repetitive patterns that are near each other) in the tested sequence that would indicate a deviation from the assumption of randomness.

Non-overlapping Template Matching Test is to detect generators that produce too many occurrences of a given non-periodic (aperiodic) pattern.

Overlapping Template Matching Test focuses on the number of occurrences of pre-specified target strings.

Maurer’s “Universal Statistical” Test is to detect whether or not the sequence can be significantly compressed without loss of information.

Linear Complexity Test is to determine whether or not the sequence is complex enough to be considered random.

Serial Test is to determine whether the number of occurrences of the m-bit (m is the length in bits of each block) overlapping patterns is approximately the same as would be expected for a random sequence.

Approximate Entropy Test is to compare the frequency of overlapping blocks of two consecutive/adjacent lengths (m and m+1) against the expected result for a random sequence (m is the length of each block).

Cumulative Sums (Cusum) Test is to determine whether the cumulative sum of the partial sequences occurring in the tested sequence is too large or too small relative to the expected behavior of that cumulative sum for random sequences.

Random Excursions Test is to determine if the number of visits to a particular state within a cycle deviates from what one would expect for a random sequence.

Random Excursions Variant Test is to detect deviations from the expected number of visits to various states in the random walk.
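As an illustration of how such tests turn a bit sequence into a P-value, the two simplest ones (Frequency (Monobit) and Runs) can be sketched as below, following the formulas of SP 800-22 [14]. This is a compact sketch and not the NIST reference implementation; the function names are hypothetical and the input is assumed to be a list of 0/1 integers.

```python
import math


def monobit_p_value(bits):
    """Frequency (Monobit) test: P = erfc(|S_n| / sqrt(n) / sqrt(2)),
    where S_n is the sum of the bits mapped to +1 / -1."""
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)
    return math.erfc(abs(s_n) / math.sqrt(n) / math.sqrt(2.0))


def runs_p_value(bits):
    """Runs test: compares the total number of runs V_n with its expected
    value 2 n pi (1 - pi), where pi is the proportion of ones."""
    n = len(bits)
    pi = sum(bits) / n
    # Prerequisite of the test: the sequence must first be roughly balanced.
    if abs(pi - 0.5) >= 2.0 / math.sqrt(n):
        return 0.0
    v_n = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    numerator = abs(v_n - 2.0 * n * pi * (1.0 - pi))
    denominator = 2.0 * math.sqrt(2.0 * n) * pi * (1.0 - pi)
    return math.erfc(numerator / denominator)


# Short worked inputs, given here as lists of bits.
p_frequency = monobit_p_value([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
p_runs = runs_p_value([1, 0, 0, 1, 1, 0, 1, 0, 1, 1])
```

A sequence passes an individual test when its P-value is not below the chosen significance level; the experiments above use much longer sequences (1,000,000 bits) than these toy inputs.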

References

  1. X. Tong and M. Cui, “Image encryption scheme based on 3d baker with dynamical compound chaotic sequence cipher generator,” Signal Processing, vol. 89, no. 4, pp. 480 – 491, 2009.
  2. E. Erclebi and A. SubasI, “Robust multi bit and high quality audio watermarking using pseudo-random sequences,” Computers Electrical Engineering, vol. 31, no. 8, pp. 525 – 536, 2005.
  3. P. L’ecuyer, “Comparison of point sets and sequences for quasi-monte carlo and for random number generation,” SETA 2008, vol. LNCS 5203, pp. 1–17, 2008.
  4. D. E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd ed.   Reading, Mass.: Addison-Wesley, 1998.
  5. A. Marchi, A. Liverani, and A. D. Giudice, “Polynomial pseudo-random number generator via cyclic phase,” Mathematics and Computers in Simulation, vol. 79, no. 11, pp. 3328–3338, 2009.
  6. S. Sachez, R. Criado, and C. Vega, “A generator of pseudo-random numbers sequences with a very long period,” Mathematical and Computer Modelling, vol. 42, pp. 809 – 816, 2005.
  7. C. J. K. Tan, “The plfg parallel pseudo-random number generator,” Future Generation Computer Systems, vol. 18, no. 5, pp. 693 – 698, 2002.
  8. M. Falcioni, L. Palatella, S. Pigolotti, and A. Vulpiani, “Properties making a chaotic system a good pseudo random number generator,” arXiv, vol. nlin/0503035, 2005.
  9. S. Cecen, R. M. Demirer, and C. Bayrak, “A new hybrid nonlinear congruential number generator based on higher functional power of logistic maps,” Chaos, Solitons and Fractals, vol. 42, pp. 847–853, 2009.
  10. Q. Wang, J. M. Bahi, C. Guyeux, and X. Fang, “Randomness quality of CI chaotic generators. application to internet security,” in INTERNET’2010. The 2nd Int. Conf. on Evolving Internet.   Valencia, Spain: IEEE seccion ESPANIA, Sep. 2010, pp. 125–130.
  11. J. M. Bahi and C. Guyeux, “Topological chaos and chaotic iterations, application to hash functions,” WCCI’10: 2010 IEEE World Congress on Computational Intelligence, vol. Accepted paper, 2010.
  12. Q. Wang, C. Guyeux, and J. M. Bahi, “A novel pseudo-random generator based on discrete chaotic iterations for cryptographic applications,” INTERNET ’09, pp. 71–76, 2009.
  13. R. L. Devaney, An Introduction to Chaotic Dynamical Systems, 2nd ed.   Redwood City: Addison-Wesley, 1989.
  14. N. S. Publication, “A statistical test suite for random and pseudorandom number generators for cryptographic applications,” Aug. 2008.
  15. F. Zheng, X. Tian, J. Song, and X. Li, “Pseudo-random sequence generator based on the generalized henon map,” The Journal of China Universities of Posts and Telecommunications, vol. 15(3), pp. 64–68, 2008.
  16. G. Marsaglia, “Xorshift rngs,” Journal of Statistical Software, vol. 8(14), pp. 1–6, 2003.
  17. P. M. Binder and R. V. Jensen, “Simulating chaotic behavior with finite-state machines,” Physical Review A, vol. 34, no. 5, pp. 4460–4463, 1986.
  18. D. D. Wheeler, “Problems with chaotic cryptosystems,” Cryptologia, vol. XIII, no. 3, pp. 243–250, 1989.
  19. J. Palmore and C. Herring, “Computer arithmetic, chaos and fractals,” Physica D, vol. 42, pp. 99–110, 1990.
  20. M. Blank, “Discreteness and continuity in problems of chaotic dynamics,” Translations of Mathematical Monographs, vol. 161, 1997.
  21. S. Li, G. Chen, and X. Mou, “On the dynamical degradation of digital piecewise linear chaotic maps,” International Journal of Bifurcation and Chaos, vol. 15, no. 10, pp. 3119–3151, 2005.
  22. F. Robert, Discrete Iterations. A Metric Study.   Springer Series in Computational Mathematics, 1986, vol. 6.
  23. M. S. Turan, A. Doganaksoy, and S. Boztas, “On independence and sensitivity of statistical randomness tests,” SETA 2008, vol. LNCS 5203, pp. 18–29, 2008.
  24. L. Kocarev, “Chaos-based cryptography: a brief overview,” IEEE Circ Syst Mag, vol. 7, pp. 6–21, 2001.
  25. C. Guyeux, N. Friot, and J. M. Bahi, “Chaotic iterations versus spread-spectrum: chaos and stego security,” in IIH-MSP’10, 6-th Int. Conf. on Intelligent Information Hiding and Multimedia Signal Processing, Darmstadt, Germany, Oct. 2010, pp. 208–211, to appear.
  26. A. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied Cryptography.   Boca Raton: CRC Press, 1997.
  27. “The frick collection, http://www.frick.org/,” Last visit the 7th of June, 2011.
  28. “Delicious social bookmarking, http://delicious.com/,” Last visit the 7th of June, 2011.
  29. Y. Nakashima, R. Tachibana, and N. Babaguchi, “Watermarked movie soundtrack finds the position of the camcorder in a theater,” IEEE Transactions on Multimedia, 2009, accepted for future publication Multimedia.
  30. J. M. Bahi and C. Guyeux, “Topological chaos and chaotic iterations, application to hash functions,” in WCCI’10, IEEE World Congress on Computational Intelligence.   Barcelona, Spain: IEEE, Jul. 2010, pp. 1–7.