Randomness Evaluation with the Discrete Fourier Transform Test Based on Exact Analysis of the Reference Distribution
In this paper, we study the problems in the discrete Fourier transform (DFT) test included in NIST SP 800-22 released by the National Institute of Standards and Technology (NIST), which is a collection of tests for evaluating both physical and pseudo-random number generators for cryptographic applications. The most crucial problem in the DFT test is that its reference distribution of the test statistic is not derived mathematically but rather numerically estimated; the DFT test for randomness is based on a pseudo-random number generator (PRNG). Therefore, the present DFT test should not be used unless the reference distribution is mathematically derived. Here, we prove that a power spectrum, which is a component of the test statistic, follows a chi-squared distribution with 2 degrees of freedom. Based on this fact, we propose a test whose reference distribution of the test statistic is mathematically derived. Furthermore, the results of testing non-random sequences and several PRNGs showed that the proposed test is more reliable and definitely more sensitive than the present DFT test.
H. Okada and K. Umeno are with the Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, JAPAN.
e-mail: email@example.com, firstname.lastname@example.org
Keywords: Computer security, random sequences, statistical analysis
Random numbers are used in many applications, such as cryptography and numerical simulations. However, it is not easy to generate “truly” random number sequences. Pseudo-random number generators (PRNGs) generate sequences by iterating a recurrence relation; therefore, the sequences are theoretically not “truly” random. A binary “truly” random sequence is defined as a sequence in which each element has a probability of exactly $1/2$ of being “0” or “1” and in which the elements are statistically independent of each other. It is also difficult to ascertain whether a sequence is truly random; therefore, the randomness of sequences is evaluated statistically.
NIST SP 800-22 [1, 2] is a well-known statistical test suite for randomness that was used in selecting the Advanced Encryption Standard (AES) algorithm. It consists of fifteen tests, each of which is a hypothesis test whose null hypothesis is that the input sequence is truly random; if the hypothesis is not rejected in any of the tests, the input sequences are regarded as random. Among the tests included in NIST SP 800-22, the DFT test is of the greatest concern to us. This test detects periodic features of a random number sequence; input sequences are discrete Fourier transformed, and the test statistic is composed of the Fourier coefficients. In 2003, Kim et al. [3, 4] reported that the DFT test and the Lempel-Ziv test in the original NIST SP 800-22 have crucial theoretical problems. Regarding the DFT test, they reported that the test statistic does not follow the expected reference distribution because the DFT test treats the Fourier coefficients as independent stochastic variables although they are not. Kim et al. numerically estimated the distribution of the test statistic with pseudo-random numbers generated with a PRNG and proposed a new DFT test with the estimated distribution. In 2005, Hamano theoretically scrutinized the distribution of the Fourier coefficients in the original DFT test. Although he could not derive the theoretical distribution of the test statistic, he made the problems in the DFT test clearer. In 2005, because of these reports, the Lempel-Ziv test was deleted in NIST SP 800-22 version 1.7, and the DFT test was revised according to the report of Kim et al. The DFT test has not subsequently been revised. In 2012, Pareschi et al. reviewed three tests included in NIST SP 800-22 and also numerically estimated the distribution of the test statistic. Consequently, they reported that the distribution estimated by Kim et al. is not sufficiently accurate.
As stated above, several researchers have attempted to revise the DFT test. However, the distribution of the test statistic has still not been derived theoretically but rather numerically estimated.
In this paper, we review the problems in the DFT test, and we prove three facts that are important for analyzing the reference distribution of the test statistic. Under the assumption that the input sequence is an ideal random number sequence, when $j \neq 0$,
The asymptotic distributions of both $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ are the standard normal distribution ($\mathcal{N}(0,1)$) when $n \to \infty$.
When $n$ is sufficiently large, $c_j(X)$ and $s_j(X)$ are statistically independent of each other.
The asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$ is a chi-squared distribution with 2 degrees of freedom ($\chi^2_2$) when $n \to \infty$.
Here, $X$ is an $n$-bit binary sequence, $S_j(X)$ is the $j$-th discrete Fourier coefficient of $X$, and $c_j(X)$ and $s_j(X)$ are the real and imaginary parts of $S_j(X)$; they are defined in (1), (2), and (3) in Section 2, respectively. There is no information about these facts in NIST SP800-22, and, to the best of our knowledge, no researchers who have studied the DFT test have ever provided rigorous proofs. These facts are necessary for analyzing the reference distribution of the test statistic. Furthermore, we propose a new DFT test based on the fact that $\chi^2_2$ is the asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$. By comparing the results of several PRNGs, we show that our test is more reliable and definitely more sensitive than the present DFT test.
2 Discrete Fourier Transform Test
In this section, we explain the procedure of the original DFT test, released in 2001, before the revision in 2005. We also explain the problems reported by several researchers [4, 5]. The focus of this test is the peak heights in the discrete Fourier transform of the sequence. The purpose of this test is to detect periodic features in the tested sequence that would indicate a deviation from the assumption of randomness. The intention is to detect whether the proportion of peaks exceeding the 95% threshold is significantly different from 5%.
2.1 The procedure of the original DFT test
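1) The zeros and ones of the input sequence $\varepsilon$ are converted to values of $-1$ and $+1$ to create the sequence $X = \{x_0, \ldots, x_{n-1}\}$, where $x_k = 2\varepsilon_k - 1$. For simplicity, let $n$ be even.
2) Apply a discrete Fourier transform (DFT) to $X$ to produce Fourier coefficients $\{S_j(X)\}_{j=0}^{n-1}$. The Fourier coefficient $S_j(X)$ and its real and imaginary parts $c_j(X)$ and $s_j(X)$ are defined as follows:
$$S_j(X) = \sum_{k=0}^{n-1} x_k \exp\left(\frac{2\pi \mathrm{i} k j}{n}\right) = c_j(X) + \mathrm{i}\, s_j(X), \quad (1)$$
$$c_j(X) = \sum_{k=0}^{n-1} x_k \cos\frac{2\pi k j}{n}, \quad (2)$$
$$s_j(X) = \sum_{k=0}^{n-1} x_k \sin\frac{2\pi k j}{n}. \quad (3)$$
3) Compute $\{|S_j(X)|\}_{j=0}^{n/2-1}$, where $|S_j(X)| = \sqrt{c_j(X)^2 + s_j(X)^2}$.
Because $|S_j(X)| = |S_{n-j}(X)|$, $\{|S_j(X)|\}_{j=n/2}^{n-1}$ are discarded.
4) Compute a threshold value $T_{0.95}$. 95% of the values $|S_j(X)|$ are supposed to be below $T_{0.95}$.
According to SP800-22, $\frac{2}{n}|S_j(X)|^2$ is considered to follow $\chi^2_2$, and $T_{0.95}$ is defined by the following equation.
$$\Pr\left(|S_j(X)| < T_{0.95}\right) = 1 - \exp\left(-\frac{T_{0.95}^2}{n}\right) = 0.95, \quad \text{i.e.,} \quad T_{0.95} = \sqrt{-n \ln 0.05}.$$
5) If $\{|S_j(X)|\}$ are mutually independent, then under the assumption of randomness, the number $N_1$ of values $|S_j(X)|$ ($0 \le j \le n/2 - 1$) below $T_{0.95}$ can be considered to follow $B(n/2, 0.95)$, where $B$ is the binomial distribution.
According to the central limit theorem, when $n$ is sufficiently large, the approximation to $B(n/2, 0.95)$ is given by the normal distribution $\mathcal{N}(0.95\,n/2,\; 0.95 \cdot 0.05 \cdot n/2)$. Therefore, when $n$ is sufficiently large, under the assumption of randomness, $N_1$ can be considered to follow $\mathcal{N}(0.95\,n/2,\; 0.95 \cdot 0.05 \cdot n/2)$.
6) Compute a test statistic
$$d = \frac{N_1 - 0.95\,n/2}{\sqrt{0.95 \cdot 0.05 \cdot n/2}}.$$
When $n$ is sufficiently large, under the assumption of randomness, the test statistic $d$ can be considered to follow $\mathcal{N}(0,1)$.
7) Compute the $P$-value: $P\text{-value} = \mathrm{erfc}\left(|d|/\sqrt{2}\right)$.
If $P\text{-value} < \alpha$, then conclude that the sequence is non-random, where $\alpha$ is the significance level of the DFT test. NIST recommends $\alpha = 0.01$. Therefore, we also define $\alpha = 0.01$. If $P\text{-value} \ge \alpha$, conclude that the sequence is random.
8) Perform 1) to 7) for $m$ sample sequences $\{X_1, \ldots, X_m\}$; $m$ $P$-values are computed.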
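(Second-level test I: Proportion of sequences passing a test)
Count the number of sample sequences (out of $m$) for which $P\text{-value} \ge \alpha$ and define it as $m_p$. Then, under the assumption of randomness, $m_p$ follows $B(m, 1-\alpha)$, which is approximated by $\mathcal{N}(m(1-\alpha),\; m\alpha(1-\alpha))$ when $m$ is sufficiently large. Therefore, the proportion of sequences passing a test ($m_p/m$) approximately follows $\mathcal{N}(1-\alpha,\; \alpha(1-\alpha)/m)$. The range of acceptable $m_p/m$ is determined using the significance interval defined as
$$1 - \alpha \pm 3\sqrt{\frac{\alpha(1-\alpha)}{m}}. \quad (4)$$
If the proportion falls outside of this interval, there is evidence that the data are non-random.
(Second-level test II: Uniform distribution of $P$-values)
Uniformity may also be determined by applying a $\chi^2$ test and determining a $P$-value corresponding to the goodness-of-fit distributional test on the $P$-values obtained for an arbitrary statistical test (i.e., the $P$-value of the $P$-values). This is performed by computing
$$\chi^2 = \sum_{i=1}^{10} \frac{(F_i - m/10)^2}{m/10},$$
where $F_i$ is the number of $P$-values in sub-interval $i$. A $P$-value is calculated such that
$$P\text{-value}_T = \mathrm{igamc}\left(\frac{9}{2}, \frac{\chi^2}{2}\right),$$
where igamc is the complementary incomplete gamma function. If
$$P\text{-value}_T \ge \alpha_T = 0.0001,$$
the sequences can be considered to be uniformly distributed, where $\alpha_T$ is the significance level for $P\text{-value}_T$.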
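The first-level computation and the proportion interval of second-level test I can be sketched as follows. This is a minimal illustration, not the NIST reference implementation: the function names are hypothetical, the DFT is a naive $O(n^2)$ sum rather than an FFT, and the threshold $\sqrt{n \ln(1/0.05)}$ and the `variance_divisor` parameter are assumptions about how the variants differ (2 for the binomial variance of the original test, 4 for the revision following Kim et al., 3.8 for the estimate of Pareschi et al.), to be checked against the cited reports.

```python
import cmath
import math


def dft_moduli(x):
    """Moduli |S_j| for j = 0 .. n/2 - 1, using a naive O(n^2) DFT."""
    n = len(x)
    moduli = []
    for j in range(n // 2):
        s_j = sum(x[k] * cmath.exp(complex(0.0, 2.0 * math.pi * k * j / n))
                  for k in range(n))
        moduli.append(abs(s_j))
    return moduli


def dft_test_p_value(bits, variance_divisor=4.0):
    """First-level P-value for one 0/1 sequence (assumed constants, see text)."""
    n = len(bits)
    x = [2 * b - 1 for b in bits]                    # map 0 -> -1, 1 -> +1
    threshold = math.sqrt(math.log(1.0 / 0.05) * n)  # 95% peak-height threshold
    n0 = 0.95 * n / 2.0                              # expected count below threshold
    n1 = sum(1 for modulus in dft_moduli(x) if modulus < threshold)
    d = (n1 - n0) / math.sqrt(n * 0.95 * 0.05 / variance_divisor)
    return math.erfc(abs(d) / math.sqrt(2.0))


def proportion_interval(m, alpha=0.01):
    """Acceptance interval (1 - alpha) +/- 3 * sqrt(alpha * (1 - alpha) / m)."""
    p_hat = 1.0 - alpha
    half_width = 3.0 * math.sqrt(p_hat * (1.0 - p_hat) / m)
    return p_hat - half_width, p_hat + half_width
```

For example, `dft_test_p_value([0, 1] * 32)` evaluates a 64-bit alternating sequence, and `proportion_interval(1000)` gives the range in which the passing proportion of 1000 sequences is expected to fall. Only the divisor changes between the variants, so the same observed $N_1$ can yield noticeably different $P$-values.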
2.2 The fundamental problems of the original and present DFT tests
The test statistic $d$ does not follow $\mathcal{N}(0,1)$;
the number $N_1$ of Fourier coefficient moduli below the threshold does not follow $\mathcal{N}(0.95\,n/2,\; 0.95 \cdot 0.05 \cdot n/2)$.
Furthermore, Kim et al., using Secure Hash Generator (G-SHA1) as a PRNG, estimated that $N_1$ follows $\mathcal{N}(0.95\,n/2,\; 0.95 \cdot 0.05 \cdot n/4)$,
and the original DFT test was revised according to this report of Kim et al.; the present DFT test has not been revised since then. Therefore, the reference distribution of the test statistic of the present DFT test is not mathematically derived. Furthermore, Pareschi et al. reported that the numerical estimation is not sufficiently accurate; they numerically estimated that $N_1$ follows $\mathcal{N}(0.95\,n/2,\; 0.95 \cdot 0.05 \cdot n/3.8)$.
Moreover, Pareschi et al. proposed that the DFT test with this test statistic is more reliable. (The definition of the reliability of a test is discussed in Section 5.) Therefore, it can be considered that the present DFT test still has errors. First, both the present DFT test and the test of Pareschi et al. are constructed based on a PRNG, whose randomness should itself be evaluated with a randomness test; they cannot be used unless the reference distribution is mathematically derived.
As stated in step 5) in Section 2.1, $\{|S_j(X)|\}$ are considered to be mutually independent. However, $\{|S_j(X)|\}$ are not mutually independent, and this problem is expected to be the main reason why $N_1$ does not follow $\mathcal{N}(0.95\,n/2,\; 0.95 \cdot 0.05 \cdot n/2)$ [4, 5]. Furthermore, before considering this problem, it is also necessary to ensure that $\frac{2}{n}|S_j(X)|^2$ follows $\chi^2_2$. Although $\frac{2}{n}|S_j(X)|^2$ is considered to follow $\chi^2_2$ in step 4) in Section 2.1, there is no information about this in SP800-22, and no researchers studying the DFT test have ever provided rigorous proofs to the best of our knowledge. We provide a proof in Section 3.
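One elementary way to see that the Fourier coefficient moduli cannot be mutually independent is Parseval's identity: for any $\pm 1$ sequence of length $n$, the squared moduli over all $n$ frequencies sum to exactly $n^2$, so a large peak at one frequency forces smaller values elsewhere. The sketch below only illustrates this constraint; it is not the argument of [4, 5], and the helper names are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(complex(0.0, 2.0 * math.pi * k * j / n))
                for k in range(n))
            for j in range(n)]


def total_power(x):
    """Sum of |S_j|^2 over all n frequencies."""
    return sum(abs(s) ** 2 for s in dft(x))


rng = random.Random(1)          # fixed seed so the run is reproducible
n = 64
x = [rng.choice((-1, 1)) for _ in range(n)]
coefficients = dft(x)

# Parseval: sum_j |S_j|^2 = n * sum_k x_k^2 = n^2 for any +/-1 sequence.
print(total_power(x), n * n)
# Symmetry used in step 3): |S_j| = |S_{n-j}| for a real input.
print(abs(coefficients[3]), abs(coefficients[n - 3]))
```

Because the total is fixed, the count of moduli below a threshold has a smaller variance than the binomial model of independent trials would predict.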
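3 The asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$
In this section, we analyze the asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$. From the definition of $S_j(X)$ in (1),
$$S_0(X) = \sum_{k=0}^{n-1} x_k.$$
Under the assumption that $X$ is an ideal random number sequence, $x_0, \ldots, x_{n-1}$ are mutually independent, and $E[x_k] = 0$, $V[x_k] = 1$. Therefore, as a consequence of the central limit theorem, when $n$ is sufficiently large, $\frac{1}{\sqrt{n}} S_0(X)$ follows $\mathcal{N}(0,1)$, and $\frac{1}{n}|S_0(X)|^2$ follows a chi-squared distribution with 1 degree of freedom ($\chi^2_1$). Thus, $\frac{2}{n}|S_0(X)|^2$ does not follow $\chi^2_2$.
In the following, we consider the case when $j \neq 0$. Here, $\frac{2}{n}|S_j(X)|^2$ follows $\chi^2_2$ if the following is true:
Both $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ follow $\mathcal{N}(0,1)$.
$c_j(X)$ and $s_j(X)$ are mutually independent.
In the following two subsections, we prove Theorem 1, Theorem 2, and Theorem 3:
Theorem 1: When $n$ is sufficiently large, both $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ follow $\mathcal{N}(0,1)$.
Theorem 2: When $n$ is sufficiently large, $c_j(X)$ and $s_j(X)$ are mutually independent.
Theorem 3: $\frac{2}{n}|S_j(X)|^2$ follows $\chi^2_2$ when $n$ is sufficiently large.
From the definition of $|S_j(X)|$, Theorem 3 can be proven by combining Theorem 1 and Theorem 2.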
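As an informal numerical check of the statement of Theorem 3 (not a substitute for the proof), the sketch below simulates the normalized power at one frequency and compares it with two properties of a chi-squared distribution with 2 degrees of freedom: mean 2 and 95% quantile $-2\ln 0.05$. The normalization $2/n$, the parameters $n = 128$ and $j = 5$, and the function names are assumptions made for this illustration; Python's `random` module is itself a PRNG, so this only illustrates the claim.

```python
import math
import random


def normalized_power(x, j):
    """(2/n) * |S_j|^2 computed from its real and imaginary parts."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    c = scale * sum(x[k] * math.cos(2.0 * math.pi * k * j / n) for k in range(n))
    s = scale * sum(x[k] * math.sin(2.0 * math.pi * k * j / n) for k in range(n))
    return c * c + s * s


def simulate(n=128, j=5, trials=2000, seed=12345):
    """Return (sample mean, fraction below the chi-squared(2) 95% quantile)."""
    rng = random.Random(seed)
    q95 = -2.0 * math.log(0.05)   # from the CDF 1 - exp(-x / 2)
    powers = []
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        powers.append(normalized_power(x, j))
    mean = sum(powers) / trials
    below = sum(1 for p in powers if p < q95) / trials
    return mean, below


mean, below = simulate()
print(mean, below)   # expected to be close to 2 and 0.95, respectively
```

Setting `j=0` in the same sketch gives a visibly different distribution, in line with the separate treatment of $j = 0$ above.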
3.1 Proof of Theorem 1: The asymptotic distribution of $c_j(X)$ and $s_j(X)$
In this subsection, we prove Theorem 1. Hamano showed that the average, variance, skewness, and kurtosis of $\sqrt{2/n}\,c_j(X)$ and $\mathcal{N}(0,1)$ are the same. However, it cannot be proven that $\mathcal{N}(0,1)$ is the asymptotic distribution of $\sqrt{2/n}\,c_j(X)$ based only on these facts.
is expressed as , where . Under the assumption that is an ideal random number sequence, the characteristic function of denoted by is expressed as follows:
Using the Taylor expansion about a point , we obtain
Thus, $\mathcal{N}(0,1)$ is the asymptotic distribution of $\sqrt{2/n}\,c_j(X)$. Likewise, it can be proven that $\mathcal{N}(0,1)$ is the asymptotic distribution of $\sqrt{2/n}\,s_j(X)$.
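The characteristic-function argument can be sketched as follows. This is a reconstruction under stated assumptions rather than the displayed equations of the proof: the weights $a_k = \sqrt{2/n}\cos(2\pi k j/n)$ and the symbol $\varphi_n$ are introduced here for illustration, the $x_k$ are taken to be independent and equal to $\pm 1$ with probability $1/2$ each, and $j \neq 0$, $j \neq n/2$.

```latex
% Sketch under the assumptions stated above (notation a_k, \varphi_n is illustrative).
\begin{align*}
\varphi_n(t) &= E\!\left[\exp\!\Big(\mathrm{i}\,t \sum_{k=0}^{n-1} a_k x_k\Big)\right]
              = \prod_{k=0}^{n-1} \cos(a_k t), \\
\log \varphi_n(t) &= \sum_{k=0}^{n-1} \Big( -\tfrac{1}{2} a_k^2 t^2 + O(a_k^4 t^4) \Big)
              = -\frac{t^2}{n} \sum_{k=0}^{n-1} \cos^2\!\frac{2\pi k j}{n} + O\!\Big(\frac{t^4}{n}\Big), \\
\sum_{k=0}^{n-1} \cos^2\!\frac{2\pi k j}{n} &= \frac{n}{2}
              \qquad (j \neq 0,\ j \neq n/2), \\
\varphi_n(t) &\longrightarrow e^{-t^2/2} \qquad (n \to \infty).
\end{align*}
```

The limit $e^{-t^2/2}$ is the characteristic function of $\mathcal{N}(0,1)$, so convergence in distribution follows from Lévy's continuity theorem. The same steps apply with $\sin$ in place of $\cos$.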
3.2 Proof of Theorem 2: Statistical independence of $c_j(X)$ and $s_j(X)$
In this subsection, we prove Theorem 2. Let us define a 2-dimensional stochastic variable as the following equation:
Under the assumption that is an ideal random number sequence, the characteristic function of denoted by is expressed as follows:
Using the Taylor expansion about a point , we obtain
Therefore, when is sufficiently large, the joint probability distribution function is described as follows:
As we proved before, is the asymptotic distribution of both and . Thus, when is sufficiently large, the probability distribution functions of and are and , respectively. Therefore, when is sufficiently large, the following equation is obtained:
This means that $c_j(X)$ and $s_j(X)$ are mutually independent when $n$ is sufficiently large.
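A necessary (but not sufficient) consequence of Theorems 1 and 2 can be checked exactly: the normalized real and imaginary parts must have unit variance and zero covariance. The sketch below computes these second moments in closed form for independent $\pm 1$ inputs; the normalization $\sqrt{2/n}$ and the function name are assumptions for illustration, and zero covariance alone does not establish independence.

```python
import math


def second_moments(n, j):
    """Exact Var(c), Var(s), Cov(c, s) for normalized parts of S_j.

    Assumes x_k independent with E[x_k] = 0 and E[x_k^2] = 1, so only the
    deterministic trigonometric sums remain.
    """
    theta = [2.0 * math.pi * k * j / n for k in range(n)]
    var_c = (2.0 / n) * sum(math.cos(t) ** 2 for t in theta)
    var_s = (2.0 / n) * sum(math.sin(t) ** 2 for t in theta)
    cov_cs = (2.0 / n) * sum(math.cos(t) * math.sin(t) for t in theta)
    return var_c, var_s, cov_cs


print(second_moments(128, 5))   # a generic frequency index
print(second_moments(128, 0))   # the degenerate case j = 0
```

For $j = 0$ the imaginary part vanishes identically, which is why that frequency is excluded from the theorems.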
4 The proposed DFT test
In Section 3, we proved Theorem 3, stating that $\frac{2}{n}|S_j(X)|^2$ follows $\chi^2_2$ when $n$ is sufficiently large. Therefore, if $\{|S_j(X)|\}$ are mutually independent, we can consider that $N_1$ follows $B(n/2, 0.95)$. However, $\{|S_j(X)|\}$ are not mutually independent. Therefore, it is necessary to mathematically analyze the distribution of the test statistic under the condition that $\{|S_j(X)|\}$ are not mutually independent. Hamano attempted to mathematically derive the distribution of the set $\{|S_j(X)|\}$, but he could not do so, and we also could not derive this distribution. However, we rigorously proved that the asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$ is $\chi^2_2$, and we develop the new DFT test based on this fact. The reference distribution of the test statistic of the proposed test is mathematically derived, whereas that of the present DFT test is estimated with a PRNG. We explain the test statistic of the proposed test in the next subsection.
4.1 The procedure of the proposed DFT test
In the standard approach in NIST SP800-22, each sequence is analyzed separately; thus, $m$ sequences give $m$ $P$-values. However, the proposed test generates one $P$-value for each Fourier coefficient tested, so the number of $P$-values grows with the sequence length $n$. Therefore, more $P$-values are generated, since $n$ is generally larger than $m$. Since the number of $P$-values should not be too large (see Section 5.3), before conducting the proposed test, it is necessary to adjust the length of the sequences and divide them into more sets of shorter sequences (see also Table 5), assuming that the set of input sequences is continuously generated by an RNG. Therefore, the proposed test is theoretically not appropriate for an isolated set of sequences.
The procedure of the proposed DFT test is described as follows:
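1) The zeros and ones of each $n$-length input sequence are converted to values of $-1$ and $+1$ to create the sequences $X_1, \ldots, X_m$, where each element is $x_k = 2\varepsilon_k - 1$. For simplicity, let $n$ be even.
2) Apply a discrete Fourier transform (DFT) to each $X_i$ to produce Fourier coefficients $\{S_j(X_i)\}$. The Fourier coefficient $S_j(X_i)$ and its real and imaginary parts $c_j(X_i)$ and $s_j(X_i)$ are defined as in (1), (2), and (3).
3) For all $j$, perform the Kolmogorov-Smirnov (KS) test [8, 9] on the empirical cumulative distribution function of $\frac{2}{n}|S_j(X_i)|^2$ ($i = 1, \ldots, m$), defined as $F_j(x)$, based on the difference from the cumulative distribution function $F(x) = 1 - e^{-x/2}$ of $\chi^2_2$, and compute the $P$-value $P\text{-value}_j$. Here, the KS statistic $D_j$ and $P\text{-value}_j$ are defined as follows.
$$D_j = \sup_x \left| F_j(x) - F(x) \right|, \qquad P\text{-value}_j = 1 - K\left(\sqrt{m}\, D_j\right),$$
where $K$ is the cumulative distribution function of the Kolmogorov-Smirnov distribution:
$$K(x) = 1 - 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 x^2}.$$
Note that one $P$-value per tested Fourier coefficient is computed in this step, while the present DFT test computes $m$ $P$-values.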
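Step 3) can be sketched as follows, assuming that the $P$-value is taken from the asymptotic Kolmogorov distribution evaluated at $\sqrt{m}\,D$. The function names, the truncation of the series at 100 terms, and the shortcut that returns 1 for very small arguments (where the alternating series converges slowly and the true value is numerically indistinguishable from 1) are choices made for this illustration, not details taken from the proposed test.

```python
import math


def chi2_2_cdf(x):
    """CDF of the chi-squared distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) if x > 0.0 else 0.0


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of samples and cdf."""
    xs = sorted(samples)
    m = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/m to (i+1)/m at x.
        d = max(d, (i + 1) / m - f, f - i / m)
    return d


def kolmogorov_sf(z, terms=100):
    """Asymptotic P(sqrt(m) * D > z) = 2 * sum (-1)^(i-1) exp(-2 i^2 z^2)."""
    if z < 0.2:
        return 1.0   # illustrative shortcut: the tail probability is ~1 here
    total = 2.0 * sum((-1) ** (i - 1) * math.exp(-2.0 * i * i * z * z)
                      for i in range(1, terms + 1))
    return min(1.0, max(0.0, total))


def ks_p_value(samples, cdf=chi2_2_cdf):
    """P-value of the KS test of samples against cdf."""
    m = len(samples)
    return kolmogorov_sf(math.sqrt(m) * ks_statistic(samples, cdf))
```

In the setting of the proposed test, `samples` would hold the values $\frac{2}{n}|S_j(X_i)|^2$ for a fixed $j$ across the $m$ sequences, and `ks_p_value` would be called once per frequency index.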
5 Experiments
In this section, we explain the experiments that we performed and the conclusions derived from their results. In these experiments, we compare the reliability and sensitivity of the present DFT test and the proposed DFT test. The reliability of a test means a low probability of false positives (type I error) (see Table 1), and the sensitivity of a test means a low probability of false negatives (type II error). Here, the null hypothesis of the tests ($H_0$) is that the “generator is ideal”. Therefore, a false positive (type I error) means an erroneous identification of an ideal generator as not random, and a false negative (type II error) means an erroneous identification of a generator that is not ideal as random. By comparing the probabilities of type I and type II errors, we can conclude which test is better.
Table 1
| $H_0$ = “generator is ideal” | True | False |
| Judgment of $H_0$: Reject | False Positive (Type I error) | True Positive |
| Judgment of $H_0$: Fail to reject | True Negative | False Negative (Type II error) |
For simplicity, in this experiment, we modify the significance interval of the second-level test I defined in (4) as follows:
With this modified significance interval, the significance level of the second-level test I () is modified to be .
5.1 Experiment 1: Test results for periodic sequences
In this experiment, we compare the sensitivity of the present DFT test and the proposed DFT test. Sensitivity means a low false negative rate (low probability of type II error), i.e., a high true positive rate. Here, we compare the true positive rate of each test result.
Now, we define an -length input sequence as
We purposely create non-random (periodic) sequences from the -length sequence using the method described as follows:
We can clearly state that this sequence is non-random. Therefore, if the test does not reject $H_0$ (the null hypothesis: “generator is random”), then it is a false negative (type II error).
For each , we use sets of an -length () input sequence generated by the Mersenne Twister algorithm and convert them to non-random -length sequences . Table 5 in Section 5.3 shows the parameters and for each test. In Section 5.3, we explain why the parameters and for are different from the other tests. Note that is the same. Table 2, Fig. 1 and Fig. 2 show the passing rate , which is defined as follows:
Because we know that is non-random, we know that =FALSE, and the passing rate means a false negative rate in this experiment. Now, the significance levels of second-level tests I and II are and (defined in (5)), respectively. Therefore, the significance intervals defined in Eq. of and are described as follows:
Therefore, if or , we can conclude that the true positive rate is high, and we can conclude that the test is sensitive.
5.2 Experiment 2: Test results for existing pseudo-random number generators