Randomness Evaluation with the Discrete Fourier Transform Test Based on Exact Analysis of the Reference Distribution
Abstract
In this paper, we study the problems in the discrete Fourier transform (DFT) test included in NIST SP 800-22, released by the National Institute of Standards and Technology (NIST), which is a collection of tests for evaluating both physical and pseudorandom number generators for cryptographic applications. The most crucial problem in the DFT test is that the reference distribution of its test statistic is not derived mathematically but rather numerically estimated; the DFT test for randomness is based on a pseudorandom number generator (PRNG). Therefore, the present DFT test should not be used unless the reference distribution is mathematically derived. Here, we prove that a power spectrum, which is a component of the test statistic, follows a chi-squared distribution with 2 degrees of freedom. Based on this fact, we propose a test whose reference distribution of the test statistic is mathematically derived. Furthermore, the results of testing nonrandom sequences and several PRNGs showed that the proposed test is more reliable and definitely more sensitive than the present DFT test.
H. Okada and K. Umeno are with the Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, JAPAN.
email: irokada@kddi.com, umeno.ken.8z@kyoto-u.ac.jp
Keywords: Computer security, random sequences, statistical analysis
1 Introduction
Random numbers are used in many types of applications, such as cryptography and numerical simulations. However, it is not easy to generate “truly” random number sequences. Pseudorandom number generators (PRNGs) generate sequences by iterating some recurrence relation; therefore, the sequences are theoretically not “truly” random. A binary “truly” random sequence is defined as a sequence in which each element has a probability of exactly $1/2$ of being “0” or “1” and in which the elements are statistically independent of each other. It is also difficult to ascertain whether a sequence is truly random; therefore, the randomness of sequences is evaluated statistically.
NIST SP 800-22 [1, 2] is a well-known statistical test suite for randomness that was used in selecting the Advanced Encryption Standard (AES) algorithm. NIST SP 800-22 consists of fifteen tests, and every test is a hypothesis test, where the hypothesis is that the input sequence is truly random; if the hypothesis is not rejected in any of the tests, it is implied that the input sequences are random. Among the tests included in NIST SP 800-22, the DFT test is of the greatest concern to us. This test detects periodic features of a random number sequence; input sequences are discrete Fourier transformed, and the test statistic is composed of the Fourier coefficients. In 2003, Kim et al. [3, 4] reported that the DFT test and the Lempel-Ziv test in the original NIST SP 800-22 [1] have crucial theoretical problems. Regarding the DFT test, they reported that the test statistic does not follow the expected reference distribution because the DFT test regards the Fourier coefficients as independent stochastic variables although they are not. Kim et al. numerically estimated the distribution of the test statistic with pseudorandom numbers generated with a PRNG and proposed a new DFT test with the estimated distribution. In 2005, Hamano [5] theoretically scrutinized the distribution of the Fourier coefficients in the original DFT test. Although he could not derive the theoretical distribution of the test statistic, he did make the problems in the DFT test clearer. In 2005, because of these reports, the Lempel-Ziv test was deleted in NIST SP 800-22 version 1.7, and the DFT test was revised according to the report of Kim et al. The DFT test has not subsequently been revised. In 2012, Pareschi et al. [6] reviewed three tests included in NIST SP 800-22, and they also numerically estimated the distribution of the test statistic. Consequently, they reported that the distribution estimated by Kim et al. is not sufficiently accurate.
As stated above, several researchers have attempted to revise the DFT test. However, the distribution of the test statistic has still not been derived theoretically but rather numerically estimated.
In this paper, we review the problems in the DFT test, and we prove three facts that are important for analyzing the reference distribution of the test statistic. Under the assumption that the input sequence is an ideal random number sequence, when $j \neq 0$:

The asymptotic distributions of both $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ are the standard normal distribution ($N(0,1)$) when $n \to \infty$.

When $n$ is sufficiently large, $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ are statistically independent of each other.

The asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$ is a chi-squared distribution with 2 degrees of freedom ($\chi^2_2$) when $n \to \infty$.
Here, $X$ is an $n$-bit binary sequence, $S_j(X)$ is the $j$th discrete Fourier coefficient of $X$, and $c_j(X)$ and $s_j(X)$ are the real and imaginary parts of $S_j(X)$; they are defined in (1), (2), and (3) in Section 2, respectively. There is no information about these facts in NIST SP 800-22, and, to the best of our knowledge, no researchers who have studied the DFT test have provided rigorous proofs. These facts are necessary for analyzing the reference distribution of the test statistic. Furthermore, we propose a new DFT test based on the fact that $\chi^2_2$ is the asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$. By comparing the results of several PRNGs, we show that our test is more reliable and definitely more sensitive than the present DFT test.
2 Discrete Fourier Transform Test
In this section, we explain the procedure of the original DFT test, released in 2001 [1], before the revision in 2005 [2]. We also explain the problems reported by several researchers [4, 5]. The focus of this test is the peak heights in the discrete Fourier transform of the sequence. The purpose of this test is to detect periodic features in the tested sequence that would indicate a deviation from the assumption of randomness. The intention is to detect whether the number of peaks exceeding the 95% threshold is significantly different from 5%.
2.1 The procedure of the original DFT test

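1) The zeros and ones of the input sequence $\varepsilon = (\varepsilon_0, \dots, \varepsilon_{n-1})$ are converted to values of $-1$ and $+1$ to create the sequence $X = (x_0, \dots, x_{n-1})$, where $x_k = 2\varepsilon_k - 1$. For simplicity, let $n$ be even.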

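2) Apply a discrete Fourier transform (DFT) to $X$ to produce Fourier coefficients $\{S_j(X)\}_{j=0}^{n-1}$. The Fourier coefficient $S_j(X)$ and its real and imaginary parts $c_j(X)$ and $s_j(X)$ are defined as follows:
$$S_j(X) = c_j(X) + i\,s_j(X) = \sum_{k=0}^{n-1} x_k \exp\left(\frac{2\pi i k j}{n}\right), \tag{1}$$
$$c_j(X) = \sum_{k=0}^{n-1} x_k \cos\frac{2\pi k j}{n}, \tag{2}$$
$$s_j(X) = \sum_{k=0}^{n-1} x_k \sin\frac{2\pi k j}{n}. \tag{3}$$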
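3) Compute $\{|S_j(X)|\}_{j=0}^{n/2-1}$, where
$$|S_j(X)| = \sqrt{c_j(X)^2 + s_j(X)^2}.$$
Because $|S_j(X)| = |S_{n-j}(X)|$, the values $\{|S_j(X)|\}_{j=n/2}^{n-1}$ are discarded.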

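4) Compute a threshold value $T_{0.95}$. 95% of the values of $|S_j(X)|$ are supposed to be less than $T_{0.95}$.
According to SP 800-22, $\frac{2}{n}|S_j(X)|^2$ is considered to follow $\chi^2_2$, and $T_{0.95}$ is defined by the following equation:
$$P\left(|S_j(X)| < T_{0.95}\right) = 0.95, \quad \text{that is,} \quad T_{0.95} = \sqrt{-n \ln 0.05}.$$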
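5) Count $N_1$, the number of indices $j$ ($0 \le j \le n/2 - 1$) for which $|S_j(X)| < T_{0.95}$.
If $\{|S_j(X)|\}_{j=0}^{n/2-1}$ are mutually independent, then under the assumption of randomness, $N_1$ can be considered to follow $B(n/2, 0.95)$, where $B$ is the binomial distribution.
According to the central limit theorem, when $n$ is sufficiently large, the approximation to $B(n/2, 0.95)$ is given by the normal distribution $N(0.95\,n/2,\ 0.95 \cdot 0.05 \cdot n/2)$. Therefore, when $n$ is sufficiently large, under the assumption of randomness,
$$N_1 \sim N\left(0.95\,\frac{n}{2},\ 0.95 \cdot 0.05 \cdot \frac{n}{2}\right).$$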
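6) Compute a test statistic
$$d = \frac{N_1 - 0.95\,n/2}{\sqrt{0.95 \cdot 0.05 \cdot n/2}}.$$
When $n$ is sufficiently large, under the assumption of randomness, the test statistic $d$ can be considered to follow
$$d \sim N(0, 1).$$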
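7) Compute the P-value; $\text{P-value} = \operatorname{erfc}\left(|d|/\sqrt{2}\right)$.
If $\text{P-value} < \alpha$, then conclude that the sequence is nonrandom, where $\alpha$ is the significance level of the DFT test. NIST recommends $\alpha = 0.01$ [2]. Therefore, we also define $\alpha = 0.01$. If $\text{P-value} \ge \alpha$, conclude that the sequence is random.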

8) Perform 1) to 7) for $m$ sample sequences; $m$ P-values are computed.

9) (Second-level test I: Proportion of sequences passing a test)
Count the number of sample sequences for which $\text{P-value} \ge \alpha$ and define it as $m_p$. Then, under the assumption of randomness, $m_p$ follows $B(m, 1-\alpha)$, which is approximated by $N(m(1-\alpha),\ m\alpha(1-\alpha))$ when $m$ is sufficiently large. Therefore, the proportion of sequences passing a test ($m_p/m$) approximately follows $N(1-\alpha,\ \alpha(1-\alpha)/m)$. The range of acceptable $m_p/m$ is determined using the significance interval defined as
$$(1-\alpha) \pm 3\sqrt{\frac{\alpha(1-\alpha)}{m}}. \tag{4}$$
If the proportion falls outside of this interval, there is evidence that the data are nonrandom.

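10) (Second-level test II: Uniform distribution of P-values)
Uniformity may also be determined by applying a $\chi^2$ test and determining a P-value corresponding to the goodness-of-fit distributional test on the P-values obtained for an arbitrary statistical test (i.e., the P-value of the P-values). This is performed by computing
$$\chi^2 = \sum_{i=1}^{10} \frac{(F_i - m/10)^2}{m/10},$$
where $F_i$ is the number of P-values in subinterval $i$. A $\text{P-value}_T$ is calculated such that
$$\text{P-value}_T = \operatorname{igamc}\left(\frac{9}{2}, \frac{\chi^2}{2}\right),$$
where igamc is the complementary incomplete gamma function. If
$$\text{P-value}_T \ge \alpha_T = 0.0001, \tag{5}$$
the sequences can be considered to be uniformly distributed, where $\alpha_T$ is the significance level for $\text{P-value}_T$.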
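The ten steps above can be sketched in a few functions. This is a minimal illustration rather than the NIST reference code: it assumes the threshold closed form $\sqrt{n \ln(1/0.05)}$, the binomial variance $0.95 \cdot 0.05 \cdot n/2$ of step 5, and ten equal-width bins for the uniformity test. The function names are hypothetical, and the naive $O(n^2)$ transform is used only to keep the sketch dependency-free.

```python
import cmath
import math


def dft_test_p_value(bits):
    """First-level DFT test (steps 1-7) for a list of 0/1 bits of even length."""
    n = len(bits)
    x = [2 * b - 1 for b in bits]  # step 1: map {0, 1} to {-1, +1}
    # Steps 2-3: moduli |S_freq| for freq = 0 .. n/2 - 1 (naive O(n^2) DFT).
    moduli = [
        abs(sum(x[k] * cmath.exp(2j * math.pi * freq * k / n) for k in range(n)))
        for freq in range(n // 2)
    ]
    threshold = math.sqrt(n * math.log(1 / 0.05))  # step 4 (assumed closed form)
    n1 = sum(1 for value in moduli if value < threshold)  # step 5
    mean = 0.95 * n / 2
    variance = 0.95 * 0.05 * n / 2  # binomial model of step 5
    d = (n1 - mean) / math.sqrt(variance)  # step 6
    return math.erfc(abs(d) / math.sqrt(2))  # step 7


def proportion_interval(m, alpha=0.01):
    """Second-level test I: acceptable range for the proportion of passing sequences."""
    half_width = 3 * math.sqrt(alpha * (1 - alpha) / m)
    return (1 - alpha) - half_width, (1 - alpha) + half_width


def igamc_9_2(x):
    """igamc(9/2, x), built from Q(1/2, x) = erfc(sqrt(x)) and the recurrence
    Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)."""
    tail = sum(x ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, 5))
    return math.erfc(math.sqrt(x)) + math.exp(-x) * tail


def uniformity_p_value(p_values):
    """Second-level test II: chi-square goodness of fit of P-values over 10 bins."""
    m = len(p_values)
    counts = [0] * 10
    for p in p_values:
        counts[min(int(p * 10), 9)] += 1
    chi_sq = sum((c - m / 10) ** 2 / (m / 10) for c in counts)
    return igamc_9_2(chi_sq / 2)
```

For realistic sequence lengths, a fast Fourier transform would replace the naive sum; the half-integer recurrence is used here only because the Python standard library has no incomplete gamma function.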
2.2 The fundamental problems of the original and present DFT tests
Kim et al. [4] and Hamano [5] reported the following:

The test statistic $d$ does not follow $N(0, 1)$;

$N_1$ does not follow $N(0.95\,n/2,\ 0.95 \cdot 0.05 \cdot n/2)$.
Furthermore, Kim et al., using Secure Hash Generator (G-SHA1) [2] as a PRNG, estimated that
$$N_1 \sim N\left(0.95\,\frac{n}{2},\ 0.95 \cdot 0.05 \cdot \frac{n}{4}\right),$$
and the original DFT test was revised according to this report of Kim et al. [2]; the present DFT test has not been revised since then. Therefore, the reference distribution of the test statistic of the present DFT test is not mathematically derived. Furthermore, Pareschi et al. reported that the numerical estimation is not sufficiently accurate; they numerically estimated that
$$N_1 \sim N\left(0.95\,\frac{n}{2},\ 0.95 \cdot 0.05 \cdot \frac{n}{3.8}\right).$$
Moreover, Pareschi et al. proposed that the DFT test with this test statistic is more reliable. (The definition of the reliability of a test is discussed in Section 5.) Therefore, it can be considered that the present DFT test still has errors. Above all, both the present DFT test and the test of Pareschi et al. rely on a reference distribution estimated with a PRNG, whose randomness should itself be evaluated with a randomness test; they cannot be used unless the reference distribution is mathematically derived.
As stated in step 5) in Section 2.1, $\{|S_j(X)|\}_{j=0}^{n/2-1}$ are considered to be mutually independent. However, they are not mutually independent, and this problem is expected to be the main reason why $N_1$ does not follow $N(0.95\,n/2,\ 0.95 \cdot 0.05 \cdot n/2)$ [4, 5]. Furthermore, before considering this problem, it is also necessary to ensure that $\frac{2}{n}|S_j(X)|^2$ follows $\chi^2_2$. Although $\frac{2}{n}|S_j(X)|^2$ is considered to follow $\chi^2_2$ in step 4) in Section 2.1, there is no information about this in SP 800-22, and, to the best of our knowledge, no researchers studying the DFT test have provided rigorous proofs. We provide a proof in Section 3.
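The effect of this dependence can be illustrated with a small Monte Carlo sketch. It estimates the variance of the count of below-threshold moduli for short random $\pm 1$ sequences and divides it by the variance that the binomial model would predict. The sequence length, trial count, seed, and function names are illustrative assumptions, and Python's built-in generator stands in for an ideal source.

```python
import math
import random


def count_below_threshold(x, cos_table, sin_table, threshold_sq):
    """N_1: number of j in 0 .. n/2 - 1 with |S_j|^2 below the squared threshold."""
    n = len(x)
    count = 0
    for j in range(n // 2):
        c = sum(x[k] * cos_table[(j * k) % n] for k in range(n))
        s = sum(x[k] * sin_table[(j * k) % n] for k in range(n))
        if c * c + s * s < threshold_sq:
            count += 1
    return count


def variance_ratio(n=64, trials=500, seed=1):
    """Empirical Var(N_1) divided by the binomial-model variance 0.95 * 0.05 * n / 2."""
    rng = random.Random(seed)
    cos_table = [math.cos(2 * math.pi * r / n) for r in range(n)]
    sin_table = [math.sin(2 * math.pi * r / n) for r in range(n)]
    threshold_sq = n * math.log(1 / 0.05)  # assumed 95% threshold, squared
    counts = []
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        counts.append(count_below_threshold(x, cos_table, sin_table, threshold_sq))
    mean = sum(counts) / trials
    variance = sum((c - mean) ** 2 for c in counts) / (trials - 1)
    return variance / (0.95 * 0.05 * n / 2)


if __name__ == "__main__":
    print("Var(N_1) / binomial variance:", variance_ratio())
```

A ratio clearly below 1 is what one would expect if the moduli are negatively correlated, for example through the constraint that Parseval's identity places on the sum of the $|S_j(X)|^2$.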
3 The asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$
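In this section, we analyze the asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$. From the definition of $S_j(X)$ in (1),
$$\frac{2}{n}|S_j(X)|^2 = \left(\sqrt{\frac{2}{n}}\,c_j(X)\right)^2 + \left(\sqrt{\frac{2}{n}}\,s_j(X)\right)^2.$$
When $j = 0$,
$$c_0(X) = \sum_{k=0}^{n-1} x_k, \qquad s_0(X) = 0.$$
Under the assumption that $X$ is an ideal random number sequence, $x_0, \dots, x_{n-1}$ are mutually independent, and $P(x_k = 1) = P(x_k = -1) = 1/2$. Therefore, as a consequence of the central limit theorem, when $n$ is sufficiently large, $\frac{1}{\sqrt{n}}\,c_0(X)$ follows $N(0,1)$, and $\frac{1}{n}|S_0(X)|^2$ follows a chi-squared distribution with 1 degree of freedom ($\chi^2_1$). Thus, $\frac{2}{n}|S_0(X)|^2$ does not follow $\chi^2_2$.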
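In the following, we consider the case when $j \neq 0$. Here, $\frac{2}{n}|S_j(X)|^2$ follows $\chi^2_2$ if the following is true:

Both $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ follow $N(0,1)$.

$\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ are mutually independent.
In the following two subsections, we prove Theorem 1, Theorem 2, and Theorem 3:
Theorem 1: When $n$ is sufficiently large, both $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ follow $N(0,1)$.

Theorem 2: When $n$ is sufficiently large, $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$ are mutually independent.
Theorem 3: $\frac{2}{n}|S_j(X)|^2$ follows $\chi^2_2$ when $n$ is sufficiently large.
From the definition of $|S_j(X)|^2$, Theorem 3 can be proven by combining Theorem 1 and Theorem 2.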
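The contrast between $j = 0$ and $j \neq 0$ can be checked empirically. The sketch below estimates how often $\frac{2}{n}|S_j(X)|^2$ falls below the 95% quantile of $\chi^2_2$ for random $\pm 1$ sequences. The sequence length, trial count, seed, the choice $j = 5$, and the function names are illustrative assumptions, and Python's built-in generator stands in for an ideal source.

```python
import math
import random


def scaled_power(x, j):
    """(2/n) * |S_j(X)|^2 for a +/-1 sequence x."""
    n = len(x)
    c = sum(x[k] * math.cos(2 * math.pi * k * j / n) for k in range(n))
    s = sum(x[k] * math.sin(2 * math.pi * k * j / n) for k in range(n))
    return 2.0 * (c * c + s * s) / n


def fraction_below(j, quantile, n=64, trials=3000, seed=7):
    """Fraction of random sequences whose scaled power at index j is below quantile."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        if scaled_power(x, j) < quantile:
            hits += 1
    return hits / trials


# 95% quantile of chi-squared with 2 degrees of freedom (CDF is 1 - exp(-y/2)).
Q95_CHI2_2 = -2.0 * math.log(0.05)

if __name__ == "__main__":
    print("j = 5:", fraction_below(5, Q95_CHI2_2))
    print("j = 0:", fraction_below(0, Q95_CHI2_2))
```

If the theorems hold, the fraction for $j = 5$ should be close to 0.95, while the fraction for $j = 0$ should be visibly smaller, because $\frac{2}{n}|S_0(X)|^2$ behaves like twice a $\chi^2_1$ variable.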
3.1 Proof of Theorem 1: The asymptotic distribution of $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$
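In this subsection, we prove Theorem 1. Hamano [5] showed that the average, variance, skewness, and kurtosis of $\sqrt{2/n}\,c_j(X)$ and $N(0,1)$ are the same. However, it cannot be proven that $N(0,1)$ is the asymptotic distribution of $\sqrt{2/n}\,c_j(X)$ based only on these facts.
$\sqrt{2/n}\,c_j(X)$ is expressed as $\sum_{k=0}^{n-1} a_k x_k$, where $a_k = \sqrt{2/n}\,\cos(2\pi k j/n)$. Under the assumption that $X$ is an ideal random number sequence, the characteristic function of $\sqrt{2/n}\,c_j(X)$, denoted by $\varphi(t)$, is expressed as follows:
$$\varphi(t) = E\left[\exp\left(i t \sum_{k=0}^{n-1} a_k x_k\right)\right] = \prod_{k=0}^{n-1} \varphi_k(t),$$
where
$$\varphi_k(t) = E\left[\exp(i t a_k x_k)\right] = \cos(a_k t).$$
Using the Taylor expansion about the point $t = 0$, we obtain
$$\log \varphi(t) = \sum_{k=0}^{n-1} \log\cos(a_k t) = -\frac{t^2}{2}\sum_{k=0}^{n-1} a_k^2 + O\left(\frac{1}{n}\right).$$
Since
$$\sum_{k=0}^{n-1} a_k^2 = \frac{2}{n}\sum_{k=0}^{n-1}\cos^2\frac{2\pi k j}{n} = 1 \quad (0 < j < n/2),$$
we have $\lim_{n\to\infty}\varphi(t) = \exp(-t^2/2)$, which is the characteristic function of $N(0,1)$.
Thus, $N(0,1)$ is the asymptotic distribution of $\sqrt{2/n}\,c_j(X)$. Likewise, it can be proven that $N(0,1)$ is the asymptotic distribution of $\sqrt{2/n}\,s_j(X)$.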
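The convergence used in this proof can be checked numerically. The sketch below assumes that, for independent bits taking the values $\pm 1$ with probability $1/2$ each, the characteristic function of $\sqrt{2/n}\,c_j(X)$ is the product over $k$ of $\cos\left(t\sqrt{2/n}\,\cos(2\pi k j/n)\right)$, and compares it with $\exp(-t^2/2)$. The function names are hypothetical.

```python
import math


def char_fn_scaled_cos(t, n, j):
    """Product over k of cos(t * a_k), with a_k = sqrt(2/n) * cos(2*pi*k*j/n)."""
    scale = math.sqrt(2.0 / n)
    value = 1.0
    for k in range(n):
        value *= math.cos(t * scale * math.cos(2 * math.pi * k * j / n))
    return value


def gap_to_standard_normal(t, n, j):
    """Absolute difference from exp(-t^2 / 2), the N(0, 1) characteristic function."""
    return abs(char_fn_scaled_cos(t, n, j) - math.exp(-t * t / 2))


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, gap_to_standard_normal(1.0, n, 3))
    print("j = 0:", gap_to_standard_normal(1.0, 4096, 0))
```

The gap should shrink as $n$ grows for $0 < j < n/2$, whereas for $j = 0$ the product tends to $\exp(-t^2)$ instead, so the gap does not vanish.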
3.2 Proof of Theorem 2: Statistical independence of $\sqrt{2/n}\,c_j(X)$ and $\sqrt{2/n}\,s_j(X)$
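In this subsection, we prove Theorem 2. Let us define a 2-dimensional stochastic variable $Z$ as follows:
$$Z = (Z_1, Z_2) = \left(\sqrt{\frac{2}{n}}\,c_j(X),\ \sqrt{\frac{2}{n}}\,s_j(X)\right).$$
Under the assumption that $X$ is an ideal random number sequence, the characteristic function of $Z$, denoted by $\varphi(t_1, t_2)$, is expressed as follows:
$$\varphi(t_1, t_2) = E\left[\exp\left(i(t_1 Z_1 + t_2 Z_2)\right)\right] = \prod_{k=0}^{n-1} \varphi_k(t_1, t_2),$$
where
$$\varphi_k(t_1, t_2) = \cos(a_k t_1 + b_k t_2), \qquad a_k = \sqrt{\frac{2}{n}}\cos\frac{2\pi k j}{n}, \qquad b_k = \sqrt{\frac{2}{n}}\sin\frac{2\pi k j}{n}.$$
Therefore,
$$\log\varphi(t_1, t_2) = \sum_{k=0}^{n-1}\log\cos(a_k t_1 + b_k t_2).$$
Using the Taylor expansion about the point $(t_1, t_2) = (0, 0)$, we obtain
$$\log\varphi(t_1, t_2) = -\frac{1}{2}\sum_{k=0}^{n-1}(a_k t_1 + b_k t_2)^2 + O\left(\frac{1}{n}\right).$$
Since
$$\sum_{k=0}^{n-1} a_k^2 = \sum_{k=0}^{n-1} b_k^2 = 1, \qquad \sum_{k=0}^{n-1} a_k b_k = 0 \quad (0 < j < n/2),$$
we obtain
$$\lim_{n\to\infty}\varphi(t_1, t_2) = \exp\left(-\frac{t_1^2 + t_2^2}{2}\right).$$
Therefore, when $n$ is sufficiently large, the joint probability density function is described as follows:
$$f(z_1, z_2) = \frac{1}{2\pi}\exp\left(-\frac{z_1^2 + z_2^2}{2}\right).$$
As we proved before, $N(0,1)$ is the asymptotic distribution of both $Z_1$ and $Z_2$. Thus, when $n$ is sufficiently large, the probability density functions of $Z_1$ and $Z_2$ are $f_1(z_1) = \frac{1}{\sqrt{2\pi}}\exp(-z_1^2/2)$ and $f_2(z_2) = \frac{1}{\sqrt{2\pi}}\exp(-z_2^2/2)$, respectively. Therefore, when $n$ is sufficiently large, the following equation is obtained:
$$f(z_1, z_2) = f_1(z_1)\,f_2(z_2).$$
This means that $Z_1$ and $Z_2$ are mutually independent when $n$ is sufficiently large.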
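The asymptotic factorization can also be checked numerically. The sketch below assumes that the joint characteristic function of the scaled real and imaginary parts is the product over $k$ of $\cos\left(\sqrt{2/n}\,(t_1\cos\theta_k + t_2\sin\theta_k)\right)$ with $\theta_k = 2\pi k j/n$, and measures how far it is from the product of its two marginals. The function names are hypothetical.

```python
import math


def joint_char_fn(t1, t2, n, j):
    """Joint characteristic function of the scaled real and imaginary parts."""
    scale = math.sqrt(2.0 / n)
    value = 1.0
    for k in range(n):
        theta = 2 * math.pi * k * j / n
        value *= math.cos(scale * (t1 * math.cos(theta) + t2 * math.sin(theta)))
    return value


def factorization_gap(t1, t2, n, j):
    """|phi(t1, t2) - phi(t1, 0) * phi(0, t2)|, which tends to zero
    when the two components become independent."""
    joint = joint_char_fn(t1, t2, n, j)
    product = joint_char_fn(t1, 0.0, n, j) * joint_char_fn(0.0, t2, n, j)
    return abs(joint - product)


if __name__ == "__main__":
    for n in (16, 256, 4096):
        print(n, factorization_gap(0.7, -1.1, n, 3))
```

A gap that decreases roughly like $1/n$ is consistent with independence holding only asymptotically, not for finite $n$.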
4 The proposed DFT test
In Section 3, we proved Theorem 3, stating that $\frac{2}{n}|S_j(X)|^2$ follows $\chi^2_2$ when $n$ is sufficiently large. Therefore, if $\{|S_j(X)|\}_{j=0}^{n/2-1}$ are mutually independent, we can consider that $N_1$ follows $B(n/2, 0.95)$. However, $\{|S_j(X)|\}_{j=0}^{n/2-1}$ are not mutually independent. Therefore, it is necessary to mathematically analyze the distribution of the test statistic under the condition that they are not mutually independent. Hamano [5] attempted to mathematically derive the distribution of the set $\{|S_j(X)|\}_{j=0}^{n/2-1}$, but he could not do so, and we also could not derive this distribution. However, we rigorously proved that the asymptotic distribution of $\frac{2}{n}|S_j(X)|^2$ is $\chi^2_2$, and we develop the new DFT test based on this fact. The reference distribution of the test statistic of the proposed DFT test is mathematically derived, whereas that of the present DFT test is estimated with a PRNG. We explain the test statistic of the proposed DFT test in the next subsection.
4.1 The procedure of the proposed DFT test
In the standard approach in NIST SP 800-22, each sequence is analyzed; thus, $m$ sequences give $m$ P-values. However, the proposed DFT test generates $n/2 - 1$ ($n$: length of a sequence) P-values. Therefore, more P-values are generated, since $n/2 - 1$ is generally larger than $m$. Since the number of P-values should not be too large (see Section 5.3), before conducting the proposed DFT test, it is necessary to adjust the length of the sequences and divide them into more sets of short sequences (see also Table 5), assuming that the input sequences are generated continuously by an RNG. Therefore, the proposed DFT test is theoretically not appropriate for an isolated set of sequences.
The procedure of the proposed DFT test is described as follows:

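1) The zeros and ones of each length-$n$ input sequence are converted to values of $-1$ and $+1$ to create the sequences $X_i$ ($i = 1, \dots, m$), where each element is $x_k = 2\varepsilon_k - 1$ for the corresponding input bit $\varepsilon_k$. For simplicity, let $n$ be even.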

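2) Apply a discrete Fourier transform (DFT) to each $X_i$ to produce Fourier coefficients $\{S_j(X_i)\}_{j=0}^{n-1}$. The Fourier coefficient $S_j(X_i)$ and its real and imaginary parts $c_j(X_i)$ and $s_j(X_i)$ are defined as follows:
$$S_j(X_i) = c_j(X_i) + i\,s_j(X_i), \qquad c_j(X_i) = \sum_{k=0}^{n-1} x_k \cos\frac{2\pi k j}{n}, \qquad s_j(X_i) = \sum_{k=0}^{n-1} x_k \sin\frac{2\pi k j}{n},$$
where $x_k$ denotes the $k$th element of $X_i$.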
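3) For all $j$ ($1 \le j \le n/2 - 1$), perform the Kolmogorov-Smirnov (KS) test [8, 9] on the empirical cumulative distribution function of $\left\{\frac{2}{n}|S_j(X_i)|^2\right\}_{i=1}^{m}$, defined as $F_j(x)$, based on the difference from the cumulative distribution function of $\chi^2_2$, denoted by $F_{\chi^2_2}(x)$, and compute the P-value $\text{P-value}_j$. Here, the KS statistic $D_j$ and $\text{P-value}_j$ are defined as follows:
$$D_j = \sup_x \left|F_j(x) - F_{\chi^2_2}(x)\right|, \qquad \text{P-value}_j = 1 - K\left(\sqrt{m}\,D_j\right),$$
where $K$ is the cumulative distribution function of the Kolmogorov-Smirnov distribution:
$$K(x) = 1 - 2\sum_{k=1}^{\infty}(-1)^{k-1}\exp\left(-2k^2x^2\right).$$
Note that $n/2 - 1$ P-values are computed in this step, while the present DFT test computes $m$ P-values.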
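The procedure above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the frequency range $j = 1, \dots, n/2 - 1$, the asymptotic Kolmogorov series for the P-value, the cutoff below which $K(x)$ is treated as zero, and all function names are choices made for this sketch.

```python
import math


def scaled_power(x, j):
    """(2/n) * |S_j(X)|^2 for a +/-1 sequence x."""
    n = len(x)
    c = sum(x[k] * math.cos(2 * math.pi * k * j / n) for k in range(n))
    s = sum(x[k] * math.sin(2 * math.pi * k * j / n) for k in range(n))
    return 2.0 * (c * c + s * s) / n


def chi2_2_cdf(y):
    """CDF of the chi-squared distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-y / 2.0)


def kolmogorov_cdf(x, terms=100):
    """K(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)."""
    if x < 0.2:
        return 0.0  # K(x) is negligible here and the series converges slowly
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x) for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 1.0 - 2.0 * total))


def ks_p_value(samples):
    """One-sample KS test of the samples against chi-squared with 2 d.o.f."""
    m = len(samples)
    ordered = sorted(samples)
    d = max(
        max(chi2_2_cdf(y) - i / m, (i + 1) / m - chi2_2_cdf(y))
        for i, y in enumerate(ordered)
    )
    return 1.0 - kolmogorov_cdf(math.sqrt(m) * d)


def proposed_dft_test(sequences):
    """One KS P-value per frequency j = 1 .. n/2 - 1 (assumed index range)."""
    n = len(sequences[0])
    signed = [[2 * b - 1 for b in bits] for bits in sequences]
    return [
        ks_p_value([scaled_power(x, j) for x in signed]) for j in range(1, n // 2)
    ]
```

For example, feeding the function many copies of the alternating sequence 0101... puts all the power at $j = n/2$, so every tested frequency yields a KS statistic near 1 and a P-value near 0. With SciPy available, `scipy.stats.kstest(samples, "chi2", args=(2,))` would be a more conventional way to obtain the same kind of P-value.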
5 Experiments
In this section, we explain the experiments that we performed and the conclusions derived from their results. In these experiments, we compare the reliability and sensitivity of the DFT tests. The reliability of a test means a low probability of false positives (type I error) (see Table 1), and the sensitivity of a test means a low probability of false negatives (type II error). Here, the null hypothesis of the tests ($H_0$) is that the “generator is ideal”. Therefore, a false positive (type I error) means an erroneous identification of an ideal generator as not random, and a false negative (type II error) means an erroneous identification of a generator that is not ideal as random. By comparing the probabilities of type I and type II errors, we can conclude which test is better.
| Null hypothesis $H_0$ = “generator is ideal” | $H_0$ is true | $H_0$ is false |
| --- | --- | --- |
| Judgment of $H_0$: reject | False positive (type I error) | True positive |
| Judgment of $H_0$: fail to reject | True negative | False negative (type II error) |
For simplicity, in this experiment, we modify the significance interval of the second-level test I defined in (4) as follows:
(6) 
With this modified significance interval, the significance level of the second-level test I () is modified to be .
5.1 Experiment 1: Test results for periodic sequences
In this experiment, we compare the sensitivity of the DFT tests. Sensitivity means a low false negative rate (low probability of type II error), i.e., a high true positive rate. Here, we compare the true positive rate of each test result.
Sensitivity: low probability of type II error = low false negative rate = high true positive rate.
Now, we define an length input sequence as
where
We purposely create nonrandom (periodic) sequences from the length sequence using the method described as follows:
Therefore,
We can clearly state that this sequence is nonrandom. Therefore, if the test does not reject $H_0$ (the null hypothesis “generator is random”), the result is a false negative (type II error).
For each , we use sets of an length () input sequence generated by the Mersenne Twister algorithm [10] and convert them to nonrandom length sequences . Table 5 in Section 5.3 shows the parameters and for each test. In Section 5.3, we explain why the parameters and for are different from the other tests. Note that is the same. Table 2, Fig. 1 and Fig. 2 show the passing rate , which is defined as follows:
Because we know that is nonrandom, we know that $H_0$ = FALSE, and the passing rate means a false negative rate in this experiment. Now, the significance levels of second-level tests I and II are and (defined in (5)), respectively. Therefore, the significance intervals defined in Eq. of and are described as follows:
(7) 
(8) 
Therefore, if or , we can conclude that the true positive rate is high, and we can conclude that the test is sensitive.
5.2 Experiment 2: Test results for existing pseudorandom number generators
Test
Passing rate
0.0  0.0  0.0  0.0  0.0  0.0
0.5  0.9  0.8  1.0  0.0  0.0
0.7  1.0  1.0  1.0  0.0  0.0
0.9  1.0  1.0  1.0  0.0  0.0
0.9  0.9  1.0  1.0  0.0  0.0
0.7  1.0  1.0  1.0  0.0  0.0
1.0  1.0  1.0  1.0  0.0  0.0
0.9  1.0  0.9  1.0  0.0  0.0
0.7  1.0  1.0  1.0  0.0  0.0
0.8  1.0  1.0  1.0  0.0  0.0
0.9  1.0  0.9  1.0  0.0  0.0
0.9  1.0  0.9  1.0  0.0  0.0
1.0  1.0  1.0  1.0  0.0  0.0
1.0  1.0  1.0  1.0  0.0  0.0
0.9  1.0  1.0  1.0  0.0  0.0
0.9  1.0  1.0  1.0  0.0  0.0
1.0  1.0  1.0  1.0  0.0  0.0
0.9  1.0  1.0  1.0  0.0  0.0
1.0  1.0  1.0  1.0  0.0  0.0
1.0  1.0  1.0  1.0  0.0  0.0
1.0  1.0  1.0  1.0  0.0  0.0
1.0  1.0  1.0  1.0  0.0  0.0