Statistical Tests for SPRNG

Overview

The directory TESTS contains statistics-testing programs for the SPRNG random number generators. All tests are implementations of the sequential tests in Knuth's book, but with modification that can examine both sequential and parallel streams. You can also run these tests on your own random number generators by modifing TESTS/Makefile.

Installation

To create the executables, set the variables $LIBDIR and $SRCDIR in TESTS/Makefile to the appropriate directory, as explained in the makefile, and type make in the directory TESTS. A test executable xxx.yyy is generated for each type of test xxx and each type of SPRNG random number generator yyy.

If you wish to run the tests on a parallel machine, you should first set certain variables in TESTS/make.mpi for locating the MPI directories on your system. Then, typing make -f make.mpi will create the executables that perform the tests in parallel. We have set the default location to that of the SGI power challenge machines here at NCSA.

NOTE: If you have already compiled the sequential version of the tests, you must first delete the intermediate files (by typing make clean) before compiling the parallel version.

Sequential Stream Tests

This section briefly describes the statistical tests, as they are applied to sequential random-number streams.

Equidistribution test: We divide the interval [0,1) into d subintervals of equal size. Given a sequence of random numbers, we first count how many of them fall into each subinterval. Then, we apply a chi-square test to determine the probability that such an emprical distribution could have been obtained from a sequence of random numbers that is uniformly distributed in the interval [0,1).
Parameters:
- d: number of divisions between 0 and 1
- n: number of random numbers under test
Usage: equidist.xxx d n

Serial test: We divide the interval [0,1) into d subintervals of equal size, and set up an d by d array of bins. Given a sequence of random numbers, we partition the sequence into disjoint pairs. For each pair of numbers (a, b), if a falls in the ith interval and b falls in the jth interval, we place the pair in bin (i, j). In the end, we count the number of pairs in each bin, and apply a chi-square test to determine the probability that such an emprical distribution could have been obtained from a sequence of random numbers with pairs uniformly distributed in [0,1)² space.
Parameters:
- d: number of divisions between 0 and 1
- n: number of pairs of random numbers under test
Usage: serial.xxx d n

Gap test: We focus on a subinterval [a, b) of [0,1). Given a sequence of random numbers, we check each number in turn to see whether it falls within this subinterval. We note the gap, in the original sequence, between two successive numbers that do. As we scan the sequence, we record all the gap lengths. In the end, we apply a chi-square test on the distribution of gap lengths to determine the probability that such an emprical distribution could have been obtained from a sequence of truly random numbers.
l+2 bins are set up, numbered 0, 1, 2, ..., l, l+1. Bin 0 counts gaps with gap length 0; bin 1 counts gaps with gap length 1; and so on. Gaps with gap length longer than l are lumped together and assigned to bin l+1.
Parameters:
- l: maximum gap length that are individually counted
- a: lower bound on the subinterval
- b: upper bound on the subinterval
- n: number of gaps under test
Usage: gap.xxx l a b n

Poker test: Given a sequence of random numbers, we multiply each random nubmer by d and truncate the product to obtain an integer in the range [0, d-1]. We consider n disjoint subsequences, each of length k, from the sequence. In each subsequence, we count the number of distinct integers. This should be an integer in [1, min(d-1, k)]. We apply a chi-square test on the distribution of the counts to determine the probability that such an emprical distribution could have been obtained from a sequence of truly random numbers.
Parameters:
- d: multiplier
- k: length of each subsequence
- n: number of subsequences under test
Usage: poker.xxx n k d

Coupon collector's test: Given a sequence of random numbers, we multiply each random number by d and truncate the product to obtain an integer in the range [0,d-1]. We then scan the sequence in order. We wish to find all the d numbers in our range. When we first find such a "complete" set, we note the length of the segment over which this complete set was obtained. We repeat the process n times. In the end, we apply a chi-square test on the distribution of the segment lengths to determine the probability that such an emprical distribution could have been obtained from a sequence of truly random numbers.
t-d+1 bins are set up, numbered d, d+1, ..., t-1, t. Bin d counts segments with length d; Bin d+1 counts segments with length d+1; and so on. Segments with length greater than t-1 are lumped together and assigned to bin t.
Parameters:
- d: multiplier
- t: maximum segment length that are individually counted
- n: number of segments under test
Usage: coupon.xxx n t d

Permutations test: Given a sequence of random numbers, we partition the sequence into n disjoint subsequences of length l each. We enumerate each random number by its magnitude relative to the other numbers in the subsequence. For example, the smallest number will be enumerated as 1 and the largest l. (We assume that there are no ties). This enumeration gives us a permutation in [1, t]. There are t! such permutations. We apply a chi-square test on the distribution of the observed permutations to determine the probability that such an emprical distribution could have been obtained from a sequence of truly random numbers where each of these permutations is equally likely.
Parameters:
- l: length of each subsequence
- n: number of subsequences under test
Usage: perm.xxx l n

Runs test: Given a sequence of random numbers, we partition the sequence into disjoint subsequences in which the random numbers monotonically increase. Each such subsequence is called a "run". We apply a chi-square test on the distribution of the observed run lengths, to determine the probability that such an emprical distribution could have been obtained from a sequence of truly random numbers.
l+1 bins are set up, numbered 1, 2, ..., l, l+1. Bin 1 counts runs with run length 1; bin 2 counts runs with run length 2; and so on. Runs with run length greater than l are lumped together and assigned to bin l+1.
Parameters:
- l: maximum run length that are individually counted
- n: number of runs under tests
Usage: run.xxx l n

Maximum-of-t test: Given a sequence of random numbers, we divide the sequence into n subsequences of length t each. For each subsequence, we find the largest number. The distribution of the maximum values follows the cumulative distribution function F(x)=x^t for trully uniformly distributed random numbers. We apply a Kolmogorov-Smirnov test to determine the probability that our empirical distribution are consistent with this.
Parameters:
- t: length of each subsequence
- n: number of subsequences under test
Usage: maxt.xxx n t

Collision test: Given a sequence of random numbers, we multiply each random number by d and truncate the product to obtain an integer in the range [0, d-1]. We then divide the sequence into n disjoint subsequences of length m each. There are d^m possible such subsequences. We count the number of distinct subsequences observed empirically, and substract it from n, obtaining a count of "collisions". We apply a chi-square test on the distribution of the observed collisions to determine the probability that such an empirical distribution could have been obtained from a sequence of truly random numbers.
Parameters:
- d: multiplier
- m: length of each subsequence
- n: number of subsequences under test
Usage: collisions.xxx n m d

Parallel Stream Tests

The above tests examine only individual sequential random number streams. To test for correlations between different random-number streams, we mix the random numbers from several different streams to produce a single stream. In other words, we create a logical stream by combining several real streams. (Right now, we generate a logical stream by interleaving the real streams.) We then feed the logical stream to the above sequential tests. If each individual stream is truly random and the streams are uncorrelated, then the logical stream should pass the tests.

If you specify more than one logical stream per test, each logical stream is generated from a different group of real streams. In this case, a Kolmogorov-Smirnov test is applied to the test results of the logical streams.

Running the Tests

The statistical tests themselves use only the parameters as described above. However, to run the tests on SPRNG generators, four additional parameters are required:

g: number of logical streams under test
r: ratio of real/logical streams
s: seed to the SPRNG generator
f: generator family

These four parameters should appear right after the test name but before the other test parameters. For example, to run equidistribution test on the SPRNG linear congruential generator, type: equidist.lcg g r s f d n

Translator

To aid you in running the statistical tests, we have included Translator in this document. Translator is a manual-driven Java applet that will translate your choices of test parameters into the appropriate command line. In addition to relieving you from remembering the command line syntax, Translator will also catch certain simple errors in your choice of parameters.

NOTE: The memory required for each SPRNG generator is not monitored. You can still set the test to run on an extremely long sequence of random numbers, which will take forever to finish.

Click here to invoke the Translator:

If you are seeing this, you need to enable your browser's Java feature.