Poisson Distribution method for runs analysis
of random bit sequences
By Arthur R. Kopp 8/20/14
In a n bit sequence of random bits, a run of length two (L = 2 ) can be defined as a 0110 pattern for runs of ONES and as a
1001 pattern for runs of ZEROS. There are sixteen different possible patterns of four bits:
0000 0001 0010 0011 0100 0101 0110 0111
1000 1001 1010 1011 1100 1101 1110 1111
It can be seen that each run pattern, so defined, exists once in sixteen possibilities so that we might expect
each pattern to occur almost n / 16 times on average. Similarly, runs of length three can be expected almost
n / 32 times on average. Keeping runs of ONES separate from runs of ZEROS, each can be expected to occur
(n - L - 1) / (2 ^ (L + 2 )) times on average. If runs of ONES and ZEROS are lumped together, and we're interested only
in RUN counts, we expect to have on average (n - L - 1) / (2 ^ (L + 1)) (inclusive mean is twice the exclusive mean).
Mark Nelson Mark discussed a Fibonacci
n-step (Fibo) method he used to answer the question of the probability of having a run
of twenty heads in one million tosses of a fair coin. Would a Poisson Distribution (PD) method provide a far simpler
means of computation? Indeed it does. In fact, all that's needed in this case is the Poisson result for a count
of zero. Then two of the PD formula terms simply equal one. The zero count result means the probabilty of no
runs, so subtracting that result from one gives the probability of at least one run (which is what the Fibo
method calculates). The Fibo method, developed as Mark did it, calculates the inclusive case so we use the inclusive mean.
Using (n)(L) notation we have for the (1,000,000)(20) problem:
lamda = (1000000 - 21) / (2 ^ 21) = 0.476837
P(0) = epsilon ^ (- lamda) = 0.620743
Probability of at least one run = 1 - P(0) = 0.379256
The result is the same as Mark's result out to five decimal places. Here's a comparison of probability
calculations of the Fibo method to the Poisson method for n = 10,000 and n = 100,000:
I've wrtten a experimental program that uses PD cumulative sums to obtain a test metric for long runs
(low probabilities). Smaller run lengths are handled differently as discussed below. The program tests
run lengths as small as one (010 and 101 patterns). The test page shown below displays the actual
counts found on a random sample along with calculated probabilities (n = 131,072 bits).
The bell shaped curve below illustrates a Poisson Distribution (PD) having good symmetry. The curves are not symmetrical
for long runs, so cumulative sums must be done on one side or the other of the mean (center of the bell curve). If a
target count is less than the mean, summing of the PD probabilities begins at x = 0 and ends when x = the target count value
(sum a from left to right). If a target count is greater than the mean, summing begins at a large x and x is decremented until
x equals the target count value (sum b from right to left). If the sum is greater than the test limit (white portion of a
cumulative sum) the test is passed.
SUM A + SUM B = 1. If a test limit of 0.01 is set, the required pass/fail threshold is set at half that or 0.005. Since the
threshold is in effect for both sums, the result is 0.5% + 0.5% = 1.0% of the total sum of 1.
For large values of the mean and symmetrical bell curves, the complementary error function (erfc) can be used instead of
cumulative summations. For consistency, [erfc(x)] /2 is used which has a value of 0.5 for x = 0. An adjustment to the value
of x passed to erfc(x) is made to make the results agree with binomial distribution (BD) cumulative sums. Note that lamda * qa
below is the BD variance.
For run length L:
pa = 1 / (2 ^ (L+2)) qa = 1 - pa
diff = ABS(lamda - count) where ABS is the absolute value
x = diff / sqrt(2 * lambda * qa)
P = [erfc(x)]/2
Program TRAND.EXE uses the erfc method for lamda > 30
My DOS programs can be downloaded from here. Fibocomp is based on the Fibonacci n-step method. Trand is the experimental
program mentioned that tests random bit files. The programs run in Windows XP.
Download fibocomp: FIBOCOMP.EXE
Download trand: TRAND.EXE
NIST Special Publication 800-22 Revision 1a is a excellent testing reference: 800-22