Poisson Distribution method for runs analysis
of random bit sequences
By Arthur R. Kopp 4/5/15 update
See my probability tidbits page TIDBITS
In a n bit sequence of random bits, a run of length two (L = 2 ) can be defined as a 0110 pattern for runs of ONES and as a
1001 pattern for runs of ZEROS. There are sixteen different possible patterns of four bits:
0000 0001 0010 0011 0100 0101 0110 0111
1000 1001 1010 1011 1100 1101 1110 1111
It can be seen that each run pattern, so defined, exists once in sixteen possibilities so that we might expect
each pattern to occur about n / 16 times on average. Similarly, runs of length three can be expected about
n / 32 times on average. Keeping runs of ONES separate from runs of ZEROS, each can be expected to occur
n / (2 ^ (L + 2 )) times on average. If runs of ONES and ZEROS are lumped together, and we're interested only
in RUN counts, we expect to have on average n / (2 ^ (L + 1)) (inclusive mean is twice the exclusive mean).
Mark Nelson Mark discussed a Fibonacci
n-step (Fibo) method he used to answer the question of the probability of having a run
of twenty heads in one million tosses of a fair coin. Would a Poisson Distribution (PD) method provide a far simpler
means of computation?
Indeed it does. In fact, all that's needed in this case is the Poisson result for X = 0. Then two of the PD formula terms simply equal one.
The zero count result means the probabilty of no runs, so subtracting that result from one gives the probability of at least one run
(which is what the Fibo method calculates). The Fibo method, developed as Mark did it, calculates the inclusive case so we use the inclusive mean.
Using (n)(L) notation we have for the (1,000,000)(20) problem:
λ = 1000000 / (2 ^ 21) = 0.4768371
P(0) = ε ^ (- λ) = 0.6207436
Probability of at least one run = 1 - P(0) = 0.3792563
The result is the same as Mark's result out to five decimal places. Here's a comparison of probability
calculations of the Fibo method to the Poisson method for n = 10,000 and n = 100,000:
I've wrtten a experimental program (TRAND) that uses PD for long runs and small λ values.
The program tests run lengths as small as one (010 and 101 patterns). The test page shown below displays the
number of runs on a random sample along with calculated probabilities.
A pseudo random bits generator was used in a study of the method. 1,000 files of 262,144 bits were
generated, and counts of various run lengths for each file were averaged. Here's a sample of the first
three run size results.
Run length 1 result = 32,767.874 (32,768 assumed mean)
Run length 2 result = 16,384.92 (16,384 assumed mean)
Run length 3 result = 8,193.396 (8,192 assumed mean)
My DOS program can be downloaded from here. The program runs in Windows without installing any special provisions.
Download trand: TRAND.EXE
NIST Special Publication 800-22 Revision 1a is a excellent testing reference: 800-22
Back to main page: RUNS