[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Random Number Testing




[email protected] writes:
> > The "randomness" of the output of my sieve is obviously not perfect.
> >  However, it will remove bytes from a data stream that "randomness" tests
> > describe as "nonrandom".

And Perry writes:
> You are displaying a profound ignorance of the notion of "random"
> here. An individual byte cannot be random or nonrandom. Random is a
> statistical property, and therefore can only apply to a mass of
> data.

The limitation seems to be in the testing process.

An inviolable rule of test and measurement is that the testing process
must have a higher degree of accuracy and precision than is expected from
the item under test. If all you have is a non-graduated stick one meter 
long, you can't measure increments of less than one meter unless you add 
some additional resolution to the system (such as a human's judgment
that the object is four sticks, plus about another third of a stick,
long).

Our ability to synthesize pseudo-random data, and the degree to which
it approaches true random data, is limited by our ability to discern
between the two. If the best testing process available yields the
same results for the output of a true RNG and a particular PRNG, what
can be done to improve the randomness of the PRNG? How could we tell
the difference based on the data itself? 

About all you could do would be to critique the PRNG or sieving process 
from an empirical perspective and hope that you weren't introducing some 
other non-random bias to the system. I'm sure that some perfectly random 
data would be discarded as being non-random, especially for a small sample 
size. The sequence  HHHHTTTT doesn't look random, but this outcome of 8 
flips is equally probable for a "perfect" coin. Do you keep this data or 
toss it because it doesn't "look" random? If you toss it, aren't you
removing entropy from the sample?

It may be (relatively) easy to say that radioactive decay is a truly 
random process and a sieve which discards certain re-occuring values 
is not. Improving the testing process, and increasing the size of the
data used for that testing, will improve the sieve output. However,
it looks like an asymptotic situation: the PRNG can approach, but never
quite reach, the true RNG case. As long as human analysis and discretion
is involved, there will always be a certain amount of non-randomness
in the output of a PRNG. 

My question: if you don't know the source of the data, how can you
_really_ determine if it's random enough for crypto use? Given any
PRNG string of a certain length, does the fact that I might be able to 
find that exact same data string buried in a mountain of truly random 
data mean that it's suitable for crypto use? If you knew that I created 
the data from the clock of my workstation, then maybe not. But otherwise?

Fred  <[email protected]>