[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

/dev/random - using up entropy?

The discussions of what to do when /dev/random has handed out all of its
available entropy have assumed that entropy gets used up; I'd like to 
propose that maybe it doesn't, at least in the computational-complexity
sense that says that you don't have the computational power around to
calculate the information inside /dev/random from the output,
giving a sort of "computational entropy" that reflects not only the 
uncertainty you have because of randomness but also the uncertainty
you have because of your computational limitations.

Most of the designs I've seen look like this:
        A Reservoir of entropy R = R1....Rn, where n is large, 1024 or 4096
        An input stream I = I1....Ik, which is mixed into R
        A mixing function F which is used to mix R <= F(R,I) 
                for some chunk of I, possibly empty.
        A hash function H, typically MD5.
        An output O = O1...Om = H(R), and E gets mixed after every output.
                (These are capital-o, not zero...)

The entropy E of the reservoir E before an output is
        -SUM(all X) p(X) log p(X)
where X is an event R1=x1, R2=x2 ... Rn=Xn
which is equal to n, assuming the Ri are iid equiprobable 0 or 1.

After an output, the entropy is
        - SUM p(X | H(R)=O) log p(X|H(R)=O)
which works out to n-m, since p(X) is zero if H(R)!=O, and 2**m/2**n if it does.
So that says you use up m bits of entropy if you get m bits of good output.

However, what I'd like to suggest is that you don't, from the perspective
of a user who doesn't have direct access to the reservoir R of random bits.
For that user, p(X|H(R)=O) is the same as p(X) or P(X|H(R)=O'), because
the user is neither able to invert H, nor to enumerate all possible R,
nor to calculate anything useful based on multiple outputs, since the
reservoir R is shuffled between outputs; even a simple circular shift
may be enough.   This doesn't apply to the case where n is 32 or 48
and the hash function produces n-bit outputs, or even m<<n bit outputs,
because that maybe be inverted or brute-forced, but it seems to apply
for the case where n is sufficiently large and the hash is good.
If the hash is simpler than MD5, it may apply anyway, since the hash
produces far fewer bits than its input, as long as the hash and the
mixing function don't give away any information about the reservoir
between successive outputs.

This would suggest that /dev/random ought to have a mode that says
"give me output of whatever quality you have available",
and that it ought to be OK to use it, as long as the reservoir has 
been seeded with sufficient high-entropy input to have decent randomness.
#                                       Thanks;  Bill
# Bill Stewart, Freelance Information Architect, [email protected]
# Phone +1-510-247-0664 Pager/Voicemail 1-408-787-1281