# Re: Cracking MD5 for \$10M

```>
> Hal discusses using the Distinguished Points method to find hash
> collisions presented by Michael Wiener with Paul van Oorschot at Rump
> Crypto '94, and lists two benefits:
>
> (1) saves space in searching for loops on a single processor;
> (2) allows parallel searches for collisions over multiple processors.
>
> I claim it's useful only for (2), because another algorithm dominates it
> for single processor loop detection... at least in storage space.
>
[...describes nifty algorithm (which seems to be well-known in the
folklore?) for finding cycles in linear time and constant space...]

Yeah!  I was discussing this algorithm 4 or 5 months ago on alt.math.iams;
it's quite elegant.  If there is a collision after the n-th value, then
I believe this algorithm will find it after generating (at most) 2n
values.  It's been kinda simmering in the back of my head for months,
me wondering how to parallelize this algorithm -- and it's really cool
to see how Wiener and van Oorschot found a way to find cycles efficiently
in parallel!

Apparently two professors here (Yao & Sedgewick) wrote a paper on
this in SIAM Journal of Computer in 1981 -- I'm gonna go dig through
the library to see if I can find this, when I get a chance...

>
> The distinguished points across machines is a great idea for (2), though,
> and doesn't depend on anything looping...  cool stuff!
>

Uh..  I think it *does* depend on looping!

A collision in *any* point means that there will soon be a collision
in a distinguished point, when you use looping.  This probably won't
be true with any other generation method.

Suppose we use the sequence a_n = MD5(n).  Then a collision a_i = a_j
will only be detected if a_i is a distinguished point.

But because we use the sequence a_n = MD5( a_{n-1} ), a collision
a_i = a_j implies that there will soon be a collision a_{i+m} = a_{j+m}
with a_{i+m} a distinguished point (after m ~= 2^32 extra iterations,
on average, if 1 in 2^32 points are distinguished).

>
> Do you (Hal?) or anybody else know whether Wiener and van Oorschot were
> taking into account the contraction of the range each time you iterate
> MD5?  I think the size of the set of all numbers that are the result of
> MD5ing a 128-bit number is considerably smaller than 2^128... is it 1/e of
> that?
>

Hrmm, why should this change the expected number of iterations required
to find a collision?  If I'm being dense, hopefully you'll spell it out
for me. :-)

I've been thinking about writing a program to test the single-processor
cycling algorithm with (for example) crypt(3) for a while now -- maybe
this'd be a good excuse to write it now, and try the parallel distinguished
point stuff, too.  Does anybody think it'd be interesting to get some
practical experience here?  Sound like an interesting doable project?

A few things I've been thinking about, which maybe will spark your
interest enough to answer all my questions. (one can always hope! :-)

First of all, there's some non-zero probability that (when using the
parallelized distinguished points algorithm) two processors will have
their streams match exactly without yielding a useful collision.
Suppose one processor picks the random starting value 3 and generates
a sequence starting with 3,1,4,5,2,7,9,...  Now further suppose that
MD5(6)=3 and that another processor picks the random starting value 6;
then the second processor will generate 6,3,1,4,5,2,7,9,...  We'll
eventually notice this: if 9 is a distinguished point, then we'll see
that two processors have seen the value 9, and we'll start backtracing,
but we won't get any useful collision in MD5 out of this -- we'll only
get the information that MD5(6)=3, which is useless, since both 6 and 3
were random choices.  This means that the second processor's computer
power was wasted.  Can anyone estimate how often this will happen so
that we can know it won't slow things down too much?

Also, there was the arbitrary choice of making the distinguished points
be those with the lower 32 bits all zero -- I wonder what is the effect
of requiring (say) all 48 least significant bits to be zero?  This will
increase the time required to backtrack (unless some fancy schmancy
rescursive or parallel algorithm is used?) but it would also decrease
the space and inter-chip communication required significantly.  Any

Another thing -- I'm not sure this method is (directly) useful for
generating lots of collisions, if that is what is desired.  I believe
Dr. Hellman wrote some paper about the cycling properties of random
functions (out of interest in DES), and he concluded (if I remember
correctly) that when you generate lots of random starting values and
look at their cycling properties, most starting values will drain into
a very few specific cycles.  [I think this was in some volume of CRYPTO:
maybe '86 or so?   I think the title was something like "Drainage
properties of the DES" or somesuch.  I'll have to look it up.]  Doesn't
that reduce the number of different collisions that you can generate
by a large factor?  If so, are there any simple modifications to the
iteration function which would help?  How about

a_n = MD5( a_{n-1} XOR V )

for some random V picked anew each time we want a new collision?

Finally, is there a way to adopt an approach like this to reduce the
space requirements needed to break double DES?  Let P and P' be two
plaintexts, and C=E(k,E(k',P)) and C'=E(k,E(k',P')) be their encipherment
under double DES; we want to find the unknown keys k, k'.  For any X in
{0,1}^128, , define the function function h : {0,1}^128 -> {0,1}^128 by

h(X) = E(y,P) concatenated with E(y,P')		if z=0, or
h(X) = D(y,P) concatenated with D(y,P')		if z=1

where y consists of bits 0-55 of X and z is bit 56 of X.  If h(X)=h(X')
and X != X' and w != w', then with high probability the collision in h
gives us the enciphering keys y and y'.  Can we use some parallel
distinguished points cycling - like algorithm to find the appropriate
collision in h?  If we generate enough values of h, we will exhaust
the entire keyspace, and will necessarily find the enciphering keys.
(By the coupon collector's paradox, this should require something like
2^57 * 57 * log 2 iterations or so on average.)  The only problem is
that there will probably be lots of collisions X,X' with h(X)=h(X')
and X != X' and w = w' -- I think.  Can anyone think of a way to deal
with these useless collisions in h to make finding a useful collision
in h easy?  If so, this should give a method to break double DES in
2^64 time and very little memory.  But maybe this all useless drivel...

Anyhow, this message has gotten very long.  Thanks for reading.  And
many many thanks to Hal for typing in the description of Wiener and van
Oorschot's idea!

-------------------------------------------------------------------------------
David Wagner                                             [email protected]
```