# Re: Cracking MD5 for \$10M

```Jim Gillogly <[email protected]> writes:

>Hal discusses using the Distinguished Points method to find hash
>collisions presented by Michael Wiener with Paul van Oorschot at Rump
>Crypto '94, and lists two benefits:

>(1) saves space in searching for loops on a single processor;
>(2) allows parallel searches for collisions over multiple processors.

>I claim it's useful only for (2), because another algorithm dominates it
>for single processor loop detection... at least in storage space.
>["rho" method elided]

Yes, this is a good point, the main advantage of the DP algorithm is
that it parallelizes.  Rho does have the problem that you have to run
3 MD5's for each step, but OTOH it does not have the overhead of saving
and checking the distinguished points, so which one would be best on a
single processor would depend on the relative costs.

>Do you (Hal?) or anybody else know whether Wiener and van Oorschot were
>taking into account the contraction of the range each time you iterate
>MD5?  I think the size of the set of all numbers that are the result of
>MD5ing a 128-bit number is considerably smaller than 2^128... is it 1/e of
>that?  Anybody know about random mappings?

if they had considered it.  My intuition was that x=MD5(x) would cover a
large fraction of the 128 bit output space, but on further thought Jim
appears to be right: with n input values into a random function (n would
be 2^128 in this case), the chance of a particular output being missed for
any one input would be 1-1/n, and the chance of it being missed for all
n inputs would be (1-1/n)^n.  Taking the limit as n approaches infinity
gives 1/e as the fraction of values which would be missed.  This means
that the fraction of hits would be 1 - 1/e, much lower than I had
guessed.

>Subsequent iterations reduce
>it further, though of course not by 1/e each time, so that the set of
>numbers that are the result of iteratively MD5ing a number N times should
>be an appreciably smaller set to be groping around in.

The way I figure it, if the fraction of the original n is f (which would be
1 before the first iteration, and 1 - 1/e before the 2nd iteration based on
the above), the chance of a point being missed is (1-1/n)^(nf), which is
1/e^f.  So f would be found by f = 1 - 1/e^f, iterating once per MD5
iteration and starting f at 1.  I just did an experiment of iterating this.
After 100 times f was about .02; after 1000 times f was about .002,
suggesting f = 2/iterations.  If this is right, you might be able to get
a birthday match after only the cube root of n tries rather than the
square root of n, or about 2^44 iterations or so rather than 2^64, because
at that point you are only looking at 2^85 possible output values.

This result is only really valid for serial machines; parallel ones
search more per iteration so this would move you back towards the 2^64
number.  It does imply that you don't really get k-fold speedup with k
machines if you take this effect into consideration.

>	Jim Gillogly
>	18 Halimath S.R. 1994, 16:12

Gee, my calendar must be off!

Hal

```