[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: fast 386 DES code figures
Phil Karn <[email protected]> writes:
> I've completely translated the encrypt and decrypt routines
> to assembler, with no calls or jumps inside either routine.
> I picked up Richard Outerbridge's seriously clever initial
> and final permutation algorithm from Schneier, along with a
> few of his other tricks.
I should confess that I am probably the only person on the list
who has not yet read Schneier. So I apologize in advance if the
following comments turn out to be redundant.
> What still bugs me is that Schneier lists the speed of one
> commercial DES implementation as 40,600 encryptions/sec on a
> 33 Mhz 486. I just don't see how that's possible without
> using a lot more memory for lookup table space (I use only
> 2K, which is nice in a DOS environment).
Since 2k is exactly what is needed for a precomputed table which
combines the S-boxes and the wirecrossing, I will assume this is
the approach you used.
Given this data structure, there are a number of cute tricks
which will get DES down to around 30 machine instructions per
each of the 16 rounds on a machine with enough registers and a
decent set of addressing modes.
The important trick is to reorder the S-boxes so that you do
lookups on the odd numbered ones and the even numbered ones
separately. (1,3,5,7,2,4,6,8) works nicely. This permits the
results to be ORed together in two groups of four with all the
necessary indexing held in a single 32 bit register, which can be
appropriately repositioned each time. The precomputed key
schedule needs to be adjusted to reflect the new order. Note
that with this ordering, the blocks of six bits used for lookup
are byte aligned if you consider the even and odd S-boxes
separately.
If you store the upper two bits of lookup table addressing in the
precomputed key schedule and shift both it and the right hand
block left two bits, all explicit table indexing vanishes and you
can accumulate the result of a lookup with a single indexed OR
instruction.
I'm not sure what 30-something instructions per round translates
into for a 33 Mhz 486, but 40,600 encryptions per second doesn't
sound too outrageous using the above approach.
--
Mike Duvos $ PGP 2.6 Public Key available $
[email protected] $ via Finger. $