
Re: alleged-RC4



Actually, in looking at the assembly code generated by three different
compilers (GCC on i386, GCC on PA, and HP's PA compiler), strangely
enough, the `% 256' should be `& 0xff' (it shaves a few instructions
off the inner loop, for some reason which isn't immediately apparent to
me).
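
One plausible explanation, offered as an assumption rather than something
checked against the three compilers above: if the index is a signed `int',
`% 256' has to yield a negative result for a negative operand, so the
compiler emits sign fix-up code around the mask, while `& 0xff' is a single
AND. A minimal sketch of the difference (the function names are
hypothetical, not from the posted code):

```c
/* Hypothetical helpers illustrating the two ways of wrapping an index. */

/* Signed remainder: C99 and later truncate toward zero, so a negative
 * input gives a negative result. The compiler cannot reduce this to a
 * plain mask unless it can prove x is never negative. */
static int wrap_mod_signed(int x)
{
    return x % 256;
}

/* Plain mask: always lands in 0..255 on two's-complement machines. */
static int wrap_mask(int x)
{
    return x & 0xff;
}

/* Unsigned remainder by a power of two: compilers typically turn this
 * into the same AND as wrap_mask(), since no sign handling is needed. */
static unsigned wrap_mod_unsigned(unsigned x)
{
    return x % 256u;
}
```

For non-negative inputs all three agree; they only differ when a signed
operand goes negative, which the RC4 indices never do, so the mask is safe
there.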

On the PA, I got a ~30% speedup by unrolling the inner loop 4x,
assembling the pad into an `unsigned long', and doing one 4-byte-wide
XOR with the user data.  I think most of the speedup comes from giving
the instruction scheduler more instructions to reorder to avoid
load-store conflicts.  Your mileage will vary on other architectures.
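
A rough sketch of the unrolling described above, not the code I actually
timed: the names `rc4_state', `rc4_init', `rc4_byte' and `rc4_crypt' are
hypothetical, and `uint32_t' stands in for the 4-byte `unsigned long'.
Going through memcpy keeps the pad bytes in memory order, so the result
does not depend on byte order or buffer alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state layout: the permutation plus the two indices. */
typedef struct {
    uint8_t s[256];
    uint8_t i, j;
} rc4_state;

/* Standard key schedule for alleged-RC4; keylen must be nonzero. */
static void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen)
{
    unsigned i, j = 0;

    for (i = 0; i < 256; i++)
        st->s[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        uint8_t t;
        j = (j + st->s[i] + key[i % keylen]) & 0xff;
        t = st->s[i];
        st->s[i] = st->s[j];
        st->s[j] = t;
    }
    st->i = 0;
    st->j = 0;
}

/* One keystream byte; note `& 0xff' rather than `% 256'. */
static inline uint8_t rc4_byte(rc4_state *st)
{
    unsigned i = (st->i + 1u) & 0xff;
    unsigned j = (st->j + st->s[i]) & 0xff;
    uint8_t a = st->s[i], b = st->s[j];

    st->s[i] = b;               /* swap s[i] and s[j] */
    st->s[j] = a;
    st->i = (uint8_t)i;
    st->j = (uint8_t)j;
    return st->s[(a + b) & 0xff];
}

/* Encrypt or decrypt buf in place, four bytes per loop iteration. */
static void rc4_crypt(rc4_state *st, uint8_t *buf, size_t len)
{
    while (len >= 4) {
        uint8_t pad[4];
        uint32_t k, word;

        /* Unrolled 4x: generate four pad bytes back to back. */
        pad[0] = rc4_byte(st);
        pad[1] = rc4_byte(st);
        pad[2] = rc4_byte(st);
        pad[3] = rc4_byte(st);

        /* Assemble the pad into one word and do a single wide XOR. */
        memcpy(&k, pad, 4);
        memcpy(&word, buf, 4);
        word ^= k;
        memcpy(buf, &word, 4);

        buf += 4;
        len -= 4;
    }
    while (len-- > 0)           /* leftover 0..3 bytes, one at a time */
        *buf++ ^= rc4_byte(st);
}
```

Calling rc4_init() and then rc4_crypt() twice with the same key (with a
fresh rc4_init() in between) should give back the original buffer. Whether
this beats the byte-at-a-time loop depends on the compiler and the
machine.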

					- Bill