Re: alleged-RC4
Actually, looking at the assembly code generated by three different
compilers (GCC on i386, GCC on PA, and HP's PA compiler), the `% 256'
should, strangely enough, be `& 0xff': it shaves a few instructions
off the inner loop, for some reason which isn't immediately apparent
to me.
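One plausible explanation, offered only as a guess: if the index variables in the inner loop are plain (signed) `int', the two expressions are not equivalent for negative values, so the compiler cannot reduce `% 256' to a single AND. The posted code is not reproduced here, so the helper names below are hypothetical and the signedness is an assumption.

```c
/* Hypothetical helpers for illustration; not taken from the posted code. */

/* Signed remainder: C99 truncates toward zero, so a negative input gives
 * a negative (or zero) result. The compiler has to emit sign fix-up
 * code around the mask to preserve that behavior. */
int wrap_mod_signed(int x) { return x % 256; }

/* Bit mask: always yields a value in 0..255 on two's-complement
 * machines, and typically compiles to a single AND. */
int wrap_and(int x) { return x & 0xff; }

/* Unsigned remainder: has the same result as the mask for every input,
 * so compilers usually reduce it to the same AND. */
unsigned wrap_mod_unsigned(unsigned x) { return x % 256u; }
```

If that guess is right, declaring the indices `unsigned' would likely get the same code out of `% 256' as out of `& 0xff'; comparing the generated assembly is the only way to be sure for a given compiler.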
On the PA, I got a ~30% speedup by unrolling the inner loop 4x,
assembling the pad into an `unsigned long', and doing one 4-byte-wide
XOR with the user data. I think most of the speedup comes from giving
the instruction scheduler more instructions to reorder to avoid
load-store conflicts. Your mileage will vary on other architectures.
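The unrolling described above might look roughly like the sketch below. It is not the code that was measured: the struct layout and function names are hypothetical, `uint32_t' stands in for the 4-byte `unsigned long' on the PA, and `memcpy' is used for the word load and store so the sketch does not depend on alignment or byte order. Whether it is actually faster depends on the compiler and the architecture.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state layout; names are illustrative only. */
typedef struct { uint8_t s[256]; unsigned i, j; } rc4_state;

/* Standard key setup for the alleged-RC4 cipher. */
void rc4_init(rc4_state *st, const uint8_t *key, size_t keylen) {
    unsigned i, j = 0;
    for (i = 0; i < 256; i++) st->s[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        uint8_t t = st->s[i];
        j = (j + t + key[i % keylen]) & 0xff;
        st->s[i] = st->s[j];
        st->s[j] = t;
    }
    st->i = st->j = 0;
}

/* Produce one byte of pad (keystream). */
static inline uint8_t rc4_next(rc4_state *st) {
    uint8_t a, b;
    st->i = (st->i + 1) & 0xff;
    a = st->s[st->i];
    st->j = (st->j + a) & 0xff;
    b = st->s[st->j];
    st->s[st->i] = b;               /* swap s[i] and s[j] */
    st->s[st->j] = a;
    return st->s[(a + b) & 0xff];
}

/* Reference version: one XOR per byte. */
void rc4_crypt_bytes(rc4_state *st, uint8_t *buf, size_t len) {
    size_t n;
    for (n = 0; n < len; n++) buf[n] ^= rc4_next(st);
}

/* Unrolled 4x: gather four pad bytes, then do one 4-byte-wide XOR. */
void rc4_crypt_unrolled(rc4_state *st, uint8_t *buf, size_t len) {
    while (len >= 4) {
        uint8_t k[4];
        uint32_t pad, word;
        k[0] = rc4_next(st);
        k[1] = rc4_next(st);
        k[2] = rc4_next(st);
        k[3] = rc4_next(st);
        memcpy(&pad, k, 4);         /* assemble the pad in memory order */
        memcpy(&word, buf, 4);      /* alignment-safe load of user data */
        word ^= pad;
        memcpy(buf, &word, 4);
        buf += 4;
        len -= 4;
    }
    for (; len > 0; len--) *buf++ ^= rc4_next(st);  /* leftover bytes */
}
```

Both routines should produce identical output for the same key and input, which makes the byte-wise version a convenient check on the unrolled one.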
- Bill