[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: fast modular reduction

To: [email protected] (Ray Cromwell)
Subject: Re: fast modular reduction
From: Ray Cromwell <[email protected]>
Date: Thu, 7 Sep 1995 04:11:08 -0400 (EDT)
Cc: [email protected], [email protected]
In-Reply-To: <[email protected]> from "Ray Cromwell" at Sep 6, 95 09:57:53 pm
Sender: [email protected]

I wrote:
> modifications. Is there an algorithmic analysis available? I also
> I think there is a bug in your description. Let k+1 = n+1
> (e.g. the dividend is 1 more "block" than the modulus). Then
> i=n starting out, and we have

 Upon a closer look, I see there's no mistake. The algorithm will
never reach k=n because the loop stops at n+1.

 Anyway, I played around with the algorithm a little, and it's neat
and easy to implement, but the speed increase is not worth
the patent hassle (assuming there is a speed increase, I saw none)

  The algorithm is still basically O(n^2) if used in a modexp
routine. It requires n^2 multiplications and additions. Whereas,
a typical Karatsuba multiplication using a high precision
reciprocal will only use 2*n^1.5 multiplications and 5*n^1.5/8
additions. (for n=64 which is a 2048-bit number being reduced, 
it's about 1/5 the multiplications, but 5 times the additions)

Two other possible algorthms are:

Let P(x) = sum(i=0 to n-1) a_i x^i be a multiprecision integer
radix x.

If m is a modulus, of length n/2, rewrite P(x) as

sum(i=0 to n/2-1) a_i x^i  + x^(n/2) (a_{n/2 + i} x^i)

break the summation into two parts. Focus on the second term.
(both terms are not equal, or one digit larger than the modulus)
Perform modular reduction of the right hand polynomial using
Horner's method

x*(x*(x*...(x*a_i + a{i-i} mod m)mod m)mod m) 

Those internal mod m's can be done quickly with a 2-digit
trial quotient estimation.

It's still O(n^2), but might be quicker.

Still another technique..

Rewrite P(x) 

(a_0 + a_2 x^2 + a_4 x^4 + ...) + x (a_1 + a_3 x^2 + a_5 x^4 + ...)

[broken into two Polys with odd and even terms)
Factor out x^2 out of each piece and write

a_0 + ((a_2 + a_4 x^2 + a_6 x^4 + ...)*x^2) + 
x*(a_1 + x^2*(a_3 + a_5 x^2 + a_7 x^4 + ...)

Now keep applying the recursive rule until the length of the
poly pieces are the same or smaller than the modulus. Now,
start evaluating from the inner layers. Multiply each piece by
x^2 (two shifts), and take the mod. Sum the results, shifting one side by 
1 (for the x factor). Shifts are free because an array representation
yields a shift with a pointer movement.

It looks kinda like the method for evaluating  FFTs a little bit,
but it's not. Just something off the top of my head just now.
(I hereby place it in the public domain assuming it's worth
anything, no patents please)  I think with a clever implementation, you can 
trade some mults for more adds, but still use less additions than russian 
peasant.

-Ray

Follow-Ups:
- Re: fast modular reduction
  - From: Wei Dai <[email protected]>

References:
- Re: fast modular reduction
  - From: Ray Cromwell <[email protected]>

Prev by Date: Re: What is truth?
Next by Date: subscripe
Prev by thread: Re: fast modular reduction
Next by thread: Re: fast modular reduction
Index(es):
- Date
- Thread