# Re: fast modular reduction (proof?)

```> In the following pseudocode, B is the radix in which the numbers are
> represented (2^32 for a 32-bit machine), n is the length of modulus in
> blocks, U is B^(n+1) mod the modulus, X is the number to be reduced, k+1
> is the length of X, and Y is the result.
>
> 1. Y = X
> 2. For i from k down to n+1, repeat steps 3 and 4
> 3.	Y = Y - Y[i] * B^i + Y[i] * U * B^(i-n-1)
> 4.	If Y >= B^i, then Y = Y - B^i + U * B^(i-n-1)

To do a proof I rewrite the algorithm:

n = len(modulus)                   // modulus < B^n

Y = X                              // obviously Y = X mod modulus

K =  B ^ (n+1) - U                 // U = B ^ (n+1) mod modulus,
// therefore K = 0 mod modulus
// furthermore K > 0

for (i=len(Y)-1 ; i>n  ;  i--)
{

F = B ^ (i-n-1) * K             // F > 0
// F = 0 mod modulus

Y -=  Y[i] * F                  // Y shrinking, but
// Y still the same mod modulus

if ( Y >= B^i )
Y -= F                       // again shrinking,
// still the same mod modulus

}

This shows that Y was shrinking, but is still equal to X mod modulus.

To see whether Y really shrinks enough:

Y = sum(i=0..len(Y)-1)  Y[i] * B^i

In the step Y = Y - Y[i] * B^i the highest block of Y is deleted (what
could be done fast by reducing the length of Y).

Now  Y < B^i

Afterwards the same value mod modulus is added to keep Y constant:

Y = Y + Y[i] * U * B^(i-n-1).

Y[i]<B   ->  Y[i]+1  <= B

U < modulus < B^n , therefore U < B^n

->    (Y[i]+1) * U < B * B^n  = B^(n+1)

->    Y[i] * U  < B^(n+1) - U

->    Y[i] * U *  B^ (i-n-1 )  < F

Y < B^i + F

Check of the last step:

0 <= U < B^n therefore

B^n  <  B^(n+1) - U  <= B^(n+1)

Therefore in every loop

B^(i-1) < F <= B^i

->   Y-F  < B^i

Partial Correctness:

Y = X                              [ Y = X mod
Y < B^len(X) ]

K =  B ^ (n+1) - U                 [ K = 0 mod

B^n < K <= B^(n+1) ]

for (i=len(X)-1 ; i>n  ;  i--)
{
[ Y = X mod  ,  Y < B^(i+1) ]

F = B ^ (i-n-1) * K

[ F = 0 mod  ,   B^(i-1) < F <= B^i  ]

[ 0 <= Y[i] < B ]

[ Y[i] * F  = 0 mod ,
0 <= Y[i] * F  < B^(i+1) ]

[ Y >= Y[i] * B^i  ->  Y >= Y[i] * F ]

Y -=  Y[i] * F

[ Y = X mod ,
Y < B^i + F  (reason see above) ,
Y >= 0 ]

if ( Y >= B^i )
Y -= F                       // again shrinking,
// still the same mod modulus

[ Y = X mod ,
Y >= 0    ,
Y < B^i ]

}

Last i was n+1, therefore

Y = X mod ,  Y >= 0 , Y < B^(n+1)

This is not enough,  Y < B^n is requested. The loop can't be done once
more because i-n-1 would become negative.

k+1 was the length of X, and  n the length of the modulus. You walk
down from k to n+1 . In every loop you remove one block of the
number. This means you have to do len(X)-len(modulus) loops. In the
pseudocode you do only len(X)-len(modulus)-1 loops.

One loop seems to be missing. This may be a
result of confusion whether your Y starts with Y[0] or Y[1].

I do understand the algorithm as:

n = len(modulus)
U = B^n  mod modulus
K = B^n - U                         // = 0 mod modulus,  0 < K < B^n
Y = X

for(i=len(X)-1 ; i>= n ; i--)       // squeeze Block i in Number Y
{                                  // Y < B ^ (i+1)

F = B ^ ( i-n )  * K             // F = 0 mod modulus

Y -= Y[i] * F                    // subtract  Y[i] * B^i, now Y < B ^ i
// add the equivalent Y[i] * B^(i-n)*U <= F
// now Y < B^i + F

if ( Y >= B[i] )
Y -= F                         // now Y < B^i

}

Last i was n, therefore Y < B^n , Y = X mod modulus ,
but perhaps still Y >= modulus.

Ok, algorithm understood and agreed (after modifying the loop counter).

Any more agreement or disagreements?