[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: OCR and Machine Readable Text



Before you get too carried away by the capabilities of off the shelf
character recognition, read Phil Karn's experience trying to produce
a working DES program from the pages of Applied Cryptography.  It was
easier than writing it himself, but it was still far from a turnkey
operation.  This is from <URL:
http://www.qualcomm.com/people/pkarn/export/karndecl.html >


> 5. I began by first photocopying, on a standard office photocopier, the
> 18 pages containing the Triple DES source code listing from Part V of
> the Book. This took about 5 minutes. Second, I scanned in the 18 sheets
> on a Macintosh Quadra 610 computer system equipped with an HP ScanJet II
> flatbed scanner and Omnipage Professional optical character recognition
> (OCR) software. The computer, scanner, and software are all readily
> available through normal consumer computer supply channels. The total
> scanning process took about one and a half hours. About an hour of this
> time was spent learning to use the scanning system and conducting trial
> runs, as I had only used it briefly some time ago. The actual scan of the
> 18 pages took about 15-20 minutes. Third, I transferred the resulting
> machine-readable file from the Macintosh to my own personal computer
> and brought it up under GNU EMACS, a popular and widely available text
> editing program that I have used for many years. In EMACS I compared,
> by eye, the scanned file displayed on my screen against the printed
> listing in the Book. I began correcting the scanner's many errors, such
> as mistaking the digit '0' for the letter 'O' or mistaking the vertical
> bar '|' for the letter 'I'.
> 
> 6. After manually correcting those errors noticed through visual
> comparison with the Book, I invoked the "C" language compiler on the
> (partially) corrected file. The compiler immediately pointed out
> additional errors I had overlooked in my visual inspection so I could
> also correct them by reference to the Book. I also noticed several errors
> in the listing printed in the Book. However, the programmer's intentions
> were obvious from the context of each error and were easily fixed. About
> fifty minutes later, I successfully compiled the file without error.
> 
> 7. The fourth step was to write a small test program to execute
> the DES code with the test vectors given at the end of the source
> code listing. This trivial program took less than 5 minutes to
> write. Unfortunately, the test did not succeed, meaning that at
> least one error went undetected by the compiler in either the code as
> printed in the Book or as scanned. Scrutinizing the code more closely,
> I quickly found another error in the printed version that was easily
> corrected. However, it still did not produce correct results. After about
> an hour of searching, I finally located the error in a list of numbers
> in a table -- another error in the printed version. By reference to the
> DES algorithm description in the first part of the Book, which includes
> the correct numbers in tabular form, I found and corrected the error.
> 
> 8. At this point the test finally succeeded, so I knew I had a correct
> program. However, to increase my confidence further I tried a few
> other DES test vectors that were not included in the source code, but
> were openly published by the US National Institute of Standards and
> Technology (NIST). All passed. At this point it was beyond doubt that I
> had a correct, working copy of the DES source code identical to that on
> the Disk with all errors removed, including those printed in the Book
> as well as those added by the scanning process.
> 
> 9. Finally, in about 40 minutes I wrote and debugged a "driver" program
> analogous to that included in the Crowell declaration. This driver program
> takes a sample plaintext file, encrypts it, displays the encrypted file
> in hexadecimal and then decrypts it.