[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Dictionary searching code
At 8:02 PM 4/19/96 -0500, Adam Shostack wrote:
> Does anyone have some code that will search a dictionary, and
>tell me *quickly* if an arbitrary chunk of text is in the dictionary?
>Pre-indexing steps are fine, as is using big chunks of disk for hash
>tables. The point of course, is to check arbitrary possible plaintext
>that a test decryption produces.
This application sounds perfect for Bloom filters. The basic idea of a
Bloom filter is to build a database by taking each word in the dictionary
and hashing with N different hashes. The hashes do not need to by
cryptographically secure, but they do need to be good hashes, XOR doesn't
make it. You use those hashes as bit offsets in a giant bit map, which is
the database. When building the database you turn on the bits at each of
these N locations.
When accessing the database, you hash the chunk of text with the same
hashes, and then test the bits in the database at those offsets. If any of
the bits are zero, then the chunk of text is not in the dictionary.
The failure mode is to say something is in the dictionary when it isn't.
If half the bits in the database are on, then the probability of failure is
2**(-N), so if N==10, then the failure rate is 1 in 1024. If empirically,
you get a higher failure rate, check the quality of your hashes.
For cryptanalysis, I might pick a higher N and eyeball check for failures.
Say you want 1 in a million failure rate, and have an 80,000 word
dictionary. You need a 20 * 80,000 * 2 bit database, which is 400,000
bytes.
Regards - Bill
------------------------------------------------------------------------
Bill Frantz | The CDA means | Periwinkle -- Computer Consulting
(408)356-8506 | lost jobs and | 16345 Englewood Ave.
[email protected] | dead teenagers | Los Gatos, CA 95032, USA