

Re: entropy

Eric Hollander writes:
>I seem to remember that English text is about 1.5 bits per character.  I can
>find a reference if you're interested.

There are lots of entropies available to measure.  There is "true"
entropy, the lower bound for all other entropy measures.  This is the
compressibility limit.

The entropy I was referring to was simply the single-character
entropy.  That is, the probabilities p_i in the entropy expression are
the probabilities that a given single character appears in the text.
This will be higher than the true entropy.  Shannon's estimate for H_1
was 4.03 bits/character.  This assumes a 27-character alphabet (the 26
letters plus space).  The entropy for ASCII-represented English will be
higher because of punctuation and capitals.
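
For concreteness, here is a minimal Python sketch of the single-character
entropy described above, H_1 = -sum(p_i * log2(p_i)).  The function names
single_char_entropy and to_27_letter_alphabet are made up for this
illustration, and the folding step is only one plausible way to reduce
ASCII text to a 27-character alphabet.

```python
import math
from collections import Counter


def single_char_entropy(text):
    """Return H_1 = -sum(p_i * log2(p_i)) in bits per character,
    where p_i is the observed frequency of character i in text."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def to_27_letter_alphabet(text):
    """Fold text to lowercase a-z plus space, dropping everything else.
    (An assumed normalization, not necessarily the one Shannon used.)"""
    return "".join(c for c in text.lower() if c == " " or "a" <= c <= "z")
```

Running single_char_entropy on raw ASCII text and again on the folded
text gives a rough idea of how much punctuation and capitals add.  The
result depends on the sample, so a short text will not reproduce the
4.03 figure exactly.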

The true entropy of English is much lower than this, of course.  But
for a simple measure to automatically distinguish between plaintext
and ciphertext, it should suffice.

Re: uuencoding.  In my earlier analysis I assumed that the uuencoding
would be of random data.  If it is not from random data, then the
entropy will be lower.  Thanks for the clarification.
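
A rough way to check the uuencoding point is sketched below in Python.
It uuencodes random bytes and a repetitive English phrase, then compares
the single-character entropy of the two outputs.  The helper names, the
sample phrase, and the sample sizes are all assumptions chosen for this
illustration.  Since uuencoding maps each 6 bits of input to one of 64
printable characters, random input should come out close to 6
bits/character, and non-random input should come out lower.

```python
import binascii
import math
import os
from collections import Counter


def entropy_bits_per_char(data):
    """Single-character entropy of a bytes object, in bits per character."""
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def uuencode_body(raw):
    """uuencode raw bytes as 45-byte lines, without begin/end header lines."""
    return b"".join(
        binascii.b2a_uu(raw[i:i + 45]) for i in range(0, len(raw), 45)
    )


# Hypothetical samples: random bytes versus a repeated English phrase.
random_encoded = uuencode_body(os.urandom(45 * 2000))
text_encoded = uuencode_body(b"the quick brown fox jumps over the lazy dog " * 2000)

print("uuencoded random data:", entropy_bits_per_char(random_encoded))
print("uuencoded English text:", entropy_bits_per_char(text_encoded))
```

The line-length character and the newline on each line are counted here
too, so the figure for random data will not be exactly 6.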