Re: how much entropy in common answers



At 01:26 PM 11/23/96 -0800, Hal Finney wrote:
>From: Pat Farrell <[email protected]>
>> Clearly there are cultural issues involved. The entropy in a question
>> such as "what is your favorite brother's name?" is low in an Irish
>> family like mine where names cluster around choices such as Patrick,
>> John, Sean, and Dan.
>> So how do we measure the entropy objectively?
>
>You have to estimate the probability that the attacker will guess what you
>have chosen.  This will depend on how much the attacker knows about you.
>If he knows that you're Irish, it will help in the question above.  If he
>knows the names of your brothers, it will help a lot more.  Probably
>it is best to be conservative in assuming what your attacker knows.

I was really hoping for some insight into the general problem.
If you knew that my family is Irish, that makes certain names
much more likely. Obviously if you know that I've got five brothers,
a little bit of work will probably let you know that they are Tom, Dick,
Harry, Mike, and John. But that is an example of a terrible question for Carl's
approach. I was asking the more general question.

Carl suggested that in general a first name has about eight bits of entropy.
But knowledge of the social environment can seriously reduce it. Jennifer was
a hugely popular name for girls in the US 10 to 20 years ago. You'd
expect more Juans and Joses in a Hispanic community, just like you'd expect
the Dans, Pats, and Mikes in an Irish community.

I know what the classic definition of entropy is, but without knowledge of
the statistical universe that we're dealing with, how can I measure it?
The probability that a male's first name is Harry is probably pretty low
in general, yet it is exactly 20% if you restrict the world to my brothers.
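
To make the dependence on the statistical universe concrete, the classic
Shannon entropy can be computed over an explicit distribution. This is a
minimal sketch: the function name and the 256-name uniform universe are
assumptions, chosen only so that the result matches the eight-bit figure.
Real name frequencies are far from uniform.

```python
import math

def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Assumed universe 1: 256 equally likely first names (hypothetical).
uniform_256 = [1 / 256] * 256

# Universe 2: the answer is restricted to five equally likely brothers,
# so each name has probability 20%.
brothers = [1 / 5] * 5

print(f"{shannon_entropy_bits(uniform_256):.2f} bits")  # 8 bits by construction
print(f"{shannon_entropy_bits(brothers):.2f} bits")     # log2(5), about 2.32 bits
```

The same name carries a very different number of bits depending on which
distribution the attacker is assumed to be guessing from.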

Carl suggested "What was the name of the first person on whom I had a crush?"
But if 33% of the women are named "Maria" in the local universe, then
that is not much entropy. Yet a name of "Maria McGee" is probably 
fairly high entropy, as it is an unlikely combination. If you were raised
in a small rural area, there might not be all that many possible
answers to Carl's question.
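
The "Maria" case can be put in numbers with the surprisal of a single
answer, -log2(p). The sketch below is illustrative only: the 33% figure is
from the example above, while the surname frequency and the assumption that
first name and surname are independent are made up for the illustration.

```python
import math

def surprisal_bits(p):
    """Bits of surprise in one answer that occurs with probability p."""
    return -math.log2(p)

p_maria = 0.33   # from the example: 33% of women in the local universe
p_mcgee = 0.001  # hypothetical surname frequency, not a measured value

# A first name shared by a third of the population is worth about 1.6 bits.
print(f"Maria alone: {surprisal_bits(p_maria):.2f} bits")

# If first name and surname were independent (an assumption), the
# probabilities multiply and the bits add.
print(f"Maria McGee: {surprisal_bits(p_maria * p_mcgee):.2f} bits")
```

In a small community the independence assumption is the weak point, since
families share surnames and the pool of candidates is short.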

>If you have four brothers and nobody whom the attacker could ask will
>know who is your favorite, but you think he could find out their names,
>then he has probably a 1/4 chance of guessing right.  (Actually he
>might do better by preferring older brothers rather than younger, etc.)

This is exactly the type of local social bias that I want to measure.
We would expect that an older brother could be a role model, etc. and thus
be more likely to be the "favorite".
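
A rough way to see what such a bias costs is to compare a uniform choice
among four brothers with a skewed one. The skewed weights below are purely
hypothetical, not measured; the sketch only shows how the attacker's
first-guess odds and expected number of guesses change once he prefers
older brothers.

```python
import math

def min_entropy_bits(probs):
    """Bits implied by the attacker's best single guess: -log2(max p)."""
    return -math.log2(max(probs))

def expected_guesses(probs):
    """Average number of guesses when trying answers from most to least likely."""
    ordered = sorted(probs, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))

uniform = [0.25, 0.25, 0.25, 0.25]  # no bias: 1/4 chance per brother
skewed = [0.4, 0.3, 0.2, 0.1]       # hypothetical bias toward older brothers

for label, dist in (("uniform", uniform), ("skewed", skewed)):
    print(label,
          f"first-guess bits={min_entropy_bits(dist):.2f}",
          f"expected guesses={expected_guesses(dist):.2f}")
```

Being conservative means scoring the answer by the skewed numbers, not the
uniform ones.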

How do I know when I've got Carl's 90 bits of entropy?
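
One rough bookkeeping approach is to add up conservative per-answer
estimates until they reach the target. This is only a sketch: it assumes
the answers are independent of one another, which related personal
questions often are not, and the helper name is made up for illustration.

```python
import math

def answers_needed(target_bits, bits_per_answer):
    """Number of independent answers needed to reach target_bits."""
    return math.ceil(target_bits / bits_per_answer)

# At the suggested eight bits per first name:
print(answers_needed(90, 8))    # 12 answers

# At roughly 1.6 bits per answer, as when a third of the
# local population shares one name:
print(answers_needed(90, 1.6))  # 57 answers
```

The total is only as trustworthy as the per-answer estimates, so each one
should be taken against the narrowest universe the attacker might know.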

Pat

Pat Farrell    CyberCash, Inc. 			(703) 715-7834
[email protected]
#include standard.disclaimer