
Re: Statistical analysis of anonymous databases




One solution to this is to have a database that 'generalizes' its
answers as it provides them.  For example, rather than returning 

Clay Olbon, 32, m, left handed, cholesterol 350, bp 200/160, 5'9", 175#,

it would return:

fooblat martin, 25-35, m, left handed, cholest. 300-400, 5.5-6ft, heavy.

Researchers could then specify the ranges they want in their queries.
Thus, if I'm mainly concerned about the correlation between age and
weight, I could get those two fields in fine detail and nothing else.
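
As a rough sketch of what such a generalizing step might look like (the
field names, bucket widths, and the helper names bucket() and
generalize() are my own assumptions, not anything from a real system),
each exact value is replaced by the fixed-width range that contains it
and the real name is dropped:

```python
def bucket(value, width):
    """Return the half-open range (low, high) of the given width containing value."""
    low = (value // width) * width
    return (low, low + width)


def generalize(record):
    """Replace exact fields with coarse ranges and omit the real name.

    The bucket widths here are arbitrary illustrative choices.
    """
    return {
        "age": bucket(record["age"], 10),
        "sex": record["sex"],
        "handedness": record["handedness"],
        "cholesterol": bucket(record["cholesterol"], 100),
        "height_in": bucket(record["height_in"], 6),
        "weight_lb": bucket(record["weight_lb"], 25),
    }


# The example record from above, with height converted to inches.
record = {
    "name": "Clay Olbon",
    "age": 32,
    "sex": "m",
    "handedness": "left",
    "cholesterol": 350,
    "height_in": 69,
    "weight_lb": 175,
}

print(generalize(record))
```

Note that the buckets here are aligned to multiples of the width, so age
32 lands in 30-40 rather than the 25-35 of the hand-written example.
Fields the sketch does not list (such as blood pressure) are simply not
returned.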

The generalization filter could be written to only allow N queries of
a given level of detail, so that the more detail you want in one
area, the more you give up in others.
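
One possible reading of that trade-off, sketched under my own
assumptions: every field in a query is requested at a numeric detail
level (0 = omitted, higher = finer), and the levels in a single query
may not add up to more than a fixed budget. The class name DetailBudget
and the level scale are hypothetical; counting queries per researcher
would be a different, equally plausible design.

```python
class DetailBudget:
    """Caps the total detail a single query may request across all fields."""

    def __init__(self, total):
        self.total = total

    def check(self, request):
        """Validate a request mapping field name -> detail level.

        Returns the unused budget, or raises PermissionError if the
        request asks for more total detail than is allowed.
        """
        spent = sum(request.values())
        if spent > self.total:
            raise PermissionError(
                f"request costs {spent}, budget is {self.total}"
            )
        return self.total - spent


budget = DetailBudget(total=4)

# Fine detail on two fields, nothing else: allowed.
print(budget.check({"age": 2, "weight": 2}))

# Fine detail on everything: refused.
try:
    budget.check({"age": 2, "weight": 2, "cholesterol": 2})
except PermissionError as err:
    print("refused:", err)
```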

There could be a review committee (this is the way hospitals and
medical research work) to review requests for more specific data.

Doctors like having names, so you could generate arbitrary names for
patients, or use a syllable generator to come up with pronounceable
nonsense.
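
A syllable generator of that kind can be very small. This is only a
sketch: the consonant and vowel sets and the function names pseudonym()
and full_name() are made up for illustration, and a real system would
also have to decide how (or whether) to keep a name attached to the same
record over time.

```python
import random

# Illustrative letter sets; any pronounceable alphabet would do.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def pseudonym(rng, syllables=2):
    """Build a word from consonant+vowel syllables, e.g. two syllables = 4 letters."""
    return "".join(
        rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables)
    )


def full_name(rng):
    """Return a capitalized two-word nonsense name."""
    first = pseudonym(rng, 3).capitalize()
    last = pseudonym(rng, 2).capitalize()
    return f"{first} {last}"


rng = random.Random()
print(full_name(rng))
```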


Adam

Clay Olbon II wrote:

| In medical research (this particular application - there are others I am
| sure) it is desirable to have a large database of individual medical
| histories available to search for correlations, risk factors, etc.  The
| problem, of course, is that many individuals want their medical histories
| kept private.  It is therefore necessary to maintain a database that is not
| traceable back to individuals.  An additional requirement is that people
| must be able to add additional information to their records as it becomes
| available.  The researcher who initially posed the question suggested
| adding random data to "encrypt anonymity".
| 

-- 
"It is seldom that liberty of any kind is lost all at once."
					               -Hume