Re: Statistical analysis of anonymous databases
One solution to this is to have a database that 'generalizes' its
answers as it provides them. For example, rather than returning
Clay Olbon, 32, m, left handed, cholesterol 350, bp 200/160, 5'9", 175#,
it would return:
fooblat martin,25-35, m, left handed, cholest. 3-400, 5.5-6ft, heavy.
Researchers could then supply the ranges they want answers in. Thus, if
I'm very concerned about the correlation between age and weight, I could
get those two fields very specifically and everything else only in
general terms.
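As a rough illustration of the generalizing step, here is a minimal
Python sketch. The field names, bucket widths, and the bucket() and
generalize() helpers are all assumptions made up for this example, not
part of any existing system.

```python
def bucket(value, width, offset=0):
    """Return the half-open range [low, high) of the given width containing value."""
    low = ((value - offset) // width) * width + offset
    return (low, low + width)


def generalize(record):
    """Replace exact values with ranges; name and blood pressure are withheld."""
    return {
        "age": bucket(record["age"], 10, offset=5),       # assumed 10-year bands
        "sex": record["sex"],
        "handedness": record["handedness"],
        "cholesterol": bucket(record["cholesterol"], 100),
        "height_in": bucket(record["height_in"], 6),      # 6-inch bands
        "weight_lb": bucket(record["weight_lb"], 25),     # assumed 25 lb bands
    }


# The example record from the message, with height in inches (5'9" = 69).
record = {
    "name": "Clay Olbon", "age": 32, "sex": "m", "handedness": "left",
    "cholesterol": 350, "bp": (200, 160), "height_in": 69, "weight_lb": 175,
}
general = generalize(record)
```

With these widths the record comes back as age 25-35, cholesterol
300-400, and height 66-72 inches (5.5-6 ft). A researcher who needs
finer age data would ask for a smaller width on that one field.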
The generalization filter could be written to allow only N queries at
a given level of detail, so that the more detail you want in one
area, the more you give up in others.
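One possible reading of that trade-off is a per-researcher budget of
"detail points". The sketch below is an assumption about how such a
filter might work: the DetailBudget class, the point costs, and the idea
of integer detail levels per field are all hypothetical.

```python
class DetailBudget:
    """Tracks how much detail one researcher may still request."""

    def __init__(self, points):
        self.remaining = points

    def request(self, detail_levels):
        """detail_levels maps field name -> requested detail level (0 = coarsest).

        The query costs the sum of the levels. It is refused, and nothing
        is charged, if the cost exceeds the remaining budget.
        """
        if any(level < 0 for level in detail_levels.values()):
            raise ValueError("detail levels must be non-negative")
        cost = sum(detail_levels.values())
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True


budget = DetailBudget(points=5)
first = budget.request({"age": 3, "weight": 2})   # spends all 5 points
second = budget.request({"cholesterol": 1})       # refused: nothing left
```

Spending three points on age leaves only two for everything else, which
is the "more here, less there" behaviour described above. Refused
requests are the natural thing to hand to a review committee.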
There could be a review committee (this is the way hospitals and
medical research work) to review requests for more specific data.
Doctors like having names, so you could generate arbitrary names for
patients, or use a syllable generator to come up with pronounceable
nonsense.
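A syllable generator of that sort can be very small. This is only a
sketch: the letter sets, the consonant-vowel pattern, and the
nonsense_name() helper are assumptions chosen for the example.

```python
import random

CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def nonsense_name(rng, syllables=3):
    """Build a pronounceable name from consonant+vowel syllables."""
    parts = (rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))
    return "".join(parts).capitalize()


# random.SystemRandom draws from the OS entropy source, so the names are
# not predictable from a seed. Pass random.Random(seed) for repeatable tests.
rng = random.SystemRandom()
first_name = nonsense_name(rng, syllables=3)
last_name = nonsense_name(rng, syllables=2)
```

The database would still need to remember which made-up name goes with
which record, so that a patient's later additions land in the right
place, and it would need to check for collisions, since nothing here
guarantees uniqueness.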
Adam
Clay Olbon II wrote:
| In medical research (this particular application - there are others I am
| sure) it is desirable to have a large database of individual medical
| histories available to search for correlations, risk factors, etc. The
| problem, of course, is that many individuals want their medical histories
| kept private. It is therefore necessary to maintain a database that is not
| traceable back to individuals. An additional requirement is that people
| must be able to add additional information to their records as it becomes
| available. The researcher who initially posed the question suggested
| adding random data to "encrypt anonymity".
|
--
"It is seldom that liberty of any kind is lost all at once."
-Hume