
ID of anonymous posters via word analysis?



In article <[email protected]> [email protected] writes:
 >   I think that identification by buzzwords, habitual misspellings, etc. 
 > could be used to identify anonymous posters. Sentence structure is also 
 > revealing. Le style, c'est l'homme, said Voltaire.  Of course, it all 
 > comes down to how much time and effort you want to put into proving, say, 
 > that SBoxx=LDetweiler.

I had a go at this just for fun when an8785 was doing his thing.  I'm
pretty sure I identified him correctly in the end.  (When I asked the
guy I thought it was, he said 'If I were, I wouldn't tell you', whereas
all the other people I suspected, though less strongly, denied it
violently, heh heh heh.)

I think this sort of analysis could be automated to a reasonable
extent, to cut out the Type I errors (false positives) that the guys
who did the Shakespeare/Bacon analysis made.  It's very easy to fool
yourself if you don't have predefined criteria of comparison and a
rigid marking scheme.
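
For what it's worth, here is a rough sketch of what "predefined
criteria and a rigid marking scheme" might look like in Python.  The
marker lists, the scaling of sentence length, and the function names
(style_profile, style_distance, rank_candidates) are all assumptions
made up for illustration, not an established method; the point is only
that the criteria are fixed before any text is looked at.

```python
import re
from collections import Counter

# Assumed marker lists, fixed in advance so the analyst cannot
# cherry-pick features after seeing the texts.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "i"]
PUNCTUATION = [",", ";", ":", "!", "(", "-"]


def style_profile(text):
    """Return a fixed-length list of style measurements for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)

    # Relative frequency of each function word.
    profile = [counts[w] / n_words for w in FUNCTION_WORDS]
    # Rate of each punctuation mark per character.
    profile += [text.count(p) / n_chars for p in PUNCTUATION]
    # Mean sentence length in words, scaled down by an arbitrary
    # factor of 100 so it does not swamp the other features.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    profile.append(n_words / max(len(sentences), 1) / 100.0)
    return profile


def style_distance(text_a, text_b):
    """Mean absolute difference between two profiles (smaller = more alike)."""
    pa, pb = style_profile(text_a), style_profile(text_b)
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)


def rank_candidates(anonymous_text, known_texts):
    """Return candidate names sorted from most to least similar in style."""
    return sorted(
        known_texts,
        key=lambda name: style_distance(anonymous_text, known_texts[name]),
    )
```

Given a dict mapping each suspect's name to a sample of his known
postings, rank_candidates(anon_posting, samples) returns the names in
order of stylistic closeness.  With only a handful of features and
short samples the ranking proves nothing by itself; a real attempt
would want far more markers and much more text per author.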

I'm fairly sure that a sufficiently detailed analysis looking at enough
different points of style would still catch someone's fingerprint even if
they went out of their way to disguise their postings.  The only approach
I can think of that would be successful in hiding individual style is for
person A to write something and person B to read it quickly, then attempt
to write something with the same semantic content; of course it will then
have B's grammar, phraseology and punctuation idiosyncrasies.  (And
this only works if B is not a net poster, otherwise you recognise B and
work out who his friends are :-) )

G
-- 
Personal mail to [email protected] (I read it in the evenings)
Business mail to [email protected] (Be careful with the spelling!)
Faxes to An Teallach Limited: +44 31 662 4678  Voice: +44 31 668 1550 x212