
Identifying GIFs, was Re: criminal gif upload



In message Tue,  5 Oct 1993 17:11:17 -0400 (EDT),
  Matthew J Ghio <[email protected]>  writes:
>  Seriously tho, just posting a list of MS-DOS filenames is rather
> useless as filenames do get changed.  It is highly likely that a sysop
> or user might have changed the filenames to something else, especially
> if their operating system supported filenames longer than 8 characters.

Doesn't this bring up a fundamental question: when are two files equivalent?
We can easily use MD5 or brik to identify identical files.
But GIFs and other image files (MPEG, JPEG, TIFF, etc.) are subject to both
lossy compression and steganographic coding techniques.
If you change one pixel of the background, the checksums are different, but
it will still show *porm or whatever to a judge who "knows it when he sees
it."
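
To make the one-pixel point concrete, here is a minimal sketch in Python.
It assumes a made-up 8x8 grayscale image stored as 64 raw bytes; the
names md5_hex, original, and altered are illustrative only, and a real
GIF would first have to be decoded to pixels.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as hex."""
    return hashlib.md5(data).hexdigest()

# Hypothetical 8x8 grayscale "image": one byte per pixel, flat gray.
original = bytes([128] * 64)

# A byte-for-byte copy hashes identically.
same_copy = bytes(original)

# Nudge a single background pixel by one gray level.
buf = bytearray(original)
buf[0] = 129
altered = bytes(buf)

print(md5_hex(original) == md5_hex(same_copy))  # identical files match
print(md5_hex(original) == md5_hex(altered))    # one changed pixel does not
```

The digest tells you only whether the bytes are the same; it says
nothing about how close two different files are.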

With strong hashing functions we can show that the chance of two different
files sharing a hash is statistically insignificant. Can we find a way to
statistically prove "looks like" on a numerical basis?
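
One possible direction, offered only as a sketch and not as an answer to
the question: reduce each image to a short fingerprint that ignores small
changes, then count differing bits. The code below uses a simple "average
hash" (one bit per pixel, set when the pixel is brighter than the image
mean). The technique and the names average_hash and hamming are
assumptions brought in for illustration, the 8x8 pixel lists are made up,
and a small distance suggests similarity without proving it.

```python
def average_hash(pixels):
    """Fingerprint a flat list of grayscale values (0..255).

    Each pixel contributes one bit: 1 if brighter than the mean, else 0.
    """
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")

# Hypothetical 8x8 image: dark left half, bright right half.
image = [30 if (i % 8) < 4 else 220 for i in range(64)]

# Same image with one background pixel changed by one gray level.
tweaked = list(image)
tweaked[0] = 31

# A genuinely different picture: the photographic negative.
inverted = [255 - p for p in image]

print(hamming(average_hash(image), average_hash(tweaked)))   # small distance
print(hamming(average_hash(image), average_hash(inverted)))  # large distance
```

Where to draw the line between "same picture" and "different picture" is
still a judgment call; the number only makes the comparison repeatable.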

Pat

Pat Farrell      Grad Student                 [email protected]
Department of Computer Science    George Mason University, Fairfax, VA
Public key available via finger         #include <standard.disclaimer>