
suppressing duplicates based on MD5 of body



It's been more than 30 hours since I sent the first copy of this message
to [email protected], and I still haven't seen it on either the
moderated or the flames lists @toad.com.  Suspecting that a particular
naughty word was to blame, I inserted some hyphens and resent the
message several hours later.  That too has failed to reach me on either
the moderated or the flames lists @toad.com.  I am now kmtkujatwv to
several other flavours of the list, and will be interested to see which
(if any) get *this* message, in which the naughty word is encrypted
using the well-known ROT-n algorithm, with a key that I will keep
secret.

Here's a procmail recipe for suppressing duplicate messages based on
the MD5 of a "normalised" version of the body of the messages.  Folk who
celcmbslo to more than one of the cypherpunks lists may find it useful.

:0
* (Sender: |Return-Path: |Received:.*for.*)(owner-)?cypherpunks
{

    # Detect duplicate messages based on MD5 of normalised body
    :0:.md5.lock
    * B ?? ? (m=`$HOME/bin/normalise-body | md5`; \
	echo "Message-ID: <$m@MD5>" \
	| formail -D 8192 .body-md5.cypherpunks.cache )
    cypherpunks-duplicates

    :0:
    cypherpunks
}
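
The inner recipe relies on formail -D exiting with success when the
Message-ID it is given is already in the cache, and otherwise adding
the ID and exiting with failure.  A rough sh sketch of that convention
follows.  The function name seen_before and the one-digest-per-line
cache file are hypothetical, and unlike formail's cache this one is not
limited to approximately 8192 bytes.

```shell
#!/bin/sh
# Hypothetical stand-in for the "formail -D" exit-status convention.
# $1 = digest, $2 = cache file (one digest per line, unbounded).
seen_before() {
    # Exact whole-line match: the digest was seen, so report "duplicate".
    if grep -qxF "$1" "$2" 2>/dev/null; then
        return 0
    fi
    # First sighting: remember the digest and report "not a duplicate".
    echo "$1" >> "$2"
    return 1
}

cache=$(mktemp)
for digest in aaa bbb aaa; do
    if seen_before "$digest" "$cache"; then
        echo "$digest: duplicate"
    else
        echo "$digest: new"
    fi
done
rm -f "$cache"
```

Since procmail treats a zero exit status as a matching condition, only
the duplicate case ends up in cypherpunks-duplicates.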


The "* B ?? ?" means "send the body of the message as input to
the following command, and test the command's exit status".
$HOME/bin/normalise-body is a simple perl script (appended) that deletes
trailing blanks on all lines and then deletes leading and trailing blank
lines.
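
To see why the normalisation matters, here is a small sh sketch that
hashes three bodies.  It does not use the appended script: normalise is
a hypothetical awk stand-in that follows the same description, and
digest is a hypothetical wrapper that assumes either md5sum or a
BSD-style md5 is installed.

```shell
#!/bin/sh
# Hypothetical awk stand-in for normalise-body: strip trailing white
# space, drop leading and trailing blank lines, keep interior ones.
normalise() {
    awk '{ sub(/[ \t\r]+$/, "") }
         /^$/ { blank++; next }
         { if (seen) for (i = 0; i < blank; i++) print ""
           print; blank = 0; seen = 1 }'
}

# Print only the hex digest; prefer md5sum, fall back to BSD md5.
digest() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum | cut -d' ' -f1
    else
        md5
    fi
}

# a and b differ only in trailing blanks and leading/trailing blank lines.
a=$(printf '\n\nhello   \n\nworld\n\n' | normalise | digest)
b=$(printf 'hello\n\nworld\n' | normalise | digest)
# c drops the interior blank line, so it is a different body.
c=$(printf 'hello\nworld\n' | normalise | digest)

[ "$a" = "$b" ] && echo "a and b hash alike"
[ "$a" != "$c" ] && echo "a and c hash differently"
```

Copies of a message that picked up extra blank lines or trailing spaces
on the way through different lists therefore end up with the same
pseudo Message-ID.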

--apb (Alan Barrett)

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	normalise-body
#
echo x - normalise-body
sed 's/^X//' >normalise-body << 'END-of-normalise-body'
X#!/usr/bin/perl
X
X# A very weak attempt at normalising the body of a mail message.
X# Removes trailing white space on all lines, and removes leading
X# and trailing blank lines.
X# Does not attempt to normalise any MIME content-transfer-encoding.
X
X$total_nonblank_lines = 0;
X$consecutive_blank_lines = 0;
Xwhile (<>) {
X    s/\s+$//;
X    if (/^$/) {
X	$consecutive_blank_lines++;
X    } else {
X	print "\n" x $consecutive_blank_lines if $total_nonblank_lines;
X	print "$_\n";
X	$consecutive_blank_lines = 0;
X	$total_nonblank_lines++;
X    }
X}
END-of-normalise-body
exit