[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
suppressing duplicates based on MD5 of body
It's been more than 30 hours since I sent the first copy of this message
to [email protected], and I still haven't seen it on either the
moderated or the flames lists @toad.com. Suspecting that a particular
naughty word was to blame, I inserted some hyphens and resent the
message several hours later. That too has failed to reach me on either
the moderated to the flames lists @toad.com. I am now kmtkujatwv to
several other flavours of the list, and will be interested to see which
(if any) get *this* message, in which the naughty word is encrypted
using the well-known ROT-n algorithm, with a key that I will keep
secret.
Heres's a procmail recipe for suppressing duplicate messages based on
the MD5 of a "normalised" version of the body of the messages. Folk who
celcmbslo to more than one of the cypherpunks lists may find it useful.
:0
* (Sender: |Return-Path: |Received:.*for.*)(owner-)?cypherpunks
{
# Detect duplicate messages based on MD5 of normalised body
:0:.md5.lock
* B ?? ? (m=`$HOME/bin/normalise-body | md5`; \
echo "Message-ID: <$m@MD5>" \
| formail -D 8192 .body-md5.cypherpunks.cache )
cypherpunks-duplicates
:0:
cypherpunks
}
The "* B ?? ?" means "send the body of the message as input to
the following command, and test the command's exit status".
$HOME/bin/normalise-body is a simple perl script (appended) that deletes
trailing blanks on all lines and then deletes leading and trailing blank
lines.
--apb (Alan Barrett)
# This is a shell archive. Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file". Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
# normalise-body
#
echo x - normalise-body
sed 's/^X//' >normalise-body << 'END-of-normalise-body'
X#!/usr/bin/perl
X
X# A very weak attempt at normalising the body of a mail message.
X# Removes trailing white space on all lines, and removes leading
X# and trailing blank lines.
X# Does not attempt to normalise any MIME content-transfer-encoding.
X
X$total_nonblank_lines = 0;
X$consecutive_blank_lines = 0;
Xwhile (<>) {
X s/\s+$//;
X if (/^$/) {
X $consecutive_blank_lines++;
X } else {
X print "\n" x $consecutive_blank_lines if $total_nonblank_lines;
X print $_;
X $consecutive_blank_lines = 0;
X $total_nonblank_lines++;
X }
X}
END-of-normalise-body
exit