[pmmail-list] Filter statistics

Jon Etkins pmmail-list@blueprintsoftwareworks.com
Mon, 08 Mar 2004 16:22:04 -0600


On Mon, 08 Mar 2004 09:37:50 -0500 (EST), Derek W. Keoughan wrote:

>On Sat, 06 Mar 2004 20:30:32 -0600, Jon Etkins wrote:
>
>>Purely out of curiosity, I'd like to start gathering some stats on my
>>incoming mail - specifically how many items are received each day, and
>>what percentage of them are spam.
>
>Couldn't you just filter the spam to a folder, and have PMMail keep
>track of the number of items in there for you?

Yes, but that solution falls short (for me) in two areas...

>I use SpamBayes (for Python) on a Windows box on my network to filter
>eMail - anything coming in gets marked with an X- header as spam, ham
>(keeps), or unsure... I can train the filter by bouncing eMail to it using
>a reserved eMail address.

I already have SpamAssassin/pop3proxy set up and flagging spam
appropriately.  That much is working perfectly, with over 99.5% total
accuracy and about 0.02% false positives.

>Anything with "spam" in the X-header gets filtered to a "SpamBayes"
>folder in PMMail, which I go through at least once a day, since there's
>usually something good ("ham") in there... 329 waiting for me this
>morning... once deleted, they wind up in the trash, and since I last
>deleted my Trash folder on the 4th, there's 1529 more in there...

I get between 800 and 1200 spams per day (my address has been around
the 'net for over 10 years, so it's on every spam list known or sold to
man).  I filter anything with an SA score above 10 directly to the
trash, and anything flagged as spam but with a lower score (or not
auto-learned by SA's bayes engine) into a Spam folder.  I have -never-
had a false positive score over 10, and I only get about 1 FP per week
in my Spam folder these days.
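
For illustration only, the routing rule described above might be sketched
in Python as follows.  This is not PMMail filter syntax.  The function
name route(), the folder labels, and the threshold constant are
hypothetical, and the sketch assumes the score is read from the
X-Spam-Status header that SpamAssassin adds ("hits=" in SA 2.x, "score="
in 3.x).  The auto-learn condition mentioned above is left out.

```python
import re
from email import message_from_string

TRASH_THRESHOLD = 10.0  # assumption: scores above this go straight to the trash

# SpamAssassin 2.x writes "hits=", 3.x writes "score=" in X-Spam-Status.
_SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def route(raw_message: str) -> str:
    """Return 'trash', 'spam', or 'inbox' for one raw message."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    flagged = status.strip().lower().startswith("yes")
    match = _SCORE_RE.search(status)
    score = float(match.group(1)) if match else 0.0
    if score > TRASH_THRESHOLD:
        return "trash"   # high-scoring spam is deleted unread
    if flagged:
        return "spam"    # lower-scoring spam is kept for manual review
    return "inbox"
```

Called on a raw message whose header reads "X-Spam-Status: Yes, hits=6.1
required=5.0", route() returns 'spam'.  A message with no such header is
treated as ordinary mail.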

>Without SpamBayes, I'd be stuck reading spam instead of all the eMails
>from my clients and friends!

Once a day or so, I scan through the Spam folder, pluck out the (very
rare) False Positive, and pass the remainder through sa-learn to add
them to the bayes corpus.  Without dividing up my spam in this way, I'd
be sifting through 800+ spams per day instead of fewer than 100.

So, yes, I could simply count how many items wind up in my Spam folder
per day, but I'd still have no count of the spams that were deleted
directly, or of how many messages arrived in total or were non-spam.

What I had planned on doing was adding an "always trigger" filter that
does nothing but fire a hook to increment a total message count, and
adding another hook to my two spam filters to increment a spam count
when a message is either deleted or sent to the Spam folder.
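
A minimal sketch of what such a counting hook could look like, written in
Python purely as an example.  It assumes the filter hook can launch an
external command with one argument ("total" or "spam").  The file name
mailstats.json and the functions bump() and spam_percentage() are
hypothetical, not anything PMMail provides.

```python
import json
import sys
from datetime import date
from pathlib import Path
from typing import Optional

STATS_FILE = Path("mailstats.json")  # hypothetical location for the tallies
COUNTERS = ("total", "spam")


def bump(counter: str, stats_file: Path = STATS_FILE,
         day: Optional[str] = None) -> dict:
    """Add one to `counter` for the given day (default: today) and save."""
    if counter not in COUNTERS:
        raise ValueError(f"unknown counter: {counter!r}")
    day = day or date.today().isoformat()
    stats = json.loads(stats_file.read_text()) if stats_file.exists() else {}
    counts = stats.setdefault(day, {"total": 0, "spam": 0})
    counts[counter] += 1
    stats_file.write_text(json.dumps(stats, indent=2, sort_keys=True))
    return counts


def spam_percentage(counts: dict) -> float:
    """Share of the day's mail that was spam, as a percentage."""
    return 100.0 * counts["spam"] / counts["total"] if counts["total"] else 0.0


# Assumed usage from the hooks:
#   python mailstats.py total   (from the always-trigger filter)
#   python mailstats.py spam    (from each of the two spam filters)
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in COUNTERS:
    bump(sys.argv[1])
```

The sketch does no file locking, so if two hooks can fire at the same
moment, a count could be lost.  Whether that matters depends on how the
mail client runs its hooks.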

And if anyone knows of a way to achieve something similar without
resorting to external programs, I'm all ears! :)

Cheers,
  Jon Etkins
  Austin, TX

