[pmmail-list] Bayesian spamfiltering

Winfried Tilanus pmmail-list@blueprintsoftwareworks.com
Mon, 16 Sep 2002 16:44:34 +0200 (CDT)


Hi everybody,

Some time ago, somebody mentioned an article about a spam filter based
on Bayesian (statistical) filtering. The article claims: "We now miss
less than 5 per 1000 spams, with 0 false positives." (see:
http://www.paulgraham.com/spam.html)

So I did some testing with a filter written in Perl (see:
http://www.garyarnold.com/projects.php). I built the probability
database from 1166 pieces of spam and 2873 good e-mails. After that I
tested it on the same mails, and I found the results quite
disappointing:

6 of the 2873 good e-mails (= 0.2%) were incorrectly marked as 'spam'
and
18 of the 1166 spam mails (= 1.5%) were incorrectly marked as 'Not
spam'
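
For anyone who wants to check the arithmetic or plug in their own
counts, here is a small sketch in Python (not the Perl filter itself).
The function name error_rate is just something made up for this
example; the counts are the ones quoted above.

```python
def error_rate(errors: int, total: int) -> float:
    """Return the share of misclassified mails as a percentage."""
    return 100.0 * errors / total


# Counts from the test run described above.
false_positive_pct = error_rate(6, 2873)   # good mail marked as 'spam'
missed_spam_pct = error_rate(18, 1166)     # spam marked as 'Not spam'

# The article states its miss rate per 1000 spams, so convert to that unit.
missed_per_1000 = 1000 * 18 / 1166

print(f"good mail marked as spam: {false_positive_pct:.1f}%")
print(f"spam marked as not spam:  {missed_spam_pct:.1f}%")
print(f"missed spams per 1000:    {missed_per_1000:.1f}")
```

That comes to roughly 15 missed spams per 1000, against the article's
"less than 5 per 1000", and 6 false positives against its 0.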

I know some other people here are experimenting with Bayesian spam
filtering too. Do you get the same results? Has anybody gotten it to
work better?


Best wishes,
Winfried

-- 
http://www.xs4all.nl/~wtilanus/


- pmmail-list - The PMMail Discussion List ---------------------------
To POST to the list, send your message to:
pmmail-list@blueprintsoftwareworks.com

To UNSUBSCRIBE, send a message to mdaemon@bmtmicro.com 
with the first line of the message body being...
UNSUBSCRIBE pmmail-list@blueprintsoftwareworks.com
---------------------------------------------------------------------