January Spam

February 01, 2004

For the month of January 2004 I got 7,275 spam messages. Midway through the month I started using C-Command’s SpamSieve. So far I have been very pleased with it.

I trained the corpus with my December spam and all the good messages I had archived in my mail folders. While there were some false positives and false negatives initially, within days SpamSieve was catching 98.5 percent of my spam. The recent upgrade to release 2.1.2 has made the process even better, as the mail identified as spam now moves automatically to my Spam folder. Overall, SpamSieve filtered 4,932 messages for me in January: 1,250 were good and 3,682 were spam. Mostly due to my training, there were 60 false positives and 14 false negatives, for an accuracy of 98.5%.

My corpus holds 2,459 good messages and 6,967 spam messages, so spam makes up 74% of it. There are a total of 231,020 words in the corpus.
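
For anyone who wants to check the arithmetic, here is a minimal sketch of how the two percentages above can be reproduced from the raw counts. The function names are my own for illustration and are not anything SpamSieve exposes; I am also assuming accuracy means correctly filed messages divided by all filtered messages, and that the 74% is spam as a share of the whole corpus.

```python
def accuracy(total, false_positives, false_negatives):
    """Share of filtered messages that were classified correctly (assumed definition)."""
    errors = false_positives + false_negatives
    return (total - errors) / total


def spam_share(good, spam):
    """Spam as a fraction of all messages in the corpus (assumed meaning of the 74%)."""
    return spam / (good + spam)


# January counts from this post
print(f"Accuracy: {accuracy(4932, 60, 14):.1%}")
print(f"Spam share of corpus: {spam_share(2459, 6967):.0%}")
```

With the January numbers this works out to 74 mistakes in 4,932 messages, which rounds to the 98.5% reported above.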

The major benefit of using SpamSieve has been the elimination of some 60 inbox mail rules. Previously I used those rules to sort my mail into folders as a way of separating the good from the spam. Now I let all my mail stay in my inbox and archive only the messages I want to keep. SpamSieve’s Whitelist has already identified 150 address variations that I want to receive mail from.

All in all I am very pleased with SpamSieve, and I am looking forward to reaching an accuracy of 99 percent, at which point I’ll start automatically deleting mail in the spam folder after a week’s time.

Mark H. Nichols

I am a husband, cellist, code prole, nerd, technologist, and all around good guy living and working in fly-over country. You should follow me on Twitter.