Nevis SpamAssassin Guide

This web page is a guide to using SpamAssassin at Nevis. The main points are:
  • how to use SpamAssassin with your Nevis e-mail account;
  • what SpamAssassin will do to your mail;
  • how SpamAssassin can make mistakes;
  • how SpamAssassin improves its filtering via a collaborative network;
  • how to help SpamAssassin learn what is spam and what isn't;
  • and some useful references.

This web page continues a discussion of solutions to the problem of spam at Nevis. It focuses on one tool in particular: SpamAssassin, a freeware Unix-based spam filter for electronic mail.

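Before you read any further, keep the following highly important fact in mind: with any spam filter, there will always be:
  • some spam messages that pass through the filter undetected;
  • some legitimate messages that are incorrectly flagged as spam.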

If you aren't willing to accept the possibility that some of your e-mail may be altered or mis-routed by a spam filter, then you should not use one. If you feel that using a spam filter should make you 100% immune to receiving false, abusive, or time-wasting e-mail, then you will be disappointed with any spam filter.


How to use SpamAssassin with your Nevis e-mail account

I've installed SpamAssassin on the Nevis Linux cluster (just for the "man" pages; the program is only executed by the mail server). This program performs a fairly sophisticated content analysis on your e-mail; you can find the details on the SpamAssassin web site.

SpamAssassin is enabled site-wide, which means it scans every e-mail message that arrives at Nevis. As discussed below, SpamAssassin may bounce a message back to the sender if it's obviously spam. However, if it accepts your message, the only way that SpamAssassin alters your mail is to add normally-invisible headers to your message. SpamAssassin does not route your e-mail unless you set up a mail filter in some way. The following section describes how to use the results of SpamAssassin's scan to act on your e-mail.

Notes:


How SpamAssassin will modify your e-mail

If SpamAssassin determines that a mail message is spam, then:

Therefore, in order for you to benefit from SpamAssassin, you have to instruct some program to examine your mail headers and do something with them. There are two ways:

However, SpamAssassin can do more than just flag messages:

The key to enabling this functionality (and more) is in your ~/.spamassassin/user_prefs file, which is created for you when SpamAssassin analyzes your mail for the first time. For more information, see the references below.

Note: SpamAssassin is not an anti-virus tool. However, many viruses are attached to mail messages with malformed headers or other clues that identify them as suspicious to SpamAssassin's filters.

One more time: No spam filter is 100% effective. There will always be some spam messages that pass SpamAssassin's filters. There will almost certainly be some legitimate mail that is incorrectly flagged as spam.

This means that it's unwise to automatically delete the messages that are flagged by SpamAssassin. If your mail reader permits it, I suggest that you send any message whose header contains the text "X-Spam-Status: Yes" into a special folder (I use the name "Junk"), and review the contents of the folder periodically; you can see an example of this in /a/mail/procmailrc/seligman.
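As a rough illustration of the "file it, don't delete it" rule described above, here is a minimal Python sketch. It is not the Nevis procmail setup; the function names, the "Inbox" folder name, and the sample header text are assumptions made up for this example. The only details taken from this page are the "X-Spam-Status: Yes" text and the "Junk" folder name.

```python
from email import message_from_string


def is_flagged_spam(raw_message: str) -> bool:
    """Return True if the X-Spam-Status header value begins with 'Yes'."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    return status.strip().lower().startswith("yes")


def choose_folder(raw_message: str) -> str:
    """Pick a folder name; flagged mail is kept for review, never deleted."""
    # "Inbox" is a hypothetical name for the normal delivery folder.
    return "Junk" if is_flagged_spam(raw_message) else "Inbox"


# Illustrative message; the exact header text after "Yes" is an assumption.
sample = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Spam-Status: Yes, score=7.2 required=5.0\n"
    "\n"
    "body text\n"
)
print(choose_folder(sample))  # Junk
```

Because the message is only routed to a different folder, a legitimate message that was flagged by mistake can still be recovered during the periodic review.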


How SpamAssassin can make mistakes

Can SpamAssassin identify a legitimate e-mail message as spam? Absolutely! In fact, it's happened to me many times. Here's an example scenario, with some names changed to protect the guilty:

"Uncle Harry" wants to send some vacation pictures to his family. But:

Put it all together, and SpamAssassin has a high probability of labeling Uncle Harry's mail as spam.

Now, perhaps your first instinct is to condemn the receiver of this message. The above situation shouldn't happen at Nevis if you don't use your e-mail account for personal communication. However, when this happened to me, it wasn't "Uncle Harry" sending messages to the family; it was a respected professor at Fermilab sending out notices inviting fellow scientists to a series of seminars.

Another example: The most common reason why a legitimate message sent to me is flagged by SpamAssassin is that a Nevis user forwards me a message with a virus or spam message that looks particularly distressing. SpamAssassin catches it, drops it into my "Junk" folder, and I don't see it until days or weeks later. If you forward me your spam messages too often, then SpamAssassin's auto-whitelist feature may start flagging all your messages to me as spam, since it will begin to think you're a spam source!
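To see why a history of forwarded spam can hurt a sender's later messages, here is a simplified Python sketch of score averaging. It is only a model under stated assumptions: the function name, the 0.5 adjustment factor, and the threshold of 5.0 are choices made for this example, and the real auto-whitelist feature differs in its details.

```python
def awl_adjusted_score(current_score, past_scores, factor=0.5):
    """Pull the current score part of the way toward the sender's past average.

    Simplified model (an assumption for illustration): with no history the
    score is left unchanged; otherwise it moves `factor` of the distance
    toward the mean of the sender's earlier scores.
    """
    if not past_scores:
        return current_score
    mean = sum(past_scores) / len(past_scores)
    return current_score + (mean - current_score) * factor


REQUIRED_SCORE = 5.0  # assumed spam threshold for this example

# A sender whose earlier (forwarded) messages all scored as spam...
history = [9.0, 11.0, 10.0]
# ...now sends an ordinary message that would score 1.0 on its own.
adjusted = awl_adjusted_score(1.0, history)
print(adjusted, adjusted >= REQUIRED_SCORE)  # 5.5 True
```

In this model the harmless message ends up above the threshold purely because of the sender's history, which is the effect described above.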

The moral: sooner or later, you will receive legitimate mail that SpamAssassin will flag as spam.


SpamAssassin and collaborative networks

As part of SpamAssassin's filtering process, it consults with collaborative networks for spam detection:

Here's the idea: Assume that you're a volunteer member of one of these communities. You detect a message that's spam. You send that message to the central servers of the Razor or DCC networks. If enough members of that community report that message as spam, the network will flag that message as potential spam for everyone else.

Spam filters, including SpamAssassin, can send queries to these networks, requesting a rating of how likely a particular message is to be spam. The user of SpamAssassin does not have to be a member of the communities to make this query. (You only have to be a member of the communities if you want to report spam to them.)
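The report-and-query idea can be sketched in a few lines of Python. This is a toy model, not the Razor, DCC, or Pyzor protocol: the class name, the whitespace-and-case normalization, the SHA-256 digest, and the threshold of three reporters are all assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict


class ToyReportNetwork:
    """Toy stand-in for a collaborative spam-reporting service."""

    def __init__(self, threshold=3):
        self.threshold = threshold           # reporters needed to flag a message
        self._reporters = defaultdict(set)   # message digest -> set of members

    @staticmethod
    def digest(message: str) -> str:
        # Collapse whitespace and case so trivially different copies match.
        normalized = " ".join(message.split()).lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def report(self, member: str, message: str) -> None:
        """A member reports a message as spam (members only)."""
        self._reporters[self.digest(message)].add(member)

    def is_potential_spam(self, message: str) -> bool:
        """Anyone may query; no membership is needed."""
        reporters = self._reporters.get(self.digest(message), ())
        return len(reporters) >= self.threshold


network = ToyReportNetwork()
for member in ("alice", "bob", "carol"):
    network.report(member, "Buy   cheap watches NOW")

print(network.is_potential_spam("buy cheap watches now"))   # True
print(network.is_potential_spam("Seminar on Thursday"))     # False
```

Counting distinct reporters rather than raw reports means a single member cannot flag a message alone by reporting it repeatedly.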

I have not made Nevis a member of the Vipul's Razor, DCC, or Pyzor communities. However, I have enabled SpamAssassin to use the Razor, Pyzor, and DCC information to help evaluate whether the messages you receive are spam. Razor and Pyzor store some information on a per-user basis, which is why you see ~/.razor and ~/.pyzor directories created in your home area.

SpamAssassin also makes use of Real-time Blackhole Lists (or RBLs). These are lists of servers that are known to be major relays of spam, or that have refused to take steps to block spam.
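Lists of this kind are commonly consulted through DNS: the octets of the sending server's IPv4 address are reversed and prepended to the list's zone name, and the resulting name is looked up. The Python sketch below only builds that query name. The function name is made up for this example, and "dnsbl.example.org" is a placeholder zone, not a list that Nevis uses.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a blocklist about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.10 is a documentation-only address; the zone is a placeholder.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
# 10.2.0.192.dnsbl.example.org
```

By convention, if an address lookup of that name returns an answer, the server is on the list; if the name does not exist, it is not.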


How SpamAssassin can learn to improve its filtering process

SpamAssassin includes several tools that you can use to "teach" it and improve its spam detection process. These range from simple, manual procedures (adding a bad address to a file), all the way up to Bayesian analysis.

One basic control is your ~/.spamassassin/user_prefs file; this file is created for you the first time SpamAssassin analyzes a mail message for you. You can take a look at my file (~seligman/.spamassassin/user_prefs) to see my configuration. For example, I've never received a message from any address ending in ".com.br" that wasn't spam, so I've blacklisted *@*.com.br in this file.
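To get a feel for which addresses a pattern like *@*.com.br covers, the following Python sketch treats "*" as "any run of characters" and "?" as "any single character", and matches without regard to case. It is an approximation written for this page, not SpamAssassin's own matching code; the function names and the sample addresses are made up.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simple address glob (only * and ? are special) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE)


def address_matches(pattern: str, address: str) -> bool:
    """Return True if the whole address matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(address) is not None


print(address_matches("*@*.com.br", "offers@shop.com.br"))   # True
print(address_matches("*@*.com.br", "friend@example.com"))   # False
```

A broad pattern like this catches every sender in the domain family, so it is worth checking it against a few addresses you do want to receive mail from before relying on it.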

For more advanced configuration options, and a discussion of Bayesian filtering, see:

Remember, if a legitimate e-mail message is "mangled" by SpamAssassin because it thinks it's spam, the solution is not to disable SpamAssassin. The solution is to add a whitelist_from entry for the sender to your ~/.spamassassin/user_prefs file.


References

There are many ways to configure SpamAssassin and Procmail for your personal use. I recommend the following references:

