CRM114 Spamassassin-Plugin

I wrote this SpamAssassin plugin to integrate CRM114 into an existing Amavisd/SpamAssassin system.

I have been using it with my own configuration since April 2007 without any problems. (There are many different ways to configure and use SpamAssassin, though, so there may still be special cases that reveal errors. If you find one, please let me know.)

Short notice (Feb. 2012): The project is not dead. More than a year of inactivity basically means that the plugin is finished: I am not aware of any bugs, and I should have released the last version as 1.0…

The only item on the ToDo list is to test the libcrm114 library and rewrite the plugin so that it no longer requires a shell and a crm114_command.

Features and Issues

  • The interface to CRM114’s mailreaver.crm is rather fragile because you have to use the correct command line (and also take into account the paths and user permissions of the SpamAssassin process that will run it). The most common error message is “crm114: Error. Failed to get CRM114-Status.”, which means that the crm command line did not return the expected output. In that case, run the crm command line from a shell (preferably a su amavis or at least a sudo -u amavis shell) to make sure that it is correct and prints the CRM114 header lines to stdout.
  • Newer CRM114 versions (I believe all since 20080326) have a “--report_only” option that returns only the classification instead of the whole mail. If you have such a version, you can uncomment the line “push(@crm114_options, '--report_only');” for improved performance. It is commented out by default because the stable versions on CRM114’s download page do not understand the option. However, Mark pointed out that crm ignores unrecognized options, so uncommenting the line should always be safe.
  • CRM114 can be configured to save copies of all scanned mails in a cache directory. With this cache, training ham/spam does not require the complete e-mail; the mail’s Cache-ID is enough to fetch it from the cache. This raises some privacy issues but can be very useful for letting users retrain the filter. With some additional scripting you could find and train a cached mail by its Message-ID (even Outlook includes the Message-ID when forwarding).
  • I cannot test the plugin with a user-specific (user_prefs) configuration. I guess most options (e.g. different scores) should work on a per-user basis. Using separate databases will probably fail because of access permissions; I also doubt that the possibly small improvements in accuracy justify the space and performance costs of 12 MB CSS databases for each user.
  • SpamAssassin 3.3 has a “pluginized Bayes”. I am not sure whether it is worth implementing the additional functions needed to use CRM114 as a Bayes subsystem. Currently I see no advantage in doing so.
  • I pushed the code to GitHub.
  • I was thinking about removing the dynamic score function. It was a neat hack at the time, but in practice it does not return sensible results because CRM114 and SA scores are too different. However, some users really like dynamic scores, so the function stays.
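
The manual check from the first item (does the crm command line print the CRM114 header lines to stdout?) can also be scripted. The following is only a minimal Python sketch, not part of the plugin. It assumes that the status header looks roughly like "X-CRM114-Status: SPAM ( pR: -12.34 )"; the exact layout depends on your CRM114 version and configuration, and the name parse_crm114_status is hypothetical.

```python
import re

# ASSUMPTION: the status header has the form
#   X-CRM114-Status: <STATUS> ( pR: <number> )
# Adjust the pattern to whatever your crm command line really prints.
STATUS_RE = re.compile(
    r"^X-CRM114-Status:\s*(?P<status>\S+)\s*"
    r"\(\s*pR:\s*(?P<score>[-+]?\d+(?:\.\d+)?)\s*\)",
    re.MULTILINE | re.IGNORECASE,
)


def parse_crm114_status(output):
    """Return (status, score) found in the crm output, or None.

    None corresponds to the situation the plugin reports as
    "Failed to get CRM114-Status": no status header in the output.
    """
    match = STATUS_RE.search(output)
    if match is None:
        return None
    return match.group("status"), float(match.group("score"))
```

To use it, capture the stdout of your crm command line (run as the amavis user) and pass the text to parse_crm114_status. A return value of None suggests that the command line, the paths or the permissions need another look.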

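The Message-ID lookup mentioned in the cache item above could be scripted roughly as follows. This is a hedged Python sketch under stated assumptions: it expects a directory that holds one raw mail per file (check where your CRM114 setup actually keeps its cache), and the function name find_cached_mail is hypothetical. Training the mail that was found is not shown, because that depends on your crm command line.

```python
from email.parser import BytesParser
from pathlib import Path


def find_cached_mail(cache_dir, message_id):
    """Return the path of the first cached mail with this Message-ID, or None.

    ASSUMPTION: cache_dir contains one raw mail per file.
    """
    wanted = message_id.strip().strip("<>")
    parser = BytesParser()
    for path in sorted(Path(cache_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            # headersonly=True: the body is not needed for the lookup
            msg = parser.parse(handle, headersonly=True)
        found = str(msg.get("Message-ID", "")).strip().strip("<>")
        if found and found == wanted:
            return path
    return None
```

The comparison ignores the surrounding angle brackets, so the Message-ID can be passed with or without them. On a large cache a linear scan like this will be slow; an index from Message-ID to file name would be the obvious next step.
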
Feedback on these and other questions is welcome.

amavisd-new patches

For historical reasons only, here are some amavisd-new patches that preserve the raw CRM114-Score and Cache-ID. Versions 2.6.3 and later of amavisd-new already include this functionality, so the patches are no longer necessary.