
CRM114 Spamassassin-Plugin

I wrote this Spamassassin plugin to integrate CRM114 into an existing Amavisd/Spamassassin system:

I have been using it with my own configuration since April 2007 without any problems. However, there are many different ways to configure and use Spamassassin, so there might still be special cases that reveal errors. If you find one, please let me know.

To write the raw CRM114 score and Cache-ID into e-mails I also patched my Amavis; maybe you will find this useful as well:

Features and Issues

  • The interface to CRM114’s mailreaver.crm is rather fragile because you have to use the correct command line (and also account for the paths and user permissions of the Spamassassin process that will run it). The most common error message is “crm114: Error. Failed to get CRM114-Status.”, which means the crm command line did not return the expected output. In that case, run the crm command line from a shell (better yet, a su amavis shell) to make sure it is correct and prints the CRM114 header lines to stdout.
  • Newer CRM114 versions (I guess all since 20080326) have a “--report_only” option to return only the classification instead of the whole mail. If you have a new version then you can uncomment the line “push(@crm114_options, '--report_only');” for improved performance. By default it is commented out, because the stable versions on CRM114’s download page do not understand it. Mark pointed out that crm will ignore unrecognized options, thus it is always safe to uncomment the line.
  • CRM114 can be configured to save copies of all scanned mails in a cache directory. With this cache it does not need the complete e-mail to train ham/spam; it only needs the mail’s Cache-ID to fetch it from the cache. This raises some privacy issues but can be very useful for letting users retrain the filter. With some additional scripting you could find and train a cached mail by its Message-ID (and even Outlook includes the Message-ID when forwarding).
  • I cannot test the plugin with a user-specific (user_prefs) configuration. I guess most options (e.g. different scores) should work on a per-user basis. Using separate databases will probably fail because of access permissions; I also doubt that the possibly small improvements in accuracy justify the space and performance costs of 12 MB CSS databases per user.
  • SpamAssassin 3.3 will have a “pluginized Bayes”. I am not sure whether it is worth implementing the additional functions needed to use CRM114 as a Bayes subsystem. Currently I see no advantage in doing so.
  • Would you like to have a wiki for this? I could easily move the ‘project’ to a Trac page with SVN, wiki etc. IMO that would be overkill for such a small module, but if several people are interested then I will reconsider.
  • I think I am going to remove the dynamic score function. It was a neat hack at the time, but in practice it does not return sensible results because CRM114- and SA-scores are too different.
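
The check described in the first item above (does the crm command line print the CRM114 header lines to stdout?) can be sketched in a few lines of Python. This is only an illustration and not part of the plugin, which is written in Perl. The function names are hypothetical, and the header format matched by the regular expression (a status word followed by a score in parentheses, optionally prefixed with "pR:") is an assumption that should be compared with the output of your own CRM114 version.

```python
import re
import subprocess

# Assumed header shape, e.g. "X-CRM114-Status: SPAM (  -12.34  )".
# Some setups may print "( pR: -12.34 )" instead, so "pR:" is optional here.
STATUS_RE = re.compile(
    r"^X-CRM114-Status:\s*(?P<status>\w+)\s*"
    r"\(\s*(?:pR:\s*)?(?P<score>-?\d+(?:\.\d+)?)\s*\)",
    re.MULTILINE,
)


def parse_crm114_status(output):
    """Return (status, score) found in crm output, or None if the header is missing."""
    match = STATUS_RE.search(output)
    if match is None:
        return None
    return match.group("status"), float(match.group("score"))


def check_crm_command(command, mail_bytes):
    """Run a crm command line (given as a list of arguments) with a mail on stdin.

    Returns the parsed status, or None if no status header was printed,
    which is the situation behind "Failed to get CRM114-Status".
    """
    result = subprocess.run(
        command,
        input=mail_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    return parse_crm114_status(result.stdout.decode("utf-8", errors="replace"))
```

To use it, pass the exact crm command line from your own configuration as a list of arguments together with the raw bytes of a test mail, and run the script as the user that Spamassassin runs under. A return value of None means the command did not print a status header that the pattern recognizes.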

Feedback on these and other questions is welcome.