NAME

Mail::SpamAssassin::Plugin::CRM114 - use CRM114 with SpamAssassin


SYNOPSIS

  loadplugin     Mail::SpamAssassin::Plugin::CRM114


DESCRIPTION

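This plugin uses the external program crm114 to classify messages. It turns CRM114's result into a SpamAssassin score, either a static score per classification or a dynamic score derived from the CRM114 score.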


NOTES/PROBLEMS/TODO

If you use CRM114's cache, note that SA only writes headers beginning with X-Spam- (so the CacheID header appears as X-Spam-CRM114-CacheID), whereas CRM114 looks for X-CRM114-CacheID. Training with spamassassin --report/--revoke should work, because this plugin handles the renaming. Otherwise you have to rename that header line yourself before training from the cache.

Amavis notes: I use Amavis to call SpamAssassin. Here are patches that add the additional CRM114 headers to every mail:

against amavisd-new-2.4.5: http://mschuette.name/files/amavisd.245.patch
against amavisd-new-2.5.2: http://mschuette.name/files/amavisd.252.patch
against amavisd-new-2.6.1: http://mschuette.name/files/amavisd.261.patch (thanks to Jules M)
against amavisd-new-2.6.2: http://mschuette.name/files/amavisd.262.patch (thanks to Mark M)

amavisd-new-2.6.3 no longer requires these patches.


AUTHOR & ACKNOWLEDGEMENT

Thanks to Tomas Charvat for testing.

Initially based on a plugin by Eugene Morozov.

Also borrows from the Mail::SpamAssassin::Plugin modules.

lookup_crm114_cacheid() contributed by Thomas Mueller <thomas@chaschperli.ch>

Many improvements contributed by Mark Martinec <Mark.Martinec@ijs.si>

Everything else is Copyright 2007-2010, Martin Schuette <info@mschuette.name>


CRM114 INSTALLATION & CONFIGURATION

To use this plugin, you have to set up CRM114 so that you have these files: mailreaver.crm, mailfilter.cf, rewrites.mfp, priolist.mfp, and the .css files spam.css and nonspam.css (see http://crm114.sourceforge.net/docs/CRM114_Mailfilter_HOWTO.txt for details).

The most important steps are:

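    mkdir ~/.crm114
    cp mailfilter.cf rewrites.mfp *.crm ~/.crm114
    cd ~/.crm114
    cssutil -b -r spam.css        # set up the spam statistics file
    cssutil -b -r nonspam.css     # set up the nonspam statistics file
    touch priolist.mfp            # start with an empty priority list
    $EDITOR mailfilter.cf         # see :add_headers: below
    $EDITOR rewrites.mfp          # adapt to your own addresses and hosts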

In mailfilter.cf, make sure that the option :add_headers: is set to /yes/. Do not bother changing the flag_subject_string options; this plugin ignores them anyway.


PLUGIN CONFIGURATION

To use this plugin, you will probably have to adjust crm114_command.

All other settings should have working default values, chosen to be cautious and non-intrusive.

crm114_command string (default: crm -u ~/.crm114 mailreaver.crm)

The command line used to execute CRM114. It is recommended to run mailreaver.crm and to use only absolute paths.
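
A minimal sketch of a configuration (e.g. in local.cf) that loads the plugin and sets crm114_command with absolute paths only. The crm binary location /usr/bin/crm and the CRM114 directory /var/lib/spamassassin/.crm114 are assumptions for illustration; substitute the paths of your own installation and of the user SpamAssassin runs as.

```
# Load the plugin (as in the SYNOPSIS)
loadplugin     Mail::SpamAssassin::Plugin::CRM114

# Hypothetical paths: adjust to where crm is installed and where
# mailreaver.crm, mailfilter.cf and the .css files live
crm114_command /usr/bin/crm -u /var/lib/spamassassin/.crm114 /var/lib/spamassassin/.crm114/mailreaver.crm
```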

crm114_learn (0|1) (default: 0)

Set this if CRM114 should be trained by SA.

If enabled, then a call to Mail::SpamAssassin->learn() or spamassassin --report/--revoke also calls the CRM114 plugin and lets CRM114 learn the mail as spam/ham.

crm114_autolearn (0|1) (default: 0)

Set this if CRM114 should be trained by SA's autolearn function.

NB: This is different from :automatic_training: in CRM114's mailfilter.cf because SA's score is influenced by several different factors while CRM114 has to rely on its own classification.

Nevertheless, only enable this if you know what you are doing: autolearning makes sense only if your non-learning SpamAssassin rules (i.e. excluding AWL and Bayes) are already well tuned and known to provide good results.
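
A sketch of the two training options side by side: explicit training is switched on, while autolearning is left commented out until the conditions above are met. Whether you want either of them is your own decision; the values shown are only an example.

```
# Train CRM114 whenever SA learns, e.g. via spamassassin --report/--revoke
crm114_learn     1

# Enable only once your non-learning rules are well tuned
# crm114_autolearn 1
```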

crm114_remove_existing_spam_headers (0|1) (default: 0)
crm114_remove_existing_virus_headers (0|1) (default: 0)

Set whether existing X-Spam or X-Virus headers are to be removed before classification.

If SpamAssassin is called by Amavis, set the same values as Amavis does, so that an SA check from Amavis and one from the command line both see the same headers.
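
A sketch of both header options enabled. The value 1 is an assumption for illustration only; check what your Amavis setup actually does before copying it, since the point is to match Amavis, not to pick a particular value.

```
# Example only: mirror your Amavis behaviour so that checks from
# Amavis and from the command line see the same headers
crm114_remove_existing_spam_headers  1
crm114_remove_existing_virus_headers 1
```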

crm114_dynscore (0|1) (default: 0)

Set this to use a dynamic score, i.e. to calculate the SA score from the CRM114 score. Otherwise the static scores are used.

crm114_dynscore_factor (default: depends on SA required_score)

Dynamic score scaling factor.

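With dynamic scoring, the SA score is calculated as:

    SA score = CRM score * crm114_dynscore_factor

Notes:

By default, this factor is calculated so that a CRM score of -25 yields the SA spam threshold (required_score), i.e. crm114_dynscore_factor = required_score / -25, which is -0.2 with SA's default required_score of 5.0. The factor is negative because CRM114 gives spammy messages negative scores, whereas SA gives them positive ones.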
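
A worked sketch, assuming required_score has been raised to 6.0 (a hypothetical value). Following the rule above, the default factor becomes 6.0 / -25 = -0.24, so a CRM114 score of -25 maps to an SA score of 6.0 and a CRM114 score of +10 maps to -2.4. Setting the factor explicitly as below should be equivalent to relying on the computed default; it merely makes the scaling visible.

```
required_score         6.0

# Derive the SA score from CRM114's score instead of the static scores
crm114_dynscore        1

# SA score = CRM score * factor; 6.0 / -25 = -0.24, so a CRM score
# of -25 gives 6.0 (the spam threshold) and +10 gives -2.4
crm114_dynscore_factor -0.24
```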

crm114_staticscore_good n (default: -3)
crm114_staticscore_prob_good n (default: -0.5)
crm114_staticscore_unsure n (default: 0)
crm114_staticscore_prob_spam n (default: 0.5)
crm114_staticscore_spam n (default: 3)

Static SA scores for the different classifications: good, prob_good, unsure, prob_spam and spam.

Scores for good/spam are used according to CRM114's classification.

On very short messages, CRM114 often returns scores with the right sign (for spam/ham) but a low absolute value, because there are too few tokens for a sufficiently certain classification. The prob_good/prob_spam classes were introduced to benefit from these cases as well.
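
The block below simply restates the documented defaults as a starting point for tuning; these scores only apply when crm114_dynscore is 0 (the default). Change only the values you actually need to adjust.

```
# Documented defaults, used when crm114_dynscore is 0
crm114_staticscore_good       -3
crm114_staticscore_prob_good  -0.5
crm114_staticscore_unsure      0
crm114_staticscore_prob_spam   0.5
crm114_staticscore_spam        3
```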

crm114_good_threshold n (default: 10)
crm114_spam_threshold n (default: -10)

The good/spam thresholds as used by CRM114.

mailreaver.crm allows setting different thresholds for classification: crm114_good_threshold should be set to the value of :good_threshold: and crm114_spam_threshold to that of :spam_threshold:. This plugin does not need these values to detect mails that CRM114 has classified as good or spam; it uses them only to determine its additional classes prob_good/prob_spam.

These settings override the variables :good_threshold: and :spam_threshold: used by mailreaver.crm, whose defaults are set in mailfilter.cf. The thresholds delimit the classification regions SPAM / UNSURE / GOOD based on the CRM114 score. The classification is made either by crm itself or, when --stats_only is used (which provides the plugin only with a score, not a status), by this plugin. The thresholds are also used to determine the additional classes prob_good/prob_spam when crm114_dynscore is false. The default values are +10 for the good threshold and -10 for the spam threshold.
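
A sketch assuming mailfilter.cf has been edited to contain :good_threshold: /5.0/ and :spam_threshold: /-5.0/ (hypothetical values); the plugin settings are then given the same values so that both sides agree on where UNSURE begins and ends.

```
# Keep equal to :good_threshold: and :spam_threshold: in mailfilter.cf;
# +5 / -5 are hypothetical values for illustration
crm114_good_threshold  5
crm114_spam_threshold -5
```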

crm114_use_cacheid (default: 0)

Set to preserve the CRM114-CacheID for later training and store messages in a reaver cache.

Enabling this adds processing overhead, because crm114 is then expected to return a rewritten message. It also causes the reaver cache to grow, which requires periodic purging (provided neither by CRM114 nor by this plugin).

To use the cache, enable it in mailfilter.cf, set this option, and include the CacheID in all mails with: add_header all CRM114-CacheID _CRM114CACHEID_

crm114_lookup_cacheid (default: 0)

If crm114_use_cacheid is true and no CRM114-CacheID is found in the message, look the message up in the reaver_cache/texts directory.

Note that this can be expensive: the lookup reads the header sections of the files in the cache directory one after another until it finds the message. If you use this option, keep the number of files small by purging the cache directory regularly.

You also need to set crm114_cache_dir.

crm114_cache_dir (default: ~/.crm114/reaver_cache)

Used to look up the CacheID if crm114_lookup_cacheid is set. Needs to be set to the reaver_cache/texts directory.
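
The cache-related settings combined into one sketch. The absolute path is an assumption (a hypothetical CRM114 directory plus the reaver_cache/texts subdirectory named above); the cache must also be enabled in mailfilter.cf, and purging it remains your job.

```
# Keep the CacheID for later training (also enable the cache in mailfilter.cf)
crm114_use_cacheid    1
add_header all CRM114-CacheID _CRM114CACHEID_

# Optional and potentially expensive: search the cache when a message
# arrives without a CacheID header
crm114_lookup_cacheid 1

# Hypothetical absolute path to the reaver_cache/texts directory
crm114_cache_dir      /var/lib/spamassassin/.crm114/reaver_cache/texts
```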

crm114_autodisable_score (default: 999)
crm114_autodisable_negative_score (default: -999)

Skip the CRM114 check if a message already has a score >= crm114_autodisable_score or a score <= crm114_autodisable_negative_score from other tests.

Use this if you need to save CPU cycles and a non-negligible share of your messages reaches very high (or very low) SA scores, e.g. when white- or blacklisting is used extensively.

In that case you will also want to set a priority for CRM114 (e.g. priority CRM114_CHECK 899), so that other, less expensive tests run first and accumulate some points. 899 is recommended as an optimization because FuzzyOCR runs at 900: if CRM114 already yields a high SA score, FuzzyOCR will skip its tests (just as CRM114 may skip its check if earlier tests have already reached crm114_autodisable_score).
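
A sketch combining the autodisable thresholds with the recommended priority. The thresholds 15 and -15 are hypothetical; pick values well beyond your required_score so that only clear-cut messages skip the CRM114 check.

```
# Run CRM114 after the cheaper tests and just before FuzzyOCR (900)
priority CRM114_CHECK 899

# Hypothetical values: skip CRM114 when other tests already
# produced a clear verdict
crm114_autodisable_score           15
crm114_autodisable_negative_score -15
```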

NB: Do not worry too much about performance and CPU costs, unless you know you are really CPU bound. (And not just waiting for your slow DNS server to reply.)

crm114_timeout n (default: 10)

Set timeout of n seconds to cancel an unresponsive CRM114 process.


VERSIONS

 Version: 0.1, 070406
 Version: 0.2, 070408
 Version: 0.3, 070409
 Version: 0.3.1, 070412 (fixed typo)
 Version: 0.3.2, 070414 (checked documentation)
 Version: 0.4, 070421 (added crm114_autolearn)
 Version: 0.4.1, 070430 (fixed crm114_autolearn)
 Version: 0.4.2, 070501 (fixed crm114_autolearn again)
 Version: 0.4.3, 070506 (fixed crm114_autolearn again, now tested)
 Version: 0.5, 070507 (works with SA 3.2.0)
 Version: 0.6, 070514 (crm114_autodisable_score, omit test before learning)
 Version: 0.6.1, 070516 (adjusted 'CRM and SA disagree' condition)
 Version: 0.6.2, 070802 (fixed small bug, thanks to Rick Cooper)
 Version: 0.6.3, 070815 (now trying to prevent zombie processes)
 Version: 0.6.4, 070819 (use helper_app_pipe_open-code from Plugin::Pyzor)
 Version: 0.6.5, 070821 (fixed bug in pipe_open-code, thanks to Robert Horton)
 Version: 0.6.6, 070913 (fixed crm114_use_cacheid, added debug-tag)
 Version: 0.6.7, 070927 (add score for unsure but probably spam/good, fix possibly uninitialized value)
 Version: 0.7, 070928 (add POD documentation, considered stable)
 Version: 0.7.1, 071230 (fix prob-cases, where score did not appear in Spam-Status)
 Version: 0.7.2, 071230 (hopefully better error messages in case of process failure)
 Version: 0.7.3, 080127 (typo in autolearn)
 Version: 0.7.4, 080301 (CLT08-Edition, fixed header filter, thanks to Thomas Mueller)
 Version: 0.7.5, 080421 (added lookup_crm114_cacheid, thanks to Thomas Mueller)
 Version: 0.7.6, 081217 (added crm114_{good,spam}_threshold)
 Version: 0.8.0, 090418 (lots of improvements, thanks to Mark Martinec)
 Version: 0.8.1, 100607 (fix CRM114-Status regexp, thanks to Kevin Chua Soon Jia)