Starbridge Corp
Email, Antispam, Linux

Accueil > HowTo Mail > SpamAssassin Bayes database to MySQL

SpamAssassin Bayes database to MySQL

jeudi 26 avril 2007, par tonio

Migrating our Debian Anti-Spam Anti-Virus Gateway Email Server’s Bayes database to MySQL

adapté de l’article : http://www200.pair.com/mecham/spam/debian-spamassassin-sql.html

Install MySQL 5.0 :

apt-get install mysql-server-5.0

Note that the Debian installation provided the Perl modules SpamAssassin needs.
(See ’Requirements’ at http://spamassassin.apache.org/full/3.x.x/dist/sql/README.bayes)

Once MySQL is installed, log in :

mysql -u root

From the mysql> prompt, create the two passwords required for root :

SET PASSWORD FOR 'root'@'localhost' = PASSWORD('root_password');
SET PASSWORD FOR 'root'@'sfa' = PASSWORD('root_password');

sfa is our hostname, you must change this to match your hostname.
Now create the database we will use to store Bayes and auto-whitelist data :
create database sa_bayes ;

We will now create a user called ’sa_user’, create a password for ’sa_user’ and allow ’sa_user’ to do what it needs to the ’sa_bayes’ database :

GRANT SELECT, INSERT, UPDATE, DELETE ON sa_bayes.* TO 'sa_user'@'localhost' IDENTIFIED BY 'sa_user_password';

Refresh privileges:
FLUSH PRIVILEGES;

Let’s take a look at what we have done :

SELECT Host, User FROM mysql.user;

Note that there is one user the system uses : debian-sys-maint. We are done for the moment, so :

quit

Navigate to where our Bayes data is currently stored :

cd /var/lib/amavis/.spamassassin

As the ’amavis’ user, we need to create a backup file of our Bayes data that will later be restored to our SQL database. This process may take a while :

su amavis -c 'sa-learn --sync --force-expire'
su amavis -c 'sa-learn --backup > backup.txt'

There are four files we need to grab from the source code :

wget http://spamassassin.apache.org/full/3.x.x/dist/sql/bayes_mysql.sql
wget http://spamassassin.apache.org/gtube/gtube.txt

These commands create the tables we need, in the sa_bayes database.

mysql -u root -p sa_bayes < bayes_mysql.sql

Now we will configure SpamAssassin to use MySQL. Once we do this, don’t restart amavisd-new until we are completely finished. If you have to restart amavisd-new before the process is complete, set local.cf back the way it was. Note that Debian places local.cf in the /etc/spamassassin directory while many other OSs place it in /etc/mail/spamassassin.

Also note that we will comment out bayes_path and auto_whitelist_path :

cp /etc/spamassassin/local.cf /etc/spamassassin/local.cf-pre-sql
vi /etc/spamassassin/local.cf

Insert text below, and comment out items as noted :

use_bayes 1
bayes_auto_expire 0
bayes_store_module              Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                   DBI:mysql:sa_bayes:localhost
bayes_sql_username              sa_user
bayes_sql_password              sa_user_password

#auto_whitelist_path /var/lib/amavis/.spamassassin/auto-whitelist
#bayes_path  /var/lib/amavis/.spamassassin/bayes
#bayes_file_mode 0777

#this next part will store all data in just one user's table, no matter who runs sa-learn
bayes_sql_override_username amavis

bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam    0.1
bayes_auto_learn_threshold_spam       5.0
use_auto_whitelist        0

It is required we initialize the database by learning a message.

su amavis -c 'sa-learn --spam gtube.txt'

OK, now let’s load in the Bayes data ; this may take 5 minutes, or more than an hour and don’t be surprised if your CPU load increases substantially during this process :

su amavis -c 'sa-learn --restore backup.txt'

On a healthy system that has been in use for a while there will be between 100,000 and 150,000 rows of data in the Bayes data tables. If you would like to view our progress, open another PuTTY session, log into MySQL, and monitor the progress :

mysql -u root -p

USE sa_bayes;
SELECT COUNT(*) spam_count FROM bayes_token;

You can use the [up-arrow] to recall commands.

While you wait, commands that provide interesting data :

SHOW DATABASES;
SHOW TABLES;
DESCRIBE bayes_token;
DESCRIBE bayes_vars;
SELECT username FROM bayes_vars;

(should show ’amavis’ and ’root’)

Once the restore is complete (our first PuTTY session has returned to the shell prompt), we will change all tables from MyISAM to InnoDB.

Note that the 4.x version of MySQL we are using does not require any configuration changes to use InnoDB ; 3.x versions do however.

Log back into MySQL if necessary (and use the sa_bayes database), then issue these commands to convert our data :

mysql -u root -p

USE sa_bayes;
ALTER TABLE awl TYPE=InnoDB;
ALTER TABLE bayes_expire TYPE=InnoDB;
ALTER TABLE bayes_global_vars TYPE=InnoDB;
ALTER TABLE bayes_seen TYPE=InnoDB;
ALTER TABLE bayes_token TYPE=InnoDB;
ALTER TABLE bayes_vars TYPE=InnoDB;
ANALYZE TABLE awl;
ANALYZE TABLE bayes_expire;
ANALYZE TABLE bayes_global_vars;
ANALYZE TABLE bayes_seen;
ANALYZE TABLE bayes_token;
ANALYZE TABLE bayes_vars;

If everything went well, we can quit :

quit

The data conversion is complete, now we can test amavisd-new :

amavisd-new stop
amavisd-new debug-sa

If using amavisd-new 2.4.3 or newer, you may be able to limit debug output :

amavisd-new -d bayes debug-sa

or :

amavisd-new -d auto-whitelist debug-sa

Send a test mail through, and observe. You should see :

debug: bayes: Using username: sa_user
debug: bayes: Database connection established
debug: bayes: found bayes db version 3

If everything looks OK, stop debugging with [Ctrl]+c and start amavisd-new :

amavisd-new start

If you would like to see how many messages have been learned and how many tokens are in the database :

su amavis -c 'sa-learn --dump magic'

pour ameliorer les performances on a desactivé le "opportunistic (automatic) Bayes auto-expiry" en specifiant "bayes_auto_expire 0" dans le local.cf
Il faut donc creer une tache cron pour effectuer ceci :
su -c 'sa-learn --force-expire --sync' amavis

un crontabs de l’user amavis par exemple :

16 3 * * * sa-learn --sync --force-expire

Voir en ligne : http://www200.pair.com/mecham/spam/...

SPIP | | Plan du site | Suivre la vie du site RSS 2.0
Habillage visuel © digitalnature sous Licence GPL