Bug 11676

Summary:	Migrate to rspamd
Product:	Infrastructure
Reporter:	Michael Tremer <michael.tremer>
Component:	Mail & Mailing Lists
Assignee:	Peter Müller <peter.mueller>
Status:	CLOSED FIXED
QA Contact:	Peter Müller <peter.mueller>
Severity:	- Unknown -
Priority:	- Unknown -
CC:	jonatan.schlag
Version:	unspecified
Hardware:	unspecified
OS:	Unspecified
See Also:	https://bugzilla.ipfire.org/show_bug.cgi?id=11703
Bug Depends on:
Bug Blocks:	11634

Description Michael Tremer 2018-03-16 19:31:24 UTC

At Chemnitzer Linux-Tage, Peter has been convinced to use rspamd instead of amavisd, opendkim/opendmarc, etc.

This ticket will track the progress of this.

rspamd has been installed on the mail server and redis is set up, too. The daemon starts, but nothing is configured, yet.

Please migrate our current configuration and set up rspamd as a milter. It should be possible to test this easily by setting up another instance of smtpd.

As soon as this is done and tested, we can just swap the milter configuration and remove the existing milters.

Comment 1 Peter Müller 2018-03-21 16:05:07 UTC

I am currently setting up rspamd in my own infrastructure. As soon as I have everything running, I'll migrate ipfire.org to rspamd, too.

Comment 2 Michael Tremer 2018-03-21 16:45:05 UTC

I watched the talk but I cannot see what has convinced you so suddenly. I agree
that rspamd is the way to go, but generally didn't find the talk mind-blowing.

I am especially skeptical about the self-learning approach and whether that is
going to work for us, given that we don't receive a shit ton of email a day. We
might just see a certain kind of spam once or twice, but not the hundreds of
times needed to identify something as a repeated email following a pattern.

But I guess we will see about that. In the end, we do not see that much spam
anyway, so even without a spam filter this is manageable. I would just suppose
that we would rely on an external source of rules to fight spam.

However, what I want to say is that I do not think that we need to run a setup
that has rspamd in passive mode and then amavis to do some actual filtering. We
should be fine to swap it straight away.

Comment 3 Peter Müller 2018-03-24 15:15:03 UTC

(In reply to Michael Tremer from comment #2)
> I watched the talk but I cannot see what has convinced you so suddenly. I agree
> that rspamd is the way to go, but generally didn't find the talk mind-blowing.
Well, there were actually several points:
(a) Rspamd does not use strict rules as SpamAssassin does. I am not an expert when it comes to neural networks, but it seems logical to me that collecting all mail data first and then classifying the whole set provides better recognition of spam/ham. We'll see how things develop here.
(b) It is the only stable software I know of that supports ARC.
(c) It actually makes sense to include authentication mechanisms in spam/phishing detection. Although authentication (DKIM, SPF) is not intended to provide protection against spam, it might provide protection against phishing. Previously, we had several milters which did not share their results, and things tended to be more complicated.
(d) Dynamic phishing detection (via OpenPhish, PhishTank) is included.
(e) We have IP scoring here, and DNS(B|W)L lookups are no longer treated strictly, which more or less eliminates false-positive side effects when a sender IP is listed in a DNSBL.
(f) There is selective greylisting. ;-)
(f) There is selective greylisting. ;-)

I agree none of those points is really outstanding, but taken together they look a lot better than Amavis, SpamAssassin + some milters for authentication. Besides, some of them used to crash on my machine, and rspamd doesn't. But that is only a nice side effect.
> 
> I am especially skeptical about the self-learning approach and whether that is
> going to work for us, given that we don't receive a shit ton of email a day. We
> might just see a certain kind of spam once or twice, but not the hundreds of
> times needed to identify something as a repeated email following a pattern.
Good point. My feeling is that rspamd is designed for quite big setups. However, things like DCC might help here.
> 
> But I guess we will see about that. In the end, we do not see that much spam
> anyway, so even without a spam filter this is manageable. I would just suppose
> that we would rely on an external source of rules to fight spam.
> 
> However, what I want to say is that I do not think that we need to run a setup
> that has rspamd in passive mode and then amavis to do some actual filtering. We
> should be fine to swap it straight away.
All right.

I am still testing rspamd here (currently trying to get DKIM signing to work, and sometimes, milter headers are not added, yet), but things look good so far.

Comment 4 Michael Tremer 2018-03-26 15:31:10 UTC

(In reply to Peter Müller from comment #3)
> (a) Rspamd does not use strict rules as SpamAssassin does. I am not an
> expert when it comes to neural networks, but it seems logical to me that
> collecting all mail data first and then classifying the whole set provides
> better recognition of spam/ham. We'll see how things develop here.

I haven't looked at how this is done precisely, but it is very unlikely that it is a neural network. It will be something like an SVM (Support Vector Machine): you create some sort of function that puts each email into a higher-dimensional space. Then you try to draw a plane between all the emails so that all spam emails are on one side and all ham emails are on the other side.

That approach would have a static function to put the emails into the space, and the plane would be variable: it is moved around with every email that you receive, but hopefully less and less the more information you have and the clearer the two classes are to distinguish.
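To make that picture a bit more concrete, here is a minimal sketch in Python. It is not a description of rspamd's internals, and it uses a plain perceptron rather than a real SVM simply because that fits in a few lines. The feature words, the class and function names, and the training mails are all made up for illustration.

```python
# Hypothetical feature words; a real filter would use far richer features.
FEATURE_WORDS = ("viagra", "lottery", "invoice", "patch", "kernel")


def embed(text):
    """Static function: map an email to a point in feature space."""
    words = text.lower().split()
    return [float(words.count(w)) for w in FEATURE_WORDS]


class Perceptron:
    """Online linear classifier: the plane w.x + b = 0 splits spam from ham."""

    def __init__(self, dims):
        self.w = [0.0] * dims
        self.b = 0.0

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def is_spam(self, text):
        return self.score(embed(text)) > 0

    def learn(self, text, spam):
        """Move the plane only if this email is on the wrong side of it."""
        x = embed(text)
        y = 1.0 if spam else -1.0
        if y * self.score(x) <= 0:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y
            return True   # plane moved
        return False      # already classified correctly


def train(model, samples, max_epochs=20):
    """Repeat over the training set until the plane stops moving."""
    for _ in range(max_epochs):
        moved = sum(model.learn(text, spam) for text, spam in samples)
        if moved == 0:
            break
    return model


# Made-up training mails: (text, is_spam)
TRAINING = [
    ("win the lottery now", True),
    ("cheap viagra lottery", True),
    ("kernel patch for review", False),
    ("invoice for the kernel patch", False),
]

model = train(Perceptron(len(FEATURE_WORDS)), TRAINING)
```

The plane only moves when a mail lands on the wrong side of it, so corrections should become rarer as more mail is seen.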

Not sure how flexible this approach is. It is just some supervised learning. It should be quite effective when you know what you are looking for. The problem with spam is of course that you don't always know that, and it seems to become trickier by the day even for a human to do this. If you were able to look at all emails in the world, it would be possible to train this to be perfect. The question would be how good it can get with only a limited amount of training.

But an advantage would be that you can of course change this dynamically, unlike the static rules from SA. But are the SA rules really static when they are updated once a day? Will rspamd learn faster than that?

Comment 5 Michael Tremer 2018-04-03 18:36:51 UTC

Do we have a schedule for this week?

I have already installed the sieve hooks and have been training the spam filter for about a week. Haven't let it learn any ham.

Comment 6 Peter Müller 2018-04-09 19:19:20 UTC

(In reply to Michael Tremer from comment #5)
> Do we have a schedule for this week?
Sorry for the late reply. Things are quite busy here, but I'll try to take care of this on Thursday or so.

So far I am not experiencing any major issues with rspamd here (on 1.7.2 now), except that it is not sending DMARC reports at the moment, but this looks like a configuration error.

Currently testing some additional RBLs such as NiXSpam or blocklist.de for additional spam detection.
> 
> I have already installed the sieve hooks and have been training the spam
> filter for about a week. Haven't let it learn any ham.
Good, thank you.

Comment 7 Michael Tremer 2018-04-10 00:12:32 UTC

As agreed on today's telephone conference, please set up the
configuration so that we can switch over next week.

Should we say Monday or when would you like to do this?

Comment 8 Peter Müller 2018-04-13 17:10:21 UTC

Except for fuzzy store and DMARC reports, the rspamd setup is working here.

Will push my configuration files to the mail machine over the weekend.

Comment 9 Peter Müller 2018-04-15 21:25:15 UTC

Uploaded config files to mail01.

Comment 10 Michael Tremer 2018-04-16 12:29:13 UTC

What is stopping us from using DKIM signing? Any problems with that?

Comment 11 Michael Tremer 2018-04-17 13:19:05 UTC

Migration done. This seems to be working perfectly so far.

Comment 12 Michael Tremer 2018-04-17 14:03:32 UTC

I had to create /var/lib/rspamd/dmarc_reports_last_sent and give permissions to rspamd to send out any reports.
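For anyone repeating that manual step, a minimal sketch in Python. The path is the one named above; the helper name, the account name `rspamd`, and the file mode are assumptions that depend on how the package was installed.

```python
import shutil
from pathlib import Path


def ensure_state_file(path, owner=None, group=None, mode=0o640):
    """Create an empty state file if it is missing and set its mode/owner."""
    state_file = Path(path)
    # touch() leaves the contents of an existing file untouched.
    state_file.touch(exist_ok=True)
    state_file.chmod(mode)
    # Changing ownership normally requires root, so it is optional here.
    if owner is not None or group is not None:
        shutil.chown(state_file, user=owner, group=group)
    return state_file


# Example call (needs root; "rspamd" as the account name is an assumption,
# some packages use a different user such as "_rspamd"):
# ensure_state_file("/var/lib/rspamd/dmarc_reports_last_sent",
#                   owner="rspamd", group="rspamd")
```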

I also set the log level to error.

This guide is using the same keys for ARC as are used for DKIM: https://thomas-leister.de/en/mailserver-debian-stretch/ Any comments on this?

Comment 13 Peter Müller 2018-04-18 06:36:04 UTC

(In reply to Michael Tremer from comment #12)
> I had to create /var/lib/rspamd/dmarc_reports_last_sent and give permissions
> to rspamd to send out any reports.
Sorry, I was missing that one. :-|
> 
> I also set the log level to error.
> 
> This guide is using the same keys for ARC as are used for DKIM:
> https://thomas-leister.de/en/mailserver-debian-stretch/ Any comments on this?
Thanks. I'll have a look at the weekend.

Comment 14 Peter Müller 2018-04-19 21:21:41 UTC

Removed some smtpd_recipient_restrictions so rspamd is going to see more spam. Let me know in case anything goes wrong here...

Comment 15 Peter Müller 2018-04-19 21:42:04 UTC

Enabled ARC module, but not sure if working well...

Comment 16 Michael Tremer 2018-04-20 12:25:28 UTC

Did not experience any increase in spam so far. No idea about ARC either. Should
we expect rejected emails from people who enforce DMARC?

Comment 17 Peter Müller 2018-04-20 18:38:00 UTC

ARC should be fixed now (the keys were expected at a different location):

2018-04-20 17:35:43 #2766(rspamd_proxy) <0bcd38>; arc; arc.lua:514: file /var/lib/rspamd/arc/ipfire.org.201801.key does not exists

Good English, eh? :-)

Comment 18 Michael Tremer 2018-04-20 20:59:31 UTC

Yes, I was considering creating a symlink and just copying the DKIM
configuration file :) Does it have the same selector bug?

Comment 19 Peter Müller 2018-04-21 18:08:47 UTC

Set RCPT restrictions to:

# SMTPD RCPT restrictions
smtpd_recipient_restrictions =
# do not accept crappy mails
	reject_non_fqdn_sender,
	reject_non_fqdn_recipient,
	reject_unknown_sender_domain,
	reject_unknown_recipient_domain,
# permit my networks
	permit_mynetworks,
# reject unauth destination
	reject_unauth_destination,
# reject non-existent recipients
	reject_unverified_recipient,
	permit

That way, (nearly) any mail is passed to rspamd. Let me know if anything goes wrong here.

Comment 20 Peter Müller 2018-04-21 18:09:32 UTC

(In reply to Peter Müller from comment #19)
> Set RCPT restrictions to:
> 
Sorry, that was the wrong example. Here is the correct one:

# recipient restrictions (be careful here)
smtpd_recipient_restrictions = 
# do not allow crappy mails
	reject_non_fqdn_sender,
	reject_non_fqdn_recipient,
	reject_unknown_sender_domain,
	reject_unknown_recipient_domain,
# permit our children :-)
	permit_mynetworks,
	permit_sasl_authenticated,
# define some whitelisting here (should be unnecessary, but we never know)
	check_client_access hash:/etc/postfix/access_ip,
	check_helo_access hash:/etc/postfix/access_ehlo,
	check_sender_access hash:/etc/postfix/access_domain,
# never allow relaying
	reject_unauth_destination,
# and block mails to non-existent users
	reject_unverified_recipient,
	reject_unlisted_recipient,
	permit

> 
> That way, (nearly) any mail is passed to rspamd. Let me know if anything
> goes wrong here.

Comment 21 Peter Müller 2018-04-30 19:33:28 UTC

For the record: rspamd is basically set up (except the problem of signing mails with ARC, which is filed under #11703) without any major issues.

The DKIM selector bug can be found here: https://github.com/vstakhov/rspamd/issues/2188

Some spam messages are still coming through, but they are quite hard to filter, since most of them use *.gmail.com and are already learned as spam by the Bayes classifier. If they become too disruptive, please file them in a separate ticket.

Closing this (if anybody disagrees, please reopen. :-) ).

Comment 22 Michael Tremer 2018-04-30 21:16:15 UTC

Thanks for working on this. This went really smoothly! Great job!