Summary: | Migrate to rspamd | | |
---|---|---|---|
Product: | Infrastructure | Reporter: | Michael Tremer <michael.tremer> |
Component: | Mail & Mailing Lists | Assignee: | Peter Müller <peter.mueller> |
Status: | CLOSED FIXED | QA Contact: | Peter Müller <peter.mueller> |
Severity: | - Unknown - | | |
Priority: | - Unknown - | CC: | jonatan.schlag |
Version: | unspecified | | |
Hardware: | unspecified | | |
OS: | Unspecified | | |
See Also: | https://bugzilla.ipfire.org/show_bug.cgi?id=11703 | | |
Bug Depends on: | | | |
Bug Blocks: | 11634 | | |
Description
Michael Tremer
2018-03-16 19:31:24 UTC
Currently setting up rspamd in my own infrastructure. As soon as I have everything running, I'll migrate ipfire.org to rspamd, too.

I watched the talk, but I cannot see what has convinced you so suddenly. I agree that rspamd is the way to go, but generally didn't find the talk mind-blowing.

I am especially skeptical about the self-learning approach and whether it is going to work for us, given that we don't receive a shit ton of email a day. We might just see a certain kind of spam once or twice, but not the hundreds of times needed to identify something as a repeated email following a pattern.

But I guess we will see about that. In the end, we do not see that much spam anyway, so even without a spam filter this is manageable. I would just suppose that we would rely on an external source of rules to fight spam.

However, what I want to say is that I do not think we need to run a setup that has rspamd in passive mode and then amavis to do the actual filtering. We should be fine to swap it straight away.

(In reply to Michael Tremer from comment #2)
> I watched the talk but I cannot see what has convinced you so suddenly. I agree
> that rspamd is the way to go, but generally didn't find the talk mind-blowing.

Well, there were actually several points:

(a) Rspamd does not use strict rules as Spamassassin does. I am not an expert when it comes to neural networks, but it seems logical to me that collecting all mail data first and classifying on the whole set provides better recognition of spam/ham. We'll see how things develop here.

(b) It is the only stable software I know of that supports ARC.

(c) It actually makes sense to include authentication mechanisms in spam/phishing detection. Although authentication (DKIM, SPF) is not intended to provide protection against spam, it might provide protection against phishing. Previously, we had several milters which did not share their results, and things tended to be more complicated that way.

(d) Dynamic phishing detection (via OpenPhish, Phishtank) is included.

(e) We have IP scoring here, and DNS(B|W)L lookups are no longer treated strictly, which more or less eliminates false-positive side effects when a sender IP is listed in a DNSBL.

(f) There is selective greylisting. ;-)

I agree that none of those points is really outstanding, but the sum looks a lot better than Amavis, Spamassassin plus some milters for authentication. Besides, some of them used to crash on my machine, and rspamd doesn't. But that is only a nice side effect.

> I am especially skeptical about the self-learning approach and if that is going
> to work for us that we don't receive a shit ton of email a day. We might just
> see a certain kind of spam once or twice, but not hundreds of times to even
> identify something as a repeated email following a pattern.

Good point. My feeling is that rspamd is designed for quite big setups. However, things like DCC might help here.

> But I guess we will see about that. In the end we do not see that much spam any
> ways so even without a spam filter this is manageable. I just would suppose that
> we would rely on an external source of rules to fight spam.
>
> However, what I want to say is, that I do not think that we need to run a setup
> that has rspamd in passive mode and then amavis to do some actual filtering. We
> should be fine to swap it straight away.

All right. I am still testing rspamd here (currently trying to get DKIM signing to work, and sometimes milter headers are not added yet), but things look good so far.

(In reply to Peter Müller from comment #3)
> (a) Rspamd does not use strict rules as Spamassassin does. I am not an
> expert when it comes to neural networks, but it seems logical to me that
> collecting all mail data first and do classifying on the whole set provides
> better recognition of spam/ham. We'll see how things develop here.

I haven't looked at how this is done precisely, but it is very unlikely that it is a neural network. It will be something like an SVM (Support Vector Machine): you create some sort of function that puts each email into a higher-dimensional space. Then you try to draw a plane between all the emails so that all spam emails are on one side and all ham emails are on the other.

In that approach, the function that puts the emails into the space is static, while the plane is variable and gets moved with every email that you receive, hopefully less and less the more information you have and the more clearly the two classes can be distinguished.

Not sure how flexible this approach is. It is just some supervised learning. It should be quite effective when you know what you are looking for. The problem with spam is of course that you don't always know that, and it seems to become trickier by the day, even for a human.

Being able to look at all emails in the world, it would be possible to train this to be perfect. The question is how good it can get with only a limited amount of training.

But an advantage would be that you can of course change this dynamically, unlike the static rules from SA. But are SA's rules static when they are updated once a day? Will rspamd learn faster than that?

Do we have a schedule for this week?

I have already installed the sieve hooks and have been training the spam filter for about a week. Haven't let it learn any ham.

(In reply to Michael Tremer from comment #5)
> Do we have a schedule for this week?

Sorry for the late reply. Things are quite busy here, but I'll try to take care of this on Thursday or so.

So far I am not experiencing any major issues with rspamd here (on 1.7.2 now), except that it is not sending DMARC reports at the moment, but this looks like a configuration error. Currently testing some additional RBLs such as NiXSpam or blocklist.de for additional spam detection.

> I have already installed the sieve hooks and have been training the spam
> filter for about a week. Haven't let it learn any ham.

Good, thank you.

As agreed on today's telephone conference, please set up the configuration so that we can switch over next week. Should we say Monday, or when would you like to do this?

Except for the fuzzy store and DMARC reports, the rspamd setup is working here. Will push my configuration files to the mail machine over the weekend.

Uploaded config files to mail01.

What is stopping us from using the DKIM signing? Any problems with that?

Migration done. This seems to be working perfectly so far.

I had to create /var/lib/rspamd/dmarc_reports_last_sent and give permissions to rspamd to send out any reports.

I also set the log level to error.

This guide is using the same keys for ARC as are used for DKIM: https://thomas-leister.de/en/mailserver-debian-stretch/ Any comments on this?

(In reply to Michael Tremer from comment #12)
> I had to create /var/lib/rspamd/dmarc_reports_last_sent and give permissions
> to rspamd to send out any reports.

Sorry, I was missing that one. :-|

> I also set the log level to error.
>
> This guide is using the same keys for ARC as are used for DKIM:
> https://thomas-leister.de/en/mailserver-debian-stretch/ Any comments on this?

Thanks. I'll have a look at the weekend.

Removed some smtpd_recipient_restrictions so rspamd is going to see more spam. Let me know in case anything goes wrong here...

Enabled ARC module, but not sure if it is working well...

Did not experience any increase in spam so far. No idea about ARC either. Should we expect rejected emails from people who enforce DMARC?

ARC should be fixed now (the keys were expected at a different location):

    2018-04-20 17:35:43 #2766(rspamd_proxy) <0bcd38>; arc; arc.lua:514: file /var/lib/rspamd/arc/ipfire.org.201801.key does not exists

Good English, eh?
:-)

Yes, I was considering creating a symlink and just copying the DKIM configuration file :) Does it have the same selector bug?

Set RCPT restrictions to:

    # SMTPD RCPT restrictions
    smtpd_recipient_restrictions =
        # do not accept crappy mails
        reject_non_fqdn_sender,
        reject_non_fqdn_recipient,
        reject_unknown_sender_domain,
        reject_unknown_recipient_domain,
        # permit my networks
        permit_mynetworks,
        # reject unauth destination
        reject_unauth_destination,
        # reject non-existent recipients
        reject_unverified_recipient,
        permit

That way, (nearly) any mail is passed to rspamd. Let me know if anything goes wrong here.

(In reply to Peter Müller from comment #19)
> Set RCPT restrictions to:

Sorry, that was the wrong example. Here is the correct one:

    # recipient restrictions (be careful here)
    smtpd_recipient_restrictions =
        # do not allow crappy mails
        reject_non_fqdn_sender,
        reject_non_fqdn_recipient,
        reject_unknown_sender_domain,
        reject_unknown_recipient_domain,
        # permit our children :-)
        permit_mynetworks,
        permit_sasl_authenticated,
        # define some whitelisting here (should be unnecessary, but we never know)
        check_client_access hash:/etc/postfix/access_ip,
        check_helo_access hash:/etc/postfix/access_ehlo,
        check_sender_access hash:/etc/postfix/access_domain,
        # never allow relaying
        reject_unauth_destination,
        # and block mails to non-existent users
        reject_unverified_recipient,
        reject_unlisted_recipient,
        permit

> That way, (nearly) any mail is passed to rspamd. Let me know if anything
> goes wrong here.

For the record: rspamd is basically set up without any major issues (except for the problem of signing mails with ARC, which is filed under #11703). The DKIM selector bug can be found here: https://github.com/vstakhov/rspamd/issues/2188

Some spam messages are still coming through, but they are quite hard to filter, since most of them use *.gmail.com and are already learned as spam by the Bayes classifier. In case they disturb too much, please file them in a separate ticket.

Closing this (if anybody disagrees, please reopen. :-) ).

Thanks for working on this. This went really smoothly! Great job!
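
The separating-plane idea described earlier in this thread (a static function that places each email in a space, plus a plane that is nudged with every new email) can be sketched in a few lines of Python. This is a minimal online perceptron, which is a simpler relative of an SVM and is not rspamd's actual classifier. The vocabulary, the `embed` function, the `Perceptron` class and the sample messages are all made up for illustration.

```python
# Illustrative only: a tiny online perceptron, NOT rspamd's classifier.
# VOCAB and the training messages are hypothetical.

VOCAB = ["viagra", "lottery", "winner", "patch", "firewall", "kernel"]


def embed(text):
    """Static function: map an email to a point (word counts over VOCAB)."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]


class Perceptron:
    def __init__(self, dim):
        self.w = [0.0] * dim  # normal vector of the separating plane
        self.b = 0.0          # offset of the plane

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def predict(self, x):
        return "spam" if self.score(x) > 0 else "ham"

    def learn(self, x, label):
        """Move the plane only if it currently misclassifies x."""
        y = 1 if label == "spam" else -1
        if y * self.score(x) <= 0:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
            self.b += y
            return True   # plane moved
        return False      # plane unchanged


TRAINING = [
    ("lottery winner claim your prize", "spam"),
    ("kernel patch for the firewall", "ham"),
    ("cheap viagra lottery", "spam"),
    ("firewall kernel update patch", "ham"),
]

model = Perceptron(len(VOCAB))
for _ in range(10):  # a few passes over the tiny data set
    moved = [model.learn(embed(text), label) for text, label in TRAINING]
    if not any(moved):
        break  # the plane separates all training mails; stop adjusting

print(model.predict(embed("you are a lottery winner")))   # spam
print(model.predict(embed("new kernel patch released")))  # ham
```

With this toy data the plane stops moving after the first pass, which mirrors the point made above: the clearer the two classes are, the fewer adjustments are needed. With only a handful of training mails, anything outside the vocabulary scores zero and falls through as ham.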