Bug 12078

Summary: IPS blocks Cloudflare DNS traffic over time
Product: IPFire
Reporter: Wayne <mentalic>
Component: ---
Assignee: Stefan Schantl <stefan.schantl>
Status: CLOSED FIXED
QA Contact:
Severity: Major Usability
Priority: Will affect most users
CC: ipfb, michael.tremer, peter.mueller, stefan.schantl
Version: 2   
Hardware: all   
OS: Unspecified   
See Also: https://bugzilla.ipfire.org/show_bug.cgi?id=12087
Bug Depends on:    
Bug Blocks: 12052, 12108    

Description Wayne 2019-05-13 19:22:34 UTC
After running IPS on Red interface for a few hours I noticed that some sites stopped working, for example these two:

https://www.ipfire.org/
https://en.wikipedia.org/wiki/Inverse-square_law

Disabling IPS on the Red interface restored normal access to the two sites listed. After I re-enabled it on Red, access still worked at first but failed again after a while. I did not measure how long it takes to fail, but it is a matter of minutes.

Tried disabling all Rulesets and that did not restore normal access.

Reinstalled 131 ISO with same result.
Comment 1 Wayne 2019-05-17 17:32:13 UTC
More info:

- Noticed that rDNS shows as failed on the Status > Network GUI page when I cannot access some sites. For example, I cannot ping ipfire.org by name.

- Disabling IPS on the Red interface restores the rDNS status to normal. The web browser then works, as does pinging ipfire.org by name.

I'm using Cloudflare's 1.1.1.1/1.0.0.1 DNS.
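To tell whether a specific upstream resolver is still answering, it can be queried directly over UDP, bypassing whatever resolver the client normally uses. The following is a minimal sketch using only the Python standard library; the function names (`build_query`, `reply_ok`, `resolver_answers`) and the fixed transaction ID are assumptions for illustration, and the packet layout follows RFC 1035.

```python
import socket
import struct

TXID = 0x1234  # arbitrary transaction ID, chosen for this sketch


def build_query(name, qtype=1, txid=TXID):
    """Build a minimal DNS query for `name` (qtype 1 = A, 12 = PTR)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def reply_ok(reply, txid=TXID):
    """True if `reply` is a response to `txid` with RCODE 0 (NOERROR)."""
    if len(reply) < 12:
        return False
    rid, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return rid == txid and is_response and rcode == 0


def resolver_answers(server, name, timeout=3.0):
    """Send one A query for `name` to `server` and report whether it answered."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable errors
            return False
    return reply_ok(reply)
```

Calling `resolver_answers("1.1.1.1", "ipfire.org")` and `resolver_answers("1.0.0.1", "ipfire.org")` once with the IPS enabled and once with it disabled would show whether replies from those servers stop arriving. A `False` result only says that no valid reply came back in time; it does not say where the packet was lost.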

Regards
Wayne
Comment 2 Michael Tremer 2019-05-17 19:14:43 UTC
Hey Wayne,

we need to find out if you have some rules there that are blocking any traffic. I cannot imagine that the IPS itself would be losing packets. I haven't seen that before.

So what is in fast.log and what is in stats.log?
Comment 3 Wayne 2019-05-17 23:15:45 UTC
Michael,

Nothing in the fast.log for the period from when I restart the IPS until I cannot access some sites.
Lots of stuff in the stats.log, but I'm not sure what I'm looking at here.

Counter                                       | TM Name                   | Value
------------------------------------------------------------------------------------
decoder.pkts                                  | Total                     | 1721
decoder.bytes                                 | Total                     | 457405
decoder.ipv4                                  | Total                     | 1721
decoder.tcp                                   | Total                     | 1085
decoder.udp                                   | Total                     | 514
decoder.icmpv4                                | Total                     | 122
decoder.avg_pkt_size                          | Total                     | 265
decoder.max_pkt_size                          | Total                     | 1500
flow.tcp                                      | Total                     | 227
flow.udp                                      | Total                     | 261
flow.icmpv4                                   | Total                     | 1
tcp.sessions                                  | Total                     | 225
tcp.invalid_checksum                          | Total                     | 1
tcp.syn                                       | Total                     | 349
tcp.synack                                    | Total                     | 487
tcp.rst                                       | Total                     | 2
tcp.pkt_on_wrong_thread                       | Total                     | 377
app_layer.flow.dns_udp                        | Total                     | 254
app_layer.tx.dns_udp                          | Total                     | 254
app_layer.flow.failed_udp                     | Total                     | 7
ips.accepted                                  | Total                     | 1234
ips.blocked                                   | Total                     | 489
flow_mgr.new_pruned                           | Total                     | 186
flow_mgr.est_pruned                           | Total                     | 190
flow.spare                                    | Total                     | 10000
flow_mgr.flows_checked                        | Total                     | 2
flow_mgr.flows_notimeout                      | Total                     | 2
flow_mgr.rows_checked                         | Total                     | 65536
flow_mgr.rows_skipped                         | Total                     | 65529
flow_mgr.rows_empty                           | Total                     | 5
flow_mgr.rows_maxlen                          | Total                     | 1
tcp.memuse                                    | Total                     | 2293760
tcp.reassembly_memuse                         | Total                     | 196608
dns.memuse                                    | Total                     | 6656
flow.memuse                                   | Total                     | 7269568
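
For anyone else unsure how to read this dump, the counters can be pulled into a dictionary and compared. The sketch below assumes the three-column `Counter | TM Name | Value` layout shown above; the function names `parse_stats` and `blocked_share` are made up for this example. It reports the share of IPS verdicts that were blocks.

```python
from typing import Dict


def parse_stats(text: str) -> Dict[str, int]:
    """Parse 'Counter | TM Name | Value' rows into {counter: value}."""
    counters = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Skip the header row, the dashed separator, and anything malformed
        if len(parts) == 3 and parts[2].isdigit():
            counters[parts[0]] = int(parts[2])
    return counters


def blocked_share(counters: Dict[str, int]) -> float:
    """Fraction of IPS verdicts that were blocks (0.0 if there were none)."""
    accepted = counters.get("ips.accepted", 0)
    blocked = counters.get("ips.blocked", 0)
    total = accepted + blocked
    return blocked / total if total else 0.0
```

With the values above (`ips.accepted` 1234, `ips.blocked` 489), `blocked_share` comes out at roughly 0.28, so a bit over a quarter of the verdicts in this snapshot were blocks even though fast.log shows no matching alerts.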
Comment 4 Michael Tremer 2019-05-17 23:18:05 UTC
(In reply to Wayne from comment #3)
> app_layer.flow.dns_udp                        | Total                     | 254
> app_layer.tx.dns_udp                          | Total                     | 254
> app_layer.flow.failed_udp                     | Total                     | 7

Stefan, do you think this line could suggest dropped DNS requests?
Comment 5 Wayne 2019-05-20 01:43:56 UTC
Switched DNS provider from Cloudflare to Verisign and the problem has been resolved.
Comment 6 Michael Tremer 2019-05-20 11:03:30 UTC
Hmm, very interesting.

I will close this ticket then and maybe someone will re-open if they have the same issue.
Comment 7 Wayne 2019-05-20 15:00:36 UTC
Perhaps a different ticket is needed for why Cloudflare no longer works. Others on the forum are seeing the same problem: with Cloudflare DNS and the IPS enabled, some sites are blocked.
Comment 8 Michael Tremer 2019-05-20 15:11:22 UTC
(In reply to Wayne from comment #7)
> Perhaps a different ticket for why Cloudflare no longer works. Others on the
> forum are seeing the same problem with Cloudflare + IPS enabled blocks some
> sites.

Yes, I can confirm. A customer had the same issue. Nothing in the logs.

The workaround is not to use Cloudflare.
Comment 9 Michael Tremer 2019-05-23 09:39:58 UTC
I have a follow up on this:

We have failed to receive some emails on our IPFire mail server. The reason was that Suricata interrupted the connection after the DATA command, but only for emails that came in without TLS. Those emails also needed to have an attachment.

There was a rule activated in our ruleset that would have filtered this, but nothing was logged.

We have a preprocessor for DNS and SMTP. I *think* there is a chance that when the preprocessor finds a problem in a packet and drops it, the drop is not logged. The fast.log seems to be incomplete, but I could not set up EVE logging, probably because our Suricata is compiled without support for JSON.

This is quite a serious issue now and debugging it is quite hard. The only workaround is to disable Suricata entirely, which is itself a security risk.

Also see https://bugzilla.ipfire.org/show_bug.cgi?id=12087 which is kind of similar to this problem.
Comment 10 Stefan Schantl 2019-06-26 21:05:06 UTC
Hello Wayne,

is this issue still present with core update 133 ?

Thanks in advance,

-Stefan
Comment 11 Wayne 2019-06-26 22:25:36 UTC
Hi Stefan

Just checked by switching DNS to Cloudflare's 1.1.1.1/1.0.0.1. Yes, the problem is still present in Core Update 133. Sites work initially, but 30 minutes later seemingly random sites no longer work.
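
Since the failure only appears after some time, a small polling loop can record when names stop resolving. This is a rough sketch, not a tested diagnostic: the function names, the 5 minute interval, and the round count are assumptions, and it resolves through whatever resolver the machine is configured to use, so a local cache may hide failures for a while.

```python
import socket
import time


def failing_hosts(hosts, resolve=socket.gethostbyname):
    """Return the subset of `hosts` that currently fail to resolve."""
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed


def monitor(hosts, interval=300.0, rounds=12,
            resolve=socket.gethostbyname, sleep=time.sleep):
    """Check `hosts` every `interval` seconds; return [(timestamp, failed)]."""
    log = []
    for i in range(rounds):
        log.append((time.time(), failing_hosts(hosts, resolve)))
        if i < rounds - 1:
            sleep(interval)
    return log
```

Running `monitor(["www.ipfire.org", "en.wikipedia.org"])` right after enabling the IPS would give a timeline showing the first round in which a lookup failed.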

Regards
Wayne
Comment 12 Peter Müller 2019-06-27 13:18:54 UTC
I can confirm this bug is still valid using other DNS servers as well.

There are also some ICMP issues here (Nagios check_ping does not work),
so I guess things might be related to Suricata preprocessors dropping
packets.

Trying to investigate here over the weekend, let me know if there is
something to test. :-)
Comment 13 Michael Tremer 2019-07-05 13:53:07 UTC
Hello guys,

can I ask you to investigate all of this again? I am not sure what we know about this problem and what we do not know. However, it is still affecting a large number of users, and I would be happy to ship a fix with the next Core Update so that enabling the IPS no longer has any unwanted side effects.
Comment 14 Peter Müller 2019-10-13 10:57:57 UTC
As far as I am concerned, this was caused by a kernel bug and affected much more than just Cloudflare DNS.

It will be fixed in upcoming Core Update 137: https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=415969cc1b8edd06ee84375614c4eb06cf182d36

Please refer to https://lists.ipfire.org/pipermail/development/2019-September/006244.html for mailing list conversation on this.
Comment 16 Wayne 2019-10-29 16:25:25 UTC
Loaded up the 137 version: IPFire 2.23 (x86_64) - Development Build: master/f48920d8

Can confirm Cloudflare DNS still fails reverse lookup, and some sites do not load.
Comment 17 Michael Tremer 2019-10-29 16:59:15 UTC
(In reply to Wayne from comment #16)
> Can confirm Cloudflare DNS still fails reverse lookup, some sites do not load.

Reverse lookup? Can you explain? This wasn't mentioned here before...
Comment 18 Wayne 2019-10-29 17:12:01 UTC
> Reverse lookup? Can you explain? This wasn't mentioned here before...

See comment #1
rDNS fails.
Comment 19 Michael Tremer 2019-10-30 21:34:04 UTC
(In reply to Wayne from comment #18)
> Reverse lookup? Can you explain? This wasn't mentioned here before...

Okay, I get you. Why would these be any different than regular DNS queries?
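
For reference, a reverse lookup is an ordinary DNS query of type PTR for a name derived from the IP address. A minimal sketch using Python's standard `ipaddress` module (the helper name `ptr_name` is made up for this example):

```python
import ipaddress


def ptr_name(address: str) -> str:
    """Return the DNS name that a reverse (PTR) lookup of `address` queries."""
    return ipaddress.ip_address(address).reverse_pointer
```

For example, `ptr_name("1.0.0.1")` gives `1.0.0.1.in-addr.arpa`. The query uses the same UDP port 53 transport as a forward lookup; only the record type and the queried name differ.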
Comment 20 Peter Müller 2019-12-11 20:20:54 UTC
By the way: https://blog.ipfire.org/post/ipfire-2-23-core-update-137-released
Comment 21 Peter Müller 2019-12-16 16:24:40 UTC
Is this bug still valid?
Comment 22 Michael Tremer 2019-12-16 17:25:55 UTC
I cannot say for certain, but some people have confirmed that this works okay now.

> https://lists.ipfire.org/pipermail/development/2019-December/006743.html