Bug 12548

Summary: Suricata 6.x causes high CPU load in Core Update 153 (testing)
Product: IPFire
Reporter: Fred Kienker <fkienker>
Component: ---
Assignee: Stefan Schantl <stefan.schantl>
Status: CLOSED FIXED
QA Contact:
Severity: Major Usability
Priority: Will affect an average number of users
CC: adolf.belka, matthias.fischer, michael.tremer, peter.mueller
Version: 2   
Hardware: unspecified   
OS: Unspecified   
See Also: https://bugzilla.ipfire.org/show_bug.cgi?id=12536
Attachments:
- Load graph
- Load graph for 5.0.5 (idle)
- Load graph for patched 6.0.1 (idle)
- 6.0.1 patched with usleep(5000)
- Startup sequence: suricata 6.0.8
- CPU load during the last hour - idle

Description Fred Kienker 2020-12-12 00:51:32 UTC
Created attachment 815 [details]
Load graph

After updating to C153 Testing from C152 Stable, Suricata uses 2 to 3 times the CPU percentage it did previously. Changes show up on the CPU and the Load graphs and consistently put Suricata at the top of the htop list. Systems with plenty of CPU resources are okay but slower systems may overload.
Comment 1 Michael Tremer 2020-12-12 09:39:17 UTC
> https://forum.suricata.io/t/cpu-usage-of-version-6-0-0/706

This is known upstream, but no solution is available, yet.
Comment 2 Matthias Fischer 2020-12-12 10:14:33 UTC
FYI:
As a temporary workaround(?), I pushed a 'downdate' to 'suricata 5.0.5' today.

This version is running here without any change in CPU load. It runs like 5.0.4.

Don't know if we'll like that.

I also updated 'libhtp' (used by 'suricata') to 0.5.36.
Comment 3 Michael Tremer 2020-12-15 16:10:57 UTC
> https://redmine.openinfosecfoundation.org/issues/4096#note-26

I have posted on the upstream bugtracker.

I believe that we might have to pull the release and rebuild it all again with a downgraded suricata.
Comment 4 Michael Tremer 2021-02-15 13:25:25 UTC
A fix has been posted upstream:

> https://github.com/OISF/suricata/pull/5840/commits/17a38f1823adeb9eb059f666686e35509f3a13d2
Comment 5 Matthias Fischer 2021-02-15 13:51:19 UTC
Thanks!

Working on it. ('Devel' is running...)
Comment 6 Matthias Fischer 2021-02-16 10:34:40 UTC
Created attachment 858 [details]
Load graph for 5.0.5 (idle)
Comment 7 Matthias Fischer 2021-02-16 10:36:07 UTC
Tested with 5.0.5 - compared to a patched 6.0.1 - see attachments (idle load).

IMHO there is NO significant improvement. The CPU load is almost as high as before.

Running on Core 153 /x86_64.

Profile ID:
https://fireinfo.ipfire.org/profile/5f68a6360ffbecb6877dcac75f5b8c8030f43ce8
Comment 8 Matthias Fischer 2021-02-16 10:37:29 UTC
Created attachment 859 [details]
Load graph for patched 6.0.1 (idle)
Comment 9 Michael Tremer 2021-02-16 11:41:43 UTC
I posted your findings upstream on the same ticket and requested that it be reopened, since this fix does not actually solve the problem.

I have no idea how we can help the suricata team apart from testing and validating any proposed fixes.
Comment 10 Matthias Fischer 2021-02-16 12:50:36 UTC
Just looked around a bit.

Searching for 'usleep()' turned up a lot of pages declaring this function "obsolete, use nanosleep() instead" - always assuming that usleep is the culprit in this case.

But I have no idea how to change that.
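
For illustration only (a generic sketch of the substitution the man pages describe, not a patch from the Suricata tree; the wrapper name is made up), a usleep() call maps onto nanosleep() like this:

#include <time.h>

/* Sleep for 'usec' microseconds via nanosleep(), the POSIX call the
 * man pages recommend instead of the obsolete usleep(). Sketch only. */
static int sleep_usec(unsigned int usec)
{
    struct timespec ts = {
        .tv_sec  = usec / 1000000,
        .tv_nsec = (long)(usec % 1000000) * 1000,
    };
    return nanosleep(&ts, NULL);
}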
Comment 11 Michael Tremer 2021-02-16 19:11:22 UTC
It looks like our glibc is using nanosleep internally when calling usleep():

> https://git.ipfire.org/?p=thirdparty/glibc.git;a=blob;f=sysdeps/posix/usleep.c;hb=25251c0707fe34f30a27381a5fabc35435a96621
Comment 12 Matthias Fischer 2021-02-17 13:24:59 UTC
Created attachment 860 [details]
6.0.1 patched with usleep(5000)

Based on this note from the upstream ticket:

https://redmine.openinfosecfoundation.org/issues/4096#note-23

"It looks like changing the usleep value to 200 already gives a big improvement, but setting it much higher to something like 5000 gives better results."

I'm now testing with "usleep(5000);".

First impressions: CPU load is significantly lower (0.6% to ~2%) compared to the previous build, rising to ~35% during a 100MBit download (1GB).

But: "Need to look at what the impact is of changing this."

I cannot judge what the consequences might be, either...
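
To make the effect tangible, here is an illustrative idle polling loop (not Suricata's actual flow-manager code; names and structure are made up): the sleep interval bounds how often the loop wakes up, so usleep(100) allows up to ~10,000 wakeups per second while usleep(5000) allows only ~200, which matches the much lower idle CPU load seen above.

#include <stdbool.h>
#include <unistd.h>

/* Interval under test; note 23 suggested 200, the build above uses 5000. */
#define IDLE_SLEEP_USEC 5000

static volatile bool running = true;

/* Illustrative only: a management loop that polls, sleeps, and repeats.
 * Fewer wakeups per second means less syscall/scheduler overhead when idle. */
static void management_loop(void)
{
    while (running) {
        /* ... periodic housekeeping (e.g. checking for expired flows) ... */
        usleep(IDLE_SLEEP_USEC);
    }
}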
Comment 13 Michael Tremer 2021-03-06 11:08:21 UTC
There is some movement to track this:

> https://redmine.openinfosecfoundation.org/issues/4379

This ticket/change is targeted for suricata 7, which seems to suggest that we are going to skip the 6 series.
Comment 14 Matthias Fischer 2022-09-28 17:59:41 UTC
FYI:

'suricata 6.0.7' + '6.0.8' came out yesterday.

Excerpt from Changelog:

"Bug #4421: flow manager: using too much CPU during idle (6.0.x backport)"

I'll give it another try and test - 'Devel' is working... It can't go any more wrong than it already does... ;-)
Comment 15 Matthias Fischer 2022-09-29 17:03:26 UTC
Latest news:

Compiled (suricata 6.0.8 and libhtp 0.5.41).

Installed.

Running on Core 170.

Profile ID:
https://fireinfo.ipfire.org/profile/5f68a6360ffbecb6877dcac75f5b8c8030f43ce8


First impressions:

- CPU load dropped to 0.0%-2.0% in idle mode compared to v6.0.x.

- Max. CPU load: ~54% while downloading a pure bin file with 100Mbit = ~12.8MB/sec.

Had to set

...
    mqtt:
      enabled: yes
...

in '/etc/suricata/suricata.yaml' to avoid an error message during startup:

"[ERRCODE: SC_ERR_CONF_YAML_ERROR(242)] - App-Layer protocol mqtt enable status not set, so enabling by default. This behavior will change in Suricata 7, so please update your config. See ticket #4744 for more details."

=> Looking good. ;-)

Complete startup: see attachment
Comment 16 Matthias Fischer 2022-09-29 17:05:33 UTC
Created attachment 1094 [details]
Startup sequence: suricata 6.0.8
Comment 17 Michael Tremer 2022-09-29 17:42:51 UTC
Looks good - or did I miss anything? How is your CPU usage?
Comment 18 Matthias Fischer 2022-09-29 18:36:52 UTC
Created attachment 1095 [details]
CPU load during the last hour - idle

No, you didn't miss anything - as I wrote: it's looking good.

Find attached the graph for the last hour. Much better than before.

I'm just checking the build with the latest 'next' on the second 'Devel', and then I'll prepare a patch for both 'suricata 6.0.8' and 'libhtp 0.5.41'.
Comment 19 Adolf Belka 2022-12-20 22:10:12 UTC
This was released in Core Update 171 Testing, evaluated there, and confirmed to be fixed.
Comment 20 Adolf Belka 2022-12-20 22:10:50 UTC
Core Update 171 has been released, so this bug can now be marked as CLOSED FIXED.