Bug 12266

Summary: kernel: Massive memory leak
Product: IPFire Reporter: Michael Tremer <michael.tremer>
Component: --- Assignee: Arne.F <arne.fitzenreiter>
Status: CLOSED UPSTREAM QA Contact:
Severity: Minor Usability    
Priority: Will affect all users CC: list
Version: 2   
Hardware: unspecified   
OS: Unspecified   
Attachments: Monthly memory graph
Yearly memory graph

Description Michael Tremer 2020-01-06 16:10:51 UTC
Created attachment 726 [details]
Monthly memory graph

I have a problem with two firewalls of the IPFire infrastructure: one in Dublin, the other in Frankfurt.

They both have 512 MB of RAM which they totally consume after a short time (graphs are attached).

The kernel starts killing userspace processes when it runs out of memory and unbound is usually one of the first to die. The firewall dies very slowly. Only a reboot fixes this as it is all kernel space.

However, the systems do not have a large number of connections open (very few, actually), so it cannot be the connection tracking or any other legitimate use of memory I can think of.

Although those are virtual instances on AWS, I can reproduce this on our physical appliance in Hanover as well. It just has more memory, and for some reason memory usage seems to grow more slowly there. Therefore I assume that this problem exists on all systems (x86_64 at least).

I could now increase the size of the instances in this case to buy myself more time before I have to reboot, but clearly this isn't right. What can we do to find this problem?
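
One low-effort way to narrow this down is to compare the kernel-side counters in /proc/meminfo between two points in time. The sketch below is only an illustration, not something taken from the affected systems: the helper names and the choice of fields are assumptions, and it only parses text that has already been read from /proc/meminfo.

```python
# Hypothetical helpers: compare kernel-side /proc/meminfo fields
# between two snapshots to see which counter is growing.

# Fields that account for kernel memory rather than userspace memory.
KERNEL_FIELDS = ("Slab", "SReclaimable", "SUnreclaim",
                 "KernelStack", "PageTables", "VmallocUsed")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is dropped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kernel_growth(before, after):
    """Return the change per kernel-side field between two snapshots."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in KERNEL_FIELDS}
```

Taking one snapshot shortly after boot and another a few days later should show whether the growth sits in SUnreclaim (slab memory the kernel cannot reclaim) or somewhere else.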
Comment 1 Michael Tremer 2020-01-06 16:11:07 UTC
Created attachment 727 [details]
Yearly memory graph
Comment 2 Michael Tremer 2020-01-06 16:12:10 UTC
I forgot to mention that the systems do not have any swap space.
Comment 3 Stephan Mending 2020-03-23 17:35:41 UTC
Have you tried kmemleak (https://www.kernel.org/doc/html/v4.10/dev-tools/kmemleak.html)?
On BSD there is a tool called systat that shows the memory consumption of the different kernel subsystems. I wasn't able to find anything comparable for Linux.
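
For what it's worth, the closest per-cache view on Linux is probably /proc/slabinfo (usually readable by root only), which the slabtop utility also reads. The following is a rough sketch, assuming the slabinfo 2.x column layout (name, active_objs, num_objs, objsize, ...); the function name is made up for illustration, and the size estimate ignores per-slab overhead.

```python
def top_slab_caches(text, limit=5):
    """Rank slab caches by approximate size from /proc/slabinfo-style text.

    Returns up to `limit` (name, bytes) tuples, largest first, where
    bytes is estimated as num_objs * objsize.
    """
    caches = []
    for line in text.splitlines():
        # Skip the version line and the column header.
        if line.startswith("#") or line.startswith("slabinfo"):
            continue
        parts = line.split()
        if len(parts) < 4:
            continue
        try:
            num_objs = int(parts[2])
            objsize = int(parts[3])
        except ValueError:
            continue
        caches.append((parts[0], num_objs * objsize))
    caches.sort(key=lambda cache: cache[1], reverse=True)
    return caches[:limit]
```

Running this against two dumps taken some time apart and comparing the top entries would show which cache, if any, keeps growing.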
Comment 4 Michael Tremer 2020-03-23 19:42:46 UTC
No, but I think we might have solved this by merging patches from upstream.

So far we just need to sit and validate that this is actually gone.
Comment 5 Stephan Mending 2020-03-23 20:26:08 UTC
(In reply to Michael Tremer from comment #4)
> No, but I think we might have solved this by merging patches from upstream.
> 
> So far we just need to sit and validate that this is actually gone.

Nice! Do you have a cross-reference to the patch that solves this issue? I'd be interested, just out of curiosity.