Bug 12266

Summary:	kernel: Massive memory leak
Product:	IPFire	Reporter:	Michael Tremer <michael.tremer>
Component:	---	Assignee:	Arne.F <arne.fitzenreiter>
Status:	CLOSED UPSTREAM	QA Contact:
Severity:	Minor Usability
Priority:	Will affect all users	CC:	list
Version:	2
Hardware:	unspecified
OS:	Unspecified
Attachments:	Monthly memory graph, Yearly memory graph

Description Michael Tremer 2020-01-06 16:10:51 UTC

Created attachment 726
Monthly memory graph

I have a problem with two firewalls of the IPFire infrastructure. One in Dublin, the other one in Frankfurt.

They both have 512 MB of RAM which they totally consume after a short time (graphs are attached).

The kernel starts killing userspace processes when it runs out of memory, and unbound is usually one of the first to die. The firewall degrades very slowly. Only a reboot fixes this, because the memory is held in kernel space.
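One way to check that the memory really is held by the kernel is to watch the kernel-side fields of `/proc/meminfo` over time. If they keep climbing while userspace usage stays flat, the leak is in kernel space. The following is a minimal Python sketch that assumes a Linux `/proc/meminfo`. The helper names and the choice of fields (`SUnreclaim`, `KernelStack`, `PageTables`) are illustrative and were not part of this bug's investigation, and the sum is not a complete accounting of kernel memory.

```python
"""Estimate how much memory the kernel itself is holding, from /proc/meminfo.

A minimal sketch (assumption: a Linux /proc/meminfo). It only sums a few
well-known kernel-side fields; it is not a complete accounting.
"""
import os

# Hypothetical selection of /proc/meminfo fields describing kernel allocations.
KERNEL_FIELDS = ("SUnreclaim", "KernelStack", "PageTables")


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: numeric value} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def kernel_held_kb(info: dict[str, int]) -> int:
    """Sum the kernel-side fields that are present; missing ones count as 0."""
    return sum(info.get(field, 0) for field in KERNEL_FIELDS)


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(f"kernel-side: {kernel_held_kb(info)} kB of {info.get('MemTotal', 0)} kB MemTotal")
```

Sampling this periodically, for example alongside the memory graphs, would show whether the growth is in unreclaimable slab rather than in userspace.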

The systems, however, do not have a large number of connections open (very few, actually), so it cannot be connection tracking or any other legitimate use of memory that I can think of.
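To back up the claim that connection tracking is not the culprit, you can compare the number of tracked connections against the table size. The following is a minimal Python sketch that assumes the `nf_conntrack` module is loaded, so that `/proc/sys/net/netfilter/nf_conntrack_count` and `nf_conntrack_max` exist. The helper names are hypothetical.

```python
"""Check how full the connection-tracking table is.

A minimal sketch, assuming nf_conntrack is loaded so the sysctl files exist.
Helper names are illustrative only.
"""
from pathlib import Path

CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table in use (0.0 to 1.0)."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / maximum


def read_int(path: Path) -> int:
    """Read a single integer from a sysctl file."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    count_file = CONNTRACK_DIR / "nf_conntrack_count"
    max_file = CONNTRACK_DIR / "nf_conntrack_max"
    if count_file.exists() and max_file.exists():
        count, maximum = read_int(count_file), read_int(max_file)
        print(f"{count} of {maximum} entries ({conntrack_usage(count, maximum):.1%})")
```

A small count that stays flat while kernel memory keeps growing points away from conntrack as the source.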

Although these are virtual instances on AWS, I can reproduce this on our physical appliance in Hanover as well. It just has more memory, and for some reason memory usage there seems to grow more slowly. Therefore I assume that this problem exists on all systems (x86_64 at least).

I could now increase the size of the instances in this case to buy myself more time before I have to reboot, but clearly this isn't right. What can we do to find this problem?

Comment 1 Michael Tremer 2020-01-06 16:11:07 UTC

Created attachment 727
Yearly memory graph

Comment 2 Michael Tremer 2020-01-06 16:12:10 UTC

I forgot to mention that the systems do not have any swap space.

Comment 3 Stephan Mending 2020-03-23 17:35:41 UTC

Have you tried kmemleak (https://www.kernel.org/doc/html/v4.10/dev-tools/kmemleak.html)?
BSD has a tool called systat that shows the memory consumption of the different kernel subsystems. I wasn't able to find anything comparable for Linux.
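The kmemleak documentation linked above describes triggering a scan by writing `scan` to `/sys/kernel/debug/kmemleak` and then reading the same file for the report. The following is a minimal Python sketch of that workflow. It assumes a kernel built with `CONFIG_DEBUG_KMEMLEAK`, debugfs mounted at `/sys/kernel/debug`, root privileges, and report entries that begin with `unreferenced object 0x... (size N):`. The `summarise` helper is hypothetical. Note that kmemleak only reports objects that no longer have any reference, so memory the kernel still references (for example, an ever-growing list) would not show up here.

```python
"""Trigger a kmemleak scan and summarise the report.

A minimal sketch, assuming CONFIG_DEBUG_KMEMLEAK, debugfs mounted at
/sys/kernel/debug, and root privileges. `summarise` is a hypothetical helper.
"""
import re
from pathlib import Path

KMEMLEAK = Path("/sys/kernel/debug/kmemleak")

# Assumed report format: each suspected leak starts with a line such as
#   unreferenced object 0xffff888003c1a000 (size 64):
LEAK_RE = re.compile(r"^unreferenced object 0x[0-9a-f]+ \(size (\d+)\):", re.MULTILINE)


def summarise(report: str) -> tuple[int, int]:
    """Return (number of suspected leaks, total bytes) from a kmemleak report."""
    sizes = [int(m.group(1)) for m in LEAK_RE.finditer(report)]
    return len(sizes), sum(sizes)


if __name__ == "__main__":
    if KMEMLEAK.exists():
        KMEMLEAK.write_text("scan\n")  # request an immediate scan
        count, total = summarise(KMEMLEAK.read_text())
        print(f"{count} suspected leaks, {total} bytes")
```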

Comment 4 Michael Tremer 2020-03-23 19:42:46 UTC

No, but I think we might have solved this by merging patches from upstream.

Now we just need to wait and verify that it is actually gone.

Comment 5 Stephan Mending 2020-03-23 20:26:08 UTC

(In reply to Michael Tremer from comment #4)
> No, but I think we might have solved this by merging patches from upstream.
> 
> Now we just need to wait and verify that it is actually gone.

Nice! Do you have a cross-reference to the patch that solves this issue? I'd be interested, just out of curiosity.

Comment 6 Michael Tremer 2020-03-23 20:38:42 UTC

> https://git.ipfire.org/?p=ipfire-2.x.git;a=history;f=lfs/linux;h=4d24752e3db1a870d971094aa503c52728eab74e;hb=70af65df4198c58f99a333748faa39b39ad1c3c4

It is one of the kernel updates between .154 and .173.