Created attachment 726
Monthly memory graph
I have a problem with two firewalls of the IPFire infrastructure. One in Dublin, the other one in Frankfurt.
They both have 512 MB of RAM which they totally consume after a short time (graphs are attached).
The kernel starts killing userspace processes when it runs out of memory, and unbound is usually one of the first to die. The firewall then dies very slowly. Only a reboot fixes this, as the memory is all consumed in kernel space.
The systems however do not have a large number of connections open (very few actually) so it cannot be the connection tracking or any other legitimate use of memory I can think of.
Although those are virtual instances on AWS, I can reproduce this on our physical appliance in Hanover as well. It just has more memory and for some reason memory usage seems to grow more slowly there. Therefore I assume that this problem exists on all systems (on x86_64 at least).
I could now increase the size of the instances in this case to buy myself more time before I have to reboot, but clearly this isn't right. What can we do to find this problem?
Created attachment 727
Yearly memory graph
I forgot to mention that the systems do not have any swap space.
Have you tried https://www.kernel.org/doc/html/v4.10/dev-tools/kmemleak.html ?
In BSD there is a tool called systat which shows you the memory consumption of the different kernel subsystems. I wasn't able to find anything comparable for Linux.
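A rough approximation on Linux might be to take two snapshots of /proc/slabinfo some time apart and see which slab caches grew. The following is only a sketch: it assumes the slabinfo 2.1 column layout (name, active_objs, num_objs, objsize, ...), it estimates usage as num_objs * objsize, and the function names are made up for illustration. It only covers slab allocations, so a leak of whole pages or vmalloc space would not show up here.

```python
import time


def parse_slabinfo(text):
    """Return {cache_name: approx_bytes} from the contents of /proc/slabinfo."""
    usage = {}
    for line in text.splitlines():
        # Skip the version line and the column header.
        if line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        # Assumed layout: name, active_objs, num_objs, objsize, ...
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def top_growth(before, after, limit=10):
    """Return the caches that grew between two snapshots, largest first."""
    growth = {name: size - before.get(name, 0) for name, size in after.items()}
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return [(name, delta) for name, delta in ranked[:limit] if delta > 0]


def report(interval_seconds=3600, path="/proc/slabinfo"):
    """Print the caches that grew over one interval (reading usually needs root)."""
    with open(path) as handle:
        first = parse_slabinfo(handle.read())
    time.sleep(interval_seconds)
    with open(path) as handle:
        second = parse_slabinfo(handle.read())
    for name, delta in top_growth(first, second):
        print(f"{name:30s} +{delta / 1024:.0f} KiB")
```

Calling report() as root on an affected firewall and leaving it for an hour or so should give a short list of candidate caches. A cache that keeps growing while the number of connections stays flat would be a reasonable place to start looking.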
No, but I think we might have solved this by merging patches from upstream.
So far we just need to sit and validate that this is actually gone.
(In reply to Michael Tremer from comment #4)
> No, but I think we might have solved this by merging patches from upstream.
> So far we just need to sit and validate that this is actually gone.
Nice! Do you have a cross-reference to the patch that solves this issue? I'd be interested, just out of curiosity.
It is one of the kernel updates between .154 and .173.