Bug 12963

Summary: openVPN - Out of memory
Product: IPFire Reporter: Jürgen Schamberger <juergen.schamberger>
Component: ---Assignee: Michael Tremer <michael.tremer>
Status: CLOSED FIXED QA Contact:
Severity: Crash    
Priority: Will affect an average number of users CC: adolf.belka, emptythevoid, jaylubo, jon.murphy, michael.tremer, peter.mueller
Version: 2   
Hardware: x86_64   
OS: Linux   
Attachments: screenshot

Description Jürgen Schamberger 2022-10-21 17:06:40 UTC
Created attachment 1110 [details]
screenshot

I created a new certificate for openVPN on Core 170.
After a new connection/disconnection or stop of openVPN, the process "openvpn-authentic" starts and fills the memory under High CPU until the process was terminated.

An update to Core 171 brought no change.

I created the old certificate under Core 164 and this ran without any problems.


Oct 21 01:10:00 famschamrouter ipfire: ConnSched disconnect
Oct 21 01:10:00 famschamrouter pppd[4532]: Terminating on signal 15
Oct 21 01:10:00 famschamrouter pppd[4532]: Connect time 437.5 minutes.
Oct 21 01:10:00 famschamrouter pppd[4532]: Sent 392473603 bytes, received 15497873 bytes.
Oct 21 01:10:00 famschamrouter pppd[4532]: Connection terminated.
Oct 21 01:10:00 famschamrouter openvpnserver[7877]: event_wait : Interrupted system call (code=4)
Oct 21 01:10:00 famschamrouter openvpnserver[7877]: /sbin/ip route del 10.80.147.0/24
Oct 21 01:10:00 famschamrouter openvpnserver[7877]: ERROR: Linux route delete command failed: external program exited with error status: 2
Oct 21 01:10:00 famschamrouter openvpnserver[7877]: /sbin/ip route del 10.90.147.0/24
Oct 21 01:10:00 famschamrouter openvpnserver[7877]: ERROR: Linux route delete command failed: external program exited with error status: 2
Oct 21 01:10:00 famschamrouter openvpnserver[7877]: Closing TUN/TAP interface
Oct 21 01:10:00 famschamrouter openvpnserver[7877]: /sbin/ip addr del dev tun0 local 10.90.147.1 peer 10.90.147.2
Oct 21 01:10:00 famschamrouter openvpnserver[7877]: Linux ip addr del failed: external program exited with error status: 2


Oct 21 01:49:21 famschamrouter kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=openvpn-authent,pid=7917,uid=0
Oct 21 01:49:21 famschamrouter kernel: Out of memory: Killed process 7917 (openvpn-authent) total-vm:8616456kB, anon-rss:6662008kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:15108kB oom_score_adj:0
Comment 2 Jürgen Schamberger 2022-10-22 12:36:15 UTC
I installed Core 168, 169, 170 and 171 on another machine today and tested it with a minimal configuration.

The problem seems to have existed since Core 169. On Core 168 the openvpn-authentic process doesn’t start because the file in /usr/sbin doesn’t exist.

I created a certificate in Core 168 and then upgraded to Core 171. With this old certificate, openVPN can be stopped and started without any problems.

I think there is a problem with Core 169 certificate generation.
Comment 3 Jay Lubomirski 2022-10-27 02:33:43 UTC
I want to add some details I've been seeing with this. I started to see this in core update 170 and 171. It was a new install with core update 170 that I restored settings from back up to (server hardware upgrade.) 

The issue is triggered for me when the red interface goes down (ie pull the cable so dhcp lease is released and interface is brought down). 

The openvpn-authenticator script goes to 100% CPU and starts consuming tons of memory until the OOM killer starts. 

25969 root      20   0 2314.9m   2.2g 100.0  14.5   3:47.88 R /usr/bin/python3 /usr/sbin/openvpn-authenticator -d                           
30506 root      20   0  136.9m 130.4m 100.0   0.8   0:11.97 R /usr/bin/python3 /usr/sbin/openvpn-authenticator -d                           

After OOM, another openvpn-authenticator process starts and does the same thing. 

I can stop the cycle by stopping openvpn from the web ui, then manually killing any openvpn-authenticator processes and then starting openvpn again.
Comment 4 Adolf Belka 2022-10-28 14:09:51 UTC
I will look at testing this out on my vm testbed. All my previous testing with earlier Core Updates was always with an existing OpenVPN setup. I only tested creating a new client connection.
I will do a new root/host certificate generation test to see if I can duplicate this.
Comment 5 Adolf Belka 2022-11-11 17:01:26 UTC
Sorry for the delay but I was tied up with other things.

I have now clones a vm running CU171, with a confirmed working OpenVPN system.
Running htop it showed openvpn-authenticator running at 0% cpu and 1.1% of the 1GB memory.


I then removed the x509 certificate set, which also removed all the client connection details.

I then created a new Root and Host certificate set using the same details as used previously. Then created a new client connection which was then transferred to my Android phone and I was able to successfully run a tunnel to a vm client in the vm testbed network just as previously done.

Running htop it showed openvpn-authenticator at 0% cpu and 1.0% of the 1GB memory.

I left the system running for 2 hours and nothing changed in the openvpn authenticator values of cpu and memory usage.

I then rebooted the IPFire vm and ran it again for 2 hours and again htop showed 0% cpu and 1.0% memory.

I will leave this running overnight but so far I have not been able to duplicate the effect that has been reported here.

@Jürgen did you do any other steps compared to what I have described. What sort of timescale was it for the memory consumption to become visible?
Comment 6 Jürgen Schamberger 2022-11-11 17:09:46 UTC
(In reply to Adolf Belka from comment #5)
> Sorry for the delay but I was tied up with other things.
> 
> I have now clones a vm running CU171, with a confirmed working OpenVPN
> system.
> Running htop it showed openvpn-authenticator running at 0% cpu and 1.1% of
> the 1GB memory.
> 
> 
> I then removed the x509 certificate set, which also removed all the client
> connection details.
> 
> I then created a new Root and Host certificate set using the same details as
> used previously. Then created a new client connection which was then
> transferred to my Android phone and I was able to successfully run a tunnel
> to a vm client in the vm testbed network just as previously done.
> 
> Running htop it showed openvpn-authenticator at 0% cpu and 1.0% of the 1GB
> memory.
> 
> I left the system running for 2 hours and nothing changed in the openvpn
> authenticator values of cpu and memory usage.
> 
> I then rebooted the IPFire vm and ran it again for 2 hours and again htop
> showed 0% cpu and 1.0% memory.
> 
> I will leave this running overnight but so far I have not been able to
> duplicate the effect that has been reported here.
> 
> @Jürgen did you do any other steps compared to what I have described. What
> sort of timescale was it for the memory consumption to become visible?

Hello Adolf, you need to stop OpenVPN in the web interface or stop and start the connection from the Red Network. When Ipfire restarts, nothing happens to me either.
Comment 7 Adolf Belka 2022-11-11 17:31:14 UTC
Thanks for the clarification @Jürgen.

I turned off the OpenVPN Server and openvpn-authenticator went to something like 89% cpu% and memory spiked very quickly and triggered swap to be used to nearly 100% and then openvpn-authenticator was removed by the system - no longer in the htop table at all.

Looking in the kernel logs I have the oom message

18:17:20	kernel: 	Out of memory: Killed process 9525 (openvpn-authent) total-vm:831900kB, anon-rss :741952kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1556kB oom_score_adj:0

18:17:20	kernel: 	oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,glob al_oom,task_memcg=/,task=openvpn-authent,pid=9525,uid=0

So, yes I can duplicate the oom problem but it only seems to occur when the OpenVPN Server is turned off. It would appear that openvpn-authenticator doesn't like to be running without the OpenVPN server turned on. Maybe we need to turn off the openvpn-authenticator when the OpenVPN server is stopped.

I will do the same test on a clone with the original Root/Host certificate to see if that has a different response or not.

I will also have alook at the openvpn-authentocator code to see if I can spot anything obvious with regard to this effect.
Comment 8 Adolf Belka 2022-11-11 17:51:59 UTC
I have also confirmed that the same thing happens on my original clone vm which has an old Root/Host certificate from many many CU’s ago.

So the problem looks not to be related to creating a new Root/Host certificate but just on openvpn-authenticator not liking to be running when OpenVPN Server is stopped.

Unfortunately the openvpn-authenticator is python code which I am not familiar with at all, so it will take a bit of work to look through it, but I will give it a go.
Comment 9 Adolf Belka 2022-11-11 18:09:29 UTC
There is a stopAuthenticator command in the openvpnctrl c code and it is part of the stopDaemon command for OpenVPN, so I think the Authenticator is supposed to be getting stopped as part of stopping the daemon.

It looks like the OpenVPN Server daemon is stopped first and then the stopAuthenticator command run. Maybe the stopAuthenticator command needs to be run before the OpenVPN daemon is stopped.

I can probably try some slight changing of the order in the c code and then do a build of IPFire and test out if that works or not.
Comment 10 Adolf Belka 2022-11-11 18:20:35 UTC
I think I have found out what the problem is.

The code in openvpnctrl is trying to kill openvpn-authenticator with the command /sbin/killall but in sbin there is only killall5

The killall command is in /bin

I will change the c code to /bin/killall and do a build of IPFire and install and test it.
Comment 11 Adolf Belka 2022-11-15 17:02:51 UTC
Changing /sbin/killall to /bin/killall did not fix the problem. I am pretty certain this is something that needed to be fixed but it is not the cause for openvpn-authenticator to consume 100% cpu and continuously increase the memory% until the oom killer is triggered.

I then moved the stopAuthenticator command to the start of the stopDaemon command but that also did not fix the problem.

I am not familiar enough with c or python to be able to fault find this any more at the moment, so I have added @Michael Tremmer to the cc list to help with this problem or to identify someone else who can do so.
Comment 12 Michael Tremer 2022-12-06 10:03:35 UTC
I just submitted a couple of patches to the list that should fix the problem:

> https://patchwork.ipfire.org/project/ipfire/list/?series=3228

Could you please test and confirm?
Comment 13 Adolf Belka 2022-12-06 13:28:50 UTC
Sorry for the delay. Thought it was a compiled program so I ran a full build first then realised that it was a python script.

I copied the patched script into my vm testbed system. Then I switched off OpenVPN and the cpu and memory then went very high and swap was heavily used. After that first time then stopping OpenVPN no longer gave a problem with memory or cpu. I thought that might just be that something was not set consistently yet by just copying in the script.

So I copied back the unpatched version of openvpn-authenticator and confirmed that the memory and cpu shot up again every time OpenVPN was stopped.

Then I copied the patched version of openvpn-authenticator back in again and this time first rebooted the IPFire vm and then turned off OpenVPN server and the cpu and memory stayed within normal variations. No climbing and swap did not get used.

The patch looks to have fixed the problem.
Comment 14 Michael Tremer 2022-12-07 11:36:50 UTC
(In reply to Adolf Belka from comment #13)
> The patch looks to have fixed the problem.

Very good.

Technically the daemon just stops itself and then needs to be restarted again with the OpenVPN server. I suppose this should work fine.
Comment 15 Michael Tremer 2022-12-07 11:37:00 UTC
(In reply to Michael Tremer from comment #14)
> (In reply to Adolf Belka from comment #13)
> > The patch looks to have fixed the problem.
> 
> Very good.
> 
> Technically the daemon just stops itself and then needs to be restarted
> again with the OpenVPN server. I suppose this should work fine.

Thanks for testing.
Comment 16 Jürgen Schamberger 2022-12-08 05:46:29 UTC
With the patch the client cannot connect to openVPN, get the message 'push_request' (status=1)
Comment 17 Michael Tremer 2022-12-08 10:58:12 UTC
Hello,

is the authenticator running and connected to the OpenVPN server?

Could you start it with --verbose and see if it logs anything?
Comment 20 Peter Müller 2022-12-13 15:41:21 UTC
After discussing this issue with Michael, we agreed on _not_ merging these patches for Core Update 172, since the (scanty) feedback expressed by Jürgen in comment #16 suggests that they patches broke OpenVPN entirely.

Adolf's response, however, suggests the opposite.

Resetting back to ASSIGNED.

@Jürgen: Please provide feedback to Michael's questions, so we can resolve whatever issue you are facing as soon as possible, decide whether the proposed patches are fine, and bring them to the IPFire userbase as soon as possible.
Comment 21 Jürgen Schamberger 2022-12-13 16:15:33 UTC
Sorry, I'm just a user and only have a basic knowledge of Linux. I can only describe what I see. With the patch I have connection problems without not. I reset my system back to Core 168 and unfortunately I don't have the opportunity to test it anymore. But if you all don't have this problem, maybe it's because of my configurations.
If Adolf was able to successfully connect to openVPN, it's ok.
Comment 22 Adolf Belka 2022-12-13 23:45:45 UTC
My apologies but my feedback was not clear enough.

I only tested that the memory and cpu percentages did not climb up to 100% and cause the oom to trigger.

I did test the OpenVPN connection but without the 2FA as I have no clients that will work with the current 2FA approach. I only have Linux Network Manager and the  OpenVPN for Android app and neithe of these works with the the IPFire 2FA approach.

If @Jürgen did his test with the 2FA then his results and logs are showing that the patch requires further investigation.

Sorry for any confusion I caused.
Comment 23 Greg Ramey 2023-01-06 14:57:11 UTC
I am in a position to test this patch against OTP.  I'm not used to working with diff patches, though.  Would someone be able to roll their eyes at me and link to a patched openvpn-authenticator file so that I make sure I'm using the correct modifications?
Comment 24 Greg Ramey 2023-01-06 15:40:32 UTC
I was able to get the patches applied to /usr/sbin/openvpn-authenticator.  I can confirm that they cause OTP to fail with 'PUSH REQUEST' (status=1).  I'm willing and able to assist with testing/troubleshooting. Just let me know what I can do.
Comment 25 Greg Ramey 2023-01-06 15:57:27 UTC
I attempted to do what was requested of @Jürgen and running the openvpn-authenticator with --verbose.  I believe we both made the same mistake.  We created a new file and failed to chmod +x it.  As of right now, I can confirm that the patch fixes the memory issue and *does not* break OTP.

The Push_Request Status 1 error seems to have been caused by openvpn-authenticator not opening due to the executable permission issue.
Comment 26 Adolf Belka 2023-01-06 17:08:12 UTC
That is great news @Greg.

The patch set will be able to go into the current next, which will mean it should be released with Core Update 173.
Comment 28 Greg Ramey 2023-01-09 17:21:19 UTC
While investigating another issue I was having, I think I need to state that OTP is working when using the OpenCPN Community client.  I haven't tried any others successfully yet.
Comment 29 Jon 2023-01-29 21:05:07 UTC
FWIW - I got the oom error shortly after my cable modem when down and came back up.  

I do not use OpenVPN except for testing OpenVPN.  I do not use 2FA or OTP.

From the message log:

#    Cable Modem gone here...
Jan 29 11:51:57 ipfire dhcpcd[2977]: red0: carrier lost
Jan 29 11:51:57 ipfire charon: 14[KNL] 73.36.250.0 disappeared from red0 
Jan 29 11:51:57 ipfire dhcpcd[2977]: red0: deleting route to 73.36.248.0/21
Jan 29 11:51:57 ipfire dhcpcd[2977]: red0: deleting default route via 73.36.248.1
Jan 29 11:51:57 ipfire dhcpcd.exe[16242]: red0 has been brought down (EXPIRE)

Jan 29 11:51:57 ipfire openvpnserver[4251]: event_wait : Interrupted system call (code=4)
Jan 29 11:51:57 ipfire openvpnserver[4251]: /sbin/ip route del 10.24.171.0/24
Jan 29 11:51:57 ipfire openvpnserver[4251]: ERROR: Linux route delete command failed: external program exited with error status: 2
Jan 29 11:51:57 ipfire openvpnserver[4251]: Closing TUN/TAP interface
Jan 29 11:51:57 ipfire openvpnserver[4251]: /sbin/ip addr del dev tun0 local 10.24.171.1 peer 10.24.171.2
Jan 29 11:51:57 ipfire openvpnserver[4251]: Linux ip addr del failed: external program exited with error status: 2
. . .
Jan 29 11:51:57 ipfire openvpnserver[4251]: SIGTERM[hard,] received, process exiting


#    After the modem came back it took about 50 minutes to fill up the memory and error:

Jan 29 12:42:30 ipfire kernel: bash invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Jan 29 12:42:30 ipfire kernel: CPU: 1 PID: 8766 Comm: bash Not tainted 5.15.71-ipfire #1
Jan 29 12:42:30 ipfire kernel: Hardware name: PC Engines apu4/apu4, BIOS v4.15.0.1 11/23/2021
Jan 29 12:42:30 ipfire kernel: Call Trace:
Jan 29 12:42:30 ipfire kernel:  <TASK>
Jan 29 12:42:30 ipfire kernel:  dump_stack_lvl+0x46/0x66
Jan 29 12:42:30 ipfire kernel:  dump_header+0x4a/0x19b
Jan 29 12:42:30 ipfire kernel:  oom_kill_process.cold+0xb/0x10
Jan 29 12:42:30 ipfire kernel:  out_of_memory+0x1f7/0x2a0
Jan 29 12:42:30 ipfire kernel:  __alloc_pages_slowpath.constprop.0+0xbac/0xc90
Jan 29 12:42:30 ipfire kernel:  __alloc_pages+0x2e0/0x300
Jan 29 12:42:30 ipfire kernel:  pagecache_get_page+0x157/0x430
Jan 29 12:42:30 ipfire kernel:  filemap_fault+0x600/0x940
Jan 29 12:42:30 ipfire kernel:  ? filemap_map_pages+0x128/0x5d0
Jan 29 12:42:30 ipfire kernel:  __do_fault+0x36/0x140
Jan 29 12:42:30 ipfire kernel:  __handle_mm_fault+0xb3d/0x1120
Jan 29 12:42:30 ipfire kernel:  handle_mm_fault+0x1c6/0x2a0
Jan 29 12:42:30 ipfire kernel:  exc_page_fault+0x1e0/0x7f0
Jan 29 12:42:30 ipfire kernel:  asm_exc_page_fault+0x22/0x30
Jan 29 12:42:30 ipfire kernel: RIP: 0033:0x7ef46ca62f6e
Jan 29 12:42:30 ipfire kernel: Code: Unable to access opcode bytes at RIP 0x7ef46ca62f44.
Jan 29 12:42:30 ipfire kernel: RSP: 002b:00007fff618d2148 EFLAGS: 00010246
Jan 29 12:42:30 ipfire kernel: RAX: 0000000000000030 RBX: 00000000b85fdd6e RCX: 0000000000000019
Jan 29 12:42:30 ipfire kernel: RDX: 000000000000016e RSI: 00000000004b8e59 RDI: 00000000010d6930
Jan 29 12:42:30 ipfire kernel: RBP: 00000000004b8e59 R08: 00000000e2d46767 R09: 00007fff618d21f5
Jan 29 12:42:30 ipfire kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000010d6950
Jan 29 12:42:30 ipfire kernel: R13: 0000000000000000 R14: 000000000000016e R15: 00000000010cabb0
Jan 29 12:42:30 ipfire kernel:  </TASK>
Jan 29 12:42:30 ipfire kernel: Mem-Info:
Jan 29 12:42:30 ipfire kernel: active_anon:7909 inactive_anon:957942 isolated_anon:0
Jan 29 12:42:30 ipfire kernel:  active_file:144 inactive_file:106 isolated_file:0
Jan 29 12:42:30 ipfire kernel:  unevictable:0 dirty:0 writeback:0
Jan 29 12:42:30 ipfire kernel:  slab_reclaimable:6418 slab_unreclaimable:10136
Jan 29 12:42:30 ipfire kernel:  mapped:8579 shmem:7602 pagetables:3202 bounce:0
Jan 29 12:42:30 ipfire kernel:  kernel_misc_reclaimable:0
Jan 29 12:42:30 ipfire kernel:  free:6303 free_pcp:62 free_cma:0
Jan 29 12:42:30 ipfire kernel: Node 0 active_anon:31636kB inactive_anon:3831768kB active_file:576kB inactive_file:424kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:34316kB dirty:0kB writeback:0kB shmem:30408kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 55296kB writeback_tmp:0kB kernel_stack:3600kB pagetables:12808kB all_unreclaimable? yes
Jan 29 12:42:30 ipfire kernel: DMA free:15360kB min:28kB low:40kB high:52kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 29 12:42:30 ipfire kernel: lowmem_reserve[]: 0 3219 3865 3865
Jan 29 12:42:30 ipfire kernel: DMA32 free:9336kB min:6792kB low:10116kB high:13440kB reserved_highatomic:0KB active_anon:10280kB inactive_anon:3270040kB active_file:128kB inactive_file:232kB unevictable:0kB writepending:0kB present:3389948kB managed:3324412kB mlocked:0kB bounce:0kB free_pcp:248kB local_pcp:0kB free_cma:0kB
Jan 29 12:42:30 ipfire kernel: lowmem_reserve[]: 0 0 646 646
Jan 29 12:42:30 ipfire kernel: Normal free:516kB min:1364kB low:2028kB high:2692kB reserved_highatomic:0KB active_anon:21356kB inactive_anon:561728kB active_file:680kB inactive_file:452kB unevictable:0kB writepending:0kB present:770048kB managed:667996kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 29 12:42:30 ipfire kernel: lowmem_reserve[]: 0 0 0 0
Jan 29 12:42:30 ipfire kernel: DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15360kB
Jan 29 12:42:30 ipfire kernel: DMA32: 547*4kB (ME) 212*8kB (UME) 80*16kB (ME) 38*32kB (UME) 20*64kB (UME) 3*128kB (UM) 5*256kB (UM) 2*512kB (U) 0*1024kB 0*2048kB 0*4096kB = 10348kB
Jan 29 12:42:30 ipfire kernel: Normal: 49*4kB (UME) 20*8kB (UM) 8*16kB (U) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 516kB
Jan 29 12:42:30 ipfire kernel: 8180 total pagecache pages
Jan 29 12:42:30 ipfire kernel: 233 pages in swap cache
Jan 29 12:42:30 ipfire kernel: Swap cache stats: add 250557, delete 250325, find 347/391
Jan 29 12:42:30 ipfire kernel: Free swap  = 0kB
Jan 29 12:42:30 ipfire kernel: Total swap = 1001880kB
Jan 29 12:42:30 ipfire kernel: 1043997 pages RAM
Jan 29 12:42:30 ipfire kernel: 0 pages HighMem/MovableOnly
Jan 29 12:42:30 ipfire kernel: 42055 pages reserved
Jan 29 12:42:30 ipfire kernel: 0 pages hwpoisoned
Jan 29 12:42:30 ipfire kernel: Tasks state (memory values in pages):
Jan 29 12:42:30 ipfire kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jan 29 12:42:30 ipfire kernel: [    654]     0   654     1032        0    49152       39             0 lvmetad
Jan 29 12:42:30 ipfire kernel: [    673]     0   673     4884       78    53248      336         -1000 udevd
Jan 29 12:42:30 ipfire kernel: [   2712]     0  2712     1378      235    53248        7             0 vnstatd
Jan 29 12:42:30 ipfire kernel: [   2723]     0  2723     2368     1683    61440      105             0 klogd
Jan 29 12:42:30 ipfire kernel: [   2730]     0  2730      624       31    45056        3             0 syslogd
Jan 29 12:42:30 ipfire kernel: [   2772]    99  2772    10281     8504   122880       48             0 unbound
Jan 29 12:42:30 ipfire kernel: [   2783]     0  2783      629        2    45056       23             0 acpid
Jan 29 12:42:30 ipfire kernel: [   2801]  1001  2801     1041       27    49152       25             0 dbus-daemon
Jan 29 12:42:30 ipfire kernel: [   2976]    52  2976      759      108    45056       17             0 dhcpcd
Jan 29 12:42:30 ipfire kernel: [   2977]     0  2977      748       64    45056       34             0 dhcpcd
Jan 29 12:42:30 ipfire kernel: [   2978]    52  2978      709       22    45056       53             0 dhcpcd
Jan 29 12:42:30 ipfire kernel: [   2979]    52  2979      709       28    45056       47             0 dhcpcd
Jan 29 12:42:30 ipfire kernel: [   4283]     0  4283   666210   476231  4866048   122562             0 openvpn-authent
Jan 29 12:42:30 ipfire kernel: [   4410]     0  4410      724       24    40960       10             0 saslauthd
Jan 29 12:42:30 ipfire kernel: [   4411]     0  4411      724       24    40960       11             0 saslauthd
Jan 29 12:42:30 ipfire kernel: [   4422]     0  4422    20917      158    65536        1             0 cupsd
Jan 29 12:42:30 ipfire kernel: [   4508]     0  4508    18551       94    65536       79             0 ntpd
Jan 29 12:42:30 ipfire kernel: [   4584]     0  4584     2380     1060    57344        0             0 dhcpd
Jan 29 12:42:30 ipfire kernel: [   4594]     0  4594     4716     2031    86016      406             0 unbound-dhcp-le
Jan 29 12:42:30 ipfire kernel: [   4611]     0  4611     2890      386    61440       83             0 httpd
Jan 29 12:42:30 ipfire kernel: [   4662]     0  4662    20468       74    61440       67             0 netatalk
Jan 29 12:42:30 ipfire kernel: [   4664]     0  4664     2690      190    65536        0             0 afpd
Jan 29 12:42:30 ipfire kernel: [   4666]     0  4666     1845       98    49152        0             0 cnid_metad
Jan 29 12:42:30 ipfire kernel: [   4676]     0  4676      904       33    49152       17             0 fcron
Jan 29 12:42:30 ipfire kernel: [   4687]     0  4687     8057     5483    98304       83             0 pmacctd
Jan 29 12:42:30 ipfire kernel: [   4697]     0  4697     9508     6880   106496      146             0 pmacctd
Jan 29 12:42:30 ipfire kernel: [   4699]     0  4699     8722     6392   110592       80             0 pmacctd
Jan 29 12:42:30 ipfire kernel: [   4701]     0  4701    39313      583    77824       13             0 monit
Jan 29 12:42:30 ipfire kernel: [   4712]     0  4712     4844       43    49152        4             0 apcupsd
Jan 29 12:42:30 ipfire sshd[8764]: Received disconnect from 192.168.60.212 port 54518:11: disconnected by user
Jan 29 12:42:30 ipfire kernel: [   4726]   998  4726     1365      287    53248        3             0 avahi-daemon
Jan 29 12:42:30 ipfire kernel: [   4727]   998  4727     1121       39    49152        9             0 avahi-daemon
Jan 29 12:42:30 ipfire kernel: [   4743]     0  4743      620       24    45056        2             0 mdns-repeater
Jan 29 12:42:30 ipfire kernel: [   4744]     0  4744      632       20    45056        2             0 agetty
Jan 29 12:42:30 ipfire kernel: [   4745]     0  4745      752       51    45056       14             0 agetty
Jan 29 12:42:30 ipfire kernel: [   4746]     0  4746      752       52    45056       14             0 agetty
Jan 29 12:42:30 ipfire kernel: [   4747]     0  4747      752       52    45056       16             0 agetty
Jan 29 12:42:30 ipfire kernel: [   4748]     0  4748      752       64    45056        1             0 agetty
Jan 29 12:42:30 ipfire kernel: [   4749]     0  4749      752       59    45056        9             0 agetty
Jan 29 12:42:30 ipfire kernel: [  19324]     0 19324    78770     1092   114688       90             0 collectd
Jan 29 12:42:30 ipfire kernel: [   2401]    99  2401     2764      221    57344      133             0 httpd
Jan 29 12:42:30 ipfire kernel: [   2405]    99  2405   334711     1808   212992      120             0 httpd
Jan 29 12:42:30 ipfire kernel: [   8764]     0  8764     1803      205    57344       24             0 sshd
Jan 29 12:42:30 ipfire kernel: [   8766]     0  8766     1943      216    49152       14             0 bash
Jan 29 12:42:30 ipfire kernel: [   8929]     0  8929     1861      238    57344       32             0 sshd
Jan 29 12:42:30 ipfire kernel: [   8931]     0  8931      663       13    45056        9             0 sftp-server
Jan 29 12:42:30 ipfire kernel: [   8937]     0  8937     1861      261    49152       25             0 sshd
Jan 29 12:42:30 ipfire kernel: [   8940]     0  8940      663       15    45056        7             0 sftp-server
Jan 29 12:42:30 ipfire kernel: [  16681]    52 16681      748       48    45056       49             0 dhcpcd
Jan 29 12:42:30 ipfire kernel: [  18314]     0 18314   592644   447378  4620288   121258             0 openvpn-authent
Jan 29 12:42:30 ipfire kernel: [  20014]    23 20014     5545       28    73728      313             0 squid
Jan 29 12:42:30 ipfire kernel: [  20017]    23 20017     8775     1703   102400      729             0 squid
Jan 29 12:42:30 ipfire kernel: [  20022]    23 20022     4075     1007    65536      395             0 redirect_wrappe
Jan 29 12:42:30 ipfire kernel: [  20023]    23 20023     4074     1040    73728      360             0 redirect_wrappe
Jan 29 12:42:30 ipfire kernel: [  20025]    23 20025     4074      991    69632      410             0 redirect_wrappe
Jan 29 12:42:30 ipfire kernel: [  20029]    23 20029     4073      968    69632      430             0 redirect_wrappe
Jan 29 12:42:30 ipfire kernel: [  20075]    23 20075     1231       89    53248      106             0 squidGuard
Jan 29 12:42:30 ipfire kernel: [  20078]    23 20078     1231      136    53248       59             0 squidGuard
Jan 29 12:42:30 ipfire kernel: [  20082]    23 20082     2219      357    53248      106             0 updxlrator
Jan 29 12:42:30 ipfire kernel: [  20086]    23 20086     2219      322    53248      141             0 updxlrator
Jan 29 12:42:30 ipfire kernel: [  20091]    23 20091     1363      320    53248       18             0 squidGuard
Jan 29 12:42:30 ipfire kernel: [  20094]    23 20094     2219      377    53248      109             0 updxlrator
Jan 29 12:42:30 ipfire kernel: [  20098]    23 20098     1265      162    53248       33             0 squidGuard
Jan 29 12:42:30 ipfire kernel: [  20101]    23 20101     2219      465    57344       20             0 updxlrator
Jan 29 12:42:30 ipfire kernel: [  20143]     0 20143     1175       36    49152       41             0 starter
Jan 29 12:42:30 ipfire kernel: [  20144]     0 20144   298463      451   200704       19             0 charon
Jan 29 12:42:30 ipfire kernel: [  20192]    99 20192     1871      130    49152       36             0 openvpn
Jan 29 12:42:30 ipfire kernel: [  20227]     0 20227     4058      552    73728     1268             0 openvpn-authent
Jan 29 12:42:30 ipfire kernel: [  20265]    52 20265      748       37    45056       60             0 dhcpcd
Jan 29 12:42:30 ipfire kernel: [  23249]     0 23249      904       35    49152       17             0 fcron
Jan 29 12:42:30 ipfire kernel: [  23250]     0 23250     1084       58    49152        0             0 run-parts
Jan 29 12:42:30 ipfire kernel: [  23251]     0 23251      904       35    49152       17             0 fcron
Jan 29 12:42:30 ipfire kernel: [  23252]     0 23252     1054       20    40960        0             0 bash
Jan 29 12:42:30 ipfire kernel: [  23253]     0 23253      904       35    49152       17             0 fcron
Jan 29 12:42:30 ipfire kernel: [  23254]     0 23254      338        6    36864        0             0 bash
Jan 29 12:42:30 ipfire kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=openvpn-authent,pid=4283,uid=0
Jan 29 12:42:30 ipfire kernel: Out of memory: Killed process 4283 (openvpn-authent) total-vm:2664840kB, anon-rss:1904924kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:4752kB oom_score_adj:0

The IPFire device become VERY sluggish and it took about 5 minutes to get to the WebGUI shutdown page.  After reboot all went back to normal.

I have not tested the patch or repeated this test since this is my Production box.

<jab mode on - meant lovingly>
I was able to quickly find the time of the issue by looking at the collectd RRD graphs!  Made finding the issue in the logs so much simpler!)
<jab mode off>
Comment 30 Adolf Belka 2023-01-30 11:00:58 UTC
Hi @Jon

From your log it can be seen that you had the OpenVPN server running when you lost your modem connection and that after some time the OpenVPN Server was stopped.

This is what the original openvpn-authenticator program does not deal with well.

After your reboot was the OpenVPN server running again. I would expect yes. In my testing the only time I saw the oom effect was when the OpenVPN server was stopped.

With the fix patch that has been submitted you can stop the OpenVPN server and the openvpn-authenticator program now handles that cleanly and the memory consumption level does not increase.

So I believe that when you have CU173 in place then losing your modem connection will not result in the same problem as even if the OpenVPN server is stopped the openvpn-authenticator will stay stable with its memory consumption.
Comment 31 Adolf Belka 2023-02-06 14:25:43 UTC
Core Update 173 Testing has been released

https://blog.ipfire.org/post/ipfire-2-27-core-update-173-is-available-for-testing
Comment 32 Adolf Belka 2023-02-06 14:28:46 UTC
Tested out turning off OpenVPN server while watching htop in an ssh console window.

openvpn-authenticator memory consumption climbs up to 100% and then swap starts getting used until openvpn-authenticator is stopped by the oom killer.

So the bug is not fixed in Core Update 173 Testing, although the code changes when applied to a clone of the same Core Update 172 vm did fix the problem.
Comment 33 Adolf Belka 2023-02-06 14:42:44 UTC
Had a look through the openvpn-authenticator code that is in the CU173 Testing version I just tested and found the following.

The dropped code from commit
https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=d7618ccba8ba312916593e6ebbadee1d90575c0f

is gone.

The modified code for commits

when losing socket connection
avoid infinite loop
https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=92a9ce54bc63ebea153fc46365a1aa299856fbbe

and 

break read loop when daemon goes away
https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=7ec3664c320707b51407fce854e19b6254eb4836

are not in the code, so somehow they missed getting merged into CU173 Testing
Comment 34 Adolf Belka 2023-02-06 14:56:12 UTC
The file config/ovpn/openvpn-authenticator in next in the git repo has the changes included in it but that modified version appears to not have been installed in the core update
Comment 35 Adolf Belka 2023-02-06 16:35:06 UTC
I looked through /var/log/pakfire/update-core-upgrade-173.log and there was no usr/sbin/openvpn-authenticator in the Extracting files section of the log.

Doing a grep on the log file for openvpn finds nothing.
Comment 36 Adolf Belka 2023-02-06 16:44:00 UTC
I may know what the problem is.

I think openvpn rootfiles needed to be shipped as it also contains the usr/sbin/openvpn-authenticator not just the file usr/sbin/openvpn-authenticator.
At least this is my best guess at the moment.
Comment 37 Peter Müller 2023-02-07 11:44:19 UTC
Thank you for flagging this. I have changed the Core Update to ensure OpenVPN will be shipped entirely rather than just the authenticator script. The commit should land in "master" soon.

https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=6ee5ecc86552999d36f15702e4b5ab523e060733
Comment 38 Adolf Belka 2023-02-10 20:03:51 UTC
Okay, the latest master build ((Build: master/deb11b3f) now has the changes for this bug.

I confirmed the changes in the openvpv-authenticator code and also confirmed by stopping the Openvpn Server and the memory stayed stable and normal using htop.

So fix is now confirmed for this bug.