Summary: | IPsec N2N connections flapping after upgrade to Core 119 | |
---|---|---|---
Product: | IPFire | Reporter: | Peter Müller <peter.mueller>
Component: | --- | Assignee: | Michael Tremer <michael.tremer>
Status: | CLOSED FIXED | QA Contact: |
Severity: | Major Usability | |
Priority: | Will affect most users | CC: | alexander.marx, marcel.lorenz, michael.tremer, oliver.fuhrer, tomvend
Version: | 2 | |
Hardware: | unspecified | |
OS: | Unspecified | |
See Also: | https://bugzilla.ipfire.org/show_bug.cgi?id=11559 | |
Attachments: | redacted ipsec logs, ipsec.conf | |
Description
Peter Müller
2018-03-25 17:01:48 UTC
This problem can be reproduced with just one GREEN-to-GREEN VPN. The tunnel seems to be torn down and rebuilt roughly every 10 seconds.

Are you using Curve25519? If so, could you use any of the other algorithms for testing and then restart strongSwan on both sides?

(In reply to Michael Tremer from comment #2)
> Are you using Curve25519? If so, could you use any of the other algorithms
> for testing and then restart strongSwan on both sides?

I am (it is the only algorithm used in that group; I don't know if that makes any difference). I'll test and report.

In my test scenario it wasn't the only one, but it was the one that was negotiated with the other peer, so there was no failover to a second one.

Hi,

I'm facing the same issue between two x86_64 boxes and an ARM installation. It is causing quite some additional load, as some of the tunnels are re-initialized every 10 seconds.

Current configuration for all tunnels:
Encryption: 256bit AES-CBC (IKE+ESP)
Integrity: SHA2 256bit (IKE+ESP)
Group type: ECP-384 (NIST) (IKE+ESP)
IKE Lifetime: 3Hrs, ESP 1Hr.
Use only proposed settings for IKE+ESP and PFS enabled.

Let me know if you need any additional information.

Regards
Oliver

Looks like not only Curve25519 is causing the problem, then. I have not tested it yet (things are currently quite busy), but will do so at the weekend.

Since you are using a different curve, this could be a different issue. Could you please send logs? Potentially redacted?

Created attachment 569
redacted ipsec logs
Please find ipsec related messages attached.
The server is being asked to close the connection, and then it starts it again. That kind of works as designed. Do you have an idea why the other end is closing it?

I don't really have a clue why this is happening. What I see is that I have multiple connections active at the same time on each site. Running 'ipsecctrl I' in a loop shows reconnects every 10 seconds. This is also currently causing around 80-100 MB of log entries on the site with 2 tunnels.

[root@firewall ~]# ipsecctrl I
Security Associations (2 up, 0 connecting):
remotebackup[9144]: ESTABLISHED 8 seconds ago, 83.219.xxx.xxx[sitea.domain.tld]...85.5.xxx.xxx[siteb.domain.tld]
remotebackup{22064}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c99a0959_i cd1e46c5_o
remotebackup{22064}: 10.219.220.1/32 === 192.168.10.0/24 192.168.55.0/24 192.168.92.0/24
remotebackup{22065}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c61903e6_i caa3aee5_o
remotebackup{22065}: 10.219.220.1/32 === 192.168.10.0/24 192.168.55.0/24 192.168.92.0/24
remotebackup{22066}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: ce1bfee6_i c8a4af15_o
remotebackup{22066}: 10.219.220.1/32 === 192.168.10.0/24 192.168.55.0/24 192.168.92.0/24
remotebackup[9143]: ESTABLISHED 23 seconds ago, 83.219.xxx.xxx[sitea.domain.tld]...85.5.xxx.xxx[siteb.domain.tld]
remotebackup{22061}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: cb9984cd_i c935d6bb_o
remotebackup{22061}: 10.219.220.1/32 === 192.168.10.0/24 192.168.55.0/24 192.168.92.0/24
remotebackup{22062}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c38fbc97_i c054d563_o
remotebackup{22062}: 10.219.220.1/32 === 192.168.10.0/24 192.168.55.0/24 192.168.92.0/24
remotebackup{22063}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c77669ff_i cc6c7b82_o
remotebackup{22063}: 10.219.220.1/32 === 192.168.10.0/24 192.168.55.0/24 192.168.92.0/24

Here's the config for this particular tunnel:

ipfire 1
cat /var/ipfire/vpn/config
1,on,remotebackup,siteb.domain.tld,net,cert,,off,@sitea.domain.tld,10.219.220.1/32,@siteb.domain.tld,siteb.domain.tld,192.168.10.0/24|192.168.55.0/24|192.168.92.0/24,off,,,off,3,1,aes256,sha2_256,e384,aes256,sha2_256,e384,on,,,restart,on,ikev2,120,30,off,start,0

ipfire 2
cat /var/ipfire/vpn/config
1,on,remotebackup,sitea.domain.tld,net,cert,,off,@siteb.domain.tld,192.168.10.0/24|192.168.55.0/24|192.168.92.0/24,@sitea.domain.tld,sitea.domain.tld,10.219.220.1/32,off,,,off,3,1,aes256,sha2_256,e384,aes256,sha2_256,e384,on,,,restart,on,ikev2,120,30,off,start,0

One site is located behind a NAT router, but I don't think this should be an issue. There also seems to be someone else facing a similar issue in the forum:
https://forum.ipfire.org/viewtopic.php?f=16&t=20286

I can try to set up another tunnel between the two sites which are not natted and see what happens then, or set up multiple tunnels, one for each subnet, and see what happens.

I have multiple tunnels running with similar settings and previously have seen that sometimes strongSwan logs something funny and then restarts the tunnel when Curve25519 was involved. This looks different.

The other side is an IPFire system, too? Do the settings match exactly?

Yes, all involved endpoints are IPFire systems (site 2 has tunnels to ipfire 1 and ipfire 3). Tunnel settings were identical; however, I recreated all tunnels in the meantime using PSK instead of certificates and using IPFire's default values except for deselecting Curve25519.
New config for all 3 involved systems:

site 3:
[root@firewall ~]# cat /var/ipfire/vpn/config
1,on,radmin,,net,psk,redacted=,off,@sitec.domain.tld,172.28.7.0/24|172.28.9.0/24|172.28.10.0/23,@siteb.domain.tld,siteb.domain.tld,192.168.10.0/24|192.168.55.0/24|192.168.92.0/24,off,,,off,3,1,aes256gcm128|aes256gcm96|aes256gcm64|aes256|aes192gcm128|aes192gcm96|aes192gcm64|aes192|aes128gcm128|aes128gcm96|aes128gcm64|aes128,sha2_512|sha2_256,4096|3072|2048,aes256gcm128|aes256gcm96|aes256gcm64|aes256|aes192gcm128|aes192gcm96|aes192gcm64|aes192|aes128gcm128|aes128gcm96|aes128gcm64|aes128,sha2_512|sha2_256,4096|3072|2048,on,,,restart,on,ikev2,120,30,off,start,900

site 2:
[root@firewall ~]# cat /var/ipfire/vpn/config
1,on,remotebackup,,net,psk,redacted=,off,@siteb.domain.tld,192.168.10.0/24|192.168.55.0/24|192.168.92.0/24,@sitea.domain.tld,sitea.domain.tld,10.219.220.1/32,off,,,off,3,1,aes256gcm128|aes256gcm96|aes256gcm64|aes256|aes192gcm128|aes192gcm96|aes192gcm64|aes192|aes128gcm128|aes128gcm96|aes128gcm64|aes128,sha2_512|sha2_256,4096|3072|2048,aes256gcm128|aes256gcm96|aes256gcm64|aes256|aes192gcm128|aes192gcm96|aes192gcm64|aes192|aes128gcm128|aes128gcm96|aes128gcm64|aes128,sha2_512|sha2_256,4096|3072|2048,on,,,restart,on,ikev2,120,30,off,start,900
2,on,radmin,,net,psk,redacted=,off,@siteb.domain.tld,192.168.10.0/24|192.168.55.0/24|192.168.92.0/24,@sitec.domain.tld,sitec.domain.tld,172.28.7.0/24|172.28.9.0/24|172.28.10.0/23,off,,,off,3,1,aes256gcm128|aes256gcm96|aes256gcm64|aes256|aes192gcm128|aes192gcm96|aes192gcm64|aes192|aes128gcm128|aes128gcm96|aes128gcm64|aes128,sha2_512|sha2_256,4096|3072|2048,aes256gcm128|aes256gcm96|aes256gcm64|aes256|aes192gcm128|aes192gcm96|aes192gcm64|aes192|aes128gcm128|aes128gcm96|aes128gcm64|aes128,sha2_512|sha2_256,4096|3072|2048,on,,,restart,on,ikev2,120,30,off,start,900

site 1:
[root@firewall ~]# cat /var/ipfire/vpn/config
1,on,remotebackup,,net,psk,redacted=,off,@sitea.domain.tld,10.219.220.1/24,@siteb.domain.tld,siteb.domain.tld,192.168.10.0/24|192.168.55.0/24|192.168.92.0/24,off,,,off,3,1,aes256gcm128|aes256gcm96|aes256gcm64|aes256|aes192gcm128|aes192gcm96|aes192gcm64|aes192|aes128gcm128|aes128gcm96|aes128gcm64|aes128,sha2_512|sha2_256,4096|3072|2048,aes256gcm128|aes256gcm96|aes256gcm64|aes256|aes192gcm128|aes192gcm96|aes192gcm64|aes192|aes128gcm128|aes128gcm96|aes128gcm64|aes128,sha2_512|sha2_256,4096|3072|2048,on,,,restart,on,ikev2,120,30,off,start,900

I'll keep an eye on the logs and see what happens until tomorrow morning.

This might be related to #11559 since it seems to occur only in case multiple IPsec connections are announcing the same source and/or destination networks.

(In reply to Oliver Fuhrer from comment #12)
> Yes, all involved endpoints are ipfire systems (site 2 has tunnels to ipfire
> 1 and ipfire 3). Tunnel Settings were identical, however I recreated all
> tunnels in the meantime using PSK instead of certificates and using ipfire's
> default values except for deselecting curve25519.
>
> New config for all 3 involved systems:
> [...]
>
> I'll keep an eye on the logs and see what happens until tomorrow morning.

Could you rather post /etc/ipsec.conf? That is easier to read.

Created attachment 572
ipsec.conf
Consolidated ipsec.conf Files of all 3 IPFire Systems.
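
The `ipsecctrl I` output quoted earlier in this thread shows two IKE SAs and six CHILD SAs for the single connection `remotebackup`, which is the visible symptom of the flapping. As a rough aid for spotting this in a loop, the following Python sketch counts both kinds of SA per connection name. It assumes only the line shapes visible in that pasted output (`name[id]: ESTABLISHED ...` and `name{id}: INSTALLED ...`). The function name `summarize` and the shortened sample text are illustrative and not part of IPFire or strongSwan.

```python
import re
from collections import defaultdict

# IKE SA lines look like:   remotebackup[9144]: ESTABLISHED 8 seconds ago, ...
IKE_SA = re.compile(r"^\s*([^\s\[\]{}]+)\[(\d+)\]:\s+ESTABLISHED\b")
# CHILD SA lines look like: remotebackup{22064}: INSTALLED, TUNNEL, reqid 1, ...
CHILD_SA = re.compile(r"^\s*([^\s\[\]{}]+)\{(\d+)\}:\s+INSTALLED\b")


def summarize(status_text):
    """Return {connection name: (number of IKE SAs, number of CHILD SAs)}."""
    ike = defaultdict(set)
    child = defaultdict(set)
    for line in status_text.splitlines():
        match = IKE_SA.match(line)
        if match:
            ike[match.group(1)].add(int(match.group(2)))
            continue
        match = CHILD_SA.match(line)
        if match:
            child[match.group(1)].add(int(match.group(2)))
    names = set(ike) | set(child)
    return {name: (len(ike[name]), len(child[name])) for name in names}


# Shortened sample in the shape of the output pasted in this thread
# (addresses and SPIs trimmed; the IDs are taken from the original paste).
SAMPLE = """\
Security Associations (2 up, 0 connecting):
remotebackup[9144]: ESTABLISHED 8 seconds ago, 83.219.xxx.xxx[sitea.domain.tld]...85.5.xxx.xxx[siteb.domain.tld]
remotebackup{22064}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c99a0959_i cd1e46c5_o
remotebackup{22064}: 10.219.220.1/32 === 192.168.10.0/24 192.168.55.0/24 192.168.92.0/24
remotebackup{22065}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: c61903e6_i caa3aee5_o
remotebackup[9143]: ESTABLISHED 23 seconds ago, 83.219.xxx.xxx[sitea.domain.tld]...85.5.xxx.xxx[siteb.domain.tld]
remotebackup{22061}: INSTALLED, TUNNEL, reqid 1, ESP in UDP SPIs: cb9984cd_i c935d6bb_o
"""

if __name__ == "__main__":
    for name, (ike_count, child_count) in summarize(SAMPLE).items():
        print(f"{name}: {ike_count} IKE SA(s), {child_count} CHILD SA(s)")
```

More than one IKE SA for the same connection, each only a few seconds old, matches the behaviour described above. A single long-lived IKE SA with a small number of CHILD SAs is what a stable tunnel would be expected to show.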
Disabling Curve25519 did not solve the problem here. After a while, the connection becomes unstable and goes up and down roughly every 10 seconds. Not very surprisingly, the quality of VoIP calls over the VPN becomes poor.

I guess this problem is related neither to multiple connections announcing partially the same routes (as suggested somewhere) nor to Curve25519, at least not on my systems. :-)

In case anybody needs further information here, I'm happy to provide it.

@Oliver: Did you install an experimental Core Update?

@Peter: Can you post your ipsec.conf as well?

Yes, site 2 is currently running on Core 120, the other two are on Core 119. However, the upgrade was done after the IPsec problems occurred. If necessary, I can wipe it and reinstall with Core 119.

Regards
Oliver

Not necessary. But it would help if you could download the latest version of vpnmain.cgi from
https://git.ipfire.org/?p=ipfire-2.x.git;a=blob;f=html/cgi-bin/vpnmain.cgi;h=a52b4d64d9b48d84babe8ec8d1220e4b3d4ddd01;hb=HEAD
and copy it to /srv/web/ipfire/cgi-bin/vpnmain.cgi, then edit the connection and hit save so that the configuration files are rewritten.

Potentially run "ipsec restart" on the console to restart the entire IPsec stack.

Hi Michael,

I just downloaded the latest vpnmain.cgi to the Core 120 system and diff did not show any changes.

Regards
Oliver

Hmm, it should. This commit should have been reverted:
https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=a261cb06c6cdd3ba14ad0163c8c9e714ae94fc5b

The revert landed in neither master nor next, which isn't good. Could you manually remove that line?

OK, I removed the following lines from the Core 120 system, edited and saved all tunnels, and bounced IPsec via the init script. So far no change. I will restart IPsec on the remote sites as well in a couple of hours.
# Restart the connection immediately when it has gone down
# unexpectedly
if ($start_action eq 'start') {
	print CONF "\tcloseaction=restart\n";
}

Oliver

Yes, please do this on both sides and make sure that strongSwan has actually reloaded the configuration.

Also, changing to on-demand mode should stop the flapping instantly.

Hi Michael,

Thanks a lot. So far, the logs are quiet again and the tunnels seem to be much more stable. Removing closeaction=restart from ipsec.conf and changing the tunnels to on-demand did the trick.

Regards
Oliver

Great. Thanks for the feedback. That shouldn't have been there in the first place.

Is everything still okay? Peter, could you confirm the same problem?

Flapping VPN tunnels also appear here when using curves other than Curve25519. I am also able to exclude multiple tunnels announcing partially the same routes as a reason for this, which I suspected initially.

@Michael: I have not tried to apply your patch yet. I am working on it and will get back to you.

On my side, the tunnels have been stable for some time now using on-demand connect. I also cannot see multiple active tunnels in ipsecctrl's output, which is fine for me.

Is there anything else I should test in my setups?

Regards
Oliver

Thanks for the feedback.

Multiple ESP sessions may happen. There should not be too many.

If tunnels are stable for you now, I think we have found the culprit and can consider this as being fixed.

(In reply to Michael Tremer from comment #30)
> Thanks for the feedback.
>
> Multiple ESP sessions may happen. There should not be too many.

Yes, my construction was more like a crutch to work around #11559.

> If tunnels are stable for you now, I think we have found the culprit and can
> consider this as being fixed.

Are they running stable without "on demand" mode as well?

Also changing the bug title so it actually reflects what the problem was.

I ported the patch to the core120 branch since we need to build again and I consider this a bug that is severe enough to be a higher priority.
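
The workaround confirmed in this thread was to remove `closeaction=restart` from the generated ipsec.conf. To check which connections on a box still carry that setting, a small Python sketch like the one below may help. It assumes the usual ipsec.conf layout of a `conn <name>` header followed by indented `key=value` lines. The function name and the sample configuration are hypothetical; the sample is not taken from the attachment above.

```python
def conns_with_closeaction_restart(conf_text):
    """Return the names of conn sections that set closeaction=restart."""
    flagged = []
    current = None  # name of the conn section currently being read
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        if not raw[0].isspace():
            # Unindented line: a section header such as "conn remotebackup"
            parts = line.split()
            if len(parts) == 2 and parts[0] == "conn":
                current = parts[1]
            else:
                current = None  # e.g. "config setup"
            continue
        key, sep, value = line.strip().partition("=")
        if sep and current is not None:
            if key.strip() == "closeaction" and value.strip() == "restart":
                flagged.append(current)
    return flagged


# Hypothetical configuration, for illustration only.
SAMPLE_CONF = """\
config setup
\tcharondebug="dmn 0"

conn remotebackup
\tauto=start
\tcloseaction=restart

conn radmin
\tauto=route
"""

if __name__ == "__main__":
    for name in conns_with_closeaction_restart(SAMPLE_CONF):
        print(f"{name}: closeaction=restart is still set")
```

On a real system the text would come from /etc/ipsec.conf. After editing and saving a tunnel in the web UI, the list should come back empty once the corrected vpnmain.cgi is in place.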
Core Update 120 has been released.

Will close this as soon as tests show problem gone. Good work!

Confirm problem is solved after upgrading to Core Update 120 and reboot.