As Discussed on the forum here: http://forum.ipfire.org/viewtopic.php?f=27&t=12628 Yesterday we encountered an issue where an IPSec tunnel was up and connected, but traffic did not pass across it. I manually forced it down using 'ipsec down tunnelname' and then brought it back up. When I brought it down I noticed errors related to IPTables rules not existing. When going through the logs, I noticed that the tunnel had a rekey collision (two, I think) and I then found two messages to the StrongSwan Users mailing list that deal with what seems to be the same issue (one from another IPFire user): https://lists.strongswan.org/pipermail/users/2015-January/007365.html https://lists.strongswan.org/pipermail/users/2014-August/006418.html My question is whether this is completely a StrongSwan issue, or if it is related to IPFire's implementation. Based on what I have, it seems that leftfirewall=yes and an updown script that flushes the rules can cause this issue, and I am not certain if IPFire is using the default updown script. If this is a StrongSwan issue, please let me know and I will push back on them for more details. Here is a snippet from the kernel log showing the initial collision(s): Dec 14 13:17:28 ipfire charon: 11[KNL] creating rekey job for CHILD_SA ESP/0xc68f239f/xxx.xxx.xxx.xxx Dec 14 13:17:28 ipfire charon: 11[IKE] establishing CHILD_SA HudsonNew{38} Dec 14 13:17:28 ipfire charon: 11[IKE] establishing CHILD_SA HudsonNew{38} Dec 14 13:17:28 ipfire charon: 11[ENC] generating CREATE_CHILD_SA request 14 [ N(REKEY_SA) N(IPCOMP_SUP) SA No KE TSi TSr ] Dec 14 13:17:28 ipfire charon: 11[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (344 bytes) Dec 14 13:17:28 ipfire charon: 05[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (444 bytes) Dec 14 13:17:28 ipfire charon: 05[ENC] parsed CREATE_CHILD_SA request 10 [ N(REKEY_SA) N(IPCOMP_SUP) SA No KE TSi TSr ] Dec 14 13:17:28 ipfire charon: 05[IKE] CHILD_SA HudsonNew{10483} established with SPIs cc94730d_i c654401c_o and TS 10.100.0.0/23 === 192.168.0.0/24 Dec 14 13:17:28 ipfire charon: 05[IKE] CHILD_SA HudsonNew{10483} established with SPIs cc94730d_i c654401c_o and TS 10.100.0.0/23 === 192.168.0.0/24 Dec 14 13:17:28 ipfire charon: 05[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 14 13:17:28 ipfire charon: 05[ENC] generating CREATE_CHILD_SA response 10 [ N(IPCOMP_SUP) SA No KE TSi TSr ] Dec 14 13:17:28 ipfire charon: 05[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (332 bytes) Dec 14 13:17:28 ipfire charon: 12[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (332 bytes) Dec 14 13:17:28 ipfire charon: 12[ENC] parsed CREATE_CHILD_SA response 14 [ N(IPCOMP_SUP) SA No KE TSi TSr ] Dec 14 13:17:28 ipfire charon: 12[IKE] CHILD_SA HudsonNew{10482} established with SPIs cc4f7fb7_i ccc34f5a_o and TS 10.100.0.0/23 === 192.168.0.0/24 Dec 14 13:17:28 ipfire charon: 12[IKE] CHILD_SA HudsonNew{10482} established with SPIs cc4f7fb7_i ccc34f5a_o and TS 10.100.0.0/23 === 192.168.0.0/24 Dec 14 13:17:28 ipfire charon: 12[IKE] CHILD_SA rekey collision lost, deleting rekeyed child Dec 14 13:17:29 ipfire charon: 12[IKE] closing CHILD_SA HudsonNew{10482} with SPIs cc4f7fb7_i (120 bytes) ccc34f5a_o (145 bytes) and TS 10.100.0.0/23 === 192.168.0.0/24 Dec 14 13:17:29 ipfire charon: 12[IKE] closing CHILD_SA HudsonNew{10482} with SPIs cc4f7fb7_i (120 bytes) ccc34f5a_o (145 bytes) and TS 10.100.0.0/23 === 192.168.0.0/24 Dec 14 13:17:29 ipfire charon: 12[IKE] sending DELETE for ESP CHILD_SA with SPI cc4f7fb7 Dec 14 13:17:29 ipfire charon: 12[ENC] generating INFORMATIONAL request 15 [ D ] Dec 14 13:17:29 ipfire charon: 12[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (69 bytes) Dec 14 13:17:29 ipfire charon: 06[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (69 bytes) Dec 14 13:17:29 ipfire charon: 06[ENC] parsed INFORMATIONAL request 11 [ D ] Dec 14 13:17:29 ipfire charon: 06[IKE] received DELETE for ESP CHILD_SA with SPI c46ba5ea Dec 14 13:17:29 ipfire charon: 06[IKE] closing CHILD_SA HudsonNew{10381} with SPIs c68f239f_i (83067013 bytes) c46ba5ea_o (68814191 bytes) and TS 10.100.0.0/23 === 192.168.0.0/24 Dec 14 13:17:29 ipfire charon: 06[IKE] closing CHILD_SA HudsonNew{10381} with SPIs c68f239f_i (83067013 bytes) c46ba5ea_o (68814191 bytes) and TS 10.100.0.0/23 === 192.168.0.0/24 Dec 14 13:17:29 ipfire charon: 06[IKE] sending DELETE for ESP CHILD_SA with SPI c68f239f Dec 14 13:17:29 ipfire charon: 06[IKE] CHILD_SA closed Dec 14 13:17:29 ipfire charon: 06[IKE] detected CHILD_REKEY collision with CHILD_DELETE Dec 14 13:17:29 ipfire charon: 06[ENC] generating INFORMATIONAL response 11 [ D ] Dec 14 13:17:29 ipfire charon: 06[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (69 bytes) Dec 14 13:17:29 ipfire charon: 15[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (69 bytes) Dec 14 13:17:29 ipfire charon: 15[ENC] parsed INFORMATIONAL response 15 [ D ] Dec 14 13:17:29 ipfire charon: 15[IKE] received DELETE for ESP CHILD_SA with SPI ccc34f5a Dec 14 13:17:29 ipfire charon: 15[IKE] CHILD_SA closed Dec 14 13:17:29 ipfire vpn: client- C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=remoteoffice.mycompany.com 192.168.0.0/24 == yyy.yyy.yyy.yyy -- xxx.xxx.xxx.xxx == 10.100.0.0/23 Dec 14 13:17:29 ipfire vpn: tunnel- yyy.yyy.yyy.yyy -- xxx.xxx.xxx.xxx Dec 14 13:17:29 ipfire vpn: snat- red0-xxx.xxx.xxx.xxx : 192.168.0.0/24 - 10.100.0.1 And here is a snippet from when I brought the tunnel down manually (including the IPTables errors: Dec 14 13:27:23 ipfire charon: 11[CFG] received stroke: terminate 'HudsonNew' Dec 14 13:27:23 ipfire charon: 03[IKE] deleting IKE_SA HudsonNew[365] between xxx.xxx.xxx.xxx[C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=headquarters.mycompany.com]...yyy.yyy.yyy.yyy[C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=remoteoffice.mycompany.com] Dec 14 13:27:23 ipfire charon: 03[IKE] deleting IKE_SA HudsonNew[365] between xxx.xxx.xxx.xxx[C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=headquarters.mycompany.com]...yyy.yyy.yyy.yyy[C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=remoteoffice.mycompany.com] Dec 14 13:27:23 ipfire charon: 03[IKE] sending DELETE for IKE_SA HudsonNew[365] Dec 14 13:27:23 ipfire charon: 03[ENC] generating INFORMATIONAL request 17 [ D ] Dec 14 13:27:23 ipfire charon: 03[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (65 bytes) Dec 14 13:27:23 ipfire charon: 10[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (57 bytes) Dec 14 13:27:23 ipfire charon: 10[ENC] parsed INFORMATIONAL response 17 [ ] Dec 14 13:27:23 ipfire charon: 10[IKE] IKE_SA deleted Dec 14 13:27:23 ipfire charon: 10[IKE] IKE_SA deleted Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: No chain/target/match by that name. Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: Bad rule (does a matching rule exist in that chain?). Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: Bad rule (does a matching rule exist in that chain?). Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: No chain/target/match by that name. Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: Bad rule (does a matching rule exist in that chain?). Dec 14 13:27:23 ipfire vpn: client- C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=remoteoffice.mycompany.com 192.168.0.0/24 == yyy.yyy.yyy.yyy -- xxx.xxx.xxx.xxx == 10.100.0.0/23 Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: Bad rule (does a matching rule exist in that chain?). Dec 14 13:27:23 ipfire last message repeated 2 times Dec 14 13:27:23 ipfire vpn: tunnel- yyy.yyy.yyy.yyy -- xxx.xxx.xxx.xxx Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: No chain/target/match by that name. Dec 14 13:27:23 ipfire vpn: snat- unknown-xxx.xxx.xxx.xxx : 192.168.0.0/24 - 10.100.0.1
Do you know if there is a bug report about this upstream?
Not that I am aware of, and those list messages are from February and August, if memory serves, so I'm guessing that if this is a StrongSwan issue, they are not fully aware of it.
Looks like I spoke too soon. There is this issue: https://wiki.strongswan.org/issues/853 and a duplicate: https://wiki.strongswan.org/issues/959 I don't see any acknowledgment that this is a StrongSwan issue by the devs, but it has been reported.
Michael, The Strongswan team has updated that bug with a proposed patch. I am not really certain as to how to test it, though. Issue #853 has been updated by Tobias Brunner. Tracker changed from Issue to Bug Status changed from New to Feedback Please try the patch in the 853-updown-rekey-collision branch. It tries to suppress the updown event during rekeying in case an installed redundant CHILD_SA is deleted. Tom
This looks like a plausible solution. Should I build you an IPFire build so that you can test the patch? Can you reproduce the issue easily? Would an ISO image be more helpful or just a tarball with the compiled version of strongswan?
To be Honest, Michael, the original poster on the forum was able to reproduce the issue, while I am not really able to do so. I can try to install a tarball and check the log for messages periodically, but that does not seem to be much of a solution. I'll try to reach out to the original poster on the forum, but he seems to have made himself scarce.
I'm not certain if this is relevant, but I am not seeing entries like this in my kernel logs: charon: 07[CHD] running updown script: Should I be looking elsewhere? Also, would it make sense to have IPSec place its logs in its own file? Especially with multiple tunnels (we have 20 or so), the kernel log is completely polluted with IPSec entries.
(In reply to Tom Rymes from comment #7) > I'm not certain if this is relevant, but I am not seeing entries like this > in my kernel logs: > > charon: 07[CHD] running updown script: This may happen if logging is not configured to log these things. We do not log that much - even if it is already a lot. > Should I be looking elsewhere? > > Also, would it make sense to have IPSec place its logs in its own file? > Especially with multiple tunnels (we have 20 or so), the kernel log is > completely polluted with IPSec entries. We could also consider to turn the log level down. Could you open an extra report for that?
Added a new request here: https://bugzilla.ipfire.org/show_bug.cgi?id=11001 I don't think reducing IPSEc logging levels would be a good thing, rather, placing them in their own file (see https://wiki.strongswan.org/projects/strongswan/wiki/LoggerConfiguration) would make them more readable, while simultaneously makig the kernel log significantly easier to parse for non-IPSec items.
Michael: I have been able to set up a test installation and can reproduce this issue. If you can send me a tarball with instructions for installation suitable for a five-year-old, I can install it and test to see if it fixes the issue. Tom
Created attachment 387 [details] strongswan 5.3.5 for IPFire (i586) This is an easy one. Download the attached file and copy it to the firewall system. Then execute "tar xvfa stronsgwan-5.3.5-for-ipfire.tar.gz -C /" and restart strongswan by executing "ipsec restart".
http://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=4d7f9a81ac575207edb6bb69f8bbea8762feab96 The changes will also be in the next nightly build.
From what I can tell, this has solved the issue. I have had a tunnel up with these settings all day: keylife=3m rekeymargin=1m rekeyfuzz=0% It would normally exhibit the issue almost immediately, but it has been up all day now, with no seeming issues. a snippet from the command 'grep coll /var/log/messages': Dec 19 16:42:02 laconia charon: 05[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 19 16:42:03 laconia charon: 11[IKE] CHILD_SA rekey collision lost, deleting rekeyed child Dec 19 16:42:04 laconia charon: 03[IKE] detected CHILD_REKEY collision with CHILD_DELETE Dec 19 16:44:15 laconia charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 19 16:44:15 laconia charon: 14[IKE] CHILD_SA rekey collision lost, deleting rekeyed child Dec 19 16:44:16 laconia charon: 02[IKE] detected CHILD_REKEY collision with CHILD_DELETE Dec 19 16:46:27 laconia charon: 02[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 19 16:46:27 laconia charon: 06[IKE] CHILD_SA rekey collision lost, deleting rekeyed child Dec 19 16:46:28 laconia charon: 05[IKE] detected CHILD_REKEY collision with CHILD_DELETE Dec 19 16:48:39 laconia charon: 10[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 19 16:48:39 laconia charon: 11[IKE] CHILD_SA rekey collision won, deleting old child Dec 19 16:50:51 laconia charon: 13[IKE] detected CHILD_REKEY collision with CHILD_DELETE Dec 19 16:53:02 laconia charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 19 16:53:02 laconia charon: 12[IKE] CHILD_SA rekey collision won, deleting old child Dec 19 16:55:14 laconia charon: 03[IKE] detected CHILD_REKEY collision with CHILD_DELETE Dec 19 16:57:25 laconia charon: 04[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 19 16:57:25 laconia charon: 12[IKE] CHILD_SA rekey collision won, deleting old child Dec 19 16:59:37 laconia charon: 05[IKE] detected CHILD_REKEY collision with CHILD_DELETE Dec 19 17:01:48 laconia charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 19 17:01:48 laconia charon: 04[IKE] CHILD_SA rekey collision won, deleting old child Dec 19 17:04:00 laconia charon: 11[IKE] detected CHILD_REKEY collision with CHILD_DELETE Dec 19 17:06:11 laconia charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY Dec 19 17:06:11 laconia charon: 13[IKE] CHILD_SA rekey collision lost, deleting rekeyed child Dec 19 17:06:12 laconia charon: 02[IKE] detected CHILD_REKEY collision with CHILD_DELETE Similarly, the IPTables chains are still populated: Chain IPSECBLOCK (2 references) target prot opt source destination REJECT all -- anywhere 10.100.0.0/23 reject-with icmp-net-unreachable Chain IPSECINPUT (1 references) target prot opt source destination ACCEPT esp -- mydom.dom other.dom ACCEPT ah -- mydom.dom other.dom ACCEPT ip -- mydom.dom other.dom ACCEPT ip -- mydom.dom other.dom policy match dir in pol ipsec reqid 3 proto esp RETURN all -- 10.100.0.0/23 10.44.0.0/24 policy match dir in pol ipsec reqid 3 proto esp ACCEPT udp -- anywhere anywhere udp dpt:isakmp ACCEPT udp -- anywhere anywhere udp dpt:ipsec-nat-t ACCEPT udp -- anywhere anywhere udp dpt:isakmp ACCEPT udp -- anywhere anywhere udp dpt:ipsec-nat-t Chain IPSECOUTPUT (1 references) target prot opt source destination MARK all -- 10.44.0.0/24 10.100.0.0/23 policy match dir out pol ipsec reqid 3 proto esp MARK set 0x32 Chain IPSECFORWARD (1 references) target prot opt source destination RETURN all -- 10.100.0.0/23 10.44.0.0/24 policy match dir in pol ipsec reqid 3 proto esp MARK all -- 10.44.0.0/24 10.100.0.0/23 policy match dir out pol ipsec reqid 3 proto esp MARK set 0x32 Unfortunately, I was rushing this morning and forgot to update both sides of the tunnel, so the other side had the issue and traffic was not traversing the tunnel when I got back this afternoon, so I have updated the other side and I will confirm that traffic is moving still in a number of hours. Tom
(In reply to Tom Rymes from comment #13) > keylife=3m > rekeymargin=1m > rekeyfuzz=0% Are you using these settings just for reproducing the bug?
Yes, only to force re-key collisions.
I have had the tunnel up and running for just under two hours and all is still fine. It does seem like this fix has worked.
Another 12+ hours and hundreds of collisions later it's still working. I will report back to StrongSwan.
Alright. Since this is scheduled for release with C96, I will set this to ON_QA.