Bug 10998 - IPSec Rekey Collisions May Result in IPTable Rule Deletion
Summary: IPSec Rekey Collisions May Result in IPTable Rule Deletion
Status: CLOSED FIXED
Alias: None
Product: IPFire
Classification: Unclassified
Component: --- (show other bugs)
Version: 2
Hardware: unspecified Linux
: - Unknown - Major Usability
Assignee: Michael Tremer
QA Contact: Arne.F
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-15 15:12 UTC by Tom Rymes
Modified: 2016-04-23 00:18 UTC (History)
0 users

See Also:


Attachments
strongswan 5.3.5 for IPFire (i586) (1.03 MB, application/x-tar-gz)
2015-12-19 15:08 UTC, Michael Tremer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Tom Rymes 2015-12-15 15:12:56 UTC
As Discussed on the forum here: http://forum.ipfire.org/viewtopic.php?f=27&t=12628

Yesterday we encountered an issue where an IPSec tunnel was up and connected, but traffic did not pass across it. I manually forced it down using 'ipsec down tunnelname' and then brought it back up. When I brought it down I noticed errors related to IPTables rules not existing. When going through the logs, I noticed that the tunnel had a rekey collision (two, I think) and I then found two messages to the StrongSwan Users mailing list that deal with what seems to be the same issue (one from another IPFire user):

https://lists.strongswan.org/pipermail/users/2015-January/007365.html
https://lists.strongswan.org/pipermail/users/2014-August/006418.html

My question is whether this is completely a StrongSwan issue, or if it is related to IPFire's implementation. Based on what I have, it seems that leftfirewall=yes and an updown script that flushes the rules can cause this issue, and I am not certain if IPFire is using the default updown script. If this is a StrongSwan issue, please let me know and I will push back on them for more details.

Here is a snippet from the kernel log showing the initial collision(s):
Dec 14 13:17:28 ipfire charon: 11[KNL] creating rekey job for CHILD_SA ESP/0xc68f239f/xxx.xxx.xxx.xxx
Dec 14 13:17:28 ipfire charon: 11[IKE] establishing CHILD_SA HudsonNew{38}
Dec 14 13:17:28 ipfire charon: 11[IKE] establishing CHILD_SA HudsonNew{38}
Dec 14 13:17:28 ipfire charon: 11[ENC] generating CREATE_CHILD_SA request 14 [ N(REKEY_SA) N(IPCOMP_SUP) SA No KE TSi TSr ]
Dec 14 13:17:28 ipfire charon: 11[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (344 bytes)
Dec 14 13:17:28 ipfire charon: 05[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (444 bytes)
Dec 14 13:17:28 ipfire charon: 05[ENC] parsed CREATE_CHILD_SA request 10 [ N(REKEY_SA) N(IPCOMP_SUP) SA No KE TSi TSr ]
Dec 14 13:17:28 ipfire charon: 05[IKE] CHILD_SA HudsonNew{10483} established with SPIs cc94730d_i c654401c_o and TS 10.100.0.0/23 === 192.168.0.0/24
Dec 14 13:17:28 ipfire charon: 05[IKE] CHILD_SA HudsonNew{10483} established with SPIs cc94730d_i c654401c_o and TS 10.100.0.0/23 === 192.168.0.0/24
Dec 14 13:17:28 ipfire charon: 05[IKE] detected CHILD_REKEY collision with CHILD_REKEY
Dec 14 13:17:28 ipfire charon: 05[ENC] generating CREATE_CHILD_SA response 10 [ N(IPCOMP_SUP) SA No KE TSi TSr ]
Dec 14 13:17:28 ipfire charon: 05[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (332 bytes)
Dec 14 13:17:28 ipfire charon: 12[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (332 bytes)
Dec 14 13:17:28 ipfire charon: 12[ENC] parsed CREATE_CHILD_SA response 14 [ N(IPCOMP_SUP) SA No KE TSi TSr ]
Dec 14 13:17:28 ipfire charon: 12[IKE] CHILD_SA HudsonNew{10482} established with SPIs cc4f7fb7_i ccc34f5a_o and TS 10.100.0.0/23 === 192.168.0.0/24
Dec 14 13:17:28 ipfire charon: 12[IKE] CHILD_SA HudsonNew{10482} established with SPIs cc4f7fb7_i ccc34f5a_o and TS 10.100.0.0/23 === 192.168.0.0/24
Dec 14 13:17:28 ipfire charon: 12[IKE] CHILD_SA rekey collision lost, deleting rekeyed child
Dec 14 13:17:29 ipfire charon: 12[IKE] closing CHILD_SA HudsonNew{10482} with SPIs cc4f7fb7_i (120 bytes) ccc34f5a_o (145 bytes) and TS 10.100.0.0/23 === 192.168.0.0/24
Dec 14 13:17:29 ipfire charon: 12[IKE] closing CHILD_SA HudsonNew{10482} with SPIs cc4f7fb7_i (120 bytes) ccc34f5a_o (145 bytes) and TS 10.100.0.0/23 === 192.168.0.0/24
Dec 14 13:17:29 ipfire charon: 12[IKE] sending DELETE for ESP CHILD_SA with SPI cc4f7fb7
Dec 14 13:17:29 ipfire charon: 12[ENC] generating INFORMATIONAL request 15 [ D ]
Dec 14 13:17:29 ipfire charon: 12[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (69 bytes)
Dec 14 13:17:29 ipfire charon: 06[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (69 bytes)
Dec 14 13:17:29 ipfire charon: 06[ENC] parsed INFORMATIONAL request 11 [ D ]
Dec 14 13:17:29 ipfire charon: 06[IKE] received DELETE for ESP CHILD_SA with SPI c46ba5ea
Dec 14 13:17:29 ipfire charon: 06[IKE] closing CHILD_SA HudsonNew{10381} with SPIs c68f239f_i (83067013 bytes) c46ba5ea_o (68814191 bytes) and TS 10.100.0.0/23 === 192.168.0.0/24
Dec 14 13:17:29 ipfire charon: 06[IKE] closing CHILD_SA HudsonNew{10381} with SPIs c68f239f_i (83067013 bytes) c46ba5ea_o (68814191 bytes) and TS 10.100.0.0/23 === 192.168.0.0/24
Dec 14 13:17:29 ipfire charon: 06[IKE] sending DELETE for ESP CHILD_SA with SPI c68f239f
Dec 14 13:17:29 ipfire charon: 06[IKE] CHILD_SA closed
Dec 14 13:17:29 ipfire charon: 06[IKE] detected CHILD_REKEY collision with CHILD_DELETE
Dec 14 13:17:29 ipfire charon: 06[ENC] generating INFORMATIONAL response 11 [ D ]
Dec 14 13:17:29 ipfire charon: 06[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (69 bytes)
Dec 14 13:17:29 ipfire charon: 15[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (69 bytes)
Dec 14 13:17:29 ipfire charon: 15[ENC] parsed INFORMATIONAL response 15 [ D ]
Dec 14 13:17:29 ipfire charon: 15[IKE] received DELETE for ESP CHILD_SA with SPI ccc34f5a
Dec 14 13:17:29 ipfire charon: 15[IKE] CHILD_SA closed
Dec 14 13:17:29 ipfire vpn: client- C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=remoteoffice.mycompany.com 192.168.0.0/24 == yyy.yyy.yyy.yyy -- xxx.xxx.xxx.xxx == 10.100.0.0/23
Dec 14 13:17:29 ipfire vpn: tunnel- yyy.yyy.yyy.yyy -- xxx.xxx.xxx.xxx
Dec 14 13:17:29 ipfire vpn: snat- red0-xxx.xxx.xxx.xxx : 192.168.0.0/24 - 10.100.0.1

And here is a snippet from when I brought the tunnel down manually (including the IPTables errors:
Dec 14 13:27:23 ipfire charon: 11[CFG] received stroke: terminate 'HudsonNew'
Dec 14 13:27:23 ipfire charon: 03[IKE] deleting IKE_SA HudsonNew[365] between xxx.xxx.xxx.xxx[C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=headquarters.mycompany.com]...yyy.yyy.yyy.yyy[C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=remoteoffice.mycompany.com]
Dec 14 13:27:23 ipfire charon: 03[IKE] deleting IKE_SA HudsonNew[365] between xxx.xxx.xxx.xxx[C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=headquarters.mycompany.com]...yyy.yyy.yyy.yyy[C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=remoteoffice.mycompany.com]
Dec 14 13:27:23 ipfire charon: 03[IKE] sending DELETE for IKE_SA HudsonNew[365]
Dec 14 13:27:23 ipfire charon: 03[ENC] generating INFORMATIONAL request 17 [ D ]
Dec 14 13:27:23 ipfire charon: 03[NET] sending packet: from xxx.xxx.xxx.xxx[4500] to yyy.yyy.yyy.yyy[4500] (65 bytes)
Dec 14 13:27:23 ipfire charon: 10[NET] received packet: from yyy.yyy.yyy.yyy[4500] to xxx.xxx.xxx.xxx[4500] (57 bytes)
Dec 14 13:27:23 ipfire charon: 10[ENC] parsed INFORMATIONAL response 17 [ ]
Dec 14 13:27:23 ipfire charon: 10[IKE] IKE_SA deleted
Dec 14 13:27:23 ipfire charon: 10[IKE] IKE_SA deleted
Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: No chain/target/match by that name.
Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: Bad rule (does a matching rule exist in that chain?).
Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: Bad rule (does a matching rule exist in that chain?).
Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: No chain/target/match by that name.
Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: Bad rule (does a matching rule exist in that chain?).
Dec 14 13:27:23 ipfire vpn: client- C=US, ST=TX, O=MyOrg, OU=Engineering Dept, CN=remoteoffice.mycompany.com 192.168.0.0/24 == yyy.yyy.yyy.yyy -- xxx.xxx.xxx.xxx == 10.100.0.0/23
Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: Bad rule (does a matching rule exist in that chain?).
Dec 14 13:27:23 ipfire last message repeated 2 times
Dec 14 13:27:23 ipfire vpn: tunnel- yyy.yyy.yyy.yyy -- xxx.xxx.xxx.xxx
Dec 14 13:27:23 ipfire charon: 10[CHD] updown: iptables: No chain/target/match by that name.
Dec 14 13:27:23 ipfire vpn: snat- unknown-xxx.xxx.xxx.xxx : 192.168.0.0/24 - 10.100.0.1
Comment 1 Michael Tremer 2015-12-15 15:19:17 UTC
Do you know if there is a bug report about this upstream?
Comment 2 Tom Rymes 2015-12-15 15:22:39 UTC
Not that I am aware of, and those list messages are from February and August, if memory serves, so I'm guessing that if this is a StrongSwan issue, they are not fully aware of it.
Comment 3 Tom Rymes 2015-12-15 15:49:04 UTC
Looks like I spoke too soon. There is this issue:

https://wiki.strongswan.org/issues/853

and a duplicate:

https://wiki.strongswan.org/issues/959

I don't see any acknowledgment that this is a StrongSwan issue by the devs, but it has been reported.
Comment 4 Tom Rymes 2015-12-15 19:03:07 UTC
Michael,

The Strongswan team has updated that bug with a proposed patch. I am not really certain as to how to test it, though.

Issue #853 has been updated by Tobias Brunner.
Tracker changed from Issue to Bug
Status changed from New to Feedback
Please try the patch in the 853-updown-rekey-collision branch. It tries to suppress the updown event during rekeying in case an installed redundant CHILD_SA is deleted.

Tom
Comment 5 Michael Tremer 2015-12-15 20:36:35 UTC
This looks like a plausible solution. Should I build you an IPFire build so that you can test the patch? Can you reproduce the issue easily? Would an ISO image be more helpful or just a tarball with the compiled version of strongswan?
Comment 6 Tom Rymes 2015-12-15 20:43:59 UTC
To be Honest, Michael, the original poster on the forum was able to reproduce the issue, while I am not really able to do so. I can try to install a tarball and check the log for messages periodically, but that does not seem to be much of a solution.

I'll try to reach out to the original poster on the forum, but he seems to have made himself scarce.
Comment 7 Tom Rymes 2015-12-15 20:49:32 UTC
I'm not certain if this is relevant, but I am not seeing entries like this in my kernel logs:

charon: 07[CHD] running updown script:

Should I be looking elsewhere?

Also, would it make sense to have IPSec place its logs in its own file? Especially with multiple tunnels (we have 20 or so), the kernel log is completely polluted with IPSec entries.
Comment 8 Michael Tremer 2015-12-15 21:43:46 UTC
(In reply to Tom Rymes from comment #7)
> I'm not certain if this is relevant, but I am not seeing entries like this
> in my kernel logs:
> 
> charon: 07[CHD] running updown script:

This may happen if logging is not configured to log these things. We do not log that much - even if it is already a lot.

> Should I be looking elsewhere?
> 
> Also, would it make sense to have IPSec place its logs in its own file?
> Especially with multiple tunnels (we have 20 or so), the kernel log is
> completely polluted with IPSec entries.

We could also consider to turn the log level down. Could you open an extra report for that?
Comment 9 Tom Rymes 2015-12-15 23:55:07 UTC
Added a new request here: https://bugzilla.ipfire.org/show_bug.cgi?id=11001

I don't think reducing IPSEc logging levels would be a good thing, rather, placing them in their own file (see https://wiki.strongswan.org/projects/strongswan/wiki/LoggerConfiguration) would make them more readable, while simultaneously makig the kernel log significantly easier to parse for non-IPSec items.
Comment 10 Tom Rymes 2015-12-19 03:37:13 UTC
Michael: I have been able to set up a test installation and can reproduce this issue. If you can send me a tarball with instructions for installation suitable for a five-year-old, I can install it and test to see if it fixes the issue.

Tom
Comment 11 Michael Tremer 2015-12-19 15:08:37 UTC
Created attachment 387 [details]
strongswan 5.3.5 for IPFire (i586)

This is an easy one. Download the attached file and copy it to the firewall system.

Then execute "tar xvfa stronsgwan-5.3.5-for-ipfire.tar.gz -C /" and restart strongswan by executing "ipsec restart".
Comment 12 Michael Tremer 2015-12-19 19:27:44 UTC
http://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=4d7f9a81ac575207edb6bb69f8bbea8762feab96

The changes will also be in the next nightly build.
Comment 13 Tom Rymes 2015-12-19 23:29:15 UTC
From what I can tell, this has solved the issue. I have had a tunnel up with these settings all day:

keylife=3m
rekeymargin=1m
rekeyfuzz=0%

It would normally exhibit the issue almost immediately, but it has been up all day now, with no seeming issues. a snippet from the command 'grep coll /var/log/messages':

Dec 19 16:42:02 laconia charon: 05[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
Dec 19 16:42:03 laconia charon: 11[IKE] CHILD_SA rekey collision lost, deleting rekeyed child 
Dec 19 16:42:04 laconia charon: 03[IKE] detected CHILD_REKEY collision with CHILD_DELETE 
Dec 19 16:44:15 laconia charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
Dec 19 16:44:15 laconia charon: 14[IKE] CHILD_SA rekey collision lost, deleting rekeyed child 
Dec 19 16:44:16 laconia charon: 02[IKE] detected CHILD_REKEY collision with CHILD_DELETE 
Dec 19 16:46:27 laconia charon: 02[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
Dec 19 16:46:27 laconia charon: 06[IKE] CHILD_SA rekey collision lost, deleting rekeyed child 
Dec 19 16:46:28 laconia charon: 05[IKE] detected CHILD_REKEY collision with CHILD_DELETE 
Dec 19 16:48:39 laconia charon: 10[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
Dec 19 16:48:39 laconia charon: 11[IKE] CHILD_SA rekey collision won, deleting old child 
Dec 19 16:50:51 laconia charon: 13[IKE] detected CHILD_REKEY collision with CHILD_DELETE 
Dec 19 16:53:02 laconia charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
Dec 19 16:53:02 laconia charon: 12[IKE] CHILD_SA rekey collision won, deleting old child 
Dec 19 16:55:14 laconia charon: 03[IKE] detected CHILD_REKEY collision with CHILD_DELETE 
Dec 19 16:57:25 laconia charon: 04[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
Dec 19 16:57:25 laconia charon: 12[IKE] CHILD_SA rekey collision won, deleting old child 
Dec 19 16:59:37 laconia charon: 05[IKE] detected CHILD_REKEY collision with CHILD_DELETE 
Dec 19 17:01:48 laconia charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
Dec 19 17:01:48 laconia charon: 04[IKE] CHILD_SA rekey collision won, deleting old child 
Dec 19 17:04:00 laconia charon: 11[IKE] detected CHILD_REKEY collision with CHILD_DELETE 
Dec 19 17:06:11 laconia charon: 15[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
Dec 19 17:06:11 laconia charon: 13[IKE] CHILD_SA rekey collision lost, deleting rekeyed child 
Dec 19 17:06:12 laconia charon: 02[IKE] detected CHILD_REKEY collision with CHILD_DELETE 

Similarly, the IPTables chains are still populated:

Chain IPSECBLOCK (2 references)
target     prot opt source               destination         
REJECT     all  --  anywhere             10.100.0.0/23        reject-with icmp-net-unreachable
Chain IPSECINPUT (1 references)
target     prot opt source               destination         
ACCEPT     esp  --  mydom.dom            other.dom           
ACCEPT     ah   --  mydom.dom            other.dom           
ACCEPT     ip   --  mydom.dom            other.dom           
ACCEPT     ip   --  mydom.dom            other.dom            policy match dir in pol ipsec reqid 3 proto esp
RETURN     all  --  10.100.0.0/23        10.44.0.0/24         policy match dir in pol ipsec reqid 3 proto esp
ACCEPT     udp  --  anywhere             anywhere             udp dpt:isakmp
ACCEPT     udp  --  anywhere             anywhere             udp dpt:ipsec-nat-t
ACCEPT     udp  --  anywhere             anywhere             udp dpt:isakmp
ACCEPT     udp  --  anywhere             anywhere             udp dpt:ipsec-nat-t
Chain IPSECOUTPUT (1 references)
target     prot opt source               destination         
MARK       all  --  10.44.0.0/24         10.100.0.0/23        policy match dir out pol ipsec reqid 3 proto esp MARK set 0x32
Chain IPSECFORWARD (1 references)
target     prot opt source               destination         
RETURN     all  --  10.100.0.0/23        10.44.0.0/24         policy match dir in pol ipsec reqid 3 proto esp
MARK       all  --  10.44.0.0/24         10.100.0.0/23        policy match dir out pol ipsec reqid 3 proto esp MARK set 0x32

Unfortunately, I was rushing this morning and forgot to update both sides of the tunnel, so the other side had the issue and traffic was not traversing the tunnel when I got back this afternoon, so I have updated the other side and I will confirm that traffic is moving still in a number of hours.

Tom
Comment 14 Michael Tremer 2015-12-19 23:45:57 UTC
(In reply to Tom Rymes from comment #13)
> keylife=3m
> rekeymargin=1m
> rekeyfuzz=0%

Are you using these settings just for reproducing the bug?
Comment 15 Tom Rymes 2015-12-20 00:01:57 UTC
Yes, only to force re-key collisions.
Comment 16 Tom Rymes 2015-12-20 01:20:57 UTC
I have had the tunnel up and running for just under two hours and all is still fine. It does seem like this fix has worked.
Comment 17 Tom Rymes 2015-12-20 15:02:44 UTC
Another 12+ hours and hundreds of collisions later it's still working. I will report back to StrongSwan.
Comment 18 Michael Tremer 2016-01-06 19:20:20 UTC
Alright. Since this is scheduled for release with C96, I will set this to ON_QA.