[strongSwan] Problem with rekey collisions

andreas.braun at bagus.de andreas.braun at bagus.de
Tue Jan 27 08:16:57 CET 2015


Hello Martin,

thank you very much for your message. 

>> Now my problem is that tunnels between devices break every few days. The log
>> states that strongSwan detects a CHILD_REKEY collision, then it resolves the
>> collision, but no traffic is going through.

>> I noticed that after a rekey collision, some iptables rules were gone. I
>> still have to verify this in the production system.

> Most likely we falsely invoke the updown script with a "down" event for
> that rekey collision, as that CHILD_SA gets deleted (but actually is
> still up with the "winning" instance). Probably we can fix that, but I
> don't know how trivial it is.

I have checked that and indeed the function "down-client:iptables" in the _updown script gets called after a collision. But not every time. It looks like only when it says "CHILD_SA rekey collision lost" or "CHILD_SA rekey collision won". Then some firewall rules gets deleted. Even with the more recent strongSwan 5.2.1 the problem still remains.

Here are some log entries when it happens:

09:50:49 charon:  15[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
09:50:49 charon:  15[ENC] generating CREATE_CHILD_SA response 19 [ N(IPCOMP_SUP) SA No KE TSi TSr ] 
09:50:49 charon:  15[NET] sending packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (476 bytes) 
09:50:49 charon:  11[NET] received packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (476 bytes) 
09:50:49 charon:  11[ENC] parsed CREATE_CHILD_SA response 5 [ N(IPCOMP_SUP) SA No KE TSi TSr ] 
09:50:49 charon:  11[IKE] received IPCOMP_SUPPORTED notify 
09:50:49 charon:  11[CHD]   using AES_CBC for encryption 
09:50:49 charon:  11[CHD]   using HMAC_SHA1_96 for integrity 
09:50:49 charon:  11[CHD] adding inbound ESP SA 
09:50:49 charon:  11[CHD]   SPI 0xce1d4d5e, src 10.1.100.178 dst 10.1.100.254 
09:50:49 charon:  11[KNL] adding SAD entry with SPI 0000cb33 and reqid {1}  (mark 0/0x00000000) 
09:50:49 charon:  11[KNL]   using compression algorithm IPCOMP_DEFLATE 
09:50:49 charon:  11[KNL] adding SAD entry with SPI ce1d4d5e and reqid {1}  (mark 0/0x00000000) 
09:50:49 charon:  11[KNL]   using encryption algorithm AES_CBC with key size 128 
09:50:49 charon:  11[KNL]   using integrity algorithm HMAC_SHA1_96 with key size 160 
09:50:49 charon:  11[KNL]   using replay window of 32 packets 
09:50:49 charon:  11[CHD] adding outbound ESP SA 
09:50:49 charon:  11[CHD]   SPI 0xca7c0116, src 10.1.100.254 dst 10.1.100.178 
09:50:49 charon:  11[KNL] adding SAD entry with SPI 000015f8 and reqid {1}  (mark 0/0x00000000) 
09:50:49 charon:  11[KNL]   using compression algorithm IPCOMP_DEFLATE 
09:50:49 charon:  11[KNL] adding SAD entry with SPI ca7c0116 and reqid {1}  (mark 0/0x00000000) 
09:50:49 charon:  11[KNL]   using encryption algorithm AES_CBC with key size 128 
09:50:49 charon:  11[KNL]   using integrity algorithm HMAC_SHA1_96 with key size 160 
09:50:49 charon:  11[KNL]   using replay window of 32 packets 
09:50:49 charon:  11[KNL] policy 10.2.34.0/24 === 10.1.34.0/24 out  (mark 0/0x00000000) already exists, increasing refcount 
09:50:49 charon:  11[KNL] policy 10.1.34.0/24 === 10.2.34.0/24 in  (mark 0/0x00000000) already exists, increasing refcount 
09:50:49 charon:  11[KNL] policy 10.1.34.0/24 === 10.2.34.0/24 fwd  (mark 0/0x00000000) already exists, increasing refcount 
09:50:49 charon:  11[KNL] policy 10.2.34.0/24 === 10.1.34.0/24 out  (mark 0/0x00000000) already exists, increasing refcount 
09:50:49 charon:  11[KNL] updating policy 10.2.34.0/24 === 10.1.34.0/24 out  (mark 0/0x00000000) 
09:50:49 charon:  11[KNL] policy 10.1.34.0/24 === 10.2.34.0/24 in  (mark 0/0x00000000) already exists, increasing refcount 
09:50:49 charon:  11[KNL] updating policy 10.1.34.0/24 === 10.2.34.0/24 in  (mark 0/0x00000000) 
09:50:49 charon:  11[KNL] policy 10.1.34.0/24 === 10.2.34.0/24 fwd  (mark 0/0x00000000) already exists, increasing refcount 
09:50:49 charon:  11[KNL] updating policy 10.1.34.0/24 === 10.2.34.0/24 fwd  (mark 0/0x00000000) 
09:50:49 charon:  11[KNL] getting a local address in traffic selector 10.2.34.0/24 
09:50:49 charon:  11[KNL] using host 10.2.34.1 
09:50:49 charon:  11[KNL] using 10.1.100.178 as nexthop to reach 10.1.100.178/32 
09:50:49 charon:  11[KNL] 10.1.100.254 is on interface red0 
09:50:49 charon:  11[IKE] CHILD_SA VpnFrankfurtEssenTest{1} established with SPIs ce1d4d5e_i ca7c0116_o and TS 10.2.34.0/24 === 10.1.34.0/24  
09:50:49 charon:  11[IKE] CHILD_SA VpnFrankfurtEssenTest{1} established with SPIs ce1d4d5e_i ca7c0116_o and TS 10.2.34.0/24 === 10.1.34.0/24  
09:50:49 charon:  11[IKE] CHILD_SA rekey collision won, deleting old child 
09:50:49 charon:  11[IKE] reinitiating already active tasks 
09:50:49 charon:  11[IKE]   CHILD_REKEY task 
09:50:49 charon:  11[KNL] querying SAD entry with SPI c78986b4  (mark 0/0x00000000) 
09:50:49 charon:  11[KNL] querying SAD entry with SPI c5afd499  (mark 0/0x00000000) 
09:50:49 charon:  11[IKE] closing CHILD_SA VpnFrankfurtEssenTest{1} with SPIs c78986b4_i (0 bytes) c5afd499_o (0 bytes) and TS 10.2.34.0/24 === 10.1.34.0/24  
09:50:49 charon:  11[IKE] closing CHILD_SA VpnFrankfurtEssenTest{1} with SPIs c78986b4_i (0 bytes) c5afd499_o (0 bytes) and TS 10.2.34.0/24 === 10.1.34.0/24  
09:50:49 charon:  11[IKE] sending DELETE for ESP CHILD_SA with SPI c78986b4 
09:50:49 charon:  11[ENC] generating INFORMATIONAL request 6 [ D ] 
09:50:49 charon:  11[NET] sending packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (76 bytes) 
09:50:49 charon:  13[NET] received packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (76 bytes) 
09:50:49 charon:  13[ENC] parsed INFORMATIONAL request 20 [ D ] 
09:50:49 charon:  13[IKE] received DELETE for ESP CHILD_SA with SPI c4a8cd3a 
09:50:49 charon:  13[KNL] querying SAD entry with SPI c950c19c  (mark 0/0x00000000) 
09:50:49 charon:  13[KNL] querying SAD entry with SPI c4a8cd3a  (mark 0/0x00000000) 
09:50:49 charon:  13[IKE] closing CHILD_SA VpnFrankfurtEssenTest{1} with SPIs c950c19c_i (0 bytes) c4a8cd3a_o (0 bytes) and TS 10.2.34.0/24 === 10.1.34.0/24  
09:50:49 charon:  13[IKE] closing CHILD_SA VpnFrankfurtEssenTest{1} with SPIs c950c19c_i (0 bytes) c4a8cd3a_o (0 bytes) and TS 10.2.34.0/24 === 10.1.34.0/24  
09:50:49 charon:  13[IKE] sending DELETE for ESP CHILD_SA with SPI c950c19c 
09:50:49 charon:  13[IKE] CHILD_SA closed 
09:50:49 charon:  13[CHD] running updown script: 2>&1 PLUTO_VERSION='1.1' PLUTO_VERB='down-client' PLUTO_CONNECTION='VpnFrankfurtEssenTest' PLUTO_INTERFACE='red0' PLUTO_REQID='1' PLUTO_PROTO='esp' PLUTO_UNIQUEID='1' PLUTO_ME='10.1.100.254' PLUTO_MY_ID='10.1.100.254' PLUTO_MY_CLIENT='10.2.34.0/24' PLUTO_MY_PORT='0' PLUTO_MY_PROTOCOL='0' PLUTO_PEER='10.1.100.178' PLUTO_PEER_ID='10.1.100.178' PLUTO_PEER_CLIENT='10.1.34.0/24' PLUTO_PEER_PORT='0' PLUTO_PEER_PROTOCOL='0' PLUTO_IPCOMP='1' /usr/libexec/ipsec/_updown 


Other host:
09:50:47 charon:  10[IKE] detected CHILD_REKEY collision with CHILD_REKEY 
09:50:47 charon:  10[ENC] generating CREATE_CHILD_SA response 5 [ N(IPCOMP_SUP) SA No KE TSi TSr ] 
09:50:47 charon:  10[NET] sending packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (476 bytes) 
09:50:47 charon:  13[NET] received packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (476 bytes) 
09:50:47 charon:  13[ENC] parsed CREATE_CHILD_SA response 19 [ N(IPCOMP_SUP) SA No KE TSi TSr ] 
09:50:48 charon:  13[IKE] CHILD_SA VpnEssenFrankfurtTest{5} established with SPIs c4a8cd3a_i c950c19c_o and TS 10.1.34.0/24 === 10.2.34.0/24  
09:50:48 charon:  13[IKE] CHILD_SA VpnEssenFrankfurtTest{5} established with SPIs c4a8cd3a_i c950c19c_o and TS 10.1.34.0/24 === 10.2.34.0/24  
09:50:48 charon:  13[IKE] CHILD_SA rekey collision lost, deleting rekeyed child 
09:50:48 charon:  13[IKE] closing CHILD_SA VpnEssenFrankfurtTest{5} with SPIs c4a8cd3a_i (0 bytes) c950c19c_o (0 bytes) and TS 10.1.34.0/24 === 10.2.34.0/24  
09:50:48 charon:  13[IKE] closing CHILD_SA VpnEssenFrankfurtTest{5} with SPIs c4a8cd3a_i (0 bytes) c950c19c_o (0 bytes) and TS 10.1.34.0/24 === 10.2.34.0/24  
09:50:48 charon:  13[IKE] sending DELETE for ESP CHILD_SA with SPI c4a8cd3a 
09:50:48 charon:  13[ENC] generating INFORMATIONAL request 20 [ D ] 
09:50:48 charon:  13[NET] sending packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (76 bytes) 
09:50:48 charon:  03[NET] received packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (76 bytes) 
09:50:48 charon:  03[ENC] parsed INFORMATIONAL request 6 [ D ] 
09:50:48 charon:  03[IKE] received DELETE for ESP CHILD_SA with SPI c78986b4 
09:50:48 charon:  03[IKE] closing CHILD_SA VpnEssenFrankfurtTest{5} with SPIs c5afd499_i (0 bytes) c78986b4_o (0 bytes) and TS 10.1.34.0/24 === 10.2.34.0/24  
09:50:48 charon:  03[IKE] closing CHILD_SA VpnEssenFrankfurtTest{5} with SPIs c5afd499_i (0 bytes) c78986b4_o (0 bytes) and TS 10.1.34.0/24 === 10.2.34.0/24  
09:50:48 charon:  03[IKE] sending DELETE for ESP CHILD_SA with SPI c5afd499 
09:50:48 charon:  03[IKE] CHILD_SA closed 
09:50:48 charon:  03[IKE] detected CHILD_REKEY collision with CHILD_DELETE 
09:50:48 charon:  03[ENC] generating INFORMATIONAL response 6 [ D ] 
09:50:48 charon:  03[NET] sending packet: from 10.1.100.178[4500] to 10.1.100.254[4500] (76 bytes) 
09:50:48 charon:  05[NET] received packet: from 10.1.100.254[4500] to 10.1.100.178[4500] (76 bytes) 
09:50:48 charon:  05[ENC] parsed INFORMATIONAL response 20 [ D ] 
09:50:48 charon:  05[IKE] received DELETE for ESP CHILD_SA with SPI c950c19c 
09:50:48 charon:  05[IKE] CHILD_SA closed 


Although I am not a programmer, I dug around in the source code and might have found a relevant place. Maybe it's not working right here.

File: child_delete.c
static status_t destroy_and_reestablish(private_child_delete_t *this)
[...]
		/* signal child down event if we are not rekeying */
		if (!this->rekeyed)
		{
			charon->bus->child_updown(charon->bus, child_sa, FALSE);
		}
[...]



>> RETURN     all  --  10.1.34.0/24         10.2.34.0/24         policy match dir in pol ipsec reqid 1 proto 50
>> MARK       all  --  10.2.34.0/24         10.1.34.0/24         policy match dir out pol ipsec reqid 1 proto 50 MARK set 0x32

> Seems that there are some complex firewall rules involved, not sure if
> that is provided by your IPFire distribution or if these are your custom
> rules.

> As a work-around, you could consider using static policy matching IPsec
> rules (not for specific subnets) instead of those installed with updown.
> But I really don't know if that works for your specific rules.

I didn't change the firewall rules in the _updown script, so I think they were added by the IPFire developers.

I'd rather not fuss with the script, because it could get changed in the future and I don't know why certain things were added.


>> 1.) Is there any way to check wether rekey time randomization really works
>> as it should?

> Given that they are triggered randomly, I'd guess that works as
> expected/configured.

OK.


>> 2.) Could you tell me why I am getting CHILD_REKEY collisions every few
>> days? Is that to be expected?

> Depends on your rekey configuration, latencies etc. Not unlikely that
> these can happen, especially if you rekey every 40 minutes. You might
> consider adjusting your rekey interval, margin and fuzz to reduce
> collision probability.

I am using the default values. Network latencies are at about 20 ms.

lifetime=1h
margintime=9m (is not set)
rekeyfuzz=100% (is not set)

The Wiki states that rekey times are as following:
rekeytime_min = 1h - (9m + 9m) = 42m
rekeytime_max = 1h - (9m + 0m) = 51m


> Also you can use a shorter rekey time on one end, so rekeying is
> guaranteed to get initiated by the same end only. This should make it
> impossible that a collision happens if the options are chosen
> carefully. 

That's a great idea! I think I will go with the shortened rekey time on one end as long as the problem remains.


It would be great if you could take a look at the updown script problem. If you need any more information, I'll be glad to provide. Again, thank you for your help.


Kind regards

Andreas Braun



-----Ursprüngliche Nachricht-----
Von: Martin Willi [mailto:martin at strongswan.org]
Gesendet: Dienstag, 20. Januar 2015 15:24
An: andreas.braun at bagus.de
Cc: users at lists.strongswan.org
Betreff: Re: [strongSwan] Problem with rekey collisions


Hi Andreas,

> Now my problem is that tunnels between devices break every few days. The log
> states that strongSwan detects a CHILD_REKEY collision, then it resolves the
> collision, but no traffic is going through.

> I noticed that after a rekey collision, some iptables rules were gone. I
> still have to verify this in the production system.

Most likely we falsely invoke the updown script with a "down" event for
that rekey collision, as that CHILD_SA gets deleted (but actually is
still up with the "winning" instance). Probably we can fix that, but I
don't know how trivial it is.

> RETURN     all  --  10.1.34.0/24         10.2.34.0/24         policy match dir in pol ipsec reqid 1 proto 50
> MARK       all  --  10.2.34.0/24         10.1.34.0/24         policy match dir out pol ipsec reqid 1 proto 50 MARK set 0x32

Seems that there are some complex firewall rules involved, not sure if
that is provided by your IPFire distribution or if these are your custom
rules.

As a work-around, you could consider using static policy matching IPsec
rules (not for specific subnets) instead of those installed with updown.
But I really don't know if that works for your specific rules.

> 1.) Is there any way to check wether rekey time randomization really works
> as it should?

Given that they are triggered randomly, I'd guess that works as
expected/configured.

> 2.) Could you tell me why I am getting CHILD_REKEY collisions every few
> days? Is that to be expected?

Depends on your rekey configuration, latencies etc. Not unlikely that
these can happen, especially if you rekey every 40 minutes. You might
consider adjusting your rekey interval, margin and fuzz to reduce
collision probability.

Also you can use a shorter rekey time on one end, so rekeying is
guaranteed to get initiated by the same end only. This should make it
impossible that a collision happens if the options are chosen
carefully. 

> 3.) Any idea why the iptables rules get changed after rekey collision?

As said, most likely because we invoke that updown hook once too often
in that specific collision case.

Regards
Martin



More information about the Users mailing list