[strongSwan-dev] StrongSwan negotiated two pairs of IPsec SAs that lead to occasional connectivity issue

Sun Aug 2 23:33:03 CEST 2015

On Sat, Aug 1, 2015 at 2:14 PM, Ansis Atteka <aatteka at nicira.com> wrote:
> On Fri, Jul 31, 2015 at 7:35 PM, Ansis Atteka <aatteka at nicira.com> wrote:
>> We are seeing occasional connectivity issues caused by IPsec (either
>> by Linux Kernel IPsec stack or StrongSwan). At the time of seeing this
>> connectivity issue I captured output of:
>> 1) ip -s xfrm state
>> 2) ip -s xfrm policy
>> 3) ipsec statusall
>>
>> Raw output of those commands is in the attachment (host .148 and
>> .149). After looking into the 'ip xfrm' output I decided to create a
>> shell script that would manually restore Linux Kernel IPsec
>> configuration to the same state that strongSwan pushed to it. This way
>> I was able to reproduce this bug 100% of the time (see scripts
>> spoofer_on_148 and spoofer_on_149 that restore XFRM state in the Linux
>> kernel to what strongSwan pushed).
>>
>> Basically the symptoms of this bug are:
>> 1) It goes away on IKE_SA rekey
>> 2) For one IPsec SA bytes_o remains set to 0, while for the other
>> SA bytes_i remains set to 0 (as if both SAs were only partially
>> used):
>> 192.168.2.149{1}:  AES_CBC_128/HMAC_SHA1_96, *0 bytes_i*, 12871
>> bytes_o (111 pkts, 34s ago), rekeying in 33 minutes
>> 192.168.2.149{4}:  AES_CBC_128/HMAC_SHA1_96, 45023 bytes_i (724 pkts,
>> 0s ago), *0 bytes_o*, rekeying in 35 minutes
>>
>> Is this a known issue?
>
>
> After looking closer into the ip-xfrm policy and state dumps that are
> in the attachment, I have come to the conclusion that this is a strongSwan
> bug.
>
> It looks like strongSwan incorrectly set reqid in IPsec policy. Either
> strongSwan should have:
> 1) on 192.168.2.148 installed IPsec policy with reqid=4 instead of reqid=1; OR
> 2) on 192.168.2.149 installed IPsec policy with reqid=5 instead of reqid=4.
>
> Since, as I understand it, reqid does not appear at the IKEv2 wire
> protocol level, is this connectivity bug the result of a race
> condition where both sides tried to rekey at the same time?
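
The second symptom quoted above (one SA stuck at 0 bytes_i, the other
stuck at 0 bytes_o) can be spotted mechanically. Below is a minimal
Python sketch, assuming the CHILD_SA lines of 'ipsec statusall' look like
the two lines quoted above once the emphasis asterisks are removed and
the wrapped lines are rejoined. The names CHILD_SA_LINE and one_way_sas
are made up for illustration and are not part of strongSwan.

```python
import re

# Assumed line shape (taken from the quoted statusall lines):
#   <name>{<n>}:  <algorithms>, <N> bytes_i ..., <M> bytes_o ..., rekeying in ...
CHILD_SA_LINE = re.compile(
    r"^(?P<name>\S+\{\d+\}):.*?(?P<bytes_in>\d+) bytes_i.*?(?P<bytes_out>\d+) bytes_o"
)


def one_way_sas(statusall_text):
    """Map SA name to 'inbound-only' or 'outbound-only'.

    Only SAs with exactly one zero byte counter are reported; SAs that are
    idle in both directions or active in both directions are skipped.
    """
    flagged = {}
    for line in statusall_text.splitlines():
        match = CHILD_SA_LINE.match(line.strip())
        if not match:
            continue
        bytes_in = int(match.group("bytes_in"))
        bytes_out = int(match.group("bytes_out"))
        if (bytes_in == 0) != (bytes_out == 0):
            direction = "outbound-only" if bytes_in == 0 else "inbound-only"
            flagged[match.group("name")] = direction
    return flagged


# The two lines from the report, rejoined and without the asterisks.
SAMPLE_STATUSALL = """\
192.168.2.149{1}:  AES_CBC_128/HMAC_SHA1_96, 0 bytes_i, 12871 bytes_o (111 pkts, 34s ago), rekeying in 33 minutes
192.168.2.149{4}:  AES_CBC_128/HMAC_SHA1_96, 45023 bytes_i (724 pkts, 0s ago), 0 bytes_o, rekeying in 35 minutes
"""

if __name__ == "__main__":
    for name, direction in one_way_sas(SAMPLE_STATUSALL).items():
        print(name, direction)
```

A freshly established SA can legitimately show a zero counter for a short
while, so a hit here is only a hint to look at the reqids, not proof of
this bug.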

It appears that this bug was fixed sometime between strongSwan
5.0.4 and 5.1.2.

The fix seems to be something along the lines of strongSwan now
installing duplicate SAs with the *same reqid*:

aatteka at strongswan5_1_2 (works) $ sudo ip xfrm state | egrep "reqid|spi|10"
src 10.33.72.113 dst 10.33.75.235
 proto esp spi 0xc70af592 reqid 1 mode transport
 sel src 10.33.72.113/32 dst 10.33.75.235/32
src 10.33.75.235 dst 10.33.72.113
 proto esp spi 0xcaa1f843 reqid 1 mode transport
 sel src 10.33.75.235/32 dst 10.33.72.113/32
src 10.33.72.113 dst 10.33.75.235
 proto esp spi 0xc9199313 reqid 1 mode transport
 sel src 10.33.72.113/32 dst 10.33.75.235/32
src 10.33.75.235 dst 10.33.72.113
 proto esp spi 0xcc9b0254 reqid 1 mode transport
 sel src 10.33.75.235/32 dst 10.33.72.113/32

aatteka at strongswan5_0_4 (does not work):~# sudo ip xfrm state | egrep "reqid|spi|192"
src 192.168.2.148 dst 192.168.2.149
 proto esp spi 0xc95f3530 reqid 1 mode transport
 sel src 192.168.2.148/32 dst 192.168.2.149/32
src 192.168.2.149 dst 192.168.2.148
 proto esp spi 0xc7659697 reqid 1 mode transport
 sel src 192.168.2.149/32 dst 192.168.2.148/32
src 192.168.2.148 dst 192.168.2.149
 proto esp spi 0xcb11dc75 reqid 3 mode transport
 sel src 192.168.2.148/32 dst 192.168.2.149/32
src 192.168.2.149 dst 192.168.2.148
 proto esp spi 0xc207dc11 reqid 3 mode transport
 sel src 192.168.2.149/32 dst 192.168.2.148/32
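
The difference between the two dumps above can be checked with a short
script. This is a minimal Python sketch, assuming the filtered
'ip xfrm state' output has the shape shown above (a "src ... dst ..."
header line followed by a line carrying "reqid N"). The function names
are hypothetical. Several reqids between the same two addresses are not
wrong in general (distinct CHILD_SAs may have them); the sketch only
flags the pattern seen in the 5.0.4 dump.

```python
import re
from collections import defaultdict

# SA header line, e.g. "src 192.168.2.148 dst 192.168.2.149".
# The indented "sel src ..." lines do not match because they start with "sel".
SA_HEADER = re.compile(r"^src (\S+) dst (\S+)")
# Detail line, e.g. " proto esp spi 0xc95f3530 reqid 1 mode transport".
SA_REQID = re.compile(r"\breqid (\d+)")


def reqids_per_direction(xfrm_state_text):
    """Map each (src, dst) address pair to the set of reqids on its SAs."""
    reqids = defaultdict(set)
    current = None
    for line in xfrm_state_text.splitlines():
        header = SA_HEADER.match(line)
        if header:
            current = (header.group(1), header.group(2))
            continue
        found = SA_REQID.search(line)
        if found and current is not None:
            reqids[current].add(int(found.group(1)))
    return dict(reqids)


def mixed_reqid_directions(xfrm_state_text):
    """Return only the (src, dst) pairs whose SAs carry more than one reqid."""
    return {
        pair: ids
        for pair, ids in reqids_per_direction(xfrm_state_text).items()
        if len(ids) > 1
    }


# The strongSwan 5.0.4 dump from this message.
DUMP_5_0_4 = """\
src 192.168.2.148 dst 192.168.2.149
 proto esp spi 0xc95f3530 reqid 1 mode transport
 sel src 192.168.2.148/32 dst 192.168.2.149/32
src 192.168.2.149 dst 192.168.2.148
 proto esp spi 0xc7659697 reqid 1 mode transport
 sel src 192.168.2.149/32 dst 192.168.2.148/32
src 192.168.2.148 dst 192.168.2.149
 proto esp spi 0xcb11dc75 reqid 3 mode transport
 sel src 192.168.2.148/32 dst 192.168.2.149/32
src 192.168.2.149 dst 192.168.2.148
 proto esp spi 0xc207dc11 reqid 3 mode transport
 sel src 192.168.2.149/32 dst 192.168.2.148/32
"""

if __name__ == "__main__":
    for (src, dst), ids in mixed_reqid_directions(DUMP_5_0_4).items():
        print(f"{src} -> {dst}: reqids {sorted(ids)}")
```

On a live host the same functions could be fed the text captured from
'ip xfrm state' instead of the embedded sample.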

I glanced over the git log, but it wasn't obvious to me exactly which
commit might have fixed this bug.

Could someone familiar with this bug point me to the git commit so
that we can evaluate whether we need to upgrade to 5.1.2 or can simply
backport the bug fix to 5.0.4?

Thanks a lot,
Ansis