[strongSwan] Traffic Pauses to IKEv1 VPN with Juniper ISG 1000

Thomas Egerer hakke_007 at gmx.de
Mon Dec 14 08:52:29 CET 2015


On 12/14/2015 12:00 AM, Mahesh Neelakanta wrote:
> Thanks Thomas. I (sort of) understand the issue. A few followup questions:
> 
> 1) If the packets are delivered out of order, is there any specific reason
> that they aren't just dropped so that the higher-level TCP layer would just
> re-send?
Yes, but TCP usually runs congestion control algorithms (unless you're using
some kind of satellite-optimized version or so) and hence interprets packet
loss as a sign of congestion, which leads to a drastic performance penalty.
> 
> 2) With regard to taking a look at the packet captures: aside from
> knowing that they arrived out of order, is there anything in particular I
> should be looking for? I realize this might be too general a question, so
> no worries if an answer isn't possible.
You might want to check the packet sizes to see if there is a pattern. To my
knowledge, small packets usually have a lower priority.
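
A minimal sketch of such a check, assuming the capture is printed as text by
tcpdump in the format quoted further down and that each wrapped packet line
has been joined back into one. The function name and the sample lines are
made up for illustration (the reordering in the sample is hypothetical):

```python
import re

# Matches the ESP part of a tcpdump text line such as
# "... ESP(spi=0xc3fdc693,seq=0x5cd3), length 132".
ESP = re.compile(r"ESP\(spi=(0x[0-9a-f]+),seq=(0x[0-9a-f]+)\), length (\d+)")


def find_reordered(lines, spi):
    """Return (seq, length, highest_seq_seen) for every late packet of one SPI."""
    highest = -1
    late = []
    for line in lines:
        m = ESP.search(line)
        if m is None or int(m.group(1), 16) != spi:
            continue  # not an ESP line, or a different SA
        seq = int(m.group(2), 16)
        length = int(m.group(3))
        if seq < highest:
            # Arrived after a packet with a larger sequence number.
            late.append((seq, length, highest))
        else:
            highest = seq
    return late


# Hypothetical capture: 0x5cd4 arrives after 0x5cd5.
sample = [
    "03:38:52.363916 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd3), length 132",
    "03:38:52.548261 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd5), length 100",
    "03:38:52.564198 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd4), length 100",
]
late = find_reordered(sample, 0xc3fdc693)
```

Comparing the lengths of the late packets with those of the packets that
overtook them should show whether reordering correlates with packet size.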
> 
> *On a positive note*, right after noticing that the issue was related to
> the replay window (and before I received your explanation), I restarted
> strongSwan with charon.replay_window = 0 (apparently that disables it), and
> so far the traffic has not paused over the tunnel. I'll also try
> increasing the replay_window at the next opportunity.
Yes, setting the replay window size to 0 completely disables replay detection.
So yet *another* positive note: you don't have to recompile the kernel,
since we found the reason for the packet loss :)
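
To illustrate the mechanism, here is a simplified model of a sliding-window
replay check. It is a sketch, not the kernel's code: the class and attribute
names are invented, and the two counters only mimic the "replay-window" and
"replay" statistics that ip prints.

```python
class ReplayWindow:
    """Simplified model of a sliding-window anti-replay check."""

    def __init__(self, size):
        self.size = size      # window size in packets; 0 disables the check
        self.top = 0          # highest sequence number accepted so far
        self.bitmap = 0       # bit i set => sequence number (top - i) was seen
        self.too_old = 0      # packets dropped for falling below the window
        self.replayed = 0     # packets dropped as duplicates inside the window

    def accept(self, seq):
        if self.size == 0:
            return True       # replay detection disabled: accept everything
        if seq > self.top:
            # New highest sequence number: slide the window forward.
            shift = seq - self.top
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.top = seq
            return True
        diff = self.top - seq
        if diff >= self.size:
            self.too_old += 1     # too far behind the highest sequence number
            return False
        if self.bitmap & (1 << diff):
            self.replayed += 1    # already seen this sequence number
            return False
        self.bitmap |= 1 << diff  # late, but still inside the window
        return True
```

With a window of 32, a packet that arrives 40 sequence numbers behind the
highest one seen is dropped and counted, while one that is 10 behind is still
accepted. With a window of 0 both are accepted.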

> thanks again to both you & noel for answering on a weekend!

You're welcome!

> mahesh

Cheers,
Thomas
> 
> On Sun, Dec 13, 2015 at 5:18 PM, Thomas Egerer <hakke_007 at gmx.de> wrote:
> 
>> On 12/13/2015 10:18 PM, Mahesh Neelakanta wrote:
>>> Thomas, the VPN paused and I ran the ip command for the SPI; it looks like
>>> the replay-window counter keeps increasing. Any ideas what that means?
>> Yes, surely. The ip-xfrm framework uses a sliding window for replay
>> detection. This means that only packets whose sequence number lies within
>> a certain distance (in your case 32) below the largest sequence number
>> received so far are accepted. All packets below that limit are dropped
>> (increasing the replay-window counter). This means your ESP packets were
>> reordered and arrived in a different order than they were sent. Depending
>> on your underlying (encrypted) traffic this can heal.
>> To take countermeasures, you may want to increase your replay window:
>> a) use the global charon.replay_window from strongswan.conf [1]
>> b) use the connection specific ipsec.conf option replay_window
>>   (available since strongswan 5.2.0) [2].
>> If this does not help, you can investigate further: take the broken
>> tunnel and check whether any inbound packets are received or whether all
>> of them are dropped. To do so, run the iproute2 command
>> 'ip -s x s s spi <tunnel_spi>', then run
>> 'tcpdump -w <output-file> -i <interface>' for an arbitrary time (let's
>> say one minute), and run the same iproute2 command again *immediately*
>> after stopping tcpdump. You can then compare your received ESP packets
>> against the number of replay-window errors before and after the
>> capture. Also: all inbound packets that are not dropped show up -- as
>> Noel already pointed out -- in plain text again, so you can compare
>> your results from iproute2 to your pcap file.
>>
>> Cheers,
>> Thomas
>>
>> [1] https://wiki.strongswan.org/projects/strongswan/wiki/StrongswanConf
>> [2] https://wiki.strongswan.org/projects/strongswan/wiki/ConnSection
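
The before/after comparison described above can be sketched as follows,
assuming the two 'ip -s x s s spi <tunnel_spi>' outputs were saved as text.
The function names are hypothetical; the stats line format is the one shown
in this thread.

```python
import re

# The per-SA error counters printed under "stats:" by ip -s xfrm state.
# The configured window ("replay-window 32 seq ...") does not match this.
STATS = re.compile(r"replay-window (\d+) replay (\d+) failed (\d+)")


def sa_error_counters(ip_output):
    """Extract the three error counters from one saved ip output."""
    m = STATS.search(ip_output)
    if m is None:
        raise ValueError("no stats line found; was ip run with -s?")
    window, replay, failed = (int(v) for v in m.groups())
    return {"replay-window": window, "replay": replay, "failed": failed}


def counter_delta(before, after):
    """Return how much each counter grew between the two snapshots."""
    return {name: after[name] - before[name] for name in before}
```

A growing "replay-window" delta over the capture interval means that inbound
packets were dropped for arriving too far behind the highest sequence number.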
>>>
>>>         proto esp spi 0xc6ff382c(3338614828) reqid 2(0x00000002) mode tunnel
>>>         replay-window 32 seq 0x00000000 flag af-unspec (0x00100000)
>>>         auth-trunc hmac(sha1) 0xc609a31c3e5b7d6fa5267737c759fed017d2d6ea (160 bits) 96
>>>         enc cbc(aes) 0x4fba8977e230c1155780f03a19b90111 (128 bits)
>>>         lifetime config:
>>>           limit: soft (INF)(bytes), hard (INF)(bytes)
>>>           limit: soft (INF)(packets), hard (INF)(packets)
>>>           expire add: soft 3600(sec), hard 3600(sec)
>>>           expire use: soft 0(sec), hard 0(sec)
>>>         lifetime current:
>>>           676746342(bytes), 703241(packets)
>>>           add 2015-12-13 20:39:21 use 2015-12-13 20:39:21
>>>         stats:
>>> *          replay-window 533 replay 0 failed 0*
>>>
>>> On Sun, Dec 13, 2015 at 3:50 PM, Mahesh Neelakanta <neelakanta at gmail.com> wrote:
>>>
>>>> Thanks Thomas. I was able to run the "ip" command, but it does look like
>>>> (as you mentioned) CONFIG_XFRM_STATISTICS is disabled (this is the
>>>> Amazon Ubuntu 12.04 AMI). I'll try a newer release of Amazon's own Linux
>>>> to see if it has it enabled before trying a kernel recompile. Right now
>>>> the ip command shows no errors (but I've restarted the VPN), so I'll wait
>>>> for it to hang again.
>>>>
>>>> ip -s x s s spi 0xc6ff382c
>>>>
>>>>         proto esp spi 0xc6ff382c(3338614828) reqid 2(0x00000002) mode tunnel
>>>>         replay-window 32 seq 0x00000000 flag af-unspec (0x00100000)
>>>>         auth-trunc hmac(sha1) 0xc609a31c3e5b7d6fa5267737c759fed017d2d6ea (160 bits) 96
>>>>         enc cbc(aes) 0x4fba8977e230c1155780f03a19b90111 (128 bits)
>>>>         lifetime config:
>>>>           limit: soft (INF)(bytes), hard (INF)(bytes)
>>>>           limit: soft (INF)(packets), hard (INF)(packets)
>>>>           expire add: soft 3600(sec), hard 3600(sec)
>>>>           expire use: soft 0(sec), hard 0(sec)
>>>>         lifetime current:
>>>>           183887577(bytes), 191174(packets)
>>>>           add 2015-12-13 20:39:21 use 2015-12-13 20:39:21
>>>> *        stats:*
>>>> *          replay-window 0 replay 0 failed 0*
>>>>
>>>>
>>>> On Sun, Dec 13, 2015 at 3:23 PM, Thomas Egerer <hakke_007 at gmx.de> wrote:
>>>>
>>>>> Mahesh,
>>>>>
>>>>> run 'ip -s x s s spi <your_broken_inbound_spi>' (as root) on your
>>>>> Linux box and check whether the error statistics for that SPI increase:
>>>>> <snip>
>>>>>         stats:
>>>>>           replay-window 0 replay 0 failed 0
>>>>> <snap>
>>>>> Also: run 'grep -vw 0 /proc/net/xfrm_stat' and check for increasing
>>>>> counters. You will probably have to rebuild your Linux kernel for this,
>>>>> unless it has the CONFIG_XFRM_STATISTICS option enabled. If the file
>>>>> does exist you're lucky; if not -- like on current Debian systems --
>>>>> you will have to recompile.
>>>>> The rationale behind this is that your inbound traffic gets dropped
>>>>> during inbound transformation. Reasons for this may vary: failed
>>>>> integrity checks, replay problems, failed inbound policy check etc.
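
A rough equivalent of the grep above, as a sketch that assumes the file
holds one counter name and one value per line. The function name and the
sample values are hypothetical.

```python
def nonzero_counters(xfrm_stat_text):
    """Return {counter_name: value} for every counter that is not zero."""
    counters = {}
    for line in xfrm_stat_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip anything that is not "<name> <value>"
        name, value = parts[0], int(parts[1])
        if value != 0:
            counters[name] = value
    return counters


# Made-up contents, only to show the expected shape of the input.
sample = "XfrmInError             \t0\nXfrmInStateSeqError     \t533\n"
result = nonzero_counters(sample)
```

On a kernel with CONFIG_XFRM_STATISTICS, passing the contents of
/proc/net/xfrm_stat to this function twice, a minute apart, shows which
counters are still growing.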
>>>>>
>>>>> Cheers,
>>>>> Thomas
>>>>>
>>>>>
>>>>> On 12/13/2015 05:06 PM, Mahesh Neelakanta wrote:
>>>>>> Hi,
>>>>>> I have a strongSwan VPN server that is being used to terminate VPN
>>>>>> connections with multiple endpoints. Most of the existing endpoints are
>>>>>> Cisco, Sophos, etc. I now also have a Juniper ISG 1000 endpoint that is
>>>>>> causing some intermittent traffic problems.
>>>>>>
>>>>>> The exact issue is that traffic over the VPN pauses after some (random)
>>>>>> time. The tunnel itself is up, and at the next rekey traffic starts
>>>>>> flowing again. If I reduce the rekey time from 3600s down to 600s, the
>>>>>> problem is reduced significantly. I did verify with the remote side that
>>>>>> their keylife is 3600s. We do not have DPD enabled. There is constant
>>>>>> traffic, so there are no periods of inactivity.
>>>>>>
>>>>>> During the periods where traffic pauses, the ipsec statusall report
>>>>>> shows no more packets in bytes_i (whereas bytes_o is still increasing).
>>>>>>
>>>>>> Here is the config on our end (IPs and subnets have been changed for
>>>>>> security):
>>>>>>
>>>>>> config setup
>>>>>>    uniqueids = no
>>>>>>    charondebug = ike 2
>>>>>>
>>>>>> conn %default
>>>>>>    keyingtries=%forever
>>>>>>    dpdaction=none
>>>>>>
>>>>>> conn vpn-juniper-prd
>>>>>>         left=%defaultroute
>>>>>>         leftid=42.75.5.14 # Our actual local IP is 10.20.1.18, we are NATed going out
>>>>>>         leftsubnet=5.22.11.21/32
>>>>>>         right=168.42.68.5
>>>>>>         rightid=168.42.68.5
>>>>>>         rightsubnet=12.23.0.0/16
>>>>>>         keyexchange=ikev1
>>>>>>         ikelifetime=28800s
>>>>>>         ike=aes128-sha1-modp1024
>>>>>>         esp=aes128-sha1-modp1024
>>>>>>         keylife=3600m
>>>>>>         type=tunnel
>>>>>>         compress=no
>>>>>>         authby=secret
>>>>>>         auto=start
>>>>>>
>>>>>> Notice that the last "bytes_i" shows 145s ago (ipsec statusall output):
>>>>>>
>>>>>> vpn-juniper-prd:  %any...168.42.68.5  IKEv1
>>>>>> vpn-juniper-prd:   local:  [42.75.5.14] uses pre-shared key authentication
>>>>>> vpn-juniper-prd:   remote: [168.42.68.5] uses pre-shared key authentication
>>>>>> vpn-juniper-prd:   child:  5.22.11.21/32 === 12.23.0.0/16 TUNNEL
>>>>>> vpn-juniper-prd[1]: ESTABLISHED 110 minutes ago, 10.20.1.18[42.75.5.14]...168.42.68.5[168.42.68.5]
>>>>>> vpn-juniper-prd[1]: IKEv1 SPIs: a8ed9dd3b567a578_i* 97dbd6dbb3683aa4_r, pre-shared key reauthentication in 5 hours
>>>>>> vpn-juniper-prd[1]: IKE proposal: AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
>>>>>> vpn-juniper-prd{44}:  REKEYED, TUNNEL, reqid 4, expires in 10 minutes
>>>>>> vpn-juniper-prd{44}:   5.22.11.21/32 === 12.23.0.0/16
>>>>>> vpn-juniper-prd{52}:  INSTALLED, TUNNEL, reqid 4, ESP SPIs: c3fdc693_i 9d90fe7f_o
>>>>>> vpn-juniper-prd{52}:  AES_CBC_128/HMAC_SHA1_96, 24197112 bytes_i *(26366 pkts, 145s ago*), 8889197 bytes_o (31780 pkts, 0s ago), rekeying in 10 minutes
>>>>>> vpn-juniper-prd{52}:   5.22.11.21/32 === 12.23.0.0/16
>>>>>>
>>>>>> During that time, we still see packets going in/out via the eth0 interface:
>>>>>>
>>>>>> 03:38:52.349565 IP 10.20.1.18 > 168.42.68.5: ESP(spi=0x9d90fe7f,seq=0x7c0f), length 132
>>>>>> 03:38:52.363916 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd3), length 132
>>>>>> 03:38:52.548261 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd4), length 100
>>>>>> 03:38:52.564198 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd5), length 100
>>>>>> 03:38:53.357693 IP 10.20.1.18 > 168.42.68.5: ESP(spi=0x9d90fe7f,seq=0x7c10), length 132
>>>>>> 03:38:53.371666 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd6), length 132
>>>>>> 03:38:54.365616 IP 10.20.1.18 > 168.42.68.5: ESP(spi=0x9d90fe7f,seq=0x7c11), length 132
>>>>>> 03:38:54.379533 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd7), length 132
>>>>>> 03:38:55.250707 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd8), length 100
>>>>>> 03:38:55.373593 IP 10.20.1.18 > 168.42.68.5: ESP(spi=0x9d90fe7f,seq=0x7c12), length 132
>>>>>> 03:38:55.387695 IP 168.42.68.5 > 10.20.1.18: ESP(spi=0xc3fdc693,seq=0x5cd9), length 132
>>>>>>
>>>>>>
>>>>>> thanks,
>>>>>> mahesh
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at lists.strongswan.org
>>>>>> https://lists.strongswan.org/mailman/listinfo/users
>>>>>>