[strongSwan] Timeout on poor connection

Anvar Kuchkartaev anvar at anvartay.com
Tue Oct 10 09:43:40 CEST 2017


You are welcome. The strongSwan wiki documents why ikesa_table_size should not be set too high (they note that the appropriate hash table size depends on the number of cores in the machine):

https://wiki.strongswan.org/projects/strongswan/wiki/IkeSaTable

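For illustration, the relevant knobs live in the charon section of strongswan.conf; the values below are purely an example, and the article should be consulted for sizing them against your core count and expected number of IKE_SAs:

charon {
    # each segment has its own lock, so more segments means less contention
    ikesa_table_segments = 16
    # number of hash table buckets the IKE_SAs are distributed over
    ikesa_table_size = 256
}
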
Anvar Kuchkartaev 
anvar at anvartay.com 
  Original Message  
From: Stephen Scheck
Sent: Monday, October 9, 2017 06:39 p.m.
To: Noel Kuntze
Cc: Anvar Kuchkartaev; users at lists.strongswan.org
Subject: Re: [strongSwan] Timeout on poor connection


Based on a suggestion from Anvar (thanks!), I set the ikesa_table_size lower, to 512 or 256. After doing this, I’ve been unable to reproduce the issue with stuck connections. So, that seems to have solved it.

However, it would be useful to understand why 1024 seems to be a “tipping point” for strongSwan at which it becomes unstable.


> On Oct 5, 2017, at 12:28 PM, Stephen Scheck <sscheck.ssni at gmail.com> wrote:
> 
> The point about adjusting TCP MSS and MTU with iptables rules is well-taken for production deployments. But my scale test is running in a controlled environment, all Ethernet with no intervening provider networks or public internet, and stable, jumbo MTUs. So I’m certain it cannot be the cause of the issue I’m seeing.
> 
> I have this on both server (responder) and client (initiator) in charon.conf:
> 
> ikesa_table_segments = 16
> ikesa_table_size = 1024
> 
> Since the client is simulating connections from what would be many thousands of individual clients in a real situation, I did not think the following setting was relevant and did not apply it per the comments in the IKE SA table article.
> 
> reuse_ikesa = no
> 
> The test is being run between Fedora 25 boxes with a very recent kernel: 4.11.12-200.fc25.x86_64
> 
> Aside from DPD settings and keyingtries=%forever, are there any other settings which would help initially failed connections to keep retrying until they successfully establish? Or other settings which would need tuning for large-scale deployments? I’m no IKE/IPsec expert.
> 
> Thanks.
> 
>> On Oct 4, 2017, at 5:55 PM, Noel Kuntze <noel.kuntze+strongswan-users-ml at thermi.consulting> wrote:
>> 
>> You do not need to explicitly accept frag-needed. It is included in ctstate RELATED.
>> 
>> dpddelay sets the interval between dpd packets, not when dpdaction is taken.
>> dpdtimeout controls when the action is taken.
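>> 
>> For illustration, a minimal conn sketch (the values are examples, not recommendations; note that dpdtimeout is only honoured for IKEv1, while IKEv2 applies its general retransmission timeout instead):
>> 
>> conn example
>>     dpddelay=30s      # interval between DPD exchanges during inactivity
>>     dpdtimeout=150s   # IKEv1 only: take dpdaction after this long with no reply
>>     dpdaction=restart # action taken once the peer is considered dead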
>> 
>> The firewall rules you mentioned are needed anyway and do not deserve the term optimization. Not using them commonly breaks scenarios,
>> and they are vital to having working tunnels.
>> 
>> strongSwan is specifically optimized for multi-core CPUs. You probably have problems because the CPU scheduler moves the threads around a lot.
>> You can try working around that by tuning the scheduler, by upgrading your kernel in the hope that it fixes the behaviour, or by changing the code to pin the threads to certain CPUs.
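>> 
>> As a rough sketch of that pinning workaround, done from the shell rather than in the code (this assumes taskset is available and the daemon process is named charon):
>> 
>> # pin all existing charon threads (-a = all tasks) to CPU 0
>> taskset -a -c -p 0 "$(pidof charon)"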
>> 
>> I hope you optimized the strongSwan settings to make efficient use of parallelism by using hashtables[1].
>> 
>> [1] https://wiki.strongswan.org/projects/strongswan/wiki/IkeSaTable
>> 
>> On 04.10.2017 08:55, Anvar Kuchkartaev wrote:
>>> The TCPMSS parameters in the firewall are required for proper handling of TCP connections from clients within the IPsec tunnel, but:
>>> iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT 
>>> 
>>> this rule can help UDP connections when the MTU changes. The same thing happened to me when connections from a client's ISP were throttled and dropped silently. Use:
>>> 
>>> dpddelay=300s
>>> dpdaction=clear
>>> 
>>> on the server side (this checks for dead peers and removes them every 300 seconds; in your case, if a client disappears, it can reconnect after at most 300s. You might decrease the 300s to find the optimal interval)
>>> 
>>> And use:
>>> 
>>> dpddelay=5s
>>> dpdaction=restart
>>> 
>>> on the client side (if the connection drops, the client notices within about 5s and restarts the connection automatically).
>>> In this setup the server clears peers that have completely disconnected after at most 300s, and the client restarts the connection within about 5s if a temporary failure occurs due to packet loss.
>>> 
>>> Also, adding mobike=yes to the connections in ipsec.conf and setting reuse_ikesa to yes in strongswan.d/charon.conf will help the connection remain active even if the IP changes or there are temporary disruptions (e.g. if the client uses a mobile 3G connection with high latency and low bandwidth).
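>>> 
>>> Concretely, that is one line in each file (a sketch; the conn name is made up):
>>> 
>>> # ipsec.conf
>>> conn mobile-client
>>>     mobike=yes
>>> 
>>> # strongswan.d/charon.conf
>>> charon {
>>>     reuse_ikesa = yes
>>> }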
>>> 
>>> Anvar Kuchkartaev 
>>> anvar at anvartay.com 
>>> Original Message 
>>> From: Stephen Scheck
Sent: Tuesday, October 3, 2017 09:18 p.m.
>>> To: Anvar Kuchkartaev
>>> Cc: Jamie Stuart; users at lists.strongswan.org
>>> Subject: Re: [strongSwan] Timeout on poor connection
>>> 
>>> 
>>> Thanks for the configs.
>>> 
>>> I added the dpd* parameters to my configurations. My situation is a little different in that my traffic is primarily UDP, so the TCP MSS settings are not needed. I also need to use IKEv1. Furthermore, I’m running a scale test in which there’s low latency and plenty of bandwidth, which may nonetheless be saturated by the number of simultaneous connections which are being attempted.
>>> 
>>> Unfortunately, the dpd* parameters did not help. I still notice a small number (25-50) of connections out of several thousand which fail to establish, and stay that way until the strongSwan daemons are restarted.
>>> 
>>> Does anybody know of any further parameters which may influence connection attempts and retries?
>>> 
>>> One thing that I’ve noted is that if I run both the client and server StrongSwan processes on single core machines, or with the StrongSwan threads pinned to a single CPU, the success rate is *decidedly better* than with multiple cores available (although, occasionally, even then a couple of them fail to establish and stay “stuck”).
>>> 
>>> I’m beginning to think there may be some troublesome concurrency bugs in the StrongSwan IKEv1 routines.
>>> 
>>> Any help appreciated!
>>> 
>>> 
>>> 
>>>> On Sep 30, 2017, at 7:14 PM, Anvar Kuchkartaev <anvar at anvartay.com> wrote:
>>>> 
>>>> ipsec.conf
>>>> 
>>>> keyexchange=ikev2
>>>> type=tunnel
>>>> dpdaction=clear
>>>> dpddelay=300s
>>>> rekey=yes
>>>> left=%any
>>>> right=%any
>>>> fragmentation=yes
>>>> compress=yes
>>>> 
>>>> These are the parameters from the server side, and:
>>>> 
>>>> dpdtimeout=20s
>>>> dpddelay=5s
>>>> dpdaction=restart
>>>> 
>>>> these from the client side are, I think, the most important.
>>>> 
>>>> Also you have to do several server optimizations like:
>>>> 
>>>> 
>>>> firewall:
>>>> 
>>>> # allow ESP payload and IKE/NAT-T control traffic
>>>> iptables -A INPUT -p esp -j ACCEPT
>>>> iptables -A INPUT -p udp -m multiport --dports 500,4500 -j ACCEPT
>>>> 
>>>> # let path MTU discovery work (fragmentation-needed must reach us)
>>>> iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT
>>>> 
>>>> # clamp TCP MSS to the path MTU on forwarded connections
>>>> iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
>>>> 
>>>> sysctl.conf
>>>> 
>>>> net.ipv4.ip_forward_use_pmtu=1 (I assume you have done the rest of the sysctl configuration, like ip_forward, etc.)
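>>>> 
>>>> For completeness, a sketch of the usual companion sysctl settings alluded to above (typical for an IPsec gateway; adjust to your setup):
>>>> 
>>>> net.ipv4.ip_forward = 1
>>>> net.ipv4.ip_forward_use_pmtu = 1
>>>> # commonly recommended so the gateway does not send or honour redirects
>>>> net.ipv4.conf.all.send_redirects = 0
>>>> net.ipv4.conf.all.accept_redirects = 0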
>>>> 
>>>> 
>>>> 
>>>> On 30/09/17 19:37, Jamie Stuart wrote:
>>>>> Could you post your (redacted) strongswan config Anvar?
>>>>> 
>>>>>> On 30 Sep 2017, at 00:59, Anvar Kuchkartaev <anvar at anvartay.com> wrote:
>>>>>> 
>>>>>> I also have some clients connecting from central Asia, where the internet is very poor and restricted. The main optimizations must be done in the server OS and firewall, not in strongSwan. In strongSwan, try to authenticate the server with a 2048-bit certificate or higher, and watch the IKE ciphers and the dos_protection, ikesa_table_size, ikesa_table_segments, and ikesa_hashtable_size parameters. Allow only IKEv2 if possible, decrease DPD requests, and set dpdaction=restart to restart the connection automatically if the tunnel fails. In the operating system, watch out for MTU changes; in my case there were many MTU decreases within the provider network in the client's region. Allow ICMP fragmentation-needed requests through the firewall and apply TCP MSS optimizations. It is also recommended to install a proxy server behind the VPN server that is only reachable within the VPN tunnel (so the client can point its browser at the proxy server to enhance connection stability).
>>>>>> 
>>>>>> Anvar Kuchkartaev
>>>>>> anvar at anvartay.com
>>>>>> Original Message
>>>>>> From: Jamie Stuart
Sent: Friday, September 29, 2017 05:59 p.m.
>>>>>> To: users at lists.strongswan.org
>>>>>> Subject: [strongSwan] Timeout on poor connection
>>>>>> 
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> We have a client (running on LEDE) connecting to a server (Ubuntu). The client is connecting from rural Africa over 2G/3G with high latency and low speed.
>>>>>> Often, the connection does not come up, timing out after 5 retransmits, as in the log below:
>>>>>> 
>>>>>> 
>>>>>> ipsec up {connection}
>>>>>> initiating IKE_SA {connection}[2] to {serverip}
>>>>>> generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_S_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP)]
>>>>>> sending packet: from {clientip}[500] to {serverip}[500] (378 bytes)
>>>>>> retransmit 1 of request with message ID 0
>>>>>> sending packet: from {clientip}[500] to {serverip}[500] (378 bytes)
>>>>>> retransmit 2 of request with message ID 0
>>>>>> sending packet: from {clientip}[500] to {serverip}[500] (378 bytes)
>>>>>> retransmit 3 of request with message ID 0
>>>>>> sending packet: from {clientip}[500] to {serverip}[500] (378 bytes)
>>>>>> 
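>>>>>> If I read the docs right, the give-up time follows charon's retransmission backoff (delay = retransmit_timeout * retransmit_base^tries, i.e. 4.0s * 1.8^n by default, roughly 165 seconds in total over 5 tries). Presumably these could be relaxed in strongswan.conf for a high-latency link, e.g. (values illustrative):
>>>>>> 
>>>>>> charon {
>>>>>>     retransmit_tries = 8      # default 5
>>>>>>     retransmit_timeout = 6.0  # initial timeout in seconds, default 4.0
>>>>>>     retransmit_base = 1.5     # backoff base, default 1.8
>>>>>> }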
>>>>>> 
>>>>>> Is there anything more we can do to make the connection 1) establish more reliably and 2) remain ’up’ even over a poor-quality connection (we are using MOBIKE already)?
>>>>>> 
>>>>>> 
>>>>>> Thanks in advance!
>>>>>> 
>>>>>> Jamie, onebillion
>>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
