[strongSwan] parallel ipsec processing / strongswan performance

Ruslan Kalakutsky r.kalakutsky at gmail.com
Sun Mar 20 16:42:33 CET 2016


Hello,

I would be grateful if someone could explain to me how to enable
parallel IPsec processing and improve strongSwan performance.

I have a testing system on AWS:
CentOS 7

$ uname -r
        3.10.0-327.10.1.el7.x86_64

$ yum info
        Installed Packages
        Name        : strongswan
        Arch        : x86_64
        Version     : 5.3.2
        Release     : 1.el7
        Size        : 2.9 M
        Repo        : installed
        From repo   : epel

$ strongswan --version
        Linux strongSwan U5.3.2/K3.10.0-327.10.1.el7.x86_64


On this server the load-tester responder is configured:
/etc/strongswan/ipsec.conf:
        config setup
                charondebug=3
                strictcrlpolicy=yes
                uniqueids = never

        conn %default
                # default 3h
                ikelifetime=120m
                # IPsec SA expires, default 1h
                keylife=60m
                # Time before SA expiry the rekeying should start
                rekeymargin=3m
                # how many attempts, default 3
                keyingtries=1

        conn ikev2-with-eap-loadtest
                keyexchange=ikev2
                leftsubnet=0.0.0.0/0
                leftfirewall=yes
                leftid="CN=srv, OU=load-test, O=strongSwan"
                leftauth=pubkey
                leftcert=resp.pem
                right=%any
                rightsourceip=10.0.0.0/9
                rightsendcert=never
                rightauth=eap-mschapv2
                eap_identity=%any
                auto=add

/etc/strongswan/strongswan.d/charon.conf:
        charon {
            dos_protection = no
            half_open_timeout = 30
            ikesa_table_segments = 2560
            ikesa_table_size = 32
            reuse_ikesa = no
            threads = 5120
            processor {
                priority_threads {
                        high = 2
                        medium = 4
                }
            }
        }
         ...

As you can see, there is IKE_SA table tuning (it is supposed to
support 50K connections), DoS protection is disabled, the number of
threads is increased, etc., as advised at:
* https://wiki.strongswan.org/projects/strongswan/wiki/IkeSaTable
* https://wiki.strongswan.org/projects/strongswan/wiki/JobPriority
* https://wiki.strongswan.org/projects/strongswan/wiki/ExpiryRekey
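
A quick sanity check on the table values above, assuming (per the
IkeSaTable page) that ikesa_table_size is the number of hash buckets
and ikesa_table_segments the number of lock segments; if so, the two
values in my charon.conf may be swapped:

```shell
# Average hash chain length at the 50K-connection target with the
# configured ikesa_table_size = 32 (bucket count, per the wiki):
sas=50000
table_size=32
echo $(( sas / table_size ))   # prints 1562 entries per bucket on average
```

With chains that long, every IKE_SA lookup becomes a long linear scan,
so if I read the wiki right, the table size should rather be a power of
two near the expected SA count, with far fewer segments.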




On the client side (a few Debian boxes with strongSwan
U5.1.2/K3.13.0-74-generic):

/etc/strongswan.d/charon/load-tester.conf
        load-tester {
            child_rekey = 1200
            delay = 500
            delete_after_established = no
            dpd_delay = 0
            eap_password = testpwd
            enable = yes
            ike_rekey = 0
            init_limit = 500000
            initiator_auth = eap-mschap
            initiator_id = loadtest-%d
            issuer_cert = /etc/ipsec.d/cacerts/cacert.pem
            ca_dir = /etc/ipsec.d/cacerts/
            load = yes
            mode = tunnel
            proposal = aes128-sha1-modp2048
            request_virtual_ip = yes
            responder = 172.31.128.6
            responder_auth = pubkey
            shutdown_when_complete = yes
            version = 0
            addrs {
            }
        }

charon.conf:
        charon {
            threads = 10240
            ...

and the tests are run like this:
# ipsec load-tester initiate 7000 200

I've reviewed this document:
https://www.strongswan.org/docs/Steffen_Klassert_Parallelizing_IPsec.pdf
and tried the pcrypt module (as described at
https://wiki.strongswan.org/projects/strongswan/wiki/Pcrypt):

$ sudo modprobe pcrypt
$ sudo modprobe tcrypt alg="pcrypt(authenc(hmac(sha1),cbc(aes)))" type=3

The last command fails (as mentioned at the link above), and I believe
the algorithm string corresponds to the load-tester proposal
`aes128-sha1-modp2048`.
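
For reference, this is how I assume the proposal maps onto the kernel
crypto template (the commented grep is just to verify that pcrypt
actually registered):

```shell
# Assumed mapping: ESP with aes128-sha1 uses authenc(hmac(sha1),cbc(aes)),
# which pcrypt should wrap once instantiated:
enc="cbc(aes)"
auth="hmac(sha1)"
echo "pcrypt(authenc(${auth},${enc}))"   # prints pcrypt(authenc(hmac(sha1),cbc(aes)))
# after the modprobes above, check that it is registered:
# grep -B1 -A2 pcrypt /proc/crypto
```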

I also did some network tuning:
$ sudo  ip link set dev eth0 txqueuelen 2000
$ sudo sysctl -w net.core.somaxconn=2048
$ sudo sysctl -w net.core.rmem_max=16777216
$ sudo sysctl -w net.core.netdev_budget=600

So the situation is: I run load-tester initiators on 2-4 client boxes
and watch the server statistics.
1) All interrupts land on one CPU:
$ cat /proc/interrupts | grep -E 'CPU|eth0'

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
 98:     246344          0          0          0          0          0          0          0   xen-dyn-event   eth0

2) htop shows that only one charon thread on one CPU is busy (100%);
all the others are idle.

3) I've played with the number of threads (from 128 up to 5120), with
the SA initiation interval, with the backend where the users are
stored (from plaintext ipsec.secrets to RADIUS with SQL), and with the
server instance type, but the results are all much the same: only one
CPU is working, and regardless of the settings, at about **3500**
concurrent connections the queue of half-open SAs starts to grow and
the clients start to see retransmission issues. On the server side it
looks like this:

$ sudo swanctl -S
        uptime: 18 minutes, since Mar 20 14:40:34 2016
        worker threads: 5120 total, 0 idle, working: 6/2/2558/2554
        job queues: 0/0/56/1732
        jobs scheduled: 19203
        IKE_SAs: 9764 total, 2845 half-open
        mallinfo: sbrk 272531456, mmap 528384, used 238963392, free 33568064


$ sudo netstat -s
Ip:
    180188 total packets received
    0 forwarded
    0 incoming packets discarded
    180187 incoming packets delivered
    90058 requests sent out
    16 dropped because of missing route
...

Udp:
    173971 packets received
    1235 packets to unknown port received.
    443 packet receive errors
    86006 packets sent
    0 receive buffer errors
    0 send buffer errors

$ ip -s link
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc pfifo_fast
state UP mode DEFAULT qlen 2000
    link/ether 06:47:5b:74:9c:65 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    43746734   180568   0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    33184525   90883    0       0       0       0


and in the log I have messages like this (the same line repeated many
times):
        Mar 20 15:18:49 ip-172-31-128-6 charon: 4739[MGR] ignoring request with ID 6, already processing
        Mar 20 15:18:49 ip-172-31-128-6 charon: 4739[MGR] ignoring request with ID 6, already processing
        ...
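
Given observation 1) above (all eth0 interrupts land on CPU0, and the
Xen NIC exposes a single RX queue), one thing that might help (a
sketch I haven't verified on this box, assuming 8 CPUs) is spreading
receive processing across CPUs with RPS:

```shell
# Sketch: enable RPS so softirq receive work for the single-queue eth0
# is distributed over all CPUs (the mask is a hex bitmap of allowed CPUs).
cpus=8
mask=$(printf '%x' $(( (1 << cpus) - 1 )))
echo "$mask"   # prints ff
# as root, apply it to the single RX queue:
# echo "$mask" > /sys/class/net/eth0/queues/rx-0/rps_cpus
```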

So the questions are: how do I enable multi-CPU support, and how do I
get past this 3.5K-user limit? I can reach e.g. 12K users, but beyond
3-4K users there are significant performance issues, while the server
isn't really loaded.


Best Regards,
Ruslan Kalakutsky

