[strongSwan] Throughput on high BDP networks
jsullivan at opensourcedevel.com
Mon Jun 1 17:25:00 CEST 2015
> On June 1, 2015 at 3:49 AM Martin Willi <martin at strongswan.org> wrote:
>
>
> Hi,
>
> > I can see the multiple kworker threads spread across all 12 cores in
> > these fairly high powered systems but I am still dropping packets and
> > performance is not much improved.
>
> If all your cores are processing traffic, then pcrypt probably works as
> it should.
>
> What does "fairly high powered system" mean? What is the raw crypto
> throughput with AES-GCM you can expect on these boxes? Have you
> benchmarked UDP traffic to see where the processing limit is?
>
> > I tried to set this up so it would work at boot [...] and it causes a
> > kernel panic as soon as we attempt to send traffic through the tunnel -
> > every single time.
>
> Most likely a bug in your kernel. The panic details might help to track
> this down, but you probably should report this issue to your distro or a
> kernel mailing list.
>
> Regards
> Martin
>
Thank you for the suggestion of testing with UDP. Here is a nuttcp test in which
I activated IPsec midway through:
root@gwhq-2:/etc# nuttcp -T 600 -i 10 -R 850m -u 99.193.69.199
1013.1221 MB / 10.00 sec = 849.8669 Mbps 169 / 1037606 ~drop/pkt 0.01629 ~%loss
1013.0703 MB / 10.00 sec = 849.8240 Mbps 209 / 1037593 ~drop/pkt 0.02014 ~%loss
1013.0967 MB / 10.00 sec = 849.8455 Mbps 193 / 1037604 ~drop/pkt 0.01860 ~%loss
1013.2402 MB / 10.00 sec = 849.9684 Mbps 43 / 1037601 ~drop/pkt 0.00414 ~%loss
1013.2217 MB / 10.00 sec = 849.9526 Mbps 57 / 1037596 ~drop/pkt 0.00549 ~%loss
1013.2031 MB / 10.00 sec = 849.9347 Mbps 73 / 1037593 ~drop/pkt 0.00704 ~%loss
992.7129 MB / 10.00 sec = 832.7510 Mbps 12560 / 1029098 ~drop/pkt 1.22 ~%loss
928.7295 MB / 10.00 sec = 779.0736 Mbps 84796 / 1035815 ~drop/pkt 8.19 ~%loss
925.6387 MB / 10.00 sec = 776.4829 Mbps 94448 / 1042302 ~drop/pkt 9.06 ~%loss
854.1211 MB / 10.00 sec = 716.4881 Mbps 164101 / 1038721 ~drop/pkt 15.80 ~%loss
883.2725 MB / 10.00 sec = 740.9418 Mbps 127304 / 1031775 ~drop/pkt 12.34 ~%loss
818.1533 MB / 10.00 sec = 686.3183 Mbps 201399 / 1039188 ~drop/pkt 19.38 ~%loss
868.4219 MB / 10.00 sec = 728.4850 Mbps 146718 / 1035982 ~drop/pkt 14.16 ~%loss
927.4893 MB / 10.00 sec = 778.0309 Mbps 93307 / 1043056 ~drop/pkt 8.95 ~%loss
853.7568 MB / 10.00 sec = 716.1862 Mbps 157886 / 1032133 ~drop/pkt 15.30 ~%loss
904.3838 MB / 10.00 sec = 758.6524 Mbps 115361 / 1041450 ~drop/pkt 11.08 ~%loss
Even at these rates, the CPUs did not appear to be very busy. One core was at
85%, but that was the core running nuttcp itself. We have seen these boxes pass
almost 20 Gbps with single-digit utilization, so they have plenty of horsepower.
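On Martin's question about raw crypto throughput: a rough way to gauge per-core
AES-GCM speed is a userspace benchmark (assuming, and it is only an assumption,
that OpenSSL's numbers roughly track what the kernel crypto drivers can do on
the same hardware):

# single-core AES-128-GCM throughput via the EVP interface
openssl speed -evp aes-128-gcm
# run the same test on all 12 cores to estimate aggregate capacity
openssl speed -multi 12 -evp aes-128-gcm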
We are also running haveged on them to prevent entropy starvation for the
encryption.
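If entropy were actually running low, it should be visible in the kernel pool; a
quick sanity check is:

cat /proc/sys/kernel/random/entropy_avail

(Entropy is mostly consumed at IKE negotiation and rekeying, not per packet, so
I would not expect it to throttle steady-state throughput.)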
Activating pcrypt did have a positive effect in this case (a sketch of how
pcrypt is typically instantiated follows the results below). Here are the
results:
root@gwhq-2:/etc# nuttcp -T 600 -i 10 -R 930m -u 99.193.69.199
971.4111 MB / 10.00 sec = 814.8777 Mbps 3874 / 998599 ~drop/pkt 0.39 ~%loss
1084.8506 MB / 10.00 sec = 910.0390 Mbps 4963 / 1115850 ~drop/pkt 0.44 ~%loss
1085.2539 MB / 10.00 sec = 910.3768 Mbps 5433 / 1116733 ~drop/pkt 0.49 ~%loss
1085.4424 MB / 10.00 sec = 910.5346 Mbps 4703 / 1116196 ~drop/pkt 0.42 ~%loss
1086.0830 MB / 10.00 sec = 911.0728 Mbps 3942 / 1116091 ~drop/pkt 0.35 ~%loss
1086.6123 MB / 10.00 sec = 911.5165 Mbps 3939 / 1116630 ~drop/pkt 0.35 ~%loss
1086.5000 MB / 10.00 sec = 911.4225 Mbps 3925 / 1116501 ~drop/pkt 0.35 ~%loss
I also noticed that all of our drops were on the sending system, but on the RX
queue, so I don't think these drops are related to encryption overhead. If we
could not keep up with encryption, wouldn't those show up as TX drops or aborts?
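For pinning down which layer is dropping, the usual counters (interface name
hypothetical) are:

ip -s link show eth0             # per-interface RX/TX drop counters
ethtool -S eth0 | grep -i drop   # NIC/driver-level drop statistics
netstat -su                      # UDP receive buffer errors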
So I'm still a bit mystified about why our TCP tests cannot get over 421 Mbps
even though there are no retransmissions. What else might be the bottleneck?
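With zero retransmissions, a hard ceiling like 421 Mbps looks more like a window
limit than loss. The window a flow needs is rate x RTT: at, say, a 70 ms RTT (an
illustrative figure, not our measured RTT), 421 Mbps x 0.070 s is roughly 3.7 MB
of outstanding data. The usual knobs to let the window grow that large (values
illustrative, not recommendations):

# allow up to 16 MB socket buffers
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
# min / default / max TCP buffer sizes
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"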
Thanks - John