[strongSwan] Throughput on high BDP networks

John A. Sullivan III jsullivan at opensourcedevel.com
Thu Jun 4 01:44:25 CEST 2015


On Wed, 2015-06-03 at 15:51 -0400, John A. Sullivan III wrote:
> On Tue, 2015-06-02 at 22:23 -0400, jsullivan at opensourcedevel.com wrote:
> > > On June 1, 2015 at 11:48 AM Martin Willi <martin at strongswan.org> wrote:
> > >
> > >
> > >
> > > > Even at these rates, the CPU did not appear to be very busy. We had one
> > > > at 85% occupied but that was the one running nuttcp.
> > >
> > > On the outgoing path, the Linux kernel usually accounts ESP encryption
> > > under the process that sends traffic using a socket send() call. So
> > > these 85% probably include AES-GCM.
> > >
> > > On the receiving or forwarding path, you'll have to look at the software
> > > interrupt usage (si in top).
> > >
> > > > We have seen these boxes pass almost 20 Gbps with single digit
> > > > utilization so they have plenty of horsepower.
> > >
> > > That does not necessarily mean much. It's all about encryption, and that is
> > > rather expensive. If you have specialized hardware, this most likely
> > > means it is good at shuffling data over the network, but might be
> > > underpowered when it has to do encryption in software.
> > >
> > > > We are also running haveged on them to prevent entropy starvation for the
> > > > encryption.
> > >
> > > Only the key exchange needs entropy; raw AES-GCM does not.
> > >
> > > Regards
> > > Martin
> > >
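
Since the cost of raw AES-GCM keeps coming up, one rough sanity check is to
time userspace AES-256-GCM on a single core.  It says nothing definitive about
the kernel's ESP path, but it shows whether the hardware has usable AES
acceleration.  A minimal Python sketch, assuming the third-party cryptography
package is available and picking an arbitrary 1400-byte chunk:

#!/usr/bin/env python3
"""Rough single-core AES-256-GCM throughput check in userspace."""
import os
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)
chunk = os.urandom(1400)   # roughly one ESP payload's worth of data
nonce = os.urandom(12)     # reusing a nonce is only OK because this is a timing test

ITERATIONS = 200_000
start = time.perf_counter()
for _ in range(ITERATIONS):
    aead.encrypt(nonce, chunk, None)
elapsed = time.perf_counter() - start

gbps = ITERATIONS * len(chunk) * 8 / elapsed / 1e9
print(f"~{gbps:.2f} Gbit/s AES-256-GCM on one core (userspace)")

Very roughly, several Gbit/s per core suggests AES-NI is in play, while a few
hundred Mbit/s points to software-only AES, which would cap per-flow ESP
throughput accordingly.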
> >  
> > Hello, all.  We are still battling this problem.  The system is a SuperMicro
> > server, not a specialized device.  It looks like the problem may be software
> > interrupts, but on the sending side, so I am very curious about the
> > recommendation to check the receiving side.  When sending only 100 Mbps or
> > so, I see a single CPU pegged at 100% in si.  I wondered whether it might be
> > the ACK packets coming back from the other side: we have a large window
> > size, so each send draws a flood of ACKs in reply, and that would also
> > explain why we see this more with TCP than with UDP.  That theory does not
> > hold up, though: I then sent 200 Mbps of UDP traffic, so there was virtually
> > no reply traffic, and the sender was still at 100% si.  What might be
> > generating such a huge number of software interrupts, and how can we reduce
> > them or spread them across multiple processors?
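
One way to see exactly where those software interrupts land is to sample
/proc/softirqs around a test run and compare the per-CPU NET_RX/NET_TX deltas.
A minimal sketch (the 5-second sampling interval is arbitrary):

#!/usr/bin/env python3
"""Show per-CPU NET_RX/NET_TX softirq deltas over a short interval."""
import time

def read_softirqs():
    with open("/proc/softirqs") as f:
        lines = f.read().splitlines()
    cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        name = fields[0].rstrip(":")         # e.g. NET_RX, NET_TX, TIMER
        counts[name] = [int(x) for x in fields[1:1 + len(cpus)]]
    return cpus, counts

INTERVAL = 5  # seconds
cpus, before = read_softirqs()
time.sleep(INTERVAL)
_, after = read_softirqs()

for irq in ("NET_TX", "NET_RX"):
    deltas = [a - b for a, b in zip(after[irq], before[irq])]
    print(irq, " ".join(f"{c}={d}" for c, d in zip(cpus, deltas)))

If nearly all of the deltas land on a single CPU, the usual next step is to
spread them with RSS on the NIC or with RPS, which is enabled per receive
queue via /sys/class/net/<device>/queues/rx-*/rps_cpus (device name and queue
layout depend on the hardware).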
> > 
> <snip>
> We appear to be chasing a compound problem, perhaps also involving GRE.  As we
> try to isolate the components, one issue we see is TCP window size.  For some
> reason, even though net.core.wmem_max/rmem_max and the net.ipv4.tcp_wmem and
> tcp_rmem maximums are all above 16M, we do not achieve a TCP window much
> larger than 4M once IPSec is in the mix.  This is the case with IPSec alone,
> and also when we add a GRE tunnel (to make packet traces a little easier):
> with GRE only, the TCP window grows to the full 16M (but then we have a
> problem with packet drops); with GRE over IPSec, the packet drops magically
> go away (perhaps due to the lower throughput) but the TCP window stays stuck
> at that 4M level.
> 
> What would cause this, and how do we get a full-sized TCP window inside an
> IPSec transport stream? Thanks - John
> 
<snip>
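On the first part, the window that is actually advertised is grown by TCP
buffer autotuning up to the net.ipv4.tcp_rmem/tcp_wmem maximums;
net.core.rmem_max and wmem_max mainly bound explicit setsockopt() buffers, and
an application that sets SO_RCVBUF/SO_SNDBUF itself disables autotuning for
that socket.  So it is worth dumping every limit on both ends, plus window
scaling.  A minimal sketch of what to check (nothing here is
strongSwan-specific):

#!/usr/bin/env python3
"""Dump the sysctls that bound the TCP window on this host."""
SYSCTLS = [
    "net/core/rmem_max",
    "net/core/wmem_max",
    "net/ipv4/tcp_rmem",             # min, default, max used by autotuning
    "net/ipv4/tcp_wmem",
    "net/ipv4/tcp_window_scaling",   # must be 1 for windows beyond 64K
    "net/ipv4/tcp_moderate_rcvbuf",  # 1 = receive-buffer autotuning enabled
]

for name in SYSCTLS:
    try:
        with open(f"/proc/sys/{name}") as f:
            print(f"{name.replace('/', '.')}: {f.read().strip()}")
    except OSError as err:
        print(f"{name.replace('/', '.')}: unreadable ({err})")
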
The second part of the compound problem appears to be caused by
increasing the number of individual flows in the ESP transport stream.
If I run a nuttcp at 400 Mbps on the above test link, I sustain that
throughput with a single stream and there is virtually no packet loss.

If I then run the same test with 9 flows, performance drops to around
250 Mbps and the retransmitted segment count is over 11,000 over our
120-second test.

CPU utilization is near nil and si is in single digits on any one processor.

Any idea why performance would degrade when the same requested throughput is
spread across multiple flows, and how we might fix it? Thanks - John
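
To put a number on the loss in the multi-flow case, one option is to bracket a
run and diff the system-wide TCP counters in /proc/net/snmp (OutSegs and
RetransSegs).  A minimal sketch, with a made-up helper name, that simply wraps
whatever test command it is given:

#!/usr/bin/env python3
"""tcp_delta.py: report OutSegs/RetransSegs deltas around a test command."""
import subprocess
import sys

def tcp_counters():
    with open("/proc/net/snmp") as f:
        rows = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = rows[0][1:], rows[1][1:]
    return dict(zip(header, (int(v) for v in values)))

if len(sys.argv) < 2:
    sys.exit("usage: tcp_delta.py <test command ...>")

before = tcp_counters()
subprocess.run(sys.argv[1:], check=False)   # e.g. the usual nuttcp invocation
after = tcp_counters()

for key in ("OutSegs", "RetransSegs"):
    print(f"{key}: {after[key] - before[key]}")

Running it around the single-flow and 9-flow tests should show whether the
drop to ~250 Mbps comes with a corresponding jump in retransmissions or
whether the limit is elsewhere (for example, the window issue above).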


