[strongSwan] High charon CPU usage + stale SPIs in kernel

Jiri Horky jiri.horky at gmail.com
Thu Sep 18 11:39:46 CEST 2014


Forgot to note the configuration in this thread as well:

The configuration looks like follows:

[root at ams05-004 127.0.0.1-2014-09-17-21:36:11]# cat
/etc/strongswan/ipsec.conf
config setup
        uniqueids=no

conn %default
        ikelifetime=180m
        keylife=90m
        rekeymargin=5m
        keyingtries=%forever
        keyexchange=ikev2
        mobike=no
        reauth=no
        dpdaction=restart
        closeaction=restart

include /etc/strongswan/ipsec.d/*.conf

One of the connection file looks like this:
conn tch017
        left=5.45.X.Y
        right=5.45.Z.W
        authby=secret
        type=transport
        auto=route


Also, looking at one of the affected node on the left side of the
tunnel(tch019 == 5.45.C.D host), the charon log is flooded by similar
messages (only one particular connection grepped out):

Sep 18 11:25:19 08[IKE] <jsc065|31627> received DELETE for ESP CHILD_SA
with SPI c87d949a
Sep 18 11:25:19 08[IKE] <jsc065|31627> CHILD_SA closed
Sep 18 11:25:19 08[IKE] <jsc065|31627> establishing CHILD_SA jsc065
Sep 18 11:25:19 08[ENC] <jsc065|31627> generating CREATE_CHILD_SA
request 23257 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 08[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (348 bytes)
Sep 18 11:25:19 04[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (348 bytes)
Sep 18 11:25:19 04[ENC] <jsc065|31627> parsed CREATE_CHILD_SA request
23216 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 04[IKE] <jsc065|31627> CHILD_SA jsc065{65} established
with SPIs c7d624c8_i c48ca083_o and TS 5.45.C.D/32 === 5.45.A.X/32
Sep 18 11:25:19 04[ENC] <jsc065|31627> generating CREATE_CHILD_SA
response 23216 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 04[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (204 bytes)
Sep 18 11:25:19 10[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (204 bytes)
Sep 18 11:25:19 10[ENC] <jsc065|31627> parsed CREATE_CHILD_SA response
23257 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 10[IKE] <jsc065|31627> CHILD_SA jsc065{65} established
with SPIs c6b2b2ef_i c1fb6e97_o and TS 5.45.C.D/32 === 5.45.A.X/32
Sep 18 11:25:19 10[ENC] <jsc065|31627> generating INFORMATIONAL request
23258 [ ]
Sep 18 11:25:19 10[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)
Sep 18 11:25:19 08[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)
Sep 18 11:25:19 08[ENC] <jsc065|31627> parsed INFORMATIONAL request
23217 [ ]
Sep 18 11:25:19 08[ENC] <jsc065|31627> generating INFORMATIONAL response
23217 [ ]
Sep 18 11:25:19 08[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)
Sep 18 11:25:19 10[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)
Sep 18 11:25:19 10[ENC] <jsc065|31627> parsed INFORMATIONAL response
23258 [ ]
Sep 18 11:25:19 10[IKE] <jsc065|31627> closing expired CHILD_SA
jsc065{65} with SPIs c3bca161_i c076627e_o and TS 5.45.C.D/32 ===
5.45.A.X/32
Sep 18 11:25:19 10[IKE] <jsc065|31627> sending DELETE for ESP CHILD_SA
with SPI c3bca161
Sep 18 11:25:19 10[IKE] <jsc065|31627> scheduling CHILD_SA recreate
after hard expire
Sep 18 11:25:19 10[ENC] <jsc065|31627> generating INFORMATIONAL request
23259 [ D ]
Sep 18 11:25:19 10[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)
Sep 18 11:25:19 05[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)
Sep 18 11:25:19 05[ENC] <jsc065|31627> parsed INFORMATIONAL request
23218 [ ]
Sep 18 11:25:19 05[ENC] <jsc065|31627> generating INFORMATIONAL response
23218 [ ]
Sep 18 11:25:19 05[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)
Sep 18 11:25:19 03[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)
Sep 18 11:25:19 03[ENC] <jsc065|31627> parsed INFORMATIONAL response
23259 [ D ]
Sep 18 11:25:19 03[IKE] <jsc065|31627> received DELETE for ESP CHILD_SA
with SPI c076627e
Sep 18 11:25:19 03[IKE] <jsc065|31627> CHILD_SA closed
Sep 18 11:25:19 03[IKE] <jsc065|31627> establishing CHILD_SA jsc065
Sep 18 11:25:19 03[ENC] <jsc065|31627> generating CREATE_CHILD_SA
request 23260 [ N(USE_TRANSP) SA No TSi TSr ]
Sep 18 11:25:19 03[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (348 bytes)
Sep 18 11:25:19 08[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (348 bytes)
Sep 18 11:25:19 08[ENC] <jsc065|31627> parsed CREATE_CHILD_SA request
23219 [ N(USE_TRANSP) SA No TSi TSr ]

So it seems that it keeps recreating CHILD_SA over and over. Did I hit
any kind of race condition? What could cause that? Restarting of
strongswan on tch019 helps to resolve the problem temporarily, but it
reappears on some other node.

Regards
Jirka H.
 
On 09/18/2014 10:25 AM, Jiri Horky wrote:
> Hi list,
>
> we have a rather specific setup with 102x102 host2host tunnels, each
> host from left side conneting to all 102 hosts on the other side. We use
> strongswan-5.2.0-2, we are on CentOS 6.5 with long term support kernel
> v3.14.13. 
>
> The problem we are seeing is that from time to time, a lot of SPIs are created:
>
> [root at ams02-001 ~]# ip xfrm state | grep -c ^src
> 7857
>
> Instead of expected ~ 102*2=204 SPIs. Also, we see a alot. Usually,
>
> Some of the SPIs are in a stale state, on the affected nodes, we see this in charon.log:
>
>
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c663d950 failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c5265bed failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI cea44e84 failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c218d962 failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c08f7af3 failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI ca3eb959 failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI ccd82ddf failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c5e8e50d failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c7c6f94f failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cb94f74a failed: No such process (3)
> Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c2ebe4df failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI cc64998e failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c68e87ce failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cc044f83 failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c38d0c03 failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c433c8d0 failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cd2d20c0 failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c01a3729 failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cb6871db failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c5425a67 failed: No such process (3)
> Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c546ed9d failed: No such process (3)
> Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cab7ddd0 failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c86040c9 failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cc54935a failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cb459320 failed: No such process (3)
> Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c471d8de failed: No such process (3)
> Sep 18 10:15:57 06[KNL] <tch059|50061> querying SAD entry with SPI c4e7124f failed: No such process (3)
> Sep 18 10:15:57 06[KNL] <tch059|50061> querying SAD entry with SPI c2c4b801 failed: No such process (3)
>
> Some of the point to point assosiations are there multiple times:
>
> ip xfrm state  | grep ^src | sort | uniq -c | sort -n
>
> [root at ams02-001 ~]# ip xfrm state  | grep ^src | sort | uniq -c | sort -rn | head -n10
>    3528 src 5.45.A.B dst 5.45.C.D
>    3528 src 5.45.C.D dst 5.45.A.B
>      88 src 5.45.A.B dst 5.45.E.F
>      88 src 5.45.E.F dst 5.45.A.B
>
> This might related to the kernel crashes we see which I described in thread "Occasional kernel crash at __xfrm_state_lookup".
>
> Any help would be appreciated.
>
> Jiri Horky
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.strongswan.org/pipermail/users/attachments/20140918/82090157/attachment-0001.html>


More information about the Users mailing list