<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Forgot to note the configuration in this thread as well:<br>
<br>
<pre wrap="">The configuration looks like follows:
[root@ams05-004 127.0.0.1-2014-09-17-21:36:11]# cat
/etc/strongswan/ipsec.conf
config setup
uniqueids=no
conn %default
ikelifetime=180m
keylife=90m
rekeymargin=5m
keyingtries=%forever
keyexchange=ikev2
mobike=no
reauth=no
dpdaction=restart
closeaction=restart
include <i class="moz-txt-slash"><span class="moz-txt-tag">/</span>etc/strongswan/ipsec.d<span class="moz-txt-tag">/</span></i>*.conf
One of the connection file looks like this:
conn tch017
left=5.45.X.Y
right=5.45.Z.W
authby=secret
type=transport
auto=route</pre>
<br>
Also, looking at one of the affected node on the left side of the
tunnel(tch019 == 5.45.C.D host), the charon log is flooded by
similar messages (only one particular connection grepped out):<br>
<br>
Sep 18 11:25:19 08[IKE] <jsc065|31627> received DELETE for ESP
CHILD_SA with SPI c87d949a<br>
Sep 18 11:25:19 08[IKE] <jsc065|31627> CHILD_SA closed<br>
Sep 18 11:25:19 08[IKE] <jsc065|31627> establishing CHILD_SA
jsc065<br>
Sep 18 11:25:19 08[ENC] <jsc065|31627> generating
CREATE_CHILD_SA request 23257 [ N(USE_TRANSP) SA No TSi TSr ]<br>
Sep 18 11:25:19 08[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (348 bytes)<br>
Sep 18 11:25:19 04[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (348 bytes)<br>
Sep 18 11:25:19 04[ENC] <jsc065|31627> parsed CREATE_CHILD_SA
request 23216 [ N(USE_TRANSP) SA No TSi TSr ]<br>
Sep 18 11:25:19 04[IKE] <jsc065|31627> CHILD_SA jsc065{65}
established with SPIs c7d624c8_i c48ca083_o and TS 5.45.C.D/32 ===
5.45.A.X/32 <br>
Sep 18 11:25:19 04[ENC] <jsc065|31627> generating
CREATE_CHILD_SA response 23216 [ N(USE_TRANSP) SA No TSi TSr ]<br>
Sep 18 11:25:19 04[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (204 bytes)<br>
Sep 18 11:25:19 10[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (204 bytes)<br>
Sep 18 11:25:19 10[ENC] <jsc065|31627> parsed CREATE_CHILD_SA
response 23257 [ N(USE_TRANSP) SA No TSi TSr ]<br>
Sep 18 11:25:19 10[IKE] <jsc065|31627> CHILD_SA jsc065{65}
established with SPIs c6b2b2ef_i c1fb6e97_o and TS 5.45.C.D/32 ===
5.45.A.X/32 <br>
Sep 18 11:25:19 10[ENC] <jsc065|31627> generating
INFORMATIONAL request 23258 [ ]<br>
Sep 18 11:25:19 10[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)<br>
Sep 18 11:25:19 08[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)<br>
Sep 18 11:25:19 08[ENC] <jsc065|31627> parsed INFORMATIONAL
request 23217 [ ]<br>
Sep 18 11:25:19 08[ENC] <jsc065|31627> generating
INFORMATIONAL response 23217 [ ]<br>
Sep 18 11:25:19 08[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)<br>
Sep 18 11:25:19 10[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)<br>
Sep 18 11:25:19 10[ENC] <jsc065|31627> parsed INFORMATIONAL
response 23258 [ ]<br>
Sep 18 11:25:19 10[IKE] <jsc065|31627> closing expired
CHILD_SA jsc065{65} with SPIs c3bca161_i c076627e_o and TS
5.45.C.D/32 === 5.45.A.X/32 <br>
Sep 18 11:25:19 10[IKE] <jsc065|31627> sending DELETE for ESP
CHILD_SA with SPI c3bca161<br>
Sep 18 11:25:19 10[IKE] <jsc065|31627> scheduling CHILD_SA
recreate after hard expire<br>
Sep 18 11:25:19 10[ENC] <jsc065|31627> generating
INFORMATIONAL request 23259 [ D ]<br>
Sep 18 11:25:19 10[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)<br>
Sep 18 11:25:19 05[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)<br>
Sep 18 11:25:19 05[ENC] <jsc065|31627> parsed INFORMATIONAL
request 23218 [ ]<br>
Sep 18 11:25:19 05[ENC] <jsc065|31627> generating
INFORMATIONAL response 23218 [ ]<br>
Sep 18 11:25:19 05[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (76 bytes)<br>
Sep 18 11:25:19 03[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (76 bytes)<br>
Sep 18 11:25:19 03[ENC] <jsc065|31627> parsed INFORMATIONAL
response 23259 [ D ]<br>
Sep 18 11:25:19 03[IKE] <jsc065|31627> received DELETE for ESP
CHILD_SA with SPI c076627e<br>
Sep 18 11:25:19 03[IKE] <jsc065|31627> CHILD_SA closed<br>
Sep 18 11:25:19 03[IKE] <jsc065|31627> establishing CHILD_SA
jsc065<br>
Sep 18 11:25:19 03[ENC] <jsc065|31627> generating
CREATE_CHILD_SA request 23260 [ N(USE_TRANSP) SA No TSi TSr ]<br>
Sep 18 11:25:19 03[NET] <jsc065|31627> sending packet: from
5.45.C.D[500] to 5.45.A.X[500] (348 bytes)<br>
Sep 18 11:25:19 08[NET] <jsc065|31627> received packet: from
5.45.A.X[500] to 5.45.C.D[500] (348 bytes)<br>
Sep 18 11:25:19 08[ENC] <jsc065|31627> parsed CREATE_CHILD_SA
request 23219 [ N(USE_TRANSP) SA No TSi TSr ]<br>
<br>
So it seems that it keeps recreating CHILD_SA over and over. Did I
hit any kind of race condition? What could cause that? Restarting of
strongswan on tch019 helps to resolve the problem temporarily, but
it reappears on some other node.<br>
<br>
Regards<br>
Jirka H.<br>
<br>
<div class="moz-cite-prefix">On 09/18/2014 10:25 AM, Jiri Horky
wrote:<br>
</div>
<blockquote cite="mid:541A96E8.2050506@gmail.com" type="cite">
<pre wrap="">Hi list,
we have a rather specific setup with 102x102 host2host tunnels, each
host from left side conneting to all 102 hosts on the other side. We use
strongswan-5.2.0-2, we are on CentOS 6.5 with long term support kernel
v3.14.13.
The problem we are seeing is that from time to time, a lot of SPIs are created:
[root@ams02-001 ~]# ip xfrm state | grep -c ^src
7857
Instead of expected ~ 102*2=204 SPIs. Also, we see a alot. Usually,
Some of the SPIs are in a stale state, on the affected nodes, we see this in charon.log:
Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c663d950 failed: No such process (3)
Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c5265bed failed: No such process (3)
Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI cea44e84 failed: No such process (3)
Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c218d962 failed: No such process (3)
Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c08f7af3 failed: No such process (3)
Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI ca3eb959 failed: No such process (3)
Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI ccd82ddf failed: No such process (3)
Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c5e8e50d failed: No such process (3)
Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c7c6f94f failed: No such process (3)
Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cb94f74a failed: No such process (3)
Sep 18 10:15:57 13[KNL] <tch059|49961> querying SAD entry with SPI c2ebe4df failed: No such process (3)
Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI cc64998e failed: No such process (3)
Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c68e87ce failed: No such process (3)
Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cc044f83 failed: No such process (3)
Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c38d0c03 failed: No such process (3)
Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI c433c8d0 failed: No such process (3)
Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cd2d20c0 failed: No such process (3)
Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c01a3729 failed: No such process (3)
Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cb6871db failed: No such process (3)
Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c5425a67 failed: No such process (3)
Sep 18 10:15:57 05[KNL] <tch059|49972> querying SAD entry with SPI c546ed9d failed: No such process (3)
Sep 18 10:15:57 07[KNL] <tch059|50015> querying SAD entry with SPI cab7ddd0 failed: No such process (3)
Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c86040c9 failed: No such process (3)
Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cc54935a failed: No such process (3)
Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI cb459320 failed: No such process (3)
Sep 18 10:15:57 12[KNL] <tch059|50021> querying SAD entry with SPI c471d8de failed: No such process (3)
Sep 18 10:15:57 06[KNL] <tch059|50061> querying SAD entry with SPI c4e7124f failed: No such process (3)
Sep 18 10:15:57 06[KNL] <tch059|50061> querying SAD entry with SPI c2c4b801 failed: No such process (3)
Some of the point to point assosiations are there multiple times:
ip xfrm state | grep ^src | sort | uniq -c | sort -n
[root@ams02-001 ~]# ip xfrm state | grep ^src | sort | uniq -c | sort -rn | head -n10
3528 src 5.45.A.B dst 5.45.C.D
3528 src 5.45.C.D dst 5.45.A.B
88 src 5.45.A.B dst 5.45.E.F
88 src 5.45.E.F dst 5.45.A.B
This might related to the kernel crashes we see which I described in thread "Occasional kernel crash at __xfrm_state_lookup".
Any help would be appreciated.
Jiri Horky
</pre>
</blockquote>
<br>
</body>
</html>