[strongSwan] Charon fails after recovering from crash

Fri Jul 26 19:07:27 CEST 2013

I run multiple strongSwan machines, and at random times I have noticed
that all tunnels on them have failed.  Running ipsec statusall shows
everything looking normal, but no SAs are established.  I have tried
auto=start, auto=route, and various DPD settings in the hope that the
tunnels would recover, with no luck.

Looking back at the logs, the daemon seems to have died at some point
(received signal 11) and been restarted automatically.  The problem is
that after the restart it logs errors and cannot build SAs.  So when the
existing tunnels time out, they never come back.

I found this post from 2011 that seems to describe exactly the same
problem.  Unfortunately, the only reply asked the poster to run 4.5.3 and
attach GDB:

https://lists.strongswan.org/pipermail/users/2011-August/006521.html

As that post suggests, doing a kill -11 on the process reproduces my
problem exactly.  The process dies and is restarted.  Once this happens,
no new SAs are created (although the existing ones stay in use for the
time being).  I have tried running ipsec reload, which appears to allow
new SAs at first, but I have seen it either not work or fail again after a
short time.  So far the only reliable fix has been a full ipsec restart,
so I now have a process that watches for this error and restarts
strongSwan.  That guarantees a few dropped packets while the old tunnels
are destroyed and new ones are created.  This is really not what I want,
but it is the best I have right now.
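
A watchdog of the kind described above might look roughly like the sketch
below.  It is only an illustration, not the script actually in use: the
log pattern is an assumption taken from the "received signal 11" wording
quoted earlier (the real log text may differ between versions), and the
function names are hypothetical.

```python
import re
import subprocess

# Assumed pattern, based on the "received signal 11" wording quoted above.
# The actual charon/starter log text may differ between versions.
CRASH_PATTERN = re.compile(r"received signal 11\b")


def crash_logged(lines):
    """Return True if any log line looks like a segfault report."""
    return any(CRASH_PATTERN.search(line) for line in lines)


def restart_if_crashed(lines, run=subprocess.run):
    """Run `ipsec restart` if a crash was logged; return True if we did."""
    if crash_logged(lines):
        # A full restart tears down and rebuilds all tunnels, so a few
        # packets will be dropped while this happens.
        run(["ipsec", "restart"], check=False)
        return True
    return False
```

The run parameter exists only so the restart command can be swapped out
when trying the logic without touching a live gateway.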

My version reported from ipsec is:
Linux strongSwan U4.5.2/K3.2.0-29-generic

I see that 5.0.4 is available, so I compiled that and gave it a try.  As I
cannot reproduce whatever was causing the segfault, I tested a kill -11 on
the process to see what it would do if a segfault did occur.  This version
behaves very similarly to the 4.5.2 I am running.  The tunnels stay up
until the starter notices charon isn't running.  It correctly starts the
process again, but at this point charon fails to insert into the SPD.
Looking at ip xfrm, I can see that when it tries this insert it runs into
the existing entries, which are still around because the process died
unexpectedly.  It seems that when it tries to insert a second time, it
actually knocks the existing (working) tunnels out of the table.  This
causes the tunnels to die.  If I issue the ipsec reload command, it tries
one more time, and this time it succeeds, allowing the tunnels to come
back up.  I can cross my fingers that this version doesn't segfault, but
it seems it should do something about the SPD errors (maybe retry the
insert?) so that it can recover if the process does die.
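
To make the SPD behaviour described above easier to observe, the policy
table can be captured before and after the crash and the two snapshots
compared.  The sketch below is a rough illustration only: it assumes the
usual iproute2 layout of ip xfrm policy output (an unindented
"src ... dst ..." header followed by an indented "dir ..." line), which
may vary between versions, and all function names are hypothetical.

```python
import subprocess


def parse_policy_selectors(text):
    """Return (src, dst, dir) tuples from `ip xfrm policy` style output."""
    policies = []
    current = None
    for raw in text.splitlines():
        words = raw.split()
        is_header = (
            raw[:1] not in (" ", "\t")
            and len(words) >= 4
            and words[0] == "src"
            and words[2] == "dst"
        )
        if is_header:
            # Unindented "src <net> dst <net>" starts a new policy entry.
            current = [words[1], words[3], None]
            policies.append(current)
        elif current is not None and current[2] is None and "dir" in words[:-1]:
            # The first indented line carrying "dir" gives the direction.
            current[2] = words[words.index("dir") + 1]
    return [tuple(p) for p in policies]


def snapshot_policies():
    """Capture the current SPD entries (needs root on a real gateway)."""
    result = subprocess.run(
        ["ip", "xfrm", "policy"], capture_output=True, text=True, check=True
    )
    return parse_policy_selectors(result.stdout)


def lost_policies(before, after):
    """Return the policies present in `before` but missing from `after`."""
    return sorted(set(before) - set(after))
```

Taking one snapshot while the tunnels are up and another after the
restarted charon has attempted its inserts should show which entries, if
any, were knocked out of the table.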

I could really use some direction here as to how this is expected to work.