[strongSwan] problems with charon in 4.4.1

Andreas Schuldei schuldei+strongswan at spotify.com
Fri May 27 10:24:47 CEST 2011


after the test setup survived the night (i don't know if there were
problems during the night, but if there were, they self-healed, which
is almost as good), this morning there were again several hosts
without an SA in ESTABLISHED state (according to ipsec statusall).

it centered around fiona again. after running ipsec down
$connection-name; ipsec up $connection-name things worked again,
except for the connection between fiona and grazyna.lon.spotify.net
(which are in the same local network, where i expect no packet loss
and very low latency). the SA was set up again, but both hosts
were unable to ping each other or transfer test data. we don't fiddle
with iptables, so i suspect the xfrm policy or state went south. so today
i also uploaded the output of ( ip xfrm policy show; ip xfrm state
show ) > /tmp/xfrm-policy-and-state.dump

Unfortunately i forgot to set the charon log level of the cfg module to
2, as i had intended. charon 4.4.1 does not seem to know the keyword
in the config file, so i had planned to set it with stroke instead. doh! I
did, however, change the configuration to use auto=route and
dpdaction=hold, in order to make the setup more resilient and somewhat
self-healing.

Could you please check whether you can see anything fishy in the logs,
regardless? note that this time we don't just have a failure to
negotiate an SA, but also a failure to transmit any payload once it is
there, so this is something new and somewhat more scary (that depends
on your view, i guess...)

the logs and dumps are at

http://origin.scdn.co/u/wp/fiona.lon.spotify.net-charon.log
http://origin.scdn.co/u/wp/fiona.lon.spotify.net-xfrm-policy-and-state.dump
http://origin.scdn.co/u/wp/grazyna.lon.spotify.net-charon.log
http://origin.scdn.co/u/wp/grazyna.lon.spotify.net-xfrm-policy-and-state.dump


thanks!
/andreas



On Thu, May 26, 2011 at 12:51 PM, Andreas Schuldei
<schuldei+strongswan at spotify.com> wrote:
> On Wed, May 25, 2011 at 8:49 AM, Andreas Schuldei
> <schuldei+strongswan at spotify.com> wrote:
>> now i uploaded new logs from taylor and aldona. the two dropped their
>> SA sometime after 2011-05-24T21:48:21 (that is the last good SA
>> negotiation i can see in the logs) and didn't manage to establish a new
>> one.
>>
>> could someone please look at the logs and tell me if i can do anything
>> about this failure (by choosing different config options)? The
>> configuration files are unchanged since my first mail.
>>
>> please find the log files at
>> http://origin.scdn.co/u/wp/aldona.ash.spotify.net-charon.log
>> http://origin.scdn.co/u/wp/taylor.sto.spotify.net-charon.log
>>
>> they are smaller this time, and the timestamp might make it easier, too. :-)
>
> yesterday i changed (per the suggestions on irc) the ipsec.conf to say
> reauth=no, to make the connections less prone to reauthentication
> issues (and also switched to transport mode).  Then i restarted
> everything and also extended the testbed a little, so that we have 23
> machines sending random traffic to each other through ipsec
> continuously. (-> 23*22/2 = 253 host-to-host connections :-)
>
> i uploaded new log files of a failure now, which centers around fiona.
> aldona, alejandra, alvina, amber and annmarie failed to re-establish
> their SAs to fiona. annmarie for example set up its last SA at 21:37
> and then stopped talking to fiona altogether. other traffic to other
> hosts goes on as before.
>
> please check out
>
> http://origin.scdn.co/u/wp/aldona.ash.spotify.net-charon.log
> http://origin.scdn.co/u/wp/alejandra.ash.spotify.net-charon.log
> http://origin.scdn.co/u/wp/alvina.ash.spotify.net-charon.log
> http://origin.scdn.co/u/wp/amber.lon.spotify.net-charon.log
> http://origin.scdn.co/u/wp/annmarie.ash.spotify.net-charon.log
> http://origin.scdn.co/u/wp/fiona.lon.spotify.net-charon.log
>
> as well as their IPsec config files that are now generated with this template:
>
> $comment
>
> config setup
>        plutostart=no # pluto is used for IKEv1
>
> conn %default
>        ikelifetime=3h           # strongSwan default
>        lifetime=1h              # strongSwan default
>        margintime=9m            # strongSwan default
>        keyingtries=%forever     # strongSwan default
>        mobike=no                # mobike is used for NAT traversal
>        keyexchange=ikev2
>        ike=aes128-sha1-modp2048
>        esp=aes128-sha1-modp2048
>        left=%defaultroute
>        leftcert=host_server.crt
>        type=transport           # should work just as well as tunnel, but with less overhead
>        reauth=no                # recommended so that SAs are rekeyed, not reauthenticated
>
>
> # Begin connection section
>
> # For all connections, the peer with the host name
> # that is first in a lexicographical sorting
> # is selected as the initiator of the connection.
> #for $peer in $peers
>
> conn $host-$peer.name
>        right=$peer.ip
>        rightid="C=SE, O=Spotify, CN=$peer.name"
>        #if $peer.initiator
>        auto=start
>        dpdaction=restart
>        #else
>        auto=add
>        dpdaction=clear
>        #end if
> #end for
>
>
> regarding the remnants of xfrm policy after /etc/init.d/ipsec stop: is
> that a sign that charon's cleanup at shutdown went wrong? i also
> see that the xfrm kernel modules are still heavily used (with a usage
> count of ~60 on some machines) after charon was stopped and no SAs are
> active any more. how can i see with lsof (or similar tools) what
> userspace (or kernel) stuff uses them?
>
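
a side note on the template quoted above: its comment says that, for each
connection, the host whose name sorts first lexicographically is the
initiator, and a full mesh of 23 hosts gives 23*22/2 = 253 connections.
the following is only a minimal Python sketch of that rule, not the
actual generator (the real template is rendered from $peers and
$peer.initiator, and how those are computed is not shown here). the
function names and the host00..host22 names are hypothetical.

```python
from itertools import combinations


def initiator_for(host_a, host_b):
    """Return the name that sorts first; per the template comment it initiates."""
    return min(host_a, host_b)


def connection_pairs(hosts):
    """Return one (initiator, responder) tuple per unordered pair of hosts."""
    # combinations() over a sorted list yields each pair exactly once,
    # with the lexicographically smaller name in first position.
    return list(combinations(sorted(set(hosts)), 2))


if __name__ == "__main__":
    # Hypothetical placeholder names standing in for the 23 test machines.
    hosts = ["host%02d" % i for i in range(23)]
    print(len(connection_pairs(hosts)))
    print(initiator_for("grazyna.lon.spotify.net", "fiona.lon.spotify.net"))
```

run as a script, this prints 253 and fiona.lon.spotify.net, i.e. fiona
would get auto=start and grazyna auto=add for that pair, assuming the
comparison is done on the full host names.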



