[strongSwan-dev] Fault Restart Issue with Key Sockets

Wed Jun 3 18:29:49 CEST 2015

Hi Herbie,

> Today's topic is that when the daemon restarts, the security policies
> all get deleted.  This leaves the kernel completely wide open...

Actually, the policies are NOT deleted from the kernel.  The SAs and
policies therefore remain in use without interruption after the daemon
has crashed, until the SAs expire.  And since we don't set lifetimes on
policies, plain packets should not leave the host even when the SAs are
eventually gone.

But because those entries are still in the kernel when the SAs are
reestablished, after the charon daemon got restarted by the starter
daemon, you'll get these EEXIST errors:

> 11:51:00 09[KNL] adding policy 10.2.10.121/32 === 10.2.10.122/32 out 
> 11:51:00 09[KNL] unable to add policy: File exists. (5017)

When starter is terminated properly (e.g. via `ipsec stop`) the kernel
state is flushed, but that's not the case during a restart caused by a
crash.  Routes and firewall rules that were added by the daemon or via
updown script are not removed either.  The basic assumption is, of
course, that the daemon should never crash.

> I checked the KLIPS and Netlink versions and it looks like the
> add_policy method is always supposed to update existing SPs in the
> kernel.

If it knows about the existing policies, they are updated.  But in this
case the restarted IKE daemon has no knowledge of them (it assumes full
control over the kernel's IPsec stack and a clean slate).

> With that in mind, would the fix for this problem be to handle EEXIST
> in add_policy_internal by replacing

Yes, that should work for this particular case.  Not sure if there are
situations where updating existing policies is unwanted (for the daemon
this still looks like it added the policies, so it will eventually
remove them).  Anyway, I pushed changes for the kernel-pfkey and
kernel-netlink plugins to the policy-update-eexist branch [1].
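
To illustrate the idea only (this is not the code on the branch): a
minimal Python sketch of "add, and fall back to update on EEXIST".  The
FakeKernel class and the add_policy() helper are made up for this
example and merely stand in for the kernel's policy database and the
plugin's add path.

```python
import errno


class FakeKernel:
    """Hypothetical stand-in for the kernel's policy database."""

    def __init__(self):
        self.policies = {}

    def add(self, selector, policy):
        # Like a plain "add" request: fails if the entry already exists.
        if selector in self.policies:
            raise OSError(errno.EEXIST, "File exists")
        self.policies[selector] = policy

    def update(self, selector, policy):
        # Like an "update" request: replaces an existing entry.
        if selector not in self.policies:
            raise OSError(errno.ENOENT, "No such file or directory")
        self.policies[selector] = policy


def add_policy(kernel, selector, policy):
    """Try to add the policy; on EEXIST replace the stale entry instead."""
    try:
        kernel.add(selector, policy)
        return "added"
    except OSError as e:
        if e.errno != errno.EEXIST:
            raise  # any other error is still reported to the caller
        kernel.update(selector, policy)
        return "updated"
```

With a policy left over from before the crash, add_policy() returns
"updated" instead of failing, and the caller then treats the policy as
its own (so it will also remove it later).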

By the way, if auto=route is used the result might be a bit odd.  Since
the previously installed SAs and policies are still there, no new SAs
are established until the existing SAs expire.  Soft expires will be
triggered, for which the daemon does not find any state, so no rekeying
is done.  And only when the SAs have finally expired will the kernel
send an acquire to the daemon (this assumes that the reqids are the same
as they were before the restart, which should be the case since 5.3.0).
Also, if the other peer has lower lifetimes it will try to rekey the
CHILD_SA using an IKE_SA that does not exist on the restarted peer, so
after a few retransmits it will eventually trigger its configured
`dpdaction`.
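
A very simplified model of that sequence, in Python.  The function name,
the event names and the two sets are assumptions made for this sketch,
not the daemon's actual interfaces; it only encodes the behaviour
described above (a soft expire without matching state is ignored, an
acquire for a known trap policy reqid starts a new negotiation).

```python
def handle_kernel_event(event, reqid, known_child_sas, trap_reqids):
    """Return what a daemon would do for a kernel event (toy model).

    known_child_sas: reqids of CHILD_SAs the daemon has state for
    trap_reqids:     reqids of trap policies installed via auto=route
    """
    if event == "soft_expire":
        # Rekeying is only possible if the daemon knows the CHILD_SA.
        return "rekey" if reqid in known_child_sas else "ignore"
    if event == "acquire":
        # Assumption of this model: the acquire is matched to a trap
        # policy by reqid, so a different reqid would not match.
        return "initiate" if reqid in trap_reqids else "ignore"
    raise ValueError("unknown event: %s" % event)


# After a crash/restart the daemon has no CHILD_SA state, but the trap
# policy is installed again with the same reqid as before.
after_restart = [
    handle_kernel_event("soft_expire", 1, set(), {1}),
    handle_kernel_event("acquire", 1, set(), {1}),
]
```

Here after_restart ends up as ["ignore", "initiate"]: nothing happens
on the soft expire, and a new SA is only negotiated once the kernel
sends the acquire.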

Regards,
Tobias

[1]
https://git.strongswan.org/?p=strongswan.git;a=shortlog;h=refs/heads/policy-update-eexist