[strongSwan] Broken CHILD_SA following IKE_SA re-auth with FortiGate remote

Mon Aug 29 11:51:31 CEST 2016

Hi again,

* Tobias Brunner <tobias at strongswan.org>

> > So is strongSwan here intentionally behaving in a non-compliant
> > manner simply in order to better interoperate with other
> > non-compliant IKEv2 implementations, or is there some other reason
> > why "make before break" isn't the default? That is, are there any
> > other limitations/bugs/known interop problems/etc. that I should be
> > aware of before enabling it?
> 
> There are some aspects that could make creating overlapping SAs
> tricky. One of these is the use of dynamic virtual IPs and other
> configuration attributes assigned when setting up the IKE_SA
> (assigning the same IP twice or assigning a new one, installing the
> same IP and perhaps DNS servers "twice" on the client etc.).  Another
> issue is caused by duplicate checks (for a responder there is no
> indication that an SA is created for a reauthentication or not -
> except perhaps the INITIAL_CONTACT notify but its use is optional -
> so some heuristics might have to be used to avoid destroying the old
> SAs as duplicates). Some issues are strongSwan specific (or only
> apply to certain versions) like sharing an IPsec policy between
> multiple CHILD_SAs (or more specifically multiple reqids).  Custom
> updown scripts might also struggle with this as they might have to do
> some explicit refcounting to avoid something being undone when the old
> IKE_SA and its children are torn down after the new ones were
> created.  The break-before-make approach avoids most of this but, of
> course, creates other problems (like interrupting traffic, issues
> with trap policies, possible race conditions on the responder
> handling deletion of the old and creation of the new SAs
> concurrently).
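
To make the duplicate-check concern concrete, here is a toy Python model of a responder-side check. It is a sketch under assumptions, not strongSwan's actual logic: the class, the function names and the two policy names ("replace" and "initial_contact_only") are hypothetical. It relies on only two points from IKEv2 (RFC 7296): IKE_AUTH creates just the first CHILD_SA, with any further ones needing separate CREATE_CHILD_SA exchanges, and INITIAL_CONTACT is an optional hint that the sender has no other IKE_SAs with the receiver.

```python
from dataclasses import dataclass, field


@dataclass
class IkeSa:
    # Hypothetical, simplified view of an IKE_SA as seen by the responder.
    sa_id: int
    peer_id: str  # authenticated identity of the peer
    child_sas: list = field(default_factory=list)  # names of its CHILD_SAs


def sas_to_delete(existing, new_sa, initial_contact, policy):
    """Return the older IKE_SAs a responder would tear down as duplicates
    once new_sa has been authenticated (hypothetical policies)."""
    duplicates = [sa for sa in existing
                  if sa.peer_id == new_sa.peer_id and sa.sa_id != new_sa.sa_id]
    if policy == "replace":
        # Any older IKE_SA with the same identity counts as a duplicate.
        return duplicates
    if policy == "initial_contact_only":
        # Only act on the (optional) INITIAL_CONTACT hint.
        return duplicates if initial_contact else []
    raise ValueError(f"unknown policy: {policy}")


def orphaned_children(new_sa, deleted):
    """CHILD_SAs torn down with a deleted IKE_SA but not yet recreated."""
    gone = {name for sa in deleted for name in sa.child_sas}
    return sorted(gone - set(new_sa.child_sas))


# Make-before-break in progress: the new IKE_SA has only the CHILD_SA
# created during IKE_AUTH; "net-b" is still waiting for CREATE_CHILD_SA.
old = IkeSa(1, "site-a", ["net-a", "net-b"])
new = IkeSa(2, "site-a", ["net-a"])

for policy in ("replace", "initial_contact_only"):
    deleted = sas_to_delete([old, new], new, initial_contact=False, policy=policy)
    print(policy, orphaned_children(new, deleted))
```

With the "replace" policy the old IKE_SA, and with it the old CHILD_SA for net-b, disappears before its replacement exists, which is exactly the gap make-before-break is meant to close. The second policy avoids that, but only because it ignores duplicates entirely unless the optional notify is present.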

Thank you. I think none of those considerations are likely to cause me
any problems. I'm using strongSwan for statically configured
site-to-site tunnels. No updown scripts, virtual addresses,
addresses/DNS servers being assigned, or anything like that. Unless of
course those strongSwan-specific issues you refer to are present in
version 5.3.5?

On the other hand having a stable connection is very important, so even
if I disregard the FortiGate issue, enabling make-before-break would be
worth it for me just in order to avoid the brief connectivity
disruption after the old IKE SA has been torn down.
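
The difference between the two orderings can be shown with a small Python model of when CHILD_SAs are installed and removed. The timings are illustrative assumptions only (integer milliseconds, a 100 ms round trip, two round trips for IKE_SA_INIT plus IKE_AUTH, one for the Delete exchange), and `outage_ms` is a hypothetical helper:

```python
def outage_ms(events):
    """Total time (ms) during which no CHILD_SA is installed.

    events: (time_ms, delta) pairs; delta is +1 when a CHILD_SA is
    installed and -1 when one is removed. Time before the first install
    is not counted.
    """
    active = 0
    down_since = None
    total = 0
    for t, delta in sorted(events):
        was_up = active > 0
        active += delta
        if was_up and active == 0:
            down_since = t  # connectivity lost
        elif not was_up and active > 0 and down_since is not None:
            total += t - down_since  # connectivity restored
            down_since = None
    return total


RTT = 100      # assumed round-trip time in ms
REAUTH = 1000  # assumed time at which reauthentication starts

# Break-before-make: old SAs are removed first, then IKE_SA_INIT and
# IKE_AUTH (two round trips) install the new CHILD_SA.
break_before_make = [(0, +1), (REAUTH, -1), (REAUTH + 2 * RTT, +1)]

# Make-before-break: the new CHILD_SA is installed first; the old one is
# removed once the Delete exchange (one more round trip) completes.
make_before_break = [(0, +1), (REAUTH + 2 * RTT, +1), (REAUTH + 3 * RTT, -1)]

print(outage_ms(break_before_make))  # 200
print(outage_ms(make_before_break))  # 0
```

In this model the break-before-make gap is simply the time needed to set up the new IKE_SA, so it grows with the round-trip time, while make-before-break never drops to zero installed CHILD_SAs.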

There was one thing you mentioned above that gave me some pause though:

«some heuristics might have to be used to avoid destroying the old SAs
as duplicates»

Could you elaborate on how this might be a problem?

If I understand correctly: if make-before-break reauth is being
performed, and strongSwan has successfully established replacement
IKE and Child SAs, then there shouldn't be any problem with destroying
the old, now duplicate/superfluous IKE SA (and its associated Child SAs)?
Why would you want to avoid that from happening - isn't getting rid of
the old SAs precisely the point of reauthenticating?

(I assume that "destroying" here does imply sending a Delete
notification to the remote end so that it too will clean up the old SAs
from its side.)

> I guess for IKEv2 implementations that are based on existing IKEv1
> implementations the make-before-break approach comes relatively
> naturally as it was the only way to rekey an IKEv1 SA.  But there is now
> that strong link between IKE and CHILD_SAs that didn't exist in IKEv1,
> which might be what FortiGate is struggling with.

I agree. Presumably the FortiGate implementation was originally an IKEv1
one with IKEv2 support retrofitted. And they probably didn't notice
that IKEv2 introduced this strong link between the IKE and CHILD SAs, so
their IKEv2 implementation has IKEv1 behaviour grandfathered in. That'd
be a bug, of course.

In any case, I do think that enabling make-before-break on the strongSwan
side will prevent the blackholing situation from happening again. The
FortiGate bug likely only got exposed because both sides initiated a
new IKE SA at the same time, something which probably wouldn't have
happened if strongSwan hadn't deleted the old IKE SA without first
having a replacement ready.
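
For reference, the switch in question is the `charon.make_before_break` option in strongswan.conf, which defaults to `no`. A minimal fragment enabling it might look like the following (placed in the `charon` section). According to the strongswan.conf documentation it relies on the peer tolerating overlapping IKE and CHILD_SAs, so it is worth watching how the FortiGate reacts after the change.

```
charon {
    # Recreate the IKE_SA and all CHILD_SAs before deleting the old ones
    # during IKEv2 reauthentication (default: no).
    make_before_break = yes
}
```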

Furthermore, I suspect the two sides might not have agreed on which of
the two duplicate Child SAs was the newest (due to network latency),
and that this might have been a factor in their disagreement about
which Child SA should remain active.
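
A toy Python model shows how latency alone could make the two ends disagree. It assumes, purely for illustration, that each side ranks CHILD_SAs by its own local install time, that a responder installs a CHILD_SA when it processes the request and an initiator when the response arrives, and that each side then deletes every duplicate except the one it considers newest. The peer labels, timings and helper names are hypothetical; nothing here is taken from FortiGate's or strongSwan's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    name: str      # label of the CHILD_SA this exchange creates
    initiator: str
    responder: str
    start_ms: int  # when the initiator sends its request


def install_times(exchanges, one_way_ms):
    """Local install time of each CHILD_SA at each peer (simplified model)."""
    times = {}
    for ex in exchanges:
        # Responder installs on receiving the request ...
        times.setdefault(ex.responder, {})[ex.name] = ex.start_ms + one_way_ms
        # ... initiator installs on receiving the response.
        times.setdefault(ex.initiator, {})[ex.name] = ex.start_ms + 2 * one_way_ms
    return times


def newest_per_peer(times):
    """The CHILD_SA each peer believes is newest, by its own clock."""
    return {peer: max(sas, key=sas.get) for peer, sas in times.items()}


def survivors(times):
    """CHILD_SAs left if every peer deletes all but its own 'newest' one."""
    newest = newest_per_peer(times)
    deleted = {sa for peer, sas in times.items() for sa in sas if sa != newest[peer]}
    return {sa for sas in times.values() for sa in sas} - deleted


# Both ends start a new exchange 10 ms apart over a 50 ms one-way path.
exchanges = [
    Exchange("SA-1", initiator="strongSwan", responder="FortiGate", start_ms=0),
    Exchange("SA-2", initiator="FortiGate", responder="strongSwan", start_ms=10),
]
times = install_times(exchanges, one_way_ms=50)
print(newest_per_peer(times))  # {'FortiGate': 'SA-2', 'strongSwan': 'SA-1'}
print(survivors(times))        # set()
```

In this model each side keeps the SA it initiated and deletes the other one, so both end up gone. A tie-breaker that both ends compute from the same data, such as the lowest-nonce rule RFC 7296 specifies for simultaneous rekeying, would not be affected by latency in this way.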

Tore