[strongSwan-dev] 2 transport mode CHILD SAs, tunnel up, connection down

Wed Apr 3 16:27:32 CEST 2013

Hi James,

(bringing the discussion back to the mailing list.)

> I have refined and tested my patch and let it run with a short keylife
> (10 minutes) and rekeymargin (1 minute) since last Thursday. In that
> time frame I have not had the problem as described

Many thanks for your patch. I have finally found some time to take a
closer look at the issue and your work. Sorry for the delay.

> I have gotten to the bottom of the issue and have found that when both
> peers rekey (auto=start) a CHILD_SA at the same time, both rekeying
> attempts are successful, traffic flows on each of the new CHILD_SAs 
> in 1 direction and it breaks the tunnel.

If a peer initiates rekeying, the responder tries to detect that (by
comparing traffic selectors etc.) and reuse the reqid. Reusing the same
reqid is crucial: We can't install identical policies in the Linux
kernel, hence we only install one copy. But if we have two different
reqids, we can't have a working policy for both tunnels; this only
works if we have the same reqid.

The kernel should always use the newest SA to send traffic, but accept
traffic on both the new and the rekeyed SA to have no interruption in
the traffic flow. But as the reqid changes, the policies no longer work
for both SAs, breaking the tunnel.
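
To illustrate the reqid dependency, here is a deliberately simplified
toy model in Python. It is not the Linux kernel's actual policy lookup:
the ToyKernel class, the "keep the first copy" rule and the addresses
are assumptions made up for this sketch. It only shows that with one
policy per selector, an SA carrying a different reqid no longer matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SA:
    spi: int
    reqid: int


@dataclass(frozen=True)
class Policy:
    selector: str
    reqid: int


class ToyKernel:
    """Toy model (assumption): one policy per selector, SAs matched by reqid."""

    def __init__(self):
        self.policies = {}  # selector -> Policy
        self.sas = {}       # spi -> SA

    def install_policy(self, policy):
        # Identical selectors cannot coexist; this sketch keeps the first copy.
        self.policies.setdefault(policy.selector, policy)

    def install_sa(self, sa):
        self.sas[sa.spi] = sa

    def accepts_inbound(self, selector, spi):
        # Traffic is accepted only if the SA's reqid matches the policy's reqid.
        policy = self.policies.get(selector)
        sa = self.sas.get(spi)
        return policy is not None and sa is not None and sa.reqid == policy.reqid


def demo(new_reqid):
    """Install an SA with reqid 1, then a second one with new_reqid."""
    kernel = ToyKernel()
    selector = "192.0.2.1 <-> 192.0.2.2"  # example addresses only
    kernel.install_policy(Policy(selector, reqid=1))
    kernel.install_sa(SA(spi=0x1000, reqid=1))
    kernel.install_policy(Policy(selector, reqid=new_reqid))
    kernel.install_sa(SA(spi=0x2000, reqid=new_reqid))
    return (kernel.accepts_inbound(selector, 0x1000),
            kernel.accepts_inbound(selector, 0x2000))
```

In this model demo(1) accepts traffic on both SAs, while demo(2) only
accepts it on the first one.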

> At the next rekeying attempt by either of the peers the redundant
> CHILD_SA message comes, 1 of the CHILD_SAs is deleted and the tunnel 
> is functional again.

Yes, this is the "interoperable" way we tried to solve the "redundant"
Quick Mode issue. While your patch will work very well between
strongSwans, I don't know how this would end when talking to a different
implementation. In IKEv2 we can compare nonces to elect a CHILD_SA to
delete after a collision; this is standardized. But I don't think there
is a procedure defined for IKEv1. Or is the SPI comparison used in
your patch based on some recommendation?
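
For reference, a minimal Python sketch of the IKEv2 rule as I read it
in RFC 5996, section 2.8.1: of the four nonces used in the two
colliding exchanges, the SA created with the lowest nonce is the one
to close, where "lowest" is an octet-by-octet comparison and a nonce
that ends first is the lower one. The Exchange type and the loser()
helper are hypothetical names for this illustration, not strongSwan
code.

```python
from typing import NamedTuple


class Exchange(NamedTuple):
    label: str
    nonce_i: bytes  # initiator nonce of this exchange
    nonce_r: bytes  # responder nonce of this exchange


def loser(first, second):
    """Return the exchange whose CHILD_SA should be closed after a collision.

    Python compares bytes octet by octet, and a proper prefix sorts lower,
    which matches the comparison described in the lead-in.
    """
    lowest = min(first.nonce_i, first.nonce_r, second.nonce_i, second.nonce_r)
    return first if lowest in (first.nonce_i, first.nonce_r) else second
```

Both peers see the same four nonces, so both arrive at the same
result regardless of the order in which they pass the exchanges.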

> I assume that only having 1 of the peers configured with auto=start
> would solve this issue

No, actually not. Rekeying works independently of the initial tunnel
setup: any peer may initiate rekeying at the configured interval, and
the risk of a collision always exists (even if it can be reduced with a
larger rekeymargin/fuzz).
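
As a rough illustration of why margin and fuzz only reduce the risk,
here is a small Python sketch. It assumes the formula from the
ipsec.conf documentation as I remember it, rekeytime = keylife -
(rekeymargin + random(0, rekeymargin * rekeyfuzz)), with rekeyfuzz
defaulting to 100%. The function names are made up for this example.

```python
import random


def rekey_time(keylife, rekeymargin, rekeyfuzz=1.0, rng=random):
    """Seconds after installation at which one peer starts rekeying."""
    return keylife - (rekeymargin + rng.uniform(0, rekeymargin * rekeyfuzz))


def rekey_window(keylife, rekeymargin, rekeyfuzz=1.0):
    """Earliest and latest possible rekey time under the assumed formula."""
    earliest = keylife - rekeymargin * (1 + rekeyfuzz)
    latest = keylife - rekeymargin
    return earliest, latest
```

With your test values (keylife 10 minutes, rekeymargin 1 minute) and
the default fuzz, rekey_window(600, 60) gives a window from 480 to 540
seconds. Both peers pick a random point in that same window, so a
wider window makes a simultaneous rekey less likely but never
impossible.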

> [...] if both peers check for redundancy after rekeying they
> should both find an extra CHILD_SA and as long as they delete the same
> CHILD_SA the tunnel should function properly (feel free to poke holes 
> in my idea).

That's true, but as indicated above, it might be non-trivial to "delete
the same CHILD_SA" when talking to a third party implementation.

I think the main issue has already been solved by the call to
check_for_rekeyed_child() in quick_mode.c. The problem is just that the
comparison does not include Quick Modes in rekeyed state, as this only
happens during collisions. The attached patch changes that, and fixes
traffic flow even if we have multiple identical Quick Modes for some
time. The existing is_redundant() check makes sure that duplicate
Quick Mode states don't multiply again in the next rekey cycle.
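
The idea of the change, sketched in Python rather than the C of
quick_mode.c: this is not the attached patch, and the State values,
the ChildSA fields and find_reusable_reqid() are hypothetical names
chosen for the illustration. It only shows the effect of also
considering Quick Modes in rekeyed state when looking for a reqid to
reuse.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    INSTALLED = auto()
    REKEYED = auto()   # hypothetical: a Quick Mode that was already rekeyed
    DELETING = auto()


@dataclass
class ChildSA:
    reqid: int
    local_ts: tuple
    remote_ts: tuple
    state: State


def find_reusable_reqid(existing, local_ts, remote_ts, include_rekeyed=True):
    """Return the reqid of a matching CHILD_SA, or None if there is none."""
    usable = {State.INSTALLED}
    if include_rekeyed:
        # Without this, a collision leaves the second Quick Mode unmatched.
        usable.add(State.REKEYED)
    for sa in existing:
        if (sa.state in usable
                and sa.local_ts == local_ts
                and sa.remote_ts == remote_ts):
            return sa.reqid
    return None
```

With include_rekeyed=False a colliding Quick Mode finds no match and
would get a fresh reqid; with include_rekeyed=True it reuses the
existing one.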

Please let me know if this patch works for you, I'll then include it for
5.0.3.

Best regards
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Reuse-reqid-of-an-existing-Quick-Mode-even-if-it-has.patch
Type: text/x-patch
Size: 1305 bytes
Desc: not available
URL: <http://lists.strongswan.org/pipermail/dev/attachments/20130403/87482e3a/attachment.bin>