[strongSwan] HA resync issue

Thomas Egerer hakke_007 at gmx.de
Fri Aug 1 22:41:46 CEST 2014


Hi Emeric,

On 08/01/2014 06:22 PM, Emeric POUPON wrote:
> Hello,
> 
> I'm running Strongswan 5.2.0 on FreeBSD security gateways.
> 
> I set up an Active/Passive HA cluster.
> I successfully created 300 connections thanks to another remote gateway using strongswan's load-tester plugin.
> => the passive node has been correctly synchronized.
> 
> I then decided to bring down the passive node and bring it up shortly after.
> 
> The wiki says:
> "Synchronizing CHILD_SAs is not possible using the cache, as the messages do not contain sequence number information managed in the kernel. To reintegrate a node, the active node initiates rekeying on all CHILD_SAs. The new CHILD_SA will be synchronized, starting with fresh sequence numbers in the kernel. CHILD_SA rekeying is inexpensive, as it usually does not include a DH exchange."
> 
> (BTW, why would the CHILD SA rekey not include a DH exchange?)
Because PFS is not enabled for CHILD_SAs by default
(proposal_t::proposal_create_default).
> Indeed the active node rekeys the 300 CHILD SA in a few seconds, but the passive node gets synchronized with only a few CHILD SA (about 30).
> 
> Logs:
> ...
> Aug  1 16:15:16 02[CFG] <sample-psk|9> installed HA passive IKE_SA 'sample-psk' 172.18.0.53[srv.strongswan.org]...172.18.0.54[c108-r1.strongswan.org]
> Aug  1 16:15:16 02[CFG] <sample-psk|10> installed HA passive IKE_SA 'sample-psk' 172.18.0.53[srv.strongswan.org]...172.18.0.54[c20-r1.strongswan.org]
> 
> And then a lot of errors like that:
> ...
> Aug  1 16:15:16 02[CFG] passive HA IKE_SA to update not found
> ...
> Aug  1 16:15:16 02[CHD] IKE_SA for HA CHILD_SA not found
> ...
> Aug  1 16:15:16 02[CHD] <11> HA is missing nodes child configuration
> ...
> 
> Any idea?
This can happen if the passive node is
- not able to check out the IKE_SA to be updated (case 1)
- not able to check out the IKE_SA it should add a child to (case 2)
- not able to find a configuration matching the one used in
  the HA CHILD_SA update (case 3)

To me, this looks like your passive node does not have all the
configurations required for the synchronization.
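
To see which of the three cases dominates, the error lines can be
counted per case. The following is only a rough sketch: the message
fragments are copied from the log excerpt quoted above, and mapping
them to cases 1-3 in that order is an assumption based on the
wording. The function name and the sample lines are made up for
illustration; this is not part of strongSwan.

```python
from collections import Counter

# Message fragments copied from the quoted log excerpt.
# ASSUMPTION: each fragment corresponds to the case listed next to it.
CASES = {
    "passive HA IKE_SA to update not found": "case 1",
    "IKE_SA for HA CHILD_SA not found": "case 2",
    "HA is missing nodes child configuration": "case 3",
}


def count_ha_sync_errors(lines):
    """Count how many log lines fall into each of the three cases."""
    counts = Counter()
    for line in lines:
        for fragment, case in CASES.items():
            if fragment in line:
                counts[case] += 1
                break  # a line matches at most one case
    return counts


if __name__ == "__main__":
    # Sample lines taken from the quoted log; a real run would iterate
    # over the daemon's log file instead.
    sample = [
        "Aug  1 16:15:16 02[CFG] passive HA IKE_SA to update not found",
        "Aug  1 16:15:16 02[CHD] IKE_SA for HA CHILD_SA not found",
        "Aug  1 16:15:16 02[CHD] <11> HA is missing nodes child configuration",
    ]
    for case, number in sorted(count_ha_sync_errors(sample).items()):
        print(case, number)
```

If case 3 accounts for most of the lines, that would point at
configurations missing on the passive node rather than at IKE_SAs
that could not be checked out.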
When a passive node comes up, it requests an immediate resync
from the active node. The active node then pushes all established
IKE_SAs (from ha_cache) to the passive node. I've seen the sync
fail in cases where the configs were not identical.
Maybe there is a race condition where the resync is faster than
your backend loading the configs? In that case 'stroke statusall'
should list a lot of (unnamed) IKE_SAs, the ones that were not
synced properly.

Cheers,

Thomas

