[strongSwan-dev] Error handling in the HA plugin

Emeric POUPON emeric.poupon at stormshield.eu
Thu Jun 16 18:49:26 CEST 2016


The HA plugin uses sockets to communicate with the other members of the HA cluster.
The problem is that if there is a transmission error we end up in a desynchronized state, which may be very difficult to recover from.

Actually, we have a modified version of the HA plugin that uses corosync instead of sockets, but the problem remains the same, even if it is mitigated.

The question is: how can we automatically recover from such a situation?
I was thinking about sending the non-responsible nodes a FLUSH message so that they clean up everything, and then making them respond with a RESYNC message.
The problem is that in ha_socket we have no clue about segments, messages and responsibilities... Maybe we would need a new event?
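
To make the idea more concrete, here is a minimal toy sketch in C of the FLUSH/RESYNC exchange. It is not the plugin's code: node_t, msg_type_t, passive_handle(), active_handle_resync() and recover() are all hypothetical names, state is reduced to a list of SA identifiers, and messages are plain function calls instead of going through ha_socket or corosync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message types for this sketch; not the plugin's real enum. */
typedef enum { MSG_SA_ADD, MSG_FLUSH, MSG_RESYNC } msg_type_t;

#define MAX_SAS 8

/* Toy node: the set of SA identifiers it currently knows about. */
typedef struct {
    int sas[MAX_SAS];
    size_t count;
} node_t;

/* Non-responsible side: apply one message.
 * Returns true when the node wants to answer with a RESYNC request. */
static bool passive_handle(node_t *passive, msg_type_t type, int sa_id)
{
    switch (type) {
    case MSG_SA_ADD:
        if (passive->count < MAX_SAS)
            passive->sas[passive->count++] = sa_id;
        return false;
    case MSG_FLUSH:
        /* Drop every mirrored SA, then ask for a full resync. */
        passive->count = 0;
        return true;
    default:
        return false;
    }
}

/* Responsible side: on RESYNC, replay every SA it owns. */
static void active_handle_resync(const node_t *active, node_t *passive)
{
    assert(active->count <= MAX_SAS);
    for (size_t i = 0; i < active->count; i++)
        passive_handle(passive, MSG_SA_ADD, active->sas[i]);
}

/* Recovery after a detected transmission error: FLUSH, then RESYNC replay. */
static void recover(const node_t *active, node_t *passive)
{
    if (passive_handle(passive, MSG_FLUSH, 0))
        active_handle_resync(active, passive);
}
```

With an active node holding SAs {1, 2, 3} and a desynchronized passive node holding {1, 9}, calling recover(&active, &passive) leaves the passive node with {1, 2, 3}. The sketch assumes the sender already knows which node is responsible, which is exactly the information that is missing in ha_socket.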

What do you think?


