[strongSwan-dev] Error handling in the HA plugin
emeric.poupon at stormshield.eu
Thu Jun 16 18:49:26 CEST 2016
The HA plugin uses sockets to communicate with the other members of the HA cluster.
The problem is that if a transmission error occurs, the nodes end up in a desynchronized state, which can be very difficult to recover from.
We actually use a modified version of the HA plugin that relies on corosync instead of sockets, but the problem remains the same, even if it is mitigated.
The question is: how can we automatically recover from such a situation?
I was thinking about sending a FLUSH message to the non-responsible nodes so that they clean up everything and then respond with a RESYNC message.
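A minimal sketch of the proposed FLUSH/RESYNC exchange, in C since that is what strongSwan is written in. Everything here is hypothetical: the message names, `node_t`, `passive_handle()` and `active_recover()` are illustrative and not taken from the HA plugin, the synchronized state is reduced to a list of SA ids, and the transport (sockets or corosync) is replaced by direct function calls so the exchange runs synchronously.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types for illustration only; FLUSH is the proposed
 * new message and none of these names come from the HA plugin sources. */
typedef enum { MSG_SA_ADD, MSG_FLUSH, MSG_RESYNC, MSG_NONE } msg_type_t;

typedef struct {
    msg_type_t type;
    int sa_id;                 /* only meaningful for MSG_SA_ADD */
} msg_t;

#define MAX_SAS 16

/* Minimal per-node view of the synchronized state: a list of SA ids. */
typedef struct {
    int sas[MAX_SAS];
    size_t count;
} node_t;

/* Non-responsible node: apply a message and return the reply to send back
 * (MSG_NONE if no reply is needed). */
static msg_t passive_handle(node_t *n, msg_t m)
{
    msg_t reply = { MSG_NONE, -1 };

    switch (m.type) {
    case MSG_SA_ADD:
        assert(n->count < MAX_SAS);
        n->sas[n->count++] = m.sa_id;
        break;
    case MSG_FLUSH:
        /* Local state may be inconsistent: drop all of it, then ask for a resync. */
        n->count = 0;
        reply.type = MSG_RESYNC;
        break;
    default:
        break;
    }
    return reply;
}

/* Responsible node: after a transmission error, send FLUSH; when the peer
 * answers with RESYNC, replay every SA this node owns. */
static void active_recover(const node_t *active, node_t *passive)
{
    msg_t flush = { MSG_FLUSH, -1 };
    msg_t reply = passive_handle(passive, flush);

    if (reply.type == MSG_RESYNC) {
        for (size_t i = 0; i < active->count; i++) {
            msg_t add = { MSG_SA_ADD, active->sas[i] };
            passive_handle(passive, add);
        }
    }
}
```

For example, if the responsible node owns SAs 1, 2 and 3 but the update for SA 2 was lost, the non-responsible node holds only 1 and 3; after `active_recover()` it holds all three again. A real implementation would also have to cope with a RESYNC that itself gets lost, and with new updates arriving while the replay is in progress.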
The problem is that ha_socket has no knowledge of segments, messages or responsibilities, so maybe we would need a new event?
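One way to keep ha_socket unaware of segments would be to have it only report send failures through a callback, leaving the decision to a layer that does know which segments the node is responsible for. The sketch assumes such a hook; `ha_send_error_cb_t`, `sock_layer_t`, `segment_layer_t` and the functions are hypothetical names, not existing strongSwan APIs, and segment ownership is reduced to a bitmask.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: the socket layer only reports that a send failed. */
typedef void (*ha_send_error_cb_t)(void *ctx, int err);

typedef struct {
    ha_send_error_cb_t on_error;   /* registered by a higher layer */
    void *ctx;                     /* passed back to on_error */
} sock_layer_t;

/* Hypothetical higher layer that knows about segment responsibilities. */
typedef struct {
    unsigned segments_owned;       /* bit i set: responsible for segment i+1 */
    int flush_pending;             /* set when peers should be sent a FLUSH */
} segment_layer_t;

/* Called by the socket layer when sending a message fails. */
static void sock_report_error(sock_layer_t *s, int err)
{
    if (s->on_error != NULL) {
        s->on_error(s->ctx, err);
    }
}

/* Only a node responsible for at least one segment schedules a FLUSH,
 * since it holds the authoritative state to resync from. */
static void segment_on_send_error(void *ctx, int err)
{
    segment_layer_t *seg = ctx;

    (void)err;  /* the error code could be logged here */
    if (seg->segments_owned != 0) {
        seg->flush_pending = 1;
    }
}

/* Wire the segment-aware layer into the socket layer's error hook. */
static void segment_register(segment_layer_t *seg, sock_layer_t *s)
{
    assert(seg != NULL && s != NULL);
    s->on_error = segment_on_send_error;
    s->ctx = seg;
}
```

With this split, the socket (or corosync) layer stays generic, and only the segment-aware layer decides whether a failed send should trigger the FLUSH/RESYNC recovery.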
What do you think?