[strongSwan] help debug reconnect after idle timeout

Mon Nov 12 15:24:15 CET 2018

Hi folks,

Long time listener, first time caller. I've been a happy strongSwan user
for years. I recently moved my gateways to the latest Linux 4.14.77-80 and
strongSwan U5.7.1, and now if I leave my connection idle, it can never pass
traffic again until I restart the connection.

I have two gateways connecting to a Check Point VPN concentrator I don't
manage. Both use an identical configuration and a bone-stock default
strongswan.conf. From one of my gateways I constantly ping the remote
gateway, and that tunnel has been up for more than a day. The other I let
idle, and now I can't reach the remote IPs from it any more.

My ipsec.conf is basic:

config setup
      uniqueids = yes
conn %default
      inactive=15m
      ikelifetime=1h
      lifetime=31m
      margintime=3m
      rekeyfuzz=100%
      rekey=yes
conn REMOTE
      authby=psk
      auto=route
      keyexchange=ikev1

What detail would help troubleshoot this?

ip route list table 220 is the same on the gw that is currently working and
the gw that has timed out.

/proc/net/xfrm_stat is all zeroes except XfrmOutNoStates, which increases
with every ping I send. None of those pings ever gets a response.
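
A minimal sketch for watching those counters, assuming the usual
"name value" layout of /proc/net/xfrm_stat (one counter per line). The
helper names parse_xfrm_stat, changed_counters and read_xfrm_stat are
hypothetical.

```python
def parse_xfrm_stat(text):
    """Parse the content of /proc/net/xfrm_stat into a {name: value} dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Expect exactly "CounterName <integer>"; skip anything else.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def changed_counters(before, after):
    """Return {name: delta} for every counter that differs between two samples."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def read_xfrm_stat(path="/proc/net/xfrm_stat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_xfrm_stat(f.read())
```

Calling read_xfrm_stat() before and after sending a few pings and passing
both results to changed_counters() should show only XfrmOutNoStates moving
in the situation described above.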

ip xfrm state:

WORKING:

src LOCAL dst REMOTEGW
proto esp spi 0xff09c486 reqid 1 mode tunnel
replay-window 0 flag af-unspec
auth-trunc hmac(sha256) 0xREDACTED 128
enc cbc(aes) 0xREDACTED
anti-replay context: seq 0x0, oseq 0x2, bitmap 0x00000000
src REMOTEGW dst LOCAL
proto esp spi 0xc757e42f reqid 1 mode tunnel
replay-window 32 flag af-unspec
auth-trunc hmac(sha256) 0xREDACTED 128
enc cbc(aes) 0xREDACTED
anti-replay context: seq 0x2, oseq 0x0, bitmap 0x00000003

TIMED OUT:

src LOCAL dst REMOTEGW
proto esp spi 0x00000000 reqid 1 mode tunnel
replay-window 0
anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
sel src LOCALIP/32 dst REMOTEIP/32 proto udp sport 33053 dport 1025 dev eth0
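
The difference between the two dumps can be checked mechanically. Below is
a small sketch that parses `ip xfrm state` output and flags entries whose
SPI is zero. As far as I understand, such an entry is a placeholder
(acquire) state the kernel keeps while it waits for the IKE daemon to
negotiate a real SA, but treat that as an assumption. The parser assumes
each state starts with a "src ... dst ..." line, as in the output above;
the function names are hypothetical.

```python
import re

SPI_RE = re.compile(r"\bspi (0x[0-9a-fA-F]+)")


def parse_xfrm_states(text):
    """Split `ip xfrm state` output into one dict per state."""
    states = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        # A new state begins with "src <addr> dst <addr>".
        if line.startswith("src ") and len(fields) >= 4:
            states.append({"src": fields[1], "dst": fields[3], "spi": None})
        elif states and states[-1]["spi"] is None:
            match = SPI_RE.search(line)
            if match:
                states[-1]["spi"] = int(match.group(1), 16)
    return states


def zero_spi_states(states):
    """Return the states whose SPI is 0, i.e. no negotiated SA behind them."""
    return [s for s in states if s["spi"] == 0]
```

Fed the WORKING dump, zero_spi_states() returns an empty list. Fed the
TIMED OUT dump, it returns the single LOCAL to REMOTEGW entry.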

Comparison of working and idled out statuses:

WORKING:

Routed Connections:
 REMOTE{1}:  ROUTED, TUNNEL, reqid 1
 REMOTE{1}:   LOCALSUBNET/32 === REMOTESUBNET/32
Security Associations (1 up, 0 connecting):
 REMOTE[49]: ESTABLISHED 9 minutes ago, redacted
 REMOTE{66}:  INSTALLED, TUNNEL, reqid 1, ESP SPIs: c894097b_i ed7e44f7_o
 REMOTE{66}:   LOCALSUBNET/32 === REMOTESUBNET/32

TIMED OUT:

Routed Connections:
 REMOTE{1}:  ROUTED, TUNNEL, reqid 1
 REMOTE{1}:   LOCALSUBNET/32 === REMOTESUBNET/32
Security Associations (1 up, 0 connecting):
 REMOTE[16]: ESTABLISHED 6 minutes ago, redacted
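
The visible difference is that the timed-out gateway has an ESTABLISHED
IKE_SA but no INSTALLED CHILD_SA. A sketch of a check for that condition,
assuming the status output keeps the "NAME[n]: ESTABLISHED" and
"NAME{n}:  INSTALLED" line shapes shown above (the function name is
hypothetical):

```python
import re

# "REMOTE[49]: ESTABLISHED 9 minutes ago, ..."
IKE_RE = re.compile(r"^\s*(\S+?)\[\d+\]:\s+ESTABLISHED")
# "REMOTE{66}:  INSTALLED, TUNNEL, reqid 1, ..."
CHILD_RE = re.compile(r"^\s*(\S+?)\{\d+\}:\s+INSTALLED")


def conns_missing_child(status_text):
    """Return connection names with an ESTABLISHED IKE_SA but no INSTALLED CHILD_SA."""
    established, installed = set(), set()
    for line in status_text.splitlines():
        match = IKE_RE.match(line)
        if match:
            established.add(match.group(1))
        match = CHILD_RE.match(line)
        if match:
            installed.add(match.group(1))
    return sorted(established - installed)
```

A watchdog could run this against the status output and restart the
connection whenever the returned list is non-empty, though that only works
around the problem rather than explaining it.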

I have charon debug running with most everything set to 2. Here are some
state changes from the gateway that timed out:

08:13 15[IKE] <15> IKE_SA (unnamed)[15] state change: CREATED => CONNECTING
08:13 16[IKE] <REMOTE|15> IKE_SA REMOTE[15] state change: CONNECTING => ESTABLISHED
08:13 06[CHD] <REMOTE|14> CHILD_SA REMOTE{7} state change: CREATED => DESTROYING
08:13 06[IKE] <REMOTE|14> IKE_SA REMOTE[14] state change: ESTABLISHED => DELETING
08:13 06[IKE] <REMOTE|14> IKE_SA REMOTE[14] state change: DELETING => DESTROYING

08:28 16[IKE] <16> IKE_SA (unnamed)[16] state change: CREATED => CONNECTING
08:28 06[IKE] <REMOTE|16> IKE_SA REMOTE[16] state change: CONNECTING => ESTABLISHED
08:28 07[CHD] <REMOTE|15> CHILD_SA REMOTE{8} state change: CREATED => DESTROYING
08:28 07[IKE] <REMOTE|15> IKE_SA REMOTE[15] state change: ESTABLISHED => DELETING
08:28 07[IKE] <REMOTE|15> IKE_SA REMOTE[15] state change: DELETING => DESTROYING

I'm guessing my side thinks the tunnel is up while the remote side thinks
it is down. How can I get it to reset automatically in this case?

Thanks in advance!