[strongSwan] NAT-T, SNAT/DNAT and TCP checksum incorrect on peer VPN gateway (site-to-site)
Narendra Joshi
narendraj9 at gmail.com
Wed Apr 22 21:27:20 CEST 2020
Hi,
Yes, you were right. The checksum errors appeared only on packets sent
out of the instance (it is probably doing checksum offloading). I
changed the `TCPMSS` values on the IPsec gateway router following the
instructions on the wiki, but I still see the same intermittent
connect timeouts. I see a lot of retransmissions of "SYN" and
"SYN-ACK" packets during the connect phase, which is when the MSS is
exchanged. Here is a tcpdump of sample requests as observed from one
of the nodes in the remote subnet of the tunnel:
I have added rules to the FORWARD chain to set the MSS to 1000 for
packets forwarded in both directions, but it looks like it was changed
in only one direction. In the other direction it is still 1460.
TCPDUMP of a request to this host that succeeded:
```
19:17:36.172560 IP (tos 0x0, ttl 62, id 21997, offset 0, flags [DF],
proto TCP (6), length 60)
172.16.0.10.45116 > 10.132.0.3.https: Flags [S], cksum 0xe9cd
(correct), seq 3064550850, win 64944, options [mss 984,sackOK,TS val
2372558830 ecr 0,nop,wscale 7], length 0
19:17:36.172654 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45116: Flags [S.], cksum 0xb6cf
(incorrect -> 0x9ba2), seq 3686463638, ack 3064550851, win 28960,
options [mss 1460,sackOK,TS val 1664262474 ecr 2372558830,nop,wscale
7], length 0
19:17:36.190859 IP (tos 0x0, ttl 62, id 21998, offset 0, flags [DF],
proto TCP (6), length 52)
172.16.0.10.45116 > 10.132.0.3.https: Flags [.], cksum 0x3980
(correct), seq 1, ack 1, win 508, options [nop,nop,TS val 2372558849
ecr 1664262474], length 0
19:17:36.191817 IP (tos 0x0, ttl 62, id 21999, offset 0, flags [DF],
proto TCP (6), length 569)
<----- More [P.] [.] message ------>
172.16.0.10.45116 > 10.132.0.3.https: Flags [P.], cksum 0xbad8
(correct), seq 888:919, ack 1704, win 505, options [nop,nop,TS val
2372558914 ecr 1664262490], length 31
19:17:36.255797 IP (tos 0x0, ttl 62, id 22010, offset 0, flags [DF],
proto TCP (6), length 52)
172.16.0.10.45116 > 10.132.0.3.https: Flags [F.], cksum 0x2ef4
(correct), seq 919, ack 1704, win 505, options [nop,nop,TS val
2372558914 ecr 1664262490], length 0
19:17:36.255871 IP (tos 0x0, ttl 63, id 23011, offset 0, flags [DF],
proto TCP (6), length 83)
10.132.0.3.https > 172.16.0.10.45116: Flags [P.], cksum 0xb6e6
(incorrect -> 0x2764), seq 1704:1735, ack 920, win 235, options
[nop,nop,TS val 1664262495 ecr 2372558914], length 31
19:17:36.255902 IP (tos 0x0, ttl 63, id 23012, offset 0, flags [DF],
proto TCP (6), length 52)
10.132.0.3.https > 172.16.0.10.45116: Flags [F.], cksum 0xb6c7
(incorrect -> 0x2fdd), seq 1735, ack 920, win 235, options [nop,nop,TS
val 1664262495 ecr 2372558914], length 0
19:17:36.272869 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto
TCP (6), length 40)
172.16.0.10.45116 > 10.132.0.3.https: Flags [R], cksum 0x3744
(correct), seq 3064551770, win 0, length 0
19:17:36.272892 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto
TCP (6), length 40)
172.16.0.10.45116 > 10.132.0.3.https: Flags [R], cksum 0x3744
(correct), seq 3064551770, win 0, length 0
```
TCPDUMP of a request to this host that timed out on connect. SYNs
keep arriving and SYN-ACKs are sent in reply, but they never reach
the other side:
```
19:17:37.335765 IP (tos 0x0, ttl 62, id 5467, offset 0, flags [DF],
proto TCP (6), length 60)
172.16.0.10.45118 > 10.132.0.3.https: Flags [S], cksum 0x1176
(correct), seq 3000642907, win 64944, options [mss 984,sackOK,TS val
2372559994 ecr 0,nop,wscale 7], length 0
19:17:37.335857 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45118: Flags [S.], cksum 0xb6cf
(incorrect -> 0x9ebe), seq 1985380708, ack 3000642908, win 28960,
options [mss 1460,sackOK,TS val 1664262765 ecr 2372559994,nop,wscale
7], length 0
19:17:37.353222 IP (tos 0x0, ttl 62, id 5468, offset 0, flags [DF],
proto TCP (6), length 52)
172.16.0.10.45118 > 10.132.0.3.https: Flags [.], cksum 0x3c9d
(correct), seq 1, ack 1, win 508, options [nop,nop,TS val 2372560012
ecr 1664262765], length 0
19:17:37.354233 IP (tos 0x0, ttl 62, id 5469, offset 0, flags [DF],
proto TCP (6), length 569)
172.16.0.10.45118 > 10.132.0.3.https: Flags [P.], cksum 0x602d
(correct), seq 1:518, ack 1, win 508, options [nop,nop,TS val
2372560013 ecr 1664262765], length 517
19:17:37.354272 IP (tos 0x0, ttl 63, id 8980, offset 0, flags [DF],
proto TCP (6), length 52)
10.132.0.3.https > 172.16.0.10.45118: Flags [.], cksum 0xb6c7
(incorrect -> 0x3ba4), seq 1, ack 518, win 235, options [nop,nop,TS
val 1664262769 ecr 2372560013], length 0
19:17:37.357210 IP (tos 0x0, ttl 63, id 8981, offset 0, flags [DF],
proto TCP (6), length 1286)
10.132.0.3.https > 172.16.0.10.45118: Flags [P.], cksum 0xbb99
(incorrect -> 0x3d86), seq 1:1235, ack 518, win 235, options
[nop,nop,TS val 1664262770 ecr 2372560013], length 1234
19:17:37.374045 IP (tos 0x0, ttl 62, id 5470, offset 0, flags [DF],
proto TCP (6), length 52)
172.16.0.10.45118 > 10.132.0.3.https: Flags [.], cksum 0x35b6
(correct), seq 518, ack 1235, win 499, options [nop,nop,TS val
2372560032 ecr 1664262770], length 0
19:17:37.374484 IP (tos 0x0, ttl 62, id 5471, offset 0, flags [DF],
proto TCP (6), length 145)
172.16.0.10.45118 > 10.132.0.3.https: Flags [P.], cksum 0xdc2b
(correct), seq 518:611, ack 1235, win 505, options [nop,nop,TS val
2372560033 ecr 1664262770], length 93
19:17:37.374648 IP (tos 0x0, ttl 63, id 8983, offset 0, flags [DF],
proto TCP (6), length 103)
10.132.0.3.https > 172.16.0.10.45118: Flags [P.], cksum 0xb6fa
(incorrect -> 0x4b87), seq 1235:1286, ack 611, win 235, options
[nop,nop,TS val 1664262774 ecr 2372560033], length 51
19:17:37.374791 IP (tos 0x0, ttl 63, id 8984, offset 0, flags [DF],
proto TCP (6), length 114)
10.132.0.3.https > 172.16.0.10.45118: Flags [P.], cksum 0xb705
(incorrect -> 0x84b7), seq 1286:1348, ack 611, win 235, options
[nop,nop,TS val 1664262774 ecr 2372560033], length 62
19:17:37.391790 IP (tos 0x0, ttl 62, id 5472, offset 0, flags [DF],
proto TCP (6), length 105)
172.16.0.10.45118 > 10.132.0.3.https: Flags [P.], cksum 0x26cc
(correct), seq 611:664, ack 1348, win 505, options [nop,nop,TS val
2372560050 ecr 1664262774], length 53
19:17:37.391809 IP (tos 0x0, ttl 62, id 5473, offset 0, flags [DF],
proto TCP (6), length 108)
172.16.0.10.45118 > 10.132.0.3.https: Flags [P.], cksum 0x43a0
(correct), seq 664:720, ack 1348, win 505, options [nop,nop,TS val
2372560050 ecr 1664262774], length 56
19:17:37.391817 IP (tos 0x0, ttl 62, id 5474, offset 0, flags [DF],
proto TCP (6), length 94)
172.16.0.10.45118 > 10.132.0.3.https: Flags [P.], cksum 0x7ed0
(correct), seq 720:762, ack 1348, win 505, options [nop,nop,TS val
2372560050 ecr 1664262774], length 42
19:17:37.391821 IP (tos 0x0, ttl 62, id 5475, offset 0, flags [DF],
proto TCP (6), length 140)
172.16.0.10.45118 > 10.132.0.3.https: Flags [P.], cksum 0x845d
(correct), seq 762:850, ack 1348, win 505, options [nop,nop,TS val
2372560050 ecr 1664262774], length 88
19:17:37.391824 IP (tos 0x0, ttl 62, id 5476, offset 0, flags [DF],
proto TCP (6), length 90)
172.16.0.10.45118 > 10.132.0.3.https: Flags [P.], cksum 0x48a7
(correct), seq 850:888, ack 1348, win 505, options [nop,nop,TS val
2372560050 ecr 1664262774], length 38
19:17:37.391928 IP (tos 0x0, ttl 63, id 8985, offset 0, flags [DF],
proto TCP (6), length 52)
10.132.0.3.https > 172.16.0.10.45118: Flags [.], cksum 0xb6c7
(incorrect -> 0x34c0), seq 1348, ack 888, win 235, options [nop,nop,TS
val 1664262779 ecr 2372560050], length 0
19:17:37.392023 IP (tos 0x0, ttl 63, id 8986, offset 0, flags [DF],
proto TCP (6), length 94)
10.132.0.3.https > 172.16.0.10.45118: Flags [P.], cksum 0xb6f1
(incorrect -> 0x0c15), seq 1348:1390, ack 888, win 235, options
[nop,nop,TS val 1664262779 ecr 2372560050], length 42
19:17:37.392126 IP (tos 0x0, ttl 63, id 8987, offset 0, flags [DF],
proto TCP (6), length 90)
10.132.0.3.https > 172.16.0.10.45118: Flags [P.], cksum 0xb6ed
(incorrect -> 0x753b), seq 1390:1428, ack 888, win 235, options
[nop,nop,TS val 1664262779 ecr 2372560050], length 38
19:17:37.397977 IP (tos 0x0, ttl 63, id 8988, offset 0, flags [DF],
proto TCP (6), length 231)
10.132.0.3.https > 172.16.0.10.45118: Flags [P.], cksum 0xb77a
(incorrect -> 0x45d8), seq 1428:1607, ack 888, win 235, options
[nop,nop,TS val 1664262780 ecr 2372560050], length 179
19:17:37.398053 IP (tos 0x0, ttl 63, id 8989, offset 0, flags [DF],
proto TCP (6), length 149)
10.132.0.3.https > 172.16.0.10.45118: Flags [P.], cksum 0xb728
(incorrect -> 0xbb19), seq 1607:1704, ack 888, win 235, options
[nop,nop,TS val 1664262780 ecr 2372560050], length 97
19:17:37.408804 IP (tos 0x0, ttl 62, id 5477, offset 0, flags [DF],
proto TCP (6), length 52)
172.16.0.10.45118 > 10.132.0.3.https: Flags [.], cksum 0x3351
(correct), seq 888, ack 1428, win 505, options [nop,nop,TS val
2372560067 ecr 1664262779], length 0
19:17:37.414814 IP (tos 0x0, ttl 62, id 5478, offset 0, flags [DF],
proto TCP (6), length 52)
172.16.0.10.45118 > 10.132.0.3.https: Flags [.], cksum 0x3236
(correct), seq 888, ack 1704, win 505, options [nop,nop,TS val
2372560073 ecr 1664262780], length 0
19:17:37.414839 IP (tos 0x0, ttl 62, id 5479, offset 0, flags [DF],
proto TCP (6), length 83)
172.16.0.10.45118 > 10.132.0.3.https: Flags [P.], cksum 0x2941
(correct), seq 888:919, ack 1704, win 505, options [nop,nop,TS val
2372560073 ecr 1664262780], length 31
19:17:37.414940 IP (tos 0x0, ttl 62, id 5480, offset 0, flags [DF],
proto TCP (6), length 52)
172.16.0.10.45118 > 10.132.0.3.https: Flags [F.], cksum 0x3216
(correct), seq 919, ack 1704, win 505, options [nop,nop,TS val
2372560073 ecr 1664262780], length 0
19:17:37.414948 IP (tos 0x0, ttl 63, id 8990, offset 0, flags [DF],
proto TCP (6), length 83)
10.132.0.3.https > 172.16.0.10.45118: Flags [P.], cksum 0xb6e6
(incorrect -> 0xd162), seq 1704:1735, ack 919, win 235, options
[nop,nop,TS val 1664262784 ecr 2372560073], length 31
19:17:37.414983 IP (tos 0x0, ttl 63, id 8991, offset 0, flags [DF],
proto TCP (6), length 52)
10.132.0.3.https > 172.16.0.10.45118: Flags [F.], cksum 0xb6c7
(incorrect -> 0x3301), seq 1735, ack 919, win 235, options [nop,nop,TS
val 1664262784 ecr 2372560073], length 0
19:17:37.414994 IP (tos 0x0, ttl 63, id 8992, offset 0, flags [DF],
proto TCP (6), length 52)
10.132.0.3.https > 172.16.0.10.45118: Flags [.], cksum 0xb6c7
(incorrect -> 0x3300), seq 1736, ack 920, win 235, options [nop,nop,TS
val 1664262784 ecr 2372560073], length 0
19:17:37.431812 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto
TCP (6), length 40)
172.16.0.10.45118 > 10.132.0.3.https: Flags [R], cksum 0x6378
(correct), seq 3000643827, win 0, length 0
19:17:38.368902 IP (tos 0x0, ttl 62, id 42541, offset 0, flags [DF],
proto TCP (6), length 60)
172.16.0.10.45120 > 10.132.0.3.https: Flags [S], cksum 0x960b
(correct), seq 3412102196, win 64944, options [mss 984,sackOK,TS val
2372561027 ecr 0,nop,wscale 7], length 0
19:17:38.368992 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0x0eea), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664263023 ecr 2372561027,nop,wscale
7], length 0
19:17:39.385347 IP (tos 0x0, ttl 62, id 42542, offset 0, flags [DF],
proto TCP (6), length 60)
172.16.0.10.45120 > 10.132.0.3.https: Flags [S], cksum 0x9213
(correct), seq 3412102196, win 64944, options [mss 984,sackOK,TS val
2372562043 ecr 0,nop,wscale 7], length 0
19:17:39.385421 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0x0dec), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664263277 ecr 2372561027,nop,wscale
7], length 0
19:17:40.387609 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0x0cf1), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664263528 ecr 2372561027,nop,wscale
7], length 0
19:17:40.739601 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45114: Flags [S.], cksum 0xb6cf
(incorrect -> 0xe6b0), seq 2507646035, ack 2215775801, win 28960,
options [mss 1460,sackOK,TS val 1664263616 ecr 2372543764,nop,wscale
7], length 0
19:17:41.401367 IP (tos 0x0, ttl 62, id 42543, offset 0, flags [DF],
proto TCP (6), length 60)
172.16.0.10.45120 > 10.132.0.3.https: Flags [S], cksum 0x8a33
(correct), seq 3412102196, win 64944, options [mss 984,sackOK,TS val
2372564059 ecr 0,nop,wscale 7], length 0
19:17:41.401440 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0x0bf4), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664263781 ecr 2372561027,nop,wscale
7], length 0
19:17:43.427545 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0x09f9), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664264288 ecr 2372561027,nop,wscale
7], length 0
19:17:45.529443 IP (tos 0x0, ttl 62, id 42544, offset 0, flags [DF],
proto TCP (6), length 60)
172.16.0.10.45120 > 10.132.0.3.https: Flags [S], cksum 0x7a13
(correct), seq 3412102196, win 64944, options [mss 984,sackOK,TS val
2372568187 ecr 0,nop,wscale 7], length 0
19:17:45.529512 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0x07ec), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664264813 ecr 2372561027,nop,wscale
7], length 0
19:17:49.699616 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0x03d9), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664265856 ecr 2372561027,nop,wscale
7], length 0
19:17:53.721925 IP (tos 0x0, ttl 62, id 42545, offset 0, flags [DF],
proto TCP (6), length 60)
172.16.0.10.45120 > 10.132.0.3.https: Flags [S], cksum 0x5a13
(correct), seq 3412102196, win 64944, options [mss 984,sackOK,TS val
2372576379 ecr 0,nop,wscale 7], length 0
19:17:53.722008 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0xffeb), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664266861 ecr 2372561027,nop,wscale
7], length 0
19:17:56.867632 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45114: Flags [S.], cksum 0xb6cf
(incorrect -> 0xd6f0), seq 2507646035, ack 2215775801, win 28960,
options [mss 1460,sackOK,TS val 1664267648 ecr 2372543764,nop,wscale
7], length 0
19:18:01.731671 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0xf818), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664268864 ecr 2372561027,nop,wscale
7], length 0
19:18:17.859601 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45120: Flags [S.], cksum 0xb6cf
(incorrect -> 0xe858), seq 2448062776, ack 3412102197, win 28960,
options [mss 1460,sackOK,TS val 1664272896 ecr 2372561027,nop,wscale
7], length 0
```
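To compare the MSS advertised in each direction across a longer capture, the SYN and SYN-ACK options can be tallied with a short script. This is only a sketch: the function names (`split_records`, `mss_by_direction`) and the `SAMPLE` constant are made up for illustration, and the regular expressions assume `tcpdump -v` text output wrapped the way it is in the capture above.

```python
import re
from collections import defaultdict

# Matches the "src > dst: Flags [..]" part of a tcpdump -v record.
FLOW_RE = re.compile(r"(\S+) > (\S+): Flags \[([^\]]+)\]")
MSS_RE = re.compile(r"\bmss (\d+)")
# A new record starts with a timestamp such as "19:17:36.172560 IP ".
RECORD_START_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+ IP ")

# Three records copied from the capture above (SYN, SYN-ACK, ACK).
SAMPLE = """
19:17:36.172560 IP (tos 0x0, ttl 62, id 21997, offset 0, flags [DF],
proto TCP (6), length 60)
172.16.0.10.45116 > 10.132.0.3.https: Flags [S], cksum 0xe9cd
(correct), seq 3064550850, win 64944, options [mss 984,sackOK,TS val
2372558830 ecr 0,nop,wscale 7], length 0
19:17:36.172654 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto
TCP (6), length 60)
10.132.0.3.https > 172.16.0.10.45116: Flags [S.], cksum 0xb6cf
(incorrect -> 0x9ba2), seq 3686463638, ack 3064550851, win 28960,
options [mss 1460,sackOK,TS val 1664262474 ecr 2372558830,nop,wscale
7], length 0
19:17:36.190859 IP (tos 0x0, ttl 62, id 21998, offset 0, flags [DF],
proto TCP (6), length 52)
172.16.0.10.45116 > 10.132.0.3.https: Flags [.], cksum 0x3980
(correct), seq 1, ack 1, win 508, options [nop,nop,TS val 2372558849
ecr 1664262474], length 0
"""


def split_records(text):
    """Join wrapped lines so that each packet becomes one string."""
    records, current = [], []
    for line in text.splitlines():
        # A timestamp line closes the previous record, if there is one.
        if RECORD_START_RE.match(line) and current:
            records.append(" ".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        records.append(" ".join(current))
    return records


def mss_by_direction(text):
    """Map (src host, dst host, flags) to the MSS values seen on SYN and SYN-ACK packets."""
    seen = defaultdict(set)
    for record in split_records(text):
        flow = FLOW_RE.search(record)
        mss = MSS_RE.search(record)
        if flow and mss and "S" in flow.group(3):
            src, dst, flags = flow.groups()
            # Drop the trailing port so all connections between two hosts are grouped.
            key = (src.rsplit(".", 1)[0], dst.rsplit(".", 1)[0], flags)
            seen[key].add(int(mss.group(1)))
    return dict(seen)


if __name__ == "__main__":
    for (src, dst, flags), values in sorted(mss_by_direction(SAMPLE).items()):
        print(f"{src} -> {dst} [{flags}]: mss {sorted(values)}")
```

Run against the full capture, this would show at a glance whether every SYN arrives with the clamped value while every SYN-ACK still carries 1460. Keep in mind that a capture taken on the sending host shows its SYN-ACK before any gateway has had a chance to rewrite it, so a matching capture on the far side of the tunnel would be needed to confirm whether the clamp is applied in that direction.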
Thanks,
On Tue, Apr 21, 2020 at 11:37 PM Noel Kuntze
<noel.kuntze+strongswan-users-ml at thermi.consulting> wrote:
>
> Hello Narendra,
>
> There is no specific, dedicated tool, other than just trying large packets by, for example, using the -s flag for ping.
>
> No, MTU problems can not cause TCP checksum errors. That is likely a false lead. It might be caused by RX and TX checksum offloading though. Check the sizes first though and specifically, just getting google.com. That page is quite small and should work fine. Loading a picture from Instagram probably fails. PMTUD didn't work with Instagram's CDN last time I checked.
>
> Kind regards
>
> Noel
>
> Am 21.04.20 um 22:39 schrieb Narendra Joshi:
> > Noel Kuntze <noel.kuntze+strongswan-users-ml at thermi.consulting> writes:
> >
> >> Hi,
> >> Those are likely all false leads. It's likely to be an MTU/MSS problem, which is described on the wiki[1].
> > Thank you very much for the quick response. I will follow the instructions provided in the wiki.
> > Is there a tool that I can use to verify that it is MTU because of which there is a failure to connect? I noticed incorrect values for the TCP checksum on the host in the peer's subnet using `tcpdump`. Moreover, ICMP seems to be working without any packet loss at all. I can imagine that ICMP packets won't be large enough to reach the MTU value (probably). Can MTU cause TCP checksum failures? My networking knowledge is definitely limited here.
> >
> >> Kind regards
> >> Noel
> >> [1] https://wiki.strongswan.org/projects/strongswan/wiki/ForwardingAndSplitTunneling#MTUMSS-issues
> >> Am 21.04.20 um 20:38 schrieb Narendra Joshi:
> >>> Hi, I have set up an IPsec gateway on a virtual instance in a VPC using a cloud provider. The cloud provider has Elastic IPs that aren't attached to any network interface on the virtual instance, so strongSwan uses NAT-T. I also need to do SNAT/DNAT to map my side of the subnet that is advertised to my VPN peer. I have found that this setup causes very frequent TCP checksum failures. They are so frequent that an HTTP request fails ~50% of the time because the TCP connect times out. It would be great if anyone who has faced something similar before could help me understand what is happening and how it can be avoided. Here is an image of the setup I have: Best regards,
> >>
> >
>