[strongSwan] Services unreachable after first connection

Wed Jun 3 15:41:55 CEST 2020

Hi everyone, I just joined the ML. First of all, thank you for your
patience and help; I don't have much experience with VPNs in general and
this is the first time I have used strongSwan.

Recently I set up a test environment for a project whose objective was to
implement Kerberos SSO between one of our application servers (10.1.0.137,
an AWS EC2 instance which runs some J2EE applications) and the Active
Directory domain of one of our customers (10.128.4.15 and 10.128.4.16 are
the two domain controllers). After that, the application has to search for
some user attributes using AD as an LDAP directory.
To achieve this I set up a site-to-site IPsec VPN between our systems and
our customer's datacenter. On our side I used another EC2 instance as the
VPN endpoint (10.1.0.144, which is behind AWS NAT with public IP
74.74.74.74) running CentOS 7 and strongSwan 5.7.2. On our customer's side
I have no control or visibility; the only thing I know is that the VPN
endpoint should be a Fortinet appliance with a public IP
(217.217.217.217).

You can see the whole architecture in this PNG:
https://sc.burrfoot.it/vpn.png

The VPN setup went pretty smoothly:
- tunnel established (https://sc.burrfoot.it/strongswan.png)
- I added a static route on my application server so that it uses our VPN
endpoint as the gateway for the two domain controllers
- adjusted EC2 security groups to allow Kerberos and LDAP communication
(TCP and UDP 88 for Kerberos, TCP 389 for LDAP); on the other side our
customer's sysadmin did the same on his firewall.
- no masquerade rules on our VPN endpoint, because our customer allowed
requests from our application server's internal IP.

Everything seemed OK, and a quick test using nmap from the application
server (10.1.0.137) worked pretty well (https://sc.burrfoot.it/nmap.png),
but after some tests (some basic ldapsearch queries) I noticed that LDAP
did not respond anymore. So I tried the second domain controller and it
worked... and after that the second domain controller also stopped
responding.
At this point I ran another nmap test, which reported the port as filtered (
https://sc.burrfoot.it/nmap2.png).
After a couple of minutes I ran some more tests: LDAP seemed to be
reachable again and the queries went OK, but after a while TCP 389 became
unreachable again.

To investigate this strange behaviour I set up some basic TCP checks with
Nagios, which were OK most of the time, except when we ran some LDAP
queries: at that point port 389 seemed to close for a while and became
available again after a few minutes.
At first I thought the problem was related to some strange firewall
behaviour on our customer's side, because we don't have any security
appliance or tool on our side (only a basic EC2 security group), but before
asking our customer to do some checks I want to be sure that our strongSwan
configuration is OK and cannot be the cause of this problem.
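
The intermittent loss is easier to correlate with the LDAP queries when
every probe is timestamped. Below is a minimal sketch of such a probe in
Python. It is not the Nagios check itself: the names tcp_probe and watch,
the 5 second interval and the 3 second timeout are assumptions made for
illustration only.

```python
import socket
import time
from datetime import datetime


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. filtered ports)
        # and "no route to host" style errors.
        return False


def watch(targets, interval=5.0, rounds=None):
    """Probe every (host, port) pair in targets once per round and print one
    timestamped line per probe. rounds=None means run until interrupted."""
    done = 0
    while rounds is None or done < rounds:
        for host, port in targets:
            state = "open" if tcp_probe(host, port) else "unreachable"
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} {state}")
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Calling watch([("10.128.4.15", 389), ("10.128.4.16", 389)]) on the
application server while running the ldapsearch queries would give a
timeline that can be compared with a capture on the VPN endpoint.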

I also tried to capture some traffic on our strongSwan endpoint
(10.1.0.144), filtering for TCP port 389 and the ESP protocol. When I ran
an nmap test on port 389 while it was open, this was the result
--> https://sc.burrfoot.it/tcpdump1.png
When the port appeared closed I did not see a single packet passing
through, not even a SYN packet from our application server.
Checking the strongSwan tunnel status, I never saw a single disconnection;
everything seems very stable from a VPN point of view.

This is my strongSwan configuration. I know that some protocols are not the
best from a security point of view, but we had to follow our customer's
specifications.
---
conn aws-customer
    ikelifetime=1440m
    keylife=60m
    rekeymargin=3m
    keyingtries=3
    keyexchange=ikev1
    aggressive=no
    mobike=no
    ike=aes128-sha1-modp1536
    ike=aes256-sha256-modp1536
    esp=aes128-sha1-modp1536
    esp=aes256-sha256-modp1536
    left=10.1.0.144
    leftid=74.74.74.74
    leftsubnet=10.1.0.137/32
    leftauth=psk
    right=217.217.217.217
    rightid=217.217.217.217
    rightsubnet=10.128.4.0/26
    rightauth=psk
    type=tunnel
    auto=start
    dpdaction=restart
---
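
As a sanity check on the traffic selectors above, the following Python
sketch tests whether a given source/destination pair falls inside
leftsubnet and rightsubnet. It only mirrors the two subnet values from the
configuration; the helper name matches_selectors is made up for
illustration, and it does not query strongSwan or the kernel policies.

```python
import ipaddress

# Values copied from the conn section above.
LEFT_SUBNET = ipaddress.ip_network("10.1.0.137/32")
RIGHT_SUBNET = ipaddress.ip_network("10.128.4.0/26")


def matches_selectors(src, dst):
    """Return True if a packet from src to dst is covered by the
    leftsubnet -> rightsubnet pair, i.e. is expected to use the tunnel."""
    return (ipaddress.ip_address(src) in LEFT_SUBNET
            and ipaddress.ip_address(dst) in RIGHT_SUBNET)


# Application server towards each of the two domain controllers.
for dc in ("10.128.4.15", "10.128.4.16"):
    print(dc, matches_selectors("10.1.0.137", dc))
```

Both domain controllers are inside 10.128.4.0/26 (10.128.4.0 to
10.128.4.63), so traffic from 10.1.0.137 to either of them matches, while
traffic sourced from the endpoint itself (10.1.0.144) does not.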

Do you think this strange behaviour can be caused by our strongSwan
configuration?
Can you suggest some more in-depth tests to figure out why we have these
strange interruptions?
Do you have any other suggestions?

Thank you very much for any information.

Tas

---
*"Arguing that you don't care about the right to privacy because you have
nothing to hide is no different than saying you don't care about free
speech because you have nothing to say."*