[strongSwan-dev] charon deadlock involving three threads
Thomas Egerer
thomas.egerer at secunet.com
Thu Dec 22 14:49:23 CET 2011
Hello *,
I recently discovered a deadlock which can occur in charon. It was
caused by one of my own plugins but I managed to reproduce it using
an almost vanilla upstream charon version 4.6.1. Two patches are
attached to this mail:
Patch 0001 is to create a backend with a peer config which does not
have a child with the same name. I used stroke's backend for this
purpose.
Patch 0002 is simply to enlarge the window in which the deadlock can
occur and make it reproducible.
The deadlock occurs when the following happens (in the given order):
a) an IKE_SA is built and a thread is processing the IKE_AUTH request,
which can take a bit longer when a smartcard is involved. This
causes the ike_sa_manager to lock a particular IKE_SA exclusively.
b) an acquire is triggered, which read-locks the rwlock in the
trap_manager. The subsequent call to
ike_sa_manager->checkout_by_config has to wait until a) unlocks
its ike_sa.
c) a child_cfg contained in the peer_cfg belonging to the ike_sa
that a) has locked is routed. This locks the child configs of that
peer config, while the actual routing code within trap_manager
tries to write-lock its rwlock and has to wait for the read lock
held by b).
That's about it. As soon as a) finishes authenticating the peer
and tries to find a matching child sa, it will try to lock the child
configs of the peer config, which is not possible since they have
already been locked by c).
Thread | Resource locked                | Resource desired
-------+--------------------------------+--------------------------------
(a)    | ike_sa in ike_sa_manager       | child_cfgs of peer_cfg
       |                                |
(b)    | rwlock in trap_manager (read)  | ike_sa in ike_sa_manager
       |                                |
(c)    | child_cfgs of peer_cfg         | rwlock in trap_manager (write)
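
To make the cycle in the table explicit, the following is a minimal sketch that models the three threads as a wait-for graph and walks it until a thread repeats. It is an illustration only, not charon code: the thread labels and resource names come from the table, while `HOLDS`, `WANTS` and `find_cycle` are hypothetical names. The read and write sides of the trap_manager rwlock are treated as one resource, on the assumption that a writer has to wait for every reader.

```python
# Which resource each thread currently holds (from the table above).
HOLDS = {
    "a": "ike_sa",               # locked exclusively in ike_sa_manager
    "b": "trap_manager rwlock",  # held as a read lock
    "c": "child_cfgs",           # child configs of the peer_cfg
}

# Which resource each thread is blocked on.
WANTS = {
    "a": "child_cfgs",
    "b": "ike_sa",
    "c": "trap_manager rwlock",  # wanted as a write lock
}


def find_cycle(holds, wants):
    """Return the threads that form a wait-for cycle, or None."""
    owner = {resource: thread for thread, resource in holds.items()}
    # waits_for[t] is the thread holding the resource that t wants.
    waits_for = {t: owner.get(resource) for t, resource in wants.items()}
    for start in sorted(waits_for):
        seen = []
        thread = start
        while thread is not None and thread not in seen:
            seen.append(thread)
            thread = waits_for.get(thread)
        if thread is not None:
            # We came back to a thread already on the path: a cycle.
            return seen[seen.index(thread):]
    return None


cycle = find_cycle(HOLDS, WANTS)
print(" -> ".join(cycle + cycle[:1]))  # prints "a -> c -> b -> a"
```

Removing any one of the three edges breaks the cycle, which is why all three actions have to overlap for charon to lock up.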
Here are the configs used to reproduce the deadlock and the steps
required to lock charon up. The setup involves two boxes, psk1 and psk2.
ipsec.conf of psk1:
################################################
config setup
    charonstart=yes
    plutostart=no

conn %default
    keyexchange=ikev2
    authby=psk

conn psk_del
    left=192.168.178.1
    right=192.168.178.2
    auto=add
    type=tunnel

conn psk_keep
    left=192.168.178.1
    leftsubnet=192.168.178.1/32
    right=192.168.178.2
    rightsubnet=192.168.178.2/32
    auto=add
    type=tunnel

conn acquire_conn
    left=192.168.178.1
    leftsubnet=192.168.178.1/32
    right=192.168.178.3
    type=tunnel
    auto=route
################################################
ipsec.conf of psk2:
################################################
config setup
    charonstart=yes
    plutostart=no

conn %default
    keyexchange=ikev2
    authby=psk

conn psk
    left=192.168.178.2
    leftsubnet=192.168.178.2/32
    right=192.168.178.1
    rightsubnet=192.168.178.1/32
    type=tunnel
    auto=route
################################################
Steps to perform on psk1/psk2:
#psk1> ipsec stroke del psk_del
#psk2> ipsec stroke up psk
#psk1> ping 192.168.178.3
#psk1> ipsec stroke route psk_del
#psk1> touch /tmp/unlock_child_create
The chances of triggering this deadlock are pretty slim, given the
tiny time window and the low probability that all three actions
happen in this order at the same time.
Yet it's pretty easy to write code that routes/unroutes child_sas
while enumerating a peer_cfg's children. So I think you should
know.
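
As an illustration of that pattern, here is a minimal sketch of one way a caller could avoid holding the child config lock while routing: copy the children under the lock, release it, and only then route. This is not strongSwan code and not a proposed patch. `PeerCfg`, `snapshot_children` and `route_children` are hypothetical stand-ins, and the sketch assumes it is acceptable for the routing code to work on a slightly stale list of children.

```python
import threading


class PeerCfg:
    """Hypothetical stand-in for a peer_cfg that guards its child configs."""

    def __init__(self, children):
        self._lock = threading.Lock()
        self._children = list(children)

    def snapshot_children(self):
        # Hold the lock only long enough to copy the list.
        with self._lock:
            return list(self._children)


def route_children(peer_cfg, route):
    """Call route() for each child without holding the peer_cfg lock."""
    for child in peer_cfg.snapshot_children():
        # The lock has been released here, so route() is free to take
        # other locks (such as a trap manager's write lock) without
        # extending the chain of held locks.
        route(child)


peer = PeerCfg(["psk_del", "psk_keep"])
observed = []
route_children(peer, lambda child: observed.append((child, peer._lock.locked())))
print(observed)  # each entry records that the lock was not held during route()
```

The trade-off is that a child added or removed after the snapshot is not seen by the loop, so this only fits callers that can tolerate that.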
I can also provide the ISO-image in which I reproduced the
deadlock and a core dump of the deadlocked charon instance.
Cheers Thomas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Keep-peer-config-if-only-a-child-was-removed.patch
Type: text/x-diff
Size: 959 bytes
Desc: not available
URL: <http://lists.strongswan.org/pipermail/dev/attachments/20111222/292294b4/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Delay-child-creation-to-demonstrate-deadlock.patch
Type: text/x-diff
Size: 995 bytes
Desc: not available
URL: <http://lists.strongswan.org/pipermail/dev/attachments/20111222/292294b4/attachment-0001.patch>