[strongSwan] what could cause strongswan 4.3.2 to freeze up

Simon Chan simon.chan3 at yahoo.ca
Tue Dec 6 02:45:46 CET 2011


Hi all,

I posted this question last month and Martin Willi gave me the following advise:

>attach gdb and run thread apply all bt

I got lucky and caught the problem happening in our lab. It is a case 2 threads
in deadlock because they each hold the mutex needed by the other thread. The backtrace
is attached at the end. I am posting here again because I think this problem still exists
in the latest version of StrongSwan. If my analysis is correct than I hope someone will
take note to fix it. Also have a question about what's the mutex in charon's bus.c, function
vlog() trying to protect:

this->mutex->lock(this->mutex);
/* We use the remove() method to invoke listners... */
this->listeners->remove(this->listeners, &data, (void*)log_cb);
this->mutex->unlock(this->mutex);

Trace into the callback "log_cb()" it seems to be invoking log() to console, auth.log and syslog.
Is the mutex trying to prevent one log file having different order of log entries from another
log file? I mean:
- in /var/log/auth.log we have thread 2 do something ahead of thread 3.
- in /var/log/messages we have thread 3 do something ahead of thread 2.

If the above is all the mutex is trying to protect, then it would make my change simpler. Just
comment out the mutex->lock/mutex->unlock as the relative order of which thread log first is
not that important. After the mutex is released, who knows which thread get to run first.
For me, the less code change the better because I may break something.

The backtrace indicate the following events:
- thread 9 passed through child_state_change() in bus.c:406 and owned
private_bus_t->mutex
- thread 4 passed through build_address_list() in ike_mobike.c, called 
kernel_interface->create_address_enumerator and owned 
private_kernel_netlink_net_t->mutex
- thread 9 invoked kernel_interface->get_interface and have to wait for 
the private_kernel_netlink_net_t->mutex
- thread 4 called DBG2(DBG_ENC ,"added payload of type %N to message",...)
and have to wait for private_bus_t->mutex

I had a look at the bus.c in the latest StrongSwan version. The mutex is still
used in METHODs vlog, alert, ike_state_change, child_state_change, child_updown etc.
Then the same race condition in 4.3.2 can happen in 4.6.1.

Call stacks of the two contending threads:

(gdb) thread 9
[Switching to thread 9 (Thread 0xb303bb70 (LWP 2981))]
#0  0xb77e07c4 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0xb77e07c4 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0xb77dbe5d in _L_lock_748 () from /lib/libpthread.so.0
#2  0xb77dbc81 in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0xb7800190 in lock (this=0xb9072bf0) at utils/mutex.c:191
#4  0xb70ae191 in get_interface_name (this=0xb9072b38, ip=0xaed5b480) at kernel_netlink_net.c:641
#5  0xb784568f in get_interface (this=0xb9066f80, host=0xaed5b480) at kernel/kernel_interface.c:198
#6  0xb7112bc7 in updown (this=<optimized out>, 
ike_sa=<optimized out>, child_sa=0xb90bae90, up=true) at 
updown_listener.c:166
#7  0xb71131d5 in child_state_change (this=0xb9074b80, ike_sa=0xb908a8d8, child_sa=0xb90bae90, state=CHILD_INSTALLED)
    at updown_listener.c:288
#8  0xb78320ea in child_state_change (this=0xb9065b18, child_sa=0xb90bae90, state=CHILD_INSTALLED) at bus/bus.c:406
#9  0xb784b659 in set_state (this=0xb90bae90, state=CHILD_INSTALLED) at sa/child_sa.c:170
#10 0xb78573af in select_and_install (this=0xb90a2318, no_dh=<optimized out>) at sa/tasks/child_create.c:445
#11 0xb7857f0b in build_r (this=0xb90a2318, message=0xb9090ed0) at sa/tasks/child_create.c:909
#12 0xb78591b4 in build_r (this=0xb907e390, message=0xb9090ed0) at sa/tasks/child_rekey.c:200
#13 0xb7853e38 in build_response (request=<optimized out>, this=<optimized out>) at sa/task_manager.c:596
#14 process_request (message=<optimized out>, this=<optimized out>) at sa/task_manager.c:843
#15 process_message (this=0xb9082840, msg=0xb90a26e8) at sa/task_manager.c:858
#16 0xb784feb2 in process_message (this=0xb908a8d8, message=0xb90a26e8) at sa/ike_sa.c:1302
#17 0xb7847696 in execute (this=0xb90a64d8) at processing/jobs/process_message_job.c:78
#18 0xb7848c7a in process_jobs (this=0xb9068070) at processing/processor.c:123 
#19 0xb77d97b0 in start_thread () from /lib/libpthread.so.0

= = = = = = = = = = = = = = = = = = = = = = = = = = = 

(gdb) thread 4
[Switching to thread 4 (Thread 0xb083bb70 (LWP 2986))]
#0  0xb77e07c4 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0xb77e07c4 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0xb77dbe5d in _L_lock_748 () from /lib/libpthread.so.0
#2  0xb77dbc81 in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0xb7800190 in lock (this=0xb9065bc0) at utils/mutex.c:191
#4  0xb78002b1 in lock_r (this=0xb9065bcc) at utils/mutex.c:226
#5  0xb783223a in vlog (this=0xb9065b18, group=DBG_ENC, level=LEVEL_DIAG,
    format=0xb7867824 "added payload of type %N to message", args=0xb083b090 "\200\003\207\267)") at bus/bus.c:313
#6  0xb78322cd in log_ (this=0xb9065b18, group=DBG_ENC, level=LEVEL_DIAG,
    format=0xb7867824 "added payload of type %N to message") at bus/bus.c:331
#7  0xb783beb6 in add_payload (this=0xaed10160, payload=0xaef2c0c0) at encoding/message.c:787
#8  0xb783c729 in add_notify (this=0xaed10160, flush=false, type=ADDITIONAL_IP4_ADDRESS, data=...) at encoding/message.c:811
#9  0xb785f1f9 in build_address_list (this=<optimized out>, message=<optimized out>) at sa/tasks/ike_mobike.c:216
#10 0xb785fe01 in build_r (this=0xaed215c0, message=0xaed10160) at sa/tasks/ike_mobike.c:424
#11 0xb7853e38 in build_response (request=<optimized out>, this=<optimized out>) at sa/task_manager.c:596
#12 process_request (message=<optimized out>, this=<optimized out>) at sa/task_manager.c:843
#13 process_message (this=0x4, msg=0xb908a048) at sa/task_manager.c:858
#14 0xb784feb2 in process_message (this=0xaed0f0f0, message=0xb908a048) at sa/ike_sa.c:1302
#15 0xb7847696 in execute (this=0xb90744a0) at processing/jobs/process_message_job.c:78
#16 0xb7848c7a in process_jobs (this=0xb9068070) at processing/processor.c:123
#17 0xb77d97b0 in start_thread () from /lib/libpthread.so.0
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.strongswan.org/pipermail/users/attachments/20111205/d97b544b/attachment.html>


More information about the Users mailing list