<html><body><div style="color:#000; background-color:#fff; font-family:arial, helvetica, sans-serif;font-size:12pt">Hi all,<br><br><font size="3">I posted this question last month and Martin Willi gave me the following advise</font>:<br><pre>><i> attach gdb and run thread apply all bt<br></i><br>I got lucky and caught the problem happening in our lab. It is a case 2 threads<br>in deadlock because they each hold the mutex needed by the other thread. The backtrace<br>is attached at the end. I am posting here again because I think this problem still exists<br>in the latest version of StrongSwan. If my analysis is correct than I hope someone will<br>take note to fix it. Also have a question about what's the mutex in charon's bus.c, function<br>vlog() trying to protect:<br><br> this->mutex->lock(this->mutex);<br> /* We use the remove() method to invoke listners... */<br> this->listeners->remove(this->listeners, &data,
(void*)log_cb);<br> this->mutex->unlock(this->mutex);<br><br>Trace into the callback "log_cb()" it seems to be invoking log() to console, auth.log and syslog.<br>Is the mutex trying to prevent one log file having different order of log entries from another<br>log file? I mean:<br>- in /var/log/auth.log we have thread 2 do something ahead of thread 3.<br>- in /var/log/messages we have thread 3 do something ahead of thread 2.<br><br>If the above is all the mutex is trying to protect, then it would make my change simpler. Just<br>comment out the mutex->lock/mutex->unlock as the relative order of which thread log first is<br>not that important. After the mutex is released, who knows which thread get to run first.<br>For me, the less code change the better because I may break something.<br><br>The backtrace indicate the following events:<br>- thread 9 passed through child_state_change() in bus.c:406 and owned<br> private_bus_t->mutex<br>-
thread 4 passed through build_address_list() in ike_mobike.c, called <br> kernel_interface->create_address_enumerator and owned <br> private_kernel_netlink_net_t->mutex<br>- thread 9 invoked kernel_interface->get_interface and have to wait for <br> the private_kernel_netlink_net_t->mutex<br>- thread 4 called DBG2(DBG_ENC ,"added payload of type %N to message",...)<br> and have to wait for private_bus_t->mutex<br><br>I had a look at the bus.c in the latest StrongSwan version. The mutex is still<br>used in METHODs vlog, alert, ike_state_change, child_state_change, child_updown etc.<br>Then the same race condition in 4.3.2 can happen in 4.6.1.<br><br>Call stacks of the two contending threads:<br><br><font><font size="2">(gdb) thread 9<br>[Switching to thread 9 (Thread 0xb303bb70 (LWP 2981))]<br>#0 0xb77e07c4 in __lll_lock_wait () from /lib/libpthread.so.0<br>(gdb) bt<br>#0 0xb77e07c4 in __lll_lock_wait () from
/lib/libpthread.so.0<br>#1 0xb77dbe5d in _L_lock_748 () from /lib/libpthread.so.0<br>#2 0xb77dbc81 in pthread_mutex_lock () from /lib/libpthread.so.0<br>#3 0xb7800190 in lock (this=0xb9072bf0) at utils/mutex.c:191<br>#4 0xb70ae191 in get_interface_name (this=0xb9072b38, ip=0xaed5b480) at kernel_netlink_net.c:641<br>#5 0xb784568f in get_interface (this=0xb9066f80, host=0xaed5b480) at kernel/kernel_interface.c:198<br>#6 0xb7112bc7 in updown (this=<optimized out>, <br>ike_sa=<optimized out>, child_sa=0xb90bae90, up=true) at <br>updown_listener.c:166<br>#7 0xb71131d5 in child_state_change (this=0xb9074b80, ike_sa=0xb908a8d8, child_sa=0xb90bae90, state=CHILD_INSTALLED)<br> at updown_listener.c:288<br>#8 0xb78320ea in child_state_change (this=0xb9065b18, child_sa=0xb90bae90, state=CHILD_INSTALLED) at bus/bus.c:406<br>#9 0xb784b659 in set_state (this=0xb90bae90,
state=CHILD_INSTALLED) at sa/child_sa.c:170<br>#10 0xb78573af in select_and_install (this=0xb90a2318, no_dh=<optimized out>) at sa/tasks/child_create.c:445<br>#11 0xb7857f0b in build_r (this=0xb90a2318, message=0xb9090ed0) at sa/tasks/child_create.c:909<br>#12 0xb78591b4 in build_r (this=0xb907e390, message=0xb9090ed0) at sa/tasks/child_rekey.c:200<br>#13 0xb7853e38 in build_response (request=<optimized out>, this=<optimized out>) at sa/task_manager.c:596<br>#14 process_request (message=<optimized out>, this=<optimized out>) at sa/task_manager.c:843<br>#15 process_message (this=0xb9082840, msg=0xb90a26e8) at sa/task_manager.c:858<br>#16 0xb784feb2 in process_message (this=0xb908a8d8, message=0xb90a26e8) at sa/ike_sa.c:1302<br>#17 0xb7847696 in execute (this=0xb90a64d8) at processing/jobs/process_message_job.c:78<br>#18 0xb7848c7a in process_jobs (this=0xb9068070) at processing/processor.c:123</font></font> <br><font><font
size="2">#19 0xb77d97b0 in start_thread () from /lib/libpthread.so.0</font></font><br><br>= = = = = = = = = = = = = = = = = = = = = = = = = = = <br><br><font><font size="2">(gdb) thread 4<br>[Switching to thread 4 (Thread 0xb083bb70 (LWP 2986))]<br>#0 0xb77e07c4 in __lll_lock_wait () from /lib/libpthread.so.0<br>(gdb) bt<br>#0 0xb77e07c4 in __lll_lock_wait () from /lib/libpthread.so.0<br>#1 0xb77dbe5d in _L_lock_748 () from /lib/libpthread.so.0<br>#2 0xb77dbc81 in pthread_mutex_lock () from /lib/libpthread.so.0<br>#3 0xb7800190 in lock (this=0xb9065bc0) at utils/mutex.c:191<br>#4 0xb78002b1 in lock_r (this=0xb9065bcc) at utils/mutex.c:226<br>#5 0xb783223a in vlog (this=0xb9065b18, group=DBG_ENC, level=LEVEL_DIAG,<br> format=0xb7867824 "added payload of type %N to message", args=0xb083b090 "\200\003\207\267)") at bus/bus.c:313<br>#6 0xb78322cd in log_ (this=0xb9065b18, group=DBG_ENC,
level=LEVEL_DIAG,<br> format=0xb7867824 "added payload of type %N to message") at bus/bus.c:331<br>#7 0xb783beb6 in add_payload (this=0xaed10160, payload=0xaef2c0c0) at encoding/message.c:787<br>#8 0xb783c729 in add_notify (this=0xaed10160, flush=false, type=ADDITIONAL_IP4_ADDRESS, data=...) at encoding/message.c:811<br>#9 0xb785f1f9 in build_address_list (this=<optimized out>, message=<optimized out>) at sa/tasks/ike_mobike.c:216<br>#10 0xb785fe01 in build_r (this=0xaed215c0, message=0xaed10160) at sa/tasks/ike_mobike.c:424<br>#11 0xb7853e38 in build_response (request=<optimized out>, this=<optimized out>) at sa/task_manager.c:596<br>#12 process_request (message=<optimized out>, this=<optimized out>) at sa/task_manager.c:843<br>#13 process_message (this=0x4, msg=0xb908a048) at sa/task_manager.c:858<br>#14 0xb784feb2 in process_message (this=0xaed0f0f0, message=0xb908a048) at
sa/ike_sa.c:1302<br>#15 0xb7847696 in execute (this=0xb90744a0) at processing/jobs/process_message_job.c:78<br>#16 0xb7848c7a in process_jobs (this=0xb9068070) at processing/processor.c:123<br>#17 0xb77d97b0 in start_thread () from /lib/libpthread.so.0</font></font><br></pre><br> <div style="font-family: arial,helvetica,sans-serif; font-size: 12pt;"> </div> </div></body></html>