[strongSwan-dev] Charon crash at logging

Tobias Brunner tobias at strongswan.org
Tue Aug 16 11:09:36 CEST 2011


Hi Bharat,

> 1. ipsec stroke loglevel any 8

This is not recommended on production systems [1].  Log messages are
sent over the signal bus in charon, which is synchronous, that is, only
one log message from any thread can be logged at a time.  Also, an IKE
SA stays locked while a related message is logged or while the thread
waits to acquire the lock in the signal bus.  Setting the loglevel that
high for all log groups is almost bound to cause problems, at the very
least with regard to performance.  Even more so if you are also using
ipsec statusall (see below).
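
To illustrate why a synchronous bus serializes everything, here is a
minimal sketch.  It is not charon's bus code: bus_log, run_demo and the
counters are hypothetical names, and the only assumption is that every
log call takes one shared mutex.  However many threads log, at most one
is ever inside the bus.

```c
#include <assert.h>
#include <pthread.h>

#define WORKERS  16
#define MESSAGES 1000

/* Hypothetical stand-in for a synchronous log bus: one mutex for all output. */
static pthread_mutex_t bus_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_bus = 0;      /* threads currently inside the bus */
static int max_in_bus = 0;  /* highest concurrency ever observed */
long logged = 0;            /* total messages delivered */

void bus_log(int thread_id, int level, const char *msg)
{
    pthread_mutex_lock(&bus_lock);   /* every caller queues up here */
    in_bus++;
    if (in_bus > max_in_bus)
        max_in_bus = in_bus;
    /* a real logger would format and write the message at this point */
    (void)thread_id; (void)level; (void)msg;
    logged++;
    in_bus--;
    pthread_mutex_unlock(&bus_lock);
}

static void *worker(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < MESSAGES; i++)
        bus_log(id, 8, "verbose message");
    return NULL;
}

/* Starts WORKERS threads and returns the peak number of concurrent loggers. */
int run_demo(void)
{
    pthread_t threads[WORKERS];
    int ids[WORKERS];

    for (int i = 0; i < WORKERS; i++) {
        ids[i] = i;
        int rc = pthread_create(&threads[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < WORKERS; i++)
        pthread_join(threads[i], NULL);
    return max_in_bus;
}
```

Calling run_demo() from a small main() (built with -pthread) shows a
peak of one logger at a time, so the remaining threads spend their time
waiting on the mutex.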

> Following step 2,3,4 are in loop
> while  [ 1 ]; do
>    2. ping traffic.
>    3. Stop the ping.
>    4. Inactivity fires after 15 seconds.
> done

This looks ok.  Do you have only one SA in this setup?

> 5. ipsec statusall every 2 seconds.

ipsec statusall is also problematic because it is synchronous and has to
lock all SAs.  In the default setup it actually locks the whole list of
IKE SAs while enumerating them (this can be improved if the number of
SAs is very high, see [2]).  Because the loglevel is set so high, it is
very likely that an SA is already locked when statusall wants to lock
it too.  If you use 4.5.3 you can avoid the latter by using ipsec
statusallnb, which skips SAs that are currently checked out by other
threads.
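
The difference between blocking on a busy SA and skipping it can be
sketched with pthread_mutex_trylock.  This only illustrates the idea and
is not how charon implements it: sa_entry_t, sa_table_init and
status_nonblocking are hypothetical names, and a per-entry mutex stands
in for the checkout mechanism.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical SA entry: a name plus the lock a worker holds while using it. */
typedef struct {
    const char *name;
    pthread_mutex_t lock;
} sa_entry_t;

/* Initialise the per-entry locks with default (non-recursive) attributes. */
void sa_table_init(sa_entry_t *sas, const char **names, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        sas[i].name = names[i];
        int rc = pthread_mutex_init(&sas[i].lock, NULL);
        assert(rc == 0);
        (void)rc;
    }
}

/* Reports every entry that is free right now and returns how many were
 * reported.  Busy entries are skipped instead of waited for. */
size_t status_nonblocking(sa_entry_t *sas, size_t n)
{
    size_t reported = 0;

    for (size_t i = 0; i < n; i++) {
        if (pthread_mutex_trylock(&sas[i].lock) != 0)
            continue;               /* locked elsewhere: skip, do not block */
        /* a real status command would print details of sas[i] here */
        reported++;
        pthread_mutex_unlock(&sas[i].lock);
    }
    return reported;
}
```

With three entries of which one is locked, status_nonblocking reports
two and returns immediately, whereas a variant using pthread_mutex_lock
would stall until the busy entry is released.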

Is this setup just for testing or are you intending to call ipsec
statusall every two seconds on a production system?

> It looks from core that 12 out of 16 threads are waiting on low level
> __lll_lock_wait.
> #3  0x2aaf9dc0 in lock_r (this=0x416510) at threading/mutex.c:147
> #4  0x2ab66a6c in vlog (this=0x416450, group=DBG_JOB,
> level=LEVEL_DIAG, format=0x2ab2f5d8 "JOB %p, job.execute %p,
> job.destroy %p", args=0x2b17ee34)
>      at bus/bus.c:257

As you can see, these threads all want to log a message.

> from /var/log/messages:
> Feb 16 16:32:24 2011: %DAEMON-6-INFO: charon: 10[DMN] thread 10 received 4

This is not good.  Signal number 4 corresponds to SIGILL (at least on my 
system).  Is it possible for you to debug which instruction causes this 
e.g. with ipsec start --attach-gdb, provided that it is actually 
reproducible?
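
Since signal numbers can differ between platforms, it is worth checking
the number from the log against the constants in <signal.h> on the
target itself.  A small sketch (signal_name is a hypothetical helper and
only covers the signals defined by ISO C):

```c
#include <signal.h>
#include <string.h>

/* Maps a signal number to its symbolic name using this platform's
 * <signal.h> values.  Only the ISO C signals are covered. */
const char *signal_name(int signo)
{
    switch (signo) {
    case SIGABRT: return "SIGABRT";
    case SIGFPE:  return "SIGFPE";
    case SIGILL:  return "SIGILL";
    case SIGINT:  return "SIGINT";
    case SIGSEGV: return "SIGSEGV";
    case SIGTERM: return "SIGTERM";
    default:      return "unknown";
    }
}
```

Printing signal_name(4) on the box that produced the log tells you
whether the "received 4" line really refers to SIGILL there.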

Regards,
Tobias

[1] http://wiki.strongswan.org/projects/strongswan/wiki/LoggerConfiguration
[2] http://wiki.strongswan.org/projects/strongswan/wiki/IkeSaTable
