[strongSwan-dev] [PATCH] log: do not acquire lock if nobody is listening

Wed Apr 16 18:59:36 CEST 2014

Hi Christophe,

>>> +	if (this->max_level[group] < level &&
>>> +		this->max_vlevel[group] < level)
>>> +		return;
>>
>> Unfortunately, this might not be thread-safe.  It's essentially the same
>> problem I described in my response to your half-open IKE_SA patch.
>> Because the log levels/logger registrations can be changed at runtime
>> the threads that are calling vlog() might not be aware of these updates.
> 
> I agree it is not thread safe, but the only risk I suspect it that a log 
> listener might lose a message at the time it registers or increases its 
> loglevel (assuming that reading this->max_level[group] or 
> this->max_vlevel[group] is atomic), in which case it does not seem to be 
> a serious problem.

Agreed, while it might not be a very serious issue in this particular
case (opinions on that may vary), it seems like bad practice to me to
introduce thread-unsafe code.  Sure, we could add a comment, but other
developers reading the source might still think it's fine to do
something similar in other places, which I'd rather not encourage.
On the other hand, it is quite a special case as it is probably the most
frequently called method in the whole daemon.

At least we should make sure the read is atomic.  Something like this
perhaps?

  if (ref_cur(&this->max_level[group]) < level &&
      ref_cur(&this->max_vlevel[group]) < level)
  {
    return;
  }

For the __atomic/__sync macro versions of ref_cur() this should be fine
but for the fallback there is a type issue because level_t is signed
while ref_cur() returns an unsigned refcount_t and expects a pointer to one.

We could perhaps first add more generic atomic wrapper macros/functions
that can more easily be used for other integer types and then use int
(or int32_t) for max_[v]level instead of level_t (the width of which is
implementation dependent as it is an enum).

>>> It seems strange that acquiring an *uncontested* read lock (the write
>>> lock is only very rarely held, in your case probably never) can produce
>>> such a massive overhead.  Acquiring such a lock should be possible
>>> without syscall, just some atomic compare-and-swap operation should be
>>> sufficient.
>>
>> I was also surprised by this behavior. I didn't expect spin_locks to be
>> invoked on a read lock.
> 
> I had a look at the source code for pthread_rwlock_rdlock in the glibc
> nptl implementation.
> 
> The implementation of pthread_rwlock_rdlock calls lll_lock to protect
> critical sections within its own pthread library data structures.
> __pthread_rwlock_rdlock perform several operations under this internal
> lock (check flags, increment counters...). Once done, the lock is
> released with lll_unlock.
> 
> The lll_lock function may call a kernel futex under contention.

Quite interesting.  A few weeks ago I looked into the pthread
implementation on FreeBSD while debugging strange crashes, and there the
rwlock is implemented quite elegantly.  Acquiring the read lock is
implemented similar to this:

  state = lock->rw_state;
  while (!(state & WRITE_OWNER))
  {
    if (atomic_compare_and_swap(&lock->rw_state, state, state + 1))
    {
      return 0;
    }
    state = lock->rw_state;
  }
  /* will need to wait for the writer to leave */

Where state and rw_state are defined as uint32_t and WRITE_OWNER is
0x80000000U.  That's a bit simplified, as there are additional flags and
a check to make sure the number of readers does not come near those
flags.  The point is, though, that there is no kernel lock required to
acquire the read lock if there are no writers.

For glibc (or eglibc as used by Debian/Ubuntu) even the optimized
assembler code in e.g. [1] might has to call __lll_lock_wait().  The
code actually follows the generic code that uses lll_lock() very
closely.  It looks like they just inlined some stuff (e.g. the lock-free
acquire - if uncontested - that the x86_64 version of lll_lock() already
employs).  Looks like there is definitely potential for improvements.

Also, you might have to consider building performance critical libraries
yourself for high-load environments as the packages in Debian/Ubuntu (or
other distributions) might be a bit too generic for such purposes (e.g.
in regards to compiler flags or even library internal optimizations).

Regards,
Tobias

[1] nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_rdlock.S