Question on mutex code

Davidlohr Bueso dave at stgolabs.net
Sun Mar 15 18:19:49 EDT 2015


On Sun, 2015-03-15 at 23:49 +0200, Matthias Bonne wrote:
> On 03/15/15 03:09, Davidlohr Bueso wrote:
> > On Sat, 2015-03-14 at 18:03 -0700, Davidlohr Bueso wrote:
> >> Good analysis, but not quite accurate for one simple fact: mutex
> >> trylocks _only_ use fastpaths (obviously just depend on the counter
> >> cmpxchg to 0), so you never fallback to the slowpath you are mentioning,
> >> thus the race is non existent. Please see the arch code.
> >
> > For debug we use the trylock slowpath, but so does everything else, so
> > again you cannot hit this scenario.
> >
> >
> 
> You are correct of course - this is why I said that
> CONFIG_DEBUG_MUTEXES must be enabled for this to happen.

Right, so I just skimmed through the email ;)
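
To make that concrete: on x86, for example, the non-debug trylock
fastpath is just a cmpxchg on the counter and never calls fail_fn at
all - roughly (simplified from the arch code):

static inline int __mutex_fastpath_trylock(atomic_t *count,
                                           int (*fail_fn)(atomic_t *))
{
        /* count == 1 means unlocked; take the lock by swinging it to 0 */
        if (likely(atomic_cmpxchg(count, 1, 0) == 1))
                return 1;

        /* note: fail_fn is never called in the !DEBUG case */
        return 0;
}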

>  Can you
> explain why this scenario is still not possible in the debug case?
> 
> The debug case uses mutex-null.h, which contains these macros:
> 
> #define __mutex_fastpath_lock(count, fail_fn)           fail_fn(count)
> #define __mutex_fastpath_lock_retval(count)             (-1)
> #define __mutex_fastpath_unlock(count, fail_fn)         fail_fn(count)
> #define __mutex_fastpath_trylock(count, fail_fn)        fail_fn(count)
> #define __mutex_slowpath_needs_to_unlock()              1
> 
> So both mutex_trylock() and mutex_unlock() always use the slow paths.

Right.
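
So with these, mutex_trylock() effectively becomes a direct call into
the trylock slowpath - roughly (condensed from kernel/locking/mutex.c):

int __sched mutex_trylock(struct mutex *lock)
{
        int ret;

        /*
         * With CONFIG_DEBUG_MUTEXES, __mutex_fastpath_trylock(count, fail_fn)
         * expands to fail_fn(count), so this always ends up in
         * __mutex_trylock_slowpath():
         */
        ret = __mutex_fastpath_trylock(&lock->count, __mutex_trylock_slowpath);
        if (ret)
                mutex_set_owner(lock);

        return ret;
}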

> The slowpath for mutex_unlock() is __mutex_unlock_slowpath(), which
> simply calls __mutex_unlock_common_slowpath(), and the latter starts
> like this:
> 
>          /*
>           * As a performance measurement, release the lock before doing other
>           * wakeup related duties to follow. This allows other tasks to acquire
>           * the lock sooner, while still handling cleanups in past unlock calls.
>           * This can be done as we do not enforce strict equivalence between the
>           * mutex counter and wait_list.
>           *
>           *
>           * Some architectures leave the lock unlocked in the fastpath failure
>           * case, others need to leave it locked. In the later case we have to
>           * unlock it here - as the lock counter is currently 0 or negative.
>           */
>          if (__mutex_slowpath_needs_to_unlock())
>                  atomic_set(&lock->count, 1);

Correct. In debug, __mutex_slowpath_needs_to_unlock() is hardcoded to 1
(see the mutex-null.h macros above), so the counter is indeed set to 1
here. It is still safe, because everything else is serialized through
the mutex wait_lock.

> 
>          spin_lock_mutex(&lock->wait_lock, flags);
>          [...]
> 
> So the counter is set to 1 before taking the spinlock, which I think
> might cause the race. Did I miss something?

So in debug we play no counter/wait_list games when trying to grab the
lock, i.e. no lock stealing and no optimistic spinning. Furthermore, it
is the unlocker thread's duty to wake up the next task in the wait list,
so nothing can jump in and steal the lock. Additionally, ordering also
relies on the ticket spinlock that protects the wait queue (wait_lock).
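
For reference, the debug trylock slowpath does all of its counter and
wait_list work with wait_lock held - roughly (condensed, lockdep
annotations and a few details omitted):

static int __mutex_trylock_slowpath(atomic_t *lock_count)
{
        struct mutex *lock = container_of(lock_count, struct mutex, count);
        unsigned long flags;
        int prev;

        spin_lock_mutex(&lock->wait_lock, flags);

        prev = atomic_xchg(&lock->count, -1);
        if (likely(prev == 1))
                mutex_set_owner(lock);

        /* set it back to 0 if there are no waiters: */
        if (likely(list_empty(&lock->wait_list)))
                atomic_set(&lock->count, 0);

        spin_unlock_mutex(&lock->wait_lock, flags);

        return prev == 1;
}

And the remainder of __mutex_unlock_common_slowpath() - the part after
the atomic_set() you quoted - wakes the first waiter under that same
wait_lock, roughly:

        spin_lock_mutex(&lock->wait_lock, flags);
        debug_mutex_unlock(lock);

        if (!list_empty(&lock->wait_list)) {
                /* get the first entry from the wait-list: */
                struct mutex_waiter *waiter =
                                list_entry(lock->wait_list.next,
                                           struct mutex_waiter, list);

                debug_mutex_wake_waiter(lock, waiter);
                wake_up_process(waiter->task);
        }

        spin_unlock_mutex(&lock->wait_lock, flags);

So a debug trylock can only grab the counter while holding wait_lock,
and the unlocker's wakeup of the next waiter is serialized behind that
same lock.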



