some questions/oddities regarding "exclusive" wait queues

Robert P. J. Day rpjday at crashcourse.ca
Fri Jun 21 17:54:04 EDT 2013


  currently digging through the code regarding wait queues and i ran
across some oddities that i figure someone here can explain, so here
goes.

  first, according to wait.h, there appears to be only one "flag" for
a wait queue object:

=====
struct __wait_queue {
        unsigned int flags;
#define WQ_FLAG_EXCLUSIVE       0x01
        void *private;
        wait_queue_func_t func;
        struct list_head task_list;
};
=====

  that's it -- WQ_FLAG_EXCLUSIVE appears to be the only possible flag
setting for a "__wait_queue".  am i missing any others? i certainly
don't see any.

  next, as i read the code, it's possible to add both exclusive and
non-exclusive wait queues to a wait queue head, and those operations
are done in an interesting way in kernel/wait.c:

=====
void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
{
        unsigned long flags;

        wait->flags &= ~WQ_FLAG_EXCLUSIVE;
        spin_lock_irqsave(&q->lock, flags);
        __add_wait_queue(q, wait);
        spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(add_wait_queue);

void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
{
        unsigned long flags;

        wait->flags |= WQ_FLAG_EXCLUSIVE;
        spin_lock_irqsave(&q->lock, flags);
        __add_wait_queue_tail(q, wait);
        spin_unlock_irqrestore(&q->lock, flags);
}
EXPORT_SYMBOL(add_wait_queue_exclusive);
=====

  note carefully the difference -- non-exclusive entries go in at the
head of the list (via __add_wait_queue()), while "exclusive" entries
are added at the *tail* end, which is important for what happens
next, since this is how an entire wait queue is woken by the core
wakeup function in kernel/sched/core.c:

=====
/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                        int nr_exclusive, int wake_flags, void *key)
{
        wait_queue_t *curr, *next;

        list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
                unsigned flags = curr->flags;

                if (curr->func(curr, mode, wake_flags, key) &&
                                (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                        break;
        }
}
=====

  first, the comment above that function seems wrong -- that routine
will not simply wake "one exclusive task", it will wake up to
"nr_exclusive" exclusive tasks, will it not? (perhaps an earlier
version woke only one, and the code was extended without adjusting the
comment.) also, now it's clear why exclusive tasks are added at the
*end* of the list -- since the loop walks the list from the head, all
of the non-exclusive tasks are processed first, before the loop starts
counting off "nr_exclusive" exclusive tasks. does that seem correct?
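
  to check that reading, here is a minimal userspace sketch of the
ordering, not kernel code. the sim_* names and the array-backed queue
are made up for illustration, and it assumes every wake callback
succeeds (returns nonzero) and that entries stay queued after being
woken:

```c
#include <assert.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE       0x01
#define SIM_MAX_ENTRIES         16

/* hypothetical stand-in for a wait queue entry */
struct sim_entry {
        unsigned int flags;
        int woken;
};

/* hypothetical stand-in for the wait queue head; slot[0] is the list head */
struct sim_queue {
        struct sim_entry *slot[SIM_MAX_ENTRIES];
        size_t len;
};

/* like add_wait_queue(): clear the flag, insert at the head */
static void sim_add(struct sim_queue *q, struct sim_entry *e)
{
        size_t i;

        assert(q->len < SIM_MAX_ENTRIES);
        e->flags &= ~WQ_FLAG_EXCLUSIVE;
        for (i = q->len; i > 0; i--)
                q->slot[i] = q->slot[i - 1];
        q->slot[0] = e;
        q->len++;
}

/* like add_wait_queue_exclusive(): set the flag, insert at the tail */
static void sim_add_exclusive(struct sim_queue *q, struct sim_entry *e)
{
        assert(q->len < SIM_MAX_ENTRIES);
        e->flags |= WQ_FLAG_EXCLUSIVE;
        q->slot[q->len++] = e;
}

/*
 * same loop condition as __wake_up_common(), with the callback assumed
 * to always succeed. returns the number of entries woken.
 */
static int sim_wake_up(struct sim_queue *q, int nr_exclusive)
{
        int woken = 0;
        size_t i;

        for (i = 0; i < q->len; i++) {
                struct sim_entry *curr = q->slot[i];

                curr->woken = 1;
                woken++;
                if ((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                        break;
        }
        return woken;
}
```

  with two non-exclusive and three exclusive entries queued in any
order, sim_wake_up(&q, 2) wakes four entries (both non-exclusive ones
plus the first two exclusive ones), and sim_wake_up(&q, 0) wakes all
five, since the counter goes negative and never hits zero.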

  finally, if i understand all of the above correctly, there's one
more curiosity back in wait.h -- a second form of the wait exclusive
routine:

=====
/*
 * Used for wake-one threads:
 */
static inline void __add_wait_queue_exclusive(wait_queue_head_t *q,
                                              wait_queue_t *wait)
{
        wait->flags |= WQ_FLAG_EXCLUSIVE;
        __add_wait_queue(q, wait);
}
=====

  note that this variation adds an exclusive entry, but at the *head*
of the list rather than the end, because it's apparently meant for
some sort of "wake-one thread", but i don't see a real distinction
here. is it really necessary to recognize that distinction? a search
of the entire kernel source tree shows only this:

$ grep -rw "__add_wait_queue_exclusive" *
fs/eventpoll.c:		__add_wait_queue_exclusive(&ep->wq, &wait);
include/linux/wait.h:static inline void __add_wait_queue_exclusive(wait_queue_head_t *q,
$

  that is, all of *one* usage in fs/eventpoll.c. does eventpoll.c
genuinely need a unique interface for exclusive wait queues?
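
  for what it's worth, the one observable difference i can derive from
the wakeup loop is sketched below. this is a hypothetical userspace
helper (entries_reached() is my own name, not a kernel function), and
it assumes every wake callback returns nonzero. it says nothing about
why eventpoll.c wants this behaviour:

```c
#include <assert.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE       0x01

/*
 * hypothetical helper, not kernel code: flags[] holds the flags of each
 * entry in list order (index 0 is the head). returns how many entries
 * one wake pass reaches, using the same break condition as
 * __wake_up_common().
 */
static size_t entries_reached(const unsigned int *flags, size_t n,
                              int nr_exclusive)
{
        size_t i;

        assert(flags != NULL || n == 0);
        for (i = 0; i < n; i++) {
                if ((flags[i] & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                        return i + 1;   /* scan stops at this entry */
        }
        return n;                       /* scan ran off the end of the list */
}
```

  with the exclusive entry at the tail, { 0, 0, WQ_FLAG_EXCLUSIVE },
and nr_exclusive == 1, the pass reaches all 3 entries. with it at the
head, { WQ_FLAG_EXCLUSIVE, 0, 0 }, the pass reaches only 1, so any
non-exclusive entries queued behind it are skipped by that wakeup.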

  anyway, feel free to tackle any of the above. more to come, i'm
sure.

rday

-- 

========================================================================
Robert P. J. Day                                 Ottawa, Ontario, CANADA
                        http://crashcourse.ca

Twitter:                                       http://twitter.com/rpjday
LinkedIn:                               http://ca.linkedin.com/in/rpjday
========================================================================



More information about the Kernelnewbies mailing list