side effects of calling interruptible_sleep_on_timeout()

Srivatsa S. Bhat srivatsa.bhat at linux.vnet.ibm.com
Thu Apr 26 04:03:46 EDT 2012


On 04/26/2012 10:03 AM, Arun KS wrote:

> Hi Srivatsa,
> 
> On Wed, Apr 25, 2012 at 3:56 PM, Srivatsa S. Bhat
> <srivatsa.bhat at linux.vnet.ibm.com> wrote:
>> On 04/25/2012 03:36 AM, Philipp Ittershagen wrote:
>>
>>> Hi Devendra,
>>>
>>> On Tue, Apr 24, 2012 at 03:24:23PM +0530, devendra rawat wrote:
>>>>    Hi,
>>>>    A switch driver is causing a soft lockup on a MontaVista Linux kernel
>>>>    2.6.10 system. While browsing through the driver's code, I came
>>>>    across a snippet where, after disabling interrupts, a call is made
>>>>    to interruptible_sleep_on_timeout().
>>>>    The code snippet looks like this:
>>>>
>>>>    cli();
>>>>    init_waitqueue_head(&queue);
>>>>    interruptible_sleep_on_timeout(&queue, USEC_TO_JIFFIES(usec));
>>>>    thread_check_signals();
>>>>    sti();
>>>>    I need to know the side effects of this sort of code. Can it be
>>>>    responsible for the soft lockup of the system? It is a PowerPC-based
>>>>    system.
>>>
>>> You cannot call sleep functions after disabling interrupts, because no
>>> timer interrupt will arrive for the scheduler to notice the timeout and
>>> resume your task.
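>>>
>>> As a rough, untested sketch (wait_event_interruptible_timeout() is the
>>> usual replacement for the racy sleep_on family; "condition" stands for
>>> whatever event the driver is actually waiting on, and usecs_to_jiffies()
>>> for the driver's own USEC_TO_JIFFIES macro), the pattern would be to keep
>>> the irq-off region short and sleep only after interrupts are back on:
>>>
>>> local_irq_save(flags);
>>> /* ... only the short, truly atomic part of the work here ... */
>>> local_irq_restore(flags);
>>>
>>> /* safe to sleep now: timer interrupts can fire again */
>>> wait_event_interruptible_timeout(queue, condition,
>>>                                  usecs_to_jiffies(usec));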
>>>
>>
>>
>> Yes, that's right. Also, in general, sleeping inside atomic sections (e.g.,
>> sections with interrupts disabled or preemption disabled) is wrong. There
>> is a config option in the kernel that you can use to enable checking for
>> sleeping inside atomic sections (CONFIG_DEBUG_ATOMIC_SLEEP, I believe),
>> which can help you pinpoint such bugs easily.
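>>
>> For example, building with the following (CONFIG_DEBUG_ATOMIC_SLEEP is
>> the name in current kernels; on older 2.6 trees the same check was
>> called CONFIG_DEBUG_SPINLOCK_SLEEP):
>>
>> CONFIG_DEBUG_ATOMIC_SLEEP=y
>>
>> makes a sleep inside an atomic section print a warning along the lines
>> of "BUG: sleeping function called from invalid context".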
> 
> I tried an experiment to check this.
> 
> /* disable interrupts and preemption */
> spin_lock_irqsave(&lock, flags);
> /* re-enable preemption; interrupts remain disabled */
> spin_unlock(&lock);
> /* now try to schedule something else */
> schedule_timeout(10 * HZ);
> 
> But this does not cause any harm. I am able to call schedule() with
> interrupts disabled, and the system works fine afterwards.
> 
> So when I looked inside the schedule() function, it checks only
> whether preemption is disabled or not. schedule() calls BUG() only if
> preemption is disabled, not if interrupts are disabled.
> 
> And AFAIK there is no check in the scheduler which complains when
> interrupts are disabled.
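> 
> For reference, the check I found looks roughly like this (paraphrased
> from schedule_debug() in kernel/sched/core.c; note that it consults
> only the preempt count):
> 
> static inline void schedule_debug(struct task_struct *prev)
> {
>         /* BUG only if preemption was already disabled by the caller;
>          * the interrupt state is never examined here */
>         if (unlikely(in_atomic_preempt_off() && !prev->exit_state))
>                 __schedule_bug(prev);
> }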
> 
> So here is the explanation of why the system works fine after calling
> schedule() with interrupts disabled:
> 
> There is a raw_spin_lock_irq(&rq->lock) inside __schedule(), which in
> turn calls local_irq_disable().
> 
> The local_irq_disable()/local_irq_enable() functions do not nest; there
> is no reference counting. One call to local_irq_enable() is enough to
> undo any number of earlier local_irq_disable() calls.
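> 
> In other words (a small sketch):
> 
> local_irq_disable();
> local_irq_disable();    /* no counting: state is simply "disabled" */
> local_irq_enable();     /* one call, and interrupts are fully on again */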
> 
> So my inference is that calling schedule() with interrupts disabled
> does not cause any problem, because the scheduler manages the interrupt
> state itself and enables interrupts again as part of scheduling out.
> But a call to schedule() with preemption disabled will end up in the
> famous "BUG: scheduling while atomic".
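> 
> A minimal sketch that should trigger that BUG (run e.g. from a test
> module) would be:
> 
> preempt_disable();
> schedule_timeout_interruptible(HZ);    /* "BUG: scheduling while atomic" */
> preempt_enable();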
> 


Indeed, you are right, and your experiment and analysis are spot on!
Sorry for the confusion - I had used the term "atomic" quite loosely.  But
your careful experiment of re-enabling just preemption, while still keeping
interrupts disabled, was a very good one!  And to add to what you said
above, __schedule() also does a preempt_enable() to re-enable preemption
(which it had disabled at the beginning). But since preempt_disable() can
nest, if we call __schedule() with preemption already disabled, we end up
in trouble - and hence the BUG fires in such cases.
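To make the preempt_count bookkeeping concrete, here is a sketch (the
debug check in __schedule() expects the count to be exactly the single
level added by its own preempt_disable()):

preempt_disable();    /* caller: preempt_count goes 0 -> 1 */
schedule();           /* __schedule() disables again: 1 -> 2; the debug
                       * check sees the unexpected extra level and fires
                       * "BUG: scheduling while atomic" */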

Thanks for the clarification!

Regards,
Srivatsa S. Bhat



