watchdog pet in kernel module

Peter Teoh htmldeveloper at gmail.com
Wed Dec 4 23:24:54 EST 2013


On Thu, Dec 5, 2013 at 10:19 AM, Rajat Sharma <fs.rajat at gmail.com> wrote:

> Although /dev/watchdog is available in user mode, nothing should stop
> you from writing to it from a kernel thread.
>
> Rajat
>

I don't think /dev/watchdog (the literal device node, I mean) is available
inside the kernel. It is accessible from userspace, but inside the kernel
it goes by a different name. Moreover, if you access the variable directly,
bypassing the spinlock implemented around it (see drivers/watchdog and look
for the "wdt_lock" spinlock), you might run into a race condition.
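
For comparison, here is a minimal sketch of the userspace route, where the
driver does its own locking for you. It assumes a driver that treats any
write to the device as a pet, which is the usual /dev/watchdog behaviour.
The names pet_watchdog() and feed_loop() are made up for illustration, and
the "rounds" limit exists only so the sketch terminates; a real feeder would
loop forever.

```c
#include <errno.h>
#include <unistd.h>

/* Pet the watchdog once through an already-open descriptor.
 * Returns 0 on success, -1 on failure. */
int pet_watchdog(int fd)
{
    /* Any byte works as a pet; '\0' avoids the magic-close character 'V'. */
    const char beat = '\0';
    ssize_t n;

    do {
        n = write(fd, &beat, 1);
    } while (n < 0 && errno == EINTR);  /* retry if a signal interrupted us */

    return n == 1 ? 0 : -1;
}

/* Pet the watchdog `rounds` times, sleeping `interval_sec` between pets.
 * Returns 0 if every pet succeeded, -1 on the first failure. */
int feed_loop(int fd, unsigned interval_sec, unsigned rounds)
{
    unsigned i;

    for (i = 0; i < rounds; i++) {
        if (pet_watchdog(fd) != 0)
            return -1;
        if (i + 1 < rounds)
            sleep(interval_sec);
    }
    return 0;
}
```

Typical use would be fd = open("/dev/watchdog", O_WRONLY) followed by
feed_loop(fd, interval, rounds), with the interval comfortably shorter than
the watchdog timeout. Note that on most drivers merely opening the device
arms the timer.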

BUT... if you really insist on probing from inside the kernel, that is not
a watchdog; it is a "process watch", done in your own way.

I.e., you can always write a loop that periodically probes the status of
that specific process to make sure it is in the RUNNING state (vs BLOCKING,
when it is waiting for some I/O or lock to complete), and perhaps checks the
CPU instructions to make sure that it is not stuck in a tight loop (i.e., a
userspace program that literally does "while(true) {do_nothing()}"), plus
many other possible "hung" criteria for a process. Not easy; in fact,
extremely complex.
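
To make the state check concrete, here is a rough sketch. It polls from
userspace through /proc/<pid>/stat rather than from a kernel thread, so it
is a stand-in for the idea, not the in-kernel version. The helper names
parse_stat_state(), read_proc_state() and is_blocked_state() are made up for
illustration. It relies on the documented stat layout "pid (comm) state ...",
where the state is one character such as R (running), S (sleeping) or
D (uninterruptible wait).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Extract the state character from one /proc/<pid>/stat line.
 * The command name may itself contain spaces or ')', so search for the
 * LAST ')' and take the field right after it.
 * Returns '\0' if the line does not look like a stat line. */
char parse_stat_state(const char *line)
{
    const char *p = strrchr(line, ')');

    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}

/* Read the current state of `pid`; returns '\0' if it cannot be read
 * (for example because the process has already exited). */
char read_proc_state(pid_t pid)
{
    char path[64];
    char line[512];
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '\0';
    if (fgets(line, sizeof line, f) == NULL) {
        fclose(f);
        return '\0';
    }
    fclose(f);
    return parse_stat_state(line);
}

/* 'D' means the task is blocked in an uninterruptible wait, typically on
 * I/O or a lock. One sample proves nothing; a monitor would need to see
 * it persist across many polls. */
int is_blocked_state(char state)
{
    return state == 'D';
}
```

This only covers the RUNNING vs BLOCKING part. A process spinning in a
tight loop reports R just like a healthy busy one, so the state alone cannot
catch that case, which is part of why the full problem is so hard.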


>
>
> On Wed, Dec 4, 2013 at 5:50 PM, Peter Teoh <htmldeveloper at gmail.com> wrote:
>
>>
>>
>>
>> On Thu, Dec 5, 2013 at 9:06 AM, Vipul Jain <vipulsj at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Wed, Dec 4, 2013 at 4:57 PM, <Valdis.Kletnieks at vt.edu> wrote:
>>>
>>>> On Wed, 04 Dec 2013 16:45:44 -0800, Vipul Jain said:
>>>>
>>>> > If you don't mind can you please provide me more insight as what can
>>>> > be false alarm I can encounter to move pet inside kernel module?
>>>>
>>>> The issue isn't false alarms - it's failure to alarm when it should.
>>>>
>>>> The problem is that it's possible for a kernel to get wedged in such a
>>>> way that a kernel thread is still able to feed the watchdog timer on a
>>>> regular basis, but userspace is effectively hung and unable to proceed.
>>>> For example, if an OOPS happens while a filesystem lock is held, all
>>>> future userspace references to that filesystem (and possibly all
>>>> filesystems of the same type) will hang, eventually strangling the box
>>>> while the kernel is still perfectly able to keep the watchdog working.
>>>>
>>> Hi Valdis,
>>>
>>> I see what you are saying, but what if the user process that's feeding
>>> the dog gets hung while the rest of the system is fine? Then it will
>>> bring the whole system down, won't it? I basically want to avoid this.
>>>
>>>
>> Normally the process that feeds the dog is a simple process that JUST
>> periodically writes to the watchdog device descriptor. Yes, one main()
>> with a while loop that just periodically resets the descriptor.
>>
>> So if it is not able to respond in time, by inference, OTHER PROCESSES
>> must have hung. In another system I saw, there is a mother process that
>> monitors a few (not all) of its key child processes, so perhaps each
>> child has one variable to signal to the mother that it is running. If a
>> child does not respond in time, the mother cleans up everything and then
>> purposely stops petting the watchdog, resulting in a reboot.
>>
>>
>>> Regards,
>>> Vipul.
>>>
>>>
>>
>>
>> --
>> Regards,
>> Peter Teoh
>>
>> _______________________________________________
>> Kernelnewbies mailing list
>> Kernelnewbies at kernelnewbies.org
>> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies
>>
>>
>


-- 
Regards,
Peter Teoh