Try/catch for modules?

Fri Oct 18 18:28:48 EDT 2019

On Fri, Oct 18, 2019 at 07:09:54PM -0300, Martin Galvan wrote:
> Hi Valdis, thanks for the thorough response.
> 
> On Fri, Oct 18, 2019 at 18:53, Valdis Klētnieks
> (<valdis.kletnieks at vt.edu>) wrote:
> > Well..here's the thing.  Unless you have "panic_on_oops" set, hitting a null
> > pointer will usually *NOT* panic the whole system. In fact, that #0000 in the
> panic message is a counter of how many times the kernel has oopsed already.
> > Way back in the dark mists of time, I had a system that managed to get it up to
> > #1500 or so overnight.
> 
> Yes, and this is why my horribly hackish way to fix things is to
> manually tamper with panic_on_oops from a die_notifier. I was hoping to
> find a way not to do this.

Yes, please never do that.  Just check for an error code (and there
always will be one; if not, you are doing something wrong), and handle
it properly.

Yes, it's not easy, but this is the kernel; that goes without saying :)

> I'd rather have the kernel just return control to me, at the beginning
> of the catch block, and give me a chance to fix things (or at least
> log some debugging info). I imagine that's what Windows' __except
> block is for. The kernel may not know which locks are safe to break,
> but I do.

Never break a lock.  Again, if that is needed, you are doing
something wrong.

> > And if you actually *think* about it - a 'try/catch' is semantically *identical* to
> > coding a parameter test before the event or checking a return code after.
> 
> I humbly disagree. Return codes aren't possible in all cases, which is
> why there are functions like native_read_msr_safe, which implement some
> form of exception handling through _ASM_EXTABLE.

In Linux, where is a return code not possible?

> But then I can choose to let my process die, plus log some useful info
> and maybe even do some minor cleanups, without raising a panic. My
> particular module just reads some hardware registers and returns the
> info to userspace, so it's not something essential for the system. As
> a user, I would hate it if a non-essential module crashed the whole
> system like that. Perhaps the real problem is that panic_on_oops
> affects all of the kernel, rather than a given module.

Modules are not processes; that's not a correct mapping at all.  Modules
are code, that's it.

If all you are doing is reading hardware registers, you shouldn't have
to worry about any of this, that should be a very tiny and simple
module.

Wait, what registers are you reading that we don't already support from
userspace today?

thanks,

greg k-h