PCI error handlers in Linux

Greg KH greg at kroah.com
Tue Sep 16 14:15:39 EDT 2014

On Wed, Sep 17, 2014 at 02:04:04AM +0800, Alvin Abitria wrote:
> In my PCI driver for a certain PCI device, I implemented the PCI error
> handler functions (the error_detected and slot_reset methods, etc.).  I
> want to trigger a PCI error so I can exercise those handlers and observe
> their behavior.  I've read in the kernel's PCI error recovery
> documentation that the first step is the error_detected method, which
> the system calls when it detects an error related to the PCI device.
> The good thing is that the system detects the error on the driver's
> behalf, which simplifies things.  But I'm having problems with error
> detection itself.
> I tried to trigger the error from the PCI device.  In its firmware, I
> triggered a reset of its PCI subsystem.  As a result, the I/O rate
> dropped to zero and the driver can no longer send to the device.
> Something did indeed happen to their PCI connection.  However, my
> error_detected method was never called, although I expected the kernel
> to detect the PCI error and call the handler.  Instead, a warning
> message appeared on the console:
> irq 16: nobody cared
> handlers:
> ...
> ...
> Disabling IRQ # 16
> What baffles me more is that the injected PCI error seems to have
> brought down that IRQ 16 device as well, and that is definitely not
> the IRQ number of my driver/device.  Any thoughts on why the kernel did
> not detect that PCI error?  Is there anything I could have missed when
> registering the error handler methods?
> If so, I'd like to ask about other ways of injecting PCI errors, so
> that I can exercise my error handlers.  Thanks!

You might want to ask this on the linux-pci at vger.kernel.org mailing list
instead.  The developers there can help you out better than the people
here can.

Hope this helps,

greg k-h

