Trying to debug interrupt flood after unbind

Tue May 31 14:41:24 EDT 2016

I am trying to load a driver for an Exar serial chip, but that chip is claimed by the 8250 driver on boot.  So, I use the "unbind" file in /sys/bus/pci/drivers/serial to release the device from the 8250 driver.  Based on cobbled-together Google searches, I use the following to unbind it (assuming the address in /sys/bus/pci/drivers/serial is 0000:04:00.0):

sudo echo -n "0000:04:00.0" | tee ./unbind

The address disappears from that dir when I do this command, so I'm assuming it works.
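
The disappearing address is a reasonable sign that the unbind worked. A minimal sketch of a more direct check follows, assuming the usual sysfs layout in which /sys/bus/pci/devices/<address>/driver is a symlink to the bound driver. The function name bound_driver and the sysfs_root parameter are illustrative, not part of any existing tool.

```python
import os

def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the driver bound to pci_addr, or None if unbound.

    Assumes the standard sysfs layout: a bound device has a "driver"
    symlink in its device directory pointing at the driver's directory.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        return None  # no symlink: nothing is bound to the device
    return os.path.basename(os.readlink(link))
```

Calling bound_driver("0000:04:00.0") before the unbind should presumably return "serial", None afterwards, and the Exar driver's name once that driver has claimed the device.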

I then install the Exar-provided driver.  Within about 3 seconds, I get a system notification that the IRQ used by the Exar driver has been disabled.  I can also look at /proc/interrupts and see a huge number of interrupts happening on that IRQ.
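
To put a number on the flood rather than eyeballing /proc/interrupts, a rough sketch like the following could sample the file twice and report a per-IRQ rate. It assumes the usual layout (a header row of CPU columns, then one row per IRQ with one count per CPU). The names parse_interrupts and irq_rates are made up for this example.

```python
import time

def parse_interrupts(text):
    """Map each IRQ label (e.g. "17") to its count summed over all CPUs."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an IRQ row
        total = 0
        for field in rest.split()[:ncpu]:
            if not field.isdigit():
                break  # reached the controller/handler description
            total += int(field)
        totals[label.strip()] = total
    return totals

def irq_rates(interval=1.0, path="/proc/interrupts"):
    """Sample the file twice and return interrupts per second for each IRQ."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return {irq: (after[irq] - before.get(irq, 0)) / interval for irq in after}
```

Something like irq_rates()["17"] would then give the approximate interrupt rate on IRQ 17 over a one-second window.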

There's a crash message in the log:

[  167.938861] irq 17: nobody cared (try booting with the "irqpoll" option)
[  167.938868] CPU: 0 PID: 801 Comm: Xorg Tainted: G           O  3.16.6-2-desktop #1
[  167.938871] Hardware name: RTD Embedded Technologies, Inc CMA34CR/CMA34CR, BIOS v3.72.51.0009-1.1.85582 02/09/2015 09:39:58
[  167.938873]  ffff880148672cc4 ffffffff8161ab03 ffff880148672c00 ffffffff810b8acd
[  167.938877]  ffff880148672c00 0000000000000011 0000000000000000 ffffffff810b9011
[  167.938880]  0000000000000000 0000000000000000 0000000000000011 0000000000000000
[  167.938883] Call Trace:
[  167.938899]  [<ffffffff8100519e>] dump_trace+0x8e/0x350
[  167.938905]  [<ffffffff81005506>] show_stack_log_lvl+0xa6/0x190
[  167.938909]  [<ffffffff81006c01>] show_stack+0x21/0x50
[  167.938914]  [<ffffffff8161ab03>] dump_stack+0x49/0x6a
[  167.938922]  [<ffffffff810b8acd>] __report_bad_irq+0x2d/0xc0
[  167.938928]  [<ffffffff810b9011>] note_interrupt+0x241/0x290
[  167.938935]  [<ffffffff810b67f1>] handle_irq_event_percpu+0xa1/0x1d0
[  167.938940]  [<ffffffff810b695e>] handle_irq_event+0x3e/0x60
[  167.938945]  [<ffffffff810b9b58>] handle_fasteoi_irq+0x88/0x160
[  167.938949]  [<ffffffff810050fd>] handle_irq+0x1d/0x30
[  167.938955]  [<ffffffff81624549>] do_IRQ+0x49/0xe0
[  167.938959]  [<ffffffff816224ad>] common_interrupt+0x6d/0x6d
[  167.938967]  [<ffffffff81620dce>] _raw_spin_unlock_irqrestore+0xe/0x30
[  167.938974]  [<ffffffff815c3f05>] unix_poll+0x25/0xb0
[  167.938980]  [<ffffffff81513fa9>] sock_poll+0x49/0x110
[  167.938986]  [<ffffffff811caf40>] do_select+0x390/0x7a0
[  167.938991]  [<ffffffff811cb4e4>] core_sys_select+0x194/0x2b0
[  167.938995]  [<ffffffff811cb6aa>] SyS_select+0xaa/0xf0
[  167.938999]  [<ffffffff8162182d>] system_call_fastpath+0x1a/0x1f
[  167.939015]  [<00007f7563abda43>] 0x7f7563abda42
[  167.939016] handlers:
[  167.939020] [<ffffffffa0549110>] serialxr_interrupt [xr17v35x]
[  167.939023] Disabling IRQ #17

So, as near as I can tell, when the Exar driver is inserted, an interrupt flood occurs, and the Exar driver (the only interrupt handler on that IRQ) does not respond to any of them.  I put in some debug code and verified that the Exar interrupt handler is called... but the handler just returns with an IRQ_NONE value.
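
The "nobody cared" message comes from the kernel's spurious-interrupt accounting, which disables a line when almost all recent interrupts on it went unhandled. On kernels that expose /proc/irq/<n>/spurious (an assumption here; the file holds "count", "unhandled" and "last_unhandled" lines), the counters can be read to confirm that the handler is not claiming the interrupts. A minimal sketch, with read_spurious and proc_root as illustrative names:

```python
def read_spurious(irq, proc_root="/proc"):
    """Return the counters from /proc/irq/<irq>/spurious as a dict of ints.

    Assumes each line has the form "<name> <number> [unit]".
    """
    stats = {}
    with open(f"{proc_root}/irq/{irq}/spurious") as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                stats[fields[0]] = int(fields[1])
    return stats
```

If "unhandled" from read_spurious(17) climbs almost as fast as "count", that would match a handler that returns IRQ_NONE for nearly every interrupt.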

I've tried:

1) Multiple CPUs from different families (Core i7, Core 2 Duo, AMD G-Series) and it occurs with all of them.

2) Kernels 3.16 and 4.2 and it occurs with both of them.

3) Disabling ModemManager and it still happens.

4) Contacting Exar about this.  They could not reproduce the problem.

5) openSUSE 13.2 and Fedora 20.  It happened with openSUSE, but NOT with Fedora 20.

6) The reference Exar implementation vs. our implementation, and it occurs with both of them.

Do you have any suggestions on how I can discover what is sending all of those interrupts?  Are there kernel tools specifically for that?

Thank you.

Rob Groner
