<html><body><div style="color:#000; background-color:#fff; font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:12pt"><div>Hello,</div><div><br></div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;">I have a server with onboard Intel 10G ports (82599). When I load the kernel
driver module for these ports (ixgbe), everything looks fine: I can see the newly created ethX devices with "ip addr show". However, right after I assign an IP address and bring the port up, I get a kernel panic related to DMAR (DMA
remapping) in the VFIO (Virtual Function I/O) module. I am not sure why I get this panic at all, since this Intel driver does not
use VFIO. I do know the immediate cause: NULL is passed as the
argument to vfio_group_get(), which then dereferences it. I
know the argument is NULL because register RDI, which holds the
first argument of a function under the x86-64 calling convention, contains 0.</div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;"><br></div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;">Linux kernel 3.6.11</div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;"><br></div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;">The stack trace of the panic follows:<br></div><br><pre><code>[11036.855410] BUG: unable to handle kernel NULL pointer dereference at (null)
[11036.887249] ixgbe 0000:84:00.0: eth6: detected SFP+: 3
[11037.010224] IP: [<ffffffffa006615a>] vfio_group_get+0x9/0x27 [vfio]
[11037.085047] PGD 1fd6b5b067 PUD 20404b1067 PMD 0
[11037.140181] Oops: 0000 [#1] SMP
[11037.178676] Modules linked in: ixgbe(O) nfsv3 autofs4 nfsd nfs_acl nfs lockd sunrpc vfio_pci vfio_iommu_type1 vfio i2c_mux i2c_smbus i2c_dev container ide_pci_generic ide_core uhci_hcd isci ata_generic
[11037.393137] CPU 0
[11037.414974] Pid: 14045, comm: kworker/0:0 Tainted: G O 3.6.11
[11037.539628] RIP: 0010:[<ffffffffa006615a>] [<ffffffffa006615a>] vfio_group_get+0x9/0x27 [vfio]
[11037.643521] RSP: 0018:ffff881f52453d00 EFLAGS: 00010282
[11037.706886] RAX: ffff881fd6740680 RBX: 0000000000000000 RCX: ffff88204157ec00
[11037.792053] RDX: 0000000000000084 RSI: 0000000001f5327a RDI: 0000000000000000
[11037.877221] RBP: ffff881f52453d10 R08: ffff881f5327abe0 R09: 0000000000000000
[11037.962394] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88204157f800
[11038.024995] ixgbe 0000:84:00.0: eth6: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[11038.025144] IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready
[11038.211671] R13: 0000000000000084 R14: 0000000000000000 R15: 0000000000000000
[11038.296842] FS: 0000000000000000(0000) GS:ffff88204f000000(0000) knlGS:0000000000000000
[11038.393430] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[11038.461988] CR2: 0000000000000000 CR3: 0000001fd686d000 CR4: 00000000001407f0
[11038.547156] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[11038.632326] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[11038.717496] Process kworker/0:0 (pid: 14045, threadinfo ffff881f52452000, task ffff882034d61950)
[11038.822392] Stack:
[11038.846298] 0000000000000084 ffff881fd6740680 ffff881f52453d30 ffffffffa006618a
[11038.934688] 0000000001f5327a ffff882035e23e00 ffff881f52453d50 ffffffffa0066442
[11039.023078] ffff881f52453d70 ffff881fd6740680 ffff881f52453d70 ffffffffa0072072
[11039.111465] Call Trace:
[11039.140571] [<ffffffffa006618a>] vfio_device_get+0x12/0x30 [vfio]
[11039.214324] [<ffffffffa0066442>] vfio_device_get_from_dev+0x19/0x1f [vfio]
[11039.297425] [<ffffffffa0072072>] vfio_pci_dmar_error_handler+0x13/0x4a [vfio_pci]
[11039.387796] [<ffffffff81420cc6>] dmar_fault_do_one+0xd4/0xf1
[11039.456366] [<ffffffff8104175d>] process_one_work+0x1c2/0x311
[11039.525968] [<ffffffff81041568>] ? manage_workers+0x23a/0x24c
[11039.595566] [<ffffffff81420bf2>] ? dmar_get_fault_reason+0x52/0x52
[11039.670354] [<ffffffff81041b42>] worker_thread+0x26c/0x34a
[11039.736840] [<ffffffff810418d6>] ? process_scheduled_works+0x2a/0x2a
[11039.813710] [<ffffffff8104583a>] kthread+0x86/0x8e
[11039.871891] [<ffffffff81604bf4>] kernel_thread_helper+0x4/0x10
[11039.942524] [<ffffffff810457b4>] ? kthread_freezable_should_stop+0x4d/0x4d
[11040.025618] [<ffffffff81604bf0>] ? gs_change+0xb/0xb
[11040.085865] Code: 48 8b 00 48 8b 40 20 48 85 c0 74 0c 55 48 8b 7f 40 48 89 e5 ff d0 eb 08 48 c7 c0 ea ff ff ff c3 5d c3 55 48 89 e5 53 48 89 fb 52 <8b> 07 85 c0 75 11 be 2a 00 00 00 48 c7 c7 38 76 06 a0 e8 32 84
[11040.312869] RIP [<ffffffffa006615a>] vfio_group_get+0x9/0x27 [vfio]
[11040.388722] RSP <ffff881f52453d00>
[11040.430282] CR2: 0000000000000000<br><br></code><br></pre><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;">- Can someone please help me understand the DMAR/VFIO-related function calls in the backtrace, and why they are being invoked? I know what causes a DMAR error, but I am not sure how this can happen here, since none of the devices are managed by VFIO. <br></div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;">- Looking at the source code, it seems that dmar_fault_do_one() is called from the interrupt handler dmar_fault(). Why is dmar_fault() not part of the stack trace, which instead shows dmar_fault_do_one() being called from process_one_work()?</div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;">- What is the significance of the "?" in front of some of the functions in the backtrace (e.g. dmar_get_fault_reason())?</div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;"><br></div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;">Thank you,</div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;">Ahmed.</div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif; background-color: transparent; font-style: normal;"><br></div></div></body></html>