STATE: TASK_UNINTERRUPTIBLE (PANIC)
Marc Smith
msmith626 at gmail.com
Wed Nov 11 11:25:14 EST 2020
Hi,
I have an issue with the 'bcache' Linux subsystem (block I/O cache). I
hit a kernel panic when using this software, and I've reported that
upstream on the "linux-bcache" mailing list:
https://www.spinics.net/lists/linux-bcache/msg09069.html
I'd like to contribute and learn more about how to debug this myself.
Here is the output from 'crash' on a dumpfile from this panic:
SYSTEM MAP: /home/marc.smith/Downloads/System.map-esos.prod
DEBUG KERNEL: /home/marc.smith/Downloads/vmlinux-esos.prod (5.4.69-esos.prod)
DUMPFILE: /home/marc.smith/Downloads/dumpfile-1604062993
CPUS: 8
DATE: Fri Oct 30 09:02:56 2020
UPTIME: 2 days, 12:38:15
LOAD AVERAGE: 9.48, 8.89, 7.69
TASKS: 980
NODENAME: node-10cccd-2
RELEASE: 5.4.69-esos.prod
VERSION: #1 SMP Thu Oct 22 19:45:11 UTC 2020
MACHINE: x86_64 (2799 Mhz)
MEMORY: 24 GB
PANIC: "Oops: 0002 [#1] SMP NOPTI" (check log for details)
PID: 18272
COMMAND: "kworker/2:13"
TASK: ffff88841d9e8000 [THREAD_INFO: ffff88841d9e8000]
CPU: 2
STATE: TASK_UNINTERRUPTIBLE (PANIC)
crash> bt
PID: 18272 TASK: ffff88841d9e8000 CPU: 2 COMMAND: "kworker/2:13"
#0 [ffffc90000100938] machine_kexec at ffffffff8103d6b5
#1 [ffffc90000100980] __crash_kexec at ffffffff8110d37b
#2 [ffffc90000100a48] crash_kexec at ffffffff8110e07d
#3 [ffffc90000100a58] oops_end at ffffffff8101a9de
#4 [ffffc90000100a78] no_context at ffffffff81045e99
#5 [ffffc90000100ae0] async_page_fault at ffffffff81e010cf
[exception RIP: atomic_try_cmpxchg+2]
RIP: ffffffff810d3e3b RSP: ffffc90000100b98 RFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000080006
RDX: 0000000000000001 RSI: ffffc90000100ba4 RDI: 0000000000000a6c
RBP: 0000000000000010 R8: 0000000000000001 R9: ffffffffa0418d4e
R10: ffff88841c8b3000 R11: ffff88841c8b3000 R12: 0000000000000046
R13: 0000000000000000 R14: ffff8885a3a0a000 R15: 0000000000000a6c
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffffc90000100b98] _raw_spin_lock_irqsave at ffffffff81cf7d7d
#7 [ffffc90000100bb8] try_to_wake_up at ffffffff810c1624
#8 [ffffc90000100c08] closure_sync_fn at ffffffffa040fb07 [bcache]
#9 [ffffc90000100c10] clone_endio at ffffffff81aac48c
#10 [ffffc90000100c40] call_bio_endio at ffffffff81a78e20
#11 [ffffc90000100c58] raid_end_bio_io at ffffffff81a78e69
#12 [ffffc90000100c88] raid1_end_write_request at ffffffff81a79ad9
#13 [ffffc90000100cf8] blk_update_request at ffffffff814c3ab1
#14 [ffffc90000100d38] blk_mq_end_request at ffffffff814caaf2
#15 [ffffc90000100d50] blk_mq_complete_request at ffffffff814c91c1
#16 [ffffc90000100d78] nvme_complete_cqes at ffffffffa002fb03 [nvme]
#17 [ffffc90000100db8] nvme_irq at ffffffffa002fb7f [nvme]
#18 [ffffc90000100de0] __handle_irq_event_percpu at ffffffff810e0d60
#19 [ffffc90000100e20] handle_irq_event_percpu at ffffffff810e0e65
#20 [ffffc90000100e48] handle_irq_event at ffffffff810e0ecb
#21 [ffffc90000100e60] handle_edge_irq at ffffffff810e494d
#22 [ffffc90000100e78] do_IRQ at ffffffff81e01900
#23 [ffffc90000100eb0] common_interrupt at ffffffff81e00a0a
#24 [ffffc90000100f38] __softirqentry_text_start at ffffffff8200006a
#25 [ffffc90000100fc8] irq_exit at ffffffff810a3f6a
#26 [ffffc90000100fd0] smp_apic_timer_interrupt at ffffffff81e020b2
bt: invalid kernel virtual address: ffffc90000101000 type: "pt_regs"
crash>
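For reference, the "0002" in the PANIC line above is the x86 page-fault error code, printed in hex. The following is only a minimal sketch of how its bits can be decoded. The bit meanings are the architectural ones, but the macro names and the describe_pf_error() helper are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits (architectural definitions). */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access,      1: write access */
#define PF_USER  (1u << 2) /* 0: kernel mode,      1: user mode */
#define PF_RSVD  (1u << 3) /* reserved bit set in a page-table entry */
#define PF_INSTR (1u << 4) /* fault was an instruction fetch */

/*
 * Hypothetical helper (not a kernel function): write a one-line
 * description of a page-fault error code into buf and return buf.
 */
static const char *describe_pf_error(unsigned int code, char *buf, size_t len)
{
	const char *access;

	assert(buf != NULL && len > 0);

	if (code & PF_INSTR)
		access = "instruction fetch from";
	else if (code & PF_WRITE)
		access = "write to";
	else
		access = "read from";

	snprintf(buf, len, "%s %s in %s mode%s",
		 access,
		 (code & PF_PROT) ? "protected page" : "not-present page",
		 (code & PF_USER) ? "user" : "kernel",
		 (code & PF_RSVD) ? ", reserved bit set" : "");
	return buf;
}
```

Calling describe_pf_error(0x0002, buf, sizeof(buf)) gives "write to not-present page in kernel mode". The exception RIP is in atomic_try_cmpxchg, and RDI (the first integer argument register in the x86-64 calling convention) holds 0xa6c, a very low address. That could point to a NULL or near-NULL pointer plus a field offset, but this is a guess I have not verified.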
Looking at the call trace, I see this was the last function from
'bcache' in the trace (linux-5.4.69/drivers/md/bcache/closure.c):
static void closure_sync_fn(struct closure *cl)
{
	struct closure_syncer *s = cl->s;
	struct task_struct *p;

	rcu_read_lock();
	p = READ_ONCE(s->task);
	s->done = 1;
	wake_up_process(p);
	rcu_read_unlock();
}
I believe the frames above closure_sync_fn in the backtrace
(try_to_wake_up, _raw_spin_lock_irqsave, atomic_try_cmpxchg) come
from the wake_up_process() call.
Is the panic perhaps because the task/process being woken is already
gone/finished? I'm not sure where to start looking next. Any help would
be greatly appreciated.
--Marc