How to debug stuck read?

Dāvis Mosāns davispuh at gmail.com
Wed Feb 2 12:15:14 EST 2022


Hi,

I have a corrupted file on a Btrfs filesystem. CoW is disabled for
this file, so it has no checksums. Trying to read it causes the
process to get stuck forever; the read never returns EIO.

How can I find out why it gets stuck?

$ ddrescue -b 1 currupted_file /tmp/temp
GNU ddrescue 1.26
Press Ctrl-C to interrupt
    ipos:        0 B, non-trimmed:        0 B,  current rate:       0 B/s
    opos:        0 B, non-scraped:        0 B,  average rate:       0 B/s
non-tried:    8388 kB,  bad-sector:        0 B,    error rate:       0 B/s
 rescued:        0 B,   bad areas:        0,        run time:          0s
pct rescued:    0.00%, read errors:        0,  remaining time:         n/a
                             time since last successful read:         n/a
Copying non-tried blocks... Pass 1 (forwards)
^C
// doesn't stop on Ctrl+C or SIGTERM

$ gdb -q -p 3449
Attaching to process 3449
^C
// gdb gets stuck the same way

$ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
folio_wait_bit_common (mm/filemap.c:1314)
filemap_get_pages (mm/filemap.c:2622)
filemap_read (mm/filemap.c:2676)
new_sync_read (fs/read_write.c:401 (discriminator 1))
vfs_read (fs/read_write.c:481)
ksys_read (fs/read_write.c:619)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
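
One more thing that can be checked from userspace is the scheduler state of the stuck task. The following is only a minimal sketch, assuming a Linux /proc layout; the helper names parse_task_state and task_state are hypothetical. A state of D (uninterruptible sleep) would be consistent with the signals above not being acted on.

```python
from pathlib import Path


def parse_task_state(status_text: str) -> str:
    """Return the one-letter state code from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)"; keep only the letter.
            return line.split()[1]
    raise ValueError("no 'State:' line found")


def task_state(pid: int) -> str:
    """Read the state of a live process (hypothetical helper, Linux only)."""
    return parse_task_state(Path(f"/proc/{pid}/status").read_text())
```

Calling task_state(3449) while the read is stuck would show whether the task is sleeping (S), running (R), or in uninterruptible sleep (D).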


I enabled these kernel config options:
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_LOCKUP_DETECTOR=y
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
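
To rule out a stale build, it may be worth confirming that these options are really set in the running kernel. This is a rough sketch under the assumption that the kernel config is available as text (for example from /proc/config.gz, which exists only when the kernel was built with CONFIG_IKCONFIG_PROC=y); the function names are hypothetical.

```python
WANTED = [
    "CONFIG_BTRFS_DEBUG",
    "CONFIG_BTRFS_ASSERT",
    "CONFIG_LOCKDEP",
    "CONFIG_PROVE_LOCKING",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_LOCK_ALLOC",
]


def enabled_options(config_text: str) -> set:
    """Collect the names of all options set to 'y' in .config-style text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_FOO is not set" and are skipped.
        if line.startswith("CONFIG_") and line.endswith("=y"):
            enabled.add(line[: -len("=y")])
    return enabled


def missing_options(config_text: str, wanted=WANTED) -> list:
    """Return the wanted options that are not built in, in the order given."""
    on = enabled_options(config_text)
    return [opt for opt in wanted if opt not in on]
```

The text can be obtained with gzip.decompress(open("/proc/config.gz", "rb").read()).decode() where that file exists; an empty list from missing_options means every listed option is enabled.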

but the only thing that shows up in dmesg is a lot of
BTRFS error (device sdh): invalid lzo header, lzo len 2937060802 compressed len 4096
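
To make the numbers in that message concrete, here is a small sketch. It assumes the LZO-compressed extent starts with a 4-byte little-endian total length, and it uses a deliberately simplified plausibility rule (the declared length must not exceed the compressed length of the extent); the kernel's real check may differ in detail. Both function names are hypothetical.

```python
import struct


def decode_lzo_len(header: bytes) -> int:
    """Interpret the first 4 bytes as a little-endian 32-bit length."""
    (length,) = struct.unpack_from("<I", header)
    return length


def header_plausible(lzo_len: int, compressed_len: int) -> bool:
    """Simplified assumption: a sane header never claims more bytes than
    the compressed extent actually holds."""
    return 0 < lzo_len <= compressed_len
```

With the values from the log, header_plausible(2937060802, 4096) is False: the header claims roughly 2.9 GB inside a 4096-byte extent, which suggests the first bytes of the extent are garbage rather than a real length.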

If I try btrfs send, it gets stuck the same way:
$ btrfs send -v /mnt/fs > /dev/null

$ cat /proc/4712/stack | ./scripts/decode_stacktrace.sh vmlinux
folio_wait_bit_common (mm/filemap.c:1314)
__filemap_get_folio (mm/filemap.c:1690 ./include/linux/pagemap.h:779 mm/filemap.c:1960)
pagecache_get_page (mm/folio-compat.c:126)
send_extent_data (fs/btrfs/send.c:4980 fs/btrfs/send.c:5048 fs/btrfs/send.c:5235) btrfs
process_extent (fs/btrfs/send.c:5575 fs/btrfs/send.c:5959) btrfs
btrfs_ioctl_send (fs/btrfs/send.c:6770 fs/btrfs/send.c:7368 fs/btrfs/send.c:7688) btrfs
_btrfs_ioctl_send (fs/btrfs/ioctl.c:4963) btrfs
btrfs_ioctl (fs/btrfs/ioctl.c:5072) btrfs
__x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:874 fs/ioctl.c:860 fs/ioctl.c:860)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
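
Since /proc/<pid>/stack shows only one thread, it might also help to collect the kernel stack of every thread in the process. The following is a sketch, assuming the usual /proc/<pid>/task/<tid>/stack layout and sufficient privileges to read it; thread_stacks and its proc_root parameter are hypothetical names.

```python
from pathlib import Path


def thread_stacks(pid: int, proc_root: str = "/proc") -> dict:
    """Map thread id -> raw kernel stack text for every thread of pid."""
    stacks = {}
    for task in sorted(Path(proc_root, str(pid), "task").iterdir()):
        try:
            stacks[int(task.name)] = (task / "stack").read_text()
        except OSError:
            # The thread exited in the meantime, or the file is not readable.
            stacks[int(task.name)] = ""
    return stacks
```

Each value can then be piped through ./scripts/decode_stacktrace.sh in the same way as the single-thread output above.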

I'm now using 5.17.0-rc2, but the behavior is exactly the same with 5.16.5.

Thanks!

Best regards,
Dāvis



More information about the Kernelnewbies mailing list