How to debug stuck read?

FMDF fmdefrancesco at gmail.com
Sun Feb 6 06:01:02 EST 2022


On Wed, Feb 2, 2022 at 10:50 PM Dāvis Mosāns <davispuh at gmail.com> wrote:
>
> On Wed, Feb 2, 2022 at 21:13, Matthew Wilcox
> (<willy at infradead.org>) wrote:
> >
> > On Wed, Feb 02, 2022 at 07:15:14PM +0200, Dāvis Mosāns wrote:
> > > I have a corrupted file on BTRFS which has CoW disabled thus no
> > > checksum. Trying to read this file causes the process to get stuck
> > > forever. It doesn't return EIO.
> > >
> > > How can I find out why it gets stuck?
> >
> > > $ cat /proc/3449/stack | ./scripts/decode_stacktrace.sh vmlinux
> > > folio_wait_bit_common (mm/filemap.c:1314)
> > > filemap_get_pages (mm/filemap.c:2622)
> > > filemap_read (mm/filemap.c:2676)
> > > new_sync_read (fs/read_write.c:401 (discriminator 1))
> >
> > folio_wait_bit_common() is where it waits for the page to be unlocked.
> > Probably the problem is that btrfs isn't unlocking the page on
> > seeing the error, so you don't get the -EIO returned?
>
>
> Yeah, but how do I find where that happens?
> Anyway, by pure luck I found a memcpy that wrote outside of the allocated
> memory, and fixing that solved this issue, but I still don't know how to
> debug this properly.
>
There is no special recipe for debugging "this properly" :)

You wrote that "by pure luck" you found a memcpy() that wrote beyond the
limit of the allocated memory. I suppose that you found that faulty memcpy()
somewhere in one of the functions listed in the stack trace.

That's the right approach! You read the call chain, find out where something
looks wrong, and then fix it. This is why stack traces are so helpful.
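
As a rough illustration of "reading the call chain", the decoded frames can be
turned into a list of file:line locations to open one by one. This is only a
sketch: the helper name parse_frames() and the regular expression are my own
assumptions about the format printed by scripts/decode_stacktrace.sh, based on
the four frames quoted above. It is not a tool from the kernel tree.

```python
import re

# Assumed frame format, taken from the quoted trace:
#   "filemap_read (mm/filemap.c:2676)"
# A trailing "(discriminator N)" after the line number is ignored.
FRAME_RE = re.compile(
    r"^\s*(?P<func>[\w.]+)\s+\((?P<file>[^:()\s]+):(?P<line>\d+)"
)


def parse_frames(trace):
    """Return (function, file, line) tuples in the order they appear.

    Hypothetical helper for illustration; lines that do not look like
    decoded frames are skipped.
    """
    frames = []
    for raw in trace.splitlines():
        match = FRAME_RE.match(raw)
        if match:
            frames.append(
                (match["func"], match["file"], int(match["line"]))
            )
    return frames


# The frames quoted earlier in this thread.
TRACE = """\
folio_wait_bit_common (mm/filemap.c:1314)
filemap_get_pages (mm/filemap.c:2622)
filemap_read (mm/filemap.c:2676)
new_sync_read (fs/read_write.c:401 (discriminator 1))
"""

# Print one location per frame, innermost first, ready to open in an editor.
for func, path, line in parse_frames(TRACE):
    print(f"{path}:{line}  {func}()")
```

The innermost frame comes first, so the list starts where the task is
sleeping and works outward towards the system call.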

It was not "pure luck". I think that you did what developers usually do after
decoding a stack trace. If not, how did you find that faulty memcpy() buried
somewhere in 40 million lines of code?

It seems that you've found the right way to track down problems in code
that you had (probably) never worked on or read before you hit that bug.

Have you sent a patch to the LKML? If not, please do so.

Regards,

Fabio M. De Francesco


