read the memory mapped address - pcie - kernel hangs
Muni Sekhar
munisekharrms at gmail.com
Thu Jan 9 07:20:30 EST 2020
On Thu, Jan 9, 2020 at 5:07 PM Greg KH <greg at kroah.com> wrote:
>
> On Thu, Jan 09, 2020 at 04:44:16PM +0530, Muni Sekhar wrote:
> > On Thu, Jan 9, 2020 at 1:15 AM Greg KH <greg at kroah.com> wrote:
> > >
> > > On Thu, Jan 09, 2020 at 12:30:20AM +0530, Muni Sekhar wrote:
> > > > Hi All,
> > > >
> > > > I have a module with a Xilinx FPGA. It implements UART(s), SPI(s),
> > > > parallel I/O and interfaces them to the host CPU via the PCI Express bus.
> > > > My system freezes without capturing a crash dump for certain tests.
> > > > I debugged this issue and tracked it down to the ‘readl()’ in the
> > > > interrupt handler code.
> > > >
> > > > The ISR first reads the Interrupt Status register using ‘readl()’ as
> > > > given below.
> > > > status = readl(ctrl->reg + INT_STATUS);
> > > >
> > > > It then clears the pending interrupts using ‘writel()’ as given below.
> > > > writel(status, ctrl->reg + INT_STATUS);
> > > >
> > > > I've noticed a kernel hang if the INT_STATUS register is read again after
> > > > clearing the pending interrupts.
> > >
> > > Why would you read that register again after writing to it?
> > >
> > > And are you sure you are reading/writing the correct size of the irq
> > > field? I thought it was a "word" not "long"? But that might depend on
> > > your hardware, do you have a pointer to the kernel driver source you are
> > > using for all of this?
> > Actually there is no need to read that register again. But reading that
> > register again should not freeze the system, right?
>
> It might, depends on your hardware. Go talk to the hardware vendor if
> you have questions about this.
>
> > The INT_STATUS register is 32 bits wide, so the readl() API is used (my system
> > is x86_64, Intel(R) Atom(TM) CPU). Instead of readl(), do I need to
> > use readw() twice? If so, what is the reason for this code change?
>
> Ok, if that register is 32 bits, that's fine. It all depends on your
> hardware.
>
> > I’m trying to understand why the system freezes without any crash dump
> > while reading memory-mapped I/O from interrupt context?
>
> Because your hardware locked things up?
By hardware, do you mean the PCI controller on the host side or the PCI endpoint (FPGA) device?
>
> > The FPGA code might be buggy; it may not send the completion for a Memory
> > Read request. But the CPU should not get stuck at the LOAD instruction level.
>
> PCI hardware can do lots of bad things to a system, it _IS_ part of the
> memory bus, right? So of course it can lock the CPU at a read.
>
> > When it hangs, it does not even respond to the SYSRQ key (SYSRQ is
> > enabled, and it normally works); the only way to recover is to reboot
> > the system. I enabled almost all the kernel.panic* variables. I set
> > kernel.panic to a positive value, so it should reboot after a panic instead
> > of just hanging. But it does not reboot by itself. Even pstore/ramoops
> > did not help.
> > After reboot I looked at kern.log, and most of the time it has a
> > “^@^@^@^ ...” line just before the reboot.
> >
> > Okay, I will write the minimalistic code to reproduce this one and
> > then share with you guys.
>
> What's wrong with the real/full driver source?
>
> And again, why are you trying to read the register twice?
I’m not the original author of this driver, so I have no idea why it was
implemented like that. Maybe to verify the register contents after
clearing the bits…
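
For reference, here is a minimal user-space sketch of the read-then-clear sequence with the second read dropped. It is only an illustration, not the driver's code: the fake_* names, the struct, and the INT_STATUS offset are hypothetical stand-ins for readl()/writel() and the mapped BAR, and it assumes the register is write-1-to-clear. The all-ones check is also an assumption; a PCIe read that fails commonly returns 0xFFFFFFFF, though a given platform may stall instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offset; the real value comes from the FPGA register map. */
#define INT_STATUS 0x10u

/* Hypothetical stand-in for the mapped BAR so the sketch runs in user space. */
struct fake_ctrl {
    volatile uint32_t regs[16];
};

/* Stand-in for readl(): 32-bit load from the register at byte offset off. */
static uint32_t fake_readl(struct fake_ctrl *c, uint32_t off)
{
    return c->regs[off / 4];
}

/* Stand-in for writel(); models write-1-to-clear behaviour (an assumption). */
static void fake_writel(struct fake_ctrl *c, uint32_t val, uint32_t off)
{
    c->regs[off / 4] &= ~val;
}

/*
 * Read the status once, bail out if nothing is pending or the read looks
 * invalid, then acknowledge exactly the bits that were seen. The register
 * is not read back after the write.
 */
static bool fake_isr(struct fake_ctrl *c, uint32_t *handled)
{
    uint32_t status = fake_readl(c, INT_STATUS);

    /* 0: not our interrupt. All ones: the read likely did not complete. */
    if (status == 0 || status == 0xFFFFFFFFu)
        return false;

    fake_writel(c, status, INT_STATUS);
    *handled = status;  /* work from the saved copy from here on */
    return true;
}
```

In a real handler the two return values would map to IRQ_HANDLED and IRQ_NONE, and any later processing would use the saved status value rather than touching INT_STATUS again.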
>
> thanks,
>
> greg k-h
--
Thanks,
Sekhar