Debugging a kernel freeze
phil
phil at pjd.me.uk
Fri Dec 1 11:53:23 EST 2017
On 01/12/17 14:05, Victor Ascroft wrote:
> I have an iMX6 running a 4.9 kernel with a custom kernel driver communicating
> with an FPGA over PCIe. The driver is not built in to the kernel but loaded as
> a module after complete boot up. During the running of the system, after a few
> hours the kernel completely freezes. No kernel panics or stack traces, nothing.
> I have access to the serial console.
I've done a lot of work with the i.MX6 and an Altera Cyclone IV FPGA
connected over PCIe, and I've not experienced any major issues with
this setup.
> In such a scenario what are the ways to debug and try locating the source of
> the problem? I am not looking for a solution for my problem but things or
> approaches one can go about trying while trying to fix such a scenario?
This is a difficult situation and it will take a lot of time to debug,
but you really just need to spend time picking apart the driver. Try
disabling parts of it one at a time and adding dynamic debug messages
or tracing, so you can narrow down what was running when it freezes.
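As a rough illustration of the dynamic debug route, here is a small
Python sketch that builds a query for the kernel's dynamic debug control
file and writes it. It assumes the kernel was built with
CONFIG_DYNAMIC_DEBUG, that debugfs is mounted at /sys/kernel/debug, and
that the driver uses pr_debug()/dev_dbg(). The module name "fpga_pcie"
and function name "fpga_irq_handler" are made up for the example.

```python
# Default location of the dynamic debug control file (debugfs must be mounted).
CONTROL = "/sys/kernel/debug/dynamic_debug/control"


def dyndbg_query(module, flags="+p", func=None):
    """Build a dynamic debug query string for one module.

    flags: "+p" enables the messages; extra letters decorate each line,
    e.g. "f" adds the function name and "l" the line number.
    """
    query = f"module {module}"
    if func:
        # Optionally restrict the query to a single function.
        query += f" func {func}"
    return f"{query} {flags}"


def apply_query(query, control=CONTROL):
    """Write the query to the control file (needs root)."""
    with open(control, "w") as f:
        f.write(query + "\n")
```

For example, apply_query(dyndbg_query("fpga_pcie", "+pfl")) run as root
would turn on every debug message in that (hypothetical) module, with
function names and line numbers. With a hard freeze the last lines to
reach the serial console are the interesting ones.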
My first suspicion in these cases, however, is always interrupts.
There have been a few times when our FPGA code had a fault and the
interrupts failed, so my first port of call is usually to disable
interrupts in my driver and replace them with polling from highres
timers. You might also want to look at load balancing the interrupts:
ARM processors keep interrupts on one core (or they did in the kernels
I've been using), and you can either assign the interrupts to other
cores manually or use irqbalance to do so automatically. I preferred
the manual solution, as irqbalance didn't seem to spread my workload
efficiently across the cores. At any rate, you should probably be
monitoring the interrupts.
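For the monitoring and manual assignment, something along these lines
may help. It is only a sketch in Python: it parses two snapshots of
/proc/interrupts, reports how much each IRQ counter moved on each CPU,
and computes the hex CPU mask that /proc/irq/<n>/smp_affinity expects.
The function names are my own, and the IRQ number in the usage note is
just an example.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: [count_per_cpu, ...]}."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device name columns
            per_cpu.append(int(field))
        if per_cpu:
            counts[label.strip()] = per_cpu
    return counts


def irq_deltas(before, after):
    """Per-CPU increase of each IRQ counter between two parsed snapshots."""
    return {
        label: [a - b for a, b in zip(after[label], before[label])]
        for label in after
        if label in before
    }


def affinity_mask(cpus):
    """Hex bitmask for /proc/irq/<n>/smp_affinity, e.g. CPUs [0, 2] -> '5'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def read_snapshot(path="/proc/interrupts"):
    """Read and parse the current interrupt counters."""
    with open(path) as f:
        return parse_interrupts(f.read())
```

Take read_snapshot() twice a second or so apart and pass both results to
irq_deltas(). A device IRQ whose delta drops to zero while the hardware
should be busy, or one that suddenly climbs very fast, is worth a closer
look. To move, say, IRQ 304 to CPU 1 you would write affinity_mask([1])
(that is, "2") to /proc/irq/304/smp_affinity as root.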
Good Luck!
Regards,
Philip Downer