Debugging a kernel freeze

phil phil at pjd.me.uk
Fri Dec 1 11:53:23 EST 2017


On 01/12/17 14:05, Victor Ascroft wrote:
> I have a iMX6 running a 4.9 kernel with a custom kernel driver communicating
> with a FPGA over PCIe. The driver is not built in to the kernel but loaded as
> a module after complete boot up. During the running of the system, after a few
> hours the kernel completely freezes. No kernel panics or stack traces, nothing.
> I have access to the serial console.

I've done a lot of work with the imx6 and an Altera Cyclone IV FPGA 
connected via PCIe bus and I've not experienced any major issues with 
this setup.

> In such a scenario what are the ways to debug and try locating the source of
> the problem? I am not looking for a solution for my problem but things or
> approaches one can go about trying while trying to fix such a scenario?

This is a difficult situation and it will take a lot of time to debug,
but there is no real shortcut: you need to spend time picking apart the
driver. Try disabling parts of it one at a time, and add dynamic debug
messages or tracing so you can see what it was doing just before the
freeze.
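
As a rough sketch of the dynamic debug part: assuming the kernel was
built with CONFIG_DYNAMIC_DEBUG and debugfs is mounted at
/sys/kernel/debug, the driver's pr_debug()/dev_dbg() messages can be
switched on at runtime by writing a query to the control file. The
helper names and the module name "fpga_pcie" are made up for
illustration, not taken from your driver.

```python
from pathlib import Path

# Usual location of the control file; assumes debugfs is mounted at
# /sys/kernel/debug and the kernel has CONFIG_DYNAMIC_DEBUG enabled.
CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def dyndbg_query(module, enable=True, func=None):
    """Build a dynamic debug query for one module, optionally one function."""
    parts = ["module", module]
    if func is not None:
        parts += ["func", func]
    # +p turns the matching debug messages on, -p turns them off.
    parts.append("+p" if enable else "-p")
    return " ".join(parts)


def apply_query(query, control=CONTROL):
    """Write a single query to the control file (needs root)."""
    with open(control, "w") as f:
        f.write(query + "\n")
```

Run as root, apply_query(dyndbg_query("fpga_pcie")) would enable every
debug message in that (hypothetical) module, and passing func= narrows
it to one function, which helps when disabling parts of the driver one
at a time.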

My first suspicion in these cases, however, is always interrupts.
There have been a few times when our FPGA code had a fault and the
interrupts failed, so my first port of call is usually to disable
interrupts in my driver and replace them with polling from
high-resolution timers (hrtimers). You might also want to look at load
balancing the interrupts. ARM processors keep interrupts on one core (or
they did in the kernels I've been using), and you can either manually
assign the interrupts to other cores or use irqbalance to do so
automatically. I preferred the manual solution, as irqbalance didn't
seem to spread my workload efficiently across the cores. At any rate,
you should be monitoring the interrupt counts while the system runs.
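
A minimal sketch of the monitoring and manual assignment, assuming the
usual /proc/interrupts layout (a header row of CPUn columns, then one
"label: counts..." row per interrupt). The function names are my own.
Comparing two snapshots taken a few seconds apart shows whether the
FPGA's interrupt count has stopped moving or is running away.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: [count_cpu0, count_cpu1, ...]}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        per_cpu = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device description
            per_cpu.append(int(field))
        if per_cpu:
            counts[label.strip()] = per_cpu
    return counts


def snapshot(path="/proc/interrupts"):
    """Read and parse the current interrupt counters."""
    with open(path) as f:
        return parse_interrupts(f.read())


def irq_deltas(before, after):
    """Increase in each interrupt's total count between two snapshots."""
    return {irq: sum(after[irq]) - sum(before.get(irq, [])) for irq in after}


def affinity_mask(cpus):
    """Hex CPU bitmask in the form written to /proc/irq/<n>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")
```

For example, affinity_mask([1]) gives "2", and writing that string to
/proc/irq/<n>/smp_affinity as root asks the kernel to route interrupt
<n> to CPU1. Whether the write is accepted depends on the interrupt
controller driver, so check the result rather than assuming it took
effect.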

Good Luck!

Regards,

Philip Downer



More information about the Kernelnewbies mailing list