Assistance Needed for Kernel-Mode Driver Soft Lockup Issue

Sun Oct 20 08:48:05 EDT 2024

On Saturday, 19 October 2024, Muni Sekhar <
munisekharrms at gmail.com> wrote:

> Dear Linux Kernel Developers,
>
> I am encountering a soft lockup issue in my system related to the
> continuous while loop in the empty_rx_fifo() function. Below is the
> relevant code:
>
>
> #include <linux/io.h> // For readw()
>
> #define FIFO_STATUS 0x0014
> #define FIFO_MAN_READ 0x0015
> #define RX_FIFO_EMPTY 0x01 // Assuming RX_FIFO_EMPTY is defined as 0x01
>
> static inline uint16_t read16_shifted(void __iomem *addr, u32 offset)
> {
>     // Registers are 2 bytes apart: shift the offset left by 1
>     // and add it to the base address
>     void __iomem *target_addr = addr + (offset << 1);
>     uint16_t value = readw(target_addr); // 16-bit MMIO read
>     return value;
> }
>
> void empty_rx_fifo(void __iomem *addr)
> {
>     // Keep reading from the FIFO until it reports empty. Note that this
>     // loop has no timeout and never yields the CPU.
>     while (!(read16_shifted(addr, FIFO_STATUS) & RX_FIFO_EMPTY)) {
>         read16_shifted(addr, FIFO_MAN_READ); // discard one entry
>     }
> }
>
> Explanation:
> Function name: read16_shifted reads a 16-bit value from an offset
> address, with a left shift applied to the offset.
> Operation: it shifts the offset left by 1 (offset << 1), adds it to
> the base address, and reads the value at the resulting address.
>
> The empty_rx_fifo() function is meant to drain the RX FIFO, but I'm
> hitting soft lockups: the kernel log shows repeated soft lockup
> messages, roughly 28 seconds apart (going by the kernel log
> timestamps). Here's an example log:
>
> watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
>
> In all cases, the RIP points to:
> RIP: 0010:read16_shifted+0x11/0x20
>
>
> Analysis:
> The soft lockup seems to be caused by the continuous while loop in the
> empty_rx_fifo() function. The RX FIFO can take a long time to empty,
> sometimes up to 1000 seconds. From the first soft lockup trace onward,
> the message repeats roughly every 28 seconds for the whole 1000-second
> period. After that, the system resumes normal operation.
>
> Questions:
> 1. How should I best handle this kind of issue? Even if the hardware
> takes time, I would like advice on the best approach to prevent these
> lockups.


I guess you could switch to an interrupt-driven model, or run a thread
that checks the status there (I mean: check whether RX is empty, and
release the CPU in between).

> 2. Do soft lockup issues auto-recover like this? Is this something I
> should consider serious, or can it be ignored?


The kernel is telling you that your CPU is stuck in that loop instead of
doing something useful. It only "recovers" because the FIFO eventually
empties, so I would not ignore it.


> I would appreciate any guidance on how to resolve or mitigate this problem.
>
>
> --
> Thanks,
> Sekhar
>


-- 
Regards / Mit besten Grüßen,
Denis