How to find a bug with lost network messages

Sandro Stiller sandro.stiller at elfin.de
Tue Feb 2 04:09:20 EST 2016


Hello,

I'm struggling with a network driver (sllin[1]) which is not in the 
official kernel.
It has a lot in common with the slcan driver but is used for LIN networks.
The problem is that messages passed to the network layer via netif_rx() 
sometimes don't arrive in all listening programs.

This is how the driver works:
1. The application sends a CAN message to the network interface.
2. The driver forwards it to the UART (tty).
3. The UART receives the same message (single-wire connection, RX and TX 
connected) and the driver passes it back to the network layer.
4. The sending application receives the previously sent message and can 
check it for transmission errors and appended LIN slave replies.

Sometimes step 4 fails after 10 to 40 seconds of transmission.
The application, which uses a blocking read() on the socket, does not 
receive the message, but other processes do receive it (for example 
candump running on the same interface). netif_rx() always returns 0.

If more programs are listening (running multiple instances of candump), 
the problem appears less often or never.
On my PC there is no problem; it occurs on ARM only.
I'm using kernel 4.1.

Can you give me a hint where to search for the cause of this behaviour?

Thank you very much.

Sandro


[1]: https://github.com/sstiller/sllin/tree/master/sllin



More information about the Kernelnewbies mailing list