How to find a bug with lost network messages
Sandro Stiller
sandro.stiller at elfin.de
Tue Feb 2 04:09:20 EST 2016
Hello,
I'm struggling with a network driver (sllin[1]) which is not in the
official kernel.
It has a lot in common with the slcan driver but is used for LIN networks.
The problem is that messages passed to the network layer via
netif_rx() sometimes do not reach all listening programs.
This is how the driver works:
1. The application sends a CAN message to the network interface
2. The driver forwards it to the UART (tty)
3. The UART receives the same message (single-wire connection, RX and TX
connected) and sends it back to the network layer
4. The sending application receives the previously sent message and can
check for transmission errors and appended LIN slave replies.
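
As a rough illustration of the check in step 4, the comparison could look
like the C sketch below. The names check_echo and enum echo_result are
hypothetical and are not taken from sllin or from my application. The sketch
assumes that the echoed frame keeps the CAN ID of the sent frame and that an
appended LIN slave reply shows up as extra trailing data bytes.

```c
#include <linux/can.h>   /* struct can_frame, CAN_MAX_DLEN */
#include <string.h>      /* memcmp */

/* Result of comparing a sent frame with the frame echoed back (hypothetical). */
enum echo_result {
    ECHO_MISMATCH,    /* ID differs, echo is shorter, or payload bytes differ */
    ECHO_EXACT,       /* echo is identical to what was sent */
    ECHO_WITH_REPLY   /* echo starts with the sent bytes and has extra ones */
};

/*
 * Compare the frame that was sent with the frame read back from the socket.
 * Assumption: extra trailing bytes in the echo are the slave reply.
 */
enum echo_result check_echo(const struct can_frame *sent,
                            const struct can_frame *echoed)
{
    if (sent->can_id != echoed->can_id)
        return ECHO_MISMATCH;

    /* Reject impossible lengths before touching the data arrays. */
    if (sent->can_dlc > CAN_MAX_DLEN || echoed->can_dlc > CAN_MAX_DLEN)
        return ECHO_MISMATCH;

    /* The echo must contain at least everything that was sent. */
    if (echoed->can_dlc < sent->can_dlc)
        return ECHO_MISMATCH;

    if (memcmp(sent->data, echoed->data, sent->can_dlc) != 0)
        return ECHO_MISMATCH;

    return (echoed->can_dlc == sent->can_dlc) ? ECHO_EXACT : ECHO_WITH_REPLY;
}
```

A result of ECHO_MISMATCH would then be treated as a transmission error by
the caller.
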
Sometimes the last step (4.) stops working after 10 to 40 seconds of
transmission.
The application's blocking read() on the socket does not return the
message, although other processes (candump running on the same
interface) do receive it. netif_rx() always returns 0.
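
To make a missing echo visible instead of blocking forever, the read could
be given a timeout. The C sketch below is only an illustration under
assumptions: open_can_raw and read_frame_timeout are hypothetical helper
names, and the interface name and timeout value are chosen by the caller. It
uses only the standard SocketCAN raw socket calls and SO_RCVTIMEO.

```c
#include <errno.h>
#include <linux/can.h>    /* struct can_frame, struct sockaddr_can, CAN_RAW */
#include <net/if.h>       /* if_nametoindex */
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>     /* struct timeval */
#include <unistd.h>

/*
 * Open a raw CAN socket bound to ifname with a receive timeout.
 * Returns the file descriptor, or -1 on any error.
 * Note: a timeout of 0 ms means "no timeout", i.e. read() blocks forever.
 */
int open_can_raw(const char *ifname, int timeout_ms)
{
    struct sockaddr_can addr;
    struct timeval tv;
    unsigned int ifindex = if_nametoindex(ifname);
    int fd;

    if (ifindex == 0)
        return -1;                      /* no such interface */

    fd = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    if (fd < 0)
        return -1;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.can_family = AF_CAN;
    addr.can_ifindex = (int)ifindex;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}

/*
 * Read one frame. Returns 1 if a complete frame arrived, 0 if the receive
 * timeout expired first, and -1 on any other error or short read.
 */
int read_frame_timeout(int fd, struct can_frame *frame)
{
    ssize_t n = read(fd, frame, sizeof(*frame));

    if (n == (ssize_t)sizeof(*frame))
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}
```

A return value of 0 right after sending a frame would mark the point at
which the echo went missing, which can then be correlated with the candump
output.
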
If more programs are listening (running multiple instances of candump),
the problem appears less often or never.
On my PC there is no problem; it occurs only on ARM.
I'm using kernel 4.1.
Can you give me a hint where to search for the cause of this behaviour?
Thank you very much.
Sandro
[1]: https://github.com/sstiller/sllin/tree/master/sllin