Multipath Routing sends IP datagrams to a wrong next hop with hashing enabled

Viacheslav Biriukov v.v.biriukov at gmail.com
Sun Jul 9 10:23:59 EDT 2023


Hello team, I'd appreciate any help, suggestions or hints regarding my TCP
RST issue with multipath routing.

I see random TCP connection resets. While investigating, I got stuck
trying to understand which part of the kernel, other than the
`ip_route_input_noref()` path, can lead to an `ip_forward()` call.

I have the following setup:

Kernel 6.4.0.

```
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.fib_multipath_hash_policy=1
```

```
# ip r

192.168.200.0/24 dev enp0s8 proto kernel scope link src 192.168.200.5 # network with backends
192.168.222.0/24 dev enp0s9 proto kernel scope link src 192.168.222.5 # client network
10.100.100.0/24 proto bird # VIP network
        nexthop via 192.168.200.101 dev enp0s8 weight 1
        nexthop via 192.168.200.102 dev enp0s8 weight 1
```

```
192.168.222.99 - client IP
192.168.200.101 and 192.168.200.102 - real IPs of 2 backends.
10.100.100.1 - a VIP, backends have it on lo dev.
```

Backends and a client have the router as a default gateway.
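
For reference, with `fib_multipath_hash_policy=1` the next hop is chosen
from a hash over the L4 5-tuple, so every packet of one TCP connection is
expected to map to the same next hop. Below is a minimal Python sketch of
that expectation. The hash here is a stand-in (SHA-256), not the kernel's
seeded flow hash, so the gateway it picks need not match the real router.
`flow_hash()` and `select_nexthop()` are hypothetical names, and the
weighted upper-bound selection is only loosely modelled on how the kernel
splits the hash range between next hops.

```python
import hashlib
import ipaddress

# Next hops and weights taken from the `ip r` output above.
NEXTHOPS = [("192.168.200.101", 1), ("192.168.200.102", 1)]


def flow_hash(src, sport, dst, dport, proto=6):
    """Stand-in 31-bit hash over the 5-tuple (NOT the kernel's hash)."""
    key = b"".join([
        ipaddress.ip_address(src).packed,
        ipaddress.ip_address(dst).packed,
        sport.to_bytes(2, "big"),
        dport.to_bytes(2, "big"),
        bytes([proto]),
    ])
    # Keep 31 bits so the value fits the [0, 2**31) range used below.
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") >> 1


def select_nexthop(h, nexthops=NEXTHOPS):
    """Return the first next hop whose weighted upper bound covers h."""
    total = sum(weight for _, weight in nexthops)
    cumulative = 0
    for gw, weight in nexthops:
        cumulative += weight
        upper_bound = (cumulative * (1 << 31)) // total - 1
        if h <= upper_bound:
            return gw
    return nexthops[-1][0]


if __name__ == "__main__":
    h = flow_hash("192.168.222.99", 46308, "10.100.100.1", 8080)
    # The same 5-tuple always yields the same hash, hence the same gateway.
    print(select_nexthop(h))
```

The point of the sketch is only that the selection is a pure function of
the 5-tuple: as long as the route and its next hops do not change, one
connection should never alternate between gateways.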

An example of an RST, with my findings so far, follows.

Using `bpftrace` I can see that `ip_route_input_noref()` is run 3 times
before the connection reset:

```
ack_seq: 0,         192.168.222.99:46308 -> 10.100.100.1:8080, 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff
ack_seq: 322960818, 192.168.222.99:46308 -> 10.100.100.1:8080, 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff
ack_seq: 0,         192.168.222.99:46308 -> 10.100.100.1:8080, 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff
```

However, `ip_forward()` runs 4 times, and in the middle of the connection
one call carries the wrong `gw` in `skb->_skb_refdst`. It should be
192.168.200.101, but it is 192.168.200.102 instead:

```
ack_seq: 0,         192.168.222.99:46308 -> 10.100.100.1:8080, 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw: 192.168.200.101
ack_seq: 322960818, 192.168.222.99:46308 -> 10.100.100.1:8080, 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw: 192.168.200.101
ack_seq: 322960827, 192.168.222.99:46308 -> 10.100.100.1:8080, 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw: 192.168.200.102
ack_seq: 0,         192.168.222.99:46308 -> 10.100.100.1:8080, 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw: 192.168.200.101
```
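
To spot this kind of mismatch automatically, a small script can group the
trace lines by flow and report any flow that was forwarded to more than
one gateway. This is only a sketch: it assumes each trace record has been
joined onto a single line in the format shown above, and
`gateways_per_flow()` and `inconsistent_flows()` are hypothetical helper
names.

```python
import re
from collections import defaultdict

# One ip_forward() trace record per line, copied from the output above.
TRACE = [
    "ack_seq: 0,         192.168.222.99:46308 -> 10.100.100.1:8080, "
    "8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw: 192.168.200.101",
    "ack_seq: 322960818, 192.168.222.99:46308 -> 10.100.100.1:8080, "
    "8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw: 192.168.200.101",
    "ack_seq: 322960827, 192.168.222.99:46308 -> 10.100.100.1:8080, "
    "8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw: 192.168.200.102",
    "ack_seq: 0,         192.168.222.99:46308 -> 10.100.100.1:8080, "
    "8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw: 192.168.200.101",
]

LINE_RE = re.compile(
    r"ack_seq:\s*(?P<ack>\d+),\s*"
    r"(?P<src>\S+)\s*->\s*(?P<dst>[^,\s]+),"
    r".*?gw:\s*(?P<gw>\S+)"
)


def gateways_per_flow(lines):
    """Map (src, dst) -> list of (ack_seq, gw) in trace order."""
    flows = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m:  # silently skip lines that are not trace records
            flows[(m["src"], m["dst"])].append((int(m["ack"]), m["gw"]))
    return flows


def inconsistent_flows(lines):
    """Return only the flows that were seen with more than one gateway."""
    return {
        flow: hits
        for flow, hits in gateways_per_flow(lines).items()
        if len({gw for _, gw in hits}) > 1
    }


if __name__ == "__main__":
    for (src, dst), hits in inconsistent_flows(TRACE).items():
        print(f"{src} -> {dst}")
        for ack, gw in hits:
            print(f"  ack_seq={ack} gw={gw}")
```

Keeping the `ack_seq` next to each gateway makes it easy to see at which
point in the connection the next hop changed.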

I added logging to `__mkroute_input()` locally to check whether this is a
hashing problem, but it also shows only 3 calls, each with a correct
`rt_gw4` in the `rth` rtable and in `nhc`:

```
192.168.222.99:46308 -> 10.100.100.1:8080, mac: 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw4: 192.168.200.101, rt_gw4:192.168.200.101
192.168.222.99:46308 -> 10.100.100.1:8080, mac: 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw4: 192.168.200.101, rt_gw4:192.168.200.101
192.168.222.99:46308 -> 10.100.100.1:8080, mac: 8:0:27:a8:29:e5 -> 8:0:27:86:65:ff, gw4: 192.168.200.101, rt_gw4:192.168.200.101
```

I cannot work out how the kernel ends up making an additional, incorrect
`ip_forward()` call with no matching `__mkroute_input()` or
`ip_route_input_noref()` call.


Thank you for any hints.

-- 
Viacheslav Biriukov
BR