Babeld-only network does not work on wireless mesh links #666
A partial fix is to allow the redistribution of local routes, which we currently deny here: lime-packages/packages/lime-proto-babeld/files/usr/lib/lua/lime/proto/babeld.lua (lines 73 to 77 at commit d81c12c).
With this change, the fourth router can ping the internet and the other routers. However, a client connected to the fourth router still cannot ping either the internet or the routers.
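To see why redistributing local routes only partially helps, here is a minimal Python sketch of a longest-prefix-match lookup on the third router. The routing table is hypothetical: the 10.13.0.0/16 LAN mask, the mesh interface name on the third router and the client address 10.13.123.50 are assumptions. It also assumes that redistributing local routes makes Babeld announce each router's own address as a /32 host route.

```python
import ipaddress


def lookup(table, dst):
    """Return the output device chosen by a simplified longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in table if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical table of the third router (10.13.104.0): internet via the
# cable, and the anygw LAN (assumed /16) connected on br-lan.
third_router = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0-1_17"),
    (ipaddress.ip_network("10.13.0.0/16"), "br-lan"),
]

# Assumption: with local routes redistributed, Babeld installs a host route
# to the fourth router's own address through the mesh interface.
third_router_fixed = third_router + [
    (ipaddress.ip_network("10.13.123.8/32"), "wlan1-mesh_17"),
]

print(lookup(third_router, "10.13.123.8"))         # br-lan
print(lookup(third_router_fixed, "10.13.123.8"))   # wlan1-mesh_17
print(lookup(third_router_fixed, "10.13.123.50"))  # br-lan (hypothetical client)
```

The /32 host route beats the connected /16 because it is more specific, so replies to the fourth router now leave through the mesh. A client address that nobody announces still matches only the /16 and is pushed to br-lan, which fits the observation that clients of the fourth router remain unreachable.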
This is not a bug: all the routers are configured on
Don't stop at the title. Currently we have an inconsistent situation in which Babeld decides the next hop and Batman-adv decides how to get there. Sometimes. Other times Babeld also gets to send the packets through the route it chose. For example, when pinging the internet from the fourth router, the next hop for the outgoing packet is decided by Babeld, but the path of the reply is decided by Batman-adv. The outgoing and the incoming packets travel on different interfaces, possibly even with different MTUs.
That is normal: the kernel always chooses the path that seems most convenient, but sometimes that is not the best path from an "omniscient" point of view. The kernel has to take the routing decision with the limited information it has access to, and with as few CPU cycles as possible. So when you ping a host outside your subnet, the packets that leave your device carry the destination IP and the anygw MAC as L2 destination, so they reach L3 at the first anygw node they hit, and from that point on L3 routing is in charge. When a packet comes back from the internet, the first router that has the same subnet as you will think "the destination is on my same broadcast domain, so I have to push it via L2", and from that point on L2 is in charge of routing that packet, even if that may be sub-optimal in some cases. This is how the internet works. You can try to avoid it by using LibreMesh in L3-only mode, but you will need to give each router a unique, non-overlapping subnet, and you will lose roaming capability, because that "stupid routing", as you call it above, is what makes roaming possible. Perfect routing doesn't exist; you need to choose the trade-offs you can live with.
I suggest that we stop releasing LibreMesh with both Batman-adv and Babeld. Instead, I would release LibreMesh-node with just Batman-adv (and the batman-adv-auto-gw-mode package) for small and medium networks, and LibreMesh-border with just Babeld for the border nodes between the LibreMesh-node networks.
The discussion about whether to release an L2+L3 system or something else is in #468.
I just tested a network with only Babeld and no Batman-adv, on top of OpenWrt 18.06 (I initially spotted the problem when testing OpenWrt 19.07-rc2, where it is also present): wireless mesh links do not work.
When all the protocols except "list protocols batadv:%N1" are added to /etc/config/lime-node, routing over the wireless links stops working (while the cabled links still work). Needless to say, when both Batman-adv and Babeld are present everything works, but only because the broken routing is "patched" by the packets flowing through bat0.
My setup has 4 nodes:
Commercial AP with internet - - - - LibreMesh wireless client (10.13.40.169) ------ LibreMesh AP + mesh (10.13.104.0) - - - - LibreMesh AP + mesh (10.13.123.8)
The ping to the internet works from the third router (i.e. the cabled link works) but not from the fourth (i.e. the wireless link does not work).
The Babeld dump (echo dump | nc ::1 30003) looks good on both the third and the fourth router, and does not change when Batman-adv is added or removed.
Third router routes:
Ping to the internet works (via eth0-1_17), and ping to the second router (10.13.40.169) works too, as it is reachable through the LAN network included in br-lan.
Fourth router routes:
Ping to the internet is sent through wlan1-mesh_17, but the reply never arrives because the third router sends it through br-lan, which does not include the mesh interface.
Ping to the third router (10.13.104.0) is sent through br-lan and never even reaches the third router, because the mesh interface is not included in br-lan.
When Batman-adv is present, the connection magically works even though the routes are identical to the ones shown here, but in my opinion it works in a stupid way:
the reply to the ping from the fourth router to the internet finds its way back because bat0 is included in br-lan; the ping to the third router works even though it is sent through br-lan, because br-lan includes bat0, which in turn includes wlan1-mesh_29.
Using tcpdump, I confirmed this behaviour: when pinging the internet, the outgoing ping goes through wlan1-mesh_17 (the Babeld interface) and the reply arrives on wlan1-mesh_29 (included in bat0); when pinging the third router, wlan1-mesh_17 carries no traffic and everything flows through wlan1-mesh_29.
In my opinion this is a bug (it limits the MTU and possibly performance), caused by the fact that Babeld interfaces have only /32 IPv4 addresses, which do not allow packets to be routed through them. I don't know the solution, but I think Babeld should be in charge of announcing more meaningful routes.
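A simplified Python sketch of the fourth router's forwarding decision helps show why a /32 address on the Babeld interface matters. The interface addresses are hypothetical (the actual route tables are not reproduced in this issue), the /16 mask for the anygw LAN is an assumption, and the lookup ignores metrics, source-specific routes and policy rules.

```python
import ipaddress


def lookup(table, dst):
    """Return the output device chosen by a simplified longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical interface addresses on the fourth router. Assumptions: the
# anygw LAN on br-lan is a /16, the Babeld mesh interface carries only a /32.
addrs = {
    "br-lan": ipaddress.ip_interface("10.13.123.8/16"),
    "wlan1-mesh_17": ipaddress.ip_interface("10.13.123.8/32"),
}

# Keep only connected networks that cover other hosts: a /32 covers just
# the interface's own address, so the mesh contributes no on-link subnet.
table = [(i.network, dev) for dev, i in addrs.items() if i.network.num_addresses > 1]

# Default route learned from Babeld through the mesh link.
table.append((ipaddress.ip_network("0.0.0.0/0"), "wlan1-mesh_17"))

print(lookup(table, "8.8.8.8"))      # wlan1-mesh_17
print(lookup(table, "10.13.104.0"))  # br-lan
```

The outgoing ping to the internet leaves through the mesh, while the third router (10.13.104.0) falls inside the LAN prefix and is handed to br-lan. In a Babeld-only setup br-lan does not bridge the mesh interface, so that packet never leaves over the wireless link.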
The option babeld_over_batman proposed in #631 could be used to enable or disable this kind of behaviour, but by default packets routed by Babeld should not pass through bat0.