If a packet is heard multiple times, rebroadcast using the highest hop limit #5534

erayd · 2024-12-09T09:18:06Z

Sometimes a packet will be in the TX queue waiting to be transmitted, when it is overheard being rebroadcast by another node with a higher remaining hop limit. When this occurs, modify the pending packet in the TX queue to avoid unnecessarily wasting hops.

This is intended to help solve the problem of a router node with excellent coverage (A) hearing a packet bounce around between several poorly-placed nodes while waiting for the opportunity to rebroadcast. On a busy mesh, this can mean that the first copy of the packet that (A) hears may have had its hop limit prematurely reduced, and a better version is subsequently received.

This is more relevant to the late-rebroadcast scenario implemented in #5528, but is still applicable even without that mode, particularly when having to contend with poorly-placed rooftop routers.

For example:

(E) transmits a packet with hop limit 3
(B) & (D) both overhear it. (D) starts rebroadcasting it first with hop limit 2, (B) overhears (D) and waits.
(B) and (C) are both ready to transmit. (C) happens to go first, rebroadcasting with hop limit 1.
(A) hears the packet from (C), and is about to rebroadcast with hop limit 0, when it hears (B) starting to send.
(B) rebroadcasts the packet with hop limit 2.
(A) hears the packet again from (B), this time with hop limit 2.
(A) adjusts the hop limit on the packet in its TX queue from 0 to 1, and then transmits the packet.

…p limit Sometimes a packet will be in the TX queue waiting to be transmitted, when it is overheard being rebroadcast by another node, with a higher hop limit remaining. When this occurs, modify the pending packet in the TX queue to avoid unnecessarily wasting hops.

fifieldt · 2024-12-09T09:46:02Z

This makes sense to me

GUVWAF · 2024-12-09T13:11:09Z

Implementation looks good, but I'm not entirely sure about the idea behind it, as it effectively means you're artificially increasing the hop limit for certain branches of the mesh. Not sure this is necessarily a bad thing, as it wouldn't happen if the optimal route was chosen in the first place, but it does go against the idea of limiting the amount of rebroadcasts by setting a proper hop limit.

Also, a small issue with this is that a traceroute result will show a route with more hops than the hops_away accompanying the packet, which is also used to add “unknown” nodes to the route.

erayd · 2024-12-09T13:20:48Z

I'm not entirely sure about the idea behind it, as it effectively means you're artificially increasing the hop limit for certain branches of the mesh. Not sure this is necessarily a bad thing, as it wouldn't happen if the optimal route was chosen in the first place, but it does go against the idea of limiting the amount of rebroadcasts by setting a proper hop limit.

The way I see it, is that it is correcting an unintended (and suboptimal) lowering of hop count for those sections. Two different ways of looking at the same situation perhaps? This wouldn't change the number of rebroadcasts in that area, but it would allow the packet to ultimately travel further, rather than getting lost between high sites due to routing shenanigans at the origin area.

Also, a small issue with this is that a traceroute result will show a route with more hops than the hops_away accompanying the packet, which is also used to add “unknown” nodes to the route.

Thanks for flagging this. I do not yet understand how traceroute is implemented (have yet to investigate that part of the source), so will dig into that to ensure it is treated properly.

GUVWAF · 2024-12-09T14:21:39Z

This wouldn't change the number of rebroadcasts in that area

For this specific packet, it does (otherwise this won't have any effect).

Two different ways of looking at the same situation perhaps?

I guess it depends on whether users are already accounting for this or not. Even with this solution, consuming hops too early might still happen if the packet already left the Tx queue, or if the first packet arrived with a hop limit of 0. So, if users set their hop limit based on the maximum they have seen, this only leads to more unnecessary rebroadcasts. Only if they set it exactly to the one needed for the optimal route, then this would indeed higher the success rate without any additional unnecessary rebroadcasts.

GUVWAF · 2024-12-09T14:26:51Z

I do not yet understand how traceroute is implemented

It will be called upon as a module, and then modifies the payload, before adding it to the Tx queue again. When you see a duplicate packet, it will be filtered out and thus the modules are not called for it, so the original traceroute result will be in the Tx queue.
I've struggled with this before, since you need to be careful with handling duplicate packets multiple times, for example because it shouldn't be sent to the phone and MQTT twice.

erayd · 2024-12-09T22:59:13Z

For this specific packet, it does (otherwise this won't have any effect).

Can you clarify? I'm not seeing where the extra rebroadcast is in the local area. The only additional rebroadcast should be out of the problem area, by whichever relaying device would otherwise have seen a hop limit of 0 (in this case, A).

The fundamental question is, do you see this behaviour as desirable or undesirable in the general case? If the latter, would you be more comfortable if I implemented this as part of #5528, to apply only to ROUTER_LATE?

I think it's definitely warranted in the ROUTER_LATE case, because those nodes are intended to be serving isolated pockets of nodes that do a poor job amongst themselves first. But with other roles, the usefulness is much more marginal.

GUVWAF · 2024-12-10T18:05:45Z

It might just be we’re not agreeing on what the “(local/problem) area” is. In my opinion it doesn’t really matter where the additional rebroadcasts happen, if they are unnecessary then it should be avoided. It is unnecessary when an end-node rebroadcasts even though there is no new receiver, but it is especially relevant for direct messages, where most nodes do not need to receive the packet at all. For example here, if node 0 sends a DM to node 1 with a hop limit of 2, it also goes to node 2 and 3, and ends up with a hop limit of 0 at node 4. However, when it arrives later via 5, with this PR node 4 would still rebroadcast.

do you see this behaviour as desirable or undesirable in the general case?

If users are conservative in setting the hop limit, then it would be desirable, but in my experience this is mostly not the case.

I think it's definitely warranted in the ROUTER_LATE case, because those nodes are intended to be serving isolated pockets of nodes that do a poor job amongst themselves first.

The ROUTER_LATE should serve isolated pocket of nodes that did not hear the packet yet. The first packet that arrives at the ROUTER_LATE should not be coming from those isolated pockets of nodes that consume too many hops.

But with other roles, the usefulness is much more marginal.

I agree this will happen more frequently if there are ROUTER_LATE roles in the mesh. However, it will then also affect ROUTERs and REPEATERs, because even when the route via a ROUTER_LATE uses less hops, its packet likely arrives later than a suboptimal route.

Maybe we can come to a compromise and only introduce this after ROUTER_LATE and the NextHopRouter (#2856) are introduced? With the NextHopRouter unnecessary relaying for DMs will already be limited, while using this PR, for broadcasts the reachability might in some cases be improved.

erayd · 2024-12-11T10:33:45Z

It might just be we’re not agreeing on what the “(local/problem) area” is.

I think that may well be the case. I'm looking at this from the principle of "The packet should be spread as widely as possible across the mesh with the fewest number of hops". In that context, what I consider to be unnecessary rebroadcasts are those generated by clients who missed seeing a prior nearby repetition of the packet (e.g. due to terrain obstructions). Whereas you seem to be coming at it from the perspective of "Users are probably trying to reach only people very close to them, and packets should be discouraged from travelling longer distances." (unless I have badly misunderstood your example scenario in the above post).

In my opinion it doesn’t really matter where the additional rebroadcasts happen, if they are unnecessary then it should be avoided.

Agreed. However, currently we do not have a way to divine the user's intent - they might be trying to reach node 1, but they may also be wanting to reach node 6, or perhaps even something further away than that which would require relaying via (6). This is particularly the case with broadcast packets (e.g. if (6) and a closer node share a common channel, the user may desire them both to receive the packet). In the common channel case, the user may not even know how far away (6) is, or whether they're a member of that channel.

With DMs, it's more straightforward - the destination node ID is in the packet header. Albeit that we don't (yet) have a mechanism to determine where on the network that node might be located, and which hops may be necessary to reach it. Looking forward to #2856 for that!

do you see this behaviour as desirable or undesirable in the general case?

If users are conservative in setting the hop limit, then it would be desirable, but in my experience this is mostly not the case.

Is it not that users bump the hop limit up because they are struggling to reach their target otherwise? I know that is certainly the case at the edges of our local mesh here... distant users increase their hopcount, because otherwise their packets don't reliably make it through the mess of mountains & hills to the person 400km away that they're trying to communicate with.

I think it's definitely warranted in the ROUTER_LATE case, because those nodes are intended to be serving isolated pockets of nodes that do a poor job amongst themselves first.

The ROUTER_LATE should serve isolated pocket of nodes that did not hear the packet yet. The first packet that arrives at the ROUTER_LATE should not be coming from those isolated pockets of nodes that consume too many hops.

This makes no sense. ROUTER_LATE does not exist as a one-way thing to only deliver packets to those pockets of users. For it to properly serve those users, also exists to receive packets from them and relay those out to the rest of the mesh. In many cases, due to the topography, that node will be the only path available for such traffic to take.

But with other roles, the usefulness is much more marginal.

I agree this will happen more frequently if there are ROUTER_LATE roles in the mesh. However, it will then also affect ROUTERs and REPEATERs, because even when the route via a ROUTER_LATE uses less hops, its packet likely arrives later than a suboptimal route.

By the time the ROUTER_LATE version of a packet arrives, anything with a ROUTER or REPEATER role would have already rebroadcast that packet (and will therefore completely ignore the ROUTER_LATE version) unless the one from ROUTER_LATE is the first time it was received. This optimisation is only relevant to packets that are already sitting in the TX queue - it doesn't retain them there to wait for better options, and it doesn't rebroadcast it a second time. This behaviour is purely opportunistic. Hence why I say the benefits are more marginal for other roles, because with other roles things get flushed out of the TX queue much more rapidly. The sooner they move, the less this helps.

Maybe we can come to a compromise and only introduce this after ROUTER_LATE and the NextHopRouter (#2856) are introduced? With the NextHopRouter unnecessary relaying for DMs will already be limited, while using this PR, for broadcasts the reachability might in some cases be improved.

I'd be OK with that 🙂. It's a feature I'm keen to have available on our local mesh sooner than that - because it's clear from experience that we definitely do need it - but we can always just use custom binaries on our sites where that matters, so waiting until after #2856 and #5528 are merged won't cause us any problems.

GUVWAF · 2024-12-11T19:40:02Z

"Users are probably trying to reach only people very close to them, and packets should be discouraged from travelling longer distances." (unless I have badly misunderstood your example scenario in the above post).

My point is that the hop limit is a global setting, it applies to every packet, so you have to set it to the maximum you expect you need. However, it is also used for the case where you send a direct message to a node close to you.

ROUTER_LATE does not exist as a one-way thing to only deliver packets to those pockets of users. For it to properly serve those users, also exists to receive packets from them and relay those out to the rest of the mesh.

Ah, I was thinking about the use-case for a ROUTER_LATE on a rooftop that serve CLIENT_MUTE devices in the house, but yes, when they are normal CLIENTs and the packets go the other way around, you’re right.

a ROUTER or REPEATER role would have already rebroadcast that packet

Yes, if it is in front of the Tx queue, and there will not be any new packet received in between from another ROUTER/REPEATER or a node that didn’t hear the first packet (hidden node problem).

Anyway, let's merge this after ROUTER_LATE and the NextHopRouter (and hopefully after fixing the traceroute issue).

erayd · 2024-12-11T19:54:29Z

Anyway, let's merge this after ROUTER_LATE and the NextHopRouter (and hopefully after fixing the traceroute issue).

Sounds good 🙂. And yes, I agree that the traceroute issue you've identified is something that I should resolve before this is merged.

GUVWAF · 2024-12-11T19:56:33Z

src/mesh/RadioLibInterface.cpp

+/**
+ * Clamp the hop limit to the greater of the hop count provided, or the hop count in the queue
+ */
+void RadioLibInterface::clampHopsToMax(NodeNum from, PacketId id, uint32_t hop_limit)


Can you add this function to the SimRadio also?

I'm not familiar with the SimRadio part of the codebase yet (or at least, I don't recognise the name), but assuming there are no show-stoppers, yes 🙂.

It's used with https://github.com/meshtastic/Meshtasticator?tab=readme-ov-file#interactive-simulator to simulate multiple nodes on a Linux system (or Docker container).

It should just work if you simply copy this function :)

Or, maybe better to implement it in RadioInterface directly.

Does RadioInterface have access to the TX queue? I didn't think it did... that's why I put this in RadioLibInterface in the first place.

Yeah, sorry, it doesn't. Indeed it has to be in RadioLibInterface and then unfortunately needs to be copied to SimRadio.

erayd · 2024-12-12T17:19:16Z

@caveman99 Please don't force-push my branches. No issue with you keeping it up to date if you want to, but unnecessarily breaking sync is irritating. Merge commits work just fine.

garthvh · 2024-12-12T17:30:21Z

@caveman99 Please don't force-push my branches. No issue with you keeping it up to date if you want to, but unnecessarily breaking sync is irritating. Merge commits work just fine.

This is our process, having one branch managed differently than the others is not really going to work.

erayd · 2024-12-12T17:36:54Z

@garthvh

This is our process, having one branch managed differently than the others is not really going to work.

#5528 indicates otherwise, as you folks have been using merge commits there without issue. And on several other PR branches, including some of them by @caveman99. So using merge commits is clearly accepted practice here.

Why is doing things in a way that doesn't break sync OK for those branches, but not this one?

caveman99 · 2024-12-12T18:39:12Z

Preferred is rebase, sometimes you can't rebase, then we use merge commits.

erayd · 2024-12-12T19:30:19Z

@caveman99

Preferred is rebase, sometimes you can't rebase, then we use merge commits.

Roger. Looking at the PR list though, I'm still seeing a fair bit of merge activity in cases where a rebase would have worked fine (including my initial example of #5528) - so it doesn't seem that this preference is being adhered to in practice across the board. If that is something you want to insist on though, then I will just put up with the annoyance as the price of engaging with this project.

Rebase-first-by-maintainers for regularly keeping every PR updated on a repo this busy does seem an odd way of doing things IMO. Given the frequency of updates, you'd be breaking things on a near-constant basis. If this preference is something you want to insist on, might I suggest that you mention it in the default notes when somebody creates a PR? Currently it just requests that editing by maintainers be enabled, but makes no mention that one's work might get repeatedly rebased without warning. That is a useful thing to know upfront, especially for PRs that may involve a notable number of distinct commits between pushes.

thebentern requested a review from GUVWAF December 9, 2024 11:40

erayd marked this pull request as draft December 9, 2024 13:24

GUVWAF reviewed Dec 11, 2024

View reviewed changes

caveman99 force-pushed the overheard-hoptimisation branch from e520ba6 to 6ea0fed Compare December 12, 2024 15:46

erayd force-pushed the overheard-hoptimisation branch from 6ea0fed to ea42503 Compare December 12, 2024 17:11

Merge branch 'master' into overheard-hoptimisation

a4cfa57

erayd force-pushed the overheard-hoptimisation branch from 8f75b28 to a4cfa57 Compare December 21, 2024 10:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

If a packet is heard multiple times, rebroadcast using the highest hop limit #5534

If a packet is heard multiple times, rebroadcast using the highest hop limit #5534

erayd commented Dec 9, 2024 •

edited

Loading

fifieldt commented Dec 9, 2024

GUVWAF commented Dec 9, 2024

erayd commented Dec 9, 2024

GUVWAF commented Dec 9, 2024

GUVWAF commented Dec 9, 2024 •

edited

Loading

erayd commented Dec 9, 2024

GUVWAF commented Dec 10, 2024

erayd commented Dec 11, 2024 •

edited

Loading

GUVWAF commented Dec 11, 2024 •

edited

Loading

erayd commented Dec 11, 2024 •

edited

Loading

GUVWAF Dec 11, 2024

erayd Dec 11, 2024

GUVWAF Dec 11, 2024

erayd Dec 11, 2024 •

edited

Loading

GUVWAF Dec 11, 2024

erayd commented Dec 12, 2024

garthvh commented Dec 12, 2024

erayd commented Dec 12, 2024 •

edited

Loading

caveman99 commented Dec 12, 2024

erayd commented Dec 12, 2024

If a packet is heard multiple times, rebroadcast using the highest hop limit #5534

Are you sure you want to change the base?

If a packet is heard multiple times, rebroadcast using the highest hop limit #5534

Conversation

erayd commented Dec 9, 2024 • edited Loading

fifieldt commented Dec 9, 2024

GUVWAF commented Dec 9, 2024

erayd commented Dec 9, 2024

GUVWAF commented Dec 9, 2024

GUVWAF commented Dec 9, 2024 • edited Loading

erayd commented Dec 9, 2024

GUVWAF commented Dec 10, 2024

erayd commented Dec 11, 2024 • edited Loading

GUVWAF commented Dec 11, 2024 • edited Loading

erayd commented Dec 11, 2024 • edited Loading

GUVWAF Dec 11, 2024

Choose a reason for hiding this comment

erayd Dec 11, 2024

Choose a reason for hiding this comment

GUVWAF Dec 11, 2024

Choose a reason for hiding this comment

erayd Dec 11, 2024 • edited Loading

Choose a reason for hiding this comment

GUVWAF Dec 11, 2024

Choose a reason for hiding this comment

erayd commented Dec 12, 2024

garthvh commented Dec 12, 2024

erayd commented Dec 12, 2024 • edited Loading

caveman99 commented Dec 12, 2024

erayd commented Dec 12, 2024

erayd commented Dec 9, 2024 •

edited

Loading

GUVWAF commented Dec 9, 2024 •

edited

Loading

erayd commented Dec 11, 2024 •

edited

Loading

GUVWAF commented Dec 11, 2024 •

edited

Loading

erayd commented Dec 11, 2024 •

edited

Loading

erayd Dec 11, 2024 •

edited

Loading

erayd commented Dec 12, 2024 •

edited

Loading