Graph Neighbor Packets #341
Conversation
NeighborInfo module
For more context, this PR is a result of this Discourse discussion on performing network analysis within Meshtastic networks. The idea is that a new firmware module would allow each given node to choose to transmit which other nodes it has heard from out onto the network. Each node would record the last time it heard from each of its neighbors, then periodically broadcast that information in a `NeighborInfo` packet. The idea is that clients could listen for these `NeighborInfo` packets and build up a picture of the network topology from them.
Thanks for this PR. Questions:
The last two are especially important since, at the code's current level of maturity, we can't accept any changes that break compatibility.
Note: This change should be tested on an NRF52 device to observe the actual change in device memory. It was that target that we used to define our memory cap limit.
Also:
... when/if this is implemented, we will need that data wiped when the modem configuration changes. The SNR data is only valid for a single set of modem configs.
@mc-hamster I can speak to the problem that this PR solves. We're interested in building out functionality to allow a network administrator to get a sense of the topology of a given Meshtastic mesh. To do this, we need a way to determine the connections between any given node and the nodes around it (its neighbors). Once we have enough information on these connections, we can build out a graph data structure and analyze the strength of the network from there.

We had a discussion with @GUVWAF a couple of months ago (link) about how we can collect this information on node-node connections. Originally we were looking into the TraceRoute module, but we decided that the cost of running a TraceRoute analysis from each node to each other node would be far too expensive. From this, we came up with the idea where each node on the network would listen for which nodes it receives packets from (single-hop only), store this information, and transmit aggregate information on all its neighbors onto the network. Clients could then listen for these `NeighborInfo` packets and assemble the graph from them.

Per my current understanding, it's not possible for a node to determine who most recently relayed any given packet, since only the original sender is recorded in the packet header. Since this is not possible, we would rely on the data in the new `NeighborInfo` packets. I remember hearing some discussion around adding support for nodes determining who most recently relayed a given packet within the packet header, and assuming this functionality eventually gets implemented, we would then look to rely on that information as opposed to the current approach.

As an important side note, we would be looking to make this functionality opt-in on a given network, meaning nodes would have to intentionally choose to send information on their neighbors out into the network.
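To make the client-side idea concrete, here's a minimal sketch of assembling a graph from collected neighbor reports. All type and field names here are hypothetical stand-ins for illustration, not the actual protobuf definitions from this PR:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one reported neighbor entry.
struct Neighbor {
    node_id: u32,
    snr: f32, // SNR of the last packet heard from this neighbor
}

/// Hypothetical stand-in for one NeighborInfo report heard off the mesh.
struct NeighborInfo {
    reporter_id: u32,
    neighbors: Vec<Neighbor>,
}

/// Build an undirected adjacency list from a batch of reports.
fn build_topology(reports: &[NeighborInfo]) -> HashMap<u32, Vec<u32>> {
    let mut graph: HashMap<u32, Vec<u32>> = HashMap::new();
    for report in reports {
        for neighbor in &report.neighbors {
            // Record the edge in both directions.
            graph.entry(report.reporter_id).or_default().push(neighbor.node_id);
            graph.entry(neighbor.node_id).or_default().push(report.reporter_id);
        }
    }
    graph
}
```

From a structure like this, standard graph analyses (connectivity, articulation points, and so on) fall out directly.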
Thanks for the feedback! I'm going to answer the last four questions (@ajmcquilkin has the first one).
Practically, a neighbor is the ID of an adjacent node that has been heard from within some period of time (message freshness).
Note: part of the reason the roles are confusing here is that we're using
This change as a whole does not remove any fields from the protobufs and should be backwards compatible. The only exception would be the NodeDB changes: I'm unsure how strict the current NodeDB size limit is, and whether changing it slightly would break things, especially across different types of hardware. Can anyone advise?
The reason this PR changes the NodeDB is simply to store a small amount of info on the nodes that a node has heard from most recently. This info could be stored anywhere -- there's no reason to specifically store it in the NodeDB other than that it seemed natural. If these changes to the NodeDB aren't backwards compatible, I'd love to find an alternate approach!
Given the limited information currently available, I think this is a clever way to get an insightful overview of the network for users to opt-in for. It’s indeed less impactful to the network than exhaustively using TraceRoute to each node. @uhuruhashimoto Do you have an example of a graph that can be created with this? I think that would clarify the goal as well.
Yeah, but it would require a breaking change. On the other hand, since we currently still have 4 bits of the header flags unused, these could be used to hold the hop limit setting of the original transmitter. In this way, you can determine with how many hops a packet was received (and thus also whether the original sender is a direct neighbor).
All very awesome! Any change to the header must provide core functionality, as it adds overhead to every packet sent. Even with protobufs, changes here are not completely free. I'd advise finding somewhere else to make that change; from my point of view, I would not approve any change to the header that won't always be used. As an opt-in feature, consider adding the trace route data to the telemetry module/feature. That may give you the detail you're looking for in a more real-time form. Also note that the "nearest neighbor" is nearest on the network but would not necessarily be nearest physically; this is due to the "low SNR based prioritization" built into our routing protocol.
Let's put the data elsewhere. Every small bit of info added to the NodeDB is multiplied n times and reserved in memory for every node, even if the network isn't at its maximum size.
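A quick back-of-the-envelope on that multiplication effect; the slot count and field size below are purely illustrative, not actual Meshtastic values:

```rust
// Any field added to a NodeDB entry is reserved once per slot, whether or
// not the mesh actually has that many nodes.
const NODEDB_SLOTS: usize = 80;    // hypothetical NodeDB capacity
const LAST_HEARD_BYTES: usize = 4; // e.g. one u32 timestamp per entry

/// Total RAM cost of adding one small per-node field.
const fn added_bytes() -> usize {
    NODEDB_SLOTS * LAST_HEARD_BYTES
}
```

Even a single 4-byte field costs 320 bytes under these assumptions, which is why storing neighbor data outside the NodeDB proper is attractive.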
Overall, the concept being presented here is exciting. This is visibility that people have been requesting for some time. Let's make sure that our impact on the total system architecture is clean, and especially be mindful of adding overhead to the network. The core functionality of messaging must not be negatively impacted.
Let's leave those four bits unused for now. This functionality doesn't need to track every single packet, and for the packets that are tracked, we can add the original hop limit to those ... e.g., telemetry, message, etc.
Definitely worth consideration. It would also allow us to persist it to the file system separately. We have support for maps in proto3 now, and I think this could be a great use case for them.
If we do go into memory
Thanks for the feedback on the header! Mostly throwing that idea out there just for the record, although totally get not wanting to add something to the header that's not mission-critical for the messaging case. On the impact on the network, in my head I'm imagining this functionality to be implemented as a module with the opt-in as discussed above and a configurable transmission interval to allow users to increase/decrease bandwidth hits as needed.
I'm not very familiar with the telemetry module, so I'm not sure if I'm following @mc-hamster's suggestion on how this functionality would fit in with the existing telemetry packets.
Thanks for the heads-up! This hadn't occurred to me. I don't see this as being an issue, especially if the nodes on the network enable location, but it's definitely something that's good to acknowledge. Also, was the PR close intentional?
I've worked to keep Telemetry's scope to be pristinely metrics. I think your approach of having the separate portnum (and eventual module) makes the most sense. That way it's opt-in. I think we do need to explore storing that neighbors list outside of NodeDB proper though.
I think that was an accident. I reopened. I want to track the discussion and evolution of this in the current PR. We'll wait to integrate when all of the major stakeholders are satisfied.
Another thought I had reviewing the data model... do we need to send the rx_time timestamp of the last received message for a node? I understand storing it locally, but my assumption is that its purpose is to determine whether the link with the neighbor is still relevant in the topology. Could we instead store it on the device but omit it from the neighbor payload based on an expiry? We would selectively filter out "stale" neighbors from that message before sending to the mesh. @GUVWAF I'd be curious about your perspective on this as well.
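The expiry-based filtering described above could look roughly like this; the struct, function name, and timestamp representation are all assumptions for illustration:

```rust
/// Hypothetical local record of a heard neighbor.
struct NeighborRecord {
    node_id: u32,
    rx_time: u32, // seconds; stays on the device, never transmitted
}

/// Keep only neighbors heard within `expiry_secs`; the rx_time itself is
/// dropped, so the outgoing payload would carry only the fresh node IDs.
fn fresh_neighbor_ids(records: &[NeighborRecord], now: u32, expiry_secs: u32) -> Vec<u32> {
    records
        .iter()
        .filter(|r| now.saturating_sub(r.rx_time) <= expiry_secs)
        .map(|r| r.node_id)
        .collect()
}
```

The trade-off discussed in the following comments is that once the timestamp is filtered out on-device, the client can no longer judge edge freshness itself.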
If I'm understanding your question correctly, I believe the reason we wanted to include the time a packet was received from the neighbor was to allow the clients, rather than the device, to choose when the edges between neighbors are no longer relevant. An underlying assumption I had was that the transmission time of the `NeighborInfo` packet alone wouldn't tell the client how recent each individual edge is. The other thought that comes to mind is that multiple devices having different timeout times (assuming the timeout times would be configurable) could potentially cause issues on the client, but that's something I'd probably have to think through a bit more. @uhuruhashimoto do you have any thoughts? If we could drop that field, I imagine that would be great for bandwidth.
That's an interesting question! I think the core of it is how we're defining validity for an edge. I've been thinking of it in terms of two kinds of time. If we define time in "sending cycles," we can make a blanket assumption that all edges from the previous sending cycle are valid. With a long period, this assumption could be untrue. With a short period, we take a bandwidth hit and could start dropping packets/losing info. If we want a more granular view, we can look at the specific time that each packet was received by a node (the rx time itself). This would make sense to do if our period were very long or if our graph topology were changing rapidly. Currently, we're just making the assumption that edges haven't dropped in the last period (the first assumption) and not doing anything with the rx time. Because we're using the same packet to deliver info and collect info, all of our edges will be one period old anyway, so the specific rx time doesn't seem as important. We also tag each packet with its tx time. ...So basically what @ajmcquilkin said :)
If you don't have plans to use it immediately, don't add it yet. We've had to do a lot of cleanup (some of it my own fault) of unused fields in the protos. Will leave it up to you.
Thanks! I'll check if I missed some specific reason we need it, and if not I'll take it out. The simpler the better! |
oops |
Removing myself as a reviewer. This is going in the right direction. |
The proposal is fine, I'd just try to minimize protocol-related traffic. If you can make one slightly bigger packet instead of two, it's easier on the network. I have the same dilemma with the Store&Forward heartbeat, and would rather ditch that extra packet every 5 minutes in favor of including the capability in the device telemetry. Maybe we can do something similar here? @thebentern (telemetry guy) thoughts?
Following up on the question of the current usage of the `rx_time` field, this is how the client currently resolves two reports of the same edge:

```rust
// Keep whichever neighbor report is more recent for this edge.
let most_recent_data = if neighbor.rx_time > opposite_neighbor.rx_time {
    neighbor
} else {
    &opposite_neighbor
};

snr_hashmap.insert(
    as_key(node_id_1, node_id_2),
    GraphEdgeMetadata::new(most_recent_data.snr, most_recent_data.rx_time),
);
```

Currently we haven't implemented graph edge timeout, but we're also looking to use the `rx_time` field for that purpose. I'll give an example I've been thinking through in my head. Assume all nodes on a network are set to transmit their `NeighborInfo` packets every 5 minutes (purely illustrative), and that the client clears a given edge at t=11min.

The additional consequence of not having the `rx_time` field is that the client must assume the report it received last is the most recent. Suppose the client receives two weights for the same edge, one measured at t=1min and one at t=4min. If the client received the t=1min edge weight after the t=4min edge weight (meaning the client would have to assume the t=1min edge weight was more recent), that would mean the weight of the edge would be 10min old at clear time t=11min. If the client received the t=4min weight after the t=1min weight (and would assume correctly that the t=4min edge weight was more recent), then the weight of the established edge would be 7min old at clear time t=11min. If we instead had the `rx_time` field, the client could always identify the genuinely most recent measurement regardless of arrival order.
Also note that this isn't a concern in terms of missing or forgetting to drop edges, but rather a concern that we have the potential to significantly increase how long an edge is considered valid, based on the time difference between the reception of a packet by the device and the reception of the resulting edge by the client. This being said, if cutting the size of the `NeighborInfo` packet is the priority, I'd be open to dropping the field. This is mostly to summarize my thoughts on this -- I would love to hear others' thoughts too.
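The t=1min / t=4min scenario above, worked numerically; the helper function is just illustrative:

```rust
/// Age of an edge's data at the moment the client clears it, given which
/// measurement the client believes is the latest one (times in minutes).
fn edge_age_at_clear(assumed_latest_measurement_min: u32, clear_time_min: u32) -> u32 {
    clear_time_min - assumed_latest_measurement_min
}
```

If packets arrive out of order and the t=1min report is mistaken for the latest, the edge data is 10 minutes old at the t=11min clear; with correct ordering it would be 7 minutes. An explicit rx_time removes that ambiguity.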
That could be far too frequent. That'd easily become a large tenant on the network. Think of how much network utilization you deem adequate and scale it based on the estimated network capacity. There's a preexisting library to help with real-time estimation.
Agreed. I think we'd definitely want to default the interval to something closer to our standard 15 minute interval. |
Granted, sorry I should have made it more clear that the 5min interval was purely illustrative. If we move the timeout duration to 20min and the retransmission interval to 15min (both configurable), I would argue the problems discussed above would become significantly more pronounced from the perspective of the client UX. Maybe this is moving more to a firmware discussion, but I think a smart edge broadcast would also be an interesting idea to reduce the bandwidth hit, even at a 15min interval. |
Good point about the timestamps. If the device already filters out stale neighbors before transmitting, the client shouldn't need the timestamp at all.
As for the smart broadcast: that would be a bit tricky I think, because you wouldn't know if the edge should be removed or if it just hasn't changed significantly.
Removed tx and rx times as per discussion. The one thing I'm not quite clear on about this PR is the max size of the `NeighborInfo` packet (i.e. the max number of neighbors it can hold).
Is this a reason for raising the max number of neighbors per packet? Do we lose bandwidth if we do so? |
In my mind, the reason is two-fold:
Yes and no. The MeshPacket max payload size is 237 bytes. If you want to transmit neighbour info, you can raise `max_count: 10` until you reach that limit. Unused spots will not be used in encoding, so the resulting packets will be smaller. Right now the calculated NeighborInfo_size is 142, so you could go to `max_count: 17` to reach 233, which will just fit.
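That arithmetic, spelled out. The 13-bytes-per-neighbor figure is inferred from the stated 142 bytes at `max_count: 10`, not read from the encoder:

```rust
const MAX_PAYLOAD: usize = 237;  // MeshPacket max payload size, per above
const SIZE_AT_10: usize = 142;   // calculated NeighborInfo_size at max_count: 10
const PER_NEIGHBOR: usize = 13;  // implied worst-case bytes per encoded Neighbor
const BASE: usize = SIZE_AT_10 - 10 * PER_NEIGHBOR; // 12 bytes of fixed overhead

/// Worst-case encoded size for a given max_count.
const fn neighbor_info_size(max_count: usize) -> usize {
    BASE + max_count * PER_NEIGHBOR
}
```

Under this model, `max_count: 17` yields 233 bytes (fits), while 18 would exceed the 237-byte payload.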
Goal

This PR sets up the protobuf changes necessary for every node to obtain and send info on who its neighbors are. It will have a companion firmware PR that sets up a `NeighborInfo` module.

Implementation

This PR adds two new types of packet to the protobufs:
- `Neighbor`
- `NeighborInfo`

Bandwidth Implications

We're setting each `NeighborInfo` packet to hold a static field of 10 `Neighbor`s, giving the packet a size of ~80 bytes. This increases the strain on the network. If we expect a small number of neighbors in most cases, this can be improved by limiting these packets to 5 `Neighbor`s and sending multiple packets if necessary.

Testing

Tested previously on simulator and in hardware with multiple nodes.