Supporting the "refresh" operation for NodeGraph buckets #345

Closed
3 tasks done
joshuakarp opened this issue Feb 18, 2022 · 13 comments · Fixed by #378
Labels
development Standard development r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices

Comments

@joshuakarp
Contributor

joshuakarp commented Feb 18, 2022

Specification

Kademlia has this to say about refreshing buckets:

"Buckets will generally be kept constantly fresh, due to the traffic of requests travelling through nodes. To avoid pathological cases when no traffic exists, each node refreshes a bucket in whose range it has not performed a node lookup within an hour. Refreshing means picking a random ID in the bucket’s range and performing a node search for that ID. To join the network, a node u must have a contact to an already participating node w. u inserts w into the appropriate k-bucket. u then performs a node lookup for its own node ID. Finally, u refreshes all k-buckets further away than its closest neighbour. During the refreshes, u both populates its own k-buckets and inserts itself into other nodes’ k-buckets as necessary." kademlia spec

Given this, we need to implement the following:

  1. A refresh bucket method needs to be added that does the following. This may need to be part of an async queue, since refreshing a bucket can be expensive.
    1. Picks a random NodeId inside the range of a given bucket.
    2. Performs a node lookup on the generated NodeId.
  2. We need to refresh a bucket if it hasn't been used within a certain time. This defaults to 1 hour. The timeout can be implemented as a priority queue and a single timer tracking the shortest time, so there is no need to maintain a separate timer for every bucket.
    1. Track a timeout for each bucket.
    2. Reset the timeout whenever a bucket is used.
    3. When the timeout on a bucket is triggered, perform a refresh bucket operation on that bucket.
  3. When entering the network, we need to refresh all k-buckets further away than the closest neighbour.
    1. After searching for itself in the network, the joining node needs to refresh every bucket 'above' its closest neighbour's bucket.

refreshBucket is pretty simple. It generates a random NodeId within the targeted bucket and performs a nodeFind operation for that NodeId. We generate the desired 'distance' by generating 32 random bytes, zeroing the bits above the target bucket's bit, and forcing the target bucket's bit to 1. The random NodeId is made by XOR-ing the distance with the base NodeId.
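The distance construction described above can be sketched as follows. This is a minimal illustration, not the Polykey implementation: it assumes NodeIds are 32-byte big-endian `Uint8Array`s and that bucket `i` holds nodes whose XOR distance has bit `i` as its highest set bit. The names `randomDistanceForBucket`, `randomNodeIdInBucket` and `bucketIndexOf` are hypothetical.

```typescript
import { randomBytes } from 'crypto';

const NODE_ID_BYTES = 32; // 256-bit node IDs, as stated in the issue

/**
 * Random XOR distance whose highest set bit is `bucketIndex`
 * (bit 0 is the least significant bit of the last byte).
 */
function randomDistanceForBucket(bucketIndex: number): Uint8Array {
  if (
    !Number.isInteger(bucketIndex) ||
    bucketIndex < 0 ||
    bucketIndex >= NODE_ID_BYTES * 8
  ) {
    throw new RangeError(`bucketIndex out of range: ${bucketIndex}`);
  }
  const distance = new Uint8Array(randomBytes(NODE_ID_BYTES));
  const byteIndex = NODE_ID_BYTES - 1 - Math.floor(bucketIndex / 8);
  const bitInByte = bucketIndex % 8;
  // Zero every byte above the one holding the target bit.
  distance.fill(0, 0, byteIndex);
  // Keep the random bits below the target bit, then force the target bit to 1.
  const lowerBitsMask = (1 << bitInByte) - 1;
  distance[byteIndex] =
    (distance[byteIndex] & lowerBitsMask) | (1 << bitInByte);
  return distance;
}

/** XOR the random distance with our own NodeId to land inside the bucket. */
function randomNodeIdInBucket(
  ownNodeId: Uint8Array,
  bucketIndex: number,
): Uint8Array {
  const distance = randomDistanceForBucket(bucketIndex);
  return ownNodeId.map((byte, i) => byte ^ distance[i]);
}

/** Bucket index of `otherNodeId` relative to `ownNodeId`, or -1 if equal. */
function bucketIndexOf(ownNodeId: Uint8Array, otherNodeId: Uint8Array): number {
  for (let i = 0; i < NODE_ID_BYTES; i++) {
    const xor = ownNodeId[i] ^ otherNodeId[i];
    if (xor !== 0) {
      // Position of the highest set bit, counted from the least significant end.
      return (NODE_ID_BYTES - 1 - i) * 8 + (31 - Math.clz32(xor));
    }
  }
  return -1;
}
```

`bucketIndexOf` is only included to show that a generated NodeId maps back to the bucket it was generated for.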

The activity timers for buckets were implemented in the following way. We maintain a map of bucketIndex -> deadline to track when each bucket needs to be refreshed. We then maintain a single timer that triggers when the closest deadline has passed. When the timer is triggered, we add any bucket that has passed its deadline to the queue and reset the timer to the next closest deadline. When resetting the deadline for a bucket that has been accessed, we just update the map entry for that bucket. If the updated bucket held the closest deadline, then we update the timer again. While a bucket is in the queue, its deadline is disabled by setting it to 0, and it is not considered when updating the timer or adding to the queue.
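A rough sketch of the single-timer scheme described above, under stated assumptions: the class name `BucketRefreshTimers`, its method names, and the injectable `now` clock are all hypothetical and only exist to illustrate the deadline map. A deadline of 0 marks a bucket as disabled, as in the description.

```typescript
class BucketRefreshTimers {
  // bucketIndex -> deadline in ms; 0 means "disabled while queued".
  protected deadlines: Map<number, number> = new Map();
  protected timer?: ReturnType<typeof setTimeout>;
  protected ttl: number;
  protected onExpired: (bucketIndex: number) => void;
  protected now: () => number;

  constructor(
    ttl: number,
    onExpired: (bucketIndex: number) => void,
    now: () => number = Date.now,
  ) {
    this.ttl = ttl;
    this.onExpired = onExpired;
    this.now = now;
  }

  /** Called whenever a bucket is used: pushes its deadline back. */
  public touch(bucketIndex: number): void {
    this.deadlines.set(bucketIndex, this.now() + this.ttl);
    this.reschedule();
  }

  /** Earliest enabled deadline, or undefined if every bucket is disabled. */
  public nextDeadline(): number | undefined {
    let earliest: number | undefined;
    this.deadlines.forEach((deadline) => {
      if (deadline === 0) return;
      if (earliest === undefined || deadline < earliest) earliest = deadline;
    });
    return earliest;
  }

  /** Hands every expired bucket to `onExpired` and disables its deadline. */
  public sweep(): void {
    const now = this.now();
    this.deadlines.forEach((deadline, bucketIndex) => {
      if (deadline !== 0 && deadline <= now) {
        this.deadlines.set(bucketIndex, 0);
        this.onExpired(bucketIndex);
      }
    });
    this.reschedule();
  }

  public stop(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
  }

  /** Keeps exactly one timer, aimed at the closest enabled deadline. */
  protected reschedule(): void {
    this.stop();
    const next = this.nextDeadline();
    if (next === undefined) return;
    this.timer = setTimeout(
      () => this.sweep(),
      Math.max(0, next - this.now()),
    );
  }
}
```

For simplicity this sketch rescans the map and resets the timer on every `touch`. The description above only resets the timer when the closest deadline changes, which avoids the extra work.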

The refreshBucket queue is implemented using a set of unique bucket indexes, so a bucket can only be in the queue once. The queue is digested asynchronously, doing a single refreshBucket at a time. When a refreshBucket operation completes for a bucket, its deadline is reinstated. If a bucket is updated while in the queue, then its deadline is reinstated and it is removed from the queue.
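The set-backed queue could look roughly like the sketch below. `RefreshBucketQueue`, `RefreshFn` and the method names are hypothetical, and the actual refresh work is passed in as a callback so that the sketch stays independent of the real `refreshBucket`. Reinstating deadlines is left to the caller.

```typescript
type RefreshFn = (bucketIndex: number) => Promise<void>;

class RefreshBucketQueue {
  // A Set keeps bucket indexes unique and preserves insertion order.
  protected pending: Set<number> = new Set();
  protected running = false;
  protected draining: Promise<void> = Promise.resolve();
  protected refresh: RefreshFn;

  constructor(refresh: RefreshFn) {
    this.refresh = refresh;
  }

  /** Queues a bucket; adding an already queued bucket is a no-op. */
  public add(bucketIndex: number): void {
    this.pending.add(bucketIndex);
    if (!this.running) this.draining = this.drain();
  }

  /** Removes a bucket that was used while waiting in the queue. */
  public remove(bucketIndex: number): boolean {
    return this.pending.delete(bucketIndex);
  }

  /** Resolves once the current drain has finished (useful when stopping). */
  public idle(): Promise<void> {
    return this.draining;
  }

  /** Processes queued buckets strictly one at a time. */
  protected async drain(): Promise<void> {
    this.running = true;
    try {
      while (true) {
        const next = this.pending.values().next();
        if (next.done) break;
        this.pending.delete(next.value);
        try {
          await this.refresh(next.value);
        } catch (e) {
          // One failed refresh should not stall the rest of the queue.
          console.error(`refreshBucket failed for bucket ${next.value}`, e);
        }
      }
    } finally {
      this.running = false;
    }
  }
}
```

A caller would typically `await queue.idle()` while shutting down, so that no refresh is left dangling.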

Entering the network has also been updated. When NodeConnectionManager.syncNodeGraph completes the initial sync, it now adds any buckets above the closest node's bucket to the refreshBucket queue.
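Selecting the buckets to queue after the initial sync might be sketched like this. It assumes that a higher bucket index means a greater XOR distance and that the closest node sits in the lowest populated bucket. The function names are hypothetical, and returning an empty list when no bucket is populated is an assumption of this sketch.

```typescript
const BUCKET_COUNT = 256; // one bucket per bit of the 256-bit node ID

/** Lowest populated bucket index, i.e. the bucket of the closest known node. */
function closestPopulatedBucket(populated: number[]): number | undefined {
  return populated.length === 0 ? undefined : Math.min(...populated);
}

/** Every bucket index strictly above the closest node's bucket. */
function bucketsToRefreshOnJoin(
  populated: number[],
  bucketCount: number = BUCKET_COUNT,
): number[] {
  const closest = closestPopulatedBucket(populated);
  // Assumption: with no known nodes there is nothing to look up through.
  if (closest === undefined) return [];
  const indexes: number[] = [];
  for (let i = closest + 1; i < bucketCount; i++) indexes.push(i);
  return indexes;
}
```

Each returned index would then be added to the refreshBucket queue instead of being refreshed immediately.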

Additional context

Tasks

  1. Implement refreshBucket as per the above spec.
  2. Implement refresh timeouts for every node bucket as per the above spec
  3. Update how the node enters the network as per the above spec.
@joshuakarp joshuakarp added the development Standard development label Feb 18, 2022
@CMCDragonkai CMCDragonkai self-assigned this Feb 21, 2022
@CMCDragonkai
Member

With the new index sublevel, we can iterate over a key stream on the index, with lt set to the current time minus 24 hours (or whatever the "TTL" is here), and then ping these nodes and remove them if they are not live.
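To illustrate the idea only, the sketch below mimics an `lt` range scan over a sorted in-memory key list. It does not use the real DB API. The key layout (a zero-padded hexadecimal timestamp followed by the node ID) and the names `indexKey` and `staleNodeIds` are assumptions. The padding matters because the index is ordered lexicographically.

```typescript
/** Index key that sorts lexicographically by lastUpdated, then by node ID. */
function indexKey(lastUpdated: number, nodeId: string): string {
  // 14 hex digits cover every millisecond timestamp up to 2^53.
  return lastUpdated.toString(16).padStart(14, '0') + '-' + nodeId;
}

/**
 * Node IDs whose lastUpdated is strictly older than `now - ttl`.
 * `sortedKeys` stands in for the key stream of the index sublevel.
 */
function staleNodeIds(sortedKeys: string[], now: number, ttl: number): string[] {
  const lt = indexKey(Math.max(0, now - ttl), '');
  const stale: string[] = [];
  for (const key of sortedKeys) {
    if (key >= lt) break; // Keys are sorted, so nothing later can be stale.
    stale.push(key.slice(15)); // Drop the 14 hex digits and the separator.
  }
  return stale;
}
```

The returned node IDs are the candidates to ping, with removal only for those that do not respond.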

@CMCDragonkai
Member

Note that the NodeConnectionManager has a TTL for each NodeConnection. This refresh policy would not work like that, because doing so would cause a lot of network activity. Instead, triggering the refresh operation might be based on a single TTL across the entire node when it is inactive. It depends on the spec. Will assign this to @tegefaulkes to find out.

@tegefaulkes
Contributor

Old spec for reference

Specification

The Kademlia paper and Wikipedia page mention a notion of bucket "refreshing":

Refreshing means picking a random ID in the bucket’s range and performing a node search for that ID.

In both the paper and the Wikipedia spec, this is performed in 2 different places:

  1. Bootstrapping: after initially querying the seed node for the k closest nodes to itself, the joining node refreshes all k-buckets further away than the k-bucket the bootstrap (seed) node falls in (alternatively, the paper simply suggests refreshing all k-buckets further away than the closest node you currently have, i.e. find the non-empty bucket with the smallest index).
    • This is quite a straightforward place. We'd simply find our closest node (or use our bootstrap node), and iterate over every bucket, calling getClosestGlobalNodes on a random node ID in the bucket's range.
  2. Inactivity: a bucket is refreshed if no node lookups have been performed in its range for tRefresh (an hour in basic Kademlia); see http://xlattice.sourceforge.net/components/protocol/kademlia/specs.html#refresh.
    • This is a little trickier. We'd need a means of continuously evaluating the "liveness" of a bucket.
    • We already have a TTL for NodeConnections that we could use. Additionally, we also have a lastUpdated field in every node's entry in the NodeGraph. Both of these could be utilised for this purpose.

It allows the NodeGraph to be in a relatively "fresh" state, and would also slightly mitigate the impact of Kademlia poisoning (i.e. by removing "inactive"/invalid nodes from our own NodeGraph, and replacing them with new nodes).

Recall that we have 256 buckets on each node (because we have a 256 bit node ID). It seems like this might be quite a resource heavy process if we perform 256 refreshes, but this will need to be investigated.

Also note that we already have a refreshBuckets operation, but this is completely unrelated to this procedure. It is used when our node ID changes (on key renewal, etc.) and we need to re-place the nodes in their correct buckets according to our new node ID. A better name for this function might be reorganiseBuckets.

Additional context

Tasks

  1. Further investigate the benefits of the refresh operation
  2. Add getClosestGlobalNodes calls to the bootstrapping process for a new node
  3. Add means of evaluating the liveness of a bucket
  4. Use the liveness of a bucket to routinely make calls to getClosestGlobalNodes on "stale" buckets

tegefaulkes added a commit that referenced this issue Apr 11, 2022
This method performs the Kademlia `refreshBucket` operation. It selects a random node ID within the bucket and performs a search for that node ID. The process exchanges node information with any nodes it connects to.

#345
@tegefaulkes
Contributor

I'm implementing something like a queue for refreshing buckets. Refreshing a bucket can be pretty expensive and can potentially take a while, but this all depends on the findNode implementation. We may need to revisit how findNode is implemented and check that it's working efficiently. The queue makes sure that we are only doing one refreshBucket at a time, sequentially.

For tidiness, we need to wait for the current refreshBucket operation to resolve before we can finish stopping the NodeGraph; otherwise we have the potential for a dangling promise. But since refreshBucket can take quite a while to finish, we may need a way to cancel a currently running one.

@CMCDragonkai
Member

CMCDragonkai commented Apr 11, 2022

Shouldn't refresh bucket be done in the background? As in part of our queuing system too?

With #297 we can do a proper cancellation.

@tegefaulkes
Contributor

That's the idea I'm going for. I think at this rate we may just need to make a generic async queue class. I feel like we might need it again later.

@CMCDragonkai
Member

There are some ideas in #329 that would involve using the DB, which can provide a generic queue.

tegefaulkes added a commit that referenced this issue Apr 11, 2022
Added queuing for `refreshBucket`. This means that buckets will be refreshed one at a time sequentially. This is to avoid doing a lot of costly refreshing all at once.

Added inactivity timeouts for buckets. If a bucket hasn't been touched for a while (1 hour by default), a refresh bucket operation for it is added to the queue. Timers are disabled for buckets already in the queue. Only 1 timer is used for all buckets, since only one of them can have the shortest deadline and that's all we really care about.

#345
@tegefaulkes
Contributor

I don't entirely like the naming of the new methods I've made. Looking for suggestions for better names.

@tegefaulkes
Contributor

We may need to review how the node currently enters the network with syncNodeGraph. See if it needs more updating.

tegefaulkes added a commit that referenced this issue Apr 12, 2022
`nodeConnectionManager.syncNodeGraph` now refreshes all buckets above the closest node as per the kademlia spec. This means adding a lot of buckets to the refresh bucket queue when an agent is started.

#345
@tegefaulkes
Contributor

tegefaulkes commented Apr 12, 2022

Some potential problems I've noticed that may not be a part of this Issue.

  1. nodeConnectionManager.syncNodeGraph is not pinging the nodes before it adds them. We may need to implement a concurrent pinging queue to process them without running them all at once or just one at a time. The refresh buckets operation has to be done after the pinging and adding have completed.
  2. refreshBucket, and by extension the findNode operation, needs a way to be aborted, possibly using an abort controller. It can be a slow process and will currently block the stopping of the NodeManager, so we need a way to cancel it early without leaving a dangling promise.
  3. We may need some generic queue utility classes for quickly creating in-memory and persistent async queues. This keeps coming up and will likely be needed again in the future. We need a standard way of creating and handling these queues. This should likely be its own issue.

tegefaulkes added a commit that referenced this issue Apr 14, 2022
Added support for cancelling a `refreshBucket` operation. This allows faster stopping of the `NodeManager` by aborting a slow `refreshBucket` operation. This has been implemented with the `AbortController`/`AbortSignal` API. This API is not fully supported by Node 14, so we're using the `node-abort-controller` package to provide the functionality for now.

#345
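As a rough illustration of the cancellation pattern, the sketch below checks an `AbortSignal` between lookup steps. It uses the global `AbortController` found in newer Node versions rather than the `node-abort-controller` package mentioned in the commit. `refreshBucketSketch` and `QueryContact` are hypothetical stand-ins for the real `refreshBucket`/`findNode` logic.

```typescript
// Hypothetical stand-in for querying one contact during a node lookup.
type QueryContact = (signal: AbortSignal) => Promise<void>;

/**
 * Queries contacts one after another and stops early once aborted.
 * Resolves with the number of contacts queried, or rejects on abort.
 */
async function refreshBucketSketch(
  contacts: QueryContact[],
  signal: AbortSignal,
): Promise<number> {
  let queried = 0;
  for (const query of contacts) {
    // Checking between steps bounds how long a stop has to wait.
    if (signal.aborted) {
      throw new Error('refreshBucket aborted');
    }
    await query(signal);
    queried++;
  }
  return queried;
}

// Usage sketch: a stop() method would call controller.abort() and then await
// the running promise, so that no dangling promise is left behind.
//
//   const controller = new AbortController();
//   const running = refreshBucketSketch(contacts, controller.signal);
//   controller.abort();
//   await running.catch(() => {});
```

Passing the signal down to each step also lets a slow network call end early, provided the underlying call supports it.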
@tegefaulkes
Contributor

We should look at using the generic queue class for this now. But we may need to extend Queue's functionality first since the implementation here uses a set for uniqueness. Queue's functionality will be expanded by #329.

@CMCDragonkai
Member

Currently this is making use of its own refresh queue, separate from the nodes ping and set queue.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 4 End to End Networking behind Consumer NAT Devices label Jul 24, 2022