Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate TaskManager into NodeGraph and Discovery #445

Merged
merged 40 commits into from
Sep 21, 2022

Conversation

tegefaulkes
Copy link
Contributor

@tegefaulkes tegefaulkes commented Sep 2, 2022

Description

This PR focuses on updating Nodes, Discovery domain to use the new Tasks system.

Generally there are 3 places where background queues are being used. Each of these need to be updated to use the new tasks system from #438.

  1. Discovery system
  2. Nodes pinging and authenticating nodes.
  3. Nodes refreshing buckets on a timeout.
  4. NodeConnectionManager Does the network entry procedure with syncNodeGraph.

SetNode details

In NodeConnectionManager when adding a node to the NodeGraph with nodeManager.setNode we can end up with the case where a bucket is full. When this happens we need to ping nodes within the bucket to determine if they're still alive and remove any nodes that don't respond so we can add the new one. We need to convert this to use the scheduler/queue.

By default the nodeManager.setNode doesn't ping a node to check if it's online before adding it. It is expected that you ping the node before using setNode to add it. We add nodes whenever we interact or discover a node. This can happen in the cases of... We learn about a node from other nodes, the a connection to the learned node hasn't been made so it needs to be pinged. A node has connected to us so it just needs to be added. We connected to a node, details need to be updated. Given that we needed the ability to ping and setNode or just setNode.

RefreshBuckets details

nodeManager.refreshBucket needs to updated to use the new scheduler/queue system. In this case we will be making use of the Scheduler features. A refresh bucket operation needs to be run on a bucket if the bucket hasn't seen activity for an hour. Given this we need schedule each bucket for an hour delay. If a bucket is updated we need to reset the timer for the bucket. To do this we need to make use of the taskPath and timer updating features of the Scheduler. These refreshBucket tasks should be the lowest priority.

Having refreshBuckets system use the Tasks system is a little complex since it works on a kind of watchdog system. here are the requirements of the refreshBucket system

  1. A refreshBucket operation selects a random NodeId within the target bucket's range of nodes and preforms a search for that node.
  2. Each bucket needs to be scheduled to do a refreshBucket operation every hour.
  3. If data is updated within a bucket then this scheduled delay is reset to an hour.

here are some relevant constraints of the tasks system.

  1. A task is scheduled with a delay and is removed when cancelled or completed.
  2. A task state can be checked. It can be scheduled, queued, active, success or failure.
  3. A scheduled task's delay can be updated if it is in the scheduled state.
  4. A task can be 'grouped' using the task's path, We can track a task for a single bucket using a path such as ['refreshBucket', bucketIndex]. This can enable us to find existing tasks for buckets.

Given these constraints I think we need to do the following.

  1. During start we need to iterate over the existing refreshBucket tasks using the tasks.getTasksByPath, reset the delay on existing tasks and create ones for buckets missing them.
  2. When any bucket get updated using nodeManager.setNode then we need to update the delay of that refresh bucket task. This can be done by getting the task for the bucket using tasks.getTasksbyPath and updating the scheduled delay if the task is in the scheduled state. If it is queued or Active then ignore, if not task exist create one. The tasks's delay can be updated by either canceling it and re-creating or using a provided updateDelay method.
  3. The tasks themselves are technically emphemiral so we could remove all refreshBuckets tasks during nodeManager.stop(). Minor detail, won't really make a difference to operation.

NodeConenctionManager details

In the NodeConnectionManager we have a method syncNodeGraph that does the following proceedure. If syncNodeGrap is run with blocking = false then we need to run it in the background.

  1. for each seed node we request the closest nodes to our NodeId. Then go through this list pinging and adding their details to our nodeGraph.
  2. Then for every bucket above the closet node's bucket we run a refreshBucket operation.

Looking over this I think it should be part of NodeManager and making calls to NodeConnectionManager for pinging.

As for how to implement this using Tasks.

  1. Do an initial getRemoteNodeClosestNodes to the seed node. This is blocking but it could also be a task.
  2. The first step then schedules pingAndAdd tasks for each of the closest nodes.
  3. The first step then schedules with 0 delay a refreshBucket operation for every bucket above the closest node's bucket.

Other details

Other aspects to consider. The Kademlia findNode operation in the NodeConnectionManager is considered a single operation but is in essence a priority queue search for the target node. We can consider splitting this up into a compound task where each step in the search can be it's own task within the process. This would apply to the refreshBucket operation since that is doing a findNode as well.

Handlers will need to support cancel-ability. They must take an abort signal and quickly end operation.

Issues Fixed

Tasks

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@ghost
Copy link

ghost commented Sep 2, 2022

👇 Click on the image for a new way to code review
  • Make big changes easier — review code in small groups of related files

  • Know where to start — see the whole change at a glance

  • Take a code tour — explore the change with an interactive tour

  • Make comments and review — all fully sync’ed with github

    Try it now!

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map Legend

@tegefaulkes tegefaulkes changed the base branch from staging to feature-queue September 2, 2022 08:09
@tegefaulkes tegefaulkes changed the title Feature tasks implementation Integrate Tasks into NodeGraph and Discovery Sep 6, 2022
@tegefaulkes
Copy link
Contributor Author

Looking at the syncNodeGraph in the NodeConnectionManager I think we can move that functionality into the NodeManager and avoid having the NodeConnectionManager depend on the Tasks. But it does fit within the context of other NodeConnectionManager methods such as findNode, getClosestGlobalNodes and getRemoteNodeClosestNodes so I'm not sure.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Sep 7, 2022 via email

@tegefaulkes
Copy link
Contributor Author

That function does the initial network entry procedure for Kademlia by getting the closes nodes to yourself, pining and adding them to your grapth. Then setting up initial refreshBuckets operations to fill in the rest of the NodeGraph.

@CMCDragonkai
Copy link
Member

@CMCDragonkai
Copy link
Member

Then we redeploy testnet and focus on #441.

@CMCDragonkai
Copy link
Member

Discovery has a bug according to vscode:

    // If we don't have one then we can't request data so just skip
    if (authIdentityIds === [] || authIdentityIds[0] == null) {
      return undefined;
    }

It has never had a proper review, so it's architecture should be reviewed and probably refactored to align to the model we have developed in the Tasks system, since it's also something that has background tasks. In fact... now that the tasks system centralises background task processing, it's possible that we can have discovery and nodegraph both delegate repeat-processing to the tasks system, and remove their own internal loops, thus simplifying how discovery and nodegraph works!!

@CMCDragonkai CMCDragonkai force-pushed the feature-queue branch 18 times, most recently from 24e2cb4 to 63fde43 Compare September 11, 2022 15:18
@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Sep 21, 2022

Ok, this is good to merge now. There are 3 test domains that need an eyeball still.

  • Discovery
  • tests/nat
  • Nodes - I think I got all the bugs in this but waiting for CI to finish will take a while.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment