-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve node search over k-buckets (getClosestLocalNode) #212
Comments
With the NodeGraph using sublevels to represent buckets, you can now use If you cannot fill out the k buffer, then just start streaming to the right. Then you can return it. |
Reposting from #326 (comment) I've done some further prototyping on the NodeIds that are in each bucket. And this has an important impact on how we do the The first thing to realise is that And because they are just a sequence of bytes, there is a numeric equivalence of The
Note that this means our The buckets sublevel stores keys in lexicographic order, this means we are in ascending order of bucket indexes by default, and then in each bucket, the node IDs are also in lexicographic order. This is achieved by I believe lexicographic order of the raw A 1-byte NodeId like Supposing lexicographic order of However upon testing with more node IDs, this did not end up being the case. The end result is that for any given
This can be applied to any NodeId using XOR, because the inverse of XOR is XOR. So:
Afterwards, all other Then each bucket has Here's an example to prove this: // Create 100 NodeIds
// They all have a size of 1 byte
// Numerically they represent 0 - 99
const nodeIds: Array<NodeId> = Array.from(
{length: 100},
(_, i) => IdInternal.create<NodeId>(utils.bigInt2Bytes(BigInt(i), 1))
);
const results: Array<any> = [];
for (let i = 0; i < nodeIds.length; i++) {
let bucketIndex;
let distance;
try {
// 77 is the chosen NodeId
bucketIndex = nodesUtils.bucketIndex(nodeIds[77], nodeIds[i]);
distance = nodesUtils.nodeDistance(nodeIds[77], nodeIds[i]);
} catch (e) {
// Ignore the chosen NodeId
}
results.push(
[
i,
nodeIds[i],
bucketIndex,
distance
]
);
}
console.log(results); The output is:
As you can see, bucket 2 & 3 are left-of NodeId 77, while bucket 4 & 5 is right-of NodeId 77 and finally bucket 6 is on the far-left. Additionally, the numeric/lexicographic order for each NodeId does not match an ordered distance. However it is true that all possible NodeIds in particular bucket would be contiguous numerically. This means when getting the closest node IDs, we still need to sort each bucket's item by the distance function. This is bounded by the bucket size since buckets are limited to 20 nodes by default. I'm writing 2 methods that do similar things:
I originally thought that I could have The 2 sort-orders within buckets that are useful are ultimately:
So this means I'll need to bring back the @tegefaulkes - you'll need to beware of this to solve this issue. |
Insertion sort would be nice to use here, but I fear it won't be efficient because arrays will end up being resized. So once you map all the values to distance, you then just sort them all. |
This test was added, so it proves that lexicographic order is equal to |
A |
We cannot do that as I explained in #158. The distances are different because sometimes we want to know the closest nodes to a node that isn't the own node. So there's no way to just index by 1 distance metric. You have to use the way I'm doing it in |
I suggest a signature like:
This is similar to |
Use this comment: #158 (comment) |
With |
In your tests for this, make sure to add in a test that the returned closest nodes should be ordered by distance ascending. Example of doing this: const closestNodeDistances = closestNodes.map(([nodeId]) => nodesUtils.nodeDistance(sourceNodeId, nodeId));
expect(
closestNodeDistances.slice(1).every((distance, i) => {
return closestNodeDistances[i] <= distance;
})
).toBe(true); I'm actually not sure if 2 nodes could have the same distance away from a given node ID, if that's not possible then you can change that to |
I have an implementation that works but it may not be the best solution right now. Since a bucket will not be sorted by distance AND we want to sort the output by the 20 closest nodes. when we get some nodes from a bucket we need to get the whole bucket. My first implementation started by iterating over each node with a limit set. but doing that it was possible to end up stopping half way into a bucket. Since streaming like this it is not easy to tell where the boundary of the bucket is. What I have now is I'm getting each bucket to the left of the starting bucket inclusive and the right of the starting bucket exclusive until I reach the limit. sort this list and truncate it down to the limit. However we can end up trying to read a lot of empty buckets this way. It's possible I can do the stream method until I reach the limit and then read the whole bucket at that last Id. I can then take the union of the stream list, sort and truncate the result. This will give us the best of both worlds. |
Yes you have to acquire the full bucket and sort it by distance. This is fine, because you know that the bucket limit is a fixed relatively small number. This is why we are using
You can tell what the boundary of the bucket is, see how we do this in
You should not be reading any empty buckets. Do it the same way that |
Created Created some tests testing for conditions where
|
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Mind you this just adds the implementation to the |
Copy of old spec for referenceSpecificationCurrently,
However, recall that the buckets are organised according to the bitwise "distance" from the current node to another. Each key in the database represents a bucket index, However, we need to remember how Kademlia functions. Not all of our buckets are going to have nodes in them (that is, there may not even be created buckets in the database at some of these indexes). Therefore, it's not just a simple case of looking at the adjacent buckets then - because you're pretty unlikely to find an immediately adjacent bucket that already has a node in it. So then the bigger question is, how do you inform the search to find the closest nodes? Additional context
TasksAfter prior discussion with @CMCDragonkai, we believe the following process should provide a sufficient improvement over the current exhaustive search. The following should be implemented (or improved): For a node
We could use a |
Need to check the |
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Reviewing how distance bucket index and distance maps over to a Sorted by base distance
Sorted by target distance
Everything within the same bucket sorted by base distance
So based on this observation we can conclude
This is the bucket switching behaviour we discussed when talking about the buckets Knowing this, we should get nodes in the following order. where N is the target's bucket and K is the highest bucket...
The code mostly followed this procedure, I just need to make a small fix. |
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Applied a fix. |
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Implemented `getClosestNodes()` and relevant tests in `NodeGraph.test.ts`. Relates to #212
Specification
To optimise reading out the closes nodes to a target node we need to apply some improvements.
The
getClosestNode
function needs to take anodeId
andlimit
as parameters. The nodeId is the node we're calculating the distance relative to. The limit is how many nodes we wish to return. The limit defaults to thenodeBucketLimit
as per the Kademlia spec.We need to avoid reading out all of the buckets and iterating over empty buckets. This can be achieved by using a
readStream
over thenodeGraphBucketsDb
level. This level contains sub levels for each bucket. Each sub level contains the nodeId:nodeInfo key:value pairs. Using thenodeGraphBucketsDb
level we can iterate over each stored node in bucket order all at once. Note when setting the gt or lt on the stream we need to start from the desired bucket. In this case the starting point is the bucket 'above' the desired starting bucket. the key we want to start from takes the form of aBuffer
with<prefix><higherBucketId><prefix>
. Iterating less than this gives us the target bucket plus all lower buckets. Above this is all of the higher buckets.When we run out of lower buckets we need to iterate over the higher buckets from where we started. If we run into limit while iterating over the nodes we need to get the whole of the last bucket we read. since nodes are out of order within a bucket we need whole buckets to ensure we obtain the closest nodes.
The resulting list is sorted by distance using
nodesUtils.bucketSortByDistance
and the list is truncated down to the provided limit.As implemented
We iterate over the nodes directly across the buckets. the nodes are read out in the following order.
N
When we reach our specified limit we read the whole of the last bucket we've read and add that to the list. we then sort all of the nodes and truncate the list back down to the limit and return that.
Additional context
Tasks
getClosestNodes
implementation to use areadStream
to iterate over each node sequentially across all of the buckets.higher
buckets.The text was updated successfully, but these errors were encountered: