-
Notifications
You must be signed in to change notification settings - Fork 1
Add binary search to get bucket for node #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: p2p
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really solid.
evm/p2p/kademlia.py
Outdated
|
||
def __le__(self, other): | ||
if not isinstance(other, self.__class__): | ||
return super(KBucket, self).__le__(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why this isn't an error condition? If they aren't both KBuckets
should we be doing comparisons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope i can't, changed it to raise a TypeError
evm/p2p/kademlia.py
Outdated
def binary_get_bucket_for_node(buckets, node): | ||
"""Return the bucket for a given node.""" | ||
sorted_buckets = sorted(buckets) | ||
bucket_ends = list(KBucket.end for KBucket in sorted_buckets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
canonical version of this is just to do it as a list comprehension:
bucket_ends = [bucket.end for bucket in sorted_buckets]
Though I like tuples since they are immutable (surprise surprise!) so I'd be fine with this too.
bucket_ends = tuple(bucket.end for bucket in sorted_buckets)
evm/p2p/kademlia.py
Outdated
bucket_ends = list(KBucket.end for KBucket in sorted_buckets) | ||
bucket_position = bisect.bisect_left(bucket_ends, node.id) | ||
# Prevents edge cases where bisect_left returns an out of range index | ||
if bucket_position >= len(buckets): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's functionally the same but I like it better because it optimizes for the success case rather than adding overhead pre-checking for the failure case.
try:
bucket = sorted_buckets[bucket_position]
except IndexError:
raise ValueError("No bucket found for node with id {}".format(node.id))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good to know, i thought about doing this but couldn't think of a good reason to, now i do!
evm/p2p/kademlia.py
Outdated
|
||
def binary_get_bucket_for_node(buckets, node): | ||
"""Return the bucket for a given node.""" | ||
sorted_buckets = sorted(buckets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the reasons why @pipermerriam wanted to use binary search here is to avoid unnecessarily iterating over all buckets in get_bucket_for_node(), but this kind of defeats the purpose, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gsalgado Yeah, didn't think this was the best way to do it. Is it safe to assume that the list of buckets passed in will always be ordered? (looks kinda like that way with how split_bucket's implemented). Or should I implement a bisect.insort_left()
somewhere to make sure the list stays ordered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current implementation of KBucket.split()/Routing.split_bucket(), yeah, the list will always be sorted, so there's no need for the insort_left(), but this function's docstring should make it clear that it assumes the list is sorted. Although this kind of leaves me wondering if all this extra complexity we're adding is really justified, or if this is just premature optimization that will bite us down the road
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's possible, it's hard for me to say. @pipermerriam ?
raise ValueError("No bucket found for node with id {}".format(node.id)) | ||
bucket = sorted_buckets[bucket_position] | ||
if not bucket.start <= node.id <= bucket.end: | ||
raise ValueError("No bucket found for node with id {}".format(node.id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gsalgado Well, kind of as a safe-check, and I was thinking about if a list of buckets was input that was out of range or had breaks - i.e. KBucket(2,3)
and KBucket(5,10)
and node.id was 4 or 1. But it looks as though that's not possible with how split_bucket is implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's not possible given how KBucket.split() is implemented, but more importantly, if the position returned by bisect exists on the list, this check would never fail, would it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If node.id
is for some reason below the lowest bucket.start
or above the highest bucket.end
, though I'm not sure how likely that is?? I can remove the check if it's not.
evm/p2p/test_kademlia.py
Outdated
) | ||
def test_binary_get_bucket_for_node(bucket_list, node_id, correct): | ||
node = random_node() | ||
node.id = node_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can pass a nodeid argument to random_node()
evm/p2p/test_kademlia.py
Outdated
if correct is None: | ||
with pytest.raises(ValueError): | ||
kademlia.binary_get_bucket_for_node(bucket_list, node) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it doesn't make sense to have tests for verify that our tests work as expected, it's a good idea to keep them as simple as possible and avoid any conditional logic in them. In this case that can easily be done by splitting the test into one for the success cases and another for the failure cases
evm/p2p/kademlia.py
Outdated
def __le__(self, other): | ||
if not isinstance(other, self.__class__): | ||
raise TypeError("Cannot compare KBucket with type {}.".format(other.__class__)) | ||
return self.end <= other.end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you compare against other.end instead of other.start?
raise ValueError("No bucket found for node with id {}".format(node.id)) | ||
bucket = sorted_buckets[bucket_position] | ||
if not bucket.start <= node.id <= bucket.end: | ||
raise ValueError("No bucket found for node with id {}".format(node.id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's not possible given how KBucket.split() is implemented, but more importantly, if the position returned by bisect exists on the list, this check would never fail, would it?
# Check for invalid state of KBuckets | ||
if not self.end < other.start: | ||
raise ValueError("Invalid Buckets.") | ||
return self.end < other.start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading this code right, this method can only return True or a ValueError, but never False. Is that what you intended, and, if so, why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, good catch. Was just tinkering with a way to check for invalid state of KBuckets - forgot it was there when I pushed.
👍 looks good to merge. |
# This is the 1st commit message: fixes ethereum#760 ethereum#762 ethereum#737 # The commit message #2 will be skipped: # use eth-utils big endian integer utils # The commit message #3 will be skipped: # Fix IndexError when an empty bucket is encountered while looking up nodes # The commit message #4 will be skipped: # dirty
What was wrong?
Change get bucket for node search from linear to binary
How was it fixed?
Used total_ordering and bisect modules
Cute Animal Picture