Skip to content

Conversation

njgheorghita
Copy link

What was wrong?

Change get bucket for node search from linear to binary

How was it fixed?

Used total_ordering and bisect modules

Cute Animal Picture

giphy-downsized-large 3

Copy link

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really solid.


def __le__(self, other):
if not isinstance(other, self.__class__):
return super(KBucket, self).__le__(other)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this isn't an error condition? If they aren't both KBuckets should we be doing comparisons?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope i can't, changed it to raise a TypeError

def binary_get_bucket_for_node(buckets, node):
"""Return the bucket for a given node."""
sorted_buckets = sorted(buckets)
bucket_ends = list(KBucket.end for KBucket in sorted_buckets)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

canonical version of this is just to do it as a list comprehension:

bucket_ends = [bucket.end for bucket in sorted_buckets]

Though I like tuples since they are immutable (surprise surprise!) so I'd be fine with this too.

bucket_ends = tuple(bucket.end for bucket in sorted_buckets)

bucket_ends = list(KBucket.end for KBucket in sorted_buckets)
bucket_position = bisect.bisect_left(bucket_ends, node.id)
# Prevents edge cases where bisect_left returns an out of range index
if bucket_position >= len(buckets):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's functionally the same but I like it better because it optimizes for the success case rather than adding overhead pre-checking for the failure case.

try:
    bucket = sorted_buckets[bucket_position]
except IndexError:
    raise ValueError("No bucket found for node with id {}".format(node.id))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to know, i thought about doing this but couldn't think of a good reason to, now i do!


def binary_get_bucket_for_node(buckets, node):
"""Return the bucket for a given node."""
sorted_buckets = sorted(buckets)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the reasons why @pipermerriam wanted to use binary search here is to avoid unnecessarily iterating over all buckets in get_bucket_for_node(), but this kind of defeats the purpose, no?

Copy link
Author

@njgheorghita njgheorghita Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsalgado Yeah, didn't think this was the best way to do it. Is it safe to assume that the list of buckets passed in will always be ordered? (looks kinda like that way with how split_bucket's implemented). Or should I implement a bisect.insort_left() somewhere to make sure the list stays ordered?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current implementation of KBucket.split()/Routing.split_bucket(), yeah, the list will always be sorted, so there's no need for the insort_left(), but this function's docstring should make it clear that it assumes the list is sorted. Although this kind of leaves me wondering if all this extra complexity we're adding is really justified, or if this is just premature optimization that will bite us down the road

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's possible, it's hard for me to say. @pipermerriam ?

raise ValueError("No bucket found for node with id {}".format(node.id))
bucket = sorted_buckets[bucket_position]
if not bucket.start <= node.id <= bucket.end:
raise ValueError("No bucket found for node with id {}".format(node.id))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this happen?

Copy link
Author

@njgheorghita njgheorghita Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gsalgado Well, kind of as a safe-check, and I was thinking about if a list of buckets was input that was out of range or had breaks - i.e. KBucket(2,3) and KBucket(5,10) and node.id was 4 or 1. But it looks as though that's not possible with how split_bucket is implemented?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's not possible given how KBucket.split() is implemented, but more importantly, if the position returned by bisect exists on the list, this check would never fail, would it?

Copy link
Author

@njgheorghita njgheorghita Sep 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If node.id is for some reason below the lowest bucket.start or above the highest bucket.end, though I'm not sure how likely that is?? I can remove the check if it's not.

)
def test_binary_get_bucket_for_node(bucket_list, node_id, correct):
node = random_node()
node.id = node_id
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass a nodeid argument to random_node()

if correct is None:
with pytest.raises(ValueError):
kademlia.binary_get_bucket_for_node(bucket_list, node)
else:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it doesn't make sense to have tests for verify that our tests work as expected, it's a good idea to keep them as simple as possible and avoid any conditional logic in them. In this case that can easily be done by splitting the test into one for the success cases and another for the failure cases

def __le__(self, other):
if not isinstance(other, self.__class__):
raise TypeError("Cannot compare KBucket with type {}.".format(other.__class__))
return self.end <= other.end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you compare against other.end instead of other.start?

raise ValueError("No bucket found for node with id {}".format(node.id))
bucket = sorted_buckets[bucket_position]
if not bucket.start <= node.id <= bucket.end:
raise ValueError("No bucket found for node with id {}".format(node.id))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's not possible given how KBucket.split() is implemented, but more importantly, if the position returned by bisect exists on the list, this check would never fail, would it?

# Check for invalid state of KBuckets
if not self.end < other.start:
raise ValueError("Invalid Buckets.")
return self.end < other.start
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this code right, this method can only return True or a ValueError, but never False. Is that what you intended, and, if so, why?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, good catch. Was just tinkering with a way to check for invalid state of KBuckets - forgot it was there when I pushed.

@pipermerriam
Copy link

👍 looks good to merge.

gsalgado pushed a commit that referenced this pull request May 25, 2018
# This is the 1st commit message:

fixes ethereum#760 ethereum#762 ethereum#737

# The commit message #2 will be skipped:

# use eth-utils big endian integer utils

# The commit message #3 will be skipped:

# Fix IndexError when an empty bucket is encountered while looking up nodes

# The commit message #4 will be skipped:

# dirty
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants