Add binary search to get bucket for node #4

njgheorghita · 2017-09-08T21:12:08Z

What was wrong?

get_bucket_for_node used a linear search over the buckets; this changes it to a binary search.

How was it fixed?

Used functools.total_ordering and the bisect module.

Cute Animal Picture

pipermerriam

This looks really solid.

pipermerriam · 2017-09-08T21:14:53Z

evm/p2p/kademlia.py


+    def __le__(self, other):
+        if not isinstance(other, self.__class__):
+            return super(KBucket, self).__le__(other)


Can you explain why this isn't an error condition? If they aren't both KBuckets should we be doing comparisons?

nope i can't, changed it to raise a TypeError
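As a rough illustration of the behaviour settled on in this thread, here is a minimal sketch of a bucket whose ordering raises TypeError for non-bucket operands. The class below is a simplified stand-in, not the real KBucket from evm/p2p/kademlia.py: it models only start and end, and the __eq__ definition is an assumption added so that functools.total_ordering can derive the remaining comparisons.

```python
from functools import total_ordering


@total_ordering
class KBucket:
    """Simplified stand-in for the real KBucket: only start and end are modelled."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __eq__(self, other):
        # Assumed equality rule for this sketch: same start and same end.
        if not isinstance(other, self.__class__):
            return NotImplemented
        return (self.start, self.end) == (other.start, other.end)

    def __le__(self, other):
        # Ordering a bucket against anything else is treated as a caller bug.
        if not isinstance(other, self.__class__):
            raise TypeError(
                "Cannot compare KBucket with type {}.".format(other.__class__)
            )
        return self.end <= other.end
```

Returning NotImplemented from __le__ would be the other common choice; Python then raises TypeError itself once neither operand can handle the comparison.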

pipermerriam · 2017-09-08T21:17:00Z

evm/p2p/kademlia.py

+def binary_get_bucket_for_node(buckets, node):
+    """Return the bucket for a given node."""
+    sorted_buckets = sorted(buckets)
+    bucket_ends = list(KBucket.end for KBucket in sorted_buckets)


canonical version of this is just to do it as a list comprehension:

bucket_ends = [bucket.end for bucket in sorted_buckets]

Though I like tuples since they are immutable (surprise surprise!) so I'd be fine with this too.

bucket_ends = tuple(bucket.end for bucket in sorted_buckets)

pipermerriam · 2017-09-08T21:18:33Z

evm/p2p/kademlia.py

+    bucket_ends = list(KBucket.end for KBucket in sorted_buckets)
+    bucket_position = bisect.bisect_left(bucket_ends, node.id)
+    # Prevents edge cases where bisect_left returns an out of range index
+    if bucket_position >= len(buckets):


It's functionally the same but I like it better because it optimizes for the success case rather than adding overhead pre-checking for the failure case.

try:
    bucket = sorted_buckets[bucket_position]
except IndexError:
    raise ValueError("No bucket found for node with id {}".format(node.id))

good to know, i thought about doing this but couldn't think of a good reason to, now i do!

gsalgado · 2017-09-11T13:27:46Z

evm/p2p/kademlia.py


+def binary_get_bucket_for_node(buckets, node):
+    """Return the bucket for a given node."""
+    sorted_buckets = sorted(buckets)


One of the reasons why @pipermerriam wanted to use binary search here is to avoid unnecessarily iterating over all buckets in get_bucket_for_node(), but this kind of defeats the purpose, no?

@gsalgado Yeah, I didn't think this was the best way to do it. Is it safe to assume that the list of buckets passed in will always be ordered? (It looks that way given how split_bucket is implemented.) Or should I use bisect.insort_left() somewhere to make sure the list stays ordered?

With the current implementation of KBucket.split()/Routing.split_bucket(), yeah, the list will always be sorted, so there's no need for the insort_left(), but this function's docstring should make it clear that it assumes the list is sorted. Although this kind of leaves me wondering if all this extra complexity we're adding is really justified, or if this is just premature optimization that will bite us down the road

That's possible, it's hard for me to say. @pipermerriam ?
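For reference, a minimal sketch of the lookup without the sorted() call, assuming the caller passes buckets already ordered by end and non-overlapping. Bucket is a hypothetical namedtuple stand-in for KBucket, and the function takes a plain node_id rather than a node object to keep the example self-contained.

```python
import bisect
from collections import namedtuple

# Hypothetical stand-in for KBucket; only the range boundaries are modelled.
Bucket = namedtuple("Bucket", ["start", "end"])


def binary_get_bucket_for_node(buckets, node_id):
    """Return the bucket for node_id.

    Assumes `buckets` is sorted in ascending order of `end` and that the
    ranges do not overlap. The list is not re-sorted here.
    """
    bucket_ends = [bucket.end for bucket in buckets]
    # bisect_left finds the first bucket whose end is >= node_id.
    position = bisect.bisect_left(bucket_ends, node_id)
    try:
        return buckets[position]
    except IndexError:
        raise ValueError("No bucket found for node with id {}".format(node_id))
```

Note that building bucket_ends is still one linear pass over the buckets, so only the search itself is logarithmic. Avoiding that pass would require keeping the ends cached alongside the bucket list, which this sketch does not attempt.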

gsalgado · 2017-09-11T13:27:59Z

evm/p2p/kademlia.py

+        raise ValueError("No bucket found for node with id {}".format(node.id))
+    bucket = sorted_buckets[bucket_position]
+    if not bucket.start <= node.id <= bucket.end:
+        raise ValueError("No bucket found for node with id {}".format(node.id))


How can this happen?

@gsalgado Well, kind of as a safe-check, and I was thinking about if a list of buckets was input that was out of range or had breaks - i.e. KBucket(2,3) and KBucket(5,10) and node.id was 4 or 1. But it looks as though that's not possible with how split_bucket is implemented?

Yeah, that's not possible given how KBucket.split() is implemented, but more importantly, if the position returned by bisect exists on the list, this check would never fail, would it?

It could fail if node.id is for some reason below the lowest bucket.start or above the highest bucket.end, though I'm not sure how likely that is. I can remove the check if that can't happen.
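To make the cases in this thread concrete, the sketch below shows which bucket bisect_left selects and whether the range check would pass. It uses the KBucket(2,3) / KBucket(5,10) example from above; Bucket, candidate_bucket and contains are hypothetical helpers written for this illustration only.

```python
import bisect
from collections import namedtuple

# Hypothetical stand-in for KBucket; only the range boundaries are modelled.
Bucket = namedtuple("Bucket", ["start", "end"])


def candidate_bucket(buckets, node_id):
    """Return the bucket bisect_left selects, without any range check."""
    bucket_ends = [bucket.end for bucket in buckets]
    return buckets[bisect.bisect_left(bucket_ends, node_id)]


def contains(bucket, node_id):
    """The range check under discussion."""
    return bucket.start <= node_id <= bucket.end


# Buckets with a gap (4 is uncovered) and a lower bound above 1.
gapped = [Bucket(2, 3), Bucket(5, 10)]
# Contiguous buckets, as KBucket.split() is said to produce.
contiguous = [Bucket(0, 4), Bucket(5, 10)]

for buckets, node_id in [(gapped, 4), (gapped, 1), (contiguous, 4), (contiguous, 7)]:
    bucket = candidate_bucket(buckets, node_id)
    print(node_id, bucket, contains(bucket, node_id))
```

With contiguous buckets the selected bucket always contains the id, because each bucket starts right after the previous one ends. The check only fails when there is a gap or when the id lies below the first bucket's start.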

gsalgado · 2017-09-11T13:28:50Z

evm/p2p/test_kademlia.py

+)
+def test_binary_get_bucket_for_node(bucket_list, node_id, correct):
+    node = random_node()
+    node.id = node_id


You can pass a nodeid argument to random_node()

gsalgado · 2017-09-11T13:31:23Z

evm/p2p/test_kademlia.py

+    if correct is None:
+        with pytest.raises(ValueError):
+            kademlia.binary_get_bucket_for_node(bucket_list, node)
+    else:


Since it doesn't make sense to have tests that verify our tests work as expected, it's a good idea to keep them as simple as possible and avoid any conditional logic in them. In this case that can easily be done by splitting the test into one for the success cases and another for the failure cases.
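One way the suggested split could look, as a sketch only: get_bucket, Bucket and BUCKETS are hypothetical stand-ins defined here so the example runs on its own, and the parameter values are made up rather than taken from the PR's test file.

```python
import bisect
from collections import namedtuple

import pytest

# Hypothetical stand-ins for KBucket and the function under test.
Bucket = namedtuple("Bucket", ["start", "end"])
BUCKETS = [Bucket(0, 9), Bucket(10, 19)]


def get_bucket(buckets, node_id):
    """Binary search over buckets assumed to be sorted by end."""
    position = bisect.bisect_left([b.end for b in buckets], node_id)
    try:
        return buckets[position]
    except IndexError:
        raise ValueError("No bucket found for node with id {}".format(node_id))


@pytest.mark.parametrize(
    "node_id, expected",
    [(0, BUCKETS[0]), (9, BUCKETS[0]), (10, BUCKETS[1]), (19, BUCKETS[1])],
)
def test_get_bucket_success(node_id, expected):
    # Success cases only: no branching on the expected outcome.
    assert get_bucket(BUCKETS, node_id) == expected


@pytest.mark.parametrize("node_id", [20, 1000])
def test_get_bucket_failure(node_id):
    # Failure cases only: every parameter must raise.
    with pytest.raises(ValueError):
        get_bucket(BUCKETS, node_id)
```

Each test body is now a single straight-line assertion, so a failure points directly at the case that broke.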

gsalgado · 2017-09-11T13:34:36Z

evm/p2p/kademlia.py

+    def __le__(self, other):
+        if not isinstance(other, self.__class__):
+            raise TypeError("Cannot compare KBucket with type {}.".format(other.__class__))
+        return self.end <= other.end


Why do you compare against other.end instead of other.start?

gsalgado · 2017-09-12T08:12:24Z

evm/p2p/kademlia.py

+        # Check for invalid state of KBuckets
+        if not self.end < other.start:
+            raise ValueError("Invalid Buckets.")
+        return self.end < other.start


If I'm reading this code right, this method can only return True or raise a ValueError, but never return False. Is that what you intended, and, if so, why?

Nope, good catch. Was just tinkering with a way to check for invalid state of KBuckets - forgot it was there when I pushed.

pipermerriam · 2017-09-18T21:49:13Z

👍 looks good to merge.

# This is the 1st commit message: fixes ethereum#760 ethereum#762 ethereum#737 # The commit message #2 will be skipped: # use eth-utils big endian integer utils # The commit message #3 will be skipped: # Fix IndexError when an empty bucket is encountered while looking up nodes # The commit message #4 will be skipped: # dirty

Add binary search to get bucket for node

f02f7b2

pipermerriam suggested changes Sep 8, 2017

Refactoring and pr fixes

04da814

gsalgado reviewed Sep 11, 2017

PR fixes

bcd6220

gsalgado reviewed Sep 12, 2017

Refactor & remove le comparison

5d7cb25

This was referenced Sep 26, 2017

Implement binary search for RoutingTable.get_bucket_for_node ethereum/py-evm#91

Closed

Binary search ethereum/py-evm#102

Merged
