fix: correctly track CPLs of never refreshed buckets #71
Conversation
To determine how many buckets we should refresh, we:
1. Look at the max bucket.
2. Look at the peer with the greatest common prefix in the max bucket.
3. Return the "last refresh" times for all buckets between 0 and `min(maxPeer, 15)`.

BREAKING: this returns a slice of times instead of CplRefresh objects because we want to refresh _all_ buckets between 0 and the max CPL.
dde9e83 to 7c961a0
defer rt.tabLock.RUnlock()

for i := len(rt.buckets) - 1; i >= 0; i-- {
	if rt.buckets[i].len() > 0 {
While #72 is a valid point, are we sure we actually want to return a smaller number of tracked buckets and ramp back up? Upsides: scales nicely as the network grows without us touching anything. Downsides: maybe we want some minimal number of buckets to ramp up our scale.
Feel free to ignore this comment or just say "meh, this is probably fine".
The number of buckets shouldn't matter in practice, right?
The number of buckets we have shouldn't matter. We might care about the number of refreshes, but as long as it's ~15-20 (and not ~100) it's probably fine.
I'm not sure if there's any tweaking to be done over how we do initial bootstrapping so that we fill our buckets fairly quickly. I suspect this will be much less problematic in the real network than in the test network, though.
My point is that we:
- Start with one big bucket.
- Split the big bucket as we add peers.
- Never shrink back down to 1 bucket.
In terms of refreshing the routing table, the logic in this PR will refresh every CPL, regardless of how many buckets we actually have. In practice, the number of buckets we have shouldn't affect anything but memory usage, unless we're doing something wrong.
LGTM aside from the comment I added about a potential off-by-one error.
// maxCommonPrefix returns the maximum common prefix length between any peer in
// the bucket with the target ID.
func (b *bucket) maxCommonPrefix(target ID) uint {
This is interesting... it means that if we remove the maxCpl (because we fix the RPCs in the DHT to allow querying random KadIDs), then if there are 2^10 peers in the network and bucket number 10 has someone really close to us (e.g. 20 shared bits), I'm now going to be querying 20 buckets. Not sure if that's really what we want, is it?
We should probably query until we fail to fill a bucket.
You mean do a query, wait for it to finish, and stop if the percentage of the bucket that is full after the query drops below x%? That seems reasonable.
Yeah, something like that. But I'm fine cutting an RC without that.
No problem at all since we max out at 15 anyway.
I'll file an issue to track it
LGTM from @aschmahmann over slack.
@aarshkshah1992 I'd like your thoughts on this post-merge. I'm merging now so I can cut an RC, but we may want to tweak the behavior here.
@Stebalien While this looks good to me for now, we should revisit/refactor this for 0.6. I've created a meta-issue to track it and will have more thoughts on how to improve it once I start working on the issue and dig deeper.