-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify IDGenerator and reduce collission rate #223
Simplify IDGenerator and reduce collission rate #223
Conversation
internal/streams/streams.go
Outdated
} | ||
offset = (offset + 1) % s.numBuckets | ||
// Reduce collision by offsetting the starting point | ||
offset := atomic.AddUint32(&s.offset, s.numBuckets/10+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why s.numBuckets/10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Devisor should be big enough to provide some cycles before reaching numbucket, and small enough to have significant gap between invocation to ensure that collision chance is actually reduced.
And resulted number should devide numbuckets with some remainder, to ensure it does not land on the same indexes all the time.
Let me put it into comment and I think we need to have more logic on how to pick correct offest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please take a look
de853b0
to
e85296e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change with atomic.AddUint32
seems fine and will be an improvement.
I have some questions about offset change:
- Why do this? Have we identified it as a problem somewhere? Given that each subsequent request will be given next bucket, the problem you described could only happen when all the buckets are nearly full - and in such conditions I doubt that stream assingment will be the bottleneck.
- If we do have some benchmark / test case that showcases the problem, did you verify that your patch fixes the problem using such test case? I don't see any such information in the PR description.
- I'm asking those questions because this change makes the code more complicated - so we should have a good reason for it.
- I think the implementation isn't correct for all bucket counts. By correct I mean that all buckets will be used. I checked that with the following Python program (btw such verification should be a test case in this PR):
Verificator
def findOffsetStep(buckets):
match buckets:
case 1:
return 1
case 2:
return 1
case 3:
return 2
if buckets < 8:
offsetStep = buckets // 2
if offsetStep == 0:
return 1
if buckets%2 != 0:
return offsetStep
return offsetStep + 1
for offsetStep in [13, 11, 9, 7, 5, 3]:
if buckets // offsetStep < 4:
# Ensure that it provides at least 4 cycles before reaching `buckets`
continue
if buckets % offsetStep != 0:
return offsetStep
offsetStep += 1
if buckets%offsetStep != 0:
return offsetStep
if buckets % 2 != 0:
return 2
return 1
def verify_for_buckets(buckets):
hits = {}
offset = findOffsetStep(buckets)
index = 0
while True:
hits.setdefault(index, 0)
hits[index] += 1
index = (index + offset) % buckets
if index == 0:
break
if len(hits) != buckets:
print(f'Failed for {buckets}: len {len(hits)}')
if __name__ == '__main__':
for i in range(1, 512):
print(f'Test {i}')
verify_for_buckets(i)
internal/streams/streams.go
Outdated
// findOffsetStep returns the offset step to use for the given number of buckets. | ||
// `offset` provides a way to reduce collision by offsetting the starting point | ||
// `offsetStep` should be small enough to provide some cycles before reaching `buckets`, | ||
// and big enough to have significant gap between invocations to ensure that collision chance is actually reduced. | ||
// and it should not divide `buckets` without remainder, to ensure it does not land on the same indexes all the time. | ||
func findOffsetStep(buckets int) int { | ||
switch buckets { | ||
case 1: | ||
return 1 | ||
case 2: | ||
return 1 | ||
case 3: | ||
return 2 | ||
} | ||
if buckets < 8 { | ||
offsetStep := buckets / 2 | ||
if offsetStep == 0 { | ||
return 1 | ||
} | ||
if buckets%2 != 0 { | ||
return offsetStep | ||
} | ||
return offsetStep + 1 | ||
} | ||
for _, offsetStep := range []int{13, 11, 9, 7, 5, 3} { | ||
if buckets/offsetStep < 4 { | ||
// Ensure that it provides at least 4 cycles before reaching `buckets` | ||
continue | ||
} | ||
if buckets%offsetStep != 0 { | ||
return offsetStep | ||
} | ||
|
||
offsetStep++ | ||
if buckets%offsetStep != 0 { | ||
return offsetStep | ||
} | ||
} | ||
if buckets%2 != 0 { | ||
return 2 | ||
} | ||
return 1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code has a lot of non-obvious conditionals and magic numbers. There should by comments explaining not only what the code does, but also how and why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, it suppose to reduce collissions in the case when buckets are highly populated, but due to it's complexiity I start thinking that it does not worth it, let's just remove it and go back to +1
Since offset is always truncated to s.numBuckets, atomic.Add could be used instead of atomic.Load + atomic.Swap Also offsetting bucket by 1 does not give lot's of space for collission avoidance. If first call hits bucket that is fully in use, second call going to share same bucket with first call.
e85296e
to
a0de29d
Compare
Thanks, let's just remove it. |
Since offset is always truncated to s.numBuckets, atomic.Add could be used instead of atomic.Load + atomic.Swap