-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A suggestion of code improvement for BloomFilter. #7346
Comments
This appears to have been a deliberate change back in 2012. Maybe @kluever remembers the rationale. |
Thanks for reply. My fault didn't express the main point clearly enough. optimalNumOfHashFunctions(double p) instead of optimalNumOfHashFunctions(long n, long m) |
Hey, @longlong354 So, you just want the usage of (double p) and not (long m, long n) as argument for bloomfilter. |
The situation now is roughly that we start from Then we further derive rounded to an integer. You are proposing instead removing the factor of |
The derivation of the formula
is inaccurate; it should be
Please kindly note that the bolded 'n' in the numerator cancels out with the 'n' in m( m = ⌊ − n ln p / ( ln 2 ) 2 ⌋ )" ps: |
yep~ |
Hi, I would like to take on this issue. I propose optimizing the
Please let me know if I can proceed with these modifications! |
Sorry, I lost track of this issue. @longlong354 is right that the algebra in my earlier comment was incorrect, and it does seem as if rewriting the code as suggested would make sense. I still don't know why the code was rewritten in 2012 to remove the pre-calculation of I think we would accept a PR on the lines of the original comment. If @longlong354 wants to send that PR I think that would make the most sense, and otherwise @Romain-E. The only non-obvious thing I see is how to adjust the corresponding tests in |
I concur that the second test ( Let me know if this direction works for you, and I’d be happy to proceed with the PR. |
Thank u all for the discussion. |
Refactored optimalNumOfBits and optimalNumOfHashFunctions to improve efficiency and clarity.
API(s)
How do you want it to be improved?
1. use static caculated value of log(2) and squared log(2) :
2. calculate optimalNumOfBits by static values:
3. caculate optimalNumOfHashFunction by false positive rate(p) directly and using LOG_TWO :
Why do we need it to be improved?
Example
Current Behavior
as the source codes in “How do you want it to be improved”
Desired Behavior
as the given codes in
com.google.common.hash.BloomFilter::optimalNumOfBits(long n, double p)
&
com.google.common.hash.BloomFilter::optimalNumOfHashFunctions(long n, long m)
Concrete Use Cases
as "How do you want it to be improved"
Checklist
I agree to follow the code of conduct.
I have read and understood the contribution guidelines.
I have read and understood Guava's philosophy, and I strongly believe that this proposal aligns with it.
The text was updated successfully, but these errors were encountered: