Bloom Filter #8615
Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.
algorithms-keeper commands and options
algorithms-keeper actions can be triggered by commenting on this PR:
- @algorithms-keeper review to trigger the checks for only added pull request files
- @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.
NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.
As all our algorithms are tested via `doctest`, a docstring testing module, test functions using `assert` will not fail the tests. Please convert your tests to `doctest`.
    assert b.exists("Titanic")
    assert b.exists("Avatar")

    assert b.exists("The Goodfather") in (True, False)
    assert b.exists("Interstellar") in (True, False)
    assert b.exists("Parasite") in (True, False)
    assert b.exists("Pulp Fiction") in (True, False)
We use doctest to test our modules via a workflow
Moved to module docstring
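A minimal sketch of what moving `assert`-based tests into a module docstring can look like. The `Bloom` class below is a hypothetical stand-in, not the implementation from this PR; the choice of `sha256` and `md5` and the `size` parameter are assumptions made for illustration. Only checks that hold by construction are written as doctests, because the result for a never-added element depends on the hash values.

```python
"""
Toy Bloom filter used only to illustrate doctest-style tests.

>>> bloom = Bloom(size=8)
>>> bloom.add("Titanic")
>>> bloom.add("Avatar")

Added elements are always reported as present (no false negatives):
>>> "Titanic" in bloom
True
>>> "Avatar" in bloom
True

An empty filter reports nothing as present:
>>> "Titanic" in Bloom(size=8)
False
"""
from hashlib import md5, sha256

HASH_FUNCTIONS = (sha256, md5)  # assumed choice, for illustration only


class Bloom:
    def __init__(self, size: int = 8) -> None:
        self.bitarray = 0  # bits are stored in a plain int
        self.size = size

    def hash_(self, value: str) -> int:
        """Return a bitmask with one bit set per hash function."""
        res = 0
        for func in HASH_FUNCTIONS:
            digest = func(value.encode()).digest()
            position = int.from_bytes(digest, "little") % self.size
            res |= 1 << position
        return res

    def add(self, value: str) -> None:
        self.bitarray |= self.hash_(value)

    def __contains__(self, value: str) -> bool:
        h = self.hash_(value)
        return (h & self.bitarray) == h
```

Saved as a file, the examples in the docstring can be run with `python -m doctest -v <file>.py`.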
    print(
        f"""\
        [exists] value = {value}
        hash = {self.format_bin(h)}
        filter = {self.format_bin(self.bitstring)}
        res = {res}
        """
    )
The CONTRIBUTING guidelines say:
- return all calculation results instead of printing or plotting them
I don't think these prints are necessary.
moved to method and called in doctest
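A small sketch of the "return instead of print" idea discussed here: a helper that returns the formatted bit string, so a doctest can check it directly. The free-function form and the `size` parameter are assumptions; the PR itself uses a method named `format_bin` whose exact signature is not shown in this conversation.

```python
def format_bin(bitarray: int, size: int = 8) -> str:
    """
    Return the bits as a zero-padded binary string instead of printing them.

    >>> format_bin(0b11000)
    '00011000'
    >>> format_bin(0b11, size=4)
    '0011'
    """
    res = bin(bitarray)[2:]  # drop the '0b' prefix
    return res.zfill(size)  # left-pad with zeros up to the filter size
```

Because the function returns a value, the caller decides whether to print it, and the doctest compares the returned string rather than captured output.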
    print(
        f"""\
        [add] value = {value}
        hash = {self.format_bin(h)}
        filter = {self.format_bin(self.bitstring)}
        """
    )
Same here
done
    assert (
        abs(estimated_error_rate - error_rate) <= 0.05
    )  # 5% absolute margin calculated experimentally
Same as above, this should be converted to a doctest
In this case I removed the test. I found it difficult to write tests with random values.
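One possible way to keep a test like this, sketched under assumptions: drive the random values from a seeded `random.Random` instance so every run sees the same inputs, and assert only properties that hold regardless of the particular values. The names `measure_error_rate`, `random_string`, `SIZE`, and the hash functions are hypothetical and not taken from the PR.

```python
import random
import string
from hashlib import md5, sha256

HASH_FUNCTIONS = (sha256, md5)  # assumed choice of hash functions
SIZE = 64  # number of bits in the toy filter (assumption)


def hash_(value: str) -> int:
    """Return a bitmask with one bit set per hash function."""
    res = 0
    for func in HASH_FUNCTIONS:
        digest = func(value.encode()).digest()
        res |= 1 << (int.from_bytes(digest, "little") % SIZE)
    return res


def random_string(rng: random.Random, length: int = 10) -> str:
    """Draw a lowercase string from the given seeded generator."""
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def measure_error_rate(seed: int, n_added: int = 10, n_probes: int = 1000) -> float:
    """
    Fraction of never-added strings reported as present, for a fixed seed.

    The same seed always yields the same measurement:
    >>> measure_error_rate(seed=0) == measure_error_rate(seed=0)
    True
    >>> 0.0 <= measure_error_rate(seed=0) <= 1.0
    True
    """
    rng = random.Random(seed)  # local generator, global state is untouched
    added = {random_string(rng) for _ in range(n_added)}
    bitarray = 0
    for value in added:
        bitarray |= hash_(value)
    candidates = (random_string(rng) for _ in range(n_probes))
    probes = [s for s in candidates if s not in added]
    false_positives = sum((hash_(s) & bitarray) == hash_(s) for s in probes)
    return false_positives / len(probes)
```

With a fixed seed the measured rate is reproducible, so a tolerance check against an estimated rate would no longer be flaky; the tolerance itself would still need to be chosen by whoever writes the test.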
    Not added elements should return False ...
    >>> "The Goodfather" in bloom
    False
    >>> bloom.format_hash("The Goodfather")
    '00011000'
    >>> "Interstellar" in bloom
    False
    >>> bloom.format_hash("Interstellar")
    '00000011'
    >>> "Parasite" in bloom
    False
    >>> bloom.format_hash("Parasite")
    '00010010'
    >>> "Pulp Fiction" in bloom
    False
    >>> bloom.format_hash("Pulp Fiction")
    '10000100'
Should this be `The Goodfather` or `The Godfather`? https://www.imdb.com/title/tt0068646
    Not added elements should return False ...
    >>> not_present_films = ("The Goodfather", "Interstellar", "Parasite", "Pulp Fiction")
    >>> {film: bloom.format_hash(film) for film in not_present_films}
    {'The Goodfather': '00011000', 'Interstellar': '00000011', 'Parasite': '00010010', 'Pulp Fiction': '10000100'}
    >>> any(film in bloom for film in not_present_films)
    False
Co-authored-by: Christian Clauss <cclauss@me.com>
for more information, see https://pre-commit.ci
This reverts commit 35fa5f5.
* Bloom filter with tests
* has functions constant
* fix type
* isort
* passing ruff
* type hints
* type hints
* from fail to erro
* captital leter
* type hints requested by boot
* descriptive name for m
* more descriptibe arguments II
* moved movies_test to doctest
* commented doctest
* removed test_probability
* estimated error
* added types
* again hash_
* Update data_structures/hashing/bloom_filter.py
  Co-authored-by: Christian Clauss <cclauss@me.com>
* from b to bloom
* Update data_structures/hashing/bloom_filter.py
  Co-authored-by: Christian Clauss <cclauss@me.com>
* Update data_structures/hashing/bloom_filter.py
  Co-authored-by: Christian Clauss <cclauss@me.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* syntax error in dict comprehension
* from goodfather to godfather
* removed Interestellar
* forgot the last Godfather
* Revert "removed Interestellar"
  This reverts commit 35fa5f5.
* pretty dict
* Apply suggestions from code review
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update bloom_filter.py
---------
Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>