Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bloom Filter #8615

Merged
merged 32 commits into from
Apr 8, 2023
Merged

Bloom Filter #8615

merged 32 commits into from
Apr 8, 2023

Conversation

isidroas
Copy link
Contributor

@isidroas isidroas commented Apr 6, 2023

Describe your change:

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

@algorithms-keeper algorithms-keeper bot added awaiting reviews This PR is ready to be reviewed require descriptive names This PR needs descriptive function and/or variable names require type hints https://docs.python.org/3/library/typing.html labels Apr 6, 2023
Copy link

@algorithms-keeper algorithms-keeper bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Click here to look at the relevant links ⬇️

🔗 Relevant Links

Repository:

Python:

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

Copy link

@algorithms-keeper algorithms-keeper bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Click here to look at the relevant links ⬇️

🔗 Relevant Links

Repository:

Python:

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

@algorithms-keeper algorithms-keeper bot removed the require type hints https://docs.python.org/3/library/typing.html label Apr 6, 2023
Copy link

@algorithms-keeper algorithms-keeper bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Click here to look at the relevant links ⬇️

🔗 Relevant Links

Repository:

Python:

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

@algorithms-keeper algorithms-keeper bot removed the require descriptive names This PR needs descriptive function and/or variable names label Apr 6, 2023
Copy link
Contributor

@CaedenPH CaedenPH left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As all our algorithms are tested via doctest, a docstring testing module, test functions using assert will not fail the tests. Please convert your tests to doctest

Comment on lines 60 to 66
assert b.exists("Titanic")
assert b.exists("Avatar")

assert b.exists("The Goodfather") in (True, False)
assert b.exists("Interstellar") in (True, False)
assert b.exists("Parasite") in (True, False)
assert b.exists("Pulp Fiction") in (True, False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use doctest to test our modules via a workflow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to module docstring

Comment on lines 32 to 39
print(
f"""\
[exists] value = {value}
hash = {self.format_bin(h)}
filter = {self.format_bin(self.bitstring)}
res = {res}
"""
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(
f"""\
[exists] value = {value}
hash = {self.format_bin(h)}
filter = {self.format_bin(self.bitstring)}
res = {res}
"""
)

In the CONTRIBUTING it says

  • return all calculation results instead of printing or plotting them

I don't think these prints are necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to method and called in doctest

Comment on lines 20 to 26
print(
f"""\
[add] value = {value}
hash = {self.format_bin(h)}
filter = {self.format_bin(self.bitstring)}
"""
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(
f"""\
[add] value = {value}
hash = {self.format_bin(h)}
filter = {self.format_bin(self.bitstring)}
"""
)

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 98 to 100
assert (
abs(estimated_error_rate - error_rate) <= 0.05
) # 5% absolute margin calculated experiementally
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, this should be converted to a doctest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I removed the test. I found difficult to make tests with random values.

@algorithms-keeper algorithms-keeper bot added the require type hints https://docs.python.org/3/library/typing.html label Apr 7, 2023
Copy link

@algorithms-keeper algorithms-keeper bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Click here to look at the relevant links ⬇️

🔗 Relevant Links

Repository:

Python:

Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.

@algorithms-keeper algorithms-keeper bot removed the require type hints https://docs.python.org/3/library/typing.html label Apr 7, 2023
@algorithms-keeper algorithms-keeper bot removed the tests are failing Do not merge until tests pass label Apr 8, 2023
Comment on lines 31 to 47
Not added elements should return False ...
>>> "The Goodfather" in bloom
False
>>> bloom.format_hash("The Goodfather")
'00011000'
>>> "Interstellar" in bloom
False
>>> bloom.format_hash("Interstellar")
'00000011'
>>> "Parasite" in bloom
False
>>> bloom.format_hash("Parasite")
'00010010'
>>> "Pulp Fiction" in bloom
False
>>> bloom.format_hash("Pulp Fiction")
'10000100'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be The Goodfather or The Godfather? https://www.imdb.com/title/tt0068646

Suggested change
Not added elements should return False ...
>>> "The Goodfather" in bloom
False
>>> bloom.format_hash("The Goodfather")
'00011000'
>>> "Interstellar" in bloom
False
>>> bloom.format_hash("Interstellar")
'00000011'
>>> "Parasite" in bloom
False
>>> bloom.format_hash("Parasite")
'00010010'
>>> "Pulp Fiction" in bloom
False
>>> bloom.format_hash("Pulp Fiction")
'10000100'
Not added elements should return False ...
>>> not_present_films = ("The Goodfather", "Interstellar", "Parasite", "Pulp Fiction")
>>> {film: bloom.format_hash(film) for film in not_present_films)}
{'The Goodfather': '00011000', 'Interstellar': '00000011', 'Parasite': '00010010': 'Pulp Fiction': '10000100'}
>>> any(film in bloom for film in not_present_films)
False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to reduce the tuple in order to avoid the following ruff error:
2023-04-08-171811_

Also tried pretty-print, but it doesn't match

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for teaching me that.

I also had to divide in multiple lines the previous line:
2023-04-08-184822_

2023-04-08-185453_

@algorithms-keeper algorithms-keeper bot added awaiting changes A maintainer has requested changes to this PR and removed awaiting reviews This PR is ready to be reviewed labels Apr 8, 2023
Co-authored-by: Christian Clauss <cclauss@me.com>
@algorithms-keeper algorithms-keeper bot added awaiting reviews This PR is ready to be reviewed and removed awaiting changes A maintainer has requested changes to this PR labels Apr 8, 2023
isidroas and others added 2 commits April 8, 2023 16:49
@algorithms-keeper algorithms-keeper bot added the tests are failing Do not merge until tests pass label Apr 8, 2023
@algorithms-keeper algorithms-keeper bot removed the tests are failing Do not merge until tests pass label Apr 8, 2023
@algorithms-keeper algorithms-keeper bot removed the awaiting reviews This PR is ready to be reviewed label Apr 8, 2023
@algorithms-keeper algorithms-keeper bot added the awaiting reviews This PR is ready to be reviewed label Apr 8, 2023
@algorithms-keeper algorithms-keeper bot added the tests are failing Do not merge until tests pass label Apr 8, 2023
@algorithms-keeper algorithms-keeper bot removed the tests are failing Do not merge until tests pass label Apr 8, 2023
@cclauss cclauss merged commit 14bdd17 into TheAlgorithms:master Apr 8, 2023
@algorithms-keeper algorithms-keeper bot removed the awaiting reviews This PR is ready to be reviewed label Apr 8, 2023
tianyizheng02 pushed a commit to tianyizheng02/Python that referenced this pull request May 29, 2023
* Bloom filter with tests

* has functions constant

* fix type

* isort

* passing ruff

* type hints

* type hints

* from fail to erro

* captital leter

* type hints requested by boot

* descriptive name for m

* more descriptibe arguments II

* moved movies_test to doctest

* commented doctest

* removed test_probability

* estimated error

* added types

* again hash_

* Update data_structures/hashing/bloom_filter.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* from b to bloom

* Update data_structures/hashing/bloom_filter.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Update data_structures/hashing/bloom_filter.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* syntax error in dict comprehension

* from goodfather to godfather

* removed Interestellar

* forgot the last Godfather

* Revert "removed Interestellar"

This reverts commit 35fa5f5.

* pretty dict

* Apply suggestions from code review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update bloom_filter.py

---------

Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
sedatguzelsemme pushed a commit to sedatguzelsemme/Python that referenced this pull request Sep 15, 2024
* Bloom filter with tests

* has functions constant

* fix type

* isort

* passing ruff

* type hints

* type hints

* from fail to erro

* captital leter

* type hints requested by boot

* descriptive name for m

* more descriptibe arguments II

* moved movies_test to doctest

* commented doctest

* removed test_probability

* estimated error

* added types

* again hash_

* Update data_structures/hashing/bloom_filter.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* from b to bloom

* Update data_structures/hashing/bloom_filter.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* Update data_structures/hashing/bloom_filter.py

Co-authored-by: Christian Clauss <cclauss@me.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* syntax error in dict comprehension

* from goodfather to godfather

* removed Interestellar

* forgot the last Godfather

* Revert "removed Interestellar"

This reverts commit 35fa5f5.

* pretty dict

* Apply suggestions from code review

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update bloom_filter.py

---------

Co-authored-by: Christian Clauss <cclauss@me.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@isidroas isidroas mentioned this pull request Jan 25, 2025
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants