Revamp md5.py #8065

Merged: 46 commits merged into TheAlgorithms:master on Apr 1, 2023
@tianyizheng02 (Contributor) commented Jan 1, 2023

Describe your change:

Revamped hashes/md5.py:

  • Added type hints to all functions
  • Added doctests to all functions
  • Greatly expanded documentation for all functions
  • Added input validation to some functions as needed
  • Added reference to the Wikipedia article that the implementation is based on
  • Renamed some variables and functions to clarify their usage
  • Minor refactoring for clarity

There's a lot in this PR, so let me know if anything else should be changed or if any existing changes should be reverted.

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the commit message contains Fixes: #{$ISSUE_NO}.

@algorithms-keeper bot added the "awaiting reviews" (This PR is ready to be reviewed) and "enhancement" (This PR modified some existing files) labels Jan 1, 2023
"""

from collections.abc import Generator
from math import sin
Member

Please use https://docs.python.org/3/library/struct.html to do the endian conversions, or at least use struct in the doctests for to_little_endian().
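
For context, a minimal sketch of what endian conversion with the struct module looks like. The value and variable names here are illustrative assumptions and are not taken from the PR:

```python
import struct

# Illustrative 32-bit value (an assumption, not from md5.py).
value = 0x12345678

# ">I" packs an unsigned 32-bit integer big-endian, "<I" little-endian.
big_endian = struct.pack(">I", value)
little_endian = struct.pack("<I", value)

print(big_endian.hex())     # 12345678
print(little_endian.hex())  # 78563412

# Reinterpreting the big-endian bytes as little-endian swaps the byte order.
(swapped,) = struct.unpack("<I", big_endian)
print(hex(swapped))         # 0x78563412
```

Note that struct works on whole bytes of a packed integer, which is the distinction the reply below draws.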

Contributor Author

Admittedly I'm not that familiar with the struct module, but I'm not sure how to make it work with the to_little_endian() function.

To me, the problem is that the original function rearrange() doesn't actually convert strings to little-endian. Instead, it treats 32-char string inputs as if they were 32-bit bit strings, with each char as a single "bit". It then restructures the input in a "little-endian fashion": the 8 least significant "bits" come first, followed by the 8 next least significant "bits", etc. Thus it looks little-endian if you squint hard enough.

Since the inputs to rearrange()/to_little_endian() are being restructured in units far larger than a byte, I'm not sure if the struct module would work here, unless I'm misunderstanding how the module works.
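
The behavior described above can be sketched roughly as follows. Both function names are hypothetical and this is not the code from the PR; the struct-based variant only works when the 32 characters really are '0'/'1' digits, which is one way to see why struct does not map directly onto arbitrary 32-char strings:

```python
import struct


def to_little_endian_sketch(string_32: str) -> str:
    """Reverse the order of the four 8-char groups of a 32-char string."""
    if len(string_32) != 32:
        raise ValueError("Input must be of length 32")
    groups = [string_32[i : i + 8] for i in range(0, 32, 8)]
    return "".join(reversed(groups))


def via_struct(bit_string: str) -> str:
    """Same reordering through struct, valid only for real '0'/'1' strings."""
    value = int(bit_string, 2)
    # Pack big-endian, then reinterpret the same 4 bytes as little-endian.
    (swapped,) = struct.unpack("<I", struct.pack(">I", value))
    return format(swapped, "032b")


# Each char is treated as one "bit", so any 32 chars are accepted.
print(to_little_endian_sketch("1234567890abcdfghijklmnopqrstuvw"))
# pqrstuvwhijklmno90abcdfg12345678

bits = "00000000111111110000111110101010"
print(to_little_endian_sketch(bits) == via_struct(bits))  # True
```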

@algorithms-keeper bot added the "tests are failing" (Do not merge until tests pass) label Mar 26, 2023
@cclauss (Member) commented Mar 26, 2023

The ruff errors should go away if you rebase.

@algorithms-keeper bot removed the "tests are failing" (Do not merge until tests pass) label Mar 26, 2023
@cclauss (Member) commented Apr 1, 2023

For me, there are two things that I see missing...

  1. Tests comparing against hashlib.md5(msg).hexdigest()
  2. Hashes are usually calculated from messages that are bytes, not str, so md5_me() should take bytes, not str.
    >>> import hashlib
    >>> from string import ascii_letters
    >>> msgs = (b"", ascii_letters.encode("utf-8"), "Üñîçø∂é".encode("utf-8"),
    ...         b"The quick brown fox jumps over the lazy dog.")
    >>> all(md5_me(bytes_msg) == hashlib.md5(bytes_msg).hexdigest() for bytes_msg in msgs)
    True
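
To make the suggested check concrete, here is a self-contained sketch of an MD5 function that takes bytes and is compared against hashlib. It follows the well-known pseudocode from the Wikipedia MD5 article and uses struct for the little-endian steps; md5_sketch and its helpers are hypothetical names, and this is not the md5_me() implementation from the PR (in particular, returning a str hex digest is an assumption made so the comparison matches the doctest above):

```python
import hashlib
import struct
from math import sin
from string import ascii_letters

# Per-operation left-rotation amounts, four rounds of 16 operations each.
SHIFTS = (
    [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4
    + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
)
# Constants derived from the sine function: floor(2^32 * |sin(i + 1)|).
K = [int(abs(sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def left_rotate(x: int, n: int) -> int:
    """Rotate a 32-bit integer left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def md5_sketch(message: bytes) -> str:
    """Return the MD5 hex digest of message (hypothetical stand-in)."""
    # Pad: a 0x80 byte, zeros up to 56 mod 64, then the bit length (64-bit LE).
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", bit_len)

    a0, b0, c0, d0 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    for offset in range(0, len(padded), 64):
        # Sixteen little-endian 32-bit words per 512-bit block.
        words = struct.unpack("<16I", padded[offset : offset + 64])
        a, b, c, d = a0, b0, c0, d0
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f = (f + a + K[i] + words[g]) & 0xFFFFFFFF
            a, d, c = d, c, b
            b = (b + left_rotate(f, SHIFTS[i])) & 0xFFFFFFFF
        a0 = (a0 + a) & 0xFFFFFFFF
        b0 = (b0 + b) & 0xFFFFFFFF
        c0 = (c0 + c) & 0xFFFFFFFF
        d0 = (d0 + d) & 0xFFFFFFFF
    # The digest is the four state words written out little-endian.
    return struct.pack("<4I", a0, b0, c0, d0).hex()


msgs = (b"", ascii_letters.encode("utf-8"), "Üñîçø∂é".encode("utf-8"),
        b"The quick brown fox jumps over the lazy dog.")
print(all(md5_sketch(m) == hashlib.md5(m).hexdigest() for m in msgs))
```

Comparing against hashlib on empty, ASCII, and non-ASCII inputs exercises the padding logic as well as the round functions.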

@tianyizheng02 tianyizheng02 requested a review from cclauss April 1, 2023 19:11
@cclauss (Member) left a comment

Nice!!

@algorithms-keeper bot removed the "awaiting reviews" (This PR is ready to be reviewed) label Apr 1, 2023
@cclauss cclauss merged commit 33114f0 into TheAlgorithms:master Apr 1, 2023
@tianyizheng02 tianyizheng02 deleted the md5 branch April 1, 2023 20:20
tianyizheng02 added a commit to tianyizheng02/Python that referenced this pull request May 29, 2023
* Add type hints to md5.py

* Rename some vars to snake case

* Specify functions imported from math

* Rename vars and functions to be more descriptive

* Make tests from test function into doctests

* Clarify more var names

* Refactor some MD5 code into preprocess function

* Simplify loop indices in get_block_words

* Add more detailed comments, docs, and doctests

* updating DIRECTORY.md

* updating DIRECTORY.md

* updating DIRECTORY.md

* updating DIRECTORY.md

* updating DIRECTORY.md

* Add type hints to md5.py

* Rename some vars to snake case

* Specify functions imported from math

* Rename vars and functions to be more descriptive

* Make tests from test function into doctests

* Clarify more var names

* Refactor some MD5 code into preprocess function

* Simplify loop indices in get_block_words

* Add more detailed comments, docs, and doctests

* updating DIRECTORY.md

* updating DIRECTORY.md

* updating DIRECTORY.md

* updating DIRECTORY.md

* Convert str types to bytes

* Add tests comparing md5_me to hashlib's md5

* Replace line-break backslashes with parentheses

---------

Co-authored-by: github-actions <${GITHUB_ACTOR}@users.noreply.github.com>
sedatguzelsemme pushed a commit to sedatguzelsemme/Python that referenced this pull request Sep 15, 2024
@isidroas isidroas mentioned this pull request Jan 25, 2025