Hashlib usage is underspecified #27034

Closed
DueViktor opened this issue Oct 24, 2023 · 4 comments

@DueViktor

Feature request

Since Python 3.9, hashlib constructors accept the usedforsecurity argument:

Changed in version 3.9: All hashlib constructors take a keyword-only argument usedforsecurity with default value True. A false value allows the use of insecure and blocked hashing algorithms in restricted environments. False indicates that the hashing algorithm is not used in a security context, e.g. as a non-cryptographic one-way compression function.

transformers uses hashing in many places where the purpose is not security. This should be specified in the code.
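As a minimal sketch of what this looks like on Python 3.9+ (the input bytes are made up for illustration and are not taken from the transformers code base):

```python
import hashlib

# Illustrative input only; any bytes used for a cache key or file fingerprint would do.
data = b"cache-key: some/file/path"

# usedforsecurity=False (Python 3.9+) declares that this digest is not used in a
# security context, which lets restricted builds permit the algorithm.
digest = hashlib.md5(data, usedforsecurity=False).hexdigest()
print(digest)
```

The digest value is the same as without the flag; only the declared intent changes.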

Motivation

Transformers uses MD5 from hashlib, which is not a secure algorithm, but does not specify that it is used for purposes other than security. This causes issues for organisations that follow certain security standards; FIPS compliance is one example.

Your contribution

I will attach a PR that specifies how the hashlib algorithms are used. Since usedforsecurity is only available from Python 3.9 and transformers supports 3.6+, I'll add a helper that detects the Python version and adjusts the kwargs accordingly.

@DueViktor DueViktor mentioned this issue Oct 24, 2023
5 tasks
@ArthurZucker
Collaborator

Hey! Thanks for reporting. I'll see if this is relevant for us 🤗

@DueViktor
Author

Great @ArthurZucker. The pull request has already passed all tests and is ready to merge.
No behaviour is changed.

My guess is that pretty much all federal systems in the world would have this issue.

Federal Information Processing Standards (FIPS) 140-2 is a mandatory standard for the protection of sensitive or valuable data within Federal systems. - https://www.wolfssl.com/license/fips/

@Wauplin
Contributor

Wauplin commented Nov 17, 2023

Hey @DueViktor! Coming back to you about this request. We've finally specified hashlib usage in huggingface_hub, transformers, datasets and diffusers. Everything's merged now so I'll close this issue. Thanks again for the heads up!

@Wauplin Wauplin closed this as completed Nov 17, 2023
@DueViktor
Author

Hi @Wauplin! Thanks so much for the update and for addressing the hashlib usage across all those libraries. Appreciate your team's prompt action on this matter. Keep up the fantastic work!


3 participants