feat: faster ZipCrypto decryption (via Rust), dropping support for Python 3.6 #107
Merged
Co-authored-by: Uka Osim <uka.osim@digital.trade.gov.uk> Co-authored-by: Michal Charemza <michal@charemza.name>
Python 3.6 has been end-of-life for almost 3 years now. An upcoming commit will use pyo3 to compile Rust for faster ZipCrypto decryption, and recent versions of pyo3 don't support Python 3.6 (while its older versions don't support Python 3.12 or Python 3.13). Co-authored-by: Uka Osim <uka.osim@digital.trade.gov.uk> Co-authored-by: Michal Charemza <michal@charemza.name>
michalc
approved these changes
Oct 20, 2024
This decreases the time to decrypt ZipCrypto ZIPs by approximately a factor of 10 (measured using 100MiB of random data zipped).
This uses pyo3 to make Rust code available to Python. However, recent versions of pyo3 don't support Python 3.6 (and older versions don't seem to support Python 3.12 onwards). We judged the factor of 10 to be worth dropping support, which is why Python 3.6 support was dropped in an earlier commit.
We did a few quick comparisons between https://docs.rs/crc32fast/latest/crc32fast/ and the crc32 library used here, https://docs.rs/crc32-v2/latest/crc32_v2/, and the one used here was a bit faster. We suspect this is because here we call crc32 one byte at a time, and the "fast" version has a per-call overhead that is never recovered on such small inputs.
Co-authored-by: Uka Osim <uka.osim@digital.trade.gov.uk> Co-authored-by: Michal Charemza <michal@charemza.name>
This decreases the time to decrypt ZipCrypto ZIPs by approximately a factor of 10 (measured using 100MiB of random data zipped).
This uses pyo3 to make Rust code available to Python. However, recent versions of pyo3 don't support Python 3.6 (and older versions don't seem to support Python 3.12 onwards). We judge the factor of 10 to be worth dropping support for. Python 3.6 has also been end-of-life for almost 3 years, so perhaps it's time.
We did a few quick comparisons between https://docs.rs/crc32fast/latest/crc32fast/ and the crc32 library used here, https://docs.rs/crc32-v2/latest/crc32_v2/, and the one used here was a bit faster. We suspect this is because here we call crc32 one byte at a time, and the "fast" version has a per-call overhead that is never recovered on such small inputs.
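
For context on why crc32 ends up being called one byte at a time: in ZipCrypto, every decrypted byte feeds two single-byte CRC32 steps into the cipher's key state before the next byte can be decrypted. The following is a minimal pure-Python sketch of that key update, based on the traditional PKWARE encryption scheme. It is not the Rust code from this PR, and all names in it (`crc32_step`, `ZipCryptoKeys`, `decrypt`, `encrypt`) are hypothetical, chosen for illustration only.

```python
import zlib

def _make_crc_table():
    # Standard reflected CRC-32 table (polynomial 0xEDB88320).
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRC_TABLE = _make_crc_table()

def crc32_step(crc, byte):
    # One raw CRC-32 update for a single byte: no initial or final inversion.
    return CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)

class ZipCryptoKeys:
    # Hypothetical helper holding the three 32-bit ZipCrypto keys.
    def __init__(self, password):
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for b in password:
            self.update(b)

    def update(self, plain_byte):
        # Two single-byte CRC steps per byte of data.
        self.k0 = crc32_step(self.k0, plain_byte)
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc32_step(self.k2, self.k1 >> 24)

    def stream_byte(self):
        # Keystream byte derived from the current value of k2.
        temp = (self.k2 | 2) & 0xFFFF
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

def decrypt(password, data):
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for c in data:
        p = c ^ keys.stream_byte()
        keys.update(p)  # keys advance using the plaintext byte
        out.append(p)
    return bytes(out)

def encrypt(password, data):
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for p in data:
        out.append(p ^ keys.stream_byte())
        keys.update(p)
    return bytes(out)

def crc32_bytewise(data):
    # Full CRC-32 built from single-byte steps, comparable to zlib.crc32.
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32_step(crc, b)
    return crc ^ 0xFFFFFFFF
```

Because each keystream byte depends on the keys as updated by the previous plaintext byte, the CRC work cannot be batched over larger buffers. That is consistent with the suspicion above that a library tuned for long inputs gains nothing here. The sketch covers only the cipher core: a real ZIP member also carries a 12-byte encryption header ahead of the file data, which is omitted here.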