gh-72680: Fix false positives when using zipfile.is_zipfile() #5053
Can you expand the unit testing for …? It would not be unreasonable to assertTrue and assertFalse, as appropriate, the result of …
The zipfile.is_zipfile function only searched for the End of Central Directory signature, so it misidentified non-zipfiles that happened to contain that signature as zipfiles. Now the zipfile.is_zipfile function also verifies the first central directory entry.
Changes:
* Extended zipfile.is_zipfile to verify the zipfile's central directory (catalog)
* Added tests to verify that binary non-zipfiles are rejected
I've added a PNG file that would pass under the old algorithm but is correctly rejected by the new one. Perhaps the PNG is overkill, but it proves that a valid file in another format is no longer mistaken for a ZIP archive.
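To make the old behavior concrete, here is a simplified sketch of the tail-scan heuristic described in this PR, assuming the whole file is already in memory as bytes. `naive_is_zip`, the constants, and the fake PNG bytes are hypothetical illustrations, not CPython's code; the real reader also unpacks the record it finds and looks for ZIP64 structures.

```python
EOCD_SIGNATURE = b"PK\x05\x06"   # End of Central Directory signature
EOCD_SIZE = 22                   # size of the fixed part of the EOCD record
MAX_COMMENT = 1 << 16            # search window covers roughly the maximum comment length


def naive_is_zip(data: bytes) -> bool:
    """Hypothetical helper approximating the old heuristic (simplified)."""
    # Only the tail of the file is searched, since the EOCD record can be
    # followed by at most one archive comment.
    window_start = max(len(data) - MAX_COMMENT - EOCD_SIZE, 0)
    pos = data.rfind(EOCD_SIGNATURE, window_start)
    # Accept if the signature is present with room for a full record after it;
    # nothing in the record is cross-checked against the rest of the file.
    return pos >= 0 and len(data) - pos >= EOCD_SIZE


PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
# Fake "PNG" whose trailing bytes happen to contain the EOCD signature.
fake_png = PNG_MAGIC + b"\x00" * 100 + EOCD_SIGNATURE + b"\x00" * 18

print(naive_is_zip(fake_png))                    # True: a false positive
print(naive_is_zip(PNG_MAGIC + b"\x00" * 100))   # False: no signature at all
```

Any file type that can carry arbitrary trailing bytes (image metadata, PDF streams) can trip this check, which is why the PNG test case is meaningful.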
I've added a few checks throughout the test_zipfile.py script. If you see others you would like to check, let me know and I can add them.
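Following the review request for explicit assertTrue/assertFalse checks, a minimal unittest sketch might look like the one below. It relies only on documented behavior (zipfile.is_zipfile() accepts a file-like object as well as a path). The class and helper names are hypothetical and these are not the PR's actual tests; they deliberately avoid crafted false-positive inputs, whose result depends on whether the fix is present in the running Python.

```python
import io
import unittest
import zipfile


def make_zip_bytes() -> bytes:
    """Build a small, valid archive in memory with a fixed timestamp."""
    buf = io.BytesIO()
    info = zipfile.ZipInfo("hello.txt", date_time=(2020, 1, 1, 0, 0, 0))
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, "hello")
    return buf.getvalue()


class IsZipfileTests(unittest.TestCase):
    def test_valid_archive_is_detected(self):
        self.assertTrue(zipfile.is_zipfile(io.BytesIO(make_zip_bytes())))

    def test_plain_bytes_are_rejected(self):
        self.assertFalse(zipfile.is_zipfile(io.BytesIO(b"definitely not a zip archive")))

    def test_truncated_archive_is_rejected(self):
        data = make_zip_bytes()
        # With no archive comment, the last 22 bytes are exactly the
        # End of Central Directory record; dropping them leaves no EOCD.
        self.assertFalse(zipfile.is_zipfile(io.BytesIO(data[:-22])))
```

These can be run with `python -m unittest` like the rest of the test suite.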
Summary: The default implementation is lenient in that it recognizes a zipfile if the magic number appears anywhere in the archive. So if someone has the bytes `PK\x03\x04` in a tensor, it gets recognized as a zipfile. See https://bugs.python.org/issue28494
This implementation only checks the first 4 bytes of the file for the zip magic number. We could also copy the python/cpython#5053 fix, but that seems like overkill.
Fixes #25214
Pull Request resolved: #25279
Pulled By: driazati
Differential Revision: D17102516 (https://our.intern.facebook.com/intern/diff/17102516/)
fbshipit-source-id: 4d09645bd97e9ff7136a2229fba1d9a1bce5665a
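The leading-bytes approach described in that commit message can be sketched as follows. `starts_with_zip_magic` is a hypothetical name and this is not PyTorch's actual code; it assumes a seekable binary stream.

```python
import io
import zipfile

LOCAL_FILE_HEADER_MAGIC = b"PK\x03\x04"   # signature of a local file header


def starts_with_zip_magic(fileobj) -> bool:
    """Hypothetical helper: accept only streams that begin with a local file header."""
    start = fileobj.tell()
    try:
        return fileobj.read(4) == LOCAL_FILE_HEADER_MAGIC
    finally:
        fileobj.seek(start)   # leave the stream position unchanged


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("tensor.bin", b"\x00" * 8)
archive = io.BytesIO(buf.getvalue())   # fresh stream positioned at offset 0

# Magic bytes buried inside a payload, e.g. raw tensor data.
tensor_like = io.BytesIO(b"\x00" * 16 + LOCAL_FILE_HEADER_MAGIC + b"\x00" * 16)

print(starts_with_zip_magic(archive))       # True
print(starts_with_zip_magic(tensor_like))   # False: magic is not at offset 0
```

The trade-off is strictness in the other direction: an archive with bytes prepended (such as a self-extractor stub), or an empty archive, which consists only of an End of Central Directory record starting with `PK\x05\x06`, is rejected by this check even though zipfile can open it.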
FYI - This PR as is causes …
This PR is stale because it has been open for 30 days with no activity.
@gpshead ping
Fix zipfile validation issue by ... providing more validation!
Originally, zipfile.is_zipfile() only checked for the End of Central Directory
signature. If the signature could be found in the last 64k of the file, the
check succeeded. This produced false positives on any file with 'PK\x05\x06'
in the last 64k of the file, including PDFs and PNGs.
This is now corrected by actually validating the Central Directory location
and size based on the information provided by the End of Central Directory
record, along with verifying the Central Directory signature of the first entry.
This should be sufficient for the vast majority of zipfiles, but more could be
done to fully validate the zipfile content, such as validating all local file
headers and Central Directory entries.
https://bugs.python.org/issue28494
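The extra validation described in this commit message can be sketched over an in-memory byte string as follows. This is a simplified illustration under stated assumptions, not the patch itself: `stricter_is_zip` is a hypothetical name, ZIP64 and multi-disk archives are not handled, and, like zipfile's own reader, it tolerates bytes prepended to the archive. The `<4s4H2LH` layout is the standard 22-byte End of Central Directory record.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"          # End of Central Directory signature
CENTRAL_DIR_SIG = b"PK\x01\x02"   # central directory file header signature
# signature, disk numbers, entry counts, CD size, CD offset, comment length
EOCD = struct.Struct("<4s4H2LH")


def stricter_is_zip(data: bytes) -> bool:
    """Hypothetical helper: EOCD search plus central directory cross-checks."""
    search_start = max(len(data) - (1 << 16) - EOCD.size, 0)
    eocd_pos = data.rfind(EOCD_SIG, search_start)
    if eocd_pos < 0 or len(data) - eocd_pos < EOCD.size:
        return False
    (_sig, _disk, _cd_disk, _entries_here, entries_total,
     cd_size, cd_offset, _comment_len) = EOCD.unpack_from(data, eocd_pos)
    # The central directory must fit before the EOCD record; any surplus is
    # treated as bytes prepended to the archive (e.g. a self-extractor stub).
    prepended = eocd_pos - cd_size - cd_offset
    if prepended < 0:
        return False
    if entries_total == 0:
        # Assumption of this sketch: an empty archive has no central
        # directory and nothing in front of its EOCD record.
        return cd_size == 0 and prepended == 0
    # The first central directory entry must start with its own signature.
    cd_start = eocd_pos - cd_size
    return data[cd_start:cd_start + 4] == CENTRAL_DIR_SIG


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "data")
real = buf.getvalue()

fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 100 + EOCD_SIG + b"\x00" * 18

print(stricter_is_zip(real))             # True
print(stricter_is_zip(b"stub" + real))   # True: prepended bytes are tolerated
print(stricter_is_zip(fake_png))         # False
```

Cross-checking the EOCD's size and offset fields against where the record was actually found is what rejects the fake PNG: a stray signature rarely comes with numbers that point at a real central directory. Walking every central directory entry and local header, as the commit message suggests, would tighten this further.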