Fix ZIP64 header corruption when large file at large offset #122837
base: main
Conversation
When both AreSizesTooLarge and IsOffsetTooLarge are true, the Zip64ExtraField was being overwritten in the central directory header logic, losing the size information. This fixes the issue by reusing the existing Zip64ExtraField when adding the offset, using ??= instead of creating a new object. Fixes #114205 Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
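A simplified, hypothetical sketch of the pattern involved (the real logic lives in `ZipArchiveEntry.WriteCentralDirectoryFileHeaderInitialize` and uses the internal `Zip64ExtraField` type; the stub type, field names, and method below are invented for illustration only):

```csharp
// Hypothetical stub standing in for the internal Zip64ExtraField type.
class Zip64Stub
{
    public long? UncompressedSize;
    public long? CompressedSize;
    public long? LocalHeaderOffset;
}

static class CentralDirectorySketch
{
    const long Mask32Bit = 0xFFFFFFFFL;

    // Illustrates why ??= matters when both the sizes and the offset exceed 4GB.
    static Zip64Stub? BuildExtraField(long uncompressed, long compressed, long offset, bool useFix)
    {
        Zip64Stub? zip64 = null;

        // Sizes over 4GB must be recorded in the ZIP64 extra field.
        if (uncompressed > Mask32Bit || compressed > Mask32Bit)
            zip64 = new Zip64Stub { UncompressedSize = uncompressed, CompressedSize = compressed };

        if (offset > Mask32Bit)
        {
            if (useFix)
                zip64 ??= new Zip64Stub();   // reuse the field that already carries the sizes
            else
                zip64 = new Zip64Stub();     // bug: a fresh object silently drops the sizes
            zip64.LocalHeaderOffset = offset;
        }

        return zip64;
    }
}
```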
Tagging subscribers to this area: @dotnet/area-system-io-compression
Pull request overview
This pull request fixes a critical bug in ZipArchive where files larger than 4GB positioned at offsets greater than 4GB would produce corrupted ZIP files. The corruption occurred because the ZIP64 extra field handling was overwriting previously set size information when adding offset information.
Key Changes
- Fixed the ZIP64 header preservation logic in `WriteCentralDirectoryFileHeaderInitialize` by using null-coalescing assignment (`??=`) instead of creating a new object
- Added a regression test to verify that large files at large offsets are handled correctly
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs | Changed line 530 to use the `??=` operator to preserve the existing ZIP64 extra field instead of overwriting it with a new instance |
| src/libraries/System.IO.Compression/tests/ZipArchive/zip_LargeFiles.cs | Added new test `LargeFile_At_LargeOffset_ZIP64_HeaderPreservation` that creates a ZIP with 5GB of small files followed by a 5GB large file to trigger both size and offset ZIP64 conditions |
Wrap buffer allocation in try-catch for OutOfMemoryException and throw SkipTestException to gracefully skip the test when insufficient memory is available. Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
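A hedged, minimal sketch of what such a regression test can look like, combining the skip-on-OOM pattern above with the "5GB of small files followed by a 5GB large file" layout described in the review table. This is not the actual test from the PR; the attribute choice, buffer size, and assertions are assumptions, and `SkipTestException` is assumed to come from `Microsoft.DotNet.XUnitExtensions` as in other dotnet/runtime tests:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using Microsoft.DotNet.XUnitExtensions; // SkipTestException (assumed available in the test project)
using Xunit;

public class Zip64HeaderPreservationSketch
{
    [Fact]
    public static void LargeFile_At_LargeOffset_Sketch()
    {
        byte[] buffer;
        try
        {
            // Reusable chunk; if the allocation fails, skip rather than fail the test.
            buffer = new byte[64 * 1024 * 1024];
        }
        catch (OutOfMemoryException)
        {
            throw new SkipTestException("Insufficient memory to run this test.");
        }

        string zipPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");
        const long FiveGiB = 5L * 1024 * 1024 * 1024;
        try
        {
            using (FileStream fs = File.Create(zipPath))
            using (var archive = new ZipArchive(fs, ZipArchiveMode.Create))
            {
                // ~5GB of stored (uncompressed) small entries pushes the next entry's offset past 4GB.
                for (long written = 0; written < FiveGiB; written += buffer.Length)
                {
                    ZipArchiveEntry small = archive.CreateEntry($"small{written}.bin", CompressionLevel.NoCompression);
                    using Stream s = small.Open();
                    s.Write(buffer, 0, buffer.Length);
                }

                // One >4GB entry at a >4GB offset: both ZIP64 sizes and offset are required.
                ZipArchiveEntry large = archive.CreateEntry("large.bin", CompressionLevel.NoCompression);
                using Stream ls = large.Open();
                for (long written = 0; written < FiveGiB; written += buffer.Length)
                    ls.Write(buffer, 0, buffer.Length);
            }

            // Reopening must not throw and must report the full size for the large entry.
            using ZipArchive reopened = ZipFile.OpenRead(zipPath);
            Assert.Equal(FiveGiB, reopened.GetEntry("large.bin")!.Length);
        }
        finally
        {
            File.Delete(zipPath);
        }
    }
}
```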
Description
`ZipArchive` produces corrupted ZIP files when a file >4GB is written at an offset >4GB. In `WriteCentralDirectoryFileHeaderInitialize`, the `Zip64ExtraField` carrying the sizes was being overwritten when the offset was set. This caused 7-Zip to show `Extra_ERROR Zip64_ERROR` and `ZipFile.OpenRead` to throw `InvalidDataException: A local file header is corrupt.`
Customer Impact
ZIP archives containing large files (>4GB) positioned after >4GB of preceding data become unreadable. Affects backup/archive scenarios with large datasets.
Regression
No. This is a longstanding bug in the ZIP64 handling logic.
Testing
- Added `LargeFile_At_LargeOffset_ZIP64_HeaderPreservation` covering the specific scenario
- The test throws `SkipTestException` to gracefully skip when memory is insufficient

Risk
Low. Single-line logic change in a specific code path that only affects ZIP64 central directory header writing when both sizes and offset exceed 4GB.
Package authoring no longer needed in .NET 9
IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.
Original prompt
This section details the original issue you should resolve
<issue_title>ZipArchive creates corrupted ZIP when writing large dataset with many repeated files</issue_title>
<issue_description>
### Description
RavenDB snapshot backups produced with `ZipArchive` can be unrecoverable due to ZIP header corruption. The issue is that producing a snapshot backup (a ZIP archive created with `System.IO.Compression.ZipArchive`) over a specific data set results in a ZIP that fails to open correctly:
- 7-Zip shows `Extra_ERROR Zip64_ERROR: UTF8` (for entry Documents\Raven.voron), and the Packed Size looks capped at 4GB.
- `System.IO.Compression.ZipFile.OpenRead(...).Entries[i].Open()` throws `System.IO.InvalidDataException: A local file header is corrupt.`

Writing the exact same dataset and order using SharpZipLib's `ZipOutputStream` produces a valid ZIP that both 7-Zip and `ZipFile.OpenRead` can read.
This started affecting us after introducing a feature that creates many per-index journal files that are hard links to the same underlying file content (so multiple distinct file paths share the exact same bytes on disk). Our dataset also includes a large 30GB file (Raven.voron). The combination seems to trigger a bug.
Reproduction Steps
Repro dataset
`raven-so-database.zip` (contains the on-disk database folder): https://drive.google.com/file/d/1iCqKnzhu41umXik938umUee940MMPoWq/view?usp=sharing (10GB file, 42GB after unzipping)
It includes:
- `Raven.voron` file (~30GB)
- `Indexes/<IndexName>/Journals/*.journal` files which were hard links pointing to the same physical journal files (identical SHA-256 hashes across index folders)

Repro app
Single-file console app (targets net8.0 or net10.0). It copies files from the dataset into a ZIP using `ZipArchive`, in the exact order RavenDB snapshot backup uses:
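The original code snippet is not included in this excerpt. Below is a hedged, minimal sketch of such a console app, assuming the extracted dataset folder and the output ZIP path are passed as arguments; the real repro enumerates files in RavenDB's own backup order, which is not reproduced here:

```csharp
using System.IO;
using System.IO.Compression;

// Assumed arguments: args[0] = extracted raven-so-database folder, args[1] = output ZIP path.
string sourceDir = args[0];
string zipPath = args[1];

using FileStream zipStream = File.Create(zipPath);
using var archive = new ZipArchive(zipStream, ZipArchiveMode.Create);

// Simple recursive enumeration; the real backup uses RavenDB's own file ordering.
foreach (string file in Directory.EnumerateFiles(sourceDir, "*", SearchOption.AllDirectories))
{
    string entryName = Path.GetRelativePath(sourceDir, file).Replace('\\', '/');
    ZipArchiveEntry entry = archive.CreateEntry(entryName, CompressionLevel.Fastest);
    using Stream entryStream = entry.Open();
    using FileStream fileStream = File.OpenRead(file);
    fileStream.CopyTo(entryStream);
}
```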