Copilot AI commented Jan 3, 2026


Description

ZipArchive produces corrupted ZIP files when a file >4GB is written at an offset >4GB. In WriteCentralDirectoryFileHeaderInitialize, the Zip64ExtraField for sizes was being overwritten when setting the offset:

// Before: overwrites sizes with new object
zip64ExtraField = new() { LocalHeaderOffset = _offsetOfLocalHeader };

// After: preserves sizes, adds offset
zip64ExtraField ??= new();
zip64ExtraField.LocalHeaderOffset = _offsetOfLocalHeader;

This caused 7-Zip to show Extra_ERROR Zip64_ERROR and ZipFile.OpenRead to throw InvalidDataException: A local file header is corrupt.
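To make the failure concrete, here is a minimal, self-contained sketch of the header-building decision; Zip64ExtraField below is a simplified stand-in for the internal type, and BuildExtraField is a hypothetical helper, not the actual ZipArchiveEntry code:

// Simplified stand-in for the internal Zip64ExtraField type (illustrative only).
class Zip64ExtraField
{
    public long? UncompressedSize;
    public long? CompressedSize;
    public long? LocalHeaderOffset;
}

class Zip64Demo
{
    const long Mask32Bit = uint.MaxValue;

    // Mirrors the shape of WriteCentralDirectoryFileHeaderInitialize's logic:
    // sizes and offset each independently require ZIP64 once they exceed 4GB.
    static Zip64ExtraField? BuildExtraField(long uncompressed, long compressed, long offset)
    {
        Zip64ExtraField? zip64 = null;

        if (uncompressed > Mask32Bit || compressed > Mask32Bit)
            zip64 = new() { UncompressedSize = uncompressed, CompressedSize = compressed };

        if (offset > Mask32Bit)
        {
            // Buggy version: zip64 = new() { LocalHeaderOffset = offset };
            // When both branches ran, this replaced the object created above and
            // silently dropped the 64-bit sizes from the central directory record.
            zip64 ??= new();
            zip64.LocalHeaderOffset = offset;
        }

        return zip64;
    }
}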

Customer Impact

ZIP archives containing large files (>4GB) positioned after >4GB of preceding data become unreadable. Affects backup/archive scenarios with large datasets.

Regression

No. This is a longstanding bug in the ZIP64 handling logic.

Testing

  • Added regression test LargeFile_At_LargeOffset_ZIP64_HeaderPreservation covering the specific scenario (a sketch of the approach follows this list)
  • Test includes OOM exception handling with SkipTestException to gracefully skip when memory is insufficient
  • All 1359 existing System.IO.Compression tests pass
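
For reference, the shape of such a test might look roughly like the sketch below; WriteEntry is a hypothetical helper, and the attribute and padding structure are simplified relative to the real zip_LargeFiles.cs:

using System;
using System.IO;
using System.IO.Compression;
using Microsoft.DotNet.XUnitExtensions; // SkipTestException
using Xunit;

public static class Zip64RegressionSketch
{
    [Fact]
    public static void LargeFile_At_LargeOffset_ZIP64_HeaderPreservation()
    {
        byte[] chunk;
        try
        {
            chunk = new byte[64 * 1024 * 1024]; // reused zero-filled buffer
        }
        catch (OutOfMemoryException e)
        {
            throw new SkipTestException(e.Message); // skip gracefully when memory is tight
        }

        string zipPath = Path.Combine(Path.GetTempPath(), "zip64-offset-repro.zip");
        const long FiveGB = 5L * 1024 * 1024 * 1024;

        using (FileStream fs = File.Create(zipPath))
        using (var archive = new ZipArchive(fs, ZipArchiveMode.Create))
        {
            // >4GB of preceding data pushes the second entry's local header past
            // the 4GB offset limit (the real test uses many small files)...
            WriteEntry(archive, "padding.bin", chunk, FiveGB);
            // ...and a >4GB entry makes the sizes need ZIP64 too, so both
            // conditions hold for the same central directory record.
            WriteEntry(archive, "large.bin", chunk, FiveGB);
        }

        // Before the fix, opening the entries threw InvalidDataException:
        // "A local file header is corrupt."
        using (ZipArchive reopened = ZipFile.OpenRead(zipPath))
        {
            foreach (ZipArchiveEntry entry in reopened.Entries)
                entry.Open().Dispose();
        }

        File.Delete(zipPath);
    }

    static void WriteEntry(ZipArchive archive, string name, byte[] chunk, long totalBytes)
    {
        using Stream s = archive.CreateEntry(name, CompressionLevel.NoCompression).Open();
        for (long written = 0; written < totalBytes; written += chunk.Length)
            s.Write(chunk, 0, (int)Math.Min(chunk.Length, totalBytes - written));
    }
}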

Risk

Low. Single-line logic change in a specific code path that only affects ZIP64 central directory header writing when both sizes and offset exceed 4GB.

Package authoring no longer needed in .NET 9

IMPORTANT: Starting with .NET 9, you no longer need to edit a NuGet package's csproj to enable building and bump the version.
Keep in mind that we still need package authoring in .NET 8 and older versions.
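For those older branches, the csproj edits are usually along these lines; treat the exact property names as an assumption to verify against the servicing docs for the branch:

<!-- Illustrative servicing-branch packaging snippet (verify property names per branch). -->
<PropertyGroup>
  <IsPackable>true</IsPackable>            <!-- enable building the package -->
  <ServicingVersion>1</ServicingVersion>   <!-- bump to produce a new patch version -->
</PropertyGroup>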

Original prompt

This section details the original issue you should resolve

<issue_title>ZipArchive creates corrupted ZIP when writing large dataset with many repeated files</issue_title>
<issue_description>### Description

RavenDB snapshot backups produced with ZipArchive can be unrecoverable due to ZIP header corruption. Producing a snapshot backup (a ZIP archive written with System.IO.Compression.ZipArchive) over a specific data set results in a ZIP that fails to open correctly:

  • 7‑Zip shows Extra_ERROR Zip64_ERROR: UTF8 (for entry Documents\Raven.voron), and the Packed Size looks capped at 4GB.
  • System.IO.Compression.ZipFile.OpenRead(...).Entries[i].Open() throws System.IO.InvalidDataException: A local file header is corrupt.

Writing the exact same dataset and order using SharpZipLib’s ZipOutputStream produces a valid ZIP that both 7‑Zip and ZipFile.OpenRead can read.

This started affecting us after introducing a feature that creates many per-index journal files that are hard links to the same underlying file content (so multiple distinct file paths share the exact same bytes on disk). Our dataset also includes a large 30GB file (Raven.voron). The combination seems to trigger a bug.

Reproduction Steps

Repro dataset

> $RootPath = (Get-Item .).FullName; Get-ChildItem -Path . -Include *.journal -Recurse -File | Get-FileHash | Select-Object @{Name='Path'; Expression={ $_.Path.Replace($RootPath + "\", "") }}, Hash, Algorithm

Path                                                         Hash                                                             Algorithm
----                                                         ----                                                             ---------
Configuration\Journals\0000000000000000001.journal           96F77B06EBF13895A297B7182BC162B42A05CC9B444D488A87FA541CD9962516 SHA256
Indexes\@SharedJournals\Journals\0000000000000000107.journal 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Activity_ByMonth\Journals\0000000000000000008.jou... 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Questions_Search\Journals\0000000000000000004.jou... 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Questions_Tags\Journals\0000000000000000007.journal  16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Questions_Tags_ByMonths\Journals\0000000000000000... 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Users_Registrations_ByMonth\Journals\000000000000... 16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256
Indexes\Users_Search\Journals\0000000000000000005.journal    16BB9C3A617844EFA25254184A4AF7E0E36ED0B656C12A71952D28F3EE2C3156 SHA256

Repro app

Single‑file console app (targets net8.0 or net10.0). It copies files from the dataset into a ZIP using ZipArchive, in the exact order RavenDB snapshot backup uses:

  • Indexes (excluding any @* folder such as @SharedJournals), then
  • Documents (root storage env), then
  • Configuration folder
// Add package: ICSharpCode.SharpZipLib
//
// Example csproj snippet:
// <ItemGroup>
//   <PackageReference Include="SharpZipLib" Version="1.4.2" />
// </ItemGroup>
//
// Usage:
//   ZipArchiveIssue <sourceDbFolder> <outputDir> [options]
//
// Options:
//   --ziparchive             Generate ZIP using System.IO.Compression.ZipArchive
//   --sharpzip               Generate ZIP using SharpZipLib ZipOutputStream
//   --level=<Optimal|Fastest|NoCompression>   Compression level (default: Optimal)
//   --nonseekable            Wrap output stream to simulate non-seekable sink (ZipArchive data-descriptor path)
//   --outname=<baseName>     Base file name (default: derived from folder name)
//   --verify                 After writing, attempt to open/read entries via ZipFile.OpenRead
//
// Mapping mirrors RavenDB snapshot shape, copying from disk:
// - Order: Indexes -> Documents -> Configuration (matches RavenDB snapshot backup)
// - Root DB env  -> Documents/
// - Configuration/ -> Configuration/
// - Indexes/<IndexName>/ -> Indexes/<IndexName>/
// - Include files: Raven.voron, headers.one, headers.two, database.metadata, Journals/*.journal
// - Skip: any Temp/ folders, and all Indexes/@* folders (e.g. @SharedJournals)

#nullable enable
using System;
using System...
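
The program body is truncated above; a minimal sketch of its ZipArchive path, with simplified file selection (the helper names here are illustrative, not the original app's), would be along these lines:

using System;
using System.IO;
using System.IO.Compression;

string source = args[0];
string outputZip = Path.Combine(args[1], Path.GetFileName(source) + ".zip");

using FileStream output = File.Create(outputZip);
using var zip = new ZipArchive(output, ZipArchiveMode.Create);

// Order matches the RavenDB snapshot backup: Indexes -> Documents -> Configuration.
// (Temp/ filtering omitted for brevity.)
foreach (string indexDir in Directory.EnumerateDirectories(Path.Combine(source, "Indexes")))
{
    string name = Path.GetFileName(indexDir);
    if (name.StartsWith('@')) continue; // skip @SharedJournals and other @* folders
    AddFolder(indexDir, "Indexes/" + name);
}

// Root storage env files go under Documents/ (the real app includes only
// Raven.voron, headers.one/two, database.metadata and Journals/*.journal).
foreach (string file in Directory.EnumerateFiles(source))
    zip.CreateEntryFromFile(file, "Documents/" + Path.GetFileName(file));
AddFolder(Path.Combine(source, "Journals"), "Documents/Journals");

AddFolder(Path.Combine(source, "Configuration"), "Configuration");

void AddFolder(string dir, string zipPrefix)
{
    foreach (string file in Directory.EnumerateFiles(dir, "*", SearchOption.AllDirectories))
    {
        string rel = Path.GetRelativePath(dir, file).Replace('\\', '/');
        zip.CreateEntryFromFile(file, zipPrefix + "/" + rel, CompressionLevel.Optimal);
    }
}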





- Fixes dotnet/runtime#122489


When both AreSizesTooLarge and IsOffsetTooLarge are true,
the Zip64ExtraField was being overwritten in the central
directory header logic, losing the size information.

This fixes the issue by reusing the existing Zip64ExtraField
when adding the offset, using ??= instead of creating a new object.

Fixes #114205

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix ZipArchive ZIP header corruption for large datasets" to "Fix ZIP64 header corruption when large file at large offset" on Jan 3, 2026
Copilot AI requested a review from stephentoub January 3, 2026 19:35
@stephentoub stephentoub marked this pull request as ready for review January 4, 2026 00:26
Copilot AI review requested due to automatic review settings January 4, 2026 00:26
@dotnet-policy-service

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.


Copilot AI left a comment


Pull request overview

This pull request fixes a critical bug in ZipArchive where files larger than 4GB positioned at offsets greater than 4GB would produce corrupted ZIP files. The corruption occurred because the ZIP64 extra field handling was overwriting previously set size information when adding offset information.

Key Changes

  • Fixed the ZIP64 header preservation logic in WriteCentralDirectoryFileHeaderInitialize by using null-coalescing assignment (??=) instead of creating a new object
  • Added regression test to verify large files at large offsets are handled correctly

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs: Changed line 530 to use the ??= operator to preserve the existing ZIP64 extra field instead of overwriting it with a new instance
  • src/libraries/System.IO.Compression/tests/ZipArchive/zip_LargeFiles.cs: Added new test LargeFile_At_LargeOffset_ZIP64_HeaderPreservation that creates a ZIP with 5GB of small files followed by a 5GB large file to trigger both the size and offset ZIP64 conditions

Wrap buffer allocation in try-catch for OutOfMemoryException
and throw SkipTestException to gracefully skip the test
when insufficient memory is available.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
