[Breaking change]: ZipArchiveEntry names and comments now respect UTF8 flag when decoded #42003

Closed
1 of 3 tasks
edwardneal opened this issue Aug 2, 2024 · 0 comments · Fixed by #42475
Labels
breaking-change Indicates a .NET Core breaking change doc-idea Indicates issues that are suggestions for new topics [org][type][category] in-pr This issue will be closed (fixed) by an active pull request. Pri1 High priority, do before Pri2 and Pri3 📌 seQUESTered Identifies that an issue has been imported into Quest.

edwardneal commented Aug 2, 2024

Description

Relates to dotnet/runtime#103271.

A ZipArchive can be created with an Encoding parameter, which is used to decode the names and comments of entries in the ZIP archive. .NET 7 and 8 introduced a regression where this encoding was always used, even for entries flagged as UTF8, with a fallback to the system default code page (UTF8 in .NET Core) if no encoding was supplied. This regression is being corrected in .NET 9: if the entry's general purpose bit flags indicate that UTF8 should be used, this will be respected; otherwise, the user-supplied encoding will be used (with the existing fallback to the system default code page if none is supplied).

I've stated that .NET 9 RC 1 introduced this change - the PR hasn't yet been merged (it's pending this work) so I've selected the next known release. It'll definitely be in .NET 9.

Version

.NET 9 RC 1

Previous behavior

If ZipArchive was instantiated with a user-specified entryNameEncoding parameter, this encoding would always be used when decoding the names and comments of entries in the ZIP archive, even if the entry had the bit set to signify that its name and comment were encoded in UTF8.

New behavior

When a ZIP archive entry's name and comment are being decoded, its UTF8 bit flag will be respected. The user-supplied entryNameEncoding parameter will only be used to decode the entry's name and comment if this bit flag is unset.
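
The difference between the two behaviors can be sketched in a few lines. This is an illustrative Python model of the rule, not the .NET implementation: `decode_entry_text`, `UTF8_FLAG`, and the `respect_utf8_flag` switch are hypothetical names, and the UTF-8 fallback stands in for the default described in this issue.

```python
UTF8_FLAG = 0x0800  # general purpose bit 11 (EFS)


def decode_entry_text(raw, flags, entry_name_encoding=None, respect_utf8_flag=True):
    """Decode an entry name or comment from its raw bytes.

    respect_utf8_flag=True models the new behavior; False models the
    previous behavior, where the supplied encoding was always used.
    """
    if respect_utf8_flag and flags & UTF8_FLAG:
        # Bit 11 is set: the bytes must be UTF-8, whatever the caller supplied.
        return raw.decode("utf-8", errors="replace")
    # Bit 11 is unset (or ignored): use the caller's encoding, else the default.
    return raw.decode(entry_name_encoding or "utf-8", errors="replace")


raw_name = "résumé.txt".encode("utf-8")
new_result = decode_entry_text(raw_name, UTF8_FLAG, "cp437")
old_result = decode_entry_text(raw_name, UTF8_FLAG, "cp437", respect_utf8_flag=False)
```

Here `new_result` round-trips the original name, while `old_result` is decoded with code page 437 and comes out garbled.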

Type of breaking change

  • Binary incompatible: Existing binaries might encounter a breaking change in behavior, such as failure to load or execute, and if so, require recompilation.
  • Source incompatible: When recompiled using the new SDK or component or to target the new runtime, existing source code might require source changes to compile successfully.
  • Behavioral change: Existing binaries might behave differently at run time.

Reason for change

This corrects a regression in .NET 7 and .NET 8 (reported in dotnet/runtime#92283). It also returns ZipArchive to compliance with the ZIP file format specification, section 4.4.4 and appendix D.

Section 4.4.4:

Bit 11: Language encoding flag (EFS). If this bit is set,
the filename and comment fields for this file
MUST be encoded using UTF-8. (see APPENDIX D)

Appendix D:

D.1 The ZIP format has historically supported only the original IBM PC character
encoding set, commonly referred to as IBM Code Page 437. This limits storing
file name characters to only those within the original MS-DOS range of values
and does not properly support file names in other character encodings, or
languages. To address this limitation, this specification will support the
following change.

D.2 If general purpose bit 11 is unset, the file name and comment SHOULD conform
to the original ZIP character encoding. If general purpose bit 11 is set, the
filename and comment MUST support The Unicode Standard, Version 4.1.0 or
greater using the character encoding form defined by the UTF-8 storage
specification. The Unicode Standard is published by the The Unicode
Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files
is expected to not include a byte order mark (BOM).
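
To see bit 11 in practice, the sketch below uses Python's standard `zipfile` module (not .NET) to write two entries and read back their general purpose flags. It relies on `zipfile` setting the flag when a name is not pure ASCII; the entry names and the `utf8_flags` helper are made up for illustration.

```python
import io
import zipfile

UTF8_FLAG = 0x0800  # general purpose bit 11 (EFS)


def utf8_flags(names):
    """Write empty entries with the given names, then report bit 11 for each."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    # Reopen the same bytes for reading and inspect each entry's flags.
    with zipfile.ZipFile(buffer) as archive:
        return {info.filename: bool(info.flag_bits & UTF8_FLAG)
                for info in archive.infolist()}


flags = utf8_flags(["plain.txt", "résumé.txt"])
```

The ASCII-only name is stored with the flag unset, so a reader would decode it with the caller's encoding; the accented name is stored with the flag set and must be decoded as UTF-8.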

Recommended action

Users passing an encoding to the ZipArchive constructor should be aware that this will not be respected in all situations. It will only be used if the entry's UTF8 bit is not set.

Users who are using ZipArchive to parse ZIP entries with names encoded in non-UTF8 format (but which have the UTF8 bit flag set) will no longer be able to do so. This was always a bug.
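
One way to find out whether an archive falls into this second group is to check each entry's raw name bytes against its flags. The following is a hedged Python sketch of that check, not a .NET API: `is_mislabelled` is a hypothetical helper, and it assumes the caller already has the raw name bytes and the general purpose flags for the entry.

```python
UTF8_FLAG = 0x0800  # general purpose bit 11 (EFS)


def is_mislabelled(raw, flags):
    """Return True if bit 11 is set but the bytes are not valid UTF-8."""
    if not flags & UTF8_FLAG:
        return False  # flag unset: the supplied encoding still applies
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True  # flagged as UTF-8, but encoded as something else
    return False


# A Shift-JIS name wrongly flagged as UTF-8 is reported as mislabelled.
shift_jis_name = "日本語.txt".encode("shift_jis")
mislabelled = is_mislabelled(shift_jis_name, UTF8_FLAG)
```

Entries reported by a check like this are the ones whose names will no longer decode through the user-supplied encoding.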

Feature area

Core .NET libraries

Affected APIs

ZipArchive..ctor(Stream, ZipArchiveMode, Boolean, Encoding)
ZipFile.ExtractToDirectory(String, String, Encoding, Boolean)
ZipFile.ExtractToDirectory(Stream, String, Encoding, Boolean)
ZipFile.ExtractToDirectory(String, String, Encoding)
ZipFile.ExtractToDirectory(Stream, String, Encoding)
ZipFile.Open(String, ZipArchiveMode, Encoding)

Associated WorkItem - 292500

@edwardneal edwardneal added breaking-change Indicates a .NET Core breaking change doc-idea Indicates issues that are suggestions for new topics [org][type][category] Pri1 High priority, do before Pri2 and Pri3 labels Aug 2, 2024
@dotnet-bot dotnet-bot added ⌚ Not Triaged Not triaged labels Aug 2, 2024
@gewarren gewarren removed the ⌚ Not Triaged Not triaged label Aug 2, 2024
@dotnet-bot dotnet-bot added the ⌚ Not Triaged Not triaged label Aug 2, 2024
@gewarren gewarren added 🗺️ reQUEST Triggers an issue to be imported into Quest. and removed ⌚ Not Triaged Not triaged labels Aug 2, 2024
@sequestor sequestor bot added 📌 seQUESTered Identifies that an issue has been imported into Quest. and removed 🗺️ reQUEST Triggers an issue to be imported into Quest. labels Aug 2, 2024
@adegeo adegeo added the 🗺️ mapQUEST Only used as a way to mark an issue as updated for quest. RepoMan should instantly remove it. label Aug 12, 2024
@dotnet-bot dotnet-bot removed the 🗺️ mapQUEST Only used as a way to mark an issue as updated for quest. RepoMan should instantly remove it. label Aug 12, 2024
@dotnet-policy-service dotnet-policy-service bot added the in-pr This issue will be closed (fixed) by an active pull request. label Sep 4, 2024