TarFile.ExtractToDirectory() doesn't work sometimes with compressed archives #74242

Closed
iSazonov opened this issue Aug 19, 2022 · 7 comments · Fixed by #74396

Comments

@iSazonov
Contributor

Description

TarFile.ExtractToDirectoryInternal() doesn't work with some compressed archives.

The method works with the sample archives used in the repo's tests.
But it doesn't work with an archive created with 7-Zip or with the official PowerShell distribution (link is below).

If I use TarReader on the same archives, I can read all entries.

The method also works if the archive is not compressed.

Small example archives created with 7-Zip
examples.zip

Reproduction Steps

  1. Download https://github.com/PowerShell/PowerShell/releases/download/v7.3.0-preview.6/powershell-7.3.0-preview.6-linux-x64.tar.gz
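  2. Try to read it with this simple code:
            // Requires: using System.Formats.Tar; using System.IO; using System.IO.Compression;
            // args[0] is the .tar.gz path, args[1] is the destination directory.
            using (var sourceStream = File.OpenRead(args[0]))
            using (var decompressorStream = new GZipStream(sourceStream, CompressionMode.Decompress))
            {
                TarFile.ExtractToDirectory(decompressorStream, args[1], overwriteFiles: true);
            }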

Expected behavior

The archive is extracted successfully.

Actual behavior

An exception is thrown:

Unhandled exception. System.FormatException: Could not find any recognizable digits.
   at System.ParseNumbers.StringToInt(ReadOnlySpan`1 s, Int32 radix, Int32 flags, Int32& currPos)
   at System.Convert.ToInt32(String value, Int32 fromBase)
   at System.Formats.Tar.TarHelpers.GetTenBaseNumberFromOctalAsciiChars(Span`1 buffer)
   at System.Formats.Tar.TarHeader.TryReadCommonAttributes(Span`1 buffer)
   at System.Formats.Tar.TarHeader.TryGetNextHeader(Stream archiveStream, Boolean copyData)
   at System.Formats.Tar.TarReader.TryGetNextEntryHeader(TarHeader& header, Boolean copyData)
   at System.Formats.Tar.TarReader.GetNextEntry(Boolean copyData)
   at System.Formats.Tar.TarFile.ExtractToDirectoryInternal(Stream source, String destinationDirectoryPath, Boolean overwriteFiles, Boolean leaveOpen)

Regression?

No response

Known Workarounds

No response

Configuration

.NET 7 Preview 7

Other information

No response

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 19, 2022
@ghost

ghost commented Aug 19, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Author: iSazonov
Assignees: -
Labels:

area-System.IO, untriaged

Milestone: -

@jozkee
Member

jozkee commented Aug 19, 2022

I tried your scenario with 7.0.0-rc1 and didn't get the exception but I did notice that the only file extracted was ThirdPartyNotices.txt.
cc @carlossanlop. I also found #70509, which reports the same error, but there it was caused by passing a compressed tar archive directly.

@jozkee jozkee added this to the 7.0.0 milestone Aug 19, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Aug 19, 2022
@iSazonov
Contributor Author

Quoting @jozkee: "I tried your scenario with 7.0.0-rc1 and didn't get the exception but I did notice that the only file extracted was ThirdPartyNotices.txt."

I have seen this with several archives: after the first entry is read successfully, the next call to TarReader.GetNextEntry() returns null (for compressed files).
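To see how far TarReader gets on a compressed archive, a small diagnostic along these lines could be used. It is a sketch, not code from this thread: TarDiagnostics and ListEntries are hypothetical names, and it assumes the .NET 7 System.Formats.Tar API (TarReader(Stream, leaveOpen), GetNextEntry(copyData)). It reads each entry's DataStream to the end, the way extraction does, since that appears to be the case that triggers the problem according to the root cause identified later in this thread.

```csharp
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;
using System.IO.Compression;

#nullable enable

// Hypothetical diagnostic helper (not from this thread): walks a gzip-compressed
// tar stream with TarReader and records every entry it returns.
public static class TarDiagnostics
{
    public static List<string> ListEntries(Stream gzipCompressedTar)
    {
        var entries = new List<string>();

        // GZipStream is unseekable, which is the code path this issue is about.
        using var decompressor = new GZipStream(gzipCompressedTar, CompressionMode.Decompress, leaveOpen: true);
        using var reader = new TarReader(decompressor, leaveOpen: true);

        TarEntry? entry;
        // copyData: false means DataStream reads straight from the archive stream
        // and is only valid until the next GetNextEntry() call.
        while ((entry = reader.GetNextEntry(copyData: false)) is not null)
        {
            entries.Add($"{entry.EntryType} {entry.Name} ({entry.Length} bytes)");

            // Read the data to the end, as extraction does when writing the file to disk.
            entry.DataStream?.CopyTo(Stream.Null);
        }

        return entries;
    }
}
```

On a runtime that includes the fix, this should list every entry; on an affected build the listing would be expected to stop after the first file that has data.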

@iSazonov iSazonov changed the title TarFile.ExtractToDirectoryInternal() doesn't work sometimes with compressed archives TarFile.ExtractToDirectory() doesn't work sometimes with compressed archives Aug 20, 2022
@carlossanlop
Member

This issue has the same repro as #74316, which we are already tracking, because TarFile.ExtractToDirectory uses TarReader internally.

We fixed this with #74329

I'll close this in favor of #74316.

@jozkee
Member

jozkee commented Aug 22, 2022

Quoting @carlossanlop: "We fixed this with #74329"

@carlossanlop I'm still able to repro using the latest bits.

@jozkee jozkee reopened this Aug 22, 2022
@carlossanlop
Member

Had a call with @jozkee. We found the root cause of this bug: when reading an unseekable stream (like a GZipStream) and encountering a regular file entry with a lot of data (ThirdPartyNotices.txt), TarFile writes the file to disk successfully, but visiting the next entry fails because we are not advancing the stream correctly. We should always skip the padding after the data, but the current condition only skips the padding if the user did not consume the whole data stream.
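The root cause comes down to tar's block layout: each entry's data is zero-padded up to a multiple of 512 bytes, and on an unseekable stream the only way to skip bytes is to read and discard them. The sketch below illustrates that arithmetic and the "always skip the padding" rule described above. It is not the System.Formats.Tar implementation; TarPadding, GetPaddingLength, SkipBytes and AdvanceToNextHeader are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;

// Illustrative sketch only; these names are hypothetical, not System.Formats.Tar internals.
public static class TarPadding
{
    // Tar archives are laid out in 512-byte records; entry data is zero-padded
    // to the next multiple of 512.
    public const int RecordSize = 512;

    public static int GetPaddingLength(long dataLength) =>
        (int)((RecordSize - (dataLength % RecordSize)) % RecordSize);

    // An unseekable stream (e.g. GZipStream) cannot seek, so "skipping" means reading and discarding.
    public static void SkipBytes(Stream stream, long count)
    {
        var buffer = new byte[RecordSize];
        while (count > 0)
        {
            int read = stream.Read(buffer, 0, (int)Math.Min(count, buffer.Length));
            if (read == 0)
            {
                throw new EndOfStreamException("Archive ended before the next header.");
            }
            count -= read;
        }
    }

    // Given that the caller consumed 'bytesConsumed' of the entry's 'dataLength' bytes,
    // skip the unread data AND the padding, so the stream ends up at the next header
    // no matter how much of the data the caller read.
    public static void AdvanceToNextHeader(Stream stream, long dataLength, long bytesConsumed)
    {
        long remainingData = dataLength - bytesConsumed;
        SkipBytes(stream, remainingData + GetPaddingLength(dataLength));
    }
}
```

For example, an entry with 1000 bytes of data is followed by 24 bytes of padding (1000 + 24 = 1024 = 2 × 512), so even after the caller reads all 1000 bytes, the reader still has to discard those 24 bytes before the next header. AdvanceToNextHeader skips the padding unconditionally, which is the behavior the comment above says the reader should have.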

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 23, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 23, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Sep 22, 2022