Tar: filesizes >= 8Gb are truncated on extraction due to incorrect size being written #76563

Closed
jozkee opened this issue Oct 3, 2022 · 2 comments · Fixed by #76707

@jozkee
Member

jozkee commented Oct 3, 2022

From https://en.wikipedia.org/wiki/Tar_(computing)

Numeric values are encoded in octal numbers using ASCII digits, with leading zeroes. For historical reasons, a final NUL or space character should also be used. Thus although there are 12 bytes reserved for storing the file size, only 11 octal digits can be stored. This gives a maximum file size of 8 gigabytes on archived files.
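The arithmetic behind that limit can be checked with a minimal C# sketch. The constant name `MaxOctalSize` and the helper `FormatSizeField` are hypothetical and only illustrate the 12-byte field layout described above; they are not part of System.Formats.Tar.

```csharp
using System;

// Largest value that fits in 11 octal digits: 8^11 - 1 = 2^33 - 1 bytes.
const long MaxOctalSize = (1L << 33) - 1;

// Hypothetical helper: renders a size as a 12-byte field made of
// 11 zero-padded octal digits followed by a NUL terminator.
string FormatSizeField(long size) => Convert.ToString(size, 8).PadLeft(11, '0') + "\0";

Console.WriteLine(MaxOctalSize);                                  // 8589934591
Console.WriteLine(FormatSizeField(MaxOctalSize).TrimEnd('\0'));   // 77777777777
// One byte more needs a 12th digit, which no longer fits next to the terminator.
Console.WriteLine(Convert.ToString(MaxOctalSize + 1, 8).Length);  // 12
```

A file of exactly 8589934592 bytes, as created in the repro steps, is therefore the smallest size that cannot be represented this way.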

You can also see this table to understand the file size capabilities of each format:
[image: table of file size limits per tar format]

Create an 8 GB file and test
On Windows I used fsutil file createnew large_file 8589934592.
On Ubuntu I used dd if=/dev/zero of=8g.img bs=1 count=0 seek=8G.

With BSD tar and the ustar format:

PS C:\repos\runtime\src\libraries\System.Formats.Tar\tests> tar -cvf large_file_bsd.tar --format ustar large_file
a large_file
tar.exe: large_file: File size out of range

With GNU tar and the v7 and ustar formats:

david@amd-desktop:~/tar_largefile$ tar -cvf large.tar --format=v7 8g.img
tar: value 8589934592 out of off_t range 0..8589934591
tar: Exiting with failure status due to previous errors
david@amd-desktop:~/tar_largefile$ tar -cvf large.tar --format=ustar 8g.img
tar: value 8589934592 out of off_t range 0..8589934591
tar: Exiting with failure status due to previous errors

Using TarWriter and TarFile, I noticed that we write an incorrect size, which causes the entry to be truncated when it is extracted, either with TarFile.ExtractToDirectory or with another tool.

We should:

  • throw if a V7TarEntry or UstarTarEntry would exceed the 8 GB limit.
  • ensure the file can properly roundtrip.
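One possible shape for the first item is sketched below. This is not the library's actual code: `ThrowIfSizeTooLarge`, `MaxOctalSize`, and the choice of `ArgumentException` are assumptions made for illustration. Only the `TarEntryFormat` enum comes from System.Formats.Tar (.NET 7 or later).

```csharp
using System;
using System.Formats.Tar;

// 11 octal digits can hold at most 8^11 - 1 bytes.
const long MaxOctalSize = (1L << 33) - 1; // 8589934591

// Hypothetical guard: rejects sizes that the V7 and Ustar size field cannot hold.
// Other formats are not checked here.
void ThrowIfSizeTooLarge(TarEntryFormat format, long size)
{
    bool octalOnly = format is TarEntryFormat.V7 or TarEntryFormat.Ustar;
    if (octalOnly && size > MaxOctalSize)
    {
        throw new ArgumentException(
            $"Size {size} exceeds the {MaxOctalSize}-byte limit of the {format} format.",
            nameof(size));
    }
}

// The largest representable size passes; one byte more is rejected.
ThrowIfSizeTooLarge(TarEntryFormat.Ustar, MaxOctalSize);
try
{
    ThrowIfSizeTooLarge(TarEntryFormat.V7, MaxOctalSize + 1);
}
catch (ArgumentException e)
{
    Console.WriteLine(e.Message);
}
```

Failing at write time would match what BSD tar and GNU tar do in the transcripts above, instead of producing an archive whose entry is silently truncated.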

cc @carlossanlop

@jozkee jozkee added this to the 8.0.0 milestone Oct 3, 2022
@ghost

ghost commented Oct 3, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Author: Jozkee
Assignees: -
Labels:

area-System.IO

Milestone: 8.0.0

@jozkee jozkee changed the title from "Tar: verify behavior with filesizes >= 8Gb" to "Tar: issue with filesizes >= 8Gb" Oct 3, 2022
@jozkee jozkee changed the title from "Tar: issue with filesizes >= 8Gb" to "Tar: filesizes >= 8Gb are truncated on extraction due to incorrect size being written" Oct 3, 2022
@jozkee
Member Author

jozkee commented Oct 4, 2022

There's another issue related to large files: the reader parses the resulting octal size to int, which makes sizes > 2 GB overflow and throw System.IO.InvalidDataException: The size field is negative in a tar entry.

long size = (int)TarHelpers.ParseOctal<uint>(buffer.Slice(FieldLocations.Size, FieldLengths.Size));
if (size < 0)
{
    throw new InvalidDataException(string.Format(SR.TarSizeFieldNegative));
}

We should address that as well as part of this issue.
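To make the overflow concrete, here is a self-contained sketch that reads the octal digits straight into a long. `ParseOctalAsLong` is a hypothetical stand-in written for this illustration; it is not the internal `TarHelpers.ParseOctal` and makes no claim about how the eventual fix is implemented.

```csharp
using System;
using System.Text;

// Hypothetical parser: accumulates ASCII octal digits into a long,
// skipping leading padding and stopping at the trailing NUL or space.
long ParseOctalAsLong(ReadOnlySpan<byte> field)
{
    long value = 0;
    bool sawDigit = false;
    foreach (byte b in field)
    {
        if (b >= (byte)'0' && b <= (byte)'7')
        {
            value = checked(value * 8 + (b - (byte)'0'));
            sawDigit = true;
        }
        else if (b == 0 || b == (byte)' ')
        {
            if (sawDigit) break; // terminator after the digits
        }
        else
        {
            throw new FormatException($"Unexpected byte 0x{b:X2} in octal field.");
        }
    }
    return value;
}

// 20000000000 in octal is 2^31, i.e. exactly 2 GiB.
byte[] field = Encoding.ASCII.GetBytes("20000000000\0");
long size = ParseOctalAsLong(field);
Console.WriteLine(size);            // 2147483648
Console.WriteLine((int)(uint)size); // -2147483648, the sign flip behind the exception
```

Keeping the value in a 64-bit type end to end avoids both the sign flip from the int cast and any truncation from parsing into a 32-bit type.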

@jozkee jozkee self-assigned this Oct 4, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 6, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 7, 2022
@carlossanlop carlossanlop modified the milestones: 8.0.0, 7.0.0 Nov 7, 2022