Description
TarHelpers.GetTrimmedUtf8String() is called here to extract strings from tar headers (runtime/src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Read.cs, line 387 in 9e5e6aa):
name: TarHelpers.GetTrimmedUtf8String(buffer.Slice(FieldLocations.Name, FieldLengths.Name)),
Internally, TarHelpers.GetTrimmedUtf8String() calls TarHelpers.TrimEndingNullsAndSpaces(), which, as its name implies, trims trailing '\0' (0x00) and ' ' (0x20) characters. A '\0' that is followed by other characters is therefore left in the string.
According to these specs, these header fields are null-terminated:
"The name, linkname, magic, uname, and gname are null-terminated character strings."
So the correct behavior is to keep as many characters as possible and stop at the first null terminator, ignoring any bytes that follow it.
Not doing so causes problems with this tar archive, which contains extra bytes after the null terminator in some of its headers.
Here's a hex dump of the header for one of the entries:
.NET interprets this name as "python/bin/idle3.13\0dle3.13", which becomes "python/bin/idle3.13_dle3.13" once extracted. The correct name is "python/bin/idle3.13".
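The difference between trimming trailing characters and honoring the null terminator can be sketched in C#. Both helpers are hypothetical: TrimTrailingNullsAndSpaces only approximates the trimming behavior described above (it is not the runtime's TarHelpers code), and ReadNullTerminated is one possible way to stop at the first '\0'. The 100-byte name field is an assumption reconstructed from the name .NET reports, not bytes copied from the actual archive.

```csharp
using System;
using System.Text;

// Hypothetical helper approximating the behavior described above (NOT the runtime's code):
// only trailing '\0' and ' ' bytes are removed, so an embedded '\0' survives.
static string TrimTrailingNullsAndSpaces(ReadOnlySpan<byte> field)
{
    int end = field.Length;
    while (end > 0 && (field[end - 1] == 0 || field[end - 1] == (byte)' '))
    {
        end--;
    }
    return Encoding.UTF8.GetString(field.Slice(0, end));
}

// Hypothetical helper: treat the field as a null-terminated string.
// Stop at the first '\0'; if there is none, the field is full and every byte is used.
static string ReadNullTerminated(ReadOnlySpan<byte> field)
{
    int nul = field.IndexOf((byte)0);
    return Encoding.UTF8.GetString(nul >= 0 ? field.Slice(0, nul) : field);
}

// Assumed contents of the 100-byte ustar name field, reconstructed from the name
// .NET reports: the real name, a '\0', leftover bytes, then zero padding.
byte[] nameField = new byte[100];
Encoding.ASCII.GetBytes("python/bin/idle3.13\0dle3.13").CopyTo(nameField, 0);

string trimmed = TrimTrailingNullsAndSpaces(nameField);
string terminated = ReadNullTerminated(nameField);

Console.WriteLine(trimmed.Replace("\0", "\\0")); // python/bin/idle3.13\0dle3.13
Console.WriteLine(terminated);                   // python/bin/idle3.13
```

Falling back to the whole field when no '\0' is present keeps names that fill the field exactly, which is what "keep as many characters as possible" requires.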
Reproduction Steps
Download cpython-3.13.5+20250702-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz and place it in the working directory of the program below.
Extract it with the following code:
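using System.Formats.Tar;
using System.IO;
using System.IO.Compression;
using Stream fileStream = File.OpenRead(@"cpython-3.13.5+20250702-x86_64-unknown-linux-gnu-install_only_stripped.tar.gz");
Directory.CreateDirectory(@"~temp"); // destination directory for the extracted files
using GZipStream tarStream = new(fileStream, CompressionMode.Decompress); // the archive is gzip-compressed
await TarFile.ExtractToDirectoryAsync(tarStream, @"~temp", overwriteFiles: false);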
Expected behavior
File names should be read as null-terminated strings, so the entry above should be extracted as "python/bin/idle3.13".
Actual behavior
The extracted file names are incorrect, e.g. "python/bin/idle3.13_dle3.13".
Regression?
No response
Known Workarounds
No response
Configuration
.NET 9.0.300
Other information
No response