Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some corner cases in TarReader #74329

Merged

Conversation

am11
Copy link
Member

@am11 am11 commented Aug 22, 2022

  • Fix reading of tar file with hardlinks.
  • Fix reading of tar files with mixed formats (gnu with ustar or pax entries and vice versa).
  • Add node-tar fixtures test added in dotnet/runtime-assets repo.

Contributes to #74316.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 22, 2022
@ghost
Copy link

ghost commented Aug 22, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details
  • Fix reading of tar file with hardlinks.
  • Fix reading of tar files with mixed formats (gnu with ustar or pax entries and vice versa).
  • Add node-tar fixtures test added in dotnet/runtime-assets repo.

Contributes to #74316.

Author: am11
Assignees: -
Labels:

area-System.IO, community-contribution

Milestone: -

Copy link
Member

@carlossanlop carlossanlop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good. I just have a question about the test.

TarEntry? entry = null;
while (true)
{
Exception ex = Record.Exception(() => entry = reader.GetNextEntry());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First time I see Record.Exception being used.

What advantage does it have over simply letting this throw?

Copy link
Member Author

@am11 am11 Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent of Record.Exception() followed by Assert.Null() is to be explicit about asserting that 'the method call is not throwing with this fix' (which were throwing without the fix). It also avoids seemingly unused statements to be optimized out by the JIT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're consuming or asserting something about entry, Name, and Type the jit cannot remove them, and you're implicitly testing that they don't throw.

Eg you can assert that Name is never null, and that if type is Directory then length is zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eg you could do more or less what the test code did and then you won't need to record the exception explicitly

var ms = new MemoryStream();               
Assert.NotEmpty(entry.Name);                     
Assert.True(Enum.IsDefined(entry.EntryType));               
Assert.True(Enum.IsDefined(entry.Format));         
if (entry.EntryType == TarEntryType.Directory)
    continue;
var ds = entry.DataStream;
                       
if (ds != null && ds.Length > 0)
{                            
    ds.CopyTo(ms);
}

@danmoseley
Copy link
Member

Cc @stephentoub

@carlossanlop
Copy link
Member

LGTM. Thanks for fixing this @am11.

CI failure was a cancellation due to timeout.

@danmoseley I'll prepare the backport manually.

@carlossanlop carlossanlop merged commit bfd9eb0 into dotnet:main Aug 22, 2022

ex = Record.Exception(() => entry.Length);
Assert.Null(ex);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read content into a MemoryStream perhaps, as I did in my test code?

TarEntry? entry = null;
while (true)
{
Exception ex = Record.Exception(() => entry = reader.GetNextEntry());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're consuming or asserting something about entry, Name, and Type the jit cannot remove them, and you're implicitly testing that they don't throw.

Eg you can assert that Name is never null, and that if type is Directory then length is zero.

TarEntry? entry = null;
while (true)
{
Exception ex = Record.Exception(() => entry = reader.GetNextEntry());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eg you could do more or less what the test code did and then you won't need to record the exception explicitly

var ms = new MemoryStream();               
Assert.NotEmpty(entry.Name);                     
Assert.True(Enum.IsDefined(entry.EntryType));               
Assert.True(Enum.IsDefined(entry.Format));         
if (entry.EntryType == TarEntryType.Directory)
    continue;
var ds = entry.DataStream;
                       
if (ds != null && ds.Length > 0)
{                            
    ds.CopyTo(ms);
}

@danmoseley
Copy link
Member

@carlossanlop my suggestion about backport is for us to fix the various issues first in main, then bring a single back port PR. It will be less confusing. Also because the code has diverged.

carlossanlop pushed a commit to carlossanlop/runtime that referenced this pull request Aug 23, 2022
carlossanlop pushed a commit to carlossanlop/runtime that referenced this pull request Aug 23, 2022
carlossanlop added a commit that referenced this pull request Aug 24, 2022
* Improve performance of Tar library (#74281)

* Avoid unnecessary byte[] allocations

* Remove unnecessary use of FileStreamOptions

* Clean up Dispose{Async} implementations

* Clean up unnecessary consts

Not a perf thing, just readability.

* Remove MemoryStream/Encoding.UTF8.GetBytes allocations, unnecessary async variants, and overhaul GenerateExtendedAttributesDataStream

* Avoid string allocations in ReadMagicAttribute

* Avoid allocation in WriteAsOctal

* Improve handling of octal

* Avoid allocation for version string

* Removing boxing and char string allocation in GenerateExtendedAttributeName

* Fix a couple unnecessary dictionary lookups

* Replace Enum.HasFlag usage

* Remove allocations from Write{Posix}Name

* Replace ArrayPool use with string.Create

* Replace more superfluous ArrayPool usage

* Remove ArrayPool use from System.IO.Compression.ZipFile

* Fix inverted condition

* Use generic math to parse octal

* Remove allocations from StringReader and string.Split

* Remove magic string allocation for Ustar when not V7

* Remove file name and directory name allocation in GenerateExtendedAttributeName

* fix tar strings (#74321)

* Fix some corner cases in TarReader (#74329)

* Fix a few Tar issues post perf improvements (#74338)

* Fix a few Tar issues post perf improvements

* Update src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs

* Skip directory symlink recursion on TarFile archive creation (#74376)

* Skip directory symlink recursion on TarFile archive creation

* Add symlink verification

* Address suggestions by danmoseley

Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>

* SkipBlockAlignmentPadding must be executed for all entries (#74396)

* Set modified timestamps on files being extracted from tar archives (#74400)

* Add tests for exotic external tar asset archives, fix some more corner case bugs (#74412)

* Remove unused _readFirstEntry. Remnant from before we created PaxGlobalExtendedAttributesEntry.

* Set the position of the freshly copied data stream to 0, so the first user access of the DataStream property gives them a stream ready to read from the beginning.

* Process a PAX actual entry's data block only after the extended attributes are analyzed, in case the size is found as an extended attribute and needs to be overriden.

* Add tests to verify the entries of the new external tar assets can be read. Verify their DataStream if available.

* Add copyData argument to recent alignment padding tests.

* Throw an exception sooner and with a clearer message when a data section is unexpected for the entry type.

* Allow trailing nulls and spaces in octal fields.
Co-authored-by: @am11 Adeel Mujahid <3840695+am11@users.noreply.github.com>

* Throw a clearer exception if the unsupported sparse file entry type is encountered. These entries have additional data that indicates the locations of sparse bytes, which cannot be read with just the size field. So to avoid accidentally offseting the reader, we throw.

* Tests.

* Rename to TrimLeadingNullsAndSpaces

Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>

* Remove Compression changes, keep changes confined to Tar.

* Fix build failure due to missing using in TarHelpers.Windows

Co-authored-by: Stephen Toub <stoub@microsoft.com>
Co-authored-by: Dan Moseley <danmose@microsoft.com>
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>
Co-authored-by: David Cantú <dacantu@microsoft.com>
@ghost ghost locked as resolved and limited conversation to collaborators Sep 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Formats.Tar community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants