Skip to content

Commit

Permalink
[release/7.0] Backport tar bug fixes and improvements (#74449)
Browse files Browse the repository at this point in the history
* Improve performance of Tar library (#74281)

* Avoid unnecessary byte[] allocations

* Remove unnecessary use of FileStreamOptions

* Clean up Dispose{Async} implementations

* Clean up unnecessary consts

Not a perf thing, just readability.

* Remove MemoryStream/Encoding.UTF8.GetBytes allocations, unnecessary async variants, and overhaul GenerateExtendedAttributesDataStream

* Avoid string allocations in ReadMagicAttribute

* Avoid allocation in WriteAsOctal

* Improve handling of octal

* Avoid allocation for version string

* Removing boxing and char string allocation in GenerateExtendedAttributeName

* Fix a couple unnecessary dictionary lookups

* Replace Enum.HasFlag usage

* Remove allocations from Write{Posix}Name

* Replace ArrayPool use with string.Create

* Replace more superfluous ArrayPool usage

* Remove ArrayPool use from System.IO.Compression.ZipFile

* Fix inverted condition

* Use generic math to parse octal

* Remove allocations from StringReader and string.Split

* Remove magic string allocation for Ustar when not V7

* Remove file name and directory name allocation in GenerateExtendedAttributeName

* fix tar strings (#74321)

* Fix some corner cases in TarReader (#74329)

* Fix a few Tar issues post perf improvements (#74338)

* Fix a few Tar issues post perf improvements

* Update src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarHeader.Write.cs

* Skip directory symlink recursion on TarFile archive creation (#74376)

* Skip directory symlink recursion on TarFile archive creation

* Add symlink verification

* Address suggestions by danmoseley

Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>

* SkipBlockAlignmentPadding must be executed for all entries (#74396)

* Set modified timestamps on files being extracted from tar archives (#74400)

* Add tests for exotic external tar asset archives, fix some more corner case bugs (#74412)

* Remove unused _readFirstEntry. Remnant from before we created PaxGlobalExtendedAttributesEntry.

* Set the position of the freshly copied data stream to 0, so the first user access of the DataStream property gives them a stream ready to read from the beginning.

* Process a PAX actual entry's data block only after the extended attributes are analyzed, in case the size is found as an extended attribute and needs to be overriden.

* Add tests to verify the entries of the new external tar assets can be read. Verify their DataStream if available.

* Add copyData argument to recent alignment padding tests.

* Throw an exception sooner and with a clearer message when a data section is unexpected for the entry type.

* Allow trailing nulls and spaces in octal fields.
Co-authored-by: @am11 Adeel Mujahid <3840695+am11@users.noreply.github.com>

* Throw a clearer exception if the unsupported sparse file entry type is encountered. These entries have additional data that indicates the locations of sparse bytes, which cannot be read with just the size field. So to avoid accidentally offseting the reader, we throw.

* Tests.

* Rename to TrimLeadingNullsAndSpaces

Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>

* Remove Compression changes, keep changes confined to Tar.

* Fix build failure due to missing using in TarHelpers.Windows

Co-authored-by: Stephen Toub <stoub@microsoft.com>
Co-authored-by: Dan Moseley <danmose@microsoft.com>
Co-authored-by: Adeel Mujahid <3840695+am11@users.noreply.github.com>
Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>
Co-authored-by: David Cantú <dacantu@microsoft.com>
  • Loading branch information
6 people authored Aug 24, 2022
1 parent 4965a24 commit 35e86f9
Show file tree
Hide file tree
Showing 24 changed files with 1,628 additions and 567 deletions.
6 changes: 6 additions & 0 deletions src/libraries/System.Formats.Tar/src/Resources/Strings.resx
Original file line number Diff line number Diff line change
Expand Up @@ -205,6 +205,9 @@
<value>The entry is a symbolic link or a hard link but the LinkName field is null or empty.</value>
</data>
<data name="TarEntryTypeNotSupported" xml:space="preserve">
<value>Entry type '{0}' not supported.</value>
</data>
<data name="TarEntryTypeNotSupportedInFormat" xml:space="preserve">
<value>Entry type '{0}' not supported in format '{1}'.</value>
</data>
<data name="TarEntryTypeNotSupportedForExtracting" xml:space="preserve">
Expand Down Expand Up @@ -255,4 +258,7 @@
<data name="IO_SeekBeforeBegin" xml:space="preserve">
<value>An attempt was made to move the position before the beginning of the stream.</value>
</data>
<data name="TarInvalidNumber" xml:space="preserve">
<value>Unable to parse number.</value>
</data>
</root>
Original file line number Diff line number Diff line change
Expand Up @@ -188,10 +188,7 @@ public override Task FlushAsync(CancellationToken cancellationToken) =>
// the substream is just 'a chunk' of the super-stream
protected override void Dispose(bool disposing)
{
if (disposing && !_isDisposed)
{
_isDisposed = true;
}
_isDisposed = true;
base.Dispose(disposing);
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -465,7 +465,7 @@ private void VerifyPathsForEntryType(string filePath, string? linkTargetPath, bo
// If the destination contains a directory segment, need to check that it exists
if (!string.IsNullOrEmpty(directoryPath) && !Path.Exists(directoryPath))
{
throw new IOException(string.Format(SR.IO_PathNotFound_NoPathName, filePath));
throw new IOException(string.Format(SR.IO_PathNotFound_Path, filePath));
}

if (!Path.Exists(filePath))
Expand Down Expand Up @@ -529,7 +529,7 @@ private void ExtractAsRegularFile(string destinationFileName)
DataStream?.CopyTo(fs);
}

ArchivingUtils.AttemptSetLastWriteTime(destinationFileName, ModificationTime);
AttemptSetLastWriteTime(destinationFileName, ModificationTime);
}

// Asynchronously extracts the current entry as a regular file into the specified destination.
Expand All @@ -551,7 +551,19 @@ private async Task ExtractAsRegularFileAsync(string destinationFileName, Cancell
}
}

ArchivingUtils.AttemptSetLastWriteTime(destinationFileName, ModificationTime);
AttemptSetLastWriteTime(destinationFileName, ModificationTime);
}

private static void AttemptSetLastWriteTime(string destinationFileName, DateTimeOffset lastWriteTime)
{
try
{
File.SetLastWriteTime(destinationFileName, lastWriteTime.LocalDateTime); // SetLastWriteTime expects local time
}
catch
{
// Some OSes like Android might not support setting the last write time, the extraction should not fail because of that
}
}

private FileStreamOptions CreateFileStreamOptions(bool isAsync)
Expand Down
91 changes: 50 additions & 41 deletions src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarFile.cs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.IO.Enumeration;
using System.Threading;
using System.Threading.Tasks;

Expand All @@ -15,11 +16,6 @@ namespace System.Formats.Tar
/// </summary>
public static class TarFile
{
// Windows' MaxPath (260) is used as an arbitrary default capacity, as it is likely
// to be greater than the length of typical entry names from the file system, even
// on non-Windows platforms. The capacity will be increased, if needed.
private const int DefaultCapacity = 260;

/// <summary>
/// Creates a tar stream that contains all the filesystem entries from the specified directory.
/// </summary>
Expand Down Expand Up @@ -222,7 +218,7 @@ public static void ExtractToDirectory(string sourceFileName, string destinationD

if (!File.Exists(sourceFileName))
{
throw new FileNotFoundException(string.Format(SR.IO_FileNotFound, sourceFileName));
throw new FileNotFoundException(string.Format(SR.IO_FileNotFound_FileName, sourceFileName));
}

if (!Directory.Exists(destinationDirectoryName))
Expand Down Expand Up @@ -261,7 +257,7 @@ public static Task ExtractToDirectoryAsync(string sourceFileName, string destina

if (!File.Exists(sourceFileName))
{
return Task.FromException(new FileNotFoundException(string.Format(SR.IO_FileNotFound, sourceFileName)));
return Task.FromException(new FileNotFoundException(string.Format(SR.IO_FileNotFound_FileName, sourceFileName)));
}

if (!Directory.Exists(destinationDirectoryName))
Expand All @@ -283,23 +279,22 @@ private static void CreateFromDirectoryInternal(string sourceDirectoryName, Stre
DirectoryInfo di = new(sourceDirectoryName);
string basePath = GetBasePathForCreateFromDirectory(di, includeBaseDirectory);

char[] entryNameBuffer = ArrayPool<char>.Shared.Rent(DefaultCapacity);

try
bool skipBaseDirRecursion = false;
if (includeBaseDirectory)
{
if (includeBaseDirectory)
{
writer.WriteEntry(di.FullName, GetEntryNameForBaseDirectory(di.Name, ref entryNameBuffer));
}
writer.WriteEntry(di.FullName, GetEntryNameForBaseDirectory(di.Name));
skipBaseDirRecursion = (di.Attributes & FileAttributes.ReparsePoint) != 0;
}

foreach (FileSystemInfo file in di.EnumerateFileSystemInfos("*", SearchOption.AllDirectories))
{
writer.WriteEntry(file.FullName, GetEntryNameForFileSystemInfo(file, basePath.Length, ref entryNameBuffer));
}
if (skipBaseDirRecursion)
{
// The base directory is a symlink, do not recurse into it
return;
}
finally

foreach (FileSystemInfo file in GetFileSystemEnumerationForCreation(sourceDirectoryName))
{
ArrayPool<char>.Shared.Return(entryNameBuffer);
writer.WriteEntry(file.FullName, GetEntryNameForFileSystemInfo(file, basePath.Length));
}
}
}
Expand Down Expand Up @@ -339,44 +334,58 @@ private static async Task CreateFromDirectoryInternalAsync(string sourceDirector
DirectoryInfo di = new(sourceDirectoryName);
string basePath = GetBasePathForCreateFromDirectory(di, includeBaseDirectory);

char[] entryNameBuffer = ArrayPool<char>.Shared.Rent(DefaultCapacity);

try
bool skipBaseDirRecursion = false;
if (includeBaseDirectory)
{
if (includeBaseDirectory)
{
await writer.WriteEntryAsync(di.FullName, GetEntryNameForBaseDirectory(di.Name, ref entryNameBuffer), cancellationToken).ConfigureAwait(false);
}
await writer.WriteEntryAsync(di.FullName, GetEntryNameForBaseDirectory(di.Name), cancellationToken).ConfigureAwait(false);
skipBaseDirRecursion = (di.Attributes & FileAttributes.ReparsePoint) != 0;
}

foreach (FileSystemInfo file in di.EnumerateFileSystemInfos("*", SearchOption.AllDirectories))
{
await writer.WriteEntryAsync(file.FullName, GetEntryNameForFileSystemInfo(file, basePath.Length, ref entryNameBuffer), cancellationToken).ConfigureAwait(false);
}
if (skipBaseDirRecursion)
{
// The base directory is a symlink, do not recurse into it
return;
}
finally

foreach (FileSystemInfo file in GetFileSystemEnumerationForCreation(sourceDirectoryName))
{
ArrayPool<char>.Shared.Return(entryNameBuffer);
await writer.WriteEntryAsync(file.FullName, GetEntryNameForFileSystemInfo(file, basePath.Length), cancellationToken).ConfigureAwait(false);
}
}
}

// Generates a recursive enumeration of the filesystem entries inside the specified source directory, while
// making sure that directory symlinks do not get recursed.
private static IEnumerable<FileSystemInfo> GetFileSystemEnumerationForCreation(string sourceDirectoryName)
{
return new FileSystemEnumerable<FileSystemInfo>(
directory: sourceDirectoryName,
transform: (ref FileSystemEntry entry) => entry.ToFileSystemInfo(),
options: new EnumerationOptions()
{
RecurseSubdirectories = true
})
{
ShouldRecursePredicate = IsNotADirectorySymlink
};

static bool IsNotADirectorySymlink(ref FileSystemEntry entry) => entry.IsDirectory && (entry.Attributes & FileAttributes.ReparsePoint) == 0;
}

// Determines what should be the base path for all the entries when creating an archive.
private static string GetBasePathForCreateFromDirectory(DirectoryInfo di, bool includeBaseDirectory) =>
includeBaseDirectory && di.Parent != null ? di.Parent.FullName : di.FullName;

// Constructs the entry name used for a filesystem entry when creating an archive.
private static string GetEntryNameForFileSystemInfo(FileSystemInfo file, int basePathLength, ref char[] entryNameBuffer)
private static string GetEntryNameForFileSystemInfo(FileSystemInfo file, int basePathLength)
{
int entryNameLength = file.FullName.Length - basePathLength;
Debug.Assert(entryNameLength > 0);

bool isDirectory = file.Attributes.HasFlag(FileAttributes.Directory);
return ArchivingUtils.EntryFromPath(file.FullName, basePathLength, entryNameLength, ref entryNameBuffer, appendPathSeparator: isDirectory);
bool isDirectory = (file.Attributes & FileAttributes.Directory) != 0;
return TarHelpers.EntryFromPath(file.FullName.AsSpan(basePathLength), appendPathSeparator: isDirectory);
}

private static string GetEntryNameForBaseDirectory(string name, ref char[] entryNameBuffer)
private static string GetEntryNameForBaseDirectory(string name)
{
return ArchivingUtils.EntryFromPath(name, 0, name.Length, ref entryNameBuffer, appendPathSeparator: true);
return TarHelpers.EntryFromPath(name, appendPathSeparator: true);
}

// Extracts an archive into the specified directory.
Expand Down
Loading

0 comments on commit 35e86f9

Please sign in to comment.