Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Fast file enumeration for Windows #25426

Merged
merged 20 commits into from
Nov 30, 2017
Merged

Conversation

JeremyKuhne
Copy link
Member

Implements an optimized, low allocation replacement for file enumeration on Windows. In preliminary tests this improves performance at least two fold, particularly for recursive searches.

Removes agressive filter validation.

Copies and cleans up filename matching code from FileSystemWatcher.

Other notes:

  • Attempting to build the DosMatcher for Unix as well
  • Specific unit tests for DosMatcher forthcoming- existing tests cover it pretty well, however
  • Intent is to take the internal model and make it public and cross platform to allow custom filtering and transforms for enumeration
  • Also investigating exposing options for error and link handling
  • Intent is to make the Span combine code public as well (likely as part of Path)
  • Need to follow up on conversion of NTSTATUS errors- might have to do a fallback to GetFileInformationFromHandleEx if the conversion API isn't actually available on OneCore
  • Win7 doesn't have the right options for GetFileInformationFromHandleEx and WinRT can't use the NT API

/// Returns basic information (timestamps and attributes).
/// </summary>
/// <remarks>
/// Thunks to NtQueryInformationFile & FileBasicInformation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

& needs to be escaped in xml as &amp; (applies throughout these xml doc comments)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing


namespace System.IO
{
internal partial class FindEnumerable<T> : IEnumerable<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason this couldn't implement both IEnumerable<T> and IEnumerator<T>, and return itself from GetEnumerator in the common case to avoid the enumerator allocation, like Win32FileSystemEnumerableIterator<TSource> before this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of being friendly to Linq queries being run multiple times. I'll think it through some more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should still support being able to call GetEnumerator multiple times... in such cases, you'd create a new instance with initial state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should still support being able to call GetEnumerator multiple times... in such cases, you'd create a new instance with initial state.

Not sure I follow. How would that save an allocation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at Iterator<TSource> (which is the base class of Win32FileSystemEnumerableIterator<TSource>) :

Iterator<TSource> implements both IEnumerable<T> and IEnumerator<T>.

Here's the GetEnumerator implementation:

public IEnumerator<TSource> GetEnumerator()
{
if (state == 0 && _threadId == Environment.CurrentManagedThreadId)
{
state = 1;
return this;
}
Iterator<TSource> duplicate = Clone();
duplicate.state = 1;
return duplicate;
}

The common case when using foreach or a LINQ query is to get the enumerable and then immediately call GetEnumerator to get the enumerator to start enumerating through the enumerable (e.g. foreach (string file in Directory.EnumerateFiles(...)) { ... }). If GetEnumerator hasn't already been called and it's the same thread, it can simply return itself. Otherwise, a new instance is created.

This is also how iterators work in C# when you use yield return. Similar iterator code is emitted by the compiler to use the same instance for both the enumerable and enumerator, only allocating a new instance if GetEnumerator if necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough- I'll look at utilizing Iterator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether or not Iterator is used as a base class, the same approach of reusing the enumerable object as the enumerator object for at least the initial enumeration should definitely be done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the change now.

int* priorStack = stackalloc int[16];
int* currentStack = stackalloc int[16];
Span<int> priorMatches = new Span<int>(priorStack, 16);
Span<int> currentMatches = new Span<int>(currentStack, 16);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason these can't use the recently added C# stackalloc/span support, e.g.

Span<int> priorMatches = stackalloc int[16];
Span<int> currentMatches = stackalloc int[16];

at which point, it looks like this method would no longer need to be unsafe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't aware of it. Tried switching and it wouldn't let me do the swap between the two anymore (at the end of the method).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried switching and it wouldn't let me do the swap between the two anymore (at the end of the method).

@JeremyKuhne, what error do you get?
@VSadov?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"P:\repos\corefx\src\System.IO.FileSystem\src\System.IO.FileSystem.csproj" (default target) (1) ->
(CoreCompile target) ->
  System\IO\DosMatcher.cs(298,24): error CS8352: Cannot use local 'priorMatches' in this context because it may expose referenced variables outside of their declaration scope [P:\repos
\corefx\src\System.IO.FileSystem\src\System.IO.FileSystem.csproj]


internal partial class Interop
{
public struct BOOLEAN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is nested within an internal type, so it will never actually be public, but I'm curious: should these nested types be public or internal? The visibility of Interop types in this PR are inconsistent. Is there an existing convention?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal is better- I'll fix it

[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static bool EqualsOrdinal(ReadOnlySpan<char> first, ReadOnlySpan<char> second, bool ignoreCase = false)
{
return first.Length != second.Length ? false : CompareOrdinal(first, second, ignoreCase) == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return first.Length == second.Length && CompareOrdinal(first, second, ignoreCase) == 0;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

/// </remarks>
public unsafe static bool MatchPattern(string expression, ReadOnlySpan<char> name, bool ignoreCase = true)
{
string debug = new string(name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this intended to remain?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No :) Thanks!

private string _originalUserPath;
private bool _recursive;
private FindTransform<T> _transform;
private FindPredicate _predicate;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can any of these fields be readonly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll double check.

&& DosMatcher.MatchPattern(expression, findData.FileName, ignoreCase: true);
}

return new FindEnumerable<string>(directory, recursive, FindTransforms.AsUserFullPath, predicate);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing FindTransforms.AsUserFullPath and predicate like this is going to result in two delegate allocations each time UserFiles is called. Same for the methods below. It'd be nice if these delegate instances were lazily allocated and cached/reused.

You could attempt to address this by writing it as:

    	FindTransform<string> transform = (ref RawFindData findData) =>
        {
            ReadOnlySpan<char> subdirectory = findData.Directory.AsReadOnlySpan().Slice(findData.OriginalDirectory.Length);
            return PathHelpers.CombineNoChecks(findData.OriginalUserDirectory, subdirectory, findData.FileName);
        };
        
        FindPredicate predicate = (ref RawFindData findData) =>
        {
            return FindPredicates.NotDotOrDotDot(ref findData)
                && !FindPredicates.IsDirectory(ref findData)
                && DosMatcher.MatchPattern(expression, findData.FileName, ignoreCase: true);
        };
    
    	return new FindEnumerable<string>(directory, recursive, transform, predicate);

The C# compiler will lazily allocate and cache/reuse the transform delegate, but predicate is problematic as written above as it captures expression resulting in a closure allocation.

To avoid that, you'd have to change the FindPredicate signature to also take a string expression parameter, and then pass the expression to the FindEnumerable constructor so it can pass it along whenever it calls predicate, e.g.:

internal delegate bool FindPredicate(string expression, ref RawFindData findData);
        FindPredicate predicate = (string expr, ref RawFindData findData) =>
        {
            return FindPredicates.NotDotOrDotDot(ref findData)
                && !FindPredicates.IsDirectory(ref findData)
                && DosMatcher.MatchPattern(expr, findData.FileName, ignoreCase: true);
        };

        return new FindEnumerable<string>(directory, expression, recursive, transform, predicate);
        _predicate(_expression, ref findData);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do the first part for sure. Redefining the predicate delegate is problematic I think. Not all predicates need the extra parameter and future ones will need different arguments (think all files modified after some DateTime).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all predicates need the extra parameter and future ones will need different arguments

Can FindPredicate take a generic TState that can be used to pass any extra info needed?

Alternatively, instead of using delegates for the predicate and transform you could have abstract/virtual Transform/Predicate methods on the enumerable class, and have sealed subclasses for each type of enumeration.

Interop.Kernel32.CloseHandle(queue.Dequeue().Handle);
}

Interop.Kernel32.CloseHandle(_directoryHandle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to call CloseHandle multiple times if Dispose is called multiple times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move it up to the buffer interlock to be safe about it, thanks!

if (second.Length == 0)
return first;

string result = CombineNoChecksInternal(first.AsReadOnlySpan(), second);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string result is unnecessary. (multiple places)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks- missed taking this out. Debugging through the AsReadOnlySpan() kept randomly terminating the debugger (under the latest public release of VS 2017) so I temporarily put this in. @stephentoub, are you aware of any issues like this with debugging Span? I'm not sure where to begin looking for more details when the debugger terminates unexpectedly...


internal partial class Interop
{
internal struct BOOLEAN
Copy link
Member

@jkotas jkotas Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that

    internal enum BOOLEAN : byte
    {
        FALSE = 0,
        TRUE = 1,
    }

should be sufficient given the limited use. Similar how Interop.BOOL.cs is done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition I have deals with non 0/1 values can be used just like a bool. What do we gain by making it an enum other than a simpler definition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will get a simpler definition that was my point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to address this? We are inconsistent today - both Interop.BOOL.cs and Interop.BOOLEAN.cs should use the same style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to address this? We are inconsistent today - both Interop.BOOL.cs and Interop.BOOLEAN.cs should use the same style.

As both you and Stephen like the enum version I'll switch it to that format.


int totalLength = first.Length + second.Length + (hasSeparator ? 0 : 1);
string fullPath = new string('\0', totalLength);
fixed (char* f = fullPath)
Copy link
Member

@jkotas jkotas Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sight...
I am wondering whether the new string stringCreate<TState>(int length, TState state, System.Buffers.SpanAction<char, TState> action) API is good since you cannot really use it here.
cc @stephentoub

Copy link
Member

@stephentoub stephentoub Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you cannot really use it here

Because you can't pass spans in generics? We're going to have that problem in a bunch of places and is something we should try to solve more generally, e.g. a "ref struct" constraint and then an overload that accepts a delegate that takes such a constrained arg (and maybe have a built-in value tuple constrained to ref structs).

You can use the overload here, though, it just means pinning the input spans instead of the destination string, e.g.

using (char* firstPtr = &first.DangerousGetPinnableReference(), secondPtr = &second.DangerousGetPinnableReference())
{
    return string.Create(totalLength, ((IntPtr)firstPtr, first.Length, (IntPtr)secondPtr, second.Length, hasSeparator), (dst, state) =>
    {
        new ReadOnlySpan<char>((char*)state.Item1, state.Item2).CopyTo(dst);
        if (!state.Item5) dst[state.Item2] = Path.DirectorySeparatorChar;
        new ReadOnlyspan<char>((char*)state.Item3, state.Item4).CopyTo(dst.Slice(state.Item2 + (state.Item5 ? 0 : 1));
    });
}

etc. And tuple arg names could be used to avoid those ItemXs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try using string.Create

{
internal static partial class CharSpanExtensions
{
[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding AggressiveInlining inlining that leads to codebloat, you should avoid the need for marshaling in CompareStringOrdinal (use char* and BOOL in the signature) and do the marshaling yourself (using fixed, etc.). It will be smaller and faster code that way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow exactly. Do fully blittable arguments (when they're simple types) make that much of a difference in imports? We've been encouraging using ref char in other places to avoid needing to use fixed... @stephentoub I recall you suggesting ref char for Span- I might be remembering incorrectly. I want to make sure we're clear on guidance here.

The results and error handling aren't exactly straight forward so I think having a wrapper method for usage is appropriate. (I also think this is a method we'd want to ultimately have public). I'm fine with removing the AggressiveInlining if the advantage is trivial.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've encouraged using ref instead of pointers to keep the code simpler and minimize unsafe pointer usage that we could mess up. If specific cases benefit from using pointers instead for perf, that's fine.

/// Equivalent to <a href="https://msdn.microsoft.com/en-us/library/windows/desktop/hh447298.aspx">FILE_FULL_DIR_INFO</a> structure.
/// </summary>
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public unsafe struct FILE_FULL_DIR_INFORMATION
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's equivalent to FILE_FULL_DIR_INFO, shouldn't it be called FILE_FULL_DIR_INFO rather than FILE_FULL_DIR_INFORMATION?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same struct is defined twice in Windows under those two names. Given that I had to pick one I went with the "real" one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then shouldn't the comment match? If there are two equally valid names, it's fine to pick one, but we should pick one ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does- the summary calls out the actual name we used first. I also list the second definition as it isn't inherently obvious and we use it in both Win32 and NT API imports.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine

@@ -0,0 +1 @@
advapi32.dll!LsaNtStatusToWinError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok that this is being exempted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure- I called it out in my PR comments. I'm following up. If I have to I'll add logic to fall back to GetFileInformationByHandleEx when it isn't available. Win7 doesn't have the support we need in said API and it is slightly less efficient so we'd prefer to use the NtQuery API instead. It is sort of strange that they might not expose the ability to convert NTSTATUS to windows errors, given that they expose NTSTATUS. I'm trying to confirm that this API doesn't exist or if there is another entry point that gets to the same underlying API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is legalized for UWP (and it's not a new API) so it should be fine to use it on Nano, etc: our OneCore baseline just isn't updated to include it. I've reached out to understand exactly. This exclusion is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into it!

return false;

if (ignoreCase == true)
return first.SequenceEqual(second);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the logic here. Shouldn't this be if (!ignoreCase)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my mistake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also ==true and ==false are ugly in general.

{
internal static partial class CharSpanExtensions
{
internal static unsafe bool EqualsOrdinal(ReadOnlySpan<char> first, ReadOnlySpan<char> second, bool ignoreCase = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's totally in the plan to add such APIs. We just need to find some cycles to do it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Just wanted to highlight the proliferation :)

{
if (value.Length == 0)
return true;
else if (span.Length == 0 || value.Length > span.Length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the first part of the condition here? Couldn't it just be:

else if (value.Length > span.Length)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks


// Want to throw the original argument name
OriginalPath = originalPath ?? throw new ArgumentNullException("path");
FullPath = isNormalized ? fullPath ?? originalPath : Path.GetFullPath(fullPath ?? originalPath);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about storing fullPath ?? originalPath into a local? It's used three times in the above three lines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

{
for (int i = 0; i < length; i++)
{
switch (c[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using unsafe code here? I would expect the JIT would do a very good job with:

foreach (char c in expression)

or

for (int i = 0; i < expression.Length; i++)
{
    char c = expression[i];
    ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough- I'll trust the JIT :)


if (expression.IndexOfAny(s_wildcardChars, startIndex: 1) == -1)
{
// Handle the special case of a single starting *, which essentially means "ends with"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra space between single and starting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

_directory = Path.GetFullPath(directory);
_recursive = recursive;
_predicate = predicate ?? throw new ArgumentNullException(nameof(predicate));
_transform = transform ?? throw new ArgumentNullException(nameof(transform));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand... if these aren't allowed to be null, why are they optional arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops- artifact of earlier iteration. Fixing.

if (buffer != null)
ArrayPool<byte>.Shared.Return(buffer);

var queue = Interlocked.Exchange(ref _pending, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: var should be replaced by the actual type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I can't afaik. It doesn't allow me to use .Handle without redefining.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate? What code did you write and what error did you get? You should be able to just write:

Queue<(IntPtr Handle, string Path)> queue = Interlocked.Exchange(ref _pending, null);

That doesn't work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does, you just have to redefine the names again. With var you don't.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With var you don't.

But with var as I'm reading the code I have no idea what this thing is.

}

protected void Dispose(bool disposing)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the GCHandle freed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Fixed.

{
return FindPredicates.NotDotOrDotDot(ref findData)
&& !FindPredicates.IsDirectory(ref findData)
&& DosMatcher.MatchPattern(expression, findData.FileName, ignoreCase: true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in a closure allocation to capture expression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixing.

{
return FindPredicates.NotDotOrDotDot(ref findData)
&& FindPredicates.IsDirectory(ref findData)
&& DosMatcher.MatchPattern(expression, findData.FileName, ignoreCase: true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in a closure allocation to capture expression.

bool predicate(ref RawFindData findData)
{
return FindPredicates.NotDotOrDotDot(ref findData)
&& DosMatcher.MatchPattern(expression, findData.FileName, ignoreCase: true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in a closure allocation to capture expression.

{
return FindPredicates.NotDotOrDotDot(ref findData)
&& !FindPredicates.IsDirectory(ref findData)
&& DosMatcher.MatchPattern(expression, findData.FileName, ignoreCase: true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same; I'll stop commenting on these.

{
FileAttributes attributes = findData.Attributes;
return attributes != (FileAttributes)(-1)
&& (findData.Attributes & FileAttributes.Directory) != 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findData.Attributes was just stored into a local... the local should be used instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks- fixed


// Historically we always treated "." as "*"
if (string.IsNullOrEmpty(expression)
|| expression == "." || expression == "*.*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: pull this up to the previous line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


namespace System.IO
{
internal static partial class FindPredicates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: To me personally, calling this FindPredicates made me though that it would have members that were somehow directly related via the type system to FindPredicate, but that's not the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open to suggestions here.

return tempSearchPattern;
// In situations where this method is invoked, "<DriveLetter>:" should be special-cased
// to instead go to the current directory.
return path != null && path.Length == 2 && path[1] == ':';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check what path[0] is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check I'm not thrilled about in general. Not sure what the original intent was. Adding the check wouldn't do much other than make !: not "work" when it does now. I'll open an issue to track this weird behavior.

internal static string TrimEndingDirectorySeparator(string path) =>
EndsInDirectorySeparator(path) ?
path.Substring(0, path.Length - 1) :
path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this work in terms of spans to avoid the substring? Or the caller really needs a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can when we add the Span overloads to Path.

}

public override IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string fullPath, string searchPattern, SearchOption searchOption, SearchTarget searchTarget)
{
FindEnumerableFactory.NormalizeInputs(ref fullPath, ref searchPattern);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would appear to change what exception gets thrown if both searchTarget wasn't a known enum value and if searchPattern was bad. Previously we would throw for the searchTarget, no we'll throw for the searchPattern. We may not care about that ordering, but wanted to call it out, and if no tests failed for it, we might have a tiny test hole to consider patching.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing it. I removed asserts, I believe this is still checked at all of the entry points.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing it

Let's assume you had both an invalid searchPattern and an invalid searchExpression. Previously, you'd enter the method, it would jump to the default in the switch, and it would throw an expression for the searchPattern. Now, the first thing it does is call NormalizeInputs, which will first check searchExpression, and if it's invalid, it'll throw an exception for searchExpression. I'm pointing out that we're throwing a different error now with this change. Whether that matters, eh, up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Derp- I was looking at SearchOption, sorry. In this case I think I prefer the order change and don't think it is risky.

@@ -652,19 +784,32 @@ public void SearchPatternLongPath()

[Fact]
[PlatformSpecific(TestPlatforms.Windows)] // Search pattern with double dots throws ArgumentException
public void WindowsSearchPatternWithDoubleDots()
[SkipOnTargetFramework(~TargetFrameworkMonikers.NetFramework)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're skipping on everything but .NET Framework, we probably don't need the PlatformSpecific attribute, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@@ -652,19 +784,32 @@ public void SearchPatternLongPath()

[Fact]
[PlatformSpecific(TestPlatforms.Windows)] // Search pattern with double dots throws ArgumentException
public void WindowsSearchPatternWithDoubleDots()
[SkipOnTargetFramework(~TargetFrameworkMonikers.NetFramework)]
public void WindowsSearchPatternWithDoubleDots_Desktop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we consider porting these changes back to netfx? What would we do there... quirk it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we'll quirk. And this will only work in full trust.

{
Assert.Throws<ArgumentException>(() => GetEntries(TestDirectory, Path.Combine("..ab ab.. .. abc..d", "abc..")));
Assert.Throws<ArgumentException>(() => GetEntries(TestDirectory, ".."));
Assert.Throws<ArgumentException>(() => GetEntries(TestDirectory, @".." + Path.DirectorySeparatorChar));
}

[Fact]
[PlatformSpecific(TestPlatforms.Windows)] // Search pattern with double dots throws ArgumentException
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we throw for patterns with double dots, but only on .NET Core on Windows, not on .NET Framework and not on .NET Core on Unix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'll remove it from Unix as well, forgot.

private static char[] OldWildcards = new char[] { '*', '?' };
private static char[] NewWildcards = new char[] { '<', '>', '\"' };

[Fact]
[PlatformSpecific(TestPlatforms.Windows)] // Windows-invalid search patterns throw
public void WindowsSearchPatternInvalid()
[SkipOnTargetFramework(~TargetFrameworkMonikers.NetFramework)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that we've made something which didn't use to work and threw an exception now instead work? So we still throw on netfx and .NET Core on Unix but not on Windows? Will this be made to work on Unix as well for consistency, and then presumably ported to netfx for consistency there as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I missed removing this from Unix. We'll port to NetFX in full trust- this was a limited trust based block.


namespace System.IO.Tests
{
public class Regressions : FileSystemTest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to call these out as regression tests. In general we add lots of tests to fill gaps where we previously had test gaps, and how we discover such gaps really doesn't matter. In a year's time it won't really matter whether we added this test because we spotted it as a regression or not. I would just add this to whatever existing test class is most closely aligned to what's being tested, e.g. to our test class for enumerators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will change.

_enumerator = Directory.EnumerateFiles(directory).GetEnumerator();
Race(_enumerator);
}
source.Cancel();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually stopping the tasks if they've already started running, so this test is leaking two tasks that run until the end of the process. Passing a token to the Task ctor only cancels the task if the task hasn't yet started executing when the cancellation request arrives; if it's already started executing, there's nothing to cancel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixing.

Add length setter for ValueStringBuilder.
Remove Unix .. check for enumerable.
_enumerator = Directory.EnumerateFiles(directory).GetEnumerator();
Enumerate(_enumerator);
}
source.Cancel();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not waiting on the tasks you scheduled here, so you'll never know if any exceptions occurred in those tasks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks

if (x != null)
Enumerate(x);
} while (!token.IsCancellationRequested);
token.ThrowIfCancellationRequested();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ThrowIfCancellationRequested isn't needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Enumerate(_enumerator);
}
source.Cancel();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to do:

Task t1 = Task.Run(Work);
Task t2 = Task.Run(Work);
...
source.Cancel();
Task.WaitAll(t1, t2);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized this as well. Thanks :)

for (int i = 0; i < 1000; i++)
{
_enumerator = Directory.EnumerateFiles(directory).GetEnumerator();
Enumerate(_enumerator);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this throws, the source will never be canceled, and the tasks you've scheduled will loop infinitely until the process ends.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

{
throw;
}
catch (Exception)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. Another approach would be:

catch (Exception e) when (!(e is IOException)) { }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update it as I'm updating VSB.

{
Grow(value - _chars.Length);
}
_pos = value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks potentially buggy. Growing the builder might end up getting an array from the pool, which might contain garbage data, but we're updating the position to indicate that it's valid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringBuilder appends nulls, I'll do that.

@JeremyKuhne
Copy link
Member Author

@dotnet-bot test Windows x64 Debug Build

2017-11-30 23:10:28,714: INFO: proc(54): run_and_log_output: Output:       Condition(s) not met: \"IsRootCertificateInstalled\"
2017-11-30 23:10:41,105: INFO: proc(54): run_and_log_output: Output: FailFast: AcquireCredentialsHandle failed.
2017-11-30 23:10:41,105: INFO: proc(54): run_and_log_output: Output: System.ComponentModel.Win32Exception (0x80004005): The credentials supplied to the package were not recognized
2017-11-30 23:10:41,105: INFO: proc(54): run_and_log_output: Output:    at System.Net.SSPIWrapper.AcquireCredentialsHandle(SSPIInterface secModule, String package, CredentialUse intent, SCHANNEL_CRED scc) in D:\j\workspace\windows-TGrou---74aa877a\src\Common\src\Interop\Windows\sspicli\SSPIWrapper.cs:line 170
2017-11-30 23:10:41,105: INFO: proc(54): run_and_log_output: Output:    at System.Net.Security.SslStreamPal.<>c__DisplayClass23_0.<AcquireCredentialsHandle>b__0() in D:\j\workspace\windows-TGrou---74aa877a\src\System.Net.Security\src\System\Net\Security\SslStreamPal.Windows.cs:line 465

<SNIP>

2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at System.Net.Security.Tests.ServerAsyncAuthenticateTest+<ServerAsyncSslHelper>d__11.MoveNext()
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at Xunit.Sdk.AsyncTestSyncContext+<>c__DisplayClass7_0.<Post>b__1(System.Object)
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at Xunit.Sdk.MaxConcurrencySyncContext.RunOnSyncContext(System.Threading.SendOrPostCallback, System.Object)
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at Xunit.Sdk.MaxConcurrencySyncContext.WorkerThreadProc()
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at Xunit.Sdk.XunitWorkerThread+<>c.<QueueUserWorkItem>b__5_0(System.Object)
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
2017-11-30 23:10:41,121: INFO: proc(54): run_and_log_output: Output:    at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
2017-11-30 23:10:44,667: INFO: proc(54): run_and_log_output: Output: Finished running tests.  End time=23:10:44.66, Exit code = -2146232797
2017-11-30 23:10:44,667: INFO: proc(58): run_and_log_output: Exit Code: -2146232797
2017-11-30 23:10:44,667: ERROR: scriptrunner(87): _main: Error: No exception thrown, but XUnit results not created
2017-11-30 23:10:44,667: ERROR: helix_test_execution(83): report_error: Error running xunit None

@JeremyKuhne JeremyKuhne merged commit eb0d438 into dotnet:master Nov 30, 2017
@JeremyKuhne JeremyKuhne deleted the fastfind2 branch November 30, 2017 23:47
@karelz karelz added this to the 2.1.0 milestone Dec 4, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Fast file enumeration for Windows

Implements an optimized, low allocation replacement for file enumeration on Windows. In preliminary tests this improves performance at least two fold, particularly for recursive searches.
Removes aggressive filter validation.
Copies and cleans up filename matching code from FileSystemWatcher.
Add length setter for ValueStringBuilder.
Remove Unix .. check for enumerable.

Commit migrated from dotnet/corefx@eb0d438
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants