
Use FindFirstFileEx(FIND_FIRST_EX_LARGE_FETCH) instead of FindFirstFile()? #1550

Closed

mehrdadn opened this issue Mar 12, 2018 · 14 comments

mehrdadn commented Mar 12, 2018

Edit: Apologies, never mind; it seems this doesn't improve the fetch performance of the first entry like I had thought.

dscho (Member) commented Mar 12, 2018

It does not improve the fetch performance of the first entry; that was never the intention. It improves the performance of subsequent FindNextFile() calls, possibly by a lot, e.g. when reading over a network (i.e. via a UNC path or a mapped network share).

However, it is only available in Windows 7 and later, and Git for Windows still supports Vista (even if we officially dropped support for XP due to lack of active contributors).

And the most benefit would probably come from adding this to the FSCache feature. So what would make the most sense is to have a static version check at the beginning of fsentry_create_list() and then continue with either FindFirstFileW() or FindFirstFileExW(), depending on that version test.

Note: this would make for an excellent first-time contribution.
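
For illustration, here is a minimal sketch of such a gate (the helper names and the version-check approach are my own, not actual fscache.c code; error handling is omitted):

```c
#include <windows.h>

/* Sketch: decide once whether FIND_FIRST_EX_LARGE_FETCH (Windows 7+) is usable. */
static int is_windows_7_or_later(void)
{
	OSVERSIONINFOW v = { sizeof(v) };
	/* GetVersionExW is deprecated but sufficient for a ">= 6.1" check. */
	return GetVersionExW(&v) &&
	       (v.dwMajorVersion > 6 ||
		(v.dwMajorVersion == 6 && v.dwMinorVersion >= 1));
}

static HANDLE find_first(const WCHAR *pattern, WIN32_FIND_DATAW *data)
{
	static int large_fetch = -1;

	if (large_fetch < 0)
		large_fetch = is_windows_7_or_later();

	return large_fetch ?
		/* FindExInfoBasic skips the short (8.3) name, which is not needed here. */
		FindFirstFileExW(pattern, FindExInfoBasic, data,
				 FindExSearchNameMatch, NULL,
				 FIND_FIRST_EX_LARGE_FETCH) :
		FindFirstFileW(pattern, data);
}
```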

dscho (Member) commented Mar 12, 2018

More information on when to use FIND_FIRST_EX_LARGE_FETCH: https://blogs.msdn.microsoft.com/oldnewthing/20131024-00/?p=2843/

mehrdadn (Author) commented:

Well, the reason I posted this originally was that I thought it would reduce the number of syscalls by at least one, since currently FindFirstFile only fetches one entry (and that entry is `.`, which is useless) no matter what. It's not as important to me for subsequent fetches, since I'm dealing with large trees but not large directories per se. But if it seems useful and I get a chance to implement this, I'll let you know! Currently I'm leaning toward calling NtQueryDirectoryFile directly and skipping the Win32 API altogether so I can reduce the number of syscalls, but we'll see.

dscho (Member) commented Mar 15, 2018

> if it seems useful and I get a chance to implement this I'll let you know!

Great! I still think this has merit, in particular for compat/win32/fscache.c.

mehrdadn (Author) commented Mar 15, 2018

Okay, so I just did some tests, and it seems this wouldn't make as big a difference as I had expected. Depending on the repo, the difference for git status was anywhere from completely unobservable (small/normal repo) to 3% (large-ish repo with a few directories that have a few dozen files) to 8% (Kotlin) to 18% (artificial repo with 30k files in a single directory). So it doesn't seem to pay off enough to be worth it. Also note that my test used NtQueryDirectoryFile to get a large batch back on the first call too, so the improvement would be even smaller for FIND_FIRST_EX_LARGE_FETCH.

dscho (Member) commented Mar 17, 2018

I'll take the 8%... :-) thank you for sharing your research!

dscho reopened this Mar 17, 2018
dscho self-assigned this Mar 17, 2018
mehrdadn (Author) commented:

You know, I reimplemented this last night, and it seems it could be more like 10% if you just use NtQueryDirectoryFile's output directly and skip the step where I converted the results into WIN32_FIND_DATA format. Happy to share the code if you'd like to take copyright/ownership of it. :-) Although I didn't quite do the part regarding the reparse-point tags (I think you need to call FSCTL_GET_REPARSE_POINT for files that have reparse points so you can tell whether they're symlinks), so that would need to be added too.

mehrdadn (Author) commented:

Oh, apparently Windows uses a hack where it puts the ReparseTag into the EaSize field, so there is no need for FSCTL_GET_REPARSE_POINT... go figure.
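
For what it's worth, a minimal sketch of that check (my own illustration; it assumes the FILE_FULL_DIR_INFORMATION layout from the DDK, as also spelled out in a later sketch below):

```c
/* Sketch only: decide whether a directory entry is a symbolic link without an
 * extra FSCTL_GET_REPARSE_POINT round-trip. For reparse points, the EaSize
 * field of FILE_FULL_DIR_INFORMATION carries the reparse tag. */
static int fdi_is_symlink(const FILE_FULL_DIR_INFORMATION *e)
{
	return (e->FileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) &&
	       e->EaSize == IO_REPARSE_TAG_SYMLINK;
}
```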

dscho (Member) commented Mar 18, 2018

/remind me to take a stab at this on Thursday.

reminders bot added the reminder label Mar 18, 2018
reminders bot commented Mar 18, 2018

@dscho set a reminder for Mar 22nd 2018

mehrdadn (Author) commented:

A tip for whenever you take a stab at this: I would recommend starting with a buffer large enough that the system usually fills it in a single call, and only growing it when it actually comes back (nearly) full. In my tests that meant starting with enough space for 16 FILE_FULL_DIR_INFORMATION structs that each accommodate 256 characters in their file names (I think that's 9312 bytes or so), and then only increasing the buffer size if there wasn't enough room left at the end of the buffer for another file with 256 characters in its name. This makes sure you're optimizing both the memory use and the number of calls.
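
For concreteness, here is a rough sketch of that growth strategy (my own illustration, not the code that later landed in Git for Windows; the FILE_FULL_DIR_INFORMATION layout and the NtQueryDirectoryFile prototype normally come from the DDK/ntdll, so they are spelled out inline, and error handling is omitted):

```c
#include <windows.h>
#include <winternl.h>   /* NTSTATUS, IO_STATUS_BLOCK, UNICODE_STRING */
#include <stdlib.h>

/* Not in winternl.h; normally taken from the DDK (ntifs.h). */
typedef struct _FILE_FULL_DIR_INFORMATION {
	ULONG NextEntryOffset;
	ULONG FileIndex;
	LARGE_INTEGER CreationTime, LastAccessTime, LastWriteTime, ChangeTime;
	LARGE_INTEGER EndOfFile, AllocationSize;
	ULONG FileAttributes;
	ULONG FileNameLength;   /* in bytes; FileName is not NUL-terminated */
	ULONG EaSize;           /* doubles as the reparse tag for reparse points */
	WCHAR FileName[1];
} FILE_FULL_DIR_INFORMATION;

#define FileFullDirectoryInformation 2
typedef NTSTATUS (NTAPI *NtQueryDirectoryFile_t)(HANDLE, HANDLE, PVOID, PVOID,
	PIO_STATUS_BLOCK, PVOID, ULONG, int /* FILE_INFORMATION_CLASS */,
	BOOLEAN, PUNICODE_STRING, BOOLEAN);

/* Worst-case size of one entry: 256 WCHARs of file name. */
#define MAX_ENTRY (sizeof(FILE_FULL_DIR_INFORMATION) + 255 * sizeof(WCHAR))

static void enumerate(HANDLE dir)
{
	NtQueryDirectoryFile_t query = (NtQueryDirectoryFile_t)
		GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtQueryDirectoryFile");
	ULONG bufsize = (ULONG)(16 * MAX_ENTRY);   /* the ~9 KB starting point above */
	char *buf = malloc(bufsize);
	BOOLEAN restart = TRUE;
	IO_STATUS_BLOCK iosb;

	for (;;) {
		NTSTATUS status = query(dir, NULL, NULL, NULL, &iosb, buf, bufsize,
					FileFullDirectoryInformation,
					FALSE /* fill the whole buffer */, NULL, restart);
		if (status)           /* STATUS_NO_MORE_FILES ends the loop */
			break;
		restart = FALSE;

		FILE_FULL_DIR_INFORMATION *e = (FILE_FULL_DIR_INFORMATION *)buf;
		for (;;) {
			/* consume e->FileName / e->FileAttributes / e->EaSize here */
			if (!e->NextEntryOffset)
				break;
			e = (FILE_FULL_DIR_INFORMATION *)((char *)e + e->NextEntryOffset);
		}

		/* Grow only if the buffer was packed so tightly that another
		 * worst-case entry would not have fit. */
		if (bufsize - (ULONG)iosb.Information < MAX_ENTRY) {
			bufsize *= 2;
			buf = realloc(buf, bufsize);
		}
	}
	free(buf);
}
```

With a 16-entry worst-case buffer (roughly 9 KB), a typical directory is drained in one or two calls, and the buffer only grows for directories that actually pack it full.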

@reminders reminders bot removed the reminder label Mar 22, 2018
reminders bot commented Mar 22, 2018

👋 @dscho, take a stab at this.

mehrdadn (Author) commented:

I had a discovery today, not specifically related to this issue, but perhaps pertinent:
You can get a nice speed boost (> 2×) by listing directories in a multithreaded fashion. As you might expect, it's not a linear speedup, but with 8 threads on a 4-core system (2 hardware threads per core) I got speedups of around 4×, especially when the system already had the directory entries in its cache.
However, I suspect this may depend strongly on the access characteristics of the storage medium... specifically my testing was on a recent SSD. I imagine on a hard drive it may end up worse (or the same, or better)... I really have no idea.
But it's one possible way to dramatically improve I/O speed in certain scenarios, if that becomes necessary later.
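
In case anyone wants to experiment with that, here is a rough, self-contained sketch of the idea (my own illustration, not measured code from this thread: a fixed pool of Win32 threads draining a shared stack of directories; error handling and long-path support are omitted, and the thread count and queue size are arbitrary):

```c
#include <windows.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define NUM_THREADS 8

static CRITICAL_SECTION lock;
static WCHAR *pending[1 << 16];   /* directories still to be listed */
static int npending;
static int busy;                  /* workers currently expanding a directory */

static WCHAR *get_work(void)      /* NULL means: all work is finished */
{
	for (;;) {
		WCHAR *dir = NULL;
		EnterCriticalSection(&lock);
		if (npending) {
			dir = pending[--npending];
			busy++;           /* claimed under the lock */
		} else if (!busy) {
			LeaveCriticalSection(&lock);
			return NULL;      /* queue empty and nobody can add more */
		}
		LeaveCriticalSection(&lock);
		if (dir)
			return dir;
		Sleep(0);                 /* another worker may still push subdirs */
	}
}

static void push_work(const WCHAR *dir)
{
	EnterCriticalSection(&lock);
	pending[npending++] = _wcsdup(dir);
	LeaveCriticalSection(&lock);
}

static void list_one(const WCHAR *dir)
{
	WCHAR pattern[MAX_PATH], sub[MAX_PATH];
	WIN32_FIND_DATAW fd;
	HANDLE h;

	swprintf(pattern, MAX_PATH, L"%ls\\*", dir);
	h = FindFirstFileW(pattern, &fd);
	if (h == INVALID_HANDLE_VALUE)
		return;
	do {
		if (!wcscmp(fd.cFileName, L".") || !wcscmp(fd.cFileName, L".."))
			continue;
		if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
			swprintf(sub, MAX_PATH, L"%ls\\%ls", dir, fd.cFileName);
			push_work(sub);
		}
		/* else: record the file entry in some thread-safe structure */
	} while (FindNextFileW(h, &fd));
	FindClose(h);
}

static DWORD WINAPI worker(LPVOID unused)
{
	WCHAR *dir;
	while ((dir = get_work())) {
		list_one(dir);            /* pushes any subdirectories first... */
		free(dir);
		EnterCriticalSection(&lock);
		busy--;                   /* ...then releases its "busy" claim */
		LeaveCriticalSection(&lock);
	}
	return 0;
}

int wmain(int argc, WCHAR **argv)
{
	HANDLE threads[NUM_THREADS];
	InitializeCriticalSection(&lock);
	push_work(argc > 1 ? argv[1] : L".");
	for (int i = 0; i < NUM_THREADS; i++)
		threads[i] = CreateThread(NULL, 0, worker, NULL, 0, NULL);
	WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);
	DeleteCriticalSection(&lock);
	return 0;
}
```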

dscho (Member) commented Nov 5, 2018

Addressed via #1908.

dscho closed this as completed Nov 5, 2018