Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider walking directories in up-to-date checks #3586

Closed
Tracked by #6757
rainersigwald opened this issue Aug 7, 2018 · 14 comments
Closed
Tracked by #6757

Consider walking directories in up-to-date checks #3586

rainersigwald opened this issue Aug 7, 2018 · 14 comments
Assignees
Labels
OS: Windows Issues that only impact users on Windows. performance Performance-Scenario-Build This issue affects build performance. Priority:1 Work that is critical for the release, but we could probably ship without triaged
Milestone

Comments

@rainersigwald
Copy link
Member

@ccastanedaucf has observed (#3547 (comment)) significant time spent in up-to-date checks in https://github.com/Microsoft/msbuild/blob/680224d8ecf9a7da82c001d95db336fd9cfd12d8/src/Build/BackEnd/Components/RequestBuilder/TargetUpToDateChecker.cs#L937

Git for Windows got a big speedup by implementing some caching around checks for individual file timestamps (with the core.fscache setting). See https://github.com/git-for-windows/git/blob/master/compat/win32/fscache.c and its history.

The gist of the idea is that it's faster to enumerate all files in a directory and get every file's modification timestamp than it is to get the timestamps on an individual basis. Since there's probably a fairly high degree of overlap in the directories of files we're checking for timestamps, I suspect we could get some benefit from this technique too.

@rainersigwald rainersigwald added performance OS: Windows Issues that only impact users on Windows. labels Aug 7, 2018
@AndyGerlicher
Copy link
Contributor

@ccastanedaucf can you take a look at implementing this? Might help get Windows closer to Linux after #3547 is in.

@cdmihai
Copy link
Contributor

cdmihai commented Aug 7, 2018

Would be nice if you could implement it over the existing IFileSystem :)

Also, this might be a very good use case for the fast allocation free directory enumeration introduced by @JeremyKuhne https://blogs.msdn.microsoft.com/jeremykuhne/2018/03/09/custom-directory-enumeration-in-net-core-2-1/

@4creators
Copy link

The above problems are a huge burden in some dotnet repos see #2843

@rainersigwald
Copy link
Member Author

@4creators This will probably help with that, but I'd be shocked if it was a very high percentage of the overall incremental build time in those repos.

@ccastanedaucf
Copy link
Contributor

ccastanedaucf commented Aug 8, 2018

@cdmihai small bench comparing methods of getting last write time for 10 files on Windows. Right now we use the equivalent of File.GetLastWriteTimeUtc.

                       Method |      Mean |     Error |    StdDev |
----------------------------- |----------:|----------:|----------:|
     File_GetLastWriteTimeUtc | 257.60 us | 0.7359 us | 0.5745 us |
 DirectoryInfo_EnumerateFiles |  53.60 us | 1.1738 us | 1.4416 us |
         FileSystemEnumerable |  44.68 us | 0.8567 us | 0.8414 us |

@JeremyKuhne
Copy link
Member

small bench comparing methods of getting last write time for 10 files on Windows

Allocations would be interesting to measure as well. Using DirectoryInfo requires allocating a FileSystemInfo around every result, which will also allocate strings to do so. With FileSystemEnumerable you can get zero allocations (outside of the enumerable itself).

@ccastanedaucf
Copy link
Contributor

ccastanedaucf commented Aug 8, 2018

1000 files in same directory + memory allocations. Looks like FileSystemEnumerable has a constant memory allocation for a given directory.
Edit: wasn't actually enumerating past the first item on FileSystemEnumerable which explains why runtime didn't change, updated times/allocation. Still a huge difference and allocation stays constant.

                       Method |         Mean |       Error |      StdDev |   Gen 0 |  Gen 1 |  Gen 2 | Allocated |
----------------------------- |-------------:|------------:|------------:|--------:|-------:|-------:|----------:|
     File_GetLastWriteTimeUtc | 45,740.60 us | 666.3182 us | 590.6736 us |       - |      - |      - | 109.38 KB |
 DirectoryInfo_EnumerateFiles |  1,036.26 us |  19.6407 us |  18.3719 us | 56.6406 |      - |      - | 242.71 KB |
         FileSystemEnumerable |     357.7 us |    5.823 us |    7.151 us |       - |      - |      - |     296 B |

@JeremyKuhne
Copy link
Member

1000 files in same directory + memory allocations

Cool. Shows a pretty big difference. :)

I'm curious about the Allocated (1.71K) number. Is it measuring GC.GetAllocatedBytesForCurrentThread or GetTotalmemory? It is just a somewhat odd total for me to place.

@rainersigwald
Copy link
Member Author

#3608 is an implementation of this that should be revisited if we move forward with this bug.

@rainersigwald
Copy link
Member Author

Another possible implementation to help with this: #2848; it should also be looked at.

@mfkl
Copy link
Contributor

mfkl commented Aug 3, 2020

Has there been more data to help with a design for a solution to this problem?

Looking at previous comments/PRs, native calls vs DirectoryInfo_EnumerateFiles vs FileSystemEnumerable are some of the possible options I guess.

@panopticoncentral panopticoncentral added the Priority:1 Work that is critical for the release, but we could probably ship without label Mar 22, 2021
@ladipro ladipro removed the backlog label Aug 13, 2021
@AR-May
Copy link
Member

AR-May commented Nov 1, 2021

We are waiting for final measurements for PR #6974 from @rokonec. It is likely that this PR the issue with up-to-date checks would be mostly resolved and that further room for improvements would be too small.

@ladipro
Copy link
Member

ladipro commented Nov 8, 2021

Closing, assuming that #6974 will be merged soon. Please reopen otherwise.

@ladipro ladipro closed this as completed Nov 8, 2021
@ladipro ladipro added this to the VS 17.1 milestone Dec 8, 2021
@AR-May AR-May added the triaged label Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS: Windows Issues that only impact users on Windows. performance Performance-Scenario-Build This issue affects build performance. Priority:1 Work that is critical for the release, but we could probably ship without triaged
Projects
None yet
Development

No branches or pull requests