Investigate improvements to our disk usage on file timestamp checks #5972
From #6001, by @KirillOsenkov: I've been profiling and IsMatchingSizeAndTimeStamp is showing up a lot. We should investigate whether we can get both existence and timestamp in a single file system call. On Windows, calling FindFirstFile is almost sure to be way faster and have lower overhead.
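As a rough illustration of the "single call" idea, here is a minimal sketch that reads existence, size, and timestamp from one `FileInfo` instance. It relies on the documented behavior that `FileInfo` fetches file metadata on first property access and caches it until `Refresh()` is called. The `FileMetadata` class and `TryGetSizeAndTimestamp` method are hypothetical names for illustration, not MSBuild's actual code, and this is not a claim about which native API the runtime uses underneath.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of MSBuild.
public static class FileMetadata
{
    // Returns false if the file does not exist; otherwise fills in size and
    // last-write time from the same cached metadata snapshot.
    public static bool TryGetSizeAndTimestamp(
        string path, out long length, out DateTime lastWriteTimeUtc)
    {
        var info = new FileInfo(path);

        // First property access populates FileInfo's cached metadata.
        if (!info.Exists)
        {
            length = 0;
            lastWriteTimeUtc = DateTime.MinValue;
            return false;
        }

        // These reads are served from the cache filled above,
        // so they do not query the file system again.
        length = info.Length;
        lastWriteTimeUtc = info.LastWriteTimeUtc;
        return true;
    }
}
```

Compared with separate `File.Exists`, `FileInfo.Length`, and `File.GetLastWriteTimeUtc` calls, all three answers here come from one consistent snapshot of the file's metadata.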
@KirillOsenkov msbuild/src/Tasks/FileState.cs Line 104 in 2f1e9ca
Traces show more than one stack because there are multiple ways to initialize the ^. We're spending ~6 seconds of CPU here when building OrchardCore, definitely something to improve.
Do we know if we're calling these multiple times for the same file? Are they cached? If they're called multiple times, how many times per file?

Should we perhaps scan entire directories and then answer these questions based on the info we read from the file system? The assumption is that it could be better to scan a directory once, keep the file lists, sizes and timestamps in memory, and then answer questions off of that table (at least during evaluation where presumably the underlying file system shouldn't change).
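The directory-scan idea could look roughly like the sketch below: enumerate a directory once, store each file's size and timestamp in a dictionary, and answer later queries from that table. `DirectorySnapshot` and `TryGetFile` are hypothetical names, and the case-insensitive comparer assumes Windows-style file name semantics. The snapshot is only valid while the directory is not modified, which is the evaluation-time assumption stated above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch, not MSBuild's implementation.
public sealed class DirectorySnapshot
{
    // File name -> (size, last write time), captured at construction time.
    private readonly Dictionary<string, (long Length, DateTime LastWriteTimeUtc)> _files =
        new Dictionary<string, (long Length, DateTime LastWriteTimeUtc)>(
            StringComparer.OrdinalIgnoreCase); // assumption: case-insensitive names

    public DirectorySnapshot(string directory)
    {
        // One enumeration of the directory (top level only).
        foreach (FileInfo file in new DirectoryInfo(directory).EnumerateFiles())
        {
            _files[file.Name] = (file.Length, file.LastWriteTimeUtc);
        }
    }

    public int Count => _files.Count;

    // Answers existence, size, and timestamp without touching the disk again.
    public bool TryGetFile(string fileName, out long length, out DateTime lastWriteTimeUtc)
    {
        if (_files.TryGetValue(fileName, out var entry))
        {
            length = entry.Length;
            lastWriteTimeUtc = entry.LastWriteTimeUtc;
            return true;
        }

        length = 0;
        lastWriteTimeUtc = DateTime.MinValue;
        return false;
    }
}
```

A file created or deleted after the snapshot is taken is invisible to `TryGetFile`, so this trades freshness for fewer file system calls.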
I have prototyped a cache with
The following table summarizes the CPU time burned when doing a no-change incremental build of OrchardCore. It is a cold scenario where no MSBuild process is running when the build is initiated (i.e. in-memory caches are empty).
Watchers were set up on demand for each directory with at least one affected file. At the end of the build MSBuild was watching 3400 directories and had cached metadata on 33000 files total. There was no attempt to unify/coalesce directories to reduce the number of directories watched.

The perf numbers are great but there's a major catch. File watcher notifications come asynchronously on a separate thread and there are no guarantees that a notification for a file arrives and is processed before we make a build decision based on the file. Here's a simple program demonstrating the asynchrony:

```csharp
using System;
using System.IO;

class Program
{
    // Updated only from watcher callbacks, which run on a separate thread.
    static volatile bool fileExists = false;

    static void Main(string[] args)
    {
        string tempDir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());
        Directory.CreateDirectory(tempDir);
        try
        {
            FileSystemWatcher watcher = new FileSystemWatcher(tempDir);
            watcher.Created += (object sender, FileSystemEventArgs e) => fileExists = true;
            watcher.Deleted += (object sender, FileSystemEventArgs e) => fileExists = false;
            watcher.EnableRaisingEvents = true;

            string tempFile = Path.Combine(tempDir, "File.txt");
            using (var fileStream = File.Create(tempFile))
            {
                // The file is on disk now, but the Created event may not have been handled yet.
                Console.WriteLine($"File exists after Create: {fileExists}");
            }
            File.Delete(tempFile);
            // Likewise, the Deleted event may still be pending here.
            Console.WriteLine($"File exists after Delete: {fileExists}");
        }
        finally
        {
            Directory.Delete(tempDir, true);
        }
        Console.ReadLine();
    }
}
```

On my Windows machine the program usually prints:

```
File exists after Create: False
File exists after Delete: True
```

which is exactly the opposite of the correct output. We are not aware of the file immediately after it was created and we believe that it's still there right after deleting it. In this simple example the race window is small but there are no guarantees that it couldn't get much larger on a different platform, on a loaded system, or with other I/O patterns.

This is a no-go for MSBuild. Our I/O has to be synchronous, or at least we would need a way to issue an "I/O barrier" before each metadata check to make sure that all watcher callbacks queued for writes that happened before the current point in time have actually finished running. Even for evaluation, which is supposed to be a read-only operation where it's OK to take a snapshot and ignore changes, we would be running into races if we attempted to cache metadata using watchers between evaluations.

I am closing the issue based on the information above.
Please see #6822 (comment) for data on this.
Yes, it should be possible during evaluation. Outside of evaluation I'm afraid we could be running into the same races as above. We are tracking it in #3586.
One option brought up for improving performance has been to reduce how often we check files on disk. We discussed possibly having our own file watcher or using an existing one such as facebook/watchman. We could also potentially share watchers with the various IDEs we plug into, since they already have watchers. Alternatively, we could build a way for them to plug into ours and share the results with other tools through MSBuild.
This will require a lot more investigation so posting the summary for now for tracking.