-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Usage of hard or symbolic linking leads to NuGet cache corruption #8273
Comments
Fixes #8273 Context Prevent overwriting the source of a hard/symbolic link. Changes Made Always delete the destination file (unless readonly and OverwriteReadOnlyFiles is false) Testing Added unit test.
Fixes dotnet#8273 Context Prevent overwriting the source of a hard/symbolic link. Changes Made Always delete the destination file (unless readonly and OverwriteReadOnlyFiles is false) Testing Added unit test.
Reopening since the change was backed out due to causing #8684. |
Hi @manfred-brands, we are very interested in fixing the problem with NuGet cache corruption as soon as possible. Since #8275 had to be reverted, I wanted to ask whether you plan to work on a fix? If not, then I'm more than happy to do it. |
This is the first time I heard it was reverted. I didn't get any GitHub notifications. I now see the item.mentioned above. |
I could add an 'is same file' check in the copy code, but shouldn't msbuild not detect that itself before even calling Copy? The destination file exists and has the same timestamp. |
I made a small test program, that emulates a copy to oneself to see how I expected one of three outcomes:
File.Copy("Program.cs", "Program.cs", true); It throws:
This would mean that in the current msbuild, this situation already throws an exception. // if this was just because the source and destination files are the
// same file, that's not a failure.
// Note -- we check this exceptional case here, not before the copy, for perf.
if (PathsAreIdentical(sourceFileState.Name, destinationFileState.Name))
{
return true;
} This states that |
My proposed change is: // If the destination file is a hard or symbolic link, File.Copy would overwrite the source.
// To prevent this, we need to delete the existing entry before we Copy or create a link.
// We could try to figure out if the file is a link, but I can't think of a reason to not simply delete it always.
if (ChangeWaves.AreFeaturesEnabled(ChangeWaves.Wave17_6) && destinationFileState.FileExists && !destinationFileState.IsReadOnly)
{
if (PathsAreIdentical(sourceFileState.Name, destinationFileState.Name))
{
return true;
}
FileUtilities.DeleteNoThrow(destinationFileState.Name);
} |
Great discussion here. The primary issue I've been wrestling with is how to accomplish this kind of check without regressing perf in the normal case in which you are not trying to copy a file on top of itself. As marcin-krystianc said, I noticed that MSBuild checks if the paths are the same, but it does no normalization, which means if you have an extra .\ in one of the paths, it thinks they're different. (More relevantly, it doesn't notice if you have one absolute path and one relative path.) As manfred-brands said, we don't do the full normalization there currently to avoid expensive operations when not necessary. I have a draft PR out that roughly aligns with manfred-brands's proposal: #8685. I think it works, both as far as not corrupting the NuGet cache and as far as not deleting a file copied onto itself, and I'm currently learning towards taking it. The only case in which I believe performance is regressed is if you are creating a hard/symlink, and it should be regressed by as much time as it takes to make a path into a full path. Especially since we only recently fixed our symlink creation behavior, I think that's acceptable. |
@JeremyKuhne GetFullPath is still significantly expensive, right? Although perhaps better than before, it will still hit the disk in some cases, right? |
It looks like the .net runtime on Windows calls into a system call GetFullPathW which has a remark that it does not check if the file exists. It should only do path magic: prefix with current directory and remove unnecessary directory up and down: `A/B/../C' becomes 'A/C'. On Linux, I'm surprised at the amount of Window specific (directly into OS) calls in MSBuild for the file system access. |
That is not always correct if |
Except that the dotnet code for Unix doesn't seem to do that:
Nor does the GetFullPath method on Windows. I also doubt that a user would expect that "a/b/../c" suddenly becomes "a/aa/c" |
@danmoseley On Windows it typically is just a string parsing/manipulation routine. It will get the current directory and possibly environment variables (for drive relative paths), but those are not disk based. There is some sort of check regarding legacy device names On .NET Framework (and early .NET Core versions) |
Right; I couldn't think of another way to validate that the paths really are the same, but I moved the check a little earlier and tried to unify some of the computation to minimize the impact in most cases. The only case where I think perf regresses is if you try to create a hard/symlink, and that's minor, per JeremyKuhne's comment, unless someone is using an older runtime. I undrafted the PR; from internal discussion, it sounds like we're leaning towards taking it, so that should hopefully resolve the writing-through-symlink problem without reintroducing the copy-onto-self-means-delete problem. |
Fixes #8684 Fixes #8273 Context After #8275, we delete any destination file as part of the Copy task if we determine that we really should copy onto it. Unfortunately, if we try to copy a file onto itself, we delete it before we can copy onto itself, which just means it's gone. Fortunately, we have a check earlier that ensures that we skip any copy operation from a location to the same location. Unfortunately, it's a direct string comparison that doesn't evaluate to full paths first, so it misses slightly more complicated examples. Changes Made Take into account full paths Testing Unit tests + manual test that it doesn't delete the file anymore Notes This implementation tries to remove now-unnecessary full path derivations downstream, hence some added complexity, but it still means extra computation on the happy path if we end up creating a hard/symbolic link. An alternate direction eliminating any full path derivations on the happy path would save about 4% of Copy's execution time, per a quick perf test. (With how many samples I used, "no change" is within a standard deviation.)
Issue Description
MSBuild can use hard or symbolic links to avoid excessive file copies on disk. It is a good method to speed-up builds but at the same time, it is very easy to silently corrupt the NuGet cache.
I have seen that this issue has been mentioned several times already, but it was never explained how and when exactly the NuGet cache gets corrupted:
Related issues:
Steps to Reproduce
MSBuild will try to update dependencies in the application's output folder. Sadly, instead of replacing existing links, it replaces actual files in NuGet cache (thus corrupting it).
In both cases file
newtonsoft.json\13.0.1\lib\netstandard2.0\Newtonsoft.Json.dll
is silently replaced withnewtonsoft.json\13.0.2\lib\net6.0\Newtonsoft.Json.dll
:Expected Behavior
Files in the NuGet cache remain untouched.
Actual Behavior
Files in a NuGet package are silently replaced with files from another version.
Analysis
Both Windows and Linux systems are affected, also it doesn't matter whether hard or symbolic links are used. The problem is that
File.Copy
operation, instead of replacing the link, replaces the file that the link is linking to. To safely replace a link with a different file or link,File.Delete
needs to be called first. Unfortunately, MSBuild calls File.Delete only when the usage of hard or symbolic links is requested. When the build doesn't use hard or symbolic links, then theFile.Delete
is not called.Versions & Configurations
The text was updated successfully, but these errors were encountered: