
Hard linking and NuGet cache #1407

Closed
jaredpar opened this issue Nov 29, 2016 · 12 comments

@jaredpar
Member

Recently we decided to measure the impact of using hard linking in the dotnet/roslyn repo. The benefits were substantial:

| | Build time | Output dir size |
|---|---|---|
| No hard linking | 3:38 | 6.27 GB |
| Hard linking | 2:40 | 0.84 GB |
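The relative savings follow from the table by simple arithmetic (no new measurements). This sketch assumes the build times are minutes:seconds, which the table does not state explicitly:

```python
# Measurements from the table above (dotnet/roslyn build).
def to_seconds(mm_ss: str) -> int:
    """Convert an 'm:ss' build time string to seconds (format is assumed)."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)

baseline_time = to_seconds("3:38")  # no hard linking
linked_time = to_seconds("2:40")    # hard linking
baseline_size_gb = 6.27
linked_size_gb = 0.84

time_saving = 1 - linked_time / baseline_time
size_saving = 1 - linked_size_gb / baseline_size_gb

print(f"build time: {baseline_time} s -> {linked_time} s ({time_saving:.0%} less)")
# build time: 218 s -> 160 s (27% less)
print(f"output size: {baseline_size_gb} GB -> {linked_size_gb} GB ({size_saving:.0%} less)")
# output size: 6.27 GB -> 0.84 GB (87% less)
```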

Yet even with these benefits there is still significant pushback on the team against adopting hard linking. The reason is that much of the savings above comes from hard linking to NuGet assets that live in the NuGet cache. The NuGet cache is modifiable by the developer who ran the build. Hence, when hard links are used, the developer can easily corrupt their NuGet cache by modifying content in the build output directory.

Corruption of the NuGet cache is incredibly frustrating for developers. It results in hard-to-track-down errors and, because the cache is shared amongst repos, allows changes in one repo to inadvertently affect the build of another repo.

On the surface this may seem like a rare case, but in practice it's quite common:

  • Developer runs a test script that copies files into the output directory.
  • Developer incorrectly modifies a build script such that double writes occur during build.

Any of these can inadvertently target a file which is in reality a hard link to the NuGet cache and in turn silently corrupt it. In fact, this is pretty much what happened every time we enabled hard linking in the past. It's frustrating because hard linking has a real impact on developer productivity (faster builds), but we can't enable it because of this fragility.
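The failure mode is easy to reproduce. In this Python sketch (made-up paths in a temp directory stand in for the NuGet cache and a repo's `bin` folder), the build hard links a cache file into the output directory and a test script then copies a fixture over the output path. `shutil.copyfile` opens the existing destination and overwrites it in place, so the write lands in the file shared with the cache:

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: a shared package cache and one repo's build output.
    cache_file = root / "nuget-cache" / "example" / "1.0.0" / "Example.dll"
    output_file = root / "repo" / "bin" / "Example.dll"
    fixture_file = root / "repo" / "test" / "Example.dll"
    for path in (cache_file, output_file, fixture_file):
        path.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(b"original package contents")
    fixture_file.write_bytes(b"test fixture contents")

    # The build "copies" the asset by hard linking: two names, one file.
    os.link(cache_file, output_file)
    linked = os.path.samefile(cache_file, output_file)

    # A test script later copies its own file over the output path.
    # copyfile opens the existing destination for writing and truncates it,
    # so the new bytes go into the shared file rather than a fresh one.
    shutil.copyfile(fixture_file, output_file)

    cache_after = cache_file.read_bytes()

print(linked)       # True
print(cache_after)  # b'test fixture contents' (the cache entry was overwritten)
```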

After talking this over with a few people, I wanted to suggest the following feature for hard linking:

Provide an option to allow hard linking within the output directory.

This means that hard links would only point to files within the output directory. The first time an asset is copied into the output directory, a full copy would be made; every later copy of that asset would be a hard link to that first copy.
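A minimal Python sketch of the proposed behavior, assuming a single output root and an in-memory record of what has already been copied. `OutputDirLinker` and its `copy` method are hypothetical names, not an MSBuild API. The first copy of a given source is a real copy and later copies hard link to that first copy, so nothing in the output directory shares a file with the cache (requires Python 3.9+ for `Path.is_relative_to`):

```python
import os
import shutil
import tempfile
from pathlib import Path


class OutputDirLinker:
    """Hypothetical helper: hard links that never point outside one output root."""

    def __init__(self, output_root: Path) -> None:
        self.output_root = output_root.resolve()
        # Maps a resolved source path to its first full copy inside output_root.
        self._first_copy: dict[Path, Path] = {}

    def copy(self, source: Path, dest: Path) -> str:
        source, dest = source.resolve(), dest.resolve()
        if source == dest:
            raise ValueError("source and destination are the same path")
        if not dest.is_relative_to(self.output_root):
            raise ValueError(f"{dest} is outside {self.output_root}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.exists():
            dest.unlink()  # replace the name rather than writing through an old link
        first = self._first_copy.get(source)
        if first is None or not first.exists():
            # First time: a real copy, so the output never links to `source`.
            shutil.copy2(source, dest)
            self._first_copy[source] = dest
            return "copied"
        # Later times: link to the in-output copy made earlier.
        os.link(first, dest)
        return "linked"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache_dll = root / "nuget-cache" / "Example.dll"  # stands in for the NuGet cache
    cache_dll.parent.mkdir()
    cache_dll.write_bytes(b"package contents")

    linker = OutputDirLinker(root / "out")
    result_a = linker.copy(cache_dll, root / "out" / "A" / "Example.dll")
    result_b = linker.copy(cache_dll, root / "out" / "B" / "Example.dll")
    shares_cache = os.path.samefile(cache_dll, root / "out" / "A" / "Example.dll")
    shares_copy = os.path.samefile(root / "out" / "A" / "Example.dll",
                                   root / "out" / "B" / "Example.dll")

print(result_a, result_b)         # copied linked
print(shares_cache, shares_copy)  # False True
```

Because every link points at a file the output directory owns, a stray write through `B/Example.dll` would change `A/Example.dll` but leave the cache alone, and deleting `out` discards the damage.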

This would give a substantial benefit to larger projects while avoiding some of the pitfalls of full hard linking:

  • Deleting the output directory would fix any corruption issues.
  • No cross repo contamination.

Seems like the best compromise for performance and reliability.

@DamianEdwards
Member

Shame we can't have some kind of copy-on-write hard link so the corruption can't happen.

@pakrym

pakrym commented Nov 29, 2016

@jaredpar what about having a NuGet cache per solution directory and hard linking to it? In case of an error it can easily be cleaned and re-restored; on the other hand, the initial restore would take a bit longer.

@rainersigwald
Member

Yeah, ReFS copy-on-write will fix everything for this . . . too bad that's a tiny subset of MSBuild uses 😞

This sounds like two related requests:

  • An argument to the Copy task along the lines of AllowHardlinksOnlyToDestinationsUnder="path"
  • Knowledge of the "whole build" output root, so you can hardlink outputs of other projects within your build (solution/repo root/whatever).

Right now a project knows its own obj and bin directories, but that's not the right granularity for the scenario you're describing.
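A rough sketch of how a copy step could honor an argument like the suggested `AllowHardlinksOnlyToDestinationsUnder="path"`, written in Python rather than as an MSBuild task. All of the following are hypothetical assumptions: the restriction applies to the file being linked to, the `hardlink_root` parameter stands in for that argument, and the "whole build" output root is a made-up `artifacts` directory. Anything outside the root (such as the NuGet cache) gets a full copy, while outputs of sibling projects under the root get hard links (Python 3.9+):

```python
import os
import shutil
import tempfile
from pathlib import Path


def copy_file(source: Path, dest: Path, hardlink_root: Path) -> str:
    """Copy source to dest, hard linking only when source is under hardlink_root.

    hardlink_root plays the role of the hypothetical
    AllowHardlinksOnlyToDestinationsUnder="path" argument.
    """
    source, dest = source.resolve(), dest.resolve()
    if source == dest:
        return "skipped"  # copying a file onto itself
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        dest.unlink()  # never write through an existing link
    if source.is_relative_to(hardlink_root.resolve()):
        try:
            os.link(source, dest)
            return "linked"
        except OSError:
            pass  # e.g. a different volume; fall back to a full copy
    shutil.copy2(source, dest)
    return "copied"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_root = root / "repo" / "artifacts"  # assumed whole-build output root
    cache_dll = root / "nuget-cache" / "Dep.dll"
    lib_dll = build_root / "LibProject" / "bin" / "Lib.dll"
    for f in (cache_dll, lib_dll):
        f.parent.mkdir(parents=True)
        f.write_bytes(b"bytes")

    app_bin = build_root / "AppProject" / "bin"
    from_cache = copy_file(cache_dll, app_bin / "Dep.dll", build_root)
    from_sibling = copy_file(lib_dll, app_bin / "Lib.dll", build_root)

print(from_cache, from_sibling)  # copied linked
```

The per-project `bin` directory would be too narrow for `hardlink_root` here: `Lib.dll` lives under another project's `bin`, so only a build-wide root lets it be linked.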

@jaredpar
Member Author

@rainersigwald ah didn't realize that was a possibility with ReFS. Yes that would definitely be a better option here.

@rainersigwald
Member

@jaredpar I thought it was (or was planned) but I can't find any corroboration for that . . .

@jaredpar
Member Author

@pakrym we discussed that, but it's only a solution for local developers. In our CI environment we need to use a global cache that persists between runs in order to keep our CI system running at a decent pace. Otherwise we'd be downloading 1+ GB of NuGet packages per run.

@pakrym

pakrym commented Nov 29, 2016

@jaredpar set ACLs on hardlinks to be read-only, so it will crash anything that tries to modify files :)
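A small POSIX-only illustration of why this protection would reach the cache too: permission bits belong to the file itself, not to a particular name, so making the hard link in the output directory read-only makes the NuGet cache entry read-only as well. This sketch simplifies by using plain mode bits via `os.chmod` instead of Windows ACLs, and the file names are made up:

```python
import os
import stat
import tempfile
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache-Example.dll"  # stands in for the NuGet cache copy
    output_file = Path(tmp) / "bin-Example.dll"   # hard link in the build output
    cache_file.write_bytes(b"package contents")
    os.link(cache_file, output_file)

    # Make the output name read-only by clearing every write bit.
    original_mode = stat.S_IMODE(output_file.stat().st_mode)
    os.chmod(output_file, original_mode & ~WRITE_BITS)

    # Mode bits belong to the shared file, so the cache name sees the change too.
    cache_writable = bool(cache_file.stat().st_mode & WRITE_BITS)

    # Restore write access before the temporary directory is cleaned up.
    os.chmod(output_file, original_mode)

print("cache entry writable:", cache_writable)  # cache entry writable: False
```

Mode bits are a guard rail rather than a lock: the file's owner (or root) can restore write access, while a tool that deletes and recreates the destination only removes the output name and leaves the shared file untouched.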

@jaredpar
Member Author

@pakrym too many star wars fans here that use the "force" when copying 😄

@pakrym

pakrym commented Nov 29, 2016

@jaredpar I'm talking about security permissions; they should stop -Force as well

@rainersigwald
Member

@jaredpar this is promising for your scenario: https://blogs.windows.com/buildingapps/2016/12/02/symlinks-windows-10

I filed #1430 to see if anything's broken.

@imanushin

Hi!

Do we have any updates here?

@livarcocc livarcocc added this to the Backlog milestone Nov 4, 2019
@livarcocc
Contributor

Team Triage: this is something that we don't believe we will get to in the medium to long term. So, closing this issue for now.


8 participants