build-server shutdown causes issues in the VMR orchestrator #4175
Comments
Here's some history on the use of |
This is surprising. In source-only modes it's pretty important not to use the toolset package, to avoid prebuilts. I generally think we should not allow use of the toolset package in any VMR mode. Where were you seeing that?
I think the right thing to do is probably to propagate the remotes from the outer->inner
Soapbox Start Imagine for a second if we didn't need to have a bunch of places in our infra where we called I'm not sure what the design of |
E.g. when enabling Razor: dotnet/installer#18776. See the conversation there and the builds that I queued. In the end I just disabled the toolset package, as I wasn't able to kill the actual process that still held a lock on the toolset compiler assemblies.
@marcpopMSFT, would you know who owns the build-server CLI frontend and the protocol to the compiler servers?
@baronfel may know where it is too.
There's not really a unifying protocol for the different build servers - there's just an interface with a There's not a ton of commonality.
The SDK generally owns the frontend here, though I expect different teams have had different ideas about how to shut down their own servers.
Even if we taught Roslyn's shutdown implementation to kill PIDs based on files on disk (which is what Razor does), we would still shut down the build servers of all repos, right? For the VMR, we build multiple repositories in parallel with a single SDK (a single dotnet) and only want to terminate the processes that were started for an individual repo build. Example: roslyn and runtime build in parallel, and when roslyn finishes, we only want to kill the processes that were started for the roslyn build, not the ones still running for the runtime build. Is that possible today with `build-server`? If not, should we disable shared compilation completely? Alternatively, what if we just look for the open handles and kill the corresponding PIDs ourselves?
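On Linux, the "look for the open handles" idea could start from `/proc`. A process's open descriptors are symlinks under `/proc/<pid>/fd`, and loaded assemblies typically appear as file-backed mappings in `/proc/<pid>/maps`. The following is a minimal sketch under those assumptions, not something the VMR or SDK does today. The function name is hypothetical. The sketch only reports PIDs and leaves killing to the caller. It can only inspect processes the current user is allowed to read, and Windows would need a different mechanism.

```python
import os
from pathlib import Path


def pids_holding_files_under(directory: str) -> set[int]:
    """Return PIDs that have an open descriptor or a file-backed memory
    mapping under `directory` (Linux /proc only; hypothetical helper)."""
    root = Path(os.path.realpath(directory))
    found: set[int] = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        pid = int(entry)
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we may not inspect it
        targets: list[str] = []
        for fd in fds:
            try:
                targets.append(os.readlink(f"{fd_dir}/{fd}"))
            except OSError:
                pass  # descriptor closed while we were scanning
        try:
            # Loaded assemblies usually show up as file-backed mappings.
            with open(f"/proc/{pid}/maps") as maps:
                for line in maps:
                    fields = line.split(maxsplit=5)
                    if len(fields) == 6:
                        targets.append(fields[5].rstrip("\n"))
        except OSError:
            pass
        for target in targets:
            path = Path(target)
            if path.is_absolute() and (path == root or root in path.parents):
                found.add(pid)
                break
    return found
```

Pointed at a repo's toolset package directory, this would list the processes that keep those compiler assemblies locked. The PID-recycling concern raised elsewhere in this thread still applies between the scan and any kill.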
Today, unless Roslyn and Runtime use different SDKs (different layouts on disk), we have no way to shut down just the servers for each individual repo using the
Roslyn's version is perfectly safe. You can kill it at any time and it will only impact perf.
I think the long-term solution is that we shouldn't have to be cleaning up extra artifacts. The VMR should be reasonably efficient once we get used to building the product in one big run. I don't see the need to do intermediate cleanup.
Was PID recycling considered in this design?
Is there a way that tasks can hook that engine? Like can I register an |
IDK, I see this as more of a repo-level problem: there is a lot of content in the individual repos' artifacts directories (e.g. 34 GB in a Linux source build). Distro maintainers were hitting size limits in their build systems, which is what prompted the changes to optionally clean up after each repo build.
As a long-term goal, we probably want to build the VMR as fast as possible, and generating extraneous artifacts in the repo builds probably works against those perf goals. I expect we'll get rid of that stuff through natural optimization.
No, but I think it's reasonable to want this and to build it.
Here's another recent incident that highlights why we really need this. When building the product from source, analyzer assemblies that target the compiler API use either the N-1 or the live version of it: N-1 when the repo doesn't depend on the roslyn repo being built first, and live when it does. That means the compiler used during the build (which loads these live-built analyzer assemblies) must match. Here's what happens when it doesn't:
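The matching requirement can be stated as a simple check. An analyzer built against compiler API version X can only load in a compiler whose own API version is at least X. The sketch below reduces versions to the major.minor pair the thread refers to, and the version numbers in the usage are made up. The names are hypothetical, and real loading is governed by assembly versions rather than strings like these.

```python
from typing import NamedTuple


class ApiVersion(NamedTuple):
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "ApiVersion":
        """Parse 'major.minor[.patch...]', keeping only major and minor."""
        major, minor = text.split(".")[:2]
        return cls(int(major), int(minor))


def analyzer_can_load(compiler_api: str, analyzer_references: str) -> bool:
    """True if an analyzer built against `analyzer_references` can run
    inside a compiler exposing `compiler_api` (simplified model)."""
    # Tuple comparison orders 4.10 after 4.9, unlike a string comparison.
    return ApiVersion.parse(analyzer_references) <= ApiVersion.parse(compiler_api)
```

In this model, an SDK compiler at `4.9` fails to host a live-built analyzer referencing `4.10`. That is the failure mode that appears when the compiler API revs and the toolset package isn't used.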
Source-build doesn't enforce the use of the N-1/live compiler toolset package, which results in the mismatch. If a repository doesn't explicitly use the compiler toolset package, the compiler from the SDK is used. Whenever the compiler API revs its major/minor version and the compiler toolset package isn't used, the build starts to fail. If source-build now started enforcing the compiler toolset package, then the current

I wonder how difficult it would be to design the protocol to achieve what Jared mentioned above.
I'm very open to ideas on how to solve this. I spent a bit of time trying to reason this out but can't find an approach robust enough that I'd want to use it. The basic problem is that we need a way to identify either the VBCSCompiler processes or the location they were launched from. Ideally the processes, though, because killing those is more robust. The ideas I've considered are the following:
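Identifying servers by the location they were launched from could look roughly like the sketch below. It makes two assumptions. First, some platform-specific code has already collected process records as `(pid, argv)` pairs. Second, a compiler server can be recognized by a `VBCSCompiler.dll` or `VBCSCompiler.exe` path in its command line. The record type, function name, and paths in the usage are hypothetical.

```python
from pathlib import PurePath
from typing import Iterable, NamedTuple, Sequence

# Binary names treated as "a compiler server" (assumption for this sketch).
SERVER_NAMES = {"vbcscompiler.dll", "vbcscompiler.exe"}


class ProcessRecord(NamedTuple):
    pid: int
    argv: Sequence[str]


def servers_launched_from(
    processes: Iterable[ProcessRecord], toolset_dir: str
) -> list[int]:
    """PIDs of compiler servers whose binary lives under `toolset_dir`."""
    root = PurePath(toolset_dir)
    matches: list[int] = []
    for proc in processes:
        for arg in proc.argv:
            path = PurePath(arg)
            if path.name.lower() in SERVER_NAMES and root in path.parents:
                matches.append(proc.pid)
                break  # one matching argument is enough
    return matches
```

Matching on the launch path only separates repos that run different compiler layouts. Two repos sharing the SDK's compiler would still be indistinguishable, which is the limitation noted earlier in the thread.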
I think there is potential for (1) to be a solution, but I can't get any kind of consensus on where a valid location for writing files is. Also, it would only be a solution going forward.
This is actually what the Razor server does already: https://github.com/dotnet/sdk/blob/main/src/Cli/dotnet/BuildServer/RazorServer.cs
It uses a path in the user profile: https://github.com/dotnet/sdk/blob/main/src/Cli/dotnet/BuildServer/BuildServerProvider.cs#L72
edit: sorry, I just realized this was already mentioned earlier in the thread 😄
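A PID-file scheme scoped per repo can be sketched as follows. Each server drops a small file into a directory, and shutdown only considers the files in the directory it is pointed at. This is not Razor's actual file format or the SDK's code. The file naming, the injected `kill` callback, and the environment variable `EXAMPLE_BUILD_SERVER_PID_DIR` are all assumptions. The variable is a hypothetical stand-in for the override escape hatch mentioned in this thread.

```python
import os
from pathlib import Path
from typing import Callable

# Hypothetical variable name standing in for an override escape hatch.
PID_DIR_ENV = "EXAMPLE_BUILD_SERVER_PID_DIR"


def pid_directory(default: Path) -> Path:
    """Use the directory from the environment if set, else `default`."""
    override = os.environ.get(PID_DIR_ENV)
    return Path(override) if override else default


def register_server(pid_dir: Path, pid: int, kind: str) -> Path:
    """Record a running server as `<kind>-<pid>.pid` in `pid_dir`."""
    pid_dir.mkdir(parents=True, exist_ok=True)
    pid_file = pid_dir / f"{kind}-{pid}.pid"
    pid_file.write_text(str(pid))
    return pid_file


def shutdown_servers(pid_dir: Path, kill: Callable[[int], None]) -> list[int]:
    """Kill only the servers recorded in `pid_dir`, removing their files."""
    killed: list[int] = []
    for pid_file in sorted(pid_dir.glob("*.pid")):
        pid = int(pid_file.read_text())
        kill(pid)  # e.g. lambda p: os.kill(p, signal.SIGTERM) on POSIX
        pid_file.unlink()
        killed.append(pid)
    return killed
```

If each repo build pointed its servers at its own directory, finishing roslyn would only touch roslyn's servers. As noted later in the thread, a bare PID is still subject to recycling.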
One semi-serious proposal for this: to avoid bloat, use a SQLite database for the data?
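Taking the SQLite suggestion literally, a registry of running servers could be a single table. Each row is keyed by PID and tagged with the directory that launched the server, such as a repo root, so shutdown can be scoped with a query. The schema, column names, and functions are hypothetical. The `start_ticks` column anticipates the PID-recycling concern raised in this thread.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS build_servers (
    pid         INTEGER NOT NULL,
    start_ticks INTEGER NOT NULL,  -- process start time, guards against PID reuse
    kind        TEXT    NOT NULL,  -- e.g. 'compiler' or 'razor'
    launch_dir  TEXT    NOT NULL,
    PRIMARY KEY (pid, start_ticks)
)
"""


def open_registry(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path, timeout=10)  # wait up to 10 s on concurrent writers
    conn.execute(SCHEMA)
    return conn


def register(conn: sqlite3.Connection, pid: int, start_ticks: int,
             kind: str, launch_dir: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO build_servers VALUES (?, ?, ?, ?)",
            (pid, start_ticks, kind, launch_dir),
        )


def servers_for(conn: sqlite3.Connection, launch_dir: str) -> list[tuple[int, int, str]]:
    rows = conn.execute(
        "SELECT pid, start_ticks, kind FROM build_servers "
        "WHERE launch_dir = ? ORDER BY pid",
        (launch_dir,),
    )
    return rows.fetchall()


def unregister(conn: sqlite3.Connection, pid: int, start_ticks: int) -> None:
    with conn:
        conn.execute(
            "DELETE FROM build_servers WHERE pid = ? AND start_ticks = ?",
            (pid, start_ticks),
        )
```

One database file replaces a directory of small PID files. SQLite's locking lets builds running in parallel register and unregister without extra coordination.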
They also write PIDs, which means the approach is subject to PID-recycling issues.
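A common mitigation for PID recycling is to store the process start time next to the PID. Before acting, the caller refuses if the live process with that PID started at a different time. The sketch below is Linux-only and reads the `starttime` field (field 22) of `/proc/<pid>/stat`. The helper names are hypothetical. Other platforms would need their own start-time query, and a small window between the check and a kill remains.

```python
from typing import Optional


def parse_start_ticks(stat_line: str) -> int:
    """Extract `starttime` (field 22, clock ticks since boot) from a
    /proc/<pid>/stat line. The command name in field 2 may contain spaces
    or parentheses, so parse only what follows the last ')'."""
    after_comm = stat_line[stat_line.rindex(")") + 1:].split()
    # after_comm[0] is field 3 (state), so field 22 is at index 22 - 3.
    return int(after_comm[22 - 3])


def read_start_ticks(pid: int) -> Optional[int]:
    try:
        with open(f"/proc/{pid}/stat") as stat:
            return parse_start_ticks(stat.read())
    except (OSError, ValueError):
        return None  # process is gone, not inspectable, or not on Linux


def is_same_process(pid: int, recorded_start_ticks: int) -> bool:
    """True only if `pid` still belongs to the process that was recorded."""
    return read_start_ticks(pid) == recorded_start_ticks
```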
Understood, but when you ask the runtime team whether this is the right decision, you get my "ask three people and you get four answers" problem.
Right, I just wonder, given the Razor precedent, whether that solution is "good enough" to make progress even though it doesn't cover every obscure setup. There is an env var to override the PID file location as an escape hatch.
We use the `build-server shutdown` CLI command in the VMR orchestrator after each repo build so that we can clean the repository's artifacts (including the package cache). There are two issues with that approach:

`build-server shutdown` execs could cause issues, hopefully just perf and not stability.

Is there a better way to handle this, one that shuts down only the build servers in the process tree spawned for a given repository instead of killing all of them? If not, should we disable shared compilation completely? What if we disallowed toolset compiler packages entirely and required the VMR to build with the compilers in the SDK? While that might be possible at some point, there are times when e.g. runtime depends on a newer language feature than what's in a given SDK.
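Scoping shutdown to the process tree spawned per repository amounts to collecting every descendant of the repo build's root process. The sketch below works on a plain `pid -> parent pid` snapshot that is assumed to be collected elsewhere, e.g. from `/proc` or a platform API. The names are hypothetical. One caveat applies: compiler servers are meant to outlive the build that started them. If a server's parent has exited, the server may have been re-parented (on Linux, to PID 1 or a subreaper) and would fall outside this walk.

```python
from collections import defaultdict
from typing import Mapping


def descendants(parent_of: Mapping[int, int], root: int) -> set[int]:
    """All PIDs in the tree rooted at `root`, including `root` itself."""
    children: defaultdict[int, list[int]] = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)
    tree, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in tree:  # guard against cycles in a racy snapshot
                tree.add(child)
                stack.append(child)
    return tree
```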
In the meantime, I worked around this for Razor, which was hitting this issue, by disabling the compiler toolset package: dotnet/installer#18776
cc @jaredpar @mmitche @dotnet/source-build-internal