Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FDD app uses DOTNET_ROOT even though it points to wrong architecture #68180

Closed
rseanhall opened this issue Apr 18, 2022 · 17 comments
Closed

FDD app uses DOTNET_ROOT even though it points to wrong architecture #68180

rseanhall opened this issue Apr 18, 2022 · 17 comments
Labels
area-Host question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@rseanhall
Copy link
Contributor

rseanhall commented Apr 18, 2022

Description

nethost returns location of 64-bit hostfxr.dll to 32-bit process.

Reproduction Steps

I have a custom native host that uses nethost to locate hostfxr.dll. I'm running into a problem when testing this native host inside of Xunit tests from "dotnet test". "dotnet test" is 64-bit and is setting DOTNET_ROOT to "C:\Program Files\dotnet". This environment variable is propagating all the way to the testhost.exe process. In my test, I'm calling Process.Start so that is also propagating to my 32-bit custom native host. This means that nethost is returning the path to a 64-bit hostfxr.dll for my 32-bit process.

Expected behavior

Ignore DOTNET_ROOT since it is the wrong architecture.

Actual behavior

Trusts DOTNET_ROOT is correct.

Regression?

Yes. The issue seems to be from #53763. Before that change, x86 nethost would never have used DOTNET_ROOT, only DOTNET_ROOT(x86).

Known Workarounds

In my case, I can workaround this by clearing the DOTNET_ROOT environment variable. Presumably a 32-bit apphost would have the same problem, which might be harder to workaround.

Configuration

  • .NET 6.0.4
  • Win10
  • OS x64, app x86
  • runtime.win-x86.Microsoft.NETCore.DotNetAppHost, version=6.0.4

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added area-Host untriaged New issue has not been triaged by the area owner labels Apr 18, 2022
@ghost
Copy link

ghost commented Apr 18, 2022

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

nethost returns location of 64-bit hostfxr.dll to 32-bit process.

Reproduction Steps

I have a custom native host that uses nethost to locate hostfxr.dll. I'm running into a problem when testing this native host inside of Xunit tests from "dotnet test". "dotnet test" is 64-bit and is setting DOTNET_ROOT to "C:\Program Files\dotnet". This environment variable is propagating all the way to the testhost.exe process. In my test, I'm calling Process.Start with UseShellExecute=false so that is also propagating to my 32-bit custom native host. This means that nethost is returning the path to a 64-bit hostfxr.dll for my 32-bit process.

Expected behavior

Ignore DOTNET_ROOT since it is the wrong architecture.

Actual behavior

Trusts DOTNET_ROOT is correct.

Regression?

Yes. The issue seems to be from #53763. Before that change, x86 nethost would never have used DOTNET_ROOT, only DOTNET_ROOT(x86).

Known Workarounds

In my case, I can workaround this by using UseShellExecute=true to get a clean environment. Presumably a 32-bit apphost would have the same problem, which might be harder to workaround.

Configuration

  • .NET 6.0.4
  • Win10
  • OS x64, app x86
  • Not specific

Other information

No response

Author: rseanhall
Assignees: -
Labels:

area-Host, untriaged

Milestone: -

@agocke
Copy link
Member

agocke commented Apr 19, 2022

As you linked, this is by-design. DOTNET_ROOT is now the global override variable for the dotnet location. However, it doesn't really sound like the problem is the environment variable name -- it sounds like you just want to ignore DOTNET_ROOT entirely.

Since you already have your own custom host, could you simply clear the DOTNET_ROOT environment variable and proceed with your custom dotnet location?

Otherwise, it sounds like dotnet test is simply passing the wrong variable. Consider if you just wanted to run tests under x86 -- the x64 location would not be the correct one. dotnet test should probably have a configuration switch for that behavior.

@agocke agocke removed the untriaged New issue has not been triaged by the area owner label Apr 19, 2022
@agocke agocke modified the milestones: 7.0.0, Future Apr 19, 2022
@agocke agocke added the question Answer questions and provide assistance, not an issue with source code or documentation. label Apr 19, 2022
@rseanhall
Copy link
Contributor Author

However, it doesn't really sound like the problem is the environment variable name -- it sounds like you just want to ignore DOTNET_ROOT entirely.

This is not true. I would like my custom host to support standard functionality. What I want is for the nethost logic to be able to reject locations with the wrong architecture.

and proceed with your custom dotnet location?

I don't have a custom dotnet location. I need nethost to give me the standard path.

Since you already have your own custom host, could you simply clear the DOTNET_ROOT environment variable

I might have to do this, but as I mentioned above that is removing standard functionality so I would rather not.

dotnet test should probably have a configuration switch for that behavior.

Why would there be a configuration switch? As you said, they are simply passing the wrong variable. The problem is that this was a breaking change and dotnet test didn't do anything to address it. The only reason more people aren't seeing this is because it appears their testhost.exe is still a .NET Core 2.1 apphost even when building with the .NET 6 SDK targeting .NET 6.

@agocke
Copy link
Member

agocke commented Apr 19, 2022

Why would there be a configuration switch? As you said, they are simply passing the wrong variable.

Maybe I'm not understanding the scenario fully. It sounds like dotnet test is starting a 32-bit host process, but passing a 64-bit dotnet root. Is that right?

@vitek-karas
Copy link
Member

Thanks for reporting this @rseanhall!

The change in #53763 caused the DOTNET_ROOT to act as the ultimate fallback. If DOTNET_ROOT(x86) is set it still takes precedence over DOTNET_ROOT. Also after that change DOTNET_ROOT_X86 is the first one to be checked and if specified is used over anything else.

The x64 dotnet test only sets DOTNET_ROOT, but it sets it always! (which is problematic). So when it's trying to run an x86 test, it still sets DOTNET_ROOT to point to the x64 runtime. Before #53763 the x86 apphost would ignore DOTNET_ROOT (and only rely on DOTNET_ROOT(x86)), so in this scenario it would fallback to looking for default install location (usually this would mean reading the location from registry) and it would end up working. But after #53673 the apphost will now pickup the DOTNET_ROOT value, and eventually fail as it's the wrong bitness.

I still believe the design of #53673 was correct - as it makes the behavior of DOTNET_ROOT predictable: "If nothing else is set, it will be used as-is" and consistent across platforms. But the interplay with the dotnet test is broken. @rseanhall is right that it doesn't show up for normal tests because they still rely on testhost.exe which is built for netcoreapp2.1 and thus doesn't have #53763.
The design is described here: https://github.com/dotnet/designs/blob/main/accepted/2021/install-location-per-architecture.md

We should fix dotnet test, but this raises a question: #53673 is effectively a breaking change in this scenario... should we try to fix that somehow?

I don't think the host should try to detect if the DOTNET_ROOT points to the correct architecture (that's a really tricky thing to do correctly on all platforms). So on non-Windows I do believe the behavior is correct (and it behaved this way since forever). The question is on Windows where DOTNET_ROOT was not recognized inside x86 apphost and now it is. Maybe we need to special case x86 and make it ignore DOTNET_ROOT as it used to, but it doesn't feel right. Personally I would be fine if we speced this as an intentional breaking change (and we need to fix dotnet test).

As for dotnet test - it should behave similarly to what dotnet run does. dotnet run will only set DOTNET_ROOT (if it's not already set) if the architecture of the target project is the same as the architecture of the SDK itself. In the case where the architectures don't match there's no reason the SDK should set DOTNET_ROOT as it will basically always get it wrong. Also dotnet run will now set DOTNET_ROOT_X64 (or other architecture specific) if the app is TFM 6 or higher - in that case it will not set DOTNET_ROOT (without arch) at all. dotnet test should do more or less the same thing.

Fixing dotnet test is important for potential future improvements: for example once it supports single-file tests those will be only executables and thus it will have to get the env. variable correctly anyway.

@MarcoRossignoli @pavelhorak for dotnet test context.

@rseanhall
Copy link
Contributor Author

The test project is 64-bit. The tests run both 32-bit and 64-bit versions of the custom host. The test project needs to be 64-bit so it can easily access 64-bit files and registry keys.

dotnet test doesn't always have access to the project, in case that matters. For local development, the project is built and then the build outputs are copied to a VM to be run there.

This is not the first time that dotnet test configuration leaking to the testhost.exe process has broken us.

@vitek-karas
Copy link
Member

Thanks for the scenario description @rseanhall.
The answer is still the same - ideally dotnet test should only set the architecture specific env. variables for TFM 6+ - so in this case DOTNET_ROOT_X64. So for the same architecture, the test will use the same runtime as the SDK, for any other architecture it's left on its own (so system defaults unless modified by the tests, or externally).
We're making an assumption here: You want to use the same runtime as the SDK you're using to run the tests.

In your case, if your tests are actually spawning a child process, I would consider simply cleaning the environment for these child processes. Or have a different mechanism how to specify which runtime they should use. (That's not to say that dotnet test doesn't need fixing).

@rseanhall
Copy link
Contributor Author

Can we tweak the fallback behavior so that DOTNET_ROOT is used last? Meaning that the order would be:

  1. Architecture specific DOTNET_ROOT
  2. System installed locations and everything else
  3. DOTNET_ROOT

This would be a breaking change, but I think this is a better breaking change than the one that was done in .NET 6.

@vitek-karas
Copy link
Member

I understand that this order makes sense when thinking about child processes of dotnet test where currently DOTNET_ROOT is pretty much always set. But that's an exception to the rule - most of the time DOTNET_ROOT is not set, and if it is, it's an explicit action.
Also generally we don't ask people to use the architecture specific variables unless necessary - mostly it's not needed - since it's added complexity. DOTNET_ROOT is used most commonly to specify a private install location - which are almost always single-architecture. So the current behavior makes sense to me.

But I do agree that the current behavior of dotnet test makes this confusing to say the least. I think we should just fix that. There's even a chance that such a fix could be backported to 6.0, the chance of doing that for the host is very slim.

@rseanhall
Copy link
Contributor Author

I'm not sure about other operating systems, but for Windows the environment variables are propagated to child process by default. Starting a process on Windows with an empty environment is simple but not usable. Starting a process on Windows with the default environment variables set is quite complex.

DOTNET_ROOT is used most commonly to specify a private install location - which are almost always single-architecture.

I must be missing something here. I thought DOTNET_ROOT should always be pointing to a directory that only contains runtime(s) that are all the same architecture.

This issue affects all programs that are run with DOTNET_ROOT that then spawn child processes with the .NET 6 fxr resolver logic. If that child process is a different architecture than the parent process and the parent process doesn't somehow clear DOTNET_ROOT for the child process (who's going to know they need to do that?), then the child process will fail.

@vitek-karas
Copy link
Member

Vast majority of .NET programs are executed without DOTNET_ROOT set at all - so those are not affected.

By far the most common usage for DOTNET_ROOT is CI systems or local private installs for isolation purposes which all want to force usage of a specific .NET runtime/SDK. In those cases it's always a single architecture (and also almost always single version). In these scenarios it's actually mostly undesirable for anything to "escape" the private runtime bubble.

Another one is apparently dotnet test - but that one is weird (should be fixed) and somewhat special.

All the more reason that tools which do manipulate DOTNET_ROOT should use the architecture specific version only. And for developers/users who want to set this by hand to use DOTNET_ROOT - as it sort of avoids random apps to escape from the private bubble. This is BTW the exact reason why we are disabling DOTNET_MULTILEVEL_LOOKUP by default starting with .NET 7 - if somebody uses private runtime they don't want anything to randomly use the global install.

@rseanhall
Copy link
Contributor Author

@agocke @vitek-karas Please make sure microsoft/vstest#3586 gets prioritized correctly.

@agocke
Copy link
Member

agocke commented Apr 19, 2022

DOTNET_ROOT is used most commonly to specify a private install location - which are almost always single-architecture. So the current behavior makes sense to me.

I agree with this -- the design intent is that DOTNET_ROOT is meant to be an override for the framework location. It seems complicated and confusing for, say, macOS ARM64 users to have to use DOTNET_ROOT_ARM64 instead of DOTNET_ROOT to override the framework location, just because their OS default architecture happens to be ARM64 instead of x64.

This does, however, make things more complicated for users who want to start new processes. Most of the time, DOTNET_ROOT will not be set and there will be no action. But if DOTNET_ROOT is set, those users have to decide whether or not to pass the variable to their child process. There is no one correct answer here. People shipping their own private runtime and running system-installed tools, for instance, will likely not want to pass the variable to their children and will need to clear it during the Process.Start. People who want to use the same runtime for all their processes will want to pass it down.

I agree with @vitek-karas that the best solution here is to limit DOTNET_ROOT setting in dotnet test.

@agocke
Copy link
Member

agocke commented Apr 19, 2022

@rseanhall Reading through, it sounds like my characterization is incorrect:

Maybe I'm not understanding the scenario fully. It sounds like dotnet test is starting a 32-bit host process, but passing a 64-bit dotnet root. Is that right?

It sounds like that's not true. If dotnet test is running as an x64 process, and running an x64 test, it will be up to you to remove the DOTNET_ROOT variable if you start an x86 process. Vitek's proposed solution would only matter if the test host were 32-bit, not 64-bit.

@rseanhall
Copy link
Contributor Author

Vitek's proposed solution is for dotnet test to only set the architecture specific DOTNET_ROOT, which would fix my issue.

The answer is still the same - ideally dotnet test should only set the architecture specific env. variables for TFM 6+ - so in this case DOTNET_ROOT_X64
All the more reason that tools which do manipulate DOTNET_ROOT should use the architecture specific version only

@agocke
Copy link
Member

agocke commented Apr 19, 2022

Ah, I missed that, thanks.

@ghost ghost locked as resolved and limited conversation to collaborators May 19, 2022
@vitek-karas
Copy link
Member

Vitek's proposed solution is for dotnet test to only set the architecture specific DOTNET_ROOT, which would fix my issue.

Sorry to have missed this (and that it took almost a year :-()
That is unfortunately not entirely correct. If we were only talking about .NET 7, that would be ideal solution - never set DOTNET_ROOT itself, and always use the architecture specific variant.
Unfortunately, dotnet test can't do that right now, because it uses testhost which is built for .NET Core 3.1. There are many reasons for this, the main one is that the same host works for any test. That host doesn't recognize architecture specific DOTNET_ROOT_arch unfortunately. So dotnet test will still have to set DOTNET_ROOT for that case.

The test infra implemented a change which did the ideal solution (not setting DOTNET_ROOT at all) for .NET 6+ projects, but they had to revert it as it broke (I don't remember the exact reason, but it was something with the testhost again).

Looking at the current behavior the infra has two modes:

  • With testhost - this will set DOTNET_ROOT(x86) if that variable is set outside - and in this case it will not set DOTNET_ROOT. Otherwise it sets DOTNET_ROOT - if it was set before, it will leave it as is, otherwise it will set it to the path to the SDK currently running.
  • Without testhost - it will just run the right dotnet.exe (based on the target architecture) and not set any environment variables at all (as far as I can tell).

I tested the "with testhost" and the above description fits the observed behavior.

I don't know if there's a way to force the "without testhost" case, which would be ideal for the use case here.

@MarcoRossignoli or @Evangelink might know.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Host question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

3 participants