[API Proposal]: coreclr retrieving an existing host handle #56896

Open
tippisum opened this issue Aug 5, 2021 · 10 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-Host

@tippisum

tippisum commented Aug 5, 2021

Background and motivation

Currently, calling coreclr_initialize more than once is not supported. There is also no way to retrieve a host that has already been initialized.
As a result, anyone who wants to execute .NET Core code from native code must be the first to contact coreclr, or he is completely out of luck.

Situation: one is writing a plugin to be loaded by a native host. The native host requires the plugin to have a native entry point and is completely out of his control. There are also other plugins, likewise completely out of his control.
He then wants to write part of his code in C#, which requires hosting .NET Core. However, he is not the first one clever enough to use .NET Core, and someone else has already loaded coreclr into the process.
It is not hard to retrieve an already loaded coreclr module (with GetModuleHandleW, for example), but even with access to the coreclr module, he cannot run any managed code if coreclr_initialize has already been called by someone else and the returned host handle has not been shared with him.
He is okay with limitations such as only one version of coreclr existing in the process and the host being initialized only once. He is also willing to accept caveats, for example that a host initialized by others might not have its properties set exactly the way he wants. All he wants is a chance to try to load a managed assembly and execute some code.
But he is still completely out of luck in this case. Every coreclr method for loading managed code requires a host handle, and he has no way to get one unless he is the lucky one who makes the first call to coreclr_initialize in the whole process.

Request: a way for native users other than the first caller of coreclr_initialize to try to load and execute managed code, provided that they are aware of the limitations and caveats, such as only one coreclr version being able to exist and the host properties being unchangeable.

API Proposal

//
// Retrieves an already created host handle, if one exists.
//
// Parameters:
//  [out] hostHandle - Handle of an already created host
//
// Returns:
//  HRESULT indicating status of the operation. S_OK if there is a suitable handle.
//
extern int coreclr_get_active_host(void** hostHandle);

API Usage

HRESULT hr;
void* hostHandle = NULL;
hr = coreclr_get_active_host(&hostHandle);
if (SUCCEEDED(hr)) {
    // coreclr_initialize has already been called; try to load my code into this host.
} else {
    // No active host; try to do the host initialization myself.
}

Risks

It is not always possible to load an assembly into a host created by others: the .NET version may differ, some properties may conflict, and so on.
Nonetheless, without such an API it is simply impossible to even try.

@tippisum tippisum added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Aug 5, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-Host untriaged New issue has not been triaged by the area owner labels Aug 5, 2021
@ghost

ghost commented Aug 5, 2021

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.


@vitek-karas
Member

This should be supported since .NET Core 3.0. I should start by saying that native hosting should not call coreclr directly; instead, it should use the native hosting APIs provided by the hostfxr library. See the docs for a simple guide on the recommended way to do this.

There's a helper library, nethost, which implements the get_hostfxr_path function; it can locate either the already loaded hostfxr or find it on disk. (Note that hostfxr acts as the native hosting entry point; native hosting should not call into coreclr directly.) See the design doc for more details on this.

Once you have hostfxr loaded (nethost will return the path to the already loaded hostfxr, so loading it again will return the already loaded module), you would call hostfxr_initialize_for_runtime_config as usual. This will internally either load the runtime (and return what's also called the "primary" context) or simply try to "attach" to the already running runtime (internally this creates a so-called "secondary" context).

There are many more details on this in the same design doc.

I don't think we have a sample which would do this, but the idea is that the hosting code should not really see a big difference - the methods to call are exactly the same. So the https://github.com/dotnet/samples/tree/main/core/hosting/HostWithHostFxr sample is basically the exact same code one would use.

The design doc also discusses what functionality exists to compare the requirements of the new plugin and the existing runtime in the process (framework version resolution and so on).
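
The primary/secondary distinction described in this comment can be observed from the return value of the initialize call. The following is only a minimal sketch under stated assumptions, not the hosting sample itself. It assumes the status values 0, 1 and 2 (Success, Success_HostAlreadyInitialized, Success_DifferentRuntimeProperties) as listed in the host error-code documentation, and it uses a simplified function-pointer type in place of the real hostfxr.h declaration. The names open_context, context_kind and the fake_* functions are hypothetical and exist only so the sketch compiles without the .NET headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed status values; verify against the headers you actually build with.
   Anything else is treated as a failure. */
enum {
    HOST_SUCCESS = 0,                      /* runtime was not loaded yet: primary context */
    HOST_ALREADY_INITIALIZED = 1,          /* attached to a running runtime: secondary context */
    HOST_DIFFERENT_RUNTIME_PROPERTIES = 2  /* attached, but requested properties differ */
};

typedef void *hostfxr_handle;

/* Simplified stand-in for the hostfxr_initialize_for_runtime_config export.
   The real declaration uses char_t and a typed parameters struct. */
typedef int32_t (*init_for_runtime_config_fn)(const char *runtime_config_path,
                                              const void *parameters,
                                              hostfxr_handle *host_context_handle);

typedef enum {
    CONTEXT_PRIMARY,
    CONTEXT_SECONDARY,
    CONTEXT_SECONDARY_PROPS_DIFFER,
    CONTEXT_FAILED
} context_kind;

/* Calls the initialize function and reports which kind of context came back. */
context_kind open_context(init_for_runtime_config_fn init,
                          const char *config_path,
                          hostfxr_handle *out)
{
    assert(init != NULL && out != NULL);
    *out = NULL;
    switch (init(config_path, NULL, out)) {
    case HOST_SUCCESS:
        return CONTEXT_PRIMARY;
    case HOST_ALREADY_INITIALIZED:
        return CONTEXT_SECONDARY;
    case HOST_DIFFERENT_RUNTIME_PROPERTIES:
        return CONTEXT_SECONDARY_PROPS_DIFFER;
    default:
        *out = NULL;            /* do not hand out a handle on failure */
        return CONTEXT_FAILED;
    }
}

/* Fakes for demonstration only; they imitate the three outcomes. */
static int fake_token;

int32_t fake_first_load(const char *p, const void *params, hostfxr_handle *h)
{
    (void)p; (void)params;
    *h = &fake_token;
    return HOST_SUCCESS;
}

int32_t fake_attach(const char *p, const void *params, hostfxr_handle *h)
{
    (void)p; (void)params;
    *h = &fake_token;
    return HOST_ALREADY_INITIALIZED;
}

int32_t fake_failure(const char *p, const void *params, hostfxr_handle *h)
{
    (void)p; (void)params; (void)h;
    return -1;                  /* placeholder for any failure code */
}
```

In real code, `init` would be the export resolved from the hostfxr library whose path get_hostfxr_path returned, and a caller that receives CONTEXT_SECONDARY_PROPS_DIFFER would likely want to check whether the differing properties matter to it before proceeding.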

@vitek-karas vitek-karas added this to the Future milestone Aug 9, 2021
@vitek-karas vitek-karas removed the untriaged New issue has not been triaged by the area owner label Aug 9, 2021
@tippisum
Author

Yes, I'm aware of the new hostfxr hosting options.
But the problem cannot be solved this way: a single component that loads coreclr.dll directly breaks all hostfxr hosting options, and unfortunately, there is already code that loads coreclr.dll directly.
I'm OK with trying the hostfxr method first, but a coreclr_get_active_host style API is still necessary in case there is third-party code that uses coreclr.dll directly: such code already exists, and it cannot be prevented in the future. The coreclr hosting option is not even officially deprecated, just "not recommended in favor of the new hostfxr method", so more of it is to be expected in the future.

@vitek-karas
Member

I agree that the messaging around this is problematic (we should make it much more clear that directly calling coreclr is not really supported).

That said, I don't think we should be providing two APIs to solve the same problem. The coreclr based API (which this issue is about) would not provide any guarantees about anything - no way to validate framework compatibility, runtime versioning, dependency resolution, ... - the runtime simply doesn't have this information available to it (fixable, but hard). Also this API would only be useful if you basically trust the other component which loaded the coreclr into the process - not for security reasons, but for versioning and compatibility.

In addition to that - it would basically mean that every component should be using the coreclr directly - since the hostfxr way can't handle that case.

> there are already codes that load coreclr.dll directly

We can't prevent people doing weird things - you could call hostpolicy directly as well - definitely not recommended or supported, but you could. That would get you into yet another state which might be problematic for both hostfxr or coreclr based hosting calls.

I think we should fix the docs and basically remove the direct coreclr hosting pages - we can keep those in the repo as internal documentation, but avoid exposing these as "viable even if not encouraged" hosting option.

@tippisum
Author

tippisum commented Aug 12, 2021

I totally agree that it is not a good idea to do weird things.

But the real problem is that the one who did the weird things is not the one who suffers the consequences; in fact, their components just work as expected. Hell, they cannot even be blamed for doing weird things as long as the coreclr hosting option is not officially deprecated. It is the cooperative author who suffers and finds himself completely out of hope. This is not good.

For a cooperative plugin author, it is perfectly acceptable to prefer the "right thing to do" first (e.g. use the hostfxr hosting options) and stick to it whenever possible, but he also has to deal with the situation where someone has already chosen to load coreclr.dll directly.
Of course there are caveats, like versioning and compatibility problems, but is there actually any choice? "Accept and live with it, or die" is not really a matter of "trust".

I am aware that the proposed coreclr_get_active_host does not look great, but consider that

  1. "The right thing to do", that is, to support side-by-side hosting, is unlikely to be implemented anytime soon.
  2. There is not actually any choice. There can be a long list telling "why it is not a good thing to do and why you would not like to use it" (versioning, compatibility, etc.), but currently the only possible "alternative" is "to die" (i.e. to give up any hope to run any managed code).
  3. It is not hard to implement, and acts as the last resort.

It would be good to have such a last resort before completely giving up, showing an ugly error dialog that blames others for doing weird things (the end user will not care, of course), and hoping that the end user will understand this complex technical situation and find the right one to blame (they will not, of course).

Appendix.
The current workflow of loading .NET Core code works like this:

  1. Publish the managed library as a fake self-contained command-line application, then prepare another .runtimeconfig.json that is identical to the published one except that it is framework-dependent. (Only applications can be self-contained.)
  2. In the native entry point, load hostfxr.dll from the published application. (It is not possible to use the system-wide hostfxr.dll; see "SDK version of hostfxr.dll cannot initialize self-contained app" #56968.)
  3. Try hostfxr_initialize_for_runtime_config with the framework-dependent .runtimeconfig.json. This ensures maximum compatibility, since it will try to reuse any already loaded context and otherwise resolve to the shared framework runtime.
  4. If there is no loaded context and the user has not installed any framework runtime, retry with hostfxr_initialize_for_dotnet_command_line and load the self-contained runtime.
  5. As long as there is no context conflict and nobody else loads coreclr.dll directly, the loading process will succeed and can support any number of components sharing a single runtime host (either the shared framework one or the first loaded self-contained one).
  6. But if anyone has already loaded coreclr.dll and called coreclr_initialize, then all of the above will fail and there is nothing further one can do before exiting. <- This is where we currently are.
  7. Proposal: As a last resort, try to open a handle to the already loaded coreclr.dll (cooperative components never load coreclr.dll themselves), and attach to the initialized host by calling coreclr_get_active_host. <- The proposed API would be called here.

As you can see, the proposed API will not drive everyone to use coreclr.dll directly. In fact, a cooperative component author never loads coreclr.dll directly. The proposed API is only queried upon an already loaded coreclr.dll, and is only called after all the hostfxr methods fail.
As a result, adding coreclr_get_active_host is not likely to cause any harm. It will not drive anyone who is currently using the hostfxr methods to load coreclr.dll directly, and it is not likely to make people prefer coreclr_get_active_host as a way to share a host, because the coreclr_get_active_host way has significantly more caveats than using hostfxr (as you have already mentioned).
You can also make it explicit in the doc that one should never try to load coreclr.dll directly in order to use this API, but only query it upon an already loaded coreclr.dll module.
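
The ordering argued for above (hostfxr attempts first, the proposed API only as a last resort, and only against a coreclr module that is already loaded) could be sketched roughly as follows. This is an illustration under assumptions, not a working host: coreclr_get_active_host is the API proposed in this issue and is not exported by coreclr today, and acquire_host, host_ref, the callback types and all fake_* functions are hypothetical names introduced for the sketch. The symbol lookup is passed in as a callback; on Windows it might wrap GetModuleHandleW and GetProcAddress so that the module is never loaded by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: signature taken from the proposal in this issue. */
typedef int (*coreclr_get_active_host_fn)(void **hostHandle);

typedef void (*generic_fn)(void);

/* Looks a symbol up in an ALREADY LOADED coreclr module and must never load it.
   Returns NULL if the module is not loaded or the export is missing. */
typedef generic_fn (*find_loaded_export_fn)(const char *name);

/* One hostfxr-based attempt (steps 3 and 4); returns a context or NULL. */
typedef void *(*hostfxr_attempt_fn)(void);

typedef enum { HOST_NONE, HOST_FROM_HOSTFXR, HOST_FROM_ACTIVE_CORECLR } host_source;

typedef struct {
    host_source source;
    void *handle;
} host_ref;

host_ref acquire_host(const hostfxr_attempt_fn *attempts, size_t count,
                      find_loaded_export_fn find_export)
{
    host_ref result = { HOST_NONE, NULL };
    assert(attempts != NULL || count == 0);

    /* Cooperative path: try every hostfxr option in order. */
    for (size_t i = 0; i < count; ++i) {
        void *ctx = attempts[i]();
        if (ctx != NULL) {
            result.source = HOST_FROM_HOSTFXR;
            result.handle = ctx;
            return result;
        }
    }

    /* Last resort (step 7): query the proposed export on the loaded module. */
    if (find_export != NULL) {
        coreclr_get_active_host_fn get_active =
            (coreclr_get_active_host_fn)find_export("coreclr_get_active_host");
        void *handle = NULL;
        /* A non-negative HRESULT means success, as with SUCCEEDED(hr). */
        if (get_active != NULL && get_active(&handle) >= 0 && handle != NULL) {
            result.source = HOST_FROM_ACTIVE_CORECLR;
            result.handle = handle;
        }
    }
    return result;
}

/* Fakes for demonstration only. */
static int fake_context, fake_host;

void *hostfxr_fails(void) { return NULL; }
void *hostfxr_works(void) { return &fake_context; }

int fake_get_active_host(void **hostHandle)
{
    *hostHandle = &fake_host;
    return 0; /* S_OK */
}

generic_fn fake_find_export(const char *name)
{
    return strcmp(name, "coreclr_get_active_host") == 0
               ? (generic_fn)fake_get_active_host
               : NULL;
}

generic_fn no_coreclr_loaded(const char *name)
{
    (void)name;
    return NULL;
}
```

Keeping the lookup behind a callback makes the "never load coreclr.dll yourself" rule explicit at the call site: the caller decides how the already loaded module is found, and the chain falls through to HOST_NONE when nothing is available.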

@vitek-karas
Member

If all components behaved the way you describe it, that would be great. I just don't have the confidence that it's going to be the case. But I guess that's also an argument for adding the new API (since as you already mentioned there are existing bad-behaving components).

The reason I'm a bit hesitant still is that basically we would be adding an API which is unsupported from the very beginning. It's just really weird. The API would have to be "unsupported" because there are so many caveats around using it - there would be basically no meaningful way to use it and be confident that the scenario works - it would all be "hope it works". The other reason I'm hesitant is that if it doesn't work, the failure modes will be all over the place and super hard to diagnose.

I do understand the need for it if you run into the case you describe though.

Just to set expectations - even if we do decide to do it, it would not make .NET 6 (it's way too late for that).

I wanted to ask some additional questions since the 7 steps you describe above seem a bit weird to me:

1 - I assume you do all these tricks to have effectively self-contained component, right?

2 - I haven't had a chance to look into #56968 yet. My guess is that this is because the host has a bit of a weird behavior (long history) that if the app has hostfxr.dll and I think also coreclr.dll in its directory it will wrongly think it's self-contained, regardless of what the config says. But since you're trying to load a self-contained app, you should use the hostfxr from the self-contained runtime.
Another note - for step 3 to work correctly, you MUST use the already loaded hostfxr in the process - that's what nethost library is for - it will return the already loaded hostfxr if there's one.

4 - Using the same directory for self-contained and framework-dependent app leads to trouble - the host is just not made to work in that case. (which is probably the reason for the issues in step 2)

6 - Somebody loaded coreclr directly. Other than the discussion above, there's another problem with this case. If there's a component which loads coreclr directly, it will only work if it's the first one to be loaded. If the components got loaded in the reverse order (your nicely behaving one first, and then the bad-behaving one) it will create a mess - either it will fail if both try to use the exact same coreclr.dll file, or it will end up loading two .NET Core runtimes into the same process - which creates a whole new set of problems - it might work, or it might not. This is in no way the fault of your nicely behaving component; I'm just describing how trying to cooperate with a bad-behaving component is a losing proposition.

If you really need to have the component self-contained (which is problematic and causes a lot of trouble, as you're well aware), there might be a better way to do this:

  • Build the component as framework-dependent
  • Ship it with a local install of .NET (basically download the .zip for the right runtime, and extract it to some subdirectory) - note that size-wise this is not that different from a self-contained app - there will be a couple more files (like dotnet.exe), but it's almost the same set.
  • Load the hostfxr.dll from that local install to do the native hosting - if you were to use nethost (recommended) to find hostfxr for you, simply specify the local install path in the dotnet_root field of get_hostfxr_parameters structure. With this it will either return the already loaded hostfxr, or it will return the one from the local install.
  • Rely solely on hostfxr_initialize_for_runtime_config and specify the local install path in the dotnet_root field of the hostfxr_initialize_parameters (although this might not be 100% necessary, I think it defaults to the location the hostfxr was loaded from - but it's better to do it anyway).
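
Filling in the two dotnet_root fields mentioned in the list above might look like the sketch below. It is an assumption-laden illustration: the two structs are re-declared locally to mirror nethost.h and hostfxr.h as best recalled (real code should include those headers instead), char_t is taken to be char as on non-Windows platforms, and nethost_params_for and init_params_for are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* char_t is wchar_t on Windows in the real headers; plain char is assumed here. */
typedef char char_t;

/* Local mirror of the struct declared in nethost.h (assumed layout). */
struct get_hostfxr_parameters {
    size_t size;
    const char_t *assembly_path;
    const char_t *dotnet_root;
};

/* Local mirror of the struct declared in hostfxr.h (assumed layout). */
struct hostfxr_initialize_parameters {
    size_t size;
    const char_t *host_path;
    const char_t *dotnet_root;
};

/* Parameters for get_hostfxr_path: point nethost at the private install. */
struct get_hostfxr_parameters nethost_params_for(const char_t *local_root)
{
    struct get_hostfxr_parameters params;
    assert(local_root != NULL);
    params.size = sizeof(params);     /* lets the library check the struct version */
    params.assembly_path = NULL;      /* not used when dotnet_root is given */
    params.dotnet_root = local_root;
    return params;
}

/* Parameters for hostfxr_initialize_for_runtime_config: same private install. */
struct hostfxr_initialize_parameters init_params_for(const char_t *local_root)
{
    struct hostfxr_initialize_parameters params;
    assert(local_root != NULL);
    params.size = sizeof(params);
    params.host_path = NULL;          /* left unset in this sketch */
    params.dotnet_root = local_root;
    return params;
}
```

The same local install path is passed to both calls so that locating hostfxr and resolving the framework agree with each other; the path itself would be wherever the component unpacked its private runtime.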

This will give you benefits:

  • No hacks building the component
  • Still happily loads into already running .NET runtime in the process (if compatible)
  • If the component is the first to load the runtime, it will load the private one (effectively just like self-contained) - no need for special behavior with hostfxr_initialize_for_dotnet_command_line

And some downsides:

  • You need to somehow figure out where to download the right .zip from (versioning mostly)
  • You need to unzip and ship the runtime with the component (dotnet build/publish will not handle it magically for you)

We're discussing a very similar approach to allow multiple executables to share effectively self-contained runtime here: #53834

It still doesn't solve the problem of cooperation with bad-behaved components though.

@tippisum
Author

  1. Yes, I do want a self-contained component. I think that should be a very natural use case: if one does control the main application, then "hosting managed components" is likely not the place to start, since using a managed entry point and loading native components saves a lot of effort and mess. So when a component must be hosted, it is natural to assume that the main application is completely out of one's control and completely unaware of .NET Core. That means that when a component gets loaded into the process, it cannot assume the existence of a system-wide framework runtime. Requiring a separate setup process for end users to "load a plugin" also does not seem like a good idea. So the only solution is to host a self-contained component.
  2. Well, in fact it will not fail if I use either the framework hostfxr or the self-contained one and initialize a framework-dependent context. It will also succeed if I use the self-contained hostfxr and initialize a self-contained context. It does fail if I load the framework hostfxr and try to initialize a self-contained context. This is not a blocking issue though, since I can always load the self-contained hostfxr. I'm just a bit curious, since dotnet.exe (which should behave similarly to the framework hostfxr) does support executing self-contained applications.
  3. Yes, I forgot to mention a step to first look for an already loaded hostfxr.dll. This step should be performed before loading one's own hostfxr.dll. However, this step can be performed without nethost (by directly calling GetModuleHandleW(L"hostfxr.dll"), for example). nethost is not used here, since I either open an already loaded module or load the private hostfxr.dll from a known location.
  4. A "portable and redistributable" .NET Core runtime does seem promising, but I have not found any docs about this use case...

I'm well aware that dealing with uncooperative third-party components is a mess regardless and cannot be done in a clean and reliable way. But I do think the self-contained component is a justified use case (as I described in 1).
I have the impression that all the CoreCLR hosting APIs make an overall assumption that every piece of the hosting application is written with hosting CoreCLR in mind, which is not always the case. As I have mentioned, the main application may have no idea about .NET Core at all (so it will not bother to initialize and share a process-wide hosting environment), and there may be third-party components that utilize legacy coreclr hosting options, which makes it hard for others to cooperate with them.

The ultimate solution would be to support side-by-side hosting, but I do not think this will be implemented anytime soon. So in the meantime we still need to deal with various limitations and caveats. I understand your hesitation about adding a public API whose use cases are questionable by their very nature, but we can look at this problem the other way:

  1. With an open source project, the distinction between "public APIs" and "private implementation details" is not that big. If there is enough pressure, people will start hacking all the way around (like bypassing the API and dragging in random internal stuff). And if the alternative is not "you can achieve this in a more appropriate manner" but simply "to die", then the pressure will be high. Even I am open to the option of hacking into coreclr.dll internals. Adding an entry point for people with such a need both eases the pressure and provides a chance to tell them what they should do first to be cooperative before they start calling it (either in docs or in comments in header files, or the API could even have a special string argument which must be set to something like "I promise that I've already tried the hostfxr hosting options before and I'm fully aware that this API is not supported and I'm calling it as a last resort under my own risk" to make the API work).
  2. If you guys are starting to deprecate the coreclr hosting options, then the proposed API is not, strictly speaking, a public API. On the other hand, if loading coreclr.dll is not going to be deprecated anytime soon, then the need to live with that kind of component will only grow.
  3. Despite the use case being questionable, the definition of the proposed API is pretty clear and stable for the foreseeable future, and it should not take much effort to implement and maintain. Being "not supported" is actually what we want: we want people to avoid using it whenever possible, and to use it only when they have no other choice.
  4. I agree that a "hope it works" API does seem weird, but as we have already discussed, what we are really comparing here is not "hope to work" vs "guaranteed to work", but "hope to work" vs "no hope to work".

@vitek-karas
Member

> A "portable and redistributable" .NET Core runtime does seem promising, but I have not found any docs about this use case..

We don't have docs around this since there's no direct support for it in the SDK.

> If you guys are start deprecating coreclr hosting options

We already consider them deprecated to a large degree - but it's not correctly reflected in the docs unfortunately. We definitely encourage everyone to NOT use it. Just a note on the word "deprecated" here - the APIs will remain and probably keep working as-is for a while (we kind of have to do that for backward compat reasons).

While I agree that with open source software it's much easier to "hack" around into the "unsupported" things I think there's still a big difference:

  • Unsupported things are just that - unsupported. If things break... we won't pay too much attention to it.
  • We will not think too much about breaking people by changing these - if it's beneficial to make changes, we will do it - and break people who rely on this.

So typically such a solution might work well in one version, but can easily break in future versions.

On the topic of self-contained components (plugins) - while I understand the problem with distributing a shared framework install, using the self-contained deployment model is also not ideal. At least until there's some kind of support for hosting multiple versions SxS in one process. This doesn't have a good solution right now, but what we thought might be acceptable is:

  • Such components should ideally set rollForward=LatestMajor, so that they load the latest available framework on the machine
  • With such setting, it typically doesn't matter much in which order components are loaded, since either of them will end up loading the latest runtime anyway.

If one component breaks this (by either using different roll forward setting, or being self-contained), then it is very easy to break things by just loading components in different order. The self-contained is somewhat worse in that it could in theory ship some weird/unsupported version of the runtime (or for example delete some parts of it to save space and so on) - any component loaded after it would have to be able to run on that - which is unreasonable to support correctly.

@tippisum
Author

I'm well aware of what you said about compatibility concerns, and I agree that a cooperative component should try loading the framework runtime whenever possible.
However, a component still needs to deal with the problem that the target machine may not have a shared framework runtime preinstalled, and being a "component" means it is usually not a great idea to ask for a separate install phase which requires administrative privileges and installs a rather heavy "framework" onto the user's machine. So some form of self-containment is still necessary.
Currently I use a fake self-contained application to achieve this (I do still try to load it as a framework-dependent component first, and only fall back to the self-contained runtime as a backup). It is rather hacky, but it does work, and to my knowledge it does not strictly break any API contract.
I'm also interested in other options like redistributing a portable .NET runtime, if it can make things work.

The core idea is that people need to make things work, whether in the right/expected way or not. Whenever something does not work, there is pressure. Even if a nice and cooperative author is willing to stick to the best practice at first, he will start doing hacky/unexpected (but still, strictly speaking, legal) things if "the best practice" cannot work. And if it still does not work, then even the nicest and most cooperative author will get mad, and either start doing all kinds of "bad things" or eventually conclude that the whole framework does not deserve the investment, throw it out of the window, and switch to something "all the way dirty and out of date but can make things work".

@vitek-karas
Member

The CoreCLR direct hosting sample and guide has been removed in dotnet/docs#25818 and dotnet/samples#4786.

@agocke agocke added this to AppModel Jul 13, 2022
2 participants