Custom internal calls in .NET Core hosting #11941

Closed
nxtn opened this issue Jan 31, 2019 · 16 comments

@nxtn
Contributor

nxtn commented Jan 31, 2019

P/Invoke is not as efficient as internal calls when native code is called frequently, for instance in a game scripting runtime.

Internal calls in CoreCLR are hard-coded in ecalllist.h and limited to the scope of mscorlib.dll, whereas Mono provides a mono_add_internal_call API, which became the first choice of Unity and CRYENGINE.

Will you provide an API to register custom internal calls?

@benaadams
Member

Would the calli proposal in the csharp "Compiler Intrinsics" https://github.com/dotnet/csharplang/blob/master/proposals/intrinsics.md#calli fit this need? (Though I don't know if that's the latest variant)

/cc @jaredpar

@nxtn
Contributor Author

nxtn commented Jan 31, 2019

@benaadams Unfortunately, the calli instruction may not be faster than P/Invoke.

I haven't tested it myself, but someone has:
https://ybeernet.blogspot.com/2011/03/techniques-of-calling-unmanaged-code.html

@benaadams
Member

Behaviour should be improved in coreclr dotnet/coreclr#13756

@mattwarren
Contributor

mattwarren commented Jan 31, 2019

(Hopefully I'm not completely wrong with this reply; this is what I've figured out from looking at the CLR source code.)

One issue I can see is that all the internal calls inside the runtime have to do a few things to remain safe, i.e. look at DebugStackTrace::GetStackFramesInternal (wired up here). If you look at the first few lines https://github.com/dotnet/coreclr/blob/83fcf2e552d190892d9ed7264ebef94b379f1f11/src/vm/debugdebugger.cpp#L333-L347

It has to play nice with the GC (GCPROTECT_BEGIN), erect a frame to make stack walking work, and do a few other things. See this gist for the expanded versions of some of these macros.

I assume that all of this is taken care of, in the generated stubs, when using P/Invoke. But I guess that it would need to happen in any native code that the runtime called into, otherwise 'bad things would happen!'

For a bit more info see Calling from managed to native code, Is your code GC-safe? and PInvokes

@bencyoung

bencyoung commented Jan 31, 2019

We currently use P/Invoke to call into things like MKL and the overhead does show up in backtraces. We only pass unsafe pointers to unmanaged code objects so if there was a faster way to invoke native code that:

  1. Didn't need marshalling
  2. Never called back into .NET
  3. Didn't manipulate any .NET objects

then we'd be extremely happy!
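As a rough illustration of the kind of native function this list describes, here is a minimal C sketch. The name `leaf_dot` is hypothetical (it is not an MKL routine): it takes only raw pointers and a length, so nothing needs marshalling, it never calls back into managed code, and it does not touch any .NET objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical leaf function, for illustration only.
 * - Arguments are plain pointers and a size, so no marshalling is required.
 * - It makes no callbacks, so control never returns to managed code midway.
 * - It reads caller-provided unmanaged buffers and touches no managed objects. */
double leaf_dot(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Whether a particular runtime can skip part of the call transition for a function like this is a separate question; the sketch only shows the shape of the callee.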

@jkotas
Member

jkotas commented Feb 7, 2019

I think the proper way to address this is to add an attribute to annotate PInvoke methods that always take less than a microsecond and that do not do other problematic things, like taking locks that can deadlock with the GC. The runtime would recognize this attribute and skip the full PInvoke transition for these methods.

More details in dotnet/coreclr#22383 (comment)

@nxtn
Contributor Author

nxtn commented Feb 24, 2019

Thanks for the information, @benaadams. I did a simple test, and the calli instruction turned out to be the best solution for now.

Interestingly, Math.Sqrt(Double) runs as fast as Convert.ToDouble(Double).

[Image: line chart of the test results]

@mjsabby
Contributor

mjsabby commented Feb 24, 2019

Not that I recommend this for your code if you're satisfied with the performance, but you can use calli with the managed calling convention to get better performance at the expense of delaying garbage collection. I did not see that in the benchmark you posted above.

On x64 the managed calling convention is the native x64 calling convention, and on Windows x86 I believe it is __fastcall.

You can read more about this in 15.5.6.3 Fast calls to unmanaged code in Partition II: Metadata Definition and Semantics (With Added Microsoft Specific Implementation Notes)

@nxtn
Contributor Author

nxtn commented Feb 24, 2019

> I think the proper way to address this is to add an attribute to annotate PInvoke methods ...

I honestly don't know why delegates retrieved by Marshal.GetDelegateForFunctionPointer will be a lot slower.

@jkotas
Member

jkotas commented Feb 24, 2019

> calli instruction turned out to be the best solution for now.

The difference between calli and PInvoke is noise. I would recommend choosing the one out of these two that gives you the most maintainable code. The performance gains you get by being able to maintain and refactor your code easily will be much higher than what you get from calli vs. PInvoke.

> Marshal.GetDelegateForFunctionPointer will be a lot slower.

The delegate has an extra indirection in it. This indirection costs extra instructions.
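As a loose analogy in C (not a description of how the CLR actually implements delegates), compare calling a function pointer directly with calling it through a wrapper object that stores the pointer. All names here are hypothetical. The wrapped call has to load the target out of the object before it can make the call, which is the sort of extra step an indirection adds.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*native_fn)(int);

/* Hypothetical stand-in for an object that wraps a function pointer. */
typedef struct {
    native_fn target;
} wrapper;

/* Example callee. */
int twice(int x) { return 2 * x; }

/* Direct call: the target address is already in hand. */
int call_direct(native_fn fn, int arg)
{
    return fn(arg);
}

/* Wrapped call: one extra memory load (w->target) before the call itself. */
int call_wrapped(const wrapper *w, int arg)
{
    assert(w != NULL && w->target != NULL);
    return w->target(arg);
}
```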

> 15.5.6.3 Fast calls to unmanaged code

If you consider doing something like this, make sure that the unmanaged code that you are calling meets all the constraints listed in the doc.

@nxtn
Contributor Author

nxtn commented Feb 24, 2019

What I actually need is to call function pointers that unmanaged code provides at runtime and that cannot be retrieved through NativeLibrary.Load and NativeLibrary.GetExport. So P/Invoke doesn't work in this scenario, Marshal.GetDelegateForFunctionPointer seems to be slow and cumbersome, and calli is unsafe... I wonder whether there is a better workaround.
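To make the scenario concrete, here is a minimal C sketch of what the native side might look like. Every name in it (`host_table`, `host_lookup`, `host_add`, `host_mul`) is hypothetical. The functions are `static`, so they are not exported symbols and could not be found by name in the library's export table; the host can only hand out their addresses at runtime, and the managed side would receive each one as a raw pointer value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by the functions the host hands out (an assumption). */
typedef int (*script_fn)(int, int);

typedef struct {
    const char *name;
    script_fn   fn;
} host_entry;

/* Not exported: invisible to symbol lookup, reachable only by address. */
static int host_add(int a, int b) { return a + b; }
static int host_mul(int a, int b) { return a * b; }

static const host_entry host_table[] = {
    { "add", host_add },
    { "mul", host_mul },
};

/* Returns the function pointer registered under `name`, or NULL if absent. */
script_fn host_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof host_table / sizeof host_table[0]; ++i) {
        if (strcmp(host_table[i].name, name) == 0) {
            return host_table[i].fn;
        }
    }
    return NULL;
}
```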

@mjsabby
Contributor

mjsabby commented Feb 24, 2019

Why can’t you use calli with the unmanaged calling convention? That will solve your problem without having to worry about the constraints and GC delaying of using a managed calling convention.

@nxtn
Contributor Author

nxtn commented Feb 24, 2019

> Not that I recommend this for your code

You said you don't recommend this?

@mjsabby
Contributor

mjsabby commented Feb 25, 2019

calli with the managed calling convention, described in 15.5.6.3 Fast calls to unmanaged code, is the one that I do not recommend without understanding the constraints.

calli with an unmanaged calling convention, like cdecl, stdcall, etc., still erects a PInvoke frame and is as safe as .... well, as calling into any unmanaged code is, so that would work for the scenario you describe.

nxtn changed the title from "Custom internal calls in custom .NET Core host" to "Custom internal calls in .NET Core hosting" on Apr 15, 2019
@nxtn
Contributor Author

nxtn commented Apr 16, 2019

As an experiment I tested calli with managed calling convention. It's super fast...

Results still in https://github.com/dotnet/coreclr/issues/22320#issuecomment-466736048

msftgits transferred this issue from dotnet/coreclr on Jan 31, 2020
msftgits added this to the Future milestone on Jan 31, 2020
@AaronRobinsonMSFT
Member

@NextTurn I don't see any current action here, so I am closing this. Please file a new issue if there is some action that is being suggested or asked about. Thank you.

ghost locked this conversation as resolved and limited it to collaborators on Dec 14, 2020