Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attribute for minimal runtime impact during an unmanaged call #30741

Closed
AaronRobinsonMSFT opened this issue Sep 1, 2019 · 29 comments
Closed
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices
Milestone

Comments

@AaronRobinsonMSFT
Copy link
Member

The work in dotnet/coreclr#26458 is about exposing a way for users to indicate to the runtime that the P/Invoke (unmanaged call) is well-behaved (see API below) enough that the runtime doesn't need to erect a GC transition frame.

Rationale and Usage

There is non-trivial overhead when making a P/Invoke call in the runtime. If the target function is fast enough this overhead can dominate the call rather than the contents of the function. For example, the Win32 function GetTickCount() is typically less than 50 instructions however the transition to execute that function is over 60 instructions. This means that setting up to call the function is more expensive than actually running the function itself.

There are limits to the benefit here since this does allow the user to create a bad situation where GC starvation, data corruption, or immediate runtime termination can occur. The traditional P/Invoke method via DllImportAttribute is designed to ensure all calls are safe with respect to the runtime memory model. This attribute would provide a way to circumvent those guarantees. See the below proposed API for additional details.

Community context:

Usage example:

[DllImport("Kernel32", EntryPoint = "GetTickCount")]
[SuppressGCTransition]
public static extern int GetTickCount_NoOverhead();

[DllImport("Kernel32", EntryPoint = "GetTickCount")]
public static extern int GetTickCount_Overhead();

Proposed API

See related PR for complete context. Some alternative names have been proposed that focus on intent rather than implementation detail:

The benefit of focusing on intent is additional optimizations can be enabled with this attribute as they are discovered.

The prototype in the aforementioned PR uses the below proposed API.

namespace System.Runtime.InteropServices
{
    /// <summary>
    /// An attribute used to indicate a GC transition should be skipped when making an unmanaged function call.
    /// </summary>
    /// <example>
    /// Example of a valid use case. The Win32 `GetTickCount()` function is a small performance related function
    /// that reads some global memory and returns the value. In this case, the GC transition overhead is significantly
    /// more than the memory read.
    /// <code>
    /// using System;
    /// using System.Runtime.InteropServices;
    /// class Program
    /// {
    ///     [DllImport("Kernel32")]
    ///     [SuppressGCTransition]
    ///     static extern int GetTickCount();
    ///     static void Main()
    ///     {
    ///         Console.WriteLine($"{GetTickCount()}");
    ///     }
    /// }
    /// </code>
    /// </example>
    /// <remarks>
    /// This attribute is ignored if applied to a method without the <see cref="System.Runtime.InteropServices.DllImportAttribute"/>.
    ///
    /// Forgoing this transition can yield benefits when the cost of the transition is more than the execution time
    /// of the unmanaged function. However, avoiding this transition removes some of the guarantees the runtime
    /// provides through a normal P/Invoke. When exiting the managed runtime to enter an unmanaged function the
    /// GC must transition from Cooperative mode into Preemptive mode. Full details on these modes can be found at
    /// https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/clr-code-guide.md#2.1.8.
    /// Suppressing the GC transition is an advanced scenario and should not be done without fully understanding
    /// potential consequences.
    ///
    /// One of these consequences is an impact to Mixed-mode debugging (https://docs.microsoft.com/visualstudio/debugger/how-to-debug-in-mixed-mode).
    /// During Mixed-mode debugging, it is not possible to step into or set breakpoints in a P/Invoke that
    /// has been marked with this attribute. A workaround is to switch to native debugging and set a breakpoint in the native function.
    /// In general, usage of this attribute is not recommended if debugging the P/Invoke is important, for example
    /// stepping through the native code or diagnosing an exception thrown from the native code.
    ///
    /// The P/Invoke method that this attribute is applied to must have all of the following properties:
    ///   * Native function always executes for a trivial amount of time (less than 1 microsecond).
    ///   * Native function does not perform a blocking syscall (e.g. any type of I/O).
    ///   * Native function does not call back into the runtime (e.g. Reverse P/Invoke).
    ///   * Native function does not throw exceptions.
    ///   * Native function does not manipulate locks or other concurrency primitives.
    ///
    /// Consequences of invalid uses of this attribute:
    ///   * GC starvation.
    ///   * Immediate runtime termination.
    ///   * Data corruption.
    /// </remarks>
    [AttributeUsage(AttributeTargets.Method, Inherited = false)]
    public sealed class SuppressGCTransitionAttribute : Attribute
    {
        public SuppressGCTransitionAttribute()
        {
        }
    }
}

/cc @jkotas @stephentoub @jeffschwMSFT @jkoritzinsky

@scalablecory
Copy link
Contributor

Can you give more explicit guidance on appropriate usage? The examples you give don't give me enough confidence to apply this.

Supressing the GC transition is an advanced scenario and should not be done without fully understanding potential consequences.

What is a GC transition? What is the benefit that I am foregoing by disabling it?

Native function performs syscall (e.g. Any type of I/O).
Native function may execute for a non-trivial amount of time (more than 1 microsecond).

Does this include a context switch? What happens if there is a page fault?

Native function requires extensive object manipulation during marshalling.

What is "extensive"?

@scalablecory
Copy link
Contributor

CC @AndyAyersMS

@AaronRobinsonMSFT
Copy link
Member Author

Updated the API with some additional details.

Does this include a context switch? What happens if there is a page fault?

A context switch is most often during a syscall which is mentioned. Exceeding a thread's quantum isn't something that can be predicted so I didn't mention it. A page fault is an interesting question since it can be a syscall. It might be worth adding that allocating memory from the heap should be avoided. I will leave that clarification decision up to @jkotas.

@danmoseley
Copy link
Member

@JeremyKuhne was thinking about making pinvoke cheaper : probably you already talked.

@AaronRobinsonMSFT
Copy link
Member Author

@danmosemsft Unfortunately, we have not talked about light weight P/Invokes. I welcome any suggestions here or at dotnet/coreclr#26458.

@jkotas
Copy link
Member

jkotas commented Sep 2, 2019

allocating memory from the heap should be avoided

Correct. Allocating memory from the heap violates "Native function may execute for a non-trivial amount of time."

@jkotas
Copy link
Member

jkotas commented Sep 2, 2019

@lambdageek Does this concept make sense for Mono? What would be a good name for it from the Mono point of view?

@AaronRobinsonMSFT
Copy link
Member Author

@terrajobst Any chance to get this on the API review docket?

@JeremyKuhne
Copy link
Member

There are a number of cases where this will be helpful. One that just came up for me was getting the current system colors on Windows, which is nothing more than a bounds check and a de-index from an array. We used to cache our own table, firing up a thread to watch for system setting change messages so we could invalidate messages to avoid the call overhead. We now hit the API every time you use a Color that came from a known system color, which in some cases can end up being a lot of unnecessary work.

@terrajobst
Copy link
Member

terrajobst commented Oct 4, 2019

Before we review it, I'd like to get some input from the Mono folks. /cc @lambdageek @marek-safar

@marek-safar
Copy link
Contributor

Did I get it right that the attribute is valid only in the context of DllImport? Wouldn't be better to include there instead of having a new attribute or are you expecting the attribute to be useful elsewhere?

@lambdageek
Copy link
Member

lambdageek commented Oct 7, 2019

At first blush, this: a. it makes sense for Mono; b. has the right restrictions on what the function may do.

Need to think about it a bit more.

What would be a good name for it from the Mono point of view?

I've been trying to call them "thread state transitions" rather than "gc state transitions" because we use the same mechanism as part of suspending threads for other reasons (e.g. the soft debugger), but "gc state transition" is much more common in our code.

@MichalStrehovsky
Copy link
Member

Did I get it right that the attribute is valid only in the context of DllImport? Wouldn't be better to include there instead of having a new attribute

DllImportAttribute is a pseudocustom attribute - it's not encoded as a custom attribute in the file format, but as a set of flags and fields encoded across two tables. Changing it would be a ECMA-335 file format change. Not that we can't change the file format, but it's a thing to consider.

@AaronRobinsonMSFT
Copy link
Member Author

@marek-safar As @MichalStrehovsky mentioned the DllImport is a pseudo attribute that is much more complicated to change. On top of the large considerations in applying this to DllImport it would also potentially limit being applied to other scenarios where this might be interest (i.e. interface function interop).

@lambdageek What about @jkotas's suggestion of TrivialUnmanagedMethodAttribute?

@lambdageek
Copy link
Member

TrivialUnmanagedMethodAttribute

I don't like it for all the same reasons that are listed as benefits. 😁
It's unclear exactly what it may do in the future. If this is meant for advanced users, I think it's better to be specific.


I need to think about what functions with this attribute can do in Xamarin.iOS and Xamarin.Android - I think it's ok to extend the lifetimes of Java or Objective-C objects (if there's a way to do it without an API call), but probably not to shorten them.


I wonder if it's worth adding to the restrictions:

   /// * Does not loop on a condition that relies on another thread to execute in order to exit the loop

So for example some kind of busy wait that requires another thread to toggle some value in shared memory before the current function returns.


We've had situations on Linux in Mono where a function in Cooperative mode called the first function from a shared library (ie, we pinvoked foo() from libfoo.so which tried to call bar() from libbar.so which was the first time the process did anything with libbar) which deadlocked the runtime - the dynamic linker needed to take a lock that was held by another thread. The lock was held by a thread that was preempted while pinvoking backtrace, which happens to need the same lock.

So maybe we should also add:

   /// * Don't call any function for the first time

We basically handwaved and pretended like this wasn't happening.


  /// * Native function does not take locks.

This should be

  /// * Native function does not manipulate locks or other concurrency primitives

On Linux, glibc pthreads signalling a condition variable with pthread_cond_signal and pthread_cond_broadcast may need to lock a mutex. (To rebalance some internal queues)


  /// * Native function does not performs syscall

For the reasons above, this should really be does not call into any library.


Questions:

  • How does this attribute interact with argument marshalling?

  • How does this attribute interact with [DllImport (..., SetLastError = true)] ?

@AaronRobinsonMSFT
Copy link
Member Author

@lambdageek Thanks for the feedback.

How does this attribute interact with argument marshalling?

This attribute has no impact on marshalling of types.

How does this attribute interact with [DllImport (..., SetLastError = true)] ?

This attribute has no impact on SetLastError=true, those semantics are maintained.

@jkotas
Copy link
Member

jkotas commented Oct 7, 2019

For the reasons above, this should really be does not call into any library.

How do you define library?

@AaronRobinsonMSFT
Copy link
Member Author

@lambdageek I have applied some of your suggestions. Please see the updated example above.

@jkotas
Copy link
Member

jkotas commented Oct 12, 2019

Another naming idea: LimitedUnmanagedMethodAttribute. It is the name we use for similar purpose in the unmanaged runtime annotations (https://github.com/dotnet/coreclr/search?q=LIMITED_METHOD_CONTRACT).

@AaronRobinsonMSFT
Copy link
Member Author

Thanks @jkotas. I have updated the issue with that suggestion as well.

@terrajobst
Copy link
Member

terrajobst commented Oct 22, 2019

Video

  • Would probably make sense on DllImportAttribute but this attribute is specially encoded in metadata, so we can't easily do that
  • We believe this can't be internal and library authors will likely want this
  • This concept should work function pointers, however, it's unclear how this can be expressed. There is currently no way to express this.
  • The more generic API names (TrivialUnmanagedMethodAttribute, LimitedUnmanagedMethodAttribute) seem problematic because we're probably not likely to turn more runtime optimizations on when a method is attributed as trivial, which makes specific naming like this more useful.
  • We should probably make this attribute work for UnmanagedFunctionPointerAttribute which names we should include Delegate in the attribute targets.
    • @AaronRobinsonMSFT please confirm whether this feature will work for the existing umanaged function pointer delegates.
  • How would this feature work with function pointers?

@jkotas
Copy link
Member

jkotas commented Oct 22, 2019

We should probably make this attribute work for UnmanagedFunctionPointerAttribute

Marshaled delegates have non-trivial by-design overhead. It is not that interesting to make this work for marshaled delegates as performance optimization. The interesting case to make this work as performance optimizations are function pointers.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@terrajobst
Copy link
Member

@AaronRobinsonMSFT was this feature abandoned?

@AaronRobinsonMSFT
Copy link
Member Author

@terrajobst Nope. We did this work a while ago dotnet/coreclr#26458.

@PathogenDavid
Copy link
Contributor

We should probably make this attribute work for UnmanagedFunctionPointerAttribute

Marshaled delegates have non-trivial by-design overhead. It is not that interesting to make this work for marshaled delegates as performance optimization. The interesting case to make this work as performance optimizations are function pointers.

Does this mean that conceptually this feature could apply to reverse p/invokes too? Or am I misunderstanding and this statement is only meant to apply to function pointers to unmanaged code?

To put it another way: Could SuppressGCTransition make conceptual sense for UnmanagedCallersOnly methods? (I'm trying to determine what--if anything--I might be able to do to increase the performance of calling an UnmanagedCallersOnly method from unmanaged code.)

@AaronRobinsonMSFT
Copy link
Member Author

To put it another way: Could SuppressGCTransition make conceptual sense for UnmanagedCallersOnly methods?

The SuppressGCTransition is for cases where one is in a managed environment (Cooperative mode) and would like to call into an unmanaged environment without making a transition to Preemptive mode - at least at present. If one is calling a method marked with UnmanagedCallersOnly the caller has already paid that transition cost. Furthermore I think it would very difficult for us to permit not transitioning into Cooperative mode. The number of operations would be very limited and would probably cause more grief than benefit.

(I'm trying to determine what--if anything--I might be able to do to increase the performance of calling an UnmanagedCallersOnly method from unmanaged code.)

The SuppressGCTransition is not going to help with that. The current UnmanagedCallersOnly implementation, other than x86, is about as lean as one can get. Are you observing some overhead that could be reduced in a non-x86 scenario?

@PathogenDavid
Copy link
Contributor

Thanks for the info!

Are you observing some overhead that could be reduced in a non-x86 scenario?

The issues I'm observing are on x64, but I still need to investigate more to determine for sure if UnmanagedCallersOnly is to blame or something else. (This was just something I thought of while trying to investigate low-hanging fruit.)

@nxtn
Copy link
Contributor

nxtn commented Aug 29, 2020

The constraints on this attribute look very similar to those on the "Fast calls to unmanaged code". Do they work the same way under the hood?

@jkotas
Copy link
Member

jkotas commented Aug 29, 2020

Yes, the mechanics are similar, with one subtle difference.

SuppressGCTransition calls get a GC probe (~2 extra instructions on the hot path, amortized if there is more than one call in the same basic block). FCalls do not do it today that makes them prone to starving GC e.g. if they are called in a loop. #13820 is an example.

You can think about the mechanics that we have implemented for these calls as "FCalls done right". We may switch FCalls to get the GC probe in future as well, as we work on improving worst case latencies.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices
Projects
None yet
Development

No branches or pull requests