Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime should be updated to support the __vectorcall calling convention #8300

Open
tannergooding opened this issue Jun 6, 2017 · 29 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Jun 6, 2017

Rationale

Today, the runtime supports the __fastcall calling convention, which not only allows interop with any native code that uses that calling convention but also allows it to take advantage of the additional registers that are available on the underlying architecture.

However, it means that operating with certain data types is still "sub-optimal".

Microsoft Windows provides the __vectorcall calling convention just for this purpose (https://msdn.microsoft.com/en-us/library/dn375768.aspx). It extends the existing __fastcall calling convention to additionally allow SIMD vector types and Homogeneous Vector Aggregate values to be passed via register rather than on the stack.

The System V AMD64 ABI already defines vector sized types (__m128, __m256) and supports passing them in register.

Proposal

The runtime should add support for the __vectorcall calling convention, not only to improve performance, but to also provide better interop with native code that uses it.

namespace System.Runtime.CompilerServices
{
+    public class CallConvVectorCall
+    {
+        // This type has no members and is identical in structure to other `CallConv*` types
+    }
}

Alternative API proposal

The __vectorcall calling convention could be exposed on System.Runtime.InteropServices.CallingConvention as VectorCall.

@tannergooding
Copy link
Member Author

FYI. @mellinoe, who may be interested.

@tannergooding
Copy link
Member Author

This would significantly improve performance for the System.Numerics.Vector package, where all of the exposed types could be passed in register rather than passed on stack.

@tannergooding
Copy link
Member Author

@sdmaclea
Copy link
Contributor

sdmaclea commented Feb 8, 2018

https://docs.microsoft.com/en-us/cpp/cpp/stdcall

On ARM and x64 processors, __stdcall is accepted and ignored by the compiler; on ARM and x64 architectures, by convention, arguments are passed in registers when possible, and subsequent arguments are passed on the stack.

https://docs.microsoft.com/en-us/cpp/cpp/vectorcall

On ARM machines, __vectorcall is accepted and ignored by the compiler

For ARM64 the standard calling convention (ARM64 AAPCS64) passes vectors in registers as Short Vectors or HVA.

A brief glance at the Vector ABI for ARM64 (VPCS) doc looks like it is similar to the ARM64 AAPCS64 except it changes Callee/Caller register save responsibilities to eliminate some of the issues with preserving/restoring high bits of vector registers. __atribute__((aarch64_vector_pcs))

@sdmaclea
Copy link
Contributor

sdmaclea commented Feb 8, 2018

Looks like attribute aarch64_vector_pcs is not recognized by gcc 6.3.0. Tested using latest Arm64 gcc on Compliler Explorer https://godbolt.org/

@tannergooding
Copy link
Member Author

With the support for hardware intrinsics in addition to the existing support for things like System.Numerics.Vector, this may be more important.

This currently represents a scenario where the Windows ABI actually loses out on performance as compared to the System V ABI.

This performance difference is readily measurable in native code, and will become more measurable in managed code as the the CoreCLR System V ABI implementation continues getting improvements.

@tannergooding
Copy link
Member Author

CC. @CarolEidt

@AndyAyersMS
Copy link
Member

On a similar note we should explore the custom xmm call convention on x86, at least for invoking our own math helpers, to avoid transitioning in and out of x87 like we do now.

__vectorcall on x86 looks pretty hacky.

@tannergooding
Copy link
Member Author

On a similar note we should explore the custom xmm call convention on x86, at least for invoking our own math helpers, to avoid transitioning in and out of x87 like we do now

@AndyAyersMS, I thought we removed all the x87 FPU code with RyuJIT? At the very least, I remember doing some work to ensure the System.Math helpers were able to call the CRT implementations (which use SSE/SSE2 when that compiler switch is specified), rather than using the hand-coded assembly.

@tannergooding
Copy link
Member Author

__vectorcall on x86 looks pretty hacky.

How so? It should just be (roughly speaking) the x86 __fastcall convention plus enabling HVA arguments

@mikedn
Copy link
Contributor

mikedn commented Mar 26, 2018

I thought we removed all the x87 FPU code with RyuJIT?

The standard x86 calling convention returns FP values in ST(0).

@AndyAyersMS
Copy link
Member

Hmm, maybe I misread the "spec" -- it seems like if we made vectorcall the default for all methods it looks like it would give us XMM pass/return for floats on x86. The description here is not all that easy to parse as it also says the convention for floats is not impacted.

@mikedn
Copy link
Contributor

mikedn commented Mar 26, 2018

it seems like if we made vectorcall the default for all methods it looks like it would give us XMM pass/return for floats on x86

Yes, it does that. For example: https://godbolt.org/g/ZsJv5y

@AndyAyersMS
Copy link
Member

Also it interesting to see that __fastcall on x86 has some limited aspects of __vectorcall. I am pretty sure the jit doesn't do this for manged methods with HFA/HVAs.

Maybe interop knows about it?

@mikedn
Copy link
Contributor

mikedn commented Mar 26, 2018

The description here is not all that easy to parse as it also says the convention for floats is not impacted.

The documentation page (https://msdn.microsoft.com/en-us/library/dn375768.aspx) does say that vector types include FP types:

A vector type is either a floating-point type—for example, a float or double—or an SIMD vector type—for example, __m128 or __m256.

And then for x86 it says:

Vector type results are returned by value in XMM0 or YMM0, depending on size. HVA results have each data element returned by value in registers XMM0:XMM3 or YMM0:YMM3, depending on element size. Other result types are returned by reference to memory allocated by the caller.

@jkotas
Copy link
Member

jkotas commented Mar 26, 2018

Maybe interop knows about it?

Interop does not support FastCall calling convention. From https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.callingconvention?view=netcore-2.0 : FastCall This calling convention is not supported.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@LifeIsStrange
Copy link

Friendly ping as two years passed and I believe it's an "easy" yet probably very significant performance optimization!

@CarolEidt
Copy link
Contributor

@LifeIsStrange - this was something that we had hoped to be able to make progress on for the 5.0 release (starting with supporting the correct standard calling conventions for both Linux and Windows, where the former passes vectors in registers, and both conventions call for returning vectors in registers). However, there was enough complexity between the runtime stubs and the JIT handling, that it didn't get completed.

@AaronRobinsonMSFT
Copy link
Member

@tannergooding Any interest in updating this issue with a proposal for leveraging the design in #51156?

@AaronRobinsonMSFT
Copy link
Member

Moving to 8.0.

@AaronRobinsonMSFT AaronRobinsonMSFT modified the milestones: 7.0.0, 8.0.0 Jun 13, 2022
@ghost
Copy link

ghost commented Oct 7, 2022

Tagging subscribers to this area: @dotnet/interop-contrib
See info in area-owners.md if you want to be subscribed.

Issue Details

Rationale

Today, the runtime supports the __fastcall calling convention, which not only allows interop with any native code that uses that calling convention but also allows it to take advantage of the additional registers that are available on the underlying architecture.

However, it means that operating with certain data types is still "sub-optimal".

Microsoft Windows provides the __vectorcall calling convention just for this purpose (https://msdn.microsoft.com/en-us/library/dn375768.aspx). It extends the existing __fastcall calling convention to additionally allow SIMD vector types and Homogeneous Vector Aggregate values to be passed via register rather than on the stack.

The System V AMD64 ABI already defines vector sized types (__m128, __m256) and supports passing them in register.

Proposal

The runtime should add support for the __vectorcall calling convention, not only to improve performance, but to also provide better interop with native code that uses it.

The __vectorcall calling convention should be exposed on System.Runtime.InteropServices.CallingConvention as VectorCall.

Author: tannergooding
Assignees: -
Labels:

area-System.Runtime.InteropServices, area-Interop-coreclr

Milestone: 8.0.0

@bartonjs
Copy link
Member

bartonjs commented Oct 18, 2022

Video

  • The existing CallConv types that end in "conv" use a lowercase C, so this should as well.
  • Do we also need to update System.Reflection.SignatureCallingConvention and/or System.Runtime.InteropServices.CallingConvention?
namespace System.Runtime.CompilerServices
{
     public class CallConvVectorcall
     {
         // This type has no members and is identical in structure to other `CallConv*` types
     }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 18, 2022
@AaronRobinsonMSFT
Copy link
Member

Do we also need to update System.Reflection.SignatureCallingConvention and/or System.Runtime.InteropServices.CallingConvention?

@bartonjs Nope. Both of these enums map to metadata encodings and can't/shouldn't be updated without updating ECMA-335. We created the UnmanagedCallConv specifically to avoid this limitation. All new calling conventions should use the CallConv* convention we are employing here.

@AaronRobinsonMSFT
Copy link
Member

/cc @lambdageek We are considering this for .NET 8. Would there be any concerns here on the mono side?

@lambdageek
Copy link
Member

Cc @lateralusX

@lambdageek
Copy link
Member

@AaronRobinsonMSFT I think we would want to do this together with support for simd ABIs on non-windows platforms. @fanyang-mono had started the work in net7 for AOT, but we had to revert because it wasn't usable without JIT or interp support. __vectorcall would probably face many of the same issues.

Cc @SamMonoRT

@BruceForstall
Copy link
Member

Does CallConvVectorcall as specified here apply exclusively to the Windows x86/x64, Microsoft-defined, __vectorcall convention (https://learn.microsoft.com/en-us/cpp/cpp/vectorcall?redirectedfrom=MSDN&view=msvc-170)?

Do we need a similar but different one to support the Arm64 Vector Procedure Call Standard (AAVPCS, referenced above #8300 (comment), defined here: https://github.com/ARM-software/abi-aa/blob/main/vfabia64/vfabia64.rst)?

@benaadams
Copy link
Member

Would be a very welcome addition (especially if used internally for parameters and returns)

1 week to 10 year anniversary of Introducing ‘Vector Calling Convention’ blog post https://devblogs.microsoft.com/cppblog/introducing-vector-calling-convention/

@jkoritzinsky
Copy link
Member

Moving to .NET 9 as we aren't going to get to this before feature-complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.InteropServices
Projects
Status: No status
Development

No branches or pull requests