Runtime should be updated to support the __vectorcall calling convention #8300
Comments
FYI. @mellinoe, who may be interested. |
This would significantly improve performance for the |
https://docs.microsoft.com/en-us/cpp/cpp/stdcall
https://docs.microsoft.com/en-us/cpp/cpp/vectorcall
For ARM64, the standard calling convention (ARM64 AAPCS64) passes vectors in registers as Short Vectors or HVAs. From a brief glance, the Vector ABI for ARM64 (VPCS) doc looks similar to the ARM64 AAPCS64, except that it changes callee/caller register save responsibilities to eliminate some of the issues with preserving/restoring the high bits of vector registers. |
Looks like attribute |
With the support for hardware intrinsics in addition to the existing support for things like System.Numerics.Vector, this may be more important. This currently represents a scenario where the Windows ABI actually loses out on performance compared to the System V ABI. The performance difference is readily measurable in native code, and will become more measurable in managed code as the CoreCLR System V ABI implementation continues to improve. |
CC. @CarolEidt |
On a similar note we should explore the custom xmm call convention on x86, at least for invoking our own math helpers, to avoid transitioning in and out of x87 like we do now.
|
@AndyAyersMS, I thought we removed all the x87 FPU code with RyuJIT? At the very least, I remember doing some work to ensure the System.Math helpers were able to call the CRT implementations (which use SSE/SSE2 when that compiler switch is specified), rather than using the hand-coded assembly. |
How so? It should just be (roughly speaking) the x86 __fastcall convention plus enabling HVA arguments |
The standard x86 calling convention returns FP values in |
Hmm, maybe I misread the "spec". It seems that if we made vectorcall the default for all methods, it would give us XMM pass/return for floats on x86. The description here is not all that easy to parse, as it also says the convention for floats is not impacted. |
Yes, it does that. For example: https://godbolt.org/g/ZsJv5y |
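As a rough illustration of the point being discussed (this is not the content of the linked snippet), the sketch below declares a scalar float function with `__vectorcall`. The `DEMO_VECTORCALL` macro and the function name `scale_value` are hypothetical and exist only so the same source builds on compilers without the MSVC keyword. On 32-bit x86 with MSVC, the thread's reading of the documentation is that the two float arguments and the result travel in XMM registers rather than through the x87 stack.

```cpp
// __vectorcall is an MSVC keyword for x86/x64 targets. On any other
// compiler the hypothetical DEMO_VECTORCALL macro expands to nothing,
// so the function simply uses the platform's default convention.
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#define DEMO_VECTORCALL __vectorcall
#else
#define DEMO_VECTORCALL
#endif

// float counts as a "vector type" for __vectorcall purposes, so both
// parameters and the return value are candidates for XMM registers.
float DEMO_VECTORCALL scale_value(float value, float factor) {
    return value * factor;
}
```

Comparing the disassembly of this function with and without the keyword on an x86 MSVC build is one way to check the register usage for yourself.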
Also, it is interesting to see that Maybe interop knows about it? |
The documentation page (https://msdn.microsoft.com/en-us/library/dn375768.aspx) does say that vector types include FP types:
And then for x86 it says:
|
Interop does not support FastCall calling convention. From https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.callingconvention?view=netcore-2.0 : FastCall This calling convention is not supported. |
Friendly ping as two years passed and I believe it's an "easy" yet probably very significant performance optimization! |
@LifeIsStrange - this was something that we had hoped to be able to make progress on for the 5.0 release (starting with supporting the correct standard calling conventions for both Linux and Windows, where the former passes vectors in registers, and both conventions call for returning vectors in registers). However, there was enough complexity between the runtime stubs and the JIT handling, that it didn't get completed. |
@tannergooding Any interest in updating this issue with a proposal for leveraging the design in #51156? |
Moving to 8.0. |
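Tagging subscribers to this area: @dotnet/interop-contrib
Issue Details
Rationale
Today, the runtime supports the __fastcall calling convention. However, it means that operating with certain data types is still "sub-optimal". Microsoft Windows provides the __vectorcall calling convention just for this purpose. The System V AMD64 ABI already defines vector sized types and supports passing them in register.
Proposal
The runtime should add support for the __vectorcall calling convention.
The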
|
namespace System.Runtime.CompilerServices
{
    public class CallConvVectorcall
    {
        // This type has no members and is identical in structure to other `CallConv*` types
    }
}
|
@bartonjs Nope. Both of these enums map to metadata encodings and can't/shouldn't be updated without updating ECMA-335. We created the |
/cc @lambdageek We are considering this for .NET 8. Would there be any concerns here on the mono side? |
Cc @lateralusX |
@AaronRobinsonMSFT I think we would want to do this together with support for simd ABIs on non-windows platforms. @fanyang-mono had started the work in net7 for AOT, but we had to revert because it wasn't usable without JIT or interp support. __vectorcall would probably face many of the same issues. Cc @SamMonoRT |
Does Do we need a similar but different one to support the Arm64 Vector Procedure Call Standard (AAVPCS, referenced above #8300 (comment), defined here: https://github.com/ARM-software/abi-aa/blob/main/vfabia64/vfabia64.rst)? |
Would be a very welcome addition (especially if used internally for parameters and returns). It is one week until the 10-year anniversary of the "Introducing 'Vector Calling Convention'" blog post: https://devblogs.microsoft.com/cppblog/introducing-vector-calling-convention/ |
Moving to .NET 9 as we aren't going to get to this before feature-complete. |
Rationale
Today, the runtime supports the __fastcall calling convention, which not only allows interop with any native code that uses that calling convention but also allows it to take advantage of the additional registers that are available on the underlying architecture.
However, it means that operating with certain data types is still "sub-optimal".
Microsoft Windows provides the __vectorcall calling convention just for this purpose (https://msdn.microsoft.com/en-us/library/dn375768.aspx). It extends the existing __fastcall calling convention to additionally allow SIMD vector types and Homogeneous Vector Aggregate values to be passed via register rather than on the stack.
The System V AMD64 ABI already defines vector sized types (__m128, __m256) and supports passing them in register.
Proposal
The runtime should add support for the __vectorcall calling convention, not only to improve performance, but to also provide better interop with native code that uses it.
Alternative API proposal
The __vectorcall calling convention could be exposed on System.Runtime.InteropServices.CallingConvention as VectorCall.
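To make the Rationale concrete, here is a minimal native-side sketch of the kind of function the proposal is about. The `Float4` struct and the `add_float4` and `dot_float4` functions are hypothetical names, and `DEMO_VECTORCALL` is a hypothetical macro that expands to `__vectorcall` only on MSVC x86/x64 builds so the code still compiles elsewhere. It assumes the documented definition of a Homogeneous Vector Aggregate (up to four members of one identical vector type, where float counts as a vector type), which makes `Float4` eligible to be passed and returned in registers under `__vectorcall`.

```cpp
// Expand to the MSVC keyword only where it is available; elsewhere the
// functions use the platform's default calling convention (assumption:
// this macro is purely illustrative and not part of any real header).
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#define DEMO_VECTORCALL __vectorcall
#else
#define DEMO_VECTORCALL
#endif

// Four identical float members: a candidate Homogeneous Vector Aggregate.
struct Float4 {
    float x, y, z, w;
};

// Takes and returns HVAs by value. With __vectorcall these are eligible
// for register passing instead of being spilled to the stack.
Float4 DEMO_VECTORCALL add_float4(Float4 a, Float4 b) {
    return Float4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
}

// Scalar float result computed from two HVA arguments.
float DEMO_VECTORCALL dot_float4(Float4 a, Float4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}
```

Managed code calling exports like these through interop would need the runtime to understand the convention, which is the gap this issue describes.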