Proposal rename System.Runtime.Intrinsics.X86.dll to System.Runtime.Intrinsics.dll #24595
Comments
Do we have I am not familiar with CoreFX configuration, cc @eerhardt @weshaggard |
I think it is the best way forward. Anyway intrinsics will be implemented in runtime and available on all platforms supported. |
Yes. See src directory https://github.com/dotnet/corefx/tree/master/src/System.Runtime.Intrinsics.X86 |
The library still exists but there is no nuget package just for it any longer. As for one assembly vs many, that is a good question. I don't think we have enough information here to make that decision. One thing I would like to understand is how many APIs we are talking about for each architecture. cc @dotnet/fxdc for potential advice. |
Ah, I see.
I think we should have one with the name |
@weshaggard, the number of types is small (around 10 for each architecture right now). The number of APIs (methods) varies greatly based on the type (ranging from 1 to more than 50). The number of shared types is also small (they are all opaque structs). Currently 2, but expected to expand to 6+ once AVX-512 and SVE extensions are implemented |
Thanks @tannergooding. Given that, I would agree with putting them all in the same assembly, especially since they already have a way to check at runtime whether they are supported. |
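For illustration, the runtime check mentioned above lets one method in one assembly carry code paths for several architectures. The following is only a sketch: it assumes the API shape that later shipped in .NET 5 and newer (`Sse2.Add`, `AdvSimd.Add`, `Vector128.Create`), which had not been settled at the time of this thread, and `VectorMath` is a hypothetical helper name.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, not part of any library discussed in this thread.
public static class VectorMath
{
    public static Vector128<int> Add(Vector128<int> a, Vector128<int> b)
    {
        // Each IsSupported check selects the path for the current hardware;
        // both ISA classes are referenced from the same method.
        if (Sse2.IsSupported)
        {
            return Sse2.Add(a, b);
        }
        if (AdvSimd.IsSupported)
        {
            return AdvSimd.Add(a, b);
        }

        // Portable fallback when neither instruction set is available.
        return Vector128.Create(
            a.GetElement(0) + b.GetElement(0),
            a.GetElement(1) + b.GetElement(1),
            a.GetElement(2) + b.GetElement(2),
            a.GetElement(3) + b.GetElement(3));
    }
}
```

A caller would write `VectorMath.Add(x, y)` without knowing which architecture it runs on, which is the scenario that motivates a single reference assembly.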
I agree with the idea of putting them in the same assembly. Especially given that we don't plan to have any significant implementation in them (i.e. the size of the IL will be small). In the rare case where we plan to support a variable argument where the target has only immediate forms, we are leaning toward generating the code in the JIT when the IL method for the intrinsic is compiled. So, we have the following general pattern for the IL "implementation" of a generic intrinsic
|
I can work on this change this week. |
So far in corefx the reference implementation is simply
public static Vector128<T> Foo(Vector128<T> value) where T : struct
{
    throw null;
}
In CoreCLR the pattern you describe is effectively implemented, but it is not implemented in IL as simply as you describe. It is actually split.
// For unsupported platforms
public static Vector128<T> Foo(Vector128<T> value) where T : struct
{
    throw new PlatformNotSupportedException();
}
// For sometimes supported platforms
public static Vector128<T> Foo(Vector128<T> value) where T : struct
{
    return Foo<T>(value);
}
JIT HW intrinsic implementation handles the bulk of your code
if (Bar.IsSupported)
{
    ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();
    // In this recursive invocation, Foo will always be expanded by the JIT,
    // even if it requires generating a switch-table expansion.
    return Foo<T>(value);
}
else
{
    throw new PlatformNotSupportedException();
}
There is also an argument range exception possible, for immediate values. Allowing JIT to handle bulk of the work allows a lot of the code to be shared instead of replicated into every intrinsic. Although it would be good to eliminate one of the two C# forms. |
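The control flow of the combined pattern can be tried outside a runtime with a small simulation. This is a sketch under stated assumptions: `FakeIsa` and its settable `IsSupported` are hypothetical stand-ins for an ISA class, arrays stand in for `Vector128<T>`, and the recursive call that the JIT would expand into an instruction is replaced with a plain copy, since without JIT recognition the recursion would overflow the stack.

```csharp
using System;

// Hypothetical stand-in for an ISA class; not a real API.
public static class FakeIsa
{
    // In a real runtime the JIT reports this as a constant for the
    // current hardware. It is settable here only to exercise both paths.
    public static bool IsSupported { get; set; }

    public static T[] Foo<T>(T[] value) where T : struct
    {
        if (IsSupported)
        {
            ThrowIfNonNumericType<T>();
            // The real pattern calls Foo<T>(value) recursively here and
            // relies on the JIT to expand it. The simulation returns a copy.
            return (T[])value.Clone();
        }

        throw new PlatformNotSupportedException();
    }

    // Simplified type check; a real helper would accept every primitive
    // numeric type rather than just these two.
    private static void ThrowIfNonNumericType<T>() where T : struct
    {
        if (typeof(T) != typeof(int) && typeof(T) != typeof(float))
        {
            throw new NotSupportedException("Type " + typeof(T) + " is not supported.");
        }
    }
}
```

With `IsSupported` false every call throws `PlatformNotSupportedException`; with it true, `Foo<int>` succeeds and `Foo<Guid>` throws `NotSupportedException`, which are the two distinct failure modes the thread is concerned with.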
The implementation could probably just be changed to This would SO on any runtime which doesn't support hardware intrinsics (which is already the case on any platform that gets the "sometimes supported" implementation, but doesn't handle these). The x86 implementation is currently returning Changing the |
if we change to NamedIntrinsic All platforms can have a This is roughly what my draft ARM64 code is doing. |
So, IIUC this would leave it up to the JIT to do the
|
@sdmaclea - I think that your |
Does |
The reference assemblies return |
@tannergooding - do you recall what the concern was that was raised? At least from my perspective, I think I have a much better feel at this point for how much cleaner this approach is, now that we have gained a bit of experience and perspective on these intrinsics. And I would like to extend my appreciation to @sdmaclea , @fiigii and @tannergooding for the significant contributions (and patience) so far! It's been quite instructive to have multiple targets actively under discussion and implementation. |
Not off the top of my head. I think it might have been @jkotas that raised it however. I'll try and see if I can find it (no guarantees however, lots of threads and lots of comments 😄) |
If you mean e.g. this pattern https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Runtime/Intrinsics/X86/Avx.cs#L237, you would need more JIT helpers to make this work with the simple pattern above and still throw the right exception. My preference is to have fewer JIT helpers when I have a choice. If you think that having more JIT helpers is the better choice in this case, I am fine with that. |
Thanks for clarifying. I believe I do think that having more JIT helpers would be a better choice, since it saves a lot of boilerplate implementation in the library. That said, perhaps there are more than I realize. |
Could you clarify on what you mean by JIT helper? Are you referring to a managed helper such as |
The JIT helper throws |
Right, we would need to have |
@jkotas There is no reason for generated code to call the managed method. JIT needs to know anyway to support the inlining rules required. JIT can either throw or not throw based on the template arguments and platform features. |
Right. And to throw exception with the right message, we would need a new |
Today we are already doing: For
We already have to have an |
This gives you the generic "platform not supported" message. You rather want the "this type is not supported" message. |
Ah, OK. This was the part I was missing. I also think this is two parts. One is the specific error messages. The other is that we could simplify S.P.CoreLib to have a single cross-arch implementation (currently the recursive implementation only exists on the target platform, and we have an otherwise identical implementation for other archs that throws the default PNSE). |
I want to keep this so that bring ups of new platforms do not need to worry about the hardware intrinsics. |
So today, the exact behavior primarily depends on the implementation of corlib you have at runtime. For CoreCLR, regardless of RyuJIT or the legacy JIT, you will get the recursive implementation on x86/x64. This means that you will end up with a SO exception assert for any x86 intrinsic which is not recognized but which has a recursive managed implementation (the impIntrinsic will fall out and a GT_CALL will be emitted). On ARM, you will currently get the On other code generators (such as Mono or the full framework runtime), their corlib implementation does not currently define these types and you will get a Type/Method not found exception (assuming you compiled using CoreFX references and attempted to execute using their runner). |
I think the advantage of calling the managed method instead of directly generating the throw (in the non-recursive case) is that the exception happens in the method itself. Also, to me it seems cleaner that the JIT only supports the supported case in the "happy path", and all the other cases are handled in the recursive case. |
I am on board. So for the extract case, the C# for the supported platform would be:
T Extract<T>(Vector128<T> value, byte index) where T : struct
{
    ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();
    // Number of T elements that fit in one Vector128<T>
    int elements = Unsafe.SizeOf<Vector128<T>>() / Unsafe.SizeOf<T>();
    if (index >= elements)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(index);
    }
    // Recursive call, expected to be expanded by the JIT
    return Extract<T>(value, index);
} |
We are already doing that today. We only emit the |
I guess I don't see that we are at a consensus yet. I believe there are two options under consideration:
OR
The latter doesn't seem like a big win in exchange for the additional IL that needs to be in the library. |
Since the ref assembly is so small, it doesn't pay to split each architecture into separate assemblies. Also, there are common types shared between architectures - ex. Vector128. Fix #26194
OR
This is currently what some of us have been trying to implement. |
IL implementations of the methods is in the library in either case. If we wanted to optimize the IL size of the library, we would implement these methods using |
It doesn't look like
I don't think we are achieving this today. It depends on the underlying corlib implementation in the end. If a user compiles against one ref assembly and attempts to run it on any given runtime, one of a few different things will happen today:
|
This may provide the most consistent experience everywhere. In all cases it would be:
|
If we used a wrapper exception throw new We can detect a HWIntrinsic by the exception it throws |
I'm not crazy about the idea of a special mechanism for this. Perhaps this pattern is best for the IL method of
This behaves mostly as @tannergooding suggests above, but allows the JIT to recognize the recursive case and do the appropriate (simple or switchtable) expansion. |
@CarolEidt, how would We can't just define |
I believe we can; Or do you mean indirect calls to |
Yes. It would impact indirect calls (Reflection, Delegates, etc). |
I do not think we should be making exceptions like this. In this scheme, I think the most straightforward option to get a working IsSupported would be to substitute the IL for it on the EE side: https://github.com/dotnet/coreclr/blob/master/src/vm/jitinterface.cpp#L6970 |
To make sure I understand. You are suggesting that, if we went with the approach @CarolEidt suggested (https://github.com/dotnet/corefx/issues/26194#issuecomment-356435124), |
What @tannergooding said |
Right. |
If and only if JIT is RyuJIT? |
I would believe this would be on any JIT that supports the HWIntrinsics. As I indicated earlier:
|
@tannergooding I assumed that comment indicated LegacyJIT was broken. |
It might be (in which case we should log a bug and track getting the HWIntrinsic functionality ifdef'd appropriately) |
For HW intrinsics, X86 has created System.Runtime.Intrinsics.X86.dll. It currently contains Vector128 and Vector256 from System.Runtime.Intrinsics as well as the X86 HW intrinsics from System.Runtime.Intrinsics.X86.
ARM64 intrinsics will use Vector128 from System.Runtime.Intrinsics.
ARM64 intrinsics will add Vector64 to System.Runtime.Intrinsics.
The question arises about where to put ARM64 intrinsic reference assemblies.
It seems any application which takes the time to optimize code using intrinsics may want to support multiple platforms.
My preference would be to add ARM64 intrinsics to the same reference assembly, and therefore to rename the reference assembly to the more generic name System.Runtime.Intrinsics.dll rather than creating System.Runtime.Intrinsics.ARM.ARM64.dll and later System.Runtime.Intrinsics.ARM.ARM32.dll.
@eerhardt @CarolEidt @RussKeldorph Please advise
@tannergooding @fiigii FYI