Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal rename System.Runtime.Intrinsics.X86.dll to System.Runtime.Intrinsics.dll #24595

Closed
sdmaclea opened this issue Jan 5, 2018 · 56 comments
Assignees
Milestone

Comments

@sdmaclea
Copy link
Contributor

sdmaclea commented Jan 5, 2018

For HW Intrinsics X86 has created System.Runtime.Intrinsics.X86.dll. It currently contains Vector128, and Vector256 from System.Runtime.Intrinsics as well as the X86
HW intrinsics from System.Runtime.Intrinsics.X86

ARM64 intrinsics will use Vector128 from System.Runtime.Intrinsics
ARM64 intrinsics will add Vector64 to System.Runtime.Intrinsics

The question arises about where to put ARM64 intrinsic reference assemblies.

It seems any application which takes the time to optimize code using intrinsics may want to support multiple platforms.

My preference would be to add ARM64 intrinsics to the same reference assembly. Therefore to rename the reference assembly to a more generic name System.Runtime.Intrinsics.dll rather than creating System.Runtime.Intrinsics.ARM.ARM64.dll and later System.Runtime.Intrinsics.ARM.ARM32.dll.

@eerhardt @CarolEidt @RussKeldorph Please advise

@tannergooding @fiigii FYI

@fiigii
Copy link
Contributor

fiigii commented Jan 5, 2018

Do we have System.Runtime.Intrinsics.X86.dll? I removed the package in dotnet/corefx#24490

I am not familiar with CoreFX configuration, cc @eerhardt @weshaggard

@4creators
Copy link
Contributor

I think it is the best way forward. Anyway intrinsics will be implemented in runtime and available on all platforms supported.

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 5, 2018

Do we have System.Runtime.Intrinsics.X86.dll?

Yes. See src directory

https://github.com/dotnet/corefx/tree/master/src/System.Runtime.Intrinsics.X86

@weshaggard
Copy link
Member

The library still exists but there is no nuget package just for it any longer.

As for one assembly vs many that is a good question. I don't think we have enough information here to make that decision. One thing I would like to understand is how many API's are we talking about for each architecture?

cc @dotnet/fxdc for potential advise.

@fiigii
Copy link
Contributor

fiigii commented Jan 5, 2018

The library still exists but there is no nuget package just for it any longer.

Ah, I see.

As for one assembly vs many that is a good question.

I think we should have one with the name System.Runtime.Intrinsics.dll because an HW intrinsic should be visible on all the platforms. @jkotas

@tannergooding
Copy link
Member

One thing I would like to understand is how many API's are we talking about for each architecture?

@weshaggard, the number of types is small (around 10 for each architecture right now). The number of APIs (methods) varies greatly based on the type (ranging from 1 to more than 50).

The number of shared types is also small (they are all opaque structs). Currently 2, but expected to expand to 6+ once AVX-512 and SVE extensions are implemented

@weshaggard
Copy link
Member

Thanks @tannergooding given that I would agree with putting them all in the same assembly, especially since they already have a way to check if they are supported or not at runtime.

@CarolEidt
Copy link
Contributor

I agree with the idea of putting them in the same assembly. Especially given that we don't plan to have any significant implementation in them (i.e. the size of the IL will be small). In the rare case where we plan to support a variable argument where the target has only immediate forms, we are leaning toward generating the code in the JIT when the IL method for the intrinsic is compiled.

So, we have the following general pattern for the IL "implementation" of a generic intrinsic Foo in namespace Bar:

public static Vector128<T> Foo(Vector128<T> value) where T : struct
{
    if (Bar.IsSupported)
    {
        ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();
        // In this recursive invocation, Foo will always be expanded by the JIT,
        // even if it requires generating a switch-table expansion.
        return Foo<T>(value);
    }
    else
    {
        throw new PlatformNotSupportedException();
    }
}

@eerhardt
Copy link
Member

eerhardt commented Jan 9, 2018

I can work on this change this week.

@eerhardt eerhardt self-assigned this Jan 9, 2018
@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

@CarolEidt

we have the following general pattern for the IL "implementation" of a generic intrinsic Foo in namespace Bar

So far in corefx the reference implementation is simply

public static Vector128<T> Foo(Vector128<T> value) where T : struct 
{ 
  throw null; 
}

In CoreCLR the pattern you describe is effectively implemented, but it is not implemented in IL as simply as you describe. It is actually split.

// For unsupported platforms
public static Vector128<T> Foo(Vector128<T> value) where T : struct 
{ 
  throw new PlatformNotSupportedException(); 
}
// For sometimes supported platforms
public static Vector128<T> Foo(Vector128<T> value) where T : struct 
{ 
  return Foo<T>(value);
}

JIT HW intrinsic implementation handles the bulk of your code

    if (Bar.IsSupported)
    {
        ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();
        // In this recursive invocation, Foo will always be expanded by the JIT,
        // even if it requires generating a switch-table expansion.
        return Foo<T>(value);
    }
    else
    {
        throw new PlatformNotSupportedException();
    }

There is also an argument range exception possible, for immediate values.

Allowing JIT to handle bulk of the work allows a lot of the code to be shared instead of replicated into every intrinsic.

Although it would be good to eliminate one of the two C# forms.

@tannergooding
Copy link
Member

The implementation could probably just be changed to public static Vector128<T> Foo(Vector128<T> value) => Foo<T>(value) everywhere. (Although, I think there was some pushback last time this was suggested.)?

This would SO on any runtime which doesn't support hardware intrinsics (which is already the case on any platform that gets the "sometimes supported" implementation, but doesn't handle these).

The x86 implementation is currently returning nullptr for the first pass (mustExpand = false), and is doing NO_WAY("JIT must expand the intrinsic!"); for the second pass (mustExpand = true).

Changing the NO_WAY to return gtNewMustThrowException(CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED, JITtype2varType(sig->retType), sig->retTypeClass); instead would make the PNSE handling consistent.

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

if we change to lookupHWIntrinsic(methodName, isa) to

NamedIntrinsic lookupHWIntrinsic(const char* namespace, const char* className, const char* methodName);

All platforms can have a NI_PlatformNotSupported named intrinsic. It can be returned if namespace or class name is not supported on the current platform (Unless method get is get_IsSupported)

This is roughly what my draft ARM64 code is doing.

@CarolEidt
Copy link
Contributor

The implementation could probably just be changed to public static Vector128 Foo(Vector128 value) => Foo(value) everywhere. (Although, I think there was some pushback last time this was suggested.)?

So, IIUC this would leave it up to the JIT to do the IsSupported check at JIT-time, and:

  • If supported and constraints (e.g. immediates) are met, generate simple expansion.
  • Otherwise, if non-recursive generate a direct call.
  • If recursive, generate "full" expansion:
    • If not supported, simply generate a throw of PNSE (i.e. the gtNewMustThrowException).
    • Otherwise (supported, but immediate constraint isn't met), generate switchtable expansion

@CarolEidt
Copy link
Contributor

CarolEidt commented Jan 9, 2018

@sdmaclea - I think that your NI_PlatformNotSupported intrinsic could be considered an "internal implementation detail of my proposal. Ideally, it would be good to express this requirement in a way that is independent of the internal JIT or runtime details. Does that make sense?

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

changed to public static Vector128 Foo(Vector128 value) => Foo(value) everywhere

Does everywhere include the CoreFX copy?

@tannergooding
Copy link
Member

tannergooding commented Jan 9, 2018

The reference assemblies return null by design, since they should only ever exist at compile time. Loading it at runtime is an error bug.

@CarolEidt
Copy link
Contributor

The implementation could probably just be changed to public static Vector128 Foo(Vector128 value) => Foo(value) everywhere. (Although, I think there was some pushback last time this was suggested.)?

@tannergooding - do you recall what the concern was that was raised? At least from my perspective, I think I have a much better feel at this point for how much cleaner this approach is, now that we have gained a bit of experience and perspective on these intrinsics. And I would like to extend my appreciation to @sdmaclea , @fiigii and @tannergooding for the significant contributions (and patience) so far! It's been quite instructive to have multiple targets actively under discussion and implementation.

@tannergooding
Copy link
Member

do you recall what the concern was that was raised

Not off the top of my head. I think it might have been @jkotas that raised it however.

I'll try and see if I can find it (no guarantees however, lots of threads and lots of comments 😄)

@jkotas
Copy link
Member

jkotas commented Jan 9, 2018

The implementation could probably just be changed to public static Vector128 Foo(Vector128 value) => Foo(value) everywhere.

If you mean that e.g. this pattern https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Runtime/Intrinsics/X86/Avx.cs#L237, you would need more JIT helpers to make this work with the simple above pattern to throw the right exception. My preference is to have fewer JIT helpers when I have a choice. If you think that having more JIT helpers is better choice in this case, I am fine what that.

@CarolEidt
Copy link
Contributor

you would need more JIT helpers to make this work with the simple above pattern to throw the right exception

Thanks for clarifying. I believe I do think that having more JIT helpers would be a better choice, since it saves a lot of boilerplate implementation in the library. That said, perhaps there are more than I realize.

@tannergooding
Copy link
Member

My preference is to have fewer JIT helpers when I have a choice. If you think that having more JIT helpers is better choice in this case, I am fine what that.

Could you clarify on what you mean by JIT helper?

Are you referring to a managed helper such as ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>(); or actual native code in the JIT?

@fiigii
Copy link
Contributor

fiigii commented Jan 9, 2018

Could you clarify on what you mean by JIT helper?

The JIT helper throws NotSupportedException with messages to match the behavior of System.Numerics.Vector<T>.

@jkotas
Copy link
Member

jkotas commented Jan 9, 2018

Right, we would need to have CORINFO_HELP_XXX equivalent of ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();. The JIT does not have capability to introduce call to ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>(); today. It can only introduce calls to CORINFO_HELP_XXX helpers.

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

@jkotas There is no reason for generated code to call the managed method. JIT needs to know anyway to support the inlining rules required. JIT can either throw or not throw based on the template arguments and platform features.

@jkotas
Copy link
Member

jkotas commented Jan 9, 2018

JIT can either throw or not throw based on the template arguments and platform features.

Right. And to throw exception with the right message, we would need a new CORINFO_HELP_XXX helper.

@tannergooding
Copy link
Member

Today we are already doing:
https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsicxarch.cpp#L683

For ThrowNotSupportedExceptionIfNonNumericType it should just be:

if (IsNonNumericType())
{
     return impUnsupportedHWIntrinsic(CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED, method, sig, mustExpand);
}

We already have to have an IsNumericType check today, in order to determine whether or not to fallback on something like StaticCast:
dotnet/coreclr@038cb09#diff-46e84d907ca2183faf22c1345e4e04aeR534

@jkotas
Copy link
Member

jkotas commented Jan 9, 2018

This gives you the generic "platform not supported" message. You rather want the "this type is not supported" message.

@tannergooding
Copy link
Member

tannergooding commented Jan 9, 2018

This gives you the generic "platform not supported" message. You rather want the "this type is not supported" message.

Ah, OK. This was the part I was missing.

I also think this is two parts. One for specific error messages. the other part is that we could simplify S.P.CoreLib to have a single implementation cross-arch(currently the recursive implementation only exists on the target platform, we have an identical implementation for other archs that throws the default PNSE)

@jkotas
Copy link
Member

jkotas commented Jan 9, 2018

currently the recursive implementation only exists on the target platform, we have an identical implementation for other archs that throws the default PNSE

I want to keep this so that bring ups of new platforms do not need to worry about the hardware intrinsics.

@tannergooding
Copy link
Member

Right, there are other code generators out these in addition to RyuJIT.

So today, the exact behavior primarily depends on the implementation of corlib you have at runtime.

For CoreCLR, regardless of RyuJIT or the legacy JIT, you will get the recursive implementation on x86/x64.

This means that you will end up with a SO exception assert for any x86 intrinsic which is not recognized but which has a recursive managed implementation (the impIntrinsic will fall out and a GT_CALL will be emitted).

On ARM, you will currently get the throw PNSE implementation (for x86 intrinsic).

On other code generators (such as Mono or the full framework runtime), their corlib implementation does not currently define these types and you will get a Type/Method not found exception (assuming you compiled using CoreFX references and attempted to execute using their runner).

@CarolEidt
Copy link
Contributor

There is no reason for generated code to call the managed method.

I think the advantage of calling the managed method instead of directly generating the throw (in the non-recursive case) is that the exception happens in the method itself. Also, to me it seems cleaner that the JIT only supports the supported case in the "happy path", and all the other cases are handled in the recursive case.

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

I am on board. So for the extract case..

C# the supported platform

T Extract<T>(Vector128<T> value, byte index)
{
  ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();

  byte elements = Unsafe.SizeOf<Vector128<T>>()/Unsafe.SizeOf<T>();
  if(index >= elements)
  {
      ThrowHelper.ThrowArgumentOutOfRangeException(index);
  }
  return Extract<T>(Vector128<T> value, byte index);
}

@tannergooding
Copy link
Member

I think the advantage of calling the managed method instead of directly generating the throw (in the non-recursive case) is that the exception happens in the method itself.

We are already doing that today. We only emit the gtNewMustThrowException node if mustExpand is true (so it only emits when we are inside the GT_CALL to the intrinsic).

@CarolEidt
Copy link
Contributor

I guess I don't see that we are at a consensus yet. I believe there are two options under consideration:

  • The JIT is responsible for doing all the checks, and, in the recursive/mustExpand case, it generates either the appropriate throw or the switchtable expansion as needed. In this case the implementation is simple, but even the legacy jit would have to minimally support throwing PNSE.

OR

  • The JIT still has to do all the checks in the non-recursive/!mustExpand case, but is only responsible for generating the switchtable expansion as needed.

The latter doesn't seem like a big win in exchange for the additional IL that needs to be in the library.

eerhardt referenced this issue in eerhardt/corefx Jan 9, 2018
Since the ref assembly is so small, it doesn't pay to split each architecture into separate assemblies.  Also, there are common types shared between architectures - ex. Vector128.

Fix #26194
@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

@CarolEidt @jkotas

OR

  • The JIT is responsible for doing all the checks, and, in the recursive/mustExpand case, it generates either the appropriate throw or the switchtable expansion as needed. In this case the implementation is simple. On non-supported platforms and JITs, alternate mscorlib HW intrinsic src implementation is used to throw PNSE.

This is currently what some of us have been trying to implement.

@jkotas
Copy link
Member

jkotas commented Jan 9, 2018

The latter doesn't seem like a big win in exchange for the additional IL that needs to be in the library.

IL implementations of the methods is in the library in either case.

If we wanted to optimize the IL size of the library, we would implement these methods using throw new PlatformNotSupportedException() body everywhere. It is same body for all methods, and so the metadata of all methods will point to the same blob. The fallback expansion would need to be done differently in the JIT compared to what it is today.

@tannergooding
Copy link
Member

tannergooding commented Jan 9, 2018

In this case the implementation is simple, but even the legacy jit would have to minimally support throwing PNSE.

It doesn't look like impIntrinsic (or the relevant pieces in the method) is under a LEGACY_BACKEND ifdef, so I think it is already attempting JIT intrinsic expansion today (might have missed something, however).

On non-supported platforms and JITs, alternate mscorlib HW intrinsic src implementation is used to throw PNSE.

I don't think we are achieving this today. It depends on the underlying corlib implementation in the end.

If a user compilers against one ref assembly and attempts to run it on any given runtime, one of a few different things will happen today:

  • Implementation is recursive
    • JIT does recognize, in which case code is emitted as normal (expanded for direct calls, compiler fallback or expansion for indirect calls)
    • JIT doesn't recognize, in which case a StackOverflow or NO_WAY assert will occur
  • Implementation is non-recursive
    • JIT does recognize
      • Expanded for direct calls
      • IL is jitted for indirect calls
    • JIT doesn't recognize, in which case IL is jitted for all calls

@tannergooding
Copy link
Member

If you wanted to optimize the IL size of the library, we would implement these methods using throw new PlatformNotSupportedException() body everywhere. It is same body for all methods, and so the metadata of all methods will point to the same blob. The fallback expansion would need to be done differently in the JIT compared to what it is today.

This may provide the most consistent experience everywhere.

In all cases it would be:

  • JIT recognizes
    • Intrinsic is expanded for direct calls
    • Indirect calls would expand or emit compiler callback in the method body (this would require some mechanism to detect this)
  • JIT does not recognize, IL is jitted for all calls (throwing a PNSE)

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

we would implement these methods using throw new PlatformNotSupportedException() body everywhere

If we used a wrapper exception throw new HWIntrinsicNotSupportedException() it might provide the advantage of being explicit.

We can detect a HWIntrinsic by the exception it throws

@CarolEidt
Copy link
Contributor

Indirect calls would expand or emit compiler callback in the method body (this would require some mechanism to detect this)

I'm not crazy about the idea of a special mechanism for this. Perhaps this pattern is best for the IL method of Foo.Bar():

if (Foo.IsSupported)
{
    Bar();
}
else
{
    throw new PlatformNotSupportedException();
}

This behaves mostly as @tannergooding suggests above, but allows the JIT to recognize the recursive case and do the appropriate (simple or switchtable) expansion.

@tannergooding
Copy link
Member

@CarolEidt, how would bool IsSupported { get; } be defined?

We can't just define bool IsSupported => false, as that won't work for indirect calls. We also can't just define bool IsSupported => IsSupported, as that would SO in the cases I mentioned above (JIT doesn't recognize)

@CarolEidt
Copy link
Contributor

We can't just define bool IsSupported => false, as that won't work for indirect calls.

I believe we can; IsSupported would be always be recognized as an intrinsic by any platform/JIT that supports HW intrinsics, in which case it would return the true value.

Or do you mean indirect calls to IsSupported? I think we can reasonably say that IsSupported could never be invoked indirectly.

@tannergooding
Copy link
Member

Yes. It would impact indirect calls (Reflection, Delegates, etc).

@jkotas
Copy link
Member

jkotas commented Jan 9, 2018

I think we can reasonably say that IsSupported could never be invoked indirectly.

I do not think we should be making exceptions like this.

In this scheme, I think the most straightforward option to get a working IsSupported would be to substitute the IL for it on the EE side: https://github.com/dotnet/coreclr/blob/master/src/vm/jitinterface.cpp#L6970

@tannergooding
Copy link
Member

In this scheme, I think the most straightforward option to get a working IsSupported would be to substitute the IL for it on the EE side: https://github.com/dotnet/coreclr/blob/master/src/vm/jitinterface.cpp#L6970

To make sure I understand. You are suggesting that, if we went with the approach @CarolEidt suggested (https://github.com/dotnet/corefx/issues/26194#issuecomment-356435124), IsSupported would be implemented as bool IsSupported => false and we would update the EE to replace the IL with the recursive implementation such that indirect calls would also work as expected?

@CarolEidt
Copy link
Contributor

What @tannergooding said

@jkotas
Copy link
Member

jkotas commented Jan 9, 2018

we would update the EE to replace the IL with the recursive implementation such that indirect calls would also work as expected?

Right.

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

Right

If and only if JIT is RyuJIT?

@tannergooding
Copy link
Member

If and only if JIT is RyuJIT?

I would believe this would be on any JIT that supports the HWIntrinsics.

As I indicated earlier:

It doesn't look like impIntrinsic (or the relevant pieces in the method) is under a LEGACY_BACKEND ifdef, so I think it is already attempting JIT intrinsic expansion today (might have missed something, however).

@sdmaclea
Copy link
Contributor Author

sdmaclea commented Jan 9, 2018

@tannergooding I assumed that comment indicated LegayJIT was broken.

@tannergooding
Copy link
Member

It might be (in which case we should log a bug and track getting the HWIntrinsic functionality ifdef'd appropriately)

eerhardt referenced this issue in eerhardt/corefx Jan 10, 2018
Since the ref assembly is so small, it doesn't pay to split each architecture into separate assemblies.  Also, there are common types shared between architectures - ex. Vector128.

Fix #26194
eerhardt referenced this issue in eerhardt/corefx Jan 11, 2018
Since the ref assembly is so small, it doesn't pay to split each architecture into separate assemblies.  Also, there are common types shared between architectures - ex. Vector128.

Fix #26194
eerhardt referenced this issue in dotnet/corefx Jan 12, 2018
Since the ref assembly is so small, it doesn't pay to split each architecture into separate assemblies.  Also, there are common types shared between architectures - ex. Vector128.

Fix #26194
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants