Proposal rename System.Runtime.Intrinsics.X86.dll to System.Runtime.Intrinsics.dll #24595

sdmaclea · 2018-01-05T23:23:53Z

For HW Intrinsics X86 has created System.Runtime.Intrinsics.X86.dll. It currently contains Vector128, and Vector256 from System.Runtime.Intrinsics as well as the X86
HW intrinsics from System.Runtime.Intrinsics.X86

ARM64 intrinsics will use Vector128 from System.Runtime.Intrinsics
ARM64 intrinsics will add Vector64 to System.Runtime.Intrinsics

The question arises about where to put ARM64 intrinsic reference assemblies.

It seems any application which takes the time to optimize code using intrinsics may want to support multiple platforms.

My preference would be to add ARM64 intrinsics to the same reference assembly. Therefore to rename the reference assembly to a more generic name System.Runtime.Intrinsics.dll rather than creating System.Runtime.Intrinsics.ARM.ARM64.dll and later System.Runtime.Intrinsics.ARM.ARM32.dll.

@eerhardt @CarolEidt @RussKeldorph Please advise

@tannergooding @fiigii FYI

The text was updated successfully, but these errors were encountered:

fiigii · 2018-01-05T23:34:44Z

Do we have System.Runtime.Intrinsics.X86.dll? I removed the package in dotnet/corefx#24490

I am not familiar with CoreFX configuration, cc @eerhardt @weshaggard

4creators · 2018-01-05T23:37:56Z

I think it is the best way forward. Anyway intrinsics will be implemented in runtime and available on all platforms supported.

sdmaclea · 2018-01-05T23:38:46Z

Do we have System.Runtime.Intrinsics.X86.dll?

Yes. See src directory

https://github.com/dotnet/corefx/tree/master/src/System.Runtime.Intrinsics.X86

weshaggard · 2018-01-05T23:40:48Z

The library still exists but there is no nuget package just for it any longer.

As for one assembly vs many that is a good question. I don't think we have enough information here to make that decision. One thing I would like to understand is how many API's are we talking about for each architecture?

cc @dotnet/fxdc for potential advise.

fiigii · 2018-01-05T23:46:16Z

The library still exists but there is no nuget package just for it any longer.

Ah, I see.

As for one assembly vs many that is a good question.

I think we should have one with the name System.Runtime.Intrinsics.dll because an HW intrinsic should be visible on all the platforms. @jkotas

tannergooding · 2018-01-05T23:51:58Z

One thing I would like to understand is how many API's are we talking about for each architecture?

@weshaggard, the number of types is small (around 10 for each architecture right now). The number of APIs (methods) varies greatly based on the type (ranging from 1 to more than 50).

The number of shared types is also small (they are all opaque structs). Currently 2, but expected to expand to 6+ once AVX-512 and SVE extensions are implemented

weshaggard · 2018-01-05T23:54:34Z

Thanks @tannergooding given that I would agree with putting them all in the same assembly, especially since they already have a way to check if they are supported or not at runtime.

CarolEidt · 2018-01-06T00:20:19Z

I agree with the idea of putting them in the same assembly. Especially given that we don't plan to have any significant implementation in them (i.e. the size of the IL will be small). In the rare case where we plan to support a variable argument where the target has only immediate forms, we are leaning toward generating the code in the JIT when the IL method for the intrinsic is compiled.

So, we have the following general pattern for the IL "implementation" of a generic intrinsic Foo in namespace Bar:

public static Vector128<T> Foo(Vector128<T> value) where T : struct
{
    if (Bar.IsSupported)
    {
        ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();
        // In this recursive invocation, Foo will always be expanded by the JIT,
        // even if it requires generating a switch-table expansion.
        return Foo<T>(value);
    }
    else
    {
        throw new PlatformNotSupportedException();
    }
}

eerhardt · 2018-01-09T17:28:27Z

I can work on this change this week.

sdmaclea · 2018-01-09T17:44:37Z

@CarolEidt

we have the following general pattern for the IL "implementation" of a generic intrinsic Foo in namespace Bar

So far in corefx the reference implementation is simply

public static Vector128<T> Foo(Vector128<T> value) where T : struct 
{ 
  throw null; 
}

In CoreCLR the pattern you describe is effectively implemented, but it is not implemented in IL as simply as you describe. It is actually split.

// For unsupported platforms
public static Vector128<T> Foo(Vector128<T> value) where T : struct 
{ 
  throw new PlatformNotSupportedException(); 
}

// For sometimes supported platforms
public static Vector128<T> Foo(Vector128<T> value) where T : struct 
{ 
  return Foo<T>(value);
}

JIT HW intrinsic implementation handles the bulk of your code

    if (Bar.IsSupported)
    {
        ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();
        // In this recursive invocation, Foo will always be expanded by the JIT,
        // even if it requires generating a switch-table expansion.
        return Foo<T>(value);
    }
    else
    {
        throw new PlatformNotSupportedException();
    }

There is also an argument range exception possible, for immediate values.

Allowing JIT to handle bulk of the work allows a lot of the code to be shared instead of replicated into every intrinsic.

Although it would be good to eliminate one of the two C# forms.

tannergooding · 2018-01-09T18:01:23Z

The implementation could probably just be changed to public static Vector128<T> Foo(Vector128<T> value) => Foo<T>(value) everywhere. (Although, I think there was some pushback last time this was suggested.)?

This would SO on any runtime which doesn't support hardware intrinsics (which is already the case on any platform that gets the "sometimes supported" implementation, but doesn't handle these).

The x86 implementation is currently returning nullptr for the first pass (mustExpand = false), and is doing NO_WAY("JIT must expand the intrinsic!"); for the second pass (mustExpand = true).

Changing the NO_WAY to return gtNewMustThrowException(CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED, JITtype2varType(sig->retType), sig->retTypeClass); instead would make the PNSE handling consistent.

sdmaclea · 2018-01-09T18:11:51Z

if we change to lookupHWIntrinsic(methodName, isa) to

NamedIntrinsic lookupHWIntrinsic(const char* namespace, const char* className, const char* methodName);

All platforms can have a NI_PlatformNotSupported named intrinsic. It can be returned if namespace or class name is not supported on the current platform (Unless method get is get_IsSupported)

This is roughly what my draft ARM64 code is doing.

CarolEidt · 2018-01-09T18:13:12Z

The implementation could probably just be changed to public static Vector128 Foo(Vector128 value) => Foo(value) everywhere. (Although, I think there was some pushback last time this was suggested.)?

So, IIUC this would leave it up to the JIT to do the IsSupported check at JIT-time, and:

If supported and constraints (e.g. immediates) are met, generate simple expansion.
Otherwise, if non-recursive generate a direct call.
If recursive, generate "full" expansion:
- If not supported, simply generate a throw of PNSE (i.e. the gtNewMustThrowException).
- Otherwise (supported, but immediate constraint isn't met), generate switchtable expansion

CarolEidt · 2018-01-09T18:15:26Z

@sdmaclea - I think that your NI_PlatformNotSupported intrinsic could be considered an "internal implementation detail of my proposal. Ideally, it would be good to express this requirement in a way that is independent of the internal JIT or runtime details. Does that make sense?

sdmaclea · 2018-01-09T18:22:32Z

changed to public static Vector128 Foo(Vector128 value) => Foo(value) everywhere

Does everywhere include the CoreFX copy?

tannergooding · 2018-01-09T18:24:22Z

The reference assemblies return null by design, since they should only ever exist at compile time. Loading it at runtime is an ~~error~~ bug.

CarolEidt · 2018-01-09T18:37:07Z

The implementation could probably just be changed to public static Vector128 Foo(Vector128 value) => Foo(value) everywhere. (Although, I think there was some pushback last time this was suggested.)?

@tannergooding - do you recall what the concern was that was raised? At least from my perspective, I think I have a much better feel at this point for how much cleaner this approach is, now that we have gained a bit of experience and perspective on these intrinsics. And I would like to extend my appreciation to @sdmaclea , @fiigii and @tannergooding for the significant contributions (and patience) so far! It's been quite instructive to have multiple targets actively under discussion and implementation.

tannergooding · 2018-01-09T18:54:32Z

do you recall what the concern was that was raised

Not off the top of my head. I think it might have been @jkotas that raised it however.

I'll try and see if I can find it (no guarantees however, lots of threads and lots of comments 😄)

jkotas · 2018-01-09T19:25:33Z

The implementation could probably just be changed to public static Vector128 Foo(Vector128 value) => Foo(value) everywhere.

If you mean that e.g. this pattern https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Runtime/Intrinsics/X86/Avx.cs#L237, you would need more JIT helpers to make this work with the simple above pattern to throw the right exception. My preference is to have fewer JIT helpers when I have a choice. If you think that having more JIT helpers is better choice in this case, I am fine what that.

CarolEidt · 2018-01-09T19:30:08Z

you would need more JIT helpers to make this work with the simple above pattern to throw the right exception

Thanks for clarifying. I believe I do think that having more JIT helpers would be a better choice, since it saves a lot of boilerplate implementation in the library. That said, perhaps there are more than I realize.

tannergooding · 2018-01-09T19:42:03Z

My preference is to have fewer JIT helpers when I have a choice. If you think that having more JIT helpers is better choice in this case, I am fine what that.

Could you clarify on what you mean by JIT helper?

Are you referring to a managed helper such as ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>(); or actual native code in the JIT?

fiigii · 2018-01-09T19:45:44Z

Could you clarify on what you mean by JIT helper?

The JIT helper throws NotSupportedException with messages to match the behavior of System.Numerics.Vector<T>.

jkotas · 2018-01-09T19:46:37Z

Right, we would need to have CORINFO_HELP_XXX equivalent of ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();. The JIT does not have capability to introduce call to ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>(); today. It can only introduce calls to CORINFO_HELP_XXX helpers.

sdmaclea · 2018-01-09T19:51:30Z

@jkotas There is no reason for generated code to call the managed method. JIT needs to know anyway to support the inlining rules required. JIT can either throw or not throw based on the template arguments and platform features.

jkotas · 2018-01-09T19:54:12Z

JIT can either throw or not throw based on the template arguments and platform features.

Right. And to throw exception with the right message, we would need a new CORINFO_HELP_XXX helper.

tannergooding · 2018-01-09T19:54:12Z

Today we are already doing:
https://github.com/dotnet/coreclr/blob/master/src/jit/hwintrinsicxarch.cpp#L683

For ThrowNotSupportedExceptionIfNonNumericType it should just be:

if (IsNonNumericType())
{
     return impUnsupportedHWIntrinsic(CORINFO_HELP_THROW_PLATFORM_NOT_SUPPORTED, method, sig, mustExpand);
}

We already have to have an IsNumericType check today, in order to determine whether or not to fallback on something like StaticCast:
dotnet/coreclr@038cb09#diff-46e84d907ca2183faf22c1345e4e04aeR534

jkotas · 2018-01-09T19:57:31Z

This gives you the generic "platform not supported" message. You rather want the "this type is not supported" message.

tannergooding · 2018-01-09T20:01:24Z

This gives you the generic "platform not supported" message. You rather want the "this type is not supported" message.

Ah, OK. This was the part I was missing.

I also think this is two parts. One for specific error messages. the other part is that we could simplify S.P.CoreLib to have a single implementation cross-arch(currently the recursive implementation only exists on the target platform, we have an identical implementation for other archs that throws the default PNSE)

jkotas · 2018-01-09T20:03:35Z

currently the recursive implementation only exists on the target platform, we have an identical implementation for other archs that throws the default PNSE

I want to keep this so that bring ups of new platforms do not need to worry about the hardware intrinsics.

tannergooding · 2018-01-09T21:07:42Z

Right, there are other code generators out these in addition to RyuJIT.

So today, the exact behavior primarily depends on the implementation of corlib you have at runtime.

For CoreCLR, regardless of RyuJIT or the legacy JIT, you will get the recursive implementation on x86/x64.

This means that you will end up with a SO exception assert for any x86 intrinsic which is not recognized but which has a recursive managed implementation (the impIntrinsic will fall out and a GT_CALL will be emitted).

On ARM, you will currently get the throw PNSE implementation (for x86 intrinsic).

On other code generators (such as Mono or the full framework runtime), their corlib implementation does not currently define these types and you will get a Type/Method not found exception (assuming you compiled using CoreFX references and attempted to execute using their runner).

CarolEidt · 2018-01-09T21:08:53Z

There is no reason for generated code to call the managed method.

I think the advantage of calling the managed method instead of directly generating the throw (in the non-recursive case) is that the exception happens in the method itself. Also, to me it seems cleaner that the JIT only supports the supported case in the "happy path", and all the other cases are handled in the recursive case.

sdmaclea · 2018-01-09T21:14:26Z

I am on board. So for the extract case..

C# the supported platform

T Extract<T>(Vector128<T> value, byte index)
{
  ThrowHelper.ThrowNotSupportedExceptionIfNonNumericType<T>();

  byte elements = Unsafe.SizeOf<Vector128<T>>()/Unsafe.SizeOf<T>();
  if(index >= elements)
  {
      ThrowHelper.ThrowArgumentOutOfRangeException(index);
  }
  return Extract<T>(Vector128<T> value, byte index);
}

tannergooding · 2018-01-09T21:15:14Z

I think the advantage of calling the managed method instead of directly generating the throw (in the non-recursive case) is that the exception happens in the method itself.

We are already doing that today. We only emit the gtNewMustThrowException node if mustExpand is true (so it only emits when we are inside the GT_CALL to the intrinsic).

CarolEidt · 2018-01-09T21:44:21Z

I guess I don't see that we are at a consensus yet. I believe there are two options under consideration:

The JIT is responsible for doing all the checks, and, in the recursive/mustExpand case, it generates either the appropriate throw or the switchtable expansion as needed. In this case the implementation is simple, but even the legacy jit would have to minimally support throwing PNSE.

OR

The JIT still has to do all the checks in the non-recursive/!mustExpand case, but is only responsible for generating the switchtable expansion as needed.

The latter doesn't seem like a big win in exchange for the additional IL that needs to be in the library.

Since the ref assembly is so small, it doesn't pay to split each architecture into separate assemblies. Also, there are common types shared between architectures - ex. Vector128. Fix #26194

sdmaclea · 2018-01-09T22:06:40Z

@CarolEidt @jkotas

OR

The JIT is responsible for doing all the checks, and, in the recursive/mustExpand case, it generates either the appropriate throw or the switchtable expansion as needed. In this case the implementation is simple. On non-supported platforms and JITs, alternate mscorlib HW intrinsic src implementation is used to throw PNSE.

This is currently what some of us have been trying to implement.

jkotas · 2018-01-09T22:17:11Z

The latter doesn't seem like a big win in exchange for the additional IL that needs to be in the library.

IL implementations of the methods is in the library in either case.

If we wanted to optimize the IL size of the library, we would implement these methods using throw new PlatformNotSupportedException() body everywhere. It is same body for all methods, and so the metadata of all methods will point to the same blob. The fallback expansion would need to be done differently in the JIT compared to what it is today.

tannergooding · 2018-01-09T22:19:54Z

In this case the implementation is simple, but even the legacy jit would have to minimally support throwing PNSE.

It doesn't look like impIntrinsic (or the relevant pieces in the method) is under a LEGACY_BACKEND ifdef, so I think it is already attempting JIT intrinsic expansion today (might have missed something, however).

On non-supported platforms and JITs, alternate mscorlib HW intrinsic src implementation is used to throw PNSE.

I don't think we are achieving this today. It depends on the underlying corlib implementation in the end.

If a user compilers against one ref assembly and attempts to run it on any given runtime, one of a few different things will happen today:

Implementation is recursive
- JIT does recognize, in which case code is emitted as normal (expanded for direct calls, compiler fallback or expansion for indirect calls)
- JIT doesn't recognize, in which case a StackOverflow or NO_WAY assert will occur
Implementation is non-recursive
- JIT does recognize
  - Expanded for direct calls
  - IL is jitted for indirect calls
- JIT doesn't recognize, in which case IL is jitted for all calls

tannergooding · 2018-01-09T22:22:05Z

If you wanted to optimize the IL size of the library, we would implement these methods using throw new PlatformNotSupportedException() body everywhere. It is same body for all methods, and so the metadata of all methods will point to the same blob. The fallback expansion would need to be done differently in the JIT compared to what it is today.

This may provide the most consistent experience everywhere.

In all cases it would be:

JIT recognizes
- Intrinsic is expanded for direct calls
- Indirect calls would expand or emit compiler callback in the method body (this would require some mechanism to detect this)
JIT does not recognize, IL is jitted for all calls (throwing a PNSE)

sdmaclea · 2018-01-09T22:28:01Z

we would implement these methods using throw new PlatformNotSupportedException() body everywhere

If we used a wrapper exception throw new HWIntrinsicNotSupportedException() it might provide the advantage of being explicit.

We can detect a HWIntrinsic by the exception it throws

CarolEidt · 2018-01-09T22:28:25Z

Indirect calls would expand or emit compiler callback in the method body (this would require some mechanism to detect this)

I'm not crazy about the idea of a special mechanism for this. Perhaps this pattern is best for the IL method of Foo.Bar():

if (Foo.IsSupported)
{
    Bar();
}
else
{
    throw new PlatformNotSupportedException();
}

This behaves mostly as @tannergooding suggests above, but allows the JIT to recognize the recursive case and do the appropriate (simple or switchtable) expansion.

tannergooding · 2018-01-09T22:31:04Z

@CarolEidt, how would bool IsSupported { get; } be defined?

We can't just define bool IsSupported => false, as that won't work for indirect calls. We also can't just define bool IsSupported => IsSupported, as that would SO in the cases I mentioned above (JIT doesn't recognize)

CarolEidt · 2018-01-09T22:50:38Z

We can't just define bool IsSupported => false, as that won't work for indirect calls.

I believe we can; IsSupported would be always be recognized as an intrinsic by any platform/JIT that supports HW intrinsics, in which case it would return the true value.

Or do you mean indirect calls to IsSupported? I think we can reasonably say that IsSupported could never be invoked indirectly.

tannergooding · 2018-01-09T22:54:32Z

Yes. It would impact indirect calls (Reflection, Delegates, etc).

jkotas · 2018-01-09T23:01:17Z

I think we can reasonably say that IsSupported could never be invoked indirectly.

I do not think we should be making exceptions like this.

In this scheme, I think the most straightforward option to get a working IsSupported would be to substitute the IL for it on the EE side: https://github.com/dotnet/coreclr/blob/master/src/vm/jitinterface.cpp#L6970

tannergooding · 2018-01-09T23:15:28Z

In this scheme, I think the most straightforward option to get a working IsSupported would be to substitute the IL for it on the EE side: https://github.com/dotnet/coreclr/blob/master/src/vm/jitinterface.cpp#L6970

To make sure I understand. You are suggesting that, if we went with the approach @CarolEidt suggested (https://github.com/dotnet/corefx/issues/26194#issuecomment-356435124), IsSupported would be implemented as bool IsSupported => false and we would update the EE to replace the IL with the recursive implementation such that indirect calls would also work as expected?

CarolEidt · 2018-01-09T23:16:00Z

What @tannergooding said

jkotas · 2018-01-09T23:40:29Z

we would update the EE to replace the IL with the recursive implementation such that indirect calls would also work as expected?

Right.

sdmaclea · 2018-01-09T23:42:05Z

Right

If and only if JIT is RyuJIT?

tannergooding · 2018-01-09T23:43:47Z

If and only if JIT is RyuJIT?

I would believe this would be on any JIT that supports the HWIntrinsics.

As I indicated earlier:

It doesn't look like impIntrinsic (or the relevant pieces in the method) is under a LEGACY_BACKEND ifdef, so I think it is already attempting JIT intrinsic expansion today (might have missed something, however).

sdmaclea · 2018-01-09T23:46:24Z

@tannergooding I assumed that comment indicated LegayJIT was broken.

tannergooding · 2018-01-09T23:49:07Z

It might be (in which case we should log a bug and track getting the HWIntrinsic functionality ifdef'd appropriately)

Since the ref assembly is so small, it doesn't pay to split each architecture into separate assemblies. Also, there are common types shared between architectures - ex. Vector128. Fix #26194

eerhardt self-assigned this Jan 9, 2018

eerhardt closed this as completed in dotnet/corefx#26245 Jan 12, 2018

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 2.1.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 18, 2020

Proposal rename System.Runtime.Intrinsics.X86.dll to System.Runtime.Intrinsics.dll #24595

Proposal rename System.Runtime.Intrinsics.X86.dll to System.Runtime.Intrinsics.dll #24595

Comments

sdmaclea commented Jan 5, 2018

fiigii commented Jan 5, 2018

4creators commented Jan 5, 2018

sdmaclea commented Jan 5, 2018

weshaggard commented Jan 5, 2018

fiigii commented Jan 5, 2018

tannergooding commented Jan 5, 2018

weshaggard commented Jan 5, 2018

CarolEidt commented Jan 6, 2018

eerhardt commented Jan 9, 2018

sdmaclea commented Jan 9, 2018

tannergooding commented Jan 9, 2018

sdmaclea commented Jan 9, 2018

CarolEidt commented Jan 9, 2018

CarolEidt commented Jan 9, 2018 • edited Loading

sdmaclea commented Jan 9, 2018

tannergooding commented Jan 9, 2018 • edited Loading

CarolEidt commented Jan 9, 2018

tannergooding commented Jan 9, 2018

jkotas commented Jan 9, 2018

CarolEidt commented Jan 9, 2018

tannergooding commented Jan 9, 2018

fiigii commented Jan 9, 2018

jkotas commented Jan 9, 2018 • edited Loading

sdmaclea commented Jan 9, 2018

jkotas commented Jan 9, 2018 • edited Loading

tannergooding commented Jan 9, 2018

jkotas commented Jan 9, 2018

tannergooding commented Jan 9, 2018 • edited Loading

jkotas commented Jan 9, 2018

tannergooding commented Jan 9, 2018

CarolEidt commented Jan 9, 2018

sdmaclea commented Jan 9, 2018 • edited Loading

tannergooding commented Jan 9, 2018

CarolEidt commented Jan 9, 2018

sdmaclea commented Jan 9, 2018 • edited Loading

jkotas commented Jan 9, 2018 • edited Loading

tannergooding commented Jan 9, 2018 • edited Loading

tannergooding commented Jan 9, 2018

sdmaclea commented Jan 9, 2018 • edited Loading

CarolEidt commented Jan 9, 2018

tannergooding commented Jan 9, 2018

CarolEidt commented Jan 9, 2018

tannergooding commented Jan 9, 2018

jkotas commented Jan 9, 2018

tannergooding commented Jan 9, 2018

CarolEidt commented Jan 9, 2018

jkotas commented Jan 9, 2018

sdmaclea commented Jan 9, 2018

tannergooding commented Jan 9, 2018

sdmaclea commented Jan 9, 2018

tannergooding commented Jan 9, 2018

CarolEidt commented Jan 9, 2018 •

edited

Loading

tannergooding commented Jan 9, 2018 •

edited

Loading

jkotas commented Jan 9, 2018 •

edited

Loading

jkotas commented Jan 9, 2018 •

edited

Loading

tannergooding commented Jan 9, 2018 •

edited

Loading

sdmaclea commented Jan 9, 2018 •

edited

Loading

sdmaclea commented Jan 9, 2018 •

edited

Loading

jkotas commented Jan 9, 2018 •

edited

Loading

tannergooding commented Jan 9, 2018 •

edited

Loading

sdmaclea commented Jan 9, 2018 •

edited

Loading