Determine how to expose hwintrinsics that are only supported in 64-bit mode #10617

tannergooding · 2018-07-02T16:57:21Z

There are several hwintrinsics exposed that are only emittable in 64-bit mode and throw a PNSE exception if invoked in 32-bit mode (regardless of the fact that the general Isa.IsSupported check returns true).

We should have a deeper discussion on how to properly expose this data to the consumer of these APIs.

The text was updated successfully, but these errors were encountered:

tannergooding · 2018-07-02T16:57:44Z

FYI. @CarolEidt, @fiigii, @eerhardt, @terrajobst

This tracks the discussion raised here: dotnet/coreclr#18734 (comment)

pentp · 2018-07-10T20:35:57Z

At least ConvertScalarToVector128(U)Int64 and ConvertTo(U)Int64 should be made to work in 32-bit mode. I have tried to use intrinsics in some Decimal calculations and ended up with many #if blocks like this:

#if BIT64
  var x1 = Sse2.ConvertScalarToVector128UInt64(pdecR.Low64);
#else
  var x1 = Sse2.LoadScalarVector128((ulong*)Unsafe.AsPointer(ref Unsafe.AddByteOffset(ref pdecR, (IntPtr)8)));
#endif

The JIT is better positioned to choose an optimal solution than user code - it can load directly from memory with MOVQ or spill the low+high registers and then use MOVQ or use 2xMOVD + PUNPCKLDQ or use MOVD + PINSRD (with SSE4.1). For the reverse it can choose between storing directly to memory with MOVQ or to registers with MOVD + PEXTRD (with SSE4.1).

tannergooding · 2018-07-10T20:47:29Z

At least ConvertScalarToVector128(U)Int64 and ConvertTo(U)Int64 should be made to work in 32-bit mode.

The general principle of the intrinsics so far is that they are not emulated, ever. This simplifies the implementation overall and provides some very nice guarantees about what the intrinsics will do. The downside is that the consumer ends up having to do a bit more, but I believe that can/will be alleviated by having some separate "helper" library. Such a library could do things like:

abstract away ARM vs x86 differences
abstract away 32-bit vs 64-bit differences
providing wrappers around common functionality
- Such as providing a Permute wrapper that does a shuffle on SSE hardware and a Permute on AVX hardware
etc

pentp · 2018-07-10T20:57:55Z

These proposed libraries would be sub-optimal because a long is internally two int registers or a memory location, but managed code can't access these registers directly nor affect register allocation/spilling and has to either always spill or always load in two parts. If fallbacks for intrinsics are undesirable, then at least there could be a helper intrinsic in Sse2 class.

tannergooding · 2018-07-10T21:09:59Z

but managed code can't access these registers directly nor affect register allocation/spilling and has to either always spill or always load in two parts

This is something that is much more general than just hardware intrinsics. If there is an issue here, it is probably worth addressing more generally (or via some separate intrinsic).

CarolEidt · 2018-07-10T21:18:03Z

There is a general issue of the JIT not generally being able to determine whether it is best to use a value from memory, the "natural" register type for the value (generally 2 int regs for a long), or an xmm register. As @tannergooding points out, this is a general issue, though certainly of greater impact for code like this.

As a design issue, I think it's clearly best to keep the intrinsics as true to the hw as possible. Then, as we identify scenarios (esp. scenarios that can be illustrated to have some impact on real-world-code) that aren't well addressed, we can determine how best to get the desired performance. Creating more and more intrinsics doesn't feel like the right answer to me, but we'd clearly have to do some more analysis first.

tannergooding · 2018-11-09T04:35:39Z

We discuseed and resolved this as part of dotnet/corefx#32721

tannergooding closed this as completed Nov 9, 2018

msftgits transferred this issue from dotnet/coreclr Jan 31, 2020

msftgits added this to the 3.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Determine how to expose hwintrinsics that are only supported in 64-bit mode #10617

Determine how to expose hwintrinsics that are only supported in 64-bit mode #10617

tannergooding commented Jul 2, 2018

tannergooding commented Jul 2, 2018

pentp commented Jul 10, 2018 •

edited

Loading

tannergooding commented Jul 10, 2018

pentp commented Jul 10, 2018

tannergooding commented Jul 10, 2018

CarolEidt commented Jul 10, 2018

tannergooding commented Nov 9, 2018

Determine how to expose hwintrinsics that are only supported in 64-bit mode #10617

Determine how to expose hwintrinsics that are only supported in 64-bit mode #10617

Comments

tannergooding commented Jul 2, 2018

tannergooding commented Jul 2, 2018

pentp commented Jul 10, 2018 • edited Loading

tannergooding commented Jul 10, 2018

pentp commented Jul 10, 2018

tannergooding commented Jul 10, 2018

CarolEidt commented Jul 10, 2018

tannergooding commented Nov 9, 2018

pentp commented Jul 10, 2018 •

edited

Loading